Hi,

I need to read approx. 1.4M records within a single Workbench process. No getting around it: they must all be read, since they all participate in a FeatureMerger later and any one of them might be a match candidate.

The reader starts out fine, reading 2,500 records at a time in under 10 seconds a pop. However, this slowly degrades, so that after 800K records it is 30 sec. a pop, and near the end close to 50. Soup to nuts, the entire read takes about 4 hrs and 45 min. RAM and network speed really aren't the problem, so the gradual increase in per-batch processing time seems odd.

Is there any tip/trick for reading a large number of records (>300K) from the PostGIS (or any other) reader so that the amount of time is significantly reduced? I was thinking of amping up the "Number of Records To Fetch At A Time" parameter, but I don't want to make matters worse.

Thanks very much,

Pete

Hi @pvaziri, I suspect that your workflow is consuming a large amount of memory, and that could be what gradually degrades the performance.

If you send the features read from the PostGIS database to the Requestor port of the FeatureMerger, consider whether you can apply the Suppliers First option. If you can guarantee that all Supplier features will arrive at the FeatureMerger before the first Requestor feature, the option can be applied; it reduces memory usage, so you can expect the performance to improve.


I tried it @takashi. It is faster. However, in this case, the database table does not have geometry and the FGDB source does. So if I want to keep the geometry provided by the FGDB feature class, don't I have to always define it as the Requestor?


Hi @pvaziri, have you considered using the DatabaseJoiner? Depending on whether you need to do any manipulation of the PostGIS features prior to merging, this transformer may help.


I'll try that, thanks. However, I do some things upstream to the sources before the FeatureMerger...


Hi @pvaziri, have you considered using the DatabaseJoiner? Depending on whether you need to do any manipulation of the PostGIS features prior to merging, this transformer may help.

I checked the table, and even though I am using a PostGIS reader, the table does not have any geometry (in essence it is a plain PostgreSQL table)... I'll try the DatabaseJoiner, but I'm worried that it will also bog things down, since it is storing those values before moving on, no?


I tried it @takashi. It is faster. However, in this case, the database table does not have geometry and the FGDB source does. So if I want to keep the geometry provided by the FGDB feature class, don't I have to always define it as the Requestor?

Not sure how this works for you performance-wise, but you can easily put the geometry in an attribute using the GeometryExtractor (I use FME Binary) and restore the geometry with a GeometryReplacer.

Are you running this process on 32-bit or 64-bit FME? On 32-bit you would probably quickly run out of usable physical RAM, which would significantly slow down your process.


I'd also suggest trying the latest FME 2018 and the FeatureJoiner instead of the FeatureMerger. I'd suspect it would be much faster in all ways.


For heavier jobs I prefer not to load all the data into Workbench. Alternatives I use are:

- Only load the data you need. FeatureReaders with WHERE clauses and/or spatial intersects will help you dynamically load subsets of data.

- Let the database do the job; it will be quicker than anything else. Write the data to be merged to a temp table, dynamically build a join query, and run it with the SQLExecutor to pick the results up again (see the sketch below).

Probably not the fastest way to create a next-next-finish workspace, but problems often have more than one workable solution. As I don't have much patience, I tend to rebuild workspaces when the data grows. When data volumes get too big you need to consider an alternative anyway. Try loading terrestrial laser scan data :-)
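
To make both ideas concrete, here is a minimal PostGIS SQL sketch; all table and column names (source_table, staged_points, attr_table, the shared id key) are hypothetical:

    -- Idea 1: push the filtering into the database so unneeded rows never
    -- reach Workbench, e.g. a spatial subset behind a FeatureReader:
    SELECT *
    FROM   source_table
    WHERE  ST_Intersects(geom, ST_MakeEnvelope(5.0, 52.0, 6.0, 53.0, 4326));

    -- Idea 2: stage the features to be merged in a scratch table (e.g. with
    -- a FeatureWriter), let the database do the join, and read the result
    -- back with a SQLExecutor; only the matching rows come back.
    SELECT s.id,
           s.geom,      -- geometry kept from the staged spatial side
           a.attr_a,
           a.attr_b     -- attributes pulled from the non-spatial table
    FROM   staged_points s
    JOIN   attr_table    a ON a.id = s.id;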


Thanks Niels. The issue I have is that, even after the where clause, I still have 1.4M non-spatial records that I'm trying to merge with 500K spatial features (point data). There is no way to predict in advance which matches will occur, otherwise I would filter further. I've had a little better success using the DatabaseJoiner rather than the FeatureMerger, so that is positive. Thanks again,

Pete

I'd also suggest trying the latest FME 2018 and the FeatureJoiner instead of the FeatureMerger. I'd suspect it would be much faster in all ways.

I will try this! Thanks Dale!

Are you running this process on 32-bit or 64-bit FME? On 32-bit you would probably quickly run out of usable physical RAM, which would significantly slow down your process.

64-bit, thanks.