Skip to main content
Helo,

 

I want to filter from a CSV file all the features that have an ID present in the database (unique ID as it is the table primary key).

For the moment I used a CSV reader to get my features from the CSV file and then used a Database joiner with join on the ID both in my table (primary key) and in the CSV. The cardinality is set to "1:1".

This way I get in the port the features with no match in the database table.

I'm not sure it is the fastest way (takes more than 50 sec to perform the join - ~400 000 lines in the table). Would it be more efficient to have all the ID's in a list and filter with this list ?

I am a little bit lost how to do it efficiently. Any idea would be appreciated.

It probably would be faster to have the list of ID's in a CSV file and then just use another CSV reader.

 

 

FME can read CSV super fast (like lightning fast) - and if you use the FeatureJoiner (not FeatureMerger) with the list of ID's and your original CSV you should see a very fast workflow. Just be sure you're using 2018+ to get the speed benefits (see this article on Bulk Mode: https://knowledge.safe.com/questions/49482/what-transformers-have-feature-table-technology-fo.html ).

 

If instead you want to keep using the DB joiner you could try using a Prefetch Query in the DatabaseJoiner

'SELECT id from <table>'

This could speed up your current process but I think having them in a list would be much faster if this doesn't create too much overhead. I'm no database pro so others might have better ideas.

 

 


Depends on the numbers involved and database set up, but it can often be quicker to write a temp table containing all the ids from the csv file, then an sqlexecutor to select all the records that don't appear in this temp table.

It probably would be faster to have the list of ID's in a CSV file and then just use another CSV reader.

 

 

FME can read CSV super fast (like lightning fast) - and if you use the FeatureJoiner (not FeatureMerger) with the list of ID's and your original CSV you should see a very fast workflow. Just be sure you're using 2018+ to get the speed benefits (see this article on Bulk Mode: https://knowledge.safe.com/questions/49482/what-transformers-have-feature-table-technology-fo.html ).

 

If instead you want to keep using the DB joiner you could try using a Prefetch Query in the DatabaseJoiner

'SELECT id from <table>'

This could speed up your current process but I think having them in a list would be much faster if this doesn't create too much overhead. I'm no database pro so others might have better ideas.

 

 

Thanks, I forgot about the Prefetch query. It doesn't seems to accelerate the process though...

I'm on something else right now but I'll try with the ID's in a CSV + FeatureJoiner.


Depends on the numbers involved and database set up, but it can often be quicker to write a temp table containing all the ids from the csv file, then an sqlexecutor to select all the records that don't appear in this temp table.

@egomm : any hints on how to do this with FME (create the temp table and delete it after the SQLExecutor). I'm not sure how...


@egomm : any hints on how to do this with FME (create the temp table and delete it after the SQLExecutor). I'm not sure how...

What flavour is your database?


What flavour is your database?

It's an Oracle DB.


It probably would be faster to have the list of ID's in a CSV file and then just use another CSV reader.

 

 

FME can read CSV super fast (like lightning fast) - and if you use the FeatureJoiner (not FeatureMerger) with the list of ID's and your original CSV you should see a very fast workflow. Just be sure you're using 2018+ to get the speed benefits (see this article on Bulk Mode: https://knowledge.safe.com/questions/49482/what-transformers-have-feature-table-technology-fo.html ).

 

If instead you want to keep using the DB joiner you could try using a Prefetch Query in the DatabaseJoiner

'SELECT id from <table>'

This could speed up your current process but I think having them in a list would be much faster if this doesn't create too much overhead. I'm no database pro so others might have better ideas.

 

 

I ended up staying with the solution DatabaseJoiner + prefetch query. It is already pretty fast in fact... thanks for your help.


Reply