Skip to main content
Question

Filtering features from CSV that are present in database


p.jeremie
Contributor
Forum|alt.badge.img+6
Helo,

 

I want to filter from a CSV file all the features that have an ID present in the database (unique ID as it is the table primary key).

For the moment I used a CSV reader to get my features from the CSV file and then used a Database joiner with join on the ID both in my table (primary key) and in the CSV. The cardinality is set to "1:1".

This way I get in the port the features with no match in the database table.

I'm not sure it is the fastest way (takes more than 50 sec to perform the join - ~400 000 lines in the table). Would it be more efficient to have all the ID's in a list and filter with this list ?

I am a little bit lost how to do it efficiently. Any idea would be appreciated.

7 replies

virtualcitymatt
Celebrity
Forum|alt.badge.img+35

It probably would be faster to have the list of ID's in a CSV file and then just use another CSV reader.

 

 

FME can read CSV super fast (like lightning fast) - and if you use the FeatureJoiner (not FeatureMerger) with the list of ID's and your original CSV you should see a very fast workflow. Just be sure you're using 2018+ to get the speed benefits (see this article on Bulk Mode: https://knowledge.safe.com/questions/49482/what-transformers-have-feature-table-technology-fo.html ).

 

If instead you want to keep using the DB joiner you could try using a Prefetch Query in the DatabaseJoiner

'SELECT id from <table>'

This could speed up your current process but I think having them in a list would be much faster if this doesn't create too much overhead. I'm no database pro so others might have better ideas.

 

 


ebygomm
Influencer
Forum|alt.badge.img+38
  • Influencer
  • June 7, 2019
Depends on the numbers involved and database set up, but it can often be quicker to write a temp table containing all the ids from the csv file, then an sqlexecutor to select all the records that don't appear in this temp table.

p.jeremie
Contributor
Forum|alt.badge.img+6
  • Author
  • Contributor
  • June 7, 2019
virtualcitymatt wrote:

It probably would be faster to have the list of ID's in a CSV file and then just use another CSV reader.

 

 

FME can read CSV super fast (like lightning fast) - and if you use the FeatureJoiner (not FeatureMerger) with the list of ID's and your original CSV you should see a very fast workflow. Just be sure you're using 2018+ to get the speed benefits (see this article on Bulk Mode: https://knowledge.safe.com/questions/49482/what-transformers-have-feature-table-technology-fo.html ).

 

If instead you want to keep using the DB joiner you could try using a Prefetch Query in the DatabaseJoiner

'SELECT id from <table>'

This could speed up your current process but I think having them in a list would be much faster if this doesn't create too much overhead. I'm no database pro so others might have better ideas.

 

 

Thanks, I forgot about the Prefetch query. It doesn't seems to accelerate the process though...

I'm on something else right now but I'll try with the ID's in a CSV + FeatureJoiner.


p.jeremie
Contributor
Forum|alt.badge.img+6
  • Author
  • Contributor
  • June 7, 2019
ebygomm wrote:
Depends on the numbers involved and database set up, but it can often be quicker to write a temp table containing all the ids from the csv file, then an sqlexecutor to select all the records that don't appear in this temp table.

@egomm : any hints on how to do this with FME (create the temp table and delete it after the SQLExecutor). I'm not sure how...


ebygomm
Influencer
Forum|alt.badge.img+38
  • Influencer
  • June 7, 2019
p.jeremie wrote:

@egomm : any hints on how to do this with FME (create the temp table and delete it after the SQLExecutor). I'm not sure how...

What flavour is your database?


p.jeremie
Contributor
Forum|alt.badge.img+6
  • Author
  • Contributor
  • June 7, 2019
ebygomm wrote:

What flavour is your database?

It's an Oracle DB.


p.jeremie
Contributor
Forum|alt.badge.img+6
  • Author
  • Contributor
  • June 17, 2019
virtualcitymatt wrote:

It probably would be faster to have the list of ID's in a CSV file and then just use another CSV reader.

 

 

FME can read CSV super fast (like lightning fast) - and if you use the FeatureJoiner (not FeatureMerger) with the list of ID's and your original CSV you should see a very fast workflow. Just be sure you're using 2018+ to get the speed benefits (see this article on Bulk Mode: https://knowledge.safe.com/questions/49482/what-transformers-have-feature-table-technology-fo.html ).

 

If instead you want to keep using the DB joiner you could try using a Prefetch Query in the DatabaseJoiner

'SELECT id from <table>'

This could speed up your current process but I think having them in a list would be much faster if this doesn't create too much overhead. I'm no database pro so others might have better ideas.

 

 

I ended up staying with the solution DatabaseJoiner + prefetch query. It is already pretty fast in fact... thanks for your help.


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings