Question

Filtering features from CSV that are present in database

6 years ago
June 6, 2019
7 replies
162 views

p.jeremie
Contributor
53 replies

Helo,

I want to filter from a CSV file all the features that have an ID present in the database (unique ID as it is the table primary key).

For the moment I used a CSV reader to get my features from the CSV file and then used a Database joiner with join on the ID both in my table (primary key) and in the CSV. The cardinality is set to "1:1".

This way I get in the port the features with no match in the database table.

I'm not sure it is the fastest way (takes more than 50 sec to perform the join - ~400 000 lines in the table). Would it be more efficient to have all the ID's in a list and filter with this list ?

I am a little bit lost how to do it efficiently. Any idea would be appreciated.

+36

virtualcitymatt
Celebrity
1867 replies
6 years ago
June 7, 2019

It probably would be faster to have the list of ID's in a CSV file and then just use another CSV reader.

FME can read CSV super fast (like lightning fast) - and if you use the FeatureJoiner (not FeatureMerger) with the list of ID's and your original CSV you should see a very fast workflow. Just be sure you're using 2018+ to get the speed benefits (see this article on Bulk Mode: https://knowledge.safe.com/questions/49482/what-transformers-have-feature-table-technology-fo.html ).

If instead you want to keep using the DB joiner you could try using a Prefetch Query in the DatabaseJoiner

'SELECT id from <table>'

This could speed up your current process but I think having them in a list would be much faster if this doesn't create too much overhead. I'm no database pro so others might have better ideas.

+39

ebygomm
Influencer
3288 replies
6 years ago
June 7, 2019

Depends on the numbers involved and database set up, but it can often be quicker to write a temp table containing all the ids from the csv file, then an sqlexecutor to select all the records that don't appear in this temp table.

p.jeremie
Author
Contributor
53 replies
6 years ago
June 7, 2019

virtualcitymatt wrote:

It probably would be faster to have the list of ID's in a CSV file and then just use another CSV reader.

If instead you want to keep using the DB joiner you could try using a Prefetch Query in the DatabaseJoiner

'SELECT id from <table>'

This could speed up your current process but I think having them in a list would be much faster if this doesn't create too much overhead. I'm no database pro so others might have better ideas.

Thanks, I forgot about the Prefetch query. It doesn't seems to accelerate the process though...

I'm on something else right now but I'll try with the ID's in a CSV + FeatureJoiner.

p.jeremie
Author
Contributor
53 replies
6 years ago
June 7, 2019

ebygomm wrote:

@egomm : any hints on how to do this with FME (create the temp table and delete it after the SQLExecutor). I'm not sure how...

+39

ebygomm
Influencer
3288 replies
6 years ago
June 7, 2019

p.jeremie wrote:

@egomm : any hints on how to do this with FME (create the temp table and delete it after the SQLExecutor). I'm not sure how...

What flavour is your database?

p.jeremie
Author
Contributor
53 replies
6 years ago
June 7, 2019

ebygomm wrote:

What flavour is your database?

It's an Oracle DB.

p.jeremie
Author
Contributor
53 replies
6 years ago
June 17, 2019

virtualcitymatt wrote:

It probably would be faster to have the list of ID's in a CSV file and then just use another CSV reader.

If instead you want to keep using the DB joiner you could try using a Prefetch Query in the DatabaseJoiner

'SELECT id from <table>'

This could speed up your current process but I think having them in a list would be much faster if this doesn't create too much overhead. I'm no database pro so others might have better ideas.

I ended up staying with the solution DatabaseJoiner + prefetch query. It is already pretty fast in fact... thanks for your help.

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Filtering features from CSV that are present in database