Solved

Write index for csv to be used with DatabaseJoiner

5 years ago
12 March 2019
4 replies
4 views

nmatton
Contributor
16 replies

Hello !

I am joining some data with information stored in separate CSV files, using the DatabaseJoiner. Because the CSV files are quite big (> 7M entries), creating the index takes some time before the join actually starts.

However, as the input data are also big, I am processing the joins by tiles, which means that for each tile the DatabaseJoiner builds its own index again before joining, and so on...

Is there a parameter or some other way to write the created index somewhere, so that it only has to be read back when processing the other tiles ?

Many thanks !

Nicolas

Best answer by markatsafe 12 March 2019, 20:13

4 replies

arnovananrooij
Contributor
72 replies
5 years ago
12 March 2019

Hi @nmatton

Maybe it would be better for performance to add a CSV reader to read the complete CSV at once and use a FeatureMerger (or FeatureJoiner).

nmatton
Author
Contributor
16 replies
5 years ago
12 March 2019

I can't just read the complete CSV: more than 7 million records would completely fill up the memory... Additionally, it does not solve the issue, as I am tiling the process, meaning that I run several independent FME processes. It is this "index" result that I want to reuse across those different FME processes.
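To illustrate the idea of an index that is written once and reused by independent processes, here is a minimal sketch outside FME, using only Python's standard library. It assumes the CSV has a header row and that a SQLite file is an acceptable place to keep the lookup table; the names `build_lookup_db`, `lookup`, and the table name `lookup` are hypothetical. This is not how the DatabaseJoiner builds its own index, only one way to persist an indexed copy of the data on disk.

```python
import csv
import sqlite3


def build_lookup_db(csv_path, db_path, key_column):
    """Load the CSV into a SQLite file once and index the join key.

    Hypothetical helper: rows are streamed, so the whole CSV is never
    held in memory. Assumes simple column names in the header row.
    """
    con = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)
            columns = ", ".join(f'"{name}" TEXT' for name in header)
            placeholders = ", ".join("?" for _ in header)
            # Rebuild from scratch so repeated runs do not duplicate rows.
            con.execute("DROP TABLE IF EXISTS lookup")
            con.execute(f"CREATE TABLE lookup ({columns})")
            con.executemany(f"INSERT INTO lookup VALUES ({placeholders})", reader)
        # The index is stored inside the SQLite file and survives the process.
        con.execute(f'CREATE INDEX IF NOT EXISTS idx_key ON lookup ("{key_column}")')
        con.commit()
    finally:
        con.close()


def lookup(db_path, key_column, value):
    """Fetch matching rows; each tile process can call this independently."""
    con = sqlite3.connect(db_path)
    try:
        query = f'SELECT * FROM lookup WHERE "{key_column}" = ?'
        return con.execute(query, (value,)).fetchall()
    finally:
        con.close()
```

The build step would run once before the tiled processes start; each tile then only opens the existing file and queries it, so the index is not recreated per tile.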

markatsafe
1891 replies
5 years ago
12 March 2019
Best Answer

@nmatton I would agree with @arnovananrooij - I think reading in the CSV and using FeatureJoiner will give you better results. In newer versions of FME (2018 and higher) the CSV reader uses a Bulk Mode for reading, which is very fast. FeatureJoiner uses the same Bulk Mode technology. FeatureJoiner to some degree replaces the older FeatureMerger transformer. DatabaseJoiner makes a query for every input feature, so that is 7M queries back to the CSV (but it uses less memory resources!). The article Merging or Joining Spreadsheet or Database Data may help you decide which is the most suitable join transformer.

But I would try CSV reader + FeatureJoiner in FME 2018 or higher (64-bit).
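The trade-off described above (one query per input feature versus reading the lookup data once and joining in memory) can be sketched in plain Python. This is only a conceptual illustration, not FME's internals; the function names and the dict-based "features" are assumptions made for the example.

```python
def join_per_feature(features, query_lookup, key):
    """One lookup call per feature: low memory, but many round trips."""
    for feat in features:
        match = query_lookup(feat[key])  # e.g. a database query per feature
        yield {**feat, **(match or {})}


def join_bulk(features, lookup_rows, key):
    """Read the lookup rows once into a dict, then join in memory."""
    table = {row[key]: row for row in lookup_rows}  # costs memory up front
    for feat in features:
        yield {**feat, **table.get(feat[key], {})}
```

Both functions produce the same joined records; they differ in where the cost is paid, per lookup in the first case and in memory for the second.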

nmatton
Author
Contributor
16 replies
5 years ago
13 March 2019

Thanks for the answer ! I'll try this approach
