Hello!

I am joining some data with information stored in separate CSV files. For this join I use the DatabaseJoiner. Because the CSV files are quite big (> 7M entries), the index creation takes some time before the join actually starts.

However, as the input data are also big, I am processing the joins by tiles. This means that for each tile, the DatabaseJoiner creates its own index again before joining, and so on...

Is there a parameter, or some other way, to write this index to disk so that it only has to be read back when processing the other tiles?

Many thanks!

Nicolas

Hi @nmatton

Maybe it would be better for performance to add a CSV reader that reads the complete CSV at once, and then use a FeatureMerger (or FeatureJoiner).


I can't just read the complete CSV: more than 7 million records would completely fill up the memory. Additionally, it does not solve the issue, because I am tiling the process, meaning that I run several independent FME processes. It is this index that I want to reuse across those different FME processes.
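One generic way to get an index that survives across independent processes is to load the CSV once into a file-based database and index the join key there. The sketch below does this with Python's standard `sqlite3` and `csv` modules. It is not an FME feature, and the table name `lookup`, the index name, and the file and column names are hypothetical; it only illustrates the idea of building the index once and reusing it from each tile run.

```python
import csv
import os
import sqlite3
import tempfile


def build_lookup_db(csv_path, db_path, key_column):
    """Load the CSV into a SQLite table once and index the join key."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        con = sqlite3.connect(db_path)
        with con:  # commits the transaction on success
            con.execute(f"CREATE TABLE lookup ({columns})")
            # Rows are streamed from the reader, not held in memory at once.
            con.executemany(f"INSERT INTO lookup VALUES ({placeholders})", reader)
            con.execute(f'CREATE INDEX idx_lookup_key ON lookup ("{key_column}")')
        con.close()


def lookup(db_path, key_column, key_value):
    """Query the persisted index; any later process can call this."""
    con = sqlite3.connect(db_path)
    try:
        sql = f'SELECT * FROM lookup WHERE "{key_column}" = ?'
        return con.execute(sql, (key_value,)).fetchall()
    finally:
        con.close()


def demo():
    """Build the database from a tiny hypothetical CSV, then query it."""
    workdir = tempfile.mkdtemp()
    csv_path = os.path.join(workdir, "attributes.csv")
    db_path = os.path.join(workdir, "attributes.sqlite")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["id", "name"], ["1", "alpha"], ["2", "beta"]])
    build_lookup_db(csv_path, db_path, "id")
    return lookup(db_path, "id", "2")
```

The build step runs once; each tile process then only opens the existing database file, so the index is not recreated per tile. Whether a given join transformer can be pointed at such a file instead of the CSV is an assumption to check against its documentation.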


@nmatton I would agree with @arnovananrooij: I think reading in the CSV and using FeatureJoiner will give you better results. In newer versions of FME (2018 and higher) the CSV reader uses a Bulk Mode for reading, which is very fast. FeatureJoiner uses the same Bulk Mode technology, and to some degree it replaces the older FeatureMerger transformer. DatabaseJoiner makes a query for every input feature, so that would be 7M queries back to the CSV (but it uses fewer memory resources!). The article Merging or Joining Spreadsheet or Database Data may help you decide which is the most suitable join transformer.

But I would try the CSV reader + FeatureJoiner in FME 2018 or higher (64-bit).
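The trade-off described above, one query per input feature versus loading everything and joining in bulk, can be sketched in plain Python. This is only an illustration of the two strategies, not how the FME transformers are implemented; the function names, the `query` callable, and the sample data are all hypothetical.

```python
def per_feature_join(features, query, key):
    """One lookup call per feature: low memory, many round trips."""
    joined = []
    for feature in features:
        match = query(feature[key])  # hypothetical lookup callable
        if match is not None:
            joined.append({**feature, **match})
    return joined


def bulk_join(features, records, key):
    """Load all records into a dict once, then join in memory."""
    table = {record[key]: record for record in records}
    return [{**f, **table[f[key]]} for f in features if f[key] in table]


# Hypothetical sample data: two CSV records and two input features.
records = [{"id": "1", "name": "alpha"}, {"id": "2", "name": "beta"}]
features = [{"id": "2", "tile": "A"}, {"id": "3", "tile": "A"}]


def find_record(value):
    """Stand-in for a per-feature query against the external source."""
    return next((r for r in records if r["id"] == value), None)
```

Both strategies return the same joined features. The bulk version pays the memory cost of holding every record, while the per-feature version pays one lookup per input feature.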


Thanks for the answer! I'll try this approach.

