Solved

Write index for csv to be used with DatabaseJoiner


nmatton
Contributor

Hello !

 

I am joining some data with information stored in separate CSV files, using the DatabaseJoiner. Because the CSV files are quite big (> 7M entries), creating the index takes some time before the join actually starts.

 

However, as the input data are also big, I process the joins by tiles. This means that for each tile, the DatabaseJoiner creates its own index again before joining, and so on...

 

Is there a parameter, or some other way, to write the created index somewhere so that it only has to be read back when processing the other tiles?

 

Many thanks !

 

Nicolas

 

Best answer by markatsafe

@nmatton I would agree with @arnovananrooij: I think reading in the CSV and using FeatureJoiner will give you better results. In newer versions of FME (2018 and higher) the CSV reader uses a Bulk Mode for reading, which is very fast, and FeatureJoiner uses the same Bulk Mode technology. FeatureJoiner to some degree replaces the older FeatureMerger transformer. DatabaseJoiner makes a query for every input feature, so that's 7M queries back to the CSV (but it uses less memory). The article "Merging or Joining Spreadsheet or Database Data" may help you decide which join transformer is most suitable.

But I would try CSV reader + FeatureJoiner in FME 2018 or higher (64-bit).
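
To make the trade-off in this answer concrete, here is a minimal Python sketch of the two join strategies. It is only an illustration of the general idea (one lookup per feature versus one bulk read followed by an in-memory join) and is not how FME implements DatabaseJoiner or FeatureJoiner. The sample data, column names and function names are all hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the large CSV lookup file (the real one has > 7M rows).
LOOKUP_CSV = "id,name\n1,alpha\n2,beta\n3,gamma\n"
# Hypothetical input features that need to be enriched with a "name".
FEATURES = [{"id": "2"}, {"id": "3"}, {"id": "9"}]


def join_per_feature(features, csv_text):
    """One lookup per input feature: little memory, but the source is consulted every time."""
    joined = []
    for feature in features:
        match = None
        # This toy version rescans the source for each feature;
        # a real database would answer the query through its index instead.
        for row in csv.DictReader(io.StringIO(csv_text)):
            if row["id"] == feature["id"]:
                match = row
                break
        joined.append({**feature, "name": match["name"] if match else None})
    return joined


def join_bulk(features, csv_text):
    """Read the source once into a dict, then join every feature in memory."""
    by_id = {row["id"]: row for row in csv.DictReader(io.StringIO(csv_text))}
    joined = []
    for feature in features:
        match = by_id.get(feature["id"])
        joined.append({**feature, "name": match["name"] if match else None})
    return joined


if __name__ == "__main__":
    print(join_per_feature(FEATURES, LOOKUP_CSV))
    print(join_bulk(FEATURES, LOOKUP_CSV))
```

Both functions return the same joined records. The bulk version trades memory (the whole lookup table is held in a dict) for a single pass over the source, which is the same trade-off the answer describes.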


4 replies

arnovananrooij
Contributor

Hi @nmatton

Maybe it would be better for performance to add a CSV reader that reads the complete CSV at once, and then use a FeatureMerger (or FeatureJoiner).


nmatton
Contributor
  • Author
  • Contributor
  • March 12, 2019

I can't just read the complete CSV: more than 7 million records would completely fill up the memory. Additionally, it does not solve the issue, as I am tiling the process, meaning that I run several independent FME processes. It is this "index" result that I want to reuse across those different FME processes.
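
One way to picture an index that is written once and reused by several independent processes is an on-disk database. The sketch below uses Python's built-in sqlite3 module purely as an illustration. Nobody in this thread proposes SQLite, and whether DatabaseJoiner would perform well against such a file is an untested assumption. The table name `lookup`, the two-column layout (`id`, `name`) and the helper names are all hypothetical.

```python
import csv
import os
import sqlite3
import tempfile


def build_index(csv_path, db_path):
    """Run once: load a two-column CSV into SQLite and index the join key."""
    conn = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            conn.execute("CREATE TABLE lookup (id TEXT, name TEXT)")
            conn.executemany("INSERT INTO lookup VALUES (?, ?)", reader)
        # The index is stored in the database file, so it survives across processes.
        conn.execute("CREATE INDEX idx_lookup_id ON lookup (id)")
        conn.commit()
    finally:
        conn.close()


def lookup(db_path, key):
    """Run per tile/process: open the existing file and query it; nothing is rebuilt."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT name FROM lookup WHERE id = ?", (key,)).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


def demo():
    """Build the index in a temporary folder, then query it as a separate tile would."""
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "lookup.csv")
        db_path = os.path.join(tmp, "lookup.sqlite")
        with open(csv_path, "w", newline="") as f:
            f.write("id,name\n1,alpha\n2,beta\n")
        build_index(csv_path, db_path)
        return lookup(db_path, "2"), lookup(db_path, "9")


if __name__ == "__main__":
    print(demo())
```

The design point is that `build_index` pays the indexing cost a single time, while each independent process only calls `lookup` against the existing file and keeps very little in memory.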


markatsafe
  • Best Answer
  • March 12, 2019

@nmatton I would agree with @arnovananrooij: I think reading in the CSV and using FeatureJoiner will give you better results. In newer versions of FME (2018 and higher) the CSV reader uses a Bulk Mode for reading, which is very fast, and FeatureJoiner uses the same Bulk Mode technology. FeatureJoiner to some degree replaces the older FeatureMerger transformer. DatabaseJoiner makes a query for every input feature, so that's 7M queries back to the CSV (but it uses less memory). The article "Merging or Joining Spreadsheet or Database Data" may help you decide which join transformer is most suitable.

But I would try CSV reader + FeatureJoiner in FME 2018 or higher (64-bit).


nmatton
Contributor
  • Author
  • Contributor
  • March 13, 2019

Thanks for the answer ! I'll try this approach

