Solved

Write index for csv to be used with DatabaseJoiner

6 years ago
March 12, 2019
4 replies
9 views

+4

nmatton
Contributor
16 replies

Hello !

I am joining some data with information stored in separate CSV files, using the DatabaseJoiner. Because the CSV files are quite large (> 7M entries), index creation takes some time before the join actually starts.

However, since the input data is also large, I process the joins tile by tile. This means that for each tile, the DatabaseJoiner builds its own index again before joining, and so on...

Is there a parameter, or some other way, to write this index somewhere so that it only has to be read back when processing the other tiles?

Many thanks !

Nicolas


+5

arnovananrooij
Contributor
73 replies
6 years ago
March 12, 2019

Hi @nmatton

It may be better for performance to add a CSV reader that reads the complete CSV at once, and then use a FeatureMerger (or FeatureJoiner).

+4

nmatton
Author
Contributor
16 replies
6 years ago
March 12, 2019

I can't just read the complete CSV: more than 7 million records would completely fill up the memory. Additionally, it does not solve the issue, because I am tiling the process, meaning that I run several independent FME processes. It is this "index" result that I want to reuse across those different FME processes.
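To make the request concrete, here is a minimal sketch of what a persisted, reusable index could look like outside of FME. It is not a DatabaseJoiner feature and nothing in this thread confirms FME offers one: it assumes the CSV has a header row and is loaded once into a SQLite file, which independent processes can then query. The names `build_index`, `lookup`, and the table name `lookup` are hypothetical.

```python
import csv
import sqlite3


def build_index(csv_path, db_path, key_column):
    """Load the CSV into a SQLite file once and index the join key.

    Assumption: the first CSV row is a header and every row has the
    same number of fields. All values are stored as text.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        marks = ", ".join("?" for _ in header)
        con = sqlite3.connect(db_path)
        try:
            with con:  # one transaction for the whole load
                con.execute("DROP TABLE IF EXISTS lookup")
                con.execute(f"CREATE TABLE lookup ({columns})")
                # Rows are streamed from the reader, not held in memory.
                con.executemany(f"INSERT INTO lookup VALUES ({marks})", reader)
                con.execute(f'CREATE INDEX idx_key ON lookup ("{key_column}")')
        finally:
            con.close()


def lookup(db_path, key_column, key_value):
    """Return all rows matching key_value, using the on-disk index."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row
    try:
        rows = con.execute(
            f'SELECT * FROM lookup WHERE "{key_column}" = ?', (key_value,)
        ).fetchall()
        return [dict(row) for row in rows]
    finally:
        con.close()
```

In this sketch `build_index` would run once before the tiled runs, and each tile's process would only call `lookup` (or point a database-backed join at the resulting file), so the index is never rebuilt per tile.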

+2

markatsafe
1891 replies
Best Answer
6 years ago
March 12, 2019

@nmatton I would agree with @arnovananrooij - I think reading in the CSV and using FeatureJoiner will give you better results. In newer versions of FME (2018 and higher) the CSV reader uses a Bulk Mode for reading, which is very fast. FeatureJoiner uses the same Bulk Mode technology, and to some degree replaces the older FeatureMerger transformer. DatabaseJoiner makes a query for every input feature, so that is 7M queries back to the CSV (but it uses less memory!). The article "Merging or Joining Spreadsheet or Database Data" may help you decide which is the most suitable join transformer.

But I would try CSV reader + FeatureJoiner in FME 2018 or higher (64-bit).
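The trade-off described above can be sketched in plain Python. This is only an illustration of the two join strategies, not FME's actual implementation: `join_per_feature` mimics a transformer that issues one query per input feature, while `join_bulk` reads the lookup table once into an in-memory hash and probes it. Both function names and the `query` callback are hypothetical.

```python
from collections import defaultdict


def join_per_feature(features, query, key):
    """One lookup call per input feature: low memory, many queries.

    `query` is a hypothetical callback that returns the matching
    lookup rows for a single key value.
    """
    joined = []
    for feature in features:
        for match in query(feature[key]):
            joined.append({**match, **feature})
    return joined


def join_bulk(features, table_rows, key):
    """Read the lookup table once, hash it by key, then probe it.

    Faster per feature, but the whole table is held in memory.
    """
    by_key = defaultdict(list)
    for row in table_rows:
        by_key[row[key]].append(row)
    return [
        {**row, **feature}
        for feature in features
        for row in by_key.get(feature[key], [])
    ]
```

Both functions perform an inner join and return the same rows; they differ only in where the cost lands, which is the memory versus query-count trade-off mentioned in the answer.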

+4

nmatton
Author
Contributor
16 replies
6 years ago
March 13, 2019

Thanks for the answer ! I'll try this approach
