Question

CSV file

  • 14 January 2018
  • 9 replies
  • 20 views

Badge

I have a CSV file with more than 10,000,000 records. How can I split it into 5 smaller CSV files (each with roughly 2,000,000 records) so that I can perform my actions on them afterwards?


9 replies

Userlevel 4
Badge +30

Hi @frsisani,

1 - After your Reader you can use a StatisticsCalculator to get the total number of features into an attribute named number.

2 - Use an ExpressionEvaluator to calculate @Value(number)/5 and store the result in an attribute named group (the number of records per output file).

3 - Use a Counter to generate the attribute _count, then another ExpressionEvaluator to calculate the attribute _result:

int(@Value(_count)/@Value(group))

4 - Connect the ExpressionEvaluator output port to your CSV writer feature type and set CSV File Name = _result.

5 - In the Navigator, enable the Fanout option so that each distinct _result value is written to its own file.

Attached is the workspace: workspace-fanout-split.fmw (a rough Python sketch of the same logic is below).
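For anyone who prefers to sanity-check the logic outside Workbench, here is a minimal Python sketch of the same count-and-divide idea. It is not part of the attached workspace, and the file names input.csv and part_N.csv are placeholders:

import csv

# Count the records once (excluding the header), derive a group size of
# roughly one fifth, then route each record to a part file based on its
# running count - the same role the Counter/ExpressionEvaluator play above.
with open("input.csv", newline="") as f:
    total = sum(1 for _ in f) - 1

group = max(1, total // 5)

with open("input.csv", newline="") as src:
    reader = csv.reader(src)
    header = next(reader)
    outputs = {}
    for count, row in enumerate(reader):
        part = min(count // group, 4)   # clamp so any remainder lands in the last file
        if part not in outputs:
            handle = open("part_%d.csv" % part, "w", newline="")
            writer = csv.writer(handle)
            writer.writerow(header)
            outputs[part] = (handle, writer)
        outputs[part][1].writerow(row)
    for handle, _ in outputs.values():
        handle.close()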

Thanks,

Danilo

Userlevel 4
Badge +25

Alternatively, a Counter/AttributeRangeFilter combination might be useful; see the ranges sketched below.
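For a known 10,000,000-record input, the setup could look roughly like this (_count is the Counter's default output attribute; the ranges are illustrative, not a confirmed configuration):

_count 0 - 1999999        ->  file 1
_count 2000000 - 3999999  ->  file 2
_count 4000000 - 5999999  ->  file 3
_count 6000000 - 7999999  ->  file 4
_count 8000000 - 9999999  ->  file 5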

I also note a Grouper transformer on the FME Hub, although I haven't tried it myself.

Badge +22

If the order of the records is irrelevant, I would use a ModuloCounter (set to 5) and fan out based on _modulo_count. That way the features aren't held in memory just to determine the total number of features, as they are with the StatisticsCalculator approach.
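For example, assuming the writer's CSV File Name or the fanout expression is pointed at that attribute, something like output_@Value(_modulo_count) would produce output_0 through output_4, with records distributed round-robin.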

Userlevel 4

(Quoting Danilo's StatisticsCalculator workflow above.)

I suspect using a blocking transformer on 10 million features isn't going to be super fast...
Userlevel 4

(Quoting the Counter/AttributeRangeFilter and Grouper suggestion above.)

The Grouper: Non-blocking and doesn't mess up the record order, I like it!
Userlevel 2
Badge +17

I would just use a feature type fanout expression like this.

output_@Evaluate(@int(@Count()/2000000))

(screenshot attached)
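Assuming the counter behind @Count() starts at 0 (the default), @int(@Count()/2000000) evaluates to 0 for the first 2,000,000 records, 1 for the next 2,000,000, and so on, so 10,000,000 records fan out to output_0 through output_4.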

Userlevel 4
Badge +25

(Quoting the ModuloCounter suggestion above.)

Of course. ModuloCounter is a great solution here.

 

 

Badge +22

(Quoting the feature type fanout expression above.)

Yeah, I had considered that, but it presupposes that the total number of records is known.
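(For example, if the input turned out to have 12,000,000 records, @int(@Count()/2000000) would yield indices 0 through 5 and produce six files rather than five.)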

 

 

Userlevel 2
Badge +17

(Quoting the feature type fanout expression above.)

If the number of files must be exactly five even when the input has more than 10 million records, the expression in the screenshot below can be used instead.

 

(screenshot attached)
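The screenshot isn't reproduced here, but purely as an illustration (not necessarily the expression shown), a modulo-based fanout such as output_@Evaluate(@Count()%5) would always cap the output at five files, cycling records through output_0 to output_4 much like the ModuloCounter suggestion above.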

 
