Question

CSV file

  • 14 January 2018
  • 9 replies
  • 20 views

Badge

I have a CSV file with more than 10,000,000 records. How can I split it into 5 smaller CSV files (each with roughly 2,000,000 records) so that I can perform my actions on them afterwards?


9 replies

Userlevel 4
Badge +30

Hi @frsisani,

1 - After your Reader you can use a StatisticsCalculator to get the total number of features into an attribute named number.

2 - Use an ExpressionEvaluator to calculate @Value(number)/5 and store the result in an attribute named group (the number of records per output file).

3 - Use a Counter to generate the attribute _count, then another ExpressionEvaluator to calculate the attribute _result:

int(@Value(_count)/@Value(group))

4 - Connect the ExpressionEvaluator output port to your CSV writer feature type and set CSV File Name = _result.

5 - In the Navigator, enable the Fanout option so that each distinct _result value is written to its own file.

Attached is the workspace: workspace-fanout-split.fmw (a rough Python sketch of the same logic is below).
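For anyone who prefers to sanity-check the logic outside Workbench, here is a minimal Python sketch of the same count-and-divide idea. It is not part of the attached workspace, and the file names input.csv and part_N.csv are placeholders:

import csv

# Count the records once (excluding the header), derive a group size of
# roughly one fifth, then route each record to a part file based on its
# running count - the same role the Counter/ExpressionEvaluator play above.
with open("input.csv", newline="") as f:
    total = sum(1 for _ in f) - 1

group = max(1, total // 5)

with open("input.csv", newline="") as src:
    reader = csv.reader(src)
    header = next(reader)
    outputs = {}
    for count, row in enumerate(reader):
        part = min(count // group, 4)   # clamp so any remainder lands in the last file
        if part not in outputs:
            handle = open("part_%d.csv" % part, "w", newline="")
            writer = csv.writer(handle)
            writer.writerow(header)
            outputs[part] = (handle, writer)
        outputs[part][1].writerow(row)
    for handle, _ in outputs.values():
        handle.close()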

Thanks,

Danilo

Userlevel 4
Badge +25

Alternatively, a Counter/AttributeRangeFilter combination might be useful; see the ranges sketched below.
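For a known 10,000,000-record input, the setup could look roughly like this (_count is the Counter's default output attribute; the ranges are illustrative, not a confirmed configuration):

_count 0 - 1999999        ->  file 1
_count 2000000 - 3999999  ->  file 2
_count 4000000 - 5999999  ->  file 3
_count 6000000 - 7999999  ->  file 4
_count 8000000 - 9999999  ->  file 5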

I also note a Grouper transformer on the FME Hub, although I haven't tried it myself.

Badge +22

If the order of the records is irrelevant, I would use a ModuloCounter (set to 5) and fan out based on _modulo_count. That way the features aren't held in memory just to determine the total number of features, as they are with the StatisticsCalculator approach.
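For example, assuming the writer's CSV File Name or the fanout expression is pointed at that attribute, something like output_@Value(_modulo_count) would produce output_0 through output_4, with records distributed round-robin.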

Userlevel 4

(Quoting Danilo's StatisticsCalculator workflow above.)

I suspect using a blocking transformer on 10 million features isn't going to be super fast...
Userlevel 4

(Quoting the Counter/AttributeRangeFilter and Grouper suggestion above.)

The Grouper: Non-blocking and doesn't mess up the record order, I like it!
Userlevel 2
Badge +17

I would just use a feature type fanout expression like this.

output_@Evaluate(@int(@Count()/2000000))

(screenshot attached)
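Assuming the counter behind @Count() starts at 0 (the default), @int(@Count()/2000000) evaluates to 0 for the first 2,000,000 records, 1 for the next 2,000,000, and so on, so 10,000,000 records fan out to output_0 through output_4.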

Userlevel 4
Badge +25

(Quoting the ModuloCounter suggestion above.)

Of course. ModuloCounter is a great solution here.

 

 

Badge +22

(Quoting the feature type fanout expression above.)

Yeah, I had considered that, but it presupposes that the total number of records is known.
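(For example, if the input turned out to have 12,000,000 records, @int(@Count()/2000000) would yield indices 0 through 5 and produce six files rather than five.)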

 

 

Userlevel 2
Badge +17

(Quoting the feature type fanout expression above.)

If the number of files must be exactly five even when the input has more than 10 million records, the expression in the screenshot below can be used instead.

 

(screenshot attached)
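The screenshot isn't reproduced here, but purely as an illustration (not necessarily the expression shown), a modulo-based fanout such as output_@Evaluate(@Count()%5) would always cap the output at five files, cycling records through output_0 to output_4 much like the ModuloCounter suggestion above.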

 
