Question

CSV file

  • January 14, 2018
  • 9 replies
  • 51 views

I have a CSV file with more than 10,000,000 records. How can I split it into 5 smaller CSV files (each with roughly 2,000,000 records) so that I can run my processing on them afterwards?

danilo_fme
Celebrity
  • January 14, 2018

Hi @frsisani,

1 - After your reader, use the StatisticsCalculator transformer to store the total feature count in an attribute named number.

2 - Use an ExpressionEvaluator to calculate @Value(number)/5 and store the result in the attribute group.

3 - Use a Counter transformer to generate the attribute _count, then a second ExpressionEvaluator to compute the result:

int(@Value(_count)/@Value(group))

4 - Connect the output port of that ExpressionEvaluator to your writer and set the CSV File Name to the attribute _result.

5 - In the Navigator, enable the Fanout option on the writer.

The workspace is attached: workspace-fanout-split.fmw

Thanks,

Danilo
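
For readers who want to see the arithmetic outside FME, here is a minimal Python sketch of the same idea: learn the total first, then assign each record a group from its running count. The function name `split_by_total` is hypothetical, the count is assumed to start at 0, and the integer expression `count * parts // total` is used as an equivalent of dividing the count by total/5 and truncating.

```python
def split_by_total(records, parts=5):
    """Split records into `parts` consecutive groups of near-equal size."""
    # Blocking step: every record is held in memory so the total is known,
    # which is the role the StatisticsCalculator plays in the workflow above.
    records = list(records)
    total = len(records)
    groups = [[] for _ in range(parts)]
    for count, record in enumerate(records):  # count starts at 0 (assumption)
        # Same as int(count / (total / parts)), but in integer arithmetic.
        groups[count * parts // total].append(record)
    return groups
```

Because the whole input is materialised before anything is written, memory use grows with the number of records.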


mark2atsafe
Safer
  • January 15, 2018

Alternatively a Counter/AttributeRangeFilter combination might be useful.

I also note a Grouper transformer on the FME Hub, although I haven't tried it myself.
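
As a rough illustration of the Counter plus range-filter idea, the sketch below routes a running count to one of five outputs through a table of range boundaries. The boundary values and the name `route` are assumptions chosen to match the 2,000,000-record target in the question; they are not taken from an actual workspace.

```python
from bisect import bisect_right

# Hypothetical range table: exclusive upper bounds of the first four ranges.
# The fifth range is open-ended, so any extra records land in the last file.
BOUNDS = [2_000_000, 4_000_000, 6_000_000, 8_000_000]

def route(count, bounds=BOUNDS):
    """Return the 0-based output index for a record with running count `count`."""
    # bisect_right counts how many boundaries are <= count.
    return bisect_right(bounds, count)
```

With an open-ended last range the number of outputs stays at five even if the input is larger than expected, at the cost of an oversized final file.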


jdh
Contributor
  • January 15, 2018

If the order of the records is irrelevant, I would use a ModuloCounter (set to 5), and fanout based on the _modulo_count. That way the features aren't being kept in memory to determine the total number of features (StatisticsCalculator).
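
The round-robin idea can be sketched in Python as follows. This is only an illustration of the modulo technique, not FME code: the name `split_round_robin` is hypothetical, and the first row is assumed to be a header that every output should repeat.

```python
import csv

def split_round_robin(src, outs):
    """Copy CSV rows from `src` to the files in `outs`, one row each in turn."""
    writers = [csv.writer(out) for out in outs]
    reader = csv.reader(src)
    header = next(reader, None)  # assumes the first row is a header
    if header is None:
        return
    for writer in writers:
        writer.writerow(header)
    for count, row in enumerate(reader):
        # count % len(writers) plays the role of _modulo_count.
        writers[count % len(writers)].writerow(row)
```

Rows are written as they are read, so nothing has to be buffered to learn the total. When passing real files, open them with `newline=""` as the `csv` module recommends.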


david_r
Celebrity
  • January 16, 2018

In reply to danilo_fme's StatisticsCalculator workflow:

I suspect using a blocking transformer on 10 million features isn't going to be super fast...

david_r
Celebrity
  • January 16, 2018

In reply to mark2atsafe's mention of the Grouper transformer:

The Grouper: Non-blocking and doesn't mess up the record order, I like it!

takashi
Celebrity
  • January 16, 2018

I would just use a feature type fanout expression like this.

output_@Evaluate(@int(@Count()/2000000))

0684Q00000ArLNTQA3.png
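
To make the arithmetic in that expression concrete, here is a small Python sketch of the same fixed-chunk-size naming. It assumes the counter starts at 0; the helper names `fanout_name` and `fanout` are hypothetical.

```python
from itertools import groupby

def fanout_name(count, chunk_size=2_000_000, prefix="output_"):
    """Name of the output that receives the record with running count `count`."""
    return f"{prefix}{count // chunk_size}"

def fanout(records, chunk_size=2_000_000):
    """Yield (output name, records) pairs, one pair per consecutive chunk."""
    numbered = enumerate(records)  # count starts at 0 (assumption)
    for _, group in groupby(numbered, key=lambda p: fanout_name(p[0], chunk_size)):
        first_count, first_record = next(group)
        rest = (record for _, record in group)
        yield fanout_name(first_count, chunk_size), [first_record, *rest]
```

Record order is preserved and no total is needed up front, but the number of outputs depends on the input size: with the default chunk size, record number 10,000,000 would start a sixth output.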


mark2atsafe
Safer
  • January 16, 2018

In reply to jdh's ModuloCounter suggestion:

Of course. ModuloCounter is a great solution here.

 

 


jdh
Contributor
  • January 16, 2018

In reply to takashi's feature type fanout expression:

Yeah, I had considered that, but it presupposes knowing the total number of records.

 

 


takashi
Celebrity
  • January 16, 2018

Following up on my feature type fanout expression:

If the number of files must be five even when there are more than 10 million input records, this expression can be used instead:

 

0684Q00000ArMPqQAN.png