Question

CSV file



I have a CSV file with more than 10,000,000 records. How can I split it into 5 smaller CSV files (each with roughly 2,000,000 records) so that I can perform my actions on them afterwards?

9 replies

danilo_fme
Evangelist
  • January 14, 2018

Hi @frsisani,

1 - After your Reader, use a StatisticsCalculator transformer to store the total feature count in an attribute, number.

2 - Use an ExpressionEvaluator transformer to calculate @Value(number)/5 and store the result in the attribute group.

3 - Use a Counter transformer to generate the attribute _count, then another ExpressionEvaluator to compute the attribute _result:

@int(@Value(_count)/@Value(group))

4 - Connect the ExpressionEvaluator output port to your Writer and set CSV File Name = _result.

5 - In the Navigator, set the Fanout option.

Workspace attached: workspace-fanout-split.fmw
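
The arithmetic behind these steps can be sketched in Python, purely as an illustration and not as FME code. The function name `group_index` and the 0-based record count are assumptions made for this sketch, and integer arithmetic stands in for the division so that floating-point rounding cannot move a record into the wrong group.

```python
def group_index(count, total, parts=5):
    """Return the 0-based output file index for record number `count`.

    `count` is assumed to start at 0; `total` is the record count that
    has to be known before any record can be assigned to a file.
    """
    # Same as int(count / (total / parts)), computed with integers only.
    return count * parts // total


total = 10  # stands in for the ~10,000,000 records in the question
indices = [group_index(i, total) for i in range(total)]
print(indices)
```

Because `total` must be known first, the whole input has to be read once before any record can be written out.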

Thanks,

Danilo


mark2atsafe
Safer
  • January 15, 2018

Alternatively a Counter/AttributeRangeFilter combination might be useful.
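
As a rough Python illustration of the counter-plus-range idea (not FME code), each record's running count can be looked up against a list of range boundaries. The boundary values and the name `range_index` are assumptions chosen to match the 2,000,000-record files asked for in the question.

```python
from bisect import bisect_right

# Exclusive upper bounds of the first four ranges (assumed values);
# any count at or above the last bound falls into the fifth range.
BOUNDARIES = [2_000_000, 4_000_000, 6_000_000, 8_000_000]


def range_index(count):
    """Return the 0-based range that a 0-based record count falls into."""
    return bisect_right(BOUNDARIES, count)


print([range_index(c) for c in (0, 1_999_999, 2_000_000, 9_999_999)])
```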

I also note a Grouper transformer on the FME Hub, although I haven't tried it myself.


jdh
Contributor
  • January 15, 2018

If the order of the records is irrelevant, I would use a ModuloCounter (set to 5), and fanout based on the _modulo_count. That way the features aren't being kept in memory to determine the total number of features (StatisticsCalculator).
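
The modulo approach can be sketched in Python as a round-robin distribution, offered as an illustration rather than FME code. The helper name `split_round_robin` is hypothetical, and in-memory buffers stand in for the five output files; header rows are not handled here.

```python
import csv
import io


def split_round_robin(rows, writers):
    """Send row i to writers[i % len(writers)]; no rows are buffered."""
    n = len(writers)
    for i, row in enumerate(rows):
        writers[i % n].writerow(row)


# Demo: five in-memory buffers stand in for five CSV files.
buffers = [io.StringIO() for _ in range(5)]
writers = [csv.writer(b, lineterminator="\n") for b in buffers]
split_round_robin(([i] for i in range(12)), writers)
print([b.getvalue().split() for b in buffers])
```

Each record is written as soon as it is read, so the total count is never needed, but consecutive records end up in different files.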


david_r
Celebrity
  • January 16, 2018
danilo_fme wrote:

Hi @frsisani,

1 - After your Reader, use a StatisticsCalculator transformer to store the total feature count in an attribute, number.

2 - Use an ExpressionEvaluator transformer to calculate @Value(number)/5 and store the result in the attribute group.

3 - Use a Counter transformer to generate the attribute _count, then another ExpressionEvaluator to compute the attribute _result:

@int(@Value(_count)/@Value(group))

4 - Connect the ExpressionEvaluator output port to your Writer and set CSV File Name = _result.

5 - In the Navigator, set the Fanout option.

Workspace attached: workspace-fanout-split.fmw

Thanks,

Danilo

I suspect using a blocking transformer on 10 million features isn't going to be super fast...

david_r
Celebrity
  • January 16, 2018
mark2atsafe wrote:

Alternatively a Counter/AttributeRangeFilter combination might be useful.

I also note a Grouper transformer on the FME Hub, although I haven't tried it myself.

The Grouper: Non-blocking and doesn't mess up the record order, I like it!

takashi
Evangelist
  • January 16, 2018

I would just use a feature type fanout expression like this.

output_@Evaluate(@int(@Count()/2000000))
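
For comparison, the same fixed-size chunking can be sketched outside FME in Python. The function name `split_by_chunk` and the `output_N.csv` file names are assumptions modeled on the expression above, and the sketch assumes the input has a header row that should be repeated in every output file.

```python
import csv
import itertools
import os
import tempfile


def split_by_chunk(src_path, out_dir, chunk_size=2_000_000):
    """Stream src_path into out_dir/output_0.csv, output_1.csv, ...

    Returns the list of files written. Only one row is held in memory
    at a time.
    """
    paths = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes a header row is present
        for index in itertools.count():
            chunk = itertools.islice(reader, chunk_size)
            first = next(chunk, None)
            if first is None:  # input exhausted, no empty file is created
                break
            path = os.path.join(out_dir, f"output_{index}.csv")
            with open(path, "w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)
                writer.writerow(first)
                writer.writerows(chunk)  # remaining rows of this chunk
            paths.append(path)
    return paths


# Small demo: 5 data rows with chunk_size=2 give three output files.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv")
    with open(src, "w", newline="") as f:
        csv.writer(f).writerows([["id"]] + [[i] for i in range(5)])
    print([os.path.basename(p) for p in split_by_chunk(src, tmp, chunk_size=2)])
```

The number of output files depends on the input size: with a fixed chunk of 2,000,000, an input of more than 10 million records produces a sixth file.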



mark2atsafe
Safer
  • January 16, 2018
jdh wrote:

If the order of the records is irrelevant, I would use a ModuloCounter (set to 5), and fanout based on the _modulo_count. That way the features aren't being kept in memory to determine the total number of features (StatisticsCalculator).

Of course. ModuloCounter is a great solution here.

 

 


jdh
Contributor
  • January 16, 2018
takashi wrote:

I would just use a feature type fanout expression like this.

output_@Evaluate(@int(@Count()/2000000))


Yeah, I had considered that, but it presupposes that you know the total number of records.

 

 


takashi
Evangelist
  • January 16, 2018
takashi wrote:

I would just use a feature type fanout expression like this.

output_@Evaluate(@int(@Count()/2000000))


If the number of files must be five even when the input contains more than 10 million records, this expression can be used instead.

 

0684Q00000ArMPqQAN.png

 

