Question

CSV file

7 years ago
January 14, 2018
9 replies
34 views

frsisani
12 replies

I have a CSV file with more than 10,000,000 records. How can I split it into 5 smaller CSV files (each with roughly 2,000,000 records) so that I can run my processing on them afterwards?

danilo_fme
Evangelist
2059 replies
7 years ago
January 14, 2018

Hi @frsisani,

1 - After your Reader, use a StatisticsCalculator transformer to store the total feature count in an attribute named number.

2 - Use an ExpressionEvaluator transformer to calculate @Value(number)/5 and store the result in the attribute group.

3 - Use a Counter transformer to generate the attribute _count, then another ExpressionEvaluator to calculate the result:

int(@Value(_count)/@Value(group))

4 - Connect the output port of that ExpressionEvaluator to your CSV Writer and set CSV File Name = _result.

5 - In the Navigator, enable the Fanout option.

Attached the Workspace. - workspace-fanout-split.fmw
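For readers who want to see the arithmetic outside FME, here is a minimal Python sketch of the same count-then-group idea. It is not the attached workspace: the function names are made up for illustration, the counter is assumed to start at 0, and the min() guard is an addition so the last record never lands in a sixth file.

```python
import csv
import io


def count_rows(rows):
    """Pass 1: total number of records (the 'number' attribute in step 1)."""
    return sum(1 for _ in rows)


def group_of(count, total, n_files=5):
    """File index for a zero-based record counter.

    group_size corresponds to step 2 (number / 5); the int() division
    corresponds to step 3. The min() guard is an assumption added here.
    """
    group_size = total / n_files
    return min(int(count / group_size), n_files - 1)


def split_into_groups(source, outputs):
    """Read `source` twice: once to count, once to route each row."""
    total = count_rows(csv.reader(source))
    source.seek(0)  # rewind for the second pass
    writers = [csv.writer(out, lineterminator="\n") for out in outputs]
    for count, row in enumerate(csv.reader(source)):
        writers[group_of(count, total, len(writers))].writerow(row)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real 10-million-record file.
    source = io.StringIO("".join(f"{i},rec{i}\n" for i in range(10)))
    outputs = [io.StringIO() for _ in range(5)]
    split_into_groups(source, outputs)
    for n, out in enumerate(outputs):
        print(n, out.getvalue().splitlines())
```

Note that the whole input has to be read once before any row can be routed, which is the cost of needing the total up front.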

Thanks,

Danilo

mark2atsafe
Safer
2522 replies
7 years ago
January 15, 2018

Alternatively a Counter/AttributeRangeFilter combination might be useful.

I also note a Grouper transformer on the FME Hub, although I haven't tried it myself.

jdh
Contributor
1984 replies
7 years ago
January 15, 2018

If the order of the records is irrelevant, I would use a ModuloCounter (set to 5), and fanout based on the _modulo_count. That way the features aren't being kept in memory to determine the total number of features, as they are with the StatisticsCalculator.
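As a rough illustration of the round-robin idea outside FME, here is a minimal Python sketch. The function name and the in-memory buffers are hypothetical, and it assumes the file has no header row that needs repeating in each output.

```python
import csv
import io


def split_round_robin(rows, outputs):
    """Send row i to outputs[i % len(outputs)].

    Rows are streamed one at a time, so the total count is never needed.
    """
    writers = [csv.writer(out, lineterminator="\n") for out in outputs]
    for index, row in enumerate(rows):
        writers[index % len(writers)].writerow(row)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file and the five output files.
    source = io.StringIO("".join(f"{i},rec{i}\n" for i in range(12)))
    outputs = [io.StringIO() for _ in range(5)]
    split_round_robin(csv.reader(source), outputs)
    for n, out in enumerate(outputs):
        print(n, out.getvalue().splitlines())
```

The trade-off is the one mentioned above: each output file gets every fifth record, so the original record order is not preserved within a single file.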

david_r
8355 replies
7 years ago
January 16, 2018

danilo_fme wrote:

Hi @frsisani,

1 - After your Reader you can use the transformer StatisticsCalculator to have a count of features in Attribute = number.

I suspect using a blocking transformer on 10 million features isn't going to be super fast...

david_r
8355 replies
7 years ago
January 16, 2018

mark2atsafe wrote:

Alternatively a Counter/AttributeRangeFilter combination might be useful.

I also note a Grouper transformer on the FME Hub, although I haven't tried it myself.

The Grouper: Non-blocking and doesn't mess up the record order, I like it!

takashi
7723 replies
7 years ago
January 16, 2018

I would just use a feature type fanout expression like this.

output_@Evaluate(@int(@Count()/2000000))
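To show what that expression computes, here is a minimal Python sketch of the same fixed-chunk arithmetic. It assumes the counter starts at 0; the function names and the open_output callback are hypothetical and exist only for this example.

```python
import csv
import io


def chunk_index(count, chunk_size=2_000_000):
    """Integer part of count / chunk_size for a zero-based counter."""
    return count // chunk_size


def split_by_chunk(rows, open_output, chunk_size=2_000_000):
    """Stream rows into outputs named output_0, output_1, ... in order.

    `open_output(name)` must return a writable text file-like object.
    Returns the list of output names that were created.
    """
    writers = {}
    for count, row in enumerate(rows):
        name = f"output_{chunk_index(count, chunk_size)}"
        if name not in writers:
            writers[name] = csv.writer(open_output(name), lineterminator="\n")
        writers[name].writerow(row)
    return list(writers)


if __name__ == "__main__":
    buffers = {}

    def open_in_memory(name):
        buffers[name] = io.StringIO()
        return buffers[name]

    source = io.StringIO("".join(f"{i},rec{i}\n" for i in range(5)))
    print(split_by_chunk(csv.reader(source), open_in_memory, chunk_size=2))
```

Records stay in their original order and nothing is buffered, but the number of output files depends on the input size: anything beyond 10 million records spills into a sixth file.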

mark2atsafe
Safer
2522 replies
7 years ago
January 16, 2018

jdh wrote:

Of course. ModuloCounter is a great solution here.

jdh
Contributor
1984 replies
7 years ago
January 16, 2018

takashi wrote:

I would just use a feature type fanout expression like this.

output_@Evaluate(@int(@Count()/2000000))

Yeah, I had considered that, but it presupposes knowing the total number of records.

takashi
7723 replies
7 years ago
January 16, 2018

takashi wrote:

I would just use a feature type fanout expression like this.

output_@Evaluate(@int(@Count()/2000000))

If the number of files must be five even if the number of input records was more than 10 million, this expression is available.
