Question

I have a CSV with a million rows to be processed through a workspace. How can I reduce the run time?

  • 12 May 2023
  • 8 replies
  • 0 views

It is taking a considerable amount of time to run the workspace with a million-row CSV as a reader. Please suggest how to reduce the run time.


8 replies

Userlevel 6
Badge +32

Have you tried turning off feature caching? (Run > Enable Feature Caching)

Userlevel 4
Badge +25

It would really depend on what you're doing to the data. CSV reading would be lightning quick, even with a million rows. Please add some more details about the data transformation and writing.

To be more specific, most of the time is being spent in one of the AttributeManagers, where I read the date attribute's value from the +23rd feature.

I think I have tried turning feature caching off, but it did not give a significant time saving.

I have ticked "Enable Adjacent Feature Attributes" with 23 subsequent features.

Userlevel 1
Badge +11

With adjacent features enabled, the AttributeManager becomes a blocking transformer and bulk mode is lost. FYI this is a known issue - FMEENGINE-63455

I want to add a new attribute whose value comes from the date attribute of the feature 23 rows ahead (the +23rd feature). Is there any quicker way to do this?
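One possible quicker route is to do the look-ahead in a PythonCaller with a small rolling buffer, so only 24 features are held at a time instead of the whole stream blocking. A minimal sketch, assuming the date attribute is called date_attr and the new attribute date_plus_23 (both hypothetical names):

```python
# Minimal sketch of a look-ahead buffer in a PythonCaller.
# Assumed attribute names: "date_attr" (existing) and "date_plus_23" (new).
import fme
import fmeobjects
from collections import deque

LOOKAHEAD = 23  # read the date from the feature this many rows ahead

class FeatureProcessor(object):
    def __init__(self):
        self.buffer = deque()

    def input(self, feature):
        self.buffer.append(feature)
        # Once 24 features are buffered, the oldest one can see its +23rd.
        if len(self.buffer) > LOOKAHEAD:
            oldest = self.buffer.popleft()
            oldest.setAttribute('date_plus_23', feature.getAttribute('date_attr'))
            self.pyoutput(oldest)

    def close(self):
        # The last 23 features have no +23rd feature; emit them unchanged.
        while self.buffer:
            self.pyoutput(self.buffer.popleft())
```

Whether this is actually faster than the AttributeManager depends on the rest of the workspace, so it is worth benchmarking on a subset of the CSV first.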

Badge +5

Sounds like you have relationships between rows. I would then try writing the data into a relational database, because it would reduce the blocking, and then read the relational data back instead.

Before writing to the relational database, you should try to create a clear design for how to make the data clean and valid. Maybe showing ChatGPT a piece of your data structure would help.
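For illustration, a minimal sketch of that relational look-ahead (hypothetical table and column names; SQLite is used only as an example, and LEAD() needs SQLite 3.25 or newer). The window function does the adjacent-row work inside the database, so nothing has to block in the workspace:

```python
# Hypothetical sketch: let the database compute the +23rd date.
# Assumes a table "rows" with columns (id, date_attr); adjust to your schema.
import sqlite3

con = sqlite3.connect(":memory:")  # use a file path for real data
con.execute("CREATE TABLE rows (id INTEGER PRIMARY KEY, date_attr TEXT)")
con.executemany(
    "INSERT INTO rows (id, date_attr) VALUES (?, ?)",
    [(i, f"2023-05-{i % 28 + 1:02d}") for i in range(1, 101)],  # dummy rows
)

# LEAD(date_attr, 23) reads the value 23 rows ahead, ordered by the
# original CSV row order (id here); it is NULL for the last 23 rows.
query = """
SELECT id,
       date_attr,
       LEAD(date_attr, 23) OVER (ORDER BY id) AS date_plus_23
FROM rows
"""
for row in con.execute(query):
    print(row)
```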
