Question

I have a CSV with a million rows to be processed through a workspace. How can I reduce the run time?

  • 12 May 2023
  • 8 replies
  • 0 views

It is taking a considerable amount of time to run the workspace with a million-row CSV as a reader. Please suggest how to reduce the run time.


8 replies

Userlevel 6
Badge +32

Have you tried turning off feature caching? (Run > Enable Feature Caching)

Userlevel 4
Badge +25

It would really depend on what you're doing to the data. CSV reading would be lightning quick, even with a million rows. Please add some more details about the data transformation and writing.

To be more specific, most of the time is being spent in one of the AttributeManagers, where I read the date attribute's value from the +23rd feature.

I think I have tried turning feature caching off, but it did not give a significant time saving.

I have ticked "Enable Adjacent Feature Attributes" with 23 subsequent features.

Userlevel 1
Badge +11

With adjacent features enabled, the AttributeManager becomes a blocking transformer and bulk mode is lost. FYI this is a known issue - FMEENGINE-63455

I want to add a new attribute whose value comes from the date attribute of the feature 23 rows ahead (the +23rd feature). Is there any quicker way to do this?
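One possible quicker route is to do the look-ahead in a PythonCaller with a small rolling buffer, so only 24 features are held at a time instead of the whole stream blocking. A minimal sketch, assuming the date attribute is called date_attr and the new attribute date_plus_23 (both hypothetical names):

```python
# Minimal sketch of a look-ahead buffer in a PythonCaller.
# Assumed attribute names: "date_attr" (existing) and "date_plus_23" (new).
import fme
import fmeobjects
from collections import deque

LOOKAHEAD = 23  # read the date from the feature this many rows ahead

class FeatureProcessor(object):
    def __init__(self):
        self.buffer = deque()

    def input(self, feature):
        self.buffer.append(feature)
        # Once 24 features are buffered, the oldest one can see its +23rd.
        if len(self.buffer) > LOOKAHEAD:
            oldest = self.buffer.popleft()
            oldest.setAttribute('date_plus_23', feature.getAttribute('date_attr'))
            self.pyoutput(oldest)

    def close(self):
        # The last 23 features have no +23rd feature; emit them unchanged.
        while self.buffer:
            self.pyoutput(self.buffer.popleft())
```

Whether this is actually faster than the AttributeManager depends on the rest of the workspace, so it is worth benchmarking on a subset of the CSV first.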

Badge +5

Sounds like you have relationships between rows. I would then try writing the data into a relational database, because it would reduce the blocking, and then read the relational data back instead.

Before writing to the relational database, you should try to create a clear design for how to make the data clean and valid. Maybe showing ChatGPT a piece of your data structure would help.
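For illustration, a minimal sketch of that relational look-ahead (hypothetical table and column names; SQLite is used only as an example, and LEAD() needs SQLite 3.25 or newer). The window function does the adjacent-row work inside the database, so nothing has to block in the workspace:

```python
# Hypothetical sketch: let the database compute the +23rd date.
# Assumes a table "rows" with columns (id, date_attr); adjust to your schema.
import sqlite3

con = sqlite3.connect(":memory:")  # use a file path for real data
con.execute("CREATE TABLE rows (id INTEGER PRIMARY KEY, date_attr TEXT)")
con.executemany(
    "INSERT INTO rows (id, date_attr) VALUES (?, ?)",
    [(i, f"2023-05-{i % 28 + 1:02d}") for i in range(1, 101)],  # dummy rows
)

# LEAD(date_attr, 23) reads the value 23 rows ahead, ordered by the
# original CSV row order (id here); it is NULL for the last 23 rows.
query = """
SELECT id,
       date_attr,
       LEAD(date_attr, 23) OVER (ORDER BY id) AS date_plus_23
FROM rows
"""
for row in con.execute(query):
    print(row)
```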
