Question

I have a CSV with a million rows to be processed through a workspace. How can I reduce the run time?


It is taking a considerable amount of time to run the workspace with a million-row CSV as the reader. Please suggest how to reduce the run time.

8 replies

nielsgerrits
VIP

Have you tried turning off feature caching? (Run → Enable Feature Caching)


mark2atsafe
Safer

It would really depend on what you're doing to the data. CSV reading would be lightning quick, even with a million rows. Please add some more details about the data transformation and writing.


mark2atsafe wrote:

It would really depend on what you're doing to the data. CSV reading would be lightning quick, even with a million rows. Please add some more details about the data transformation and writing.

To be more specific, most of the time is being spent at one of the AttributeManagers, where I am using the value of a date attribute from the 23rd feature ahead.


nielsgerrits wrote:

Have you tried turning off feature caching? (Run → Enable Feature Caching)

I think I have tried that, but it did not give a significant time saving.


rushabhgulalkar wrote:

To be more specific, most of the time is being spent at one of the AttributeManagers, where I am using the value of a date attribute from the 23rd feature ahead.

I have ticked "Enable Adjacent Feature Attributes" with 23 subsequent features.


DanAtSafe
Safer
May 12, 2023
rushabhgulalkar wrote:

To be more specific, most of the time is being spent at one of the AttributeManagers, where I am using the value of a date attribute from the 23rd feature ahead.

With adjacent features enabled, the AttributeManager becomes a blocking transformer and bulk mode is lost. FYI, this is a known issue (FMEENGINE-63455).
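If the lookahead has to stay inside the workspace, one possible workaround is a PythonCaller that keeps a rolling buffer of 24 features, so each feature is released as soon as its 23rd successor arrives instead of blocking the whole stream. A minimal sketch, assuming the date attribute is called date and the new attribute date_plus23 (both names are placeholders for your own):

    from collections import deque

    class FeatureProcessor(object):
        # Copies an attribute from the feature 23 positions ahead using a
        # rolling buffer, so at most 24 features are held at any time.
        def __init__(self):
            self.lookahead = 23
            self.buffer = deque()

        def input(self, feature):
            self.buffer.append(feature)
            # With 24 features buffered, the newest is exactly 23 positions
            # ahead of the oldest, so the oldest can be released now.
            if len(self.buffer) > self.lookahead:
                oldest = self.buffer.popleft()
                oldest.setAttribute('date_plus23', feature.getAttribute('date'))
                self.pyoutput(oldest)

        def close(self):
            # The final 23 features have no 23rd successor; output them as-is.
            while self.buffer:
                self.pyoutput(self.buffer.popleft())

Because only 24 features are ever buffered, memory stays flat and the downstream transformers keep receiving features while the CSV is still being read.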


rushabhgulalkar wrote:

To be more specific, most of the time is being spent at one of the AttributeManagers, where I am using the value of a date attribute from the 23rd feature ahead.

I want to add a new attribute that takes its value from the feature 23 positions ahead. Is there any quicker way to do this?


paalpedersen
Contributor

Sounds like you have relationships between rows. I would then try writing the data into a relational database, because that would avoid the blocking, and then read the relational data back instead.

Before writing to the relational database, you should create a clear design for making the data clean and valid. Maybe showing ChatGPT a piece of your data structure would help.
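For example, once the rows are in a database that supports window functions, the 23-row lookahead becomes a single query. A minimal sketch using SQLite from Python, assuming a table called rows with an id column that preserves the CSV order (all names here are placeholders):

    import sqlite3

    con = sqlite3.connect("rows.db")
    # LEAD(col, 23) reads the value 23 rows ahead in one pass.
    # Window functions need SQLite 3.25+ (or any other RDBMS that has them).
    cur = con.execute("""
        SELECT id,
               date_attr,
               LEAD(date_attr, 23) OVER (ORDER BY id) AS date_plus23
        FROM rows
    """)
    for row in cur.fetchmany(5):  # peek at the first few results
        print(row)
    con.close()

The database does the ordering and lookahead internally, so nothing in the pipeline has to hold the whole million rows in memory.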

