Question

I have a CSV with a million rows to be processed through a workspace. How can I reduce the run time?


It is taking a considerable amount of time to run the workspace with a million-row CSV as the reader. Please suggest how to reduce the run time.

8 replies

nielsgerrits
VIP

Have you tried turning off feature caching? (Run → Enable Feature Caching)


mark2atsafe
Safer

It would really depend on what you're doing to the data. CSV reading would be lightning quick, even with a million rows. Please add some more details about the data transformation and writing.


mark2atsafe wrote:

It would really depend on what you're doing to the data. CSV reading would be lightning quick, even with a million rows. Please add some more details about the data transformation and writing.

To be more specific, most of the time is being spent at one of the AttributeManagers, where I am using the value of a date attribute from the 23rd feature ahead.


nielsgerrits wrote:

Have you tried turning off feature caching? (Run → Enable Feature Caching)

I think I have tried that, but it did not give a significant time saving.


rushabhgulalkar wrote:

To be more specific, most of the time is being spent at one of the AttributeManagers, where I am using the value of a date attribute from the 23rd feature ahead.

I have ticked "Enable Adjacent Feature Attributes" with 23 subsequent features.


DanAtSafe
Safer
May 12, 2023
rushabhgulalkar wrote:

To be more specific, most of the time is being spent at one of the AttributeManagers, where I am using the value of a date attribute from the 23rd feature ahead.

With adjacent features enabled, the AttributeManager becomes a blocking transformer and bulk mode is lost. FYI, this is a known issue (FMEENGINE-63455).
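If the lookahead has to stay inside the workspace, one possible workaround is a PythonCaller that keeps a rolling buffer of 24 features, so each feature is released as soon as its 23rd successor arrives instead of blocking the whole stream. A minimal sketch, assuming the date attribute is called date and the new attribute date_plus23 (both names are placeholders for your own):

    from collections import deque

    class FeatureProcessor(object):
        # Copies an attribute from the feature 23 positions ahead using a
        # rolling buffer, so at most 24 features are held at any time.
        def __init__(self):
            self.lookahead = 23
            self.buffer = deque()

        def input(self, feature):
            self.buffer.append(feature)
            # With 24 features buffered, the newest is exactly 23 positions
            # ahead of the oldest, so the oldest can be released now.
            if len(self.buffer) > self.lookahead:
                oldest = self.buffer.popleft()
                oldest.setAttribute('date_plus23', feature.getAttribute('date'))
                self.pyoutput(oldest)

        def close(self):
            # The final 23 features have no 23rd successor; output them as-is.
            while self.buffer:
                self.pyoutput(self.buffer.popleft())

Because only 24 features are ever buffered, memory stays flat and the downstream transformers keep receiving features while the CSV is still being read.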


rushabhgulalkar wrote:

To be more specific, most of the time is being spent at one of the AttributeManagers, where I am using the value of a date attribute from the 23rd feature ahead.

I want to add a new attribute that takes its value from the feature 23 positions ahead. Is there any quicker way to do this?


paalpedersen
Contributor

Sounds like you have relationships between rows. I would then try writing the data into a relational database, because that would avoid the blocking, and then read the relational data back instead.

Before writing to the relational database, you should create a clear design for making the data clean and valid. Maybe showing ChatGPT a piece of your data structure would help.
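For example, once the rows are in a database that supports window functions, the 23-row lookahead becomes a single query. A minimal sketch using SQLite from Python, assuming a table called rows with an id column that preserves the CSV order (all names here are placeholders):

    import sqlite3

    con = sqlite3.connect("rows.db")
    # LEAD(col, 23) reads the value 23 rows ahead in one pass.
    # Window functions need SQLite 3.25+ (or any other RDBMS that has them).
    cur = con.execute("""
        SELECT id,
               date_attr,
               LEAD(date_attr, 23) OVER (ORDER BY id) AS date_plus23
        FROM rows
    """)
    for row in cur.fetchmany(5):  # peek at the first few results
        print(row)
    con.close()

The database does the ordering and lookahead internally, so nothing in the pipeline has to hold the whole million rows in memory.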

