Solved

How can I optimize the performance of my FME workflows when dealing with large datasets? Any best practices or recommended settings to consider?

August 9, 2023

lija

Hi all,

How can I optimize the performance of my FME workflows when dealing with large datasets? Any best practices or recommended settings to consider?

Thanks in advance.

Best answer by virtualcitymatt

Here is a general article that should be helpful: https://community.safe.com/s/article/performance-tuning-fme. In addition, here are a few things I consider important.

Try to avoid transformers that break bulk mode where possible (watch the feature counts and look for messages in the log file). In bulk mode the feature count should jump in large intervals, whereas in 'per feature mode' it increases one by one. The log file should also indicate when bulk mode is lost.
Avoid blocking transformers if possible. If you have to use them, you can sometimes reduce the amount of blocking by reordering the readers so that smaller datasets are read first (when reading multiple data sources).
Keep the number of attributes as low as possible. The AttributeKeeper can help here if you have a lot.
Avoid list attributes and large attributes (e.g., large blocks of text or JSON) if you can.
Make sure your hard drive is an SSD with plenty of free space and that the FME_TEMP location is also on an SSD.

Other considerations will depend on the specific format, the data types, and the type of workflow you're trying to build.
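
As a rough illustration of the last point, here is a minimal Python sketch that reports how much free space is available at the temp location before a large run. It assumes FME_TEMP is set as an environment variable and falls back to the system temp directory otherwise; the function name and the 50 GiB threshold are arbitrary choices made for this example, and the script cannot tell whether the drive is actually an SSD.

```python
import os
import shutil
import tempfile


def temp_space_report(min_free_gb=50.0, env=None):
    """Return (path, free_gib, ok) for the directory used for temp files.

    min_free_gb is an arbitrary threshold chosen for this sketch.
    """
    env = os.environ if env is None else env
    # Use FME_TEMP if it is set, otherwise the system temp directory.
    path = env.get("FME_TEMP") or tempfile.gettempdir()
    usage = shutil.disk_usage(path)  # raises if the path does not exist
    free_gib = usage.free / 1024**3
    return path, free_gib, free_gib >= min_free_gb


if __name__ == "__main__":
    path, free_gib, ok = temp_space_report()
    status = "ok" if ok else "low"
    print(f"{path}: {free_gib:.1f} GiB free ({status})")
```

Running this before a long translation gives an early warning when the temp drive is nearly full.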

nielsgerrits
August 9, 2023

Switch off Feature Caching.
If a database is the input, let it do the work where possible (spatial queries, joins, etc.). Loading all the data into FME and doing the work there will probably take longer.
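
To illustrate the database point outside FME, here is a small Python sketch using an in-memory SQLite database. The table names, columns, and values are invented for the example. It compares fetching every row and joining in the client with letting the database do the join and filter, so only the matching rows are transferred. In a workspace the equivalent is usually a WHERE clause on the reader or a SQL statement in a transformer such as SQLExecutor or SQLCreator.

```python
import sqlite3

# Hypothetical sample data, made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zones (zone_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE parcels (id INTEGER PRIMARY KEY, zone_id INTEGER, area REAL);
INSERT INTO zones VALUES (1, 'residential'), (2, 'industrial');
INSERT INTO parcels VALUES
    (1, 1, 120.0), (2, 2, 950.0), (3, 1, 80.0), (4, 2, 400.0);
""")


def join_in_client(conn, min_area):
    """Fetch both tables in full, then join and filter in Python."""
    zones = dict(conn.execute("SELECT zone_id, name FROM zones"))
    rows = conn.execute("SELECT id, zone_id, area FROM parcels").fetchall()
    return sorted((pid, zones[z]) for pid, z, area in rows if area >= min_area)


def join_in_database(conn, min_area):
    """Let the database join and filter; only matching rows come back."""
    sql = """
        SELECT p.id, z.name
        FROM parcels AS p
        JOIN zones AS z ON z.zone_id = p.zone_id
        WHERE p.area >= ?
        ORDER BY p.id
    """
    return conn.execute(sql, (min_area,)).fetchall()


if __name__ == "__main__":
    print(join_in_database(conn, 100.0))
```

Both functions return the same rows; the difference is how much data has to leave the database first, which is what matters with large tables.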

alexlynch3450
August 9, 2023

Place transformers in a bookmark and collapse the bookmark when running the workspace with feature caching enabled: https://s3.amazonaws.com/gitbook/Desktop-Upgrade-To-2018/2018Upgrade3CollapsibleBookmarks/3.03.CollapsibleBookmarksAndCaches.html

lija
Author
August 10, 2023

@virtualcitymatt Thanks for the solution.

lija
Author
August 10, 2023

@nielsgerrits Thanks for the solution.

lija
Author
August 10, 2023

@alexlynch3450 Thanks for the solution.

boydfme
August 15, 2023

Another tip, if applicable: partition large datasets and run the entire workspace in parallel, one process per partition.
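
One way to picture this is the Python sketch below. It groups records by a partition key and builds one command line per partition, passing the key as a published parameter. The workspace name `tiles.fmw`, the parameter `REGION`, and the helper names are all hypothetical, and the `fme <workspace> --<PARAM> <value>` form is an assumption about how the workspace is launched, so adjust it to your installation. The optional runner simply starts the commands a few at a time.

```python
import subprocess
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(records, key):
    """Group records (dicts) by the value of `key`, keeping input order."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)


def build_commands(workspace, param_name, values, fme_exe="fme"):
    """Build one command per partition value (assumed command-line form)."""
    return [[fme_exe, workspace, f"--{param_name}", str(v)] for v in values]


def run_all(commands, max_workers=4):
    """Run the commands concurrently and return their exit codes in order."""
    def run(cmd):
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    records = [
        {"region": "north", "id": 1},
        {"region": "south", "id": 2},
        {"region": "north", "id": 3},
    ]
    parts = partition_by_key(records, "region")
    for cmd in build_commands("tiles.fmw", "REGION", sorted(parts)):
        print(" ".join(cmd))
```

Keep `max_workers` in line with the available cores, memory, and licensed processes; too many parallel runs can end up competing for the same disk.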
