Solved

How can I optimize the performance of my FME workflows when dealing with large datasets? Any best practices or recommended settings to consider?

  • August 9, 2023
  • 7 replies
  • 417 views

Hi all,

How can I optimize the performance of my FME workflows when dealing with large datasets? Any best practices or recommended settings to consider?

Thanks in advance.

Best answer by virtualcitymatt

Here is a general article that should be helpful: https://community.safe.com/s/article/performance-tuning-fme. In addition, here are a few things that I consider important.

 

  1. Try to avoid transformers that break bulk mode where possible (watch the feature count and look for messages in the log file). In bulk mode the feature count should jump in large intervals, whereas in 'per feature mode' it increases one by one. The log file should also indicate when bulk mode is lost.
  2. Avoid blocking transformers if possible. If you have to use them, you can sometimes reduce the amount of blocking by reordering the readers so that smaller datasets are read first (when reading multiple data sources).
  3. Keep the number of attributes as low as possible. The AttributeKeeper can help here if you really have a lot.
  4. Avoid list attributes and large attributes (e.g., large blocks of text or JSON) if you can.
  5. Make sure your hard drive is an SSD with plenty of free space and that the FME_TEMP location is also on an SSD.

 

Other considerations will depend on the specific format and data types and on the type of workflow you're trying to build.
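
As a rough illustration of point 5, the free space on the drive behind FME_TEMP can be checked before a large run. This is only a sketch: the fallback to the system temp folder and the 50 GB threshold are assumptions chosen for the example, not FME recommendations, and the script cannot tell whether the drive is an SSD.

```python
import os
import shutil
import tempfile


def temp_space_report(min_free_gb=50.0):
    """Report free space on the drive that holds FME's temp folder."""
    # FME_TEMP is the environment variable named in the answer above; when it
    # is unset, the system temp folder is used as a stand-in (assumption).
    temp_dir = os.environ.get("FME_TEMP") or tempfile.gettempdir()
    usage = shutil.disk_usage(temp_dir)
    free_gb = usage.free / 1024 ** 3
    return {
        "temp_dir": temp_dir,
        "free_gb": round(free_gb, 1),
        # min_free_gb is an arbitrary example threshold, not an FME setting.
        "enough_space": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    print(temp_space_report())
```

Running it before a big translation gives a quick warning if the temp drive is nearly full.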

7 replies


nielsgerrits
VIP
  • Switch off Feature Caching.
  • If a database is the input, let it do the work when possible (spatial queries, joins, etc.). Loading all the data into FME and doing the work there will probably take longer.
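
To make the database point concrete, here is a small self-contained sketch using Python's built-in sqlite3 module. The tables (parcels, zones) and the filter values are made up for illustration. In FME the same idea would usually mean putting the query into the reader's WHERE clause or into a SQLCreator or SQLExecutor transformer rather than writing Python.

```python
import sqlite3

# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zones (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE parcels (id INTEGER PRIMARY KEY, zone_id INTEGER, area REAL);
    INSERT INTO zones VALUES (1, 'residential'), (2, 'industrial');
    INSERT INTO parcels VALUES
        (1, 1, 120.0), (2, 2, 900.0), (3, 1, 75.5), (4, 2, 40.0);
""")


def join_in_client(conn):
    """Pull both tables in full, then join and filter on the client side."""
    zones = dict(conn.execute("SELECT id, name FROM zones"))
    parcels = conn.execute("SELECT id, zone_id, area FROM parcels").fetchall()
    return sorted(
        (pid, zones[zid], area)
        for pid, zid, area in parcels
        if zones[zid] == "industrial" and area > 100
    )


def join_in_database(conn):
    """Let the database join and filter, so only matching rows come back."""
    query = """
        SELECT p.id, z.name, p.area
        FROM parcels AS p
        JOIN zones AS z ON z.id = p.zone_id
        WHERE z.name = ? AND p.area > ?
        ORDER BY p.id
    """
    return conn.execute(query, ("industrial", 100)).fetchall()
```

Both functions return the same rows, but the second one only transfers the rows that survive the join and filter, which is where the saving comes from on large tables.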

alexlynch3450
Contributor

Place the transformers in a bookmark and collapse the bookmark when running the workspace with feature caching: https://s3.amazonaws.com/gitbook/Desktop-Upgrade-To-2018/2018Upgrade3CollapsibleBookmarks/3.03.CollapsibleBookmarksAndCaches.html


  • Author
  • August 10, 2023

Thanks for the solution.


  • Author
  • August 10, 2023

Thanks for the solution.


  • Author
  • August 10, 2023

Thanks for the solution.


boydfme
Contributor
  • August 15, 2023

Another tip, if applicable: partition large datasets and run the entire workspace in parallel, one process per partition.
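
One possible way to do this from outside FME is to launch the workspace once per partition from a script. The sketch below assumes the workspace exposes a published parameter (called PARTITION here, a hypothetical name) that selects the slice of data to process, and that the `fme` executable is on the PATH. The workspace name and partition values are placeholders. Inside FME, the WorkspaceRunner transformer is another route to the same result.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(workspace, partitions, param="PARTITION", fme_exe="fme"):
    """Build one FME command line per partition value."""
    # `fme <workspace> --<PARAM> <value>` sets a published parameter;
    # the parameter name PARTITION is hypothetical.
    return [[fme_exe, workspace, f"--{param}", str(value)] for value in partitions]


def run_command(command):
    """Run a single FME process and return its exit code."""
    return subprocess.run(command, check=False).returncode


def run_in_parallel(commands, max_workers=4, runner=run_command):
    """Run the commands concurrently, at most max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


if __name__ == "__main__":
    # Placeholder workspace and partition values.
    cmds = build_commands("process_tiles.fmw", ["north", "south", "east", "west"])
    for cmd in cmds:
        print(" ".join(cmd))
    # exit_codes = run_in_parallel(cmds)  # uncomment where FME is installed
```

Each FME process needs its own memory and temp disk space, so it is worth keeping `max_workers` modest and checking the exit codes afterwards.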