Solved

How can I optimize the performance of my FME workflows when dealing with large datasets? Any best practices or recommended settings to consider?


Hi all,

How can I optimize the performance of my FME workflows when dealing with large datasets? Any best practices or recommended settings to consider?

Thanks in advance.

Best answer by virtualcitymatt

Here is a general article which should be helpful: https://community.safe.com/s/article/performance-tuning-fme. In addition, here are a few things I consider important:

  1. Try to avoid transformers which break bulk mode where possible (watch the feature count and look for messages in the log file). In bulk mode the feature count should jump in large intervals, whereas in 'per feature mode' it increases one by one. The log file should also indicate when bulk mode is lost.
  2. Avoid blocking transformers if possible. If you have to use them, you can sometimes reduce the amount of blocking by reordering the readers so that smaller datasets are read first (if reading multiple data sources).
  3. Keep the number of attributes as low as possible. The AttributeKeeper can help here if you really have a lot.
  4. Avoid list attributes and large attributes (e.g., large blocks of text or JSON) if you can.
  5. Make sure your hard drive is an SSD with plenty of free space and that the FME_TEMP location is also on an SSD.

Other considerations will depend on the specific format and data types and on the type of workflow you're trying to build.
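
For point 5, it can help to check where temporary files will land and how much room is left there before starting a long run. The following is a minimal Python sketch, not part of FME itself. It assumes the FME_TEMP environment variable is set for the process and otherwise falls back to the operating system's default temporary directory, which may not be the location your FME installation actually uses. The function name and the 50 GB threshold are illustrative only.

```python
import os
import shutil
import tempfile


def temp_space_report(min_free_gb=50.0):
    """Return (path, free_gb, ok) for the directory FME_TEMP points to.

    Falls back to the OS default temp directory when FME_TEMP is unset
    (an assumption; check your FME configuration for the real location).
    """
    path = os.environ.get("FME_TEMP") or tempfile.gettempdir()
    # disk_usage reports bytes for the volume that contains `path`.
    free_gb = shutil.disk_usage(path).free / 1024**3
    return path, free_gb, free_gb >= min_free_gb


if __name__ == "__main__":
    path, free_gb, ok = temp_space_report()
    print(f"{path}: {free_gb:.1f} GB free ({'ok' if ok else 'low'})")
```

This only reports free space; whether the volume is an SSD still has to be checked with the operating system's own tools.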


7 replies


nielsgerrits
VIP
  • Switch off feature caching.
  • If a database is the input, let it do the work when possible (spatial queries, joins, etc.). Loading all the data into FME and doing the work there will probably take longer.
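
The database tip can be illustrated outside FME with a small Python sketch using the standard-library sqlite3 module. The tables and values are invented for the example. In an FME workspace the equivalent would typically be to put the query in a SQLCreator or SQLExecutor, or to set a WHERE clause on the reader where the format offers one, rather than reading whole tables and joining them with transformers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zones (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE parcels (id INTEGER PRIMARY KEY, zone_id INTEGER, area REAL);
    INSERT INTO zones VALUES (1, 'residential'), (2, 'industrial');
    INSERT INTO parcels VALUES (1, 1, 120.0), (2, 2, 900.0), (3, 1, 75.5);
""")

# Client-side pattern: fetch every row, then join and filter in the client.
parcels = conn.execute("SELECT id, zone_id, area FROM parcels").fetchall()
zones = dict(conn.execute("SELECT id, name FROM zones").fetchall())
client_side = sorted(
    (pid, area) for pid, zid, area in parcels if zones[zid] == "residential"
)

# Pushdown pattern: the database joins and filters, and only the rows
# that are actually needed leave the database.
db_side = conn.execute(
    """
    SELECT p.id, p.area
    FROM parcels AS p
    JOIN zones AS z ON z.id = p.zone_id
    WHERE z.name = ?
    ORDER BY p.id
    """,
    ("residential",),
).fetchall()

print(client_side == db_side)
```

Both patterns give the same rows; the difference on large tables is how much data has to be transferred and held in memory before the filter is applied.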

alexlynch3450
Contributor

Place transformers in a bookmark and collapse the bookmark when running the workspace with feature caching: https://s3.amazonaws.com/gitbook/Desktop-Upgrade-To-2018/2018Upgrade3CollapsibleBookmarks/3.03.CollapsibleBookmarksAndCaches.html


  • Author
  • August 10, 2023
In reply to virtualcitymatt:

Thanks for the solution.


  • Author
  • August 10, 2023
In reply to nielsgerrits:

Thanks for the solution.


  • Author
  • August 10, 2023
In reply to alexlynch3450:

Thanks for the solution.


boydfme
Contributor
  • Contributor
  • August 15, 2023

Another tip, if applicable: partition large datasets and run the entire workspace in parallel.
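
One way to sketch this idea from outside the workspace is a small Python driver that splits a list of input files into chunks and launches one FME process per file, with the chunks running side by side. This is an illustration under assumptions, not a recommended setup: it assumes the `fme` executable is on the PATH, and `SOURCE_FILE` is a hypothetical published parameter name that you would replace with the one defined in your own workspace. How many FME processes can usefully run at once may depend on your licence and hardware. Inside FME, the WorkspaceRunner transformer is another way to get a similar effect.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split items into up to n_parts round-robin chunks, dropping empty ones."""
    chunks = [items[i::n_parts] for i in range(n_parts)]
    return [c for c in chunks if c]


def build_command(workspace, source_file):
    # "SOURCE_FILE" is a hypothetical published parameter name.
    return ["fme", workspace, "--SOURCE_FILE", source_file]


def run_chunk(workspace, chunk, runner=subprocess.run):
    """Run the workspace once per file in the chunk, one after another."""
    return [runner(build_command(workspace, f), check=True) for f in chunk]


def run_parallel(workspace, files, n_parts=4, runner=subprocess.run):
    """Process the chunks concurrently, one worker thread per chunk."""
    chunks = partition(files, n_parts)
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        futures = [pool.submit(run_chunk, workspace, c, runner) for c in chunks]
        return [f.result() for f in futures]
```

A call such as `run_parallel("load.fmw", ["a.gdb", "b.gdb", "c.gdb"], n_parts=2)` would start two workers; the workspace and file names here are placeholders. The `runner` argument exists so the launch step can be swapped out, for example for a dry run that only prints the commands.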

