
Hi all,

How can I optimize the performance of my FME workflows when dealing with large datasets? Any best practices or recommended settings to consider?

Thanks in advance.

Here is a general article that should be helpful: https://community.safe.com/s/article/performance-tuning-fme. In addition, here are a few things I consider important.

 

  1. Try to avoid transformers that break bulk mode where possible (watch the feature count and look for messages in the log file). In bulk mode the feature count should jump in large intervals, whereas in 'per feature mode' it increments one by one. The log file should also indicate when bulk mode is lost.
  2. Avoid blocking transformers if possible. If you have to use them, you can sometimes reduce the amount of blocking by reordering the readers so that smaller datasets are read first (if reading multiple data sources).
  3. Keep attributes to as few as possible. The AttributeKeeper can help here if you really have a lot.
  4. Avoid list attributes and large attributes (e.g., large blocks of text or JSON) if you can.
  5. Make sure your hard drive is an SSD with plenty of free space, and that the FME_TEMP location is also on an SSD.
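
For the last point, a quick way to see which temp location is in play and how much room it has is a small Python check like the sketch below. It assumes that the system temp directory is the relevant fallback when FME_TEMP is not set, which is a simplification, and the function name `temp_free_gib` is made up for this example. It does not tell you whether the drive is an SSD.

```python
import os
import shutil
import tempfile


def temp_free_gib(env_var="FME_TEMP"):
    """Return (path, free space in GiB) for the temp location.

    Assumption: when the environment variable is not set, the system
    temp directory is used as a stand-in.
    """
    path = os.environ.get(env_var) or tempfile.gettempdir()
    usage = shutil.disk_usage(path)  # raises if the path does not exist
    return path, usage.free / 1024**3


if __name__ == "__main__":
    path, free = temp_free_gib()
    print(f"Temp location: {path} ({free:.1f} GiB free)")
```

How much free space counts as "plenty" depends on the dataset and on how many blocking transformers the workspace uses, so treat the number as something to compare against your data size rather than a fixed threshold.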

 

Other considerations will depend on the specific format and data types, and on the type of workflow you're trying to build.


  • Switch off Feature Caching.
  • If a database is the input, let it do the work when possible (spatial queries, joins, etc.). Loading all the data into FME and doing the work there will probably take longer.
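
To illustrate the database point outside of FME, here is a minimal Python sketch using an in-memory SQLite database. The `parcels` and `zones` tables and both function names are invented for the example. The first function pulls every row and joins and filters on the client side; the second lets the database do the join and filter and returns only the rows that are needed. In a workspace, the equivalent would roughly be a WHERE clause on the reader or a SQL statement in a SQLCreator or SQLExecutor, depending on the format.

```python
import sqlite3

# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parcels (id INTEGER PRIMARY KEY, zone_id INTEGER, area REAL);
    CREATE TABLE zones (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO zones VALUES (1, 'residential'), (2, 'industrial');
    INSERT INTO parcels VALUES
        (1, 1, 120.0), (2, 1, 950.0), (3, 2, 4000.0), (4, 2, 80.0);
""")


def load_everything_then_filter(conn, min_area):
    """Fetch both tables in full, then join and filter on the client."""
    parcels = conn.execute("SELECT id, zone_id, area FROM parcels").fetchall()
    zones = dict(conn.execute("SELECT id, name FROM zones").fetchall())
    return sorted(
        (pid, zones[zid]) for pid, zid, area in parcels if area >= min_area
    )


def filter_in_database(conn, min_area):
    """Let the database join and filter, so only matching rows are transferred."""
    sql = """
        SELECT p.id, z.name
        FROM parcels AS p
        JOIN zones AS z ON z.id = p.zone_id
        WHERE p.area >= ?
        ORDER BY p.id
    """
    return conn.execute(sql, (min_area,)).fetchall()


if __name__ == "__main__":
    print(load_everything_then_filter(conn, 500))
    print(filter_in_database(conn, 500))
```

Both functions return the same rows. The difference is how much data crosses the connection, which is what matters once the tables are large.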

Place the transformers in a bookmark and collapse the bookmark when running the workspace with feature caching: https://s3.amazonaws.com/gitbook/Desktop-Upgrade-To-2018/2018Upgrade3CollapsibleBookmarks/3.03.CollapsibleBookmarksAndCaches.html



Thanks for the solution.






Another tip, if applicable: partition large datasets and run the entire workspace in parallel.
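
One way to sketch that idea is a small Python driver that splits the work into groups and starts one FME process per group. Everything specific here is an assumption: `TILE_IDS` is a hypothetical published parameter that the workspace would need to use to limit what it reads, the `fme` executable is assumed to be on the PATH, and the helper names are made up. Inside FME itself, a WorkspaceRunner in a parent workspace is another way to get a similar effect.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split items into up to n_parts round-robin groups, dropping empty ones."""
    groups = [items[i::n_parts] for i in range(n_parts)]
    return [g for g in groups if g]


def build_command(workspace, tile_ids, fme_exe="fme"):
    """Build the command line for one partition.

    TILE_IDS is a hypothetical published parameter, not something FME defines.
    """
    return [fme_exe, workspace, "--TILE_IDS", ",".join(tile_ids)]


def run_partitions(workspace, tile_ids, n_parts=4, runner=subprocess.run):
    """Run one process per partition in parallel and return the results in order."""
    commands = [
        build_command(workspace, part) for part in partition(tile_ids, n_parts)
    ]
    # Threads are enough here: each one only waits on an external process.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        return list(pool.map(runner, commands))


if __name__ == "__main__":
    # Dry run: print the commands instead of launching FME.
    for cmd in run_partitions("big_job.fmw", ["t1", "t2", "t3", "t4", "t5"],
                              n_parts=2, runner=lambda c: " ".join(c)):
        print(cmd)
```

The number of parallel processes is worth keeping in line with the available cores, memory, and licence limits, since each process loads its own share of the data.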

