Archived

Parallel Reads and Writes

Related products: FME Form
  • siennaatsafe
  • david_r
  • davideagle
  • bruceharold
  • evanedwards
  • jt
  • jerrodstutzman
  • samuelvaldez

I often have a workflow that reads multiple tables from one database, performs a few simple tasks on each one individually, and writes them to another database. With 10 tables, FME completes one before moving on to the next, which adds up to long processing times. However, if I split that job into 10 separate workbenches and run them concurrently on FME Server, the total processing time drops drastically because they run in parallel. Unfortunately, that method creates a data management nightmare.

My suggestion is to allow parallel reads and writes within one workbench (when those reads/writes don't depend on the other reads/writes in the same workbench).


Obviously this wouldn't apply if there are table joins or any transformers that hold features.
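The proposed behaviour is essentially running independent jobs through a worker pool instead of one after another. Below is a minimal Python sketch of that idea, assuming each table can be read, transformed and written without reference to any other table. `copy_table`, `run_serial`, `run_parallel` and the table names are hypothetical stand-ins for one reader, transformer and writer chain; they are not FME APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_table(name: str) -> str:
    """Hypothetical stand-in for one independent read -> transform -> write chain."""
    # A real job would read `name` from the source database, apply the
    # per-table transformers, and write the result to the target database.
    return f"{name}: done"


def run_serial(tables):
    # Current behaviour: each table finishes before the next one starts.
    return [copy_table(t) for t in tables]


def run_parallel(tables, max_workers=4):
    # Proposed behaviour: independent tables are processed concurrently.
    # Executor.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(copy_table, tables))


tables = [f"table_{i}" for i in range(10)]
print(run_parallel(tables) == run_serial(tables))  # True
```

Threads are a reasonable fit here only because table copying mostly waits on database I/O; CPU-heavy transformations would more likely call for separate processes, which is closer to what running several workbenches on FME Server does.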


Example: The screenshot below shows a job that takes nearly 24 hours to run because of the large amounts of data, yet no single table takes more than a few hours. Because FME processes the tables in "serial" mode, all those hours are added together in the end.
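The arithmetic behind the complaint: run serially, the total is the sum of the per-table times; run in parallel, it can fall to roughly the longest single table, limited by how many jobs may run at once. The sketch below estimates both, assuming the tables really are independent. It uses greedy longest-job-first scheduling as a swapped-in estimation technique, not anything FME does, and the durations in the example are hypothetical.

```python
import heapq


def serial_time(durations):
    # Serial execution: every table's run time is added to the total.
    return sum(durations)


def parallel_time(durations, workers):
    """Estimate wall-clock time when at most `workers` jobs run at once.

    Greedy longest-processing-time-first (LPT) list scheduling: each job,
    longest first, goes to the worker that becomes free earliest.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    loads = [0.0] * min(workers, max(len(durations), 1))
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        # Add the next-longest job to the least-loaded worker.
        heapq.heappush(loads, heapq.heappop(loads) + d)
    return max(loads)


hours = [3, 2, 2, 1]  # hypothetical per-table run times
print(serial_time(hours))       # 8
print(parallel_time(hours, 2))  # 4.0
print(parallel_time(hours, 4))  # 3.0
```

With enough workers the estimate reaches the longest single table, which is why splitting the job helps so much when no table takes more than a few hours.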

This post is closed to further activity.

4 replies

rylanatsafe
Safer
  • September 12, 2017

Try replacing all your "native" FME Writers with FeatureWriter transformers! This should prevent data from being held in memory while the writers write one at a time...


bruceharold
Contributor
  • September 12, 2017

Another pattern is to use a WorkspaceRunner that does not wait for each job to complete. For example, if your data are in a directory, read it with the Path reader, process the path_windows values as inputs, and write with a dataset fanout on @Value(fme_basename).

One caveat for Data Interoperability users: the process limit code appears not to be implemented, so you get as many processes as you have inputs, which is exciting!
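Outside FME, the same fan-out pattern (one job per input file, output named after the input's base name, with a cap on concurrent jobs) might look like the Python sketch below. `process_file` and `fan_out` are hypothetical stand-ins for the child workspace and the WorkspaceRunner; `max_jobs` plays the role of the process limit the caveat says may be missing. The pool submits every job without waiting for the previous one, but unlike a no-wait WorkspaceRunner it collects all results at the end.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_file(path: Path, out_dir: Path) -> Path:
    """Hypothetical child job standing in for one WorkspaceRunner call."""
    # Name the output after the input's base name (no extension), similar in
    # spirit to a dataset fanout on @Value(fme_basename).
    out_path = out_dir / f"{path.stem}.out"
    out_path.write_text(path.read_text().upper())  # placeholder "processing"
    return out_path


def fan_out(in_dir: Path, out_dir: Path, max_jobs: int = 4):
    # One job per input file; max_jobs caps how many run at the same time.
    inputs = sorted(p for p in in_dir.iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda p: process_file(p, out_dir), inputs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        (Path(src) / "roads.txt").write_text("abc")
        print([p.name for p in fan_out(Path(src), Path(dst), max_jobs=2)])
```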


fmelizard
Safer
  • September 14, 2017

Hi @jerrodstutzman, the idea of analyzing the workspace for independent flows and then running them in parallel is a very good one. As @RylanAtSafe mentions, in the meantime, using FeatureWriters will provide at least some efficiency boost if your outputs were going to entirely different writers. However, we'll keep thinking about doing graph analysis and, if the workspace author agrees, splitting the workspace into chunks to be run in parallel.
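One way to picture the "independent flows" analysis: treat the workspace as a graph of readers, transformers and writers, and find the groups of nodes that share no connections (connected components). This is a minimal sketch of that general technique, not Safe Software's design; the node names are hypothetical, and it ignores shared resources such as a common target database, which a real analysis would also have to consider.

```python
from collections import defaultdict


def independent_flows(nodes, connections):
    """Group workspace nodes into flows that share no connections.

    nodes: iterable of node names (readers, transformers, writers).
    connections: iterable of (upstream, downstream) pairs.
    Returns a list of sets; each set could, in principle, run in
    parallel with the others.
    """
    adjacent = defaultdict(set)
    for a, b in connections:
        # Direction does not matter for independence, so link both ways.
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, flows = set(), []
    for start in nodes:
        if start in seen:
            continue
        flow, stack = set(), [start]
        while stack:  # iterative depth-first search
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            flow.add(n)
            stack.extend(adjacent[n] - seen)
        flows.append(flow)
    return flows


nodes = ["ReaderA", "Clip", "WriterA", "ReaderB", "WriterB"]
links = [("ReaderA", "Clip"), ("Clip", "WriterA"), ("ReaderB", "WriterB")]
print(independent_flows(nodes, links))  # two flows: A-chain and B-chain
```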


rbell
Contributor
  • April 4, 2022

I like this idea. It would resemble what SSIS does, where dependencies are linked graphically. If an activity has no dependencies, it can start in parallel whenever the software determines is best.
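Dependency-driven scheduling of this kind can be sketched with Python's standard `graphlib` module (Python 3.9+): each task lists what it waits on, and every task whose inputs are finished may start. The task names below are hypothetical, and running whole stages at once is a simplification; a real scheduler would start each task as soon as its own inputs complete.

```python
from graphlib import TopologicalSorter


def parallel_stages(dependencies):
    """Return batches of tasks; tasks in the same batch do not depend on
    one another and could start together.

    dependencies: dict mapping each task to the set of tasks it waits on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


deps = {
    "load_roads": set(),
    "load_parcels": set(),
    "load_hydro": set(),
    "join": {"load_roads", "load_parcels"},
    "write_report": {"join"},
}
print(parallel_stages(deps))
# [['load_hydro', 'load_parcels', 'load_roads'], ['join'], ['write_report']]
```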

