I have a routine that runs once a quarter, or at most once a month. It involves some major file downloading and processing that was managed by a parallel WorkspaceRunner setup in FME Form (Desktop). I need to deploy the same routine on FME Flow, but I also need to take advantage of parallel processing. As I understand it, I can't run multiple simultaneous jobs with a single engine.

So I have tried combining it with the FME Form WorkspaceRunner, which actually works, but somehow I can't pass the DB connection from the Flow job to the Form workspace. Is there a way to pass the connection from the Flow job to the WorkspaceRunner other than embedding the connection, which would mean creating a separate workspace for every DB connection?

In newer versions of FME, the HTTPCaller can run simultaneous requests without needing to spin up separate concurrent workspaces.


The actual downloading is a minor part of the overall load; I have to process the downloaded data through other simultaneous workspaces. If I rely on one process at a time, I won't be using the available computing power and the run would take very long, possibly more than a day. There are also some SystemCallers and writes to the database in upsert mode (slow!).


For parallelisation on the server, if that's where you have to do the processing, you are constrained by the number of engines available to you.

The equivalent of the WorkspaceRunner is the FMEFlowJobSubmitter: you can choose to flood the job queue with jobs, using all engines in parallel, and with queue management you can leave an engine free for other jobs.
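
If you end up scripting the fan-out yourself rather than using the FMEFlowJobSubmitter transformer, a rough sketch of submitting jobs to the queue through the Flow REST API is below. The host, token, repository, workspace and parameter names are placeholders, so treat it as an illustration of the pattern (one asynchronous job per dataset, spread across free engines) rather than a drop-in script.

```python
# Sketch: queue one Flow job per dataset via the REST API.
# FLOW_HOST, TOKEN, REPO, WORKSPACE and SOURCE_URL are assumed placeholders.
import requests

FLOW_HOST = "https://myflowhost.example.com"      # assumed Flow host
TOKEN = "my-fme-flow-token"                       # assumed API token
REPO, WORKSPACE = "MyRepo", "process_chunk.fmw"   # assumed repository/workspace


def submit_job(dataset_url: str) -> int:
    """Submit one asynchronous job; queued jobs are picked up by free engines."""
    resp = requests.post(
        f"{FLOW_HOST}/fmerest/v3/transformations/submit/{REPO}/{WORKSPACE}",
        headers={
            "Authorization": f"fmetoken token={TOKEN}",
            "Content-Type": "application/json",
        },
        json={"publishedParameters": [
            {"name": "SOURCE_URL", "value": dataset_url},  # assumed published parameter
        ]},
    )
    resp.raise_for_status()
    return resp.json()["id"]  # job id returned by the submit call, useful for polling


if __name__ == "__main__":
    datasets = ["https://data.example.com/a.zip", "https://data.example.com/b.zip"]
    job_ids = [submit_job(url) for url in datasets]
    print("Queued jobs:", job_ids)
```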

For passing a DB connection to each workspace, I think each DB connection may have to be published up to Flow (unless the connection details can be passed in as non-connection-type parameters, which may depend on the database type you have).
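
For example, if the child workspace exposes ordinary text/password published parameters instead of a named database connection, the connection details could travel with the job itself. The parameter names below (DB_HOST, DB_PORT, DB_USER, DB_PASSWORD) are hypothetical and would have to match whatever the workspace actually publishes.

```python
# Hypothetical: passing connection details as plain published parameters
# instead of a named database connection. All names/values are placeholders.
connection_params = [
    {"name": "DB_HOST", "value": "db.example.com"},
    {"name": "DB_PORT", "value": "5432"},
    {"name": "DB_USER", "value": "fme_loader"},
    {"name": "DB_PASSWORD", "value": "********"},  # ideally pulled from a secret store
]

# This would replace the publishedParameters block in the submit request above,
# or be entered per parameter in the FMEFlowJobSubmitter's parameter list.
payload = {"publishedParameters": connection_params}
```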


You may want to consider using CPU-Time Engines, which is a concept that was pretty much made for this type of scenario. It will let you spin up an unlimited number of engines for a period of time and you will be billed only for the actual engine time consumed. It can be very cost effective for heavy processing that only occurs once in a while.

