
Hi,

I have a workspace that reads some data and, after some processing, writes it to a PostGIS database via a FeatureWriter. Afterwards, more SQL scripts are triggered via SQLExecutors. At the end, all data is read again for the final write with FME.

The problem is that the runtime grows dramatically as the processed data gets larger (from a few hours to a few days).

In the processing part I can split my data into smaller parts and write them to multiple schemas in PostGIS. The problem is that I can't run the subsequent SQL scripts in parallel on the X schemas that were created: if I trigger each script X times, the triggering happens sequentially and the total time stays the same.
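For illustration, this is the fan-out-and-wait behaviour I'm after. It's only a rough sketch outside of FME (something you could run standalone or from a PythonCaller); the connection details, schema names and script path are made up, and it assumes psycopg2 is available:

```python
# Rough sketch (outside FME): run the same SQL script against every schema in
# parallel and block until all of them have finished. Connection details,
# schema names and the script path are placeholders.
from concurrent.futures import ThreadPoolExecutor, as_completed

import psycopg2

SCHEMAS = ["part_01", "part_02", "part_03", "part_04"]   # the X split schemas

with open("long_running_script.sql") as f:
    SQL_TEMPLATE = f.read()                               # script uses a {schema} placeholder

def run_for_schema(schema):
    # every worker opens its own connection; psycopg2 connections
    # should not be shared across threads
    conn = psycopg2.connect(host="localhost", dbname="gisdb",
                            user="fme", password="secret")
    try:
        with conn.cursor() as cur:
            cur.execute(SQL_TEMPLATE.format(schema=schema))
        conn.commit()
    finally:
        conn.close()
    return schema

with ThreadPoolExecutor(max_workers=len(SCHEMAS)) as pool:
    futures = [pool.submit(run_for_schema, s) for s in SCHEMAS]
    for done in as_completed(futures):    # waits here until every schema is processed
        print("finished", done.result())

# only after this point is it safe to combine the X schemas into one table
```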

If I put the longest script into a separate WorkspaceRunner, I can only run it in parallel by telling it not to wait for the job to complete. But then I have an issue with the last step (combining all data from the X schemas into one table): the table doesn't exist yet (or doesn't contain data), because the WorkspaceRunner jobs haven't actually finished.
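For reference, that last combine step looks roughly like this. Again just a sketch, with made-up table names, and it only works once all the parallel jobs have really finished:

```python
# Rough sketch of the combine step, to be run only after all parallel jobs
# are done: merge the per-schema result tables into a single table.
# Table names are placeholders.
import psycopg2

SCHEMAS = ["part_01", "part_02", "part_03", "part_04"]

union_sql = " UNION ALL ".join(
    "SELECT * FROM {}.result_table".format(schema) for schema in SCHEMAS
)

conn = psycopg2.connect(host="localhost", dbname="gisdb",
                        user="fme", password="secret")
with conn, conn.cursor() as cur:    # commits on success
    cur.execute("INSERT INTO public.final_table "
                "SELECT * FROM ({}) AS merged".format(union_sql))
conn.close()
```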

 

Does anyone have an idea how to trigger these scripts in parallel, and still wait until those parallel jobs are finished before continuing with the last part?

(A Group By option is not available on the SQLExecutor.)

OK, I found the answer in the documentation on parallel processing.

It turns out you can create a custom transformer and specify, in the custom transformer's advanced parameters, which attribute to group by and how aggressively to parallelize. I'm now playing with the different settings.

I did have to adjust those parameter settings, as it wouldn't pick up the initial setup.

Using just moderate parallelization, I was able to cut the runtime from 5 hours to 40 minutes (depending on the number of cores the machine has), which is significantly better.
