Solved

Parallel processing with sql executor

  • July 19, 2018
  • 1 reply
  • 91 views

tva
Contributor

Hi,

I have a workspace that reads some data and, after some processing, writes it to a PostGIS database via a FeatureWriter. Afterwards, more SQL scripts are triggered via SQLExecutors. At the end, all data is read again for final writing via FME.

The problem is that the run time grows far faster than the data volume as the processed data gets larger (from a few hours to a few days).

In the processing part I can split my data into smaller parts and write them to multiple schemas in PostGIS. The problem is that I can't run the subsequent SQL scripts in parallel on the X schemas that were created. If I trigger every script X times, the triggering is done sequentially and the total time remains the same.

If I put the longest script into a separate workspace called by a WorkspaceRunner (WSR), I can only run this WSR in parallel when I tell it not to wait for the job to complete. But then I have an issue with the last step (combining all data from the X schemas into one): the table does not exist yet (or doesn't contain data), since the workspace started by the WorkspaceRunner hasn't actually finished its job.

 

Does anyone have an idea how to trigger these scripts in parallel and still wait until those parallel jobs are finished before continuing with the last part?

The SQLExecutor does not have a Group By option.

Best answer by tva

OK, I found the answer in the documentation on parallel processing.

It turns out you can create a custom transformer and, in its advanced parameters, specify which attributes to parallelize by and how aggressively to parallelize. I'm now experimenting with the different settings.

I did have to adjust those parameters, as it wouldn't read the initial setup.

With just moderate parallelization, I was able to cut the run time from 5 hours to 40 minutes (this depends on the number of cores the machine has), which is significantly better.
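
For readers who want to see the general pattern outside FME, here is a minimal Python sketch of "run one job per group in parallel, then wait for all of them before the combining step". It is not how FME implements parallel processing in custom transformers, and it is not the setup used above. The names `run_per_schema` and `fake_script` and the schema names are hypothetical; `fake_script` is only a stand-in for running the long SQL script against one schema.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def run_per_schema(
    schemas: Iterable[str],
    run_script: Callable[[str], int],
    max_workers: int = 4,
) -> Dict[str, int]:
    """Run `run_script` once per schema in parallel and block until all finish."""
    schema_list: List[str] = list(schemas)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps results in input order; consuming it with list() waits
        # for every worker and re-raises the first exception, if any.
        results = list(pool.map(run_script, schema_list))
    return dict(zip(schema_list, results))


def fake_script(schema: str) -> int:
    # Hypothetical stand-in for executing the long SQL script on one schema.
    # A real version would open its own database connection here.
    return len(schema)


if __name__ == "__main__":
    done = run_per_schema(["part_1", "part_2", "part_3"], fake_script)
    # Only at this point is it safe to combine the per-schema tables into one.
    print(done)
```

Threads are used here on the assumption that each job mostly waits on the database; `max_workers` plays a role loosely comparable to the parallelization level, and the right value depends on the cores available on the database server.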

  • July 20, 2018
