Solved

Does FME Server handle a large batch of workspace runs? (around 1,000,000 runs)


amooo

Hello everyone,

Let's say I need to run the same workspace 1,000,000 times (once for each input object). The workspace contains some Tester, filter and geometry transformers.

I plan to launch the jobs through the FME REST API, in batches of 10,000 jobs (so 100 batches in total).

Is this a good way to do it?

I would be happy to read your advice!
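A minimal Python sketch of the batching plan. It assumes FME Server's REST API v3 asynchronous submit endpoint (`/fmerest/v3/transformations/submit/<repository>/<workspace>`) and a JSON body carrying published parameters; the repository name, workspace name and the `OBJECT_ID` published parameter are hypothetical. It only builds the batches and request bodies and sends nothing, so check the path and body format against your server version's REST API documentation.

```python
import json
from typing import Iterator, List

# ASSUMPTION: FME Server REST API v3 asynchronous submit endpoint; verify the
# path against the REST API documentation of your FME Server version.
SUBMIT_PATH = "/fmerest/v3/transformations/submit/{repository}/{workspace}"


def batches(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids` holding at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def job_body(object_id: int) -> str:
    """JSON body for one job; OBJECT_ID is a hypothetical published parameter."""
    return json.dumps(
        {"publishedParameters": [{"name": "OBJECT_ID", "value": str(object_id)}]}
    )


object_ids = list(range(1_000_000))  # stand-in for the real input keys
all_batches = list(batches(object_ids, 10_000))
# Hypothetical repository and workspace names.
path = SUBMIT_PATH.format(repository="MyRepo", workspace="check.fmw")
print(len(all_batches), len(all_batches[0]), path)
print(job_body(object_ids[0]))
```

Each body would typically be POSTed with an `Authorization: fmetoken token=...` header. Batching only controls how many requests are queued at once; every job still pays its own startup cost.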

Best answer by david_r


5 replies

david_r
Celebrity
  • Best Answer
  • November 14, 2022

I'm fairly certain that it's doable, but less certain that it's the optimal solution: there's some overhead each time a workspace is started, and although it's short, multiplied by a million it's going to be noticeable. Is it not possible to process all the objects in the same workspace, e.g. using Group By processing?
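Group By is a setting on FME transformers, so the following is only a Python analogy of the idea: a single run buckets every feature by a group key and handles each bucket separately, so the startup cost is paid once rather than once per object. The feature dictionaries and the `group` key are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

F = TypeVar("F")
R = TypeVar("R")


def process_grouped(
    features: Iterable[F],
    key: Callable[[F], Hashable],
    handle_group: Callable[[List[F]], R],
) -> Dict[Hashable, R]:
    """One run: bucket all features by their group key, then handle each
    bucket on its own. Startup cost is paid once, not once per feature."""
    groups: Dict[Hashable, List[F]] = defaultdict(list)
    for feature in features:
        groups[key(feature)].append(feature)
    return {k: handle_group(members) for k, members in groups.items()}


# Hypothetical features: ten objects spread over three groups.
features = [{"id": i, "group": i % 3} for i in range(10)]
result = process_grouped(features, key=lambda f: f["group"], handle_group=len)
print(result)
```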


amooo
  • Author
  • November 14, 2022

Hi, thank you for your response!

You're absolutely right: I noticed some overhead between jobs (from about 200 ms up to 2 or 3 s), which is why I started questioning whether this is the right approach.

I didn't know about Group By processing; I'll explore this option and run some tests!

I want to make the most of the two engines I have.

What do you think about launching the Group By workspace twice: once with the first half of my groups and once with the second half?
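A sketch of the two-job split. It assumes the input rows have a dense integer key from 0 to N−1 and that the workspace exposes two hypothetical published parameters, `MIN_ID` and `MAX_ID`, which it would use to filter its reader (for example in a WHERE clause); each engine then gets one contiguous range.

```python
from typing import List, Tuple


def id_ranges(total: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, total) into `parts` contiguous half-open ranges whose sizes
    differ by at most one."""
    base, extra = divmod(total, parts)
    ranges = []
    start = 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


# One job per engine; MIN_ID / MAX_ID are hypothetical published parameters.
for lo, hi in id_ranges(1_000_000, 2):
    print(f"MIN_ID={lo} MAX_ID={hi}")
```

Contiguous ranges keep each job's query simple; if the keys are sparse or unevenly distributed, splitting on the key modulo the number of engines would balance the load better.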


tomfriedl
Contributor
  • Contributor
  • November 14, 2022

There is a workflow problem:

1,000,000 workspace runs, each running at least 1 second,

that's more than 11.5 days.

Tell us more about your workspace.
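The estimate above is simply jobs × seconds per job ÷ 86,400 seconds per day. A small helper, assuming every job takes the same time and that jobs are spread evenly over the available engines running in parallel:

```python
SECONDS_PER_DAY = 86_400


def wall_clock_days(n_jobs: int, seconds_per_job: float, engines: int = 1) -> float:
    """Days needed to run n_jobs jobs of equal duration, assuming they are
    spread evenly over `engines` engines running in parallel."""
    return n_jobs * seconds_per_job / engines / SECONDS_PER_DAY


print(wall_clock_days(1_000_000, 1.0))              # one engine
print(wall_clock_days(1_000_000, 1.0, engines=2))   # two engines
```

With 1 s per job this gives about 11.57 days on one engine and about 5.79 days on two. Plugging in the 200 ms to 3 s of per-job overhead reported in the thread gives roughly 2.3 to 35 days of pure overhead on a single engine, before any actual processing.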


amooo
  • Author
  • November 14, 2022

Yes, that's more or less how long it takes!

The idea of the workspace is to read 1,000,000 rows from a table in a PostGIS database and, for each row, check whether its geometry is close to a rectangle.

With my team, we tried another approach that relies more on the database (using the SQLCreator transformer) with a GROUP BY instead of reading one row per execution, and we are down to 5 hours (instead of ~14 days).

Thank you so much for your answers!
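The thread does not say how "close to a rectangle" is measured. One possible measure, sketched here in plain Python, is rectangularity: the polygon's area divided by the area of its minimum-area enclosing rectangle, which is 1.0 for a rectangle and lower for anything else. The sketch relies on the fact that the minimum-area enclosing rectangle has one side along a convex-hull edge, so it tries every hull edge direction. The input format (a list of (x, y) vertices of a simple closed ring) is an assumption, and this is not the method the poster's team used.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(ring: List[Point]) -> float:
    """Shoelace formula; works whether or not the closing vertex is repeated."""
    total = 0.0
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_rectangle_area(points: List[Point]) -> float:
    """Try each hull edge as a rectangle side and keep the smallest box."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    best = math.inf
    for i, (x1, y1) in enumerate(hull):
        x2, y2 = hull[(i + 1) % len(hull)]
        length = math.hypot(x2 - x1, y2 - y1)
        ux, uy = (x2 - x1) / length, (y2 - y1) / length  # unit vector along the edge
        us = [x * ux + y * uy for x, y in hull]          # projections on the edge
        vs = [-x * uy + y * ux for x, y in hull]         # projections on its normal
        best = min(best, (max(us) - min(us)) * (max(vs) - min(vs)))
    return best


def rectangularity(ring: List[Point]) -> float:
    """Polygon area / minimum enclosing rectangle area (1.0 for a rectangle)."""
    rect = min_rectangle_area(ring)
    return polygon_area(ring) / rect if rect > 0 else 0.0


print(rectangularity([(0, 0), (2, 0), (2, 1), (0, 1)]))  # axis-aligned rectangle
print(rectangularity([(1, 0), (2, 1), (1, 2), (0, 1)]))  # rotated square
```

A Tester-style rule could then keep rows whose rectangularity exceeds some threshold such as 0.95; that threshold is hypothetical and would need tuning on real data.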


hkingsbury
Celebrity
  • Celebrity
  • November 14, 2022

Based on this, you shouldn't even need Group By. Bear in mind that in a workspace each feature is a row, so when you run a test you're only considering that single feature in isolation.

Without knowing your data or the wider requirements, you should be able to achieve that with a LineCloser, a CircularityCalculator and a Tester.
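A plain-Python analogy of that LineCloser → CircularityCalculator → Tester chain, run on one feature at a time. It assumes circularity is the isoperimetric quotient 4πA/P², a common definition; whether CircularityCalculator uses exactly this formula should be checked in its documentation. The target value and tolerance are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def close_ring(line: List[Point]) -> List[Point]:
    """LineCloser-style step: repeat the first vertex at the end if needed."""
    return line if line[0] == line[-1] else line + [line[0]]


def circularity(ring: List[Point]) -> float:
    """Isoperimetric quotient 4*pi*A / P**2 of a closed ring
    (1.0 for a circle, pi/4 for a square)."""
    twice_area = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1  # shoelace term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    if perimeter == 0:
        return 0.0
    area = abs(twice_area) / 2.0
    return 4.0 * math.pi * area / perimeter ** 2


def passes_test(line: List[Point], target: float, tolerance: float) -> bool:
    """Tester-style check: is the circularity within `tolerance` of `target`?"""
    return abs(circularity(close_ring(line)) - target) <= tolerance


square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # open line, closed by close_ring
print(passes_test(square, target=math.pi / 4, tolerance=0.02))
```

For a w × h rectangle the quotient is πwh/(w + h)², so a single target value only matches rectangles of one aspect ratio (π/4 for a square), and other shapes can share the same value.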

