
Hello everyone,

 

Let's say that I need to run the same workspace 1 000 000 times (once for each object I have as input). The workspace would contain some Tester, filter, and geometry transformers.

 

I plan to use the FME REST API to launch my jobs. I will submit them in batches of 10 000 jobs (so 100 batches in total).
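For reference, a minimal sketch of what that batched submission could look like, assuming the FME Flow/Server REST API v3 asynchronous submit endpoint. The host, token, repository, workspace and published parameter names below are all invented for illustration; check the REST API documentation for your FME version before relying on the exact endpoint or payload shape.

```python
# Hypothetical sketch: submit 1 000 000 jobs in batches of 10 000 through the
# FME Flow/Server REST API. Host, token, repository, workspace and the
# OBJECT_ID published parameter are all made-up placeholders.
import requests

FME_HOST = "https://fme.example.com"   # hypothetical FME Flow host
TOKEN = "my-api-token"                 # hypothetical API token
SUBMIT_URL = f"{FME_HOST}/fmerest/v3/transformations/submit/MyRepo/check_geometry.fmw"
HEADERS = {
    "Authorization": f"fmetoken token={TOKEN}",
    "Content-Type": "application/json",
}

object_ids = list(range(1_000_000))    # one job per input object
BATCH_SIZE = 10_000                    # 100 batches in total

for start in range(0, len(object_ids), BATCH_SIZE):
    for obj_id in object_ids[start:start + BATCH_SIZE]:
        # "submit" queues the job asynchronously and returns a job id
        payload = {
            "publishedParameters": [
                {"name": "OBJECT_ID", "value": str(obj_id)}  # hypothetical parameter
            ]
        }
        resp = requests.post(SUBMIT_URL, json=payload, headers=HEADERS, timeout=30)
        resp.raise_for_status()
```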

 

Is this a good way to do it?

I would be happy to read your advice!

I'm fairly certain that it's doable, but I'm less certain that it's the optimal solution: there's a bit of overhead each time a workspace is started, and although it's short, if you multiply it by a million it's going to be noticeable. Is it not possible to process all the objects in the same workspace using e.g. Group By processing?


Hi, thank you for your response!

 

You are absolutely right, I noticed a bit of overhead (varying from ~200 ms to 2 to 3 s between jobs), which is why I questioned whether this was the right way to do it.

 

I didn't know about Group By processing, I will explore this option and run some tests!

I wanted to get the most out of the two engines I have.

What is your opinion on launching the workspace with Group By processing twice: the first run with the first half of my groups and the second run with the second half?

 


There is a workflow problem:

1 000 000 workspace runs, each taking at least 1 second.

That's roughly 11.6 days (1 000 000 s ÷ 86 400 s/day).

Tell us more about your workspace.



Yes, that is more or less how long it takes!

The idea of the workspace is to take 1 000 000 rows from a table in a PostGIS database and, for each row, check whether the geometry of the line is close to a rectangle.

 

With my team we tried another approach that pushes more of the work into the database (using the SQLCreator transformer with a GROUP BY) instead of reading one row per execution, and we are down to 5 hours (instead of ~14 days).
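For illustration, a hypothetical sketch of that database-side idea: a single PostGIS query that tests every row at once instead of one workspace run per row. The table name, columns, envelope coordinates and tolerance are all invented; the SQL itself is the kind of statement you would paste into an SQLCreator.

```python
# Hypothetical sketch: test all rows against the rectangle in one PostGIS
# query. Table name, column names, envelope coordinates, SRID and the 5 m
# tolerance are invented placeholders.
import psycopg2

QUERY = """
    SELECT id
    FROM my_lines
    WHERE ST_DWithin(
        geom,
        ST_MakeEnvelope(648000, 6860000, 649000, 6861000, 2154),  -- the rectangle
        5.0  -- "close to" tolerance, in layer units
    );
"""

with psycopg2.connect("dbname=gis user=fme password=secret") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        nearby_ids = [row[0] for row in cur.fetchall()]

print(f"{len(nearby_ids)} lines are within tolerance of the rectangle")
```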

 

Thank you so much for your answers!



Based on this, you shouldn't even need Group By. Bear in mind that in a workspace each row becomes a feature, so when you run a test you're only considering that single feature in isolation.

 

Without knowing your data or the wider requirements, you should be able to achieve that with a LineCloser, CircularityCalculator and Tester

