
Hi there,

 

UPDATE 14th Feb 2022: I have now added my workbenches, though I'm not sure how much more insight they will provide. I am wondering whether it is the fact that an 'umbrella' process is running for a long time and therefore using up a lot of processing power, or whether it is simply that downloading a lot of zip files and then loading them into the database in a linear fashion makes it take such a long time.

I have also now upgraded my FME Cloud instance to 'Professional' - m5.xlarge, 4 CPUs, 16 GB RAM and 100 GB temporary disk space. This did show an initial improvement (see the additional graph data for 14th Feb), but it is still averaging 00:02:15 per file, so at that rate it will take around 11 days just to download and load the 7,000+ files (7,000 x 2 min 15 s ≈ 262 hours) - this doesn't seem right to me?!

Any pointers very gratefully received. And thanks again.

 

 

I have FME Server running on a Standard FME Cloud instance (8 GB RAM, 2 CPUs, 25 GB temporary disk space).

 

I am trying to understand the best setup of my workbenches / FME Cloud to enable the most efficient processing.

 

To briefly explain my processes: there are several workspaces involved. The first reads an Atom feed (from a list of Atom feed URLs provided in a CSV) and identifies *.zip download URLs, passing them one at a time to the next workspace, which downloads the *.zip file and then passes it on to a final workspace that loads the data into a database (RDS, in the same AWS region).
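In rough pseudo-Python terms (purely illustrative - the function and file names are made up and this is not the actual workbench logic), the chain currently behaves like this strictly serial loop:

```python
# Illustrative sketch only (not the real workbenches): the current serial chain.
# Each zip file must be identified, downloaded and loaded before the next one starts.
import csv
import shutil
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def zip_urls_from_atom(feed_url):
    """Yield the *.zip download URLs referenced by one Atom feed."""
    with urllib.request.urlopen(feed_url) as resp:
        tree = ET.parse(resp)
    for link in tree.iter(ATOM_NS + "link"):
        href = link.get("href", "")
        if href.endswith(".zip"):
            yield href

def download_zip(zip_url, out_path):
    """Stand-in for workspace 2: download one zip file to temporary disk."""
    with urllib.request.urlopen(zip_url) as resp, open(out_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return out_path

def load_into_database(zip_path):
    """Stand-in for workspace 3: write the zip contents to the RDS database."""
    print("would load", zip_path, "into the database here")

def process_feeds(feed_list_csv):
    with open(feed_list_csv, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            for j, zip_url in enumerate(zip_urls_from_atom(row[0])):
                local = download_zip(zip_url, f"/tmp/feed{i}_file{j}.zip")
                load_into_database(local)  # next download only starts after this returns
```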

 

I have noticed that whilst these processes start off fairly quickly - around 30 seconds to 1 minute to process a *.zip file - over time the process starts to take longer per file.

 

Can anyone provide any useful insight as to what might be happening? Or is this just expected within FME Cloud / Server?

 

So far I have tried moving the initial workspace that identifies the *.zip URLs off FME Server and running it on FME Desktop in an AWS Workspace instead. I have also tried limiting the initial workspace to 500 *.zip files at a time - I left that running overnight and it hasn't reached 500 yet (I expected it to take 8 hours, and it's been running 10.5 hours and has only processed 286 files).

 

As you can see, moving the initial workspace to FME Desktop didn't seem to help, though I notice from the monitoring that FME Server is now using 3 engines instead of 4.

 

This is quite a time-critical task, as I need to process a number of these Atom feeds and their large numbers of URLs in a short space of time (by the end of the month). I hadn't realised this might be so unrealistic, so any advice is happily received.

 

I am attaching screenshots of the FME Cloud monitoring (last 4 hours) and a graph showing the increase in processing time.

 

I will attach the workbenches once I've removed credentials.

 

FME Server version:

FME Server 2021.2.2

Build 21806 - linux

 

FME Desktop version:

FME(R) 2021.2.3.0 (20220131 - Build 21812 - WIN64) 

 

Many thanks,

Fiona

So this is a process which should be able to run in parallel and should only be limited by the server's resources. With FME Cloud you are not limited by the number of engines from the license, unless there is something specific about this job where the order is important. The way you have it configured now only uses one engine as far as I can tell - perhaps I'm missing something? The monitoring shows which engines are available to be used; for 4 cores you should use 4 engines.

 

Which is the 'slow' workspace? Knowing this can help you identify any blocking issue or slowdown. The database could be the reason it takes longer and longer over time: as it grows there is potentially more overhead, especially if indexes are involved. You could take a look at the database and see if there is an option to drop the indexes first, then re-enable them at the end once the whole process has completed.
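As a rough sketch of that idea - assuming the RDS database is PostgreSQL and using psycopg2, with placeholder table, index and connection names, so adjust for your actual setup - you would drop and rebuild the index around the whole batch rather than letting it update on every load:

```python
# Sketch only: drop a heavy index before the bulk load and rebuild it once afterwards.
# Assumes PostgreSQL on RDS and psycopg2; all names and credentials are placeholders.
import psycopg2

conn = psycopg2.connect(host="my-rds-endpoint", dbname="mydb",
                        user="loader", password="...")
conn.autocommit = True
cur = conn.cursor()

# Before submitting the batch of download/load jobs:
cur.execute("DROP INDEX IF EXISTS features_geom_idx;")

# ... run the whole FME batch here ...

# After the batch has finished, rebuild the index in one pass:
cur.execute("CREATE INDEX features_geom_idx ON features USING GIST (geom);")

cur.close()
conn.close()
```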

 

I would suggest considering switching your first FMEServerJobSubmitter in the chain to submit jobs in parallel, with 'wait for jobs to complete' set to No. This should let the parent process finish as soon as the FME Server queue has been populated with the 500 jobs, freeing up the initial engine.

This should really speed things up - just make sure there aren't too many engines available or the processes will be fighting for resources.
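If it helps to see the same 'queue everything up front and don't wait' pattern outside of Workbench, here is a rough sketch against the FME Server REST API (the V3 transformations/submit endpoint, as far as I recall - please verify against your own instance; the repository, workspace and parameter names are placeholders):

```python
# Rough sketch: queue every zip URL as its own asynchronous FME Server job and
# let the engines work through the queue in parallel. All names/URLs are placeholders.
import json
import urllib.request

FME_SERVER = "https://myinstance.fmecloud.com"
TOKEN = "my-fme-server-token"
SUBMIT_URL = FME_SERVER + "/fmerest/v3/transformations/submit/MyRepo/download_and_load.fmw"

def submit_job(zip_url):
    body = json.dumps({
        "publishedParameters": [
            {"name": "SourceZipUrl", "value": zip_url}  # placeholder parameter name
        ]
    }).encode("utf-8")
    req = urllib.request.Request(
        SUBMIT_URL,
        data=body,
        headers={
            "Authorization": "fmetoken token=" + TOKEN,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]  # job is queued; we do not wait for it to finish

# Submitting returns almost immediately, so the whole list can be queued up front.
for url in ["https://example.com/data1.zip", "https://example.com/data2.zip"]:
    print("queued job", submit_job(url))
```

The key point is the same as the transformer settings above: the submitting process doesn't block on each job, so all available engines can pull work from the queue at once.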

 

 

