Question

Very un-random engine selection


I have an FME Cloud Starter instance (2019.1.3.1) with 2 engines. At the moment it mainly runs one workspace on a 1-minute schedule, with an average processing time of 20-30 seconds, plus some other jobs (10 at most), either scheduled or ad hoc. No job queues are in use, so I would expect each of the two engines to be used "about 50%" of the time.

However, recently (since the addition of that workspace on the 1-minute schedule, which replaced an ad-hoc workspace with similar load figures) it has started to show a preference for running many jobs in a row on the same engine.

This load graph from yesterday shows it clearly, and I'm especially surprised by the sudden switch to running almost all jobs in a given hour on the same engine.

Of the 500 jobs on the completed page right now, 347 ran on Engine2. That includes a whole stretch of this morning where only about 1 in every 30 jobs ran on Engine1.
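To put a number on how un-random that split is, here is a rough check. It assumes each job is assigned to one of the two engines independently with a 50% chance, which is only my expectation and not documented FME Server behaviour. The function name `tail_probability` is made up for this sketch.

```python
from math import comb

def tail_probability(total_jobs: int, jobs_on_one_engine: int) -> float:
    """P(X >= jobs_on_one_engine) for X ~ Binomial(total_jobs, 0.5).

    Assumption: every job independently lands on a given engine
    with probability 0.5 (a fair coin per job).
    """
    favourable = sum(
        comb(total_jobs, k)
        for k in range(jobs_on_one_engine, total_jobs + 1)
    )
    return favourable / 2 ** total_jobs

# 347 of the last 500 completed jobs ran on Engine2
p = tail_probability(500, 347)
print(f"{p:.3g}")
```

Under that assumption the probability comes out vanishingly small, so a 347/153 split is very unlikely to be chance alone.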

Is there a logical explanation for this?

Hi @redgeographics,

I don't know exactly how the engines decide to pick up a job, but I would not have expected a 1:1 distribution when using 2 engines. In fact, I would expect all jobs to run on a single engine if no job ever ends up being queued. Maybe when an engine automatically restarts after a fixed number of jobs, the second, already running engine picks up the next job. I think this is in line with the graphic you shared, considering that there are a few queued jobs, which would lead to alternation.
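As a rough illustration of that reasoning, here is a small simulation. It is a sketch only: the two dispatch policies (`first_idle` and `random_idle`) are hypothetical and are not taken from FME Server's actual scheduler. It models one job arriving every 60 seconds and taking 20-30 seconds, as in the question, so no job is ever queued.

```python
import random

def simulate(policy, n_jobs=500, interval=60.0, seed=0):
    """Count jobs per engine for two engines under a dispatch policy."""
    rng = random.Random(seed)
    busy_until = [0.0, 0.0]  # time at which each engine becomes idle
    counts = [0, 0]
    for i in range(n_jobs):
        arrival = i * interval
        idle = [e for e in (0, 1) if busy_until[e] <= arrival]
        if idle:
            engine = policy(idle, rng)
            start = arrival
        else:
            # Both engines busy: the job waits for whichever frees up first.
            engine = min((0, 1), key=lambda e: busy_until[e])
            start = busy_until[engine]
        # Assumed job duration: uniform between 20 and 30 seconds.
        busy_until[engine] = start + rng.uniform(20.0, 30.0)
        counts[engine] += 1
    return counts

def first_idle(idle, rng):
    """Hypothetical policy: always take the first idle engine in a fixed order."""
    return idle[0]

def random_idle(idle, rng):
    """Hypothetical policy: pick uniformly among the idle engines."""
    return rng.choice(idle)

print(simulate(first_idle))   # [500, 0]
print(simulate(random_idle))  # roughly even split
```

With a fixed-order policy every job goes to the same engine because the other one is never needed, while a random pick among idle engines gives something close to 50/50. The behaviour described in the question sits somewhere in between, which would fit an occasional switch caused by a restart or a queued job.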

Is this causing any undesirable symptoms or issues with your workflows?

If there are still jobs in the queue while 2 jobs are running, you could just add another engine and monitor the server load, but with such short-running jobs it usually shouldn't be a problem.

Other than that I recommend using Job Queues to control job distribution on FME Server.


It's not causing issues, I just thought it was really odd, since previously it always seemed to pick an available engine at random. Plus, the change you see in that graph is something I have not seen happen before (there are a few aborted jobs, the yellow lines, right before that change, so maybe that caused it).

Anyway, this instance is scheduled to be updated to 2020.1 once that becomes available, so I'm curious to see if the behaviour continues then.

 

