
Hi, we are seeing some odd behaviour in our FME Server 2015.1 instance.

We are running a 'runner' workbench that uses 3 chained FMEServerJobSubmitter transformers to call 3 'workers', all configured in 'sync' mode (i.e. wait for the job to finish).

Occasionally the first worker (which can take just over an hour to run) is submitted twice. The timing of the second submission appears to be random, somewhere between 10 and 50 minutes.

The runner workbench log shows no record of triggering this second job, and the runner appears to fail with the error:

(ServerJobSubmissionFactory): A connection error occurred with server "http://fme-server.prod/fmeserver"

Both worker jobs appear to complete successfully some time after the runner has failed.

Does anyone know what conditions might be causing this automatic resubmission of a job and, ideally, how to prevent it?

I suspect that some component of FME Server is losing connectivity with the initial job, assuming it has failed and trying again, then losing connectivity with the second job as well and finally quitting.

Luckily this is just an extract type process, so not causing database issues.

regards,

Nick

p.s. (we are working on an upgrade path, but this 'fix' option is not currently available!)

 

Hi @nrich,

Could you tell me a bit more about your FME Server setup? What type of installation do you have, e.g. Express, any distributed components, configured to run through a load balancer, etc.?

Your symptoms sound similar to an issue we had when FME Server was configured to use a load balancer, which usually has an idle timeout by default. However, under this scenario I would expect the second submission time to be consistent. If you are using a load balancer and think this could be the problem you are encountering, this was fixed for the FMEServerJobSubmitter in 2019.1.

In 2015.1, one thing you could try is to set Wait for Jobs to Complete = No in the FMEServerJobSubmitter and then use an FMEServerJobWaiter. This transformer has a Polling Interval parameter that can be used to keep the connection alive. This would require an additional engine to run the child jobs, though, so you'd have to consider whether that is an option for you. Setting this up could at least confirm whether this is the problem.
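
For anyone who wants to see the general pattern outside of Workbench, here is a minimal Python sketch of "submit without waiting, then poll at a fixed interval". It is not the FMEServerJobSubmitter or FMEServerJobWaiter implementation. The `submit_job` and `get_status` callables, the status names and the default interval are all hypothetical placeholders for whatever your environment provides. The point is only that the job is submitted exactly once and every status check is a short request, so no single connection has to stay idle for the length of the job.

```python
import time

# Hypothetical terminal states; the real FME Server status names may differ.
TERMINAL_STATES = {"SUCCESS", "FAILED", "ABORTED"}


def submit_and_poll(submit_job, get_status, polling_interval=30.0,
                    timeout=7200.0, sleep=time.sleep):
    """Submit a job once, then poll its status until it finishes.

    submit_job: hypothetical callable that submits the job without waiting
                and returns a job id.
    get_status: hypothetical callable that takes a job id and returns a
                status string.
    """
    job_id = submit_job()  # submitted exactly once, never retried here
    waited = 0.0
    while True:
        status = get_status(job_id)  # short request on each poll
        if status in TERMINAL_STATES:
            return job_id, status
        if waited >= timeout:
            raise TimeoutError(
                "job %s still %s after %.0f s" % (job_id, status, waited)
            )
        sleep(polling_interval)
        waited += polling_interval
```

If the parent ever loses contact while polling, the sensible recovery under this pattern is to poll again with the same job id rather than submit a new job, which is what avoids a duplicate run.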


Hi @hollyatsafe, thanks for this response and the response on the other question: auto resubmit job (I'll stick to this thread now).

Yes, we were thinking about using job waiters as our Plan B. We are a little confused, though, because we have other jobs configured like this on the same environment and we haven't noticed the same issue with them.

We are running 2015.0 on two physical servers in 2-tier mode, load balanced with an active/passive configuration. The servers are using shared database and fileserver repositories.

The web services are deployed into our own version of Tomcat, with Apache web server sitting over the top of that (don't ask me why, our IT provider insisted). These are on the same servers as the core and engines, though.

Each server has 6 engines.

I have found a 4.5GB 'rest.log' file, which could potentially be causing communication issues with the FME Server REST service, so I'm getting that recycled at the weekend (as it requires downtime).

Nick
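
For readers checking their own install for oversized logs like the 'rest.log' above, a small Python sketch along these lines can list them. The directory you pass in, the `*.log` pattern and the 1 GB threshold are assumptions for illustration, not FME Server defaults. It only reports files; it does not rotate or delete anything.

```python
from pathlib import Path


def find_large_logs(log_dir, threshold_bytes=1024 ** 3):
    """Return (path, size_in_bytes) for *.log files at or above the threshold.

    log_dir is whatever logs folder you want to inspect (assumed, not an
    FME Server default). Results are sorted largest first.
    """
    sized = [
        (path, path.stat().st_size)
        for path in Path(log_dir).rglob("*.log")
        if path.is_file()
    ]
    large = [(path, size) for path, size in sized if size >= threshold_bytes]
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Calling `find_large_logs("/path/to/logs")` with your own logs folder returns an empty list when nothing is over the threshold.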


Hi @nrich,

Hmm, given the time between re-submissions is inconsistent and you have other FMEServerJobSubmitter jobs that do not encounter this problem, it does seem unlikely that the load balancer timeout is the issue you're encountering.

Given 2015 is a mature release and will be retired next year, if this is a bug then the only options will be to upgrade or use the JobWaiter. However, if you're able to provide a copy of the problem workspaces and zip up the logs folder, I can take a look and see if there is anything else that might indicate what's going on here. If you'd rather not share this over the forum, you can create a case and include 'Attn Holly' in the subject.

