Question

Why does my FME Server Engine on a separate host become unresponsive?



My current FME Server setup has the Core and two Engines on one machine, plus an Engine-only host running remotely on another machine.

 

Recently, I noticed jobs on the remote machine queuing up and never processing. Restarting the FME Server Engines Windows service fixed the problem, but only for a few hours.

 

I don't see any errors in the logs, but I've just switched to debug-level logging in the hope of capturing more. What I do notice is that in the FME Server web console, under Licensing & Engines, the remote engine doesn't show up. Once I restart the service, I see it under both the Engines tab and the Deployment Status tab, until it disappears again.

 

It seems unlikely that this is service account related (as other similar posts and solutions suggest), since the engine runs for a while after I restart the service. I did recently make a change regarding the domain controller/Active Directory, but I rolled that back and it didn't solve the issue. I am not aware of any other environmental changes on the remote host, except potentially OS upgrades/patches that I am not responsible for.

 

Pending further insights from the logs, what other prerequisites or dependencies should I be checking?

 

Running FME Server 2020.2. Planning to upgrade to 2022 when it is released.

Thanks!


4 replies


A short-term fix, although only a partial one, was a scheduled job that restarts the engine service a few minutes before my scheduled jobs. That worked for a while. Then it stopped working.

 

All I see in the 'fmeserver' log file is this about 2 hours after I restarted it:

Mon-09-May-2022 09:36:21.500 AM   ERROR    Engine-<servername>-<servername>_Engine1   401924 : Lost connection to FME Engine <servername>_Engine1 running on host <servername>
Mon-09-May-2022 09:36:21.500 AM   INFORM   Engine-<servername>-<servername>_Engine1   401936 : Disconnected FME Engine: <servername>_Engine1 on host <servername>

FME Server still appends jobs to the queue for that engine. But nothing processes any more. I assume there is some kind of polling from <Core> to <Host>? Where can I find some info on that or run some diagnostics? Thanks.
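As one way to run some diagnostics while waiting for an answer, here is a minimal sketch that scans the 'fmeserver' log for the "Lost connection" entries and returns their timestamps, so they can be compared against scheduled jobs, patch windows, or network events. The line layout (timestamp, level, source, message number, message) is assumed from the two lines quoted above only; `find_disconnects` is a hypothetical helper name, not part of FME Server.

```python
import re
from datetime import datetime

# Layout assumed from the two sample lines in this thread:
# <date> <time> <AM|PM>  <LEVEL>  <source>  <message number> : <message>
LINE_RE = re.compile(
    r"^(\S+)\s+(\S+)\s+([AP]M)\s+(\w+)\s+(\S+)\s+(\d+)\s+:\s+(.*)$"
)

def find_disconnects(lines):
    """Return (timestamp, message_number, message) for each 'Lost connection' entry."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        date, time, ampm, _level, _source, number, message = match.groups()
        if "Lost connection to FME Engine" not in message:
            continue
        # Example timestamp: Mon-09-May-2022 09:36:21.500 AM
        stamp = datetime.strptime(
            f"{date} {time} {ampm}", "%a-%d-%b-%Y %I:%M:%S.%f %p"
        )
        events.append((stamp, number, message))
    return events

# Usage (the log path is an assumption, adjust to your install):
# with open("fmeserver.log", encoding="utf-8", errors="replace") as fh:
#     for stamp, number, message in find_disconnects(fh):
#         print(stamp.isoformat(), number, message)
```

If the disconnects cluster at a regular interval or a fixed time of day, that would point toward something scheduled on the network or host rather than toward a specific job.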


Hi @agelfert​ ,

Sorry that your engines are misbehaving! Firstly, I recommend reviewing the steps in our Engines troubleshooting guide, under the heading "If your Engines are successfully connected but are now missing from the Web UI..." To summarize, when a distributed engine loses connection, the issue will typically come down to one of the following items:

  • Service account permissions
  • Ports (in particular, 7070)
  • Communication with the database
  • Mismatched timezones between components
  • A specific job causing the engine to stop

Potential solutions for each of these issues can be found in the linked article. Please review and let us know if the engine problem persists. It may be best to open a support case with us so that we can have a closer look at your system environment and configuration.
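For the ports item in the list above, a quick reachability check can be run from the engine host toward the Core machine. This is only a rough sketch: it confirms that a TCP connection can be opened at this moment, not that the connection stays up for hours. The host name in the usage comment is a placeholder, and `port_is_open` is a hypothetical helper, not an FME tool.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

# Usage, run on the engine host (host name is a placeholder):
# print(port_is_open("fme-core.example.com", 7070))
```

Running this on a schedule and logging the result alongside the time could help show whether the engine drops coincide with the port becoming unreachable.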

Thanks, @sanaeatsafe - I had actually found some similar things to try in another article, but your link is helpful. In the meantime, things seem to have calmed down on my end. I'm still restarting the engine once a day, but I will turn that off to see if it's still necessary. This may have been a perfect storm of various things going on with networks and servers. Don't you love it when you can't explain what happened?! Thanks either way. I'll study those how-tos for future reference.

Fantastic, I hope things continue running smoothly from here on :) If not, let us know!
