Early warning of any failed components in a production system is vital. In our case the issue was a failing engine that wouldn't restart. It turned out that a Python script was causing it, which is a known issue with a workaround:
http://fmepedia.safe.com/articles/Error_Unexpected_Behavior/Could-not-read-FME-Engine-response-Connection-may-have-been-lost
Cheers, Dave
Another thing I was considering was using the FME Server REST API to check the QUEUE.XML as a scheduled task on another server. If the jobs in the queue are more than X minutes old and there is no task in the RUNNING.XML jobs, then I will send a warning to the operator that the engine MIGHT be down.
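The rule described above could be sketched roughly like this. It is only the decision logic, assuming you have already pulled the queue timestamps and the running-job count from the server by whatever means; the function name `engine_might_be_down` and the 10-minute default are made up for illustration.

```python
from datetime import datetime, timedelta


def engine_might_be_down(queued_since, running_count, now,
                         max_wait=timedelta(minutes=10)):
    """Return True when the oldest queued job has waited longer than
    max_wait while no job is running.

    queued_since  -- list of datetimes at which each queued job entered the queue
    running_count -- number of jobs currently running
    now           -- current time, passed in so the rule is easy to test
    max_wait      -- the "X minutes" threshold (hypothetical default)
    """
    if running_count > 0 or not queued_since:
        # Something is running, or nothing is waiting: no reason to warn.
        return False
    oldest = min(queued_since)
    return now - oldest > max_wait
```

A scheduled task on another server would call this every few minutes and notify the operator when it returns True. Keep in mind it only says the engine MIGHT be down; a single long-running job ahead of the queue is not covered by this check because it counts as running.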
I just happened across this by chance, but I think I might be encountering this issue on FME Server 2017. Anyhow, I'm thinking of submitting an idea to expose the number of active engines and the number of requested engines through an API call. I'm surprised that Safe hasn't added a way to be notified when you lose an active engine, when active engines = 0, or when active engines != requested engines. This would be a HUGE feature enhancement that would allow everyone to sleep better at night... Then again, you could set up an alert to trigger a physical alarm or a less intense alarm clock so you would only wake up when 0 = 6 :D
You can check the number of licensed engines using:
/fmerest/v3/licensing/license/status
Sample result:
{
"expiryDate": "20170930",
"maximumEngines": 1,
"serialNumber": "xxxx",
"isLicenseExpired": false,
"isLicensed": true
}
Then you can check for the number of running engines using:
/fmerest/v3/transformations/engines
Sample result:
{
"offset": -1,
"limit": -1,
"totalCount": 1,
"items": [
{
"hostName": "myservername",
"resultFailureCount": 0,
"resultSuccessCount": 0,
"maxTransactionResultSuccess": 100,
"instanceName": "myservername_Engine2",
"transactionPort": 58008,
"currentJobID": -1,
"maxTransactionResultFailure": 10,
"buildNumber": 17539,
"platform": "WIN32"
}
]
}
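Putting the two calls together, a minimal sketch of the comparison might look like the following. Assumptions on my part: the base URL and token are placeholders, the token is sent as an `Authorization: fmetoken token=...` header, and `totalCount` from the engines call is treated as the number of running engines, as in the sample above.

```python
import json
import urllib.request


def fetch_json(base_url, path, token):
    """GET a REST endpoint and return the parsed JSON body.
    The auth header format is an assumption; adjust it to your setup."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={
            "Authorization": f"fmetoken token={token}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def missing_engines(license_status, engines):
    """Licensed engines minus engines reported by the server.
    0 means every licensed engine is listed; > 0 means some are missing."""
    return license_status["maximumEngines"] - engines["totalCount"]


def check_engines(base_url, token):
    """Fetch both responses and return the shortfall (needs network access)."""
    license_status = fetch_json(
        base_url, "/fmerest/v3/licensing/license/status", token)
    engines = fetch_json(
        base_url, "/fmerest/v3/transformations/engines", token)
    return missing_engines(license_status, engines)
```

For example, `check_engines("https://myservername", "<token>")` would return the number of licensed engines that are not listed. This only compares counts, so an engine that is listed but not picking up jobs would not be caught by it.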
In my opinion, it's much better to do the monitoring through dedicated monitoring software rather than letting FME Server monitor itself. IT services usually have dedicated software that can relatively easily be set up to monitor FME Server using the REST API.
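As a rough illustration of what such a check could look like, here is a sketch that maps engine counts to the exit codes used by Nagios-style monitoring plugins (0 = OK, 1 = WARNING, 2 = CRITICAL). Which monitoring tool your IT services use is an assumption on my part, and the function names are hypothetical; the counts would come from the REST API.

```python
import sys

# Exit codes following the Nagios-style plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2


def engine_status(active, expected):
    """Map engine counts to a (exit_code, message) pair."""
    if active == 0:
        return CRITICAL, f"CRITICAL - no FME engines running (expected {expected})"
    if active < expected:
        return WARNING, f"WARNING - {active} of {expected} FME engines running"
    return OK, f"OK - {active} of {expected} FME engines running"


def main(argv):
    """Hypothetical entry point: main(["<active>", "<expected>"])."""
    code, message = engine_status(int(argv[0]), int(argv[1]))
    print(message)
    return code
```

A wrapper script would finish with `sys.exit(main(sys.argv[1:]))` so the monitoring software can read the exit code and raise the alert itself.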
@rylanatsafe Just lost an engine while testing 2019.0 due to it "hanging". Apparently the dev team knows about it and a fix might be in the works for 2019.1. Until then (and I guess beyond then), what workaround should we be using to see if an engine is hanging so we can manually restart it? I looked at the Automations triggers, but it doesn't seem like this is in there. Might be something nice to add?
Thanks!
@runneals - We implemented some fixes for FME Server 2019.0 related to FME Engine issues and improper shutdowns at the end of a translation – i.e. the command to shutdown/end/restart was issued, but effectively ignored.
I'm concerned that you have experienced a similar issue in 2019.0 despite the fixes in place (assuming that you are in Build 19238+). It would be great if we can throw together a reproduction package for the team to investigate. We do not have any specific scenarios targeted for FME Engines in the 2019.1 Release that, I think, would address this issue (given the information in hand, anyway).
So far as I understand the scenario presented, we do not have a workaround available short of manual intervention.
Does the FME Engine show up on the Engines & Licensing page? Does the previous translation, run on that FME Engine, complete successfully?
Please feel free to contact support so that we can work towards identifying the cause. (Referencing this thread would be helpful!)
Kind regards,
Rylan Maschak