Question

Notification only after multiple job failures?



We have configured a job in FME Server (2019.1) that is scheduled to run every 15 minutes, and we understand how to publish to a topic on failure and then send an email.

However, ideally we would like to generate a single email only after a number of failures and not an email for every single failure. Does anyone know of a way to do this in FME Server?

Perhaps it might be possible with automations to record details of previous failures and then only send an email after N failures?

 

Any thoughts would be much appreciated.


3 replies


You could set this up as a parent/child workflow. The parent runs the child process (FMEServerJobSubmitter) and then returns the status of the job. You then write that result, along with a timestamp, to a database or log. You could use the same database that FME Server uses (in a different table), use the REST API to look at previously run jobs, or read the FME Server database directly to look at the job history.

Each time the parent workbench runs, it looks at the past jobs and sees whether there are previous failures that, combined, should trigger an email. If the current child process fails, the parent knows whether the failure condition has been met and can send an email.
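As a rough illustration of the "N failures" check the parent could run, here is a minimal Python sketch. It assumes the parent has already collected the job history as (timestamp, succeeded) pairs; the function name, the threshold of 3 and the one-hour window are all hypothetical choices, not anything built into FME Server.

```python
from datetime import datetime, timedelta

def should_send_email(history, threshold=3, window=timedelta(hours=1), now=None):
    """Decide whether to email, given (timestamp, succeeded) pairs in any order.

    Hypothetical helper: threshold and window are illustrative values.
    """
    now = now or datetime.now()
    # Keep only the runs inside the look-back window, oldest first.
    recent = sorted((t, ok) for t, ok in history if now - t <= window)
    # Count consecutive failures, walking back from the most recent run.
    streak = 0
    for _, ok in reversed(recent):
        if ok:
            break
        streak += 1
    # Fire only when the streak first reaches the threshold,
    # so one run of failures produces one email.
    return streak == threshold
```

Comparing with `==` rather than `>=` is what keeps it to a single email: the fourth and later failures in the same streak stay quiet until a success resets the count.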


Thanks @hkingsbury, that sounds like a reasonable solution. We are looking at using an Automation for this along the lines of the process you've mentioned. The plan is to:

 

  • Create a custom topic.
  • Set the scheduled job(s) to notify this topic on failure.
  • Create an automation that triggers a workspace on notification to this topic.
  • The workspace would query the FME Server REST API to see if the given job had failed previously (in the last 24 hours).
  • If there were no previous failures, then send an email.
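The last two steps of the plan above could be sketched roughly as follows in Python. This only covers the decision logic, not the REST call itself: it assumes the job history has already been fetched and parsed into a list of dicts. The keys `id`, `workspace`, `status` and `timeFinished`, and the status strings in `FAILURE_STATUSES`, are assumptions for illustration and would need to be matched to what the server actually returns. Timestamps are assumed to be ISO 8601 strings in the same timezone as `now`.

```python
from datetime import datetime, timedelta

# Assumed status values treated as failures; check what your server reports.
FAILURE_STATUSES = {"FME_FAILURE", "JOB_FAILURE"}

def previous_failures(jobs, workspace, current_job_id, now,
                      window=timedelta(hours=24)):
    """Return earlier failed jobs for `workspace` that finished inside the window."""
    matches = []
    for job in jobs:
        if job["id"] == current_job_id:
            continue  # the failure that triggered this run does not count
        if job["workspace"] != workspace:
            continue
        if job["status"] not in FAILURE_STATUSES:
            continue
        finished = datetime.fromisoformat(job["timeFinished"])
        if timedelta(0) <= now - finished <= window:
            matches.append(job)
    return matches

def should_email(jobs, workspace, current_job_id, now):
    """Email only when this is the first failure within the window."""
    return not previous_failures(jobs, workspace, current_job_id, now)
```

Note that this follows the plan as written: it emails on the first failure in 24 hours and suppresses the rest, rather than waiting for N failures.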

 

From initial experimentation, it looks like we can pick up most of the information about the failed job from the event JSON that is available via the topic trigger.

 

Regards

John


Hi @john_gis4bus,

I've created a workbench that reads job status via a REST API request and sends an email for each job with status == FAIL.

I think you can take the same approach, but apply a filter so that only your specific workbench is checked. You can log each job that returns an error to a SQLite database table, and when the number of rows in this table reaches your threshold, you can send an email and delete the entries from the table.
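A minimal Python sketch of that log-count-reset idea, using the standard `sqlite3` module. The table name `failures`, the function name and the threshold of 3 are hypothetical, and sending the email is left to the caller.

```python
import sqlite3

def record_failure(conn, job_id, threshold=3):
    """Log one failed job; return True when the caller should send the email."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS failures ("
        "job_id INTEGER PRIMARY KEY, "
        "logged_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    # INSERT OR IGNORE keeps a job from being counted twice if it is seen again.
    conn.execute("INSERT OR IGNORE INTO failures (job_id) VALUES (?)", (job_id,))
    (count,) = conn.execute("SELECT COUNT(*) FROM failures").fetchone()
    if count >= threshold:
        # Threshold reached: clear the log so counting starts again.
        conn.execute("DELETE FROM failures")
        conn.commit()
        return True
    conn.commit()
    return False
```

Keying the table on the job id matters if the workbench polls the job history on a schedule, since the same failed job will show up in more than one poll.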

You can schedule the workbench that does this to run every 15 minutes... or less.
