Solved

auto resubmit job

  • 25 July 2019
  • 8 replies
  • 14 views

Hello,

 

Is it possible to configure FME Server 2019 to automatically resubmit a job after it fails?

Best answer by redgeographics 25 July 2019, 14:18

If FME Server crashes while processing a job, it should resubmit that job once it gets back on its feet. For other jobs that fail, there is no such mechanism in place. In most, if not all, cases a job that failed the first time will fail the second time as well, so automatic retries could potentially lock up an engine until eternity.

What you could do is set up a topic that triggers on failure and use that to find the job ID, look up its parameters and resubmit it via the REST API. Alternatively, an approach using FMEServerJobSubmitters or the new Automations will let you add processing logic for a failed job as well.
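The REST-based resubmission described above could look roughly like the following minimal Python sketch. It assumes the version 3 REST API paths `/fmerest/v3/transformations/jobs/id/<id>` and `/fmerest/v3/transformations/submit/<repository>/<workspace>`, token authentication through an `Authorization: fmetoken token=...` header, and a job record that exposes its parameters under `request.publishedParameters`. All of these should be checked against the REST API documentation for your FME Server version; the helper names are made up for illustration.

```python
import json
import urllib.request

API = "/fmerest/v3"  # assumption: REST API v3 base path


def _call(base_url, token, path, body=None):
    """Send a GET (no body) or POST (JSON body) request and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(base_url.rstrip("/") + API + path, data=data)
    req.add_header("Authorization", "fmetoken token=" + token)
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def resubmit_body(job):
    """Build a submit body that reuses the failed job's published parameters."""
    # Assumption: the job record lists its parameters under request.publishedParameters.
    params = job.get("request", {}).get("publishedParameters", [])
    return {"publishedParameters": [{"name": p["name"], "value": p["value"]} for p in params]}


def resubmit(base_url, token, job_id, repository, workspace):
    """Look up a failed job and submit the same workspace again."""
    job = _call(base_url, token, f"/transformations/jobs/id/{job_id}")
    return _call(base_url, token, f"/transformations/submit/{repository}/{workspace}",
                 resubmit_body(job))
```

A call would look like `resubmit("https://fmeserver.example.com", "<token>", 1234, "MyRepository", "load_data.fmw")`, where every argument is a placeholder. Because a job that failed once will often fail again, it is worth capping the number of resubmissions rather than calling this unconditionally from a failure topic.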

Well, sometimes a job may fail because it can't connect to the database due to network issues that may be fixed after a minute, so the next attempt will work without problems. I think it would be nice to have a configurable resubmit with a circuit breaker: for example, try 3 times (first after 1 minute, second after 5 minutes, third after 30 minutes) and, if it still doesn't work, stop retrying :)

I think I will experiment a bit with Automations in FME 2019.

Is there an easy way to resubmit a job using the REST API while recording that it was triggered by a given scheduled task?

I think it'll be a bit clunky. If you want the retry to happen after a certain amount of time, you'll have to create a one-time schedule using the REST API to trigger the workspace again. If you have a lot of user parameters, you'll need to retrieve those as well so you can set them again for the next run.
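The timing side of such a retry can be kept separate from the REST calls. The following is a minimal sketch, not an FME Server feature: it computes when each retry should start, using the 1, 5 and 30 minute delays suggested earlier in the thread, and returns nothing once the attempts are used up. The function name and the choice to measure each delay from the latest failure are assumptions.

```python
from datetime import datetime, timedelta

# Delays suggested earlier in the thread: retry after 1, 5 and 30 minutes.
RETRY_DELAYS = (timedelta(minutes=1), timedelta(minutes=5), timedelta(minutes=30))


def next_retry_time(failed_at, attempts_so_far, delays=RETRY_DELAYS):
    """Return when the next retry should start, or None to stop retrying.

    attempts_so_far counts the retries already made (0 after the first failure).
    """
    if attempts_so_far >= len(delays):
        return None  # all retries used up: give up instead of looping forever
    return failed_at + delays[attempts_so_far]
```

The returned time would serve as the start time of the one-time schedule, and a `None` result is the signal to stop and notify someone instead.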

Hello - crashing the thread a little here - but we are seeing this 'auto resubmit' behaviour on our FME Server 2015 instance, and we don't like it! I'm hunting around for the triggers of this behaviour and, ideally, a way to stop it. Does anyone have any more info?

Hi @nrich,

FME Server is set up by default to attempt to run a job three times if it fails. If you would prefer no resubmits, this behaviour can be turned off by modifying a parameter in fmeServerConfig.txt.

  1. Navigate to <FMEServerDir>\Server\fmeServerConfig.txt
  2. Locate the ENABLE_TRANSACTION_VALIDATION_RETRIES parameter and change the value to false.
  3. Restart FME Server to apply the changes.

The 2015 documentation on this parameter can be found at http://docs.safe.com/fme/2015.1/html/FME_Server_Documentation/Default.htm#ReferenceManual/ConfigFileRef.htm
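Before and after editing the file, it can help to confirm what the retry settings actually are. The following is a minimal sketch, assuming fmeServerConfig.txt holds one `NAME=value` setting per line with `#` marking comments; the function name and the sample values are illustrative only and do not reflect real defaults.

```python
def read_settings(text, names):
    """Return {name: value} for the requested settings found in config text."""
    found = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and anything that is not NAME=value
        key, value = line.split("=", 1)
        if key.strip() in names:
            found[key.strip()] = value.strip()
    return found


RETRY_SETTINGS = ("ENABLE_TRANSACTION_VALIDATION_RETRIES",
                  "MAX_FAILED_TRANSACTION_REQUEST_RETRIES")

# Illustrative sample text, not copied from a real installation.
sample = """# comment line
ENABLE_TRANSACTION_VALIDATION_RETRIES=false
MAX_FAILED_TRANSACTION_REQUEST_RETRIES=3
"""
print(read_settings(sample, RETRY_SETTINGS))
```

To check a real installation, pass the contents of the file instead of `sample`, for example `read_settings(open(path).read(), RETRY_SETTINGS)` with `path` pointing at your own fmeServerConfig.txt.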

 

Thanks @hollyatsafe, it's good to learn something new. I thought I'd been over all these config files with a fine-toothed comb years ago.

Are you able to specify what parameters or conditions this process uses to assess a job as 'invalid'?

I'm trying to track down why one particular worker/runner process is causing FME Server to generate this 'resubmit' behaviour; we have other workbenches using FMEServerJobSubmitters on the same FME Server instance that don't.

We have also run the same workbenches, against the same database, on a different FME Server without issue, so we think the issue is FME Server related rather than workbench related.

I have a rest.log file that is 4.5 GB on the problematic server, so I'm having it recycled, on the possibility that the REST service isn't responding quickly enough because of the time spent accessing the log file. If this doesn't work I'll raise a support request.

Thanks, Nick

 

# ENABLE_TRANSACTION_VALIDATION_RETRIES - Can be true or false. true to enable requeue of jobs that are in invalid
# state, otherwise false to have jobs in invalid states treated as job failures.
# When true, jobs will be retried up to the limit of
# MAX_FAILED_TRANSACTION_REQUEST_RETRIES.

Hi @nrich,

A job is considered invalid if FME Server fails to get the job status. This will occur if anything stops the job from running to completion, for example an engine crash or a network interruption between the core and the engine.

One of the key things to note, however, is that in this scenario the job is resubmitted under the same ID, so it is not noticeable to the user. The log displayed in the Web UI will be from the last attempted run. However, if you view the jobs folder in Resources > Logs > Core > Current, you will see the job listed as a folder containing multiple logs, with the attempt number appended to each log name, e.g. job_id#_0, job_id#_1. This indicates whether there was a previous invalid response and how many attempts were made.
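One way to spot retried jobs is to group the log names by job and count the attempts. This is a minimal sketch, assuming log names of the form `job_<id>_<attempt>.log`, which is only one reading of the job_id#_0, job_id#_1 pattern above; the real naming may differ, so adjust the regular expression to match what you see in the folder.

```python
import re
from collections import defaultdict

# Assumed naming: job_<id>_<attempt>.log, e.g. job_1234_0.log (hypothetical).
LOG_NAME = re.compile(r"^job_(\d+)_(\d+)\.log$")


def attempts_per_job(file_names):
    """Map job ID -> number of distinct attempt logs among the given file names."""
    attempts = defaultdict(set)
    for name in file_names:
        match = LOG_NAME.match(name)
        if match:
            attempts[int(match.group(1))].add(int(match.group(2)))
    return {job_id: len(seen) for job_id, seen in attempts.items()}


def retried_jobs(file_names):
    """Return the sorted IDs of jobs that have more than one attempt log."""
    return sorted(j for j, n in attempts_per_job(file_names).items() if n > 1)
```

Feeding it the file names collected from the job log folders gives a quick list of job IDs that were retried under the same ID.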

I see you also posted a separate question here, and from that description it sounds like this resubmit ran under a new job ID, in which case this parameter is not the cause and changing it will not have any effect. Were you able to review my suggestions on that post?
