Question

FMEServerJobSubmitter problems



Hello,

We are using the FMEServerJobSubmitter with FME 2019.

 

There is a strange problem: the job is submitted and I can see it in the jobs list. So the jobs list shows 2 jobs: the main one and the submitted one. The log of the main job contains:

 

2019-07-28 22:56:41|   1.0|  0.0|INFORM FMEServerJobSubmitter (ServerFactory): http://fmetest.com:80 - Running workspace

2019-07-28 23:01:41|   1.1|  0.1|WARN FMEServerJobSubmitter (ServerFactory): http://fmetest.com:80 - Failed to submit request to run workspace

2019-07-28 23:01:41|   1.1|  0.0|ERROR |FMEServerJobSubmitter (ServerFactory): Reason - ''

So it looks like after 5 minutes the main workspace decided that it could not submit the request, logged the lines above, and finished. But the child job actually kept running for 15 more minutes (as expected, since this job takes 20 minutes) and then finished.
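The gap between the two log entries can be measured exactly, which helps when looking for a timeout setting with a matching value. A minimal Python sketch, assuming the log layout shown above (timestamp before the first `|`); the helper names `log_timestamp` and `seconds_between` are hypothetical:

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def log_timestamp(line: str) -> datetime:
    """Parse the timestamp that precedes the first '|' in a log line."""
    return datetime.strptime(line.split("|", 1)[0].strip(), LOG_TIME_FORMAT)


def seconds_between(first_line: str, second_line: str) -> float:
    """Elapsed seconds between two log lines."""
    return (log_timestamp(second_line) - log_timestamp(first_line)).total_seconds()


running = "2019-07-28 22:56:41|   1.0|  0.0|INFORM FMEServerJobSubmitter (ServerFactory): http://fmetest.com:80 - Running workspace"
failed = "2019-07-28 23:01:41|   1.1|  0.1|WARN FMEServerJobSubmitter (ServerFactory): http://fmetest.com:80 - Failed to submit request to run workspace"

print(seconds_between(running, failed))  # 300.0
```

A gap of exactly 300 seconds suggests a configured timeout rather than a random network failure.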

 

 

I use these settings in the FMEServerJobSubmitter:

SubmitJobs: In sequence

Wait for jobs to complete: Yes

 

The funny thing is that if the child workspace finishes in under 5 minutes, everything is OK and works as expected.

 

 

The problem does not occur on my express instance. It occurs on another environment where we have a failover setup (2 web servers, 2 app cores, a few engine servers).

 

 

I also tried with the simplest workspace possible and the result is the same:

 

 

Main:

 

 

0684Q00000ArDSXQA3.png

 

 

0684Q00000ArDUtQAN.png

 

 

 

 

And the child (only a Decelerator set to, for example, 360 s):

 

 

0684Q00000ArDN5QAN.png

15 replies


Yeah that is a weird one - seems like there is a timeout happening. The workspace submits the requests, waits for a response, but after 5 mins states that it couldn't submit it. Looks like a bug to me - either that, or a misconfiguration of your FME Server. Perhaps submit a ticket with Support


Yes, and the strange thing is that when the job completes in under 5 minutes everything is fine. Also, I can see the job running even though the main workspace says that there was a problem with the request :/

 

 

I submitted a ticket to support.

 

 

 


Just out of curiosity, does anyone here on the Safe forum use FME Server with a load balancer (which one?), and have you changed any specific setting to keep long-running connections between core and engines alive (let's say 2 hours or more)?


Did anyone find a reason for this, or even a solution?

I have the same problem as described by @witos, and I am also running FME 2019. Only my timeout is 2 minutes (to the second) before the FMEServerJobSubmitter fails. This could point towards a timeout setting somewhere, but I could not find anything in my FMEServerConfig.txt with a 2-minute parameter.

The Expiry Time parameters in the FMEServerJobSubmitter do not seem to change anything.

@witos what was the response on your ticket?

@hollyatsafe do you have any input? I see you have answered some other posts about the FMEServerJobSubmitter.


Hi @rumle,

Do you happen to have a load balancer in place within your FME Server environment?

Load balancers are usually configured with a default idle timeout. When the child job takes longer than this time (in your case 2 minutes), the load balancer cuts the connection between the parent and child engine, so the parent job log reports a failure.

This was the case for witos. As a workaround you can:

  • Change the FMEServerJobSubmitter option 'Wait for Job to Complete' to No
  • Add an FMEServerJobWaiter transformer right after the FMEServerJobSubmitter
  • Set up the FMEServerJobWaiter to receive the Job ID from the FMEServerJobSubmitter so that it knows which job to wait for. This transformer has a Polling Interval that sets how often it checks whether the job is finished, which keeps the connection alive. 1 minute is probably a reasonable value for your workflow.

However, please note that this will use up an additional engine, but it is at least a good check to confirm that this is the issue you are encountering. Alternatively, if you have the option to upgrade, we implemented a fix in the FMEServerJobSubmitter for FME 2019.1.3.1, which is available to download now from safe.com/downloads.
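The idea behind polling can be sketched outside FME. The following Python sketch is an illustration only, not how the FMEServerJobWaiter is implemented: instead of holding one connection open for the whole job, it makes a short status request at a fixed interval, so no single connection stays idle long enough to hit a load balancer timeout. The function `wait_for_job`, the `get_status` callback, and the status names are assumptions made for the example.

```python
import time
from typing import Callable

# Assumed names of "finished" statuses; check them against your FME Server version.
FINISHED_STATUSES = {"SUCCESS", "FME_FAILURE", "JOB_FAILURE", "ABORTED"}


def wait_for_job(
    get_status: Callable[[int], str],
    job_id: int,
    poll_seconds: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports a finished status, using short requests only."""
    for _ in range(max_polls):
        status = get_status(job_id)
        if status in FINISHED_STATUSES:
            return status
        sleep(poll_seconds)  # connection is closed while we wait
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


# Stand-in for a real status lookup (hypothetical): two polls unfinished, then done.
fake_statuses = iter(["QUEUED", "PULLED", "SUCCESS"])
result = wait_for_job(lambda job_id: next(fake_statuses), job_id=42, sleep=lambda s: None)
print(result)  # SUCCESS
```

In a real check, `get_status` would wrap whatever status lookup is available to you; the polling interval just needs to be shorter than the idle timeout you suspect.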


Hi @hollyatsafe.

Thanks for your reply.

I have a rather simple setup with just 1 engine, so the workaround with an FMEServerJobWaiter is not possible. Actually, I did try it earlier, but clearly I ended up waiting on a job that never started ;-)

Being on a 1-engine setup, I am also pretty sure I do not have a load balancer. But I am not familiar with the subject, and I did not set up the system myself.

The fact that the timeout is exactly 2 minutes tells me there is a parameter to be changed somewhere, but I cannot figure out where.

It is nice to hear an update could fix it. That might be the final solution, if I can't figure out something better.


Hi @rumle,

Whilst I've only seen this with a load balancer, it is possible that your organisation's firewall, browser settings, or something else has a similar idle timeout in place. I would ask your IT team whether they can help you monitor the network traffic on this machine and identify what is causing the connection to break. You may then be able to increase the timeout value so that the connections are not dropped.


@rumle Please take a look at this:

https://knowledge.safe.com/idea/96750/fmeserverjobsubmitter-add-polling-interval-to-make.html

Try downloading the latest version of FME and check whether this option is available. With it, the connection is not held open the whole time; instead the status is checked at the given interval, which should solve your problem. Please comment here if it works for you.

Thanks @witos and @hollyatsafe.

I will test a little more and do an upgrade. I will comment when I have some results.

Hi

Same error for me. We are running FME Server 2019.2.1, using Citrix NetScaler as the load balancer.

It fails immediately on submit:

FMEServerJobSubmitter_Tiler (ServerFactory): http://gisfmeserver.melbourne.vic.gov.au - Submitting a request to run workspace 'LawnAreaPointTiler.fmw' in repository 'QualityAssurance'...

FMEServerJobSubmitter_Tiler (ServerFactory): http://gisfmeserver.melbourne.vic.gov.au - Failed to submit request to run workspace 'LawnAreaPointTiler.fmw' in repository 'QualityAssurance'

FMEServerJobSubmitter_Tiler (ServerFactory): Reason - user 'admin' is not authorized to perform this action

FMEServerJobSubmitter_Tiler (ServerFactory): http://gisfmeserver.melbourne.vic.gov.au - Submitting a request to run workspace 'LawnAreaPointTiler.fmw' in repository 'QualityAssurance'...

FMEServerJobSubmitter_Tiler (ServerFactory): http://gisfmeserver.melbourne.vic.gov.au - Failed to submit request to run workspace 'LawnAreaPointTiler.fmw' in repository 'QualityAssurance'

FMEServerJobSubmitter_Tiler (ServerFactory): Reason - user 'admin' is not authorized to perform this action

 

 


 

 

Hi @rudy_v,

Based on your description this sounds like a separate issue: in witos's case the jobs were being submitted and running successfully, and were only losing the connection with the parent job after a load balancer timeout. I'd therefore recommend posting a new question for better visibility. That being said, I have a few questions to help get started with troubleshooting:

1. The error you are getting relates to the user set in the FMEServerJobSubmitter, not the user running the parent job. Can you confirm that this user has permission on FME Server to run this workspace, and to use any web/database connections the workspace relies on?

2. Do you have FME Desktop installed on the same machine as FME Server, and does the job run successfully when submitted from that Desktop application?

3. As a test, can you submit the job through the REST API from that user account? This should be fairly similar to what the FMEServerJobSubmitter does.

4. What is your FME Server installation type (express, distributed, fault tolerant, ...), and did you perform any additional configuration steps, or is there a proxy involved?
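For the REST API test in question 3, a rough Python sketch of the submit request is below. It only builds the request; nothing is sent until `urlopen` is called. The endpoint path (`/fmerest/v3/transformations/submit/...`) and the `fmetoken token=...` header reflect my understanding of the FME Server REST API v3 and should be checked against the REST API documentation on your own server. The host name and the function `build_submit_request` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_submit_request(base_url, repository, workspace, token, parameters=None):
    """Build (but do not send) a POST request that submits a workspace as a job."""
    path = "/fmerest/v3/transformations/submit/{}/{}".format(
        urllib.parse.quote(repository), urllib.parse.quote(workspace)
    )
    # Published parameters are optional; an empty list submits with defaults.
    body = {
        "publishedParameters": [
            {"name": name, "value": value} for name, value in (parameters or {}).items()
        ]
    }
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"fmetoken token={token}",  # assumed token header format
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_submit_request(
    "http://fmeserver.example.com",  # hypothetical host
    "QualityAssurance",
    "LawnAreaPointTiler.fmw",
    token="<token for the account under test>",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

If the same account is rejected here too, the problem is likely with the account's permissions; if it succeeds, the difference probably lies in how the transformer reaches the server (for example through a proxy).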


Hi @hollyatsafe

1. The user running the parent and the child is the same: admin, which is the FME Server administrator. The FMEServerJobSubmitter uses a web connection, and admin is set up as that user. I have used a REST API call for the parent.

2. Desktop worked fine running on the server.

3. See item 1.

4. FME is set up as fault tolerant, using NetScaler. Everything works OK except the FMEServerJobSubmitter. The proxy is set up correctly on the server, as we are communicating with AWS and AGOL and a few external API streams.

I have used other users also, with the same error.


Hi @rudy_v,

 

 

It looks like there was an issue between 2019.0 and 2019.2.2, when the user had both SSL configured and the Web UI Proxy enabled, that produced the same FMEServerJobSubmitter error you have reported.

In your case I can see you are not using SSL, so whilst this may not be the same issue, a workaround was provided for it that we could try to see if it also resolves your problem.

Please follow the instructions for configuring a proxy prior to 2019, that is, using APPLY_SETTINGS via the command line: https://docs.safe.com/fme/html/FME_Server_Documentation/AdminGuide/Using_FME_Server_with_Proxy_Server.htm

Also make sure that "Bypass proxy server for local address" is checked.

If this does not resolve the issue, please submit a case to Safe Software Support so that we can investigate in more detail.

  • Attach the full log file.
  • Please replace the FMEServerJobSubmitter with an HTTPCaller, submit the job via the REST API, and send that log as well.
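To see whether a proxy changes the outcome of a request to the server, one option is to send the same request twice from the server machine, once with proxy settings ignored and once with whatever proxy settings are detected. This is a generic Python sketch, not an FME tool; the function names and the example host are hypothetical.

```python
import urllib.request


def make_direct_opener() -> urllib.request.OpenerDirector:
    """Return an opener that ignores system and environment proxy settings."""
    # An empty mapping tells ProxyHandler to use no proxy at all.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


def make_default_opener() -> urllib.request.OpenerDirector:
    """Return an opener that uses whatever proxy settings Python detects."""
    return urllib.request.build_opener()


def probe(opener, url, timeout=10.0):
    """Return the HTTP status code for url, or the error text if the request fails."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except OSError as error:  # URLError and HTTPError are OSError subclasses
        return str(error)


# Example (needs network access, so it is left commented out):
# url = "http://fmeserver.example.com/"  # hypothetical host
# print("direct :", probe(make_direct_opener(), url))
# print("default:", probe(make_default_opener(), url))
```

If the direct request works and the default one fails, the proxy path is a likely suspect. Note that Python's proxy detection may not match what FME Server itself uses, so treat the result as a hint only.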


Hi

Yes, we disabled the proxy, as I already had the settings applied as in previous versions of FME Server. That resolved the issue.

 


Hi, so in this case it seems like a bug in the proxy setup through the Web GUI, as it does not read the proxy PAC file or WPAD.

So for the Web GUI we should either have the same configuration or be able to add the URL, since the WPAD specifies which addresses should not go through the proxy.
