Hello,

So we have some workbenches (FME 2018.1.1; 20181203 - Build 18578 - WIN64) that either read or write to AGOL after performing a bunch of operations on the data. These workbenches were working fine up until a week ago, when all of a sudden they started throwing a very generic 503 error message and failing completely.

If I understand correctly, a 503 means it's a server side error (so it's likely something with ESRI), right?

We've been having issues with our organization's ArcGIS Online portal for the last couple of days (layers being randomly dropped from maps, apps refusing to load) and I'm starting to think that their December 2018 AGOL update messed some things up, including the way FME's readers/writers communicate with it. Is anyone else having trouble working with AGOL lately?

Thanks!

 

@runneals can you explain the pitfalls of appending the nocdn string? Why would esri caution against it?

Because it's not scalable and goes around the caching mechanisms built in place to help increase scalability. I use it to pull data with a scheduled FME job, which I think is OK, but you don't want to start using it on public maps and stuff.


We are seeing 503 responses as well from AGOL.

 

In my view, HTTP 503 is a perfectly valid response and should not be treated as an error.

The documentation on 503 notes that the server may send a Retry-After response header indicating when to try again.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After

 

It would be nice if the FME feature layer connectors handled a 503 response code correctly and implemented the timed retry.

 

thanks
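
For anyone scripting against AGOL outside the built-in connectors, the timed retry described above can be sketched roughly as follows. This is only an illustration, not how FME implements it: `send` is a hypothetical callable that performs one request and returns `(status_code, headers, body)`, and the 5-second fallback wait and the limit of 3 attempts are assumptions.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=5.0, now=None):
    """Convert a Retry-After header value into a wait time in seconds.

    The header may hold a number of seconds or an HTTP date.
    `default` (an assumed fallback) is used when the header is
    missing or cannot be parsed.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it returns a non-503 status or attempts run out.

    `send` is a hypothetical callable returning (status, headers, body),
    where headers is a plain dict keyed by "Retry-After" (case-sensitive).
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 503 or attempt == max_attempts:
            return status, body
        # Wait for as long as the server asked before trying again.
        sleep(retry_after_seconds(headers.get("Retry-After")))
```

Passing `sleep` in as a parameter keeps the waiting logic easy to test without actually pausing.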


@mariofederis There really wasn't a "fix" since the errors are all coming from ArcGIS Online. SAFE implemented suggested improvements from the AGOL team (retry 3 times and then fail). The AGOL dev team also suggested trying to reduce the number of records that are being written for each write request to no more than 100-200 ("Features Per Request") which should help improve things. I have seen a maximum of 5-10 50x errors occur per day, which is a major improvement, although still a problem for those jobs that run less than once a day.
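
In FME the batch size is set with the writer's "Features Per Request" parameter, so no code is needed there. For a script-based upload, the same idea of keeping each write request small could look roughly like this; the `batches` helper is hypothetical, and the default of 200 is simply the upper end of the range suggested above.

```python
from itertools import islice


def batches(features, size=200):
    """Yield successive lists of at most `size` features.

    `features` can be any iterable; the last batch may be smaller.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(features)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Each yielded batch would then be sent as one write request, so a failed request only affects that batch.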

@runneals. I attempted the recommendation of running 100-200 features per request, but still receive a timeout error consistently. Did you make any additional changes to your workspace? Also what version of FME are you running? I'm currently on 2019.2.1


Just curious, is anyone experiencing any 500, 502, 503, 504 errors any longer? Following up with my esri Technical Account Manager to see how to keep this moving through the system, but the more people experiencing this the better (in terms of getting it reviewed by esri).

There are 2 bugs out there: BUG-000124290 (502 errors) & BUG-000123780 (503 errors)


Per ESRI they are now public bugs and in the product plan. Follow them here:

https://support.esri.com/en/bugs/nimbus/QlVHLTAwMDEyMzc4MA==

https://support.esri.com/en/bugs/nimbus/QlVHLTAwMDEyNDI5MA==

 


As a side note, we introduced automatic retries on 5xx errors for ArcGIS Online in FME 2019.1 and later to help cope with this behaviour. But as @runneals indicates, the real fix will come from the ESRI bugs linked above.


Thanks for the continued updates on this issue and also posting the Esri bug links @runneals!

We have 3 AGOL-related FME workbenches that run every night as a scheduled task. In our environment we are now using FME 2019.2 and ArcGIS 10.7.1 on a newer Win 2016 server and we see the 502 and 503 errors far less than we did in our old environment (FME 2018.1 and ArcGIS 10.3.1). In our old environment we saw the errors almost daily but now I think we've gone a month and have only seen 1 or 2 errors.


A 503 Service Unavailable error is an HTTP response status code indicating that the web server is running but temporarily cannot handle the request. It can happen for a variety of reasons. Normally it is due to temporary overloading or maintenance on the server, and it resolves after a period of time or once the web server application releases another thread. A 503 is a temporary condition, and the caller should retry after a reasonable period of time. Also check the HTTP response headers for more detail on the 503 error.


Common causes are a server that is down for maintenance or that is overloaded. This response should be used for temporary conditions, and the Retry-After HTTP header should, if possible, contain the estimated time for the recovery of the service. Assuming there are no faults in your app, a 503 most often happens when long-running tasks cause the request queue to back up. Almost always, the cause of a 503 Service Unavailable error lies with the website itself, and there's nothing you can do about it but try again later.

 

 

