I'm running FME Server on AWS, using Aurora PostgreSQL Serverless for the FME Server Database, and I check the health of the server by periodically calling the fmerest/v3/healthcheck?textResponse=true&ready=true REST API endpoint.
Everything runs fine for 24 hours, but after that the health checks start to fail and never recover. From the errors logged (below), I think it might be a connection pooling issue. I don't get the same issue on an Express install, and https://aws.amazon.com/blogs/database/best-practices-for-working-with-amazon-aurora-serverless/ says: "Aurora Serverless closes connections that are older than 24 hours. Make sure that your connection pool refreshes connections frequently." That matches the 24-hour timing, so I suspect this is the cause.
I've copied some errors from the logs below (I can provide the full logs if required). Am I right in thinking this is caused by Aurora Serverless killing the connections, or is it something else? Is there a setting anywhere that will force FME Server to refresh its database connections before they are 24 hours old, or do I need to look at a different database option?
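To show what I mean by "refreshing" connections, here is a minimal Python sketch of a pool that retires any connection older than a maximum lifetime. This is only an illustration of the behaviour I'm hoping to configure, not how FME Server works internally: the class name `MaxLifetimePool`, the `connect`/`close` callables and the 23-hour limit are all my own hypothetical choices. Some pools expose this as a setting (HikariCP calls it `maxLifetime`, for example), but I don't know which pool FME Server uses.

```python
import time
from collections import deque
from typing import Callable, Deque, Generic, Tuple, TypeVar

T = TypeVar("T")


class MaxLifetimePool(Generic[T]):
    """Toy pool (hypothetical) that never hands out a connection older than max_lifetime_s."""

    def __init__(
        self,
        connect: Callable[[], T],        # opens a new connection (assumed helper)
        close: Callable[[T], None],      # closes a connection (assumed helper)
        max_lifetime_s: float,           # e.g. 23 * 3600 to stay under Aurora's 24 h
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._connect = connect
        self._close = close
        self._max_lifetime_s = max_lifetime_s
        self._clock = clock
        self._idle: Deque[Tuple[T, float]] = deque()  # (connection, created_at)

    def _is_fresh(self, created_at: float) -> bool:
        return self._clock() - created_at < self._max_lifetime_s

    def acquire(self) -> Tuple[T, float]:
        # Discard idle connections that have outlived the limit,
        # then reuse the first fresh one or open a new one.
        while self._idle:
            conn, created_at = self._idle.popleft()
            if self._is_fresh(created_at):
                return conn, created_at
            self._close(conn)
        return self._connect(), self._clock()

    def release(self, conn: T, created_at: float) -> None:
        # Retire on return as well, so stale connections are not kept idle.
        if self._is_fresh(created_at):
            self._idle.append((conn, created_at))
        else:
            self._close(conn)
```

With something like `MaxLifetimePool(connect, close, max_lifetime_s=23 * 3600)`, a connection opened at start-up would be closed and replaced by the pool itself before Aurora Serverless drops it at the 24-hour mark, instead of failing with an I/O error on the next query.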
fmescheduler log:
Tue-28-Jun-2022 08:04:38.933 AM ERROR fmehealthnodeclient SQLException: An I/O error occurred while sending to the backend.
Tue-28-Jun-2022 08:04:38.936 AM ERROR fmehealthnodeclient org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
COM.safe.fmeserver.api.FMEServerException: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at COM.safe.fmeserver.database.ManagerBase.execute(ManagerBase.java:146)
at COM.safe.fmeserver.database.healthNode.HealthNodeOps.healthNodeKeepAlive(HealthNodeOps.java:40)
at COM.safe.fmeserver.database.healthNode.HealthNodeClient.run(HealthNodeClient.java:103)
at java.lang.Thread.run(Thread.java:748)
fmeserver log:
Tue-28-Jun-2022 08:05:39.803 AM ERROR fmeenginemgrnodeclient 402902 : Failed to connect to Job Queue. Please ensure Job Queue is started.
Tue-28-Jun-2022 08:05:39.804 AM ERROR fmeenginemgrnodeclient Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:53)
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226)
at COM.safe.fmeserver.JobRouterConfig.checkActiveQueueNodeAlive(JobRouterConfig.java:205)
at COM.safe.fmeserver.FMEServerJobRouter.checkActiveQueueNodeAlive(FMEServerJobRouter.java:146)
at COM.safe.fmeserver.jobs.EngineManagerNodeOps.checkActiveQueueNodeAlive(EngineManagerNodeOps.java:107)
at COM.safe.fmeserver.jobs.EngineManagerNodeClient.executeLeaderOp(EngineManagerNodeClient.java:98)
at COM.safe.fmeserver.database.NodeClient.run(NodeClient.java:123)
at java.lang.Thread.run(Thread.java:748)
I'm running FME Server v2022.0.1.1