Solved

How is it possible that 6 of 8 engines in FME Server fail while running jobs during the night? The error related to this failure is included in the text below. How can we prevent it from happening again?

  • August 18, 2022
  • 9 replies
  • 35 views

mkastelijns
Contributor

There is a FME server setup with 8 engines, each on their own server.

 

Error: Authentication failed: java.sql.SQLException: COM.safe.fmeserver.database.FMEServerDBException: org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections

 

Anonymized event logged just before the engine failure:

INFORM  xxxxxxx  418049 : Received HTTP Basic credentials for user: xxxxxxxxx

 

Temporary solution is restarting the FME-services on the servers.

 

Possible cause: recursive connection error during authentication check which eventually shuts down the engine?

Best answer by mkastelijns

Hey Richard,

 

We have increased the number of connections in the backend PostgreSQL database from 300 to 500. Since then there have been no more issues. The engines remain up and running. Thanks for your advice.

9 replies

richardatsafe
Safer
  • Safer
  • 217 replies
  • August 19, 2022

Hi @mkastelijns​ ,

 

Thanks for the question. It would be helpful to know the build of FME Server, and whether you are using the default Postgres database or your own. How FME Server is used may also be relevant: for example, are you running many short jobs, or are you using automations extensively? Lastly, it would be helpful to know whether there are multiple cores / web applications. I encourage you to open a case so we can help narrow down the issue. That said, based on the error, it sounds like we might need to increase max_connections on the Postgres database.
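
Editor's note: the arithmetic behind this error can be sketched as follows. PostgreSQL keeps `superuser_reserved_connections` slots (default 3) out of `max_connections` for superusers, and an ordinary login is refused with the message above once the remaining slots are used up. The function name and the connection counts below are illustrative assumptions, not values taken from this installation's logs.

```python
def remaining_ordinary_slots(max_connections: int,
                             current_connections: int,
                             superuser_reserved: int = 3) -> int:
    """Slots still open to non-superuser roles (never negative).

    superuser_reserved mirrors PostgreSQL's superuser_reserved_connections,
    whose default is 3. The other numbers are supplied by the caller.
    """
    usable = max_connections - superuser_reserved
    return max(usable - current_connections, 0)


# Hypothetical load: 297 connections already open.
print(remaining_ordinary_slots(300, 297))  # 0   -> next ordinary login is refused
print(remaining_ordinary_slots(500, 297))  # 200 -> headroom after raising the limit
```

The real values can be read on the database with `SHOW max_connections;` and `SELECT count(*) FROM pg_stat_activity;`.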

 


mkastelijns
Contributor
  • Author
  • Contributor
  • 8 replies
  • August 19, 2022

Hello Richard,

 

Thanks for the quick response. Some more information about our FME Server 2018 setup:

 

Our build is:

FME Server 2018.1 - Build 18520 - win64

We have our own PostgreSQL database that we can configure ourselves, including the number of connections.

There are multiple cores / web applications; every server has one engine, one core, and the web application.

 

If you need more information, please let me know.

 

Kind regards,

 

Matthijs Kastelijns


richardatsafe
Safer
  • Safer
  • 217 replies
  • August 22, 2022

Hi @mkastelijns​ ,

 

Hopefully, you were able to test increasing the number of connections. I would expect 100 connections to be enough. If you want to open a case, we could review the logs and configuration to see if we can find any reason you're running out of connections.
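
Editor's note: one way to look for the reason, sketched under assumptions, is to count open connections per client machine. `pg_stat_activity` and its `client_addr` and `state` columns are standard PostgreSQL; the helper name and the sample rows are hypothetical, and fetching the rows with a database driver is left out so the sketch stays self-contained.

```python
from collections import Counter

# Query to run against the backend database to obtain the rows.
QUERY = "SELECT client_addr, state FROM pg_stat_activity"


def connections_per_client(rows):
    """Return (client, open connections) pairs, busiest client first.

    rows is an iterable of (client_addr, state) tuples as returned by QUERY.
    client_addr is NULL for local socket connections, shown here as 'local'.
    """
    counts = Counter(str(addr) if addr is not None else "local"
                     for addr, _state in rows)
    return counts.most_common()


# Hypothetical sample rows, not taken from the thread.
sample = [
    ("10.0.0.1", "idle"),
    ("10.0.0.1", "active"),
    ("10.0.0.2", "idle"),
    (None, "active"),
]
for client, total in connections_per_client(sample):
    print(client, total)
```

A single machine holding far more connections than the others, or a large share of rows in the `idle` state, would be worth including in a support case.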


mkastelijns
Contributor
  • Author
  • Contributor
  • 8 replies
  • Best Answer
  • October 3, 2022

Hey Richard,

 

We have increased the number of connections in the backend PostgreSQL database from 300 to 500. Since then there have been no more issues. The engines remain up and running. Thanks for your advice.


richardatsafe
Safer
  • Safer
  • 217 replies
  • October 3, 2022

Hi @mkastelijns​ ,

 

Thanks for the response. I'm happy it's stable, but 500 or even 300 connections are much higher than we have ever seen. We are curious about the setup and the reason this is happening. If you are concerned and want to open a case we would be happy to investigate and see if we can make a reproduction of this scenario.


mkastelijns
Contributor
  • Author
  • Contributor
  • 8 replies
  • March 22, 2023

We run 14 FME Server machines on two clusters; every machine has its own core, Tomcat, and standard engine. So far the change has greatly improved stability: there have been no more issues in the last 6 months.
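
Editor's note: as a rough worked example, the connection budget per machine before and after the change can be estimated as below. It assumes all 14 machines share one PostgreSQL database and that the default of 3 superuser-reserved slots applies; neither is confirmed in the thread, and the function name is made up for illustration.

```python
def per_machine_budget(max_connections: int,
                       machines: int,
                       superuser_reserved: int = 3) -> int:
    """Ordinary connection slots available per machine if shared evenly."""
    return (max_connections - superuser_reserved) // machines


# Assumption: 14 machines (each with core, Tomcat, and engine) on one database.
for limit in (300, 500):
    print(limit, per_machine_budget(limit, 14))
# 300 21
# 500 35
```

Under these assumptions each machine's core, web application, and engine together had about 21 connections to work with before the change and about 35 after it.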


vn1
Contributor
  • Contributor
  • 9 replies
  • May 1, 2023

@richardatsafe​ 

 

Could you elaborate on "FME Server usage may also be useful, for example are you running many short jobs or if you are extensively using automation?"

 

We recently upgraded to FME Server 2022.1.1 and are experiencing jobs failing with no error message in the job logs. All of our jobs more or less run under Automations and most run in a short amount of time. The jobs that fail will run independently on server without issue.


richardatsafe
Safer
  • Safer
  • 217 replies
  • May 1, 2023

Hi @vn1​ ,

 

I don't think I can tie this to a specific usage pattern at this point. This issue is very uncommon and typically restricted to larger installations with 10 or more engines, but there may be a usage pattern involved as well. If you suspect you are seeing a similar issue, please create a case with the log files, particularly the database logs, and we can explore it further.


vn1
Contributor
  • Contributor
  • 9 replies
  • June 2, 2023

Thank you for the response. If anyone stumbles on this, we were able to resolve our issues by implementing automatic retries, seen in the article below. It's still unknown why a couple of our automations would fail with no error messages in the job/system log files. They may still continue to fail, but will eventually succeed with automatic retries.

 

https://community.safe.com/s/article/Configuring-Guaranteed-Delivery-in-FME-Server-Automations-with-Automated-Retries
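
Editor's note: the general idea behind automatic retries can be sketched as follows. This is a generic retry loop with exponential backoff, not the FME Server Automations mechanism described in the linked article; `run_with_retries`, its parameters, and the `job` callable are hypothetical names chosen for illustration.

```python
import time


def run_with_retries(job, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call job() until it succeeds or the attempts are used up.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The last exception is re-raised if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error


# Hypothetical job that fails twice before succeeding.
state = {"calls": 0}

def flaky_job():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky_job, base_delay=0.01))  # ok
```

Retries hide transient failures rather than explain them, so the underlying cause still deserves a look in the logs.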