
Hey Team,

We’re getting a really weird error on Flow/Server (2022.2), and unfortunately it seems to be one of those that we can’t reliably replicate!
The infrastructure setup is two hosts, with a mixture of standard engines and dynamic engines.


We got the below errors across two (very) different workbenches, on the same engine/host (standard engine):
We had something similar a couple of weeks ago and resolved it by restarting the affected host. This time around, however, it seems to have fixed itself (presumably it was restarted as part of the automatic engine recycling).

Looking at the engine logs for that host, they don’t give much more information about what happened:

Tue-12-Mar-2024 10:37:00.923 AM   INFORM   Thread-13401   BANPRIFMESP01_Engine2   Translation Finished (service '7070'). Return message is '23001:Module 'SurfaceModelFactory' is unavailable for use with this FME edition|LogFileName=job_522412.log'



Wondering if anyone has seen something similar or knows what might be causing this?

I have had similar issues in Server 2019 and 2021. I’m not sure what caused it, but it happened on certain nights when the queue had a lot of scheduled tasks to process. Someone suggested that this might be memory related. Rescheduling the failing jobs an hour later made the issue go away. I suspect it has to do with the amount of jobs per engine cycle, but I’m not an expert. Also see the article How to Control FME Server Engine Memory Usage.
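
For reference, that article is about tuning how aggressively the engine manages its memory. A minimal sketch of the kind of directive involved is below; the file location, directive name and default value here are from memory and may differ in your release, so verify against the article and your version’s docs before changing anything.

    # fmeEngineConfig.txt -- illustrative only; confirm the directive and its
    # default for your FME Server version before editing.
    # Rough idea: the fraction of memory use at which the engine starts
    # offloading data to disk instead of keeping it in RAM. Lowering it reduces
    # the in-memory footprint at the cost of slower translations.
    FME_ENGINE_MEMORY_REDLINE 0.5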

I’ve had a support case recently with a customer who had basically set up one workspace to do everything that the company did, so it was massive, with lots and lots of readers and writers. Apparently there is a certain maximum number of DLLs that can be loaded in memory.
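
If you want to sanity-check the DLL angle, one rough approach is to count how many modules the engine process actually has mapped while a large job is running. The sketch below is just a diagnostic idea, not an official Safe tool; the process name "fme.exe" is an assumption and may differ on your install.

    # count_engine_dlls.py -- rough diagnostic sketch (assumes a Windows host and
    # an engine process named "fme.exe"; adjust the name for your environment).
    import psutil

    TARGET_NAME = "fme.exe"  # assumption: the FME engine process name

    for proc in psutil.process_iter(["pid", "name"]):
        if proc.info["name"] and proc.info["name"].lower() == TARGET_NAME:
            try:
                # memory_maps() lists the files mapped into the process, which
                # on Windows includes every loaded DLL.
                dlls = [m.path for m in proc.memory_maps()
                        if m.path.lower().endswith(".dll")]
                print(f"PID {proc.info['pid']}: {len(dlls)} DLLs mapped")
            except psutil.AccessDenied:
                print(f"PID {proc.info['pid']}: access denied (try running elevated)")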


Thanks @nielsgerrits and @redgeographics, interesting avenues to explore.

The times it has occurred have not been high-usage times (the failing jobs are requested on demand via a web front end); overnight we dump >300 jobs into the queue and don’t get these issues. They’re also relatively small workflows performing very discrete tasks.

I wonder if lowering the number of successful jobs before an engine restart might help reduce the likelihood of this happening.

I’ll keep an eye on it, see if it becomes more regular, and then put forward a case to the client to adjust the MAX_TRANSACTION_RESULT_SUCCESSES parameter.
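
For anyone who finds this later: that parameter sets how many successful jobs an engine completes before the process monitor restarts it, which releases any memory or module handles the process has accumulated. A minimal sketch of the change is below; the file name and key=value syntax are my assumptions from older releases, so check the 2022.2 admin documentation before editing anything.

    # processMonitorConfigEngines.txt -- file name and syntax assumed here,
    # verify for your version. Restart each engine after this many successful
    # translations so accumulated memory/DLL handles are released more often.
    MAX_TRANSACTION_RESULT_SUCCESSES=100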

