We're in the middle of setting up fault tolerance in FME Server with distributed engine hosts, per the optional configuration in the following documentation:
https://knowledge.safe.com/articles/74845/introducing-the-new-20181-fault-tolerant-architect.html
I found the following unanswered question posted by @swedper, and I have similar concerns about how distributed engine hosts work:
https://knowledge.safe.com/questions/107724/engines-disappear-when-active-core-is-down-in-a-fa.html
We are planning to have two completely independent sets of core and engine hosts, each in its own data center: one primary and one secondary. Our load balancer will send all traffic to the primary core host unless the primary goes down, in which case it will send traffic to the secondary core host. Both core hosts will be configured to point to the load balancer URL. Given that, I have some questions about how FME Server works with this setup:
- Where is the queue itself stored: on the core, in the database, or on the file system? I would hope either the file system or the database...
- What happens to jobs that are running on the primary if the primary goes down?
- If the primary core host can't see the secondary engines and vice versa, how do we keep engine queues aligned between environments?
- Corollary question: do we tell the engine hosts to point to the load balancer URL? If we did that, our current plan would break down a bit...
Any advice here would be appreciated.