Question

How does FME Server Failover work under the covers?

  • 26 September 2016
  • 4 replies
  • 0 views

Userlevel 4
Badge +13


4 replies


Logging the messages sent by the failover system shows the following. This set of messages is repeated three times:

fail_message arrived at 16:48:58
{ "msg": "FAILOVER Node: Heartbeat not detected on host AP-FAIL-CORE2 and port 7,073.", "ws_topic": "fail_message" }

fail_message arrived at 16:48:51
{ "msg": "FAILOVER Node: Heartbeat not detected on host AP-FAIL-CORE2 and port 7,075.", "ws_topic": "fail_message" }

fail_message arrived at 16:48:50
{ "msg": "FAILOVER Node: Heartbeat not detected on host AP-FAIL-CORE2 and port 7,071.", "ws_topic": "fail_message" }

fail_message arrived at 16:48:45
{ "msg": "FAILOVER Node: Heartbeat not detected on host AP-FAIL-CORE2 and port 7,072.", "ws_topic": "fail_message" }

Then we see this set sent once:

fail_message arrived at 17:23:42
{ "msg": "ACTIVE Node: AP-FAIL-CORE1 has executed failover operation on host AP-FAIL-CORE2 and port 7,071.", "ws_topic": "fail_message" }

fail_message arrived at 17:23:41
{ "msg": "ACTIVE Node: AP-FAIL-CORE1 has executed failover operation on host AP-FAIL-CORE2 and port 7,073.", "ws_topic": "fail_message" }

fail_message arrived at 17:23:40
{ "msg": "ACTIVE Node: AP-FAIL-CORE1 has executed failover operation on host AP-FAIL-CORE2 and port 7,072.", "ws_topic": "fail_message" }

fail_message arrived at 17:23:37
{ "msg": "ACTIVE Node: AP-FAIL-CORE1 has executed failover operation on host AP-FAIL-CORE2 and port 7,075.", "ws_topic": "fail_message" }

These messages contain four port numbers, which correspond to the following components:

7073 = scheduling
7075 = publishers
7071 = core
7072 = notification requests (?), core
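
To make the mapping above easier to apply, here is a minimal sketch that parses one fail_message payload and labels its port with the component names from this thread. The function name `parse_fail_message` and the `PORT_COMPONENTS` table are hypothetical helpers, not part of FME Server, and the 7072 entry is only the guess given above. Note that the logged port numbers contain a thousands separator ("7,073"), so it is stripped before conversion.

```python
import json
import re

# Port-to-component mapping as reported in this thread. The 7072 entry is
# the poster's guess, not a confirmed fact.
PORT_COMPONENTS = {
    7071: "core",
    7072: "notification requests (?), core",
    7073: "scheduling",
    7075: "publishers",
}

# Matches "host <name> and port <number>", allowing the thousands separator
# seen in the logged messages (for example "7,073").
HOST_PORT = re.compile(r"host (?P<host>\S+) and port (?P<port>[\d,]+)")


def parse_fail_message(raw: str) -> dict:
    """Extract node role, host, port and component from one payload."""
    payload = json.loads(raw)
    msg = payload["msg"]
    match = HOST_PORT.search(msg)
    if match is None:
        raise ValueError(f"unrecognized message: {msg!r}")
    port = int(match.group("port").replace(",", ""))
    return {
        "role": msg.split(" Node:", 1)[0],  # "FAILOVER" or "ACTIVE"
        "host": match.group("host"),
        "port": port,
        "component": PORT_COMPONENTS.get(port, "unknown"),
        "failover_executed": "has executed failover operation" in msg,
    }


if __name__ == "__main__":
    sample = (
        '{ "msg": "FAILOVER Node: Heartbeat not detected on host '
        'AP-FAIL-CORE2 and port 7,073.", "ws_topic": "fail_message" }'
    )
    print(parse_fail_message(sample))
```

Running this over every logged message and grouping by host would show at a glance which components lost their heartbeat and which were later taken over.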

"AP-FAIL-CORE1 has executed failover operation on host AP-FAIL-CORE2"

^ This means that AP-FAIL-CORE2 is dead.

fmeserver.log:

ACTIVE Node: Heartbeat not detected on host AP-FAIL-CORE1 and port 7,071
ACTIVE Node: Taking over jobs from host AP-FAIL-CORE1.

I don't know how these settings work:

# FAILOVER_SCHEDULER_OWNER - This is the scheduler owner name of the host to be monitored. By default the
# FAILOVER_MONITOR_HOST value is used which by default corresponds with the SCHEDULER_OWNER setting
# of the monitored host.
#
# FAILOVER_TRANSFORMATION_OWNER - This is the transformation owner name of the host to be monitored. By default the
# FAILOVER_MONITOR_HOST value is used which by default corresponds with the TRANSFORMATION_OWNER setting
# of the monitored host.
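
Going only by the wording of those comments, the defaulting rule seems to be: if an owner setting is left unset, the FAILOVER_MONITOR_HOST value is used in its place. A minimal sketch of that reading follows. The function `resolve_failover_owners` and the dictionary form of the configuration are assumptions for illustration; this is not how FME Server itself is known to read the file.

```python
def resolve_failover_owners(config: dict) -> dict:
    """Apply the fallback described in the config comments.

    Assumption: an unset or empty owner setting falls back to the value
    of FAILOVER_MONITOR_HOST.
    """
    monitor_host = config["FAILOVER_MONITOR_HOST"]
    return {
        "scheduler_owner":
            config.get("FAILOVER_SCHEDULER_OWNER") or monitor_host,
        "transformation_owner":
            config.get("FAILOVER_TRANSFORMATION_OWNER") or monitor_host,
    }


if __name__ == "__main__":
    # Only the monitored host is set, so both owners default to it.
    print(resolve_failover_owners({"FAILOVER_MONITOR_HOST": "AP-FAIL-CORE2"}))
```

If that reading is right, the two owner settings would only need explicit values when the monitored host's SCHEDULER_OWNER or TRANSFORMATION_OWNER differs from its host name.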
