Question

How does FME Server Failover work under the covers?

  • September 26, 2016
  • 4 replies

fmelizard
Safer



fmelizard
Safer
  • Author
  • Safer
  • October 17, 2016

If we log the messages sent from the failover system, we see the following. This set of messages is repeated 3 times:

fail_message arrived at 16:48:58
{ "msg": "FAILOVER Node: Heartbeat not detected on host AP-FAIL-CORE2 and port 7,073.", "ws_topic": "fail_message" }

fail_message arrived at 16:48:51
{ "msg": "FAILOVER Node: Heartbeat not detected on host AP-FAIL-CORE2 and port 7,075.", "ws_topic": "fail_message" }

fail_message arrived at 16:48:50
{ "msg": "FAILOVER Node: Heartbeat not detected on host AP-FAIL-CORE2 and port 7,071.", "ws_topic": "fail_message" }

fail_message arrived at 16:48:45
{ "msg": "FAILOVER Node: Heartbeat not detected on host AP-FAIL-CORE2 and port 7,072.", "ws_topic": "fail_message" }


Then we see this sent once:

fail_message arrived at 17:23:42
{ "msg": "ACTIVE Node: AP-FAIL-CORE1 has executed failover operation on host AP-FAIL-CORE2 and port 7,071.", "ws_topic": "fail_message" }

fail_message arrived at 17:23:41
{ "msg": "ACTIVE Node: AP-FAIL-CORE1 has executed failover operation on host AP-FAIL-CORE2 and port 7,073.", "ws_topic": "fail_message" }

fail_message arrived at 17:23:40
{ "msg": "ACTIVE Node: AP-FAIL-CORE1 has executed failover operation on host AP-FAIL-CORE2 and port 7,072.", "ws_topic": "fail_message" }

fail_message arrived at 17:23:37
{ "msg": "ACTIVE Node: AP-FAIL-CORE1 has executed failover operation on host AP-FAIL-CORE2 and port 7,075.", "ws_topic": "fail_message" }


There are 4 port numbers in these messages (the log prints them with a thousands separator, so "7,073" is port 7073). They correspond to the following components:

7073 = scheduling
7075 = publishers
7071 = core
7072 = notification requests? core
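
To make the mapping above easier to apply when reading a stream of these messages, here is a minimal Python sketch that pulls the host and port out of one fail_message payload and looks up the component. The function name `parse_fail_message` and the regular expression are my own assumptions based only on the sample messages in this post, not anything provided by FME Server. The 7072 entry keeps the uncertainty from the list above.

```python
import json
import re

# Port-to-component mapping copied from the list in this post.
# The 7072 entry was marked as uncertain by the author.
PORT_COMPONENTS = {
    7073: "scheduling",
    7075: "publishers",
    7071: "core",
    7072: "notification requests? core",
}

# Matches "... host AP-FAIL-CORE2 and port 7,073." in the sample messages.
# The port may contain a thousands separator, which is stripped below.
_HOST_PORT = re.compile(r"host (?P<host>\S+) and port (?P<port>[\d,]+)")


def parse_fail_message(raw):
    """Return (host, port, component) for one fail_message JSON payload."""
    payload = json.loads(raw)
    match = _HOST_PORT.search(payload["msg"])
    if match is None:
        raise ValueError("no host/port found in message")
    port = int(match.group("port").replace(",", ""))
    return match.group("host"), port, PORT_COMPONENTS.get(port, "unknown")
```

Feeding it the first message from the log above should identify the scheduling port on AP-FAIL-CORE2. Ports not in the table come back as "unknown" rather than raising, so new message types do not break a log-watching script.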

fmelizard
Safer
  • Author
  • Safer
  • October 17, 2016

"AP-FAIL-CORE1 has executed failover operation on host AP-FAIL-CORE2"

This message means that AP-FAIL-CORE2 is dead.

fmelizard
Safer
  • Author
  • Safer
  • October 17, 2016

fmeserver.log:

ACTIVE Node: Heartbeat not detected on host AP-FAIL-CORE1 and port 7,071
ACTIVE Node: Taking over jobs from host AP-FAIL-CORE1.

fmelizard
Safer
  • Author
  • Safer
  • October 17, 2016

I don't know how these work:

 

 

# FAILOVER_SCHEDULER_OWNER - This is the scheduler owner name of the host to be monitored. By default the
# FAILOVER_MONITOR_HOST value is used which by default corresponds with the SCHEDULER_OWNER setting
# of the monitored host.
#
# FAILOVER_TRANSFORMATION_OWNER - This is the transformation owner name of the host to be monitored. By default the
# FAILOVER_MONITOR_HOST value is used which by default corresponds with the TRANSFORMATION_OWNER setting
# of the monitored host.
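
One possible reading of these comments is a simple fallback: if an owner setting is left empty, the FAILOVER_MONITOR_HOST value is used in its place. The Python sketch below only illustrates that reading. The helper `effective_owner` and the dictionary of settings are hypothetical, and this is not how FME Server is known to implement it.

```python
def effective_owner(settings, owner_key):
    """Return the owner name used for monitoring, under the assumed fallback.

    settings  -- dict of config values (hypothetical representation)
    owner_key -- "FAILOVER_SCHEDULER_OWNER" or "FAILOVER_TRANSFORMATION_OWNER"
    """
    # An unset or blank owner falls back to the monitored host name,
    # as the config comments describe.
    value = (settings.get(owner_key) or "").strip()
    return value or settings["FAILOVER_MONITOR_HOST"]


# Example with the host name from this thread; the owner keys are left unset.
settings = {"FAILOVER_MONITOR_HOST": "AP-FAIL-CORE2"}
scheduler_owner = effective_owner(settings, "FAILOVER_SCHEDULER_OWNER")
```

Under this reading, the owner settings would only need to be set explicitly when the monitored host's SCHEDULER_OWNER or TRANSFORMATION_OWNER differs from its host name.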