metal/mtce/src
Eric MacDonald 5c043f7ca9 Make Mtce ignore heartbeat events from in-active controller.
There is the potential for a race condition that can lead to
mtce incorrectly failing hosts due to heartbeat failure event
messages sourced from the in-active controller.

During a split-brain recovery scenario, a swact left the hbsAgent
on the new stand-by controller thinking it was still on the active
controller.

In this specific split-brain failure mode, the controller that was
active and then (after the swact) stand-by was failing heartbeat
to its peer and to other nodes in the system, even though the new
active controller saw heartbeat working fine.

The problem was that the in-active controller detected a heartbeat
loss and sent a loss message to mtce before mtce was able to update
that controller's heartbeat activity status, which would have gated
the loss event send.

This update adds an additional layer of protection by intentionally
ignoring heartbeat events from the in-active controller that might
slip through due to this activity state change race condition.
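
The extra check described above could look roughly like the sketch
below. This is an illustration only, not the code from this change:
the HeartbeatEvent struct, its fields, and the function name
should_process_heartbeat_event are hypothetical, and the real mtce
message layout and handler are not shown on this page.

```cpp
#include <string>

// Hypothetical event type (assumption): the real mtce message layout
// is not shown here.
struct HeartbeatEvent {
    std::string source_controller;  // controller whose hbsAgent sent the event
    std::string failed_host;        // host reported as failing heartbeat
};

// Hypothetical filter (assumption): returns true only when the event was
// sent by the controller that mtce currently considers active. Events from
// any other controller are ignored, which covers the window in which the
// stand-by hbsAgent has not yet been told it is no longer active.
bool should_process_heartbeat_event(const HeartbeatEvent& event,
                                    const std::string& active_controller) {
    return event.source_controller == active_controller;
}
```

With a filter of this kind, a loss event that arrives from the
stand-by controller during the activity state change is dropped on
the receiving side instead of failing the reported host.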

Also fixed a log in the hbsAgent that flooded on large systems.

Change-Id: I825a801166b3e80cbf67945c7f587851f4e0d90b
Closes-Bug: 1813976
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-05-09 14:42:01 +00:00
alarm Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
common Make Mtce ignore heartbeat events from in-active controller. 2019-05-09 14:42:01 +00:00
fsmon Add EXTRALDFLAGS to linker in a number of Makefiles 2019-02-28 22:34:54 -06:00
fsync Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
heartbeat Make Mtce ignore heartbeat events from in-active controller. 2019-05-09 14:42:01 +00:00
hostw Fix the logic error in hostwd 2019-04-25 10:28:18 +08:00
hwmon Remove references to ceilometer in maintenance 2019-04-30 14:28:12 -04:00
lmon Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
maintenance Make Mtce ignore heartbeat events from in-active controller. 2019-05-09 14:42:01 +00:00
mtclog Add EXTRALDFLAGS to linker in a number of Makefiles 2019-02-28 22:34:54 -06:00
pmon Remove references to ceilometer in maintenance 2019-04-30 14:28:12 -04:00
public Set SHELL in Makefiles that use bash constructs 2018-12-07 14:09:48 -06:00
scripts Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
LICENSE Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
Makefile Remove Resource Monitor ; aka rmon, from the load 2019-03-19 16:12:38 -04:00