metal/mtce/src/maintenance
Eric MacDonald 5c043f7ca9 Make Mtce ignore heartbeat events from in-active controller.
A race condition can lead mtce to incorrectly fail hosts in
response to heartbeat failure event messages sourced from the
inactive controller.

During a split-brain recovery scenario, a swact left the hbsAgent
on the new standby controller believing it was still running on
the active controller.

In this specific split-brain failure mode, the controller that was
active before the swact (and standby after it) was failing heartbeat
to its peer and to other nodes in the system, even though the new
active controller saw heartbeat working fine.

The problem was that the inactive controller detected and sent
a heartbeat loss message to mtce before mtce was able to update
that controller's heartbeat activity status, which would have
gated the sending of the loss event.

This update adds a layer of protection by intentionally ignoring
heartbeat events from the inactive controller that might slip
through due to this activity state change race condition.
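
The gating idea described above can be sketched roughly as follows. This is
an illustration only, not the code from mtcCtrlMsg.cpp: the HeartbeatEvent
struct, the HbsEvent enum, and accept_heartbeat_event() are hypothetical
names, and the sketch assumes each event carries the name of the controller
whose hbsAgent sent it and that mtce knows which controller is currently
active.

```cpp
#include <cassert>
#include <string>

// Hypothetical event kinds; the real mtce message types are not shown here.
enum class HbsEvent { Loss, Degrade, Clear };

// Hypothetical event record: which controller's hbsAgent reported it,
// which host it is about, and what kind of event it is.
struct HeartbeatEvent {
    std::string source_controller;
    std::string target_host;
    HbsEvent kind;
};

// Returns true only when the event was sent by the controller that mtce
// currently considers active. Events from any other controller are to be
// ignored, which covers the window where the standby hbsAgent has not yet
// been told that it is no longer active.
bool accept_heartbeat_event(const HeartbeatEvent& ev,
                            const std::string& active_controller)
{
    return ev.source_controller == active_controller;
}
```

With this check in front of the event handler, a loss event sent by
controller-1 while controller-0 is active would be dropped instead of
failing the target host.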

Also fixes a log flood in the hbsAgent on large systems.

Change-Id: I825a801166b3e80cbf67945c7f587851f4e0d90b
Closes-Bug: 1813976
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2019-05-09 14:42:01 +00:00
..
Makefile Remove Resource Monitor ; aka rmon, from the load 2019-03-19 16:12:38 -04:00
ipmiClient.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcAlarm.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcAlarm.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcBrdMgmt.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcBrdMgmt.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcCmdHdlr.cpp Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
mtcCompMsg.cpp Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
mtcCtrlMsg.cpp Make Mtce ignore heartbeat events from in-active controller. 2019-05-09 14:42:01 +00:00
mtcHttpSvr.cpp Drop the redundant code 2019-04-09 11:13:34 +08:00
mtcHttpSvr.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcHttpUtil.cpp MTCE: reading BMC passwords from Barbican secret storage. 2019-02-14 09:04:46 -05:00
mtcHttpUtil.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcInvApi.cpp Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
mtcInvApi.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcIpmiUtil.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcIpmiUtil.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcNodeComp.cpp Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
mtcNodeComp.h Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
mtcNodeCtrl.cpp Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
mtcNodeFsm.cpp Mtce: Add Thresholded Maintenance Enable Recovery support 2018-12-12 08:11:36 -05:00
mtcNodeFsm.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcNodeHdlrs.cpp Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
mtcNodeHdlrs.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcNodeMnfa.cpp Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
mtcNodeMsg.h Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
mtcSmgrApi.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcSmgrApi.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcStubs.cpp Implement Active-Active Heartbeat as HA Improvement Fix 2018-12-10 09:57:34 -05:00
mtcSubfHdlrs.cpp Change compute node to worker node personality 2018-12-13 13:08:48 -05:00
mtcThreads.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcThreads.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcVimApi.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcVimApi.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcWorkQueue.cpp [Trivial Fix] fix typos in docstrings 2019-02-21 14:46:06 +08:00