metal/mtce/src/maintenance
Eric MacDonald da398e0c5f Debian: Make Mtce offline handler more resilient to slow shutdowns
The current offline handler assumes the node is offline after
'offline_search_count' reaches 'offline_threshold' count
regardless of whether mtcAlive messages were received during
the search window.

The offline algorithm requires that no mtcAlive messages
be seen for the full offline_threshold count.

During a slow shutdown the mtcClient runs for longer than
it should, which can lead maintenance to see the node as
recovered prematurely.

This update manages the offline search counter to ensure that
it only reaches the count threshold after seeing no mtcAlive
messages for the full search count. Any mtcAlive message seen
during the count triggers a count reset.
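
The counter management described above can be sketched as follows. This is
a minimal illustration only, not the code in mtcNodeHdlrs.cpp: the class
name OfflineSearch, the tick() method and the once-per-audit-period calling
convention are all assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical sketch of the offline search counter behavior.
// The names here are illustrative and do not come from the mtce sources.
class OfflineSearch {
public:
    explicit OfflineSearch(int offline_threshold)
        : threshold_(offline_threshold) {
        assert(offline_threshold > 0);
    }

    // Call once per audit period. 'mtcalive_seen' is true when at least
    // one mtcAlive message arrived since the previous call.
    // Returns true only after 'threshold_' consecutive silent periods.
    bool tick(bool mtcalive_seen) {
        if (mtcalive_seen) {
            count_ = 0;          // any mtcAlive restarts the search
            return false;
        }
        if (count_ < threshold_) {
            ++count_;            // one more period with no mtcAlive
        }
        return count_ >= threshold_;
    }

    int count() const { return count_; }

private:
    int threshold_;
    int count_ = 0;
};
```

With a threshold of 3, two silent periods followed by one mtcAlive message
put the count back at zero, so three further silent periods are needed
before the node is treated as offline.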

This update also:
1. Adjusts the reset retry cadence from 7 to 12 secs
   to prevent unnecessary reboot thrash during
   the current shutdown.
2. Clears the hbsClient ready event at the start of the
   subfunction handler so the heartbeat soak is only
   started after seeing heartbeat client ready events
   that follow the main config.

Test Plan:

PASS: Debian and CentOS Build and DX install
PASS: Verify search count management
PASS: Verify issue does not occur over lock/unlock soak (100+)
      - where the same test without update did show issue.
PASS: Monitor alive logs for behavioral correctness
PASS: Verify recovery reset occurs after expected extended time.

Closes-Bug: 1993656
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: If10bb75a1fb01d0ecd3f88524d74c232658ca29e
2022-10-24 15:57:43 +00:00
Makefile Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
mtcAlarm.cpp Alarm Hostname controller function has in-service failure reported 2022-10-05 10:30:01 -04:00
mtcAlarm.h Alarm Hostname controller function has in-service failure reported 2022-10-05 10:30:01 -04:00
mtcBmcUtil.cpp Mtce: Add ActionInfo extension support for reset operations. 2022-10-13 17:40:05 +00:00
mtcBmcUtil.h Add redfish support detection to maintenance 2019-08-19 14:03:37 +00:00
mtcCmdHdlr.cpp Debian: Make Mtce offline handler more resilient to slow shutdowns 2022-10-24 15:57:43 +00:00
mtcCompMsg.cpp Prevent mtcClient from sending to uninitialized socket in AIO SX 2021-04-21 10:20:10 -04:00
mtcCtrlMsg.cpp Fix enabling heartbeat of self from the peer controller 2021-05-06 13:35:54 -04:00
mtcHttpSvr.cpp Fix Mtce's VIM systems query handling 2019-10-09 09:44:35 -04:00
mtcHttpSvr.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcHttpUtil.cpp MTCE: reading BMC passwords from Barbican secret storage. 2019-02-14 09:04:46 -05:00
mtcHttpUtil.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcInvApi.cpp Prevent mtcClient from sending to uninitialized socket in AIO SX 2021-04-21 10:20:10 -04:00
mtcInvApi.h Fix format-overflow warning in mtcInvApi 2019-08-27 10:33:44 -05:00
mtcNodeComp.cpp Add Debian packaging for mtce packages 2021-10-29 09:17:00 -05:00
mtcNodeComp.h Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
mtcNodeCtrl.cpp Prevent mtcClient from sending to uninitialized socket in AIO SX 2021-04-21 10:20:10 -04:00
mtcNodeFsm.cpp Prevent mtcClient from sending to uninitialized socket in AIO SX 2021-04-21 10:20:10 -04:00
mtcNodeFsm.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcNodeHdlrs.cpp Debian: Make Mtce offline handler more resilient to slow shutdowns 2022-10-24 15:57:43 +00:00
mtcNodeHdlrs.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcNodeMnfa.cpp Fix Graceful Recovery handling while in Graceful Recovery handling 2021-03-17 14:25:19 -04:00
mtcNodeMsg.h Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
mtcSmgrApi.cpp Debian: Fix mtcAgent segfault on SM host state change requests 2022-06-26 20:18:20 +00:00
mtcSmgrApi.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcStubs.cpp Implement Active-Active Heartbeat as HA Improvement Fix 2018-12-10 09:57:34 -05:00
mtcSubfHdlrs.cpp Debian: Make Mtce offline handler more resilient to slow shutdowns 2022-10-24 15:57:43 +00:00
mtcThreads.cpp Mtce: Add ActionInfo extension support for reset operations. 2022-10-13 17:40:05 +00:00
mtcThreads.h Add redfish power/reset/reinstall bmc support to maintenance 2019-09-26 15:59:35 -04:00
mtcVimApi.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcVimApi.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
mtcWorkQueue.cpp [Trivial Fix] fix typos in docstrings 2019-02-21 14:46:06 +08:00