cb5fa9510f
A forced reboot of the active controller in an AIO DC system puts SM into a failover failure recovery loop that prevents maintenance from detecting the heartbeat failure of the just-rebooted controller.

The SM's failover failure recovery handling algorithm includes a self (sm process) restart preceded by a restart of the hbsAgent, both added by the following update last year.

update: Add unhealthy state recovery audit to service management (sm)
review: https://review.opendev.org/c/starlingx/ha/+/735219

The self restart of SM was and is required in this case. However, the restart of the hbsAgent was only included as a safety measure, at the time, to ensure SM received updated cluster state info.

The hbsAgent restart was only added at that time with the longer term intention to have it removed once the hbsAgent cluster state change notification improvement was implemented. That change is now implemented and merged by the following update.

update: Mtce heartbeat cluster state change notification improvement
review: https://review.opendev.org/c/starlingx/metal/+/769936

Testing of the fix for the following issue in an AIO DC system resulted in the takeover controller not detecting a heartbeat loss of the just rebooted standby controller.

title: Force active controller reboot results in a second reboot
issue: https://bugs.launchpad.net/starlingx/+bug/1922584

The hbsAgent is not able to detect the heartbeat loss of the just-booted controller because SM keeps restarting it before it reaches the heartbeat loss state.

With the cluster notification improvement update now implemented and merged it's time to remove the hbsAgent restart from SM's failover failure recovery algorithm.

Test Plan:

PASS: Active controller force reboot handling in AIO DC, DX and standard systems.
PASS: Standby controller force reboot handling in AIO DC, DX and standard systems

Partial-Bug: 1922584
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: I26aa5ed9e0faec7294816269dbaa49cbb4696f66 |
||
---|---|---|
.. | ||
sm | ||
sm-common | ||
sm-db | ||
LICENSE |
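
The failure mode described in the commit message above can be illustrated with a toy simulation. This is only a sketch and not SM or mtce code: the `HeartbeatAgent` class, the `loss_detected` function, and the threshold and period values are all hypothetical, chosen to show why restarting a heartbeat agent on every recovery cycle can keep it from ever reaching the heartbeat loss state.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatAgent:
    """Toy stand-in for hbsAgent: counts consecutive missed heartbeats."""
    loss_threshold: int  # misses needed to declare loss (hypothetical value)
    misses: int = 0

    def restart(self) -> None:
        # A process restart discards the in-progress miss count.
        self.misses = 0

    def tick(self) -> bool:
        # One heartbeat period in which the peer did not respond.
        self.misses += 1
        return self.misses >= self.loss_threshold


def loss_detected(restart_agent_on_recovery: bool,
                  recovery_period: int = 3,
                  loss_threshold: int = 5,
                  total_ticks: int = 30) -> bool:
    """Return True if heartbeat loss is declared within total_ticks.

    recovery_period models how often the (hypothetical) recovery loop
    runs; when restart_agent_on_recovery is True, each pass also
    restarts the agent, as the old recovery handling did.
    """
    agent = HeartbeatAgent(loss_threshold)
    for tick in range(1, total_ticks + 1):
        if agent.tick():
            return True
        if restart_agent_on_recovery and tick % recovery_period == 0:
            agent.restart()
    return False
```

With these assumed values, the agent is restarted every 3 ticks but needs 5 consecutive misses, so `loss_detected(True)` never reports a loss, while `loss_detected(False)` does. Removing the agent restart from the recovery path corresponds to the second case.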