5ab03b5222
The current heartbeat cluster state change notification needs to be sent when heartbeat pulses begin to be missed rather than only after the host has reached the Heartbeat Loss threshold. This buys SM more time, almost a full second, and in doing so provides more accurate data for it to make its SM heartbeat failure handling decisions.

This update also begins sending maintenance heartbeat cluster state change notifications just before the next multicast pulse request but after the cluster vault is updated from the last pulse period. This ensures that SM gets the most up-to-date cluster information.

This update also changes the hbsAgent's service file to depend on the local hbsClient. By doing so, the hbsAgent shuts down earlier over a graceful reboot, thereby preventing the hbsAgent from continuing to report a healthy response to the inactive controller during active controller shutdown. This way the inactive SM sees the failed active controller when it queries the cluster in its fail-pending state, resulting in an inactive SM take-over rather than a stand-down.

Additional hbsAgent service file changes prevent systemd from auto-recovering a failed hbsAgent process, as it is monitored and managed by pmond, and fix the ExecStop command line.

Test Plan:

PASS: Verify active controller graceful reboot. Standby controller takes over rather than shutting down (30 of 30 iterations)
PASS: Verify active controller forced reboot
PASS: Verify enabled standby controller graceful reboot
PASS: Verify Standard System install
PASS: Verify AIO DX system install

Regression:

PASS: Verify SM Uncontrolled Swact if active controller Mgmnt link drops.
PASS: Verify handling of downed cluster interface in AIO DX (fail) and Standard (degrade) systems
PASS: Verify no coredumps
PASS: Verify update as a patch

Change-Id: I6869631e091eb28a3cbb6f15d9a8ccd939c54410
Closes-Bug: 1906556
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>

| | | |
---|---|---|
centos | ||
opensuse | ||
src | ||
PKG-INFO |
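
The early-notification behaviour described in the commit message (notify when pulses begin to be missed, not only at the Heartbeat Loss threshold) can be illustrated with a minimal sketch. This is not the hbsAgent implementation: `PulseTracker`, `end_of_period`, the `notify` callback, and the choice to notify again at the threshold are all hypothetical names and assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Set


class PulseTracker:
    """Hypothetical per-host missed-pulse counter (illustration only)."""

    def __init__(self, hosts: Iterable[str], loss_threshold: int,
                 notify: Callable[[Dict[str, int]], None]) -> None:
        # Consecutive missed pulses per host, starting healthy.
        self.misses: Dict[str, int] = {host: 0 for host in hosts}
        self.loss_threshold = loss_threshold
        self.notify = notify

    def end_of_period(self, responders: Set[str]) -> None:
        """Update miss counts from one pulse period and notify on change."""
        changed = False
        for host in self.misses:
            if host in responders:
                self.misses[host] = 0
                continue
            self.misses[host] += 1
            # Assumption: notify on the first miss (early warning) and
            # again when the loss threshold is reached.
            if self.misses[host] in (1, self.loss_threshold):
                changed = True
        if changed:
            # Pass a snapshot so later updates do not alter what was sent.
            self.notify(dict(self.misses))
```

Under these assumptions, a listener such as SM would receive its first notification one pulse period after a host goes quiet, instead of waiting for the full threshold to elapse.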