Maintenance Heartbeat maintains a continuously rotating summary of the
last 20 heartbeat results for both controllers over both the mgmt and
clstr monitored networks. This summary is referred to as the Cluster
Vault. It shares this vault with SM on cluster state changes, or on
request from SM fault handling, as a key datapoint in SM's action
handling decisions.
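
For illustration only, a rotating history of the last 20 results can be
sketched as a fixed-size ring buffer. All type, field and function
names below are hypothetical and are not the hbsAgent's actual data
structures; the sketch only assumes each result records how many hosts
were monitored and how many responded.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; not the real hbsAgent structures.
constexpr std::size_t HISTORY_DEPTH = 20; // "last 20 heartbeat results"

struct HeartbeatResult {
    std::uint16_t hosts_enabled = 0;    // hosts monitored in this period
    std::uint16_t hosts_responding = 0; // hosts that answered in this period
};

class HeartbeatHistory {
public:
    // Store a result, overwriting the oldest entry once the buffer is full.
    void record(const HeartbeatResult& r) {
        entries_[next_] = r;
        next_ = (next_ + 1) % HISTORY_DEPTH;
        if (count_ < HISTORY_DEPTH) ++count_;
    }

    // Number of valid entries, capped at HISTORY_DEPTH.
    std::size_t size() const { return count_; }

    // age 0 is the most recent result; caller must keep age < size().
    const HeartbeatResult& latest(std::size_t age = 0) const {
        return entries_[(next_ + HISTORY_DEPTH - 1 - age) % HISTORY_DEPTH];
    }

private:
    std::array<HeartbeatResult, HISTORY_DEPTH> entries_{};
    std::size_t next_ = 0;  // slot the next record() call writes to
    std::size_t count_ = 0;
};
```

Recording a 21st result overwrites the oldest slot, so the summary
always covers the most recent 20 heartbeat periods.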

However, when the maintenance heartbeat period is modified, the
hbsAgent re-initializes the cluster vault. That re-initialization
clears the vault's monitored_hosts field, which is the problem.

When the cluster vault's 'monitored_hosts' field gets cleared, the
cluster module sees that there are no monitored hosts. It stops sending
cluster state change events to SM and responds to explicit SM cluster
query requests with effectively null data [0:0], i.e. no monitored
hosts and no responding hosts.

This state is only cleared by restarting the hbsAgent on both
controllers.

This update fixes that issue by introducing an hbs_cluster_set_period
function that replaces the cluster vault init with a simple heartbeat
period update.

This preserves monitored_hosts data in the cluster vault and keeps SM
informed of the current heartbeat period.
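
The difference between the two approaches can be sketched as follows.
This is a minimal illustration under stated assumptions, not the code
in this change: the struct layout, the sketch_* function names and the
"[monitored:responding]" summary format are hypothetical, and the real
hbs_cluster_set_period signature is not shown here.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for the cluster vault.
struct ClusterVault {
    std::uint16_t period_msec = 0;      // current heartbeat period
    std::uint16_t monitored_hosts = 0;  // hosts being heartbeated
    std::uint16_t responding_hosts = 0; // hosts answering heartbeats
};

// Previous behaviour on a period change (sketch): a full re-init,
// which also wipes monitored_hosts and responding_hosts.
void sketch_vault_init(ClusterVault& vault, std::uint16_t period_msec) {
    vault = ClusterVault{};
    vault.period_msec = period_msec;
}

// Behaviour after the fix (sketch): update only the period and
// leave the rest of the vault untouched.
void sketch_set_period(ClusterVault& vault, std::uint16_t period_msec) {
    vault.period_msec = period_msec;
}

// Assumed summary format "[monitored:responding]", mirroring the
// "[0:0]" notation used above for an effectively empty response.
std::string sketch_query_summary(const ClusterVault& vault) {
    return "[" + std::to_string(vault.monitored_hosts) + ":" +
           std::to_string(vault.responding_hosts) + "]";
}
```

With two monitored and two responding hosts, a period change through
sketch_set_period leaves the summary at "[2:2]", whereas the same
change through sketch_vault_init drops it to "[0:0]".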

Test Plan:

PASS: Verify hbsAgent always continues to send SM valid cluster vault
      data that includes the current heartbeat period, even over a
      heartbeat period update for hbsAgent on both controllers.

Regression:

PASS: Verify hbsAgent cluster data content steady state and over
      a heartbeat period change.
PASS: Verify handling of spontaneous reboot of standby controller over
      a wide range of heartbeat periods.

Closes-Bug: 2144023
Change-Id: I5b5a24837987b6bc21d65325e4842d7e68014899
Signed-off-by: Eric Macdonald <eric.macdonald@windriver.com>