Eric Macdonald 7c27500faa Prevent Mtce Heartbeat Cluster reinit over heartbeat period modify
Maintenance Heartbeat keeps a rotating summary of the last 20
heartbeat results for both controllers over both monitored networks
(mgmt and clstr). This summary is referred to as the Cluster Vault.
It shares this vault with SM on cluster state changes, or on request
from SM fault handling, as a key decision datapoint in SM's action
handling.
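
As a rough illustration of a rotating summary of the last 20 results,
the sketch below uses a fixed-size ring buffer. All type, field and
function names here are hypothetical and chosen for illustration only;
they are not the actual mtce data structures.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical depth, taken from "the last 20 heartbeat results".
constexpr std::size_t kHistoryDepth = 20;

// Hypothetical per-period result: how many hosts were monitored and
// how many of them responded.
struct HeartbeatSample {
    std::uint16_t hosts_monitored = 0;
    std::uint16_t hosts_responding = 0;
};

class HeartbeatHistory {
public:
    // Record a new result, overwriting the oldest once the buffer is full.
    void push(const HeartbeatSample& sample) {
        samples_[next_] = sample;
        next_ = (next_ + 1) % kHistoryDepth;
        if (count_ < kHistoryDepth) {
            ++count_;
        }
    }

    // Number of results currently held (at most kHistoryDepth).
    std::size_t size() const { return count_; }

    // Return the i-th most recent result (0 is the newest).
    const HeartbeatSample& recent(std::size_t i) const {
        assert(i < count_);
        return samples_[(next_ + kHistoryDepth - 1 - i) % kHistoryDepth];
    }

private:
    std::array<HeartbeatSample, kHistoryDepth> samples_{};
    std::size_t next_ = 0;   // slot the next push() will write
    std::size_t count_ = 0;  // number of valid slots
};
```

Once the 21st result is pushed, the oldest one is overwritten, so the
history always covers only the most recent 20 heartbeat periods.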

Unfortunately, when the maintenance heartbeat period is modified, the
hbsAgent re-initializes the cluster vault. That re-initialization
clears the vault's 'monitored_hosts' field, which is the problem.

When the cluster vault's 'monitored_hosts' field gets cleared, the
cluster module sees that there are no monitored hosts. It stops sending
cluster state change events to SM and responds to explicit SM cluster
query requests with effectively null data [0:0], i.e. no monitored
hosts and no responding hosts.

This state is only cleared by restarting the hbsAgent on both
controllers.

This update fixes that issue by introducing a hbs_cluster_set_period
function that replaces the cluster vault init with a simple heartbeat
period update.

This preserves monitored_hosts data in the cluster vault and keeps SM
informed of the current heartbeat period.
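
The difference between the two behaviours can be sketched as follows.
This is a simplified model under stated assumptions, not the actual
hbsAgent code: the ClusterVault struct, its fields and the three
functions below are hypothetical stand-ins, and the real
hbs_cluster_set_period signature is not shown in this commit message.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified stand-in for the cluster vault.
struct ClusterVault {
    std::uint16_t period_msec = 0;       // current heartbeat period
    std::uint16_t monitored_hosts = 0;   // hosts being monitored
    std::uint16_t responding_hosts = 0;  // hosts that responded
};

// Behaviour before the fix, as described above: a period change
// re-initializes the whole vault, which also clears monitored_hosts.
void vault_init(ClusterVault& vault, std::uint16_t period_msec) {
    vault = ClusterVault{};
    vault.period_msec = period_msec;
}

// Behaviour after the fix: only the period is updated, so the
// monitored_hosts data already in the vault is preserved.
void vault_set_period(ClusterVault& vault, std::uint16_t period_msec) {
    vault.period_msec = period_msec;
}

// Models the check described above: with no monitored hosts there is
// nothing meaningful to report to SM.
bool has_monitored_hosts(const ClusterVault& vault) {
    return vault.monitored_hosts != 0;
}
```

In this model, calling vault_init on a vault with two monitored hosts
leaves has_monitored_hosts returning false, while vault_set_period
changes the period and leaves the host counts untouched.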

Test Plan:

PASS: Verify hbsAgent always continues to send SM valid cluster vault
      data that includes the current heartbeat period, even over a
      heartbeat period update for hbsAgent on both controllers.

Regression:

PASS: Verify hbsAgent cluster data content steady state and over
      a heartbeat period change.
PASS: Verify handling of spontaneous reboot of standby controller over
      a wide range of heartbeat periods.

Closes-Bug: 2144023
Change-Id: I5b5a24837987b6bc21d65325e4842d7e68014899
Signed-off-by: Eric Macdonald <eric.macdonald@windriver.com>
2026-03-12 17:18:06 -04:00

metal

The starlingx/metal repository handles StarlingX Bare Metal Management [1].

This repository is not intended to be developed standalone, but rather as part of the StarlingX Source System, which is defined by the StarlingX manifest [2].

References


  1. https://docs.starlingx.io/api-ref/metal

  2. https://opendev.org/starlingx/manifest.git

Description
StarlingX Bare Metal and Node Management, Hardware Maintenance
Languages
C++ 83.1%
Shell 10.1%
Python 3.2%
C 2.5%
Makefile 1%