StarlingX Bare Metal and Node Management, Hardware Maintenance
Eric MacDonald 5ab03b5222 Mtce heartbeat cluster state change notification improvement
The current heartbeat cluster state change notification
needs to be sent when heartbeat pulses begin to be missed
rather than only after the host has reached the Heartbeat
Loss threshold. This buys SM more time, almost a full
second, and in doing so provides more accurate data for
it to make its SM heartbeat failure handling decisions.

This update also begins sending maintenance heartbeat
cluster state change notifications just before the next
multicast pulse request but after the cluster vault is
updated from the last pulse period. This ensures that
SM gets the most up-to-date cluster information.
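The ordering described above can be illustrated with a minimal Python sketch. It is not the hbsAgent implementation: the `ClusterVault` class, the `classify` and `run_period` helpers, the `LOSS_THRESHOLD` value, and the state names are all hypothetical, chosen only to show a notification firing as soon as pulses begin to be missed and going out after the vault update but before the next pulse request.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical value: consecutive misses before Heartbeat Loss is declared.
LOSS_THRESHOLD = 10


def classify(misses: int) -> str:
    """Map a consecutive-miss count to a coarse (hypothetical) host state."""
    if misses == 0:
        return "responding"
    if misses < LOSS_THRESHOLD:
        return "missing"
    return "lost"


class ClusterVault:
    """Hypothetical stand-in for the cluster vault: miss counts per host."""

    def __init__(self) -> None:
        self.misses: Dict[str, int] = {}

    def update(self, expected: Iterable[str], responded: Set[str]) -> bool:
        """Fold in one pulse period; return True if any host changed state."""
        changed = False
        for host in expected:
            before = self.misses.get(host, 0)
            after = 0 if host in responded else before + 1
            self.misses[host] = after
            if classify(before) != classify(after):
                changed = True
        return changed

    def snapshot(self) -> Dict[str, str]:
        """Return the current state of every known host."""
        return {host: classify(n) for host, n in self.misses.items()}


def run_period(
    vault: ClusterVault,
    expected: Iterable[str],
    responded: Set[str],
    notify: Callable[[Dict[str, str]], None],
    send_pulse_request: Callable[[], None],
) -> None:
    """One heartbeat period: update vault, notify on change, then pulse."""
    if vault.update(expected, responded):
        # Sent on the first missed pulse, not only at LOSS_THRESHOLD.
        notify(vault.snapshot())
    send_pulse_request()
```

In this sketch a host that stops responding triggers a notification in the very next period, well before its miss count reaches the loss threshold, and the notification always carries the vault contents from the period that just ended.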

This update also changes the hbsAgent's service file
to depend on the local hbsClient. By doing so, the
hbsAgent shuts down earlier during a graceful reboot,
thereby preventing the hbsAgent from continuing to
report a healthy response to the inactive controller
during active controller shutdown.

This way, the inactive SM sees the failed active
controller when it queries the cluster in its
fail-pending state, resulting in an inactive SM
take-over rather than a stand-down.

Additional hbsAgent service file changes prevent
systemd from auto-recovering a failed hbsAgent
process, as it is monitored and managed by pmond,
and fix the ExecStop command line.
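The service file changes described above might look roughly like the following systemd unit fragment. This is a sketch under assumptions, not the actual hbsAgent.service: the unit name hbsClient.service is assumed, and the real file may express the dependency with other directives. The corrected ExecStop line is not shown because the commit message does not give it.

```ini
[Unit]
# Assumed unit name. systemd stops units in the reverse of their start
# order, so ordering the agent after the client makes the agent stop first.
After=hbsClient.service

[Service]
# Leave recovery of a failed hbsAgent to pmond rather than systemd.
Restart=no
```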

Test Plan:

PASS: Verify active controller graceful reboot.
      Standby controller takes over rather than shutting down
      - 30 of 30 iterations
PASS: Verify active controller forced reboot
PASS: Verify enabled standby controller graceful reboot
PASS: Verify Standard System install
PASS: Verify AIO DX system install

Regression:

PASS: Verify SM Uncontrolled Swact if active
      controller management link drops.
PASS: Verify handling of downed cluster interface in
      AIO DX (fail) and Standard (degrade) systems
PASS: Verify no coredumps
PASS: Verify update as a patch

Change-Id: I6869631e091eb28a3cbb6f15d9a8ccd939c54410
Closes-Bug: 1906556
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2021-01-08 09:59:24 -05:00
Path | Last commit | Date
api-ref/source | Switch to newer openstackdocstheme and reno versions | 2020-06-04 14:32:46 +02:00
bsp-files | Revert "Enable 'rcu_nocb_poll' kernel config option" | 2020-10-27 13:53:04 +00:00
devstack | Security: Handle nospectre_v1 in the bootargs | 2020-01-28 18:21:13 -05:00
doc | Switch to newer openstackdocstheme and reno versions | 2020-06-04 14:32:46 +02:00
installer | Remove reference to cgcs-centos-repo | 2020-09-18 16:04:44 -04:00
kickstart | Drop isolcpu from AIO/worker kickstarts | 2020-06-19 02:08:28 -04:00
mtce | Mtce heartbeat cluster state change notification improvement | 2021-01-08 09:59:24 -05:00
mtce-common | Add SM process heartbeat and status to the hbs cluster | 2020-12-10 11:13:13 -05:00
mtce-compute | Add auto-versioning to starlingx/metal mtce packages | 2020-05-21 15:18:43 -04:00
mtce-control | Mtce heartbeat cluster state change notification improvement | 2021-01-08 09:59:24 -05:00
mtce-storage | Add auto-versioning to starlingx/metal mtce packages | 2020-05-21 15:18:43 -04:00
releasenotes | Switch to newer openstackdocstheme and reno versions | 2020-06-04 14:32:46 +02:00
tools/rvmc/centos | Redfish Virtual Media Controller enhancements | 2020-08-17 21:14:50 +00:00
.gitignore | Update tox.ini files to use stein constraints | 2019-06-25 13:20:35 -04:00
.gitreview | OpenDev Migration Patch | 2019-04-19 19:52:33 +00:00
.zuul.yaml | Tox and Zuul job for the bandit code scan in starlingx/metal | 2020-06-29 08:24:46 +00:00
CONTRIBUTORS.wrs | StarlingX open source release updates | 2018-05-31 07:36:43 -07:00
LICENSE | StarlingX open source release updates | 2018-05-31 07:36:43 -07:00
README.rst | Followup opendev cleanup and test jobs | 2019-04-22 16:42:03 +00:00
centos_build_layer.cfg | Build layering, add layer build config file | 2019-10-15 19:19:45 +08:00
centos_iso_image.inc | Remove unused inventory and python-inventoryclient | 2020-01-08 14:12:05 -06:00
centos_pkg_dirs | rvmc: remove un-used build data | 2020-01-16 08:39:54 -08:00
centos_stable_docker_images.inc | Utility to install a server via Redfish | 2019-12-31 15:34:54 +00:00
pylint.rc | Add pylint checks for python files in metal | 2020-01-03 13:27:00 -06:00
test-requirements.txt | Tox and Zuul job for the bandit code scan in starlingx/metal | 2020-06-29 08:24:46 +00:00
tox.ini | Use newer flake8 to run on ubuntu-focal Zuul machines | 2020-09-09 17:59:49 -04:00

README.rst

metal

StarlingX Bare Metal Management