StarlingX Bare Metal and Node Management, Hardware Maintenance
Eric MacDonald ba6c61584d Refactor background in-service start host services handling
The maintenance add_handler FSM loads inventory and recovers
host state across a process restart. If the active controller's
uptime is less than 15 minutes, the restart event is treated as
a Dead Office Recovery (DOR) and host recovery is more forgiving:
'start host services' is scheduled as a background operation
so that it does not hold up the add operation.
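The DOR decision described above can be sketched as a small model. This is a minimal, hypothetical Python sketch, not the maintenance code itself: `HostRecord`, `AddHandler`, `background_queue` and `run_background` are illustrative names, and the only detail taken from the commit text is the 15-minute uptime threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# From the commit text: an active controller uptime below 15 minutes
# marks the restart as a Dead Office Recovery (DOR).
DOR_UPTIME_THRESHOLD_SECS = 15 * 60


@dataclass
class HostRecord:
    """Hypothetical stand-in for a host loaded from inventory."""
    hostname: str
    services_started: bool = False


@dataclass
class AddHandler:
    """Hypothetical model of the add handler's 'start host services' choice."""
    active_controller_uptime_secs: int
    background_queue: List[str] = field(default_factory=list)

    def is_dor_mode(self) -> bool:
        return self.active_controller_uptime_secs < DOR_UPTIME_THRESHOLD_SECS

    def add_host(self, host: HostRecord) -> None:
        if self.is_dor_mode():
            # DOR: defer 'start host services' so the add is not held up.
            self.background_queue.append(host.hostname)
        else:
            # Normal restart: start host services inline during the add.
            host.services_started = True

    def run_background(self, hosts: Dict[str, HostRecord]) -> None:
        """Drain the deferred queue once the add operation has completed."""
        while self.background_queue:
            name = self.background_queue.pop(0)
            hosts[name].services_started = True
```

With a 5-minute controller uptime, `AddHandler(300).add_host(...)` only queues the host, and its services are started later by `run_background`; with a 60-minute uptime the same call starts them inline.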

The current implementation of the background handling of
'start host services' does not handle the AIO subfunction
case properly in DOR mode, and it is difficult to follow
and therefore to fix and maintain. This mishandling leads
maintenance to incorrectly fail the node with a subfunction
configuration error in the DOR case.
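One way to picture the AIO problem is a tracker that records the background start of each host function separately. This is a hypothetical sketch, assuming that an AIO controller needs both its controller function and its worker subfunction started, and that only a reported failure (not a start still pending in the background) should raise a subfunction configuration error; `BackgroundStartTracker` and `StartState` are illustrative names, not part of the maintenance code.

```python
from enum import Enum, auto
from typing import Dict, List


class StartState(Enum):
    PENDING = auto()  # queued in the background, not yet run
    DONE = auto()     # 'start host services' completed
    FAILED = auto()   # 'start host services' reported a failure


class BackgroundStartTracker:
    """Hypothetical per-host, per-function tracker of background starts."""

    def __init__(self) -> None:
        self.state: Dict[str, Dict[str, StartState]] = {}

    def schedule(self, hostname: str, functions: List[str]) -> None:
        # Assumption: an AIO controller lists both "controller" and
        # "worker", so each function's start is scheduled explicitly.
        self.state[hostname] = {f: StartState.PENDING for f in functions}

    def complete(self, hostname: str, function: str, ok: bool) -> None:
        self.state[hostname][function] = StartState.DONE if ok else StartState.FAILED

    def subfunction_config_error(self, hostname: str) -> bool:
        # Only a reported worker failure counts as an error; a worker
        # start still pending in the background does not.
        return self.state[hostname].get("worker") is StartState.FAILED
```

Keeping the subfunction's state separate means a DOR recovery can report the controller function as started while the worker subfunction is still pending, without the pending state being mistaken for a failure.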

This update refactors the background handling of 'start host
services' to fix the issue and improve its clarity and
maintainability.

Test Cases:

PASS: Verify AIO DX DOR handling
PASS: Verify AIO DX active controller reboot handling
      - standby with uptime < 15 min and > 15 min
PASS: Verify AIO DX standby controller reboot handling
PASS: Verify subfunction configuration error handling

Regression:

PASS: Verify start host services wait/retry handling.
PASS: Verify start host services failure handling.
PASS: Verify DOR of Standard system
PASS: Verify DOR of AIO Plus system
PASS: Verify AIO System Install
PASS: Verify Standard System Install
PASS: Verify AIO plus system install

Change-Id: Ia4683672e3a2852b5b4837167b2dcd2a1e4e6d57
Closes-Bug: 1928095
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2021-05-11 12:25:27 -04:00
api-ref/source Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
bsp-files Restrict isolcpu_plugin to nodes with worker function 2021-04-06 14:25:58 +00:00
devstack Security: Handle nospectre_v1 in the bootargs 2020-01-28 18:21:13 -05:00
doc Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
installer Add /pxeboot/grubx64.efi symlink for UEFI pxeboot 2021-05-07 08:56:06 -04:00
kickstart Drop isolcpu from AIO/worker kickstarts 2020-06-19 02:08:28 -04:00
mtce Refactor background in-service start host services handling 2021-05-11 12:25:27 -04:00
mtce-common Improved maintenance handling of spontaneous active controller reboot 2021-04-30 15:35:53 +00:00
mtce-compute Add auto-versioning to starlingx/metal mtce packages 2020-05-21 15:18:43 -04:00
mtce-control Mtce heartbeat cluster state change notification improvement 2021-01-08 09:59:24 -05:00
mtce-storage Add auto-versioning to starlingx/metal mtce packages 2020-05-21 15:18:43 -04:00
releasenotes Switch to newer openstackdocstheme and reno versions 2020-06-04 14:32:46 +02:00
tools/rvmc/centos Redfish Virtual Media Controller enhancements 2020-08-17 21:14:50 +00:00
.gitignore Update tox.ini files to use stein constraints 2019-06-25 13:20:35 -04:00
.gitreview OpenDev Migration Patch 2019-04-19 19:52:33 +00:00
.zuul.yaml Tox and Zuul job for the bandit code scan in starlingx/metal 2020-06-29 08:24:46 +00:00
centos_build_layer.cfg Build layering, add layer build config file 2019-10-15 19:19:45 +08:00
centos_iso_image.inc Remove unused inventory and python-inventoryclient 2020-01-08 14:12:05 -06:00
centos_pkg_dirs rvmc: remove un-used build data 2020-01-16 08:39:54 -08:00
centos_stable_docker_images.inc Utility to install a server via Redfish 2019-12-31 15:34:54 +00:00
CONTRIBUTORS.wrs StarlingX open source release updates 2018-05-31 07:36:43 -07:00
LICENSE StarlingX open source release updates 2018-05-31 07:36:43 -07:00
pylint.rc Add pylint checks for python files in metal 2020-01-03 13:27:00 -06:00
README.rst Followup opendev cleanup and test jobs 2019-04-22 16:42:03 +00:00
test-requirements.txt Tox and Zuul job for the bandit code scan in starlingx/metal 2020-06-29 08:24:46 +00:00
tox.ini Use newer flake8 to run on ubuntu-focal Zuul machines 2020-09-09 17:59:49 -04:00

metal

StarlingX Bare Metal Management