cb55cc8ce5
When running a minor update in a composable HA, different roles can run
their ansible tasks concurrently. However, there is currently a race when
pacemaker nodes are stopped in parallel [1,2], which can cause nodes to
incorrectly stop themselves once they reconnect to the cluster.

To prevent concurrent shutdowns, use a cluster-wide lock to signal that
one node is about to shut down, and block the others until that node
disconnects from the cluster.

Tested the minor update in a composable HA environment:

  . when run with "openstack update run", every role is updated
    sequentially, and the shutdown lock doesn't interfere.
  . when running multiple ansible tasks in parallel with
    "openstack update run --limit role<X>", pacemaker nodes are correctly
    stopped sequentially thanks to the shutdown lock.
  . when updating an existing overcloud, the new locking script used in
    the review is correctly injected on the overcloud, thanks to [3].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1791841
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1872404
[3] I2ac6bb98e1d4183327e888240fc8d5a70e0d6fcb

Closes-Bug: #1904193
Change-Id: I0e041c6a95a7f53019967f9263df2326b1408c6f
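The actual locking mechanism lives in the `pacemaker_mutex_shutdown.sh` and
`pacemaker_resource_lock.sh` scripts listed below. As a rough illustration of
the idea only, here is a minimal sketch of such a shutdown mutex built on a
pacemaker cluster property: the property name, helper names, and the poll
interval are hypothetical, and the naive read-then-write claim shown here is
not atomic, whereas the real scripts implement a proper compare-and-swap.

```bash
#!/bin/bash
# Sketch of a cluster-wide shutdown mutex (illustrative, NOT the actual
# pacemaker_mutex_shutdown.sh). Assumes a hypothetical cluster property
# used as the lock, holding the name of the node that is shutting down.

LOCK_NAME="tripleo-shutdown-lock"   # hypothetical property name
NODE="$(crm_node -n)"               # name of the local cluster node

acquire_shutdown_lock() {
    while true; do
        holder="$(crm_attribute --type crm_config --name "$LOCK_NAME" \
                                --query --quiet 2>/dev/null)"
        # The lock is free if unset, held by us, or held by a node that
        # has already disconnected from the cluster membership.
        if [ -z "$holder" ] || [ "$holder" = "$NODE" ] || \
           ! crm_node -l | grep -q " $holder member"; then
            # Claim the lock. NOTE: this read-then-write is racy; the
            # real script uses an atomic compare-and-swap so two nodes
            # cannot claim the lock simultaneously.
            crm_attribute --type crm_config --name "$LOCK_NAME" \
                          --update "$NODE"
            return 0
        fi
        sleep 10   # another node is still shutting down; keep waiting
    done
}

acquire_shutdown_lock
pcs cluster stop   # stop pacemaker/corosync on the local node only
# No explicit release: once this node leaves the membership, the next
# waiter sees a stale holder and takes the lock over.
```

Letting waiters steal the lock from a departed holder is what ties the lock
lifetime to cluster membership, matching the intent above of blocking other
nodes only until the stopping node disconnects from the cluster.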
monitoring
tests
__init__.py
nova_statedir_ownership.py
nova_wait_for_api_service.py
nova_wait_for_compute_service.py
pacemaker_mutex_restart_bundle.sh
pacemaker_mutex_shutdown.sh
pacemaker_resource_lock.sh
pacemaker_restart_bundle.sh
pacemaker_wait_bundle.sh
placement_wait_for_service.py
pyshim.sh
wait-port-and-run.sh