8e9798caf6
When running a minor update in a composable HA
deployment, different roles can run ansible tasks
concurrently. However, there is currently a race when
pacemaker nodes are stopped in parallel [1,2], which can
cause nodes to incorrectly stop themselves once they
reconnect to the cluster.
To prevent concurrent shutdowns, use a cluster-wide lock
to signal that one node is about to shut down, and block
the others until that node disconnects from the cluster.
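The locking scheme can be sketched as a compare-and-swap
on a cluster-wide attribute: a node only proceeds with its
shutdown once it has won ownership of the lock. This is a
minimal sketch, not the actual pacemaker_mutex_shutdown.sh;
the attribute name and the crm_attribute defaults are
assumptions, and both commands are overridable.

```shell
#!/bin/bash
# Sketch of a cluster-wide shutdown mutex. The lock is a
# single cluster attribute holding the owner's node name.
# LOCK_NAME, LOCK_GET and LOCK_SET are hypothetical and
# overridable; defaults use pacemaker's crm_attribute.

LOCK_NAME=${LOCK_NAME:-shutdown-lock}
LOCK_GET=${LOCK_GET:-"crm_attribute --type crm_config --name $LOCK_NAME --query --quiet"}
LOCK_SET=${LOCK_SET:-"crm_attribute --type crm_config --name $LOCK_NAME --update"}

# Try once to take the lock for node "$1". Succeeds if the
# lock is free or already ours, and the post-write re-read
# confirms we won any race with a concurrent claimer.
try_acquire_lock() {
    local me="$1"
    local owner
    owner=$($LOCK_GET 2>/dev/null)
    if [ -z "$owner" ] || [ "$owner" = "$me" ]; then
        $LOCK_SET "$me"
        [ "$($LOCK_GET 2>/dev/null)" = "$me" ]
    else
        return 1
    fi
}

# Block until the lock is acquired, polling every 5s, and
# give up after "$2" attempts (default 60).
acquire_lock_blocking() {
    local me="$1" tries="${2:-60}"
    local i=0
    until try_acquire_lock "$me"; do
        i=$((i + 1))
        [ "$i" -ge "$tries" ] && return 1
        sleep 5
    done
}
```

A node would call acquire_lock_blocking "$(hostname)" before
stopping its pacemaker resources, and the attribute would be
cleared once the node has left the cluster, unblocking the
next node.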
Tested the minor update in a composable HA environment:
. when run with "openstack update run", every role
  is updated sequentially, and the shutdown lock
  doesn't interfere.
. when running multiple ansible tasks in parallel
  with "openstack update run --limit role<X>",
  pacemaker nodes are correctly stopped sequentially
  thanks to the shutdown lock.
. when updating an existing overcloud, the new
  locking script used in this review is correctly
  injected into the overcloud, thanks to [3].
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1791841
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1872404
[3] I2ac6bb98e1d4183327e888240fc8d5a70e0d6fcb
Closes-Bug: #1904193
Change-Id: I0e041c6a95a7f53019967f9263df2326b1408c6f
(cherry picked from commit
---
monitoring
tests
__init__.py
nova_statedir_ownership.py
nova_wait_for_api_service.py
nova_wait_for_compute_service.py
pacemaker_mutex_restart_bundle.sh
pacemaker_mutex_shutdown.sh
pacemaker_resource_lock.sh
pacemaker_restart_bundle.sh
pacemaker_wait_bundle.sh
placement_wait_for_service.py
pyshim.sh
wait-port-and-run.sh