7f785e8757
For each HA service we have a paunch container <service>_restart_bundle, which
is started by paunch whenever config files change during a stack
deploy/update. This container runs a pcs command on a single node to restart
all of the service's containers (e.g. all galera containers on all
controllers). By design, when it is run, the configs have already been
regenerated by the deploy tasks on all nodes.

For minor updates, the workflow runs differently: all the steps of the deploy
tasks are run one node after the other, so when <service>_restart_bundle is
called, there is no guarantee that the service's configs have been regenerated
on all the nodes yet.

To fix the wrong restart behaviour, only restart the local containers when
running during a minor update, and do so once per node. When the minor update
workflow calls <service>_restart_bundle, we still have the guarantee that the
config files have already been regenerated locally.

Co-Authored-By: Michele Baldessari <michele@acksyn.org>
Co-Authored-By: Luca Miccini <lmiccini@redhat.com>
Change-Id: I92d4ddf2feeac06ce14468ae928c283f3fd04f45
Closes-Bug: #1841629
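In essence, the restart now takes one of two forms (a sketch distilled from
the script below; <service>, <bundle> and <timeout> are placeholders, not
literal values):

    # stack deploy/update: one cluster-wide restart, issued from the bootstrap node
    /usr/bin/bootstrap_host_exec <service> pcs resource restart --wait=<timeout> <bundle>

    # minor update: one restart per node, limited to the replica running locally
    pcs resource restart --wait=<timeout> <bundle> "$(facter hostname)"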
#!/bin/bash

set -u

# Usage: ./pacemaker_restart_bundle.sh galera-bundle galera
RESOURCE=$1
TRIPLEO_SERVICE=$2

# Only try a restart if the resource has already been created
if /usr/sbin/pcs resource show $RESOURCE; then
    if [ x"${TRIPLEO_MINOR_UPDATE,,}" != x"true" ]; then
        # During a stack update, this script is called in parallel on
        # every node the resource runs on, after the service's configs
        # have been updated on all nodes. So we need to run pcs only
        # once (e.g. on the service's bootstrap node).
        echo "$(date -u): Restarting ${RESOURCE} globally"
        /usr/bin/bootstrap_host_exec $TRIPLEO_SERVICE /sbin/pcs resource restart --wait=__PCMKTIMEOUT__ $RESOURCE
    else
        # During a minor update workflow however, a host gets fully
        # updated before the next one is updated. So unlike a stack
        # update, at the time this script is called, the service's
        # configs aren't updated on all nodes yet. Only restart the
        # resource locally, where the config is guaranteed to be
        # up to date.
        HOST=$(facter hostname)
        # XPath rationale: as long as there is a bundle running
        # locally and it is managed by pacemaker, no matter the state
        # of any inner pcmk_remote or ocf resource, we should restart
        # it to give it a chance to read the new config.
        # XPath rationale 2: if the resource is being stopped, the
        # attribute "target_role" will be present in the output of
        # crm_mon. Do not restart the resource if that is the case.
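        # For illustration only: the rough shape of the crm_mon XML that the
        # XPath below is meant to match (element and attribute names inferred
        # from the XPath itself, not verbatim crm_mon output):
        #   <bundle id="galera-bundle">
        #     <replica id="0">
        #       <resource id="..." managed="true" ...>
        #         <node name="controller-0" .../>
        #       </resource>
        #     </replica>
        #   </bundle>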
        if crm_mon -r --as-xml | xmllint --format --xpath "//bundle[@id='${RESOURCE}']/replica/resource[@managed='true' and (not(boolean(@target_role)) or (boolean(@target_role) and @target_role!='Stopped'))]/node[@name='${HOST}']/../.." - &>/dev/null; then
            echo "$(date -u): Restarting ${RESOURCE} locally on '${HOST}'"
            /sbin/pcs resource restart --wait=__PCMKTIMEOUT__ $RESOURCE "${HOST}"
        else
            echo "$(date -u): Resource ${RESOURCE} currently not running on '${HOST}', no restart needed"
        fi
    fi
fi
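A minimal usage sketch (hypothetical invocations, assuming the script is
installed as pacemaker_restart_bundle.sh, that __PCMKTIMEOUT__ has been
substituted with a real timeout at deploy time, and that TRIPLEO_MINOR_UPDATE
is passed in via the environment):

    # stack deploy/update: a single cluster-wide restart from the bootstrap node
    TRIPLEO_MINOR_UPDATE=false ./pacemaker_restart_bundle.sh galera-bundle galera

    # minor update: run on each node in turn, restarting only the local replica
    TRIPLEO_MINOR_UPDATE=true ./pacemaker_restart_bundle.sh galera-bundle galera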