tripleo-heat-templates/container_config_scripts/pacemaker_restart_bundle.sh
Michele Baldessari dcfc98d236 Fix pcs restart in composable HA
When a redeploy command is run in a composable HA environment and there
are any configuration changes, the <bundle>_restart containers will be kicked
off. These restart containers will then try to restart the bundles globally in
the cluster.

These restarts will be fired off in parallel from different nodes. So
haproxy-bundle will be restarted from controller-0, mysql-bundle from
database-0, rabbitmq-bundle from messaging-0.

This has proven to be problematic and very often (rhbz#1868113) it would fail
the redeploy with:
2020-08-11T13:40:25.996896822+00:00 stderr F Error: Could not complete shutdown of rabbitmq-bundle, 1 resources remaining
2020-08-11T13:40:25.996896822+00:00 stderr F Error performing operation: Timer expired
2020-08-11T13:40:25.996896822+00:00 stderr F Set 'rabbitmq-bundle' option: id=rabbitmq-bundle-meta_attributes-target-role set=rabbitmq-bundle-meta_attributes name=target-role value=stopped
2020-08-11T13:40:25.996896822+00:00 stderr F Waiting for 2 resources to stop:
2020-08-11T13:40:25.996896822+00:00 stderr F * galera-bundle
2020-08-11T13:40:25.996896822+00:00 stderr F * rabbitmq-bundle
2020-08-11T13:40:25.996896822+00:00 stderr F * galera-bundle
2020-08-11T13:40:25.996896822+00:00 stderr F Deleted 'rabbitmq-bundle' option: id=rabbitmq-bundle-meta_attributes-target-role name=target-role
2020-08-11T13:40:25.996896822+00:00 stderr F

or

2020-08-11T13:39:49.197487180+00:00 stderr F Waiting for 2 resources to start again:
2020-08-11T13:39:49.197487180+00:00 stderr F * galera-bundle
2020-08-11T13:39:49.197487180+00:00 stderr F * rabbitmq-bundle
2020-08-11T13:39:49.197487180+00:00 stderr F Could not complete restart of galera-bundle, 1 resources remaining
2020-08-11T13:39:49.197487180+00:00 stderr F * rabbitmq-bundle
2020-08-11T13:39:49.197487180+00:00 stderr F

After discussing it with kgaillot it seems that concurrent restarts in pcmk are just brittle:
"""
Sadly restarts are brittle, and they do in fact assume that nothing else is causing resources to start or stop. They work like this:

- Get the current configuration and state of the cluster, including a list of active resources (list #1)
- Set resource target-role to Stopped
- Get the current configuration and state of the cluster, including a list of which resources *should* be active (list #2)
- Compare lists #1 and #2, and the difference is the resources that should stop
- Periodically refresh the configuration and state until the list of active resources matches list #2
- Delete the target-role
- Periodically refresh the configuration and state until the list of active resources matches list #1
"""

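The list comparison described in the quote can be sketched in plain Bash. This is only an illustration of the description above, not Pacemaker or pcs code: the helper name `resources_to_stop` and the cluster state are made up. It shows how a second restart running at the same time (here, one that has already stopped galera-bundle) leaks into the set of resources the rabbitmq-bundle restart waits for.

```bash
#!/bin/bash
# Hypothetical helper (not part of pcs or Pacemaker): print the resources
# that are in list #1 (active before the restart) but not in list #2
# (expected to be active once target-role is Stopped), one per line.
resources_to_stop() {
    local list1=$1 list2=$2
    comm -23 <(tr ' ' '\n' <<<"$list1" | sort) \
             <(tr ' ' '\n' <<<"$list2" | sort)
}

# Made-up cluster state: list #1 was taken while everything was running.
list1="galera-bundle haproxy-bundle rabbitmq-bundle"
# List #2 is taken after rabbitmq-bundle got target-role=Stopped, but a
# concurrent restart on another node has disabled galera-bundle as well.
list2="haproxy-bundle"

echo "Waiting for resources to stop:"
resources_to_stop "$list1" "$list2" | sed 's/^/ * /'
```

With this input the sketch lists both galera-bundle and rabbitmq-bundle, which is the same pair the failing rabbitmq-bundle restart reports in the logs above.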
So the suggestion is to replace the restarts with an enable/disable cycle of the resource.
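
A minimal sketch of such a disable/enable cycle, under stated assumptions: the function name `restart_bundle_via_disable_enable`, the overridable `PCS` variable and the timeout value are hypothetical and only serve the illustration. The two pcs invocations are the ones the script below uses. Re-enabling even after a failed stop is a choice made for this sketch, so that the bundle is not left disabled.

```bash
#!/bin/bash
# PCS can be overridden (e.g. PCS=echo) to dry-run the sketch without a
# cluster. This knob is an assumption and does not exist in the script.
PCS=${PCS:-/sbin/pcs}

# Hypothetical helper: stop the bundle, then start it again, instead of
# calling "pcs resource restart".
restart_bundle_via_disable_enable() {
    local bundle=$1 timeout=$2 rc=0
    "$PCS" resource disable --wait="$timeout" "$bundle" || rc=$?
    # Always re-enable, so a failed or timed-out stop is not left disabled.
    "$PCS" resource enable --wait="$timeout" "$bundle" || rc=$?
    return "$rc"
}
```

For example, `PCS=echo` followed by `restart_bundle_via_disable_enable rabbitmq-bundle 600` only prints the two pcs argument lists; the timeout of 600 seconds is an example value.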

Tested this on a dozen runs on a composable HA environment and did not observe the error
any longer.

Closes-Bug: #1892206

Change-Id: I9cc27b1539a62a88fb0bccac64e6b1ae9295f22e
2020-08-19 16:21:15 +02:00


#!/bin/bash
set -u
# ./pacemaker_restart_bundle.sh mysql galera galera-bundle Master _
# ./pacemaker_restart_bundle.sh redis redis redis-bundle Slave Master
# ./pacemaker_restart_bundle.sh ovn_dbs ovndb_servers ovn-dbs-bundle Slave Master
RESTART_SCRIPTS_DIR=$(dirname "$0")
TRIPLEO_SERVICE=$1
RESOURCE_NAME=$2
BUNDLE_NAME=$3
WAIT_TARGET_LOCAL=$4
WAIT_TARGET_ANYWHERE=${5:-_}
TRIPLEO_MINOR_UPDATE="${TRIPLEO_MINOR_UPDATE:-false}"
bundle_can_be_restarted() {
    local bundle=$1
    # As long as the resource bundle is managed by pacemaker and is
    # not meant to stay stopped, no matter the state of any inner
    # pcmk_remote or ocf resource, we should restart it to give it a
    # chance to read the new config.
    [ "$(crm_resource --meta -r "$bundle" -g is-managed 2>/dev/null)" != "false" ] && \
    [ "$(crm_resource --meta -r "$bundle" -g target-role 2>/dev/null)" != "Stopped" ]
}
if [ x"${TRIPLEO_MINOR_UPDATE,,}" != x"true" ]; then
    if hiera -c /etc/puppet/hiera.yaml stack_action | grep -q -x CREATE; then
        # Do not restart during initial deployment, as the resource
        # has just been created.
        exit 0
    else
        # During a stack update, this script is called in parallel on
        # every node the resource runs on, after the service's configs
        # have been updated on all nodes. So we need to run pcs only
        # once (e.g. on the service's bootstrap node).
        if bundle_can_be_restarted "${BUNDLE_NAME}"; then
            HOSTNAME=$(/bin/hostname -s)
            SERVICE_NODEID=$(/bin/hiera -c /etc/puppet/hiera.yaml "${TRIPLEO_SERVICE}_short_bootstrap_node_name")
            if [[ "${HOSTNAME,,}" == "${SERVICE_NODEID,,}" ]]; then
                replicas_running=$(crm_resource -Q -r "$BUNDLE_NAME" --locate 2>&1 | wc -l)
                if [ "$replicas_running" != "0" ]; then
                    echo "$(date -u): Restarting ${BUNDLE_NAME} globally. Stopping:"
                    /sbin/pcs resource disable --wait=__PCMKTIMEOUT__ "$BUNDLE_NAME"
                    echo "$(date -u): Restarting ${BUNDLE_NAME} globally. Starting:"
                    /sbin/pcs resource enable --wait=__PCMKTIMEOUT__ "$BUNDLE_NAME"
                else
                    echo "$(date -u): ${BUNDLE_NAME} is not running anywhere," \
                         "cleaning up to restart it globally if necessary"
                    /sbin/pcs resource cleanup "$BUNDLE_NAME"
                fi
            else
                echo "$(date -u): Skipping global restart of ${BUNDLE_NAME} on ${HOSTNAME}, it will be restarted by node ${SERVICE_NODEID}"
            fi
        else
            echo "$(date -u): No global restart needed for ${BUNDLE_NAME}."
        fi
    fi
else
    # During a minor update workflow however, a host gets fully
    # updated before updating the next one. So unlike stack
    # update, at the time this script is called, the service's
    # configs aren't updated on all nodes yet. So only restart the
    # resource locally, where it's guaranteed that the config is
    # up to date.
    HOST=$(facter hostname)
    if bundle_can_be_restarted "${BUNDLE_NAME}"; then
        # if the resource is running locally, restart it
        if crm_resource -r "$BUNDLE_NAME" --locate 2>&1 | grep -w -q "${HOST}"; then
            echo "$(date -u): Restarting ${BUNDLE_NAME} locally on '${HOST}'"
            /sbin/pcs resource restart "$BUNDLE_NAME" "${HOST}"
        else
            # At this point, if no resource is running locally, it's
            # either because a) it has failed previously, or b) because
            # it's an A/P resource running elsewhere.
            # By cleaning up resource, we ensure that a) it will try to
            # restart, or b) it won't do anything if the resource is
            # already running elsewhere.
            echo "$(date -u): ${BUNDLE_NAME} is currently not running on '${HOST}'," \
                 "cleaning up its state to restart it if necessary"
            /sbin/pcs resource cleanup "$BUNDLE_NAME" --node "${HOST}"
        fi
        # Wait until the resource is in the expected target state
        "$RESTART_SCRIPTS_DIR"/pacemaker_wait_bundle.sh \
            "$RESOURCE_NAME" "$BUNDLE_NAME" \
            "$WAIT_TARGET_LOCAL" "$WAIT_TARGET_ANYWHERE" \
            "${HOST}" __PCMKTIMEOUT__
    else
        echo "$(date -u): No restart needed for ${BUNDLE_NAME}."
    fi
fi
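
The stack-update branch of the script runs pcs from a single node by comparing the local short hostname with the service's bootstrap node name, ignoring case. That gate can be tried in isolation with the sketch below. The helper name `is_bootstrap_node` and the host names are hypothetical; only the `${VAR,,}` lowercase comparison is taken from the script, and it needs Bash 4 or later.

```bash
#!/bin/bash
# Hypothetical helper (not in the original script): succeed when the two
# host names match, ignoring case, as the ${VAR,,} comparison does.
is_bootstrap_node() {
    local local_host=$1 bootstrap_host=$2
    [[ "${local_host,,}" == "${bootstrap_host,,}" ]]
}

# Made-up host names, for illustration only.
if is_bootstrap_node "Messaging-0" "messaging-0"; then
    echo "this node runs the global restart"
else
    echo "another node runs the global restart"
fi
```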