Avoid simultaneous restarts

It is possible for two or more percona-cluster units to simultaneously
attempt to restart and join the cluster. When this race condition
occurs one unit may error with:
"Failed to start mysql (max retries reached)"

We already have the control mechanism distributed_wait used in other
charms. This change implements this mechanism for percona-cluster.

Configuration options allow for fine tuning. The trade-off is time versus
tolerance for collision errors. CI systems may tolerate the occasional
false positive in exchange for time saved, whereas production deployments
can sacrifice a bit of time for a guaranteed deploy.
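
As a rough illustration of the trade-off, the sketch below staggers restarts
by unit number. It assumes the delay is computed as (unit number % modulo) *
wait; the helper name staggered_delay and the default values are hypothetical
and are not taken from this change or from charmhelpers.

```python
def staggered_delay(unit_name, modulo=3, wait=30):
    """Return the seconds a unit sleeps before restarting.

    Hypothetical helper: assumes a unit name of the form "app/N" and a
    delay of (N % modulo) * wait seconds.
    """
    unit_number = int(unit_name.split("/")[1])
    return (unit_number % modulo) * wait


if __name__ == "__main__":
    for unit in ("percona-cluster/0", "percona-cluster/1", "percona-cluster/2"):
        # Each unit gets a different delay, so restarts do not overlap.
        print(unit, staggered_delay(unit))
```

Under these assumptions a larger modulo or wait spreads the restarts further
apart at the cost of a longer deploy, while a wait of 0 removes the delay and
accepts the risk of collisions. Units whose numbers are equal modulo the
chosen value still share a delay.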

Change-Id: I52e7f8e410ecd77a7a142d44b43414e33eff3a6e
Closes-Bug: #1745432
David Ames
2018-01-25 15:20:29 -08:00
parent dc19ecb4a3
commit bd5474ce2f
4 changed files with 86 additions and 0 deletions

@@ -106,6 +106,7 @@ from percona_utils import (
    update_bootstrap_uuid,
    LeaderNoBootstrapUUIDError,
    update_root_password,
    cluster_wait,
)
from charmhelpers.core.unitdata import kv
@@ -239,6 +240,8 @@ def render_config_restart_on_changed(clustered, hosts, bootstrap=False):
    # new units will join and apply their own config.
    if not seeded():
        action = service_restart
        # If we are restarting avoid simultaneous restart collisions
        cluster_wait()
    else:
        action = service_start