Avoid simultaneous restarts

It is possible for two or more percona-cluster units to attempt to restart and join the cluster simultaneously. When this race condition occurs, one unit may fail with: "Failed to start mysql (max retries reached)".

We already use the distributed_wait control mechanism in other charms. This change applies the same mechanism to percona-cluster so that unit restarts are staggered rather than simultaneous.

Configuration options allow for fine tuning. The trade-off is time versus tolerance for collision errors: CI systems may accept the occasional false positive in exchange for the time saved, whereas production deployments can sacrifice a little time for a guaranteed deploy.

Change-Id: I52e7f8e410ecd77a7a142d44b43414e33eff3a6e
Closes-Bug: #1745432
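The staggering idea can be illustrated with a small standalone sketch. It assumes a modulo-based scheme: each unit sleeps for (unit number mod N) multiplied by a per-slot wait before restarting. The helper names below (`unit_number`, `modulo_delay`, `staggered_restart`) are hypothetical. This is not the charmhelpers distributed_wait code, which this commit does not show.

```python
import time
from typing import Callable


def unit_number(unit_name: str) -> int:
    """Extract the numeric suffix from a Juju-style unit name, e.g. 'percona-cluster/2'."""
    return int(unit_name.rsplit("/", 1)[1])


def modulo_delay(unit_name: str, modulo: int, wait: int) -> int:
    """Seconds this unit waits: (unit number % modulo) * wait.

    Units whose numbers share a remainder get the same slot, so only a
    modulo equal to the cluster size gives every unit a distinct slot.
    """
    if modulo < 1:
        raise ValueError("modulo must be >= 1")
    return (unit_number(unit_name) % modulo) * wait


def staggered_restart(unit_name: str, modulo: int, wait: int,
                      restart: Callable[[], None],
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Sleep for this unit's slot, run the restart callable, and return the delay used."""
    delay = modulo_delay(unit_name, modulo, wait)
    sleep(delay)
    restart()
    return delay
```

With `modulo=3` and `wait=30`, units 0, 1 and 2 get delays of 0, 30 and 60 seconds, and unit 3 shares unit 0's slot. A small modulo or a short wait saves time but allows collisions. A modulo equal to the cluster size serializes restarts at the cost of a longer total wait.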
2018-01-25 15:20:29 -08:00
parent dc19ecb4a3
commit bd5474ce2f
4 changed files with 86 additions and 0 deletions
--- a/hooks/percona_hooks.py
+++ b/hooks/percona_hooks.py
@@ -106,6 +106,7 @@ from percona_utils import (
    update_bootstrap_uuid,
    LeaderNoBootstrapUUIDError,
    update_root_password,
+    cluster_wait,
 )

 from charmhelpers.core.unitdata import kv
@@ -239,6 +240,8 @@ def render_config_restart_on_changed(clustered, hosts, bootstrap=False):
            # new units will join and apply their own config.
            if not seeded():
                action = service_restart
+                # If we are restarting avoid simultaneous restart collisions
+                cluster_wait()
            else:
                action = service_start
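The diff calls `cluster_wait()` only on the restart path, that is, when the unit has not yet been seeded. Its body in `percona_utils` is not part of this excerpt. The following is a hedged sketch of how such a helper might choose the number of slots from configuration, falling back to the observed peer count. The option names `modulo-nodes`, `min-cluster-size` and `known-wait`, the precedence order and the 30-second default are all assumptions made for illustration.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical default delay per slot, in seconds.
DEFAULT_WAIT = 30


def choose_modulo(config: Mapping[str, Optional[int]],
                  peer_units: Sequence[str]) -> int:
    """Pick how many restart slots to spread units across.

    Assumed precedence (hypothetical option names):
    1. an explicit 'modulo-nodes' setting,
    2. otherwise 'min-cluster-size',
    3. otherwise the observed cluster size: peers plus this unit.
    Unset, None and 0 values all fall through to the next rule.
    """
    explicit = config.get("modulo-nodes")
    if explicit:
        return explicit
    min_size = config.get("min-cluster-size")
    if min_size:
        return min_size
    return len(peer_units) + 1


def cluster_wait_seconds(unit_name: str,
                         config: Mapping[str, Optional[int]],
                         peer_units: Sequence[str]) -> int:
    """Delay before this unit restarts: (unit number % modulo) * wait."""
    modulo = choose_modulo(config, peer_units)
    wait = config.get("known-wait") or DEFAULT_WAIT
    unit_number = int(unit_name.rsplit("/", 1)[1])
    return (unit_number % modulo) * wait
```

Falling back to the peer count gives each unit in a fully formed cluster its own slot. An explicit, smaller modulo shortens the total wait but lets units that share a remainder restart together, which matches the time-versus-collision trade-off described in the commit message.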