Move stonith resource creation to step2

Since the pcs on host patchset was merged for Train, FFUs (fast-forward
upgrades) fail on Instance HA (IHA) environments.

Preamble:
TripleO keeps the stonith-enabled cluster property set to false until
puppet step 5.

With the pcs on host patchset the enablement still happens at step 5, but
it is now triggered during the tripleo_ha_wrapper deployment task of
cinder-volume. That task tries to restart the cinder-volume service
(during the leapp of the first controller) and hangs forever because
pacemaker is in the following transition:
- stonith-fence_compute-fence-nova is configured
- pacemaker wants to call stonith on for controller-0 (which is probably
  dumb, but it is unlikely we'll be able to change that in the right
  timeframe as it seems a potentially involved change in behaviour)
- Any other action, like cinder-volume restart in this case, is stuck
  and the FFU fails.

Simply moving the stonith resource creation to step 2, while still
setting the stonith-enabled property at step 5 as before, fixes this.
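
The resulting split between the two gates can be sketched as a standalone
manifest. This is only an illustration: $fencing_requested is a
hypothetical stand-in for str2bool(hiera('enable_fencing', false)), and
$step is hard-coded here, whereas in TripleO it is supplied by the
deployment.

```puppet
# Hypothetical stand-ins for the real inputs (assumptions, see above).
$step = 2
$fencing_requested = true

# Gate for the stonith-enabled cluster property: unchanged, step 5.
$enable_fencing = $fencing_requested and $step >= 5

# Gate for creating the stonith resources themselves: now step 2.
$enable_stonith_resources = $fencing_requested and $step >= 2

notice("step ${step}: create stonith resources = ${enable_stonith_resources}")
notice("step ${step}: set stonith-enabled = ${enable_fencing}")
```

With $step set to 2 the first gate is false and the second is true, so the
stonith resources already exist by the time stonith-enabled is set at
step 5.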

Tested by injecting this puppet-tripleo review into a queens->train FFU
on an IHA system: the FFU now passes.
Also applied this patch to a Train-based IHA deployment and verified
that deployment, redeploy, minor update and scale-up all keep working.

Closes-Bug: #1923723

Change-Id: Ib3e2d9c93221dfc2e15974142f30e8c84e7afd63
(cherry picked from commit 6196157b54)
Michele Baldessari 2021-04-12 14:22:58 +02:00
parent 4a23dc84d4
commit bd1807c48b

@@ -146,7 +146,14 @@ class tripleo::profile::base::pacemaker (
 $pacemaker_master = false
 }
+# enable_fencing guides the enablement of the stonith-enabled cluster-wide property
+# enable_stonith_resources drives the creation of the stonith resources themselves and happens at
+# step2. The reason for step2 is the following:
+# During step1 the cluster is created (and also the pcmk remote resources in case of IHA)
+# Since stonith resources are created on each node separately we need to have the guarantee that
+# all cluster nodes + remote exist before creating stonith resources for them
 $enable_fencing = str2bool(hiera('enable_fencing', false)) and $step >= 5
+$enable_stonith_resources = str2bool(hiera('enable_fencing', false)) and $step >= 2
 if $step >= 1 {
 include ::pacemaker::params
@@ -233,7 +240,7 @@ class tripleo::profile::base::pacemaker (
 }
 Class['pacemaker::stonith'] -> Exec<|tag == 'pacemaker-scaleup'|>
 }
-if $enable_fencing {
+if $enable_stonith_resources {
 include ::tripleo::fencing
 # enable stonith after all Pacemaker resources have been created