With the merging of the pcs on host patchset for train we are seeing a
problem with FFUs on Instance HA environments.
Tripleo keeps the stonith-enabled cluster property set to false until the puppet step 5
With the pcs on host patchset the enablement happens still at step 5 but
it gets triggered during tripleo_ha_wrapper deployment task of
cinder-volume which tries to restart the cinder-volume service (during
the leapp of the first controller) and this hangs forever because
pacemaker is in the following transition:
- stonith-fence_compute-fence-nova is configured
- pacemaker wants to call stonith on for controller-0 (which is probably
dumb, but it is unlikely we'll be able to change that in the right
timeframe as it seems a potentially involved change in behaviour)
- Any other action, like cinder-volume restart in this case, is stuck
and the FFU fails.
If we simply move the stonith resource creation (and change nothing else
in the stonith-enabled property being set at step 5) to step 2, we
Tested and with the injection of this puppet-tripleo review into the
FFU queens->train upgrade on an IHA system, now the FFU passes.
Also applied this patch to a Train based IHA deployment and verified
that deployment, redeploy, minor update and scaleup all keep on working.
(cherry picked from commit