Do not create stonith location constraint when there is a single node

This is needed for IHA FFU, for the following reason.
When we upgrade the first node of the control plane (in the
three-controller scenario), we create a brand new one-node cluster. When the
stonith location constraint for ipmi is then created, it assigns a negative
score to the ipmi resource on that single node, so the resource is
stopped and never runs. This causes a problem at the
end of the upgrade process for the first controller, because
the final steps that verify resources are up fail:
  fatal: [controller-0]: FAILED! => {"changed": false, "error": "Error: waiting timeout\n\nPending actions:\n\tAction 29:
  stonith-fence_ipmilan-525400c4f1c1_monitor_0\ton controller-0\n\tAction 28: stonith-controller-0-on\ton controller-0\nError performing
  operation: Timer expired\n", "msg": "Failed, to set the resource openstack-cinder-volume to the state enable", "output": "", "rc": 1}

That is because pacemaker will calculate a transition that requires a
stonith 'ON' operation which cannot happen because the stonith
resource is not running.
I've checked with Beekhof: this is a current limitation
in pacemaker when running on a single node (it cannot ignore the
negative score when there is only one node). To avoid this, we
only create the stonith location constraint when there is more than
one node (as output by crm_node -l).
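
The node-count check described above can be sketched roughly as follows. This is a minimal illustration, not the provider code itself: it assumes that `crm_node -l` prints one line per known node, and the helper name `create_stonith_location?` and the sample outputs are hypothetical.

```ruby
# Hypothetical helper (not part of the change): decide whether the
# stonith location constraint should be created.
# Assumption: `crm_node -l` prints one line per known cluster node.
def create_stonith_location?(crm_node_output, pcmk_host_list)
  nodes_count = crm_node_output.lines.size
  # Only create the constraint when a host list is given and the
  # cluster has more than one node.
  !pcmk_host_list.to_s.empty? && nodes_count > 1
end

# Illustrative sample outputs (assumed format).
single_node = "1 controller-0 member\n"
three_nodes = "1 controller-0 member\n2 controller-1 member\n3 controller-2 member\n"

puts create_stonith_location?(single_node, 'controller-0')
puts create_stonith_location?(three_nodes, 'controller-0')
# An empty output (e.g. crm_node could not be run) counts as zero nodes.
puts create_stonith_location?('', 'controller-0')
```

With an empty string as fallback output, the node count is zero, so the constraint is skipped in that case as well.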

Tested a number of times on IHA FFU successfully.

Related-Bug: #1888398
Change-Id: Iefe68c26b188922d1e8fe4cd0d24242262ee6665
Michele Baldessari 2020-07-17 10:01:00 +02:00
parent 1bb2fe70b1
commit 7a82bae811
2 changed files with 12 additions and 1 deletions

@@ -20,6 +20,7 @@ end
 PCS_BIN = "#{prefix_path}pcs" unless defined? PCS_BIN
 CRMDIFF_BIN = "#{prefix_path}crm_diff" unless defined? CRMDIFF_BIN
+CRMNODE_BIN = "#{prefix_path}crm_node" unless defined? CRMNODE_BIN
 CRMSIMULATE_BIN = "#{prefix_path}crm_simulate" unless defined? CRMSIMULATE_BIN
 CRMRESOURCE_BIN = "#{prefix_path}crm_resource" unless defined? CRMRESOURCE_BIN
 TIMEOUT_BIN = "#{prefix_path}timeout" unless defined? TIMEOUT_BIN
@@ -36,6 +37,15 @@ def pcs_cli_version()
   return pcs_cli_version
 end
 
+def crm_node_l()
+  begin
+    nodes = `#{CRMNODE_BIN} -l`
+  rescue
+    nodes = ''
+  end
+  return nodes
+end
+
 # Ruby 2.5 has dropped Dir::Tmpname.make_tmpname
 # https://github.com/ruby/ruby/commit/25d56ea7b7b52dc81af30c92a9a0e2d2dab6ff27

@@ -131,7 +131,8 @@ Puppet::Type.type(:pcmk_stonith).provide(:default) do
   def stonith_location_rule_create()
     pcmk_host_list = @resource[:pcmk_host_list]
-    if not_empty_string(pcmk_host_list)
+    nodes_count = crm_node_l().lines.size
+    if not_empty_string(pcmk_host_list) and nodes_count > 1
       location_cmd = "constraint location #{@resource[:name]} avoids #{pcmk_host_list}=10000"
       Puppet.debug("stonith_location_rule_create: #{location_cmd}")
       pcs('create', @resource[:name], location_cmd, @resource[:tries],