From 7a82bae8111ea29362ea63f174cd91b83db68430 Mon Sep 17 00:00:00 2001
From: Michele Baldessari
Date: Fri, 17 Jul 2020 10:01:00 +0200
Subject: [PATCH] Do not create stonith location constraint when there is a single node

This is needed for IHA FFU, for the following reason:

When we upgrade the first node of the control plane (in the
three-controller scenario), we create a brand new one-node cluster.
When the stonith location constraint for ipmi is then created, it
assigns a negative score to the ipmi resource on that single node, so
the resource is simply stopped and never runs.

This creates a problem at the end of the upgrade process for the first
controller, because the final steps that verify resources are up fail:

  fatal: [controller-0]: FAILED! => {"changed": false, "error": "Error: waiting timeout\n\nPending actions:\n\tAction 29: stonith-fence_ipmilan-525400c4f1c1_monitor_0\ton controller-0\n\tAction 28: stonith-controller-0-on\ton controller-0\nError performing operation: Timer expired\n", "msg": "Failed, to set the resource openstack-cinder-volume to the state enable", "output": "", "rc": 1}

That is because pacemaker calculates a transition that requires a
stonith 'ON' operation, which cannot happen because the stonith
resource is not running. I've checked with Beekhof and this is a
current limitation in pacemaker when running on a single node (it
cannot ignore the negative score when there is only one node).

To avoid this, we only create the stonith location constraint when
there is more than one node (as reported by crm_node -l).

Tested a number of times on IHA FFU successfully.

Related-Bug: #1888398
Change-Id: Iefe68c26b188922d1e8fe4cd0d24242262ee6665
---
 lib/puppet/provider/pcmk_common.rb          | 10 ++++++++++
 lib/puppet/provider/pcmk_stonith/default.rb |  3 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/puppet/provider/pcmk_common.rb b/lib/puppet/provider/pcmk_common.rb
index d6a5db79..068f2507 100644
--- a/lib/puppet/provider/pcmk_common.rb
+++ b/lib/puppet/provider/pcmk_common.rb
@@ -20,6 +20,7 @@ end
 
 PCS_BIN = "#{prefix_path}pcs" unless defined? PCS_BIN
 CRMDIFF_BIN = "#{prefix_path}crm_diff" unless defined? CRMDIFF_BIN
+CRMNODE_BIN = "#{prefix_path}crm_node" unless defined? CRMNODE_BIN
 CRMSIMULATE_BIN = "#{prefix_path}crm_simulate" unless defined? CRMSIMULATE_BIN
 CRMRESOURCE_BIN = "#{prefix_path}crm_resource" unless defined? CRMRESOURCE_BIN
 TIMEOUT_BIN = "#{prefix_path}timeout" unless defined? TIMEOUT_BIN
@@ -36,6 +37,15 @@ def pcs_cli_version()
   return pcs_cli_version
 end
 
+def crm_node_l()
+  begin
+    nodes = `#{CRMNODE_BIN} -l`
+  rescue
+    nodes = ''
+  end
+  return nodes
+end
+
 # Ruby 2.5 has dropped Dir::Tmpname.make_tmpname
 # https://github.com/ruby/ruby/commit/25d56ea7b7b52dc81af30c92a9a0e2d2dab6ff27
 
diff --git a/lib/puppet/provider/pcmk_stonith/default.rb b/lib/puppet/provider/pcmk_stonith/default.rb
index cf5b2e99..a822cb71 100644
--- a/lib/puppet/provider/pcmk_stonith/default.rb
+++ b/lib/puppet/provider/pcmk_stonith/default.rb
@@ -131,7 +131,8 @@ Puppet::Type.type(:pcmk_stonith).provide(:default) do
 
   def stonith_location_rule_create()
     pcmk_host_list = @resource[:pcmk_host_list]
-    if not_empty_string(pcmk_host_list)
+    nodes_count = crm_node_l().lines.size
+    if not_empty_string(pcmk_host_list) and nodes_count > 1
       location_cmd = "constraint location #{@resource[:name]} avoids #{pcmk_host_list}=10000"
       Puppet.debug("stonith_location_rule_create: #{location_cmd}")
       pcs('create', @resource[:name], location_cmd, @resource[:tries],