Enable fence_watchdog configuration in stonith topology

This commit extends the fencing manifest to make use of a
"fence_watchdog" device and allows using the resulting "watchdog"
resource in a stonith topology.

For this to work, the cluster must already be configured with sbd,
either manually or via 'pacemaker::corosync::enable_sbd: true'.
In addition, the fence_watchdog resource needs a supported watchdog
timer device to perform the self-fencing.

The fence_watchdog configuration is deliberately opinionated:
- it assumes the resource name is 'watchdog' (hardcoded in pacemaker)
- it only supports an "all or nothing" scenario, in which all the
  cluster nodes must use it
- it is not supported on pacemaker_remote nodes

The fencing creation logic has been adjusted to use the pacemaker
bootstrap node to create the watchdog resource and the stonith topology
for all the nodes in the cluster (since this is a single shared
resource, we couldn't reuse the old "every man for himself" logic).

A fence_watchdog device can be defined like any other fencing device
via fencing.yaml or equivalent:

parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
    - agent: fence_watchdog
      host_mac: 52:54:00:74:f7:51
    ...

Ideally, fence_watchdog should be used as a last resort, and so placed
at the bottom of a stonith topology where power-based fencing agents
are the primary choice for fencing.
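
As a rough sketch of such a topology, assuming FencingConfig accepts
devices grouped under level1/level2 keys. The fence_ipmilan entry, its
params, the addresses and the credentials below are placeholders for
illustration and are not taken from this change:

```yaml
parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
      level1:
      # placeholder power-based agent, tried first
      - agent: fence_ipmilan
        host_mac: 52:54:00:74:f7:51
        params:
          ipaddr: 192.0.2.10
          login: admin
          passwd: changeme
      level2:
      # watchdog self-fencing, used only if level1 fails
      - agent: fence_watchdog
        host_mac: 52:54:00:74:f7:51
```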

The default value for stonith-watchdog-timeout (60s) can be
overridden via tripleo::fencing::watchdog_timeout.
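
For example, a deployment could raise the timeout through hieradata.
This sketch assumes the value is passed via ExtraConfig; 120 is an
arbitrary illustrative value, not a recommendation:

```yaml
parameter_defaults:
  ExtraConfig:
    # hypothetical override, in seconds (class default is 60)
    tripleo::fencing::watchdog_timeout: 120
```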

Depends-On: Id010a392df0047d53dfab1c21cc78021c8c1aabf

Change-Id: I89a6014ffb40bc0935a348af7687684f3a71a968
(cherry picked from commit 6fc7430c18)
Luca Miccini 2022-10-31 13:55:51 +01:00
parent 08e4898053
commit f5df16ab28
1 changed files with 34 additions and 0 deletions

@@ -55,6 +55,11 @@
# after the resource update.
# Defaults to 600 (seconds)
#
# [*watchdog_timeout*]
# Only valid if sbd watchdog fencing is enabled.
# Pacemaker will assume unseen nodes self-fence within this much time.
# Defaults to 60 (seconds)
#
# [*enable_instanceha*]
# (Optional) Boolean driving the Instance HA controlplane configuration
# Defaults to false
@@ -65,6 +70,7 @@ class tripleo::fencing(
$try_sleep = 3,
$deep_compare = false,
$update_settle_secs = 600,
$watchdog_timeout = 60,
$enable_instanceha = hiera('tripleo::instanceha', false),
) {
$common_params = {
@@ -185,6 +191,34 @@ class tripleo::fencing(
Pcmk_stonith<||> -> Pcmk_stonith_level<||>
}
}
# we use the bootstrap node to create the watchdog resource and the stonith
# topology for all the nodes in the cluster, because the watchdog resource
# is not per-node but cluster-wide
$watchdog_devices = local_fence_devices('fence_watchdog', $all_devices)
if length($watchdog_devices) > 0 {
# check if this is the bootstrap node
if downcase($::hostname) == lookup('pacemaker_short_bootstrap_node_name') {
create_resources('pacemaker::stonith::fence_watchdog', $watchdog_devices, $common_params)
$stonith_resources = [ 'watchdog' ]
# if this is the bootstrap node we set watchdog as levelX for all
# the pacemaker nodes
lookup('pacemaker_short_node_names').each |$node| {
pacemaker::stonith::level{ "stonith-${level}-watchdog-${node}":
level => $level,
target => $node,
stonith_resources => [ 'watchdog' ],
tries => $tries,
try_sleep => $try_sleep,
}
}
pacemaker::property { 'stonith-watchdog-timeout':
property => 'stonith-watchdog-timeout',
value => $watchdog_timeout,
tries => $tries,
}
Pcmk_property<||> -> Pcmk_stonith<||> -> Pcmk_stonith_level<||>
}
}
}
}