HA: increase resource default op timeout for podman bundles

Pacemaker currently relies on a single, system-wide timeout for
every operation run on containers, set by default to 20s.

With podman's reliance on disk, the default timeout can be
reached under high disk IO (e.g. during container-puppet or
when paunch restarts containers from step 4). This causes unneeded
resource restarts in pacemaker and can even trigger fencing if
stop operations time out.

Raise the default to a safer value of 120s when using podman,
and make that default timeout configurable via a new Heat
parameter PacemakerBundleOperationTimeout.

Closes-Bug: #1834325

Change-Id: Ie45a56b8dab1272798440110ad96b6ee11fcf26b
(cherry picked from commit 744da29ba7)
(cherry picked from commit add09e8634)
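
As an illustration of the new parameter described in the commit message, a deployment could override the timeout through a Heat environment file. This is a minimal sketch, not taken from the commit: the file name and the 180s value are assumptions chosen for the example. The value has to match the parameter's allowed pattern (a positive integer followed by `s`, or the empty string).

```yaml
# bundle-timeout.yaml (hypothetical file name)
# Passed to the deploy command with -e, like any other environment file.
parameter_defaults:
  # One of 'docker' or 'podman', per the ContainerCli constraint.
  ContainerCli: podman
  # Example value only; must match "([1-9][0-9]*s)?".
  PacemakerBundleOperationTimeout: '180s'
```

Leaving PacemakerBundleOperationTimeout unset keeps the default '', in which case the template picks 120s for podman and leaves pacemaker's own default in place for docker.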
changes/01/699901/3
Damien Ciabrini 2 months ago
commit 015211b188
1 changed file with 33 additions and 0 deletions

puppet/services/pacemaker.yaml (+33, -0)

@@ -105,9 +105,28 @@ parameters:
     description: Use Leapp for operating system upgrade
     type: boolean
     default: true
+  ContainerCli:
+    type: string
+    default: 'podman'
+    description: CLI tool used to manage containers.
+    constraints:
+      - allowed_values: ['docker', 'podman']
+  PacemakerBundleOperationTimeout:
+    type: string
+    default: ''
+    description: The timeout for start, monitor and stop operations
+                 run by the container resource agent, in seconds.
+                 When set to default '', the timeout comes from
+                 pacemaker's default operation timeouts (20s). When
+                 set to default and podman is used, force the timeout
+                 to 120s.
+    constraints:
+      - allowed_pattern: "([1-9][0-9]*s)?"
 
 conditions:
   pcmk_tls_priorities_empty: {equals: [{get_param: PacemakerTLSPriorities}, '']}
+  pcmk_bundle_op_timeout_empty: {equals: [{get_param: PacemakerBundleOperationTimeout}, '']}
+  podman_enabled: {equals: [{get_param: ContainerCli}, 'podman']}
 
 outputs:
   role_data:
@@ -153,6 +172,20 @@ outputs:
               - pcmk_tls_priorities_empty
               - {}
               - tripleo::pacemaker::tls_priorities: {get_param: PacemakerTLSPriorities}
+            -
+              if:
+              - and:
+                - pcmk_bundle_op_timeout_empty
+                - not: podman_enabled
+              - {}
+              - tripleo::profile::base::pacemaker::resource_op_defaults:
+                  bundle:
+                    name: timeout
+                    value:
+                      if:
+                        - pcmk_bundle_op_timeout_empty
+                        - '120s'
+                        - {get_param: PacemakerBundleOperationTimeout}
       service_config_settings:
         fluentd:
           tripleo_fluentd_groups_pacemaker:
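
To make the nested `if` in the second hunk easier to follow, here is a sketch of the hieradata it should resolve to for each parameter combination. It is derived by reading the conditions in the diff, not captured from a real deployment, and the 180s value is an assumed example.

```yaml
# Case 1: ContainerCli is 'podman', PacemakerBundleOperationTimeout is ''
# (inner condition is true, so the forced default is used)
tripleo::profile::base::pacemaker::resource_op_defaults:
  bundle:
    name: timeout
    value: '120s'

# Case 2: PacemakerBundleOperationTimeout is '180s' (assumed example),
# with either 'docker' or 'podman': same key, but value: '180s'

# Case 3: ContainerCli is 'docker', PacemakerBundleOperationTimeout is ''
# (outer condition is true, so the entry is {} and no key is emitted;
# pacemaker's own default operation timeout of 20s applies)
```

Only case 3 leaves the previous behaviour unchanged, which is why the outer condition combines `pcmk_bundle_op_timeout_empty` with `not: podman_enabled`.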
