b24837a73d
Introduce a helper class SubprocessCleanup in dccommon which allows a
worker to register a subprocess that must be cleaned up (killed) upon
service exit.

There are two parts to this mechanism:
1. Registration:
    - The subprocess is registered for cleanup when spawned
      (see utils.run_playbook_with_timeout)
    - The subprocess is also spawned using setsid in order to start a
      new process group + session
2. The Service calls subprocess_cleanup upon stopping.
    - All registered subprocesses are terminated using the os.killpg()
      call to terminate the entire subprocess process group.

Caveat: This mechanism only handles clean process exit cases.
If the process crashes or is killed non-gracefully via SIGKILL, the
cleanup will not happen.

Closes-Bug: 1972013

Test Plan:

PASS:
Orchestrated prestaging:
* Perform system host-swact while prestaging packages in progress
    - ansible-playbook is terminated
    - prestaging task is marked as prestaging-failed
* Perform system host-swact while prestaging images in progress
    - ansible-playbook is terminated
    - prestaging task is marked as prestaging-failed
* Restart dcmanager-orchestrator service for the same two cases as above
    - behaviour is the same as for swact
* Kill dcmanager-orchestrator service while prestaging in progress

Non-Orchestrated prestaging:
* Perform host-swact and service restart for non-orchestrated prestaging
    - ansible-playbook is terminated
    - subcloud deploy status marked as prestaging-failed

Swact during large-scale subcloud add
    - initiate large number of subcloud add operations
    - swact during 'installing' state
    - swact during 'bootstrapping' state
    - verify that ansible playbooks are killed
    - verify that deploy status is updated with -failed state

Not covered:
Tested a sudo 'pkill -9 dcmanager-manager' (ungraceful SIGKILL)
    - in this case the ansible subprocess tree is not cleaned up
    - this is expected - we aren't handling a non-clean shutdown

Signed-off-by: Kyle MacLeod <kyle.macleod@windriver.com>
Change-Id: I714398017b71c99edeeaa828933edd8163fb67cd
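The two-part mechanism described in the commit message (register at spawn time, then signal the whole process group on shutdown) can be sketched with the standard library alone. This is a minimal illustration, not the dccommon implementation: `registered`, `spawn_in_new_session` and `cleanup_all` are hypothetical names, and `start_new_session=True` is assumed here as one way to get the setsid behaviour the commit mentions.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for a registry of process groups, keyed by the
# pid of each group leader.
registered = {}


def spawn_in_new_session(args):
    """Spawn a child as the leader of a new session and process group."""
    # start_new_session=True makes the child call setsid() before exec,
    # so the child's pid is also its process group id.
    proc = subprocess.Popen(args, start_new_session=True)
    registered[proc.pid] = proc
    return proc


def cleanup_all(sig=signal.SIGTERM):
    """Signal every registered process group (best-effort, no waiting)."""
    for pid in list(registered):
        try:
            os.killpg(pid, sig)
        except ProcessLookupError:
            pass  # the group has already gone away
        registered.pop(pid, None)


if __name__ == "__main__":
    child = spawn_in_new_session(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    cleanup_all()
    # A negative value means the child was terminated by that signal.
    print("child exit status:", child.wait(timeout=10))
```

Because the signal goes to the process group rather than to a single pid, any grandchildren the child has started in the same group receive it as well, which is the point of spawning under a new session.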
74 lines
2.3 KiB
Python
#
# Copyright (c) 2022 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#

import os
import signal
import time

from oslo_concurrency import lockutils
from oslo_log import log as logging

LOG = logging.getLogger(__name__)


class SubprocessCleanup(object):
    """Lifecycle manager for subprocesses spawned via python subprocess.

    Notes:
    - This is a best-effort cleanup. We need to preserve fast shutdown
      times in case of a SWACT.
    - There could potentially be multiple hundreds of subprocesses needing
      to be cleaned up here.
    """

    LOCK_NAME = 'subprocess-cleanup'
    # Maps the pid of each registered subprocess to its Popen object.
    SUBPROCESS_GROUPS = {}

    @staticmethod
    def register_subprocess_group(subprocess_p):
        SubprocessCleanup.SUBPROCESS_GROUPS[subprocess_p.pid] = subprocess_p

    @staticmethod
    def unregister_subprocess_group(subprocess_p):
        SubprocessCleanup.SUBPROCESS_GROUPS.pop(subprocess_p.pid, None)

    @staticmethod
    @lockutils.synchronized(LOCK_NAME)
    def shutdown_cleanup(origin='service'):
        SubprocessCleanup._shutdown_subprocess_groups(origin)

    @staticmethod
    def _shutdown_subprocess_groups(origin):
        num_process_groups = len(SubprocessCleanup.SUBPROCESS_GROUPS)
        if num_process_groups > 0:
            LOG.warning("Shutting down %d process groups via %s",
                        num_process_groups, origin)
            start_time = time.time()
            # Iterate over a snapshot so that an unregister call during
            # shutdown cannot change the dict while it is being traversed.
            for subp in list(SubprocessCleanup.SUBPROCESS_GROUPS.values()):
                kill_subprocess_group(subp)
            LOG.info("Time for %s child processes to exit: %s",
                     num_process_groups,
                     time.time() - start_time)


def kill_subprocess_group(subp, logmsg=None):
    """Kill the subprocess and any children."""
    exitcode = subp.poll()
    # poll() returns None while the subprocess is still running. An exit
    # code of 0 also means that it has already terminated.
    if exitcode is not None:
        LOG.info("kill_subprocess_group: subprocess has already "
                 "terminated, pid: %s, exitcode=%s", subp.pid, exitcode)
        return

    if logmsg:
        LOG.warning(logmsg)
    else:
        LOG.warning("Killing subprocess group for pid: %s, args: %s",
                    subp.pid, subp.args)
    # Send a SIGTERM (normal kill). We do not verify that the processes
    # have shut down (best-effort), since we don't want to wait around
    # before issuing a SIGKILL (fast shutdown).
    # The subprocess pid is used as the process group id, which relies on
    # the subprocess having been spawned as a group leader (setsid).
    os.killpg(subp.pid, signal.SIGTERM)
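The check at the top of kill_subprocess_group depends on what `Popen.poll()` returns: `None` while the child is running, and the exit code (which may be 0) once it has finished. The sketch below is a standard-library-only variant written for illustration. `terminate_group` is a hypothetical name, and tolerating `ProcessLookupError` is an assumption about how one might handle a group that disappears between the check and the signal, not something the module above does.

```python
import os
import signal
import subprocess
import sys


def terminate_group(subp, sig=signal.SIGTERM):
    """Send sig to subp's process group; return True if a signal was sent.

    Assumes subp was started as a process group leader, for example
    with start_new_session=True.
    """
    # poll() returns None while the child runs, and the exit code
    # (possibly 0) once it has terminated.
    if subp.poll() is not None:
        return False
    try:
        os.killpg(subp.pid, sig)
    except ProcessLookupError:
        # The group vanished between poll() and killpg().
        return False
    return True


if __name__ == "__main__":
    # A child that exits immediately with code 0.
    done = subprocess.Popen([sys.executable, "-c", "pass"],
                            start_new_session=True)
    done.wait()
    print("signal sent to finished child:", terminate_group(done))

    # A child that is still running when the signal is sent.
    running = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True)
    print("signal sent to running child:", terminate_group(running))
    print("running child exit status:", running.wait(timeout=10))
```

Returning a boolean instead of raising keeps a shutdown loop over many groups going even when some of them have already exited, which fits the best-effort intent stated in the class docstring.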