Terminate lingering kubeadm and Puppet processes during K8S abort

During an orchestrated Kubernetes upgrade, an interruption of
the kubeadm control plane upgrade process can cause the control
plane upgrade to fail with a timeout. If a Kubernetes abort is
triggered after this interruption, the system becomes stuck in
the "upgrade-aborting" state.

This change terminates any lingering kubeadm and Puppet processes
when a Kubernetes abort occurs, so that the abort can complete
successfully.
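
A minimal sketch of the termination step described above, not the
code from this change. The helper names cmdline_matches and
kill_matching are hypothetical, and the process source is injectable
so the logic can be exercised without touching real processes. It
assumes psutil-style objects that expose an info dict and a kill()
method, and it treats a command line of None (reported by psutil when
the command line cannot be read) as no match. Matching on substrings
is broad, so any process whose arguments contain "kubeadm" or
"puppet" would be killed.

```python
from typing import Iterable, Optional, Sequence

# Substrings taken from the change description.
TARGETS = ("kubeadm", "puppet")


def cmdline_matches(cmdline: Optional[Sequence[str]],
                    targets: Sequence[str] = TARGETS) -> bool:
    """Return True if any command-line argument contains a target substring."""
    # Treat an unreadable command line (None) as an empty one.
    return any(t in arg for arg in (cmdline or []) for t in targets)


def kill_matching(procs: Optional[Iterable] = None,
                  targets: Sequence[str] = TARGETS) -> list:
    """Kill every matching process and return the PIDs that were killed."""
    if procs is None:
        # Imported lazily so the matching helper has no third-party dependency.
        import psutil
        procs = psutil.process_iter(["pid", "name", "cmdline"])
    killed = []
    for proc in procs:
        # Handle errors per process so one failure does not end the scan.
        try:
            if cmdline_matches(proc.info.get("cmdline"), targets):
                proc.kill()
                killed.append(proc.info["pid"])
        except Exception as exc:  # e.g. process already exited, access denied
            print("could not kill %s: %s" % (proc.info.get("pid"), exc))
    return killed
```

Handling exceptions inside the loop is a design choice of this sketch:
a single process that disappears mid-scan does not prevent the
remaining processes from being checked.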

It also sets the state of any pending runtime configuration whose
report status is upgrade control plane to failed.
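
The state update can be sketched on plain dictionaries as follows.
This is an illustration under stated assumptions, not the conductor
code: the function name fail_pending_control_plane_configs is
hypothetical, the three constant values are placeholders for the
sysinv constants (their real values are not shown in this change),
and each record is assumed to carry an id, a state, and a
JSON-encoded config_dict.

```python
import json

# Placeholder values; the real constants live in sysinv and are not shown here.
REPORT_UPGRADE_CONTROL_PLANE = "upgrade_control_plane"
STATE_PENDING = "pending"
STATE_FAILED = "failed"


def fail_pending_control_plane_configs(records):
    """Mark pending control plane upgrade configs as failed.

    Each record is a dict with 'id', 'state' and a JSON string 'config_dict'.
    Returns the ids of the records that were updated.
    """
    updated = []
    for rc in records:
        if rc["state"] != STATE_PENDING:
            continue
        config = json.loads(rc["config_dict"])
        # .get() tolerates configs that carry no report_status key.
        if config.get("report_status") == REPORT_UPGRADE_CONTROL_PLANE:
            rc["state"] = STATE_FAILED
            updated.append(rc["id"])
    return updated
```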

Test Plan:
PASS: Perform a manual and orchestrated Kubernetes upgrade
from version 1.24.4 to 1.25.3.
PASS: Execute the Kubernetes orchestration upgrade and manually
stop the kubeadm process and its parent process during the
control plane upgrade. Verify that the Kubernetes abort succeeds
after the timeout is reached.
PASS: Perform a manual Kubernetes upgrade and abort the upgrade
at each step, verifying that the abort succeeds at each stage.

Closes-Bug: 2095599

Change-Id: Ia215998a70eb60eee44b2aadbb1ad726d136648b
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
This commit is contained in:
Boovan Rajendran
2025-01-22 06:00:41 -05:00
parent aa4f1e714d
commit 2f1fa2a931

@@ -38,6 +38,7 @@ import io
import json
import math
import os
import psutil
import re
import requests
import ruamel.yaml as yaml
@@ -18305,6 +18306,23 @@ class ConductorManager(service.PeriodicService):
            constants.CONTROLLER)
        system = self.dbapi.isystem_get_one()
        if system.system_mode == constants.SYSTEM_MODE_SIMPLEX:
            # Terminate lingering kubeadm and puppet processes
            # left over from the timed-out operation.
            try:
                for proc in psutil.process_iter(['pid', 'name', 'cmdline']):
                    # cmdline is None when psutil cannot read it
                    # (access denied or zombie process).
                    cmdline = proc.info['cmdline'] or []
                    if any('kubeadm' in line or 'puppet' in line for line in cmdline):
                        proc.kill()
            except Exception as e:
                LOG.error("Error in killing process %s" % e)
            # Set pending runtime configs with the upgrade control plane
            # report status to failed.
            pending_runtime_config = self.dbapi.runtime_config_get_all(
                state=constants.RUNTIME_CONFIG_STATE_PENDING)
            for rc in pending_runtime_config:
                config_dict = json.loads(rc.config_dict)
                if config_dict['report_status'] == puppet_common.REPORT_UPGRADE_CONTROL_PLANE:
                    rc_update_values = {"state": constants.RUNTIME_CONFIG_STATE_FAILED}
                    self.dbapi.runtime_config_update(rc.id, rc_update_values)
            # Check that the control plane backup paths exist.
            if not os.path.exists(kubernetes.KUBE_CONTROL_PLANE_ETCD_BACKUP_PATH) or \
                    not os.path.exists(kubernetes.KUBE_CONTROL_PLANE_STATIC_PODS_BACKUP_PATH):