Terminate lingering kubeadm and puppet processes during K8S abort
During a Kubernetes orchestration upgrade, an interrupted kubeadm control plane upgrade process can cause the control plane upgrade to fail with a timeout. If a Kubernetes abort is triggered after this interruption, the system becomes stuck in the "upgrade-aborting" state.

This change terminates any lingering kubeadm and Puppet processes when a Kubernetes abort occurs, which allows the abort to complete. It also sets the state of the pending runtime configuration whose report status is the control plane upgrade to failed.

Test Plan:
PASS: Perform a manual and an orchestrated Kubernetes upgrade from version 1.24.4 to 1.25.3.
PASS: Run the Kubernetes orchestration upgrade and manually stop kubeadm and its parent process during the control plane upgrade. Verify that the Kubernetes abort succeeds after the timeout is reached.
PASS: Perform a manual Kubernetes upgrade and abort it at each step. Verify that the abort succeeds at each stage.

Closes-Bug: 2095599
Change-Id: Ia215998a70eb60eee44b2aadbb1ad726d136648b
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
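The process cleanup described above can be sketched as follows. This is a minimal illustration, not the code from this change: the helper names `cmdline_matches` and `kill_matching` and the `FakeProc` stand-in are hypothetical, and the sketch assumes process objects shaped like those yielded by `psutil.process_iter`, which expose an `info` dict and a `kill()` method. It treats a `None` or empty command line as "no match" (psutil can report `None` when access to a process is denied) and handles failures per process, so one failed kill does not stop the scan.

```python
import logging

LOG = logging.getLogger(__name__)

# Substrings to look for in a process command line (taken from the change).
TARGETS = ("kubeadm", "puppet")


def cmdline_matches(cmdline, targets=TARGETS):
    """Return True if any argument contains any target substring.

    A cmdline of None or [] is treated as "no match".
    """
    return any(t in arg for arg in (cmdline or []) for t in targets)


def kill_matching(procs, targets=TARGETS):
    """Kill every process whose command line matches; return killed PIDs.

    `procs` is any iterable of objects exposing `.info` and `.kill()`,
    such as psutil.process_iter(['pid', 'cmdline']).
    """
    killed = []
    for proc in procs:
        if not cmdline_matches(proc.info.get("cmdline"), targets):
            continue
        try:
            proc.kill()
            killed.append(proc.info.get("pid"))
        except Exception as exc:  # e.g. the process exited in the meantime
            LOG.warning("Could not kill pid %s: %s", proc.info.get("pid"), exc)
    return killed


class FakeProc:
    """Hypothetical stand-in for psutil.Process, used only for the demo."""

    def __init__(self, pid, cmdline):
        self.info = {"pid": pid, "cmdline": cmdline}
        self.killed = False

    def kill(self):
        self.killed = True


if __name__ == "__main__":
    demo = [FakeProc(10, ["kubeadm", "upgrade", "apply"]),
            FakeProc(11, None),
            FakeProc(12, ["/usr/sbin/sshd"])]
    print(kill_matching(demo))
```

With psutil installed, the real call would be `kill_matching(psutil.process_iter(['pid', 'cmdline']))`. Note that substring matching is broad: any process with "puppet" or "kubeadm" anywhere in its arguments is selected, so a stricter match on the executable name may be preferable where that matters.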
@@ -38,6 +38,7 @@ import io
 import json
 import math
 import os
+import psutil
 import re
 import requests
 import ruamel.yaml as yaml
@@ -18305,6 +18306,23 @@ class ConductorManager(service.PeriodicService):
                 constants.CONTROLLER)
         system = self.dbapi.isystem_get_one()
         if system.system_mode == constants.SYSTEM_MODE_SIMPLEX:
+            # Terminate lingering kubeadm and puppet processes
+            # left-over from timed out operation.
+            try:
+                for proc in psutil.process_iter(['pid', 'name', 'cmdline']):
+                    if any('kubeadm' in line or 'puppet' in line for line in proc.info['cmdline']):
+                        proc.kill()
+            except Exception as e:
+                LOG.error("Error in killing process %s" % e)
+            # update runtime config report status for upgrade control plane to failed.
+            pending_runtime_config = self.dbapi.runtime_config_get_all(
+                state=constants.RUNTIME_CONFIG_STATE_PENDING)
+            for rc in pending_runtime_config:
+                config_dict = json.loads(rc.config_dict)
+                if config_dict['report_status'] == puppet_common.REPORT_UPGRADE_CONTROL_PLANE:
+                    rc_update_values = {"state": constants.RUNTIME_CONFIG_STATE_FAILED}
+                    self.dbapi.runtime_config_update(rc.id, rc_update_values)
+
             # check for the control plane backup path exists
             if not os.path.exists(kubernetes.KUBE_CONTROL_PLANE_ETCD_BACKUP_PATH) or \
                     not os.path.exists(kubernetes.KUBE_CONTROL_PLANE_STATIC_PODS_BACKUP_PATH):
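The runtime-config step in the second hunk picks out pending records whose `report_status` is the control plane upgrade and marks them failed. The selection logic can be sketched in isolation as below. This is an illustration under assumptions, not the project's code: the function name `ids_to_fail`, the placeholder constant value, and the record shape (objects with `id` and a JSON string in `config_dict`) are hypothetical. Unlike the diff, the sketch skips records with missing or malformed `config_dict` and records that lack a `report_status` key instead of raising.

```python
import json
from types import SimpleNamespace

# Hypothetical placeholder; the real value lives in puppet_common.
REPORT_UPGRADE_CONTROL_PLANE = "upgrade_control_plane"


def ids_to_fail(pending_configs, report_status=REPORT_UPGRADE_CONTROL_PLANE):
    """Return the ids of pending records that report the given status."""
    ids = []
    for rc in pending_configs:
        try:
            config = json.loads(rc.config_dict)
        except (TypeError, ValueError):
            # config_dict was None or not valid JSON; skip the record.
            continue
        if isinstance(config, dict) and config.get("report_status") == report_status:
            ids.append(rc.id)
    return ids


if __name__ == "__main__":
    pending = [
        SimpleNamespace(id=1, config_dict=json.dumps(
            {"report_status": REPORT_UPGRADE_CONTROL_PLANE})),
        SimpleNamespace(id=2, config_dict=json.dumps({"classes": []})),
        SimpleNamespace(id=3, config_dict="not json"),
    ]
    print(ids_to_fail(pending))
```

A caller would then update each returned id, for example with the `runtime_config_update(rc.id, {"state": ...})` call shown in the diff. Keeping the selection separate from the database write makes it easy to test without a database.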