Deleting ic-nginx-ingress-controller at restore

Once Kubernetes comes up after the etcd restore, there is a window of
around 20s in which pod states have not yet been updated and are still
reported as they were when the backup was taken. During that window the
ic-nginx-ingress-ingress-nginx-controller-XXX pod is reported as
"Ready" when it is not. In several of my test runs, the pod was
restarted 3-10 seconds after the task "Launch Armada with Helm v3"
failed because it could not call the webhook. The proposed solution is
to delete the pod preemptively and wait for it to be recreated and
become "Ready".
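
For illustration only, the delete-and-wait approach can be sketched as a
standalone shell helper. This is not the code in the change itself: the
function names selector_to_labels and recreate_service_pods, the retry
count, and the timeouts are all hypothetical, and the sketch assumes a
kubectl version that prints {.spec.selector} as a JSON object.

```shell
#!/bin/sh
# Convert a JSON selector object such as {"app":"nginx","tier":"web"}
# into the comma-separated key=value form accepted by `kubectl -l`.
# Assumption: kubectl prints the selector map as JSON.
selector_to_labels() {
    tr -d '{}"' | tr ':' '='
}

# Hypothetical helper: delete the pods selected by a service, then wait
# until the recreated pods report Ready.
#   $1 = namespace, $2 = service name
recreate_service_pods() {
    namespace="$1"
    service="$2"

    labels=$(kubectl get service -n "$namespace" "$service" \
        -o jsonpath='{.spec.selector}' | selector_to_labels)
    # The pipeline status is that of `tr`, so detect a failed lookup by
    # checking for an empty selector instead.
    [ -n "$labels" ] || return 1

    kubectl delete pod -n "$namespace" -l "$labels" || return 1

    # The replacement pod may not exist yet on the first attempt, in
    # which case `kubectl wait` fails; retry a few times (assumed values).
    attempt=0
    while [ "$attempt" -lt 6 ]; do
        if kubectl wait pod -n "$namespace" -l "$labels" \
            --for=condition=Ready --timeout=30s; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep 5
    done
    return 1
}
```

A call such as `recreate_service_pods kube-system "$webhook_service"`
would then stand in for the delete and wait steps, where
$webhook_service is a placeholder for the service name looked up
earlier in the playbook.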

TEST PLAN
PASS restore on virtual AIO-SX (CentOS)

Closes-Bug: #1978899
Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I20bec1fbbf809bfcf5d515ef55c6d47ab968dbf3
Thiago Brito 2022-08-09 18:34:43 -03:00 committed by Thiago Paiva Brito
parent 822540ac77
commit c2e5db4305

@@ -162,6 +162,13 @@
  register: nginx_webhook_service
  ignore_errors: true

- name: If on system restore mode, kill ingress validating webhook pod so it can be recreated
  shell: >-
    kubectl delete pod -n kube-system
    -l $(kubectl get service -n kube-system {{ nginx_webhook_service.stdout }}
    -o jsonpath="{.spec.selector}" | tr -d "{}\"" | tr ":" "=")
  when: mode == 'restore' and armada_check.rc == 0 and nginx_webhook_service.rc == 0

- name: Check ingress validating webhook service and pod status
  shell: >-
    kubectl wait pod -n kube-system