8e84309624
This script was set to always restart the local sriov device plugin pod, which could result in sriov pods not starting properly. Originally, this sequence of commands would not work properly if the device plugin was running:

    kubectl delete pods -n kube-system --selector=app=sriovdp --field-selector=spec.nodeName=${HOST} --wait=false
    kubectl wait pods -n kube-system --selector=app=sriovdp --field-selector=spec.nodeName=${HOST} --for=condition=Ready --timeout=360s

Result when device plugin is running:

    pod "kube-sriov-device-plugin-amd64-rbjpw" deleted
    pod/kube-sriov-device-plugin-amd64-rbjpw condition met

The wait command succeeds against the deleted pod and the script continues. It then deletes labeled pods without having confirmed that the device plugin is running, which can result in sriov pods not starting properly.

Ensuring that we are only restarting a not-running device plugin pod prevents the wait condition from immediately passing.

Closes-Bug: 1928965
Signed-off-by: Cole Walker <cole.walker@windriver.com>
Change-Id: I1cc576b26a4bba4eba4a088d33f918bb07ef3b0d

Files:

k8s-pod-recovery
k8s-pod-recovery.service
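
The behaviour described in the commit message (restart the device plugin pod only when it is not already running) could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual contents of `k8s-pod-recovery`: the function names `sriovdp_pods`, `sriovdp_phase`, and `restart_sriovdp_if_not_running` are hypothetical, the `KUBECTL` override exists only so the logic can be exercised without a cluster, and checking `status.phase` is one plausible way to decide that the pod is "running"; the real script may use a different check.

```shell
# Hypothetical sketch; not the actual k8s-pod-recovery code.
# KUBECTL is overridable (assumption) so the logic can be tried without a cluster.
KUBECTL="${KUBECTL:-kubectl}"
# HOST is the node name used in the field selector; defaults to this machine's name.
HOST="${HOST:-$(uname -n)}"

# Run a kubectl pod subcommand scoped to the sriov device plugin pod on this node.
sriovdp_pods() {
    verb="$1"; shift
    "${KUBECTL}" "${verb}" pods -n kube-system \
        --selector=app=sriovdp \
        --field-selector="spec.nodeName=${HOST}" "$@"
}

# Print the phase of the local device plugin pod (empty if no pod matches).
sriovdp_phase() {
    sriovdp_pods get -o jsonpath='{.items[*].status.phase}' 2>/dev/null
}

# Delete and wait only when the pod is NOT running, so that "kubectl wait"
# cannot be satisfied immediately by a healthy pod that was just deleted.
restart_sriovdp_if_not_running() {
    phase="$(sriovdp_phase)"
    if [ "${phase}" = "Running" ]; then
        echo "sriov device plugin is running; leaving it alone"
        return 0
    fi
    echo "sriov device plugin is not running; restarting it"
    sriovdp_pods delete --wait=false
    sriovdp_pods wait --for=condition=Ready --timeout=360s
}
```

A caller would invoke `restart_sriovdp_if_not_running` before recovering the labeled sriov pods, and proceed only if it returns success. Note that a `Running` phase is not the same as the `Ready` condition; a stricter sketch could inspect the pod's conditions instead.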