This script was set to always restart the local sriov device plugin pod
which could result in sriov pods not starting properly.
Originally, this sequence of commands would not work properly if the
device plugin was running:

    kubectl delete pods -n kube-system --selector=app=sriovdp \
        --field-selector=spec.nodeName=${HOST} --wait=false
    kubectl wait pods -n kube-system --selector=app=sriovdp \
        --field-selector=spec.nodeName=${HOST} --for=condition=Ready \
        --timeout=360s
Result when the device plugin is running:

    pod "kube-sriov-device-plugin-amd64-rbjpw" deleted
    pod/kube-sriov-device-plugin-amd64-rbjpw condition met
The wait command succeeds against the deleted pod and the script
continues. It then deletes labeled pods without having confirmed that
the device plugin is running, which can result in sriov pods not
starting properly.
Ensuring that we only restart the device plugin pod when it is not
already running prevents the wait condition from passing immediately.
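A minimal sketch of the intended guard, assuming the same selectors as
above and a hypothetical _sriovdp_is_running helper (the script's real
check may differ):

    # Hypothetical helper: succeed only if a Ready sriovdp pod exists
    # on this node.
    function _sriovdp_is_running {
        kubectl get pods -n kube-system --selector=app=sriovdp \
            --field-selector=spec.nodeName=${HOST} \
            -o jsonpath='{.items[*].status.containerStatuses[*].ready}' \
            | grep -q true
    }

    # Only delete the device plugin pod when it is not already running,
    # so the subsequent wait cannot pass against a deleted pod.
    if ! _sriovdp_is_running; then
        kubectl delete pods -n kube-system --selector=app=sriovdp \
            --field-selector=spec.nodeName=${HOST} --wait=false
        kubectl wait pods -n kube-system --selector=app=sriovdp \
            --field-selector=spec.nodeName=${HOST} \
            --for=condition=Ready --timeout=360s
    fi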
Closes-Bug: 1928965
Signed-off-by: Cole Walker <cole.walker@windriver.com>
Change-Id: I1cc576b26a4bba4eba4a088d33f918bb07ef3b0d
This change modifies the k8s-pod-recovery service to wait for the
kube-sriov-device-plugin-amd64 pod on the local node to become
available before proceeding with the recovery of
restart-on-reboot=true labeled pods.
This is required because of a race condition where pods marked for
recovery would be restarted before the device plugin was ready and
the pods would then be stuck in "ContainerCreating".
The fix in this commit uses the kubectl wait ...
command to wait for the daemonset to be available. A timeout of 360s
has been set for this command in order to allow enough time on busy
systems for the device-plugin pod to come up. The wait command
completes as soon as the pod is ready.
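As a rough illustration only (the selector, node filter and the
_restart_labeled_pods helper are assumptions, not the script's actual
code), the gating could look like:

    # Wait for the local device plugin pod to become Ready before
    # recovering restart-on-reboot labeled pods; give up after 360s.
    if kubectl wait pods -n kube-system --selector=app=sriovdp \
        --field-selector=spec.nodeName=${HOST} \
        --for=condition=Ready --timeout=360s; then
        _restart_labeled_pods   # hypothetical recovery step
    else
        echo "Timed out waiting for the sriov device plugin pod"
    fi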
Closes-Bug: 1928965
Signed-off-by: Cole Walker <cole.walker@windriver.com>
Change-Id: Ie1937cf0612827b28762049e2dc440e55726d4f3
This reverts commit 8abcbf6fb1951b25e9964933558b75b9aff88135.
Reason for revert:
After performing a backup and restore on an AIO-SX system, SRIOV pods do
not return to a running state and are instead stuck in "container
creating". The workaround for this is to restart SRIOV pods when the
system unlocks.
Reverting this commit to allow users to label SRIOV pods and have them
restarted by k8s-pod-recovery, so that labelled pods are running again
after backup and restore is completed.
This change has been tested by performing backup and restore on an
AIO-SX system. SRIOV pods now come up correctly when labelled with
restart-on-reboot=true.
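For illustration only (the pod and namespace names below are
placeholders), labelling could be done with:

    # Label an SRIOV pod so k8s-pod-recovery restarts it on unlock.
    kubectl label pods -n example-ns example-sriov-pod \
        restart-on-reboot=true --overwrite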
Closes-Bug: 1928965
Signed-off-by: Cole Walker <cole.walker@windriver.com>
Change-Id: I9c520c0a47aabca7b96e50adf0f71742f4199c2f
Update the k8s pod recovery service to include the armada namespace
so that armada pods stuck in an unknown state after a host
lock/unlock or reboot can be recovered by the service.
Change-Id: Iacd92637a9b4fcaf4c0076e922e1bd739f69a584
Closes-Bug: 1928018
Signed-off-by: Angie Wang <angie.wang@windriver.com>
Labeling pods as "restart-on-reboot" was a workaround for kubernetes
being restarted by the worker manifest. Now that AIO runs a single
manifest that starts kubernetes only once, this operation is no
longer needed.
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/785736
Change-Id: I0d6c549199559b2bc19d8edff52f64ea0b08b50d
Closes-Bug: 1918139
Signed-off-by: Bin Qian <bin.qian@windriver.com>
At startup, there might be pods that are left in unknown states.
The k8s-pod-recovery service takes care of
recovering these unknown pods in specific namespaces.
To extend this to custom apps that are not part of starlingx,
we modify the service to look in the /etc/k8s-post-recovery.d
directory for conf files. Any app that needs to be recovered by this
service has to create a conf file, e.g. app-1 will create
/etc/k8s-post-recovery.d/APP_1.conf containing the following:

    namespace=app-1-namespace
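A minimal sketch of how the service could collect these namespaces,
assuming a single namespace= entry per conf file (the actual parsing
in the script may differ):

    # Gather extra namespaces to recover from drop-in conf files.
    EXTRA_NAMESPACES=""
    for conf in /etc/k8s-post-recovery.d/*.conf; do
        [ -e "$conf" ] || continue
        ns=$(grep '^namespace=' "$conf" | cut -d'=' -f2)
        EXTRA_NAMESPACES="${EXTRA_NAMESPACES} ${ns}"
    done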
Closes-Bug: 1917781
Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Change-Id: I8febdb685d506cff3c34946163612cafdab3e3a8
Pods that are in a k8s deployment, daemonset, etc. can be labeled as
restart-on-reboot="true", which will automatically cause them to be
restarted after the worker manifest has completed in an AIO system.
It may happen, however, that the k8s-pod-recovery service is started
before the pods are scheduled and created on the node the script is
running on, causing them not to be restarted. The proposed solution is
to wait for stabilization of labeled pods before restarting them.
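A hedged sketch of such a stabilization wait, polling until the count
of labeled pods on this node stops changing (the interval and retry
values are illustrative, not the script's):

    # Poll until the number of restart-on-reboot=true pods on this node
    # is stable across two consecutive checks, or retries run out.
    last_count=-1
    for i in $(seq 1 12); do
        count=$(kubectl get pods --all-namespaces \
            --selector=restart-on-reboot=true \
            --field-selector=spec.nodeName=${HOST} --no-headers | wc -l)
        if [ "$count" -eq "$last_count" ]; then
            break
        fi
        last_count=$count
        sleep 10
    done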
Closes-Bug: 1900920
Signed-off-by: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>
Change-Id: I5c73bd838ab2be070bd40bea9e315dcf3852e47f
This commit adds a mechanism to the pod recovery service to restart
pods based on the restart-on-reboot label.
This is a mitigation for an issue seen on an AIO system using SR-IOV
interfaces on an N3000 FPGA device. Since the kubernetes services
start coming up after the controller manifest has completed, a race
can happen with the configuration of devices and the SR-IOV device
plugin in the worker manifest. The symptom of this would be the
SR-IOV device in the running pod disappearing as the FPGA device is
reset.
Notes:
- The pod recovery service only runs on controller nodes.
- The raciness between the kubernetes bring-up and worker configuration
should be fixed in the future by a re-organization of the manifests to
have either a separate AIO manifest or a separate kubernetes manifest.
This would require extensive feature work. In the meantime, this
mitigation will allow pods which experience this issue to recover.
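A simplified sketch of the restart step, assuming the label and node
filter described above (the service's actual implementation may
differ):

    # Restart pods labeled restart-on-reboot=true on this host by
    # deleting them; their controllers will recreate them.
    kubectl get pods --all-namespaces \
        --selector=restart-on-reboot=true \
        --field-selector=spec.nodeName=${HOST} --no-headers \
        | while read -r ns pod rest; do
            kubectl delete pod -n "$ns" "$pod" --wait=false
        done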
Change-Id: If84b66b3a632752bd08293105bb780ea8c7cf400
Closes-Bug: #1896631
Signed-off-by: Steven Webster <steven.webster@windriver.com>
Add a recovery service, started by systemd on a host boot, that waits
for pod transitions to stabilize and then takes corrective action for
the following set of conditions:
- Delete to restart pods stuck in an Unknown or Init:Unknown state for
the 'openstack' and 'monitor' namespaces.
- Delete to restart Failed pods stuck in a NodeAffinity state that occur
in any namespace.
- Delete to restart the libvirt pod in the 'openstack' namespace when
any of its conditions (Initialized, Ready, ContainersReady,
PodScheduled) are not True.
This will only recover pods specific to the host where the service is
installed.
This service is installed on all controller types. There is currently no
evidence that we need this on dedicated worker nodes.
Each of these conditions should be evaluated after the next k8s
component rebase to determine if any of these recovery actions can be
removed.
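As a rough sketch only (using the namespaces listed above; the service
may implement this differently), the Unknown-state recovery could look
like:

    # Delete local pods stuck in Unknown or Init:Unknown so that their
    # controllers restart them, limited to the listed namespaces.
    for ns in openstack monitor; do
        kubectl get pods -n "$ns" \
            --field-selector=spec.nodeName=${HOST} --no-headers \
            | awk '$3 == "Unknown" || $3 == "Init:Unknown" {print $1}' \
            | while read -r pod; do
                kubectl delete pod -n "$ns" "$pod" --wait=false
            done
    done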
Change-Id: I0e304d1a2b0425624881f3b2d9c77f6568844196
Closes-Bug: #1893977
Signed-off-by: Robert Church <robert.church@windriver.com>