This commit adds a mechanism to the pod recovery service to restart
pods based on the restart-on-reboot label.
This is a mitigation for an issue seen on an AIO system using SR-IOV
interfaces on an N3000 FPGA device. Since the Kubernetes services
start coming up after the controller manifest has completed, a race
can occur with the configuration of devices and the SR-IOV device
plugin in the worker manifest. The symptom is that the SR-IOV device
in a running pod disappears when the FPGA device is reset.
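The label-based restart described above could look roughly like the
sketch below. It is an illustration only, not the pod recovery
service's actual code. The label value "true", the function names, and
the use of kubectl through subprocess are all assumptions; the commit
itself names only the restart-on-reboot label key. Deleting a pod only
restarts it if a controller such as a Deployment or DaemonSet manages
it and recreates it.

```python
import subprocess

# Assumption: the label is applied as restart-on-reboot=true. The commit
# message names only the label key, not its value.
RESTART_LABEL_SELECTOR = "restart-on-reboot=true"


def list_pods_command(selector=RESTART_LABEL_SELECTOR):
    """Build the kubectl command listing labelled pods in all namespaces."""
    return [
        "kubectl", "get", "pods", "--all-namespaces",
        "--selector", selector,
        "--no-headers",
        "--output",
        "custom-columns=NS:.metadata.namespace,NAME:.metadata.name",
    ]


def parse_pods(output):
    """Turn 'namespace name' lines into (namespace, name) tuples."""
    pods = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            pods.append((fields[0], fields[1]))
    return pods


def delete_pod_command(namespace, name):
    """Build the kubectl command deleting a single pod."""
    return ["kubectl", "delete", "pod", "--namespace", namespace, name]


def restart_labelled_pods(run=subprocess.run):
    """Delete every labelled pod; 'run' is injectable for testing."""
    listing = run(list_pods_command(), capture_output=True, text=True,
                  check=True)
    for namespace, name in parse_pods(listing.stdout):
        run(delete_pod_command(namespace, name), check=True)
```

Calling restart_labelled_pods() once the worker configuration has
finished would delete each labelled pod, leaving its controller to
recreate it against the reconfigured device.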
Notes:
- The pod recovery service only runs on controller nodes.
- The race between the Kubernetes bring-up and worker configuration
should be fixed in the future by re-organizing the manifests to have
either a separate AIO or Kubernetes manifest. This would require
extensive feature work. In the meantime, this mitigation will allow
pods which experience this issue to recover.
Change-Id: If84b66b3a632752bd08293105bb780ea8c7cf400
Closes-Bug: #1896631
Signed-off-by: Steven Webster <steven.webster@windriver.com>