7756299303
This commit adds a mechanism to the pod recovery service to restart pods based on the restart-on-reboot label.

This is a mitigation for an issue seen on an AIO system using SR-IOV interfaces on an N3000 FPGA device. Since the kubernetes services start coming up after the controller manifest has completed, a race can happen with the configuration of devices and the SR-IOV device plugin in the worker manifest. The symptom of this would be the SR-IOV device in the running pod disappearing as the FPGA device is reset.

Notes:
- The pod recovery service only runs on controller nodes.
- The raciness between the kubernetes bring-up and worker configuration should be fixed in the future by a re-organization of the manifests to either have a separate AIO or kubernetes manifest. This would require extensive feature work. In the meantime, this mitigation will allow pods which experience this issue to recover.

Change-Id: If84b66b3a632752bd08293105bb780ea8c7cf400
Closes-Bug: #1896631
Signed-off-by: Steven Webster <steven.webster@windriver.com>
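
The label-based selection described in this commit message can be illustrated with a minimal Python sketch. It is not the pod recovery service's actual code. The label value `"true"`, the per-node filter, the helper names `pods_to_restart` and `delete_commands`, and the choice to restart a pod by deleting it (so that its controller recreates it) are all assumptions made for illustration; only the `restart-on-reboot` label name comes from the commit message. The pod dictionaries follow the shape of `kubectl get pods -o json` items.

```python
# Label name taken from the commit message; the value "true" is an assumption.
RESTART_LABEL = "restart-on-reboot"


def pods_to_restart(pods, node, label_value="true"):
    """Return (namespace, name) pairs for labeled pods scheduled on `node`.

    `pods` is a list of dicts shaped like the items of
    `kubectl get pods --all-namespaces -o json`.
    Filtering by node is an assumption of this sketch.
    """
    selected = []
    for pod in pods:
        meta = pod.get("metadata", {})
        labels = meta.get("labels") or {}
        if labels.get(RESTART_LABEL) != label_value:
            continue  # pod did not opt in to restart-on-reboot
        if pod.get("spec", {}).get("nodeName") != node:
            continue  # pod runs on a different node
        selected.append((meta.get("namespace", "default"), meta["name"]))
    return selected


def delete_commands(selected):
    """Build one `kubectl delete pod` argument list per selected pod.

    Deleting the pod relies on its controller (for example a Deployment)
    to recreate it; that restart strategy is assumed, not confirmed.
    """
    return [
        ["kubectl", "delete", "pod", "--namespace", ns, name]
        for ns, name in selected
    ]


if __name__ == "__main__":
    # Hypothetical sample data for demonstration only.
    sample = [
        {"metadata": {"name": "sriov-app", "namespace": "default",
                      "labels": {"restart-on-reboot": "true"}},
         "spec": {"nodeName": "controller-0"}},
        {"metadata": {"name": "plain-app", "namespace": "default"},
         "spec": {"nodeName": "controller-0"}},
    ]
    for cmd in delete_commands(pods_to_restart(sample, "controller-0")):
        print(" ".join(cmd))
```

Keeping the selection as a pure function separates the label check from any cluster access, so the opt-in logic can be exercised without a running Kubernetes API server.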
centos |