284ea72e96
We saw in the field that the pci_devices table can end up in an inconsistent state after a compute node HW failure and re-deployment. There could be dependent devices where the parent PF is in 'available' state while the child VFs are in 'unavailable' state. (Before the HW fault the PF was allocated, hence the VFs were marked unavailable.) In this state the PF is still schedulable, but during the PCI claim the handling of dependent devices in the PCI tracker will fail with the error: "Attempt to consume PCI device XXX from empty pool". The reason for the failure is that when the PF is claimed, all the child VFs are marked unavailable; but if a VF is already unavailable, that step fails.

One way the deployer might try to recover from this state is to remove the VFs from the hypervisor and restart the compute agent. The compute startup already has logic to delete PCI devices that are unused and not reported by the hypervisor. However, this logic only removed devices in 'available' state and ignored devices in 'unavailable' state. If a device is unused and the hypervisor no longer reports it, then it is safe to delete that device from the PCI tracker. So this patch extends the logic to allow deleting 'unavailable' devices.

There is also a small window when a dependent PCI device is in 'unclaimable' state. From a cleanup perspective this is an analogous state, so it is added to the cleanup logic as well.

Related-Bug: #1969496
Change-Id: If9ab424cc7375a1f0d41b03f01c4a823216b3eb8
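The extended cleanup condition can be illustrated with a minimal sketch. This is not Nova's actual code; the function name `can_cleanup` and its parameters are hypothetical, but the set of states that qualify for deletion mirrors the commit message:

```python
# Hypothetical sketch of the startup cleanup decision for a tracked PCI
# device record. Before the fix, only 'available' devices qualified; the
# fix adds 'unavailable' and the transient 'unclaimable' state.
CLEANUP_STATES = frozenset({"available", "unavailable", "unclaimable"})


def can_cleanup(device_status, reported_by_hypervisor, in_use):
    """Return True if a tracked PCI device record may be deleted."""
    if reported_by_hypervisor:
        # The hypervisor still reports the device; keep tracking it.
        return False
    if in_use:
        # The device is still consumed by an instance; keep it.
        return False
    return device_status in CLEANUP_STATES
```

For example, a VF left 'unavailable' by the failure scenario above, no longer reported by the hypervisor and not in use, would now be purged, while a 'claimed' or still-reported device would be kept.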
__init__.py
devspec.py
manager.py
request.py
stats.py
utils.py
whitelist.py