nova/nova
Balazs Gibizer 284ea72e96 Remove unavailable but not reported PCI devices at startup
We saw in the field that the pci_devices table can end up in
inconsistent state after a compute node HW failure and re-deployment.
There could be dependent devices where the parent PF is in available
state while the children VFs are in unavailable state. (Before the HW
fault the PF was allocated, hence the VFs were marked unavailable.)

In this state the PF is still schedulable, but during the PCI claim
the handling of dependent devices in the PCI tracker will fail with
the error: "Attempt to consume PCI device XXX from empty pool".

The reason for the failure is that when the PF is claimed, all the
children VFs are marked unavailable. But if a VF is already
unavailable, this step fails.
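
The failure mode can be illustrated with a toy model. This is only a
sketch and not the PCI tracker code: the set-based pool and the names
consume, claim_pf and EmptyPoolError are hypothetical, and only the
error text is taken from the message above.

```python
class EmptyPoolError(Exception):
    """Raised when a device is consumed from a pool that does not hold it."""


def consume(pool, address):
    # Hypothetical helper: take one device out of the set of free devices.
    if address not in pool:
        raise EmptyPoolError(
            f"Attempt to consume PCI device {address} from empty pool")
    pool.remove(address)


def claim_pf(pool, pf_address, vf_addresses):
    # Claiming the PF also takes every child VF out of the pool.
    consume(pool, pf_address)
    for vf in vf_addresses:
        # A VF that is already unavailable is not in the pool, so this fails.
        consume(pool, vf)


# Inconsistent state: the PF is available, its VF is already unavailable.
pool = {"0000:81:00.0"}
try:
    claim_pf(pool, "0000:81:00.0", ["0000:81:00.1"])
except EmptyPoolError as exc:
    print(exc)
```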

One way the deployer might try to recover from this state is to remove
the VFs from the hypervisor and restart the compute agent. The compute
startup already has logic to delete PCI devices that are unused and
not reported by the hypervisor. However, this logic only removed devices
in 'available' state and ignored devices in 'unavailable' state.

If a device is unused and the hypervisor is not reporting the device any
more, then it is safe to delete that device from the PCI tracker. So this
patch extends the logic to allow deleting 'unavailable' devices. There
is a small window when a dependent PCI device is in 'unclaimable' state.
From a cleanup perspective this is an analogous state, so it is also
added to the cleanup logic.
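
A minimal sketch of the extended cleanup rule, under stated
assumptions: the PciDevice dataclass, the instance_uuid field used to
express "unused", and the stale_devices helper are hypothetical names
for illustration, not the actual tracker API. Only the three state
names come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# States eligible for cleanup. Before the patch only 'available' qualified.
REMOVABLE_STATES = {"available", "unavailable", "unclaimable"}


@dataclass
class PciDevice:
    address: str
    status: str
    instance_uuid: Optional[str] = None  # assumed marker for "in use"


def stale_devices(tracked, reported_addresses):
    """Return tracked devices that are unused and no longer reported."""
    reported = set(reported_addresses)
    return [
        dev for dev in tracked
        if dev.address not in reported        # hypervisor no longer reports it
        and dev.instance_uuid is None         # not used by any instance
        and dev.status in REMOVABLE_STATES    # state is safe to clean up
    ]
```

With this rule, an 'unavailable' VF that was removed from the
hypervisor is selected for deletion at startup, while a device that is
still reported or still in use is left alone.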

Related-Bug: #1969496
Change-Id: If9ab424cc7375a1f0d41b03f01c4a823216b3eb8
2022-04-28 16:01:38 +02:00
accelerator smartnic support - reject server move and suspend 2021-08-05 15:58:41 +08:00
api Fix wrong attribute to find remote address 2022-04-04 00:22:25 +09:00
cmd Merge "Follow up for nova-manage image property commands" 2022-04-21 09:43:55 +00:00
compute VMware: Early fail spawn if memory is not multiple of 4. 2022-04-19 15:47:35 +00:00
conductor Enforce resource limits using oslo.limit 2022-02-24 16:21:03 +00:00
conf Deprecate [api] use_forwarded_for 2022-04-23 16:15:15 +00:00
console Merge "console: Improve logging" 2021-09-07 14:29:08 +00:00
db db: Close connection on early return 2022-04-22 10:24:34 +01:00
hacking hacking: Prevent use of six 2022-04-05 12:59:12 +01:00
image Merge "Close Glance image if downloading failed." 2022-01-17 10:31:21 +00:00
keymgr
limit Follow up for unified limits 2022-03-04 03:42:58 +00:00
locale Imported Translations from Zanata 2022-04-01 04:02:00 +00:00
network Merge "refactor: remove duplicated logic" 2022-04-14 12:10:37 +00:00
notifications object/notification for Adds Pick guest CPU architecture based on host 2022-02-24 12:06:55 -05:00
objects Remove unavailable but not reported PCI devices at startup 2022-04-28 16:01:38 +02:00
pci Remove unavailable but not reported PCI devices at startup 2022-04-28 16:01:38 +02:00
policies Complete phase-1 of RBAC community-wide goal 2022-02-24 16:33:34 +00:00
privsep Retry lvm volume and volume group query 2021-06-15 12:39:26 +02:00
releasenotes/notes api: enable oslo.reports when using uWSGI 2021-10-14 09:23:08 +03:00
scheduler Merge "Tell oslo.limit how to count nova resources" 2022-02-26 23:44:52 +00:00
servicegroup Remove six.binary_type/integer_types/string_types 2020-12-13 11:25:14 +00:00
storage Add autopep8 to tox and pre-commit 2021-11-08 12:37:27 +00:00
tests Remove unavailable but not reported PCI devices at startup 2022-04-28 16:01:38 +02:00
virt VMware: Split out VMwareAPISession 2022-04-23 12:54:56 +00:00
volume Add volume-rebuild support to cinder module 2022-02-25 02:12:55 +05:30
__init__.py
availability_zones.py Remove six.PY2 and six.PY3 2020-08-15 07:45:23 +00:00
baserpc.py
block_device.py fup: Remove unused legacy block_device_info format 2021-08-20 13:26:46 +01:00
cache_utils.py trivial: Remove unused 'cache_utils' APIs 2020-02-05 17:20:28 +00:00
config.py conf: Allow cinderclient and os_brick to independently log at DEBUG 2021-12-03 18:21:16 +00:00
context.py db: Unify 'nova.db.api', 'nova.db.sqlalchemy.api' 2021-08-09 15:34:40 +01:00
crypto.py Replace md5 for fips 2021-02-25 16:01:43 -05:00
debugger.py trivial: Remove remaining '_LW' instances 2020-05-18 17:00:41 +01:00
exception.py Add logic to enforce local api and db limits 2022-02-24 16:21:02 +00:00
exception_wrapper.py rpc: Rework 'get_notifier', 'wrap_exception' 2021-03-01 11:06:48 +00:00
filters.py Add autopep8 to tox and pre-commit 2021-11-08 12:37:27 +00:00
i18n.py trivial: Remove remaining '_LI' instances 2020-05-18 17:00:57 +01:00
loadables.py
manager.py db: Unify 'nova.db.api', 'nova.db.sqlalchemy.api' 2021-08-09 15:34:40 +01:00
middleware.py Allow X-OpenStack-Nova-API-Version header in CORS 2021-06-15 07:35:36 -04:00
monkey_patch.py reenable greendns in nova. 2022-03-08 16:16:11 +00:00
policy.py Reuse code from oslo lib for JSON policy migration 2021-01-14 22:41:33 +00:00
profiler.py
quota.py Follow up for unified limits 2022-03-04 03:42:58 +00:00
rpc.py rpc: Rework 'get_notifier', 'wrap_exception' 2021-03-01 11:06:48 +00:00
safe_utils.py
service.py Add service version check workaround for FFU 2022-01-24 08:45:58 -08:00
service_auth.py
test.py db: Don't pass strings to 'Connection.execute' 2021-11-12 09:58:42 +00:00
utils.py Fix eventlet.tpool import 2022-02-22 12:40:15 +01:00
version.py Change API unexpected exception message 2021-02-17 21:30:07 +00:00
weights.py Add debug log for scheduler weight calculation 2021-11-11 19:10:32 +01:00
wsgi.py trivial: Remove remaining '_LI' instances 2020-05-18 17:00:57 +01:00