nova/nova
Balazs Gibizer 30317e6b3f Replace blind retry with libvirt event waiting in detach
Until now, nova used a retry loop that periodically re-sent the detach
request to libvirt for as long as the device was visible in the domain
XML. This could cause an issue where a detach already in progress on the
libvirt side was interrupted by nova re-sending the detach request for
the same device. See bug #1882521 for more information.

Also, if there was both a persistent and a live domain, nova tried to
detach from both in the same call. This led to confusion about the
result when such a call failed: had the detach failed only partially?

We can do better, at least for the live detach case. According to the
libvirt developers, detaching from the persistent domain always
succeeds and is a synchronous process. Detaching from the live
domain can be either synchronous or asynchronous depending on the guest
OS and the load on the hypervisor. But for live detach libvirt always
sends an event [1] that nova can wait for.

So this patch does two things.

1) Separates the detach from the persistent domain from the detach from
   the live domain to make the error cases clearer.

2) Changes the retry mechanism.

   Detaching from the persistent domain is not retried. If libvirt
   reports device not found while both persistent and live detach
   are needed, the error is ignored and the process continues with
   the live detach. In any other case the error is considered fatal.

   Detaching from the live domain is changed to always wait for the
   libvirt event. In case of a timeout, the live detach is retried.
   But a failure event from libvirt is considered fatal, based on the
   information from the libvirt developers, so in this case the
   detach is not retried.
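
The two rules above can be sketched roughly as follows. This is a minimal
illustration, not the code from the patch: the function `detach`, the
exception classes, the callables passed in, and the `max_attempts` limit
are all hypothetical names chosen for the example, and the event is
modelled as a plain return value rather than a real libvirt callback.

```python
class DeviceNotFound(Exception):
    """Hypothetical: libvirt reports the device is not in the domain."""


class DetachFailed(Exception):
    """Hypothetical: the detach failed and must not be retried."""


def detach(dev, persistent, live, wait_for_event, max_attempts=3):
    """Detach dev from the persistent domain, then from the live domain.

    persistent, live: callables that send one detach request, or None
        when that kind of detach is not needed.
    wait_for_event: callable returning "removed", "failed", or None
        when the wait timed out (an assumed encoding of the event).
    max_attempts: assumed retry limit, not taken from the patch.
    """
    if persistent is not None:
        try:
            persistent(dev)  # synchronous, never retried
        except DeviceNotFound:
            if live is None:
                raise  # only a persistent detach was needed: fatal
            # Both detaches are needed: ignore and go on to the live one.

    if live is None:
        return

    for _ in range(max_attempts):
        live(dev)
        result = wait_for_event(dev)
        if result == "removed":
            return
        if result == "failed":
            # A failure event is fatal, so do not retry.
            raise DetachFailed(dev)
        # result is None: the wait timed out, so send the request again.
    raise DetachFailed(dev)
```

With this shape, a timeout is the only condition that causes the live
detach request to be sent again, so a detach that libvirt is still
processing is only re-requested after the wait has expired.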

Related-Bug: #1882521

[1] https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectDomainEventDeviceRemovedCallback

Change-Id: I7f2b6330decb92e2838aa7cee47fb228f00f47da
(cherry picked from commit e56cc4f439)
2021-05-05 09:47:35 +02:00
..
accelerator Remove six.text_type (1/2) 2020-12-13 11:25:31 +00:00
api Merge "docs: Add note about rescuing bfv instances with the 2.87 microversion" 2021-03-24 13:23:49 +00:00
cmd Merge "Drop support for custom schedulers" 2021-03-07 11:35:48 +00:00
compute compute: Reject requests to commit intermediary snapshot of an inactive instance 2021-04-09 10:06:09 +01:00
conductor rpc: Rework 'get_notifier', 'wrap_exception' 2021-03-01 11:06:48 +00:00
conf Replace blind retry with libvirt event waiting in detach 2021-05-05 09:47:35 +02:00
console Remove six.text_type (1/2) 2020-12-13 11:25:31 +00:00
db Merge "Dynamically archive FK related records in archive_deleted_rows" 2021-03-23 13:19:38 +00:00
hacking Add a hacking rule for assert_has_calls 2020-09-28 23:08:15 +09:00
image glance: Remove [glance]/allowed_direct_url_schemes 2021-01-28 12:46:57 +00:00
keymgr
locale Imported Translations from Zanata 2020-04-26 07:51:21 +00:00
network [neutron] Get only ID and name of the SGs from Neutron 2021-04-21 07:14:21 +00:00
notifications libvirt: Add support for virtio-based input devices 2021-03-05 11:00:02 +00:00
objects Bump the Compute RPC API to version 6.0 2021-03-25 11:23:07 +01:00
pci tests: Add functional test for vDPA device 2021-03-16 20:39:27 +00:00
policies virt: Remove 'reset_network' API 2020-11-23 15:55:50 +00:00
privsep Remove VFSLocalFS 2021-03-03 17:55:43 +01:00
scheduler Merge "scheduler: Translate secure boot requests to trait" 2021-03-14 08:14:41 +00:00
servicegroup Remove six.binary_type/integer_types/string_types 2020-12-13 11:25:14 +00:00
storage Merge "rbd: Only log import failures when the RbdDriver is used" 2020-11-09 23:51:46 +00:00
tests Replace blind retry with libvirt event waiting in detach 2021-05-05 09:47:35 +02:00
virt Replace blind retry with libvirt event waiting in detach 2021-05-05 09:47:35 +02:00
volume Remove six.text_type (1/2) 2020-12-13 11:25:31 +00:00
__init__.py
availability_zones.py Remove six.PY2 and six.PY3 2020-08-15 07:45:23 +00:00
baserpc.py
block_device.py virt: Remove 'is_xenapi' helper 2020-09-11 14:09:06 +01:00
cache_utils.py trivial: Remove unused 'cache_utils' APIs 2020-02-05 17:20:28 +00:00
config.py Fix config option default value for sample config file 2020-11-25 00:05:08 +00:00
context.py Remove six.binary_type/integer_types/string_types 2020-12-13 11:25:14 +00:00
crypto.py Replace md5 for fips 2021-02-25 16:01:43 -05:00
debugger.py trivial: Remove remaining '_LW' instances 2020-05-18 17:00:41 +01:00
exception.py api: Block unsupported actions with vDPA 2021-03-16 20:39:27 +00:00
exception_wrapper.py rpc: Rework 'get_notifier', 'wrap_exception' 2021-03-01 11:06:48 +00:00
filters.py trivial: Remove remaining '_LI' instances 2020-05-18 17:00:57 +01:00
i18n.py trivial: Remove remaining '_LI' instances 2020-05-18 17:00:57 +01:00
loadables.py trivial: Remove dead code 2019-12-12 10:55:02 +00:00
manager.py Remove six.add_metaclass 2020-08-15 07:45:39 +00:00
middleware.py Rename 'nova.common.config' module to 'nova.middleware' 2019-08-16 00:53:03 +01:00
monkey_patch.py Correctly disable greendns 2020-09-11 12:42:04 -04:00
policy.py Reuse code from oslo lib for JSON policy migration 2021-01-14 22:41:33 +00:00
profiler.py
quota.py Make quotas respect instance_list_per_project_cells 2020-05-15 17:21:29 -04:00
rpc.py rpc: Rework 'get_notifier', 'wrap_exception' 2021-03-01 11:06:48 +00:00
safe_utils.py
service.py Restore retrying the RPC connection to conductor 2020-11-13 18:02:00 +01:00
service_auth.py
test.py Reset global wsgi app state in unit test 2021-03-24 12:04:51 +01:00
utils.py Merge "Initialize global data separately and run_once in WSGI app init" 2021-03-23 16:55:49 +00:00
version.py Change API unexpected exception message 2021-02-17 21:30:07 +00:00
weights.py Remove six.add_metaclass 2020-08-15 07:45:39 +00:00
wsgi.py trivial: Remove remaining '_LI' instances 2020-05-18 17:00:57 +01:00