cinder/cinder
Gorka Eguileor 0b2c0d9238 LVM: Fix delete volume error due to lvs failure
Sometimes the lvs command fails unexpectedly and exits with code 139,
which is not one of the 5 exit codes predefined in the LVM code:
ECMD_PROCESSED, ENO_SUCH_CMD, EINVALID_CMD_LINE, EINIT_FAILED, or
ECMD_FAILED.

We've seen this happen mostly on CI jobs when deleting a volume, during
the check to see if the volume is present (``_volume_not_present``), and
it makes the whole operation fail unexpectedly.

When looking at the logs we can find other instances of this same exit
code in other calls, but those are fortunately covered by retries that
exist for other unexpected errors, such as bug #1335905, and those
retries seem to make the call eventually succeed.

The stderr of the failure is polluted with debug and warning messages
such as:

  [1] /dev/sda1: stat failed: No such file or directory

      This message has been removed [2] from the LVM code; the removal
      indicates it's somewhat troublesome, but doesn't explain how.

  [3] Path /dev/sda1 no longer valid for device(8,1)

  [4] WARNING: Scan ignoring device 8:0 with no paths.

But the real error is most likely:

  [5] Device open /dev/sda 8:0 failed errno 2

On failure we see that error twice, because the LVM code retries the
open to work around some kind of unknown udev race [6].

Since the LVM code indicates that a retry can help, we'll retry on error
139 when calling ``get_lv_info``.

To narrow down the retry we'll only do it on error 139, so we modify the
existing ``retry`` decorator to accept the ``retry`` parameter (same as
the tenacity library) and create our own retry condition, which matches
only when the ProcessExecutionError has a specific exit code.

This pattern seems better than blindly retrying all
ProcessExecutionError cases.
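
As an illustration only, the narrowed retry described above could be
sketched with the standard library alone. This is not the code from this
change: it does not use tenacity or oslo.concurrency, and the
``ProcessExecutionError`` class, the ``retry_if_exit_code`` helper, the
``retries`` and ``interval`` parameters, and the fake ``get_lv_info``
are all hypothetical stand-ins chosen for the example.

```python
import functools
import time


class ProcessExecutionError(Exception):
    """Hypothetical stand-in for the real command failure exception."""

    def __init__(self, exit_code, stderr=""):
        super().__init__(f"command failed with exit code {exit_code}")
        self.exit_code = exit_code
        self.stderr = stderr


def retry_if_exit_code(codes):
    """Build a predicate that matches only the given exit codes."""
    codes = (codes,) if isinstance(codes, int) else tuple(codes)

    def predicate(exc):
        return (isinstance(exc, ProcessExecutionError)
                and exc.exit_code in codes)

    return predicate


def retry(retry, retries=3, interval=0.0):
    """Re-run the wrapped call while ``retry(exc)`` returns True.

    ``retry`` is a predicate on the raised exception, loosely mirroring
    the parameter of the same name in tenacity (assumption: simplified).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on any error that
                    # the predicate does not recognise.
                    if attempt == retries or not retry(exc):
                        raise
                    time.sleep(interval)
        return wrapper
    return decorator


def make_fake_lvs(exit_codes):
    """Return a fake ``get_lv_info`` and a list recording each call."""
    calls = []

    @retry(retry=retry_if_exit_code(139), retries=3)
    def get_lv_info():
        code = exit_codes[len(calls)]
        calls.append(code)
        if code:
            raise ProcessExecutionError(code)
        return [{"name": "volume-1"}]

    return get_lv_info, calls


# Two simulated 139 failures followed by a success are retried.
flaky_lvs, flaky_calls = make_fake_lvs([139, 139, 0])
lv_info = flaky_lvs()

# Exit code 5 is not in the retry list, so it fails on the first call.
failing_lvs, failing_calls = make_fake_lvs([5, 0])
try:
    failing_lvs()
    unexpected_error = None
except ProcessExecutionError as exc:
    unexpected_error = exc
```

Keeping the condition as a separate predicate means other callers of the
decorator keep their existing behaviour, and only the calls that opt in
retry on 139.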

[1]: 17f5572bc9/lib/filters/filter-persistent.c (L132)
[2]: 22c5467add
[3]: b84a9927b7/lib/device/dev-cache.c (L1396-L1402)
[4]: b84a9927b7/lib/label/label.c (L798)
[5]: b84a9927b7/lib/label/label.c (L550)
[6]: b84a9927b7/lib/label/label.c (L562-L567)

Closes-Bug: #1901783
Change-Id: I6824ba4fbcb6fd8f57f8ff86ad7132446ac6c504
2021-03-29 16:43:10 +02:00
..
api API validation: Use cinder_host for services checks 2021-03-13 21:55:57 +00:00
backup Merge "Backup manager: Synchronously call remove_export" 2021-03-25 17:00:42 +00:00
brick LVM: Fix delete volume error due to lvs failure 2021-03-29 16:43:10 +02:00
cmd Merge "Remove NestedQuotaDriver" 2021-02-16 16:26:03 +00:00
common Support mTLS when calling the glance API 2021-03-22 22:00:44 +00:00
compute nova: use EndpointNotFound from keystoneauth1 2019-09-03 10:58:59 -04:00
db Fix automatic quota sync for temporary volumes 2021-03-26 12:26:15 +01:00
group Use resource_backend for volumes and groups 2020-08-14 08:13:42 +00:00
image Support mTLS when calling the glance API 2021-03-22 22:00:44 +00:00
interface Add explanations on safe delete 2021-03-17 14:04:20 +01:00
keymgr Introduce flake8-import-order extension 2020-01-06 09:59:35 -06:00
locale Imported Translations from Zanata 2021-03-24 06:25:01 +00:00
message Add user messages for some volume snapshot actions 2019-04-26 17:02:05 -04:00
objects Fix volume OVO create method 2021-03-17 13:07:09 +01:00
policies Simplify composite check strings for project personas 2021-02-17 17:44:40 +00:00
privsep Enable flake8-logging-format extension 2020-01-09 14:35:20 -06:00
scheduler Remove six of dir cinder/scheduler/* 2020-10-08 08:36:17 +08:00
tests LVM: Fix delete volume error due to lvs failure 2021-03-29 16:43:10 +02:00
transfer Fix: show volume transfer by name for non-admins 2020-08-03 12:46:31 +00:00
volume LVM: Fix delete volume error due to lvs failure 2021-03-29 16:43:10 +02:00
wsgi Introduce flake8-import-order extension 2020-01-06 09:59:35 -06:00
zonemanager Brocade: Fix lookup UnboundLocalError 2020-08-07 15:24:44 +02:00
__init__.py
context.py mypy: annotate volume manager 2021-02-10 12:27:47 -05:00
coordination.py
exception.py Merge "Remove NestedQuotaDriver" 2021-02-16 16:26:03 +00:00
flow_utils.py
i18n.py
manager.py mypy: annotate volume manager 2021-02-10 12:27:47 -05:00
opts.py Merge "Update code layout and missing Zadara features" 2021-03-19 19:00:48 +00:00
policy.py Merge "Make sure we pass context objects directly to policy enforcement" 2021-03-07 00:07:42 +00:00
quota.py Remove NestedQuotaDriver 2021-01-19 17:43:29 +00:00
quota_utils.py Remove NestedQuotaDriver 2021-01-19 17:43:29 +00:00
rpc.py Remove six in files under cinder/* 2020-10-08 14:00:14 +08:00
service.py Fix typo on service cluster change method 2020-05-06 19:36:07 -05:00
service_auth.py Add service_token for cinder-nova interaction 2017-12-15 12:04:23 +05:30
ssh_utils.py Remove six in files under cinder/* 2020-10-08 14:00:14 +08:00
utils.py LVM: Fix delete volume error due to lvs failure 2021-03-29 16:43:10 +02:00
version.py