0b2c0d9238
Sometimes the lvs command fails unexpectedly with exit code 139, which is not one of the five exit codes predefined in the LVM code: ECMD_PROCESSED, ENO_SUCH_CMD, EINVALID_CMD_LINE, EINIT_FAILED, or ECMD_FAILED.

We've seen this happen mostly on CI jobs when deleting a volume, during the check that the volume is no longer present (``_volume_not_present``), and it makes the whole operation fail.

Looking at the logs, we can find other instances of this same exit code in other calls, but those are fortunately covered by the retries for other unexpected errors, such as bug #1335905, which seem to make the call eventually succeed.

The stderr of the failure is polluted with debug and warning messages such as:

  [1] /dev/sda1: stat failed: No such file or directory

This message has since been removed from the LVM code [2], which suggests it was somewhat troublesome, although the change doesn't explain how.

  [3] Path /dev/sda1 no longer valid for device(8,1)
  [4] WARNING: Scan ignoring device 8:0 with no paths.

But the real error is most likely [5]:

  Device open /dev/sda 8:0 failed errno 2

On failure we see that error twice, because the LVM code retries the open to work around some kind of unknown udev race [6].

Since the LVM code indicates that a retry can help, we'll retry when calling ``get_lv_info``, but only on exit code 139. To narrow the retry down, we modify the existing ``retry`` decorator to accept the ``retry`` parameter (same as the tenacity library) and create our own retry condition that matches only when a ProcessExecutionError carries a specific exit code.

This pattern seems better than blindly retrying all ProcessExecutionError cases.

[1]:17f5572bc9/lib/filters/filter-persistent.c (L132)
[2]:22c5467add
[3]:b84a9927b7/lib/device/dev-cache.c (L1396-L1402)
[4]:b84a9927b7/lib/label/label.c (L798)
[5]:b84a9927b7/lib/label/label.c (L550)
[6]:b84a9927b7/lib/label/label.c (L562-L567)
Closes-Bug: #1901783
Change-Id: I6824ba4fbcb6fd8f57f8ff86ad7132446ac6c504
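
The retry-on-a-specific-exit-code pattern described in the commit message can be sketched roughly as follows. This is not the actual Cinder change: it is a minimal standard-library illustration that does not use tenacity. The `ProcessExecutionError` class here is a stand-in for the real exception, the helper `retry_if_exit_code` and the decorator's `should_retry`, `retries`, and `interval` parameters are hypothetical names, and `get_lv_info` is a fake that simulates one failure.

```python
import functools
import time


class ProcessExecutionError(Exception):
    """Stand-in for the real exception (assumption: it exposes exit_code)."""

    def __init__(self, exit_code, stderr=""):
        super().__init__(f"command failed with exit code {exit_code}")
        self.exit_code = exit_code
        self.stderr = stderr


def retry_if_exit_code(*codes):
    """Hypothetical helper: build a predicate matching only these exit codes."""
    def predicate(exc):
        return isinstance(exc, ProcessExecutionError) and exc.exit_code in codes
    return predicate


def retry(should_retry, retries=3, interval=0.0):
    """Retry the wrapped call while should_retry(exc) is true, up to `retries` attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Re-raise errors we were not asked to retry, and the
                    # last failure once all attempts are used up.
                    if not should_retry(exc) or attempt == retries:
                        raise
                    time.sleep(interval)
        return wrapper
    return decorator


calls = {"count": 0}


@retry(retry_if_exit_code(139), retries=3)
def get_lv_info():
    """Fake lvs call: fails once with exit code 139, then succeeds."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise ProcessExecutionError(139, "Device open /dev/sda 8:0 failed errno 2")
    return [{"name": "volume-1"}]


result = get_lv_info()
```

Passing the condition in as a parameter keeps the decorator generic: a caller that only wants to tolerate exit code 139 says so explicitly, and every other `ProcessExecutionError` still propagates on the first attempt.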
7 lines
200 B
YAML
---
fixes:
  - |
    LVM driver `bug #1901783
    <https://bugs.launchpad.net/cinder/+bug/1901783>`_: Fix unexpected delete
    volume failure due to unexpected exit code 139 on ``lvs`` command call.