This patch refactors the iSCSI disconnect code, changing the approach to
one that uses only `iscsiadm -m session` and sysfs to get all the
required information: the devices from the connection, the multipath
system device name, the multipath name, the WWN of the block devices,
etc.
By doing so, not only do we fix a good number of bugs, but we also
improve the reliability and speed of the mechanism.
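As a rough illustration of the sysfs-based lookups described above, the
following is a minimal sketch and not the code in this patch. The helper
names and the `sysfs_root` parameter are hypothetical, and the paths
(`block/<dev>/holders/dm-*` and `block/<dev>/device/wwid`) are assumed
to match the usual Linux sysfs layout.

```python
import glob
import os


def find_multipath_dm(device_names, sysfs_root="/sys"):
    """Return the dm-N device holding any of the given devices, or None.

    Hypothetical helper: it looks at the "holders" directory of each
    block device instead of calling `multipath -l`.
    """
    for name in device_names:
        pattern = os.path.join(sysfs_root, "block", name, "holders", "dm-*")
        holders = glob.glob(pattern)
        if holders:
            return os.path.basename(holders[0])
    return None


def read_wwid(device_name, sysfs_root="/sys"):
    """Return the WWID exposed in sysfs for a block device, or None.

    Hypothetical helper: reading the file avoids sending a page 0x83
    inquiry to the device itself.
    """
    path = os.path.join(sysfs_root, "block", device_name, "device", "wwid")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None
```

Because both lookups only read files, they should not block on a path
that is down, which is the motivation given for moving away from
querying the devices.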
Some of the improvements and benefits achieved by this patch are:
- Common code for multipath and single path disconnects.
- No more querying iSCSI devices for their WWN (page 0x83), removing
delays and issues on flaky connections.
- All devices are properly cleaned even if they are not part of the
multipath.
- We wait for device removal and do it in parallel if there are
multiple devices.
- Removed usage of `multipath -l` to find devices, which is really slow
with flaky connections and doesn't work when called with a device from
a path that is down.
- Prevent losing data when detaching. Currently, if the multipath flush
fails for any reason other than "in use", we silently continue with the
removal. That is the case when all paths are momentarily down.
- Adds a new mechanism for the caller of the disconnect to specify that
it's acceptable to lose data and that it's more important to leave a
clean system. That is the case if we are creating a volume from an
image, since the volume will just be set to error, but we don't want
leftovers. Optionally we can tell os-brick to ignore errors and not
raise an exception if the flush fails.
- Add a warning when we could be leaving leftovers behind due to
disconnect issues.
- Action retries (like multipath flush) will now only log the final
exception instead of logging all the exceptions.
- Flushes of individual paths now use exponential backoff retries
instead of random retries between 0.2 and 2 seconds (from the oslo
library).
- We no longer use symlinks from `/dev/disk/by-path`, `/dev/disk/by-id`,
or `/dev/mapper` to find devices or multipaths, as they could be
leftovers from previous runs.
- With high failure rates (above 30%) some CLI calls can enter a state
where they wait forever, so we add a timeout mechanism to our `execute`
method and use it in those specific calls.
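The retry and timeout behaviour listed above can be sketched as follows.
This is only an illustration under stated assumptions, not the os-brick
implementation: `retry_with_backoff` and `run_with_timeout` are
hypothetical names, the delays and attempt counts are arbitrary, and the
timeout here relies on the standard library's `subprocess.run` rather
than on the project's own `execute` method.

```python
import logging
import subprocess
import time

LOG = logging.getLogger(__name__)


def retry_with_backoff(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func until it succeeds, doubling the delay after each failure.

    Intermediate failures are swallowed silently; only the exception
    from the final attempt is logged and re-raised.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                LOG.exception("Giving up after %d attempts", attempts)
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay...
            sleep(base_delay * 2 ** attempt)


def run_with_timeout(cmd, timeout):
    """Run cmd and raise subprocess.TimeoutExpired if it hangs too long."""
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout, check=True)
```

A caller could combine the two, for example wrapping a flush command in
`run_with_timeout` and passing that callable to `retry_with_backoff`, so
that a hung command counts as one failed attempt.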
Closes-Bug: #1502534
Change-Id: I058ff0a0e5ad517507dc3cda39087c913558561d