By default, udev sets a device's by-path name to its ID_PATH.
Looking at [1], the current code works most of the time,
but when the system uses platform, the disk name is prefixed with
'platform-xxxxx'.
[1]: a0be538616/src/udev/udev-builtin-path_id.c (L530)
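A minimal sketch of how a by-path candidate could be built with an optional platform prefix. The function name, its parameters, and the sample platform value are hypothetical and are not taken from os-brick:

```python
def by_path_name(pci_id, target_wwn, lun, platform=None):
    """Build a candidate /dev/disk/by-path name for an FC LUN.

    `platform` is the optional platform component that udev may put in
    front of the PCI part of ID_PATH (hypothetical parameter).
    """
    prefix = "platform-%s-" % platform if platform else ""
    return "/dev/disk/by-path/%spci-%s-fc-0x%s-lun-%s" % (
        prefix, pci_id, target_wwn, lun)


# Without a platform component the name is unchanged.
plain = by_path_name("0000:05:00.2", "20320002ac01e166", 1)
# "example.bus" is a made-up platform identifier for illustration only.
prefixed = by_path_name("0000:05:00.2", "20320002ac01e166", 1,
                        platform="example.bus")
```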
Change-Id: I9b2c120f074f60b9af6dd81718a5287656040aba
Closes-Bug: #1862443
(cherry picked from commit 7c9020f006)
(cherry picked from commit 9491b6a417)
The _discover_mpath_device method isn't overridden in
FibreChannelConnector, so just call it as self._discover_mpath_device.
Change-Id: I3d5a6e0010e56d256da17e4b20be3db0569c6702
In an FC environment, when disconnecting a volume,
if the first path in the loop has failed,
getting the SCSI WWN will fail.
So we need to add a path validity check
before getting the SCSI WWN.
Change-Id: I9d3a9dac13dcd585330e7b891c61b2626e5aabec
Closes-Bug: 1831621
Previously linuxscsi.extend_volume would always attempt to wait for
multipath devices to appear regardless of multipath actually being used
by the connector.
This change corrects this by passing down the existing use_multipath
attribute from the iSCSI and FC connectors into linuxscsi.extend_volume.
The same attribute is introduced to the NVMe connector to also allow it
to skip this search for multipath devices.
Change-Id: I29d65ae036957f3a63cba93dd330b14e3361a1b9
Closes-bug: #1832247
For FC connections there are multiple places where we check the
initiator target map provided by the backend against the port names of
the HBAs on the system.
Currently this check is case sensitive, but some backends are returning
the port names in the initiator target map upper cased, which usually
results in attach failures.
One of the reasons for the attach failures is that os-brick will not
issue scan requests.
Example from a 3PAR backend from a specific system:
Connector properties:
{
'wwpns': ['10001409dcd71ff6', '10001409dcd71ff7'],
'wwnns': ['20001409dcd71ff6', '20001409dcd71ff7'],
...
}
Connection properties:
{
'initiator_target_map': {
'10001409DCD71FF6': ['20320002AC01E166', '21420002AC01E166'],
'10001409DCD71FF7': ['20410002AC01E166', '21410002AC01E166']
}
...
}
This patch converts the initiator_target_map and the
target_wwn/target_wwns to lower case.
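A minimal sketch of the normalization, using the 3PAR example above as input. The helper names are hypothetical, and treating target_wwn as either a string or a list is an assumption of this sketch:

```python
def _lower(value):
    """Lower-case a single port name or a list of port names."""
    if isinstance(value, str):
        return value.lower()
    return [item.lower() for item in value]


def normalize_fc_properties(conn_props):
    """Return a copy of conn_props with FC port names lower-cased."""
    props = dict(conn_props)
    itmap = props.get('initiator_target_map')
    if itmap:
        props['initiator_target_map'] = {
            initiator.lower(): _lower(targets)
            for initiator, targets in itmap.items()
        }
    for key in ('target_wwn', 'target_wwns'):
        if props.get(key):
            props[key] = _lower(props[key])
    return props


normalized = normalize_fc_properties({
    'initiator_target_map': {
        '10001409DCD71FF6': ['20320002AC01E166', '21420002AC01E166'],
        '10001409DCD71FF7': ['20410002AC01E166', '21410002AC01E166'],
    },
    'target_wwn': ['20320002AC01E166', '21420002AC01E166'],
})
```

With the keys lower-cased, a lookup by the HBA port name '10001409dcd71ff6' from the connector properties now finds its targets.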
Closes-Bug: #1775677
Change-Id: I12b9535d8a9969356394e406a1ed5ac4a5f1f959
We should check not only that 'initiator_target_map' is in conn_props
but also that its value is not None.
Change-Id: I7ca0bf5fac20a370097f1c39ce005a1b23814459
Closes-Bug: #1746218
The message of the VolumePathsNotFound exception should contain more
info from connection_properties, like 'target_wwns' and 'target_luns'.
In this case we provide the summary info 'luns' and 'wwns' instead.
Change-Id: Id787809161e2125393bf4fe303b5bc0a8d92c2e4
Checks for fibre channel device paths will only fail if there are
no FC HBAs since the actual paths are just calculated based on the
HBA information, whether they exist or not. This moves the check
earlier in the process to avoid that calculation if we don't have
any HBA information.
Change-Id: Ib8bfea2c9603335e93b1407605f0e59d3c29f622
Under certain conditions detaching a multipath device may result in a
failure when flushing one of the individual paths, but the disconnect
should have succeeded, because there were other paths available to flush
all the data.
OS-Brick is currently following the standard recommended disconnect
mechanism for multipath devices:
- Release all device holders
- Flush multipath
- Flush single paths
- Delete single devices
The problem is that this procedure includes an unnecessary step, flushing
individual single paths, that may result in an error.
Originally it was thought that the individual flushes were necessary to
prevent data loss, but upon further study of the multipath-tools and the
device-mapper code it was discovered that this is not really the case.
After the multipath flushing has been completed we can be sure that the
data has been successfully sent and acknowledged by the device.
Closes-Bug: #1785669
Change-Id: I10f7fea2d69d5d9011f0d5486863a8d9d8a9696e
We made assumptions in the fibre channel connector code that
there was only ever a single lun per volume, even with many
wwns per connection. There is a need to support multiple luns
per multipath device, similar to how the iSCSI volumes work.
What we do is allow a list for 'target_luns' and 'target_wwns'
in the connection properties, similar to how the iSCSI connector
treats things like 'target_portals', 'target_luns', etc. We
then group together 'targets' as a combination of wwpns and the
lun associated with them. This grouping is now used throughout
the attach and detach workflow to determine dev paths and
scsi target information for rescans.
All existing calls with 'target_lun' and 'target_wwn' will
continue working as before, the new plural keys are optional.
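A minimal sketch of the grouping, accepting both the singular and the plural keys. The function name is hypothetical, and pairing 'target_wwns' with 'target_luns' by index is an assumption of this sketch, not something the message states:

```python
def build_targets(conn_props):
    """Return a list of (wwpn, lun) pairs from connection properties."""
    if 'target_wwns' in conn_props and 'target_luns' in conn_props:
        # New plural keys: assumed here to be index-aligned lists.
        wwpns = list(conn_props['target_wwns'])
        luns = list(conn_props['target_luns'])
    else:
        # Legacy keys: one lun shared by one or more wwpns.
        wwn = conn_props['target_wwn']
        wwpns = wwn if isinstance(wwn, list) else [wwn]
        luns = [conn_props['target_lun']] * len(wwpns)
    if len(wwpns) != len(luns):
        raise ValueError('target_wwns and target_luns differ in length')
    return list(zip(wwpns, luns))


legacy = build_targets({'target_wwn': ['50014380186b3f65',
                                       '50014380186b3f67'],
                        'target_lun': 1})
plural = build_targets({'target_wwns': ['50014380186b3f65',
                                        '50014380186b3f67'],
                        'target_luns': [1, 2]})
```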
Change-Id: I393a028457a162228666d8497b695984fefdfab4
Closes-Bug: #1774293
The current FC code tries to limit the scanning range by detecting the
target and channel. Unfortunately, this code has a good number of
implementation issues:
- Matching uses local WWNN instead of target's WWPN.
- Not using a shell to run the command, so the * glob won't expand.
- Not using -l on grep command to list file names instead of contents.
- Not making the search case insensitive.
This patch fixes all these issues by using the target's WWPNs instead
-taking into account FC Zone/Access control information if present- and
supporting both possible connection information formats for the WWPNs
(single value or list of values).
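The glob, file listing, and case problems listed above can be illustrated without grep. This is only a sketch: the directory layout is a stand-in created in a temporary directory, not the sysfs path that os-brick searches, and the helper name is hypothetical:

```python
import glob
import os
import tempfile


def files_containing(pattern, needle):
    """Return sorted paths matching `pattern` whose text contains `needle`.

    The glob is expanded in Python, so no shell is required; file names
    are returned instead of contents; the comparison ignores case.
    """
    needle = needle.lower()
    matches = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            if needle in f.read().lower():
                matches.append(path)
    return matches


# Stand-in for a directory of per-target port_name files.
root = tempfile.mkdtemp()
for name, wwpn in (('target0', '0x20320002AC01E166'),
                   ('target1', '0x21420002ac01e166')):
    os.makedirs(os.path.join(root, name))
    with open(os.path.join(root, name, 'port_name'), 'w') as f:
        f.write(wwpn + '\n')

found = files_containing(os.path.join(root, '*', 'port_name'),
                         '20320002ac01e166')
```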
Rescan tests have been modified to adhere to unit test best practices,
where each test case only tests the specific code in the method under
test and mocks everything else.
Closes-Bug: #1664653
Closes-Bug: #1684996
Closes-Bug: #1687607
Change-Id: Ib539f6a3652bab4399c30cd90f326829e839ec02
"tries" is never incremented, but is logged.
Just use self.tries instead of both "tries" and "self.tries".
Closes-Bug: #1706648
Change-Id: Ib3e7c2e6a918912fda09419e89e138d96ed41fc2
This patch refactors iSCSI disconnect code changing the approach to one
that just uses `iscsiadm -m session` and sysfs to get all the required
information: devices from the connection, multipath system device name,
multipath name, the WWN for the block devices...
By doing so, not only do we fix a good number of bugs, but we also
improve the reliability and speed of the mechanism.
Some of the improvements and benefits achieved by this patch are:
- Common code for multipath and single path disconnects.
- No more querying iSCSI devices for their WWN (page 0x83) removing
delays and issues on flaky connections.
- All devices are properly cleaned even if they are not part of the
multipath.
- We wait for device removal and do it in parallel if there are
multiple.
- Removed usage of `multipath -l` to find devices which is really slow
with flaky connections and didn't work when called with a device from
a path that is down.
- Prevent losing data when detaching. Currently, if the multipath flush
fails for any other reason than "in use" we silently continue with the
removal. That is the case when all paths are momentarily down.
- Adds a new mechanism for the caller of the disconnect to specify that
it's acceptable to lose data and that it's more important to leave a
clean system. That is the case if we are creating a volume from an
image, since the volume will just be set to error, but we don't want
leftovers. Optionally we can tell os-brick to ignore errors and not
raise an exception if the flush fails.
- Add a warning when we could be leaving leftovers behind due to
disconnect issues.
- Action retries (like multipath flush) will now only log the final
exception instead of logging all the exceptions.
- Flushes of individual paths now use exponential backoff retries
instead of random retries between 0.2 and 2 seconds (from oslo
library).
- We no longer use symlinks from `/dev/disk/by-path`, `/dev/disk/by-id`,
or `/dev/mapper` to find devices or multipaths, as they could be
leftovers from previous runs.
- With high failure rates (above 30%) some CLI calls will enter into a
weird state where they wait forever, so we add a timeout mechanism in
our `execute` method and add it to those specific calls.
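As an illustration of the exponential backoff and final-exception-only behaviour mentioned in the list above, here is a minimal sketch. The function name, parameters, and delay values are hypothetical and do not reflect os-brick's own retry helper:

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=1.0, factor=2.0,
                       sleep=time.sleep):
    """Call func until it succeeds or the attempts run out.

    The wait before retry n is base_delay * factor**n. Only the
    exception from the last attempt propagates to the caller.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * factor ** attempt)


# Demonstration with a fake sleep that records the requested delays.
delays = []
calls = {'count': 0}


def flaky_flush():
    """Fail on the first two calls, then succeed."""
    calls['count'] += 1
    if calls['count'] < 3:
        raise RuntimeError('flush failed')
    return 'flushed'


result = retry_with_backoff(flaky_flush, sleep=delays.append)
```

Passing the sleep function as a parameter keeps the sketch testable without real waiting.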
Closes-Bug: #1502534
Change-Id: I058ff0a0e5ad517507dc3cda39087c913558561d
In a Fibre Channel environment, when nova hard reboots a VM,
it selects the first discovered path to find out the multipath id.
If the first path has failed but has not been deleted, then multipath
will fail and the reboot will fail, even though there are alive paths
to boot the VM.
Closes-Bug: 1685969
Change-Id: I252569f8b49d9838090d1a0f454103204557c916
Signed-off-by: Liu Qing <liuqing@chinac.com>
When we are using friendly names for multipath the multipaths are not
getting flushed, which may lead to data loss on slow connections and
multipath entries with no actual paths.
This happens in both iSCSI and FC connections, and it is due to the
flush being requested on the WWN instead of the actual name of the
device.
So when we are not using friendly names the WWN and the device name are
the same and our call to multipath -f will successfully flush remaining
data, but when we are using friendly names they will not match, and the
call to multipath -f will silently fail (return code 0) and the flush
will not actually go through. When the flush doesn't happen, if there is
remaining data, then the multipath will stay once the individual paths
have been removed.
Closes-Bug: #1663925
Change-Id: Ib93d945a5b5fca57bcac4e176d62d1412b95f2da
When resizing an in-use volume, if multipath is enabled, function
extend_volume only rescans ONE SCSI device and then runs the commands:
"multipathd reconfigure"
"multipathd resize map mpath_id"
The SCSI device resizes successfully, but the multipath device is
still the old size. This patch fixes it by rescanning all SCSI devices
of the multipath device.
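A minimal sketch of finding every SCSI path behind a multipath device so that each one can be rescanned. It relies on the sysfs `slaves` directory of a device-mapper device and the per-device `rescan` file; the function names are hypothetical, and the demonstration runs against a throwaway directory rather than the real /sys:

```python
import os
import tempfile


def multipath_slaves(dm_name, sys_block='/sys/block'):
    """List the block devices that back a device-mapper device."""
    return sorted(os.listdir(os.path.join(sys_block, dm_name, 'slaves')))


def rescan_files(dm_name, sys_block='/sys/block'):
    """Return the sysfs rescan file of every path of the multipath device.

    Writing '1' to each file (as root) asks the kernel to re-read that
    device's size; this sketch only computes the paths.
    """
    return [os.path.join(sys_block, dev, 'device', 'rescan')
            for dev in multipath_slaves(dm_name, sys_block)]


# Demonstration against a directory that imitates the sysfs layout.
fake_sys_block = tempfile.mkdtemp()
for dev in ('sda', 'sdb'):
    os.makedirs(os.path.join(fake_sys_block, 'dm-0', 'slaves', dev))

targets = rescan_files('dm-0', fake_sys_block)
```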
Change-Id: I3a7c7d5e86defedfacd71067f2e5a89bca6aa35b
Closes-Bug: #1611659
The warning log level is too high for rescanning for a device
after connecting a volume. This should be INFO level at best.
As a data point, the iscsi warnings alone show up over 207K times
in a 1 week gate run.
Change-Id: I7903929519a22f908b8a4798dc6ed0a5b0439df2
Closes-Bug: #1621566
Cleanup order for multipath should be:
- Flush multipath (multipath -f device)
- Flush blockdev devices (blockdev --flushbufs device)
- Remove paths
But now we have:
- Flush blockdev devices (blockdev --flushbufs device)
- Flush multipath (multipath -f device)
- Remove paths
This patch sets the right order and adds a test to avoid regression.
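The corrected ordering can be sketched as a plan builder. The function name and the tuple layout are hypothetical; the two commands are the ones quoted above, and the path removal step is left abstract:

```python
def multipath_cleanup_plan(mpath_device, path_devices):
    """Return cleanup steps as (step, device, argv) tuples, in order.

    argv is None for the path removal step, which this sketch does not
    spell out.
    """
    plan = [('flush multipath', mpath_device,
             ['multipath', '-f', mpath_device])]
    for dev in path_devices:
        plan.append(('flush device', dev,
                     ['blockdev', '--flushbufs', dev]))
    for dev in path_devices:
        plan.append(('remove path', dev, None))
    return plan


plan = multipath_cleanup_plan('mpatha', ['/dev/sda', '/dev/sdb'])
step_order = [step for step, _device, _argv in plan]
```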
Closes-Bug: #1502979
Change-Id: I065a79514e0fcaf722f57e2edd6f18f31878c711
Fibre Channel multipath rescan uses wildcards for the host rescan, which
can end up recreating devices that had just been removed if there's a
race condition between the removal of a SCSI device and the connection
of a volume.
The race condition happens if a rescan done when attaching happens right
between us removing the path and removing the lun, because the rescan
will add not only the new path we are attaching, but the old path we are
removing, since the lun still hasn't been removed.
This would leave orphaned devices that pollute our environment and will
be recognized as down paths when the storage controller reuses the same
WWID.
This patch narrows the rescan to only rescan for the specific lun
number, and if possible it also filters the rescan by HBA channel and
SCSI target ID.
We only filter by HBA channel and SCSI target ID when we can find this
information, and that is when the FC storage servers implement a single
WWNN for all ports.
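A minimal sketch of building the narrowed scan request. The kernel's SCSI host scan file takes a "channel target lun" triple in which '-' is a wildcard; the function name and parameters here are hypothetical:

```python
def scan_line(lun, channel=None, target=None):
    """Build the 'channel target lun' string for a SCSI host scan file.

    The lun is always specific. Channel and target fall back to the '-'
    wildcard when they could not be determined.
    """
    def field(value):
        return '-' if value is None else str(value)

    return '%s %s %s' % (field(channel), field(target), lun)


# Only the lun is known: wildcard channel and target.
lun_only = scan_line(3)
# Channel and target were found, so the rescan is fully specific.
narrow = scan_line(3, channel=0, target=2)
```

The resulting string would be written to the `scan` file of the relevant host under /sys/class/scsi_host.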
Change-Id: Id6ed98d3fb8b4b980de86256dec8eeda84562c98
Closes-Bug: #1608614
This is a larger refactor of the connector.py file. The goal is to
simplify the file by moving the vendor connector classes to their own
files, and keep only the InitiatorConnector in the connector.py file.
The vendor specific connector tests are also split out into their own
files.
Change-Id: I020e75ca8cd8bec2ad1b38f3ade5cc1f63a4fee5
Implements: bp connector-refactor