Currently os-brick is using in-process locks that only prevent concurrent
access to critical sections for threads within a single process.
But based on a comment in the iSCSI connector it seems the code assumed
that these were file based locks that prevented concurrent access from
multiple processes.
That iSCSI comment is being removed because it is not correct: our
current retry mechanism will not handle connect and disconnect
concurrency issues.
The reason we haven't seen errors in Nova is that it runs a single
process, so the in-process locks are effective there.
This is probably also not an issue for some transport protocols, such as
FC and RBD, and it wouldn't be an issue for iSCSI connections that don't
share targets.
But for others, such as iSCSI with shared targets and NVMe-OF, not
using file locks will create race conditions in the following cases:
- More than 1 cinder backend: one backend can be doing a detach as part
of a create volume from image while the other does an attach for an
offline migration.
- Backup/Restore if backup and volume services are running on the same
host.
- HCI scenarios where cinder volume and nova compute are running on the
same host, even if the same lock path is configured.
- Glance using Cinder as its backend and running on the same node as
cinder-volume or cinder-backup.
The problematic race conditions happen because the disconnect will do a
logout of the iSCSI target once the connect call has already confirmed
that the session to the target exists.
We could just add the file locks to iSCSI and NVMe, but I think it's
safer to add it to all the connectors and then, after proper testing, we
can change back the locks that can be changed, and remove or reduce the
critical section in others.
Closes-Bug: #1947370
Change-Id: I6f7f7d19540361204d4ae3ead2bd6dcddb8fcd68
(cherry picked from commit 6a43669edc)
(cherry picked from commit 19a4820f5c)
Conflicts: os_brick/initiator/connectors/iscsi.py
(cherry picked from commit 08ddf69d64)
Conflicts: os_brick/initiator/connectors/nvmeof.py
(cherry picked from commit b03eca8353)
Conflicts: os_brick/tests/base.py
Conflicts: os_brick/initiator/connectors/iscsi.py
After compute host reboot, in an iSCSI/multipath environment, some
of the connections to the iSCSI portal are not reinitiated and missing
iSCSI devices are observed. This patchset introduced retries for this
particular scenario.
Closes-Bug: #1944474
Change-Id: I60ee7421f7b792e8324286908a9fdd8fb53e433e
(cherry picked from commit 8832c53899)
(cherry picked from commit 4d116483af)
(cherry picked from commit 779d1e48c7)
(cherry picked from commit 7c7650b4de)
(cherry picked from commit d0eea8a6eb)
Logging and then ignoring multipathd errors when it isn't enforced is
confusing to operators investigating attach or detach issues.
Change-Id: Ib234e585da5644d87fcf1ef6bcb95f2871dc6b32
(cherry picked from commit bdfe6b43cb)
(cherry picked from commit d09dc9e51c)
(cherry picked from commit ffe0cbfc93)
(cherry picked from commit 892583b388)
Currently we don't properly catch some possible exceptions during
connecting to iSCSI portals, like failures in "iscsiadm -m session".
Because of this, _connect_vol threads can abort unexpectedly in some
failure patterns, and this abort causes a hang in subsequent steps
waiting for results from _connect_vol threads.
This change ensures that any exceptions during connecting to iSCSI
portals are handled in the _connect_vol method correctly, to avoid an
unexpected abort without updating thread results.
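A hedged sketch of the guard described above; the shape of the shared
'data' dict and the callable standing in for the portal login/scan are
illustrative assumptions, not the real os-brick internals:

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)


    def _connect_vol(connect_and_scan, props, data):
        # 'connect_and_scan' stands in for the real portal login plus
        # device scan (including the "iscsiadm -m session" handling);
        # 'data' is shared with the caller, which waits for every
        # worker thread to report a result.
        device = None
        try:
            device = connect_and_scan(props)
        except Exception:
            LOG.exception('Failed to connect to portal %s',
                          props.get('target_portal'))
        finally:
            # Always record an outcome so the caller's wait loop can
            # finish instead of blocking on a thread that died silently.
            data['stopped_threads'] += 1
            if device:
                data['found_devices'].append(device)
            else:
                data['failed_logins'] += 1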
Conflicts:
os_brick/tests/initiator/connectors/test_iscsi.py
Closes-Bug: #1915678
Change-Id: I0278c502806b99f8ec65cb146e3852e43031e9b8
(cherry picked from commit 4478433550)
(cherry picked from commit 57c8f4334c)
(cherry picked from commit c336cb76b3)
Found while testing cinder-backup with ceph on Ubuntu Focal, which
installs ceph Octopus. Octopus apparently enforces the requirement
that the config file contain a '[global]' section for general
settings. The '[global]' section goes back at least to ceph
Hammer [0], so we will simply add it to the temporary ceph config
file that os-brick generates in the RBDConnector class.
[0] https://docs.ceph.com/docs/hammer/rados/configuration/mon-config-ref/
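A small sketch of generating such a temporary config with the
'[global]' header; the 'mon host' option name follows standard
ceph.conf usage, but the exact options the RBDConnector writes are an
assumption here:

    import configparser
    import io


    def build_temp_ceph_conf(monitor_hosts):
        # Write the monitors under an explicit '[global]' section so
        # newer ceph releases (Octopus+) accept the generated file.
        conf = configparser.ConfigParser()
        conf['global'] = {'mon host': ','.join(monitor_hosts)}
        buf = io.StringIO()
        conf.write(buf)
        return buf.getvalue()


    print(build_temp_ceph_conf(['192.168.0.10', '192.168.0.11']))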
Co-authored-by: Alex Kavanagh <alex@ajkavanagh.co.uk>
Co-authored-by: Ivan Kolodyazhny <e0ne@e0ne.info>
Change-Id: I86eb31535d990291945de5d9846b1a03157ec2cf
Closes-bug: #1865754
(cherry picked from commit c6ad4d864c)
(cherry picked from commit 474583b4f8)
(cherry picked from commit 91c73a433a)
Recent multipathd doesn't remove path devices promptly when it receives
a burst of udev events; instead it waits for a while before starting the
actual removal.
Because os-brick removes path devices in a short time while detaching a
multipath device, it is likely to hit this burst limit, and sometimes
path devices are not removed before a subsequent operation is started.
This change ensures that os-brick tells multipathd to remove path
devices when the devices should be deleted, so that orphan paths are
not left when starting a subsequent attach operation.
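Roughly, the idea is to ask multipathd to drop the path right before
the device is deleted; 'multipathd del path' is a documented multipathd
CLI command, but the wrapper below is an illustrative sketch rather
than the actual os-brick helper:

    import subprocess


    def multipathd_del_path(devname):
        # Tell multipathd to forget a path device (e.g. 'sdb') before
        # we delete it. Failures are ignored on purpose: if multipathd
        # is not running or the path is already gone, the normal device
        # removal should still proceed.
        try:
            subprocess.run(['multipathd', 'del', 'path', devname],
                           check=True, timeout=5,
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)
        except (subprocess.SubprocessError, OSError):
            pass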
Closes-Bug: #1924652
Change-Id: I65204aa7495740dc1545bff2c5c485a8041e7930
(cherry picked from commit 1b2e229542)
(cherry picked from commit 0cd58a9b0a)
(cherry picked from commit a0e995b130)
(cherry picked from commit 5c031cb2d9)
OS-Brick disconnect_volume code assumes that the use_multipath parameter
used to instantiate the connector has the same value as in the connector
that was used on the original connect_volume call.
Unfortunately this is not necessarily true, because Nova can attach a
volume, then its multipath configuration can be enabled or disabled, and
then a detach can be issued.
This leads to a series of serious issues such as:
- Not flushing the single path on disconnect_volume (possible data loss)
and leaving it as a leftover device on the host when Nova calls
terminate-connection on Cinder.
- Not flushing the multipath device (possible data loss) and leaving it
as a leftover device similarly to the other case.
This patch changes how we do disconnects: we now assume we are always
disconnecting multipath devices, and fall back to the single-path
disconnect we used to do if we can't go that route.
The case when we cannot do a multipathed detach is mostly when we did
the connect as a single path and the Cinder driver doesn't provide
portal_targets and portal_iqns in the connection info for non
multipathed initialize-connection calls.
This change introduces an additional call when working with single
paths (checking the discoverydb), but it should be an acceptable
trade-off to not lose data.
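The control flow this describes, as a minimal sketch; the helper names
and the exception type are placeholders, not the real os-brick
internals:

    class NoMultipathFound(Exception):
        """No multipath device or multipath metadata could be found."""


    def _disconnect_multipath(connection_properties, device_info):
        # Placeholder for: find the dm device holding the paths, flush
        # it, then log out of every target portal that backed it.
        raise NoMultipathFound()


    def _disconnect_single_path(connection_properties, device_info):
        # Placeholder for: flush the individual device and log out of
        # its single portal, as the old code did.
        pass


    def disconnect_volume(connection_properties, device_info):
        # Assume multipath first; only fall back when a multipathed
        # detach is impossible (e.g. single-path attach and no
        # portal/iqn lists from the driver).
        try:
            _disconnect_multipath(connection_properties, device_info)
        except NoMultipathFound:
            _disconnect_single_path(connection_properties, device_info)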
Also includes squash to add release note from:
Add release note prelude for os-brick 4.4.0
Change-Id: I0f9e088fbb14ee9a73175aa31bca8c619db5a96f
Closes-Bug: #1921381
Change-Id: I066d456fb1fe9159d4be50ffd8abf9a6d8d07901
(cherry picked from commit d4205bd0be)
(cherry picked from commit c70d70b240)
(cherry picked from commit 9528ceab4f)
(cherry picked from commit 4894b242cf)
When doing multipathed Fibre Channel we may end up not flushing the
volume when disconnecting it if we didn't find a multipath when doing
the connection.
The issue is caused by the `connect_volume` call returning a real path
instead of a symlink and us passing the wrong device_info to the
_remove_devices method, which makes the disconnect code incorrectly
determine whether a multipath was used when deciding whether to do a
flush or not.
This patch fixes these things in order to flush the individual path when
necessary.
Closes-Bug: #1897787
Change-Id: Ie3266c81b3c6bb5a2c213a12410539d404d1febe
(cherry picked from commit 1432c369fb)
(cherry picked from commit 3adcb40678)
(cherry picked from commit 1e7ad8b7a4)
The command 'iscsiadm -m node' will return entries for corrupt targets
in the form '[]:port,-1' instead of the expected format. This causes an
IndexError exception during parsing. This patch skips invalid entries.
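A sketch of tolerant parsing along those lines; the exact check used by
the patch may differ, and the sample lines are illustrative:

    def parse_iscsiadm_nodes(output):
        # 'iscsiadm -m node' lines normally look like 'ip:port,tpgt iqn'.
        # Corrupt targets show up as '[]:port,-1 ...' and would break
        # the portal parsing, so skip them instead of raising IndexError.
        entries = []
        for line in output.splitlines():
            if not line.strip():
                continue
            portal_part, _, iqn = line.partition(' ')
            portal = portal_part.split(',')[0]
            if not iqn or portal.startswith('[]'):
                continue
            entries.append((portal, iqn))
        return entries


    sample = ('192.168.1.10:3260,1 iqn.2010-10.org.example:volume-1\n'
              '[]:3260,-1 iqn.2010-10.org.example:corrupt\n')
    print(parse_iscsiadm_nodes(sample))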
Closes-bug: #1886855
Change-Id: I9a1746658474c0f1be7ec29a36767085aaf2ab7f
(cherry picked from commit 4fabe1b33d)
(cherry picked from commit 958e4f5fb8)
(cherry picked from commit 897dbdb9a1)
By default, udev will set a device's by-path name to its ID_PATH.
Looking at [1], most of the time the current code works, but when the
system uses a platform bus, the disk name will be prefixed by
'platform-xxxxx'.
[1]: a0be538616/src/udev/udev-builtin-path_id.c (L530)
Change-Id: I9b2c120f074f60b9af6dd81718a5287656040aba
Closes-Bug: #1862443
(cherry picked from commit 7c9020f006)
(cherry picked from commit 9491b6a417)
Changes:
- hacking 0.12.0 -> 1.1.0
* it was already at >=1.1.0,<1.2.0 in test-req, so updated l-c
- flake8 2.5.5 -> 2.6.0 (for hacking)
- added bandit to test-requirements and l-c
* needed to set cap for py27
Change-Id: If8a12c18daea9513130a020d01db2c27a93c3cfc
Now that we search for the multipath device even if we haven't been able
to find the WWN in sysfs, we can leverage the multipath daemon
information in sysfs to get the WWN.
Pass the mpath to the "get_sysfs_wwn" method, where we check sysfs to
get the WWN.
Conflicts:
os_brick/tests/initiator/connectors/test_iscsi.py
os_brick/tests/initiator/test_linuxscsi.py
The tests required a few fixes to account for the differences between
the branches and the changes introduced by the backports this one
builds upon.
Change-Id: Id1905bc174b8f2f3a345664d8a0a05284ca69927
(cherry picked from commit 0cdd9bbbe2)
(cherry picked from commit 31c01b543c)
If udev rules are slow (or don't happen) and they don't generate the
symlinks we use to detect the WWN, or if we just fail to find the WWN we
end up not detecting the iSCSI multipath even if it is present in the
system.
With this patch we no longer wait to find the WWN before trying to
locate the multipath device.
This means that as long as the multipath daemon is able to generate the
multipath we will be able to return a multipath device path to the
caller.
This backport has been tuned to work properly with Python 2 too, where
the number of calls to get_wwn_mock may be higher than 2 because the
thread may not start immediately the way it does on Python 3.x.
Closes-Bug: 1881619
Change-Id: Ic48bd9ac408c56073e58168df7e74e4b949ac2f2
(cherry picked from commit 63f52be546)
(cherry picked from commit a04e553add)
The LUKS encryptor feature expects devices to have a symbolic link that
it can overwrite in order to enable transparent encryption/decryption
for instances [1]. This is generally the case for RBD volumes, as Ceph
uses udev rules [2] to create a '/dev/rbd/{pool}/{device}' ->
'/dev/rbdN' symlink. However, in an environment where the udev daemon is
not present or not configured correctly, this symlink will never be
created.
This causes things to crash and burn in a rather non-obvious manner when
locally attaching an encrypted RBD volume:
oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: cryptsetup luksOpen --key-file=- /dev/rbd/volumes/volume-foo crypt-volume-foo
Exit code: 4
Stdout: ''
Stderr: "Device /dev/rbd/volumes/foo doesn't exist or access denied.\n"
('foo' being a stand-in for a very long 'device-$UUID' name)
The long term fix here is to probably stop relying on the side effects
of these udev rules, i.e. the symlinks, but that is a far more involved
fix that would not be backportable. Instead, for now we simply leave a
breadcrumb for the user, informing them as to what's gone wrong and
encouraging them to look at the bug report for more information.
[1] https://github.com/openstack/os-brick/blob/3.1.0/os_brick/encryptors/luks.py#L191-L195
[2] https://github.com/ceph/ceph/blob/v14.0.0/udev/50-rbd.rules
Change-Id: I2775f55039695c7ec029106c0dafe4d46255b336
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Related-Bug: #1884114
(cherry picked from commit ee34d925ff)
(cherry picked from commit 1eeffd986d)
To avoid issues with the scsi_id command getting stuck and blocking the
attachment we use sysfs to search for the WWN, but it can happen that we
fail to detect the WWN even if it's present in sysfs.
This happens when the storage array has multiple designators and the
multipath daemon detects the multipaths very fast.
The flow is:
- os-brick attaches volumes using iscsiadm --login
- udev generates the symlink with the WWN (this is the one we want)
- multipathd detects the volumes and forms the DM
- udev replaces the previous symlink to point to the multipath DM
- os-brick checks the symlink
This patch adds code to get_sysfs_wwn that checks the individual devices
belonging to a multipath DM if the symlink points to a DM.
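In rough terms the new check looks at the dm's member devices in sysfs;
this is an illustrative sketch, not the actual get_sysfs_wwn code, and
the symlink naming assumptions are mine:

    import os


    def wwn_if_dm_holds_our_devices(symlink, device_names):
        # If the by-id symlink already points to a dm-N device (because
        # multipathd won the race), check /sys/block/dm-N/slaves: when
        # one of the devices we just attached is a member, the
        # symlink's WWN is still the one we were looking for.
        target = os.path.basename(os.path.realpath(symlink))  # 'dm-3'
        if not target.startswith('dm-'):
            return None
        try:
            slaves = set(os.listdir('/sys/block/%s/slaves' % target))
        except OSError:
            return None
        if slaves & set(device_names):
            name = os.path.basename(symlink)   # e.g. 'wwn-0x60002ac...'
            return name.replace('wwn-', '', 1)
        return None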
Closes-Bug: #1881608
Change-Id: I05f94d31277efec28ad50ae2f3502ab6fccfe37c
(cherry picked from commit 7fb37c2000)
(cherry picked from commit 935daead18)
Ceph 13.2.0 changed [1] the JSON and XML output format for the 'rbd
showmapped' command, from an object keyed by device ID to a list of
device objects. Handle this change.
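A small normalization sketch handling both output formats; the field
names in the samples are illustrative:

    import json


    def parse_rbd_showmapped(output):
        # Ceph < 13.2.0: an object keyed by device id.
        # Ceph >= 13.2.0: a list of device objects.
        mappings = json.loads(output)
        if isinstance(mappings, dict):
            return list(mappings.values())
        return mappings


    old_style = ('{"0": {"pool": "volumes", "name": "volume-1",'
                 ' "device": "/dev/rbd0"}}')
    new_style = ('[{"id": "0", "pool": "volumes", "name": "volume-1",'
                 ' "device": "/dev/rbd0"}]')
    print(parse_rbd_showmapped(old_style))
    print(parse_rbd_showmapped(new_style))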
Change-Id: I55bc70437d41da3a32b8440d00c139805b8be256
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Closes-Bug: #1884052
(cherry picked from commit a87ef7bbab)
(cherry picked from commit bf0faeaea4)
Change I6f01a178616b74ed9a86876ca46e7e46eb360518 needs to be
backported and included in a new os-brick release in all releaseable
stable branches, so add a release note saying what it does.
Change-Id: Ib98043358d51426ca650104ad59a7e09911ee8e9
(cherry picked from commit 7ca8e56d50)
(cherry picked from commit 930ce7fab9)
When we fixed bug 1823200 in Change-ID
Iab54c515fe7be252df52b1a0503a251779805759 we made the ScaleIO connector
incompatible with the old connection properties dictionary as it only
supported the new 'config_group' and 'failed_over' parameters to get the
password.
This is a problem in any system that is upgraded and has attachments to
the array, because the connection properties of those volumes will not
contain the new fields and detaching them will result in error
"KeyError: 'config_group'".
This patch adds compatibility code to support the old connection
properties format so we can detach those volumes.
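The gist of the compatibility shim is to stop indexing the new keys
directly; a minimal sketch, where the defaults are assumptions rather
than the connector's real fallback logic:

    def get_password_lookup_keys(connection_properties):
        # New-style properties carry 'config_group' and 'failed_over';
        # attachments made before the upgrade don't, so use .get() with
        # defaults instead of raising KeyError: 'config_group'.
        config_group = connection_properties.get('config_group')
        failed_over = connection_properties.get('failed_over', False)
        return config_group, failed_over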
Related-Bug: #1823200
Change-Id: I6f01a178616b74ed9a86876ca46e7e46eb360518
(cherry picked from commit 5450483082)
(cherry picked from commit 31589a624f)
Conflicts:
os_brick/initiator/connectors/scaleio.py
All existing jobs are converted to native Zuul v3 versions.
os-brick-cinder-tempest-dsvm-lvm-lio-barbican has been removed
because it is a duplicate of cinder-tempest-plugin-lvm-lio-barbican
which does not test against the current os-brick.
os-brick-src-tempest-lvm-lio-barbican is in fact the same job,
but it tests against the os-brick codebase being tested.
Partial-Bug: #1853372
Depends-On: https://review.opendev.org/672804
Change-Id: I3e6b8f7fffff8aa2e9d4a3009374c74baa131405
(cherry picked from commit 8ef7c54807)
Change Ia1d2b2151e5676037d40bfaf388b54023fc37093 uses a ConfigParser
instance in a way that is incompatible with python 2.7, resulting in
a "ConfigParser instance has no attribute '__getitem__'" error. This
patch accesses the ConfigParser using the "old" API that works in both
Python 2.7 and Python 3 and adds some tests.
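The difference in a nutshell, assuming six is available as it typically
was on these stable branches; the config contents are illustrative:

    import io

    from six.moves import configparser

    conf = configparser.ConfigParser()
    conf.readfp(io.StringIO(u'[global]\nmon host = 192.168.0.10\n'))

    # Python 3 only -- ConfigParser has no __getitem__ on Python 2.7:
    #     mon_host = conf['global']['mon host']
    # "Old" API that works on both interpreters:
    mon_host = conf.get('global', 'mon host')
    print(mon_host)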
Co-authored-by: Brian Rosmaita <rosmaita.fossdev@gmail.com>
Change-Id: Ie2db587c3bc379acd53cfd449788d171ae58dec5
Closes-Bug: 1883654
Id507109df80391699074773f4787f74507c4b882 introduced the showmapped
command when attempting to disconnect locally attached rbd volumes.
Unfortunately at the time the test changes incorrectly attempted to
assert the commands made by using has_calls instead of the valid
assert_has_calls method.
This change now corrects this, ensures _get_rbd_args is called to
populate all of the required arguments for the showmapped command and
finally corrects the name stored within the fake showmapped output in
the test.
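The has_calls/assert_has_calls pitfall in isolation, as a quick
illustration:

    from unittest import mock

    m = mock.Mock()
    m('rbd', 'unmap')

    # Bug: 'has_calls' is not an assertion method, so this just creates
    # a child mock attribute and always "passes", verifying nothing.
    m.has_calls([mock.call('rbd', 'showmapped')])

    # Correct: assert_has_calls checks the recorded calls and raises
    # here because 'showmapped' was never called.
    try:
        m.assert_has_calls([mock.call('rbd', 'showmapped')])
    except AssertionError:
        print('assert_has_calls caught the missing call')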
Change-Id: I7e761828b3799cef720e15ca7896e8e8d6f98182
(cherry picked from commit 71331b0e06)
(cherry picked from commit 55cfc97581)
The VxFlex OS password is not stored in the block_device_mapping table.
Instead, passwords are stored in a separate file and are retrieved during
each attach/detach operation.
Closes-Bug: #1823200
Change-Id: Ia1d2b2151e5676037d40bfaf388b54023fc37093
The original method of unmapping the higher level RBD device found under
/dev/rbd/{pool}/{volume} fails when an encryptor has been attached to
the volume locally. This is due to a symlink being created that points
to the decrypted dm-crypt device, breaking any attempt to unmap the
RBD volume.
To avoid this we can simply find and unmap the lower level RBD device of
/dev/rbd*.
Change-Id: Id507109df80391699074773f4787f74507c4b882
(cherry picked from commit 9415b3b41f)
When creating multiple concurrent volumes, there's a lock
that prevents mount from returning quickly. Reading /proc/mounts
is way faster.
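A minimal sketch of reading /proc/mounts directly instead of shelling
out to mount; the real parser keeps more fields than shown here:

    def read_mounts(path='/proc/mounts'):
        # Each line is 'device mountpoint fstype options dump pass';
        # octal escapes such as '\040' (space) can appear in
        # mountpoints.
        mounts = []
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 2:
                    mounts.append((fields[0], fields[1]))
        return mounts


    print(read_mounts()[:3])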
Change-Id: If31d120955eb216823a55005fdd3d24870aa6b9a
Closes-Bug: #1856889
(cherry picked from commit f240960847)
LUKS password quality checking is not useful
since we only use long hex strings for passwords.
Not skipping this means that we have to install
cracklib-dicts for cryptsetup to work, which is
unnecessary weight.
Closes-Bug: #1861120
Change-Id: Idc281be7cf88eeeeefe260877a1fc275d94f2bed
(cherry picked from commit afb7beb7ce)
We'll keep separate lists of connectors for each of the supported
operating systems so that we don't end up trying to use unsupported
connectors.
Since Cinder is explicitly trying to use the iSCSI/FC connectors,
we'll have to include those as well, for now.
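The shape of keeping one connector mapping per platform, as an
illustrative sketch; the class paths and entries are assumptions, not
the exact lists added by the patch:

    import sys

    _linux_connectors = {
        'ISCSI': 'os_brick.initiator.connectors.iscsi.ISCSIConnector',
        'FIBRE_CHANNEL': ('os_brick.initiator.connectors.'
                          'fibre_channel.FibreChannelConnector'),
        'RBD': 'os_brick.initiator.connectors.rbd.RBDConnector',
    }

    _windows_connectors = {
        # Cinder explicitly asks for iSCSI/FC, so keep them for now.
        'ISCSI': 'os_brick.initiator.windows.iscsi.WindowsISCSIConnector',
        'FIBRE_CHANNEL': ('os_brick.initiator.windows.'
                          'fibre_channel.WindowsFCConnector'),
    }


    def get_connector_mapping():
        # Only offer connectors supported on the current OS.
        if sys.platform.startswith('win'):
            return _windows_connectors
        return _linux_connectors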
Change-Id: Ibec2b798e8c5c3457cebea12cfd2f5813e62fb9e
Closes-Bug: #1850109
(cherry picked from commit 9e8657dd6e)
Bug #1820007 documents failures to find /dev/disk/by-id/ symlinks
associated with encrypted volumes both in real world and CI
environments. These failures appear to be due to udev on these slow or
overloaded hosts failing to populate the required /dev/disk/by-id/
symlinks in time after the iSCSI volume has been connected.
This change seeks to avoid such failures by simply decorating
_get_device_link with the @utils.retry to hopefully allow udev time to
create the required symlinks under /dev/disk/by-id/.
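Sketch of the decorated lookup; @utils.retry is the decorator named
above, but the retry parameters, the exception used to trigger retries,
and the matching logic are illustrative:

    import os

    from os_brick import utils


    @utils.retry(ValueError, interval=2, retries=3)
    def _get_device_link(wwn, device, symlink_dir='/dev/disk/by-id'):
        # Retry a few times so a slow udev has a chance to create the
        # by-id symlink before we give up.
        for link in os.listdir(symlink_dir):
            path = os.path.join(symlink_dir, link)
            if wwn in link and os.path.realpath(path) == device:
                return path
        raise ValueError('no symlink matching %s for %s yet'
                         % (wwn, device))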
Closes-Bug: #1820007
Change-Id: Ib9c8ebae7a6051e18538920139fecd123682a474
(cherry picked from commit 331316827a)
After some changes to the FC connector we have introduced a regression
in the way we do the scans, and we end up scanning using wildcards even
though we shouldn't.
The targets in the "initiator_target_map" don't mean that they are all
connected, so we must take that into account.
With the current code, if we have the following connections:
HBA host7 ---- SWITCH ---- port W (channel 0, target 2)
          \--- SWITCH ---- port X (channel 0, target 3)
HBA host8 ---- SWITCH ---- port Y (channel 0, target 2)
          \--- SWITCH ---- port Z (channel 0, target 3)
We will end up with the following 8 scans for LUN L:
- - L > host7
- - L > host7
0 2 L > host7
0 3 L > host7
0 2 L > host8
0 3 L > host8
- - L > host8
- - L > host8
Which correspond to the responses from _get_hba_channel_scsi_target like
this:
          port Y        port Z        port W        port X
host7 ... ['-','-',L]   ['-','-',L]   ['0','2',L]   ['0','3',L]
host8 ... ['0','2',L]   ['0','3',L]   ['-','-',L]   ['-','-',L]
And we should only be doing 4 scans:
0 2 L > host7
0 3 L > host7
0 2 L > host8
0 3 L > host8
Most storage arrays get their target ports automatically detected by the
Linux FC initiator and sysfs gets populated with that information, but
there are some that don't.
We'll do a narrow scan using the channel, target, and LUN for the former
and a wider scan for the latter.
If all paths to an array of the former type were down at system boot,
the array could look like one of the latter type and make us bring
unwanted volumes into the system by doing a broad scan.
To prevent this from happening Cinder drivers can use the
"enable_wildcard_scan" key in the connection information to let us know
they don't want us to do broad scans even if no target ports are found
(because they know the cause is there's no connection).
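The resulting decision, reduced to a sketch; 'enable_wildcard_scan' is
the key named above, while the function name and the shape of the
target list are assumptions:

    def build_scan_tuples(connection_info, hba_channel_targets):
        # hba_channel_targets: [(channel, target), ...] as reported by
        # sysfs for ports that are really connected to this HBA.
        lun = connection_info['target_lun']
        if hba_channel_targets:
            # Narrow scans only, one per connected channel/target.
            return [(c, t, lun) for c, t in hba_channel_targets]
        if connection_info.get('enable_wildcard_scan', True):
            # No target port info in sysfs and the driver did not opt
            # out, so fall back to a wildcard scan.
            return [('-', '-', lun)]
        return []


    print(build_scan_tuples({'target_lun': 1}, [(0, 2), (0, 3)]))
    print(build_scan_tuples({'target_lun': 1,
                             'enable_wildcard_scan': False}, []))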
Closes-Bug: #1849504
Related-Bug: #1765000
Related-Bug: #1828440
Change-Id: I5dbefaff43fb902b15117b443fc92f7b6a6ad8c9
(cherry picked from commit 708733e495)
The OpenStack os-brick library uses hardcoded paths to binary files to
interact with the VxFlexOS SDC. This leads to problems when using
containerized OpenStack (Kolla & Red Hat). Due to the fact that VxFlexOS
SDC binary files have to be used inside containers (nova, cinder, etc.),
the overcloud deployment must be performed in 3 stages:
1) deploy overcloud without additional volume mounts
2) install the VxFlexOS client on the controller and compute nodes
3) update overcloud with additional volume mounts
With these changes the overcloud can be deployed without an update step
after the initial deployment, since os-brick does not have external
dependencies and uses Python built-in libraries. The scini device through
which the VxFlexOS client interacts is present in the containers by
default because the /dev directory from the host is mounted in all
containers.
Change-Id: Ifc4dee0a51bafd6aa9865ec66c46c10087daa667
Closes-Bug: #1846483
(cherry picked from commit 2d694361fe)
Update the URL to the upper-constraints file to point to the redirect
rule on releases.openstack.org so that anyone working on this branch
will switch to the correct upper-constraints list automatically when
the requirements repository branches.
Until the requirements repository has a stable/train branch, tests will
continue to use the upper-constraints list on master.
Change-Id: Ibd2a510f4e56c55700ceaf6faa90856c03a8f693
LUKS2 support was introduced into cryptsetup 2.0.0 [1] and offers various
improvements over the original format now referred to as LUKS1.
This change introduces an encryptor to os-brick mostly using the
existing LuksEncryptor class with the only difference being the `--type`
switch supplied to cryptsetup when formatting a volume. As such, the bulk
of the _format_volume method from the original class has been extracted
into a new _format_luks_volume method that both the original and the new
Luks2Encryptor class can now reuse.
[1] https://www.saout.de/pipermail/dm-crypt/2017-December/005771.html
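The shared formatting step, reduced to a sketch; 'execute' stands in
for the privileged command runner and the exact cryptsetup options used
by _format_luks_volume are an assumption:

    def format_luks_volume(device, passphrase, luks_type, execute):
        # Only '--type' differs between the LUKS1 and LUKS2 encryptors.
        cmd = ['cryptsetup', '--batch-mode', 'luksFormat',
               '--type', luks_type, '--key-file=-', device]
        return execute(*cmd, process_input=passphrase)


    # Usage sketch with a fake executor that just echoes the command.
    fake_execute = lambda *cmd, **kw: print(' '.join(cmd))
    format_luks_volume('/dev/mapper/fake', 's3cret', 'luks2', fake_execute)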
Change-Id: I09fb2b2be1e376f8ec0f49741c855cfd54ee27f0
The nvme connector in os-brick is actually for NVMe over Fabrics, so
to avoid future confusion we have renamed the nvme connector object
to NVMeOF to better reflect its capability. This patch keeps backwards
compatibility by mapping initiator.NVME to the renamed NVMeOF object.
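Conceptually the compatibility mapping looks like this; the module path
and constant names follow the message above but should be treated as
assumptions:

    NVME = 'NVME'
    NVMEOF = 'NVMEOF'

    # Both the legacy NVME protocol constant and the new NVMEOF one
    # resolve to the renamed connector class, so existing callers keep
    # working after the rename.
    _connector_mapping = {
        NVME: 'os_brick.initiator.connectors.nvmeof.NVMeOFConnector',
        NVMEOF: 'os_brick.initiator.connectors.nvmeof.NVMeOFConnector',
    }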
Change-Id: I97b41139f2e67ab42e2aa8075c51ef939b3cde18