The "multipath -C <mpath_name>" command waits for the multipath
device map to be ready for I/O. This is useful in preventing race
conditions when we are trying to write to the multipath device
before it is ready for I/O.
We added two new config options to make the wait time configurable:
1. wait_mpath_device_attempts - defaults to 4 attempts
2. wait_mpath_device_interval - defaults to 1 second
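The retry behavior these two options control can be sketched as below.
This is a minimal illustration, not the os-brick implementation:
wait_for_mpath and the is_ready callable are hypothetical names, and in
practice is_ready would wrap the "multipath -C <mpath_name>" call.

```python
import time


def wait_for_mpath(is_ready, attempts=4, interval=1.0, sleep=time.sleep):
    """Poll is_ready() up to `attempts` times, pausing `interval` seconds.

    Hypothetical helper: is_ready stands in for running
    "multipath -C <mpath_name>" and checking that it succeeded.
    """
    for attempt in range(1, attempts + 1):
        if is_ready():
            return True
        # Do not sleep after the final attempt; there is nothing left to wait for.
        if attempt < attempts:
            sleep(interval)
    return False
```

The defaults mirror the option defaults above (4 attempts, 1 second).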
Closes-Bug: #2067949
Change-Id: Ib075ec62a2bf993615c5c802f34acd7838bfa2af
(cherry picked from commit 639f953194cdc07b1e6aff1fc1ca2a7bc9d28536)
When running in a container using overlayfs we may see the following
warning:
WARNING os_brick.initiator.connectors.nvmeof process execution error
in _get_host_uuid: Unexpected error while running command.
Command: blkid overlay -s UUID -o value
Exit code: 2
Stdout: ''
Stderr: '': oslo_concurrency.processutils.ProcessExecutionError:
Unexpected error while running command.
This change fixes the issue by not running the command when the file
system source is overlay.
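A minimal sketch of the guard, assuming the file system source is
already known as a string. get_fs_uuid and run_blkid are hypothetical
names; run_blkid stands in for executing
"blkid <source> -s UUID -o value" and returning its stdout.

```python
def get_fs_uuid(fs_source, run_blkid):
    """Return the UUID for fs_source, or None when the lookup is skipped."""
    # blkid cannot inspect an overlay source, so do not call it at all.
    if fs_source == "overlay":
        return None
    return run_blkid(fs_source).strip()
```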
Closes-Bug: #2045557
Change-Id: I3abc5bee7f474a9a40d396c559a42edff86334e0
(cherry picked from commit bd07c571da7a3f78dbdec68af638b7e16181848d)
Update the URL to the upper-constraints file to point to the redirect
rule on releases.openstack.org so that anyone working on this branch
will switch to the correct upper-constraints list automatically when
the requirements repository branches.
Until the requirements repository has a stable/2024.1 branch, tests will
continue to use the upper-constraints list on master.
Change-Id: I327e641569470fb111f78806f0ea127092378156
When fetching the target value (T in HCTL) for the storage HBAs,
we use the /sys/class/fc_transport path to find available targets.
However, this path only contains targets from which a LUN is already
attached to the host.
Scenario:
Suppose we have 2 controllers on the backend side with 4 target HBAs each (8 in total).
For the first LUN mapping from controller1, we will do a wildcard
scan and find the 4 targets from controller1 which will get
populated in the /fc_transport path.
If we try mapping a LUN from controller2, we try to find targets in the
fc_transport path but the path only contains targets from controller1 so
we will not be able to discover the LUN from controller2 and fail with
NoFibreChannelVolumeDeviceFound exception.
Solution:
In each rescan attempt, we will first search for targets in the
fc_transport path: "/sys/class/fc_transport/target<host>*".
If the target is not found, then we will search in the fc_remote_ports
path: "/sys/class/fc_remote_ports/rport-<host>*"
If a [c,t,l] combination is found from either path, we add it to
the list of ctls that we later use for scanning.
This way, we don't alter the current "working" mechanism of scanning
but also add an additional way of discovering targets and improving
the scan to avoid failure scenarios in each rescan attempt.
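The lookup order described above can be sketched as below. This is an
illustration under assumptions, not the actual os-brick code:
find_target_dirs is a hypothetical name, the sys_class parameter exists
only so the sketch can be pointed at a test directory, and matching the
entries against the target WWPN is omitted.

```python
import glob
import os


def find_target_dirs(host, sys_class="/sys/class"):
    """Return candidate target directories for an HBA host number.

    fc_transport is searched first; fc_remote_ports is only consulted
    when fc_transport has no entry for this host.
    """
    patterns = (
        os.path.join(sys_class, "fc_transport", f"target{host}*"),
        os.path.join(sys_class, "fc_remote_ports", f"rport-{host}*"),
    )
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches
    return []
```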
Closes-Bug: #2051237
Change-Id: Ia74b0fc24e0cf92453e65d15b4a76e565ed04d16
Per the tested runtimes for the current release, we test up to
Python 3.11, so update the Python classifiers in setup.cfg
to match.
Change-Id: I06e453b6d02ac8c7b615d3d61b06173eb249dc27
The nvme CLI has changed its behavior: it no longer uses different
exit codes to differentiate between errors.
Exit code 1 is now used for all errors and 0 for success.
This patch fixes the detection of race conditions to also look for the
message in case it's a newer CLI version.
Together with change I318f167baa0ba7789f4ca2c7c12a8de5568195e0 we are
ready for nvme CLI v2.
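A rough sketch of the combined check, covering both old and new CLI
behavior. is_connect_race is a hypothetical name, and the legacy exit
codes and the message text are passed in by the caller because their
real values are not stated here.

```python
def is_connect_race(exit_code, output, race_exit_codes, race_message):
    """Return True when a failed nvme command looks like a benign race.

    race_exit_codes: exit codes older CLI versions used for this case.
    race_message: text newer CLI versions print instead (with exit code 1).
    Both are caller-supplied; this sketch assumes no real values.
    """
    # Older CLI: the exit code alone identifies the race.
    if exit_code in race_exit_codes:
        return True
    # Newer CLI: every error exits with 1, so the message must be checked.
    return exit_code == 1 and race_message in output
```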
Closes-Bug: #1961222
Change-Id: Idf4d79527e1f03cec754ad708d069b2905b90d3f
Attaching NVMe-oF no longer works in CentOS Stream 9 using nvme 2.4
and libnvme 1.4.
The reason is that the 'address' file in sysfs now has the 'src_addr'
information.
Before we had:
traddr=127.0.0.1,trsvcid=4420
Now we have:
traddr=127.0.0.1,trsvcid=4420,src_addr=127.0.0.1
This patch fixes the issue and, to future-proof against any additional
fields that may be added, parses the contents and searches for
the parts we care about: destination address and port.
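A minimal sketch of key-based parsing for the 'address' contents shown
above. The function names are hypothetical and this is not the os-brick
code; it only illustrates why extra fields such as src_addr no longer
matter once the string is parsed by key rather than by position.

```python
def parse_address(address):
    """Split a sysfs NVMe 'address' string into a dict of key/value pairs."""
    fields = {}
    for part in address.strip().split(","):
        key, sep, value = part.partition("=")
        if sep:  # skip anything that is not a key=value pair
            fields[key.strip()] = value.strip()
    return fields


def portal_from_address(address):
    """Return (traddr, trsvcid), ignoring any other fields present."""
    fields = parse_address(address)
    return fields.get("traddr"), fields.get("trsvcid")
```

Both the old and the new format yield the same address and port.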
Closes-Bug: #2035811
Change-Id: I7a33f38fb1b215dd23e2cff3ffa79025cf19def7
When an nvme subsystem has all portals in connecting state and we try
to attach a new volume to that same subsystem it will fail.
We can reproduce it with LVM+nvmet if we configure it to share targets
and then:
- Create instance
- Attach 2 volumes
- Delete instance (this leaves the subsystem in connecting state [1])
- Create instance
- Attach volume <== FAILS
The problem comes from the '_connect_target' method that ignores
subsystems in 'connecting' state, so if they are all in that state it
considers it equivalent to all portals being inaccessible.
This patch changes this behavior and if we cannot connect to a target
but we have portals in 'connecting' state we wait for the next retry of
the nvme linux driver. Specifically, we wait 10 seconds more than the
interval between retries.
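The wait decision can be sketched as below, assuming the caller has
already failed to connect to every portal. seconds_to_wait and its
arguments are hypothetical names; only the 'connecting' state and the
10 second margin come from the description above.

```python
EXTRA_WAIT_SECONDS = 10  # margin on top of the driver's retry interval


def seconds_to_wait(portal_states, retry_interval):
    """Return how long to wait for the driver's next retry, or 0.

    portal_states: iterable of state strings, one per portal.
    retry_interval: seconds between reconnect attempts by the driver.
    """
    if "connecting" in portal_states:
        return retry_interval + EXTRA_WAIT_SECONDS
    # No portal is being retried by the driver, so waiting will not help.
    return 0
```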
[1]: https://bugs.launchpad.net/nova/+bug/2035375
Closes-Bug: #2035695
Change-Id: Ife710f52c339d67f2dcb160c20ad0d75480a1f48
Dell Powerflex 4.x changed the error code of VOLUME_NOT_MAPPED_ERROR
to 4039. This patch adds that error code.
Closes-Bug: #2046810
Change-Id: I76aa9e353747b1651480efb0f3de11c707fe5abe
The mypy job complains about the 'exc' variable[1] since it was used
for ExceptionChainer as well as TargetPortalNotFound exception.
Changing the variable name for TargetPortalNotFound exception from
'exc' to 'target_exc' makes the 'type: ignore' comments unnecessary.
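For background, Python unbinds the name used in "except ... as <name>"
when the except block ends, which is what mypy is warning about. The
sketch below shows the effect with stand-in values; the function names
are hypothetical and a plain string stands in for the ExceptionChainer
instance.

```python
def reuses_name():
    exc = "chainer"  # stands in for the ExceptionChainer instance
    try:
        raise LookupError("portal not found")
    except LookupError as exc:  # rebinds 'exc' for the duration of the block
        pass
    return exc  # 'exc' was deleted when the except block ended


def uses_distinct_name():
    exc = "chainer"
    try:
        raise LookupError("portal not found")
    except LookupError as target_exc:  # separate name, 'exc' is untouched
        pass
    return exc
```

Calling reuses_name() raises UnboundLocalError, while
uses_distinct_name() returns the original value.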
[1] Trying to read deleted variable 'exc'
Change-Id: I4b10db0754f0e00bb02d3a60f9aaf88b90466a8f
This patch improves the creation of the /etc/nvme/hostnqn file by using
the system UUID value we usually already know.
This saves us one or two calls to the nvme-cli command and it also
allows older nvme-cli versions that don't have the `show-hostnqn`
command or have it but can only read from file to generate the same
value every time, which may be useful when running inside a container
under some circumstances.
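As an illustration, a UUID-based host NQN can be built directly from a
known UUID using the standard "nqn.2014-08.org.nvmexpress:uuid:" prefix.
hostnqn_from_uuid is a hypothetical name, and normalizing the UUID to
lowercase is an assumption of this sketch, not something this patch is
known to do.

```python
import uuid


def hostnqn_from_uuid(system_uuid):
    """Build a UUID-based host NQN string from a system UUID."""
    # uuid.UUID validates the input; str() renders it in lowercase.
    normalized = str(uuid.UUID(system_uuid))
    return f"nqn.2014-08.org.nvmexpress:uuid:{normalized}"
```

Because the value is derived from the UUID, the same input always gives
the same NQN.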
Change-Id: Ib250d213295695390dbdbb3506cb297a86e95218
The Dell PowerFlex (scaleio) connector maintains a token cache
for PowerFlex OS.
The cache was overwritten with None by mistake
in change I6f01a178616b74ed9a86876ca46e7e46eb360518.
This patch fixes the broken cache to avoid unnecessary login.
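The intended caching behavior can be sketched as below. TokenCache and
its login callable are hypothetical and do not reflect the connector's
real structure; the point is only that a cached token is reused instead
of being reset before every request.

```python
class TokenCache:
    """Cache a login token so repeated requests do not log in again."""

    def __init__(self, login):
        self._login = login  # callable that performs the login
        self._token = None

    def get(self):
        # Log in only when there is no cached token.
        if self._token is None:
            self._token = self._login()
        return self._token

    def invalidate(self):
        # Call this explicitly, for example after an authentication failure.
        self._token = None
```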
Closes-Bug: #2004630
Change-Id: I2399b0b2af8254cd5697b44dcfcec553c2845bec
This reverts commit 33661ece808a6c32ad36aee0acb46a3c0624d7ce.
Reason for revert: breaks reading password from the config file
Change-Id: I840d8c4d66daf0ab8636617b42cdb47dd4313cc9
from an image
This patch fixes the issue of the password being written in plain text
in the logs while creating a new volume. It creates a new logger with
the default log level set to error.
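A minimal sketch of a dedicated logger limited to the error level, using
only the standard logging module. make_quiet_logger and the logger name
used with it are hypothetical.

```python
import logging


def make_quiet_logger(name):
    """Return a logger that drops everything below ERROR."""
    logger = logging.getLogger(name)
    # Debug and info records, where a command line containing a
    # password could appear, are discarded by this logger.
    logger.setLevel(logging.ERROR)
    return logger
```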
Closes-Bug: #2003179
Change-Id: I0292a30f402e5acddd8bbc31dfaef12ce24bf0b9
Dell Powerflex 4.x changed the error code of VOLUME_ALREADY_MAPPED_ERROR
to 4037. This patch adds that error code.
Closes-Bug: #2013749
Change-Id: I928c97ea977f6d0a0b654f15c80c00523c141406
In some old nvme-cli versions the NVMe-oF create_hostnqn method fails.
This happens specifically on versions that already have the
show-hostnqn command but where it does not yet always return a value.
On those versions the command only returns the value present in the
file and never tries to return an idempotent or random value.
This patch adds handling for that specific case, which is identified by the
stderr message:
hostnqn is not available -- use nvme gen-hostnqn
Closes-Bug: #2035606
Change-Id: Ic57d0fd85daf358e2b23326022fc471f034b0a2f
Add file to the reno documentation build to show release notes for
stable/2023.2.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.2.
Sem-Ver: feature
Change-Id: I25bef272ded6c7c963c6ad0f95103fe421fa8fe7
Change If402f9ae0ca06fec0 replaced cycle-specific testing templates
that had to be changed in each project's zuul config file with a
generic template that only needs to be updated in one place, namely,
in the openstack-zuul-jobs repo.
Apparently os-brick didn't get the memo, so we fix that now.
Change-Id: I8202283d5bd5ecede3414fe3e92e95e743df2f67
After merging change I0b60f9078f23f8464d8234841645ed520e8ba655, we
noticed an issue with existing unit tests which started failing.
The reason is 'nvme_hostid' was an additional parameter returned
in the response while fetching connector properties from nvme
connector.
This is environment specific: it does not occur in environments where
the '/etc/nvme/hostid' file doesn't exist, which is why these tests
passed in the gate but failed in local runs where the hostid file
was present.
This patch mocks the get_nvme_host_id method for tests so the
hostid is never returned irrespective of the environment.
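The mocking approach can be sketched with unittest.mock as below.
FakeConnector is a hypothetical stand-in for the connector under test;
only the get_nvme_host_id method name and the 'nvme_hostid' key come
from the description above.

```python
from unittest import mock


class FakeConnector:
    """Hypothetical stand-in for the NVMe connector under test."""

    def get_nvme_host_id(self):
        # A real implementation would read /etc/nvme/hostid if present.
        return "host-id-from-file"

    def get_connector_properties(self):
        props = {"nqn": "example"}
        host_id = self.get_nvme_host_id()
        if host_id:
            props["nvme_hostid"] = host_id
        return props


def properties_without_host_id():
    """Fetch properties with the host id lookup mocked to return None."""
    with mock.patch.object(FakeConnector, "get_nvme_host_id",
                           return_value=None):
        return FakeConnector().get_connector_properties()
```

With the method mocked, the result no longer depends on whether the
hostid file exists on the machine running the tests.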
Closes-Bug: #2032941
Change-Id: I8b1aaedfdb9bef6e34813e39dede9afe98371d2b
In a multipath enabled deployment, when we try to extend a volume
and some paths are down, we fail to extend the multipath device and
leave the environment in an inconsistent state. See LP Bug #2032177
for more details.
To handle this, we check if all the paths are up before trying to
extend the device and fail fast if any path is down. This ensures
we don't partially extend some paths and leave the others at the
original size, leading to an inconsistent state in the environment.
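The fail-fast check can be sketched as below. All names are
hypothetical, and how path state is actually determined is outside the
scope of this sketch; path_states simply maps a path device name to
True when that path is up.

```python
class PathDownError(Exception):
    """Raised when a multipath device has a path that is not up."""


def ensure_all_paths_up(path_states):
    """Raise PathDownError if any path in path_states is down."""
    down = sorted(name for name, up in path_states.items() if not up)
    if down:
        raise PathDownError("paths down: " + ", ".join(down))


def extend_multipath(path_states, extend_path):
    """Extend every path, but only after confirming all of them are up."""
    # Check first so that no path is resized when the operation cannot
    # complete for all of them.
    ensure_all_paths_up(path_states)
    for name in path_states:
        extend_path(name)
```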
Closes-Bug: #2032177
Change-Id: I5fc02efc5e9657821a1335f1c1ac5fe036e9329a