Changes:
- eliminate whitespace in passenv values
- account for stricter allowlist checking
- remove skipsdist=True, which in tox 4 appears to prevent os-brick
from being installed in the testenvs
- make 4.0.0 the tox minversion
This patch makes tox 4 the default so that we can hopefully catch
problems locally before they block the gate.
Change-Id: I9f88d024c9d7b4f2761aa4e132ee0bd5b7272207
Eventlet hasn't actually been a direct os-brick requirement since
Ussuri (Change-Id I9684db). Leave it in test-requirements since we
have a unit test that mocks it (it is used via oslo.service).
Change-Id: I9a72d41c9bb70ad568e5cd9218730b93b226b202
On some platforms there is no 'dmidecode' program (e.g. s390x). When
privsep attempts to run a program that's not found, it raises
FileNotFoundError instead of ProcessExecutionError.
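A minimal sketch of the kind of handling this implies (the helper name
and the empty-string fallback are assumptions, not the actual code):

    from oslo_concurrency import processutils as putils

    def get_dmidecode_uuid(execute=putils.execute):
        """Return the DMI system UUID, or '' if it cannot be obtained."""
        try:
            out, _err = execute('dmidecode', '-s', 'system-uuid')
        except FileNotFoundError:
            # No dmidecode binary at all (e.g. s390x)
            return ''
        except putils.ProcessExecutionError:
            # The binary exists but the call failed
            return ''
        return out.strip()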
Closes-Bug: #1994083
Change-Id: I6fc43ab7e243d50b74036d1af5f9e8f880401cc6
Add a file to the reno documentation build to show release notes for
stable/zed.
Use the pbr Sem-Ver instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/zed.
Sem-Ver: feature
Change-Id: Ia775c42636307fa35d0612937e4c949c0cc2193b
The RBDVolumeIOWrapper class inherits from io.RawIOBase, but its close
method doesn't flush, and after it has been called the "closed"
attribute still returns False.
Trying to set the "closed" attribute ourselves in
RBDVolumeIOWrapper.close results in an error, so we call the parent's
implementation to do it.
The parent's close method will also call the flush method.
This change ensures that rbd_image.close is only called once, that the
closed attribute is correct, and that the data is flushed on close.
Without these changes we could unintentionally end up with multiple
calls to the rbd_image.close method. If we call close on the wrapper
ourselves, then when the garbage collector cleans up the wrapper it
will run the io.RawIOBase finalizer, which will see that the file is
not closed and call close again.
Now that we correctly set the closed flag on the instance, the flush
method should also behave like the parent implementation and fail if
the wrapper has already been closed.
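Schematically, the resulting behavior looks like this (a simplified
sketch, not the actual RBDVolumeIOWrapper code):

    import io

    class WrapperSketch(io.RawIOBase):
        """Simplified stand-in for RBDVolumeIOWrapper."""

        def __init__(self, rbd_image):
            super().__init__()
            self._image = rbd_image

        def flush(self):
            # Like the parent, raise ValueError once the wrapper is closed.
            super().flush()
            self._image.flush()

        def close(self):
            if not self.closed:
                # io.RawIOBase.close() calls flush() and then marks the
                # wrapper as closed, so the finalizer won't close us twice.
                super().close()
                self._image.close()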
Change-Id: Ib3b066a7da071b1c2de78a1a4e569676539bd335
Move the _get_system_uuid method from the NVMeOFConnector connector to
os_brick/privsep/nvmeof.py renaming it to get_system_uuid and running it
as privileged.
This allows us to read sys/class/dmi/id/product_uuid file in Python
instead of running a subprocess to execute cat.
It also allows the connector to make one less privsep call when the
file doesn't exist, because it can execute the dmidecode command
directly without making another request to the privsep daemon.
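Roughly along these lines (a sketch based on the description above;
the decorator and fallback details are assumptions):

    from os_brick import privileged
    from oslo_concurrency import processutils as putils

    @privileged.default.entrypoint
    def get_system_uuid():
        try:
            with open('/sys/class/dmi/id/product_uuid') as f:
                return f.read().strip()
        except Exception:
            # Already privileged, so run dmidecode directly instead of
            # making another request to the privsep daemon.
            try:
                out, _err = putils.execute('dmidecode', '-s', 'system-uuid')
            except (putils.ProcessExecutionError, FileNotFoundError):
                return ''
            return out.strip()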
Change-Id: I8f2edef6fda97af0ff3f92e39c8b24a85b6c3402
Use dict[], list[] etc. instead of Dict[], List[].
This works on all versions of Python that we support.
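For example (a hypothetical signature; on older interpreters this
style relies on postponed evaluation of annotations):

    from __future__ import annotations  # allows dict[...] before Python 3.9

    def pick_target(targets: list[dict[str, str]]) -> dict[str, str]:
        """Hypothetical example of the builtin-generics style."""
        return targets[0] if targets else {}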
Change-Id: If9c737277c041ffd2d3bca1863d300ae7d9bd52e
This coverage is somewhat limited because this
uses classes (context, keymgr) that live outside
of os-brick and therefore don't have type definitions
here.
Change-Id: I7827a870c04a7a6e02510b7909430feebb70e6f1
The get_md_name method reads the contents of mdstat by running cat in
a privileged shell and piping the output through grep and then awk,
but there's no need to do that.
The mdstat file has mode 0444, so we can read it directly in Python
instead of having to call the privsep daemon to create a subprocess
with a shell.
This patch changes the current implementation of the method to do the
reading and searching in Python.
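A sketch of the pure-Python approach (the matching rule shown here is
simplified):

    def get_md_name(device_name):
        """Return the md device whose /proc/mdstat line mentions device_name."""
        with open('/proc/mdstat') as f:
            for line in f:
                # Lines look like: "md0 : active raid1 nvme1n1[1] nvme0n1[0]"
                if device_name in line:
                    return line.split()[0]
        return None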
Change-Id: Idbba6f9a6d928aa94d2920fc1bf8b6e2a0626a07
Turn off in-project stubs for now, until this bug is fixed and the
fix is available in a mypy release we can consume:
https://github.com/python/mypy/issues/13214
Change-Id: Ica954cffd760d7a9d48ba38d01e0e01e84553ff8
Automated logging format checking was added to cinder in ussuri
by change I1dedc0b31f78f518c, but it wasn't added to os-brick and
a few infelicities have crept into the code. Enable the extension
and correct the issues.
To be consistent with cinder, we don't enable G200 ("Logging
statement uses exception in arguments").
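For context, the enabled checks flag interpolation done before the
logging call; an illustration (the portal value is made up):

    import logging

    LOG = logging.getLogger(__name__)
    portal = '10.0.0.1:4420'

    LOG.debug('Connected to %s' % portal)  # flagged: pre-formatted string
    LOG.debug('Connected to %s', portal)   # ok: logger does the interpolation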
Change-Id: I650b92bde5509b654d16301ffe50056e1e5ba579
There were some review comments for minor changes on patch
I6c2a952f7e286f3e3890e3f2fcb2fdd1063f0c17. Since the patch
is merged, this patch addresses those comments.
Change-Id: I7084da4abb0debedc35ff6248ff38d3bcc4000a4
Cinder microversion 3.69 adds an additional value to shared_targets
beyond true and false. Now null/None is also a valid value that can be
used to force locking.
So we now have 3 possible values:
- True ==> Lock if iSCSI initiator doesn't support manual scans
- False ==> Never lock.
- None ==> Always lock.
This patch updates the guard_connection context manager to support the
3 possible values.
With this Cinder can now force locking for NVMe-oF drivers that share
subsystems.
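The resulting decision table, paraphrased as code (this is not the
actual guard_connection implementation):

    def must_lock(shared_targets, initiator_supports_manual_scans):
        """Map the three shared_targets values to a locking decision."""
        if shared_targets is None:   # new value from microversion 3.69
            return True              # always lock
        if shared_targets is False:
            return False             # never lock
        # shared_targets is True
        return not initiator_supports_manual_scans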
Closes-Bug: #1961102
Depends-On: I8cda6d9830f39e27ac700b1d8796fe0489fd7c0a
Change-Id: Id872543ce08c934cefbbbdaff6ddc61e3828b1f1
Current code doesn't disconnect NVMe-oF subsystems when doing a
disconnect_volume or on connect_volume failure.
This is very problematic for systems that don't share subsystems for
multiple namespaces, because both the device (i.e., /dev/nvme0n1) and
the subsystem (i.e., /dev/nvme0) will stay forever (now that we connect
with the controller loss timeout set to infinite, before it was for 10
minutes) in the system (until manually removed) while the host keeps
trying to connect to the remote subsystem, but it won't be able to
connect because in this case drivers usually destroy both the namespace
and the subsystem simultaneously (so there's no AER message to indicate
the change in available namespaces within the subsystem).
We'll experience multiple issues with all these leftover devices, such
as an ever increasing number of kernel log messages with the connection
retries, possible exhaustion of number of connected NVMe subsystems
and/or files in /dev, and so on.
This patch makes sure the nvmeof connector disconnects a subsystem when
there is no longer a namespace present or when the only namespace
present is the one we are disconnecting. This is done both on the
disconnect_volume call as well as failures during connect_volume.
This is not a full solution to the problem of leaving leftover devices,
because for drivers that share the subsystem there are race conditions
between unexport/unmap of volumes on the cinder side and os-brick
disconnect_volume calls. To fully prevent this situation Cinder needs
to start reporting the shared_targets value for NVMe volumes (something
it's already doing for iSCSI).
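As a rough illustration of the cleanup idea (the sysfs layout check
and the helper below are assumptions, not the connector's code, which
runs these steps through privsep):

    import glob
    import os.path

    from oslo_concurrency import processutils as putils

    def disconnect_subsystem_if_last_ns(ctrl_name, ns_device):
        """Disconnect /dev/<ctrl_name> if ns_device is its last namespace."""
        namespaces = glob.glob('/sys/class/nvme/%s/nvme*n*' % ctrl_name)
        others = [ns for ns in namespaces
                  if os.path.basename(ns) != os.path.basename(ns_device)]
        if not others:
            putils.execute('nvme', 'disconnect', '-d', '/dev/%s' % ctrl_name,
                           run_as_root=True, root_helper='sudo')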
Partial-Bug: #1961102
Change-Id: Ia00be53420307d6ac1f100420d039da7b65dc349
We currently have 2 different connection information formats: The
initial one and the new one.
Within the new format we have 2 variants: replicated and not replicated.
Currently the nvmeof connector has multiple code paths and we end up
finding bugs that affect one path and not the other, and bugs that are
present in both may end up getting fixed in only one of them.
This patch consolidates both formats by converting the connection
information into a common internal representation, thus reducing the
number of code paths.
Thanks to this consolidation the Cinder drivers are less restricted on
how they can identify a volume in the connection information. They are
no longer forced to pass the NVMe UUID (in case they cannot get that
information or the backend doesn't support it) and they can provide
other information instead (nguid or namespace id).
This connection properties format consolidation is explained in the new
NVMeOFConnProps class docstring.
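As a rough illustration of the idea (the key names below are
indicative only; the authoritative description is that docstring):

    def normalize_conn_props(conn_props):
        """Toy sketch: collapse the formats into one list of targets."""
        if 'volume_replicas' in conn_props:   # new format, replicated
            return list(conn_props['volume_replicas'])
        if 'target_nqn' in conn_props:        # new format, single target
            return [conn_props]
        # Original format: accept uuid, nguid or nsid as the volume id.
        return [{'target_nqn': conn_props['nqn'],
                 'portals': [(conn_props['target_portal'],
                              conn_props['target_port'],
                              conn_props['transport_type'])],
                 'uuid': conn_props.get('volume_uuid'),
                 'nguid': conn_props.get('volume_nguid'),
                 'nsid': conn_props.get('volume_nsid')}]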
By consolidating the code paths a number of bugs get fixed
automatically, because they were broken in one path but not in the
other. As examples, the old code path didn't have rescans but the new
one did, and the old code path had device flush but the new one didn't.
Given the big refactoring needed to consolidate everything this patch
also accomplishes the following things:
- Uses Portal, Target, and NVMeOFConnProps classes instead of using the
harder to read and error prone dictionaries and tuples.
- Adds method annotations.
- Documents most methods (except the RAID ones, which I'm not familiar
with).
- Adds the connector to the docs.
- Makes method signatures conform with Cinder team standards: no more
static methods passing self and no more calling class methods using
the class name instead of self.
Closes-Bug: #1964590
Closes-Bug: #1964395
Closes-Bug: #1964388
Closes-Bug: #1964385
Closes-Bug: #1964380
Closes-Bug: #1964383
Closes-Bug: #1965954
Closes-Bug: #1903032
Related-Bug: #1961102
Change-Id: I6c2a952f7e286f3e3890e3f2fcb2fdd1063f0c17
The patch fixing bug #1861071 resolved the issue of extending LUKS v1
volumes when nova connects them via libvirt instead of through
os-brick, but the nova side still fails to extend LUKS v2 in-use
volumes when they don't go through libvirt.
The logs will show a very similar error, but the user won't know that
this has happened and Cinder will show the new size:
libvirt.libvirtError: internal error: unable to execute QEMU command
'block_resize': Cannot grow device files
There are 2 parts to this problem:
- The device mapper device is not automatically extended.
- Nova tries to use the encrypted block device size as the size of the
decrypted device.
This patch adds new functionality to the encryptors so that they can
extend decrypted volumes to match the size of the encrypted device.
The new encryptor method is called "extend_volume" and should be
called after the homonymous method on the connector. The value
returned by the encryptor's extend_volume method is the real size of
the decrypted volume (encrypted volume minus headers).
The patch only adds this functionality for LUKS and LUKS v2 volumes,
not for cryptsetup volumes.
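A sketch of how a consumer such as nova might chain the two calls (the
helper and the exact encryptor signature are assumptions):

    def extend_attached_volume(connector, encryptor, connection_properties,
                               context, **encryption_info):
        """Hypothetical consumer-side helper showing the call order."""
        new_size = connector.extend_volume(connection_properties)
        if encryptor is not None:
            # The encryptor grows the decrypted (dm-crypt) device and
            # returns its real usable size: encrypted size minus headers.
            new_size = encryptor.extend_volume(context, **encryption_info)
        return new_size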
Related-Bug: #1967157
Change-Id: I351f1a7769c9f915e4cd280f05a8b8b87f40df84
OS-Brick uses file locks to ensure single access to critical sections,
and uses the oslo concurrency "lock_path" configuration option to
determine where to create these locks.
This is fine when each host has a single service using os-brick, which
is the case for Compute nodes and for Controller nodes where Glance is
not using Cinder as its backend.
The problem happens when we have an HCI deployment where Cinder and Nova
are collocated on the same host or when Glance uses Cinder as its
backend and is running on the same host. In those scenarios os-brick
will create file locks in different paths for each service, which is not
the intended behavior.
A possible solution is to set the "lock_path" of all the services to
the same location, which is not great, not only because we'll have all
nova, cinder, glance, and os-brick locks mixed in a single location
(service prefixes help a bit here), but also because then Cinder will
have permissions on the Nova and Glance locks, which doesn't seem right.
This patch introduces a new mechanism in os-brick to have its own
"lock_path" configuration option in the "os_brick" group. It defaults
to the current behavior if not explicitly defined, so it will use oslo
concurrency's "lock_path".
To do this the patch introduces the oslo.config dependency and adds a
new "setup" method that sets the default of the os_brick "lock_path".
This new "setup" method is required because the order in which
configuration options for the different namespaces are imported cannot
be guaranteed, so the setup must be called *after* the service has
already imported all (or at least the ones os-brick cares about)
configuration option namespaces.
In other words, when os-brick files are loaded the value for oslo
concurrency's "lock_path" is not yet known, so it cannot be used as a
default, and the oslo_config substitution feature does not support
values from other namespaces, so we cannot default the "lock_path" from
the os_brick namespace to the value in "oslo_concurrency".
Since the value that is going to be used as the "lock_path" is not known
until the "setup" method is called, we cannot use the standard
"synchronized" method from "oslo_concurrency.lock_utils" and we need to
use our own.
In the current os-brick code, each connector that requires a lock is
importing and creating its own "synchronized" instance, but this patch
changes this behavior and creates a single instance, just like Cinder
does.
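A minimal sketch of the expected consumer-side usage, assuming an
entry point along the lines of os_brick.setup(conf):

    from oslo_config import cfg

    import os_brick

    CONF = cfg.CONF

    def main():
        # Parse configuration first so the lock_path options are known,
        # then let os-brick pick up its default.
        CONF([], project='cinder')
        os_brick.setup(CONF)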
This feature requires changes in multiple projects (os-brick, cinder,
nova, glance, glance_store, devstack) to be fully supported, but this
is the base of all this effort.
Change-Id: Ic52338278eb5bb3d90ce582fe6b23f37eb5568c4
The os-brick-src-devstack-plugin-ceph job's history indicates
that it is too unstable to vote, but at the same time, having
it as non-voting has burned us in merging rbd-related changes
that could be detected by that job. At the 2022-07-13 cinder
meeting we agreed to compromise by replacing that job with two
jobs, one that votes and runs only on changes that could affect
rbd, and the other that doesn't vote and runs on all other
appropriate changes.
As was the case with the os-brick-src-devstack-plugin-ceph
job, these run only in the check pipeline.
Change-Id: I0777de370ea53212feaff5a791f9b42652bdc7da