docs: Add context around asynchronous device initialization

CentOS Stream, and ultimately RHEL, have switched to asynchronous
device initialization. This impacts root device hints and their
usability on those systems, in large part because assumptions people
have traditionally relied upon no longer hold true on these newer
kernels.

This doc update attempts to provide the necessary context to guide
operators to the best possible outcome given these distribution changes.

Change-Id: I541086cfe235b10f1f1dba95fad95022a22f9ce7
Julia Kreger 2024-08-29 08:15:43 -07:00 committed by Doug Goldstein
parent 462f86889b
commit ac31720ac1
2 changed files with 80 additions and 1 deletions
doc/source

@ -1283,3 +1283,62 @@ related to image files.
Image safety checks are generally performed as the deployment process begins
and stages artifacts; however, a late stage check is performed when
needed by the ironic-python-agent.

Using /dev/sda does not write to the first disk
===============================================

Alternative name: I chose /dev/sda, but it showed up as /dev/sdb after rebooting.

Historically, Linux users have grown accustomed to /dev/sda being the first
device in a physical machine. That is, if you look at the device's by-path
information, its HCTL (Host:Channel:Target:LUN) address, or its LUN, the
first device's address ends with a zero. For example, a system with three
disks on two controllers, with a single disk on the second controller, would
look something like this:

* /dev/sda maps to a device with lun 0, HCTL 0:0:0:0
* /dev/sdb maps to a device with lun 1, HCTL 0:0:1:0
* /dev/sdc maps to a device with lun 2, HCTL 0:1:0:0

However, we grew accustomed to this pattern only because device discovery
was sequential *and* synchronous: the kernel stepped through all possible
devices one at a time. The pattern breaks when the kernel initializes
devices asynchronously, a mode some distributions have decided to adopt.

With asynchronous initialization, it becomes clear that /dev/sda has always
been assigned to the *first* device to initialize, *not* the first device in
the system, and we can end up with something like:

* /dev/sda maps to a device with lun 1, HCTL 0:0:1:0
* /dev/sdb maps to a device with lun 2, HCTL 0:1:0:0
* /dev/sdc maps to a device with lun 0, HCTL 0:0:0:0
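
To make the difference concrete, the following Python sketch models both
naming behaviors. It is illustrative only and does not reflect real kernel
internals: it assumes each disk can be described by its HCTL address plus a
hypothetical ``ready_at`` time at which its initialization completes, and
``Disk`` and ``assign_names`` are names invented for this example.

.. code-block:: python

   # Illustrative sketch only: it does not model real kernel internals.
   import string
   from dataclasses import dataclass


   @dataclass
   class Disk:
       hctl: str      # Host:Channel:Target:LUN address, e.g. "0:0:1:0"
       ready_at: int  # hypothetical time (ms) at which initialization completes


   def hctl_key(hctl):
       # Compare addresses numerically so "0:0:10:0" sorts after "0:0:2:0".
       return tuple(int(part) for part in hctl.split(":"))


   def assign_names(disks, asynchronous):
       """Return a mapping of /dev/sdX name to HCTL address."""
       if asynchronous:
           # Each name is claimed by whichever device finishes initializing first.
           order = sorted(disks, key=lambda disk: disk.ready_at)
       else:
           # Sequential, synchronous probing walks the devices in address order.
           order = sorted(disks, key=lambda disk: hctl_key(disk.hctl))
       # Simplification: only handles up to 26 disks (sda through sdz).
       return {"/dev/sd" + letter: disk.hctl
               for letter, disk in zip(string.ascii_lowercase, order)}


   disks = [
       Disk("0:0:0:0", ready_at=30),
       Disk("0:0:1:0", ready_at=10),
       Disk("0:1:0:0", ready_at=20),
   ]
   print(assign_names(disks, asynchronous=False))
   print(assign_names(disks, asynchronous=True))

With ``asynchronous=False`` the names follow HCTL order, matching the first
list above; with ``asynchronous=True`` they follow completion order and
reproduce the second list.
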

Most operators might then consider matching disk devices through the
/dev/disk/by-path structure, because it appears to imply a static order.
*However*, a kernel operating with asynchronous device initialization orders
*everything* the same way, including PCI devices, which means by-path names
can also be unreliable. Furthermore, if your server hardware uses multipath
IO, you should operate with multipath enabled so that the multipath device,
rather than an individual path, is used.

As a result, the best criteria to match on are:

* Serial Number
* World Wide Name
* Device HCTL, which *does* appear to be static in these cases, but is not
  applicable for hosts using multipathing. Depending on the hardware in use,
  it may ultimately not be static enough.
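
Where HCTL is usable, the following Python sketch shows how it can recover a
stable ordering regardless of which /dev/sdX name each disk received. It
assumes the name-to-HCTL mapping has already been collected on the host, for
example from the ``HCTL`` column of ``lsblk -o NAME,HCTL``; ``parse_hctl`` and
``names_in_hctl_order`` are hypothetical helpers, not part of ironic.

.. code-block:: python

   # Sketch: order devices by their HCTL address instead of their kernel name.
   from typing import NamedTuple


   class HCTL(NamedTuple):
       host: int     # SCSI host adapter number
       channel: int  # channel (bus) on that host adapter
       target: int   # target ID on that channel
       lun: int      # logical unit number within the target


   def parse_hctl(value):
       """Parse an "H:C:T:L" string; raises ValueError if malformed."""
       host, channel, target, lun = (int(part) for part in value.split(":"))
       return HCTL(host, channel, target, lun)


   def names_in_hctl_order(observed):
       """Return kernel device names sorted by their HCTL address."""
       return sorted(observed, key=lambda name: parse_hctl(observed[name]))


   # Hypothetical names and addresses as seen after an asynchronous boot.
   observed = {"sda": "0:0:1:0", "sdb": "0:1:0:0", "sdc": "0:0:0:0"}
   print(names_in_hctl_order(observed))

This does not help on multipath hosts, where several HCTL addresses refer to
the same underlying disk.
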

.. note::
   Some RAID controllers generate fake WWNs and serial numbers for the
   "disks" they supply. Some may also use the same WWN for *all* devices,
   which is a valid approach because the devices' Logical Unit Numbers or
   device identifier numbers still differ. Ultimately, this means labels on
   physical disks may not be matchable to volumes presented through a RAID
   controller, and operators will need to "know their hardware" to find the
   best path for their configuration and their hardware's behavior.
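
Because of this, it can help to check which identifiers are actually unique
on a particular host before choosing one as a hint. The Python sketch below
assumes a hypothetical inventory of devices, for example transcribed from
``lsblk -o NAME,SERIAL,WWN,HCTL``; the ``unique_fields`` helper and all values
are invented for illustration.

.. code-block:: python

   # Sketch: find identifier fields that are present and distinct on every device.
   CANDIDATE_FIELDS = ("serial", "wwn", "hctl")


   def unique_fields(devices):
       """Return candidate fields whose values are set and unique per device."""
       usable = []
       for field in CANDIDATE_FIELDS:
           values = [device.get(field) for device in devices]
           # Every device must report a value, and no two values may be equal.
           if all(values) and len(set(values)) == len(values):
               usable.append(field)
       return usable


   # Invented inventory: a RAID controller reporting one WWN for every volume.
   devices = [
       {"name": "sda", "serial": "VOLUME-SERIAL-1", "wwn": "0x5000000000000001",
        "hctl": "0:1:0:1"},
       {"name": "sdb", "serial": "VOLUME-SERIAL-2", "wwn": "0x5000000000000001",
        "hctl": "0:1:0:0"},
   ]
   print(unique_fields(devices))

Here ``wwn`` is excluded because both volumes report the same value, leaving
``serial`` and ``hctl`` as candidates.
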

.. note::
   CentOS Stream 9 appears to have a ``probe_type="sync"`` option which
   reverts this behavior. For more information, please see
   this `CentOS Stream 9 changeset <https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2819/diffs?commit_id=a93f405246083f0c2e81d0e6c37ba31c6c1b29c3>`_.

@ -9,6 +9,12 @@ which disk it should pick for the deployment. The list of supported hints is:
* model (STRING): device identifier
* vendor (STRING): device vendor
* serial (STRING): disk serial number

  .. note::
     Some RAID controllers generate serial numbers to represent the volumes
     they provide to the operating system, and these serial numbers do not
     match or align with the physical disks in the system.

* size (INT): size of the device in GiB

  .. note::
@ -18,7 +24,9 @@ which disk it should pick for the deployment. The list of supported hints is:
     should be the actual size. For example, for a 128 GiB disk ``local_gb``
     will be 127, but the size hint will be 128.

* wwn (STRING): unique storage identifier, typically mapping to a device.
  This can be a single device, a SAN storage controller,
  or a RAID controller.
* wwn_with_extension (STRING): unique storage identifier with the vendor extension appended
* wwn_vendor_extension (STRING): unique vendor storage identifier
* rotational (BOOLEAN): whether it's a rotational device or not. This
@ -28,6 +36,11 @@ which disk it should pick for the deployment. The list of supported hints is:
  e.g. '1:0:0:0'
* by_path (STRING): the alternate device name corresponding to a particular
  PCI or iSCSI path, e.g. /dev/disk/by-path/pci-0000:00

  .. note::
     Device identification by-path may not be reliable on Linux kernels that
     use asynchronous device initialization.

* name (STRING): the device name, e.g. /dev/md0
@ -39,6 +52,13 @@ which disk it should pick for the deployment. The list of supported hints is:
     devices like /dev/sda and /dev/sdb `switching around at boot time
     <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/persistent_naming.html>`_.

  .. warning::
     Furthermore, the recent move to asynchronous device initialization in
     some Linux distribution kernels means that the device name string is
     entirely unreliable when multiple devices are present in the host, as
     the name is claimed by the device that responds first, as opposed to
     the previous pattern, where names were assigned in order by a
     synchronous initialization process.

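
To illustrate the warning above, the following Python sketch compares a
``serial`` hint with a ``name`` hint across two boots in which the same disks
were enumerated in a different order. ``match_root_device`` is a hypothetical,
simplified matcher that uses exact comparisons only; it is not the matching
logic used by ironic-python-agent, and the serial numbers are invented.

.. code-block:: python

   # Simplified, hypothetical matcher: exact comparisons only. This is not
   # the matching logic used by ironic-python-agent.
   def match_root_device(devices, hints):
       """Return the single device whose fields equal every hint value."""
       matches = [device for device in devices
                  if all(str(device.get(key)) == str(value)
                         for key, value in hints.items())]
       if len(matches) != 1:
           raise LookupError(f"{len(matches)} devices match hints {hints}")
       return matches[0]


   # The same two invented disks, enumerated in a different order on two boots.
   boot_1 = [
       {"name": "/dev/sda", "serial": "SERIAL-OS", "hctl": "0:0:0:0"},
       {"name": "/dev/sdb", "serial": "SERIAL-DATA", "hctl": "0:0:1:0"},
   ]
   boot_2 = [
       {"name": "/dev/sda", "serial": "SERIAL-DATA", "hctl": "0:0:1:0"},
       {"name": "/dev/sdb", "serial": "SERIAL-OS", "hctl": "0:0:0:0"},
   ]

   for inventory in (boot_1, boot_2):
       by_serial = match_root_device(inventory, {"serial": "SERIAL-OS"})
       by_name = match_root_device(inventory, {"name": "/dev/sda"})
       # The serial hint always finds 0:0:0:0; the name hint does not.
       print(by_serial["hctl"], by_name["hctl"])

On the second boot the ``name`` hint selects the data disk, while the
``serial`` hint still selects the intended disk.
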
To associate one or more hints with a node, update the node's properties
with a ``root_device`` key, for example::