From ac31720ac1704df3aa0481aee72cdcf135668cec Mon Sep 17 00:00:00 2001
From: Julia Kreger <juliaashleykreger@gmail.com>
Date: Thu, 29 Aug 2024 08:15:43 -0700
Subject: [PATCH] docs: Add context around asynchronous device initialization

CentOS Stream, and ultimately RHEL, have switched to asynchronous
device initialization. This impacts root device hints and their
usability on those systems, in large part because assumptions that
people have traditionally relied on no longer hold true on those
newer kernels.

This doc update attempts to provide the necessary context to guide
operators to the best possible outcome given the distribution changes.

Change-Id: I541086cfe235b10f1f1dba95fad95022a22f9ce7
---
 doc/source/admin/troubleshooting.rst          | 59 +++++++++++++++++++
 .../install/include/root-device-hints.inc     | 22 ++++++-
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/doc/source/admin/troubleshooting.rst b/doc/source/admin/troubleshooting.rst
index e7e07d1c9b..86859455ab 100644
--- a/doc/source/admin/troubleshooting.rst
+++ b/doc/source/admin/troubleshooting.rst
@@ -1283,3 +1283,62 @@ related to image files.
   Image safety checks are generally performed as the deployment process begins
   and stages artifacts, however a late stage check is performed when
   needed by the ironic-python-agent.
+
+Using /dev/sda does not write to the first disk
+===============================================
+
+Alternative name: I chose /dev/sda but I found it as /dev/sdb after rebooting.
+
+Historically, Linux users have grown accustomed to /dev/sda being the first
+device in a physical machine. That is, if you look at the device's by-path
+information, its HCTL, or its LUN, the identifier ends with a zero.
+
+For example, a machine with three disks and two controllers, with a single
+disk on the second controller, would look something like this:
+
+* /dev/sda maps to a device with lun 0, HCTL 0:0:0:0
+* /dev/sdb maps to a device with lun 1, HCTL 0:0:1:0
+* /dev/sdc maps to a device with lun 2, HCTL 0:1:0:0
+
+However, this was a pattern we grew accustomed to because the order of device
+discovery was sequential *and* synchronous. In other words, the kernel stepped
+through all possible devices one at a time. This breaks when the kernel
+operates in a mode where device initialization is asynchronous, as some
+distributions have decided to adopt.
+
+The move to asynchronous initialization exposes that /dev/sda has always been
+the *first* device to initialize, *not* the first device in the system.
+As a result, we can end up with something looking like:
+
+* /dev/sda maps to a device with lun 1, HCTL 0:0:1:0
+* /dev/sdb maps to a device with lun 2, HCTL 0:1:0:0
+* /dev/sdc maps to a device with lun 0, HCTL 0:0:0:0
+
+Most operators might then consider referencing the /dev/disk/by-path
+structure to match disk devices, because that seems to imply a static
+order. *However*, a kernel operating with asynchronous device
+initialization orders *everything*, including PCI devices, the same way,
+meaning by-path can also be unreliable. Furthermore, if your server hardware
+uses multipath IO, you should operate with multipath enabled so that the
+multipath device is used.
+
+The net result is that the best criteria to match on are:
+
+* Serial Number
+* World Wide Name
+* Device HCTL, which *does* appear to be static in these cases, but is not
+  applicable for hosts using multipathing. Depending on the hardware in use,
+  it may ultimately not be static enough.
+
+.. NOTE:: Some RAID controllers will generate fake WWNs and serial numbers
+   for "disks" being supplied by the RAID controller. Some may also use the
+   same WWN for *all* devices, which is a valid approach as the device Logical
+   Unit Numbers or device identifier numbers would be different. Ultimately,
+   this means labels on physical disks may not match the volumes presented
+   through a RAID controller, and operators will need to simply "know their
+   hardware" to navigate the best path depending on the configuration and
+   behavior of their hardware.
+
+.. NOTE:: CentOS Stream 9 appears to have a probe_type="sync" option which
+   reverts this behavior. For more information, please see
+   this `CentOS Stream 9 changeset <https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2819/diffs?commit_id=a93f405246083f0c2e81d0e6c37ba31c6c1b29c3>`_.
diff --git a/doc/source/install/include/root-device-hints.inc b/doc/source/install/include/root-device-hints.inc
index e31bd225b4..89c2e770ec 100644
--- a/doc/source/install/include/root-device-hints.inc
+++ b/doc/source/install/include/root-device-hints.inc
@@ -9,6 +9,12 @@ which disk it should pick for the deployment. The list of supported hints is:
 * model (STRING): device identifier
 * vendor (STRING): device vendor
 * serial (STRING): disk serial number
+
+  .. note::
+    Some RAID controllers will generate serial numbers to represent volumes
+    provided to the operating system which do not match or align to physical
+    disks in a system.
+
 * size (INT): size of the device in GiB
 
   .. note::
@@ -18,7 +24,9 @@ which disk it should pick for the deployment. The list of supported hints is:
     should be the actual size. For example, for a 128 GiB disk ``local_gb``
     will be 127, but size hint will be 128.
 
-* wwn (STRING): unique storage identifier
+* wwn (STRING): unique storage identifier, typically mapping to a device.
+  This can be a single device, a SAN storage controller,
+  or a RAID controller.
 * wwn_with_extension (STRING): unique storage identifier with the vendor extension appended
 * wwn_vendor_extension (STRING): unique vendor storage identifier
 * rotational (BOOLEAN): whether it's a rotational device or not. This
@@ -28,6 +36,11 @@ which disk it should pick for the deployment. The list of supported hints is:
   e.g '1:0:0:0'
 * by_path (STRING): the alternate device name corresponding to a particular
   PCI or iSCSI path, e.g /dev/disk/by-path/pci-0000:00
+
+  .. note::
+    Device identification by-path may not be reliable on Linux kernels using
+    asynchronous device initialization.
+
 * name (STRING): the device name, e.g /dev/md0
 
 
@@ -39,6 +52,13 @@ which disk it should pick for the deployment. The list of supported hints is:
      devices like /dev/sda and /dev/sdb `switching around at boot time
      <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/persistent_naming.html>`_.
 
+  .. warning::
+     Furthermore, the recent move to asynchronous device initialization among
+     some Linux distribution kernels means that the actual device name string
+     is entirely unreliable when multiple devices are present in the host, as
+     the device name is claimed by the device which responded first, as opposed
+     to the previous pattern where it was the first initialized device in
+     a synchronous process.
 
 To associate one or more hints with a node, update the node's properties
 with a ``root_device`` key, for example::