330 Commits

Author SHA1 Message Date
Jay Faulkner
60132c96d1 Fix issues caused/found by new codespell
Fixed spelling where appropriate, added ignore where appropriate

Change-Id: I07f203d311484321e0dfcbdf02083784693f4b96
2024-05-23 15:49:48 -07:00
Jay Faulkner
c39517b044 Call evaluate_hardware_support exactly once per hwm
Fixes an issue where we could call evaluate_hardware_support multiple
times each run. Now, instead, we cache the values and use the cache
where needed.

Adds unit test coverage for get_managers and the new method.
Fixes issue where we were caching hardware managers between unit tests.

Also includes fixes for codespell CI:
- skip build files in repo
- fix spelling issues introduced to repo

Closes-bug: 2066308
Change-Id: Iebc5b6d2440bfc9f23daa322493379bbe69e84d0
2024-05-22 08:46:21 -07:00
Zuul
28053644cd Merge "add mixed matching of root device hints" 2024-04-27 17:26:25 +00:00
Adam Rozman
84a1195d5a add mixed matching of root device hints
This commit introduces the following changes:
  - New optional `all_serial_and_wwn` argument for the block device
    listing logic. The new argument makes it possible to
    collect wwn and serial number information from both
    lsblk and udevadm at the same time
  - Both the short and the long serials are collected
    from udeavadm without prioritization when the new argument
    has teh value True
  - The new feature is automatically enabled during block device listing
    as part of the root disk selecetion
  - New options are added to the lsblk command when used in the block
    device discovery process, previously lsblk was not looking
    for wwn numbers and now it does

Closes-Bug: #2061437
Change-Id: I438a686d948cd929311e2f418bb02fb771805148
Signed-off-by: Adam Rozman <adam.rozman@est.tech>
2024-04-15 15:53:50 +03:00
Steve Baker
215fecd447 Step to clean UEFI NVRAM entries
Adds a deploy step ``clean_uefi_nvram`` to remove unrequired extra UEFI
NVRAM boot entries. By default any entry matching ``HD`` as the root
device, or with a ``shim`` or ``grub`` efi file in the path will be
deleted, ensuring that disk based boot entries are removed before the
new entry is created for the written image. The ``match_patterns``
parameter allows a list of regular expressions to be passed, where a
case insensitive search in the device path will result in that entry
being deleted.

Closes-Bug: #2041901
Change-Id: I3559dc800fcdfb0322286eba30ce47041419b0c6
2024-04-11 01:17:23 +12:00
Zuul
cdd0a83448 Merge "Import disk_{utils,partitioner} from ironic-lib" 2024-04-03 01:04:10 +00:00
Zuul
b6075156b3 Merge "USB device discovery" 2024-03-28 21:22:53 +00:00
Dmitry Tantsur
f824930bbd
Import disk_{utils,partitioner} from ironic-lib
With the iscsi deploy long gone, these modules are only used in IPA and
in fact represent a large part of its critical logic. Having them
separately sometimes makes fixing issues tricky if an interface of
a function needs changing.

This change imports the code mostly as it is, just removing run_as_root and
a deprecated function, as well as moving configuration options to config.py.

Also migrates one relevant function from ironic_lib.utils.

Change-Id: If8fae8210d85c61abb85c388b300e40a75d0531c
2024-03-15 18:45:04 +01:00
Damien Rannou
3fd68c0848 USB device discovery
The idea is to retreive USB devices informations via 'lshw' and
return the list to ironic in order to be able to create introspection
rules based on USB devices.

Change-Id: I39d60cb467614fca7a7f701dbe576154213580a5
2024-02-19 14:49:52 +01:00
Zuul
1e107bd625 Merge "Add support for reporting CPU socket number" 2024-01-22 11:52:06 +00:00
Kaifeng Wang
9cafe76225 Add support for reporting CPU socket number
IPA reports a few cpu fields including cores, arch, flags etc.
There is a need that user wants to utilize the physical number in
a baremetal since cores are just a logical representation of the
compute resource.
The socket number is more suitable for the quota control in some
use cases.

Change-Id: I94be86d6b12a3a7e7ca1041d948427a073412a31
2024-01-19 21:24:37 +00:00
Jay Faulkner
36e5993a04 [codespell] Fix spelling issues in IPA
This fixes several spelling issues identified by codepsell. In some
cases, I may have manually modified a line to make the output more clear
or to correct grammatical issues which were obvious in the codespell
output.

Later changes in this chain will provide the codespell config used to
generate this, as well as adding this commit's SHA, once landed, to a
.git-blame-ignore-revs file to ensure it will not pollute git historys
for modern clients.

Related-Bug: 2047654
Change-Id: I240cf8484865c9b748ceb51f3c7b9fd973cb5ada
2023-12-28 10:54:46 -08:00
Zuul
62041d6d9e Merge "Fix referencing to the raid_device var which is not set" 2023-12-12 17:01:32 +00:00
Maryna Savchenko
f80330839d Fix referencing to the raid_device var which is not set
Change-Id: I11180e5d61d893a78583ace555f6e90ba8845950
2023-11-29 12:40:29 +01:00
Zuul
7a4114512c Merge "Handle different device outputs for multipath" 2023-11-22 21:36:40 +00:00
Iury Gregory Melo Ferreira
0a29206b8d Handle different device outputs for multipath
In some cases the output of the multipath can differ
and we would return a wrong parent device.

Closes-Bug: 2043992
Change-Id: I848d7df798cc736bd5a55eed8fa46110caea1dc3
2023-11-20 22:51:41 -03:00
Adam Rozman
7a52314695 fix multipathd error handling release notes
This commit:
  - fixes some "multipathd error handling improvement"
    release notes
  - fixes a related comment in the code

Related launchpad issue https://bugs.launchpad.net/ironic-python-agent/+bug/2031092

Change-Id: Ie3ba0601fa117b053cb8db6284e47249ca9c9134
Signed-off-by: Adam Rozman <adam.rozman@est.tech>
2023-11-10 09:54:20 +02:00
Zuul
845df338f8 Merge "improve multipathd error handling" 2023-11-09 17:31:32 +00:00
Adam Rozman
13537db293 improve multipathd error handling
This commit:
  - Adds the ability to ignore inconsequential OS error caused
    by starting the multipathd service when an instance of the
    service is already running.

Related launchpad issue https://bugs.launchpad.net/ironic-python-agent/+bug/2031092

Change-Id: Iebf486915bfdc2546451e6b38a450b4c241e43a8
2023-10-23 16:33:03 +03:00
Boushra Bettir
dbf3e5408d Replace shlex module with helper function
Used helper function, `parse_device_tags`
from ironic_lib instead of the
shlex module for their identical
functionality. Updated
mock_execute.side_effect for lsblk
compatibility in utils.execute.

Closes-Bug: #2037572
Change-Id: I6600e054f9644c67ab003f0e0f6c380b5c217223
2023-10-12 13:34:32 -07:00
Julia Kreger
eb95273ffb Add get_service_steps logic to the agent
Initial code patches for service steps have merged in
ironic, and it is now time to add support into the
agent which allows service steps to be raised to
the service.

Updates the default hardware manager version to 1.2,
which has *rarely* been incremented due to oversight.

Change-Id: Iabd2c6c551389ec3c24e94b71245b1250345f7a7
2023-08-31 06:22:22 -07:00
Dmitry Tantsur
9ed232e77e Add network interface speed to the inventory
This is another fact that Metal3's baremetal-operator is currently
consuming from extra-hardware.

Change-Id: I2ec9d5e9369f5508e7583a4e13c2083f5c8b28ba
2023-05-03 12:20:35 +02:00
Dmitry Tantsur
3e05a03f7c Deprecate LLDP in inventory in favour of a new collector
Binary LLDP data is bloating inventory causing us to disable its collection
by default. For other similar low-level information, such as PCI devices
or DMI data, we already use inspection collectors instead. Now that the
inventory format is shared with out-of-band inspection, having LLDP
there makes even less sense.

This change adds a new collector ``lldp`` to replace the now-deprecated
inventory field.

Change-Id: I56be06a7d1db28407e1128c198c12bea0809d3a3
2023-04-26 19:33:51 +00:00
Dmitry Tantsur
0304c73c0e Report system firmware information in the inventory
Change-Id: I5b6ceb9cdcf4baa97a6f0482d1030d14f3f2ecff
2023-03-31 14:28:32 +02:00
Dmitry Tantsur
c26f498f49 Make logs collection a hardware manager call
This allows hardware managers to collect additional logs.

Change-Id: If082b921d4bf71c4cc41a5a72db6995b08637374
2023-01-25 15:17:06 +01:00
Rozzii
830fdfa4c6
prioritize lsblk as a source of device serials
The current way of prioritizing ID/DM_SERIAL_SHORT or ID/DM_SERIAL works
in most cases but the udev values seem to be unreliable.

Based on experience it looks like lsblk might be a better
source of truth than udev in regerards to serial number
information. This commit makes lsblk the default provider
of block device serial number information.

Story: 2010263
Task: 46161

Change-Id: I16039b46676f1a61b32ee7ca7e6d526e65829113
2022-10-10 19:31:47 +03:00
Jakub Jelinek
a99bf274e4 SoftwareRAID: Enable skipping RAIDS
Extend the ability to skip disks to RAID devices
This allows users to specify the volume name of
a logical device in the skip list which is then not cleaned
or created again during the create/apply configuration phase
The volume name can be specified in target raid config provided
the change https://review.opendev.org/c/openstack/ironic-python-agent/+/853182/
passes

Story: 2010233

Change-Id: Ib9290a97519bc48e585e1bafb0b60cc14e621e0f
2022-09-05 20:43:51 +00:00
Zuul
ed6a8d28b7 Merge "Create RAIDs with volume name" 2022-09-02 19:26:57 +00:00
Jakub Jelinek
daa20b01d1 Create RAIDs with volume name
Use 'volume_name' field from 'target_raid_config' to create logical
disks if it is present
Do not allow two logical disks to have the same volume name

Change-Id: If3e4e9f8698ec3e0cb49717f8ed2087d2ba03f2c
2022-09-02 14:51:42 +00:00
Julia Kreger
f3e3de8097 Fix software raid output poisoning
In the event a device name is set to contain a raid device path,
it is possible for the Name and Events field values of mdadm's
detailed output to contain text which inadvertently gets captured and
mapped as component data for the "holder" devices of the RAID set.

This would cause invalid values to get passed to UEFI methods
which would cause a deployment to fail under these circumstances.

We now ignore the Name and Events fields in mdadm output.

Change-Id: If721dfe1caa5915326482969e55fbf4697538231
2022-08-24 10:15:27 -07:00
Zuul
f89d54f4b8 Merge "Improve function list_block_devices_check_skip_list" 2022-08-17 12:47:45 +00:00
Jakub Jelinek
1ac61e1dbd Improve function list_block_devices_check_skip_list
Fix minor issues suggested by dtantsur
Add an example of skip list specification to the documentation

A follow-up patch to I3bdad3cca8acb3e0a69ebb218216e8c8419e9d65

Change-Id: Ic94a33b7bc0572a1cc8f92b330474ec63a173e81
2022-08-16 15:17:15 +00:00
Zuul
3a4baa637f Merge "Enable skipping disks for cleaning" 2022-08-16 11:49:48 +00:00
Jakub Jelinek
0212337bd5 Enable skipping disks for cleaning
Introduce a field skip_block_devices in properties - this is a list of dictionaries
Create a helper function list_block_devices_check_skip_list
Update tests of erase_devices_express to use node when calling _list_erasable_devices
Add tests covering various options of the skip list definition
Use the helper function in get_os_install_device when node is cached

Story: 2009914

Change-Id: I3bdad3cca8acb3e0a69ebb218216e8c8419e9d65
2022-08-11 09:30:00 +00:00
Zuul
eb2215090a Merge "Use lsblk json output for safety_check_block_device" 2022-08-03 23:47:17 +00:00
Jakub Jelinek
e196fdfb62 Remove unused lines of code
The 5 lines of code were extracted from erase_devices_metadata to _list_erasable_devices, but now are duplicated in both functions.
The variable block_devices is not used in erase_devices_metadata.

Change-Id: I89f56c69d90fb0eb61907d6667266fbd57d333af
2022-07-20 10:00:53 +00:00
Riccardo Pittau
b5fac66bc3 Use lsblk json output for safety_check_block_device
Change-Id: Ibfc2e203287d92e66567c33dc48f59392852b88e
2022-07-20 11:56:27 +02:00
Julia Kreger
beb7484858 Guard shared device/cluster filesystems
Certain filesystems are sometimes used in specialty computing
environments where a shared storage infrastructure or fabric exists.
These filesystems allow for multi-host shared concurrent read/write
access to the underlying block device by *not* locking the entire
device for exclusive use. Generally ranges of the disk are reserved
for each interacting node to write to, and locking schemes are used
to prevent collissions.

These filesystems are common for use cases where high availability
is required or ability for individual computers to collaborate on a
given workload is critical, such as a group of hypervisors supporting
virtual machines because it can allow for nearly seamless transfer
of workload from one machine to another.

Similar technologies are also used for cluster quorum and cluster
durable state sharing, however that is not specifically considered
in scope.

Where things get difficult is becuase the entire device is not
exclusively locked with the storage fabrics, and in some cases locking
is handled by a Distributed Lock Manager on the network, or via special
sector interactions amongst the cluster members which understand
and support the filesystem.

As a reult of this IO/Interaction model, an Ironic-Python-Agent
performing cleaning can effectively destroy the cluster just by
attempting to clean storage which it percieves as attached locally.
This is not IPA's fault, often this case occurs when a Storage
Administrator forgot to update LUN masking or volume settings on
a SAN as it relates to an individual host in the overall
computing environment. The net result of one node cleaning the
shared volume may include restoration from snapshot, backup
storage, or may ultimately cause permenant data loss, depending
on the environment and the usage of that environment.

Included in this patch:
- IBM GPFS - Can be used on a shared block device... apparently according
             to IBM's documentation. The standard use of GPFS is more Ceph
             like in design... however GPFS is also a specially licensed
             commercial offering, so it is a red flag if this is
             encountered, and should be investigated by the environment's
             systems operator.
- Red Hat GFS2 - Is used with shared common block devices in clusters.
- VMware VMFS - Is used with shared SAN block devices, as well as
                local block devices. With shared block devices,
                ranges of the disk are locked instead of the whole
                disk, and the ranges are mapped to virtual machine
                disk interfaces.
                It is unknown, due to lack of information, if this
                will detect and prevent erasure of VMFS logical
                extent volumes.

Co-Authored-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ic8cade008577516e696893fdbdabf70999c06a5b
Story: 2009978
Task: 44985
2022-07-19 13:24:03 -07:00
Zuul
ccf4ee31cf Merge "Gather details about bond interfaces if present" 2022-07-02 02:56:46 +00:00
Zuul
a9de7f80cc Merge "Use json for lsblk output" 2022-06-30 23:38:15 +00:00
Zuul
312e1527ab Merge "Warn when smartctl not found" 2022-06-27 12:10:31 +00:00
Mark Goddard
b68fa6b2e1 Warn when smartctl not found
Currently, if smartctl is not found by IPA, it will silently skip ATA
secure erase and proceed to shred (if enabled). This is supposedly for
backwards compatibility, but is quite hard to diagnose.

This change adds a warning message to make it more obvious what is
happening.

TrivialFix

Change-Id: I03a381e99de79f201ec7e9a388777c3d48457e93
2022-06-24 16:58:37 +01:00
Derek Higgins
7e4fe3bf6a Gather details about bond interfaces if present
If present gather information about bonded interfaces.

Story: #2010093
Task: #45637

Change-Id: I394187640b4788ebec21c3391d33ed728fb72ffa
2022-06-21 09:45:03 +01:00
Dmitry Tantsur
69e2254503 Fix discovering WWN/serial for devicemapper devices
UDev prefix is DM_ not ID_ for them. On top of that, they don't have
short serials (or at least don't always have).

Change-Id: I5b6075fbff72201a2fd620f789978acceafc417b
2022-06-14 19:06:53 +02:00
Riccardo Pittau
09ea41c83d Use json for lsblk output
The lsblk output is available in json format since version 2.27 of
util-linux [1]

https: //mirrors.edge.kernel.org/pub/linux/utils/util-linux/v2.27/v2.27-ReleaseNotes

Change-Id: I0c5812736b7a320cc4ecc333f80db70eb78cc76d
2022-06-14 17:50:05 +02:00
Julia Kreger
014d37743a Multipath Hardware path handling
Removes multipath base devices from consideration by
default, and instead allows the device-mapper device
managed by multipath to be picked up and utilized
instead.

In effect, allowing us to ignore standby paths *and*
leverage multiple concurrent IO paths if so offered
via ALUA.

In reality, anyone who has previously built IPA with
multipath tooling might not have encountered issues
previously because they used Active/Active SAN storage
environments. They would have worked because the IO lock
would have been exchanged between controllers and paths.
However, Active/Passive environments will block passive
paths from access, ultimately preventing new locks from
being established without proper negotiation. Ultimately
requiring multipathing *and* the agent to be smart enough
to know to disqualify underlying paths to backend storage
volumes.

An additional benefit of this is active/active MPIO devices
will, as long as ``multipath`` is present inside the ramdisk,
no longer possibly result in duplicate IO wipes occuring
accross numerous devices.

Story: #2010003
Task: #45108
Resolves: rhbz#2076622
Resolves: rhbz#2070519
Change-Id: I0fd6356f036d5ff17510fb838eaf418164cdfc92
2022-05-18 20:26:39 -03:00
Zuul
f08f70134d Merge "Improve efficiency of storage cleaning in mixed media envs" 2022-03-15 18:05:29 +00:00
Jacob Anders
c5f7f18bcb Improve efficiency of storage cleaning in mixed media envs
https://storyboard.openstack.org/#!/story/2008290 added support
for NVMe-native storage cleaning, greatly improving storage clean
times on NVMe-based nodes as well as reducing device wear.

This is a follow up change which aims to make further improvements
to cleaning efficiency in mixed NVMe-HDD environments. This is
achieved by combining NVMe-native cleaning methods on NVMe devices
with traditional metadata clean on non-NVMe devices.

Story: 2009264
Task: 43498
Change-Id: I445d8f4aaa6cd191d2e540032aed3148fdbff341
2022-03-15 19:00:25 +10:00
Julia Kreger
99ca1086db Create fstab entry with appropriate label
Depending on the how the stars align with partition images
being written to a remote system, we *may* end up with
*either* a Partition UUID value, or a Partition's UUID value.

Which are distinctly different.

This is becasue the value, when collected as a result of writing
an image to disk *falls* back and passes the value to enable
partition discovery and matching.

Later on, when we realized we ought to create an fstab entry,
we blindly re-used the value thinking it was, indeed, always
a Partition's UUID and not the Partition UUID. Obviously,
the label type is quite explicit, either UUID or PARTUUID
respectively, when initial ramdisk utilities such as dracut
are searching and mounting filesystems.

Adds capability to identify the correct label to utilize
based upon the current state of the block devices on disk.

Granted, we are likely only exposed to this because of IO
race conditions under high concurrecy load operations.
Normally this would only be seen on test VMs, but
systems being backed by a Storage Area Network *can*
exibit the same IO race conditions as virtual machines.

Change-Id: I953c936cbf8fad889108cbf4e50b1a15f511b38c
Resolves: rhbz#2058717
Story: #2009881
Task: 44623
2022-03-10 07:04:01 -08:00
Arne Wiebalck
62c5674a60 SoftwareRAID: Use efibootmgr (and drop grub2-install)
Move the software RAID code path from grub2-install to
efibootmgr:

- remove the UEFI efibootmgr exception for software RAID
- create and populate the ESPs on the holder disks
- update the NVRAM with all ESPs (the component devices
  of the ESP mirror, use unique labels to avoid unintentional
  deduplication of entries in the NVRAM)

Story: #2009794

Change-Id: I7ed34e595215194a589c2f1cd0b39ff0336da8f1
2022-01-26 14:43:40 +01:00