Commit Graph

459 Commits (088610844a70aa6cba06c899eaeeeb835fdf6fa7)

Author SHA1 Message Date
Zuul 088610844a Merge "update NVIDIA NIC firmware images and settings by ironic-python-agent" 2023-01-31 19:35:53 +00:00
Zuul c12135911a Merge "Make logs collection a hardware manager call" 2023-01-26 16:26:42 +00:00
Jay Faulkner f8fc7e52f3 Make reno ignore bugfix eol tags
Reno was assuming all tags ending in -eol represented an old, EOL'd
stable branch. That's not true for Ironic projects which have bugfix
branches. Update the regexp to exclude those branches.

Co-Authored-By: Adam McArthur <>
Change-Id: I265969ab40a98a02962c2fc8460b6519ab576f99
2023-01-25 13:18:01 -08:00
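A minimal sketch of the kind of pattern change described above. The regular expression and the tag names below are assumptions for illustration, not the exact value used in the reno configuration: the idea is only that a tag ending in -eol is treated as a closed stable branch unless it belongs to a bugfix branch.

```python
import re

# Hypothetical pattern: match "<series>-eol" unless the series starts
# with "bugfix". The real reno setting may be written differently.
CLOSED_BRANCH_TAG_RE = re.compile(r"^(?!bugfix)(.+)-eol$")


def closed_series(tag):
    """Return the series name for an EOL tag, or None if the tag is ignored."""
    match = CLOSED_BRANCH_TAG_RE.match(tag)
    return match.group(1) if match else None
```

With this pattern a tag such as queens-eol is recognized, while a tag of the assumed form bugfix-21.2-eol is skipped.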
Dmitry Tantsur c26f498f49 Make logs collection a hardware manager call
This allows hardware managers to collect additional logs.

Change-Id: If082b921d4bf71c4cc41a5a72db6995b08637374
2023-01-25 15:17:06 +01:00
waleed mousa 2c7f95e3ac update NVIDIA NIC firmware images and settings by ironic-python-agent
Add "update_nvidia_nic_firmware_image" and "update_nvidia_nic_firmware_settings"
clean steps to MellanoxDeviceHardwareManager.

By adding those two steps, we can update the firmware image and
firmware settings of NVIDIA NICs by ironic-python-agent using
the manual cleaning command.
The clean steps require the mstflint package to be installed on the image.
The "update_nvidia_nic_firmware_image" clean step requires an
"images" parameter to be passed to the clean command.
The "images" parameter is a JSON blob containing
a list of images, where each image contains a map of:
  * url: to firmware image (file://, http://)
  * checksum: checksum of the provided image
  * checksumType: md5/sha512/sha256
  * componentFlavor: PSID of the nic
  * version: version of the FW

The "update_nvidia_nic_firmware_settings" clean step requires a
"settings" parameter to be passed to the clean command.
The "settings" parameter is a JSON blob containing
a list of settings, where each setting contains a map of:
  * deviceID: device ID
  * globalConfig: global config
  * function0Config: function 0 config
  * function1Config: function 1 config

Change-Id: Icfaffd7c58c3c73c3fa28cfc2a6c954d2c93c16e
Story: 2010228
Task: 46016
2023-01-11 14:00:07 +00:00
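The two parameters described above could be assembled as follows. This is a sketch only: the step names and field names come from the commit message, while the "interface" value, the placeholder strings and the use of empty maps for the config fields are assumptions.

```python
import json

# Placeholder values throughout; only the key names are taken from the
# commit message. "interface": "deploy" is an assumption about how the
# steps are addressed in a manual cleaning request.
clean_steps = [
    {
        "interface": "deploy",
        "step": "update_nvidia_nic_firmware_image",
        "args": {
            "images": [
                {
                    "url": "file:///firmware/example.bin",
                    "checksum": "<checksum of the image>",
                    "checksumType": "sha256",
                    "componentFlavor": "<PSID of the NIC>",
                    "version": "<firmware version>",
                }
            ]
        },
    },
    {
        "interface": "deploy",
        "step": "update_nvidia_nic_firmware_settings",
        "args": {
            "settings": [
                {
                    "deviceID": "<device ID>",
                    "globalConfig": {},
                    "function0Config": {},
                    "function1Config": {},
                }
            ]
        },
    },
]

print(json.dumps(clean_steps, indent=2))
```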
Zuul 929ae3dd28 Merge "prioritize lsblk as a source of device serials" 2022-10-14 16:02:51 +00:00
Zuul 29c03cadc3 Merge "Update release versions for yoga" 2022-10-14 01:13:07 +00:00
Rozzii 830fdfa4c6 prioritize lsblk as a source of device serials
The current way of prioritizing ID/DM_SERIAL_SHORT or ID/DM_SERIAL works
in most cases but the udev values seem to be unreliable.

Based on experience it looks like lsblk might be a better
source of truth than udev in regard to serial number
information. This commit makes lsblk the default provider
of block device serial number information.

Story: 2010263
Task: 46161

Change-Id: I16039b46676f1a61b32ee7ca7e6d526e65829113
2022-10-10 19:31:47 +03:00
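The new priority order can be sketched as below. The function name and the dictionary shapes are hypothetical; only the idea (lsblk first, udev properties as a fallback) comes from the commit message.

```python
# Udev keys tried as a fallback, in order. The ID_/DM_ names follow the
# properties mentioned in the commit message.
UDEV_SERIAL_KEYS = (
    "ID_SERIAL_SHORT",
    "DM_SERIAL_SHORT",
    "ID_SERIAL",
    "DM_SERIAL",
)


def pick_serial(lsblk_info, udev_info):
    """Prefer the serial reported by lsblk, fall back to udev properties."""
    serial = lsblk_info.get("serial")
    if serial:
        return serial
    for key in UDEV_SERIAL_KEYS:
        if udev_info.get(key):
            return udev_info[key]
    return None
```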
OpenStack Release Bot 1132128252 Update master for stable/zed
Add file to the reno documentation build to show release notes for

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on

Sem-Ver: feature
Change-Id: Iff9b5efee0b436357d5cae3909a89cd09d5e6070
2022-09-23 08:41:52 +00:00
Riccardo Pittau cdd6b4f5ac Update release versions for yoga
Change-Id: I06d14bc499a7c081fe73b68de6c49e2f1bc51dc5
2022-09-23 09:05:53 +02:00
Jakub Jelinek a99bf274e4 SoftwareRAID: Enable skipping RAIDS
Extend the ability to skip disks to RAID devices.
This allows users to specify the volume name of
a logical device in the skip list, which is then not cleaned
or created again during the create/apply configuration phase.
The volume name can be specified in target raid config provided
the change

Story: 2010233

Change-Id: Ib9290a97519bc48e585e1bafb0b60cc14e621e0f
2022-09-05 20:43:51 +00:00
Zuul ed6a8d28b7 Merge "Create RAIDs with volume name" 2022-09-02 19:26:57 +00:00
Jakub Jelinek daa20b01d1 Create RAIDs with volume name
Use 'volume_name' field from 'target_raid_config' to create logical
disks if it is present
Do not allow two logical disks to have the same volume name

Change-Id: If3e4e9f8698ec3e0cb49717f8ed2087d2ba03f2c
2022-09-02 14:51:42 +00:00
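The uniqueness rule above can be checked with a few lines. This is a sketch under the assumption that 'target_raid_config' carries a 'logical_disks' list; the helper name is hypothetical.

```python
from collections import Counter


def duplicate_volume_names(target_raid_config):
    """Return the volume names used by more than one logical disk."""
    names = [
        disk["volume_name"]
        for disk in target_raid_config.get("logical_disks", [])
        if disk.get("volume_name")
    ]
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

A caller could reject the configuration whenever the returned list is not empty. Logical disks without a 'volume_name' are ignored by the check.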
Julia Kreger 4359c1e8ad Trivial: Fix reno for software raid fix
Fixes the release note for

Change-Id: I9971d12665f2c8a4fdfe82911c6173021d03ddc0
2022-08-25 08:16:16 -07:00
Zuul ef5d9da134 Merge "Fix software raid output poisoning" 2022-08-25 14:54:29 +00:00
Julia Kreger f3e3de8097 Fix software raid output poisoning
In the event a device name is set to contain a raid device path,
it is possible for the Name and Events field values of mdadm's
detailed output to contain text which inadvertently gets captured and
mapped as component data for the "holder" devices of the RAID set.

This would cause invalid values to get passed to UEFI methods
which would cause a deployment to fail under these circumstances.

We now ignore the Name and Events fields in mdadm output.

Change-Id: If721dfe1caa5915326482969e55fbf4697538231
2022-08-24 10:15:27 -07:00
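A simplified illustration of the fix. The parser below is not the agent's implementation, and the sample text is an abbreviated, made-up rendering of mdadm's detailed output; it only shows why skipping the Name and Events fields keeps a device path in the array name from being mistaken for a component device.

```python
import re

IGNORED_FIELDS = ("Name", "Events")

SAMPLE_DETAIL = """\
/dev/md0:
           Version : 1.2
        Raid Level : raid1
              Name : /dev/md0
            Events : 17

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
"""


def component_devices(detail_output):
    """Collect trailing /dev paths, ignoring the Name and Events fields."""
    devices = []
    for line in detail_output.splitlines():
        stripped = line.strip()
        field = stripped.split(":", 1)[0].strip()
        if field in IGNORED_FIELDS:
            continue
        if stripped.endswith(":"):
            # Header line naming the array itself.
            continue
        match = re.search(r"(/dev/\S+)$", stripped)
        if match:
            devices.append(match.group(1))
    return devices
```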
niuke 4bf88b204f remove unicode prefix from code
Change-Id: I70f0112f1ee3066ffd9316d10b84b9ea5b7fc306
2022-08-23 19:44:10 +08:00
Jakub Jelinek 0212337bd5 Enable skipping disks for cleaning
Introduce a field skip_block_devices in properties - this is a list of dictionaries
Create a helper function list_block_devices_check_skip_list
Update tests of erase_devices_express to use node when calling _list_erasable_devices
Add tests covering various options of the skip list definition
Use the helper function in get_os_install_device when node is cached

Story: 2009914

Change-Id: I3bdad3cca8acb3e0a69ebb218216e8c8419e9d65
2022-08-11 09:30:00 +00:00
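A rough sketch of how a skip list of dictionaries could be applied. The matching rule (every key in a skip entry must equal the device's value) and the plain-dict device representation are assumptions; only the skip_block_devices property name comes from the commit message.

```python
def filter_skipped(block_devices, node):
    """Drop devices that match any entry of properties/skip_block_devices."""
    skip_list = node.get("properties", {}).get("skip_block_devices", [])

    def is_skipped(device):
        # An empty entry matches nothing, so it cannot skip every device.
        return any(
            entry and all(device.get(k) == v for k, v in entry.items())
            for entry in skip_list
        )

    return [device for device in block_devices if not is_skipped(device)]
```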
Zuul 21b21a5f15 Merge "Guard shared device/cluster filesystems" 2022-07-20 08:23:55 +00:00
Julia Kreger beb7484858 Guard shared device/cluster filesystems
Certain filesystems are sometimes used in specialty computing
environments where a shared storage infrastructure or fabric exists.
These filesystems allow for multi-host shared concurrent read/write
access to the underlying block device by *not* locking the entire
device for exclusive use. Generally ranges of the disk are reserved
for each interacting node to write to, and locking schemes are used
to prevent collisions.

These filesystems are common for use cases where high availability
is required or the ability for individual computers to collaborate on a
given workload is critical, such as a group of hypervisors supporting
virtual machines, because it can allow for nearly seamless transfer
of workload from one machine to another.

Similar technologies are also used for cluster quorum and cluster
durable state sharing, however that is not specifically considered
in scope.

Things get difficult because the entire device is not
exclusively locked with the storage fabrics, and in some cases locking
is handled by a Distributed Lock Manager on the network, or via special
sector interactions amongst the cluster members which understand
and support the filesystem.

As a result of this IO/Interaction model, an Ironic-Python-Agent
performing cleaning can effectively destroy the cluster just by
attempting to clean storage which it perceives as attached locally.
This is not IPA's fault. Often this case occurs when a Storage
Administrator forgot to update LUN masking or volume settings on
a SAN as it relates to an individual host in the overall
computing environment. The net result of one node cleaning the
shared volume may include restoration from snapshot, backup
storage, or may ultimately cause permanent data loss, depending
on the environment and the usage of that environment.

Included in this patch:
- IBM GPFS - Can be used on a shared block device... apparently according
             to IBM's documentation. The standard use of GPFS is more Ceph
             like in design... however GPFS is also a specially licensed
             commercial offering, so it is a red flag if this is
             encountered, and should be investigated by the environment's
             systems operator.
- Red Hat GFS2 - Is used with shared common block devices in clusters.
- VMware VMFS - Is used with shared SAN block devices, as well as
                local block devices. With shared block devices,
                ranges of the disk are locked instead of the whole
                disk, and the ranges are mapped to virtual machine
                disk interfaces.
                It is unknown, due to lack of information, if this
                will detect and prevent erasure of VMFS logical
                extent volumes.

Co-Authored-by: Jay Faulkner <>
Change-Id: Ic8cade008577516e696893fdbdabf70999c06a5b
Story: 2009978
Task: 44985
2022-07-19 13:24:03 -07:00
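The guard can be pictured as a check performed before any erase operation. The filesystem type strings, the exception class and the function name below are assumptions for illustration; the agent's actual identifiers may differ.

```python
# Assumed filesystem type identifiers for GPFS, GFS2 and VMFS members.
GUARDED_FS_TYPES = frozenset({"gpfs", "gfs2", "VMFS_volume_member"})


class ProtectedDeviceError(Exception):
    """Raised when a device appears to hold a shared/cluster filesystem."""


def check_safe_to_erase(device, fs_type):
    """Refuse to erase a device whose filesystem type is guarded."""
    if fs_type in GUARDED_FS_TYPES:
        raise ProtectedDeviceError(
            "%s holds a %s filesystem and will not be erased" % (device, fs_type)
        )
```

Failing loudly rather than silently skipping the device matches the intent described above: the operator should investigate why shared storage is visible to the node.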
Dmitry Tantsur 6a1334a068 Drop support for instance netboot
Change-Id: I2b4c543537dac8904028fdcdb590c1c214238e10
2022-07-07 16:38:22 +02:00
Zuul 5129eb4933 Merge "Fix passing kwargs in clean steps" 2022-07-04 13:56:52 +00:00
Zuul ccf4ee31cf Merge "Gather details about bond interfaces if present" 2022-07-02 02:56:46 +00:00
Zuul 0cf5959f67 Merge "Collect udev properties in the ramdisk logs" 2022-07-02 00:37:35 +00:00
waleedm eb07839bd4 Fix passing kwargs in clean steps
Pass kwargs to dispatch_to_managers method in execute_clean_step

Change-Id: Ida4ed4646659b2ee3f8f92b0a4d73c0266dd5a99
Story: 2010123
Task: 45705
2022-07-01 23:03:55 +00:00
Derek Higgins 7e4fe3bf6a Gather details about bond interfaces if present
If present gather information about bonded interfaces.

Story: #2010093
Task: #45637

Change-Id: I394187640b4788ebec21c3391d33ed728fb72ffa
2022-06-21 09:45:03 +01:00
Dmitry Tantsur a98675890f Collect udev properties in the ramdisk logs
Change-Id: Ifcf3dfff00b604dec1e2f430369ab8053f50f137
2022-06-17 16:19:58 +02:00
Dmitry Tantsur 69e2254503 Fix discovering WWN/serial for devicemapper devices
The udev prefix is DM_, not ID_, for them. On top of that, they don't have
short serials (or at least don't always have them).

Change-Id: I5b6075fbff72201a2fd620f789978acceafc417b
2022-06-14 19:06:53 +02:00
Julia Kreger 014d37743a Multipath Hardware path handling
Removes multipath base devices from consideration by
default, and instead allows the device-mapper device
managed by multipath to be picked up and utilized

In effect, allowing us to ignore standby paths *and*
leverage multiple concurrent IO paths if so offered
via ALUA.

In reality, anyone who has previously built IPA with
multipath tooling might not have encountered issues
previously because they used Active/Active SAN storage
environments. They would have worked because the IO lock
would have been exchanged between controllers and paths.
However, Active/Passive environments will block passive
paths from access, ultimately preventing new locks from
being established without proper negotiation. This ultimately
requires multipathing *and* the agent to be smart enough
to know to disqualify underlying paths to backend storage.

An additional benefit of this is that active/active MPIO devices
will, as long as ``multipath`` is present inside the ramdisk,
no longer possibly result in duplicate IO wipes occurring
across numerous devices.

Story: #2010003
Task: #45108
Resolves: rhbz#2076622
Resolves: rhbz#2070519
Change-Id: I0fd6356f036d5ff17510fb838eaf418164cdfc92
2022-05-18 20:26:39 -03:00
Zuul 6b8f387498 Merge "Collect a full lsblk output in the ramdisk logs" 2022-05-09 14:21:43 +00:00
Zuul 979eea621e Merge "Do not try to guess EFI partition path by its number" 2022-05-05 15:17:35 +00:00
Dmitry Tantsur f09f6c9f1a Do not try to guess EFI partition path by its number
The logic of adding a partition number to the device path does not work
for devicemapper devices (e.g. a multipath storage device).

Change-Id: I9a445e847d282c50adfa4bad5e7136776861005d
2022-05-04 15:06:02 +02:00
Dmitry Tantsur 65c4de903a Use a pre-defined partition UUID to detect configdrive on GPT
Using partition numbers is currently broken for devicemapper devices.
Fortunately, GPT has partition UUIDs, so we can just generate one and
use it for lookup.

Change-Id: I41ffe4f8e4c6e43182090b5aa2a2b4b34f32efd5
2022-04-29 16:56:53 +02:00
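A sketch of the lookup idea: instead of appending a partition number to the device path, resolve the partition through its GPT partition UUID. The helper names are hypothetical, and the /dev/disk/by-partuuid directory is the usual udev symlink location rather than something stated in the commit message.

```python
import posixpath
import uuid


def new_partition_uuid():
    """Generate a UUID to assign to the partition when it is created."""
    return str(uuid.uuid4())


def partition_path(part_uuid, root="/dev/disk/by-partuuid"):
    """Return the udev symlink path expected for a GPT partition UUID."""
    return posixpath.join(root, part_uuid.lower())
```

Because the path depends only on the UUID, it is the same for a plain disk and for a devicemapper device.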
Dmitry Tantsur 424e649bed Collect a full lsblk output in the ramdisk logs
The existing lsblk call is very handy for an overview, but there are a lot
more useful pairs to collect. Collect them in a machine-readable format
to be able to use in debugging and further development.

Change-Id: Ib27843524421944ee93de975d275e93276a5597a
2022-04-29 14:24:19 +02:00
OpenStack Release Bot cbdb4dd8f3 Update master for stable/yoga
Add file to the reno documentation build to show release notes for

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on

Sem-Ver: feature
Change-Id: Ib1aa5d02cc5dc32bc4eebf6982d3f00d44e703f3
2022-03-23 14:34:06 +00:00
Zuul f08f70134d Merge "Improve efficiency of storage cleaning in mixed media envs" 2022-03-15 18:05:29 +00:00
Jacob Anders c5f7f18bcb Improve efficiency of storage cleaning in mixed media envs
Story 2008290 added support
for NVMe-native storage cleaning, greatly improving storage clean
times on NVMe-based nodes as well as reducing device wear.

This is a follow up change which aims to make further improvements
to cleaning efficiency in mixed NVMe-HDD environments. This is
achieved by combining NVMe-native cleaning methods on NVMe devices
with traditional metadata clean on non-NVMe devices.

Story: 2009264
Task: 43498
Change-Id: I445d8f4aaa6cd191d2e540032aed3148fdbff341
2022-03-15 19:00:25 +10:00
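The combined approach amounts to choosing a cleaning method per device. In the sketch below the device-name prefix test and the plan keys are assumptions; the agent may detect NVMe devices and name its methods differently.

```python
def plan_cleaning(device_names):
    """Split devices between NVMe-native cleaning and a metadata clean."""
    plan = {"nvme_native": [], "metadata": []}
    for name in device_names:
        # Assumption: NVMe namespaces show up as /dev/nvme*.
        method = "nvme_native" if name.startswith("/dev/nvme") else "metadata"
        plan[method].append(name)
    return plan
```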
Zuul de28b7bfdc Merge "Create fstab entry with appropriate label" 2022-03-11 00:40:01 +00:00
Julia Kreger 99ca1086db Create fstab entry with appropriate label
Depending on how the stars align with partition images
being written to a remote system, we *may* end up with
*either* a Partition UUID value, or a Partition's UUID value.

Which are distinctly different.

This is because the value, when collected as a result of writing
an image to disk, *falls* back and passes the value to enable
partition discovery and matching.

Later on, when we realized we ought to create an fstab entry,
we blindly re-used the value thinking it was, indeed, always
a Partition's UUID and not the Partition UUID. Obviously,
the label type is quite explicit, either UUID or PARTUUID
respectively, when initial ramdisk utilities such as dracut
are searching and mounting filesystems.

Adds capability to identify the correct label to utilize
based upon the current state of the block devices on disk.

Granted, we are likely only exposed to this because of IO
race conditions under high concurrency load operations.
Normally this would only be seen on test VMs, but
systems being backed by a Storage Area Network *can*
exhibit the same IO race conditions as virtual machines.

Change-Id: I953c936cbf8fad889108cbf4e50b1a15f511b38c
Resolves: rhbz#2058717
Story: #2009881
Task: 44623
2022-03-10 07:04:01 -08:00
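A sketch of picking the label type from what is actually on disk. The function name, its arguments, and the default mount point and filesystem type are all assumptions; the sets of known UUIDs would come from inspecting the block devices.

```python
def fstab_entry(value, fs_uuids, part_uuids,
                mount_point="/boot/efi", fs_type="vfat"):
    """Build an fstab line using UUID= or PARTUUID= as appropriate."""
    if value in fs_uuids:
        label = "UUID"
    elif value in part_uuids:
        label = "PARTUUID"
    else:
        raise ValueError("%s matches no known UUID or PARTUUID" % value)
    fields = ("%s=%s" % (label, value), mount_point, fs_type, "defaults", "0", "1")
    return "\t".join(fields)
```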
Zuul bcd5d11d9a Merge "Rescan device after filesystem creation" 2022-03-07 18:37:50 +00:00
Riccardo Pittau 697fa6f3b6 Use utf-16-le if BOM not present
In case no BOM is present in the CSV file the utf-16 codec won't work.
We fail over to utf-16-le as Little Endian is commonly used.

Change-Id: I3e25ce4997f5dd3df87caba753daced65838f85a
2022-02-22 15:53:54 +01:00
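The fallback can be sketched as a BOM check before decoding. The function name is hypothetical and the snippet works on raw bytes rather than on the CSV file itself.

```python
import codecs


def decode_utf16(data):
    """Decode UTF-16 bytes, assuming little endian when no BOM is present."""
    has_bom = data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
    return data.decode("utf-16" if has_bom else "utf-16-le")
```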
Vanou Ishii fa70a1909b Rescan device after filesystem creation
In the work_on_disk function, IPA runs mkfs commands without
a subsequent device rescan operation. This leads to incorrect
content being returned in uuids_to_return.
These mkfs commands modify partition labels, but IPA fails
to catch such changes because no device rescan operation
follows them.

This commit adds call of device rescan function before
uuids_to_return construction.

Change-Id: I4e8b30deb5e2247f51ce8f10bd3271f64a264089
2022-02-11 11:02:52 +09:00
Dmitry Tantsur b8b1991bea Clean up release notes
Change-Id: I568d7edfe81e928e6d7f09bd4a7933ca72b8813a
2022-02-03 14:49:36 +01:00
Arne Wiebalck 62c5674a60 SoftwareRAID: Use efibootmgr (and drop grub2-install)
Move the software RAID code path from grub2-install to

- remove the UEFI efibootmgr exception for software RAID
- create and populate the ESPs on the holder disks
- update the NVRAM with all ESPs (the component devices
  of the ESP mirror, use unique labels to avoid unintentional
  deduplication of entries in the NVRAM)

Story: #2009794

Change-Id: I7ed34e595215194a589c2f1cd0b39ff0336da8f1
2022-01-26 14:43:40 +01:00
Arne Wiebalck 7f15455d8d Burn-in: Dynamic network pairing
Pair nodes dynamically via a distributed coordination backend for
network burn-in. The algorithm uses a group to pair nodes: after
acquiring a lock, a first node joins the group, releases the lock,
waits for a second node, then they both leave, and release the lock
for the next pair.

Story: #2007523
Task: #42796

Change-Id: I572093b144bc90a49cd76929c7e8685ed45d9f6e
2022-01-10 11:31:33 +01:00
Zuul fa5cccd137 Merge "Burn-in: Add options for named log files" 2021-12-09 11:54:17 +00:00
Zuul 60df149c8f Merge "Instruct qemu-img to write image zeros to disk." 2021-12-09 11:00:50 +00:00
Zuul 8abc930d97 Merge "Burn-in: Add SMART self test to disk burn-in" 2021-12-09 09:38:39 +00:00
Zuul 3cd964fa84 Merge "Prepare for bugfix release" 2021-12-08 19:27:57 +00:00
Arne Wiebalck e751218059 Burn-in: Add options for named log files
In order to ease logging of the various burn-in steps, this patch
proposes options to define the output files for all burn-in steps:
{'agent_burnin_cpu', 'agent_burnin_vm', 'agent_burnin_fio_network',
'agent_burnin_fio_disk'}_outputfile via a node's driver-info.

Story: #2007523
Task: #44102

Change-Id: I327cae5949d38e738d3c535487b3795d00ad8f1e
2021-12-08 17:47:19 +01:00
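The option names expand to one "_outputfile" key per burn-in step. The sketch below reads them from a plain dictionary standing in for a node's driver-info; the helper name and the example path are hypothetical.

```python
BURNIN_STEPS = (
    "agent_burnin_cpu",
    "agent_burnin_vm",
    "agent_burnin_fio_network",
    "agent_burnin_fio_disk",
)


def outputfile_options(driver_info):
    """Map each burn-in step to its configured output file, if any."""
    return {
        step: driver_info.get(step + "_outputfile")
        for step in BURNIN_STEPS
    }
```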