1044 Commits

Author SHA1 Message Date
Zuul
6b8f387498 Merge "Collect a full lsblk output in the ramdisk logs" 2022-05-09 14:21:43 +00:00
Zuul
979eea621e Merge "Do not try to guess EFI partition path by its number" 2022-05-05 15:17:35 +00:00
Dmitry Tantsur
f09f6c9f1a Do not try to guess EFI partition path by its number
The logic of adding a partition number to the device path does not work
for devicemapper devices (e.g. a multipath storage device).

Change-Id: I9a445e847d282c50adfa4bad5e7136776861005d
2022-05-04 15:06:02 +02:00
Dmitry Tantsur
65c4de903a Use a pre-defined partition UUID to detect configdrive on GPT
Using partition numbers is currently broken for devicemapper devices.
Fortunately, GPT has partition UUIDs, so we can just generate one and
use it for lookup.

Change-Id: I41ffe4f8e4c6e43182090b5aa2a2b4b34f32efd5
2022-04-29 16:56:53 +02:00
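The idea in commit 65c4de903a above can be sketched roughly as follows: generate a partition UUID up front, then look the partition up by that UUID instead of by its number. This is only an illustration; the `find_by_partuuid` helper and the record layout (dictionaries with `path` and `partuuid` keys) are assumptions, not the agent's actual code.

```python
import uuid


def find_by_partuuid(devices, part_uuid):
    """Return the path of the device whose PARTUUID matches, or None.

    `devices` is a hypothetical list of block device records; the
    comparison is case-insensitive because tools differ in how they
    print GUIDs.
    """
    wanted = part_uuid.lower()
    for dev in devices:
        if (dev.get("partuuid") or "").lower() == wanted:
            return dev["path"]
    return None


# Generate the UUID before creating the partition, then use it for lookup.
configdrive_uuid = str(uuid.uuid4())
devices = [
    {"path": "/dev/mapper/mpatha", "partuuid": None},
    {"path": "/dev/mapper/mpatha1", "partuuid": str(uuid.uuid4())},
    {"path": "/dev/mapper/mpatha2", "partuuid": configdrive_uuid.upper()},
]
print(find_by_partuuid(devices, configdrive_uuid))
```

The lookup never builds a path from a device name and a partition number, which is the step that breaks on devicemapper devices.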
Dmitry Tantsur
424e649bed Collect a full lsblk output in the ramdisk logs
The existing lsblk call is very handy for an overview, but there are a lot
more useful pairs to collect. Collect them in a machine-readable format
to be able to use in debugging and further development.

Change-Id: Ib27843524421944ee93de975d275e93276a5597a
2022-04-29 14:24:19 +02:00
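For readers who want to consume such machine-readable output, here is a minimal sketch of parsing `KEY="value"` pairs as printed by `lsblk --pairs`. The `parse_lsblk_pairs` helper and the sample text are assumptions for illustration; the columns the agent actually collects are not shown in the commit message above.

```python
import shlex


def parse_lsblk_pairs(output):
    """Turn lines of KEY="value" pairs into a list of dictionaries."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue
        record = {}
        # shlex handles quoted values that contain spaces.
        for token in shlex.split(line):
            key, _, value = token.partition("=")
            record[key] = value
        records.append(record)
    return records


# Hypothetical sample; real output has many more columns.
sample = (
    'NAME="sda" TYPE="disk" MODEL="Fancy Disk"\n'
    'NAME="sda1" TYPE="part" MODEL=""\n'
)
for record in parse_lsblk_pairs(sample):
    print(record["NAME"], record["TYPE"], repr(record["MODEL"]))
```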
Riccardo Pittau
8111475eb0 Use Werkzeug modern version
Request class from Werkzeug now includes json capability by default.
See [1] and [2] for more info.

[1] 2cd4fa9484
[2] 7b52ecd8f3

Change-Id: I3c74b26ef4aff07c371364203a5b39c658b552a7
2022-04-14 10:47:06 +00:00
Zuul
a247fbcc8c Merge "Refactor efi_utils for easier maintaining and debugging" 2022-03-18 20:55:57 +00:00
Zuul
f08f70134d Merge "Improve efficiency of storage cleaning in mixed media envs" 2022-03-15 18:05:29 +00:00
Jacob Anders
c5f7f18bcb Improve efficiency of storage cleaning in mixed media envs
https://storyboard.openstack.org/#!/story/2008290 added support
for NVMe-native storage cleaning, greatly improving storage clean
times on NVMe-based nodes as well as reducing device wear.

This is a follow up change which aims to make further improvements
to cleaning efficiency in mixed NVMe-HDD environments. This is
achieved by combining NVMe-native cleaning methods on NVMe devices
with traditional metadata clean on non-NVMe devices.

Story: 2009264
Task: 43498
Change-Id: I445d8f4aaa6cd191d2e540032aed3148fdbff341
2022-03-15 19:00:25 +10:00
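The per-device choice described in commit c5f7f18bcb above might look something like this sketch. Detecting NVMe devices by their name prefix and the method names `nvme-native` and `metadata` are assumptions made for illustration only; the real agent inspects the hardware rather than the device name.

```python
def plan_cleaning(device_names):
    """Map each device to a cleaning method (hypothetical names).

    NVMe devices get the NVMe-native method, everything else falls
    back to a metadata clean.
    """
    plan = {}
    for name in device_names:
        if name.startswith("/dev/nvme"):
            plan[name] = "nvme-native"
        else:
            plan[name] = "metadata"
    return plan


print(plan_cleaning(["/dev/nvme0n1", "/dev/sda", "/dev/sdb"]))
```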
Zuul
de28b7bfdc Merge "Create fstab entry with appropriate label" 2022-03-11 00:40:01 +00:00
Julia Kreger
99ca1086db Create fstab entry with appropriate label
Depending on how the stars align with partition images
being written to a remote system, we *may* end up with
*either* a Partition UUID value, or a Partition's UUID value.

Which are distinctly different.

This is because the value, when collected as a result of writing
an image to disk *falls* back and passes the value to enable
partition discovery and matching.

Later on, when we realized we ought to create an fstab entry,
we blindly re-used the value thinking it was, indeed, always
a Partition's UUID and not the Partition UUID. Obviously,
the label type is quite explicit, either UUID or PARTUUID
respectively, when initial ramdisk utilities such as dracut
are searching and mounting filesystems.

Adds capability to identify the correct label to utilize
based upon the current state of the block devices on disk.

Granted, we are likely only exposed to this because of IO
race conditions under high concurrency load operations.
Normally this would only be seen on test VMs, but
systems being backed by a Storage Area Network *can*
exhibit the same IO race conditions as virtual machines.

Change-Id: I953c936cbf8fad889108cbf4e50b1a15f511b38c
Resolves: rhbz#2058717
Story: #2009881
Task: 44623
2022-03-10 07:04:01 -08:00
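A rough sketch of the label selection described in commit 99ca1086db above: `UUID=` in fstab refers to the filesystem UUID, while `PARTUUID=` refers to the partition table entry, so the value has to be matched against the current block devices before it is written. The helper names, the record layout and the mount options below are assumptions, not the agent's actual code.

```python
def pick_fstab_label(value, devices):
    """Return "UUID" or "PARTUUID" depending on what `value` matches."""
    if not value:
        return None
    for dev in devices:
        if value == dev.get("uuid"):
            return "UUID"
        if value == dev.get("partuuid"):
            return "PARTUUID"
    return None


def fstab_entry(value, devices, mountpoint="/boot/efi", fstype="vfat"):
    """Build one fstab line using the label type that actually matches."""
    label = pick_fstab_label(value, devices)
    if label is None:
        raise ValueError(f"{value} matches no known UUID or PARTUUID")
    return f"{label}={value}\t{mountpoint}\t{fstype}\tdefaults\t0\t0"


# Hypothetical block device records.
devices = [
    {"uuid": "ABCD-1234", "partuuid": "0f5c2d6e-1111-2222-3333-444455556666"},
]
print(fstab_entry("ABCD-1234", devices))
print(fstab_entry("0f5c2d6e-1111-2222-3333-444455556666", devices))
```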
Zuul
59c02f48cc Merge "Run partx in verbose mode to simplify debugging" 2022-03-08 12:35:29 +00:00
Zuul
63171c7f38 Merge "Add mount and parted -l to the collected commands" 2022-03-08 12:16:19 +00:00
Zuul
bcd5d11d9a Merge "Rescan device after filesystem creation" 2022-03-07 18:37:50 +00:00
Riccardo Pittau
697fa6f3b6 Use utf-16-le if BOM not present
In case no BOM is present in the CSV file the utf-16 codec won't work.
We fail over to utf-16-le as Little Endian is commonly used.

Change-Id: I3e25ce4997f5dd3df87caba753daced65838f85a
2022-02-22 15:53:54 +01:00
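The fallback described in commit 697fa6f3b6 above can be sketched like this: decode with `utf-16` when a byte order mark is present and fall back to `utf-16-le` otherwise. The `decode_utf16` helper is a hypothetical name and the sketch is not the agent's actual code.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 data, assuming little endian if there is no BOM."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The utf-16 codec consumes the BOM and picks the byte order.
        return data.decode("utf-16")
    return data.decode("utf-16-le")


with_bom = codecs.BOM_UTF16_BE + "name,value".encode("utf-16-be")
without_bom = "name,value".encode("utf-16-le")
print(decode_utf16(with_bom))
print(decode_utf16(without_bom))
```

Big-endian data without a BOM would still be decoded incorrectly; the sketch accepts that limitation because little endian is the common case.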
Dmitry Tantsur
f1ee454a0e Add mount and parted -l to the collected commands
Change-Id: I1c759552220291890704d0002a62ea3f51701691
2022-02-14 13:01:32 +01:00
Dmitry Tantsur
3d3df17e5a Refactor efi_utils for easier maintaining and debugging
* Move irrelevant code from inside the giant try..except block
* Do not bother removing the (empty) temporary mountpoint
* Fix log messages according to the actual code
* Fix some code duplication
* Add missing unit tests for failure case

Change-Id: Id7b557419d513375816d73901e2ab6f139d765ad
2022-02-14 12:46:25 +01:00
Dmitry Tantsur
4d16ea413f Run partx in verbose mode to simplify debugging
Otherwise the actual failure cause is not recorded.

Change-Id: If66ee97016ddf0e5c3f40ad9400ff3bc6fdebedc
2022-02-14 12:02:22 +01:00
Arne Wiebalck
a83f38479e Move prepare_boot_partitions_for_softraid to raid_utils
prepare_boot_partitions_for_softraid() is used in BIOS and UEFI
modes to prepare the partitions for the bootloader. Move it from
the image extensions to raid_utils to reflect this and avoid the
import of an extension to efi_utils.

Follow-up to 62c5674a600baeeef0af3b12baeab486870eb103.

Change-Id: I9f5974fbbfea5e8cdfbb7e49bea375e5cbfdd145
2022-02-14 11:21:36 +01:00
Vanou Ishii
fa70a1909b Rescan device after filesystem creation
In work_on_disk function, IPA runs mkfs commands without
following device rescan operation. This leads to incorrect
content of uuids_to_return to be returned.
These mkfs commands modify partition label but IPA fails
to catch such changes because of no following device
rescan operation.

This commit adds call of device rescan function before
uuids_to_return construction.

Change-Id: I4e8b30deb5e2247f51ce8f10bd3271f64a264089
2022-02-11 11:02:52 +09:00
Dmitry Tantsur
6ebf041704 Use canonical device name for RAID device for ESP
It seems like tinyIPA silently replaces /dev/md/esp with /dev/md127.
Find the next free /dev/md device and use it instead.

Also rescan the resulting device before copying files.

Change-Id: Ie04f530be434c4b1561e75f387b9da679e4607e0
Depends-On: https://review.opendev.org/c/openstack/ironic/+/827129/
2022-02-01 10:06:31 +01:00
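Finding the next free `/dev/md` device, as mentioned in commit 6ebf041704 above, could be sketched as below. The `next_free_md` helper and the way existing names are passed in are assumptions; the real code may discover existing devices differently.

```python
import re


def next_free_md(existing):
    """Return the lowest /dev/mdN name that is not in `existing`."""
    used = set()
    for name in existing:
        match = re.fullmatch(r"/dev/md(\d+)", name)
        if match:
            used.add(int(match.group(1)))
    number = 0
    while number in used:
        number += 1
    return f"/dev/md{number}"


print(next_free_md(["/dev/md0", "/dev/md1", "/dev/md127"]))
```

Using a canonical numbered name avoids depending on whether the ramdisk keeps a symbolic name such as `/dev/md/esp`.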
Arne Wiebalck
62c5674a60 SoftwareRAID: Use efibootmgr (and drop grub2-install)
Move the software RAID code path from grub2-install to
efibootmgr:

- remove the UEFI efibootmgr exception for software RAID
- create and populate the ESPs on the holder disks
- update the NVRAM with all ESPs (the component devices
  of the ESP mirror, use unique labels to avoid unintentional
  deduplication of entries in the NVRAM)

Story: #2009794

Change-Id: I7ed34e595215194a589c2f1cd0b39ff0336da8f1
2022-01-26 14:43:40 +01:00
Zuul
e06dd22e78 Merge "Burn-in: Dynamic network pairing" 2022-01-20 21:17:38 +00:00
Arne Wiebalck
7f15455d8d Burn-in: Dynamic network pairing
Pair nodes dynamically via a distributed coordination backend for
network burn-in. The algorithm uses a group to pair nodes: after
acquiring a lock, a first node joins the group, releases the lock,
waits for a second node, then they both leave, and release the lock
for the next pair.

Story: #2007523
Task: #42796

Change-Id: I572093b144bc90a49cd76929c7e8685ed45d9f6e
2022-01-10 11:31:33 +01:00
Arne Wiebalck
0b69890c11 [trivial] Fix typo in __init__.py
Change-Id: I67810abbfb975c0d0ad0faf9807318c462580528
2021-12-16 22:03:51 +01:00
Zuul
fa5cccd137 Merge "Burn-in: Add options for named log files" 2021-12-09 11:54:17 +00:00
Zuul
60df149c8f Merge "Instruct qemu-img to write image zeros to disk." 2021-12-09 11:00:50 +00:00
Zuul
8abc930d97 Merge "Burn-in: Add SMART self test to disk burn-in" 2021-12-09 09:38:39 +00:00
Arne Wiebalck
e751218059 Burn-in: Add options for named log files
In order to ease logging of the various burn-in steps, this patch
proposes options to define the output files for all burn-in steps:
{'agent_burnin_cpu', 'agent_burnin_vm', 'agent_burnin_fio_network',
'agent_burnin_fio_disk'}_outputfile via a node's driver-info.

Story: #2007523
Task: #44102

Change-Id: I327cae5949d38e738d3c535487b3795d00ad8f1e
2021-12-08 17:47:19 +01:00
Derek Higgins
12f5f30e63 Instruct qemu-img to write image zeros to disk.
Doing this will cause it not to zero out the entire
block device which can be very costly on a slow HDD.

Story: 2009227
Task: 43315

Change-Id: I62ba2afc037d9844387e6b0984fe5008779d95d2
2021-12-08 15:56:05 +00:00
Arne Wiebalck
c6b1cb1c32 Burn-in: Add SMART self test to disk burn-in
Add the option to run a SMART self test right after
the disk burn-in. The disk burn-in step will fail if
the SMART test on any of the disks fails.

Story: #2007523
Task: #43383

Change-Id: I1312d5b71bedd044581a136af0b4c43769d21877
2021-12-06 09:09:35 +01:00
Iury Gregory Melo Ferreira
4042e7b08c Get rid of lambda in RealFilePartitioningTestCase
This commit changes the lambda usage in the RealFilePartitioningTestCase
to autospec to avoid problems with unexpected args.

Change-Id: I21356a7783f105dde9ff0d3777e2a06f3f28a786
2021-11-25 11:21:32 +01:00
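The difference that commit 4042e7b08c above relies on can be shown with a small sketch: a catch-all lambda accepts any arguments, while an autospecced mock enforces the signature of the function it replaces. The `block_size` function is a hypothetical stand-in, not a function from the test case.

```python
from unittest import mock


def block_size(device):
    """Hypothetical function that a test wants to replace."""
    raise NotImplementedError


# A catch-all lambda silently accepts a call with wrong arguments.
permissive = lambda *args, **kwargs: 1024
permissive("/dev/sda", "unexpected")

# An autospecced mock keeps the original signature and rejects it.
strict = mock.create_autospec(block_size, return_value=1024)
print(strict("/dev/sda"))
try:
    strict("/dev/sda", "unexpected")
except TypeError as exc:
    print("rejected:", exc)
```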
Zuul
bcf2846553 Merge "Trivial: split away efibootmgr helpers" 2021-11-23 12:26:17 +00:00
Zuul
4954fe3702 Merge "Call execute from ironic-lib in hardware.py" 2021-11-22 20:04:40 +00:00
Dmitry Tantsur
5cf61e797a Trivial: split away efibootmgr helpers
These are very useful for downstream deploy steps, make them public.

Change-Id: I26106a07049f751d3e3cc646431e2176001f4645
2021-11-19 17:27:27 +01:00
Dmitry Tantsur
abe38a6a5f Fix compatibility with disk_utils.find_efi_partition
This function returns the complete block device record, not just number.
Fixes regression in 89bc73aa0105850c6ae44428642e31802bba3b20.

Also fix the incorrect job in the gate queue, which prevented us from
catching this issue on merging.

Change-Id: I4cbc359ceabfc193ce18fed14a1952359460e7d9
2021-11-19 14:51:27 +01:00
Dmitry Tantsur
89bc73aa01 Use two more functions from disk_utils
Change-Id: If01c9cd7f95b4495509369786360741b731161db
2021-11-18 13:49:51 +01:00
Riccardo Pittau
7b03fbbb36 Call execute from ironic-lib in hardware.py
Replace the execute wrapper from utils with execute from ironic-lib in
hardware.py

Adjust unit tests as needed.

Change-Id: I63a3b0407b2ca2246bd0e6624bfa0f748c0d73f7
2021-11-18 07:52:48 +01:00
Dmitry Tantsur
36d4a18fbc Move manage_uefi from the image extension to a public location
This call is very useful for custom deploy implementations, such as one
we maintain for OpenShift. Splitting it out also makes image.py slightly
more manageable.

The get_partition call is moved to partition_utils.

Change-Id: I60a6a2823d3eb27a4ae78e913e3655dae7b54ffe
2021-11-16 17:58:16 +01:00
Zuul
f5efbc3e7e Merge "Simplify error messages when running clean/deploy step" 2021-11-13 07:35:50 +00:00
Riccardo Pittau
a799dcc422 Move rescan device function to general utils
We use basically the same function in two modules in the same way, let's
put that in a common place.

Change-Id: I4016e43f2cb102d4327bafcc8a2f90112a6f944a
2021-11-10 15:34:37 +01:00
Dmitry Tantsur
c5fb191393 Simplify error messages when running clean/deploy step
The caller knows what step it invokes, there is no point in repeating
it in the error message. There is also no need to wrap the exception
if it's a RESTError or an ironic-lib exception already since they
are normally detailed enough.

Only leave a detailed message when an unexpected exception happens.

Change-Id: I1d8ca1e7ed1462159e4ae5f0bcf58686f6a2681c
2021-11-09 13:58:44 +01:00
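The error handling described in commit c5fb191393 above might be sketched as follows: errors that are already detailed pass through untouched and only unexpected exceptions get wrapped with extra context. `RESTError` here is a local stand-in class and `run_step` a hypothetical helper; neither is the agent's real implementation.

```python
class RESTError(Exception):
    """Stand-in for an error type that is already detailed enough."""


def run_step(step):
    """Run a step callable, wrapping only unexpected exceptions."""
    try:
        return step()
    except RESTError:
        # Already descriptive; the caller knows which step it invoked.
        raise
    except Exception as exc:
        raise RESTError(
            f"Unexpected exception {type(exc).__name__}: {exc}"
        ) from exc


def failing_step():
    raise KeyError("missing")


try:
    run_step(failing_step)
except RESTError as exc:
    print(exc)
```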
Riccardo Pittau
23e67b5fea Re-read the partition table with partx -a, part 2
Use add instead of update to re-read the partition table with partx.

See [1] for more details.

Co-authored-by: Arne Wiebalck <arne.wiebalck@cern.ch>

[1] https://opendev.org/openstack/ironic-python-agent/commit/dc8c1f16f9a00e2bff21612d1a9cf0ea0f3addf0

Change-Id: I2336e22dadc790cfbde87904612fcaa3b8c501db
2021-11-09 13:03:14 +01:00
Arne Wiebalck
dc8c1f16f9 Re-read the partition table with partx -a
Re-read the partition table with 'partx -a', rather than 'partx -u'.

This should fix a timing issue where the bootloader installation
fails to mount the EFI partition from a whole disk image since it
is not yet aware of the new partitions (observed with both the
iscsi and the direct deploy interfaces).

Change-Id: If5da3075e813ae01df3decf8f0647aba111b0515
2021-11-06 13:43:48 +01:00
Zuul
0b56cca7f0 Merge "Fix UEFI record regex" 2021-11-05 14:59:35 +00:00
Julia Kreger
c5268bbdbb Fix UEFI record regex
I accidentally put colons on the test data and remembered taking the
colon character out of the regex I was working on, but apparently
left it in, and accounted for the active entry indicator flag
which appears to have inconsistent support across vendors.

The regex has been fixed, and a test added from a Lenovo SR650
which has some additional string entry data in the UEFI output
which may separate entries.

Change-Id: I1f67b0fb1f645fa82e98bd7c7bba3ffc7755cc74
2021-11-04 09:45:25 -07:00
Zuul
a4b73058ee Merge "Always include the oslo_log log file in ramdisk logs" 2021-11-04 15:14:33 +00:00
Zuul
65827b3015 Merge "Stop requiring mocking of utils.execute if ironic-lib execute is mocked" 2021-11-03 14:19:52 +00:00
Julia Kreger
67eddfa7e3 Delete EFI boot entry duplicate labels first
Some firmware seems to object to EFI NVRAM
entries being deleted after one is added, resulting in the
entire entry table being reset to the last known good state.

This is problematic, as ultimately deployments can time out
if we previously booted with Networking, and the machine, while
commanded to do otherwise, reboots back to networking regardless.

We will now delete entries first, before proceeding.

Additionally, for general use, this pattern may serve the
community better by avoiding cases where we would have
previously just relied upon efibootmgr[0] to warn us of duplicate
entries.

[0]: 103aa22ece/src/efibootmgr.c (L228)

Change-Id: Ib61a7100a059e79a8b0901fd8f46b9bc41d657dc
Story: 2009649
Task: 43808
2021-11-01 06:59:26 -07:00
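The ordering introduced by commit 67eddfa7e3 above can be sketched as: find existing entries with the same label, delete them, and only then create the new entry. The regular expression, the sample output and the `plan_commands` helper are assumptions for illustration; the `efibootmgr` options used (`-b`, `-B`, `-c`, `-d`, `-p`, `-L`, `-l`) are the standard ones, but the agent's exact invocation may differ.

```python
import re

# Assumed format: "Boot0001* label<TAB>device path"; vendors vary.
ENTRY = re.compile(
    r"^Boot(?P<num>[0-9A-Fa-f]{4})(?P<active>\*)?\s+(?P<label>[^\t\n]+)",
    re.MULTILINE,
)


def plan_commands(output, label, device, part, loader):
    """Return argv lists: deletions of duplicates first, then creation."""
    commands = []
    for match in ENTRY.finditer(output):
        if match.group("label").strip() == label:
            commands.append(["efibootmgr", "-b", match.group("num"), "-B"])
    commands.append(
        ["efibootmgr", "-c", "-d", device, "-p", str(part),
         "-L", label, "-l", loader]
    )
    return commands


# Hypothetical efibootmgr output.
sample = (
    "BootCurrent: 0001\n"
    "BootOrder: 0001,0000\n"
    "Boot0000* ironic1\tHD(1,GPT)\n"
    "Boot0001 network\tMAC(52540012)\n"
)
for argv in plan_commands(sample, "ironic1", "/dev/sda", 1, "EFI/BOOT/BOOTX64.EFI"):
    print(" ".join(argv))
```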
Dmitry Tantsur
2cedaa53c2 Always include the oslo_log log file in ramdisk logs
Even if journald is present, there is no guarantee that IPA logs there
(this is the case in container-based ramdisks).

Change-Id: Iceeab0010827728711e19e5b031ccac55fe1efde
2021-10-28 18:32:40 +02:00