90 Commits

Author SHA1 Message Date
Jay Faulkner
c39517b044 Call evaluate_hardware_support exactly once per hwm
Fixes an issue where we could call evaluate_hardware_support multiple
times each run. Now, instead, we cache the values and use the cache
where needed.

Adds unit test coverage for get_managers and the new method.
Fixes issue where we were caching hardware managers between unit tests.

Also includes fixes for codespell CI:
- skip build files in repo
- fix spelling issues introduced to repo

Closes-bug: 2066308
Change-Id: Iebc5b6d2440bfc9f23daa322493379bbe69e84d0
2024-05-22 08:46:21 -07:00
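
The caching idea described in c39517b044 can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: `ExampleManager`, `_support_cache`, `cached_hardware_support`, and `reset_cache` are hypothetical names; only `evaluate_hardware_support` comes from the commit message.

```python
class ExampleManager:
    """Hypothetical hardware manager that counts how often it is evaluated."""

    calls = 0

    def evaluate_hardware_support(self):
        ExampleManager.calls += 1
        return 1  # arbitrary support level chosen for the example


# Hypothetical module-level cache, keyed by hardware manager name.
_support_cache = {}


def cached_hardware_support(name, manager):
    """Return the support level, evaluating the manager at most once."""
    if name not in _support_cache:
        _support_cache[name] = manager.evaluate_hardware_support()
    return _support_cache[name]


def reset_cache():
    """Clear the cache, for example between unit tests."""
    _support_cache.clear()
```

A module-level cache like this one has to be cleared between unit tests, which is the kind of cross-test leakage the commit message mentions.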
Julia Kreger
6ac3f350c0 Unmount config drives
If this seems like deja vu, that is because it is. We had this
very same issue with the original CoreOS ramdisk. Since we don't
control the whole OS of the ramdisk, it only made sense to teach
the agent to umount the folder.

The folder is referenced already, and the agent does have safeguards
in place, but unfortunately this issue led to a rebuild breaking where
cloud-init, glean, and the agent were all trying to do the right thing
as they thought, and there were just multiple /mnt/config folders
present in the OS. These are separate issues we also need to try and
remedy.

What happens is when the device is locked via a mount, the partition
table is never updated to the running OS as the mount creates a lock.
So the agent ends up thinking, in the case of a rebuild, that everything
including creating a configuration drive on that device has been
successful, but when you reboot, there is no partition table entry
for the new partition as the change was not successfully written.
This state prevented the workload from rebooting properly.
This change eliminates that possibility moving forward by attempting
to ensure that the cloud configuration folder is no longer mounted.

Change-Id: I4399dd0934361003cca9ff95a7e3e3ae9bba3dab
2024-04-29 15:41:59 -07:00
Adam Rozman
84a1195d5a add mixed matching of root device hints
This commit introduces the following changes:
  - New optional `all_serial_and_wwn` argument for the block device
    listing logic. The new argument makes it possible to
    collect wwn and serial number information from both
    lsblk and udevadm at the same time
  - Both the short and the long serials are collected
    from udevadm without prioritization when the new argument
    has the value True
  - The new feature is automatically enabled during block device listing
    as part of the root disk selection
  - New options are added to the lsblk command when used in the block
    device discovery process, previously lsblk was not looking
    for wwn numbers and now it does

Closes-Bug: #2061437
Change-Id: I438a686d948cd929311e2f418bb02fb771805148
Signed-off-by: Adam Rozman <adam.rozman@est.tech>
2024-04-15 15:53:50 +03:00
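
One way to picture the mixed matching described in 84a1195d5a is to gather every serial reported by either tool and accept a hint that matches any of them. The sketch below assumes each tool's output has already been parsed into a plain dict; the function names and the dict layout are hypothetical, while `SERIAL` is an lsblk column and `ID_SERIAL_SHORT` / `ID_SERIAL` are udev properties.

```python
def collect_serials(lsblk_info, udev_info):
    """Gather all known serials for one device, without prioritization."""
    candidates = [
        lsblk_info.get("SERIAL"),
        udev_info.get("ID_SERIAL_SHORT"),
        udev_info.get("ID_SERIAL"),
    ]
    serials = []
    for value in candidates:
        # Skip missing values and duplicates, keeping the original order.
        if value and value not in serials:
            serials.append(value)
    return serials


def matches_serial_hint(hint, lsblk_info, udev_info):
    """Return True if the hint equals any serial reported by either tool."""
    return hint in collect_serials(lsblk_info, udev_info)
```

With this approach a root device hint written against the short udev serial still matches a device whose lsblk serial differs.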
Jay Faulkner
36e5993a04 [codespell] Fix spelling issues in IPA
This fixes several spelling issues identified by codespell. In some
cases, I may have manually modified a line to make the output more clear
or to correct grammatical issues which were obvious in the codespell
output.

Later changes in this chain will provide the codespell config used to
generate this, as well as adding this commit's SHA, once landed, to a
.git-blame-ignore-revs file to ensure it will not pollute git history
for modern clients.

Related-Bug: 2047654
Change-Id: I240cf8484865c9b748ceb51f3c7b9fd973cb5ada
2023-12-28 10:54:46 -08:00
Dmitry Tantsur
c57deb7e76 Revert "Fix vmedia network config drive handling"
This reverts commit 33f01fa3c2f32f447ed36f00fea68321c3991c2e.

There are a few issues with the patch - see my comments there.
The most pressing and the reasons to revert are:
1) It breaks deployments when the vmedia is present but does not
   have a network_data.json (the case for Metal3).
2) It assumes the presence of Glean which may not be the case.

Neither Julia nor myself have time to thoroughly fix the issue,
leaving a revert as the only option to unblock Metal3.

Change-Id: I3f1a18a4910308699ca8f88d8e814c5efa78baee
Closes-Bug: #2045255
2023-11-30 10:33:29 +00:00
Julia Kreger
33f01fa3c2 Fix vmedia network config drive handling
When performing DHCP-less deployments, the agent can start and
discover more than one configuration drive present on a host.

For example, a host was previously deployed using Ironic, and
is now being re-deployed again.

If Glean was present in the ramdisk, glean-early.sh would end up
mounting the folder based upon label.

If cloud-init is somehow still in the ramdisk, the other folder
could also get mounted.

This patch, which is intended to be backportable, causes the agent
to unmount any configuration drive folders, mount the most likely
candidate based upon device type, partition, and overall state of
the machine, and then utilize that configuration, if present,
to re-configure and reload networking.

Thus allowing dhcp-less re-deployments to be fixed without
forcing any breaking changes.

It should also be noted that this fix was generated in concert
with an additional tempest test case, because this overall failure
case needed to be reproduced to ensure we had a workable non-breaking
path forward.

Closes-Bug: 2032377
Change-Id: I9a3b3dbb9ca98771ce2decf893eba7a4c1890eee
2023-11-08 12:11:06 -08:00
Dmitry Tantsur
9ed232e77e Add network interface speed to the inventory
This is another fact that Metal3's baremetal-operator is currently
consuming from extra-hardware.

Change-Id: I2ec9d5e9369f5508e7583a4e13c2083f5c8b28ba
2023-05-03 12:20:35 +02:00
Dmitry Tantsur
0304c73c0e Report system firmware information in the inventory
Change-Id: I5b6ceb9cdcf4baa97a6f0482d1030d14f3f2ecff
2023-03-31 14:28:32 +02:00
Dmitry Tantsur
c26f498f49 Make logs collection a hardware manager call
This allows hardware managers to collect additional logs.

Change-Id: If082b921d4bf71c4cc41a5a72db6995b08637374
2023-01-25 15:17:06 +01:00
Rozzii
830fdfa4c6 prioritize lsblk as a source of device serials
The current way of prioritizing ID/DM_SERIAL_SHORT or ID/DM_SERIAL works
in most cases but the udev values seem to be unreliable.

Based on experience it looks like lsblk might be a better
source of truth than udev in regard to serial number
information. This commit makes lsblk the default provider
of block device serial number information.

Story: 2010263
Task: 46161

Change-Id: I16039b46676f1a61b32ee7ca7e6d526e65829113
2022-10-10 19:31:47 +03:00
Jakub Jelinek
a99bf274e4 SoftwareRAID: Enable skipping RAIDS
Extend the ability to skip disks to RAID devices.
This allows users to specify the volume name of
a logical device in the skip list, which is then not cleaned
or created again during the create/apply configuration phase.
The volume name can be specified in the target raid config, provided
the change https://review.opendev.org/c/openstack/ironic-python-agent/+/853182/
passes.

Story: 2010233

Change-Id: Ib9290a97519bc48e585e1bafb0b60cc14e621e0f
2022-09-05 20:43:51 +00:00
Zuul
7d15efd7a6 Merge "Remove oslo.serialization dependency" 2022-07-02 02:56:44 +00:00
Dmitry Tantsur
a98675890f Collect udev properties in the ramdisk logs
Change-Id: Ifcf3dfff00b604dec1e2f430369ab8053f50f137
2022-06-17 16:19:58 +02:00
Riccardo Pittau
64ffd2ee80 Remove oslo.serialization dependency
Use pure json instead of jsonutils.

Borrow encode function from oslo.serialization to be used in the
utils module.

Change-Id: Ied9a2259a4329a86b4f0853bd1fb187563c0a036
2022-06-17 09:37:35 +02:00
Julia Kreger
014d37743a Multipath Hardware path handling
Removes multipath base devices from consideration by
default, and instead allows the device-mapper device
managed by multipath to be picked up and utilized
instead.

In effect, allowing us to ignore standby paths *and*
leverage multiple concurrent IO paths if so offered
via ALUA.

In reality, anyone who has previously built IPA with
multipath tooling might not have encountered issues
previously because they used Active/Active SAN storage
environments. They would have worked because the IO lock
would have been exchanged between controllers and paths.
However, Active/Passive environments will block passive
paths from access, ultimately preventing new locks from
being established without proper negotiation. Ultimately
requiring multipathing *and* the agent to be smart enough
to know to disqualify underlying paths to backend storage
volumes.

An additional benefit of this is active/active MPIO devices
will, as long as ``multipath`` is present inside the ramdisk,
no longer possibly result in duplicate IO wipes occurring
across numerous devices.

Story: #2010003
Task: #45108
Resolves: rhbz#2076622
Resolves: rhbz#2070519
Change-Id: I0fd6356f036d5ff17510fb838eaf418164cdfc92
2022-05-18 20:26:39 -03:00
Dmitry Tantsur
424e649bed Collect a full lsblk output in the ramdisk logs
The existing lsblk call is very handy for an overview, but there are a lot
more useful pairs to collect. Collect them in a machine-readable format
to be able to use in debugging and further development.

Change-Id: Ib27843524421944ee93de975d275e93276a5597a
2022-04-29 14:24:19 +02:00
Zuul
59c02f48cc Merge "Run partx in verbose mode to simplify debugging" 2022-03-08 12:35:29 +00:00
Dmitry Tantsur
f1ee454a0e Add mount and parted -l to the collected commands
Change-Id: I1c759552220291890704d0002a62ea3f51701691
2022-02-14 13:01:32 +01:00
Dmitry Tantsur
4d16ea413f Run partx in verbose mode to simplify debugging
Otherwise the actual failure cause is not recorded.

Change-Id: If66ee97016ddf0e5c3f40ad9400ff3bc6fdebedc
2022-02-14 12:02:22 +01:00
Dmitry Tantsur
89bc73aa01 Use two more functions from disk_utils
Change-Id: If01c9cd7f95b4495509369786360741b731161db
2021-11-18 13:49:51 +01:00
Riccardo Pittau
a799dcc422 Move rescan device function to general utils
We use basically the same function in two modules in the same way, let's
put that in a common place.

Change-Id: I4016e43f2cb102d4327bafcc8a2f90112a6f944a
2021-11-10 15:34:37 +01:00
Dmitry Tantsur
2cedaa53c2 Always include the oslo_log log file in ramdisk logs
Even if journald is present, there is no guarantee that IPA logs there
(this is the case in container-based ramdisks).

Change-Id: Iceeab0010827728711e19e5b031ccac55fe1efde
2021-10-28 18:32:40 +02:00
Riccardo Pittau
efbbc86f53 Increase version of hacking and pycodestyle
Fix H904 "Delay string interpolations at logging calls" errors

Change-Id: I331808d0132094faf739998a6984440787d3ebf8
2021-07-30 14:34:33 +02:00
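
For readers unfamiliar with H904, the pattern it asks for looks roughly like this. The snippet is a generic illustration using the standard `logging` module; the `report` function and the message text are made up for the example and are not taken from the agent.

```python
import logging

LOG = logging.getLogger(__name__)


def report(device):
    """Log a message about a device using delayed interpolation."""
    # Flagged by H904: the string is built even if DEBUG is disabled.
    #   LOG.debug("Rescanning device %s" % device)
    # Preferred: pass the arguments and let logging interpolate lazily.
    LOG.debug("Rescanning device %s", device)
```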
Arne Wiebalck
5531d5cee7 Force immediate NTP time sync with chronyd at IPA startup
In order to make sure we have the correct time early, e.g.
by the time we create a TLS certificate, this patch proposes
to force an immediate NTP update when using chronyd. While
the previous approach uses the passed NTP server as well, the
update may happen only after chronyd has performed measurements
(which may be too late).

Story: #2009058
Task: #42843

Change-Id: I6edafe8edeb8549f324959e7a1ec175c3049a515
2021-07-16 10:28:31 +02:00
Dmitry Tantsur
2fcf35e56d Reduce logging verbosity when collecting logs
It's not uncommon that some commands fail when collecting logs.
We already log all failures in utils.execute, no need to duplicate
them with a non-fatal ERROR logging.

Change-Id: If151b3a3be979bd2b3ce01030e5d6242ad74eaa3
2021-06-11 16:04:59 +02:00
Zuul
5bac375f73 Merge "Capture the early logging" 2021-04-08 12:22:32 +00:00
Julia Kreger
df418984f0 Capture the early logging
_early_log prints to stdout, which is fine in some cases,
however in other cases it gets lost in the shuffle of process
launch by things like systemd.

Let's try to save everything, and re-log it so it is easy to
debug early issues.

Change-Id: I334a9073d17cccec4c669fae82edc3e388debc5c
2021-04-01 11:16:20 -07:00
Dmitry Tantsur
afcc5d392c Fix incorrect lsblk tag and add a virtual media job
Follow-up to 8dd6589e66d03e45e1d510601da9531a30842cff: PATH is not a
valid lsblk tag, we need to use KNAME with -p flag.

Also add a vmedia job to avoid breakages in the future. It's added
non-voting because we have a deadlock with this change:
https://review.opendev.org/c/openstack/ironic/+/783722

Change-Id: Ifffeac9c1c4d394526d655eaa14c9fe7bd3a1e5e
2021-03-30 12:25:14 +02:00
Julia Kreger
8dd6589e66 Validate vmedia for vmedia usage
Logic based on virtual media devices needs to be
guarded so that it is only used or considered if
the machine actually booted from virtual media.

At the same time, actual devices need to be checked
in order to make sure they align with what we expect
in order to prevent consideration of content which
should not be leveraged.

Change-Id: If2d5c6f4815c9e42798a2d96d59015e1b1dbd457
Story: 2008749
Task: 42108
2021-03-29 13:22:43 -07:00
Dmitry Tantsur
d622d38da6 Refactor: use mounted from ironic-lib
Change-Id: I0b597ddbc71c133abe6c0acfd8f49e3af4e896bb
2021-03-23 17:24:03 +01:00
Riccardo Pittau
0459c61c8d Use try_execute from ironic-lib
Also adapt unit tests

Change-Id: I37d050877daabc9dc0a5821cf20a689652b26f34
2021-02-25 14:46:17 +01:00
Dmitry Tantsur
59cb08fd28 New deploy step for injecting arbitrary files
This change adds a deploy step inject_files that adds a flexible
way to inject files into the instance.

Change-Id: I0e70a2cbc13744195c9493a48662e465ec010dbe
Story: #2008611
Task: #41794
2021-02-16 16:56:52 +01:00
Kaifeng Wang
6072e2d65a Remove lldp-timeout support
The kernel parameter lldp-timeout was deprecated and is removed in this patch.

Change-Id: I98da49e61d9ed3236cc495d1ab351eba0931473b
2021-01-15 16:13:52 +08:00
Dmitry Tantsur
d69f12e0fd Handle situation when a configdrive is already mounted
Glean mounts the configdrive and does not unmount it afterwards.
If a mount point already exists, just use it.

Change-Id: Ia62279afbb9fd9770864942dc40629b69ae8f4ae
2020-12-16 18:17:24 +01:00
Dmitry Tantsur
b9b67fad77 Copy any configuration from the virtual media
For ramdisk TLS (and other potential future enhancements) we need
to be able to inject configuration and certificates into the ramdisk.
Since we cannot pass files through kernel parameters, we need to
put them on the generated ISO or (in the future) config drive.

This change detects IPA configuration and copies it into the ramdisk
early enough for any configuration files to get picked.

Changed /dev/disk/by-label to blkid since the former may not exist
on all ramdisks (e.g. tinyIPA).

Change-Id: Ic64d7842a59795bbf02f194221dedc07c6b56e8c
2020-11-23 16:04:45 +01:00
Dmitry Tantsur
0eee26ea66 Fix confusing logging when running asynchronous commands
We log them as completed when they start executing.

Also fix a problem in remove_large_keys that prevented items
with defaultdict from being logged.

Change-Id: I34a06cc85f55c693416f8c4c9877d55d6affafc9
2020-06-26 15:19:04 +02:00
Riccardo Pittau
557d5603a2 Split and move logic for partition tables
Move and split the logic to create the partition tables when
applying raid configuration.

Change-Id: Ic76dd2067ace02dd02351caca0c7f9b05571e510
2020-05-25 08:11:28 +00:00
Dmitry Tantsur
ff49b04e28 A boot partition on a GPT disk should be considered an EFI partition
DIB builds instance images with EFI partitions that only have the boot
flag, but not esp. According to parted documentation, boot is an alias
for esp on GPT, so accept it as well.

To avoid complexities when parsing parted output, the implementation
is switched to existing utils and ironic-lib functions.

Change-Id: I5f57535e5a89528c38d0879177b59db6c0f5c06e
Story: #2007455
Task: #39423
2020-04-15 18:38:15 +02:00
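
The rule from ff49b04e28 (on GPT, `boot` counts as an alias for `esp`) can be expressed as a small predicate. This is only a sketch under the assumption that the partition flags arrive as a comma-separated string and the table type as a lowercase label; `is_efi_partition` is a hypothetical name, and the actual change relies on existing utils and ironic-lib functions instead.

```python
def is_efi_partition(flags, table_type):
    """Decide whether a partition should be treated as an EFI partition.

    flags: comma-separated flag string, for example "boot, esp".
    table_type: partition table label, for example "gpt" or "msdos".
    """
    flag_set = {flag.strip() for flag in flags.split(",") if flag.strip()}
    if "esp" in flag_set:
        return True
    # On GPT, the boot flag is treated as an alias for esp.
    return table_type == "gpt" and "boot" in flag_set
```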
Raphael Glon
9343348106 Software RAID: Add UEFI support
The proposed changes concern two steps:

First, when creating the RAID configuration, have a GPT partition
table type (this is not necessary, but more natural with UEFI).
Also, leave some space, either for the EFI partitions or the BIOS
boot partitions, outside the Software RAID.

Secondly, when installing the bootloader, make sure the correct
boot partitions are created or relocated.

Change-Id: Icf0a76b0de89e7a8494363ec91b2f1afda4faa3b
Story: #2006379
Task: #37635
2020-04-02 18:02:19 +02:00
Riccardo Pittau
a332a19a57 Bump hacking to 3.0.0
Change-Id: I1032ea6a2e9d79aeaecb1458c319cbeb15ac1fff
2020-03-30 12:55:46 +02:00
Julia Kreger
cee4bfc4bc Add NTP time sync
Attempt to sync the clock and save it to the hardware clock.

This feature supports use of chrony or ntpdate.

Sem-Ver: feature
Change-Id: I178d7614429d582e742d9cba6d0fa3ae099775e3
Story: 1619054
Task: 11591
2020-03-07 09:16:19 -08:00
Iury Gregory Melo Ferreira
b6210be196 Avoid grub2-install when on UEFI boot mode
This patch changes the workflow for whole disk images when using uefi.
If we can identify the bootloader and it's valid, we can update using
efibootmgr, since grub2-install has problems, especially in secure boot
mode.
We also updated the regex to search for the uefi partition on the disk,
since in some cases the parted command output can be without the FS
for the partition with esp Flag.

Change-Id: I7167e71e5d2352a045565289b200e5530d0ba11d
Story: #2006847
Task: #37435
2020-01-16 11:23:41 +01:00
Dmitry Tantsur
d40132ad71 Omit configdrive and system_logs from logging
Since they are large and base64-encoded, they bloat ramdisk logs.

Change-Id: I2e995ef356075be2a7f5b0a1906d02f90fe98a06
2020-01-13 11:53:12 +01:00
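
The effect of d40132ad71 can be approximated by masking the large values before a command's parameters are written to the log. The helper below is a hypothetical sketch; its name, the placeholder string, and the flat dict layout are assumptions, and only the `configdrive` and `system_logs` key names come from the commit title.

```python
def redact_large_values(params, large_keys=("configdrive", "system_logs")):
    """Return a copy of params that is safe to write to the log."""
    return {
        # Replace bulky base64 payloads with a short placeholder.
        key: "<...>" if key in large_keys else value
        for key, value in params.items()
    }
```

Returning a copy keeps the original parameters intact for the code that actually consumes the config drive.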
Zuul
12b62d6c3a Merge "Collect lsblk and /proc/mdstat with ramdisk logs" 2020-01-10 09:22:29 +00:00
Iury Gregory Melo Ferreira
966356e58c Search for efi partition
This patch adds a function that is responsible for identifying
the EFI partition on a given device. This is necessary in the Software
RAID scenario and when installing the bootloader.

Change-Id: I5f326db2d37b2a15090ec84e477e63f7d92e7447
Co-Authored-By: Raphael Glon <raphael.glon@corp.ovh.com>
2019-12-04 20:09:59 +01:00
Riccardo Pittau
ca7a46b113 Stop using six library
Since we've dropped support for Python 2.7, it's time to look at
the bright future that Python 3.x will bring and stop forcing
compatibility with older versions.
This patch removes the six library from requirements, not
looking back.

Change-Id: I4795417aa649be75ba7162a8cf30eacbb88c7b5e
2019-11-29 10:18:14 +01:00
Dmitry Tantsur
11976c9d2b Collect lsblk and /proc/mdstat with ramdisk logs
This should improve debuggability of partitioning problems.

Change-Id: I3c7ae3f2831c9900a3f0d24daec6dd6b8bea6a60
2019-10-14 15:28:08 +02:00
Raphael Glon
c546749423 Fixes get_holder disks with nvme drives
Change-Id: I195ffdeeb3c13bdec5fc1735b82efa53c8d9d3de
2019-08-13 10:37:18 +02:00
Dmitry Tantsur
94048fe97e Stop logging lshw output, collect it with other logs instead
The lshw output is huge even on virtual machines, and it pollutes
the debug logging. This change silences it. Instead, the lshw output
is collected as part of the ramdisk logs.

Depends-On: https://review.opendev.org/#/c/665635/
Change-Id: I6a3015b2d8d09f6f48b5cbd39dc84bd75b72f909
2019-06-17 14:00:26 +02:00
Kaifeng Wang
4cb2ac4ae4 Fix docs job failure due to malformatted docstring
Change-Id: Ic3532e51481fd07e2f816aeacb07ded2d56791ee
2019-04-09 10:24:17 +08:00