Fixes an issue where we could call evaluate_hardware_support multiple
times each run. Now, instead, we cache the values and use the cache
where needed.
Adds unit test coverage for get_managers and the new method.
Fixes issue where we were caching hardware managers between unit tests.
Also includes fixes for codespell CI:
- skip build files in repo
- fix spelling issues introduced to repo
Closes-bug: 2066308
Change-Id: Iebc5b6d2440bfc9f23daa322493379bbe69e84d0
If this seems like deja vu, that is because it is. We had this
very same issue with the original CoreOS ramdisk. Since we don't
control the whole OS of the ramdisk, it only made sense to teach
the agent to umount the folder.
The folder is referenced already, and the agent does have safeguards
in place, but unfortunately this issue led to a rebuild breaking where
cloud-init, glean, and the agent were all trying do the right thing
as they thought, and there were just multiple /mnt/config folders
present in the OS. These are separate issues we also need to try and
remedy.
What happens is when the device is locked via a mount, the partition
table is never updated to the running OS as the mount creates a lock.
So the agent ends up thinking, in the case of a rebuild, that everything
including creating a configuration drive on that device has been
successful, but when you reboot, there is no partition table entry
for the new partition as the change was not successfully written.
This state prevented the workload from rebooting properly.
This change eliminates that possibility moving forward by attempting
to ensure that the cloud configuration folder is no longer mounted.
Change-Id: I4399dd0934361003cca9ff95a7e3e3ae9bba3dab
This commit introduces the following changes:
- New optional `all_serial_and_wwn` argument for the block device
listing logic. The new argument makes it possible to
collect wwn and serial number information from both
lsblk and udevadm at the same time
- Both the short and the long serials are collected
from udeavadm without prioritization when the new argument
has teh value True
- The new feature is automatically enabled during block device listing
as part of the root disk selecetion
- New options are added to the lsblk command when used in the block
device discovery process, previously lsblk was not looking
for wwn numbers and now it does
Closes-Bug: #2061437
Change-Id: I438a686d948cd929311e2f418bb02fb771805148
Signed-off-by: Adam Rozman <adam.rozman@est.tech>
This fixes several spelling issues identified by codepsell. In some
cases, I may have manually modified a line to make the output more clear
or to correct grammatical issues which were obvious in the codespell
output.
Later changes in this chain will provide the codespell config used to
generate this, as well as adding this commit's SHA, once landed, to a
.git-blame-ignore-revs file to ensure it will not pollute git historys
for modern clients.
Related-Bug: 2047654
Change-Id: I240cf8484865c9b748ceb51f3c7b9fd973cb5ada
This reverts commit 33f01fa3c2f32f447ed36f00fea68321c3991c2e.
There are a few issues with the patch - see my comments there.
The most pressing and the reasons to revert are:
1) It breaks deployments when the vmedia is present but does not
have a network_data.json (the case for Metal3).
2) It assumes the presence of Glean which may not be the case.
Neither Julia nor myself have time to thoroughly fix the issue,
leaving a revert as the only option to unblock Metal3.
Change-Id: I3f1a18a4910308699ca8f88d8e814c5efa78baee
Closes-Bug: #2045255
When performing DHCP-less deployments, the agent can start and
discover more than one configuration drive present on a host.
For example, a host was previously deployed using Ironic, and
is now being re-deployed again.
If Glean was present in the ramdisk, the glean-early.sh would end
mounting the folder based upon label.
If cloud-init, somehow is still in the ramdisk, the other folder
could somehow get mounted.
This patch, which is intended to be backportable, causes the agent
to unmount any configuration drive folders, mount the most likely
candidate based upon device type, partition, and overall state of
the machine, and then utilize that configuration, if present,
to re-configure and reload networking.
Thus allowing dhcp-less re-deployments to be fixed without
forcing any breaking changes.
It should also be noted that this fix was generated in concert
with an additional tempest test case, because this overall failure
case needed to be reproduced to ensure we had a workable non-breaking
path forward.
Closes-Bug: 2032377
Change-Id: I9a3b3dbb9ca98771ce2decf893eba7a4c1890eee
The current way of prioritizing ID/DM_SERIAL_SHORT or ID/DM_SERIAL works
in most cases but the udev values seem to be unreliable.
Based on experience it looks like lsblk might be a better
source of truth than udev in regerards to serial number
information. This commit makes lsblk the default provider
of block device serial number information.
Story: 2010263
Task: 46161
Change-Id: I16039b46676f1a61b32ee7ca7e6d526e65829113
Extend the ability to skip disks to RAID devices
This allows users to specify the volume name of
a logical device in the skip list which is then not cleaned
or created again during the create/apply configuration phase
The volume name can be specified in target raid config provided
the change https://review.opendev.org/c/openstack/ironic-python-agent/+/853182/
passes
Story: 2010233
Change-Id: Ib9290a97519bc48e585e1bafb0b60cc14e621e0f
Use pure json instead of jsonutils.
Borrow encode function from oslo.serialization to be used in the
utils module.
Change-Id: Ied9a2259a4329a86b4f0853bd1fb187563c0a036
Removes multipath base devices from consideration by
default, and instead allows the device-mapper device
managed by multipath to be picked up and utilized
instead.
In effect, allowing us to ignore standby paths *and*
leverage multiple concurrent IO paths if so offered
via ALUA.
In reality, anyone who has previously built IPA with
multipath tooling might not have encountered issues
previously because they used Active/Active SAN storage
environments. They would have worked because the IO lock
would have been exchanged between controllers and paths.
However, Active/Passive environments will block passive
paths from access, ultimately preventing new locks from
being established without proper negotiation. Ultimately
requiring multipathing *and* the agent to be smart enough
to know to disqualify underlying paths to backend storage
volumes.
An additional benefit of this is active/active MPIO devices
will, as long as ``multipath`` is present inside the ramdisk,
no longer possibly result in duplicate IO wipes occuring
accross numerous devices.
Story: #2010003
Task: #45108
Resolves: rhbz#2076622
Resolves: rhbz#2070519
Change-Id: I0fd6356f036d5ff17510fb838eaf418164cdfc92
The existing lsblk call is very handy for an overview, but there a lot
more useful pairs to collect. Collect them in a machine-readable format
to be able to use in debugging and further development.
Change-Id: Ib27843524421944ee93de975d275e93276a5597a
We use basically the same function in two modules in the same way, let's
put that in a common place.
Change-Id: I4016e43f2cb102d4327bafcc8a2f90112a6f944a
Even if journald is present, there is no guarantee that IPA logs there
(this is the case in container-based ramdisks).
Change-Id: Iceeab0010827728711e19e5b031ccac55fe1efde
In order to make sure we have the correct time early, e.g.
by the time we create a TLS certificate, this patch proposes
to force an immediate NTP update when using chronyd. While
the previous approach uses the passed NTP server as well, the
update may happen only after chronyd has performed measurements
(which may be too late).
Story: #2009058
Task: #42843
Change-Id: I6edafe8edeb8549f324959e7a1ec175c3049a515
It's not uncommon that some commands fail when collecting logs.
We already log all failures in utils.execute, no need to duplicate
them with a non-fatal ERROR logging.
Change-Id: If151b3a3be979bd2b3ce01030e5d6242ad74eaa3
_early_log prints to stdout, which is fine in some cases,
however in other cases it gets lost in the shuffle of process
launch by things like systemd.
Lets try to save everything, and re-log it so it is easy to
debug early issues.
Change-Id: I334a9073d17cccec4c669fae82edc3e388debc5c
Follow-up to 8dd6589e66d03e45e1d510601da9531a30842cff: PATH is not a
valid lsblk tag, we need to use KNAME with -p flag.
Also add a vmedia job to avoid breakages in the future. It's added
non-voting because we have a deadlock with this change:
https://review.opendev.org/c/openstack/ironic/+/783722
Change-Id: Ifffeac9c1c4d394526d655eaa14c9fe7bd3a1e5e
Virtual media devices based logic needs to be
guarded from being used or considered based upon
if the machine actually booted from virtual media,
or not.
At the same time, actual devices need to be checked
in order to make sure they align with what we expect
in order to prevent consideration of content which
should not be leveraged.
Change-Id: If2d5c6f4815c9e42798a2d96d59015e1b1dbd457
Story: 2008749
Task: 42108
This change adds a deploy step inject_files that adds a flexible
way to inject files into the instance.
Change-Id: I0e70a2cbc13744195c9493a48662e465ec010dbe
Story: #2008611
Task: #41794
Glean mounts the configdrive and does not unmount it afterwards.
If a mount point already exists, just use it.
Change-Id: Ia62279afbb9fd9770864942dc40629b69ae8f4ae
For ramdisk TLS (and other potential future enhancements) we need
to be able to inject configuration and certificates into the ramdisk.
Since we cannot pass files through kernel parameters, we need to
put them on the generated ISO or (in the future) config drive.
This change detects IPA configuration and copies it into the ramdisk
early enough for any configuration files to get picked.
Changed /dev/disk/by-label to blkid since the former may not exist
on all ramdisks (e.g. tinyIPA).
Change-Id: Ic64d7842a59795bbf02f194221dedc07c6b56e8c
We log them as completed when they start executing.
Also fix a problem in remove_large_keys that prevented items
with defaultdict from being logged.
Change-Id: I34a06cc85f55c693416f8c4c9877d55d6affafc9
DIB builds instance images with EFI partitions that only have the boot
flag, but not esp. According to parted documentation, boot is an alias
for esp on GPT, so accept it as well.
To avoid complexities when parsing parted output, the implementation
is switched to existing utils and ironic-lib functions.
Change-Id: I5f57535e5a89528c38d0879177b59db6c0f5c06e
Story: #2007455
Task: #39423
The proposed changes concern two steps:
First, when creating the RAID configuration, have a GPT partition
table type (this is not necessary, but more natural with UEFI).
Also, leave some space, either for the EFI partitions or the BIOS
boot partitions, outside the Software RAID.
Secondly, when installing the bootloader, make sure the correct
boot partitions are created or relocated.
Change-Id: Icf0a76b0de89e7a8494363ec91b2f1afda4faa3b
Story: #2006379
Task: #37635
Attempt to sync the clock and save it to the hardware clock.
This feature supports use of chrony or ntpdate.
Sem-Ver: feature
Change-Id: I178d7614429d582e742d9cba6d0fa3ae099775e3
Story: 1619054
Task: 11591
This patch changes the workflow for whole disk images when using uefi.
If we can identify the bootloader and it's valid we can update using
efibootmgr since grub2-install have problems specially on secure boot
mode.
We also updated the regex to search for the uefi partition on the disk,
since in some cases the parted command output can be without the FS
for the partition with esp Flag.
Change-Id: I7167e71e5d2352a045565289b200e5530d0ba11d
Story: #2006847
Task: #37435
This patch adds a function that will be responsible to identify
the efi partition on a give device, this is necessary on the Software
Raid scenario and when installing bootloader.
Change-Id: I5f326db2d37b2a15090ec84e477e63f7d92e7447
Co-Authored-By: Raphael Glon <raphael.glon@corp.ovh.com>
Since we've dropped support for Python 2.7, it's time to look at
the bright future that Python 3.x will bring and stop forcing
compatibility with older versions.
This patch removes the six library from requirements, not
looking back.
Change-Id: I4795417aa649be75ba7162a8cf30eacbb88c7b5e
The lshw output is huge even on virtual machines, and it pollutes
the debug logging. This change silences it. Instead, the lshw output
is collected as part of the ramdisk logs.
Depends-On: https://review.opendev.org/#/c/665635/
Change-Id: I6a3015b2d8d09f6f48b5cbd39dc84bd75b72f909