Image caching was never fully supported in Ironic or IPA; this is vestigial
code left over from a partial implementation.
Even if we implemented it today, we'd likely use a completely different
methodology.
Change-Id: Id4ab7b3c4f106b209585dbd090cdcb229b1daa73
IPA now includes the NUMA node ID when collecting
information about PCI devices.
Closes-bug: #1622940
Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Change-Id: I70b0cb3eff66d67bb8168982acbbf335de0599cd
Used the helper function `parse_device_tags` from ironic_lib instead of
the shlex module, since the two provide identical functionality.
Updated mock_execute.side_effect for lsblk compatibility in
utils.execute.
Closes-Bug: #2037572
Change-Id: I6600e054f9644c67ab003f0e0f6c380b5c217223
HTTP is a fun protocol.
Size is basically optional, and clients implicitly trust that the server
and socket have transferred all the bytes. Which *really* means you
should always checksum.
But... previously we didn't checksum as part of retrying.
So if anything happened with python-requests, or lower level
library code or the system itself causing bytes to be lost off the
buffer, creating an incomplete transfer situation, then we wouldn't
know until the checksum.
So now, we checksum and re-trigger the download if there is a
failure of the checksum.
This involved a minor shift in the download logic, and required a
minor fix to an image checksum test, as it would otherwise loop for
90 seconds as well.
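The checksum-then-retry flow described above could be sketched roughly as follows. This is an illustration only, not the agent's actual code: `fetch`, `ChecksumMismatch` and `download_with_checksum_retry` are hypothetical names, and the real download logic streams to disk rather than buffering in memory.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Hypothetical error raised when every attempt had a bad digest."""


def download_with_checksum_retry(fetch, expected_hexdigest,
                                 algo="sha256", attempts=3):
    """Call fetch() until the digest of the returned bytes matches.

    fetch is a hypothetical callable returning an iterable of byte
    chunks; each call represents one complete download attempt.
    """
    for _ in range(attempts):
        digest = hashlib.new(algo)
        chunks = []
        for chunk in fetch():
            digest.update(chunk)
            chunks.append(chunk)
        if digest.hexdigest() == expected_hexdigest:
            return b"".join(chunks)
        # A short read looks like a clean end of stream at this layer,
        # so a digest mismatch is treated as a retryable failure.
    raise ChecksumMismatch(
        "checksum mismatch after %d attempts" % attempts)
```

The key point is that the checksum sits inside the retry loop, so a silently truncated transfer triggers a fresh download instead of surfacing only as a final failure.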
Closes-Bug: 2038934
Change-Id: I543a60555a2621b49dd7b6564bd0654a46db2e9a
Initial code patches for service steps have merged in
ironic, and it is now time to add support into the
agent which allows service steps to be raised to
the service.
Updates the default hardware manager version to 1.2,
which has *rarely* been incremented due to oversight.
Change-Id: Iabd2c6c551389ec3c24e94b71245b1250345f7a7
Changes the default lookup timeout to 600 seconds, which reduces
the risk of lookup failing: generating an agent token means a
write operation to the backing database is performed upon
lookup.
Overall, this is fairly harmless since by default ramdisks
restart the agent if they were not able to successfully
start.
Change-Id: I35c64c0b4f9b3b607df1bc0c4c2a852aa3595cbd
When an underlying block device (or driver) only supports 4KB IO,
this can cause some issues with aspects like using an ISO9660 filesystem
which can only support a maximum of 2KB IO.
The agent will now attempt to mount the filesystem *before* deleting the
supplied file, and should that fail it will mount the configuration drive
file from the ramdisk utilizing a loopback, and then extract the contents
of the ramdisk into a newly created VFAT filesystem which supports 4KB
block IO.
Closes-Bug: #2028002
Change-Id: I336acb8e8eb5a02dde2f5e24c258e23797d200ee
If the node is locked, a lookup cannot be performed when an agent
token needs to be generated, which tends to error like this:
ironic_python_agent.ironic_api_client [-] Failed looking up node
with addresses '00:6f:bb:34:b3:4d,00:6f:bb:34:b3:4b' at
https://172.22.0.2:6385. Error 409: Node
c25e451b-d2fb-4168-b690-f15bc8365520 is locked by host 172.22.0.2,
please retry after the current operation is completed..
Check if inspection has completed.
The problem is that if we keep pounding on the door, we can actually
worsen the situation, and previously we would just let tenacity
retry.
We will now hold for 30 seconds before proceeding, so we have
hopefully allowed the operation to complete.
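A minimal sketch of the "hold before retrying" idea, assuming the lock condition surfaces as an exception. `NodeLocked`, `lookup_with_hold` and `do_lookup` are hypothetical names used for illustration; only the 30 second hold comes from this message.

```python
import time

LOCK_HOLD_SECONDS = 30  # hold time stated in this commit message


class NodeLocked(Exception):
    """Hypothetical stand-in for an HTTP 409 'node is locked' reply."""


def lookup_with_hold(do_lookup, max_attempts=5, sleep=time.sleep):
    """Retry do_lookup(), pausing whenever the node is locked.

    do_lookup is a hypothetical callable that performs one lookup
    and raises NodeLocked on a conflict.
    """
    for attempt in range(max_attempts):
        try:
            return do_lookup()
        except NodeLocked:
            if attempt == max_attempts - 1:
                raise
            # Give the in-flight operation time to finish rather than
            # immediately adding more load to the API.
            sleep(LOCK_HOLD_SECONDS)
```

Passing `sleep` as a parameter is only a convenience so the pause can be replaced in tests.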
Also fixes the error logging to help preserve human sanity.
Change-Id: I97d3e27e2adb731794a7746737d3788c6e7977a0
Rebuilding an instance with a RAIDed ESP will fail due to sgdisk
running against a non-clean disk and bailing out. Check if there
is a RAIDed ESP already and skip creation if it exists.
Change-Id: I13617ae77515a9d34bc4bb3caf9fae73d5e4e578
When troubleshooting download issues, which may present
as checksum validation failures, it is difficult to understand
if the *entire* file was downloaded due to the way HTTP works.
That is, a download may start with a successful result code,
and the content is streamed out until the socket is closed.
But with HTTP there is no way to know if that socket closed
prematurely, and the original server size is *also* an optional
field, so just log the size we got so we don't drive the
humans [more-]insane.
Also now logs the (optional) content-length field if
supplied by the server.
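As a rough illustration of the logging described above, the sketch below counts the bytes actually received and reports them next to the advertised Content-Length when one exists. `stream_and_report` is a hypothetical helper, and `headers` is assumed to be a plain dict keyed exactly as `Content-Length`.

```python
import logging

LOG = logging.getLogger(__name__)


def stream_and_report(chunks, headers):
    """Consume chunks and log received size and advertised size.

    Returns (bytes_received, advertised_length_or_None).
    """
    received = 0
    for chunk in chunks:
        received += len(chunk)
    raw = headers.get("Content-Length")
    # The header is optional, so its absence is not an error.
    advertised = int(raw) if raw is not None and raw.isdigit() else None
    LOG.info("Image transferred: %s bytes received, Content-Length: %s",
             received,
             advertised if advertised is not None else "not supplied")
    return received, advertised
```

A mismatch between the two numbers points at a truncated transfer, while a missing header simply means there is nothing to compare against.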
Change-Id: Id71b167f4e330d54b9afddf95f1a2ef9e40398bf
Bandit 1.7.5 released with a timeout check for all requests and
urllib calls.
Fixed those.
In the process, this exposed a bandit B310 issue, which was already
covered by the code, so it is now explicitly marked as such.
Also enables bandit checks to be voting for CI.
Change-Id: If0e87790191f5f3648366d571e1d85dd7393a548
Also fixes my use of set_override, as it is not on the actual
config object. You'd think I'd remember that, since I've done
that before...
Change-Id: I4b578c4319354001cbbd3b3856af96b30fd25555
This was a significant breaking change that was landed despite explicit
disagreement by some community members (myself included). It has already
resulted in an accidental Ironic CI breakage, has broken Bifrost and has
a potential of breaking Metal3. In case of Metal3, MD5 support is a part
of its public API.
While MD5 is a potential security hazard, I don't see the need to hurry
this change without giving the community time to prepare. This change
reverts the new option md5_enabled to True.
Change-Id: I32b291ea162e8eb22429712c15cb5b225a6daafd
The CentOS Stream SUM files use the format:
# FILENAME: <size> bytes
ALGORITHM (FILENAME) = CHECKSUM
Compared to the more common format:
CHECKSUM *FILE_A
CHECKSUM FILE_B
Use regular expressions to check for the filename both
in the middle with parentheses and at the end.
Similarly, look for valid checksums at the beginning or
end of the line. Also look for known checksum patterns in
case the file only contains the checksum itself.
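The matching described above might look something like the following sketch. The regular expressions and the `find_checksum` helper are illustrative assumptions, not the patterns used in the agent; digest lengths are limited to the common md5, sha1, sha256 and sha512 hex sizes.

```python
import re

# Hex digest lengths for sha512, sha256, sha1 and md5.
_HEX = (r"(?:[0-9a-fA-F]{128}|[0-9a-fA-F]{64}"
        r"|[0-9a-fA-F]{40}|[0-9a-fA-F]{32})")
# ALGORITHM (FILENAME) = CHECKSUM
_BSD = re.compile(r"^\w+\s*\((?P<name>.+)\)\s*=\s*(?P<sum>%s)$" % _HEX)
# CHECKSUM *FILENAME  or  CHECKSUM FILENAME
_GNU = re.compile(r"^(?P<sum>%s)\s+\*?(?P<name>.+)$" % _HEX)
# A file holding nothing but the checksum.
_BARE = re.compile(r"^%s$" % _HEX)


def find_checksum(text, filename):
    """Return the checksum recorded for filename, or None."""
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines
             if line and not line.startswith("#")]
    for line in lines:
        match = _BSD.match(line) or _GNU.match(line)
        if match and match.group("name") == filename:
            return match.group("sum")
    if len(lines) == 1 and _BARE.match(lines[0]):
        return lines[0]
    return None
```

Comment lines such as the `# FILENAME: <size> bytes` header are skipped before matching, so they cannot be mistaken for a checksum entry.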
Change-Id: I9e49c1a6c66e51a7b884485f0bcaf7f1802bda33
The checksum validation logic, which was updated early on in the
whole process of deprecating md5, didn't account for a URL *or* a
longer checksum (i.e. sha256/sha512), both of which were settled on
while the overall approach was still being decided.
Fixes the logic, and adds additional tests.
Change-Id: Ic4053776e131fc02ace295a1e69e9f9faab47f42
Binary LLDP data is bloating inventory causing us to disable its collection
by default. For other similar low-level information, such as PCI devices
or DMI data, we already use inspection collectors instead. Now that the
inventory format is shared with out-of-band inspection, having LLDP
there makes even less sense.
This change adds a new collector ``lldp`` to replace the now-deprecated
inventory field.
Change-Id: I56be06a7d1db28407e1128c198c12bea0809d3a3
MD5 image checksums have long been superseded by the
``os_hash_algo`` and ``os_hash_value`` fields in the
properties of an image.
In the process of doing this, we determined that checksum via
URL usage was non-trivial and determined that an appropriate
path was to allow the checksum type to be determined as needed.
Change-Id: I26ba8f8c37d663096f558e83028ff463d31bd4e6
The tl;dr is that UEFI NVRAM is encoded
in UTF-16, and when we run the efibootmgr command,
we can get unicode characters back.
Except we previously were forcing everything to be
treated as UTF-8 due to the way oslo.concurrency's
processutils module works.
This could be observed with UTF character 0x00FF
which raises up a nice exception when we try to
decode it.
Anyhow! While fixing the handling of this, we discovered
we could get what is basically cruft out of the NVRAM,
by getting what was most likely a truncated string
out of our own test VMs. As such, we also need to
permit decoding to be tolerant of failures.
This could be binary data or as simple as flipped
bits which get interpreted as invalid characters.
As such, we have introduced such data into one of our
tests involving UEFI record de-duplication.
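A small sketch of tolerant decoding, assuming the command output is captured as raw bytes. `decode_nvram_text` is a hypothetical helper name; the point is only that `errors="ignore"` drops undecodable byte sequences instead of raising.

```python
def decode_nvram_text(raw):
    """Decode UTF-16 bytes, silently dropping undecodable sequences.

    raw is assumed to be the byte output of a command such as
    efibootmgr, captured without any implicit UTF-8 decoding.
    """
    return raw.decode("utf-16", errors="ignore")


# Character 0x00FF survives, and a stray trailing byte (as in a
# truncated record) is dropped rather than failing the whole decode.
sample = "Boot0001* ubuntu\u00ff".encode("utf-16") + b"\x41"
text = decode_nvram_text(sample)
```

The trade-off is that damaged records lose characters quietly, which is acceptable here because unrelated garbage should not abort processing of the valid entries.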
Closes-Bug: 2015602
Change-Id: I006535bf124379ed65443c7b283bc99ecc95568b
Add "update_nvidia_nic_firmware_image" and "update_nvidia_nic_firmware_settings"
clean steps to MellanoxDeviceHardwareManager.
With these two steps, ironic-python-agent can update the firmware
image and firmware settings of NVIDIA NICs through the manual
cleaning command.
The clean steps require the mstflint package to be installed on the image.
The "update_nvidia_nic_firmware_image" clean step requires an
"images" parameter to be passed to the clean command.
The "images" parameter is a JSON blob containing
a list of images, where each image is a map of:
* url: to firmware image (file://, http://)
* checksum: checksum of the provided image
* checksumType: md5/sha512/sha256
* componentFlavor: PSID of the nic
* version: version of the FW
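For illustration, an "images" value with the keys listed above could be built like this. Every value is a placeholder or a hypothetical example (including the URL), and `DOCUMENTED_KEYS` and `missing_keys` are helper names invented for this sketch, not part of the hardware manager.

```python
import json

# Keys taken from the list above; all values are placeholders.
images = [
    {
        "url": "http://example.com/firmware/fw-image.bin",
        "checksum": "<hex digest of the firmware image>",
        "checksumType": "sha256",
        "componentFlavor": "<PSID of the NIC>",
        "version": "<firmware version>",
    },
]

DOCUMENTED_KEYS = {"url", "checksum", "checksumType",
                   "componentFlavor", "version"}


def missing_keys(image):
    """Return the documented keys absent from one image entry."""
    return sorted(DOCUMENTED_KEYS - image.keys())


# Serialized form, as it would be passed in the clean step arguments.
images_json = json.dumps(images, indent=2)
```

A check like `missing_keys` is just a convenient way to catch a misspelled key before a cleaning run is started.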
The "update_nvidia_nic_firmware_settings" clean step requires a
"settings" parameter to be passed to the clean command.
The "settings" parameter is a JSON blob containing
a list of settings, where each entry is a map of:
* deviceID: device ID
* globalConfig: global config
* function0Config: function 0 config
* function1Config: function 1 config
Change-Id: Icfaffd7c58c3c73c3fa28cfc2a6c954d2c93c16e
Story: 2010228
Task: 46016
The unit tests for create_configuration give different results when
run on a BIOS or UEFI booted machine because they get the
partition table type value based on the utils function
get_node_boot_mode.
Let's mock the boot_mode as we do in other tests to get an
independent result.
Change-Id: Ic0e7daea7ec4ce0806cd126c27166f84690c5d9e
The current way of prioritizing ID/DM_SERIAL_SHORT or ID/DM_SERIAL works
in most cases but the udev values seem to be unreliable.
Based on experience, it looks like lsblk might be a better
source of truth than udev with regard to serial number
information. This commit makes lsblk the default provider
of block device serial number information.
Story: 2010263
Task: 46161
Change-Id: I16039b46676f1a61b32ee7ca7e6d526e65829113
When IPA runs _install_grub2, IPA tries to bind mount /dev, /proc and /run
to <temporary directory where the root partition is mounted>/{dev,proc,run}.
However, that bind mount fails because those mount point paths do not
exist under the temporary directory.
To fix this failure, this patch adds a mkdir command before the bind mount.
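The ordering can be sketched as below: create each mount point first, then issue the bind mount. `prepare_bind_mounts` is a hypothetical helper that only builds the mount commands instead of running them, and it uses `os.makedirs` in place of a literal mkdir command.

```python
import os

BIND_TARGETS = ("dev", "proc", "run")


def prepare_bind_mounts(root_mount):
    """Create missing mount points and return the bind mount commands.

    root_mount is the directory where the root partition is mounted.
    """
    commands = []
    for name in BIND_TARGETS:
        target = os.path.join(root_mount, name)
        # The mount point must exist before a bind mount can succeed.
        os.makedirs(target, exist_ok=True)
        commands.append(["mount", "-o", "bind", "/" + name, target])
    return commands
```

Using `exist_ok=True` keeps the step idempotent for images that already ship these directories.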
Story: 2010292
Task: 46273
Change-Id: I434ce1bf1863ee0f11c4d09918d6d2d8dc065c02
Extend the ability to skip disks to RAID devices
This allows users to specify the volume name of
a logical device in the skip list, which is then not cleaned
or created again during the create/apply configuration phase.
The volume name can be specified in the target raid config, provided
the change https://review.opendev.org/c/openstack/ironic-python-agent/+/853182/
passes.
Story: 2010233
Change-Id: Ib9290a97519bc48e585e1bafb0b60cc14e621e0f
Use the 'volume_name' field from 'target_raid_config' to create logical
disks if it is present.
Do not allow two logical disks to have the same volume name.
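A minimal sketch of the uniqueness check, assuming the usual `logical_disks` list inside `target_raid_config`. `find_duplicate_volume_names` is a hypothetical helper, not the validation function used by the agent.

```python
from collections import Counter


def find_duplicate_volume_names(target_raid_config):
    """Return volume names used by more than one logical disk."""
    names = [disk["volume_name"]
             for disk in target_raid_config.get("logical_disks", [])
             if disk.get("volume_name")]
    return sorted(name for name, count in Counter(names).items()
                  if count > 1)


config = {
    "logical_disks": [
        {"size_gb": 100, "raid_level": "1", "volume_name": "root"},
        {"size_gb": "MAX", "raid_level": "0", "volume_name": "root"},
        {"size_gb": 10, "raid_level": "1"},
    ]
}
duplicates = find_duplicate_volume_names(config)
```

Logical disks without a volume name are ignored, since the field is optional.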
Change-Id: If3e4e9f8698ec3e0cb49717f8ed2087d2ba03f2c
In the event a device name is set to contain a raid device path,
it is possible for the Name and Events field values of mdadm's
detailed output to contain text which inadvertently gets captured and
mapped as component data for the "holder" devices of the RAID set.
This would cause invalid values to get passed to UEFI methods
which would cause a deployment to fail under these circumstances.
We now ignore the Name and Events fields in mdadm output.
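The filtering can be illustrated with the sketch below, which skips the Name and Events fields before looking for device paths. The parsing approach, the `component_devices` helper and the abbreviated sample output are assumptions for illustration, not the agent's actual implementation.

```python
import re

_IGNORED_FIELDS = ("Name", "Events")
_DEVICE = re.compile(r"(/dev/\S+)\s*$")


def component_devices(detail_output):
    """Return device paths listed in mdadm --detail style output."""
    devices = []
    # The first line names the array itself, so it is skipped.
    for line in detail_output.splitlines()[1:]:
        field = line.split(":", 1)[0].strip()
        if field in _IGNORED_FIELDS:
            # These fields may contain arbitrary text, including
            # something that looks like a device path.
            continue
        match = _DEVICE.search(line)
        if match:
            devices.append(match.group(1))
    return devices


SAMPLE = """/dev/md0:
           Version : 1.2
        Raid Level : raid1
              Name : /dev/md0
            Events : 17
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
"""
found = component_devices(SAMPLE)
```

Without the field filter, the `Name : /dev/md0` line in this sample would be picked up as if it were a component device.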
Change-Id: If721dfe1caa5915326482969e55fbf4697538231