Follow-up to the service steps addition change, adding a deploy step
alias for the Nvidia Mellanox network device firmware update clean
steps. This allows deploy-time firmware updates to be codified as
part of a deployment with custom steps.
Change-Id: I9d80447dee7cfde4d3f8d81d9d39e738916b7824
The initial code patches for service steps have merged in
ironic, and it is now time to add support to the
agent which allows service steps to be raised to
the service.
Updates the default hardware manager version to 1.2,
which has *rarely* been incremented, largely due to oversight.
Change-Id: Iabd2c6c551389ec3c24e94b71245b1250345f7a7
When an underlying block device (or driver) only supports 4KB IO,
this can cause issues with aspects like an ISO9660 filesystem,
which can only support a maximum of 2KB IO.
The agent will now attempt to mount the filesystem *before* deleting the
supplied file, and should that fail it will mount the configuration drive
file from the ramdisk using a loopback device, and then extract its
contents into a newly created VFAT filesystem, which supports 4KB
block IO.
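A minimal sketch of the fallback flow, assuming the config drive image file
and target partition paths are supplied by the caller (the helper name and
command invocations are illustrative, not the exact merged code):

    import shutil
    import subprocess
    import tempfile

    def rebuild_configdrive_as_vfat(configdrive_file, target_partition):
        src = tempfile.mkdtemp()
        dst = tempfile.mkdtemp()
        # Mount the ISO9660 config drive image via a loop device.
        subprocess.run(['mount', '-o', 'loop', configdrive_file, src],
                       check=True)
        try:
            # Create a VFAT filesystem, which copes with 4KB block IO.
            subprocess.run(['mkfs.vfat', '-n', 'config-2', target_partition],
                           check=True)
            subprocess.run(['mount', target_partition, dst], check=True)
            try:
                shutil.copytree(src, dst, dirs_exist_ok=True)
            finally:
                subprocess.run(['umount', dst], check=True)
        finally:
            subprocess.run(['umount', src], check=True)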
Closes-Bug: #2028002
Change-Id: I336acb8e8eb5a02dde2f5e24c258e23797d200ee
If the node is locked, a lookup cannot be performed when an agent
token needs to be generated, which tends to error like this:
ironic_python_agent.ironic_api_client [-] Failed looking up node
with addresses '00:6f:bb:34:b3:4d,00:6f:bb:34:b3:4b' at
https://172.22.0.2:6385. Error 409: Node
c25e451b-d2fb-4168-b690-f15bc8365520 is locked by host 172.22.0.2,
please retry after the current operation is completed..
Check if inspection has completed.
The problem is, if we keep pounding on the door, we can actually worsen
the situation, and previously we would just let tenacity
retry.
We will now hold for 30 seconds before proceeding, so we have
hopefully allowed the operation to complete.
Also fixes the error logging to help preserve humans' sanity.
Change-Id: I97d3e27e2adb731794a7746737d3788c6e7977a0
Rebuilding an instance on RAIDed ESPs will fail due to sgdisk
running against a non-clean disk and bailing out. Check if a
RAIDed ESP already exists and skip creation if it does.
Change-Id: I13617ae77515a9d34bc4bb3caf9fae73d5e4e578
When troubleshooting download issues, which may present
as checksum validation failures, it is difficult to tell whether
the *entire* file was downloaded, due to the way HTTP works.
A download may start with a successful result code,
and the content is then streamed out until the socket is closed.
But with HTTP there is no way to know if that socket closed
prematurely, and the size on the original server is *also* an optional
field, so just log the size we actually received so we don't drive the
humans [more] insane.
Also now logs the (optional) Content-Length field if
supplied by the server.
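A rough sketch of the logging involved, assuming a requests-based streaming
download (the function name and log message are illustrative):

    import logging

    import requests

    LOG = logging.getLogger(__name__)

    def download(url, dest_path):
        bytes_read = 0
        with requests.get(url, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with open(dest_path, 'wb') as f:
                for chunk in resp.iter_content(chunk_size=1024 * 1024):
                    f.write(chunk)
                    bytes_read += len(chunk)
        # Content-Length is optional; log it only if the server supplied it.
        expected = resp.headers.get('Content-Length')
        LOG.info('Downloaded %s bytes from %s (server Content-Length: %s)',
                 bytes_read, url, expected or 'not provided')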
Change-Id: Id71b167f4e330d54b9afddf95f1a2ef9e40398bf
Bandit 1.7.5 was released with a timeout check for all requests and
urllib calls.
Fixed those.
In the process, this exposed a Bandit B310 issue, which was already
covered by the code, but is now explicitly marked as such.
Also enables bandit checks to be voting in CI.
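For illustration, the resulting pattern looks roughly like this (URLs and
timeout values are placeholders):

    import urllib.request

    import requests

    # Bandit now flags requests/urllib calls that lack an explicit timeout.
    resp = requests.get('https://example.com/image.qcow2', timeout=30)

    # B310 warns about urllib.request.urlopen usage; where the scheme is
    # already validated by the surrounding code, the call is marked nosec.
    data = urllib.request.urlopen(
        'https://example.com/checksums', timeout=30).read()  # nosec B310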
Change-Id: If0e87790191f5f3648366d571e1d85dd7393a548
Also fixes my use of set_override, as it is not on the actual
config object. You'd think I'd remember that, since I've done
that before...
Change-Id: I4b578c4319354001cbbd3b3856af96b30fd25555
This was a significant breaking change that was landed despite explicit
disagreement by some community members (myself included). It has already
resulted in an accidental Ironic CI breakage, has broken Bifrost and has
the potential of breaking Metal3. In the case of Metal3, MD5 support is a part
of its public API.
While MD5 is a potential security hazard, I don't see the need to hurry
this change without giving the community time to prepare. This change
reverts the new option md5_enabled to True.
Change-Id: I32b291ea162e8eb22429712c15cb5b225a6daafd
The CentOS Stream SUM files use the format:
# FILENAME: <size> bytes
ALGORITHM (FILENAME) = CHECKSUM
Compared to the more common format:
CHECKSUM *FILE_A
CHECKSUM FILE_B
Use regular expressions to check for the filename both
in the middle (in parentheses) and at the end of the line.
Similarly, look for valid checksums at the beginning or
end of a line. Also look for known checksum patterns in
case the file contains only the checksum itself.
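A simplified sketch of the matching logic (the regular expressions and
helper name are illustrative, not the exact ones merged):

    import re

    HEX_DIGEST = r'[a-fA-F0-9]{32,128}'  # md5 through sha512 digest lengths

    def find_checksum(contents, filename):
        for line in contents.splitlines():
            # CentOS Stream style: "SHA256 (filename) = <checksum>"
            m = re.search(r'\(%s\)\s*=\s*(%s)\b'
                          % (re.escape(filename), HEX_DIGEST), line)
            if m:
                return m.group(1)
            # Coreutils style: "<checksum> *filename" or "<checksum> filename"
            m = re.search(r'^(%s)\s+\*?%s\s*$'
                          % (HEX_DIGEST, re.escape(filename)), line)
            if m:
                return m.group(1)
        # Fall back to a file that contains only the checksum itself.
        m = re.fullmatch(HEX_DIGEST, contents.strip())
        return m.group(0) if m else None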
Change-Id: I9e49c1a6c66e51a7b884485f0bcaf7f1802bda33
The checksum validation logic, which was updated early on in the
whole process of deprecating md5, didn't account for a URL *or* a
longer checksum (i.e. sha256/sha512), both of which were settled on
while the overall approach was still being decided.
Fixes the logic, and adds additional tests.
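A hedged sketch of inferring the algorithm from the supplied value (the
helper name is hypothetical):

    def checksum_algorithm(checksum):
        # A URL means the actual checksum must be downloaded and parsed first.
        if checksum.startswith(('http://', 'https://')):
            return None
        # Hex digest lengths map directly onto the common algorithms.
        length_to_algo = {32: 'md5', 64: 'sha256', 128: 'sha512'}
        try:
            return length_to_algo[len(checksum)]
        except KeyError:
            raise ValueError('Cannot determine hash algorithm for %r'
                             % checksum)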
Change-Id: Ic4053776e131fc02ace295a1e69e9f9faab47f42
MD5 image checksums have long been superseded by the use of the
``os_hash_algo`` and ``os_hash_value`` fields as part of the
properties of an image.
In the process of doing this, we determined that checksum-via-URL
usage was non-trivial, and that an appropriate
path was to allow the checksum type to be determined as needed.
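For example, an image would carry properties along these lines (values are
illustrative and truncated):

    image_properties = {
        'os_hash_algo': 'sha512',
        'os_hash_value': 'ca3d1dde0f2da2ec...',  # full hex digest in practice
    }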
Change-Id: I26ba8f8c37d663096f558e83028ff463d31bd4e6
The tl;dr is that UEFI NVRAM is encoded
in UTF-16, and when we run the efibootmgr command,
we can get Unicode characters back.
Except we previously were forcing everything to be
treated as UTF-8 due to the way oslo.concurrency's
processutils module works.
This could be observed with character 0x00FF,
which raises a nice exception when we try to
decode it.
Anyhow! While fixing the handling of this, we discovered
we could basically get cruft out of the NVRAM,
in the form of what was most likely a truncated string
from our own test VMs. As such, we need to also
permit decoding to be tolerant of failures.
This could be binary data, or as simple as flipped
bits which get interpreted as invalid characters.
Accordingly, we have introduced such data into one of our
tests involving UEFI record de-duplication.
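A minimal sketch of tolerant decoding of efibootmgr output (the function
name and error-handling strategy shown are assumptions for illustration):

    import subprocess

    def get_efi_boot_records():
        # Capture raw bytes; NVRAM entries are UTF-16 on the firmware side
        # and may surface as bytes that are not valid UTF-8.
        raw = subprocess.run(['efibootmgr', '-v'], check=True,
                             stdout=subprocess.PIPE).stdout
        # Decode tolerantly so truncated strings or flipped bits do not raise.
        return raw.decode('utf-8', errors='ignore').splitlines()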
Closes-Bug: 2015602
Change-Id: I006535bf124379ed65443c7b283bc99ecc95568b
Add "update_nvidia_nic_firmware_image" and "update_nvidia_nic_firmware_settings"
clean steps to MellanoxDeviceHardwareManager.
By adding those two steps, we can update the firmware image and
firmware settings of NVIDIA NICs by ironic-python-agent using
manual cleaning command
The clean steps require mstflint package installed on the image.
The "update_nvidia_nic_firmware_image" clean step requires to pass
"images" parameter to the clean command
The "images" parameter is a json blob contains
a list of images, where each image contains a map of:
* url: URL of the firmware image (file://, http://)
* checksum: checksum of the provided image
* checksumType: md5/sha512/sha256
* componentFlavor: PSID of the nic
* version: version of the FW
The "update_nvidia_nic_firmware_settings" clean step requires to pass
"settings" parameter to the clean command
The "settings" parameter is a json blob contains
a list of settings, where each settings contains a map of:
* deviceID: device ID
* globalConfig: global config
* function0Config: function 0 config
* function1Config: function 1 config
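A hedged illustration of a manual-clean request invoking both steps; the
interface mapping, URL, checksum, PSID, device ID, version and configuration
values are placeholders rather than values taken from NVIDIA documentation:

    clean_steps = [
        {
            'interface': 'deploy',
            'step': 'update_nvidia_nic_firmware_image',
            'args': {
                'images': [
                    {
                        'url': 'http://fw.example.com/connectx6.bin',
                        'checksum': '<sha256 of the image>',
                        'checksumType': 'sha256',
                        'componentFlavor': 'MT_0000000228',
                        'version': '20.31.1014',
                    },
                ],
            },
        },
        {
            'interface': 'deploy',
            'step': 'update_nvidia_nic_firmware_settings',
            'args': {
                'settings': [
                    {
                        'deviceID': '101d',
                        'globalConfig': {'SRIOV_EN': True},
                        'function0Config': {'PF_TOTAL_SF': 1},
                        'function1Config': {'PF_TOTAL_SF': 1},
                    },
                ],
            },
        },
    ]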
Change-Id: Icfaffd7c58c3c73c3fa28cfc2a6c954d2c93c16e
Story: 2010228
Task: 46016
The unit tests for create_configuration give different results if
run on a BIOS- or UEFI-booted machine, because they derive the
partition table type from the utils function
get_node_boot_mode.
Let's mock the boot mode, as we do in other tests, to get an
independent result.
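A minimal sketch of the mocking involved (the exact patch target depends on
how the module under test imports the helper):

    from unittest import mock

    from ironic_python_agent import utils

    # Pin the boot mode so the expected partition table type (gpt for UEFI,
    # msdos for BIOS) no longer depends on how the test host itself booted.
    with mock.patch.object(utils, 'get_node_boot_mode', return_value='uefi'):
        ...  # call create_configuration and assert on the expected label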
Change-Id: Ic0e7daea7ec4ce0806cd126c27166f84690c5d9e
The current way of prioritizing ID/DM_SERIAL_SHORT or ID/DM_SERIAL works
in most cases, but the udev values seem to be unreliable.
Based on experience, it looks like lsblk might be a better
source of truth than udev with regard to serial number
information. This commit makes lsblk the default provider
of block device serial number information.
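A rough sketch of consulting lsblk for the serial (the helper name, column
selection and JSON handling are illustrative):

    import json
    import subprocess

    def get_serial(device):
        # Ask lsblk directly instead of trusting udev-provided properties.
        out = subprocess.run(
            ['lsblk', '--json', '--nodeps', '-o', 'NAME,SERIAL', device],
            check=True, stdout=subprocess.PIPE).stdout
        devices = json.loads(out)['blockdevices']
        return devices[0].get('serial') if devices else None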
Story: 2010263
Task: 46161
Change-Id: I16039b46676f1a61b32ee7ca7e6d526e65829113
When IPA runs _install_grub2, it tries to bind mount /dev, /proc and /run
to <temporary directory where the root partition is mounted>/{dev,proc,run}.
However, that bind mount fails because those mount point paths do not exist
under the temporary directory.
To fix this failure, this patch adds a mkdir before the bind mount.
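Roughly, the fix amounts to the following (the temporary root path below is
illustrative):

    import os
    import subprocess

    path_root = '/tmp/mounted-root'  # where the root partition is mounted
    for fs in ('dev', 'proc', 'run'):
        target = os.path.join(path_root, fs)
        # Create the mount point first; the bind mount fails if it is missing.
        os.makedirs(target, exist_ok=True)
        subprocess.run(['mount', '-o', 'bind', '/' + fs, target], check=True)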
Story: 2010292
Task: 46273
Change-Id: I434ce1bf1863ee0f11c4d09918d6d2d8dc065c02
Extend the ability to skip disks to RAID devices.
This allows users to specify the volume name of
a logical device in the skip list, which is then not cleaned
or created again during the create/apply configuration phase
(an illustrative skip-list entry is shown below).
The volume name can be specified in the target raid config provided
the change https://review.opendev.org/c/openstack/ironic-python-agent/+/853182/
passes.
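For illustration, a skip-list entry referencing such a logical device might
look like this ('large' is a placeholder that must match a volume_name in the
node's target_raid_config):

    # Hypothetical node properties fragment.
    properties = {
        'skip_block_devices': [{'volume_name': 'large'}],
    }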
Story: 2010233
Change-Id: Ib9290a97519bc48e585e1bafb0b60cc14e621e0f
Use the 'volume_name' field from 'target_raid_config' to create logical
disks if it is present (an illustrative fragment is shown below).
Do not allow two logical disks to have the same volume name.
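An illustrative target_raid_config fragment using the field (sizes and RAID
levels are placeholders):

    target_raid_config = {
        'logical_disks': [
            {'size_gb': 100, 'raid_level': '1', 'volume_name': 'root_volume'},
            {'size_gb': 'MAX', 'raid_level': '0', 'volume_name': 'data_volume'},
        ]
    }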
Change-Id: If3e4e9f8698ec3e0cb49717f8ed2087d2ba03f2c
In the event a device name is set to contain a raid device path,
it is possible for the Name and Events field values of mdadm's
detailed output to contain text which inadvertently gets captured and
mapped as component data for the "holder" devices of the RAID set.
This would cause invalid values to be passed to UEFI methods,
which would cause a deployment to fail under these circumstances.
We now ignore the Name and Events fields in mdadm output.
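A simplified sketch of the guard (the parsing shown is illustrative, not the
exact merged logic):

    def parse_mdadm_detail(output):
        components = []
        for line in output.splitlines():
            field = line.split(':', 1)[0].strip()
            # "Name :" and "Events :" can legitimately mention a raid device
            # path, so they must never be mapped onto holder device data.
            if field in ('Name', 'Events'):
                continue
            if '/dev/' in line:
                components.append(line.split()[-1])
        return components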
Change-Id: If721dfe1caa5915326482969e55fbf4697538231
Introduce a field skip_block_devices in properties - this is a list of dictionaries
(an illustrative entry is shown below).
Create a helper function list_block_devices_check_skip_list.
Update the tests of erase_devices_express to use the node when calling _list_erasable_devices.
Add tests covering various options of the skip list definition.
Use the helper function in get_os_install_device when the node is cached.
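An illustrative value for the new field (the matching criteria shown are
placeholders):

    # Hypothetical node properties fragment; each dictionary describes one
    # device to leave untouched during cleaning.
    properties = {
        'skip_block_devices': [
            {'name': '/dev/vda'},
            {'vendor': 'ACME'},
        ]
    }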
Story: 2009914
Change-Id: I3bdad3cca8acb3e0a69ebb218216e8c8419e9d65
Certain filesystems are sometimes used in specialty computing
environments where a shared storage infrastructure or fabric exists.
These filesystems allow for multi-host shared concurrent read/write
access to the underlying block device by *not* locking the entire
device for exclusive use. Generally, ranges of the disk are reserved
for each interacting node to write to, and locking schemes are used
to prevent collisions.
These filesystems are common for use cases where high availability
is required or the ability for individual computers to collaborate on a
given workload is critical, such as a group of hypervisors supporting
virtual machines, because they can allow for nearly seamless transfer
of workload from one machine to another.
Similar technologies are also used for cluster quorum and cluster
durable state sharing, however that is not specifically considered
in scope.
Where things get difficult is because the entire device is not
exclusively locked with these storage fabrics, and in some cases locking
is handled by a Distributed Lock Manager on the network, or via special
sector interactions amongst the cluster members which understand
and support the filesystem.
As a result of this IO/interaction model, an Ironic-Python-Agent
performing cleaning can effectively destroy the cluster just by
attempting to clean storage which it perceives as attached locally.
This is not IPA's fault; often this case occurs when a Storage
Administrator forgot to update LUN masking or volume settings on
a SAN as it relates to an individual host in the overall
computing environment. The net result of one node cleaning the
shared volume may include restoration from snapshot or backup
storage, or may ultimately cause permanent data loss, depending
on the environment and the usage of that environment.
Included in this patch:
- IBM GPFS - Can be used on a shared block device... apparently, according
to IBM's documentation. The standard use of GPFS is more Ceph-like
in design... however, GPFS is also a specially licensed
commercial offering, so it is a red flag if this is
encountered, and should be investigated by the environment's
systems operator.
- Red Hat GFS2 - Is used with shared common block devices in clusters.
- VMware VMFS - Is used with shared SAN block devices, as well as
local block devices. With shared block devices,
ranges of the disk are locked instead of the whole
disk, and the ranges are mapped to virtual machine
disk interfaces.
It is unknown, due to lack of information, if this
will detect and prevent erasure of VMFS logical
extent volumes.
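As a rough sketch of the kind of guard this implies (the detection method
and filesystem signature names here are assumptions for illustration, not
the exact mechanism of the patch):

    import json
    import subprocess

    GUARDED_FILESYSTEMS = ('gpfs', 'gfs2', 'vmfs')

    def guard_shared_device_filesystems(device):
        # Inspect the filesystem signature reported for the device and refuse
        # to proceed if it looks like a shared/clustered filesystem.
        out = subprocess.run(['lsblk', '--json', '-o', 'NAME,FSTYPE', device],
                             check=True, stdout=subprocess.PIPE).stdout
        for entry in json.loads(out)['blockdevices']:
            fstype = (entry.get('fstype') or '').lower()
            if any(fs in fstype for fs in GUARDED_FILESYSTEMS):
                raise RuntimeError('Refusing to erase %s: shared cluster '
                                   'filesystem %s detected' % (device, fstype))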
Co-Authored-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ic8cade008577516e696893fdbdabf70999c06a5b
Story: 2009978
Task: 44985