ironic-python-agent

Author	SHA1	Message	Date
Julia Kreger	e5d552474b	Catch ismount not being handled While investigating another grub issue, I was confused by the path taken in the logs reported, and noticed that on a ramdisk, we might not actually have a valid response to os.path.ismount. I'm guessing this depends on which in-memory filesystem is in use, coupled with attempting to check a filesystem. Adds a test to validate that exceptions raised by these commands, where this issue can be encountered, are properly bypassed, and also adds additional logging to make it easier to figure out what is going on in the entire bootloader setup sequence. Change-Id: Ibd3060bef2e56468ada6b1a5c1cc1632a42803c3	2021-06-29 14:14:52 -07:00
Arne Wiebalck	27568204ae	Only mount the ESP if not yet mounted Check if the ESP is already mounted before attempting to mount it for the bootloader installation. Change-Id: Ifd738b2c5663f1a211d7e13b5ba386be631d8db1	2021-06-21 12:10:54 +02:00
Julia Kreger	2fab70c36b	Utilize CSV file for EFI loader selection Adds support to identify and utilize a CSV file to signal which bootloader to utilize, and set it when the OS is running as opposed to when EFI is running. This works around the EFI loader potentially crashing some vendors' hardware types when the entry stored in the image does not match the EFI loader record which was utilized to boot. Grub2+shim specifically needs the CSV file name and entry label to match what the system was booted with in order to prevent the machine from potentially crashing. See https://storyboard.openstack.org/#!/story/2008962 and https://bugzilla.redhat.com/show_bug.cgi?id=1966129#c37 for more information. Change-Id: Ibf1ef4fe0764c0a6f1a39cb7eebc23ecc0ee177d Story: 2008962 Task: 42598 Co-Authored-By: Bob Fournier <bfournie@redhat.com>	2021-06-10 11:23:14 -07:00
Zuul	434de569e6	Merge "Ignore efi grub2-install failure"	2021-06-07 09:47:12 +00:00
Zuul	6be440eb3b	Merge "Refactor: use convert_image from ironic_lib"	2021-06-04 16:35:00 +00:00
Steve Baker	a057be7dad	Ignore efi grub2-install failure Recent releases of redhat grub2 will always fail when installing to EFI paths, to encourage a transition to the signed shim bootloader. Partition image deploys avoid calling grub2-install with the preserve-efi-assets functions. Deploying whole disk images doesn't require grub2-install. This leaves whole disk images installed onto softraid devices, which still attempts to call grub2-install. This change will still attempt to run grub2-install in this one remaining case, but will ignore any failure. A future enhancement can avoid calling grub2-install entirely so that non-redhat secure-boot capable images can keep their signed bootloaders. Story: 2008923 Task: 42521 Change-Id: If432ef795d64d76442d739eb4f7d155ff847041e	2021-06-04 10:03:55 +12:00
Zuul	7fdbcde3de	Merge "Stop accepting duplicated configdrive"	2021-06-02 12:36:57 +00:00
Dmitry Tantsur	f657526807	Stop accepting duplicated configdrive We're currently requiring it twice: in image_info and in a separate configdrive argument. I think we should eventually settle on separate arguments for separate entities, so this change makes the value in image_info optional with a goal to stop accepting it. We could probably just remove the handling in image_info, but a deprecation is safer. The (unused in ironic) cache_image call is updated with an optional configdrive argument. Story: #2008904 Task: #42480 Change-Id: I1e2efa28efa3ea7e389774cb7633d916757bc6ed	2021-06-02 11:19:39 +02:00
Dmitry Tantsur	33d889c3c4	Refactor: use convert_image from ironic_lib Change-Id: If890baf3545cff6cef7c645c42e7f9d9038c9aa7	2021-06-01 14:07:34 +02:00
Zuul	5c063c8224	Merge "Make _get_efi_bootloaders return relative paths"	2021-05-27 13:09:48 +00:00
Julia Kreger	9e4c7052a2	Limit qemu-img execution arenas qemu-img attempts to launch multiple threads by default and attempts to have multiple memory allocation arenas to operate from. While multithreading can be good for performance, this pattern and the memory footprint for process launch and dependencies can turn the memory footprint for a cirros image conversion (16MB) into 1.2GB of memory being asked for by the qemu-img tool. In order to limit this impact, as the default number of arenas is governed by the number of CPUs times the number 8, it seems reasonable to lower this to a more reasonable number which also helps keep our possible memory footprint from being exceeded. Change-Id: I71a28ec59ec31c691205eb34d9fcab63a2ccb682 Story: 2008928 Task: 42528	2021-05-26 13:04:46 -07:00
Zuul	2172122b87	Merge "Rewrite write_image.sh in Python"	2021-05-26 17:17:02 +00:00
Steve Baker	10d18c4113	Make _get_efi_bootloaders return relative paths To make this function useful for purposes other than efibootmgr entries, this change moves the path manipulation to _run_efibootmgr. This change also adds boot*.efi entries to BOOTLOADERS_EFI so that it includes every entry in the UEFI Spec 2.9[1] Table 3-2 UEFI Image Types. [1] https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf Story: 2008923 Task: 42521 Change-Id: Ibe02786609aa0de65115897d8f4a9b4f36c8aed2	2021-05-26 11:21:15 +12:00
Zuul	6fc5a14760	Merge "Do not serialize command_params"	2021-05-18 14:58:42 +00:00
Dmitry Tantsur	606e500312	Rewrite write_image.sh in Python Change-Id: I0caa65561948f4e0934943a7a0d3a209701b5a59	2021-05-18 14:45:13 +02:00
Dmitry Tantsur	51aa31070a	Do not serialize command_params The command params can be huge when configdrive is used. There is no point in sending them back, Ironic does not use them anyhow. Story: #2008904 Task: #42479 Change-Id: I6e3db5db2042ca3fb5dafacfacf036fd7fc2fc4c	2021-05-18 12:59:28 +02:00
Zuul	d6e4fbd827	Merge "Remove the iscsi extension"	2021-05-12 11:08:19 +00:00
Zuul	29f3230791	Merge "Software RAID: RAID the ESPs"	2021-05-11 09:31:36 +00:00
Zuul	9837f1c2f0	Merge "Fix NVMe Partition image on UEFI"	2021-05-10 15:00:21 +00:00
Dmitry Tantsur	be3882162e	Remove the iscsi extension Change-Id: I2f0e581575112d6c7ba0d211661cab3e0b6caca6	2021-05-10 12:43:44 +02:00
Julia Kreger	fe825fa97e	Fix NVMe Partition image on UEFI The _manage_uefi code has a check where it attempts to just identify the precise partition number of the device, in order for configuration to be parsed and passed. However, the same code did not handle the existence of a `p1` partition instead of just a partition #1. This is because the device naming format is different with NVMe and Software RAID. Likely, this wasn't an issue with software RAID due to how complex the code interaction is, but the docs also indicate to use only whole disk images in that case. This patch was pulled down by one of RH's professional services folks, who has confirmed it does indeed fix the issue at hand. This is noted as a public comment on the Red Hat bugzilla. https://bugzilla.redhat.com/show_bug.cgi?id=1954096 Story: 2008881 Task: 42426 Related: rhbz#1954096 Change-Id: Ie3bd49add9a57fabbcdcbae4b73309066b620d02	2021-05-04 16:44:37 +00:00
Dmitry Tantsur	24951b1029	Import deployment logic from ironic-lib The two functions work_on_disk and create_config_drive_partition contain a substantial part of the deployment logic. Previously we placed them in ironic-lib for re-using on the conductor side in the iSCSI deploy interface. Since the iSCSI deploy is going away, we can move this code to ironic-python-agent to simplify maintenance. Imports code from ironic_lib commit 9fb5be348202f4854a455cd08f400ae12b99e1f2. Change-Id: I6cbcd81533f135208b57746cb0e33ffdfaf94eee	2021-05-03 14:17:57 +02:00
Arne Wiebalck	c2d04dc156	Software RAID: RAID the ESPs For software RAID in UEFI mode, we create ESPs on all holder disks and copy the bootloader there. Since there is no mechanism to keep the ESPs in sync, e.g. on kernel upgrades or when kernel parameters are updated, the ESPs will get out of sync eventually. This may lead to a situation where a node boots with outdated parameters or does not have any of the installed kernels in the boot menu anymore. This change proposes to RAID the ESPs. While the UEFI firmware will find an ESP partition (one leg of the mirror), the node will see an md device and all subsequent updates will go to all member disks. Also, remove the source ESP after copying in order to avoid mount confusion (same UUID!). Story: #2008745 Task: #42103 Change-Id: I9078ef37f1e94382c645ae98ce724ac9ed87c287	2021-04-16 14:40:28 +02:00
Dmitry Tantsur	b395181b1b	Always fall back to sysrq when power off fails The line we're looking for is not there when IPA is in a container, at least for CentOS based containers. Just fall back to sysrq on errors. Change-Id: Ie4ee605ad9c6cda58808512a563247175859c71e	2021-04-13 19:05:04 +02:00
Steve Baker	e61336602f	Fix root UUID for streamed partition images The root UUID changes after a streamed partition image is written to the block device, causing later deployment failure when assuming the old UUID. This change updates the root UUID after streaming the partition image is complete. This issue may have been missed in local testing because deploying the same image repeatedly will result in stable root UUID across runs. Change-Id: Ice4630c16fc216980488d1427f3b02e1b8a417fa	2021-03-19 12:08:43 +01:00
Riccardo Pittau	bff252c726	Remove default parameter from execute The check_exit_code param of execute from the processutils extension already defaults to [0]. See: https://opendev.org/openstack/oslo.concurrency/src/branch/master/oslo_concurrency/processutils.py#L214 Change-Id: Iedff5325e0737556d5eb3da601c984ddfc633873	2021-03-02 16:19:32 +01:00
kartikeyaj0	319efe2c2d	Fixes local boot for partition images IPA is not properly checking if the root partition is already mounted. Device is being passed to os.path.ismount() instead of the mount point. Story: 2008631 Task: 41839 Change-Id: I37a6e7e6bbe0bbbb0317c6e55bb822dafe7cce20	2021-02-17 10:56:31 +05:30
Dmitry Tantsur	403d2f06c6	Fix error message with UEFI-incompatible images It's somewhat confusing at the moment, since we're trying to find a UEFI partition by UUID "None". Don't search for partition if we don't know its UUID, and provide a better error message. Change-Id: Ief874084132797a445ddae8009264712a05facfd	2021-02-10 18:08:58 +01:00
Xinliang Liu	68a43b9da8	Fix UEFI boot entry creation for aarch64 Diskimage-builder installs grub with option '--removable'[1], so on aarch64 the efi directory contains no 'grubaa64.efi' file, only 'BOOTAA64.EFI': linaro@bm-ubuntu:~$ tree /boot/efi /boot/efi └── EFI └── BOOT └── BOOTAA64.EFI 2 directories, 1 file [1]: `8f12d9530e/diskimage_builder/elements/bootloader/finalise.d/50-bootloader (L158)` Task: #41698 Story: #2008560 Change-Id: I9fc55c068ea980beae273411db9d3568eec25eb8	2021-01-27 03:32:23 +00:00
Julia Kreger	4fb8163717	Fix boot mode detection for partition images Previously, partition images were hard coded to be bios based as opposed to consulting all of the values AND the node itself before making the most appropriate determination. Now the agent utilises the internal helper to properly determine the boot mode when calling ironic-lib. Story: 2008070 Task: 41265 Change-Id: Id5eeda69d5b9de2b393af414472d57b0d4380c43	2020-12-19 19:03:16 +00:00
Julia Kreger	246e0cf29e	Change default ironic_lib invocation to flag local booting The partition image support has been telling ironic-lib that the machine will be local booted. While this is likely harmless, and doesn't seem to break anything, we should have it match moving forward just to be on the safe side so we don't accidentally break things down the road. Change-Id: I33e5d583964ef8c21aa04d7427bcd3957b89d449	2020-12-19 19:02:58 +00:00
Julia Kreger	a12a5744b6	Add fstab pointer to EFI partition Adds support for the EFI partition to be appended to fstab so the filesystem can be automounted and EFI loader updated should the deployed operating system need to do so. This should enable bootloaders to be upgraded by linux based operating systems after the instance has been deployed when a partition image was utilized for the initial deployment. Change-Id: Iec28a8841cc01ec8b01a3f5cca070c934c7a2531 Story: 2008070 Task: 40754	2020-12-17 14:17:31 +00:00
Julia Kreger	f9870d5812	Prevent broken partition image UEFI deploys Partition images can sometimes contain a /boot folder structure and even the assets for EFI booting on that filesystem, which is a good thing. The conundrum is that Ironic does not handle this properly and potentially replaces the bootloader in this sequence such that grub2-install is used instead of signed bootloader assets. As such, we should be preserving the assets and using them from a partition image much like we do when we have a wholedisk image and can identify the assets. Now we will preserve the EFI boot assets, copy them to the new EFI boot partition, and call the EFI setup methods to manage the EFI nvram. Note, this change also splits the logic path out that performs the end call of the EFI boot manager into a reusable method but does not retool all of the testing as it is intertwined in the install_grub2 testing. Also adds some additional debug logging, as much of the bootloader installation code has multiple fallback/cleanup points which makes it difficult to debug from logs. Story: 2008070 Task: 40753 Change-Id: If17d4b4c06df5504987e61a1fde6662e9acd6989	2020-12-14 14:37:14 +00:00
Julia Kreger	cb6c0059b5	Fix default disk label with partition images Partition images through the agent have the unfortunate side effect of being executed without full node context by default. Luckily we've had a similar problem and cache the node. This patch changes the lookup from a default of msdos partitions to use the cached node object. Change-Id: I002816c9372fdf1cc32f3c67f420073551479fd9	2020-12-14 06:36:18 -08:00
Julia Kreger	7a83773fbc	Option to enable bootloader config failure bypass Some hardware is very well intentioned. However, this intention can result in the UEFI NVRAM table being full, which prevents us from adding new records to the table. We can't be sure what to delete, so in this case some operators just need the ability to tell ironic "it is okay if this fails, it will still work." The added ``ignore_bootloader_failure`` option adds this capability which can be set per-node either in the agent configuration via the ramdisk image, or in the pxe_append_params configuration parameter for the node itself with an ``ipa-ignore-bootloader-failure`` option in order to prevent the failure from being raised. Change-Id: If3c83fb2ea2025fce092d495a64f32077c70d2d6 Story: 2008386 Task: 41309	2020-12-10 06:42:48 -08:00
Fedor Tarasenko	694ea7425d	Support using LABEL as identifier for rootfs Add possibility to use disk LABEL to identify rootfs uuid for Software RAID deployment Change-Id: I77f36e70ddc539af0190db1c1abe0fb2c66f34b4 Story: 2008303 Task: 41188	2020-11-03 13:03:34 +03:00
Julia Kreger	6542a9cb04	Don't run os-prober from grub2-mkconfig By default, grub2-mkconfig scans everything to look for other environments and then loads those into the grub configuration. It makes sense, but on newer versions of grub2 in distribution images, os-prober is taking an exceptionally long time in some cases where more than one storage device exists with other filesystems. As a result of the os-prober execution by grub2-mkconfig, the bootloader installation can completely time out and fail the deployment. This is presently experienced with metalsmith on centos8. There are numerous sporadic reports of issues like this one where grub2-mkconfig hangs for some period of time, and this is observable on Centos8.2 in our CI. While one report[0] mentions this issue, another bug [1] has the dialog that actually helps us frame the context as to what we likely should do. Also, fixes the unit testing so we actually test if we're running with grub2. :\ [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1744693 [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1709682 Depends-On: https://review.opendev.org/#/c/748315 Change-Id: I14bf299afef3a1ddb2006fe5f182d7f0d249e734	2020-10-22 22:28:07 +00:00
Dmitry Tantsur	420ebc0d73	Do not silently swallow errors in the write_image deploy step Calling join() does not raise, we need to explicitly check the result. Change-Id: I81d3d727af220c2b50358edab8139f07874611f0 Story: #2008240 Task: #41083	2020-10-09 11:24:12 +02:00
Zuul	35d2292aa4	Merge "Log a warning if target_boot_mode does not match current boot mode"	2020-10-07 17:01:51 +00:00
Dmitry Tantsur	1a67dddde7	Log a warning if target_boot_mode does not match current boot mode This is not a normal situation and is likely to cause problems. Change-Id: Id0668fd160ac0539d85997e985f8c43d9da75c90	2020-10-07 12:30:23 +02:00
Dmitry Tantsur	fc4e0eed6a	Don't try to call GRUB when root UUID is not provided We don't have a really working way to detect root UUID for whole disk images at the moment, which results in an ignored traceback every time install_bootloader is called with whole disk images in UEFI mode. Avoid it by skipping GRUB2 if root UUID is unknown. Change-Id: I84245538f59c664b72d1cafbca8d61be0978f489	2020-10-07 12:06:42 +02:00
Dmitry Tantsur	fe6b687968	When reporting that agent is busy, report the executed command Also make this API return a proper HTTP code (409 instead of 500). Change-Id: I5d86878b5ed6142ed2630adee78c0867c49b663f	2020-09-18 17:52:49 +02:00
Julia Kreger	d3c3d4dabe	Update the cache if we don't have a root device hint Or at least try to. Some deployments just don't use root device hints, and this is okay. However, other deployments need root device hints, and with fast track mode in ramdisks, we created a situation where the node cache could be updated by a human or software between the time the agent was started, and the deployment was requested. As a result, the agent has been updated to check if we have a hint and if we don't, update the cache from the node lookup endpoint. This is not needed when the inband deploy steps are executed, as the process of updating the steps does force the node cache to be updated. Change-Id: I27201319f31cdc01605a3c5ae9ef4b4218e4a3f6 Story: 2008039 Task: 40701	2020-08-25 19:34:48 +00:00
Zuul	dc395c5837	Merge "More refactoring of the image module"	2020-07-27 07:15:42 +00:00
Zuul	9ca640a1c5	Merge "Prevent un-needed iscsi cleanup"	2020-07-25 13:54:51 +00:00
Riccardo Pittau	80e11811f5	More refactoring of the image module Introducing new function _umount_all_partitions to reduce the size of _install_grub2 Change-Id: I304468d57b10d677f2a9d58aec42a1bf414c6cba	2020-07-24 14:34:46 +02:00
Zuul	bfb395837d	Merge "Adds poll mode deployment support"	2020-07-22 19:53:31 +00:00
Julia Kreger	2a56ee03b6	Prevent un-needed iscsi cleanup When we added software raid support, we started calling bootloader installation. As time went on, we enhanced that code path for non-RAID cases in order to ensure that UEFI nvram was setup for the instance to boot properly. Somewhere in this process, we missed a possible failure case where the iscsi client tgtadm may return failures. Obviously, the correct path is to not call iscsi teardown if we don't need to. Since it was always semi-opportunistic teardown, we can't blindly catch any error, and if we started iSCSI and failed to tear the connection down, we might want to still fail, so this change moves the logic over to use a flag on the agent object, which allows one extension to set the flag and the other to read it and take action based upon that. Change-Id: Id3b1ae5e59282f4109f6246d5614d44c93aefa7c Story: 2007937 Task: 40395	2020-07-20 14:24:06 -07:00
Riccardo Pittau	9d9a6bce5c	Refactor part of image module Shuffle some functions around and reduce size of _is_bootloader_loaded moving logic out to a new function. Change-Id: I9c10bf05186dcebb37f175d61bf4ac9ff86b6510	2020-07-07 10:44:50 +02:00
Dmitry Tantsur	ba3caa6c64	Increase the ESP partition size to 550 MiB when using software RAID This has been a popular guidance, and diskimage-builder has recently started following it. Change-Id: I794c846fb191c15b0a30546bf64d624dfbde0fd4	2020-07-02 17:30:33 +02:00
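
The NVMe fix above (fe825fa97e) turns on a Linux device naming convention: when a disk name ends in a digit, as with NVMe namespaces and md devices, the kernel puts a `p` between the disk name and the partition number. The following is a minimal sketch of that convention only. The helper names `partition_path` and `partition_number` are hypothetical and are not taken from ironic-python-agent; the actual _manage_uefi code may handle this differently.

```python
def partition_path(device: str, number: int) -> str:
    """Build the partition device node for a disk (hypothetical helper).

    Disks whose name ends in a digit (/dev/nvme0n1, /dev/md0) get a "p"
    separator; disks ending in a letter (/dev/sda) do not.
    """
    separator = "p" if device[-1].isdigit() else ""
    return f"{device}{separator}{number}"


def partition_number(partition: str, device: str) -> int:
    """Recover the partition number from a partition node (hypothetical helper).

    Strips the parent device prefix and an optional "p" separator, so both
    /dev/sda1 and /dev/nvme0n1p1 yield 1.
    """
    if not partition.startswith(device):
        raise ValueError(f"{partition} is not a partition of {device}")
    suffix = partition[len(device):]
    return int(suffix.lstrip("p"))


if __name__ == "__main__":
    for disk in ("/dev/sda", "/dev/nvme0n1", "/dev/md0"):
        node = partition_path(disk, 1)
        print(disk, "->", node, "-> partition", partition_number(node, disk))
```

Code that assumes the partition number directly follows the disk name will misparse `/dev/nvme0n1p1`, which is the class of problem the commit message describes.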


244 Commits