ironic-python-agent

Author	SHA1	Message	Date
Riccardo Pittau	95b3ed3fed	Fix unit tests after ironic-lib changes Updating tests after change [1] and [2] in ironic-lib. [1] `ae53e8e4b3` [2] `7644196e7d` Change-Id: I880b4f82beb117d8812e60c13040e19476cec32b	2024-03-12 09:13:14 +01:00
Zuul	df7eccd7f1	Merge "Trivial: avoid deprecated utcnow"	2024-02-08 14:43:41 +00:00
Zuul	6d35c1e949	Merge "Make inspection URL optional if the collectors are provided"	2024-02-07 23:06:34 +00:00
Zuul	359ac636f0	Merge "Drop usage of run_as_root"	2024-01-31 16:29:06 +00:00
Dmitry Tantsur	8877e1f319	Trivial: avoid deprecated utcnow Change-Id: I5dbe3c2be36e23e749fbeebbc448d413d276b401	2024-01-31 10:09:13 +01:00
Dmitry Tantsur	0010f5c11a	Also retry inspection on HTTP CONFLICT The new implementation can return it when unable to lock the node. Other possible errors are 400 and 404 (should not be retried), as well as 5xx (already retried). Change-Id: I74c2f54a624dc47e8e2d1e67ae4c6a6078e01d2f	2024-01-26 16:21:24 +01:00
Dmitry Tantsur	9f849472ca	Drop usage of run_as_root IPA can only be run as root and does not use rootwrap. We need to eventually remove support for rootwrap from ironic-lib. Change-Id: Iffd5cae5e3dc8637bc6dd10b3bcc9fe33932b8cf	2024-01-23 14:23:23 +01:00
Zuul	1e107bd625	Merge "Add support for reporting CPU socket number"	2024-01-22 11:52:06 +00:00
Kaifeng Wang	9cafe76225	Add support for reporting CPU socket number IPA reports a few cpu fields including cores, arch, flags etc. There is a need that user wants to utilize the physical number in a baremetal since cores are just a logical representation of the compute resource. The socket number is more suitable for the quota control in some use cases. Change-Id: I94be86d6b12a3a7e7ca1041d948427a073412a31	2024-01-19 21:24:37 +00:00
Dmitry Tantsur	6cd36a750f	Make inspection URL optional if the collectors are provided With the new in-band inspection, we can derive the callback URL from the Ironic URL, there is no need to duplicate it. This change uses the presence of collectors as a sign to run inspection. The previous approach of setting an inspection URL, with or without explicitly setting collectors, still works for compatibility with ironic-inspector. Change-Id: Ie4279ee6d2995c9686f1dcdef1d6e5dc1dd20871	2024-01-10 08:55:42 +01:00
Dmitry Tantsur	0d4ae976c2	Support several API and Inspector URLs Allows nodes with a single IP stack to be deployed from a dual-stack Ironic. Detecting the advertised address and the usable Ironic URLs is done completely independently, which does open some space for misconfiguration. I hope it's not likely in reality, especially since this feature is targeting advanced standalone users. Change-Id: Ifa506c58caebe00b37167d329b81c166cdb323f2 Closes-Bug: #2045548	2024-01-09 16:43:23 +01:00
Dmitry Tantsur	2bb74523ae	Add missing headers to the inspection callback Somehow, it has worked correctly for years, but now I've discovered that the new inspection is (no longer?) tolerant to the missing header. While here, copy all headers from the heartbeat code. Change-Id: I9e5c609eb4435e520bc225dea08aedfdf169744b	2024-01-09 16:38:46 +01:00
Jay Faulkner	36e5993a04	[codespell] Fix spelling issues in IPA This fixes several spelling issues identified by codespell. In some cases, I may have manually modified a line to make the output more clear or to correct grammatical issues which were obvious in the codespell output. Later changes in this chain will provide the codespell config used to generate this, as well as adding this commit's SHA, once landed, to a .git-blame-ignore-revs file to ensure it will not pollute git histories for modern clients. Related-Bug: 2047654 Change-Id: I240cf8484865c9b748ceb51f3c7b9fd973cb5ada	2023-12-28 10:54:46 -08:00
Iury Gregory Melo Ferreira	03b6b0a4ab	Fix inspector retries to not take a long time Since we moved to exponential wait we increased the amount of time to run unit tests, now we can configure the max time to wait - before: Ran: 33 tests in 22.6581 sec. - after: Ran: 33 tests in 4.0256 sec. Change-Id: Ibdcfebacad0489d17183e43ceb0d603fce67e72b	2023-12-19 14:26:59 -03:00
Dmitry Tantsur	2ab8364649	Add a jitter to heartbeat retries Currently, if heartbeat fails, we reschedule it after 5 seconds. This is fine for the first retry, but it can cause a thundering herd problem when a lot of nodes fail to heartbeat at once. This change adds jitter to the minimum wait of 5 seconds. The jitter is not applied for forced heartbeats: they still have a minimum wait of exactly 5 seconds from the last heartbeat. The code is re-ordered to move the interval calculation to one place. Bonus: correctly logging the next interval. The unit tests have been rewritten to test the heartbeat process step by step and not rely on the exact sequence of the calls. Closes-Bug: #2038438 Change-Id: I4c4207b15fb3d48b55e340b7b3b54af833f92cb5	2023-12-13 17:34:24 +01:00
Iury Gregory Melo Ferreira	801da9ec1f	Retry in ProxyError during post inspector data * ProxyError is derived from ConnectionError, but it's necessary to check the Response object to identify it. - Added ProxyError in retry_if_exception_type - Updated _post_to_inspector to properly handle ProxyError - Updated the wait to use wait_exponential instead of wait_fixed. Closes-Bug: 2045429 Change-Id: Iefe3fe581cd4e7c91a0da708e6f6d0fdaacab6fe	2023-12-06 12:01:35 -03:00
Zuul	beccfe8c92	Merge "Revert "Fix vmedia network config drive handling""	2023-11-30 15:14:20 +00:00
Dmitry Tantsur	c57deb7e76	Revert "Fix vmedia network config drive handling" This reverts commit `33f01fa3c2`. There are a few issues with the patch - see my comments there. The most pressing and the reasons to revert are: 1) It breaks deployments when the vmedia is present but does not have a network_data.json (the case for Metal3). 2) It assumes the presence of Glean which may not be the case. Neither Julia nor myself have time to thoroughly fix the issue, leaving a revert as the only option to unblock Metal3. Change-Id: I3f1a18a4910308699ca8f88d8e814c5efa78baee Closes-Bug: #2045255	2023-11-30 10:33:29 +00:00
Zuul	61d17e2225	Merge "Parse efibootmgr type and details"	2023-11-29 01:10:27 +00:00
Zuul	eea9917023	Merge "Fix vmedia network config drive handling"	2023-11-29 01:10:25 +00:00
Steve Baker	352df0bc54	Parse efibootmgr type and details This change improves the regex to match an exact entry name, and to also match with the entry type from a set of recognised types. The boot entry details start from the recognised type onwards. This can be used by a step which deletes all entries of type 'HW' and UsbClass. Related-Bug: #2041901 Change-Id: I5d879f724efc2919b541fd3fef0f931df67ff9c7	2023-11-24 09:45:40 +13:00
Zuul	768aa17442	Merge "Add mlnx deploy_step entry to enable deploy time firmware"	2023-11-23 00:12:13 +00:00
Zuul	7a4114512c	Merge "Handle different device outputs for multipath"	2023-11-22 21:36:40 +00:00
Zuul	9f9940efdc	Merge "Test coverage for efi_utils.get_boot_record"	2023-11-22 21:36:39 +00:00
Iury Gregory Melo Ferreira	0a29206b8d	Handle different device outputs for multipath In some cases the output of the multipath can differ and we would return a wrong parent device. Closes-Bug: 2043992 Change-Id: I848d7df798cc736bd5a55eed8fa46110caea1dc3	2023-11-20 22:51:41 -03:00
Zuul	845df338f8	Merge "improve multipathd error handling"	2023-11-09 17:31:32 +00:00
Julia Kreger	33f01fa3c2	Fix vmedia network config drive handling When performing DHCP-less deployments, the agent can start and discover more than one configuration drive present on a host. For example, a host was previously deployed using Ironic, and is now being re-deployed again. If Glean was present in the ramdisk, the glean-early.sh would end up mounting the folder based upon label. If cloud-init is somehow still in the ramdisk, the other folder could somehow get mounted. This patch, which is intended to be backportable, causes the agent to unmount any configuration drive folders, mount the most likely candidate based upon device type, partition, and overall state of the machine, and then utilize that configuration, if present, to re-configure and reload networking. Thus allowing dhcp-less re-deployments to be fixed without forcing any breaking changes. It should also be noted that this fix was generated in concert with an additional tempest test case, because this overall failure case needed to be reproduced to ensure we had a workable non-breaking path forward. Closes-Bug: 2032377 Change-Id: I9a3b3dbb9ca98771ce2decf893eba7a4c1890eee	2023-11-08 12:11:06 -08:00
Zuul	9d9568ba23	Merge "Get numa_node info when collecting pci devices info"	2023-11-06 18:15:33 +00:00
Steve Baker	26be55f763	Test coverage for efi_utils.get_boot_record A step will be developed to delete all EFI entries of type HD. As part of this get_boot_record will need to parse more of the output of `efibootmgr -v`. This change asserts the existing behaviour of get_boot_record, and the test can evolve with the changes in get_boot_record. Related-Bug: #2041901 Change-Id: I0c5ac4adc1044c528c27a4eaf580c619ceef47e0	2023-11-06 14:12:02 +13:00
Jay Faulkner	3d42298619	Remove standby.cache_image support Image caching was never fully supported in Ironic or IPA; this is vestigial code leftover from a partial implementation. Even if we implemented it today, we'd likely use a completely different methodology. Change-Id: Id4ab7b3c4f106b209585dbd090cdcb229b1daa73	2023-10-24 15:02:44 -07:00
Zhou Ya	76ad06225a	Get numa_node info when collecting pci devices info IPA now includes information about numa node id when collecting information about PCI devices. Closes-bug: #1622940 Co-Authored-By: Jay Faulkner <jay@jvf.cc> Change-Id: I70b0cb3eff66d67bb8168982acbbf335de0599cd	2023-10-24 14:27:21 -07:00
Adam Rozman	13537db293	improve multipathd error handling This commit: - Adds the ability to ignore inconsequential OS error caused by starting the multipathd service when an instance of the service is already running. Related launchpad issue https://bugs.launchpad.net/ironic-python-agent/+bug/2031092 Change-Id: Iebf486915bfdc2546451e6b38a450b4c241e43a8	2023-10-23 16:33:03 +03:00
Zuul	b42f0be422	Merge "implement basic-auth support for user-image download process"	2023-10-13 17:08:28 +00:00
Boushra Bettir	dbf3e5408d	Replace shlex module with helper function Used helper function, `parse_device_tags` from ironic_lib instead of the shlex module for their identical functionality. Updated mock_execute.side_effect for lsblk compatibility in utils.execute. Closes-Bug: #2037572 Change-Id: I6600e054f9644c67ab003f0e0f6c380b5c217223	2023-10-12 13:34:32 -07:00
Julia Kreger	cb61a8d6c0	Retry on checksum failures HTTP is a fun protocol. Size is basically optional. And clients implicitly trust the server and socket has transferred all the bytes. Which really means you should always checksum. But... previously we didn't checksum as part of retrying. So if anything happened with python-requests, or lower level library code or the system itself causing bytes to be lost off the buffer, creating an incomplete transfer situation, then we wouldn't know until the checksum. So now, we checksum and re-trigger the download if there is a failure of the checksum. This involved a minor shift in the download logic, and resulted in a needful minor fix to an image checksum test as it would loop for 90 seconds as well. Closes-Bug: 2038934 Change-Id: I543a60555a2621b49dd7b6564bd0654a46db2e9a	2023-10-10 09:15:31 -07:00
Adam Rozman	70961789a6	implement basic-auth support for user-image download process This feature was proposed in https://bugs.launchpad.net/ironic-python-agent/+bug/2021947 Change-Id: I9dbfc1402240beb75b6736214753fd86dccae676	2023-10-10 16:25:51 +03:00
Zuul	89be7bd420	Merge "Conditional creation of RAIDed ESP for UEFI Software RAID"	2023-10-10 11:07:25 +00:00
Boushra Bettir	25704d2555	Add additional mock tests to unit tests for read only devices. Change ordering to ensure mock tests work correctly. Closes-Bug: #2037690 Change-Id: Ie9b884e58e4677a47e57c3ad39cadd65db8eec75	2023-10-08 20:02:05 +00:00
Zuul	73b76da5fe	Merge "Add get_service_steps logic to the agent"	2023-09-15 22:29:59 +00:00
Julia Kreger	f86975d53c	Add mlnx deploy_step entry to enable deploy time firmware Follow-up from service steps addition change to add a deploy steps alias for the Nvidia Mellanox network device firmware update clean steps. This allows deploy time firmware updates to be codified as part of a deployment with custom steps. Change-Id: I9d80447dee7cfde4d3f8d81d9d39e738916b7824	2023-08-31 06:35:39 -07:00
Julia Kreger	eb95273ffb	Add get_service_steps logic to the agent Initial code patches for service steps have merged in ironic, and it is now time to add support into the agent which allows service steps to be raised to the service. Updates the default hardware manager version to 1.2, which has rarely been incremented due to oversight. Change-Id: Iabd2c6c551389ec3c24e94b71245b1250345f7a7	2023-08-31 06:22:22 -07:00
Julia Kreger	b6c263a5dc	preserve/handle config drives on 4k block devices When an underlying block device (or driver) only supports 4KB IO, this can cause some issues with aspects like using an ISO9660 filesystem which can only support a maximum of 2KB IO. The agent will now attempt to mount the filesystem before deleting the supplied file, and should that fail it will mount the configuration drive file from the ramdisk utilizing a loopback, and then extract the contents of the ramdisk into a newly created VFAT filesystem which supports 4KB block IO. Closes-Bug: #2028002 Change-Id: I336acb8e8eb5a02dde2f5e24c258e23797d200ee	2023-08-24 08:10:22 -07:00
Julia Kreger	5ed520df89	Handle the node being locked If the node is locked, a lookup cannot be performed when an agent token needs to be generated, which tends to error like this: ironic_python_agent.ironic_api_client [-] Failed looking up node with addresses '00:6f:bb:34:b3:4d,00:6f:bb:34:b3:4b' at https://172.22.0.2:6385. Error 409: Node c25e451b-d2fb-4168-b690-f15bc8365520 is locked by host 172.22.0.2, please retry after the current operation is completed.. Check if inspection has completed. Problem is, if we keep pounding on the door, we can actually worsen the situation, and previously we would just let tenacity retry. We will now hold for 30 seconds before proceeding, so we have hopefully allowed the operation to complete. Also fixes the error logging to help human's sanity. Change-Id: I97d3e27e2adb731794a7746737d3788c6e7977a0	2023-08-22 16:47:28 -07:00
Arne Wiebalck	286d66709a	Conditional creation of RAIDed ESP for UEFI Software RAID Rebuilding an instance on RAIDed ESPs will fail due to sgdisk running against a non-clean disk and bailing out. Check if there is a RAIDed ESP already and skip creation if it exists. Change-Id: I13617ae77515a9d34bc4bb3caf9fae73d5e4e578	2023-08-16 17:39:04 +02:00
Zuul	e493cad02c	Merge "Log the number of bytes downloaded"	2023-07-27 21:39:12 +00:00
Julia Kreger	c65ad42ff1	Log the number of bytes downloaded When troubleshooting download issues, which may present as checksum validation failures, it is difficult to understand if the entire file was downloaded due to the way HTTP works. In that, a download may start with a successful result code, and the content is streamed out until the socket is closed. But with HTTP there is no way to know if that socket closed prematurely and the original server size is also an optional field, so just log the size we got so we don't drive the humans [more-]insane. Also now logs the (optional) content-length field if supplied by the server. Change-Id: Id71b167f4e330d54b9afddf95f1a2ef9e40398bf	2023-07-19 16:20:40 +00:00
Zuul	0fb7fec56e	Merge "Allow md5 to be disabled from the conductor"	2023-07-12 03:53:14 +00:00
Zuul	119981a818	Merge "Fix nvidia hardware manager url parser to permit https"	2023-06-26 10:11:55 +00:00
Zuul	bb156aad6c	Merge "Fix Bandit errors"	2023-06-26 09:25:09 +00:00
Julia Kreger	b83678c968	Fix nvidia hardware manager url parser to permit https Change-Id: I9a10e543d3256ceaa78c6fbdb01fc0d88c0ee6e6	2023-06-06 15:35:16 +00:00

830 Commits