ironic-python-agent

Author	SHA1	Message	Date
Julia Kreger	c77a7df851	Extend retries to 9, 10 seconds apart. The download retry interval was previously five seconds which is not long enough to recover after a hard network connectivity break where we may be reliant upon network port forwarding hold-down timers or even routing protocol route propagation to recover communication. Previously the time value was 5 seconds, with 3 attempts, meaning 15 seconds total ignoring the error detection timeouts. Now it is 10 seconds, with 10 attempts, meaning 100 seconds before the error detection timeouts. Change-Id: I6d11edc9a3156f2bdc21c3d432ecc7625d652699	2020-06-23 20:27:49 +00:00
Julia Kreger	159ab9f0ce	Add full download retries Instead of just trying to get the connection and handler for the download, let's retry the whole action of downloading. Change-Id: I9217792d32e6f33c70f146a9b7d3ef58c5644d8a	2020-06-23 20:27:41 +00:00
Julia Kreger	c5b97eb781	Add timeout operations to try and prevent hang on read() Socket read operations can be blocking and may not time out as expected when thinking of timeouts at the beginning of a socket request. This can occur when streaming file contents down to the agent and there is a hard connectivity break. In other words, we could be in a situation like: - read(fd, len) - Gets data - Select returns context to the program, we do things with data. Hard connectivity break for next 90 seconds. - read(fd, len) - We drain the in-memory buffer side of the socket. - Select returns context, we do things with our remaining data. Server retransmits. Server times out due to no ack. Server closes socket and issues a FIN,RST packet to the client. Connectivity restored, client never got FIN,RST. Client socket still waiting for more data. - read(fd, len) - No data returned - Select returns, yet we have no data to act on as the buffer is empty OR the buffered data doesn't meet our required read len value. tl;dr noop - read(fd, len) <-- We continue to try and read until the socket is recognized as dead, which could be a long time. NOTE: The above read()s are python's read() on the contents being streamed. Lower level reads exist, but brains will hurt if we try to cover the dynamics at that level. As such, we need to keep an eye on the last time we received a packet, and use that to decide whether we have timed out. Requests periodically yields back even when no data has been received, in order to allow the caller to wall clock the progress/status and take appropriate action. When we exceed the timeout time value with our wall clock, we will fail the download. Change-Id: I7214fc9dbd903789c9e39ee809f05454aeb5a240	2020-06-23 13:25:09 -07:00
Zuul	46bf7e0ef4	Merge "Add a deploy step for writing an image"	2020-06-20 00:00:10 +00:00
Dmitry Tantsur	6d7ec350ff	Make get_partition_uuids work with whole disk images We used to populate the root UUID inside the message formatting function; move it to the actual prepare_image/cache_image calls. Change-Id: Ifb22220dfd49633e8623dd76f7a6a128f5874b78	2020-06-17 14:38:58 +02:00
Zuul	751dac7b90	Merge "Split and move logic for partition tables"	2020-06-11 22:53:07 +00:00
Zuul	d7cf7bd341	Merge "New extension call to return partition UUIDs"	2020-06-09 12:31:55 +00:00
Dmitry Tantsur	7e5fe1121e	Make the install_bootloader command asynchronous It does not return anything, so there is no point in it being synchronous. Ironic always calls it with wait=True, so there is no problem with backward compatibility either. Change-Id: I44fec2e0cb54486328ce71263613d8592e384870	2020-06-08 15:10:05 +02:00
Dmitry Tantsur	9d4cf5532f	Add a deploy step for writing an image The new step just invokes the appropriate method of the standby extension. Change-Id: Ic74f83ab2b7e58f8e4b46e0abfab79e221afeb3e Story: 2006963	2020-06-02 15:23:54 +02:00
Dmitry Tantsur	6c1545b75b	New extension call to return partition UUIDs Currently we parse the success message from the write_image call. This is inconvenient and incompatible with the deploy steps split. Change-Id: I258dc1ff1ad1c9df5cbc26a7825d9e7ef2f3205b Story: #2006963	2020-06-02 15:05:59 +02:00
Fedor Tarasenko	952489020e	Fix an issue with high cpu usage caused by ironic-python-agent Currently, running the ipa-centos8-stable-ussuri image causes 100% CPU usage while cleaning. The proposed change fixes this behavior and significantly speeds up cleaning. Change-Id: I2ba9a69f22b11830d8ff1bc346b17bf1a52f25b0 Story: #2007696 Task: #39809	2020-05-25 22:18:17 +03:00
Riccardo Pittau	557d5603a2	Split and move logic for partition tables Move and split the logic to create the partition tables when applying raid configuration. Change-Id: Ic76dd2067ace02dd02351caca0c7f9b05571e510	2020-05-25 08:11:28 +00:00
Zuul	fc7ac48a6e	Merge "Fix pep8 errors"	2020-05-13 11:27:36 +00:00
Riccardo Pittau	f6ee877cde	Fix pep8 errors For some reason pep8 test started to complain causing mayhem. This patch fixes the issues and does some refactor of dmi_inspector tests moving pure data to a separate file. Change-Id: Ia244a496acd80abad679f8ae9832d4f0471500e7	2020-05-12 10:57:23 +02:00
Riccardo Pittau	8d210638a8	Fix TypeError with newer version of lshw The issue with json output in lshw was fixed in version B.02.19. This patch makes the memory calculation compatible with that version and later versions that are included in recent distributions (e.g. Ubuntu 20.04, Fedora 31). Change-Id: Id5a30028b139c51cae6232cac73a50b917fea233 Story: 2007588 Task: 39527	2020-04-27 15:07:54 +02:00
Riccardo Pittau	2738e57f2a	Add function to calculate memory Move logic to calculate memory to its own function. Change-Id: I5ab98b6450ff45dff35ddae093a83140f37047a8	2020-04-27 10:46:05 +02:00
Dmitry Tantsur	8adb7e1a04	Add timeout and retries when connecting to an image server If the server is stuck for any reason, the download will hang for a potentially long time. Provide a timeout (defaults to 60 seconds) and 2 retries on failure. Change-Id: Ie53519266edd914fdbfa82fe52b4a55151e5ec5f	2020-04-24 10:34:40 +02:00
Zuul	5dcff4d2b3	Merge "Add raid.apply_configuration deploy step"	2020-04-21 09:32:40 +00:00
Zuul	6a95e216f6	Merge "Simplify deduplicate_steps"	2020-04-21 00:37:15 +00:00
Dmitry Tantsur	896f389d5c	Mock get_node_boot_mode in software RAID unit tests This function checks for /sys/firmware/efi. Some tests do not mock isdir, so they fail on UEFI machines. Change-Id: I088218ddb88717ac07669d0b97c6cd50208ede8c	2020-04-20 16:41:28 +02:00
Dmitry Tantsur	c0502649ba	Add raid.apply_configuration deploy step For compatibility with out-of-band RAID deploy steps, we need to have one apply_configuration step, not a create/delete pair. Change-Id: I55bbed96673c9fa247cafdac9a3ade3a6ff3f38d Story: #2006963	2020-04-20 12:50:14 +02:00
Sean McGinnis	5589b05df4	Use unittest.mock instead of third party mock Now that we no longer support py27, we can use the standard library unittest.mock module instead of the third party mock lib. Change-Id: I5fdb2a02ee83c692d46cbe28266fcae033bec6f6 Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>	2020-04-18 11:53:28 -05:00
Zuul	43d2b2cbe0	Merge "A boot partition on a GPT disk should be considered an EFI partition"	2020-04-16 16:22:49 +00:00
Dmitry Tantsur	ff49b04e28	A boot partition on a GPT disk should be considered an EFI partition DIB builds instance images with EFI partitions that only have the boot flag, but not esp. According to parted documentation, boot is an alias for esp on GPT, so accept it as well. To avoid complexities when parsing parted output, the implementation is switched to existing utils and ironic-lib functions. Change-Id: I5f57535e5a89528c38d0879177b59db6c0f5c06e Story: #2007455 Task: #39423	2020-04-15 18:38:15 +02:00
Dmitry Tantsur	6a9a9e9e14	Fix the token logic to be compatible with older ironic Currently we fail with HTTP 401 if both the known and the received tokens are None. This prevents IPA from being updated before ironic. Story: #2007557 Task: #39419 Change-Id: I80249bd3468b581dc035d72156cbfa2f5f225a1b	2020-04-15 13:01:26 +02:00
Zuul	f6668f94c9	Merge "Move minimum ironic version to latest ocata"	2020-04-15 10:22:12 +00:00
Zuul	b56ed05d25	Merge "Move logic to calculate raid sectors to raid_utils"	2020-04-13 10:51:28 +00:00
Zuul	020761a513	Merge "Remove unused version parameter in version header function"	2020-04-09 15:52:29 +00:00
Riccardo Pittau	3966871f47	Move logic to calculate raid sectors to raid_utils Some more raid related logic moved to raid_utils. Change-Id: I08c73ad14e5b01ebac2490b83997c5452506d4a2	2020-04-09 15:03:37 +02:00
Zuul	83b5a8b202	Merge "Move logic for raid start sector to raid_utils"	2020-04-09 12:31:29 +00:00
Zuul	b9e320e76f	Merge "Add an ability to run in-band deploy steps"	2020-04-09 09:31:49 +00:00
Riccardo Pittau	f32a4a2b29	Move logic for raid start sector to raid_utils A first attempt at reducing the size of RAID-related functions. Change-Id: I81f912d0dc0ad138d8cc776cdb4ee3b5251ec3ba	2020-04-08 17:32:38 +02:00
Riccardo Pittau	6c51709a1a	Move minimum ironic version to latest ocata All other API versions from releases before that are not supported anymore. Change-Id: I49fb3e4facdec42a4dab343c46a84f3cba6d2b7c	2020-04-08 15:40:02 +02:00
Riccardo Pittau	9ac6040110	Remove unused version parameter in version header function The logic to determine the version when getting the ironic version header is not influenced by the version parameter passed to the function. Change-Id: Ie52a82bf71a2277cea11fd2dedfd9c1e0001d95f	2020-04-08 12:17:41 +02:00
Zuul	bdc5e9448d	Merge "Editing follow-up for UEFI Software RAID support"	2020-04-07 13:18:44 +00:00
Arne Wiebalck	66c32784af	Editing follow-up for UEFI Software RAID support This is a follow-up to https://review.opendev.org/#/c/696156/ Change-Id: I0fd2c09045ff07a57374934c35d4a3a8467f5e99 Story: #2006379 Task: #37635	2020-04-06 18:03:25 +02:00
Riccardo Pittau	d5d62c8dbf	Use unittest mock from standard library Drop the third party mock library to use unittest mock from standard library. Change-Id: Ib64b661572e4869a24865c02a6c84a6603930394	2020-04-06 14:35:50 +02:00
Dmitry Tantsur	079f61d09c	Simplify deduplicate_steps The same result can be achieved using a multi-component sorting key. Change-Id: Ieacf9fcecb2a6de7b4ccd8889f789099af39aa37	2020-04-06 10:30:31 +02:00
Mark Goddard	1b4ce47921	Add an ability to run in-band deploy steps Mostly adaptation of cleaning methods. Co-Authored-By: Dmitry Tantsur <dtantsur@redhat.com> Change-Id: Ife0502391bbece46d619a20a825dfdb191d5c2b4 Story: 2006963 Task: 37791	2020-04-06 10:24:08 +02:00
Raphael Glon	9343348106	Software RAID: Add UEFI support The proposed changes concern two steps: First, when creating the RAID configuration, have a GPT partition table type (this is not necessary, but more natural with UEFI). Also, leave some space, either for the EFI partitions or the BIOS boot partitions, outside the Software RAID. Secondly, when installing the bootloader, make sure the correct boot partitions are created or relocated. Change-Id: Icf0a76b0de89e7a8494363ec91b2f1afda4faa3b Story: #2006379 Task: #37635	2020-04-02 18:02:19 +02:00
Zuul	d71a8375fa	Merge "Only check for partitions on devices that are part of software RAID"	2020-04-02 14:56:45 +00:00
Zuul	dea6de0b2d	Merge "Add jitter to inspection command reporting"	2020-04-02 09:36:38 +00:00
Zuul	ab8c7c05bc	Merge "Allow specifying target devices for software RAID"	2020-04-01 17:36:50 +00:00
Dmitry Tantsur	34b58f6024	Only check for partitions on devices that are part of software RAID Now that an operator can pick the devices that participate in RAID, it no longer makes sense to verify all devices. Change-Id: Id5d8d539183f0db4ba3c4132ce6bc9919f9cd1ea Story: #2006369	2020-04-01 16:02:05 +02:00
Julia Kreger	368ab136f0	Add jitter to inspection command reporting Adds a jitter and backoff behavior to the inspector data collection command to prevent thundering herd sorts of issues. Change-Id: I00517010991cbe43d5958c7d76019ef6fe89c983	2020-03-31 08:13:13 -07:00
Zuul	68a71513f0	Merge "Bump hacking to 3.0.0"	2020-03-31 12:36:11 +00:00
Riccardo Pittau	a332a19a57	Bump hacking to 3.0.0 Change-Id: I1032ea6a2e9d79aeaecb1458c319cbeb15ac1fff	2020-03-30 12:55:46 +02:00
Julia Kreger	916cd5c8de	Rescan after restarting the md device If an md device is restarted, there is a chance, depending on the OS, that the partition may not be found upon start of the md device. Instead, we should always rescan after re-assembling the raid device. Story: 2007275 Task: 38712 Change-Id: I92bac20812940e04381a54ef2905ef5f6e293813	2020-03-29 14:47:41 +00:00
Julia Kreger	55b011cb1f	Fix GPT partition tables after agent writes contents Fixes errors that were being raised upon restarting the agent directly written out software raid images as the raidset is restarted for device consistency and partition updates later on in the code path of deployment. Story: 2007455 Task: 39187 Change-Id: I9abf51eb77b262932e70329af5ce1593106a3171	2020-03-29 07:45:25 -07:00
Julia Kreger	bf0bb7a87a	Improve debug logging around Raid/Bootloader Change-Id: I7d34b918a859972a2d5650494824d3333016dd11	2020-03-28 08:55:32 -07:00


777 Commits
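
The download changes in commits c5b97eb781, 159ab9f0ce and c77a7df851 describe two ideas: fail a streamed download when no data has arrived for longer than a timeout measured by wall clock, and retry the whole download rather than only the initial connection. The following is a minimal sketch of those two ideas under stated assumptions, not the agent's actual code. The names `DownloadStalled`, `read_with_stall_timeout`, `download_with_retries` and `open_stream` are hypothetical, and the stream is modeled as a plain iterable that may yield empty chunks when control returns without data. The defaults (60 second timeout, 10 attempts, 10 seconds apart) are taken from the commit messages above.

```python
import time


class DownloadStalled(Exception):
    """Raised when no data has arrived within the allowed window."""


def read_with_stall_timeout(chunks, timeout, clock=time.monotonic):
    """Collect chunks, failing if no data arrives for `timeout` seconds.

    `chunks` is any iterable that may yield empty bytes objects when the
    underlying stream hands control back without data (an assumption made
    for this sketch).
    """
    last_data_at = clock()
    received = []
    for chunk in chunks:
        if chunk:
            # Data arrived: remember when, so the stall window restarts.
            last_data_at = clock()
            received.append(chunk)
        elif clock() - last_data_at > timeout:
            # Control came back with no data and the window has elapsed.
            raise DownloadStalled(
                "no data for more than %s seconds" % timeout)
    return b"".join(received)


def download_with_retries(open_stream, attempts=10, interval=10, timeout=60,
                          clock=time.monotonic, sleep=time.sleep):
    """Retry the whole download, not just the initial connection.

    `open_stream` is a hypothetical callable that starts a fresh download
    and returns an iterable of chunks.
    """
    for attempt in range(1, attempts + 1):
        try:
            return read_with_stall_timeout(open_stream(), timeout, clock)
        except (DownloadStalled, OSError):
            if attempt == attempts:
                raise
            sleep(interval)
```

Passing `clock` and `sleep` as parameters is a choice made for this sketch so the behavior can be exercised without real waiting; a caller would normally leave them at their defaults.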