ironic-python-agent

Author	SHA1	Message	Date
Kaifeng Wang	b424fbfa35	Extends pci devices metrics Collects PCI class, revision, and bus information for the pci-devices collector, these metrics as well as vendor id and device id are components which can be used to construct device information like lspci output, which is how cyborg agent collects accelerator devices. Accelerator device based scheduling is possible after ironic has such information in place. Change-Id: I6c37c554f37dd5f1d21c8fd4fad2a4f44a3c75d7 Story: 2007971 Task: 40474	2020-08-04 23:32:37 +08:00
Zuul	a9ed390f08	Merge "set EVENTLET_NO_GREENDNS to 'yes'"	2020-07-31 18:27:55 +00:00
Julia Kreger	9830f3cb0f	set EVENTLET_NO_GREENDNS to 'yes' Eventlet, when monkey patching occurs, replaces the base dns resolver methods. This can lead to compatibility issues, and unexpected exceptions being raised during the process of monkey patching. Such as one if there are no resolvers. As such, since we don't really need monkey patching of DNS, and setting the flag should make the inspector CI jobs happier where we don't need nor use DNS, AND tinycore may not be setting a resolver configuration at all, which is the root of the failure upon monkey patching that causes IPA to fail on start in certain circumstances. As a note, this has been performed on other projects due to bugs. See Id9fe265d67f6e9ea5090bebcacae4a7a9150c5c2. Change-Id: Ib8f7b844b1bfffff16f88ebbb6ef5ddbe61d5a30 Story: 2007936 Task: 40394	2020-07-31 16:21:06 +02:00
Zuul	ad9c54f55c	Merge "Return the final RAID configuration from apply_configuration"	2020-07-29 14:00:08 +00:00
Dmitry Tantsur	f03d72019a	Return the final RAID configuration from apply_configuration AgentRAID expects it and fails with TypeError if it's not provided. Change-Id: Id84ac129bba97540338e25f0027aa0a0f51bde52 Story: #2006963	2020-07-29 10:10:18 +02:00
Dmitry Tantsur	eb87651496	Allow erase_devices_metadata to be used as a deploy step Change-Id: I75f156dd76b0e3aaa1592ba24fe42fb2a7057cc8 Story: #2006963	2020-07-27 17:57:37 +02:00
Zuul	dc395c5837	Merge "More refactoring of the image module"	2020-07-27 07:15:42 +00:00
Zuul	9ca640a1c5	Merge "Prevent un-needed iscsi cleanup"	2020-07-25 13:54:51 +00:00
Riccardo Pittau	80e11811f5	More refactoring of the image module Introducing new function _umount_all_partitions to reduce the size of _install_grub2 Change-Id: I304468d57b10d677f2a9d58aec42a1bf414c6cba	2020-07-24 14:34:46 +02:00
Zuul	daf61f33b0	Merge "Fix bootloader install issue with MDRAID"	2020-07-22 22:13:34 +00:00
Zuul	bfb395837d	Merge "Adds poll mode deployment support"	2020-07-22 19:53:31 +00:00
Doug Szumski	5e95b1321d	Fix bootloader install issue with MDRAID When no root_device hint is set, an MDRAID partition can be incorrectly selected as the root device which causes installation of the bootloader to the physical disks behind the MDRAID volume to fail. See the notes in the referenced Story for more detail. This change adds a little more specificity to the listing of block devices. Change-Id: I66db457e71a0586723ee753bef961aec5bf58827 Story: 2007905 Task: 40303	2020-07-22 11:16:13 -07:00
Julia Kreger	2a56ee03b6	Prevent un-needed iscsi cleanup When we added software raid support, we started calling bootloader installation. As time went on, we enhanced that code path for non RAID cases in order to ensure that UEFI nvram was setup for the instance to boot properly. Somewhere in this process, we missed a possible failure case where the iscsi client tgtadm may return failures. Obviously, the correct path is to not call iscsi teardown if we don't need to. Since it was always semi-opportunistic teardown, we can't blindly catch any error, and if we started iSCSI and failed to tear the connection down, we might want to still fail, so this change moves the logic over to use a flag on the agent object which allows one extension to set the flag and the other to read it and take action based upon that. Change-Id: Id3b1ae5e59282f4109f6246d5614d44c93aefa7c Story: 2007937 Task: 40395	2020-07-20 14:24:06 -07:00
Dmitry Tantsur	1f3b70c4e9	Ignore devices with size 0 when collecting inventory delete_configuration still fetches all devices as it needs to clean ones with broken RAID. Story: #2007907 Task: #40307 Change-Id: I4b0be2b0755108490f9cd3c4f3b71a5e036761a1	2020-07-09 18:28:20 +02:00
Riccardo Pittau	9d9a6bce5c	Refactor part of image module Shuffle some functions around and reduce size of _is_bootloader_loaded moving logic out to a new function. Change-Id: I9c10bf05186dcebb37f175d61bf4ac9ff86b6510	2020-07-07 10:44:50 +02:00
Zuul	2e9620a2c0	Merge "Limit Inspection->Lookup->Heartbeat lag"	2020-07-06 18:08:14 +00:00
Zuul	6218725610	Merge "Fix serializing ironic-lib exceptions"	2020-07-06 16:47:58 +00:00
Julia Kreger	c76b8b2c21	Limit Inspection->Lookup->Heartbeat lag Caches hardware information collected during inspection so that the initial lookup can occur without any delay. Also adds logging to track how long inventory collection takes. Co-Authored-By: Dmitry Tantsur <dtantsur@protonmail.com> Change-Id: I3e0d237d37219e783d81913fa6cc490492b3f96a	2020-07-03 10:32:26 +02:00
Dmitry Tantsur	ba3caa6c64	Increase the ESP partition size to 550 MiB when using software RAID This has been a popular guidance, and diskimage-builder has recently started following it. Change-Id: I794c846fb191c15b0a30546bf64d624dfbde0fd4	2020-07-02 17:30:33 +02:00
Zuul	de7d5affe7	Merge "Mount all vfat partitions before calling grub2"	2020-07-02 10:37:04 +00:00
Dmitry Tantsur	a4855c544c	Fix serializing ironic-lib exceptions Change-Id: If1408e4b81d263c56b4bbab618dd0737db5f762e Story: #2007889 Task: #40268	2020-07-02 12:18:53 +02:00
Arne Wiebalck	c5022790b3	Mount all vfat partitions before calling grub2 In order to ensure grub2 finds all files it needs, mount all vfat partitions specified in the deployed image. Story: #2007618 Task: #39629 Change-Id: Ie5b6e0abc3f266409562f9ecb26538126b667056	2020-06-30 18:31:58 +02:00
Dmitry Tantsur	00ad03b709	Fixes minor issues in the read() retries patch Follow-up to commit c5b97eb781cf9851f9abe87a1500b4da55b8bde8. Two things slipped through the cracks: * ImageDownloadError was instantiated incorrectly, resulting in a wrong error message. This was uncovered by using assertRaisesRegex in tests. * We allowed calling write(None). This was uncovered by avoiding sleep(4) in tests and enabling more failed calls before timeout. Change-Id: If5e798c5461ea3e474a153574b0db2da96f2dfa8	2020-06-30 10:51:53 +02:00
Zuul	f97f8e2c06	Merge "Fix confusing logging when running asynchronous commands"	2020-06-29 22:40:02 +00:00
Zuul	9219aae291	Merge "Extend retries to 9, 10 seconds apart."	2020-06-29 22:40:01 +00:00
Dmitry Tantsur	0eee26ea66	Fix confusing logging when running asynchronous commands We log them as completed when they start executing. Also fix a problem in remove_large_keys that prevented items with defaultdict from being logged. Change-Id: I34a06cc85f55c693416f8c4c9877d55d6affafc9	2020-06-26 15:19:04 +02:00
Riccardo Pittau	5cc44d251f	Add debug message to node lookup This should help identify the start of the node lookup. Change-Id: I72f0949fee84be5a2b06eab976c5560e252fa63a	2020-06-25 16:04:00 +02:00
Zuul	c94fb84497	Merge "Minor clean-up follow-up to timeout on read() fix"	2020-06-25 10:23:18 +00:00
Julia Kreger	7abda4eefe	Minor clean-up follow-up to timeout on read() fix Just some minor cleanup driven from the review process. Change-Id: I0b3d73c251d6da6d85e11279990dcc36751e27e7	2020-06-24 10:02:28 -07:00
Julia Kreger	c77a7df851	Extend retries to 9, 10 seconds apart. The download retry interval was previously five seconds which is not long enough to recover after a hard network connectivity break where we may be reliant upon network port forwarding hold-down timers or even routing protocol route propagation to recover communication. Previously the time value was 5 seconds, with 3 attempts, meaning 15 seconds total ignoring the error detection timeouts. Now it is 10 seconds, with 10 attempts, meaning 100 seconds before the error detection timeouts. Change-Id: I6d11edc9a3156f2bdc21c3d432ecc7625d652699	2020-06-23 20:27:49 +00:00
Julia Kreger	159ab9f0ce	Add full download retries Instead of just trying to get the connection and handler for the download, let's try to retry the whole action of downloading. Change-Id: I9217792d32e6f33c70f146a9b7d3ef58c5644d8a	2020-06-23 20:27:41 +00:00
Julia Kreger	c5b97eb781	Add timeout operations to try and prevent hang on read() Socket read operations can be blocking and may not timeout as expected when thinking of timeouts at the beginning of a socket request. This can occur when streaming file contents down to the agent and there is a hard connectivity break. In other words, we could be in a situation like: - read(fd, len) - Gets data - Select returns context to the program, we do things with data. hard connectivity break for next 90 seconds - read(fd, len) - We drain the in-memory buffer side of the socket. - Select returns context, we do things with our remaining data Server retransmits Server times out due to no ack Server closes socket and issues a FIN,RST packet to the client Connectivity restored, Client never got FIN,RST Client socket still waiting for more data - read(fd, len) - No data returned - Select returns, yet we have no data to act on as the buffer is empty OR the buffered data doesn't meet our required read len value. tl;dr noop - read(fd, len) <-- We continue to try and read until the socket is recognized as dead, which could be a long time. NOTE: The above read()s are python's read() on contents being streamed. Lower level reads exist, but brains will hurt if we try to cover the dynamics at that level. As such, we need to keep an eye on when the last time we received a packet, and treat that as if we have timed out or not. Requests periodically yields back even when no data has been received, in order to allow the caller to wall clock the progress/status and take appropriate action. When we exceed the timeout time value with our wall clock, we will fail the download. Change-Id: I7214fc9dbd903789c9e39ee809f05454aeb5a240	2020-06-23 13:25:09 -07:00
Kaifeng Wang	61c95554ff	Adds poll mode deployment support Adds a new poll extension to provide get_hardware_info and get_node_info interfaces. get_hardware_info will be used for node validation by ironic deploy drivers. get_node_info will be used for sending lookup data to IPA. standalone mode is assumed as debug only, but it's not the case considering the poll mode will be introduced, slightly updates the description, also prevents the mdns lookup when standalone is true. Story: 1526486 Task: 28724 Change-Id: I5ad772a18cc4584585c5a7b6fb127547cece1998	2020-06-21 16:44:00 +08:00
Zuul	46bf7e0ef4	Merge "Add a deploy step for writing an image"	2020-06-20 00:00:10 +00:00
Dmitry Tantsur	6d7ec350ff	Make get_partition_uuids work with whole disk images We used to populate root UUID inside the message formatting function, move it to actual prepare_image/cache_image calls. Change-Id: Ifb22220dfd49633e8623dd76f7a6a128f5874b78	2020-06-17 14:38:58 +02:00
Zuul	751dac7b90	Merge "Split and move logic for partition tables"	2020-06-11 22:53:07 +00:00
Zuul	d7cf7bd341	Merge "New extension call to return partition UUIDs"	2020-06-09 12:31:55 +00:00
Dmitry Tantsur	7e5fe1121e	Make the install_bootloader command asynchronous It does not return anything, so it makes no point for it to be synchronous. Ironic always calls it with wait=True, so there is no problem with backward compatibility either. Change-Id: I44fec2e0cb54486328ce71263613d8592e384870	2020-06-08 15:10:05 +02:00
Dmitry Tantsur	9d4cf5532f	Add a deploy step for writing an image The new step just invokes the appropriate method of the standby extension. Change-Id: Ic74f83ab2b7e58f8e4b46e0abfab79e221afeb3e Story: 2006963	2020-06-02 15:23:54 +02:00
Dmitry Tantsur	6c1545b75b	New extension call to return partition UUIDs Currently we parse the success message from the write_image call. This is inconvenient and incompatible with the deploy steps split. Change-Id: I258dc1ff1ad1c9df5cbc26a7825d9e7ef2f3205b Story: #2006963	2020-06-02 15:05:59 +02:00
Fedor Tarasenko	952489020e	Fix an issue with high cpu usage caused by ironic-python-agent Currently running of ipa-centos8-stable-ussuri image causes 100% cpu usage while cleaning. Proposed change fixes this behavior and significantly speeds up cleaning. Change-Id: I2ba9a69f22b11830d8ff1bc346b17bf1a52f25b0 Story: #2007696 Task: #39809	2020-05-25 22:18:17 +03:00
Riccardo Pittau	557d5603a2	Split and move logic for partition tables Move and split the logic to create the partition tables when applying raid configuration. Change-Id: Ic76dd2067ace02dd02351caca0c7f9b05571e510	2020-05-25 08:11:28 +00:00
Zuul	fc7ac48a6e	Merge "Fix pep8 errors"	2020-05-13 11:27:36 +00:00
Riccardo Pittau	f6ee877cde	Fix pep8 errors For some reason pep8 test started to complain causing mayhem. This patch fixes the issues and does some refactor of dmi_inspector tests moving pure data to a separate file. Change-Id: Ia244a496acd80abad679f8ae9832d4f0471500e7	2020-05-12 10:57:23 +02:00
Riccardo Pittau	8d210638a8	Fix TypeError with newer version of lshw The issue with json output in lshw was fixed in version B.02.19 This patch makes the memory calculation compatible with that version and later versions that are included in recent distributions (e.g. Ubuntu 20.04, Fedora 31) Change-Id: Id5a30028b139c51cae6232cac73a50b917fea233 Story: 2007588 Task: 39527	2020-04-27 15:07:54 +02:00
Riccardo Pittau	2738e57f2a	Add function to calculate memory Move logic to calculate memory to its own function. Change-Id: I5ab98b6450ff45dff35ddae093a83140f37047a8	2020-04-27 10:46:05 +02:00
Dmitry Tantsur	8adb7e1a04	Add timeout and retries when connecting to an image server If the server is stuck for any reason, the download will hang for a potentially long time. Provide a timeout (defaults to 60 seconds) and 2 retries on failure. Change-Id: Ie53519266edd914fdbfa82fe52b4a55151e5ec5f	2020-04-24 10:34:40 +02:00
Zuul	5dcff4d2b3	Merge "Add raid.apply_configuration deploy step"	2020-04-21 09:32:40 +00:00
Zuul	6a95e216f6	Merge "Simplify deduplicate_steps"	2020-04-21 00:37:15 +00:00
Dmitry Tantsur	896f389d5c	Mock get_node_boot_mode in software RAID unit tests This function checks for /sys/firmware/efi. Some tests do not mock isdir, so they fail on UEFI machines. Change-Id: I088218ddb88717ac07669d0b97c6cd50208ede8c	2020-04-20 16:41:28 +02:00


807 Commits
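
The wall-clock check described in commit c5b97eb781 (track when data was last received, and fail the download once the gap exceeds a timeout) can be sketched roughly as below. This is an illustrative sketch only, not the agent's actual code: `DownloadStalledError`, `read_with_stall_timeout`, and the convention that the stream yields an empty `b""` chunk when no data has arrived are all assumptions made for the example.

```python
import time


class DownloadStalledError(Exception):
    """Hypothetical error raised when no data arrives within the timeout."""


def read_with_stall_timeout(chunks, timeout, clock=time.monotonic):
    """Yield non-empty chunks; fail if no data arrives for `timeout` seconds.

    `chunks` is any iterator that periodically yields control back to the
    caller, producing b"" when nothing was received (an assumption standing
    in for the streamed response).
    """
    last_data = clock()  # wall-clock time of the most recent real data
    for chunk in chunks:
        if chunk:
            # Real data arrived: reset the stall timer and pass it on.
            last_data = clock()
            yield chunk
        elif clock() - last_data > timeout:
            # Only empty chunks for longer than `timeout`: give up.
            raise DownloadStalledError(
                "no data received for more than %s seconds" % timeout)
```

The `clock` parameter exists only so the timing can be driven by a fake clock in tests instead of sleeping.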