Certain filesystems are sometimes used in specialty computing
environments where a shared storage infrastructure or fabric exists.
These filesystems allow multi-host shared, concurrent read/write
access to the underlying block device by *not* locking the entire
device for exclusive use. Generally, ranges of the disk are reserved
for each interacting node to write to, and locking schemes are used
to prevent collisions.
These filesystems are common for use cases where high availability
is required, or where the ability of individual computers to
collaborate on a given workload is critical, such as a group of
hypervisors supporting virtual machines, because they allow for
nearly seamless transfer of a workload from one machine to another.
Similar technologies are also used for cluster quorum and cluster
durable state sharing; however, that is not specifically considered
in scope.
Where things get difficult is that the entire device is not
exclusively locked with these storage fabrics, and in some cases
locking is handled by a Distributed Lock Manager on the network, or
via special sector interactions amongst the cluster members which
understand and support the filesystem.
As a result of this IO/interaction model, an Ironic-Python-Agent
performing cleaning can effectively destroy the cluster just by
attempting to clean storage which it perceives as attached locally.
This is not IPA's fault; this case often occurs when a Storage
Administrator forgot to update LUN masking or volume settings on
a SAN as they relate to an individual host in the overall
computing environment. The net result of one node cleaning the
shared volume may include restoration from snapshot or backup
storage, or may ultimately cause permanent data loss, depending
on the environment and the usage of that environment.
Included in this patch:
- IBM GPFS - Can be used on a shared block device, at least according
  to IBM's documentation. The standard use of GPFS is more Ceph-like
  in design; however, GPFS is also a specially licensed commercial
  offering, so it is a red flag if this is encountered and should be
  investigated by the environment's systems operator.
- Red Hat GFS2 - Is used with shared common block devices in clusters.
- VMware VMFS - Is used with shared SAN block devices, as well as
local block devices. With shared block devices,
ranges of the disk are locked instead of the whole
disk, and the ranges are mapped to virtual machine
disk interfaces.
It is unknown, due to lack of information, if this
will detect and prevent erasure of VMFS logical
extent volumes.
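As a rough sketch of the kind of guard this implies (not the actual
agent code; the filesystem type strings and the helper name are
illustrative only), the presence of such filesystems can be checked
via lsblk before any destructive action::

    import subprocess

    # Filesystem signatures that indicate a shared/clustered block device
    # (illustrative list based on the cases described above).
    CLUSTER_FS_TYPES = {'gfs2', 'gpfs', 'VMFS_volume_member'}

    def hosts_cluster_filesystem(device):
        """Return True if lsblk reports a clustered filesystem on the device."""
        out = subprocess.check_output(
            ['lsblk', '--noheadings', '--output', 'FSTYPE', device],
            encoding='utf-8')
        found = {line.strip() for line in out.splitlines() if line.strip()}
        return bool(found & CLUSTER_FS_TYPES)

    # Example guard before cleaning:
    #   if hosts_cluster_filesystem('/dev/sdb'):
    #       raise RuntimeError('Refusing to erase /dev/sdb: it appears to '
    #                          'be part of a shared cluster filesystem.')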
Co-Authored-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ic8cade008577516e696893fdbdabf70999c06a5b
Story: 2009978
Task: 44985
Currently, if smartctl is not found by IPA, it will silently skip ATA
secure erase and proceed to shred (if enabled). This is supposedly for
backwards compatibility, but is quite hard to diagnose.
This change adds a warning message to make it more obvious what is
happening.
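Conceptually, the added check amounts to something like the following
sketch (logger name and wording are illustrative)::

    import logging
    import shutil

    LOG = logging.getLogger(__name__)

    if shutil.which('smartctl') is None:
        # Previously this situation was silent; now the operator gets a hint.
        LOG.warning('smartctl is not available; skipping ATA secure erase '
                    'and falling back to shred if it is enabled.')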
TrivialFix
Change-Id: I03a381e99de79f201ec7e9a388777c3d48457e93
The udev prefix is DM_, not ID_, for device-mapper devices. On top of
that, they do not have short serials (or at least do not always have
them).
Change-Id: I5b6075fbff72201a2fd620f789978acceafc417b
The lsblk output is available in JSON format since version 2.27 of
util-linux [1].
[1] https://mirrors.edge.kernel.org/pub/linux/utils/util-linux/v2.27/v2.27-ReleaseNotes
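As an illustration of what this enables (the field selection is an
example, not the agent's exact query), the output can be parsed
directly as JSON::

    import json
    import subprocess

    # Requires util-linux >= 2.27 for the --json flag.
    out = subprocess.check_output(
        ['lsblk', '--json', '--bytes', '--output', 'NAME,SIZE,TYPE'],
        encoding='utf-8')
    for dev in json.loads(out)['blockdevices']:
        print(dev['name'], dev['size'], dev['type'])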
Change-Id: I0c5812736b7a320cc4ecc333f80db70eb78cc76d
Removes multipath base devices from consideration by
default, and instead allows the device-mapper device
managed by multipath to be picked up and utilized
instead.
In effect, allowing us to ignore standby paths *and*
leverage multiple concurrent IO paths if so offered
via ALUA.
In reality, anyone who has previously built IPA with
multipath tooling might not have encountered issues
because they used Active/Active SAN storage
environments; those worked because the IO lock
would have been exchanged between controllers and paths.
However, Active/Passive environments will block passive
paths from access, ultimately preventing new locks from
being established without proper negotiation. This ultimately
requires multipathing *and* an agent smart enough
to know to disqualify underlying paths to backend storage
volumes.
An additional benefit of this change is that active/active MPIO
devices will, as long as ``multipath`` is present inside the ramdisk,
no longer possibly result in duplicate IO wipes occurring
across numerous devices.
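A hedged sketch of the underlying check (helper name illustrative):
``multipath -c`` exits zero when a block device is a path managed by
multipathd, so such base paths can be skipped in favour of the
device-mapper device::

    import subprocess

    def is_multipath_path_member(device):
        """Return True if the device is a member path of a multipath device."""
        try:
            subprocess.run(['multipath', '-c', device], check=True,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
            return True
        except (FileNotFoundError, subprocess.CalledProcessError):
            # multipath tooling is absent or the device is not a managed path.
            return False

    # Candidate devices for cleaning would then exclude member paths:
    #   devices = [d for d in devices if not is_multipath_path_member(d)]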
Story: #2010003
Task: #45108
Resolves: rhbz#2076622
Resolves: rhbz#2070519
Change-Id: I0fd6356f036d5ff17510fb838eaf418164cdfc92
https://storyboard.openstack.org/#!/story/2008290 added support
for NVMe-native storage cleaning, greatly improving storage clean
times on NVMe-based nodes as well as reducing device wear.
This is a follow-up change which aims to make further improvements
to cleaning efficiency in mixed NVMe-HDD environments. This is
achieved by combining NVMe-native cleaning methods on NVMe devices
with traditional metadata clean on non-NVMe devices.
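In rough, illustrative terms (the function names below are
placeholders, not the agent's API), the step dispatches per device
type::

    def _nvme_native_clean(dev):
        # Placeholder for an NVMe-native erase (e.g. via nvme-cli format).
        print('would run an NVMe-native clean on', dev)

    def _metadata_clean(dev):
        # Placeholder for a traditional metadata wipe.
        print('would wipe metadata on', dev)

    def erase_devices(devices):
        """Illustrative dispatch only: NVMe devices get native cleaning."""
        for dev in devices:
            if dev.startswith('/dev/nvme'):
                _nvme_native_clean(dev)
            else:
                _metadata_clean(dev)

    erase_devices(['/dev/nvme0n1', '/dev/sda'])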
Story: 2009264
Task: 43498
Change-Id: I445d8f4aaa6cd191d2e540032aed3148fdbff341
Depending on how the stars align with partition images
being written to a remote system, we *may* end up with
*either* a partition UUID (PARTUUID) value or a partition's
filesystem UUID value, which are distinctly different.
This is because the value, when collected as a result of writing
an image to disk, falls back and is passed along to enable
partition discovery and matching.
Later on, when we realized we ought to create an fstab entry,
we blindly re-used the value, thinking it was, indeed, always
a partition's filesystem UUID and not the partition UUID. Obviously,
the label type is quite explicit, either UUID or PARTUUID
respectively, when initial ramdisk utilities such as dracut
are searching for and mounting filesystems.
This adds the capability to identify the correct label to utilize
based upon the current state of the block devices on disk.
Granted, we are likely only exposed to this because of IO
race conditions under high concurrency load operations.
Normally this would only be seen on test VMs, but
systems backed by a Storage Area Network *can*
exhibit the same IO race conditions as virtual machines.
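A simplified sketch of the idea, assuming lsblk is available (the
helper name is illustrative): check which identifier the stored value
actually matches on disk and emit the corresponding fstab label::

    import shlex
    import subprocess

    def fstab_label_for(value):
        """Return 'UUID' or 'PARTUUID', whichever the given value matches."""
        out = subprocess.check_output(
            ['lsblk', '--noheadings', '--pairs', '--output', 'UUID,PARTUUID'],
            encoding='utf-8')
        for line in out.splitlines():
            fields = dict(item.split('=', 1) for item in shlex.split(line))
            if fields.get('UUID') == value:
                return 'UUID'
            if fields.get('PARTUUID') == value:
                return 'PARTUUID'
        return 'UUID'  # preserve the historical behaviour if nothing matches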
Change-Id: I953c936cbf8fad889108cbf4e50b1a15f511b38c
Resolves: rhbz#2058717
Story: #2009881
Task: 44623
Move the software RAID code path from grub2-install to
efibootmgr:
- remove the UEFI efibootmgr exception for software RAID
- create and populate the ESPs on the holder disks
- update the NVRAM with all ESPs (the component devices
of the ESP mirror, use unique labels to avoid unintentional
deduplication of entries in the NVRAM)
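For illustration only (disk names, partition numbers, loader path and
labels are examples, not the values used by the agent), each component
ESP ends up with its own NVRAM entry::

    import subprocess

    # One ESP per holder disk of the software RAID mirror (example layout).
    esps = [('/dev/sda', 1), ('/dev/sdb', 1)]

    for index, (disk, part) in enumerate(esps):
        subprocess.run(
            ['efibootmgr', '--create',
             '--disk', disk, '--part', str(part),
             '--loader', '\\EFI\\BOOT\\BOOTX64.EFI',
             # A unique label per ESP component avoids entries being
             # deduplicated away in the NVRAM.
             '--label', 'ironic-esp-%d' % index],
            check=True)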
Story: #2009794
Change-Id: I7ed34e595215194a589c2f1cd0b39ff0336da8f1
Replace the execute wrapper from utils with execute from ironic-lib in
hardware.py
Adjust unit tests as needed.
Change-Id: I63a3b0407b2ca2246bd0e6624bfa0f748c0d73f7
We use basically the same function in two modules in the same way;
let's put it in a common place.
Change-Id: I4016e43f2cb102d4327bafcc8a2f90112a6f944a
Use add instead of update to re-read the partition table with partx.
See [1] for more details.
Co-authored-by: Arne Wiebalck <arne.wiebalck@cern.ch>
[1] https://opendev.org/openstack/ironic-python-agent/commit/dc8c1f16f9a00e2bff21612d1a9cf0ea0f3addf0
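In other words (shown here with subprocess for illustration rather
than the agent's execute helper)::

    import subprocess

    # 'partx -a' adds any new partition entries to the kernel's view, while
    # 'partx -u' only updates entries the kernel already knows about.
    # check=False because -a may return non-zero for partitions that are
    # already registered.
    subprocess.run(['partx', '-a', '/dev/sda'], check=False)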
Change-Id: I2336e22dadc790cfbde87904612fcaa3b8c501db
This patch fixes a race during software RAID creation:
we create the partition with parted, the kernel then
notifies udev, but we need to wait for udevd to create
the device files before calling mdadm to create the
md device.
Credits to jcosmao for finding this.
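A hedged sketch of the ordering this enforces (device names are
examples; the agent uses its own execute wrapper rather than
subprocess)::

    import subprocess

    # 1. Create the partition that will hold the md component device.
    subprocess.run(['parted', '-s', '/dev/sda', 'mkpart', 'primary',
                    '0%', '100%'], check=True)

    # 2. Wait for udevd to finish creating the device files for the new
    #    partition before handing it over to mdadm.
    subprocess.run(['udevadm', 'settle'], check=True)

    # 3. Only now build the md device on top of the component partitions.
    subprocess.run(['mdadm', '--create', '/dev/md0', '--level=1',
                    '--raid-devices=2', '/dev/sda1', '/dev/sdb1'],
                   check=True)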
Change-Id: I642f28acc351cf50263e37dfbc8468bf59de2cc5
One debug message only specified "Skipping" without any details.
Another did not log the whole line from lsblk. Fix both.
Change-Id: I9f8f4edad88ba2df5abc6a45a74ebdb3c7afcf97
This means we do not have to rely on modprobe idempotency as
much, and it reduces code duplication, which is always nice.
Signed-off-by: Jonas Schäfer <jonas.schaefer@cloudandheat.com>
Change-Id: I996aba47bc54309e15e7d56e4a96b23b8deb5c9c
This exposes the MAC address of the first LAN channel with an assigned
IP address in the inventory data. This is useful for inventory
processes where the asset number is not discoverable from the software
side: the BMC MAC is going to be unique (at least within an
organization).
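Roughly, this corresponds to querying the BMC over IPMI; the channel
number, parsing and helper name below are illustrative::

    import re
    import subprocess

    def bmc_mac(channel=1):
        """Return the BMC MAC address for the given LAN channel, if any."""
        out = subprocess.check_output(
            ['ipmitool', 'lan', 'print', str(channel)], encoding='utf-8')
        match = re.search(r'^MAC Address\s*:\s*(\S+)', out, re.MULTILINE)
        return match.group(1) if match else None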
Change-Id: I8a4bee0c25743befd7f2033e4e0cba26895c8926
Add a clean step for network burn-in via fio. Get basic
run parameters from the node's driver_info.
Story: #2007523
Task: #42385
Change-Id: I2861696740b2de9ec38f7e9fc2c5e448c009d0bf
Add a clean step for disk burn-in via fio. Get basic
run parameters from the node's driver_info.
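Purely as an illustration (the target device and parameter values are
examples, not the step's defaults), the clean step boils down to an
fio run along these lines::

    import subprocess

    # Destructive: writes directly to the example device.
    subprocess.run(
        ['fio', '--name=burnin-disk',
         '--filename=/dev/sda',       # example target device
         '--rw=readwrite',            # mixed sequential read/write workload
         '--direct=1',                # bypass the page cache
         '--runtime=600',             # seconds; IPA takes this from driver_info
         '--time_based',
         '--output-format=json'],
        check=True)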
Story: #2007523
Task: #42384
Change-Id: I5f5e336bd629846b3d779fd0fc7a2060b385b035
Add a clean step for memory burn-in via stress-ng. Get basic
run parameters from the node's driver_info.
Story: #2007523
Task: #42383
Change-Id: I33a83968c9f87cf795ec7ec922bce98b52c5181c
Add a clean step for CPU burn-in via stress-ng. Get basic
run parameters from the node's driver_info.
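For illustration (the values are examples; the agent takes them from
driver_info)::

    import subprocess

    subprocess.run(
        ['stress-ng',
         '--cpu', '0',           # 0 means one stressor per online CPU
         '--timeout', '600s',    # run length
         '--metrics-brief'],     # print a short metrics summary at the end
        check=True)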
Story: #2007523
Task: #42382
Change-Id: I14fd4164991fb94263757244f716b6bfe8edf875
Due to a regression in lshw introduced by
https://github.com/lyonel/lshw/pull/60, there are some versions in the
wild that do not return sizes for memory banks <32GiB. In those cases,
work around the problem by looking at the top-level size (if available)
to find the total size. Previously we assumed that we only needed the
top-level size when there was no list of memory banks.
The issue is fixed upstream by https://github.com/lyonel/lshw/pull/65,
but the erroneous patch is still present in the lshw-B.02.19.2-5.el8
package in CentOS 8.4 and 8.5.
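A simplified sketch of the resulting fallback (the data layout is
abbreviated; the real lshw JSON output is nested)::

    def total_memory_bytes(memory_node):
        """Sum bank sizes, falling back to the top-level size if needed."""
        banks = memory_node.get('children', [])
        bank_total = sum(bank.get('size', 0) for bank in banks)
        if bank_total:
            return bank_total
        # Affected lshw versions omit 'size' on banks smaller than 32 GiB,
        # so fall back to the size reported on the memory node itself.
        return memory_node.get('size', 0)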
Change-Id: I6eb5981d28b9ae368239af0c1d0ec32ff79d95b3
Story: #2008865
Task: 42395
Currently, if one interface cannot be handled (e.g. it has an empty
MAC), the whole collection fails. Ignore unsupported interfaces instead.
Change-Id: Ibdaad62b39c239d4f3fb3111c2fae9e31e877b28
For some drives, a partition (e.g. `/dev/sda1`) will not have the
'ro' file, which can result in a metadata erasure failure, but the
base device (`/dev/sda`) will have this file. Add an additional
check for the base device.
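Conceptually (paths and defaults are illustrative), the check now
falls back from the partition to its base device::

    import os

    def is_read_only(base_device='sda', partition='sda1'):
        """Check the sysfs 'ro' flag for the partition, then the base device."""
        candidates = [
            '/sys/block/%s/%s/ro' % (base_device, partition),  # partition flag
            '/sys/block/%s/ro' % base_device,                  # base device flag
        ]
        for path in candidates:
            if os.path.isfile(path):
                with open(path) as f:
                    return f.read().strip() == '1'
        # Neither file exists; assume the device is writable.
        return False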
Change-Id: Ia01bdbf82cee6ce15fabdc42f9c23036df55b4c5
Story: 2008696
Task: 42004
This change adds the '-f' flag to nvme-cli calls during NVMe Secure
Erase. This removes the nvme-cli warning that the device is about to
be irreversibly deleted, as well as the related 10 second delay, which
pointlessly increases NVMe cleaning time.
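Concretely, the invocation now looks roughly like this (the device
name and secure erase setting are examples)::

    import subprocess

    # --force skips nvme-cli's interactive warning and its ~10 second grace
    # period; --ses selects the secure erase setting for the format.
    subprocess.run(['nvme', 'format', '/dev/nvme0n1', '--ses=1', '--force'],
                   check=True)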
Story: 2008290
Change-Id: I7b7b8b7d4f643b07d5c9dcf7ec35cf7ebedf44d1
At the moment, it is not possible for Ironic to clean up a
RAID array that is built from an entire device. This patch
allows it to do so by overriding the behaviour of attempting
to find the device name if the device name does not end with
a number and is a real block device.
Story: #2008663
Task: #41948
Change-Id: I66b0990acaec45b1635795563987b99f9fa04ac7
This change adds support for utilising NVMe-specific cleaning tools
on supported devices. This removes the necessity of using shred to
securely delete the contents of an NVMe drive and enables using
nvme-cli tools instead, improving cleaning performance and reducing
wear on the device.
Story: 2008290
Task: 41168
Change-Id: I2f63db9b739e53699bd5f164b79640927bf757d7
To reduce size of the hardware module and separate the raid specific
code in raid_utils, we move some functions and adapt the tests.
Change-Id: I73f6cf118575b627e66727d88d5567377c1999a0
This change adds a deploy step inject_files that adds a flexible
way to inject files into the instance.
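Purely as an illustration of the idea (the key names below are
assumptions for this sketch, not the authoritative argument schema),
a deploy step invocation could carry arguments shaped like this::

    # Hypothetical example arguments for the inject_files deploy step.
    inject_files_args = {
        'files': [
            {
                'path': '/etc/motd',                    # where to place the file
                'content': 'SGVsbG8gZnJvbSBJcm9uaWMK',  # base64-encoded payload
                'mode': 0o644,                          # resulting permissions
            },
        ],
    }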
Change-Id: I0e70a2cbc13744195c9493a48662e465ec010dbe
Story: #2008611
Task: #41794
Partition images written through the agent have the unfortunate
side effect of being processed without full node context
by default. Luckily, we have had a similar problem before and
already cache the node.
This patch changes the lookup from a default of msdos
partitions to use the cached node object.
Change-Id: I002816c9372fdf1cc32f3c67f420073551479fd9