285 Commits

Author SHA1 Message Date
Julia Kreger
014d37743a Multipath Hardware path handling
Removes multipath base devices from consideration by
default, and instead allows the device-mapper device
managed by multipath to be picked up and utilized
instead.

In effect, allowing us to ignore standby paths *and*
leverage multiple concurrent IO paths if so offered
via ALUA.

In reality, anyone who has previously built IPA with
multipath tooling might not have encountered issues
previously because they used Active/Active SAN storage
environments. They would have worked because the IO lock
would have been exchanged between controllers and paths.
However, Active/Passive environments will block passive
paths from access, ultimately preventing new locks from
being established without proper negotiation. Ultimately
requiring multipathing *and* the agent to be smart enough
to know to disqualify underlying paths to backend storage
volumes.

An additional benefit of this is active/active MPIO devices
will, as long as ``multipath`` is present inside the ramdisk,
no longer possibly result in duplicate IO wipes occuring
accross numerous devices.

Story: #2010003
Task: #45108
Resolves: rhbz#2076622
Resolves: rhbz#2070519
Change-Id: I0fd6356f036d5ff17510fb838eaf418164cdfc92
2022-05-18 20:26:39 -03:00
Zuul
f08f70134d Merge "Improve efficiency of storage cleaning in mixed media envs" 2022-03-15 18:05:29 +00:00
Jacob Anders
c5f7f18bcb Improve efficiency of storage cleaning in mixed media envs
https://storyboard.openstack.org/#!/story/2008290 added support
for NVMe-native storage cleaning, greatly improving storage clean
times on NVMe-based nodes as well as reducing device wear.

This is a follow up change which aims to make further improvements
to cleaning efficiency in mixed NVMe-HDD environments. This is
achieved by combining NVMe-native cleaning methods on NVMe devices
with traditional metadata clean on non-NVMe devices.

Story: 2009264
Task: 43498
Change-Id: I445d8f4aaa6cd191d2e540032aed3148fdbff341
2022-03-15 19:00:25 +10:00
Julia Kreger
99ca1086db Create fstab entry with appropriate label
Depending on the how the stars align with partition images
being written to a remote system, we *may* end up with
*either* a Partition UUID value, or a Partition's UUID value.

Which are distinctly different.

This is becasue the value, when collected as a result of writing
an image to disk *falls* back and passes the value to enable
partition discovery and matching.

Later on, when we realized we ought to create an fstab entry,
we blindly re-used the value thinking it was, indeed, always
a Partition's UUID and not the Partition UUID. Obviously,
the label type is quite explicit, either UUID or PARTUUID
respectively, when initial ramdisk utilities such as dracut
are searching and mounting filesystems.

Adds capability to identify the correct label to utilize
based upon the current state of the block devices on disk.

Granted, we are likely only exposed to this because of IO
race conditions under high concurrecy load operations.
Normally this would only be seen on test VMs, but
systems being backed by a Storage Area Network *can*
exibit the same IO race conditions as virtual machines.

Change-Id: I953c936cbf8fad889108cbf4e50b1a15f511b38c
Resolves: rhbz#2058717
Story: #2009881
Task: 44623
2022-03-10 07:04:01 -08:00
Arne Wiebalck
62c5674a60 SoftwareRAID: Use efibootmgr (and drop grub2-install)
Move the software RAID code path from grub2-install to
efibootmgr:

- remove the UEFI efibootmgr exception for software RAID
- create and populate the ESPs on the holder disks
- update the NVRAM with all ESPs (the component devices
  of the ESP mirror, use unique labels to avoid unintentional
  deduplication of entries in the NVRAM)

Story: #2009794

Change-Id: I7ed34e595215194a589c2f1cd0b39ff0336da8f1
2022-01-26 14:43:40 +01:00
Riccardo Pittau
7b03fbbb36 Call execute from ironic-lib in hardware.py
Replace the execute wrapper from utils with execute from ironic-lib in
hardware.py

Adjust unit tests as needed.

Change-Id: I63a3b0407b2ca2246bd0e6624bfa0f748c0d73f7
2021-11-18 07:52:48 +01:00
Riccardo Pittau
a799dcc422 Move rescan device function to general utils
We use basically the same function in two modules in the same way, let's
put that in a common place.

Change-Id: I4016e43f2cb102d4327bafcc8a2f90112a6f944a
2021-11-10 15:34:37 +01:00
Riccardo Pittau
23e67b5fea Re-read the partition table with partx -a, part 2
Use add instead of update to re-read the partition table with partx.

See [1] for more details.

Co-authored-by: Arne Wiebalck <arne.wiebalck@cern.ch>

[1] https: //opendev.org/openstack/ironic-python-agent/commit/dc8c1f16f9a00e2bff21612d1a9cf0ea0f3addf0

Change-Id: I2336e22dadc790cfbde87904612fcaa3b8c501db
2021-11-09 13:03:14 +01:00
Arne Wiebalck
9d707e9f4b Software RAID: Call udev_settle before creation
This patch fixes a race during software RAID creation:
we create the partition with parted, the kernel then
notifies udev, but we need to wait for udevd to create
the device files before calling mdadm to create the
md device.

Credits to jcosmao for finding this.

Change-Id: I642f28acc351cf50263e37dfbc8468bf59de2cc5
2021-10-05 11:42:49 +02:00
Dmitry Tantsur
07ff3b8bbc Trivial: better debugging in list_all_block_devices
One debug message only specified "Skipping" without any details.
Another did not log the whole line from lsblk. Fix both.

Change-Id: I9f8f4edad88ba2df5abc6a45a74ebdb3c7afcf97
2021-08-27 12:19:28 +02:00
Zuul
438a1f4445 Merge "Move loading of IPMI module loading to a single point" 2021-08-23 16:14:14 +00:00
Zuul
71f54b7f98 Merge "Increase version of hacking and pycodestyle" 2021-08-11 10:02:24 +00:00
Jonas Schäfer
6441db61ce Move loading of IPMI module loading to a single point
This means we do not have to rely on modprobe idempotency as
much and it's less code duplication, which is always nice.

Signed-off-by: Jonas Schäfer <jonas.schaefer@cloudandheat.com>

Change-Id: I996aba47bc54309e15e7d56e4a96b23b8deb5c9c
2021-08-06 13:14:45 +02:00
Jonas Schäfer
61af712fe5 Expose BMC MAC address in inventory data
This exposes the MAC address of the first LAN channel with an assigned
IP address in the inventory data. This is useful for inventory
processes where the asset number is not discoverable from the software
side: the BMC MAC is going to be unique (at least within an
organization).

Change-Id: I8a4bee0c25743befd7f2033e4e0cba26895c8926
2021-08-06 13:14:45 +02:00
Riccardo Pittau
efbbc86f53 Increase version of hacking and pycodestyle
Fix H904 "Delay string interpolations at logging calls" errors

Change-Id: I331808d0132094faf739998a6984440787d3ebf8
2021-07-30 14:34:33 +02:00
Arne Wiebalck
cacdd9bab3 Burn-in: Add network step
Add a clean step for network burn-in via fio. Get basic
run parameters from the node's driver_info.

Story: #2007523
Task: #42385

Change-Id: I2861696740b2de9ec38f7e9fc2c5e448c009d0bf
2021-07-13 11:36:31 +02:00
Arne Wiebalck
20c5894bc2 Burn-in: Add disk step
Add a clean step for disk burn-in via fio. Get basic
run parameters from the node's driver_info.

Story: #2007523
Task: #42384

Change-Id: I5f5e336bd629846b3d779fd0fc7a2060b385b035
2021-05-21 16:33:11 +02:00
Zuul
823e0ed743 Merge "Burn-in: Add memory step" 2021-05-11 09:31:54 +00:00
Zuul
5c01ec4f6f Merge "Burn-in: Add CPU step" 2021-05-10 15:00:14 +00:00
Arne Wiebalck
5c222560f0 Burn-in: Add memory step
Add a clean step for memory burn-in via stress-ng. Get basic
run parameters from the node's driver_info.

Story: #2007523
Task: #42383

Change-Id: I33a83968c9f87cf795ec7ec922bce98b52c5181c
2021-05-01 10:36:58 +02:00
Arne Wiebalck
6702fcaa43 Burn-in: Add CPU step
Add a clean step for CPU burn-in via stress-ng. Get basic
run parameters from the node's driver_info.

Story: #2007523
Task: #42382

Change-Id: I14fd4164991fb94263757244f716b6bfe8edf875
2021-05-01 10:36:20 +02:00
Zuul
10c29cdc41 Merge "Fix getting memory size in some lshw output" 2021-04-30 12:24:44 +00:00
Zane Bitter
ed791d9778 Fix getting memory size in some lshw output
Due to a regression in lshw introduced by
https://github.com/lyonel/lshw/pull/60, there are some versions in the
wild that do not return sizes for memory banks <32GiB. In those cases,
work around the problem by looking at the top-level size (if available)
to find the total size. Previously we assumed that we only needed the
top-level size when there was no list of memory banks.

The issue is fixed upstream by https://github.com/lyonel/lshw/pull/65,
but the erroneous patch is still present in the lshw-B.02.19.2-5.el8
package in CentOS 8.4 and 8.5.

Change-Id: I6eb5981d28b9ae368239af0c1d0ec32ff79d95b3
Story: #2008865
Task: 42395
2021-04-29 14:41:11 -04:00
Zane Bitter
c56cd4abc0 Fix missing data in log messages
Change-Id: I5d08deed86d79a7ea0b7a1625122af595037dab5
2021-04-29 09:55:56 -04:00
Dmitry Tantsur
1ab405b509 Do not fail network interface collection on unsupported interface
Currently if one interface cannot be handled (e.g. it has empty MAC),
the whole collection fails. Ignore unsupported interfaces instead.

Change-Id: Ibdaad62b39c239d4f3fb3111c2fae9e31e877b28
2021-04-07 17:16:27 +02:00
Bernd Mueller
2a64413bb6 typo chanages -> changes
Change-Id: Ifb75a5f6f01bd98011464eb05f98d8db001dcd54
2021-03-24 13:53:32 +01:00
Bob Fournier
4afe4f6069 Check the base device if the read-only file cannot be read
For some drives, the partition e.g. `/dev/sda1` will not have the
'ro' file which can result in a metadata erasure failure but the base
device (`/dev/sda`) will have this file.  Add an additional check
for the base device.

Change-Id: Ia01bdbf82cee6ce15fabdc42f9c23036df55b4c5
Story: 2008696
Task: 42004
2021-03-09 07:05:27 -05:00
Riccardo Pittau
bff252c726 Remove default parameter from execute
The param check_exit_code from the processutils extension execute has
default already at [0]
See:
https://opendev.org/openstack/oslo.concurrency/src/branch/master/oslo_concurrency/processutils.py#L214

Change-Id: Iedff5325e0737556d5eb3da601c984ddfc633873
2021-03-02 16:19:32 +01:00
Jacob Anders
d2127e7ef4 Remove nvme-cli warning and delay on nvme-format
This change adds '-f' flag to nvme-cli calls during NVMe Secure Erase.
This removes nvme-cli output warning that the device is about to be
irreversibly deleted as well as the related 10 second delay which is
pointlessly increasing NVMe cleaning time.

Story: 2008290
Change-Id: I7b7b8b7d4f643b07d5c9dcf7ec35cf7ebedf44d1
2021-03-02 15:37:35 +10:00
Zuul
4a22c887f8 Merge "Use try_execute from ironic-lib" 2021-03-01 13:54:15 +00:00
Mohammed Naser
ab267aabdd Allow clean_configuration to run against full-device arrays
At the moment, it is not possible for Ironic to clean up a
RAID array that is built from an entire device.  This patch
allows it to do so by overriding the behaviour of attempting
to find the device name if the device names does not end with
a number and is a real block device.

Story: #2008663
Task: #41948
Change-Id: I66b0990acaec45b1635795563987b99f9fa04ac7
2021-02-27 17:24:16 -05:00
Riccardo Pittau
0459c61c8d Use try_execute from ironic-lib
Also adapt unit tests

Change-Id: I37d050877daabc9dc0a5821cf20a689652b26f34
2021-02-25 14:46:17 +01:00
Zuul
6ea3aff8d6 Merge "New deploy step for injecting arbitrary files" 2021-02-22 18:48:22 +00:00
Zuul
2979ee5314 Merge "Add support for using NVMe specific cleaning" 2021-02-19 12:13:55 +00:00
Jacob Anders
8bcf1be920 Add support for using NVMe specific cleaning
This change adds support for utilising NVMe specific cleaning tools
on supported devices. This will remove the neccessity of using shred to
securely delete the contents of a NVMe drive and enable using nvme-cli
tools instead, improving cleaning performance and reducing wear on the device.

Story: 2008290
Task: 41168
Change-Id: I2f63db9b739e53699bd5f164b79640927bf757d7
2021-02-18 22:51:34 +10:00
Riccardo Pittau
7d7940d904 Move some raid specific functions to raid_utils
To reduce size of the hardware module and separate the raid specific
code in raid_utils, we move some functions and adapt the tests.

Change-Id: I73f6cf118575b627e66727d88d5567377c1999a0
2021-02-17 10:11:13 +01:00
Dmitry Tantsur
59cb08fd28 New deploy step for injecting arbitrary files
This change adds a deploy step inject_files that adds a flexible
way to inject files into the instance.

Change-Id: I0e70a2cbc13744195c9493a48662e465ec010dbe
Story: #2008611
Task: #41794
2021-02-16 16:56:52 +01:00
Riccardo Pittau
fc1f2c73c6 Use variable for lsblk columns device info
Adjusted unit tests accordingly.

Also removed redundant parenthesis.

Change-Id: I8e2cac5172f009d5204f83bd83e1f27cfd721f09
2021-02-03 15:31:32 +01:00
Julia Kreger
cb6c0059b5 Fix default disk label with partition images
Partition images through the agent have the unfortunate
side effect of being executed without full node context
by default. Luckilly we've had a similar problem and
cache the node.

This patch changes the lookup from a default of msdos
partitions to use the cached node object.

Change-Id: I002816c9372fdf1cc32f3c67f420073551479fd9
2020-12-14 06:36:18 -08:00
Zuul
1a9491e651 Merge "Bring up VLAN interfaces and include in introspection report" 2020-12-02 13:59:28 +00:00
Zuul
22985da710 Merge "Make mdadm a soft requirement" 2020-11-23 19:37:59 +00:00
Dmitry Tantsur
ab8dee0386 Make mdadm a soft requirement
No point in requiring it for deployments that don't use software RAID.

Change-Id: I8b40f02cc81d3154f98fa3f2cbb4d3c7319291b8
2020-11-20 17:07:00 +01:00
Bob Fournier
6e3f28d720 Bring up VLAN interfaces and include in introspection report
Add the ability to bring up VLAN interfaces and include them in the
introspection report.  A new configuration field is added -
``ipa-enable-vlan-interfaces``, which defines either the VLAN interface
to enable, the interface to use, or 'all' - which indicates all
interfaces.  If the particular VLAN is not provided, IPA will
use the lldp info for the interface to determine which VLANs should
be enabled.

Change-Id: Icb4f66a02b298b4d165ebb58134cd31029e535cc
Story: 2008298
Task: 41183
2020-11-20 10:17:00 -05:00
Zuul
4762aca077 Merge "Add clean step 'erase_pstore'" 2020-11-18 17:38:00 +00:00
Arne Wiebalck
92e26b01e9 Add clean step 'erase_pstore'
Add an automatic clean step to clean the Linux kernel's pstore.
The step is disabled by default.

Story: #2008317
Task: #41214

Change-Id: Ie1a42dfff4c7e1c7abeaf39feca956bb9e2ea497
2020-11-17 18:00:16 +01:00
Vladyslav Drok
3761a44800 Fix vendor info retrieval for some versions of lshw
There is one more place that relies on lshw json output being a dict,
so let's fix the function that gets the dict rather than places it is
being used in.

Change-Id: Ia1c2c2e6a32c76ac0249e6a46e4cced18d6093a9
Task: 39527
Story: 2007588
2020-11-16 15:25:12 +01:00
Zuul
c33b3fff66 Merge "Add UUID to BlockDevice object" 2020-11-11 21:42:51 +00:00
Vladyslav Drok
c7858d3cc8 Add UUID to BlockDevice object
It'd allow for example custom ansible playbooks to use UUIDs of the
introspected node's disks. In future it might also enable agent
to use UUID (or by_path value) to refer to a device instead of
name, as it happens currently.

Change-Id: Id00437d2295c39fb12f3c25a92b30b56a58eef13
2020-11-11 17:25:59 +00:00
Vladyslav Drok
448ded43fe Fix physical memory calculation with new lshw
It seems that fix Id5a30028b139c51cae6232cac73a50b917fea233 was
dealing with a different issue. According to the description
in the story, and the linked commit there, the problem is the
fact that output is changed from dictionary to a list (with just
one value supposedly?). This commit changes the isinstance call
to check if an output of lshw is a list, and if so, we just use
the first element of the list.

Story: 2007588
Task: 39527
Change-Id: I87d87fd035701303e7d530a47b682db84e72ccb9
2020-11-06 19:09:28 +01:00
Zuul
f52863a4d8 Merge "Updated Implementation of string interpolation delay on LOG messages" 2020-11-04 10:45:32 +00:00