953 Commits

Author SHA1 Message Date
Zuul
2172122b87 Merge "Rewrite write_image.sh in Python" 2021-05-26 17:17:02 +00:00
Zuul
b4dd03168e Merge "Enable out-of-order writes when writing whole disk images" 2021-05-25 13:03:29 +00:00
Zuul
6fc5a14760 Merge "Do not serialize command_params" 2021-05-18 14:58:42 +00:00
Dmitry Tantsur
606e500312 Rewrite write_image.sh in Python
Change-Id: I0caa65561948f4e0934943a7a0d3a209701b5a59
2021-05-18 14:45:13 +02:00
Dmitry Tantsur
d1844c61b1 Enable out-of-order writes when writing whole disk images
Per documentation it improves performance when using -O host_device.

Change-Id: Ic6a97af9f865d07c9cb4257397a320475a28f88b
2021-05-18 14:41:21 +02:00
Dmitry Tantsur
51aa31070a Do not serialize command_params
The command params can be huge when configdrive is used. There is no
point in sending them back, Ironic does not use them anyhow.

Story: #2008904
Task: #42479
Change-Id: I6e3db5db2042ca3fb5dafacfacf036fd7fc2fc4c
2021-05-18 12:59:28 +02:00
Zuul
d6e4fbd827 Merge "Remove the iscsi extension" 2021-05-12 11:08:19 +00:00
Zuul
719f20aaf5 Merge "Migrate functional tests for work_on_disk from ironic-lib" 2021-05-12 09:15:49 +00:00
Zuul
823e0ed743 Merge "Burn-in: Add memory step" 2021-05-11 09:31:54 +00:00
Zuul
29f3230791 Merge "Software RAID: RAID the ESPs" 2021-05-11 09:31:36 +00:00
Zuul
9837f1c2f0 Merge "Fix NVMe Partition image on UEFI" 2021-05-10 15:00:21 +00:00
Zuul
5c01ec4f6f Merge "Burn-in: Add CPU step" 2021-05-10 15:00:14 +00:00
Dmitry Tantsur
5492f57dfd Migrate functional tests for work_on_disk from ironic-lib
Missed in commit 24951b1029170840484a50fdd38d2a57858a578c.

Change-Id: Iad5e8f161ac69b96b9332d83fe22b5e0b9192258
2021-05-10 13:00:12 +02:00
Dmitry Tantsur
be3882162e Remove the iscsi extension
Change-Id: I2f0e581575112d6c7ba0d211661cab3e0b6caca6
2021-05-10 12:43:44 +02:00
Zuul
4ac3d79519 Merge "Remove runtime dependency on pbr" 2021-05-04 19:11:39 +00:00
Julia Kreger
fe825fa97e Fix NVMe Partition image on UEFI
The _manage_uefi code has a check where it attempts to just
identify the precise partition number of the device, in order
for configuration to be parsed and passed. However, the same code
did not handle the existence of a `p1` partition instead of just a
partition #1. This is because the device naming format is different
with NVMe and Software RAID.

Likely, this wasn't an issue with software raid due to how complex the
code interaction is, but the docs also indicate to use only whole disk
images in that case.

This patch was pulled down by one of RH's professional services folks,
who has confirmed it does indeed fix the issue at hand. This is noted
as a public comment on the Red Hat bugzilla.
https://bugzilla.redhat.com/show_bug.cgi?id=1954096

Story: 2008881
Task: 42426
Related: rhbz#1954096
Change-Id: Ie3bd49add9a57fabbcdcbae4b73309066b620d02
2021-05-04 16:44:37 +00:00
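The naming difference described in fe825fa97e can be illustrated with a minimal sketch. The helper name `partition_path` is hypothetical and is not the agent's actual `_manage_uefi` code. It only shows the Linux convention that a `p` separator appears when the disk name itself ends in a digit, as with NVMe and md devices.

```python
def partition_path(device: str, number: int) -> str:
    """Build a partition device path (hypothetical helper, for illustration).

    Disks whose names end in a digit (e.g. /dev/nvme0n1, /dev/md0) use a
    'p' separator before the partition number; others (e.g. /dev/sda) do not.
    """
    separator = "p" if device[-1].isdigit() else ""
    return f"{device}{separator}{number}"
```

With this convention, `/dev/sda` and partition 1 give `/dev/sda1`, while `/dev/nvme0n1` and partition 1 give `/dev/nvme0n1p1`.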
Dmitry Tantsur
24951b1029 Import deployment logic from ironic-lib
The two functions work_on_disk and create_config_drive_partition contain
a substantial part of the deployment logic. Previously we placed them in
ironic-lib for re-using on the conductor side in the iSCSI deploy
interface. Since the iSCSI deploy is going away, we can move this code
to ironic-python-agent to simplify maintenance.

Imports code from ironic_lib commit 9fb5be348202f4854a455cd08f400ae12b99e1f2.

Change-Id: I6cbcd81533f135208b57746cb0e33ffdfaf94eee
2021-05-03 14:17:57 +02:00
Arne Wiebalck
5c222560f0 Burn-in: Add memory step
Add a clean step for memory burn-in via stress-ng. Get basic
run parameters from the node's driver_info.

Story: #2007523
Task: #42383

Change-Id: I33a83968c9f87cf795ec7ec922bce98b52c5181c
2021-05-01 10:36:58 +02:00
Arne Wiebalck
6702fcaa43 Burn-in: Add CPU step
Add a clean step for CPU burn-in via stress-ng. Get basic
run parameters from the node's driver_info.

Story: #2007523
Task: #42382

Change-Id: I14fd4164991fb94263757244f716b6bfe8edf875
2021-05-01 10:36:20 +02:00
Zuul
10c29cdc41 Merge "Fix getting memory size in some lshw output" 2021-04-30 12:24:44 +00:00
Zane Bitter
ed791d9778 Fix getting memory size in some lshw output
Due to a regression in lshw introduced by
https://github.com/lyonel/lshw/pull/60, there are some versions in the
wild that do not return sizes for memory banks <32GiB. In those cases,
work around the problem by looking at the top-level size (if available)
to find the total size. Previously we assumed that we only needed the
top-level size when there was no list of memory banks.

The issue is fixed upstream by https://github.com/lyonel/lshw/pull/65,
but the erroneous patch is still present in the lshw-B.02.19.2-5.el8
package in CentOS 8.4 and 8.5.

Change-Id: I6eb5981d28b9ae368239af0c1d0ec32ff79d95b3
Story: #2008865
Task: 42395
2021-04-29 14:41:11 -04:00
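The fallback described in ed791d9778 might look roughly like the sketch below. The function name `total_memory_bytes` and the simplified dictionary shape (a top-level `size` plus a `children` list of banks) are assumptions made for illustration. This is not the agent's actual lshw parsing code.

```python
from typing import Optional


def total_memory_bytes(memory_node: dict) -> Optional[int]:
    """Return total memory from a simplified, hypothetical lshw-like node.

    Prefer the sum of the per-bank sizes. If there are no banks, or any
    bank lacks a size (as with the affected lshw versions), fall back to
    the top-level size, which may itself be missing (None).
    """
    banks = memory_node.get("children") or []
    bank_sizes = [bank.get("size") for bank in banks]
    if banks and all(size is not None for size in bank_sizes):
        return sum(bank_sizes)
    # Banks are absent or incomplete: use the top-level size if available.
    return memory_node.get("size")
```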
Zane Bitter
c56cd4abc0 Fix missing data in log messages
Change-Id: I5d08deed86d79a7ea0b7a1625122af595037dab5
2021-04-29 09:55:56 -04:00
Dmitry Tantsur
3251d7b641 Remove runtime dependency on pbr
Pbr is a very heavy package to depend on. It requires git-core, which is
16 MiB on my Fedora. We only use it to detect the version, which can be
done without pbr using a much lighter importlib_metadata.

Copied from https://review.opendev.org/c/openstack/osprofiler/+/739379

Change-Id: I5f434e6bfde6f645804941f3a36d5458a28270e7
2021-04-26 09:16:34 +02:00
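Version detection without pbr, as described in 3251d7b641, can be sketched as follows. This sketch uses the standard-library `importlib.metadata` (Python 3.8+) rather than the `importlib_metadata` backport named in the commit. The helper name `detect_version` and the `"unknown"` default are assumptions for illustration.

```python
from importlib import metadata  # stdlib counterpart of importlib_metadata


def detect_version(dist_name: str, default: str = "unknown") -> str:
    """Return the installed version of a distribution, or a default.

    Hypothetical helper: it looks up package metadata directly instead
    of importing pbr, so no git tooling is needed at runtime.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed (e.g. running from a checkout).
        return default
```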
Zuul
9edb13d891 Merge "Do not fail network interface collection on unsupported interface" 2021-04-22 16:35:25 +00:00
Derek Higgins
9c3fbfd000 Add a call to "udevadm settle" in write_image.sh
After GPT and MBR are destroyed, systemd-udevd gets triggered,
which may hold /dev/sda open, preventing qemu-img from writing
its image.

Story: 2008830
Task: 42312
Change-Id: I6105192a16fcb7f6898910e8d0ab824d731d491d
2021-04-20 17:48:46 +01:00
Arne Wiebalck
c2d04dc156 Software RAID: RAID the ESPs
For software RAID in UEFI mode, we create ESPs on all holder disks
and copy the bootloader there. Since there is no mechanism to keep
the ESPs in sync, e.g. on kernel upgrades or when kernel parameters
are updated, the ESPs will get out of sync eventually. This may lead
to a situation where a node boots with outdated parameters or does
not have any of the installed kernels in the boot menu anymore.
This change proposes to RAID the ESPs. While the UEFI firmware will
find an ESP partition (one leg of the mirror), the node will see
an md device and all subsequent updates will go to all member disks.

Also, remove the source ESP after copying in order to avoid mount
confusion (same UUID!).

Story: #2008745
Task: #42103
Change-Id: I9078ef37f1e94382c645ae98ce724ac9ed87c287
2021-04-16 14:40:28 +02:00
Zuul
c72997d8d0 Merge "Always fall back to sysrq when power off fails" 2021-04-14 12:13:37 +00:00
Dmitry Tantsur
b395181b1b Always fall back to sysrq when power off fails
The line we're looking for is not there when IPA is in a container, at least
for CentOS based containers. Just fall back to sysrq on errors.

Change-Id: Ie4ee605ad9c6cda58808512a563247175859c71e
2021-04-13 19:05:04 +02:00
Zuul
5bac375f73 Merge "Capture the early logging" 2021-04-08 12:22:32 +00:00
Dmitry Tantsur
1ab405b509 Do not fail network interface collection on unsupported interface
Currently if one interface cannot be handled (e.g. it has empty MAC),
the whole collection fails. Ignore unsupported interfaces instead.

Change-Id: Ibdaad62b39c239d4f3fb3111c2fae9e31e877b28
2021-04-07 17:16:27 +02:00
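The behaviour change in 1ab405b509 amounts to moving the error handling inside the loop, so one bad interface no longer aborts the whole collection. A minimal sketch follows, assuming a hypothetical `UnsupportedInterface` exception and hypothetical helper names. The agent's real hardware manager code differs.

```python
import logging

LOG = logging.getLogger(__name__)


class UnsupportedInterface(Exception):
    """Hypothetical error for interfaces that cannot be described."""


def describe_interface(name: str, mac: str) -> dict:
    # An empty MAC address is the example the commit message gives.
    if not mac:
        raise UnsupportedInterface(f"interface {name} has no MAC address")
    return {"name": name, "mac_address": mac.lower()}


def collect_interfaces(raw: dict) -> list:
    """Describe every supported interface, skipping unsupported ones."""
    interfaces = []
    for name, mac in raw.items():
        try:
            interfaces.append(describe_interface(name, mac))
        except UnsupportedInterface as exc:
            # Log and continue instead of failing the whole collection.
            LOG.warning("Ignoring unsupported interface: %s", exc)
    return interfaces
```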
Julia Kreger
df418984f0 Capture the early logging
_early_log prints to stdout, which is fine in some cases,
however in other cases it gets lost in the shuffle of process
launch by things like systemd.

Let's try to save everything, and re-log it so it is easy to
debug early issues.

Change-Id: I334a9073d17cccec4c669fae82edc3e388debc5c
2021-04-01 11:16:20 -07:00
Dmitry Tantsur
afcc5d392c Fix incorrect lsblk tag and add a virtual media job
Follow-up to 8dd6589e66d03e45e1d510601da9531a30842cff: PATH is not a
valid lsblk tag, we need to use KNAME with -p flag.

Also add a vmedia job to avoid breakages in the future. It's added
non-voting because we have a deadlock with this change:
https://review.opendev.org/c/openstack/ironic/+/783722

Change-Id: Ifffeac9c1c4d394526d655eaa14c9fe7bd3a1e5e
2021-03-30 12:25:14 +02:00
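As a rough illustration of the fix in afcc5d392c, the sketch below requests the `KNAME` column with `-p` (full device paths) and parses `lsblk`'s key="value" pair output. The exact column list, the helper name `parse_lsblk_pairs` and the sample output in the usage note are assumptions, not the agent's actual invocation.

```python
import shlex

# -n: no headings, -p: full device paths, -P: KEY="value" pairs,
# -o: columns to print. KNAME is a valid lsblk column; PATH is not
# accepted by the lsblk version referred to in the commit.
LSBLK_CMD = ["lsblk", "-n", "-p", "-P", "-o", "KNAME,LABEL"]


def parse_lsblk_pairs(output: str) -> list:
    """Parse `lsblk -P` output into a list of {column: value} dicts."""
    devices = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # shlex handles the quoting, e.g. LABEL="" becomes 'LABEL='.
        fields = dict(token.split("=", 1) for token in shlex.split(line))
        devices.append(fields)
    return devices
```

For example, a made-up output line `KNAME="/dev/sr0" LABEL="some-label"` would parse to a dict with `KNAME` set to `/dev/sr0`.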
Zuul
49d123dd6e Merge "Validate vmedia for vmedia usage" 2021-03-29 23:38:10 +00:00
Julia Kreger
8dd6589e66 Validate vmedia for vmedia usage
Virtual media devices based logic needs to be
guarded from being used or considered based upon
if the machine actually booted from virtual media,
or not.

At the same time, actual devices need to be checked
in order to make sure they align with what we expect
in order to prevent consideration of content which
should not be leveraged.

Change-Id: If2d5c6f4815c9e42798a2d96d59015e1b1dbd457
Story: 2008749
Task: 42108
2021-03-29 13:22:43 -07:00
Jay Faulkner
de726d4acf Do not permit IPA standalone to be enabled by conf
IPA standalone mode is a developer-only option, and if enabled
accidentally on a production agent could cause undesired behavior.

Developers who need this behavior should build a purpose-built agent,
with standalone hardcoded to True in cmd/agent.py.

Change-Id: Icc67dbe15acbbf6fee886f274d2169a0769a5053
2021-03-25 12:45:28 +01:00
Bernd Mueller
2a64413bb6 typo chanages -> changes
Change-Id: Ifb75a5f6f01bd98011464eb05f98d8db001dcd54
2021-03-24 13:53:32 +01:00
Dmitry Tantsur
d622d38da6 Refactor: use mounted from ironic-lib
Change-Id: I0b597ddbc71c133abe6c0acfd8f49e3af4e896bb
2021-03-23 17:24:03 +01:00
Steve Baker
e61336602f Fix root UUID for streamed partition images
The root UUID changes after a streamed partition image is written to
the block device, causing later deployment failure when assuming the
old UUID.

This change updates the root UUID after streaming the partition image
is complete.

This issue may have been missed in local testing because deploying the
same image repeatedly will result in stable root UUID across runs.

Change-Id: Ice4630c16fc216980488d1427f3b02e1b8a417fa
2021-03-19 12:08:43 +01:00
Bob Fournier
4afe4f6069 Check the base device if the read-only file cannot be read
For some drives, the partition e.g. `/dev/sda1` will not have the
'ro' file which can result in a metadata erasure failure but the base
device (`/dev/sda`) will have this file.  Add an additional check
for the base device.

Change-Id: Ia01bdbf82cee6ce15fabdc42f9c23036df55b4c5
Story: 2008696
Task: 42004
2021-03-09 07:05:27 -05:00
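The extra check from 4afe4f6069 can be sketched as trying the partition's `ro` file first and then the base device's. The function name `is_read_only`, the candidate-list interface and the configurable `sysfs_root` are assumptions made so the sketch is easy to test. The agent's real lookup differs.

```python
import os


def is_read_only(candidates, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the first readable 'ro' flag among candidates is 1.

    `candidates` lists device names in order of preference, for example
    ["sda1", "sda"]: the partition first, then its base device.
    """
    for name in candidates:
        path = os.path.join(sysfs_root, name, "ro")
        try:
            with open(path) as ro_file:
                return ro_file.read().strip() == "1"
        except OSError:
            # No readable 'ro' file for this name; try the next candidate.
            continue
    # No 'ro' file could be read at all; this sketch assumes writable.
    return False
```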
Zuul
7931ccedfb Merge "Remove default parameter from execute" 2021-03-03 07:14:44 +00:00
Zuul
77bc398833 Merge "Increase the memory limit for qemu-img" 2021-03-02 16:03:45 +00:00
Riccardo Pittau
bff252c726 Remove default parameter from execute
The check_exit_code parameter of the processutils execute function
already defaults to [0].
See:
https://opendev.org/openstack/oslo.concurrency/src/branch/master/oslo_concurrency/processutils.py#L214

Change-Id: Iedff5325e0737556d5eb3da601c984ddfc633873
2021-03-02 16:19:32 +01:00
Derek Higgins
5492ad7da5 Increase the memory limit for qemu-img
We appear to be bumping up against this limit when deploying
RHCOS images (currently 977MB). Curiously the problem isn't
happening all the time but increasing the limit eliminates it.

This limit was introduced to guard against a malicious image
allocating an arbitrary amount of memory. Nothing else runs
on hosts when IPA is running so we should be ok bumping up
the limit.

Story: #2008667
Task: #41955
Change-Id: I9405995915a874b00b7177c9642c5469d05d66a8
2021-03-02 11:38:57 +00:00
Jacob Anders
d2127e7ef4 Remove nvme-cli warning and delay on nvme-format
This change adds '-f' flag to nvme-cli calls during NVMe Secure Erase.
This removes nvme-cli output warning that the device is about to be
irreversibly deleted as well as the related 10 second delay which is
pointlessly increasing NVMe cleaning time.

Story: 2008290
Change-Id: I7b7b8b7d4f643b07d5c9dcf7ec35cf7ebedf44d1
2021-03-02 15:37:35 +10:00
Zuul
4a22c887f8 Merge "Use try_execute from ironic-lib" 2021-03-01 13:54:15 +00:00
Mohammed Naser
ab267aabdd Allow clean_configuration to run against full-device arrays
At the moment, it is not possible for Ironic to clean up a
RAID array that is built from an entire device.  This patch
allows it to do so by overriding the behaviour of attempting
to find the device name if the device name does not end with
a number and is a real block device.

Story: #2008663
Task: #41948
Change-Id: I66b0990acaec45b1635795563987b99f9fa04ac7
2021-02-27 17:24:16 -05:00
Riccardo Pittau
0459c61c8d Use try_execute from ironic-lib
Also adapt unit tests

Change-Id: I37d050877daabc9dc0a5821cf20a689652b26f34
2021-02-25 14:46:17 +01:00
Zuul
6ea3aff8d6 Merge "New deploy step for injecting arbitrary files" 2021-02-22 18:48:22 +00:00
Zuul
2979ee5314 Merge "Add support for using NVMe specific cleaning" 2021-02-19 12:13:55 +00:00
Jacob Anders
8bcf1be920 Add support for using NVMe specific cleaning
This change adds support for utilising NVMe specific cleaning tools
on supported devices. This will remove the necessity of using shred to
securely delete the contents of a NVMe drive and enable using nvme-cli
tools instead, improving cleaning performance and reducing wear on the device.

Story: 2008290
Task: 41168
Change-Id: I2f63db9b739e53699bd5f164b79640927bf757d7
2021-02-18 22:51:34 +10:00