876 Commits

Author SHA1 Message Date
Dmitry Tantsur
557293ca6a Generate TLS certificates with validity time in the past
Otherwise, a slight clock skew may prevent them from working; see,
e.g., https://bugzilla.redhat.com/show_bug.cgi?id=1906448.

Change-Id: Icea103af06edef16c0dc4578877dc04cd6ec3b0c
2020-12-10 16:22:13 +01:00
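A minimal standard-library sketch of why backdating the start of the validity window helps when a peer's clock runs slightly behind. The one-hour backdate and one-year lifetime are illustrative assumptions, not values from the commit; with the `cryptography` package the start of the window would be what gets passed to `CertificateBuilder.not_valid_before()`.

```python
from datetime import datetime, timedelta, timezone

# Illustrative values (assumptions): how far to backdate, and the cert lifetime.
BACKDATE = timedelta(hours=1)
LIFETIME = timedelta(days=365)


def validity_window(now, backdate=BACKDATE, lifetime=LIFETIME):
    """Return (not_before, not_after) with not_before moved into the past."""
    return now - backdate, now + lifetime


def is_valid_at(window, peer_clock):
    """Check a certificate window against a peer's (possibly skewed) clock."""
    not_before, not_after = window
    return not_before <= peer_clock <= not_after


now = datetime(2020, 12, 10, 15, 0, tzinfo=timezone.utc)
skewed_peer = now - timedelta(minutes=5)  # peer clock runs 5 minutes behind

print(is_valid_at(validity_window(now), skewed_peer))  # True
# Without backdating, the certificate is "not yet valid" for the skewed peer.
print(is_valid_at(validity_window(now, backdate=timedelta(0)), skewed_peer))  # False
```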
Zuul
1a9491e651 Merge "Bring up VLAN interfaces and include in introspection report" 2020-12-02 13:59:28 +00:00
Zuul
22985da710 Merge "Make mdadm a soft requirement" 2020-11-23 19:37:59 +00:00
Dmitry Tantsur
ab8dee0386 Make mdadm a soft requirement
No point in requiring it for deployments that don't use software RAID.

Change-Id: I8b40f02cc81d3154f98fa3f2cbb4d3c7319291b8
2020-11-20 17:07:00 +01:00
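A sketch of treating mdadm as a soft requirement, assuming the check only needs to happen when software RAID is actually requested. `check_raid_tools` and its `raid_config` argument are hypothetical names, not IPA's actual code.

```python
import shutil


def software_raid_available(binary="mdadm"):
    """Return True if the given binary can be found on PATH."""
    return shutil.which(binary) is not None


def check_raid_tools(raid_config):
    """Only insist on mdadm when software RAID is requested.

    raid_config is a hypothetical list of logical-disk dicts; an empty
    list means the deployment does not use software RAID at all.
    """
    if not raid_config:
        return "skipped"  # no software RAID, so mdadm is not needed
    if not software_raid_available():
        raise RuntimeError("mdadm is required for software RAID but was not found")
    return "ok"


print(check_raid_tools([]))  # skipped
```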
Bob Fournier
6e3f28d720 Bring up VLAN interfaces and include in introspection report
Add the ability to bring up VLAN interfaces and include them in the
introspection report. A new configuration field,
``ipa-enable-vlan-interfaces``, is added. It defines either the VLAN
interface to enable, the interface to use, or 'all', which indicates
all interfaces. If a particular VLAN is not provided, IPA will use
the LLDP info for the interface to determine which VLANs should
be enabled.

Change-Id: Icb4f66a02b298b4d165ebb58134cd31029e535cc
Story: 2008298
Task: 41183
2020-11-20 10:17:00 -05:00
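A hypothetical parser for the new option, assuming a comma-separated value in which `iface.vlan` names a specific VLAN interface (the usual Linux naming convention), a bare interface name means "discover its VLANs via LLDP", and `all` requests LLDP discovery on every interface. The exact syntax IPA accepts should be checked against its documentation; `parse_vlan_setting` is not IPA's function.

```python
def parse_vlan_setting(value):
    """Parse a hypothetical ipa-enable-vlan-interfaces value.

    Returns (explicit, lldp_interfaces, use_all):
      explicit:        list of (interface, vlan_id) pairs to bring up directly
      lldp_interfaces: interfaces whose VLANs should be discovered via LLDP
      use_all:         True if 'all' was given
    """
    explicit, lldp_interfaces, use_all = [], [], False
    for item in (part.strip() for part in value.split(",")):
        if not item:
            continue
        if item == "all":
            use_all = True
        elif "." in item:
            interface, vlan = item.split(".", 1)
            explicit.append((interface, int(vlan)))
        else:
            lldp_interfaces.append(item)
    return explicit, lldp_interfaces, use_all


print(parse_vlan_setting("eth0.100,eth1"))  # ([('eth0', 100)], ['eth1'], False)
```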
Zuul
4762aca077 Merge "Add clean step 'erase_pstore'" 2020-11-18 17:38:00 +00:00
Arne Wiebalck
92e26b01e9 Add clean step 'erase_pstore'
Add an automatic clean step to clean the Linux kernel's pstore.
The step is disabled by default.

Story: #2008317
Task: #41214

Change-Id: Ie1a42dfff4c7e1c7abeaf39feca956bb9e2ea497
2020-11-17 18:00:16 +01:00
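A sketch of what such a clean step does, assuming pstore is mounted at the conventional `/sys/fs/pstore` and that erasing it means deleting the files it exposes. The directory is a parameter so the sketch can run against a temporary directory instead of the real sysfs mount; `erase_pstore` here is illustrative, not IPA's implementation.

```python
import os
import tempfile


def erase_pstore(pstore_dir="/sys/fs/pstore"):
    """Delete every file in the pstore directory and return the removed names."""
    removed = []
    if not os.path.isdir(pstore_dir):
        return removed  # pstore not mounted: nothing to erase
    for name in sorted(os.listdir(pstore_dir)):
        path = os.path.join(pstore_dir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed


# Usage against a throwaway directory standing in for /sys/fs/pstore.
with tempfile.TemporaryDirectory() as d:
    for name in ("dmesg-efi-1", "dmesg-efi-2"):
        open(os.path.join(d, name), "w").close()
    print(erase_pstore(d))  # ['dmesg-efi-1', 'dmesg-efi-2']
```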
Vladyslav Drok
3761a44800 Fix vendor info retrieval for some versions of lshw
There is one more place that relies on the lshw JSON output being a
dict, so let's fix the function that produces the dict rather than the
places where it is used.

Change-Id: Ia1c2c2e6a32c76ac0249e6a46e4cced18d6093a9
Task: 39527
Story: 2007588
2020-11-16 15:25:12 +01:00
Zuul
37dc11fcc1 Merge "Log configuration options on start-up" 2020-11-12 17:39:23 +00:00
Zuul
c33b3fff66 Merge "Add UUID to BlockDevice object" 2020-11-11 21:42:51 +00:00
Vladyslav Drok
c7858d3cc8 Add UUID to BlockDevice object
This allows, for example, custom Ansible playbooks to use the UUIDs of
the introspected node's disks. In the future it might also enable the
agent to refer to a device by UUID (or by_path value) instead of by
name, as it does currently.

Change-Id: Id00437d2295c39fb12f3c25a92b30b56a58eef13
2020-11-11 17:25:59 +00:00
Dmitry Tantsur
c585603ee6 Log configuration options on start-up
This is very convenient for debugging and is something ironic and
ironic-inspector already do.

Register SSL options earlier so that they're accounted for.

Change-Id: I56aca8eec1dfeb065ac657452a7076a9e3d17cc3
2020-11-11 16:38:10 +01:00
Zuul
1f590ea382 Merge "Support using LABEL as identifier for rootfs" 2020-11-10 17:56:09 +00:00
Vladyslav Drok
448ded43fe Fix physical memory calculation with new lshw
It seems that fix Id5a30028b139c51cae6232cac73a50b917fea233 was
dealing with a different issue. According to the description
in the story, and the commit linked there, the problem is that
the lshw output changed from a dictionary to a list (presumably
with just one element). This commit changes the isinstance call
to check whether the lshw output is a list, and if so, uses
the first element of the list.

Story: 2007588
Task: 39527
Change-Id: I87d87fd035701303e7d530a47b682db84e72ccb9
2020-11-06 19:09:28 +01:00
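The normalization described in the two lshw fixes can be sketched as a single helper that always hands callers a dict. `parse_lshw` is a hypothetical name and the JSON samples are trimmed-down stand-ins for real lshw output.

```python
import json


def parse_lshw(raw_output):
    """Return lshw's top-level node as a dict.

    Older lshw versions print a JSON object; newer ones may wrap it in a
    list, in which case the first element is used.
    """
    data = json.loads(raw_output)
    if isinstance(data, list):
        data = data[0]
    return data


old_style = '{"id": "host", "children": []}'
new_style = '[{"id": "host", "children": []}]'
print(parse_lshw(old_style) == parse_lshw(new_style))  # True
```

Fixing the one function that produces the dict keeps every caller unaware of the format change.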
Zuul
f52863a4d8 Merge "Updated Implementation of string interpolation delay on LOG messages" 2020-11-04 10:45:32 +00:00
ebagakis
35d412e9d5 Updated Implementation of string interpolation delay on LOG messages
This is a follow-up to https://review.opendev.org/#/c/756300/

Change-Id: Ifba8a57b58d61ede169c60f6d51f224d134c7708
2020-11-03 15:27:27 +01:00
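Delaying string interpolation means passing the arguments to the logger instead of formatting with `%` up front, so no formatting happens when the level is disabled. A small standard-library demonstration; the logger name and the `Expensive` class are illustrative.

```python
import logging

LOG = logging.getLogger("ipa-sketch")
LOG.setLevel(logging.INFO)  # DEBUG messages are disabled


class Expensive:
    """Counts how often it is formatted, to show when interpolation happens."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive"


obj = Expensive()
LOG.debug("eager: %s" % obj)  # the string is built even though DEBUG is off
LOG.debug("lazy: %s", obj)    # logging skips formatting for disabled levels
print(Expensive.calls)  # 1
```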
Fedor Tarasenko
694ea7425d Support using LABEL as identifier for rootfs
Add the possibility to use the disk LABEL to identify the rootfs UUID
for Software RAID deployments.

Change-Id: I77f36e70ddc539af0190db1c1abe0fb2c66f34b4
Story: 2008303
Task: 41188
2020-11-03 13:03:34 +03:00
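One way to map a filesystem LABEL to a UUID is to read `lsblk -P -o NAME,LABEL,UUID` output, which prints `KEY="value"` pairs for each device. The helpers below are a hypothetical sketch, not IPA's implementation, and the sample output is made up.

```python
import shlex


def parse_lsblk_pairs(output):
    """Parse `lsblk -P`-style KEY="value" lines into a list of dicts."""
    devices = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = {}
        for token in shlex.split(line):  # handles quoting and empty values
            key, _, value = token.partition("=")
            fields[key] = value
        devices.append(fields)
    return devices


def uuid_for_label(output, label):
    """Return the filesystem UUID of the device carrying `label`, or None."""
    for dev in parse_lsblk_pairs(output):
        if dev.get("LABEL") == label:
            return dev.get("UUID") or None
    return None


sample = (
    'NAME="md0" LABEL="" UUID="1111-2222"\n'
    'NAME="md1" LABEL="img-rootfs" UUID="3333-4444"\n'
)
print(uuid_for_label(sample, "img-rootfs"))  # 3333-4444
```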
Zuul
f356356486 Merge "Follow-up to API version setting" 2020-11-02 11:48:22 +00:00
Zuul
d84e88769e Merge "Don't run os-prober from grub2-mkconfig" 2020-11-01 12:27:07 +00:00
Julia Kreger
066a96a926 Follow-up to API version setting
Follow-up on Ib96a1057792f45f2e4554671e32c436140463ee8 to
improve some of the wording and address review feedback from
Dmitry Tantsur.

Change-Id: Id77b0d72f3d78e5befd05fbdb6b21bc780f4ddfe
2020-10-30 08:28:54 -07:00
Jay Faulkner
80575566b1 Allow manual setting of Ironic API Version
Typically, the Ironic API client in IPA will autodetect the API version
based on the output of a GET of the root of the API. If for some reason
this API endpoint is restricted, or the operator wishes to limit the
Ironic API version IPA uses, they can now set CONF.ironic_api_version to
avoid autodetection and force a version.

Change-Id: Ib96a1057792f45f2e4554671e32c436140463ee8
2020-10-23 15:38:42 +00:00
Julia Kreger
6542a9cb04 Don't run os-prober from grub2-mkconfig
By default, grub2-mkconfig scans everything to look for other
environments and then loads those into the grub configuration.

It makes sense, but on newer versions of grub2 in distribution
images, os-prober is taking an exceptionally long time in some
cases where more than one storage device exists with other
filesystems.

As a result of the os-prober execution by grub2-mkconfig, the
bootloader installation can time out completely and fail the
deployment. This is presently experienced with metalsmith on
CentOS 8.

There are numerous sporadic reports of issues like this one,
where grub2-mkconfig hangs for some period of time, and this is
observable on CentOS 8.2 in our CI. While one report [0] mentions
this issue, another bug [1] contains the discussion that actually
helps us frame what we should likely do.

This change also fixes the unit tests so that we actually test
whether we're running with grub2. :\

[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1744693
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1709682

Depends-On: https://review.opendev.org/#/c/748315
Change-Id: I14bf299afef3a1ddb2006fe5f182d7f0d249e734
2020-10-22 22:28:07 +00:00
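`GRUB_DISABLE_OS_PROBER` is a standard GRUB setting that stops grub-mkconfig from running os-prober. The sketch below assumes it is passed through the environment rather than written to `/etc/default/grub`, which only works if that file does not set the variable itself. `grub_mkconfig_invocation` and the default output path are hypothetical.

```python
import os


def grub_mkconfig_invocation(output_path="/boot/grub2/grub.cfg", base_env=None):
    """Build the command and environment for grub2-mkconfig without os-prober."""
    env = dict(os.environ if base_env is None else base_env)
    env["GRUB_DISABLE_OS_PROBER"] = "true"  # skip the slow scan of other disks
    return ["grub2-mkconfig", "-o", output_path], env


cmd, env = grub_mkconfig_invocation(base_env={"PATH": "/usr/sbin:/usr/bin"})
print(cmd)  # ['grub2-mkconfig', '-o', '/boot/grub2/grub.cfg']
# On a real system (typically inside the chroot of the deployed image):
#     subprocess.run(cmd, env=env, check=True)
```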
Arne Wiebalck
c7f6baf7f4 [trivial] Remove redundant list conversion
Follow-up to https://review.opendev.org/#/c/756300/

Change-Id: Ibc6c044e24dde82928f19a9b9a7eaf68be53fb0e
2020-10-13 08:29:53 +02:00
Zuul
80b0a9a132 Merge "Software RAID: Re-add missing devices" 2020-10-12 12:24:24 +00:00
Dmitry Tantsur
420ebc0d73 Do not silently swallow errors in the write_image deploy step
Calling join() does not raise; we need to check the result explicitly.

Change-Id: I81d3d727af220c2b50358edab8139f07874611f0
Story: #2008240
Task: #41083
2020-10-09 11:24:12 +02:00
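The pitfall is easy to reproduce with plain threads: an exception inside the worker never reaches the caller of `join()`, so the result has to be inspected. `AsyncCommand` is a minimal hypothetical stand-in for an asynchronous agent command, not IPA's class.

```python
import threading


class AsyncCommand:
    """Minimal stand-in for an asynchronous agent command (hypothetical)."""

    def __init__(self, func):
        self.error = None
        self._thread = threading.Thread(target=self._run, args=(func,))

    def _run(self, func):
        try:
            func()
        except Exception as exc:  # record the failure instead of losing it
            self.error = exc

    def start(self):
        self._thread.start()
        return self

    def join(self):
        self._thread.join()  # returns normally even if func() raised
        return self


def write_image():
    raise OSError("disk full")


cmd = AsyncCommand(write_image).start().join()
if cmd.error is not None:  # the explicit check; join() alone would hide this
    print("deploy step failed: %s" % cmd.error)  # deploy step failed: disk full
```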
Zuul
bd127d193b Merge "Reduce the duration of retries in the inspector tests" 2020-10-08 23:04:24 +00:00
Zuul
35d2292aa4 Merge "Log a warning if target_boot_mode does not match current boot mode" 2020-10-07 17:01:51 +00:00
Dmitry Tantsur
62672de131 Reduce the duration of retries in the inspector tests
Currently the test takes 5*5=25 seconds. Re-arrange the code so
that it's possible to change the retry delay in tests.

Change-Id: Ia559dad4bc656f8ad6b2cb8cb0137a97e2614db7
2020-10-07 12:39:01 +02:00
Dmitry Tantsur
1a67dddde7 Log a warning if target_boot_mode does not match current boot mode
This is not a normal situation and is likely to cause problems.

Change-Id: Id0668fd160ac0539d85997e985f8c43d9da75c90
2020-10-07 12:30:23 +02:00
Dmitry Tantsur
fc4e0eed6a Don't try to call GRUB when root UUID is not provided
We currently have no reliable way to detect the root UUID for whole
disk images, which results in an ignored traceback every time
install_bootloader is called with a whole disk image in UEFI mode.
Avoid it by skipping GRUB2 if the root UUID is unknown.

Change-Id: I84245538f59c664b72d1cafbca8d61be0978f489
2020-10-07 12:06:42 +02:00
Zuul
abd9f91813 Merge "Add basic retries for inspection" 2020-10-06 17:07:20 +00:00
Arne Wiebalck
253b4887d5 Software RAID: Re-add missing devices
Upon md device creation, component devices are sometimes removed
again immediately due to a "disk failure", although the disks seem
healthy. This patch re-adds component devices in such cases to
prevent the md device from remaining in a degraded state (which
would cause issues later, e.g. during ESP creation).

Story: #2008164
Task: #40914

Change-Id: I2ac7cb4a546de84686d5c3435e850c14b3f6c1d7
2020-10-06 14:00:57 +02:00
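A sketch of the re-add logic, assuming the expected and currently active component lists are already known (in practice they would come from mdadm's own reporting). It uses mdadm's manage-mode form `mdadm <md device> --add <component>`; `readd_commands` is a hypothetical helper that only builds the commands.

```python
def readd_commands(md_device, expected, active):
    """Build mdadm commands to re-add components missing from an md array.

    expected: component devices the array was created with
    active:   component devices mdadm currently reports as members
    """
    active = set(active)
    missing = [dev for dev in expected if dev not in active]
    # Manage mode: mdadm <md device> --add <component>
    return [["mdadm", md_device, "--add", dev] for dev in missing]


print(readd_commands("/dev/md0", ["/dev/sda1", "/dev/sdb1"], ["/dev/sda1"]))
# [['mdadm', '/dev/md0', '--add', '/dev/sdb1']]
```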
Zuul
99dee5067e Merge "Software RAID: Get component devices by md UUID" 2020-09-30 18:30:56 +00:00
Zuul
faeb9441d3 Merge "Simplify heartbeating by removing use of select()" 2020-09-29 15:47:08 +00:00
Arne Wiebalck
044c64dbc0 Software RAID: Get component devices by md UUID
Scanning the output of mdadm commands for RAID members will
miss component devices which are currently not part of the
RAID. For proper cleaning it is better to scan block devices
for a signature of the md device for which we would like to
get the components.

Story: #2008186
Task: #40947

Change-Id: Ib46612697851e36a16d272ccaeb0115106253863
2020-09-29 17:08:40 +02:00
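Scanning block devices for the array's signature can be sketched by reading the `Array UUID` field that `mdadm --examine` prints for version-1.x superblocks. The helpers and the abbreviated sample output are hypothetical; devices without an md superblock simply yield no UUID.

```python
def array_uuid(examine_output):
    """Extract the 'Array UUID' field from `mdadm --examine` text, if any."""
    for line in examine_output.splitlines():
        key, sep, value = line.partition(":")  # UUID itself contains colons
        if sep and key.strip() == "Array UUID":
            return value.strip()
    return None


def components_of(md_uuid, examine_by_device):
    """Return devices whose superblock carries the given md array UUID."""
    return sorted(dev for dev, text in examine_by_device.items()
                  if array_uuid(text) == md_uuid)


uuid = "11111111:22222222:33333333:44444444"
sample = {
    "/dev/sda1": "          Magic : a92b4efc\n     Array UUID : %s\n" % uuid,
    "/dev/sdb1": "     Array UUID : %s\n" % uuid,
    "/dev/sdc1": "mdadm: No md superblock detected on /dev/sdc1.\n",
}
print(components_of(uuid, sample))  # ['/dev/sda1', '/dev/sdb1']
```

Unlike parsing the array's current member list, this also finds components that have dropped out of the array but still carry its signature.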
Arne Wiebalck
c7aec775ff Software RAID: Don't delete partitions too early
Partitions on the holder disk should only be deleted after
all RAID devices have been deleted. Otherwise, superblocks
on partitions which reside on the same disks cannot be cleaned.

Story: #2008199
Task: #40979
Change-Id: I19293f5b992cd1fa68957d6f306dcec8f3b7a820
2020-09-28 10:35:12 +02:00
Zuul
5e61ad18e3 Merge "When reporting that agent is busy, report the executed command" 2020-09-23 22:00:15 +00:00
Zuul
c7ff931fe6 Merge "Fix: make Intel CNA hardware manager non-generic" 2020-09-23 14:57:40 +00:00
Zuul
11a87365fb Merge "Generate a TLS certificate and send it to ironic" 2020-09-23 12:14:38 +00:00
Qianbiao.NG
4b0ef13d08 Fix: make Intel CNA hardware manager non-generic
Currently, IntelCnaHardwareManager inherits GenericHardwareManager,
which makes it a new "GenericHardwareManager" with "MAINLINE" priority.
This causes all other hardware managers with lower priority than
"MAINLINE" to never be used. To fix this, make IntelCnaHardwareManager
inherit from the base HardwareManager class.

Change-Id: I28b665d8841b0b2e83b132e1f25df95e03e7ba10
Story: 2008142
Task: 40882
2020-09-23 18:24:26 +08:00
Jay Faulkner
a01646f56b Simplify heartbeating by removing use of select()
Heartbeating in IPA has used select.poll() for years to work around
a bug where changing the time in the ramdisk could cause heartbeats
to stop and never resume.

Now that IPA syncs time at start and exit, this workaround is no
longer needed. So instead, we'll revert to using threading.Event()
in order to make the code simpler and easier to understand.

Since this needs to be an eventlet event rather than a standard thread
event, threading is also monkey-patched.

Additionally, a few backoff interval values were set but never
applied. To maintain the 5+ year old behavior of not doing error
backoffs, that code was removed instead of being made to work.

Change-Id: Ibcde99de64bb7e95d5df63a42a4ca4999f0c4c9b
2020-09-22 16:59:47 +00:00
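A sketch of the `threading.Event()` pattern the commit describes: `Event.wait(interval)` serves both as the pause between heartbeats and as the stop signal. IPA monkey-patches threading with eventlet; this sketch uses plain OS threads, and the `Heartbeater` class is hypothetical.

```python
import threading
import time


class Heartbeater:
    """Heartbeat loop driven by threading.Event (class name is hypothetical)."""

    def __init__(self, send, interval):
        self.send = send          # callable that performs one heartbeat
        self.interval = interval  # seconds to wait between heartbeats
        self.stop_event = threading.Event()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            self.send()
            # wait() returns True as soon as stop() sets the event, or False
            # after `interval` seconds, so no select()/poll() is needed.
            if self.stop_event.wait(self.interval):
                break

    def start(self):
        self.thread.start()

    def stop(self):
        self.stop_event.set()
        self.thread.join()


beats = []
hb = Heartbeater(lambda: beats.append(time.monotonic()), interval=0.01)
hb.start()
time.sleep(0.05)
hb.stop()
print(len(beats) >= 1, hb.thread.is_alive())  # True False
```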
Dmitry Tantsur
fe6b687968 When reporting that agent is busy, report the executed command
Also make this API return a proper HTTP code (409 instead of 500).

Change-Id: I5d86878b5ed6142ed2630adee78c0867c49b663f
2020-09-18 17:52:49 +02:00
Julia Kreger
bb27badf76 Add basic retries for inspection
Transitory connection problems, such as a port being held down
before it starts forwarding traffic, can cause intermittent
connectivity failures which result in failed introspections.

Now the agent retries.

Change-Id: I72c5e3aca000d3854a17f8a461b1a2935e5c0d9b
2020-09-14 22:38:18 +00:00
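A minimal standard-library sketch of the retry idea, assuming connection problems surface as `ConnectionError`. The attempt count and delay are illustrative defaults; passing `delay` and `sleep` in as parameters is one way to keep tests fast, in the spirit of the "Reduce the duration of retries in the inspector tests" change listed above. `with_retries` is a hypothetical helper.

```python
import time


def with_retries(func, attempts=5, delay=5.0, sleep=time.sleep):
    """Call func(), retrying on ConnectionError up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            sleep(delay)


calls = {"n": 0}


def flaky():
    """Fails twice, as if the port were not forwarding yet, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("port not forwarding yet")
    return "ok"


waits = []
print(with_retries(flaky, delay=5.0, sleep=waits.append))  # ok
print(waits)  # [5.0, 5.0]
```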
Zuul
42df6c174f Merge "Fix backup node lookup" 2020-09-14 09:11:21 +00:00
Zuul
d43dc1ee36 Merge "Refactor API version negotiation code" 2020-09-12 15:24:42 +00:00
Zuul
a3b10db95a Merge "Replace oslo's loopingcall with tenacity" 2020-09-12 15:24:41 +00:00
Dmitry Tantsur
021e0a6a46 Generate a TLS certificate and send it to ironic
Adds a new flag (on by default) that enables generating a TLS
certificate and sending it to ironic via heartbeat. Whether
ironic supports auto-generated certificates is determined by
checking its API version.

Change-Id: I01f83dd04cfec2adc9e2a6b9c531391773ed36e5
Depends-On: https://review.opendev.org/747136
Depends-On: https://review.opendev.org/749975
Story: #2007214
Task: #40604
2020-09-11 17:46:52 +02:00
Dmitry Tantsur
6a8056414e Refactor API version negotiation code
Makes sure heartbeats can send versions higher than the one required
for tokens, while also making sure we never send a version we don't know.

Also makes code easier to understand.

Change-Id: Ice1e7d45ea90c9fd8220c4b94e691b6015e23074
2020-09-11 17:45:37 +02:00
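The negotiation rule above, combined with the manual override from the "Allow manual setting of Ironic API Version" change, can be sketched as: use the configured version if one is set, otherwise the newer of the two sides' versions capped at the newest one the client knows. The version numbers and function names below are illustrative assumptions, not taken from the commits.

```python
# Illustrative values only (assumptions), not taken from the commits.
MAX_KNOWN_VERSION = (1, 68)  # newest API version this client understands
MIN_TOKEN_VERSION = (1, 62)  # oldest API version assumed to accept agent tokens


def parse_version(text):
    """Turn an 'X.Y' microversion string into a comparable tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def negotiate_version(server_max, configured=None):
    """Return the microversion to send, as an 'X.Y' string.

    configured stands in for an operator override such as
    CONF.ironic_api_version; when set, autodetection is skipped.
    """
    if configured:
        return configured
    # Newest version both sides understand; never one we don't know.
    return "%d.%d" % min(parse_version(server_max), MAX_KNOWN_VERSION)


def supports_tokens(version):
    return parse_version(version) >= MIN_TOKEN_VERSION


print(negotiate_version("1.70"))                     # 1.68
print(negotiate_version("1.65"))                     # 1.65
print(negotiate_version("1.70", configured="1.31"))  # 1.31
print(supports_tokens(negotiate_version("1.50")))    # False
```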
Julia Kreger
3426963552 Fix backup node lookup
The node lookup code added in change
I27201319f31cdc01605a3c5ae9ef4b4218e4a3f6
was slightly broken in that we call a method
with a keyword argument which doesn't exist.

uuid versus node_uuid.

It happens, it is a quick fix!

Spotted on a metalsmith job:

[-] Agent is requesting to perform an explicit node cache update.
    This is to pickup any chanages in the cache before deployment.
[-] Failed to update node cache. Error lookup_node() got an
    unexpected keyword argument 'uuid'

Change-Id: I59ecec65707a2f03918b233f1925395ebe59b8c4
2020-09-09 15:19:38 -07:00
Dmitry Tantsur
9b75453339 Fix and run the correct functional tests job
Apparently, functional-py36 just runs unit tests.

Fix the test that has regressed in the meantime and make it voting
so that we don't regress again.

Change-Id: Id5efe89a12a00c27e6299380a51cdb840285d691
2020-09-04 17:10:41 +02:00