555 Commits

Author SHA1 Message Date
Zuul
ca07e941cf Merge "Add a release note for 939340" 2025-01-17 19:40:39 +00:00
cid
c222626b01 Treat 'No space left on device' error as fatal
Fail without retries when Errno 28 - "No space left
on device" error is encountered.

Closes-Bug: #2094854
Change-Id: Ie84b422916ddc02f2474164fe3da083324ef4824
2025-01-17 11:13:01 +01:00
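A minimal sketch of the behaviour described in this commit, assuming a generic retry loop. The helper names `is_fatal_os_error` and `run_with_retries` are hypothetical and are not taken from the agent's code:

```python
import errno


def is_fatal_os_error(exc):
    """Return True when retrying cannot help (hypothetical helper)."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC


def run_with_retries(operation, attempts=3):
    """Call operation up to `attempts` times, but stop at once on ENOSPC."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except OSError as exc:
            if is_fatal_os_error(exc):
                # Errno 28: retrying will not free any space, so fail now.
                raise
            last_error = exc
    raise last_error
```

Other `OSError` values are still retried in this sketch; only "No space left on device" short-circuits the loop.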
kubajj
2ece938671 Add a release note for 939340
Follow-up to 939340 to add a release note about the bug-fix.

Change-Id: I202f22d40776ab5d3245b8e14021d1404a9f478d
2025-01-16 09:34:08 +00:00
Zuul
06077cb88e Merge "Inventoried MAC address for only ipv6 addresses" 2024-12-04 19:09:09 +00:00
b010580caf reno: Update master for unmaintained/2023.1
Update the 2023.1 release notes configuration to build from
unmaintained/2023.1.

Change-Id: I0d8b1773367a61b326b5a6ff86ac1f126b15099b
2024-11-29 07:54:13 +00:00
Maximilian Brandt
6ccd3965ff Inventoried MAC address for only ipv6 addresses
Extended the function that exposes the BMC MAC address in inventory data
to cover IPv6-only interfaces.
Previously, if no IPv4 address was configured, no MAC address was exposed.

Change-Id: I93e49d308cfd63be1c09749ced4428a87a3daff9
2024-11-21 17:51:15 +01:00
Zuul
01639aab20 Merge "Add a command to lock down the agent" 2024-11-21 16:20:33 +00:00
Zuul
4f9f461ce9 Merge "A hardware manager call for a full sync before shutdown" 2024-11-07 15:07:12 +00:00
Dmitry Tantsur
aa98250066 Add a command to lock down the agent
To support a safer take-over from the provisioning to the tenant network
for hardware that cannot be powered off, this change introduces a new
command system.lockdown. When invoked, it stops the API, the heartbeater
and disables all network interfaces (if possible).

Partial-Bug: #2077432
Change-Id: I211fc64a46226127b0d82ab458029b3c702b3f74
2024-11-07 15:50:06 +01:00
Zuul
5746ac1222 Merge "Vendor metrics library from Ironic-Lib & deprecate" 2024-11-05 16:11:20 +00:00
Dmitry Tantsur
5aa0c1a2bb A hardware manager call for a full sync before shutdown
This is largely required for the future lockdown command but can also be
used before the normal shutdown, especially in the sync command which is
currently used before an out-of-band shutdown command is issued.

In addition to a plain sync, the new command also tells the kernel to
drop its caches and issues a low-level sync command to each block
device.

Partial-Bug: #2077432
Change-Id: I3fc87b20bc5387a466b24ebc19b9982e4e368d20
2024-11-05 15:27:10 +01:00
Jay Faulkner
75abdb4148 Vendor metrics library from Ironic-Lib & deprecate
We are phasing out use of ironic-lib, and as such are removing the
metrics module from it. However, due to its requirement of having
a statsd instance on the same subnet as the agent and there being no
support for prometheus exporting of metrics from IPA, these metrics are
no longer valuable (in the agent).

We are vendoring the module for the deprecation in order to facilitate
its removal from ironic-lib.

Change-Id: Ie50e078bc3f78d65cfa53680dc4116d1119ce155
2024-11-04 20:02:11 +00:00
Zuul
b851ae1bc8 Merge "Remove Python 3.8 support" 2024-10-31 17:44:24 +00:00
Takashi Kajinami
b0ef2c0483 Remove Python 3.8 support
Python 3.8 was removed from the tested runtimes for 2024.2[1] and has
not been tested since then.

Also add Python 3.12 which is part of the tested runtimes for 2025.1.
The unit test job with Python 3.12 is now voting.

[1] https://governance.openstack.org/tc/reference/runtimes/2024.2.html

Change-Id: Id314b4453d81dcab806768e3c7ab5dc050a35136
2024-10-24 18:15:08 +09:00
Steve Baker
1a939105ba Capture and log sector sizes
``logical_sectors`` and ``physical_sectors`` sizes are now captured for
each hardware info ``disks`` entry, and also logged for ``lsblk`` calls.
This will be increasingly useful as storage devices with 4096 byte
sector sizes become more common.

Change-Id: I80b6b137f6e3071d9b8a4c1abe14416249aed9ac
2024-10-24 15:07:56 +13:00
e4d07fd1ba Update master for stable/2024.2
Add file to the reno documentation build to show release notes for
stable/2024.2.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2024.2.

Sem-Ver: feature
Change-Id: Iffa68c4207e97d92382fbff637a661a879c1909d
2024-09-20 13:52:29 +00:00
Zuul
ab99f36baa Merge "Check for the existence of an IPMI device" 2024-09-09 16:44:27 +00:00
cid
2d79eae382 Check for the existence of an IPMI device
Check for IPMI device files before the use of the `'ipmitool lan.*'`
command, avoiding unnecessary calls on non-IPMI systems.

Closes-Bug: #2076367
Change-Id: Ib800717701e6f2828df55a0da0e999fc014c12e1
2024-09-05 20:48:07 +01:00
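A small sketch of such a pre-check. The device paths listed are an assumption (they are the nodes commonly created by the Linux IPMI driver), not necessarily the exact list the agent checks, and `ipmi_device_present` is a hypothetical name:

```python
import os

# Assumed device nodes; not confirmed against the agent's own list.
IPMI_DEVICE_PATHS = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def ipmi_device_present(exists=os.path.exists):
    """Return True if any known IPMI device file exists.

    `exists` is injectable so the check can be exercised without hardware.
    """
    return any(exists(path) for path in IPMI_DEVICE_PATHS)
```

A caller would skip the `ipmitool` invocation entirely when this returns False.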
Jay Faulkner
e303a369dc Inspect non-raw images for safety
When IPA gets a non-raw image, it performs an on-the-fly conversion
using qemu-img convert, as well as running qemu-img frequently to get
basic information about the image before validating it.

Now, before any qemu-img calls are made, we ensure that we have
inspected the image for safety and pass through the detected format.

If given a disk_format=raw image and image streaming is enabled
(default), we retain the existing behavior of not inspecting it in
any way and streaming it bit-perfect to the device. In this case, we
never use qemu-based tools on the image at all.

If given a disk_format=raw image and image streaming is disabled, this
change fixes a bug where the image may have been converted if it was not
actually raw in the first place. We now stream these bit-perfect to the
device.

Adds two config options:
- [DEFAULT]/disable_deep_image_inspection, which can be set to "True" in
  order to disable all security features. Do not do this.
- [DEFAULT]/permitted_image_formats, default raw,qcow2, for image types
  IPA should accept.

Both of these configuration options are wired up to be set by the lookup
data returned by Ironic at lookup time.

This uses an image format inspection module imported from Nova; this
inspector will eventually live in oslo.utils, at which point we'll
migrate our usage of the inspector to it.

Closes-Bug: #2071740
Change-Id: I5254b80717cb5a7f9084e3eff32a00b968f987b7
2024-09-04 09:11:28 -07:00
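The idea of checking a detected format against a permitted list can be sketched as follows. This is a toy illustration and not the inspector imported from Nova: it recognises only the qcow2 magic bytes, treats everything else as raw, and the rule that a mismatch with the claimed format is rejected is an assumption. The names `detect_format` and `check_image` are hypothetical:

```python
QCOW2_MAGIC = b"QFI\xfb"  # first four bytes of a qcow2 image


def detect_format(header):
    """Toy detector: recognise qcow2 by its magic, otherwise report 'raw'."""
    return "qcow2" if header.startswith(QCOW2_MAGIC) else "raw"


def check_image(header, claimed_format, permitted=("raw", "qcow2")):
    """Return the detected format, or raise if the image is not acceptable."""
    detected = detect_format(header)
    if detected not in permitted:
        raise ValueError("format %s is not permitted" % detected)
    if detected != claimed_format:
        # Assumed policy: never trust the claimed format over the content.
        raise ValueError(
            "claimed %s but detected %s" % (claimed_format, detected))
    return detected
```

The detected format, rather than the claimed one, is what would then be passed on to any conversion tool.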
Riccardo Pittau
bd3b596ced Fix series in release notes
Change-Id: I6844ce33274afdb64e78b79930c8aa32776e7665
2024-08-23 10:16:27 +02:00
Riccardo Pittau
599a825554 Fix versions in release notes
Change-Id: Ief6299e4b1bbef5fdb33a28b90b078f420cf8508
2024-06-10 16:01:36 +02:00
Jay Faulkner
c39517b044 Call evaluate_hardware_support exactly once per hwm
Fixes an issue where we could call evaluate_hardware_support multiple
times each run. Now, instead, we cache the values and use the cache
where needed.

Adds unit test coverage for get_managers and the new method.
Fixes issue where we were caching hardware managers between unit tests.

Also includes fixes for codespell CI:
- skip build files in repo
- fix spelling issues introduced to repo

Closes-bug: 2066308
Change-Id: Iebc5b6d2440bfc9f23daa322493379bbe69e84d0
2024-05-22 08:46:21 -07:00
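The caching approach can be illustrated with a short sketch. `FakeManager`, `get_support` and `reset_cache` are hypothetical names used only for this example; only `evaluate_hardware_support` comes from the commit message:

```python
class FakeManager:
    """Stand-in hardware manager that counts how often it is evaluated."""

    def __init__(self, support):
        self.support = support
        self.calls = 0

    def evaluate_hardware_support(self):
        self.calls += 1
        return self.support


_support_cache = {}


def get_support(manager):
    """Evaluate a manager at most once and reuse the cached value."""
    if manager not in _support_cache:
        _support_cache[manager] = manager.evaluate_hardware_support()
    return _support_cache[manager]


def reset_cache():
    """Clear cached values, e.g. between unit tests."""
    _support_cache.clear()
```

Clearing the cache between tests avoids the second problem the commit mentions, where state leaked from one unit test into the next.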
c303bd971b reno: Update master for unmaintained/zed
Update the zed release notes configuration to build from
unmaintained/zed.

Change-Id: I673a729e1598d2100631262d61c91690f500306b
2024-05-06 06:22:59 +00:00
Julia Kreger
6ac3f350c0 Unmount config drives
If this seems like deja vu, that is because it is. We had this
very same issue with the original CoreOS ramdisk. Since we don't
control the whole OS of the ramdisk, it only made sense to teach
the agent to umount the folder.

The folder is referenced already, and the agent does have safeguards
in place, but unfortunately this issue led to a rebuild breaking where
cloud-init, glean, and the agent were all trying to do the right thing
as they thought, and there were just multiple /mnt/config folders
present in the OS. These are separate issues we also need to try and
remedy.

What happens is when the device is locked via a mount, the partition
table is never updated to the running OS as the mount creates a lock.
So the agent ends up thinking, in the case of a rebuild, that everything
including creating a configuration drive on that device has been
successful, but when you reboot, there is no partition table entry
for the new partition as the change was not successfully written.
This state prevented the workload from rebooting properly.
This change eliminates that possibility moving forward by attempting
to ensure that the cloud configuration folder is no longer mounted.

Change-Id: I4399dd0934361003cca9ff95a7e3e3ae9bba3dab
2024-04-29 15:41:59 -07:00
Zuul
28053644cd Merge "add mixed matching of root device hints" 2024-04-27 17:26:25 +00:00
Zuul
2b67f277b7 Merge "Step to clean UEFI NVRAM entries" 2024-04-27 02:10:54 +00:00
Tudor Domnescu
ceec5a7367 destroy_disk_metadata: support 4096 sector size
A sector size of 512 was assumed and hardcoded, causing dd to fail when
it tried to write in chunks smaller than the sector size for disks with
4096 bytes sectors. The size of GPT in sectors also depends on sector size.

Change-Id: Ide5318eb503d728cff3221c26bebbd1c214f6995
2024-04-24 20:37:44 +00:00
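The dependence of the GPT size on the sector size is plain arithmetic, shown here as a sketch assuming the default GPT layout of 128 partition entries of 128 bytes each. The function names are hypothetical:

```python
# Default GPT layout: 128 partition entries of 128 bytes each.
GPT_ENTRY_ARRAY_BYTES = 128 * 128


def gpt_entry_sectors(sector_size):
    """Sectors needed for the partition entry array (ceiling division)."""
    return -(-GPT_ENTRY_ARRAY_BYTES // sector_size)


def primary_gpt_sectors(sector_size):
    """Sectors at the start of the disk: protective MBR + header + entries."""
    return 1 + 1 + gpt_entry_sectors(sector_size)


def backup_gpt_sectors(sector_size):
    """Sectors at the end of the disk: entries + backup header."""
    return gpt_entry_sectors(sector_size) + 1
```

With 512-byte sectors this gives 34 sectors at the start and 33 at the end; with 4096-byte sectors it gives 6 and 5, which is why a hardcoded 512 breaks on 4096-byte disks.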
Adam Rozman
84a1195d5a add mixed matching of root device hints
This commit introduces the following changes:
  - New optional `all_serial_and_wwn` argument for the block device
    listing logic. The new argument makes it possible to
    collect wwn and serial number information from both
    lsblk and udevadm at the same time
  - Both the short and the long serials are collected
    from udevadm without prioritization when the new argument
    has the value True
  - The new feature is automatically enabled during block device listing
    as part of the root disk selection
  - New options are added to the lsblk command when used in the block
    device discovery process; previously lsblk was not looking
    for wwn numbers and now it does

Closes-Bug: #2061437
Change-Id: I438a686d948cd929311e2f418bb02fb771805148
Signed-off-by: Adam Rozman <adam.rozman@est.tech>
2024-04-15 15:53:50 +03:00
Steve Baker
215fecd447 Step to clean UEFI NVRAM entries
Adds a deploy step ``clean_uefi_nvram`` to remove unrequired extra UEFI
NVRAM boot entries. By default any entry matching ``HD`` as the root
device, or with a ``shim`` or ``grub`` efi file in the path will be
deleted, ensuring that disk based boot entries are removed before the
new entry is created for the written image. The ``match_patterns``
parameter allows a list of regular expressions to be passed, where a
case insensitive search in the device path will result in that entry
being deleted.

Closes-Bug: #2041901
Change-Id: I3559dc800fcdfb0322286eba30ce47041419b0c6
2024-04-11 01:17:23 +12:00
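A sketch of how a ``match_patterns`` list might be applied, assuming each boot entry is available as a (boot number, device path) pair. The function name, the example patterns and the sample device paths are invented for illustration and are not the step's actual defaults:

```python
import re


def entries_to_delete(entries, match_patterns):
    """Return boot numbers whose device path matches any pattern.

    Matching is a case-insensitive search, as the commit message describes.
    """
    compiled = [re.compile(p, re.IGNORECASE) for p in match_patterns]
    return [num for num, path in entries
            if any(rx.search(path) for rx in compiled)]


# Hypothetical patterns and made-up sample entries.
PATTERNS = [r"^HD\(", r"shim.*\.efi", r"grub.*\.efi"]
SAMPLE_ENTRIES = [
    ("0000", r"HD(1,GPT,1234)/File(\EFI\debian\shimx64.efi)"),
    ("0001", "PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,1)/IPv4(0.0.0.0)"),
    ("0002", r"PciRoot(0x0)/Pci(0x1,0x1)/Ata(0,0,0)/File(\EFI\BOOT\GRUBX64.EFI)"),
]
```

Here `entries_to_delete(SAMPLE_ENTRIES, PATTERNS)` selects entries 0000 and 0002 and leaves the network boot entry 0001 untouched.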
Zuul
b6075156b3 Merge "USB device discovery" 2024-03-28 21:22:53 +00:00
783a0377ad Update master for stable/2024.1
Add file to the reno documentation build to show release notes for
stable/2024.1.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2024.1.

Sem-Ver: feature
Change-Id: I67ee5ead4aa7f47517c35d1a77d594fcad22cc4c
2024-03-19 11:10:44 +00:00
Zuul
ee8340f2cb Merge "Update regex to detect closed branch" 2024-03-18 11:07:10 +00:00
Zuul
815e1f462f Merge "reno: Update master for unmaintained/victoria" 2024-03-14 12:05:39 +00:00
Zuul
aa76962b4e Merge "reno: Update master for unmaintained/wallaby" 2024-03-14 12:00:33 +00:00
38ba0d8508 reno: Update master for unmaintained/xena
Update the xena release notes configuration to build from
unmaintained/xena.

Change-Id: I3bbef10b65dc43596a59eaca5d792f5e451d5d4c
2024-03-14 11:27:13 +00:00
5a017ea84a reno: Update master for unmaintained/wallaby
Update the wallaby release notes configuration to build from
unmaintained/wallaby.

Change-Id: Iaf279482847d781d7d338c4923a672a5e9337332
2024-03-14 11:22:41 +00:00
6ebaf277a6 reno: Update master for unmaintained/victoria
Update the victoria release notes configuration to build from
unmaintained/victoria.

Change-Id: I00a9bcb8ee6d5160d2598fbecb8e585885212df7
2024-03-14 11:18:06 +00:00
Takashi Kajinami
bffa88acb8 Update regex to detect closed branch
... based on the change made in reno recently[1].

Also the overall regex is updated to be more consistent with the regex
used in ironic.

[1] https://review.opendev.org/c/openstack/reno/+/910547

Change-Id: I362de82fb5478b846df7a343da02a359f5f7dece
2024-03-13 19:40:40 +09:00
Damien Rannou
3fd68c0848 USB device discovery
The idea is to retrieve USB device information via 'lshw' and
return the list to ironic in order to be able to create introspection
rules based on USB devices.

Change-Id: I39d60cb467614fca7a7f701dbe576154213580a5
2024-02-19 14:49:52 +01:00
Zuul
6d35c1e949 Merge "Make inspection URL optional if the collectors are provided" 2024-02-07 23:06:34 +00:00
614532d2a2 reno: Update master for unmaintained/yoga
Update the yoga release notes configuration to build from
unmaintained/yoga.

Change-Id: I0c5ab4348bd293ce77b04180247773412edbe179
2024-02-06 15:03:51 +00:00
Dmitry Tantsur
0010f5c11a Also retry inspection on HTTP CONFLICT
The new implementation can return it when unable to lock the node.

Other possible errors are 400 and 404 (should not be retried), as well as
5xx (already retried).

Change-Id: I74c2f54a624dc47e8e2d1e67ae4c6a6078e01d2f
2024-01-26 16:21:24 +01:00
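The retry rule stated in this commit message can be written as a small predicate. The function name `should_retry_inspection` is hypothetical:

```python
from http import HTTPStatus


def should_retry_inspection(status_code):
    """Retry on 409 (node locked) and on any 5xx; never on 400 or 404."""
    if status_code == HTTPStatus.CONFLICT:
        return True
    return status_code >= 500
```

In this sketch every other 4xx code is also treated as non-retriable, which is an assumption beyond the two codes the message names.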
Zuul
1e107bd625 Merge "Add support for reporting CPU socket number" 2024-01-22 11:52:06 +00:00
Kaifeng Wang
9cafe76225 Add support for reporting CPU socket number
IPA reports a few CPU fields, including cores, arch and flags.
Some users need the number of physical sockets on a bare metal node,
since cores are just a logical representation of the
compute resource.
The socket number is more suitable for quota control in some
use cases.

Change-Id: I94be86d6b12a3a7e7ca1041d948427a073412a31
2024-01-19 21:24:37 +00:00
Dmitry Tantsur
6cd36a750f Make inspection URL optional if the collectors are provided
With the new in-band inspection, we can derive the callback URL from
the Ironic URL, there is no need to duplicate it. This change uses
the presence of collectors as a sign to run inspection.

The previous approach of setting an inspection URL, with or without
explicitly setting collectors, still works for compatibility with
ironic-inspector.

Change-Id: Ie4279ee6d2995c9686f1dcdef1d6e5dc1dd20871
2024-01-10 08:55:42 +01:00
Dmitry Tantsur
0d4ae976c2 Support several API and Inspector URLs
Allows nodes with a single IP stack to be deployed from a dual-stack
Ironic.

Detecting the advertised address and detecting usable Ironic URLs are done
completely independently, which does open some space for misconfiguration.
I hope it's not likely in reality, especially since this feature is
targeting advanced standalone users.

Change-Id: Ifa506c58caebe00b37167d329b81c166cdb323f2
Closes-Bug: #2045548
2024-01-09 16:43:23 +01:00
Dmitry Tantsur
2bb74523ae Add missing headers to the inspection callback
Somehow, it has worked correctly for years, but now I've discovered that
the new inspection is (no longer?) tolerant to the missing header.

While here, copy all headers from the heartbeat code.

Change-Id: I9e5c609eb4435e520bc225dea08aedfdf169744b
2024-01-09 16:38:46 +01:00
Jay Faulkner
36e5993a04 [codespell] Fix spelling issues in IPA
This fixes several spelling issues identified by codespell. In some
cases, I may have manually modified a line to make the output more clear
or to correct grammatical issues which were obvious in the codespell
output.

Later changes in this chain will provide the codespell config used to
generate this, as well as adding this commit's SHA, once landed, to a
.git-blame-ignore-revs file to ensure it will not pollute git histories
for modern clients.

Related-Bug: 2047654
Change-Id: I240cf8484865c9b748ceb51f3c7b9fd973cb5ada
2023-12-28 10:54:46 -08:00
Dmitry Tantsur
2ab8364649 Add a jitter to heartbeat retries
Currently, if heartbeat fails, we reschedule it after 5 seconds.
This is fine for the first retry, but it can cause a thundering herd
problem when a lot of nodes fail to heartbeat at once.

This change adds jitter to the minimum wait of 5 seconds. The jitter is
not applied for forced heartbeats: they still have a minimum wait of
exactly 5 seconds from the last heartbeat.

The code is re-ordered to move the interval calculation to one place.
Bonus: correctly logging the next interval.

The unit tests have been rewritten to test the heartbeat process step by
step and not rely on the exact sequence of the calls.

Closes-Bug: #2038438
Change-Id: I4c4207b15fb3d48b55e340b7b3b54af833f92cb5
2023-12-13 17:34:24 +01:00
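A minimal sketch of jittered retry intervals, assuming a multiplicative jitter. The 5-second minimum comes from the commit message; the jitter fraction and the name `next_retry_wait` are hypothetical:

```python
import random

MIN_RETRY_WAIT = 5.0     # seconds, from the commit message
JITTER_FRACTION = 0.5    # hypothetical: up to 50% extra wait


def next_retry_wait(forced=False, rng=random):
    """Return the wait before the next heartbeat attempt, in seconds.

    Forced heartbeats keep the exact minimum; retries after a failure get
    a random extra delay so that many nodes do not retry in lockstep.
    """
    if forced:
        return MIN_RETRY_WAIT
    return MIN_RETRY_WAIT * (1 + rng.uniform(0, JITTER_FRACTION))
```

Spreading the retries over an interval rather than a single instant is what addresses the thundering herd problem when many nodes fail to heartbeat at once.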
Zuul
62041d6d9e Merge "Fix referencing to the raid_device var which is not set" 2023-12-12 17:01:32 +00:00