2449 Commits

Author SHA1 Message Date
Julia Kreger
cb61a8d6c0 Retry on checksum failures
HTTP is a fun protocol.

Size is basically optional, and clients implicitly trust that the
server and socket have transferred all the bytes. Which *really*
means you should always checksum.

But... previously we didn't checksum as part of retrying.

So if anything happened with python-requests, or lower level
library code or the system itself causing bytes to be lost off the
buffer, creating an incomplete transfer situation, then we wouldn't
know until the checksum.

So now, we checksum and re-trigger the download if there is a
failure of the checksum.

This involved a minor shift in the download logic, and resulted in
a needful minor fix to an image checksum test as it would loop for
90 seconds as well.

Closes-Bug: 2038934
Change-Id: I543a60555a2621b49dd7b6564bd0654a46db2e9a
2023-10-10 09:15:31 -07:00
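
The retry-on-checksum-failure idea described in this commit can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: `fetch`, `download_with_checksum_retry`, and `ChecksumMismatch` are hypothetical names, and `fetch` stands in for whatever HTTP streaming call produces the byte chunks.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when every attempt produced data with the wrong digest."""


def download_with_checksum_retry(fetch, expected_hex, algo="sha256", attempts=3):
    """Call fetch() until the payload digest matches expected_hex.

    fetch is a hypothetical zero-argument callable that returns an
    iterable of byte chunks for one complete download attempt.
    """
    for _ in range(attempts):
        digest = hashlib.new(algo)
        chunks = []
        for chunk in fetch():
            digest.update(chunk)
            chunks.append(chunk)
        if digest.hexdigest() == expected_hex.lower():
            return b"".join(chunks)
        # A truncated transfer can look like success at the HTTP layer,
        # so a digest mismatch is treated as the signal to download again.
    raise ChecksumMismatch(
        "checksum did not match after %d attempts" % attempts)
```

Passing the download step in as a callable keeps the retry loop independent of the HTTP library, which also makes it easy to exercise with a fake that returns a short payload on the first call.
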
Zuul
89be7bd420 Merge "Conditional creation of RAIDed ESP for UEFI Software RAID" 2023-10-10 11:07:25 +00:00
Boushra Bettir
25704d2555 Add additional mock tests to unit tests for read only devices.
Change ordering to ensure mock tests work correctly.

Closes-Bug: #2037690

Change-Id: Ie9b884e58e4677a47e57c3ad39cadd65db8eec75
2023-10-08 20:02:05 +00:00
Zuul
23c8427224 Merge "Extend the lookup timeout to 600 seconds" 2023-09-22 12:35:00 +00:00
db9545eeec Update master for stable/2023.2
Add file to the reno documentation build to show release notes for
stable/2023.2.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.2.

Sem-Ver: feature
Change-Id: I8150eb8f35a444ef5a2bc7a648ec301e5094e52d
2023-09-21 11:18:18 +00:00
Zuul
73b76da5fe Merge "Add get_service_steps logic to the agent" 9.7.0 2023-09-15 22:29:59 +00:00
Zuul
1581f91826 Merge "preserve/handle config drives on 4k block devices" 2023-08-31 14:30:03 +00:00
Julia Kreger
eb95273ffb Add get_service_steps logic to the agent
Initial code patches for service steps have merged in
ironic, and it is now time to add support into the
agent which allows service steps to be raised to
the service.

Updates the default hardware manager version to 1.2,
which has *rarely* been incremented due to oversight.

Change-Id: Iabd2c6c551389ec3c24e94b71245b1250345f7a7
2023-08-31 06:22:22 -07:00
Zuul
667abee812 Merge "Use sparkingly new metalsmith cs9 job" 2023-08-24 18:57:36 +00:00
Zuul
5a3c8bd138 Merge "tox: Remove basepython" 2023-08-24 15:52:34 +00:00
Julia Kreger
4efcce5310 Extend the lookup timeout to 600 seconds
Changes the default lookup timeout to be 600 seconds which
reduces the risk of lookup failing as a write operation
to the backing database is performed upon lookup thanks to
generation of an agent token.

Overall, this is fairly harmless since by default ramdisks
restart the agent if they were not able to successfully
start.

Change-Id: I35c64c0b4f9b3b607df1bc0c4c2a852aa3595cbd
2023-08-24 08:29:07 -07:00
Julia Kreger
b6c263a5dc preserve/handle config drives on 4k block devices
When an underlying block device (or driver) only supports 4KB IO,
this can cause some issues with aspects like using an ISO9660 filesystem
which can only support a maximum of 2KB IO.

The agent will now attempt to mount the filesystem *before* deleting the
supplied file, and should that fail it will mount the configuration drive
file from the ramdisk utilizing a loopback, and then extract the contents
of the ramdisk into a newly created VFAT filesystem which supports 4KB
block IO.

Closes-Bug: #2028002
Change-Id: I336acb8e8eb5a02dde2f5e24c258e23797d200ee
2023-08-24 08:10:22 -07:00
Riccardo Pittau
51f2115c56 Use sparkingly new metalsmith cs9 job
Instead of the old dusty cs8 one.

Depends-On: I56a0473ecbff8ab8fc143954d3c493037765cdf1
Change-Id: I7bf9cbff9d10299c1a6b9b19fddd8124c1b185ba
2023-08-24 16:29:50 +02:00
Julia Kreger
5ed520df89 Handle the node being locked
If the node is locked, a lookup cannot be performed when an agent
token needs to be generated, which tends to error like this:

  ironic_python_agent.ironic_api_client [-] Failed looking up node
  with addresses '00:6f:bb:34:b3:4d,00:6f:bb:34:b3:4b' at
  https://172.22.0.2:6385. Error 409: Node
  c25e451b-d2fb-4168-b690-f15bc8365520 is locked by host 172.22.0.2,
  please retry after the current operation is completed..
  Check if inspection has completed.

Problem is, if we keep pounding on the door, we can actually worsen
the situation, and previously we would just let tenacity
retry.

We will now hold for 30 seconds before proceeding, so we have
hopefully allowed the operation to complete.

Also fixes the error logging to help human sanity.

Change-Id: I97d3e27e2adb731794a7746737d3788c6e7977a0
2023-08-22 16:47:28 -07:00
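
A rough sketch of the "hold before retrying" behaviour described above, assuming a hypothetical `request` callable that returns a `(status_code, body)` pair in place of the real API client. The 30 second hold comes from the commit message; the function name, the attempt count, and the injected `sleep` are illustrative assumptions.

```python
import time


def lookup_with_lock_hold(request, hold_seconds=30, attempts=3,
                          sleep=time.sleep):
    """Return the lookup body, pausing whenever the node is locked.

    request is a hypothetical callable returning (status_code, body).
    """
    for _ in range(attempts):
        status, body = request()
        if status == 200:
            return body
        if status == 409:
            # The node is locked by another operation. Retrying at once
            # can make things worse, so hold before the next attempt.
            sleep(hold_seconds)
        # Any other status is simply retried immediately in this sketch.
    return None
```

Injecting `sleep` is only there so the hold can be observed in a test without actually waiting.
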
Arne Wiebalck
286d66709a Conditional creation of RAIDed ESP for UEFI Software RAID
Rebuilding an instance on RAIDed ESPs will fail due to sgdisk
running against a non-clean disk and bailing out. Check if there
is a RAIDed ESP already and skip creation if it exists.

Change-Id: I13617ae77515a9d34bc4bb3caf9fae73d5e4e578
2023-08-16 17:39:04 +02:00
Julia Kreger
b68a4c8a92 minor: fix release notes file path
Change-Id: I458d88bf14b55253179488cb771ae42e7b8c84d7
9.6.0
2023-08-07 12:57:34 -07:00
likui
80c3f568bd tox: Remove basepython
Python 2 is EOL. No environment should be defaulting to it. Our CI
environments certainly aren't.

Change-Id: Ib2e4304fc6c95c853570f48690a2d2a4aeeabdbe
2023-08-02 16:59:59 +08:00
likui
c274869756 Add python3.10 support in testing runtime
The 2023.2 cycle testing runtime adds Python 3.10 [1].

[1] https://governance.openstack.org/tc/reference/runtimes/2023.2.html

Change-Id: I6e3eb0c9dec4c48e1bf1c7c53b8c177775ec91eb
2023-07-31 15:26:15 +08:00
Zuul
e493cad02c Merge "Log the number of bytes downloaded" 2023-07-27 21:39:12 +00:00
Julia Kreger
c65ad42ff1 Log the number of bytes downloaded
When troubleshooting download issues, which may present
as checksum validation failures, it is difficult to understand
if the *entire* file was downloaded due to the way HTTP works.

In that, a download may start with a successful result code,
and the content is streamed out until the socket is closed.

But with HTTP there is no way to know if that socket closed
prematurely and the original server size is *also* an optional
field, so just log the size we got so we don't drive the
humans [more-]insane.

Also now logs the (optional) content-length field if
supplied by the server.

Change-Id: Id71b167f4e330d54b9afddf95f1a2ef9e40398bf
2023-07-19 16:20:40 +00:00
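
As an illustration of the logging described in this commit, the sketch below counts the bytes actually received and compares them with the optional Content-Length value when the server supplied one. The function name and return shape are assumptions made for the example, not the agent's interface.

```python
import logging

LOG = logging.getLogger(__name__)


def count_and_log(chunks, content_length=None):
    """Count streamed bytes and log them next to Content-Length.

    Returns (received, complete), where complete is None when the
    server did not supply a Content-Length to compare against.
    """
    received = 0
    for chunk in chunks:
        received += len(chunk)
    LOG.info("Downloaded %s bytes (Content-Length: %s)", received,
             content_length if content_length is not None
             else "not supplied")
    if content_length is None:
        return received, None
    return received, received == int(content_length)
```
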
Zuul
0fb7fec56e Merge "Allow md5 to be disabled from the conductor" 2023-07-12 03:53:14 +00:00
Zuul
119981a818 Merge "Fix nvidia hardware manager url parser to permit https" 2023-06-26 10:11:55 +00:00
Zuul
bb156aad6c Merge "Fix Bandit errors" 2023-06-26 09:25:09 +00:00
Julia Kreger
b83678c968 Fix nvidia hardware manager url parser to permit https
Change-Id: I9a10e543d3256ceaa78c6fbdb01fc0d88c0ee6e6
2023-06-06 15:35:16 +00:00
Julia Kreger
78c1343a54 Fix Bandit errors
Bandit 1.7.5 released with a timeout check for all requests and
urllib calls.

Fixed those.

In the process, this exposed a bandit B310 issue, which was already
covered by the code, so it is now explicitly marked as such.

Also enables bandit checks to be voting for CI.

Change-Id: If0e87790191f5f3648366d571e1d85dd7393a548
2023-06-06 08:34:55 -07:00
Zuul
4845fd04ba Merge "Follow-up Add documentation for MellanoxDeviceHardwareManager" 9.5.0 2023-05-25 15:03:19 +00:00
Julia Kreger
e6fd7e753e Allow md5 to be disabled from the conductor
Also fixes my use of set_override, as it is not on the actual
config object. You'd think I'd remember that, since I've done
that before...

Change-Id: I4b578c4319354001cbbd3b3856af96b30fd25555
2023-05-25 07:59:07 -07:00
waleedm
406c844aac Follow-up Add documentation for MellanoxDeviceHardwareManager
Add follow-up documentation for
"update NVIDIA NIC firmware images and settings by ironic-python-agent"
Icfaffd7c58c3c73c3fa28cfc2a6c954d2c93c16e

Change-Id: I481cdd622f360cbba3312c6f3d4af45383bb7e1b
2023-05-25 10:55:11 +00:00
Jay Faulkner
6098747ec5 Ironic (and IPA) use launchpad now
Correct links to point to launchpad bug tracker, correct docs config

Change-Id: I5d46af2a9d94f3b2e05e4f937e0619a89fe04d4c
2023-05-17 15:38:57 -07:00
Zuul
141c5ff1c3 Merge "Add support for CentOS SUM files" 2023-05-09 09:03:25 +00:00
Zuul
03e88b579e Merge "Revert disabling MD5 checksums" 2023-05-05 08:44:37 +00:00
Zuul
44d9c2219f Merge "Add network interface speed to the inventory" 2023-05-04 09:04:30 +00:00
Dmitry Tantsur
c1c5537ba2 Revert disabling MD5 checksums
This was a significant breaking change that was landed despite explicit
disagreement by some community members (myself included). It has already
resulted in an accidental Ironic CI breakage, has broken Bifrost and has
a potential of breaking Metal3. In case of Metal3, MD5 support is a part
of its public API.

While MD5 is a potential security hazard, I don't see the need to hurry
this change without giving the community time to prepare. This change
reverts the new option md5_enabled to True.

Change-Id: I32b291ea162e8eb22429712c15cb5b225a6daafd
2023-05-04 09:26:10 +02:00
Harald Jensås
e7a048ecbe Add support for CentOS SUM files
The CentOS Stream SUM files use the format:
  # FILENAME: <size> bytes
  ALGORITHM (FILENAME) = CHECKSUM

Compared to the more common format:
  CHECKSUM  *FILE_A
  CHECKSUM  FILE_B

Use regular expressions to check for the filename both
in the middle with parentheses and at the end.
Similarly, look for valid checksums at the beginning or
end of the line. Also look for known checksum patterns in
case the file only contains the checksum itself.

Change-Id: I9e49c1a6c66e51a7b884485f0bcaf7f1802bda33
2023-05-03 21:31:23 +02:00
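
The two checksum file layouts described above can be matched with regular expressions along these lines. This is a sketch under assumptions: the patterns, the `find_checksum` helper, and the accepted digest lengths (32, 40, 64, or 128 hex characters) are illustrative and are not taken from the agent's code.

```python
import re

# CentOS Stream style: "ALGORITHM (FILENAME) = CHECKSUM"
BSD_LINE = re.compile(
    r"^(?P<algo>[A-Za-z0-9-]+) \((?P<name>.+)\) = (?P<sum>[0-9a-fA-F]+)$")
# More common style: "CHECKSUM  FILE" or "CHECKSUM *FILE"
GNU_LINE = re.compile(r"^(?P<sum>[0-9a-fA-F]+)\s+\*?(?P<name>.+)$")
# A file holding only the checksum itself (md5, sha1, sha256, sha512 sizes)
BARE = re.compile(
    r"^(?:[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64}|[0-9a-fA-F]{128})$")


def find_checksum(text, filename):
    """Return the lowercase hex checksum for filename, or None."""
    lines = [line.strip() for line in text.splitlines()]
    # Skip blanks and comment lines such as "# FILENAME: <size> bytes".
    lines = [line for line in lines if line and not line.startswith("#")]
    for line in lines:
        for pattern in (BSD_LINE, GNU_LINE):
            match = pattern.match(line)
            if match and match.group("name") == filename:
                return match.group("sum").lower()
    if len(lines) == 1 and BARE.match(lines[0]):
        return lines[0].lower()
    return None
```
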
Dmitry Tantsur
9ed232e77e Add network interface speed to the inventory
This is another fact that Metal3's baremetal-operator is currently
consuming from extra-hardware.

Change-Id: I2ec9d5e9369f5508e7583a4e13c2083f5c8b28ba
2023-05-03 12:20:35 +02:00
Julia Kreger
c05fdf790c Fix checksum validation logic
The checksum validation logic, which was updated early on in the
whole process of deprecating md5, didn't account for a URL *or* a
longer checksum (i.e. sha256/sha512), both of which were decided on
while the overall approach was being worked out.

Fixes the logic, and adds additional tests.

Change-Id: Ic4053776e131fc02ace295a1e69e9f9faab47f42
2023-05-02 17:24:57 -07:00
Zuul
f37ea85a27 Merge "Disable MD5 image checksums" 2023-05-02 06:41:25 +00:00
Zuul
3cd8c294fb Merge "Deprecate LLDP in inventory in favour of a new collector" 2023-04-27 12:05:11 +00:00
Zuul
33e3bae28b Merge "Fix UTF-16 result handling for efibootmgr" 2023-04-27 00:49:35 +00:00
Dmitry Tantsur
3e05a03f7c Deprecate LLDP in inventory in favour of a new collector
Binary LLDP data is bloating inventory causing us to disable its collection
by default. For other similar low-level information, such as PCI devices
or DMI data, we already use inspection collectors instead. Now that the
inventory format is shared with out-of-band inspection, having LLDP
there makes even less sense.

This change adds a new collector ``lldp`` to replace the now-deprecated
inventory field.

Change-Id: I56be06a7d1db28407e1128c198c12bea0809d3a3
2023-04-26 19:33:51 +00:00
Julia Kreger
32df26a22a Disable MD5 image checksums
MD5 image checksums have long been superseded by the use of the
``os_hash_algo`` and ``os_hash_value`` fields as part of the
properties of an image.

In the process of doing this, we determined that checksum via
URL usage was non-trivial and determined that an appropriate
path was to allow the checksum type to be determined as needed.

Change-Id: I26ba8f8c37d663096f558e83028ff463d31bd4e6
2023-04-24 16:54:42 -07:00
Jay Faulkner
d7234c2be0 Upgrade to latest hacking - v6
No code changes needed to comply with newest flake.

Change-Id: I256397efe0fbb3e307d808b0eda2e4d72d83f9b0
2023-04-21 12:19:02 -07:00
Julia Kreger
76accfb880 Fix UTF-16 result handling for efibootmgr
The tl;dr is that UEFI NVRAM is encoded
in UTF-16, and when we run the efibootmgr command,
we can get unicode characters back.

Except we previously were forcing everything to be
treated as UTF-8 due to the way oslo.concurrency's
processutils module works.

This could be observed with UTF character 0x00FF
which raises up a nice exception when we try to
decode it.

Anyhow! While fixing handling of this, we discovered
we could get basically cruft out of the NVRAM,
by getting what was most likely a truncated string
out of our own test VMs. As such, we need to also
permit decoding to be tolerant of failures.
This could be binary data or as simple as flipped
bits which get interpreted as invalid characters.
As such, we have introduced such data into one of our
tests involving UEFI record de-duplication.

Closes-Bug: 2015602
Change-Id: I006535bf124379ed65443c7b283bc99ecc95568b
2023-04-17 09:14:24 -07:00
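
The decoding problem described in this commit can be reproduced in plain Python: a stray 0xFF byte is not valid UTF-8, so strict decoding raises, while a tolerant error handler keeps the rest of the record usable. The helper below is a hypothetical sketch; the encoding and error handler the agent actually uses are not stated in the message, so `"utf-8"` and `"replace"` are assumptions here.

```python
def tolerant_decode(raw, encoding="utf-8"):
    """Decode bytes, substituting U+FFFD for undecodable sequences."""
    return raw.decode(encoding, errors="replace")


record = b"Boot0001* ubuntu\xff"

try:
    record.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    # Strict decoding fails on the 0xFF byte.
    strict_ok = False

text = tolerant_decode(record)
```
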
Dmitry Tantsur
0304c73c0e Report system firmware information in the inventory
Change-Id: I5b6ceb9cdcf4baa97a6f0482d1030d14f3f2ecff
2023-03-31 14:28:32 +02:00
Dmitry Tantsur
2ddb693491 Trivial: formatting issue in the inventory docs
Double ticks don't work if followed by a symbol without space.

Change-Id: Ia455650b5e601dadb2b0ab91f71e1d9286d26071
2023-03-30 13:33:39 +02:00
Arne Wiebalck
b32f6c6d94 [Trivial] Fix typo in efi_utils
Change-Id: I692e045e6bc8683038a2e85a6a132687d2b30f18
2023-03-15 14:25:42 +01:00
9f09b885bd Update master for stable/2023.1
Add file to the reno documentation build to show release notes for
stable/2023.1.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.1.

Sem-Ver: feature
Change-Id: Id58fd751e6ed8ed3d478a78a6895cad75667c9b1
2023-03-09 14:09:43 +00:00
Zuul
088610844a Merge "update NVIDIA NIC firmware images and settings by ironic-python-agent" 9.4.0 2023-01-31 19:35:53 +00:00
Zuul
c12135911a Merge "Make logs collection a hardware manager call" 9.3.0 2023-01-26 16:26:42 +00:00
Zuul
7f687a1734 Merge "Readd usedevelop true to tox.ini" 2023-01-26 09:28:56 +00:00