60746 Commits

Author SHA1 Message Date
Julien Le Jeune
cd4e58173a
Skip snapshot test when missing qemu-img
Since the commit to remove the AMI snapshot format special casing
has merged, we're now running the libvirt snapshot tests as expected.
However, those tests need the qemu-img binary to be installed.
Because these tests were silently and incorrectly skipped for so long,
they didn't receive the same maintenance as other tests, as the failures went unnoticed.
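
A minimal sketch of the skip described above, assuming a plain unittest test case. The class and helper names are hypothetical and are not nova's actual test code; only the standard library is used.

```python
import shutil
import unittest


def qemu_img_available():
    """Return True if a qemu-img binary can be found on PATH."""
    return shutil.which("qemu-img") is not None


class SnapshotTestSketch(unittest.TestCase):
    # Hypothetical test class; nova's real snapshot tests differ.
    def test_snapshot(self):
        if not qemu_img_available():
            # Report a skip instead of failing on hosts without qemu-img.
            self.skipTest("qemu-img is not installed")
        # The real test body would exercise the snapshot path here.
        self.assertTrue(qemu_img_available())
```

Skipping explicitly shows up in the test report, unlike the earlier situation where the test was skipped by accident.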

Change-Id: Ia90eedbe35f4ab2b200bdc90e0e35e5a86cc2110
Closes-bug: #2075178
Signed-off-by: Julien Le Jeune <julien.le-jeune@ovhcloud.com>
(cherry picked from commit 0809f75d7921fe01a6832211081e756a11b3ad4e)
2024-09-04 09:04:09 +02:00
Zuul
0e6cda2851 Merge "Fix port group network metadata generation" into stable/2024.1 2024-09-03 17:41:24 +00:00
Artom Lifshitz
294444b360 libvirt: call get_capabilities() with all CPUs online
While we do cache the host's capabilities in self._caps in the
libvirt Host object, if we happen to first call get_capabilities() with
some of our dedicated CPUs offline, libvirt erroneously reports them
as being on socket 0 regardless of their real socket. We would then
cache that topology, thus breaking pretty much all of our NUMA
accounting.

To fix this, this patch makes sure to call get_capabilities()
immediately upon host init, and to power up all our dedicated CPUs
before doing so. That way, we cache their real socket ID.

For testing, because we don't really want to implement a libvirt bug
in our Python libvirt fixture, we make do with a simple unit test
that asserts that init_host() has powered on the correct CPUs.

Closes-bug: 2077228
Change-Id: I9a2a7614313297f11a55d99fb94916d3583a9504
(cherry picked from commit 79d1f06094599249e6e30ebba2488b8b7a10834e)
2024-08-30 02:00:23 +00:00
Zuul
642b5984a6 Merge "hardware: Correct log" into stable/2024.1 2024-08-28 23:43:42 +00:00
Stephen Finucane
f619311dd9 hardware: Correct log
We currently get the following error message if attempting to fit a
guest with hugepages on a node that doesn't have enough:

  Host does not support requested memory pagesize, or not enough free
  pages of the requested size. Requested: -2 kB

Correct this, removing the kB suffix and adding a note on the meaning of
the negative values, like we have for the success path.

Change-Id: I247dc0ec03cd9e5a7b41f5c5534bdfb1af550029
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #2075959
(cherry picked from commit 4678bcbb064da580500b1dbeddb0bdfdeac074ef)
2024-08-26 11:41:31 +00:00
Balazs Gibizer
a0be2c97a6 Fix PCI passthrough cleanup on reschedule
The resource tracker Claim object works on a copy of the instance object
obtained from the compute manager. But the PCI claim logic does not use the
copy; it uses the original instance object. However, the abort claim logic,
including the abort PCI claim logic, worked on the copy only. Therefore the
claimed PCI devices are visible to the compute manager in the
instance.pci_devices list even after the claim is aborted.

There was another bug in the PCIDevice object where the instance object
wasn't passed to the free() function and therefore the
instance.pci_devices list wasn't updated when the device was freed.

Closes-Bug: #1860555
Change-Id: Iff343d4d78996cd17a6a584fefa7071c81311673
(cherry picked from commit f8b98390dc99f6cb0101c88223eb840e0d1c7124)
2024-08-26 09:38:19 +02:00
sdmitriev1
7b0c0cf591 Functional test test_boot_reschedule_with_proper_pci_device_count
Let's first ensure we have a test that proves we have bad behaviour,
then follow up with the fix and the test tweak to prove it.

On the first compute node it fails due to a group policy error.
On the second compute node the instance should have exactly one PCI device.

Related-Bug: #1860555
Change-Id: Ia122fff268c8f45ad3e5a3071d2cb7c990cb2c1d
(cherry picked from commit a2d77845ab247f1b09e2ae4f32f9421c3f50b98d)
2024-08-26 09:38:08 +02:00
Elod Illes
8223e6a7c4 [CI] Replace deprecated regex
Latest Zuul emits the following warning:

  All regular expressions must conform to RE2 syntax, but an
  expression using the deprecated Perl-style syntax has been detected.
  Adjust the configuration to conform to RE2 syntax.

  The RE2 syntax error is: invalid perl operator: (?!

This patch replaces 'irrelevant-files' with 'files', explicitly
listing the patterns of the files that the tests should be run against.

Change-Id: If287e800fb9ff428dbe6f9c4c046627f22afe3df
(cherry picked from commit 9b77bae8a32ff41712b96bb6a67c7eacae45a4c9)
2024-08-01 20:41:28 +02:00
Zuul
da9e56c9af Merge "cpu: Only check governor type on online cores" into stable/2024.1 29.2.0 2024-07-31 21:42:05 +00:00
Zuul
fc2d39f842 Merge "[tools] Backport validator: handle unmaintained" into stable/2024.1 2024-07-31 16:49:17 +00:00
Zuul
159458de2d Merge "[tools] Ignore bot generated patches" into stable/2024.1 2024-07-31 16:49:11 +00:00
Elod Illes
602e68364c [tools] Backport validator: handle unmaintained
When the script was created there were only stable/* branches, but now
there are unmaintained/* branches as well. The validator fails because
it looks for hashes only on stable/* branches, even if the given
hash is already on an unmaintained/* branch. This patch now matches both
stable/* and unmaintained/* branches.
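
The branch matching described above could be sketched as follows. The real validator is a shell script, so this Python version, its pattern, and the function name are illustrative assumptions only.

```python
import re

# Hypothetical pattern: accept both branch families named in the commit.
RELEASE_BRANCH = re.compile(r"^(stable|unmaintained)/.+$")


def is_release_branch(name: str) -> bool:
    """Return True for stable/* and unmaintained/* branch names."""
    return RELEASE_BRANCH.match(name) is not None
```

With a single alternation, a hash found on either family of branches counts as already backported.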

Change-Id: I08fcc63ab0fbe5af1be70d5fde5af98bf006101c
(cherry picked from commit e2697de8e41a566eb86aefa364906bda9bc59863)
2024-07-30 18:37:42 +02:00
Elod Illes
92b781f96e [tools] Ignore bot generated patches
This is a fix for the test of whether a patch is bot generated or not, as
that did not work as intended. The problem is that the script is
checking the email address of the parent patch (HEAD~), which is probably
right in case the patch is a MERGE patch. But this is
wrong in case the patch is not a MERGE patch. This fix uses the very
same pattern as is used for the commit message parsing: the
$commit_hash variable, which is the parent's commit hash if the patch
is a MERGE patch, and an empty string otherwise (causing
'git show' to be called on HEAD).

Change-Id: I0abc72180edf34a6dd0624a40fb8682397805eca
(cherry picked from commit b8f3975d3641fad19971cc159bdb9decb6ea95f8)
2024-07-30 18:37:31 +02:00
Amit Uniyal
9146e90566
Refactor vf profile for PCI device
In general the card_serial_number will not be present on SR-IOV
VFs/PFs; it is only supported on very new cards.
Also, not all three are always required for vf_profile.

Related-Bug: #2008238
Change-Id: I00b126635612ace51b5e3138afcb064f001f1901
(cherry picked from commit a1a07e0d2d01e95b7d3c8db62d149a4278617c93)
2024-07-30 11:45:07 +02:00
Zuul
2f020b72ec Merge "Remove AMI snapshot format special case" into stable/2024.1 2024-07-29 13:14:26 +00:00
Julien Le Jeune
c782f0e6a0
Fix test_vmdk_bad_descriptor_mem_limit and test_vmdk_bad_descriptor_mem_limit_stream_optimized
These tests depend on qemu-img being installed and in the path. If it is not installed, skip them.

Change-Id: I896f16c512f24bcdd898ab002af4e5e068f66b64
Closes-bug: #2073862
Signed-off-by: Julien Le Jeune <julien.le-jeune@ovhcloud.com>
(cherry picked from commit a3202f7bf9f1aecc7d0632011167d38a1698f0a0)
2024-07-26 14:20:40 +02:00
Dan Smith
8c5929ff51 Remove AMI snapshot format special case
Note that this includes seemingly-unrelated test changes because we
were actually skipping the snapshot_running test for libvirt, which
has been a bug for years. In that test case, when we went to look
for image_meta.disk_format, that attribute was not set on the o.vo
object, which raised a NotImplementedError. That error is also checked
by the test to skip the test for drivers that do not support snapshot,
which meant that for libvirt, we haven't been running that case
beyond the point at which we create snapshot metadata and trip that
exception. Thus, once removing that, there are other mocks not in
place that are required for the test to actually run. So, this adds
mocks for qemu_img_info() calls that actually try to read the file on
disk, as well as the privsep chown() that attempts to run after.

Change-Id: Ie731045629f0899840a4680d21793a16ade9b98e
(cherry picked from commit d5a631ba7791b37e49213707e4ea650a56d2ed9e)
2024-07-25 21:46:18 +00:00
Zuul
39b6df523d Merge "Change force_format strategy to catch mismatches" into stable/2024.1 2024-07-24 20:30:32 +00:00
Zuul
f788cbe26c Merge "Stop using split UEC image (mostly)" into stable/2024.1 2024-07-24 12:46:38 +00:00
Dan Smith
8ef5ec9716 Change force_format strategy to catch mismatches
When we moved the qemu-img command in fetch_to_raw() to force the
format to what we expect, we lost the ability to identify and react
to situations where qemu-img detected a file as a format that is not
supported by us (i.e. identified and safety-checked by
format_inspector). In the case of some of the other VMDK variants
that we don't support, we need to be sure to catch any case where
qemu-img thinks it's something other than raw when we think it is,
which will be the case for those formats we don't support.

Note this also moves us from explicitly using the format_inspector
that we're told by glance is appropriate, to using our own detection.
We assert that we agree with glance and as above, qemu agrees with
us. This helps us avoid cases where the uploader lies about the
image format, causing us to not run the appropriate safety check.
AMI formats are a liability here since we have a very hard time
asserting what they are and what they will be detected as later in
the pipeline, so there is still special-casing for those.

Closes-Bug: #2071734
Change-Id: I4b792c5bc959a904854c21565682ed3a687baa1a
(cherry picked from commit 8b4c522f6699514e7d1f20ac25cf426af6ea588f)
2024-07-23 20:03:03 +00:00
Sylvain Bauza
cf721336d4 cpu: Only check governor type on online cores
Kernels don't allow access to the governor strategy on an offline core, so
we need to only validate strategies for online cores.

Change-Id: I14c9b268d0b97221216bd1a9ab9e48b48d6dcc2c
Closes-Bug: #2073528
(cherry picked from commit 757c333c0e55df4bcaf9d442fbe8dc8009e36989)
2024-07-19 17:02:51 +02:00
Balazs Gibizer
a8783a7675 Stabilize iso format unit tests
Some versions of mkisofs do not properly handle the case where both the
input and the output file of the command are the same. So this commit changes
the unit tests depending on that binary to use different files.

Related-Bug: #2059809
Change-Id: I6924eb23ff5804c22a48ec6fabcec25f061906bb
(cherry picked from commit c6d8c6972d52845774b36acb84cd08a4b2e4dcde)
2024-07-11 11:16:10 +02:00
Sean Mooney
ae10fde55b fix qemu-img version dependent tests
While backporting Ia34203f246f0bc574e11476287dfb33fda7954fe,
we observed that several of the tests showed distro-specific
behavior depending on whether qemu was installed in the test env,
what version was installed, and how it was compiled.

This change ensures that if qemu is present, it
supports the required formats; otherwise it skips the test.

Change-Id: I131996cdd7aaf1f52d4caac33b153753ff6db869
(cherry picked from commit cc2514d02e0b0ebaf60a46d02732f7f8facc3191)
2024-07-11 09:27:03 +02:00
Dan Smith
81e3b0850c Stop using split UEC image (mostly)
This reverts us back to using the standard disk image for most of our
tests, which is more representative of how people actually use nova.
This leaves the UEC image on a few jobs for the sake of comparison
data for the time being, and because we should actually test that
code path if we're going to say we support it.

Change-Id: I16ed92d342464325d4bef33c1e22b328bcfbe7d6
(cherry picked from commit eed3e2b47ffea24d08ad7a85a4e9c36ef56d815e)
2024-07-10 10:37:11 +01:00
Sean Mooney
eeda7c333c Add iso file format inspector
This change includes unit tests for the ISO
format inspector using mkisofs to generate
the iso files.

A test for stashing qcow content in the system_area
of an iso file is also included.

This change modifies format_inspector.detect_file_format
to evaluate all inspectors until they are complete and
raise an InvalidDiskInfo exception if multiple formats
match.
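
A minimal sketch of the "evaluate every inspector, reject ambiguous matches" behavior described above. The checker functions, the INSPECTORS table, and the fallback to "raw" are assumptions of this sketch; nova's format_inspector module is structured differently.

```python
class InvalidDiskInfo(Exception):
    """Stand-in for the nova exception of the same name."""


def looks_like_qcow2(data: bytes) -> bool:
    # qcow2 files begin with the magic bytes "QFI\xfb".
    return data[:4] == b"QFI\xfb"


def looks_like_iso(data: bytes) -> bool:
    # ISO 9660 stores "CD001" one byte into the volume descriptor at
    # sector 16 (2048-byte sectors), i.e. at offset 0x8001.
    return data[0x8001:0x8006] == b"CD001"


# Hypothetical inspector table for this sketch.
INSPECTORS = {"qcow2": looks_like_qcow2, "iso": looks_like_iso}


def detect_file_format(data: bytes) -> str:
    """Run every inspector and refuse input that matches more than one."""
    matches = [name for name, check in INSPECTORS.items() if check(data)]
    if len(matches) > 1:
        raise InvalidDiskInfo(
            "multiple formats detected: %s" % ", ".join(matches))
    # Assumption: anything unrecognized is treated as raw.
    return matches[0] if matches else "raw"
```

An ISO whose system area carries a qcow2 header matches both checks here, which is the kind of case the stashing test above is meant to cover.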

Related-Bug: #2059809
Change-Id: I7e12718fb3e1f77eb8d1cfcb9fa64e8ddeb9e712
(cherry picked from commit b1cc39848ebe9b9cb63141a647bda52a2842ee4b)
2024-07-09 10:41:17 +01:00
Sean Mooney
3a6d9a038f Reproduce iso regression with deep format inspection
This change adds a reproducer for the regression in iso
file support when
workarounds.disable_deep_image_inspection = False

Change-Id: I56d8b9980b4871941ba5de91e60a7df6a40106a8
(cherry picked from commit b5a1d3b4b2d0aaa351479b1d7e41a3895c28fab0)
2024-07-09 10:40:52 +01:00
Sean Mooney
66205be426 port format inspector tests from glance
This commit is a direct port of the format inspector
unit tests from glance as of commit
0d8e79b713bc31a78f0f4eac14ee594ca8520999

The only changes to the test are as follows:

"from glance.common import format_inspector" was updated to
"from nova.image import format_inspector"

"from glance.tests import utils as test_utils"
was replaced with "from nova import test"

"test_utils.BaseTestCase" was replaced with "test.NoDBTestCase"

"glance-unittest-formatinspector-" was replaced with
"nova-unittest-formatinspector-"

This makes the test functional in nova.

TestFormatInspectors requires qemu-img to be installed on the
host, which would be a new dependency for executing unit tests.
To avoid that, we skip TestFormatInspectors if qemu-img
is not installed.
TestFormatInspectorInfra and TestFormatInspectorsTargeted
do not have a qemu-img dependency, so
no changes to the test assertions were required.

Change-Id: Ia34203f246f0bc574e11476287dfb33fda7954fe
(cherry picked from commit 838daa3cad5fb3cdd10fb7aa76c647330a66939e)
2024-07-09 10:40:36 +01:00
Zuul
8459a6af90 Merge "retry write_sys call on device busy" into stable/2024.1 2024-07-09 05:41:32 +00:00
Zuul
d7284411fb Merge "add functional repoducer for bug 2065927" into stable/2024.1 2024-07-09 04:08:13 +00:00
Mohammed Naser
1936a408c0 Fix port group network metadata generation
When switching to using OpenStack SDK, a change was missed:
the SDK returns generators instead of
a list, so the loop over ports and port groups exhausted them and
it started returning an empty list afterwards.

Since there is usually not a large number of ports for a baremetal system,
we convert the generator into a list right away to prevent this.
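
The generator pitfall described above can be reproduced in a few lines. The list_ports() and collect() helpers are hypothetical stand-ins, and walking the iterable twice is an assumption about the shape of the original loops.

```python
def list_ports():
    """Hypothetical stand-in for an SDK call that yields ports lazily."""
    yield {"id": "p1", "portgroup_id": "g1"}
    yield {"id": "p2", "portgroup_id": None}


def collect(ports):
    """Iterate over the same iterable twice."""
    grouped = [p["id"] for p in ports if p["portgroup_id"]]
    ungrouped = [p["id"] for p in ports if not p["portgroup_id"]]
    return grouped, ungrouped


# Passing the generator directly: the second pass sees nothing.
broken = collect(list_ports())        # (['p1'], [])
# Materializing it first: both passes see every port.
fixed = collect(list(list_ports()))   # (['p1'], ['p2'])
```

Converting to a list costs memory proportional to the number of ports, which is acceptable when that number is small.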

Closes-Bug: #2071972
Change-Id: I90766f8c225d834bb2eec606754107ea6a212f6d
(cherry picked from commit 8558f59630f81beba2789e6deef2cb5e6b367f20)
2024-07-09 00:26:09 +00:00
Dan Smith
11301e7e3f Fix vmdk_allowed_types checking
This restores the vmdk_allowed_types checking in create_image()
that was unintentionally lost by tightening the
qemu-type-matches-glance code in the fetch patch recently. Since we
are still detecting the format of base images without metadata, we
would have treated a vmdk file that claims to be raw as raw in fetch,
but then read it like a vmdk once it was used as a base image for
something else.

Change-Id: I07b332a7edb814f6a91661651d9d24bfd6651ae7
Related-Bug: #2059809
(cherry picked from commit 08be7b2a0dc1d7728d8034bc2aab0428c4fb642e)
29.1.0
2024-07-04 17:12:23 +02:00
Dan Smith
8a0d5f2afa Additional qemu safety checking on base images
There is an additional way we can be fooled into using a qcow2 file
with a data-file, which is uploading it as raw to glance and then
booting an instance from it. Because when we go to create the
ephemeral disk from a cached base image, we've lost the information
about the original source's format, we probe the image's file type
without a strict format specified. If a qcow2 file is listed in
glance as a raw, we won't notice it until it is too late.

This brings over another piece of code (proposed against) glance's
format inspector which provides a safe format detection routine. This
patch uses that to detect the format of and run a safety check on the
base image each time we go to use it to create an ephemeral disk
image from it.

This also detects QED files and always marks them as unsafe as we do
not support that format at all. Since we could be fooled into
downloading one and passing it to qemu-img if we don't recognize it,
we need to detect and reject it as unsafe.

Change-Id: I4881c8cbceb30c1ff2d2b859c554e0d02043f1f5
(cherry picked from commit b1b88bf001757546fbbea959f4b73cb344407dfb)
2024-07-04 17:11:12 +02:00
Dan Smith
f07fa55fd8 Check images with format_inspector for safety
It has been asserted that we should not be calling qemu-img info
on untrusted files. That means we need to know if they have a
backing_file, data_file or other unsafe configuration *before* we use
qemu-img to probe or convert them.
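
As an illustration of checking a file before handing it to qemu-img, here is a minimal sketch that reads only the fixed qcow2 header fields. It assumes the header layout from the QEMU qcow2 specification (backing file offset at byte 8, incompatible feature bits at byte 72 for version 3, with bit 2 marking an external data file). It is not the glance or nova format_inspector code, and the function name is hypothetical.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
# Assumption from the qcow2 spec: bit 2 of incompatible_features
# marks an external data file.
INCOMPAT_EXTERNAL_DATA_FILE = 1 << 2


def qcow2_is_unsafe(header: bytes) -> bool:
    """Return True if a qcow2 header references a backing or data file.

    Input that is not qcow2 returns False; this sketch checks only qcow2.
    """
    if len(header) < 20 or header[:4] != QCOW2_MAGIC:
        return False
    # Big-endian: version (u32), backing_file_offset (u64),
    # backing_file_size (u32).
    version, backing_offset, _backing_size = struct.unpack(
        ">IQI", header[4:20])
    if backing_offset != 0:
        return True
    if version >= 3 and len(header) >= 80:
        (incompat,) = struct.unpack(">Q", header[72:80])
        return bool(incompat & INCOMPAT_EXTERNAL_DATA_FILE)
    return False
```

Because only the first 80 bytes are parsed, the check can run on untrusted content without invoking qemu-img at all.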

This grafts glance's format_inspector module into nova/images so we
can use it to check the file early for safety. The expectation is that
this will be moved to oslo.utils (or something) later and thus we will
just delete the file from nova and change our import when that happens.

NOTE: This includes whitespace changes from the glance version of
format_inspector.py because of autopep8 demands.

Change-Id: Iaefbe41b4c4bf0cf95d8f621653fdf65062aaa59
Closes-Bug: #2059809
(cherry picked from commit 9cdce715945619fc851ab3f43c97fab4bae4e35a)
2024-07-04 17:10:56 +02:00
Dan Smith
58d933eafb Reject qcow files with data-file attributes
Change-Id: Ic3fa16f55acc38cf6c1a4ac1dce4487225e66d04
Closes-Bug: #2059809
(cherry picked from commit ec9c55cbbc91d1f31e42ced289a7c82cf79dc2a2)
2024-07-04 17:10:34 +02:00
Dan Smith
62b3192c70 Fix disk_formats in ceph job tempest config
Tempest currently defaults to disk_formats[0] for images it creates,
which is 'ami'. However, it's actually using a qcow2 disk image by
default, which means we're lying to glance when we create those.

Change-Id: I737e9aa51c268a387f1eed24cf717618d057d747
(cherry picked from commit c0ff2386ed15b531c44283909b9619e43d1ed66a)
2024-07-02 18:54:59 -07:00
Sean Mooney
1581f6695f retry write_sys call on device busy
This change adds a retry_if_busy decorator
to the read_sys and write_sys functions in the filesystem
module that will retry reads and writes up to 5 times with
a linear backoff.

This allows nova to tolerate short periods of time where
sysfs returns device busy. If the retries are exhausted
and offlining a core fails, a warning is logged and the failure is
ignored. Onlining a core is always treated as a hard error if
retries are exhausted.
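
A minimal sketch of a retry decorator with linear backoff, in the spirit of the one described above. The constants, the choice to count five attempts in total, and the decision to retry only on EBUSY are assumptions of this sketch and may not match nova's retry_if_busy.

```python
import errno
import functools
import time

MAX_ATTEMPTS = 5   # assumption: five attempts in total
BASE_DELAY = 0.01  # assumption: seconds, multiplied by the attempt number


def retry_if_busy(func):
    """Retry func when it raises OSError(EBUSY), backing off linearly."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                return func(*args, **kwargs)
            except OSError as exc:
                # Re-raise anything that is not "device busy", and give
                # up once the last attempt has failed.
                if exc.errno != errno.EBUSY or attempt == MAX_ATTEMPTS:
                    raise
                time.sleep(BASE_DELAY * attempt)  # linear backoff
    return wrapper
```

The caller still decides what an exhausted retry means: per the message above, a failed offline is logged and ignored, while a failed online is a hard error.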

Closes-Bug: #2065927
Change-Id: I2a6a9f243cb403167620405e167a8dd2bbf3fa79
(cherry picked from commit 44c1b48b3121682cf959c90b3adaf2a3f92e318c)
2024-06-27 22:08:21 +00:00
Sean Mooney
f1c46802b1 add functional repoducer for bug 2065927
Today if the write sys call to offline a CPU when
deleting an instance fails due to an OSError or ValueError,
the instance delete fails and the instance goes to error.

As reported in bug #2065927, this can happen as a result of
OSError: [Errno 16] Device or resource busy if the VM is
deleted shortly after it's started.

Related-Bug: #2065927
Change-Id: I1352a3a1e28cfe14ec8f32042ed35cb25e70338e
(cherry picked from commit ee581a5c9d1c0b7c0d8830a08f55fe8bc2fbcd0f)
2024-06-27 14:25:58 +00:00
Zuul
e05b2a0ea3 Merge "[ironic] Fix rebooting instance" into stable/2024.1 29.0.2 2024-05-14 09:52:32 +00:00
Balazs Gibizer
994358d582 Reject AZ changes during aggregate add / remove host
After this patch nova rejects the add host to aggregate API action
if the host has instances and the new aggregate for the host would
mean that these instances need to move from one AZ (even from the
default one) to another. Such an AZ change is not implemented in nova
and currently leads to stuck instances.

Similarly nova will reject remove host from aggregate API action if the
host has instances and the aggregate removal would mean that the
instances need to change AZ.

Depends-On: https://review.opendev.org/c/openstack/tempest/+/821732

Change-Id: I19c4c6d34aa2cc1f32d81e8c1a52762fa3a18580
Closes-Bug: #1907775
(cherry picked from commit 3c0eadae0b9ec48586087ea6c0c4e9176f0aa3bc)
2024-05-10 09:36:37 +00:00
Vasyl Saienko
7634cfa09e [ironic] Fix rebooting instance
The correct state for hard and soft reboots is rebooting [0]

[0] https://github.com/openstack/openstacksdk/blob/master/openstack/baremetal/v1/node.py#L44

Closes-Bug: #2064826
Change-Id: I18e0352b3638872e85ce91a3cfcbbfddc812ab67
(cherry picked from commit 0e766885f65068559f6680116db6dad898ad5b8c)
2024-05-08 05:14:23 +00:00
Zuul
c33a8ae6b3 Merge "Fix nova-manage image_property show unexpected keyword" into stable/2024.1 2024-04-10 11:03:44 +00:00
Simon Hensel
ebd45460f7 Always delete NVRAM files when deleting instances
When deleting an instance, always send VIR_DOMAIN_UNDEFINE_NVRAM to
delete the NVRAM file, regardless of whether the image is of type UEFI.
This prevents a bug when rebuilding an instance from a UEFI image to a
non-UEFI image.

Closes-Bug: #1997352

Change-Id: I24648f5b7895bf5d093f222b6c6e364becbb531f
Signed-off-by: Simon Hensel <simon.hensel@inovex.de>
(cherry picked from commit 406d590a364d2c3ebc91e5f28f94011b158459d2)
2024-04-04 11:07:58 +02:00
Robert Breker
fc4b592d55 Fix nova-manage image_property show unexpected keyword
Reproduction steps:
1. Execute: nova-manage image_property show <vm_uuid> \
                            hw_vif_multiqueue_enabled
2. Observe:
  An error has occurred:
  Traceback (most recent call last):
    File "/var/lib/kolla/venv/lib/python3.9/
          site-packages/nova/cmd/manage.py", line 3394, in main
      ret = fn(*fn_args, **fn_kwargs)
  TypeError: show() got an unexpected keyword argument 'property'

Change-Id: I1349b880934ad9f44a943cf7de324d7338619d2e
Closes-Bug: #2016346
(cherry picked from commit 1c02c0da1702ab1f58e930782a8866ed683c3c7d)
2024-03-29 18:55:02 +00:00
ae003e7a8d [stable-only] Update TOX_CONSTRAINTS_FILE for stable/2024.1
Update the URL to the upper-constraints file to point to the redirect
rule on releases.openstack.org so that anyone working on this branch
will switch to the correct upper-constraints list automatically when
the requirements repository branches.

Until the requirements repository has a stable/2024.1 branch, tests will
continue to use the upper-constraints list on master.

Change-Id: I29cf77fe03c2456c380ddd0e980c41e7cc415e22
2024-03-19 15:30:51 +00:00
8b9063bc99 [stable-only] Update .gitreview for stable/2024.1
Change-Id: I2fc8e1e13ef64c0cc1e430913dac1a92482d591e
2024-03-19 15:30:13 +00:00
Zuul
77de002d61 Merge "Add a Caracal prelude section" 29.0.0.0rc1 29.0.0 29.0.1 2024-03-18 19:46:09 +00:00
Sylvain Bauza
1ddfda5b11 Add a Caracal prelude section
Shamelessly copied from the cycle highlights.

Change-Id: I6fd5ce392ee07700600ccae8916cd4e6b524cbc3
2024-03-18 19:59:41 +01:00
Zuul
3e358bc37c Merge "vgpu: Allow device_addresses to not be set" 2024-03-18 16:58:28 +00:00
Zuul
e255323f46 Merge "libvirt: Cap with max_instances GPU types" 2024-03-18 12:31:30 +00:00
Zuul
8f3976d4cc Merge "Update python classifier in setup.cfg" 2024-03-15 10:01:27 +00:00