Add a check for the case where quiesce fails because libvirt cannot
connect to the QEMU guest agent inside the instance.
Closes-Bug: #1980720
Change-Id: I134a4060ace2678f76ae3606bf117c07194a8d92
This adds a microversion and API support for triggering a rebuild
of volume-backed instances by leveraging cinder functionality to
do so.
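For example, once a client opts in to the new microversion (the
version number shown here is illustrative, not confirmed by this
change):
  $ openstack --os-compute-api-version 2.93 \
      server rebuild --image <image> <volume-backed-server>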
Implements: blueprint volume-backed-server-rebuild
Closes-Bug: #1482040
Co-Authored-By: Rajat Dhasmana <rajatdhasmana@gmail.com>
Change-Id: I211ad6b8aa7856eb94bfd40e4fdb7376a7f5c358
Allow instances to be created with VNIC_TYPE_REMOTE_MANAGED ports.
Those ports are assumed to require remote-managed PCI devices which
means that operators need to tag those as "remote_managed" in the PCI
whitelist if this is the case (there is no meta information or standard
means of querying this information).
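A hedged example of such a whitelist entry in nova.conf (vendor and
product IDs are placeholders):
  [pci]
  passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "remote_managed": "true"}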
The following changes are introduced:
* Handling for VNIC_TYPE_REMOTE_MANAGED ports during allocation of
resources for instance creation (remote_managed == true in
InstancePciRequests);
* Usage of the noop os-vif plugin for VNIC_TYPE_REMOTE_MANAGED ports
in order to avoid the invocation of the local representor plugging
logic since a networking backend is responsible for that in this
case;
* Expectation of bind time events for ports of VNIC_TYPE_REMOTE_MANAGED.
Events for those arrive early from Neutron after a port update (before
Nova begins to wait in the virt driver code), so Nova is set to avoid
waiting for plug events for VNIC_TYPE_REMOTE_MANAGED ports;
* Making sure the service version is high enough on all compute services
before creating instances with ports that have VNIC type
VNIC_TYPE_REMOTE_MANAGED. Network requests are examined for the presence
of port ids to determine the VNIC type via Neutron API. If
remote-managed ports are requested, a compute service version check
is performed across all cells.
Change-Id: Ica09376951d49bc60ce6e33147477e4fa38b9482
Implements: blueprint integration-with-off-path-network-backends
For some reason, we have two lineages of quota-related exceptions in
Nova. We have QuotaError (which sounds like an actual error), from
which all of our case-specific "over quota" exceptions inherit, such
as KeypairLimitExceeded, etc. In contrast, we have OverQuota which
lives outside that hierarchy and is unrelated. In a number of places,
we raise one and translate to the other, or raise the generic
QuotaError to signal an overquota situation, instead of OverQuota.
This leads to places where we have to catch both, signaling the same
over quota situation, but looking like there could be two different
causes (i.e. an error and being over quota).
This joins the two cases, by putting OverQuota at the top of the
hierarchy of specific exceptions and removing QuotaError. The latter
was only used in a few situations, so this isn't actually much change.
Cleaning this up will help with the unified limits work, reducing the
number of potential exceptions that mean the same thing.
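A minimal sketch of the resulting hierarchy (illustrative, not the
verbatim diff):
  class OverQuota(NovaException):
      msg_fmt = _("Quota exceeded for resources: %(overs)s")

  class KeypairLimitExceeded(OverQuota):
      msg_fmt = _("Maximum number of key pairs exceeded")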
Related to blueprint bp/unified-limits-nova
Change-Id: I17a3e20b8be98f9fb1a04b91fcf1237d67165871
Virtually all of the code for parsing 'hw:'-prefixed extra specs and
'hw_'-prefix image metadata properties lives in the 'nova.virt.hardware'
module. It makes sense for these to be included there. Do that.
Change-Id: I1fabdf1827af597f9e5fdb40d5aef244024dd015
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Specifying a duplicate port ID is currently "allowed" but results in an
integrity error when nova attempts to create a duplicate
'VirtualInterface' entry. Start rejecting these requests by checking for
duplicate IDs and rejecting offending requests. This is arguably an API
change because no HTTP 5xx error is currently returned (server create
is an async operation); however, users shouldn't have to opt in to
non-broken
behavior and the underlying instance was never actually created
previously, meaning automation that relied on this "feature" was always
going to fail in a later step. We're also silently failing to do what
the user asked (per flow chart at [1]).
[1] https://docs.openstack.org/nova/latest/contributor/microversions.html#when-do-i-need-a-new-microversion
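The check itself amounts to looking for repeated IDs among the
requested networks; a sketch of the idea (names hypothetical):
  # Illustrative; variable names are hypothetical.
  port_ids = [net.port_id for net in requested_networks if net.port_id]
  if len(port_ids) != len(set(port_ids)):
      raise webob.exc.HTTPBadRequest(
          explanation=_('Duplicate port IDs are not allowed'))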
Change-Id: Ie90fb83662dd06e7188f042fc6340596f93c5ef9
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1821088
There is an inconsistency in the return code the nova API returns
for "Feature not supported/implemented". The current return
codes are 400, 409, and 403.
- 400 case: Example: Multiattach Swap Volume Not Supported
- 403 case: Cyborg integration
- 409 case: Examples: Operation Not Supported For SEV,
  Operation Not Supported For VTPM
In xena PTG, we agreed to fix this by returning 400 in all cases
- L446: https://etherpad.opendev.org/p/nova-xena-ptg
This commit converts all feature-not-supported errors to
HTTPBadRequest (400).
To avoid converting every NotSupported-derived exception in each API
controller to HTTPBadRequest individually, a generic conversion is
added to the expected_errors() decorator.
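A sketch of the generic conversion inside the decorator (hedged, not
the verbatim implementation):
  # Inside the decorator's wrapper; surrounding logic is elided.
  try:
      return f(*args, **kwargs)
  except exception.NotSupported as exc:
      # Any feature-not-supported error becomes a 400.
      raise webob.exc.HTTPBadRequest(
          explanation=exc.format_message())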
Closes-Bug: #1938093
Change-Id: I410924668a73785f1bfe5c79827915d72e1d9e03
Nova re-generates the resource request of an instance for each server
move operation (migrate, resize, evacuate, live-migrate, unshelve) to
find (or validate) a target host for the instance move. This patch
extends this logic to support the extended resource request from
neutron.
As the changed neutron interface code is called from the nova-compute
service during port binding, the compute service version is bumped.
A check is also added to the compute-api to reject move operations
involving ports with an extended resource request if there are old
computes in the cluster.
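A hedged sketch of the shape of that check (helper and exception names
are hypothetical):
  if any(port_has_extended_resource_request(port)
         for port in requested_ports):
      # Reject the move if any compute is older than the bumped
      # service version.
      if not self._all_computes_support_extended_rr(context):
          raise exception.ExtendedResourceRequestOldCompute()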
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: Ibcf703e254e720b9a6de17527325758676628d48
This adds the final missing pieces to support creating servers with
ports that have an extended resource request. As the changed neutron
interface code is called from the nova-compute service during port
binding, the compute service version is bumped. A check is also added
to the compute-api to reject such server create requests if there are
old computes in the cluster.
Note that some of the negative and SRIOV-related interface attach
tests also start to pass, as they do not depend on any
interface-attach-specific implementation. Still, interface attach is
broken here, as the failing positive tests show.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: I9060cc9cb9e0d5de641ade78c5fd7e1cc77ade46
As a precaution, reject all server lifecycle operations that do not
yet support the port-resource-request-groups API extension. These
are:
* resize
* migrate
* live migrate
* evacuate
* unshelve after shelve offload
* interface attach
This rejection will be removed in the patch that adds support for the
given operation.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: I12c25550b08be6854b71ed3ad4c411a244a6c813
To prepare for the unlikely event that Neutron merges the
port-resource-request-groups neutron API extension and an operator
enables it before nova adds support for it, this patch rejects server
creation if such an extension is enabled in Neutron. Enabling that
extension has zero benefit without nova support, hence the harsh but
simple rejection.
A subsequent patch will reject server lifecycle operations in a more
sophisticated way and as soon as we support some operations, like
boot, the deployer might rightfully choose to enable the Neutron
extension.
Change-Id: I2c55d9da13a570efbc1c862116cea31aaa6aa02e
blueprint: qos-minimum-guaranteed-packet-rate
A base is something that is inherited by other things. If everything
is a base, nothing is. This is just noise, so remove it.
Change-Id: I9e2d3c37650465d0748852f8cdb82fbeba7b3f4c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Servers with an ARQ in a port do not support move and suspend;
reject these operations at the API stage:
- resize
- shelve
- live_migrate
- evacuate
- suspend
- attach/detach a smartnic port
Reject creating a server with a smartnic in a port if the minimum
compute service version is less than 57.
Reject creating a server with a port whose device profile is malformed
and requests multiple devices, for example:
{
    "resources:CUSTOM_ACCELERATOR_FPGA": "2",
    "trait:CUSTOM_INTEL_PAC_ARRIA10": "required"
}
Implements: blueprint sriov-smartnic-support
Change-Id: Ia705a0341fb067e746a3b91ec4fc6d149bcaffb8
If a user requests an invalid volume UUID when creating an instance,
a 'VolumeNotFound' exception will be raised. This is not currently
handled. Correct this.
Change-Id: I6137dc1b6b51321fee1c080bf4b85197b19bf223
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1930448
Users can create a server like so:
$ openstack server create --availability-zone az:host ...
This is a historical way to request that an instance be scheduled to a
specific host and it causes the scheduler to be bypassed. However, no
validation of this availability zone-host combo takes place. The host
could in fact belong to a different availability zone. If it does, we'll
end up in a very odd situation whereby the RequestSpec record for the
instance will record the availability zone requested by the user at
create time, but the Instance record itself will record the availability
zone of the host on which the instance was scheduled. This leads to even
more confusing behavior when we attempt to do something like live
migrate the instance since the RequestSpec record, with its original and
possibly invalid availability zone information, is used. The
'AvailabilityZoneFilter' will fail with an error message like the following:
Availability Zone 'foo' requested. ... has AZs: bar
but the 'openstack server list --long' command will show a non-foo value
for the availability zone column.
The solution is simple: when given an availability zone-host combo, make
sure the availability zone requested matches that of the host (or, more
specifically, the host is a member of the host aggregates that form the
availability zone [1]). If not, simply ignore the requested availability
zone information in favour of using the availability zone of the host,
logging a warning just for record keeping purposes. This is deemed
preferable to failing with HTTP 400 (Bad Request), since what users
are really requesting with this syntax is to schedule to a specific
host: the availability zone portion of the request is really
irrelevant, just an artifact of this legacy mechanism for requesting
hosts. If users wish to
truly validate a host-availability zone combo, they can use the 'host'
field introduced in microversion 2.74 along with the 'availability_zone'
field:
$ openstack server create --availability-zone az --host host ...
[1] https://docs.openstack.org/nova/latest/admin/aggregates.html
Change-Id: Iac0e634e66cd4e150a50935cf635f626fc11b70e
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1934770
This is significantly shorter since we're mostly dealing with local
variables.
Change-Id: Ib5456a8b02fe69592a6cc59ee77ea32386ce17a5
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
There are a number of operations that are known not to work with vDPA
interfaces and another few that may work but haven't been tested. Start
blocking these. In all cases where an operation is blocked, an HTTP 409
(Conflict) is returned. This will allow lifecycle operations to be
enabled as they are tested or bugs are addressed.
Change-Id: I7f3cbc57a374b2f271018a2f6ef33ef579798db8
Blueprint: libvirt-vdpa-support
Add microversion 2.90, which allows users to configure the
hostname that will be exposed via the nova metadata service when
creating their instance.
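For example (assuming a client recent enough to expose the option):
  $ openstack --os-compute-api-version 2.90 \
      server create --hostname custom-host ...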
Change-Id: I95047c1689ac14fa73eba48e19dc438988b78aad
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Replace six.text_type with str.
A subsequent patch will replace the remaining six.text_type usages.
Change-Id: I23bb9e539d08f5c6202909054c2dd49b6c7a7a0e
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
To support move operations with qos ports both the source and the
destination compute hosts need to be on Ussuri level. We have service
level checks implemented in Ussuri. In Victoria we could remove those
checks as nova only supports compatibility between N and N-1 computes.
But we kept them there just for extra safety. In the meanwhile we
codified [1] the rule that nova does not support N-2 computes any
more. So in Wallaby we can assume that the oldest compute is already
on Victoria (Ussuri would be enough too).
So this patch removes the unnecessary service level checks and related
test cases.
[1] Ie15ec8299ae52ae8f5334d591ed3944e9585cf71
Change-Id: I14177e35b9d6d27d49e092604bf0f288cd05f57e
When 'os_compute_api:servers:create:forced_host' is configured with
'rule:admin_or_owner', the owner is still not allowed to use it.
In nova/api/openstack/compute/servers.py#L669, the target is
set to '{}', which is not equal to None, so the default target will
not be set in nova/policy.py#L205.
This patch configures the target param.
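A minimal sketch of the fix (illustrative):
  # Pass an explicit target (the project the server is created in)
  # instead of '{}', so policy.py does not skip the target handling.
  context.can(
      server_policies.SERVERS % 'create:forced_host',
      target={'project_id': context.project_id})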
Change-Id: I7a563386bd2f5d1930b5eb2cfc00425a19747e24
Closes-Bug: #1894975
When using emulated TPM, libvirt will store the persistent TPM data
under '/var/lib/libvirt/swtpm/<instance_uuid>' which is owned by the
"tss" or "root" user depending how libvirt is configured (the parent
directory, '/var/lib/libvirt/swtpm' is always owned by root). When doing
a resize or a cold migration between nodes, this data needs to be copied
to the other node to ensure that the TPM data is not lost. Libvirt
won't do this automatically for us since cold migrations, or offline
migrations in libvirt lingo, do not currently support "copying
non-shared storage or other file based storages", which includes the
vTPM device [1].
To complicate things further, even if migration/resize is supported,
only the user that nova-compute runs as is guaranteed to be able to have
SSH keys set up for passwordless access, and it's only guaranteed to be
able to copy files to the instance directory on the dest node.
The solution is to have nova (via privsep) copy the TPM files into the
local instance directory on the source and change their ownership. This
is handled through an additional call in 'migrate_disk_and_power_off'.
Running as itself, nova then copies them into the instance directory on
the dest. Nova then (once again, via privsep) changes the ownership
back and moves the files to where libvirt expects to find them. This
second step is handled by 'finish_migration'. Confirming the resize
will result in the original TPM data at '/var/lib/libvirt/swtpm' being
deleted by libvirt and the copied TPM data in the instance directory
being cleaned up by nova (via 'confirm_migration'), while reverting it
will result in the same happening on the destination host.
Part of blueprint add-emulated-virtual-tpm
[1] https://libvirt.org/migration.html#offline
Change-Id: I9b053919bb499c308912c8c9bff4c1fc396c1193
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
Co-authored-by: Stephen Finucane <stephenfin@redhat.com>
When _poll_unconfirmed_resizes runs or a user tries to confirm
a resize in the API, if the source compute service is down the
migration will be stuck in "confirming" status if the request never
reached the source compute. Subsequent runs of
_poll_unconfirmed_resizes will not be able to auto-confirm the
resize nor will the user be able to manually confirm the resize.
An admin could reset the status on the server to ACTIVE or ERROR
but that means the source compute never gets cleaned up since you
can only confirm or revert a resize on a server with VERIFY_RESIZE
status.
This adds a check in the API before updating the migration record
such that if the source compute service is down the API returns a
409 response as an indication to try again later.
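A sketch of the API-side guard (wiring simplified, names hedged):
  # If the source compute service is down, bail out with a 409 so the
  # user can retry the confirmation later.
  if not self.servicegroup_api.service_is_up(source_compute_service):
      raise webob.exc.HTTPConflict(
          explanation=_('Source compute service is down; '
                        'retry the confirm later'))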
SingleCellSimple._fake_target_cell is updated so that tests using
it can assert when a context was targeted without having to stub
nova.context.target_cell. As a result some HostManager unit tests
needed to be updated.
Change-Id: I33aa5e32cb321e5a16da51e227af2f67ed9e6713
Closes-Bug: #1855927
We're going to gradually introduce support for the various instance
operations when using vTPM due to the complications of having to worry
about the state of the vTPM device on the host. Add in API checks to
reject all manner of requests until we get to include support for each
one. With this change, the upcoming patch to turn everything on will
allow a user to create, delete and reboot an instance with vTPM, while
evacuate, rebuild, cold migration, live migration, resize, rescue and
shelve will not be supported immediately.
While we're here, we rename two unit test files so that their names
match the files they are testing and one doesn't have to spend time
finding them.
Change-Id: I3862a06ca28b383d525bcc9dcbc6fb1d4062f193
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
When rebuilding, we should only allow detaching a volume with
'in-use' status; a volume in a status such as 'retyping' should not
be allowed.
Change-Id: I7f93cfd18f948134c9cb429dea55740d2cf97994
Closes-Bug: #1489304
Previously disk_bus values were never validated and could easily end up
being ignored by the underlying virt driver and hypervisor.
For example, a common mistake made by users is to request a virtio-scsi
disk_bus when using the libvirt virt driver. This however isn't a valid
bus and is ignored, defaulting back to the virtio (virtio-blk) bus.
This change adds a simple validation in the compute API using the
potential disk_bus values provided by the DiskBus field class as used
when validating the hw_*_bus image properties.
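A minimal sketch of that check (the exception used here is
illustrative):
  from nova.objects import fields

  if disk_bus and disk_bus not in fields.DiskBus.ALL:
      # Exception choice is illustrative.
      raise exception.InvalidInput(
          reason='disk_bus %s is not a valid bus' % disk_bus)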
Closes-Bug: #1876301
Change-Id: I77b28b9cc8f99b159f628f4655d85ff305a71db8
Enable the 'hw:cpu_dedicated_mask' flavor extra spec interface; users
can create a mixed-CPU instance through a flavor with the following
extra spec settings:
  $ openstack flavor set <flavor_id> \
      --property hw:cpu_policy=mixed \
      --property hw:cpu_dedicated_mask=0-3,7
In a topic coming later, we'll introduce another way to create a
mixed instance through the real-time interface.
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: I2a3311c08a52eb11859c68ef940a0bd755a94c6b
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Bump nova-compute service version, announce the support of
'mixed' instance and then check the service version for all
nova-compute nodes in API layer.
The nova-compute nodes in the cluster need to be checked to ensure
they support the 'mixed' instance CPU allocation policy when we:
- Create a brand-new instance
- Resize to a mixed instance from a dedicated or shared instance.
We don't support rebuilding an instance in a way that changes the NUMA
topology, and changing the CPU policy definitely means changing the
NUMA topology, so the nova-compute service version is not checked when
rebuilding.
It is also not necessary to check the service version when shelving or
unshelving an instance: the instance CPU policy cannot change in this
process, and all compute node services have already been checked
before shelving a mixed instance, so there is no need to check again.
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: I59298788f26ca8f32bf3e38f3a52f72ff63fcc8b
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Introduce a 'mixed' instance CPU allocation policy, to be built upon
by upcoming patches, for the purpose of creating an instance that
combines shared CPUs with dedicated or realtime CPUs.
In an instance mixed from different types of CPUs, a shared CPU shares
CPU time slots with other instances, and may also be a CPU with fewer
or un-guaranteed hardware resources, which implies no guarantee for
the behavior of the workload running on it. If we call the shared CPU
a 'low priority' CPU, then the realtime or dedicated CPU could be
called a 'high priority' CPU: users can assign more hardware CPU
resources or place some guaranteed resources on it to let the workload
attain high performance or stable service quality.
Based on https://review.opendev.org/714704
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: I99cfee14bb105a8792651129426c0c5a3749796d
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
This pulls in a new version of pycodestyle, which has an improved
parser that catches new corner cases and introduces some new checks.
Closes-Bug: #1878317
Change-Id: I853cf4dbd7ad6b7903a7f444f5c1da3f0fb21f77
The servers API policy defaults to admin_or_owner [1], but the API is
in fact allowed for everyone.
We can see that a test trying with another project's context can
access the API:
- https://review.opendev.org/#/c/717204
This is because the API does not pass the server's project_id in the
policy target [2], and if no target is passed then policy.py adds the
default target, which is nothing but context.project_id (allowing
everyone who tries to access) [3].
This commit fixes the policy by passing the server's project_id in the
policy target.
Closes-bug: #1871665
Partially implements blueprint policy-defaults-refresh
[1] cd16ae25c8/nova/policies/servers.py (L285)
[2] cd16ae25c8/nova/api/openstack/compute/servers.py (L872)
[3] c16315165c/nova/policy.py (L191)
Change-Id: Ia8234fd9f4ee1871d6f225c8bd4e4adc5289d605
The block is applied to primary operations, such as pause
or shelve, but not to their reverse operations, like
unpause or unshelve, because that is not necessary.
Added functional tests for various instance operations,
including those that work and those that fail.
Rebuild functional test passes.
Change-Id: I016bc1812404ce1019c71b7a3363f34acc3f8aed
Blueprint: nova-cyborg-interaction
Find the name of the device profile, if any, in flavor extra specs.
Get its profile groups (equivalent to flavor request groups) from Cyborg.
Parse/validate them similar to extra_specs.
Generate RequestGroup objects and add them to the request spec
(in requested_resources field, following precedent).
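Roughly, the flow looks like this (the Cyborg client helper and the
conversion function are hypothetical names):
  profile_name = flavor.extra_specs.get('accel:device_profile')
  if profile_name:
      groups = cyborg_client.get_device_profile_groups(profile_name)
      # Convert each profile group into a RequestGroup and attach it
      # to the request spec.
      request_spec.requested_resources.extend(
          to_request_group(group) for group in groups)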
Change-Id: Icd2ee9024dd4af0a7eb105eca14df8e458e9de77
Blueprint: nova-cyborg-interaction
Microversion bump to allow non-admin users to use more filter keys
when listing instances.
In order to stay coherent, all existing instance filters that relate
to a field readable by default by non-admin users when showing
instance details should be allowed by default, without policy
modification.
Implements: blueprint non-admin-filter-instance-by-az
Change-Id: Ia66d3a1ceb74ed521cf44922929b2a502f3ee935
When resizing a non-volume-backed instance, we call the
'_validate_flavor_image_nostatus' function to do a myriad of checks with
the aim of ensuring the flavor and image don't conflict. One of these
checks tests whether the flavor is requesting a smaller local disk
than the size of the image or the minimum size the image says it
requires. If this check fails, it will raise the
'FlavorDiskSmallerThanImage' or 'FlavorDiskSmallerThanMinDisk'
exception, respectively. We currently
handle this exception in the 'create' and 'rebuild' flows but do not in
the 'resize' path. Correct this by way of adding this exception to
'INVALID_FLAVOR_IMAGE_EXCEPTIONS', a list of exceptions that can be
raised when a flavor and image conflict.
The fix for this issue also highlights another exception that can be
raised in the three code paths but is not handled by them all,
'FlavorMemoryTooSmall'. This is added to
'INVALID_FLAVOR_IMAGE_EXCEPTIONS' also.
Change-Id: Idc82ed3bcfc37220a50d9e2d552be5ab8844374a
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1867077
Currently, when attempting to rebuild an instance with an image with
an invalid architecture, the API raises a 500 error and leaves the
instance stuck in the REBUILDING task state. This patch adds a check
of the image's architecture before updating the instance's task_state,
catching exception.InvalidArchitectureName and returning
HTTPBadRequest.
Change-Id: I25eff0271c856a8d3e83867b448e1dec6f6732ab
Closes-Bug: #1861749
This doesn't exist for 'nova.volume' and no longer exists for
'nova.network'. There's only one image backend we support, so do like
we've done elsewhere and just use 'nova.image.glance'.
Change-Id: I7ca7d8a92dfbc7c8d0ee2f9e660eabaa7e220e2a
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Nova has never supported direct booting of an image of an encrypted
volume uploaded to Glance via the Cinder upload-volume-to-image
process, but instead of rejecting such a request, an 'active' but
unusable instance is created. This patch allows Nova to use image
metadata to detect such an image and reject the boot request.
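A minimal sketch of the detection (the image property is the one
Cinder sets during upload-volume-to-image; the surrounding code is
illustrative):
  # 'cinder_encryption_key_id' is set by Cinder on images created
  # from encrypted volumes.
  props = image_meta.get('properties', {})
  if props.get('cinder_encryption_key_id'):
      raise exception.ImageUnacceptable(
          image_id=image_meta['id'],
          reason=_('Direct booting of an image uploaded from an '
                   'encrypted volume is unsupported'))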
Change-Id: Idf84ccff254d26fa13473fe9741ddac21cbcf321
Related-bug: #1852106
Closes-bug: #1863611