Commit Graph

2365 Commits (19.0.0)

Author SHA1 Message Date
Matt Riedemann 33cab70e1c Add functional test to delete a server while in VERIFY_RESIZE
This scenario came up while discussing what might be causing
leaked resource allocations in Placement [1].

I fully expected the new test to fail because I couldn't see
from the compute manager where the migration-based allocations
would be cleaned up when deleting the server. It turns out that
when deleting a VERIFY_RESIZE server, the API confirms the resize
which drops the migration-based allocations on the source node
before deleting the server on the target node.

Since this is not obvious, a comment is added to the compute API
_confirm_resize_on_deleting() method.


Change-Id: I4c12502c86c7ac27369d119e0f97768cf41695b5
4 years ago
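The behaviour this commit describes can be sketched roughly as follows. This is an illustrative simplification only, not the actual compute API code; the function and field names are made up for the example.

```python
# Sketch: deleting a server in VERIFY_RESIZE first confirms the resize
# (dropping the migration-held allocations on the source node), then
# deletes the server (dropping its own allocations on the target node).

def delete_server(server, allocations):
    """Delete a server, confirming a pending resize first if needed."""
    if server["status"] == "VERIFY_RESIZE":
        # Confirming the resize is what removes the allocations held by
        # the migration record against the source compute node.
        allocations.pop(server["migration_uuid"], None)
        server["status"] = "ACTIVE"
    # Deleting the server removes its own allocations on the target node.
    allocations.pop(server["uuid"], None)
    server["status"] = "DELETED"
```

Run against a fake allocations table, both the migration-held and the server-held allocations end up removed, which mirrors what the functional test verifies against Placement.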
Zuul a5e3054e1d Merge "Share snapshot image membership with instance owner" 4 years ago
melanie witt a7de4917a0 Populate InstanceMapping.user_id during migrations and schedules
The InstanceMapping user_id field is a new, non-nullable field
representing the user_id for the instance.

When new instance create requests come in, we create the instance
mapping. We will set user_id here before creating the record.

The virtual interface online data migration and the map_instances routine
create InstanceMapping records, and since the user_id field did not
previously exist, they were not setting it. We will populate user_id in
these cases.

Finally, whenever an API does a compute_api.get(), we can
opportunistically set and save user_id on the instance mapping if it is
not set.

Part of blueprint count-quota-usage-from-placement

Change-Id: Ic4bb7b49b90a3d6d7ce6c6c62d87836f96309f06
4 years ago
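The opportunistic backfill described above can be sketched as below. This is not the real nova code; plain dicts stand in for the InstanceMapping and Instance objects, and the return value stands in for the decision to call save().

```python
# Sketch: if an InstanceMapping predates the user_id field, copy the
# value from the instance and signal that a save is needed.

def backfill_user_id(mapping, instance):
    """Return True if the mapping was updated and needs saving."""
    if mapping.get("user_id") is not None:
        return False  # already populated; nothing to do
    mapping["user_id"] = instance["user_id"]
    return True
```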
Zuul 6304bcf781 Merge "Validate PCI aliases early in resize" 4 years ago
Zuul 14c4c8040c Merge "Flavor extra spec and image properties validation from API" 4 years ago
Chris Friesen fb908e154d Validate PCI aliases early in resize
Add an early check to validate any PCI aliases in the requested flavor
during a resize. This should ensure the user gets a useful error message.

blueprint: flavor-extra-spec-image-property-validation
Change-Id: I25454cd408e08589e5cfd6107dcbadd15bbb405f
Signed-off-by: Chris Friesen <>
4 years ago
Chris Friesen 5e7b840e48 Flavor extra spec and image properties validation from API
Validate the combination of the flavor extra-specs and image properties
as early as possible once they're both known (since you can specify
mutually-incompatible changes in the two places). If validation fails,
an error is synchronously returned to the user. We need to do this
anywhere the flavor or image changes, so basically instance creation,
rebuild, and resize.

- Rename _check_requested_image() to _validate_flavor_image() and add
  a call from the resize code path.  (It's already called for create
  and rebuild.)
- In _validate_flavor_image() add new checks to validate NUMA-related
  options from the flavor and image, including CPU policy, CPU thread
  policy, CPU topology, memory topology, hugepages, CPU pinning,
  serial ports, realtime mask, etc.
- Add new 400 exceptions in the Server API corresponding to the added
  checks.

blueprint: flavor-extra-spec-image-property-validation
Change-Id: I06fad233006c7bab14749a51ffa226c3801f951b
Signed-off-by: Jack Ding <>
Signed-off-by: Chris Friesen <>
4 years ago
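The kind of cross-validation this commit adds can be sketched with a single property, the CPU policy. The `hw:cpu_policy` / `hw_cpu_policy` names follow the usual flavor/image conventions, but this is a simplified illustration, not the actual _validate_flavor_image() code.

```python
class InvalidRequest(Exception):
    """Stands in for the API's 400 Bad Request response."""

def validate_cpu_policy(flavor_extra_specs, image_props):
    """Reject mutually-incompatible CPU policy settings early."""
    flavor_policy = flavor_extra_specs.get("hw:cpu_policy")
    image_policy = image_props.get("hw_cpu_policy")
    if flavor_policy and image_policy and flavor_policy != image_policy:
        # Conflicting settings in the two places: fail the request
        # synchronously in the API rather than later on the compute host.
        raise InvalidRequest(
            "Image cpu policy %r conflicts with flavor cpu policy %r"
            % (image_policy, flavor_policy))
    return image_policy or flavor_policy
```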
Andrey Volkov 8e19ef4173 Check hosts have no instances for AZ rename
The update aggregate and update aggregate metadata API calls have the
ability to update the availability zone name for the aggregate. If the
aggregate is not empty (has hosts with instances on them), the update
leads to a discrepancy for objects that save the availability zone as a
string rather than a reference.

From the devstack DB, those objects are:
- cinder.backups.availability_zone
- cinder.consistencygroups.availability_zone
- cinder.groups.availability_zone
- cinder.volumes.availability_zone
- neutron.agents.availability_zone
- neutron.networks.availability_zone_hints
- neutron.router_extra_attributes.availability_zone_hints
- nova.dns_domains.availability_zone
- nova.instances.availability_zone
- nova.volume_usage_cache.availability_zone
- nova.shadow_dns_domains.availability_zone
- nova.shadow_instances.availability_zone
- nova.shadow_volume_usage_cache.availability_zone

Why is that bad?
First, the API and Horizon can show different values for a host and an
instance, for example. Second, migration of instances with a changed
availability zone fails with "No valid host found" for the old AZ.

This change adds an additional check to the Update Aggregate API call.
With the check, it's not possible to rename an AZ if the corresponding
aggregate has instances on any of its hosts.

PUT /os-aggregates/{aggregate_id} and
POST /os-aggregates/{aggregate_id}/action return HTTP 400 for
availability zone renaming if any host in the aggregate has instances.
This is similar to the existing error for conflicting AZ names.

Change-Id: Ic27195e46502067c87ee9c71a811a3ca3f610b73
Closes-Bug: #1378904
4 years ago
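The new check can be sketched as follows; this is an illustration of the rule described above, not the real aggregate API code, and the names are made up.

```python
class AggregateInUse(Exception):
    """Stands in for the HTTP 400 response."""

def set_aggregate_az(aggregate, new_az, instances_by_host):
    """Rename the aggregate's AZ only if no host has instances."""
    if new_az != aggregate.get("availability_zone"):
        busy = [h for h in aggregate["hosts"] if instances_by_host.get(h)]
        if busy:
            # Renaming would desync the AZ string stored on instances.
            raise AggregateInUse(
                "cannot rename AZ while hosts %s have instances" % busy)
    aggregate["availability_zone"] = new_az
```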
Zuul 5a09c81af3 Merge "Plumbing for allowing the all-tenants filter with down cells" 4 years ago
Zuul f77791954d Merge "Fup for the bandwidth resource provider series" 4 years ago
Zuul bff3fd1cdb Merge "API: Remove evacuate/live-migrate 'force' parameter" 4 years ago
Matt Riedemann 35cc0f5e94 Share snapshot image membership with instance owner
When an admin creates a snapshot of another project owner's
instance, either via the createImage API directly, or via the
shelve or createBackup APIs, the admin project is the owner
of the image and the owner of the instance (in another project)
cannot "see" the image. This is a problem, for example, if an
admin shelves a tenant user's server and then the user tries to
unshelve the server because the user will not have access to
get the shelved snapshot image.

This change fixes the problem by leveraging the sharing feature [1]
in the v2 image API. When a snapshot is created where the request
context project_id does not match the owner of the instance project_id,
the instance owner project_id is granted sharing access to the image.
By default, this means the instance owner (tenant user) can get the
image directly via the image ID if they know it, but otherwise the image
is not listed for the user to avoid spamming their image listing. In the
case of unshelve, the end user does not need to know the image ID since
it is stored in the instance system_metadata. Regardless, the user could
accept the pending image membership if they want to see the snapshot
show up when listing available images.

Note that while the non-admin project has access to the snapshot
image, they cannot delete it. For example, if the user tries to
delete or unshelve a shelved offloaded server, nova will try to
delete the snapshot image which will fail and log a warning since
the user does not own the image (the admin does). However, the
delete/unshelve operations themselves will not fail just because the
image cannot be deleted, which is an acceptable trade-off.

Due to some very old legacy virt driver code which started in the
libvirt driver and was copied to several other drivers, several virt
drivers had to be modified to not overwrite the "visibility=shared"
image property by passing "is_public=False" when uploading the image
data. There was no point in the virt drivers setting is_public=False
since the API already controls that. It does mean, however, that
the bug fix is not really in effect until both the API and compute
service code have this fix.

A functional test is added which depends on tracking the owner/member
values in the _FakeImageService fixture. Impacted unit tests are
updated accordingly.


Change-Id: If53bc8fa8ab4a8a9072061af7afed53fc12c97a5
Closes-Bug: #1675791
4 years ago
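The sharing logic can be sketched against a toy in-memory image service. This is an illustration of the idea only; the real change goes through the glance v2 image API's member calls, and all names here are invented.

```python
class FakeImageService:
    """Minimal stand-in for the v2 image API with member support."""

    def __init__(self):
        self.images = {}
        self.members = {}

    def create(self, image_id, owner):
        self.images[image_id] = {"owner": owner, "visibility": "shared"}

    def add_member(self, image_id, member):
        self.members.setdefault(image_id, []).append(member)

def create_snapshot(image_api, image_id, context_project_id,
                    instance_project_id):
    """Create a snapshot; share it with the instance owner if needed."""
    image_api.create(image_id, owner=context_project_id)
    if context_project_id != instance_project_id:
        # Grant the instance owner access to the admin-owned snapshot so
        # e.g. unshelve can fetch it later.
        image_api.add_member(image_id, instance_project_id)
```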
Stephen Finucane 36a91936a8 API: Remove evacuate/live-migrate 'force' parameter
Add a new microversion that removes support for the aforementioned
parameter, whose behavior cannot be adequately guaranteed in the new
placement-based scheduling model.

Change-Id: I2a395aa6eccad75a97fa49e993b0300bdcfc7258
Signed-off-by: Stephen Finucane <>
Implements: blueprint remove-force-flag-from-live-migrate-and-evacuate
4 years ago
Surya Seetharaman 0c8824b44d Plumbing for allowing the all-tenants filter with down cells
This patch lays the groundwork for supporting the
``all-tenants`` filter.

Related to blueprint handling-down-cell

Change-Id: I7dcef20aed0178c81d6580aa9534288eaa383dab
4 years ago
Surya Seetharaman 646d575744 Plumbing for ignoring list_records_by_skipping_down_cells
This patch adds the plumbing required to ignore the value of the
``[api]/list_records_by_skipping_down_cells`` config from the new
microversion if cell_down_support is True. The config will be
considered only if cell_down_support is False, in which case we look
at ``[api]/list_records_by_skipping_down_cells`` and accordingly
either skip the records from the down cells or generate an API
exception as the response.

The description of list_records_by_skipping_down_cells is updated
in Id9f12532897912b39093f63e9286540d9029edeb when the microversion
is added.

Related to blueprint handling-down-cell

Change-Id: Icbe27c941c9b934f8f1894e9b9da1d34f047e942
4 years ago
Balazs Gibizer 03bc8b6a6b Transfer port.resource_request to the scheduler
This patch collects the resource requests from each neutron port
involved in a server create request, converts each request to a
RequestGroup object, and includes them in the RequestSpec. This way
the requests reach the scheduler, where they are included in the
generation of the allocation_candidates query.
This patch only handles the happy path of a server create request, but
it adds a couple of TODOs in places where the server move operations
related code paths need to be implemented. That implementation will be
part of subsequent patches.

Note that this patch technically makes it possible to boot a server
with one neutron port that has a resource request. But it does not
handle multiple such ports, SRIOV ports where two PFs support the same
physnet, or many server lifecycle operations like resize, migrate,
live-migrate, and unshelve. To avoid possible resource allocation
inconsistencies due to the partial support, nova rejects any request
that involves such ports. See the previous patches in this patch
series for details.

Also note that the simple boot cases are verified with functional
tests, and in those tests we need to mock out the above-described
logic that rejects such requests. See more background about this
approach on the ML [1].


blueprint bandwidth-resource-provider
Change-Id: Ica6152ccb97dce805969d964d6ed032bfe22a33f
4 years ago
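The collection step described above can be sketched like this. The port fields (`resource_request` with `resources` and `required`) follow the neutron port API, but the RequestGroup here is a simplified stand-in for the nova object, not its real definition.

```python
from dataclasses import dataclass, field

@dataclass
class RequestGroup:
    """Simplified stand-in for nova's RequestGroup object."""
    requester_id: str
    resources: dict = field(default_factory=dict)
    required_traits: set = field(default_factory=set)

def request_groups_from_ports(ports):
    """Build one RequestGroup per port that carries a resource request."""
    groups = []
    for port in ports:
        request = port.get("resource_request")
        if not request:
            continue  # ports without a resource request add nothing
        groups.append(RequestGroup(
            requester_id=port["id"],
            resources=dict(request.get("resources", {})),
            required_traits=set(request.get("required", [])),
        ))
    return groups
```

The resulting groups would then ride along in the RequestSpec so the scheduler can include them when asking Placement for allocation candidates.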
Matt Riedemann 7e229ba40d Extract compute API _create_image to compute.utils
This refactors the _create_image and related
_initialize_instance_snapshot_metadata methods into
nova.compute.utils and makes them public utilities
so that code outside the API can re-use them.

Change-Id: I08630bbbd1df9b758ff5088f9b48183a503ecc09
4 years ago
Zuul 99b452ac09 Merge "Rip out the SchedulerClient" 4 years ago
Balazs Gibizer cc16de04e8 Fup for the bandwidth resource provider series
This patch fixes minor comments found in the following patches:

Change-Id: I8c51654eb0744c400a1b55a9d6b8594a103dadcc
blueprint: bandwidth-resource-provider
4 years ago
Zuul fae8b3efcd Merge "Reject server create with port having resource request" 4 years ago
Zuul af4a13b79c Merge "Read port resource request from Neutron" 4 years ago
Zuul 9fab7e73e3 Merge "Reject interface attach with QoS aware port" 4 years ago
Balazs Gibizer 735c2181dc Reject server create with port having resource request
Nova does not yet consider the resource request of a Neutron port.
So this patch makes sure that a server create request is rejected if it
involves a port that has a resource request. When the feature is ready
on the nova side, this limitation will be lifted.

Change-Id: I97f06d0ec34cbd75c182caaa686b8de5c777a576
blueprint: bandwidth-resource-provider
4 years ago
Balazs Gibizer 9ba910bb53 Read port resource request from Neutron
This patch collects the resource requests from each neutron port
involved in a server create request and converts each request to
a RequestGroup object.

blueprint bandwidth-resource-provider

Change-Id: I4473cb192447b5bfa9d1dfcc0dd5216c536caf73
4 years ago
Balazs Gibizer bd6f33070b Reject interface attach with QoS aware port
Attaching a port with a minimum bandwidth policy would require updating
the allocation of the server. But for that, nova would need to select
the proper networking resource provider under the compute resource
provider the server is running on.

For the first iteration of the feature we consider this out of scope. To
avoid resource allocation inconsistencies, this patch proposes to reject
such interface attach requests. Rejecting them does not break existing
functionality, as today only the SRIOV Neutron backend supports the
minimum bandwidth policy, and Nova does not support interface attach
with SRIOV interfaces.

A subsequent patch will handle attaching a network that has QoS policy.

Co-Authored-By: Elod Illes <>

Change-Id: Id8b5c48a6e8cf65dc0a7dc13a80a0a72684f70d9
blueprint: bandwidth-resource-provider
4 years ago
Kevin_Zheng 41b982c9fe Don't call begin_detaching when detaching volume from shelved vm
When shelving an instance, if the instance has a volume attached,
with the new attach/detach flow we delete the old attachment
and create a new attachment, leaving the volume status as ``reserved``.

If the user then tries to detach such a volume, it fails because
Cinder does not allow a begin_detaching() call on a ``reserved`` volume.

For shelved instances, we can simply skip this step and
directly detach the volume.

Change-Id: Ib1799feebbd8f4b0f389168939df7e5e90c8add1
closes-bug: #1808089
4 years ago
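The fix amounts to a conditional around the begin_detaching() call, which can be sketched as below. The fake Cinder client just records calls; the names are illustrative, not the real nova/cinderclient API.

```python
class FakeCinder:
    """Records the Cinder calls made during a detach."""

    def __init__(self):
        self.calls = []

    def begin_detaching(self, volume_id):
        self.calls.append(("begin_detaching", volume_id))

    def attachment_delete(self, attachment_id):
        self.calls.append(("attachment_delete", attachment_id))

def detach_volume(cinder, vm_state, volume_id, attachment_id):
    if vm_state not in ("shelved", "shelved_offloaded"):
        # Only non-shelved instances go through begin_detaching();
        # Cinder rejects it for volumes in the 'reserved' state.
        cinder.begin_detaching(volume_id)
    cinder.attachment_delete(attachment_id)
```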
Eric Fried 4c34ab574e Rip out the SchedulerClient
Since all remaining SchedulerClient methods were direct passthroughs to
the SchedulerQueryClient, and nothing special is being done anymore to
instantiate that client, the SchedulerClient is no longer necessary. All
references to it are replaced with direct references to the underlying
clients.
Change-Id: I57dd199a7c5e762d97a600307aa13a7aeb62d2b2
4 years ago
Matt Riedemann 5938fa9b46 Drop old service version check compat from _delete_while_booting
Like change I11083aa3c78bd8b6201558561457f3d65007a177 the API
should not need to have compatibility code for mitaka-era
services, so this drops the service version check from the
_delete_while_booting method.

Change-Id: I26d58e8156b209c26d01da0baf7137d33eda4307
5 years ago
Matt Riedemann 52d515ba7d Remove legacy request spec compat code from API
For operations which require a request spec, we now require that
the request spec exists when we look it up in the API or we fail.
All instances created since Newton should have a request spec, and
instances created before Newton should have been migrated to have
a request spec as part of an online data migration. Failing to find
a request spec will now result in a 500 internal server error.

As a result, the RequestSpec.get_by_instance_uuid query is moved
to before updating the instance.task_state in the various API methods
since we want to do all we can before updating the instance in case
we fail.

Related to blueprint request-spec-use-by-compute

Change-Id: I34ffaf285718059b55f90e812b57f1e11d566c6f
5 years ago
Matt Riedemann 66e44c6429 Only construct SchedulerReportClient on first access from API
With commit 5d1a500185 each
API/AggregateAPI class instance constructs a SchedulerReportClient
which holds an in-memory lock during its initialization.

With at least 34 API extensions constructing at least
one of those two API classes, the accumulated effect of the
SchedulerReportClient construction can slow down nova-api startup
times, especially when running with multiple API workers, like
in our tempest-full CI job (there are 2 workers, so 68 inits).

This change simply defers constructing the SchedulerReportClient
until it is used, which is only in a few spots in the API code,
which should help with nova-api start times.

The AggregateAPI also has to construct the SchedulerQueryClient
separately because SchedulerClient creates both the query and
report clients.

Long-term we could consider making it a singleton in nova.compute.api
if that is safe (the aggregate code might be relying on some caching
aspects in the SchedulerReportClient).

Change-Id: Idf6e548d725db0181629a451f46b6a3a5850d186
Closes-Bug: #1807219
5 years ago
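The deferral technique can be sketched as a lazy property. The class names are borrowed from the commit message, but this is an illustration with a construction counter to make the saving visible, not the real nova code.

```python
class SchedulerReportClient:
    """Stand-in for the expensive, lock-holding client."""
    constructions = 0

    def __init__(self):
        SchedulerReportClient.constructions += 1

class API:
    def __init__(self):
        # Deferred: nothing expensive happens at API start-up.
        self._placement_client = None

    @property
    def placement_client(self):
        # Built lazily on first access, then cached on the instance.
        if self._placement_client is None:
            self._placement_client = SchedulerReportClient()
        return self._placement_client
```

With 34 extensions each constructing an API object, no report client is built until one of the few placement-using code paths actually touches the property.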
Matt Riedemann 097693aaa3 Drop pre-cellsv2 compat in compute API.get()
This removes the nova-osapi_compute service version
compatibility check for cells v2 which was added in

A "nova-status upgrade check" was added for that
in Rocky:


And it is also causing issues with functional tests
where instances are buried in cell0: because we
don't use the instance mapping, we fail to find the
instance.
As a result of this, compute API tests that rely on
retrieving a real instance out of the database will
also need to make sure an associated instance mapping
record exists.

Change-Id: I11083aa3c78bd8b6201558561457f3d65007a177
Closes-Bug: #1800472
5 years ago
Surya Seetharaman 870d718c17 Minimal construct plumbing for nova service-list when a cell is down
This patch modifies the existing behavior of nova service-list from
ignoring a down cell to returning a partial construct consisting of
the binary and host info obtained from the API host_mappings table
for the compute services in the down cells. This is enabled by passing
a new enough microversion, which is added in a later patch in this
series.
Related to blueprint handling-down-cell

Change-Id: I9cf5e4eb7f70c495001fd064c060d7d5b3dd5bff
5 years ago
Surya Seetharaman ef5aa8a818 Minimal construct plumbing for nova show when a cell is down
This patch sets the stage for modifying the behavior of nova show
which currently gives a 500 when the cell in which the instance
lives is down. The new behavior will return a partial construct
consisting of uuid, project_id, created_at from instance_mappings
table and user_id, flavor, image_ref and availability_zone info
from request_specs table.  Note that the rest of the keys will be
missing. This behavior will be enabled by passing a new enough
microversion, handling for which is introduced later in this series.

Related to blueprint handling-down-cell

Change-Id: Iaea1cb4ed93bb98f451de4f993106d7891ca3682
5 years ago
Surya Seetharaman 031314b6d8 Refactor scatter-gather utility to return exception objects
The scatter-gather utility returns a raised_exception_sentinel for
all kinds of exceptions that are caught, and there may be situations
where we have to handle the different types of exceptions differently.
To facilitate that, it is more useful to return the Exception object
itself instead of the dummy raised_exception_sentinel, so that we can
handle each result based on its exception type.

Related to blueprint handling-down-cell

Change-Id: I861b223ee46b0f0a31f646a4b45f8a02410253cf
5 years ago
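The refactor can be sketched as follows. This sequential loop is an illustration only; the real utility dispatches the per-cell calls concurrently, but the point here is the result shape: a failed cell maps to the exception object itself, not a sentinel.

```python
def scatter_gather(cells, fn):
    """Call fn per cell; map failures to the exception object itself."""
    results = {}
    for cell in cells:
        try:
            results[cell] = fn(cell)
        except Exception as exc:
            # Keep the exception object so callers can branch on its
            # type, instead of collapsing all failures to one sentinel.
            results[cell] = exc
    return results
```

A caller can then distinguish, say, a timeout from a database error with a simple `isinstance()` check on each result.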
Surya Seetharaman 37a5ef5113 Minimal construct plumbing for nova list when a cell is down
This patch augments the existing behavior of nova list from ignoring
a down cell to returning a partial construct consisting of the
uuid, project_id and created_at fields obtained from the API
instance_mappings table for the instances in the down cell, by
introducing a new API microversion. This way, when the user
gets partial results for certain records, they will know
that the particular cell is unreachable. Note that the other
keys will be missing from the API response. Also note that this
is not externally poke-able until the microversion and API change
later in this series.

Related to blueprint handling-down-cell

Change-Id: Ie9a9aed0676d469e5180d4c2d2631cf339335b2d
5 years ago
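The partial-construct fallback can be sketched as below, with plain dicts standing in for the real objects and exception-valued results standing in for a down cell (as in the scatter-gather refactor in this series). Illustrative only; the field list matches the commit message, the rest is invented.

```python
# Only the fields available from the API-level instance_mappings table.
PARTIAL_KEYS = ("uuid", "project_id", "created_at")

def partial_instance_view(mapping):
    return {key: mapping[key] for key in PARTIAL_KEYS}

def list_instances(cell_results, mappings_by_cell):
    """Merge full records from up cells with partial ones for down cells."""
    views = []
    for cell, result in cell_results.items():
        if isinstance(result, Exception):
            # Cell is down: fall back to minimal info from the mappings.
            views.extend(partial_instance_view(m)
                         for m in mappings_by_cell.get(cell, []))
        else:
            views.extend(result)
    return views
```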
Zuul 5e8af9fc09 Merge "Use nova-consoleauth only if workaround enabled" 5 years ago
Zuul 20bc0136d0 Merge "Add compute API validation for when a volume_type is requested" 5 years ago
zhangbailin 686dbc3640 Add compute API validation for when a volume_type is requested
This adds validation in the compute API for when a
block_device_mapping_v2 request specifies a volume_type. Note that
this is all noop code right now since users cannot specify that field in
the API until the microversion is added which enables that feature.
In other words, this is just plumbing.

Part of bp boot-instance-specific-storage-backend
Change-Id: I45bd42908d44a0f05e1231febab926b23232b57b
5 years ago
melanie witt b49209cc29 Use nova-consoleauth only if workaround enabled
In Rocky, we deprecated the nova-consoleauth service but there were
unconditional calls to nova-consoleauth in the compute/api, which
made it impossible to avoid running the nova-consoleauth service.

This adds conditional checks to call nova-consoleauth only if the
[workarounds]enable_consoleauth configuration option is True. The
option defaults to False because the default console token auth TTL
is 10 minutes and only operators who have configured much longer TTL
or otherwise wish to avoid resetting all consoles at upgrade time
need to use the option.

This also updates the /os-console-auth-tokens/{console_token} API to
use nova-consoleauth only if the [workarounds] option is enabled. This
had to be done in the same change because the conditional checks in
the compute/api code caused the /os-console-auth-tokens API functional
tests to fail to find token authorizations in nova-consoleauth.

Closes-Bug: #1788470
Closes-Bug: #1795982

Change-Id: Iff6020f1a10afc476864f979faf251ef5a1a6184
5 years ago
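The conditional routing described above can be sketched as a single branch point. The option name mirrors the commit message; the callables standing in for the consoleauth RPC path and the database-backed path are illustrative, not the real interfaces.

```python
def check_console_token(token, enable_consoleauth, consoleauth_check,
                        db_check):
    """Route token validation based on the workaround option."""
    if enable_consoleauth:
        # [workarounds]enable_consoleauth = True: keep using the
        # deprecated nova-consoleauth service (e.g. for long TTLs).
        return consoleauth_check(token)
    # Default path: database-backed token authorization.
    return db_check(token)
```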
Zuul ba509fddd5 Merge "Follow up for Ib6f95c22ffd3ea235b60db4da32094d49c2efa2a" 5 years ago
Zuul 99bf62e427 Merge "Send soft_delete from context manager" 5 years ago
Balazs Gibizer 669d6499ee Follow up for Ib6f95c22ffd3ea235b60db4da32094d49c2efa2a
This patch removes the host parameter of the
notify_about_instance_delete() call as it is always filled with a value.

Change-Id: Iff3b605b9410d5a097b53f532870df65780bc1e4
Implements: bp versioned-notification-transformation-stein
5 years ago
Dan Smith c8e2de6684 Enforce case-sensitive hostnames in aggregate host add
If we are using a case-insensitive backend database like mysql, a
user can request an aggregate host add with a non-matching hostname
and we will not signal failure. We will create a mapping which will
not actually include that host in the aggregate, which will be
confusing later. This change makes us fail if the host name does not
match exactly, which is the same behavior as if we are using a
case-sensitive backend (like postgres).

Change-Id: I597dee74d33de337913eddda74ab056fbf81a23c
Closes-Bug: #1709260
5 years ago
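The enforced check amounts to an exact string comparison in Python, independent of the database collation. A minimal sketch, with invented names that do not match the real API:

```python
class ComputeHostNotFound(Exception):
    pass

def add_host_to_aggregate(aggregate_hosts, known_hosts, hostname):
    """Add a host to an aggregate, requiring an exact hostname match."""
    if hostname not in known_hosts:
        # Exact, case-sensitive comparison: 'Compute1' does not match
        # 'compute1', even if a MySQL lookup would have found it.
        raise ComputeHostNotFound(hostname)
    aggregate_hosts.add(hostname)
```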
Takashi NATSUME 61a12e1801 Remove an unnecessary comment
Remove an unnecessary 'TODO' comment
because the check is already done
when checking the host existence.

Change-Id: I242159f80a7a682aac4632511fbd3ecc9b2d34db
5 years ago
Zuul 0ba34a8184 Merge "Update RequestSpec.flavor on resize_revert" 5 years ago
Zuul 3be8dbb68f Merge "Remove old check_attach version check in API" 5 years ago
Zuul c815013b6b Merge "Fix soft deleting vm fails after "nova resize" vm" 5 years ago
Matt Riedemann fc5839b959 Remove old check_attach version check in API
In Pike, I3a3caa4c566ecc132aa2699f8c7e5987bbcc863a added
a version check for computes during boot from volume
to determine if the computes were running Ocata and would
call the "check_attach" routine which was removed in that
same change.

Since the API doesn't support computes that old at this point
we can remove that version compat checking code. Related tests
are updated appropriately including several tests for _validate_bdm
that were incorrectly passing because of the mock on
Service.get_minimum_version when in fact instances going through
_validate_bdm won't have their attribute set.

Finally, the really old "check_attach" comment in the
get_image_metadata_from_volume() method is updated for context.

Change-Id: Iaffbb019fef7779e7fa44306aacca954512b6970
5 years ago
Balazs Gibizer e83dbe1205 Send soft_delete from context manager
This patch moves soft_delete notification sending to the
notify_about_instance_delete context manager to unify the
delete notification sending code practice.

Implements: bp versioned-notification-transformation-stein
Change-Id: Ib6f95c22ffd3ea235b60db4da32094d49c2efa2a
5 years ago
Stephen Finucane bea2ab0379 api: Remove unnecessary default parameter
This was always set by callers and appears to be here purely to avoid
updating tests. Update the tests and ease comprehension of the function.

Change-Id: Ib22b64ca499ffdb1a32d21f52240e80676ae1165
5 years ago