This scenario came up while discussing what might be causing
leaked resource allocations in Placement [1].
I fully expected the new test to fail because I couldn't see
from the compute manager where the migration-based allocations
would be cleaned up when deleting the server. It turns out that
when deleting a VERIFY_RESIZE server, the API confirms the resize
which drops the migration-based allocations on the source node
before deleting the server on the target node.
Since this is not obvious, this change adds a comment to the compute
API _confirm_resize_on_deleting() method.
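For illustration, the flow in the compute API is roughly the following
(a sketch assuming the surrounding delete path; exact code may differ):

    if instance.vm_state == vm_states.RESIZED:
        # Deleting a VERIFY_RESIZE server: confirm the resize first,
        # which drops the migration-based allocations on the source
        # node, then proceed with deleting the server on the target.
        self._confirm_resize_on_deleting(context, instance)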
[1] http://lists.openstack.org/pipermail/openstack-dev/2018-November/136295.html
Change-Id: I4c12502c86c7ac27369d119e0f97768cf41695b5
The InstanceMapping user_id field is a new, non-nullable field
representing the user_id for the instance.
When new instance create requests come in, we create the instance
mapping. We will set user_id here before creating the record.
The virtual interface online data migration and the map_instances
routine create InstanceMapping records, and since the user_id field did
not previously exist, they were not setting it. We will populate
user_id in these cases.
Finally, whenever an API does a compute_api.get(), we can
opportunistically set and save user_id on the instance mapping if it is
not set.
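As a rough sketch of that opportunistic backfill (assuming the
InstanceMapping object API; simplified):

    im = objects.InstanceMapping.get_by_instance_uuid(
        context, instance.uuid)
    if im.user_id is None:
        # Backfill mappings created before the user_id field existed.
        im.user_id = instance.user_id
        im.save()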
Part of blueprint count-quota-usage-from-placement
Change-Id: Ic4bb7b49b90a3d6d7ce6c6c62d87836f96309f06
Add an early check to validate any PCI aliases in the requested flavor
during a resize. This should ensure the user gets a useful error
message.
blueprint: flavor-extra-spec-image-property-validation
Change-Id: I25454cd408e08589e5cfd6107dcbadd15bbb405f
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
Validate the combination of the flavor extra-specs and image properties
as early as possible once they're both known (since you can specify
mutually-incompatible changes in the two places). If validation fails,
we synchronously return an error to the user. We need to do this anywhere
the flavor or image changes, so basically instance creation, rebuild,
and resize.
- Rename _check_requested_image() to _validate_flavor_image() and add
a call from the resize code path. (It's already called for create
and rebuild.)
- In _validate_flavor_image() add new checks to validate NUMA-related
  options from the flavor and image including CPU policy, CPU thread
  policy, CPU topology, memory topology, hugepages, CPU pinning,
  serial ports, realtime mask, etc. (One such check is sketched after
  this list.)
- Add new 400 exceptions in the server API corresponding to the added
  validations.
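As an example, one such check might look like the following sketch
(the exception exists in nova; the surrounding code is simplified):

    # Reject a CPU policy conflict between the flavor extra-specs
    # and the image properties.
    flavor_policy = flavor.extra_specs.get('hw:cpu_policy')
    image_policy = image.get('properties', {}).get('hw_cpu_policy')
    if flavor_policy == 'shared' and image_policy == 'dedicated':
        raise exception.ImageCPUPinningForbidden()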
blueprint: flavor-extra-spec-image-property-validation
Change-Id: I06fad233006c7bab14749a51ffa226c3801f951b
Signed-off-by: Jack Ding <jack.ding@windriver.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
The update aggregate and update aggregate metadata API calls have the
ability to update the availability zone name for the aggregate. If the
aggregate is not empty (has hosts with instances on them),
the update leads to a discrepancy for objects that store the
availability zone as a string rather than a reference.
From devstack DB they are:
- cinder.backups.availability_zone
- cinder.consistencygroups.availability_zone
- cinder.groups.availability_zone
- cinder.services.availability_zone
- cinder.volumes.availability_zone
- neutron.agents.availability_zone
- neutron.networks.availability_zone_hints
- neutron.router_extra_attributes.availability_zone_hints
- nova.dns_domains.availability_zone
- nova.instances.availability_zone
- nova.volume_usage_cache.availability_zone
- nova.shadow_dns_domains.availability_zone
- nova.shadow_instances.availability_zone
- nova.shadow_volume_usage_cache.availability_zone
Why is that bad?
First, the API and Horizon can show different values for, e.g., a host
and an instance. Second, migration of instances whose availability
zone has changed fails with "No valid host found" for the old AZ.
This change adds an additional check to the update aggregate API calls.
With the check, it's not possible to rename an AZ if the corresponding
aggregate has instances on any of its hosts.
PUT /os-aggregates/{aggregate_id} and
POST /os-aggregates/{aggregate_id}/action return HTTP 400 for
availability zone renaming if the hosts of the aggregate have any instances.
This is similar to the existing error for conflicting AZ names.
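A simplified sketch of the new check (helper usage assumed):

    if 'availability_zone' in updates and aggregate.hosts:
        for host in aggregate.hosts:
            if objects.InstanceList.get_by_host(context, host):
                # Renaming the AZ would strand the AZ strings stored
                # for these instances and related resources.
                raise exception.InvalidAggregateActionUpdate(
                    aggregate_id=aggregate.id,
                    reason='cannot rename AZ of a non-empty aggregate')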
Change-Id: Ic27195e46502067c87ee9c71a811a3ca3f610b73
Closes-Bug: #1378904
When an admin creates a snapshot of another project owner's
instance, either via the createImage API directly, or via the
shelve or createBackup APIs, the admin project is the owner
of the image and the owner of the instance (in another project)
cannot "see" the image. This is a problem, for example, if an
admin shelves a tenant user's server and then the user tries to
unshelve the server because the user will not have access to
get the shelved snapshot image.
This change fixes the problem by leveraging the sharing feature [1]
in the v2 image API. When a snapshot is created where the request
context project_id does not match the owner of the instance project_id,
the instance owner project_id is granted sharing access to the image.
By default, this means the instance owner (tenant user) can get the
image directly via the image ID if they know it, but otherwise the image
is not listed for the user to avoid spamming their image listing. In the
case of unshelve, the end user does not need to know the image ID since
it is stored in the instance system_metadata. Regardless, the user could
accept the pending image membership if they want to see the snapshot
show up when listing available images.
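With the Glance v2 sharing API this is roughly (a sketch using
python-glanceclient; the actual call sites differ):

    if context.project_id != instance.project_id:
        # Grant the instance owner membership on the snapshot image
        # so it can be fetched by ID, e.g. during unshelve.
        glance_client.image_members.create(
            image_id, instance.project_id)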
Note that while the non-admin project has access to the snapshot
image, they cannot delete it. For example, if the user tries to
delete or unshelve a shelved offloaded server, nova will try to
delete the snapshot image which will fail and log a warning since
the user does not own the image (the admin does). However, the
delete/unshelve operations will not fail because the image cannot
be deleted, which is an acceptable trade-off.
Due to some very old legacy code which started in the libvirt
driver and was copied to several other virt drivers, those
drivers had to be modified to not overwrite the "visibility=shared"
image property by passing "is_public=False" when uploading the image
data. There was no point in the virt drivers setting is_public=False
since the API already controls that. It does mean, however, that
the bug fix is not really in effect until both the API and compute
service code has this fix.
A functional test is added which depends on tracking the owner/member
values in the _FakeImageService fixture. Impacted unit tests are
updated accordingly.
[1] https://developer.openstack.org/api-ref/image/v2/index.html#sharing
Change-Id: If53bc8fa8ab4a8a9072061af7afed53fc12c97a5
Closes-Bug: #1675791
Add a new microversion that removes support for the ``force`` flag on
the live-migrate and evacuate actions, which cannot be adequately
guaranteed in the new placement world.
Change-Id: I2a395aa6eccad75a97fa49e993b0300bdcfc7258
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Implements: blueprint remove-force-flag-from-live-migrate-and-evacuate
APIImpact
This patch lays the groundwork for supporting the
``all-tenants`` filter.
Related to blueprint handling-down-cell
Change-Id: I7dcef20aed0178c81d6580aa9534288eaa383dab
This patch adds the plumbing required to ignore the value of the
``[api]/list_records_by_skipping_down_cells`` config option from the
new microversion if cell_down_support is True. The option is
considered only if cell_down_support is False, in which case we look
at ``[api]/list_records_by_skipping_down_cells`` and accordingly
either skip the records from the down cells or return an API error as
the response.
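The resulting decision flow is roughly (the helper names here are
hypothetical):

    if cell_down_support:
        # New microversion: always return partial records for down
        # cells; the config option is ignored.
        results = build_partial_records(down_cells)
    elif CONF.api.list_records_by_skipping_down_cells:
        # Silently leave out records from the down cells.
        results = []
    else:
        # Surface the down cell as an API error.
        raise exception.NovaException('cell is unreachable')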
The description of list_records_by_skipping_down_cells is updated
in Id9f12532897912b39093f63e9286540d9029edeb when the microversion
is added.
Related to blueprint handling-down-cell
Change-Id: Icbe27c941c9b934f8f1894e9b9da1d34f047e942
This patch collects the resource requests from each neutron port
involved in a server create request, converts each request to
a RequestGroup object, and includes them in the RequestSpec.
This way the requests are reaching the scheduler and there
they are included in the generation of the allocation_candidates
query.
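A condensed sketch of the conversion (the object API follows nova's
RequestGroup; simplified):

    request_groups = []
    for port in ports:
        resource_request = port.get('resource_request')
        if resource_request:
            # One RequestGroup per port carries the port's resource
            # and trait requirements to the scheduler.
            request_groups.append(
                objects.RequestGroup.from_port_request(
                    context, port['id'], resource_request))
    request_spec.requested_resources = request_groups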
This patch only handles the happy path of a server create request, but
it adds a couple of TODOs in the places where the server move
operation code paths need to be implemented. That implementation will
be part of subsequent patches.
Note that this patch technically makes it possible to boot a server
with one neutron port that has a resource request. But it does not
handle multiple such ports, SRIOV ports where two PFs support the
same physnet, or many server lifecycle operations like resize,
migrate, live-migrate, and unshelve. To avoid possible resource
allocation inconsistencies due to the partial support, nova rejects
any request that involves such ports. See the previous patches in
this patch series for details.
Also note that the simple boot cases are verified with functional
tests, and in those tests we need to mock out the above-described
logic that rejects such requests. See more background about this
approach on the ML [1].
[1] http://lists.openstack.org/pipermail/openstack-discuss/2018-December/001129.html
blueprint bandwidth-resource-provider
Change-Id: Ica6152ccb97dce805969d964d6ed032bfe22a33f
This refactors the _create_image and related
_initialize_instance_snapshot_metadata methods to
nova.compute.utils and makes them public utilities
so that code outside the API can re-use them.
Change-Id: I08630bbbd1df9b758ff5088f9b48183a503ecc09
This patch fixes minor comments found in the following patches:
I7e1edede827cf8469771c0496b1dce55c627cf5d
I4473cb192447b5bfa9d1dfcc0dd5216c536caf73
I97f06d0ec34cbd75c182caaa686b8de5c777a576
Change-Id: I8c51654eb0744c400a1b55a9d6b8594a103dadcc
blueprint: bandwidth-resource-provider
Nova does not currently consider the resource request of a Neutron
port. So this patch makes sure that a server create request is
rejected if it involves a port that has a resource request. When the
feature is ready on the nova side, this limitation will be lifted.
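The rejection is essentially the following (the exception name here is
illustrative):

    for port in requested_ports:
        if port.get('resource_request'):
            # Partial support could leak or misplace allocations, so
            # refuse the request outright for now.
            raise exception.NetworksWithQoSPolicyNotSupported(
                instance_uuid=instance.uuid)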
Change-Id: I97f06d0ec34cbd75c182caaa686b8de5c777a576
blueprint: bandwidth-resource-provider
This patch collects the resource requests from each neutron port
involved in a server create request. Converts each request to
a RequestGroup object.
blueprint bandwidth-resource-provider
Change-Id: I4473cb192447b5bfa9d1dfcc0dd5216c536caf73
Attaching a port with a minimum bandwidth policy would require
updating the allocation of the server. But for that, nova would need
to select the proper networking resource provider under the compute
resource provider the server is running on.
For the first iteration of the feature we consider this out of scope.
To avoid resource allocation inconsistencies, this patch proposes to
reject such interface attach requests. Rejecting such an interface
attach does not break existing functionality, as today only the SRIOV
Neutron backend supports the minimum bandwidth policy, and Nova does
not support interface attach with SRIOV interfaces.
A subsequent patch will handle attaching a network that has QoS policy.
Co-Authored-By: Elod Illes <elod.illes@ericsson.com>
Change-Id: Id8b5c48a6e8cf65dc0a7dc13a80a0a72684f70d9
blueprint: bandwidth-resource-provider
When shelving an instance that has volumes attached, with the new
attach/detach flow we delete the old attachment and create a new
attachment, so the volume status will be ``reserved``.
If the user then tries to detach these volumes, it fails because
Cinder does not allow a begin_detaching() call on a ``reserved``
volume. For shelved instances, we can just skip this step and
directly detach the volume.
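The fix is essentially to guard that call (a sketch assuming the
detach path in the compute API):

    if instance.vm_state not in (vm_states.SHELVED,
                                 vm_states.SHELVED_OFFLOADED):
        # For shelved instances the new-style attachment leaves the
        # volume 'reserved', so begin_detaching() would fail; skip
        # straight to the detach.
        self.volume_api.begin_detaching(context, volume_id)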
Change-Id: Ib1799feebbd8f4b0f389168939df7e5e90c8add1
closes-bug: #1808089
Since all remaining SchedulerClient methods were direct passthroughs to
the SchedulerQueryClient, and nothing special is being done anymore to
instantiate that client, the SchedulerClient is no longer necessary. All
references to it are replaced with direct references to
SchedulerQueryClient.
Change-Id: I57dd199a7c5e762d97a600307aa13a7aeb62d2b2
Like change I11083aa3c78bd8b6201558561457f3d65007a177 the API
should not need to have compatibility code for mitaka-era
services, so this drops the service version check from the
_delete_while_booting method.
Change-Id: I26d58e8156b209c26d01da0baf7137d33eda4307
For operations which require a request spec, we now require that
the request spec exists when we look it up in the API or we fail.
All instances created since Newton should have a request spec, and
instances created before Newton should have been migrated to have
a request spec as part of an online data migration. Failing to find
a request spec will now result in a 500 internal server error.
As a result, the RequestSpec.get_by_instance_uuid query is moved
to before updating the instance.task_state in the various API methods
since we want to do all we can before updating the instance in case
we fail.
Related to blueprint request-spec-use-by-compute
Change-Id: I34ffaf285718059b55f90e812b57f1e11d566c6f
With commit 5d1a500185 each
API/AggregateAPI class instance constructs a SchedulerReportClient
which holds an in-memory lock during its initialization.
With at least 34 API extensions constructing at least
one of those two API classes, the accumulated effect of the
SchedulerReportClient construction can slow down nova-api startup
times, especially when running with multiple API workers, like
in our tempest-full CI job (there are 2 workers, so 68 inits).
This change simply defers constructing the SchedulerReportClient
until it is used, which is only in a few spots in the API code,
which should help with nova-api start times.
The AggregateAPI also has to construct the SchedulerQueryClient
separately because SchedulerClient creates both the query and
report clients.
Long-term we could consider making it a singleton in nova.compute.api
if that is safe (the aggregate code might be relying on some caching
aspects in the SchedulerReportClient).
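Deferring amounts to constructing the client lazily on first access,
roughly like this (a sketch assuming a property-based approach):

    from nova.scheduler.client import report

    class AggregateAPI(object):
        def __init__(self):
            self._placement_client = None

        @property
        def placement_client(self):
            # Construct (and take the init lock) on first use rather
            # than at API extension load time.
            if self._placement_client is None:
                self._placement_client = report.SchedulerReportClient()
            return self._placement_client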
Change-Id: Idf6e548d725db0181629a451f46b6a3a5850d186
Closes-Bug: #1807219
This removes the nova-osapi_compute service version
compatibility check for cells v2 which was added in
Newton.
A "nova-status upgrade check" was added for that
in Rocky:
Ie2bc4616439352850cf29a9de7d33a06c8f7c2b8
It is also causing issues with functional tests
where instances are buried in cell0: because we
don't use the instance mapping, we fail to find the
instance.
As a result of this, compute API tests that rely on
retrieving a real instance out of the database will
also need to make sure an associated instance mapping
record exists.
Change-Id: I11083aa3c78bd8b6201558561457f3d65007a177
Closes-Bug: #1800472
This patch modifies the existing behavior of nova service-list, which
ignores a down cell, so that it returns a partial construct consisting
of the binary and host info obtained from the API host_mappings table
for the compute services in the down cells. This is enabled by passing
a new enough microversion, which is added in a later patch in this
series.
Related to blueprint handling-down-cell
Change-Id: I9cf5e4eb7f70c495001fd064c060d7d5b3dd5bff
This patch sets the stage for modifying the behavior of nova show
which currently gives a 500 when the cell in which the instance
lives is down. The new behavior will return a partial construct
consisting of uuid, project_id, created_at from instance_mappings
table and user_id, flavor, image_ref and availability_zone info
from request_specs table. Note that the rest of the keys will be
missing. This behavior will be enabled by passing a new enough
microversion, handling for which is introduced later in this series.
Related to blueprint handling-down-cell
Change-Id: Iaea1cb4ed93bb98f451de4f993106d7891ca3682
The scatter-gather utility returns a raised_exception_sentinel for
all kinds of exceptions that are caught, and there may often be
situations where we have to handle different types of exceptions
differently. To facilitate that, it is more useful to return the
Exception object itself instead of the dummy
raised_exception_sentinel, so that we can handle each result
differently based on its exception type.
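For example, callers could then branch on the result type (a sketch
with assumed handling; the sentinels live in nova.context):

    from nova import context as nova_context

    results = nova_context.scatter_gather_all_cells(ctxt, fn)
    for cell_uuid, result in results.items():
        if result is nova_context.did_not_respond_sentinel:
            handle_down_cell(cell_uuid)  # hypothetical helper
        elif isinstance(result, Exception):
            # Previously only raised_exception_sentinel; now the
            # exception type itself is available.
            LOG.warning('Cell %s raised: %s', cell_uuid, result)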
Related to blueprint handling-down-cell
Change-Id: I861b223ee46b0f0a31f646a4b45f8a02410253cf
This patch augments the existing behavior of nova list, which ignores
a down cell, by introducing a new API microversion with which it
returns a partial construct consisting of the uuid, project_id and
created_at fields obtained from the API instance_mappings table for
the instances in the down cell. This way, when the user gets partial
results for certain records, they will know that that particular cell
is unreachable. Note that the other keys will be missing from the API
response. Also note that this is not externally poke-able until the
microversion and API change later in this series.
Related to blueprint handling-down-cell
Change-Id: Ie9a9aed0676d469e5180d4c2d2631cf339335b2d
This adds validation in the compute API for when a
block_device_mapping_v2 request specifies a volume_type. Note that
this is all noop code right now since users cannot specify that field in
the API until the microversion is added which enables that feature.
In other words, this is just plumbing.
Part of bp boot-instance-specific-storage-backend
Change-Id: I45bd42908d44a0f05e1231febab926b23232b57b
In Rocky, we deprecated the nova-consoleauth service but there were
unconditional calls to nova-consoleauth in the compute/api, which
made it impossible to avoid running the nova-consoleauth service.
This adds conditional checks to call nova-consoleauth only if the
[workarounds]enable_consoleauth configuration option is True. The
option defaults to False because the default console token auth TTL
is 10 minutes and only operators who have configured much longer TTL
or otherwise wish to avoid resetting all consoles at upgrade time
need to use the option.
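The added checks are of the form (sketch):

    if CONF.workarounds.enable_consoleauth:
        # Only call the deprecated nova-consoleauth service when the
        # operator has explicitly opted in.
        self.consoleauth_rpcapi.delete_tokens_for_instance(
            context, instance.uuid)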
This also updates the /os-console-auth-tokens/{console_token} API to
use nova-consoleauth only if the [workarounds] option is enabled. This
had to be done in the same change because the conditional checks in
the compute/api code caused the /os-console-auth-tokens API functional
tests to fail to find token authorizations in nova-consoleauth.
Closes-Bug: #1788470
Closes-Bug: #1795982
Change-Id: Iff6020f1a10afc476864f979faf251ef5a1a6184
This patch removes the host parameter of the
notify_about_instance_delete() call as it is always filled with
the CONF.host value.
Change-Id: Iff3b605b9410d5a097b53f532870df65780bc1e4
Implements: bp versioned-notification-transformation-stein
If we are using a case-insensitive backend database like mysql, a
user can request an aggregate host add with a non-matching hostname
and we will not signal failure. We will create a mapping which will
not actually include that host in the aggregate, which will be
confusing later. This change makes us fail if the host name does not
match exactly, which is the same behavior as if we are using a
case-sensitive backend (like postgres).
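The added guard is essentially an exact-match comparison (a sketch
with assumed surroundings):

    service = objects.Service.get_by_compute_host(context, host_name)
    if service.host != host_name:
        # The DB matched case-insensitively (e.g. 'Host1' for
        # 'host1'); reject rather than create a mapping that will
        # never take effect.
        raise exception.ComputeHostNotFound(host=host_name)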
Change-Id: I597dee74d33de337913eddda74ab056fbf81a23c
Closes-Bug: #1709260
Remove an unnecessary 'TODO' comment
because the check is already done
when checking the host's existence.
Change-Id: I242159f80a7a682aac4632511fbd3ecc9b2d34db
TrivialFix
In Pike, I3a3caa4c566ecc132aa2699f8c7e5987bbcc863a added
a version check for computes during boot from volume
to determine if the computes were running Ocata and would
call the "check_attach" routine which was removed in that
same change.
Since the API doesn't support computes that old at this point
we can remove that version compat checking code. Related tests
are updated appropriately, including several tests for _validate_bdm
that were incorrectly passing because of the mock on
Service.get_minimum_version when in fact instances going through
_validate_bdm won't have their instance.id attribute set.
Finally, the really old "check_attach" comment in the
get_image_metadata_from_volume() method is updated for context.
Change-Id: Iaffbb019fef7779e7fa44306aacca954512b6970
This patch moves soft_delete notification sending to the
notify_about_instance_delete context manager to unify the
delete notification sending code practice.
Implements: bp versioned-notification-transformation-stein
Change-Id: Ib6f95c22ffd3ea235b60db4da32094d49c2efa2a
This was always set by callers and appears to be here purely to avoid
updating tests. Update the tests and ease comprehension of the function.
Change-Id: Ib22b64ca499ffdb1a32d21f52240e80676ae1165