This patch adds a new external event called "power-update" through
which ironic conveys power state changes (power_off and power_on) on
a physical instance to nova. Only the running -> shutdown and
shutdown -> running transitions are handled by nova; all others are
ignored. The database is updated accordingly to reflect the real
vm_state and power_state of the instance. This way nova will no
longer enforce an incorrect power state on the physical instance
during the periodic "sync_power_states" task.
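As an illustration (the exact request fields here are assumptions,
not part of this patch), ironic would deliver such an event through
the existing external events API roughly as:

  POST /v2.1/os-server-external-events
  {"events": [{"name": "power-update",
               "server_uuid": "<instance uuid>",
               "tag": "POWER_OFF"}]}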
Implements blueprint nova-support-instance-power-update
Story: 2004969
Task: 29423
Change-Id: I2b292050cc3ce5ef625659f5a1fe56bb76072496
This patch removes the legacy code for image checksumming, as well
as configuration values that are no longer used.
Change-Id: I9c552e33456bb862688beaabe69f2b72bb8ebcce
This microversion implements the following API cleanups:
1. Return 400 for unknown parameters in the query string and in the
   request body.
2. Make the server representation consistent among all APIs that
   return the complete server representation.
3. Change the default return value of the ``swap`` field in the
   flavor APIs from the empty string to 0 (integer).
4. Always return the ``servers`` field in the response of the GET
   hypervisors API, even if there are no servers on the hypervisor.
Details: https://specs.openstack.org/openstack/nova-specs/specs/train/approved/api-consistency-cleanup.html
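As an illustrative fragment (not copied from the spec) for item 3, a
flavor that was previously returned with

  "swap": ""

is returned with this microversion as

  "swap": 0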
Partial-Implements: blueprint api-consistency-cleanup
Change-Id: I9d257a003d315b84b937dcef91f3cb41f3e24b53
The server fault "message" is always shown in the API
server response, regardless of policy or user role.
The fault "details" are only shown to users with the
admin role when the fault code is 500.
The problem is that for non-nova exceptions the fault message is a
stringified version of the exception (see
nova.compute.utils.exception_to_dict), which can contain sensitive
information that the non-admin owner of the server can see.
This change adds a functional test to recreate the issue and changes
exception_to_dict so that, for the non-nova case, the fault message
is simply the exception type class name. Admins can still see the
fault traceback in the "details" key of the fault dict in the server
API response. Note that _get_fault_details is changed so that the
details also include the exception value, which is what used to be in
the fault message for non-nova exceptions. This is necessary so
admins can still get the exception message along with the traceback
details.
Note that nova exceptions with a %(reason)s replacement variable
could potentially leak sensitive details as well, but those would
need to be cleaned up on a case-by-case basis, since we don't want to
change the behavior of all fault messages; otherwise users might not
see information like NoValidHost when their server goes to ERROR
status during scheduling.
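A minimal sketch of the exception_to_dict idea (simplified, not the
exact nova code):

  from nova import exception

  def exception_to_dict(fault, message=None):
      # Sketch only: a non-nova exception exposes just its class
      # name, not str(fault), which may contain sensitive data.
      code = getattr(fault, 'code', 500)
      if message is None:
          if isinstance(fault, exception.NovaException):
              message = fault.format_message()
          else:
              message = fault.__class__.__name__
      return {'code': code, 'message': message}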
SecurityImpact: This change contains a fix for CVE-2019-14433.
Change-Id: I5e0a43ec59341c9ac62f89105ddf82c4a014df81
Closes-Bug: #1837877
These were deprecated during Stein [1] and can now be removed, lest
they cause hassle with the PCPU work. As noted in [1], their
aggregate equivalents are left untouched for now.
[1] https://review.opendev.org/#/c/596502/
Change-Id: I8a0d332877fbb9794700081e7954f2501b7e7c09
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Some options are now automatically configured by version 1.20:
- project
- html_last_updated_fmt
- latex_engine
- latex_elements
- version
- release
Change-Id: I3a5c7e115d0c4f52b015d0d55eb09c9836cd2fe7
If more than one numbered request group is present in the placement
allocation candidates (a_c) query, then group_policy is mandatory.
Based on the PTG discussion [1], 'none' seems to be a good default
policy from the nova perspective. So this patch makes sure that if
group_policy is not provided in the flavor extra_specs, there is more
than one numbered group in the request, and the flavor itself
provides only one or zero groups (i.e. the extra groups come from
other sources, like neutron ports), then group_policy is defaulted to
'none'.
The reasoning behind this change: if more than one numbered request
group comes from the flavor extra_specs, then the creator of the
flavor is responsible for adding a group_policy to the flavor. So in
that case nova only warns but lets the request fail in placement, to
force fixing the flavor. However, when numbered groups come from
other sources (like neutron ports), the creator of the flavor cannot
know whether additional groups will be included, so we don't want to
force the flavor creator to set a policy but simply default the
group_policy instead.
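For flavors that do carry multiple numbered groups, the policy can
still be set explicitly as an extra spec, for example (illustrative
command):

  openstack flavor set my-flavor --property group_policy=none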
[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005807.html
Change-Id: I0681de217ed9f5d77dae0d9555632b8d160bb179
Before I97f06d0ec34cbd75c182caaa686b8de5c777a576 it was possible to
create servers with neutron ports which had resource_request (e.g. a
port with QoS minimum bandwidth policy rule) without allocating the
requested resources in placement. So there could be servers for which
the allocation needs to be healed in placement.
This patch extends the nova-manage heal_allocations CLI to create
the missing port allocations in placement and update the port in
neutron with the resource provider uuid that is used for the
allocation.
There are known limitations of this patch. It does not try to
reimplement Placement's allocation candidate functionality, so it
cannot handle the situation where there is more than one RP in the
compute tree which provides the required traits for a port. Deciding
which RP to use in that situation would require 1) the in_tree
allocation candidate support from placement, which is not available
yet, and 2) information about which PCI PF the VF of an SRIOV port is
allocated from and which RP represents that PCI device in placement;
this information is only available on the compute hosts.
For the unsupported cases the command fails gracefully. As soon as
migration support for such servers is implemented in the blueprint
support-move-ops-with-qos-ports, the admin can heal the allocations
of such servers by migrating them.
During healing, both placement and neutron need to be updated. If
either update fails, the code tries to roll back the previous updates
for the instance to make sure that the healing can be re-run later
without issue. However, if the rollback fails, the script terminates
with an error message pointing to documentation that describes how to
manually recover from such a partially healed situation.
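The healing is invoked as before, for example (illustrative; see the
nova-manage documentation for the full set of options):

  nova-manage placement heal_allocations --verbose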
Closes-Bug: #1819923
Change-Id: I4b2b1688822eb2f0174df0c8c6c16d554781af85
Add a new microversion that adds two new parameters to the create
server request: 'host' and 'hypervisor_hostname'.
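An illustrative request body (host names are placeholders):

  POST /v2.1/servers
  {"server": {"name": "test-server",
              "imageRef": "<image uuid>",
              "flavorRef": "<flavor id>",
              "host": "compute-01",
              "hypervisor_hostname": "compute-01.example.org"}}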
Part of Blueprint: add-host-and-hypervisor-hostname-flag-to-create-server
Change-Id: I3afea20edaf738da253ede44b4a07414ededafd6
Obliterate all references to the nova-consoleauth service. This
mostly consists of removing the core service and any references to
the now-removed '[workarounds] enable_consoleauth' configuration
option.
Part of blueprint remove-consoleauth
Change-Id: I0498599fd636aa9e30df932f0d893db5efa23260
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Depends-On: Icfc175c49a1fc650d1c9ad06b77209a70c6386db
This adds a new mandatory placement request pre-filter
which is used to exclude compute node resource providers
with the COMPUTE_STATUS_DISABLED trait. The trait is
managed by the nova-compute service when the service's
disabled status changes.
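As an illustration (the exact query string is an assumption), the
resulting allocation candidates request to placement carries the
trait as forbidden:

  GET /allocation_candidates?...&required=!COMPUTE_STATUS_DISABLED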
Change I3005b46221ac3c0e559e1072131a7e4846c9867c makes
the compute service sync the trait during the
update_available_resource flow (either on start of the
compute service or during the periodic task run).
Change Ifabbb543aab62b917394eefe48126231df7cd503 makes
the libvirt driver's _set_host_enabled callback reflect
the trait when the hypervisor goes up or down out of band.
Change If32bca070185937ef83f689b7163d965a89ec10a will add
the final piece which is the os-services API calling the
compute service to add/remove the trait when a compute
service is disabled or enabled.
Since this series technically functions without the API
change, the docs and release note are added here.
Part of blueprint pre-filter-disabled-computes
Change-Id: I317cabbe49a337848325f96df79d478fd65811d9
Previously the initial call to connect to an RBD cluster via the
RADOS API could hang indefinitely if network or other environmental
issues were encountered.
When encountered during a call to update_available_resource, this
could result in the local n-cpu service reporting as UP while never
being able to break out of a subsequent RPC timeout loop, as
documented in bug #1834048.
This change adds a simple configurable timeout to be used when
initially connecting to the cluster [1][2][3]. The default timeout of
5 seconds is small enough to ensure that, if encountered, the n-cpu
service can be marked as DOWN before an RPC timeout is seen.
[1] http://docs.ceph.com/docs/luminous/rados/api/python/#rados.Rados.connect
[2] http://docs.ceph.com/docs/mimic/rados/api/python/#rados.Rados.connect
[3] http://docs.ceph.com/docs/nautilus/rados/api/python/#rados.Rados.connect
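For example, assuming the option name added by this change, the
timeout can be tuned via:

  [libvirt]
  rbd_connect_timeout = 5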
Closes-bug: #1834048
Change-Id: I67f341bf895d6cc5d503da274c089d443295199e
With all in-tree virt drivers now implementing the
update_provider_tree interface, we can deprecate the
compatibility code in the ResourceTracker. This change
simply logs a warning if the driver does not implement
the upt interface and sets the timer for removal in the
U release at the earliest.
The resource tracker unit tests will need to be cleaned
up but that can happen in a separate change so it does
not slow down this deprecation.
Change-Id: I1eae47bce08f6292d38e893a2122289bcd6f4b58
We're going to be removing the configuration option so the advice from
this check will no longer make sense.
Part of blueprint remove-consoleauth
Change-Id: I5c7e54259857d9959f5a2dfb99102602a0cf9bb7
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
- This change extends the VideoModel field object to allow 3 new
  values (virtio, gop, none).
- This change makes the libvirt driver use the ALL tuple from the
  nova.fields.VideoModel object instead of declaring a second tuple
  inline for validation.
- This change allows the virtio video model to be used for all
  architectures when explicitly requested via the hw_video_model
  image metadata property.
- This change introduces unit tests and a release note for the new
  capabilities.
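For example, the virtio model can be requested explicitly via the
image property (illustrative command):

  openstack image set --property hw_video_model=virtio <image>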
Change-Id: I2830ccfc81cfa9654cfeac7ad5effc294f523552
Implements: blueprint libvirt-video-device-models
Remove the deprecated optional argument '--version' from the
following commands.
* nova-manage db sync
* nova-manage api_db sync
Change-Id: I7795e308497de66329f288b43ecfbf978d67ad75
The --before option to 'nova-manage db purge' and
'nova-manage db archive_deleted_rows' accepts a string to be parsed
by dateutil.parser.parse() with fuzzy=True. This is fairly forgiving,
but doesn't handle e.g. "now - 1 day". This commit adds some
clarification to the help strings, and some examples to the docs.
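For example (illustrative values), an absolute date such as

  nova-manage db archive_deleted_rows --before "2015-12-31"

is accepted, while a relative expression like "now - 1 day" is not.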
Change-Id: Ib218b971784573fce16b6be4b79e0bf948371954
Since blueprint return-alternate-hosts in Queens, the scheduler
returns a primary selected host and some alternate hosts based on the
max_attempts config option. The only reschedules we have are during
server create and resize/cold migrate. The list of alternate hosts is
passed down from conductor through compute and back to conductor on
reschedule, and if conductor gets a list of alternate hosts on
reschedule it will not call the scheduler again. This means the
RetryFilter is effectively useless now: it should never filter out
hosts on the first schedule attempt, and because we're using
alternates for reschedules we shouldn't go back to the scheduler on a
reschedule. As a result, this change deprecates the RetryFilter and
removes it from the default list of enabled filters.
Change-Id: Ic0a03e89903bf925638fa26cca3dac7db710dca3
Starting in noVNC v1.1.0, the token query parameter is no longer
forwarded via cookie [1]. We must instead use the 'path' query
parameter to pass the token through to the websocketproxy [2].
This means that if someone deploys noVNC v1.1.0, VNC consoles will
break in nova because the code is relying on the cookie functionality
that v1.1.0 removed.
This modifies the ConsoleAuthToken.access_url property to include the
'path' query parameter as part of the returned access_url that the
client will use to call the console proxy service.
This change is backward compatible with noVNC < v1.1.0. The 'path' query
parameter is a long-supported feature in noVNC.
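Illustratively (host, port and page are placeholders), an access_url
that used to look like

  http://novnc.example.org:6080/vnc_auto.html?token=<token>

now looks like

  http://novnc.example.org:6080/vnc_auto.html?path=%3Ftoken%3D<token>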
Co-Authored-By: melanie witt <melwittt@gmail.com>
Closes-Bug: #1822676
[1] 51f9f0098d
[2] https://github.com/novnc/noVNC/pull/1220
Change-Id: I2ddf0f4d768b698e980594dd67206464a9cea37b
If enable_dhcp is set on a subnet but, for some reason, neutron does
not have any DHCP port yet, we still want the network_info to be
populated with a valid dhcp_server value. This is mostly useful for
the metadata API (which relies on this value to give network_data to
the instance).
This also helps providers which use external DHCP servers not handled
by neutron; in that case, neutron will never create any DHCP port in
the subnet.
Also note that we cannot set the value to None because then the value
would be discarded by the metadata API, so the subnet gateway is used
as a fallback.
Change-Id: Ie2cd54c159ea693e48e00d0ca3b0ca5a468d79cb
Signed-off-by: Arnaud Morin <arnaud.morin@corp.ovh.com>
Change I7b8622b178d5043ed1556d7bdceaf60f47e5ac80 started deleting the
compute node resource provider associated with a compute node when
deleting a nova-compute service. However, it would only delete the
first compute node associated with the service which means for an
ironic compute service that is managing multiple nodes, the resource
providers were not cleaned up in placement. This fixes the issue by
iterating all the compute nodes and cleaning up their providers.
Note this could potentially be a lot of nodes, but we don't really
have a better option than to iterate them and clean them up one at a
time.
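Roughly, the shape of the fix is the following (a sketch, not the
exact nova code; the helper name is illustrative):

  def _cleanup_compute_providers(report_client, context, compute_nodes):
      # Clean up the resource provider of every compute node tied to
      # the deleted service, not just the first one.
      for compute_node in compute_nodes:
          report_client.delete_resource_provider(
              context, compute_node, cascade=True)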
Note that this is best-effort. Because the
SchedulerReportClient.delete_resource_provider method ignores
ResourceProviderInUse errors, and because there could be stale
allocations on the host that delete_resource_provider does not
account for, namely allocations from evacuated instances (or
incomplete migrations, though you can't migrate baremetal instances
today), we could still delete the compute service and orphan those
in-use providers. That, however, is no worse than before this change,
where we did not try to clean up all providers. The issue described
above is being tracked with bug 1829479 and will be dealt with
separately.
Change-Id: I9e852e25ea89f32bf19cdaeb1f5dac8f749f5dbc
Closes-Bug: #1811726
This was called out in change Ib0e0b708c46e4330e51f8f8fdfbb02d45aaf0f44.
Change-Id: I1dcc1bb072c0f98deb42841753e3474ac510cce5
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
The default config value `both` means that both the legacy and the
versioned notifications are emitted. This was selected as the default
in the past, when we thought it would help adoption of the versioned
interface while we worked on bringing the new interface to feature
parity with the legacy one. Even though the versioned notification
interface has been at feature parity with the legacy interface since
Stein, the projects consuming nova notifications do not have the
resources to switch to the new interface.
On the other hand, having `both` as the default in an environment
where only the legacy notifications are consumed causes performance
issues on the message bus, hence bug #1805659.
The original plan was to set the default to `versioned` when the
interface reached feature parity, but as the major consumers are not
ready to switch, we cannot do that.
So the only option left is to set the default to `unversioned`.
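Deployments that do consume versioned notifications can simply opt
back in, for example:

  [notifications]
  notification_format = versioned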
Related devstack patch: https://review.opendev.org/#/c/662849/
Closes-Bug: #1805659
Change-Id: I72faa356afffb7a079a9ce86fed1b463773a0507
Blueprints hide-hypervisor-id-flavor-extra-spec [1] and
add-kvm-hidden-feature [2] allow hiding KVM's signature for guests,
which is necessary for Nvidia drivers to work in VMs with passthrough
GPUs. While this works well for Linux guests on KVM, it doesn't work
for Windows guests.
For them, KVM emulates some HyperV features. With the
current implementation, KVM's signature is hidden, but HyperV's is not,
and Nvidia drivers don't work in Windows VMs.
This change generates an extra element in the libvirt XML for
Windows guests on KVM which obfuscates HyperV's signature too,
controlled by the existing image and flavor parameters
(img_hide_hypervisor_id and hide_hypervisor_id respectively). The
extra XML element is
<vendor_id state='on' value='1234567890ab'/>
in features/hyperv.
[1] https://blueprints.launchpad.net/nova/+spec/hide-hypervisor-id-flavor-extra-spec
[2] https://blueprints.launchpad.net/nova/+spec/add-kvm-hidden-feature
Change-Id: Iaaeae9281301f14f4ae9b43f4a06de58b699fd68
Closes-Bug: 1779845
This is no longer used anywhere and can therefore be safely removed.
Part of blueprint remove-cells-v1
Change-Id: I16b6d428accabf9dd7692909084faaf426e13524
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
This counts instance mappings for counting quota usage for instances
and adds calls to placement for counting quota usage for cores and
ram. During an upgrade, if any un-migrated instance mappings are
found (with NULL user_id or NULL queued_for_delete fields), we will
fall back to the legacy counting method.
Counting quota usage from placement is opt-in via the
[quota]count_usage_from_placement configuration option (see the
example after the list below) because:
* Though beneficial for multi-cell deployments to be resilient to
down cells, the vast majority of deployments are single cell and
will not be able to realize a down cells resiliency benefit and may
prefer to keep legacy quota usage counting.
* Usage for resizes will reflect resources being held on both the
source and destination until the resize is confirmed or reverted.
Operators may not want to enable counting from placement based on
whether the behavior change is problematic for them.
* Placement does not yet support the ability to partition resource
providers from multiple Nova deployments, so environments that are
sharing a single placement deployment would see usage that
aggregates all Nova deployments together. Such environments should
not enable counting from placement.
* Usage for unscheduled instances in ERROR state will not reflect
resource consumption for cores and ram because the instance has no
placement allocations.
* Usage for instances in SHELVED_OFFLOADED state will not reflect
resource consumption for cores and ram because the instance has no
placement allocations. Note that because of this, it will be possible for a
request to unshelve a server to be rejected if the user does not have
enough quota available to support the cores and ram needed by the server to
be unshelved.
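As referenced above, the opt-in looks like the following
(illustrative snippet):

  [quota]
  count_usage_from_placement = True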
Part of blueprint count-quota-usage-from-placement
Change-Id: Ie22b0acb5824a41da327abdcf9848d02fc9a92f5
Add a parameter to limit the archival of deleted rows by date. That
is, only rows related to instances deleted before the provided date
will be archived.
This option works together with --max_rows; if both are specified,
both will take effect.
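For example (illustrative values):

  nova-manage db archive_deleted_rows --before "2018-06-01" --max_rows 1000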
Closes-Bug: #1751192
Change-Id: I408c22d8eada0518ec5d685213f250e8e3dae76e
Implements: blueprint nova-archive-before
If we're swapping from a multiattach volume that has more than one
read/write attachment, another server on the secondary attachment
could be writing to the volume, and those writes would not be copied
into the volume to which we're swapping, so we could lose data during
the swap.
This change counts the read/write attachments on the volume we're
swapping from, and if there is more than one read/write attachment on
the volume, the swap volume operation fails with a 400 BadRequest
error.
Depends-On: https://review.openstack.org/573025/
Closes-Bug: #1775418
Change-Id: Icd7fcb87a09c35a13e4e14235feb30a289d22778
Ceph doesn't support QCOW2 for hosting a virtual machine
disk:
http://docs.ceph.com/docs/master/rbd/rbd-openstack/
When image_type is set to rbd and force_raw_images to False, and we
don't launch an instance with boot-from-volume, the instance is
spawned using qcow2 as the root disk but fails to boot because the
data is accessed as raw.
To fix this, we raise an error and refuse to start the nova-compute
service when force_raw_images and image_type are incompatible.
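With the libvirt driver this corresponds to a combination like the
following, which is now refused at service startup (shown as an
illustration, using the standard option names):

  [DEFAULT]
  force_raw_images = False

  [libvirt]
  images_type = rbd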
When importing an image into rbd, we also check the format of the
cached image. If the format is not raw, we remove it first and fetch
the image again, so that it is cached in raw format.
Change-Id: I1aa471e8df69fbb6f5d9aeb35651bd32c7123d78
Closes-Bug: 1816686