The combined fixes for the two related bugs resolve the problem where
SIGHUP breaks the nova-compute service. Bump the minimum requirements
for oslo.privsep and oslo.service to make sure these fixes are in place,
and add a reno to advertise resolution of the issue.
This also bumps oslo.utils to match the lower constraint from
oslo.service.
Change-Id: I39ead744b21a4423352a88573f327273e4d09630
Related-Bug: #1794708
Related-Bug: #1715374
This adds support, in a new microversion, for specifying an availability
zone to the unshelve server action when the server is shelved offloaded.
Note that the functional test changes are due to those tests using the
"latest" microversion, where an empty dict is no longer allowed for
unshelve with 2.77, so the value is changed from an empty dict to None.
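For illustration, a minimal sketch of an unshelve request under the new
microversion (2.77); the endpoint, token handling and UUID below are
placeholder assumptions, not part of this change:

    import requests

    compute_url = "http://controller:8774/v2.1"          # assumed endpoint
    server_id = "11111111-2222-3333-4444-555555555555"   # placeholder UUID
    headers = {
        "X-Auth-Token": "<token>",                        # assumed token
        "OpenStack-API-Version": "compute 2.77",
    }
    # Under 2.77 the unshelve action accepts an availability_zone for a
    # shelved offloaded server.
    body = {"unshelve": {"availability_zone": "az1"}}
    resp = requests.post("%s/servers/%s/action" % (compute_url, server_id),
                         json=body, headers=headers)
    resp.raise_for_status()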
Implements: blueprint support-specifying-az-when-restore-shelved-server
Closes-Bug: #1723880
Change-Id: I4b13483eef42bed91d69eabf1f30762d6866f957
The archive_deleted_rows command depends on the DB connection config from
the config file, and when deploying in superconductor mode there are
several config files for the different cells. In that case the command
can only archive rows in the cell0 DB, as it only reads nova.conf.
This patch adds an --all-cells parameter to the command, which reads the
info for all cells from the api_db and then archives rows across all
cells.
The --all-cells parameter is passed on to the purge command when
archive_deleted_rows is called with both --all-cells and --purge.
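Roughly, the intended flow looks like the sketch below; the archive and
purge helpers are hypothetical stand-ins for the real command internals:

    from nova import context as nova_context
    from nova import objects

    def archive_all_cells(max_rows, purge=False):
        # Read the cell mappings from the API DB instead of relying on a
        # single nova.conf DB connection.
        ctxt = nova_context.get_admin_context()
        for cell in objects.CellMappingList.get_all(ctxt):
            with nova_context.target_cell(ctxt, cell) as cctxt:
                archive_cell_rows(cctxt, max_rows)   # hypothetical helper
                if purge:
                    purge_cell_rows(cctxt)           # hypothetical helper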
Co-Authored-By: melanie witt <melwittt@gmail.com>
Change-Id: Id16c3d91d9ce5db9ffd125b59fffbfedf4a6843d
Closes-Bug: #1719487
If any nova-manage command fails in an unexpected way and
it bubbles back up to main() the return code will be 1.
There are some commands like archive_deleted_rows,
map_instances and heal_allocations which return 1 for flow
control with automation systems. As a result, those tools
could be calling the command repeatedly, getting rc=1 and thinking
there is more work to do when really something is failing.
This change makes the unexpected error code 255, updates the
relevant nova-manage command docs that already mention return
codes in some kind of list/table format, and adds an upgrade
release note just to cover our bases in case someone was for
some weird reason relying on 1 specifically for failures rather
than anything greater than 0.
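A minimal sketch of the new behavior (the command dispatcher name is a
hypothetical placeholder):

    import sys

    def main():
        try:
            # Commands may return 0, 1, 2, ... for flow control.
            return run_command(sys.argv[1:])   # hypothetical dispatcher
        except Exception:
            # Unexpected failure: return 255 instead of 1 so automation
            # does not mistake it for "more work to do".
            return 255

    if __name__ == '__main__':
        sys.exit(main())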
Change-Id: I2937c9ef00f1d1699427f9904cb86fe2f03d9205
Closes-Bug: #1840978
The url option was deprecated in Queens:
I41724a612a5f3eabd504f3eaa9d2f9d141ca3f69
The same functionality is available in the
endpoint_override option so tests and docs
are updated to use that where they were using
url before.
Note that because the logic in the get_client
method changed, some small changes were made to
the test_withtoken and test_withtoken_context_is_admin
unit tests to differentiate between a context with a
token that is not admin and an admin context that does
not have a token, which was otherwise determined by
asserting the default region name.
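For reference, a generic keystoneauth1 sketch of pointing a client at a
specific endpoint with endpoint_override; the auth values and URLs are
placeholder assumptions:

    from keystoneauth1 import adapter, loading, session

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://keystone.example.com:5000/v3',
        username='nova', password='secret',
        project_name='service', user_domain_id='default',
        project_domain_id='default')
    sess = session.Session(auth=auth)
    # endpoint_override replaces the deprecated url option.
    client = adapter.Adapter(
        session=sess, service_type='network',
        endpoint_override='http://neutron.example.com:9696')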
Change-Id: I6c068a84c4c0bd88f088f9328d7897bfc1f843f1
This change adds the ability for a user or operator to control
the virtualisation of a performance monitoring unit within a VM.
This change introduces a new "hw:pmu" extra spec and a corresponding
image metadata property "hw_pmu".
The glance image metadata doc will be updated separately by:
https://review.opendev.org/#/c/675182
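As a quick illustration (the boolean values shown are assumptions), the
two ways to request the PMU:

    # Flavor-level request via the new extra spec.
    flavor_extra_specs = {'hw:pmu': 'true'}
    # Image-level request via the corresponding image metadata property.
    image_properties = {'hw_pmu': 'true'}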
Change-Id: I5576fa2a67d2771614266022428b4a95487ab6d5
Implements: blueprint libvirt-pmu-configuration
This patch adds a new external event called "power-update"
through which ironic will convey all (power_off and power_on)
power state changes (running -> shutdown or shutdown -> running
will be the only ones handled by nova and the rest will be ignored)
on a physical instance to nova. The database will be updated
accordingly to reflect the real vm_state and power_state of the
instance. This way nova will not enforce an incorrect power state on
the physical instance during the periodic "sync_power_states" task.
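For illustration, a sketch of the event payload ironic would send to the
os-server-external-events API; the tag values and UUID here are
assumptions, not taken from this patch:

    # POST /os-server-external-events
    event_body = {
        "events": [{
            "name": "power-update",
            "tag": "POWER_OFF",   # or "POWER_ON"; other states are ignored
            "server_uuid": "11111111-2222-3333-4444-555555555555",
        }]
    }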
Implements blueprint nova-support-instance-power-update
Story: 2004969
Task: 29423
Change-Id: I2b292050cc3ce5ef625659f5a1fe56bb76072496
This patch removes the legacy code for image checksumming
as well as configuration values that are no longer being
used.
Change-Id: I9c552e33456bb862688beaabe69f2b72bb8ebcce
This microversion implements the following API cleanups:
1. Return 400 for unknown parameters in the query string and in the
request body.
2. Make the server representation always consistent among all APIs
returning the complete server representation.
3. Change the default return value of the ``swap`` field from the empty
string to 0 (integer) in the flavor APIs (illustrated below).
4. Always return the ``servers`` field in the response of the GET
hypervisors APIs, even when there are no servers on a hypervisor.
Details: https://specs.openstack.org/openstack/nova-specs/specs/train/approved/api-consistency-cleanup.html
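For example, cleanup 3 changes the flavor representation roughly as
sketched below (other flavor fields omitted):

    # Before this microversion, a flavor with no swap configured:
    old_flavor = {'name': 'm1.tiny', 'swap': ''}
    # With this microversion:
    new_flavor = {'name': 'm1.tiny', 'swap': 0}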
Partial-Implements: blueprint api-consistency-cleanup
Change-Id: I9d257a003d315b84b937dcef91f3cb41f3e24b53
The server fault "message" is always shown in the API
server response, regardless of policy or user role.
The fault "details" are only shown to users with the
admin role when the fault code is 500.
The problem with this is that, for non-nova exceptions, the
fault message is a string-ified version of the exception
(see nova.compute.utils.exception_to_dict), which can
contain sensitive information that the non-admin owner
of the server can see.
This change adds a functional test to recreate the issue
and a change to exception_to_dict which for the non-nova
case changes the fault message by simply storing the
exception type class name. Admins can still see the fault
traceback in the "details" key of the fault dict in the
server API response. Note that _get_fault_details is
changed so that the details also includes the exception
value which is what used to be in the fault message for
non-nova exceptions. This is necessary so admins can still
get the exception message with the traceback details.
Note that nova exceptions with a %(reason)s replacement
variable could potentially be leaking sensitive details as
well but those would need to be cleaned up on a case-by-case
basis since we don't want to change the behavior of all
fault messages otherwise users might not see information
like NoValidHost when their server goes to ERROR status
during scheduling.
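A hedged sketch of the resulting behavior (the helper names below are
hypothetical, not the actual patch):

    import traceback

    def exception_to_fault(exc):
        if is_nova_exception(exc):   # hypothetical NovaException check
            message = str(exc)
        else:
            # Non-nova exceptions may embed sensitive data, so only the
            # exception type name is exposed as the fault message.
            message = exc.__class__.__name__
        # Admins still get the exception value with the traceback in the
        # fault details.
        details = '%s\n%s' % (exc, ''.join(traceback.format_exception(
            type(exc), exc, exc.__traceback__)))
        return {'message': message, 'details': details}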
SecurityImpact: This change contains a fix for CVE-2019-14433.
Change-Id: I5e0a43ec59341c9ac62f89105ddf82c4a014df81
Closes-Bug: #1837877
These were deprecated during Stein [1] and can now be removed, lest they
cause hassle with the PCPU work. As noted in [1], the aggregate
equivalents of same are left untouched for now.
[1] https://review.opendev.org/#/c/596502/
Change-Id: I8a0d332877fbb9794700081e7954f2501b7e7c09
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
If more than one numbered request group is in the placement a_c query
then the group_policy is mandatory. Based on the PTG discussion [1]
'none' seems to be a good default policy from the nova perspective. So
this patch makes sure that if the group_policy is not provided in the
flavor extra_spec, there is more than one numbered group in the request,
and the flavor provides only one or zero groups (so groups are coming
from other sources like neutron ports), then the group_policy is
defaulted to 'none'.
The reasoning behind this change: if more than one numbered request
group is coming from the flavor extra_spec, then the creator of the
flavor is responsible for adding a group_policy to the flavor. So in
this case nova only warns but lets the request fail in placement to
force the fixing of the flavor. However, when numbered groups are
coming from other sources (like neutron ports), the creator of the
flavor cannot know whether additional groups will be included, so we
don't want to force the flavor creator but simply default the
group_policy.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005807.html
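As an illustration, the kind of allocation candidates query this results
in when the numbered groups come from ports only; the resource classes,
traits and amounts are illustrative assumptions:

    # GET /allocation_candidates query parameters (illustrative values).
    query = {
        'resources': 'VCPU:2,MEMORY_MB:4096,DISK_GB:20',
        'resources1': 'NET_BW_EGR_KILOBIT_PER_SEC:1000',
        'required1': 'CUSTOM_PHYSNET_PHYSNET0,CUSTOM_VNIC_TYPE_NORMAL',
        'resources2': 'NET_BW_IGR_KILOBIT_PER_SEC:2000',
        'required2': 'CUSTOM_PHYSNET_PHYSNET0,CUSTOM_VNIC_TYPE_NORMAL',
        'group_policy': 'none',   # defaulted by this patch
    }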
Change-Id: I0681de217ed9f5d77dae0d9555632b8d160bb179
Before I97f06d0ec34cbd75c182caaa686b8de5c777a576 it was possible to
create servers with neutron ports which had resource_request (e.g. a
port with QoS minimum bandwidth policy rule) without allocating the
requested resources in placement. So there could be servers for which
the allocation needs to be healed in placement.
This patch extends the nova-manage heal_allocation CLI to create the
missing port allocations in placement and update the port in neutron
with the resource provider uuid that is used for the allocation.
There are known limitations of this patch. It does not try to reimplement
Placement's allocation candidate functionality. Therefore it cannot
handle the situation when there is more than one RP in the compute
tree which provides the required traits for a port. In this situation
deciding which RP to use would require 1) the in_tree allocation
candidate support from placement, which is not available yet, and
2) information about which PCI PF the VF of an SRIOV port is allocated
from and which RP represents that PCI device in placement. This
information is only available on the compute hosts.
For the unsupported cases the command will fail gracefully. As soon as
migration support for such servers is implemented in the blueprint
support-move-ops-with-qos-ports the admin can heal the allocation of
such servers by migrating them.
During healing both placement and neutron need to be updated. If any of
those updates fail the code tries to roll back the previous updates for
the instance to make sure that the healing can be re-run later without
issue. However if the rollback fails then the script will terminate with
an error message pointing to documentation that describes how to
recover from such a partially healed situation manually.
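A rough sketch of that heal-then-rollback flow (all helper names are
hypothetical stand-ins for the real command internals):

    def heal_port_allocations(instance):
        # Update placement first, then neutron; roll back placement if
        # the neutron update fails so healing can be re-run later.
        placed = update_placement_allocation(instance)   # hypothetical
        try:
            update_neutron_ports(instance)               # hypothetical
        except Exception:
            try:
                rollback_placement_allocation(placed)    # hypothetical
            except Exception:
                raise SystemExit(
                    'Rollback failed; follow the manual recovery docs '
                    'for partially healed instances.')
            raise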
Closes-Bug: #1819923
Change-Id: I4b2b1688822eb2f0174df0c8c6c16d554781af85
Add a new microversion that adds two new parameters, 'host' and
'hypervisor_hostname', to the create server API.
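For illustration, a create server request body using the new parameters;
the surrounding fields are placeholders:

    server_body = {
        "server": {
            "name": "pinned-server",
            "imageRef": "<image-uuid>",
            "flavorRef": "<flavor-id>",
            "networks": "auto",
            "host": "compute-01",
            "hypervisor_hostname": "compute-01.example.com",
        }
    }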
Part of Blueprint: add-host-and-hypervisor-hostname-flag-to-create-server
Change-Id: I3afea20edaf738da253ede44b4a07414ededafd6
Obliterate all references to the aforementioned service. This mostly
consists of removing the core service and any references to the now
removed '[workarounds] enable_consoleauth' configuration option.
Part of blueprint remove-consoleauth
Change-Id: I0498599fd636aa9e30df932f0d893db5efa23260
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Depends-On: Icfc175c49a1fc650d1c9ad06b77209a70c6386db
This adds a new mandatory placement request pre-filter
which is used to exclude compute node resource providers
with the COMPUTE_STATUS_DISABLED trait. The trait is
managed by the nova-compute service when the service's
disabled status changes.
Change I3005b46221ac3c0e559e1072131a7e4846c9867c makes
the compute service sync the trait during the
update_available_resource flow (either on start of the
compute service or during the periodic task run).
Change Ifabbb543aab62b917394eefe48126231df7cd503 makes
the libvirt driver's _set_host_enabled callback reflect
the trait when the hypervisor goes up or down out of band.
Change If32bca070185937ef83f689b7163d965a89ec10a will add
the final piece which is the os-services API calling the
compute service to add/remove the trait when a compute
service is disabled or enabled.
Since this series technically functions without the API
change, the docs and release note are added here.
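In effect, the pre-filter adds a forbidden trait to every allocation
candidates request, roughly as sketched below (the resource values are
illustrative):

    # GET /allocation_candidates query parameters.
    query = {
        'resources': 'VCPU:2,MEMORY_MB:4096,DISK_GB:20',
        # Exclude providers (compute nodes) with the disabled trait.
        'required': '!COMPUTE_STATUS_DISABLED',
    }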
Part of blueprint pre-filter-disabled-computes
Change-Id: I317cabbe49a337848325f96df79d478fd65811d9
Previously the initial call to connect to an RBD cluster via the RADOS
API could hang indefinitely if network or other environment-related
issues were encountered.
When encountered during a call to update_available_resource this can
result in the local n-cpu service reporting as UP while never being able
to break out of a subsequent RPC timeout loop, as documented in bug
#1834048.
This change adds a simple timeout configurable to be used when initially
connecting to the cluster [1][2][3]. The default timeout of 5 seconds is
sufficiently small to ensure that, if it is hit, the n-cpu service can
be marked as DOWN before an RPC timeout is seen.
[1] http://docs.ceph.com/docs/luminous/rados/api/python/#rados.Rados.connect
[2] http://docs.ceph.com/docs/mimic/rados/api/python/#rados.Rados.connect
[3] http://docs.ceph.com/docs/nautilus/rados/api/python/#rados.Rados.connect
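For reference, a minimal example of a bounded initial connection using
the rados API linked above; the conffile path and client name are
assumptions:

    import rados

    client = rados.Rados(conffile='/etc/ceph/ceph.conf',
                         rados_id='openstack')   # assumed client name
    try:
        # Bound the initial connection instead of hanging indefinitely.
        client.connect(timeout=5)
    except rados.Error:
        client.shutdown()
        raise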
Closes-bug: #1834048
Change-Id: I67f341bf895d6cc5d503da274c089d443295199e
With all in-tree virt drivers now implementing the
update_provider_tree interface, we can deprecate the
compatibility code in the ResourceTracker. This change
simply logs a warning if the driver does not implement
the upt interface and sets the timer for removal in the
U release at the earliest.
The resource tracker unit tests will need to be cleaned
up but that can happen in a separate change so it does
not slow down this deprecation.
Change-Id: I1eae47bce08f6292d38e893a2122289bcd6f4b58
We're going to be removing the configuration option so the advice from
this check will no longer make sense.
Part of blueprint remove-consoleauth
Change-Id: I5c7e54259857d9959f5a2dfb99102602a0cf9bb7
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
- This change extends the VideoModel field object to allow 3 new values
(virtio, gop, none)
- This change makes the libvirt driver use ALL tuple from the
nova.fields.VideoModel object instead of declaring a second
tuple inline for validation.
- This change allows the virtio video model to now be used
for all architectures when explicitly requested via the
hw_video_model image metadata property
- This change introduces unit tests and a release note
for the new capabilities.
Change-Id: I2830ccfc81cfa9654cfeac7ad5effc294f523552
Implements: blueprint libvirt-video-device-models
Remove deprecated optional argument '--version'
in the following commands.
* nova-manage db sync
* nova-manage api_db sync
Change-Id: I7795e308497de66329f288b43ecfbf978d67ad75
Currently, the reporting of bytes available works well for recommended
Ceph deployments that run one OSD per disk [1]. However, for users who
are running multiple OSDs on a single disk, the current reporting will
reflect bytes available * number of replicas.
We can enhance the bytes available reporting method to accommodate
unrecommended Ceph deployments by using the MAX_AVAIL stat obtainable
via the 'ceph df' command. The MAX_AVAIL stat takes the number of
configured replicas into consideration and will reflect the correct
number of bytes available even when Ceph is deployed in a way the
documentation recommends against.
For most users, this change should make no difference. It will only be
a help for users who are running unrecommended Ceph deployments.
[1] http://docs.ceph.com/docs/luminous/start/hardware-recommendations/#hard-disk-drives
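For reference, a sketch of reading MAX_AVAIL for a pool from the
'ceph df' JSON output via the RADOS python bindings; the conffile path
and pool name are assumptions:

    import json
    import rados

    client = rados.Rados(conffile='/etc/ceph/ceph.conf')
    client.connect()
    try:
        ret, outbuf, _ = client.mon_command(
            json.dumps({'prefix': 'df', 'format': 'json'}), b'')
        stats = json.loads(outbuf)
        # MAX_AVAIL already accounts for the configured replica count.
        max_avail = next(pool['stats']['max_avail']
                         for pool in stats['pools']
                         if pool['name'] == 'vms')   # assumed pool name
    finally:
        client.shutdown()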
Change-Id: I96faff6d3b9747514441d83c629fdd1cface1eb5
The --before option to nova-manage db purge and archive_deleted_rows
accepts a string to be parsed by dateutil.parser.parse() with
fuzzy=True. This is fairly forgiving, but doesn't handle e.g. "now - 1
day". This commit adds some clarification to the help strings, and some
examples to the docs.
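For example, strings like the following parse fine, while relative
expressions do not:

    from dateutil import parser

    parser.parse('2015-03-13', fuzzy=True)        # accepted
    parser.parse('March 13, 2015', fuzzy=True)    # accepted
    # parser.parse('now - 1 day', fuzzy=True)     # not handled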
Change-Id: Ib218b971784573fce16b6be4b79e0bf948371954
Since blueprint return-alternate-hosts in Queens, the scheduler
returns a primary selected host and some alternate hosts based
on the max_attempts config option. The only reschedules we have
are during server create and resize/cold migrate. The list of
alternate hosts is passed down from conductor through compute and back
to conductor on reschedule, and if conductor gets a list of alternate
hosts on reschedule it will not call the scheduler
again. This means the RetryFilter is effectively useless now since
it shouldn't ever filter out hosts on the first schedule attempt
and because we're using alternates for reschedules, we shouldn't
go back to the scheduler on a reschedule. As a result this change
deprecates the RetryFilter and removes it from the default list
of enabled filters.
Change-Id: Ic0a03e89903bf925638fa26cca3dac7db710dca3
Starting in noVNC v1.1.0, the token query parameter is no longer
forwarded via cookie [1]. We must instead use the 'path' query
parameter to pass the token through to the websocketproxy [2].
This means that if someone deploys noVNC v1.1.0, VNC consoles will
break in nova because the code is relying on the cookie functionality
that v1.1.0 removed.
This modifies the ConsoleAuthToken.access_url property to include the
'path' query parameter as part of the returned access_url that the
client will use to call the console proxy service.
This change is backward compatible with noVNC < v1.1.0. The 'path' query
parameter is a long-supported feature in noVNC.
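For illustration, the shape of the resulting access_url; the host, port,
page and token below are placeholders:

    from urllib import parse

    base_url = 'http://novncproxy.example.com:6080/vnc_auto.html'
    token = '<console-auth-token>'
    access_url = '%s?path=%s' % (
        base_url, parse.quote('?token=%s' % token, safe=''))
    # -> http://novncproxy.example.com:6080/vnc_auto.html?path=%3Ftoken%3D...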
Co-Authored-By: melanie witt <melwittt@gmail.com>
Closes-Bug: #1822676
[1] 51f9f0098d
[2] https://github.com/novnc/noVNC/pull/1220
Change-Id: I2ddf0f4d768b698e980594dd67206464a9cea37b
If enable_dhcp is set on a subnet but, for some reason, neutron does
not have any DHCP port yet, we still want the network_info to be
populated with a valid dhcp_server value. This is mostly useful for the
metadata API (which relies on this value to give network_data to the
instance).
This will also help some providers which are using external
DHCP servers not handled by neutron.
In this case, neutron will never create any DHCP port in the
subnet.
Also note that we cannot set the value to None because then the value
would be discarded by the metadata API, so the subnet gateway will be
used as a fallback.
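A small illustration of the fallback (the dict shape is simplified and
not the actual network model objects):

    subnet_info = {
        'cidr': '192.0.2.0/24',
        'gateway': '192.0.2.1',
        'enable_dhcp': True,
        # No neutron DHCP port exists, so fall back to the gateway
        # rather than leaving dhcp_server unset (None would be discarded
        # by the metadata API).
        'dhcp_server': '192.0.2.1',
    }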
Change-Id: Ie2cd54c159ea693e48e00d0ca3b0ca5a468d79cb
Signed-off-by: Arnaud Morin <arnaud.morin@corp.ovh.com>
Change I7b8622b178d5043ed1556d7bdceaf60f47e5ac80 started deleting the
compute node resource provider associated with a compute node when
deleting a nova-compute service. However, it would only delete the
first compute node associated with the service which means for an
ironic compute service that is managing multiple nodes, the resource
providers were not cleaned up in placement. This fixes the issue by
iterating all the compute nodes and cleaning up their providers.
Note this could potentially be a lot of nodes, but we don't really
have any good option here other than to iterate them and clean them up
one at a time.
Note that this is best-effort: because the
SchedulerReportClient.delete_resource_provider method ignores
ResourceProviderInUse errors, and because we could have stale
allocations on the host for which delete_resource_provider is not
accounting, namely allocations from evacuated instances (or incomplete
migrations, though you can't migrate baremetal instances today), we
could still delete the compute service and orphan those in-use
providers. That, however, is no worse than before this change, where we
did not try to clean up all providers. The issue described above is
being tracked with bug 1829479 and will be dealt with separately.
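Roughly, the cleanup now looks like the sketch below (object access is
simplified and the node lookup helper is hypothetical):

    def delete_compute_service(context, service, reportclient):
        # Iterate every compute node the service manages (ironic may
        # have many), not just the first one.
        for node in get_nodes_for_service(context, service):  # hypothetical
            # Best-effort: in-use providers are skipped by
            # delete_resource_provider (see bug 1829479).
            reportclient.delete_resource_provider(
                context, node, cascade=True)
        service.destroy()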
Change-Id: I9e852e25ea89f32bf19cdaeb1f5dac8f749f5dbc
Closes-Bug: #1811726
This was called out in change Ib0e0b708c46e4330e51f8f8fdfbb02d45aaf0f44.
Change-Id: I1dcc1bb072c0f98deb42841753e3474ac510cce5
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>