This patch finishes removing the 'check_attach' call from Nova
entirely. As Cinder already performs the required checks as part
of the 'reserve_volume' (os-reserve) call, it is unnecessary to check the
state machine in Nova as well, and doing so can lead to race conditions.
The missing 'reserve_volume' call is added to the BFV flow. In case of
a build failure the volume will be locked in the 'attaching' state until
the instance in ERROR state is cleaned up.
We also check the AZ for each volume attach operation, which we
previously did not do for unshelve. A release note is added advising
users to enable 'cross_az_attach' if they do not care about AZs.
The compute service version had to be bumped as the old computes still
perform 'check_attach', which will fail when the API reserves the
volume and the volume state moves to 'attaching'. If the computes
are not new enough, the old check will be called instead of
'reserve_volume'.
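The ordering above can be sketched with a toy volume state machine. All names here are illustrative stand-ins, not Nova's or Cinder's actual APIs; the real flow lives in the compute API and uses Cinder's os-reserve action:

```python
# Toy sketch of the boot-from-volume reservation flow described above.
# Class and function names are illustrative, not Nova's actual API.

class FakeVolume:
    def __init__(self):
        self.status = 'available'

class FakeCinder:
    """Stands in for the Cinder volume API."""
    def reserve_volume(self, volume):
        # Cinder itself validates the state machine (os-reserve);
        # Nova no longer duplicates this check via check_attach.
        if volume.status != 'available':
            raise ValueError('InvalidVolume: status must be available')
        volume.status = 'attaching'

    def unreserve_volume(self, volume):
        volume.status = 'available'

def boot_from_volume(cinder, volume, computes_are_new_enough):
    # Only reserve in the API if all computes understand the new flow;
    # old computes still call check_attach and would fail on a volume
    # that is already 'attaching'.
    if computes_are_new_enough:
        cinder.reserve_volume(volume)
    try:
        pass  # ... build the instance ...
    except Exception:
        # On build failure the volume stays 'attaching' until the
        # ERROR instance is cleaned up, which unreserves it.
        cinder.unreserve_volume(volume)
        raise
```

Reserving twice fails, which is exactly the race the duplicated Nova-side check could trigger against old computes.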
Closes-Bug: #1581230
Change-Id: I3a3caa4c566ecc132aa2699f8c7e5987bbcc863a
In Ocata, the filter scheduler would not consult placement until all of
the computes had been upgraded. That check no longer makes sense in Pike
and isn't multi-cell-aware anyway. This removes that check.
Change-Id: Ia1a0066dc30025c02553584a077365b28d8ff80e
There are two in-tree options for the xenserver.vif_driver,
the bridge driver and the ovs driver. The XenAPI subteam has
confirmed that the bridge driver is for nova-network (which is
deprecated) and the ovs driver is for Neutron, and that's how
things are tested in CI.
Since we changed the default on use_neutron to be True for Ocata
we need to change the default on the vif_driver to be the ovs
driver so it works with the default config, which is Neutron.
We are deprecating the option, though, since we can use the use_neutron
option to decide which vif driver to load, which will make
deploying and configuring nova with Xen as the backend simpler.
Change-Id: I599f3449f18d2821403961fb9d52e9a14dd3366b
This change fixes a few things with the recently added
"os_interface" option in the [placement] config group.
1. It adds tests for the scheduler report client that
were missing in the original change that added the
config option.
2. It uses the option in the "nova-status upgrade check"
command so it is consistent with how the scheduler
report client uses it.
3. It removes the restrictive choices list from the
config option definition. keystoneauth1 allows an
"auth" value for the endpoint interface, which means
the endpoint is not looked up in the service catalog
but is instead read from the "auth_url" config
option. Also, the Keystone v3 API performs strict
validation of the endpoint interface when creating
an endpoint record. The list of supported interfaces
may change over time, so we shouldn't encode that
list within Nova.
4. As part of removing the choices, the release note
associated with the new option is updated and changed
from a 'feature' release note to simply 'other' since
it's not really a feature as much as it is a bug fix.
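Why the "auth" value makes a fixed choices list too restrictive can be sketched with a toy endpoint resolver (the catalog structure and function name here are hypothetical; keystoneauth1 performs the real lookup):

```python
# Illustrative only: how an endpoint interface value selects an endpoint.
# 'auth' means "skip the service catalog, use auth_url directly", which
# is why hard-coding a choices list on the option would be wrong.

def resolve_endpoint(catalog, interface, auth_url):
    if interface == 'auth':
        # Do not consult the catalog at all.
        return auth_url
    # Any other interface is looked up in the service catalog; Keystone
    # v3 validates interface names server-side, so Nova need not.
    for endpoint in catalog:
        if endpoint['interface'] == interface:
            return endpoint['url']
    raise LookupError('no %s endpoint in catalog' % interface)
```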
Change-Id: Ia5af05cc4d8155349bab942280c83e7318749959
Closes-Bug: #1664334
Based on the libvirt distro support matrix wiki [1] this change
bumps the minimum required version of libvirt to 1.2.9 and
QEMU to 2.1.0. These were both advertised as the next minimums
since Newton, we just never made the change in Ocata.
The next minimum libvirt version is set to 1.3.1 and the next
minimum QEMU version is set to 2.5.0, which is what we gate
on with Ubuntu 16.04 but also falls within the distro support
matrix for a representative set of other supported distros.
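The gating described above reduces to tuple comparisons like the following (the constants mirror the versions in this commit; the function names are illustrative, not the driver's actual code):

```python
# Sketch of minimum/next-minimum version gating; names are illustrative.
MIN_LIBVIRT_VERSION = (1, 2, 9)
MIN_QEMU_VERSION = (2, 1, 0)
NEXT_MIN_LIBVIRT_VERSION = (1, 3, 1)
NEXT_MIN_QEMU_VERSION = (2, 5, 0)

def parse_version(text):
    """Turn '1.2.9' into (1, 2, 9) for lexicographic comparison."""
    return tuple(int(part) for part in text.split('.'))

def check_versions(libvirt_ver, qemu_ver):
    """Return (ok, warn): refuse to run below the current minimums,
    warn when below the advertised next minimums."""
    lv, qv = parse_version(libvirt_ver), parse_version(qemu_ver)
    ok = lv >= MIN_LIBVIRT_VERSION and qv >= MIN_QEMU_VERSION
    warn = lv < NEXT_MIN_LIBVIRT_VERSION or qv < NEXT_MIN_QEMU_VERSION
    return ok, warn
```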
[1] https://wiki.openstack.org/wiki/LibvirtDistroSupportMatrix
Change-Id: I9a972e3fde2e4e552f6fc98350820c07873c3de3
UpgradeImpact: The IntOpt type provides a 'min' parameter to restrict
an integer option's minimum value in oslo.config, and will generate a
description about this in the format '# Minimum value: XXX', so we no
longer need to quietly round the minimum value up in code.
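The behavioral change can be sketched with a toy validator standing in for oslo.config's IntOpt(min=...) handling (names illustrative):

```python
# Toy stand-in for oslo.config's IntOpt(min=...) behavior.

def parse_int_option(value, minimum):
    """New behavior: reject values below the minimum instead of
    quietly rounding them up, as the old code did."""
    value = int(value)
    if value < minimum:
        raise ValueError(
            'Should be greater than or equal to %d' % minimum)
    return value
```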
Change-Id: I54592ba4f46c2d6260f1513e5e29dd466c89724d
This is a nova-net option and can go bye bye when nova-net does.
Change-Id: If2f20cc359b5bb2a411a82df3a442212837a17d8
Implements: blueprint centralize-config-options-pike
The legacy v2 API allowed None for the boot_index [1]. It
allowed this implicitly because the API code would convert
the block_device_mapping_v2 dict from the request into a
BlockDeviceMapping object, which has a boot_index field that
is nullable (allows None).
The API reference documentation [2] also says:
"To disable a device from booting, set the boot index
to a negative value or use the default boot index value,
which is None."
It appears that with the move to v2.1 and request schema
validation, the boot_index schema was erroneously set to
not allow None for a value, which is not backward compatible
with the v2 API behavior.
This change fixes the schema to allow boot_index=None again
and adds a test to show it working.
This should not require a microversion bump since it's fixing
a regression in the v2.1 API which worked in the v2 API and
is already handled throughout Nova's block device code.
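The fix amounts to letting the schema's type include null again. A minimal sketch with a hand-rolled check (the real code uses JSON Schema validation in the v2.1 API layer; the helper name is illustrative):

```python
# Simplified version of the v2.1 request validation for boot_index.
# The actual fix widens the JSON schema 'type' to include 'null'.

BOOT_INDEX_SCHEMA = {'type': ['integer', 'string', 'null']}

def validate_boot_index(value):
    """Accept integers, integer-like strings, or None (a None or
    negative boot index means the device is not bootable), matching
    the legacy v2 API behavior."""
    if value is None:
        return None
    return int(value)  # raises ValueError/TypeError on bad input
```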
Closes-Bug: #1662699
[1] https://github.com/openstack/nova/blob/13.0.0/nova/compute/api.py#L1268
[2] http://developer.openstack.org/api-ref/compute/#create-server
Change-Id: Ice78a0982bcce491f0c9690903ed2c6b6aaab1be
live_migration_progress_timeout aims to timeout a live-migration well
before the live_migration_completion_timeout limit, by looking for when
it appears that no progress has been made copying the memory between the
hosts. However, it turns out there are several problems with the way we
monitor progress. In production, and stress testing, having
live_migration_progress_timeout > 0 has caused random timeout failures
for live-migrations that take longer than live_migration_progress_timeout.
One problem is that block_migrations appear to show no progress, as it
seems we only look for progress in copying memory. Also the way we query
QEMU via libvirt breaks when there are multiple iterations of memory
copying.
We need to revisit this bug and either fix the progress mechanism or
remove all of the code that checks for progress (including the
automatic trigger for post-copy). But in the meantime, let's default to
having no timeout, and warn users that have overridden this
configuration by deprecating the live_migration_progress_timeout
configuration option.
For users concerned about live-migration timeout errors, I have
cleaned up the configuration option descriptions, so they have a better
chance of stopping the live-migration timeout errors they may come
across.
Related-Bug: #1644248
Change-Id: I1a1143ddf8da5fb9706cf53dbfd6cbe84e606ae1
Currently when a compute node is deleted, its record in the cell DB is
deleted, but its representation as a resource provider in the placement
service remains, along with any inventory and allocations. This could
cause the placement engine to return that provider record, even though
the compute node no longer exists. And since the periodic "healing" by
the resource tracker only updates compute node resources for records in
the compute_nodes table, these old records are never removed.
This patch adds a call to delete the resource provider when the compute
node is deleted. It also adds a method to the scheduler report client
to make these calls to the placement API.
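A sketch of the new report-client call (the URL shape matches the placement API; the session object here is a toy stand-in for the keystoneauth session, and the class names are illustrative):

```python
# Illustrative scheduler report client method; 'session' stands in
# for the HTTP session used to talk to the placement API.

class ReportClient:
    def __init__(self, session):
        self.session = session

    def delete_resource_provider(self, rp_uuid):
        # Deleting the provider removes it (and its inventory) from
        # placement, so stale records for deleted compute nodes stop
        # being returned by the placement engine.
        return self.session.delete('/resource_providers/%s' % rp_uuid)

class FakeSession:
    """Records DELETE calls for the sketch's test."""
    def __init__(self):
        self.deleted = []

    def delete(self, url):
        self.deleted.append(url)
        return 204
```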
Partial-Bug: #1661258
Closes-Bug: #1661014
Change-Id: I6098d186d05ff8b9a568e23f860295a7bc2e6447
Add a release note for the filter/sort whitelist.
Part of blueprint add-whitelist-for-server-list-filter-sort-parameters
Change-Id: I0369407aced56384783b7a8bb10e06e80530854f
Co-Authored-By: ghanshyam <ghanshyammann@gmail.com>
Co-Authored-By: Alex Xu <hejie.xu@intel.com>
This just adds a simple release note to mention the
new nova-status upgrade check CLI. The details are
in the man page so it links there for further reading.
Change-Id: I5c99c969a419a1b73a58e3146040ddef99c72b85
Nova now implements two major changes with respect to CellsV2 and Placement.
We need to properly document them in the release notes in order to help
our operators.
Co-Authored-By: Dan Smith <dansmith@redhat.com>
Depends-On: Ia0869dc6f7f5bd347ccbd0930d1d668d37695a22
Change-Id: Ie3eace8c851edcdbb2095e9550caab296109b4b8
Since cellsv2 has required host mappings to be present, we have lost
our automatic host registration behavior. This is a conscious choice,
given that we do not want to have upcalls from the cells to the API
database. This adds a periodic task to have the scheduler do the
discovery routine, which for small deployments may be an acceptable
amount of background overhead in exchange for the automatic behavior.
There is also probably some amount of performance improvement that
can be added to the host discovery routine to make it less of an
issue. However, just generalizing the existing routine and letting
it run periodically gives us some coverage, given where we are in
the current cycle.
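The periodic wrapper reduces to something like this (interval handling simplified; names are illustrative, and disabling via a non-positive interval is an assumption modeled on Nova's periodic-task conventions):

```python
# Toy periodic driver for host discovery; a zero or negative interval
# disables the task, as is conventional for Nova periodic options.

def run_discover_hosts(discover_fn, interval, now, last_run):
    """Call discover_fn if 'interval' seconds have elapsed since
    last_run; return the new last_run timestamp."""
    if interval <= 0:
        return last_run  # disabled by configuration
    if now - last_run >= interval:
        discover_fn()  # generalizes the existing discovery routine
        return now
    return last_run
```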
Related to blueprint cells-scheduler-integration
Change-Id: Iab5f172cdef35645bae56b9b384e032f3160e826
This patch exposes the "interface" option for ks_filter to allow
the placement API to be connected on a specific endpoint interface.
The previous behavior was to force "public", which is the default for keystoneauth.
The default for the placement service mirrors this value.
Change-Id: Ic996e596f8473c0b8626e8d0e92e1bf58044b4f8
In the context of device tagging, bugs have caused the tag attribute
to disappear starting with version 2.33 for block_devices and starting
with version 2.37 for network interfaces. In other words, block
devices could only be tagged in 2.32 and network interfaces between
2.32 and 2.36 inclusively.
This patch documents this behaviour in api-ref and introduces
microversion 2.42, which re-adds the tag in all the right places.
Change-Id: Ia0869dc6f7f5bd347ccbd0930d1d668d37695a22
Closes-bug: 1658571
Implements: blueprint fix-tag-attribute-disappearing
Instead of iterating over the long list of compute nodes in the DB,
the scheduler should rather just call the Placement API for getting
the list of ResourceProviders that support the resource amounts
given by the RequestSpec.
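The scheduler-side query can be sketched as building the 'resources' parameter for the placement API's GET /resource_providers call from the requested amounts (the helper name is illustrative):

```python
# Sketch: turn requested resource amounts into the query string the
# placement API's GET /resource_providers endpoint understands,
# replacing the old iteration over every compute node row in the DB.

def resources_query(amounts):
    """amounts: e.g. {'VCPU': 2, 'MEMORY_MB': 2048, 'DISK_GB': 20}.
    Sorted for a deterministic query string."""
    joined = ','.join('%s:%d' % (rc, amt)
                      for rc, amt in sorted(amounts.items()))
    return 'resources=%s' % joined
```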
Implements: blueprint resource-providers-scheduler-db-filters
Depends-On: I573149b9415da2a8bb3951a4c4ce71c4c3e48c6f
Change-Id: Ie12acb76ec5affba536c3c45fbb6de35d64aea1b
Handle the new network interface 'vlans' field and expose it in the
device metadata.
The vlan tag will be exposed to the instance through the metadata API and
on the config drive.
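Exposing the vlan in device metadata can be sketched as follows (the dict keys and function name are illustrative; the real code builds objects in Nova's device metadata layer):

```python
# Sketch of building network device metadata that includes a vlan tag.

def build_device_metadata(vif):
    """vif: dict-like with 'address' plus optional 'tag' and 'vlan'."""
    meta = {'type': 'nic', 'mac': vif['address']}
    if vif.get('tag') is not None:
        meta['tags'] = [vif['tag']]
    if vif.get('vlan') is not None:
        # New: expose the VLAN of an SR-IOV PF port to the guest via
        # the metadata API and the config drive.
        meta['vlan'] = vif['vlan']
    return meta
```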
Implements: blueprint sriov-pf-passthrough-neutron-port-vlan
Change-Id: Id7b9f3f1c2107ec604e7f0ef4fbfd31a9e05d0b0
Discrete Device Assignment is a new feature in Windows Server 2016,
offering users the possibility of taking some of the PCI Express
devices in their systems and passing them through directly to a guest VM.
DocImpact: The compute-pci-passthrough page in the admin-guide will
have to be updated to include details regarding PCI passthrough on
Hyper-V.
Co-Authored-By: Iulia Toader <itoader@cloudbasesolutions.com>
Depends-On: I8e7782d3e1e9f8e92406604f05504a7754ffa3c2
Change-Id: I5a243213ff4241b6f70d21a02c606e8fc96ce6e6
Implements: blueprint hyper-v-pci-passthrough
Some external vendordata services want to provide metadata
based on the role of the user who started the instance. It
would be confusing if the metadata returned changed later
if the role of the user changed, so we cache the boot time
roles and then pass those to the external vendordata
service.
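The caching described above can be sketched as follows (the storage location and helper names are assumptions; only the idea of freezing roles at boot time comes from the commit):

```python
# Sketch: cache the requester's roles at boot so later vendordata
# requests see a stable value even if the user's roles change.

def cache_boot_roles(instance_system_metadata, context_roles):
    """Store the boot-time roles on the instance's metadata dict."""
    instance_system_metadata['boot_roles'] = ','.join(sorted(context_roles))

def roles_for_vendordata(instance_system_metadata):
    """Return the cached boot-time roles, not the current ones."""
    cached = instance_system_metadata.get('boot_roles', '')
    return cached.split(',') if cached else []
```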
Change-Id: Ieb84c945f4f9a21c2b7b892f9b1ead84dca441e9
Some operators would like instance starts to fail if we cannot
fetch dynamic vendordata. Add an option to do that.
Change-Id: I0c31465c5c52cd4c7e4bb229a4452bc4c8df0e88
We should use a service account to make requests to external
vendordata services. This is something we got wrong in the
Newton cycle, and we discussed how to resolve it at the Ocata summit.
It is intended that this fix be backported to Newton as well.
There is a sample external vendordata server, which has been
tested with this implementation, at:
https://github.com/mikalstill/vendordata
Change-Id: I7d29ecc00f99724731d120ff94b4bf3210f3a64e
Co-Authored-By: Stephen Finucane <sfinucan@redhat.com>