With this change, [compute]resource_provider_association_refresh is
allowed to be zero, which disables refreshing of resource provider
traits and aggregates.
Inventories are still refreshed in a different code path.
A subsequent patch will be submitted to allow manual refresh by sending
SIGHUP to the compute process.
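For example, a deployment that wants to stop the periodic trait and
aggregate refresh entirely could set the following in nova.conf
(illustrative only):

    [compute]
    resource_provider_association_refresh = 0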
Change-Id: Iec33e656491848b26686fbf6fb5db4a4c94b9ea8
The shutdown_timeout config option was added in commit
c07ed15415c0ec3c5862f437f440632eff1e94df without a min
value. The min value was later added in commit
d67ea6e5549086eee1b39946648410f22d0041a9 and set to 1,
which means the option can never be configured to mean
"always shut down immediately". That is also inconsistent
with the description of the "os_shutdown_timeout" image
property in the glance metadata definition which says the
value can be set to 0 to force an immediate shutdown of
the guest.
This fixes the min value in the config option to be 0 which
is already what happens if we are not performing a clean
shutdown at all.
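As a rough illustration of what this enables (the image UUID is a
placeholder):

    # nova.conf: always shut guests down immediately
    [DEFAULT]
    shutdown_timeout = 0

    # or per image, via the glance metadata property mentioned above
    $ openstack image set --property os_shutdown_timeout=0 <image-uuid>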
Change-Id: I399b9031d2aa477194697390e2cd3f78e3ac0f91
Closes-Bug: #1799707
nova usage-list can return incorrect results, having resources counted
twice. This only occurs when using the 2.40 microversion or later.
This microversion introduced pagination, which doesn't work properly.
The Nova API sorts the instances by tenant id and instance uuid,
but 'os-simple-tenant-usage' does not preserve that order when returning
the results.
For this reason, subsequent API calls made by the client will use the
wrong marker (which is supposed to be the last instance id) and end
up counting the same instances twice.
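A minimal sketch of the client-side paging that goes wrong; the
get_usage_page helper and field access are hypothetical and only
illustrate how a wrong marker leads to double counting:

    marker = None
    usages = []
    while True:
        # The server pages by (tenant id, instance uuid), but the returned
        # usages are not in that order, so the last item is not a valid marker.
        page = get_usage_page(limit=100, marker=marker)  # hypothetical helper
        if not page:
            break
        usages.extend(page)
        marker = page[-1]['instance_id']  # wrong marker -> overlapping pages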
Change-Id: I6c7a67b23ec49aa207c33c38580acd834bb27e3c
Closes-Bug: #1796689
This checks whether a deployment is currently using consoles and warns
the operator to set [workarounds]enable_consoleauth = True on their
console proxy host if they are performing a rolling upgrade that is
not yet complete.
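The setting that the check points operators at would look like this in
nova.conf on the console proxy host (illustrative):

    [workarounds]
    enable_consoleauth = True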
Partial-Bug: #1798188
Change-Id: Idd6079ce4038d6f19966e98bcc61422b61b3636b
The CachingScheduler has been deprecated since Pike [1].
It does not use the placement service and as more of nova
relies on placement for managing resource allocations,
the cost of maintaining compatibility for the CachingScheduler is
exorbitant.
The release note in this change goes into much more detail
about why the FilterScheduler + Placement should be a
sufficient replacement for the CachingScheduler's original
justification, along with details on how to migrate
from the CachingScheduler to the FilterScheduler.
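A hedged sketch of that migration, which boils down to switching the
scheduler driver in nova.conf:

    [scheduler]
    # was: driver = caching_scheduler
    driver = filter_scheduler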
Since the [scheduler]/driver configuration option does allow
loading out-of-tree drivers and the scheduler driver interface
does have the USES_ALLOCATION_CANDIDATES variable, it is
possible that there are drivers being used which are also not
using the placement service. The release note also explains this
but warns against it. However, as a result some existing
functional tests, which were using the CachingScheduler, are
updated to still test scheduling without allocations being
created in the placement service.
Over time we will likely remove the USES_ALLOCATION_CANDIDATES
variable in the scheduler driver interface along with the
compatibility code associated with it, but that is left for
a later change.
[1] Ia7ff98ff28b7265058845e46b277317a2bfc96d2
Change-Id: I1832da2190be5ef2b04953938860a56a43e8cddf
When online_data_migrations raise exceptions, nova/cinder-manage catches
the exceptions, prints fairly useless "something didn't work" messages,
and moves on. Two issues:
1) The user(/admin) has no way to see what actually failed (exception
detail is not logged)
2) The command returns exit status 0, as if all possible migrations have
been completed successfully - this can cause failures to be missed,
especially in automated runs
This change adds logging of the exceptions, and introduces a new exit
status of 2, which indicates that no updates took effect in the last
batch attempt, but some are (still) failing, which requires intervention.
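A rough illustration of how an operator (or automation) can now
distinguish the outcomes (output abridged):

    $ nova-manage db online_data_migrations
    $ echo $?
    2   # last batch made no progress; some migrations are still failing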
Change-Id: Ib684091af0b19e62396f6becc78c656c49a60504
Closes-Bug: #1796192
Add a new microversion 2.67 to support specifying ``volume_type``
when booting instances.
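A hedged sketch of a boot request using the new field (microversion
2.67; UUIDs and the volume type name are placeholders):

    POST /servers
    {
        "server": {
            "name": "vm-with-typed-volume",
            "flavorRef": "<flavor-id>",
            "networks": "auto",
            "block_device_mapping_v2": [{
                "uuid": "<image-uuid>",
                "source_type": "image",
                "destination_type": "volume",
                "boot_index": 0,
                "volume_size": 10,
                "volume_type": "fast-ssd"
            }]
        }
    }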
Part of bp boot-instance-specific-storage-backend
Change-Id: I13102243f7ce36a5d44c1790f3a633703373ebf7
This patch implements live migration of instances across compute nodes.
Each compute node must be managing a cluster in the same vCenter, and the
ESX hosts must have vMotion enabled [1].
If the instance is located on a datastore shared between source
and destination cluster, then only the host is changed. Otherwise, we
select the most suitable datastore on the destination cluster and
migrate the instance there.
[1] https://kb.vmware.com/s/article/2054994
Co-Authored-By: gkotton@vmware.com
blueprint vmware-live-migration
Change-Id: I640013383e684497b2d99a9e1d6817d68c4d0a4b
Confirming a migration, as well as a successful live migration, also
triggers the delete allocation code path. This patch adds test coverage
for these code paths.
If the deletion of the source allocation of a confirmed migration fails,
then nova puts the instance into ERROR state. The instance still has two
allocations in this state, and deleting the instance only deletes the one that
is held by the instance_uuid. This patch logs an ERROR describing that in this
case the allocation held by the migration_uuid is leaked. The same is true
for a live migration that fails to delete the allocation on the source host.
As this would make every caller of _delete_allocation_after_move log the
same error for the AllocationDeleteFailed exception, this patch moves that
logging into _delete_allocation_after_move.
Blueprint: use-nested-allocation-candidates
Change-Id: I99427a52676826990d2a2ffc82cf30ad945b939c
This patch renames the set_and_clear_allocations function in the
scheduler report client to move_allocations and adds handling of
consumer generation conflict for it. This call now moves everything from
one consumer to another and raises AllocationMoveFailed to the caller if
the move fails due to a consumer generation conflict.
When a migration or resize fails to move the source host allocation to the
migration_uuid, the API returns HTTP 409 and the migration is aborted.
If reverting a migration, a resize, or a resize to the same host fails to move
the source host allocation back to the instance_uuid due to a consumer
generation conflict, the instance will be put into ERROR state. The instance still has two
allocations in this state and deleting the instance only deletes the one that
is held by the instance_uuid. This patch logs an ERROR describing that in this
case the allocation held by the migration_uuid is leaked.
Blueprint: use-nested-allocation-candidates
Change-Id: Ie991d4b53e9bb5e7ec26da99219178ab7695abf6
Presently, if a cell is down, the instances in that cell are
omitted from the results. This may not be desirable for
operators, as it may confuse users who saw more instances in
a previous listing than they do now. This patch adds a new api config
option called list_records_by_skipping_down_cells which can be set to
False (the default is True) if the operator prefers to return an
API error outright when the user has any instance in a down cell,
instead of skipping those instances. This is essentially a configurable
revert of change I308b494ab07f6936bef94f4c9da45e9473e3534d for bug 1726301, so
that operators can opt into the 500 response behaviour during listing.
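For example, an operator opting into the error behaviour would set
(illustrative):

    [api]
    list_records_by_skipping_down_cells = False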
Change-Id: Id749761c58d4e1bc001b745d49b6ff0f3732e133
Related-Bug: #1726301
The hide_server_address_states config option and related
policy rule were deprecated in Queens:
I6040e8c2b3e132b0dfd09f82ae041b4786a63483
They are now removed in Stein as part of the API extension
merge effort.
Part of blueprint api-extensions-merge-stein
Change-Id: Ib3582038274dedbf524ffcaffe818ff0e751489d
This adds the changes-before filter to the servers,
os-instance-actions and os-migrations APIs for
filtering resources that were last updated at or
before the given time. The changes-before filter,
like the changes-since filter, will return deleted
server resources.
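A hedged example of the new filter on the servers API (the timestamp is
a placeholder and the request must use a microversion that includes this
change):

    GET /servers/detail?changes-before=2018-07-26T10:31:49Z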
Part of bp support-to-query-nova-resources-filter-by-changes-before
Change-Id: If91c179e3823c8b0da744a9363906b0f7b05c326
Nova's existing default machine type for ARMv7 ('vexpress-a15') was
added more than four years ago (in commit 5b27fe7: "libvirt: Allow
specification of default machine type"). The 'vexpress-a15' board is a
specific development board with hardware limitations (such as only a
single ethernet adapter).
The upstream QEMU recommendation [1] for the past couple of years is to
use the 'virt' machine type for both ARMv7 and AArch64; it was
specifically designed to be used with virtual machines. Quoting a
write-up [2] from QEMU's ARM subsystem maintainer:
"Why the 'virt' board?
"QEMU has models of nearly 50 different ARM boards, which makes it
difficult for new users to pick one which is right for their
purposes. This wild profusion reflects a similar diversity in the
real hardware world: ARM systems come in many different flavours
with very different hardware components and capabilities. A kernel
which is expecting to run on one system will likely not run on
another. Many of QEMU’s models are annoyingly limited because the
real hardware was also limited — there’s no PCI bus on most mobile
devices, after all, and a fifteen year old development board
wouldn’t have had a gigabyte of RAM on it.
"My recommendation is that if you don’t know for certain that you
want a model of a specific device, you should choose the “virt”
board. This is a purely virtual platform designed for use in virtual
machines, and it supports PCI, virtio, a recent ARM CPU and large
amounts of RAM. The only thing it doesn’t have out of the box is
graphics, but graphical programs on a fully emulated system run very
slowly anyway so are best avoided."
So, change the default machine type for ARMv7 arch to be 'virt'.
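Operators who still need the old board for a particular workload can
presumably keep pinning it per architecture via the existing
[libvirt]hw_machine_type option, e.g. (illustrative):

    [libvirt]
    hw_machine_type = armv7l=vexpress-a15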
[1] https://wiki.qemu.org/Documentation/Platforms/ARM
[2] https://translatedcode.wordpress.com/2016/11/03/installing-debian-on-qemus-32-bit-arm-virt-board/
Change-Id: If9ffa5a019f67734a9f30ccaf3ab96ff41262dc8
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Adds a new policy rule "os_compute_api:servers:allow_all_filters"
to control whether a user can use all filters when listing servers.
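A hedged illustration of the new rule in policy.yaml, keeping it
admin-only (assumed here to be the default):

    "os_compute_api:servers:allow_all_filters": "rule:admin_api"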
Closes-bug: #1737050
Change-Id: Ia5504da9a00bad689766aeda20255e10b7629f63
This makes the instance_list module support batching across cells
with a couple of different strategies, and with room to add more
in the future.
Before this change, an instance list with limit 1000 to a
deployment with 10 cells would generate a query to each cell
database with the same limit. Thus, that API request could end
up processing up to 10,000 instance records despite only
returning 1000 to the user (because of the limit).
This uses the batch functionality in the base code added in
Iaa4759822e70b39bd735104d03d4deec988d35a1
by providing a couple of strategies by which the batch size
per cell can be determined. These should provide a lot of gain
in the short term, and we can extend them with other strategies
as we identify some with additional benefits.
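A hedged sketch of selecting a strategy in nova.conf; the option names
below are assumptions based on this change and the values are
illustrative:

    [api]
    # 'distributed' sizes each cell's batch from the overall limit;
    # 'fixed' uses a fixed per-cell batch size
    instance_list_cells_batch_strategy = distributed
    instance_list_cells_batch_fixed_size = 100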
Closes-Bug: #1787977
Change-Id: Ie3a5f5dc49f8d9a4b96f1e97f8a6ea0b5738b768
The time has come.
These filters haven't been necessary since Ocata [1]
when the filter scheduler started using placement
to filter on VCPU, DISK_GB and MEMORY_MB. The
only reason to use them with any in-tree scheduler
drivers is if using the CachingScheduler which doesn't
use placement, but the CachingScheduler itself has
been deprecated since Pike [2]. Furthermore, as of
change [3] in Stein, the ironic driver no longer
reports vcpu/ram/disk inventory for ironic nodes,
which would cause these filters to reject ironic nodes
as having no inventory. Also, as
noted in [4], the DiskFilter does not account for
volume-backed instances and may incorrectly filter
out a host based on disk inventory when it would
otherwise be OK if the instance is not using local
disk.
The related aggregate filters are left intact for
now, see blueprint placement-aggregate-allocation-ratios.
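Deployments that still list the removed filters (presumably CoreFilter,
RamFilter and DiskFilter) would drop them from their scheduler
configuration, roughly (illustrative filter list):

    [filter_scheduler]
    enabled_filters = AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter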
[1] Ie12acb76ec5affba536c3c45fbb6de35d64aea1b
[2] Ia7ff98ff28b7265058845e46b277317a2bfc96d2
[3] If2b8c1a76d7dbabbac7bb359c9e572cfed510800
[4] I9c2111f7377df65c1fc3c72323f85483b3295989
Change-Id: Id62136d293da55e4bb639635ea5421a33b6c3ea2
Related-Bug: #1787910
The release notes said it was okay not to run the nova-consoleauth
service in Rocky, but that's not true because the Rocky code is storing
new console authorization tokens in both the database backend and the
existing nova-consoleauth backend. The use of nova-consoleauth will be
discontinued in Stein (for non-cells v1). We can't remove
nova-consoleauth until we remove cells v1.
Closes-Bug: #1788470
Change-Id: Ibbdc7c50c312da2acc59dfe64de95a519f87f123
/reshaper provides a way to atomically modify some allocations and
inventory in a single transaction, allowing operations like migrating
some inventory from a parent provider to a new child.
A fair amount of code is reused from handler/inventory.py; some
refactoring is in order before things get too far with that.
In handler/allocation.py some code is extracted to its own methods
so it can be reused from reshaper.py.
This is done as microversion 1.30.
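A heavily abridged sketch of the request shape (provider and consumer
UUIDs are placeholders; the api-ref added here is authoritative):

    PUT /reshaper  (placement microversion 1.30)
    {
        "inventories": {
            "<provider-uuid>": {
                "resource_provider_generation": 1,
                "inventories": {"VCPU": {"total": 8}}
            }
        },
        "allocations": {
            "<consumer-uuid>": {
                "allocations": {"<provider-uuid>": {"resources": {"VCPU": 2}}},
                "project_id": "<project>",
                "user_id": "<user>",
                "consumer_generation": 1
            }
        }
    }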
A suite of gabbi tests is provided which attempt to cover various
failures including schema violations, generation conflicts, and
data conflicts.
The api-ref, release notes, and REST API version history are updated.
Change-Id: I5b33ac3572bc3789878174ffc86ca42ae8035cfa
Partially-Implements: blueprint reshape-provider-tree
If an IPv6 address is passed in the URI, it should be wrapped in square
brackets. This patch detects IPv6 addresses to form the migration URI
properly. A domain name, IPv4 address, or already-bracketed IPv6 address
is passed through as is.
Tests are extended to cover collapsed IPv6 addresses and IPv6 addresses
with a port.
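A minimal sketch of the intended behaviour (not the actual helper used
in the driver):

    import ipaddress

    def bracket_if_ipv6(host):
        """Wrap a bare IPv6 literal in brackets; leave hostnames, IPv4
        addresses and already-bracketed IPv6 addresses untouched."""
        if host.startswith('['):
            return host
        try:
            if ipaddress.ip_address(host).version == 6:
                return '[%s]' % host
        except ValueError:
            pass  # a hostname, not an IP literal
        return host

    # bracket_if_ipv6('::1') -> '[::1]'; bracket_if_ipv6('10.0.0.1') -> '10.0.0.1'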
Change-Id: I1201db996ea6ceaebd49479b298d74585a78b006
Closes-Bug: #1786058
The ChanceScheduler was deprecated in Pike [1] and will be removed in a
subsequent release.
[1] https://review.openstack.org/#/c/492210/
Change-Id: I44f9c1cabf9fc64b1a6903236bc88f5ed8619e9e
API extension policies were deprecated in the 17.0.0
release [1]. This commit removes them.
[1] Ie05f4e84519f8a00ffb66ea5ee920d5c7722a66b
Change-Id: Ib3faf85c78bc2cdee13175560dc1458ddb6cb7a8
The liberty, mitaka, and newton branches are closed so there is no
reason to scan for release notes dynamically. Use static content for
those pages to speed up the release notes build a little bit.
Change-Id: I983346c97df96fda988a2fefec89c3f0d6c14498
Signed-off-by: Doug Hellmann <doug@doughellmann.com>
Ironic nodes should all be using resource classes for scheduling by now,
which means reporting CPU/RAM/disk isn't useful. Report these as zero so
they cannot be scheduled.
Since we now require resource classes, raise an exception in
update_provider_tree for any nodes that don't have a resource class set
on the node.
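For reference, the resource-class based scheduling this relies on is
configured roughly like this (node, class and flavor names are
placeholders):

    $ openstack baremetal node set --resource-class baremetal.gold <node>
    $ openstack flavor set --property resources:CUSTOM_BAREMETAL_GOLD=1 <flavor>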
Change-Id: If2b8c1a76d7dbabbac7bb359c9e572cfed510800
This reverts commit 8e6d5d404cf49e5b68b43c62e7f6d7db2771a1f4.
As detailed in the bug, this is overly racy: it waits for the event
potentially a long way (and a couple of RPC calls) away from where
the event will be triggered. The compute manager now has a generic
mechanism to do this which conflicts with and replaces this functionality
when enabled (the default is off in Rocky, and is expected to be on in Stein).
Conflicts:
nova/tests/unit/virt/libvirt/test_driver.py
nova/virt/libvirt/driver.py
Change-Id: Ibf2b5eeafd962e93ae4ab6290015d58c33024132
Closes-Bug: #1786346