openstack-dev was decommissioned last night in https://review.openstack.org/621258
Update openstack-dev to openstack-discuss
Change-Id: If51f5d5eb710e06216f6d6981a70d70b6b5783cc
Some tests weren't calling init_host, so the semaphore was None.
This caused the smoke to come out of nova's tests in ways that
would be less confusing if they'd failed during the testing of
the implementing patch.
Instead, set the semaphore to be unbounded, and then override
that later if the user has in fact specified a limit. This relies
on init_host being called very early, but that should be true
already.
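A minimal sketch of the pattern (illustrative names, not Nova's exact
code): default to an unlimited stand-in at import time so code paths
that never call init_host still work, then swap in a bounded semaphore
from init_host when a limit is configured:

    import threading

    class UnlimitedSemaphore(object):
        """A no-op stand-in that never blocks."""
        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            return False

    # Module-level default: unbounded, so it is never None even if
    # init_host is not called (as in some tests).
    disk_ops_semaphore = UnlimitedSemaphore()

    def init_host(max_concurrent_disk_ops):
        # Called very early on service startup; 0 means "no limit".
        global disk_ops_semaphore
        if max_concurrent_disk_ops != 0:
            disk_ops_semaphore = threading.BoundedSemaphore(
                max_concurrent_disk_ops)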
Change-Id: If144be253f78b14cef60200a46aefc02c0e19ced
Closes-Bug: #1806123
This reverts commit bbe88786fc.
The new tests are racy and have been causing a modest number of
failures in the gate since the change merged, so it is
probably best to just revert the tests so they can be
robustified.
Change-Id: I18bd68ba6e59aba4c450eb85e6f4450d7044b1e9
Related-Bug: #1806126
An earlier change [1] allowed
[compute]resource_provider_association_refresh to be set to zero to
disable the resource tracker's periodic refresh of its local copy of
provider traits and aggregates. To allow out-of-band changes to
placement (e.g. via the CLI) to be picked up by the resource tracker in
this configuration (or a configuration where the timer is set to a high
value), this change clears the provider tree cache when SIGHUP is sent to
the compute service. The next periodic will repopulate it afresh from
placement.
[1] Iec33e656491848b26686fbf6fb5db4a4c94b9ea8
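A hedged sketch (hypothetical names) of the pattern: oslo.service
invokes the manager's reset() when the service receives SIGHUP, so the
cached provider tree can be dropped there and rebuilt by the next
periodic:

    class ComputeManager(object):
        def __init__(self, reportclient):
            self.reportclient = reportclient

        def reset(self):
            # Invoked on SIGHUP by the service framework; dropping the
            # cache forces the next update_available_resource periodic
            # to repopulate provider traits and aggregates from
            # placement.
            self.reportclient.clear_provider_cache()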
Change-Id: I65a7ee565ca5b3ec6c33a2fd9e39d461f7d90ed2
If the first argument of assertTrue is True,
the assertion always passes and is therefore useless.
Fix such assertions so they check the actual values.
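A hypothetical illustration of the pattern being removed (the real
occurrences differ, but the shape is the same):

    import unittest

    class ExampleTest(unittest.TestCase):
        def test_flag(self):
            result = {'enabled': True}
            # Useless: the first argument is the literal True, so this
            # assertion can never fail no matter what 'result' holds.
            self.assertTrue(True, result['enabled'])
            # Intended: assert on the actual value under test.
            self.assertTrue(result['enabled'])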
Change-Id: Ie954fc770c61956a80d472190e97646a39b7420f
Closes-Bug: #1805800
get_node_uuid was added in [1] and it was used [2], but that code was
removed in Stein [3].
[1] I982b211e0315bdb9a816f346fafffd0f70e46d07
[2] 76136bfb01/nova/compute/manager.py (L3939)
[3] I0851e2d54a1fdc82fe3291fb7e286e790f121e92
Change-Id: I3cd3565b6651677552d8a27c9f7054b0322055fb
This moves _check_allocation_during_evacuate into the
ProviderUsageBaseTestCase base class and drops the
overridden methods from TestEvacuateDeleteServerRestartOriginalCompute.
Change-Id: I6a084031c1d3ffa72b09d2194c44cdd80cc875fa
The _destroy_evacuated_instances method on compute
startup tries to clean up guests on the hypervisor and
allocations held against that compute node resource
provider by evacuated instances, but doesn't take into
account that those evacuated instances could have been
deleted in the meantime, which leads to a lazy-load
InstanceNotFound error that kills the startup of the
compute service.
This change does two things in the _destroy_evacuated_instances
method:
1. Loads the evacuated instances with a read_deleted='yes'
context when calling _get_instances_on_driver(). This
should be fine since _get_instances_on_driver() is already
returning deleted instances anyway (InstanceList.get_by_filters
defaults to reading deleted instances unless the filters tell
it otherwise, which we don't in this case). This is needed
so that things like driver.destroy() don't raise
InstanceNotFound while lazy-loading fields on the instance.
2. Skips the call to remove_allocation_from_compute() if the
evacuated instance is already deleted. If the instance is
already deleted, its allocations should have been cleaned
up by its hosting compute service (or the API).
The functional regression test is updated to show the bug is
now fixed.
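A simplified, self-contained sketch of the two fixes (helper names are
hypothetical, not Nova's exact code):

    from contextlib import contextmanager

    @contextmanager
    def temporary_mutation(context, **kwargs):
        # Temporarily override attributes such as read_deleted on the
        # request context, restoring them afterwards.
        old = {k: getattr(context, k) for k in kwargs}
        for k, v in kwargs.items():
            setattr(context, k, v)
        try:
            yield context
        finally:
            for k, v in old.items():
                setattr(context, k, v)

    def destroy_evacuated_instances(context, driver, reportclient,
                                    get_instances_on_driver, cn_uuid):
        # 1. Read deleted instances too, so later lazy-loads on them
        #    don't raise InstanceNotFound.
        with temporary_mutation(context, read_deleted='yes'):
            evacuated = get_instances_on_driver(context)

        for instance in evacuated:
            driver.destroy(context, instance)
            # 2. If the instance is already deleted, its allocations
            #    were cleaned up by its hosting compute service (or
            #    the API), so skip the removal.
            if instance.deleted:
                continue
            reportclient.remove_allocation_from_compute(
                context, instance, cn_uuid)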
Change-Id: I1f4b3540dd453650f94333b36d7504ba164192f7
Closes-Bug: #1794996
Previously, there was just a comment about removing usage on the
destination node. This is incorrect: usage is removed on the compute
host specified by the nodename parameter to the method. This patch
corrects this in a proper docstring.
Change-Id: I2f676966136a78bb9600626852584f838cb08c5b
Introduce an I/O semaphore to limit the number of concurrent
disk-IO-intensive operations. This could reduce disk contention from
image operations like image download, image format conversion, snapshot
extraction, etc.
The new config option max_concurrent_disk_ops can be set in nova.conf
per compute host and is virt-driver-agnostic. It defaults to 0,
which means no limit.
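A hedged usage sketch (illustrative names): disk-IO-heavy helpers
acquire a shared semaphore sized from the config option, so at most
that many of them run at once on the host:

    import threading

    MAX_CONCURRENT_DISK_OPS = 2  # would come from nova.conf in practice

    disk_ops_semaphore = threading.BoundedSemaphore(MAX_CONCURRENT_DISK_OPS)

    def _do_disk_heavy_work(source, dest):
        # Placeholder for qemu-img convert, image download, snapshot
        # extraction, etc.
        pass

    def convert_image(source, dest):
        # At most MAX_CONCURRENT_DISK_OPS of these run concurrently.
        with disk_ops_semaphore:
            _do_disk_heavy_work(source, dest)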
blueprint: io-semaphore-for-concurrent-disk-ops
Change-Id: I897999e8a4601694213f068367eae9608cdc7bbb
Signed-off-by: Jack Ding <jack.ding@windriver.com>
This commit adds support for the High Precision Event Timer (HPET) for
x86 guests in the libvirt driver. The timer can be enabled via the
image property 'hw_time_hpet'. By default it remains turned off. When
it is turned on, the HPET timer is activated in libvirt.
If the image property 'hw_time_hpet' is incorrectly set to a
non-boolean, the HPET timer remains turned off.
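A hedged sketch (not the driver's exact code) of how the property
value could be interpreted so that a non-boolean falls back to "off":

    from oslo_utils import strutils

    def wants_hpet(image_properties):
        value = image_properties.get('hw_time_hpet', False)
        # bool_from_string() returns the default for unrecognised
        # values, so a misconfigured property leaves HPET turned off.
        return strutils.bool_from_string(value, default=False)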
blueprint: support-hpet-on-guest
Change-Id: I3debf725544cae245fd31a8d97650392965d480a
Signed-off-by: Jack Ding <jack.ding@windriver.com>
There are cases where ``root_provider_id`` of a resource provider is
set to NULL just after it is upgraded to the Rocky release. In such
cases getting allocation candidates raises a KeyError.
This patch fixes that bug for cases where there are no sharing or
nested providers in play.
Change-Id: I9639d852078c95de506110f24d3f35e7cf5e361e
Closes-Bug: #1799892
The Cinder v1 API was deprecated in Juno and removed completely in
Queens. We no longer need to support compatibility between Stein Nova
and Queens Cinder, so this check can be removed.
Change-Id: I947f50e921159f66b425f10e31a08a3e0840228e
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
At the Stein summit (and previous discussions) the topic of exposing
cellsv2 out of the API came up again. This patch adds two FAQ entries
reflecting my notes from early design decisions about why we did not
want to do that, along with more recent examples, such as FFU.
These are my feelings on the subject and I was asked to put these into
FAQ form for posterity to make the discussion easier in the future. I
would recommend that we agree on these and then codify them here.
Change-Id: I0499e141456fcca63f95bad25503c4e86c6aa369
Conductor RPC calls the scheduler to get hosts during
server create, which, in a multi-create request with a
lot of servers and the default rpc_response_timeout, can
trigger a MessagingTimeout. Due to the old
retry_select_destinations decorator, conductor will retry
the select_destinations RPC call up to max_attempts times,
so thrice by default. This can clobber the scheduler and
placement while the initial scheduler worker is still
trying to process the beefy request and allocate resources
in placement.
This has been recreated in a devstack test patch [1] and
shown to fail with 1000 instances in a single request with
the default rpc_response_timeout of 60 seconds. Changing the
rpc_response_timeout to 300 avoids the MessagingTimeout and
retry loop.
Since Rocky we have the long_rpc_timeout config option which
defaults to 1800 seconds. The RPC client can thus be changed
to heartbeat the scheduler service during the RPC call every
$rpc_response_timeout seconds with a hard timeout of
$long_rpc_timeout. That change is made here.
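A hedged sketch of the client-side change, assuming oslo.messaging's
call_monitor_timeout support (option names are from the description
above; the exact call site differs):

    from oslo_config import cfg

    CONF = cfg.CONF

    def select_destinations(client, ctxt, spec_obj):
        # Heartbeat the scheduler every rpc_response_timeout seconds
        # while waiting, and only give up after long_rpc_timeout.
        cctxt = client.prepare(
            call_monitor_timeout=CONF.rpc_response_timeout,
            timeout=CONF.long_rpc_timeout)
        return cctxt.call(ctxt, 'select_destinations', spec_obj=spec_obj)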
As a result, the problematic retry_select_destinations
decorator is also no longer necessary and is removed here. That
decorator was added in I2b891bf6d0a3d8f45fd98ca54a665ae78eab78b3
as a hack for scheduler high availability: a MessagingTimeout was
assumed to mean the scheduler service had died, so retrying the
request was a reasonable way to hit another scheduler worker. That
approach is clearly not sufficient in the large multi-create case,
and long_rpc_timeout is a better fit for that HA scenario since it
heartbeats the scheduler service.
[1] https://review.openstack.org/507918/
Change-Id: I87d89967bbc5fbf59cf44d9a63eb6e9d477ac1f3
Closes-Bug: #1795992
The 'locked' query parameter is not supported
in the "List Servers Detailed" API.
So replace examples using the 'locked' query parameter
with examples using other query parameters.
Change-Id: Ibcea6147dd6716ad544e7ac5fa0df17f8c397a28
Closes-Bug: #1801904
Since we're extracting placement, we shouldn't be referring to artifacts
in placement namespaces anymore. This patch removes such a reference
from the libvirt driver unit tests.
Change-Id: Idc0c2a0c0f885a21dff412bea761bac82a029eb5
Because of a change [1] in the tokenize module in the stdlib of
recent pythons, one of the tests for the hacking checks can fail.
This change skips the test on newer pythons and leaves a TODO
to fix it.
[1] see https://bugs.python.org/issue33899 and
https://bugs.python.org/issue35107
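A hedged illustration (hypothetical test name and version boundary) of
the skip pattern:

    import sys
    import unittest

    class HackingChecksTest(unittest.TestCase):

        # TODO: rework the check so it also passes with the newer
        # tokenize behaviour, then drop this skip.
        @unittest.skipIf(
            sys.version_info >= (3, 7),
            'tokenize behaviour changed, see '
            'https://bugs.python.org/issue33899')
        def test_some_hacking_check(self):
            # The real check logic would run here.
            pass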
Change-Id: I64744a8144fcf630eea609eb2b2d14974f4fd4bb
Related-Bug: #1804062
In [0] the way parameters are passed to the glance client was changed.
Sadly, one required argument was dropped in the process; we need to
pass it again in order to fix e.g. rbd backend usage.
[0] https://review.openstack.org/614351
Change-Id: I5a4cfb3c9b8125eca4f6c9561d3023537e606a93
Closes-Bug: #1803717
Add the description about custom resource classes and
overriding standard resource classes in the "Flavors" document.
Change-Id: I5b804db70d229696e7b7c5b5db16946cf1f1c49f
Closes-Bug: #1800663
This makes the _instances_cores_ram_count() method only query for instances
in cells where the tenant actually has instances. We do this by
getting a list of cell mappings that have instance mappings owned by the
project and limiting the scatter/gather operation to just those cells.
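A generic, hedged sketch of the idea (not Nova's actual helpers): find
the cells that hold instance mappings for the project first, then only
count usage in those cells:

    def count_cores_and_ram(project_id, get_cells_for_project,
                            count_in_cell):
        # get_cells_for_project: returns only cells with instance
        # mappings owned by the project.
        # count_in_cell: returns {'cores': n, 'ram': m} for one cell.
        totals = {'cores': 0, 'ram': 0}
        for cell in get_cells_for_project(project_id):
            counts = count_in_cell(cell, project_id)
            totals['cores'] += counts['cores']
            totals['ram'] += counts['ram']
        return totals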
Change-Id: I0e2a9b2460145d3aee92f7fddc4f4da16af63ff8
Closes-Bug: #1771810
The current check uses an alignment of 512 bytes and will fail when the
underlying device has sectors of size 4096 bytes, as is common e.g. for
NVMe disks. So use an alignment of 4096 bytes, which is a multiple of
512 bytes and thus will cover both cases.
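A hedged, Linux-only sketch (not Nova's exact helper) of probing for
O_DIRECT support with a 4096-byte aligned buffer, which also satisfies
512-byte alignment:

    import mmap
    import os

    def probe_direct_io(dirpath, align=4096):
        testfile = os.path.join(dirpath, '.directio.test')
        fd = None
        try:
            fd = os.open(testfile,
                         os.O_CREAT | os.O_WRONLY | os.O_DIRECT)
            # Anonymous mmap memory is page-aligned, which satisfies
            # the 4096-byte alignment 4K-sector devices require.
            buf = mmap.mmap(-1, align)
            buf.write(b'\0' * align)
            os.write(fd, buf)
            return True
        except OSError:
            # Typically EINVAL when the device/filesystem rejects the
            # alignment or O_DIRECT entirely.
            return False
        finally:
            if fd is not None:
                os.close(fd)
            if os.path.exists(testfile):
                os.unlink(testfile)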
Change-Id: I5151ae01e90506747860d9780547b0d4ce91d8bc
Closes-Bug: #1801702
Co-Authored-By: Alexandre Arents <alexandre.arents@corp.ovh.com>