Commit Graph

52898 Commits

Author SHA1 Message Date
ZhongShengping
ba0502182e Update mailinglist from dev to discuss
openstack-dev was decommissioned this night in https://review.openstack.org/621258
Update openstack-dev to openstack-discuss

Change-Id: If51f5d5eb710e06216f6d6981a70d70b6b5783cc
2018-12-05 09:44:35 +08:00
Zuul
5f648dda49 Merge "Refactor TestEvacuateDeleteServerRestartOriginalCompute" 2018-12-04 07:49:27 +00:00
Zuul
3ce9aa0192 Merge "Fix InstanceNotFound during _destroy_evacuated_instances" 2018-12-04 03:34:40 +00:00
Zuul
33c3759b85 Merge "SIGHUP n-cpu to clear provider tree cache" 2018-12-04 02:23:39 +00:00
Michael Still
1e8c2c0dcb Fix sloppy initialization of the new disk ops semaphore.
Some tests weren't calling init_host, so the semaphore was None.
This caused the smoke to come out of nova's tests in ways that
would be less confusing if they'd failed during the testing of
the implementing patch.

Instead, set the semaphore to be unbounded, and then override
that later if the user has in fact specified a limit. This relies
on init_host being called very early, but that should be true
already.

Change-Id: If144be253f78b14cef60200a46aefc02c0e19ced
Closes-Bug: #1806123
2018-12-03 10:19:22 +11:00
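The pattern described in 1e8c2c0dcb (default to an unbounded semaphore, override it later in init_host) could look roughly like the sketch below. This is an illustration, not nova's code: `UnboundedSemaphore`, `ComputeManagerSketch` and `convert_image` are hypothetical names; only `init_host` and `max_concurrent_disk_ops` appear in the commit messages.

```python
import threading


class UnboundedSemaphore:
    """No-op stand-in that never blocks (hypothetical helper)."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


class ComputeManagerSketch:
    """Hypothetical manager; not nova's actual class."""

    def __init__(self):
        # Safe default: usable even if init_host() is never called,
        # as happens in some tests.
        self.disk_ops_semaphore = UnboundedSemaphore()

    def init_host(self, max_concurrent_disk_ops=0):
        # Override the default only when a positive limit is configured;
        # 0 keeps the "no limit" behaviour.
        if max_concurrent_disk_ops > 0:
            self.disk_ops_semaphore = threading.BoundedSemaphore(
                max_concurrent_disk_ops)

    def convert_image(self):
        # Using "with" on None would fail here, which is the failure
        # mode the commit describes for tests that skipped init_host().
        with self.disk_ops_semaphore:
            return "converted"
```

With this shape, code paths that take the semaphore do not need to know whether a limit was configured.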
Zuul
288c537fcd Merge "Revert "Add regression test for bug 1550919"" 2018-12-01 05:23:43 +00:00
Zuul
3c4018d37d Merge "Fix misuse of assertTrue" 2018-12-01 05:07:18 +00:00
Matt Riedemann
90d16c270a Revert "Add regression test for bug 1550919"
This reverts commit bbe88786fc.

The new tests are racy and causing a modest amount of
failures in the gate since the change merged, so it is
probably best to just revert the tests so they can be
robustified.

Change-Id: I18bd68ba6e59aba4c450eb85e6f4450d7044b1e9
Related-Bug: #1806126
2018-11-30 21:15:33 +00:00
Zuul
8446a1e58d Merge "Add I/O Semaphore to limit concurrent disk ops" 2018-11-30 03:25:18 +00:00
Eric Fried
bbc2fcb8fb SIGHUP n-cpu to clear provider tree cache
An earlier change [1] allowed
[compute]resource_provider_association_refresh to be set to zero to
disable the resource tracker's periodic refresh of its local copy of
provider traits and aggregates. To allow for out-of-band changes to
placement (e.g. via the CLI) to be picked up by the resource tracker in
this configuration (or a configuration where the timer is set to a high
value) this change clears the provider tree cache when SIGHUP is sent to
the compute service. The next periodic will repopulate it afresh from
placement.

[1] Iec33e656491848b26686fbf6fb5db4a4c94b9ea8

Change-Id: I65a7ee565ca5b3ec6c33a2fd9e39d461f7d90ed2
2018-11-29 15:42:08 -06:00
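The behaviour described in bbc2fcb8fb (clear a local cache on SIGHUP and let the next periodic run repopulate it) can be illustrated with plain Python signal handling. This is only a sketch under assumptions: `ProviderTreeCacheSketch`, `cache` and `on_sighup` are hypothetical names, and the commit does not say how nova wires the signal to the compute service.

```python
import signal


class ProviderTreeCacheSketch:
    """Hypothetical local cache of provider data; not nova's class."""

    def __init__(self):
        self.providers = {}

    def clear(self):
        self.providers.clear()

    def refresh(self, fetch):
        # Repopulate only when empty, as the next periodic run would
        # after the cache has been cleared.
        if not self.providers:
            self.providers.update(fetch())


cache = ProviderTreeCacheSketch()


def on_sighup(signum, frame):
    # Drop the cached data; the next refresh() fetches it afresh.
    cache.clear()


if __name__ == "__main__" and hasattr(signal, "SIGHUP"):
    # SIGHUP exists on POSIX platforms only.
    signal.signal(signal.SIGHUP, on_sighup)
```

Sending `kill -HUP <pid>` to a process running this sketch would empty the cache without restarting it.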
Takashi NATSUME
96b5ef3456 Fix misuse of assertTrue
If the first argument of assertTrue is True,
the assertion always passes.
Fix it because such an assertion is useless.

Change-Id: Ie954fc770c61956a80d472190e97646a39b7420f
Closes-Bug: #1805800
2018-11-29 09:52:19 +00:00
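The misuse fixed in 96b5ef3456 is easy to reproduce with the standard unittest module. The test case below is an invented example, not taken from nova's test suite.

```python
import unittest


class AssertTrueMisuse(unittest.TestCase):
    def test_always_passes(self):
        result = 2 + 2
        # Bug: the first argument is the value under test and the second
        # is only the failure message, so this can never fail.
        self.assertTrue(True, result == 5)

    def test_checks_value(self):
        result = 2 + 2
        # Fix: assert on the value itself.
        self.assertEqual(4, result)
```

Both tests pass, but only the second one would notice a wrong `result`.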
Eric Fried
8c318d0fb2 Remove get_node_uuid
get_node_uuid was added in [1] and was used in [2], but that code was
removed in Stein [3].

[1] I982b211e0315bdb9a816f346fafffd0f70e46d07
[2] 76136bfb01/nova/compute/manager.py (L3939)
[3] I0851e2d54a1fdc82fe3291fb7e286e790f121e92

Change-Id: I3cd3565b6651677552d8a27c9f7054b0322055fb
2018-11-28 16:24:05 -06:00
Zuul
3b2e42f371 Merge "Give drop_move_claim() correct docstring" 2018-11-28 18:50:50 +00:00
Zuul
62245235bc Merge "Add regression test for bug 1550919" 2018-11-28 00:05:06 +00:00
Matt Riedemann
92dbeae1d4 Refactor TestEvacuateDeleteServerRestartOriginalCompute
This moves _check_allocation_during_evacuate into the
ProviderUsageBaseTestCase base class and drops the
overridden methods from TestEvacuateDeleteServerRestartOriginalCompute.

Change-Id: I6a084031c1d3ffa72b09d2194c44cdd80cc875fa
2018-11-27 12:42:48 -05:00
Matt Riedemann
05cd8d1282 Fix InstanceNotFound during _destroy_evacuated_instances
The _destroy_evacuated_instances method on compute
startup tries to cleanup guests on the hypervisor and
allocations held against that compute node resource
provider by evacuated instances, but doesn't take into
account that those evacuated instances could have been
deleted in the meantime which leads to a lazy-load
InstanceNotFound error that kills the startup of the
compute service.

This change does two things in the _destroy_evacuated_instances
method:

1. Loads the evacuated instances with a read_deleted='yes'
   context when calling _get_instances_on_driver(). This
   should be fine since _get_instances_on_driver() is already
   returning deleted instances anyway (InstanceList.get_by_filters
   defaults to read deleted instances unless the filters tell
   it otherwise - which we don't in this case). This is needed
   so that things like driver.destroy() don't raise
   InstanceNotFound while lazy-loading fields on the instance.

2. Skips the call to remove_allocation_from_compute() if the
   evacuated instance is already deleted. If the instance is
   already deleted, its allocations should have been cleaned
   up by its hosting compute service (or the API).

The functional regression test is updated to show the bug is
now fixed.

Change-Id: I1f4b3540dd453650f94333b36d7504ba164192f7
Closes-Bug: #1794996
2018-11-27 12:42:48 -05:00
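The second point of 05cd8d1282 (skip allocation removal for instances that are already deleted) can be sketched in isolation. Everything here is hypothetical: `destroy_evacuated`, the dict-based instances and the two callbacks stand in for nova's real objects and methods.

```python
def destroy_evacuated(instances, destroy, remove_allocation):
    """Clean up evacuated instances, tolerating already-deleted ones.

    `instances` is assumed to be a list of dicts that were loaded
    including deleted records (the read_deleted='yes' part of the fix).
    """
    for inst in instances:
        # Always clean up the guest on the hypervisor.
        destroy(inst)
        # A deleted instance should already have had its allocations
        # cleaned up elsewhere, so skip that step.
        if not inst["deleted"]:
            remove_allocation(inst)
```

The callbacks make it easy to check which instances reach each step without a hypervisor or placement service.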
Artom Lifshitz
3e32e76d83 Give drop_move_claim() correct docstring
Previously, there was just a comment about removing usage on the
destination node. This is incorrect: usage is removed on the compute
host specified by the nodename parameter to the method. This patch
corrects this in a proper docstring.

Change-Id: I2f676966136a78bb9600626852584f838cb08c5b
2018-11-26 19:34:09 -05:00
zhufl
8545ba2af7 Add missing ws separator between words
This is to add missing ws separator between words, usually
in log messages.

Change-Id: I71bf4c5b5be4dbc89a28bf243b7d11cf1d612ab4
2018-11-26 23:42:18 +00:00
Zuul
c1de096098 Merge "Add debug logs when doubling-up allocations during scheduling" 2018-11-26 22:14:06 +00:00
Zuul
594c653dc1 Merge "Add HPET timer support for x86 guests" 2018-11-24 16:50:57 +00:00
Zuul
1a1ea8e2aa Merge "Use long_rpc_timeout in select_destinations RPC call" 2018-11-21 23:51:14 +00:00
Zuul
a5d63f7e9e Merge "Make supports_direct_io work on 4096b sector size" 2018-11-21 22:39:33 +00:00
Zuul
1d444704a2 Merge "Default embedded instance.flavor.is_public attribute" 2018-11-21 21:20:00 +00:00
Jack Ding
728f20e8f4 Add I/O Semaphore to limit concurrent disk ops
Introduce an I/O semaphore to limit the number of concurrent
disk-IO-intensive operations. This could reduce disk contention from
image operations like image download, image format conversion, snapshot
extraction, etc.

The new config option max_concurrent_disk_ops can be set in nova.conf
per compute host and is virt-driver-agnostic. It defaults to 0,
which means no limit.

blueprint: io-semaphore-for-concurrent-disk-ops
Change-Id: I897999e8a4601694213f068367eae9608cdc7bbb
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-11-21 15:57:11 -05:00
Zuul
7217e38baf Merge "Remove v1 check in Cinder client version lookup" 2018-11-21 05:35:43 +00:00
Zuul
208db51fa1 Merge "Consider root id is None in the database case" 2018-11-21 02:00:01 +00:00
Jack Ding
9e884de68a Add HPET timer support for x86 guests
This commit adds support for the High Precision Event Timer (HPET) for
x86 guests in the libvirt driver. The timer can be set by image property
'hw_time_hpet'. By default it remains turned off. When it is turned on
the HPET timer is activated in libvirt.

If the image property 'hw_time_hpet' is incorrectly set to a
non-boolean, the HPET timer remains turned off.

blueprint: support-hpet-on-guest
Change-Id: I3debf725544cae245fd31a8d97650392965d480a
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-11-20 22:39:37 +00:00
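The rule in 9e884de68a that a non-boolean 'hw_time_hpet' leaves the timer off can be sketched as a strict parser. The function name `hpet_enabled` and the accepted string spellings are assumptions of this sketch; the commit does not say how the libvirt driver parses the property.

```python
def hpet_enabled(image_properties):
    """Return True only if 'hw_time_hpet' is a clearly true boolean.

    Anything missing or unrecognised leaves the timer off, matching
    the default described in the commit message.
    """
    value = image_properties.get("hw_time_hpet", False)
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # Accepted spellings are an assumption of this sketch.
        return value.strip().lower() in ("true", "yes", "on", "1")
    return False
```

For example, `hpet_enabled({"hw_time_hpet": "banana"})` falls through to the "off" default rather than raising.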
Zuul
47bcc39cd6 Merge "Add CellsV2 FAQ about API design decisions" 2018-11-20 15:55:36 +00:00
Zuul
72978c0758 Merge "Add description of custom resource classes" 2018-11-20 15:55:29 +00:00
Tetsuro Nakamura
cdbedac920 Consider root id is None in the database case
There are cases where ``root_provider_id`` of a resource provider is
set to NULL just after it is upgraded to the Rocky release. In such
cases getting allocation candidates raises a KeyError.

This patch fixes that bug for cases where there are no sharing or nested
providers in play.

Change-Id: I9639d852078c95de506110f24d3f35e7cf5e361e
Closes-Bug: #1799892
2018-11-20 14:53:59 +00:00
Sean McGinnis
82c5f9b239 Remove v1 check in Cinder client version lookup
The Cinder v1 API was deprecated in Juno and removed completely in
Queens. We do not support compatibility between Stein Nova and Queens
Cinder, so this check can be removed.

Change-Id: I947f50e921159f66b425f10e31a08a3e0840228e
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2018-11-20 14:52:12 +00:00
Dan Smith
dc7039669f Add CellsV2 FAQ about API design decisions
At the Stein summit (and previous discussions) the topic of exposing
cellsv2 out of the API came up again. This patch adds two FAQ entries
reflecting my notes from early design decisions about why we did not
want to do that, along with more recent examples, such as FFU.

These are my feelings on the subject and I was asked to put these into
FAQ form for posterity to make the discussion easier in the future. I
would recommend that we agree on these and then codify them here.

Change-Id: I0499e141456fcca63f95bad25503c4e86c6aa369
2018-11-20 06:44:59 -08:00
Matt Riedemann
5af632e9ca Use long_rpc_timeout in select_destinations RPC call
Conductor RPC calls the scheduler to get hosts during
server create, which in a multi-create request with a
lot of servers and the default rpc_response_timeout, can
trigger a MessagingTimeout. Due to the old
retry_select_destinations decorator, conductor will retry
the select_destinations RPC call up to max_attempts times,
so thrice by default. This can clobber the scheduler and
placement while the initial scheduler worker is still
trying to process the beefy request and allocate resources
in placement.

This has been recreated in a devstack test patch [1] and
shown to fail with 1000 instances in a single request with
the default rpc_response_timeout of 60 seconds. Changing the
rpc_response_timeout to 300 avoids the MessagingTimeout and
retry loop.

Since Rocky we have the long_rpc_timeout config option which
defaults to 1800 seconds. The RPC client can thus be changed
to heartbeat the scheduler service during the RPC call every
$rpc_response_timeout seconds with a hard timeout of
$long_rpc_timeout. That change is made here.

As a result, the problematic retry_select_destinations
decorator is also no longer necessary and removed here. That
decorator was added in I2b891bf6d0a3d8f45fd98ca54a665ae78eab78b3
and was a hack for scheduler high availability where a
MessagingTimeout was assumed to be a result of the scheduler
service dying so retrying the request was reasonable to hit
another scheduler worker, but is clearly not sufficient
in the large multi-create case, and long_rpc_timeout is a
better fit for that HA type scenario to heartbeat the scheduler
service.

[1] https://review.openstack.org/507918/

Change-Id: I87d89967bbc5fbf59cf44d9a63eb6e9d477ac1f3
Closes-Bug: #1795992
2018-11-20 09:03:53 -05:00
Zuul
ea26392239 Merge "Nix refs to ResourceProvider obj from libvirt UT" 2018-11-20 11:29:56 +00:00
Zuul
ab78eb2c79 Merge "Fix server query examples" 2018-11-20 04:57:11 +00:00
Takashi NATSUME
54d3745101 Fix server query examples
The 'locked' query parameter is not supported
in the "List Servers Detailed" API.
So replace examples using the 'locked' query parameter
with examples using other query parameters.

Change-Id: Ibcea6147dd6716ad544e7ac5fa0df17f8c397a28
Closes-Bug: #1801904
2018-11-19 23:22:39 +00:00
Eric Fried
440c268e36 Nix refs to ResourceProvider obj from libvirt UT
Since we're extracting placement, we shouldn't be referring to artifacts
in placement namespaces anymore. This patch removes such a reference
from the libvirt driver unit tests.

Change-Id: Idc0c2a0c0f885a21dff412bea761bac82a029eb5
2018-11-19 23:08:20 +00:00
Chris Dent
9875c37d9a Skip double word hacking test
Because of a change [1] in the tokenize package in the stdlib of
recent pythons, one of the tests for the hacking checks can fail.
This change skips the test on newer pythons and leaves a TODO
to fix it.

[1] see https://bugs.python.org/issue33899 and
    https://bugs.python.org/issue35107

Change-Id: I64744a8144fcf630eea609eb2b2d14974f4fd4bb
Related-Bug: #1804062
2018-11-19 21:54:57 +00:00
Zuul
3e756ff674 Merge "doc: Add minimal documentation for MKS consoles" 2018-11-19 04:34:18 +00:00
Zuul
44dfb58ef4 Merge "doc: Add minimal documentation for RDP consoles" 2018-11-19 04:34:11 +00:00
Zuul
238184b23c Merge "doc: Rewrite the console doc" 2018-11-19 04:34:02 +00:00
Jens Harbott
fd540e2135 Fix regression in glance client call
In [0] the way parameters are passed to the glance client was changed.
Sadly, one required argument was dropped in the process; we need to insert
it again in order to fix e.g. rbd backend usage.

[0] https://review.openstack.org/614351

Change-Id: I5a4cfb3c9b8125eca4f6c9561d3023537e606a93
Closes-Bug: 1803717
2018-11-16 14:50:41 +00:00
Takashi NATSUME
0e718ddb7a Add description of custom resource classes
Add the description about custom resource classes and
overriding standard resource classes in the "Flavors" document.

Change-Id: I5b804db70d229696e7b7c5b5db16946cf1f1c49f
Closes-Bug: #1800663
2018-11-14 15:47:16 +00:00
Zuul
5adfb64c6c Merge "Update compute API.get() stubs in test_server_actions" 2018-11-13 22:07:39 +00:00
Zuul
6fd549399e Merge "Update compute API.get() stubs in test_serversV21" 2018-11-13 22:07:30 +00:00
Zuul
7cc7969e02 Merge "Update compute API.get() mocks in test_server_metadata" 2018-11-13 22:07:23 +00:00
Zuul
fdff55ae7f Merge "Fix a help string in nova-manage" 2018-11-13 21:38:39 +00:00
Zuul
986ee11d82 Merge "Make _instances_cores_ram_count() be smart about cells" 2018-11-13 17:55:52 +00:00
Surya Seetharaman
7788165925 Make _instances_cores_ram_count() be smart about cells
This makes the _instances_cores_ram_count() method only query for instances
in cells that the tenant actually has instances landed in. We do this by
getting a list of cell mappings that have instance mappings owned by the
project and limiting the scatter/gather operation to just those cells.

Change-Id: I0e2a9b2460145d3aee92f7fddc4f4da16af63ff8
Closes-Bug: #1771810
2018-11-13 03:35:33 -05:00
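The idea in 7788165925 (only query cells in which the project actually has instances) can be sketched with plain data structures. `cells_for_project`, `count_cores_ram` and the shapes of their arguments are hypothetical; nova's real method works against cell and instance mapping records and a scatter/gather helper.

```python
def cells_for_project(instance_mappings, project_id):
    """Return the set of cells holding instances owned by project_id.

    `instance_mappings` is a hypothetical list of
    (project_id, cell_name) pairs.
    """
    return {cell for owner, cell in instance_mappings if owner == project_id}


def count_cores_ram(instances_by_cell, instance_mappings, project_id):
    """Sum vcpus and memory_mb, visiting only the relevant cells."""
    cores = ram = 0
    for cell in cells_for_project(instance_mappings, project_id):
        for inst in instances_by_cell.get(cell, []):
            if inst["project_id"] == project_id:
                cores += inst["vcpus"]
                ram += inst["memory_mb"]
    return {"cores": cores, "ram": ram}
```

Cells that hold no instance mappings for the project are never touched, which is the saving the commit is after.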
Jens Harbott
14d98ef1b4 Make supports_direct_io work on 4096b sector size
The current check uses an alignment of 512 bytes and will fail when the
underlying device has sectors of size 4096 bytes, as is common e.g. for
NVMe disks. So use an alignment of 4096 bytes, which is a multiple of
512 bytes and thus will cover both cases.

Change-Id: I5151ae01e90506747860d9780547b0d4ce91d8bc
Closes-Bug: 1801702
Co-Authored-By: Alexandre Arents <alexandre.arents@corp.ovh.com>
2018-11-13 02:17:32 +00:00
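The arithmetic behind 14d98ef1b4 can be checked directly: a 4096-byte alignment is a multiple of both sector sizes, while a 512-byte alignment is not a multiple of 4096. The helper `is_aligned` is a hypothetical name used only for this illustration and does not perform any direct I/O.

```python
def is_aligned(value, sector_size):
    """True if value is a whole multiple of sector_size."""
    return value % sector_size == 0


# Check each candidate alignment against both sector sizes.
for align in (512, 4096):
    ok = [s for s in (512, 4096) if is_aligned(align, s)]
    print(align, "satisfies sector sizes", ok)
```

Only the 4096-byte alignment satisfies both sector sizes, which is why the check switches to it.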