1580 Commits

Author SHA1 Message Date
Yikun Jiang
4c3698c0b6 libvirt: remove live_migration_progress_timeout config
Config option ``libvirt.live_migration_progress_timeout`` was
deprecated in Ocata, and can now be removed.

This patch removes live_migration_progress_timeout and also removes
the related migration progress timeout logic.

Change-Id: Ife89a705892ad96de6d5f8e68b6e4b99063a7512
blueprint: live-migration-force-after-timeout
2018-12-14 14:50:38 -05:00
Yikun Jiang
99a075cc94 libvirt: add live migration timeout action
This patch removes the automatic post-copy trigger and adds a new libvirt
configuration option, 'live_migration_completion_action'.

This option determines what action will be taken against a VM after
``live_migration_completion_timeout`` expires. It is set to
'abort' by default, which means the live migration will be aborted
once the completion timeout expires. If the option is set to
'force_complete', the VM will either be paused or post-copy will be
triggered, depending on whether post-copy is enabled and available.

Note that the progress-based post-copy triggering in the libvirt
driver will be removed in the next patch [1].

[1] Ife89a705892ad96de6d5f8e68b6e4b99063a7512

Change-Id: I0d286d12e588b431df3d94cf2e65d636bcdea2f8
blueprint: live-migration-force-after-timeout
2018-12-14 14:50:29 -05:00
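
A minimal sketch of the timeout-action dispatch the commit above describes; the callables (`abort`, `pause`, `force_post_copy`) are hypothetical stand-ins for the real driver operations, not Nova internals.

```python
def handle_completion_timeout(action, post_copy_enabled, post_copy_available,
                              abort, pause, force_post_copy):
    """Apply the configured 'live_migration_completion_action' once
    live_migration_completion_timeout expires."""
    if action == 'abort':
        # Default: give up on the migration entirely.
        abort()
    elif action == 'force_complete':
        # Force completion: switch to post-copy if it is both enabled and
        # available, otherwise pause the guest so its memory stops changing.
        if post_copy_enabled and post_copy_available:
            force_post_copy()
        else:
            pause()
    else:
        raise ValueError('unknown timeout action: %s' % action)


handle_completion_timeout(
    'force_complete', post_copy_enabled=True, post_copy_available=False,
    abort=lambda: print('aborting migration'),
    pause=lambda: print('pausing VM'),
    force_post_copy=lambda: print('switching to post-copy'))
```
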
Stephen Finucane
ae2e5650d1 Fail to live migration if instance has a NUMA topology
Live migration is currently totally broken if a NUMA topology is
present. This affects everything that's been regrettably stuffed in with
NUMA topology including CPU pinning, hugepage support and emulator
thread support. Side effects can range from simple unexpected
performance hits (due to instances running on the same cores) to
complete failures (due to instance cores or huge pages being mapped to
CPUs/NUMA nodes that don't exist on the destination host).

Until such a time as we resolve these issues, we should alert users to
the fact that such issues exist. A workaround option is provided for
operators that _really_ need the broken behavior, but it's defaulted to
False to highlight the brokenness of this feature to unsuspecting
operators.

Change-Id: I217fba9138132b107e9d62895d699d238392e761
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Related-bug: #1289064
2018-12-14 14:08:35 -05:00
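
A rough sketch of the pre-check this change implies, assuming a boolean workaround flag; the exception type and flag name here are illustrative, not taken from the commit.

```python
class MigrationPreCheckError(Exception):
    pass


def check_numa_live_migration(instance_numa_topology, workaround_enabled):
    """Refuse to live migrate a NUMA-topology instance unless the operator
    has explicitly opted into the known-broken behaviour."""
    if instance_numa_topology is not None and not workaround_enabled:
        raise MigrationPreCheckError(
            'Live migration of instances with a NUMA topology (CPU pinning, '
            'hugepages, emulator threads) is disabled; enable the workaround '
            'option to allow it anyway.')


# An instance with CPU pinning (hence a NUMA topology) is rejected by default.
try:
    check_numa_live_migration(instance_numa_topology=object(),
                              workaround_enabled=False)
except MigrationPreCheckError as exc:
    print(exc)
```
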
Zuul
2c7aa78980 Merge "Move nova-cells-v1 to experimental queue" 2018-12-11 01:23:35 +00:00
Zuul
f21a428a69 Merge "Add compute_node ratio online data migration script" 2018-12-10 10:11:32 +00:00
Balazs Gibizer
c6e7cc927f Final release note for versioned notification transformation
As the last versioned notification transformation patch
I019e88fabd1d386c0d6395a7b1969315873485fd has been merged
this patch adds a release note that the versioned interface is
complete.

Change-Id: I22586e470356cca5238b94faf257d8886742618f
Implements: bp versioned-notification-transformation-stein
2018-12-10 10:09:40 +01:00
Zuul
919c7c3f4b Merge "Use new `initial_xxx_allocation_ratio` CONF" 2018-12-08 15:32:38 +00:00
Matt Riedemann
e02fbb53d5 Move nova-cells-v1 to experimental queue
Cells v1 has been deprecated since Pike. CERN
has been running with cells v2 since Queens.
The cells v1 job used to be the only thing that
ran with nova-network, but we switched the job
to use neutron in Rocky:

  I9de6b710baffdffcd1d7ab19897d5776ef27ae7e

The cells v1 job also suffers from intermittent
test failures, like with snapshot tests.

Given the deprecated nature of cells v1 we should
just move it to the experimental queue so that it
can be run on-demand if desired but does not gate
on all nova changes, thus further moving along its
eventual removal.

This change also updates the cells v1 status doc
and adds some documentation about the different
job queues that nova uses for integration testing.

Change-Id: I74985f1946fffd0ae4d38604696d0d4656b6bf4e
Closes-Bug: #1807407
2018-12-07 10:59:37 -05:00
Zuul
61cd9ccc45 Merge "Change the default values of XXX_allocation_ratio" 2018-12-06 23:44:42 +00:00
Yikun Jiang
3562a6a957 Add compute_node ratio online data migration script
This patch adds an online data migration script to process compute
node allocation ratios with a 0.0 or None value.

If it's an existing record with 0.0 values, we want to do what
the compute does, which is to use the ``xxx_allocation_ratio``
config if it's not None, and fall back to using
``initial_xxx_allocation_ratio`` otherwise.

Change-Id: I3a6d4d3012b3fffe94f15a724dd78707966bb522
blueprint: initial-allocation-ratios
2018-12-05 11:36:23 -05:00
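
A minimal sketch of the fallback rule described above, using plain values rather than Nova objects:

```python
def migrated_ratio(record_value, conf_value, initial_value):
    """Return the allocation ratio a 0.0/None record should be migrated to:
    prefer the explicit xxx_allocation_ratio config when it is set, otherwise
    fall back to initial_xxx_allocation_ratio."""
    if record_value not in (None, 0.0):
        return record_value      # already a real value, leave it alone
    if conf_value is not None:
        return conf_value        # operator pinned the ratio in nova.conf
    return initial_value         # fall back to the initial default


assert migrated_ratio(0.0, None, 16.0) == 16.0
assert migrated_ratio(0.0, 3.0, 16.0) == 3.0
assert migrated_ratio(2.5, 3.0, 16.0) == 2.5
```
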
Zuul
5bf6f6304e Merge "Deprecate the nova-xvpvncproxy service" 2018-12-05 13:18:41 +00:00
Zuul
e26ac8f24a Merge "Deprecate the nova-console service" 2018-12-05 13:05:06 +00:00
Zuul
33c3759b85 Merge "SIGHUP n-cpu to clear provider tree cache" 2018-12-04 02:23:39 +00:00
Yikun Jiang
08f3ae9606 Use new `initial_xxx_allocation_ratio` CONF
This patch adds new ``initial_xxx_allocation_ratio`` CONF options
and modifies the resource tracker's initial compute node creation to
use these values.

During the update_available_resource periodic task, the allocation
ratios reported to inventory for VCPU, MEMORY_MB and DISK_GB will
be based on:

* If CONF.*_allocation_ratio is set, use it. This overrides everything
  including externally set allocation ratios via the placement API.
* If reporting inventory for the first time, the
  CONF.initial_*_allocation_ratio value is used.
* For everything else, the inventory reported remains unchanged which
  allows operators to set the allocation ratios on the inventory records
  in placement directly without worrying about nova-compute overwriting
  those changes.

As a result, several TODOs are removed from the virt drivers that
implement the update_provider_tree interface and a TODO in the resource
tracker about unset-ing allocation ratios to get back to initial values.

Change-Id: I14a310b20bd9892e7b34464e6baad49bf5928ece
blueprint: initial-allocation-ratios
2018-11-30 15:32:06 -05:00
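
The three-way decision in the bullets above can be sketched roughly as follows; this is a simplification of the resource tracker behaviour, not the actual code.

```python
def ratio_to_report(conf_ratio, initial_ratio, current_inventory_ratio):
    """Pick the allocation ratio reported to placement for one resource class.

    conf_ratio: CONF.xxx_allocation_ratio (None unless the operator set it)
    initial_ratio: CONF.initial_xxx_allocation_ratio
    current_inventory_ratio: the ratio already in placement, or None the
        first time inventory is reported for this provider
    """
    if conf_ratio is not None:
        return conf_ratio              # explicit config overrides everything
    if current_inventory_ratio is None:
        return initial_ratio           # first report: seed the initial value
    return current_inventory_ratio     # otherwise leave placement's value alone


print(ratio_to_report(None, 16.0, None))  # 16.0 on the first report
print(ratio_to_report(None, 16.0, 4.0))   # 4.0 set via the placement API survives
print(ratio_to_report(8.0, 16.0, 4.0))    # 8.0: the config override wins
```
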
Zuul
8446a1e58d Merge "Add I/O Semaphore to limit concurrent disk ops" 2018-11-30 03:25:18 +00:00
Eric Fried
bbc2fcb8fb SIGHUP n-cpu to clear provider tree cache
An earlier change [1] allowed
[compute]resource_provider_association_refresh to be set to zero to
disable the resource tracker's periodic refresh of its local copy of
provider traits and aggregates. To allow out-of-band changes to
placement (e.g. via the CLI) to be picked up by the resource tracker in
this configuration (or in a configuration where the timer is set to a high
value), this change clears the provider tree cache when SIGHUP is sent to
the compute service. The next periodic will repopulate it afresh from
placement.

[1] Iec33e656491848b26686fbf6fb5db4a4c94b9ea8

Change-Id: I65a7ee565ca5b3ec6c33a2fd9e39d461f7d90ed2
2018-11-29 15:42:08 -06:00
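
As an illustration of the mechanism (not the compute manager code itself), a service can hook SIGHUP to drop its cached copy so the next periodic task rebuilds it:

```python
import signal


class ProviderTreeCache(object):
    """Toy stand-in for the report client's local provider tree cache."""

    def __init__(self):
        self.data = {}

    def clear(self):
        print('clearing provider tree cache')
        self.data = {}

    def refresh_from_placement(self):
        print('repopulating cache from placement')
        self.data = {'traits': ['CUSTOM_EXAMPLE'], 'aggregates': ['agg1']}


cache = ProviderTreeCache()

# On SIGHUP only invalidate; the next periodic repopulates afresh.
if hasattr(signal, 'SIGHUP'):
    signal.signal(signal.SIGHUP, lambda signum, frame: cache.clear())


def update_available_resource_periodic():
    """What the periodic task would do when it finds an empty cache."""
    if not cache.data:
        cache.refresh_from_placement()


update_available_resource_periodic()
```
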
Matt Riedemann
d0ba488c1d Make [cinder]/catalog_info no longer require a service_name
The service_name part of the cinder catalog_info option is not
necessary to lookup the endpoint from the service catalog when
we have the endpoint type (volumev3) and the interface (publicURL).

This changes the option default to not include the service name
and no longer makes a service name required. If one is provided,
then it will be sent along to KSA/cinderclient, otherwise it is
omitted.

As this is a change in behavior of the config option, a release note
is added.

Change-Id: I89395fafffd60981fba17a7b09f7015e1f827b62
Closes-Bug: #1803627
2018-11-28 18:33:30 -05:00
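
For illustration, a hedged sketch of parsing the option, assuming the usual `<service_type>:<service_name>:<interface>` triple where the middle part may now be empty; the exact parsing in Nova may differ.

```python
def parse_catalog_info(value):
    """Split a [cinder]catalog_info value into its three parts."""
    service_type, service_name, interface = value.split(':')
    return {
        'service_type': service_type,
        # An empty service_name is simply omitted from the endpoint lookup.
        'service_name': service_name or None,
        'interface': interface,
    }


print(parse_catalog_info('volumev3::publicURL'))          # new-style default
print(parse_catalog_info('volumev3:cinderv3:publicURL'))  # name still honoured
```
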
Yikun Jiang
212eff600a Change the default values of XXX_allocation_ratio
This patch includes two changes:
1. Change the default values of
CONF.(cpu|ram|disk)_allocation_ratio to ``None``.

2. Change the resource tracker to overwrite the compute node's
allocation ratios with the value of the XXX_allocation_ratio
options if the value of those options is NOT ``None`` or ``0.0``.

The "0.0" condition is for upgrade impact, and it will be
removed in the next release (the T release).

Change-Id: I6893d63dc5f29bc2eb348fe0aa9fbc8490e6eb40
blueprint: initial-allocation-ratios
2018-11-28 16:30:14 +08:00
Zuul
594c653dc1 Merge "Add HPET timer support for x86 guests" 2018-11-24 16:50:57 +00:00
Zuul
1a1ea8e2aa Merge "Use long_rpc_timeout in select_destinations RPC call" 2018-11-21 23:51:14 +00:00
Jack Ding
728f20e8f4 Add I/O Semaphore to limit concurrent disk ops
Introduce an I/O semaphore to limit the number of concurrent
disk-IO-intensive operations. This could reduce disk contention from
image operations like image download, image format conversion, snapshot
extraction, etc.

The new config option max_concurrent_disk_ops can be set in nova.conf
per compute host and is virt-driver-agnostic. It defaults to 0,
which means no limit.

blueprint: io-semaphore-for-concurrent-disk-ops
Change-Id: I897999e8a4601694213f068367eae9608cdc7bbb
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-11-21 15:57:11 -05:00
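
A minimal sketch of the throttling idea, using a plain threading.Semaphore in place of the real (eventlet/oslo based) one; 0 means unlimited, matching the default described above.

```python
import threading
from contextlib import contextmanager


def make_disk_op_limiter(max_concurrent_disk_ops):
    """Return a context manager factory limiting concurrent disk-heavy ops."""
    if max_concurrent_disk_ops <= 0:
        @contextmanager
        def unlimited():
            yield
        return unlimited

    sem = threading.Semaphore(max_concurrent_disk_ops)

    @contextmanager
    def limited():
        with sem:
            yield
    return limited


disk_op = make_disk_op_limiter(max_concurrent_disk_ops=2)

with disk_op():
    pass  # e.g. image download, qemu-img convert, snapshot extraction
```
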
Jack Ding
9e884de68a Add HPET timer support for x86 guests
This commit adds support for the High Precision Event Timer (HPET) for
x86 guests in the libvirt driver. The timer can be enabled via the image
property 'hw_time_hpet'. By default it remains turned off; when it is
turned on, the HPET timer is activated in libvirt.

If the image property 'hw_time_hpet' is incorrectly set to a
non-boolean, the HPET timer remains turned off.

blueprint: support-hpet-on-guest
Change-Id: I3debf725544cae245fd31a8d97650392965d480a
Signed-off-by: Jack Ding <jack.ding@windriver.com>
2018-11-20 22:39:37 +00:00
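
A hedged sketch of how such a boolean image property might be interpreted; the real driver uses oslo.utils boolean parsing, so the accepted spellings here are only an approximation.

```python
def wants_hpet(image_properties):
    """Interpret the 'hw_time_hpet' image property as a boolean; anything
    that does not clearly parse as true leaves HPET turned off."""
    raw = str(image_properties.get('hw_time_hpet', 'false')).strip().lower()
    return raw in ('1', 'true', 'yes', 'on')


print(wants_hpet({'hw_time_hpet': 'true'}))        # True -> <timer name='hpet' present='yes'/>
print(wants_hpet({}))                              # False: off by default
print(wants_hpet({'hw_time_hpet': 'not-a-bool'}))  # False: invalid value ignored
```
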
Matt Riedemann
5af632e9ca Use long_rpc_timeout in select_destinations RPC call
Conductor RPC calls the scheduler to get hosts during
server create, which in a multi-create request with a
lot of servers and the default rpc_response_timeout, can
trigger a MessagingTimeout. Due to the old
retry_select_destinations decorator, conductor will retry
the select_destinations RPC call up to max_attempts times,
so thrice by default. This can clobber the scheduler and
placement while the initial scheduler worker is still
trying to process the beefy request and allocate resources
in placement.

This has been recreated in a devstack test patch [1] and
shown to fail with 1000 instances in a single request with
the default rpc_response_timeout of 60 seconds. Changing the
rpc_response_timeout to 300 avoids the MessagingTimeout and
retry loop.

Since Rocky we have the long_rpc_timeout config option which
defaults to 1800 seconds. The RPC client can thus be changed
to heartbeat the scheduler service during the RPC call every
$rpc_response_timeout seconds with a hard timeout of
$long_rpc_timeout. That change is made here.

As a result, the problematic retry_select_destinations
decorator is also no longer necessary and removed here. That
decorator was added in I2b891bf6d0a3d8f45fd98ca54a665ae78eab78b3
and was a hack for scheduler high availability: a MessagingTimeout
was assumed to mean the scheduler service had died, so retrying the
request was a reasonable way to hit another scheduler worker. That is
clearly not sufficient in the large multi-create case, and
long_rpc_timeout, with its heartbeating of the scheduler service, is a
better fit for that HA scenario.

[1] https://review.openstack.org/507918/

Change-Id: I87d89967bbc5fbf59cf44d9a63eb6e9d477ac1f3
Closes-Bug: #1795992
2018-11-20 09:03:53 -05:00
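
The heartbeat-plus-hard-timeout pattern can be sketched with oslo.messaging's call monitor support roughly as below; the target details and argument plumbing are simplified placeholders, not the real scheduler RPC API.

```python
import oslo_messaging as messaging


def select_destinations_call(transport, context, request_spec,
                             rpc_response_timeout, long_rpc_timeout):
    target = messaging.Target(topic='scheduler')  # placeholder target
    client = messaging.RPCClient(transport, target)
    # Heartbeat the scheduler every rpc_response_timeout seconds, but only
    # give up after long_rpc_timeout seconds in total.
    cctxt = client.prepare(call_monitor_timeout=rpc_response_timeout,
                           timeout=long_rpc_timeout)
    return cctxt.call(context, 'select_destinations', spec_obj=request_spec)
```
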
Jens Harbott
14d98ef1b4 Make supports_direct_io work on 4096b sector size
The current check uses an alignment of 512 bytes and will fail when the
underlying device has sectors of size 4096 bytes, as is common e.g. for
NVMe disks. So use an alignment of 4096 bytes, which is a multiple of
512 bytes and thus will cover both cases.

Change-Id: I5151ae01e90506747860d9780547b0d4ce91d8bc
Closes-Bug: 1801702
Co-Authored-By: Alexandre Arents <alexandre.arents@corp.ovh.com>
2018-11-13 02:17:32 +00:00
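
A rough sketch of the probe described above: write a single 4096-byte, page-aligned block with O_DIRECT, so the check passes on both 512-byte and 4096-byte sector devices (the helper name and cleanup details are illustrative).

```python
import mmap
import os


def supports_direct_io(dirpath, align=4096):
    """Probe whether O_DIRECT writes work for files under dirpath."""
    if not hasattr(os, 'O_DIRECT'):
        return False
    testfile = os.path.join(dirpath, '.directio.test')
    fd = None
    try:
        fd = os.open(testfile, os.O_CREAT | os.O_WRONLY | os.O_DIRECT)
        # mmap-backed buffers are page aligned, which also satisfies
        # O_DIRECT's buffer-address alignment requirement.
        buf = mmap.mmap(-1, align)
        buf.write(b'x' * align)
        os.write(fd, buf)
        return True
    except OSError:
        return False
    finally:
        if fd is not None:
            os.close(fd)
        if os.path.exists(testfile):
            os.unlink(testfile)


print(supports_direct_io('/tmp'))
```
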
Eric Fried
11a5fcbb6a Allow resource_provider_association_refresh=0
With this change, [compute]resource_provider_association_refresh is
allowed to be zero, which will disable refreshing resource provider
traits and aggregates.

Inventories are still refreshed in a different code path.

A subsequent patch will be submitted to allow manual refresh by sending
SIGHUP to the compute process.

Change-Id: Iec33e656491848b26686fbf6fb5db4a4c94b9ea8
2018-11-06 11:06:44 -06:00
Zuul
b9697550ed Merge "Fix min config value for shutdown_timeout option" 2018-10-31 08:00:09 +00:00
Zuul
edee8e6f8d Merge "Add nova-status upgrade check for consoles" 2018-10-30 22:44:07 +00:00
Zuul
a279671984 Merge "Fix os-simple-tenant-usage result order" 2018-10-27 08:57:52 +00:00
Matt Riedemann
2cff865af4 Fix min config value for shutdown_timeout option
The shutdown_timeout config option was added in commit
c07ed15415c0ec3c5862f437f440632eff1e94df without a min
value. The min value was later added in commit
d67ea6e5549086eee1b39946648410f22d0041a9 and set to 1,
which means the option can never be configured to mean
"always immediately shutdown". That is also inconsistent
with the description of the "os_shutdown_timeout" image
property in the glance metadata definition which says the
value can be set to 0 to force an immediate shutdown of
the guest.

This fixes the min value in the config option to be 0 which
is already what happens if we are not performing a clean
shutdown at all.

Change-Id: I399b9031d2aa477194697390e2cd3f78e3ac0f91
Closes-Bug: #1799707
2018-10-26 11:58:01 -04:00
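
Roughly what the corrected option definition looks like with oslo.config (the default and help text are abbreviated, not copied from the Nova source):

```python
from oslo_config import cfg

shutdown_timeout_opt = cfg.IntOpt(
    'shutdown_timeout',
    default=60,
    min=0,  # 0 is now allowed: power the guest off immediately
    help='Total time in seconds to wait for an instance to perform a clean '
         'shutdown; 0 means do not wait at all.')

CONF = cfg.CONF
CONF.register_opt(shutdown_timeout_opt)
CONF([])
print(CONF.shutdown_timeout)
```
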
Lucian Petrut
afc3a16ce3 Fix os-simple-tenant-usage result order
nova usage-list can return incorrect results, having resources counted
twice. This only occurs when using the 2.40 microversion or later.

This microversion introduced pagination, which doesn't work properly.
Nova API will sort the instances using the tenant id and instance uuid,
but 'os-simple-tenant-usage' will not preserve the order when returning
the results.

For this reason, subsequent API calls made by the client will use the
wrong marker (which is supposed to be the last instance id), ending
up counting the same instances twice.

Change-Id: I6c7a67b23ec49aa207c33c38580acd834bb27e3c
Closes-Bug: #1796689
2018-10-26 14:47:52 +00:00
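
A toy model of the failure mode: when the server does not preserve the sort order within a page, the client derives the wrong marker and re-reads earlier records.

```python
def paginate(records, limit, marker=None):
    """Server side: records are sorted and the page starts after `marker`."""
    ordered = sorted(records)
    start = ordered.index(marker) + 1 if marker is not None else 0
    return ordered[start:start + limit]


def collect_all(records, limit, preserve_order):
    """Client side: request pages, using the last returned item as marker."""
    seen, marker = [], None
    while True:
        page = paginate(records, limit, marker)
        if not page:
            return seen
        if not preserve_order:
            page = list(reversed(page))  # simulate losing the sort order
        seen.extend(page)
        marker = page[-1]


records = ['a', 'b', 'c', 'd', 'e']
print(collect_all(records, limit=2, preserve_order=True))   # each record once
print(collect_all(records, limit=2, preserve_order=False))  # duplicates appear
```
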
melanie witt
d2535b0261 Add nova-status upgrade check for consoles
This will check whether a deployment is currently using consoles and warn
the operator to set [workarounds]enable_consoleauth = True on their
console proxy host if they are performing a rolling upgrade which is
not yet complete.

Partial-Bug: #1798188

Change-Id: Idd6079ce4038d6f19966e98bcc61422b61b3636b
2018-10-26 04:34:49 +00:00
Matt Riedemann
25dadb94db Remove the CachingScheduler
The CachingScheduler has been deprecated since Pike [1].
It does not use the placement service, and as more of nova
relies on placement for managing resource allocations,
maintaining compatibility for the CachingScheduler is
exorbitant.

The release note in this change goes into much more detail
about why the FilterScheduler + Placement should be a
sufficient replacement for the original justification
for the CachingScheduler along with details on how to migrate
from the CachingScheduler to the FilterScheduler.

Since the [scheduler]/driver configuration option does allow
loading out-of-tree drivers and the scheduler driver interface
does have the USES_ALLOCATION_CANDIDATES variable, it is
possible that there are drivers being used which are also not
using the placement service. The release note also explains this
but warns against it. However, as a result some existing
functional tests, which were using the CachingScheduler, are
updated to still test scheduling without allocations being
created in the placement service.

Over time we will likely remove the USES_ALLOCATION_CANDIDATES
variable in the scheduler driver interface along with the
compatibility code associated with it, but that is left for
a later change.

[1] Ia7ff98ff28b7265058845e46b277317a2bfc96d2

Change-Id: I1832da2190be5ef2b04953938860a56a43e8cddf
2018-10-18 17:55:36 -04:00
Zuul
0c5feb21b3 Merge "Handle online_data_migrations exceptions" 2018-10-17 07:07:06 +00:00
imacdonn
3eea37b85b Handle online_data_migrations exceptions
When online_data_migrations raise exceptions, nova/cinder-manage catches
the exceptions, prints fairly useless "something didn't work" messages,
and moves on. Two issues:

1) The user(/admin) has no way to see what actually failed (exception
   detail is not logged)

2) The command returns exit status 0, as if all possible migrations have
   been completed successfully - this can cause failures to get missed,
   especially if automated

This change adds logging of the exceptions, and introduces a new exit
status of 2, which indicates that no updates took effect in the last
batch attempt, but some are (still) failing, which requires intervention.

Change-Id: Ib684091af0b19e62396f6becc78c656c49a60504
Closes-Bug: #1796192
2018-10-16 15:49:51 +00:00
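
A simplified model of the exit-status mapping described above; the parameter names are illustrative, not nova-manage's.

```python
def migration_exit_status(remaining, migrated_this_run, errors):
    """0: nothing left to migrate.
    1: progress was made but records remain; run the command again.
    2: no updates took effect in this run and at least one migration raised,
       so operator intervention is required."""
    if errors and migrated_this_run == 0:
        return 2
    if remaining > 0:
        return 1
    return 0


print(migration_exit_status(remaining=0, migrated_this_run=10, errors=0))  # 0
print(migration_exit_status(remaining=5, migrated_this_run=10, errors=1))  # 1
print(migration_exit_status(remaining=5, migrated_this_run=0, errors=1))   # 2
```
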
Stephen Finucane
4e6cffe45e Deprecate the nova-xvpvncproxy service
This is a relic that has long since been replaced by the noVNC proxy
service. Start preparing for its removal.

Change-Id: Icb225dec3ad291b751e475bd3703ce0eb30b44db
2018-10-15 10:03:13 +01:00
Stephen Finucane
f18ae13e36 Deprecate the nova-console service
As discussed on the mailing list [1].

[1] http://lists.openstack.org/pipermail/openstack-dev/2018-October/135413.html

Change-Id: I1f1fa1d0f79bec5a4101e03bc2d43ba581dd35a0
2018-10-15 10:03:08 +01:00
Zuul
396156eb13 Merge "Add microversion 2.67 to support volume_type" 2018-10-13 18:46:09 +00:00
zhangbailin
c7f4190af2 Add microversion 2.67 to support volume_type
Add a new microversion 2.67 to support specifying ``volume_type``
when booting instances.

Part of bp boot-instance-specific-storage-backend
Change-Id: I13102243f7ce36a5d44c1790f3a633703373ebf7
2018-10-12 02:57:58 -04:00
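
A hedged example of the kind of boot request this enables (sent with the X-OpenStack-Nova-API-Version: 2.67 header); all field values are made up.

```python
import json

server_create_request = {
    'server': {
        'name': 'test-server',
        'flavorRef': '1',
        'networks': 'none',
        'block_device_mapping_v2': [{
            'boot_index': 0,
            'source_type': 'image',
            'destination_type': 'volume',
            'uuid': '70a599e0-31e7-49b7-b260-868f441e862b',
            'volume_size': 1,
            'volume_type': 'lvm-thin',  # accepted starting with 2.67
        }],
    },
}

print(json.dumps(server_create_request, indent=2))
```
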
Zuul
f8e46a5cf4 Merge "VMware: Live migration of instances" 2018-10-08 19:12:43 +00:00
Stephen Finucane
c19ecc34ea conf: Deprecated 'config_drive_format'
This is no longer necessary with our current minimum libvirt version.

Change-Id: Id2beaa7c4e5780199298f8e58fb6c7005e420a69
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2018-10-06 14:47:20 +01:00
Radoslav Gerganov
2fe92e9162 VMware: Live migration of instances
This patch implements live migration of instances across compute nodes.
Each compute node must manage a cluster in the same vCenter, and the ESX
hosts must have vMotion enabled [1].
If the instance is located on a datastore shared between source
and destination cluster, then only the host is changed. Otherwise, we
select the most suitable datastore on the destination cluster and
migrate the instance there.

[1] https://kb.vmware.com/s/article/2054994

Co-Authored-By: gkotton@vmware.com

blueprint vmware-live-migration

Change-Id: I640013383e684497b2d99a9e1d6817d68c4d0a4b
2018-10-02 10:13:57 +03:00
Balazs Gibizer
3a43a931d4 consumer gen: more tests for delete allocation cases
Confirming a migration, as well as a successful live migration, also
triggers the delete allocation code path. This patch adds test coverage
for these code paths.

If the deletion of the source allocation of a confirmed migration fails
then nova puts the instance to ERROR state. The instance still has two
allocations in this state and deleting the instance only deletes the one that
is held by the instance_uuid. This patch logs an ERROR describing that in this
case the allocation held by the migration_uuid is leaked. The same is true
for a live migration that fails to delete the allocation on the source host.

As this would make every caller of _delete_allocation_after_move log the
same error for the AllocationDeleteFailed exception, this patch moves that
logging into _delete_allocation_after_move.

Blueprint: use-nested-allocation-candidates
Change-Id: I99427a52676826990d2a2ffc82cf30ad945b939c
2018-09-26 13:30:07 +02:00
Balazs Gibizer
53ca096750 consumer gen: move_allocations
This patch renames the set_and_clear_allocations function in the
scheduler report client to move_allocations and adds handling of
consumer generation conflict for it. This call now moves everything from
one consumer to another and raises AllocationMoveFailed to the caller if
the move fails due to consumer generation conflict.

When migration or resize fails to move the source host allocation to the
migration_uuid then the API returns HTTP 409 and the migration is aborted.

If reverting a migration, a resize, or a resize to the same host fails to move
the source host allocation back to the instance_uuid due to a consumer
generation conflict, the instance will be put into ERROR state. The instance still has two
allocations in this state and deleting the instance only deletes the one that
is held by the instance_uuid. This patch logs an ERROR describing that in this
case the allocation held by the migration_uuid is leaked.

Blueprint: use-nested-allocation-candidates
Change-Id: Ie991d4b53e9bb5e7ec26da99219178ab7695abf6
2018-09-25 15:56:45 +00:00
Zuul
792cc425f2 Merge "Allow ability for non admin users to use all filters on server list." 2018-09-23 05:17:23 +00:00
Zuul
2274c08460 Merge "Remove deprecated hide_server_address_states option" 2018-09-21 13:58:57 +00:00
Zuul
ca39416b7c Merge "Making instance/migration listing skipping down cells configurable" 2018-09-21 13:53:20 +00:00
Zuul
1c1a111e5a Merge "Resource retrieving: add changes-before filter" 2018-09-21 11:48:29 +00:00
Surya Seetharaman
21c5f3e2e5 Making instance/migration listing skipping down cells configurable
Presently, if a cell is down, the instances in that cell are
skipped in the results. Sometimes this may not be desirable for
operators, as it may confuse users who saw more instances in
a previous listing than they do now. This patch adds a new API config
option, list_records_by_skipping_down_cells, which can be set to
False (True by default) if the operator prefers to return an
API error altogether when the user has any instance in a down cell,
instead of skipping it. This is essentially a configurable revert of
change I308b494ab07f6936bef94f4c9da45e9473e3534d for bug 1726301 so
that operators can opt into the 500 response behaviour during listing.

Change-Id: Id749761c58d4e1bc001b745d49b6ff0f3732e133
Related-Bug: #1726301
2018-09-20 22:02:26 +02:00
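
A tiny model of the toggle described above; the cell/record shapes are illustrative only.

```python
def build_instance_list(cells, list_records_by_skipping_down_cells):
    """`cells` maps a cell name to a list of instances, or None when the
    cell is down."""
    instances = []
    for name, records in cells.items():
        if records is None:  # cell is down
            if list_records_by_skipping_down_cells:
                continue     # default: silently skip the down cell
            # opt-in behaviour: surface an error (an API 500) instead
            raise RuntimeError('cell %s is unreachable' % name)
        instances.extend(records)
    return instances


cells = {'cell1': ['vm1', 'vm2'], 'cell2': None}
print(build_instance_list(cells, list_records_by_skipping_down_cells=True))
```
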
Matt Riedemann
9b69afd457 Remove deprecated hide_server_address_states option
The hide_server_address_states config option and related
policy rule were deprecated in Queens:

  I6040e8c2b3e132b0dfd09f82ae041b4786a63483

They are now removed in Stein as part of the API extension
merge effort.

Part of blueprint api-extensions-merge-stein

Change-Id: Ib3582038274dedbf524ffcaffe818ff0e751489d
2018-09-19 11:36:44 -04:00
zhangbailin
28c1075b59 Resource retrieving: add changes-before filter
This adds the changes-before filter to the servers,
os-instance-actions and os-migrations APIs for
filtering resources that were last updated at or before
the given time. The changes-before filter,
like the changes-since filter, will return deleted
server resources.

Part of bp support-to-query-nova-resources-filter-by-changes-before
Change-Id: If91c179e3823c8b0da744a9363906b0f7b05c326
2018-09-19 09:56:56 -04:00
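
A minimal sketch of the filter semantics; the dict shape stands in for the real API objects.

```python
from datetime import datetime, timezone


def apply_changes_before(resources, changes_before):
    """Keep resources last updated at or before `changes_before`; like
    changes-since, deleted records are returned rather than hidden."""
    return [r for r in resources if r['updated_at'] <= changes_before]


servers = [
    {'id': 'a', 'deleted': True,
     'updated_at': datetime(2018, 9, 1, tzinfo=timezone.utc)},
    {'id': 'b', 'deleted': False,
     'updated_at': datetime(2018, 9, 20, tzinfo=timezone.utc)},
]
cutoff = datetime(2018, 9, 10, tzinfo=timezone.utc)
print(apply_changes_before(servers, cutoff))  # only server 'a'
```
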