nova/nova/compute
Balazs Gibizer 34e0c0205b Store old_flavor already on source host during resize
During resize, on the source host, in resize_instance(), the instance.host
and .node are updated to point to the destination host. This indicates to
the source host's resource tracker that the allocation of this instance
does not need to be tracked as an instance but as an outbound migration
instead. However, for the source host's resource tracker to do that, it
needs to use the instance.old_flavor. Unfortunately the
instance.old_flavor is only set during finish_resize() on the
destination host (resize_instance() casts to finish_resize()). So it is
possible that a running resize_instance() sets the instance.host to point
to the destination and then, before finish_resize() can set the
old_flavor, an update_available_resources periodic runs on the source
host. As a result, the allocation of this instance is not tracked as
an instance, because the instance.host points to the destination, but it
is not tracked as a migration either, because the instance.old_flavor is
not yet set. So the allocation on the source host is simply dropped by
the periodic job.

When such a migration is confirmed, confirm_resize() tries to drop
the same resource allocation again but fails because the pinned CPUs of
the instance are already freed.

When such a migration is reverted instead, the revert succeeds, but the
source host's resource tracking will not include the allocation of the
instance until the next update_available_resources periodic runs
and corrects it.

This does not affect resources tracked exclusively in placement (e.g.
VCPU, MEMORY_MB, DISK_GB), but it does affect NUMA-related resources that
are still tracked in the resource tracker (e.g. huge pages, pinned
CPUs).

This patch moves the setting of instance.old_flavor to the source node,
into the same transaction that sets the instance.host to point to the
destination host. This resolves the race condition.
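
The race and the fix described above can be illustrated with a small toy
model. This is only a sketch and not the actual nova code: the Instance
dataclass, classify(), and the two resize_instance_* helpers are
hypothetical names introduced here, and the real resource tracker logic is
considerably more involved. It assumes that the fix saves the current
flavor as old_flavor in the same step that repoints instance.host.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    # Hypothetical, heavily simplified stand-in for nova's Instance object.
    host: str
    flavor: str
    old_flavor: Optional[str] = None


def classify(instance: Instance, tracker_host: str) -> str:
    """Toy model of how a host's periodic accounts for an instance."""
    if instance.host == tracker_host:
        return "instance"
    if instance.old_flavor is not None:
        # The source host can size the outbound migration via old_flavor.
        return "outbound_migration"
    # Neither an instance on this host nor a trackable migration.
    return "dropped"


def resize_instance_before_fix(instance: Instance, dest_host: str) -> None:
    # Only the host is repointed; old_flavor is left for finish_resize()
    # on the destination, which opens the race window.
    instance.host = dest_host


def resize_instance_after_fix(instance: Instance, dest_host: str) -> None:
    # old_flavor and host are set together, so the periodic never sees
    # host == destination without old_flavor being set.
    instance.old_flavor = instance.flavor
    instance.host = dest_host


racy = Instance(host="source", flavor="pinned.small")
resize_instance_before_fix(racy, "dest")
print(classify(racy, "source"))

fixed = Instance(host="source", flavor="pinned.small")
resize_instance_after_fix(fixed, "dest")
print(classify(fixed, "source"))
```

In this model, a periodic that runs between resize_instance() and
finish_resize() falls through to the "dropped" branch before the fix, and
reaches the "outbound_migration" branch after it.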

Change-Id: Ic0d6c59147abe5e094e8f13e0d3523b178daeba9
Closes-Bug: #1944759
(cherry picked from commit b841e55321)
(cherry picked from commit d4edcd62ba)
(cherry picked from commit c8b04d183f)
2021-09-27 14:39:52 +02:00
..
monitors Remove six.add_metaclass 2020-08-15 07:45:39 +00:00
__init__.py Remove nova.compute.*API() shims 2019-06-12 16:09:46 +01:00
api.py Handle instance = None in _local_delete_cleanup 2021-02-23 20:12:47 +00:00
build_results.py
claims.py objects: Add MigrationTypeField 2020-05-08 14:45:54 +01:00
flavors.py trivial: Remove dead code 2019-12-12 10:55:02 +00:00
instance_actions.py api: Log os-resetState as an instance action 2021-01-26 09:17:57 +00:00
instance_list.py Plumbing for ignoring list_records_by_skipping_down_cells 2019-02-08 16:28:28 -05:00
manager.py Store old_flavor already on source host during resize 2021-09-27 14:39:52 +02:00
migration_list.py Refactor scatter-gather utility to return exception objects 2018-10-31 15:18:07 -04:00
multi_cell_list.py Remove six.add_metaclass 2020-08-15 07:45:39 +00:00
power_state.py Removed enum duplication from nova.compute 2016-09-02 07:30:44 +00:00
provider_config.py Provider Config File: Coding style and test cases improvement 2020-09-01 01:05:34 +00:00
provider_tree.py Add resources dict into _Provider 2019-09-13 08:50:35 +00:00
resource_tracker.py Set instance host and drop migration under lock 2020-11-18 11:36:36 +01:00
rpcapi.py Update compute rpc version alias for victoria 2020-09-12 01:29:34 +09:00
stats.py Change consecutive build failure limit to a weigher 2018-06-06 15:18:50 -07:00
task_states.py Fix resource tracker updates during instance evacuation 2018-09-12 13:05:29 +03:00
utils.py virt: Remove 'is_xenapi' helper 2020-09-11 14:09:06 +01:00
vm_states.py Removed enum duplication from nova.compute 2016-09-02 07:30:44 +00:00