Revert "Handle Neutron errors in _post_live_migration()"

This reverts commit 9f205c620e.

There's a lot of context here, so strap in for a long read.

There are two unrelated bugs, but they intersect in their fixes.

First, a bit of history. In Rocky, as part of the
bp/neutron-new-port-binding-api blueprint, we started using Neutron's
extended port binding API during live migration [1]. In short, that
API allows a port to be bound to two hosts (source and destination),
with the binding being active on only one. Not all versions and
configurations of Neutron support the extended port binding API, so
Nova needed a sentinel to determine whether the extended port binding
API was in use during a live migration. The `migrate_data.vifs` OVO
field was that sentinel [2]. Its presence indicated that Neutron's
extended port binding API was being used.

When SRIOV live migration was added in Train [3], it started
unconditionally setting `migrate_data.vifs` [4]. In cases where
Neutron did not support the extended port binding API, this resulted
in incorrect information in `migrate_data.vifs`. The source vif
information was correct, but, without the ability in Neutron to have
port binding on both the source and the destination, the destination
vif information was missing. This causes libvirt's
_pre_live_migration_plug_vifs() method (introduced in the same
bp/neutron-new-port-binding-api series as [1]) to explode when it
attempts to get the destination vifs [5]. This is bug 1888395, and is
being fixed in [6] by no longer unconditionally setting
`migrate_data.vifs`, and only setting it if Neutron supports the
extended port binding API.

How is any of this relevant to the fix for bug 1879787, which is
what's being reverted here? First, some context refresher.

In bug 1879787, the call to network_api.get_instance_nw_info() in
_post_live_migration() on the source compute manager eventually calls
out to the Neutron REST API. If this fails, the exception is
unhandled, and the migrating instance - which is fully running on the
destination at this point - will never be updated in the database.
This update normally happens later in
post_live_migration_at_destination().

The network_info variable obtained from get_instance_nw_info() is used
for two things: notifications - which aren't critical - and unplugging
the instance's vifs on the source - which is very important!

The latter is done in a call to
driver.post_live_migration_at_source(), which is passed unplug_nw_info
to tell it which vifs to unplug. unplug_nw_info contains either
network_info, or is constructed from `migrate_data.vifs` if
`migrate_data.vifs` exists. Based on [4], `migrate_data.vifs` was
assumed to always exist. When [6] merges, this will no longer be the
case.  Thus, a different fix for bug 1879787 is needed, and the
previously merged one needs to be reverted.

[1] https://review.opendev.org/#/c/522537/
[2] https://review.opendev.org/#/c/522537/35/nova/conductor/tasks/live_migrate.py@255

[3] https://review.opendev.org/#/c/620115/
[4] https://review.opendev.org/#/c/620115/35/nova/compute/manager.py@6227

[5] https://review.opendev.org/#/c/566933/7/nova/virt/libvirt/driver.py@7577

[6] https://review.opendev.org/#/c/742180/

Change-Id: Ifa360b3ab2873fd94afdce1faddf61ff9d06163f
This commit is contained in:
Artom Lifshitz 2020-08-20 14:29:05 -04:00
parent f521f4dbac
commit 64b4271279
2 changed files with 7 additions and 10 deletions

View File

@ -8382,14 +8382,7 @@ class ComputeManager(manager.Manager):
# Releasing vlan.
# (not necessary in current implementation?)
network_info = None
try:
network_info = self.network_api.get_instance_nw_info(
ctxt, instance)
except Exception as e:
LOG.info('Unable to obtain network info: %s. Network info in '
'live.migration._post.start notification will be '
'omitted.', e, instance=instance)
network_info = self.network_api.get_instance_nw_info(ctxt, instance)
self._notify_about_instance_usage(ctxt, instance,
"live_migration._post.start",

View File

@ -215,6 +215,10 @@ class LiveMigrationNeutronFailure(integrated_helpers._IntegratedTestBase):
with mock.patch.object(self.computes['src'].manager,
'_post_live_migration',
side_effect=stub_plm):
self._live_migrate(server, 'completed')
# FIXME(artom) Until bug 1879787 is fixed, the raised
# ConnectionError will go unhandled, the migration will fail, and
# the instance will still be reported as being on the source, even
# though it's actually running on the destination.
self._live_migrate(server, 'error', server_expected_state='ERROR')
server = self.api.get_server(server['id'])
self.assertEqual('dest', server['OS-EXT-SRV-ATTR:host'])
self.assertEqual('src', server['OS-EXT-SRV-ATTR:host'])