From 64b4271279035289c5baa94b50d37c0a7bfd3984 Mon Sep 17 00:00:00 2001 From: Artom Lifshitz Date: Thu, 20 Aug 2020 14:29:05 -0400 Subject: [PATCH] Revert "Handle Neutron errors in _post_live_migration()" This reverts commit 9f205c620e57c9af760d3b52c93a4ff6d5c0e618. There's a lot of context here, so strap in for a long read. There are two unrelated bugs, but they intersect in their fixes. First, a bit of history. In Rocky, as part of the bp/neutron-new-port-binding-api blueprint, we started using Neutron's extended port binding API during live migration [1]. In short, that API allows a port to be bound to two hosts (source and destination), with the binding being active on only one. Not all versions and configurations of Neutron support the extended port binding API, so Nova needed a sentinel to determine whether the extended port binding API was in use during a live migration. The `migrate_data.vifs` OVO field was that sentinel [2]. Its presence indicated that Neutron's extended port binding API was being used. When SRIOV live migration was added in Train [3], it started unconditionally setting `migrate_data.vifs` [4]. In cases where Neutron did not support the extended port binding API, this resulted in incorrect information in `migrate_data.vifs`. The source vif information was correct, but, without the ability in Neutron to have port binding on both the source and the destination, the destination vif information was missing. This causes libvirt's _pre_live_migration_plug_vifs() method (introduced in the same bp/neutron-new-port-binding-api series as [1]) to explode when it attempts to get the destination vifs [5]. This is bug 1888395, and is being fixed in [6] by no longer unconditionally setting `migrate_data.vifs`, and only setting it if Neutron supports the extended port binding API. How is any of this relevant to the fix for bug 1879787, which is what's being reverted here? First, some context refresher. In bug 1879787, the call to network_api.get_instance_nw_info() in _post_live_migration() on the source compute manager eventually calls out to the Neutron REST API. If this fails, the exception is unhandled, and the migrating instance - which is fully running on the destination at this point - will never be updated in the database. This update normally happens later in post_live_migration_at_destination(). The network_info variable obtained from get_instance_nw_info() is used for two things: notifications - which aren't critical - and unplugging the instance's vifs on the source - which is very important! The latter is done in a call to driver.post_live_migration_at_source(), which is passed unplug_nw_info to tell it which vifs to unplug. unplug_nw_info contains either network_info, or is constructed from `migrate_data.vifs` if `migrate_data.vifs` exists. Based on [4], `migrate_data.vifs` was assumed to always exist. When [6] merges, this will no longer be the case. Thus, a different fix for bug 1879787 is needed, and the previously merged one needs to be reverted. [1] https://review.opendev.org/#/c/522537/ [2] https://review.opendev.org/#/c/522537/35/nova/conductor/tasks/live_migrate.py@255 [3] https://review.opendev.org/#/c/620115/ [4] https://review.opendev.org/#/c/620115/35/nova/compute/manager.py@6227 [5] https://review.opendev.org/#/c/566933/7/nova/virt/libvirt/driver.py@7577 [6] https://review.opendev.org/#/c/742180/ Change-Id: Ifa360b3ab2873fd94afdce1faddf61ff9d06163f --- nova/compute/manager.py | 9 +-------- nova/tests/functional/compute/test_live_migration.py | 8 ++++++-- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/nova/compute/manager.py b/nova/compute/manager.py index b3317db582db..a1fb80622cc2 100644 --- a/nova/compute/manager.py +++ b/nova/compute/manager.py @@ -8382,14 +8382,7 @@ class ComputeManager(manager.Manager): # Releasing vlan. # (not necessary in current implementation?) - network_info = None - try: - network_info = self.network_api.get_instance_nw_info( - ctxt, instance) - except Exception as e: - LOG.info('Unable to obtain network info: %s. Network info in ' - 'live.migration._post.start notification will be ' - 'omitted.', e, instance=instance) + network_info = self.network_api.get_instance_nw_info(ctxt, instance) self._notify_about_instance_usage(ctxt, instance, "live_migration._post.start", diff --git a/nova/tests/functional/compute/test_live_migration.py b/nova/tests/functional/compute/test_live_migration.py index d64c49d711a8..57965ae24097 100644 --- a/nova/tests/functional/compute/test_live_migration.py +++ b/nova/tests/functional/compute/test_live_migration.py @@ -215,6 +215,10 @@ class LiveMigrationNeutronFailure(integrated_helpers._IntegratedTestBase): with mock.patch.object(self.computes['src'].manager, '_post_live_migration', side_effect=stub_plm): - self._live_migrate(server, 'completed') + # FIXME(artom) Until bug 1879787 is fixed, the raised + # ConnectionError will go unhandled, the migration will fail, and + # the instance will still be reported as being on the source, even + # though it's actually running on the destination. + self._live_migrate(server, 'error', server_expected_state='ERROR') server = self.api.get_server(server['id']) - self.assertEqual('dest', server['OS-EXT-SRV-ATTR:host']) + self.assertEqual('src', server['OS-EXT-SRV-ATTR:host'])