Fix AttributeError in RT._update_usage_from_migration

Change Ieb539c9a0cfbac743c579a1633234537a8e3e3ee in Stein
added some logging in _update_usage_from_migration to log
the flavor for an inbound and outbound migration.

If an instance is resized and the resize is confirmed immediately,
ComputeManager._confirm_resize can set instance.old_flavor to None
before the migration status changes to "confirmed". If the
update_available_resource periodic task runs in that window,
_update_usage_from_migration hits an AttributeError when it tries to
log instance.old_flavor.flavorid, because instance.old_flavor is None.

There are a few key points there:

- We get into _update_usage_from_migration because the
  _update_available_resource method gets in-progress migrations
  related to the host (in this case the source compute) and the
  migration is considered in-progress until its status is "confirmed".

- The instance is not in the tracked_instances dict when
  _update_usage_from_migration runs because RT only tracks instances
  where the instance.host matches the RT.host and in this case the
  instance has been resized to another compute and the instance.host
  is pointing at the dest compute.
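
The two conditions above can be sketched in isolation. This is only an
illustration under simplifying assumptions, not Nova code: FakeMigration,
FakeInstance, is_in_progress and tracked_instances_for are hypothetical
names, and the real in-progress filter considers more statuses than the
single "confirmed" check shown here.

```python
from dataclasses import dataclass


@dataclass
class FakeMigration:
    """Hypothetical stand-in for a migration record; not a Nova object."""
    instance_uuid: str
    source_compute: str
    status: str


@dataclass
class FakeInstance:
    """Hypothetical stand-in for an instance record; not a Nova object."""
    uuid: str
    host: str


def is_in_progress(migration):
    # Simplifying assumption: only the "confirmed" status ends the
    # in-progress window. The real filter looks at more statuses.
    return migration.status != 'confirmed'


def tracked_instances_for(rt_host, instances):
    # The tracker only keeps instances whose host matches its own host.
    return {inst.uuid: inst for inst in instances if inst.host == rt_host}
```

For an instance already resized from a source host to a dest host, the
source tracker still sees the migration as in-progress (any status other
than "confirmed" in this sketch) while the instance itself is absent from
the source tracker's tracked instances.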

The fix here is to simply check if we got the instance.old_flavor and
not log the message if we do not have it, which gets us back to the old
behavior.
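
The failure and the guard can be reproduced outside of Nova with a minimal
sketch. FakeFlavor and the two log_outgoing_* functions are hypothetical
stand-ins, not Nova code; itype plays the role of the flavor looked up
from instance.old_flavor, which may be None during the race.

```python
import logging

LOG = logging.getLogger(__name__)


class FakeFlavor:
    """Hypothetical stand-in for a flavor object; not a Nova class."""

    def __init__(self, flavorid):
        self.flavorid = flavorid


def log_outgoing_unguarded(migration_uuid, itype):
    # Mirrors the pre-fix logging: itype is dereferenced unconditionally.
    # The arguments are evaluated before LOG.debug is called, so a None
    # itype raises AttributeError even when debug logging is disabled.
    LOG.debug('Starting to track outgoing migration %s with flavor %s',
              migration_uuid, itype.flavorid)


def log_outgoing_guarded(migration_uuid, itype):
    # Mirrors the fix: only log when the old flavor was found.
    # The boolean return value exists only to make the sketch testable.
    if itype:
        LOG.debug('Starting to track outgoing migration %s with '
                  'flavor %s', migration_uuid, itype.flavorid)
        return True
    return False
```

Calling log_outgoing_unguarded with itype=None raises AttributeError,
while log_outgoing_guarded skips the message and returns normally.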

This bug was found in CI job logs; the bug report links to the
matching hits in logstash.

As for the "incoming and not tracked" case in _update_usage_from_migration,
I have not modified it because I am not sure the same race exists there,
and I have not seen it in CI logs.

Change-Id: I43e34b3ff1424d42632a3e8f842c93508905aa1a
Closes-Bug: #1834349
tags/20.0.0.0rc1
Matt Riedemann 3 months ago
parent
commit
818419c9d3
2 changed files with 31 additions and 2 deletions
  1. nova/compute/resource_tracker.py (+8, -2)
  2. nova/tests/unit/compute/test_resource_tracker.py (+23, -0)

nova/compute/resource_tracker.py (+8, -2)

@@ -1114,8 +1114,14 @@ class ResourceTracker(object):
             itype = self._get_instance_type(instance, 'old_', migration)
             numa_topology = self._get_migration_context_resource(
                 'numa_topology', instance, prefix='old_')
-            LOG.debug('Starting to track outgoing migration %s with flavor %s',
-                      migration.uuid, itype.flavorid, instance=instance)
+            # We could be racing with confirm_resize setting the
+            # instance.old_flavor field to None before the migration status
+            # is "confirmed" so if we did not find the flavor in the outgoing
+            # resized instance we won't track it.
+            if itype:
+                LOG.debug('Starting to track outgoing migration %s with '
+                          'flavor %s', migration.uuid, itype.flavorid,
+                          instance=instance)
 
         if itype:
             cn = self.compute_nodes[nodename]

nova/tests/unit/compute/test_resource_tracker.py (+23, -0)

@@ -2761,6 +2761,29 @@ class TestUpdateUsageFromMigration(test.NoDBTestCase):
                                         _NODENAME)
         self.assertFalse(get_mock.called)
 
+    def test_missing_old_flavor_outbound_resize(self):
+        """Tests the case that an instance is not being tracked on the source
+        host because it has been resized to a dest host. The confirm_resize
+        operation in ComputeManager sets instance.old_flavor to None before
+        the migration.status is changed to "confirmed" so the source compute
+        RT considers it an in-progress migration and tries to update tracked
+        usage from the instance.old_flavor (which is None when
+        _update_usage_from_migration runs). This test just makes sure that the
+        RT method gracefully handles the instance.old_flavor being gone.
+        """
+        migration = _MIGRATION_FIXTURES['source-only']
+        rt = resource_tracker.ResourceTracker(
+            migration.source_compute, mock.sentinel.virt_driver)
+        ctxt = context.get_admin_context()
+        instance = objects.Instance(
+            uuid=migration.instance_uuid, old_flavor=None,
+            migration_context=objects.MigrationContext())
+        rt._update_usage_from_migration(
+            ctxt, instance, migration, migration.source_node)
+        self.assertNotIn('Starting to track outgoing migration',
+                         self.stdlog.logger.output)
+        self.assertNotIn(migration.instance_uuid, rt.tracked_migrations)
+
 
 class TestUpdateUsageFromMigrations(BaseTestCase):
     @mock.patch('nova.compute.resource_tracker.ResourceTracker.'