
Enhance service restart in functional env

Bugfix Icaf1bae8cb040b939f916a19ce026031ddb84af7 showed that restarting
a compute service in the functional env is unrealistic, causing faults
to slip through. During that bug fix, only the minimal change needed to
reproduce the reported fault was made to the compute service restart in
the functional env. However, the restart of the compute service could
be made even more realistic.

This patch simulates a compute service restart in the functional env
by stopping the original compute service and starting a totally new
compute service for the same host and node. This way we can make sure
that we get a brand new ComputeManager in the new service and no
state can leak between the old and the new service.
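In outline, the restart helper follows this pattern (a minimal sketch
with hypothetical `Service`/`Manager` names standing in for the Nova
service machinery, not actual Nova classes):

```python
class Manager:
    """Hypothetical manager holding per-service internal state."""

    def __init__(self):
        self.internal_state = {}


class Service:
    """Hypothetical service wrapper; each start builds a new manager."""

    def __init__(self, host, node):
        self.host = host
        self.node = node
        self.manager = Manager()

    def stop(self):
        pass  # the real service tears down RPC, periodic tasks, etc.


def restart_service(service):
    """Stop the old service and start a brand new one for the same
    host and node, so no manager state can leak across the restart."""
    service.stop()
    return Service(service.host, service.node)


old = Service("host1", "node1")
old.manager.internal_state["leaked"] = True

new = restart_service(old)
assert new.manager is not old.manager    # a brand new manager instance
assert new.manager.internal_state == {}  # no state leaked from the old one
```

The key design point is that nothing from the old manager is reused:
the new service serves the same host and node but starts from a freshly
initialized manager.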

This change revealed another shortcoming of the functional env.
In the real world the nova-compute service can be restarted without
losing any running servers on the compute host. But with the naive
implementation of this change the compute service is re-created. This
means that a new ComputeManager is instantiated, which loads a new
FakeDriver instance as well. That new FakeDriver instance then reports
an empty hypervisor. This behavior is not totally unrealistic, as it
simulates a compute host restart that also cleans the hypervisor state
(e.g. compute host redeployment). However, this type of restart exposes
another bug in the code path that destroys and deallocates evacuated
instances from the source host. Therefore this patch implements the
compute service restart in a way that simulates only a service restart
and not a full compute host restart. A subsequent patch will add a test
that uses the clean-hypervisor case to reproduce the revealed bug.
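The driver-preservation trick this patch uses can be sketched outside
of Nova with plain `unittest.mock` (the `load_driver`/`Service` names
below are hypothetical stand-ins for `nova.virt.driver.load_compute_driver`
and the compute service, not real Nova APIs):

```python
from unittest import mock


def load_driver():
    """Stand-in for the driver loader: on a normal service start it
    returns a brand new driver reporting an empty hypervisor."""
    return {"instances": []}


class Service:
    """Hypothetical service that loads a fresh driver on start."""

    def __init__(self):
        self.driver = load_driver()


def restart_service(old_service, keep_hypervisor_state=True):
    """Start a new service; optionally carry over the old driver so
    the 'hypervisor' state (defined instances) survives the restart."""
    if keep_hypervisor_state:
        old_driver = old_service.driver
        # Patch the loader so the new service picks up the saved driver
        # instead of instantiating an empty one.
        with mock.patch(f"{__name__}.load_driver",
                        return_value=old_driver):
            return Service()
    return Service()


svc = Service()
svc.driver["instances"].append("vm1")

restarted = restart_service(svc)  # plain service restart
assert restarted.driver["instances"] == ["vm1"]

redeployed = restart_service(svc, keep_hypervisor_state=False)
assert redeployed.driver["instances"] == []  # clean hypervisor
```

This mirrors the two restart flavors the patch distinguishes: injecting
the old driver simulates a pure service restart, while loading a fresh
driver simulates a host redeployment with a cleaned hypervisor.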

Related-Bug: #1724172

On stable/stein:

Closes-Bug: #1859766

Conflicts:
    doc/notification_samples/libvirt-connect-error.json
    nova/test.py
    nova/tests/functional/libvirt/test_reshape.py
    nova/tests/functional/test_servers.py

NOTE(elod.illes): file conflict details:
* libvirt-connect-error.json:
  File added only in Stein with libvirt.error notification
  transformation patch I7d2287ce06d77c0afdef0ea8bdfb70f6c52d3c50
* test.py:
  Patches Iecf4dcf8e648c9191bf8846428683ec81812c026 (Remove patching
  the mock lib) and Ibb8c12fb2799bb5ceb9e3d72a2b86dbb4f14451e (Use a
  static resource tracker in compute manager) were not backported to
  Rocky
* test_reshape.py:
  File added only in Stein as part of the 'Handling Reshaped Provider
  Trees' feature, with patch Ide797ebf7790d69042ae275ebec6ced3fa4787b6
* test_servers.py:
  Patch I7cbd5d9fb875ebf72995362e0b6693492ce32051 (Reject forced move
  with nested source allocation) is not present in Rocky as it is part
  of 'Nested Resource Providers - Allocation Candidates' implemented in
  Stein

Change-Id: I9d6cd6259659a35383c0c9c21db72a9434ba86b1
(cherry picked from commit 2794748d9c)
(cherry picked from commit b874c409c1)
Ref: changes/33/713033/1
Author: Balazs Gibizer (1 year ago)
Committed by: Elod Illes
Parent commit: 53a893f7c9

3 changed files with 50 additions and 17 deletions:
  1. nova/test.py (+45, -12)
  2. nova/tests/functional/regressions/test_bug_1794996.py (+1, -1)
  3. nova/tests/functional/test_servers.py (+4, -4)

nova/test.py (+45, -12)

@@ -432,20 +432,29 @@ class TestCase(testtools.TestCase):
         ctxt = context.get_context()
         cell_name = kwargs.pop('cell', CELL1_NAME) or CELL1_NAME
         cell = self.cell_mappings[cell_name]
-        hm = objects.HostMapping(context=ctxt,
-                                 host=host or name,
-                                 cell_mapping=cell)
-        hm.create()
-        self.host_mappings[hm.host] = hm
+        if (host or name) not in self.host_mappings:
+            # NOTE(gibi): If the HostMapping does not exist then this is
+            # the first start of the service so we create the mapping.
+            hm = objects.HostMapping(context=ctxt,
+                                     host=host or name,
+                                     cell_mapping=cell)
+            hm.create()
+            self.host_mappings[hm.host] = hm
         svc = self.useFixture(
             nova_fixtures.ServiceFixture(name, host, cell=cell, **kwargs))

         return svc.service

-    def restart_compute_service(self, compute):
-        """Restart a compute service in a realistic way.
+    def restart_compute_service(self, compute, keep_hypervisor_state=True):
+        """Stops the service and starts a new one to have realistic restart
+
+        :param:compute: the nova-compute service to be restarted
+        :param:keep_hypervisor_state: If true then already defined instances
+                                      will survive the compute service restart.
+                                      If false then the new service will see
+                                      an empty hypervisor
+        :returns: a new compute service instance serving the same host and
+                  node
         """

         # NOTE(gibi): The service interface cannot be used to simulate a real
@@ -455,12 +464,36 @@ class TestCase(testtools.TestCase):
         # a stop start. The service.kill() call cannot help as it deletes
         # the service from the DB which is unrealistic and causes that some
         # operation that refers to the killed host (e.g. evacuate) fails.
-        # So this helper method tries to simulate a better compute service
-        # restart by cleaning up some of the internal state of the compute
-        # manager.
+        # So this helper method will stop the original service and then start
+        # a brand new compute service for the same host and node. This way
+        # a new ComputeManager instance will be created and initialized during
+        # the service startup.
         compute.stop()
-        compute.manager._resource_tracker = None
-        compute.start()
+
+        # this service was running previously so we have to make sure that
+        # we restart it in the same cell
+        cell_name = self.host_mappings[compute.host].cell_mapping.name
+
+        if keep_hypervisor_state:
+            # NOTE(gibi): FakeDriver does not provide a meaningful way to
+            # define some servers that exist already on the hypervisor when
+            # the driver is (re)created during the service startup. This means
+            # that we cannot simulate that the definition of a server
+            # survives a nova-compute service restart on the hypervisor.
+            # Instead here we save the FakeDriver instance that knows about
+            # the defined servers and inject that driver into the new Manager
+            # class during the startup of the compute service.
+            old_driver = compute.manager.driver
+            with mock.patch(
+                    'nova.virt.driver.load_compute_driver') as load_driver:
+                load_driver.return_value = old_driver
+                new_compute = self.start_service(
+                    'compute', host=compute.host, cell=cell_name)
+        else:
+            new_compute = self.start_service(
+                'compute', host=compute.host, cell=cell_name)
+
+        return new_compute

     def assertJsonEqual(self, expected, observed, message=''):
         """Asserts that 2 complex data structures are json equivalent.


nova/tests/functional/regressions/test_bug_1794996.py (+1, -1)

@@ -176,4 +176,4 @@ class TestEvacuateDeleteServerRestartOriginalCompute(
         self._delete_and_check_allocations(server)

         # restart the source compute
-        self.restart_compute_service(self.compute1)
+        self.compute1 = self.restart_compute_service(self.compute1)

nova/tests/functional/test_servers.py (+4, -4)

@@ -2310,7 +2310,7 @@ class ServerMovingTests(integrated_helpers.ProviderUsageBaseTestCase):
         self.assertFlavorMatchesAllocation(self.flavor1, dest_allocation)

         # restart the source compute
-        self.restart_compute_service(self.compute1)
+        self.compute1 = self.restart_compute_service(self.compute1)

         self.admin_api.put_service(
             source_compute_id, {'forced_down': 'false'})
@@ -2387,7 +2387,7 @@ class ServerMovingTests(integrated_helpers.ProviderUsageBaseTestCase):
         self.assertFlavorMatchesAllocation(self.flavor1, dest_allocation)

         # restart the source compute
-        self.restart_compute_service(self.compute1)
+        self.compute1 = self.restart_compute_service(self.compute1)
         self.admin_api.put_service(
             source_compute_id, {'forced_down': 'false'})
@@ -2473,7 +2473,7 @@ class ServerMovingTests(integrated_helpers.ProviderUsageBaseTestCase):
         self.assertFlavorMatchesAllocation(self.flavor1, source_allocation)

         # restart the source compute
-        self.restart_compute_service(self.compute1)
+        self.compute1 = self.restart_compute_service(self.compute1)
         self.admin_api.put_service(
             source_compute_id, {'forced_down': 'false'})
@@ -2546,7 +2546,7 @@ class ServerMovingTests(integrated_helpers.ProviderUsageBaseTestCase):
         self.assertFlavorMatchesAllocation(self.flavor1, source_allocation)

         # restart the source compute
-        self.restart_compute_service(self.compute1)
+        self.compute1 = self.restart_compute_service(self.compute1)
         self.admin_api.put_service(
             source_compute_id, {'forced_down': 'false'})


