Exclude deleted service records when calling hypervisor statistics

Hypervisor statistics can be incorrect if deleted
service records are not excluded from the DB query.

A user may stop the 'nova-compute' service on some
compute nodes and delete the service from nova.
Deleting a 'nova-compute' service soft-deletes the
corresponding DB records in both the 'services'
table and the 'compute_nodes' table, but only if
the compute_nodes record is old, i.e. it is linked
to the service record. Modern compute_nodes records
aren't linked to the services table, so deleting
the services record will not delete the
compute_nodes record, and the ResourceTracker
won't recreate the compute_nodes record if the host
and hypervisor_hostname still match the existing
record. Either way, restarting the process after
deleting the service will create a new services
table record with the same host/binary/topic.

If the 'nova-compute' service on that server
restarts, it will automatically add a record
in the 'compute_nodes' table (assuming it was
deleted because it was an old-style record) and
also a corresponding record in the 'services'
table. If the host name of the compute node did
not change, the newly created records in the
'services' and 'compute_nodes' tables will be
identical to the previously soft-deleted records
except for the 'deleted' column.
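
The delete-and-restart sequence above can be sketched
with an in-memory SQLite database. This is only an
illustration, not nova's DB API: the table is reduced to
a few columns, the helpers create_service() and
soft_delete_service() are hypothetical, and the sketch
assumes that a soft delete sets 'deleted' to the row id
and that the unique constraint covers
(host, binary, deleted).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE services (
        id INTEGER PRIMARY KEY,
        host TEXT,
        "binary" TEXT,
        topic TEXT,
        deleted INTEGER NOT NULL DEFAULT 0,
        -- Assumption: the constraint includes the deleted column.
        UNIQUE (host, "binary", deleted)
    )
    """
)


def create_service(host):
    """Hypothetical helper: insert a live nova-compute service row."""
    cur = conn.execute(
        'INSERT INTO services (host, "binary", topic) VALUES (?, ?, ?)',
        (host, "nova-compute", "compute"),
    )
    return cur.lastrowid


def soft_delete_service(service_id):
    """Hypothetical helper: keep the row but mark it as deleted."""
    # Assumption: soft delete stores the row id in the deleted column.
    conn.execute(
        "UPDATE services SET deleted = id WHERE id = ?", (service_id,)
    )


old_id = create_service("host1")
soft_delete_service(old_id)
# The restarted service registers again with the same host/binary/topic.
# The constraint is not violated because the deleted values differ.
new_id = create_service("host1")

rows = conn.execute(
    'SELECT id, host, "binary", topic, deleted FROM services ORDER BY id'
).fetchall()
# rows now holds two records that differ only in id and deleted.
```

Both rows stay in the table, so any query that matches
services by host alone will see the host twice.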

When calculating hypervisor statistics, the DB
layer joins records across the whole deployment
by comparing the host field of records in the
services table with the host field of records in
the compute_nodes table. The calculated results
can be multiplied if multiple records in the
services table have the same host field, which
happens if a user performs the actions above.
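
A minimal sketch of the multiplication, again using
SQLite rather than nova's SQLAlchemy query. The schema is
reduced to a few columns, the join is simplified to a
host match only, and hypervisor_statistics() is a
hypothetical stand-in for the real statistics call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE services (
        id INTEGER PRIMARY KEY,
        host TEXT,
        "binary" TEXT,
        disabled INTEGER NOT NULL DEFAULT 0,
        deleted INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE compute_nodes (
        id INTEGER PRIMARY KEY,
        host TEXT,
        vcpus INTEGER,
        memory_mb INTEGER,
        deleted INTEGER NOT NULL DEFAULT 0
    );
    -- One soft-deleted and one live service for the same host.
    INSERT INTO services (id, host, "binary", deleted)
        VALUES (1, 'host1', 'nova-compute', 1);
    INSERT INTO services (id, host, "binary", deleted)
        VALUES (2, 'host1', 'nova-compute', 0);
    -- A single compute node on that host.
    INSERT INTO compute_nodes (id, host, vcpus, memory_mb)
        VALUES (1, 'host1', 8, 16384);
    """
)


def hypervisor_statistics(exclude_deleted_services):
    """Hypothetical stand-in: aggregate compute nodes joined to services."""
    query = """
        SELECT COUNT(cn.id),
               COALESCE(SUM(cn.vcpus), 0),
               COALESCE(SUM(cn.memory_mb), 0)
        FROM compute_nodes AS cn
        JOIN services AS s ON cn.host = s.host
        WHERE cn.deleted = 0
          AND s.disabled = 0
          AND s."binary" = 'nova-compute'
    """
    if exclude_deleted_services:
        # The extra condition this change adds to the join.
        query += " AND s.deleted = 0"
    count, vcpus, memory_mb = conn.execute(query).fetchone()
    return {"count": count, "vcpus": vcpus, "memory_mb": memory_mb}


# The node matches both service rows, so every total is doubled.
before_fix = hypervisor_statistics(exclude_deleted_services=False)
# Only the live service row matches, so the node is counted once.
after_fix = hypervisor_statistics(exclude_deleted_services=True)
```

In this sketch before_fix reports 2 hypervisors with
16 vcpus, while after_fix reports the single real node
with 8 vcpus.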

Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com>

Change-Id: I9dfa15f69f8ef9c6cb36b2734a8601bd73e9d6b3
Closes-Bug: #1692397
(cherry picked from commit 3d3e9cdd77)
(cherry picked from commit 74e2a400b2)
Kevin_Zheng 2017-05-23 20:28:28 +08:00 committed by Matt Riedemann
parent 9d299ae50e
commit 6dc2a0ec1c
2 changed files with 51 additions and 1 deletions


@@ -746,7 +746,8 @@ def compute_node_statistics(context):
                 inner_sel.c.service_id == services_tbl.c.id
             ),
             services_tbl.c.disabled == false(),
-            services_tbl.c.binary == 'nova-compute'
+            services_tbl.c.binary == 'nova-compute',
+            services_tbl.c.deleted == 0
         )
     )


@@ -8068,6 +8068,55 @@ class ComputeNodeTestCase(test.TestCase, ModelsObjectComparatorMixin):
         for key, value in six.iteritems(data):
             self.assertEqual(value, stats.pop(key))
 
+    def test_compute_node_statistics_delete_and_recreate_service(self):
+        # Test added for bug #1692397, this test tests that deleted
+        # service record will not be selected when calculate compute
+        # node statistics.
+
+        # Let's first assert what we expect the setup to look like.
+        self.assertEqual(1, len(db.service_get_all_by_binary(
+            self.ctxt, 'nova-compute')))
+        self.assertEqual(1, len(db.compute_node_get_all_by_host(
+            self.ctxt, 'host1')))
+        # Get the statistics for the original node/service before we delete
+        # the service.
+        original_stats = db.compute_node_statistics(self.ctxt)
+
+        # At this point we have one compute_nodes record and one services
+        # record pointing at the same host. Now we need to simulate the user
+        # deleting the service record in the API, which will only delete very
+        # old compute_nodes records where the service and compute node are
+        # linked via the compute_nodes.service_id column, which is the case
+        # in this test class; at some point we should decouple those to be more
+        # modern.
+        db.service_destroy(self.ctxt, self.service['id'])
+
+        # Now we're going to simulate that the nova-compute service was
+        # restarted, which will create a new services record with a unique
+        # uuid but it will have the same host, binary and topic values as the
+        # deleted service. The unique constraints don't fail in this case since
+        # they include the deleted column and this service and the old service
+        # have a different deleted value.
+        service2_dict = self.service_dict.copy()
+        service2_dict['uuid'] = uuidsentinel.service2_uuid
+        db.service_create(self.ctxt, service2_dict)
+
+        # Again, because of the way the setUp is done currently, the compute
+        # node was linked to the original now-deleted service, so when we
+        # deleted that service it also deleted the compute node record, so we
+        # have to simulate the ResourceTracker in the nova-compute worker
+        # re-creating the compute nodes record.
+        new_compute_node = self.compute_node_dict.copy()
+        del new_compute_node['service_id']  # make it a new style compute node
+        new_compute_node['uuid'] = uuidsentinel.new_compute_uuid
+        db.compute_node_create(self.ctxt, new_compute_node)
+
+        # Now get the stats for all compute nodes (we just have one) and it
+        # should just be for a single service, not double, as we should ignore
+        # the (soft) deleted service.
+        stats = db.compute_node_statistics(self.ctxt)
+        self.assertDictEqual(original_stats, stats)
+
     def test_compute_node_not_found(self):
         self.assertRaises(exception.ComputeHostNotFound, db.compute_node_get,
                           self.ctxt, 100500)