Add nova-manage placement sync_aggregates

This adds the "nova-manage placement sync_aggregates"
command which will compare nova host aggregates to
placement resource provider aggregates and add any
missing resource provider aggregates based on the nova
host aggregates.

At this time the command is only additive: it does not
remove resource provider aggregates when the matching
compute hosts are no longer members of the corresponding
nova host aggregates. That removal likely needs to happen
in a later change that makes the behavior opt-in, since it
could be destructive for externally-managed provider
aggregates, e.g. for ironic nodes or shared storage pools.
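
To illustrate the additive behavior, here is a minimal,
self-contained sketch (not the implementation below, which
works against nova objects and the placement REST API); the
helper name is illustrative only:

    # Hypothetical sketch: provider aggregate membership is only ever
    # grown to include the nova host aggregate UUIDs, never shrunk.
    def merge_aggregates(provider_aggregate_uuids, host_aggregate_uuids):
        """Return provider aggregates with any missing host aggregate
        UUIDs appended; existing entries are never removed."""
        merged = list(provider_aggregate_uuids)
        for agg_uuid in host_aggregate_uuids:
            if agg_uuid not in merged:
                merged.append(agg_uuid)
        return merged

    # An externally-managed aggregate (e.g. shared storage) is preserved.
    print(merge_aggregates(['shared-storage-agg'], ['host-agg-1']))
    # ['shared-storage-agg', 'host-agg-1']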

Part of blueprint placement-mirror-host-aggregates

Change-Id: Iac67b6bf7e46fbac02b9d3cb59efc3c59b9e56c8
Matt Riedemann 2018-06-16 09:59:25 -04:00
parent cb0bf8a5e1
commit aa6360d683
7 changed files with 526 additions and 0 deletions


@@ -321,6 +321,31 @@ Placement
* 4: Command completed successfully but no allocations were created.
* 127: Invalid input.
``nova-manage placement sync_aggregates [--verbose]``
Mirrors compute host aggregates to resource provider aggregates
in the Placement service. Requires the ``[api_database]`` and
``[placement]`` sections of the nova configuration file to be
populated.
Specify ``--verbose`` to get detailed progress output during execution.
.. note:: Depending on the size of your deployment and the number of
compute hosts in aggregates, this command could cause a non-negligible
amount of traffic to the placement service and therefore is
recommended to be run during maintenance windows.
.. versionadded:: Rocky
Return codes:
* 0: Successful run
* 1: A host was found with more than one matching compute node record
* 2: An unexpected error occurred while working with the placement API
* 3: Failed updating provider aggregates in placement
* 4: Host mappings not found for one or more host aggregate members
* 5: Compute node records not found for one or more hosts
* 6: Resource provider not found by uuid for a given host
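
As a hedged illustration (not part of the command itself), a wrapper
script could drive the command and branch on these return codes roughly
as follows; the ``nova-manage`` binary is assumed to be on ``PATH``::

    # Hypothetical wrapper around the sync; return codes 4, 5 and 6 point
    # at data problems to fix before re-running the command.
    import subprocess
    import sys

    result = subprocess.run(
        ['nova-manage', 'placement', 'sync_aggregates', '--verbose'])
    if result.returncode == 0:
        print('Host aggregates are mirrored to placement.')
    elif result.returncode in (4, 5, 6):
        print('Fix the reported host mapping / compute node / resource '
              'provider issues and re-run the command.')
        sys.exit(result.returncode)
    else:
        # 1, 2 or 3: duplicate compute nodes or placement API errors.
        sys.exit(result.returncode)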
See Also
========


@@ -47,6 +47,7 @@ from sqlalchemy.engine import url as sqla_url
from nova.api.openstack.placement import db_api as placement_db
from nova.api.openstack.placement.objects import consumer as consumer_obj
from nova.cmd import common as cmd_common
from nova.compute import api as compute_api
import nova.conf
from nova import config
from nova import context
@@ -2014,6 +2015,213 @@ class PlacementCommands(object):
return 4
return 0
@staticmethod
def _get_rp_uuid_for_host(ctxt, host):
"""Finds the resource provider (compute node) UUID for the given host.
:param ctxt: cell-targeted nova RequestContext
:param host: name of the compute host
:returns: The UUID of the resource provider (compute node) for the host
:raises: nova.exception.HostMappingNotFound if no host_mappings record
is found for the host; indicates
"nova-manage cell_v2 discover_hosts" needs to be run on the cell.
:raises: nova.exception.ComputeHostNotFound if no compute_nodes record
is found in the cell database for the host; indicates the
nova-compute service on that host might need to be restarted.
:raises: nova.exception.TooManyComputesForHost if there is more than
one compute_nodes record in the cell database for the host, which
(under normal circumstances) is only possible for ironic hosts;
however, ironic hosts are not currently supported with host
aggregates, so if more than one compute node is found for the host
it is considered an error which the operator will need to resolve
manually.
"""
# Get the host mapping to determine which cell it's in.
hm = objects.HostMapping.get_by_host(ctxt, host)
# Now get the compute node record for the host from the cell.
with context.target_cell(ctxt, hm.cell_mapping) as cctxt:
# There should really only be one, since only ironic
# hosts can have multiple nodes, and you can't have
# ironic hosts in aggregates for that reason. If we
# find more than one, it's an error.
nodes = objects.ComputeNodeList.get_all_by_host(
cctxt, host)
if len(nodes) > 1:
# This shouldn't happen, so we need to bail since we
# won't know which node to use.
raise exception.TooManyComputesForHost(
num_computes=len(nodes), host=host)
return nodes[0].uuid
@action_description(
_("Mirrors compute host aggregates to resource provider aggregates "
"in the Placement service. Requires the [api_database] and "
"[placement] sections of the nova configuration file to be "
"populated."))
@args('--verbose', action='store_true', dest='verbose', default=False,
help='Provide verbose output during execution.')
# TODO(mriedem): Add an option for the 'remove aggregate' behavior.
# We know that we want to mirror host aggregate membership to
# placement, but regarding removal, what if the operator or some external
# tool added the resource provider to an aggregate but there is no matching
# host aggregate, e.g. ironic nodes or shared storage provider
# relationships?
# TODO(mriedem): Probably want an option to pass a specific host instead of
# doing all of them.
def sync_aggregates(self, verbose=False):
"""Synchronizes nova host aggregates with resource provider aggregates
Adds nodes to missing provider aggregates in Placement.
NOTE: Depending on the size of your deployment and the number of
compute hosts in aggregates, this command could cause a non-negligible
amount of traffic to the placement service and therefore is
recommended to be run during maintenance windows.
Return codes:
* 0: Successful run
* 1: A host was found with more than one matching compute node record
* 2: An unexpected error occurred while working with the placement API
* 3: Failed updating provider aggregates in placement
* 4: Host mappings not found for one or more host aggregate members
* 5: Compute node records not found for one or more hosts
* 6: Resource provider not found by uuid for a given host
"""
# Start by getting all host aggregates.
ctxt = context.get_admin_context()
aggregate_api = compute_api.AggregateAPI()
placement = aggregate_api.placement_client
aggregates = aggregate_api.get_aggregate_list(ctxt)
# Now we're going to loop over the existing compute hosts in aggregates
# and check to see if their corresponding resource provider, found via
# the host's compute node uuid, is in the same aggregate. If not, we
# add the resource provider to the aggregate in Placement.
output = lambda msg: None
if verbose:
output = lambda msg: print(msg)
output(_('Filling in missing placement aggregates'))
# Since hosts can be in more than one aggregate, keep track of the host
# to its corresponding resource provider uuid to avoid redundant
# lookups.
host_to_rp_uuid = {}
unmapped_hosts = set() # keep track of any missing host mappings
computes_not_found = set() # keep track of missing nodes
providers_not_found = {} # map of hostname to missing provider uuid
for aggregate in aggregates:
output(_('Processing aggregate: %s') % aggregate.name)
for host in aggregate.hosts:
output(_('Processing host: %s') % host)
rp_uuid = host_to_rp_uuid.get(host)
if not rp_uuid:
try:
rp_uuid = self._get_rp_uuid_for_host(ctxt, host)
host_to_rp_uuid[host] = rp_uuid
except exception.HostMappingNotFound:
# Don't fail on this now, we can dump it at the end.
unmapped_hosts.add(host)
continue
except exception.ComputeHostNotFound:
# Don't fail on this now, we can dump it at the end.
computes_not_found.add(host)
continue
except exception.TooManyComputesForHost as e:
# TODO(mriedem): Should we treat this like the other
# errors and not fail immediately but dump at the end?
print(e.format_message())
return 1
# We've got our compute node record, so now we can look to
# see if the matching resource provider, found via compute
# node uuid, is in the same aggregate in placement, found via
# aggregate uuid.
# NOTE(mriedem): We could re-use placement.aggregate_add_host
# here although that has to do the provider lookup by host as
# well, but it does handle generation conflicts.
resp = placement.get( # use 1.19 to get the generation
'/resource_providers/%s/aggregates' % rp_uuid,
version='1.19')
if resp:
body = resp.json()
provider_aggregate_uuids = body['aggregates']
# The moment of truth: is the provider in the same host
# aggregate relationship?
aggregate_uuid = aggregate.uuid
if aggregate_uuid not in provider_aggregate_uuids:
# Add the resource provider to this aggregate.
provider_aggregate_uuids.append(aggregate_uuid)
# Now update the provider aggregates using the
# generation to ensure we're conflict-free.
aggregate_update_body = {
'aggregates': provider_aggregate_uuids,
'resource_provider_generation':
body['resource_provider_generation']
}
put_resp = placement.put(
'/resource_providers/%s/aggregates' % rp_uuid,
aggregate_update_body, version='1.19')
if put_resp:
output(_('Successfully added host (%(host)s) and '
'provider (%(provider)s) to aggregate '
'(%(aggregate)s).') %
{'host': host, 'provider': rp_uuid,
'aggregate': aggregate_uuid})
elif put_resp.status_code == 404:
# We must have raced with a delete on the resource
# provider.
providers_not_found[host] = rp_uuid
else:
# TODO(mriedem): Handle 409 conflicts by retrying
# the operation.
print(_('Failed updating provider aggregates for '
'host (%(host)s), provider (%(provider)s) '
'and aggregate (%(aggregate)s). Error: '
'%(error)s') %
{'host': host, 'provider': rp_uuid,
'aggregate': aggregate_uuid,
'error': put_resp.text})
return 3
elif resp.status_code == 404:
# The resource provider wasn't found. Store this for later.
providers_not_found[host] = rp_uuid
else:
print(_('An error occurred getting resource provider '
'aggregates from placement for provider '
'%(provider)s. Error: %(error)s') %
{'provider': rp_uuid, 'error': resp.text})
return 2
# Now do our error handling. Note that there is no real priority on
# the error code we return. We want to dump all of the issues we hit
# so the operator can fix them before re-running the command, but
# whether we return 4 or 5 or 6 doesn't matter.
return_code = 0
if unmapped_hosts:
print(_('The following hosts were found in nova host aggregates '
'but no host mappings were found in the nova API DB. Run '
'"nova-manage cell_v2 discover_hosts" and then retry. '
'Missing: %s') % ','.join(unmapped_hosts))
return_code = 4
if computes_not_found:
print(_('Unable to find matching compute_nodes record entries in '
'the cell database for the following hosts; does the '
'nova-compute service on each host need to be restarted? '
'Missing: %s') % ','.join(computes_not_found))
return_code = 5
if providers_not_found:
print(_('Unable to find matching resource provider record in '
'placement with uuid for the following hosts: %s. Try '
'restarting the nova-compute service on each host and '
'then retry.') %
','.join('(%s=%s)' % (host, providers_not_found[host])
for host in sorted(providers_not_found.keys())))
return_code = 6
return return_code
CATEGORIES = {
'api_db': ApiDbCommands,
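
The TODO in sync_aggregates above notes that 409 generation conflicts
from placement are not retried. A minimal sketch of what such a retry
could look like, assuming the same placement client interface used above
(``placement.get``/``placement.put`` at microversion 1.19); the helper
name and attempt count are illustrative only:

    # Hypothetical retry loop: re-read the provider aggregates to pick up
    # the new generation whenever placement reports a 409 on the PUT.
    def put_provider_aggregates_with_retry(placement, rp_uuid,
                                           aggregate_uuid, max_attempts=3):
        url = '/resource_providers/%s/aggregates' % rp_uuid
        put_resp = None
        for _attempt in range(max_attempts):
            resp = placement.get(url, version='1.19')
            if not resp:
                # Let the caller handle 404/5xx exactly as the command does.
                return resp
            body = resp.json()
            aggs = body['aggregates']
            if aggregate_uuid in aggs:
                # Someone else already added it; nothing left to do.
                return resp
            aggs.append(aggregate_uuid)
            put_resp = placement.put(
                url,
                {'aggregates': aggs,
                 'resource_provider_generation':
                     body['resource_provider_generation']},
                version='1.19')
            if put_resp or put_resp.status_code != 409:
                # Success, or an error other than a generation conflict.
                return put_resp
        return put_resp  # still conflicting after max_attempts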


@@ -2291,6 +2291,12 @@ class AllocationUpdateFailed(NovaException):
'Error: %(error)s')
class TooManyComputesForHost(NovaException):
msg_fmt = _('Unexpected number of compute node records '
'(%(num_computes)d) found for host %(host)s. There should '
'only be a one-to-one mapping.')
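
For illustration only (NovaException substitutes ``msg_fmt`` with the
keyword arguments passed to the exception), the new error message renders
roughly like this standalone sketch:

    # Standalone sketch of the %-substitution NovaException performs.
    msg_fmt = ('Unexpected number of compute node records '
               '(%(num_computes)d) found for host %(host)s. There should '
               'only be a one-to-one mapping.')
    print(msg_fmt % {'num_computes': 2, 'host': 'host1'})
    # Unexpected number of compute node records (2) found for host host1. ...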
class CertificateValidationFailed(NovaException):
msg_fmt = _("Image signature certificate validation failed for "
"certificate: %(cert_uuid)s. %(reason)s")


@@ -434,6 +434,10 @@ class TestOpenStackClient(object):
return self.api_post('/os-aggregates/%s/action' % aggregate_id,
{'add_host': {'host': host}})
def remove_host_from_aggregate(self, aggregate_id, host):
return self.api_post('/os-aggregates/%s/action' % aggregate_id,
{'remove_host': {'host': host}})
def get_limits(self):
return self.api_get('/limits').body['limits']


@@ -10,6 +10,8 @@
# License for the specific language governing permissions and limitations
# under the License.
import mock
import fixtures
from six.moves import StringIO
@@ -642,3 +644,89 @@ class TestNovaManagePlacementHealAllocations(
'/allocations/%s' % server['id'], version='1.12').body
self.assertEqual(server['tenant_id'], allocations['project_id'])
self.assertEqual(server['user_id'], allocations['user_id'])
class TestNovaManagePlacementSyncAggregates(
integrated_helpers.ProviderUsageBaseTestCase):
"""Functional tests for nova-manage placement sync_aggregates"""
# This is required by the parent class.
compute_driver = 'fake.SmallFakeDriver'
def setUp(self):
super(TestNovaManagePlacementSyncAggregates, self).setUp()
self.cli = manage.PlacementCommands()
# Start two computes. At least two computes are useful for testing
# to make sure removing one from an aggregate doesn't remove the other.
self._start_compute('host1')
self._start_compute('host2')
# Make sure we have two hypervisors reported in the API.
hypervisors = self.admin_api.api_get(
'/os-hypervisors').body['hypervisors']
self.assertEqual(2, len(hypervisors))
self.output = StringIO()
self.useFixture(fixtures.MonkeyPatch('sys.stdout', self.output))
def _create_aggregate(self, name):
return self.admin_api.post_aggregate({'aggregate': {'name': name}})
def test_sync_aggregates(self):
"""This is a simple test which does the following:
- add each host to a unique aggregate
- add both hosts to a shared aggregate
- run sync_aggregates and assert both providers are in two aggregates
- run sync_aggregates again and make sure nothing changed
"""
# create three aggregates, one per host and one shared
host1_agg = self._create_aggregate('host1')
host2_agg = self._create_aggregate('host2')
shared_agg = self._create_aggregate('shared')
# Add the hosts to the aggregates. We have to temporarily mock out the
# scheduler report client to *not* mirror the add host changes so that
# sync_aggregates will do the job.
with mock.patch('nova.scheduler.client.report.SchedulerReportClient.'
'aggregate_add_host'):
self.admin_api.add_host_to_aggregate(host1_agg['id'], 'host1')
self.admin_api.add_host_to_aggregate(host2_agg['id'], 'host2')
self.admin_api.add_host_to_aggregate(shared_agg['id'], 'host1')
self.admin_api.add_host_to_aggregate(shared_agg['id'], 'host2')
# Run sync_aggregates and assert both providers are in two aggregates.
result = self.cli.sync_aggregates(verbose=True)
self.assertEqual(0, result, self.output.getvalue())
host_to_rp_uuid = {}
for host in ('host1', 'host2'):
rp_uuid = self._get_provider_uuid_by_host(host)
host_to_rp_uuid[host] = rp_uuid
rp_aggregates = self._get_provider_aggregates(rp_uuid)
self.assertEqual(2, len(rp_aggregates),
'%s should be in two provider aggregates' % host)
self.assertIn(
'Successfully added host (%s) and provider (%s) to aggregate '
'(%s)' % (host, rp_uuid, shared_agg['uuid']),
self.output.getvalue())
# Remove host1 from the shared aggregate. Again, we have to temporarily
# mock out the call from the aggregates API to placement to mirror the
# change.
with mock.patch('nova.scheduler.client.report.SchedulerReportClient.'
'aggregate_remove_host'):
self.admin_api.remove_host_from_aggregate(
shared_agg['id'], 'host1')
# Run sync_aggregates and assert the provider for host1 is still in two
# aggregates and host2's provider is still in two aggregates.
# TODO(mriedem): When we add an option to remove providers from
# placement aggregates when the corresponding host isn't in a compute
# aggregate, we can test that the host1 provider is only left in one
# aggregate.
result = self.cli.sync_aggregates(verbose=True)
self.assertEqual(0, result, self.output.getvalue())
for host in ('host1', 'host2'):
rp_uuid = host_to_rp_uuid[host]
rp_aggregates = self._get_provider_aggregates(rp_uuid)
self.assertEqual(2, len(rp_aggregates),
'%s should be in two provider aggregates' % host)


@@ -21,6 +21,7 @@ import ddt
import fixtures
import mock
from oslo_db import exception as db_exc
from oslo_serialization import jsonutils
from oslo_utils import uuidutils
from six.moves import StringIO
@@ -2599,6 +2600,180 @@ class TestNovaManagePlacement(test.NoDBTestCase):
'/allocations/%s' % uuidsentinel.instance, expected_put_data,
version='1.12')
@mock.patch('nova.compute.api.AggregateAPI.get_aggregate_list',
return_value=objects.AggregateList(objects=[
objects.Aggregate(name='foo', hosts=['host1'])]))
@mock.patch('nova.objects.HostMapping.get_by_host',
side_effect=exception.HostMappingNotFound(name='host1'))
def test_sync_aggregates_host_mapping_not_found(
self, mock_get_host_mapping, mock_get_aggs):
"""Tests that we handle HostMappingNotFound."""
result = self.cli.sync_aggregates(verbose=True)
self.assertEqual(4, result)
self.assertIn('The following hosts were found in nova host aggregates '
'but no host mappings were found in the nova API DB. '
'Run "nova-manage cell_v2 discover_hosts" and then '
'retry. Missing: host1', self.output.getvalue())
@mock.patch('nova.compute.api.AggregateAPI.get_aggregate_list',
return_value=objects.AggregateList(objects=[
objects.Aggregate(name='foo', hosts=['host1'])]))
@mock.patch('nova.objects.HostMapping.get_by_host',
return_value=objects.HostMapping(
host='host1', cell_mapping=objects.CellMapping()))
@mock.patch('nova.objects.ComputeNodeList.get_all_by_host',
return_value=objects.ComputeNodeList(objects=[
objects.ComputeNode(hypervisor_hostname='node1'),
objects.ComputeNode(hypervisor_hostname='node2')]))
@mock.patch('nova.context.target_cell')
def test_sync_aggregates_too_many_computes_for_host(
self, mock_target_cell, mock_get_nodes, mock_get_host_mapping,
mock_get_aggs):
"""Tests the scenario that a host in an aggregate has more than one
compute node so the command does not know which compute node uuid to
use for the placement resource provider aggregate and fails.
"""
mock_target_cell.return_value.__enter__.return_value = (
mock.sentinel.cell_context)
result = self.cli.sync_aggregates(verbose=True)
self.assertEqual(1, result)
self.assertIn('Unexpected number of compute node records '
'(2) found for host host1. There should '
'only be a one-to-one mapping.', self.output.getvalue())
mock_get_nodes.assert_called_once_with(
mock.sentinel.cell_context, 'host1')
@mock.patch('nova.compute.api.AggregateAPI.get_aggregate_list',
return_value=objects.AggregateList(objects=[
objects.Aggregate(name='foo', hosts=['host1'])]))
@mock.patch('nova.objects.HostMapping.get_by_host',
return_value=objects.HostMapping(
host='host1', cell_mapping=objects.CellMapping()))
@mock.patch('nova.objects.ComputeNodeList.get_all_by_host',
side_effect=exception.ComputeHostNotFound(host='host1'))
@mock.patch('nova.context.target_cell')
def test_sync_aggregates_compute_not_found(
self, mock_target_cell, mock_get_nodes, mock_get_host_mapping,
mock_get_aggs):
"""Tests the scenario that no compute node record is found for a given
host in an aggregate.
"""
mock_target_cell.return_value.__enter__.return_value = (
mock.sentinel.cell_context)
result = self.cli.sync_aggregates(verbose=True)
self.assertEqual(5, result)
self.assertIn('Unable to find matching compute_nodes record entries '
'in the cell database for the following hosts; does the '
'nova-compute service on each host need to be '
'restarted? Missing: host1', self.output.getvalue())
mock_get_nodes.assert_called_once_with(
mock.sentinel.cell_context, 'host1')
@mock.patch('nova.compute.api.AggregateAPI.get_aggregate_list',
return_value=objects.AggregateList(objects=[
objects.Aggregate(name='foo', hosts=['host1'])]))
@mock.patch('nova.scheduler.client.report.SchedulerReportClient.get',
return_value=fake_requests.FakeResponse(404))
def test_sync_aggregates_get_provider_aggs_provider_not_found(
self, mock_placement_get, mock_get_aggs):
"""Tests the scenario that a resource provider is not found in the
placement service for a compute node found in a nova host aggregate.
"""
with mock.patch.object(self.cli, '_get_rp_uuid_for_host',
return_value=uuidsentinel.rp_uuid):
result = self.cli.sync_aggregates(verbose=True)
self.assertEqual(6, result)
self.assertIn('Unable to find matching resource provider record in '
'placement with uuid for the following hosts: '
'(host1=%s)' % uuidsentinel.rp_uuid,
self.output.getvalue())
mock_placement_get.assert_called_once_with(
'/resource_providers/%s/aggregates' % uuidsentinel.rp_uuid,
version='1.19')
@mock.patch('nova.compute.api.AggregateAPI.get_aggregate_list',
return_value=objects.AggregateList(objects=[
objects.Aggregate(name='foo', hosts=['host1'])]))
@mock.patch('nova.scheduler.client.report.SchedulerReportClient.get',
return_value=fake_requests.FakeResponse(500, content='yikes!'))
def test_sync_aggregates_get_provider_aggs_placement_server_error(
self, mock_placement_get, mock_get_aggs):
"""Tests the scenario that placement returns an unexpected server
error when getting aggregates for a given resource provider.
"""
with mock.patch.object(self.cli, '_get_rp_uuid_for_host',
return_value=uuidsentinel.rp_uuid):
result = self.cli.sync_aggregates(verbose=True)
self.assertEqual(2, result)
self.assertIn('An error occurred getting resource provider '
'aggregates from placement for provider %s. '
'Error: yikes!' % uuidsentinel.rp_uuid,
self.output.getvalue())
@mock.patch('nova.compute.api.AggregateAPI.get_aggregate_list',
return_value=objects.AggregateList(objects=[
objects.Aggregate(name='foo', hosts=['host1'],
uuid=uuidsentinel.aggregate)]))
@mock.patch('nova.scheduler.client.report.SchedulerReportClient.get')
@mock.patch('nova.scheduler.client.report.SchedulerReportClient.put',
return_value=fake_requests.FakeResponse(404))
def test_sync_aggregates_put_aggregates_fails_provider_not_found(
self, mock_placement_put, mock_placement_get, mock_get_aggs):
"""Tests the scenario that we are trying to add a provider to an
aggregate in placement but the
PUT /resource_providers/{rp_uuid}/aggregates call fails with a 404
because the provider is not found.
"""
mock_placement_get.return_value = (
fake_requests.FakeResponse(200, content=jsonutils.dumps({
'aggregates': [],
'resource_provider_generation': 1})))
with mock.patch.object(self.cli, '_get_rp_uuid_for_host',
return_value=uuidsentinel.rp_uuid):
result = self.cli.sync_aggregates(verbose=True)
self.assertEqual(6, result)
self.assertIn('Unable to find matching resource provider record in '
'placement with uuid for the following hosts: '
'(host1=%s)' % uuidsentinel.rp_uuid,
self.output.getvalue())
expected_body = {
'aggregates': [uuidsentinel.aggregate],
'resource_provider_generation': 1
}
self.assertEqual(1, mock_placement_put.call_count)
self.assertDictEqual(expected_body, mock_placement_put.call_args[0][1])
@mock.patch('nova.compute.api.AggregateAPI.get_aggregate_list',
return_value=objects.AggregateList(objects=[
objects.Aggregate(name='foo', hosts=['host1'],
uuid=uuidsentinel.aggregate)]))
@mock.patch('nova.scheduler.client.report.SchedulerReportClient.get')
@mock.patch('nova.scheduler.client.report.SchedulerReportClient.put',
return_value=fake_requests.FakeResponse(
409,
content="Resource provider's generation already changed"))
def test_sync_aggregates_put_aggregates_fails_generation_conflict(
self, mock_placement_put, mock_placement_get, mock_get_aggs):
"""Tests the scenario that we are trying to add a provider to an
aggregate in placement but the
PUT /resource_providers/{rp_uuid}/aggregates call fails with a 409
because the resource provider's generation has changed (conflict).
"""
mock_placement_get.return_value = (
fake_requests.FakeResponse(200, content=jsonutils.dumps({
'aggregates': [],
'resource_provider_generation': 1})))
with mock.patch.object(self.cli, '_get_rp_uuid_for_host',
return_value=uuidsentinel.rp_uuid):
result = self.cli.sync_aggregates(verbose=True)
self.assertEqual(3, result)
self.assertIn("Failed updating provider aggregates for "
"host (host1), provider (%s) and aggregate "
"(%s). Error: Resource provider's generation already "
"changed" %
(uuidsentinel.rp_uuid, uuidsentinel.aggregate),
self.output.getvalue())
class TestNovaManageMain(test.NoDBTestCase):
"""Tests the nova-manage:main() setup code."""


@@ -0,0 +1,20 @@
---
features:
- |
A ``nova-manage placement sync_aggregates`` command has been added which
can be used to mirror nova host aggregates to resource provider aggregates
in the placement service. This is a useful tool if you are using aggregates
in placement to optimize scheduling:
https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregates-in-placement
The ``os-aggregates`` compute API ``add_host`` and ``remove_host`` actions
will automatically add compute node resource providers to, or remove them
from, resource provider aggregates in the placement service if the ``nova-api`` service
is configured to communicate with the placement service, so this command
is mostly useful for existing deployments with host aggregates which are
not yet mirrored in the placement service.
For more details, see the command documentation:
https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement
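
For reference, the ``add_host`` and ``remove_host`` actions mentioned above
are plain ``POST /os-aggregates/{id}/action`` calls (the same calls the
in-tree test client uses). A hedged example using ``requests``, where the
endpoint and token handling are placeholders:

    # Hypothetical client-side example of the os-aggregates actions;
    # COMPUTE_ENDPOINT and TOKEN are placeholders for real auth handling.
    import requests

    COMPUTE_ENDPOINT = 'http://controller:8774/v2.1'  # placeholder
    TOKEN = 'insert-a-keystone-token-here'            # placeholder
    HEADERS = {'X-Auth-Token': TOKEN, 'Content-Type': 'application/json'}

    def add_host_to_aggregate(aggregate_id, host):
        return requests.post(
            '%s/os-aggregates/%s/action' % (COMPUTE_ENDPOINT, aggregate_id),
            json={'add_host': {'host': host}}, headers=HEADERS)

    def remove_host_from_aggregate(aggregate_id, host):
        return requests.post(
            '%s/os-aggregates/%s/action' % (COMPUTE_ENDPOINT, aggregate_id),
            json={'remove_host': {'host': host}}, headers=HEADERS)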