Add the NUMATopologyFilter

This patch adds the basic NUMA filter that will take the proposed
instance topology and try to match it against that of a host.

The matching will be done in the following manner:

 * Filter will try to match the exact NUMA cells of the instance to those
   of the host. It *will not* attempt to pack the instance onto the host.
 * It will consider the standard over-subscription limits for each cell,
   and provide limits to the compute host accordingly.
 * If the instance has no topology defined, it will be considered for any host.
 * If the instance has a topology defined, it will be considered only for NUMA
   capable hosts.

DocImpact

Change-Id: I8788dde69524c8a32a41ce31a96c89f9b09e91ce
Blueprint: virt-driver-numa-placement
Author: Nikola Dipanov, 2014-08-18 17:16:26 +02:00 (committed by Dan Smith)
Parent: 80b97fba33
Commit: 52bf71fa27
3 changed files with 176 additions and 0 deletions


@@ -157,6 +157,8 @@ There are some standard filter classes to use (:mod:`nova.scheduler.filters`):
properties and aggregate metadata.
* |MetricsFilter| - filters hosts based on metrics weight_setting. Only hosts with
the available metrics are passed.
* |NUMATopologyFilter| - filters hosts based on the NUMA topology requested by the
instance, if any.

Now we can focus on these standard filter classes in more detail. I will skip the
simplest ones, such as |AllHostsFilter|, |CoreFilter| and |RamFilter|,
@@ -270,6 +272,24 @@ The value of this pair (``trusted``/``untrusted``) must match the
integrity of a host (obtained from the Attestation service) before it is
passed by the |TrustedFilter|.
The |NUMATopologyFilter| considers the NUMA topology that was specified for the instance
through the use of flavor extra_specs in combination with the image properties, as
described in detail in the related nova-spec document:

* http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/juno/virt-driver-numa-placement.rst

and tries to match it with the topology exposed by the host, accounting for the
``ram_allocation_ratio`` and ``cpu_allocation_ratio`` for over-subscription. The
filtering is done in the following manner (a minimal sketch of the per-cell check
follows the list):

* The filter will try to match the exact NUMA cells of the instance to those of
  the host. It *will not* attempt to pack the instance onto the host.
* It will consider the standard over-subscription limits for each host NUMA cell,
  and provide limits to the compute host accordingly (as mentioned above).
* If the instance has no topology defined, it will be considered for any host.
* If the instance has a topology defined, it will be considered only for NUMA
  capable hosts.
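
A minimal sketch of that per-cell check, assuming a ``cell`` object with the
``memory``, ``cpuset``, ``memory_usage`` and ``cpu_usage`` attributes produced by
``nova.virt.hardware`` (illustrative only; the hypothetical ``cell_fits`` helper is
not part of the filter)::

    def cell_fits(cell, ram_ratio, cpu_ratio):
        # Host cell capacity scaled by the over-subscription ratios.
        max_cell_memory = int(cell.memory * ram_ratio)  # e.g. int(512 * 1.5) = 768 MB
        max_cell_cpu = len(cell.cpuset) * cpu_ratio     # e.g. 2 pCPUs * 16.0 = 32.0
        # The usage figures already account for the instance being placed.
        return (cell.memory_usage <= max_cell_memory and
                cell.cpu_usage <= max_cell_cpu)
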
To use filters you specify the following two settings:
* ``scheduler_available_filters`` - Defines filter classes made available to the
@@ -390,6 +410,7 @@ in :mod:``nova.tests.scheduler``.
.. |ServerGroupAffinityFilter| replace:: :class:`ServerGroupAffinityFilter <nova.scheduler.filters.affinity_filter.ServerGroupAffinityFilter>`
.. |AggregateInstanceExtraSpecsFilter| replace:: :class:`AggregateInstanceExtraSpecsFilter <nova.scheduler.filters.aggregate_instance_extra_specs.AggregateInstanceExtraSpecsFilter>`
.. |AggregateMultiTenancyIsolation| replace:: :class:`AggregateMultiTenancyIsolation <nova.scheduler.filters.aggregate_multitenancy_isolation.AggregateMultiTenancyIsolation>`
.. |NUMATopologyFilter| replace:: :class:`NUMATopologyFilter <nova.scheduler.filters.numa_topology_filter.NUMATopologyFilter>`
.. |RamWeigher| replace:: :class:`RamWeigher <nova.scheduler.weights.all_weighers.RamWeigher>`
.. |AggregateImagePropertiesIsolation| replace:: :class:`AggregateImagePropertiesIsolation <nova.scheduler.filters.aggregate_image_properties_isolation.AggregateImagePropertiesIsolation>`
.. |MetricsFilter| replace:: :class:`MetricsFilter <nova.scheduler.filters.metrics_filter.MetricsFilter>`


@@ -0,0 +1,54 @@
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

from oslo.config import cfg

from nova.scheduler import filters
from nova.virt import hardware

CONF = cfg.CONF

# Reuse the over-subscription ratios defined by the core and RAM filters so
# that the per-NUMA-cell limits stay consistent with the host-wide ones.
CONF.import_opt('cpu_allocation_ratio', 'nova.scheduler.filters.core_filter')
CONF.import_opt('ram_allocation_ratio', 'nova.scheduler.filters.ram_filter')


class NUMATopologyFilter(filters.BaseHostFilter):
    """Filter on requested NUMA topology."""

    def host_passes(self, host_state, filter_properties):
        ram_ratio = CONF.ram_allocation_ratio
        cpu_ratio = CONF.cpu_allocation_ratio
        instance = filter_properties.get('instance_properties', {})
        instance_topology = hardware.instance_topology_from_instance(instance)
        if instance_topology:
            if host_state.numa_topology:
                limit_cells = []
                # Compute what the host cell usage would look like with the
                # instance placed on it, then check every cell against its
                # over-subscribed capacity.
                usage_after_instance = (
                    hardware.get_host_numa_usage_from_instance(
                        host_state, instance, never_serialize_result=True))
                for cell in usage_after_instance.cells:
                    max_cell_memory = int(cell.memory * ram_ratio)
                    max_cell_cpu = len(cell.cpuset) * cpu_ratio
                    if (cell.memory_usage > max_cell_memory or
                            cell.cpu_usage > max_cell_cpu):
                        return False
                    limit_cells.append(
                        hardware.VirtNUMATopologyCellLimit(
                            cell.id, cell.cpuset, cell.memory,
                            max_cell_cpu, max_cell_memory))
                # Record the per-cell limits so the compute host can apply
                # them when the instance is claimed.
                host_state.limits['numa_topology'] = (
                    hardware.VirtNUMALimitTopology(
                        cells=limit_cells).to_json())
                return True
            else:
                # The instance requested a NUMA topology but the host does
                # not expose one.
                return False
        else:
            # No topology requested: any host will do.
            return True
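
The limits recorded on ``host_state`` are serialized to JSON and handed to the
compute host along with the other scheduler limits. A minimal sketch of reading
them back with the same ``hardware`` helpers, assuming ``host_state`` has just
been passed by the filter above (illustrative, not part of the patch):

    from nova.virt import hardware

    # host_state.limits['numa_topology'] was set by NUMATopologyFilter above.
    limits = hardware.VirtNUMALimitTopology.from_json(
        host_state.limits['numa_topology'])
    for cell in limits.cells:
        # cpu_limit == len(cell.cpuset) * cpu_allocation_ratio
        # memory_limit == int(cell.memory * ram_allocation_ratio)
        print(cell.id, cell.cpu_limit, cell.memory_limit)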


@@ -26,6 +26,7 @@ from nova.compute import arch
from nova import context
from nova import db
from nova import objects
from nova.objects import base as obj_base
from nova.openstack.common import jsonutils
from nova.openstack.common import timeutils
from nova.pci import pci_stats
@@ -35,8 +36,10 @@ from nova.scheduler.filters import ram_filter
from nova.scheduler.filters import trusted_filter
from nova import servicegroup
from nova import test
from nova.tests import fake_instance
from nova.tests.scheduler import fakes
from nova import utils
from nova.virt import hardware
CONF = cfg.CONF
@@ -2015,3 +2018,101 @@ class HostFiltersTestCase(test.NoDBTestCase):
hosts=['host1'],
metadata={'max_instances_per_host': 'XXX'})
self.assertTrue(filt_cls.host_passes(host, filter_properties))

    def test_numa_topology_filter_pass(self):
        instance_topology = hardware.VirtNUMAInstanceTopology(
            cells=[hardware.VirtNUMATopologyCell(0, set([1]), 512),
                   hardware.VirtNUMATopologyCell(1, set([3]), 512)])
        instance = fake_instance.fake_instance_obj(self.context)
        instance.numa_topology = (
            objects.InstanceNUMATopology.obj_from_topology(
                instance_topology))
        filter_properties = {
            'instance_properties': obj_base.obj_to_primitive(instance)}
        host = fakes.FakeHostState('host1', 'node1',
                                   {'numa_topology': fakes.NUMA_TOPOLOGY})
        filt_cls = self.class_map['NUMATopologyFilter']()
        self.assertTrue(filt_cls.host_passes(host, filter_properties))

    def test_numa_topology_filter_numa_instance_no_numa_host_fail(self):
        instance_topology = hardware.VirtNUMAInstanceTopology(
            cells=[hardware.VirtNUMATopologyCell(0, set([1]), 512),
                   hardware.VirtNUMATopologyCell(1, set([3]), 512)])
        instance = fake_instance.fake_instance_obj(self.context)
        instance.numa_topology = (
            objects.InstanceNUMATopology.obj_from_topology(
                instance_topology))
        filter_properties = {
            'instance_properties': obj_base.obj_to_primitive(instance)}
        host = fakes.FakeHostState('host1', 'node1', {})
        filt_cls = self.class_map['NUMATopologyFilter']()
        self.assertFalse(filt_cls.host_passes(host, filter_properties))

    def test_numa_topology_filter_numa_host_no_numa_instance_pass(self):
        instance = fake_instance.fake_instance_obj(self.context)
        instance.numa_topology = None
        filter_properties = {
            'instance_properties': obj_base.obj_to_primitive(instance)}
        host = fakes.FakeHostState('host1', 'node1',
                                   {'numa_topology': fakes.NUMA_TOPOLOGY})
        filt_cls = self.class_map['NUMATopologyFilter']()
        self.assertTrue(filt_cls.host_passes(host, filter_properties))

    def test_numa_topology_filter_fail_memory(self):
        self.flags(ram_allocation_ratio=1)
        instance_topology = hardware.VirtNUMAInstanceTopology(
            cells=[hardware.VirtNUMATopologyCell(0, set([1]), 1024),
                   hardware.VirtNUMATopologyCell(1, set([3]), 512)])
        instance = fake_instance.fake_instance_obj(self.context)
        instance.numa_topology = (
            objects.InstanceNUMATopology.obj_from_topology(
                instance_topology))
        filter_properties = {
            'instance_properties': obj_base.obj_to_primitive(instance)}
        host = fakes.FakeHostState('host1', 'node1',
                                   {'numa_topology': fakes.NUMA_TOPOLOGY})
        filt_cls = self.class_map['NUMATopologyFilter']()
        self.assertFalse(filt_cls.host_passes(host, filter_properties))

    def test_numa_topology_filter_fail_cpu(self):
        self.flags(cpu_allocation_ratio=1)
        instance_topology = hardware.VirtNUMAInstanceTopology(
            cells=[hardware.VirtNUMATopologyCell(0, set([1]), 512),
                   hardware.VirtNUMATopologyCell(1, set([3, 4, 5]), 512)])
        instance = fake_instance.fake_instance_obj(self.context)
        instance.numa_topology = (
            objects.InstanceNUMATopology.obj_from_topology(
                instance_topology))
        filter_properties = {
            'instance_properties': obj_base.obj_to_primitive(instance)}
        host = fakes.FakeHostState('host1', 'node1',
                                   {'numa_topology': fakes.NUMA_TOPOLOGY})
        filt_cls = self.class_map['NUMATopologyFilter']()
        self.assertFalse(filt_cls.host_passes(host, filter_properties))

    def test_numa_topology_filter_pass_set_limit(self):
        self.flags(cpu_allocation_ratio=21)
        self.flags(ram_allocation_ratio=1.3)
        instance_topology = hardware.VirtNUMAInstanceTopology(
            cells=[hardware.VirtNUMATopologyCell(0, set([1]), 512),
                   hardware.VirtNUMATopologyCell(1, set([3]), 512)])
        instance = fake_instance.fake_instance_obj(self.context)
        instance.numa_topology = (
            objects.InstanceNUMATopology.obj_from_topology(
                instance_topology))
        filter_properties = {
            'instance_properties': obj_base.obj_to_primitive(instance)}
        host = fakes.FakeHostState('host1', 'node1',
                                   {'numa_topology': fakes.NUMA_TOPOLOGY})
        filt_cls = self.class_map['NUMATopologyFilter']()
        self.assertTrue(filt_cls.host_passes(host, filter_properties))
        limits_topology = hardware.VirtNUMALimitTopology.from_json(
            host.limits['numa_topology'])
        # Each fake host cell exposes 2 pCPUs and 512 MB, so the limits are
        # 2 * 21 = 42 vCPUs and int(512 * 1.3) = 665 MB per cell.
        self.assertEqual(limits_topology.cells[0].cpu_limit, 42)
        self.assertEqual(limits_topology.cells[1].cpu_limit, 42)
        self.assertEqual(limits_topology.cells[0].memory_limit, 665)
        self.assertEqual(limits_topology.cells[1].memory_limit, 665)