Add support for translating CPU policy extra specs, image meta

Map 'hw:cpu_policy' and 'hw:cpu_thread_policy' as follows:

  hw:cpu_policy
    dedicated -> resources:PCPU=${flavor.vcpus}
    shared    -> resources:VCPU=${flavor.vcpus}
  hw:cpu_thread_policy
    isolate -> trait:HW_CPU_HYPERTHREADING:forbidden
    require -> trait:HW_CPU_HYPERTHREADING:required
    prefer  -> (none, handled later during scheduling)

Ditto for the 'hw_cpu_policy' and 'hw_cpu_thread_policy' image metadata
equivalents. In addition, increment the requested 'resources:PCPU' by 1 if
the 'hw:emulator_threads_policy' extra spec is present and set to 'isolate'.

The scheduler will attempt to get PCPU allocations and fall back to VCPUs
if that fails. This is okay because the NUMA fitting code from the
'hardware' module, used by both the 'NUMATopologyFilter' and the libvirt
driver, protects us. That code doesn't know anything about PCPUs or VCPUs
but rather cares about the 'NUMATopology.pcpuset' field (starting in change
I492803eaacc34c69af073689f9159449557919db), which can be set to different
values depending on whether this is Train with new-style config, Train with
old-style config, or Stein:

- For Train compute nodes with new-style config, 'NUMATopology.pcpuset'
  will be explicitly set to the value of '[compute] cpu_dedicated_set' or,
  if only '[compute] cpu_shared_set' is configured, 'None' (it's nullable)
  by the virt driver, so the calls to 'hardware.numa_fit_instance_to_host'
  in the 'NUMATopologyFilter' or virt driver will fail if the instance
  can't actually fit.

- For Train compute nodes with old-style config, 'NUMATopology.pcpuset'
  will be set to the same value as 'NUMATopology.cpuset' by the virt
  driver.

- For Stein compute nodes, 'NUMATopology.pcpuset' will be unset and we'll
  detect this in 'hardware.numa_fit_instance_to_host' and simply set it to
  the same value as 'NUMATopology.cpuset'.

Part of blueprint cpu-resources

Change-Id: Ie38aa625dff543b5980fd437ad2febeba3b50079
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
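The translation rules above can be sketched as a small standalone helper. This is illustrative only; the function name and the dict-based return shape are invented for this example and are not the Nova code added by this commit.

```python
def translate_cpu_policy(flavor_vcpus, extra_specs):
    """Map legacy CPU pinning extra specs to placement-style requests."""
    resources = {}
    traits = {}

    if extra_specs.get('hw:cpu_policy') == 'dedicated':
        # dedicated -> resources:PCPU=${flavor.vcpus}
        resources['PCPU'] = flavor_vcpus
        # An isolated emulator thread consumes one extra dedicated CPU
        if extra_specs.get('hw:emulator_threads_policy') == 'isolate':
            resources['PCPU'] += 1
    else:  # 'shared' or unset
        resources['VCPU'] = flavor_vcpus

    thread_policy = extra_specs.get('hw:cpu_thread_policy')
    if thread_policy == 'isolate':
        traits['HW_CPU_HYPERTHREADING'] = 'forbidden'
    elif thread_policy == 'require':
        traits['HW_CPU_HYPERTHREADING'] = 'required'
    # 'prefer' adds nothing here; it is handled later during scheduling

    return resources, traits
```

The same mapping applies to the 'hw_cpu_policy' and 'hw_cpu_thread_policy' image metadata properties.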
This commit is contained in:

parent d6c96436d6
commit 278ab01c32
@@ -221,6 +221,31 @@ Related options:

 * ``compute_driver`` (libvirt)

 * ``[libvirt]/images_type`` (rbd)

 * ``instances_path``
 """),
+    cfg.BoolOpt(
+        'disable_fallback_pcpu_query',
+        default=False,
+        deprecated_for_removal=True,
+        deprecated_since='20.0.0',
+        help="""
+Disable fallback request for VCPU allocations when using pinned instances.
+
+Starting in Train, compute nodes using the libvirt virt driver can report
+``PCPU`` inventory and will use this for pinned instances. The scheduler will
+automatically translate requests using the legacy CPU pinning-related flavor
+extra specs, ``hw:cpu_policy`` and ``hw:cpu_thread_policy``, their image
+metadata property equivalents, and the emulator threads pinning flavor extra
+spec, ``hw:emulator_threads_policy``, to new placement requests. However,
+compute nodes require additional configuration in order to report ``PCPU``
+inventory and this configuration may not be present immediately after an
+upgrade. To ensure pinned instances can be created without this additional
+configuration, the scheduler will make a second request to placement for
+old-style ``VCPU``-based allocations and fall back to these allocation
+candidates if necessary. This has a slight performance impact and is not
+necessary on new or upgraded deployments where the new configuration has been
+set on all hosts. By setting this option, the second lookup is disabled and the
+scheduler will only request ``PCPU``-based allocations.
+"""),
 ]
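Once every compute node in a deployment has been reconfigured to report ``PCPU`` inventory, the fallback query can be disabled. A minimal, hypothetical nova.conf fragment for the scheduler host (the option name is from the diff above; the surrounding values are assumptions):

```ini
[workarounds]
# All computes report PCPU inventory, so skip the second VCPU-based
# allocation candidates query for pinned instances.
disable_fallback_pcpu_query = True
```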
@@ -151,18 +151,46 @@ class SchedulerManager(manager.Manager):
             raise exception.NoValidHost(reason=e.message)

         resources = utils.resources_from_request_spec(
-            ctxt, spec_obj, self.driver.host_manager)
+            ctxt, spec_obj, self.driver.host_manager,
+            enable_pinning_translate=True)
         res = self.placement_client.get_allocation_candidates(ctxt,
                                                               resources)
         if res is None:
             # We have to handle the case that we failed to connect to the
             # Placement service and the safe_connect decorator on
             # get_allocation_candidates returns None.
-            alloc_reqs, provider_summaries, allocation_request_version = (
-                None, None, None)
-        else:
-            (alloc_reqs, provider_summaries,
-             allocation_request_version) = res
+            res = None, None, None
+
+        alloc_reqs, provider_summaries, allocation_request_version = res
         alloc_reqs = alloc_reqs or []
         provider_summaries = provider_summaries or {}
+
+        # if the user requested pinned CPUs, we make a second query to
+        # placement for allocation candidates using VCPUs instead of PCPUs.
+        # This is necessary because users might not have modified all (or
+        # any) of their compute nodes meaning said compute nodes will not
+        # be reporting PCPUs yet. This is okay to do because the
+        # NUMATopologyFilter (scheduler) or virt driver (compute node) will
+        # weed out hosts that are actually using new style configuration
+        # but simply don't have enough free PCPUs (or any PCPUs).
+        # TODO(stephenfin): Remove when we drop support for 'vcpu_pin_set'
+        if (resources.cpu_pinning_requested and
+                not CONF.workarounds.disable_fallback_pcpu_query):
+            LOG.debug('Requesting fallback allocation candidates with '
+                      'VCPU instead of PCPU')
+            resources = utils.resources_from_request_spec(
+                ctxt, spec_obj, self.driver.host_manager,
+                enable_pinning_translate=False)
+            res = self.placement_client.get_allocation_candidates(
+                ctxt, resources)
+            if res:
+                # merge the allocation requests and provider summaries from
+                # the two requests together
+                alloc_reqs_fallback, provider_summaries_fallback, _ = res
+
+                alloc_reqs.extend(alloc_reqs_fallback)
+                provider_summaries.update(provider_summaries_fallback)
+
         if not alloc_reqs:
             LOG.info("Got no allocation candidates from the Placement "
                      "API. This could be due to insufficient resources "
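The fallback merge above boils down to combining the results of two allocation candidate queries: extend the candidate list and union the provider summaries. A minimal sketch, using plain tuples and dicts as stand-ins for the placement client's return values (the function name is invented, not Nova code):

```python
def merged_candidates(primary, fallback):
    """Merge two (alloc_reqs, provider_summaries) query results.

    Either argument may be None, mirroring a failed placement call.
    """
    alloc_reqs, provider_summaries = primary or ([], {})
    if fallback:
        fallback_reqs, fallback_summaries = fallback
        # Candidates from both queries are considered by the filters
        alloc_reqs = alloc_reqs + fallback_reqs
        provider_summaries = {**provider_summaries, **fallback_summaries}
    return alloc_reqs, provider_summaries
```

The later filtering stages (NUMATopologyFilter, virt driver claims) are what weed out candidates from the wrong query for a given host.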
@@ -20,6 +20,7 @@ import sys
 import traceback

 import os_resource_classes as orc
+import os_traits
 from oslo_log import log as logging
 from oslo_serialization import jsonutils
 from six.moves.urllib import parse
@@ -32,10 +33,11 @@ from nova import exception
 from nova.i18n import _
 from nova import objects
 from nova.objects import base as obj_base
+from nova.objects import fields as obj_fields
 from nova.objects import instance as obj_instance
 from nova import rpc
 from nova.scheduler.filters import utils as filters_utils
-import nova.virt.hardware as hw
+from nova.virt import hardware


 LOG = logging.getLogger(__name__)
@@ -55,7 +57,7 @@ class ResourceRequest(object):
     XS_KEYPAT = re.compile(r"^(%s)([1-9][0-9]*)?:(.*)$" %
                            '|'.join((XS_RES_PREFIX, XS_TRAIT_PREFIX)))

-    def __init__(self, request_spec):
+    def __init__(self, request_spec, enable_pinning_translate=True):
         """Create a new instance of ResourceRequest from a RequestSpec.

         Examines the flavor, flavor extra specs, and (optional) image metadata
@@ -80,6 +82,8 @@ class ResourceRequest(object):
         overridden by flavor extra specs.

         :param request_spec: An instance of ``objects.RequestSpec``.
+        :param enable_pinning_translate: True if the CPU policy extra specs
+            should be translated to placement resources and traits.
         """
         # { ident: RequestGroup }
         self._rg_by_id = {}
@@ -91,8 +95,11 @@ class ResourceRequest(object):
         # TODO(efried): Handle member_of[$N], which will need to be reconciled
         # with destination.aggregates handling in resources_from_request_spec

-        image = (request_spec.image if 'image' in request_spec
-                 else objects.ImageMeta(properties=objects.ImageMetaProps()))
+        # request_spec.image is nullable
+        if 'image' in request_spec and request_spec.image:
+            image = request_spec.image
+        else:
+            image = objects.ImageMeta(properties=objects.ImageMetaProps())

         # Parse the flavor extra specs
         self._process_extra_specs(request_spec.flavor)
@@ -102,12 +109,21 @@ class ResourceRequest(object):
         # Now parse the (optional) image metadata
         self._process_image_meta(image)

+        # TODO(stephenfin): Remove this parameter once we drop support for
+        # 'vcpu_pin_set'
+        self.cpu_pinning_requested = False
+
+        if enable_pinning_translate:
+            # Next up, let's handle those pesky CPU pinning policies
+            self._translate_pinning_policies(request_spec.flavor, image)
+
         # Finally, parse the flavor itself, though we'll only use these fields
         # if they don't conflict with something already provided by the flavor
         # extra specs. These are all added to the unnumbered request group.
         merged_resources = self.merged_resources()

-        if orc.VCPU not in merged_resources:
+        if (orc.VCPU not in merged_resources and
+                orc.PCPU not in merged_resources):
             self._add_resource(None, orc.VCPU, request_spec.vcpus)

         if orc.MEMORY_MB not in merged_resources:
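The guard above means the flavor's vCPU count only becomes a default ``VCPU`` request when neither CPU resource class was already requested, whether explicitly via ``resources:VCPU``/``resources:PCPU`` overrides or implicitly via the pinning translation. A tiny sketch with a plain dict standing in for the merged request group (the helper name is invented for illustration):

```python
def apply_default_cpus(merged_resources, flavor_vcpus):
    """Add a default VCPU request only if no CPU class was requested."""
    if 'VCPU' not in merged_resources and 'PCPU' not in merged_resources:
        merged_resources['VCPU'] = flavor_vcpus
    return merged_resources
```

Without the added ``PCPU`` check, a translated pinned request would end up asking for both resource classes at once.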
@@ -173,7 +189,7 @@ class ResourceRequest(object):
         # NOTE(aspiers): In theory this could raise FlavorImageConflict,
         # but we already check it in the API layer, so that should never
         # happen.
-        if not hw.get_mem_encryption_constraint(flavor, image):
+        if not hardware.get_mem_encryption_constraint(flavor, image):
             # No memory encryption required, so no further action required.
             return
@@ -185,7 +201,7 @@ class ResourceRequest(object):
         """When the hw:pmem extra spec is present, require hosts which can
         provide enough vpmem resources.
         """
-        vpmem_labels = hw.get_vpmems(flavor)
+        vpmem_labels = hardware.get_vpmems(flavor)
         if not vpmem_labels:
             # No vpmems required
             return
@@ -199,6 +215,54 @@ class ResourceRequest(object):
         LOG.debug("Added resource %s=%d to requested resources",
                   resource_class, amount)

+    def _translate_pinning_policies(self, flavor, image):
+        """Translate the legacy pinning policies to resource requests."""
+        # NOTE(stephenfin): These can raise exceptions but these have already
+        # been validated by 'nova.virt.hardware.numa_get_constraints' in the
+        # API layer (see change I06fad233006c7bab14749a51ffa226c3801f951b).
+        # This call also handles conflicts between explicit VCPU/PCPU
+        # requests and implicit 'hw:cpu_policy'-based requests, mismatches
+        # between the number of CPUs in the flavor and explicit VCPU/PCPU
+        # requests, etc.
+        cpu_policy = hardware.get_cpu_policy_constraint(
+            flavor, image)
+        cpu_thread_policy = hardware.get_cpu_thread_policy_constraint(
+            flavor, image)
+        emul_thread_policy = hardware.get_emulator_thread_policy_constraint(
+            flavor)
+
+        # We don't need to worry about handling 'SHARED' - that will result in
+        # VCPUs which we include by default
+        if cpu_policy == obj_fields.CPUAllocationPolicy.DEDICATED:
+            # TODO(stephenfin): Remove when we drop support for 'vcpu_pin_set'
+            self.cpu_pinning_requested = True
+
+            # Switch VCPU -> PCPU
+            cpus = flavor.vcpus
+
+            LOG.debug('Translating request for %(vcpu_rc)s=%(cpus)d to '
+                      '%(vcpu_rc)s=0,%(pcpu_rc)s=%(cpus)d',
+                      {'vcpu_rc': orc.VCPU, 'pcpu_rc': orc.PCPU,
+                       'cpus': cpus})
+
+            if emul_thread_policy == 'isolate':
+                cpus += 1
+
+                LOG.debug('Adding additional %(pcpu_rc)s to account for '
+                          'emulator threads', {'pcpu_rc': orc.PCPU})
+
+            self._add_resource(None, orc.PCPU, cpus)
+
+        trait = {
+            obj_fields.CPUThreadAllocationPolicy.ISOLATE: 'forbidden',
+            obj_fields.CPUThreadAllocationPolicy.REQUIRE: 'required',
+        }.get(cpu_thread_policy)
+        if trait:
+            LOG.debug('Adding %(trait)s=%(value)s trait',
+                      {'trait': os_traits.HW_CPU_HYPERTHREADING,
+                       'value': trait})
+            self._add_trait(None, os_traits.HW_CPU_HYPERTHREADING, trait)
+
     @property
     def group_policy(self):
         return self._group_policy
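The resources and traits produced by this translation ultimately surface as query parameters on placement's ``GET /allocation_candidates`` call, where a forbidden trait is prefixed with ``!``. A hypothetical rendering of that query string; the helper below is illustrative, not part of Nova:

```python
from urllib.parse import urlencode

def candidates_query(resources, traits):
    """Render resources/traits as an allocation_candidates query string."""
    params = {
        'resources': ','.join(
            '%s:%d' % (rc, amount) for rc, amount in sorted(resources.items())),
    }
    # required traits are listed bare; forbidden traits get a '!' prefix
    required = [name if value == 'required' else '!' + name
                for name, value in sorted(traits.items())]
    if required:
        params['required'] = ','.join(required)
    return urlencode(params)
```

For example, a 5-vCPU dedicated flavor with ``hw:cpu_thread_policy=isolate`` yields a request for five ``PCPU`` with ``HW_CPU_HYPERTHREADING`` forbidden.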
@@ -463,18 +527,21 @@ def resources_from_flavor(instance, flavor):
     return res_req.merged_resources()


-def resources_from_request_spec(ctxt, spec_obj, host_manager):
+def resources_from_request_spec(ctxt, spec_obj, host_manager,
+                                enable_pinning_translate=True):
     """Given a RequestSpec object, returns a ResourceRequest of the resources,
     traits, and aggregates it represents.

     :param context: The request context.
     :param spec_obj: A RequestSpec object.
     :param host_manager: A HostManager object.
+    :param enable_pinning_translate: True if the CPU policy extra specs should
+        be translated to placement resources and traits.

     :return: A ResourceRequest object.
     :raises NoValidHost: If the specified host/node is not found in the DB.
     """
-    res_req = ResourceRequest(spec_obj)
+    res_req = ResourceRequest(spec_obj, enable_pinning_translate)

     requested_resources = (spec_obj.requested_resources
                            if 'requested_resources' in spec_obj and
@@ -64,17 +64,20 @@ class ServersTestBase(base.ServersTestBase):
         self.useFixture(fixtures.MockPatch(
             'nova.privsep.utils.supports_direct_io',
             return_value=True))
+        self.useFixture(fixtures.MockPatch(
+            'nova.virt.libvirt.host.Host.get_online_cpus',
+            return_value=set(range(16))))

         # Mock the 'get_connection' function, as we're going to need to provide
         # custom capabilities for each test
         _p = mock.patch('nova.virt.libvirt.host.Host.get_connection')
         self.mock_conn = _p.start()
         self.addCleanup(_p.stop)
-        # As above, mock the 'get_arch' function as we may need to provide
-        # different host architectures during some tests.
+
+        # Mock the 'get_arch' function as we may need to provide different host
+        # architectures during some tests. We default to x86_64
         _a = mock.patch('nova.virt.libvirt.utils.get_arch')
         self.mock_arch = _a.start()
-        # Default to X86_64
         self.mock_arch.return_value = obj_fields.Architecture.X86_64
         self.addCleanup(_a.stop)
@@ -96,6 +99,11 @@ class ServersTestBase(base.ServersTestBase):
     def _get_connection(self, host_info, pci_info=None,
                         libvirt_version=fakelibvirt.FAKE_LIBVIRT_VERSION,
                         mdev_info=None):
+        # sanity check
+        self.assertGreater(16, host_info.cpus,
+            "Host.get_online_cpus is only accounting for 16 CPUs but you're "
+            "requesting %d; change the mock or your test" % host_info.cpus)
+
         fake_connection = fakelibvirt.Connection(
             'qemu:///system',
             version=libvirt_version,
@@ -53,6 +53,7 @@ class NUMAServersTestBase(base.ServersTestBase):
 class NUMAServersTest(NUMAServersTestBase):

     def _run_build_test(self, flavor_id, end_status='ACTIVE',
+                        filter_called_on_error=True,
                         expected_usage=None):

         # NOTE(bhagyashris): Always use host as 'compute1' so that it's
@@ -87,15 +88,20 @@ class NUMAServersTest(NUMAServersTestBase):
         self.assertIn(created_server_id, server_ids)

         # Validate the quota usage
-        if end_status == 'ACTIVE':
+        if filter_called_on_error and end_status == 'ACTIVE':
             quota_details = self.api.get_quota_detail()
             expected_core_usages = expected_usage.get(
                 'VCPU', expected_usage.get('PCPU', 0))
             self.assertEqual(expected_core_usages,
                              quota_details['cores']['in_use'])

-        # Validate that NUMATopologyFilter has been called
-        self.assertTrue(self.mock_filter.called)
+        # Validate that NUMATopologyFilter has been called or not called,
+        # depending on whether this is expected to make it past placement or
+        # not (hint: if it's a lack of VCPU/PCPU resources, it won't)
+        if filter_called_on_error:
+            self.assertTrue(self.mock_filter.called)
+        else:
+            self.assertFalse(self.mock_filter.called)

         found_server = self._wait_for_state_change(found_server, 'BUILD')
@@ -151,25 +157,28 @@ class NUMAServersTest(NUMAServersTestBase):

         self._run_build_test(flavor_id, end_status='ERROR')

-    def test_create_server_with_pinning(self):
+    def test_create_server_with_legacy_pinning_policy(self):
+        """Create a server using the legacy 'hw:cpu_policy' extra spec.
+
+        This should pass and result in a guest NUMA topology with pinned CPUs.
+        """
+
+        self.flags(cpu_dedicated_set='0-9', cpu_shared_set=None,
+                   group='compute')
+        self.flags(vcpu_pin_set=None)

         host_info = fakelibvirt.HostInfo(cpu_nodes=1, cpu_sockets=1,
                                          cpu_cores=5, cpu_threads=2,
                                          kB_mem=15740000)
         fake_connection = self._get_connection(host_info=host_info)
         self.mock_conn.return_value = fake_connection

         # Create a flavor
         extra_spec = {
             'hw:cpu_policy': 'dedicated',
             'hw:cpu_thread_policy': 'prefer',
         }
         flavor_id = self._create_flavor(vcpu=5, extra_spec=extra_spec)
-        expected_usage = {'DISK_GB': 20, 'MEMORY_MB': 2048, 'VCPU': 5}
+        expected_usage = {'DISK_GB': 20, 'MEMORY_MB': 2048, 'PCPU': 5}

         server = self._run_build_test(flavor_id, expected_usage=expected_usage)
@@ -178,11 +187,65 @@ class NUMAServersTest(NUMAServersTestBase):
         self.assertEqual(1, len(inst.numa_topology.cells))
         self.assertEqual(5, inst.numa_topology.cells[0].cpu_topology.cores)

-    def test_create_server_with_pinning_quota_fails(self):
+    def test_create_server_with_legacy_pinning_policy_old_configuration(self):
+        """Create a server using the legacy extra spec and configuration.
+
+        This should pass and result in a guest NUMA topology with pinned CPUs,
+        though we'll still be consuming VCPUs (which would in theory be fixed
+        during a later reshape).
+        """
+
+        self.flags(cpu_dedicated_set=None, cpu_shared_set=None,
+                   group='compute')
+        self.flags(vcpu_pin_set='0-7')
+
+        host_info = fakelibvirt.HostInfo(cpu_nodes=2, cpu_sockets=1,
+                                         cpu_cores=2, cpu_threads=2,
+                                         kB_mem=15740000)
+        fake_connection = self._get_connection(host_info=host_info)
+        self.mock_conn.return_value = fake_connection
+
+        extra_spec = {
+            'hw:cpu_policy': 'dedicated',
+            'hw:cpu_thread_policy': 'prefer',
+        }
+        flavor_id = self._create_flavor(extra_spec=extra_spec)
+        expected_usage = {'DISK_GB': 20, 'MEMORY_MB': 2048, 'VCPU': 2}
+
+        self._run_build_test(flavor_id, expected_usage=expected_usage)
+
+    def test_create_server_with_legacy_pinning_policy_fails(self):
+        """Create a pinned instance on a host with no PCPUs.
+
+        This should fail because we're translating the extra spec and the host
+        isn't reporting the PCPUs we need.
+        """
+
+        self.flags(cpu_shared_set='0-9', cpu_dedicated_set=None,
+                   group='compute')
+        self.flags(vcpu_pin_set=None)
+
+        host_info = fakelibvirt.HostInfo(cpu_nodes=1, cpu_sockets=1,
+                                         cpu_cores=5, cpu_threads=2,
+                                         kB_mem=15740000)
+        fake_connection = self._get_connection(host_info=host_info)
+        self.mock_conn.return_value = fake_connection
+
+        extra_spec = {
+            'hw:cpu_policy': 'dedicated',
+            'hw:cpu_thread_policy': 'prefer',
+        }
+        flavor_id = self._create_flavor(vcpu=5, extra_spec=extra_spec)
+        self._run_build_test(flavor_id, end_status='ERROR')
+
+    def test_create_server_with_legacy_pinning_policy_quota_fails(self):
+        """Create a pinned instance on a host with PCPUs but not enough quota.
+
+        This should fail because the quota request should fail.
+        """
+        self.flags(cpu_dedicated_set='0-7', cpu_shared_set=None,
+                   group='compute')
+        self.flags(vcpu_pin_set=None)

         host_info = fakelibvirt.HostInfo(cpu_nodes=2, cpu_sockets=1,
                                          cpu_cores=2, cpu_threads=2,
@@ -213,12 +276,101 @@ class NUMAServersTest(NUMAServersTestBase):
                                self.api.post_server, post)
         self.assertEqual(403, ex.response.status_code)

-    def test_resize_unpinned_to_pinned(self):
+    def test_create_server_with_pcpu(self):
+        """Create a server using an explicit 'resources:PCPU' request.
+
+        This should pass and result in a guest NUMA topology with pinned CPUs.
+        """
+
+        self.flags(cpu_dedicated_set='0-7', cpu_shared_set=None,
+                   group='compute')
+        self.flags(vcpu_pin_set=None)
+
+        host_info = fakelibvirt.HostInfo(cpu_nodes=2, cpu_sockets=1,
+                                         cpu_cores=2, cpu_threads=2,
+                                         kB_mem=15740000)
+        fake_connection = self._get_connection(host_info=host_info)
+        self.mock_conn.return_value = fake_connection
+
+        extra_spec = {'resources:PCPU': '2'}
+        flavor_id = self._create_flavor(vcpu=2, extra_spec=extra_spec)
+        expected_usage = {'DISK_GB': 20, 'MEMORY_MB': 2048, 'PCPU': 2}
+
+        server = self._run_build_test(flavor_id, expected_usage=expected_usage)
+
+        ctx = nova_context.get_admin_context()
+        inst = objects.Instance.get_by_uuid(ctx, server['id'])
+        self.assertEqual(1, len(inst.numa_topology.cells))
+        self.assertEqual(1, inst.numa_topology.cells[0].cpu_topology.cores)
+        self.assertEqual(2, inst.numa_topology.cells[0].cpu_topology.threads)
+
+    def test_create_server_with_pcpu_fails(self):
+        """Create a pinned instance on a host with no PCPUs.
+
+        This should fail because we're explicitly requesting PCPUs and the host
+        isn't reporting them.
+        """
+
+        self.flags(cpu_shared_set='0-9', cpu_dedicated_set=None,
+                   group='compute')
+        self.flags(vcpu_pin_set=None)
+
+        host_info = fakelibvirt.HostInfo(cpu_nodes=1, cpu_sockets=1,
+                                         cpu_cores=5, cpu_threads=2,
+                                         kB_mem=15740000)
+        fake_connection = self._get_connection(host_info=host_info)
+        self.mock_conn.return_value = fake_connection
+
+        extra_spec = {'resources:PCPU': '2'}
+        flavor_id = self._create_flavor(vcpu=2, extra_spec=extra_spec)
+        self._run_build_test(flavor_id, end_status='ERROR',
+                             filter_called_on_error=False)
+
+    def test_create_server_with_pcpu_quota_fails(self):
+        """Create a pinned instance on a host with PCPUs but not enough quota.
+
+        This should fail because the quota request should fail.
+        """
+        self.flags(cpu_dedicated_set='0-7', cpu_shared_set=None,
+                   group='compute')
+        self.flags(vcpu_pin_set=None)
+
+        host_info = fakelibvirt.HostInfo(cpu_nodes=2, cpu_sockets=1,
+                                         cpu_cores=2, cpu_threads=2,
+                                         kB_mem=15740000)
+        fake_connection = self._get_connection(host_info=host_info)
+        self.mock_conn.return_value = fake_connection
+
+        extra_spec = {'resources:PCPU': '2'}
+        flavor_id = self._create_flavor(vcpu=2, extra_spec=extra_spec)
+
+        # Update the core quota less than we requested
+        self.api.update_quota({'cores': 1})
+
+        # NOTE(bhagyashris): Always use host as 'compute1' so that it's
+        # possible to get resource provider information for verifying
+        # compute usages. This host name 'compute1' is hard coded in
+        # Connection class in fakelibvirt.py.
+        # TODO(stephenfin): Remove the hardcoded limit, possibly overriding
+        # 'start_service' to make sure there isn't a mismatch
+        self.compute = self.start_service('compute', host='compute1')
+
+        post = {'server': self._build_server(flavor_id)}
+
+        ex = self.assertRaises(client.OpenStackApiException,
+                               self.api.post_server, post)
+        self.assertEqual(403, ex.response.status_code)
+
+    def test_resize_vcpu_to_pcpu(self):
+        """Create an unpinned instance and resize it to a flavor with pinning.
+
+        This should pass and result in a guest NUMA topology with pinned CPUs.
+        """
+
+        self.flags(cpu_dedicated_set='0-3', cpu_shared_set='4-7',
+                   group='compute')
+        self.flags(vcpu_pin_set=None)

         host_info = fakelibvirt.HostInfo(cpu_nodes=2, cpu_sockets=1,
                                          cpu_cores=2, cpu_threads=2,
@@ -260,13 +412,11 @@ class NUMAServersTest(NUMAServersTestBase):
         original_host = server['OS-EXT-SRV-ATTR:host']

         for host, compute_rp_uuid in self.compute_rp_uuids.items():
-            # TODO(stephenfin): Both of these should report PCPU when we
-            # support that
             if host == original_host:  # the host with the instance
-                expected_usage = {'VCPU': 2, 'DISK_GB': 20,
+                expected_usage = {'VCPU': 2, 'PCPU': 0, 'DISK_GB': 20,
                                   'MEMORY_MB': 2048}
             else:  # the other host
-                expected_usage = {'VCPU': 0, 'DISK_GB': 0,
+                expected_usage = {'VCPU': 0, 'PCPU': 0, 'DISK_GB': 0,
                                   'MEMORY_MB': 0}

             compute_usage = self.placement_api.get(
@@ -299,16 +449,15 @@ class NUMAServersTest(NUMAServersTestBase):
         # resource usage has been updated

         for host, compute_rp_uuid in self.compute_rp_uuids.items():
-            # TODO(stephenfin): This should use PCPU when we support those
             if host == original_host:
                 # the host that had the instance should still have allocations
                 # since the resize hasn't been confirmed
-                expected_usage = {'VCPU': 2, 'DISK_GB': 20,
+                expected_usage = {'VCPU': 2, 'PCPU': 0, 'DISK_GB': 20,
                                   'MEMORY_MB': 2048}
             else:
                 # the other host should have the new allocations replete with
                 # PCPUs
-                expected_usage = {'VCPU': 2, 'DISK_GB': 20,
+                expected_usage = {'VCPU': 0, 'PCPU': 2, 'DISK_GB': 20,
                                   'MEMORY_MB': 2048}

             compute_usage = self.placement_api.get(
@@ -329,16 +478,15 @@ class NUMAServersTest(NUMAServersTestBase):
         server = self._wait_for_state_change(server, 'ACTIVE')

         for host, compute_rp_uuid in self.compute_rp_uuids.items():
-            # TODO(stephenfin): This should use PCPU when we support those
             if host == original_host:
                 # the host that had the instance should no longer have
                 # allocations since the resize has been confirmed
-                expected_usage = {'VCPU': 0, 'DISK_GB': 0,
+                expected_usage = {'VCPU': 0, 'PCPU': 0, 'DISK_GB': 0,
                                   'MEMORY_MB': 0}
             else:
                 # the other host should still have the new allocations replete
                 # with PCPUs
-                expected_usage = {'VCPU': 2, 'DISK_GB': 20,
+                expected_usage = {'VCPU': 0, 'PCPU': 2, 'DISK_GB': 20,
                                   'MEMORY_MB': 2048}

             compute_usage = self.placement_api.get(
@@ -285,6 +285,8 @@ class PCIServersTest(_PCIServersTestBase):
         assigned pci device.
         """

+        self.flags(cpu_dedicated_set='0-7', group='compute')
+
         host_info = fakelibvirt.HostInfo(cpu_nodes=2, cpu_sockets=1,
                                          cpu_cores=2, cpu_threads=2,
                                          kB_mem=15740000)
@@ -306,6 +308,8 @@ class PCIServersTest(_PCIServersTestBase):
         memory resources from one NUMA node and a PCI device from another.
         """

+        self.flags(cpu_dedicated_set='0-7', group='compute')
+
         host_info = fakelibvirt.HostInfo(cpu_nodes=2, cpu_sockets=1,
                                          cpu_cores=2, cpu_threads=2,
                                          kB_mem=15740000)
@@ -355,6 +359,8 @@ class PCIServersWithNUMAPoliciesTest(_PCIServersTestBase):
         NUMA policies are in use.
         """

+        self.flags(cpu_dedicated_set='0-7', group='compute')
+
         host_info = fakelibvirt.HostInfo(cpu_nodes=2, cpu_sockets=1,
                                          cpu_cores=2, cpu_threads=2,
                                          kB_mem=15740000)
@@ -46,6 +46,8 @@ class RealTimeServersTest(base.ServersTestBase):
                           self.api.post_server, {'server': server})

     def test_success(self):
+        self.flags(cpu_dedicated_set='0-7', group='compute')
+
         host_info = fakelibvirt.HostInfo(cpu_nodes=2, cpu_sockets=1,
                                          cpu_cores=2, cpu_threads=2,
                                          kB_mem=15740000)
@@ -25,6 +25,8 @@ from nova.scheduler import driver
 from nova.scheduler import host_manager


+# TODO(stephenfin): Rework these so they're functions instead of global
+# variables that can be mutated
 NUMA_TOPOLOGY = objects.NUMATopology(cells=[
     objects.NUMACell(
         id=0,
@@ -164,19 +166,22 @@ COMPUTE_NODES = [
               host='fake', hypervisor_hostname='fake-hyp'),
 ]

-ALLOC_REQS = [
-    {
-        'allocations': {
-            cn.uuid: {
-                'resources': {
-                    'VCPU': 1,
-                    'MEMORY_MB': 512,
-                    'DISK_GB': 512,
-                },
-            }
-        }
-    } for cn in COMPUTE_NODES
-]
+
+def get_fake_alloc_reqs():
+    return [
+        {
+            'allocations': {
+                cn.uuid: {
+                    'resources': {
+                        'VCPU': 1,
+                        'MEMORY_MB': 512,
+                        'DISK_GB': 512,
+                    },
+                }
+            }
+        } for cn in COMPUTE_NODES
+    ]


 RESOURCE_PROVIDERS = [
     dict(
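The TODO above explains this change: a module-level list is shared by every test that imports it, so one test mutating it pollutes the next, while a function returns fresh data on each call. A minimal illustration of the difference (names invented for this example):

```python
SHARED = [{'resources': {'VCPU': 1}}]  # module-level fixture, shared state

def get_fresh():
    # Build a new structure per call, like get_fake_alloc_reqs()
    return [{'resources': {'VCPU': 1}}]

# Mutating the shared fixture leaks into every later use of it
SHARED[0]['resources']['VCPU'] = 99

# Mutating a fresh copy is local to the caller
reqs = get_fresh()
reqs[0]['resources']['VCPU'] = 99
```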
@@ -88,10 +88,13 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
         fake_spec = objects.RequestSpec()
         fake_spec.instance_uuid = uuids.instance
         fake_version = "9.42"
-        place_res = (fakes.ALLOC_REQS, mock.sentinel.p_sums, fake_version)
+        mock_p_sums = mock.Mock()
+        fake_alloc_reqs = fakes.get_fake_alloc_reqs()
+        place_res = (fake_alloc_reqs, mock_p_sums, fake_version)
         mock_get_ac.return_value = place_res
+        mock_rfrs.return_value.cpu_pinning_requested = False
         expected_alloc_reqs_by_rp_uuid = {
-            cn.uuid: [fakes.ALLOC_REQS[x]]
+            cn.uuid: [fake_alloc_reqs[x]]
             for x, cn in enumerate(fakes.COMPUTE_NODES)
         }
         with mock.patch.object(self.manager.driver, 'select_destinations'
@@ -102,7 +105,7 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
             select_destinations.assert_called_once_with(
                 self.context, fake_spec,
                 [fake_spec.instance_uuid], expected_alloc_reqs_by_rp_uuid,
-                mock.sentinel.p_sums, fake_version, False)
+                mock_p_sums, fake_version, False)
             mock_get_ac.assert_called_once_with(
                 self.context, mock_rfrs.return_value)
@@ -114,7 +117,7 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
                 return_objects=True, return_alternates=True)
             select_destinations.assert_called_once_with(None, fake_spec,
                 [fake_spec.instance_uuid], expected_alloc_reqs_by_rp_uuid,
-                mock.sentinel.p_sums, fake_version, True)
+                mock_p_sums, fake_version, True)

     @mock.patch('nova.scheduler.request_filter.process_reqspec')
     @mock.patch('nova.scheduler.utils.resources_from_request_spec')
@@ -125,10 +128,13 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
         fake_spec = objects.RequestSpec()
         fake_spec.instance_uuid = uuids.instance
         fake_version = "9.42"
-        place_res = (fakes.ALLOC_REQS, mock.sentinel.p_sums, fake_version)
+        mock_p_sums = mock.Mock()
+        fake_alloc_reqs = fakes.get_fake_alloc_reqs()
+        place_res = (fake_alloc_reqs, mock_p_sums, fake_version)
         mock_get_ac.return_value = place_res
+        mock_rfrs.return_value.cpu_pinning_requested = False
         expected_alloc_reqs_by_rp_uuid = {
-            cn.uuid: [fakes.ALLOC_REQS[x]]
+            cn.uuid: [fake_alloc_reqs[x]]
             for x, cn in enumerate(fakes.COMPUTE_NODES)
         }
         with mock.patch.object(self.manager.driver, 'select_destinations'
@@ -148,7 +154,7 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
         # driver should have been called with True for return_alternates.
         select_destinations.assert_called_once_with(None, fake_spec,
             [fake_spec.instance_uuid], expected_alloc_reqs_by_rp_uuid,
-            mock.sentinel.p_sums, fake_version, True)
+            mock_p_sums, fake_version, True)

         # Now pass False for return objects, but keep return_alternates as
         # True. Verify that the manager converted the Selection object back
@@ -164,7 +170,7 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
         # return_alternates as False.
         select_destinations.assert_called_once_with(None, fake_spec,
             [fake_spec.instance_uuid], expected_alloc_reqs_by_rp_uuid,
-            mock.sentinel.p_sums, fake_version, False)
+            mock_p_sums, fake_version, False)

     @mock.patch('nova.scheduler.request_filter.process_reqspec')
     @mock.patch('nova.scheduler.utils.resources_from_request_spec')
@@ -176,6 +182,7 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
         fake_spec.instance_uuid = uuids.instance
         place_res = get_allocation_candidates_response
         mock_get_ac.return_value = place_res
+        mock_rfrs.return_value.cpu_pinning_requested = False
         with mock.patch.object(self.manager.driver, 'select_destinations'
                 ) as select_destinations:
             self.assertRaises(messaging.rpc.dispatcher.ExpectedException,
@@ -236,13 +243,17 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
     @mock.patch('nova.scheduler.utils.resources_from_request_spec')
     @mock.patch('nova.scheduler.client.report.SchedulerReportClient.'
                 'get_allocation_candidates')
-    def test_select_destination_with_4_3_client(self, mock_get_ac, mock_rfrs,
-                                                mock_process):
+    def test_select_destination_with_4_3_client(
+            self, mock_get_ac, mock_rfrs, mock_process,
+            cpu_pinning_requested=False):
         fake_spec = objects.RequestSpec()
-        place_res = (fakes.ALLOC_REQS, mock.sentinel.p_sums, "42.0")
+        mock_p_sums = mock.Mock()
+        fake_alloc_reqs = fakes.get_fake_alloc_reqs()
+        place_res = (fake_alloc_reqs, mock_p_sums, "42.0")
         mock_get_ac.return_value = place_res
+        mock_rfrs.return_value.cpu_pinning_requested = cpu_pinning_requested
         expected_alloc_reqs_by_rp_uuid = {
-            cn.uuid: [fakes.ALLOC_REQS[x]]
+            cn.uuid: [fake_alloc_reqs[x]]
             for x, cn in enumerate(fakes.COMPUTE_NODES)
         }
         with mock.patch.object(self.manager.driver, 'select_destinations'
@@ -251,10 +262,78 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
         mock_process.assert_called_once_with(self.context, fake_spec)
         select_destinations.assert_called_once_with(self.context,
             fake_spec, None, expected_alloc_reqs_by_rp_uuid,
-            mock.sentinel.p_sums, "42.0", False)
+            mock_p_sums, "42.0", False)
+        mock_rfrs.assert_called_once_with(
+            self.context, fake_spec, mock.ANY,
+            enable_pinning_translate=True)
         mock_get_ac.assert_called_once_with(
             self.context, mock_rfrs.return_value)

+    @mock.patch('nova.scheduler.manager.LOG.debug')
+    @mock.patch('nova.scheduler.request_filter.process_reqspec')
+    @mock.patch('nova.scheduler.utils.resources_from_request_spec')
+    @mock.patch('nova.scheduler.client.report.SchedulerReportClient.'
+                'get_allocation_candidates')
+    def test_select_destination_with_pcpu_fallback(
+            self, mock_get_ac, mock_rfrs, mock_process, mock_log):
+        """Check that we make a second request to placement if we've got a
+        PCPU request.
+        """
+        self.flags(disable_fallback_pcpu_query=False, group='workarounds')
+
+        # mock the result from placement. In reality, the two calls we expect
+        # would return two different results, but we don't care about that. All
+        # we want to check is that it _was_ called twice
+        fake_spec = objects.RequestSpec()
+        mock_p_sums = mock.Mock()
+        fake_alloc_reqs = fakes.get_fake_alloc_reqs()
+        place_res = (fake_alloc_reqs, mock_p_sums, "42.0")
+        mock_get_ac.return_value = place_res
+
+        pcpu_rreq = mock.Mock()
+        pcpu_rreq.cpu_pinning_requested = True
+        vcpu_rreq = mock.Mock()
+        mock_rfrs.side_effect = [pcpu_rreq, vcpu_rreq]
+
+        # as above, the two allocation requests against each compute node would
+        # be different in reality, and not all compute nodes might have two
+        # allocation requests, but that doesn't matter for this simple test
+        expected_alloc_reqs_by_rp_uuid = {
+            cn.uuid: [fake_alloc_reqs[x], fake_alloc_reqs[x]]
+            for x, cn in enumerate(fakes.COMPUTE_NODES)
+        }
+
+        with mock.patch.object(self.manager.driver, 'select_destinations'
+                ) as select_destinations:
+            self.manager.select_destinations(self.context, spec_obj=fake_spec)
+            select_destinations.assert_called_once_with(self.context,
+                fake_spec, None, expected_alloc_reqs_by_rp_uuid,
+                mock_p_sums, "42.0", False)
+
+        mock_process.assert_called_once_with(self.context, fake_spec)
+        mock_log.assert_called_with(
+            'Requesting fallback allocation candidates with VCPU instead of '
+            'PCPU')
+        mock_rfrs.assert_has_calls([
+            mock.call(self.context, fake_spec, mock.ANY,
+                      enable_pinning_translate=True),
+            mock.call(self.context, fake_spec, mock.ANY,
+                      enable_pinning_translate=False),
+        ])
+        mock_get_ac.assert_has_calls([
+            mock.call(self.context, pcpu_rreq),
+            mock.call(self.context, vcpu_rreq),
+        ])
+
+    def test_select_destination_with_pcpu_fallback_disabled(self):
+        """Check that we do not make a second request to placement if we've
+        been told not to, even though we've got a PCPU instance.
+        """
+        self.flags(disable_fallback_pcpu_query=True, group='workarounds')
+
+        self.test_select_destination_with_4_3_client(
+            cpu_pinning_requested=True)
+
     # TODO(sbauza): Remove that test once the API v4 is removed
     @mock.patch('nova.scheduler.request_filter.process_reqspec')
     @mock.patch('nova.scheduler.utils.resources_from_request_spec')
@@ -266,10 +345,13 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
         fake_spec = objects.RequestSpec()
         fake_spec.instance_uuid = uuids.instance
         from_primitives.return_value = fake_spec
-        place_res = (fakes.ALLOC_REQS, mock.sentinel.p_sums, "42.0")
+        mock_p_sums = mock.Mock()
+        fake_alloc_reqs = fakes.get_fake_alloc_reqs()
+        place_res = (fake_alloc_reqs, mock_p_sums, "42.0")
         mock_get_ac.return_value = place_res
+        mock_rfrs.return_value.cpu_pinning_requested = False
         expected_alloc_reqs_by_rp_uuid = {
-            cn.uuid: [fakes.ALLOC_REQS[x]]
+            cn.uuid: [fake_alloc_reqs[x]]
             for x, cn in enumerate(fakes.COMPUTE_NODES)
         }
         with mock.patch.object(self.manager.driver, 'select_destinations'
@@ -282,7 +364,7 @@ class SchedulerManagerTestCase(test.NoDBTestCase):
             select_destinations.assert_called_once_with(
                 self.context, fake_spec,
                 [fake_spec.instance_uuid], expected_alloc_reqs_by_rp_uuid,
-                mock.sentinel.p_sums, "42.0", False)
+                mock_p_sums, "42.0", False)
         mock_get_ac.assert_called_once_with(
             self.context, mock_rfrs.return_value)

@@ -919,6 +919,58 @@ class TestUtils(TestUtilsBase):
         )
         self.assertEqual(expected_querystring, rr.to_querystring())

+    def _test_resource_request_init_with_legacy_extra_specs(self):
+        flavor = objects.Flavor(
+            vcpus=1, memory_mb=1024, root_gb=10, ephemeral_gb=5, swap=0,
+            extra_specs={
+                'hw:cpu_policy': 'dedicated',
+                'hw:cpu_thread_policy': 'isolate',
+                'hw:emulator_threads_policy': 'isolate',
+            })
+
+        return objects.RequestSpec(flavor=flavor, is_bfv=False)
+
+    def test_resource_request_init_with_legacy_extra_specs(self):
+        expected = FakeResourceRequest()
+        expected._rg_by_id[None] = objects.RequestGroup(
+            use_same_provider=False,
+            resources={
+                # we should have two PCPUs, one due to hw:cpu_policy and the
+                # other due to hw:emulator_threads_policy
+                'PCPU': 2,
+                'MEMORY_MB': 1024,
+                'DISK_GB': 15,
+            },
+            forbidden_traits={
+                # we should forbid hyperthreading due to hw:cpu_thread_policy
+                'HW_CPU_HYPERTHREADING',
+            },
+        )
+        rs = self._test_resource_request_init_with_legacy_extra_specs()
+        rr = utils.ResourceRequest(rs)
+        self.assertResourceRequestsEqual(expected, rr)
+        self.assertTrue(rr.cpu_pinning_requested)
+
+    def test_resource_request_init_with_legacy_extra_specs_no_translate(self):
+        expected = FakeResourceRequest()
+        expected._rg_by_id[None] = objects.RequestGroup(
+            use_same_provider=False,
+            resources={
+                # we should have a VCPU despite hw:cpu_policy because
+                # enable_pinning_translate=False
+                'VCPU': 1,
+                'MEMORY_MB': 1024,
+                'DISK_GB': 15,
+            },
+            # we should not forbid hyperthreading despite hw:cpu_thread_policy
+            # because enable_pinning_translate=False
+            forbidden_traits=set(),
+        )
+        rs = self._test_resource_request_init_with_legacy_extra_specs()
+        rr = utils.ResourceRequest(rs, enable_pinning_translate=False)
+        self.assertResourceRequestsEqual(expected, rr)
+        self.assertFalse(rr.cpu_pinning_requested)
+
     def test_resource_request_init_with_image_props(self):
         flavor = objects.Flavor(
             vcpus=1, memory_mb=1024, root_gb=10, ephemeral_gb=5, swap=0)
@@ -945,6 +997,58 @@ class TestUtils(TestUtilsBase):
         rr = utils.ResourceRequest(rs)
         self.assertResourceRequestsEqual(expected, rr)

+    def _test_resource_request_init_with_legacy_image_props(self):
+        flavor = objects.Flavor(
+            vcpus=1, memory_mb=1024, root_gb=10, ephemeral_gb=5, swap=0)
+        image = objects.ImageMeta.from_dict({
+            'properties': {
+                'hw_cpu_policy': 'dedicated',
+                'hw_cpu_thread_policy': 'require',
+            },
+            'id': 'c8b1790e-a07d-4971-b137-44f2432936cd',
+        })
+        return objects.RequestSpec(flavor=flavor, image=image, is_bfv=False)
+
+    def test_resource_request_init_with_legacy_image_props(self):
+        expected = FakeResourceRequest()
+        expected._rg_by_id[None] = objects.RequestGroup(
+            use_same_provider=False,
+            resources={
+                # we should have a PCPU due to hw_cpu_policy
+                'PCPU': 1,
+                'MEMORY_MB': 1024,
+                'DISK_GB': 15,
+            },
+            required_traits={
+                # we should require hyperthreading due to hw_cpu_thread_policy
+                'HW_CPU_HYPERTHREADING',
+            },
+        )
+        rs = self._test_resource_request_init_with_legacy_image_props()
+        rr = utils.ResourceRequest(rs)
+        self.assertResourceRequestsEqual(expected, rr)
+        self.assertTrue(rr.cpu_pinning_requested)
+
+    def test_resource_request_init_with_legacy_image_props_no_translate(self):
+        expected = FakeResourceRequest()
+        expected._rg_by_id[None] = objects.RequestGroup(
+            use_same_provider=False,
+            resources={
+                # we should have a VCPU despite hw_cpu_policy because
+                # enable_pinning_translate=False
+                'VCPU': 1,
+                'MEMORY_MB': 1024,
+                'DISK_GB': 15,
+            },
+            # we should not require hyperthreading despite hw_cpu_thread_policy
+            # because enable_pinning_translate=False
+            required_traits=set(),
+        )
+        rs = self._test_resource_request_init_with_legacy_image_props()
+        rr = utils.ResourceRequest(rs, enable_pinning_translate=False)
+        self.assertResourceRequestsEqual(expected, rr)
+        self.assertFalse(rr.cpu_pinning_requested)
+
     def test_resource_request_init_is_bfv(self):
         flavor = objects.Flavor(
             vcpus=1, memory_mb=1024, root_gb=10, ephemeral_gb=5, swap=1555)

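The `DISK_GB: 15` figure expected in the tests above follows from the flavor's disk-related fields. As a standalone sketch (the helper name is hypothetical; nova derives this inside its resource-request code), the rule is root disk plus ephemeral disk plus swap, with swap given in MB and rounded up to whole GB:

```python
import math

def flavor_disk_gb(root_gb, ephemeral_gb, swap_mb):
    # DISK_GB = root + ephemeral + swap, where swap is specified in MB
    # and rounded up to a whole GB; with root_gb=10, ephemeral_gb=5 and
    # swap=0 this yields the 15 expected in the tests above
    return root_gb + ephemeral_gb + int(math.ceil(swap_mb / 1024.0))
```

For the `is_bfv` flavor above with `swap=1555`, the swap contribution rounds up to 2 GB.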
@@ -1330,6 +1330,109 @@ class NUMATopologyTest(test.NoDBTestCase):
                         id=0, cpuset=set([0, 1, 2, 3]), memory=2048),
                 ]),
         },
+        {
+            # We request PCPUs explicitly
+            "flavor": objects.Flavor(vcpus=4, memory_mb=2048,
+                                     extra_specs={
+                                         "resources:PCPU": "4",
+                                     }),
+            "image": {
+                "properties": {}
+            },
+            "expect": objects.InstanceNUMATopology(
+                cells=[
+                    objects.InstanceNUMACell(
+                        id=0, cpuset=set([0, 1, 2, 3]), memory=2048,
+                        cpu_policy=fields.CPUAllocationPolicy.DEDICATED,
+                    )]),
+        },
+        {
+            # We request the HW_CPU_HYPERTHREADING trait explicitly
+            "flavor": objects.Flavor(vcpus=4, memory_mb=2048,
+                                     extra_specs={
+                                         "resources:PCPU": "4",
+                                         "trait:HW_CPU_HYPERTHREADING": "forbidden",
+                                     }),
+            "image": {
+                "properties": {}
+            },
+            "expect": objects.InstanceNUMATopology(
+                cells=[
+                    objects.InstanceNUMACell(
+                        id=0, cpuset=set([0, 1, 2, 3]), memory=2048,
+                        cpu_policy=fields.CPUAllocationPolicy.DEDICATED,
+                        cpu_thread_policy=
+                            fields.CPUThreadAllocationPolicy.ISOLATE,
+                    )]),
+        },
+        {
+            # Requesting both implicit and explicit PCPUs
+            "flavor": objects.Flavor(vcpus=4, memory_mb=2048,
+                                     extra_specs={
+                                         "hw:cpu_policy":
+                                             fields.CPUAllocationPolicy.DEDICATED,
+                                         "resources:PCPU": "4",
+                                     }),
+            "image": {"properties": {}},
+            "expect": exception.InvalidRequest,
+        },
+        {
+            # Requesting both PCPUs and VCPUs
+            "flavor": objects.Flavor(vcpus=4, memory_mb=2048,
+                                     extra_specs={
+                                         "resources:PCPU": "2",
+                                         "resources:VCPU": "2",
+                                     }),
+            "image": {"properties": {}},
+            "expect": exception.InvalidRequest,
+        },
+        {
+            # Mismatch between PCPU requests and flavor.vcpus
+            "flavor": objects.Flavor(vcpus=4, memory_mb=2048,
+                                     extra_specs={
+                                         "resources:PCPU": "5",
+                                     }),
+            "image": {"properties": {}},
+            "expect": exception.InvalidRequest,
+        },
+        {
+            # Mismatch between PCPU requests and flavor.vcpus with
+            # 'isolate' emulator thread policy
+            "flavor": objects.Flavor(vcpus=4, memory_mb=2048,
+                                     extra_specs={
+                                         "hw:emulator_threads_policy": "isolate",
+                                         "resources:PCPU": "4",
+                                     }),
+            "image": {"properties": {}},
+            "expect": exception.InvalidRequest,
+        },
+        {
+            # Mismatch between implicit and explicit HW_CPU_HYPERTHREADING
+            # trait (flavor)
+            "flavor": objects.Flavor(vcpus=4, memory_mb=2048,
+                                     extra_specs={
+                                         "hw:cpu_thread_policy":
+                                             fields.CPUThreadAllocationPolicy.ISOLATE,
+                                         "trait:HW_CPU_HYPERTHREADING": "required",
+                                     }),
+            "image": {"properties": {}},
+            "expect": exception.InvalidRequest,
+        },
+        {
+            # Mismatch between implicit and explicit HW_CPU_HYPERTHREADING
+            # trait (image)
+            "flavor": objects.Flavor(vcpus=4, name='foo', memory_mb=2048,
+                                     extra_specs={
+                                         "hw:cpu_policy":
+                                             fields.CPUAllocationPolicy.DEDICATED,
+                                         "hw:cpu_thread_policy":
+                                             fields.CPUThreadAllocationPolicy.ISOLATE,
+                                     }),
+            "image": {
+                "properties": {
+                    "trait:HW_CPU_HYPERTHREADING": "required",
+                }
+            },
+            "expect": exception.InvalidRequest,
+        },
     ]

     for testitem in testdata:

@@ -16,7 +16,10 @@ import collections
 import fractions
 import itertools
 import math
+import re

+import os_resource_classes as orc
+import os_traits
 from oslo_log import log as logging
 from oslo_utils import strutils
 from oslo_utils import units
@@ -1475,7 +1478,8 @@ def _get_numa_node_count_constraint(flavor, image_meta):
     return int(nodes) if nodes else nodes


-def _get_cpu_policy_constraint(flavor, image_meta):
+# NOTE(stephenfin): This must be public as it's used elsewhere
+def get_cpu_policy_constraint(flavor, image_meta):
     # type: (objects.Flavor, objects.ImageMeta) -> Optional[str]
     """Validate and return the requested CPU policy.
@@ -1511,12 +1515,13 @@ def _get_cpu_policy_constraint(flavor, image_meta):
     elif image_policy == fields.CPUAllocationPolicy.DEDICATED:
         cpu_policy = image_policy
     else:
-        cpu_policy = fields.CPUAllocationPolicy.SHARED
+        cpu_policy = None

     return cpu_policy


-def _get_cpu_thread_policy_constraint(flavor, image_meta):
+# NOTE(stephenfin): This must be public as it's used elsewhere
+def get_cpu_thread_policy_constraint(flavor, image_meta):
     # type: (objects.Flavor, objects.ImageMeta) -> Optional[str]
     """Validate and return the requested CPU thread policy.
@@ -1615,6 +1620,40 @@ def is_realtime_enabled(flavor):
     return strutils.bool_from_string(flavor_rt)


+def _get_vcpu_pcpu_resources(flavor):
+    # type: (objects.Flavor) -> Tuple[int, int]
+    requested_vcpu = 0
+    requested_pcpu = 0
+
+    for key, val in flavor.get('extra_specs', {}).items():
+        if re.match('resources([1-9][0-9]*)?:%s' % orc.VCPU, key):
+            try:
+                requested_vcpu += int(val)
+            except ValueError:
+                # this is handled elsewhere
+                pass
+        if re.match('resources([1-9][0-9]*)?:%s' % orc.PCPU, key):
+            try:
+                requested_pcpu += int(val)
+            except ValueError:
+                # this is handled elsewhere
+                pass
+
+    return (requested_vcpu, requested_pcpu)
+
+
+def _get_hyperthreading_trait(flavor, image_meta):
+    # type: (objects.Flavor, objects.ImageMeta) -> Optional[str]
+    for key, val in flavor.get('extra_specs', {}).items():
+        if re.match('trait([1-9][0-9]*)?:%s' % os_traits.HW_CPU_HYPERTHREADING,
+                    key):
+            return val
+
+    if os_traits.HW_CPU_HYPERTHREADING in image_meta.properties.get(
+            'traits_required', []):
+        return 'required'
+
+
 def _get_realtime_constraint(flavor, image_meta):
     # type: (objects.Flavor, objects.ImageMeta) -> Optional[str]
     """Validate and return the requested realtime CPU mask.
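As a standalone illustration of the extra-spec parsing pattern used by `_get_vcpu_pcpu_resources` above, which sums both unnumbered (`resources:PCPU`) and numbered (`resourcesNN:PCPU`) request-group keys, here is a minimal sketch. It uses a literal resource-class name in place of the `os_resource_classes` constant, and the helper name is illustrative only:

```python
import re

def count_pcpu_requests(extra_specs):
    """Sum every 'resources:PCPU' or 'resourcesNN:PCPU' extra spec value.

    Mirrors the regex in the hunk above: an optional numbered suffix on
    'resources' selects a granular request group. Non-integer values are
    skipped here, as nova validates them elsewhere.
    """
    requested = 0
    for key, val in extra_specs.items():
        if re.match('resources([1-9][0-9]*)?:PCPU$', key):
            try:
                requested += int(val)
            except ValueError:
                pass
    return requested
```

Note that both `resources:PCPU=4` and `resources1:PCPU=2` plus `resources2:PCPU=2` yield the same total of four PCPUs.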
@@ -1721,6 +1760,8 @@ def numa_get_constraints(flavor, image_meta):
         invalid value in image or flavor.
     :raises: exception.InvalidCPUThreadAllocationPolicy if policy is defined
         with invalid value in image or flavor.
+    :raises: exception.InvalidRequest if there is a conflict between explicitly
+        and implicitly requested resources or hyperthreading traits
     :returns: objects.InstanceNUMATopology, or None
     """
     numa_topology = None
@@ -1756,14 +1797,69 @@ def numa_get_constraints(flavor, image_meta):
         for c in numa_topology.cells:
             setattr(c, 'pagesize', pagesize)

-    cpu_policy = _get_cpu_policy_constraint(flavor, image_meta)
-    cpu_thread_policy = _get_cpu_thread_policy_constraint(flavor, image_meta)
+    cpu_policy = get_cpu_policy_constraint(flavor, image_meta)
+    cpu_thread_policy = get_cpu_thread_policy_constraint(flavor, image_meta)
     rt_mask = _get_realtime_constraint(flavor, image_meta)
     emu_threads_policy = get_emulator_thread_policy_constraint(flavor)

+    # handle explicit VCPU/PCPU resource requests and the
+    # HW_CPU_HYPERTHREADING trait
+
+    requested_vcpus, requested_pcpus = _get_vcpu_pcpu_resources(flavor)
+
+    if cpu_policy and (requested_vcpus or requested_pcpus):
+        # TODO(stephenfin): Make these custom exceptions
+        raise exception.InvalidRequest(
+            "It is not possible to use the 'resources:VCPU' or "
+            "'resources:PCPU' extra specs in combination with the "
+            "'hw:cpu_policy' extra spec or 'hw_cpu_policy' image metadata "
+            "property; use one or the other")
+
+    if requested_vcpus and requested_pcpus:
+        raise exception.InvalidRequest(
+            "It is not possible to specify both 'resources:VCPU' and "
+            "'resources:PCPU' extra specs; use one or the other")
+
+    if requested_pcpus:
+        if (emu_threads_policy == fields.CPUEmulatorThreadsPolicy.ISOLATE and
+                flavor.vcpus + 1 != requested_pcpus):
+            raise exception.InvalidRequest(
+                "You have requested 'hw:emulator_threads_policy=isolate' but "
+                "have not requested sufficient PCPUs to handle this policy; "
+                "you must allocate exactly flavor.vcpus + 1 PCPUs.")
+
+        if (emu_threads_policy != fields.CPUEmulatorThreadsPolicy.ISOLATE and
+                flavor.vcpus != requested_pcpus):
+            raise exception.InvalidRequest(
+                "There is a mismatch between the number of PCPUs requested "
+                "via 'resourcesNN:PCPU' and the flavor; you must allocate "
+                "exactly flavor.vcpus PCPUs")
+
+        cpu_policy = fields.CPUAllocationPolicy.DEDICATED
+
+    if requested_vcpus:
+        # NOTE(stephenfin): It would be nice if we could error out if
+        # flavor.vcpus != resources:VCPU, but that would be a breaking change.
+        # Better to wait until we remove flavor.vcpus or something
+        cpu_policy = fields.CPUAllocationPolicy.SHARED
+
+    hyperthreading_trait = _get_hyperthreading_trait(flavor, image_meta)
+
+    if cpu_thread_policy and hyperthreading_trait:
+        raise exception.InvalidRequest(
+            "It is not possible to use the 'trait:HW_CPU_HYPERTHREADING' "
+            "extra spec in combination with the 'hw:cpu_thread_policy' "
+            "extra spec or 'hw_cpu_thread_policy' image metadata property; "
+            "use one or the other")
+
+    if hyperthreading_trait == 'forbidden':
+        cpu_thread_policy = fields.CPUThreadAllocationPolicy.ISOLATE
+    elif hyperthreading_trait == 'required':
+        cpu_thread_policy = fields.CPUThreadAllocationPolicy.REQUIRE
+
     # sanity checks

-    if cpu_policy == fields.CPUAllocationPolicy.SHARED:
+    if cpu_policy in (fields.CPUAllocationPolicy.SHARED, None):
         if cpu_thread_policy:
             raise exception.CPUThreadPolicyConfigurationInvalid()

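Taken together, the translation this change implements can be sketched as a pure mapping from the legacy extra specs to placement-style request keys. This is an illustrative sketch only, not nova's actual `ResourceRequest` code; the function name is hypothetical:

```python
def translate_pinning_specs(vcpus, extra_specs):
    """Sketch of the legacy extra spec -> placement request translation.

    hw:cpu_policy=dedicated becomes a PCPU request (plus one extra PCPU
    when hw:emulator_threads_policy=isolate), while hw:cpu_thread_policy
    maps to a HW_CPU_HYPERTHREADING trait requirement.
    """
    request = {}
    if extra_specs.get('hw:cpu_policy') == 'dedicated':
        pcpus = vcpus
        if extra_specs.get('hw:emulator_threads_policy') == 'isolate':
            pcpus += 1  # one additional dedicated CPU for emulator threads
        request['resources:PCPU'] = pcpus
    else:
        request['resources:VCPU'] = vcpus

    thread_policy = extra_specs.get('hw:cpu_thread_policy')
    if thread_policy == 'isolate':
        request['trait:HW_CPU_HYPERTHREADING'] = 'forbidden'
    elif thread_policy == 'require':
        request['trait:HW_CPU_HYPERTHREADING'] = 'required'
    # 'prefer' adds nothing; it is handled later during scheduling
    return request
```

The same mapping applies to the `hw_cpu_policy` and `hw_cpu_thread_policy` image metadata equivalents.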
@@ -3,15 +3,36 @@ features:
   - |
     Compute nodes using the libvirt driver can now report ``PCPU`` inventory.
     This is consumed by instances with dedicated (pinned) CPUs. This can be
-    configured using the ``[compute] cpu_dedicated_set`` config option. A
-    legacy path using the now deprecated ``vcpu_pin_set`` config option is
-    provided to assist with upgrades. Refer to the help text of the ``[compute]
-    cpu_dedicated_set``, ``[compute] cpu_shared_set`` and ``vcpu_pin_set``
-    config options for more information.
+    configured using the ``[compute] cpu_dedicated_set`` config option. The
+    scheduler will automatically translate the legacy ``hw:cpu_policy`` flavor
+    extra spec or ``hw_cpu_policy`` image metadata property to ``PCPU``
+    requests, falling back to ``VCPU`` requests only if no ``PCPU`` candidates
+    are found. Refer to the help text of the ``[compute] cpu_dedicated_set``,
+    ``[compute] cpu_shared_set`` and ``vcpu_pin_set`` config options for more
+    information.
+  - |
+    Compute nodes using the libvirt driver will now report the
+    ``HW_CPU_HYPERTHREADING`` trait if the host has hyperthreading. The
+    scheduler will automatically translate the legacy ``hw:cpu_thread_policy``
+    flavor extra spec or ``hw_cpu_thread_policy`` image metadata property to
+    either require or forbid this trait.
   - |
     A new configuration option, ``[compute] cpu_dedicated_set``, has been
     added. This can be used to configure the host CPUs that should be used for
     ``PCPU`` inventory.
+  - |
+    A new configuration option, ``[workarounds] disable_fallback_pcpu_query``,
+    has been added. When creating or moving pinned instances, the scheduler
+    will attempt to provide a ``PCPU``-based allocation, but can also fall
+    back to a legacy ``VCPU``-based allocation. This fallback behavior is
+    enabled by default to ensure it is possible to upgrade without having to
+    modify compute node configuration, but it results in an additional request
+    for allocation candidates from placement. This can have a slight
+    performance impact and is unnecessary on new or upgraded deployments where
+    the compute nodes have been correctly configured to report ``PCPU``
+    inventory. The ``[workarounds] disable_fallback_pcpu_query`` config option
+    can be used to disable this fallback allocation candidate request, meaning
+    only ``PCPU``-based allocation candidates will be retrieved.
 deprecations:
   - |
     The ``vcpu_pin_set`` configuration option has been deprecated. You should
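The fallback behaviour described in the release note above can be sketched as follows. This is illustrative pseudologic only; the callable and request objects stand in for nova's real scheduler and placement report-client interfaces:

```python
def get_candidates_with_fallback(get_allocation_candidates, pcpu_request,
                                 vcpu_request, disable_fallback=False):
    """Ask placement for PCPU-based candidates first.

    If no candidates are returned and the [workarounds]
    disable_fallback_pcpu_query option is not set, retry with the legacy
    VCPU-based request so older compute nodes can still be scheduled to.
    """
    candidates = get_allocation_candidates(pcpu_request)
    if not candidates and not disable_fallback:
        candidates = get_allocation_candidates(vcpu_request)
    return candidates
```

The cost noted in the release text is visible here: the fallback path issues a second allocation-candidate request against placement.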