Adds pci-generate-stats spec

Spec covering the removal of storing summary PCI device information in
the Nova database.

Change-Id: Ibbfd1cf40103c366733ee5af251956a3e981afea
Blueprint: pci-generate-stats
Jay Pipes
2015-11-02 08:32:51 -05:00
parent 8b3c2633b7
commit db1ef53c36


..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===================================
On-demand Generate PCI Device Pools
===================================

https://blueprints.launchpad.net/nova/+spec/pci-stats-generate


This spec proposes generating PCI device pool information on the fly instead
of storing the summary pool information in the database.


Problem description
===================

The `compute_nodes` table currently stores, in its `pci_stats` field, a JSON
representation of PCI device "pools". This information is updated by the
nova-compute resource tracker via the `nova.pci.stats.PciDevStats` class. The
Nova scheduler reads it on every `select_destinations()` call, when the
`nova.objects.ComputeNodeList.get_all()` method is used to pull information
about all compute nodes in the system. The scheduler pulls this summary
information so that it does not have to send messages containing thousands of
PCI device records across the wire.

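
The following sketch makes the notion of a "pool" concrete, for illustration only. It collapses individual device records into counts keyed by `(product_id, vendor_id, numa_node)`, which is the same grouping used by the SQL queries in this spec. It then serializes the result as JSON. The device record shape and the JSON layout are simplified assumptions, not the exact format written by `PciDevStats`. The point is that a couple of pool entries can stand in for thousands of device rows.

.. code-block:: python

   import json
   from collections import Counter
   from typing import Dict, List


   def summarize_pools(devices: List[Dict[str, object]]) -> List[Dict[str, object]]:
       """Collapse per-device records into pools with a device count.

       Assumption: a pool is identified only by (product_id, vendor_id,
       numa_node); real pool keys may carry extra attributes.
       """
       counts = Counter(
           (dev["product_id"], dev["vendor_id"], dev["numa_node"]) for dev in devices
       )
       # Counter preserves first-seen order, so pools appear in device order.
       return [
           {"product_id": product, "vendor_id": vendor, "numa_node": numa, "count": count}
           for (product, vendor, numa), count in counts.items()
       ]


   # Hypothetical host: 2000 identical VFs split evenly across two NUMA nodes.
   devices = [
       {"product_id": "10ed", "vendor_id": "8086", "numa_node": i % 2}
       for i in range(2000)
   ]
   pools = summarize_pools(devices)
   pci_stats_json = json.dumps(pools)  # roughly what a stored summary might hold
   print(f"{len(devices)} device records -> {len(pools)} pools")

Storing `pci_stats_json` alongside the device rows is exactly the duplication this spec removes. The summary is cheap to derive, so it does not need to be persisted.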

The problem with storing this summary information in the `compute_nodes` table
is two-fold:

1) The summary information can get out of sync with the detailed information
   stored in the `pci_devices` table.
2) It interferes with our efforts to represent all resources in the system in a
   consistent and generic fashion (the resource-objects blueprint work).


Use Cases
---------

As a developer of Nova, I want to be able to represent all quantitative
resources in the system in a consistent and generic fashion. As an operator, I
do not want summary and detail information in my database to get out of sync.


Proposed change
===============

We propose the following changes to the Nova code base:

1) Temporarily duplicate the logic of the
   `nova.pci.stats.PciDevStats.support_requests()` method into the
   `nova.objects.PciDevicePoolList` object.
2) Move the logic for determining whether a compute node can provide a
   requested PCI device to an instance from the
   `nova.pci.stats.PciDevStats.consume_requests()` method to the
   `nova.pci.manager.PciDevTracker._claim_for_instance()` method.
3) Modify the `nova.objects.ComputeNode` object to load the `pci_device_pools`
   field on demand, via a query on the `pci_devices` table, instead of reading
   the `compute_nodes.pci_stats` field from the database. The
   `PciDevicePoolList` object can be generated using a single SQL query on the
   `pci_devices` table, like so::

     SELECT product_id, vendor_id, numa_node, COUNT(*) AS count
     FROM pci_devices
     WHERE compute_node_id = ?
     GROUP BY product_id, vendor_id, numa_node;

   This on-demand loading will only be used for legacy compute nodes that rely
   on the `nova.objects.ComputeNode.pci_device_pools` field.


4) Change the scheduler's host manager to load PCI device pool information
   using a new `nova.objects.PciDevicePoolList.get_all()` method. This method
   returns PCI device pool information for all compute nodes, and is used only
   when the PciPassthroughFilter is enabled. This matches how host aggregate
   information is loaded by the scheduler and collated to HostState objects.
   The SQL statement for retrieving the PCI device pool information for all
   compute nodes looks like this::

     SELECT compute_node_id, product_id, vendor_id, numa_node, COUNT(*) AS count
     FROM pci_devices
     GROUP BY compute_node_id, product_id, vendor_id, numa_node;

5) Change the `nova.scheduler.pci_passthrough_filter.host_passes` method to use
   the `nova.objects.PciDevicePoolList.supports_requests()` method instead of
   the `nova.pci.stats.PciDevStats.support_requests()` method.
6) Remove the `nova.pci.stats` module entirely.
7) Deprecate the `compute_nodes.pci_stats` field in the database and mark it
   for removal in the N release.

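
The two queries in steps 3 and 4 can be exercised against a toy schema. The sketch below uses Python's built-in `sqlite3` module and a pared-down `pci_devices` table. That table is an assumption: the real table has more columns and is accessed through Nova's SQLAlchemy-based DB layer rather than raw SQL. `get_node_pools()` and `get_all_pools()` are hypothetical helpers. They stand in for the on-demand `ComputeNode.pci_device_pools` load and the proposed `PciDevicePoolList.get_all()` method.

.. code-block:: python

   import sqlite3
   from collections import defaultdict
   from typing import Dict, List

   # Assumption: a pared-down pci_devices table; the real one has more columns.
   SCHEMA = """
   CREATE TABLE pci_devices (
       id INTEGER PRIMARY KEY,
       compute_node_id INTEGER NOT NULL,
       product_id TEXT NOT NULL,
       vendor_id TEXT NOT NULL,
       numa_node INTEGER
   )
   """

   # Per-node pools (step 3), the same statement as in the spec.
   NODE_POOLS_SQL = """
   SELECT product_id, vendor_id, numa_node, COUNT(*) AS count
   FROM pci_devices
   WHERE compute_node_id = ?
   GROUP BY product_id, vendor_id, numa_node
   """

   # Pools for every node in a single query (step 4).
   ALL_POOLS_SQL = """
   SELECT compute_node_id, product_id, vendor_id, numa_node, COUNT(*) AS count
   FROM pci_devices
   GROUP BY compute_node_id, product_id, vendor_id, numa_node
   """


   def _to_pool(row) -> Dict[str, object]:
       product_id, vendor_id, numa_node, count = row
       return {"product_id": product_id, "vendor_id": vendor_id,
               "numa_node": numa_node, "count": count}


   def get_node_pools(conn: sqlite3.Connection, node_id: int) -> List[Dict[str, object]]:
       """Generate one compute node's pools on demand (hypothetical helper)."""
       return [_to_pool(row) for row in conn.execute(NODE_POOLS_SQL, (node_id,))]


   def get_all_pools(conn: sqlite3.Connection) -> Dict[int, List[Dict[str, object]]]:
       """Generate pools for all nodes, collated by compute_node_id.

       A host manager could attach each list to the matching host state, much
       as aggregates are collated (hypothetical helper).
       """
       pools_by_node: Dict[int, List[Dict[str, object]]] = defaultdict(list)
       for node_id, *pool_cols in conn.execute(ALL_POOLS_SQL):
           pools_by_node[node_id].append(_to_pool(pool_cols))
       return dict(pools_by_node)


   conn = sqlite3.connect(":memory:")
   conn.execute(SCHEMA)
   rows = ([(1, "10ed", "8086", 0)] * 3 + [(1, "10ed", "8086", 1)] * 2
           + [(2, "1572", "8086", 0)] * 4)
   conn.executemany(
       "INSERT INTO pci_devices (compute_node_id, product_id, vendor_id, numa_node)"
       " VALUES (?, ?, ?, ?)",
       rows,
   )
   print(get_node_pools(conn, 1))
   print(get_all_pools(conn))

Neither query has an `ORDER BY` clause, so the order of pools within a node is unspecified. Code consuming the pools should not depend on it.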

Alternatives
------------

None.

Data model impact
-----------------

None; this changes only the implementation of existing model definitions.

REST API impact
---------------

None.

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

None.

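Performance Impact
------------------

None. The information sent over the wire to the scheduler remains the same:
summary pools, not individual device records. The database query that
generates the summary PCI device information is a single grouped aggregate over
the `pci_devices` table and should be very quick.
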
Other deployer impact
---------------------

None.

Developer impact
----------------

This will allow the resource-objects blueprint to proceed, since PCI device
resources can then be handled in the same way as NUMA or other quantitative
resources.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  dstepanenko

Other contributors:
  jaypipes

Work Items
----------

1) Duplicate the `PciDevStats.support_requests()` logic into a
   `supports_requests()` method on the `nova.objects.PciDevicePoolList` object.
2) Move the logic of `nova.pci.stats.PciDevStats.consume_requests()` to the
   `nova.pci.manager.PciDevTracker._claim_for_instance()` method.
3) Modify the `nova.objects.ComputeNode` object to load the `pci_device_pools`
   field on demand.
4) Change the scheduler host manager to load the `PciDevicePoolList` object for
   all compute nodes in the same way that host aggregate information is loaded,
   and only when the PciPassthroughFilter is enabled.
5) Change the `nova.scheduler.pci_passthrough_filter.host_passes` method to use
   the `nova.objects.PciDevicePoolList.supports_requests()` method.
6) Remove the `nova.pci.stats` module entirely.
7) Annotate the `nova.db.sqlalchemy.models.ComputeNode.pci_stats` field in the
   database as deprecated.

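
Work items 1 and 5 move one question onto `PciDevicePoolList`: can this host's pools satisfy the instance's PCI requests? The sketch below answers that question over plain pool dicts, as a rough illustration only. Several parts of it are assumptions: the request shape (a `count` plus a `spec` of exact-match attributes) and both function names. The check also ignores NUMA affinity and any matching rule beyond exact equality. It applies requests greedily, first fit, against a copy of the pools, so the caller's counts are never decremented. With overlapping specs, a greedy pass can reject a combination that a smarter assignment would accept.

.. code-block:: python

   import copy
   from typing import Dict, List

   Pool = Dict[str, object]
   Request = Dict[str, object]  # hypothetical: {"count": int, "spec": {attr: value}}


   def _matches(pool: Pool, spec: Dict[str, object]) -> bool:
       # A pool matches when every attribute named in the spec has the same value.
       return all(pool.get(key) == value for key, value in spec.items())


   def supports_requests(pools: List[Pool], requests: List[Request]) -> bool:
       """Return True if the pools can satisfy every request (simplified sketch)."""
       remaining = copy.deepcopy(pools)  # never mutate the caller's pools
       for request in requests:
           needed = request["count"]
           for pool in remaining:
               if needed == 0:
                   break
               if _matches(pool, request["spec"]):
                   taken = min(pool["count"], needed)
                   pool["count"] -= taken
                   needed -= taken
           if needed > 0:
               return False
       return True


   pools = [
       {"vendor_id": "8086", "product_id": "10ed", "numa_node": 0, "count": 3},
       {"vendor_id": "8086", "product_id": "10ed", "numa_node": 1, "count": 2},
   ]
   vf_spec = {"vendor_id": "8086", "product_id": "10ed"}
   print(supports_requests(pools, [{"count": 4, "spec": vf_spec}]))  # True: 3 + 2 match
   print(supports_requests(pools, [{"count": 6, "spec": vf_spec}]))  # False: only 5 exist

Because the check only reads pool counts, it can run against pools generated on demand from `pci_devices`. No stored summary needs to be decremented and kept in sync.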
Dependencies
============

None.

Testing
=======

There should be a net reduction in unit tests, since the logic for decrementing
PCI device pool counts will be removed entirely.

Documentation Impact
====================

None. No user-facing changes.

References
==========

None.

History
=======

None.