Moves the inventory/capacity-related fields for the compute node into the inventories table introduced in the `resource-providers` blueprint. Change-Id: Ic0b1fc492133ae835765abac90b85c6ccd6deeee Blueprint: compute-node-inventory-newton
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===========================================
Resource Providers - Compute Node Inventory
===========================================

https://blueprints.launchpad.net/nova/+spec/compute-node-inventory-newton

As we move towards a system for generic tracking of all quantitative resources
in the system using the resource providers modeling system, we need to
transition the object model and database schema for a compute node to store
inventory information in the resource provider `inventories` table instead of
the `compute_nodes` table. This spec outlines the part of this transition
process that deals with capacity of resources on a compute node -- the
inventory records.


Problem description
===================

Long-term, we would like to be able to add new types of resources (see the
`resource-classes` blueprint) to the system and do so without requiring
invasive database schema changes. In order to move to this more generic
modeling of quantitative resources and capacity records (see
`resource-providers` blueprint) we must transition the storage of inventory
information from where that information currently resides to the new
`inventories` table in the resource providers modeling system.

Use Cases
---------

As a deployer, I wish to add new classes of resources to my system and do so
without any downtime caused by database schema migrations.

Proposed change
===============

The two major components of this spec are the alignment of the underlying
database schema and the changes needed to the `nova.objects.ComputeNode` object
model to read and write inventory/capacity information from the `inventories`
table instead of the `compute_nodes` table.


Alignment of database schema
----------------------------

To align the underlying database storage for inventory records, we propose to
move the resource usage and capacity fields from their current locations in the
database to the new `inventories` table added in the `resource-providers`
blueprint.

Currently, the Nova database stores inventory records for the following
resource classes:

* vCPUs:

  * `compute_nodes.vcpus`: Total physical CPU cores on the compute node
  * `compute_nodes.vcpus_used`: Number of vCPUs allocated to virtual machines
    running on that compute node
  * `compute_nodes.cpu_allocation_ratio`: Overcommit ratio for vCPU on the
    compute node

* RAM:

  * `compute_nodes.memory_mb`: Total amount of physical RAM in MB on the
    compute node
  * `compute_nodes.memory_mb_used`: Amount of RAM allocated to virtual machines
    running on that compute node
  * `compute_nodes.ram_allocation_ratio`: Overcommit ratio for memory on the
    compute node
  * `compute_nodes.free_ram_mb`: A calculated field that can go away since its
    value can be determined by looking at used versus capacity values

* Disk:

  * `compute_nodes.local_gb`: Amount of disk storage available to the compute
    node for storing virtual machine ephemeral disks. While this is denoted
    "local" disk storage, currently if the local storage for ephemeral disks is
    shared storage, the compute node has no idea that this storage is shared
    among other compute nodes. See the `generic-resource-pools` and
    `resource-providers` blueprints for the solution to this problem
  * `compute_nodes.local_gb_used`: Amount of disk storage allocated for
    ephemeral disks of virtual machines running on the compute node. The same
    problem with shared storage for ephemeral disks applies to this field as
    well
  * `compute_nodes.free_disk_gb`: A calculated field that can go away since its
    value can be determined by looking at used versus capacity values
  * `disk_available_least`: A field that stores the sum of *actual* used disk
    amounts on the local compute node. This information can be stored in the new
    `max_unit` field of the `inventories` table for the `DISK_GB` resource class

* PCI devices:

  * `pci_stats`: Stores summary information about device "pools" (per
    product_id and vendor_id combination). This information is made redundant
    by the `pci-generate-stats` blueprint, which generates a summary view of
    pool information for PCI devices from the main record table, `pci_devices`
  * The `pci_devices` table stores all the individual PCI device records,
    including the status of the device and which instance (if any) the device
    has been assigned to.

* NUMA topologies:

  * `compute_nodes.numa_topology`: Serialized `nova.objects.numa.NUMATopology`
    object that represents both the compute node's NUMA topology **and the
    assigned NUMA topologies for instances on the compute node**.


To recap from the `resource-providers` blueprint, the schema of the
`inventories` table in the database looks like this::

    CREATE TABLE inventories (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        resource_provider_id INT UNSIGNED NOT NULL,
        resource_class_id INT UNSIGNED NOT NULL,
        total INT UNSIGNED NOT NULL,
        min_unit INT UNSIGNED NOT NULL,
        max_unit INT UNSIGNED NOT NULL,
        step_size INT UNSIGNED NOT NULL,
        allocation_ratio FLOAT NOT NULL,
        INDEX (resource_provider_id),
        INDEX (resource_class_id)
    );

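
As a rough illustration of how the columns above might be interpreted, the
following sketch derives capacity and free amounts from a single inventory
record, which is why calculated fields such as `free_ram_mb` no longer need to
be stored. The `Inventory` class and the `capacity()`, `free()` and
`request_allowed()` helpers are hypothetical names, not part of Nova. The
formula `total * allocation_ratio` and the treatment of `step_size` as a
required multiple are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inventory:
    """Hypothetical in-memory mirror of one row of the inventories table."""
    total: int
    min_unit: int
    max_unit: int
    step_size: int
    allocation_ratio: float


def capacity(inv):
    # Assumption: the overcommit ratio scales the physical total.
    return int(inv.total * inv.allocation_ratio)


def free(inv, used):
    # Derived on demand instead of being stored (compare free_ram_mb).
    return capacity(inv) - used


def request_allowed(inv, amount, used):
    # A single consumer may only take between min_unit and max_unit.
    if amount < inv.min_unit or amount > inv.max_unit:
        return False
    # Assumption: requests must be a multiple of step_size.
    if amount % inv.step_size != 0:
        return False
    return amount <= free(inv, used)
```

For instance, a RAM record with a total of 32768 MB and an allocation ratio of
1.5 would yield a capacity of 49152 MB under these assumptions.
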

We propose to consolidate all of the inventory/capacity fields from the above
locations into the new `inventories` table in the following manner:

Remember that all compute nodes are resource providers, but not all resource
providers are compute nodes. There is no globally-unique identifier for a
compute node within the OpenStack deployment, and we need a globally-unique
identifier for the resource provider.

1) (COMPLETED IN MITAKA) We must first add a new `uuid` field to the
   `compute_nodes` table::

       ALTER TABLE compute_nodes ADD COLUMN uuid VARCHAR(36) NULL;

.. note::

    The `uuid` field must be NULL at first, since we will not be generating
    values in a schema migration script. UUIDs are instead generated on demand,
    as described below, whenever a compute node without a UUID is read from the
    database.

Because we do not want to do any data migrations in SQL migration scripts, we
need to do the following data migrations in the `nova.objects.ComputeNode`
object. We propose a method called `_migrate_inventory()` that handles the data
migration steps and is called from `_from_db_object()` when certain conditions
are found to be in place (for instance, the compute node doesn't have a UUID
field value). The `_migrate_inventory()` method should use a single database
transaction to ensure all DB writes are done atomically, and it should first
check that all API and conductor nodes have been upgraded to a version that can
support the migration.

2) (COMPLETED IN MITAKA) Compute nodes that have no `uuid` field set should
   have a new random UUID generated on-demand.

3) A record must be added to the `resource_providers` table for each compute
   node::

       INSERT INTO resource_providers (uuid)
       SELECT uuid FROM compute_nodes;


4) We need to create the inventory records for each compute node. For each of
   the resource classes that the compute node provides, we need to store the
   capacity, min and max unit values, and allocation ratios.

   4a) For the vCPU resource class, we would do the following steps for each
   compute node. Grab the resource class identifier for CPU from the
   `resource_classes` table (see `resource-classes` blueprint).

   Insert into the `inventories` table a record for the CPU resource class
   with the total, min, max, and allocation ratio. For example::

       INSERT INTO inventories (
           resource_provider_id,
           resource_class_id,
           total,
           min_unit,
           max_unit,
           allocation_ratio
       )
       SELECT
           rp.id,
           $CPU_RESOURCE_CLASS_ID,
           cn.vcpus,
           1,
           cn.vcpus,
           cn.cpu_allocation_ratio
       FROM compute_nodes AS cn
       JOIN resource_providers rp
       ON cn.uuid = rp.uuid
       WHERE cn.id = $COMPUTE_NODE_ID;

   4b) Do the same for the RAM and DISK resource classes. For the DISK resource
   class, do not perform the INSERT if the compute node uses shared storage
   for the ephemeral disks.


   4c) For the PCI device resource classes (`PCI_GENERIC`, `PCI_SRIOV_PF` and
   `PCI_SRIOV_VF`), the inventories table records represent the class of
   resources as a whole, not, for example, individual VFs on an SR-IOV-enabled
   NIC PF. As such, a single record representing the total amount of each PCI
   resource class would be added to the inventories table for each compute
   node that has PCI devices.

   For example, let us assume that a compute node has one SR-IOV-enabled NIC,
   supporting 255 virtual functions (VFs) and not exposing the physical
   function (PF) for use by a cloud user. We want to limit the number of VFs
   that any single instance can consume to 8.

   We would insert the following into the inventories table::

       INSERT INTO inventories (
           resource_provider_id,
           resource_class_id,
           total,
           min_unit,
           max_unit,
           allocation_ratio
       )
       SELECT
           rp.id,
           $PCI_SRIOV_VF_RESOURCE_CLASS_ID,
           255,
           1,
           8,
           1.0
       FROM compute_nodes AS cn
       JOIN resource_providers rp
       ON cn.uuid = rp.uuid
       WHERE cn.id = $COMPUTE_NODE_ID;

   4d) For the NUMA resource classes (`NUMA_SOCKETS`, `NUMA_CORES`,
   `NUMA_THREADS` and `NUMA_MEMORY`), create an inventory record for each
   compute node that exposes NUMA topology resources.

   For example, let us assume we have a compute node that exposes 2 NUMA nodes
   (cells), each with 4 cores and 8 threads. We would set the min_unit and
   max_unit values of the inventory records to the single-NUMA-cell
   constraints and the total value to the combined amount of the resource. So,
   for instance, for `NUMA_CORES`, we'd set total to 8 (2 cells having 4 cores
   each), min_unit to 1, and max_unit to 4 (since each cell has 4 cores).

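
The NUMA arithmetic in step 4d can be sketched as a small helper. This is a
minimal illustration only: `numa_inventory()` is a hypothetical name, and using
the largest cell as `max_unit` when cells are unevenly sized is an assumption
that goes beyond the symmetric example given above.

```python
def numa_inventory(per_cell_amounts):
    """Build inventory values for one NUMA resource class.

    per_cell_amounts holds one entry per NUMA cell, for example the number
    of cores in each cell.
    """
    if not per_cell_amounts:
        raise ValueError("at least one NUMA cell is required")
    return {
        # Combined amount of the resource across all cells.
        "total": sum(per_cell_amounts),
        "min_unit": 1,
        # Single-cell constraint; for uneven cells the largest cell is
        # used here (an assumption, not taken from the spec).
        "max_unit": max(per_cell_amounts),
    }
```

With two cells of 4 cores each, `numa_inventory([4, 4])` gives a total of 8, a
`min_unit` of 1 and a `max_unit` of 4, matching the `NUMA_CORES` example.
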

.. note::

    In the release following the one in which this code merges, a follow-up
    patch will make the UUID column non-nullable and add a unique constraint
    on the compute_nodes.uuid column.

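
Steps 3 and 4a above can be tried end to end with a small, self-contained
sketch. It uses an in-memory SQLite database rather than the MySQL-style schema
shown earlier, so the column types are translated, and the `compute_nodes`
table is reduced to the columns needed here. The `migrate_vcpu_inventory()`
function, the `SCHEMA` string and the resource class id of 0 are hypothetical.
The sketch assumes the compute node already has a UUID (step 2), and it supplies
a `step_size` of 1 because the schema declares that column NOT NULL.

```python
import sqlite3

# Simplified SQLite translation of the tables involved (an assumption).
SCHEMA = """
CREATE TABLE compute_nodes (
    id INTEGER PRIMARY KEY,
    uuid TEXT NULL,
    vcpus INTEGER NOT NULL,
    cpu_allocation_ratio REAL NOT NULL
);
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    uuid TEXT NOT NULL UNIQUE
);
CREATE TABLE inventories (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    resource_provider_id INTEGER NOT NULL,
    resource_class_id INTEGER NOT NULL,
    total INTEGER NOT NULL,
    min_unit INTEGER NOT NULL,
    max_unit INTEGER NOT NULL,
    step_size INTEGER NOT NULL,
    allocation_ratio REAL NOT NULL
);
"""

CPU_RESOURCE_CLASS_ID = 0  # placeholder value for illustration


def migrate_vcpu_inventory(conn, compute_node_id):
    """Create the provider and vCPU inventory rows for one compute node."""
    # The connection context manager commits on success and rolls back on
    # error, so both writes happen in a single transaction.
    with conn:
        # Step 3: register the compute node as a resource provider.
        conn.execute(
            "INSERT INTO resource_providers (uuid) "
            "SELECT uuid FROM compute_nodes WHERE id = ?",
            (compute_node_id,),
        )
        # Step 4a: copy the vCPU capacity into the inventories table.
        conn.execute(
            "INSERT INTO inventories ("
            "    resource_provider_id, resource_class_id, total,"
            "    min_unit, max_unit, step_size, allocation_ratio) "
            "SELECT rp.id, ?, cn.vcpus, 1, cn.vcpus, 1,"
            "       cn.cpu_allocation_ratio "
            "FROM compute_nodes AS cn "
            "JOIN resource_providers rp ON cn.uuid = rp.uuid "
            "WHERE cn.id = ?",
            (CPU_RESOURCE_CLASS_ID, compute_node_id),
        )
```

Running both statements inside one transaction mirrors the requirement that
`_migrate_inventory()` perform all of its DB writes atomically.
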

Changes to `ComputeNode` object model
-------------------------------------

In order to ease the transition from the old-style mechanism for determining
inventory/capacity information, we propose modifying the
`nova.objects.ComputeNode` object in the following ways:

1) Make the existing `vcpus`, `memory_mb`, `local_gb`, `cpu_allocation_ratio`,
   `ram_allocation_ratio`, and `disk_allocation_ratio` fields be read using a
   single query against the `inventories` table, populating the values of the
   object fields so that the user is none the wiser that the storage mechanism
   has changed behind the scenes. A single SQL query may be used to grab the
   above fields::

       SELECT
           i.resource_class_id,
           i.total,
           i.min_unit,
           i.max_unit,
           i.allocation_ratio
       FROM inventories i
       JOIN resource_providers rp
       ON i.resource_provider_id = rp.id
       WHERE rp.uuid = $COMPUTE_NODE_UUID;

2) The only piece of code that *writes* changes to the `vcpus`, `memory_mb`,
   `local_gb`, `cpu_allocation_ratio`, and `ram_allocation_ratio` fields of the
   `ComputeNode` is in the resource tracker, which sets the field values and
   calls `save()` on the `ComputeNode` object. We can modify the `save()`
   method to write any changes to inventory/capacity information to the new
   `inventories` table instead of the `compute_nodes` table.

.. note::

    The object should be changed to save capacity information to the
    inventory table, but **only** if all conductor and API nodes have been
    upgraded to a version that supports the new inventory schema.

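
The two object model changes above amount to a translation between inventory
rows and the existing `ComputeNode` field names. The sketch below shows one way
that translation might look. The resource class ids, the `FIELD_MAP` table and
both function names are hypothetical, and the row layout follows the column
order of the SELECT query in item 1.

```python
# Hypothetical resource class identifiers, for illustration only.
VCPU, MEMORY_MB, DISK_GB = 0, 1, 2

# Resource class id -> (capacity field, allocation ratio field).
FIELD_MAP = {
    VCPU: ("vcpus", "cpu_allocation_ratio"),
    MEMORY_MB: ("memory_mb", "ram_allocation_ratio"),
    DISK_GB: ("local_gb", "disk_allocation_ratio"),
}


def fields_from_inventory_rows(rows):
    """Turn query rows into ComputeNode-style field values (read path).

    Each row is (resource_class_id, total, min_unit, max_unit,
    allocation_ratio).
    """
    fields = {}
    for rc_id, total, _min_unit, _max_unit, ratio in rows:
        names = FIELD_MAP.get(rc_id)
        if names is None:
            continue  # e.g. PCI or NUMA classes with no legacy field
        total_field, ratio_field = names
        fields[total_field] = total
        fields[ratio_field] = ratio
    return fields


def inventory_changes(changed_fields):
    """Group changed ComputeNode fields by resource class (write path)."""
    changes = {}
    for rc_id, (total_field, ratio_field) in FIELD_MAP.items():
        update = {}
        if total_field in changed_fields:
            update["total"] = changed_fields[total_field]
        if ratio_field in changed_fields:
            update["allocation_ratio"] = changed_fields[ratio_field]
        if update:
            changes[rc_id] = update
    return changes
```

Keeping the mapping in a single table means the read and write paths cannot
drift apart, which is one possible design choice rather than a requirement of
this spec.
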

Alternatives
------------

This is step 3 in an irreversible process that completely changes the way that
quantitative things are tracked and claimed in Nova.

Data model impact
-----------------

No other database schema changes will be required by this blueprint. The work
in this blueprint only populates the `inventories` table that is created in the
`resource-providers` blueprint.

REST API impact
---------------

None.

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

None.

Performance Impact
------------------

None.

Other deployer impact
---------------------

There will be a database schema migration needed that adds the `uuid` column to
the `compute_nodes` table.

Developer impact
----------------

None.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  jaypipes

Other contributors:
  cdent
  dansmith

Work Items
----------

The following distinct tasks are involved in this spec's implementation:

* Create the database schema migration that adds the `uuid` column to the
  `compute_nodes` table
* Modify `nova.objects.ComputeNode.create()` to populate the `uuid` attribute
  of the compute node, insert a record into the `resource_providers` table and
  add any inventory/capacity fields to the `inventories` table
* Add a `nova.objects.ComputeNode._migrate_inventory()` method to migrate the
  inventory/capacity fields from `compute_nodes` to `inventories` and populate
  the `uuid` column value if it is None, as it would be if an older
  `nova-compute` daemon sent a serialized `ComputeNode` object model to an
  updated conductor. The `_migrate_inventory()` method should also create a
  record in the `resource_providers` table for the compute node
* Modify the `nova.objects.ComputeNode` model to read inventory/capacity
  information from the `inventories` table instead of the `compute_nodes` table
* Modify the `nova.objects.ComputeNode` model to store **changed** inventory
  information (total amount, min and max unit constraints, and allocation
  ratio) to the `inventories` table instead of the `compute_nodes` table

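
The conditions under which the on-demand migration runs can be sketched as
follows. This is not the Nova implementation: `from_db_object()` stands in for
`ComputeNode._from_db_object()`, and the `services_upgraded` flag, the
`has_inventory` predicate and the `migrate_inventory` callback are hypothetical
parameters introduced only to keep the example self-contained.

```python
import uuid


def from_db_object(db_row, services_upgraded, has_inventory, migrate_inventory):
    """Hydrate a compute node dict, migrating its inventory if needed."""
    node = dict(db_row)

    # An older nova-compute may hand us a node without a UUID, so one is
    # generated on demand rather than in a schema migration script.
    if node.get("uuid") is None:
        node["uuid"] = str(uuid.uuid4())

    # Only migrate once every API and conductor node understands the new
    # schema, and only if no inventory has been recorded yet.
    if services_upgraded and not has_inventory(node["uuid"]):
        migrate_inventory(node)

    return node
```

Passing the checks in as callables keeps the sketch free of any database or
service-version code, which the real object would have to provide.
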

Dependencies
============

* `resource-classes` blueprint implemented
* `resource-providers` blueprint implemented

Testing
=======

Full unit, functional, and integration testing of the
`ComputeNode._migrate_inventory()` method that performs the data migration
itself.

Documentation Impact
====================

Developer reference documentation only. No user-facing impact is expected from
this spec's implementation.

References
==========

* `resource-classes` blueprint: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/mitaka/approved/resource-classes.rst
* `resource-providers` blueprint: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/mitaka/approved/resource-providers.rst

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Newton
     - Introduced