..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===========================================
Resource Providers - Compute Node Inventory
===========================================

https://blueprints.launchpad.net/nova/+spec/compute-node-inventory-newton

As we move towards a system for generic tracking of all quantitative resources
in the system using the resource providers modeling system, we need to
transition the object model and database schema for a compute node to store
inventory information in the resource provider `inventories` table instead of
the `compute_nodes` table. This spec outlines the part of this transition
process that deals with capacity of resources on a compute node -- the
inventory records.

Problem description
===================

Long-term, we would like to be able to add new types of resources (see the
`resource-classes` blueprint) to the system and do so without requiring
invasive database schema changes. In order to move to this more generic
modeling of quantitative resources and capacity records (see
`resource-providers` blueprint) we must transition the storage of inventory
information from where that information currently resides to the new
`inventories` table in the resource providers modeling system.

Use Cases
---------

As a deployer, I wish to add new classes of resources to my system and do so
without any downtime caused by database schema migrations.

Proposed change
===============

The two major components of this spec are the alignment of the underlying
database schema and the changes needed to the `nova.objects.ComputeNode` object
model to read and write inventory/capacity information from the `inventories`
table instead of the `compute_nodes` table.

Alignment of database schema
----------------------------

To align the underlying database storage for inventory records, we propose to
move the resource usage and capacity fields from their current locations in the
database to the new `inventories` table added in the `resource-providers`
blueprint.

Currently, the Nova database stores inventory records for the following
resource classes:

* vCPUs:

  * `compute_nodes.vcpus`: Total physical CPU cores on the compute node
  * `compute_nodes.vcpus_used`: Number of vCPUs allocated to virtual machines
    running on that compute node
  * `compute_nodes.cpu_allocation_ratio`: Overcommit ratio for vCPU on the
    compute node

* RAM:

  * `compute_nodes.memory_mb`: Total amount of physical RAM in MB on the
    compute node
  * `compute_nodes.memory_mb_used`: Amount of RAM allocated to virtual machines
    running on that compute node
  * `compute_nodes.ram_allocation_ratio`: Overcommit ratio for memory on the
    compute node
  * `compute_nodes.free_ram_mb`: A calculated field that can go away since its
    value can be determined by looking at used versus capacity values

* Disk:

  * `compute_nodes.local_gb`: Amount of disk storage available to the compute
    node for storing virtual machine ephemeral disks. Although this is denoted
    "local" disk storage, if the storage for ephemeral disks is actually
    shared storage, the compute node currently has no idea that this storage
    is shared among other compute nodes. See the `generic-resource-pools` and
    `resource-providers` blueprints for the solution to this problem
  * `compute_nodes.local_gb_used`: Amount of disk storage allocated for
    ephemeral disks of virtual machines running on the compute node. The same
    problem with shared storage for ephemeral disks applies to this field as
    well
  * `compute_nodes.free_disk_gb`: A calculated field that can go away since its
    value can be determined by looking at used versus capacity values
  * `disk_available_least`: A field that stores the sum of *actual* used disk
    amounts on the local compute node. This information can be stored in the
    new `max_unit` field of the `inventories` table for the `DISK_GB` resource
    class

* PCI devices:

  * `pci_stats`: Stores summary information about device "pools" (per
    product_id and vendor_id combination). This information is made redundant
    by the `pci-generate-stats` blueprint, which generates a summary view of
    pool information for PCI devices from the main record table, `pci_devices`
  * The `pci_devices` table stores all the individual PCI device records,
    including the status of the device and which instance (if any) the device
    has been assigned to

* NUMA topologies:

  * `compute_nodes.numa_topology`: Serialized `nova.objects.numa.NUMATopology`
    object that represents both the compute node's NUMA topology **and the
    assigned NUMA topologies for instances on the compute node**.
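
The calculated fields in the list above (`free_ram_mb` and `free_disk_gb`)
can be dropped because they follow from the total and used values. A minimal
sketch of that derivation, assuming plain subtraction that ignores the
allocation ratio; the helper name `free_amount` and the sample numbers are
hypothetical and not part of Nova:

```python
def free_amount(total, used):
    """Return the unallocated amount of a resource (hypothetical helper)."""
    return total - used


# Sample values only: 32 GB of RAM with 8 GB used, 500 GB disk with 120 GB used.
free_ram_mb = free_amount(total=32768, used=8192)
free_disk_gb = free_amount(total=500, used=120)
```
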

To recap from the `resource-providers` blueprint, the schema of the
`inventories` table in the database looks like this::

    CREATE TABLE inventories (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        resource_provider_id INT UNSIGNED NOT NULL,
        resource_class_id INT UNSIGNED NOT NULL,
        total INT UNSIGNED NOT NULL,
        min_unit INT UNSIGNED NOT NULL,
        max_unit INT UNSIGNED NOT NULL,
        step_size INT UNSIGNED NOT NULL,
        allocation_ratio FLOAT NOT NULL,
        INDEX (resource_provider_id),
        INDEX (resource_class_id)
    );
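
To make the role of the columns concrete, the following is a rough sketch of
how one inventory record could constrain a request. The semantics are an
assumption based on the column names (the overcommit ratio scales `total`, and
a request must fall within `min_unit`/`max_unit` and be a multiple of
`step_size`); the `Inventory` class and its methods are hypothetical and are
not Nova code:

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """In-memory mirror of one `inventories` row (illustrative only)."""
    total: int
    min_unit: int
    max_unit: int
    step_size: int
    allocation_ratio: float

    def capacity(self) -> int:
        # Assumed semantics: the overcommit ratio scales the physical total.
        return int(self.total * self.allocation_ratio)

    def can_request(self, amount: int, used: int = 0) -> bool:
        # Assumed semantics: a single request must lie within
        # [min_unit, max_unit], be a multiple of step_size, and fit
        # into the capacity that is still unused.
        if amount < self.min_unit or amount > self.max_unit:
            return False
        if amount % self.step_size != 0:
            return False
        return used + amount <= self.capacity()


# Sample record: 8 physical cores with a 16x overcommit ratio.
vcpu = Inventory(total=8, min_unit=1, max_unit=8, step_size=1,
                 allocation_ratio=16.0)
```

With these sample values `vcpu.capacity()` is 128, so a request for 4 vCPUs
fits while 124 are in use, but a request for 9 is rejected because it exceeds
`max_unit`.
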

We propose to consolidate all of the inventory/capacity fields from the above
locations into the new `inventories` table in the following manner:

Remember that all compute nodes are resource providers, but not all resource
providers are compute nodes. There is no globally-unique identifier for a
compute node within the OpenStack deployment, and we need a globally-unique
identifier for the resource provider.

1) (COMPLETED IN MITAKA) We must first add a new `uuid` field to the
   `compute_nodes` table::

    ALTER TABLE compute_nodes ADD COLUMN uuid VARCHAR(36) NULL;

.. note::

    The `uuid` field must be NULL at first, since we will not be generating
    values in a schema migration script. See below for where we generate UUIDs
    for each compute node on-demand as each compute node without a UUID
    specified is read from the database.

Because we do not want to do any data migrations in SQL migration scripts, we
need to do the following data migrations in the `nova.objects.ComputeNode`
object. We propose having a method called `_migrate_inventory()` that handles
the data migration steps and is called from `_from_db_object()` when certain
conditions are found to be in place (for instance, the compute node doesn't
have a UUID field value). The `_migrate_inventory()` method should use a single
database transaction to ensure all DB writes are done atomically, and it should
first check that all API and conductor nodes have been upgraded to a version
that can support the migration.

2) (COMPLETED IN MITAKA) Compute nodes that have no `uuid` field set should
   have a new random UUID generated on-demand.

3) A record must be added to the `resource_providers` table for each compute
   node::

    INSERT INTO resource_providers (uuid)
    SELECT uuid FROM compute_nodes;

4) We need to create the inventory records for each compute node. For each of
   the resource classes that the compute node provides, we need to store the
   capacity, min and max unit values, and allocation ratios.

   4a) For the vCPU resource class, we would do the following steps for each
   compute node. Grab the resource class identifier for CPU from the
   `resource_classes` table (see `resource-classes` blueprint).

   Insert into the `inventories` table a record for the CPU resource class
   with the total, min, max, and allocation ratio. For example::

       INSERT INTO inventories (
           resource_provider_id,
           resource_class_id,
           total,
           min_unit,
           max_unit,
           allocation_ratio
       )
       SELECT
           rp.id,
           $CPU_RESOURCE_CLASS_ID,
           cn.vcpus,
           1,
           cn.vcpus,
           cn.cpu_allocation_ratio
       FROM compute_nodes AS cn
       JOIN resource_providers rp
       ON cn.uuid = rp.uuid
       WHERE cn.id = $COMPUTE_NODE_ID;

   4b) Do the same for the RAM and DISK resource classes. For the DISK resource
   class, do not perform the INSERT if the compute node uses shared storage
   for the ephemeral disks.

   4c) For the PCI device resource classes (`PCI_GENERIC`, `PCI_SRIOV_PF` and
   `PCI_SRIOV_VF`), the inventories table records represent the class of
   resources as a whole, not, for example, individual VFs on an SR-IOV-enabled
   NIC PF. As such, a single record representing the total amount of each PCI
   resource class would be added to the inventories table for each compute
   node that has PCI devices.

   For example, let us assume that a compute node has one SR-IOV-enabled NIC,
   supporting 255 virtual functions (VFs) and not exposing the physical
   function (PF) for use by a cloud user. We want to limit the number of VFs
   that any single instance can consume to 8.

   We would insert the following into the inventories table::

       INSERT INTO inventories (
           resource_provider_id,
           resource_class_id,
           total,
           min_unit,
           max_unit,
           allocation_ratio
       )
       SELECT
           rp.id,
           $PCI_SRIOV_VF_RESOURCE_CLASS_ID,
           255,
           1,
           8,
           1.0
       FROM compute_nodes AS cn
       JOIN resource_providers rp
       ON cn.uuid = rp.uuid
       WHERE cn.id = $COMPUTE_NODE_ID;

   4d) For the NUMA resource classes (`NUMA_SOCKETS`, `NUMA_CORES`,
   `NUMA_THREADS` and `NUMA_MEMORY`), create an inventory record for each
   compute node that exposes NUMA topology resources.

   For example, let us assume we have a compute node that exposes 2 NUMA nodes
   (cells), each with 4 cores and 8 threads. We would set the min_unit and
   max_unit values of the inventory records to the single-NUMA-cell
   constraints and the total value to the combined number of the resource. So,
   for instance, for the `NUMA_CORES`, we'd set total to 8 (2 sockets having 4
   cores each), min_unit to 1, and max_unit to 4 (since each cell has 4 cores).
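
The NUMA arithmetic in step 4d can be sketched as follows. This is only an
illustration: the helper `numa_inventory` is hypothetical, it assumes every
cell exposes the same amount of the resource, and it reads "8 threads" in the
example as 8 threads per cell:

```python
def numa_inventory(cells, per_cell):
    """Return (total, min_unit, max_unit) for a per-cell NUMA resource.

    Hypothetical helper; assumes all cells are identical.
    """
    total = cells * per_cell   # combined amount across all cells
    min_unit = 1               # smallest amount a request may ask for
    max_unit = per_cell        # single-cell constraint on one request
    return total, min_unit, max_unit


# The example from the text: 2 cells, each with 4 cores and 8 threads.
cores = numa_inventory(cells=2, per_cell=4)
threads = numa_inventory(cells=2, per_cell=8)
```
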

.. note::

    In the release following the one in which this code merges, we will do a
    follow-up patch that makes the UUID column non-nullable and adds a unique
    constraint on the compute_nodes.uuid column.
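
Steps 2, 3 and 4a can be tried end to end with a small in-memory sketch. It is
not the proposed `_migrate_inventory()` implementation: it uses SQLite in place
of the real database, trimmed-down tables, a placeholder resource class ID,
and an assumed `step_size` of 1 (the example INSERT statements in this spec do
not set that column). The function name `migrate_inventory` is hypothetical:

```python
import sqlite3
import uuid

VCPU_CLASS_ID = 0  # placeholder; the real ID comes from `resource_classes`

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compute_nodes (
        id INTEGER PRIMARY KEY, uuid TEXT NULL,
        vcpus INTEGER, cpu_allocation_ratio REAL);
    CREATE TABLE resource_providers (
        id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE);
    CREATE TABLE inventories (
        id INTEGER PRIMARY KEY,
        resource_provider_id INTEGER NOT NULL,
        resource_class_id INTEGER NOT NULL,
        total INTEGER NOT NULL, min_unit INTEGER NOT NULL,
        max_unit INTEGER NOT NULL, step_size INTEGER NOT NULL,
        allocation_ratio REAL NOT NULL);
    INSERT INTO compute_nodes (id, uuid, vcpus, cpu_allocation_ratio)
        VALUES (1, NULL, 8, 16.0);
""")


def migrate_inventory(conn, compute_node_id):
    """Migrate one compute node's VCPU inventory in a single transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        row = conn.execute(
            "SELECT uuid FROM compute_nodes WHERE id = ?",
            (compute_node_id,)).fetchone()
        node_uuid = row[0]
        if node_uuid is None:  # step 2: generate the UUID on demand
            node_uuid = str(uuid.uuid4())
            conn.execute("UPDATE compute_nodes SET uuid = ? WHERE id = ?",
                         (node_uuid, compute_node_id))
        # Step 3: one resource provider record per compute node.
        conn.execute(
            "INSERT OR IGNORE INTO resource_providers (uuid) VALUES (?)",
            (node_uuid,))
        # Step 4a: VCPU inventory record; step_size of 1 is an assumption.
        conn.execute("""
            INSERT INTO inventories (resource_provider_id,
                resource_class_id, total, min_unit, max_unit,
                step_size, allocation_ratio)
            SELECT rp.id, ?, cn.vcpus, 1, cn.vcpus, 1,
                   cn.cpu_allocation_ratio
            FROM compute_nodes AS cn
            JOIN resource_providers rp ON cn.uuid = rp.uuid
            WHERE cn.id = ?""", (VCPU_CLASS_ID, compute_node_id))


migrate_inventory(conn, 1)
```

The sketch does not guard against being run twice for the same node, and it
omits the check that all API and conductor nodes have been upgraded; a real
implementation would need both.
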

Changes to `ComputeNode` object model
-------------------------------------

In order to ease the transition from the old-style mechanism for determining
inventory/capacity information, we propose modifying the
`nova.objects.ComputeNode` object in the following ways:

1) Make the existing `vcpus`, `memory_mb`, `local_gb`, `cpu_allocation_ratio`,
   `ram_allocation_ratio`, and `disk_allocation_ratio` fields be read using a
   single query against the `inventories` table and populate the values of the
   object fields so that the user is none the wiser that the storage mechanism
   has changed behind the scenes. A single SQL query may be used to grab the
   above fields::

    SELECT
        i.resource_class_id,
        i.total,
        i.min_unit,
        i.max_unit,
        i.allocation_ratio
    FROM inventories i
    JOIN resource_providers rp
    ON i.resource_provider_id = rp.id
    WHERE rp.uuid = $COMPUTE_NODE_UUID;

2) The only piece of code that *writes* changes to the `vcpus`, `memory_mb`,
   `local_gb`, `cpu_allocation_ratio`, and `ram_allocation_ratio` fields of the
   `ComputeNode` is in the resource tracker, which sets the field values and
   calls `save()` on the `ComputeNode` object. We can modify the `save()`
   method to write any changes to inventory/capacity information to the new
   `inventories` table instead of the `compute_nodes` table.

.. note::

    The object should be changed to only save capacity information to the
    inventory table, but **only** if all conductor and API nodes have been
    upgraded to a version that supports the new inventory schema.
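
The read path in item 1 amounts to translating inventory rows back into the
existing object fields. A minimal sketch of that translation, assuming rows in
the column order of the SELECT statement in item 1; the class ID constants,
`FIELD_MAP` and `fields_from_inventories` are hypothetical names, and the
sample rows are made up:

```python
# Hypothetical resource class IDs; real values come from `resource_classes`.
VCPU, MEMORY_MB, DISK_GB = 0, 1, 2

# Map each class ID to the (total, ratio) field names on ComputeNode.
FIELD_MAP = {
    VCPU: ("vcpus", "cpu_allocation_ratio"),
    MEMORY_MB: ("memory_mb", "ram_allocation_ratio"),
    DISK_GB: ("local_gb", "disk_allocation_ratio"),
}


def fields_from_inventories(rows):
    """Translate inventory rows into legacy ComputeNode field values.

    Each row is (resource_class_id, total, min_unit, max_unit,
    allocation_ratio).
    """
    fields = {}
    for class_id, total, _min_unit, _max_unit, ratio in rows:
        if class_id not in FIELD_MAP:
            continue  # classes without a legacy field are skipped here
        total_field, ratio_field = FIELD_MAP[class_id]
        fields[total_field] = total
        fields[ratio_field] = ratio
    return fields


# Made-up sample rows for one compute node.
rows = [(VCPU, 8, 1, 8, 16.0), (MEMORY_MB, 32768, 1, 32768, 1.5),
        (DISK_GB, 500, 1, 500, 1.0)]
node_fields = fields_from_inventories(rows)
```
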

Alternatives
------------

This is step 3 in an irreversible process that completely changes the way that
quantitative things are tracked and claimed in Nova.

Data model impact
-----------------

No other database schema changes will be required by this blueprint. The work
in this blueprint only populates the `inventories` table that is created in the
`resource-providers` blueprint.

REST API impact
---------------

None.

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

None.

Performance Impact
------------------

None.

Other deployer impact
---------------------

There will be a database schema migration needed that adds the `uuid` column to
the `compute_nodes` table.

Developer impact
----------------

None.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  jaypipes

Other contributors:
  cdent
  dansmith

Work Items
----------

The following distinct tasks are involved in this spec's implementation:

* Create the database schema migration that adds the `uuid` column to the
  `compute_nodes` table
* Modify `nova.objects.ComputeNode.create()` to populate the `uuid` attribute
  of the compute node, insert a record into the `resource_providers` table and
  add any inventory/capacity fields to the `inventories` table
* Add a `nova.objects.ComputeNode._migrate_inventory()` method to migrate the
  inventory/capacity fields from `compute_nodes` to `inventories` and populate
  the `uuid` column value if it is None, as it would be if an older
  `nova-compute` daemon sent a serialized `ComputeNode` object model to an
  updated conductor. The `_migrate_inventory()` method should also create a
  record in the `resource_providers` table for the compute node
* Modify `nova.objects.ComputeNode` model to read inventory/capacity
  information from the `inventories` table instead of the `compute_nodes` table
* Modify `nova.objects.ComputeNode` model to store **changed** inventory
  information (total amount, min and max unit constraints, and allocation
  ratio) to the `inventories` table instead of the `compute_nodes` table

Dependencies
============

* `resource-classes` blueprint implemented
* `resource-providers` blueprint implemented

Testing
=======

Full unit, functional, and integration testing of the
`ComputeNode._migrate_inventory()` method that performs the data migration
itself.

Documentation Impact
====================

Developer reference documentation only. No user-facing impact is expected from
this spec's implementation.

References
==========

* `resource-classes` blueprint: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/mitaka/approved/resource-classes.rst
* `resource-providers` blueprint: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/mitaka/approved/resource-providers.rst

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Newton
     - Introduced