..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===========================================
Resource Providers - Compute Node Inventory
===========================================

https://blueprints.launchpad.net/nova/+spec/compute-node-inventory-newton

As we move towards generic tracking of all quantitative resources using the
resource providers modeling system, we need to transition the object model and
database schema for a compute node so that inventory information is stored in
the resource provider `inventories` table instead of the `compute_nodes`
table. This spec covers the part of this transition that deals with the
capacity of resources on a compute node: the inventory records.

Problem description
===================

Long-term, we would like to be able to add new types of resources (see the
`resource-classes` blueprint) to the system and do so without requiring
invasive database schema changes. In order to move to this more generic
modeling of quantitative resources and capacity records (see
`resource-providers` blueprint) we must transition the storage of inventory
information from where that information currently resides to the new
`inventories` table in the resource providers modeling system.

Use Cases
---------

As a deployer, I wish to add new classes of resources to my system and do so
without any downtime caused by database schema migrations.

Proposed change
===============

The two major components of this spec are the alignment of the underlying
database schema and the changes needed to the `nova.objects.ComputeNode` object
model to read and write inventory/capacity information from the `inventories`
table instead of the `compute_nodes` table.

Alignment of database schema
----------------------------

To align the underlying database storage for inventory records, we propose to
move the resource usage and capacity fields from their current locations in the
database to the new `inventories` table added in the `resource-providers`
blueprint.

Currently, the Nova database stores inventory records for the following
resource classes:

* vCPUs:

 * `compute_nodes.vcpus`: Total physical CPU cores on the compute node
 * `compute_nodes.vcpus_used`: Number of vCPUs allocated to virtual machines
   running on that compute node
 * `compute_nodes.cpu_allocation_ratio`: Overcommit ratio for vCPU on the
   compute node

* RAM:

 * `compute_nodes.memory_mb`: Total amount of physical RAM in MB on the
   compute node
 * `compute_nodes.memory_mb_used`: Amount of RAM allocated to virtual machines
   running on that compute node
 * `compute_nodes.ram_allocation_ratio`: Overcommit ratio for memory on the
   compute node
 * `compute_nodes.free_ram_mb`: A calculated field that can go away since its
   value can be determined by looking at used versus capacity values

* Disk:

 * `compute_nodes.local_gb`: Amount of disk storage available to the compute
   node for storing virtual machine ephemeral disks. While this is denoted
   "local" disk storage, currently if the local storage for ephemeral disks is
   shared storage, the compute node has no idea that this storage is shared
   among other compute nodes. See the `generic-resource-pools` and
   `resource-providers` blueprints for the solution to this problem
 * `compute_nodes.local_gb_used`: Amount of disk storage allocated for
   ephemeral disks of virtual machines running on the compute node. The same
   problem with shared storage for ephemeral disks applies to this field as
   well
 * `compute_nodes.free_disk_gb`: A calculated field that can go away since its
   value can be determined by looking at used versus capacity values
 * `compute_nodes.disk_available_least`: A field that stores the sum of
   *actual* used disk amounts on the local compute node. This information can
   be stored in the new `max_unit` field of the `inventories` table for the
   `DISK_GB` resource class

* PCI devices:

 * `compute_nodes.pci_stats`: Stores summary information about device "pools"
   (per product_id and vendor_id combination). This information is made
   redundant by the `pci-generate-stats` blueprint, which generates a summary
   view of pool information for PCI devices from the main record table,
   `pci_devices`
 * The `pci_devices` table stores all the individual PCI device records,
   including the status of the device and which instance (if any) the device
   has been assigned to.

* NUMA topologies:

 * `compute_nodes.numa_topology`: Serialized `nova.objects.numa.NUMATopology`
   object that represents both the compute node's NUMA topology **and the
   assigned NUMA topologies for instances on the compute node**.
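
The two calculated fields flagged above for removal can be recomputed on
demand from the capacity and usage columns. A minimal sketch, assuming a
compute node represented as a plain dict keyed by the column names listed
above; the helper name `derived_free_values` is hypothetical:

.. code-block:: python

    def derived_free_values(node):
        """Recompute the calculated compute_nodes fields on demand.

        ``node`` is assumed to be a dict keyed by compute_nodes column names
        (hypothetical representation, for illustration only).
        """
        return {
            # free_ram_mb: physical RAM minus RAM allocated to instances
            'free_ram_mb': node['memory_mb'] - node['memory_mb_used'],
            # free_disk_gb: local disk minus disk allocated to instances
            'free_disk_gb': node['local_gb'] - node['local_gb_used'],
        }


    node = {'memory_mb': 32768, 'memory_mb_used': 8192,
            'local_gb': 500, 'local_gb_used': 120}
    print(derived_free_values(node))
    # {'free_ram_mb': 24576, 'free_disk_gb': 380}

Because both values are pure functions of columns that survive the
migration, storing them separately only risks the copies drifting apart.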

To recap from the `resource-providers` blueprint, the schema of the
`inventories` table in the database looks like this::

    CREATE TABLE inventories (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        resource_provider_id INT UNSIGNED NOT NULL,
        resource_class_id INT UNSIGNED NOT NULL,
        total INT UNSIGNED NOT NULL,
        min_unit INT UNSIGNED NOT NULL,
        max_unit INT UNSIGNED NOT NULL,
        step_size INT UNSIGNED NOT NULL,
        allocation_ratio FLOAT NOT NULL,
        INDEX (resource_provider_id),
        INDEX (resource_class_id)
    );
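
To illustrate how the columns of one inventory record constrain consumption,
here is a minimal Python sketch. The `Inventory` class and its methods are
hypothetical. The sketch assumes that usable capacity is
`total * allocation_ratio`, and that a single request must lie between
`min_unit` and `max_unit` and be a multiple of `step_size`:

.. code-block:: python

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Inventory:
        """One row of the inventories table (hypothetical in-memory form)."""
        resource_provider_id: int
        resource_class_id: int
        total: int
        min_unit: int
        max_unit: int
        step_size: int
        allocation_ratio: float

        def capacity(self):
            # Assumed overcommit rule: 16 cores at 16.0 -> 256 vCPUs
            return int(self.total * self.allocation_ratio)

        def is_valid_request(self, amount):
            # A single allocation must respect the unit constraints
            return (self.min_unit <= amount <= self.max_unit
                    and amount % self.step_size == 0)


    # resource_class_id=0 is a made-up identifier for illustration
    vcpu = Inventory(resource_provider_id=1, resource_class_id=0, total=16,
                     min_unit=1, max_unit=16, step_size=1,
                     allocation_ratio=16.0)
    print(vcpu.capacity())            # 256
    print(vcpu.is_valid_request(4))   # True
    print(vcpu.is_valid_request(32))  # False: exceeds max_unit

Note that `max_unit` limits a single request, while `capacity()` bounds the
sum of all allocations against the provider.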

We propose to consolidate all of the inventory/capacity fields from the above
locations into the new `inventories` table in the following manner:

Remember that all compute nodes are resource providers, but not all resource
providers are compute nodes. There is no globally-unique identifier for a
compute node within the OpenStack deployment, and we need a globally-unique
identifier for the resource provider.

1) (COMPLETED IN MITAKA) We must first add a new `uuid` field to the
`compute_nodes` table::

    ALTER TABLE compute_nodes ADD COLUMN uuid VARCHAR(36) NULL;

.. note::

    The `uuid` field must be NULL at first, since we will not be generating
    values in a schema migration script. Step 2 below describes how a UUID is
    generated on demand for each compute node that is read from the database
    without one.

Because we do not want to do any data migrations in SQL migration scripts, we
need to perform the following data migrations in the `nova.objects.ComputeNode`
object. We propose adding a `_migrate_inventory()` method that handles the
data migration steps and is called from `_from_db_object()` when certain
conditions hold (for instance, the compute node has no UUID value). The
`_migrate_inventory()` method should use a single database transaction so that
all DB writes are applied atomically, and it should first check that all API
and conductor nodes have been upgraded to a version that supports the
migration.

2) (COMPLETED IN MITAKA) Compute nodes that have no `uuid` field set should
have a new random UUID generated on-demand.

3) A record must be added to the `resource_providers` table for each compute
node::

    INSERT INTO resource_providers (uuid)
    SELECT cn.uuid FROM compute_nodes AS cn
    WHERE cn.id = $COMPUTE_NODE_ID;
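
Steps 2 and 3 can be sketched together in Python against SQLite. This is a
minimal illustration, not the `_migrate_inventory()` implementation: the
reduced table definitions, the hypothetical `ensure_resource_provider()`
helper, and the SQLite-specific `INSERT OR IGNORE` are all assumptions. It
shows both writes happening in one transaction, and that re-running the
helper does not create a duplicate provider:

.. code-block:: python

    import sqlite3
    import uuid


    def setup_schema(conn):
        # Deliberately reduced versions of the two tables (illustration only)
        conn.executescript("""
            CREATE TABLE compute_nodes (
                id INTEGER PRIMARY KEY,
                uuid VARCHAR(36) NULL
            );
            CREATE TABLE resource_providers (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                uuid VARCHAR(36) NOT NULL UNIQUE
            );
        """)


    def ensure_resource_provider(conn, node_id):
        """Give the node a UUID if needed and register it as a provider.

        ``with conn`` commits on success and rolls back if any statement
        raises, so both writes are applied atomically.
        """
        with conn:
            (node_uuid,) = conn.execute(
                "SELECT uuid FROM compute_nodes WHERE id = ?", (node_id,)
            ).fetchone()
            if node_uuid is None:
                # Step 2: generate a random UUID on demand
                node_uuid = str(uuid.uuid4())
                conn.execute("UPDATE compute_nodes SET uuid = ? WHERE id = ?",
                             (node_uuid, node_id))
            # Step 3: OR IGNORE plus the UNIQUE constraint keeps this
            # idempotent
            conn.execute("INSERT OR IGNORE INTO resource_providers (uuid) "
                         "VALUES (?)", (node_uuid,))
            (rp_id,) = conn.execute(
                "SELECT id FROM resource_providers WHERE uuid = ?",
                (node_uuid,)
            ).fetchone()
        return node_uuid, rp_id


    conn = sqlite3.connect(":memory:")
    setup_schema(conn)
    conn.execute("INSERT INTO compute_nodes (id, uuid) VALUES (1, NULL)")
    conn.commit()
    first = ensure_resource_provider(conn, 1)
    second = ensure_resource_provider(conn, 1)  # no new rows on re-run
    print(first == second)  # True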

4) We need to create the inventory records for each compute node. For each of
the resource classes that the compute node provides, we need to store the
capacity, min and max unit values, and allocation ratios.

4a) For the vCPU resource class, do the following for each compute node. Look
up the resource class identifier for vCPU in the `resource_classes` table (see
the `resource-classes` blueprint).

Insert into the `inventories` table a record for the CPU resource class
with the total, min, max, and allocation ratio. For example::

    INSERT INTO inventories (
        resource_provider_id,
        resource_class_id,
        total,
        min_unit,
        max_unit,
        step_size,
        allocation_ratio
    )
    SELECT
        rp.id,
        $CPU_RESOURCE_CLASS_ID,
        cn.vcpus,
        1,
        cn.vcpus,
        1,
        cn.cpu_allocation_ratio
    FROM compute_nodes AS cn
        JOIN resource_providers rp
           ON cn.uuid = rp.uuid
    WHERE cn.id = $COMPUTE_NODE_ID;

4b) Do the same for the RAM and DISK resource classes. For the DISK resource
class, do not perform the INSERT if the compute node uses shared storage
for the ephemeral disks.
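
Steps 4a and 4b amount to a mapping from legacy `compute_nodes` columns to
one inventory record per resource class. A minimal sketch, assuming the
resource class names `VCPU`, `MEMORY_MB` and `DISK_GB` from the
`resource-classes` blueprint, a `min_unit` and `step_size` of 1, and that
the caller already knows whether ephemeral storage is shared; the function
name is hypothetical:

.. code-block:: python

    def standard_inventories(node, disk_is_shared):
        """Map compute_nodes columns to inventory records (steps 4a and 4b).

        ``node`` is a dict keyed by compute_nodes column names.
        """
        records = {
            'VCPU': dict(total=node['vcpus'], min_unit=1,
                         max_unit=node['vcpus'], step_size=1,
                         allocation_ratio=node['cpu_allocation_ratio']),
            'MEMORY_MB': dict(total=node['memory_mb'], min_unit=1,
                              max_unit=node['memory_mb'], step_size=1,
                              allocation_ratio=node['ram_allocation_ratio']),
        }
        # Skip DISK_GB entirely when ephemeral disks live on shared storage
        if not disk_is_shared:
            records['DISK_GB'] = dict(
                total=node['local_gb'], min_unit=1,
                max_unit=node['local_gb'], step_size=1,
                allocation_ratio=node['disk_allocation_ratio'])
        return records


    node = {'vcpus': 16, 'cpu_allocation_ratio': 16.0,
            'memory_mb': 65536, 'ram_allocation_ratio': 1.5,
            'local_gb': 1000, 'disk_allocation_ratio': 1.0}
    print(sorted(standard_inventories(node, disk_is_shared=True)))
    # ['MEMORY_MB', 'VCPU']

Following the `disk_available_least` note earlier in this spec, that value
could be substituted for the `DISK_GB` `max_unit`; the sketch keeps
`local_gb` for simplicity.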

4c) For the PCI device resource classes (`PCI_GENERIC`, `PCI_SRIOV_PF` and
`PCI_SRIOV_VF`), the inventories table records represent the class of
resources as a whole, not, for example, individual VFs on an SR-IOV-enabled
NIC PF. As such, a single record representing the total amount of each PCI
resource class would be added to the inventories table for each compute
node that has PCI devices.

For example, let us assume that a compute node has one SR-IOV-enabled NIC,
supporting 255 virtual functions (VFs) and not exposing the physical
function (PF) for use by a cloud user. We want to limit the number of VFs
that any single instance can consume to 8.

We would insert the following into the inventories table::

    INSERT INTO inventories (
        resource_provider_id,
        resource_class_id,
        total,
        min_unit,
        max_unit,
        step_size,
        allocation_ratio
    )
    SELECT
        rp.id,
        $PCI_SRIOV_VF_RESOURCE_CLASS_ID,
        255,
        1,
        8,
        1,
        1.0
    FROM compute_nodes AS cn
        JOIN resource_providers rp
           ON cn.uuid = rp.uuid
    WHERE cn.id = $COMPUTE_NODE_ID;
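
Collapsing individual `pci_devices` rows into one inventory record per PCI
resource class can be sketched as a simple count. This assumes Nova's
`dev_type` values `type-PCI`, `type-PF` and `type-VF`, and treats the
per-instance limit as a hypothetical operator-supplied setting; the function
name is also hypothetical:

.. code-block:: python

    from collections import Counter

    # pci_devices.dev_type values mapped to the resource classes above
    DEV_TYPE_TO_CLASS = {
        'type-PCI': 'PCI_GENERIC',
        'type-PF': 'PCI_SRIOV_PF',
        'type-VF': 'PCI_SRIOV_VF',
    }


    def pci_inventories(devices, max_per_instance):
        """Build one inventory record per PCI resource class.

        ``devices`` stands in for pci_devices rows (dicts with ``dev_type``);
        ``max_per_instance`` maps a resource class to a per-instance limit.
        """
        counts = Counter(DEV_TYPE_TO_CLASS[d['dev_type']] for d in devices)
        return {
            rc: dict(total=n, min_unit=1,
                     max_unit=min(n, max_per_instance.get(rc, n)),
                     step_size=1, allocation_ratio=1.0)
            for rc, n in counts.items()
        }


    # The example above: 255 VFs exposed, PF withheld, at most 8 per instance
    vfs = [{'dev_type': 'type-VF'} for _ in range(255)]
    inv = pci_inventories(vfs, {'PCI_SRIOV_VF': 8})
    print(inv['PCI_SRIOV_VF']['total'], inv['PCI_SRIOV_VF']['max_unit'])
    # 255 8

The resulting record matches the values in the SQL example above.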

4d) For the NUMA resource classes (`NUMA_SOCKETS`, `NUMA_CORES`, `NUMA_THREADS`
and `NUMA_MEMORY`), create an inventory record for each compute node that
exposes NUMA topology resources.

For example, let us assume we have a compute node that exposes 2 NUMA nodes
(cells), each with 4 cores and 8 threads. We would set the `min_unit` and
`max_unit` values of the inventory records to the single-NUMA-cell
constraints and the `total` value to the combined amount of the resource. So,
for instance, for `NUMA_CORES`, we would set `total` to 8 (2 sockets having 4
cores each), `min_unit` to 1, and `max_unit` to 4 (since each cell has 4
cores).
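
The NUMA rule above (sum across cells for `total`, largest single cell for
`max_unit`) can be sketched as follows. The function name and dict layout
are hypothetical, and an `allocation_ratio` of 1.0 and `step_size` of 1 are
assumptions for illustration:

.. code-block:: python

    def numa_inventory(per_cell_amounts):
        """Build one inventory record from per-NUMA-cell amounts.

        total is the sum across cells; max_unit is the largest amount any
        single cell offers, so one request cannot span cells.
        """
        return dict(total=sum(per_cell_amounts), min_unit=1,
                    max_unit=max(per_cell_amounts), step_size=1,
                    allocation_ratio=1.0)


    # Two cells, each with 4 cores and 8 threads, as in the example above
    cells = [{'cores': 4, 'threads': 8}, {'cores': 4, 'threads': 8}]
    cores = numa_inventory([c['cores'] for c in cells])
    threads = numa_inventory([c['threads'] for c in cells])
    print(cores['total'], cores['max_unit'])      # 8 4
    print(threads['total'], threads['max_unit'])  # 16 8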

.. note::

    In the release following the one in which this code merges, we will do a
    follow-up patch that makes the `uuid` column non-nullable and adds a
    unique constraint on the `compute_nodes.uuid` column.

Changes to `ComputeNode` object model
-------------------------------------

In order to ease the transition from the old-style mechanism for determining
inventory/capacity information, we propose modifying the
`nova.objects.ComputeNode` object in the following ways:

1) Make the existing `vcpus`, `memory_mb`, `local_gb`, `cpu_allocation_ratio`,
`ram_allocation_ratio`, and `disk_allocation_ratio` fields be read using a
single query against the `inventories` table, and populate the object fields
from the results so that callers are unaware that the storage mechanism has
changed behind the scenes. A single SQL query may be used to grab the above
fields::

    SELECT
        i.resource_class_id,
        i.total,
        i.min_unit,
        i.max_unit,
        i.allocation_ratio
    FROM inventories i
      JOIN resource_providers rp
      ON i.resource_provider_id = rp.id
    WHERE rp.uuid = $COMPUTE_NODE_UUID;
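
Turning the rows returned by that query back into the legacy `ComputeNode`
fields could look like the following sketch. The numeric resource class
identifiers and both lookup tables are hypothetical; the real identifiers
come from the `resource_classes` table:

.. code-block:: python

    # Hypothetical resource_class_id values, for illustration only
    CLASS_NAMES = {0: 'VCPU', 1: 'MEMORY_MB', 2: 'DISK_GB'}

    # ComputeNode fields fed by each resource class: (total, ratio)
    FIELDS_BY_CLASS = {
        'VCPU': ('vcpus', 'cpu_allocation_ratio'),
        'MEMORY_MB': ('memory_mb', 'ram_allocation_ratio'),
        'DISK_GB': ('local_gb', 'disk_allocation_ratio'),
    }


    def fields_from_inventory(rows):
        """Map query rows onto ComputeNode field values.

        Each row is (resource_class_id, total, min_unit, max_unit,
        allocation_ratio), in the column order of the SELECT above.
        """
        fields = {}
        for rc_id, total, _min_unit, _max_unit, ratio in rows:
            name = CLASS_NAMES.get(rc_id)
            if name not in FIELDS_BY_CLASS:
                continue  # e.g. PCI or NUMA classes have no legacy field
            total_field, ratio_field = FIELDS_BY_CLASS[name]
            fields[total_field] = total
            fields[ratio_field] = ratio
        return fields


    rows = [(0, 16, 1, 16, 16.0), (1, 65536, 1, 65536, 1.5),
            (2, 1000, 1, 1000, 1.0)]
    print(fields_from_inventory(rows)['vcpus'])  # 16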

2) The only piece of code that *writes* changes to the `vcpus`, `memory_mb`,
`local_gb`, `cpu_allocation_ratio`, and `ram_allocation_ratio` fields of the
`ComputeNode` is in the resource tracker, which sets the field values and calls
`save()` on the `ComputeNode` object. We can modify the `save()` method to
write any changes to inventory/capacity information to the new `inventories`
table instead of the `compute_nodes` table.

.. note::

    The object should save capacity information to the `inventories` table
    **only** if all conductor and API nodes have been upgraded to a version
    that supports the new inventory schema.
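
The decision `save()` has to make can be sketched as splitting the changed
fields between the two tables, guarded by the upgrade check in the note
above. This is a hypothetical helper, not the `ComputeNode.save()` code; in
particular, how `all_services_upgraded` is determined is left out and
assumed to be computed elsewhere:

.. code-block:: python

    # inventories column that each legacy ComputeNode field maps to
    INVENTORY_FIELDS = {
        'vcpus': ('VCPU', 'total'),
        'cpu_allocation_ratio': ('VCPU', 'allocation_ratio'),
        'memory_mb': ('MEMORY_MB', 'total'),
        'ram_allocation_ratio': ('MEMORY_MB', 'allocation_ratio'),
        'local_gb': ('DISK_GB', 'total'),
        'disk_allocation_ratio': ('DISK_GB', 'allocation_ratio'),
    }


    def plan_save(changes, all_services_upgraded):
        """Split changed fields between inventories and compute_nodes.

        ``changes`` maps field name to new value (changed fields only).
        Until every API and conductor node is upgraded, everything keeps
        going to compute_nodes.
        """
        if not all_services_upgraded:
            return {}, dict(changes)
        inventory_updates, legacy_updates = {}, {}
        for field, value in changes.items():
            if field in INVENTORY_FIELDS:
                rc, column = INVENTORY_FIELDS[field]
                inventory_updates.setdefault(rc, {})[column] = value
            else:
                legacy_updates[field] = value
        return inventory_updates, legacy_updates


    changes = {'vcpus': 32, 'ram_allocation_ratio': 1.0, 'vcpus_used': 4}
    print(plan_save(changes, all_services_upgraded=True))
    # ({'VCPU': {'total': 32}, 'MEMORY_MB': {'allocation_ratio': 1.0}}, {'vcpus_used': 4})

Usage fields such as `vcpus_used` are outside the scope of this spec, so the
sketch leaves them on `compute_nodes`.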

Alternatives
------------

This is step 3 in an irreversible process that completely changes the way that
quantitative things are tracked and claimed in Nova.

Data model impact
-----------------

No other database schema changes will be required by this blueprint. The work
in this blueprint only populates the `inventories` table that is created in the
`resource-providers` blueprint.

REST API impact
---------------

None.

Security impact
---------------

None.

Notifications impact
--------------------

None.

Other end user impact
---------------------

None.

Performance Impact
------------------

None.

Other deployer impact
---------------------

A database schema migration is needed to add the `uuid` column to the
`compute_nodes` table.

Developer impact
----------------

None.

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  jaypipes

Other contributors:
  cdent
  dansmith

Work Items
----------

The following distinct tasks are involved in this spec's implementation:

* Create the database schema migration that adds the `uuid` column to the
  `compute_nodes` table
* Modify `nova.objects.ComputeNode.create()` to populate the `uuid` attribute
  of the compute node, insert a record into the `resource_providers` table and
  add any inventory/capacity fields to the `inventories` table.
* Add a `nova.objects.ComputeNode._migrate_inventory()` method to migrate the
  inventory/capacity fields from `compute_nodes` to `inventories` and populate
  the `uuid` column value if it is None, as it would be if an older
  `nova-compute` daemon sent a serialized `ComputeNode` object model to an
  updated conductor. The `_migrate_inventory()` method should also create a
  record in the `resource_providers` table for the compute node
* Modify `nova.objects.ComputeNode` model to read inventory/capacity
  information from the `inventories` table instead of the `compute_nodes` table
* Modify `nova.objects.ComputeNode` model to store **changed** inventory
  information (total amount, min and max unit constraints, and allocation
  ratio) in the `inventories` table instead of the `compute_nodes` table

Dependencies
============

* `resource-classes` blueprint implemented
* `resource-providers` blueprint implemented

Testing
=======

The `ComputeNode._migrate_inventory()` method, which performs the data
migration itself, will receive full unit, functional, and integration test
coverage.

Documentation Impact
====================

Developer reference documentation only. No user-facing impact is expected from
this spec's implementation.

References
==========

* `resource-classes` blueprint: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/mitaka/approved/resource-classes.rst
* `resource-providers` blueprint: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/mitaka/approved/resource-providers.rst

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Newton
     - Introduced