Merge "resource-providers: Migrate compute node inventory"

2016-04-04 17:36:08 +00:00
parent 3706a202fc 35e3af89c3
commit 069595e54a
1 changed files with 410 additions and 0 deletions
--- a/specs/newton/approved/compute-node-inventory-newton.rst
+++ b/specs/newton/approved/compute-node-inventory-newton.rst
@@ -0,0 +1,410 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+===========================================
+Resource Providers - Compute Node Inventory
+===========================================
+
+https://blueprints.launchpad.net/nova/+spec/compute-node-inventory-newton
+
+As we move towards a system for generic tracking of all quantitative resources
+in the system using the resource providers modeling system, we need to
+transition the object model and database schema for a compute node to store
+inventory information in the resource provider `inventories` table instead of
+the `compute_nodes` table.  This spec outlines the part of this transition
+process that deals with capacity of resources on a compute node -- the
+inventory records.
+
+Problem description
+===================
+
+Long-term, we would like to be able to add new types of resources (see the
+`resource-classes` blueprint) to the system and do so without requiring
+invasive database schema changes. In order to move to this more generic
+modeling of quantitative resources and capacity records (see
+`resource-providers` blueprint) we must transition the storage of inventory
+information from where that information currently resides to the new
+`inventories` table in the resource providers modeling system.
+
+Use Cases
+---------
+
+As a deployer, I wish to add new classes of resources to my system and do so
+without any downtime caused by database schema migrations.
+
+Proposed change
+===============
+
+The two major components of this spec are the alignment of the underlying
+database schema and the changes needed to the `nova.objects.ComputeNode` object
+model to read and write inventory/capacity information from the `inventories`
+table instead of the `compute_nodes` table.
+
+Alignment of database schema
+----------------------------
+
+To align the underlying database storage for inventory records, we propose to
+move the resource usage and capacity fields from their current locations in the
+database to the new `inventories` table added in the `resource-providers`
+blueprint.
+
+Currently, the Nova database stores inventory records for the following
+resource classes:
+
+* vCPUs:
+
+ * `compute_nodes.vcpus`: Total physical CPU cores on the compute node
+ * `compute_nodes.vcpus_used`: Number of vCPUs allocated to virtual machines
+   running on that compute node
+ * `compute_nodes.cpu_allocation_ratio`: Overcommit ratio for vCPU on the
+   compute node
+
+* RAM:
+
+ * `compute_nodes.memory_mb`: Total amount of physical RAM in MB on the
+   compute node
+ * `compute_nodes.memory_mb_used`: Amount of RAM allocated to virtual machines
+   running on that compute node
+ * `compute_nodes.ram_allocation_ratio`: Overcommit ratio for memory on the
+   compute node
+ * `compute_nodes.free_ram_mb`: A calculated field that can go away since its
+   value can be determined by looking at used versus capacity values
+
+* Disk:
+
+ * `compute_nodes.local_gb`: Amount of disk storage available to the compute
+   node for storing virtual machine ephemeral disks. While this is denoted
+   "local" disk storage, currently if the local storage for ephemeral disks is
+   shared storage, the compute node has no idea that this storage is shared
+   among other compute nodes. See the `generic-resource-pools` and
+   `resource-providers` blueprints for the solution to this problem
+ * `compute_nodes.local_gb_used`: Amount of disk storage allocated for
+   ephemeral disks of virtual machines running on the compute node. The same
+   problem with shared storage for ephemeral disks applies to this field as
+   well
+ * `compute_nodes.free_disk_gb`: A calculated field that can go away since its
+   value can be determined by looking at used versus capacity values
+ * `compute_nodes.disk_available_least`: A field that stores the sum of
+   *actual* used disk amounts on the local compute node. This information can
+   be stored in the new `max_unit` field of the `inventories` table for the
+   `DISK_GB` resource class
+
+* PCI devices:
+
+ * `compute_nodes.pci_stats`: Stores summary information about device "pools"
+   (per product_id and vendor_id combination). This information is made
+   redundant by the `pci-generate-stats` blueprint, which generates a summary
+   view of pool information for PCI devices from the main record table,
+   `pci_devices`
+ * The `pci_devices` table stores all the individual PCI device records,
+   including the status of the device and which instance (if any) the device
+   has been assigned to
+
+* NUMA topologies:
+
+ * `compute_nodes.numa_topology`: Serialized `nova.objects.numa.NUMATopology`
+   object that represents both the compute node's NUMA topology **and the
+   assigned NUMA topologies for instances on the compute node**.
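+
+The two calculated fields listed above (`free_ram_mb` and `free_disk_gb`) can
+be derived on demand rather than stored. The following is a minimal Python
+sketch, assuming that the free amount is simply total minus used and that the
+schedulable capacity is the total scaled by the allocation ratio. The helper
+names are hypothetical and are not part of Nova:
+

```python
def free_amount(total, used):
    """Amount not yet allocated (assumption: total minus used)."""
    return total - used


def overcommitted_capacity(total, allocation_ratio):
    """Capacity after applying the overcommit ratio (assumption)."""
    return int(total * allocation_ratio)


# Example values only: 32768 MB of RAM, 8192 MB used, ratio of 1.5.
free_ram_mb = free_amount(32768, 8192)
ram_capacity_mb = overcommitted_capacity(32768, 1.5)
```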
+
+To recap from the `resource-providers` blueprint, the schema of the
+`inventories` table in the database looks like this::
+
+    CREATE TABLE inventories (
+        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
+        resource_provider_id INT UNSIGNED NOT NULL,
+        resource_class_id INT UNSIGNED NOT NULL,
+        total INT UNSIGNED NOT NULL,
+        min_unit INT UNSIGNED NOT NULL,
+        max_unit INT UNSIGNED NOT NULL,
+        step_size INT UNSIGNED NOT NULL,
+        allocation_ratio FLOAT NOT NULL,
+        INDEX (resource_provider_id),
+        INDEX (resource_class_id)
+    );
+
+We propose to consolidate all of the inventory/capacity fields from the above
+locations into the new `inventories` table in the following manner:
+
+Remember that all compute nodes are resource providers, but not all resource
+providers are compute nodes. There is no globally-unique identifier for a
+compute node within the OpenStack deployment, and we need a globally-unique
+identifier for the resource provider.
+
+1) (COMPLETED IN MITAKA) We must first add a new `uuid` field to the
+`compute_nodes` table::
+
+    ALTER TABLE compute_nodes ADD COLUMN uuid VARCHAR(36) NULL;
+
+.. note::
+
+    The `uuid` field must be NULL at first, since we will not be generating
+    values in a schema migration script. See below for where we generate UUIDs for
+    each compute node on-demand as each compute node without a UUID specified is
+    read from the database.
+
+Because we do not want to do any data migrations in SQL migration scripts, we
+need to do the following data migrations in the `nova.objects.ComputeNode`
+object. We propose adding a `_migrate_inventory()` method that handles the data
+migration steps. `_from_db_object()` calls it when certain conditions hold (for
+instance, when the compute node has no UUID field value). The
+`_migrate_inventory()` method should use a single database transaction so that
+all DB writes are done atomically, and it should first check that all API and
+conductor nodes have been upgraded to a version that supports the migration.
+
+2) (COMPLETED IN MITAKA) Compute nodes that have no `uuid` field set should
+have a new random UUID generated on-demand.
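+
+Step 2 could be handled when a row is read from the database. The sketch below
+is a minimal illustration in Python, assuming the row is available as a
+dictionary. `ensure_uuid` is a hypothetical helper, not an existing Nova
+function; the caller would persist the row when the second return value is
+true:
+

```python
import uuid


def ensure_uuid(db_row):
    """Return a copy of the row with a UUID set, and whether one was generated."""
    if db_row.get("uuid"):
        # Already migrated: nothing to generate.
        return dict(db_row), False
    updated = dict(db_row)
    updated["uuid"] = str(uuid.uuid4())  # random (version 4) UUID
    return updated, True
```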
+
+3) A record must be added to the `resource_providers` table for each compute
+node::
+
+    INSERT INTO resource_providers (uuid)
+    SELECT uuid FROM compute_nodes;
+
+4) We need to create the inventory records for each compute node. For each of
+the resource classes that the compute node provides, we need to store the
+capacity, min and max unit values, and allocation ratios.
+
+4a) For the vCPU resource class, we would do the following steps for each
+compute node. Grab the resource class identifier for CPU from the
+`resource_classes` table (see `resource-classes` blueprint).
+
+Insert into the `inventories` table a record for the CPU resource class
+with the total, min, max, and allocation ratio. For example::
+
+    INSERT INTO inventories (
+        resource_provider_id,
+        resource_class_id,
+        total,
+        min_unit,
+        max_unit,
+        allocation_ratio
+    )
+    SELECT
+        rp.id,
+        $CPU_RESOURCE_CLASS_ID,
+        cn.vcpus,
+        1,
+        cn.vcpus,
+        cn.cpu_allocation_ratio
+    FROM compute_nodes AS cn
+        JOIN resource_providers rp
+           ON cn.uuid = rp.uuid
+    WHERE cn.id = $COMPUTE_NODE_ID;
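+
+Steps 3 and 4a can be exercised together in one transaction. The following is
+a sketch only, using Python's `sqlite3` module with simplified table
+definitions so that it runs standalone. The resource class identifier, the
+sample compute node row, the function name and the `step_size` value of 1 are
+all assumptions made for illustration (the example above omits `step_size`):
+

```python
import sqlite3

VCPU_CLASS_ID = 0  # hypothetical identifier for the CPU resource class

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the real tables (assumed columns only).
conn.executescript("""
    CREATE TABLE compute_nodes (
        id INTEGER PRIMARY KEY, uuid TEXT, vcpus INTEGER,
        cpu_allocation_ratio REAL);
    CREATE TABLE resource_providers (
        id INTEGER PRIMARY KEY AUTOINCREMENT, uuid TEXT NOT NULL UNIQUE);
    CREATE TABLE inventories (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        resource_provider_id INTEGER NOT NULL,
        resource_class_id INTEGER NOT NULL,
        total INTEGER NOT NULL, min_unit INTEGER NOT NULL,
        max_unit INTEGER NOT NULL, step_size INTEGER NOT NULL,
        allocation_ratio REAL NOT NULL);
    INSERT INTO compute_nodes VALUES (1, 'cn-uuid-1', 16, 16.0);
""")


def migrate_vcpu_inventory(conn, compute_node_id):
    # "with conn" commits on success and rolls back on an exception,
    # so both INSERTs are applied atomically.
    with conn:
        conn.execute(
            "INSERT INTO resource_providers (uuid) "
            "SELECT uuid FROM compute_nodes WHERE id = ?",
            (compute_node_id,))
        conn.execute(
            "INSERT INTO inventories (resource_provider_id, "
            "resource_class_id, total, min_unit, max_unit, step_size, "
            "allocation_ratio) "
            "SELECT rp.id, ?, cn.vcpus, 1, cn.vcpus, 1, "
            "cn.cpu_allocation_ratio "
            "FROM compute_nodes AS cn "
            "JOIN resource_providers rp ON cn.uuid = rp.uuid "
            "WHERE cn.id = ?",
            (VCPU_CLASS_ID, compute_node_id))


migrate_vcpu_inventory(conn, 1)
rows = conn.execute(
    "SELECT resource_class_id, total, min_unit, max_unit, allocation_ratio "
    "FROM inventories").fetchall()
```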
+
+4b) Do the same for the RAM and DISK resource classes. For the DISK resource
+class, do not perform the INSERT if the compute node uses shared storage
+for the ephemeral disks.
+
+4c) For the PCI device resource classes (`PCI_GENERIC`, `PCI_SRIOV_PF` and
+`PCI_SRIOV_VF`), the inventories table records represent the class of
+resources as a whole, not, for example, individual VFs on an SR-IOV-enabled
+NIC PF. As such, a single record representing the total amount of each PCI
+resource class would be added to the inventories table for each compute
+node that has PCI devices.
+
+For example, let us assume that a compute node has one SR-IOV-enabled NIC,
+supporting 255 virtual functions (VFs) and not exposing the physical
+function (PF) for use by a cloud user. We want to limit the number of VFs
+that any single instance can consume to 8.
+
+We would insert the following into the inventories table::
+
+    INSERT INTO inventories (
+        resource_provider_id,
+        resource_class_id,
+        total,
+        min_unit,
+        max_unit,
+        allocation_ratio
+    )
+    SELECT
+        rp.id,
+        $PCI_SRIOV_VF_RESOURCE_CLASS_ID,
+        255,
+        1,
+        8,
+        1.0
+    FROM compute_nodes AS cn
+        JOIN resource_providers rp
+           ON cn.uuid = rp.uuid
+    WHERE cn.id = $COMPUTE_NODE_ID;
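+
+To illustrate how the `max_unit` value of 8 limits a single instance, here is
+a small Python sketch of a request check. It assumes that a request must lie
+between `min_unit` and `max_unit` and be a multiple of `step_size`. The
+function name is hypothetical, and the `step_size` semantics are an assumption
+not spelled out in this spec:
+

```python
def request_fits_unit_constraints(amount, min_unit, max_unit, step_size=1):
    """Check one consumer's requested amount against the inventory limits."""
    if amount < min_unit or amount > max_unit:
        return False
    # Assumed meaning of step_size: requests come in multiples of it.
    return amount % step_size == 0


# With the SR-IOV VF inventory above (min_unit=1, max_unit=8):
eight_vfs_ok = request_fits_unit_constraints(8, 1, 8)
nine_vfs_ok = request_fits_unit_constraints(9, 1, 8)
```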
+
+4d) For the NUMA resource classes (`NUMA_SOCKETS`, `NUMA_CORES`, `NUMA_THREADS`
+and `NUMA_MEMORY`), create an inventory record for each compute node that
+exposes NUMA topology resources.
+
+For example, let us assume we have a compute node that exposes 2 NUMA nodes
+(cells), each with 4 cores and 8 threads. We would set the min_unit and
+max_unit values of the inventory records to the single-NUMA-cell
+constraints and the total value to the combined number of the resource. So,
+for instance, for the `NUMA_CORES`, we'd set total to 8 (2 sockets having 4
+cores each), min_unit to 1, and max_unit to 4 (since each cell has 4 cores).
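+
+The arithmetic in the `NUMA_CORES` example can be written out as a short
+Python sketch. The function name and the dictionary layout are hypothetical;
+only the total, min_unit and max_unit rules come from the paragraph above:
+

```python
def numa_cores_inventory(cells, cores_per_cell):
    """Build NUMA_CORES inventory values from a uniform NUMA layout."""
    return {
        "total": cells * cores_per_cell,  # combined across all cells
        "min_unit": 1,
        "max_unit": cores_per_cell,       # single-cell constraint
    }


numa_cores = numa_cores_inventory(cells=2, cores_per_cell=4)
```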
+
+.. note::
+
+    In the release after the one in which this code merges, a follow-up patch
+    will make the UUID column non-nullable and add a unique constraint on the
+    compute_nodes.uuid column.
+
+Changes to `ComputeNode` object model
+-------------------------------------
+
+In order to ease the transition from the old-style mechanism for determining
+inventory/capacity information, we propose modifying the
+`nova.objects.ComputeNode` object in the following ways:
+
+1) Make the existing `vcpus`, `memory_mb`, `local_gb`, `cpu_allocation_ratio`,
+`ram_allocation_ratio`, and `disk_allocation_ratio` fields be read using a
+single query against the `inventories` table, and populate the values of the
+object fields so that the user is none the wiser that the storage mechanism has
+changed behind the scenes. A single SQL query may be used to grab the above
+fields::
+
+    SELECT
+        i.resource_class_id,
+        i.total,
+        i.min_unit,
+        i.max_unit,
+        i.allocation_ratio
+    FROM inventories i
+      JOIN resource_providers rp
+      ON i.resource_provider_id = rp.id
+    WHERE rp.uuid = $COMPUTE_NODE_UUID;
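+
+The rows returned by this query then need to be mapped back onto the existing
+object fields. The following Python sketch shows one way this could look. The
+resource class identifiers and the `inventory_rows_to_fields` helper are
+hypothetical, and the rows are assumed to be tuples in the column order of the
+query above:
+

```python
# Hypothetical resource class identifiers, for illustration only.
VCPU, MEMORY_MB, DISK_GB = 0, 1, 2

# Resource class -> (total field, allocation ratio field) on ComputeNode.
FIELD_MAP = {
    VCPU: ("vcpus", "cpu_allocation_ratio"),
    MEMORY_MB: ("memory_mb", "ram_allocation_ratio"),
    DISK_GB: ("local_gb", "disk_allocation_ratio"),
}


def inventory_rows_to_fields(rows):
    """Translate inventory rows into legacy ComputeNode field values."""
    fields = {}
    for class_id, total, _min_unit, _max_unit, ratio in rows:
        if class_id not in FIELD_MAP:
            continue  # e.g. PCI or NUMA classes have no legacy field here
        total_field, ratio_field = FIELD_MAP[class_id]
        fields[total_field] = total
        fields[ratio_field] = ratio
    return fields


fields = inventory_rows_to_fields([
    (VCPU, 16, 1, 16, 16.0),
    (MEMORY_MB, 32768, 1, 32768, 1.5),
])
```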
+
+2) The only piece of code that *writes* changes to the `vcpus`, `memory_mb`,
+`local_gb`, `cpu_allocation_ratio`, and `ram_allocation_ratio` fields of the
+`ComputeNode` is in the resource tracker, which sets the field values and calls
+`save()` on the `ComputeNode` object. We can modify the `save()` method to
+write any changes to inventory/capacity information to the new `inventories`
+table instead of the `compute_nodes` table.
+
+.. note::
+
+    The object should save capacity information to the `inventories` table
+    **only** if all conductor and API nodes have been upgraded to a version
+    that supports the new inventory schema.
+
+Alternatives
+------------
+
+This is step 3 in an irreversible process that completely changes the way that
+quantitative things are tracked and claimed in Nova.
+
+Data model impact
+-----------------
+
+No other database schema changes will be required by this blueprint. The work
+in this blueprint only populates the `inventories` table that is created in the
+`resource-providers` blueprint.
+
+REST API impact
+---------------
+
+None.
+
+Security impact
+---------------
+
+None.
+
+Notifications impact
+--------------------
+
+None.
+
+Other end user impact
+---------------------
+
+None.
+
+Performance Impact
+------------------
+
+None.
+
+Other deployer impact
+---------------------
+
+There will be a database schema migration needed that adds the `uuid` column to
+the `compute_nodes` table.
+
+Developer impact
+----------------
+
+None.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  jaypipes
+
+Other contributors:
+  cdent
+  dansmith
+
+Work Items
+----------
+
+The following distinct tasks are involved in this spec's implementation:
+
+* Create the database schema migration that adds the `uuid` column to the
+  `compute_nodes` table
+* Modify `nova.objects.ComputeNode.create()` to populate the `uuid` attribute
+  of the compute node, insert a record into the `resource_providers` table and
+  add any inventory/capacity fields to the `inventories` table.
+* Add a `nova.objects.ComputeNode._migrate_inventory()` method to migrate the
+  inventory/capacity fields from `compute_nodes` to `inventories` and populate
+  `uuid` column value if it is None, as it would be if an older `nova-compute`
+  daemon sent a serialized `ComputeNode` object model to an updated conductor.
+  The `_migrate_inventory()` method should also create a record in the
+  `resource_providers` table for the compute node
+* Modify `nova.objects.ComputeNode` model to read inventory/capacity
+  information from the `inventories` table instead of the `compute_nodes` table
+* Modify `nova.objects.ComputeNode` model to store **changed** inventory
+  information (total amount, min and max unit constraints, and allocation
+  ratio) to the `inventories` table instead of the `compute_nodes` table
+
+Dependencies
+============
+
+* `resource-classes` blueprint implemented
+* `resource-providers` blueprint implemented
+
+Testing
+=======
+
+Full unit, functional, and integration testing of the
+`ComputeNode._migrate_inventory()` method that performs the data migration
+itself.
+
+Documentation Impact
+====================
+
+Developer reference documentation only. No user-facing impact is expected from
+this spec's implementation.
+
+References
+==========
+
+* `resource-classes` blueprint: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/mitaka/approved/resource-classes.rst
+* `resource-providers` blueprint: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/mitaka/approved/resource-providers.rst
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - Newton
+     - Introduced