From 0099bd4e974ab1fc549baa73f865fe86661cbe74 Mon Sep 17 00:00:00 2001
From: Jay Pipes <jaypipes@gmail.com>
Date: Wed, 20 May 2015 09:26:39 -0700
Subject: [PATCH] Adds spec for modeling resources using objects

Blueprint: resource-objects
Previously-approved-for-kilo

Change-Id: I4e2b38012f48ced09c86b9ed43d0e3ca7a397d33
---
 specs/liberty/approved/resource-objects.rst | 405 ++++++++++++++++++++
 1 file changed, 405 insertions(+)
 create mode 100644 specs/liberty/approved/resource-objects.rst

diff --git a/specs/liberty/approved/resource-objects.rst b/specs/liberty/approved/resource-objects.rst
new file mode 100644
index 000000000..78f96db34
--- /dev/null
+++ b/specs/liberty/approved/resource-objects.rst
@@ -0,0 +1,405 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+==========================
+Model resources as objects
+==========================
+
+https://blueprints.launchpad.net/nova/+spec/resource-objects
+
+Adds model objects to represent the resources that may be requested
+for and consumed by an instance.
+
+Problem description
+===================
+
+In Nova, we have a very loose way of modeling the resources that are
+consumed by virtual machine instances and provided by compute nodes.
+The Flavor object has a number of static fields that correspond to amounts
+of simple resources like CPU, RAM and local disk. We use dictionaries
+of key/value pairs and JSON-serialized BLOBs of data to model other types
+of resources, like PCIe devices or NUMA cell layouts.
+
+The resource tracker on the compute node keeps track of the collection of
+resources that are consumed on the node. The `ResourceTracker.old_resources`
+attribute is a dictionary containing a clutter of nested dictionaries. Some of
+these nested dictionaries include the 'stats' dict for the extensible resource
+tracker; various 'pci_devices', 'pci_stats' and 'pci_passthrough_devices'
+things; a 'numa_topology' blob that stores a JSON-serialized representation of
+an object in `nova.virt.hardware`; and a 'metrics' dictionary with completely
+unstructured and undocumented key/value pairs. In addition to these, the
+`ResourceTracker.old_resources` dictionary contains top-level keys including
+some that match the simple resource types that a Flavor object exposes:
+
+- `local_gb_used`: Amount of disk in GB used on the compute node
+- `local_gb`: Total GB of local disk capacity the compute node provides
+- `free_disk_gb`: Calculated amount of disk the compute node has available
+- `vcpus_used`: Number of vCPUs consumed on the compute node
+- `vcpus`: Total number of vCPUs the compute node provides
+- `free_vcpus`: Calculated number of vCPUs the compute node has available
+- `memory_mb_used`: Amount of RAM in MB used on the compute node
+- `memory_mb`: Total MB of RAM capacity the compute node provides
+- `free_ram_mb`: Calculated amount of RAM the compute node has available
+- `running_vms`: Number of virtual machine instances running on the node
+- `current_workload`: Some calculated value of the workload on the node
+
+Unfortunately, none of the above is documented in the code, and in order to add
+new features to the scheduler, people have continued to add free-form keys and
+nested dictionaries to the dictionary. This makes communicating actual usage
+amounts to the scheduler error-prone: the resource tracker calls the
+`scheduler_client.update_resource_stats()` method, passing in this
+unstructured, unversioned dictionary of information as-is.  This means the
+scheduler interface is incredibly fragile since the interface can be altered on
+a whim by any developer who decides to add a new key to the free-form
+dictionary of resources. Typos in resource dictionary keys are easy to miss in
+code reviews, and frankly, there is virtually no functional testing for much
+of the edge-case code in the resource tracker that deals with the extensible
+resource tracker.
+
+In addition to the problem of fragile interfaces, the free-form nature of the
+resources dictionary has meant that different resources are tracked in
+different ways. PCI resources are tracked one way, NUMA topology usage is
+tracked in a different way, CPU/RAM/disk are tracked differently again and any
+resources modeled in the complete free-for-all of the extensible resource
+tracker are tracked in an entirely different way, using plugins that modify a
+supplied 'stats' nested dictionary.
+
+An example of the mess this has created in the resource tracker can be
+seen here:
+
+.. code:: python
+
+    def _update(self, context, values):
+        """Update partial stats locally and populate them to Scheduler."""
+        self._write_ext_resources(values)
+        # NOTE(pmurray): the stats field is stored as a json string. The
+        # json conversion will be done automatically by the ComputeNode object
+        # so this can be removed when using ComputeNode.
+        values['stats'] = jsonutils.dumps(values['stats'])
+
+        if not self._resource_change(values):
+            return
+        if "service" in self.compute_node:
+            del self.compute_node['service']
+        # NOTE(sbauza): Now the DB update is asynchronous, we need to locally
+        #               update the values
+        self.compute_node.update(values)
+        # Persist the stats to the Scheduler
+        self._update_resource_stats(context, values)
+        if self.pci_tracker:
+            self.pci_tracker.save(context)
+
+If resources were actually modeled consistently, the above code would look like
+this instead:
+
+.. code:: python
+
+    def _update(self, context, resources):
+        if not self._resource_change(resources):
+            return
+        # Notify the scheduler about changed resources
+        scheduler_client.update_usage_for_compute_node(
+            context, self.compute_node, resources)
+
+Similarly, the following code (again from the resource tracker):
+
+.. code:: python
+
+    def _update_usage(self, context, resources, usage, sign=1):
+        mem_usage = usage['memory_mb']
+
+        overhead = self.driver.estimate_instance_overhead(usage)
+        mem_usage += overhead['memory_mb']
+
+        resources['memory_mb_used'] += sign * mem_usage
+        resources['local_gb_used'] += sign * usage.get('root_gb', 0)
+        resources['local_gb_used'] += sign * usage.get('ephemeral_gb', 0)
+
+        # free ram and disk may be negative, depending on policy:
+        resources['free_ram_mb'] = (resources['memory_mb'] -
+                                    resources['memory_mb_used'])
+        resources['free_disk_gb'] = (resources['local_gb'] -
+                                     resources['local_gb_used'])
+
+        resources['running_vms'] = self.stats.num_instances
+        self.ext_resources_handler.update_from_instance(usage, sign)
+
+        # Calculate the numa usage
+        free = sign == -1
+        updated_numa_topology = hardware.get_host_numa_usage_from_instance(
+                resources, usage, free)
+        resources['numa_topology'] = updated_numa_topology
+
+would instead look like this:
+
+.. code:: python
+
+    def _update_usage(self, context, amounts):
+        for resource, amount in amounts.items():
+            self.inventories[resource].consume(amount)
+
+Use Cases
+---------
+
+Nova contributors wish to extend the functionality of the scheduler and intend
+to break the scheduler out into the Gantt project. In order to do this
+effectively, the internal interfaces around the resource tracker and the
+scheduler must be cleaned up to use structured objects.
+
+Project Priority
+----------------
+
+This blueprint is part of the `scheduler` refactoring effort defined as a
+priority for the Liberty release.
+
+Proposed change
+===============
+
+Modeling requested and used resource amounts is the foundational building block
+that must be in place before any further refactoring or cleanup of the
+scheduler or resource tracker interfaces.
+
+This blueprint encompasses the addition of sets of classes to represent:
+
+- Amounts of different datatypes, e.g. `IntegerAmount` or `NUMAAmount`.
+- Inventories of different datatypes, which describe the actual capacity, the
+  amount used up already and any overcommit ratio. E.g. `IntegerInventory`,
+  `NUMAInventory`.
+- Different types of resources, e.g. RAM which uses `IntegerAmount` and
+  `IntegerInventory`, or NUMA topology which uses `NUMAAmount` and
+  `NUMAInventory`.
+
+These amount, inventory and resource classes will be `nova.objects` object
+classes and will enable Nova to evolve, in a versioned manner, the way that it
+tracks resources and exposes resource consumption.
+
+The goals of the extensible resource tracker (ERT) were to put in place a
+framework that allowed adding new resource types and allowed accounting for
+those resources in different ways. While this blueprint does indeed remove the
+ERT, because these resource, amount and inventory classes are being added
+as `nova.objects` objects, we will gain the flexibility that the ERT intended
+but with the stability of the nova objects system.
+
+The resource tracker code will then be converted to use the above classes when
+representing inventories of all resources on a compute node. As is the case
+today, these will be persisted by simply calling `compute_node.save()`.
+
+No changes are proposed to the database schema of the `compute_nodes` table or
+the fields in `nova.objects.ComputeNode`. However, we do add translation
+methods to `nova.objects.ComputeNode` that can produce a dict of `Inventory`
+objects (keyed by `Resource`) from the compute node and update the compute
+node from a similar structure.
+
+Alternatives
+------------
+
+None.
+
+Data model impact
+-----------------
+
+None. The objects added in this blueprint are not stored in a database. These
+objects are a replacement for an unstructured nested dictionary that is
+currently used to represent resource amounts.
+
+REST API impact
+---------------
+
+None.
+
+Security impact
+---------------
+
+None.
+
+Notifications impact
+--------------------
+
+None.
+
+Other end user impact
+---------------------
+
+None.
+
+Performance Impact
+------------------
+
+None.
+
+Other deployer impact
+---------------------
+
+The ERT will be removed when this blueprint is completed.
+
+Developer impact
+----------------
+
+Once this blueprint is completed, code handling the construction of the
+request_spec will be more structured and much of the spaghetti code in the
+resource tracker around the ERT, PCI tracker and NUMA topology quirks will go
+away.
+
+Implementation
+==============
+
+The following abstract classes will be provided:
+
+.. code:: python
+
+ class Amount(object):
+    """Represents a quantity of a resource."""
+
+    def __eq__(self, other):
+        raise NotImplementedError
+
+    def __ne__(self, other):
+        return not self == other
+
+    def __hash__(self):
+        raise NotImplementedError
+
+    def __neg__(self):
+        raise NotImplementedError
+
+
+ class Inventory(object):
+    """Describes the capacity, available and used amounts for a resource."""
+
+    def consume(self, amount):
+        """Add the given amount to the used amount in this inventory. If the
+        amount is negative, more resources will be available afterwards than
+        were before.
+
+        :param amount: 'Amount' to add to the usage.
+        :raises ValueError: if amount is the wrong type for this inventory.
+        :raises CapacityException: if accommodating this request would cause
+                either available or used resources to go negative.
+        """
+        raise NotImplementedError
+
+    def can_provide(self, amount):
+        """Determine if this inventory can provide the given amount of
+        resources. An overcommit ratio may be applied.
+
+        :param amount: 'Amount' to determine if there is room for.
+        :raises ValueError: if amount is the wrong type for this inventory or
+                is negative.
+        :returns: True if the requested amount of resources may be consumed,
+                 False otherwise.
+        """
+        raise NotImplementedError
+
+
+ class Resource(object):
+    """Describes a particular kind of resource."""
+
+    @classmethod
+    def make_amount(cls, *args, **kwargs):
+        """Makes an Amount of the type appropriate to this resource."""
+        raise NotImplementedError
+
+    @classmethod
+    def make_inventory(cls, *args, **kwargs):
+        """Makes an Inventory of the type appropriate to this resource."""
+        raise NotImplementedError
+
+Each concrete specialization of the Inventory class must be able to handle
+overcommit ratios for the type of resource that it handles.
+
+The idea is that *all* requested resources for an instance can be compared
+to *all* resource inventories for a compute node in the same way, using code
+that looks like this:
+
+.. code:: python
+
+ for resource, amount in request_spec.resources.items():
+    if compute_node.inventories[resource].can_provide(amount):
+        # do something... perhaps claim resources on the compute
+        # node, which might eventually call:
+        compute_node.inventories[resource].consume(amount)
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  jaypipes
+
+Other contributors:
+  lxsli
+
+Work Items
+----------
+
+- Add classes for amount and inventory representation.
+
+- Add classes for resource representation.
+
+- Add translation methods (`get_inventories` and `update_inventories`) to
+  `nova.objects.ComputeNode` to return, or update from, a dict of `Inventory`
+  objects keyed by `Resource`, along with unit tests.
+
+- Convert resource tracker to use inventories instead of triples of
+  free/total/used amounts in key/value pairs in a dictionary for the non-PCI,
+  non-ERT, non-NUMA resources.
+
+- Remove the extensible resource tracker code.
+
+- Convert resource tracker to use inventories instead of 'numa_topology' key
+  and `nova.virt.hardware.VirtNUMATopology` object in the `old_resources`
+  dictionary.
+
+- Convert resource tracker to use inventories instead of 'pci_devices' and
+  'pci_passthrough_devices' keys and a `nova.pci.pci_stats.PciDeviceStats`
+  object in the `pci_tracker` attribute of the resource tracker.
+
+- Convert the virt driver's `get_available_resource` method to return a
+  dictionary of resource objects.
+
+- Deprecate the old `update_resource_stats()` conductor RPC API method.
+
+- Convert the scheduler's `HostStateManager` to utilize the new
+  `ComputeNode.get_inventories()` and `ComputeNode.update_inventories()` methods.
+
+- Add developer reference documentation for how resources are modeled.
+
+Dependencies
+============
+
+None.
+
+Testing
+=======
+
+New unit tests for the objects will be added. The existing unit tests of
+resource tracker will be overhauled in the patch set that converts the resource
+tracker to use the new resource object models instead of its current free-form
+dictionary of things.
+
+Documentation Impact
+====================
+
+There are currently no developer reference docs that explain how the different
+resources are tracked within Nova.  Developer reference material that explains
+the new resource type and amount classes will be delivered as a part of this
+blueprint.
+
+References
+==========
+
+This blueprint is part of an overall effort to clean up, version, and stabilize
+the interfaces between the nova-api, nova-scheduler, nova-conductor and
+nova-compute daemons that involve scheduling and resource decisions.
+
+- `detach-service-from-computenode`
+- `resource-objects` <-- this blueprint
+- `request-spec-object`
+- `sched-select-destinations-use-request-spec-object`
+- `placement-spec-object`
+- `condition-objects`
+- `sched-placement-spec-use-resource-objects`
+- `sched-placement-spec-use-condition-objects`
+- `sched-get-placement-claims`