Amend count-quota-usage-from-placement to reflect implementation

There was much discussion on the implementation patch series, which
resulted in some deviations from the spec proposal. This change updates
the spec to match the resulting implementation.

Related to blueprint count-quota-usage-from-placement

Depends-On: https://review.opendev.org/662056

Change-Id: Ia0f64734a1ae2c9332b3e70a97eaab288bdf0c79
Author: melanie witt 2019-05-30 03:28:40 +00:00
parent d66c33164c
commit bf4d8af262
1 changed file with 54 additions and 44 deletions

@@ -78,39 +78,42 @@ The new method will contain:
 mappings for a project and a user to represent the instance count.
 
 We will rename the ``_instances_cores_ram_count`` method to
-``_cores_ram_count`` that counts cores and ram from the cell databases and
-is only used if ``[workarounds]disable_quota_usage_from_placement`` is True.
+``_instances_cores_ram_count_legacy`` that counts cores and ram from the cell
+databases and is only used if ``[quota]count_usage_from_placement`` is False or
+if the data migration has not yet completed.
 
-Because there is not yet an ability to partition allocations (or perhaps,
-resource providers from which allocations could derive a partition) in
+Because there is not yet an ability to partition resource providers in
 placement, in order to support deployments where multiple Nova deployments
-share the same placement service, like possibly in an Edge scenario, we can add
-a ``[workarounds]disable_quota_usage_from_placement`` which defaults to False.
-If True, we use the legacy quota counting method for instances, cores, and
-ram. If False, we use a quota counting method that calls placement. This is a
-minimal way to keep "legacy" quota counting available for the scenario of
-multiple Nova deployments sharing one placement service. The config option will
-simply control which counting method will be called by the pluggable quota
-system. For example (pseudo-code):
+share the same placement service, like possibly in an Edge scenario, we will
+add a ``[quota]count_usage_from_placement`` config option which defaults to
+False. If False, we use the legacy quota counting method for instances, cores,
+and ram. If True, we use a quota counting method that calls placement. This is
+a way to keep "legacy" quota counting available for the scenario of multiple
+Nova deployments sharing one placement service. The config option will simply
+control which counting method will be called by the pluggable quota system.
+For example (pseudo-code):
 
 ::
 
-  if CONF.workarounds.disable_quota_usage_from_placement:
-      CountableResource('cores', _cores_ram_count, 'cores')
-      CountableResource('ram', _cores_ram_count, 'ram')
+  if CONF.quota.count_usage_from_placement:
+      return _instances_cores_ram_count_api_db_placement(...)
   else:
-      CountableResource('cores', _cores_ram_count_placement, 'cores')
-      CountableResource('ram', _cores_ram_count_placement, 'ram')
+      return _instances_cores_ram_count_legacy(...)
 
 We will add a new method for counting cores and ram from placement that is used
-when ``[workarounds]disable_quota_usage_from_placement`` is False. This
-method could be called ``_cores_ram_count_placement``.
+when ``[quota]count_usage_from_placement`` is True. This method could be called
+``_cores_ram_count_placement``.
 
 The new method will contain:
 
-* One call to placement to get resource usage for CPU and RAM. We can get CPU
-  and RAM usage for a project and user by querying the ``/usages`` resource::
+* Up to two calls to placement to get resource usage for CPU and RAM. One call
+  will count usage across a project. Then, if user-scoped quota limits are
+  found for a resource, a second call will count usage across a project and a
+  user.
+  We can get CPU and RAM usage for a project and user by querying the
+  ``/usages`` resource::
 
     GET /usages?project_id=<project id>
+    GET /usages?project_id=<project id>&user_id=<user id>
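
For illustration, the placement-backed counting path above might look
roughly like the following sketch. The endpoint, auth handling, and
helper names here are assumptions; the real code goes through Nova's
placement client rather than raw HTTP::

  import requests

  PLACEMENT = 'https://placement.example.test'   # assumed endpoint
  HEADERS = {'X-Auth-Token': '<service token>',  # assumed auth
             'OpenStack-API-Version': 'placement 1.9'}

  def cores_ram_usage(project_id, user_id=None):
      # GET /usages scoped to the project, and optionally to a user.
      params = {'project_id': project_id}
      if user_id is not None:
          params['user_id'] = user_id
      resp = requests.get(PLACEMENT + '/usages', params=params,
                          headers=HEADERS)
      usages = resp.json()['usages']
      # Placement reports cores as VCPU and ram as MEMORY_MB.
      return {'cores': usages.get('VCPU', 0),
              'ram': usages.get('MEMORY_MB', 0)}

  # Always count the project; make the second, user-scoped call only
  # when user-scoped quota limits are defined for the resource.
  project_usage = cores_ram_usage('<project id>')
  user_usage = cores_ram_usage('<project id>', user_id='<user id>')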
 
 Alternatives
@@ -126,8 +129,9 @@ allowing server create requests to potentially exceed quota limits.
 
 Another alternative which has been discussed is to use placement aggregates to
 surround each entire Nova deployment and use that as a means to partition
 placement usages. We would need to add an ``aggregate=`` query parameter to the
-placement /usages API in this case. This approach would also require some work
-by either Nova or the operator to keep the placement aggregate updated.
+placement ``/usages`` API in this case. This approach would also require some
+work by either Nova or the operator to keep the placement aggregate
+synchronized.
 
 .. _policy-driven behavior: https://review.openstack.org/614783
@@ -187,7 +191,7 @@ Upgrade impact
 The addition of the ``user_id`` column to the ``nova_api.instance_mappings``
 table will require a data migration of all existing instance mappings to
 populate the ``user_id`` field. The migration routine would look for mappings
-where ``user_id`` is None and query cells by corresponding ``project_id`` in
+where ``user_id`` is None and query cells by corresponding ``cell_id`` in
 the mapping. The query could filter on instance UUIDs, finding the ``user_id``
 values to populate in the mappings. This would implement the batched
 ``nova-manage db online_data_migrations`` way of doing the migration.
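
The batched migration described above might look roughly like the
following sketch; the helper names are hypothetical and the real routine
is defined in the implementation series::

  def populate_user_id(context, max_count):
      # Find up to max_count instance mappings with user_id=None.
      mappings = get_unmigrated_mappings(context, limit=max_count)
      found = len(mappings)
      done = 0
      # Group by cell_id so each cell database is queried only once,
      # filtering that query on the instance UUIDs in the mappings.
      for cell_mapping, cell_batch in group_by_cell(context, mappings):
          with target_cell(context, cell_mapping) as cctxt:
              instances = get_instances_by_uuid(
                  cctxt, [m.instance_uuid for m in cell_batch])
          user_ids = {inst.uuid: inst.user_id for inst in instances}
          for mapping in cell_batch:
              if mapping.instance_uuid in user_ids:
                  mapping.user_id = user_ids[mapping.instance_uuid]
                  mapping.save()
                  done += 1
      # ``nova-manage db online_data_migrations`` expects (found, done).
      return found, done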
@@ -199,25 +203,28 @@ situation where an upgrade has not run
 
 In order to handle a live in-progress upgrade, we will need to be able to fall
 back on the legacy counting method for instances, cores, and ram if
-``nova_api.instance_mappings`` don't yet have ``user_id`` populated (if the
+``nova_api.instance_mappings`` do not yet have ``user_id`` populated (if the
 operator has not yet run the data migration). We will need a way to detect that
 the migration has not yet been run in order to fall back on the legacy counting
-method. We could have a check such as ``if count(InstanceMapping.id) where
-project_id=<project id> and user_id=None > 0``, then fall back on the legacy
+method. We could have a check such as ``if exists(InstanceMapping.id) where
+project_id=<project id> and user_id=None``, then fall back on the legacy
 counting method to query cell databases. We should cache the results of each
 migration completeness check per ``project_id`` so we avoid needlessly
 checking a ``project_id`` that has already been migrated every time quota is
 checked.
 
 We will populate the ``user_id`` field even for instance mappings that are
-``queued_for_delete=True`` because we will be filtering on
-``queued_for_delete=False`` during the instance count based on instance
-mappings.
+``queued_for_delete=True`` because such instance mappings include instances
+that are ``SOFT_DELETED`` and these can be restored at any time in the future.
+If we do not migrate ``SOFT_DELETED`` instances with ``queued_for_delete=True``
+and they are restored in the future, their instance mappings would be
+unmigrated and would prevent us from being able to eventually drop the related
+data migration code.
 
 The data migrations and fallback to the legacy counting method will be
-temporary for Stein, to be dropped in T with a blocker migration. That is, you
-cannot pass ``nova-manage api_db sync`` if there are any instance mappings with
-``user_id=None`` to force the batched migration using ``nova-manage``.
+temporary for Train, to be dropped in U or V with a blocker migration. That is,
+you cannot pass ``nova-manage api_db sync`` if there are any instance mappings
+with ``user_id=None`` to force the batched migration using ``nova-manage``.
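
A minimal sketch of the completeness check with per-project caching; the
query helper name is illustrative::

  # Cache only projects known to be fully migrated: a project can
  # finish migrating later, but it can never become unmigrated again,
  # so positive results are safe to cache for the life of the process.
  MIGRATED_PROJECTS = set()

  def user_id_migration_complete(context, project_id):
      if project_id in MIGRATED_PROJECTS:
          return True
      # exists() can short-circuit on the first row, unlike count().
      has_unmigrated = any_mapping_with_null_user_id(context, project_id)
      if not has_unmigrated:
          MIGRATED_PROJECTS.add(project_id)
          return True
      return False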
 
 Implementation
 ==============
@@ -239,20 +246,23 @@ Work Items
 * Update the ``_server_group_count_members_by_user`` quota counting method to
   use only the ``nova_api.instance_mappings`` table instead of querying cell
   databases.
-* Add a config option ``[workarounds]disable_quota_usage_from_placement`` that
+* Add a config option ``[quota]count_usage_from_placement`` that
   defaults to False. This will be able to be deprecated when partitioning of
-  resource providers or allocations is available in placement.
+  resource providers is available in placement and other quirks around
+  placement resource allocations in Nova are resolved in the future (example:
+  "doubling" of allocations during resizes).
 * Add a new method to count instances with a count of
   ``nova_api.instance_mappings`` filtering by ``project_id=<project_id>`` and
   ``user_id=<user_id>`` and ``queued_for_delete=False``.
 * Add a new count method that queries the placement API for CPU and RAM usage.
   In the new count method, add a check for whether the online data migration
   has been run yet and if not, fall back on the legacy count method.
-* Rename the ``_instances_cores_ram_count`` method to ``_cores_ram_count`` and
-  let it count only cores and ram in the legacy way, for use if
-  ``[workarounds]disable_quota_usage_from_placement`` is set to True.
-* Adjust the nova-next or nova-live-migration CI job to run with
-  ``[workarounds]disable_quota_usage_from_placement=True``.
+* Rename the ``_instances_cores_ram_count`` method to
+  ``_instances_cores_ram_count_legacy`` and let it count only cores and ram in
+  the legacy way, for use if ``[quota]count_usage_from_placement`` is False or
+  the data migration is not yet completed.
+* Adjust the nova-next CI job to run with
+  ``[quota]count_usage_from_placement=True``.
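
The instance-count work item above amounts to a single query against the
API database; a standalone approximation using SQLAlchemy (connection
details and 1.4-style reflection assumed) could be::

  from sqlalchemy import MetaData, Table, create_engine, func, select

  # Assumed connection URL; Nova itself would use its api_db session.
  engine = create_engine('mysql+pymysql://nova:***@db/nova_api')
  instance_mappings = Table('instance_mappings', MetaData(),
                            autoload_with=engine)

  def count_instances(project_id, user_id=None):
      # queued_for_delete=False excludes deleted and SOFT_DELETED
      # instances from the count.
      query = (select(func.count())
               .select_from(instance_mappings)
               .where(instance_mappings.c.project_id == project_id)
               .where(instance_mappings.c.queued_for_delete.is_(False)))
      if user_id is not None:
          query = query.where(instance_mappings.c.user_id == user_id)
      with engine.connect() as conn:
          return conn.execute(query).scalar()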
 
 Dependencies
 ============
@@ -263,16 +273,16 @@ Testing
 =======
 
 Unit tests and functional tests will be included to test the new functionality.
-We will also adjust one CI job (nova-next or nova-live-migration) to run with
-``[workarounds]disable_quota_usage_from_placement=True`` to make sure we have
-integration test coverage of that path.
+We will also adjust one CI job (nova-next) to run with
+``[quota]count_usage_from_placement=True`` to make sure we have integration
+test coverage of that path.
 
 Documentation Impact
 ====================
 
 The documentation_ of Cells v2 caveats will be updated to revise the paragraph
 about the inability to correctly calculate quota usage when one or more cells
-are unreachable. We will document that beginning in Stein, there are new
+are unreachable. We will document that beginning in Train, there are new
 deployment options.
 
 .. _documentation: https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#quota-related-quirks