Amend count-quota-usage-from-placement to reflect implementation

There was much discussion on the implementation patch series, which
resulted in some deviations from the spec proposal. This change updates
the spec to match the resulting implementation.

Related to blueprint count-quota-usage-from-placement

Depends-On: https://review.opendev.org/662056

Change-Id: Ia0f64734a1ae2c9332b3e70a97eaab288bdf0c79
Author: melanie witt 2019-05-30 03:28:40 +00:00
parent d66c33164c
commit bf4d8af262
1 changed file with 54 additions and 44 deletions

@@ -78,39 +78,42 @@ The new method will contain:
 mappings for a project and a user to represent the instance count.
 
 We will rename the ``_instances_cores_ram_count`` method to
-``_cores_ram_count`` that counts cores and ram from the cell databases and
-is only used if ``[workarounds]disable_quota_usage_from_placement`` is True.
+``_instances_cores_ram_count_legacy`` that counts cores and ram from the cell
+databases and is only used if ``[quota]count_usage_from_placement`` is False or
+if the data migration has not yet completed.
 
-Because there is not yet an ability to partition allocations (or perhaps,
-resource providers from which allocations could derive a partition) in
+Because there is not yet an ability to partition resource providers in
 placement, in order to support deployments where multiple Nova deployments
-share the same placement service, like possibly in an Edge scenario, we can add
-a ``[workarounds]disable_quota_usage_from_placement`` which defaults to False.
-If True, we use the legacy quota counting method for instances, cores, and
-ram. If False, we use a quota counting method that calls placement. This is a
-minimal way to keep "legacy" quota counting available for the scenario of
-multiple Nova deployments sharing one placement service. The config option will
-simply control which counting method will be called by the pluggable quota
-system. For example (pseudo-code):
+share the same placement service, like possibly in an Edge scenario, we will
+add a ``[quota]count_usage_from_placement`` config option which defaults to
+False. If False, we use the legacy quota counting method for instances, cores,
+and ram. If True, we use a quota counting method that calls placement. This is
+a way to keep "legacy" quota counting available for the scenario of multiple
+Nova deployments sharing one placement service. The config option will simply
+control which counting method will be called by the pluggable quota system.
+For example (pseudo-code):
 
 ::
 
-  if CONF.workarounds.disable_quota_usage_from_placement:
-      CountableResource('cores', _cores_ram_count, 'cores')
-      CountableResource('ram', _cores_ram_count, 'ram')
+  if CONF.quota.count_usage_from_placement:
+      return _instances_cores_ram_count_api_db_placement(...)
   else:
-      CountableResource('cores', _cores_ram_count_placement, 'cores')
-      CountableResource('ram', _cores_ram_count_placement, 'ram')
+      return _instances_cores_ram_count_legacy(...)
 
 We will add a new method for counting cores and ram from placement that is used
-when ``[workarounds]disable_quota_usage_from_placement`` is False. This
-method could be called ``_cores_ram_count_placement``.
+when ``[quota]count_usage_from_placement`` is True. This method could be called
+``_cores_ram_count_placement``.
 
 The new method will contain:
 
-* One call to placement to get resource usage for CPU and RAM. We can get CPU
-  and RAM usage for a project and user by querying the ``/usages`` resource::
+* Up to two calls to placement to get resource usage for CPU and RAM. One call
+  will count usage across a project. Then, if user-scoped quota limits are
+  found for a resource, a second call will count usage across a project and a
+  user.
+  We can get CPU and RAM usage for a project and user by querying the
+  ``/usages`` resource::
 
     GET /usages?project_id=<project id>
+    GET /usages?project_id=<project id>&user_id=<user id>
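
For illustration, the placement-backed counting path above might look
roughly like the following sketch. The endpoint, auth handling, and
helper names here are assumptions; the real code goes through Nova's
placement client rather than raw HTTP::

  import requests

  PLACEMENT = 'https://placement.example.test'   # assumed endpoint
  HEADERS = {'X-Auth-Token': '<service token>',  # assumed auth
             'OpenStack-API-Version': 'placement 1.9'}

  def cores_ram_usage(project_id, user_id=None):
      # GET /usages scoped to the project, and optionally to a user.
      params = {'project_id': project_id}
      if user_id is not None:
          params['user_id'] = user_id
      resp = requests.get(PLACEMENT + '/usages', params=params,
                          headers=HEADERS)
      usages = resp.json()['usages']
      # Placement reports cores as VCPU and ram as MEMORY_MB.
      return {'cores': usages.get('VCPU', 0),
              'ram': usages.get('MEMORY_MB', 0)}

  # Always count the project; make the second, user-scoped call only
  # when user-scoped quota limits are defined for the resource.
  project_usage = cores_ram_usage('<project id>')
  user_usage = cores_ram_usage('<project id>', user_id='<user id>')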
 
 Alternatives
@@ -126,8 +129,9 @@ allowing server create requests to potentially exceed quota limits.
 
 Another alternative which has been discussed is to use placement aggregates to
 surround each entire Nova deployment and use that as a means to partition
 placement usages. We would need to add an ``aggregate=`` query parameter to the
-placement /usages API in this case. This approach would also require some work
-by either Nova or the operator to keep the placement aggregate updated.
+placement ``/usages`` API in this case. This approach would also require some
+work by either Nova or the operator to keep the placement aggregate
+synchronized.
 
 .. _policy-driven behavior: https://review.openstack.org/614783
@@ -187,7 +191,7 @@ Upgrade impact
 The addition of the ``user_id`` column to the ``nova_api.instance_mappings``
 table will require a data migration of all existing instance mappings to
 populate the ``user_id`` field. The migration routine would look for mappings
-where ``user_id`` is None and query cells by corresponding ``project_id`` in
+where ``user_id`` is None and query cells by corresponding ``cell_id`` in
 the mapping. The query could filter on instance UUIDs, finding the ``user_id``
 values to populate in the mappings. This would implement the batched
 ``nova-manage db online_data_migrations`` way of doing the migration.
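
The batched migration described above might look roughly like the
following sketch; the helper names are hypothetical and the real routine
is defined in the implementation series::

  def populate_user_id(context, max_count):
      # Find up to max_count instance mappings with user_id=None.
      mappings = get_unmigrated_mappings(context, limit=max_count)
      found = len(mappings)
      done = 0
      # Group by cell_id so each cell database is queried only once,
      # filtering that query on the instance UUIDs in the mappings.
      for cell_mapping, cell_batch in group_by_cell(context, mappings):
          with target_cell(context, cell_mapping) as cctxt:
              instances = get_instances_by_uuid(
                  cctxt, [m.instance_uuid for m in cell_batch])
          user_ids = {inst.uuid: inst.user_id for inst in instances}
          for mapping in cell_batch:
              if mapping.instance_uuid in user_ids:
                  mapping.user_id = user_ids[mapping.instance_uuid]
                  mapping.save()
                  done += 1
      # ``nova-manage db online_data_migrations`` expects (found, done).
      return found, done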
@@ -199,25 +203,28 @@ situation where an upgrade has not run
 
 In order to handle a live in-progress upgrade, we will need to be able to fall
 back on the legacy counting method for instances, cores, and ram if
-``nova_api.instance_mappings`` don't yet have ``user_id`` populated (if the
+``nova_api.instance_mappings`` do not yet have ``user_id`` populated (if the
 operator has not yet run the data migration). We will need a way to detect that
 the migration has not yet been run in order to fall back on the legacy counting
-method. We could have a check such as ``if count(InstanceMapping.id) where
-project_id=<project id> and user_id=None > 0``, then fall back on the legacy
+method. We could have a check such as ``if exists(InstanceMapping.id) where
+project_id=<project id> and user_id=None``, then fall back on the legacy
 counting method to query cell databases. We should cache the results of each
 migration completeness check per ``project_id`` so we avoid needlessly
 checking a ``project_id`` that has already been migrated every time quota is
 checked.
 
 We will populate the ``user_id`` field even for instance mappings that are
-``queued_for_delete=True`` because we will be filtering on
-``queued_for_delete=False`` during the instance count based on instance
-mappings.
+``queued_for_delete=True`` because such instance mappings include instances
+that are ``SOFT_DELETED`` and these can be restored at any time in the future.
+If we do not migrate ``SOFT_DELETED`` instances with ``queued_for_delete=True``
+and they are restored in the future, their instance mappings would be
+unmigrated and would prevent us from being able to eventually drop the related
+data migration code.
 
 The data migrations and fallback to the legacy counting method will be
-temporary for Stein, to be dropped in T with a blocker migration. That is, you
-cannot pass ``nova-manage api_db sync`` if there are any instance mappings with
-``user_id=None`` to force the batched migration using ``nova-manage``.
+temporary for Train, to be dropped in U or V with a blocker migration. That is,
+you cannot pass ``nova-manage api_db sync`` if there are any instance mappings
+with ``user_id=None`` to force the batched migration using ``nova-manage``.
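
A minimal sketch of the completeness check with per-project caching; the
query helper name is illustrative::

  # Cache only projects known to be fully migrated: a project can
  # finish migrating later, but it can never become unmigrated again,
  # so positive results are safe to cache for the life of the process.
  MIGRATED_PROJECTS = set()

  def user_id_migration_complete(context, project_id):
      if project_id in MIGRATED_PROJECTS:
          return True
      # exists() can short-circuit on the first row, unlike count().
      has_unmigrated = any_mapping_with_null_user_id(context, project_id)
      if not has_unmigrated:
          MIGRATED_PROJECTS.add(project_id)
          return True
      return False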
 
 Implementation
 ==============
@@ -239,20 +246,23 @@ Work Items
 * Update the ``_server_group_count_members_by_user`` quota counting method to
   use only the ``nova_api.instance_mappings`` table instead of querying cell
   databases.
-* Add a config option ``[workarounds]disable_quota_usage_from_placement`` that
+* Add a config option ``[quota]count_usage_from_placement`` that
   defaults to False. This will be able to be deprecated when partitioning of
-  resource providers or allocations is available in placement.
+  resource providers is available in placement and other quirks around
+  placement resource allocations in Nova are resolved in the future (example:
+  "doubling" of allocations during resizes).
 * Add a new method to count instances with a count of
   ``nova_api.instance_mappings`` filtering by ``project_id=<project_id>`` and
   ``user_id=<user_id>`` and ``queued_for_delete=False``.
 * Add a new count method that queries the placement API for CPU and RAM usage.
   In the new count method, add a check for whether the online data migration
   has been run yet and if not, fall back on the legacy count method.
-* Rename the ``_instances_cores_ram_count`` method to ``_cores_ram_count`` and
-  let it count only cores and ram in the legacy way, for use if
-  ``[workarounds]disable_quota_usage_from_placement`` is set to True.
-* Adjust the nova-next or nova-live-migration CI job to run with
-  ``[workarounds]disable_quota_usage_from_placement=True``.
+* Rename the ``_instances_cores_ram_count`` method to
+  ``_instances_cores_ram_count_legacy`` and let it count only cores and ram in
+  the legacy way, for use if ``[quota]count_usage_from_placement`` is False or
+  the data migration is not yet completed.
+* Adjust the nova-next CI job to run with
+  ``[quota]count_usage_from_placement=True``.
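
The instance-count work item above amounts to a single query against the
API database; a standalone approximation using SQLAlchemy (connection
details and 1.4-style reflection assumed) could be::

  from sqlalchemy import MetaData, Table, create_engine, func, select

  # Assumed connection URL; Nova itself would use its api_db session.
  engine = create_engine('mysql+pymysql://nova:***@db/nova_api')
  instance_mappings = Table('instance_mappings', MetaData(),
                            autoload_with=engine)

  def count_instances(project_id, user_id=None):
      # queued_for_delete=False excludes deleted and SOFT_DELETED
      # instances from the count.
      query = (select(func.count())
               .select_from(instance_mappings)
               .where(instance_mappings.c.project_id == project_id)
               .where(instance_mappings.c.queued_for_delete.is_(False)))
      if user_id is not None:
          query = query.where(instance_mappings.c.user_id == user_id)
      with engine.connect() as conn:
          return conn.execute(query).scalar()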
 
 Dependencies
 ============
@@ -263,16 +273,16 @@ Testing
 =======
 
 Unit tests and functional tests will be included to test the new functionality.
-We will also adjust one CI job (nova-next or nova-live-migration) to run with
-``[workarounds]disable_quota_usage_from_placement=True`` to make sure we have
-integration test coverage of that path.
+We will also adjust one CI job (nova-next) to run with
+``[quota]count_usage_from_placement=True`` to make sure we have integration
+test coverage of that path.
 
 Documentation Impact
 ====================
 
 The documentation_ of Cells v2 caveats will be updated to revise the paragraph
 about the inability to correctly calculate quota usage when one or more cells
-are unreachable. We will document that beginning in Stein, there are new
+are unreachable. We will document that beginning in Train, there are new
 deployment options.
 
 .. _documentation: https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#quota-related-quirks