Merge "Amend count-quota-usage-from-placement to reflect implementation"

Zuul
2019-06-04 13:47:16 +00:00
committed by Gerrit Code Review

@@ -78,39 +78,42 @@ The new method will contain:
 mappings for a project and a user to represent the instance count.

 We will rename the ``_instances_cores_ram_count`` method to
-``_cores_ram_count`` that counts cores and ram from the cell databases and
-is only used if ``[workarounds]disable_quota_usage_from_placement`` is True.
+``_instances_cores_ram_count_legacy`` that counts cores and ram from the cell
+databases and is only used if ``[quota]count_usage_from_placement`` is False or
+if the data migration has not yet completed.

-Because there is not yet an ability to partition allocations (or perhaps,
-resource providers from which allocations could derive a partition) in
+Because there is not yet an ability to partition resource providers in
 placement, in order to support deployments where multiple Nova deployments
-share the same placement service, like possibly in an Edge scenario, we can add
-a ``[workarounds]disable_quota_usage_from_placement`` which defaults to False.
-If True, we use the legacy quota counting method for instances, cores, and
-ram. If False, we use a quota counting method that calls placement. This is a
-minimal way to keep "legacy" quota counting available for the scenario of
-multiple Nova deployments sharing one placement service. The config option will
-simply control which counting method will be called by the pluggable quota
-system. For example (pseudo-code):
+share the same placement service, like possibly in an Edge scenario, we will
+add a ``[quota]count_usage_from_placement`` config option which defaults to
+False. If False, we use the legacy quota counting method for instances, cores,
+and ram. If True, we use a quota counting method that calls placement. This is
+a way to keep "legacy" quota counting available for the scenario of multiple
+Nova deployments sharing one placement service. The config option will simply
+control which counting method will be called by the pluggable quota system.
+For example (pseudo-code):

 ::

-    if CONF.workarounds.disable_quota_usage_from_placement:
-        CountableResource('cores', _cores_ram_count, 'cores')
-        CountableResource('ram', _cores_ram_count, 'ram')
+    if CONF.quota.count_usage_from_placement:
+        return _instances_cores_ram_count_api_db_placement(...)
     else:
-        CountableResource('cores', _cores_ram_count_placement, 'cores')
-        CountableResource('ram', _cores_ram_count_placement, 'ram')
+        return _instances_cores_ram_count_legacy(...)

 We will add a new method for counting cores and ram from placement that is used
-when ``[workarounds]disable_quota_usage_from_placement`` is False. This
-method could be called ``_cores_ram_count_placement``.
+when ``[quota]count_usage_from_placement`` is True. This method could be called
+``_cores_ram_count_placement``.

 The new method will contain:

-* One call to placement to get resource usage for CPU and RAM. We can get CPU
-  and RAM usage for a project and user by querying the ``/usages`` resource::
+* Up to two calls to placement to get resource usage for CPU and RAM. One call
+  will count usage across a project. Then, if user-scoped quota limits are
+  found for a resource, a second call will count usage across a project and a
+  user.
+  We can get CPU and RAM usage for a project and user by querying the
+  ``/usages`` resource::

+    GET /usages?project_id=<project id>
     GET /usages?project_id=<project id>&user_id=<user id>

 Alternatives
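As a rough illustration of the counting flow this hunk describes, the sketch below selects between the placement and legacy paths and issues up to two ``/usages`` queries. It is not Nova code: ``count_cores_ram``, ``count_cores_ram_placement``, the ``get_usages`` callable and the shape of its return value are all hypothetical stand-ins, and only the two query strings come from the spec text.

```python
from typing import Callable, Dict, Optional

Usages = Dict[str, int]


def count_cores_ram_placement(
    get_usages: Callable[[str], Usages],  # hypothetical placement client call
    project_id: str,
    user_id: Optional[str] = None,
    user_scoped_limits: bool = False,
) -> Dict[str, Usages]:
    """Count CPU/RAM usage with up to two ``/usages`` queries."""
    # First call: usage across the whole project.
    counts = {"project": get_usages(f"/usages?project_id={project_id}")}
    # Second call only when user-scoped quota limits exist for the resource.
    if user_scoped_limits and user_id is not None:
        counts["user"] = get_usages(
            f"/usages?project_id={project_id}&user_id={user_id}"
        )
    return counts


def count_cores_ram(
    count_usage_from_placement: bool,
    migration_complete: bool,
    placement_count: Callable[[], Dict[str, Usages]],
    legacy_count: Callable[[], Dict[str, Usages]],
) -> Dict[str, Usages]:
    """Pick the counting method the way the config option describes."""
    if count_usage_from_placement and migration_complete:
        return placement_count()
    # Legacy path: option is False, or the data migration has not finished.
    return legacy_count()
```

Passing the two counting methods in as callables keeps the sketch independent of any real database or placement client, so the selection logic can be exercised on its own.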
@@ -126,8 +129,9 @@ allowing server create requests to potentionally exceed quota limits.
 Another alternative which has been discussed is, to use placement aggregates to
 surround each entire Nova deployment and use that as a means to partition
 placement usages. We would need to add a ``aggregate=`` query parameter to the
-placement /usages API in this case. This approach would also require some work
-by either Nova or the operator to keep the placement aggregate updated.
+placement ``/usages`` API in this case. This approach would also require some
+work by either Nova or the operator to keep the placement aggregate
+synchronized.

 .. _policy-driven behavior: https://review.openstack.org/614783

@@ -187,7 +191,7 @@ Upgrade impact
 The addition of the ``user_id`` column to the ``nova_api.instance_mappings``
 table will require a data migration of all existing instance mappings to
 populate the ``user_id`` field. The migration routine would look for mappings
-where ``user_id`` is None and query cells by corresponding ``project_id`` in
+where ``user_id`` is None and query cells by corresponding ``cell_id`` in
 the mapping. The query could filter on instance UUIDs, finding the ``user_id``
 values to populate in the mappings. This would implement the batched
 ``nova-manage db online_data_migrations`` way of doing the migration.
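The batched migration described in this hunk could look roughly like the sketch below. It works on plain dictionaries rather than Nova objects, and ``populate_user_id``, the mapping keys and the ``cells`` lookup table are hypothetical names chosen for illustration. The ``(found, done)`` return value is an assumption about how a batched migration reports progress, not something stated in this change.

```python
from typing import Dict, List, Optional, Tuple

Mapping = Dict[str, Optional[str]]


def populate_user_id(
    mappings: List[Mapping],
    cells: Dict[str, Dict[str, str]],  # cell_id -> {instance_uuid: user_id}
    max_count: int,
) -> Tuple[int, int]:
    """Fill in ``user_id`` for at most ``max_count`` unmigrated mappings."""
    # Find one batch of mappings that still lack a user_id.
    batch = [m for m in mappings if m["user_id"] is None][:max_count]

    # Group the batch by cell so each cell is consulted once.
    by_cell: Dict[str, List[Mapping]] = {}
    for mapping in batch:
        by_cell.setdefault(mapping["cell_id"], []).append(mapping)

    done = 0
    for cell_id, cell_batch in by_cell.items():
        instances = cells.get(cell_id, {})
        for mapping in cell_batch:
            # Look the instance up by UUID in its cell to find the owner.
            user_id = instances.get(mapping["instance_uuid"])
            if user_id is not None:
                mapping["user_id"] = user_id
                done += 1
    return len(batch), done
```

Calling the function repeatedly until it reports nothing found mirrors the batched way an operator would run the migration.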
@@ -199,25 +203,28 @@ situation where an upgrade has not run

 In order to handle a live in-progress upgrade, we will need to be able to fall
 back on the legacy counting method for instances, cores, and ram if
-``nova_api.instance_mappings`` don't yet have ``user_id`` populated (if the
+``nova_api.instance_mappings`` do not yet have ``user_id`` populated (if the
 operator has not yet run the data migration). We will need a way to detect that
 the migration has not yet been run in order to fall back on the legacy counting
-method. We could have a check such as ``if count(InstanceMapping.id) where
-project_id=<project id> and user_id=None > 0``, then fall back on the legacy
+method. We could have a check such as ``if exists(InstanceMapping.id) where
+project_id=<project id> and user_id=None``, then fall back on the legacy
 counting method to query cell databases. We should cache the results of the
 each migration completeness check per ``project_id`` so we avoid needlessly
 checking a ``project_id`` that has already been migrated every time quota is
 checked.

 We will populate the ``user_id`` field even for instance mappings that are
-``queued_for_delete=True`` because we will be filtering on
-``queued_for_delete=False`` during the instance count based on instance
-mappings.
+``queued_for_delete=True`` because such instance mappings include instances
+that are ``SOFT_DELETED`` and these can be restored at any time in the future.
+If we do not migrate ``SOFT_DELETED`` instances with ``queued_for_delete=True``
+and they are restored in the future, their instance mappings would be
+unmigrated and would prevent us being able to eventually drop the related data
+migration code.

 The data migrations and fallback to the legacy counting method will be
-temporary for Stein, to be dropped in T with a blocker migration. That is, you
-cannot pass ``nova-manage api_db sync`` if there are any instance mappings with
-``user_id=None`` to force the batched migration using ``nova-manage``.
+temporary for Train, to be dropped in U or V with a blocker migration. That is,
+you cannot pass ``nova-manage api_db sync`` if there are any instance mappings
+with ``user_id=None`` to force the batched migration using ``nova-manage``.

 Implementation
 ==============
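The per-project caching of the migration completeness check mentioned in this hunk might be sketched as follows. ``MigrationCheck`` and the ``has_unmigrated`` callable (standing in for the "exists an instance mapping with ``user_id=None``" query) are hypothetical names. Caching only the "migrated" result is a design assumption here, made so that a project that is still unmigrated gets rechecked later.

```python
from typing import Callable, Set


class MigrationCheck:
    """Remember which projects have finished the user_id data migration."""

    def __init__(self, has_unmigrated: Callable[[str], bool]) -> None:
        # has_unmigrated(project_id) stands in for the database query that
        # asks whether any mapping for the project still has user_id=None.
        self._has_unmigrated = has_unmigrated
        self._migrated: Set[str] = set()

    def is_migrated(self, project_id: str) -> bool:
        # A project seen as fully migrated is never queried again.
        if project_id in self._migrated:
            return True
        if self._has_unmigrated(project_id):
            # Not cached: the operator may run the migration later.
            return False
        self._migrated.add(project_id)
        return True
```

A caller would fall back on the legacy counting method whenever ``is_migrated`` returns False for the project being checked.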
@@ -239,20 +246,23 @@ Work Items
 * Update the ``_server_group_count_members_by_user`` quota counting method to
   use only the ``nova_api.instance_mappings`` table instead of querying cell
   databases.
-* Add a config option ``[workarounds]disable_quota_usage_from_placement`` that
+* Add a config option ``[quota]count_usage_from_placement`` that
   defaults to False. This will be able to be deprecated when partitioning of
-  resource providers or allocations is available in placement.
+  resource providers is available in placement and other quirks around
+  placement resource allocations in Nova are resolved in the future (example:
+  "doubling" of allocations during resizes).
 * Add a new method to count instances with a count of
   ``nova_api.instance_mappings`` filtering by ``project_id=<project_id>`` and
   ``user_id=<user_id>`` and ``queued_for_delete=False``.
 * Add a new count method that queries the placement API for CPU and RAM usage.
   In the new count method, add a check for whether the online data migration
   has been run yet and if not, fall back on the legacy count method.
-* Rename the ``_instances_cores_ram_count`` method to ``_cores_ram_count`` and
-  let it count only cores and ram in the legacy way, for use if
-  ``[workarounds]disable_quota_usage_from_placement`` is set to True.
-* Adjust the nova-next or nova-live-migration CI job to run with
-  ``[workarounds]disable_quota_usage_from_placement=True``.
+* Rename the ``_instances_cores_ram_count`` method to
+  ``_instances_cores_ram_count_legacy`` and let it count only cores and ram in
+  the legacy way, for use if ``[quota]count_usage_from_placement`` is False or
+  the data migration is not yet completed.
+* Adjust the nova-next CI job to run with
+  ``[quota]count_usage_from_placement=True``.

 Dependencies
 ============
@@ -263,16 +273,16 @@ Testing
 =======

 Unit tests and functional tests will be included to test the new functionality.
-We will also adjust one CI job (nova-next or nova-live-migration) to run with
-``[workarounds]disable_quota_usage_from_placement=True`` to make sure we have
-integration test coverage of that path.
+We will also adjust one CI job (nova-next) to run with
+``[quota]count_usage_from_placement=True`` to make sure we have integration
+test coverage of that path.

 Documentation Impact
 ====================

 The documentation_ of Cells v2 caveats will be updated to update the paragraph
 about the inability to correctly calculate quota usage when one or more cells
-are unreachable. We will document that beginning in Stein, there are new
+are unreachable. We will document that beginning in Train, there are new
 deployment options.

 .. _documentation: https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#quota-related-quirks