Repropose New Quota System spec

This is a reproposal of a previously merged spec in the Zed cycle[1].

We have had a long history of quotas going out of sync. This spec proposes
to fix the problem by introducing 2 new quota drivers:

1. Counting: Dynamically counting the resources
2. Counters: Storing reserved and in-use quota values in the DB and using them

[1] https://specs.openstack.org/openstack/cinder-specs/specs/zed/quota-system.html
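
The difference between the two approaches listed above can be illustrated with a small sketch. It is not Cinder code: the schema is a hypothetical, heavily simplified stand-in for the ``volumes`` and ``quota_usages`` tables, and the function names are made up for illustration.

```python
import sqlite3

# Hypothetical, simplified schema (assumption); real Cinder tables differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE volumes (id TEXT, project_id TEXT, size INTEGER, deleted INTEGER);
CREATE TABLE quota_usages (project_id TEXT, resource TEXT,
                           in_use INTEGER, reserved INTEGER);
INSERT INTO volumes VALUES ('v1', 'p1', 10, 0), ('v2', 'p1', 5, 0),
                           ('v3', 'p1', 7, 1);
-- Stored counters deliberately left stale to show how they can drift.
INSERT INTO quota_usages VALUES ('p1', 'volumes', 3, 0),
                                ('p1', 'gigabytes', 22, 0);
""")


def counting_usage(conn, project_id):
    """Counting approach: derive usage from resource rows on every check."""
    count, gigabytes = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(size), 0) FROM volumes "
        "WHERE project_id = ? AND deleted = 0",
        (project_id,),
    ).fetchone()
    return {"volumes": count, "gigabytes": gigabytes}


def counters_usage(conn, project_id):
    """Counters approach: trust the in_use values stored in the DB."""
    rows = conn.execute(
        "SELECT resource, in_use FROM quota_usages WHERE project_id = ?",
        (project_id,),
    ).fetchall()
    return dict(rows)


print(counting_usage(conn, "p1"))
print(counters_usage(conn, "p1"))
```

In this toy data the stored counters disagree with the rows they describe, which is the kind of out-of-sync state the commit message refers to; counting avoids it at the cost of a query per check.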

Implements: blueprint count-resource-to-check-quota
Change-Id: Ic4c97e46553e94f57e3f8e22f7cd65d7464a34f8
whoami-rajat 2023-10-21 21:27:12 +00:00 committed by Rajat Dhasmana
parent 25e2145f0b
commit b55e37b662


@@ -75,25 +75,27 @@ and the chances of introducing new bugs.
 This quota system will support 2 different drivers:
-- ``Counters``: This will be similar to the old system, using counters in the
-DB, but instead of doing reservations and commits/rollbacks for every
-resource modification, it will only do reservations for the very few
-operations that really need it to keep track of the resources while the
+- ``StoredQuotaDriver``: This will be similar to the old system, using
+counters in the DB, but instead of doing reservations and commits/rollbacks
+for every resource modification, it will only do reservations for the very
+few operations that really need it to keep track of the resources while the
 operation is in progress.
-- ``Counting``: This driver will no longer store usage and resource tracking in
-the database table (``quota_usages``) and instead dynamically calculates each
-quota check based on the resources that exist in the database.
+- ``DynamicQuotaDriver``: This driver will no longer store usage and resource
+tracking in the database table (``quota_usages``) and instead dynamically
+calculates each quota check based on the resources that exist in the
+database.
 Calculations will be counting for resources (e.g. ``snapshots``) or sum of
 values for sizes (e.g. ``gigabytes``).
-Just like the ``Counters`` driver this will use reservations as little as
-possible.
+Just like the ``StoredQuotaDriver`` driver this will use reservations as
+little as possible.
 The reason for having 2 drivers instead of a single one is because there are
-trade-offs with each of the drivers, and the default will be the ``Counting``
-driver, for reasons explained later in the `Performance Impact`_ section.
+trade-offs with each of the drivers, and the default will be the
+``DynamicQuotaDriver`` driver, for reasons explained later in the
+`Performance Impact`_ section.
 The changes will try, as much as possible, to avoid over engineering the
 solution focusing on the 2 new drivers and current cinder and not solve all
@@ -140,10 +142,10 @@ except the ``per_volume_gigabytes``, since there cannot be any usage for it.
 Reservation values will be stored in the ``delta`` field of the
 ``reservations`` table just like they are today.
-For the ``Counting`` driver these values will be dynamically added, grouping by
-``resource`` for non deleted rows belonging to the specific project. On the
-other hand the ``Counters`` driver will track the sum in the ``reserved`` field
-of the ``quota_usages`` table.
+For the ``DynamicQuotaDriver`` these values will be dynamically added,
+grouping by ``resource`` for non deleted rows belonging to the specific
+project. On the other hand the ``StoredQuotaDriver`` will track the sum in
+the ``reserved`` field of the ``quota_usages`` table.
 Both drivers will adhere to the following rules when reporting in-use values:
@@ -228,7 +230,7 @@ does.
 The quota driver context manager starts a DB session/transaction in the
 provided ``context`` so the ``volume_create`` call will use that same session
 to create the volume record, and the transaction will be finalized when the
-code exists the context manager, thus ensuring that no other operations check
+code exits the context manager, thus ensuring that no other operations check
 the quota until the volume has been created.
 From a developer's point of view all this will be hidden, because at a higher
@@ -299,9 +301,9 @@ limits which provides:
 doesn't need to know about the ``no_snapshot_gb_quota`` configuration option.
 - If we want to add, in the future, snapshot specific quota limits -
-``snapshot_gigabytes`` and ``snapshot_gigabytes_<volume-ype>``- we'll be able
-to do so without affecting any of the Cinder code with the sole exception of
-the quota driver itself.
+``snapshot_gigabytes`` and ``snapshot_gigabytes_<volume-type>``- we'll be
+able to do so without affecting any of the Cinder code with the sole
+exception of the quota driver itself.
 Reservations
 ------------
@@ -310,11 +312,11 @@ For the new quota system the reservation commit and rollback operations will be
 grouped into a single context manager that handles both cases. Committing and
 rolling back reservations have different meanings for the 2 drivers.
-For the ``Counting`` driver these are *noop* operations, since checks use the
-DB values every time and the database has already been modified in the same
-transaction that the reservations are removed. On the other hand the
-``Counters`` driver needs to modify the ``in_use`` and ``reserved`` counters in
-the ``quota_usages`` table accordingly to the operation.
+For the ``DynamicQuotaDriver`` these are *noop* operations, since checks use
+the DB values every time and the database has already been modified in the
+same transaction that the reservations are removed. On the other hand the
+``StoredQuotaDriver`` needs to modify the ``in_use`` and ``reserved`` counters
+in the ``quota_usages`` table accordingly to the operation.
 As mentioned before, reservations will only be necessary for specific
 operations, to be exact on 3 operations: extend, transfer, and retype.
@@ -332,8 +334,8 @@ Each of these operations have different reasons for requiring reservations:
 - Transfer: Under normal circumstances accepting a transfer would not require
 the use of a reservation, as we should be able to check the quota and do the
 database changes to accept the transfer in the same transaction.
-Unfortunately the *SolidFire* driver needs to make some changes in its
-backend on transfer, so the volume service has to make a driver call.
+Unfortunately the *SolidFire* and *VMDK* drivers need to make some changes in
+their backend on transfer, so the volume service has to make a driver call.
 We cannot keep the database locked while the driver call completes, as it can
 take some time and we don't want to prevent the API from processing other
@@ -342,12 +344,12 @@ Each of these operations have different reasons for requiring reservations:
 That is why reservations will be created before calling the driver and
 cleared after accepting the resources.
-In terms of reservations, transfers are complex for the ``Counters`` driver,
-because when completing one it needs to modify 2 different projects. One to
-increase counters and the other to decrease them, so higher levels will need
-to make 2 different calls for 2 different projects, one with positive and one
-with negative numbers and negative numbers should ignore quota usage and
-limits.
+In terms of reservations, transfers are complex for the
+``StoredQuotaDriver``, because when completing one it needs to modify 2
+different projects. One to increase counters and the other to decrease them,
+so higher levels will need to make 2 different calls for 2 different
+projects, one with positive and one with negative numbers and negative
+numbers should ignore quota usage and limits.
 When storing reservations for transfer of volumes with snapshots they have
 to be stored separately in case someone restarts the service after changing
@@ -430,17 +432,17 @@ new quota system we have bigger problems, because it's not only
 ``no_snapshot_gb_quota`` that can be changed, but also ``quota_driver``, and
 changing the quota driver means that a quota system may need to recalculate
 things to ensure that it starts operating with the correct quota assumptions.
-For example when changing from the ``Counting`` driver to the ``Counter``
-driver all the counters in the DB will be wrong, so the ``Counter`` driver
-needs to calculate the counters before it can start working or the whole quota
-system will not operate correctly.
+For example when changing from the ``DynamicQuotaDriver`` to the
+``StoredQuotaDriver`` all the counters in the DB will be wrong, so the
+``StoredQuotaDriver`` needs to calculate the counters before it can start
+working or the whole quota system will not operate correctly.
 These configuration options are not the kind of things that are frequently
 changed, and we expect most deployments to never have to change them at all,
 but Cinder should still provide a way for them to be safely changed since one
 of the cases we expect to happen is a deployment outgrowing the usefulness of
-the ``Counting`` driver and running into performance issues. In that case they
-will want to switch to the ``Counter`` driver.
+the ``DynamicQuotaDriver`` and running into performance issues. In that case
+they will want to switch to the ``StoredQuotaDriver``.
 To support changing configuration option changes to the quota system there are
 3 things that the new quota system needs to be able to do:
@@ -504,9 +506,9 @@ Initialization method for the quota driver where the ``driver_switched``
 parameter indicates whether the last run was done using the same Quota driver
 or if a different one was used and this is the first run with this one.
-This is important because switching to the ``Counters`` driver from the
-``Counting`` driver means that ``in-use`` and ``reserved`` counters need to be
-recalculated since they could be out of sync or missing altogether.
+This is important because switching to the ``StoredQuotaDriver`` from the
+``DynamicQuotaDriver`` means that ``in-use`` and ``reserved`` counters need to
+be recalculated since they could be out of sync or missing altogether.
 This effort is going to focus on only supporting these 2 quota drivers and
 avoid unnecessary complexity, because if we wanted to support other kind of
@@ -519,8 +521,8 @@ when switching.
 The interface can be enhanced if a future quota driver finds it insufficient.
 The ``no_snapshot_gb_quota_toggled`` parameter indicates whether the option has
-changed since the last run. This is important for the ``Counters`` driver that
-would need to recalculate ``in-use`` and ``reserved`` counters. This is
+changed since the last run. This is important for the ``StoredQuotaDriver``
+that would need to recalculate ``in-use`` and ``reserved`` counters. This is
 something that doesn't work correctly right now.
 Drivers can block the Cinder database when synchronizing when the driver has
@@ -536,7 +538,7 @@ resync
 def resync(self, context, project_id):
-This is only relevant for the ``Counters`` driver, and is intended to allow
+This is only relevant for the ``StoredQuotaDriver``, and is intended to allow
 the ``cinder-manage`` command request a recalculation of quotas for a specific
 project or for the whole deployment.
@@ -789,7 +791,7 @@ group_free
 Context manager to free group quotas upon context exiting. The DB row soft
 deletion of groups will be enclosed by this call.
-This is only relevant for the ``Counters`` driver that needs to decrease its
+This is only relevant for the ``StoredQuotaDriver`` that needs to decrease its
 counters.
 backup_check_cm
@@ -850,7 +852,7 @@ backup_free
 Context manager to free backup quotas upon context exiting. The DB row soft
 deletion of the backup will be enclosed by this call.
-This is only relevant for the ``Counters`` driver that needs to decrease its
+This is only relevant for the ``StoredQuotaDriver`` that needs to decrease its
 counters.
 vol_snap_check_and_reserve_cm
@@ -928,7 +930,7 @@ reservations will pass the volume's ``uuid``.
 Both drivers must use different entries for volume and snapshot gigabyte
 reservations because the ``no_snapshot_gb_quota_toggled`` configuration option
 may be changed and the service restarted before a transfer is accepted, and the
-``Counters`` driver will need to make a decision both when recalculating (if
+``StoredQuotaDriver`` will need to make a decision both when recalculating (if
 driver has changed) and on transfer accept.
 This context manager must ensure that there are no race conditions with
@@ -936,8 +938,10 @@ concurrent calls to ``vol_snap_check_and_reserve_cm`` within different threads
 and processes in the node as well as across different nodes.
 For the database driver this can be achieved using a ``SELECT FOR UPDATE`` on
-the ``backups`` and ``backup_gigabytes`` quota limits which blocks other backup
-requests until the context manager exists.
+the ``volumes``, ``volumes_<volume-type>``, ``snapshots``,
+``snapshots_<volume-type>``, ``gigabytes`` and ``gigabytes_<volume_type>``
+quota limits which blocks other volume and snapshot requests until the context
+manager exists.
 Users of this context manager should try to keep the code within the context
 manager to a minimum to allow higher concurrency.
@@ -987,7 +991,7 @@ vol_snap_free
 Context manager to free volume and snapshot quotas upon context exiting.
-This is only relevant for the ``Counters`` driver that needs to decrease its
+This is only relevant for the ``StoredQuotaDriver`` that needs to decrease its
 counters.
 reservations_clean_cm
@@ -1004,14 +1008,14 @@ The ``uuid`` is the "primary" uuid of the operation and it won't be a different
 uuid for each resource that has been reserved. E.g. when accepting a volume
 transfer with its snapshots, all reservations will use the volume's id.
-For the ``Counting`` driver this is mostly just deleting the entries from the
-database, but for the ``Counters`` driver it needs to adjust the ``in-use`` and
-``reserved`` counters.
+For the ``DynamicQuotaDriver`` this is mostly just deleting the entries from
+the database, but for the ``StoredQuotaDriver`` it needs to adjust the
+``in-use`` and ``reserved`` counters.
 These counters may be from different projects, for the transfer of volumes, so
 the ``context``'s ``project_id`` will be ignored.
-The ``Counting`` driver must also take into account the
+The ``DynamicQuotaDriver`` driver must also take into account the
 ``no_snapshot_gb_quota_toggled`` configuration option when committing a
 transfer, because the snapshot reservations are stored in different row entries
 in case the option is changed and the service rebooted before a transfer is
@@ -1317,7 +1321,7 @@ Table ``global_data`` will have the following fields:
 ``no_snapshot_gb_quota`` or ``quota_driver``.
 - ``value``: String with the value of the key. For example ``true`` or
-``Counters``.
+``StoredQuotaDriver``.
 REST API impact
 ---------------
@@ -1367,22 +1371,22 @@ Performance Impact
 Some preliminary code was prototyped for the volume creation and get usage
 operations to evaluate the performance of the different quota drivers: the old,
-the new ``Counters``, and the new ``Counting``.
+the new ``StoredQuotaDriver``, and the new ``DynamicQuotaDriver``.
-The results showed that the new ``Counters`` system was twice as fast as the
-old code in both operations, and the ``Counting`` was slower than the
-``Counters`` driver, as expected, but faster than the old one until there are
-around 26000 resources per project.
+The results showed that the new ``StoredQuotaDriver`` system was twice as fast
+as the old code in both operations, and the ``DynamicQuotaDriver`` was slower
+than the ``StoredQuotaDriver``, as expected, but faster than the old one until
+there are around 26000 resources per project.
-So the ``Counting`` driver is less likely to be out of sync with reality
-because it doesn't store fixed values, but the ``Counters`` driver has better
+So the ``DynamicQuotaDriver`` is less likely to be out of sync with reality
+because it doesn't store fixed values, but the ``StoredQuotaDriver`` has better
 performance, and that's the reason why both drivers are going to be
 implemented, to allow system administrators decide which one is better for
 them.
-The default driver will be ``Counting`` to prioritize the usage values always
-stay in sync, and large deployments or those looking for best performance will
-be able to use the ``Counters`` quota driver.
+The default driver will be ``DynamicQuotaDriver`` to prioritize the usage
+values always stay in sync, and large deployments or those looking for best
+performance will be able to use the ``StoredQuotaDriver``.
 Deployments may even start with one quota system and then switch to the other
 if necessary.
@@ -1403,7 +1407,7 @@ Other deployer impact
 * New quota system will no longer have an internal brute force cleaning
 mechanism of quotas, the volume state change API will be used to clean
 reservations, and the ``cinder-manage quota sync`` command will be used for
-the ``Counters`` driver, so the following configuration options will be
+the ``StoredQuotaDriver``, so the following configuration options will be
 deprecated and will no longer have any effect:
 ``reservation_expire``, ``reservation_clean_interval``, ``until_refresh``,
 and ``max_age``.
@@ -1432,6 +1436,8 @@ Assignee(s)
 Primary assignee:
 Gorka Eguileor (geguileo)
+Other contributors:
+Rajat Dhasmana (whoami-rajat)
 Work Items
 ----------
@@ -1439,8 +1445,8 @@ Work Items
 As discussed in the PTG/mid-cycle this work may be split in 2 phases that may
 be implemented in different releases:
-Phase 1: ``Counting`` driver
-****************************
+Phase 1: ``DynamicQuotaDriver``
+*******************************
 - Deprecate configuration options and log warnings for deployments that are
 using custom quota drivers.
@@ -1454,7 +1460,7 @@ Phase 1: ``Counting`` driver
 - Remove deprecated ``consistencygroups`` resources from the ``quota_classes``,
 ``quotas``, ``quota_usages`` and ``reservations`` table.
-- Write the ``Counting`` database quota driver.
+- Write the ``DynamicQuotaDriver`` database quota driver.
 - Make the following operations use the new quota driver:
@@ -1476,19 +1482,19 @@ Phase 1: ``Counting`` driver
 - Make the ``cinder-manage quota sync`` and ``check`` be ``noop``.
-- Write the ``Counting`` database quota driver unit tests.
+- Write the ``DynamicQuotaDriver`` database quota driver unit tests.
 - Update existing unit tests.
 - Write initial documentation and mention that a more efficient driver will be
 coming in the future.
-Phase 2: ``Counters`` driver
-****************************
+Phase 2: ``StoredQuotaDriver``
+******************************
-- Write the ``Counters`` database quota driver.
+- Write the ``StoredQuotaDriver`` database quota driver.
-- Write the ``Counters`` database quota driver unit tests.
+- Write the ``StoredQuotaDriver`` database quota driver unit tests.
 - Update the ``cinder-manage quota sync`` and ``check`` commands.
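
The ``driver_switched`` and ``no_snapshot_gb_quota_toggled`` parameters touched by this change depend on remembering which options were in effect on the previous run, which the spec keeps in the ``global_data`` table as ``key``/``value`` strings. A minimal sketch of how such detection might work follows. The function name ``detect_config_changes``, the primary key on ``key``, and the choice to report "no change" on the very first run are all assumptions made for illustration, not something the spec states.

```python
import sqlite3


def detect_config_changes(conn, quota_driver, no_snapshot_gb_quota):
    """Return (driver_switched, no_snapshot_gb_quota_toggled).

    Hypothetical helper: compares the current options with the values
    recorded in ``global_data`` on the last run, then records the new ones.
    """
    current = {
        "quota_driver": quota_driver,
        "no_snapshot_gb_quota": str(no_snapshot_gb_quota).lower(),
    }
    stored = dict(conn.execute("SELECT key, value FROM global_data").fetchall())
    # Assumption: a missing key (first run) is treated as "not changed".
    changed = {
        key: key in stored and stored[key] != value
        for key, value in current.items()
    }
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO global_data (key, value) VALUES (?, ?)",
            list(current.items()),
        )
    return changed["quota_driver"], changed["no_snapshot_gb_quota"]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE global_data (key TEXT PRIMARY KEY, value TEXT)")

print(detect_config_changes(conn, "DynamicQuotaDriver", False))
print(detect_config_changes(conn, "StoredQuotaDriver", False))
```

A real driver would pass the two flags to its initialization method and, for the ``StoredQuotaDriver``, recalculate the ``in-use`` and ``reserved`` counters whenever either flag is set.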