RBD: Change rbd_exclusive_cinder_pool's default

In Cinder we always try to have sane defaults, but the current RBD
default for rbd_exclusive_cinder_pool may lead to issues on deployments
with a large number of volumes:

- Cinder taking a long time to start.
- Cinder becoming non-responsive.
- Cinder stats gathering taking longer than the gathering period.

This is caused by the driver making an independent request to get
detailed information on each image to accurately calculate the space
used by the Cinder volumes.

With this patch we change the default to make sure that these issues
don't happen in the most common deployment case (the exclusive Cinder
pool).

Related-Bug: #1704106
Change-Id: I839441a71238cdad540ba8d9d4d18b1f0fa3ee9d
Gorka Eguileor 2020-11-04 15:34:37 +01:00
parent 7175a23731
commit 4ba6664dee
3 changed files with 54 additions and 3 deletions


@ -102,13 +102,16 @@ RBD_OPTS = [
                     'dynamic value (used + current free) and to False to '
                     'report a static value (quota max bytes if defined and '
                     'global size of cluster if not).'),
-    cfg.BoolOpt('rbd_exclusive_cinder_pool', default=False,
-                help="Set to True if the pool is used exclusively by Cinder. "
+    cfg.BoolOpt('rbd_exclusive_cinder_pool', default=True,
+                help="Set to False if the pool is shared with other usages. "
                     "On exclusive use driver won't query images' provisioned "
                     "size as they will match the value calculated by the "
                     "Cinder core code for allocated_capacity_gb. This "
                     "reduces the load on the Ceph cluster as well as on the "
-                     "volume service."),
+                     "volume service. On non exclusive use driver will query "
+                     "the Ceph cluster for per image used disk, this is an "
+                     "intensive operation having an independent request for "
+                     "each image."),
     cfg.BoolOpt('enable_deferred_deletion', default=False,
                 help='Enable deferred deletion. Upon deletion, volumes are '
                      'tagged for deletion but will only be removed '
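
For operators, the practical effect of this hunk is that a backend section which never mentions the option is now treated as exclusive, while a shared pool has to opt out explicitly. The following is only a minimal sketch of that behaviour. It uses Python's standard `configparser` instead of oslo.config (which Cinder actually uses), and the section names `ceph-exclusive` and `ceph-shared` as well as the helper `is_exclusive_pool` are hypothetical.

```python
import configparser

# Default introduced by this commit (it was False before).
NEW_DEFAULT = True

# Hypothetical cinder.conf fragment; the backend section names are made up.
SAMPLE_CONF = """
[ceph-exclusive]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes

[ceph-shared]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = shared
rbd_exclusive_cinder_pool = false
"""


def is_exclusive_pool(conf, backend, default=NEW_DEFAULT):
    """Return the effective rbd_exclusive_cinder_pool value for a backend.

    Falls back to ``default`` when the option is not set in the section,
    which mimics how an option default applies to an untouched config.
    """
    return conf.getboolean(backend, "rbd_exclusive_cinder_pool",
                           fallback=default)


conf = configparser.ConfigParser()
conf.read_string(SAMPLE_CONF)

for backend in conf.sections():
    print(backend, is_exclusive_pool(conf, backend))
```

Passing `default=False` to the helper reproduces the pre-patch behaviour, which is a quick way to see which backends are affected by the upgrade.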


@ -81,6 +81,37 @@ Ceph exposes RADOS; you can access it through the following interfaces:
   Linux kernel and QEMU block devices that stripe
   data across multiple objects.
 
+RBD pool
+~~~~~~~~
+
+The RBD pool used by the Cinder backend is configured with option ``rbd_pool``,
+and by default the driver expects exclusive management access to that pool, as
+in being the only system creating and deleting resources in it, since that's
+the recommended deployment choice.
+
+Pool sharing is strongly discouraged, and if we were to share the pool with
+other services, within OpenStack (Nova, Glance, another Cinder backend) or
+outside of OpenStack (oVirt), then the stats returned by the driver to the
+scheduler would not be entirely accurate.
+
+The inaccuracy would be that the actual size in use by the Cinder volumes would
+be lower than the reported one, since the reported value would also include the
+space used by the other services.
+
+We can set the ``rbd_exclusive_cinder_pool`` configuration option to ``false``
+to fix this inaccuracy, but this has a performance impact.
+
+.. warning::
+
+   Setting ``rbd_exclusive_cinder_pool`` to ``false`` will increase the burden
+   on the Cinder driver and the Ceph cluster, since a request will be made for
+   each existing image, to retrieve its size, during the stats gathering
+   process.
+
+   For deployments with a large number of volumes it is recommended to leave
+   the default value of ``true``, and accept the inaccuracy, as it should not
+   be particularly problematic.
+
 Driver options
 ~~~~~~~~~~~~~~
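
The cost difference described in the warning can be illustrated with a toy model. This is a sketch under stated assumptions, not the RBD driver's code: `FakeImage`, `FakePool`, and `provisioned_gb` are hypothetical stand-ins, and the fake pool simply counts how many requests the stats calculation makes. With an exclusive pool the value tracked by Cinder (`allocated_capacity_gb`) is reused; with a shared pool every image is queried individually.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class FakeImage:
    """Hypothetical stand-in for an RBD image."""
    name: str
    provisioned_bytes: int


class FakePool:
    """Hypothetical stand-in for a Ceph pool that counts requests."""

    def __init__(self, images):
        self._images = {img.name: img for img in images}
        self.requests = 0

    def list_images(self):
        self.requests += 1
        return list(self._images)

    def image_size(self, name):
        # One independent request per image, as described in the warning.
        self.requests += 1
        return self._images[name].provisioned_bytes


def provisioned_gb(pool, allocated_capacity_gb, exclusive_pool=True):
    """Return the provisioned capacity in GiB for the stats report."""
    if exclusive_pool:
        # Exclusive pool: trust the value Cinder already tracks and make
        # no requests to the cluster.
        return allocated_capacity_gb
    # Shared pool: ask the cluster about every image in the pool.
    total = sum(pool.image_size(name) for name in pool.list_images())
    return total / GIB


pool = FakePool([FakeImage(f"volume-{i}", 10 * GIB) for i in range(1000)])
provisioned_gb(pool, allocated_capacity_gb=10000, exclusive_pool=False)
print("requests made for a shared pool:", pool.requests)
```

In this model the number of requests grows linearly with the number of images, which is why the stats gathering can outlast its period on large deployments.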


@ -0,0 +1,17 @@
+---
+upgrade:
+  - |
+    Ceph/RBD volume backends will now assume exclusive cinder pools, as if they
+    had ``rbd_exclusive_cinder_pool = true`` in their configuration.
+    This helps deployments with a large number of volumes and prevents issues
+    on deployments with a growing number of volumes, at the small cost of
+    slightly less accurate stats being reported to the scheduler.
+fixes:
+  - |
+    Ceph/RBD: Fix cinder taking a long time to start for Ceph/RBD backends.
+    (`Related-Bug #1704106 <https://bugs.launchpad.net/cinder/+bug/1704106>`_)
+  - |
+    Ceph/RBD: Fix Cinder becoming non-responsive and stats gathering taking
+    longer than its period. (`Related-Bug #1704106
+    <https://bugs.launchpad.net/cinder/+bug/1704106>`_)