RADOS Gateway: EC and Storage Classing

Add new appendix detailing how to configure the Ceph RADOS
Gateway to make use of Erasure Coding and automatic storage
device classing.

Change-Id: I85ff5deae37e2471434a499fd4d97d9b577386ce
James Page 2019-04-17 15:54:26 +01:00
parent a342b72134
commit 4e93294c9b
2 changed files with 174 additions and 0 deletions


@ -0,0 +1,173 @@
Appendix M: Ceph Erasure Coding and Device Classing
===================================================

Overview
++++++++

This appendix is intended as a post-deployment guide to re-configuring RADOS
Gateway pools to use erasure coding rather than replication. It also covers
the use of a specific device class (NVMe, SSD, or HDD) when creating the
erasure coding profile, as well as other configuration options that need to
be considered during deployment.

.. note::

   Existing data is preserved by following this process; however,
   reconfiguration should take place immediately after deployment to avoid
   prolonged copy-pool operations.

RADOS Gateway bucket weighting
++++++++++++++++++++++++++++++

The weighting of the various pools in a deployment drives the number of
placement groups (PGs) created to support each pool. In the ceph-radosgw
charm this is configured for the data bucket using:

.. code::

   juju config ceph-radosgw rgw-buckets-pool-weight=20

Note the default of 20%. If the deployment is a pure ceph-radosgw
deployment, this value should be increased to the expected percentage use of
storage. The device class also needs to be taken into account, but for
erasure coding this needs to be specified post-deployment via action
execution.
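
To see how this weighting translates into actual pool utilisation and
placement group counts, the standard Ceph CLI can be queried on a ceph-mon
unit; for example:

.. code::

   sudo ceph df
   sudo ceph osd pool ls detail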

Ceph automatic device classing
++++++++++++++++++++++++++++++

Newer versions of Ceph perform automatic classing of OSD devices. Each OSD
will be placed into the nvme, ssd, or hdd device class. These classes can
be used when creating erasure coding profiles or new CRUSH rules (see the
following sections).

The classes can be inspected using:

.. code::

   sudo ceph osd crush tree

   ID CLASS WEIGHT  TYPE NAME
   -1       8.18729 root default
   -5       2.72910     host node-laveran
    2  nvme 0.90970         osd.2
    5   ssd 0.90970         osd.5
    7   ssd 0.90970         osd.7
   -7       2.72910     host node-mees
    1  nvme 0.90970         osd.1
    6   ssd 0.90970         osd.6
    8   ssd 0.90970         osd.8
   -3       2.72910     host node-pytheas
    0  nvme 0.90970         osd.0
    3   ssd 0.90970         osd.3
    4   ssd 0.90970         osd.4
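
The device classes defined in the cluster can also be listed directly, along
with the OSDs assigned to a given class; for example:

.. code::

   sudo ceph osd crush class ls
   sudo ceph osd crush class ls-osd ssd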

Configuring erasure coding
++++++++++++++++++++++++++

The RADOS Gateway makes use of a number of pools, but the only pool
that should be converted to use erasure coding (EC) is the data pool:

.. code::

   default.rgw.buckets.data

All other pools should remain replicated, as they are by default.

To create a new EC profile and pool:

.. code::

   juju run-action --wait ceph-mon/0 create-erasure-profile \
       name=nvme-ec device-class=nvme

   juju run-action --wait ceph-mon/0 create-pool \
       name=default.rgw.buckets.data.new \
       pool-type=erasure \
       erasure-profile-name=nvme-ec \
       percent-data=90

The percent-data option should be set based on the type of deployment,
but if the RADOS Gateway is the only target for the NVMe storage class,
then 90% is appropriate (the other RADOS Gateway pools are tiny and use
between 0.10% and 3% of storage).

.. note::

   The create-erasure-profile action has a number of other
   options, including adjustment of the K/M values, which affect the
   computational overhead and the underlying storage consumed per MB stored.
   Sane defaults are provided, but they require a minimum of five hosts
   with block devices of the right class.
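
One way to confirm the K/M values and device class actually in use is to
inspect the resulting profile with the standard Ceph CLI on a ceph-mon unit;
for example:

.. code::

   sudo ceph osd erasure-code-profile ls
   sudo ceph osd erasure-code-profile get nvme-ec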

To avoid any creation or mutation of stored data during the migration,
shut down all RADOS Gateway instances:

.. code::

   juju run --application ceph-radosgw \
       "sudo systemctl stop ceph-radosgw.target"

The existing buckets.data pool can then be copied and switched:

.. code::

   juju run-action --wait ceph-mon/0 rename-pool \
       name=default.rgw.buckets.data \
       new-name=default.rgw.buckets.data.old

   juju run-action --wait ceph-mon/0 rename-pool \
       name=default.rgw.buckets.data.new \
       new-name=default.rgw.buckets.data
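
Before restarting the RADOS Gateway instances, it is worth confirming that
the pool now named default.rgw.buckets.data is the erasure coded one; for
example, on a ceph-mon unit:

.. code::

   sudo ceph osd pool ls detail | grep buckets.data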

At this point the RADOS Gateway instances can be restarted:

.. code::

   juju run --application ceph-radosgw \
       "sudo systemctl start ceph-radosgw.target"

Once successful operation of the deployment has been confirmed,
the old pool can be deleted:

.. code::

   juju run-action --wait ceph-mon/0 delete-pool \
       name=default.rgw.buckets.data.old
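
After the old pool has been removed, the remaining pools can be listed to
confirm that only the expected RADOS Gateway pools are present; for example,
on a ceph-mon unit:

.. code::

   sudo ceph osd pool ls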

Moving other RADOS Gateway pools to NVMe storage
++++++++++++++++++++++++++++++++++++++++++++++++

The buckets.data pool is the largest pool and the one that can make
use of EC; the other pools can also be migrated to the same storage
class for consistent performance:

.. code::

   juju run-action --wait ceph-mon/0 create-crush-rule \
       name=replicated_nvme device-class=nvme

The CRUSH rule for the other RADOS Gateway pools can then be updated:

.. code::

   pools=".rgw.root
   default.rgw.control
   default.rgw.data.root
   default.rgw.gc
   default.rgw.log
   default.rgw.intent-log
   default.rgw.meta
   default.rgw.usage
   default.rgw.users.keys
   default.rgw.users.uid
   default.rgw.buckets.extra
   default.rgw.buckets.index
   default.rgw.users.email
   default.rgw.users.swift"

   for pool in $pools; do
       juju run-action --wait ceph-mon/0 pool-set \
           name=$pool key=crush_rule value=replicated_nvme
   done
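
Whether the new CRUSH rule has been applied can be verified per pool; for
example, on a ceph-mon unit:

.. code::

   sudo ceph osd pool get default.rgw.log crush_rule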


@ -17,3 +17,4 @@ Appendices
app-rgw-multisite.rst
app-ceph-rbd-mirror.rst
app-masakari.rst
app-erasure-coding.rst