Remove storage pages

Remove the storage pages as they are being migrated to the charm-guide.
Add HTML redirects (and tests) as a consequence.

Depends-On: I933916e1a77e108ec255289e562149240ba09470
Change-Id: I20ffd8b97cfdac37ce83dc9fab35be236999e1e5

This commit is contained in:
parent e6c366f6bd
commit 70a2ebe80f
@@ -23,3 +23,10 @@ RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/nfv.html

RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/pci-passthrough.html$ /charm-guide/$1/admin/compute/pci-passthrough.html
RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/nova-cells.html$ /charm-guide/$1/admin/compute/nova-cells.html
RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/ironic.html$ /charm-guide/$1/admin/compute/ironic.html
RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/app-ceph-rbd-mirror.html$ /charm-guide/$1/admin/storage/ceph-rbd-mirror.html
RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/ceph-erasure-coding.html$ /charm-guide/$1/admin/storage/ceph-erasure-coding.html
RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/rgw-multisite.html$ /charm-guide/$1/admin/storage/ceph-rgw-multisite.html
RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/cinder-volume-replication.html$ /charm-guide/$1/admin/storage/cinder-replication.html
RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/encryption-at-rest.html$ /charm-guide/$1/admin/storage/encryption-at-rest.html
RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/manila-ganesha.html$ /charm-guide/$1/admin/storage/shared-filesystem-services.html
RedirectMatch 301 ^/project-deploy-guide/charm-deployment-guide/([^/]+)/swift.html$ /charm-guide/$1/admin/storage/swift.html
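One of the new rules can be smoke-tested once published (a sketch, assuming
the redirects are live on docs.openstack.org and that 'latest' is a valid
branch component of the URL) by checking the Location header that Apache
returns:

.. code-block:: none

   # A 301 response should point at the new charm-guide path,
   # here /charm-guide/latest/admin/storage/swift.html.
   curl -sI https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/swift.html | grep -i '^location'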
@@ -1,359 +0,0 @@

==================
Ceph RBD mirroring
==================

Overview
--------

RADOS Block Device (RBD) mirroring is a process of asynchronous replication
of Ceph block device images between two or more Ceph clusters. Mirroring
ensures point-in-time consistent replicas of all changes to an image,
including reads and writes, block device resizing, snapshots, clones, and
flattening. RBD mirroring is mainly used for disaster recovery (i.e. having
a secondary site as a failover). See the Ceph documentation on `RBD
mirroring`_ for complete information.

This guide will show how to deploy two Ceph clusters with RBD mirroring
between them with the use of the ceph-rbd-mirror charm. See the `charm's
documentation`_ for basic information and charm limitations.

RBD mirroring is only one aspect of datacentre redundancy. Refer to `Ceph
RADOS Gateway Multisite Replication`_ and other work to arrive at a complete
solution.

Performance considerations
--------------------------

RBD mirroring makes use of the journaling feature of Ceph. This incurs an
overhead for write activity on an RBD image that will adversely affect
performance.

For more information on performance aspects see Florian Haas' talk
`Geographical Redundancy with rbd-mirror`_ (video) given at Cephalocon
Barcelona 2019.
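Whether journaling is active on a given image can be checked directly with
the rbd CLI (a sketch, assuming a pool 'mypool' and an image 'myimage'
exist; the charms normally arrange journaling themselves and no manual step
is required):

.. code-block:: none

   # List the enabled features; mirrored images carry 'journaling'.
   juju ssh site-a-ceph-mon/0 sudo rbd -p mypool info myimage

   # Enable journaling manually on an image that lacks it.
   juju ssh site-a-ceph-mon/0 sudo rbd feature enable mypool/myimage journaling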
Requirements
------------

The two Ceph clusters will correspond to sites 'a' and 'b' and each cluster
will reside within a separate model (models 'site-a' and 'site-b'). The
deployment will require the use of `Cross model relations`_.

Deployment characteristics:

* each cluster will have 7 units:

  * 3 x ceph-osd
  * 3 x ceph-mon
  * 1 x ceph-rbd-mirror

* application names will be used to distinguish between the applications in
  each site (e.g. site-a-ceph-mon and site-b-ceph-mon)

* the ceph-osd units will use block device ``/dev/vdd`` for their OSD volumes

.. note::

   The two Ceph clusters can optionally be placed within the same model, and
   thus obviate the need for cross model relations. This topology is not
   generally considered to be a real world scenario.

Deployment
----------

For site 'a' the following configuration is placed into file ``site-a.yaml``:

.. code-block:: yaml

   site-a-ceph-mon:
     monitor-count: 3
     expected-osd-count: 3
     source: distro

   site-a-ceph-osd:
     osd-devices: /dev/vdd
     source: distro

   site-a-ceph-rbd-mirror:
     source: distro

Create the model and deploy the software for each site:

* Site 'a'

  .. code-block:: none

     juju add-model site-a
     juju deploy -n 3 --config site-a.yaml ceph-osd site-a-ceph-osd
     juju deploy -n 3 --config site-a.yaml ceph-mon site-a-ceph-mon
     juju deploy --config site-a.yaml ceph-rbd-mirror site-a-ceph-rbd-mirror

* Site 'b'

  An analogous configuration file is used (i.e. replace 'a' with 'b'):

  .. code-block:: none

     juju add-model site-b
     juju deploy -n 3 --config site-b.yaml ceph-osd site-b-ceph-osd
     juju deploy -n 3 --config site-b.yaml ceph-mon site-b-ceph-mon
     juju deploy --config site-b.yaml ceph-rbd-mirror site-b-ceph-rbd-mirror

Add two local relations for each site:

* Site 'a'

  .. code-block:: none

     juju add-relation -m site-a site-a-ceph-mon:osd site-a-ceph-osd:mon
     juju add-relation -m site-a site-a-ceph-mon:rbd-mirror site-a-ceph-rbd-mirror:ceph-local

* Site 'b'

  .. code-block:: none

     juju add-relation -m site-b site-b-ceph-mon:osd site-b-ceph-osd:mon
     juju add-relation -m site-b site-b-ceph-mon:rbd-mirror site-b-ceph-rbd-mirror:ceph-local

Export a ceph-rbd-mirror endpoint (by means of an "offer") for each site.
This will enable us to create the inter-site (cross model) relations:

* Site 'a'

  .. code-block:: none

     juju switch site-a
     juju offer site-a-ceph-rbd-mirror:ceph-remote

  Output:

  .. code-block:: console

     Application "site-a-ceph-rbd-mirror" endpoints [ceph-remote] available at "admin/site-a.site-a-ceph-rbd-mirror"

* Site 'b'

  .. code-block:: none

     juju switch site-b
     juju offer site-b-ceph-rbd-mirror:ceph-remote

  Output:

  .. code-block:: console

     Application "site-b-ceph-rbd-mirror" endpoints [ceph-remote] available at "admin/site-b.site-b-ceph-rbd-mirror"

Add the two inter-site relations by referring to the offer URLs (included in
the output above) as if they were applications in the local model:

.. code-block:: none

   juju add-relation -m site-a site-a-ceph-mon admin/site-b.site-b-ceph-rbd-mirror
   juju add-relation -m site-b site-b-ceph-mon admin/site-a.site-a-ceph-rbd-mirror
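The offers themselves can be listed at any time (a quick check, assuming the
current controller hosts both models):

.. code-block:: none

   # Each model should show its ceph-rbd-mirror offer and a connection count.
   juju offers -m site-a
   juju offers -m site-b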

Verify the output of :command:`juju status` for each model:

.. code-block:: none

   juju status -m site-a --relations

Output:

.. code-block:: console

   Model   Controller  Cloud/Region       Version  SLA          Timestamp
   site-a  prod-1      openstack/default  2.8.9    unsupported  16:00:39Z

   SAAS                    Status   Store        URL
   site-b-ceph-rbd-mirror  waiting  serverstack  admin/site-b.site-b-ceph-rbd-mirror

   App                     Version  Status   Scale  Charm            Store       Rev  OS      Notes
   site-a-ceph-mon         15.2.8   active       3  ceph-mon         jujucharms   53  ubuntu
   site-a-ceph-osd         15.2.8   active       3  ceph-osd         jujucharms  308  ubuntu
   site-a-ceph-rbd-mirror  15.2.8   waiting      1  ceph-rbd-mirror  jujucharms   15  ubuntu

   Unit                       Workload  Agent  Machine  Public address  Ports  Message
   site-a-ceph-mon/0          active    idle   0        10.5.0.4               Unit is ready and clustered
   site-a-ceph-mon/1          active    idle   1        10.5.0.14              Unit is ready and clustered
   site-a-ceph-mon/2*         active    idle   2        10.5.0.7               Unit is ready and clustered
   site-a-ceph-osd/0          active    idle   0        10.5.0.4               Unit is ready (1 OSD)
   site-a-ceph-osd/1          active    idle   1        10.5.0.14              Unit is ready (1 OSD)
   site-a-ceph-osd/2*         active    idle   2        10.5.0.7               Unit is ready (1 OSD)
   site-a-ceph-rbd-mirror/0*  waiting   idle   3        10.5.0.11              Waiting for pools to be created

   Machine  State    DNS        Inst id                               Series  AZ    Message
   0        started  10.5.0.4   4f3e4d94-5003-4998-ab30-11fc3c845a7a  focal   nova  ACTIVE
   1        started  10.5.0.14  7682822e-4469-41e1-b938-225c067f9f82  focal   nova  ACTIVE
   2        started  10.5.0.7   786e7d84-3f94-4cd6-9493-72026d629fcf  focal   nova  ACTIVE
   3        started  10.5.0.11  715c8738-e41e-4be2-8638-560206b2c434  focal   nova  ACTIVE

   Offer                   Application             Charm            Rev  Connected  Endpoint     Interface        Role
   site-a-ceph-rbd-mirror  site-a-ceph-rbd-mirror  ceph-rbd-mirror  15   1/1        ceph-remote  ceph-rbd-mirror  requirer

   Relation provider           Requirer                            Interface        Type     Message
   site-a-ceph-mon:mon         site-a-ceph-mon:mon                 ceph             peer
   site-a-ceph-mon:osd         site-a-ceph-osd:mon                 ceph-osd         regular
   site-a-ceph-mon:rbd-mirror  site-a-ceph-rbd-mirror:ceph-local   ceph-rbd-mirror  regular
   site-a-ceph-mon:rbd-mirror  site-b-ceph-rbd-mirror:ceph-remote  ceph-rbd-mirror  regular

.. code-block:: none

   juju status -m site-b --relations

Output:

.. code-block:: console

   Model   Controller  Cloud/Region       Version  SLA          Timestamp
   site-b  prod-1      openstack/default  2.8.9    unsupported  16:05:39Z

   SAAS                    Status   Store        URL
   site-a-ceph-rbd-mirror  waiting  serverstack  admin/site-a.site-a-ceph-rbd-mirror

   App                     Version  Status   Scale  Charm            Store       Rev  OS      Notes
   site-b-ceph-mon         15.2.8   active       3  ceph-mon         jujucharms   53  ubuntu
   site-b-ceph-osd         15.2.8   active       3  ceph-osd         jujucharms  308  ubuntu
   site-b-ceph-rbd-mirror  15.2.8   waiting      1  ceph-rbd-mirror  jujucharms   15  ubuntu

   Unit                       Workload  Agent  Machine  Public address  Ports  Message
   site-b-ceph-mon/0          active    idle   0        10.5.0.3               Unit is ready and clustered
   site-b-ceph-mon/1          active    idle   1        10.5.0.20              Unit is ready and clustered
   site-b-ceph-mon/2*         active    idle   2        10.5.0.8               Unit is ready and clustered
   site-b-ceph-osd/0          active    idle   0        10.5.0.3               Unit is ready (1 OSD)
   site-b-ceph-osd/1          active    idle   1        10.5.0.20              Unit is ready (1 OSD)
   site-b-ceph-osd/2*         active    idle   2        10.5.0.8               Unit is ready (1 OSD)
   site-b-ceph-rbd-mirror/0*  waiting   idle   3        10.5.0.12              Waiting for pools to be created

   Machine  State    DNS        Inst id                               Series  AZ    Message
   0        started  10.5.0.3   2caf61f7-8675-4cd9-a3c4-cc68a0cb3f2d  focal   nova  ACTIVE
   1        started  10.5.0.20  d1b3bd0b-1631-4bd3-abba-14a366b3d752  focal   nova  ACTIVE
   2        started  10.5.0.8   84eb5db2-d673-4d36-82b4-902463362704  focal   nova  ACTIVE
   3        started  10.5.0.12  c40e1247-7b7d-4b84-ab3a-8b72c22f096e  focal   nova  ACTIVE

   Offer                   Application             Charm            Rev  Connected  Endpoint     Interface        Role
   site-b-ceph-rbd-mirror  site-b-ceph-rbd-mirror  ceph-rbd-mirror  15   1/1        ceph-remote  ceph-rbd-mirror  requirer

   Relation provider           Requirer                            Interface        Type     Message
   site-b-ceph-mon:mon         site-b-ceph-mon:mon                 ceph             peer
   site-b-ceph-mon:osd         site-b-ceph-osd:mon                 ceph-osd         regular
   site-b-ceph-mon:rbd-mirror  site-a-ceph-rbd-mirror:ceph-remote  ceph-rbd-mirror  regular
   site-b-ceph-mon:rbd-mirror  site-b-ceph-rbd-mirror:ceph-local   ceph-rbd-mirror  regular

There are no Ceph pools created by default. The next section ('Pool
creation') provides guidance.
Pool creation
-------------

RBD pools can be created by either a supporting charm (through the Ceph
broker protocol) or manually by the operator:

#. A charm-created pool (e.g. by the glance or nova-compute charms) will
   automatically be detected and acted upon (i.e. a remote pool will be set
   up in the peer cluster).

#. A manually-created pool, whether done via the ceph-mon application or
   through Ceph directly, will require an action to be run on the
   ceph-rbd-mirror application leader in order for the remote pool to come
   online.

For example, to create a pool manually in site 'a' and have ceph-rbd-mirror
(of site 'a') initialise a pool in site 'b':

.. code-block:: none

   juju run-action --wait -m site-a site-a-ceph-mon/leader create-pool name=mypool app-name=rbd
   juju run-action --wait -m site-a site-a-ceph-rbd-mirror/leader refresh-pools

This can be verified by listing the pools in site 'b':

.. code-block:: none

   juju run-action --wait -m site-b site-b-ceph-mon/leader list-pools

.. note::

   Automatic peer-pool creation (for a charm-created pool) is based on the
   local pool being labelled with a Ceph 'rbd' tag. This Ceph-internal
   labelling occurs when the newly-created local pool is associated with the
   RBD application. This last feature is supported starting with Ceph
   Luminous (OpenStack Queens).
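In Ceph terms, associating a pool with the RBD application is an explicit
step. Creating an equivalent pool directly with the Ceph CLI would look
roughly like this (a sketch, assuming a pool named 'mypool'; the
``create-pool`` action above performs these steps for you):

.. code-block:: none

   # Create the pool, then label it with the 'rbd' application tag
   # so the peer cluster's ceph-rbd-mirror picks it up.
   juju ssh site-a-ceph-mon/0 sudo ceph osd pool create mypool
   juju ssh site-a-ceph-mon/0 sudo ceph osd pool application enable mypool rbd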
Failover and fallback
---------------------

To manage failover and fallback, the ``demote`` and ``promote`` actions are
applied to the ceph-rbd-mirror application leader.

For instance, to fail over from site 'a' to site 'b' the former is demoted
and the latter is promoted. The rest of the commands are status checks:

.. code-block:: none

   juju run-action --wait -m site-a site-a-ceph-rbd-mirror/leader status verbose=true
   juju run-action --wait -m site-b site-b-ceph-rbd-mirror/leader status verbose=true

   juju run-action --wait -m site-a site-a-ceph-rbd-mirror/leader demote

   juju run-action --wait -m site-a site-a-ceph-rbd-mirror/leader status verbose=true
   juju run-action --wait -m site-b site-b-ceph-rbd-mirror/leader status verbose=true

   juju run-action --wait -m site-b site-b-ceph-rbd-mirror/leader promote

To fall back to site 'a' the actions are reversed:

.. code-block:: none

   juju run-action --wait -m site-b site-b-ceph-rbd-mirror/leader demote
   juju run-action --wait -m site-a site-a-ceph-rbd-mirror/leader promote

.. note::

   With Ceph Luminous (and greater), the mirror status information may not
   be accurate. Specifically, the ``entries_behind_master`` counter may
   never get to '0' even though the image has been fully synchronised.
Recovering from abrupt shutdown
-------------------------------

It is possible that an abrupt shutdown and/or an interruption to
communication channels may lead to a "split-brain" condition. This may cause
the mirroring daemon in each cluster to claim to be the primary. In such
cases, the operator must make a call as to which daemon is correct.
Generally speaking, this means deciding which cluster has the most recent
data.

Elect a primary by applying the ``demote`` and ``promote`` actions to the
appropriate ceph-rbd-mirror leader. After doing so, the ``resync-pools``
action must be run on the secondary cluster leader. The ``promote`` action
may require a force option.

Here, we make site 'a' the primary by demoting site 'b' and promoting site
'a':

.. code-block:: none

   juju run-action --wait -m site-b site-b-ceph-rbd-mirror/leader demote
   juju run-action --wait -m site-a site-a-ceph-rbd-mirror/leader promote force=true

   juju run-action --wait -m site-a site-a-ceph-rbd-mirror/leader status verbose=true
   juju run-action --wait -m site-b site-b-ceph-rbd-mirror/leader status verbose=true

   juju run-action --wait -m site-b site-b-ceph-rbd-mirror/leader resync-pools i-really-mean-it=true

.. note::

   When using Ceph Luminous, the mirror state information will not be
   accurate after recovering from an unclean shutdown. Regardless of the
   output of the status information, you will be able to write to images
   after a forced promote.

.. LINKS
.. _charm's documentation: https://opendev.org/openstack/charm-ceph-rbd-mirror/src/branch/master/src/README.md
.. _Ceph RADOS Gateway Multisite Replication: https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/rgw-multisite.html
.. _RBD mirroring: https://docs.ceph.com/en/latest/rbd/rbd-mirroring
.. _Geographical Redundancy with rbd-mirror: https://youtu.be/ZifNGprBUTA
.. _Cross model relations: https://juju.is/docs/olm/cross-model-relations
@@ -1,203 +0,0 @@

===================
Ceph erasure coding
===================

Overview
--------

Ceph pools supporting applications within an OpenStack deployment are by
default configured as replicated pools, which means that every stored object
is copied to multiple hosts or zones to allow the pool to survive the loss
of an OSD.

Ceph also supports Erasure Coded pools, which can be used to save raw space
within the Ceph cluster. The following charms can be configured to use
Erasure Coded pools:

* `ceph-fs`_
* `ceph-radosgw`_
* `cinder-ceph`_
* `glance`_
* `nova-compute`_

.. warning::

   Enabling the use of Erasure Coded pools will affect the I/O performance
   of the pool and will incur additional CPU and memory overheads on the
   Ceph OSD nodes due to calculation of coding chunks during read and write
   operations and during recovery of data chunks from failed OSDs.

.. note::

   The mirroring of RBD images stored in Erasure Coded pools is not
   currently supported by the ceph-rbd-mirror charm due to limitations in
   the functionality of the Ceph rbd-mirror application.
Configuring charms for Erasure Coding
-------------------------------------

Charms that support Erasure Coded pools have a consistent set of
configuration options to enable and tune the Erasure Coding profile used to
configure the Erasure Coded pools created for each application.

Erasure Coding is enabled by setting the ``pool-type`` option to
'erasure-coded'.

Ceph supports multiple `Erasure Code`_ plugins. A plugin may provide support
for multiple Erasure Code techniques - for example the JErasure plugin
provides support for Cauchy and Reed-Solomon Vandermonde (and others).

For the default JErasure plugin, the K value defines the number of data
chunks that will be used for each object and the M value defines the number
of coding chunks generated for each object. The M value also defines the
number of hosts or zones that may be lost before the pool goes into a
degraded state.

K + M must always be less than or equal to the number of hosts or zones in
the deployment (depending on the configuration of
``customize-failure-domain``).

By default the JErasure plugin is used with K=1 and M=2. This does not
actually save any raw storage compared to a replicated pool with 3 replicas
(it exists to allow use on a three-node Ceph cluster), so most deployments
using Erasure Coded pools will need to tune the K and M values based on
either the number of hosts deployed or the number of zones in the deployment
(if the ``customize-failure-domain`` option is enabled on the ceph-osd and
ceph-mon charms).

In the example below, the Erasure Coded pool used by the glance application
will sustain the loss of two hosts or zones while only consuming 2TB instead
of 3TB of storage to store 1TB of data when compared to a replicated pool.
This configuration requires a minimum of 4 hosts (or zones).

.. code-block:: yaml

   glance:
     options:
       pool-type: erasure-coded
       ec-profile-k: 2
       ec-profile-m: 2
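Under the hood, options like these translate into a Ceph erasure code
profile plus an erasure coded data pool. Done by hand with the Ceph CLI, the
equivalent would look roughly like this (a sketch, assuming profile and pool
names of 'glance-profile' and 'glance-ec'; the charm generates its own
names):

.. code-block:: none

   # k=2/m=2 with the default jerasure plugin and technique,
   # spreading chunks across hosts.
   sudo ceph osd erasure-code-profile set glance-profile \
       plugin=jerasure technique=reed_sol_van k=2 m=2 crush-failure-domain=host

   # Create the erasure coded data pool from the profile.
   sudo ceph osd pool create glance-ec erasure glance-profile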

The full list of Erasure Coding configuration options is detailed below.
Full descriptions of each plugin and its configuration options can also be
found in the `Ceph Erasure Code`_ documentation for the Ceph project.

.. list-table:: Erasure Coding charm options
   :widths: 20 15 5 15 45
   :header-rows: 1

   * - Option
     - Charm
     - Type
     - Default Value
     - Description
   * - pool-type
     - all
     - string
     - replicated
     - Ceph pool type to use for storage - valid values are 'replicated' and 'erasure-coded'.
   * - ec-rbd-metadata-pool
     - glance, cinder-ceph, nova-compute
     - string
     -
     - Name of the metadata pool to be created (for RBD use-cases). If not defined a metadata pool name will be generated based on the name of the data pool used by the application. The metadata pool is always replicated (not erasure coded).
   * - metadata-pool
     - ceph-fs
     - string
     -
     - Name of the metadata pool to be created for the CephFS filesystem. If not defined a metadata pool name will be generated based on the name of the data pool used by the application. The metadata pool is always replicated (not erasure coded).
   * - ec-profile-name
     - all
     - string
     -
     - Name for the EC profile to be created for the EC pools. If not defined a profile name will be generated based on the name of the pool used by the application.
   * - ec-profile-k
     - all
     - int
     - 1
     - Number of data chunks that will be used for the EC data pool. K+M factors should never be greater than the number of available AZs for balancing.
   * - ec-profile-m
     - all
     - int
     - 2
     - Number of coding chunks that will be used for the EC data pool. K+M factors should never be greater than the number of available AZs for balancing.
   * - ec-profile-locality
     - all
     - int
     -
     - (lrc plugin - l) Group the coding and data chunks into sets of size l. For instance, for k=4 and m=2, when l=3 two groups of three are created. Each set can be recovered without reading chunks from another set. Note that using the lrc plugin does incur more raw storage usage than isa or jerasure in order to reduce the cost of recovery operations.
   * - ec-profile-crush-locality
     - all
     - string
     -
     - (lrc plugin) The type of the CRUSH bucket in which each set of chunks defined by l will be stored. For instance, if it is set to rack, each group of l chunks will be placed in a different rack. It is used to create a CRUSH rule step such as 'step choose rack'. If it is not set, no such grouping is done.
   * - ec-profile-durability-estimator
     - all
     - int
     -
     - (shec plugin - c) The number of parity chunks each of which includes each data chunk in its calculation range. The number is used as a durability estimator. For instance, if c=2, 2 OSDs can be down without losing data.
   * - ec-profile-helper-chunks
     - all
     - int
     -
     - (clay plugin - d) Number of OSDs requested to send data during recovery of a single chunk. d needs to be chosen such that k+1 <= d <= k+m-1. The larger the d, the better the savings.
   * - ec-profile-scalar-mds
     - all
     - string
     -
     - (clay plugin) Specifies the plugin that is used as a building block in the layered construction. It can be one of: jerasure, isa or shec.
   * - ec-profile-plugin
     - all
     - string
     - jerasure
     - EC plugin to use for this application's pool. These plugins are available: jerasure, lrc, isa, shec, clay.
   * - ec-profile-technique
     - all
     - string
     - reed_sol_van
     - EC profile technique used for this application's pool - will be validated based on the plugin configured via ec-profile-plugin. Supported techniques are 'reed_sol_van', 'reed_sol_r6_op', 'cauchy_orig', 'cauchy_good', 'liber8tion' for jerasure, 'reed_sol_van', 'cauchy' for isa and 'single', 'multiple' for shec.
   * - ec-profile-device-class
     - all
     - string
     -
     - Device class from CRUSH map to use for placement groups for erasure profile - valid values: ssd, hdd or nvme (or leave unset to not use a device class).
Ceph automatic device classing
------------------------------

Newer versions of Ceph perform automatic classing of OSD devices. Each OSD
will be placed into the 'nvme', 'ssd' or 'hdd' device class. These classes
can be used when enabling Erasure Coded pools.

Device classes can be inspected using:

.. code-block:: none

   sudo ceph osd crush tree

Output:

.. code-block:: console

   ID  CLASS WEIGHT  TYPE NAME
    -1       8.18729 root default
    -5       2.72910     host node-laveran
     2  nvme 0.90970         osd.2
     5   ssd 0.90970         osd.5
     7   ssd 0.90970         osd.7
    -7       2.72910     host node-mees
     1  nvme 0.90970         osd.1
     6   ssd 0.90970         osd.6
     8   ssd 0.90970         osd.8
    -3       2.72910     host node-pytheas
     0  nvme 0.90970         osd.0
     3   ssd 0.90970         osd.3
     4   ssd 0.90970         osd.4

The device class for an Erasure Coded pool can be configured in the
consuming charm using the ``ec-profile-device-class`` configuration option
(see the table above).

If this option is not provided devices of any class will be used.
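The profile that results from these options can be inspected on a monitor
node (a sketch, assuming a profile named 'glance-profile' as in the earlier
example; the charm-generated name will differ):

.. code-block:: none

   # List all erasure code profiles, then dump one, including any
   # crush-device-class that was set.
   sudo ceph osd erasure-code-profile ls
   sudo ceph osd erasure-code-profile get glance-profile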

.. LINKS
.. _Ceph Erasure Code: https://docs.ceph.com/docs/master/rados/operations/erasure-code/
.. _ceph-fs: https://jaas.ai/ceph-fs
.. _ceph-radosgw: https://jaas.ai/ceph-radosgw
.. _cinder-ceph: https://jaas.ai/cinder-ceph
.. _glance: https://jaas.ai/glance
.. _nova-compute: https://jaas.ai/nova-compute
.. _Erasure Code: https://en.wikipedia.org/wiki/Erasure_code
@@ -1,275 +0,0 @@

:orphan:

.. _cinder_volume_replication_dr:

=============================================
Cinder volume replication - Disaster recovery
=============================================

Overview
--------

This is the disaster recovery scenario of a Cinder volume replication
deployment. It should be read in conjunction with the :doc:`Cinder volume
replication <cinder-volume-replication>` page.

Scenario description
--------------------

Disaster recovery involves an uncontrolled failover to the secondary site.
Site-b takes over from a troubled site-a and becomes the de facto primary
site, which includes writes to its images. Control is passed back to site-a
once it is repaired.

.. warning::

   The charms support the underlying OpenStack services in their native
   ability to fail over and fail back. However, a significant degree of
   administrative care is still needed in order to ensure a successful
   recovery.

   For example,

   * primary volume images that are currently in use may experience
     difficulty during their demotion to secondary status

   * running VMs will lose connectivity to their volumes

   * subsequent image resyncs may not be straightforward

   Any work necessary to rectify data issues resulting from an uncontrolled
   failover is beyond the scope of the OpenStack charms and this document.

Simulation
----------

For the sake of understanding some of the rudimentary aspects involved in
disaster recovery a simulation is provided.

Preparation
~~~~~~~~~~~

Create the replicated data volume and confirm it is available:

.. code-block:: none

   openstack volume create --size 5 --type site-a-repl vol-site-a-repl-data
   openstack volume list

Simulate a failure in site-a by turning off all of its Ceph MON daemons:

.. code-block:: none

   juju ssh site-a-ceph-mon/0 sudo systemctl stop ceph-mon.target
   juju ssh site-a-ceph-mon/1 sudo systemctl stop ceph-mon.target
   juju ssh site-a-ceph-mon/2 sudo systemctl stop ceph-mon.target
Modify timeout and retry settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a Ceph cluster fails, communication between Cinder and the failed
cluster will be interrupted, and the RBD driver will accommodate this with
retries and timeouts.

To accelerate the failover mechanism, timeout and retry settings on the
cinder-ceph unit in site-a can be modified:

.. code-block:: none

   juju ssh cinder-ceph-a/0
   > sudo apt install -y crudini
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout 1
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connection_retries 1
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a rados_connection_interval 0
   > sudo crudini --set /etc/cinder/cinder.conf cinder-ceph-a replication_connect_timeout 1
   > sudo systemctl restart cinder-volume
   > exit

These configuration changes are only intended to be in effect during the
failover transition period. They should be reverted afterwards since the
default values are fine for normal operations.
Failover
~~~~~~~~

Perform the failover of site-a, confirm its cinder-volume host is disabled,
and that the volume remains available:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a
   cinder service-list
   openstack volume list

Confirm that the Cinder log file (``/var/log/cinder/cinder-volume.log``) on
unit ``cinder/0`` contains the successful failover message: ``Failed over to
replication target successfully.``.
Revert timeout and retry settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Revert the configuration changes made to the cinder-ceph backend:

.. code-block:: none

   juju ssh cinder-ceph-a/0
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connect_timeout
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connection_retries
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a rados_connection_interval
   > sudo crudini --del /etc/cinder/cinder.conf cinder-ceph-a replication_connect_timeout
   > sudo systemctl restart cinder-volume
   > exit

Write to the volume
~~~~~~~~~~~~~~~~~~~

Create a VM (named 'vm-with-data-volume'):

.. code-block:: none

   openstack server create --image focal-amd64 --flavor m1.tiny \
     --key-name mykey --network int_net vm-with-data-volume

   FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
   openstack server add floating ip vm-with-data-volume $FLOATING_IP

Attach the volume to the VM, write some data to it, and detach it:

.. code-block:: none

   openstack server add volume vm-with-data-volume vol-site-a-repl-data

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > sudo mkfs.ext4 /dev/vdc
   > mkdir data
   > sudo mount /dev/vdc data
   > sudo chown ubuntu: data
   > echo "This is a test." > data/test.txt
   > sync
   > sudo umount /dev/vdc
   > exit

   openstack server remove volume vm-with-data-volume vol-site-a-repl-data
Repair site-a
~~~~~~~~~~~~~

In the current example, site-a is repaired by starting the Ceph MON daemons:

.. code-block:: none

   juju ssh site-a-ceph-mon/0 sudo systemctl start ceph-mon.target
   juju ssh site-a-ceph-mon/1 sudo systemctl start ceph-mon.target
   juju ssh site-a-ceph-mon/2 sudo systemctl start ceph-mon.target

Confirm that the MON cluster is now healthy (it may take a while):

.. code-block:: none

   juju status site-a-ceph-mon

Output:

.. code-block:: console

   Unit                Workload  Agent  Machine  Public address  Ports  Message
   site-a-ceph-mon/0   active    idle   14       10.5.0.15              Unit is ready and clustered
   site-a-ceph-mon/1*  active    idle   15       10.5.0.31              Unit is ready and clustered
   site-a-ceph-mon/2   active    idle   16       10.5.0.11              Unit is ready and clustered

Image resync
~~~~~~~~~~~~

Putting site-a back online at this point will lead to two primary images for
each replicated volume. This is a split-brain condition that cannot be
resolved by the RBD mirror daemon. Hence, before failback is invoked each
replicated volume will need a resync of its images (site-b images are more
recent than the site-a images).

The image resync is a two-step process that is initiated on the
ceph-rbd-mirror unit in site-a.

Demote the site-a images with the ``demote`` action:

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 demote pools=cinder-ceph-a

Flag the site-a images for a resync with the ``resync-pools`` action. The
``pools`` argument should point to the corresponding site's pool, which by
default is the name of the cinder-ceph application for the site (here
'cinder-ceph-a'):

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 resync-pools i-really-mean-it=true pools=cinder-ceph-a

The Ceph RBD mirror daemon will perform the resync in the background.
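Its progress can also be followed from a site-a monitor with the native rbd
tooling (a sketch; 'cinder-ceph-a' is the pool being resynced):

.. code-block:: none

   # Per-image mirroring state and a pool-wide health summary.
   juju ssh site-a-ceph-mon/0 sudo rbd mirror pool status cinder-ceph-a --verbose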

Failback
~~~~~~~~

Prior to failback, confirm that the images of all replicated volumes in
site-a are fully synchronised. Perform a check with the ceph-rbd-mirror
charm's ``status`` action as per :ref:`RBD image status <rbd_image_status>`:

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

This will take a while.

The state and description for site-a images will transition to:

.. code-block:: console

   state: up+syncing
   description: bootstrapping, IMAGE_SYNC/CREATE_SYNC_POINT

The intermediate values will look like:

.. code-block:: console

   state: up+replaying
   description: replaying, {"bytes_per_second":110318.93,"entries_behind_primary":4712.....

The final values, as expected, will become:

.. code-block:: console

   state: up+replaying
   description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0.....

The failback of site-a can now proceed:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a --backend_id default

Confirm the original health of Cinder services (as per :ref:`Cinder service
list <cinder_service_list>`):

.. code-block:: none

   cinder service-list

Verification
~~~~~~~~~~~~

Re-attach the volume to the VM and verify that the secondary device contains
the expected data:

.. code-block:: none

   openstack server add volume vm-with-data-volume vol-site-a-repl-data

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > sudo mount /dev/vdc data
   > cat data/test.txt
   This is a test.

We can also check the status of the image as per :ref:`RBD image status
<rbd_image_status>` to verify that the primary indeed resides in site-a
again:

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

   volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
     global_id: 3a4aa755-c9ee-4319-8ba4-fc494d20d783
     state: up+stopped
     description: local image is primary
@@ -1,138 +0,0 @@

:orphan:

.. _cinder_volume_replication_custom_overlay:

========================================
Cinder volume replication custom overlay
========================================

The below bundle overlay is used in the instructions given on the
:doc:`Cinder volume replication <cinder-volume-replication>` page.

.. code-block:: yaml

   series: focal

   # Change these variables according to the local environment, 'osd-devices'
   # and 'data-port' in particular.
   variables:
     openstack-origin: &openstack-origin cloud:focal-victoria
     osd-devices: &osd-devices /dev/sdb /dev/vdb
     expected-osd-count: &expected-osd-count 3
     expected-mon-count: &expected-mon-count 3
     data-port: &data-port br-ex:ens7

   relations:
     - - cinder-ceph-a:storage-backend
       - cinder:storage-backend
     - - cinder-ceph-b:storage-backend
       - cinder:storage-backend

     - - site-a-ceph-osd:mon
       - site-a-ceph-mon:osd
     - - site-b-ceph-osd:mon
       - site-b-ceph-mon:osd

     - - site-a-ceph-mon:client
       - nova-compute:ceph
     - - site-b-ceph-mon:client
       - nova-compute:ceph

     - - site-a-ceph-mon:client
       - cinder-ceph-a:ceph
     - - site-b-ceph-mon:client
       - cinder-ceph-b:ceph

     - - nova-compute:ceph-access
       - cinder-ceph-a:ceph-access
     - - nova-compute:ceph-access
       - cinder-ceph-b:ceph-access

     - - site-a-ceph-mon:client
       - glance:ceph

     - - site-a-ceph-mon:rbd-mirror
       - site-a-ceph-rbd-mirror:ceph-local
     - - site-b-ceph-mon:rbd-mirror
       - site-b-ceph-rbd-mirror:ceph-local

     - - site-a-ceph-mon
       - site-b-ceph-rbd-mirror:ceph-remote
     - - site-b-ceph-mon
       - site-a-ceph-rbd-mirror:ceph-remote

     - - site-a-ceph-mon:client
       - cinder-ceph-b:ceph-replication-device
     - - site-b-ceph-mon:client
       - cinder-ceph-a:ceph-replication-device

   applications:

     # Prevent some applications in the main bundle from being deployed.
     ceph-radosgw:
     ceph-osd:
     ceph-mon:
     cinder-ceph:

     # Deploy ceph-osd applications with the appropriate names.
     site-a-ceph-osd:
       charm: cs:ceph-osd
       num_units: 3
       options:
         osd-devices: *osd-devices
         source: *openstack-origin

     site-b-ceph-osd:
       charm: cs:ceph-osd
       num_units: 3
       options:
         osd-devices: *osd-devices
         source: *openstack-origin

     # Deploy ceph-mon applications with the appropriate names.
     site-a-ceph-mon:
       charm: cs:ceph-mon
       num_units: 3
       options:
         expected-osd-count: *expected-osd-count
         monitor-count: *expected-mon-count
         source: *openstack-origin

     site-b-ceph-mon:
       charm: cs:ceph-mon
       num_units: 3
       options:
         expected-osd-count: *expected-osd-count
         monitor-count: *expected-mon-count
         source: *openstack-origin

     # Deploy cinder-ceph applications with the appropriate names.
     cinder-ceph-a:
       charm: cs:cinder-ceph
       num_units: 0
       options:
         rbd-mirroring-mode: image

     cinder-ceph-b:
       charm: cs:cinder-ceph
       num_units: 0
       options:
         rbd-mirroring-mode: image

     # Deploy ceph-rbd-mirror applications with the appropriate names.
     site-a-ceph-rbd-mirror:
       charm: cs:ceph-rbd-mirror
       num_units: 1
       options:
         source: *openstack-origin

     site-b-ceph-rbd-mirror:
       charm: cs:ceph-rbd-mirror
       num_units: 1
       options:
         source: *openstack-origin

     # Configure for the local environment.
     ovn-chassis:
       options:
         bridge-interface-mappings: *data-port
@@ -1,576 +0,0 @@

=========================
Cinder volume replication
=========================

Overview
--------

Cinder volume replication is a primary/secondary failover solution based on
two-way `Ceph RBD mirroring`_.

Deployment
----------

The cloud deployment in this document is based on the stable
`openstack-base`_ bundle in the `openstack-bundles`_ repository. The
necessary documentation is found in the `bundle README`_.

A custom overlay bundle (`cinder-volume-replication-overlay`_) is used to
extend the base cloud in order to implement volume replication.

.. note::

   The key elements for adding volume replication to Ceph RBD mirroring are
   the relation between cinder-ceph in one site and ceph-mon in the other
   (using the ``ceph-replication-device`` endpoint) and the cinder-ceph
   charm configuration option ``rbd-mirroring-mode=image``.
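Expressed as individual Juju commands rather than overlay stanzas, those two
elements would look roughly like this (a sketch using the application names
from the overlay; the overlay applies them for you at deploy time):

.. code-block:: none

   # Mirror on a per-image basis rather than per-pool.
   juju config cinder-ceph-a rbd-mirroring-mode=image

   # Give site-a's Cinder backend access to site-b's cluster as its
   # replication device.
   juju add-relation site-b-ceph-mon:client cinder-ceph-a:ceph-replication-device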

Cloud notes:

* The cloud used in these instructions is based on Ubuntu 20.04 LTS (Focal)
  and OpenStack Victoria. The openstack-base bundle may have been updated
  since.
* The two Ceph clusters are named 'site-a' and 'site-b' and are placed in
  the same Juju model.
* A site's pool is named after its corresponding cinder-ceph application
  (e.g. 'cinder-ceph-a' for site-a) and is mirrored to the other site. Each
  site will therefore have two pools: 'cinder-ceph-a' and 'cinder-ceph-b'.
* Glance is only backed by site-a.

To deploy:

.. code-block:: none

   juju deploy ./bundle.yaml --overlay ./cinder-volume-replication-overlay.yaml

Configuration and verification of the base cloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Configure the base cloud as per the referenced documentation.

Before proceeding, verify the base cloud by creating a VM and connecting to
it over SSH. See the main bundle's README for guidance.

.. important::

   A known issue affecting the interaction of the ceph-rbd-mirror charm and
   Ceph itself gives the impression of a fatal error. The symptom is
   messaging that appears in :command:`juju status` command output: ``Pools
   WARNING (1) OK (1) Images unknown (1)``. This remains a cosmetic issue,
   however. See bug `LP #1892201`_ for details.
Cinder volume types
-------------------

For each site, create replicated and non-replicated Cinder volume types. A
type is referenced at volume-creation time in order to specify whether the
volume is replicated (or not) and what pool it will reside in.

Type 'site-a-repl' denotes replication in site-a:

.. code-block:: none

   openstack volume type create site-a-repl \
     --property volume_backend_name=cinder-ceph-a \
     --property replication_enabled='<is> True'

Type 'site-a-local' denotes non-replication in site-a:

.. code-block:: none

   openstack volume type create site-a-local \
     --property volume_backend_name=cinder-ceph-a

Type 'site-b-repl' denotes replication in site-b:

.. code-block:: none

   openstack volume type create site-b-repl \
     --property volume_backend_name=cinder-ceph-b \
     --property replication_enabled='<is> True'

Type 'site-b-local' denotes non-replication in site-b:

.. code-block:: none

   openstack volume type create site-b-local \
     --property volume_backend_name=cinder-ceph-b

List the volume types:

.. code-block:: none

   openstack volume type list
   +--------------------------------------+--------------+-----------+
   | ID                                   | Name         | Is Public |
   +--------------------------------------+--------------+-----------+
   | ee70dfd9-7b97-407d-a860-868e0209b93b | site-b-local | True      |
   | b0f6d6b5-9c76-4967-9eb4-d488a6690712 | site-b-repl  | True      |
   | fc89ca9b-d75a-443e-9025-6710afdbfd5c | site-a-local | True      |
   | 780980dc-1357-4fbd-9714-e16a79df252a | site-a-repl  | True      |
   | d57df78d-ff27-4cf0-9959-0ada21ce86ad | __DEFAULT__  | True      |
   +--------------------------------------+--------------+-----------+

.. note::

   In this document, site-b volume types will not be used. They are created
   here for the more generalised case where new volumes may be needed while
   site-a is in a failover state. In such a circumstance, any volumes
   created in site-b will naturally not be replicated (in site-a).
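A type's properties can be double-checked before use (a quick verification
step; 'site-a-repl' is the replicated type created above):

.. code-block:: none

   # Should show volume_backend_name and replication_enabled.
   openstack volume type show site-a-repl -c properties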

.. _rbd_image_status:

RBD image status
----------------

The status of the two RBD images associated with a replicated volume can be
queried using the ``status`` action of the ceph-rbd-mirror unit for each
site.

A state of ``up+replaying`` in combination with the presence of
``"entries_behind_primary":0`` in the image description means the image in
one site is in sync with its counterpart in the other site.

A state of ``up+syncing`` indicates that the sync process is still underway.

A description of ``local image is primary`` means that the image is the
primary.

Consider the volume below that is created and given the volume type of
'site-a-repl'. Its primary will be in site-a and its non-primary (secondary)
will be in site-b:

.. code-block:: none

   openstack volume create --size 5 --type site-a-repl vol-site-a-repl

Their statuses can be queried in each site as shown:

Site a (primary),

.. code-block:: none

   juju run-action --wait site-a-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
   volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
     global_id: f66140a6-0c09-478c-9431-4eb1eb16ca86
     state: up+stopped
     description: local image is primary

Site b (secondary is in sync with the primary),

.. code-block:: none

   juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
   volume-c44d4d20-6ede-422a-903d-588d1b0d51b0:
     global_id: f66140a6-0c09-478c-9431-4eb1eb16ca86
     state: up+replaying
     description: replaying, {"bytes_per_second":0.0,"entries_behind_primary":0,.....

.. _cinder_service_list:

Cinder service list
-------------------

To verify the state of Cinder services the ``cinder service-list`` command
is used:

.. code-block:: none

   cinder service-list
   +------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
   | Binary           | Host                 | Zone | Status  | State | Updated_at                 | Cluster | Disabled Reason | Backend State |
   +------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+
   | cinder-scheduler | cinder               | nova | enabled | up    | 2021-04-08T15:59:25.000000 | -       | -               |               |
   | cinder-volume    | cinder@cinder-ceph-a | nova | enabled | up    | 2021-04-08T15:59:24.000000 | -       | -               | up            |
   | cinder-volume    | cinder@cinder-ceph-b | nova | enabled | up    | 2021-04-08T15:59:25.000000 | -       | -               | up            |
   +------------------+----------------------+------+---------+-------+----------------------------+---------+-----------------+---------------+

Each of the below examples ends with a failback to site-a. The above output
is the desired result.

The failover of a particular site entails the referencing of its
corresponding cinder-volume service host (e.g. ``cinder@cinder-ceph-a`` for
site-a). We'll see how to do this later on.

.. note::

   'cinder-ceph-a' and 'cinder-ceph-b' correspond to the two applications
   deployed via the `cinder-ceph`_ charm. The express purpose of this charm
   is to connect Cinder to a Ceph cluster. See the
   `cinder-volume-replication-overlay`_ bundle for details.
Failover, volumes, images, and pools
------------------------------------

This section will show the basics of failover/failback, non-replicated vs
replicated volumes, and what pools are used for the volume images.

In site-a, create one non-replicated and one replicated data volume and list
them:

.. code-block:: none

   openstack volume create --size 5 --type site-a-local vol-site-a-local
   openstack volume create --size 5 --type site-a-repl vol-site-a-repl

   openstack volume list
   +--------------------------------------+------------------+-----------+------+-------------+
   | ID                                   | Name             | Status    | Size | Attached to |
   +--------------------------------------+------------------+-----------+------+-------------+
   | fba13395-62d1-468e-9b9a-40bebd0373e8 | vol-site-a-local | available |    5 |             |
   | c21a539e-d524-4f4d-991b-9b9476d4f930 | vol-site-a-repl  | available |    5 |             |
   +--------------------------------------+------------------+-----------+------+-------------+

Pools and images
~~~~~~~~~~~~~~~~

For 'vol-site-a-local' there should be one image in the 'cinder-ceph-a' pool
of site-a.

For 'vol-site-a-repl' there should be two images: one in the 'cinder-ceph-a'
pool of site-a and one in the 'cinder-ceph-a' pool of site-b.

This can all be confirmed by querying a Ceph MON in each site:

.. code-block:: none

   juju ssh site-a-ceph-mon/0 sudo rbd ls -p cinder-ceph-a

   volume-fba13395-62d1-468e-9b9a-40bebd0373e8
   volume-c21a539e-d524-4f4d-991b-9b9476d4f930

   juju ssh site-b-ceph-mon/0 sudo rbd ls -p cinder-ceph-a

   volume-c21a539e-d524-4f4d-991b-9b9476d4f930
Failover
~~~~~~~~

Perform the failover of site-a:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a

Wait until the failover is complete:

.. code-block:: none

   cinder service-list
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | Binary           | Host                 | Zone | Status   | State | Updated_at                 | Cluster | Disabled Reason | Backend State |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | cinder-scheduler | cinder               | nova | enabled  | up    | 2021-04-08T17:11:56.000000 | -       | -               |               |
   | cinder-volume    | cinder@cinder-ceph-a | nova | disabled | up    | 2021-04-08T17:11:56.000000 | -       | failed-over     | -             |
   | cinder-volume    | cinder@cinder-ceph-b | nova | enabled  | up    | 2021-04-08T17:11:56.000000 | -       | -               | up            |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+

A failover triggers the promotion of one site and the demotion of the other
(site-b and site-a respectively in this example). It is therefore best
performed while Cinder can still communicate with both Ceph clusters, as it
can in this example.

Inspection
~~~~~~~~~~

By consulting the volume list we see that the replicated volume is still
available but that the non-replicated volume has errored:

.. code-block:: none

   openstack volume list
   +--------------------------------------+------------------+-----------+------+-------------+
   | ID                                   | Name             | Status    | Size | Attached to |
   +--------------------------------------+------------------+-----------+------+-------------+
   | fba13395-62d1-468e-9b9a-40bebd0373e8 | vol-site-a-local | error     |    5 |             |
   | c21a539e-d524-4f4d-991b-9b9476d4f930 | vol-site-a-repl  | available |    5 |             |
   +--------------------------------------+------------------+-----------+------+-------------+

Generally a failover indicates a significant degree of non-confidence in the
primary site, site-a in this case. Once a **local** volume goes into an
error state due to a failover it is expected to not recover after failback.
The errored local volumes should normally be discarded (deleted).
Failback
~~~~~~~~

Failback site-a and confirm the original health of Cinder services (as per
`Cinder service list`_):

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a --backend_id default
   cinder service-list

Examples
--------

The following two examples will be considered. They will both use
replication and involve the failing over of site-a to site-b:

#. `Data volume used by a VM`_
#. `Bootable volume used by a VM`_
Data volume used by a VM
~~~~~~~~~~~~~~~~~~~~~~~~

In this example, a replicated data volume will be created in site-a and
attached to a VM. The volume's block device will then have some test data
written to it. This will allow for verification of the replicated data once
failover has occurred and the volume is re-attached to the VM.

Preparation
^^^^^^^^^^^

Create the replicated data volume:

.. code-block:: none

   openstack volume create --size 5 --type site-a-repl vol-site-a-repl-data

   openstack volume list
   +--------------------------------------+----------------------+-----------+------+-------------+
   | ID                                   | Name                 | Status    | Size | Attached to |
   +--------------------------------------+----------------------+-----------+------+-------------+
   | f23732c1-3257-4e58-a214-085c460abf56 | vol-site-a-repl-data | available |    5 |             |
   +--------------------------------------+----------------------+-----------+------+-------------+

Create the VM (named 'vm-with-data-volume'):

.. code-block:: none

   openstack server create --image focal-amd64 --flavor m1.tiny \
     --key-name mykey --network int_net vm-with-data-volume

   FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
   openstack server add floating ip vm-with-data-volume $FLOATING_IP

   openstack server list
   +--------------------------------------+---------------------+--------+---------------------------------+-------------+---------+
   | ID                                   | Name                | Status | Networks                        | Image       | Flavor  |
   +--------------------------------------+---------------------+--------+---------------------------------+-------------+---------+
   | fbe07fea-731e-4973-8455-c8466be72293 | vm-with-data-volume | ACTIVE | int_net=192.168.0.38, 10.5.1.28 | focal-amd64 | m1.tiny |
   +--------------------------------------+---------------------+--------+---------------------------------+-------------+---------+

Attach the data volume to the VM:

.. code-block:: none

   openstack server add volume vm-with-data-volume vol-site-a-repl-data

Prepare the block device and write the test data to it:

.. code-block:: none

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > sudo mkfs.ext4 /dev/vdc
   > mkdir data
   > sudo mount /dev/vdc data
   > sudo chown ubuntu: data
   > echo "This is a test." > data/test.txt
   > sync
   > exit
Failover
|
||||
^^^^^^^^
|
||||
|
||||
When both sites are online, as is here, it is not recommended to perform a
|
||||
failover when volumes are in use. This is because Cinder will try to demote the
|
||||
Ceph image from the primary site, and if there is an active connection to it
|
||||
the operation may fail (i.e. the volume will transition to an error state).
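
As a quick sanity check before proceeding, the volume's status and attachment
state can be inspected with standard commands (output not shown):

.. code-block:: none

   openstack volume show vol-site-a-repl-data -c status -c attachments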
|
||||
|
||||
Here we ensure the volume is not in use by unmounting the block device and
|
||||
removing it from the VM:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP sudo umount /dev/vdc
|
||||
openstack server remove vm-with-data-volume vol-site-a-repl-data
|
||||
|
||||
Prior to failover the images of all replicated volumes must be fully
|
||||
synchronised. Perform a check with the ceph-rbd-mirror charm's ``status``
|
||||
action as per `RBD image status`_. If the volumes were created in site-a then
|
||||
the ceph-rbd-mirror unit in site-b is the target:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-
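
Fully synchronised images are reported in the ``up+replaying`` state. The
snippet below is illustrative only (exact fields vary by Ceph release):

.. code-block:: none

   volume-f23732c1-3257-4e58-a214-085c460abf56:
     state: up+replaying
     description: replaying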

If all images look good, perform the failover of site-a:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a

   cinder service-list
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | Binary           | Host                 | Zone | Status   | State | Updated_at                 | Cluster | Disabled Reason | Backend State |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | cinder-scheduler | cinder               | nova | enabled  | up    | 2021-04-08T19:30:29.000000 | -       | -               |               |
   | cinder-volume    | cinder@cinder-ceph-a | nova | disabled | up    | 2021-04-08T19:30:28.000000 | -       | failed-over     | -             |
   | cinder-volume    | cinder@cinder-ceph-b | nova | enabled  | up    | 2021-04-08T19:30:28.000000 | -       | -               | up            |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+

Verification
^^^^^^^^^^^^

Re-attach the volume to the VM:

.. code-block:: none

   openstack server add volume vm-with-data-volume vol-site-a-repl-data

Verify that the secondary device contains the expected data:

.. code-block:: none

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > sudo mount /dev/vdc data
   > cat data/test.txt
   This is a test.

Failback
^^^^^^^^

Fail back site-a and confirm the original health of the Cinder services (as
per `Cinder service list`_):

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a --backend_id default
   cinder service-list

Bootable volume used by a VM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this example, a bootable volume will be created in site-a and a
newly-created VM will use that volume as its root device. As in the previous
example, the volume's block device will have test data written to it for
verification purposes.

Preparation
^^^^^^^^^^^

Create the replicated bootable volume:

.. code-block:: none

   openstack volume create --size 5 --type site-a-repl --image focal-amd64 --bootable vol-site-a-repl-boot

Wait for the volume to become available (it may take a while):

.. code-block:: none

   openstack volume list
   +--------------------------------------+----------------------+-----------+------+-------------+
   | ID                                   | Name                 | Status    | Size | Attached to |
   +--------------------------------------+----------------------+-----------+------+-------------+
   | c44d4d20-6ede-422a-903d-588d1b0d51b0 | vol-site-a-repl-boot | available | 5    |             |
   +--------------------------------------+----------------------+-----------+------+-------------+

Create a VM (named 'vm-with-boot-volume') by specifying the newly-created
bootable volume:

.. code-block:: none

   openstack server create --volume vol-site-a-repl-boot --flavor m1.tiny \
      --key-name mykey --network int_net vm-with-boot-volume

   FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
   openstack server add floating ip vm-with-boot-volume $FLOATING_IP

   openstack server list
   +--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+
   | ID                                   | Name                | Status | Networks                        | Image                    | Flavor  |
   +--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+
   | c0a152d7-376b-4500-95d4-7c768a3ff280 | vm-with-boot-volume | ACTIVE | int_net=192.168.0.75, 10.5.1.53 | N/A (booted from volume) | m1.tiny |
   +--------------------------------------+---------------------+--------+---------------------------------+--------------------------+---------+

Write the test data to the block device:

.. code-block:: none

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > echo "This is a test." > test.txt
   > sync
   > exit

Failover
^^^^^^^^

As explained previously, when both sites are functional the replicated volume
should not be in use prior to failover. Since testing the replicated boot
volume requires the VM to be rebuilt anyway (Cinder needs to give the updated
Ceph connection credentials to Nova), the easiest way forward is to simply
delete the VM:

.. code-block:: none

   openstack server delete vm-with-boot-volume

As before, prior to failover, confirm that the images of all replicated
volumes in site-b are fully synchronised. Perform a check with the
ceph-rbd-mirror charm's ``status`` action as per `RBD image status`_:

.. code-block:: none

   juju run-action --wait site-b-ceph-rbd-mirror/0 status verbose=true | grep -A3 volume-

If all images look good, perform the failover of site-a:

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a

   cinder service-list
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | Binary           | Host                 | Zone | Status   | State | Updated_at                 | Cluster | Disabled Reason | Backend State |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+
   | cinder-scheduler | cinder               | nova | enabled  | up    | 2021-04-08T21:29:12.000000 | -       | -               |               |
   | cinder-volume    | cinder@cinder-ceph-a | nova | disabled | up    | 2021-04-08T21:29:12.000000 | -       | failed-over     | -             |
   | cinder-volume    | cinder@cinder-ceph-b | nova | enabled  | up    | 2021-04-08T21:29:11.000000 | -       | -               | up            |
   +------------------+----------------------+------+----------+-------+----------------------------+---------+-----------------+---------------+

Verification
^^^^^^^^^^^^

Re-create the VM:

.. code-block:: none

   openstack server create --volume vol-site-a-repl-boot --flavor m1.tiny \
      --key-name mykey --network int_net vm-with-boot-volume

   FLOATING_IP=$(openstack floating ip create -f value -c floating_ip_address ext_net)
   openstack server add floating ip vm-with-boot-volume $FLOATING_IP

Verify that the root device contains the expected data:

.. code-block:: none

   ssh -i ~/cloud-keys/mykey ubuntu@$FLOATING_IP
   > cat test.txt
   This is a test.
   > exit

Failback
^^^^^^^^

Fail back site-a and confirm the original health of the Cinder services (as
per `Cinder service list`_):

.. code-block:: none

   cinder failover-host cinder@cinder-ceph-a --backend_id default
   cinder service-list

Disaster recovery
-----------------

An uncontrolled failover is known as the disaster recovery scenario. It is
characterised by the sudden failure of the primary Ceph cluster. See the
:ref:`Cinder volume replication - Disaster recovery
<cinder_volume_replication_dr>` page for more information.

.. LINKS
.. _Ceph RBD mirroring: app-ceph-rbd-mirror.html
.. _openstack-base: https://github.com/openstack-charmers/openstack-bundles/blob/master/stable/openstack-base/bundle.yaml
.. _openstack-bundles: https://github.com/openstack-charmers/openstack-bundles/
.. _bundle README: https://github.com/openstack-charmers/openstack-bundles/blob/master/stable/openstack-base/README.md
.. _cinder-volume-replication-overlay: cinder-volume-replication-overlay.html
.. _cinder-ceph: https://jaas.ai/cinder-ceph
.. _LP #1892201: https://bugs.launchpad.net/charm-ceph-rbd-mirror/+bug/1892201

@ -1,74 +0,0 @@
==================
Encryption at Rest
==================

Overview
++++++++

As of the 18.05 release, the OpenStack charms support encryption of data in
three key areas: local ephemeral instance storage for Nova instances, Ceph
OSD block devices, and Swift storage block devices.

The objective of this feature is to mitigate the risk of data compromise in
the event that disks or full servers are removed from data centre
deployments.

Encryption of underlying block devices is performed using dm-crypt with LUKS;
key management is provided by Vault, which provides secure encrypted storage
of the keys used for each block device, with automatic sealing of secrets in
the event of reboot/restart of services.

Vault
+++++

See the `vault charm`_.

Enabling Encryption
+++++++++++++++++++

Encryption is enabled via configuration options on the nova-compute,
swift-storage and ceph-osd charms, plus a relation to the vault application:

.. code:: bash

   juju config swift-storage encrypt=true
   juju config nova-compute encrypt=true ephemeral-device=/dev/bcache2
   juju config ceph-osd osd-encrypt=true osd-encrypt-keymanager=vault
   juju add-relation swift-storage:secrets-storage vault:secrets
   juju add-relation nova-compute:secrets-storage vault:secrets
   juju add-relation ceph-osd:secrets-storage vault:secrets

.. note::

   Encryption is only enabled during the initial preparation of the
   underlying block devices by the charms; enabling these options post
   deployment will not enable encryption on existing in-use devices. As a
   result it is best to enable these options as part of an overlay bundle
   during initial deployment.
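
As an illustration, a minimal overlay fragment that enables OSD encryption at
deploy time might look like the following (application names are assumed to
match those in the target bundle):

.. code-block:: yaml

   applications:
     ceph-osd:
       options:
         osd-encrypt: true
         osd-encrypt-keymanager: vault
   relations:
   - - ceph-osd:secrets-storage
     - vault:secrets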

Security Design Notes
+++++++++++++++++++++

Consuming application units access Vault using a Vault AppRole and an
associated policy specific to each machine in the deployment. The AppRole
enforces use of a secret id, and access is only permitted from the configured
network address of the consuming unit. The associated policy only allows the
consuming unit to store and retrieve secrets from a specific secrets back-end
under a specific sub-path (in this case the hostname of the unit).

The secret id for the AppRole is retrieved out-of-band from Juju by the
consuming charm: a one-shot retrieval token, specific to each unit, is
provided over the relation from vault to each consuming application and can
be used to retrieve the actual secret id. The token also has a limited TTL
(2 hours) and the call must originate from the configured network address of
the consuming unit. The secret id is only ever visible to the consuming unit
and vault itself, providing an additional layer of protection for
deployments.

LUKS encryption keys are never stored on local disk; vaultlocker is used to
encrypt and store each key in vault, and to retrieve the key and open the
encrypted block devices during boot. Keys are only ever held in memory.
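
As a generic (non charm-specific) way to confirm that a block device on a
storage host is encrypted, the device-mapper and LUKS metadata can be
inspected directly:

.. code:: bash

   # List device-mapper devices using the dm-crypt target.
   sudo dmsetup table --target crypt

   # Inspect the LUKS header of the underlying device (assuming /dev/sdb).
   sudo cryptsetup luksDump /dev/sdb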

.. LINKS
.. _vault charm: https://jaas.ai/vault/

@ -49,18 +49,6 @@ OpenStack Charms usage. To help improve it you can `file an issue`_ or
   app-octavia
   configure-bridge

.. toctree::
   :caption: Storage
   :maxdepth: 1

   encryption-at-rest
   ceph-erasure-coding
   rgw-multisite
   app-ceph-rbd-mirror
   cinder-volume-replication
   manila-ganesha
   swift

.. LINKS
.. _file an issue: https://bugs.launchpad.net/charm-deployment-guide/+filebug
.. _submit a contribution: https://opendev.org/openstack/charm-deployment-guide

@ -1,177 +0,0 @@
==========================
Shared filesystem services
==========================

Overview
--------

As of the 20.02 OpenStack Charms release, with OpenStack Rocky or later,
support for integrating Manila with CephFS to provide shared filesystems is
available.

Three new charms are needed to deploy this solution: 'manila',
'manila-ganesha', and 'ceph-fs'. The 'manila' charm provides the Manila API
service to the OpenStack deployment, the 'ceph-fs' charm provides the Ceph
services required to provide CephFS, and the 'manila-ganesha' charm
integrates these two via a Manila-managed NFS gateway (Ganesha) to provide
access-controlled NFS mounts to OpenStack instances.

Deployment
----------

.. warning::

   Throughout this guide make sure ``openstack-origin`` matches the value you
   used when `deploying OpenStack`_.

One way to add Manila Ganesha is to do so during the bundle deployment of a
new OpenStack cloud. This is done by means of a bundle overlay, such as
`manila-ganesha-overlay.yaml`:

.. code-block:: yaml

   machines:
     '0':
       series: bionic
     '1':
       series: bionic
     '2':
       series: bionic
     '3':
       series: bionic
   relations:
   - - manila:ha
     - manila-hacluster:ha
   - - manila-ganesha:ha
     - manila-ganesha-hacluster:ha
   - - ceph-mon:mds
     - ceph-fs:ceph-mds
   - - ceph-mon:client
     - manila-ganesha:ceph
   - - manila-ganesha:shared-db
     - percona-cluster:shared-db
   - - manila-ganesha:amqp
     - rabbitmq-server:amqp
   - - manila-ganesha:identity-service
     - keystone:identity-credentials
   - - manila:remote-manila-plugin
     - manila-ganesha:manila-plugin
   - - manila:amqp
     - rabbitmq-server:amqp
   - - manila:identity-service
     - keystone:identity-service
   - - manila:shared-db
     - percona-cluster:shared-db
   series: bionic
   applications:
     ceph-fs:
       charm: cs:ceph-fs
       num_units: 2
       options:
         source: cloud:bionic-train
     manila-hacluster:
       charm: cs:hacluster
     manila-ganesha-hacluster:
       charm: cs:hacluster
     manila-ganesha:
       charm: cs:manila-ganesha
       series: bionic
       num_units: 3
       options:
         openstack-origin: cloud:bionic-train
         vip: <INSERT VIP(S)>
       bindings:
         public: public
         admin: admin
         internal: internal
         shared-db: internal
         amqp: internal
         # This could also be another existing space
         tenant-storage: tenant-storage
       to:
       - 'lxd:1'
       - 'lxd:2'
       - 'lxd:3'
     manila:
       charm: cs:manila
       series: bionic
       num_units: 3
       options:
         openstack-origin: cloud:bionic-train
         vip: <INSERT VIP(S)>
         default-share-backend: cephfsnfs1
         share-protocols: NFS
       bindings:
         public: public
         admin: admin
         internal: internal
         shared-db: internal
         amqp: internal
       to:
       - 'lxd:1'
       - 'lxd:2'
       - 'lxd:3'

.. warning::

   The machine mappings will almost certainly need to be changed.

To deploy OpenStack with Manila Ganesha:

.. code-block:: none

   juju deploy ./base.yaml --overlay ./manila-ganesha-overlay.yaml

Where `base.yaml` is a bundle to deploy OpenStack. See the `Getting started
tutorial`_ for an introduction to bundle usage.

Configuration
-------------

To create and access CephFS shares over NFS, you will need to `create the
share`_ and then `grant access`_ to it.

Spaces
------

This charm can optionally dedicate a provider's physical network to serving
Ganesha NFS shares. It does so through its support for Juju spaces.

The charm uses a space called 'tenant-storage', which should be accessible
(routed is OK) to all tenants that expect to access the Manila shares. The
easiest way to ensure this access is to create a provider network in
OpenStack that is mapped to the same network layer as this space. For
example, if the storage space is mapped to VLAN 120, then an OpenStack
administrator should create a provider network that maps to the same VLAN:

.. code-block:: none

   openstack network create \
      --provider-network-type vlan \
      --provider-segment 120 \
      --share \
      --provider-physical-network physnet1 \
      tenant-storage

   openstack subnet create tenant \
      --network=tenant-storage \
      --subnet-range 10.1.10.0/22 \
      --gateway 10.1.10.1 \
      --allocation-pool start=10.1.10.50,end=10.1.13.254

When creating the space in MAAS that corresponds to this network, be sure
that DHCP is disabled in this space. If MAAS performs any additional
allocations in this space, ensure that the range configured for the subnet in
Neutron does not overlap with the MAAS subnets.

If dedicating a network space is not desired, it is also possible to use
Ganesha over a routed network. Manila's IP access restrictions will continue
to secure access to Ganesha even for a network that is not managed by
Neutron. For the latter to apply, a provider network is required, and guests
must be attached to that provider network.
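
Those IP access restrictions are managed with the standard Manila client. For
example, to allow a subnet access to a share (the share name and CIDR here
are illustrative):

.. code-block:: none

   manila access-allow my-share ip 10.1.10.0/22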

.. LINKS
.. _deploying OpenStack: install-openstack
.. _create the share: https://docs.openstack.org/manila/latest/admin/cephfs_driver.html#create-cephfs-nfs-share
.. _grant access: https://docs.openstack.org/manila/latest/admin/cephfs_driver.html#allow-access-to-cephfs-nfs-share
.. _Getting started tutorial: https://docs.openstack.org/charm-guide/latest/getting-started/index.html

@ -1,150 +0,0 @@
========================================
Ceph RADOS Gateway multisite replication
========================================

Overview
++++++++

Ceph RADOS Gateway (RGW) native replication between ceph-radosgw applications
is supported both within a single model and between different models. By
default, each application will accept write operations.

.. note::

   Multisite replication is supported starting with Ceph Luminous.

.. warning::

   Converting from a standalone deployment to a replicated deployment is not
   supported.

Deployment
++++++++++

.. note::

   Example bundles for the us-west and us-east models can be found in the
   `bundles` subdirectory of the ceph-radosgw charm.

To deploy the ceph-radosgw charm in this configuration, ensure that the
following configuration options are set on the deployed instances of
ceph-radosgw - in this example `rgw-us-east` and `rgw-us-west` are both
instances of the ceph-radosgw charm:

.. code::

   rgw-us-east:
     realm: replicated
     zonegroup: us
     zone: us-east
   rgw-us-west:
     realm: replicated
     zonegroup: us
     zone: us-west

.. note::

   The realm and zonegroup configuration must be identical between instances
   of the ceph-radosgw application participating in the multi-site
   deployment; the zone configuration must be unique per application.

When deployed with this configuration the ceph-radosgw applications will
remain in a blocked state until the master/slave (cross-model) relation is
added.

Typically each ceph-radosgw deployment will be associated with a separate
Ceph cluster at a different physical location - in this example the
deployments are in different models ('us-east' and 'us-west').

One ceph-radosgw application acts as the initial master for the deployment -
set up the master relation endpoint as the provider of the offer for the
cross-model relation:

.. code::

   juju offer -m us-east rgw-us-east:master

The cross-model relation offer can then be consumed in the other model and
related to the slave ceph-radosgw application:

.. code::

   juju consume -m us-west admin/us-east.rgw-us-east
   juju add-relation -m us-west rgw-us-west:slave rgw-us-east:master

Once the relation has been added the realm, zonegroup and zone configuration
will be created in the master deployment and then synced to the slave
deployment.

The current sync status can be validated from either model:

.. code::

   juju ssh -m us-east ceph-mon/0
   sudo radosgw-admin sync status
             realm 142eb39c-67c4-42b3-9116-1f4ffca23964 (replicated)
         zonegroup 7b69f059-425b-44f5-8a21-ade63c2034bd (us)
              zone 4ee3bc39-b526-4ac9-a233-64ebeacc4574 (us-east)
     metadata sync no sync (zone is master)
         data sync source: db876cf0-62a8-4b95-88f4-d0f543136a07 (us-west)
                           syncing
                           full sync: 0/128 shards
                           incremental sync: 128/128 shards
                           data is caught up with source

Once the deployment is complete, the default zone and zonegroup can
optionally be tidied using the 'tidydefaults' action:

.. code::

   juju run-action -m us-west rgw-us-west/0 tidydefaults

.. warning::

   This operation is not reversible.

Failover/Recovery
+++++++++++++++++

In the event that the site hosting the zone which is the master for metadata
(in this example us-east) has an outage, the master metadata zone must be
failed over to the slave site; this operation is performed using the
'promote' action:

.. code::

   juju run-action -m us-west --wait rgw-us-west/0 promote

Once this action has completed, the slave site will be the master for
metadata updates and the deployment will accept new uploads of data.

Once the failed site has been recovered it will resync and resume as a slave
to the promoted master site (us-west in this example).

The master metadata zone can be failed back to its original location once
resync has completed, using the 'promote' action:

.. code::

   juju run-action -m us-east --wait rgw-us-east/0 promote

Read/write vs Read-only
-----------------------

By default all zones within a deployment will be read/write capable, but only
the master zone can be used to create new containers.

Non-master zones can optionally be marked as read-only by using the
'readonly' action:

.. code::

   juju run-action -m us-east --wait rgw-us-east/0 readonly

A zone that is currently read-only can be switched to read/write mode either
by promoting it to be the current master or by using the 'readwrite' action:

.. code::

   juju run-action -m us-east --wait rgw-us-east/0 readwrite

@ -1,512 +0,0 @@
=====
Swift
=====

Overview
--------

There are two fundamental ways to deploy a Swift cluster with charms. Each
method differs in how storage nodes get assigned to storage zones.

As of the 20.02 charm release, with OpenStack Newton or later, support for a
multi-region (global) cluster, which is an extension of the single-region
scenario, is available. See the upstream documentation on `Global clusters`_
for background information.

.. warning::

   Charmed Swift global cluster functionality is in a preview state and is
   ready for testing. It is not production-ready.

Any Swift deployment relies upon the `swift-proxy`_ and `swift-storage`_
charms. Refer to those charms to learn about the various configuration
options used throughout this guide.

Single-region cluster
---------------------

Manual zone assignment
~~~~~~~~~~~~~~~~~~~~~~

The 'manual' method (the default) allows the cluster to be designed by
explicitly assigning a storage zone to a storage node. This zone gets
associated with the swift-storage application. This means that this method
involves multiple uniquely-named swift-storage applications.

Let file ``swift.yaml`` contain the configuration:

.. code-block:: yaml

   swift-proxy:
     zone-assignment: manual
     replicas: 3
   swift-storage-zone1:
     zone: 1
     block-device: /dev/sdb
   swift-storage-zone2:
     zone: 2
     block-device: /dev/sdb
   swift-storage-zone3:
     zone: 3
     block-device: /dev/sdb

Deploy the proxy and storage nodes:

.. code-block:: none

   juju deploy --config swift.yaml swift-proxy
   juju deploy --config swift.yaml swift-storage swift-storage-zone1
   juju deploy --config swift.yaml swift-storage swift-storage-zone2
   juju deploy --config swift.yaml swift-storage swift-storage-zone3

Add relations between the proxy node and all storage nodes:

.. code-block:: none

   juju add-relation swift-proxy:swift-storage swift-storage-zone1:swift-storage
   juju add-relation swift-proxy:swift-storage swift-storage-zone2:swift-storage
   juju add-relation swift-proxy:swift-storage swift-storage-zone3:swift-storage

This will result in a three-zone cluster, with each zone consisting of a
single storage node, thereby satisfying the replica requirement of three.

Storage capacity is increased by adding swift-storage units to a zone. For
example, to add two storage nodes to zone '3':

.. code-block:: none

   juju add-unit -n 2 swift-storage-zone3

.. note::

   When scaling out ensure the candidate machines are equipped with the block
   devices currently configured for the associated application.

This charm will not balance the storage ring until there are enough storage
zones to meet its minimum replica requirement, in this case three.

Auto zone assignment
~~~~~~~~~~~~~~~~~~~~

The 'auto' method automatically assigns storage zones to storage nodes. There
is only one swift-storage application and only one relation between it and
the swift-proxy application. The relation sets up the initial storage node in
zone '1' (by default). Newly-added nodes get assigned to zones '2', '3', and
so on, until the number of stated replicas is reached. Zone numbering then
falls back to '1' and the zone-assignment cycle repeats. In this way, storage
nodes get distributed evenly across zones.

Let file ``swift.yaml`` contain the configuration:

.. code-block:: yaml

   swift-proxy:
     zone-assignment: auto
     replicas: 3
   swift-storage:
     block-device: /dev/sdb

Deploy the proxy node and the storage application:

.. code-block:: none

   juju deploy --config swift.yaml swift-proxy
   juju deploy --config swift.yaml swift-storage

The first storage node gets assigned to zone '1' when the initial relation is
added:

.. code-block:: none

   juju add-relation swift-proxy:swift-storage swift-storage:swift-storage

The second and third units get assigned to zones '2' and '3', respectively,
during scale-out operations:

.. code-block:: none

   juju add-unit -n 2 swift-storage

.. note::

   When scaling out ensure the candidate machines are equipped with the block
   devices currently configured for the associated application.

At this point the replica requirement is satisfied and the ring is balanced.
The ring is extended by continuing to add more units to the single
application.

Multi-region cluster
--------------------

The previous configurations provided a single-region cluster. Generally a
region is composed of a group of nodes with high-bandwidth, low-latency links
between them. This almost always translates to the same geographical
location.

Multiple such clusters can be meshed together to create a multi-region
(global) cluster. The goal is to achieve greater data resiliency by spanning
zones across geographically dispersed regions.

This section includes two configurations for implementing a Swift global
cluster: minimal and comprehensive.

A global cluster is an extension of the single cluster scenario. Refer to the
`Single-region cluster`_ section for information on essential options.

Minimal configuration
~~~~~~~~~~~~~~~~~~~~~

The proxy and storage nodes for a global cluster require extra configuration.

On the proxy nodes:

* option ``enable-multi-region`` is set to 'true'
* option ``region`` is defined
* option ``swift-hash`` is defined (same value for all regions)

On the storage nodes:

* option ``storage-region`` is set

The below example has two storage regions, a single zone, one storage node
per storage region, and a replica requirement of two. Manual zone assignment
will be used.

Let file ``swift.yaml`` contain the configuration:

.. code-block:: yaml

   swift-proxy-region1:
     region: RegionOne
     zone-assignment: manual
     replicas: 2
     enable-multi-region: true
     swift-hash: "efcf2102-b9e9-4d71-afe6-000000111111"
   swift-proxy-region2:
     region: RegionTwo
     zone-assignment: manual
     replicas: 2
     enable-multi-region: true
     swift-hash: "efcf2102-b9e9-4d71-afe6-000000111111"
   swift-storage-region1:
     storage-region: 1
     zone: 1
     block-device: /dev/sdb
   swift-storage-region2:
     storage-region: 2
     zone: 1
     block-device: /dev/sdb

The value of ``swift-hash`` is arbitrary. It is provided here in the form of
a UUID.
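
A suitable value can, for example, be generated with the ``uuidgen`` utility:

.. code-block:: none

   uuidgen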

.. important::

   The name of a storage region must be an integer. Here, OpenStack region
   'RegionOne' corresponds to storage region '1', and OpenStack region
   'RegionTwo' corresponds to storage region '2'.

Deploy in RegionOne:

.. code-block:: none

   juju deploy --config swift.yaml swift-proxy swift-proxy-region1
   juju deploy --config swift.yaml swift-storage swift-storage-region1

Deploy in RegionTwo:

.. code-block:: none

   juju deploy --config swift.yaml swift-proxy swift-proxy-region2
   juju deploy --config swift.yaml swift-storage swift-storage-region2

Add relations between swift-proxy in RegionOne and swift-storage in both
RegionOne and RegionTwo:

.. code-block:: none

   juju add-relation swift-proxy-region1:swift-storage swift-storage-region1:swift-storage
   juju add-relation swift-proxy-region1:swift-storage swift-storage-region2:swift-storage

Add relations between swift-proxy in RegionTwo and swift-storage in both
RegionOne and RegionTwo:

.. code-block:: none

   juju add-relation swift-proxy-region2:swift-storage swift-storage-region1:swift-storage
   juju add-relation swift-proxy-region2:swift-storage swift-storage-region2:swift-storage

Add a relation between swift-proxy in RegionOne and swift-proxy in RegionTwo:

.. code-block:: none

   juju add-relation swift-proxy-region1:rings-distributor swift-proxy-region2:rings-consumer

More than one proxy can be deployed per OpenStack region, and each must have
a relation to every proxy in all other OpenStack regions. Only one proxy can
act as a "rings-distributor" at any one time; the proxy in RegionOne was
chosen arbitrarily.

Comprehensive configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~

A global cluster is primarily useful when there are groups of storage nodes
and proxy nodes in different physical regions, creating a
geographically-distributed cluster. These regions typically reside in
distinct Juju models, making `Cross model relations`_ a necessity. In
addition, there are configuration options available for tuning read and write
behaviour. The next example demonstrates how to implement these features and
options in a realistic scenario.

Refer to the `Minimal configuration`_ section for basic settings.

Tuning configuration
^^^^^^^^^^^^^^^^^^^^

The ``read-affinity`` option controls the order in which regions and zones
are examined when searching for an object. A common approach is to put the
local region first on the search path for a proxy. For instance, in the
deployment example below the Swift proxy in New York is configured to read
from the New York storage nodes first. Similarly, the San Francisco proxy
prefers storage nodes in San Francisco.

The ``write-affinity`` option allows objects to be stored locally before
being eventually distributed globally. This setting is useful only when you
do not read objects immediately after writing them.

The ``write-affinity-node-count`` option further configures
``write-affinity``. It dictates how many local storage servers will be tried
before falling back to remote ones.

Storage regions are referred to by prepending an 'r' to their names. Hence
'r1' refers to storage region '1'. Similarly for zones: zone '1' is referred
to by 'z1'.

For more details on these options see the upstream `Global clusters`_
document.
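
These options can also be adjusted on a running proxy with ``juju config``
(the values here mirror those used in the bundles below):

.. code-block:: none

   juju config swift-proxy-region1 \
      read-affinity="r1=100, r2=200" \
      write-affinity="r1, r2" \
      write-affinity-node-count=1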

Deployment
^^^^^^^^^^

.. warning::

   Throughout this guide make sure ``openstack-origin`` matches the value you
   used when `deploying OpenStack`_.

This example assumes there are two data centres, one in San Francisco (SF)
and one in New York (NY). These contain Juju models 'swift-sf' and 'swift-ny'
respectively. Model 'swift-ny' contains OpenStack region 'RegionOne' and
storage region '1'. Model 'swift-sf' contains OpenStack region 'RegionTwo'
and storage region '2'.

Bundle overlays are needed for encapsulating cross-model relations, so the
deployment in each OpenStack region consists of both a bundle and an overlay.

This is the contents of bundle ``swift-ny.yaml``:

.. code-block:: yaml

   series: bionic
   applications:
     swift-proxy-region1:
       charm: cs:swift-proxy
       num_units: 1
       options:
         region: RegionOne
         zone-assignment: manual
         replicas: 2
         enable-multi-region: true
         swift-hash: "efcf2102-b9e9-4d71-afe6-000000111111"
         read-affinity: "r1=100, r2=200"
         write-affinity: "r1, r2"
         write-affinity-node-count: '1'
         openstack-origin: cloud:bionic-train
     swift-storage-region1-zone1:
       charm: cs:swift-storage
       num_units: 1
       options:
         storage-region: 1
         zone: 1
         block-device: /etc/swift/storage.img|2G
         openstack-origin: cloud:bionic-train
     swift-storage-region1-zone2:
       charm: cs:swift-storage
       num_units: 1
       options:
         storage-region: 1
         zone: 2
         block-device: /etc/swift/storage.img|2G
         openstack-origin: cloud:bionic-train
     swift-storage-region1-zone3:
       charm: cs:swift-storage
       num_units: 1
       options:
         storage-region: 1
         zone: 3
         block-device: /etc/swift/storage.img|2G
         openstack-origin: cloud:bionic-train
     percona-cluster:
       charm: cs:percona-cluster
       num_units: 1
       options:
         dataset-size: 25%
         max-connections: 1000
         source: cloud:bionic-train
     keystone:
       expose: True
       charm: cs:keystone
       num_units: 1
       options:
         openstack-origin: cloud:bionic-train
     glance:
       expose: True
       charm: cs:glance
       num_units: 1
       options:
         openstack-origin: cloud:bionic-train
   relations:
   - - swift-proxy-region1:swift-storage
     - swift-storage-region1-zone1:swift-storage
   - - swift-proxy-region1:swift-storage
     - swift-storage-region1-zone2:swift-storage
   - - swift-proxy-region1:swift-storage
     - swift-storage-region1-zone3:swift-storage
   - - keystone:shared-db
     - percona-cluster:shared-db
   - - glance:shared-db
     - percona-cluster:shared-db
   - - glance:identity-service
     - keystone:identity-service
   - - swift-proxy-region1:identity-service
     - keystone:identity-service
   - - glance:object-store
     - swift-proxy-region1:object-store

This is the contents of overlay bundle ``swift-ny-offers.yaml``:

.. code-block:: yaml

   applications:
     keystone:
       offers:
         keystone-offer:
           endpoints:
           - identity-service
     swift-proxy-region1:
       offers:
         swift-proxy-region1-offer:
           endpoints:
           - swift-storage
           - rings-distributor
     swift-storage-region1-zone1:
       offers:
         swift-storage-region1-zone1-offer:
           endpoints:
           - swift-storage
     swift-storage-region1-zone2:
       offers:
         swift-storage-region1-zone2-offer:
           endpoints:
           - swift-storage
     swift-storage-region1-zone3:
       offers:
         swift-storage-region1-zone3-offer:
           endpoints:
           - swift-storage

This is the contents of bundle ``swift-sf.yaml`` (note that the affinity
options are reversed relative to RegionOne so that this proxy prefers its
local storage region '2'):

.. code-block:: yaml

   series: bionic
   applications:
     swift-proxy-region2:
       charm: cs:swift-proxy
       num_units: 1
       options:
         region: RegionTwo
         zone-assignment: manual
         replicas: 2
         enable-multi-region: true
         swift-hash: "efcf2102-b9e9-4d71-afe6-000000111111"
         read-affinity: "r2=100, r1=200"
         write-affinity: "r2, r1"
         write-affinity-node-count: '1'
         openstack-origin: cloud:bionic-train
     swift-storage-region2-zone1:
       charm: cs:swift-storage
       num_units: 1
       options:
         storage-region: 2
         zone: 1
         block-device: /etc/swift/storage.img|2G
         openstack-origin: cloud:bionic-train
     swift-storage-region2-zone2:
       charm: cs:swift-storage
       num_units: 1
       options:
         storage-region: 2
         zone: 2
         block-device: /etc/swift/storage.img|2G
         openstack-origin: cloud:bionic-train
     swift-storage-region2-zone3:
       charm: cs:swift-storage
       num_units: 1
       options:
         storage-region: 2
         zone: 3
         block-device: /etc/swift/storage.img|2G
         openstack-origin: cloud:bionic-train
   relations:
   - - swift-proxy-region2:swift-storage
     - swift-storage-region2-zone1:swift-storage
   - - swift-proxy-region2:swift-storage
     - swift-storage-region2-zone2:swift-storage
   - - swift-proxy-region2:swift-storage
     - swift-storage-region2-zone3:swift-storage

This is the contents of overlay bundle ``swift-sf-consumer.yaml``:

.. code-block:: yaml

   relations:
   - - swift-proxy-region2:identity-service
     - keystone:identity-service
   - - swift-proxy-region2:swift-storage
     - swift-storage-region1-zone1:swift-storage
   - - swift-proxy-region2:swift-storage
     - swift-storage-region1-zone2:swift-storage
   - - swift-proxy-region2:swift-storage
     - swift-storage-region1-zone3:swift-storage
   - - swift-storage-region2-zone1:swift-storage
     - swift-proxy-region1:swift-storage
   - - swift-storage-region2-zone2:swift-storage
     - swift-proxy-region1:swift-storage
   - - swift-storage-region2-zone3:swift-storage
     - swift-proxy-region1:swift-storage
   - - swift-proxy-region2:rings-consumer
     - swift-proxy-region1:rings-distributor
   saas:
     keystone:
       url: admin/swift-ny.keystone-offer
     swift-proxy-region1:
       url: admin/swift-ny.swift-proxy-region1-offer
     swift-storage-region1-zone1:
       url: admin/swift-ny.swift-storage-region1-zone1-offer
     swift-storage-region1-zone2:
       url: admin/swift-ny.swift-storage-region1-zone2-offer
     swift-storage-region1-zone3:
       url: admin/swift-ny.swift-storage-region1-zone3-offer

With the current configuration, ``swift-ny.yaml`` must be deployed first as
it contains the Juju "offers" that ``swift-sf.yaml`` will consume:

.. code-block:: none

   juju deploy -m swift-ny ./swift-ny.yaml --overlay ./swift-ny-offers.yaml
   juju deploy -m swift-sf ./swift-sf.yaml --overlay ./swift-sf-consumer.yaml

.. LINKS
.. _deploying OpenStack: install-openstack
.. _Global clusters: https://docs.openstack.org/swift/latest/overview_global_cluster.html
.. _Cross model relations: https://juju.is/docs/olm/cross-model-relations
.. _swift-proxy: https://jaas.ai/swift-proxy
.. _swift-storage: https://jaas.ai/swift-storage

@ -37,3 +37,10 @@
/project-deploy-guide/charm-deployment-guide/latest/pci-passthrough.html 301 /charm-guide/latest/admin/compute/pci-passthrough.html
/project-deploy-guide/charm-deployment-guide/latest/nova-cells.html 301 /charm-guide/latest/admin/compute/nova-cells.html
/project-deploy-guide/charm-deployment-guide/latest/ironic.html 301 /charm-guide/latest/admin/compute/ironic.html
/project-deploy-guide/charm-deployment-guide/latest/app-ceph-rbd-mirror.html 301 /charm-guide/latest/admin/storage/ceph-rbd-mirror.html
/project-deploy-guide/charm-deployment-guide/latest/ceph-erasure-coding.html 301 /charm-guide/latest/admin/storage/ceph-erasure-coding.html
/project-deploy-guide/charm-deployment-guide/latest/rgw-multisite.html 301 /charm-guide/latest/admin/storage/ceph-rgw-multisite.html
/project-deploy-guide/charm-deployment-guide/latest/cinder-volume-replication.html 301 /charm-guide/latest/admin/storage/cinder-replication.html
/project-deploy-guide/charm-deployment-guide/latest/encryption-at-rest.html 301 /charm-guide/latest/admin/storage/encryption-at-rest.html
/project-deploy-guide/charm-deployment-guide/latest/manila-ganesha.html 301 /charm-guide/latest/admin/storage/shared-filesystem-services.html
/project-deploy-guide/charm-deployment-guide/latest/swift.html 301 /charm-guide/latest/admin/storage/swift.html