Replication
For backend devices that offer replication features, Cinder provides a common mechanism for exposing that functionality on a per volume basis while still trying to allow flexibility for the varying implementation and requirements of all the different backend devices.
There are two sides to Cinder's replication feature: the core mechanism and the driver-specific functionality. This document covers only the driver side, with the aim of helping vendors implement this functionality in their drivers in a way consistent with all other drivers.
Although we'll be focusing on the driver implementation, there will also be some mentions of deployment configurations, to give developers a clear picture and help them avoid implementing custom solutions for things that are meant to be handled via the cloud configuration.
Overview
As a general rule replication is enabled and configured via the cinder.conf file under the driver's section, and volume replication is requested through the use of volume types.
NOTE: Current replication implementation is v2.1 and it's meant to solve a very specific use case, the "smoking hole" scenario. It's critical that you read the Use Cases section of the spec here: https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cheesecake.html
From a user's perspective volumes will be created using specific volume types, even if it is the default volume type, and they will either be replicated or not, which will be reflected in the replication_status field of the volume. So in order to know if a snapshot is replicated we'll have to check its volume.
After the loss of the primary storage site all operations on the resources will fail and VMs will no longer have access to the data. It is then that the Cloud Administrator will issue the failover-host command to make the cinder-volume service perform the failover.
After the failover is completed, the Cinder volume service will start using the failed-over secondary storage site for all operations and the user will once again be able to perform actions on all resources that were replicated, while all other resources will be in error status since they are no longer available.
Storage Device configuration
Most storage devices will require configuration changes to enable the replication functionality, and this configuration process is vendor and storage device specific so it is not contemplated by the Cinder core replication functionality.
It is up to the vendors whether they want to handle this device configuration in the Cinder driver or as a manual process, but the most common approach is to avoid including this configuration logic into Cinder and having the Cloud Administrators do a manual process following a specific guide to enable replication on the storage device before configuring the cinder volume service.
Service configuration
The way to enable and configure replication is common to all drivers and it is done via the replication_device configuration option that goes in the driver's specific section in the cinder.conf configuration file.
replication_device is a multi dictionary option that should be specified for each replication target device the admin wants to configure.
While it is true that all drivers use the same replication_device configuration option, this doesn't mean that they will all have the same data, as there is only one standardized and REQUIRED key in the configuration entry; all others are vendor specific:
- backend_id:<vendor-identifier-for-rep-target>
Values of backend_id keys are used to uniquely identify each of the secondary sites within the driver, although they can be reused in different driver sections.
These unique identifiers will be used by the failover mechanism as well as in the driver initialization process, and the only requirement is that it must never have the value "default".
An example driver configuration for a device with multiple replication targets is shown below:
.....
[driver-biz]
volume_driver=xxxx
volume_backend_name=biz
[driver-baz]
volume_driver=xxxx
volume_backend_name=baz
[driver-foo]
volume_driver=xxxx
volume_backend_name=foo
replication_device = backend_id:vendor-id-1,unique_key:val....
replication_device = backend_id:vendor-id-2,unique_key:val....
In this example the result of calling self.configuration.safe_get('replication_device') within the driver is the following list:
[{'backend_id': 'vendor-id-1', 'unique_key': 'val1'},
 {'backend_id': 'vendor-id-2', 'unique_key': 'val2'}]
It is expected that if a driver is configured with multiple replication targets, that replicated volumes are actually replicated on all targets.
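As a rough illustration of how a driver might consume that list, the sketch below indexes the configured targets by backend_id and rejects the reserved value "default". The helper name index_replication_targets and the use of ValueError are assumptions made for this example and are not part of Cinder. It also assumes the option may come back as None when it is not set.

```python
def index_replication_targets(replication_devices):
    """Map each backend_id to its replication_device entry.

    replication_devices stands for the list of dicts obtained from
    self.configuration.safe_get('replication_device'); None is treated
    as "no targets configured" (an assumption of this sketch).
    """
    targets = {}
    for entry in replication_devices or []:
        backend_id = entry.get('backend_id')
        if not backend_id:
            # backend_id is the only standardized, required key.
            raise ValueError('replication_device entry lacks backend_id')
        if backend_id == 'default':
            # "default" is reserved for requesting a failback.
            raise ValueError('backend_id must never be "default"')
        if backend_id in targets:
            raise ValueError('duplicate backend_id: %s' % backend_id)
        targets[backend_id] = entry
    return targets


# Hypothetical data mirroring the configuration example above.
devices = [
    {'backend_id': 'vendor-id-1', 'unique_key': 'val1'},
    {'backend_id': 'vendor-id-2', 'unique_key': 'val2'},
]
targets = index_replication_targets(devices)
```

Doing this validation once at driver setup lets later code, such as the failover path, look up a target by its identifier without re-parsing the option.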
Besides the specific replication device keys defined in replication_device, a driver may also have additional normal configuration options in the driver section related to replication, to allow Cloud Administrators to configure things like timeouts.
Capabilities reporting
There are 2 new replication stats/capability keys that drivers supporting replication v2.1 should be reporting, replication_enabled and replication_targets:
stats["replication_enabled"] = True|False
stats["replication_targets"] = [<backend-id_1>, <backend-id_2>...]
If a driver is behaving correctly we can expect the replication_targets field to be populated whenever replication_enabled is set to True, and it is expected to either be set to [] or be missing altogether when replication_enabled is set to False.
The purpose of the replication_enabled field is to be used by the scheduler in volume types for creation and migrations.
As for the replication_targets field, it is only provided for informational purposes so it can be retrieved through the get_capabilities call using the admin REST API, but it will not be used for validation at the API layer. That way Cloud Administrators will be able to know the available secondary sites to which they can fail over.
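A minimal sketch of how the two keys relate, assuming the driver already holds the list of configured target identifiers. The function name replication_stats is hypothetical, and a real driver would merge these keys into the stats dictionary it already reports.

```python
def replication_stats(target_ids):
    """Build the two replication capability keys from the target ids."""
    targets = list(target_ids)
    return {
        # Replication is reported as enabled only when targets exist.
        'replication_enabled': bool(targets),
        # An empty list is reported when replication is disabled.
        'replication_targets': targets,
    }


stats = {}
stats.update(replication_stats(['vendor-id-1', 'vendor-id-2']))
```

Deriving both keys from the same list keeps them consistent, so replication_targets is never populated while replication_enabled is False.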
Volume Types / Extra Specs
The way to control the creation of volumes on a cloud with backends that have replication enabled is, like with many other features, through the use of volume types.
We won't go into the details of volume type creation, but suffice to say that you will most likely want to use volume types to discriminate between replicated and non replicated volumes and be explicit about it so that non replicated volumes won't end up in a replicated backend.
Since the driver is reporting the replication_enabled key, we just need to require it for replicated volume types by adding replication_enabled='<is> True', and also specify it for all non replicated volume types as replication_enabled='<is> False'.
It's up to the driver to parse the volume type info on create and set things up as requested. While the scoping key can be anything, it's strongly recommended that all backends utilize the same key (replication) for consistency and to make things easier for the Cloud Administrator.
Additional replication parameters can be supplied to the driver using vendor specific properties through the volume type's extra-specs so they can be used by the driver at volume creation time, or retype.
It is up to the driver to parse the volume type info on create and retype to set things up as requested. A good pattern to get a custom parameter from a given volume instance is this:
extra_specs = getattr(volume.volume_type, 'extra_specs', {})
custom_param = extra_specs.get('custom_param', 'default_value')
It may seem convoluted, but we must be careful when retrieving the extra_specs from the volume_type field as it could be None.
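The behavior of that pattern can be checked in isolation with stand-in objects. In the sketch below, SimpleNamespace instances play the role of volume objects and get_custom_param is a hypothetical helper. The trailing "or {}" is an extra precaution of this sketch, covering the case where the extra_specs attribute exists but is itself None.

```python
from types import SimpleNamespace


def get_custom_param(volume, key='custom_param', default='default_value'):
    # getattr() falls back when volume_type is None, because None has
    # no extra_specs attribute; "or {}" also handles extra_specs=None.
    extra_specs = getattr(volume.volume_type, 'extra_specs', None) or {}
    return extra_specs.get(key, default)


# Stand-ins for volume objects; these are not Cinder classes.
typed = SimpleNamespace(
    volume_type=SimpleNamespace(extra_specs={'custom_param': 'fast'}))
untyped = SimpleNamespace(volume_type=None)
empty = SimpleNamespace(volume_type=SimpleNamespace(extra_specs=None))
```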
Vendors should try to avoid obfuscating their custom properties and expose them using the _init_vendor_properties method so they can be checked by the Cloud Administrator using the get_capabilities REST API.
NOTE: For storage devices doing per backend/pool replication the use of volume types is also recommended.
Volume creation
Drivers are expected to honor the replication parameters set in the volume type during creation, retyping, or migration.
When implementing the replication feature there are some driver methods that will most likely need modifications -if they are implemented in the driver (since some are optional)- to make sure that the backend is replicating volumes that need to be replicated and not replicating those that don't need to be:
create_volume
create_volume_from_snapshot
create_cloned_volume
retype
clone_image
migrate_volume
In these methods the driver will have to check the volume type to see if the volumes need to be replicated. We could use the same pattern described in the Volume Types / Extra Specs section:
def _is_replicated(self, volume):
    specs = getattr(volume.volume_type, 'extra_specs', {})
    return specs.get('replication_enabled') == '<is> True'
But this is not the recommended mechanism; the is_replicated method available in volume and volume type versioned object instances should be used instead.
Drivers are expected to keep the replication_status field up to date and in sync with reality, usually as specified in the volume type. To do so, the implementations of the above-mentioned methods should use the model update mechanism provided for each one of those methods. One must be careful, since the update mechanism may differ from one method to another.
What this means is that most of these methods should be returning a replication_status key with the value set to enabled in the model update dictionary if the volume type is enabling replication. There is no need to return the key with the value of disabled if it is not enabled since that is the default value.
In the case of the create_volume and retype methods there is no need to return the replication_status in the model update since it has already been set by the scheduler on creation using the extra spec from the volume type. And on migrate_volume there is no need either since there is no change to the replication_status.
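To make the model update rules concrete, here is a minimal sketch of a helper that a method such as create_cloned_volume could use to build its return value. The ReplicationStatus class below is a stand-in for the one in cinder/objects/fields.py, the volumes are SimpleNamespace test doubles exposing is_replicated, and the helper name is invented for the example.

```python
from types import SimpleNamespace


class ReplicationStatus(object):
    """Stand-in for cinder.objects.fields.ReplicationStatus."""
    ENABLED = 'enabled'


def replication_model_update(volume):
    """Return the replication part of a model update dictionary."""
    update = {}
    if volume.is_replicated():
        update['replication_status'] = ReplicationStatus.ENABLED
    # Nothing is returned for non replicated volumes, since disabled
    # is already the default value.
    return update


# Hypothetical volumes; real ones are versioned objects.
replicated = SimpleNamespace(is_replicated=lambda: True)
plain = SimpleNamespace(is_replicated=lambda: False)
```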
NOTE: For storage devices doing per backend/pool replication it is not necessary to check the volume type for the replication_enabled key since all created volumes will be replicated, but they are expected to return the replication_status in all those methods, including the create_volume method, since the driver may receive a volume creation request without the replication enabled extra spec; in that case the right replication_status will not have been set and the driver needs to correct this.
Besides the replication_status field that drivers need to update there are other fields in the database related to the replication mechanism that the drivers can use:
replication_extended_status
replication_driver_data
These fields are string type fields with a maximum size of 255 characters and they are available for drivers to use internally as they see fit for their normal replication operation. So they can be assigned in the model update and later on used by the driver, for example during the failover.
To avoid using magic strings drivers must use the values defined by the ReplicationStatus class in the cinder/objects/fields.py file. These are:
- ERROR: When setting up the replication failed on creation, retype, or migrate. This should be accompanied by the volume status error.
- ENABLED: When the volume is being replicated.
- DISABLED: When the volume is not being replicated.
- FAILED_OVER: After a volume has been successfully failed over.
- FAILOVER_ERROR: When there was an error during the failover of this volume.
- NOT_CAPABLE: When we failed-over but the volume was not replicated.
The first 3 statuses revolve around the volume creation and the last 3 around the failover mechanism.
The only status that should not be used for the volume's replication_status is the FAILING_OVER status.
Whenever we are referring to values of the replication_status in this document we will be referring to the ReplicationStatus attributes and not a literal string, so ERROR means cinder.objects.fields.ReplicationStatus.ERROR and not the string "ERROR".
Failover
This is the mechanism used to instruct the cinder volume service to fail over to a secondary/target device.
Keep in mind the use case is that the primary backend has died a horrible death and is no longer valid, so any volumes that were on the primary and were not being replicated will no longer be available.
The method definition required from the driver to implement the failover mechanism is as follows:
def failover_host(self, context, volumes, secondary_id=None):
There are several things that are expected of this method:
- Promotion of a secondary storage device to primary
- Generating the model updates
- Changing internally to access the secondary storage device for all future requests.
If no secondary storage device is provided to the driver via the secondary_id argument (it is equal to None), then it is up to the driver to choose which storage device to fail over to. In this regard it is important that the driver takes into consideration that it could be failing over from a secondary (there was a prior failover request), so it should discard the current target from the selection.
If the secondary_id is not a valid one the driver is expected to raise InvalidReplicationTarget. For any other non recoverable errors during a failover the driver should raise UnableToFailOver or any child of the VolumeDriverException class and revert to a state where the previous backend is in use.
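The target selection rules above could be sketched as follows. The two exception classes are local stand-ins for the Cinder exceptions of the same names, choose_failover_target is a hypothetical helper, and picking the first remaining target is only one possible policy. The sketch does not cover the failback request.

```python
class InvalidReplicationTarget(Exception):
    """Stand-in for the Cinder exception of the same name."""


class UnableToFailOver(Exception):
    """Stand-in for the Cinder exception of the same name."""


def choose_failover_target(configured_ids, active_id=None,
                           secondary_id=None):
    """Pick the backend id to fail over to.

    configured_ids: backend_id values from replication_device.
    active_id: target currently in use, None when on the primary.
    secondary_id: target requested by the caller, or None.
    """
    if secondary_id is not None:
        if secondary_id not in configured_ids:
            raise InvalidReplicationTarget(secondary_id)
        return secondary_id
    # No explicit target: take the first one that is not the target
    # we are already using after a prior failover.
    for candidate in configured_ids:
        if candidate != active_id:
            return candidate
    raise UnableToFailOver('no replication target available')
```

For instance, with targets ['vendor-id-1', 'vendor-id-2'] and active_id='vendor-id-1', calling the helper without a secondary_id selects 'vendor-id-2'.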
The failover method in the driver will receive a list of replicated volumes that need to be failed over. Replicated volumes passed to the driver may have diverse replication_status values, but they will always be one of: ENABLED, FAILED_OVER, or FAILOVER_ERROR.
The driver must return a 2-tuple with the new storage device target id as the first element and a list of dictionaries with the model updates required for the volumes so that the driver can perform future actions on those volumes now that they need to be accessed on a different location.
It's not a requirement for the driver to return model updates for all the volumes, or for any for that matter, as it can return None or an empty list if there's no update necessary. But if elements are returned in the model update list then it is a requirement that each of the dictionaries contains 2 key-value pairs, volume_id and updates, like this:
[{
    'volume_id': volumes[0].id,
    'updates': {
        'provider_id': new_provider_id1,
        ...
    },
}, {
    'volume_id': volumes[1].id,
    'updates': {
        'provider_id': new_provider_id2,
        'replication_status': fields.ReplicationStatus.FAILOVER_ERROR,
        ...
    },
}]
In these updates there is no need to set the replication_status to FAILED_OVER if the failover was successful, as this will be performed by the manager by default, but it won't create additional DB queries if it is returned. It is however necessary to set it to FAILOVER_ERROR for those volumes that had errors during the failover.
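A minimal sketch of assembling such a list from per-volume results. The function name, the shape of the outcomes tuples, and the ReplicationStatus stand-in (including its string value) are assumptions of this example; a real driver would use cinder.objects.fields.ReplicationStatus and whatever fields its backend needs to update.

```python
class ReplicationStatus(object):
    """Stand-in for cinder.objects.fields.ReplicationStatus."""
    FAILOVER_ERROR = 'failover-error'  # placeholder value for the sketch


def build_volume_updates(outcomes):
    """Turn (volume_id, provider_id, succeeded) tuples into updates."""
    updates = []
    for volume_id, provider_id, succeeded in outcomes:
        changes = {}
        if provider_id is not None:
            changes['provider_id'] = provider_id
        if not succeeded:
            # Failures must be flagged explicitly; successes are set
            # to FAILED_OVER by the manager by default.
            changes['replication_status'] = ReplicationStatus.FAILOVER_ERROR
        if changes:
            updates.append({'volume_id': volume_id, 'updates': changes})
    return updates


# Hypothetical results of failing over two volumes.
outcomes = [('vol-1', 'new-id-1', True), ('vol-2', None, False)]
volume_updates = build_volume_updates(outcomes)
```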
Drivers don't have to worry about snapshots or non replicated volumes, since the manager will take care of those in the following manner:
- All non replicated volumes will have their current status field saved in the previous_status field, the status field changed to error, and their replication_status set to NOT_CAPABLE.
- All snapshots from non replicated volumes will have their statuses changed to error.
- All replicated volumes that failed on the failover will get their status changed to error, their current status preserved in previous_status, and their replication_status set to FAILOVER_ERROR.
- All snapshots from volumes that had errors during the failover will have their statuses set to error.
Any model update request from the driver that changes the status field will trigger a change in the previous_status field to preserve the current status.
Once the failover is completed the driver should be pointing to the secondary and should be able to create and destroy volumes and snapshots as usual, and it is left to the Cloud Administrator's discretion whether resource modifying operations are allowed or not.
Failback
Drivers are not required to support failback, but they are required to raise an InvalidReplicationTarget exception if the failback is requested but not supported.
The way to request the failback is quite simple: the driver will receive the argument secondary_id with the value of default. That is why it was forbidden to use the value default in the target configuration in the cinder configuration file.
Expected driver behavior is the same as the one explained in the Failover section:
- Promotion of the original primary to primary
- Generating the model updates
- Changing internally to access the original primary storage device for all future requests.
If the failback of any of the volumes fails the driver must return replication_status set to ERROR in the volume updates for those volumes. If they succeed it is not necessary to change the replication_status since the default behavior will be to set them to ENABLED, but it won't create additional DB queries if it is set.
The manager will update resources in a slightly different way than in the failover case:
- All non replicated volumes will not have any model modifications.
- All snapshots from non replicated volumes will not have any model modifications.
- All replicated volumes that failed on the failback will get their status changed to error, have their current status preserved in the previous_status field, and their replication_status set to FAILOVER_ERROR.
- All snapshots from volumes that had errors during the failback will have their statuses set to error.
We can avoid using the "default" magic string by using the FAILBACK_SENTINEL class attribute from the VolumeManager class.
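As an illustration only, a driver that does not support failback could guard its failover entry point along these lines. FAILBACK_SENTINEL is redefined locally to keep the sketch self-contained, the exception is a stand-in for the Cinder one, and check_failback_supported is a hypothetical helper.

```python
# Local stand-in for the VolumeManager.FAILBACK_SENTINEL attribute.
FAILBACK_SENTINEL = 'default'


class InvalidReplicationTarget(Exception):
    """Stand-in for the Cinder exception of the same name."""


def is_failback(secondary_id):
    """Return True when the request asks to go back to the primary."""
    return secondary_id == FAILBACK_SENTINEL


def check_failback_supported(secondary_id, supports_failback):
    """Raise if a failback is requested but the driver cannot do it."""
    if is_failback(secondary_id) and not supports_failback:
        raise InvalidReplicationTarget('failback is not supported')
```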
Initialization
It stands to reason that a failed over Cinder volume service may be restarted, so there needs to be a way for a driver to know on start which storage device should be used to access the resources.
So, to let drivers know which storage device they should use, the manager passes drivers the active_backend_id argument to their __init__ method during the initialization phase of the driver. The default value is None, meaning that the default (primary) storage device should be used.
Drivers should store this value if they will need it, as the base driver is not storing it, for example to determine the current storage device when a failover is requested and we are already in a failover state, as mentioned above.
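A minimal sketch of keeping that value around, using a hypothetical ExampleDriver class that does not inherit from any Cinder base class. The attribute and property names are assumptions, and a real driver would also call its parent's __init__.

```python
class ExampleDriver(object):
    """Hypothetical driver skeleton; not a real Cinder driver."""

    def __init__(self, *args, active_backend_id=None, **kwargs):
        # None means the default (primary) storage device is in use.
        self._active_backend_id = active_backend_id

    @property
    def failed_over(self):
        """True when the service started in a failed-over state."""
        return self._active_backend_id is not None


primary = ExampleDriver()
secondary = ExampleDriver(active_backend_id='vendor-id-1')
```

With the identifier stored, a later failover request can exclude the target that is already active when it has to choose a new one.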
Freeze / Thaw
In many cases, after a failover has been completed we'll want to allow changes to the data in the volumes as well as some operations like attach and detach while other operations that modify the number of existing resources, like delete or create, are not allowed.
And that is where the freezing mechanism comes in; freezing a backend puts the control plane of the specific Cinder volume service into a read only state, or at least most of it, while allowing the data plane to proceed as usual.
While this will mostly be handled by the Cinder core code, drivers are informed when the freezing mechanism is enabled or disabled via these 2 calls:
freeze_backend(self, context)
thaw_backend(self, context)
In most cases the driver may not need to do anything, and then it doesn't need to define any of these methods as long as it's a child class of the BaseVD class that already implements them as noops.
Raising a VolumeDriverException exception in any of these methods will result in a 500 status code response being returned to the caller and the manager will not log the exception, so it's up to the driver to log the error if it is appropriate.
If the driver wants to give a more meaningful error response, then it can raise other exceptions that have different status codes.
When creating the freeze_backend and thaw_backend driver methods we must remember that this is a Cloud Administrator operation, so we can return errors that reveal internals of the cloud, for example the type of storage device, and we must use the appropriate internationalization translation methods when raising exceptions. For VolumeDriverException no translation is necessary since the manager doesn't log it or return it to the user in any way, but any other exception should use the _() translation method since it will be returned to the REST API caller.
For example, if a storage device doesn't support the thaw operation when failed over, then it should raise an Invalid exception:
def thaw_backend(self, context):
    if self.failed_over:
        msg = _('Thaw is not supported by driver XYZ.')
        raise exception.Invalid(msg)