"storage_availability_zone" in the [DEFAULT] section of manila's configuration file has allowed deployers to configure and manage both service (scheduler, share manager) and storage system availability. However, quite often manila's services (api, scheduler, share and data managers) are run on a dedicated control plane that is a different failure domain from that of the storage that manila manages. Also, when using share replication, deployers would need to run multiple manila share manager services with different configuration files, each with their own "storage_availability_zone". To allow flexibility of separating service and storage availability zones, we introduce a new configuration option "backend_availability_zone" within the share driver/backend section. When this option is used, it will override the value of the "storage_availability_zone" from the [DEFAULT] section. Change-Id: Ice99a880dd7be7af94dea86b31a6db88be3d7d9b Implements: bp per-backend-availability-zones
Share Replication
As of the Mitaka release of OpenStack, manila supports replication of shares between different pools for drivers that operate with driver_handles_share_servers=False mode. These pools may be on different backends or within the same backend. This feature can be used as a disaster recovery solution or as a load sharing mirroring solution, depending upon the replication style chosen, the capability of the driver and the configuration of backends.
This feature assumes and relies on the fact that share drivers will be responsible for communicating with ALL storage controllers necessary to achieve any replication tasks, even if that involves sending commands to other storage controllers in other Availability Zones (or AZs).
End users would be able to create and manage their replicas, alongside their shares and snapshots.
Storage availability zones and replication domains
Replication is supported within the same availability zone, but in an ideal solution, an Availability Zone should be perceived as a single failure domain. So this feature provides the most value in an inter-AZ replication use case.
The replication_domain option is a backend specific StrOpt option to be used within manila.conf. The value can be any ASCII string. Two backends that can replicate between each other would have the same replication_domain. This comes from the premise that manila expects Share Replication to be performed between backends that have similar characteristics.
When scheduling new replicas, the scheduler takes into account the replication_domain option to match similar backends. It also ensures that only one replica can be scheduled per pool. When backends report multiple pools, manila allows replication between two pools on the same backend.
The replication_domain option is meant to be used in conjunction with the storage_availability_zone (or backend specific backend_availability_zone) option to utilize this solution for Data Protection/Disaster Recovery.
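As a rough illustration of how these options interact, the sketch below parses a made-up manila.conf fragment with Python's standard configparser. The backend names (alpha, beta, gamma), the AZ names and the domain string are assumptions for the example, and the two helper functions are hypothetical; they only mirror the rules described above and are not manila's own code.

```python
import configparser

# Hypothetical manila.conf fragment; every value here is illustrative.
MANILA_CONF = """
[DEFAULT]
enabled_share_backends = alpha,beta,gamma
storage_availability_zone = control-plane-az

[alpha]
replication_domain = site_pair_1
backend_availability_zone = az-east

[beta]
replication_domain = site_pair_1
backend_availability_zone = az-west

[gamma]
share_backend_name = GAMMA
"""

conf = configparser.ConfigParser()
conf.read_string(MANILA_CONF)


def effective_az(conf, backend):
    """Backend-specific AZ if set, else the [DEFAULT] storage AZ."""
    default_az = conf["DEFAULT"]["storage_availability_zone"]
    return conf[backend].get("backend_availability_zone", fallback=default_az)


def can_replicate(conf, backend_a, backend_b):
    """Two backends can replicate only if they share a replication_domain."""
    domain_a = conf[backend_a].get("replication_domain")
    domain_b = conf[backend_b].get("replication_domain")
    return domain_a is not None and domain_a == domain_b
```

In this sketch, alpha and beta sit in different availability zones but share a replication domain, which is the inter-AZ arrangement the text describes as most valuable; gamma falls back to the [DEFAULT] zone and cannot take part in replication.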
Replication types
When creating a share that is meant to have replicas in the future, the user will use a share_type with an extra_spec, replication_type, set to a valid replication type that manila supports. Drivers must report the replication type that they support as the replication_type capability during the _update_share_stats() call.
Three types of replication are currently supported:
- writable: Synchronously replicated shares where all replicas are writable. Promotion is not supported and not needed.
- readable: Mirror-style replication with a primary (writable) copy and one or more secondary (read-only) copies which can become writable after a promotion.
- dr (for Disaster Recovery): Generalized replication with secondary copies that are inaccessible until they are promoted to become the active replica.
Note
The term active replica refers to the primary share. In the writable style of replication, all replicas are active, and there is no distinction of a primary share. In the readable and dr styles of replication, a secondary replica may be referred to as passive, non-active or simply replica.
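The three styles can be summarized as a small lookup table. This is only a sketch of the properties stated above: the ReplicationStyle class and its field names are hypothetical, and treating writable secondaries as readable is an assumption that follows from their being writable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationStyle:
    """Hypothetical summary of what a secondary replica allows."""
    secondaries_readable: bool
    secondaries_writable: bool
    promotion_supported: bool


# Keys are the replication_type values named in the text.
STYLES = {
    "writable": ReplicationStyle(True, True, False),
    "readable": ReplicationStyle(True, False, True),
    "dr": ReplicationStyle(False, False, True),
}


def secondary_is_accessible(replication_type):
    """True if clients can reach a secondary replica before promotion."""
    return STYLES[replication_type].secondaries_readable
```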
Health of a share replica
Apart from the status attribute, share replicas have the replica_state attribute to denote the state of the replica. The primary replica will have its replica_state attribute set to active. A secondary replica may have one of the following values as its replica_state:
- in_sync: The replica is up to date with the active replica (possibly within a backend specific recovery point objective).
- out_of_sync: The replica has gone out of date (all new replicas start out in this replica_state).
- error: The scheduler failed to schedule this replica, or some potentially irrecoverable damage occurred with regard to updating data for this replica.
Manila periodically requests an update of the replica_state of all non-active replicas. The update occurs at an interval defined through the replica_state_update_interval option in manila.conf.
Administrators have an option of initiating a resync of a secondary replica (for readable and dr types of replication). This could be performed before a planned failover operation in order to have the most up-to-date data on the replica.
Promotion
For readable and dr styles, we refer to the task of switching a non-active replica with the active replica as promotion. For the writable style of replication, promotion does not make sense since all replicas are active (or writable) at all times.
The status attribute of the non-active replica being promoted will be set to replication_change during its promotion. This has been classified as a busy state and hence API interactions with the share are restricted while one of its replicas is in this state.
Promotion of replicas with replica_state set to error may not be fully supported by the backend. However, manila allows the action as an administrator feature and such an attempt may be honored by backends if possible.
When multiple replicas exist, multiple replication relationships between shares may need to be redefined at the backend during the promotion operation. If the driver fails at this stage, the replicas may be left in an inconsistent state. The share manager will then set the status attribute of all replicas to error. Recovery from this state would require administrator intervention.
Snapshots
If the driver supports snapshots, the replication of a snapshot is expected to be initiated simultaneously with the creation of the snapshot on the active replica. Manila tracks snapshots across replicas as separate snapshot instances. The aggregate snapshot object itself will be in creating state until it is available across all of the share's replicas that have their replica_state attribute set to active or in_sync.
Therefore, for a driver that supports snapshots, the definition of being in_sync with the primary is not only that data is ensured (within the recovery point objective), but also that any 'available' snapshots on the primary are ensured on the replica as well. If the snapshots cannot be ensured, the replica_state must be reported to manila as being out_of_sync until the snapshots have been replicated.
When a snapshot instance has its status attribute set to creating or deleting, manila will poll the respective drivers for a status update. As described earlier, the parent snapshot itself will be available only when its instances across the active and in_sync replicas of the share are available. The polling interval will be the same as replica_state_update_interval.
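The rule for the parent snapshot's state can be sketched as a small function. This is a simplified illustration, not manila's implementation: the function name and the tuple representation of a snapshot instance are assumptions, and error handling for failed instances is deliberately left out.

```python
def aggregate_snapshot_status(instances):
    """Derive the parent snapshot status from its per-replica instances.

    instances: iterable of (replica_state, snapshot_instance_status) pairs.
    Only instances on active or in_sync replicas are considered, as the
    text describes; out_of_sync replicas do not hold the snapshot back.
    """
    relevant = [
        status
        for replica_state, status in instances
        if replica_state in ("active", "in_sync")
    ]
    if relevant and all(status == "available" for status in relevant):
        return "available"
    return "creating"
```

For example, a snapshot whose instance on an in_sync replica is still creating keeps the parent in creating, while a lagging instance on an out_of_sync replica does not.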
Access Rules
Access rules are not meant to be different across the replicas of the share. Manila expects drivers to handle these access rules effectively depending on the style of replication supported. For example, the dr style of replication means that the non-active replicas are inaccessible, so if read-write rules are expected, then the rules should be applied on the active replica only.
Similarly, drivers that support the readable replication type should apply any read-write rules as read-only for the non-active replicas.
Drivers will receive all the access rules in create_replica, delete_replica and update_replica_state calls and have ample opportunity to reconcile these rules effectively across replicas.
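One way a driver might reconcile the rules per replica is sketched below. The function is hypothetical, and the rule dictionaries are a simplified stand-in that carries only an access_to and an access_level ("rw" or "ro") key; real drivers receive richer objects.

```python
import copy


def rules_for_replica(replication_type, replica_state, access_rules):
    """Return the access rules to apply on one replica (illustrative only)."""
    rules = copy.deepcopy(access_rules)  # never mutate the caller's rules
    if replica_state == "active":
        # Active replicas (all replicas in the writable style) get the
        # rules exactly as requested.
        return rules
    if replication_type == "readable":
        # Non-active readable replicas: downgrade read-write to read-only.
        for rule in rules:
            if rule["access_level"] == "rw":
                rule["access_level"] = "ro"
        return rules
    if replication_type == "dr":
        # Non-active dr replicas are inaccessible, so nothing is applied.
        return []
    return rules
```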
Understanding Replication Workflows
Creating a share that supports replication
Administrators can create a share type with extra-spec replication_type, matching the style of replication the desired backend supports. Users can use the share type to create a new share that allows/supports replication. A replicated share always starts out with one replica, the primary share itself.
The manila-scheduler service will filter and weigh available pools to find a suitable pool for the share being created. In particular,
- The CapabilityFilter will match the replication_type extra_spec in the request share_type with the replication_type capability reported by a pool.
- The ShareReplicationFilter will further ensure that the pool has a non-empty replication_domain capability being reported as well.
- The AvailabilityZoneFilter will ensure that the availability_zone requested matches with the pool's availability zone.
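The three checks above can be approximated with a single predicate. This is a minimal sketch, not the scheduler's code: the function name and the flat dictionary describing a pool are assumptions made for illustration.

```python
def pool_accepts_replicated_share(pool, extra_specs, requested_az=None):
    """Apply the three checks described above to one pool (illustrative)."""
    # CapabilityFilter: requested and reported replication_type must match.
    if pool.get("replication_type") != extra_specs.get("replication_type"):
        return False
    # ShareReplicationFilter: the pool must report a non-empty domain.
    if not pool.get("replication_domain"):
        return False
    # AvailabilityZoneFilter: only enforced when the user asked for an AZ.
    if requested_az is not None and pool.get("availability_zone") != requested_az:
        return False
    return True
```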
Creating a replica
The user has to specify the share name/id of the share that is supposed to be replicated and optionally an availability zone for the replica to exist in. The replica inherits the parent share's share_type and associated extra_specs. Scheduling of the replica is similar to that of the share.
- The ShareReplicationFilter will ensure that the pool is within the same replication_domain as the active replica and also ensures that the pool does not already have a replica for that share.
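The replica-specific check could look roughly like the following. The function and the pool dictionaries (with name and replication_domain keys) are hypothetical, and the pool names are only meant to resemble host@backend#pool identifiers.

```python
def pools_for_new_replica(pools, active_pool, pools_with_replicas):
    """Names of pools that may host a new replica of a share (illustrative).

    pools: candidate pool dicts with "name" and "replication_domain" keys.
    active_pool: the pool dict hosting the active replica.
    pools_with_replicas: names of pools already holding a replica.
    """
    domain = active_pool.get("replication_domain")
    return [
        pool["name"]
        for pool in pools
        if pool.get("replication_domain") == domain
        and pool["name"] not in pools_with_replicas
    ]
```

Note that a second pool on the same backend as the active replica remains eligible, which matches the statement that replication between two pools of one backend is allowed.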
Drivers supporting writable style must set the replica_state attribute to active when the replica has been created and is available.
Deleting a replica
Users can remove replicas that have their replica_state attribute set to error, in_sync or out_of_sync. They could even delete an active replica as long as there is another active replica (as could be the case with writable replication style). Before the delete_replica call is made to the driver, an update_access call is made to ensure access rules are safely removed for the replica.
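The deletion rule can be expressed as a short check. This is a sketch under the assumption that replicas are plain dictionaries with id and replica_state keys; the function name is hypothetical and the administrator force-delete path is not covered.

```python
def can_delete_replica(replica, all_replicas):
    """Whether a user may delete this replica (illustrative only)."""
    state = replica["replica_state"]
    if state in ("error", "in_sync", "out_of_sync"):
        return True
    if state == "active":
        # Allowed only if some other replica of the share is also active,
        # as can happen with the writable replication style.
        return any(
            other["id"] != replica["id"] and other["replica_state"] == "active"
            for other in all_replicas
        )
    return False
```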
Administrators may also force-delete replicas. Any driver exceptions will only be logged and not re-raised; the replica will be purged from manila's database.
Promoting a replica
Users can promote replicas that have their replica_state attribute set to in_sync. Administrators can attempt to promote replicas that have their replica_state attribute set to out_of_sync or error. During a promotion, if the driver raises an exception, all replicas will have their status attribute set to error and recovery from this state will require administrator intervention.
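The permission rule and the failure handling can be sketched together. Everything here is hypothetical: replicas are plain dictionaries, driver_promote is a stand-in callable (not the real driver interface) that is assumed to return a mapping of replica id to new replica_state, and resetting the promoted replica's status to available on success is also an assumption.

```python
def may_promote(replica_state, is_admin):
    """Users may promote in_sync replicas; admins may also try others."""
    if replica_state == "in_sync":
        return True
    if replica_state in ("out_of_sync", "error"):
        return is_admin
    return False


def promote(replicas, target_id, driver_promote):
    """Promote one replica, marking every replica as error on failure."""
    target = next(r for r in replicas if r["id"] == target_id)
    target["status"] = "replication_change"  # busy state during promotion
    try:
        new_states = driver_promote(replicas, target)
    except Exception:
        # The driver failed part-way: all replicas are flagged for an admin.
        for replica in replicas:
            replica["status"] = "error"
        return False
    for replica in replicas:
        replica["replica_state"] = new_states.get(
            replica["id"], replica["replica_state"]
        )
    target["status"] = "available"  # assumption: back to a usable status
    return True


def failing_driver(replicas, target):
    """Stand-in driver call that always fails, for demonstration."""
    raise RuntimeError("backend unreachable")
```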
Resyncing a replica
Prior to a planned failover, an administrator could attempt to update the data on the replica. The update_replica_state call will be made during such an action, giving drivers an opportunity to push the latest updates from the active replica to the secondaries.
Creating a snapshot
When a user takes a snapshot of a share that has replicas, manila creates as many snapshot instances as there are share replicas. These snapshot instances all begin with their status attribute set to creating. The driver is expected to create the snapshot of the active replica and then begin to replicate this snapshot as soon as the active replica's snapshot instance is created and becomes available.
Deleting a snapshot
When a user deletes a snapshot, the snapshot instances corresponding to each replica of the share have their status attribute set to deleting. Drivers must update their secondaries as soon as the active replica's snapshot instance is deleted.
Driver Interfaces
As part of the _update_share_stats() call, the base driver reports the replication_domain capability. Drivers are expected to update the replication_type capability.
Drivers must implement the methods enumerated below in order to support replication. promote_replica, update_replica_state and update_replicated_snapshot need not be implemented by drivers that support the writable style of replication. The snapshot methods create_replicated_snapshot, delete_replicated_snapshot and update_replicated_snapshot need not be implemented by a driver that does not support snapshots.
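The exemptions in the paragraph above can be captured as a small helper that lists which methods a given driver needs. The helper itself is hypothetical; the method names come from the text, and it assumes create_replica and delete_replica are always required.

```python
def required_replication_methods(replication_type, supports_snapshots):
    """Method names a driver needs for replication (illustrative only)."""
    methods = {"create_replica", "delete_replica"}
    needs_sync_calls = replication_type != "writable"
    if needs_sync_calls:
        methods |= {"promote_replica", "update_replica_state"}
    if supports_snapshots:
        methods |= {"create_replicated_snapshot", "delete_replicated_snapshot"}
        if needs_sync_calls:
            methods.add("update_replicated_snapshot")
    return methods
```

A writable driver without snapshot support therefore needs only the two replica create/delete methods, while a dr or readable driver with snapshot support needs all seven.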
Each driver request is made on a specific host. Create/delete operations on secondary replicas are always made on the destination host. Create/delete operations on snapshots are always made on the active replica's host. update_replica_state and update_replicated_snapshot calls are made on the host that the replica or snapshot resides on.
Share Replica interfaces:
manila.share.driver.ShareDriver
Replicated Snapshot interfaces:
manila.share.driver.ShareDriver