Add spec for Share Migration Newton improvements
Share Migration has been upstream since Liberty. At the Newton Design Summit,
several aspects of the current design raised concerns and several items were
identified for improvement. To support reviewing the design, it was agreed
with the community that a full spec for the Share Migration feature would be
useful to discuss the current design and the improvement items. All
improvement items are included in this proposed spec.

APIImpact
Change-Id: I240d5305920f12b1df5e54ff5494830e3ddd377b
792
specs/newton/newton-migration-improvements.rst
Normal file
@@ -0,0 +1,792 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===================================
Newton Share Migration Improvements
===================================

https://blueprints.launchpad.net/manila/+spec/newton-migration-improvements

Share migration is a feature that allows an administrator to move a share
across backends. Ideally, the data moved should be exactly the same as before.
This operation is expected to be disruptive in most cases, because if data is
moved to another place or backend, the export location may need to change, thus
the client may need to re-mount the share. In order to handle this
disruptiveness, this spec proposes a 2-phase migration approach.

Problem description
===================

Once a share is created on a backend, it cannot be moved to another backend,
which is a problem in the scenarios listed under Use Cases. Administrators
would have to do it manually by creating an empty share in the destination
backend, mounting both shares, copying the data, and handling all possible
difficulties by themselves. This is also inefficient, because all the data
being copied has to be downloaded and re-uploaded. Since administrators of
many cloud environments are likely to face these scenarios, a feature that
performs this in a common way is justified.

Use Cases
=========

There are several scenarios for which a share may need to be migrated:

Administrator-oriented

* Maintenance/Evacuation

  * Evacuate a backend for hardware/software upgrades
  * Evacuate a backend experiencing failures
  * Evacuate a backend which is EOL

* Optimization

  * Defragment backends to create empty ones which can be taken offline to
    conserve power
  * Rebalance backends to maximize available performance
  * Move data and compute closer together to reduce network utilization and
    decrease latency/increase bandwidth

* True migration

  * Migrate from old hardware generation to a newer generation
  * Migrate from one vendor to another

User-oriented (through another feature, such as share-modify or share groups)

* Change share type
* Change AZ
* Change share protocol
* Change share network (for DHSS=True shares)
* Change share group
* Expand share when there's no available space

Proposed change
===============

This spec aims to re-evaluate the Share Migration design, including the
improvements discussed at the Austin Summit 2016.

There are two possible ways of doing migration:

1) Driver understands destination backend and is able to move at backend level.

2) Manila migrates the share's data to another share created in the destination
   backend, using the Data Service.

In (1), the backend may use different mechanisms such as replication or
snapshots. It may not require the share to be mounted, may be able to migrate
without changing the share to read-only or changing the export location, and
may also not be disruptive. When the operation is completed, it should return
a new list of export locations to update the database, when necessary. This
approach is referred to as the driver-assisted approach.

In (2), the share must be changed to read-only because a non-incremental copy
approach is used, which is expected to cause downtime to tenant applications.
This approach may not be able to preserve file metadata and is referred to as
the host-assisted approach.

It is very important to note that when migrating a share from one backend to
another, the destination backend needs to support exactly the same share
protocol as the source share's. Migration itself does not change share
protocols. Another feature called "Share Modify", which is yet to be
implemented, should be responsible for this operation and may use the share
migration feature to do so.

Some attributes, such as 'Share Network', 'Share Type' and 'Availability
Zone', may be modified through migration; the administrator can do so through
the API.

Since migration is expected to be disruptive, whether performed by the driver
(as in #1 above) or by manila code (as in #2 above), it was designed with a
2-phase possibility in mind. When invoking "migration-start", a share will be
migrated but will pause when the 1st phase is completed without any
disruptiveness, so the administrator can plan when to invoke
"migration-complete" to finish the migration, which may be disruptive.

Alternatives
------------

The alternative is to not implement the feature, in which case administrators
will need to migrate shares manually and less efficiently. Multiple cloud
providers would need to create scripts for doing this and handle each failure
scenario themselves. This spec proposes a common way of doing this.

Data model impact
-----------------

In order to track share migration's operations, support 2-phase migration and
properly handle errors, a field in the database is required. An additional
field in the "Share" table is able to meet these requirements until the Jobs
table (see [1]) is implemented. The field, named "task_state", works in a
similar way to a status and can be reset via the "reset-task-state" API.

In order to change share types during a migration, at a certain point a share
needs to have two instances with different types. At the moment, this is not
possible because the 'share_type_id' field is in the 'Shares' table in the
database. So this spec includes a database migration to move the
'share_type_id' field to the 'ShareInstances' table.

REST API impact
---------------

Five new admin-only API methods are introduced:

1) (POST, 202, 400, 409) migration-start: migrates share

URL: /shares/<share_id>/action

Body::

    {
        "migration_start": {
            "force_host_assisted_migration": false,
            "preserve_metadata": true,
            "writable": true,
            "nondisruptive": true,
            "host": "ubuntu@generic2#GENERIC2",
            "new_share_type_id": "foo_share_type_id",
            "new_share_network_id": "bar_share_network_id"
        }
    }

2) (POST, 202, 400) migration-complete: triggers 2nd phase of migration

URL: /shares/<share_id>/action

Body::

    {"migration_complete": {}}

3) (POST, 200, 400) migration-get-progress: attempts to obtain migration
   progress

URL: /shares/<share_id>/action

Body::

    {"migration_get_progress": {}}

Example response::

    RESP BODY: {
        "task_state": "data_copying_in_progress",
        "total_progress": 50
    }

4) (POST, 202, 400) migration-cancel: attempts to cancel migration

URL: /shares/<share_id>/action

Body::

    {"migration_cancel": {}}

5) (POST, 202, 400) reset-task-state: reset task state field value to desired
   one

URL: /shares/<share_id>/action

Body::

    {"reset_task_state": {"task_state": "migration_error"}}

API details:

1) ``migration-start [--force-host-assisted-migration <True/False>]
   [--preserve-metadata <True/False>] [--writable <True/False>]
   [--non-disruptive <True/False>] [--new-share-type <new_share_type>]
   [--new-share-network <new_share_network>] <share> <host@backend#pool>``

:force-host-assisted-migration (defaults to False): forces the host-assisted
  approach to be used, thus using the Data Service to copy data across
  backends. This skips the driver-assisted approach, which would otherwise be
  attempted first.

:preserve-metadata (defaults to True): whether migration should enforce the
  preservation of metadata. If set to True, this will prevent host-assisted
  migration from running. Drivers are queried to validate this capability, and
  if not capable, driver-assisted approach will be skipped and migration will
  fail.

:writable (defaults to True): whether migration should only be performed if
  share remains writable. If set to True, this will prevent host-assisted
  migration from running. Drivers are queried to validate this capability, and
  if not capable, driver-assisted approach will be skipped and migration will
  fail.

:non-disruptive (defaults to False): whether migration should only be performed
  if share access is not disrupted during migration. For this, it is also
  expected that the export location does not change. If set to True, this will
  prevent host-assisted migration from running. Drivers are queried to validate
  this capability, and if not capable, driver-assisted approach will be skipped
  and migration will fail.

:new-share-type (defaults to None): the new share type that should be set in
  the migrated share.

:new-share-network (defaults to None): the new share network that should be set
  in the migrated share.

:share: share to be moved.

:host@backend#pool: string in the host@backend#pool format identifying where
  the share should be migrated to.

2) ``migration-complete <share>``

:share: share on which migration should be completed. Share must be in
  host-assisted's "copy completed" or driver-assisted's
  "driver phase 1 completed" task state to have its phase 2 migration invoked.

3) ``migration-get-progress <share>``

:share: share from which migration progress should be obtained. The total
  progress is displayed along with the current task state value. If the share
  is not being migrated or the driver cannot obtain progress then an error
  message is returned.

4) ``migration-cancel <share>``

:share: share from which migration should be cancelled. Share must be in
  host-assisted's "copy completed" or driver-assisted's
  "driver phase 1 completed" task state to be cancellable.

5) ``reset-task-state [--task-state <state>] <share>``

:task-state (defaults to None): value to reset the task state field to.

:share: share from which the task state field should be reset to the value
  provided.
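
As an illustration of how the ``migration-start`` options map onto the request
body, the following minimal sketch builds the body with the defaults listed in
the API details. The helper names ``build_migration_start_body`` and
``action_url`` are hypothetical, and omitting the optional attributes when they
are not set is an assumption of this sketch, not something the spec mandates.

```python
import json


def build_migration_start_body(host, force_host_assisted_migration=False,
                               preserve_metadata=True, writable=True,
                               nondisruptive=False, new_share_type_id=None,
                               new_share_network_id=None):
    """Return a JSON-serializable body for the migration-start action.

    Defaults follow the API details in this spec; the helper itself is
    hypothetical.
    """
    params = {
        'host': host,
        'force_host_assisted_migration': force_host_assisted_migration,
        'preserve_metadata': preserve_metadata,
        'writable': writable,
        'nondisruptive': nondisruptive,
    }
    # Assumption: optional attributes are only sent when they are set.
    if new_share_type_id is not None:
        params['new_share_type_id'] = new_share_type_id
    if new_share_network_id is not None:
        params['new_share_network_id'] = new_share_network_id
    return {'migration_start': params}


def action_url(share_id):
    """All five migration actions are POSTed to the share action URL."""
    return '/shares/%s/action' % share_id


if __name__ == '__main__':
    body = build_migration_start_body(
        'ubuntu@generic2#GENERIC2', preserve_metadata=False, writable=False)
    print(action_url('<share_id>'))
    print(json.dumps(body, indent=4, sort_keys=True))
```

With ``preserve_metadata``, ``writable`` and ``nondisruptive`` all False, as in
the usage above, the request does not rule out the host-assisted approach.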

Driver impact
-------------

Vendors can implement the driver-assisted migration in their drivers in order
to migrate data efficiently across backends from the same vendor or using
vendor-compatible protocols.

In order to support host-assisted migration, existing drivers which run in
DHSS=False mode should not need to implement any additional code, while code to
handle their share protocol should be present in the Data Service and the
networks need to be manually set up. However, it is highly recommended to
support the admin network so that less network configuration effort is
required.

For DHSS=True mode drivers, if existing drivers do not have admin network
support to allow connectivity between shares and nodes in the admin network,
the host-assisted approach will not work.

Add driver interfaces::

    def migration_check_compatibility(
            self, context, source_share, destination_share,
            share_server=None, destination_share_server=None):
        """Checks destination compatibility for migration of a given share.

        .. note::
            Is called to test compatibility with destination backend.

        Driver should check if it is compatible with destination backend so
        driver-assisted migration can proceed.

        :param context: The 'context.RequestContext' object for the request.
        :param source_share: Reference to the share to be migrated.
        :param destination_share: Reference to the share model to be used by
            migrated share.
        :param share_server: Share server model or None.
        :param destination_share_server: Destination Share server model or
            None.
        :return: A dictionary containing values indicating if destination
            backend is compatible, if share can remain writable during
            migration, if it can preserve all file metadata and if it can
            perform migration of given share non-disruptively.

            Example::

                {
                    'compatible': True,
                    'writable': True,
                    'preserve_metadata': True,
                    'nondisruptive': True,
                }
        """
        return {
            'compatible': False,
            'writable': False,
            'preserve_metadata': False,
            'nondisruptive': False,
        }

    def migration_start(
            self, context, source_share, destination_share,
            share_server=None, destination_share_server=None):
        """Starts migration of a given share to another host.

        .. note::
            Is called in source share's backend to start migration.

        Driver should implement this method if willing to perform migration
        in a driver-assisted way, useful for when source share's backend
        driver is compatible with destination backend driver. This method
        should start the migration procedure in the backend and return. The
        following steps should be done in 'migration_continue'.

        :param context: The 'context.RequestContext' object for the request.
        :param source_share: Reference to the original share model.
        :param destination_share: Reference to the share model to be used by
            migrated share.
        :param share_server: Share server model or None.
        :param destination_share_server: Destination Share server model or
            None.
        """
        raise NotImplementedError()

    def migration_continue(
            self, context, source_share, destination_share,
            share_server=None, destination_share_server=None):
        """Continues migration of a given share to another host.

        .. note::
            Is called in source share's backend to continue migration.

        Driver should implement this method to continue monitoring the
        migration progress in storage and perform the following steps until
        the 1st phase is completed.

        :param context: The 'context.RequestContext' object for the request.
        :param source_share: Reference to the original share model.
        :param destination_share: Reference to the share model to be used by
            migrated share.
        :param share_server: Share server model or None.
        :param destination_share_server: Destination Share server model or
            None.
        :return: Boolean value to indicate if 1st phase is finished.
        """
        raise NotImplementedError()

    def migration_complete(
            self, context, source_share, destination_share,
            share_server=None, destination_share_server=None):
        """Completes migration of a given share to another host.

        .. note::
            Is called in source share's backend to complete migration.

        If driver is implementing 2-phase migration, this method should
        perform the disruptive tasks related to the 2nd phase of migration,
        thus completing it. Driver should also delete all original share data
        from source backend.

        :param context: The 'context.RequestContext' object for the request.
        :param source_share: Reference to the original share model.
        :param destination_share: Reference to the share model to be used by
            migrated share.
        :param share_server: Share server model or None.
        :param destination_share_server: Destination Share server model or
            None.
        :return: List of export locations to update the share with.
        """
        raise NotImplementedError()

    def migration_cancel(
            self, context, source_share, destination_share,
            share_server=None, destination_share_server=None):
        """Cancels migration of a given share to another host.

        .. note::
            Is called in source share's backend to cancel migration.

        If possible, driver can implement a way to cancel an in-progress
        migration.

        :param context: The 'context.RequestContext' object for the request.
        :param source_share: Reference to the original share model.
        :param destination_share: Reference to the share model to be used by
            migrated share.
        :param share_server: Share server model or None.
        :param destination_share_server: Destination Share server model or
            None.
        """
        raise NotImplementedError()

    def migration_get_progress(
            self, context, source_share, destination_share,
            share_server=None, destination_share_server=None):
        """Obtains progress of migration of a given share to another host.

        .. note::
            Is called in source share's backend to obtain migration progress.

        If possible, driver can implement a way to return migration progress
        information.

        :param context: The 'context.RequestContext' object for the request.
        :param source_share: Reference to the original share model.
        :param destination_share: Reference to the share model to be used by
            migrated share.
        :param share_server: Share server model or None.
        :param destination_share_server: Destination Share server model or
            None.
        :return: A dictionary with at least 'total_progress' field containing
            the percentage value.
        """
        raise NotImplementedError()

    def connection_get_info(self, context, share, share_server=None):
        """Is called to provide necessary generic migration logic.

        :param context: The 'context.RequestContext' object for the request.
        :param share: Reference to the share being migrated.
        :param share_server: Share server model or None.
        :return: A dictionary with migration information.
        """

``connection_get_info`` has a default implementation that can be overridden.

The general approach for a driver-assisted migration is that drivers will be
invoked to analyze compatibility with the destination backend and return a
dictionary containing information that describes whether they are compatible
and which capabilities, such as preserving metadata, remaining writable and
being non-disruptive, are supported. To obtain information to perform this
analysis, drivers are advised to read other related backends' data from the
manila configuration file. Ideally, they should be talking to each other
through RPC calls, but sensitive data such as passwords should not be included
in RPC responses, so such an approach cannot be taken at this moment. At this
point, it is recommended that drivers also test for connectivity with the
destination backend.

If the destination share has a share network ID defined, it is implied that the
share requires a share server, so manila will request one from the destination
backend. If a suitable share server does not exist, manila will have it created
on the destination backend.

Then, the migration_start method of the source backend driver is invoked to
perform migration. This method should start the migration job in the storage
and return. Manila will invoke the migration_continue method from a periodic
task so the driver can perform subsequent steps to continue migration until
the first phase is completed, at which point the driver should indicate so in
its return value.

The driver should also make sure that, while migration is not completed, it
is possible to revert to the source share instance and its data remains
intact.

Finally, the administrator will invoke migration-complete to perform the last,
possibly disruptive, steps of migration. Drivers should remove the source share
at this moment. Manila will also apply the existing access rules to the
destination instance using the update_access driver interface.
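
The call sequence described above can be sketched as follows. This is only an
illustration under stated assumptions: ``FakeDriver`` is a hypothetical driver
whose first phase finishes after three polls, ``run_first_phase`` stands in
for manila's periodic task with a plain loop, and the export location is made
up.

```python
class FakeDriver(object):
    """Hypothetical driver whose first phase finishes after three polls."""

    def __init__(self):
        self.polls = 0
        self.calls = []

    def migration_start(self, context, source_share, destination_share,
                        share_server=None, destination_share_server=None):
        # Start the migration job in the storage and return immediately.
        self.calls.append('start')

    def migration_continue(self, context, source_share, destination_share,
                           share_server=None, destination_share_server=None):
        # Return True once the 1st phase is finished.
        self.polls += 1
        self.calls.append('continue')
        return self.polls >= 3

    def migration_complete(self, context, source_share, destination_share,
                           share_server=None, destination_share_server=None):
        # Perform the possibly disruptive 2nd phase; return export locations.
        self.calls.append('complete')
        return ['10.0.0.2:/shares/migrated']  # made-up export location


def run_first_phase(driver, context, src, dest, max_polls=10):
    """Start migration, then poll the way a periodic task would."""
    driver.migration_start(context, src, dest)
    for _ in range(max_polls):
        if driver.migration_continue(context, src, dest):
            return True
    return False


driver = FakeDriver()
source, destination = {'id': 'src'}, {'id': 'dest'}
phase1_done = run_first_phase(driver, None, source, destination)

# The administrator invokes migration-complete only after phase 1 is done.
export_locations = (
    driver.migration_complete(None, source, destination)
    if phase1_done else [])
```

Keeping ``migration_start`` non-blocking and moving the waiting into repeated
``migration_continue`` calls is what lets the share manager keep serving other
requests while the backend works.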

Additionally, if the current base driver class implementation for several
methods used by the host-assisted migration is not supported by a driver, the
driver can override those methods adding a special behavior to support the
host-assisted migration approach.

Security impact
---------------

In order to access a share's data, it must be mounted by an entity. The entity
mounting the share is responsible for the data. During migration, if the
host-assisted approach is used, the Data Service will be mounting the migrating
share and copying data, thus it will expose the share's contents to the Data
Service node during a limited period of time. The Data Service is accessible
through the administrator network, thus it grants access to data to whoever is
able to connect to the Data Service. Restricted access to this node is advised.

This change includes new entries in the rootwrap permission file for the
following commands, which must be run as root:

- ls -pA1 --group-directories-first %s
- touch --reference=%s %s

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

All the copy-related commands are resource-intensive and should be run in a
separate node where only the Data Service is installed, thus not disrupting the
other services.

Other deployer impact
---------------------

New configuration options are introduced. Most have default values, while a few
others require an administrator to input them. In order for the Data Service
to handle mounting shares of several different protocols, it needs to be
configured:

* The node must be set up in the admin network, and the config option
  'data_node_access_ip' must be set to the IP address of the node interface
  that connects it to the admin network. This is enough to mount shares whose
  access rules are IP-based.

* Protocol libraries, such as those for NFS and CIFS, need to be installed in
  this node.

* For protocols whose access rules are certificate-based, the certificate
  needs to be installed and the config option 'data_node_access_cert' must be
  set.

* For protocols whose access rules are user-based, the user must be configured
  in the node and in the backend security service as an administrator. The
  username must be set in the 'data_node_access_admin_user' config option and
  the 'data_node_mount_options' config option must be set with the command
  parameters that include the username, password and domains required to mount
  as the admin user.

* Protocols other than NFS and CIFS have not been tested; they may work if
  their access type is included among the supported ones.

* In order to properly check compatibility with destination backends, drivers
  will rely on their local configuration files to read information about other
  backends, so it is advisable that deployers try to keep configuration files
  of multiple manila-share nodes synchronized and the latest values loaded in
  the services' memory.

Developer impact
----------------

Driver vendors and CI maintainers are advised to enable migration tests to
validate whether the host-assisted approach works for their respective drivers
and share protocols.

Implementation
==============

Upon receiving the API request to migrate a share, the Share API layer will
perform the following validations:

Check if share has replicas:

* If True, return error 409 (Conflict). Migration of a share with replicas is
  not handled at this moment.

Check share's status:

* If not available, return error 400 (Invalid).

Check if share is busy with another task:

* If busy, return error 400 (Invalid).

Check if destination host is different:

* If it is the same, return error 400 (Invalid).

Check if there are snapshots:

* If there are, return error 400 (Invalid). Migration of a share with
  snapshots is not handled at this moment.

Check if destination host is available:

* If it is not, return error 400 (Invalid).

Check if the new_share_type and share_network_id supplied exist:

* If not, return error 400 (Invalid).
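
The validation order above can be sketched as a single function. This is a
minimal illustration, not manila's API code: the share is a plain dictionary
and its keys (``has_replicas``, ``is_busy``, ``has_snapshots`` and so on), the
function name and the reason strings are all hypothetical.

```python
def validate_migration_request(share, destination_host, available_hosts,
                               known_share_types, known_share_networks,
                               new_share_type_id=None,
                               new_share_network_id=None):
    """Return (status_code, reason), checking in the order listed above.

    202 means the request would be accepted. All dictionary keys used here
    are hypothetical stand-ins for the real share model.
    """
    if share['has_replicas']:
        return 409, 'migration of a share with replicas is not handled'
    if share['status'] != 'available':
        return 400, 'share is not available'
    if share['is_busy']:
        return 400, 'share is busy with another task'
    if share['host'] == destination_host:
        return 400, 'destination host is the same as the source host'
    if share['has_snapshots']:
        return 400, 'migration of a share with snapshots is not handled'
    if destination_host not in available_hosts:
        return 400, 'destination host is not available'
    if (new_share_type_id is not None
            and new_share_type_id not in known_share_types):
        return 400, 'new share type does not exist'
    if (new_share_network_id is not None
            and new_share_network_id not in known_share_networks):
        return 400, 'new share network does not exist'
    return 202, 'accepted'


example_share = {
    'has_replicas': False,
    'status': 'available',
    'is_busy': False,
    'host': 'ubuntu@generic1#GENERIC1',
    'has_snapshots': False,
}
result = validate_migration_request(
    example_share, 'ubuntu@generic2#GENERIC2',
    available_hosts={'ubuntu@generic2#GENERIC2'},
    known_share_types={'foo_share_type_id'},
    known_share_networks={'bar_share_network_id'},
    new_share_type_id='foo_share_type_id')
```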

If all validations succeed, it should set task_state to MIGRATION_STARTING and
invoke the scheduler asynchronously to validate the host against the share
type. If host validation fails, the scheduler will set task_state to
MIGRATION_ERROR (no notification). Otherwise, it will invoke the source
share's manager, also asynchronously, to proceed with migration.

If new_share_type is supplied, it will be used when validating the host in the
scheduler instead of the share's original one. The new_share_network, if
supplied, will be used when creating the share instance model that will be used
by the migrated share, thus triggering the creation of a new share server, if
necessary.

At the share manager, it will change the share's task_state to
MIGRATION_IN_PROGRESS and instance status to MIGRATING. Then, it will prepare
to invoke the driver-assisted migration if the force_host_assisted_migration
API parameter is set to False.

First, it will attempt to perform the driver-assisted migration, by creating a
destination share instance model, obtaining a share server for it and invoking
a method that checks for compatibility. If it succeeds and the returned
capabilities correspond to the supplied API values for 'writable',
'preserve_metadata' and 'nondisruptive', the task_state is set to
MIGRATION_DRIVER_STARTING and the driver's migration_start method is invoked,
in which the driver is expected to start the migration job and return a list of
export locations to access the destination instance, if possible at this point.

At this point, the task state is set to MIGRATION_DRIVER_IN_PROGRESS and a
periodic task runs to invoke the driver's migration_continue method to perform
the next steps of migration until it returns True, signaling that the first
migration phase has completed, allowing the task state to be set to
MIGRATION_DRIVER_PHASE1_DONE.

If any exception is raised before the first migration phase is completed, all
data allocated, such as the destination share instance model and share server,
is cleaned up. If an exception is raised during the second migration phase,
data is not cleaned up so the administrator can analyze the failure and
possibly fix it manually.

If the driver-assisted migration fails up to the migration_start driver call,
the host-assisted approach takes over, provided that the supplied API values
for 'preserve_metadata', 'writable' and 'nondisruptive' are all False.
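
The selection between the two approaches can be sketched as below. This is one
reading of the rules in this spec, not manila's implementation: it assumes
that "capabilities correspond to the supplied API values" means every
capability requested as True must be reported as True by the driver, and the
function name ``choose_approach`` and its return strings are hypothetical.

```python
CAPABILITIES = ('writable', 'preserve_metadata', 'nondisruptive')


def choose_approach(force_host_assisted, requested, compatibility):
    """Pick 'driver-assisted', 'host-assisted' or 'error'.

    :param force_host_assisted: the force_host_assisted_migration API value.
    :param requested: dict with the API values for each capability.
    :param compatibility: dict shaped like the return value of
        migration_check_compatibility.
    """
    if not force_host_assisted and compatibility.get('compatible'):
        # Assumption: a capability requested as True must be supported.
        satisfied = all(
            compatibility.get(cap, False)
            for cap in CAPABILITIES if requested[cap])
        if satisfied:
            return 'driver-assisted'
    # Host-assisted migration only runs if no capability was required.
    if not any(requested[cap] for cap in CAPABILITIES):
        return 'host-assisted'
    return 'error'


nothing_required = dict.fromkeys(CAPABILITIES, False)
writable_required = dict(nothing_required, writable=True)
incompatible = {'compatible': False}
capable = {'compatible': True, 'writable': True,
           'preserve_metadata': False, 'nondisruptive': False}

fallback = choose_approach(False, nothing_required, incompatible)
via_driver = choose_approach(False, writable_required, capable)
rejected = choose_approach(True, writable_required, capable)
```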

The host-assisted approach consists of changing all of the share's access
rules to read-only through the driver (rules are not changed in the DB),
creating a new share in the destination host through RPCAPI asynchronously and
waiting for it to have "available" status, obtaining the connection_info
dictionaries for the source and destination backends and invoking the Data
Service asynchronously to perform the migration.

The RPCAPIs for the Data Service include "migration_start", which performs the
data copy with regard to the logic for migration (like setting proper statuses
and notifying the source backend when necessary), and "data_copy_cancel" and
"data_copy_get_progress", which are detailed below and are applicable to any
data copy job.

The connection_info dictionary consists of the access mapping compatible with
the share being migrated and two templates, one for the mount command and one
for the unmount command. The base driver class has a default implementation
for these templates, which can be overridden if the driver requires a
particular custom behavior. Both command templates can be customized through
the manila.conf configuration file for each backend section, although the Data
Service expects at least the "%(path)s" placeholder to be present in the
template so it can be replaced by the appropriate export location. Other
template elements need to be overridden by customizing the
"connection_get_info" method.

The default mount template is:
``mount -vt %(proto)s %(options)s %(export)s %(path)s``

The default unmount template is: ``umount -v %(path)s``
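
To illustrate how such templates expand, the sketch below fills them with
Python's ``%`` dictionary formatting, assuming every placeholder ends in the
``s`` conversion (``%(options)s``). The ``render_commands`` helper and all the
values passed to it are hypothetical, and filling every placeholder in one
step is a simplification of the split between the driver and the Data Service.

```python
MOUNT_TEMPLATE = 'mount -vt %(proto)s %(options)s %(export)s %(path)s'
UNMOUNT_TEMPLATE = 'umount -v %(path)s'


def render_commands(proto, export, path, options=''):
    """Expand both templates into argument lists (hypothetical helper)."""
    values = {
        'proto': proto,
        'options': options,
        'export': export,
        'path': path,
    }
    # split() without arguments also drops the gap left by empty options.
    mount = (MOUNT_TEMPLATE % values).split()
    unmount = (UNMOUNT_TEMPLATE % values).split()
    return mount, unmount


nfs_mount, nfs_unmount = render_commands(
    'nfs', '10.0.0.1:/shares/share_1', '/tmp/mnt/share_1')
cifs_mount, _ = render_commands(
    'cifs', '//10.0.0.1/share_1', '/tmp/mnt/share_1',
    options='-o user=foo,pass=bar')
```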

The following access mapping is predefined in the driver base class::

    {
        'ip': ['nfs'],
        'user': ['cifs'],
    }

The Data Service does the hard work of migration. It is responsible for calling
the API to add the proper access rule to be able to mount both the source and
destination shares, then mounting, copying, unmounting, and deleting the
access rules.

The proper access rule type is determined according to the share's protocol
and the access mapping configured by the driver. The share's protocol selects
the access types where the protocol entry is present, and the result is then
intersected with the other backend's access mapping to add an access rule that
does not cause errors for any of the involved drivers.
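
The selection and intersection described above can be sketched with plain
dictionaries and sets. The helper names are hypothetical, and the second
mapping is a made-up example of a backend that overrides the default.

```python
DEFAULT_ACCESS_MAPPING = {
    'ip': ['nfs'],
    'user': ['cifs'],
}


def access_types_for(protocol, mapping):
    """Access types whose protocol list contains the share's protocol."""
    proto = protocol.lower()
    return {access_type for access_type, protocols in mapping.items()
            if proto in protocols}


def common_access_types(protocol, source_mapping, destination_mapping):
    """Access types usable on both the source and destination backends."""
    return (access_types_for(protocol, source_mapping)
            & access_types_for(protocol, destination_mapping))


# Made-up destination backend that also accepts IP rules for CIFS.
custom_mapping = {
    'ip': ['nfs', 'cifs'],
    'user': ['cifs'],
}

cifs_types = common_access_types(
    'CIFS', DEFAULT_ACCESS_MAPPING, custom_mapping)
nfs_types = common_access_types(
    'nfs', DEFAULT_ACCESS_MAPPING, custom_mapping)
```

Only the access types present on both sides survive, so a rule type that one
of the drivers cannot handle is never added.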

All access rules related to the access types present in the intersected access
mapping will be added and later removed after migration. To properly fill the
'access_to' field of the access rule entries, the Data Service reads a config
option for each type, as follows:

* If access type is 'user', it will read the 'data_node_access_admin_user'
  config option. In this case, it is also expected that the administrator has
  filled the 'data_node_mount_options' option, for example with
  '-o user=foo,pass=bar', if that is necessary to mount the share.

* If access type is 'cert', it will read the 'data_node_access_cert' config
  option.

* If access type is 'ip', it will read the 'data_node_access_ip' config
  option.

* Otherwise, it will raise an error stating that the access type provided is
  not supported.
|
||||
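The lookup in the list above can be sketched as a small table. The option names
come from this spec; the function ``access_to_for`` and the plain dictionary
standing in for the loaded configuration are assumptions made for
illustration.

```python
# Config option read for each access type, as listed in the spec.
ACCESS_TO_OPTION = {
    "user": "data_node_admin_user",
    "cert": "data_node_access_cert",
    "ip": "data_node_access_ip",
}


def access_to_for(access_type, conf):
    """Return the 'access_to' value for an access type.

    ``conf`` is a plain dict standing in for the loaded configuration.
    """
    try:
        option = ACCESS_TO_OPTION[access_type]
    except KeyError:
        raise ValueError("access type %r is not supported" % access_type)
    value = conf.get(option)
    if not value:
        raise ValueError("config option %s is not set" % option)
    return value


# Hypothetical configuration values.
conf = {"data_node_access_ip": "192.0.2.20", "data_node_admin_user": "admin"}
print(access_to_for("ip", conf))
```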
|
||||
It is expected that the Data Service node is configured properly in the admin
network, has the proper libraries and certificates installed, and has the
security service user configured.
|
||||
|
||||
The copy work is done by a specialized copy class. This class is responsible
for iterating through files recursively, copying them while attempting to
preserve all metadata (cp -P --preserve=all), optionally verifying that the
SHA-2 hashes of the source and destination match, and finally attempting to
apply the metadata of all the files. All operations are performed as root so
that user restrictions can be bypassed during the copy and set back on the
files afterwards. Since all copy operations are performed by running Linux
commands through rootwrap, the Data Service thread sleeps until the process
exits, thus allowing the Data Service to process multiple RPC requests while
the node is copying bytes. If any of the above-mentioned operations fails or is
not validated, it will be retried once; if it fails a second time, an error is
returned and the migration fails.
|
||||
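The copy, verify, and retry-once behavior for a single file can be sketched as
follows. This is a simplified stand-in: it uses ``shutil.copy2`` instead of
running ``cp -P --preserve=all`` through rootwrap, picks SHA-256 as the SHA-2
variant, and the names ``sha256_of`` and ``copy_with_retry`` are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_retry(src, dest, verify=True, attempts=2):
    """Copy one file, optionally verify hashes, and retry once on failure."""
    last_error = None
    for _ in range(attempts):
        try:
            shutil.copy2(src, dest)  # copies data and basic metadata
            if verify and sha256_of(src) != sha256_of(dest):
                raise OSError("hash mismatch between %s and %s" % (src, dest))
            return
        except OSError as exc:
            last_error = exc
    # Both attempts failed, so the caller should fail the migration.
    raise RuntimeError("copy of %s failed" % src) from last_error


# Small self-contained demonstration in a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.bin")
dest_path = os.path.join(workdir, "dest.bin")
with open(src_path, "wb") as handle:
    handle.write(b"share data")
copy_with_retry(src_path, dest_path)
```

With ``attempts=2`` the operation is tried, retried once, and then reported as
failed, which matches the policy described above.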
|
||||
After copying, the Data Service will set the task state to
|
||||
DATA_COPYING_COMPLETED, allowing the admin to invoke migration-complete.
|
||||
|
||||
The share manager's migration_complete method will first check the migrating
share's task_state value to decide whether to invoke the driver's
migration_complete method or the host-assisted one. The driver is expected to
perform the last disruptive steps of migration and return the list of export
locations pertaining to the migrated instance. The host-assisted
migration_complete applies the access rules to the new share according to the
DB (the original rules), sets the destination share status to "available",
deletes the source share and sets the task_state to MIGRATION_SUCCESS.
|
||||
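The dispatch on task_state can be sketched with plain dictionaries. The state
names DATA_COPYING_COMPLETED and MIGRATION_SUCCESS come from this spec; their
string values, the ``DRIVER_PHASE1_DONE`` state, and the dictionary layout are
assumptions made only for this illustration.

```python
DATA_COPYING_COMPLETED = "data_copying_completed"  # string value assumed
MIGRATION_SUCCESS = "migration_success"            # string value assumed
DRIVER_PHASE1_DONE = "driver_phase1_done"          # hypothetical state name


def migration_complete(migration, driver_complete):
    """Finish a migration through the driver or the host-assisted path."""
    state = migration["task_state"]
    dest = migration["destination"]
    if state == DRIVER_PHASE1_DONE:
        # The driver performs the last disruptive steps and returns the
        # export locations of the migrated instance.
        dest["export_locations"] = driver_complete(migration)
    elif state == DATA_COPYING_COMPLETED:
        # Host-assisted path: reapply the original rules, publish the
        # destination, drop the source, and mark the migration successful.
        dest["access_rules"] = list(migration["source"]["access_rules"])
        dest["status"] = "available"
        migration["source"] = None  # stands in for deleting the source share
        migration["task_state"] = MIGRATION_SUCCESS
    else:
        raise ValueError("cannot complete migration in state %r" % state)
    return migration


# Hypothetical migration record for the host-assisted case.
record = {
    "task_state": DATA_COPYING_COMPLETED,
    "source": {"access_rules": [{"access_type": "ip", "access_to": "10.0.0.5"}]},
    "destination": {"status": "migrating"},
}
migration_complete(record, driver_complete=lambda m: [])
```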
|
||||
The migration_cancel API can only be invoked during copying or after the first
phase is completed. The migration_get_progress API can be invoked at any time,
but it will only query the driver or the Data Service for progress while they
are at the step of migrating or copying files, respectively.
|
||||
|
||||
Assignee(s)
-----------

Primary assignee:
  ganso

Work Items
----------

* Implement changes agreed in this spec. Since they are improvements to
  existing code, they could be implemented as a single patch.
* Update python-manilaclient with the CLI commands.
* Update manila-ui.
* Document the implementation (see below).

Dependencies
============

* Update Access interface implemented in drivers.

Testing
=======

* Unit tests
* Tempest tests

Documentation Impact
====================

- Docstrings
- Devref
- Security guide
- User guide
- Release notes
|
||||
|
||||
References
==========

[1] Newton design summit etherpad discussion::

    https://etherpad.openstack.org/p/newton-manila-data-service-migration

[2] Mitaka design summit etherpad discussion::

    https://etherpad.openstack.org/p/mitaka-manila-migration-improvements

[3] Mitaka merged main patches::

    https://review.openstack.org/#/c/244286/
    https://review.openstack.org/#/c/250515/

[4] Liberty design summit etherpad discussion::

    https://etherpad.openstack.org/p/YVR-manila-liberty-share-migration

[5] Liberty merged main patch::

    https://review.openstack.org/#/c/179790/

[6] Access support mapping::

    http://docs.openstack.org/developer/manila/devref/share_back_ends_feature_support_mapping.html
|
||||
|
||||