Update Protection Plugin Design

Implements: blueprint protection-plugin-is-design

Change-Id: Iffd26f3a85346f7c29b93bc97d23d3472acb1f0f
Yuval Brik 2016-05-19 14:39:44 +03:00
parent 09162ed50c
commit f35b04ed7c
4 changed files with 508 additions and 65 deletions

Binary file not shown.


@ -4,51 +4,185 @@
http://creativecommons.org/licenses/by/3.0/legalcode
.. raw:: html
<style>
.red {color:#d32f2f; font-weight: bold;}
.green {color:#4caf50; font-weight: bold;}
.yellow {color:#fbc02d; font-weight: bold;}
.indigo {color:#536dfe; font-weight: bold;}
</style>
.. role:: red
.. role:: green
.. role:: yellow
.. role:: indigo
==========================================
Pluggable Protection Provider
==========================================
https://blueprints.launchpad.net/smaug/+spec/operation-engine-design
https://blueprints.launchpad.net/smaug/+spec/protection-plugin-is-design
Problem Description
Protection Provider
===================
Even though we allow each provider to be implemented in any way it pleases, we
foresee that most providers will want to be able to share code between them.
We would also like a user to be able to easily extend the ProtectionProvider
that will be provided by default.
Protection Provider is a user-facing, configurable, pluggable entity that
answers the questions "how to" and "where to" by composing a bank store
(responsible for the "where to") with different *Protection Plugins* (each
responsible for the "how to"). The Protection Provider is configurable, both
in terms of which bank and protection plugins it composes, and in terms of
their configuration.
Proposed Change
===============
The protection provider will contain internally, a map between any registered
*Protectable* (OpenStack resource type) and a corresponding *Protection
Plugin*, which is used for operations related to any appropriate resource.
As a solution we propose the *Pluggable Protection Provider*.
A *Protection Provider* supports 3 resource operations, which every
*Protection Plugin* needs to implement. These operations usually act on
numerous resources, and the *Protection Provider* infrastructure is responsible
for using the corresponding *Protection Plugin* implementation for each
resource. The *Protection Provider* is responsible for initiating a DFS
traversal of the resource graph, building tasks for each of the resources, and
linking them with respect to execution order and dependency.
The *Pluggable Protection Provider* will be the reference implementation of a
protection provider. Its purpose is to be fully pluggable and extendable so
that only extreme use cases will need to implement their own Protection
Provider from scratch.
#. **Protect**: the protection provider will traverse the selected resources
from the resource graph
#. **Restore**: the protection provider will traverse the resource graph saved
in the checkpoint
#. **Delete**: the protection provider will traverse the resource graph saved
in the checkpoint
The protection provider will contain internally a map between any registered
*Protectable* and a corresponding *Protection Plugin*. When the pluggable
protection provider is asked to perform an action, it will walk over the
graph and pass a context object to the appropriate plugin whenever a node is
encountered.
The resource graph is traversed with DFS. When a node is first encountered
the protection manager gets the plugin for the appropriate resource type, builds
a context and passes it to the plugin's `get_pre_task()` method. The plugin can
return any tasks that it wants added to the task list. When all of a node's
children have been visited the `get_pre_task()` is called. The task returned
from this method will also be added to the task list but is also guaranteed to
execute after all the child nodes' tasks have finished. Any of the methods can
return `None` if they don't want any action performed.
After the entire graph has been traversed, the Protection Provider will return
the task flow which will be queued and then executed according to the
executor's policy. When all the tasks are done the operation is considered
complete.
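The traversal described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: ``Resource`` and
``build_task_order`` are hypothetical names, not part of smaug, and the sketch
produces a flat, sequential ordering, whereas the real provider builds a task
flow whose independent tasks may run in parallel.

```python
from collections import namedtuple

# Hypothetical resource node: a type, an id, and the resources it depends on.
Resource = namedtuple("Resource", ["type", "id", "children"])


def build_task_order(roots):
    """Walk the resource graph depth-first and return (stage, id) pairs.

    A "pre" entry is emitted when a node is first encountered and a "post"
    entry once all of its children have been visited.
    """
    order = []
    visited = set()

    def visit(node):
        if node.id in visited:   # a shared resource is handled only once
            return
        visited.add(node.id)
        order.append(("pre", node.id))
        for child in node.children:
            visit(child)
        order.append(("post", node.id))

    for root in roots:
        visit(root)
    return order


volume = Resource("volume", "vol-1", [])
server = Resource("server", "srv-1", [volume])
print(build_task_order([server]))
```

In this ordering the volume's entries are nested inside the server's, which
matches the guarantee that a parent's final task runs after its children's.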
Protection Provider Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Protection Providers are loaded from configuration files, placed in the
directory specified by the ``provider_config_dir`` configuration option (by
default: ``/etc/smaug/providers.d``). Each provider configuration file must
bear the ``.conf`` suffix and contain a ``[provider]`` section. This section
specifies the following configuration:
#. ``name``: the display name of the protection provider
#. ``id``: unique identifier
#. ``description``: textual description
#. ``bank``: path to the bank plugin
#. ``plugin``: path to a protection plugin. Should be specified multiple times
for multiple protection plugins. Every *Protectable* **must** have a
corresponding *Protection Plugin* to support it.
Additionally, the provider configuration file can include other sections
(besides the ``[provider]`` section), to be used as configuration for each bank
or protection plugin.
For example::
[provider]
name = Foo
id = 2e0c8826-81d6-44f5-bbe5-8f46a98c5845
description = Example Protection Provider
bank = smaug.protections.smaug-swift-bank-plugin
plugin = smaug.protections.smaug-volume-protection-plugin
plugin = smaug.protections.smaug-image-protection-plugin
plugin = smaug.protections.smaug-server-protection-plugin
plugin = smaug.protections.smaug-project-protection-plugin
[swift_client]
bank_swift_auth_url = http://10.0.0.10:5000
bank_swift_user = admin
bank_swift_key = password
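Because the ``plugin`` key may appear several times in one section, a loader
has to keep every occurrence rather than only the last one (Python's
``configparser``, in its default strict mode, rejects duplicate options
outright). The sketch below shows one way to read such a file. It is a
hand-written parser for illustration only; ``parse_provider_config`` is a
hypothetical helper, and how smaug itself loads these files is not shown here.

```python
def parse_provider_config(text):
    """Parse INI-style text into {section: {key: [values]}}.

    Values are kept in lists so that a key repeated within a section,
    such as ``plugin``, keeps every occurrence.
    """
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None or "=" not in line:
            raise ValueError("unexpected line: %r" % raw_line)
        key, value = (part.strip() for part in line.split("=", 1))
        current.setdefault(key, []).append(value)
    return sections


EXAMPLE = """
[provider]
name = Foo
bank = smaug.protections.smaug-swift-bank-plugin
plugin = smaug.protections.smaug-volume-protection-plugin
plugin = smaug.protections.smaug-image-protection-plugin

[swift_client]
bank_swift_user = admin
"""

config = parse_provider_config(EXAMPLE)
print(config["provider"]["plugin"])
```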
Protection Plugin
=================
A *Protection Plugin* is a component responsible for the implementation of
operations (protect, restore, delete) of one or more *Protectable* (i.e.,
resource type). When writing a *Protection Plugin*, the following needs to be
defined:
#. Which resources does the protection plugin support
#. What is the schema of parameters for each operation
#. What is the schema of information the protection plugin stores in a
Checkpoint
#. The implementation of each operation
Protection Plugin Operation Activities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A *Protection Plugin* defines how to protect, restore, and delete resources. In
order to specify the detailed flow of each operation, a *Protection Plugin*
needs to implement a number of 'hooks'. These hooks, named *Activities*, differ
from one another by their time of execution with respect to other activities,
either of the same resource or of other resources.
#. **PreActivity**: invoked before any activity for this resource and dependent
resources has begun
#. **ParallelActivity**: invoked after the resource *PreActivity* is complete,
regardless of the dependent resources' activities.
#. **PostActivity**: invoked after all of the resource's activities are
complete, and the dependent resources' *PostActivities* are complete
For example, a Protection Plugin for Nova servers might implement a protect
operation by using *PreActivity* to contact a guest agent in order to complete
database and operating system transactions, *ParallelActivity* to back up
the server metadata, and *PostActivity* to contact the guest agent again in
order to resume transactions.
Practically, the protection plugin may implement methods in the form of::
activity_<operation_type>_<activity_type>
Where:
* ``operation_type`` is one of: ``protect``, ``restore``, ``delete``
* ``activity_type`` is one of: ``pre``, ``post``, ``parallel``
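A provider could resolve these method names dynamically. The following is a
minimal sketch of such a lookup, assuming a missing method simply means there
is nothing to do; ``VolumePlugin``, ``find_activity`` and ``run_activity`` are
hypothetical names used only for illustration.

```python
class VolumePlugin(object):
    """Hypothetical plugin that implements a single activity."""

    def activity_protect_pre(self, checkpoint, context, resource, parameters):
        return "pre-protect %s" % resource


def find_activity(plugin, operation_type, activity_type):
    """Return the plugin method for this activity, or None if unimplemented."""
    name = "activity_%s_%s" % (operation_type, activity_type)
    return getattr(plugin, name, None)


def run_activity(plugin, operation_type, activity_type, **kwargs):
    method = find_activity(plugin, operation_type, activity_type)
    if method is None:
        return None   # unimplemented activity: nothing to do
    return method(**kwargs)


args = dict(checkpoint=None, context=None, resource="vol-1", parameters={})
print(run_activity(VolumePlugin(), "protect", "pre", **args))
print(run_activity(VolumePlugin(), "restore", "post", **args))
```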
Notes:
* Unimplemented methods are effectively no-ops
* Each such method receives as parameters: ``checkpoint``, ``context``,
``resource``, and ``parameters`` objects
* These methods may return immediately, or use ``yield``. When ``yield`` is
used, the Protection Provider infrastructure is responsible for periodically
calling ``next()`` in order to "poll". This is extremely useful in cases where
asynchronous operations are initiated (such as Cinder volume creation), but
polling must be performed in order to decide when the operation is complete,
and whether it is successful or not. For example:
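::

    def activity_protect_parallel(self, checkpoint, context, resource, parameters):
        # Start the asynchronous operation, then poll until it finishes.
        operation_id = start_operation( ... )
        while True:
            status = get_status(operation_id)
            if status == 'error':
                raise Exception
            elif status == 'success':
                return
            else:
                # Hand control back; the infrastructure calls next() later.
                yield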
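On the infrastructure side, the polling loop for such a method might look
roughly like the sketch below. It is an assumption about how the caller could
behave, not smaug's actual implementation: ``run_with_polling`` and
``fake_activity`` are hypothetical, and a real provider would interleave many
activities instead of sleeping on one.

```python
import inspect
import time


def run_with_polling(activity, poll_interval=0.0, **kwargs):
    """Call an activity; if it is a generator, poll it until it finishes.

    Returns the number of times the activity asked to be polled again.
    """
    result = activity(**kwargs)
    if not inspect.isgenerator(result):
        return 0          # plain method: it already ran to completion
    polls = 0
    while True:
        try:
            next(result)  # resume the activity until its next ``yield``
        except StopIteration:
            return polls  # the activity returned: operation complete
        polls += 1
        time.sleep(poll_interval)


def fake_activity(statuses):
    """Hypothetical activity that reads successive statuses from a list."""
    for status in statuses:
        if status == "error":
            raise RuntimeError("operation failed")
        if status == "success":
            return
        yield


print(run_with_polling(fake_activity, statuses=["pending", "pending", "success"]))
```

An exception raised inside the activity propagates out of ``next()``, so the
caller sees a failed operation as an ordinary exception.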
.. figure:: https://raw.githubusercontent.com/openstack/smaug/master/doc/images/protection-service/activities-links.png
:alt: Activities Links
:align: center
Activities Links
:green:`Green`: link of the parent resource PreActivity to the child
resource PreActivity
:yellow:`Yellow`: link of the resource PreActivity to ParallelActivity
:red:`Red`: link of the resource ParallelActivity to PostActivity
:indigo:`Indigo`: link of the child resource PostActivity to the parent
resource PostActivity
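The four kinds of links can be expressed as a small dependency graph. The
sketch below builds them for a server that depends on a volume and derives one
valid execution order with ``graphlib`` from the Python standard library
(3.9+). ``link_activities`` is a hypothetical helper, and it assumes every
resource appears as a key in the input mapping.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def link_activities(dependencies):
    """Build activity links from {resource: [child resources]}.

    Returns a dict mapping each activity to the set of activities that
    must complete before it, following the four link colours above.
    """
    links = {}
    for resource, children in dependencies.items():
        pre = (resource, "pre")
        par = (resource, "parallel")
        post = (resource, "post")
        links.setdefault(pre, set())
        links.setdefault(par, set()).add(pre)    # yellow: pre -> parallel
        links.setdefault(post, set()).add(par)   # red: parallel -> post
        for child in children:
            # green: parent pre -> child pre
            links.setdefault((child, "pre"), set()).add(pre)
            # indigo: child post -> parent post
            links[post].add((child, "post"))
    return links


links = link_activities({"server": ["volume"], "volume": []})
order = list(TopologicalSorter(links).static_order())
for activity in order:
    print(activity)
```

Any order produced this way starts the server's PreActivity before the
volume's and finishes the volume's PostActivity before the server's.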
This scheme decouples the tree structure from the task execution. A plugin that
handles multiple resources or that aggregates multiple resources to one task can
use this mechanism to only return tasks when appropriate for its scheme.

File diff suppressed because one or more lines are too long


@ -10,24 +10,34 @@ Protection Service Basics
https://bugs.launchpad.net/smaug/+bug/1529199
Protection Service is a component of smaug (an OpenStack project that provides
data protection as a service), which is responsible for executing
protect/restore/other actions on operations (triggered plans).
Architecturally, it acts as an RPC server for the smaug API service and
actually executes the actions on triggered operations.
It is also the component that works with the protection plugins supplied by
providers. It loads providers (each composed of a series of plugins) and
manages them.
Internally, the protection service constructs a work flow for each operation
action execution, where the tasks in the work flow are linked into a graph by
resource dependency and are thus executed in parallel or linearly according to
the graph task flow.
RPC interfaces
================================================
.. image:: https://raw.githubusercontent.com/openstack/smaug/master/doc/images/protection-service/protection-architecture.png
As the module graph shows, the protection service provides the following RPC
calls:
Operation RPC:
--------------------
**execute_operation(backup_plan:BackupPlan, action:Action):** where action
could be protect or restore
Provider RPC:
-------------
@ -51,65 +61,90 @@ Main Concept
Protection Manager
------------------
Endpoint of the RPC server, which handles Operation RPC calls and dispatches
other RPC calls to the corresponding components.
It produces a graph work flow for each operation execution, and has the work
flow executed by its work flow engine.
ProviderRegistry
----------------
Entity to manage multiple providers, which loads provider definitions from
config files on init and maintains them in an in-memory map.
It handles the RPC calls related to provider management, like
list_providers() or show_provider().
CheckpointCollection
--------------------
Entity to manage checkpoints, which provides CRUD interfaces to handle
checkpoints. As a checkpoint is a smaug internal entity, one checkpoint
operation is actually composed of a combination of several BankPlugin atomic
operations.
Take create_checkpoint as an example: it first acquires a write lease (there
will be a detailed **lease** design doc) to avoid conflict with GC deletion,
then it creates the key/value for the checkpoint itself. After that, it builds
multiple indexes to make listing checkpoints easier.
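The create_checkpoint sequence can be illustrated with an in-memory stand-in
for the bank. Everything in this sketch is an assumption made for
illustration: ``InMemoryBank``, its method names, the key layout and the
``protecting`` status are hypothetical and are not taken from the BankPlugin
interface.

```python
import uuid


class InMemoryBank(object):
    """Hypothetical stand-in for a BankPlugin: key/value store plus leases."""

    def __init__(self):
        self.data = {}
        self.leases = set()

    def acquire_lease(self, owner):
        self.leases.add(owner)

    def create_object(self, key, value):
        self.data[key] = value


def create_checkpoint(bank, owner, plan_id, date):
    checkpoint_id = str(uuid.uuid4())
    # 1. Hold a write lease so garbage collection does not delete the
    #    checkpoint while it is being written.
    bank.acquire_lease(owner)
    # 2. Store the checkpoint itself.
    bank.create_object("/checkpoints/%s" % checkpoint_id,
                       {"plan_id": plan_id, "status": "protecting"})
    # 3. Store index entries so checkpoints can be listed by plan or date.
    bank.create_object("/indices/by-plan/%s/%s" % (plan_id, checkpoint_id),
                       checkpoint_id)
    bank.create_object("/indices/by-date/%s/%s" % (date, checkpoint_id),
                       checkpoint_id)
    return checkpoint_id


bank = InMemoryBank()
checkpoint_id = create_checkpoint(bank, "service-1", "plan-1", "2016-05-19")
print(sorted(bank.data))
```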
Typical scenario
======================================
A typical scenario starts with a triggered operation being sent through an RPC
call to the Protection Service.
Let's take the protect action as the example and analyze the sequence together
with the class graph:
.. image:: https://raw.githubusercontent.com/openstack/smaug/master/doc/images/protection-service/protect-rpc-call-seq-diagram.png
1. Smaug **Operation Engine**
------------------------------
which is responsible for triggering operations according to a time schedule or
events, calls the Protection Service RPC:
execute_operation(backup_plan:BackupPlan, action:Action);
2. ProtectionManager
------------------------
which acts as one of the RPC server endpoints, handles this RPC call in the
following sequence:
2.1 CreateCheckpointTask:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This task is the starting point of the graph flow. It calls
create_checkpoint(plan:ProtectionPlan) on the unique instance of class
**Checkpoints** to create a checkpoint that persists the status of the action
execution.
The instance of **Checkpoints** will retrieve the **Provider** from the input
parameter **BackupPlan**, and get the unique instance of **BankPlugin**.
While **BankPlugin** provides interfaces for CRUD operations on key/values in
the **Bank**, and lease interfaces to avoid write/delete conflicts,
**Checkpoints** is responsible for the whole procedure of creating a
checkpoint, including granting the lease, creating the checkpoint's key/value,
building indexes, etc., by composing calls to **BankPlugin**.
2.2 Call ProtectionProvider to build the resource flow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This task is built by walking through **resource tree** (see
**Pluggable protection provider** doc), which will return a graph flow.
The result graph flow is composed of tasks representing the activities of the
ProtectionPlugin for each resource, and the links between the tasks according
to the activities type, and resource dependencies.
2.2 call ProtectionProvider to build sub task flow:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The graph flow returned by ProtectionProvider is added to the top layer task
flow, right after the starting task **CreateCheckpointTask**, and is executed
with a parallel engine.
This task is built by walking through **resource tree** (see **Pluggable protection provider** doc), which will return a graph flow. The result graph flow could be composed by single task or multiple tasks built with dependencies.
The protection plugin is responsible for storing the ProtectionData (backup
id, snapshot id, image id, etc) into the Bank under the corresponding
**ProtectionDefinition**.
When it comes to each resource task returned from ProtectionProvider task flow building, each task will call protect() interface of related ProtectionPlugin. There, we will get ProtectionData as the return result, which describes the restore target (where the resource is protected to) and the id of the protection data (backup id, snapshot id, image id etc., anything). This ProtectionData will be persisted into Bank under the corresponding **ProtectionDefinition**.
2.3 SyncCheckpointStatusTask:
2.3 CompleteCheckpointTask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This task is added into the top layer task flow right after the task flow built by ProtectionProvider, and will be executed only when all tasks/flows ahead of it have been executed successfully.
This task will list all **ProtectionDefinition** under one checkpoint, for each ProtectionDefinition: if its ProtectionData status hasn't turned to be available, this task will check its protection_id status (backup, snapshot, replication status) by calling ProtectionPlugin.get_protection_status(). If any ProtectionData turns to be available, its status will be updated to the corresponding ProtectionDefinition and won't be checked next time.
Since each protect action will take some time to achieve finished status (ProtectionData turns to be available), this task could be executed periodically or only executed once before timeout.
Until the operation timeout, this task will get the final status of this checkpoint: if all protect actions have achieved finished status, then the checkpoint is finished; otherwise, the checkpoint is broken and will be abandoned.
This task is added into the top layer task flow right after the task flow built
by ProtectionProvider, and will be executed only when all tasks ahead of it
have been completed successfully. This task will update the checkpoint status
to be available, and commit it to the bank.
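The overall shape of the top layer flow (create the checkpoint, run the
resource flow, then complete the checkpoint only on success) can be sketched
as follows. This is a plain-Python illustration, not the actual task flow
code: ``run_protect_flow`` is hypothetical, the ``protecting`` status name is
an assumption, and the resource tasks run sequentially here rather than with a
parallel engine.

```python
def run_protect_flow(checkpoint, resource_tasks):
    """Run the three stages of a protect operation in order.

    ``resource_tasks`` is a list of (name, callable) pairs standing in for
    the graph flow returned by the ProtectionProvider.
    """
    # Stage 1 (CreateCheckpointTask): record that protection has started.
    checkpoint["status"] = "protecting"   # hypothetical status name
    checkpoint["resources"] = {}
    # Stage 2 (resource flow): run sequentially here for simplicity.
    for name, task in resource_tasks:
        checkpoint["resources"][name] = task()
    # Stage 3 (CompleteCheckpointTask): only reached if nothing raised.
    checkpoint["status"] = "available"
    return checkpoint


def failing_task():
    raise RuntimeError("backup failed")


ok = run_protect_flow({}, [("vol-1", lambda: "backup-123")])
broken = {}
try:
    run_protect_flow(broken, [("vol-1", failing_task)])
except RuntimeError:
    pass
print(ok["status"], broken["status"])
```

Because the final stage is only reached when every earlier task succeeded, a
failed resource task leaves the checkpoint in its initial status.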