Implementation of recovery method customization

Implements the recovery method customization spec by making the
recovery actions configurable in terms of execution order, extra
parameters for the commands run in each action, etc.

blueprint: recovery-method-customization
Change-Id: Ibc80ae0a749bd0a53a432a600ca9f0aaa16d5973
shilpa.devharakar 2018-05-07 13:16:55 +05:30 committed by Pooja Jadhav
parent f4abd4319c
commit ed33a646c2
1 changed files with 307 additions and 0 deletions

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=============================
Recovery method customization
=============================

https://blueprints.launchpad.net/masakari/+spec/recovery-method-customization

This spec proposes making the recovery workflow configurable. An operator can
define the workflow in a config file, which Masakari then uses to build and
execute the recovery workflow.

What is a recovery workflow?

A recovery workflow is a set of actions executed to recover from a failure.

Masakari handles three types of failures:

* instance-failure
* process-failure
* host-failure

For each of these failures, Masakari executes a workflow to recover from the
failure when it receives the notification.

Problem description
===================

Masakari uses the taskflow library to execute the workflows. Each workflow
consists of predefined recovery actions that are executed linearly. If an
operator wants to add or remove recovery actions in any of these workflows,
there is no way to do so without changing the code.

For example, the predefined host-failure recovery workflow is:

* disable_compute_node
* prepare_ha_enabled_instances
* evacuate and confirm_evacuate

If an operator wants to remove the disable_compute_node task from the
workflow, or add a new task such as sending an alert mail to the operator,
that is not possible with the current implementation.

Use Cases
---------

Operators may want to add tasks to, or remove tasks from, the existing
workflow based on their requirements.

For example, the predefined host-failure recovery workflow is:
disable_compute_node, prepare_ha_enabled_instances, evacuate and
confirm_evacuate.

Some possible recovery workflow combinations are:

* send Alert/Mail to operator/users of vms -> disable_compute_node
  -> prepare_ha_enabled_instances -> evacuate
  -> update the pricing/metering DB -> confirm_evacuate
  -> send Alert/Mail to operator/user (recovery done)

* send Alert/Mail to operator/users of vms -> disable_compute_node
  -> prepare_ha_enabled_instances
  -> evacuate -> confirm_evacuate
  -> send Alert/Mail to operator/users of vms (recovery done)

* send Alert/Mail to operator/users of vms

Proposed change
===============

Make a provision to add/remove tasks from the existing workflow based on the
requirements. We plan to decompose the existing hard-coded recovery workflow
into separate tasks and then tie them together to form a workflow that can be
configured in a new masakari-custom-recovery-methods.conf file, as explained
below.

Add a [taskflow_driver_recovery_flows] section to the newly added
masakari-custom-recovery-methods.conf file. Under this section, add the config
options below to customize the recovery actions for each type of workflow.
Each config option is a dictionary of key/value pairs describing the tasks
to be executed.

For example: pre:[v1,v2],main:[v1,v2],post:[v1,v2,v3]

Here each key is one of pre/main/post and each value is the list of tasks to
execute when recovering from the failure.

If the file does not exist, the default tasks are executed. These defaults
are set when the configuration options are registered.

* instance_failure_recovery_tasks is a dictionary whose keys are
  pre/main/post and whose values are the comma-separated lists of tasks to be
  executed for instance failure.

* process_failure_recovery_tasks is a dictionary whose keys are
  pre/main/post and whose values are the comma-separated lists of tasks to be
  executed for process failure.

* host_auto_failure_recovery_tasks is a dictionary whose keys are
  pre/main/post and whose values are the comma-separated lists of tasks to be
  executed for host failure with auto recovery.

* host_rh_failure_recovery_tasks is a dictionary whose keys are
  pre/main/post and whose values are the comma-separated lists of tasks to be
  executed for host failure with rh recovery.

For example,

.. code::

    [taskflow_driver_recovery_flows]
    instance_failure_recovery_tasks = pre:['custom_pre_task','stop_instance_task'],
                                      main:['start_instance_task','custom_main_task'],
                                      post:['confirm_instance_active_task','custom_post_task']
    process_failure_recovery_tasks = pre:['disable_compute_node_task'],
                                     main:['confirm_compute_node_disabled_task','custom_main_task'],
                                     post:['custom_post_task']
    host_auto_failure_recovery_tasks = pre:['disable_compute_service_task'],
                                       main:['prepare_HA_enabled_instances_task'],
                                       post:['evacuate_instances_task','custom_post_task']
    host_rh_failure_recovery_tasks = pre:['custom_pre_task','disable_compute_service_task'],
                                     main:['prepare_HA_enabled_instances_task',
                                           'evacuate_instances_task'],
                                     post:['custom_post_task']

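
The pre/main/post option format above can be illustrated with a small parser.
This is only a sketch in plain Python, not the parsing Masakari actually
performs. The function names ``parse_recovery_tasks`` and ``execution_order``
are hypothetical, and it assumes the stages run in the order pre, main, post.

```python
import re

# Stage names come from the spec; everything else here is illustrative.
STAGES = ("pre", "main", "post")

# Matches one "<stage>:[...]" group, e.g. "pre:['a','b']".
_GROUP_RE = re.compile(r"(\w+)\s*:\s*\[([^\]]*)\]")


def parse_recovery_tasks(raw):
    """Parse "pre:[...],main:[...],post:[...]" into {stage: [task, ...]}."""
    tasks = {stage: [] for stage in STAGES}
    for stage, body in _GROUP_RE.findall(raw):
        if stage not in STAGES:
            raise ValueError("unknown stage: %r" % stage)
        # Drop surrounding whitespace and quotes from each task name.
        names = [item.strip().strip("'\"") for item in body.split(",")]
        tasks[stage] = [name for name in names if name]
    return tasks


def execution_order(tasks):
    """Flatten the stages into one linear order (assumed: pre, main, post)."""
    return [name for stage in STAGES for name in tasks[stage]]


raw_value = (
    "pre:['custom_pre_task','stop_instance_task'],"
    "main:['start_instance_task','custom_main_task'],"
    "post:['confirm_instance_active_task','custom_post_task']"
)
parsed = parse_recovery_tasks(raw_value)
print(execution_order(parsed))
```

A stage that is omitted from the option value simply ends up with an empty
task list in this sketch.
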
An entry point needs to be added to setup.cfg for each task so that the tasks
can be loaded dynamically using stevedore when a recovery workflow is created.

For example, the Masakari setup.cfg will have the following entry points:

* Each entry point in setup.cfg should give the full class path, as in the
  example below:

.. code::

    masakari.task_flow.tasks =
        stop_instance_task = masakari.engine.drivers.taskflow.instance_failure:StopInstanceTask

.. code::

    masakari.task_flow.tasks =
        disable_compute_service_task = <full_class_path_of_task>
        prepare_HA_enabled_instances_task = <full_class_path_of_task>
        evacuate_instances_task = <full_class_path_of_task>
        stop_instance_task = <full_class_path_of_task>
        start_instance_task = <full_class_path_of_task>
        confirm_instance_active_task = <full_class_path_of_task>
        disable_compute_node_task = <full_class_path_of_task>
        confirm_compute_node_disabled_task = <full_class_path_of_task>

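
To show what a "full class path" in the ``module:Class`` form resolves to, the
sketch below uses only the standard library's ``importlib.metadata.EntryPoint``.
It is not how Masakari loads tasks (the spec uses stevedore for that), and
``load_task_class`` is a hypothetical helper. A stdlib class stands in for a
real task class so the example runs without Masakari installed.

```python
from importlib.metadata import EntryPoint

# Entry point group name taken from the setup.cfg example in this spec.
GROUP = "masakari.task_flow.tasks"


def load_task_class(name, class_path):
    """Resolve a "<module>:<attribute>" entry point value to the object."""
    entry_point = EntryPoint(name=name, value=class_path, group=GROUP)
    # load() imports the module and returns the named attribute.
    return entry_point.load()


# "collections:OrderedDict" is a stand-in for a real task class path.
cls = load_task_class("demo_task", "collections:OrderedDict")
print(cls.__name__)
```

With an installed package, the same lookup would be driven by the entry points
declared in that package's setup.cfg rather than by a hand-built ``EntryPoint``.
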
If an operator wants to configure custom tasks from a third-party library,
they need to follow the guidelines below to associate the newly added tasks
with the respective recovery workflows in Masakari:

* First, make sure the required third-party library is installed on the
  Masakari engine node.

* Configure the custom tasks in the third-party library's setup.cfg. For
  example, the third-party library's setup.cfg will have the following entry
  points:

  .. code::

      masakari.task_flow.tasks =
          custom_pre_task = <custom_task_class_path_from_third_party_library>
          custom_main_task = <custom_task_class_path_from_third_party_library>
          custom_post_task = <custom_task_class_path_from_third_party_library>

  Note:
  The entry points in the third-party library's setup.cfg should be
  registered under the same key as in Masakari's setup.cfg for the respective
  failure recovery.

* If a custom task requires any configuration parameters, add them to
  masakari-custom-recovery-methods.conf under the same group/section where
  they are registered in the third-party library. The operator is responsible
  for generating the Masakari configuration file.

* The operator should ensure that the output of each task is made available
  to the subsequent tasks that need it.

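
The last guideline can be checked before a workflow is run. The sketch below
is a plain-Python illustration that loosely mirrors taskflow's notion of what
a task requires and provides, without using the library. The helper
``find_missing_inputs`` and the input names (``host_name``, ``alert_id``,
``instance_list``) are hypothetical and are not taken from Masakari's tasks.

```python
def find_missing_inputs(ordered_tasks, initial_inputs=()):
    """Return (task, input) pairs for inputs that nothing upstream provides.

    ordered_tasks is a list of (name, requires, provides) tuples given in
    execution order; initial_inputs are values available before any task runs.
    """
    available = set(initial_inputs)
    missing = []
    for name, requires, provides in ordered_tasks:
        for needed in requires:
            if needed not in available:
                missing.append((name, needed))
        # Outputs become visible only to tasks that run later.
        available.update(provides)
    return missing


# Hypothetical requires/provides lists; real tasks may differ.
flow = [
    ("custom_pre_task", ["host_name"], ["alert_id"]),
    ("disable_compute_service_task", ["host_name"], []),
    ("custom_post_task", ["alert_id", "instance_list"], []),
]
problems = find_missing_inputs(flow, initial_inputs=["host_name"])
print(problems)  # [('custom_post_task', 'instance_list')]
```

Here ``custom_post_task`` is flagged because no earlier task provides
``instance_list``, which is the kind of gap an operator would need to close
when reordering or removing tasks.
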
Alternatives
------------

For recovery from failures, instead of a fully configurable task flow,
custom tasks could be added only at the start or after the completion of the
predefined workflow.

The operator would customize the recovery workflow in
masakari-custom-recovery-methods.conf as below, and Masakari would inject
these custom tasks at the start or end of the predefined workflow as required.

For example,

.. code::

    [taskflow_driver_recovery_flows]
    instance_failure_recovery_tasks = ['custom_pre_task','custom_main_task']
    process_failure_recovery_tasks = ['custom_pre_task']
    host_auto_failure_recovery_tasks = ['custom_pre_task','custom_main_task']
    host_rh_failure_recovery_tasks = ['custom_pre_task']

custom_pre_task and custom_main_task will be executed at the start or end of
the existing instance_failure workflow.

Note:
For host failure with the rh recovery method, the developer should add the
custom task in the nested flow so that it executes once.


Data model impact
-----------------

None

REST API impact
---------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

A new config file, masakari-custom-recovery-methods.conf, will be added. Its
taskflow_driver_recovery_flows section needs to be updated to customize the
recovery workflows.

If an operator doesn't want to customize any of the recovery workflows,
there is no impact, because the default tasks are loaded for each recovery
workflow.

For example,

.. code::

    [taskflow_driver_recovery_flows]
    instance_failure_recovery_tasks = pre:['custom_pre_task','stop_instance_task'],
                                      main:['start_instance_task','custom_main_task'],
                                      post:['confirm_instance_active_task','custom_post_task']
    process_failure_recovery_tasks = pre:['disable_compute_node_task'],
                                     main:['confirm_compute_node_disabled_task','custom_main_task'],
                                     post:['custom_post_task']
    host_auto_failure_recovery_tasks = pre:['disable_compute_service_task'],
                                       main:['prepare_HA_enabled_instances_task'],
                                       post:['evacuate_instances_task','custom_post_task']
    host_rh_failure_recovery_tasks = pre:['custom_pre_task','disable_compute_service_task'],
                                     main:['prepare_HA_enabled_instances_task',
                                           'evacuate_instances_task'],
                                     post:['custom_post_task']

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:

* bhagyashris <bhagyashri.shewale@nttdata.com>

Work Items
----------

* Implement customizable task flow execution.
* Add unit tests for coverage.
* Add a documentation guide describing how to configure customizable
  workflows.

Dependencies
============

None

Testing
=======

None

Documentation Impact
====================

Add a documentation guide describing how to configure customizable workflows.

References
==========

* https://etherpad.openstack.org/p/masakari-recovery-method-customization
* http://eavesdrop.openstack.org/meetings/masakari/2018/masakari.2018-07-03-03.00.log.html

History
=======

None