From ed33a646c243c2f357fd195b7609384c28389cf7 Mon Sep 17 00:00:00 2001
From: "shilpa.devharakar"
Date: Mon, 7 May 2018 13:16:55 +0530
Subject: [PATCH] Implementation of recovery method customization

Implements the spec recovery method customization by configuring
the actions in terms of execution order, extra parameters to execute
commands in action etc.

blueprint: recovery-method-customization
Change-Id: Ibc80ae0a749bd0a53a432a600ca9f0aaa16d5973
---
 .../recovery-method-customization.rst | 307 ++++++++++++++++++
 1 file changed, 307 insertions(+)
 create mode 100644 specs/rocky/approved/recovery-method-customization.rst

diff --git a/specs/rocky/approved/recovery-method-customization.rst b/specs/rocky/approved/recovery-method-customization.rst
new file mode 100644
index 0000000..122d6a3
--- /dev/null
+++ b/specs/rocky/approved/recovery-method-customization.rst
@@ -0,0 +1,307 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=============================
+Recovery method customization
+=============================
+
+https://blueprints.launchpad.net/masakari/+spec/recovery-method-customization
+
+This spec proposes making the recovery workflow configurable. Operators can
+configure the workflow in a config file which can be used to build and
+execute the recovery workflow.
+
+What is a recovery workflow?
+
+A recovery workflow is a set of actions executed to recover
+from a failure.
+Masakari supports recovery from three types of failures:
+
+* instance-failure
+* process-failure
+* host-failure
+
+For each of these failures, Masakari executes a workflow to recover from
+the failure on receiving the notification.
+
+Problem description
+===================
+
+Masakari uses the taskflow library to execute workflows, which consist of
+predefined recovery actions that are executed linearly. If an operator
+wants to add or remove recovery actions in any of these workflows,
+there is no way to do so without changing the code.
+For example, the predefined ‘host-failure’ recovery workflow is:
+
+* disable_compute_node
+* prepare_ha_enabled_instances
+* evacuate and confirm_evacuate
+
+If an operator wants to remove the ‘disable_compute_node’ task from the workflow
+or add a new task, such as sending an alert mail to the operator, it is not
+possible with the current implementation.
+
+Use Cases
+---------
+
+Operators may want to add or remove tasks from the existing workflow based on
+their requirements.
+For example, the predefined ‘host-failure recovery’ workflow is:
+disable_compute_node, prepare_ha_enabled_instances, evacuate and
+confirm_evacuate.
+Some possible recovery workflow combinations are:
+
+* send Alert/Mail to operator/users of vms -> disable_compute_node
+  -> prepare_ha_enabled_instances -> evacuate
+  -> update the pricing/metering DB -> confirm_evacuate
+  -> send Alert/Mail to operator/user (recovery done)
+* send Alert/Mail to operator/users of vms -> disable_compute_node
+  -> prepare_ha_enabled_instances
+  -> evacuate -> confirm_evacuate
+  -> send Alert/Mail to operator/users of vms (recovery done)
+* send Alert/Mail to operator/users of vms
+
+Proposed change
+===============
+
+Make it possible to add or remove tasks from the existing workflow based on the
+requirements. We plan to decompose the existing hard-coded recovery workflow
+into separate tasks and then tie them together to form a workflow which can be
+configured in a new ‘masakari-custom-recovery-methods.conf’ file as explained
+below:
+Add a section ‘[taskflow_driver_recovery_flows]’ to the newly added
+masakari-custom-recovery-methods.conf file. Under it, add the config options
+below to configure customized recovery actions for each type of workflow.
+Each config option will be a dictionary containing key/value pairs for
+the tasks to be executed.
+For example: pre:[v1,v2],main:[v1,v2],post:[v1,v2,v3]
+Here the key is pre/main/post and the value is the list of tasks to execute
+to recover from the failure.
+If the file does not exist, the default tasks, which are configured when the
+configuration options are registered, will be executed.
+
+* ‘instance_failure_recovery_tasks’ is a dictionary containing key as
+  pre/main/post and value will be the comma-separated list of tasks to be
+  executed for instance failure.
+* ‘process_failure_recovery_tasks’ is a dictionary containing key as
+  pre/main/post and value will be the comma-separated list of tasks to be
+  executed for process failure.
+* ‘host_auto_failure_recovery_tasks’ is a dictionary containing key as
+  pre/main/post and value will be the comma-separated list of tasks to be
+  executed for host failure for auto recovery.
+* ‘host_rh_failure_recovery_tasks’ is a dictionary containing key as
+  pre/main/post and value will be the comma-separated list of tasks to be
+  executed for host failure for rh recovery.
+
+For example,
+
+.. code::
+
+    [taskflow_driver_recovery_flows]
+    instance_failure_recovery_tasks = pre:['custom_pre_task','stop_instance_task'],
+        main:['start_instance_task','custom_main_task'],
+        post:['confirm_instance_active_task','custom_post_task']
+    process_failure_recovery_tasks = pre:['disable_compute_node_task'],
+        main:['confirm_compute_node_disabled_task','custom_main_task'],
+        post:['custom_post_task']
+    host_auto_failure_recovery_tasks = pre:['disable_compute_service_task'],
+        main:['prepare_HA_enabled_instances_task'],
+        post:['evacuate_instances_task','custom_post_task']
+    host_rh_failure_recovery_tasks = pre:['custom_pre_task','disable_compute_service_task'],
+        main:['prepare_HA_enabled_instances_task',
+              'evacuate_instances_task'],
+        post:['custom_post_task']
+
+An entry point must be added for each task in setup.cfg so that these tasks can
+be loaded dynamically using stevedore during creation of a recovery workflow.
+
+For example, Masakari's setup.cfg will have the following entry points:
+
+* Each entry point in setup.cfg should have the full class path, as in the
+  example below:
+
+.. code::
+
+    masakari.task_flow.tasks =
+        stop_instance_task = masakari.engine.drivers.taskflow.instance_failure:StopInstanceTask
+
+.. code::
+
+    masakari.task_flow.tasks =
+        disable_compute_service_task =
+        prepare_HA_enabled_instances_task =
+        evacuate_instances_task =
+        stop_instance_task =
+        start_instance_task =
+        confirm_instance_active_task =
+        disable_compute_node_task =
+        confirm_compute_node_disabled_task =
+
+If an operator wants to provide customized tasks in a third-party library,
+they need to follow the guidelines below to associate the newly added
+tasks with the respective recovery workflows in Masakari:
+
+* First, make sure the required third-party library is installed on the Masakari
+  engine node.
+* Configure the custom tasks in the third-party library's setup.cfg as below:
+
+For example, the third-party library's setup.cfg will have the following entry points:
+
+.. code::
+
+    masakari.task_flow.tasks =
+        custom_pre_task =
+        custom_main_task =
+        custom_post_task =
+
+Note:
+ Entry points in the third-party library's setup.cfg should have the same key
+ as in Masakari's setup.cfg for the respective failure recovery.
+
+* If a custom task requires any configuration parameters,
+  add them to masakari-custom-recovery-methods.conf under the
+  same group/section where they are registered in the third-party library.
+  The operator is responsible for generating the masakari configuration file
+  themselves.
+
+* The operator should ensure that the output of each task is made available to
+  the subsequent tasks that need it.
+
+
+Alternatives
+------------
+
+For recovery from failures, instead of a fully configurable task flow,
+one can add custom tasks at the start or after completion of the predefined
+existing workflow.
+
+One can customize the recovery workflow in masakari-custom-recovery-methods.conf
+as below, and Masakari will inject these custom tasks at the start or end of the
+predefined workflow as required.
+
+For example,
+
+.. code::
+
+    [taskflow_driver_recovery_flows]
+    instance_failure_recovery_tasks = ['custom_pre_task','custom_main_task']
+    process_failure_recovery_tasks = ['custom_pre_task']
+    host_auto_failure_recovery_tasks = ['custom_pre_task','custom_main_task']
+    host_rh_failure_recovery_tasks = ['custom_pre_task']
+
+
+custom_pre_task and custom_main_task will be executed at the start or end of
+the existing ‘instance_failure’ workflow.
+
+Note:
+ For host failure with the rh recovery method, the developer should add
+ the custom task in the nested flow so that it executes once.
+
+Data model impact
+-----------------
+
+None
+
+REST API impact
+---------------
+
+None
+
+Security impact
+---------------
+
+None
+
+Notifications impact
+--------------------
+
+None
+
+Other end user impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+
+None
+
+Other deployer impact
+---------------------
+
+A new config file ‘masakari-custom-recovery-methods.conf’ will be added, in which
+the ‘taskflow_driver_recovery_flows’ section needs to be updated for customized
+recovery workflows.
+If an operator doesn't want any customization to any of the recovery workflows,
+there will be no impact, as the default tasks will be loaded for each
+recovery workflow.
+For example,
+
+.. code::
+
+    [taskflow_driver_recovery_flows]
+    instance_failure_recovery_tasks = pre:['custom_pre_task','stop_instance_task'],
+        main:['start_instance_task','custom_main_task'],
+        post:['confirm_instance_active_task','custom_post_task']
+    process_failure_recovery_tasks = pre:['disable_compute_node_task'],
+        main:['confirm_compute_node_disabled_task','custom_main_task'],
+        post:['custom_post_task']
+    host_auto_failure_recovery_tasks = pre:['disable_compute_service_task'],
+        main:['prepare_HA_enabled_instances_task'],
+        post:['evacuate_instances_task','custom_post_task']
+    host_rh_failure_recovery_tasks = pre:['custom_pre_task','disable_compute_service_task'],
+        main:['prepare_HA_enabled_instances_task',
+              'evacuate_instances_task'],
+        post:['custom_post_task']
+
+Developer impact
+----------------
+
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+
+* bhagyashris
+
+Work Items
+----------
+
+* Implement customizable task flow execution.
+* Add unit tests for coverage.
+* Add a documentation guide to describe how to configure customizable workflows.
+
+Dependencies
+============
+
+None
+
+Testing
+=======
+
+None
+
+Documentation Impact
+====================
+
+Add a documentation guide to describe how to configure customizable workflows.
+
+References
+==========
+
+https://etherpad.openstack.org/p/masakari-recovery-method-customization
+http://eavesdrop.openstack.org/meetings/masakari/2018/masakari.2018-07-03-03.00.log.html
+
+History
+=======
+
+None
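
To illustrate the pre/main/post option format described under "Proposed change", here is a minimal sketch of how such a value might be parsed and flattened into an ordered task list. It is not Masakari's implementation. The helper names ``parse_recovery_tasks`` and ``ordered_tasks`` are hypothetical, the sketch uses only the Python standard library rather than oslo.config, and it assumes task names are quoted and contain no commas or brackets.

```python
import re

# Hypothetical helpers for illustration only; not part of Masakari.
# Matches one "key:[...]" group, e.g. pre:['a','b'].
_PHASE_RE = re.compile(r"(\w+)\s*:\s*\[(.*?)\]", re.DOTALL)
_PHASES = ("pre", "main", "post")


def parse_recovery_tasks(value):
    """Parse a 'pre:[...],main:[...],post:[...]' string into a dict."""
    phases = {}
    for key, body in _PHASE_RE.findall(value):
        if key not in _PHASES:
            raise ValueError("unknown phase: %s" % key)
        # Split on commas, then strip whitespace and surrounding quotes.
        names = [item.strip().strip("'\"") for item in body.split(",")]
        phases[key] = [name for name in names if name]
    return phases


def ordered_tasks(phases):
    """Flatten the phases into one list in pre -> main -> post order."""
    return [task for phase in _PHASES for task in phases.get(phase, [])]


# Value copied from the host_rh_failure_recovery_tasks example in the spec.
example = """pre:['custom_pre_task','disable_compute_service_task'],
    main:['prepare_HA_enabled_instances_task',
          'evacuate_instances_task'],
    post:['custom_post_task']"""

flow = ordered_tasks(parse_recovery_tasks(example))
```

In a real deployment the resulting names would still have to be resolved to task classes through the ``masakari.task_flow.tasks`` entry points (for example with stevedore), a step this sketch deliberately leaves out.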