
Implementation of recovery method customization

Implements the recovery method customization spec by making the
recovery actions configurable in terms of execution order, extra
parameters for the commands executed in each action, etc.

blueprint: recovery-method-customization
Change-Id: Ibc80ae0a749bd0a53a432a600ca9f0aaa16d5973
shilpa.devharakar 11 months ago
commit
ed33a646c2
1 changed files with 307 additions and 0 deletions

specs/rocky/approved/recovery-method-customization.rst (+307, -0)

@@ -0,0 +1,307 @@
1
+..
2
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
3
+ License.
4
+
5
+ http://creativecommons.org/licenses/by/3.0/legalcode
6
+
7
+=============================
8
+Recovery method customization
9
+=============================
10
+
11
+https://blueprints.launchpad.net/masakari/+spec/recovery-method-customization
12
+
13
+This spec proposes making the recovery workflow configurable. An operator can
14
+configure the workflow in a config file, which is then used to build and
15
+execute the recovery workflow.
16
+
17
+What is a recovery workflow?
18
+
19
+A recovery workflow is a set of actions executed to recover from a
20
+failure.
21
+Masakari supports recovery from three types of failure:
22
+
23
+* instance-failure
24
+* process-failure
25
+* host-failure
26
+
27
+For each of these failure types, Masakari executes a recovery workflow when it
28
+receives the corresponding notification.
29
+
30
+Problem description
31
+===================
32
+
33
+Masakari uses the taskflow library to execute the workflows. Each workflow
34
+consists of predefined recovery actions that are executed linearly. If an
35
+operator wants to add or remove recovery actions in any of these workflows,
36
+there is no way to do so without changing the code.
37
+For example, for the ‘host-failure’ recovery workflow, the predefined workflow is:
38
+
39
+* disable_compute_node
40
+* prepare_ha_enabled_instances
41
+* evacuate and confirm_evacuate
42
+
43
+If an operator wants to remove the task ‘disable_compute_node’ from the workflow
44
+or add a new task, such as sending an alert mail to the operator, this is not
45
+possible with the current implementation.
46
+
47
+Use Cases
48
+---------
49
+
50
+An operator may want to add or remove tasks from an existing workflow based on
51
+their requirements.
52
+For example, for the ‘host-failure’ recovery workflow, the predefined flow is:
53
+disable_compute_node, prepare_ha_enabled_instances, evacuate and
54
+confirm_evacuate.
55
+Some possible recovery workflow combinations are:
56
+
57
+* send alert/mail to operator/users of VMs -> disable_compute_node
58
+  -> prepare_ha_enabled_instances -> evacuate
59
+  -> update the pricing/metering DB -> confirm_evacuate
60
+  -> send alert/mail to operator/users (recovery done)
61
+* send alert/mail to operator/users of VMs -> disable_compute_node
62
+  -> prepare_ha_enabled_instances
63
+  -> evacuate -> confirm_evacuate
64
+  -> send alert/mail to operator/users of VMs (recovery done)
65
+* send alert/mail to operator/users of VMs
66
+
67
+Proposed change
68
+===============
69
+
70
+Make a provision to add or remove tasks from the existing workflows based on
71
+requirements. We plan to decompose the existing hard-coded recovery workflows
72
+into separate tasks and then tie them together to form a workflow that can be
73
+configured in a new config file, ‘masakari-custom-recovery-methods.conf’, as
74
+explained below.
75
+Add a section ‘[taskflow_driver_recovery_flows]’ to the newly added
76
+masakari-custom-recovery-methods.conf file. Under it, add the config options
77
+below to customize the recovery actions for each type of workflow.
78
+Each config option is a dictionary of key/value pairs describing the
79
+tasks to be executed.
80
+For example: pre:[v1,v2],main:[v1,v2],post:[v1,v2,v3]
81
+Here each key is one of pre/main/post and each value is the list of tasks to
82
+execute for the failure recovery.
83
+If the file does not exist, the default tasks defined when the configuration
84
+options are registered will be executed.
85
+
86
+* ‘instance_failure_recovery_tasks’ is a dictionary whose keys are
87
+  pre/main/post and whose values are comma-separated lists of tasks to be
88
+  executed for instance failure.
89
+* ‘process_failure_recovery_tasks’ is a dictionary whose keys are
90
+  pre/main/post and whose values are comma-separated lists of tasks to be
91
+  executed for process failure.
92
+* ‘host_auto_failure_recovery_tasks’ is a dictionary whose keys are
93
+  pre/main/post and whose values are comma-separated lists of tasks to be
94
+  executed for host failure with the auto recovery method.
95
+* ‘host_rh_failure_recovery_tasks’ is a dictionary whose keys are
96
+  pre/main/post and whose values are comma-separated lists of tasks to be
97
+  executed for host failure with the rh (reserved host) recovery method.
98
+
99
+For example,
100
+
101
+.. code::
102
+
103
+    [taskflow_driver_recovery_flows]
104
+    instance_failure_recovery_tasks = pre:['custom_pre_task','stop_instance_task'],
105
+                                      main:['start_instance_task','custom_main_task'],
106
+                                      post:['confirm_instance_active_task','custom_post_task']
107
+    process_failure_recovery_tasks = pre:['disable_compute_node_task'],
108
+                                     main:['confirm_compute_node_disabled_task','custom_main_task'],
109
+                                     post:['custom_post_task']
110
+    host_auto_failure_recovery_tasks = pre:['disable_compute_service_task'],
111
+                                       main:['prepare_HA_enabled_instances_task'],
112
+                                       post:['evacuate_instances_task','custom_post_task']
113
+    host_rh_failure_recovery_tasks = pre:['custom_pre_task','disable_compute_service_task'],
114
+                                     main:['prepare_HA_enabled_instances_task',
115
+                                     'evacuate_instances_task'],
116
+                                     post:['custom_post_task']
117
+
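The value format above can be illustrated with a small parsing sketch. This is not Masakari's actual option handling; ``parse_recovery_tasks`` and ``ordered_tasks`` are hypothetical helpers, and the pre -> main -> post execution order is an assumption drawn from the key names. The sketch uses only the Python standard library.

```python
import ast
import re

# Hypothetical helper, not part of Masakari: it only illustrates the
# "pre:[...],main:[...],post:[...]" value format used in the example config.
_PHASE_RE = re.compile(r"(pre|main|post)\s*:\s*(\[.*?\])", re.DOTALL)


def parse_recovery_tasks(raw_value):
    """Return {'pre': [...], 'main': [...], 'post': [...]} for one option."""
    phases = {"pre": [], "main": [], "post": []}
    for key, task_list in _PHASE_RE.findall(raw_value):
        # Each list is written like a Python list of quoted task names.
        phases[key] = ast.literal_eval(task_list)
    return phases


def ordered_tasks(phases):
    """Flatten the phases into one linear order (assumed: pre, main, post)."""
    return phases["pre"] + phases["main"] + phases["post"]


raw = """pre:['custom_pre_task','disable_compute_service_task'],
         main:['prepare_HA_enabled_instances_task',
         'evacuate_instances_task'],
         post:['custom_post_task']"""
flow = ordered_tasks(parse_recovery_tasks(raw))
print(flow)
```

A key that is missing from the value simply yields an empty task list for that key in this sketch; how the real implementation treats missing keys is not stated in the spec.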
118
+An entry point must be added in setup.cfg for each task so that the tasks can be
119
+loaded dynamically using stevedore when a recovery workflow is created.
120
+
121
+For example, the Masakari setup.cfg will have the following entry points:
122
+
123
+* Each entry point in setup.cfg should have the full class path, as shown in
124
+  the example below:
125
+
126
+.. code::
127
+
128
+    masakari.task_flow.tasks =
129
+        stop_instance_task = masakari.engine.drivers.taskflow.instance_failure:StopInstanceTask
130
+
131
+.. code::
132
+
133
+    masakari.task_flow.tasks =
134
+        disable_compute_service_task = <full_class_path_of_task>
135
+        prepare_HA_enabled_instances_task = <full_class_path_of_task>
136
+        evacuate_instances_task = <full_class_path_of_task>
137
+        stop_instance_task = <full_class_path_of_task>
138
+        start_instance_task = <full_class_path_of_task>
139
+        confirm_instance_active_task = <full_class_path_of_task>
140
+        disable_compute_node_task = <full_class_path_of_task>
141
+        confirm_compute_node_disabled_task = <full_class_path_of_task>
142
+
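To make the ``name = module:Class`` form of these entry points concrete, here is a minimal sketch of how such a string resolves to a class. The spec proposes stevedore for the real loading; this sketch deliberately swaps in the standard-library ``importlib`` so that it runs without Masakari or stevedore installed. ``load_task_class`` and ``ENTRY_POINTS`` are hypothetical names, and the targets are stdlib classes standing in for task classes.

```python
import importlib


def load_task_class(class_path):
    """Resolve a 'package.module:ClassName' string to the object it names."""
    module_name, _, attr_name = class_path.partition(":")
    if not module_name or not attr_name:
        raise ValueError("expected 'module:Class', got %r" % class_path)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in registry. In Masakari the mapping would live in setup.cfg under
# the 'masakari.task_flow.tasks' group; the stdlib targets below are used
# only so that the sketch is runnable on its own.
ENTRY_POINTS = {
    "ordered_dict_task": "collections:OrderedDict",
    "counter_task": "collections:Counter",
}

loaded = {name: load_task_class(path) for name, path in ENTRY_POINTS.items()}
print(loaded["counter_task"].__name__)
```

With this mapping in place, a task name taken from the recovery config can be looked up in ``loaded`` to obtain the class to instantiate.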
143
+If an operator wants to configure customized tasks from a third-party library,
144
+they need to follow the guidelines below to associate the newly added
145
+tasks with the respective recovery workflows in Masakari:
146
+
147
+* First, make sure the required third-party library is installed on the Masakari
148
+  engine node.
149
+* Configure the custom tasks in the third-party library's setup.cfg as below:
150
+
151
+For example, the third-party library's setup.cfg will have the following entry points:
152
+
153
+.. code::
154
+
155
+    masakari.task_flow.tasks =
156
+        custom_pre_task = <custom_task_class_path_from_third_party_library>
157
+        custom_main_task = <custom_task_class_path_from_third_party_library>
158
+        custom_post_task = <custom_task_class_path_from_third_party_library>
159
+
160
+Note:
161
+    The entry points in the third-party library's setup.cfg should use the same
162
+    key as in the Masakari setup.cfg for the respective failure recovery.
163
+
164
+* If the custom tasks require any configuration parameters,
165
+  add them to masakari-custom-recovery-methods.conf under the
166
+  same group/section in which they are registered in the third-party library.
167
+  Operators are responsible for generating the Masakari configuration file
168
+  themselves.
169
+
170
+* Operators should ensure that the output of each task is made available to
171
+  the subsequent tasks that need it.
172
+
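The last guideline, passing each task's output on to later tasks, can be sketched in plain Python. This is not the taskflow API; ``run_linear_flow`` is a hypothetical stand-in that keeps a shared store, and the task functions and keys (``host_name``, ``instance_list``, ``alert_sent_to``) are invented for illustration only.

```python
def run_linear_flow(tasks, initial_store):
    """Run tasks in order, merging each task's returned dict into the store."""
    store = dict(initial_store)
    for task in tasks:
        result = task(store)
        if result:
            # Outputs become visible to every task that runs afterwards.
            store.update(result)
    return store


def custom_pre_task(store):
    # Hypothetical custom task: records who was alerted.
    return {"alert_sent_to": "operator"}


def prepare_instances_task(store):
    # Hypothetical stand-in for a built-in task; reads the initial input.
    return {"instance_list": ["vm-on-" + store["host_name"]]}


def custom_post_task(store):
    # Consumes outputs produced by both earlier tasks.
    return {
        "summary": "%d instance(s) handled, alert sent to %s"
        % (len(store["instance_list"]), store["alert_sent_to"])
    }


final = run_linear_flow(
    [custom_pre_task, prepare_instances_task, custom_post_task],
    {"host_name": "compute-1"},
)
print(final["summary"])
```

If a custom task were placed before the task that produces a value it needs, the lookup in the store would fail, which is why the ordering of tasks in the config matters for operators.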
173
+
174
+Alternatives
175
+------------
176
+
177
+For recovery from failures, instead of a fully configurable task flow,
178
+custom tasks could be added at the start or after completion of the predefined
179
+existing workflow.
180
+
181
+The recovery workflow could be customized in masakari-custom-recovery-methods.conf
182
+as below, and Masakari would inject these custom tasks at the start or end of the
183
+predefined workflow as required.
184
+
185
+For example,
186
+
187
+.. code::
188
+
189
+    [taskflow_driver_recovery_flows]
190
+    instance_failure_recovery_tasks = ['custom_pre_task','custom_main_task']
191
+    process_failure_recovery_tasks = ['custom_pre_task']
192
+    host_auto_failure_recovery_tasks = ['custom_pre_task','custom_main_task']
193
+    host_rh_failure_recovery_tasks = ['custom_pre_task']
194
+
195
+
196
+custom_pre_task and custom_main_task will be executed at the start or end of
197
+the existing ‘instance_failure’ workflow.
198
+
199
+Note:
200
+    For host failure with the rh recovery method, the developer should add the
201
+    custom task in a nested flow so that it executes once.
202
+
203
+Data model impact
204
+-----------------
205
+
206
+None
207
+
208
+REST API impact
209
+---------------
210
+
211
+None
212
+
213
+Security impact
214
+---------------
215
+
216
+None
217
+
218
+Notifications impact
219
+--------------------
220
+
221
+None
222
+
223
+Other end user impact
224
+---------------------
225
+
226
+None
227
+
228
+Performance Impact
229
+------------------
230
+
231
+None
232
+
233
+Other deployer impact
234
+---------------------
235
+
236
+A new config file ‘masakari-custom-recovery-methods.conf’ will be added, in which
237
+the ‘taskflow_driver_recovery_flows’ section needs to be updated for customized
238
+recovery workflows.
239
+If an operator doesn't want to customize any of the recovery workflows,
240
+there will be no impact, as the default tasks will be loaded for each
241
+recovery workflow.
242
+For example,
243
+
244
+.. code::
245
+
246
+    [taskflow_driver_recovery_flows]
247
+    instance_failure_recovery_tasks = pre:['custom_pre_task','stop_instance_task'],
248
+                                      main:['start_instance_task','custom_main_task'],
249
+                                      post:['confirm_instance_active_task','custom_post_task']
250
+    process_failure_recovery_tasks = pre:['disable_compute_node_task'],
251
+                                     main:['confirm_compute_node_disabled_task','custom_main_task'],
252
+                                     post:['custom_post_task']
253
+    host_auto_failure_recovery_tasks = pre:['disable_compute_service_task'],
254
+                                       main:['prepare_HA_enabled_instances_task'],
255
+                                       post:['evacuate_instances_task','custom_post_task']
256
+    host_rh_failure_recovery_tasks = pre:['custom_pre_task','disable_compute_service_task'],
257
+                                     main:['prepare_HA_enabled_instances_task',
258
+                                     'evacuate_instances_task'],
259
+                                     post:['custom_post_task']
260
+
261
+Developer impact
262
+----------------
263
+
264
+None
265
+
266
+Implementation
267
+==============
268
+
269
+Assignee(s)
270
+-----------
271
+
272
+Primary assignee:
273
+
274
+* bhagyashris <bhagyashri.shewale@nttdata.com>
275
+
276
+Work Items
277
+----------
278
+
279
+* Implement customizable task flow execution.
280
+* Add unit tests for the coverage.
281
+* Add a documentation guide to describe how to configure customizable workflows.
282
+
283
+Dependencies
284
+============
285
+
286
+None
287
+
288
+Testing
289
+=======
290
+
291
+None
292
+
293
+Documentation Impact
294
+====================
295
+
296
+Add a documentation guide to describe how to configure customizable workflows.
297
+
298
+References
299
+==========
300
+
301
+https://etherpad.openstack.org/p/masakari-recovery-method-customization
302
+http://eavesdrop.openstack.org/meetings/masakari/2018/masakari.2018-07-03-03.00.log.html
303
+
304
+History
305
+=======
306
+
307
+None
