Add periodic task to clean up workflow failure

Change-Id: I9d7fb601903307b54a71def2aeceb00170313317
Implements: bp add-periodic-tasks
Abhishek Kekane 2 years ago
parent
commit
4e746cb5a3
1 changed files with 122 additions and 0 deletions
specs/ocata/approved/lite-specs.rst (+122, -0)

@@ -0,0 +1,122 @@
+==================
+Masakari Spec Lite
+==================
+
+Please keep this template section in place and add your own copy of it between
+the markers. Please fill only one Spec Lite per commit.
+
+<Title of your Spec Lite>
+-------------------------
+
+:link: <Link to the blueprint.>
+
+:problem: <What is the driver for making the change.>
+
+:solution: <High-level description of what needs to get done. For example:
+            "We need to add client function X.Y.Z to interact with new server
+            functionality Z".>
+
+:impacts: <All possible \*Impact flags (same as in commit messages) or 'None'.>
+
+Optionals (please remove this line and fill or remove the rest until End):
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+:how: <More technical details than the high-level overview of `solution`,
+       if needed.>
+
+:alternatives: <Any alternative approaches that might be worth bringing up
+                for discussion.>
+
+:timeline: <Estimate of the time needed to complete the work.>
+
+:reviewers: <If reviewers have been agreed on for the functionality, list
+             them here.>
+
+:assignee: <If known, list who is going to work on the feature implementation
+            here.>
+
+End of Template
++++++++++++++++
+
+Add periodic task to clean up workflow failure
+----------------------------------------------
+
+:link: https://blueprints.launchpad.net/masakari/+spec/add-periodic-tasks
+
+:problem: Under some circumstances, a notification may end up in ‘error’
+          status or remain in ‘new’ status indefinitely. There should be a
+          way to retrieve such notifications and process them to completion.
+
+:solution: Add a periodic task ‘process_unfinished_notifications’ which will
+           execute at a regular interval defined, in seconds, by the new
+           config option ‘process_unfinished_notifications_interval’. The
+           default value for this option will be 120 seconds; operators can
+           change it as required. The periodic task will retrieve all
+           notifications which are in ‘error’ or ‘new’ status and then
+           execute the recovery action workflow to process them.
+           Notifications in ‘error’ status will always be picked up.
+           Notifications in ‘new’ status will be picked up based on a new
+           config option ‘retry_notification_new_status_interval’, which
+           defaults to 60 seconds and can also be changed by the operator.
+           Each notification has a ‘generated_time’ field; a notification in
+           ‘new’ status will be picked up only if generated_time +
+           retry_notification_new_status_interval (in seconds) is less than
+           or equal to the current system time, that is, once it has stayed
+           in ‘new’ status for at least that interval.
+
+           The status transitions of a notification in the success case
+           are:
+
+           - current status ‘error’: error -> running -> finished
+           - current status ‘new’: new -> running -> finished
+
+           If the workflow execution fails, the transitions are:
+
+           - current status ‘error’: error -> running -> failed
+           - current status ‘new’: new -> running -> failed
+
+           Note: If the original notification status is ‘new’ and the
+           workflow fails to process it in the periodic task, it will not be
+           retried. Its status will be set directly to ‘failed’. The operator
+           must manually take corrective action for all notifications in
+           ‘failed’ status.
+
+:alternatives: Add two periodic tasks, ‘process_error_notifications’ and
+               ‘process_queued_notifications’, to process notifications in
+               ‘error’ and ‘new’ status respectively. These periodic tasks
+               will be called at regular intervals defined by two new config
+               options, ‘process_error_notification_interval’ and
+               ‘process_queued_notifications_interval’. The logic for
+               retrieving notifications in ‘error’ and ‘new’ status will be
+               exactly the same as in the solution above. The only difference
+               is the notification status after processing, as explained
+               below.
+
+               The status transitions of a notification in the success case
+               are:
+
+               - error (process_error_notifications) -> running -> finished
+               - new (process_queued_notifications) -> running -> finished
+
+               If the workflow execution fails, the transitions are:
+
+               - error (process_error_notifications) -> running -> failed
+               - new (process_queued_notifications) -> running -> error
+
+               In the last case, the notification becomes eligible for
+               reprocessing during the next cycle of the
+               ‘process_error_notifications’ periodic task.
+
+:impacts: None
+
+:timeline: Expected to be merged within the Ocata time frame.
+
+:reviewers: sam47priya@gmail.com, kajinamit@nttdata.co.jp,
+            tushar.vitthal.patil@gmail.com
+
+:assignee: Abhishek Kekane
+
+End of Add periodic task to clean up workflow failure
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
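
The selection rule in the :solution: field can be illustrated with a small, self-contained sketch. It is not Masakari code: the `Notification` dataclass, the constant names, and the helper functions are hypothetical stand-ins, and only the two default values (120 and 60 seconds) come from the spec. The sketch assumes that a notification in ‘new’ status becomes eligible once it has been waiting for at least the retry interval, which appears to be the intent of the problem statement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Default values from the spec; the constant names are illustrative only.
PROCESS_UNFINISHED_NOTIFICATIONS_INTERVAL = 120  # seconds between task runs
RETRY_NOTIFICATION_NEW_STATUS_INTERVAL = 60      # seconds a 'new' item may wait


@dataclass
class Notification:
    """Hypothetical stand-in for a stored notification record."""
    uuid: str
    status: str
    generated_time: datetime


def is_eligible(notification, now,
                retry_interval=RETRY_NOTIFICATION_NEW_STATUS_INTERVAL):
    """Return True if the periodic task should pick this notification up."""
    if notification.status == "error":
        # Notifications in 'error' status are always retried.
        return True
    if notification.status == "new":
        # Assumption: 'new' notifications are retried only after they have
        # been waiting for at least `retry_interval` seconds.
        deadline = notification.generated_time + timedelta(seconds=retry_interval)
        return deadline <= now
    # Any other status (running, finished, failed, ...) is left alone.
    return False


def select_unfinished(notifications, now):
    """Filter the notifications that one run of the task would process."""
    return [n for n in notifications if is_eligible(n, now)]
```

A scheduler would call something like `select_unfinished(all_notifications, datetime.utcnow())` once every `PROCESS_UNFINISHED_NOTIFICATIONS_INTERVAL` seconds and hand each result to the recovery workflow. How the task is actually registered and how notifications are queried is outside the scope of this sketch.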

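The difference between the proposed solution and the two-task alternative comes down to a single entry in the status transition table. The sketch below encodes both tables as plain dictionaries. The design labels `single_task` and `two_tasks` and the function `status_path` are hypothetical names chosen for illustration; the status values and transitions are taken from the spec.

```python
# Final status keyed by (starting status, workflow succeeded?).
# 'single_task' is the proposed solution, 'two_tasks' is the alternative.
FINAL_STATUS = {
    "single_task": {
        ("error", True): "finished",
        ("error", False): "failed",
        ("new", True): "finished",
        ("new", False): "failed",
    },
    "two_tasks": {
        ("error", True): "finished",
        ("error", False): "failed",
        ("new", True): "finished",
        # Only difference: a failed 'new' notification goes to 'error',
        # so the error-handling task retries it on its next cycle.
        ("new", False): "error",
    },
}


def status_path(design, start_status, workflow_succeeded):
    """Return the sequence of statuses a notification passes through."""
    try:
        final = FINAL_STATUS[design][(start_status, workflow_succeeded)]
    except KeyError:
        raise ValueError(
            "unsupported design/status: %r, %r" % (design, start_status))
    return [start_status, "running", final]
```

Under the proposed solution a notification gets at most one automatic retry from ‘new’ status before it lands in ‘failed’ and needs operator attention. The alternative gives it a second automatic attempt through the ‘error’ path, at the cost of a second periodic task and config option.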