Add periodic task to clean up workflow failure

Change-Id: I9d7fb601903307b54a71def2aeceb00170313317
Implements: bp add-periodic-tasks
Abhishek Kekane 2017-01-20 11:38:09 +05:30
parent 2c10be4ec4
commit 4e746cb5a3
1 changed files with 122 additions and 0 deletions

@@ -0,0 +1,122 @@
==================
Masakari Spec Lite
==================
Please keep this template section in place and add your own copy of it between
the markers. Please fill only one Spec Lite per commit.
<Title of your Spec Lite>
-------------------------
:link: <Link to the blueprint.>
:problem: <What is the driver to make the change.>
:solution: <High level description what needs to get done. For example:
"We need to add client function X.Y.Z to interact with new server
functionality Z".>
:impacts: <All possible \*Impact flags (same as in commit messages) or 'None'.>
Optionals (please remove this line and fill or remove the rest until End):
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
:how: <More technical details than the high level overview of `solution`
if needed.>
:alternatives: <Any alternative approaches that might be worth bringing
up for discussion.>
:timeline: <Estimation of the time needed to complete the work.>
:reviewers: <If reviewers have been agreed for the functionality, list them
here.>
:assignee: <If known, list who is going to work on the feature implementation
here>
End of Template
+++++++++++++++
Add periodic task to clean up workflow failure
----------------------------------------------
:link: https://blueprints.launchpad.net/masakari/+spec/add-periodic-tasks
:problem: Due to unforeseen circumstances, some notifications might end up
in error status or remain in new status forever. There should be
a way to retrieve such notifications and process them to
completion.
:solution: Add a periodic task “process_unfinished_notifications” which will
execute at a regular interval defined by the new config option
“process_unfinished_notifications_interval” (in seconds). The default
value for this option will be 120 seconds; operators can change it
as required. The periodic task will retrieve all notifications
which are in error or new status and then execute the recovery
action workflow to process them. Notifications in new status
will be picked up based on a new config option,
“retry_notification_new_status_interval”. The default value for this
option will be 60 seconds; operators can change it as required.
Each notification has a generated_time field. A notification in new
status will be picked up by this periodic task if its
generated_time + retry_notification_new_status_interval (in seconds)
is less than or equal to the current system time, i.e. it has been
waiting in new status for at least that interval. Notifications in
error status will be picked up as well.
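The selection rule above can be sketched roughly as follows. This is only an illustration, not the Masakari implementation: the function name ``is_eligible`` and the module-level constant are hypothetical, and the sketch assumes that a notification in new status becomes eligible once it has been waiting for at least retry_notification_new_status_interval seconds.

```python
from datetime import datetime, timedelta

# Hypothetical constant mirroring the proposed default of the
# retry_notification_new_status_interval option (in seconds).
RETRY_NOTIFICATION_NEW_STATUS_INTERVAL = 60


def is_eligible(status, generated_time, now,
                retry_interval=RETRY_NOTIFICATION_NEW_STATUS_INTERVAL):
    """Return True if the periodic task should pick up the notification."""
    if status == "error":
        # Notifications in error status are picked up unconditionally.
        return True
    if status == "new":
        # Assumption: a "new" notification is retried only after it has
        # been waiting for at least retry_interval seconds.
        return generated_time + timedelta(seconds=retry_interval) <= now
    # Any other status (running, finished, failed) is left alone.
    return False


now = datetime(2017, 1, 20, 12, 0, 0)
stale = now - timedelta(seconds=90)   # generated 90 seconds ago
fresh = now - timedelta(seconds=10)   # generated 10 seconds ago
```

With these sample values, the stale notification in new status would be picked up, while the fresh one would be skipped until a later run of the task.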
Let's look at the status transitions of a notification in the
success case:
notification current status error -> running -> finished
notification current status new -> running -> finished
If the workflow execution fails, the status transitions of the
notification would be:
notification current status error -> running -> failed
notification current status new -> running -> failed
Note: If the original notification status is new and the workflow
fails to process it in the periodic task, it won't be retried again.
Its status will be set directly to failed. The operator needs to
manually take corrective action for all notifications which are in
failed status.
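As a minimal sketch of the transitions listed above, the following helper records the statuses a notification passes through in the proposed single periodic task. The function ``process_unfinished`` and the ``workflow`` callable are hypothetical stand-ins for the recovery action workflow, not Masakari APIs.

```python
def process_unfinished(status, workflow):
    """Return the list of statuses a notification passes through.

    `workflow` is a hypothetical callable standing in for the recovery
    action workflow; it raises an exception when recovery fails.
    """
    if status not in ("error", "new"):
        raise ValueError("only error or new notifications are processed")
    history = [status, "running"]
    try:
        workflow()
    except Exception:
        # In the proposed solution both error and new notifications end
        # up in failed status, so they are never retried automatically.
        history.append("failed")
    else:
        history.append("finished")
    return history


def succeeding_workflow():
    return None


def failing_workflow():
    raise RuntimeError("recovery failed")
```

Calling ``process_unfinished("new", failing_workflow)`` ends in failed status, which is why the operator has to handle such notifications manually.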
:alternatives: Add two periodic tasks, process_error_notifications and
process_queued_notifications, to process notifications which
are in error and new status respectively. These periodic
tasks will be called at regular intervals as defined by two
new config options, “process_error_notification_interval” and
“process_queued_notifications_interval”. The logic for retrieving
notifications in error and new status will be exactly the same
as in the solution above. The only difference would be
the notification status upon completion, as explained
below.
Status transitions of a notification in the
success case:
notification current status
error (process_error_notifications) -> running -> finished
notification current status
new (process_queued_notifications) -> running -> finished
If the workflow execution fails, then the transition state of
notification would be:
notification current status
error (process_error_notifications) -> running -> failed
notification current status
new (process_queued_notifications) -> running -> error
This means that a notification in new status whose workflow
fails will again be eligible for reprocessing during the next
cycle of the process_error_notifications periodic task.
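The difference in this alternative can be illustrated with a small sketch. The mapping and the ``run_task`` helper are hypothetical and only encode the transitions listed above: each task handles one status and has its own status on workflow failure.

```python
# task name -> (status it picks up, status set when the workflow fails)
ALTERNATIVE_TASKS = {
    "process_error_notifications": ("error", "failed"),
    "process_queued_notifications": ("new", "error"),
}


def run_task(task_name, status, workflow_succeeds):
    """Return the notification status after one run of the given task."""
    picked_status, failure_status = ALTERNATIVE_TASKS[task_name]
    if status != picked_status:
        # This task does not handle notifications in this status.
        return status
    return "finished" if workflow_succeeds else failure_status


# A new notification whose workflow fails is moved to error status ...
after_first_run = run_task("process_queued_notifications", "new", False)
# ... and is then retried by the other task on its next cycle.
after_second_run = run_task("process_error_notifications",
                            after_first_run, False)
```

In this sketch a notification in new status gets one extra retry before reaching failed status, which is the trade-off against the single-task solution.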
:impacts: None
:timeline: Expected to be merged within the Ocata time frame.
:reviewers: sam47priya@gmail.com, kajinamit@nttdata.co.jp,
tushar.vitthal.patil@gmail.com
:assignee: Abhishek Kekane
End of Add periodic task to clean up workflow failure
+++++++++++++++++++++++++++++++++++++++++++++++++++++