masakari-specs/specs/ocata/approved/lite-specs.rst


Masakari Spec Lite

Please keep this template section in place and add your own copy of it between the markers. Please fill in only one Spec Lite per commit.

<Title of your Spec Lite>

link

<Link to the blueprint.>

problem

<What is the driver to make the change.>

solution

<High-level description of what needs to be done. For example: "We need to add client function X.Y.Z to interact with new server functionality Z".>

impacts

<All possible *Impact flags (same as in commit messages) or 'None'.>

Optionals (please remove this line and fill or remove the rest until End):

how

<More technical details than the high level overview of solution if needed.>

alternatives

<Any alternative approaches that might be worth bringing to discussion.>

timeline

<Estimation of the time needed to complete the work.>

reviewers

<If reviewers have been agreed for the functionality, list them here.>

assignee

<If known, list who is going to work on the feature implementation here.>

End of Template

Add periodic task to clean up workflow failure

link

https://blueprints.launchpad.net/masakari/+spec/add-periodic-tasks

problem

For unknown reasons, some notifications may end up in error status or remain in new status indefinitely. There should be a way to retrieve such notifications and process them to completion.

solution

Add a periodic task, “process_unfinished_notifications”, which executes at a regular interval defined by the new config option “process_unfinished_notifications_interval” (in seconds). The default value is 120 seconds; operators can change it as required. On each run, the task retrieves all notifications in error or new status and executes the recovery action workflow to process each of them.

Notifications in error status are always picked up. Notifications in new status are picked up based on another new config option, “retry_notification_new_status_interval” (in seconds), whose default value is 60 seconds and which operators can also change as required. Each notification has a generated_time field; a notification in new status is picked up only if generated_time plus retry_notification_new_status_interval is less than or equal to the current system time, that is, once it has been in new status for at least that long.
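The selection rule can be sketched in Python as follows. This is a minimal, hypothetical illustration, not Masakari's engine code: the `Notification` dataclass, `is_eligible` and `select_unfinished` are names invented here, statuses are plain strings, and the periodic scheduling driven by “process_unfinished_notifications_interval” is left out. It assumes timezone-aware UTC timestamps and treats a new notification as eligible once generated_time plus retry_notification_new_status_interval is at or before the current time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Default proposed in this spec, in seconds.
RETRY_NOTIFICATION_NEW_STATUS_INTERVAL = 60


@dataclass
class Notification:
    """Hypothetical minimal stand-in for a Masakari notification record."""
    notification_uuid: str
    status: str  # "new", "error", "running", "finished" or "failed"
    generated_time: datetime


def is_eligible(notification, now,
                retry_new_interval=RETRY_NOTIFICATION_NEW_STATUS_INTERVAL):
    """Return True if the periodic task should (re)process this notification."""
    if notification.status == "error":
        # Notifications in error status are always picked up.
        return True
    if notification.status == "new":
        # A "new" notification is picked up only after it has waited at least
        # retry_new_interval seconds since it was generated.
        threshold = notification.generated_time + timedelta(
            seconds=retry_new_interval)
        return threshold <= now
    # Notifications in any other status are left alone.
    return False


def select_unfinished(notifications, now,
                      retry_new_interval=RETRY_NOTIFICATION_NEW_STATUS_INTERVAL):
    """Return the notifications one run of the periodic task would process."""
    return [n for n in notifications
            if is_eligible(n, now, retry_new_interval)]


if __name__ == "__main__":
    now = datetime(2017, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    notifications = [
        Notification("a", "error", now - timedelta(seconds=5)),
        Notification("b", "new", now - timedelta(seconds=90)),
        Notification("c", "new", now - timedelta(seconds=30)),
        Notification("d", "finished", now - timedelta(seconds=300)),
    ]
    picked = select_unfinished(notifications, now)
    print([n.notification_uuid for n in picked])  # ['a', 'b']
```

In this example, notification "c" is skipped because it has been in new status for only 30 seconds; it would be picked up by a later run once 60 seconds have passed since its generated_time.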

Let's look at how the notification status changes in each case. If the workflow execution succeeds:

- current status error -> running -> finished
- current status new -> running -> finished

If the workflow execution fails:

- current status error -> running -> failed
- current status new -> running -> failed

Note: if the original notification status is new, the notification will not be retried again if the workflow fails to process it in the periodic task; its status is set directly to failed. The operator needs to take corrective action manually for all notifications in failed status.
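The transitions described for this solution, including the rule that a failure is never retried, can be modelled with a small Python sketch. `process`, `succeed` and `fail` are hypothetical helpers, not Masakari's engine code; statuses are plain strings and the recovery workflow is assumed to be any callable that raises an exception on failure.

```python
RUNNING = "running"
FINISHED = "finished"
FAILED = "failed"

# Statuses that process_unfinished_notifications picks up.
PICKED_UP = ("new", "error")


def process(status, workflow):
    """Hypothetical model of one periodic-task attempt for a notification.

    `workflow` is any callable that raises an exception on failure.
    Returns the statuses the notification passes through.
    """
    if status not in PICKED_UP:
        raise ValueError(f"status {status!r} is not picked up by the task")
    history = [status, RUNNING]
    try:
        workflow()
    except Exception:
        # A failure is terminal regardless of the original status: the
        # notification is not retried and needs manual operator action.
        history.append(FAILED)
    else:
        history.append(FINISHED)
    return history


def succeed():
    pass


def fail():
    raise RuntimeError("recovery workflow failed")


if __name__ == "__main__":
    print(process("new", succeed))  # ['new', 'running', 'finished']
    print(process("new", fail))     # ['new', 'running', 'failed']
    print(process("error", fail))   # ['error', 'running', 'failed']
```

Because both original statuses map to the same terminal status on failure, a single periodic task is enough; the trade-off is that a new notification gets only one attempt from the periodic task.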

alternatives

Add two periodic tasks, “process_error_notifications” and “process_queued_notifications”, to process notifications in error and new status respectively. These periodic tasks run at regular intervals defined by two new config options, “process_error_notification_interval” and “process_queued_notifications_interval”. The logic for retrieving notifications in error and new status is exactly the same as in the solution above. The only difference is the status a notification ends up in when its processing completes, as explained below.

If the workflow execution succeeds, the status transitions are:

- current status error (process_error_notifications) -> running -> finished
- current status new (process_queued_notifications) -> running -> finished

If the workflow execution fails, the status transitions are:

- current status error (process_error_notifications) -> running -> failed
- current status new (process_queued_notifications) -> running -> error

This means that a new notification whose processing fails becomes eligible for reprocessing again during the next cycle of the process_error_notifications periodic task.
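The behaviour of this alternative can be sketched in Python to contrast it with the chosen solution. The `TASKS` table, `run_task` and `fail` are hypothetical names used only for illustration; statuses are plain strings and the recovery workflow is assumed to be a callable that raises an exception on failure.

```python
RUNNING, FINISHED, FAILED, ERROR = "running", "finished", "failed", "error"

# For each periodic task: the status it picks up and the status a
# notification receives if the recovery workflow fails in that task.
TASKS = {
    "process_error_notifications": {"picks_up": "error", "on_failure": FAILED},
    "process_queued_notifications": {"picks_up": "new", "on_failure": ERROR},
}


def run_task(task, status, workflow):
    """Model one attempt by `task`; `workflow` raises an exception on failure."""
    spec = TASKS[task]
    if status != spec["picks_up"]:
        raise ValueError(f"{task} does not pick up {status!r} notifications")
    history = [status, RUNNING]
    try:
        workflow()
    except Exception:
        history.append(spec["on_failure"])
    else:
        history.append(FINISHED)
    return history


def fail():
    raise RuntimeError("recovery workflow failed")


if __name__ == "__main__":
    # A new notification that keeps failing gets a second attempt from
    # process_error_notifications before it ends up in failed status.
    first = run_task("process_queued_notifications", "new", fail)
    second = run_task("process_error_notifications", first[-1], fail)
    print(first)   # ['new', 'running', 'error']
    print(second)  # ['error', 'running', 'failed']
```

Compared with the chosen solution, this alternative gives a new notification one extra attempt before it reaches failed, at the cost of a second periodic task and a second interval config option.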

impacts

None

timeline

Expected to be merged within the Ocata time frame.

reviewers

sam47priya@gmail.com, kajinamit@nttdata.co.jp, tushar.vitthal.patil@gmail.com

assignee

Abhishek Kekane

End of Add periodic task to clean up workflow failure