Retry resource create until success

On resource create, if a ResourceInFailure is raised then
repeated attempts are made to delete and recreate the resource
until success or a different error state is achieved.

Likewise, the prepare-retry deletes will be retried until
ResourceInFailure is not raised.

An exponentially increasing delay with jitter is introduced
between each create attempt, and attempts continue up to the configured
action_retry_limit or stack operation timeout.
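As an illustration only, the delay between attempts could be computed
along these lines. This is a minimal sketch, not the code in this
change: the function name backoff_delay, the base of 2 seconds, the
jitter_max parameter and the optional cap are all assumptions made for
the example.

```python
import random


def backoff_delay(attempt, jitter_max=2.0, cap=None):
    """Return a delay in seconds before retry number `attempt` (1-based).

    Hypothetical helper: the delay doubles with every attempt and a
    random jitter in [0, jitter_max] is added so that many resources
    failing at the same moment do not all retry at the same moment.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Exponential part: 2, 4, 8, 16, ... seconds.
    delay = float(2 ** attempt)
    # Jitter part: spread the retries out a little.
    delay += random.uniform(0.0, jitter_max)
    # Optional upper bound (an assumption, not described in this change).
    if cap is not None:
        delay = min(delay, cap)
    return delay
```

The jitter is additive here, so the exponential term sets a lower
bound on the wait; other jitter schemes would satisfy the description
above equally well.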

Likewise, an exponentially increasing delay with jitter is introduced
between each prepare-retry delete attempt, and delete attempts
continue up to the configured action_retry_limit or stack operation
timeout. The delete attempt count is reset to zero whenever a create
attempt has been performed.
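The interaction of the two counters described above can be sketched as
follows. This is a simplified illustration under stated assumptions,
not the code in this change: create_with_retry, its create and delete
callables and the sleep parameter are hypothetical, the
ResourceInFailure class is a local stand-in for Heat's exception, and
both the jitter and the stack operation timeout are left out.

```python
import time


class ResourceInFailure(Exception):
    """Local stand-in for Heat's exception of the same name."""


def create_with_retry(create, delete, retry_limit, sleep=time.sleep):
    """Call create(), deleting and recreating on ResourceInFailure.

    retry_limit plays the role of action_retry_limit: 0 disables
    retries, and any other error propagates immediately.
    """
    create_attempts = 0
    while True:
        create_attempts += 1
        try:
            return create()
        except ResourceInFailure:
            if create_attempts > retry_limit:
                raise
        # The delete count starts from zero after every create attempt.
        delete_attempts = 0
        while True:
            delete_attempts += 1
            try:
                delete()
                break
            except ResourceInFailure:
                if delete_attempts > retry_limit:
                    raise
                # Back off before the next delete attempt (no jitter here).
                sleep(2 ** delete_attempts)
        # Back off before the next create attempt (no jitter here).
        sleep(2 ** create_attempts)
```

With retry_limit set to 0 the first ResourceInFailure is raised
straight to the caller, which matches the "Set to 0 to disable retries"
wording of the new configuration option.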

Creates that result from an UpdateReplace will also go
through this path, so this also helps some stack update scenarios.

This change is intended as part of an interim solution for making
Heat resilient to transient cloud failures. Convergence is the
permanent solution; however, the convergence implementation may
benefit from this interim effort.

Currently, a retry is only attempted on ResourceInFailure. Eventually,
client plugins can indicate whether a given exception should lead
to a retry attempt (such as connection errors or some 500s).

Partial-Blueprint: retry-failed-api-calls
Change-Id: I07c3301349bcd24096f3cafbb6d82c43bccb93de
Steve Baker
2014-07-28 11:48:59 +12:00
parent f0ec53626e
commit d61427fe16
6 changed files with 208 additions and 5 deletions

@@ -47,6 +47,10 @@
# one time. (integer value)
#max_stacks_per_tenant=100
# Number of times to retry to bring a resource to a non-error
# state. Set to 0 to disable retries. (integer value)
#action_retry_limit=5
# Controls how many events will be pruned whenever a stack's
# events exceed max_events_per_stack. Set this lower to keep
# more events at the expense of more frequent purges. (integer