Because of quotas, there are times when creating a resource and then
deleting another resource may fail where doing it in the reverse order
would work, even though the resources are independent of one another.
When enqueueing 'check_resource' messages, send those for cleanup nodes
prior to those for update nodes. This means that all things being equal
(i.e. no dependency relationship), deletions will be started first. It
doesn't guarantee success when quotas allow, since only a dependency
relationship will cause Heat to wait for the deletion to complete before
starting creation, but it is a risk-free way to give us a better chance of
succeeding.
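A minimal sketch of the ordering rule, assuming each graph leaf is a `(resource_id, is_update)` pair where `False` marks a cleanup node and `True` an update node (the function names and the `send` callback are hypothetical):

```python
from typing import Callable, List, Tuple

# Hypothetical node key: (resource_id, is_update). False = cleanup, True = update.
Node = Tuple[int, bool]


def order_leaves(leaves: List[Node]) -> List[Node]:
    # sorted() is stable and False sorts before True, so cleanup nodes come
    # first while each group keeps its original relative order.
    return sorted(leaves, key=lambda node: node[1])


def enqueue_check_resource(leaves: List[Node],
                           send: Callable[[int, bool], None]) -> None:
    # 'send' stands in for whatever actually casts the check_resource message.
    for resource_id, is_update in order_leaves(leaves):
        send(resource_id, is_update)
```

Because there is no dependency edge between the two groups, this only changes which messages go out first; it does not make any update wait for a deletion to finish.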
Change-Id: I9727d906cd0ad8c4bf9c5e632a47af6d7aad0c72
Partial-Bug: #1713900
Add a converge parameter to the stack update API and RPC call,
which allows triggering an observe on reality. It is triggered by
passing the converge argument (True or False) in the API call.
The flag also applies to resources within nested stacks.
Implements bp get-reality-for-resources
Change-Id: I151b575b714dcc9a5971a1573c126152ecd7ea93
The patch at I1cb321a3878a0abce9b41832f76bf77c25bf7cb4 properly deleted
the snapshots from the database, but because a convergence delete sets
the template to an empty template, stack.resources is empty. This works
around the problem by deleting the snapshots beforehand.
Related-Bug: #1508299
Change-Id: Id1c2c1a293fdcda07c527f29fedc00b716b303bc
Handle the restore operation as a normal convergence update instead of a
legacy one.
Change-Id: I6ee46cdf7a8fdf89c58c9812d08af21c97fb0f9e
Related-Bug: #1687006
Oslo.config deprecated the enforce_type parameter and changed its default
value to True in Ifa552de0a994e40388cbc9f7dbaa55700ca276b0. Remove its
usage to avoid the DeprecationWarning: "Using the 'enforce_type'
argument is deprecated in version '4.0' and will be removed in version
'5.0': The argument enforce_type has changed its default value to True
and then will be removed completely."
Change-Id: I91b0f0a52b5ce8654702510eed76d5dea8cc8fe4
Related-Bug: #1517839
Instead of calling check_resource on all leaves in the resource graph at
once, sleep a little bit between each call. As it's a tad slower,
delegate it to a thread so that the stack_create RPC message doesn't
time out when you have lots of resources.
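A minimal sketch of the throttled trigger. It uses the standard library's `threading` module as a stand-in for whatever thread mechanism the engine actually uses; `trigger_leaves` and the `delay` value are hypothetical:

```python
import threading
import time
from typing import Callable, Iterable


def trigger_leaves(leaves: Iterable[int],
                   check_resource: Callable[[int], None],
                   delay: float = 0.2) -> threading.Thread:
    """Call check_resource on each leaf, pausing briefly between calls.

    The loop runs in a background thread so the caller (e.g. the
    stack_create RPC handler) can return without waiting for it.
    """
    pending = list(leaves)  # snapshot the leaves before handing off

    def _run() -> None:
        for index, leaf in enumerate(pending):
            if index:
                time.sleep(delay)  # spread the calls out a little
            check_resource(leaf)

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread
```

The RPC handler returns as soon as `start()` does, so the total sleep time no longer counts against the RPC timeout.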
Change-Id: I84d2b34d65b3ce7d8d858de106dac531aff509b7
Partial-Bug: #1566845
If a resource record already exists, don't create a new one for the
backup stack; just migrate the existing record to the backup stack.
This avoids leaving redundant data behind for the existing stack.
This patch also adds resource.store(), which covers the _store() and
_store_or_update() implementations, so those two methods can be
deleted.
Change-Id: I0b4b983306ea84fab0e2c81876ef407a80d25989
Closes-Bug: #1662095
When a resource failed, the stack state was set to FAILED and the
current traversal was set to the empty string. The actual traversal was
lost, so there was no way to delete the sync points belonging to it.
This change keeps the current traversal when state_set is called, so
that the sync points belonging to it can be deleted later. Also, the
current traversal is set to empty when the stack has failed and there
is no need to roll back.
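A minimal sketch of why keeping the traversal matters. The `Stack` class, the sync-point dictionary keyed by `(traversal_id, entity_id)`, and both helper names are hypothetical stand-ins, not Heat's data model:

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Stack:
    status: str
    current_traversal: str


def mark_failed(stack: Stack, will_rollback: bool) -> None:
    # Keep current_traversal so the failed traversal's sync points can still
    # be located; clear it only when no rollback will follow.
    stack.status = "FAILED"
    if not will_rollback:
        stack.current_traversal = ""


def delete_sync_points(sync_points: Dict[Tuple[str, int], object],
                       traversal: str) -> None:
    # Remove every sync point that belongs to the given traversal.
    for key in [k for k in sync_points if k[0] == traversal]:
        del sync_points[key]
```

If `mark_failed` always cleared the traversal, the later `delete_sync_points` call would have no traversal ID to match on.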
Closes-Bug: #1618155
Change-Id: Iec3922af92b70b0628fb94b7b2d597247e6d42c4
Implements:
(1) stack-cancel-update <stack_id> will start another update using the
previous template/environment. We'll start rolling back; in-progress
resources will be allowed to complete normally.
(2) stack-cancel-update <stack_id> --no-rollback will set the
traversal_id to None so no further resources will be updated;
in-progress resources will be allowed to complete normally.
Change-Id: I46ebdebb130be7410abe3e0b62f85da9856287b6
Run `heat-manage migrate-convergence-1 <stack_id>` to migrate a
legacy stack to the convergence engine.
The migration is performed by the Heat engine, i.e. it can't be done
offline.
Change-Id: Ie7c2498b37937438f16d154b154b3a6ecbf9ff74
Implements-bp: convergence-migrate-stack
In convergence, where concurrent updates are possible, if a resource
is deleted (by a previous traversal) after the dependency graph for a
new traversal has been created, the resource remains in the graph but
is no longer available in the DB for processing.
Resources must be present in the DB before any action can be taken on
them.
Hence, during the convergence resource delete action, the resource
entry is not removed from the DB but soft-deleted, so that the latest
update can still find it.
All of these soft-deleted resources are deleted when the stack has
completed its operation.
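A minimal in-memory sketch of soft delete followed by a purge. `ResourceRow`, `ResourceTable` and their methods are hypothetical; the real rows live in Heat's resource table:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ResourceRow:
    name: str
    action: str = "CREATE"
    status: str = "COMPLETE"


@dataclass
class ResourceTable:
    rows: Dict[int, ResourceRow] = field(default_factory=dict)

    def convergence_delete(self, rid: int) -> None:
        # Soft delete: keep the row and only record that it was deleted, so a
        # newer traversal whose graph still references it can find it.
        row = self.rows[rid]
        row.action, row.status = "DELETE", "COMPLETE"

    def purge_deleted(self) -> None:
        # Run once the stack has completed its operation.
        self.rows = {rid: row for rid, row in self.rows.items()
                     if (row.action, row.status) != ("DELETE", "COMPLETE")}
```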
Closes-Bug: #1528560
Change-Id: I0b36ce098022560d7fe01623ce7b66d1d5b38d55
This patch reverts change I6a212da19a774239f014163774e75fe11dfe272c
and adds a new DB API, resource_get_all_active_by_stack, for use by
convergence.
The new DB API is used when generating the graph for a convergence
stack, and fetches all resources of the stack from the DB excluding
DELETE COMPLETE resources, if any.
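A minimal sketch of the query this API performs, rendered against a hypothetical SQLite table with `id`, `stack_id`, `name`, `action` and `status` columns. Heat itself goes through its SQLAlchemy DB layer, so only the function name comes from the text above:

```python
import sqlite3
from typing import Dict


def resource_get_all_active_by_stack(conn: sqlite3.Connection,
                                     stack_id: str) -> Dict[int, str]:
    """Return {resource_id: name} for the stack, skipping DELETE COMPLETE rows."""
    rows = conn.execute(
        "SELECT id, name FROM resource "
        "WHERE stack_id = ? AND NOT (action = 'DELETE' AND status = 'COMPLETE')",
        (stack_id,),
    )
    return dict(rows.fetchall())
```

A resource whose delete is still IN_PROGRESS is kept, since only fully deleted rows are excluded.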
Change-Id: I303ef2c9b5b6a0a49253425c00565c8981cc6825
Partial-Bug: #1528560
When a stack fails, set its current traversal to the empty string so
that the resource workers bail out. This effectively implements
cancel-on-failure.
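A minimal sketch of the bail-out check, using a plain dict as a hypothetical stand-in for the stack row; the worker compares its own traversal ID with the stack's before doing any work:

```python
from typing import Dict, List


def fail_stack(stack: Dict[str, str]) -> None:
    # Cancel-on-failure: clearing the traversal invalidates in-flight work.
    stack["status"] = "FAILED"
    stack["current_traversal"] = ""


def check_resource(stack: Dict[str, str], traversal: str,
                   resource_id: int, processed: List[int]) -> bool:
    # A worker proceeds only if its traversal is still the current one.
    if stack["current_traversal"] != traversal:
        return False  # stale traversal: bail out
    processed.append(resource_id)
    return True
```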
Change-Id: Ifab89e5dc69bab53faa6b82db624024214830d77
Closes-Bug: #1491186
Currently, if convergence is enabled, deleting a stack does not
delete its snapshots. This patch deletes the snapshots before
deleting the stack.
Co-Authored-By: Rakesh H S <rh-s@hpe.com>
Change-Id: I1cb321a3878a0abce9b41832f76bf77c25bf7cb4
Closes-Bug: #1508299
This makes sure that type checking is done by oslo.config
on the test override values.
Change-Id: Ia8c1cb55fe98e9d06b9b9ff13e5c2d25aa67bff3
Closes-bug: #1517839
If convergence is enabled, make sure to delete the stack's
credentials when deleting the stack.
Change-Id: I93096d691503bd914d1c059db3203d4f8ac5aec2
Closes-Bug: #1558964
In convergence, if the stack's state_set returns None, it indicates a
failed concurrent update.
Fix the rollback logic to follow the convergence code path when saving
the stack state in the DB, so that the convergence post-stack-action
code is executed.
Change-Id: I9c4910c9fa56aa0070c752fc419e5e5fa7f13e99
Closes-Bug: #1544949
When selecting the best existing resource from the DB, we select a
resource only if it belongs to the current or previous template.
But with multiple concurrent updates, the candidate resource could
still come from an older template, so do not ignore such resources.
Change-Id: I33243a12aca43242825a256dde2c2969ddb5ef73
Closes-Bug: #1495544
Stack create is called internally to complete the stack adopt action.
Hence stack adopt does not use the stack lock in convergence, so
persist the state when state_set is called.
The stack adopt logic is already taken care of by convergence.
Change-Id: I2bcfdd7d8b7d9a0ce141ea29ce90253dce0402a8
To avoid certain concurrency-related issues, the DB update API needs to
be given the traversal ID of the stack intended to be updated. With this
change, we can avoid having the following check everywhere:
    if current_traversal != stack.current_traversal:
        return
The check for the current traversal should be implicit, as part of the
stack's store and state_set methods, where self.current_traversal is
used as the expected traversal to be updated. All state changes or
updates to the stack object in the DB go through this implicit check
(using update...where).
When a stack update is triggered, the current traversal should be backed
up as the previous traversal, a new traversal should be generated, and
the stack should be stored in the DB with the previous traversal as the
expected traversal. This ensures that no two updates can simultaneously
succeed on the same stack with the same traversal ID, which was one of
our primary goals.
The following example cases describe the issues we encounter:
1. When two updates, U1 and U2, try to update a stack concurrently:
   1. The current traversal (CT) is X.
   2. U1 loads the stack with CT=X.
   3. U2 loads the stack with CT=X.
   4. U2 stores the stack and updates CT=Y.
   5. U1 stores the stack and updates CT=Z.
   Both updates have succeeded, and both keep running until one of the
   workers finds that stack.current_traversal no longer matches its
   current_traversal and bails out.
   Ideally, U1 should have failed: only one update should be allowed in
   case of concurrent updates. When both U1 and U2 pass X as the
   expected traversal ID of the stack, this problem is solved: once U2
   has set CT=Y, U1's store no longer matches and fails.
2. A resource R is being provisioned for a stack with current traversal
   CT=X:
   1. A new update U is issued; it loads the stack with CT=X.
   2. Resource R fails and loads the stack with CT=X to mark it as FAILED.
   3. Update U updates the stack with CT=Y, goes ahead with sync points
      etc., and marks the stack as UPDATE_IN_PROGRESS.
   4. Resource R marks the stack as UPDATE_FAILED, which to the user
      means that update U has failed, although it is actually still
      running.
   With this patch, when resource R fails, it supplies CT=X as the
   expected traversal, so its update fails because update U with CT=Y
   has taken over.
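Both cases come down to a conditional write. A minimal sketch of the update...where idea, using an in-memory SQLite table as a stand-in for the stack table; the table layout and the name `store_with_expected_traversal` are hypothetical, and Heat itself goes through its SQLAlchemy DB API:

```python
import sqlite3


def store_with_expected_traversal(conn: sqlite3.Connection, stack_id: str,
                                  expected: str, new: str) -> bool:
    """Set the stack's traversal to `new` only if it still equals `expected`.

    The WHERE clause makes the check and the write a single statement, so
    of two callers that both read the same traversal, only the first wins.
    """
    cur = conn.execute(
        "UPDATE stack SET current_traversal = ? "
        "WHERE id = ? AND current_traversal = ?",
        (new, stack_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means someone else took over
```

Replaying case 1: U2 calls it with `expected="X", new="Y"` and succeeds; U1 then calls it with `expected="X", new="Z"`, matches no row, and gets `False`.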
Partial-Bug: #1512343
Change-Id: I6ca11bed1f353786bb05fec62c89708d98159050
When stack actions such as suspend/resume/snapshot/restore are
completed, two events are logged in the DB.
These stack actions still use the stack lock even under convergence,
so when setting the stack action as complete/failed, state_set
should not persist the state.
Sending notifications and events and saving the COMPLETE/FAILED state
in the DB is done when the lock is released for these stack actions.
Change-Id: Ib3e6c1de2f2f17502049e06ae484cb2e50867fab
Closes-Bug: #1516089
Previous traversal sync points should be deleted in case of concurrent
updates. However, in case of stack CREATE there won't be any previous
traversal sync points to delete.
Hence avoid hitting the DB.
Change-Id: If61332107a1f4d70a590f7db87b6c10bc9cf7b2c
Replace assertEqual(True/False, ***) with the more meaningful
assertTrue/False(***).
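A small before/after illustration with the standard `unittest` module (the test class is hypothetical):

```python
import unittest


class AssertStyleTest(unittest.TestCase):
    def test_membership(self) -> None:
        found = 1 in {1, 2}
        # Before: self.assertEqual(True, found)
        self.assertTrue(found)
        # Before: self.assertEqual(False, 3 in {1, 2})
        self.assertFalse(3 in {1, 2})
```

Note that assertTrue/assertFalse check truthiness (`bool(x)`), so they are looser than assertEqual(True, x) when `x` is not a bool; for example, assertTrue([1]) passes while assertEqual(True, [1]) fails.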
Change-Id: I685f7bb0b4669d8813ebbb796b6014ad44d7ff0c
Closes-Bug: #1510001
Input data can contain tuples as keys when the attribute and path
components are resolved, and converting such data to JSON
(serializing) fails. To fix this, recursively look for tuple keys in the
input data and convert them to strings when serializing, and back to
tuples when deserializing.
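A minimal sketch of the round trip, assuming tuple keys contain only JSON scalars. The `TUPLE_KEY_PREFIX` marker and all function names are hypothetical; the encoding Heat actually uses may differ:

```python
import json
from typing import Any

# Hypothetical marker identifying string keys that were tuples.
TUPLE_KEY_PREFIX = "__tuple__:"


def encode_keys(data: Any) -> Any:
    """Recursively replace tuple dict keys with marked strings."""
    if isinstance(data, dict):
        return {
            (TUPLE_KEY_PREFIX + json.dumps(list(k)) if isinstance(k, tuple) else k):
                encode_keys(v)
            for k, v in data.items()
        }
    if isinstance(data, (list, tuple)):
        return [encode_keys(item) for item in data]
    return data


def decode_keys(data: Any) -> Any:
    """Reverse encode_keys: turn marked string keys back into tuples."""
    if isinstance(data, dict):
        return {
            (tuple(json.loads(k[len(TUPLE_KEY_PREFIX):]))
             if isinstance(k, str) and k.startswith(TUPLE_KEY_PREFIX) else k):
                decode_keys(v)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [decode_keys(item) for item in data]
    return data


def serialize(data: Any) -> str:
    return json.dumps(encode_keys(data))


def deserialize(text: str) -> Any:
    return decode_keys(json.loads(text))
```

Tuple values (as opposed to keys) still come back as lists, just as with plain `json`; only keys need the marker, because `json.dumps` rejects tuple keys outright with a `TypeError`.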
Change-Id: I87e496d51004f3374965332921628f5eccb34657
Partial-Bug: #1492116