We seem to leave the replaced_by field of a resource set after rollback
(if a stack cancel is issued before the replacement
resource is added to the stack). This in turn results in the
stack getting stuck in IN_PROGRESS for further updates.
Change-Id: I3361ad90e303d24a16a943408fe0f6c406cf785f
Co-Authored-By: Zane Bitter <zbitter@redhat.com>
Task: #19582
Story: #2001974
Since we are doing set operations on it, make the 'requires' attribute of a
Resource a set and only convert to/from a list when loading/storing. To
avoid churn in the database, sort the list when storing it.
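A minimal sketch of the conversion described above (class and method names are illustrative, not Heat's actual code):

```python
class Resource:
    """Illustrative sketch: keep 'requires' as a set in memory, but
    serialise it as a sorted list so repeated stores write identical rows."""

    def __init__(self, requires=None):
        # In memory, 'requires' is a set so set operations are cheap.
        self.requires = set(requires or [])

    @classmethod
    def load(cls, db_row):
        # Convert the stored list back into a set on load.
        return cls(requires=db_row.get('requires', []))

    def store(self):
        # Sort before storing to avoid churn in the database.
        return {'requires': sorted(self.requires)}
```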
Change-Id: I137fae8ae630eb235d7811fcba81738d828e6a1e
When we create a replacement resource, do so with the correct requires list
that it will ultimately have, instead of a copy of the old resource's
requires.
This will happen anyway when Resource.create_convergence() is actually
called (which is on the other end of an RPC message, so it may not actually
happen if another transition starts), but this makes it consistent from the
start.
Change-Id: Idf75a55be8d75e55c893ec1fb6ee3704f46bdc4f
Obtain the list of required resources in one place in check_resource,
instead of in both Resource.create_convergence() and
Resource.update_convergence(). Since these require knowledge about how the
dependencies are encoded in the RPC message, they never really belonged in
the Resource class anyway.
Change-Id: I030c6287acddcd91dfe5fecba72c276fec52829b
In the original prototype for convergence, we passed the input_data from
the SyncPoint to the resource when calling the equivalent of
convergence_delete(), so that we could clear any needed_by references that
no longer exist. This is pointless for a few reasons:
* It's implemented incorrectly - it *sets* the referenced resources into
needed_by instead of clearing them from it.
* We don't actually pass any input data - in WorkerService.check_resource()
it's always set to an empty dict for cleanup nodes, regardless of what
came in on the wire.
* We don't store the result to the database unless we're deleting the
resource anyway - in which case it doesn't matter.
* It turns out that even in the prototype, the whole needed_by mechanism
isn't actually used for anything:
c74aac1f07
Rather than pretend that we're doing something useful with the input_data
here, just set the needed_by to an empty list, which is what was happening
anyway.
Change-Id: I73f6cf1646584dc4a83497f5a583c2c8973e8aba
If an update of a resource fails, its 'requires' should be set to a union
of the previous and new requires lists. This is because if the resource
depends on a resource that has been replaced in this stack update, we can't
know if the current resource now depends on the new or old version of the
replaced resource if the current resource failed.
This is achieved by splitting up the setting of 'requires' and
'current_template_id', and changing them directly in the update() method
instead of via a callback.
When the resource state is changed to UPDATE_IN_PROGRESS, the new
requirements are added to the old ones. When the state is changed to
UPDATE_COMPLETE, the new requirements replace the old ones altogether. If
the update fails or handle_update() raises UpdateReplace, the union of the
requirements is kept. If _needs_update() raises UpdateReplace, the old
requirements are kept.
The current_template_id is updated when the state is changed to either
UPDATE_COMPLETE or UPDATE_FAILED, or when no update is required
(_needs_update() returns False).
This also saves an extra database write when the update fails.
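The state-transition rule above can be sketched as a small pure function (the function itself is illustrative; only the rule comes from this change):

```python
def merge_requires(old_requires, new_requires, state):
    """Which 'requires' set a resource ends up with after an update
    attempt, depending on the resulting state (sketch, not Heat code)."""
    if state == 'UPDATE_COMPLETE':
        # On success, the new requirements fully replace the old ones.
        return set(new_requires)
    if state == 'UPDATE_IN_PROGRESS':
        # While updating, keep both: a failure here leaves us unsure
        # whether we depend on the old or the replacement resource.
        return set(old_requires) | set(new_requires)
    # On failure (or UpdateReplace from handle_update), keep the union.
    return set(old_requires) | set(new_requires)
```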
Change-Id: If70d457fba5c64611173e3f9a0ae6b155ec69e06
Closes-Bug: #1663388
In convergence, when a resource is traversed and left unchanged, we must
still update the current template for it in the database. In addition,
if the resource was unchanged in the template but already in a FAILED
state and we elected not to replace it by returning False from
needs_replace_failed(), we must also update the status to COMPLETE.
Currently, we do those in two separate writes. This is an unnecessary
overhead (albeit for a fairly rare case), and the two writes can be
combined into one in the case where both changes are required.
Change-Id: I9e2f1e27ce2c119647c9fe228484228d2c15d943
Related-Bug: #1763021
When we store resource attribute values in the database, we increment the
atomic_key in the DB. This means we should also increment the _atomic_key
member of the Resource object, so that any future writes to the database
will expect the correct key.
In practice, this doesn't matter because there's no RPC API call that
stores attributes and then writes to the resource in the database for other
reasons. (Attributes are updated if they are missing on show-stack with
outputs, which is otherwise a read-only call; they're also updated at the
very end of a check-resource call in convergence.) But by maintaining a
consistent model this could help to prevent future bugs.
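The optimistic-locking bookkeeping can be sketched like this (a simplified model with an in-process stand-in for the DB row, not Heat's actual implementation):

```python
class Resource:
    """Sketch: every DB write bumps atomic_key, so the in-memory copy
    must be bumped too for the next compare-and-swap to succeed."""

    def __init__(self):
        self._atomic_key = 0
        self._db_atomic_key = 0  # stands in for the stored row's key

    def _store_with_key(self):
        # Simulates UPDATE ... WHERE atomic_key = <expected>.
        if self._db_atomic_key != self._atomic_key:
            raise RuntimeError('concurrent update detected')
        self._db_atomic_key += 1
        self._atomic_key += 1  # the fix: keep the in-memory key in sync

    def store_attributes(self):
        self._store_with_key()

    def store(self):
        self._store_with_key()
```

Without the in-memory increment in `_store_with_key()`, the second write after `store_attributes()` would fail its key check.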
Change-Id: I71ed5907d33f2293c04ad3bace639b1d114e9a77
Related-Bug: #1763021
We were missing a call to update the template for convergence purposes
when updating an external resource.
Change-Id: I37e29aaf4110faa00c777c76640be485224720ca
Closes-Bug: #1756269
This refactors the building of schema from parameter validation to use a
new method (which doesn't keep stacks in memory), and use that new
method for providing proper schema for resource group when the size is
0.
Change-Id: Id3020e8f3fd94e2cef413d5eb9de9d1cd16ddeaa
Closes-Bug: #1751074
Closes-Bug: #1626025
Some resources use `physical_resource_name` for the name property when
no name is provided during create, so we shouldn't add the name to
resource_data if the property value is None (these may be exactly the
cases where we use `physical_resource_name`).
Closes-Bug: #1737696
Change-Id: I7b6bcfdd748fdb8664245c4b65ec7170c2ad4a94
When we consolidated resource locking with resource state transitions in
a7376f7494, prepare_update_replace() moved
outside of the locked section, so that other update operations could
potentially conflict with it. This change ensures that we call it while the
lock is still held (if we have transitioned the resource to
UPDATE_IN_PROGRESS, and thus acquired the lock), or that we acquire the
lock before calling it (if UpdateReplace is raised before attempting to
update the resource).
To avoid unnecessary locking and unlocking, we only take the lock if the
Resource plugin has overridden the default handler method (either
restore_prev_rsrc() or prepare_for_replace()), since the defaults do
nothing.
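The override check can be sketched as follows (a simplified stand-in for Heat's plugin base class; the handler names match the commit, the rest is illustrative):

```python
class ResourcePluginBase:
    # Default handlers are no-ops, so taking the resource lock just to
    # call them would be wasted work.
    def restore_prev_rsrc(self):
        pass

    def prepare_for_replace(self):
        pass


def needs_lock(plugin_cls, base_cls=ResourcePluginBase):
    """Lock only if the plugin overrides one of the default handlers."""
    names = ('restore_prev_rsrc', 'prepare_for_replace')
    return any(getattr(plugin_cls, n) is not getattr(base_cls, n)
               for n in names)
```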
Change-Id: Ie32836ed643e440143bde8b83aeb4d6159acd034
Closes-Bug: #1727127
We used to validate intrinsic functions in the ResourceDefinition by just
passing it to function.validate(). That worked when it also acted as a
CloudFormation template snippet but, since that behaviour was removed by
2c6fc7bcd6, that call now does nothing. This
replaces it with a validate() method on the ResourceDefinition itself.
This also improves path reporting for validation errors in deletion or
update policies.
Change-Id: I58c788002136ddf9bded309366e2072823b3ac77
Closes-Bug: #1732798
The actions SUSPEND, RESUME, SNAPSHOT, CHECK, and also DELETE when
abandoning rather than deleting a stack, have not (yet) been converted
to the convergence workflow and still do a legacy-style in-memory
traversal with the Stack lock held, even on stacks with convergence
enabled. Since a7376f7494 we have been
attempting to get a lock on the Resource to do these operations, even
though there is no engine ID set (which caused a WARNING-level log). For
these operations, respect existing Resource locks when convergence is
enabled, but don't try to acquire the lock.
Change-Id: I6f232380398a7caf9664717debfe39e3422e70d8
Related-Bug: #1727142
Related-Bug: #1662585
Use a single transaction to create the replacement resource and set it as
the replaced_by link in the old resource. Also, ensure that no other
traversal has taken a lock on the old resource before we modify it.
If we end up bailing out and not creating a replacement or sending an RPC
message to check it, make sure we retrigger any new traversal.
Change-Id: I23db4f06a4060f3d26a78f7b26700de426f355e3
Closes-Bug: #1727128
When a user updates from a deprecated resource type to an
equivalent-but-differently-named one (e.g. from
OS::Heat::SoftwareDeployments to OS::Heat::SoftwareDeploymentGroup), Heat
is supposed to change the type without replacing the resource as it would
normally do when a resource type changes. This was broken in convergence,
because since 4507322675 the new Resource
object we create during the check_resource operation (using the new type's
plugin) is no longer automatically initialised with data from the database
as resources in the legacy path are.
Move the substitution checking to the Resource.load() method, so that it
now returns an instance of the new plugin where allowed. In the actual
update_convergence() method then we need only check that the resource class
is the one we'd expect from the new template, and replace the resource if
not.
We do have a test that is designed to check that this is working, but in it
we didn't compare the physical IDs of the resource that is potentially
getting replaced, but rather the physical IDs of some other resource that
can't possibly get modified (surprise! it doesn't change).
Change-Id: I75778abc303525a71d0a918f7192f00a43c21284
Closes-Bug: #1729439
If we attempt to do a convergence update on a resource and find it already
locked by another traversal, don't try to update the resource's current
template ID or requirements data. Doing so will usually fail with the same
exception, but it is unnecessary and leaves ERROR-level messages in the
log. However, there is a race which could result in the call succeeding
(i.e. if the other task releases the lock just after we fail to get it),
and that could result in the resource not being updated at all.
Change-Id: I6bde1f9359cd52c99cca092e8abc660bac8b3065
Closes-Bug: #1722371
If `lock != self.LOCK_NONE` is true here then
`self._calling_engine_id is None` will always be true.
Change-Id: I9a9158c4b135f0890807f63784468c069d320c52
TemplateResource is unique in that it can return a custom reference ID
defined as an output in the nested template. This was being fetched by the
StackResource.get_output() method, which also has the effect of retrieving
*all* of the outputs of the nested stack and caching them in memory in case
something else were to reference any of them.
Unfortunately when calculating the resource data for a stack (which we
must always do when e.g. showing the outputs), we always include the
reference IDs of all resources, regardless of whether they are referenced
by get_resource in the data we are looking to populate. (In fact, we have
no way from the Template API to distinguish where get_resource is used on
a particular resource, only where there are dependencies on it.) This is no
problem under the assumption that getting the reference ID is quick, but
that assumption does not hold for TemplateResource.
The show_output RPC call only retrieves a single output (as opposed to
show_stack, used in StackResource.get_output(), which calculates all of
them). Fall back to that call in TemplateResource.get_reference_id() if the
outputs are not already cached to avoid unnecessary overhead in the common
case.
Attribute values are now always fetched before the reference ID, so that we
won't end up making two RPC calls in the case where we also need to read
other outputs.
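The fallback logic can be sketched like this (the RPC client interface here is a hypothetical stand-in; `OS::stack_id` is the output used to override the reference ID):

```python
class TemplateResource:
    """Sketch of the get_reference_id() fallback (illustrative)."""

    def __init__(self, rpc_client):
        self._rpc = rpc_client
        self._outputs_cache = None  # set when all outputs were fetched

    def get_reference_id(self):
        if self._outputs_cache is not None:
            # All outputs already cached (e.g. by attribute resolution):
            # no extra RPC call needed.
            return self._outputs_cache.get('OS::stack_id')
        # Common case: fetch just the one output via show_output,
        # instead of calculating all outputs via show_stack.
        return self._rpc.show_output('OS::stack_id')
```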
Change-Id: I66da13c0bb024749de4ae3f0c4b06ebb485cee37
Closes-Bug: #1719333
When generating the node_data() for a resource, catch and store any
exceptions (other than InvalidTemplateAttribute) encountered while
getting attributes. Re-raise the exception at the point where we try to
read the attribute value, including where we try to serialise the
NodeData object to store in the database.
In convergence, we generate and immediately serialise the NodeData, so
this should result in no substantial change in behaviour there.
In other situations (e.g. when we're just loading the data to show the
stack), this prevents an error in attribute calculation from aborting
the whole operation. The exception will still be raised if (and only if)
the erroneous attribute is accessed, but may be handled more
appropriately. For example, errors in calculating output values are
handled by reporting an error only for that particular output.
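A minimal sketch of the catch-and-defer pattern (the class shape is illustrative, not Heat's actual NodeData):

```python
class NodeData:
    """Store the exception raised while computing an attribute, and
    re-raise it only when that attribute is actually accessed."""

    def __init__(self, attr_funcs):
        self._attrs = {}
        for name, func in attr_funcs.items():
            try:
                self._attrs[name] = ('value', func())
            except Exception as exc:  # broad for illustration
                # Deferred: the error surfaces only on access.
                self._attrs[name] = ('error', exc)

    def attribute(self, name):
        kind, payload = self._attrs[name]
        if kind == 'error':
            raise payload
        return payload
```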
Change-Id: Idc97aee87405cc13e83be3373078b52e725850ea
Co-Authored-By: Zane Bitter <zbitter@redhat.com>
Closes-Bug: #1712280
Add a converge parameter to the stack update API and RPC call
that allows triggering an observation of reality. This is
triggered by an API call that includes the converge argument
(with a True or False value). The flag also applies to resources
within nested stacks.
Implements bp get-reality-for-resources
Change-Id: I151b575b714dcc9a5971a1573c126152ecd7ea93
This reverts the commits f5c32ad8fd and
14fdf72b000c82a80abb2587189dd7c6c7dfa0a0e.
The constraint never worked and the stuff to pass the template to
constraints (which was broken because we actually passed a
ResourceDefinition instead) is a pain. Just get rid of it rather than
fix it.
Change-Id: I4e1e787ad94ac1951f472ea066a9b1c9d3e03611
Closes-Bug: #1661403
Eager load resource_properties_data in resources in the typical
resource-loading scenarios where properties data will be
accessed. Thus, we can save an extra db query per resource when
loading all the resources in a stack, for instance. Fall back to lazy
loading properties data in other scenarios.
Also, the resource object doesn't need to store a copy of its
ResourcePropertiesData object in self.rsrc_prop_data, so don't.
Change-Id: Ib7684af3fe06f818628fd21f1216de5047872948
Closes-Bug: #1665503
If a resource is in a FAILED state, but the plugin overrides
_needs_update() to not raise UpdateReplace, and also no actual update is
required, we simply change the resource's state to UPDATE_COMPLETE.
However, if there were restricted actions set in the environment, this code
path was not executed. The behaviour should be the same regardless of
whether there are restrictions that have been checked.
Change-Id: Ib3ed081f9f26f4f9172c681bf9ebccb7cdc3ca9c
Related-Bug: #1663522
If a resource is in a FAILED state, but the plugin overrides
_needs_update() to not raise UpdateReplace, and also no actual update is
required, we simply change the resource's state to UPDATE_COMPLETE.
However, since the locking/unlocking steps were merged with the resource
state updates in a7376f7494, in doing so we
were ignoring the lock (since at this stage the resource is still unlocked,
and we don't need to take a lock). If another update traversal had acquired
the lock in the meantime, we would still write to the locked resource.
This change adds a new LOCK_RESPECT lock mode to Resource.store(). In this
mode the lock is not acquired or released, but an error will be raised if
the resource is found to be locked.
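The three lock modes can be sketched like this (a dict stands in for the DB row; the constant names follow the commit, the logic is illustrative):

```python
LOCK_NONE, LOCK_ACQUIRE, LOCK_RESPECT = 'none', 'acquire', 'respect'


def store(resource, engine_id, lock=LOCK_NONE):
    """Sketch of Resource.store() lock handling (not Heat's code)."""
    holder = resource.get('engine_id')
    if lock in (LOCK_ACQUIRE, LOCK_RESPECT) and holder not in (None, engine_id):
        # LOCK_RESPECT: don't acquire or release, but refuse to write
        # over a lock held by another engine.
        raise RuntimeError('resource locked by %s' % holder)
    if lock == LOCK_ACQUIRE:
        resource['engine_id'] = engine_id
    resource['stored'] = True
```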
Change-Id: I8b5cd1e05b88dd13abc13899a73f23810f7e6135
Closes-Bug: #1705794
Related-Bug: #1662585
During update, delete and signal operations, we set the properties of a
resource to the values stored in the database when they were last created
or updated, so that we don't resolve intrinsic functions to new values
caused by possible changes in live attribute values. In fact, since we
started requiring the resources' node data to be pre-populated into the
StackDefinition, we will get only placeholder values for attributes and
resource IDs if we tried to resolve live functions.
This change does the same for the suspend, resume, snapshot, and check
actions. Should any plugin's handler function for any of those actions read
the resource's properties (I'm not sure that any do) then they will now get
the stored values.
Change-Id: If6b6f66877e23b8538a97aa339357ca1a2a29276
When load()ing a Resource in order to check it, we must load its definition
from whatever version of the template it was created or last updated with.
Previously we created a second Stack object with that template in order to
obtain the resource definition. Since all we really need in order to obtain
this is the StackDefinition, create just that instead.
Change-Id: Ia05983c3d1b838d2e28bb5eca38d13e83ccaf368
Implements: blueprint stack-definition
When the dep_attrs function was written, it was used only in convergence
after checking a single resource. However, now we also use it to generate
data for the ResourceProxy objects, which is often done for all resources
simultaneously. That means doing O(n^2) dep_attrs() calls, which can really
slow things down when there is a large number of resources with complex
properties (or metadata).
This change adds an all_dep_attrs() call, which runs as fast as dep_attrs()
on everything except get_attr functions and their arguments, but only needs
to be called once instead of once per resource. (The get_attr function can
in future override the default implementation of all_dep_attrs() to be as
efficient as dep_attrs() as well.) The resulting data is cached in the
ResourceDefinition or OutputDefinition so that subsequent calls to their
get_attr() methods with different (or the same) resource names will use the
existing data.
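The single-pass idea can be sketched with a toy template walker (the `{'get_attr': [...]}` node shape here is a simplified stand-in for Heat's parsed function objects):

```python
def all_dep_attrs(template_snippet):
    """Walk the parsed snippet once, collecting attribute references
    for every resource, instead of one dep_attrs() walk per resource
    (which is O(n^2) overall)."""
    found = {}  # resource name -> set of attribute names

    def walk(node):
        if isinstance(node, dict):
            args = node.get('get_attr')
            if isinstance(args, list) and len(args) >= 2:
                found.setdefault(args[0], set()).add(args[1])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    walk(template_snippet)
    return found
```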
Change-Id: If95f4c04b841519ce3d7492211f2696588c0ed48
Partially-Implements: blueprint stack-definition
Closes-Bug: #1684272
This is no longer required as the convergence 'lightweight stack' used for
checking individual resources has now merged with the other code paths: in
every case intrinsic functions are resolved using the data about resources
stored in StackDefinition and accessed via ResourceProxy.
Change-Id: I41cf21c8e1babe819b4b6c668749ed5915ae3b54
This unifies the 'lightweight stack' used in convergence with how things
work the rest of the time: we now always obtain resource data from
the StackDefinition's ResourceProxy objects. This means that when we are
checking an individual resource, we will never create all of the other
Resource objects for the stack in memory (although we already avoided
loading data for them from the database) - now we will only create
ResourceProxy objects as needed.
Change-Id: Id7472557e26d172df88841ff7f20afdd7f5bfada
Implements: blueprint stack-definition
Since function.dep_attrs() returns logical resource names (rather
than actual objects), we can just as easily use the StackDefinition to
calculate it instead of the Stack and Resource objects.
In the legacy path, we must ensure we use the StackDefinition from the
*new* stack to determine which attributes to include in the NodeData, since
that's what we're going to be using it for. In the convergence path the
current stack definition already contains the new template.
Also, update the *new* stack's definition with the NodeData obtained from
completed resources (in addition to the existing stack's), so that that
data may be used in calculating the dep_attrs for future resources. This is
required when get_attr functions are nested in the template.
Change-Id: I23efcc091eae53470f7f9cb3ca21e09f00f43808
Partially-Implements: blueprint stack-definition
When creating a resource, we might retry the create or delete of that
resource. Property validation acted as a pre-step before every action.
On the first create run, if anything goes wrong during property
validation, a validation error is raised right away, so any such error
will definitely surface on the very first action run. But even if
validation succeeded, it was still repeated every time an action was
triggered (whether create or delete). This patch skips property
validation if it is not the first create run.
Closes-Bug: #1691672
Change-Id: Ibf592a254a862613eddb77ea5933ec6ba0cd2d1a
When requesting the NodeData for a Resource, generate placeholder values if
the resource has not been created yet. Currently, all attributes are given
a value of None (consistent with how the get_attr intrinsic functions
substitute them when the resource state indicates that the attributes are
not valid now). Reference IDs are generated as usual (since the logic for
what to return when the state indicates the data are not valid is contained
in the resource plugins).
In future, this will allow us to add placeholder NodeData for all resources
to the StackDefinition prior to validation. In the long term, that should
help us to remove the state-checking logic from the intrinsic functions. In
the short term, it will allow us to validate template data that is parsed
against the StackDefinition (rather than the Stack), since it gives us a
list of valid attribute names. In the medium term, this should give us a
place to substitute in more sophisticated placeholder values to do better
validation.
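The placeholder generation described above can be sketched as (function and field names are illustrative):

```python
def placeholder_node_data(resource_name, attribute_names, reference_id):
    """Placeholder NodeData for a resource that has not been created yet:
    every attribute gets None (matching what get_attr returns when the
    resource state says attributes aren't valid), while the reference ID
    is generated as usual by the plugin."""
    return {
        'name': resource_name,
        'reference_id': reference_id,
        'attributes': {name: None for name in attribute_names},
    }
```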
Change-Id: Ife69f563a3748c6cb611ca02f826229e76bee4d0
Partially-Implements: blueprint stack-definition
When we signal a resource, we want the signal to be interpreted in the
context of the properties of the resource as they were set at the last
stack update, not based on any attributes of other resources that may have
changed.
We already store the properties for comparison during updates. This patch
freezes the properties while we call handle_signal(), in a similar way to
how we freeze the properties during handle_update() (see bug 1584623).
Also make sure that when a ScalingPolicy adjusts an autoscaling group, it
also uses the group's stored properties so that e.g. the definitions of
existing members cannot change on a scale up.
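The freezing pattern can be sketched like this (a toy model; Heat's real properties objects and handler signatures differ):

```python
class Resource:
    """Sketch: during signal(), swap in the property values stored at
    the last update, so handle_signal() never sees values re-resolved
    from live attribute data."""

    def __init__(self, stored_properties, live_properties):
        self._stored_props = stored_properties
        self.properties = live_properties

    def signal(self, details):
        frozen = dict(self._stored_props)  # snapshot from last update
        saved, self.properties = self.properties, frozen
        try:
            return self.handle_signal(details)
        finally:
            self.properties = saved

    def handle_signal(self, details):
        # Stand-in handler that reads a property.
        return self.properties['size']
```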
Change-Id: I7e248ad82f2334b1a580655efa3a302e7d84fbd8
Partially-Implements: blueprint stack-definition
The Stack._find_filtered_resources() method returns Resource objects for
all resources associated with the Stack, regardless of whether they are
current (present in the template; latest version in the case of
convergence). To do this, it previously created a new Resource object
for every resource found in the database.
However, for those resources which *are* current this is unnecessary. We
can access the Resource object simply through self.resources. It turns
out this is necessary for obtaining the required_by list for legacy
stacks, because only the Resources obtained from self.resources also
appear in the Dependencies graph obtained from self.dependencies. The
required_by list is read when listing or showing resources, which would
either return an empty list or fail for legacy stacks.
This patch also makes the Resource.required_by() method more robust in
its error handling.
Change-Id: Id438336e5c88dc7c2d168ba01ee703faa17e8b8e
Closes-Bug: #1703703
Related-Bug: #1523748
Ensure that attributes that are referenced in outputs get cached (and
therefore stored in the database) even when they are not referenced by
other resources.
Change-Id: I667ab04f91edddef5c5dbec0a89d465110c312b4
Closes-Bug: #1660831
Calculating all of the attributes of a resource that are referenced in the
template was previously done inside the node_data() method, but this is
useful for the resource to be able to access for other reasons so refactor
it into a separate method.
Change-Id: I8ec943652e6f93a2352f2ea5e6ac46e0088e5458
Related-Bug: #1660831
Partially-Implements: blueprint stack-definition
This changes the logic for obtaining the task runner's cancellation
grace period before closing it: move the liveness check into the
cancel_all() method in the scheduler, rather than asking the resource
whether it is IN_PROGRESS.
Change-Id: Ia2a03de227ff15cdce1b3dbb6cd6bff6c5a50a15
Partial-Bug: 1693495
This moves signal validation after hook handling, to be able to properly
get signals during DELETE actions.
Change-Id: I18e2284ab344f6c2e46e40744b771524b7bad9d9
Closes-Bug: #1690806
Previously, all caching of attribute values was done via the Attributes
object. However, some resource types override Resource.get_attribute() to
do custom handling of the trailing attribute path or dynamic attribute
names, and in these cases the resulting values were not cached (since they
don't go through the Attributes object).
This patch adds a caching step for these resources:
* OS::Senlin::Cluster
* OS::Heat::ResourceChain
* OS::Heat::ResourceGroup
* OS::Heat::AutoscalingGroup
* OS::Heat::SoftwareDeployment
* OS::Heat::SoftwareDeploymentGroup
* TemplateResource
* AWS::CloudFormation::Stack
Change-Id: I07ac22cc4370a79bd8712e2431fa3272115bc0eb
Co-Authored-By: Crag Wolfe <cwolfe@redhat.com>
Partial-Bug: #1660831
This code was never used for anything, as the rest of the patch series
never landed and was abandoned.
This reverts commit 15e52ff5e9.
Change-Id: I7d1a22753e8de1d3adf127c14516ebd667513bfa
Store resource attributes that may be cached in the DB, saving the
cost of re-resolving them later. This works for most resources,
specifically those that do not override the get_attribute() method.
Change-Id: I71f8aa431a60457326167b8c82adc03ca750eda6
Partial-Bug: #1660831
The 'show' attribute is used to get all data about a resource. There are
folks using this in lieu of having access to the underlying OpenStack APIs
themselves. By not caching it, we ensure that they will always see a live
value, even once we start storing attribute values in the database when
modifying a resource.
Change-Id: Id81a56f7d774bc18a2b5d62c80b369f1c12bcc5c
Co-Authored-By: Crag Wolfe <cwolfe@redhat.com>
Related-Bug: #1660831
If a traversal is interrupted by a fresh update before a particular
resource is created, then the resource is left stored in the DB with the
old template ID. While an update always uses the new template, a create
assumes that the template ID in the DB is correct. Since the resource has
never been created, the new traversal will create it using the old
template.
To resolve this, detect the case where the resource has not been created
yet and we are about to create it and the traversal ID is still current,
and always use the new resource definition in that case.
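The detection logic can be sketched as (a simplified stand-in; field names and the INIT check are illustrative):

```python
def definition_for_create(resource, new_definition,
                          current_traversal, traversal_id):
    """If the resource was never created and this traversal is still
    current, use the new definition rather than trusting the possibly
    stale template ID stored in the DB."""
    never_created = resource['action'] == 'INIT'
    if never_created and traversal_id == current_traversal:
        return new_definition
    return resource['stored_definition']
```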
Change-Id: Ifa0ce9e1e08f86b30df00d92488301ea05b45b14
Closes-Bug: #1663745
I405888f46451d2657aa28f610f8ca555215ff5cf changed resource loading to
read only a few fields from the db. We should also load the 'name'
field, as it is used when raising the ConcurrentTransaction
exception.
Change-Id: I699961b75c7c417d03e21b51a2464c39e737ca89
Related-Bug: #1583679
Related-Bug: #1680658