Instead of calling check_resource on all leaves in the resource graph at
once, sleep a little bit between each call. As it's a tad slower,
delegate it to a thread so that the stack_create RPC message doesn't
time out when you have lots of resources.
Change-Id: I84d2b34d65b3ce7d8d858de106dac531aff509b7
Partial-Bug: #1566845
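A minimal sketch of the throttling idea, using illustrative names (the rpc_client object, the delay value) rather than Heat's actual internals:

    import threading
    import time

    def _throttled_check_resources(rpc_client, ctxt, leaf_resource_ids,
                                   delay=0.1):
        # Fire check_resource for each leaf with a short pause in between,
        # instead of flooding the message bus all at once.
        for res_id in leaf_resource_ids:
            rpc_client.check_resource(ctxt, res_id)
            time.sleep(delay)

    def start_checks_in_background(rpc_client, ctxt, leaf_resource_ids):
        # Run the throttled loop in a separate thread so the stack_create
        # RPC call can return before all the checks have been dispatched.
        worker = threading.Thread(
            target=_throttled_check_resources,
            args=(rpc_client, ctxt, leaf_resource_ids),
            daemon=True)
        worker.start()
        return worker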
The list_events rpc api no longer includes resource properties data
when returning multiple events. In the "show event" case where an event
uuid is included in the list_events request, the resource properties data
for that single event continues to be included in the output.
Note that there is no change in behaviour on the heat-api side (see
heat/api/openstack/v1/events.py). Previously when listing multiple
events, it had just been ignoring the resource properties data it
fetched from the list_events rpc api, not including it in the output
it returned (e.g., to python-heatclient).
Change-Id: I7ac83d848cdd0e6c313870c0a4d59a5d9b2301f5
Partial-Bug: #1665506
When a stack is IN_PROGRESS and an UPDATE or RESTORE is called
after an engine crash, we set the status of the stack and all of its
IN_PROGRESS resources to FAILED.
Change-Id: Ia3adbfeff16c69719f9e5365657ab46a0932ec9b
Closes-Bug: #1570576
Since this resource type depends wholly on the deprecated Glance v1 API, we
should deprecate it as well. The Glance v2 API does not offer any
equivalent functionality.
Change-Id: Iab2bb291d1640f13b6f91957d795bdf4234fb0e5
Partially-Implements: blueprint migrate-to-glance-v2
Designate has deprecated its v1 API, and the corresponding v2
resource plugins are provided as part of the blueprint below,
so this patch deprecates the v1 resource plugins.
Change-Id: Ia0fbea7a591be200d16be95d7111613f8762190c
Implements: blueprint heat-designate-recordset-zone
Some resources do not work if their metadata is in
a wrong state; e.g. the metadata 'scaling_in_progress'
of a scaling group/policy might stay True forever if the
engine restarts while scaling.
This patch adds a 'handle_metadata_reset' interface to
Resource, which plugins can override if needed.
We reset the metadata when marking a resource healthy.
Change-Id: Ibd6c18acf6f3f24cf9bf16a524127850968062bc
Closes-Bug: #1651084
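A sketch of how a plugin might use the new hook; the classes below are simplified stand-ins, not the in-tree implementations:

    class Resource(object):
        """Illustrative stand-in for heat.engine.resource.Resource."""

        def __init__(self):
            self.metadata = {}

        def metadata_set(self, md):
            self.metadata = md

        def handle_metadata_reset(self):
            # Default: nothing to reset. Plugins override this when they
            # keep state in metadata that can go stale after an engine
            # restart.
            pass

    class ScalingPolicy(Resource):
        def handle_metadata_reset(self):
            # Clear the flag that would otherwise stay True forever if
            # the engine died in the middle of a scaling operation.
            md = dict(self.metadata)
            md['scaling_in_progress'] = False
            self.metadata_set(md)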
This change adds project information to software configs
when using an admin context.
Change-Id: Ia26919aa1177a9366c65710becb2097b79e02445
Closes-Bug: #1646312
The db.api module provides a useless indirection to the only
implementation we ever had, sqlalchemy. Let's use that directly instead
of the wrapper.
Change-Id: I80353cfed801b95571523515fd3228eae45c96ae
If the name passed into mark-unhealthy is not a valid resource name,
check if it is a valid resource id and retrieve the resource via id
instead of name.
Change-Id: Ie28ed102665b2c6379d1f55b7a02b76d05e38ddd
Co-Authored-By: Zane Bitter <zbitter@redhat.com>
Closes-Bug: #1635295
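Roughly the fallback logic, with plain dict lookups standing in for the real stack accessors:

    def find_resource(resources_by_name, resources_by_id, name_or_id):
        # Prefer lookup by logical resource name; if that fails, treat
        # the argument as a resource id so mark-unhealthy accepts either.
        resource = resources_by_name.get(name_or_id)
        if resource is None:
            resource = resources_by_id.get(name_or_id)
        if resource is None:
            raise LookupError('No resource matching %r' % (name_or_id,))
        return resource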
It's possible that we could end up with multiple resources with the same
physical resource ID, but that would be undetectable since we return only
one from the database layer. This change allows us to detect the problem and
return an error where the result is rendered ambiguous.
Change-Id: I2c5ddbe6731c33a09ec7c4a7b91dcfe414da4385
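The idea at the database layer, sketched with SQLAlchemy-style calls; the model and exception are illustrative, not the real Heat DB API:

    def resource_get_by_physical_resource_id(session, resource_model,
                                             physical_id):
        # Return the single matching resource, None if there is none, or
        # raise if several resources share the same physical id.
        matches = (session.query(resource_model)
                   .filter_by(physical_resource_id=physical_id)
                   .limit(2).all())
        if not matches:
            return None
        if len(matches) > 1:
            raise ValueError('Multiple resources found with physical id %s'
                             % physical_id)
        return matches[0]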
Before creating a nested stack we validate the resources in the parent
stack. Therefore it's wasteful to validate them again upon creating the
stack - prior to this patch the number of validation calls for each
resource would be equal to the depth of the stack in the tree (so twice for
a child of the root, and three times for a grandchild). With this patch,
each resource in a nested stack is validated only once, by the parent
resource.
ResourceGroup and its subclasses are currently the only StackResources that
do not validate all of their members; they validate a stack containing a
single representative resource. This means that in the case of e.g. index
substitution, the nested stack could be created without having validated
some resources. However, in practice this is unlikely to make validity
problems surface any later than they previously would have.
Change-Id: Iaa36ae5543b3ce30ae8df5a05b48fe987bc0ffdc
Closes-Bug: #1645336
oslo_service Service usage in the engine was slightly wrong: we
inherited from the base class without using its threadgroup, and we also
inherited from it in utility classes that were not real services. This
change cleans those up.
Change-Id: I0f902afb2b4fb03c579d071f9b502e3108aa460a
When doing a stack delete, we update it with an
empty template. Therefore checking for the resource
in the template would fail in resource_signal.
Change-Id: Id8c2226e78ed74138ce9065c4435aa5778726656
Closes-Bug: #1635610
Add a 'with_condition_func' filter param to the
template-function-list API; if the param is set to true,
the response will include the condition functions.
Change-Id: Icdfbafbb98698373648ff2d78db3c45fe2b924ee
Closes-Bug: #1625505
setUp and tearDown are automatically called around each
test case, so this change removes setUp and tearDown methods that do
nothing beyond calling super(), to keep the code clean.
Change-Id: I8b6943602419d3f360991721d90b61888b55ea60
Previously, the stop_stack message accidentally used the
engine_life_check_timeout (by default, 2s). But unlike other messages sent
using that timeout, stop_stack needs to synchronously kill all running
threads operating on the stack. For a very large stack, this can easily
take much longer than a couple of seconds. This patch increases the timeout
to give a better chance of being able to start the delete.
Change-Id: I4b36ed7f1025b6439aeab63d71041bb2000363a0
Closes-Bug: #1499669
The interleaving of locks when an update-replace of a resource is
needed is the reason the new traversal was not being triggered.
Consider the order of events below:
1. A server is being updated. The worker locks the server resource.
2. A rollback is triggered because someone cancelled the stack.
3. As part of rollback, new update using old template is started.
4. The new update tries to take the lock but it has already been
acquired in (1). The new update now expects that when the old
resource is done, it will re-trigger the new traversal.
5. The old update decides to create a new resource for replacement. The
replacement resource is initiated for creation, a check_resource RPC
call is made for new resource.
6. A worker, possibly in another engine, receives the call and then it
bails out when it finds that there is a new traversal initiated (from
2). Now, there is no progress from here because it is expected (from 4)
that there will be a re-trigger when the old resource is done.
This change takes care of re-triggering the new traversal from the worker
when it finds that there is a new traversal and an update-replace. Note
that this issue will not be seen when there is no update-replace
because the old resource will finish (either fail or complete) and in
the same thread it will find the new traversal and trigger it.
Closes-Bug: #1625073
Change-Id: Icea5ba498ef8ca45cd85a9721937da2f4ac304e0
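A schematic of the re-trigger decision in the worker; the names here (current_traversal, the retrigger callable) are placeholders for the real convergence internals:

    def should_bail_out(stack, resource, traversal_id, is_update_replace,
                        retrigger):
        # 'retrigger' is a callable that kicks off check_resource for the
        # resource in the stack's *new* traversal.
        if traversal_id == stack.current_traversal:
            return False  # traversal still valid, carry on as normal
        if is_update_replace:
            # Nothing else will restart the new traversal for a
            # replacement resource, so re-trigger it before bailing out.
            retrigger(stack, resource)
        return True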
The error messages 'Command Out of Sync' are due to the threads being
stopped in the middle of the database operations. This happens in the
legacy action when delete is requested during a stack create.
We have the thread cancel message but that was not being used in this
case. Thread cancel should provide a more graceful way of ensuring the
stack is in a FAILED state before the delete is attempted.
This change does the following in the delete_stack service method for
the legacy engine:
- if the stack is still locked, send thread cancel message
- in a subthread wait for the lock to be released, or until a
timeout based on the 4 minute cancel grace period
- if the stack is still locked, do a thread stop as before
Closes-Bug: #1499669
Closes-Bug: #1546431
Closes-Bug: #1536451
Change-Id: I4cd613681f07d295955c4d8a06505d72d83728a0
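A rough outline of the new delete path, with stand-in objects for the stack lock and the thread group:

    import time

    CANCEL_GRACE_PERIOD = 240  # seconds; the 4 minute cancel grace period

    def stop_stack_gracefully(stack_lock, thread_group, poll_interval=1.0):
        # 1. If another thread still holds the stack lock, ask it to stop
        #    cleanly via the thread cancel message.
        if stack_lock.is_locked():
            thread_group.send_cancel_message()
            # 2. Wait (in a subthread in the real code) for the lock to
            #    be released, up to the cancel grace period.
            deadline = time.time() + CANCEL_GRACE_PERIOD
            while stack_lock.is_locked() and time.time() < deadline:
                time.sleep(poll_interval)
        # 3. If the lock is still held, fall back to the old behaviour
        #    and hard-stop the threads.
        if stack_lock.is_locked():
            thread_group.stop()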
We used to try to acquire the stack lock in order to find out which engine
to cancel a running update on, in the misguided belief that it could never
succeed. Accordingly, we never released the lock.
Since it is entirely possible to encounter a race where the lock has
already been released, use the get_engine_id() method instead to look up
the ID of the engine holding the lock without attempting to acquire it.
Change-Id: I1d026f8c67dddcf840ccbc2f3f1537693dc266fb
Closes-Bug: #1624538
The stack cancel update would halt the parent stack from propagating, but
the nested stacks kept going until they either failed or completed.
This is not desired; the cancel update should stop all the nested stacks
from moving further, although it shouldn't abruptly stop the currently
running workers.
Change-Id: I3e1c58bbe4f92e2d2bfea539f3d0e861a3a7cef1
Co-Authored-By: Zane Bitter <zbitter@redhat.com>
Closes-Bug: #1623201
When a resource failed, the stack state was set to FAILED and the current
traversal was set to an empty string. The actual traversal was lost and
there was no way to delete the sync points belonging to the actual
traversal.
This change keeps the current traversal when you do a state set, so that
later you can delete the sync points belonging to it. Also, the current
traversal is set to empty when the stack has failed and there is no need
to rollback.
Closes-Bug: #1618155
Change-Id: Iec3922af92b70b0628fb94b7b2d597247e6d42c4
Implements a mechanism to cancel existing workers (in_progress resources).
The stack-cancel-update request lands in one of the engines, and if
there are any workers in that engine which are working for the stack,
they are cancelled first and then other engines are requested to cancel
the workers.
Change-Id: I464c4fdb760247d436473af49448f7797dc0130d
This allows a convergence operation to be cancelled at an appropriate point
(i.e. between steps in a task) by sending a message to a queue.
Note that there's no code yet to actually cancel any operations
(specifically, sending a cancel message to the stack will _not_ cause the
check_resource operations to be cancelled under convergence).
Change-Id: I9469c31de5e40334083ef1dd20243f2f6779549e
Related-Bug: #1545063
Co-Authored-By: Anant Patil <anant.patil@hpe.com>
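Conceptually the worker polls for a cancel message between steps; a simplified sketch with a plain in-process queue standing in for the real messaging layer:

    import queue

    def run_cancellable(task_steps, cancel_queue):
        # 'task_steps' is a list of callables, one per step of the
        # operation; between steps, check whether a cancel message
        # has arrived.
        for step in task_steps:
            try:
                cancel_queue.get_nowait()
            except queue.Empty:
                pass
            else:
                return False  # cancelled before this step ran
            step()
        return True  # ran to completion without being cancelled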
The input to check stack complete should be the resource ID of the
resource that the current resource replaces, not its own. Failing
to do so results in the stack being stuck in the IN_PROGRESS state forever.
Change-Id: I6f2856c82c8cc73f628976b7296ab0fb20af5ff3
Closes-Bug: #1614960
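Schematically (the 'replaces' attribute and the completion callable are illustrative names, not necessarily the real ones):

    def report_completion(stack, resource, check_stack_complete):
        # If this resource replaces another, the stack's sync point is
        # keyed on the *replaced* resource's id, so report that one;
        # reporting the replacement's own id leaves the stack
        # IN_PROGRESS forever.
        reported_id = resource.replaces if resource.replaces else resource.id
        check_stack_complete(stack, reported_id)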
This refactors the service module to use template_schemata
for environment merging.
Change-Id: I86a28d0496e05f978fa1b734818404639541761b
Blueprint: environment-merging
This moves the merge_environment utility function
from service.py to environment_util.py.
Change-Id: Ia005cf47d5e655e60359f8da397a712e749ce13c
Blueprint: environment-merging
Run `heat-manage migrate-convergence-1 <stack_id>` to migrate a
legacy stack to the convergence engine.
The Heat engine is used for doing the migration, i.e. the migration
can't be done offline.
Change-Id: Ie7c2498b37937438f16d154b154b3a6ecbf9ff74
Implements-bp: convergence-migrate-stack
A deeply misguided effort to move all exceptions out of the
heat.engine.resource module, where they belong, and into the
heat.common.exception module, where they largely do not, broke the API for
third-party resource plugins. Unfortunately this happened a couple of
releases back already, so we can't simply put UpdateReplace back where it
belongs as that would also break the de-facto third-party API.
This change adds an alias in the correct location and a comment indicating
that it will move back at some time in the future. It also switches all of
the in-tree uses back to heat.engine.resource.UpdateReplace, in the hope
that third-party developers will be more inclined to copy from that.
This reverts commit 4e2cfb991a.
Change-Id: Iedd5d07d6c0c07e39e51a0fb810665b3e9c61f87
Closes-Bug: #1611104
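In outline, the shim looks something like this (a sketch of the shape, not the exact in-tree text; it assumes the heat tree is importable):

    # heat/engine/resource.py
    from heat.common import exception

    # Compatibility alias for third-party resource plugins; the exception
    # is expected to move back here for good in a future release.
    UpdateReplace = exception.UpdateReplace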
We can use admin_context to have access to stacks
and software configs across projects. This removes
the tenant_safe flag from rpc and db api. This is
backward compatible with older rpc clients.
We still support use of global_tenant flag for listing
stacks and software configs. However, by default
an admin (a user with the admin role in the admin_project)
would not need that.
Change-Id: I12303dbf30fb80290f95baba0c67cdf684f5f409
Currently, when a user makes an API call to list resources, they are
retrieved from the currently active template associated with the
stack. This worked in the legacy engine, where no concurrent updates
were possible. With convergence and thus concurrent updates, a stack
is allowed to have resources of previous traversals and still continue
creating new resources. Therefore, relying on the template to list
resources won't exactly match the state in the database.
For example, on deletes where we update with an empty template,
currently, the stack parses the empty template searching for
resources. Doing a `resource-list` when the stack is in a state of
DELETE_IN_PROGRESS shows an empty list of resources, which does not
match the state in the database.
This change makes iter_resources always call _find_filtered_resources
which builds all its resources from a database query.
Change-Id: Ibe87a773c38efb6d4865fd3a1dbd079972dd8be4
Closes-Bug: #1523748
Closes-Bug: #1301320
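The gist of the change, with a stand-in for the DB API call:

    def find_filtered_resources(db_api, context, stack_id, filters=None):
        # 'resource_get_all_by_stack' stands in for the real DB API call;
        # it returns the resource records for the stack, optionally
        # filtered.
        resources = db_api.resource_get_all_by_stack(context, stack_id,
                                                     filters=filters)
        return list(resources.values())

    def iter_resources(db_api, context, stack, filters=None):
        # Always build the resource list from database records rather
        # than from the stack's current template, which may be empty
        # (delete) or stale (concurrent convergence update).
        for resource in find_filtered_resources(db_api, context,
                                                stack.id, filters):
            yield resource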
When the only way to define a Software Config was via a Heat resource, the
input and output configs were validated by the properties of the resource.
However, subsequently a REST API to create Software Configs directly was
added. That means that configs created in this way do not have the contents
of the inputs and outputs sections validated. This change adds validation
to ensure that the configs always follow the correct schema.
Change-Id: I8c66bb82484b75723524959be753a4cd20c0f84d
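A minimal sketch of the kind of check added, using an illustrative key set rather than Heat's actual schema objects:

    def validate_software_config(config):
        # Ensure each entry in 'inputs' and 'outputs' is a mapping with
        # at least a name and only known keys; this mirrors the shape the
        # resource properties used to enforce.
        allowed = {
            'inputs': {'name', 'description', 'type', 'default'},
            'outputs': {'name', 'description', 'type', 'error_output'},
        }
        for section, allowed_keys in allowed.items():
            for item in config.get(section) or []:
                if not isinstance(item, dict) or 'name' not in item:
                    raise ValueError('%s entries must be maps with a name'
                                     % section)
                unknown = set(item) - allowed_keys
                if unknown:
                    raise ValueError('unknown keys %s in %s entry %s' %
                                     (sorted(unknown), section, item['name']))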