Add spec for bp:post-run-cleanup
Made changes according to comments provided by community

Removed database access
Added reporting detail
Added pluggability suggestion
Added example of how deletes work when --preserve-state is true
Added --dry-run argument definition
Added --init-saved-state argument definition

Change-Id: Ice4bdc3d6059c0549e7b648dc187830abbf89a61
specs/post-run-cleanup.rst
@@ -0,0 +1,137 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

========================
Tempest Post-run Cleanup
========================

https://blueprints.launchpad.net/tempest/+spec/post-run-cleanup

Problem description
====================

The existing script, /tempest/stress/cleanup.py, can do some basic cleanup
after a Tempest run, but a more robust tool is needed. The tool should
report as much information as possible about dangling objects (leaks) left
behind after a Tempest run, to help find out why each object was left
behind and to report a bug against the root cause. The tool should also
completely reset the environment to the pre-run state if Tempest leaves
behind any dangling objects.

The idea is that the user should be able to look at the generated report
and find the root cause of why an object was not deleted. The tool will
also return the system to a state where Tempest can be re-run with the
expectation that the same test results will be returned.

Currently a good deal of manual work can be needed, depending on which
tests fail, to return to this pre-run state. This blueprint is designed to
alleviate this issue.

Proposed changes
================

- Keep /tempest/stress/cleanup.py as a starting point and extend it.
  It should be moved from /tempest/stress to tempest/cmd/ and an entry
  point should be added for it as well. This way it is installed as a
  binary when setup.py is run, which also allows it to be unit tested.
  Currently the tool uses the Tempest OpenStack clients and this will
  remain unchanged.
- Fix cleanup.py to delete objects by looping through each tenant/user
  instead of the way it currently works, which is to use the admin user
  and the "all_tenants" argument, as some object types, Floating IPs for
  example, don't support this argument.
- Currently cleanup.py deletes all objects across all users/tenants.
  Add two runtime arguments: --init-saved-state, which creates a JSON
  file containing the pre-Tempest run state, and --preserve-state, which
  will preserve the deployment's pre-Tempest run state, including tenants
  and users defined in tempest.conf. This will enforce that the
  deployment is in the same state it was in prior to running Tempest and
  allow Tempest to be run again without having to reconfigure Tempest
  and recreate the Tempest test users etc. For example, if
  --preserve-state is true, cleanup will load the JSON file (created by
  running cleanup with the --init-saved-state flag prior to the Tempest
  run) containing the preserved state of the environment and marshal the
  data to some defined instance variables. Then, when cleanup is looping
  through floating ips we would have something like::

    for f in floating_ips:
        # Skip any floating ip whose id was recorded in the saved state.
        if not preserve or (preserve and f['id'] not in self.floating_ips):
            try:
                admin_manager.floating_ips_client.delete_floating_ip(f['id'])
            except Exception:
                ...

- cleanup.py currently deletes servers (instances), keypairs,
  security groups, floating ips, users, tenants, snapshots and volumes.
  It should also delete any stacks, availability zones and any other
  objects created by Tempest; the full list is TBD.
- As mentioned in the problem description above, some test failures leave
  the system in a strange state. For example, an instance cannot be
  deleted because it is in Error state. Even after using the CLI to reset
  the instance to Active state, future delete calls just result in Error
  state once again. Such a case indicates a bug in OpenStack. This tool
  should provide as much detail as possible as to what went wrong so a
  defect can be opened against the problem(s).
- Add an argument, --dry-run, that runs cleanup in reporting mode only,
  showing what would be deleted without doing the actual deletes.

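The interaction between --init-saved-state, --preserve-state and --dry-run described in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (``init_saved_state``, ``load_saved_state``, ``cleanup``), the layout of the JSON file and the ``delete`` callable are hypothetical and are not defined by this spec; a real implementation would call the Tempest clients instead.

```python
import json


def init_saved_state(resources, path):
    """Record the ids of all existing resources, grouped by type, as JSON."""
    state = {kind: sorted(r["id"] for r in items)
             for kind, items in resources.items()}
    with open(path, "w") as f:
        json.dump(state, f, indent=2)
    return state


def load_saved_state(path):
    """Load the JSON file and turn each id list into a set for fast lookup."""
    with open(path) as f:
        return {kind: set(ids) for kind, ids in json.load(f).items()}


def cleanup(resources, delete, saved=None, preserve=False, dry_run=False):
    """Delete, or in dry-run mode only report, resources not being preserved.

    `delete` is a hypothetical callable taking (kind, resource_id); a real
    tool would call the matching Tempest client instead.
    """
    report = []
    for kind, items in resources.items():
        keep = (saved or {}).get(kind, set()) if preserve else set()
        for r in items:
            if r["id"] in keep:
                continue  # existed before the run, so leave it alone
            if dry_run:
                report.append((kind, r["id"], "would delete"))
                continue
            try:
                delete(kind, r["id"])
                report.append((kind, r["id"], "deleted"))
            except Exception as exc:
                # Keep the failure detail so a bug can be filed later.
                report.append((kind, r["id"], "failed: %s" % exc))
    return report
```

In this sketch the report is a list of (type, id, outcome) tuples, so the same code path serves both the dry-run listing and the failure details the spec asks for.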
Scenario 1: run cleanup.py
--------------------------
This is the current behavior, which deletes all objects in the system
except for the object types that are not yet handled, stacks and
availability zones for example.

Scenario 2: run cleanup.py --preserve-state
-------------------------------------------
Same as Scenario 1, except that objects defined in tempest.conf that are
used in a Tempest run are preserved.

For example (the exceptions are variables defined in tempest.conf):

- delete all users except: username, alt_username, admin_username
- delete all tenants except: tenant_name, alt_tenant_name, admin_tenant_name
- delete all images except: image_ref, image_ref_alt

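One possible way to build the exception lists above is to read the option values straight from tempest.conf. The sketch below uses Python's standard ``configparser`` on an inline sample. The section names (``[identity]``, ``[compute]``), the sample values and the helper ``preserved_from_conf`` are assumptions made for illustration; the spec itself only names the options.

```python
import configparser

# Sample file contents; the section names and values are assumptions.
SAMPLE_CONF = """
[identity]
username = demo
alt_username = alt_demo
admin_username = admin
tenant_name = demo
alt_tenant_name = alt_demo
admin_tenant_name = admin

[compute]
image_ref = 1111
image_ref_alt = 2222
"""

# Which (section, option) pairs protect which kind of resource.
PRESERVED_OPTIONS = {
    "users": [("identity", "username"),
              ("identity", "alt_username"),
              ("identity", "admin_username")],
    "tenants": [("identity", "tenant_name"),
                ("identity", "alt_tenant_name"),
                ("identity", "admin_tenant_name")],
    "images": [("compute", "image_ref"),
               ("compute", "image_ref_alt")],
}


def preserved_from_conf(conf_text):
    """Return {kind: set of names/ids} that cleanup must not delete."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    preserved = {}
    for kind, options in PRESERVED_OPTIONS.items():
        # Options missing from the file are simply skipped.
        preserved[kind] = {parser.get(section, option)
                           for section, option in options
                           if parser.has_option(section, option)}
    return preserved
```

With the sample above, ``preserved_from_conf(SAMPLE_CONF)["images"]`` yields the two image references, which the delete loop can then skip.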
Additional Implications
-----------------------
There are cases where cruft will be left in the database due to OpenStack
defects that don't allow objects to be removed during the cleanup process.
In such cases, resetting the system to the pre-existing state requires
direct interaction with the database. It may be useful to design the
cleanup script so that it has a pluggable interface, where downstream
functionality can be added, to automate the required database interactions
for example. Although an API delete failure indicates an upstream bug that
needs to be fixed, further testing of the environment is blocked until the
records are deleted.

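The pluggable interface suggested above might look something like the following sketch. It is not a proposed API: ``CleanupPlugin``, ``RecordingPlugin`` and ``delete_with_fallback`` are hypothetical names, and the example plugin only records what it was asked to remove rather than touching any database.

```python
class CleanupPlugin(object):
    """Hypothetical base class for downstream cleanup extensions."""

    def handles(self, kind):
        """Return True if this plugin can remove resources of this type."""
        raise NotImplementedError

    def force_remove(self, kind, resource_id):
        """Remove the resource by downstream means (e.g. a database fix)."""
        raise NotImplementedError


class RecordingPlugin(CleanupPlugin):
    """Example plugin that only records what it was asked to remove."""

    def __init__(self, kinds):
        self.kinds = set(kinds)
        self.removed = []

    def handles(self, kind):
        return kind in self.kinds

    def force_remove(self, kind, resource_id):
        self.removed.append((kind, resource_id))


def delete_with_fallback(kind, resource_id, api_delete, plugins):
    """Try the API delete first; on failure offer the resource to plugins.

    Returns "deleted", "plugin" or "failed: <reason>".
    """
    try:
        api_delete(kind, resource_id)
        return "deleted"
    except Exception as exc:
        for plugin in plugins:
            if plugin.handles(kind):
                plugin.force_remove(kind, resource_id)
                return "plugin"
        # No plugin took it: surface the API error for the report.
        return "failed: %s" % exc
```

Keeping the API delete as the first step means the upstream failure is still observed and can be reported, while a downstream plugin unblocks further testing.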
Implementation
==============

Assignee(s)
-----------
Primary assignee:
  David Paterson <davpat2112@yahoo.com>

Can optionally list additional ids if they intend on doing
substantial implementation work on this blueprint.

Milestones
----------
Target Milestone for completion:

- Juno release cycle, approximately the week of July 24th, 2014.

Work Items
----------
- refactor location of cleanup.py
- register new runtime arguments in cleanup.py
- enable filtering deletions based on --preserve-state argument
  and values defined in tempest.conf
- write code for detailed reporting on dangling resources and
  possible root cause for cleanup failure
- implement code for --dry-run argument, report only mode

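Registering the three runtime arguments named in this spec could be done with the standard ``argparse`` module, as in the minimal sketch below. Treating all three as boolean flags is an assumption (the spec only says "--preserve-state is true"), and ``build_parser`` plus the help strings are illustrative.

```python
import argparse


def build_parser():
    """Build a parser exposing the three flags described in this spec."""
    parser = argparse.ArgumentParser(
        description="Clean up resources left behind by a Tempest run.")
    parser.add_argument(
        "--init-saved-state", action="store_true",
        help="write the pre-run state to a JSON file and exit")
    parser.add_argument(
        "--preserve-state", action="store_true",
        help="do not delete resources recorded in the saved state")
    parser.add_argument(
        "--dry-run", action="store_true",
        help="report what would be deleted without deleting anything")
    return parser


if __name__ == "__main__":
    # argparse maps --dry-run to args.dry_run, and so on.
    args = build_parser().parse_args()
    print(args)
```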
Dependencies
============
Only those listed above.