Add spec for bp:post-run-cleanup

Made changes according to comments provided by community Removed database access Added reporting detail Added pluggability suggestion Added example of how deletes work when --preserve-state is true Added --dry-run argument definition Added --init-saved-state argument definition Change-Id: Ice4bdc3d6059c0549e7b648dc187830abbf89a61
2014-05-01 07:19:08 -04:00
parent f46b3b1aeb
commit 6d5d398464
1 changed files with 137 additions and 0 deletions
--- a/specs/post-run-cleanup.rst
+++ b/specs/post-run-cleanup.rst
@@ -0,0 +1,137 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+========================
+Tempest Post-run Cleanup
+========================
+
+https://blueprints.launchpad.net/tempest/+spec/post-run-cleanup
+
+Problem description
+====================
+The existing script, /tempest/stress/cleanup.py, can be used to do some
+basic cleanup after a Tempest run, but a more robust tool is needed
+to report as much information as possible about dangling objects (leaks)
+left behind after a tempest run in an attempt to help find out why the
+object(s) was left behind and report a bug against the root cause. Also
+the tool should completely reset the environment to the pre-run state
+should tempest leave behind any dangling objects.
+
+The idea is that the user should be able look at the report generated
+and find the root cause as to why an object was not deleted. Also the
+tool will return the system back into a state where tempest can be
+re-run with the expectation that the same test results will be returned.
+
+Currently there can be a good deal of manual work needed, depending
+on what tests fail, to return to this pre-run state. This blueprint
+is designed to alleviate this issue.
+
+Proposed changes
+================
+- Keep /tempest/stress/cleanup.py as a starting point and extend it.
+  It should be moved to from /tempest/stress to tempest/cmd/ and
+  an entry point should be added for it as well. This way it is installed
+  as a binary when setup.py is run, which also allows it to be unit
+  tested.  Currently the tools uses the tempest OpenStack clients and
+  this will remain unchanged.
+- Fix cleanup.py to delete objects by looping through each tenant/user,
+  over the way it currently works, which is to use the admin user and
+  "all_tenants" argument, as some object types don't support this
+  argument, Floating IPs for example.
+- Currently cleanup.py deletes all objects across all users/tenants.
+  Add two runtime arguments: --init-saved-state, that creates a JSON
+  file containg the pre-tempest run state and --preserve-state, that
+  will preserve the deployment's pre-tempest run state, including tenants
+  and users defined in tempest.conf. This will enforce that the
+  deployment is in the same state it was prior to running tempest and
+  allow tempest to be run again without having to reconfigure tempest
+  and recreate the tempest test users etc. For example, if
+  --preserve-state is true cleanup will load the JSON file (created by
+  running cleanup with --init-saved-state flag prior to tempest run)
+  containing the preserved state of the environment and marshal the data
+  to some defined instance variables. Then, when cleanup is looping
+  through floating ips we would have something like:
+
+        for f in floating_ips:
+            if not preserve or (preserve and f['id'] not in self.floating_ips):
+                try:
+                    admin_manager.floating_ips_client.delete_floating_ip(f['id'])
+                except Exception:
+                    ...
+
+- cleanup.py currently deletes servers (instances), keypairs,
+  security groups, floating ips, users, tenants, snapshots and volumes.
+  It should also delete any stacks, availability zones and any other objects
+  created by Tempest, full list TBD.
+- As mentioned in the overview section above some test failures leave the
+  system in a strange state. For example, an instance cannot be deleted
+  because it is in Error state.  Even after using CLI to reset the instance
+  to Active state, future delete calls just result in Error state once
+  again. Such a case indicates a bug in OpenStack.  This tool should
+  should provide as much detail as possible as to what went wrong so
+  a defect can be opened against the problem(s).
+- Add argument, --dry-run, that runs cleanup in reporting mode only, showing
+  what would be deleted without doing the actual deletes
+
+Scenario 1: run cleanup.py
+--------------------------
+This is the current behavior, which deletes all objects in the system,
+with the exception of the missing ones, stacks and availability zones
+for example.
+
+Scenario 2: run cleanup.py --preserve-state
+-------------------------------------------
+Same as Scenario 1 except that objects defined in tempest.conf, that
+are used in a Tempest run are preserved.
+
+For example (exceptions are variables defined in tempest.conf):
+
+- delete all users except: username, alt_username, admin_username
+- delete all tenants except: tenant_name, alt_tenant_name, admin_tenant_name
+- delete all images except: image_ref, image_ref_alt
+
+Additional Implications
+-----------------------
+There are cases where cruft will be left in the database do to openstack defects
+that don't allow objects to be removed during the cleanup process.
+In such cases resetting the system to the pre-existing state requires direct
+interaction with the database.  It may be useful to design the cleanup script
+so that it has a pluggable interface,  where downstream functionality can be
+added to automate required database interactions for example. Although
+the API delete failure indicates an upstream bug that needs to be fixed, until
+that bug is fixed testing the environment further is blocked until the records
+are deleted.
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+Primary assignee:
+  David Paterson <davpat2112@yahoo.com>
+
+Can optionally can list additional ids if they intend on doing
+substantial implementation work on this blueprint.
+
+Milestones
+----------
+Target Milestone for completion:
+
+- Juno release cycle, approximately the week of July 24th, 2014.
+
+Work Items
+----------
+- refactor location of cleanup.py
+- register new runtime arguments in cleanup.py
+- enable filtering deletions based on --preserve-state argument
+  and values defined in tempest.conf
+- write code for detailed reporting on dangling resources and
+  possible root cause for cleanup failure.
+- implement code for --dry-run argument, report only mode.
+
+Dependencies
+============
+Only those listed above
+