..
    This work is licensed under a Creative Commons Attribution 3.0 Unported
    License.

    http://creativecommons.org/licenses/by/3.0/legalcode

    Sections of this template were taken directly from the Nova spec
    template at:
    https://github.com/openstack/nova-specs/blob/master/specs/template.rst

..
    This template should be in ReSTructured text. The filename in the git
    repository should match the launchpad URL, for example a URL of
    https://blueprints.launchpad.net/trove/+spec/awesome-thing should be named
    awesome-thing.rst.

    Please do not delete any of the sections in this template. If you
    have nothing to say for a whole section, just write: None

    Note: This comment may be removed if desired, however the license notice
    above should remain.


=====================
Persist Error Message
=====================

.. If section numbers are desired, unindent this
    .. sectnum::

.. If a TOC is desired, unindent this
    .. contents::

Errors that occur in Trove should be easy to retrieve so that the end user
can see exactly what is happening with their database instance.

Launchpad Blueprint:
https://blueprints.launchpad.net/trove/+spec/persist-error-message


Problem Description
===================

Historically it has been very difficult to determine the cause of a failure
in Trove. This is due to the fact that errors may be logged in multiple
places, none of which are available to the end user. With the advent of
Notifications in Trove, however, it is now feasible to persist error messages
in the db so that they can be retrieved and displayed.


Proposed Change
===============

Each server will register a callback with the notification framework.
Whenever a notification is sent, this callback will be fired off and any
errors that occur can then be saved in the database. This information can
then be recalled by the user using the 'trove show' command. For errors that
occur outside the framework of notifications, a direct call will be made to
persist the error.

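
The callback flow described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the Trove implementation:
``register_callback``, ``send_notification``, ``persist_fault``,
``fault_callback``, the in-memory ``FAULTS`` store and the event name are all
hypothetical stand-ins for the notification framework and the database.

.. code-block:: python

    import traceback
    import uuid
    from datetime import datetime, timezone

    # Hypothetical in-memory stand-ins for the fault table and the
    # notification framework's callback registry.
    FAULTS = {}
    _CALLBACKS = []


    def register_callback(callback):
        """Register a function to be invoked for every notification sent."""
        _CALLBACKS.append(callback)


    def send_notification(event_type, instance_id, exception=None):
        """Deliver a notification to every registered callback."""
        for callback in _CALLBACKS:
            callback(event_type, instance_id, exception)


    def persist_fault(instance_id, message, details=""):
        """Save the latest fault for an instance.

        Code paths outside the notification framework could call this
        directly.
        """
        now = datetime.now(timezone.utc)
        FAULTS[instance_id] = {
            "id": str(uuid.uuid4()),
            "instance_id": instance_id,
            "message": message,
            "details": details,
            "created": now,
            "updated": now,
        }
        return FAULTS[instance_id]


    def fault_callback(event_type, instance_id, exception):
        """Persist a fault only when the notification carries an error."""
        if exception is None:
            return
        details = "".join(traceback.format_exception(
            type(exception), exception, exception.__traceback__))
        persist_fault(instance_id, str(exception), details)


    register_callback(fault_callback)

    # Usage: an error surfaced through a (hypothetical) notification.
    try:
        raise RuntimeError("A guest error occurred")
    except RuntimeError as exc:
        send_notification("instance.create.error", "inst-1", exc)

In this sketch only the most recent fault per instance is kept, and
notifications that carry no exception leave the store untouched. Both are
assumptions made for brevity.
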
Not all errors will need to be persisted, so an initial set will be proposed
that can be enhanced over time as the need arises.

Configuration
-------------

No configuration changes are anticipated.

Database
--------

A new table (instance_faults) will be added to the Trove schema:

================= ============ =========== =========================================
Column            Type         Allow Nulls Description
================= ============ =========== =========================================
id                varchar(64)  No          ID of fault (autogenerated)
instance_id       varchar(64)  No          ID of instance that the fault occurred on
message           varchar(255) No          Error message of the fault
details           text(65535)  No          Extra details (e.g. stack trace)
created           DateTime     No          Created date
updated           DateTime     No          Updated date
deleted           tinyint(1)   Yes         Deleted flag
deleted_at        DateTime     Yes         Deleted date
================= ============ =========== =========================================

Public API
----------

The only change to the public API will be the addition of a 'fault' data
structure that is returned when requesting instance details. This will
look like:

.. code-block:: python

    'fault' : {
        'created': ,
        'message': 'error message',
        'details': 'potential stack trace',
    },

The 'details' value will only be available if the request is made by an
admin user.

Public API Security
-------------------

No security issues are anticipated. Since the messages persisted are all
exception messages that are broadcast as notifications, none should contain
sensitive information. If any are found to do so, they should be treated as
bugs and modified accordingly (none have been discovered so far).

Python API
----------

No changes are anticipated to the python API.

CLI (python-troveclient)
------------------------

The 'show' Trove CLI command may now have new data displayed:

.. code-block:: bash

    +-------------------+----------------------------------------------------+
    | Property          | Value                                              |
    +-------------------+----------------------------------------------------+
    | created           | 2016-05-06T21:28:53                                |
    | datastore         | mysql                                              |
    | datastore_version | 5.6                                                |
    | fault_date        | 2016-05-06T21:30:06                                |
    | fault_details     | Traceback (most recent call last):                 |
    |                   |   File "//manager.py", line 265, in prepare        |
    |                   |     cluster_config, snapshot, modules)             |
    |                   |   File "//manager.py", line 355, in _prepare       |
    |                   |     raise RuntimeError("A guest error occurred")   |
    |                   | RuntimeError: A guest error occurred               |
    | fault_message     | A guest error occurred                             |
    | flavor            | 15                                                 |
    | id                | 73cfc462-dd59-4dc1-9d32-95954171775f               |
    | ip                | 10.66.25.8                                         |
    | name              | myinst2                                            |
    | status            | ACTIVE                                             |
    | updated           | 2016-05-06T21:28:58                                |
    | volume            | 1                                                  |
    | volume_used       | 0.1                                                |
    +-------------------+----------------------------------------------------+

Internal API
------------

No changes need to be made to this API.

Guest Agent
-----------

No changes need to be made to the guest agent.

Alternatives
------------

We could continue to require access to the logs and/or Nova instances to
determine what happened when an error occurs.


Dashboard Impact (UX)
=====================

The relevant fields need to be exposed during the 'show' command.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  [peterstac]

Milestones
----------

Newton

Work Items
----------

The work will be undertaken within a single task.


Upgrade Implications
====================

No upgrade issues are expected.


Dependencies
============

None.


Testing
=======

Scenario tests will be enhanced to verify that errors are persisted in the
database and can be retrieved.


Documentation Impact
====================

This is a net-new feature, and as such will require documentation.


References
==========

None


Appendix
========

None