diff --git a/specs/inspector-error-enumeration.rst b/specs/inspector-error-enumeration.rst
new file mode 100644
index 0000000..107fded
--- /dev/null
+++ b/specs/inspector-error-enumeration.rst
@@ -0,0 +1,247 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+==================================
+Ironic Inspector Error Enumeration
+==================================
+
+https://bugs.launchpad.net/ironic-inspector/+bug/1710945
+
+This blueprint will introduce a new field `error-code` to the
+**Ironic Inspector** API. The new field is intended to make automation
+around **Ironic Inspector** easier and more reliable.
+
+
+Problem description
+===================
+
+Currently, if the node inspection process fails for one reason or another,
+it may be hard for the **Ironic Inspector** REST API consumers to determine
+the exact cause of the failure. That is because the only error indication
+currently offered by the **Ironic Inspector** REST API (other than the
+HTTP status code) is a free-form error message text.
+
+.. code-block:: json
+
+    {
+        "error": {
+            "message": "Diskette drive 0 seek failure"
+        }
+    }
+
+
+Proposed change
+===============
+
+The proposal is to enumerate **Ironic Inspector** REST API errors by
+introducing a new numeric field `error-code` to the
+**Ironic Inspector** REST API.
+
+There is probably no need to assign a distinct `error-code` to every
+possible `error` message. Instead, a handful of important error classes
+may be identified, and all `error` messages then distributed over the
+resulting `error-code` set.
+
+The collection of generally useful `error-code` values would become
+part of a common library consumed by **Ironic Python Agent**,
+**Ironic Inspector** and its CLI tool.
+
+
+Alternatives
+------------
+
+Advise **Ironic Inspector** REST API consumers to rely upon the
+`error` messages they observe.
+This would be a poor design: it effectively blocks **Ironic Inspector**
+developers from changing error messages (accidental change, rewording,
+localization) and puts needless effort on the consumers, while the end
+product would remain fragile.
+
+
+Data model impact
+-----------------
+
+The node object in the **Ironic Inspector** database schema would include
+a new integer field, `error_code`.
+
+The **Ironic Inspector** REST API would include a new integer
+field, `error-code`.
+
+The `error-code` values would encode the exact error (lower byte), the
+more general error class (higher byte) and the severity of the error
+(most significant byte):
+
+.. code-block::
+
+    ERROR_SEVERITY_LOW = 0x000000
+    ERROR_SEVERITY_HIGH = 0x010000
+    ERROR_SEVERITY_FATAL = 0x020000
+
+    ERROR_CLASS_NONE = 0x0000
+    ERROR_CLASS_IO = 0x0100
+    ERROR_CLASS_MEMORY = 0x0200
+    ...
+
+    ERROR_CODE_NONE = 0x00
+    ERROR_CODE_BADSECTOR = 0x01
+    ERROR_CODE_OOM = 0x02
+    ...
+
+    error_code = ERROR_SEVERITY_LOW | ERROR_CLASS_MEMORY | ERROR_CODE_OOM
+
+The existence of the error class would relax the dependency on the exact
+error codes among different versions of **Ironic Inspector** and the
+surrounding tooling. Even if the client is not aware of the exact
+`error-code` it received from **Ironic Inspector**, the client can
+still attempt to interpret the error class and act according to
+the encoded severity.
+
+Existing databases would have to be migrated to the modified schema.
+The initial value for the new `error_code` field would be set to ``
+(e.g. 0x000000).
+
+
+HTTP API impact
+---------------
+
+When **Ironic Python Agent** sends the introspection
+results to **Ironic Inspector** via the ramdisk
+callback, the `error-code` attribute may be present:
+
+.. code-block:: json
+
+    POST /v1/continue
+    {
+        "inventory":
+        {
+            ...
+        },
+        "root_disk": "/dev/sda1",
+        "boot_interface": "01:11:22:33:44:55:66",
+        "error": "Diskette drive 0 seek failure",
+        "error-code": 1234
+    }
+
+When **Ironic Inspector** clients (e.g. the CLI) retrieve introspection
+status, the `error-code` attribute will be present alongside the
+existing `error` attribute:
+
+.. code-block:: json
+
+    GET /v1/introspection/13211c7a-0402-4a1d-b970-5a44870125f5
+    {
+        "finished": true,
+        "state": "error",
+        "error": "Diskette drive 0 seek failure",
+        "error-code": 1234,
+        ...
+    }
+
+The **Ironic Python Agent** REST API microversion would have to be bumped.
+
+
+Client (CLI) impact
+-------------------
+
+.. code-block:: bash
+
+    $ openstack baremetal introspection status 13211c7a-0402-4a1d-b970-5a44870125f5
+    +-------------+--------------------------------------+
+    | Field       | Value                                |
+    +-------------+--------------------------------------+
+    | error       | Diskette drive 0 seek failure        |
+    | error-code  | 1234 (I/O Error)                     |
+    | finished    | True                                 |
+    | finished_at | 2017-09-01T14:04:58                  |
+    | started_at  | 2017-09-01T14:02:12                  |
+    | state       | error                                |
+    | uuid        | 13211c7a-0402-4a1d-b970-5a44870125f5 |
+    +-------------+--------------------------------------+
+
+
+Ironic Python Agent impact
+--------------------------
+
+A new dependency on the common library enumerating error codes
+would be introduced.
+
+
+Performance and scalability impact
+----------------------------------
+
+None.
+
+Security impact
+---------------
+
+None.
+
+
+Deployer impact
+---------------
+
+Deployers would be able to build automation using the
+same library as the ironic-* projects when processing/reporting
+an error.
+
+
+Developer impact
+----------------
+
+Developers should adhere to the standardized error codes. Introducing
+a new error code will require an update to the shared error codes library.
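
As a rough illustration of what the shared error codes library could offer, the Python sketch below packs and unpacks an `error-code` value. It assumes the three-byte layout this spec describes (severity in the most significant byte, error class in the middle byte, exact error in the lowest byte). The mask constants, the `CLASS_NAMES` labels and the helpers `split_error_code` and `error_class_name` are hypothetical and not part of any existing Ironic library.

```python
# Hypothetical sketch of a shared error-code helper; none of these names
# exist in Ironic today. Assumed layout of the packed integer:
#   bits 16-23: severity, bits 8-15: error class, bits 0-7: exact error.
SEVERITY_MASK = 0xFF0000
CLASS_MASK = 0x00FF00
CODE_MASK = 0x0000FF

ERROR_SEVERITY_LOW = 0x000000
ERROR_SEVERITY_HIGH = 0x010000
ERROR_SEVERITY_FATAL = 0x020000

ERROR_CLASS_NONE = 0x0000
ERROR_CLASS_IO = 0x0100
ERROR_CLASS_MEMORY = 0x0200

ERROR_CODE_NONE = 0x00
ERROR_CODE_BADSECTOR = 0x01
ERROR_CODE_OOM = 0x02

# Human-readable class labels (illustrative wording only).
CLASS_NAMES = {
    ERROR_CLASS_NONE: "No error class",
    ERROR_CLASS_IO: "I/O Error",
    ERROR_CLASS_MEMORY: "Memory Error",
}


def split_error_code(error_code):
    """Split a packed value into (severity, error class, exact error)."""
    return (
        error_code & SEVERITY_MASK,
        error_code & CLASS_MASK,
        error_code & CODE_MASK,
    )


def error_class_name(error_code):
    """Name the error class, even when the exact error is not recognised."""
    return CLASS_NAMES.get(error_code & CLASS_MASK, "Unknown error class")


# Example: a high-severity out-of-memory error.
packed = ERROR_SEVERITY_HIGH | ERROR_CLASS_MEMORY | ERROR_CODE_OOM
severity, error_class, exact_error = split_error_code(packed)
print(hex(packed), error_class_name(packed), severity == ERROR_SEVERITY_HIGH)
```

A client built along these lines can still react to the class and severity of a code whose lowest byte it does not recognise, which is the compatibility property the error class is meant to provide.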
+
+
+Upgrades and Backwards Compatibility
+------------------------------------
+
+The new `error-code` attribute/field extends the current error
+reporting with further detail and leaves the existing `error` attribute
+in place. This should therefore be a backwards-compatible change
+(e.g. older CLI/automation won't be broken).
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+
+
+
+Work Items
+----------
+
+* Create a common library for error codes
+* Adopt the new common library in:
+  * IPA
+  * inspector
+  * inspector client
+* Modify **Ironic Python Agent** to report `error-code`
+* Modify **Ironic Inspector** to consume, store and report `error-code`
+
+
+Dependencies
+============
+
+A new dependency on the common error codes library would be
+introduced. A new OpenStack project might be created
+to host the new library.
+
+
+Testing
+=======
+
+The new functionality and the new library would require unit testing
+and integration testing, in the same way as e.g. **Ironic Inspector** does.
+
+
+References
+==========
+
+None.