..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

======================================================
Add notifications about resources CRUD and node states
======================================================
https://bugs.launchpad.net/ironic/+bug/1606520
This spec proposes addition of new notifications to ironic: CRUD (create,
update, or delete) of resources and node state changes for provision state,
maintenance and console state.
Problem description
===================
Resource indexing services like Searchlight [#]_ require notifications about
the creation, update, or deletion of a resource. Currently CRUD notifications
are not implemented in ironic, and creating an efficient plugin for Searchlight
is impossible without them.

Ironic node notifications for provision state, maintenance and console state
could also be used by a Searchlight plugin to keep Searchlight's index of
ironic resources up to date.
Apart from Searchlight, there is the use case of a monitoring service that
caches all notification payloads along with their event types (start, end,
error, and so on), which an operator can query to see whether ironic is
behaving properly. For example, if there are many more start notifications
for node create than end notifications, the database may be misbehaving, or
messaging may be having a hard time delivering messages between the API and
the conductor. This case is separate from Searchlight: Searchlight does not
need the payload of the node create start notification, as there is no actual
node yet, but that payload may be useful for monitoring.
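
The monitoring check described above can be sketched as follows. This is only
an illustration, not part of the proposed change: the function name
``unmatched_starts`` and the sample event list are hypothetical, and a real
monitoring service would read event types from the message bus rather than
from a list.

```python
from collections import Counter


def unmatched_starts(event_types):
    """Count, per operation, the "start" events with no "end" or "error".

    ``event_types`` is an iterable of strings such as
    "baremetal.node.create.start".
    """
    counts = Counter()
    for event_type in event_types:
        # Split off the trailing status: "baremetal.node.create" + "start".
        operation, _, status = event_type.rpartition(".")
        if status == "start":
            counts[operation] += 1
        elif status in ("end", "error"):
            counts[operation] -= 1
    # Keep only operations with more starts than completions.
    return {op: n for op, n in counts.items() if n > 0}


# Hypothetical sample of observed event types.
observed = [
    "baremetal.node.create.start",
    "baremetal.node.create.end",
    "baremetal.node.create.start",
    "baremetal.node.create.start",
    "baremetal.port.update.start",
    "baremetal.port.update.error",
]
print(unmatched_starts(observed))
```

A persistently growing count for one operation is the kind of signal an
operator could use to investigate the database or the messaging layer.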
Proposed change
===============
As a general note for all CRUD notifications, Searchlight will ignore
``*.start`` and ``*.error`` event payloads: in both cases the resource
representation has not changed or, for ``*create*`` notifications, the
resource was not created.
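
A minimal sketch of how an indexing consumer might apply this rule is shown
below. The function name ``should_index`` is hypothetical and is not taken
from Searchlight or ironic. It only assumes the
``baremetal.<resource>.<action>.<status>`` event type layout used in this
spec.

```python
CRUD_ACTIONS = ("create", "update", "delete")


def should_index(event_type):
    """Return True if an indexing consumer should act on this event type."""
    parts = event_type.split(".")
    # CRUD event types look like baremetal.<resource>.<action>.<status>.
    if len(parts) != 4 or parts[0] != "baremetal":
        return False
    action, status = parts[2], parts[3]
    # Only "end" events describe a resource whose representation changed.
    return action in CRUD_ACTIONS and status == "end"


print(should_index("baremetal.node.create.end"))
print(should_index("baremetal.node.create.start"))
```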
Node CRUD notifications
-----------------------
The following event types will be added:
* "baremetal.node.create.start";
* "baremetal.node.create.end";
* "baremetal.node.create.error";
* "baremetal.node.update.start";
* "baremetal.node.update.end";
* "baremetal.node.update.error";
* "baremetal.node.delete.start";
* "baremetal.node.delete.end";
* "baremetal.node.delete.error".
Priority level - INFO or ERROR (for "error" status). Payload contains all
fields from base ``NodePayload`` with additional fields: ``chassis_uuid``,
``instance_info``, ``driver_info``. Secrets in the node fields will be masked.
``raid_config`` and ``target_raid_config`` fields are excluded because they can
contain low-level disk and vendor information. If/when there is a use case for
them, they can be added in the future. All these notifications will be
implemented at the API level.
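
Masking of secrets in fields such as ``driver_info`` could look roughly like
the sketch below. The list of key markers, the mask string and the function
names are assumptions made for illustration only. The spec does not define
which keys count as secrets, and the actual ironic implementation may differ.

```python
# Assumption: substrings that mark a key as secret (not defined by the spec).
SECRET_MARKERS = ("password", "secret", "token")
MASK = "******"


def _is_secret(key):
    """Return True if the key name looks like it holds a secret."""
    return any(marker in str(key).lower() for marker in SECRET_MARKERS)


def mask_secrets(value):
    """Return a copy of ``value`` with secret-looking dict entries masked."""
    if isinstance(value, dict):
        return {
            key: MASK if _is_secret(key) else mask_secrets(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask_secrets(item) for item in value]
    return value


# Hypothetical driver_info content.
driver_info = {
    "ipmi_address": "10.0.0.5",
    "ipmi_username": "admin",
    "ipmi_password": "hunter2",
}
print(mask_secrets(driver_info))
```

The function builds new containers instead of modifying its input, so the
node object itself keeps the real values and only the notification payload
carries the masked copy.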
Port CRUD notifications
-----------------------
The following event types will be added:
* "baremetal.port.create.start";
* "baremetal.port.create.end";
* "baremetal.port.create.error";
* "baremetal.port.update.start";
* "baremetal.port.update.end";
* "baremetal.port.update.error";
* "baremetal.port.delete.start";
* "baremetal.port.delete.end";
* "baremetal.port.delete.error".
Priority level - INFO or ERROR (for "error" status).
Payload contains these fields: ``uuid``, ``node_uuid``, ``address``, ``extra``,
``local_link_connection``, ``pxe_enabled``, ``created_at``, ``updated_at``.
These notifications will be implemented at the API level. In addition,
"baremetal.port.create.*" will be emitted by the ironic-conductor service
when a driver creates a port (examples are [#]_ and [#]_).
Chassis CRUD notifications
--------------------------
The following event types will be added:
* "baremetal.chassis.create.start";
* "baremetal.chassis.create.end";
* "baremetal.chassis.create.error";
* "baremetal.chassis.update.start";
* "baremetal.chassis.update.end";
* "baremetal.chassis.update.error";
* "baremetal.chassis.delete.start";
* "baremetal.chassis.delete.end";
* "baremetal.chassis.delete.error".
Priority level - INFO or ERROR (for "error" status).
Payload contains these fields: ``uuid``, ``extra``, ``description``,
``created_at``, ``updated_at``. All these notifications will be implemented at
the API level.
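
The node, port and chassis CRUD event types share one naming pattern,
``baremetal.<resource>.<action>.<status>``, and one priority rule. The sketch
below enumerates them from that pattern. The helper names
``crud_event_types`` and ``priority_for`` are hypothetical and do not refer
to existing ironic code.

```python
from itertools import product

RESOURCES = ("node", "port", "chassis")
ACTIONS = ("create", "update", "delete")
STATUSES = ("start", "end", "error")


def crud_event_types():
    """Build every CRUD event type listed in the sections above."""
    return [
        f"baremetal.{resource}.{action}.{status}"
        for resource, action, status in product(RESOURCES, ACTIONS, STATUSES)
    ]


def priority_for(event_type):
    """Return ERROR for "error" events and INFO for everything else."""
    return "ERROR" if event_type.endswith(".error") else "INFO"


for event_type in crud_event_types():
    print(event_type, priority_for(event_type))
```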
Node provision state notifications
----------------------------------
These notifications will be implemented via TaskManager methods and emitted
by the ironic-conductor service.
Types of events for node provision state:
* "baremetal.node.provision_set.start";
* "baremetal.node.provision_set.end";
* "baremetal.node.provision_set.error";
* "baremetal.node.provision_set.success".
Types of state changes in ironic and the corresponding events:

* Start of a transition that spawns a worker thread: "start" notification
  with INFO level.
* End of a transition, clearing ``target_provision_state``: "end" notification
  with INFO level.
* Error event processing: "error" notification with ERROR level.
* A ``provision_state`` change that does not start a worker and is neither
  "end" nor "error": "success" notification with INFO level. Examples are
  DEPLOYING <-> DEPLOYWAIT and AVAILABLE -> MANAGEABLE.

Payload contains all fields from base ``NodePayload`` with additional fields:
``instance_info``, ``previous_provision_state``,
``previous_target_provision_state``, ``event`` (FSM event that triggered the
state change).
To use the provision state notifications efficiently, all related node changes
(such as setting ``last_error`` or ``maintenance``) should be made before
event processing.
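
The mapping between the four kinds of provision state change and the emitted
notification can be sketched as a lookup table. The change-kind labels and
the function name below are hypothetical and chosen only for this example.
The event types and levels come from this section.

```python
# Hypothetical labels for the four kinds of state change in this section,
# mapped to (notification status, priority level).
NOTIFICATION_BY_CHANGE = {
    "transition_started": ("start", "INFO"),
    "transition_finished": ("end", "INFO"),
    "error_processed": ("error", "ERROR"),
    "state_changed_without_worker": ("success", "INFO"),
}


def provision_notification(change_kind):
    """Return the (event type, level) pair for a kind of state change."""
    status, level = NOTIFICATION_BY_CHANGE[change_kind]
    return f"baremetal.node.provision_set.{status}", level


print(provision_notification("state_changed_without_worker"))
```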
Node maintenance notifications
------------------------------
The following event types will be added:
* "baremetal.node.maintenance_set.start";
* "baremetal.node.maintenance_set.end";
* "baremetal.node.maintenance_set.error".
Priority level - INFO or ERROR (for "error" status). Payload contains all
fields from base ``NodePayload``. All these notifications will be implemented
at the API level and reflect maintenance changes to a node due to a user
request. There won't be any explicit node maintenance notifications for
maintenance changes done internally by ironic. Since these internal changes
occur as a result of trying to change the node's state (e.g. provision, power),
one of the other notifications that is emitted will "cover" these internal
maintenance changes.
Node console notifications
--------------------------
The following event types will be added:
* "baremetal.node.console_set.start";
* "baremetal.node.console_set.end";
* "baremetal.node.console_set.error";
* "baremetal.node.console_restore.start";
* "baremetal.node.console_restore.end";
* "baremetal.node.console_restore.error".
The ``console_set`` action is used when starting or stopping the console is
initiated via an API request. The ``console_restore`` action is used when the
``console_enabled`` flag is already set in the DB for a node and the console
must be restarted via the driver (because the ironic-conductor process died or
was restarted). Priority level - INFO or ERROR (for "error" status). Payload
contains all fields from base ``NodePayload``. All these notifications will be
implemented in the ironic-conductor: because setting a node's console is an
asynchronous request, the ironic-conductor can easily emit notifications for
the start and end of the change.
Alternatives
------------
Periodically polling ironic resources via API.
Data model impact
-----------------
None
State Machine Impact
--------------------
None
REST API impact
---------------
None
Client (CLI) impact
-------------------
None
RPC API impact
--------------
None
Driver API impact
-----------------
None
Nova driver impact
------------------
None
Ramdisk impact
--------------
None
Security impact
---------------
None
Other end user impact
---------------------
None
Scalability impact
------------------
If notifications are enabled, they can create high load on the message bus
during node deployments in large environments.
Performance Impact
------------------
None
Other deployer impact
---------------------
Deployers should set the existing ``notification_level`` config option
appropriately.
Developer impact
----------------
* If a developer creates resources in a driver, the proper notification should
  be emitted.
* For provision state changes, all related node updates should be done before
  event processing.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
* yuriyz
Other contributors:
* vdrok
* mariojv
Work Items
----------
* Implement node provision state change notifications.
* Implement CRUD notifications and node maintenance notifications.
* Implement console notifications.
* Add notifications to the current ironic code that creates resources in the
drivers.
* Fix ironic code that updates nodes after event processing.
Dependencies
============
Patch with base ``NodePayload`` [#]_.
Testing
=======
Unit tests will be added.
Upgrades and Backwards Compatibility
====================================
None
Documentation Impact
====================
The new notifications will be documented.
References
==========
.. [#] https://wiki.openstack.org/wiki/Searchlight
.. [#] https://github.com/openstack/ironic/blob/2c76da5f437c5fc2f4022e8705e74fed0a46bebb/ironic/drivers/modules/irmc/inspect.py#L177
.. [#] https://github.com/openstack/ironic/blob/2c76da5f437c5fc2f4022e8705e74fed0a46bebb/ironic/drivers/modules/ilo/inspect.py#L56
.. [#] https://review.openstack.org/#/c/321865/