Support node history

Currently ironic uses a single last_error field to record error
information when an operation fails. The field is easily overwritten,
so there is no record of what happened in the past; the only way to
find out is to check the service logs.

The proposal is to introduce a new table named node_history and record
important node events that help with bare metal maintenance and
troubleshooting.

Change-Id: I3b8832a945183ce3ed41ea79838fc9f682bfc547
Story: 2002980
Task: 22989
Kaifeng Wang 2019-04-15 14:56:05 +08:00
parent 350f8d2a75
commit 30b1bd844f
2 changed files with 242 additions and 0 deletions

@ -0,0 +1,241 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

====================
Support node history
====================
https://storyboard.openstack.org/#!/story/2002980
This spec proposes recording a history of node events, which is useful for
identifying issues.
Problem description
===================
Currently ironic uses a single ``last_error`` field to record error
information when an operation fails. This field is easily overwritten, so to
trace the root cause of a failure we have to search the logs on a conductor
host located somewhere in the cloud. To make bare metal management easier, it
would be handy to keep a history of each node, especially of its errors and
state transitions.
The proposal is to introduce a new table to store those events and provide
API support to retrieve them.
Proposed change
===============
Introduces a new table named ``node_history`` and a database object
``NodeHistory``; see `Data model impact`_ for the schema definition.

Implements an API layer to support node history queries. The node history is
intended to be read-only.

Only two kinds of events will be logged in this proposal:

* State transitions
* Everything that goes to ``last_error``; this also covers node maintenance
  state changes.

The range of events could be extended as requirements arise, but that is
outside the scope of this spec.
Introduces a periodic task that removes node history entries exceeding a
specified maximum number; the maximum is set through configuration options.

Adds a ``history`` module that provides an abstract history interface with
two implementations, ``none`` and ``database``.
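
The backend abstraction described above could look roughly like the sketch below. The spec does not define class or method names, so ``BaseHistory``, ``NoneHistory``, ``DatabaseHistory``, ``record`` and ``get_history_backend`` are all hypothetical, and an in-memory list stands in for the ``node_history`` table.

```python
import abc
import datetime
import uuid

# The spec truncates ``event`` to 1000 characters before storing it.
MAX_EVENT_LENGTH = 1000


class BaseHistory(abc.ABC):
    """Hypothetical interface; the spec does not name its methods."""

    @abc.abstractmethod
    def record(self, node_id, event, conductor=None, user=None):
        """Record one event for a node."""


class NoneHistory(BaseHistory):
    """``none`` backend: accepts events and discards them."""

    def record(self, node_id, event, conductor=None, user=None):
        return None


class DatabaseHistory(BaseHistory):
    """``database`` backend; a list stands in for the real table."""

    def __init__(self):
        self.rows = []

    def record(self, node_id, event, conductor=None, user=None):
        row = {
            'uuid': str(uuid.uuid4()),
            'node_id': node_id,
            'event': event[:MAX_EVENT_LENGTH],
            'conductor': conductor,
            'user': user,
            'created_at': datetime.datetime.now(datetime.timezone.utc),
        }
        self.rows.append(row)
        return row


# Maps the values of ``[conductor]node_history_backend`` to classes.
BACKENDS = {'none': NoneHistory, 'database': DatabaseHistory}


def get_history_backend(name='none'):
    """Return a backend instance; ``none`` is the default in the spec."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError('Unknown node history backend: %s' % name)
```

With this shape, callers record events the same way regardless of configuration, and the ``none`` backend keeps the feature disabled without branching at every call site.
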
Alternatives
------------
Other solutions exist, such as using a log collector and aggregator, but they
need more integration work and are not directly supported by ironic.
Data model impact
-----------------
A new database table will be added with the following schema::

    op.create_table('node_history',
        sa.Column('created_at', sa.DateTime(), nullable=True),
        sa.Column('updated_at', sa.DateTime(), nullable=True),
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('uuid', sa.String(length=36), nullable=False),
        sa.Column('conductor', sa.String(length=255), nullable=True),
        sa.Column('event', sa.Text(), nullable=True),
        sa.Column('node_id', sa.Integer(), nullable=True),
        sa.Column('user', sa.String(length=32), nullable=True),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('uuid', name='uniq_history0uuid'),
        sa.ForeignKeyConstraint(['node_id'], ['nodes.id'], ),
        # Index on node_id, passed to create_table with the constraints.
        sa.Index('node_id', 'node_id'),
        mysql_ENGINE='InnoDB',
        mysql_DEFAULT_CHARSET='UTF8')

``event`` is a string that conveys what happened to the node; its content
will be truncated to 1000 characters.

``conductor`` is the hostname of the conductor that recorded the entry.

``user`` is the requester of the operation, taken from the request context;
for the Identity service it is a fixed-length string.
State Machine Impact
--------------------
None
REST API impact
---------------
The following endpoints will be added to support querying node history. They
are microversioned; clients using an earlier microversion will receive a 404.

* GET /v1/{node_ident}/history

  * Retrieve the list of events logged for this node. By default ``uuid``,
    ``event`` and ``created_at`` are returned, and ``event`` is truncated to
    255 characters to give a brief summary. Detailed history entries are
    returned if ``detail`` is set to True in the query string.
  * For a normal request, 200 is returned.

* GET /v1/{node_ident}/history/{history_uuid}

  * Get detailed information about an event.
  * For a normal request, 200 is returned.
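
The difference between the brief and the detailed list response can be sketched as follows. This is only an illustration of the behaviour described above, not the actual API code: the function names and the top-level ``history`` key in the response body are assumptions, and entries are modelled as plain dictionaries.

```python
# The list view truncates ``event`` to 255 characters unless detail is set.
BRIEF_EVENT_LENGTH = 255
# Fields returned by default, as listed in the spec.
BRIEF_FIELDS = ('uuid', 'event', 'created_at')


def format_history_entry(entry, detail=False):
    """Return the API view of one history entry (hypothetical helper)."""
    if detail:
        # Detailed view: every stored field, event not shortened.
        return dict(entry)
    brief = {field: entry.get(field) for field in BRIEF_FIELDS}
    if brief['event'] is not None:
        brief['event'] = brief['event'][:BRIEF_EVENT_LENGTH]
    return brief


def list_history(entries, detail=False):
    """Build a list response; the ``history`` key is an assumption."""
    return {'history': [format_history_entry(e, detail) for e in entries]}
```

A request with ``detail`` set to True would map to ``list_history(entries, detail=True)``; the single-entry endpoint always corresponds to the detailed form.
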
Client (CLI) impact
-------------------
"openstack baremetal" CLI
~~~~~~~~~~~~~~~~~~~~~~~~~
OSC will be enhanced to support the following operations:
* ``openstack baremetal node history list``: list all events kept for this node
* ``openstack baremetal node history show <uuid>``: show a specific node event
RPC API impact
--------------
None
Driver API impact
-----------------
None
Nova driver impact
------------------
None
Ramdisk impact
--------------
None
Security impact
---------------
None
Other end user impact
---------------------
None
Scalability impact
------------------
Node events could occupy a considerable amount of space in the database
when this feature is enabled, depending on the number of bare metal nodes and
their activity. In such cases the configuration options of this feature
should be reviewed and tuned.
Performance Impact
------------------
The new periodic task and the additional database access will consume some
resources, but the overhead should be trivial.
Other deployer impact
---------------------
Adds the following configuration options to change the behavior of this
feature:

* ``[conductor]node_history_backend``: can be ``none`` or ``database``.
  ``none`` does nothing and effectively disables this feature; this is the
  default.
* ``[conductor]node_history_max_entries``: how many events ironic should keep.
  The oldest events will be removed when the maximum is reached. The default
  is 300; the minimum value is 1.
* ``[conductor]node_history_cleanup_interval``: the interval, in seconds, at
  which the periodic cleanup task is scheduled. One day by default. Setting
  it to 0 disables periodic cleanup.
* ``[conductor]node_history_cleanup_batch_num``: the maximum number of entries
  removed during one cleanup operation.
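
How ``node_history_max_entries`` and ``node_history_cleanup_batch_num`` might interact during one cleanup run is sketched below. The spec does not say whether the maximum applies per node or globally, nor how the batch limit is scoped; this sketch assumes both apply to the entries of a single node passed to one call. The function name and the dictionary representation of entries are hypothetical.

```python
def select_entries_to_remove(entries, max_entries=300, batch_num=None):
    """Pick the oldest entries beyond ``max_entries`` for removal.

    ``entries`` is a list of dicts with a sortable ``created_at`` value,
    assumed here to belong to a single node. ``batch_num`` caps how many
    entries one cleanup run removes; None means no cap.
    """
    if max_entries < 1:
        raise ValueError('max_entries must be at least 1')
    excess = len(entries) - max_entries
    if excess <= 0:
        return []
    # Oldest entries are removed first, as described in the spec.
    oldest_first = sorted(entries, key=lambda e: e['created_at'])
    if batch_num is not None:
        excess = min(excess, batch_num)
    return oldest_first[:excess]
```

With a batch limit smaller than the excess, the remaining surplus entries would simply be picked up by later runs of the periodic task.
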
Developer impact
----------------
Other events could be added once this spec is implemented.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
<kaifeng, kaifeng.w@gmail.com>
Other contributors:
<None>
Work Items
----------
Implement the proposed work:

* Database support
* The history module and its two backends, ``none`` and ``database``
* Log history at the appropriate code paths
* API support
* CLI support
* Documentation
Dependencies
============
None
Testing
=======
The feature will be covered by unit tests.
Upgrades and Backwards Compatibility
====================================
This feature is disabled by default.
Documentation Impact
====================
Documentation will be updated.
References
==========
None

@ -0,0 +1 @@
../approved/node-history.rst