Currently ironic uses a single last_error field to record error information when an operation fails. The field is easily overwritten and keeps no trace of what happened in the past, so the only way to investigate is to check the service logs. The proposal is to introduce a new table named node_history and record important node events that help with bare metal maintenance and troubleshooting. Change-Id: I3b8832a945183ce3ed41ea79838fc9f682bfc547 Story: 2002980 Task: 22989
Support node history
https://storyboard.openstack.org/#!/story/2002980
This spec proposes recording a history of node events, which is useful for identifying issues.
Problem description
Currently ironic uses a single last_error field to record error information when an operation fails. This field is easily overwritten, so to trace the root cause we have to search the logs on the conductor host, which may be located anywhere in the cloud. To make bare metal management easier, it would be handy to have a history of each node, especially of its errors and state transitions.
The proposal is to introduce a new table to store those events and provide API support to retrieve them.
Proposed change
Introduces a new table named node_history and a db object NodeHistory, see Data model impact for the schema definition.
Implements an API layer to support node history queries. Node history is read-only through the API.
Only two kinds of events will be logged in this proposal:
- State transitions
- Everything that goes to last_error; this also covers node maintenance state changes.
The range could be extended according to future requirements, but that is not included in this spec.
Introduces a periodic task to remove node history entries that exceed a specified maximum number; the number will be set through configuration options.
Adds a history module that provides a history interface abstraction, with two implementations: none and database.
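The backend abstraction described above could look roughly like the following sketch. The class and method names (HistoryBackend, record, get_events, get_backend) are hypothetical, since this spec does not define the interface, and an in-memory store stands in for the real database backend.

```python
import abc
from datetime import datetime, timezone


class HistoryBackend(abc.ABC):
    """Hypothetical interface; the spec does not fix method names."""

    @abc.abstractmethod
    def record(self, node_uuid, event, conductor=None, user=None):
        """Store one event for a node."""

    @abc.abstractmethod
    def get_events(self, node_uuid):
        """Return the events stored for a node, oldest first."""


class NoneBackend(HistoryBackend):
    """Discards every event, which effectively disables the feature."""

    def record(self, node_uuid, event, conductor=None, user=None):
        return None

    def get_events(self, node_uuid):
        return []


class InMemoryBackend(HistoryBackend):
    """Stand-in for the database backend; keeps rows in a dict."""

    def __init__(self):
        self._rows = {}

    def record(self, node_uuid, event, conductor=None, user=None):
        row = {
            "created_at": datetime.now(timezone.utc),
            "event": event,
            "conductor": conductor,
            "user": user,
        }
        self._rows.setdefault(node_uuid, []).append(row)
        return row

    def get_events(self, node_uuid):
        return list(self._rows.get(node_uuid, []))


# Maps the two backend names from the spec to implementations.
BACKENDS = {"none": NoneBackend, "database": InMemoryBackend}


def get_backend(name):
    """Instantiate the backend selected by its configured name."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError("unknown node history backend: %s" % name)
```

Call sites that log state transitions or last_error updates would only talk to the abstract interface, so switching between none and database would not require touching them.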
Alternatives
Other solutions exist, such as using a log collector and aggregator, but they need more integration work and are not directly supported by ironic.
Data model impact
A new database table will be added with the following schema:
    from alembic import op
    import sqlalchemy as sa

    op.create_table(
        'node_history',
        sa.Column('created_at', sa.DateTime(), nullable=True),
        sa.Column('updated_at', sa.DateTime(), nullable=True),
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('uuid', sa.String(length=36), nullable=False),
        sa.Column('conductor', sa.String(length=255), nullable=True),
        sa.Column('event', sa.Text(), nullable=True),
        sa.Column('node_id', sa.Integer(), nullable=True),
        sa.Column('user', sa.String(length=32), nullable=True),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('uuid', name='uniq_history0uuid'),
        sa.ForeignKeyConstraint(['node_id'], ['nodes.id'], ),
        # Index used to look up history entries by node.
        sa.Index('node_id', 'node_id'),
        mysql_ENGINE='InnoDB',
        mysql_DEFAULT_CHARSET='UTF8')
event is the string that conveys what happened to the node; the content will be truncated to 1000 characters.
conductor is the hostname of the conductor that recorded the entry.
user is the requestor of the operation, taken from the context; for the Identity service it's a string with fixed length.
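To make the field semantics concrete, here is a minimal sketch that stores one entry in an in-memory SQLite table shaped like the schema above. It is an illustration only: the add_history helper and the index name are hypothetical, the foreign key to nodes is omitted because the sketch has no nodes table, and ironic's own DB API is not used.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

MAX_EVENT_LENGTH = 1000  # truncation limit stated in this spec

# SQLite approximation of the proposed table (no foreign key here).
SCHEMA = """
CREATE TABLE node_history (
    id INTEGER PRIMARY KEY,
    uuid VARCHAR(36) NOT NULL UNIQUE,
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    conductor VARCHAR(255),
    event TEXT,
    node_id INTEGER,
    "user" VARCHAR(32)
);
CREATE INDEX node_history_node_id_idx ON node_history (node_id);
"""


def add_history(conn, node_id, event, conductor, user):
    """Insert one row, truncating the event to MAX_EVENT_LENGTH."""
    entry_uuid = str(uuid.uuid4())
    conn.execute(
        'INSERT INTO node_history '
        '(uuid, created_at, conductor, event, node_id, "user") '
        'VALUES (?, ?, ?, ?, ?, ?)',
        (entry_uuid, datetime.now(timezone.utc).isoformat(), conductor,
         event[:MAX_EVENT_LENGTH], node_id, user),
    )
    return entry_uuid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# An over-long event is cut down before it reaches the table.
entry_uuid = add_history(conn, 1, "x" * 1500,
                         "conductor-1.example.com", "a" * 32)
stored = conn.execute(
    "SELECT event, conductor FROM node_history WHERE uuid = ?",
    (entry_uuid,),
).fetchone()
```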
State Machine Impact
None
REST API impact
The following endpoints will be added to support querying node history. They are microversioned; clients using an earlier microversion will receive 404.
- GET /v1/{node_ident}/history
  - Retrieve the list of events logged for this node. By default uuid, event and created_at are returned. The event will be truncated to 255 characters to give brief information. Detailed history entries will be returned if detail is set to True in the query string.
  - For a normal request, 200 is returned.
- GET /v1/{node_ident}/history/{history_uuid}
  - Get detailed information of an event.
  - For a normal request, 200 is returned.
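The difference between the brief and the detailed list views can be sketched as below. The helper name to_api_view is hypothetical, the sample entry is made-up data, and the sketch assumes the detailed view simply returns every stored field untruncated, which this spec does not spell out.

```python
BRIEF_FIELDS = ("uuid", "event", "created_at")  # default fields per this spec
BRIEF_EVENT_LENGTH = 255  # brief-view truncation stated in this spec


def to_api_view(entry, detail=False):
    """Shape one stored history entry for an API response."""
    if detail:
        # Assumption: the detailed view exposes all stored fields as-is.
        return dict(entry)
    view = {field: entry.get(field) for field in BRIEF_FIELDS}
    if view["event"] is not None:
        view["event"] = view["event"][:BRIEF_EVENT_LENGTH]
    return view


# Made-up sample entry, for illustration only.
entry = {
    "uuid": "1be26c0b-03f2-4d2e-ae87-c02d7f33c123",
    "event": "e" * 400,
    "created_at": "2019-01-01T00:00:00+00:00",
    "conductor": "conductor-1.example.com",
    "user": "a" * 32,
}

brief = to_api_view(entry)
detailed = to_api_view(entry, detail=True)
```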
Client (CLI) impact
"openstack baremetal" CLI
OSC will be enhanced to support the following operations:
- openstack baremetal node history list: list all events kept for this node
- openstack baremetal node history show <uuid>: show a specific node event
RPC API impact
None
Driver API impact
None
Nova driver impact
None
Ramdisk impact
None
Security impact
None
Other end user impact
None
Scalability impact
Node events could occupy a considerable amount of space in the database when this feature is enabled, depending on the number of bare metal nodes and their activity. In such cases the configuration options of this feature should be evaluated.
Performance Impact
The new periodic task and the additional database access will use some resources, but the overhead should be trivial.
Other deployer impact
Adds the following configuration options to change the behavior of this feature:
- [conductor]node_history_backend: can be none or database. none does nothing and effectively disables this feature; this is the default.
- [conductor]node_history_max_entries: how many events ironic should keep. The oldest events will be removed when the maximum is reached. The default is 300, the minimum value is 1.
- [conductor]node_history_cleanup_interval: the interval, in seconds, at which the periodic clean up task should be scheduled. One day by default. Setting it to 0 disables the periodic clean up.
- [conductor]node_history_cleanup_batch_num: the maximum number of entries that will be removed during one clean up operation.
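How node_history_max_entries and node_history_cleanup_batch_num might interact during one clean up run can be sketched as follows. The function name is hypothetical, and the sketch assumes the limit applies per node and that the batch size caps a single run; this spec does not state either explicitly.

```python
from datetime import datetime, timedelta, timezone


def select_entries_to_prune(entries, max_entries, batch_num):
    """Pick the ids of the oldest excess entries for one clean up run.

    entries is a list of (id, created_at) tuples for a single node.
    """
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    excess = len(entries) - max_entries
    if excess <= 0:
        return []
    oldest_first = sorted(entries, key=lambda item: item[1])
    # Never remove more than batch_num entries in a single run.
    return [item[0] for item in oldest_first[:min(excess, batch_num)]]


# Ten entries, one minute apart; id 1 is the oldest.
base = datetime(2019, 1, 1, tzinfo=timezone.utc)
entries = [(i, base + timedelta(minutes=i)) for i in range(1, 11)]

# Four entries exceed the limit of six, but the batch size caps the run at three.
first_run = select_entries_to_prune(entries, max_entries=6, batch_num=3)

# The next run picks up the one entry that is still in excess.
remaining = [item for item in entries if item[0] not in first_run]
second_run = select_entries_to_prune(remaining, max_entries=6, batch_num=3)
```

With a small batch size, a large backlog is removed gradually over several runs, which keeps each individual clean up operation short.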
Developer impact
Other events could be added once this spec is implemented.
Implementation
Assignee(s)
- Primary assignee:
  <kaifeng, kaifeng.w@gmail.com>
- Other contributors:
  <None>
Work Items
Implement the proposed work:
- Database support
- The history module and its two backends, namely none and database
- Log history entries at the appropriate code paths
- API support
- CLI support
- Documentation
Dependencies
None
Testing
The feature will be covered by unit tests.
Upgrades and Backwards Compatibility
This feature is disabled by default.
Documentation Impact
Documentation will be updated.
References
None