Document recovery from power faults

Change-Id: I95dbbbf0f2cb7e75d3f1c872ffccad99365df321
Dmitry Tantsur 2021-09-27 11:24:04 +02:00
parent 143449d3db
commit 535c28b67d
2 changed files with 32 additions and 5 deletions


@ -89,3 +89,33 @@ compute service.
power state change event is received from the baremetal service in which
case the power state from compute service's database will be forced on the
node.
.. _power-fault:
Power fault and recovery
========================
When `Baremetal Power Sync`_ is enabled, and the Bare Metal service loses
access to a node (usually because of invalid credentials, BMC issues or
networking interruptions), the node enters ``maintenance`` mode and its
``fault`` field is set to ``power failure``. The exact reason is stored in the
``maintenance_reason`` field.
As always with maintenance mode, only a subset of operations will work on such
nodes, and both the Compute service and Ironic's native allocation API will
refuse to pick them. Any in-progress operations will either pause or fail.
The conductor responsible for the node will try to recover the connection
periodically (with the interval configured by the
:oslo.config:option:`conductor.power_failure_recovery_interval` option). If the
power sync is successful, the ``fault`` field is unset and the node leaves
maintenance mode.
.. note::
This only applies to automatic maintenance mode with the ``fault`` field
set. A node that was put into maintenance mode manually never leaves it
automatically.
Alternatively, you can disable maintenance mode yourself once the problem is
resolved::
baremetal node maintenance unset <IRONIC NODE>
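
The recovery rule described in this section can be modelled with a short
Python sketch. It is an illustration only, not the conductor's actual code:
the node is a plain dictionary, and ``mark_power_fault``, ``try_recover`` and
the ``power_sync`` callable are hypothetical names introduced here. Clearing
``maintenance_reason`` on recovery is also an assumption of this sketch.

```python
POWER_FAILURE = "power failure"


def mark_power_fault(node, reason):
    """Model what happens when access to the node's BMC is lost."""
    node["maintenance"] = True
    node["fault"] = POWER_FAILURE
    node["maintenance_reason"] = reason


def try_recover(node, power_sync):
    """Run one recovery attempt; return True if the node left maintenance.

    ``power_sync`` is a hypothetical callable that raises an exception when
    the node's power state cannot be read.
    """
    # Only automatic maintenance (fault set to "power failure") is eligible.
    # Manually set maintenance has no fault and is never cleared here.
    if not node.get("maintenance") or node.get("fault") != POWER_FAILURE:
        return False
    try:
        power_sync(node)
    except Exception:
        # Still unreachable: stay in maintenance until the next attempt.
        return False
    node["maintenance"] = False
    node["fault"] = None
    node["maintenance_reason"] = None  # assumption made for this sketch
    return True
```

In this model, a periodic task would call ``try_recover`` once per configured
interval for every node in maintenance, which is why a node with manually set
maintenance (no ``fault``) is always skipped.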


@ -33,11 +33,8 @@ A few things should be checked in this case:
baremetal node provide <IRONIC NODE>
The Bare metal service automatically puts a node in maintenance mode if
there are issues with accessing its management interface. Check the power
credentials (e.g. ``ipmi_address``, ``ipmi_username`` and ``ipmi_password``)
and then move the node out of maintenance mode::
baremetal node maintenance unset <IRONIC NODE>
there are issues with accessing its management interface. See
:ref:`power-fault` for details.
The ``node validate`` command can be used to verify that all required fields
are present. The following command should not return anything::