Document recovery from power faults

Change-Id: I95dbbbf0f2cb7e75d3f1c872ffccad99365df321
Dmitry Tantsur 2021-09-27 11:24:04 +02:00
parent 143449d3db
commit 535c28b67d
2 changed files with 32 additions and 5 deletions


@@ -89,3 +89,33 @@ compute service.
power state change event is received from the baremetal service in which
case the power state from compute service's database will be forced on the
node.

.. _power-fault:

Power fault and recovery
========================

When `Baremetal Power Sync`_ is enabled and the Bare Metal service loses
access to a node (usually because of invalid credentials, BMC issues or
networking interruptions), the node enters ``maintenance`` mode and its
``fault`` field is set to ``power failure``. The exact reason is stored in the
``maintenance_reason`` field.

As always with maintenance mode, only a subset of operations will work on such
nodes, and both the Compute service and Ironic's native allocation API will
refuse to pick them. Any in-progress operations will either pause or fail.

The conductor responsible for the node will periodically try to recover the
connection (at the interval configured by the
:oslo.config:option:`conductor.power_failure_recovery_interval` option). If a
power sync succeeds, the ``fault`` field is unset and the node leaves
maintenance mode.

.. note::
    This only applies to automatic maintenance mode with the ``fault`` field
    set. Maintenance mode that was set manually is never cleared automatically.

Alternatively, you can disable maintenance mode yourself once the problem is
resolved::

    baremetal node maintenance unset <IRONIC NODE>
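
The behaviour described in this section can be summarised as a small state
model. The sketch below is illustrative only and is not Ironic's conductor
code: the ``Node`` class and the ``on_power_sync_failure`` and ``try_recover``
helpers are hypothetical names, and clearing ``maintenance_reason`` on recovery
is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional

POWER_FAILURE = "power failure"


@dataclass
class Node:
    """Hypothetical stand-in for the node fields mentioned in the text."""
    maintenance: bool = False
    fault: Optional[str] = None
    maintenance_reason: Optional[str] = None


def on_power_sync_failure(node: Node, reason: str) -> None:
    """Model a failed power sync: automatic maintenance with a fault set."""
    node.maintenance = True
    node.fault = POWER_FAILURE
    node.maintenance_reason = reason


def try_recover(node: Node, power_sync_ok: bool) -> bool:
    """Model one periodic recovery attempt.

    Returns True only if the node left maintenance mode. Nodes without the
    power failure fault (for example, manual maintenance) are never touched.
    """
    if not (node.maintenance and node.fault == POWER_FAILURE):
        return False
    if not power_sync_ok:
        return False
    node.maintenance = False
    node.fault = None
    node.maintenance_reason = None  # assumption of this sketch
    return True
```

In this model, a node put into maintenance by hand has no ``fault`` value, so
``try_recover`` leaves it alone, which mirrors the note above about manual
maintenance mode.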


@@ -33,11 +33,8 @@ A few things should be checked in this case:
     baremetal node provide <IRONIC NODE>
 
 The Bare metal service automatically puts a node in maintenance mode if
-there are issues with accessing its management interface. Check the power
-credentials (e.g. ``ipmi_address``, ``ipmi_username`` and ``ipmi_password``)
-and then move the node out of maintenance mode::
-
-    baremetal node maintenance unset <IRONIC NODE>
+there are issues with accessing its management interface. See
+:ref:`power-fault` for details.
 
 The ``node validate`` command can be used to verify that all required fields
 are present. The following command should not return anything::