
Enhance Power Interface for Soft Power Off and NMI
https://bugs.launchpad.net/ironic/+bug/1526226
The proposal presents the work required to enhance the power interface to support soft reboot and soft power off, and the management interface to support diagnostic interrupt (NMI [1]).
Problem description
The current driver interface provides neither soft power off nor diagnostic interrupt (NMI [1]) capabilities, even though ipmitool [2] and most BMCs support them.
The following excerpt from the ipmitool man page describes soft power off and diagnostic interrupt (NMI [1]).
$ man ipmitool:
...
power
Performs a chassis control command to view and change the
power state.
...
diag
Pulse a diagnostic interrupt (NMI) directly to the
processor(s).
soft
Initiate a soft-shutdown of OS via ACPI. This can be
done in a number of ways, commonly by simulating an
overtemperature or by simulating a power button press.
It is necessary for there to be Operating System
support for ACPI and some sort of daemon watching for
events for this soft power to work.
From the customer's point of view, for both tenant admins and tenant users, the lack of soft power off and diagnostic interrupt (NMI [1]) leads to the following inconveniences.
- Customers cannot safely shut down or soft power off their instance without logging on.
- Customers cannot take an NMI dump to investigate OS-related problems by themselves.
From the deployer's point of view, that is the cloud provider's, the lack of the two capabilities leads to the following inconveniences.
- Cloud provider support staff cannot safely shut down a customer's instance without logging on, for example for hardware maintenance.
- Cloud provider support staff cannot ask customers to take an NMI dump as investigation material.
Proposed change
To solve the problems described in the previous section, this spec proposes to enhance the power states, the PowerInterface base class and the ManagementInterface base class so that each driver can implement soft reboot, soft power off and NMI injection.
This enhancement makes soft reboot, soft power off and NMI injection available through the Ironic CLI and REST API for tenant admins and cloud providers. It also makes them available through the Nova CLI and REST API for tenant users once Nova's blueprint [3] is implemented.
As a reference implementation, this spec also proposes to implement the enhanced PowerInterface base class in the IPMIPower concrete class and the enhanced ManagementInterface base class in the IPMIManagement concrete class.
add the following new power states to ironic.common.states:
    SOFT_REBOOT = 'soft rebooting'
    SOFT_POWER_OFF = 'soft power off'
add "get_supported_power_states" method and its default implementation to the base PowerInterface class in ironic/drivers/base.py:
    def get_supported_power_states(self, task):
        """Get a list of the supported power states.

        :param task: A TaskManager instance containing the node to act on.
        :returns: A list of the supported power states defined
                  in :mod:`ironic.common.states`.
        """
        return [states.POWER_ON, states.POWER_OFF, states.REBOOT]
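A caller could use this list to reject an unsupported request before touching the hardware. The following is a minimal sketch rather than Ironic code: the state strings mirror the values used in this spec, while `BasePower`, `WakeOnLanLikePower` and `validate_power_state` are hypothetical names introduced only for illustration.

```python
# Stand-ins for ironic.common.states; the values follow this spec.
POWER_ON = 'power on'
POWER_OFF = 'power off'
REBOOT = 'rebooting'
SOFT_REBOOT = 'soft rebooting'
SOFT_POWER_OFF = 'soft power off'


class BasePower(object):
    """Hypothetical base class carrying the proposed default."""

    def get_supported_power_states(self, task):
        return [POWER_ON, POWER_OFF, REBOOT]


class WakeOnLanLikePower(BasePower):
    """Hypothetical driver that can only power a node on."""

    def get_supported_power_states(self, task):
        return [POWER_ON]


def validate_power_state(power, task, requested):
    """Return the requested state, or raise ValueError if unsupported."""
    supported = power.get_supported_power_states(task)
    if requested not in supported:
        raise ValueError('%r is not supported; choose one of %s'
                         % (requested, supported))
    return requested
```

With this sketch, `validate_power_state(WakeOnLanLikePower(), None, SOFT_POWER_OFF)` raises `ValueError`, while a request for `REBOOT` against `BasePower` passes through unchanged.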
- Note: WakeOnLanPower driver supports only states.POWER_ON.
add a default parameter ``timeout`` to the "set_power_state" method of the base PowerInterface class in ironic/drivers/base.py:

    @abc.abstractmethod
    def set_power_state(self, task, power_state, timeout=None):
        """Set the power state of the task's node.

        :param task: a TaskManager instance containing the node to act on.
        :param power_state: Any power state from :mod:`ironic.common.states`.
        :param timeout: timeout positive integer (> 0) for any power state.
          ``None`` indicates to use default timeout which depends on
          ``power_state``[*]_ and driver.
        :raises: MissingParameterValue if a required parameter is missing.
        """
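One way a driver might resolve the effective timeout is sketched below. This is an illustration under stated assumptions, not the proposed implementation: `resolve_timeout` is a hypothetical helper, and the two default values are placeholders standing in for whatever the driver and the ``soft_power_off_timeout`` configuration option would supply.

```python
SOFT_REBOOT = 'soft rebooting'
SOFT_POWER_OFF = 'soft power off'

# Placeholder defaults in seconds; real values would come from the
# driver and from the Ironic configuration file.
DEFAULT_TIMEOUT = 60
SOFT_POWER_OFF_TIMEOUT = 600


def resolve_timeout(power_state, timeout=None):
    """Return the timeout to use for the requested power state."""
    if timeout is None:
        # None means "use the default", which depends on the power state.
        if power_state in (SOFT_REBOOT, SOFT_POWER_OFF):
            return SOFT_POWER_OFF_TIMEOUT
        return DEFAULT_TIMEOUT
    # bool is a subclass of int, so it is rejected explicitly.
    if (isinstance(timeout, bool) or not isinstance(timeout, int)
            or timeout <= 0):
        raise ValueError('timeout must be a positive integer, got %r'
                         % (timeout,))
    return timeout
```

An explicit value always wins over the default, which matches the statement later in this spec that ``timeout`` overrides ``soft_power_off_timeout`` for the soft states.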
enhance "set_power_state" method in IPMIPower class so that the new states can be accepted as "power_state" parameter.
IPMIPower reference implementation supports SOFT_REBOOT and SOFT_POWER_OFF.
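SOFT_REBOOT is implemented as a SOFT_POWER_OFF followed by a plain POWER_ON, in the same way that Ironic implements REBOOT. This lets a generic BMC detect reboot completion as the power state change ON -> OFF -> ON; this transition is called a power cycle.

The following table shows the value of each power state variable.
``new_state`` is the value of the second parameter of the set_power_state() function.
``power_state`` is the value of the node property.
``target_power_state`` is the value of the node property.

+----------------+---------------+--------------------+-------------+
| new_state      | power_state   | target_power_state | power_state |
|                | (start state) | (assigned value)   | (end state) |
+----------------+---------------+--------------------+-------------+
| SOFT_REBOOT    | POWER_ON      | SOFT_POWER_OFF     | POWER_OFF   |
|                | POWER_OFF     | POWER_ON           | POWER_ON    |
+----------------+---------------+--------------------+-------------+
| SOFT_REBOOT    | POWER_OFF     | POWER_ON           | POWER_ON    |
+----------------+---------------+--------------------+-------------+
| SOFT_POWER_OFF | POWER_ON      | SOFT_POWER_OFF     | POWER_OFF   |
+----------------+---------------+--------------------+-------------+
| SOFT_POWER_OFF | POWER_OFF     | NONE               | POWER_OFF   |
+----------------+---------------+--------------------+-------------+

Cases that end in ERROR:

+----------------+---------------+--------------------+-------------+
| new_state      | power_state   | target_power_state | power_state |
|                | (start state) | (assigned value)   | (end state) |
+----------------+---------------+--------------------+-------------+
| SOFT_REBOOT    | POWER_ON      | SOFT_POWER_OFF     | ERROR       |
+----------------+---------------+--------------------+-------------+
| SOFT_POWER_OFF | POWER_ON      | SOFT_POWER_OFF     | ERROR       |
+----------------+---------------+--------------------+-------------+

add "get_supported_power_states" method and implementation in IPMIPower: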
    def get_supported_power_states(self, task):
        """Get a list of the supported power states.

        :param task: A TaskManager instance containing the node to act on.
                     currently not used.
        :returns: A list of the supported power states defined
                  in :mod:`ironic.common.states`.
        """
        return [states.POWER_ON, states.POWER_OFF, states.REBOOT,
                states.SOFT_REBOOT, states.SOFT_POWER_OFF]
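The soft reboot sequence and the state values tabulated earlier in this section can be traced with a small simulation. This is a sketch only: `FakeNode`, the helper functions and the `os_shuts_down` flag are hypothetical, `target_power_state` simply keeps the last assigned value, and the cause of the ERROR end state (here, an OS that never completes the shutdown) is an assumption.

```python
# Stand-ins for ironic.common.states; values follow this spec where given.
POWER_ON = 'power on'
POWER_OFF = 'power off'
SOFT_REBOOT = 'soft rebooting'
SOFT_POWER_OFF = 'soft power off'
ERROR = 'error'  # assumed spelling of the error state


class FakeNode(object):
    """Hypothetical node holding only the two properties in the table."""

    def __init__(self, power_state):
        self.power_state = power_state
        self.target_power_state = None  # NONE in the table


def _soft_power_off(node, os_shuts_down):
    """Return True once the node is off, False if the attempt failed."""
    if node.power_state == POWER_OFF:
        return True  # already off: no target is assigned
    node.target_power_state = SOFT_POWER_OFF
    if not os_shuts_down:
        node.power_state = ERROR  # assumption: shutdown never completed
        return False
    node.power_state = POWER_OFF
    return True


def _power_on(node):
    node.target_power_state = POWER_ON
    node.power_state = POWER_ON


def set_power_state(node, new_state, os_shuts_down=True):
    """Apply a soft transition and return the resulting power_state."""
    if new_state == SOFT_POWER_OFF:
        _soft_power_off(node, os_shuts_down)
    elif new_state == SOFT_REBOOT:
        # Soft power off first, then a plain power on: ON -> OFF -> ON.
        if _soft_power_off(node, os_shuts_down):
            _power_on(node)
    else:
        raise ValueError('unsupported state in this sketch: %r'
                         % (new_state,))
    return node.power_state
```

Starting from POWER_ON, a SOFT_REBOOT ends in POWER_ON; starting from POWER_OFF, a SOFT_POWER_OFF leaves `target_power_state` unset, as in the last row of the first table.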
add "inject_nmi" abstract method to the base ManagementInterface class in ironic/drivers/base.py:
    @abc.abstractmethod
    def inject_nmi(self, task):
        """Inject NMI, Non Maskable Interrupt.

        :param task: A TaskManager instance containing the node to act on.
        :returns: None
        """
add "inject_nmi" concrete method implementation in IPMIManagement class.
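For the IPMI reference implementation, both new operations map onto the ipmitool subcommands quoted in the man page excerpt above (`power diag` and `power soft`). The sketch below only builds the argument list and does not run anything. `build_ipmitool_command` and the operation keys are hypothetical names, and the connection options (`-I lanplus`, `-H`, `-U`, `-P`) are one common way to address a remote BMC, chosen here as an assumption rather than taken from this spec.

```python
# Map the operations in this spec to ipmitool subcommands from the
# man page excerpt. The dictionary keys are hypothetical names.
IPMITOOL_SUBCOMMANDS = {
    'inject_nmi': ['power', 'diag'],
    'soft_power_off': ['power', 'soft'],
}


def build_ipmitool_command(address, username, password, operation):
    """Return the ipmitool argument list for the given operation."""
    if operation not in IPMITOOL_SUBCOMMANDS:
        raise ValueError('unknown operation: %r' % (operation,))
    # Assumed connection options for a remote BMC over IPMI v2.0.
    base = ['ipmitool', '-I', 'lanplus',
            '-H', address, '-U', username, '-P', password]
    return base + IPMITOOL_SUBCOMMANDS[operation]
```

Passing the password on the command line is done here only for brevity; a real driver would likely avoid exposing it in the process list.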
Alternatives
- Both soft power off and diagnostic interrupt (NMI [1]) could be implemented through vendor passthru. However, the proposed change is preferable because users of the Ironic API or Ironic CLI can then write scripts or programs uniformly.
Data model impact
None
State Machine Impact
None
REST API impact
Add support of SOFT_REBOOT and SOFT_POWER_OFF to the target parameter of the following API:

    PUT /v1/nodes/(node_ident)/states/power

    The target parameter supports the following JSON data respectively.
    ``timeout`` is an optional parameter for any ``target`` parameter.
    In case of "soft reboot" and "soft power off", ``timeout`` overrides
    ``soft_power_off_timeout`` in the Ironic configuration file,
    typically /etc/ironic/ironic.conf.

    Examples
    {"target": "soft reboot", "timeout": 900}
    {"target": "soft power off", "timeout": 600}
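A client could assemble the request body for this call as sketched below. `build_power_request` is a hypothetical helper, and the three pre-existing target values in `ALLOWED_TARGETS` are an assumption about the current API; only "soft reboot" and "soft power off" come from this spec.

```python
import json

# "soft reboot" and "soft power off" come from this spec; the other
# three values are assumed to be the targets the API already accepts.
ALLOWED_TARGETS = ('power on', 'power off', 'rebooting',
                   'soft reboot', 'soft power off')


def build_power_request(target, timeout=None):
    """Return the JSON body for PUT /v1/nodes/(node_ident)/states/power."""
    if target not in ALLOWED_TARGETS:
        raise ValueError('unknown target: %r' % (target,))
    body = {'target': target}
    if timeout is not None:
        if (isinstance(timeout, bool) or not isinstance(timeout, int)
                or timeout <= 0):
            raise ValueError('timeout must be a positive integer')
        body['timeout'] = timeout  # optional for every target
    return json.dumps(body, sort_keys=True)
```

For instance, `build_power_request('soft reboot', 900)` produces the first example body shown above, and omitting `timeout` leaves the key out so the server-side default applies.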
Add a new "supported_power_states" member to the return type Node and NodeStates, and enhance the following APIs:
    GET /v1/nodes/(node_ident)
    GET /v1/nodes/(node_ident)/states

    JSON example of the returned type NodeStates
    {
      "console_enabled": false,
      "last_error": null,
      "power_state": "power on",
      "provision_state": null,
      "provision_updated_at": null,
      "target_power_state": "soft power off",
      "target_provision_state": "active",
      "supported_power_states": [
        "power on",
        "power off",
        "rebooting",
        "soft rebooting",
        "soft power off"
      ]
    }

Consequently Ironic CLI "ironic node-show" and "ironic node-show-states" return the "supported_power_states" member in the table format.

    example of "ironic node-show-states"
    +------------------------+----------------------------------------+
    | Property               | Value                                  |
    +------------------------+----------------------------------------+
    | target_power_state     | soft power off                         |
    | target_provision_state | None                                   |
    | last_error             | None                                   |
    | console_enabled        | False                                  |
    | provision_updated_at   | 2015-08-01T00:00:00+00:00              |
    | power_state            | power on                               |
    | provision_state        | active                                 |
    | supported_power_states | ["power on", "power off", "rebooting", |
    |                        |  "soft rebooting", "soft power off"]   |
    +------------------------+----------------------------------------+
Add a new management API to support inject NMI:
    PUT /v1/nodes/(node_ident)/management/inject_nmi

    Request doesn't take any parameter.
Client (CLI) impact
Enhance Ironic CLI "ironic node-set-power-state" to support graceful power off and reboot by adding optional arguments. This CLI is asynchronous. To get the latest status, call "ironic node-show-states" and check the returned value:

    usage: ironic node-set-power-state <node> <power-state>
                                       [--soft] [--timeout <timeout>]

    Power a node on/off/reboot, or gracefully power off/reboot a node.

    Positional arguments:
      <node>                Name or UUID of the node.
      <power-state>         'on', 'off', 'reboot'

    Optional arguments:
      --soft                Gracefully power off/reboot.
      --timeout <timeout>   Timeout, a positive integer (> 0), for any
                            ``power-state``. If the ``--soft`` option is
                            also specified, it overrides
                            ``soft_power_off_timeout`` in the Ironic
                            configuration file, typically
                            /etc/ironic/ironic.conf.
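The usage above could be parsed and translated into a REST ``target`` roughly as follows. This is a sketch using the standard `argparse` module, not the actual client code: `build_parser`, `to_request` and the `TARGETS` table are hypothetical, and the mapping of plain 'on', 'off' and 'reboot' to "power on", "power off" and "rebooting" is an assumption about the existing API.

```python
import argparse

# (power-state, --soft) -> REST target. The soft targets come from this
# spec; the plain ones are assumed values of the existing API.
TARGETS = {
    ('on', False): 'power on',
    ('off', False): 'power off',
    ('off', True): 'soft power off',
    ('reboot', False): 'rebooting',
    ('reboot', True): 'soft reboot',
}


def positive_int(text):
    """argparse type that accepts only integers greater than zero."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError('timeout must be > 0')
    return value


def build_parser():
    parser = argparse.ArgumentParser(prog='ironic node-set-power-state')
    parser.add_argument('node', metavar='<node>')
    parser.add_argument('power_state', metavar='<power-state>',
                        choices=['on', 'off', 'reboot'])
    parser.add_argument('--soft', action='store_true')
    parser.add_argument('--timeout', metavar='<timeout>', type=positive_int)
    return parser


def to_request(args):
    """Return (target, timeout) for the parsed arguments."""
    key = (args.power_state, args.soft)
    if key not in TARGETS:
        raise ValueError('--soft applies only to off and reboot')
    return TARGETS[key], args.timeout
```

Rejecting `--soft` together with 'on' is a design choice of this sketch; the spec itself only states that the option requests a graceful power off or reboot.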
Add a new Ironic CLI "ironic node-inject-nmi" to support NMI injection. This CLI is asynchronous. To observe the result, serial console access is required:

    usage: ironic node-inject-nmi <node>

    Inject NMI, Non Maskable Interrupt.

    Positional arguments:
      <node>                Name or UUID of the node.
Enhance OSC plugin "openstack baremetal node" so that the parameter can accept 'reboot [--soft] [--timeout <timeout>]', 'power [on|off [--soft] [--timeout <timeout>]]' and 'inject nmi'. This CLI is asynchronous. To get the latest status, call "openstack baremetal node show" and check the returned value:

    usage: openstack baremetal node reboot [--soft] [--timeout <timeout>] <uuid>
    usage: openstack baremetal node power off [--soft] [--timeout <timeout>] <uuid>
    usage: openstack baremetal node inject nmi <uuid>
RPC API impact
None
Driver API impact
The PowerInterface and ManagementInterface base classes are each enhanced with a new method as described in the section "Proposed change". These enhancements keep the API backward compatible, so there is no risk of breaking out-of-tree drivers.
Nova driver impact
The default behavior of the "nova reboot" command for a virtual machine instance such as KVM is soft reboot, and the command has an option '--hard' to request a hard reboot.
However, the default behavior of "nova reboot" for an Ironic instance is hard reboot, and the --hard option is meaningless for an Ironic instance.
Therefore the Ironic Nova driver needs to be updated to unify the behavior between virtual machine instances and bare-metal instances.
This problem is reported as a bug [6]. How to fix it is specified in the nova blueprint [10] and spec [11].
The default behavior of the "nova reboot" command is changed by following the standard deprecation policy [12]. How to deprecate the nova command is also specified in the nova blueprint [10] and spec [11].
Ramdisk impact
None
Security impact
None
Other end user impact
None
Scalability impact
None
Performance Impact
None
Other deployer impact
- The deployer, that is the cloud provider, needs to set up ACPI [7] and NMI [1] capable bare metal servers in the cloud environment.
- Change the default timeout value (in seconds) in the Ironic configuration file, typically /etc/ironic/ironic.conf, if necessary.
Developer impact
- Each driver developer needs to follow this interface to implement this proposed feature.
Implementation
Assignee(s)
- Primary assignee: Naohiro Tamura (naohirot)
- Other contributors: None
Work Items
- Enhance the PowerInterface class and ManagementInterface class to support soft power off and NMI [1] injection as described in "Proposed change".
- Enhance Ironic API as described in "REST API impact".
- Enhance Ironic CLI as described in "Client (CLI) impact".
- Implement the enhanced PowerInterface class into the concrete class IPMIPower, and the enhanced ManagementInterface class into the concrete class IPMIManagement. Implementing vendor's concrete class is up to each vendor.
- Coordinate the work with Nova NMI support "Inject NMI to an instance" [3] if necessary.
- Update the deployer documentation from the ironic perspective.
Dependencies
- Soft power off control depends on ACPI [7]. On a Linux system, acpid [8] has to be installed. On a Windows system, the local security policy has to be set as described in "Shutdown: Allow system to be shut down without having to log on" [9].
- The reaction to NMI [1] depends on the kernel crash dump configuration. How to set up the kernel dump is described for Linux systems in [13], [14], and for Windows in [15].
Testing
- Unit Tests.
- Tempest Tests, at least soft reboot/soft power off.
- Each vendor plans Third Party CI Tests if implemented.
Upgrades and Backwards Compatibility
None (forward compatibility is out of scope)
- Note: The backward compatibility issue of the default behavior change of the "nova reboot" command is addressed by following the standard deprecation policy [12].
Documentation Impact
- The deployer doc and REST API reference manual need to be updated. (CLI manual is generated automatically from source code)
References
[1] http://en.wikipedia.org/wiki/Non-maskable_interrupt
[2] http://linux.die.net/man/1/ipmitool
[3] https://review.openstack.org/#/c/187176/
[4] https://en.wikipedia.org/wiki/Communicating_sequential_processes
[5] http://linux.die.net/man/1/virsh
[6] https://bugs.launchpad.net/nova/+bug/1485416
[7] http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface
[8] http://linux.die.net/man/8/acpid
[9] https://technet.microsoft.com/en-us/library/jj852274%28v=ws.10%29.aspx
[10] https://blueprints.launchpad.net/nova/+spec/soft-reboot-poweroff
[11] https://review.openstack.org/#/c/229282/
[12] http://governance.openstack.org/reference/tags/assert_follows-standard-deprecation.html
[14] https://help.ubuntu.com/lts/serverguide/kernel-crash-dump.html