..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===========================================
New nova API call to mark nova-compute down
===========================================

https://blueprints.launchpad.net/nova/+spec/mark-host-down

A new API call is needed to change the state of the nova-compute service to
down immediately. This allows the evacuate API to be used without delay.
Because the external system calling the API will make sure no VMs are left
running, there is no possibility of corrupting shared storage or reusing the
same IPs. The API applies mainly to cases where a single host is mapped to
nova-compute. Cases such as Ironic or vSphere are out of scope.

Problem description
===================

The nova-compute state change for a failed or unreachable host is slow and
does not reliably indicate whether the host is down. Evacuation cannot happen
quickly, and because VMs might still be running, it can lead to the same IPs
being reused and to data corruption when shared storage is used. Cloud
stability can also suffer, because VMs can still be scheduled on the failed
host.

Use Cases
----------

As a user I want to evacuate VMs quickly when nova-compute is down.

As a user I want to trust that VMs will be scheduled to a healthy compute
node.

As a user I want to trust that no VMs are left running when nova-compute is
reported down. This is possible if an external system marks nova-compute down
when it notices a fault, so that the corresponding VMs can also be trusted to
be really down.

As a deployer I want to deploy an external fault monitoring system that can
detect different problems that translate to a host fault, report them to
OpenStack, and make sure that the host is fenced (powered down). The
monitoring system could monitor interfaces, links, services, memory, CPU, HW,
hypervisor, OpenStack services, and so on, and act accordingly.

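The ordering implied by these use cases (fence the host, then mark
nova-compute down, then evacuate) can be sketched as follows. This is only an
illustration: ``handle_host_fault`` and the three callables it takes are
hypothetical hooks that an external monitoring system would supply, and none
of them is part of Nova.

```python
from typing import Callable


def handle_host_fault(
    host: str,
    fence_host: Callable[[str], bool],
    force_down: Callable[[str], None],
    evacuate_host: Callable[[str], None],
) -> bool:
    """Fence first, then report the fault, then evacuate.

    All three callables are hypothetical hooks provided by the monitoring
    system; they are assumptions made for this sketch.
    """
    # Fencing must succeed before anything is reported: the guarantee that
    # no VMs are left running is what makes the forced-down state trustworthy.
    if not fence_host(host):
        return False
    # Mark nova-compute down immediately instead of waiting for a timeout.
    force_down(host)
    # Only now is it safe to evacuate the VMs to other hosts.
    evacuate_host(host)
    return True
```

If fencing fails, the sketch reports nothing and evacuates nothing, because
VMs on the host might still be running.
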
Project Priority
-----------------

Liberty priorities have not yet been defined.

Proposed change
===============

Introduce a new services API extension for setting the state of the
nova-compute service to up or down.

As future work, other blueprints related to this could be created:

* New notification of service state change.

Blueprints related to the instances running on the host could also be created:

* An API to set 'power_state: shutdown' for all VMs on a single host.
* Currently there is an API to reset VM state one by one. There could be an
  API that does the same for all VMs on a single host.

Alternatives
------------

There are no attractive alternatives to an external tool for detecting all the
different host faults. For this kind of tool to exist, Nova needs a new API
for reporting a fault. Currently, some kind of workaround must have been
implemented, because the states cannot be trusted or obtained from OpenStack
quickly enough.

Data model impact
-----------------

The Nova DB service table will have a new Boolean column ``forced_down`` with
false as the default value. The database servicegroup driver ``is_up`` method
needs to be updated to report the service as down when the value is true.
Otherwise the current timestamp-based check applies. Only when the
``forced_down`` flag is set back to false is nova-compute allowed to come up
and have its state reported as up.

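The intended ``is_up`` behaviour can be illustrated with a minimal sketch. It
is not the Nova implementation: the ``Service`` dataclass is a simplified
stand-in for a service table row, and the ``service_down_time`` parameter and
its 60-second default are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Service:
    """Simplified stand-in for a row of the service table (assumption)."""
    host: str
    updated_at: datetime
    forced_down: bool = False  # the new column, false by default


def is_up(service: Service, now: datetime, service_down_time: int = 60) -> bool:
    """Return True if the service should be reported as up."""
    # A forced-down service is down no matter how recent its heartbeat is.
    if service.forced_down:
        return False
    # Otherwise fall back to the existing timestamp-based check.
    return now - service.updated_at <= timedelta(seconds=service_down_time)
```

With this logic, a nova-compute that keeps sending heartbeats is still
reported down until ``forced_down`` is set back to false.
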
REST API impact
---------------

A new compute API call changes the nova-compute ``forced_down`` flag value to
true or false:

request::

    PUT /v2.1/{tenant_id}/os-services/force-down
    {
        "binary": "nova-compute",
        "host": "host1",
        "forced_down": true
    }

response::

    200 OK
    {
        "service": {
            "host": "host1",
            "binary": "nova-compute",
            "forced_down": true
        }
    }

request::

    PUT /v2.1/{tenant_id}/os-services/force-down
    {
        "binary": "nova-compute",
        "host": "host1",
        "forced_down": false
    }

response::

    200 OK
    {
        "service": {
            "host": "host1",
            "binary": "nova-compute",
            "forced_down": false
        }
    }

The service schema will have a new optional parameter:

``forced_down``: parameter_types.boolean

This will be included in response messages to forced_down requests.

Besides the new call, the response for the list of services will also contain
the state of the forced_down field.

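As an illustration of how an external tool might prepare the force-down call
described in this section, the sketch below builds the PUT request with the
Python standard library and does not send it. The function name, the endpoint
value, and the use of an ``X-Auth-Token`` header are assumptions made for this
example; only the path and the body fields come from this spec.

```python
import json
import urllib.request


def build_force_down_request(endpoint, tenant_id, host, forced_down, token):
    """Build (but do not send) the force-down PUT request.

    ``endpoint`` and ``token`` are placeholders supplied by the caller.
    """
    url = f"{endpoint}/v2.1/{tenant_id}/os-services/force-down"
    body = {
        "binary": "nova-compute",
        "host": host,
        "forced_down": forced_down,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,
        },
        method="PUT",
    )
```

Passing the returned object to ``urllib.request.urlopen`` would send the
request. Calling the function again with ``forced_down=False`` produces the
request that allows nova-compute to be reported up again.
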
Security impact
---------------

Configurable by policy, defaulting to admin role.

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

None

Other deployer impact
---------------------

The deployer can use any external system to detect a host fault and report it
to OpenStack.

Developer impact
----------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee: Tomi Juvonen
Other contributors: Ryota Mibu, Roman Dobosz

Work Items
----------

* Test cases.
* REST API and Service changes.
  Implementation: https://review.openstack.org/#/c/184086/
* CLI API changes.
* Documentation.

Dependencies
============

None.

Testing
=======

Unit and functional test cases need to be added.

Documentation Impact
====================

The new API needs to be documented:

* Compute API extensions documentation.
  http://developer.openstack.org/api-ref-compute-v2.1.html
* nova.compute.api documentation.
  http://docs.openstack.org/developer/nova/api/nova.compute.api.html

References
==========

* OPNFV Doctor project: https://wiki.opnfv.org/doctor
* OpenStack Instance HA Proposal:
  http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
* The Different Facets of OpenStack HA:
  http://blog.russellbryant.net/2015/03/10/the-different-facets-of-openstack-ha/