Support Vitrage resources in Heat
The purpose is to automate the auto-healing process that involves external monitoring, Vitrage alarm deduction and Mistral workflow execution. Story: 2002684 Task: 22527 Change-Id: If66248e07a662a225799a2bd3fc88a31d1539021
This commit is contained in:
parent
0bb2c20ba8
commit
f91c7b2371
@ -66,13 +66,21 @@ Queens
|
||||
specs/queens/*
|
||||
|
||||
Rocky
|
||||
------
|
||||
-----
|
||||
.. toctree::
|
||||
:glob:
|
||||
:maxdepth: 1
|
||||
|
||||
specs/rocky/*
|
||||
|
||||
Stein
|
||||
-----
|
||||
.. toctree::
|
||||
:glob:
|
||||
:maxdepth: 1
|
||||
|
||||
specs/stein/*
|
||||
|
||||
Backlog
|
||||
-------
|
||||
.. toctree::
|
||||
|
344
specs/stein/vitrage-resources.rst
Normal file
344
specs/stein/vitrage-resources.rst
Normal file
@ -0,0 +1,344 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
=========================
|
||||
Support Vitrage Resources
|
||||
=========================
|
||||
|
||||
https://storyboard.openstack.org/#!/story/2002684
|
||||
|
||||
This Blueprint proposes to add support for Vitrage resources in Heat.
|
||||
The purpose is to automate the auto-healing process that involves external
|
||||
monitoring, Vitrage alarm deduction and Mistral workflow execution.
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
Auto-healing a Heat stack when an instance is down is extremely important.
|
||||
This use case is already handled when Nova sends a notification about the
|
||||
instance state, Aodh raises an event alarm and as a result a Mistral healing
|
||||
workflow is executed.
|
||||
|
||||
However, there are cases where Nova is not aware about the real state of the
|
||||
instance. One example is a network failure: a NIC that is down can result in no
|
||||
network connectivity to certain instances, while their state in Nova remains
|
||||
'Active'. We would like to support auto-healing in such cases as well.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
An ``OS::Vitrage::Template`` resource will be added in Heat, under
|
||||
heat/engine/resources/openstack/vitrage.
|
||||
|
||||
Its role will be to create, based on the properties given in HOT template,
|
||||
a Vitrage template with a condition->action scenario that will handle the
|
||||
healing. For this purpose it will use a Vitrage ``template prototype`` yaml
|
||||
file that will reside in the same directory. The template prototypes can be
|
||||
reused, and very few of them will be needed in order to support most
|
||||
self-healing use cases.
|
||||
|
||||
|
||||
The VitrageTemplate resource will support use cases like:
|
||||
|
||||
#. An external monitor detects a network failure
|
||||
#. Vitrage is notified, and based on its topology-graph it identifies all
|
||||
affected resources
|
||||
#. If an instance that belongs to a Heat stack is affected, Vitrage will
|
||||
execute a Mistral healing workflow
|
||||
|
||||
|
||||
VitrageTemplate definition
|
||||
--------------------------
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
resources:
|
||||
name:
|
||||
type: OS::Vitrage::Template
|
||||
properties:
|
||||
template_prototype: String
|
||||
template_params:
|
||||
description: String
|
||||
...
|
||||
|
||||
|
||||
Properties:
|
||||
|
||||
- template_prototype - filename of the Vitrage template prototype
|
||||
- description - Description of the Vitrage template
|
||||
- template_params - list of key/value parameters that are required for the
|
||||
given template prototype
|
||||
|
||||
|
||||
Example 1 - instance down alarm
|
||||
-------------------------------
|
||||
|
||||
If there is an 'Instance down' alarm on an instance, execute a Mistral healing
|
||||
workflow on that instance.
|
||||
|
||||
Hot Template
|
||||
^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
resources:
|
||||
execute_healing:
|
||||
type: OS::Vitrage::Template
|
||||
properties:
|
||||
template_prototype: execute_healing_on_instance_down.yaml
|
||||
template_params:
|
||||
description: Execute Mistral healing workflow if instance is down
|
||||
instance_alarm_name: Instance down
|
||||
instance_id: {get_resource: server}
|
||||
workflow_name: {get_resource: autoheal}
|
||||
heat_stack_id: {get_param: "OS::stack_id"}
|
||||
|
||||
|
||||
Vitrage Template Prototype
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
**execute_healing_on_instance_down.yaml**
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
metadata:
|
||||
version: 3
|
||||
name: get_param(name)
|
||||
description: get_param(description)
|
||||
type: prototype
|
||||
parameters:
|
||||
instance_alarm_name:
|
||||
description: Name of the alarm on the instance
|
||||
instance_id:
|
||||
description: Uuid of the instance to auto-heal
|
||||
heat_stack_id:
|
||||
description: Uuid of the Heat stack to auto-heal
|
||||
workflow_name:
|
||||
description: Name of the Mistral workflow to execute
|
||||
definitions:
|
||||
entities:
|
||||
- entity:
|
||||
category: ALARM
|
||||
name: get_param(instance_alarm_name)
|
||||
template_id: alarm
|
||||
- entity:
|
||||
category: RESOURCE
|
||||
type: nova.instance
|
||||
id: get_param(instance_id)
|
||||
template_id: instance
|
||||
relationships:
|
||||
- relationship:
|
||||
source: alarm
|
||||
relationship_type: on
|
||||
target: instance
|
||||
template_id : alarm_on_instance
|
||||
scenarios:
|
||||
- scenario:
|
||||
condition: alarm_on_instance
|
||||
actions:
|
||||
- action:
|
||||
action_type: execute_mistral
|
||||
properties:
|
||||
workflow: get_param(workflow_name)
|
||||
input:
|
||||
instance_id: get_param(instance_id)
|
||||
heat_stack_id: get_param(heat_stack_id)
|
||||
|
||||
|
||||
Example 2 - host down alarm
|
||||
---------------------------
|
||||
|
||||
If there is a 'Host down' alarm on a host, and the host contains the instance
|
||||
that is defined in this template, execute a Mistral healing workflow on that
|
||||
instance.
|
||||
|
||||
This example is similar to the first one, just that it uses a more complex
|
||||
Vitrage template that considers the host->instance relationship. It also
|
||||
performs other actions, in addition to executing a Mistral healing workflow:
|
||||
|
||||
- modify the states of host and the instances in Vitrage
|
||||
- raise an alarm on the instance and mark the host alarm as its root cause
|
||||
- notify Nova that the host and instance are down
|
||||
|
||||
All the complexity resides in the reusable Vitrage template prototype, while
|
||||
the Heat usage is simple and quite straight forward.
|
||||
|
||||
|
||||
Hot Template
|
||||
^^^^^^^^^^^^
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
resources:
|
||||
execute_healing:
|
||||
type: OS::Vitrage::Template
|
||||
properties:
|
||||
template_prototype: execute_healing_on_host_down.yaml
|
||||
template_params:
|
||||
description: Execute Mistral healing workflow if a host is down
|
||||
host_alarm_name: Host down
|
||||
instance_alarm_name: Instance down
|
||||
instance_id: {get_resource: server}
|
||||
workflow_name: {get_resource: autoheal}
|
||||
heat_stack_id: {get_param: "OS::stack_id"}
|
||||
|
||||
|
||||
|
||||
Vitrage Template Prototype
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
**execute_healing_on_host_down.yaml**
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
metadata:
|
||||
version: 3
|
||||
name: get_param(name)
|
||||
description: get_param(description)
|
||||
type: prototype
|
||||
parameters:
|
||||
host_alarm_name:
|
||||
description: Name of the alarm on the host
|
||||
instance_id:
|
||||
description: Uuid of the instance to auto-heal
|
||||
heat_stack_id:
|
||||
description: Uuid of the Heat stack to auto-heal
|
||||
instance_alarm_name:
|
||||
description: Name of the alarm to be created on the instance
|
||||
instance_alarm_severity:
|
||||
description: Severity of the alarm to be created on the instance
|
||||
default: critical
|
||||
host_state:
|
||||
description: New state to be set for the host
|
||||
default: ERROR
|
||||
instance_state:
|
||||
description: New state to be set for the instance
|
||||
default: ERROR
|
||||
workflow_name:
|
||||
description: Name of the Mistral workflow to execute
|
||||
definitions:
|
||||
entities:
|
||||
- entity:
|
||||
category: ALARM
|
||||
name: get_param(host_alarm_name)
|
||||
template_id: host_alarm
|
||||
- entity:
|
||||
category: ALARM
|
||||
name: get_param(instance_alarm_name)
|
||||
template_id: instance_alarm
|
||||
- entity:
|
||||
category: RESOURCE
|
||||
type: nova.host
|
||||
template_id: host
|
||||
- entity:
|
||||
category: RESOURCE
|
||||
type: nova.instance
|
||||
id: get_param(instance_id)
|
||||
template_id: instance
|
||||
relationships:
|
||||
- relationship:
|
||||
source: host_alarm
|
||||
relationship_type: on
|
||||
target: host
|
||||
template_id : alarm_on_host
|
||||
- relationship:
|
||||
source: host
|
||||
relationship_type: contains
|
||||
target: instance
|
||||
template_id : host_contains_instance
|
||||
- relationship:
|
||||
source: instance_alarm
|
||||
relationship_type: on
|
||||
target: instance
|
||||
template_id : alarm_on_instance
|
||||
scenarios:
|
||||
- scenario:
|
||||
condition: alarm_on_host
|
||||
actions:
|
||||
- action:
|
||||
action_type: set_state
|
||||
action_target:
|
||||
target: host
|
||||
properties:
|
||||
state: get_param(host_state)
|
||||
- action:
|
||||
action_type: mark_down
|
||||
action_target:
|
||||
target: host
|
||||
- scenario:
|
||||
condition: alarm_on_host and host_contains_instance
|
||||
actions:
|
||||
- action:
|
||||
action_type: raise_alarm
|
||||
action_target:
|
||||
target: instance
|
||||
properties:
|
||||
alarm_name: get_param(instance_alarm_name)
|
||||
severity: get_param(instance_alarm_severity)
|
||||
- scenario:
|
||||
condition: alarm_on_instance
|
||||
actions:
|
||||
- action:
|
||||
action_type: execute_mistral
|
||||
properties:
|
||||
workflow: get_param(workflow_name)
|
||||
input:
|
||||
instance_id: get_param(instance_id)
|
||||
heat_stack_id: get_param(heat_stack_id)
|
||||
- action:
|
||||
action_type: set_state
|
||||
action_target:
|
||||
target: instance
|
||||
properties:
|
||||
state: get_param(instance_state)
|
||||
- action:
|
||||
action_type: mark_down
|
||||
action_target:
|
||||
target: instance
|
||||
- scenario:
|
||||
condition: alarm_on_host and host_contains_instance and alarm_on_instance
|
||||
actions:
|
||||
- action:
|
||||
action_type: add_causal_relationship
|
||||
action_target:
|
||||
source: host_alarm
|
||||
target: instance_alarm
|
||||
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
None
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
ifat_afek
|
||||
|
||||
Milestones
|
||||
----------
|
||||
|
||||
Target Milestone for completion:
|
||||
stein-3
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
- Implement a Vitrage client plugin
|
||||
- Implement the VitrageTemplate resource
|
||||
- Add unit tests and tempest tests
|
||||
- Add a HOT template example to heat-templates
|
||||
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
Depends on Vitrage template prototypes implementation:
|
||||
https://review.openstack.org/#/c/627861/
|
Loading…
Reference in New Issue
Block a user