Support Vitrage resources in Heat

The purpose is to automate the auto-healing process that
involves external monitoring, Vitrage alarm deduction and
Mistral workflow execution.

Story: 2002684
Task: 22527

Change-Id: If66248e07a662a225799a2bd3fc88a31d1539021
Ifat Afek 2019-02-20 17:42:13 +00:00
parent d0a0accca7
commit de6ff6dc03
2 changed files with 361 additions and 1 deletions


@ -66,13 +66,21 @@ Queens
specs/queens/*
Rocky
------
-----
.. toctree::
:glob:
:maxdepth: 1
specs/rocky/*
Stein
-----
.. toctree::
:glob:
:maxdepth: 1
specs/stein/*
Backlog
-------
.. toctree::


@ -0,0 +1,352 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Support Vitrage Resources
=========================
https://storyboard.openstack.org/#!/story/2002684
This blueprint proposes adding support for Vitrage resources to Heat.
The purpose is to automate the auto-healing process that involves external
monitoring, Vitrage alarm deduction and Mistral workflow execution.
Problem description
===================
Auto-healing a Heat stack when an instance is down is extremely important.
This use case is already handled when Nova is aware of the failure: Nova sends
a notification about the instance state, Aodh raises an event alarm, and as a
result a Mistral healing workflow is executed.
However, there are cases where Nova is not aware of the real state of the
instance. One example is a network failure: a NIC that is down can result in no
network connectivity to certain instances, while their state in Nova remains
'Active'. We would like to support auto-healing in such cases as well.
Proposed change
===============
An ``OS::Vitrage::Template`` resource will be added to Heat, under
``heat/engine/resources/openstack/vitrage``.
Its role will be to create, based on the properties given in the HOT template,
a Vitrage template with a condition->action scenario that will handle the
healing. For this purpose it will use a Vitrage template with parameters that
will reside in the same directory. A template with parameters can be reused,
and very few such templates will be needed in order to support most
self-healing use cases.
The VitrageTemplate resource will support use cases like:

#. An external monitor detects a network failure
#. Vitrage is notified, and based on its topology graph it identifies all
   affected resources
#. If an instance that belongs to a Heat stack is affected, Vitrage will
   execute a Mistral healing workflow

VitrageTemplate definition
--------------------------
.. code-block:: yaml

  resources:
    name:
      type: OS::Vitrage::Template
      description: String
      properties:
        template_file: String
        template_params:
          ...

Properties:

- ``template_file`` - the Vitrage template file
- ``description`` - Description of the Vitrage template
- ``template_params`` - list of key/value parameters that are required for the
  given template

Example 1 - instance down alarm
-------------------------------
If there is an 'Instance down' alarm on an instance, execute a Mistral healing
workflow on that instance.
HOT Template
^^^^^^^^^^^^
.. code-block:: yaml

  resources:
    execute_healing:
      type: OS::Vitrage::Template
      description: Execute Mistral healing workflow if instance is down
      properties:
        template_file: { get_file: "execute_healing_on_instance_down.yaml" }
        template_params:
          instance_alarm_name: Instance down
          instance_id: {get_resource: server}
          workflow_name: {get_resource: autoheal}
          heat_stack_id: {get_param: "OS::stack_id"}

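The ``server`` and ``autoheal`` resources referenced through ``get_resource``
are not defined in this spec. A minimal sketch of what they might look like
follows, assuming an ``OS::Nova::Server`` and an ``OS::Mistral::Workflow``
whose inputs match the ``instance_id`` and ``heat_stack_id`` values passed by
the Vitrage template. The image and flavor names, the task names and the
choice of Mistral actions are illustrative assumptions, not part of the
proposal. The sketch also assumes that Heat accepts the instance UUID as the
resource identifier when marking a resource unhealthy.

.. code-block:: yaml

  resources:
    # Hypothetical server that the Vitrage template refers to
    server:
      type: OS::Nova::Server
      properties:
        image: cirros      # assumed image name
        flavor: m1.tiny    # assumed flavor name

    # Hypothetical healing workflow that Vitrage would execute
    autoheal:
      type: OS::Mistral::Workflow
      properties:
        type: direct
        description: Mark the instance as unhealthy and update the stack
        input:
          instance_id:
          heat_stack_id:
        tasks:
          - name: resources_mark_unhealthy
            action: >-
              heat.resources_mark_unhealthy
              stack_id=<% $.heat_stack_id %>
              resource_name=<% $.instance_id %>
              mark_unhealthy=true
              resource_status_reason='Marked by Vitrage'
            on_success:
              - stacks_update
          - name: stacks_update
            action: heat.stacks_update stack_id=<% $.heat_stack_id %> existing=true
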
Vitrage Template Prototype
^^^^^^^^^^^^^^^^^^^^^^^^^^
**execute_healing_on_instance_down.yaml**
.. code-block:: yaml

  metadata:
    version: 2
    name: get_param(template_name)
    description: get_param(description)
    type: standard
  parameters:
    description:
      description: The description of the template
    template_name:
      description: Name of the template to create
    instance_alarm_name:
      description: Name of the alarm on the instance
    instance_id:
      description: Uuid of the instance to auto-heal
    heat_stack_id:
      description: Uuid of the Heat stack to auto-heal
    workflow_name:
      description: Name of the Mistral workflow to execute
  definitions:
    entities:
      - entity:
          category: ALARM
          name: get_param(instance_alarm_name)
          template_id: alarm
      - entity:
          category: RESOURCE
          type: nova.instance
          id: get_param(instance_id)
          template_id: instance
    relationships:
      - relationship:
          source: alarm
          relationship_type: on
          target: instance
          template_id : alarm_on_instance
  scenarios:
    - scenario:
        condition: alarm_on_instance
        actions:
          - action:
              action_type: execute_mistral
              properties:
                workflow: get_param(workflow_name)
                input:
                  instance_id: get_param(instance_id)
                  heat_stack_id: get_param(heat_stack_id)

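To illustrate how the ``get_param()`` placeholders are meant to be filled, the
fragment below sketches what the ``definitions`` and ``scenarios`` sections
could look like once the parameters from the HOT template are substituted.
The UUIDs and the workflow name are made-up placeholder values, and the exact
substitution mechanics are left to Vitrage; this is an illustration under
those assumptions, not the output of a real run.

.. code-block:: yaml

  definitions:
    entities:
      - entity:
          category: ALARM
          name: Instance down                       # instance_alarm_name
          template_id: alarm
      - entity:
          category: RESOURCE
          type: nova.instance
          id: 11111111-2222-3333-4444-555555555555  # instance_id (placeholder)
          template_id: instance
    relationships:
      - relationship:
          source: alarm
          relationship_type: on
          target: instance
          template_id: alarm_on_instance
  scenarios:
    - scenario:
        condition: alarm_on_instance
        actions:
          - action:
              action_type: execute_mistral
              properties:
                workflow: example-autoheal-workflow   # workflow_name (placeholder)
                input:
                  instance_id: 11111111-2222-3333-4444-555555555555
                  heat_stack_id: 66666666-7777-8888-9999-000000000000
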
Example 2 - host down alarm
---------------------------
If there is a 'Host down' alarm on a host, and the host contains the instance
that is defined in this template, execute a Mistral healing workflow on that
instance.
This example is similar to the first one, except that it uses a more complex
Vitrage template that considers the host->instance relationship. It also
performs other actions, in addition to executing a Mistral healing workflow:

- modify the states of the host and the instance in Vitrage
- raise an alarm on the instance and mark the host alarm as its root cause
- notify Nova that the host and instance are down

All the complexity resides in the reusable Vitrage template, while the Heat
usage is simple and straightforward.

HOT Template
^^^^^^^^^^^^
.. code-block:: yaml

  resources:
    execute_healing:
      type: OS::Vitrage::Template
      description: Execute Mistral healing workflow if a host is down
      properties:
        template_file: { get_file: "execute_healing_on_host_down.yaml" }
        template_params:
          host_alarm_name: Host down
          instance_alarm_name: Instance down
          instance_id: {get_resource: server}
          workflow_name: {get_resource: autoheal}
          heat_stack_id: {get_param: "OS::stack_id"}

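The Vitrage template for this example declares defaults for
``instance_alarm_severity``, ``host_state`` and ``instance_state``, so the HOT
template can omit them. Assuming that values passed in ``template_params``
take precedence over those defaults, overriding them might look like the
following sketch. The values ``warning`` and ``SUBOPTIMAL`` are assumptions
chosen for illustration only.

.. code-block:: yaml

  resources:
    execute_healing:
      type: OS::Vitrage::Template
      description: Execute Mistral healing workflow if a host is down
      properties:
        template_file: { get_file: "execute_healing_on_host_down.yaml" }
        template_params:
          host_alarm_name: Host down
          instance_alarm_name: Instance down
          instance_id: {get_resource: server}
          workflow_name: {get_resource: autoheal}
          heat_stack_id: {get_param: "OS::stack_id"}
          # Optional overrides of the template defaults (assumed values)
          instance_alarm_severity: warning
          host_state: SUBOPTIMAL
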
Vitrage Template Prototype
^^^^^^^^^^^^^^^^^^^^^^^^^^
**execute_healing_on_host_down.yaml**
.. code-block:: yaml

  metadata:
    version: 2
    name: get_param(template_name)
    description: get_param(description)
    type: standard
  parameters:
    description:
      description: The description of the template
    template_name:
      description: Name of the template to create
    host_alarm_name:
      description: Name of the alarm on the host
    instance_id:
      description: Uuid of the instance to auto-heal
    heat_stack_id:
      description: Uuid of the Heat stack to auto-heal
    instance_alarm_name:
      description: Name of the alarm to be created on the instance
    instance_alarm_severity:
      description: Severity of the alarm to be created on the instance
      default: critical
    host_state:
      description: New state to be set for the host
      default: ERROR
    instance_state:
      description: New state to be set for the instance
      default: ERROR
    workflow_name:
      description: Name of the Mistral workflow to execute
  definitions:
    entities:
      - entity:
          category: ALARM
          name: get_param(host_alarm_name)
          template_id: host_alarm
      - entity:
          category: ALARM
          name: get_param(instance_alarm_name)
          template_id: instance_alarm
      - entity:
          category: RESOURCE
          type: nova.host
          template_id: host
      - entity:
          category: RESOURCE
          type: nova.instance
          id: get_param(instance_id)
          template_id: instance
    relationships:
      - relationship:
          source: host_alarm
          relationship_type: on
          target: host
          template_id : alarm_on_host
      - relationship:
          source: host
          relationship_type: contains
          target: instance
          template_id : host_contains_instance
      - relationship:
          source: instance_alarm
          relationship_type: on
          target: instance
          template_id : alarm_on_instance
  scenarios:
    - scenario:
        condition: alarm_on_host
        actions:
          - action:
              action_type: set_state
              action_target:
                target: host
              properties:
                state: get_param(host_state)
          - action:
              action_type: mark_down
              action_target:
                target: host
    - scenario:
        condition: alarm_on_host and host_contains_instance
        actions:
          - action:
              action_type: raise_alarm
              action_target:
                target: instance
              properties:
                alarm_name: get_param(instance_alarm_name)
                severity: get_param(instance_alarm_severity)
    - scenario:
        condition: alarm_on_instance
        actions:
          - action:
              action_type: execute_mistral
              properties:
                workflow: get_param(workflow_name)
                input:
                  instance_id: get_param(instance_id)
                  heat_stack_id: get_param(heat_stack_id)
          - action:
              action_type: set_state
              action_target:
                target: instance
              properties:
                state: get_param(instance_state)
          - action:
              action_type: mark_down
              action_target:
                target: instance
    - scenario:
        condition: alarm_on_host and host_contains_instance and alarm_on_instance
        actions:
          - action:
              action_type: add_causal_relationship
              action_target:
                source: host_alarm
                target: instance_alarm

Alternatives
------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
ifat_afek
Milestones
----------
Target Milestone for completion:
stein-3
Work Items
----------
- Implement a Vitrage client plugin
- Implement the VitrageTemplate resource
- Add unit tests and tempest tests
- Add a HOT template example to heat-templates
Dependencies
============
None. Templates with parameters are supported in the Vitrage Stein release.