Support Vitrage resources in Heat
The purpose is to automate the auto-healing process that involves external monitoring, Vitrage alarm deduction and Mistral workflow execution.

Story: 2002684
Task: 22527
Depends-On: Ie28ba2087c6d87aec57198afe9c328542a4c25ca
Change-Id: If66248e07a662a225799a2bd3fc88a31d1539021
specs/rocky/vitrage-resources.rst
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode
=========================
Support Vitrage Resources
=========================

https://storyboard.openstack.org/#!/story/2002684
This blueprint proposes adding support for Vitrage resources in Heat.
The goal is to automate an auto-healing process that combines external
monitoring, Vitrage alarm deduction, and Mistral workflow execution.
Problem description
===================

Auto-healing a Heat stack when an instance is down is extremely important.
This use case is already handled today: Nova sends a notification about the
instance state, Aodh raises an event alarm, and as a result a Mistral healing
workflow is executed.

However, there are cases where Nova is not aware of the real state of an
instance. One example is a network failure: a NIC that is down can leave
certain instances without network connectivity, while their state in Nova
remains 'Active'. We would like to support auto-healing in such cases as well.
Proposed change
===============

An ``OS::Vitrage::Template`` resource will be added to Heat, under
``heat/engine/resources/openstack/vitrage``.

Based on the properties given in the HOT template, this resource will create
a Vitrage template containing a condition -> action scenario that handles the
healing.
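For illustration, the simplest condition -> action scenario (a specific alarm
on a specific instance triggers a Mistral workflow) might be expressed in a
Vitrage template roughly as follows. This is a sketch that assumes the Vitrage
template format version 2 (``metadata``, ``definitions`` and ``scenarios``
sections, an ``execute_mistral`` action and the ``get_attr`` function); the
template name, the ``template_id`` values and the ``<instance-id>`` and
``<workflow-id>`` placeholders are hypothetical, and this spec does not define
the exact template the resource generates.

.. code-block:: yaml

  # Hypothetical sketch of a generated Vitrage template (format version 2 assumed)
  metadata:
    version: 2
    type: standard
    name: instance_auto_healing_example   # hypothetical name
    description: Execute Mistral healing workflow if the instance is down
  definitions:
    entities:
      - entity:
          category: ALARM
          name: Instance down
          template_id: instance_down_alarm
      - entity:
          category: RESOURCE
          type: nova.instance
          id: <instance-id>               # placeholder, presumably filled in by Heat
          template_id: instance
    relationships:
      # The alarm is raised "on" the instance
      - relationship:
          source: instance_down_alarm
          target: instance
          relationship_type: on
          template_id: alarm_on_instance
  scenarios:
    - scenario:
        condition: alarm_on_instance
        actions:
          - action:
              action_type: execute_mistral
              properties:
                workflow: <workflow-id>   # placeholder
                input:
                  instance_id: get_attr(instance, id)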
The VitrageTemplate resource will support the following use case:

#. An external monitor detects a network failure.
#. Vitrage is notified and, based on its topology graph, identifies all
   affected resources.
#. If an instance that belongs to a Heat stack is affected, Vitrage executes
   a Mistral healing workflow.
The implementation will be done in three phases.

**Phase 1:** A simple Vitrage template will be created that supports only the
following scenario:

If a specific alarm is raised on a specific instance -> execute the Mistral
healing workflow.

**Phase 2:** Enable the creation of more complex Vitrage templates, like the
one in the network failure use case. This will require additional development
in Vitrage, so that it provides "template skeletons" for different scenarios.

**Phase 3:** Enable referencing a complete Vitrage template that is written in
a separate YAML file. This will expose all the capabilities provided by
Vitrage.
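A minimal sketch of how Phase 3 could look on the HOT side, assuming the
complete Vitrage template is kept next to the HOT template and loaded with
Heat's ``get_file`` intrinsic function (which inserts the file content as a
string). The ``template_file`` property name and the file name are
hypothetical; this spec does not define the Phase 3 properties.

.. code-block:: yaml

  resources:
    execute_healing:
      type: OS::Vitrage::Template
      properties:
        # Hypothetical property: the whole Vitrage template is read from a
        # separate YAML file by the get_file intrinsic function
        template_file: {get_file: vitrage_healing_template.yaml}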
**VitrageTemplate definition**

.. code-block:: yaml

  resources:
    name:
      type: OS::Vitrage::Template
      properties:
        type: String # Phase 1 - only 'instance_auto_healing' is supported
        description: String
        input:
          alarm_name: String
          resource_type: String # Phase 1 - only 'nova.instance' is supported
          resource_id: String
        actions:
          - action:
              type: String # Phase 1 - only 'execute_mistral' is supported
              properties:
                workflow: String
                workflow_input:
                  {...}
Properties:

- type - Type of the Vitrage template. In phase 1 only a single template type,
  ``instance_auto_healing``, will be supported: if there is an alarm on an
  instance, execute the workflow.
- description - Description of the Vitrage template.
- alarm_name - The name of the alarm that should trigger the workflow
  execution. This can be an alarm from an external monitor (like Zabbix,
  Nagios, Collectd, or Prometheus), or a deduced alarm that was raised by
  Vitrage.
- resource_type - Type of the resource that the alarm is raised on. In phase 1
  only ``nova.instance`` will be supported.
- resource_id - ID of the resource that the alarm is raised on.
- actions - A list of actions to execute as a result. In phase 1 the only
  supported action will be ``execute_mistral``.
- workflow - ID of the Mistral workflow to be executed.
- workflow_input - Values to be passed as inputs to the workflow.
**Phase 1 example**

If there is an 'Instance down' alarm on an instance, execute a Mistral healing
workflow on that instance.
.. code-block:: yaml

  resources:
    execute_healing:
      type: OS::Vitrage::Template
      properties:
        type: instance_auto_healing
        description: Execute Mistral healing workflow if the instance is down
        input:
          alarm_name: Instance down
          instance_id: {get_resource: server}
        actions:
          - action:
              type: execute_mistral
              properties:
                workflow: {get_resource: autoheal}
                input:
                  instance_id: {get_resource: server}
                  heat_stack_id: {get_param: "OS::stack_id"}
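The Phase 1 example refers to ``server`` and ``autoheal`` resources without
defining them. The sketch below shows one way they might be declared in the
same stack, assuming an ``OS::Nova::Server`` and an ``OS::Mistral::Workflow``.
The parameter names, the task names and the Mistral ``heat.*`` action names
(which assume Mistral's generated python-heatclient actions, in the spirit of
the auto-healing samples in heat-templates) are illustrative assumptions, not
part of this spec.

.. code-block:: yaml

  parameters:
    image:
      type: string
    flavor:
      type: string
    network:
      type: string

  resources:
    server:
      type: OS::Nova::Server
      properties:
        image: {get_param: image}
        flavor: {get_param: flavor}
        networks:
          - network: {get_param: network}

    autoheal:
      type: OS::Mistral::Workflow
      properties:
        type: direct
        # Inputs match the workflow input passed by the Phase 1 example
        input:
          instance_id:
          heat_stack_id:
        tasks:
          # Mark the affected server unhealthy (assumed action name; the Heat
          # API accepts a resource name or physical ID here)
          - name: mark_unhealthy
            action: >-
              heat.resources_mark_unhealthy
              stack_id=<% $.heat_stack_id %>
              resource_name=<% $.instance_id %>
              mark_unhealthy=true
              resource_status_reason=vitrage_alarm
            on_success:
              - update_stack
          # Update the stack so that Heat replaces the unhealthy server
          - name: update_stack
            action: heat.stacks_update stack_id=<% $.heat_stack_id %> existing=true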
**Phase 2 example**

If there is a 'Host down' alarm on a host, and that host contains the instance
that is defined in this template, execute a Mistral healing workflow on that
instance.

The differences between the first example and this one are:

- The template type. A template of this type will internally include the
  host -> instance relation.
- The additional ``host_id`` parameter.
.. code-block:: yaml

  resources:
    execute_healing:
      type: OS::Vitrage::Template
      properties:
        type: host_down_auto_healing
        description: Execute Mistral healing workflow if the host of the instance is down
        input:
          alarm_name: Host down
          host_id: compute-0
          instance_id: {get_resource: server}
        actions:
          - action:
              type: execute_mistral
              properties:
                workflow: {get_resource: autoheal}
                input:
                  instance_id: {get_resource: server}
                  heat_stack_id: {get_param: "OS::stack_id"}
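The ``host_down_auto_healing`` type is described as including the
host -> instance relation internally. As a rough, hypothetical sketch (again
assuming the Vitrage template format version 2, with illustrative
``template_id`` values and placeholders), the definitions and scenario of such
a template might look like this:

.. code-block:: yaml

  definitions:
    entities:
      - entity:
          category: ALARM
          name: Host down
          template_id: host_down_alarm
      - entity:
          category: RESOURCE
          type: nova.host
          name: compute-0                 # assumed to come from the host_id input
          template_id: host
      - entity:
          category: RESOURCE
          type: nova.instance
          id: <instance-id>               # placeholder, from the instance_id input
          template_id: instance
    relationships:
      - relationship:
          source: host_down_alarm
          target: host
          relationship_type: on
          template_id: alarm_on_host
      # The host -> instance relation that this template type adds internally
      - relationship:
          source: host
          target: instance
          relationship_type: contains
          template_id: host_contains_instance
  scenarios:
    - scenario:
        condition: alarm_on_host and host_contains_instance
        actions:
          - action:
              action_type: execute_mistral
              properties:
                workflow: <workflow-id>   # placeholder
                input:
                  instance_id: get_attr(instance, id)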
Alternatives
------------

None
Implementation
==============

Assignee(s)
-----------

Primary assignee:
  ifat_afek
Milestones
----------

Target Milestone for completion:
  rocky-3
Work Items
----------

Phase 1:

- Implement a Vitrage client plugin
- Implement the VitrageTemplate resource
- Add unit tests and tempest tests
- Add a HOT template example to heat-templates

Phase 2 (future):

- Create different types of Vitrage templates
Dependencies
============

There are no dependencies for phase 1. Phase 2 depends on future Vitrage
development.