..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Support Vitrage Resources
=========================

https://storyboard.openstack.org/#!/story/2002684

This blueprint proposes adding support for Vitrage resources in Heat.
The goal is to automate an auto-healing process that combines external
monitoring, Vitrage alarm deduction and Mistral workflow execution.

Problem description
===================

Auto-healing a Heat stack when one of its instances is down is an important
use case. It is already handled when Nova knows about the failure: Nova sends
a notification about the instance state, Aodh raises an event alarm, and as a
result a Mistral healing workflow is executed.

However, there are cases where Nova is not aware of the real state of the
instance. One example is a network failure: a NIC that is down can leave
certain instances without network connectivity, while their state in Nova
remains ``ACTIVE``. We would like to support auto-healing in such cases as
well.

Proposed change
===============

An ``OS::Vitrage::Template`` resource will be added to Heat, under
``heat/engine/resources/openstack/vitrage``.

Based on the properties given in the HOT template, the resource will create
a Vitrage template with a condition -> action scenario that handles the
healing.

The ``OS::Vitrage::Template`` resource will support the following use case:

#. An external monitor detects a network failure.
#. Vitrage is notified and, based on its topology graph, identifies all
   affected resources.
#. If an instance that belongs to a Heat stack is affected, Vitrage executes
   a Mistral healing workflow.

The implementation will be done in three phases.
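
For illustration, the Vitrage template that the resource could generate for
the simplest scenario (a specific alarm on a specific instance triggers a
Mistral workflow) might look roughly like the sketch below. It assumes
Vitrage's version 2 template format as we understand it. The template name,
the ``template_id`` values, the instance UUID and the workflow name are
hypothetical placeholders. Check the exact field names against the Vitrage
template format documentation.

.. code-block:: yaml

    # Hypothetical sketch of a generated Vitrage template, assuming the
    # version 2 template format. All names and ids are placeholders.
    metadata:
      version: 2
      type: standard
      name: heat_instance_auto_healing_example
      description: Execute Mistral healing workflow if the instance is down
    definitions:
      entities:
        # The alarm that should trigger the healing (e.g. from an external monitor)
        - entity:
            category: ALARM
            name: Instance down
            template_id: instance_alarm
        # The specific instance of the Heat stack (matching on id is assumed)
        - entity:
            category: RESOURCE
            type: nova.instance
            id: 11111111-2222-3333-4444-555555555555
            template_id: instance
      relationships:
        # Building block of the condition: the alarm is raised on the instance
        - relationship:
            source: instance_alarm
            target: instance
            relationship_type: on
            template_id: alarm_on_instance
    scenarios:
      - scenario:
          condition: alarm_on_instance
          actions:
            - action:
                action_type: execute_mistral
                properties:
                  workflow: autoheal
                  input:
                    instance_id: 11111111-2222-3333-4444-555555555555

In this sketch, the condition ``alarm_on_instance`` holds when the alarm
entity is connected to the instance entity by an ``on`` relationship. When it
holds, the scenario runs the ``execute_mistral`` action.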

**Phase 1:** A simple Vitrage template will be created that supports only the
following scenario: if a specific alarm is raised on a specific instance,
execute the Mistral healing workflow.

**Phase 2:** Enable the creation of more complex Vitrage templates, like the
one needed for the network failure use case. This requires additional
development in Vitrage, so that Vitrage can provide "template skeletons" for
different scenarios.

**Phase 3:** Enable referencing a complete Vitrage template that is written in
a separate YAML file. This will make all the capabilities of Vitrage
available.

**VitrageTemplate definition**

.. code-block:: yaml

    resources:
      name:
        type: OS::Vitrage::Template
        properties:
          type: String  # Phase 1 - only 'instance_auto_healing' is supported
          description: String
          input:
            alarm_name: String
            resource_type: String  # Phase 1 - only 'nova.instance' is supported
            resource_id: String
          actions:
            - action:
                type: String  # Phase 1 - only 'execute_mistral' is supported
                properties:
                  workflow: String
                  workflow_input:
                    {...}

Properties:

- ``type`` - Type of the Vitrage template. In phase 1 only a single template
  type will be supported: if there is an alarm on an instance, execute the
  workflow.
- ``description`` - Description of the Vitrage template.
- ``input`` - Parameters of the scenario:

  - ``alarm_name`` - The name of the alarm that should trigger the workflow
    execution. This can be an alarm from an external monitor (like Zabbix,
    Nagios, Collectd or Prometheus), or a deduced alarm that was raised by
    Vitrage.
  - ``resource_type`` - Type of the resource that the alarm is raised on. In
    phase 1 only ``nova.instance`` will be supported.
  - ``resource_id`` - ID of the resource that the alarm is raised on.

- ``actions`` - A list of actions to execute as a result. In phase 1 the only
  supported action type is ``execute_mistral``, with the following properties:

  - ``workflow`` - ID of the Mistral workflow to be executed.
  - ``workflow_input`` - Values to be passed as inputs to the workflow.

**Phase 1 example**

If there is an 'Instance down' alarm on an instance, execute a Mistral healing
workflow on that instance. ``server`` and ``autoheal`` refer to other
resources in the same HOT template: the instance to heal and the Mistral
workflow that heals it.

.. code-block:: yaml

    resources:
      execute_healing:
        type: OS::Vitrage::Template
        properties:
          type: instance_auto_healing
          description: Execute Mistral healing workflow if the instance is down
          input:
            alarm_name: Instance down
            resource_type: nova.instance
            resource_id: {get_resource: server}
          actions:
            - action:
                type: execute_mistral
                properties:
                  workflow: {get_resource: autoheal}
                  workflow_input:
                    instance_id: {get_resource: server}
                    heat_stack_id: {get_param: "OS::stack_id"}

**Phase 2 example**

If there is a 'Host down' alarm on a host, and the host contains the instance
that is defined in this template, execute a Mistral healing workflow on that
instance.

The differences between the first example and this one are:

- The template type. A template of this type internally includes the
  host -> instance relationship.
- An additional ``host_id`` input parameter.

.. code-block:: yaml

    resources:
      execute_healing:
        type: OS::Vitrage::Template
        properties:
          type: host_down_auto_healing
          description: Execute Mistral healing workflow if the instance's host is down
          input:
            alarm_name: Host down
            host_id: compute-0
            resource_type: nova.instance
            resource_id: {get_resource: server}
          actions:
            - action:
                type: execute_mistral
                properties:
                  workflow: {get_resource: autoheal}
                  workflow_input:
                    instance_id: {get_resource: server}
                    heat_stack_id: {get_param: "OS::stack_id"}

Alternatives
------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  ifat_afek

Milestones
----------

Target Milestone for completion:
  rocky-3

Work Items
----------

Phase 1:

- Implement a Vitrage client plugin
- Implement the ``OS::Vitrage::Template`` resource
- Add unit tests and tempest tests
- Add a HOT template example to heat-templates

Phase 2 (future):

- Create different types of Vitrage templates

Dependencies
============

No dependencies for phase 1.
Phase 2 depends on future Vitrage development.