watcher/doc/source/dev/glossary.rst

Glossary

This page explains the different terms used in the Watcher system.

They are sorted in alphabetical order.

Action

An Action <action_definition> is what enables Watcher to transform the current state of a Cluster <cluster_definition> after an Audit <audit_definition>.

An Action <action_definition> is an atomic task which changes the current state of a target Managed resource <managed_resource_definition> of the OpenStack Cluster <cluster_definition> such as:

  • Live migration of an instance from one compute node to another compute node with Nova
  • Changing the power level of a compute node (ACPI level, ...)
  • Changing the current state of a hypervisor (enable or disable) with Nova

In most cases, an Action <action_definition> triggers some concrete commands on an existing OpenStack module (Nova, Neutron, Cinder, Ironic, etc.) via a Primitive <primitive_definition>.

An Action <action_definition> has a life-cycle and its current state may be one of the following:

  • PENDING : the Action <action_definition> has not been executed yet by the Watcher Applier <watcher_applier_definition>
  • ONGOING : the Action <action_definition> is currently being processed by the Watcher Applier <watcher_applier_definition>
  • SUCCEEDED : the Action <action_definition> has been executed successfully
  • FAILED : an error occurred while trying to execute the Action <action_definition>
  • DELETED : the Action <action_definition> is still stored in the Watcher database <watcher_database_definition> but is not returned any more through the Watcher APIs.
  • CANCELLED : the Action <action_definition> was in PENDING or ONGOING state and was cancelled by the Administrator <administrator_definition>
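
The life-cycle above can be pictured as a small enumeration. The following is a minimal sketch, not Watcher's actual implementation: the class name `ActionState` and the helper `cancel` are hypothetical, and only the cancellation rule (allowed from PENDING or ONGOING) is taken from the list above.

```python
from enum import Enum


class ActionState(Enum):
    """Life-cycle states of an Action, as listed in this glossary."""
    PENDING = "PENDING"
    ONGOING = "ONGOING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    DELETED = "DELETED"
    CANCELLED = "CANCELLED"


# The glossary only states one transition rule explicitly:
# an Action can be cancelled while it is PENDING or ONGOING.
CANCELLABLE = {ActionState.PENDING, ActionState.ONGOING}


def cancel(state: ActionState) -> ActionState:
    """Return CANCELLED if the Action may be cancelled, else raise ValueError."""
    if state not in CANCELLABLE:
        raise ValueError(f"cannot cancel an Action in state {state.value}")
    return ActionState.CANCELLED
```

Other transitions (for example PENDING to ONGOING) are deliberately left out because the text does not spell them out.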

Action Plan

An Action Plan <action_plan_definition> is a flow of Actions <action_definition> that should be executed in order to satisfy a given Goal <goal_definition>.

An Action Plan <action_plan_definition> is generated by Watcher when an Audit <audit_definition> is successful which implies that the Strategy <strategy_definition> which was used has found a Solution <solution_definition> to achieve the Goal <goal_definition> of this Audit <audit_definition>.

In the default implementation of Watcher, an Action Plan <action_plan_definition> is only composed of successive Actions <action_definition> (i.e., a Workflow of Actions <action_definition> belonging to a unique branch).

However, Watcher provides abstract interfaces for many of its components, allowing other implementations to generate and handle more complex Action Plan(s) <action_plan_definition> composed of two types of Action Item(s):

  • simple Actions <action_definition>: atomic tasks, which cannot be split into smaller tasks or commands from an OpenStack point of view.
  • composite Actions: which are composed of several simple Actions <action_definition> ordered in sequential and/or parallel flows.
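
The two kinds of Action Item can be modelled as a simple tree. This is an illustrative sketch only: the names `SimpleAction`, `CompositeAction`, `flatten` and the `mode` labels are assumptions made for the example, not Watcher interfaces.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SimpleAction:
    """An atomic task that cannot be split further."""
    name: str


@dataclass
class CompositeAction:
    """Several action items ordered in a sequential or parallel flow."""
    mode: str  # "sequential" or "parallel" (hypothetical labels)
    children: List[Union["SimpleAction", "CompositeAction"]] = field(
        default_factory=list
    )


def flatten(item) -> List[str]:
    """List the names of all simple Actions contained in an action item."""
    if isinstance(item, SimpleAction):
        return [item.name]
    names: List[str] = []
    for child in item.children:
        names.extend(flatten(child))
    return names
```

For instance, a sequential flow that first disables a host and then migrates two instances in parallel is a `CompositeAction` whose second child is itself a `CompositeAction`.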

An Action Plan <action_plan_definition> may be described using standard workflow model description formats such as Business Process Model and Notation 2.0 (BPMN 2.0) or Unified Modeling Language (UML).

An Action Plan <action_plan_definition> has a life-cycle and its current state may be one of the following:

  • RECOMMENDED : the Action Plan <action_plan_definition> is waiting for a validation from the Administrator <administrator_definition>
  • ONGOING : the Action Plan <action_plan_definition> is currently being processed by the Watcher Applier <watcher_applier_definition>
  • SUCCEEDED : the Action Plan <action_plan_definition> has been executed successfully (i.e. all Actions <action_definition> that it contains have been executed successfully)
  • FAILED : an error occurred while executing the Action Plan <action_plan_definition>
  • DELETED : the Action Plan <action_plan_definition> is still stored in the Watcher database <watcher_database_definition> but is not returned any more through the Watcher APIs.
  • CANCELLED : the Action Plan <action_plan_definition> was in PENDING or ONGOING state and was cancelled by the Administrator <administrator_definition>

Administrator

The Administrator <administrator_definition> is any user who has admin access on the OpenStack cluster. This user is allowed to create new projects for tenants, create new users and assign roles to each user.

The Administrator <administrator_definition> usually has remote access to any host of the cluster in order to change the configuration and restart any OpenStack service, including Watcher.

In the context of Watcher, the Administrator <administrator_definition> is a role for users which allows them to run any Watcher commands, such as:

  • Create/Delete an Audit Template <audit_template_definition>
  • Launch an Audit <audit_definition>
  • Get the Action Plan <action_plan_definition>
  • Launch a recommended Action Plan <action_plan_definition> manually
  • Archive previous Audits <audit_definition> and Action Plans <action_plan_definition>

The Administrator <administrator_definition> is also allowed to modify any Watcher configuration files and to restart Watcher services.

Audit

In the Watcher system, an Audit <audit_definition> is a request for optimizing a Cluster <cluster_definition>.

The optimization is done in order to satisfy one Goal <goal_definition> on a given Cluster <cluster_definition>.

For each Audit <audit_definition>, the Watcher system generates an Action Plan <action_plan_definition>.

An Audit <audit_definition> has a life-cycle and its current state may be one of the following:

  • PENDING : a request for an Audit <audit_definition> has been submitted (either manually by the Administrator <administrator_definition> or automatically via some event handling mechanism) and is in the queue for being processed by the Watcher Decision Engine <watcher_decision_engine_definition>
  • ONGOING : the Audit <audit_definition> is currently being processed by the Watcher Decision Engine <watcher_decision_engine_definition>
  • SUCCEEDED : the Audit <audit_definition> has been executed successfully (note that it may not necessarily produce a Solution <solution_definition>).
  • FAILED : an error occurred while executing the Audit <audit_definition>
  • DELETED : the Audit <audit_definition> is still stored in the Watcher database <watcher_database_definition> but is not returned any more through the Watcher APIs.
  • CANCELLED : the Audit <audit_definition> was in PENDING or ONGOING state and was cancelled by the Administrator <administrator_definition>

Audit Template

An Audit <audit_definition> may be launched several times with the same settings (Goal <goal_definition>, thresholds, ...). Therefore it makes sense to save those settings in some sort of Audit preset object, which is known as an Audit Template <audit_template_definition>.

An Audit Template <audit_template_definition> contains at least the Goal <goal_definition> of the Audit <audit_definition>.

It may also contain some error handling settings indicating whether:

  • Watcher Applier <watcher_applier_definition> stops the entire operation
  • Watcher Applier <watcher_applier_definition> performs a rollback

and how many retries should be attempted before a failure is declared (the retry policy itself can be complex: for example, many Actions <action_definition> may fail on the first attempt but ultimately succeed).

Moreover, an Audit Template <audit_template_definition> may contain some settings related to the level of automation for the Action Plan <action_plan_definition> that will be generated by the Audit <audit_definition>. A flag will indicate whether the Action Plan <action_plan_definition> will be launched automatically or will need a manual confirmation from the Administrator <administrator_definition>.

Last but not least, an Audit Template <audit_template_definition> may contain a list of extra parameters related to the Strategy <strategy_definition> configuration. These parameters can be provided as a list of key-value pairs.
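
The settings described above could be grouped in a structure like the one below. This is a sketch under stated assumptions: every field name (`rollback_on_error`, `max_retries`, `auto_trigger`, `strategy_params`) is hypothetical and chosen only to mirror the prose, not the real Watcher object model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AuditTemplateSketch:
    """Illustrative preset of Audit settings (field names are hypothetical)."""
    goal: str                        # the only setting the text calls mandatory
    rollback_on_error: bool = False  # False: stop the entire operation instead
    max_retries: int = 0             # retries attempted before giving up
    auto_trigger: bool = False       # False: wait for Administrator confirmation
    strategy_params: Dict[str, str] = field(default_factory=dict)  # key-value pairs


template = AuditTemplateSketch(
    goal="minimize the energy consumption",
    strategy_params={"threshold": "0.8"},
)
```

Only `goal` has no default, which reflects the statement that a template contains at least the Goal.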

Availability Zone

Please read the official OpenStack definition of an Availability Zone.

Cluster

A Cluster <cluster_definition> is a set of physical machines which provide compute, storage and networking resources and are managed by the same OpenStack Controller node. A Cluster <cluster_definition> represents a set of resources that a cloud provider is able to offer to his/her customers <customer_definition>.

A data center may contain several clusters.

The Cluster <cluster_definition> may be divided into one or several Availability Zone(s) <availability_zone_definition>.

Cluster Data Model

A Cluster Data Model <cluster_data_model_definition> is a logical representation of the current state and topology of the Cluster <cluster_definition> Managed resources <managed_resource_definition>.

It is represented as a set of Managed resources <managed_resource_definition> (which may be a simple tree or a flat list of key-value pairs). This representation enables Watcher Strategies <strategy_definition> to know the current relationships between the different resources <managed_resource_definition> of the Cluster <cluster_definition> during an Audit <audit_definition>, and to request information such as:

  • What compute nodes are in a given Availability Zone <availability_zone_definition> or a given Host Aggregate <host_aggregates_definition>?
  • What Instances <instance_definition> are hosted on a given compute node?
  • What is the current load of a compute node?
  • What is the current free memory of a compute node?
  • What is the network link between two compute nodes?
  • What is the available bandwidth on a given network link?
  • What is the current space available on a given virtual disk of a given Instance <instance_definition>?
  • What is the current state of a given Instance <instance_definition>?
  • ...

In a word, this data model enables the Strategy <strategy_definition> to know:

  • the current topology of the Cluster <cluster_definition>
  • the current capacity for each Managed resource <managed_resource_definition>
  • the current amount of used/free space for each Managed resource <managed_resource_definition>
  • the current state of each Managed resource <managed_resource_definition>
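
To make the kind of queries listed above concrete, here is a toy in-memory model. It is a sketch only: `ComputeNode`, `ClusterModel` and their methods are hypothetical names, and memory is the only capacity tracked.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComputeNode:
    name: str
    availability_zone: str
    memory_mb: int  # total memory capacity of the node
    # instance name -> memory used by that instance (MB)
    instances: Dict[str, int] = field(default_factory=dict)

    def free_memory_mb(self) -> int:
        """Capacity minus the memory used by hosted instances."""
        return self.memory_mb - sum(self.instances.values())


@dataclass
class ClusterModel:
    nodes: List[ComputeNode] = field(default_factory=list)

    def nodes_in_zone(self, zone: str) -> List[str]:
        """Which compute nodes are in a given Availability Zone?"""
        return [n.name for n in self.nodes if n.availability_zone == zone]

    def instances_on(self, node_name: str) -> List[str]:
        """Which instances are hosted on a given compute node?"""
        for n in self.nodes:
            if n.name == node_name:
                return sorted(n.instances)
        raise KeyError(node_name)
```

A Strategy would call such query methods during an Audit instead of talking to each OpenStack service directly.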

In the Watcher project, we aim at providing a generic and very basic Cluster Data Model <cluster_data_model_definition> for each Goal <goal_definition>, usable in the associated Strategies <strategy_definition> through some helper classes in order to:

  • simplify the development of a new Strategy <strategy_definition> for a given Goal <goal_definition> when there already are some existing Strategies <strategy_definition> associated to the same Goal <goal_definition>
  • avoid duplicating the same code in several Strategies <strategy_definition> associated to the same Goal <goal_definition>
  • have a better consistency between the different Strategies <strategy_definition> for a given Goal <goal_definition>
  • avoid any strong coupling with any external Cluster Data Model <cluster_data_model_definition> (the proposed data model acts as a pivot data model)

There may be various generic and basic Cluster Data Models <cluster_data_model_definition> proposed in Watcher helpers, each of them being adapted to achieving a given Goal <goal_definition>:

  • For example, for a Goal <goal_definition> which aims at optimizing the network resources <managed_resource_definition> the Strategy <strategy_definition> may need to know which resources <managed_resource_definition> are communicating together.
  • Whereas for a Goal <goal_definition> which aims at optimizing thermal and power conditions, the Strategy <strategy_definition> may need to know the location of each compute node in the racks and the location of each rack in the room.

Note however that a developer can use his/her own Cluster Data Model <cluster_data_model_definition> if the proposed data model does not fit his/her needs as long as the Strategy <strategy_definition> is able to produce a Solution <solution_definition> for the requested Goal <goal_definition>. For example, a developer could rely on the Nova Data Model to optimize some compute resources.

The Cluster Data Model <cluster_data_model_definition> may be persisted in any appropriate storage system (SQL database, NoSQL database, JSON file, XML File, In Memory Database, ...).

Cluster History

The Cluster History <cluster_history_definition> contains all the previously collected timestamped data such as metrics and events associated to any managed resource <managed_resource_definition> of the Cluster <cluster_definition>.

Just like the Cluster Data Model <cluster_data_model_definition>, this history may be used by any Strategy <strategy_definition> in order to find the optimal Solution <solution_definition> during an Audit <audit_definition>.

In the Watcher project, a generic Cluster History <cluster_history_definition> API is proposed with some helper classes in order to:

  • avoid duplicating the same code in several Strategies <strategy_definition>
  • have a better consistency between the different Strategies <strategy_definition>
  • avoid any strong coupling with any external metrics/events storage system (the proposed API and measurement naming system acts as a pivot format)

Note however that a developer can use his/her own history management system if the Ceilometer system does not fit his/her needs as long as the Strategy <strategy_definition> is able to produce a Solution <solution_definition> for the requested Goal <goal_definition>.

The Cluster History <cluster_history_definition> data may be persisted in any appropriate storage system (InfluxDB, OpenTSDB, MongoDB,...).

Controller Node

A controller node is a machine that typically runs the following core OpenStack services:

  • Keystone: for identity and service management
  • Cinder scheduler: for volumes management
  • Glance controller: for image management
  • Neutron controller: for network management
  • Nova controller: for global compute resources management with services such as nova-scheduler, nova-conductor and nova-network.

In many configurations, Watcher will reside on a controller node even if it can potentially be hosted on a dedicated machine.

Compute node

Please read the official OpenStack definition of a Compute Node.

Customer

A Customer <customer_definition> is the person or company which subscribes to the cloud provider offering. A customer may have several Project(s) <project_definition> hosted on the same Cluster <cluster_definition> or dispatched on different clusters.

In the private cloud context, the Customers <customer_definition> are different groups within the same organization (different departments, project teams, branch offices and so on). Cloud infrastructure includes the ability to precisely track each customer's service usage so that it can be charged back to them, or at least reported to them.

Goal

A Goal <goal_definition> is a human readable, observable and measurable end result having one objective to be achieved.

Here are some examples of Goals <goal_definition>:

  • minimize the energy consumption
  • minimize the number of compute nodes (consolidation)
  • balance the workload among compute nodes
  • minimize the license cost (some software has a licensing model which is based on the number of sockets or cores where the software is deployed)
  • find the most appropriate moment for a planned maintenance on a given group of hosts (which may be an entire availability zone): power supply replacement, cooling system replacement, hardware modification, ...

Host Aggregate

Please read the official OpenStack definition of a Host Aggregate.

Instance

A running virtual machine, or a virtual machine in a known state such as suspended, that can be used like a hardware server.

Managed resource

A Managed resource <managed_resource_definition> is one instance of Managed resource type <managed_resource_type_definition> in a topology with particular properties and dependencies on other Managed resources <managed_resource_definition> (relationships).

For example, a Managed resource <managed_resource_definition> can be one virtual machine (i.e., an instance <instance_definition>) hosted on a compute node <compute_node_definition> and connected to another virtual machine through a network link (represented also as a Managed resource <managed_resource_definition> in the Cluster Data Model <cluster_data_model_definition>).

Managed resource type

A Managed resource type <managed_resource_definition> is a type of hardware or software element of the Cluster <cluster_definition> that the Watcher system can act on.

Here are some examples of Managed resource types <managed_resource_definition>:

It can be any of the resource types in the official list of available resource types defined in OpenStack for HEAT.

Optimization Efficiency

The Optimization Efficiency <efficiency_definition> is the objective measure of how much of the Goal <goal_definition> has been achieved with respect to the constraints and SLAs <sla_definition> defined by the Customer <customer_definition>.

The way efficiency is evaluated will depend on the Goal <goal_definition> to achieve.

Of course, the efficiency will be relevant only as long as the Action Plan <action_plan_definition> is relevant (i.e., the current state of the Cluster <cluster_definition> has not changed in a way that a new Audit <audit_definition> would need to be launched).

For example, if the Goal <goal_definition> is to lower the energy consumption, the Efficiency <efficiency_definition> will be computed using several indicators (KPIs):

  • the percentage of energy gain (which must be the highest possible)
  • the number of SLA violations <sla_violation_definition> (which must be the lowest possible)
  • the number of virtual machine migrations (which must be the lowest possible)

All those indicators (KPIs) are computed within a given timeframe, which is the time taken to execute the whole Action Plan <action_plan_definition>.
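
The three indicators of the energy example can be compared as follows. This is a hedged sketch: the class `EnergyKpis`, the watt-hour unit and the priority order used in `is_better` (energy gain first, then SLA violations, then migrations) are assumptions for illustration; the glossary does not define how the indicators are weighted.

```python
from dataclasses import dataclass


@dataclass
class EnergyKpis:
    """Indicators measured over the execution of one Action Plan."""
    energy_before_wh: float
    energy_after_wh: float
    sla_violations: int
    migrations: int

    def energy_gain_percent(self) -> float:
        """Percentage of energy saved (higher is better)."""
        saved = self.energy_before_wh - self.energy_after_wh
        return 100.0 * saved / self.energy_before_wh


def is_better(a: EnergyKpis, b: EnergyKpis) -> bool:
    """True if `a` beats `b` under the assumed priority order:
    higher energy gain, then fewer SLA violations, then fewer migrations."""
    key_a = (-a.energy_gain_percent(), a.sla_violations, a.migrations)
    key_b = (-b.energy_gain_percent(), b.sla_violations, b.migrations)
    return key_a < key_b
```

A real deployment might instead combine the indicators into a single weighted score; the lexicographic comparison here is only the simplest option.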

The efficiency also enables the Administrator <administrator_definition> to objectively compare different Strategies <strategy_definition> for the same goal and same workload of the Cluster <cluster_definition>.

Project

Projects <project_definition> represent the base unit of “ownership” in OpenStack, in that all resources <managed_resource_definition> in OpenStack should be owned by a specific project <project_definition>. In OpenStack Identity, a project <project_definition> must be owned by a specific domain.

Please read the official OpenStack definition of a Project.

Primitive

A Primitive <primitive_definition> is the component that carries out a certain type of atomic Actions <action_definition> on a given Managed resource <managed_resource_definition> (Nova, Swift, Neutron, Glance, ...). A Primitive <primitive_definition> is a part of the Watcher Applier <watcher_applier_definition> module.

For example, there can be a Primitive <primitive_definition> which is responsible for creating a snapshot of a given instance on a Nova compute node. This Primitive <primitive_definition> knows exactly how to send the appropriate commands to Nova for this type of Actions <action_definition>.
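
The idea of one Primitive per type of Action can be sketched as an abstract interface. Everything here is hypothetical: the `Primitive` base class, the `SnapshotPrimitive` subclass and the `send_command` callable (a stand-in for a real Nova client call) are illustration only and do not reflect Watcher's or Nova's actual APIs.

```python
from abc import ABC, abstractmethod
from typing import Callable


class Primitive(ABC):
    """Carries out one type of atomic Action on a managed resource."""

    @abstractmethod
    def execute(self, resource_id: str) -> None:
        """Apply the Action to the given resource."""


class SnapshotPrimitive(Primitive):
    """Knows how to request a snapshot of an instance."""

    def __init__(self, send_command: Callable[[str], None]) -> None:
        # `send_command` is a placeholder for the real service client.
        self._send = send_command

    def execute(self, resource_id: str) -> None:
        self._send(f"snapshot {resource_id}")
```

Passing the command sender in from outside keeps the Primitive testable without a running OpenStack service.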

SLA

SLA <sla_definition> means Service Level Agreement.

The resources are negotiated between the Customer <customer_definition> and the Cloud Provider in a contract.

Most of the time, this contract is composed of two documents:

  • SLA <sla_definition> : Service Level Agreement
  • SLO <slo_definition> : Service Level Objectives

Note that the SLA <sla_definition> is more general than the SLO <slo_definition> in the sense that the former specifies what service is to be provided, how it is supported, times, locations, costs, performance, and responsibilities of the parties involved while the SLO <slo_definition> focuses on more measurable characteristics such as availability, throughput, frequency, response time or quality.

You can also read the Wikipedia page for SLA which provides a good definition.

SLA violation

An SLA violation <sla_violation_definition> happens when an SLA <sla_definition> defined with a given Customer <customer_definition> could not be respected by the cloud provider within the timeframe defined by the official contract document.

SLO

A Service Level Objective (SLO) is a key element of an SLA <sla_definition> between a service provider and a Customer <customer_definition>. SLOs are agreed as a means of measuring the performance of the Service Provider and are outlined as a way of avoiding disputes between the two parties based on misunderstanding.

You can also read the Wikipedia page for SLO which provides a good definition.

Solution

A Solution <solution_definition> is a set of Actions <action_definition> generated by a Strategy <strategy_definition> (i.e., an algorithm) in order to achieve the Goal <goal_definition> of an Audit <audit_definition>.

A Solution <solution_definition> is different from an Action Plan <action_plan_definition> because it contains the non-scheduled list of Actions <action_definition> which is produced by a Strategy <strategy_definition>. In other words, the list of Actions in a Solution <solution_definition> has not yet been re-ordered by the Watcher Planner <watcher_planner_definition>.

Note that some algorithms (i.e. Strategies <strategy_definition>) may generate several Solutions <solution_definition>. This gives rise to the problem of determining which Solution <solution_definition> should be applied.

Two approaches to dealing with this can be envisaged:

  • fully automated mode: only the Solution <solution_definition> with the highest ranking (i.e., the highest Optimization Efficiency <efficiency_definition>) will be sent to the Watcher Planner <watcher_planner_definition> and translated into concrete Actions <action_definition>.
  • manual mode: several Solutions <solution_definition> are proposed to the Administrator <administrator_definition> with a detailed measurement of the estimated Optimization Efficiency <efficiency_definition> and he/she decides which one will be launched.
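
The fully automated mode amounts to picking the highest-ranked candidate. A minimal sketch, assuming each Solution is represented as a hypothetical `(name, efficiency)` pair with a single numeric efficiency score:

```python
from typing import List, Tuple


def pick_solution(candidates: List[Tuple[str, float]]) -> str:
    """Return the name of the Solution with the highest efficiency score."""
    if not candidates:
        raise ValueError("the Strategy produced no Solution")
    best_name, _best_score = max(candidates, key=lambda c: c[1])
    return best_name
```

In manual mode the same list would simply be shown to the Administrator, sorted by score, instead of being reduced to one entry.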

Strategy

A Strategy <strategy_definition> is an algorithm implementation which is able to find a Solution <solution_definition> for a given Goal <goal_definition>.

There may be several potential strategies which are able to achieve the same Goal <goal_definition>. This is why it is possible to configure which specific Strategy <strategy_definition> should be used for each Goal <goal_definition>.

Some strategies may provide better optimization results but may take more time to find an optimal Solution <solution_definition>.

When a new Goal <goal_definition> is added to the Watcher configuration, at least one default associated Strategy <strategy_definition> should be provided as well.
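
The relationship between Goals and Strategies can be pictured as a configurable mapping. This is a sketch, not Watcher's plugin mechanism: the `Strategy` base class, the toy `ConsolidateIdleNodes` algorithm, the `STRATEGY_FOR_GOAL` table and the dictionary-based cluster model are all assumptions for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Strategy(ABC):
    """An algorithm that finds a Solution for a given Goal."""

    @abstractmethod
    def execute(self, model: Dict[str, List[str]]) -> List[str]:
        """Return a Solution as an unordered list of Action descriptions."""


class ConsolidateIdleNodes(Strategy):
    """Toy strategy: propose disabling every node that hosts no instance."""

    def execute(self, model: Dict[str, List[str]]) -> List[str]:
        return [
            f"disable {node}"
            for node, instances in sorted(model.items())
            if not instances
        ]


# Configuration: which Strategy is used for each Goal.
STRATEGY_FOR_GOAL: Dict[str, Strategy] = {
    "consolidation": ConsolidateIdleNodes(),
}


def solve(goal: str, model: Dict[str, List[str]]) -> List[str]:
    """Run the Strategy configured for `goal` on the cluster model."""
    return STRATEGY_FOR_GOAL[goal].execute(model)
```

Swapping in a slower but more thorough algorithm for the same Goal would only require changing the entry in the mapping.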

Watcher Applier

This component is in charge of executing the Action Plan <action_plan_definition> built by the Watcher Decision Engine <watcher_decision_engine_definition>.

See architecture for more details on this component.

Watcher Database

This database stores all the Watcher domain objects which can be requested by the Watcher API or the Watcher CLI:

  • Audit templates
  • Audits
  • Action plans
  • Actions
  • Goals

Here, the Watcher domain is "optimization of some resources provided by an OpenStack system".

See architecture for more details on this component.

Watcher Decision Engine

This component is responsible for computing a set of potential optimization Actions <action_definition> in order to fulfill the Goal <goal_definition> of an Audit <audit_definition>.

It first reads the parameters of the Audit <audit_definition> from the associated Audit Template <audit_template_definition> and knows the Goal <goal_definition> to achieve.

It then selects the most appropriate Strategy <strategy_definition> depending on how Watcher was configured for this Goal <goal_definition>.

The Strategy <strategy_definition> is then executed and generates a set of Actions <action_definition> which are scheduled in time by the Watcher Planner <watcher_planner_definition> (i.e., it generates an Action Plan <action_plan_definition>).

See architecture for more details on this component.

Watcher Planner

The Watcher Planner <watcher_planner_definition> is part of the Watcher Decision Engine <watcher_decision_engine_definition>.

This module takes the set of Actions <action_definition> generated by a Strategy <strategy_definition> and designs a workflow which defines how to schedule those Actions <action_definition> in time and what the prerequisite conditions are for each Action <action_definition>.

It is important to schedule Actions <action_definition> in time in order to prevent overload of the Cluster <cluster_definition> while applying the Action Plan <action_plan_definition>. For example, it is important not to migrate too many instances at the same time in order to avoid network congestion which may degrade the SLA <sla_definition> for Customers <customer_definition>.

It is also important to schedule Actions <action_definition> in order to avoid security issues such as denial of service on core OpenStack services.
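
One simple way to avoid overloading the Cluster is to cap how many Actions run at once. The sketch below is an assumption-laden illustration, not the Watcher Planner's algorithm: the function name `schedule_in_batches` and the `max_parallel` parameter are hypothetical, and prerequisite conditions between Actions are ignored.

```python
from typing import List


def schedule_in_batches(actions: List[str], max_parallel: int) -> List[List[str]]:
    """Split Actions into successive batches of at most `max_parallel` items.

    Batches are meant to run one after another; Actions inside one batch
    may run at the same time.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    return [
        actions[i:i + max_parallel]
        for i in range(0, len(actions), max_parallel)
    ]
```

With five migrations and `max_parallel=2`, this yields three successive batches, so no more than two instances are migrated at the same time.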

See architecture for more details on this component.