Glossary
This page explains the different terms used in the Watcher system.
They are sorted in alphabetical order.
Action
An Action <action_definition>
is what enables
Watcher to transform the current state of a Cluster <cluster_definition>
after an Audit <audit_definition>
.
An Action <action_definition>
is an atomic task
which changes the current state of a target Managed resource <managed_resource_definition>
of the OpenStack Cluster <cluster_definition>
such as:
- Live migration of an instance from one compute node to another compute node with Nova
- Changing the power level of a compute node (ACPI level, ...)
- Changing the current state of a hypervisor (enable or disable) with Nova
In most cases, an Action <action_definition>
triggers some
concrete commands on an existing OpenStack module (Nova, Neutron,
Cinder, Ironic, etc.) via a Primitive <primitive_definition>
.
An Action <action_definition>
has a life-cycle and
its current state may be one of the following:
- PENDING : the Action <action_definition> has not been executed yet by the Watcher Applier <watcher_applier_definition>
- ONGOING : the Action <action_definition> is currently being processed by the Watcher Applier <watcher_applier_definition>
- SUCCEEDED : the Action <action_definition> has been executed successfully
- FAILED : an error occurred while trying to execute the Action <action_definition>
- DELETED : the Action <action_definition> is still stored in the Watcher database <watcher_database_definition> but is not returned any more through the Watcher APIs.
- CANCELLED : the Action <action_definition> was in PENDING or ONGOING state and was cancelled by the Administrator <administrator_definition>
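As a rough illustration, the Action life-cycle states listed above could be modelled as an enumeration. This is only a sketch: the names `ActionState` and `can_cancel` are hypothetical and are not taken from the Watcher code base.

```python
from enum import Enum


class ActionState(Enum):
    """Life-cycle states of an Action, as listed in this glossary entry."""
    PENDING = "PENDING"
    ONGOING = "ONGOING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    DELETED = "DELETED"
    CANCELLED = "CANCELLED"


def can_cancel(state: ActionState) -> bool:
    # Per the CANCELLED definition, only PENDING or ONGOING Actions
    # can be cancelled by the Administrator.
    return state in (ActionState.PENDING, ActionState.ONGOING)
```

For instance, `can_cancel(ActionState.ONGOING)` is true, whereas an Action that has already SUCCEEDED can no longer be cancelled.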
Action Plan
An Action Plan <action_plan_definition>
is a flow
of Actions <action_definition>
that should be
executed in order to satisfy a given Goal <goal_definition>
.
An Action Plan <action_plan_definition>
is
generated by Watcher when an Audit <audit_definition>
is successful which
implies that the Strategy <strategy_definition>
which was used
has found a Solution <solution_definition>
to achieve the
Goal <goal_definition>
of this Audit <audit_definition>
.
In the default implementation of Watcher, an Action Plan <action_plan_definition>
is only
composed of successive Actions <action_definition>
(i.e., a Workflow of
Actions <action_definition>
belonging to a
unique branch).
However, Watcher provides abstract interfaces for many of its
components, allowing other implementations to generate and handle more
complex Action Plan(s) <action_plan_definition>
composed
of two types of Action Item(s):
- simple Actions <action_definition>: atomic tasks, which cannot be split into smaller tasks or commands from an OpenStack point of view.
- composite Actions: composed of several simple Actions <action_definition> ordered in sequential and/or parallel flows.
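The distinction between simple and composite Actions can be sketched as a small tree structure. This is an illustrative model only: the class names, the `mode` labels and the action names are assumptions, not part of Watcher.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SimpleAction:
    name: str  # an atomic task, e.g. one live migration


@dataclass
class CompositeAction:
    mode: str  # hypothetical label: "sequential" or "parallel"
    children: List[Union["SimpleAction", "CompositeAction"]] = field(default_factory=list)


def flatten(item) -> List[str]:
    """Return the names of all simple actions in depth-first order."""
    if isinstance(item, SimpleAction):
        return [item.name]
    names = []
    for child in item.children:
        names.extend(flatten(child))
    return names


# A sequential flow whose second step is a parallel group of two migrations.
plan = CompositeAction("sequential", [
    SimpleAction("disable_hypervisor"),
    CompositeAction("parallel", [SimpleAction("migrate_vm_1"),
                                 SimpleAction("migrate_vm_2")]),
])
```

The default implementation described above corresponds to the degenerate case where the plan is a single sequential list of simple Actions.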
An Action Plan <action_plan_definition>
may be
described using standard workflow model description formats such as Business Process Model and
Notation 2.0 (BPMN 2.0) or Unified
Modeling Language (UML).
An Action Plan <action_plan_definition>
has a
life-cycle and its current state may be one of the following:
- RECOMMENDED : the Action Plan <action_plan_definition> is waiting for a validation from the Administrator <administrator_definition>
- ONGOING : the Action Plan <action_plan_definition> is currently being processed by the Watcher Applier <watcher_applier_definition>
- SUCCEEDED : the Action Plan <action_plan_definition> has been executed successfully (i.e. all Actions <action_definition> that it contains have been executed successfully)
- FAILED : an error occurred while executing the Action Plan <action_plan_definition>
- DELETED : the Action Plan <action_plan_definition> is still stored in the Watcher database <watcher_database_definition> but is not returned any more through the Watcher APIs.
- CANCELLED : the Action Plan <action_plan_definition> was in PENDING or ONGOING state and was cancelled by the Administrator <administrator_definition>
Administrator
The Administrator <administrator_definition>
is any
user who has admin access on the OpenStack cluster. This user is allowed
to create new projects for tenants, create new users and assign roles to
each user.
The Administrator <administrator_definition>
usually
has remote access to any host of the cluster in order to change the
configuration and restart any OpenStack service, including Watcher.
In the context of Watcher, the Administrator <administrator_definition>
is a
role for users which allows them to run any Watcher commands, such
as:
- Create/Delete an Audit Template <audit_template_definition>
- Launch an Audit <audit_definition>
- Get the Action Plan <action_plan_definition>
- Launch a recommended Action Plan <action_plan_definition> manually
- Archive previous Audits <audit_definition> and Action Plans <action_plan_definition>
The Administrator <administrator_definition>
is also
allowed to modify any Watcher configuration files and to restart Watcher
services.
Audit
In the Watcher system, an Audit <audit_definition>
is a request for
optimizing a Cluster <cluster_definition>
.
The optimization is done in order to satisfy one Goal <goal_definition>
on a given Cluster <cluster_definition>
.
For each Audit <audit_definition>
, the Watcher system
generates an Action Plan <action_plan_definition>
.
An Audit <audit_definition>
has a life-cycle and
its current state may be one of the following:
- PENDING : a request for an Audit <audit_definition> has been submitted (either manually by the Administrator <administrator_definition> or automatically via some event handling mechanism) and is in the queue for being processed by the Watcher Decision Engine <watcher_decision_engine_definition>
- ONGOING : the Audit <audit_definition> is currently being processed by the Watcher Decision Engine <watcher_decision_engine_definition>
- SUCCEEDED : the Audit <audit_definition> has been executed successfully (note that it may not necessarily produce a Solution <solution_definition>).
- FAILED : an error occurred while executing the Audit <audit_definition>
- DELETED : the Audit <audit_definition> is still stored in the Watcher database <watcher_database_definition> but is not returned any more through the Watcher APIs.
- CANCELLED : the Audit <audit_definition> was in PENDING or ONGOING state and was cancelled by the Administrator <administrator_definition>
Audit Template
An Audit <audit_definition>
may be launched several
times with the same settings (Goal <goal_definition>
, thresholds, ...).
Therefore it makes sense to save those settings in some sort of Audit
preset object, which is known as an Audit Template <audit_template_definition>
.
An Audit Template <audit_template_definition>
contains at least the Goal <goal_definition>
of the Audit <audit_definition>
.
It may also contain some error handling settings indicating whether:
- Watcher Applier <watcher_applier_definition> stops the entire operation
- Watcher Applier <watcher_applier_definition> performs a rollback
and how many retries should be attempted before failure occurs (the
latter can be complex: for example, the scenario in which there are
many first-time failures on ultimately successful Actions <action_definition>
).
Moreover, an Audit Template <audit_template_definition>
may
contain some settings related to the level of automation for the Action Plan <action_plan_definition>
that will
be generated by the Audit <audit_definition>
. A flag will indicate
whether the Action Plan <action_plan_definition>
will be
launched automatically or will need a manual confirmation from the Administrator <administrator_definition>
.
Last but not least, an Audit Template <audit_template_definition>
may
contain a list of extra parameters related to the Strategy <strategy_definition>
configuration.
These parameters can be provided as a list of key-value pairs.
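To make the settings described in this entry concrete, here is a minimal sketch of an Audit Template as a plain data object. All field names (`auto_trigger`, `max_retries`, `rollback_on_failure`, `strategy_parameters`) and the example values are hypothetical; they only mirror the prose above and do not reflect the actual Watcher object model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AuditTemplate:
    goal: str                          # the only setting the text describes as mandatory
    auto_trigger: bool = False         # hypothetical flag: launch the Action Plan without confirmation
    max_retries: int = 0               # hypothetical error-handling setting
    rollback_on_failure: bool = False  # hypothetical error-handling setting
    # Extra Strategy parameters, given as key-value pairs.
    strategy_parameters: Dict[str, str] = field(default_factory=dict)


template = AuditTemplate(
    goal="minimize the energy consumption",
    strategy_parameters={"threshold": "0.8"},
)
```

With the defaults chosen in this sketch, the generated Action Plan would wait for a manual confirmation from the Administrator.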
Availability Zone
Please read the official OpenStack definition of an Availability Zone.
Cluster
A Cluster <cluster_definition>
is a set of
physical machines which provide compute, storage and networking
resources and are managed by the same OpenStack Controller node. A Cluster <cluster_definition>
represents a set of
resources that a cloud provider is able to offer to its customers <customer_definition>
.
A data center may contain several clusters.
The Cluster <cluster_definition>
may be divided into
one or several Availability Zone(s) <availability_zone_definition>
.
Cluster Data Model
A Cluster Data Model <cluster_data_model_definition>
is a logical representation of the current state and topology of the
Cluster <cluster_definition>
Managed resources <managed_resource_definition>
.
It is represented as a set of Managed resources <managed_resource_definition>
(which may be a simple tree or a flat list of key-value pairs) which
enables Watcher Strategies <strategy_definition>
to know the
current relationships between the different resources <managed_resource_definition>
of the
Cluster <cluster_definition>
during an Audit <audit_definition>
and enables the Strategy <strategy_definition>
to request
information such as:
- What compute nodes are in a given Availability Zone <availability_zone_definition> or a given Host Aggregate <host_aggregates_definition>?
- What Instances <instance_definition> are hosted on a given compute node?
- What is the current load of a compute node?
- What is the current free memory of a compute node?
- What is the network link between two compute nodes?
- What is the available bandwidth on a given network link?
- What is the current space available on a given virtual disk of a given Instance <instance_definition>?
- What is the current state of a given Instance <instance_definition>?
- ...
In a word, this data model enables the Strategy <strategy_definition>
to know:
- the current topology of the
Cluster <cluster_definition>
- the current capacity for each
Managed resource <managed_resource_definition>
- the current amount of used/free space for each
Managed resource <managed_resource_definition>
- the current state of each
Managed resource <managed_resource_definition>
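The kind of questions a Strategy may ask of the data model can be sketched with a small in-memory structure. The layout of `cluster_model`, the helper names and all values below are invented for illustration and do not correspond to Watcher's helper classes. The sketch also assumes that the memory allocated to an instance counts as used regardless of the instance state.

```python
# Hypothetical in-memory model: compute nodes keyed by name.
cluster_model = {
    "node-1": {
        "availability_zone": "az-1",
        "memory_mb": 32768,
        "instances": {
            "vm-a": {"state": "active", "memory_mb": 4096},
            "vm-b": {"state": "suspended", "memory_mb": 8192},
        },
    },
    "node-2": {"availability_zone": "az-2", "memory_mb": 16384, "instances": {}},
}


def nodes_in_zone(model, zone):
    """Which compute nodes are in a given availability zone?"""
    return sorted(n for n, d in model.items() if d["availability_zone"] == zone)


def instances_on(model, node):
    """Which instances are hosted on a given compute node?"""
    return sorted(model[node]["instances"])


def free_memory_mb(model, node):
    """What is the current free memory of a compute node?"""
    used = sum(i["memory_mb"] for i in model[node]["instances"].values())
    return model[node]["memory_mb"] - used
```

A Strategy built on such helpers only depends on the model, not on the OpenStack service the data originally came from.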
In the Watcher project, we aim at providing a generic and very basic
Cluster Data Model <cluster_data_model_definition>
for each Goal <goal_definition>
, usable in the associated
Strategies <strategy_definition>
through some
helper classes in order to:
- simplify the development of a new Strategy <strategy_definition> for a given Goal <goal_definition> when there already are some existing Strategies <strategy_definition> associated to the same Goal <goal_definition>
- avoid duplicating the same code in several Strategies <strategy_definition> associated to the same Goal <goal_definition>
- have a better consistency between the different Strategies <strategy_definition> for a given Goal <goal_definition>
- avoid any strong coupling with any external Cluster Data Model <cluster_data_model_definition> (the proposed data model acts as a pivot data model)
There may be various generic and basic Cluster Data Models <cluster_data_model_definition>
proposed in Watcher helpers, each of them being adapted to achieving a
given Goal <goal_definition>
:
- For example, for a Goal <goal_definition> which aims at optimizing the network resources <managed_resource_definition>, the Strategy <strategy_definition> may need to know which resources <managed_resource_definition> are communicating together.
- Whereas for a Goal <goal_definition> which aims at optimizing thermal and power conditions, the Strategy <strategy_definition> may need to know the location of each compute node in the racks and the location of each rack in the room.
Note however that developers can use their own Cluster Data Model <cluster_data_model_definition>
if the proposed data model does not fit their needs as long as the
Strategy <strategy_definition>
is able to
produce a Solution <solution_definition>
for the requested
Goal <goal_definition>
. For example, a developer
could rely on the Nova Data Model to optimize some compute
resources.
The Cluster Data Model <cluster_data_model_definition>
may be persisted in any appropriate storage system (SQL database, NoSQL
database, JSON file, XML file, in-memory database, ...).
Cluster History
The Cluster History <cluster_history_definition>
contains all the previously collected timestamped data such as metrics
and events associated to any managed resource <managed_resource_definition>
of the Cluster <cluster_definition>
.
Just like the Cluster Data Model <cluster_data_model_definition>
,
this history may be used by any Strategy <strategy_definition>
in order to find
the best Solution <solution_definition>
during an Audit <audit_definition>
.
In the Watcher project, a generic Cluster History <cluster_history_definition>
API
is proposed with some helper classes in order to:
- share a common measurement (events or metrics) naming based on what is defined in Ceilometer. See the full list of available measurements
- share common meter types (Cumulative, Delta, Gauge) based on what is defined in Ceilometer. See the full list of meter types
- simplify the development of a new Strategy <strategy_definition>
- avoid duplicating the same code in several Strategies <strategy_definition>
- have a better consistency between the different Strategies <strategy_definition>
- avoid any strong coupling with any external metrics/events storage system (the proposed API and measurement naming system acts as a pivot format)
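The meter types mentioned above differ in how their samples should be interpreted: a cumulative meter increases over time, a delta meter reports the change over an interval, and a gauge reports a discrete or fluctuating value. As a hedged illustration, the sketch below turns cumulative readings into per-interval deltas. The function name and the sample values are made up, and this is not part of any Watcher or Ceilometer API.

```python
def cumulative_to_delta(samples):
    """Convert successive cumulative readings into per-interval deltas."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


# Made-up cumulative readings taken at regular intervals.
cpu_time_ns = [100, 250, 250, 400]
deltas = cumulative_to_delta(cpu_time_ns)  # [150, 0, 150]
```

A Strategy comparing the load of two compute nodes would typically work on deltas or gauges rather than on raw cumulative counters.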
Note however that developers can use their own history management
system if the Ceilometer system does not fit their needs as long as
the Strategy <strategy_definition>
is able to
produce a Solution <solution_definition>
for the requested
Goal <goal_definition>
.
The Cluster History <cluster_history_definition>
data may be persisted in any appropriate storage system (InfluxDB,
OpenTSDB, MongoDB, ...).
Controller Node
A controller node is a machine that typically runs the following core OpenStack services:
- Keystone: for identity and service management
- Cinder scheduler: for volume management
- Glance controller: for image management
- Neutron controller: for network management
- Nova controller: for global compute resource management with services such as nova-scheduler, nova-conductor and nova-network.
In many configurations, Watcher will reside on a controller node even if it can potentially be hosted on a dedicated machine.
Compute node
Please read the official OpenStack definition of a Compute Node.
Customer
A Customer <customer_definition>
is the person or
company which subscribes to the cloud provider offering. A customer may
have several Project(s) <project_definition>
hosted on the
same Cluster <cluster_definition>
or dispatched on
different clusters.
In the private cloud context, the Customers <customer_definition>
are different
groups within the same organization (different departments, project
teams, branch offices and so on). Cloud infrastructure includes the
ability to precisely track each customer's service usage so that it can
be charged back to them, or at least reported to them.
Goal
A Goal <goal_definition>
is a human readable,
observable and measurable end result having one objective to be
achieved.
Here are some examples of Goals <goal_definition>
:
- minimize the energy consumption
- minimize the number of compute nodes (consolidation)
- balance the workload among compute nodes
- minimize the license cost (some software has a licensing model which is based on the number of sockets or cores where the software is deployed)
- find the most appropriate moment for a planned maintenance on a given group of hosts (which may be an entire availability zone): power supply replacement, cooling system replacement, hardware modification, ...
Host Aggregate
Please read the official OpenStack definition of a Host Aggregate.
Instance
A running virtual machine, or a virtual machine in a known state such as suspended, that can be used like a hardware server.
Managed resource
A Managed resource <managed_resource_definition>
is one instance of Managed resource type <managed_resource_type_definition>
in a topology with particular properties and dependencies on other Managed resources <managed_resource_definition>
(relationships).
For example, a Managed resource <managed_resource_definition>
can be one virtual machine (i.e., an instance <instance_definition>
) hosted on a
compute node <compute_node_definition>
and
connected to another virtual machine through a network link (represented
also as a Managed resource <managed_resource_definition>
in the Cluster Data Model <cluster_data_model_definition>
).
Managed resource type
A Managed resource type <managed_resource_definition>
is a type of hardware or software element of the Cluster <cluster_definition>
that the Watcher
system can act on.
Here are some examples of Managed resource types <managed_resource_definition>
:
- Nova Host Aggregates
- Nova Servers
- Cinder Volumes
- Neutron Routers
- Neutron Networks
- Neutron load-balancers
- Sahara Hadoop Cluster
- ...
It can be any of the resource types in the official list defined in OpenStack for Heat.
Optimization Efficiency
The Optimization Efficiency <efficiency_definition>
is the objective measure of how much of the Goal <goal_definition>
has been achieved with respect to the constraints and SLAs <sla_definition>
defined by the Customer <customer_definition>
.
The way efficiency is evaluated will depend on the Goal <goal_definition>
to achieve.
Of course, the efficiency will be relevant only as long as the Action Plan <action_plan_definition>
is relevant
(i.e., the current state of the Cluster <cluster_definition>
has not changed in
a way that a new Audit <audit_definition>
would need to be
launched).
For example, if the Goal <goal_definition>
is to lower the energy
consumption, the Efficiency <efficiency_definition>
will be
computed using several indicators (KPIs):
- the percentage of energy gain (which must be the highest possible)
- the number of SLA violations <sla_violation_definition> (which must be the lowest possible)
- the number of virtual machine migrations (which must be the lowest possible)
All those indicators (KPIs) are computed within a given timeframe,
which is the time taken to execute the whole Action Plan <action_plan_definition>
.
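The three indicators from the energy example can be grouped in a small record. This is a sketch under assumptions: the class name, the field names, the kWh unit and the sample figures are all hypothetical, and the energy gain is taken to be the share of the initial consumption that was saved over the timeframe.

```python
from dataclasses import dataclass


@dataclass
class EfficiencyIndicators:
    energy_before_kwh: float  # consumption without the Action Plan (assumed unit)
    energy_after_kwh: float   # consumption once the Action Plan is applied
    sla_violations: int       # should be as low as possible
    vm_migrations: int        # should be as low as possible

    @property
    def energy_gain_percent(self) -> float:
        # Share of the initial consumption saved; should be as high as possible.
        saved = self.energy_before_kwh - self.energy_after_kwh
        return 100.0 * saved / self.energy_before_kwh


kpis = EfficiencyIndicators(energy_before_kwh=200.0, energy_after_kwh=150.0,
                            sla_violations=1, vm_migrations=12)
```

How these indicators are combined into a single efficiency value depends on the Goal and is deliberately left out of this sketch.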
The efficiency also enables the Administrator <administrator_definition>
to
objectively compare different Strategies <strategy_definition>
for the same
goal and same workload of the Cluster <cluster_definition>
.
Project
Projects <project_definition>
represent the base
unit of “ownership” in OpenStack, in that all resources <managed_resource_definition>
in
OpenStack should be owned by a specific project <project_definition>
. In OpenStack
Identity, a project <project_definition>
must be owned by a
specific domain.
Please read the official OpenStack definition of a Project.
Primitive
A Primitive <primitive_definition>
is the
component that carries out a certain type of atomic Actions <action_definition>
on a given Managed resource <managed_resource_definition>
(Nova, Swift, Neutron, Glance, ...). A Primitive <primitive_definition>
is a part of
the Watcher Applier <watcher_applier_definition>
module.
For example, there can be a Primitive <primitive_definition>
which is
responsible for creating a snapshot of a given instance on a Nova
compute node. This Primitive <primitive_definition>
knows exactly
how to send the appropriate commands to Nova for this type of Action <action_definition>
.
SLA
SLA <sla_definition>
means Service Level
Agreement.
The resources are negotiated between the Customer <customer_definition>
and the Cloud
Provider in a contract.
Most of the time, this contract is composed of two documents:
- SLA <sla_definition>: Service Level Agreement
- SLO <slo_definition>: Service Level Objectives
Note that the SLA <sla_definition>
is more general than the
SLO <slo_definition>
in the sense that the
former specifies what service is to be provided, how it is supported,
times, locations, costs, performance, and responsibilities of the
parties involved while the SLO <slo_definition>
focuses on more measurable
characteristics such as availability, throughput, frequency, response
time or quality.
You can also read the Wikipedia page for SLA which provides a good definition.
SLA violation
An SLA violation <sla_violation_definition>
happens
when an SLA <sla_definition>
defined with a given Customer <customer_definition>
could not be
respected by the cloud provider within the timeframe defined by the
official contract document.
SLO
A Service Level Objective (SLO) is a key element of an SLA <sla_definition>
between a service provider and a Customer <customer_definition>
. SLOs are agreed
as a means of measuring the performance of the Service Provider and are
outlined as a way of avoiding disputes between the two parties based on
misunderstanding.
You can also read the Wikipedia page for SLO which provides a good definition.
Solution
A Solution <solution_definition>
is a set of Actions <action_definition>
generated by a Strategy <strategy_definition>
(i.e., an
algorithm) in order to achieve the Goal <goal_definition>
of an Audit <audit_definition>
.
A Solution <solution_definition>
is different from
an Action Plan <action_plan_definition>
because it
contains the non-scheduled list of Actions <action_definition>
which is produced by
a Strategy <strategy_definition>
. In other words,
the list of Actions in a Solution <solution_definition>
has not yet been
re-ordered by the Watcher Planner <watcher_planner_definition>
.
Note that some algorithms (i.e. Strategies <strategy_definition>
) may generate
several Solutions <solution_definition>
. This gives rise
to the problem of determining which Solution <solution_definition>
should be
applied.
Two approaches to dealing with this can be envisaged:
- fully automated mode: only the Solution <solution_definition> with the highest ranking (i.e., the highest Optimization Efficiency <efficiency_definition>) will be sent to the Watcher Planner <watcher_planner_definition> and translated into concrete Actions <action_definition>.
- manual mode: several Solutions <solution_definition> are proposed to the Administrator <administrator_definition> with a detailed measurement of the estimated Optimization Efficiency <efficiency_definition> and he/she decides which one will be launched.
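The two modes can be sketched in a few lines, assuming each candidate Solution carries a single numeric efficiency estimate. The dictionary layout, the function name and the sample values are illustrative assumptions, not Watcher's actual data structures.

```python
def select_solution(solutions):
    """Fully automated mode: keep only the highest-ranked candidate."""
    return max(solutions, key=lambda s: s["efficiency"])


# Made-up candidates produced by a hypothetical Strategy.
candidates = [
    {"name": "solution-a", "efficiency": 0.62, "actions": ["migrate vm-1"]},
    {"name": "solution-b", "efficiency": 0.81,
     "actions": ["migrate vm-2", "disable node-3"]},
]

best = select_solution(candidates)

# Manual mode: present every candidate, best first, and let the
# Administrator choose.
ranked = sorted(candidates, key=lambda s: s["efficiency"], reverse=True)
```

In this example `best` is `solution-b`, and `ranked` lists `solution-b` before `solution-a`.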
Strategy
A Strategy <strategy_definition>
is an algorithm
implementation which is able to find a Solution <solution_definition>
for a given Goal <goal_definition>
.
There may be several potential strategies which are able to achieve
the same Goal <goal_definition>
. This is why it is
possible to configure which specific Strategy <strategy_definition>
should be used
for each Goal <goal_definition>
.
Some strategies may provide better optimization results but may take
more time to find an optimal Solution <solution_definition>
.
When a new Goal <goal_definition>
is added to the Watcher
configuration, at least one default associated Strategy <strategy_definition>
should be
provided as well.
Watcher Applier
This component is in charge of executing the Action Plan <action_plan_definition>
built by
the Watcher Decision Engine <watcher_decision_engine_definition>
.
See architecture
for
more details on this component.
Watcher Database
This database stores all the Watcher domain objects which can be requested by the Watcher API or the Watcher CLI:
- Audit templates
- Audits
- Action plans
- Actions
- Goals
Here, the Watcher domain is "optimization of some resources provided by an OpenStack system".
See architecture
for
more details on this component.
Watcher Decision Engine
This component is responsible for computing a set of potential
optimization Actions <action_definition>
in order to fulfill
the Goal <goal_definition>
of an Audit <audit_definition>
.
It first reads the parameters of the Audit <audit_definition>
from the associated
Audit Template <audit_template_definition>
and
knows the Goal <goal_definition>
to achieve.
It then selects the most appropriate Strategy <strategy_definition>
depending on how
Watcher was configured for this Goal <goal_definition>
.
The Strategy <strategy_definition>
is then executed
and generates a set of Actions <action_definition>
which are scheduled
in time by the Watcher Planner <watcher_planner_definition>
(i.e., it generates an Action Plan <action_plan_definition>
).
See architecture
for
more details on this component.
Watcher Planner
The Watcher Planner <watcher_planner_definition>
is
part of the Watcher Decision Engine <watcher_decision_engine_definition>
.
This module takes the set of Actions <action_definition>
generated by a Strategy <strategy_definition>
and builds the
design of a workflow which defines how to schedule those
different Actions <action_definition>
in time and what the prerequisite conditions are for each Action <action_definition>
.
It is important to schedule Actions <action_definition>
in time in order to
prevent overload of the Cluster <cluster_definition>
while applying the
Action Plan <action_plan_definition>
. For
example, it is important not to migrate too many instances at the same
time in order to avoid network congestion which may degrade the SLA <sla_definition>
for Customers <customer_definition>
.
It is also important to schedule Actions <action_definition>
in order to avoid
security issues such as denial of service on core OpenStack
services.
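One simple way to picture this kind of scheduling is to cap how many Actions run at the same time. The sketch below splits a list of migrations into sequential batches. The function name, the `max_parallel` parameter and the action labels are hypothetical, and the actual Watcher Planner may schedule Actions quite differently.

```python
def schedule_in_batches(actions, max_parallel):
    """Split a list of actions into sequential batches of bounded size."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    return [actions[i:i + max_parallel]
            for i in range(0, len(actions), max_parallel)]


migrations = [f"migrate vm-{n}" for n in range(1, 6)]

# At most two migrations run concurrently; each batch starts only
# after the previous one has finished.
batches = schedule_in_batches(migrations, max_parallel=2)
```

Here five migrations are spread over three batches, so no more than two instances are moved over the network at any given time.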
See architecture
for
more details on this component.