Update API doc for Alarm Definitions
Explain how Alarm Definitions work and how they generate Alarms with examples

Updated Table of Contents which was way out of date

Change-Id: I23c5531b645a2262abef50828f435fb7b9499521
parent
a17aad2c8e
commit
f0736202d4
@ -1,6 +1,6 @@
|
||||
# Monasca API
|
||||
|
||||
Date: July 18, 2014
|
||||
Date: November 5, 2014
|
||||
|
||||
Document Version: v2.0
|
||||
|
||||
@ -8,12 +8,13 @@ Document Version: v2.0
|
||||
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
|
||||
**Table of Contents** *generated with [DocToc](http://doctoc.herokuapp.com/)*
|
||||
|
||||
- [Monasca API](#monasca-api)
|
||||
- [Overview](#overview)
|
||||
- [Metric Name and Dimensions](#metric-name-and-dimensions)
|
||||
- [Name](#name)
|
||||
- [Dimensions](#dimensions)
|
||||
- [Alarm Expressions](#alarm-expressions)
|
||||
- [Text Representation](#text-representation)
|
||||
- [Alarm Definitions and Alarms](#alarm-definitions-and-alarms)
|
||||
- [Alarm Definition Expressions](#alarm-definition-expressions)
|
||||
- [Syntax](#syntax)
|
||||
- [Simple Example](#simple-example)
|
||||
- [More Complex Example](#more-complex-example)
|
||||
@ -24,35 +25,269 @@ Document Version: v2.0
|
||||
- [Common Responses](#common-responses)
|
||||
- [Versions](#versions)
|
||||
- [List Versions](#list-versions)
|
||||
- [GET](#get)
|
||||
- [Headers](#headers)
|
||||
- [Path Parameters](#path-parameters)
|
||||
- [Query Parameters](#query-parameters)
|
||||
- [Request Body](#request-body)
|
||||
- [Request Examples](#request-examples)
|
||||
- [Response](#response)
|
||||
- [Status code](#status-code)
|
||||
- [Response Body](#response-body)
|
||||
- [Response Examples](#response-examples)
|
||||
- [Get Version](#get-version)
|
||||
- [Get /{version_id}](#get-version_id)
|
||||
- [Headers](#headers-1)
|
||||
- [Path Parameters](#path-parameters-1)
|
||||
- [Query Parameters](#query-parameters-1)
|
||||
- [Request Body](#request-body-1)
|
||||
- [Request Examples](#request-examples-1)
|
||||
- [Response](#response-1)
|
||||
- [Status code](#status-code-1)
|
||||
- [Response Body](#response-body-1)
|
||||
- [Response Examples](#response-examples-1)
|
||||
- [Metrics](#metrics)
|
||||
- [Create Metric](#create-metric)
|
||||
- [POST /v2.0/metrics](#post-v20metrics)
|
||||
- [Headers](#headers-2)
|
||||
- [Path Parameters](#path-parameters-2)
|
||||
- [Query Parameters](#query-parameters-2)
|
||||
- [Request Body](#request-body-2)
|
||||
- [Request Examples](#request-examples-2)
|
||||
- [Single metric](#single-metric)
|
||||
- [Array of metrics](#array-of-metrics)
|
||||
- [Response](#response-2)
|
||||
- [Status Code](#status-code)
|
||||
- [Response Body](#response-body-2)
|
||||
- [List metrics](#list-metrics)
|
||||
- [GET /v2.0/metrics](#get-v20metrics)
|
||||
- [Headers](#headers-3)
|
||||
- [Path Parameters](#path-parameters-3)
|
||||
- [Query Parameters](#query-parameters-3)
|
||||
- [Request Body](#request-body-3)
|
||||
- [Request Examples](#request-examples-3)
|
||||
- [Response](#response-3)
|
||||
- [Status Code](#status-code-1)
|
||||
- [Response Body](#response-body-3)
|
||||
- [Response Examples](#response-examples-2)
|
||||
- [Measurements](#measurements)
|
||||
- [List measurements](#list-measurements)
|
||||
- [GET /v2.0/metrics/measurements](#get-v20metricsmeasurements)
|
||||
- [Headers](#headers-4)
|
||||
- [Path Parameters](#path-parameters-4)
|
||||
- [Query Parameters](#query-parameters-4)
|
||||
- [Request Body](#request-body-4)
|
||||
- [Request Examples](#request-examples-4)
|
||||
- [Response](#response-4)
|
||||
- [Status Code](#status-code-2)
|
||||
- [Response Body](#response-body-4)
|
||||
- [Response Examples](#response-examples-3)
|
||||
- [Statistics](#statistics)
|
||||
- [List statistics](#list-statistics)
|
||||
- [GET /v2.0/metrics/statistics](#get-v20metricsstatistics)
|
||||
- [Headers](#headers-5)
|
||||
- [Path Parameters](#path-parameters-5)
|
||||
- [Query Parameters](#query-parameters-5)
|
||||
- [Request Body](#request-body-5)
|
||||
- [Request Examples](#request-examples-5)
|
||||
- [Response](#response-5)
|
||||
- [Status Code](#status-code-3)
|
||||
- [Response Body](#response-body-5)
|
||||
- [Response Examples](#response-examples-4)
|
||||
- [Notification Methods](#notification-methods)
|
||||
- [Create Notification Method](#create-notification-method)
|
||||
- [POST /v2.0/notification-methods](#post-v20notification-methods)
|
||||
- [Headers](#headers-6)
|
||||
- [Path Parameters](#path-parameters-6)
|
||||
- [Query Parameters](#query-parameters-6)
|
||||
- [Request Body](#request-body-6)
|
||||
- [Request Examples](#request-examples-6)
|
||||
- [Response](#response-6)
|
||||
- [Status Code](#status-code-4)
|
||||
- [Response Body](#response-body-6)
|
||||
- [Response Examples](#response-examples-5)
|
||||
- [List Notification Methods](#list-notification-methods)
|
||||
- [GET /v2.0/notification-methods](#get-v20notification-methods)
|
||||
- [Headers](#headers-7)
|
||||
- [Path Parameters](#path-parameters-7)
|
||||
- [Query Parameters](#query-parameters-7)
|
||||
- [Request Body](#request-body-7)
|
||||
- [Request Examples](#request-examples-7)
|
||||
- [Response](#response-7)
|
||||
- [Status Code](#status-code-5)
|
||||
- [Response Body](#response-body-7)
|
||||
- [Response Examples](#response-examples-6)
|
||||
- [Get Notification Method](#get-notification-method)
|
||||
- [GET /v2.0/notification-methods/{notification_method_id}](#get-v20notification-methodsnotification_method_id)
|
||||
- [Headers](#headers-8)
|
||||
- [Path Parameters](#path-parameters-8)
|
||||
- [Query Parameters](#query-parameters-8)
|
||||
- [Request Body](#request-body-8)
|
||||
- [Request Examples](#request-examples-8)
|
||||
- [Response](#response-8)
|
||||
- [Status Code](#status-code-6)
|
||||
- [Response Body](#response-body-8)
|
||||
- [Response Examples](#response-examples-7)
|
||||
- [Update Notification Method](#update-notification-method)
|
||||
- [PUT /v2.0/notification-methods/{notification_method_id}](#put-v20notification-methodsnotification_method_id)
|
||||
- [Headers](#headers-9)
|
||||
- [Path Parameters](#path-parameters-9)
|
||||
- [Query Parameters](#query-parameters-9)
|
||||
- [Request Body](#request-body-9)
|
||||
- [Request Examples](#request-examples-9)
|
||||
- [Response](#response-9)
|
||||
- [Status Code](#status-code-7)
|
||||
- [Response Body](#response-body-9)
|
||||
- [Response Examples](#response-examples-8)
|
||||
- [Delete Notification Method](#delete-notification-method)
|
||||
- [Alarm Defitinions](#alarm-definitions)
|
||||
- [DELETE /v2.0/notification-methods/{notification_method_id}](#delete-v20notification-methodsnotification_method_id)
|
||||
- [Headers](#headers-10)
|
||||
- [Path Parameters](#path-parameters-10)
|
||||
- [Query Parameters](#query-parameters-10)
|
||||
- [Request Body](#request-body-10)
|
||||
- [Request Examples](#request-examples-10)
|
||||
- [Response](#response-10)
|
||||
- [Status Code](#status-code-8)
|
||||
- [Response Body](#response-body-10)
|
||||
- [Alarm Definitions](#alarm-definitions)
|
||||
- [Create Alarm Definition](#create-alarm-definition)
|
||||
- [POST /v2.0/alarm-definitions](#post-v20alarm-definitions)
|
||||
- [Headers](#headers-11)
|
||||
- [Path Parameters](#path-parameters-11)
|
||||
- [Query Parameters](#query-parameters-11)
|
||||
- [Request Body](#request-body-11)
|
||||
- [Request Examples](#request-examples-11)
|
||||
- [Response](#response-11)
|
||||
- [Status Code](#status-code-9)
|
||||
- [Response Body](#response-body-11)
|
||||
- [Response Examples](#response-examples-9)
|
||||
- [List Alarm Definitions](#list-alarm-definitions)
|
||||
- [GET /v2.0/alarm-definitions](#get-v20alarm-definitions)
|
||||
- [Headers](#headers-12)
|
||||
- [Path Parameters](#path-parameters-12)
|
||||
- [Query Parameters](#query-parameters-12)
|
||||
- [Request Body](#request-body-12)
|
||||
- [Request Examples](#request-examples-12)
|
||||
- [Response](#response-12)
|
||||
- [Status Code](#status-code-10)
|
||||
- [Response Body](#response-body-12)
|
||||
- [Response Examples](#response-examples-10)
|
||||
- [Get Alarm Definition](#get-alarm-definition)
|
||||
- [GET /v2.0/alarm-definitions/{alarm_definition_id}](#get-v20alarm-definitionsalarm_definition_id)
|
||||
- [Headers](#headers-13)
|
||||
- [Path Parameters](#path-parameters-13)
|
||||
- [Query Parameters](#query-parameters-13)
|
||||
- [Request Body](#request-body-13)
|
||||
- [Response](#response-13)
|
||||
- [Status Code](#status-code-11)
|
||||
- [Response Body](#response-body-13)
|
||||
- [Response Examples](#response-examples-11)
|
||||
- [Update Alarm Definition](#update-alarm-definition)
|
||||
- [PUT /v2.0/alarm-definitions/{alarm_definition_id}](#put-v20alarm-definitionsalarm_definition_id)
|
||||
- [Headers](#headers-14)
|
||||
- [Path Parameters](#path-parameters-14)
|
||||
- [Query Parameters](#query-parameters-14)
|
||||
- [Request Body](#request-body-14)
|
||||
- [Request Examples](#request-examples-13)
|
||||
- [Response](#response-14)
|
||||
- [Status Code](#status-code-12)
|
||||
- [Response Body](#response-body-14)
|
||||
- [Response Examples](#response-examples-12)
|
||||
- [Patch Alarm Definition](#patch-alarm-definition)
|
||||
- [PATCH /v2.0/alarm-definitions/{alarm_definition_id}](#patch-v20alarm-definitionsalarm_definition_id)
|
||||
- [Headers](#headers-15)
|
||||
- [Path Parameters](#path-parameters-15)
|
||||
- [Query Parameters](#query-parameters-15)
|
||||
- [Request Body](#request-body-15)
|
||||
- [Request Examples](#request-examples-14)
|
||||
- [Response](#response-15)
|
||||
- [Status Code](#status-code-13)
|
||||
- [Response Body](#response-body-15)
|
||||
- [Response Examples](#response-examples-13)
|
||||
- [Delete Alarm Definition](#delete-alarm-definition)
|
||||
- [Alarms](#alarms)
|
||||
- [DELETE /v2.0/alarm-definitions/{alarm_definition_id}](#delete-v20alarm-definitionsalarm_definition_id)
|
||||
- [Headers](#headers-16)
|
||||
- [Path Parameters](#path-parameters-16)
|
||||
- [Query Parameters](#query-parameters-16)
|
||||
- [Request Body](#request-body-16)
|
||||
- [Request Examples](#request-examples-15)
|
||||
- [Response](#response-16)
|
||||
- [Status Code](#status-code-14)
|
||||
- [Response Body](#response-body-16)
|
||||
- [List Alarms](#list-alarms)
|
||||
- [GET /v2.0/alarms](#get-v20alarms)
|
||||
- [Headers](#headers-17)
|
||||
- [Path Parameters](#path-parameters-17)
|
||||
- [Query Parameters](#query-parameters-17)
|
||||
- [Request Body](#request-body-17)
|
||||
- [Request Examples](#request-examples-16)
|
||||
- [Response](#response-17)
|
||||
- [Status Code](#status-code-15)
|
||||
- [Response Body](#response-body-17)
|
||||
- [Response Examples](#response-examples-14)
|
||||
- [List Alarms State History](#list-alarms-state-history)
|
||||
- [GET /v2.0/alarms/state-history](#get-v20alarmsstate-history)
|
||||
- [Headers](#headers-18)
|
||||
- [Path Parameters](#path-parameters-18)
|
||||
- [Query Parameters](#query-parameters-18)
|
||||
- [Request Body](#request-body-18)
|
||||
- [Response](#response-18)
|
||||
- [Status Code](#status-code-16)
|
||||
- [Response Body](#response-body-18)
|
||||
- [Response Examples](#response-examples-15)
|
||||
- [Get Alarm](#get-alarm)
|
||||
- [GET /v2.0/alarms/{alarm_id}](#get-v20alarmsalarm_id)
|
||||
- [Headers](#headers-19)
|
||||
- [Path Parameters](#path-parameters-19)
|
||||
- [Query Parameters](#query-parameters-19)
|
||||
- [Request Body](#request-body-19)
|
||||
- [Response](#response-19)
|
||||
- [Status Code](#status-code-17)
|
||||
- [Response Body](#response-body-19)
|
||||
- [Response Examples](#response-examples-16)
|
||||
- [Update Alarm](#update-alarm)
|
||||
- [Patch Alarm](#update-alarm-1)
|
||||
- [PUT /v2.0/alarms/{alarm_id}](#put-v20alarmsalarm_id)
|
||||
- [Headers](#headers-20)
|
||||
- [Path Parameters](#path-parameters-20)
|
||||
- [Query Parameters](#query-parameters-20)
|
||||
- [Request Body](#request-body-20)
|
||||
- [Request Examples](#request-examples-17)
|
||||
- [Response](#response-20)
|
||||
- [Status Code](#status-code-18)
|
||||
- [Response Body](#response-body-20)
|
||||
- [Response Examples](#response-examples-17)
|
||||
- [Patch Alarm](#patch-alarm)
|
||||
- [PATCH /v2.0/alarms/{alarm_id}](#patch-v20alarmsalarm_id)
|
||||
- [Headers](#headers-21)
|
||||
- [Path Parameters](#path-parameters-21)
|
||||
- [Query Parameters](#query-parameters-21)
|
||||
- [Request Body](#request-body-21)
|
||||
- [Request Examples](#request-examples-18)
|
||||
- [Response](#response-21)
|
||||
- [Status Code](#status-code-19)
|
||||
- [Response Body](#response-body-21)
|
||||
- [Response Examples](#response-examples-18)
|
||||
- [Delete Alarm](#delete-alarm)
|
||||
- [DELETE /v2.0/alarms/{alarm_id}](#delete-v20alarmsalarm_id)
|
||||
- [Headers](#headers-22)
|
||||
- [Path Parameters](#path-parameters-22)
|
||||
- [Query Parameters](#query-parameters-22)
|
||||
- [Request Body](#request-body-22)
|
||||
- [Request Examples](#request-examples-19)
|
||||
- [Response](#response-22)
|
||||
- [Status Code](#status-code-20)
|
||||
- [Response Body](#response-body-22)
|
||||
- [List Alarm State History](#list-alarm-state-history)
|
||||
- [GET /v2.0/alarms/{alarm_id}/state-history](#get-v20alarmsalarm_idstate-history)
|
||||
- [Headers](#headers-23)
|
||||
- [Path Parameters](#path-parameters-23)
|
||||
- [Query Parameters](#query-parameters-23)
|
||||
- [Request Body](#request-body-23)
|
||||
- [Request Data](#request-data)
|
||||
- [Response](#response-23)
|
||||
- [Status Code](#status-code-21)
|
||||
- [Response Body](#response-body-23)
|
||||
- [Response Examples](#response-examples-19)
|
||||
- [License](#license)
|
||||
|
||||
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
|
||||
@ -66,8 +301,9 @@ The API consists of six main resources:
|
||||
2. Metrics - Provides for storage and retrieval of metrics.
|
||||
3. Measurements - Operations for querying measurements of metrics.
|
||||
4. Statistics - Operations for evaluating statistics of metrics.
|
||||
5. Notification Methods - Represents a method, such as email, which can be associated with an alarm via an action. When an alarm is triggered notification methods associated with the alarm are triggered.
|
||||
5. Alarms - Provides CRUD operations for alarms and querying the alarm state history.
|
||||
5. Notification Methods - Represents a method, such as email, which can be associated with an alarm definition via an action. When an alarm is triggered, the notification methods associated with its alarm definition are triggered.
|
||||
6. Alarm Definitions - Provides CRUD operations for alarm definitions.
7. Alarms - Provides CRUD operations for alarms, and querying the alarm state history.
|
||||
|
||||
Before using the API, you must first get a valid auth token from Keystone. All API operations require an auth token specified in the header of the HTTP request.
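As a minimal sketch of attaching the token to a request, the snippet below builds (but does not send) a request object with the `X-Auth-Token` header used in this document's request examples. The helper name `monasca_request` is hypothetical, the host and token are the sample values from the examples, and obtaining the token from Keystone is outside the scope of the sketch.

```python
from urllib.request import Request


def monasca_request(base_url, path, token, method="GET"):
    """Build an HTTP request carrying the Keystone token in X-Auth-Token."""
    headers = {
        "X-Auth-Token": token,
        "Content-Type": "application/json",
    }
    return Request(base_url + path, headers=headers, method=method)


# Sample host and token copied from the request examples in this document.
req = monasca_request(
    "http://192.168.10.4:8080",
    "/v2.0/metrics",
    "2b8882ba2ec44295bf300aecb2caa4f7",
)
```

The request can then be passed to `urllib.request.urlopen` once a real endpoint and token are available.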
|
||||
|
||||
@ -75,14 +311,124 @@ Before using the API, you must first get a valid auth token from Keystone. All A
|
||||
A metric is uniquely identified by a name and set of dimensions.
|
||||
|
||||
### Name
|
||||
Defines the name of a metric. A name is of type string(64).
|
||||
Defines the name of a metric. A name is of type string(100).
|
||||
|
||||
### Dimensions
|
||||
A dictionary of (key, value) pairs. The key and value are of type string(255). The first character in the dimension is restricted to the following: `a-z A-Z 0-9 _ / \ $`.
|
||||
However, the next characters may be any character except for the following: `; } { = , & ) ( "`. If one of the restricted characters is needed, this can be achieved by double quoting the dimensions.
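The character rules above can be expressed as a small validator. This is only a sketch: it assumes the rules apply equally to dimension keys and values and that the 255 limit is counted in characters. `is_valid_dimension_part` is a hypothetical helper, not part of the Monasca API, and it does not model the double-quoting escape.

```python
import re

# First character: a-z A-Z 0-9 _ / \ $
# Remaining characters: anything except ; } { = , & ) ( "
_DIMENSION_RE = re.compile(r'[a-zA-Z0-9_/\\$][^;}{=,&)("]*')


def is_valid_dimension_part(text):
    """Return True if text satisfies the length and character rules above."""
    if not 0 < len(text) <= 255:
        return False
    return _DIMENSION_RE.fullmatch(text) is not None
```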
|
||||
|
||||
## Alarm Expressions
|
||||
The alarm expression syntax allows the creation of simple or complex alarms to handle a wide variety of needs. Alarm expressions are evaluated every 60 seconds.
|
||||
### Text Representation
|
||||
In this document, metrics will be represented in the form `name{name=value,name=value}` where name is the metric name and the name=value pairs in the curly braces are the dimensions. For example, `cpu.idle_perc{service=monitoring,hostname=mini-mon}` represents a metric with the name "cpu.idle_perc" and the dimensions "service=monitoring" and "hostname=mini-mon".
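For illustration, the text representation can be produced from a metric name and a dimensions dictionary as follows. The function name `format_metric` is hypothetical and exists only for this sketch.

```python
def format_metric(name, dimensions):
    """Render a metric as name{key=value,key=value}."""
    pairs = ",".join(f"{key}={value}" for key, value in dimensions.items())
    return f"{name}{{{pairs}}}"


text = format_metric(
    "cpu.idle_perc",
    {"service": "monitoring", "hostname": "mini-mon"},
)
```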
|
||||
|
||||
## Alarm Definitions and Alarms
|
||||
|
||||
Alarm Definitions are policies that specify how Alarms should be created. By using Alarm Definitions, the user doesn't have to create individual alarms for each system or service. Instead, a small number of Alarm Definitions can be managed and Monasca will create Alarms for systems and services as they appear.
|
||||
|
||||
An Alarm Definition has an expression for evaluating one or more metrics to determine if there is a problem. Depending on the Alarm Definition expression and the match_by value, Monasca will create one or more Alarms from the Metrics that are received. The match_by parameter specifies which dimension or dimensions should be used to determine if one or more alarms will be created.
|
||||
|
||||
An example is the best way to show this. Imagine two Alarm Definitions have been created:
|
||||
|
||||
Alarm Definition 1 has an expression of `avg(cpu.idle_perc{service=monitoring}) < 20` and the match_by parameter is not set. Alarm Definition 2 has an expression of `min(cpu.idle_perc{service=monitoring}) < 10` and the match_by parameter is set to `hostname`.
|
||||
|
||||
When the metric cpu.idle_perc{service=monitoring,hostname=mini-mon} is first received after the Alarm Definitions have been created, an Alarm is created for each Alarm Definition. The metric is added to both Alarms. The following set of Alarm Definitions and Alarms would exist:
|
||||
|
||||
Alarm Definition 1:
|
||||
```
|
||||
Alarm 1 - Metrics: cpu.idle_perc{service=monitoring,hostname=mini-mon}
|
||||
```
|
||||
|
||||
Alarm Definition 2:
|
||||
```
|
||||
Alarm 1 - Metrics: cpu.idle_perc{service=monitoring,hostname=mini-mon}
|
||||
```
|
||||
|
||||
Now, when the metric cpu.idle_perc{service=monitoring,hostname=devstack} is received, the two Alarm Definitions define different behaviors. Since the value for the hostname dimension is different from the value for the existing Alarm from Alarm Definition 2, a new Alarm will be created. Alarm Definition 1 does not have a value for match_by, so this metric is added to the existing Alarm. This gives us the following set of Alarm Definitions and Alarms:
|
||||
|
||||
Alarm Definition 1:
|
||||
```
|
||||
Alarm 1 - Metrics: cpu.idle_perc{service=monitoring,hostname=mini-mon} and cpu.idle_perc{service=monitoring,hostname=devstack}
|
||||
```
|
||||
|
||||
Alarm Definition 2:
|
||||
```
|
||||
Alarm 1 - Metrics: cpu.idle_perc{service=monitoring,hostname=mini-mon}
|
||||
Alarm 2 - Metrics: cpu.idle_perc{service=monitoring,hostname=devstack}
|
||||
```
|
||||
|
||||
Alarm Definition 1 is evaluating the status of the monitoring service as a whole, while Alarm Definition 2 evaluates each system in the service.
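The grouping behavior described above can be sketched in a few lines. This is an illustration of the idea only, not the Threshold Engine's implementation: `group_metrics_into_alarms` is a hypothetical helper, and it assumes the metrics passed in already match the Alarm Definition expression.

```python
from collections import defaultdict


def group_metrics_into_alarms(metrics, match_by=()):
    """Group metrics into alarms keyed by their match_by dimension values.

    metrics is a list of (name, dimensions) tuples. With an empty match_by,
    every metric lands in a single alarm.
    """
    alarms = defaultdict(list)
    for name, dimensions in metrics:
        key = tuple(dimensions.get(dim) for dim in match_by)
        alarms[key].append((name, dimensions))
    return dict(alarms)


metrics = [
    ("cpu.idle_perc", {"service": "monitoring", "hostname": "mini-mon"}),
    ("cpu.idle_perc", {"service": "monitoring", "hostname": "devstack"}),
]
# Alarm Definition 1: match_by not set, so one alarm holds both metrics.
alarms_def_1 = group_metrics_into_alarms(metrics)
# Alarm Definition 2: match_by is hostname, so one alarm per host.
alarms_def_2 = group_metrics_into_alarms(metrics, match_by=("hostname",))
```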
|
||||
|
||||
Now, if another system is configured into the monitoring service, then its cpu.idle_perc metric will be added to the Alarm for Alarm Definition 1 and a new Alarm will be created for Alarm Definition 2, all without any user intervention. The system will be monitored without requiring the user to explicitly add alarms for the new system as other monitoring systems require.
|
||||
|
||||
If an Alarm Definition expression has multiple subexpressions, for example, `avg(cpu.idle_perc{service=monitoring}) < 10 or avg(cpu.user_perc{service=monitoring}) > 60` and a match_by value set, then the metrics for both subexpressions must have the same value for the dimension specified in match_by. For example, assume this Alarm Definition:
|
||||
|
||||
Expression `avg(cpu.idle_perc{service=monitoring}) < 10 or avg(cpu.user_perc{service=monitoring}) > 60` and match_by is `hostname`
|
||||
|
||||
Now assume four metrics are received by Monasca:
|
||||
|
||||
```
|
||||
cpu.idle_perc{service=monitoring,hostname=mini-mon}
|
||||
cpu.idle_perc{service=monitoring,hostname=devstack}
|
||||
cpu.user_perc{service=monitoring,hostname=mini-mon}
|
||||
cpu.user_perc{service=monitoring,hostname=devstack}
|
||||
```
|
||||
|
||||
This will cause two Alarms to be created, one for each unique value of hostname. One Alarm will have the metrics:
|
||||
|
||||
```
|
||||
avg(cpu.idle_perc{service=monitoring,hostname=mini-mon}) and avg(cpu.user_perc{service=monitoring,hostname=mini-mon})
|
||||
```
|
||||
|
||||
and another will have the metrics:
|
||||
|
||||
```
|
||||
avg(cpu.idle_perc{service=monitoring,hostname=devstack}) and avg(cpu.user_perc{service=monitoring,hostname=devstack})
|
||||
```
|
||||
|
||||
Note that the value of match_by, "hostname", is used to match the metrics between the subexpressions, hence the name match_by.
|
||||
|
||||
An Alarm will only get created when metrics are seen that match all subexpressions in the Alarm Definition. If match_by is set, then each metric must have a value for at least one of the values in match_by. If match_by is not set, only one Alarm will be created for an Alarm Definition.
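The rule that an Alarm is only created once metrics matching all subexpressions have been seen can be sketched as below. This is a simplified illustration under stated assumptions: subexpressions are matched by metric name only (dimension filters such as `service=monitoring` are ignored), and `alarms_ready_for_creation` is a hypothetical helper rather than anything in Monasca.

```python
def alarms_ready_for_creation(metrics, subexpression_names, match_by):
    """Return match_by keys for which every subexpression has a metric."""
    required = set(subexpression_names)
    seen = {}
    for name, dimensions in metrics:
        if name not in required:
            continue
        key = tuple(dimensions.get(dim) for dim in match_by)
        seen.setdefault(key, set()).add(name)
    return [key for key, names in seen.items() if names == required]


received = [
    ("cpu.idle_perc", {"service": "monitoring", "hostname": "mini-mon"}),
    ("cpu.idle_perc", {"service": "monitoring", "hostname": "devstack"}),
    ("cpu.user_perc", {"service": "monitoring", "hostname": "mini-mon"}),
]
names = ["cpu.idle_perc", "cpu.user_perc"]
# Only mini-mon has metrics for both subexpressions so far.
ready = alarms_ready_for_creation(received, names, ("hostname",))

# Once cpu.user_perc arrives for devstack, both hosts qualify.
received.append(
    ("cpu.user_perc", {"service": "monitoring", "hostname": "devstack"})
)
ready_later = alarms_ready_for_creation(received, names, ("hostname",))
```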
|
||||
|
||||
The value of the match_by parameter can also be a list, for example, `hostname,device`. In that case, Alarms will be created, and metrics added to them, based on all values of match_by.
|
||||
|
||||
For example, assume the Alarm Definition with the expression `max(disk.space_used_perc{service=monitoring}) > 90` and match_by set to `hostname`. This will create one alarm for each system that contains all of the metrics for each device. If instead, the match_by is set to `hostname,device`, then a separate alarm will be created for each device in each system.
|
||||
|
||||
To illustrate, assume these four metrics are received by Monasca:
|
||||
```
|
||||
disk.space_used_perc{device=/dev/sda1,hostname=mini-mon}
disk.space_used_perc{device=tmpfs,hostname=mini-mon}
disk.space_used_perc{device=/dev/sda1,hostname=devstack}
disk.space_used_perc{device=tmpfs,hostname=devstack}
|
||||
```
|
||||
|
||||
Given the expression `max(disk.space_used_perc{service=monitoring}) > 90` and match_by set to `hostname`, this will create two alarms:
|
||||
|
||||
```
|
||||
Alarm 1 - Metrics: disk.space_used_perc{device=/dev/sda1,hostname=mini-mon}, disk.space_used_perc{device=tmpfs,hostname=mini-mon}
Alarm 2 - Metrics: disk.space_used_perc{device=/dev/sda1,hostname=devstack}, disk.space_used_perc{device=tmpfs,hostname=devstack}
|
||||
```
|
||||
|
||||
If instead, match_by is set to `hostname,device`, then four alarms will be created:
|
||||
|
||||
```
|
||||
Alarm 1 - Metrics: disk.space_used_perc{device=/dev/sda1,hostname=mini-mon}
Alarm 2 - Metrics: disk.space_used_perc{device=tmpfs,hostname=mini-mon}
Alarm 3 - Metrics: disk.space_used_perc{device=/dev/sda1,hostname=devstack}
Alarm 4 - Metrics: disk.space_used_perc{device=tmpfs,hostname=devstack}
|
||||
```
|
||||
|
||||
The second value of match_by will create an Alarm for each device. For each device that fills up, a separate Alarm will be triggered. The first value of match_by will give you fewer Alarms to display in the dashboard, but if an Alarm has already triggered for one device and another device fills up, the Alarm won't be triggered again.
|
||||
|
||||
If desired, an Alarm Definition can be created that exactly matches a set of metrics. The match_by should not be set. Only one Alarm will be created for that Alarm Definition.
|
||||
|
||||
Alarms have a state that is set by the Threshold Engine based on the incoming metrics. The states are:
|
||||
UNDETERMINED - No metrics have been received for at least one of the subexpressions in (period + 2) times periods (see below for the definitions of period and periods)
|
||||
OK - Metrics have been received and the Alarm Definition Expression evaluates to false for the given metrics
|
||||
ALARM - Metrics have been received and the Alarm Definition Expression evaluates to true for the given metrics
|
||||
|
||||
The Alarms are evaluated and their state is set once per minute.
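The three states can be summarized with a small decision function. This is a sketch of the rules as stated above, not the Threshold Engine's code. `alarm_state` is a hypothetical name, and the caller is assumed to have already worked out which subexpressions received metrics within the window and whether the expression evaluates to true.

```python
def alarm_state(metrics_received, expression_is_true):
    """Map evaluation inputs to UNDETERMINED, OK or ALARM.

    metrics_received holds one boolean per subexpression: True if metrics
    for that subexpression arrived within the evaluation window.
    """
    if not all(metrics_received):
        return "UNDETERMINED"
    return "ALARM" if expression_is_true else "OK"
```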
|
||||
|
||||
## Alarm Definition Expressions
|
||||
The alarm definition expression syntax allows the creation of simple or complex alarm definitions to handle a wide variety of needs. Alarm expressions are evaluated every 60 seconds.
|
||||
|
||||
An alarm expression is a boolean equation which, if it evaluates to true with the incoming metrics, triggers a notification to be sent.
|
||||
|
||||
@ -171,17 +517,17 @@ Threshold values are always in the same units as the metric that they are being
|
||||
|
||||
|
||||
#### Simple Example
|
||||
In this example the metric uniquely identified with the name=cpu_perc and dimension hostname=host.domain.com is compared to the threshold 95.
|
||||
In this example the metric uniquely identified with the name=cpu.system_perc and dimension hostname=host.domain.com is compared to the threshold 95.
|
||||
|
||||
```
|
||||
cpu_perc{hostname=host.domain.com} > 95
|
||||
cpu.system_perc{hostname=host.domain.com} > 95
|
||||
```
|
||||
|
||||
#### More Complex Example
|
||||
In this example the average of the same metric as in the previous example is evaluated over a 120 second period for 3 times so that the expression will evaluate to true if the average is greater than 95 seconds for a total of 360 seconds.
|
||||
In this example, the average of the same metric as in the previous example is evaluated over a 120 second period, 3 times in a row, so that the expression evaluates to true only if the average is greater than 95 for a total of 360 seconds.
|
||||
|
||||
```
|
||||
avg(cpu_perc{hostname=host.domain.com}, 120) > 95 times 3
|
||||
avg(cpu.system_perc{hostname=host.domain.com}, 120) > 95 times 3
|
||||
```
|
||||
|
||||
Note that period is the number of seconds over which the measurement is evaluated. It must be a multiple of 60. Periods is the number of times in a row that the expression must be true before the alarm is triggered. Both period and periods are optional and default to 60 and 1, respectively.
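To make the interaction of period and periods concrete, here is a rough sketch of how `avg(metric, period) > threshold times periods` could be evaluated over a list of measurements. It is an illustration only: `avg_exceeds` is a hypothetical function, the windows are assumed to end at a caller-supplied `now`, and a window with no measurements is treated as false, which is a simplification of the state rules.

```python
def avg_exceeds(measurements, now, threshold, period=60, periods=1):
    """Evaluate avg(metric, period) > threshold times periods.

    measurements is a list of (timestamp_seconds, value) pairs. The result
    is True only if the average exceeds the threshold in each of the last
    `periods` consecutive windows of `period` seconds ending at `now`.
    """
    if period <= 0 or period % 60 != 0:
        raise ValueError("period must be a positive multiple of 60")
    for i in range(periods):
        window_end = now - i * period
        window_start = window_end - period
        values = [v for t, v in measurements if window_start <= t < window_end]
        if not values or sum(values) / len(values) <= threshold:
            return False
    return True


# One measurement per minute for six minutes, all above the threshold.
high = [(t, 97.0) for t in range(0, 360, 60)]
# Same series, but the most recent measurement dips.
dip = high[:-1] + [(300, 50.0)]

always_high = avg_exceeds(high, now=360, threshold=95, period=120, periods=3)
with_dip = avg_exceeds(dip, now=360, threshold=95, period=120, periods=3)
```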
|
||||
@ -200,7 +546,7 @@ The metric is a complex identifier that says the name and optional dimensions.
|
||||
In this example a compound alarm expression is evaluated involving two thresholds.
|
||||
|
||||
```
|
||||
avg(cpu_perc{hostname=hostname.domain.com}) > 90 or avg(disk_read_ops{hostname=hostname.domain.com, device=vda, 120) > 1000
|
||||
avg(cpu.system_perc{hostname=hostname.domain.com}) > 90 or avg(disk_read_ops{hostname=hostname.domain.com, device=vda}, 120) > 1000
|
||||
```
|
||||
|
||||
# Common Request Headers
|
||||
@ -357,7 +703,7 @@ None.
|
||||
#### Request Body
|
||||
Consists of a single metric object or an array of metric objects. A metric has the following properties:
|
||||
|
||||
* name (string(64), required) - The name of the metric.
|
||||
* name (string(100), required) - The name of the metric.
|
||||
* dimensions ({string(255): string(255)}, optional) - A dictionary consisting of (key, value) pairs used to uniquely identify a metric.
|
||||
* timestamp (string, required) - The timestamp in seconds from the Epoch.
|
||||
* value (float, required) - Value of the metric. Values with base-10 exponents greater than 126 or less than -130 are truncated.
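The properties above could be assembled into a request body along these lines. This is a sketch under assumptions: `build_metric` is a hypothetical helper, the length limits are counted in characters, and the timestamp is passed through unchanged as a number of seconds since the Epoch even though the property list types it as a string.

```python
import json


def build_metric(name, timestamp, value, dimensions=None):
    """Build one metric object for the POST /v2.0/metrics request body."""
    if not 0 < len(name) <= 100:
        raise ValueError("name must be 1 to 100 characters")
    metric = {"name": name, "timestamp": timestamp, "value": float(value)}
    if dimensions:
        for key, val in dimensions.items():
            if len(key) > 255 or len(val) > 255:
                raise ValueError("dimension keys and values are limited to 255 characters")
        metric["dimensions"] = dict(dimensions)
    return metric


# The body may be a single metric object or an array of them.
body = json.dumps([
    build_metric("cpu.system_perc", 1405630174, 2.5, {"hostname": "devstack"}),
])
```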
|
||||
@ -440,7 +786,7 @@ Get metrics
|
||||
None.
|
||||
|
||||
#### Query Parameters
|
||||
* name (string(64), optional) - A metric name to filter metrics by.
|
||||
* name (string(100), optional) - A metric name to filter metrics by.
|
||||
* dimensions (string, optional) - A dictionary to filter metrics by specified as a comma separated array of (key, value) pairs as `key1:value1,key2:value2, ...`
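As a sketch, the `dimensions` query parameter can be built from a dictionary by joining `key:value` pairs with commas. The helper name `metrics_query` is hypothetical, and leaving `:` and `,` unescaped is a choice made here to mirror the request examples in this document.

```python
from urllib.parse import urlencode


def metrics_query(name=None, dimensions=None):
    """Build the query string for GET /v2.0/metrics."""
    params = {}
    if name is not None:
        params["name"] = name
    if dimensions:
        params["dimensions"] = ",".join(
            f"{key}:{value}" for key, value in dimensions.items()
        )
    # Keep ':' and ',' readable, as in the request examples.
    return urlencode(params, safe=":,")


query = metrics_query("cpu.system_perc", {"hostname": "devstack"})
url = "/v2.0/metrics?" + query
```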
|
||||
|
||||
#### Request Body
|
||||
@ -500,7 +846,7 @@ Get measurements for metrics.
|
||||
None.
|
||||
|
||||
#### Query Parameters
|
||||
* name (string(64), optional) - A metric name to filter metrics by.
|
||||
* name (string(100), optional) - A metric name to filter metrics by.
|
||||
* dimensions (string, optional) - A dictionary to filter metrics by specified as a comma separated array of (key, value) pairs as `key1:value1,key2:value2, ...`
|
||||
* start_time (string, required) - The start time in ISO 8601 combined date and time format in UTC.
|
||||
* end_time (string, optional) - The end time in ISO 8601 combined date and time format in UTC.
|
||||
@ -511,7 +857,7 @@ None.
|
||||
|
||||
#### Request Examples
|
||||
```
|
||||
GET /v2.0/metrics/measurements?name=cpu_user_perc&dimensions=hostname:devstack&start_time=2014-07-18T03:00:00Z HTTP/1.1
|
||||
GET /v2.0/metrics/measurements?name=cpu.system_perc&dimensions=hostname:devstack&start_time=2014-07-18T03:00:00Z HTTP/1.1
|
||||
Host: 192.168.10.4:8080
|
||||
Content-Type: application/json
|
||||
X-Auth-Token: 2b8882ba2ec44295bf300aecb2caa4f7
|
||||
@ -525,7 +871,7 @@ Cache-Control: no-cache
|
||||
#### Response Body
|
||||
Returns a JSON array of measurements objects for each unique metric with the following fields:
|
||||
|
||||
* name (string(64)) - A name of a metric.
|
||||
* name (string(100)) - A name of a metric.
|
||||
* dimensions ({string(255): string(255)}) - The dimensions of a metric.
|
||||
* columns (array[string]) - An array of column names corresponding to the columns in measurements.
|
||||
* measurements (array[array[]]) - A two dimensional array of measurements for each timestamp.
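Because `columns` names the positions in each inner array of `measurements`, a client can pair them up to get one dictionary per measurement. The sketch below assumes a response object shaped like the fields above. The column names and values in the sample are hypothetical, and `measurement_rows` is not part of any Monasca client.

```python
def measurement_rows(metric):
    """Turn the columns/measurements arrays into a list of dictionaries."""
    columns = metric["columns"]
    return [dict(zip(columns, row)) for row in metric["measurements"]]


# Hypothetical sample shaped like one element of the response array.
sample = {
    "name": "cpu.system_perc",
    "dimensions": {"hostname": "devstack"},
    "columns": ["id", "timestamp", "value"],
    "measurements": [
        [1, "2014-07-18T03:24:25Z", 2.5],
        [2, "2014-07-18T03:25:25Z", 3.0],
    ],
}
rows = measurement_rows(sample)
```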
|
||||
@ -534,7 +880,7 @@ Returns a JSON array of measurements objects for each unique metric with the fol
|
||||
```
|
||||
[
|
||||
{
|
||||
"name":"cpu_user_perc",
|
||||
"name":"cpu.system_perc",
|
||||
"dimensions":{
|
||||
"hostname":"devstack"
|
||||
},
|
||||
@ -586,7 +932,7 @@ Get statistics for metrics.
|
||||
None.
|
||||
|
||||
#### Query Parameters
|
||||
* name (string(64), required) - A metric name to filter metrics by.
|
||||
* name (string(100), required) - A metric name to filter metrics by.
|
||||
* dimensions (string, optional) - A dictionary to filter metrics by specified as a comma separated array of (key, value) pairs as `key1:value1,key2:value2, ...`
|
||||
* statistics (string, required) - A comma separated array of statistics to evaluate. Valid statistics are avg, min, max, sum and count.
|
||||
* start_time (string, required) - The start time in ISO 8601 combined date and time format in UTC.
|
||||
@ -598,7 +944,7 @@ None.
|
||||
|
||||
#### Request Examples
|
||||
```
|
||||
GET /v2.0/metrics/statistics?name=cpu_user_perc&dimensions=hostname:devstack&start_time=2014-07-18T03:00:00Z&statistics=avg,min,max,sum,count HTTP/1.1
|
||||
GET /v2.0/metrics/statistics?name=cpu.system_perc&dimensions=hostname:devstack&start_time=2014-07-18T03:00:00Z&statistics=avg,min,max,sum,count HTTP/1.1
|
||||
Host: 192.168.10.4:8080
|
||||
X-Auth-Token: 2b8882ba2ec44295bf300aecb2caa4f7
|
||||
Content-Type: application/json
|
||||
@ -613,7 +959,7 @@ Cache-Control: no-cache
|
||||
#### Response Body
|
||||
Returns a JSON array of statistic objects for each unique metric with the following fields:
|
||||
|
||||
* name (string(64)) - A name of a metric.
|
||||
* name (string(100)) - A name of a metric.
|
||||
* dimensions ({string(255): string(255)}) - The dimensions of a metric.
|
||||
* columns (array[string]) - An array of column names corresponding to the columns in statistics.
|
||||
* statistics (array[array[]]) - A two dimensional array of statistics for each period.
|
||||
@ -622,7 +968,7 @@ Returns a JSON array of statistic objects for each unique metric with the follow
|
||||
```
|
||||
[
|
||||
{
|
||||
"name":"cpu_user_perc",
|
||||
"name":"cpu.system_perc",
|
||||
"dimensions":{
|
||||
"hostname":"devstack"
|
||||
},
|
||||
@ -1004,7 +1350,7 @@ Cache-Control: no-cache
|
||||
{
|
||||
"name":"Average CPU percent greater than 10",
|
||||
"description":"The average CPU percent is greater than 10",
|
||||
"expression":"(avg(cpu_user_perc{hostname=devstack}) > 10)",
|
||||
"expression":"(avg(cpu.user_perc{hostname=devstack}) > 10)",
|
||||
"match_by":[
|
||||
"hostname"
|
||||
],
|
||||
@ -1053,10 +1399,10 @@ Returns a JSON array of alarm definition objects with the following fields:
|
||||
],
|
||||
"name":"Average CPU percent greater than 10",
|
||||
"description":"The average CPU percent is greater than 10",
|
||||
"expression":"(avg(cpu_user_perc{hostname=devstack}) > 10)",
|
||||
"expression":"(avg(cpu.user_perc{hostname=devstack}) > 10)",
|
||||
"expression_data":{
|
||||
"function":"AVG",
|
||||
"metric_name":"cpu_user_perc",
|
||||
"metric_name":"cpu.user_perc",
|
||||
"dimensions":{
|
||||
"hostname":"devstack"
|
||||
},
|
||||
@ -1142,10 +1488,10 @@ Returns a JSON array of alarm objects with the following fields:
|
||||
],
|
||||
"name":"CPU percent greater than 10",
|
||||
"description":"Release the hounds",
|
||||
"expression":"(avg(cpu_user_perc{hostname=devstack}) > 10)",
|
||||
"expression":"(avg(cpu.user_perc{hostname=devstack}) > 10)",
|
||||
"expression_data":{
|
||||
"function":"AVG",
|
||||
"metric_name":"cpu_user_perc",
|
||||
"metric_name":"cpu.user_perc",
|
||||
"dimensions":{
|
||||
"hostname":"devstack"
|
||||
},
|
||||
@ -1223,10 +1569,10 @@ Returns a JSON alarm object with the following fields:
|
||||
],
|
||||
"name": "CPU percent greater than 10",
|
||||
"description": "Release the hounds",
|
||||
"expression": "(avg(cpu_user_perc{hostname=devstack}) > 10)",
|
||||
"expression": "(avg(cpu.user_perc{hostname=devstack}) > 10)",
|
||||
"expression_data": {
|
||||
"function": "AVG",
|
||||
"metric_name": "cpu_user_perc",
|
||||
"metric_name": "cpu.user_perc",
|
||||
"dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
@ -1294,7 +1640,7 @@ Cache-Control: no-cache
|
||||
{
|
||||
"name":"CPU percent greater than 15",
|
||||
"description":"Release the hounds",
|
||||
"expression":"(avg(cpu_user_perc{hostname=devstack}) > 15)",
|
||||
"expression":"(avg(cpu.user_perc{hostname=devstack}) > 15)",
|
||||
"match_by":[
|
||||
"hostname"
|
||||
],
|
||||
@ -1342,10 +1688,10 @@ Returns a JSON alarm object with the following parameters:
|
||||
],
|
||||
"name": "CPU percent greater than 15",
|
||||
"description": "Release the hounds",
|
||||
"expression": "(avg(cpu_user_perc{hostname=devstack}) > 15)",
|
||||
"expression": "(avg(cpu.user_perc{hostname=devstack}) > 15)",
|
||||
"expression_data": {
|
||||
"function": "AVG",
|
||||
"metric_name": "cpu_user_perc",
|
||||
"metric_name": "cpu.user_perc",
|
||||
"dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
@ -1412,7 +1758,7 @@ Cache-Control: no-cache
|
||||
{
|
||||
"name":"CPU percent greater than 15",
|
||||
"description":"Release the hounds",
|
||||
"expression":"(avg(cpu_user_perc{hostname=devstack}) > 15)",
|
||||
"expression":"(avg(cpu.user_perc{hostname=devstack}) > 15)",
|
||||
"match_by":[
|
||||
"hostname"
|
||||
],
|
||||
@ -1461,10 +1807,10 @@ Returns a JSON alarm definition object with the following fields:
|
||||
],
|
||||
"name": "CPU percent greater than 15",
|
||||
"description": "Release the hounds",
|
||||
"expression": "(avg(cpu_user_perc{hostname=devstack}) > 15)",
|
||||
"expression": "(avg(cpu.user_perc{hostname=devstack}) > 15)",
|
||||
"expression_data": {
|
||||
"function": "AVG",
|
||||
"metric_name": "cpu_user_perc",
|
||||
"metric_name": "cpu.user_perc",
|
||||
"dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
@ -1547,7 +1893,7 @@ None.
|
||||
"
|
||||
#### Request Examples
|
||||
```
|
||||
GET /v2.0/alarms?metric_name=cpu_utilization&metric_dimensions=hostname:devstack&state=UNDETERMINED HTTP/1.1
|
||||
GET /v2.0/alarms?metric_name=cpu.system_perc&metric_dimensions=hostname:devstack&state=UNDETERMINED HTTP/1.1
|
||||
Host: 192.168.10.4:8080
|
||||
Content-Type: application/json
|
||||
X-Auth-Token: 2b8882ba2ec44295bf300aecb2caa4f7
|
||||
@ -1582,9 +1928,19 @@ Returns a JSON array of alarm objects with the following fields:
|
||||
"href":"http://192.168.10.4:8080/v2.0/alarms/f9935bcc-9641-4cbf-8224-0993a947ea83/state-history"
|
||||
}
|
||||
],
|
||||
"alarm_definition_id":"ad837fca-5564-4cbf-523-0117f7dac6ad",
|
||||
"alarm_definition": {
|
||||
"severity": "LOW",
|
||||
"id": "b7e5f472-7aa5-4254-a49a-463e749ae817",
|
||||
"links": [
|
||||
{
|
||||
"href": "http://192.168.10.4:8080/v2.0/alarm-definitions/b7e5f472-7aa5-4254-a49a-463e749ae817",
|
||||
"rel": "self"
|
||||
}
|
||||
],
|
||||
"name": "high cpu and load"
|
||||
},
|
||||
"metrics":[{
|
||||
"name":"cpu_utilization",
|
||||
"name":"cpu.system_perc",
|
||||
"dimensions":{
|
||||
"hostname":"devstack"
|
||||
}
|
||||
@ -1636,7 +1992,7 @@ Returns a JSON array of alarm state transition objects with the following fields
|
||||
[
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
@ -1648,7 +2004,7 @@ Returns a JSON array of alarm state transition objects with the following fields
|
||||
},
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
@ -1660,19 +2016,19 @@ Returns a JSON array of alarm state transition objects with the following fields
|
||||
},
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
"old_state": "ALARM",
|
||||
"new_state": "UNDETERMINED",
|
||||
"reason": "No data was present for the sub-alarms: [avg(cpu_user_perc{hostname=devstack}) > 15.0]",
|
||||
"reason": "No data was present for the sub-alarms: [avg(cpu.system_perc{hostname=devstack}) > 15.0]",
|
||||
"reason_data": "{}",
|
||||
"timestamp": "2014-07-19T03:37:26.000Z"
|
||||
},
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
@ -1684,19 +2040,19 @@ Returns a JSON array of alarm state transition objects with the following fields
|
||||
},
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
"old_state": "ALARM",
|
||||
"new_state": "UNDETERMINED",
|
||||
"reason": "No data was present for the sub-alarms: [avg(cpu_user_perc{hostname=devstack}) > 15.0]",
|
||||
"reason": "No data was present for the sub-alarms: [avg(cpu.system_perc{hostname=devstack}) > 15.0]",
|
||||
"reason_data": "{}",
|
||||
"timestamp": "2014-07-19T03:26:26.000Z"
|
||||
},
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
@ -1758,7 +2114,7 @@ Returns a JSON alarm object with the following fields:
|
||||
],
|
||||
"alarm_definition_id":"ad837fca-5564-4cbf-523-0117f7dac6ad",
|
||||
"metrics":[{
|
||||
"name":"cpu_utilization",
|
||||
"name":"cpu.system_perc",
|
||||
"dimensions":{
|
||||
"hostname":"devstack"
|
||||
}
|
||||
@ -1832,7 +2188,7 @@ Returns a JSON alarm object with the following parameters:
|
||||
],
|
||||
"alarm_definition_id":"ad837fca-5564-4cbf-523-0117f7dac6ad",
|
||||
"metrics":[{
|
||||
"name":"cpu_utilization",
|
||||
"name":"cpu.system_perc",
|
||||
"dimensions":{
|
||||
"hostname":"devstack"
|
||||
}
|
||||
@ -1906,7 +2262,7 @@ Returns a JSON alarm object with the following fields:
|
||||
],
|
||||
"alarm_definition_id":"ad837fca-5564-4cbf-523-0117f7dac6ad",
|
||||
"metrics":[{
|
||||
"name":"cpu_utilization",
|
||||
"name":"cpu.system_perc",
|
||||
"dimensions":{
|
||||
"hostname":"devstack"
|
||||
}
|
||||
@ -1997,7 +2353,7 @@ Returns a JSON array of alarm state transition objects with the following fields
|
||||
[
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
@ -2009,7 +2365,7 @@ Returns a JSON array of alarm state transition objects with the following fields
|
||||
},
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
@ -2021,19 +2377,19 @@ Returns a JSON array of alarm state transition objects with the following fields
|
||||
},
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
"old_state": "ALARM",
|
||||
"new_state": "UNDETERMINED",
|
||||
"reason": "No data was present for the sub-alarms: [avg(cpu_user_perc{hostname=devstack}) > 15.0]",
|
||||
"reason": "No data was present for the sub-alarms: [avg(cpu.system_perc{hostname=devstack}) > 15.0]",
|
||||
"reason_data": "{}",
|
||||
"timestamp": "2014-07-19T03:37:26.000Z"
|
||||
},
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
@ -2045,19 +2401,19 @@ Returns a JSON array of alarm state transition objects with the following fields
|
||||
},
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
"old_state": "ALARM",
|
||||
"new_state": "UNDETERMINED",
|
||||
"reason": "No data was present for the sub-alarms: [avg(cpu_user_perc{hostname=devstack}) > 15.0]",
|
||||
"reason": "No data was present for the sub-alarms: [avg(cpu.system_perc{hostname=devstack}) > 15.0]",
|
||||
"reason_data": "{}",
|
||||
"timestamp": "2014-07-19T03:26:26.000Z"
|
||||
},
|
||||
{
|
||||
"alarm_id": "f9935bcc-9641-4cbf-8224-0993a947ea83",
|
||||
"metric_name": "cpu_utilization",
|
||||
"metric_name": "cpu.system_perc",
|
||||
"metric_dimensions": {
|
||||
"hostname": "devstack"
|
||||
},
|
||||
|