Add some documentation for statsd

Change-Id: Ie93681bec4c12bfda5c3c31d34e3cb02176e252d
Andrew Hutchings
2013-04-19 15:51:25 +01:00
parent e66ede3d9a
commit 27cde75e86
5 changed files with 137 additions and 2 deletions


@@ -1,5 +1,5 @@
-Configuration of Node Pool Manager and Worker
-=============================================
+Configuration of Services
+=========================
Options can be specified either via the command line, or with a configuration
file, or both. Options given on the command line will override any options
@@ -273,4 +273,50 @@ Pool Manager Command Line Options
Enable verbose output. Normally, only errors are logged. This enables
additional logging, but not as much as the :option:`-d` option.
Statsd Command Line Options
---------------------------
.. program:: libra_statsd.py
.. option:: --api_server <HOST:PORT>
The hostname or IP address and port, separated by a colon, for use with the
HP REST API driver. Can be specified multiple times for multiple servers.
This option is also used for the hp_rest alerting driver.
.. option:: --server <HOST:PORT>
Used to specify the Gearman job server hostname and port. This option
can be used multiple times to specify multiple job servers.
.. option:: --driver <DRIVER LIST>
The drivers to be used for alerting. This option can be used multiple
times to specify multiple drivers.
.. option:: --ping_interval <PING_INTERVAL>
How often to run a ping check of load balancers, in seconds. Defaults to 60.
.. option:: --repair_interval <REPAIR_INTERVAL>
How often to run a check to see if damaged load balancers have been
repaired, in seconds. Defaults to 180.
.. option:: --datadog_api_key <KEY>
The API key to be used for the Datadog driver.
.. option:: --datadog_app_key <KEY>
The application key to be used for the Datadog driver.
.. option:: --datadog_message_tail <TEXT>
Text to append to an alerting message, such as a list of users to alert
(using @user@email.com format). Used for the Datadog driver.
.. option:: --datadog_tags <TAGS>
A list of tags to be used for the Datadog driver.
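
As a rough illustration of how the repeatable ``HOST:PORT`` options above behave, the following sketch parses a subset of them with Python's ``argparse``. It is not Libra's actual option handling: the ``host_port`` helper, the example addresses, and the driver names passed to ``--driver`` are assumptions made for the example. Only the option names and the two interval defaults come from the descriptions above.

```python
import argparse


def host_port(value):
    """Split a HOST:PORT string into a (host, port) tuple (hypothetical helper)."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise argparse.ArgumentTypeError("expected HOST:PORT, got %r" % value)
    return host, int(port)


parser = argparse.ArgumentParser(prog="libra_statsd.py")
# action="append" lets an option be given several times, one value per use.
parser.add_argument("--api_server", action="append", type=host_port,
                    default=[], metavar="HOST:PORT")
parser.add_argument("--server", action="append", type=host_port,
                    default=[], metavar="HOST:PORT")
parser.add_argument("--driver", action="append", default=[])
# Interval defaults as documented above: 60 and 180 seconds.
parser.add_argument("--ping_interval", type=int, default=60)
parser.add_argument("--repair_interval", type=int, default=180)

# Example addresses and driver names are illustrative only.
args = parser.parse_args([
    "--server", "10.0.0.1:4730",
    "--server", "10.0.0.2:4730",
    "--driver", "datadog",
    "--driver", "hp_rest",
    "--ping_interval", "30",
])
```

Here ``args.server`` holds one tuple per ``--server`` occurrence, and ``args.repair_interval`` keeps its default because the option was not given.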


@@ -7,4 +7,5 @@ Load Balancer as a Service Device Tools
introduction
worker/index
pool_mgm/index
statsd/index
config

doc/statsd/about.rst Normal file

@@ -0,0 +1,24 @@
Description
===========
Purpose
-------
Libra Statsd is a monitoring system for the health of load balancers. It
can query many load balancers in parallel and supports a pluggable architecture
for different methods of reporting.
Design
------
Statsd currently performs only an advanced "ping" style of monitoring. By
default it gets a list of ONLINE load balancers from the API server and sends a
Gearman message to the worker of each one. The worker tests its own HAProxy
instance and reports success or failure. If there is a failure, or the Gearman
message times out, this is sent to the alerting backends. A further scheduled
run, set to every three minutes by default, re-tests the failed devices to see
if they have been repaired. If they have, this triggers a 'repaired' notice.
Alerting is done using a plugin system which can have multiple plugins enabled
at the same time.
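
The ping and repair cycle described above could be sketched roughly as follows. This is a simplified model under stated assumptions, not the statsd implementation: ``run_ping``, ``run_repair``, the ``failed`` set, and ``fake_ping`` are hypothetical names, and the Gearman round trip is replaced by a plain function call that may raise ``TimeoutError``.

```python
def run_ping(online_devices, ping, failed, send_alert):
    """Ping every ONLINE device; alert on a failure or a timeout."""
    for device_id in online_devices:
        try:
            ok = ping(device_id)
        except TimeoutError:
            ok = False  # a timed-out message is treated as a failure
        if not ok:
            failed.add(device_id)
            send_alert("ping failed", device_id)


def run_repair(failed, ping, send_repair):
    """Re-test failed devices; send a 'repaired' notice for each recovery."""
    for device_id in sorted(failed):  # sorted() copies, so discarding is safe
        try:
            ok = ping(device_id)
        except TimeoutError:
            ok = False
        if ok:
            failed.discard(device_id)
            send_repair("device repaired", device_id)


# Stand-in for the worker check: True = healthy, False = failed, None = timeout.
health = {1: True, 2: False, 3: None}


def fake_ping(device_id):
    if health[device_id] is None:
        raise TimeoutError("no reply from worker")
    return health[device_id]


log = []
failed = set()
run_ping([1, 2, 3], fake_ping, failed, lambda msg, dev: log.append(("alert", dev)))
health[2] = True  # device 2 recovers before the repair run
run_repair(failed, fake_ping, lambda msg, dev: log.append(("repair", dev)))
```

In this run, devices 2 and 3 raise alerts, device 2 later produces a repair notice, and device 3 stays in the failed set for the next repair run.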

doc/statsd/drivers.rst Normal file

@@ -0,0 +1,56 @@
Statsd Drivers
==============
Introduction
------------
Statsd has a small driver API to be used for alerting. Multiple drivers can
be loaded at the same time to alert in multiple places.
Design
------
The base class ``AlertDriver`` is used to create new drivers. Each driver is
supplied with ``self.logger`` to use for logging and ``self.args``, which
contains the arguments supplied to statsd. Drivers derived from it need to
implement two methods:
.. py:class:: AlertDriver
.. py:method:: send_alert(message, device_id)
:param message: A message with details of the failure
:param device_id: The ID of the device that has failed
.. py:method:: send_repair(message, device_id)
:param message: A message with details of the recovered load balancer
:param device_id: The ID of the device that has been recovered
.. py:data:: known_drivers
This is the dictionary that maps values for the
:option:`--driver <libra_statsd.py --driver>` option
to a class implementing the driver :py:class:`~AlertDriver` API
for the statsd server. After implementing a new driver class, you simply add
a new entry to this dictionary to make it a selectable option.
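
A minimal sketch of a custom driver, assuming the base class receives the logger and arguments in its constructor, might look like the following. The stand-in ``AlertDriver`` definition, its constructor signature, the ``ListDriver`` class, and the ``"list"`` dictionary key are all hypothetical; only the two method signatures and the ``known_drivers`` name come from the description above.

```python
import logging
from argparse import Namespace


class AlertDriver:
    """Stand-in for the statsd base class (constructor signature is assumed)."""

    def __init__(self, logger, args):
        self.logger = logger
        self.args = args

    def send_alert(self, message, device_id):
        raise NotImplementedError

    def send_repair(self, message, device_id):
        raise NotImplementedError


class ListDriver(AlertDriver):
    """Hypothetical driver that logs and records notifications in memory."""

    def __init__(self, logger, args):
        super().__init__(logger, args)
        self.events = []

    def send_alert(self, message, device_id):
        self.logger.error("device %s failed: %s", device_id, message)
        self.events.append(("alert", device_id, message))

    def send_repair(self, message, device_id):
        self.logger.info("device %s repaired: %s", device_id, message)
        self.events.append(("repair", device_id, message))


# Adding an entry makes the class selectable by name; the key is illustrative.
known_drivers = {"list": ListDriver}

driver = known_drivers["list"](logging.getLogger("statsd"),
                               Namespace(driver=["list"]))
driver.send_alert("ping timed out", 42)
driver.send_repair("ping succeeded", 42)
```

Because several drivers can be enabled at once, a driver of this shape should keep its own state and avoid assumptions about which other drivers are loaded.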
Dummy Driver
------------
This driver is used for simple testing and debugging. It echoes the message
details into statsd's log file.
Datadog Driver
--------------
The Datadog driver uses the Datadog API to send alerts into the Datadog event
stream. Alerts are sent as 'ERROR' and repairs as 'SUCCESS'.
HP REST Driver
--------------
This driver sends messages to the HP REST API server to mark nodes as ERROR or
ONLINE.

doc/statsd/index.rst Normal file

@@ -0,0 +1,8 @@
Statsd Monitoring Daemon
========================
.. toctree::
:maxdepth: 2
about
drivers