Add some documentation for statsd
Change-Id: Ie93681bec4c12bfda5c3c31d34e3cb02176e252d
committed by
David Shrewsbury
parent
7a13c2a704
commit
e1daeaddbc
@@ -1,5 +1,5 @@
Configuration of Node Pool Manager and Worker
=============================================
Configuration of Services
=========================

Options can be specified either via the command line, or with a configuration
file, or both. Options given on the command line will override any options
@@ -273,4 +273,50 @@ Pool Manager Command Line Options
   Enable verbose output. Normally, only errors are logged. This enables
   additional logging, but not as much as the :option:`-d` option.

Statsd Command Line Options
---------------------------

.. program:: libra_statsd.py

.. option:: --api_server <HOST:PORT>

   The hostname or IP address and port, separated by a colon, for use with
   the HP REST API driver. Can be specified multiple times for multiple
   servers. This option is also used for the hp_rest alerting driver.

.. option:: --server <HOST:PORT>

   Used to specify the Gearman job server hostname and port. This option
   can be used multiple times to specify multiple job servers.

.. option:: --driver <DRIVER LIST>

   The drivers to be used for alerting. This option can be used multiple
   times to specify multiple drivers.

.. option:: --ping_interval <PING_INTERVAL>

   How often, in seconds, to run a ping check of the load balancers.
   Defaults to 60.

.. option:: --repair_interval <REPAIR_INTERVAL>

   How often, in seconds, to check whether damaged load balancers have been
   repaired. Defaults to 180.

.. option:: --datadog_api_key <KEY>

   The API key to be used for the Datadog driver.

.. option:: --datadog_app_key <KEY>

   The application key to be used for the Datadog driver.

.. option:: --datadog_message_tail <TEXT>

   Text to add at the end of an alerting message, such as a list of
   users to alert (using @user@email.com format). Used for the Datadog
   driver.

.. option:: --datadog_tags <TAGS>

   A list of tags to be used for the Datadog driver.
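
As a rough illustration of how the repeatable options and the interval
defaults fit together, the sketch below mirrors a few of them with Python's
``argparse``. It is not the actual Libra option parser: the ``build_parser``
helper is hypothetical, the host names and ports are placeholders, and
``datadog`` is an assumed value for ``--driver``. Only the flag names and the
60 and 180 second defaults come from the descriptions above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring some of the documented statsd options."""
    parser = argparse.ArgumentParser(prog="libra_statsd.py")
    # Options documented as repeatable are collected into lists.
    parser.add_argument("--api_server", action="append", default=[],
                        metavar="HOST:PORT")
    parser.add_argument("--server", action="append", default=[],
                        metavar="HOST:PORT")
    parser.add_argument("--driver", action="append", default=[],
                        metavar="DRIVER")
    # Intervals are given in seconds; the defaults are the documented ones.
    parser.add_argument("--ping_interval", type=int, default=60)
    parser.add_argument("--repair_interval", type=int, default=180)
    return parser


# Example command line; hosts, ports and the "datadog" name are placeholders.
args = build_parser().parse_args([
    "--api_server", "api1.example.com:8889",
    "--server", "gearman1.example.com:4730",
    "--server", "gearman2.example.com:4730",
    "--driver", "hp_rest",
    "--driver", "datadog",
])
```

Here ``args.server`` and ``args.driver`` each end up as a two-element list,
while ``args.ping_interval`` and ``args.repair_interval`` keep their defaults
because neither was given on the command line.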

@@ -7,4 +7,5 @@ Load Balancer as a Service Device Tools
   introduction
   worker/index
   pool_mgm/index
   statsd/index
   config

24
doc/statsd/about.rst
Normal file
@@ -0,0 +1,24 @@
Description
===========

Purpose
-------

Libra Statsd is a monitoring system for the health of load balancers. It
can query many load balancers in parallel and supports a pluggable
architecture for different methods of reporting.

Design
------

Statsd currently performs only an advanced "ping" style of monitoring. By
default it gets a list of ONLINE load balancers from the API server and sends
a Gearman message to the worker of each one. The worker tests its own HAProxy
instance and reports success or failure. If the test fails or the Gearman
message times out, the failure is sent to the alerting backends. A further
scheduled run, set to every three minutes, re-tests the failed devices to see
whether they have been repaired. If they have, this triggers a 'repaired'
notice.

Alerting is done using a plugin system which can have multiple plugins enabled
at the same time.
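
The ping and repair runs described above can be sketched roughly as follows.
This is a simplified model, not the statsd implementation: ``ping_cycle``,
``repair_cycle``, ``RecordingDriver`` and ``fake_ping`` are hypothetical
names, the Gearman round trip is replaced by a plain function call, and a
timeout is assumed to surface as Python's built-in ``TimeoutError``.

```python
def ping_cycle(devices, ping, drivers, failed):
    """Ping each ONLINE device and alert every driver on a failure."""
    for device_id in devices:
        try:
            ok = ping(device_id)
        except TimeoutError:
            ok = False  # a timed-out message counts as a failure
        if not ok:
            failed.add(device_id)
            for driver in drivers:
                driver.send_alert("Device failed ping test", device_id)


def repair_cycle(ping, drivers, failed):
    """Re-test failed devices and send a 'repaired' notice on recovery."""
    for device_id in sorted(failed):  # iterate over a copy of the set
        try:
            ok = ping(device_id)
        except TimeoutError:
            ok = False
        if ok:
            failed.discard(device_id)
            for driver in drivers:
                driver.send_repair("Device repaired", device_id)


class RecordingDriver:
    """Hypothetical alerting plugin that records what it is told."""

    def __init__(self):
        self.events = []

    def send_alert(self, message, device_id):
        self.events.append(("alert", device_id))

    def send_repair(self, message, device_id):
        self.events.append(("repair", device_id))


# Simulated worker responses: device 2 is unhealthy at first.
health = {1: True, 2: False}


def fake_ping(device_id):
    return health[device_id]


driver = RecordingDriver()
failed = set()
ping_cycle([1, 2], fake_ping, [driver], failed)   # device 2 fails
health[2] = True                                  # device 2 recovers
repair_cycle(fake_ping, [driver], failed)         # 'repaired' notice sent
```

After both runs, ``driver.events`` holds one alert followed by one repair
notice for device 2, and ``failed`` is empty again. Passing several drivers in
the list models having multiple alerting plugins enabled at once.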
56
doc/statsd/drivers.rst
Normal file
@@ -0,0 +1,56 @@
Statsd Drivers
==============

Introduction
------------

Statsd has a small driver API to be used for alerting. Multiple drivers can
be loaded at the same time to alert in multiple places.

Design
------

The base class called ``AlertDriver`` is used to create new drivers. Each
driver is supplied with ``self.logger`` to use for logging and ``self.args``,
which contains the arguments supplied to statsd. Drivers derived from this
class need to supply two methods:

.. py:class:: AlertDriver

   .. py:method:: send_alert(message, device_id)

      :param message: A message with details of the failure
      :param device_id: The ID of the device that has failed

   .. py:method:: send_repair(message, device_id)

      :param message: A message with details of the recovered load balancer
      :param device_id: The ID of the device that has been recovered


.. py:data:: known_drivers

   This is the dictionary that maps values for the
   :option:`--driver <libra_statsd.py --driver>` option
   to a class implementing the driver :py:class:`~AlertDriver` API
   for the statsd server. After implementing a new driver class, you simply add
   a new entry to this dictionary to make it a selectable option.
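
A minimal sketch of what a new driver and its ``known_drivers`` entry might
look like is shown below. The ``AlertDriver`` class and the ``known_drivers``
dictionary here are local stand-ins so that the example runs on its own; the
constructor signature, the ``LogOnlyDriver`` class and the ``log_only`` key
are assumptions made for illustration and are not taken from the Libra source.

```python
import logging


class AlertDriver(object):
    """Stand-in for the statsd base class (constructor is an assumption)."""

    def __init__(self, logger, args):
        self.logger = logger
        self.args = args

    def send_alert(self, message, device_id):
        raise NotImplementedError

    def send_repair(self, message, device_id):
        raise NotImplementedError


class LogOnlyDriver(AlertDriver):
    """Hypothetical driver that only writes to the supplied logger."""

    def send_alert(self, message, device_id):
        self.logger.error("Device %s failed: %s", device_id, message)

    def send_repair(self, message, device_id):
        self.logger.info("Device %s repaired: %s", device_id, message)


# Stand-in for the real dictionary; the key is what --driver would select.
known_drivers = {"log_only": LogOnlyDriver}

logging.basicConfig(level=logging.INFO)
driver_class = known_drivers["log_only"]
driver = driver_class(logging.getLogger("statsd"), args=None)
driver.send_alert("ping test timed out", 42)
driver.send_repair("ping test passed", 42)
```

Looking the class up by name keeps the selection logic independent of any
particular driver, so adding another alerting backend only requires a new
class and one more dictionary entry.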

Dummy Driver
------------

This driver is used for simple testing and debugging. It echoes the message
details into statsd's log file.

Datadog Driver
--------------

The Datadog driver uses the Datadog API to send alerts into the Datadog event
stream. Alerts are sent as 'ERROR' and repairs as 'SUCCESS'.

HP REST Driver
--------------

This driver sends messages to the HP REST API server to mark nodes as
ERROR or ONLINE.

8
doc/statsd/index.rst
Normal file
@@ -0,0 +1,8 @@
Statsd Monitoring Daemon
========================

.. toctree::
   :maxdepth: 2

   about
   drivers