From e1daeaddbc436093d1bd15b2ef17a24b8c4928f5 Mon Sep 17 00:00:00 2001
From: Andrew Hutchings
Date: Fri, 19 Apr 2013 15:51:25 +0100
Subject: [PATCH] Add some documentation for statsd

Change-Id: Ie93681bec4c12bfda5c3c31d34e3cb02176e252d
---
 doc/config.rst         | 50 +++++++++++++++++++++++++++++++++++--
 doc/index.rst          |  1 +
 doc/statsd/about.rst   | 24 ++++++++++++++++++
 doc/statsd/drivers.rst | 56 ++++++++++++++++++++++++++++++++++++++++++
 doc/statsd/index.rst   |  8 ++++++
 5 files changed, 137 insertions(+), 2 deletions(-)
 create mode 100644 doc/statsd/about.rst
 create mode 100644 doc/statsd/drivers.rst
 create mode 100644 doc/statsd/index.rst

diff --git a/doc/config.rst b/doc/config.rst
index 3a60d05d..1bcb8f33 100644
--- a/doc/config.rst
+++ b/doc/config.rst
@@ -1,5 +1,5 @@
-Configuration of Node Pool Manager and Worker
-=============================================
+Configuration of Services
+=========================
 
 Options can be specified either via the command line, or with a configuration
 file, or both. Options given on the command line will override any options
@@ -273,4 +273,50 @@ Pool Manager Command Line Options
 
       Enable verbose output. Normally, only errors are logged. This enables
       additional logging, but not as much as the :option:`-d` option.
 
+Statsd Command Line Options
+---------------------------
+   .. program:: libra_statsd.py
+
+   .. option:: --api_server
+
+      The hostname/IP and port colon separated for use with the HP REST API
+      driver. Can be specified multiple times for multiple servers. This
+      option is also used for the hp_rest alerting driver.
+
+   .. option:: --server
+
+      Used to specify the Gearman job server hostname and port. This option
+      can be used multiple times to specify multiple job servers
+
+   .. option:: --driver
+
+      The drivers to be used for alerting. This option can be used multiple
+      times to specify multiple drivers.
+
+   .. option:: --ping_interval
+
+      How often to run a ping check of load balancers (in seconds), default 60
+
+   .. option:: --repair_interval
+
+      How often to run a check to see if damaged load balancers have been
+      repaired (in seconds), default 180
+
+   .. option:: --datadog_api_key
+
+      The API key to be used for the datadog driver
+
+   .. option:: --datadog_app_key
+
+      The Application key to be used for the datadog driver
+
+   .. option:: --datadog_message_tail
+
+      Some text to add at the end of an alerting message such as a list of
+      users to alert (using @user@email.com format), used for the datadog
+      driver.
+
+   .. option:: --datadog_tags
+
+      A list of tags to be used for the datadog driver
diff --git a/doc/index.rst b/doc/index.rst
index 5208c02d..b74e495a 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -7,4 +7,5 @@ Load Balancer as a Service Device Tools
    introduction
    worker/index
    pool_mgm/index
+   statsd/index
    config
diff --git a/doc/statsd/about.rst b/doc/statsd/about.rst
new file mode 100644
index 00000000..197491d5
--- /dev/null
+++ b/doc/statsd/about.rst
@@ -0,0 +1,24 @@
+Description
+===========
+
+Purpose
+-------
+
+The Libra Statsd is a monitoring system for the health of load balancers. It
+can query many load balancers in parallel and supports a pluggable architecture
+for different methods of reporting.
+
+Design
+------
+
+Statsd currently only does an advanced "ping" style monitoring. By default it
+will get a list of ONLINE load balancers from the API server and will send a
+gearman message to the worker of each one. The worker tests its own HAProxy
+instance and will report a success/fail. If there is a failure or the gearman
+message times out then this is sent to the alerting backends. There is a
+further scheduled run set to every three minutes which will re-test the failed
+devices to see if they have been repaired. If they have, this will trigger a
+'repaired' notice.
+
+Alerting is done using a plugin system which can have multiple plugins enabled
+at the same time.
diff --git a/doc/statsd/drivers.rst b/doc/statsd/drivers.rst
new file mode 100644
index 00000000..bd12e45f
--- /dev/null
+++ b/doc/statsd/drivers.rst
@@ -0,0 +1,56 @@
+Statsd Drivers
+==============
+
+Introduction
+------------
+
+Statsd has a small driver API to be used for alerting. Multiple drivers can
+be loaded at the same time to alert in multiple places.
+
+Design
+------
+
+The base class called ``AlertDriver`` is used to create new drivers. These
+will be supplied ``self.logger`` to use for logging and ``self.args`` which
+contains the arguments supplied to statsd. Drivers using this need to
+supply two functions:
+
+.. py:class:: AlertDriver
+
+    .. py:method:: send_alert(message, device_id)
+
+        :param message: A message with details of the failure
+        :param device_id: The ID of the device that has failed
+
+    .. py:method:: send_repair(message, device_id)
+
+        :param message: A message with details of the recovered load balancer
+        :param device_id: The ID of the device that has been recovered
+
+
+.. py:data:: known_drivers
+
+    This is the dictionary that maps values for the
+    :option:`--driver ` option
+    to a class implementing the driver :py:class:`~AlertDriver` API
+    for the statsd server. After implementing a new driver class, you simply add
+    a new entry to this dictionary to make it a selectable option.
+
+Dummy Driver
+------------
+
+This driver is used for simple testing/debugging. It echoes the message details
+into statsd's log file.
+
+Datadog Driver
+--------------
+
+The Datadog driver uses the Datadog API to send alerts into the Datadog event
+stream. Alerts are sent as 'ERROR' and repairs as 'SUCCESS'.
+
+HP REST Driver
+--------------
+
+This sends messages to the HP REST API server to mark nodes as ERROR/ONLINE.
+
+
diff --git a/doc/statsd/index.rst b/doc/statsd/index.rst
new file mode 100644
index 00000000..7b3e0282
--- /dev/null
+++ b/doc/statsd/index.rst
@@ -0,0 +1,8 @@
+Statsd Monitoring Daemon
+========================
+
+.. toctree::
+   :maxdepth: 2
+
+   about
+   drivers
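
The driver API documented in doc/statsd/drivers.rst could be exercised with a
sketch like the following. It is an illustration only, not code from the Libra
tree: the ``AlertDriver`` stand-in, its constructor signature, the
``ListDriver`` class, the ``"list"`` key and the ``load_drivers`` helper are
all assumptions made for this example. Only the method names ``send_alert``
and ``send_repair``, the ``self.logger`` and ``self.args`` attributes, and the
``known_drivers`` dictionary are taken from the documentation in this patch.

```python
import logging


class AlertDriver:
    """Stand-in for the documented base class.

    Assumption: the real constructor is not shown in the patch; this one
    simply stores the logger and the parsed arguments.
    """

    def __init__(self, logger, args):
        self.logger = logger
        self.args = args

    def send_alert(self, message, device_id):
        raise NotImplementedError

    def send_repair(self, message, device_id):
        raise NotImplementedError


class ListDriver(AlertDriver):
    """Hypothetical driver that logs each notice and keeps it in memory."""

    def __init__(self, logger, args):
        super().__init__(logger, args)
        self.events = []

    def send_alert(self, message, device_id):
        self.logger.error("device %s failed: %s", device_id, message)
        self.events.append(("ERROR", device_id, message))

    def send_repair(self, message, device_id):
        self.logger.info("device %s repaired: %s", device_id, message)
        self.events.append(("REPAIRED", device_id, message))


# Maps a --driver value to the class implementing it.
# Assumption: the "list" key is hypothetical and exists only in this sketch.
known_drivers = {"list": ListDriver}


def load_drivers(names, logger, args):
    """Instantiate one driver per requested name (hypothetical helper)."""
    return [known_drivers[name](logger, args) for name in names]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    drivers = load_drivers(["list"], logging.getLogger("statsd-sketch"), None)
    for driver in drivers:
        driver.send_alert("ping timed out", 42)
        driver.send_repair("ping succeeded", 42)
```

Under these assumptions, passing several names to ``load_drivers`` yields one
instance per name, which mirrors the documented ability to load multiple
drivers at the same time.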