Create dropwizard plugin for collecting metrics via http

Change-Id: I51fc4f2c2a50a84e8d75cbc8f47c8a4e78b2e4d4
Ryan Brandt
2015-06-09 15:51:09 -06:00
parent 2f096ae58c
commit 38b5ca8384
5 changed files with 300 additions and 58 deletions

@@ -0,0 +1,72 @@
init_config:
instances:
# - name: Some Service Name
# url: http://some.url.example.com
# timeout: 1
# If your service uses basic authentication, you can optionally
# specify a username and password that will be used in the check.
# username: user
# password: pass
# If your service uses keystone for authentication, you can optionally
# specify the information to collect a token to be used in the check.
# This information should follow the same guidelines presented in
# agent.yaml.template
# https://github.com/stackforge/monasca-agent/blob/master/agent.yaml.template
# If use_keystone=True and keystone_config is not specified, the keystone information
# from the agent config will be used.
# use_keystone: True
# keystone_config:
# keystone_url: http://endpoint.com/v3/
# username: user
# password: password
# The (optional) collect_response_time parameter will instruct the
# check to create a metric 'network.http.response_time', tagged with
# the url, reporting the response time in seconds.
# collect_response_time: true
# The (optional) disable_ssl_validation will instruct the check
# to skip the validation of the SSL certificate of the URL being tested.
# This is mostly useful when checking SSL connections signed with
# certificates that are not themselves signed by a public authority.
# When true, the check logs a warning in collector.log
# disable_ssl_validation: true
# The (optional) headers parameter allows you to send extra headers
# with the request. This is useful for explicitly specifying the host
# header or perhaps adding headers for authorisation purposes. Note
# that the http client library converts all headers to lowercase.
# This is legal according to RFC2616
# (See: http://tools.ietf.org/html/rfc2616#section-4.2)
# but may be problematic with some HTTP servers
# (See: https://code.google.com/p/httplib2/issues/detail?id=169)
# headers:
# Host: alternative.host.example.com
# X-Auth-Token: SOME-AUTH-TOKEN
# dimensions:
# dim1: value1
# To select which metrics to record, create a whitelist. Each entry in
# the whitelist should include the name you want to give the metric,
# the path to the metric value in the json (as a series of keys
# separated by '/'), and the type of recording to use (counter, gauge,
# rate, histogram, set). See the Plugins documentation about
# http_metrics for more information about the different types.
# whitelist:
# - name: jvm.memory.total.used
# path: gauges/jvm.memory.total.used/value
# type: gauge
# - name: metrics.published
# path: meters/monasca.api.app.MetricService.metrics.published/count
# type: rate
# - name: raw-sql.time.avg
# path: timers/org.skife.jdbi.v2.DBI.raw-sql/mean
# type: gauge

@@ -19,6 +19,7 @@
- [Host Alive Checks](#host-alive-checks)
- [Process Checks](#process-checks)
- [Http Endpoint Checks](#http-endpoint-checks)
- [Http Metrics](#http-metrics)
- [MySQL Checks](#mysql-checks)
- [ZooKeeper Checks](#zookeeper-checks)
- [Kafka Checks](#kafka-checks)
@@ -523,7 +524,7 @@ The process checks return the following metrics:
## Http Endpoint Checks
This section describes the http endpoint check that can be performed by the Agent. Http endpoint checks are checks that perform simple up/down checks on services, such as HTTP/REST APIs. An agent, given a list of URLs can dispatch an http request and report to the API success/failure as a metric.
This section describes the http endpoint check that can be performed by the Agent. Http endpoint checks are checks that perform simple up/down checks on services, such as HTTP/REST APIs. An agent, given a list of URLs, can dispatch an http request and report to the API success/failure as a metric.
default dimensions:
url: endpoint
@@ -556,6 +557,31 @@ The http_status checks return the following metrics:
| http_status | url, detail | The status of the http endpoint call (0 = success, 1 = failure)
| http_response_time | url | The response time of the http endpoint call
## Http Metrics
This section describes the http metrics check that can be performed by the Agent. Http metrics checks retrieve metrics from any URL that returns a JSON-formatted response. An agent, given a list of URLs, can dispatch an http request and parse the desired metrics from the JSON response.
default dimensions:
url: endpoint
default value_meta:
error: error_message
Similar to other checks, the configuration is done in YAML (http_metrics.yaml) and consists of two keys: init_config and instances. The former is not used by http_metrics, while the latter contains one or more URLs to check, plus optional parameters such as a timeout, a username/password, whether to also record the response time, and a whitelist of metrics to collect. The whitelist should contain a name, path, and type for each metric to be collected. The name is what the metric will be called when it is reported. The path is a string of keys separated by '/' that locates the metric value in the json response. The type is how you want the metric to be recorded (gauge, counter, histogram, rate, set). A gauge will store and report the value it finds with no modifications. A counter will increment itself by the value it finds. A histogram will store values and return the calculated max, median, average, count, and percentiles. A rate will return the difference between the last two recorded samples divided by the interval between those samples in seconds. A set will record samples and return the number of unique values in the set.
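
As a rough illustration of the rate type described above, the following sketch computes a per-second rate from two samples. The function name `rate` and the `(timestamp, value)` tuple layout are assumptions made for this example only; the Agent's own aggregation code may differ, for instance in how it handles counter resets.

```python
def rate(prev_sample, curr_sample):
    """Return the per-second rate between two (timestamp, value) samples.

    Hypothetical helper for illustration; not part of the Agent API.
    """
    (t0, v0), (t1, v1) = prev_sample, curr_sample
    interval = t1 - t0
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")
    # Difference between the two samples divided by the elapsed seconds.
    return (v1 - v0) / float(interval)


# Two readings of an increasing count, taken 30 seconds apart.
print(rate((1000.0, 1200), (1030.0, 1500)))  # 10.0
```

This is why a value such as a Dropwizard meter's `count`, which only ever grows, is usually recorded as a rate rather than as a gauge.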
If the endpoint being checked requires authentication, there are two options. First, a username and password supplied in the instance options will be used by the check for authentication. Alternatively, the check can retrieve a keystone token for authentication. Specific keystone information can be provided for each check; otherwise, the information from the agent config will be used.
```
init_config:
instances:
  - url: http://192.168.0.254/metrics
    timeout: 1
    collect_response_time: true
    whitelist:
      - name: jvm.memory.total.max
        path: gauges/jvm.memory.total.max/value
        type: gauge
```
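
To make the whitelist `path` concrete, here is a minimal sketch of how a '/'-separated path can be resolved against a parsed json response. The helper name `resolve_path` and the sample payload (including its value) are made up for illustration; the check itself walks the keys in a similar way but is not guaranteed to match this code.

```python
import json


def resolve_path(data, path):
    """Walk a '/'-separated path through nested dicts.

    Hypothetical helper for illustration. Returns None when any key
    along the path is missing.
    """
    current = data
    for key in path.split('/'):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Made-up response in the shape a Dropwizard /metrics endpoint returns.
sample = json.loads('''
{
  "gauges": {"jvm.memory.total.max": {"value": 1037959168}},
  "meters": {}
}
''')

print(resolve_path(sample, "gauges/jvm.memory.total.max/value"))  # 1037959168
print(resolve_path(sample, "gauges/missing/value"))               # None
```

Because '/' is the separator, a metric whose json key itself contains a '/' cannot be addressed with this scheme. Dots are safe, since the whole string between two separators is treated as a single key.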
## MySQL Checks
This section describes the mySQL check that can be performed by the Agent. The mySQL check requires a configuration file called mysql.yaml to be available in the agent conf.d configuration directory.

@@ -21,7 +21,7 @@ class HTTPCheck(services_checks.ServicesCheck):
super(HTTPCheck, self).__init__(name, init_config, agent_config, instances)
@staticmethod
def _load_conf(instance):
def _load_http_conf(instance):
# Fetches the conf
username = instance.get('username', None)
password = instance.get('password', None)
@@ -31,29 +31,26 @@ class HTTPCheck(services_checks.ServicesCheck):
keystone_config = instance.get('keystone_config', None)
url = instance.get('url', None)
response_time = instance.get('collect_response_time', False)
pattern = instance.get('match_pattern', None)
if url is None:
raise Exception("Bad configuration. You must specify a url")
ssl = instance.get('disable_ssl_validation', True)
return url, username, password, timeout, headers, response_time, ssl, pattern, use_keystone, keystone_config
return url, username, password, timeout, headers, response_time, ssl, use_keystone, keystone_config
def _create_status_event(self, status, msg, instance):
"""Does nothing: status events are not yet supported by Mon API.
"""
return
def _check(self, instance):
addr, username, password, timeout, headers, response_time, disable_ssl_validation, pattern, use_keystone, keystone_config = self._load_conf(
def _http_check(self, instance):
addr, username, password, timeout, headers, response_time, disable_ssl_validation, use_keystone, keystone_config = self._load_http_conf(
instance)
config = cfg.Config()
api_config = config.get_config('Api')
content = ''
dimensions = self._set_dimensions({'url': addr}, instance)
start = time.time()
done = False
retry = False
while not done or retry:
@@ -67,9 +64,10 @@ class HTTPCheck(services_checks.ServicesCheck):
headers["X-Auth-Token"] = token
headers["Content-type"] = "application/json"
else:
self.log.warning("""Unable to get token. Keystone API server may be down.
Skipping check for {0}""".format(addr))
return
error_string = """Unable to get token. Keystone API server may be down.
Skipping check for {0}""".format(addr)
self.log.warning(error_string)
return False, error_string
try:
self.log.debug("Connecting to %s" % addr)
if disable_ssl_validation:
@@ -84,31 +82,19 @@ class HTTPCheck(services_checks.ServicesCheck):
length = int((time.time() - start) * 1000)
error_string = '{0} is DOWN, error: {1}. Connection failed after {2} ms'.format(addr, str(e), length)
self.log.info(error_string)
self.gauge('http_status',
1,
dimensions=dimensions,
value_meta={'error': error_string})
return services_checks.Status.DOWN, error_string
return False, error_string
except httplib.ResponseNotReady as e:
length = int((time.time() - start) * 1000)
error_string = '{0} is DOWN, error: {1}. Network is not routable after {2} ms'.format(addr, repr(e), length)
self.log.info(error_string)
self.gauge('http_status',
1,
dimensions=dimensions,
value_meta={'error': error_string})
return services_checks.Status.DOWN, error_string
return False, error_string
except Exception as e:
length = int((time.time() - start) * 1000)
error_string = '{0} is DOWN, error: {1}. Connection failed after {2} ms'.format(addr, str(e), length)
self.log.error('Unhandled exception {0}. Connection failed after {1} ms'.format(str(e), length))
self.gauge('http_status',
1,
dimensions=dimensions,
value_meta={'error': error_string})
return services_checks.Status.DOWN, error_string
return False, error_string
if response_time:
# Stop the timer as early as possible
@@ -120,7 +106,7 @@ class HTTPCheck(services_checks.ServicesCheck):
if retry:
error_string = '{0} is DOWN, unable to get a valid token to connect with'.format(addr)
self.log.error(error_string)
return services_checks.Status.DOWN, error_string
return False, error_string
else:
# Get a new token and retry
self.log.info("Token expired, getting new token and retrying...")
@@ -130,26 +116,37 @@ class HTTPCheck(services_checks.ServicesCheck):
else:
error_string = '{0} is DOWN, error code: {1}'.format(addr, str(resp.status))
self.log.info(error_string)
self.gauge('http_status',
1,
dimensions=dimensions,
value_meta={'error': error_string})
return services_checks.Status.DOWN, error_string
if pattern is not None:
if re.search(pattern, content, re.DOTALL):
self.log.debug("Pattern match successful")
else:
error_string = 'Pattern match failed! "{0}" not in "{1}"'.format(pattern, content)
self.log.info(error_string)
self.gauge('http_status',
1,
dimensions=dimensions,
value_meta={'error': error_string})
return services_checks.Status.DOWN, error_string
success_string = '{0} is UP'.format(addr)
self.log.debug(success_string)
self.gauge('http_status', 0, dimensions=dimensions)
return False, error_string
done = True
return services_checks.Status.UP, success_string
return True, content
def _check(self, instance):
content = ''
addr = instance.get("url", None)
pattern = instance.get('match_pattern', None)
dimensions = self._set_dimensions({'url': addr}, instance)
success, result_string = self._http_check(instance)
if not success:
self.gauge('http_status',
1,
dimensions=dimensions)
return services_checks.Status.DOWN, result_string
if pattern is not None:
if re.search(pattern, result_string, re.DOTALL):
self.log.debug("Pattern match successful")
else:
error_string = 'Pattern match failed! "{0}" not in "{1}"'.format(pattern, result_string)
self.log.info(error_string)
self.gauge('http_status',
1,
dimensions=dimensions,
value_meta={'error': error_string})
return services_checks.Status.DOWN, error_string
success_string = '{0} is UP'.format(addr)
self.log.debug(success_string)
self.gauge('http_status', 0, dimensions=dimensions)
return services_checks.Status.UP, success_string

@@ -0,0 +1,77 @@
#!/bin/env python
"""Monitoring Agent plugin for HTTP/API checks.
"""
import json
from numbers import Number
import monasca_agent.collector.checks.services_checks as services_checks
import monasca_agent.collector.checks_d.http_check as http_check
class HTTPMetrics(http_check.HTTPCheck):
def __init__(self, name, init_config, agent_config, instances=None):
super(HTTPMetrics, self).__init__(name, init_config, agent_config,
instances)
self.metric_method = {
'gauge': self.gauge,
'counter': self.increment,
'histogram': self.histogram,
'set': self.set,
'rate': self.rate}
def _valid_number(self, value, name):
if not isinstance(value, Number):
self.log.info("Value '{0}' is not a number for metric {1}".format(
value, name))
return False
return True
def _check(self, instance):
addr = instance.get("url", None)
whitelist = instance.get("whitelist", None)
dimensions = self._set_dimensions({'url': addr}, instance)
success, result_string = self._http_check(instance)
if success:
json_data = json.loads(result_string)
for metric in whitelist:
try:
metric_name = metric['name']
metric_type = metric['type']
keys = metric['path'].split('/')
except Exception:
self.log.warning("Invalid configuration for metric '{0}'".format(metric))
continue
current = json_data
try:
for key in keys:
current = current[key]
except Exception:
self.log.warning("Could not find a value at {0} in json message".format(keys))
continue
value = current
# everything requires a number, except set
if metric_type in ['gauge', 'counter', 'histogram', 'rate']:
if not self._valid_number(value, metric_name):
self.log.warning("Invalid value '{0}' for metric '{1}'".format(value, metric_name))
continue
if metric_type in self.metric_method:
self.metric_method[metric_type](metric_name,
value,
dimensions=dimensions)
else:
self.log.warning("Unrecognized type '{0}' for metric '{1}'".format(metric_type, metric_name))
success_string = '{0} is UP'.format(addr)
self.log.debug(success_string)
return services_checks.Status.UP, success_string

@@ -39,11 +39,34 @@ class MonAPI(monasca_setup.detection.Plugin):
"""Build the config as a Plugins object and return."""
log.info("\tEnabling the Monasca api healthcheck")
admin_port = self.api_config['server']['adminConnectors'][0]['port']
return dropwizard_health_check('monitoring', 'api', 'http://localhost:{0}/healthcheck'.format(admin_port))
config = monasca_setup.agent_config.Plugins()
config.merge(dropwizard_health_check('monitoring', 'api', 'http://localhost:8081/healthcheck'))
# todo
# log.info("\tEnabling the mon api metric collection")
# http://localhost:8081/metrics
log.info("\tEnabling the Monasca api metrics")
whitelist = [
{
"name": "jvm.memory.total.max",
"path": "gauges/jvm.memory.total.max/value",
"type": "gauge"},
{
"name": "jvm.memory.total.used",
"path": "gauges/jvm.memory.total.used/value",
"type": "gauge"},
{
"name": "metrics.published",
"path": "meters/monasca.api.app.MetricService.metrics.published/count",
"type": "rate"},
{
"name": "raw-sql.time.avg",
"path": "timers/org.skife.jdbi.v2.DBI.raw-sql/mean",
"type": "gauge"},
{
"name": "raw-sql.time.max",
"path": "timers/org.skife.jdbi.v2.DBI.raw-sql/max",
"type": "gauge"},
]
config.merge(dropwizard_metrics('monitoring', 'api', 'http://localhost:8081/metrics', whitelist))
return config
def dependencies_installed(self):
return True
@@ -75,11 +98,46 @@ class MonPersister(monasca_setup.detection.Plugin):
def build_config(self):
"""Build the config as a Plugins object and return."""
log.info("\tEnabling the Monasca persister healthcheck")
return dropwizard_health_check('monitoring', 'persister', 'http://localhost:8091/healthcheck')
config = monasca_setup.agent_config.Plugins()
config.merge(dropwizard_health_check('monitoring', 'persister', 'http://localhost:8091/healthcheck'))
# todo
# log.info("\tEnabling the mon persister metric collection")
# http://localhost:8091/metrics
log.info("\tEnabling the Monasca persister metrics")
whitelist = [
{
"name": "jvm.memory.total.max",
"path": "gauges/jvm.memory.total.max/value",
"type": "gauge"},
{
"name": "jvm.memory.total.used",
"path": "gauges/jvm.memory.total.used/value",
"type": "gauge"},
{
"name": "alarm-state-transitions-added-to-batch-counter[0]",
"path": "counters/monasca.persister.pipeline.event.AlarmStateTransitionHandler[alarm-state-transition-0].alarm-state-transitions-added-to-batch-counter/count",
"type": "rate"},
{
"name": "alarm-state-transitions-added-to-batch-counter[1]",
"path": "counters/monasca.persister.pipeline.event.AlarmStateTransitionHandler[alarm-state-transition-1].alarm-state-transitions-added-to-batch-counter/count",
"type": "rate"},
{
"name": "metrics-added-to-batch-counter[0]",
"path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-0].metrics-added-to-batch-counter/count",
"type": "rate"},
{
"name": "metrics-added-to-batch-counter[1]",
"path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-1].metrics-added-to-batch-counter/count",
"type": "rate"},
{
"name": "metrics-added-to-batch-counter[2]",
"path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-2].metrics-added-to-batch-counter/count",
"type": "rate"},
{
"name": "metrics-added-to-batch-counter[3]",
"path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-3].metrics-added-to-batch-counter/count",
"type": "rate"}
]
config.merge(dropwizard_metrics('monitoring', 'persister', 'http://localhost:8091/metrics', whitelist))
return config
def dependencies_installed(self):
return True
@@ -115,3 +173,15 @@ def dropwizard_health_check(service, component, url):
'include_content': False,
'dimensions': {'service': service, 'component': component}}]}
return config
def dropwizard_metrics(service, component, url, whitelist):
"""Setup a dropwizard metrics check"""
config = monasca_setup.agent_config.Plugins()
config['http_metrics'] = {'init_config': None,
'instances': [{'name': "{0}-{1} metrics".format(service, component),
'url': url,
'timeout': 1,
'dimensions': {'service': service, 'component': component},
'whitelist': whitelist}]}
return config