Create dropwizard plugin for collecting metrics via http
Change-Id: I51fc4f2c2a50a84e8d75cbc8f47c8a4e78b2e4d4
72
conf.d/http_metrics.yaml.example
Normal file
@@ -0,0 +1,72 @@
init_config:

instances:
#    -   name: Some Service Name
#        url: http://some.url.example.com
#        timeout: 1

#        If your service uses basic authentication, you can optionally
#        specify a username and password that will be used in the check.
#        username: user
#        password: pass

#        If your service uses keystone for authentication, you can optionally
#        specify the information to collect a token to be used in the check.
#        This information should follow the same guidelines presented in
#        agent.yaml.template
#        https://github.com/stackforge/monasca-agent/blob/master/agent.yaml.template
#        If use_keystone is True and keystone_config is not specified, the keystone
#        information from the agent config will be used.
#        use_keystone: True
#        keystone_config:
#            keystone_url: http://endpoint.com/v3/
#            username: user
#            password: password

#        The (optional) collect_response_time parameter will instruct the
#        check to create a metric 'network.http.response_time', tagged with
#        the url, reporting the response time in seconds.

#        collect_response_time: true

#        The (optional) disable_ssl_validation will instruct the check
#        to skip the validation of the SSL certificate of the URL being tested.
#        This is mostly useful when checking SSL connections signed with
#        certificates that are not themselves signed by a public authority.
#        When true, the check logs a warning in collector.log

#        disable_ssl_validation: true

#        The (optional) headers parameter allows you to send extra headers
#        with the request. This is useful for explicitly specifying the host
#        header or perhaps adding headers for authorisation purposes. Note
#        that the http client library converts all headers to lowercase.
#        This is legal according to RFC2616
#        (See: http://tools.ietf.org/html/rfc2616#section-4.2)
#        but may be problematic with some HTTP servers
#        (See: https://code.google.com/p/httplib2/issues/detail?id=169)

#        headers:
#            Host: alternative.host.example.com
#            X-Auth-Token: SOME-AUTH-TOKEN

#        dimensions:
#            dim1: value1

#        To select which metrics to record, create a whitelist. Each entry in
#        the whitelist should include the name you want to give the metric,
#        the path to the metric value in the json (as a series of keys
#        separated by '/'), and the type of recording to use (counter, gauge,
#        rate, histogram, set). See the Plugins documentation about
#        http_metrics for more information about the different types.

#        whitelist:
#            -   name: jvm.memory.total.used
#                path: gauges/jvm.memory.total.used/value
#                type: gauge
#            -   name: metrics.published
#                path: meters/monasca.api.app.MetricService.metrics.published/count
#                type: rate
#            -   name: raw-sql.time.avg
#                path: timers/org.skife.jdbi.v2.DBI.raw-sql/mean
#                type: gauge
||||
@@ -19,6 +19,7 @@
|
||||
- [Host Alive Checks](#host-alive-checks)
|
||||
- [Process Checks](#process-checks)
|
||||
- [Http Endpoint Checks](#http-endpoint-checks)
|
||||
- [Http Metrics](#http-metrics)
|
||||
- [MySQL Checks](#mysql-checks)
|
||||
- [ZooKeeper Checks](#zookeeper-checks)
|
||||
- [Kafka Checks](#kafka-checks)
|
||||
@@ -523,7 +524,7 @@ The process checks return the following metrics:
|
||||
|
||||
|
||||
## Http Endpoint Checks
|
||||
This section describes the http endpoint check that can be performed by the Agent. Http endpoint checks are checks that perform simple up/down checks on services, such as HTTP/REST APIs. An agent, given a list of URLs can dispatch an http request and report to the API success/failure as a metric.
|
||||
This section describes the http endpoint check that can be performed by the Agent. Http endpoint checks are checks that perform simple up/down checks on services, such as HTTP/REST APIs. An agent, given a list of URLs, can dispatch an http request and report to the API success/failure as a metric.
|
||||
|
||||
default dimensions:
|
||||
url: endpoint
|
||||
@@ -556,6 +557,31 @@ The http_status checks return the following metrics:
|
||||
| http_status | url, detail | The status of the http endpoint call (0 = success, 1 = failure)
|
||||
| http_response_time | url | The response time of the http endpoint call
|
||||
|
||||
|
||||
## Http Metrics
This section describes the http metrics check that can be performed by the Agent. Http metrics checks retrieve metrics from any URL that returns a JSON-formatted response. An agent, given a list of URLs, can dispatch an http request and parse the desired metrics from the JSON response.

default dimensions:
    url: endpoint

default value_meta:
    error: error_message

Similar to other checks, the configuration is done in YAML (http_metrics.yaml), and consists of two keys: init_config and instances. The former is not used by http_metrics, while the latter contains one or more URLs to check, plus optional parameters like a timeout, username/password, whether or not to also record the response time, and a whitelist of metrics to collect.

The whitelist should consist of a name, path, and type for each metric to be collected. The name is what the metric will be called when it is reported. The path is a string of keys separated by '/' that locates the metric value in the JSON response. The type is how you want the metric to be recorded:

- gauge: stores and reports the value it finds with no modifications.
- counter: increments itself by the value it finds.
- histogram: stores values and returns the calculated max, median, average, count, and percentiles.
- rate: returns the difference between the last two recorded samples divided by the interval between those samples in seconds.
- set: records samples and returns the number of unique values in the set.

If the endpoint being checked requires authentication, there are two options. First, a username and password supplied in the instance options will be used by the check for authentication. Alternatively, the check can retrieve a keystone token for authentication. Specific keystone information can be provided for each check; otherwise the information from the agent config will be used.
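
The arithmetic behind the rate type described above can be sketched as follows. This only illustrates the formula and is not the Agent's implementation; the helper name `rate_between` and the sample values are hypothetical.

```python
def rate_between(previous, current):
    """Return the per-second rate between two (timestamp_seconds, value) samples."""
    prev_ts, prev_value = previous
    cur_ts, cur_value = current
    interval = cur_ts - prev_ts
    if interval <= 0:
        # Samples must be ordered in time, otherwise the rate is undefined.
        raise ValueError("samples must be in increasing time order")
    return (cur_value - prev_value) / float(interval)


# Two hypothetical samples of a monotonically increasing count, 15 seconds apart.
rate = rate_between((100.0, 3000), (115.0, 3300))
```

This is why a cumulative value such as a Dropwizard meter `count` is a natural fit for `type: rate`, while a point-in-time value such as memory usage fits `type: gauge`.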

```
init_config:

instances:
    - url: http://192.168.0.254/metrics
      timeout: 1
      collect_response_time: true
      whitelist:
          - name: jvm.memory.total.max
            path: gauges/jvm.memory.total.max/value
            type: gauge
```
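
To make the whitelist `path` concrete, here is a minimal sketch of how a '/'-separated path can be resolved against a decoded JSON response. The helper name `resolve_path` and the sample payload are assumptions made for illustration; the payload is only shaped like a Dropwizard metrics response.

```python
import json


def resolve_path(document, path):
    """Follow a '/'-separated key path through nested dicts.

    Returns None when any key along the path is missing.
    """
    current = document
    for key in path.split('/'):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical payload; real endpoints will return many more entries.
SAMPLE_RESPONSE = json.loads("""
{
  "gauges": {
    "jvm.memory.total.max": {"value": 1024},
    "jvm.memory.total.used": {"value": 512}
  }
}
""")

value = resolve_path(SAMPLE_RESPONSE, "gauges/jvm.memory.total.max/value")
```

Because the separator is '/', keys that contain dots (such as `jvm.memory.total.max`) are matched as a single key. A key that itself contains '/' cannot be addressed with this scheme.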
|
||||
|
||||
## MySQL Checks
|
||||
This section describes the mySQL check that can be performed by the Agent. The mySQL check requires a configuration file called mysql.yaml to be available in the agent conf.d configuration directory.
|
||||
|
||||
@@ -21,7 +21,7 @@ class HTTPCheck(services_checks.ServicesCheck):
|
||||
super(HTTPCheck, self).__init__(name, init_config, agent_config, instances)
|
||||
|
||||
@staticmethod
|
||||
def _load_conf(instance):
|
||||
def _load_http_conf(instance):
|
||||
# Fetches the conf
|
||||
username = instance.get('username', None)
|
||||
password = instance.get('password', None)
|
||||
@@ -31,29 +31,26 @@ class HTTPCheck(services_checks.ServicesCheck):
|
||||
keystone_config = instance.get('keystone_config', None)
|
||||
url = instance.get('url', None)
|
||||
response_time = instance.get('collect_response_time', False)
|
||||
pattern = instance.get('match_pattern', None)
|
||||
if url is None:
|
||||
raise Exception("Bad configuration. You must specify a url")
|
||||
ssl = instance.get('disable_ssl_validation', True)
|
||||
|
||||
return url, username, password, timeout, headers, response_time, ssl, pattern, use_keystone, keystone_config
|
||||
return url, username, password, timeout, headers, response_time, ssl, use_keystone, keystone_config
|
||||
|
||||
def _create_status_event(self, status, msg, instance):
|
||||
"""Does nothing: status events are not yet supported by Mon API.
|
||||
|
||||
"""
|
||||
return
|
||||
|
||||
def _check(self, instance):
|
||||
addr, username, password, timeout, headers, response_time, disable_ssl_validation, pattern, use_keystone, keystone_config = self._load_conf(
|
||||
def _http_check(self, instance):
|
||||
addr, username, password, timeout, headers, response_time, disable_ssl_validation, use_keystone, keystone_config = self._load_http_conf(
|
||||
instance)
|
||||
config = cfg.Config()
|
||||
api_config = config.get_config('Api')
|
||||
content = ''
|
||||
|
||||
dimensions = self._set_dimensions({'url': addr}, instance)
|
||||
|
||||
start = time.time()
|
||||
|
||||
done = False
|
||||
retry = False
|
||||
while not done or retry:
|
||||
@@ -67,9 +64,10 @@ class HTTPCheck(services_checks.ServicesCheck):
|
||||
headers["X-Auth-Token"] = token
|
||||
headers["Content-type"] = "application/json"
|
||||
else:
|
||||
self.log.warning("""Unable to get token. Keystone API server may be down.
|
||||
Skipping check for {0}""".format(addr))
|
||||
return
|
||||
error_string = """Unable to get token. Keystone API server may be down.
|
||||
Skipping check for {0}""".format(addr)
|
||||
self.log.warning(error_string)
|
||||
return False, error_string
|
||||
try:
|
||||
self.log.debug("Connecting to %s" % addr)
|
||||
if disable_ssl_validation:
|
||||
@@ -84,31 +82,19 @@ class HTTPCheck(services_checks.ServicesCheck):
|
||||
length = int((time.time() - start) * 1000)
|
||||
error_string = '{0} is DOWN, error: {1}. Connection failed after {2} ms'.format(addr, str(e), length)
|
||||
self.log.info(error_string)
|
||||
self.gauge('http_status',
|
||||
1,
|
||||
dimensions=dimensions,
|
||||
value_meta={'error': error_string})
|
||||
return services_checks.Status.DOWN, error_string
|
||||
return False, error_string
|
||||
|
||||
except httplib.ResponseNotReady as e:
|
||||
length = int((time.time() - start) * 1000)
|
||||
error_string = '{0} is DOWN, error: {1}. Network is not routable after {2} ms'.format(addr, repr(e), length)
|
||||
self.log.info(error_string)
|
||||
self.gauge('http_status',
|
||||
1,
|
||||
dimensions=dimensions,
|
||||
value_meta={'error': error_string})
|
||||
return services_checks.Status.DOWN, error_string
|
||||
return False, error_string
|
||||
|
||||
except Exception as e:
|
||||
length = int((time.time() - start) * 1000)
|
||||
error_string = '{0} is DOWN, error: {1}. Connection failed after {2} ms'.format(addr, str(e), length)
|
||||
self.log.error('Unhandled exception {0}. Connection failed after {1} ms'.format(str(e), length))
|
||||
self.gauge('http_status',
|
||||
1,
|
||||
dimensions=dimensions,
|
||||
value_meta={'error': error_string})
|
||||
return services_checks.Status.DOWN, error_string
|
||||
return False, error_string
|
||||
|
||||
if response_time:
|
||||
# Stop the timer as early as possible
|
||||
@@ -120,7 +106,7 @@ class HTTPCheck(services_checks.ServicesCheck):
|
||||
if retry:
|
||||
error_string = '{0} is DOWN, unable to get a valid token to connect with'.format(addr)
|
||||
self.log.error(error_string)
|
||||
return services_checks.Status.DOWN, error_string
|
||||
return False, error_string
|
||||
else:
|
||||
# Get a new token and retry
|
||||
self.log.info("Token expired, getting new token and retrying...")
|
||||
@@ -130,26 +116,37 @@ class HTTPCheck(services_checks.ServicesCheck):
|
||||
else:
|
||||
error_string = '{0} is DOWN, error code: {1}'.format(addr, str(resp.status))
|
||||
self.log.info(error_string)
|
||||
self.gauge('http_status',
|
||||
1,
|
||||
dimensions=dimensions,
|
||||
value_meta={'error': error_string})
|
||||
return services_checks.Status.DOWN, error_string
|
||||
|
||||
if pattern is not None:
|
||||
if re.search(pattern, content, re.DOTALL):
|
||||
self.log.debug("Pattern match successful")
|
||||
else:
|
||||
error_string = 'Pattern match failed! "{0}" not in "{1}"'.format(pattern, content)
|
||||
self.log.info(error_string)
|
||||
self.gauge('http_status',
|
||||
1,
|
||||
dimensions=dimensions,
|
||||
value_meta={'error': error_string})
|
||||
return services_checks.Status.DOWN, error_string
|
||||
|
||||
success_string = '{0} is UP'.format(addr)
|
||||
self.log.debug(success_string)
|
||||
self.gauge('http_status', 0, dimensions=dimensions)
|
||||
return False, error_string
|
||||
done = True
|
||||
return services_checks.Status.UP, success_string
|
||||
return True, content
|
||||
|
||||
def _check(self, instance):
|
||||
content = ''
|
||||
addr = instance.get("url", None)
|
||||
pattern = instance.get('match_pattern', None)
|
||||
|
||||
dimensions = self._set_dimensions({'url': addr}, instance)
|
||||
|
||||
success, result_string = self._http_check(instance)
|
||||
if not success:
|
||||
self.gauge('http_status',
|
||||
1,
|
||||
dimensions=dimensions)
|
||||
return services_checks.Status.DOWN, result_string
|
||||
|
||||
if pattern is not None:
|
||||
if re.search(pattern, result_string, re.DOTALL):
|
||||
self.log.debug("Pattern match successful")
|
||||
else:
|
||||
error_string = 'Pattern match failed! "{0}" not in "{1}"'.format(pattern, result_string)
|
||||
self.log.info(error_string)
|
||||
self.gauge('http_status',
|
||||
1,
|
||||
dimensions=dimensions,
|
||||
value_meta={'error': error_string})
|
||||
return services_checks.Status.DOWN, error_string
|
||||
|
||||
success_string = '{0} is UP'.format(addr)
|
||||
self.log.debug(success_string)
|
||||
self.gauge('http_status', 0, dimensions=dimensions)
|
||||
return services_checks.Status.UP, success_string
|
||||
|
||||
77
monasca_agent/collector/checks_d/http_metrics.py
Normal file
@@ -0,0 +1,77 @@
|
||||
#!/usr/bin/env python
|
||||
"""Monitoring Agent plugin for collecting metrics from HTTP endpoints that return JSON.
|
||||
|
||||
"""
|
||||
|
||||
import json
|
||||
from numbers import Number
|
||||
|
||||
import monasca_agent.collector.checks.services_checks as services_checks
|
||||
import monasca_agent.collector.checks_d.http_check as http_check
|
||||
|
||||
|
||||
class HTTPMetrics(http_check.HTTPCheck):
|
||||
|
||||
def __init__(self, name, init_config, agent_config, instances=None):
|
||||
super(HTTPMetrics, self).__init__(name, init_config, agent_config,
|
||||
instances)
|
||||
self.metric_method = {
|
||||
'gauge': self.gauge,
|
||||
'counter': self.increment,
|
||||
'histogram': self.histogram,
|
||||
'set': self.set,
|
||||
'rate': self.rate}
|
||||
|
||||
def _valid_number(self, value, name):
|
||||
if not isinstance(value, Number):
|
||||
self.log.info("Value '{0}' is not a number for metric {1}".format(
|
||||
value, name))
|
||||
return False
|
||||
return True
|
||||
|
||||
def _check(self, instance):
|
||||
addr = instance.get("url", None)
|
||||
whitelist = instance.get("whitelist") or []  # tolerate a missing or empty whitelist
|
||||
|
||||
dimensions = self._set_dimensions({'url': addr}, instance)
|
||||
|
||||
success, result_string = self._http_check(instance)
|
||||
|
||||
if success:
|
||||
json_data = json.loads(result_string)
|
||||
|
||||
for metric in whitelist:
|
||||
try:
|
||||
metric_name = metric['name']
|
||||
metric_type = metric['type']
|
||||
keys = metric['path'].split('/')
|
||||
except Exception:
|
||||
self.log.warning("Invalid configuration for metric '{0}'".format(metric))
|
||||
continue
|
||||
|
||||
current = json_data
|
||||
try:
|
||||
for key in keys:
|
||||
current = current[key]
|
||||
except Exception:
|
||||
self.log.warning("Could not find a value at {0} in json message".format(keys))
|
||||
continue
|
||||
|
||||
value = current
|
||||
|
||||
# everything requires a number, except set
|
||||
if metric_type in ['gauge', 'counter', 'histogram', 'rate']:
|
||||
if not self._valid_number(value, metric_name):
|
||||
self.log.warning("Invalid value '{0}' for metric '{1}'".format(value, metric_name))
|
||||
continue
|
||||
|
||||
if metric_type in self.metric_method:
|
||||
self.metric_method[metric_type](metric_name,
|
||||
value,
|
||||
dimensions=dimensions)
|
||||
else:
|
||||
self.log.warning("Unrecognized type '{0}' for metric '{1}'".format(metric_type, metric_name))
|
||||
|
||||
success_string = '{0} is UP'.format(addr)
|
||||
self.log.debug(success_string)
|
||||
return services_checks.Status.UP, success_string
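
One detail of the `isinstance(value, Number)` test in `_valid_number` may be worth a second look: Python's `bool` is a subclass of `int`, so a JSON `true` or `false` passes as a number. Whether that matters depends on the endpoints being polled. A sketch of a stricter check follows; the helper name `is_reportable_number` and the sample document are hypothetical and not part of this change.

```python
import json
from numbers import Number


def is_reportable_number(value):
    """Accept ints and floats but reject booleans, which also count as Number."""
    return isinstance(value, Number) and not isinstance(value, bool)


# Hypothetical decoded response mixing numeric, boolean and string values.
decoded = json.loads('{"count": 7, "mean": 1.5, "healthy": true, "version": "1.0"}')

results = {key: is_reportable_number(val) for key, val in decoded.items()}
```

With the plain `isinstance(value, Number)` test, the `healthy` entry would be accepted and reported as 1.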
|
||||
@@ -39,11 +39,34 @@ class MonAPI(monasca_setup.detection.Plugin):
|
||||
"""Build the config as a Plugins object and return."""
|
||||
log.info("\tEnabling the Monasca api healthcheck")
|
||||
admin_port = self.api_config['server']['adminConnectors'][0]['port']
|
||||
return dropwizard_health_check('monitoring', 'api', 'http://localhost:{0}/healthcheck'.format(admin_port))
|
||||
config = monasca_setup.agent_config.Plugins()
|
||||
config.merge(dropwizard_health_check('monitoring', 'api', 'http://localhost:{0}/healthcheck'.format(admin_port)))
|
||||
|
||||
# todo
|
||||
# log.info("\tEnabling the mon api metric collection")
|
||||
# http://localhost:8081/metrics
|
||||
log.info("\tEnabling the Monasca api metrics")
|
||||
whitelist = [
|
||||
{
|
||||
"name": "jvm.memory.total.max",
|
||||
"path": "gauges/jvm.memory.total.max/value",
|
||||
"type": "gauge"},
|
||||
{
|
||||
"name": "jvm.memory.total.used",
|
||||
"path": "gauges/jvm.memory.total.used/value",
|
||||
"type": "gauge"},
|
||||
{
|
||||
"name": "metrics.published",
|
||||
"path": "meters/monasca.api.app.MetricService.metrics.published/count",
|
||||
"type": "rate"},
|
||||
{
|
||||
"name": "raw-sql.time.avg",
|
||||
"path": "timers/org.skife.jdbi.v2.DBI.raw-sql/mean",
|
||||
"type": "gauge"},
|
||||
{
|
||||
"name": "raw-sql.time.max",
|
||||
"path": "timers/org.skife.jdbi.v2.DBI.raw-sql/max",
|
||||
"type": "gauge"},
|
||||
]
|
||||
config.merge(dropwizard_metrics('monitoring', 'api', 'http://localhost:{0}/metrics'.format(admin_port), whitelist))
|
||||
return config
|
||||
|
||||
def dependencies_installed(self):
|
||||
return True
|
||||
@@ -75,11 +98,46 @@ class MonPersister(monasca_setup.detection.Plugin):
|
||||
def build_config(self):
|
||||
"""Build the config as a Plugins object and return."""
|
||||
log.info("\tEnabling the Monasca persister healthcheck")
|
||||
return dropwizard_health_check('monitoring', 'persister', 'http://localhost:8091/healthcheck')
|
||||
config = monasca_setup.agent_config.Plugins()
|
||||
config.merge(dropwizard_health_check('monitoring', 'persister', 'http://localhost:8091/healthcheck'))
|
||||
|
||||
# todo
|
||||
# log.info("\tEnabling the mon persister metric collection")
|
||||
# http://localhost:8091/metrics
|
||||
log.info("\tEnabling the Monasca persister metrics")
|
||||
whitelist = [
|
||||
{
|
||||
"name": "jvm.memory.total.max",
|
||||
"path": "gauges/jvm.memory.total.max/value",
|
||||
"type": "gauge"},
|
||||
{
|
||||
"name": "jvm.memory.total.used",
|
||||
"path": "gauges/jvm.memory.total.used/value",
|
||||
"type": "gauge"},
|
||||
{
|
||||
"name": "alarm-state-transitions-added-to-batch-counter[0]",
|
||||
"path": "counters/monasca.persister.pipeline.event.AlarmStateTransitionHandler[alarm-state-transition-0].alarm-state-transitions-added-to-batch-counter/count",
|
||||
"type": "rate"},
|
||||
{
|
||||
"name": "alarm-state-transitions-added-to-batch-counter[1]",
|
||||
"path": "counters/monasca.persister.pipeline.event.AlarmStateTransitionHandler[alarm-state-transition-1].alarm-state-transitions-added-to-batch-counter/count",
|
||||
"type": "rate"},
|
||||
{
|
||||
"name": "metrics-added-to-batch-counter[0]",
|
||||
"path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-0].metrics-added-to-batch-counter/count",
|
||||
"type": "rate"},
|
||||
{
|
||||
"name": "metrics-added-to-batch-counter[1]",
|
||||
"path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-1].metrics-added-to-batch-counter/count",
|
||||
"type": "rate"},
|
||||
{
|
||||
"name": "metrics-added-to-batch-counter[2]",
|
||||
"path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-2].metrics-added-to-batch-counter/count",
|
||||
"type": "rate"},
|
||||
{
|
||||
"name": "metrics-added-to-batch-counter[3]",
|
||||
"path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-3].metrics-added-to-batch-counter/count",
|
||||
"type": "rate"}
|
||||
]
|
||||
config.merge(dropwizard_metrics('monitoring', 'persister', 'http://localhost:8091/metrics', whitelist))
|
||||
return config
|
||||
|
||||
def dependencies_installed(self):
|
||||
return True
|
||||
@@ -115,3 +173,15 @@ def dropwizard_health_check(service, component, url):
|
||||
'include_content': False,
|
||||
'dimensions': {'service': service, 'component': component}}]}
|
||||
return config
|
||||
|
||||
|
||||
def dropwizard_metrics(service, component, url, whitelist):
|
||||
"""Set up a dropwizard metrics check."""
|
||||
config = monasca_setup.agent_config.Plugins()
|
||||
config['http_metrics'] = {'init_config': None,
|
||||
'instances': [{'name': "{0}-{1} metrics".format(service, component),
|
||||
'url': url,
|
||||
'timeout': 1,
|
||||
'dimensions': {'service': service, 'component': component},
|
||||
'whitelist': whitelist}]}
|
||||
return config
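
For readers who want to see the resulting structure without installing `monasca_setup`, the following sketch builds the same dictionary that `dropwizard_metrics` stores under the `http_metrics` key. It uses a plain dict as a stand-in for `monasca_setup.agent_config.Plugins`, and the function name `build_http_metrics_config` is hypothetical.

```python
def build_http_metrics_config(service, component, url, whitelist):
    """Plain-dict stand-in for the value dropwizard_metrics() assigns to config['http_metrics']."""
    return {'init_config': None,
            'instances': [{'name': "{0}-{1} metrics".format(service, component),
                           'url': url,
                           'timeout': 1,
                           'dimensions': {'service': service, 'component': component},
                           'whitelist': whitelist}]}


# One whitelist entry taken from the API plugin in this change.
example = build_http_metrics_config(
    'monitoring', 'api', 'http://localhost:8081/metrics',
    [{"name": "jvm.memory.total.max",
      "path": "gauges/jvm.memory.total.max/value",
      "type": "gauge"}])
```

The instance carries the `service` and `component` dimensions, so metrics from the API and the persister can share names such as `jvm.memory.total.max` and still be told apart.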
|
||||
|
||||