Create dropwizard plugin for collecting metrics via http

Change-Id: I51fc4f2c2a50a84e8d75cbc8f47c8a4e78b2e4d4
2015-06-09 15:51:09 -06:00
parent 2f096ae58c
commit 38b5ca8384
5 changed files with 300 additions and 58 deletions
--- a/conf.d/http_metrics.yaml.example
+++ b/conf.d/http_metrics.yaml.example
@@ -0,0 +1,72 @@
+init_config:
+
+instances:
+#    -   name: Some Service Name
+#        url: http://some.url.example.com
+#        timeout: 1
+
+         # If your service uses basic authentication, you can optionally
+         # specify a username and password that will be used in the check.
+#        username: user
+#        password: pass
+
+         # If your service uses keystone for authentication, you can optionally
+         # specify the information to collect a token to be used in the check.
+         # This information should follow the same guidelines presented in
+         # agent.yaml.template
+         # https://github.com/stackforge/monasca-agent/blob/master/agent.yaml.template
+         # If use_keystone=True and keystone_config is not specified, the keystone information
+         # from the agent config will be used.
+#        use_keystone: True
+#        keystone_config:
+#            keystone_url: http://endpoint.com/v3/
+#            username: user
+#            password: password
+
+         # The (optional) collect_response_time parameter will instruct the
+         # check to create a metric 'network.http.response_time', tagged with
+         # the url, reporting the response time in seconds.
+
+#        collect_response_time: true
+
+         # The (optional) disable_ssl_validation will instruct the check
+         # to skip the validation of the SSL certificate of the URL being tested.
+         # This is mostly useful when checking SSL connections signed with
+         # certificates that are not themselves signed by a public authority.
+         # When true, the check logs a warning in collector.log.
+
+#        disable_ssl_validation: true
+
+         # The (optional) headers parameter allows you to send extra headers
+         # with the request. This is useful for explicitly specifying the host
+         # header or perhaps adding headers for authorisation purposes. Note
+         # that the http client library converts all headers to lowercase.
+         # This is legal according to RFC2616
+         # (See: http://tools.ietf.org/html/rfc2616#section-4.2)
+         # but may be problematic with some HTTP servers
+         # (See: https://code.google.com/p/httplib2/issues/detail?id=169)
+
+#        headers:
+#           Host: alternative.host.example.com
+#           X-Auth-Token: SOME-AUTH-TOKEN
+
+#        dimensions:
+#            dim1: value1
+
+         # To select which metrics to record, create a whitelist. Each entry in
+         # the whitelist should include the name you want to give the metric,
+         # the path to the metric value in the json (as a series of keys
+         # separated by '/'), and the type of recording to use (counter, gauge,
+         # rate, histogram, set). See the Plugins documentation about
+         # http_metrics for more information about the different types.
+
+#        whitelist:
+#           - name: jvm.memory.total.used
+#             path: gauges/jvm.memory.total.used/value
+#             type: gauge
+#           - name: metrics.published
+#             path: meters/monasca.api.app.MetricService.metrics.published/count
+#             type: rate
+#           - name: raw-sql.time.avg
+#             path: timers/org.skife.jdbi.v2.DBI.raw-sql/mean
+#             type: gauge
--- a/docs/Plugins.md
+++ b/docs/Plugins.md
@@ -19,6 +19,7 @@
  - [Host Alive Checks](#host-alive-checks)
  - [Process Checks](#process-checks)
  - [Http Endpoint Checks](#http-endpoint-checks)
+  - [Http Metrics](#http-metrics)
  - [MySQL Checks](#mysql-checks)
  - [ZooKeeper Checks](#zookeeper-checks)
  - [Kafka Checks](#kafka-checks)
@@ -523,7 +524,7 @@ The process checks return the following metrics:


 ## Http Endpoint Checks
-This section describes the http endpoint check that can be performed by the Agent. Http endpoint checks are checks that perform simple up/down checks on services, such as HTTP/REST APIs. An agent, given a list of URLs can dispatch an http request and report to the API success/failure as a metric.
+This section describes the http endpoint check that can be performed by the Agent. Http endpoint checks are checks that perform simple up/down checks on services, such as HTTP/REST APIs. An agent, given a list of URLs, can dispatch an http request and report to the API success/failure as a metric.

 default dimensions:
    url: endpoint
@@ -556,6 +557,31 @@ The http_status checks return the following metrics:
 | http_status  | url, detail | The status of the http endpoint call (0 = success, 1 = failure)
 | http_response_time  | url | The response time of the http endpoint call

+
+## Http Metrics
+This section describes the http metrics check that can be performed by the Agent. Http metrics checks retrieve metrics from any URL that returns a json-formatted response. An agent, given a list of URLs, can dispatch an http request and parse the desired metrics from the json response.
+
+ default dimensions:
+    url: endpoint
+
+ default value_meta:
+    error: error_message
+
+Similar to other checks, the configuration is done in YAML (http_metrics.yaml), and consists of two keys: init_config and instances. The former is not used by http_metrics, while the latter contains one or more URLs to check, plus optional parameters like a timeout, username/password, whether or not to also record the response time, and a whitelist of metrics to collect. The whitelist should consist of a name, path, and type for each metric to be collected. The name is what the metric will be called when it is reported. The path is a string of keys, separated by '/', that locates the metric value in the json response. The type is how you want the metric to be recorded (gauge, counter, histogram, rate, set). A gauge will store and report the value it finds with no modifications. A counter will increment itself by the value it finds. A histogram will store values and return the calculated max, median, average, count, and percentiles. A rate will return the difference between the last two recorded samples divided by the interval between those samples in seconds. A set will record samples and return the number of unique values in the set.
+If the endpoint being checked requires authentication, there are two options. First, a username and password supplied in the instance options will be used by the check for authentication. Alternatively, the check can retrieve a keystone token for authentication. Specific keystone information can be provided for each check; otherwise, the information from the agent config will be used.
+
+```
+init_config:
+
+instances:
+     - url: http://192.168.0.254/metrics
+       timeout: 1
+       collect_response_time: true
+       whitelist:
+            - name: jvm.memory.total.max
+              path: gauges/jvm.memory.total.max/value
+              type: gauge
+```
    
 ## MySQL Checks
 This section describes the mySQL check that can be performed by the Agent.  The mySQL check requires a configuration file called mysql.yaml to be available in the agent conf.d configuration directory.
--- a/monasca_agent/collector/checks_d/http_check.py
+++ b/monasca_agent/collector/checks_d/http_check.py
@@ -21,7 +21,7 @@ class HTTPCheck(services_checks.ServicesCheck):
        super(HTTPCheck, self).__init__(name, init_config, agent_config, instances)

    @staticmethod
-    def _load_conf(instance):
+    def _load_http_conf(instance):
        # Fetches the conf
        username = instance.get('username', None)
        password = instance.get('password', None)
@@ -31,29 +31,26 @@ class HTTPCheck(services_checks.ServicesCheck):
        keystone_config = instance.get('keystone_config', None)
        url = instance.get('url', None)
        response_time = instance.get('collect_response_time', False)
-        pattern = instance.get('match_pattern', None)
        if url is None:
            raise Exception("Bad configuration. You must specify a url")
        ssl = instance.get('disable_ssl_validation', True)

-        return url, username, password, timeout, headers, response_time, ssl, pattern, use_keystone, keystone_config
+        return url, username, password, timeout, headers, response_time, ssl, use_keystone, keystone_config

    def _create_status_event(self, status, msg, instance):
        """Does nothing: status events are not yet supported by Mon API.
-
        """
        return

-    def _check(self, instance):
-        addr, username, password, timeout, headers, response_time, disable_ssl_validation, pattern, use_keystone, keystone_config = self._load_conf(
+    def _http_check(self, instance):
+        addr, username, password, timeout, headers, response_time, disable_ssl_validation, use_keystone, keystone_config = self._load_http_conf(
            instance)
        config = cfg.Config()
        api_config = config.get_config('Api')
-        content = ''
-
        dimensions = self._set_dimensions({'url': addr}, instance)

        start = time.time()
+
        done = False
        retry = False
        while not done or retry:
@@ -67,9 +64,10 @@ class HTTPCheck(services_checks.ServicesCheck):
                    headers["X-Auth-Token"] = token
                    headers["Content-type"] = "application/json"
                else:
-                    self.log.warning("""Unable to get token. Keystone API server may be down.
-                                     Skipping check for {0}""".format(addr))
-                    return
+                    error_string = """Unable to get token. Keystone API server may be down.
+                                     Skipping check for {0}""".format(addr)
+                    self.log.warning(error_string)
+                    return False, error_string
            try:
                self.log.debug("Connecting to %s" % addr)
                if disable_ssl_validation:
@@ -84,31 +82,19 @@ class HTTPCheck(services_checks.ServicesCheck):
                length = int((time.time() - start) * 1000)
                error_string = '{0} is DOWN, error: {1}. Connection failed after {2} ms'.format(addr, str(e), length)
                self.log.info(error_string)
-                self.gauge('http_status',
-                           1,
-                           dimensions=dimensions,
-                           value_meta={'error': error_string})
-                return services_checks.Status.DOWN, error_string
+                return False, error_string

            except httplib.ResponseNotReady as e:
                length = int((time.time() - start) * 1000)
                error_string = '{0} is DOWN, error: {1}. Network is not routable after {2} ms'.format(addr, repr(e), length)
                self.log.info(error_string)
-                self.gauge('http_status',
-                           1,
-                           dimensions=dimensions,
-                           value_meta={'error': error_string})
-                return services_checks.Status.DOWN, error_string
+                return False, error_string

            except Exception as e:
                length = int((time.time() - start) * 1000)
                error_string = '{0} is DOWN, error: {1}. Connection failed after {2} ms'.format(addr, str(e), length)
                self.log.error('Unhandled exception {0}. Connection failed after {1} ms'.format(str(e), length))
-                self.gauge('http_status',
-                           1,
-                           dimensions=dimensions,
-                           value_meta={'error': error_string})
-                return services_checks.Status.DOWN, error_string
+                return False, error_string

            if response_time:
                # Stop the timer as early as possible
@@ -120,7 +106,7 @@ class HTTPCheck(services_checks.ServicesCheck):
                    if retry:
                        error_string = '{0} is DOWN, unable to get a valid token to connect with'.format(addr)
                        self.log.error(error_string)
-                        return services_checks.Status.DOWN, error_string
+                        return False, error_string
                    else:
                        # Get a new token and retry
                        self.log.info("Token expired, getting new token and retrying...")
@@ -130,26 +116,37 @@ class HTTPCheck(services_checks.ServicesCheck):
                else:
                    error_string = '{0} is DOWN, error code: {1}'.format(addr, str(resp.status))
                    self.log.info(error_string)
-                    self.gauge('http_status',
-                               1,
-                               dimensions=dimensions,
-                               value_meta={'error': error_string})
-                    return services_checks.Status.DOWN, error_string
-
-            if pattern is not None:
-                if re.search(pattern, content, re.DOTALL):
-                    self.log.debug("Pattern match successful")
-                else:
-                    error_string = 'Pattern match failed! "{0}" not in "{1}"'.format(pattern, content)
-                    self.log.info(error_string)
-                    self.gauge('http_status',
-                               1,
-                               dimensions=dimensions,
-                               value_meta={'error': error_string})
-                    return services_checks.Status.DOWN, error_string
-
-            success_string = '{0} is UP'.format(addr)
-            self.log.debug(success_string)
-            self.gauge('http_status', 0, dimensions=dimensions)
+                    return False, error_string
            done = True
-            return services_checks.Status.UP, success_string
+            return True, content
+
+    def _check(self, instance):
+        content = ''
+        addr = instance.get("url", None)
+        pattern = instance.get('match_pattern', None)
+
+        dimensions = self._set_dimensions({'url': addr}, instance)
+
+        success, result_string = self._http_check(instance)
+        if not success:
+            self.gauge('http_status', 1,
+                       dimensions=dimensions,
+                       value_meta={'error': result_string})
+            return services_checks.Status.DOWN, result_string
+
+        if pattern is not None:
+            if re.search(pattern, result_string, re.DOTALL):
+                self.log.debug("Pattern match successful")
+            else:
+                error_string = 'Pattern match failed! "{0}" not in "{1}"'.format(pattern, result_string)
+                self.log.info(error_string)
+                self.gauge('http_status',
+                           1,
+                           dimensions=dimensions,
+                           value_meta={'error': error_string})
+                return services_checks.Status.DOWN, error_string
+
+        success_string = '{0} is UP'.format(addr)
+        self.log.debug(success_string)
+        self.gauge('http_status', 0, dimensions=dimensions)
+        return services_checks.Status.UP, success_string
--- a/monasca_agent/collector/checks_d/http_metrics.py
+++ b/monasca_agent/collector/checks_d/http_metrics.py
@@ -0,0 +1,77 @@
+#!/usr/bin/env python
+"""Monitoring Agent plugin for collecting metrics from HTTP endpoints that return json.
+
+"""
+
+import json
+from numbers import Number
+
+import monasca_agent.collector.checks.services_checks as services_checks
+import monasca_agent.collector.checks_d.http_check as http_check
+
+
+class HTTPMetrics(http_check.HTTPCheck):
+
+    def __init__(self, name, init_config, agent_config, instances=None):
+        super(HTTPMetrics, self).__init__(name, init_config, agent_config,
+                                          instances)
+        self.metric_method = {
+            'gauge': self.gauge,
+            'counter': self.increment,
+            'histogram': self.histogram,
+            'set': self.set,
+            'rate': self.rate}
+
+    def _valid_number(self, value, name):
+        if not isinstance(value, Number):
+            self.log.info("Value '{0}' is not a number for metric {1}".format(
+                value, name))
+            return False
+        return True
+
+    def _check(self, instance):
+        addr = instance.get("url", None)
+        whitelist = instance.get("whitelist") or []  # no whitelist means nothing to collect
+
+        dimensions = self._set_dimensions({'url': addr}, instance)
+
+        success, result_string = self._http_check(instance)
+
+        if success:
+            json_data = json.loads(result_string)
+
+            for metric in whitelist:
+                try:
+                    metric_name = metric['name']
+                    metric_type = metric['type']
+                    keys = metric['path'].split('/')
+                except Exception:
+                    self.log.warning("Invalid configuration for metric '{0}'".format(metric))
+                    continue
+
+                current = json_data
+                try:
+                    for key in keys:
+                        current = current[key]
+                except Exception:
+                    self.log.warning("Could not find a value at {0} in json message".format(keys))
+                    continue
+
+                value = current
+
+                # everything requires a number, except set
+                if metric_type in ['gauge', 'counter', 'histogram', 'rate']:
+                    if not self._valid_number(value, metric_name):
+                        self.log.warning("Invalid value '{0}' for metric '{1}'".format(value, metric_name))
+                        continue
+
+                if metric_type in self.metric_method:
+                    self.metric_method[metric_type](metric_name,
+                                                    value,
+                                                    dimensions=dimensions)
+                else:
+                    self.log.warning("Unrecognized type '{0}' for metric '{1}'".format(metric_type, metric_name))
+
+            success_string = '{0} is UP'.format(addr)
+            self.log.debug(success_string)
+            return services_checks.Status.UP, success_string
--- a/monasca_setup/detection/plugins/mon.py
+++ b/monasca_setup/detection/plugins/mon.py
@@ -39,11 +39,34 @@ class MonAPI(monasca_setup.detection.Plugin):
        """Build the config as a Plugins object and return."""
        log.info("\tEnabling the Monasca api healthcheck")
        admin_port = self.api_config['server']['adminConnectors'][0]['port']
-        return dropwizard_health_check('monitoring', 'api', 'http://localhost:{0}/healthcheck'.format(admin_port))
+        config = monasca_setup.agent_config.Plugins()
+        config.merge(dropwizard_health_check('monitoring', 'api', 'http://localhost:{0}/healthcheck'.format(admin_port)))

-        # todo
-        # log.info("\tEnabling the mon api metric collection")
-        # http://localhost:8081/metrics
+        log.info("\tEnabling the Monasca api metrics")
+        whitelist = [
+            {
+                "name": "jvm.memory.total.max",
+                "path": "gauges/jvm.memory.total.max/value",
+                "type": "gauge"},
+            {
+                "name": "jvm.memory.total.used",
+                "path": "gauges/jvm.memory.total.used/value",
+                "type": "gauge"},
+            {
+                "name": "metrics.published",
+                "path": "meters/monasca.api.app.MetricService.metrics.published/count",
+                "type": "rate"},
+            {
+                "name": "raw-sql.time.avg",
+                "path": "timers/org.skife.jdbi.v2.DBI.raw-sql/mean",
+                "type": "gauge"},
+            {
+                "name": "raw-sql.time.max",
+                "path": "timers/org.skife.jdbi.v2.DBI.raw-sql/max",
+                "type": "gauge"},
+        ]
+        config.merge(dropwizard_metrics('monitoring', 'api', 'http://localhost:{0}/metrics'.format(admin_port), whitelist))
+        return config

    def dependencies_installed(self):
        return True
@@ -75,11 +98,46 @@ class MonPersister(monasca_setup.detection.Plugin):
    def build_config(self):
        """Build the config as a Plugins object and return."""
        log.info("\tEnabling the Monasca persister healthcheck")
-        return dropwizard_health_check('monitoring', 'persister', 'http://localhost:8091/healthcheck')
+        config = monasca_setup.agent_config.Plugins()
+        config.merge(dropwizard_health_check('monitoring', 'persister', 'http://localhost:8091/healthcheck'))

-        # todo
-        # log.info("\tEnabling the mon persister metric collection")
-        # http://localhost:8091/metrics
+        log.info("\tEnabling the Monasca persister metrics")
+        whitelist = [
+            {
+                "name": "jvm.memory.total.max",
+                "path": "gauges/jvm.memory.total.max/value",
+                "type": "gauge"},
+            {
+                "name": "jvm.memory.total.used",
+                "path": "gauges/jvm.memory.total.used/value",
+                "type": "gauge"},
+            {
+                "name": "alarm-state-transitions-added-to-batch-counter[0]",
+                "path": "counters/monasca.persister.pipeline.event.AlarmStateTransitionHandler[alarm-state-transition-0].alarm-state-transitions-added-to-batch-counter/count",
+                "type": "rate"},
+            {
+                "name": "alarm-state-transitions-added-to-batch-counter[1]",
+                "path": "counters/monasca.persister.pipeline.event.AlarmStateTransitionHandler[alarm-state-transition-1].alarm-state-transitions-added-to-batch-counter/count",
+                "type": "rate"},
+            {
+                "name": "metrics-added-to-batch-counter[0]",
+                "path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-0].metrics-added-to-batch-counter/count",
+                "type": "rate"},
+            {
+                "name": "metrics-added-to-batch-counter[1]",
+                "path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-1].metrics-added-to-batch-counter/count",
+                "type": "rate"},
+            {
+                "name": "metrics-added-to-batch-counter[2]",
+                "path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-2].metrics-added-to-batch-counter/count",
+                "type": "rate"},
+            {
+                "name": "metrics-added-to-batch-counter[3]",
+                "path": "counters/monasca.persister.pipeline.event.MetricHandler[metric-3].metrics-added-to-batch-counter/count",
+                "type": "rate"}
+        ]
+        config.merge(dropwizard_metrics('monitoring', 'persister', 'http://localhost:8091/metrics', whitelist))
+        return config

    def dependencies_installed(self):
        return True
@@ -115,3 +173,15 @@ def dropwizard_health_check(service, component, url):
                                           'include_content': False,
                                           'dimensions': {'service': service, 'component': component}}]}
    return config
+
+
+def dropwizard_metrics(service, component, url, whitelist):
+    """Set up a dropwizard metrics check"""
+    config = monasca_setup.agent_config.Plugins()
+    config['http_metrics'] = {'init_config': None,
+                              'instances': [{'name': "{0}-{1} metrics".format(service, component),
+                                             'url': url,
+                                             'timeout': 1,
+                                             'dimensions': {'service': service, 'component': component},
+                                             'whitelist': whitelist}]}
+    return config
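
Note on the whitelist lookup: the path handling in http_metrics.py can be exercised outside the agent. The following is a minimal standalone sketch, not the plugin's code. The helper names `lookup` and `extract_metrics` and the sample payload are hypothetical and chosen only for illustration; the payload mimics the shape of a dropwizard /metrics response under that assumption.

```python
from numbers import Number


def lookup(json_data, path):
    """Follow a '/'-separated path of keys through nested dicts.

    Returns the value found, or None if any step of the path is missing.
    (Hypothetical helper; the plugin inlines this loop in _check.)
    """
    current = json_data
    for key in path.split('/'):
        try:
            current = current[key]
        except (KeyError, TypeError, IndexError):
            return None
    return current


def extract_metrics(json_data, whitelist):
    """Return {name: (type, value)} for each whitelist entry that resolves.

    Entries whose path is missing, or whose value is not a number for a
    numeric metric type, are skipped, as the plugin does with a warning.
    """
    numeric_types = ('gauge', 'counter', 'histogram', 'rate')
    found = {}
    for entry in whitelist:
        value = lookup(json_data, entry['path'])
        if value is None:
            continue
        if entry['type'] in numeric_types and not isinstance(value, Number):
            continue
        found[entry['name']] = (entry['type'], value)
    return found


# Assumed sample payload, shaped like a dropwizard metrics response.
sample = {
    "gauges": {"jvm.memory.total.used": {"value": 123456}},
    "meters": {"monasca.api.app.MetricService.metrics.published": {"count": 42}},
    "version": {"name": "3.0.0"},
}

whitelist = [
    {"name": "jvm.memory.total.used",
     "path": "gauges/jvm.memory.total.used/value", "type": "gauge"},
    {"name": "metrics.published",
     "path": "meters/monasca.api.app.MetricService.metrics.published/count",
     "type": "rate"},
    {"name": "missing", "path": "timers/nope/mean", "type": "gauge"},
    {"name": "version", "path": "version/name", "type": "gauge"},
]

result = extract_metrics(sample, whitelist)
print(result)
```

Because the path is split on '/', metric keys that contain dots (as dropwizard names do) work unchanged, but a key that itself contains '/' cannot be addressed with this scheme.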