cloudkitty/doc/source/developer/collector.rst
Luka Peschke e3171f869e Add some developer documentation for collectors
This adds an entry about collectors to the developer documentation.
Some information about collectors has been moved from the admin to the
developer documentation.

Change-Id: I2423761b9f7a672fe837d5d5954473301d936ba3
Story: 2004179
Task: 28514
2019-02-05 15:48:07 +00:00


Collector

Data format

Internally, CloudKitty's data format is slightly more detailed than the one described in the architecture documentation.

The internal data format is as follows:

{
    "bananas": [
        {
            "vol": {
                "unit": "banana",
                "qty": 1
            },
            "rating": {
                "price": 1
            },
            "groupby": {
                "xxx_id": "hello",
                "yyy_id": "bye"
            },
            "metadata": {
                "flavor": "chocolate",
                "eaten_by": "gorilla"
            }
        }
    ]
}

However, developers implementing a collector don't need to format the data themselves, as helper functions are provided for this purpose.
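As a minimal sketch of how this structure can be consumed, the snippet below walks the internal format shown above and totals the rated price for each metric type. The `total_price_per_type` helper is hypothetical and not part of CloudKitty; items lacking a `rating` entry simply contribute 0.

```python
def total_price_per_type(internal_data):
    """Sum rating.price for each metric type in the internal format.

    Hypothetical helper for illustration only; items without a
    "rating" entry contribute 0 to the total.
    """
    totals = {}
    for metric_type, items in internal_data.items():
        totals[metric_type] = sum(
            item.get("rating", {}).get("price", 0) for item in items)
    return totals


# Same content as the internal data format example above
internal_data = {
    "bananas": [
        {
            "vol": {"unit": "banana", "qty": 1},
            "rating": {"price": 1},
            "groupby": {"xxx_id": "hello", "yyy_id": "bye"},
            "metadata": {"flavor": "chocolate", "eaten_by": "gorilla"},
        }
    ],
}

print(total_price_per_type(internal_data))  # {'bananas': 1}
```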

Implementation

Each collector must subclass the following class:

cloudkitty.collector.BaseCollector

The retrieve method of the BaseCollector class is called by the orchestrator. This method calls the fetch_all method of the child class.

To create a collector, you need to implement at least the fetch_all method.
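The relationship between retrieve and fetch_all follows the template method pattern: the base class drives the call, and subclasses supply the data. The sketch below uses simplified stand-in classes (`SketchBaseCollector` and `StaticCollector`, both hypothetical) rather than CloudKitty's real code. It assumes that retrieve wraps the returned items in a dict keyed by metric name, mirroring the internal data format; the real `retrieve` signature and behaviour may differ.

```python
import abc


class SketchBaseCollector(abc.ABC):
    """Simplified, hypothetical stand-in for BaseCollector."""

    def retrieve(self, metric_name, start, end, project_id=None,
                 q_filter=None):
        # Called by the orchestrator; delegates collection to the subclass.
        items = self.fetch_all(metric_name, start, end,
                               project_id=project_id, q_filter=q_filter)
        # Assumption: results are keyed by metric name, as in the
        # internal data format.
        return {metric_name: items}

    @abc.abstractmethod
    def fetch_all(self, metric_name, start, end, project_id=None,
                  q_filter=None):
        """Return a list of formatted items for one metric and scope."""


class StaticCollector(SketchBaseCollector):
    """Hypothetical collector returning one fixed item per call."""

    def fetch_all(self, metric_name, start, end, project_id=None,
                  q_filter=None):
        return [{
            "vol": {"unit": "banana", "qty": 1},
            "groupby": {"project_id": project_id},
            "metadata": {},
        }]


my_collector = StaticCollector()
print(my_collector.retrieve("bananas", 0, 3600, project_id="scope-1"))
```

Because fetch_all is declared abstract in this sketch, instantiating SketchBaseCollector directly raises TypeError, which is one way to enforce the "implement at least fetch_all" rule.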

Data collection

Collectors must implement a fetch_all method. This method is called once for each metric type, scope and collect period. It has the following prototype:

cloudkitty.collector.BaseCollector.fetch_all(metric_name, start, end, project_id=None, q_filter=None)

This method is expected to return a list of items formatted by CloudKittyFormatTransformer.
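For reference, the sketch below shows what a formatted item is assumed to look like, reusing the vol, groupby and metadata keys of the internal data format. `format_item` here is a hypothetical stand-in written for illustration. In a real collector, use `self.t_cloudkitty.format_item()`, whose actual output may contain additional keys.

```python
def format_item(groupby, metadata, unit, qty):
    """Hypothetical stand-in for CloudKittyFormatTransformer.format_item.

    Builds one item shaped like the entries of the internal data format
    (without the "rating" part).
    """
    return {
        "groupby": dict(groupby),
        "metadata": dict(metadata),
        "vol": {"unit": unit, "qty": qty},
    }


item = format_item({"xxx_id": "hello"}, {"flavor": "chocolate"},
                   "banana", qty=2)
print(item["vol"])  # {'unit': 'banana', 'qty': 2}
```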

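Example of a basic collector:

from cloudkitty.collector import BaseCollector


class MyCollector(BaseCollector):
    def __init__(self, **kwargs):
        super(MyCollector, self).__init__(**kwargs)

    def fetch_all(self, metric_name, start, end,
                  project_id=None, q_filter=None):
        data = []
        # _query_backend() is a hypothetical placeholder for the code
        # querying your backend for this metric, scope and period.
        # Each raw result is assumed to expose the keys used below.
        for raw in self._query_backend(metric_name, start, end, project_id):
            data.append(self.t_cloudkitty.format_item(
                raw['groupby'],  # dict
                raw['metadata'],  # dict
                raw['unit'],  # str
                qty=raw['qty'],  # int / float
            ))

        return data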

The name project_id is misleading, as it is a legacy name: the parameter actually contains the ID of the current scope. The attribute corresponding to the scope is specified in the configuration, under [collect]/scope_key, so all queries should filter on this attribute. Example:

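from oslo_config import cfg

from cloudkitty.collector import BaseCollector

CONF = cfg.CONF


class MyCollector(BaseCollector):
    def __init__(self, **kwargs):
        super(MyCollector, self).__init__(**kwargs)

    def fetch_all(self, metric_name, start, end,
                  project_id=None, q_filter=None):
        scope_key = CONF.collect.scope_key
        # project_id holds the ID of the current scope
        filters = {'start': start, 'stop': end, scope_key: project_id}

        # self.client is a placeholder for your backend's client
        raw_data = self.client.query(
            filters=filters,
            groupby=self.conf[metric_name]['groupby'])

        # Format each raw item as in the previous example (the raw item
        # keys used here are assumptions about the backend's output)
        output = []
        for raw in raw_data:
            output.append(self.t_cloudkitty.format_item(
                raw['groupby'], raw['metadata'], raw['unit'],
                qty=raw['qty']))
        return output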
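When a backend cannot filter on the scope attribute server-side, the same constraint can be applied after the fact. The following sketch is a hypothetical, standalone illustration (no CloudKitty or oslo.config imports): the `scope_key` argument stands in for the value of [collect]/scope_key, and raw records are assumed to be flat dicts carrying that attribute and a numeric timestamp, with the period treated as the half-open interval [start, end).

```python
def filter_by_scope(records, scope_key, scope_id, start, end):
    """Keep the records of scope_id whose timestamp is in [start, end).

    Hypothetical helper: assumes each record is a dict exposing the
    scope attribute under scope_key and a numeric "timestamp".
    """
    return [
        r for r in records
        if r.get(scope_key) == scope_id and start <= r["timestamp"] < end
    ]


records = [
    {"project_id": "a", "timestamp": 10, "qty": 1},
    {"project_id": "b", "timestamp": 10, "qty": 5},
    {"project_id": "a", "timestamp": 99, "qty": 2},
]
# "project_id" plays the role of CONF.collect.scope_key here
print(filter_by_scope(records, "project_id", "a", 0, 50))
# [{'project_id': 'a', 'timestamp': 10, 'qty': 1}]
```

Filtering server-side, as in the collector example, remains preferable when the backend supports it, since it avoids transferring data that belongs to other scopes.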

Additional configuration

If you need to extend the metric configuration (that is, add parameters to the extra_args section of metrics.yml), you can override the check_configuration method of the base collector:

cloudkitty.collector.BaseCollector.check_configuration(conf)

This method uses voluptuous for data validation. The base schema for each metric is defined in cloudkitty.collector.METRIC_BASE_SCHEMA and is meant to be extended by other collectors. The following example is taken from the Gnocchi collector code:

from voluptuous import All, In, Length, Required, Schema

from cloudkitty import collector

GNOCCHI_EXTRA_SCHEMA = {
    Required('extra_args'): {
        Required('resource_type'): All(str, Length(min=1)),
        # Due to the Gnocchi model, metrics are grouped by resource.
        # This parameter allows adapting the key of the resource identifier
        Required('resource_key', default='id'): All(str, Length(min=1)),
        Required('aggregation_method', default='max'):
            In(['max', 'mean', 'min']),
    },
}


class GnocchiCollector(collector.BaseCollector):

    collector_name = 'gnocchi'

    @staticmethod
    def check_configuration(conf):
        # Apply the base collector's validation first
        conf = collector.BaseCollector.check_configuration(conf)
        metric_schema = Schema(collector.METRIC_BASE_SCHEMA).extend(
            GNOCCHI_EXTRA_SCHEMA)

        output = {}
        for metric_name, metric in conf.items():
            # Validate each metric against the extended schema
            met = output[metric_name] = metric_schema(metric)

            # Make sure results are also grouped by the resource identifier
            if met['extra_args']['resource_key'] not in met['groupby']:
                met['groupby'].append(met['extra_args']['resource_key'])

        return output
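To make the effect of the Gnocchi check concrete without depending on voluptuous, the sketch below reproduces by hand what the schema and loop above do to a single metric entry: missing resource_key and aggregation_method values receive their defaults, and resource_key is appended to groupby when absent. `apply_gnocchi_defaults` is a hypothetical helper, and the metric dict is an assumed example of a parsed metrics.yml entry. It only approximates the voluptuous validation; for instance, it does not check types or string lengths.

```python
import copy


def apply_gnocchi_defaults(metric):
    """Approximate the post-validation result of the Gnocchi check.

    Hypothetical helper: fills the same defaults as GNOCCHI_EXTRA_SCHEMA
    and appends resource_key to groupby, without type validation.
    The input dict is left untouched.
    """
    met = copy.deepcopy(metric)
    extra = met["extra_args"]
    if "resource_type" not in extra:
        raise ValueError("extra_args.resource_type is required")
    extra.setdefault("resource_key", "id")
    extra.setdefault("aggregation_method", "max")
    if extra["aggregation_method"] not in ("max", "mean", "min"):
        raise ValueError("unsupported aggregation_method")
    # Results must also be grouped by the resource identifier
    if extra["resource_key"] not in met["groupby"]:
        met["groupby"].append(extra["resource_key"])
    return met


metric = {
    "unit": "instance",
    "groupby": ["project_id"],
    "extra_args": {"resource_type": "instance"},
}
result = apply_gnocchi_defaults(metric)
print(result["groupby"])  # ['project_id', 'id']
```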

If your collector does not need any extra_args, it is not required to override the check_configuration method.