cloudkitty/doc/source/developer/collector.rst
Luka Peschke 492ec063a7 Remove transformers from the codebase
Since data frames are now handled as objects, transformers are no longer
required. This simplifies the global codebase.

Story: 2005890
Task: 36075
Change-Id: I76d9117bd95d80e51ca95804c999f145e65c3a2d
2019-09-10 14:51:02 +00:00

5.2 KiB

Collector

Data format

Internally, CloudKitty's data format is a bit more detailled than what can be found in the architecture documentation.

The internal data format is the following:

{
    "bananas": [
        {
            "vol": {
                "unit": "banana",
                "qty": 1
            },
            "rating": {
                "price": 1
            },
            "groupby": {
                "xxx_id": "hello",
                "yyy_id": "bye",
            },
            "metadata": {
                "flavor": "chocolate",
                "eaten_by": "gorilla",
            },
       }
    ],
}

However, developers implementing a collector don't need to format the data themselves, as there are helper functions for these matters.

Implementation

Each collector must implement the following class:

cloudkitty.collector.BaseCollector

The retrieve method of the BaseCollector class is called by the orchestrator. This method calls the fetch_all method of the child class.

To create a collector, you need to implement at least the fetch_all method.

Data collection

Collectors must implement a fetch_all method. This method is called for each metric type, for each scope, for each collect period. It has the following prototype:

cloudkitty.collector.BaseCollector

This method is supposed to return a list of cloudkitty.dataframe.DataPoint objects.

Example code of a basic collector:

from cloudkitty.collector import BaseCollector

class MyCollector(BaseCollector):
    def __init__(self, **kwargs):
        super(MyCollector, self).__init__(**kwargs)

    def fetch_all(self, metric_name, start, end,
                  project_id=None, q_filter=None):
        data = []
        for CONDITION:
            # do stuff
            data.append(dataframe.DataPoint(
                unit,
                qty, # int, float, decimal.Decimal or str
                0, # price
                groupby, # dict
                metadata, # dict
            ))

        return data

project_id can be misleading, as it is a legacy name. It contains the ID of the current scope. The attribute corresponding to the scope is specified in the configuration, under [collect]/scope_key. Thus, all queries should filter based on this attribute. Example:

from oslo_config import cfg

from cloudkitty.collector import BaseCollector

CONF = cfg.CONF

class MyCollector(BaseCollector):
    def __init__(self, **kwargs):
        super(MyCollector, self).__init__(**kwargs)

    def fetch_all(self, metric_name, start, end,
                  project_id=None, q_filter=None):
        scope_key = CONF.collect.scope_key
        filters = {'start': start, 'stop': stop, scope_key: project_id}

        data = self.client.query(
            filters=filters,
            groupby=self.conf[metric_name]['groupby'])
        # Format data etc
        return output

Additional configuration

If you need to extend the metric configuration (add parameters to the extra_args section of metrics.yml), you can overload the check_configuration method of the base collector:

cloudkitty.collector.BaseCollector

This method uses voluptuous for data validation. The base schema for each metric can be found in cloudkitty.collector.METRIC_BASE_SCHEMA. This schema is meant to be extended by other collectors. Example taken from the gnocchi collector code:

from cloudkitty import collector

GNOCCHI_EXTRA_SCHEMA = {
    Required('extra_args'): {
        Required('resource_type'): All(str, Length(min=1)),
        # Due to Gnocchi model, metric are grouped by resource.
        # This parameter allows to adapt the key of the resource identifier
        Required('resource_key', default='id'): All(str, Length(min=1)),
        Required('aggregation_method', default='max'):
            In(['max', 'mean', 'min']),
    },
}

class GnocchiCollector(collector.BaseCollector):

    collector_name = 'gnocchi'

    @staticmethod
    def check_configuration(conf):
        conf = collector.BaseCollector.check_configuration(conf)
        metric_schema = Schema(collector.METRIC_BASE_SCHEMA).extend(
            GNOCCHI_EXTRA_SCHEMA)

        output = {}
        for metric_name, metric in conf.items():
            met = output[metric_name] = metric_schema(metric)

            if met['extra_args']['resource_key'] not in met['groupby']:
                met['groupby'].append(met['extra_args']['resource_key'])

        return output

If your collector does not need any extra_args, it is not required to overload the check_configuration method.