Spec: Reworking Prometheus Collector

This is a spec proposing some rework on Prometheus collector.

See specs/stein/reworking_prometheus_collector.rst for more details.

Change-Id: I7740432c82cce2e613996e8b0de31c8389413a58
This commit is contained in:
Justin Ferrieu 2018-12-19 11:33:15 +01:00
parent 836c4ed418
commit cd0bfc925f
1 changed files with 267 additions and 0 deletions

View File

@ -0,0 +1,267 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
======================================
Reworking Prometheus Collector
======================================
CloudKitty's Prometheus collector needs some rework. This spec aims at defining
what is to be done and why. Everything in this document is open to discussion,
equally on the associated storyboard and gerrit.
Problem Description
===================
In the present state, Prometheus collector is not consistent with the other
collectors present in CloudKitty in terms of configuration and operations.
* The end user fully defines, on its own, the PromQL query to be executed
in the ``metrics.yml`` file in a ``query`` option
under ``extra_args`` subkey.
Permitting the end user such complexity for building its request
at the expense of genericity is not relevant regarding the use cases
covered by CloudKitty.
* The collector does not use native PromQL ``groupby`` capabilities.
The process is done post Prometheus service response causing
an overhead operational cost.
* The collector ignores ``scope_key`` configuration option,
disallowing scoping requests however it is the standard behaviour
to be found in other collectors of the project.
Besides, it is a major concern for orchestration.
Let's say you deploy several CK processors. Without scoping,
you will end up with data duplications likely to result in false reports.
The collector also needs the following new features:
* It needs authentication configuration options to dial into
some secured Prometheus service.
* It needs a configuration option to take in account a custom certificate
authority file in order to authorize HTTPS requests
signed with custom certificate.
* It needs an insecure configuration option to explicitly trust HTTPS
requests signed with an untrusted certificate.
These features will be useful to end users for configuring CloudKitty
to effectively connect to a Prometheus service.
Proposed Change
===============
In ``cloudkitty.conf``, under ``[collector_prometheus]`` section:
* Add a ``prometheus_user`` and a ``prometheus_password`` option
to provide credentials for authentication (both defaults: none).
* Add a ``cafile`` option to allow custom certificate authority
file (default: none).
* Add an ``insecure`` option to explicitly allow insecure HTTPS requests
(default: false).
Possible deprecation of ``extra_args/query`` in ``metrics.yml``
for a defined metric (see Alternatives_ section).
In ``metrics.yml``, for a defined metric, add an ``aggregation_method`` option
under ``extra_args`` subkey to define which aggregation method to use collecting
metric data.
For instance:
.. code-block:: YAML
# metrics.yml
metrics:
# [...]
volume_size:
unit: GiB
groupby:
- id
- project_id
metadata:
- volume_type
extra_args:
aggregation_method: max
Valid aggregation methods will be the following :
* ``avg``: the average value of all points in the time window.
* ``min``: the minimum value of all points in the time window.
* ``max``: the maximum value of all points in the time window.
* ``sum``: the sum of all values in the time window.
* ``count``: the count of all values in the time window.
* ``stddev``: the population standard deviation of the values in the time window.
* ``stdvar``: the population standard variance of the values in the time window.
.. note::
Time window is computed from the ``period`` configuration option under
``[collect]`` section in ``cloudkitty.conf``.
Other changes at this stage are mostly focused on the query building process
happening inside the Prometheus collector:
* ``[collect]/scope_key`` option value will be added to filter every query
with the corresponding ``scope_id``.
* The collector will now use the ``groupby`` capabilities offered by PromQL
using the syntax ``by (scope_key, groupby, metadata)`` instead of
deferring it to the collector implementation.
Example
-------
For the following ``cloudkitty.conf`` :
.. code-block:: INI
# Some options have been omitted for the sake of clarity
[collect]
collector = prometheus
scope_key = namespace
period = 3600
[collector_prometheus]
prometheus_url = http://prometheus-dn.tld/api/v1
And the following ``metrics.yml`` :
.. code-block:: YAML
metrics:
# [...]
# Let's say we want to rate the following metric
container_memory_usage_bytes:
unit: GiB
groupby:
- container_id
metadata:
- volume_type
extra_args:
aggregation_method: max
The PromQL request will look the following :
``max(max_over_time(container_memory_usage_bytes{namespace="foobar"}[3600s])) by (namespace, container_id, volume_type)``
Start and end timestamps will be computed from the ``period`` option.
Let's say that we are starting from ``January 30th, 2019 at 1pm
(timestamp: 1548853200)``.
Then the end timestamp will be ``1548853200 + period (= 1548856800)``.
The HTTP URL with the url-encoded PromQL will then look the following :
``http://prometheus-dn.tld/api/v1/query?query=max%28max_over_time%28container_memory_usage_bytes%7Bnamespace%3D%22foobar%22%7D%5B3600s%5D%29%29%20by%20%28namespace%2C%20container_id%2C%20volume_type%29&time=158856800``
Alternatives
------------
Instead of deprecation for the ``extra_args/query`` option
defined for a metric in ``metrics.yml`` file, we could remove it.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications Impact
--------------------
None
Other end user impact
---------------------
End users will probably have to reconfigure their metrics definition
if they are using the Prometheus collector in the present state.
However, after these changes are made,
they will have less overhead configuration to do in order to
switch from a collector to another, due to the consistency improvement
in the way metrics are defined in ``metrics.yml`` for Prometheus collector.
They will also benefit from the added configuration options
for the ``[collector_prometheus]`` section in ``cloudkitty.conf``.
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Justin Ferrieu is assigned to work on the spec as well as to develop
the evoked points.
Primary assignee:
jferrieu
Other contributors:
None
Work Items
----------
* Change configuration schema, query building process and query response
formatting process accordingly inside Prometheus collector.
* Add HTTPS and authentication support.
Dependencies
============
None
Testing
=======
The proposed changes will be tested with unit tests.
Documentation Impact
====================
We will add an entry for detailing the Prometheus collector configuration
in ``Administration Guide/Configuration Guide/Collector/Prometheus``.
References
==========
* Prometheus documentation: https://prometheus.io/docs/