Spec: Reworking Prometheus Collector
This is a spec proposing some rework on Prometheus collector. See specs/stein/reworking_prometheus_collector.rst for more details. Change-Id: I7740432c82cce2e613996e8b0de31c8389413a58
This commit is contained in:
parent
836c4ed418
commit
cd0bfc925f
267
specs/stein/reworking_prometheus_collector.rst
Normal file
267
specs/stein/reworking_prometheus_collector.rst
Normal file
@ -0,0 +1,267 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
======================================
|
||||
Reworking Prometheus Collector
|
||||
======================================
|
||||
|
||||
CloudKitty's Prometheus collector needs some rework. This spec aims at defining
|
||||
what is to be done and why. Everything in this document is open to discussion,
|
||||
equally on the associated storyboard and gerrit.
|
||||
|
||||
Problem Description
|
||||
===================
|
||||
|
||||
In the present state, Prometheus collector is not consistent with the other
|
||||
collectors present in CloudKitty in terms of configuration and operations.
|
||||
|
||||
* The end user fully defines, on its own, the PromQL query to be executed
|
||||
in the ``metrics.yml`` file in a ``query`` option
|
||||
under ``extra_args`` subkey.
|
||||
Permitting the end user such complexity for building its request
|
||||
at the expense of genericity is not relevant regarding the use cases
|
||||
covered by CloudKitty.
|
||||
|
||||
* The collector does not use native PromQL ``groupby`` capabilities.
|
||||
The process is done post Prometheus service response causing
|
||||
an overhead operational cost.
|
||||
|
||||
* The collector ignores ``scope_key`` configuration option,
|
||||
disallowing scoping requests however it is the standard behaviour
|
||||
to be found in other collectors of the project.
|
||||
Besides, it is a major concern for orchestration.
|
||||
Let's say you deploy several CK processors. Without scoping,
|
||||
you will end up with data duplications likely to result in false reports.
|
||||
|
||||
The collector also needs the following new features:
|
||||
|
||||
* It needs authentication configuration options to dial into
|
||||
some secured Prometheus service.
|
||||
|
||||
* It needs a configuration option to take in account a custom certificate
|
||||
authority file in order to authorize HTTPS requests
|
||||
signed with custom certificate.
|
||||
|
||||
* It needs an insecure configuration option to explicitly trust HTTPS
|
||||
requests signed with an untrusted certificate.
|
||||
|
||||
These features will be useful to end users for configuring CloudKitty
|
||||
to effectively connect to a Prometheus service.
|
||||
|
||||
Proposed Change
|
||||
===============
|
||||
|
||||
In ``cloudkitty.conf``, under ``[collector_prometheus]`` section:
|
||||
|
||||
* Add a ``prometheus_user`` and a ``prometheus_password`` option
|
||||
to provide credentials for authentication (both defaults: none).
|
||||
|
||||
* Add a ``cafile`` option to allow custom certificate authority
|
||||
file (default: none).
|
||||
|
||||
* Add an ``insecure`` option to explicitly allow insecure HTTPS requests
|
||||
(default: false).
|
||||
|
||||
|
||||
Possible deprecation of ``extra_args/query`` in ``metrics.yml``
|
||||
for a defined metric (see Alternatives_ section).
|
||||
|
||||
|
||||
In ``metrics.yml``, for a defined metric, add an ``aggregation_method`` option
|
||||
under ``extra_args`` subkey to define which aggregation method to use collecting
|
||||
metric data.
|
||||
|
||||
For instance:
|
||||
|
||||
.. code-block:: YAML
|
||||
|
||||
# metrics.yml
|
||||
|
||||
metrics:
|
||||
# [...]
|
||||
|
||||
volume_size:
|
||||
unit: GiB
|
||||
groupby:
|
||||
- id
|
||||
- project_id
|
||||
metadata:
|
||||
- volume_type
|
||||
extra_args:
|
||||
aggregation_method: max
|
||||
|
||||
Valid aggregation methods will be the following :
|
||||
|
||||
* ``avg``: the average value of all points in the time window.
|
||||
* ``min``: the minimum value of all points in the time window.
|
||||
* ``max``: the maximum value of all points in the time window.
|
||||
* ``sum``: the sum of all values in the time window.
|
||||
* ``count``: the count of all values in the time window.
|
||||
* ``stddev``: the population standard deviation of the values in the time window.
|
||||
* ``stdvar``: the population standard variance of the values in the time window.
|
||||
|
||||
.. note::
|
||||
|
||||
Time window is computed from the ``period`` configuration option under
|
||||
``[collect]`` section in ``cloudkitty.conf``.
|
||||
|
||||
Other changes at this stage are mostly focused on the query building process
|
||||
happening inside the Prometheus collector:
|
||||
|
||||
* ``[collect]/scope_key`` option value will be added to filter every query
|
||||
with the corresponding ``scope_id``.
|
||||
|
||||
* The collector will now use the ``groupby`` capabilities offered by PromQL
|
||||
using the syntax ``by (scope_key, groupby, metadata)`` instead of
|
||||
deferring it to the collector implementation.
|
||||
|
||||
Example
|
||||
-------
|
||||
|
||||
For the following ``cloudkitty.conf`` :
|
||||
|
||||
.. code-block:: INI
|
||||
|
||||
# Some options have been omitted for the sake of clarity
|
||||
|
||||
[collect]
|
||||
collector = prometheus
|
||||
scope_key = namespace
|
||||
period = 3600
|
||||
|
||||
[collector_prometheus]
|
||||
prometheus_url = http://prometheus-dn.tld/api/v1
|
||||
|
||||
And the following ``metrics.yml`` :
|
||||
|
||||
.. code-block:: YAML
|
||||
|
||||
metrics:
|
||||
# [...]
|
||||
|
||||
# Let's say we want to rate the following metric
|
||||
container_memory_usage_bytes:
|
||||
unit: GiB
|
||||
groupby:
|
||||
- container_id
|
||||
metadata:
|
||||
- volume_type
|
||||
extra_args:
|
||||
aggregation_method: max
|
||||
|
||||
|
||||
The PromQL request will look the following :
|
||||
|
||||
``max(max_over_time(container_memory_usage_bytes{namespace="foobar"}[3600s])) by (namespace, container_id, volume_type)``
|
||||
|
||||
Start and end timestamps will be computed from the ``period`` option.
|
||||
Let's say that we are starting from ``January 30th, 2019 at 1pm
|
||||
(timestamp: 1548853200)``.
|
||||
Then the end timestamp will be ``1548853200 + period (= 1548856800)``.
|
||||
|
||||
The HTTP URL with the url-encoded PromQL will then look the following :
|
||||
|
||||
``http://prometheus-dn.tld/api/v1/query?query=max%28max_over_time%28container_memory_usage_bytes%7Bnamespace%3D%22foobar%22%7D%5B3600s%5D%29%29%20by%20%28namespace%2C%20container_id%2C%20volume_type%29&time=158856800``
|
||||
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
Instead of deprecation for the ``extra_args/query`` option
|
||||
defined for a metric in ``metrics.yml`` file, we could remove it.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
None
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Notifications Impact
|
||||
--------------------
|
||||
|
||||
None
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
End users will probably have to reconfigure their metrics definition
|
||||
if they are using the Prometheus collector in the present state.
|
||||
|
||||
However, after these changes are made,
|
||||
they will have less overhead configuration to do in order to
|
||||
switch from a collector to another, due to the consistency improvement
|
||||
in the way metrics are defined in ``metrics.yml`` for Prometheus collector.
|
||||
|
||||
They will also benefit from the added configuration options
|
||||
for the ``[collector_prometheus]`` section in ``cloudkitty.conf``.
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
None
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
None
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
None
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Justin Ferrieu is assigned to work on the spec as well as to develop
|
||||
the evoked points.
|
||||
|
||||
Primary assignee:
|
||||
jferrieu
|
||||
|
||||
Other contributors:
|
||||
None
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Change configuration schema, query building process and query response
|
||||
formatting process accordingly inside Prometheus collector.
|
||||
* Add HTTPS and authentication support.
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
None
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
The proposed changes will be tested with unit tests.
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
We will add an entry for detailing the Prometheus collector configuration
|
||||
in ``Administration Guide/Configuration Guide/Collector/Prometheus``.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
* Prometheus documentation: https://prometheus.io/docs/
|
||||
|
Loading…
x
Reference in New Issue
Block a user