======================
Project Architecture
======================

Gnocchi consists of several services: an HTTP REST API (see :doc:`rest`), an
optional statsd-compatible daemon (see :doc:`statsd`), and an asynchronous
processing daemon (named `gnocchi-metricd`). Data is received via the HTTP REST
API or the statsd daemon. `gnocchi-metricd` performs operations (statistics
computing, metric cleanup, etc.) on the received data in the background.

Both the HTTP REST API and the asynchronous processing daemon are stateless and
scalable; additional workers can be added depending on the load.

.. image:: architecture.png
   :align: center
   :width: 80%
   :alt: Gnocchi architecture

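For example, the processing layer can be scaled out by simply starting more
`gnocchi-metricd` processes, on the same host or on additional machines. As a
rough sketch (the command-line option and the configuration path shown here are
assumptions and depend on how Gnocchi was installed)::

    # start one more metricd worker using the shared configuration file
    gnocchi-metricd --config-file /etc/gnocchi/gnocchi.conf
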
Back-ends
---------

Gnocchi uses three different back-ends for storing data: one for storing new
incoming measures (the incoming driver), one for storing the time series (the
storage driver) and one for indexing the data (the index driver).

The *incoming* storage is responsible for storing new measures sent to metrics.
It is by default – and usually – the same driver as the *storage* one.

The *storage* is responsible for storing measures of created metrics. It
receives timestamps and values, and pre-computes aggregations according to the
defined archive policies.

The *indexer* is responsible for storing the index of all resources, archive
policies and metrics, along with their definitions, types and properties. The
indexer is also responsible for linking resources with metrics.

Available storage back-ends
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Gnocchi currently offers different storage drivers:

* File (default)
* `Ceph`_ (preferred)
* `OpenStack Swift`_
* `S3`_
* `Redis`_

The drivers are based on an intermediate library, named *Carbonara*, which
handles the time series manipulation, since none of these storage technologies
handle time series natively.

The *Carbonara*-based drivers work well and are as scalable as their back-end
technology permits. Ceph and Swift are inherently more scalable than the file
driver.

Depending on the size of your architecture, using the file driver and storing
your data on a disk might be enough. If you need to scale the number of servers
with the file driver, you can export and share the data via NFS among all
Gnocchi processes. In any case, the S3, Ceph and Swift drivers are far more
scalable. Ceph also offers better consistency, and hence is the recommended
driver.

.. _OpenStack Swift: http://docs.openstack.org/developer/swift/
.. _Ceph: https://ceph.com
.. _`S3`: https://aws.amazon.com/s3/
.. _`Redis`: https://redis.io

Available index back-ends
~~~~~~~~~~~~~~~~~~~~~~~~~

Gnocchi currently offers different index drivers:

* `PostgreSQL`_ (preferred)
* `MySQL`_ (at least version 5.6.4)

Those drivers offer almost the same performance and features, though PostgreSQL
tends to perform better and provides some additional features (e.g. resource
duration computation).

.. _PostgreSQL: http://postgresql.org
.. _MySQL: http://mysql.org

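Once the drivers are chosen, they are selected through Gnocchi's configuration
file. The following is only a rough sketch of what such a configuration might
look like; the option names and values shown here (``driver``, ``url``,
``file_basepath``, ``redis_url``, the paths and the URLs) are assumptions that
should be checked against the configuration reference of your Gnocchi
version::

    [indexer]
    url = postgresql://gnocchi:secret@localhost/gnocchi

    [storage]
    driver = file
    file_basepath = /var/lib/gnocchi

    [incoming]
    # when this section is not set, the incoming driver defaults to the
    # storage driver
    driver = redis
    redis_url = redis://localhost:6379
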
How to plan for Gnocchi’s storage
---------------------------------

Gnocchi uses a custom file format based on its library *Carbonara*. In Gnocchi,
a time series is a collection of points, where a point is a given measure, or
sample, in the lifespan of a time series. The storage format is compressed
using various techniques, therefore the size of a time series can be estimated,
in the **worst** case scenario, with the following formula::

    number of points × 8 bytes = size in bytes

The number of points you want to keep is usually determined by the following
formula::

    number of points = timespan ÷ granularity

For example, if you want to keep a year of data with a one minute resolution::

    number of points = (365 days × 24 hours × 60 minutes) ÷ 1 minute
    number of points = 525 600

Then::

    size in bytes = 525 600 points × 8 bytes = 4 204 800 bytes ≈ 4 106 KiB

This is just for a single aggregated time series. If your archive policy uses
the 6 default aggregation methods (mean, min, max, sum, std, count) with the
same "one year, one minute aggregations" resolution, the space used will go up
to a maximum of 6 × 4.1 MiB = 24.6 MiB.
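
The arithmetic above is easy to script when planning capacity for several
metrics or archive policies. The following Python sketch simply encodes the
worst-case formula from this section; the function name and the example numbers
are illustrative only::

    def worst_case_size_bytes(timespan_seconds, granularity_seconds,
                              aggregation_methods=6, point_size=8):
        """Worst-case storage size of one metric, per the formula above.

        number of points = timespan ÷ granularity
        size in bytes = number of points × 8 bytes, per aggregation method
        """
        points = timespan_seconds // granularity_seconds
        return points * point_size * aggregation_methods

    # One year of data at a one-minute granularity, with the 6 default
    # aggregation methods: 525 600 points × 8 bytes × 6 ≈ 24 MiB.
    one_year = 365 * 24 * 60 * 60
    print(worst_case_size_bytes(one_year, 60) / 1024 / 1024, "MiB")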