Change-Id: I007351039135bd7e46422ce743d694363b2aa6aa
Add an Elasticsearch v2 storage driver
https://storyboard.openstack.org/#!/story/2006332
Problem Description
For now, there is only one v2 storage driver: InfluxDB. However, several choices should always be available for each of CloudKitty's modules.
The following strengths make Elasticsearch a great candidate:
- It's a widespread solution. On most deployments, it is likely that an Elasticsearch cluster is available (for example for log centralization).
- HA and clustering. Elasticsearch features HA and clustering by default, whereas it is only available in the paid version of InfluxDB.
- It's performant, and it allows some tuning by admins.
- Data visualization. Data from the InfluxDB storage driver can be visualized with Grafana; the Elastic stack provides Kibana for the same purpose.
Proposed Change
A v2 storage driver for Elasticsearch, available through the cloudkitty.storage.v2.backends.elasticsearch entrypoint.
Here's a summary of the routes and aggregation methods that will be used for each of the v2 storage driver interface's methods:
- init: PUT /<index>
  See "Data model impact" for mapping details.
- push: POST /<index>/<mapping>/_bulk
- retrieve: GET /<index>/_search
  A standard search query with filters.
- total: GET /<index>/_search
  The composite aggregation will be used: several terms aggregations in the must clause of a bool query will allow grouping data on specific attributes. A sum aggregation will then be applied to the buckets to obtain the qty and price for each of them.
- delete: POST /<index>/_delete_by_query
  Same principle as the retrieve method, but for deletion.
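As an illustration of the total method described above, here is a minimal sketch of a request body for GET /<index>/_search. It is one possible reading of the description, not the driver's actual code: the function name, the aggregation names (totals, sum_qty, sum_price), the range filters on start and end, and the groupby.<attribute> field paths are all assumptions made for the example.

```python
def build_total_body(begin, end, groupby, after=None, page_size=100):
    """Build a search body that groups datapoints and sums qty and price.

    All names here are hypothetical; only the composite, terms and sum
    aggregations themselves come from the spec.
    """
    # One "terms" source per attribute to group on.
    sources = [
        {name: {"terms": {"field": "groupby." + name}}} for name in groupby
    ]
    composite = {"size": page_size, "sources": sources}
    if after is not None:
        # Resume after the last bucket key of the previous page.
        composite["after"] = after
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"bool": {"filter": [
            {"range": {"start": {"gte": begin}}},
            {"range": {"end": {"lte": end}}},
        ]}},
        "aggs": {
            "totals": {
                "composite": composite,
                "aggs": {
                    "sum_qty": {"sum": {"field": "qty"}},
                    "sum_price": {"sum": {"field": "price"}},
                },
            },
        },
    }


body = build_total_body(
    "2019-08-01T00:00:00Z", "2019-09-01T00:00:00Z", ["project_id"])
```

Each returned bucket would then carry its grouping key along with the summed qty and price.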
Warning
The "composite" aggregation is stable since Elasticsearch version 6.5. In
order to be compatible with 6.x and 7.x, CloudKitty will use the
include_type_name parameter for mapping creation. This parameter was added
in Elasticsearch 6.8 and will be removed in Elasticsearch 8. Thus,
CloudKitty will require Elasticsearch >= 6.5 and < 8.
Note
About pagination: given that offset + size can't exceed 10000 by default
in the search API, the retrieve function will use scrolling. The
search_after feature will not be used, as it is stateless, which means
that consecutive requests may return unexpected results depending on the
index updates happening at the same time. The duration for which scroll
contexts should be kept open will be configurable through a config file
option marked as advanced.
The total function will use the after parameter of the composite
aggregation.
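The two pagination strategies mentioned in this note can be sketched as follows. This is an illustration only: the HTTP calls are replaced by caller-supplied functions (start_search, next_page and run_aggregation are hypothetical names), and the code relies solely on the response fields Elasticsearch documents for scrolling (_scroll_id, hits.hits) and for the composite aggregation (buckets, after_key).

```python
def iter_scroll_hits(start_search, next_page):
    """Yield every hit of a scrolled search.

    start_search() is assumed to run the initial search with a scroll
    duration; next_page(scroll_id) is assumed to fetch the next batch.
    """
    response = start_search()
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        response = next_page(response["_scroll_id"])


def iter_composite_buckets(run_aggregation, agg_name):
    """Yield every bucket of a composite aggregation, page by page.

    run_aggregation(after) is assumed to run the search with the given
    "after" key (None for the first page).
    """
    after = None
    while True:
        agg = run_aggregation(after)["aggregations"][agg_name]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        # Fall back to the last bucket's key if "after_key" is absent.
        after = agg.get("after_key", buckets[-1]["key"])
```

Both generators stop on the first empty page, so callers can simply iterate over them without tracking offsets.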
Note
The CloudKitty storage driver will only require the OSS version of Elasticsearch to work. However, some X-Pack features of the Basic version, like authentication, will be supported (but not mandatory) in the future.
Alternatives
None.
Data model impact
The data model used in Elasticsearch will be as follows:
Each DataPoint will be a single document. An existing empty index is required (this will allow tuning from admins). In order to improve overall performance, a mapping with the following attributes will be created.
- start: (date) The start of the period the datapoint applies to.
- end: (date) The end of the period the datapoint applies to.
- type: (keyword) The type of the datapoint.
- unit: (keyword) The unit of the datapoint.
- qty: (double) The qty of the datapoint.
- price: (double) The price of the datapoint.
- groupby: (object) Dict of the datapoint's groupby attributes.
- metadata: (object) Dict of the datapoint's metadata attributes.
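To make the document layout concrete, here is a minimal sketch of how one datapoint could be turned into a document and how several documents could be serialized for the _bulk route used by the push method. The function names and the sample values are hypothetical; the newline-delimited format (one action line followed by one source line, with a trailing newline) is the one expected by the Elasticsearch bulk API.

```python
import json


def datapoint_to_doc(start, end, type_, unit, qty, price, groupby, metadata):
    """Return a document following the attribute list above.

    Dates are assumed to be passed as ISO 8601 strings.
    """
    return {
        "start": start,
        "end": end,
        "type": type_,
        "unit": unit,
        "qty": qty,
        "price": price,
        "groupby": dict(groupby),
        "metadata": dict(metadata),
    }


def build_bulk_body(docs):
    """Serialize documents as newline-delimited JSON for a bulk request."""
    lines = []
    for doc in docs:
        # The index (and mapping) are assumed to be given in the URL,
        # so the action line carries no extra information.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


doc = datapoint_to_doc(
    "2019-08-01T00:00:00Z", "2019-08-01T01:00:00Z",
    "instance", "instance", 1.0, 0.25,
    {"project_id": "abc"}, {"flavor": "m1.small"},
)
bulk_body = build_bulk_body([doc])
```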
Note
In order to allow flexible groupby/metadata, the associated objects will be dynamic: they will accept new attributes.
Note
Given that we will only do exact value searches, every string attribute will be converted to a keyword. This will be achieved using dynamic templates.
Warning
By default, the _source field will be enabled. An option to disable it in
order to reduce storage size may be added, but this should be done with
care. See the link to the Elasticsearch documentation in the references
for details.
In the end, the mapping will be defined as follows:
{
  "mappings": {
    "_doc": {
      // cast all strings to keywords
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ],
      // we won't add any attribute to the base object, so dynamic must be false
      "dynamic": false,
      "properties": {
        "start": {"type": "date"},
        "end": {"type": "date"},
        "type": {"type": "keyword"},
        "unit": {"type": "keyword"},
        "qty": {"type": "double"},
        "price": {"type": "double"},
        // groupby and metadata will accept new attributes
        "groupby": {"dynamic": true, "type": "object"},
        // even though metadata should not be indexed, disabling it can't be
        // undone, and disabled objects are only available through the "_source"
        // field, which may also be disabled
        "metadata": {"dynamic": true, "type": "object"}
      }
    }
  }
}
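Since the comments above are not valid JSON, the driver would have to send a comment-free body. A minimal sketch of how the request for the init route could be assembled is shown below; the function name, the returned dict layout and the default mapping name are assumptions made for the example, and only the include_type_name parameter and the mapping content come from this spec.

```python
import json


def build_index_request(index_name, mapping_name="_doc"):
    """Describe the request creating the mapping (hypothetical helper)."""
    mapping = {
        # Cast all strings to keywords.
        "dynamic_templates": [
            {"strings_as_keywords": {
                "match_mapping_type": "string",
                "mapping": {"type": "keyword"},
            }},
        ],
        # No attribute is added to the base object.
        "dynamic": False,
        "properties": {
            "start": {"type": "date"},
            "end": {"type": "date"},
            "type": {"type": "keyword"},
            "unit": {"type": "keyword"},
            "qty": {"type": "double"},
            "price": {"type": "double"},
            # groupby and metadata accept new attributes.
            "groupby": {"dynamic": True, "type": "object"},
            "metadata": {"dynamic": True, "type": "object"},
        },
    }
    return {
        "method": "PUT",
        "path": "/" + index_name,
        # Needed so that a typed mapping is accepted on both 6.x and 7.x.
        "params": {"include_type_name": "true"},
        "body": json.dumps({"mappings": {mapping_name: mapping}}),
    }


request = build_index_request("cloudkitty")
```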
Note
Given that a term to filter on may be part of groupby or metadata, each
filter will add two term queries to the should part of the bool query
(one for the groupby section and one for the metadata section). Thus, the
minimum_should_match parameter of the bool query will be set to half of
the number of terms in the should query.
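The filter construction described in this note can be sketched as follows. The function name and the groupby.<key> / metadata.<key> field paths are assumptions derived from the mapping; the rule for minimum_should_match is the one stated above.

```python
def build_filter_query(filters):
    """Build a bool query matching each filter in groupby or metadata.

    `filters` is a dict mapping attribute names to exact values.
    """
    should = []
    for key, value in filters.items():
        # The attribute may live in either section, so add one term
        # query for each of them.
        should.append({"term": {"groupby." + key: value}})
        should.append({"term": {"metadata." + key: value}})
    if not should:
        # Without filters, match every document.
        return {"match_all": {}}
    return {
        "bool": {
            "should": should,
            # Two clauses were added per filter; require half to match.
            "minimum_should_match": len(should) // 2,
        }
    }


query = build_filter_query({"project_id": "abc", "flavor": "m1.small"})
```

Note that minimum_should_match counts matching clauses globally rather than per filter, so this relies on a given attribute not appearing with the same value in both groupby and metadata.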
REST API impact
None.
Security impact
In the first iteration, there will be no support for X-Pack authentication. It will be up to the admins to secure the connections between the Elasticsearch cluster and CloudKitty. Authentication will be introduced in future releases.
Notifications Impact
None.
Other end user impact
None.
Performance Impact
On most benchmarks (and from what could be determined from POCs), data insertion into Elasticsearch is slower than insertion into InfluxDB, but Elasticsearch is faster for aggregations. Once CloudKitty has caught up with the current timestamp, not many insertions are required. Moreover, Elasticsearch's support for clustering and tuning should allow for better overall performance in the end.
Other deployer impact
The new backend will require more configuration from the admins:
- Index aliases and lifecycles
- Shards and replicas
Developer impact
None
Implementation
Assignee(s)
- Primary assignee: peschk_l
Work Items
- Implement an Elasticsearch storage driver
- Add support for the driver to the Devstack plugin.
- Add a Tempest job where the Elasticsearch storage driver is used.
Dependencies
Elasticsearch >= 6.5 and < 8.
Testing
In addition to unit tests, this will be tested with Tempest.
Documentation Impact
The configuration options provided by this driver will be detailed in the documentation. There will also be a section dedicated to the configuration of the Elasticsearch index.
References
- Dynamic templates: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html
- _source field: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
- search_after parameter: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-request-body.html#request-body-search-search-after
- Elasticsearch and InfluxDB benchmarks: https://jolicode.com/blog/influxdb-vs-elasticsearch-for-time-series-and-metrics-data