Optimizing SQL queries that filter on a time range

Queries filtering on date fields are slow as they have to scan each
row. There are sometimes millions of rows to scan while only a few
thousand are necessary.

The following patch narrows the data to process by filtering more on
frame_model.begin as a first step, using a `BETWEEN` statement instead
of `>=`.

Change-Id: I8acbc8946d9e001419f7bf5064fcebe0a0ae907a
Depends-On: Ia6908d13c91a02c47863ae6ac4b595ac98f9fd91
Olivier Chaze 2022-11-23 12:36:04 +01:00 committed by Rafael Weingärtner
parent 0c1eabc364
commit 10a9482a91
2 changed files with 20 additions and 6 deletions
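
The commit message describes replacing a one-sided `>=` / `<=` filter with a bounded `BETWEEN` on both columns. The sketch below illustrates, under stated assumptions, why the two forms select the same rows: it holds whenever every frame satisfies begin <= end. It uses the standard-library `sqlite3` module as a stand-in database; the `frames` table, its columns, and the sample data are hypothetical and are not the project's actual schema.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical stand-in for the rated data frames table (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE frames ('
    'id INTEGER PRIMARY KEY, "begin" TEXT, "end" TEXT, tenant_id TEXT)'
)

# 30 days of one-hour frames; every frame satisfies begin <= end.
start = datetime(2022, 1, 1)
rows = []
for hour in range(24 * 30):
    b = start + timedelta(hours=hour)
    e = b + timedelta(hours=1)
    rows.append((b.isoformat(), e.isoformat(), "tenant-%d" % (hour % 3)))
conn.executemany(
    'INSERT INTO frames ("begin", "end", tenant_id) VALUES (?, ?, ?)', rows
)

# One-day window; ISO 8601 strings of equal format compare chronologically.
window = (datetime(2022, 1, 10).isoformat(), datetime(2022, 1, 11).isoformat())

# Previous form: lower bound on begin, upper bound on end.
old_sql = (
    'SELECT id FROM frames '
    'WHERE "begin" >= ? AND "end" <= ? ORDER BY id'
)
# New form: both columns bounded on both sides.
new_sql = (
    'SELECT id FROM frames '
    'WHERE "begin" BETWEEN ? AND ? AND "end" BETWEEN ? AND ? ORDER BY id'
)

old_ids = [r[0] for r in conn.execute(old_sql, window)]
new_ids = [r[0] for r in conn.execute(new_sql, window + window)]

print(old_ids == new_ids, len(new_ids))
```

The extra bounds are redundant for correctness, since begin <= end <= window end already implies begin <= window end. Their purpose is to give the database a closed range on each column that an index can serve.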

@@ -112,9 +112,10 @@ class SQLAlchemyStorage(storage.BaseStorage):
         if service:
             q = q.filter(
                 self.frame_model.res_type == service)
+        # begin and end filters are both needed, do not remove one of them.
         q = q.filter(
-            self.frame_model.begin >= begin,
-            self.frame_model.end <= end,
+            self.frame_model.begin.between(begin, end),
+            self.frame_model.end.between(begin, end),
             self.frame_model.res_type != '_NO_DATA_')
         if groupby:
             q = q.group_by(sqlalchemy.sql.text(groupby))
@@ -136,9 +137,10 @@ class SQLAlchemyStorage(storage.BaseStorage):
         q = utils.model_query(
             self.frame_model,
             session)
+        # begin and end filters are both needed, do not remove one of them.
         q = q.filter(
-            self.frame_model.begin >= begin,
-            self.frame_model.end <= end)
+            self.frame_model.begin.between(begin, end),
+            self.frame_model.end.between(begin, end))
         tenants = q.distinct().values(
             self.frame_model.tenant_id)
         return [tenant.tenant_id for tenant in tenants]
@@ -152,9 +154,10 @@ class SQLAlchemyStorage(storage.BaseStorage):
         q = utils.model_query(
             self.frame_model,
             session)
+        # begin and end filters are both needed, do not remove one of them.
         q = q.filter(
-            self.frame_model.begin >= begin,
-            self.frame_model.end <= end)
+            self.frame_model.begin.between(begin, end),
+            self.frame_model.end.between(begin, end))
         for filter_name, filter_value in filters.items():
             if filter_value:
                 q = q.filter(

@@ -0,0 +1,11 @@
+---
+features:
+  - |
+    Queries filtering on date fields are slow as they have to scan each row.
+    There are sometimes millions of rows to scan while only a few thousand
+    are necessary.
+    The following patch narrows the data to process by filtering more on
+    frame_model.begin as a first step, using a `BETWEEN` statement instead
+    of `>=`.
+    The BETWEEN statement requires indexes to be efficient; see
+    https://review.opendev.org/c/openstack/cloudkitty/+/865435/
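
The release note says the `BETWEEN` filter needs indexes to be efficient. As a rough illustration only, the sketch below uses the standard-library `sqlite3` module to compare the query plan for the bounded filter before and after an index exists on the begin column. The `frames` table, the `idx_frames_begin` index name, and the integer timestamps are hypothetical, and the planner of the database actually deployed may behave differently from SQLite's.

```python
import sqlite3

# Hypothetical stand-in table; integer hours replace real timestamps.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE frames (id INTEGER PRIMARY KEY, "begin" INTEGER, "end" INTEGER)'
)
conn.executemany(
    'INSERT INTO frames ("begin", "end") VALUES (?, ?)',
    [(hour, hour + 1) for hour in range(10000)],
)

query = (
    'SELECT id FROM frames '
    'WHERE "begin" BETWEEN ? AND ? AND "end" BETWEEN ? AND ?'
)
params = (500, 524, 500, 524)


def plan(connection):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


plan_without_index = plan(conn)  # no usable index yet: full table scan

# Hypothetical index name; the real indexes come from the linked change.
conn.execute('CREATE INDEX idx_frames_begin ON frames ("begin")')
plan_with_index = plan(conn)  # range search on the new index

matching = conn.execute(query, params).fetchall()

print(plan_without_index)
print(plan_with_index)
```

Without the index the bounded filter still has to visit every row, so the rewrite on its own does not narrow the work. The closed range on begin is what lets an index seek to the first matching frame and stop after the last one.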