Propose reprocessing API

This specification proposes an API to enable reprocessing. Change-Id: I22993261010e6e7f8b5c0605efad9d4c0ec92d6c
2021-05-13 11:04:53 -03:00 · 2021-05-13 11:04:53 -03:00 · 3848c30c09
commit 3848c30c09
parent 2b1fbc5657
2 changed files with 216 additions and 0 deletions
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@ -4,6 +4,16 @@
 Cloudkitty Project Specifications
 =================================

+Yoga
+=======
+
+.. toctree::
+   :glob:
+   :maxdepth: 1
+
+   specs/yoga/*
+
+
 Wallaby
 =======

--- a/specs/yoga/reprocessing_api.rst
+++ b/specs/yoga/reprocessing_api.rst
@ -0,0 +1,206 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+===============================================================
+Introduce reprocessing API
+===============================================================
+
+This specification proposes an API to enable reprocessing.
+
+
+Problem Description
+===================
+
+Sometimes due to a misconfiguration in rating rules, or due to
+backfill of data in the collector backend (Gnocchi, Prometheus,
+or others), operators need to reprocess CloudKitty data. The
+reprocessing process that is normally taken is the following.
+
+- Stop all CloudKitty processors,
+- reset the `state` (last processed/rated timestamp) in MySQL database for all
+  scopes that need reprocessing,
+- clean/delete all processed/rated data for the affected timeframe in the
+  processed data backend (e.g. InfluxDB),
+- restart CloudKitty processors.
+
+CloudKitty has an API already to reset the processing/rating
+state of a scope [1]_. However, this API only resets the state
+in the MySQL database. Therefore, it is important to have
+an API that implements the manual steps that operators are
+normally executing when reprocessing is required.
+
+
+Proposed Change
+===============
+
+To avoid human error during reprocessing, and also to make
+it easier for non-CloudKitty experts to execute it, we propose
+the introduction of a reprocessing API that schedules the
+reprocessing and also removes the processed/rated data for the
+affected timeframe from the backend (e.g. InfluxDB).
+
+
+To achieve that, we will use a new table to store the reprocessing
+progress for scopes, the proposed table name is ``storage_scope_reprocessing_schedule``.
+
+.. code-block:: SQL
+
+  +------------------------------+--------------+------+-----+-------------------+----------------+
+  | Field                        | Type         | Null | Key | Default           | Extra          |
+  +------------------------------+--------------+------+-----+-------------------+----------------+
+  | id                           | int(11)      | NO   | PRI | NULL              | auto_increment |
+  | identifier                   | varchar(256) | NO   | MUL | NULL              | FK             |
+  | start_reprocess_time         | datetime     | NO   |     | -                 |                |
+  | end_reprocess_time           | datetime     | NO   |     | -                 |                |
+  | current_reprocess_time       | datetime     | yes  |     | NULL              |                |
+  | reason                       | text         | NO   |     | NULL              |                |
+  +------------------------------+--------------+------+-----+-------------------+----------------+
+
+The endpoint for the schedule reprocessing API will be:
+
+- ``POST`` ``/v2/task/reprocesses``
+
+  - ``scope_id``: a list of scope IDs to schedule reprocessing to.
+  - ``start_reprocess_time``: a timestamp to start reprocessing.
+  - ``end_reprocess_time``: a timestamp to end reprocessing.
+  - ``reason``: the reason for the reprocessing
+
+- ``GET`` ``/v2/task/reprocesses/<scope_id>`` -- to retrieve the
+  reprocessing schedules for a single scope.
+
+- ``GET`` ``/v2/task/reprocesses`` - to retrieve the
+  reprocessing schedules for multiple scopes
+
+
+The ``/v2/task/reprocesses`` endpoint will receive a POST request similar to
+the following:
+
+.. code-block:: python
+
+    {
+      "scope_id": ["scope_id_one", "scope_id_two", ...],
+      "start_reprocess_time": "2021-05-01 00:00:00",
+      "end_reprocess_time": "2021-05-30 23:00:00"
+      "reason": "The reason why this reprocessing is scheduled."
+    }
+
+The reason field is mandatory to force users to explain why the
+reprocessing is scheduled. We are forcing users to register why
+they are taking such drastic measures, such as reprocessing.
+Therefore, we hold a history of the scheduled reprocessing, and
+their respective reasons/explanations.
+
+If one of the scope IDs informed via ``scope_id`` does not exist,
+we will raise an exception telling the operator that there are
+invalid scopes ID in the request.
+
+One can only schedule reprocessing for timestamps of scopes that
+have already been processed. Therefore, if the `end_reprocess_time`
+is after the latest processing timestamp for a given resource, an
+error is thrown in the API.
+
+One cannot schedule overlapping reprocessing. Therefore, it is
+only possible to reprocess a reprocessing of a scope, when the
+previously scheduled reprocessing is finished (if there is an
+overlapping of start/end timestamps).
+
+After the schedule has been created, the CloudKitty processors
+will clean the affected time range for the given resource, and
+then move on with the reprocessing. Every time a timestamp is
+reprocessed, the ``current_reprocess_time`` is updated, similarly
+to the current processing workflow.
+
+When the ``current_reprocess_time`` equals to ``end_reprocess_time``,
+it means that the scheduled reprocessing has finished. Therefore,
+The reprocessing workers will do nothing for this entry.
+
+We will also introduce an endpoint to consult the status of the
+scheduled reprocessing.
+
+- To list all schedules: ``GET`` ``/v2/task/reprocesses``
+
+  - ``scope_id``: optional field to filter scopes that
+    have been scheduled for reprocessing. If no scope
+    is provided, we list all.
+
+
+Alternatives
+------------
+
+Manually execute the reprocessing steps previously described.
+
+
+Data model impact
+-----------------
+
+A new table (``storage_scope_reprocessing_schedule``) will be
+added to the database.
+
+
+REST API impact
+---------------
+
+A new API will be introduced: ``/v2/task/reprocesses``.
+
+Security impact
+---------------
+
+Only admins must have access to this new API
+
+Notifications Impact
+--------------------
+None
+
+Other end user impact
+---------------------
+
+None
+
+Performance Impact
+------------------
+None
+
+Other deployer impact
+---------------------
+None
+
+Developer impact
+----------------
+None
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+ - Rafael <rafael@apache.org>
+
+
+Work Items
+----------
+
+1) propose, discuss, and merge the spec
+2) execute the implementations as described.
+3) implement changes in the CloudKitty client to support the new API
+
+Dependencies
+============
+None
+
+Testing
+=======
+Unit tested
+
+Documentation Impact
+====================
+None
+
+References
+==========
+
+.. [1] https://docs.openstack.org/cloudkitty/latest/api-reference/v2/index.html?expanded=get-a-rating-summary-detail,reset-the-status-of-several-scopes-detail#reset-the-status-of-several-scopes