ironic/releasenotes/notes/add-sqlite-db-retries-f493d5d7aa6db78b.yaml
Julia Kreger 091edb0631 Retry SQLite DB write failures due to locks
Adds a database retry decorator to capture and retry exceptions
rooted in SQLite locking. These locking errors are rooted in
the fact that essentially, we can only have one distinct writer
at a time. This writer becomes transaction oriented as well.

Unfortunately, with our green threads and API surface, we run into
cases where background operations (mainly periodic tasks) and API
surface transactions both need to operate against the DB. Because
we can't realistically give one task or another exclusive control
and access, we run into database locking errors.

So when we encounter a lock error, we retry.

Adds two additional configuration parameters to the database
configuration section, to allow this capability to be further
tuned, as file IO performance is *surely* a contributing factor
to our locking issues as we mostly see them with a loaded CI
system where other issues begin to crop up.

The new parameters are as follows:
* sqlite_retries, a boolean value allowing the retry logic
  to be disabled. This can largely be ignored, but is available
  as it was logical to include.
* sqlite_max_wait_for_retry, an integer value, default 30 seconds,
  controlling how long to keep retrying SQLite database operations
  which are failing due to a "database is locked" error.

The retry logic uses the tenacity library, and performs an
exponential backoff. Setting the amount of time to a very large
number is not advisable; as such, the default of 30 seconds was
deemed reasonable.
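
As a rough illustration of the retry-with-backoff idea described above,
here is a minimal, dependency-free sketch. It is not the code from this
change: the real implementation uses tenacity, whereas this sketch swaps
in a hand-rolled loop so it runs with only the standard library. The
decorator name, the module-level constants standing in for the two
``[database]`` options, and the ``first_delay`` and ``sleep`` parameters
are all hypothetical.

```python
import functools
import sqlite3
import time

# Hypothetical stand-ins for the two [database] options described above.
SQLITE_RETRIES = True
SQLITE_MAX_WAIT_FOR_RETRY = 30  # seconds


def retry_on_sqlite_lock(max_wait=SQLITE_MAX_WAIT_FOR_RETRY,
                         enabled=SQLITE_RETRIES,
                         first_delay=0.1,
                         sleep=time.sleep):
    """Retry a callable while SQLite reports "database is locked".

    Assumption: lock errors surface as sqlite3.OperationalError with
    that message. Any other error is re-raised immediately.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = first_delay
            waited = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except sqlite3.OperationalError as exc:
                    locked = "database is locked" in str(exc)
                    if not enabled or not locked or waited >= max_wait:
                        raise
                    # Never sleep past the overall wait budget.
                    pause = min(delay, max_wait - waited)
                    sleep(pause)
                    waited += pause
                    delay *= 2  # exponential backoff
        return wrapper
    return decorator
```

A write helper would then be wrapped as ``@retry_on_sqlite_lock()``.
Because the delay doubles on every attempt, most of a large budget is
spent in the last one or two sleeps, which is one way to see why a very
large maximum wait buys few additional attempts.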

Change-Id: Ifeb92e9f23a94f2d96bb495fe63a71df9865fef3
2023-07-18 13:14:45 +00:00

---
fixes:
  - |
    Adds a database write retry decorator for SQLite failures reporting
    "database is locked". By default, through the new configuration
    parameter ``[database]sqlite_max_wait_for_retry``, retries will
    be performed on failing write operations for up to *30* seconds.
    This value can be tuned, but be warned it is an exponential
    backoff retry model, and HTTP requests can give up if no
    response is received in a reasonable time, thus *30* seconds
    was deemed a reasonable default.
    The retry logic can be disabled using the
    ``[database]sqlite_retries`` option, which defaults to
    *True*. Users of other, multi-threaded/concurrent-write database
    platforms are not impacted by this change, as the retry logic
    recognizes if another database is in use and bypasses the retry
    logic in that case. A similar retry logic concept already exists
    with other databases in the form of a "Database Deadlock" retry
    where two writers conflict on the same row or table. The database
    abstraction layer already handles such deadlock conditions.
    The SQLite locking issue is unfortunately more common because
    SQLite uses file-based write locking: the entire file, in other
    words the entire database, must be locked to perform a write
    operation.