Spec: Kolla-managed backup (and restore) for MariaDB

This mini-spec documents an approach for how Kolla-triggered and managed
MariaDB backups might be handled.

Change-Id: I65f1ab92b9dce48cdd752fffd2123bd58d65b98f
bp: https://blueprints.launchpad.net/kolla/+spec/database-backup-recovery
Nick Jones 5 years ago committed by Nick Jones
parent 3d4e284e13
commit b905892768

@@ -0,0 +1,201 @@
MariaDB Backup and Recovery
Existing BP: https://blueprints.launchpad.net/kolla/+spec/database-backup-recovery
This blueprint attempts to outline the introduction of backup and recovery
features in Kolla, for data hosted in MariaDB. It aims to do so by
introducing tooling and options that are proven in deployments elsewhere, and
with a degree of flexibility which facilitates integration with existing
Problem description
Kolla currently lacks an easy way for an operator to be able to take a backup
of some or all of their MariaDB databases. Unrecoverable loss of data hosted
in MariaDB can have disastrous consequences, so a feature which eases the
introduction of a sensible backup and restore routine into an OpenStack
operator's life is a worthwhile endeavour.
As backups are no use unless you can restore them, this solution should also
include a feature - or at the very least, a documented set of steps - for an
operator to be able to easily perform a restore and test the validity of their
backups.
As stated in the BP, general backup strategy should be considered out-of-scope
as it's likely that each individual or organisation deploying OpenStack will
have their own opinion on what should be done and the frequency with which
these things should be performed. However, Kolla should at least offer a way
to expose the necessary mechanisms to facilitate existing strategies.
Use cases
- As an operator, I wish to make an ad-hoc (on demand) backup of some or all
MariaDB databases, prior to making any manual changes;
- As an operator, I'd like to include my MariaDB database(s) in the scope of my
regularly scheduled backups, and would like to be able to do so via Kolla;
- As an operator, I want to be able to restore my database(s) to a particular
point in time following a failed upgrade or a stray manual query.
For the first two use-cases, full and incremental backup options should be
available.
Proposed change
There are several considerations as part of this proposed change. There's the
tooling necessary to perform a backup, the ability to schedule backups, and the
requirement to transfer the data elsewhere.
Backup Tooling
The linked Blueprint mentions the fact that there are several tools
available which facilitate MariaDB backup (and restore). The most common is
`mysqldump`, as this is included as standard with every installation and can
be used to take a consistent backup of some or all databases. However, taking
a backup with this tool has some limitations, chief amongst which is that it
can have a significant performance impact when taking backups in a way that
doesn't lock the database for the duration.
Instead, this proposed change will make use of Percona's XtraBackup tool, which
has been designed specifically for 'hot backups' that avoid locking and heavy
performance impact. Because of the way XtraBackup functions, it also
facilitates a simpler test / restore procedure as these are physical copies of
the underlying database files, meaning a new instance of MariaDB can be spun up
against these in order to test.
Percona provides pre-packaged binaries for this tool via their own mirrors for
all of the major distributions supported by Kolla.
To implement this, this change will introduce a new Kolla container image
hosting the XtraBackup binary plus dependencies necessary to be able to
connect to MariaDB and retrieve data from some or all of the databases.
A Kolla-Ansible role will be created which will define tasks to:
* Establish the necessary backup-specific credentials;
* Start a container from this image with an associated volume and perform a
full backup if no previous data exists, or an incremental backup if there is
existing data. See below for a suggested default schedule.
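The full-versus-incremental decision described above could be sketched as
follows. This is a minimal illustration rather than the role's actual
implementation: it assumes each backup is written to its own sub-directory of
the volume, named with a sortable timestamp, and that credentials other than
the user name come from an option file. The function names are hypothetical;
the ``xtrabackup`` options used (``--backup``, ``--user``, ``--target-dir``,
``--incremental-basedir``) are the tool's standard ones.

```python
from pathlib import Path
from typing import List, Optional


def latest_backup(backup_root: Path) -> Optional[Path]:
    """Return the newest backup directory, or None if no backup exists yet.

    Assumes directory names sort chronologically (e.g. ISO-style timestamps).
    """
    if not backup_root.is_dir():
        return None
    candidates = sorted(p for p in backup_root.iterdir() if p.is_dir())
    return candidates[-1] if candidates else None


def build_backup_command(backup_root: Path, name: str, user: str) -> List[str]:
    """Build an xtrabackup command line for the next backup.

    A full backup is requested when the volume is empty; otherwise the newest
    existing backup is used as the base for an incremental one.
    """
    target = backup_root / name
    cmd = ["xtrabackup", "--backup", f"--user={user}", f"--target-dir={target}"]
    base = latest_backup(backup_root)
    if base is not None:
        cmd.append(f"--incremental-basedir={base}")
    return cmd
```

The returned list can be passed to ``subprocess.run`` inside the backup
container; building it separately keeps the decision logic easy to test.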
The backup data will reside in a dedicated Docker volume. This can then be
used to facilitate transfer of the data elsewhere (e.g. mounted by another
container with the tooling necessary to encrypt and upload) or be exported to
another host for testing.
Backups will be performed by default locally, that is on the node currently
running MariaDB, or on the designated master in a Galera cluster. However, it
should be possible to nominate any node which has access to either the internal
API address or the database node directly. It's up to the operator to choose
which mode is best for them, as there are a number of different considerations
and trade-offs to make. A new configuration option should be introduced to
facilitate selection of a member of the MariaDB group.
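As an illustration only, selection of the backup node might behave like the
sketch below. The function name and the idea of passing the option value as
``configured`` are hypothetical, and treating the first group member as the
default (local or designated master) node is an assumption made for the
example, not something this spec mandates.

```python
from typing import List, Optional


def select_backup_host(mariadb_group: List[str],
                       configured: Optional[str] = None) -> str:
    """Pick the node on which the backup container should run.

    Falls back to the first member of the group when the operator has not
    nominated a host, and rejects hosts outside the MariaDB group.
    """
    if not mariadb_group:
        raise ValueError("the MariaDB group is empty")
    if configured is None:
        return mariadb_group[0]
    if configured not in mariadb_group:
        raise ValueError(f"{configured!r} is not a member of the MariaDB group")
    return configured
```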
Automatic scheduling of backups should be disabled by default, but Kolla could
provide a mechanism to facilitate this if it's an operational requirement.
The approach described above doesn't introduce a new, persistent container.
Instead, it would be one which runs on demand and produces the target backup
Scheduling could be handled by changing this approach so that the container
runs in perpetuity, with the localised backup scripts being triggered by cron
according to a suggested (but configurable) default schedule. A proposed
schedule would be:
* A full backup every 24 hours;
* An incremental backup every hour;
* Full backups are retained for two weeks;
* Incremental backups are retained for 24 hours.
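The proposed schedule and retention periods can be expressed as a small
sketch. The data shapes and function names here are hypothetical, and running
the full backup at hour 0 is an assumption chosen purely for illustration;
only the intervals and retention periods come from the list above.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Retention periods taken from the proposed default schedule.
RETENTION: Dict[str, timedelta] = {
    "full": timedelta(weeks=2),
    "incremental": timedelta(hours=24),
}

Backup = Tuple[str, datetime]  # (kind, time the backup was taken)


def backup_kind_due(now: datetime, full_hour: int = 0) -> str:
    """Return the kind of backup an hourly trigger should take at `now`.

    One full backup per day at `full_hour` (an assumed default), and an
    incremental backup at every other hour.
    """
    return "full" if now.hour == full_hour else "incremental"


def expired(backups: List[Backup], now: datetime) -> List[Backup]:
    """Return the backups that have outlived their retention period."""
    return [(kind, taken) for kind, taken in backups
            if now - taken > RETENTION[kind]]
```

With these defaults an incremental backup always expires before the full
backup it is based on, so pruning never leaves an incremental without its
base.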
Alternatively, backups could be triggered by another container or service
running on the host.
Another tool is required to manage the backup lifecycle. This is currently
considered out of scope.
Backup Restore
By targeting a discrete Docker volume for the data that's been backed up,
facilitating a restore is relatively straightforward. Automating this is
currently out of scope, but this piece of work should include an example
procedure for how to handle this volume and access the data that's been backed
up.
Security impact
Implementation of this BP will require the introduction of a dedicated backup
role within MariaDB in order to give the tooling the necessary access. This
will be read-only in nature and restricted to these specific privileges:
Performance Impact
It's possible that there might be some performance degradation whilst taking a
backup of a database node which has a significant amount of data, especially if
the backup target device is the same as the source.
Aside from degradation incurred by way of I/O contention, the selection of
XtraBackup is an attempt at mitigating any kind of performance impact.
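One way an operator might detect the same-device case mentioned above is to
compare the device IDs of the two directories. This is a sketch under the
assumption that both paths are visible from wherever the check runs; the
function name is hypothetical.

```python
import os


def shares_device(source_dir: str, target_dir: str) -> bool:
    """Return True if both directories live on the same block device.

    Compares the st_dev field reported by os.stat for each path.
    """
    return os.stat(source_dir).st_dev == os.stat(target_dir).st_dev
```

A ``True`` result suggests that backup writes will compete with the database
for I/O on the same device.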
Primary assignee:
Nick Jones (yankcrime)
Work Items
1. Introduce a new Kolla image containing the XtraBackup package plus dependencies
such as scripts to handle triggering the backup;
2. Introduce a new Kolla-Ansible command and corresponding role to take a
backup using a container launched from this image, saving data to a
dedicated volume;
3. Document the new options and the restore process, along with examples.
Tests should be added to validate that a backup has been taken successfully
with the default settings in place. This would take the form of starting
another MariaDB container with the backup volume mounted as ``/var/lib/mysql``
and then performing some example queries to ensure expected data is returned.
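A test harness for this could start the throwaway container along the lines
of the sketch below. The container name, and the volume and image names passed
in, are hypothetical placeholders; the ``docker run`` options used are
standard ones. It assumes the volume already holds a prepared, restorable data
directory, since XtraBackup output needs its prepare step applied before
MariaDB can start against it.

```python
from typing import List, Sequence, Tuple


def validation_container_command(volume: str, image: str,
                                 name: str = "mariadb_backup_check") -> List[str]:
    """Build a docker command that starts MariaDB against the backup volume."""
    return [
        "docker", "run", "--detach", "--rm",
        "--name", name,
        "--volume", f"{volume}:/var/lib/mysql",
        image,
    ]


def rows_match(actual: Sequence[Tuple], expected: Sequence[Tuple]) -> bool:
    """Compare query results with the expected rows, ignoring row order."""
    return sorted(actual) == sorted(expected)
```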
Documentation Impact
Kolla and Kolla-Ansible documentation will need updating to introduce the new
backup features and the various options that are available.
A dedicated and comprehensive section should be provided for restores, along
with example scenarios.
[1] https://blueprints.launchpad.net/kolla/+spec/database-backup-recovery
[2] https://etherpad.openstack.org/p/kolla-rocky-ptg-db-backup-restore
[3] https://www.percona.com/doc/percona-xtrabackup/LATEST/index.html