Spec: Kolla-managed backup (and restore) for MariaDB
This mini-spec documents an approach for how Kolla-triggered and managed MariaDB backups might be handled. Change-Id: I65f1ab92b9dce48cdd752fffd2123bd58d65b98f bp: https://blueprints.launchpad.net/kolla/+spec/database-backup-recovery
This commit is contained in:
parent
3d4e284e13
commit
b905892768
201
specs/mariadb-backup-recovery.rst
Normal file
201
specs/mariadb-backup-recovery.rst
Normal file
@ -0,0 +1,201 @@
|
||||
===========================
|
||||
MariaDB Backup and Recovery
|
||||
===========================
|
||||
|
||||
Existing BP: https://blueprints.launchpad.net/kolla/+spec/database-backup-recovery
|
||||
|
||||
This blueprint attempts to outline the introduction of backup and recovery
|
||||
features in Kolla, for data hosted in MariaDB. It aims to do so by
|
||||
introducing tooling and options that are proven in deployments elsewhere, and
|
||||
with a degree of flexibility which facilitates integration with existing
|
||||
solutions.
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
Kolla currently lacks an easy way for an operator to be able to take a backup
|
||||
of some or all of their MariaDB databases. Unrecoverable loss of data hosted
|
||||
in MariaDB can have disastrous consequences, so a feature which eases the
|
||||
introduction of a sensible backup and restore routine into an OpenStack
|
||||
operator's life is a worthwhile endeavour.
|
||||
|
||||
As backups are no use unless you can restore them, this solution should also
|
||||
include a feature - or at the very least, a documented set of steps - for an
|
||||
operator to be able to easily perform a restore and test the validity of their
|
||||
data.
|
||||
|
||||
As stated in the BP, general backup strategy should be considered out-of-scope
|
||||
as it's likely that each individual or organisation deploying OpenStack will
|
||||
have their own opinion on what should be done and the frequency with which
|
||||
these things should be performed. However, Kolla should at least offer a way
|
||||
to expose the necessary mechanisms to facilitate existing strategies.
|
||||
|
||||
Use cases
|
||||
---------
|
||||
|
||||
- As an operator, I wish to make an ad-hoc (on demand) backup of some or all
|
||||
MariaDB databases, prior to making any manual changes;
|
||||
|
||||
- As an operator, I'd like to include my MariaDB database(s) in the scope of my
|
||||
regularly scheduled backups, and would like to be able to do so via Kolla;
|
||||
|
||||
- As an operator, I want to be able to restore my database(s) to a particular
|
||||
point in time following a failed upgrade or a stray manual query.
|
||||
|
||||
For the first two use-cases, full and incremental backup options should be
|
||||
available.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
There are several considerations as part of this proposed change. There's the
|
||||
tooling necessary to perform a backup, the ability to schedule backups, and the
|
||||
requirement to transfer the data elsewhere.
|
||||
|
||||
Backup Tooling
|
||||
--------------
|
||||
|
||||
The linked Blueprint linked mentions the fact that there are several tools
|
||||
available which facilitate MariaDB backup (and restore). The most common is
|
||||
`mysqldump`, as this is included as standard with every installation and can
|
||||
be used to take a consistent backup of some or all databases. However, taking
|
||||
a backup with this tool has some limitations, chief amongst which is that it
|
||||
can have a significant performance impact when taking backups in a way that
|
||||
doesn't lock the database for the duration.
|
||||
|
||||
Instead, this proposed change will make use of Percona's XtraBackup tool, which
|
||||
has been designed specifically for 'hot-backups' avoiding locking and heavy
|
||||
performance impact. Because of the way XtraBackup functions, it also
|
||||
facilitates a simpler test / restore procedure as these are physical copies of
|
||||
the underlying database files, meaning a new instance of MariaDB can be spun up
|
||||
against these in order to test.
|
||||
|
||||
Percona provides pre-packaged binaries for this tool via their own mirrors in
|
||||
all of the major distributions supported by Kolla.
|
||||
|
||||
To implement this, this change will introduce a new Kolla container image
|
||||
hosting the XtraBackup binary plus dependencies necessary to be able to
|
||||
connect to MariaDB and retrieve data from some or all of the databases.
|
||||
|
||||
A Kolla-Ansible role will be created which will define tasks to:
|
||||
|
||||
* Establish the necessary backup-specific credentials;
|
||||
|
||||
* Start a container from this image with an associated volume and perform a
|
||||
full backup if no previous data exists, or an incremental backup if there is
|
||||
existing data. See below for a suggested default schedule.
|
||||
|
||||
The backup data will reside in a dedicated Docker volume. This can then be
|
||||
used to facilitate transfer of the data elsewhere (i.e mounted by another
|
||||
container with the tooling necessary to encrypt and upload) or be exported to
|
||||
another host for testing.
|
||||
|
||||
Backups will be performed by default locally, that is on the node currently
|
||||
running MariaDB, or on the designated master in a Galera cluster. However, it
|
||||
should be possible to nominate any node which has access to either the internal
|
||||
API address or the database node directly. It's up to the operator to choose
|
||||
which mode is best for them, as there are a number of different considerations
|
||||
and trade-offs to make. A new configuration option should be introduced to
|
||||
facilitate selection from a member of the MariaDB group.
|
||||
|
||||
Scheduling
|
||||
----------
|
||||
|
||||
Automatic scheduling of backups should be disabled by default, but Kolla could
|
||||
provide a mechanism to facilitate this if it's an operational requirement.
|
||||
|
||||
The approach described above doesn't introduce a new, persist container.
|
||||
Instead, it would be one which runs on demand and produces the target backup
|
||||
files.
|
||||
|
||||
Scheduling could be handled by changing this approach so that the container
|
||||
runs in perpetuity, with the localised backup scripts being triggered by cron
|
||||
according to a suggested (but configurable) default schedule. A proposed
|
||||
schedule would be:
|
||||
|
||||
* A full backup every 24 hours;
|
||||
* An incremental backup every hour;
|
||||
* Full backups are retained for two weeks;
|
||||
* Incremental backups are retained for 24 hours.
|
||||
|
||||
Alternatively, backups could be triggered by another container or service
|
||||
running on the host.
|
||||
|
||||
Archival
|
||||
--------
|
||||
|
||||
Another tool is required to manage the backup lifecycle. This is currently
|
||||
considered out of scope.
|
||||
|
||||
Backup Restore
|
||||
--------------
|
||||
|
||||
By targeting a discrete Docker volume for the data that's been backed up,
|
||||
facilitating a restore is relatively straightforward. Automating this is
|
||||
currently out of scope, but this piece of work should include an example
|
||||
procedure for how to handle this volume and access the data that's been backed
|
||||
up.
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
Implementation of this BP will require the introduction of a dedicated backup
|
||||
role within MariaDB in order to give the tooling the necessary access. This
|
||||
will be read-only in nature and restricted to these specific privileges:
|
||||
|
||||
``SELECT,RELOAD,LOCK TABLES,SHOW VIEW,REPLICATION CLIENT``
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
It's possible that there might be some performance degradation whilst taking a
|
||||
backup of a database node which has a significant amount of data, especially if
|
||||
the backup target device is the same as the source.
|
||||
|
||||
Aside from degradation incurred by way of I/O contention, the selection of
|
||||
XtraBackup is an attempt at mitigating any kind of performance impact.
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
|
||||
Nick Jones (yankcrime)
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
1. Introduce a new Kolla image containing XtraBackup package plus dependencies
|
||||
such as scripts to handle triggering the backup;
|
||||
|
||||
2. Introduce a new Kolla-Ansible command and corresponding role to take a
|
||||
backup using a container launched from this image, saving data to a
|
||||
dedicated volume;
|
||||
|
||||
3. Documentation for new options and also restore process, along with examples.
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
Tests should be added to validate that a backup has been taken successfully
|
||||
with the default settings in place. This would take the form of starting
|
||||
another MariaDB container with the backup volume mounted as ``/var/lib/mysql``
|
||||
and then performing some example queries to ensure expected data is returned.
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
Kolla and Kolla-Ansible documentation will need updating to introduce the new
|
||||
backup features and the various options that are available.
|
||||
|
||||
A dedicated and comprehensive section should be provide for restores, along
|
||||
with example scenarios.
|
||||
|
||||
References
|
||||
==========
|
||||
[1] https://blueprints.launchpad.net/kolla/+spec/database-backup-recovery
|
||||
[2] https://etherpad.openstack.org/p/kolla-rocky-ptg-db-backup-restore
|
||||
[3] https://www.percona.com/doc/percona-xtrabackup/LATEST/index.html
|
Loading…
Reference in New Issue
Block a user