From b9058927687806e6ea19f2864b0b4d8f332e08de Mon Sep 17 00:00:00 2001 From: Nick Jones Date: Thu, 7 Jun 2018 17:26:52 +0100 Subject: [PATCH] Spec: Kolla-managed backup (and restore) for MariaDB This mini-spec documents an approach for how Kolla-triggered and managed MariaDB backups might be handled. Change-Id: I65f1ab92b9dce48cdd752fffd2123bd58d65b98f bp: https://blueprints.launchpad.net/kolla/+spec/database-backup-recovery --- specs/mariadb-backup-recovery.rst | 201 ++++++++++++++++++++++++++++++ 1 file changed, 201 insertions(+) create mode 100644 specs/mariadb-backup-recovery.rst diff --git a/specs/mariadb-backup-recovery.rst b/specs/mariadb-backup-recovery.rst new file mode 100644 index 0000000000..9d559ac0f8 --- /dev/null +++ b/specs/mariadb-backup-recovery.rst @@ -0,0 +1,201 @@ +=========================== +MariaDB Backup and Recovery +=========================== + +Existing BP: https://blueprints.launchpad.net/kolla/+spec/database-backup-recovery + +This blueprint attempts to outline the introduction of backup and recovery +features in Kolla, for data hosted in MariaDB. It aims to do so by +introducing tooling and options that are proven in deployments elsewhere, and +with a degree of flexibility which facilitates integration with existing +solutions. + +Problem description +=================== + +Kolla currently lacks an easy way for an operator to be able to take a backup +of some or all of their MariaDB databases. Unrecoverable loss of data hosted +in MariaDB can have disastrous consequences, so a feature which eases the +introduction of a sensible backup and restore routine into an OpenStack +operator's life is a worthwhile endeavour. + +As backups are no use unless you can restore them, this solution should also +include a feature - or at the very least, a documented set of steps - for an +operator to be able to easily perform a restore and test the validity of their +data. + +As stated in the BP, general backup strategy should be considered out-of-scope +as it's likely that each individual or organisation deploying OpenStack will +have their own opinion on what should be done and the frequency with which +these things should be performed. However, Kolla should at least offer a way +to expose the necessary mechanisms to facilitate existing strategies. + +Use cases +--------- + +- As an operator, I wish to make an ad-hoc (on demand) backup of some or all + MariaDB databases, prior to making any manual changes; + +- As an operator, I'd like to include my MariaDB database(s) in the scope of my + regularly scheduled backups, and would like to be able to do so via Kolla; + +- As an operator, I want to be able to restore my database(s) to a particular + point in time following a failed upgrade or a stray manual query. + +For the first two use-cases, full and incremental backup options should be +available. + +Proposed change +=============== + +There are several considerations as part of this proposed change. There's the +tooling necessary to perform a backup, the ability to schedule backups, and the +requirement to transfer the data elsewhere. + +Backup Tooling +-------------- + +The linked Blueprint linked mentions the fact that there are several tools +available which facilitate MariaDB backup (and restore). The most common is +`mysqldump`, as this is included as standard with every installation and can +be used to take a consistent backup of some or all databases. However, taking +a backup with this tool has some limitations, chief amongst which is that it +can have a significant performance impact when taking backups in a way that +doesn't lock the database for the duration. + +Instead, this proposed change will make use of Percona's XtraBackup tool, which +has been designed specifically for 'hot-backups' avoiding locking and heavy +performance impact. Because of the way XtraBackup functions, it also +facilitates a simpler test / restore procedure as these are physical copies of +the underlying database files, meaning a new instance of MariaDB can be spun up +against these in order to test. + +Percona provides pre-packaged binaries for this tool via their own mirrors in +all of the major distributions supported by Kolla. + +To implement this, this change will introduce a new Kolla container image +hosting the XtraBackup binary plus dependencies necessary to be able to +connect to MariaDB and retrieve data from some or all of the databases. + +A Kolla-Ansible role will be created which will define tasks to: + +* Establish the necessary backup-specific credentials; + +* Start a container from this image with an associated volume and perform a + full backup if no previous data exists, or an incremental backup if there is + existing data. See below for a suggested default schedule. + +The backup data will reside in a dedicated Docker volume. This can then be +used to facilitate transfer of the data elsewhere (i.e mounted by another +container with the tooling necessary to encrypt and upload) or be exported to +another host for testing. + +Backups will be performed by default locally, that is on the node currently +running MariaDB, or on the designated master in a Galera cluster. However, it +should be possible to nominate any node which has access to either the internal +API address or the database node directly. It's up to the operator to choose +which mode is best for them, as there are a number of different considerations +and trade-offs to make. A new configuration option should be introduced to +facilitate selection from a member of the MariaDB group. + +Scheduling +---------- + +Automatic scheduling of backups should be disabled by default, but Kolla could +provide a mechanism to facilitate this if it's an operational requirement. + +The approach described above doesn't introduce a new, persist container. +Instead, it would be one which runs on demand and produces the target backup +files. + +Scheduling could be handled by changing this approach so that the container +runs in perpetuity, with the localised backup scripts being triggered by cron +according to a suggested (but configurable) default schedule. A proposed +schedule would be: + + * A full backup every 24 hours; + * An incremental backup every hour; + * Full backups are retained for two weeks; + * Incremental backups are retained for 24 hours. + +Alternatively, backups could be triggered by another container or service +running on the host. + +Archival +-------- + +Another tool is required to manage the backup lifecycle. This is currently +considered out of scope. + +Backup Restore +-------------- + +By targeting a discrete Docker volume for the data that's been backed up, +facilitating a restore is relatively straightforward. Automating this is +currently out of scope, but this piece of work should include an example +procedure for how to handle this volume and access the data that's been backed +up. + +Security impact +--------------- + +Implementation of this BP will require the introduction of a dedicated backup +role within MariaDB in order to give the tooling the necessary access. This +will be read-only in nature and restricted to these specific privileges: + +``SELECT,RELOAD,LOCK TABLES,SHOW VIEW,REPLICATION CLIENT`` + +Performance Impact +------------------ + +It's possible that there might be some performance degradation whilst taking a +backup of a database node which has a significant amount of data, especially if +the backup target device is the same as the source. + +Aside from degradation incurred by way of I/O contention, the selection of +XtraBackup is an attempt at mitigating any kind of performance impact. + +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + +Nick Jones (yankcrime) + +Work Items +---------- + +1. Introduce a new Kolla image containing XtraBackup package plus dependencies + such as scripts to handle triggering the backup; + +2. Introduce a new Kolla-Ansible command and corresponding role to take a + backup using a container launched from this image, saving data to a + dedicated volume; + +3. Documentation for new options and also restore process, along with examples. + +Testing +======= + +Tests should be added to validate that a backup has been taken successfully +with the default settings in place. This would take the form of starting +another MariaDB container with the backup volume mounted as ``/var/lib/mysql`` +and then performing some example queries to ensure expected data is returned. + +Documentation Impact +==================== + +Kolla and Kolla-Ansible documentation will need updating to introduce the new +backup features and the various options that are available. + +A dedicated and comprehensive section should be provide for restores, along +with example scenarios. + +References +========== +[1] https://blueprints.launchpad.net/kolla/+spec/database-backup-recovery +[2] https://etherpad.openstack.org/p/kolla-rocky-ptg-db-backup-restore +[3] https://www.percona.com/doc/percona-xtrabackup/LATEST/index.html