doc: update backup instructions

Update the backup instructions for some recent changes.  Make a note
of the streaming backup method, discuss some caveats with append-only
mode, and discuss the pruning scripts and when to run them
(cf. I9559bb8aeeef06b95fb9e172a2c5bfb5be5b480e,
I250d84c4a9f707e63fef6f70cfdcc1fb7807d3a7).

Change-Id: Idb04ebfa5666cd3c20bc0132683d187e705da3f1
Ian Wienand 2021-02-09 12:15:24 +11:00
parent 62801d8a93
commit 116a2ca4a4

@@ -240,6 +240,31 @@ individual host to be backed up. The host to be backed up initiates
the backup process to the remote backup server(s) using a separate ssh
key setup just for backup communication (see ``/root/.ssh/config``).

Setting up hosts for backup
---------------------------

To set up a host for backup, put it in the ``borg-backup`` group.
Hosts can specify ``borg_backup_excludes_extra`` and
``borg_backup_dirs_extra`` to exclude or include specific directories
as required (see the role documentation for more details).

``borg`` splits backup data into chunks and de-duplicates as much as
possible.  For backing up large items, particularly things like
database dumps, we want to give ``borg`` as much chance to
de-duplicate as possible.  Approaches such as dumping to compressed
files on disk defeat de-duplication, because the compressed output
changes completely with each dump.

For dumping large data, hosts should put a file into
``/etc/borg-streams`` that performs the dump, uncompressed, to
stdout.  The backup scripts will create a separate archive for each
stream defined here.  For more details, see the ``backup`` role
documentation.  These streams should be as friendly to de-duplication
as possible; see the existing ``mysqldump`` examples for arguments
that help keep the output data stable between runs (and hence more
easily de-duplicated).
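
As a hedged illustration only (the stream name and the exact
``mysqldump`` arguments here are assumptions; check the ``backup``
role documentation and existing hosts for the real conventions), a
stream script might look something like::

  #!/bin/bash
  # Hypothetical /etc/borg-streams/mysql: dump all databases,
  # uncompressed, to stdout so borg can chunk and de-duplicate it.
  # --skip-extended-insert writes one row per INSERT statement, which
  # keeps successive dumps more stable between runs.
  mysqldump --single-transaction --skip-extended-insert --all-databases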

Restore from Backup
-------------------
@@ -255,109 +280,32 @@ time is to
* ``sudo su -`` to switch to the backup user for the host to be restored
* you will now be in the home directory of that user
* run ``/opt/borg/bin/borg list ./backup`` to list the archives available
* these should look like ``<hostname>-<stream>-YYYY-MM-DDTHH:MM:SS``
* move to a working directory
* extract one of the appropriate archives with ``/opt/borg/bin/borg extract ~/backup <archive-tag>``
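
For example, to restore a hypothetical ``mysql`` stream archive into a
scratch directory (the hostname and archive tag below are made up; use
a name shown by ``borg list``)::

  cd $(mktemp -d)
  /opt/borg/bin/borg list ~/backup
  /opt/borg/bin/borg extract ~/backup review01-mysql-2021-02-09T03:04:05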

Managing backup storage
-----------------------

We run ``borg`` in append-only mode.  This means clients can not
remove old backups on the server.

The backup server keeps an active volume and the previously rotated
volume.  Each consists of 3 x 1TiB volumes grouped with LVM.  The
volumes are mounted at ``/opt/backups-YYYYMM``, named for the date
they were created; ``/opt/backups`` is a symlink to the latest volume.

Periodically we rotate the active volume for a fresh one.  Follow this
procedure:

#. Create the new volumes via API (on ``bridge.o.o``).  Create 3
   volumes, named for the server with the year and month added::

     DATE=$(date +%Y%m)
     OS_VOLUME_API_VERSION=1
     OS_CMD="./env/bin/openstack --os-cloud openstackci-rax --os-region-name ORD"
     SERVER="backup01.ord.rax.ci.openstack.org"
     ${OS_CMD} volume create --size 1024 ${SERVER}/main01-${DATE}
     ${OS_CMD} volume create --size 1024 ${SERVER}/main02-${DATE}
     ${OS_CMD} volume create --size 1024 ${SERVER}/main03-${DATE}

#. Attach the volumes to the backup server::

     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main01-${DATE}
     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main02-${DATE}
     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main03-${DATE}

#. Now on the backup server, create the new backup LVM volume (get the
   device names from ``dmesg`` when they were attached).  For
   simplicity we create a new volume group for each backup series, and
   a single logical volume on top::

     DATE=$(date +%Y%m)
     pvcreate /dev/xvd<DRIVE1> /dev/xvd<DRIVE2> /dev/xvd<DRIVE3>
     vgcreate main-${DATE} /dev/xvd<DRIVE1> /dev/xvd<DRIVE2> /dev/xvd<DRIVE3>
     lvcreate -l 100%FREE -n backups-${DATE} main-${DATE}
     mkfs.ext4 -m 0 -j -L "backups-${DATE}" /dev/main-${DATE}/backups-${DATE}
     tune2fs -i 0 -c 0 /dev/main-${DATE}/backups-${DATE}
     mkdir /opt/backups-${DATE}
     # manually add mount details to /etc/fstab
     mount /opt/backups-${DATE}

#. Making sure there are no backups currently running, you can now
   begin to switch the backups (you can stop the ssh service, but be
   careful not to then drop your connection and lock yourself out; you
   can always reboot via the API if you do).  Firstly, edit
   ``/etc/fstab`` and make the current (soon to be *old*) backup
   volume mount read-only.  Unmount the old volume and then remount it
   (now as read-only); this should prevent any accidental removal of
   the existing backups during the following steps (an example remount
   is sketched after this procedure).

#. Pre-seed the new backup directory (same terminal as above).  This
   will copy all the directories and authentication details (but none
   of the actual backups) and initialise for fresh backups::

     cd /opt/backups-${DATE}
     rsync -avz --exclude '.bup' /opt/backups/ .
     for dir in bup-*; do su $dir -c "BUP_DIR=/opt/backups-${DATE}/$dir/.bup bup init"; done

#. The ``/opt/backups`` symlink can now be switched to the new
   volume::

     ln -sfn /opt/backups-${DATE} /opt/backups

#. ssh can be re-enabled and the new backup volume is effectively
   active.

#. Now run a test backup from a server manually.  Choose one, get the
   backup command from its cron entry and run it manually in a
   ``screen`` session (it might take a while), ensuring everything
   seems to be writing correctly to the new volume (a sketch follows
   this procedure).

#. You can now clean up the oldest backups (the volume *before* the
   one you just rotated).  Remove the mount from ``/etc/fstab``,
   unmount the volume and clean up the LVM components::

     DATE=<INSERT OLD DATE CODE HERE>
     umount /opt/backups-${DATE}
     lvremove /dev/main-${DATE}/backups-${DATE}
     vgremove main-${DATE}
     # pvremove the volumes; they will show PFree at 1024.00g as
     # they are now not assigned to anything
     pvremove /dev/xvd<DRIVE1>
     pvremove /dev/xvd<DRIVE2>
     pvremove /dev/xvd<DRIVE3>

#. Remove the volumes via the API (the opposite of adding them above:
   ``server remove volume`` then ``volume delete``; see the sketch
   after this procedure).

#. Done!  Come back and rotate it again next year.
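
The read-only remount mentioned above might look like this (a hedged
sketch; ``YYYYMM`` is the date code of the soon-to-be-old volume)::

  # after adding the "ro" option to the volume's /etc/fstab entry:
  umount /opt/backups-YYYYMM
  mount /opt/backups-YYYYMM
  # or remount in place without unmounting:
  mount -o remount,ro /opt/backups-YYYYMM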
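
For the manual test backup, a hedged sketch (the exact cron entry and
backup command vary per host; use whatever the host's root crontab
shows)::

  # on the host being backed up
  sudo crontab -l | grep -i borg
  # run the command shown by cron inside a screen session
  sudo screen -S test-backup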
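
Removing the old volumes via the API might look like the following (a
hedged sketch, reusing the ``OS_CMD``, ``SERVER`` and old ``DATE``
values from above)::

  ${OS_CMD} server remove volume ${SERVER} ${SERVER}/main01-${DATE}
  ${OS_CMD} volume delete ${SERVER}/main01-${DATE}
  # ... and likewise for main02 and main03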

However, due to the way borg works, append-only mode plays all client
transactions into a transaction log until a read-write operation
occurs.  Examining the repository will make it appear as though these
transactions have been applied (e.g. pruned archives will not appear,
even though they have not actually been removed from disk).  If you
have reason to not trust the state of the backup, you should *not* run
any read-write operations.  You will need to manually examine the
transaction log and roll back to a known good state; see
`<https://borgbackup.readthedocs.io/en/stable/usage/notes.html#append-only-mode>`__.
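
As a hedged sketch of inspecting the transaction log (the repository
path is the per-host repository in the backup user's home directory,
as used in the restore section; see the borg documentation above for
the actual roll-back procedure)::

  # on the backup server, as the backup user for the affected host
  cd ~/backup
  cat transactions
  # each line shows a transaction id and UTC timestamp; do not run any
  # read-write borg operation before deciding where to roll back to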

However, we have limited backup space.  Each backup server has a
script ``/usr/local/bin/prune-borg-backups`` which can be run to
reclaim space.  For each archive, this keeps the last 7 days of
backups, then monthly backups for a year and yearly backups beyond
that.  The backup servers will send a warning when backup volume
usage is high, at which point this can be run manually.
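
For example, when the backup server warns that volume usage is high (a
hedged sketch; check the script itself for any arguments it expects,
and run it in a ``screen`` session as pruning can take a long time)::

  sudo /usr/local/bin/prune-borg-backups
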
.. _force-merging-a-change: