doc: update backup instructions

Update the backup instructions for some recent changes.  Make a note of
the streaming backup method, discuss some caveats with append-only mode,
and discuss the pruning scripts and when to run them (cf.
I9559bb8aeeef06b95fb9e172a2c5bfb5be5b480e,
I250d84c4a9f707e63fef6f70cfdcc1fb7807d3a7).

Change-Id: Idb04ebfa5666cd3c20bc0132683d187e705da3f1
2021-02-09 12:15:24 +11:00
commit 116a2ca4a4
parent 62801d8a93
1 changed file with 44 additions and 96 deletions
--- a/doc/source/sysadmin.rst
+++ b/doc/source/sysadmin.rst
@@ -240,6 +240,31 @@ individual host to be backed up.  The host to be backed up initiates
 the backup process to the remote backup server(s) using a separate ssh
 key setup just for backup communication (see ``/root/.ssh/config``).

+Setting up hosts for backup
+---------------------------
+
+To set up a host for backup, put it in the ``borg-backup`` group.
+
+Hosts can specify ``borg_backup_excludes_extra`` and
+``borg_backup_dirs_extra`` to exclude or include specific directories
+as required (see role documentation for more details).
+
+``borg`` splits backup data into chunks and de-duplicates as much as
+possible.  For backing up large items, particularly things like
+database dumps, we want to give ``borg`` as much chance to
+de-duplicate as possible.  Approaches such as dumping to compressed
+files on disk defeat de-duplication because all the data changes for
+each dump.
+
+For dumping large data, hosts should put a file into
+``/etc/borg-streams`` that performs the dump in an uncompressed manner
+to stdout.  The backup scripts will create a separate archive for each
+stream defined here.  For more details, see the ``backup`` role
+documentation.  These streams should attempt to be as friendly to
+de-duplication as possible; see some of the examples of ``mysqldump``
+to find arguments that help keep the output data more stable (and
+hence more easily de-duplicated).
+
 Restore from Backup
 -------------------

@@ -255,109 +280,32 @@ time is to
 * sudo ``su -`` to switch to the backup user for the host to be restored
 * you will now be in the home directory of that user
 * run ``/opt/borg/bin/borg list ./backup`` to list the archives available
-* these should look like ``hostname-YYYY-MM-DDTHH:MM:SS``
+* these should look like ``<hostname>-<stream>-YYYY-MM-DDTHH:MM:SS``
 * move to working directory
 * extract one of the appropriate archives with ``/opt/borg/bin/borg extract ~/backup <archive-tag>``

-
-Rotating backup storage
+Managing backup storage
 -----------------------

-We run ``borg`` in append-only mode, so that clients can not remove
-old backups on the server.
+We run ``borg`` in append-only mode.  This means clients cannot
+remove old backups on the server.

-TODO(ianw) : Write instructions on how to prune server side.  We
-should monitor growth to see if automatic pruning would be
-appropriate, or periodic manual pruning, or something similar to this
-existing system where we keep a historic archive and start fresh.
-
-The backup server keeps an active volume and the previously rotated
-volume.  Each consists of 3 x 1TiB volumes grouped with LVM.  The
-volumes are mounted at ``/opt/backups-YYYYMM`` for the date it was
-created; ``/opt/backups`` is a symlink to the latest volume.
-Periodically we rotate the active volume for a fresh one.  Follow this
-procedure:
-
-#. Create the new volumes via API (on ``bridge.o.o``).  Create 3
-   volumes, named for the server with the year and date added::
-
-     DATE=$(date +%Y%m)
-     OS_VOLUME_API_VERSION=1
-     OS_CMD="./env/bin/openstack --os-cloud-openstackci-rax --os-region=ORD"
-     SERVER="backup01.ord.rax.ci.openstack.org"
-     ${CMD} volume create --size 1024 ${SERVER}/main01-${DATE}
-     ${CMD} volume create --size 1024 ${SERVER}/main02-${DATE}
-     ${CMD} volume create --size 1024 ${SERVER}/main03-${DATE}
-
-#. Attach the volumes to the backup server::
-     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main01-${DATE}
-     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main02-${DATE}
-     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main03-${DATE}
-
-#. Now on the backup server, create the new backup LVM volume (get the
-   device names from ``dmesg`` when they were attached).  For
-   simplicity we create a new volume group for each backup series, and
-   a single logical volume ontop::
-
-     DATE=$(date +%Y%m)
-     pvcreate /dev/xvd<DRIVE1> /dev/xvd<DRIVE2> /dev/xvd<DRIVE3>
-     vgcreate main-${DATE} /dev/xvdX /dev/xvdY /dev/xvdZ
-     lvcreate -l 100%FREE -n backups-${DATE} main-${DATE}
-
-     mkfs.ext4 -m 0 -j -L "backups-${DATE}" /dev/main-${DATE}/backups-${DATE}
-     tune2fs -i 0 -c 0 /dev/main-${DATE}/backups-${DATE}
-
-     mkdir /opt/backups-${DATE}
-     # manually add mount details to /etc/fstab
-     mount /opt/backups-${DATE}
-
-#. Making sure there are no backups currently running you can now
-   begin to switch the backups (you can stop the ssh service, but be
-   careful not to then drop your connection and lock yourself out; you
-   can always reboot via the API if you do).  Firstly, edit
-   ``/etc/fstab`` and make the current (soon to be *old*) backup
-   volume mount read-only.  Unmount the old volume and then remount it
-   (now as read-only).  This should prevent any accidental removal of
-   the existing backups during the following procedures.
-
-#. Pre-seed the new backup directory (same terminal as above).  This
-   will copy all the directories and authentication details (but none
-   of the actual backups) and initalise for fresh backups::
-
-     cd /opt/backups-${DATE}
-     rsync -avz --exclude '.bup' /opt/backups/ .
-     for dir in bup-*; do su $dir -c "BUP_DIR=/opt/backups-${DATE}/$dir/.bup bup init"; done
-#. The ``/opt/backups`` symlink can now be switched to the new
-   volume::
-
-     ln -sf /opt/backups-${DATE} /opt/backups
-#. ssh can be re-enabled and the new backup volume is effectively
-   active.
-
-#. Now run a test backup from a server manually.  Choose one, get the
-   backup command from cron and run it manually in a screen (it might
-   take a while), ensuring everything seems to be writing correctly to
-   the new volume.
-
-#. You can now clean up the oldest backups (the one *before* the one
-   you just rotated).  Remove the mount from fstab, unmount the volume
-   and cleanup the LVM components::
-
-     DATE=<INSERT OLD DATE CODE HERE>
-     umount /opt/backups-${DATE}
-     lvremove /dev/main-${DATE}/backups-${DATE}
-     vgremove main-${DATE}
-     # pvremove the volumes; they will have PFree @ 1024.00g as
-     # they are now not assigned to anything
-     pvremove /dev/xvd<DRIVE1>
-     pvremove /dev/xvd<DRIVE2>
-     pvremove /dev/xvd<DRIVE3>
-
-#. Remove volumes via API (opposite of adding above with ``server
-   volume detach`` then ``volume delete``).
-
-#. Done!  Come back and rotate it again next year.
+However, due to the way ``borg`` works, append-only mode records all
+client transactions in a transaction log until a read-write operation
+occurs.  When examined, the repository will appear to have all these
+transactions applied (e.g. pruned archives will not appear, even if
+they have not actually been removed from disk).  If you have reason
+not to trust the state of the backup, you should *not* run any
+read-write operations.  You will need to manually examine the
+transaction log and roll back to a known good state; see
+`<https://borgbackup.readthedocs.io/en/stable/usage/notes.html#append-only-mode>`__.

+Backup space is limited, however.  Each backup server has a
+script ``/usr/local/bin/prune-borg-backups`` which can be run to
+reclaim space.  This will keep the last 7 days of backups, then
+monthly backups for 1 year and yearly backups for each archive.  The
+backup servers will send a warning when backup volume usage is high,
+at which point this can be run manually.

 .. _force-merging-a-change: