doc: update backup instructions

Update the backup instructions for some recent changes. Make a note of the streaming backup method, discuss some caveats with append-only mode and discuss the pruning scripts and when to run (c.f. I9559bb8aeeef06b95fb9e172a2c5bfb5be5b480e, I250d84c4a9f707e63fef6f70cfdcc1fb7807d3a7). Change-Id: Idb04ebfa5666cd3c20bc0132683d187e705da3f1
2021-02-09 12:15:24 +11:00 · 2021-02-09 12:15:24 +11:00 · 116a2ca4a4
commit 116a2ca4a4
parent 62801d8a93
1 changed files with 44 additions and 96 deletions
--- a/doc/source/sysadmin.rst
+++ b/doc/source/sysadmin.rst
@ -240,6 +240,31 @@ individual host to be backed up.  The host to be backed up initiates
 the backup process to the remote backup server(s) using a separate ssh
 key set up just for backup communication (see ``/root/.ssh/config``).
 Setting up hosts for backup
 ---------------------------
 To set up a host for backup, put it in the ``borg-backup`` group.
 Hosts can specify ``borg_backup_excludes_extra`` and
 ``borg_backup_dirs_extra`` to exclude or include specific directories
 as required (see role documentation for more details).
 ``borg`` splits backup data into chunks and de-duplicates as much as
 possible.  For backing up large items, particularly things like
 database dumps, we want to give ``borg`` as much chance to
 de-duplicate as possible.  Approaches such as dumping to compressed
 files on disk defeat de-duplication because all the data changes for
 each dump.
 For dumping large data, hosts should put a file into
 ``/etc/borg-streams`` that performs the dump in an uncompressed manner
 to stdout.  The backup scripts will create a separate archive for each
 stream defined here.  For more details, see the ``backup`` role
 documentation.  These streams should attempt to be as friendly to
 de-duplication as possible; see some of the examples of ``mysqldump``
 to find arguments that help keep the output data more stable (and
 hence more easily de-duplicated).
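As a rough illustration of a de-duplication-friendly stream, the sketch below wraps an uncompressed ``mysqldump`` to stdout. The file name, the function name ``stream_dump`` and the ``DUMP_BIN`` override are hypothetical and not taken from the ``backup`` role; the exact contract for files in ``/etc/borg-streams`` is described in the role documentation, not here.

```shell
#!/bin/sh
# Hypothetical stream script, e.g. /etc/borg-streams/mysql (the file name and
# function name are illustrative assumptions, not part of the backup role).

# DUMP_BIN can be overridden for testing; it defaults to the real mysqldump.
stream_dump() {
    # --single-transaction:   consistent snapshot for transactional tables
    # --skip-extended-insert: one INSERT per row, so a changed row only
    #                         changes one line of output
    # --skip-dump-date:       omit the trailing timestamp comment, which
    #                         would otherwise differ on every run
    "${DUMP_BIN:-mysqldump}" \
        --all-databases \
        --single-transaction \
        --skip-extended-insert \
        --skip-dump-date
}
```

The output is deliberately left uncompressed so that ``borg`` can chunk and de-duplicate it; a real stream file would end by calling ``stream_dump`` (or simply contain the command itself).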
 Restore from Backup
 -------------------
@ -255,109 +280,32 @@ time is to
 * sudo ``su -`` to switch to the backup user for the host to be restored
 * you will now be in the home directory of that user
 * run ``/opt/borg/bin/borg list ./backup`` to list the archives available
-* these should look like ``hostname-YYYY-MM-DDTHH:MM:SS``
+* these should look like ``<hostname>-<stream>-YYYY-MM-DDTHH:MM:SS``
 * move to working directory
 * extract one of the appropriate archives with ``/opt/borg/bin/borg extract ~/backup <archive-tag>``
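The archive selection step above can be sketched as a small helper that picks the most recent archive for one stream. It assumes archive names are fed in one per line (for example from ``borg list --short``); the helper name ``latest_archive`` and the stream name ``filesystem`` are hypothetical.

```shell
#!/bin/sh
# latest_archive HOST STREAM
# Reads archive names on stdin and prints the newest one matching
# "<HOST>-<STREAM>-".  The ISO-style timestamp suffix sorts correctly
# as plain text, so a lexical sort is sufficient.
latest_archive() {
    # index() does a literal prefix match, so dots in the hostname are
    # not treated as regular-expression wildcards.
    awk -v p="$1-$2-" 'index($0, p) == 1' | sort | tail -n 1
}
```

A possible use, run as the backup user from a scratch working directory: ``/opt/borg/bin/borg list --short ~/backup | latest_archive "$(hostname)" filesystem``, then pass the result to ``borg extract`` as ``~/backup::<archive-name>``.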
-
+Managing backup storage
 Rotating backup storage
 -----------------------
-We run ``borg`` in append-only mode, so that clients can not remove
+We run ``borg`` in append-only mode.  This means clients cannot
-old backups on the server.
+remove old backups on the server.
-TODO(ianw) : Write instructions on how to prune server side.  We
+However, due to the way borg works, append-only mode plays all client
-should monitor growth to see if automatic pruning would be
+transactions into a transaction log until a read-write operation
-appropriate, or periodic manual pruning, or something similar to this
+occurs.  When examined, the repository will appear to have all these
-existing system where we keep a historic archive and start fresh.
+transactions applied (e.g. pruned archives will not appear; even if
-
+they have not actually been pruned from disk).  If you have reason to
-The backup server keeps an active volume and the previously rotated
+not trust the state of the backup, you should *not* run any read-write
-volume.  Each consists of 3 x 1TiB volumes grouped with LVM.  The
+operations.  You will need to manually examine the transaction log and
-volumes are mounted at ``/opt/backups-YYYYMM`` for the date it was
+roll-back to a known good state; see
-created; ``/opt/backups`` is a symlink to the latest volume.
+`<https://borgbackup.readthedocs.io/en/stable/usage/notes.html#append-only-mode>`__.
 Periodically we rotate the active volume for a fresh one.  Follow this
 procedure:
 #. Create the new volumes via API (on ``bridge.o.o``).  Create 3
   volumes, named for the server with the year and month added::
     DATE=$(date +%Y%m)
     OS_VOLUME_API_VERSION=1
     OS_CMD="./env/bin/openstack --os-cloud=openstackci-rax --os-region=ORD"
     SERVER="backup01.ord.rax.ci.openstack.org"
     ${OS_CMD} volume create --size 1024 ${SERVER}/main01-${DATE}
     ${OS_CMD} volume create --size 1024 ${SERVER}/main02-${DATE}
     ${OS_CMD} volume create --size 1024 ${SERVER}/main03-${DATE}
 #. Attach the volumes to the backup server::
     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main01-${DATE}
     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main02-${DATE}
     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main03-${DATE}
 #. Now on the backup server, create the new backup LVM volume (get the
   device names from ``dmesg`` when they were attached).  For
   simplicity we create a new volume group for each backup series, and
   a single logical volume on top::
     DATE=$(date +%Y%m)
     pvcreate /dev/xvd<DRIVE1> /dev/xvd<DRIVE2> /dev/xvd<DRIVE3>
     vgcreate main-${DATE} /dev/xvd<DRIVE1> /dev/xvd<DRIVE2> /dev/xvd<DRIVE3>
     lvcreate -l 100%FREE -n backups-${DATE} main-${DATE}
     mkfs.ext4 -m 0 -j -L "backups-${DATE}" /dev/main-${DATE}/backups-${DATE}
     tune2fs -i 0 -c 0 /dev/main-${DATE}/backups-${DATE}
     mkdir /opt/backups-${DATE}
     # manually add mount details to /etc/fstab
     mount /opt/backups-${DATE}
 #. Making sure there are no backups currently running, you can now
   begin to switch the backups (you can stop the ssh service, but be
   careful not to then drop your connection and lock yourself out; you
   can always reboot via the API if you do).  Firstly, edit
   ``/etc/fstab`` and make the current (soon to be *old*) backup
   volume mount read-only.  Unmount the old volume and then remount it
   (now as read-only).  This should prevent any accidental removal of
   the existing backups during the following procedures.
 #. Pre-seed the new backup directory (same terminal as above).  This
   will copy all the directories and authentication details (but none
   of the actual backups) and initialise for fresh backups::
     cd /opt/backups-${DATE}
     rsync -avz --exclude '.bup' /opt/backups/ .
     for dir in bup-*; do su $dir -c "BUP_DIR=/opt/backups-${DATE}/$dir/.bup bup init"; done
 #. The ``/opt/backups`` symlink can now be switched to the new
   volume::
     ln -sf /opt/backups-${DATE} /opt/backups
 #. ssh can be re-enabled and the new backup volume is effectively
   active.
 #. Now run a test backup from a server manually.  Choose one, get the
   backup command from cron and run it manually in a screen (it might
   take a while), ensuring everything seems to be writing correctly to
   the new volume.
 #. You can now clean up the oldest backup volume (the one *before* the one
   you just rotated).  Remove the mount from fstab, unmount the volume
   and clean up the LVM components::
     DATE=<INSERT OLD DATE CODE HERE>
     umount /opt/backups-${DATE}
     lvremove /dev/main-${DATE}/backups-${DATE}
     vgremove main-${DATE}
     # pvremove the volumes; they will have PFree @ 1024.00g as
     # they are now not assigned to anything
     pvremove /dev/xvd<DRIVE1>
     pvremove /dev/xvd<DRIVE2>
     pvremove /dev/xvd<DRIVE3>
 #. Remove volumes via API (opposite of adding above with ``server
   volume detach`` then ``volume delete``).
 #. Done!  Come back and rotate it again next year.
 However, we have limited backup space.  Each backup server has a
 script ``/usr/local/bin/prune-borg-backups`` which can be run to
 reclaim space.  This will keep the last 7 days of backups, then
 monthly backups for 1 year and yearly backups for each archive.  The
 backup servers will send a warning when backup volume usage is high,
 at which point this can be run manually.
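The retention policy described above can be previewed with a dry run before anything is reclaimed. The sketch below is not the contents of ``/usr/local/bin/prune-borg-backups`` (which is not shown here); the helper name ``prune_dry_run``, the ``BORG`` override and the mapping of the policy onto ``--keep-*`` values are assumptions. A negative ``--keep-yearly`` value is used on the understanding that ``borg`` treats it as "no limit".

```shell
#!/bin/sh
# prune_dry_run REPO PREFIX
# Shows what a prune with the policy described above would keep and
# remove for archives starting with PREFIX, without modifying REPO.
# BORG can be overridden for testing; it defaults to the installed borg.
prune_dry_run() {
    "${BORG:-/opt/borg/bin/borg}" prune --dry-run --list \
        --prefix "$2" \
        --keep-daily 7 \
        --keep-monthly 12 \
        --keep-yearly -1 \
        "$1"
}
```

For example, ``prune_dry_run ./backup "<hostname>-<stream>-"`` run from a backup user's home directory. Because a real (non dry-run) prune is a read-write operation, it should only be run once the state of the repository is trusted, as discussed for append-only mode.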
 .. _force-merging-a-change: