diff --git a/doc/source/sysadmin.rst b/doc/source/sysadmin.rst
index 6dd246a748..4290f8147f 100644
--- a/doc/source/sysadmin.rst
+++ b/doc/source/sysadmin.rst
@@ -240,6 +240,31 @@ individual host to be backed up. The host to be backed up initiates
 the backup process to the remote backup server(s) using a separate ssh
 key setup just for backup communication (see ``/root/.ssh/config``).
 
+Setting up hosts for backup
+---------------------------
+
+To set up a host for backup, put it in the ``borg-backup`` group.
+
+Hosts can specify ``borg_backup_excludes_extra`` and
+``borg_backup_dirs_extra`` to exclude or include specific directories
+as required (see role documentation for more details).
+
+``borg`` splits backup data into chunks and de-duplicates as much as
+possible. For backing up large items, particularly things like
+database dumps, we want to give ``borg`` as much chance to
+de-duplicate as possible. Approaches such as dumping to compressed
+files on disk defeat de-duplication because all the data changes for
+each dump.
+
+For dumping large data, hosts should put a file into
+``/etc/borg-streams`` that performs the dump in an uncompressed manner
+to stdout. The backup scripts will create a separate archive for each
+stream defined here. For more details, see the ``backup`` role
+documentation. These streams should attempt to be as friendly to
+de-duplication as possible; see some of the examples of ``mysqldump``
+to find arguments that help keep the output data more stable (and
+hence more easily de-duplicated).
+
 Restore from Backup
 -------------------
 
@@ -255,109 +280,32 @@ time is to
 * sudo ``su -`` to switch to the backup user for the host to be restored
 * you will now be in the home directory of that user
 * run ``/opt/borg/bin/borg list ./backup`` to list the archives available
-* these should look like ``hostname-YYYY-MM-DDTHH:MM:SS``
+* these should look like ``--YYYY-MM-DDTHH:MM:SS``
 * move to working directory
 * extract one of the appropriate archives with ``/opt/borg/bin/borg extract ~/backup ``
 
-
-Rotating backup storage
+Managing backup storage
 -----------------------
 
-We run ``borg`` in append-only mode, so that clients can not remove
-old backups on the server.
+We run ``borg`` in append-only mode. This means clients cannot
+remove old backups on the server.
 
-TODO(ianw) : Write instructions on how to prune server side. We
-should monitor growth to see if automatic pruning would be
-appropriate, or periodic manual pruning, or something similar to this
-existing system where we keep a historic archive and start fresh.
-
-The backup server keeps an active volume and the previously rotated
-volume. Each consists of 3 x 1TiB volumes grouped with LVM. The
-volumes are mounted at ``/opt/backups-YYYYMM`` for the date it was
-created; ``/opt/backups`` is a symlink to the latest volume.
-Periodically we rotate the active volume for a fresh one. Follow this
-procedure:
-
-#. Create the new volumes via API (on ``bridge.o.o``). Create 3
-   volumes, named for the server with the year and date added::
-
-     DATE=$(date +%Y%m)
-     OS_VOLUME_API_VERSION=1
-     OS_CMD="./env/bin/openstack --os-cloud-openstackci-rax --os-region=ORD"
-     SERVER="backup01.ord.rax.ci.openstack.org"
-     ${CMD} volume create --size 1024 ${SERVER}/main01-${DATE}
-     ${CMD} volume create --size 1024 ${SERVER}/main02-${DATE}
-     ${CMD} volume create --size 1024 ${SERVER}/main03-${DATE}
-
-#. Attach the volumes to the backup server::
-     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main01-${DATE}
-     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main02-${DATE}
-     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main03-${DATE}
-
-#. Now on the backup server, create the new backup LVM volume (get the
-   device names from ``dmesg`` when they were attached). For
-   simplicity we create a new volume group for each backup series, and
-   a single logical volume ontop::
-
-     DATE=$(date +%Y%m)
-     pvcreate /dev/xvd /dev/xvd /dev/xvd
-     vgcreate main-${DATE} /dev/xvdX /dev/xvdY /dev/xvdZ
-     lvcreate -l 100%FREE -n backups-${DATE} main-${DATE}
-
-     mkfs.ext4 -m 0 -j -L "backups-${DATE}" /dev/main-${DATE}/backups-${DATE}
-     tune2fs -i 0 -c 0 /dev/main-${DATE}/backups-${DATE}
-
-     mkdir /opt/backups-${DATE}
-     # manually add mount details to /etc/fstab
-     mount /opt/backups-${DATE}
-
-#. Making sure there are no backups currently running you can now
-   begin to switch the backups (you can stop the ssh service, but be
-   careful not to then drop your connection and lock yourself out; you
-   can always reboot via the API if you do). Firstly, edit
-   ``/etc/fstab`` and make the current (soon to be *old*) backup
-   volume mount read-only. Unmount the old volume and then remount it
-   (now as read-only). This should prevent any accidental removal of
-   the existing backups during the following procedures.
-
-#. Pre-seed the new backup directory (same terminal as above). This
-   will copy all the directories and authentication details (but none
-   of the actual backups) and initalise for fresh backups::
-
-     cd /opt/backups-${DATE}
-     rsync -avz --exclude '.bup' /opt/backups/ .
-     for dir in bup-*; do su $dir -c "BUP_DIR=/opt/backups-${DATE}/$dir/.bup bup init"; done
-#. The ``/opt/backups`` symlink can now be switched to the new
-   volume::
-
-     ln -sf /opt/backups-${DATE} /opt/backups
-#. ssh can be re-enabled and the new backup volume is effectively
-   active.
-
-#. Now run a test backup from a server manually. Choose one, get the
-   backup command from cron and run it manually in a screen (it might
-   take a while), ensuring everything seems to be writing correctly to
-   the new volume.
-
-#. You can now clean up the oldest backups (the one *before* the one
-   you just rotated). Remove the mount from fstab, unmount the volume
-   and cleanup the LVM components::
-
-     DATE=
-     umount /opt/backups-${DATE}
-     lvremove /dev/main-${DATE}/backups-${DATE}
-     vgremove main-${DATE}
-     # pvremove the volumes; they will have PFree @ 1024.00g as
-     # they are now not assigned to anything
-     pvremove /dev/xvd
-     pvremove /dev/xvd
-     pvremove /dev/xvd
-
-#. Remove volumes via API (opposite of adding above with ``server
-   volume detach`` then ``volume delete``).
-
-#. Done! Come back and rotate it again next year.
+However, due to the way borg works, append-only mode plays all client
+transactions into a transaction log until a read-write operation
+occurs. When examined, the repository will appear to have all these
+transactions applied (e.g. pruned archives will not appear, even if
+they have not actually been pruned from disk). If you have reason to
+not trust the state of the backup, you should *not* run any read-write
+operations. You will need to manually examine the transaction log and
+roll back to a known good state; see
+``__.
 
+However, we have limited backup space. Each backup server has a
+script ``/usr/local/bin/prune-borg-backups`` which can be run to
+reclaim space. This will keep the last 7 days of backups, then
+monthly backups for 1 year and yearly backups for each archive. The
+backup servers will send a warning when backup volume usage is high,
+at which point this can be run manually.
 
 .. _force-merging-a-change: