doc: update backup instructions

Update the backup instructions to reflect some recent changes. Make a
note of the streaming backup method, discuss some caveats with
append-only mode, and discuss the pruning scripts and when to run them
(cf. I9559bb8aeeef06b95fb9e172a2c5bfb5be5b480e,
I250d84c4a9f707e63fef6f70cfdcc1fb7807d3a7).

Change-Id: Idb04ebfa5666cd3c20bc0132683d187e705da3f1
parent 62801d8a93
commit 116a2ca4a4

@@ -240,6 +240,31 @@ individual host to be backed up. The host to be backed up initiates
the backup process to the remote backup server(s) using a separate ssh
key setup just for backup communication (see ``/root/.ssh/config``).

Setting up hosts for backup
---------------------------

To set up a host for backup, put it in the ``borg-backup`` group.

Hosts can specify ``borg_backup_excludes_extra`` and
``borg_backup_dirs_extra`` to exclude or include specific directories
as required (see role documentation for more details).

``borg`` splits backup data into chunks and de-duplicates as much as
possible. For backing up large items, particularly things like
database dumps, we want to give ``borg`` as much chance to
de-duplicate as possible. Approaches such as dumping to compressed
files on disk defeat de-duplication because all the data changes for
each dump.

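The effect of compression on de-duplication can be illustrated with a
small experiment. This is only a sketch: it uses ``gzip`` and ``cmp``
on generated data rather than ``borg`` or a real database dump, and the
file names and sizes are arbitrary assumptions. It changes one line of
a text file and counts how many bytes differ before and after
compression.

```shell
# Compare how much of a file changes after a one-line edit, with and
# without compression. Nothing here touches borg; it only illustrates
# why compressed dumps give a de-duplicator little to work with.
workdir=$(mktemp -d)

# Stand-in for a database dump: 20000 numbered lines.
seq 1 20000 > "$workdir/dump1"

# Second "dump": identical apart from one line of the same length.
sed '10000s/.*/XXXXX/' "$workdir/dump1" > "$workdir/dump2"

# Compress both (-n leaves the name and timestamp out of the header).
gzip -n -c "$workdir/dump1" > "$workdir/dump1.gz"
gzip -n -c "$workdir/dump2" > "$workdir/dump2.gz"

# cmp -l prints one line per differing byte position.
plain_diff=$(cmp -l "$workdir/dump1" "$workdir/dump2" 2>/dev/null | wc -l | tr -d ' ')
gz_diff=$(cmp -l "$workdir/dump1.gz" "$workdir/dump2.gz" 2>/dev/null | wc -l | tr -d ' ')

echo "bytes changed, uncompressed: $plain_diff"
echo "bytes changed, gzip:         $gz_diff"

rm -rf "$workdir"
```

The uncompressed files differ only in the edited line, while the
compressed files differ over a much larger range, so far fewer chunks
can be shared between successive backups.
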
For dumping large data, hosts should put a file into
``/etc/borg-streams`` that writes the dump, uncompressed, to stdout.
The backup scripts will create a separate archive for each
stream defined here. For more details, see the ``backup`` role
documentation. These streams should attempt to be as friendly to
de-duplication as possible; see the existing ``mysqldump`` examples
for arguments that help keep the output data more stable (and
hence more easily de-duplicated).

Restore from Backup
-------------------

@@ -255,109 +280,32 @@ time is to
* sudo ``su -`` to switch to the backup user for the host to be restored
* you will now be in the home directory of that user
* run ``/opt/borg/bin/borg list ./backup`` to list the archives available
* these should look like ``hostname-YYYY-MM-DDTHH:MM:SS``
* these should look like ``<hostname>-<stream>-YYYY-MM-DDTHH:MM:SS``
* move to a working directory
* extract one of the appropriate archives with ``/opt/borg/bin/borg extract ~/backup <archive-tag>``


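When many archives are listed, picking the newest one for a given
stream can be scripted. The following is a minimal sketch, assuming
archive names follow the ``<hostname>-<stream>-YYYY-MM-DDTHH:MM:SS``
pattern described above and are supplied one per line. The helper name
``latest_archive`` and the sample host and stream names are
hypothetical.

```shell
# Print the newest archive name for one "<hostname>-<stream>" prefix.
# Reads archive names on stdin, one per line. ISO-8601 timestamps sort
# correctly as plain text, so a lexical sort is enough.
latest_archive() {
    prefix="$1"
    grep "^${prefix}-[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T" | sort | tail -n 1
}

# Hypothetical listing, standing in for the output of the list command.
sample_listing='review01-filesystem-2021-01-05T05:51:02
review01-mysql-2021-01-06T05:51:02
review01-filesystem-2021-01-07T05:51:02'

newest=$(printf '%s\n' "$sample_listing" | latest_archive review01-filesystem)
echo "$newest"
```

The prefix is used as a regular expression, so a hostname containing
dots will match slightly more loosely than a literal comparison would.
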
Rotating backup storage
Managing backup storage
-----------------------

We run ``borg`` in append-only mode, so that clients can not remove
old backups on the server.
We run ``borg`` in append-only mode. This means clients can not
remove old backups on the server.

TODO(ianw) : Write instructions on how to prune server side. We
should monitor growth to see if automatic pruning would be
appropriate, or periodic manual pruning, or something similar to this
existing system where we keep a historic archive and start fresh.

The backup server keeps an active volume and the previously rotated
volume. Each consists of 3 x 1TiB volumes grouped with LVM. Each
volume is mounted at ``/opt/backups-YYYYMM``, named for the date it was
created; ``/opt/backups`` is a symlink to the latest volume.
Periodically we rotate the active volume for a fresh one. Follow this
procedure:

#. Create the new volumes via API (on ``bridge.o.o``). Create 3
   volumes, named for the server with the year and month added::

      DATE=$(date +%Y%m)
      OS_VOLUME_API_VERSION=1
      OS_CMD="./env/bin/openstack --os-cloud openstackci-rax --os-region=ORD"
      SERVER="backup01.ord.rax.ci.openstack.org"
      ${OS_CMD} volume create --size 1024 ${SERVER}/main01-${DATE}
      ${OS_CMD} volume create --size 1024 ${SERVER}/main02-${DATE}
      ${OS_CMD} volume create --size 1024 ${SERVER}/main03-${DATE}

#. Attach the volumes to the backup server::

      ${OS_CMD} server add volume ${SERVER} ${SERVER}/main01-${DATE}
      ${OS_CMD} server add volume ${SERVER} ${SERVER}/main02-${DATE}
      ${OS_CMD} server add volume ${SERVER} ${SERVER}/main03-${DATE}

#. Now on the backup server, create the new backup LVM volume (get the
   device names from ``dmesg`` when they were attached). For
   simplicity we create a new volume group for each backup series, and
   a single logical volume on top::

      DATE=$(date +%Y%m)
      pvcreate /dev/xvd<DRIVE1> /dev/xvd<DRIVE2> /dev/xvd<DRIVE3>
      vgcreate main-${DATE} /dev/xvd<DRIVE1> /dev/xvd<DRIVE2> /dev/xvd<DRIVE3>
      lvcreate -l 100%FREE -n backups-${DATE} main-${DATE}

      mkfs.ext4 -m 0 -j -L "backups-${DATE}" /dev/main-${DATE}/backups-${DATE}
      tune2fs -i 0 -c 0 /dev/main-${DATE}/backups-${DATE}

      mkdir /opt/backups-${DATE}
      # manually add mount details to /etc/fstab
      mount /opt/backups-${DATE}

#. After making sure there are no backups currently running, you can
   begin to switch the backups (you can stop the ssh service, but be
   careful not to then drop your connection and lock yourself out; you
   can always reboot via the API if you do). Firstly, edit
   ``/etc/fstab`` and make the current (soon to be *old*) backup
   volume mount read-only. Unmount the old volume and then remount it
   (now as read-only). This should prevent any accidental removal of
   the existing backups during the following procedures.

#. Pre-seed the new backup directory (same terminal as above). This
   will copy all the directories and authentication details (but none
   of the actual backups) and initialise for fresh backups::

      cd /opt/backups-${DATE}
      rsync -avz --exclude '.bup' /opt/backups/ .
      for dir in bup-*; do su $dir -c "BUP_DIR=/opt/backups-${DATE}/$dir/.bup bup init"; done

#. The ``/opt/backups`` symlink can now be switched to the new
   volume (``-n`` replaces the existing symlink rather than creating a
   link inside the directory it points to)::

      ln -sfn /opt/backups-${DATE} /opt/backups

#. ssh can be re-enabled and the new backup volume is effectively
   active.

#. Now run a test backup from a server manually. Choose one, get the
   backup command from cron and run it manually in a screen (it might
   take a while), ensuring everything seems to be writing correctly to
   the new volume.

#. You can now clean up the oldest backups (the one *before* the one
   you just rotated). Remove the mount from fstab, unmount the volume
   and clean up the LVM components::

      DATE=<INSERT OLD DATE CODE HERE>
      umount /opt/backups-${DATE}
      lvremove /dev/main-${DATE}/backups-${DATE}
      vgremove main-${DATE}
      # pvremove the volumes; they will have PFree @ 1024.00g as
      # they are now not assigned to anything
      pvremove /dev/xvd<DRIVE1>
      pvremove /dev/xvd<DRIVE2>
      pvremove /dev/xvd<DRIVE3>

#. Remove volumes via API (opposite of adding above with ``server
   remove volume`` then ``volume delete``).

#. Done! Come back and rotate it again next year.
However, due to the way borg works, append-only mode plays all client
transactions into a transaction log until a read-write operation
occurs. When examined, the repository will appear to have all these
transactions applied (e.g. pruned archives will not appear, even if
they have not actually been pruned from disk). If you have reason
not to trust the state of the backup, you should *not* run any read-write
operations. You will need to manually examine the transaction log and
roll back to a known good state; see
`<https://borgbackup.readthedocs.io/en/stable/usage/notes.html#append-only-mode>`__.

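As a starting point for examining the transaction log, the sketch below
finds the last transaction recorded at or before a chosen cutoff time.
It is read-only and does not modify a repository. The line format
(``transaction N, UTC time <timestamp>``) is an assumption modelled on
the example in the borg notes linked above; the helper name
``last_good_transaction``, the sample lines and the cutoff are all
illustrative. Check the linked documentation before acting on the
result.

```shell
# Print the id of the last transaction logged at or before a cutoff.
# Reads the transaction log on stdin; the cutoff is an ISO-8601 UTC
# timestamp, compared as a string.
last_good_transaction() {
    awk -v cutoff="$1" '
        $1 == "transaction" && ($5 "") <= (cutoff "") {
            id = $2
            sub(/,$/, "", id)   # drop the trailing comma after the id
            last = id
        }
        END { if (last != "") print last }
    '
}

# Sample log lines in the assumed format (not from a real repository).
sample_log='transaction 1, UTC time 2016-03-31T15:53:27.383532
transaction 5, UTC time 2016-03-31T15:53:52.588922
transaction 11, UTC time 2016-03-31T15:54:23.887256
transaction 12, UTC time 2016-03-31T15:55:54.022540'

# Suppose the repository is only trusted up to 15:54:00.
last_good=$(printf '%s\n' "$sample_log" | last_good_transaction 2016-03-31T15:54:00)
echo "last trusted transaction: $last_good"
```
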
However, we have limited backup space. Each backup server has a
script ``/usr/local/bin/prune-borg-backups`` which can be run to
reclaim space. This will keep the last 7 days of backups, then
monthly backups for 1 year and yearly backups for each archive. The
backup servers will send a warning when backup volume usage is high,
at which point the script can be run manually.

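Before running the prune script by hand, it can help to confirm how
full the backup volume actually is. The following is a minimal sketch
of such a check. The 90% threshold and the helper names are
assumptions, since the level at which the servers send their warning is
not stated here, and the sketch only prints a suggestion rather than
pruning anything.

```shell
# Report how full the filesystem holding a path is, as an integer
# percentage. df -P guarantees one line of output per filesystem.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical threshold; not taken from the real alerting setup.
THRESHOLD=90

# Print a suggestion based on the usage of the given path.
check_backup_volume() {
    pct=$(usage_pct "$1")
    [ -n "$pct" ] || return 1
    if [ "$pct" -ge "$THRESHOLD" ]; then
        echo "usage ${pct}% >= ${THRESHOLD}%: consider running /usr/local/bin/prune-borg-backups"
    else
        echo "usage ${pct}%: no pruning needed"
    fi
}
```

On a backup server this would be called with the mount point of the
backup volume, for example ``check_backup_volume /opt/backups`` if that
is where the volume is mounted.
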
.. _force-merging-a-change:
