From 688dd78a08a5215e1f75bfd9a7ac13e6032fbae6 Mon Sep 17 00:00:00 2001
From: Clark Boylan
Date: Thu, 29 Feb 2024 12:42:10 -0800
Subject: [PATCH] Add more info to afs fileserver recovery docs

During the debian buster mirror cleanup we lost a volume backing afs on
afs01.dfw.openstack.org. Our existing docs gave us a good starting point
for recovery, but they could use more specifics. Add that info.

Change-Id: Ib334759314f0fd493e9b1bc8c06a8060ba8917ee
---
 doc/source/afs.rst | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/doc/source/afs.rst b/doc/source/afs.rst
index 99e7be57b6..9d23c3fcfb 100644
--- a/doc/source/afs.rst
+++ b/doc/source/afs.rst
@@ -360,16 +360,31 @@ usable after recovery:
 
 * Pause mirror updates and volume release cron jobs
 
+  * This is most easily done putting the mirror update server in the
+    Ansible emergency file then commenting out all root cronjobs on
+    that server.
+
 * Reboot the server; fix any filesystem errors and check the salvager
   logs
 
+  * ``/var/log/boot.log`` should indicate ``/dev/main/vicepa`` was fsck'd
+
+  * ``/var/log/openafs/SalsrvLog`` contains the salvager logs.
+
 * Check for any stuck volume transactions; remedy as appropriate
 
+  * Run ``vos status -server $AFS_FILESERVER_IP_ADDR -localauth`` against
+    the IP address of each afs fileserver.
+
 * Perform a manual release of every volume from a terminal on a server
   using "-localauth" in case OpenAFS decides it can't do an
   incremental update.
 
-* Re-enable cron jobs
+  * In a screen session on the afs fileserver run:
+    ``vos release -localauth -verbose $AFS_VOLUME``
+
+* Re-enable cron jobs and remove the mirror update server from the
+  Ansible emergency file.
 
 Mirrors
 ~~~~~~~
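
As a rough illustration of the transaction check and manual release steps this patch documents, the sketch below loops over fileservers and volumes. The IP addresses, the volume name, and the function names are placeholders invented for the example. The ``vos`` invocations are copied from the patch text. The script only prints the commands unless ``RUN`` is set to an empty string.

```shell
#!/bin/sh
# Dry-run sketch of the "check transactions" and "manual release" steps.
# RUN defaults to "echo", so commands are printed rather than executed.
# Set RUN= (empty) to actually run them on an AFS fileserver.
RUN="${RUN-echo}"

# Check each fileserver for stuck volume transactions.
# Arguments: one IP address per fileserver (placeholders below).
check_transactions() {
    for ip in "$@"; do
        $RUN vos status -server "$ip" -localauth
    done
}

# Manually release each volume. Arguments: volume names (placeholder below).
release_volumes() {
    for vol in "$@"; do
        $RUN vos release -localauth -verbose "$vol"
    done
}

# Hypothetical inputs: documentation-range IPs and a made-up volume name.
check_transactions 192.0.2.10 192.0.2.11
release_volumes mirror.example
```

To execute for real, one would presumably run this inside a screen session on the fileserver with ``RUN=`` set and the real addresses and volume names substituted, after pausing the cron jobs as described in the patch.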