Add more info to afs fileserver recovery docs
During the debian buster mirror cleanup we lost a volume backing afs
on afs01.dfw.openstack.org. Our existing docs gave us a good starting
point for recovery, but they could use more specifics. Add that info.

Change-Id: Ib334759314f0fd493e9b1bc8c06a8060ba8917ee
commit 688dd78a08
parent 51b6478849
@@ -360,16 +360,31 @@ usable after recovery:
 
 * Pause mirror updates and volume release cron jobs
 
+  * This is most easily done putting the mirror update server in the
+    Ansible emergency file then commenting out all root cronjobs on
+    that server.
+
 * Reboot the server; fix any filesystem errors and check the salvager
   logs
 
+  * ``/var/log/boot.log`` should indicate ``/dev/main/vicepa`` was fsck'd
+  * ``/var/log/openafs/SalsrvLog`` contains the salvager logs.
+
 * Check for any stuck volume transactions; remedy as appropriate
 
+  * Run ``vos status -server $AFS_FILESERVER_IP_ADDR -localauth`` against
+    the IP address of each afs fileserver.
+
 * Perform a manual release of every volume from a terminal on a server
   using "-localauth" in case OpenAFS decides it can't do an
   incremental update.
 
-* Re-enable cron jobs
+  * In a screen session on the afs fileserver run:
+
+    ``vos release -localauth -verbose $AFS_VOLUME``
+
+* Re-enable cron jobs and remove the mirror update server from the
+  Ansible emergency file.
 
 Mirrors
 ~~~~~~~