Add more info to AFS fileserver recovery docs

During the Debian buster mirror cleanup we lost a volume backing AFS on
afs01.dfw.openstack.org. Our existing docs gave us a good starting point
for recovery, but they could use more specifics. Add that info.

Change-Id: Ib334759314f0fd493e9b1bc8c06a8060ba8917ee
Clark Boylan 2024-02-29 12:42:10 -08:00
parent 51b6478849
commit 688dd78a08

@@ -360,16 +360,31 @@ usable after recovery:
* Pause mirror updates and volume release cron jobs

  * This is most easily done by putting the mirror update server in the
    Ansible emergency file, then commenting out all root cronjobs on
    that server.

* Reboot the server; fix any filesystem errors and check the salvager
  logs

  * ``/var/log/boot.log`` should indicate ``/dev/main/vicepa`` was fsck'd
  * ``/var/log/openafs/SalsrvLog`` contains the salvager logs.

* Check for any stuck volume transactions; remedy as appropriate

  * Run ``vos status -server $AFS_FILESERVER_IP_ADDR -localauth`` against
    the IP address of each AFS fileserver.

* Perform a manual release of every volume from a terminal on a server
  using "-localauth" in case OpenAFS decides it can't do an
  incremental update.

  * In a screen session on the AFS fileserver run:
    ``vos release -localauth -verbose $AFS_VOLUME``

* Re-enable cron jobs and remove the mirror update server from the
  Ansible emergency file.
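
The transaction check and manual release steps above could be wrapped in
a couple of small shell helpers. This is only a sketch: the ``DRY_RUN``
variable and the function names are hypothetical and not part of the
existing tooling, and the server addresses and volume names must be
supplied by the operator. The two ``vos`` command lines are the ones given
in the steps above. By default the helpers only print the commands they
would run.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, run, check_transactions and release_volumes are
# hypothetical names. The vos invocations mirror the documented steps.

# Default to printing commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Look for stuck volume transactions on each fileserver IP address given.
check_transactions() {
    for server in "$@"; do
        run vos status -server "$server" -localauth
    done
}

# Release each named volume in turn, stopping at the first failure so a
# problem volume can be investigated before continuing.
release_volumes() {
    for volume in "$@"; do
        run vos release -localauth -verbose "$volume" || return 1
    done
}
```

After sourcing the file in a screen session on the fileserver, a dry run
such as ``check_transactions 192.0.2.10`` (a placeholder address) prints
the command for review. Setting ``DRY_RUN=0`` before calling the same
function executes it.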

Mirrors
~~~~~~~