From 87fccc8e9b780ae80912ce0e9b869cbed96a9bf1 Mon Sep 17 00:00:00 2001
From: "James E. Blair"
Date: Tue, 10 Sep 2019 12:37:29 -0700
Subject: [PATCH] Add docs for recovering an OpenAFS fileserver

This should be a smooth recovery process.

Change-Id: I3c68b077e38a88160286d94e71676c0c4dbb6a51
---
 doc/source/afs.rst | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/doc/source/afs.rst b/doc/source/afs.rst
index 26b5455054..63bfcf24e7 100644
--- a/doc/source/afs.rst
+++ b/doc/source/afs.rst
@@ -311,6 +311,25 @@ Then remove the server with ::
 
 Finally run the ``bos create`` command above with any modified
 parameters to restart the server.
 
+Recovering a Failed Fileserver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a fileserver crashes, take the following steps to ensure it's
+usable after recovery:
+
+* Pause mirror updates and volume release cron jobs
+
+* Reboot the server; fix any filesystem errors and check the salvager
+  logs
+
+* Check for any stuck volume transactions; remedy as appropriate
+
+* Perform a manual release of every volume from a terminal on a server
+  using "-localauth" in case OpenAFS decides it can't do an
+  incremental update.
+
+* Re-enable cron jobs
+
 Mirrors
 ~~~~~~~
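
The transaction check and manual release steps in the new section could be scripted along the following lines. This is a minimal sketch, not part of the patch: the server name and volume names are hypothetical placeholders, and the helper functions are illustrative. It only assembles ``vos status`` and ``vos release`` invocations with ``-localauth`` and prints them unless ``dry_run`` is turned off.

```python
import shlex
import subprocess


def status_command(server):
    """Build the command that lists active volume transactions on a fileserver."""
    return ["vos", "status", "-server", server, "-localauth"]


def release_commands(volumes):
    """Build one `vos release` invocation per volume, using -localauth."""
    return [["vos", "release", "-id", vol, "-localauth"] for vol in volumes]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing command so it can be
            # investigated before any further volumes are released.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    server = "afs01.example.org"                    # hypothetical fileserver
    volumes = ["mirror.example", "docs.example"]    # hypothetical volume names
    run_all([status_command(server)] + release_commands(volumes))
```

Because ``-localauth`` relies on the server's local key, a script like this would have to run as a privileged user on an AFS server machine, which matches the "from a terminal on a server" wording in the added text. Leaving ``dry_run`` enabled first gives a chance to review the list of volumes before anything is released.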