Sharder: warn when sharding appears to have stalled.

This patch adds a configurable timeout after which the sharder will warn if a container DB has not completed sharding.

The new config option is container_sharding_timeout, with a default of 172800 seconds (2 days).

Drive-by fix: recording sharding progress now also covers the case of shard range shrinking.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I6ce299b5232a8f394e35f148317f9e08208a0c0f
2022-09-26 14:58:43 -07:00
parent 65a1f4a2ff
commit 4ed2b89cb7
3 changed files with 204 additions and 104 deletions
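
The timeout check described in the commit message can be sketched roughly as below. This is an illustration only, not the code from this patch: the function name sharding_has_stalled, its parameters, and the way the start time is obtained are all assumptions. Only the option name container_sharding_timeout and its default of 172800 seconds come from the commit.

```python
import logging
import time

# Default taken from the commit message: 48 x 60 x 60 seconds.
DEFAULT_CONTAINER_SHARDING_TIMEOUT = 172800

logger = logging.getLogger(__name__)


def sharding_has_stalled(started_at, now=None,
                         timeout=DEFAULT_CONTAINER_SHARDING_TIMEOUT):
    """Return True if sharding began more than `timeout` seconds ago.

    Hypothetical helper: `started_at` is assumed to be a Unix timestamp
    recorded when sharding began on the container DB.
    """
    if now is None:
        now = time.time()
    stalled = (now - started_at) > timeout
    if stalled:
        # Warn only; in this sketch the caller decides what else to do.
        logger.warning(
            'Container DB has not completed sharding after %d seconds',
            int(now - started_at))
    return stalled
```

With the default timeout, a container that started sharding exactly 172800 seconds ago is not yet reported; one second later it is.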
--- a/etc/container-server.conf-sample
+++ b/etc/container-server.conf-sample
@@ -507,6 +507,12 @@ use = egg:swift#xprofile
 # The default is 12 hours (12 x 60 x 60)
 # recon_sharded_timeout = 43200
 #
+# Maximum amount of time in seconds after sharding has been started on a shard
+# container and before it's considered as timeout. After this amount of time,
+# sharder will warn that a container DB has not completed sharding.
+# The default is 48 hours (48 x 60 x 60)
+# container_sharding_timeout = 172800
+#
 # Large databases tend to take a while to work with, but we want to make sure
 # we write down our progress. Use a larger-than-normal broker timeout to make
 # us less likely to bomb out on a LockTimeout.