Sharder: warn when sharding appears to have stalled.

This patch adds a configurable timeout after which the sharder
will warn if a container DB has not completed sharding.

The new config option is container_sharding_timeout, with a default
of 172800 seconds (2 days).
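
As a rough illustration of the behaviour described above, the check could
look something like the following Python sketch. It is not the sharder's
actual code: the helper names, the warning text, and the strict
"greater than" comparison are all assumptions made for illustration; only
the option's default value comes from this commit.

```python
import time

# Default taken from the commit message: 172800 seconds (2 days).
DEFAULT_CONTAINER_SHARDING_TIMEOUT = 172800


def sharding_stalled(started_at, now=None,
                     timeout=DEFAULT_CONTAINER_SHARDING_TIMEOUT):
    """Return True if sharding began more than `timeout` seconds ago.

    Hypothetical helper: `started_at` and `now` are Unix timestamps.
    Whether the real check is strict (>) or inclusive (>=) is an assumption.
    """
    if now is None:
        now = time.time()
    return (now - started_at) > timeout


def stall_warning(db_path, started_at, now=None,
                  timeout=DEFAULT_CONTAINER_SHARDING_TIMEOUT):
    """Return a warning string for a stalled DB, or None if within the timeout.

    Hypothetical helper; the message wording is illustrative only.
    """
    if sharding_stalled(started_at, now=now, timeout=timeout):
        return ('Container DB %s has not completed sharding after %d seconds'
                % (db_path, timeout))
    return None
```

A caller would pass the time at which sharding started for a DB and log
the returned message, if any, at warning level.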

Drive-by fix: recording sharding progress now also covers the
case of shard range shrinking.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I6ce299b5232a8f394e35f148317f9e08208a0c0f
Jianjian Huo
2022-09-26 14:58:43 -07:00
parent 65a1f4a2ff
commit 4ed2b89cb7
3 changed files with 204 additions and 104 deletions

@@ -507,6 +507,12 @@ use = egg:swift#xprofile
# The default is 12 hours (12 x 60 x 60)
# recon_sharded_timeout = 43200
#
# Maximum amount of time in seconds after sharding has started on a shard
# container before it is considered to have timed out. After this amount of
# time, the sharder will warn that a container DB has not completed sharding.
# The default is 48 hours (48 x 60 x 60)
# container_sharding_timeout = 172800
#
# Large databases tend to take a while to work with, but we want to make sure
# we write down our progress. Use a larger-than-normal broker timeout to make
# us less likely to bomb out on a LockTimeout.