Limit rax-ord launch concurrency and don't retry
This region seems to take a very long time to launch nodes when we receive a burst of requests for them. This thundering-herd behavior substantially increases launch times. We have a lot of capacity in this region, though, so we want to boot as many instances here as we can. Attempt to reduce the effect by limiting the number of instances nodepool will launch at the same time.

Also, mitigate the higher timeout for this provider by not retrying launch failures, so that we never lock a request for multiples of the timeout.

Change-Id: I179ab22df37b2f996288820074ec69b8e0a202a5
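To make "multiples of the timeout" concrete, here is a rough back-of-the-envelope sketch. It assumes that each launch attempt can run for up to the full `launch-timeout` (900 seconds in this provider's config) and that `launch-retries` counts total attempts, so `launch-retries: 1` means a single attempt with no retry, as the commit message implies. The value 3 is used only for comparison; we believe it is nodepool's default, but treat that as an assumption. The helper name `worst_case_lock_seconds` is hypothetical and ignores `boot-timeout` and API overhead.

```python
# Hypothetical helper: rough upper bound on how long one node request can
# stay locked while a single node launch is attempted (and possibly retried).
# Assumptions: each attempt may take up to the full launch-timeout, and
# launch-retries is the total number of attempts (1 == "no retry").


def worst_case_lock_seconds(launch_timeout: int, launch_retries: int) -> int:
    """Upper bound in seconds, ignoring boot-timeout and API overhead."""
    if launch_retries < 1:
        raise ValueError("launch_retries must be at least 1")
    return launch_timeout * launch_retries


if __name__ == "__main__":
    # launch-timeout: 900 comes from the rax-ord provider config.
    for retries in (3, 1):
        total = worst_case_lock_seconds(900, retries)
        print(f"launch-retries={retries}: up to {total} s ({total // 60} min)")
```

Under these assumptions, a single attempt caps the lock at 900 s (15 min) instead of 2700 s (45 min) with three attempts.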
parent 34d3d03e32
commit d0481326bf
@@ -189,6 +189,13 @@ providers:
     region-name: 'ORD'
     cloud: rax
     boot-timeout: 120
+    # Under load, this region can take a very long time to launch instances,
+    # but we have a lot of capacity here so it's worthwhile to increase the
+    # timeout but mitigate node request delays by not retrying failures. Also
+    # try to substantially reduce the number of instances we launch in
+    # parallel.
+    max-concurrency: 10
+    launch-retries: 1
     launch-timeout: 900
     rate: 0.01
     diskimages: *provider_diskimages