Limit rax-ord launch concurrency and don't retry

This region seems to take a very long time to launch nodes when we
have a burst of requests for them, a thundering-herd sort of behavior
that causes launch times to increase substantially. We have a lot of
capacity in this region, though, so we want to boot as many instances
as we can here. Attempt to reduce the effect by limiting the number
of instances nodepool will launch at the same time.

Also, mitigate the higher timeout for this provider by not retrying
launch failures, so that we won't ever lock a request for multiples
of the timeout.

Change-Id: I179ab22df37b2f996288820074ec69b8e0a202a5
Jeremy Stanley 2023-03-10 17:48:25 +00:00
parent 34d3d03e32
commit d0481326bf

@@ -189,6 +189,13 @@ providers:
region-name: 'ORD'
cloud: rax
boot-timeout: 120
# Under load, this region can take a very long time to launch instances,
# but we have a lot of capacity here so it's worthwhile to increase the
# timeout but mitigate node request delays by not retrying failures. Also
# try to substantially reduce the number of instances we launch in
# parallel.
max-concurrency: 10
launch-retries: 1
launch-timeout: 900
rate: 0.01
diskimages: *provider_diskimages
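
The reasoning about "multiples of the timeout" can be checked with some quick arithmetic. The sketch below is only an illustration, not nodepool code: the function name is hypothetical, it assumes each launch attempt can run for the full launch-timeout, and it assumes launch-retries counts total attempts. The value of 3 attempts is used purely for comparison; the diff does not show what the previous setting was.

```python
def worst_case_launch_seconds(launch_timeout: int, launch_attempts: int) -> int:
    """Upper bound on how long one node request can be tied up launching.

    Assumption: every attempt may run for the whole launch timeout before
    it is declared failed, and attempts happen one after another.
    """
    if launch_timeout < 0:
        raise ValueError("launch_timeout must not be negative")
    if launch_attempts < 1:
        raise ValueError("launch_attempts must be at least 1")
    return launch_timeout * launch_attempts


# launch-timeout: 900 comes from the provider entry in the diff.
LAUNCH_TIMEOUT = 900

# Hypothetical comparison value, not taken from the diff.
with_three_attempts = worst_case_launch_seconds(LAUNCH_TIMEOUT, 3)

# launch-retries: 1, as set by this change.
with_one_attempt = worst_case_launch_seconds(LAUNCH_TIMEOUT, 1)

print(f"3 attempts: up to {with_three_attempts} s ({with_three_attempts // 60} min)")
print(f"1 attempt:  up to {with_one_attempt} s ({with_one_attempt // 60} min)")
```

Under those assumptions, a single attempt caps the delay for a request at one launch-timeout (15 minutes here) before it can be handled elsewhere, rather than a multiple of it.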