Limit rax-ord launch concurrency and don't retry

This region seems to take a very long time to launch nodes when we receive a burst of requests for them: a thundering-herd effect that substantially increases launch times. We have a lot of capacity in this region, though, so we want to boot as many instances here as we can. Attempt to reduce the effect by limiting the number of instances nodepool will launch at the same time.

Also, mitigate the higher timeout for this provider by not retrying launch failures, so that we never lock a request for multiples of the timeout.

Change-Id: I179ab22df37b2f996288820074ec69b8e0a202a5
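To put numbers on "multiples of the timeout", here is a minimal sketch under a simplified model. It assumes that each failed attempt holds a node request for at most `launch-timeout` seconds (ignoring `boot-timeout`), that `launch-retries` counts total attempts as this change implies (so 1 means a single attempt with no retry), and that the default is 3 attempts. `worst_case_lock_seconds` is a hypothetical helper for illustration, not part of nodepool.

```python
# Simplified model (assumption): each launch attempt can hold a node request
# for at most launch-timeout seconds, and launch-retries is the total number
# of attempts, so launch-retries: 1 means "try once, never retry".


def worst_case_lock_seconds(launch_timeout: int, launch_retries: int) -> int:
    """Hypothetical upper bound on how long failed launches can lock one request."""
    if launch_timeout < 0 or launch_retries < 1:
        raise ValueError("launch_timeout must be >= 0 and launch_retries >= 1")
    return launch_timeout * launch_retries


if __name__ == "__main__":
    LAUNCH_TIMEOUT = 900  # launch-timeout from the rax-ord provider config
    for retries in (3, 1):  # assumed default of 3 vs. this change's value of 1
        total = worst_case_lock_seconds(LAUNCH_TIMEOUT, retries)
        print(f"launch-retries={retries}: up to {total} s ({total // 60} min)")
```

Under these assumptions, dropping to a single attempt caps the lock at one 900 s timeout (15 minutes) instead of three (45 minutes).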
2023-03-10 17:48:25 +00:00 · 2023-03-10 17:48:25 +00:00 · d0481326bf
commit d0481326bf
parent 34d3d03e32
1 changed file with 7 additions and 0 deletions
--- a/nodepool/nl01.opendev.org.yaml
+++ b/nodepool/nl01.opendev.org.yaml
@@ -189,6 +189,13 @@ providers:
    region-name: 'ORD'
    cloud: rax
    boot-timeout: 120
+    # Under load, this region can take a very long time to launch instances,
+    # but we have a lot of capacity here so it's worthwhile to increase the
+    # timeout but mitigate node request delays by not retrying failures. Also
+    # try to substantially reduce the number of instances we launch in
+    # parallel.
+    max-concurrency: 10
+    launch-retries: 1
    launch-timeout: 900
    rate: 0.01
    diskimages: *provider_diskimages