Looking at our graphs, we're still spiking up into the 30-60
concurrent-build range at times, which seems to result in some
launches exceeding the already lengthy timeout and wasting quota,
but when things do manage to boot we effectively utilize most of
max-servers. The variability is because max-concurrency caps the
number of in-flight node requests the launcher will accept for a
provider, but the number of nodes in a single request can be quite
large.
Raise max-servers back to its earlier value reflecting our available
quota in this provider, but halve the max-concurrency so we don't
try to boot so many at a time.
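As a rough sketch, the two knobs involved look something like this in
a nodepool provider config (the provider name and exact values here
are placeholders, not the actual change):

```yaml
# Illustrative nodepool.yaml fragment; names and numbers are examples.
providers:
  - name: example-provider
    # Cap on in-flight node requests this launcher will accept for
    # the provider; halved so we boot fewer servers at a time.
    max-concurrency: 25
    pools:
      - name: main
        # Restored to reflect the quota actually available to us.
        max-servers: 100
```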
Change-Id: I683cdf92edeacd7ccf7b550c5bf906e75dfc90e8
This region seems to take a very long time to launch nodes when we
have a burst of requests for them, like a thundering herd sort of
behavior causing launch times to increase substantially. We have a
lot of capacity in this region though, so we want to boot as many
instances as we can here. Attempt to reduce the effect by limiting
the number of instances nodepool will launch at the same time.
Also, mitigate the higher timeout for this provider by not retrying
launch failures, so that we won't ever lock a request for multiples
of the timeout.
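A sketch of what this looks like in the provider config (the name and
numbers are illustrative, and the exact launch-retries value that
expresses "no retry" follows nodepool's semantics for that option):

```yaml
# Illustrative fragment; provider name and values are placeholders.
providers:
  - name: example-region
    # Limit simultaneous launches to soften the thundering herd.
    max-concurrency: 20
    # A single launch attempt only, so a node request is never held
    # locked for multiple boot-timeout periods.
    launch-retries: 1
```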
Change-Id: I179ab22df37b2f996288820074ec69b8e0a202a5
We're still seeing a lot of timeouts waiting for instances to become
active in this provider, and are observing fairly long delays
between API calls at times. Increase the launch wait from 10 to 15
minutes, and increase the minimum delay between API calls by an
order of magnitude from 0.001 to 0.01 seconds.
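In nodepool's provider configuration these two tunables are
boot-timeout and rate; a sketch of the change described above, with
an illustrative provider name:

```yaml
# Illustrative fragment; the provider name is a placeholder.
providers:
  - name: example-provider
    # Wait up to 15 minutes (was 10) for an instance to go ACTIVE.
    boot-timeout: 900
    # Minimum delay in seconds between API calls, up from 0.001.
    rate: 0.01
```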
Change-Id: Ib13ff03629481009a838a581d98d50accbf81de2
Reduce the max-servers in rax-ord from 195 to 100, and revert the
boot-timeout from the 300 we tried back down to 120 like the others.
We're continuing to see server create calls taking longer to report
active than nodepool is willing to wait, but also may be witnessing
the results of API rate limiting or systemic slowness. Reducing the
number of instances we attempt to boot there may give us a clearer
picture of whether that's the case.
Change-Id: Ife7035ba64b457d964c8497da0d9872e41769123
For a while we've been seeing a lot of "Timeout waiting for instance
creation" in Rackspace's ORD region, but checking behind the
launcher it appears these instances do eventually boot, so we're
wasting significant resources discarding quota we never use.
Increase the timeout for this from 2 minutes to 5, but only in this
region as 2 minutes appears to be sufficient in the others.
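In the provider config this is just a per-region boot-timeout
override; roughly (the second region name is illustrative of the
layout, not an exhaustive list):

```yaml
# Illustrative fragment: only ORD gets the longer timeout.
providers:
  - name: rax-ord
    # ORD instances do eventually boot, just slowly; allow 5 minutes.
    boot-timeout: 300
  - name: rax-dfw
    # Other regions keep the 2-minute timeout that suffices there.
    boot-timeout: 120
```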
Change-Id: I1cf91a606eefc4aa65507f491a20182770b99f09
This seems to have been overlooked when the label was added to other
launchers, and is contributing to NODE_FAILURE results for some
jobs, particularly now that fedora-latest is relying on it.
Change-Id: Ifc0e5452ac0cf275463f6f1cfbe0d7fe350e3323
openEuler 20.03-LTS-SP2 was out of date as of May 2022. 22.03 LTS
is the newest LTS version; it was released in March 2022 and will
be maintained for 2 years. This patch upgrades to that LTS version.
It will be used in Devstack, Kolla-ansible and other CI jobs.
Change-Id: I23f2b397bc7f1d8c2a959e0e90f5058cf3bf104d
This distro release reached its EOL December 31, 2021. We are removing
it from our CI system as people should really stop testing on it. They
can use CentOS 8 Stream or other alternatives instead.
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/827181
Change-Id: I13e8185b7839371a9f9043b715dc39c6baf907d5
This is in preparation for removing this label. This distro is no longer
supported and users will need to find alternatives.
Change-Id: I57b363671809afe415a376b0894041438140bdae
This removes the label, nodes, and images for opensuse-tumbleweed across
our cloud providers. We also update grafana to stop graphing stats for
the label.
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/824068
Change-Id: Ic311af5d667c01c1845251270fd2fdda7d99ebcb
This is in preparation to remove the image and label entirely. Nothing
seems to use the image, so clean it up.
Change-Id: I5ab3a0627874e302289deb442f80a782509df2c3
CentOS Stream 9 repos are ready and support for it is included in
diskimage-builder [1]. This patch adds centos-9-stream diskimages
and images to the nodepool configuration in opendev.
[1] https://review.opendev.org/c/openstack/diskimage-builder/+/811392
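A sketch of what such an addition looks like in nodepool's config;
the element list, env-var, and min-ready value below are illustrative
rather than the exact opendev change:

```yaml
# Illustrative fragment for a new diskimage and its label.
diskimages:
  - name: centos-9-stream
    elements:
      - centos
      - vm
    env-vars:
      # diskimage-builder's centos element selects the release here.
      DIB_RELEASE: 9-stream
labels:
  - name: centos-9-stream
    min-ready: 1
```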
Change-Id: I9b88baf422d947d5209d036766a86b09dca0c21a
This is a followup change to the previous change which removed fedora-32
node launches. Here we clean up the disk image configs.
Change-Id: I459ec47735550e4adcc912bd707836582223b075
Since Debian Buster cannot be used with nova 23.0.0 because of the
minimum required libvirt version, we should make Bullseye available
for CI to ensure that the OpenStack Wallaby release will run on it
smoothly.
Depends-On: https://review.opendev.org/c/openstack/diskimage-builder/+/783790
Change-Id: I9c1bb7aaa02ba60ee52e2d7b990e2e6e1212317f
This is a followup change to the previous change which removed fedora-31
node launches. Here we clean up the disk image configs.
Change-Id: Ic72e324c65ee9d18e9c4cf6627ea6c147b9f484b
A followup change will remove the diskimage configuration for fedora-31.
The current fedora releases are 32 and 33 so we should clean up 31.
Change-Id: I0dde34ab005f48ac521d91e407ac437d3cec965f
Our system-config roles for nodepool update the zookeeper configuration
for our nodepool hosts. The content in the files here is merely a
placeholder. Make that more apparent via the addition of comments and
use of dummy data.
Change-Id: I4e35088a04f6393409963f841f2e9ba174c69598
This flips the active control of the rax providers from
nl01.openstack.org to nl01.opendev.org. This change should only be
landed once we are happy with the deployment of a new nl01.opendev.org
instance.
Change-Id: Idc838c3cb2d631ef684f733b564f7d4713fc3a41
This removes min-ready settings for labels on nl01.opendev.org to
prevent it from generating extra min-ready requests. We also set
max-servers to 0 on its providers so that it essentially idles.
This is done to ensure the new server can be deployed properly. A
follow-on change will flip nl01.openstack.org to idle and set
min-ready and max-servers values on nl01.opendev.org to put it into
use. The old server can be removed once it idles.
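Idling a launcher this way amounts to zeroing two values in its
config; a sketch with placeholder label and provider names:

```yaml
# Illustrative fragment; names are placeholders.
labels:
  - name: example-label
    # No pre-booted ready nodes requested by this launcher.
    min-ready: 0
providers:
  - name: example-provider
    pools:
      - name: main
        # The launcher accepts no work, so it effectively idles.
        max-servers: 0
```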
Change-Id: Icb13ce9153e3d627b187ec8722539f20db51e266
This is prep work for eventually deploying an nl01.opendev.org server.
The first thing this allows us to do is find and deploy the right
config in testing. Once nl01.openstack.org has been completely replaced
we can remove its config from this repo.
Change-Id: Ib269cdc2bc6b9f96ba1e7a07a594267f49cbfcd5