This removes the fedora image builds from nodepool. At this point
nodepool should no longer have any knowledge of fedora.
There is potential for further cleanup of things like dib elements, but
leaving those in place doesn't hurt much.
This will stop providing the node label entirely and should result in
nodepool cleaning up the existing uploads of these images in our cloud
providers. It does not remove the diskimages for fedora, which will be
cleaned up in a followup change.
Looking at our graphs, we're still spiking up into the 30-60
concurrent building range at times, which seems to result in some
launches exceeding the already lengthy timeout and wasting quota,
but when things do manage to boot we effectively utilize most of
max-servers nicely. The variability is because max-concurrency is
the maximum number of in-flight node requests the launcher will
accept for a provider, but the number of nodes in a request can be
quite large sometimes.
Raise max-servers back to its earlier value reflecting our available
quota in this provider, but halve the max-concurrency so we don't
try to boot so many at a time.
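A minimal sketch of the two knobs involved, as they appear in a
nodepool launcher config for an OpenStack provider (the provider name
and exact values here are illustrative, not taken from this change):

    providers:
      - name: example-provider      # hypothetical provider name
        # Maximum number of node requests this launcher will handle
        # in parallel for this provider. A single request can contain
        # many nodes, so concurrent builds can still spike well above
        # this number.
        max-concurrency: 25         # illustrative: halved from 50
        pools:
          - name: main
            # Reflects the quota available to us in this provider.
            max-servers: 100        # illustrative

Because max-concurrency caps requests rather than nodes, halving it
reduces, but does not strictly bound, the number of simultaneous boots.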
This region seems to take a very long time to launch nodes when we
have a burst of requests for them, like a thundering herd sort of
behavior causing launch times to increase substantially. We have a
lot of capacity in this region though, so we want to boot as many
instances as we can here. Attempt to reduce the effect by limiting
the number of instances nodepool will launch at the same time.
Also, mitigate the higher timeout for this provider by not retrying
launch failures, so that we won't ever lock a request for multiples
of the timeout.
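A hedged sketch of how that combination might look in the launcher
config (the provider name and numbers are illustrative; check
nodepool's documentation for the exact launch-retries semantics):

    providers:
      - name: example-region        # hypothetical provider name
        # Limit simultaneous launches to soften the thundering herd.
        max-concurrency: 50         # illustrative value
        # Avoid retrying failed launches so a node request is never
        # locked for multiples of the long launch timeout.
        launch-retries: 1           # illustrative; nodepool's default is 3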
We're still seeing a lot of timeouts waiting for instances to become
active in this provider, and are observing fairly long delays
between API calls at times. Increase the launch wait from 10 to 15
minutes, and increase the minimum delay between API calls by an
order of magnitude from 0.001 to 0.01 seconds.
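Assuming these map to the launch-timeout and rate settings of
nodepool's OpenStack driver, a minimal sketch (provider name is a
placeholder):

    providers:
      - name: example-region        # hypothetical provider name
        # Wait up to 15 minutes (was 10) for instances to become active.
        launch-timeout: 900
        # Minimum delay between API calls, in seconds (was 0.001).
        rate: 0.01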
Reduce the max-servers in rax-ord from 195 to 100, and revert the
boot-timeout from the 300 seconds we tried back down to 120 like the
other providers.
We're continuing to see server create calls taking longer to report
active than nodepool is willing to wait, but may also be witnessing
the results of API rate limiting or systemic slowness. Reducing the
number of instances we attempt to boot there may give us a clearer
picture of whether that's the case.
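A sketch of the resulting rax-ord stanza (layout simplified; other
settings omitted):

    providers:
      - name: rax-ord
        boot-timeout: 120     # reverted from the 300 we tried
        pools:
          - name: main
            max-servers: 100  # reduced from 195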
For a while we've been seeing a lot of "Timeout waiting for instance
creation" in Rackspace's ORD region, but checking behind the
launcher it appears these instances do eventually boot, so we're
wasting significant resources discarding quota we never use.
Increase the timeout for this from 2 minutes to 5, but only in this
region as 2 minutes appears to be sufficient in the others.
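The per-region override amounts to something like this (other regions
keep their existing value):

    providers:
      - name: rax-ord
        # 5 minutes here; 2 minutes remains sufficient elsewhere.
        boot-timeout: 300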
This seems to have been overlooked when the label was added to other
launchers, and is contributing to NODE_FAILURE results for some
jobs, particularly now that fedora-latest is relying on it.
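For context, a launcher only serves a label it declares; a hedged
sketch of the missing declaration (label and image names here are
placeholders, not the actual ones from this change):

    labels:
      - name: fedora-34              # hypothetical label name
    providers:
      - name: example-provider
        pools:
          - name: main
            labels:
              - name: fedora-34      # hypothetical
                diskimage: fedora-34
                min-ram: 8000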
openEuler 20.03-LTS-SP2 went out of support in May 2022. 22.03 LTS
is the newest LTS version; it was released in March 2022 and will be
maintained for two years. This patch upgrades to that LTS version,
which will be used in Devstack, Kolla-Ansible, and other CI jobs.
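A hedged sketch of what the version bump could look like in the
nodepool config (the label/image names and release variable are
assumptions, not taken from the actual change):

    labels:
      - name: openeuler-22-03-lts    # hypothetical, replacing
                                     # openeuler-20-03-lts-sp2
    diskimages:
      - name: openeuler-22-03-lts    # hypothetical image name
        env-vars:
          DIB_RELEASE: 22.03-LTS     # assumed release identifier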
This distro release reached its EOL December 31, 2021. We are removing
it from our CI system as people should really stop testing on it. They
can use CentOS 8 Stream or other alternatives instead.
This removes the label, nodes, and images for opensuse-tumbleweed across
our cloud providers. We also update grafana to stop graphing stats for
this label.
Since Debian Buster cannot be used with nova 23.0.0 because of the
minimum required libvirt version, we should make Bullseye available for
CI to ensure that the OpenStack Wallaby release will run on it smoothly.
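A minimal sketch of adding such an image (the element and variable
choices are assumptions based on diskimage-builder's Debian support):

    diskimages:
      - name: debian-bullseye
        elements:
          - debian-minimal           # assumed dib element
        env-vars:
          DIB_RELEASE: bullseye      # assumed release selector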
Our system-config roles for nodepool update the zookeeper configuration
for our nodepool hosts. The content in the files here is merely a
placeholder. Make that more apparent via the addition of comments and
use of dummy data.
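A sketch of the placeholder content described above (dummy values,
clearly not a real ZooKeeper cluster):

    zookeeper-servers:
      # Placeholder only: real values are written by our system-config
      # roles during deployment.
      - host: zk.example.com
        port: 2181
        chroot: /nodepool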
This flips the active control of the rax providers from
nl01.openstack.org to nl01.opendev.org. This change should only be
landed once we are happy with the deployment of the new
nl01.opendev.org server.
This removes min ready settings for labels on nl01.opendev.org to
prevent it from generating extra min ready requests. We also set
max-servers to 0 on its providers so that it essentially idles.
This is done to ensure the new server can be deployed properly. A
follow-on change will flip nl01.openstack.org to idle and set min-ready
and max-servers values on nl01.opendev.org to put it into use.
The old server can be removed once it idles.
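A hedged sketch of the idle configuration (label and provider names
are placeholders):

    labels:
      # min-ready removed, so no min-ready requests are generated.
      - name: example-label
    providers:
      - name: example-provider
        pools:
          - name: main
            max-servers: 0    # launcher accepts no new nodes here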
This is prep work for eventually deploying an nl01.opendev.org server.
The first thing this allows us to do is find and deploy the right
config in testing. Once nl01.openstack.org has been completely replaced
we can remove its config from this repo.