Commit Graph

1951 Commits

Author SHA1 Message Date
Clark Boylan
97d35a8dd5 Set xenial min ready to 0
We are trying to phase out this node type. We don't need to have a ready
node sitting around at all times for it.

Change-Id: I74da8de9b9776f2f33e921f3566e5f1c134be88d
2024-07-23 17:27:39 -07:00
Clark Boylan
3ff9c96e3b Reduce centos-8-stream min ready to 0
In preparation for centos-8-stream cleanup we want to ensure we are not
going to automatically boot more nodes that we need to clean up.
Followup changes will more completely remove the node from nodepool.

Change-Id: I4ea6b7ab449124325cf22129663f86ef7117a5b9
2024-07-23 13:48:39 -07:00
Dr. Jens Harbott
0b35d2564e Bump max-servers for openmetal cloud to maximum
50 servers should be maximum capacity for this cloud, full steam ahead.

Change-Id: I030f56054cafa402afc2bb2d0148bce119da4a2b
2024-06-20 09:00:14 +02:00
Jeremy Stanley
f36b8cf6a9 Use the "nova" AZ for Nodepool in openmetal-iad3
Address spurious "no valid host" errors for nodes booted in
openmetal-iad3 by specifying the availability zone we intend
Nodepool to use. There is another one which is reserved for our
control plane project/mirror server, and it doesn't allow resource
creation by Nodepool's project.

Change-Id: I3507f5b4557d64c7ebe779743ef397cb4e846cfa
2024-06-19 17:25:54 +00:00
Dr. Jens Harbott
527a148bcd Enable openmetal cloud for nodepool
The new openmetal cloud should be ready for production, but let's be
cautious and start with just a single server for now.

Change-Id: I7252dc394b1c2f41991278eacf3b7eb32b886dcd
2024-06-19 18:25:28 +02:00
Dr. Jens Harbott
03e9785b84 nodepool: reorder providers to list rax-dfw last
There are currently some image upload issues for the rax-dfw region.
Since there is an issue in nodepool that causes uploads to other
providers to be blocked by this (see [0]), move the rax providers to
the end of the list, so all other providers will be updated before
hitting the issue.

[0] https://review.opendev.org/c/zuul/nodepool/+/922242

Change-Id: Ib8bec7ea2afd5b4569bddb9a4feea6bb07a280f3
2024-06-19 11:28:35 +02:00
Dr. Jens Harbott
ea72a45075 nodepool: revert dedicated diskimages list for rax-iad
When we had some issues with image uploads to rax-iad last year, we
needed to configure it with a dedicated list of diskimages in order to
be able to add "pause: true" statements to those. When the issue was
resolved, we only dropped the pause statements instead of reverting to
use the original list shared across all providers [0]. This is getting
fixed now.

[0] Ia2b737e197483f9080b719bab0ca23461850e157

Change-Id: Ibc1a11d04d5258ca503f8cd5b5fe7a30eb99bc56
2024-06-19 11:27:51 +02:00
Dr. Jens Harbott
2d5825ff33 Fix image uploads for openmetal
The images for centos-8-stream and gentoo are no longer buildable and
nodepool deleted the old "raw" versions of them in order to save space, so
these currently cannot be uploaded to the new openmetal cloud.

Add a reduced list of images as a temporary workaround; this can be
reverted once the old images have been cleaned out globally.

Also add an explicit "pause: true" stanza to the gentoo image build
which so far has only been paused manually.

Change-Id: I4c67940cb0ff9ead6ae2274e00e6f48e7cbcce89
2024-06-18 10:01:50 +02:00
Tony Breeds
20a0a5707f nodepool: Switch "common job platform" from bionic to jammy
Bionic isn't that common anymore so switch the min-ready to Jammy

Change-Id: I66f85c5b462bcae91f14195214194714aca13618
2024-06-18 11:50:29 +10:00
Jeremy Stanley
ffecbaca06 Correct OpenMetal region from iad3 to IAD3
Brown bag fix for mis-cased region name in the Nodepool config.

Change-Id: Ibc4ceef42afc5c0d2ba5c648408cde1843fbc814
2024-06-13 23:08:02 +00:00
Jeremy Stanley
28ae631642 Add OpenMetal to Nodepool and Grafana
This is essentially reverting commits bd15ddc and cb4b99b which were
the final stages of winding down and cleaning up the old InMotion
cloud which OpenMetal has replaced, with the cloud name updated (but
region kept the same) and grafyaml data regenerated. It stops short
of actually booting nodes in the new environment until we have a
chance to spot check things once images get uploaded.

Since this is re-adding diskimages back to nl02, I refrained from
including centos-8-stream which is in the process of being removed,
so that we don't unnecessarily upload images we're not planning to
boot.

Change-Id: If8e9b7105b4c7a13e87ebb4f6c985e821c30a842
2024-06-13 20:34:06 +00:00
Clark Boylan
cea490f471 Pause centos-8-stream image builds
CentOS 8 Stream is EOL and they cleaned up the repos for this release
which our mirrors happily mirrored. This means the nodes are no longer
functional without significant intervention. Instead of fixing things we
start the process of removal by stopping the image builds.

Change-Id: I5642a8dd3d539563caeba7a09cb86fe19769b38e
2024-06-11 13:06:41 -07:00
Clark Boylan
bd15ddc388 Remove the inmotion cloud entirely
This is the last step in inmotion cloud cleanup. It does leave nl02 as a
nodepool launcher with no active providers. I suspect this is fine and
we'll add the new OpenMetal cloud to nl02 at some point in the future.

The grafana graphs will also need to be manually deleted at some point
as removing the yaml file doesn't remove the dashboard from grafana.

Change-Id: Ib33e0c45c277f77013fe5820b898df03da58b558
2024-06-03 13:20:08 -07:00
Clark Boylan
cb4b99b3ad Clean up diskimages from inmotion cloud
Step two in removing the old inmotion cloud is deleting the old images
from the cloud so that Nodepool isn't trying to clean things up after
the cloud has gone away. This should hopefully avoid the need for manual
db cleanup.

Change-Id: Ia1451df866a7c91a652a4d9083e6c78cbfde678e
2024-06-03 13:18:26 -07:00
Clark Boylan
220c582d57 Set inmotion max servers to 0
This is the first step in winding down this cloud. We plan to redeploy
the cloud on top of an entirely new platform and openstack version. We
can't do that in a new cloud adjacent to the existing one; instead we
must tear down the existing cloud and redeploy on the same hardware.
Wind it down gracefully in Nodepool first.

Change-Id: Icdaca06b2a2737f74daf60cb29674f06160c7faf
2024-06-03 13:15:51 -07:00
Tony Breeds
5b7316cff8 Switch nodepool over to the latest infra-root keyfile
Change-Id: If745d190d6a5586fbf23815b10b8411af3993828
2024-05-31 12:57:50 -05:00
Clark Boylan
eceb8690f6 Chown the /opt/git repo cache to zuul:zuul
Latest git packages on Ubuntu (and possibly other locations in the
future) don't allow locally cloning repos owned by a different user by
default. Attempting to do so results in this error:

  fatal: detected dubious ownership in repository at '/opt/git/opendev.org/foo/bar/.git'
  To add an exception for this directory, call:

      git config --global --add safe.directory /opt/git/opendev.org/foo/bar/.git
  fatal: Could not read from remote repository.

  Please make sure you have the correct access rights
  and the repository exists.

Currently the /opt/git repos are owned by root:root. We expect that
zuul will be the most common user to interact with these cached repos so
we chown to zuul:zuul in order to avoid these problems as much as
possible. Any cases not using zuul will have to determine a path forward
for those special circumstances.

Change-Id: I7cb21869bae42baed5027a9380f60762ab8944e0
2024-05-29 14:35:55 -07:00
Zuul
3336b1f7ae Merge "Retire devstack-gate"
2024-05-28 21:52:41 +00:00
Zuul
329d5851b5 Merge "Clean up unused labels from nl02 config"
2024-05-23 04:23:54 +00:00
Jeremy Stanley
74db99801a Fix unbound setup for ubuntu-noble
dns-root-data has been demoted to a "Recommends" dependency of
unbound, which we don't install. Sadly the default unbound
configuration is broken without it.

Change-Id: Ie285d1c058c4ad7c6579f25cf24884d8e396e1dc
2024-05-22 19:35:36 +00:00
Jeremy Stanley
1d497b0c7d Clean up unused labels from nl02 config
The provider-specific label variants designating nested virt
acceleration support or larger flavors are unused by nl02, so delete
them to reduce confusion.

Change-Id: Id3ac994216624e32d83ae9066d3e77f713cc7245
2024-05-22 15:33:34 +00:00
Jeremy Stanley
059f2785e5 Add Ubuntu 24.04 LTS (ubuntu-noble) nodes
Build images and boot ubuntu-noble everywhere we do for
ubuntu-jammy. Drop the kernel boot parameter override we use on
Jammy since it's default in the kernel versions included in Noble
now.

Change-Id: I3b9d01a111e66290cae16f7f4f58ba0c6f2cacd8
2024-05-21 19:37:55 +00:00
Clark Boylan
82e7504c36 Retire devstack-gate
This removes devstack-gate from the CI system and updates ACLs for the
repo in Gerrit to the openstack retired repo config. There are also
cleanups to nodepool elements and irc bots to remove references to
devstack-gate where we don't need them anymore.

Depends-On: https://review.opendev.org/c/openstack/governance/+/919629
Change-Id: I50ce4e5aa7001ba52bea78a65855278be68e61a5
2024-05-14 15:16:58 -07:00
Zuul
faeda6e404 Merge "Support Ubuntu 24.04 in nodepool elements"
2024-04-23 19:21:57 +00:00
Dr. Jens Harbott
f0d909d7c3 Support Ubuntu 24.04 in nodepool elements
Extend all the tweaks that we have for Ubuntu 22.04 to also apply to the
next LTS release.

Change-Id: Id62d39ba4b2af5f5ffd395b97a5187f5082bd4b0
2024-04-17 17:58:28 +00:00
Zuul
371ec90145 Merge "Add warning to nodepool configs about changing cloud name"
2024-04-17 12:29:38 +00:00
Clark Boylan
8c088d17f1 Enable nodepool delete after upload option
This enables the nodepool delete-after-upload option with keep-formats
set to qcow2 on x86 image builders. This should clear out vhd and raw
files after uploads for those formats are completed keeping only qcow2
longer term. This should reduce disk space overhead while still enabling
us to convert from qcow2 to the other formats if that becomes necessary.

Note that we do not enable this for arm64 because arm64 builders
currently build raw images only and we still want at least one copy of
the image to be kept even if it is raw (and not qcow2).

Change-Id: I6cf481e0f9a5eaff35b5d961a084ae34a49ea6c6
2024-03-26 15:10:36 -07:00
Clark Boylan
aabaf95b49 Remove centos-7 nodepool image builds
This is the last step in cleaning centos-7 out of nodepool. The previous
change will have cleaned up uploads and now we can stop building the
images entirely.

Change-Id: Ie81d6d516cd6cd42ae9797025a39521ceede7b71
2024-03-13 08:30:16 -07:00
Clark Boylan
b8c53b9c03 Remove centos-7 image uploads from Nodepool
This removal of centos-7 image uploads should cause Nodepool to clean up
the existing images in the clouds. Once that is done we can completely
remove the image builds in a followup change.

We are performing this cleanup because CentOS 7 is near its EOL and
cleaning it up will create room on nodepool builders and our mirrors for
other more modern test platforms.

Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/912786
Change-Id: I48f6845bc7c97e0a8feb75fc0d540bdbe067e769
2024-03-13 08:21:46 -07:00
Clark Boylan
774ad69f33 Add warning to nodepool configs about changing cloud name
The cloud name is used to lookup cloud credentials in clouds.yaml, but
it is also used to determine names for things like mirrors within jobs.
As a result changing this value can impact running jobs as you need to
update DNS for mirrors (and possibly launch new mirrors) first. Add a
warning to help avoid problems like this in the future.

Change-Id: I9854ad47553370e6cc9ede843be3303dfa1f9f34
2024-03-07 11:28:17 -08:00
James E. Blair
f5c200181a Revert "Try switching Rackspace DFW to an API key"
This reverts commit eca3bde9cb.

This was successful, but we want to make the change without altering
the cloud name.  So switch this back, and separately we will update
the config of the rax cloud.

Change-Id: I8cdbd7777a2da866e54ef9210aff2f913a7a0211
2024-03-07 08:46:25 -08:00
Jeremy Stanley
eca3bde9cb Try switching Rackspace DFW to an API key
Switch the Rackspace region with the smallest quota to uploading
images and booting server instances with our account's API key
instead of its password, in preparation for their MFA transition. If
this works as expected, we'll make a similar switch for the
remaining two regions.

Change-Id: I97887063c735c96d200ce2cbd8950bbec0ef7240
Depends-On: https://review.opendev.org/911164
2024-03-06 15:06:34 +00:00
Clark Boylan
56c5fefcf6 CentOS 7 removal prep changes
This drops min-ready for centos-7 to 0 and removes use of some centos 7
jobs from puppet-midonet. We will clean up those removed jobs in a
followup change to openstack-zuul-jobs.

We also remove x/collected-openstack-plugins from zuul. This repo uses
centos 7 nodesets that we want to clean up and it last merged a change
in 2019. That change was written by the infra team as part of global
cleanups. I think we can remove it from zuul for now and if interest
restarts it can be added and fixed up.

Change-Id: I06f8b0243d2083aacb44fe12c0c850991ce3ef63
2024-03-04 10:25:58 -08:00
Clark Boylan
c41bc6e5c2 Remove debian-buster image builds from nodepool
This should be landed after the parent change has landed and nodepool
has successfully deleted all debian-buster image uploads from our cloud
providers. At this point it should be safe to remove the image builds
entirely.

Change-Id: I7fae65204ca825665c2e168f85d3630686d0cc75
2024-02-23 13:23:22 -08:00
Clark Boylan
feff36e424 Drop debian-buster image uploads from nodepool
Debian buster has been replaced by bullseye and bookworm, both of which
are releases we have images for. It is time to remove the unused debian
buster images as a result.

This change follows the process in nodepool docs for removing a provider
[0] (which isn't quite what we are doing) to properly remove images so
that they can be deleted by nodepool before we remove nodepool's
knowledge of them. The followup change will remove the image builds from
nodepool.

[0] https://zuul-ci.org/docs/nodepool/latest/operation.html#removing-a-provider

Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/910015
Change-Id: I37cb3779944ff9eb1b774ecaf6df3c6929596155
2024-02-23 13:19:49 -08:00
Clark Boylan
8eb9cb661e Set debian-buster min servers to 0
This is in preparation for the removal of this distro release from
Nodepool. Setting this value to 0 will prevent nodepool from automatically
booting new nodes under this label if we cleanup any existing nodes.

Change-Id: I90b6c84a92a0ebc4f40ac3a632667c8338d477f1
2024-02-23 08:41:20 -08:00
Clark Boylan
211fe14946 Remove opensuse-15 image builds from nodepool
This should be landed after the parent change has landed and nodepool
has successfully deleted all opensuse-15 image uploads from our cloud
providers. At this point it should be safe to remove the image builds
entirely.

Change-Id: Icc870ce04b0f0b26df673f85dd6380234979906f
2024-02-22 10:27:37 -08:00
Clark Boylan
5635e67866 Drop opensuse image uploads from nodepool
These images are old opensuse 15.2 and there doesn't seem to be interest
in keeping these images running (very few jobs ever ran on them and
rarely successfully and no one is trying to update to 15.5 or 15.6).

This change follows the process in nodepool docs for removing a provider
[0] (which isn't quite what we are doing) to properly remove images so
that they can be deleted by nodepool before we remove nodepool's
knowledge of them. The followup change will remove the image builds from
nodepool.

[0] https://zuul-ci.org/docs/nodepool/latest/operation.html#removing-a-provider

Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/909773
Change-Id: Id9373762ed5de5c7c5131811cec989c2e6e51910
2024-02-22 10:25:15 -08:00
Clark Boylan
b8b984e5b6 Set opensuse-15 min-ready to 0
This is in preparation for the followup changes that will drop opensuse
nodes and images entirely. We set min-ready to 0 first so that we can
manually delete any running nodes before cleaning things up further.

Change-Id: I6cae355fd99dd90b5e48f804ca0d63b641c5da11
2024-02-21 09:32:56 -08:00
Jeremy Stanley
cedfb950de Temporarily lower max-servers for linaro
The launcher is seeing "Error in creating the server. Compute
service reports fault: No valid host was found." from Linaro's
cloud, leading to NODE_FAILURE results for many jobs when our other
ARM-based node provider is running at quota. According to graphs,
we've been able to sustain 16 nodes in-use in this cloud, so
temporarily cap max-servers at that in order to avoid further
failures until someone has a chance to look into what has broken
there.

Change-Id: I3f79e9cc70e848b9ebc6728205f806693209dfd5
2023-12-14 16:33:20 +00:00
Michal Nasiadka
4ba928c675 Add nested-virt-debian-bookworm
Change-Id: I17a202cc82ff19a788fde7b34415542c1b354fae
2023-10-04 14:47:23 +02:00
Clark Boylan
4a3c87dbcd Set a six hour nodepool image upload timeout
This was the old timeout; then some refactoring happened and we ended up
with the openstacksdk timeout of one hour. Since then Nodepool added the
ability to configure the timeout so we set it back to the original six
hour value.

Change-Id: I29d0fa9d0077bd8e95f68f74143b2d18dc62014b
2023-09-15 12:57:25 -07:00
Clark Boylan
3b9c5d2f07 Remove fedora image builds
This removes the fedora image builds from nodepool. At this point
Nodepool should no longer have any knowledge of fedora.

There is potential for other cleanups for things like dib elements, but
leaving those in place doesn't hurt much.

Change-Id: I3e6984bc060e9d21f7ad851f3a64db8bb555b38a
2023-09-06 09:16:34 -07:00
Clark Boylan
d83736575e Remove fedora-35 and fedora-36 from nodepool providers
This will stop providing the node label entirely and should result in
nodepool cleaning up the existing uploads of these images in our cloud
providers. It does not remove the diskimages for fedora which will
happen next.

Change-Id: Ic1361ff4e159509103a6436c88c9f3b5ca447777
2023-09-06 09:12:33 -07:00
Clark Boylan
8d32d45da2 Set fedora labels min-ready to 0
In preparation for fedora node label removal we set min-ready to 0. This
is the first step to removing the images entirely.

Change-Id: I8c2a91cc43a0dbc633857a2733d66dc935ce32fa
2023-09-06 09:07:13 -07:00
Jeremy Stanley
16ddb49e48 Drop libvirt-python from suse in bindep fallback
The bindep fallback list includes a libvirt-python package for all
RPM-based distros, but it appears that OpenSuse Leap has recently
dropped this (likely as part of removing Python 2.7 related
packages). Exclude the package on that platform so that the
opensuse-15 job will stop failing.

Change-Id: I0bb7d9b7b34f4f6c392374182538b7e433617e13
2023-09-06 15:15:03 +00:00
Dr. Jens Harbott
d0c0ddb977 Reduce frequency of image rebuilds
In order to reduce the load on our builder nodes and reduce the strain
on our providers' image stores, build most images only once per week.

Exceptions are ubuntu-jammy, our most often used distro image, which we
keep rebuilding daily, and some other more frequently used images built
every 2 days.

Change-Id: Ibba7f864b15e478fda59c998843c3b2ace0022d8
2023-09-02 13:18:19 +02:00
Dr. Jens Harbott
407f859232 Unpause image uploads for rax-iad part 2
Enable uploads for all images again for rax-iad. We have configured the
nodepool-builders to run with only 1 upload thread, so we will have at
most two parallel uploads (one per builder).

Change-Id: Ia2b737e197483f9080b719bab0ca23461850e157
2023-08-30 21:07:27 +02:00
Dr. Jens Harbott
c8b1b1c3b6 Unpause image upload for rax-iad part 1
This is a partial revert of d50921e66b.

We want to slowly re-enable image uploads for rax-iad, start with a
single image, choosing the one that is getting used most often.

Change-Id: I0816f7da73e66085fe6c52372531477e140cfb76
Depends-On: https://review.opendev.org/892056
2023-08-19 20:12:40 +00:00
Dr. Jens Harbott
d50921e66b Revert "Revert "Temporarily pause image uploads to rax-iad""
This reverts commit 27a3da2e53.

Reason for revert: Uploads are still not working properly

Change-Id: I2a75dd9ff0731a4113a362f9f17f510a9a236ebb
2023-08-10 07:24:07 +00:00