The nodepool openstack provider implementation has a default launch
timeout of 3600 seconds (one hour). This is problematic for us because
a node request will be attempted three times by a provider before being
passed on to the next provider. This means we may wait up to three hours
per failed provider to launch a single node.
Fix this by setting a timeout of 10 minutes on the three providers that
didn't already set a lower timeout (raxflex and the two vexxhost
providers). Looking at Grafana graphs for time to ready, this should be
plenty of time under normal booting conditions and reduces the time we
wait from 3 hours per provider to 30 minutes.
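As a rough sketch of the knob involved (the provider name and
surrounding keys here are illustrative, not the exact production
entries):

  providers:
    - name: raxflex-sjc3
      driver: openstack
      # override the driver's 3600 second default so stuck boots fail fast
      launch-timeout: 600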
Change-Id: I0a4af9f5519ff2b64c737b1822590fbb2608e8bb
The openmetal provider nodes have Intel VMX flags set and raxflex
provider nodes have AMD SVM flags set. Both should be capable of nested
virt (assuming nested virt works at all), so let's add the nested-virt
labels to these clouds.
In the openmetal case we have the ability to directly gather debugging
info ourselves, and in the raxflex case we know who to contact when
things go wrong.
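For illustration, adding such a label to a pool looks roughly like the
following (label and flavor names are placeholders):

  pools:
    - name: main
      labels:
        - name: ubuntu-noble-nested-virt
          diskimage: ubuntu-noble
          flavor-name: example-8GB-flavor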
Change-Id: Icc7c9cbafaef93f3ccec7010c82af1d36e02533c
Under heavy load, we're occasionally seeing ssh-keyscan time out
with no reply after a minute. Double the time Nodepool is willing to
wait for SSH to connect, in the hope that this reduces these occurrences.
Change-Id: Icfc3cc4a29854455684d88be41ba3e70e8507f3a
Looking at `openstack limits show --absolute` we seem to currently
have sufficient memory quota to take this up to 32x 8GB RAM
instances (quotas for CPU and total instances are higher so RAM is
the limiting factor at the moment).
Things seem to be going fine at 20 nodes for the past few hours, so
let's go ahead and dial it up to maximum for the weekend and see how
it works out before we let our Rackspace contacts know we're ready
for additional quota.
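For reference, the arithmetic and the pool setting being raised (pool
name is an assumption):

  pools:
    - name: main
      # 32 instances x 8GB RAM = 256GB, which fits in the current quota
      max-servers: 32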
Change-Id: I1d61da4ca2a1a5963e0cdcff99fdf4f41aea5920
The opendevzuul-network01 network should route to the global
Internet via NAT, so make that clear to Nodepool.
Change-Id: I8e0038fe82b0fc968d80ab656808b7744c160132
Our expected Nodepool images are present in the raxflex-sjc3
provider now, so raise max-servers to 1 in order to get some sample
builds we can analyze for obvious issues like connectivity.
Depend on the subnet resizing change, since that might otherwise be
disruptive if it were deployed while jobs are running there.
Depends-On: https://review.opendev.org/927813
Change-Id: If8b39b53608188e5881a273e6d092981a5871e84
Put the raxflex-sjc3 provider in our Nodepool configuration, but
with booting disabled by zeroing max-servers so we can make sure
image uploads are working first.
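The pattern is roughly the following (cloud, region, image, and pool
details here are illustrative, not the full production entry):

  providers:
    - name: raxflex-sjc3
      driver: openstack
      cloud: raxflex
      region-name: SJC3
      diskimages:
        - name: ubuntu-jammy
      pools:
        - name: main
          max-servers: 0  # image uploads proceed, but nothing boots yet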
Change-Id: Id3d2e73b73c35af52dbc13579e773329e4f9ad68
Now that noble is our default nodeset, let's split the min-ready value
of 10 nodes for jammy in half between jammy and noble. When jammy
becomes less common we'll set noble to the full 10.
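In the top-level labels section that amounts to something like this
(entries trimmed to the relevant keys):

  labels:
    - name: ubuntu-jammy
      min-ready: 5
    - name: ubuntu-noble
      min-ready: 5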
Change-Id: I1e3b83c18234e1d15a8eb76a63989b29d4925908
This should be the last piece of cleanup to remove CentOS 8 Stream from
Nodepool. We basically tell the builders to forget about these images,
which should result in cleanup of all the related records on disk and
in ZooKeeper.
Change-Id: I71421ce9a10438549ef21441349be84b5d7bd38b
This will stop nodepool from trying to manage centos-8-stream images in
our cloud providers. It will also remove the label from nodepool and
zuul entirely, making this node type unusable. This will produce
NODE_FAILURE errors but jobs were already failing 100% of the time due
to the lack of valid package mirrors.
The followup change will stop our image builds (though they are paused
now) which will clean up the images on disk on our builders.
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/922653
Change-Id: I1bdb2441a7b8ce5c651e5e865005f80828cd6f47
We are trying to phase out this node type. We don't need to have a ready
node sitting around at all times for it.
Change-Id: I74da8de9b9776f2f33e921f3566e5f1c134be88d
In preparation for centos-8-stream cleanup we want to ensure we are not
going to automatically boot more nodes that we will then need to clean up.
Followup changes will more completely remove the node from nodepool.
Change-Id: I4ea6b7ab449124325cf22129663f86ef7117a5b9
Build images and boot ubuntu-noble everywhere we do for
ubuntu-jammy. Drop the kernel boot parameter override we use on
Jammy since it's the default in the kernel versions included in Noble
now.
Change-Id: I3b9d01a111e66290cae16f7f4f58ba0c6f2cacd8
This is the last step in cleaning centos-7 out of nodepool. The previous
change will have cleaned up uploads and now we can stop building the
images entirely.
Change-Id: Ie81d6d516cd6cd42ae9797025a39521ceede7b71
This removal of centos-7 image uploads should cause Nodepool to clean up
the existing images in the clouds. Once that is done we can completely
remove the image builds in a followup change.
We are performing this cleanup because CentOS 7 is near its EOL and
cleaning it up will create room on nodepool builders and our mirrors for
other more modern test platforms.
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/912786
Change-Id: I48f6845bc7c97e0a8feb75fc0d540bdbe067e769
The cloud name is used to look up cloud credentials in clouds.yaml, but
it is also used to determine names for things like mirrors within jobs.
As a result, changing this value can impact running jobs, since you need
to update DNS for mirrors (and possibly launch new mirrors) first. Add a
warning to help avoid problems like this in the future.
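The warning is just a comment next to the value in question, roughly
(provider and cloud names here are placeholders):

  providers:
    - name: example-provider
      # WARNING: this cloud name is also used to derive mirror hostnames
      # in jobs; update mirror DNS (and possibly launch new mirrors)
      # before changing it.
      cloud: example-cloud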
Change-Id: I9854ad47553370e6cc9ede843be3303dfa1f9f34
This reverts commit eca3bde9cbba1b680f4f813a421ceb2d5803cf96.
This was successful, but we want to make the change without altering
the cloud name. So switch this back, and separately we will update
the config of the rax cloud.
Change-Id: I8cdbd7777a2da866e54ef9210aff2f913a7a0211
Switch the Rackspace region with the smallest quota to uploading
images and booting server instances with our account's API key
instead of its password, in preparation for their MFA transition. If
this works as expected, we'll make a similar switch for the
remaining two regions.
Change-Id: I97887063c735c96d200ce2cbd8950bbec0ef7240
Depends-On: https://review.opendev.org/911164
This drops min-ready for centos-7 to 0 and removes use of some centos 7
jobs from puppet-midonet. We will clean up those removed jobs in a
followup change to openstack-zuul-jobs.
We also remove x/collected-openstack-plugins from zuul. This repo uses
centos 7 nodesets that we want to clean up and it last merged a change
in 2019. That change was written by the infra team as part of global
cleanups. I think we can remove it from zuul for now, and if interest
returns it can be re-added and fixed up.
Change-Id: I06f8b0243d2083aacb44fe12c0c850991ce3ef63
This should be landed after the parent change has landed and nodepool
has successfully deleted all debian-buster image uploads from our cloud
providers. At this point it should be safe to remove the image builds
entirely.
Change-Id: I7fae65204ca825665c2e168f85d3630686d0cc75
Debian buster has been replaced by bullseye and bookworm, both of which
are releases we have images for. It is time to remove the unused debian
buster images as a result.
This change follows the process in nodepool docs for removing a provider
[0] (which isn't quite what we are doing) to properly remove images so
that they can be deleted by nodepool before we remove nodepool's
knowledge of them. The followup change will remove the image builds from
nodepool.
[0] https://zuul-ci.org/docs/nodepool/latest/operation.html#removing-a-provider
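In config terms the sequence is roughly as follows (provider name is a
placeholder):

  # Step 1 (this change): drop debian-buster from each provider's
  # diskimages list so Nodepool deletes its existing uploads.
  providers:
    - name: example-provider
      diskimages:
        - name: ubuntu-jammy   # other images remain listed
        # debian-buster entry removed here
  # Step 2 (followup change): once the uploads are gone, delete the
  # top-level diskimages build entry for debian-buster as well.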
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/910015
Change-Id: I37cb3779944ff9eb1b774ecaf6df3c6929596155
This is in preparation for the removal of this distro release from
Nodepool. Setting this value to 0 will prevent nodepool from
automatically booting new nodes under this label once we clean up any
existing nodes.
Change-Id: I90b6c84a92a0ebc4f40ac3a632667c8338d477f1
This should be landed after the parent change has landed and nodepool
has successfully deleted all opensuse-15 image uploads from our cloud
providers. At this point it should be safe to remove the image builds
entirely.
Change-Id: Icc870ce04b0f0b26df673f85dd6380234979906f
These images are old opensuse 15.2 and there doesn't seem to be interest
in keeping them running (very few jobs ever ran on them, rarely
successfully, and no one is trying to update to 15.5 or 15.6).
This change follows the process in nodepool docs for removing a provider
[0] (which isn't quite what we are doing) to properly remove images so
that they can be deleted by nodepool before we remove nodepool's
knowledge of them. The followup change will remove the image builds from
nodepool.
[0] https://zuul-ci.org/docs/nodepool/latest/operation.html#removing-a-provider
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/909773
Change-Id: Id9373762ed5de5c7c5131811cec989c2e6e51910
This is in preparation for the followup changes that will drop opensuse
nodes and images entirely. We set min-ready to 0 first so that we can
manually delete any running nodes before cleaning things up further.
Change-Id: I6cae355fd99dd90b5e48f804ca0d63b641c5da11
This removes the fedora image builds from nodepool. At this point
Nodepool should no longer have any knowledge of fedora.
There is potential for other cleanups for things like dib elements, but
leaving those in place doesn't hurt much.
Change-Id: I3e6984bc060e9d21f7ad851f3a64db8bb555b38a
This will stop providing the node label entirely and should result in
nodepool cleaning up the existing uploads of these images in our cloud
providers. It does not remove the diskimages for fedora; that will
happen next.
Change-Id: Ic1361ff4e159509103a6436c88c9f3b5ca447777
In preparation for fedora node label removal we set min-ready to 0. This
is the first step to removing the images entirely.
Change-Id: I8c2a91cc43a0dbc633857a2733d66dc935ce32fa
Looking at our graphs, we're still spiking up into the 30-60
concurrent building range at times, which seems to result in some
launches exceeding the already lengthy timeout and wasting quota,
but when things do manage to boot we effectively utilize most of
max-servers nicely. The variability is because max-concurrency is
the maximum number of in-flight node requests the launcher will
accept for a provider, but the number of nodes in a request can be
quite large sometimes.
Raise max-servers back to its earlier value reflecting our available
quota in this provider, but halve the max-concurrency so we don't
try to boot so many at a time.
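Sketched as config, with the provider name and the halved
max-concurrency value as assumptions:

  providers:
    - name: rax-ord
      # limit in-flight node requests, not total servers
      max-concurrency: 50
      pools:
        - name: main
          max-servers: 195  # restored to reflect available quota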
Change-Id: I683cdf92edeacd7ccf7b550c5bf906e75dfc90e8
This region seems to take a very long time to launch nodes when we
have a burst of requests for them, like a thundering herd sort of
behavior causing launch times to increase substantially. We have a
lot of capacity in this region though, so want to boot as many
instances as we can here. Attempt to reduce the effect by limiting
the number of instances nodepool will launch at the same time.
Also, mitigate the higher timeout for this provider by not retrying
launch failures, so that we won't ever lock a request for multiples
of the timeout.
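The no-retry part presumably maps to the provider-level launch-retries
option; a sketch with the provider name assumed from context:

  providers:
    - name: rax-ord
      launch-retries: 1  # single attempt, so a request is never locked
                         # for multiple launch-timeout periods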
Change-Id: I179ab22df37b2f996288820074ec69b8e0a202a5
We're still seeing a lot of timeouts waiting for instances to become
active in this provider, and are observing fairly long delays
between API calls at times. Increase the launch wait from 10 to 15
minutes, and increase the minimum delay between API calls by an
order of magnitude from 0.001 to 0.01 seconds.
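Expressed as provider settings, assuming the launch wait here is the
launch-timeout option (values in seconds):

  providers:
    - name: rax-ord
      launch-timeout: 900  # raised from 600 (10 -> 15 minutes)
      rate: 0.01           # minimum delay between API calls, up from 0.001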
Change-Id: Ib13ff03629481009a838a581d98d50accbf81de2
Reduce the max-servers in rax-ord from 195 to 100, and revert the
boot-timeout from the 300 we tried back down to 120 like the others.
We're continuing to see server create calls taking longer to report
active than nodepool is willing to wait, but also may be witnessing
the results of API rate limiting or systemic slowness. Reducing the
number of instances we attempt to boot there may give us a clearer
picture of whether that's the case.
Change-Id: Ife7035ba64b457d964c8497da0d9872e41769123
For a while we've been seeing a lot of "Timeout waiting for instance
creation" in Rackspace's ORD region, but checking behind the
launcher it appears these instances do eventually boot, so we're
wasting significant resources discarding quota we never use.
Increase the timeout for this from 2 minutes to 5, but only in this
region as 2 minutes appears to be sufficient in the others.
Change-Id: I1cf91a606eefc4aa65507f491a20182770b99f09
This seems to have been overlooked when the label was added to other
launchers, and is contributing to NODE_FAILURE results for some
jobs, particularly now that fedora-latest is relying on it.
Change-Id: Ifc0e5452ac0cf275463f6f1cfbe0d7fe350e3323
openEuler 20.03-LTS-SP2 went out of date in May 2022. 22.03 LTS
is the newest LTS version; it was released in March 2022 and
will be maintained for 2 years. This patch upgrades the LTS
version. It'll be used in Devstack, Kolla-Ansible and so on
in CI jobs.
Change-Id: I23f2b397bc7f1d8c2a959e0e90f5058cf3bf104d
This distro release reached its EOL December 31, 2021. We are removing
it from our CI system as people should really stop testing on it. They
can use CentOS 8 Stream or other alternatives instead.
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/827181
Change-Id: I13e8185b7839371a9f9043b715dc39c6baf907d5
This is in preparation for removing this label. This distro is no longer
supported and users will need to find alternatives.
Change-Id: I57b363671809afe415a376b0894041438140bdae
This removes the label, nodes, and images for opensuse-tumbleweed across
our cloud providers. We also update grafana to stop graphing stats for
the label.
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/824068
Change-Id: Ic311af5d667c01c1845251270fd2fdda7d99ebcb
This is in preparation to remove the image and label entirely. Nothing
seems to use the image so clean it up.
Change-Id: I5ab3a0627874e302289deb442f80a782509df2c3