Add an early base job check for CPU counts

Ubuntu Noble (and possibly Debian Trixie and other distributions with
newer kernels) does not properly handle x2apic with some of the older
Xen hypervisors in rax classic. When that happens, instances boot with
only a single usable CPU. This leads to problems later in jobs, as many
jobs in OpenDev are designed to run tasks in parallel, which doesn't
work as well with a single CPU.

To work around this, we check the CPU count early in the job runtime and
fail if there are fewer than 2 CPUs present. Since this happens early,
Zuul's retry mechanisms will restart the job on a new node.

More info about the x2apic problem can be found here:

  https://docs.oracle.com/en/operating-systems/uek/8/relnotes8.0/38006792.html

That page suggests another potential workaround, which is to use the
older apic version (which does work), but doing so has potential
performance problems. Since this issue seems infrequent, we simply
recycle the node instead and let the job retry.

Note that we only update base-test for now to ensure that this doesn't
create widespread problems before applying it to the global base job.

Change-Id: Iff0249ae09da3c591746ce6300c033f6f06f58e6
Clark Boylan
2026-02-02 10:20:23 -08:00
parent 74171f0550
commit a50827ece9
4 changed files with 19 additions and 0 deletions
+3
@@ -34,6 +34,9 @@
  hosts: all
  roles:
    - validate-host
    # Hardware-check runs after validate-host as validate-host gathers host
    # facts which are used to check the hardware.
    - hardware-check
    - test-prepare-workspace-git
    - mirror-info
    - role: configure-mirrors
+11
@@ -0,0 +1,11 @@
An Ansible role to check the runtime environment and fail the job if
criteria are not met. Currently only supports checking for a minimum
CPU count.

.. zuul:rolevar:: minimum_cpu_count
   :default: 2

   The minimum CPU count to consider this a valid testing environment.
   If there are fewer CPUs an error will be raised. Note this defaults
   to 2 because you always have at least 1 and in that case wouldn't
   need an explicit check.
+1
@@ -0,0 +1 @@
minimum_cpu_count: 2
+4
@@ -0,0 +1,4 @@
- name: Raise an error if CPU count is too low
  when: ansible_processor_count < minimum_cpu_count
  fail:
    msg: "CPU count {{ ansible_processor_count }} is less than minimum value {{ minimum_cpu_count }}"
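
For illustration only, a minimal sketch of how a playbook might apply the
hardware-check role with a stricter threshold than the default. The play
layout and the value 4 are assumptions made for this example and are not
part of this change; only the role names and the minimum_cpu_count
variable come from the diff above.

```yaml
# Hypothetical playbook, not part of this change.
- hosts: all
  roles:
    # validate-host gathers the host facts that hardware-check relies on,
    # so it has to run first.
    - validate-host
    # Override the default of 2 (assumed value, for illustration).
    - role: hardware-check
      vars:
        minimum_cpu_count: 4
```

With this override, the role would fail the job on any node whose
reported CPU count is below 4.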