Add an early base job check for CPU counts
Ubuntu Noble (and possibly Debian Trixie and other distros with newer kernels) does not properly handle x2apic with some of the older Xen hypervisors in rax classic. When that happens, instances boot with only a single usable CPU. This leads to problems later in jobs, as many jobs in OpenDev are designed to run tasks in parallel, which doesn't work as well with a single CPU.

To work around this we check the CPU count early in the job runtime and fail if there are fewer than 2 CPUs present. Since this happens early, Zuul's retry mechanisms will restart the job on a new node.

More info about the x2apic problem can be found here:

  https://docs.oracle.com/en/operating-systems/uek/8/relnotes8.0/38006792.html

That page suggests another potential workaround, which is to use the older apic version (which does work), but doing so has potential performance problems. Since this issue seems infrequent we simply recycle the node instead and let the job retry.

Note that we only update base-test for now to ensure that this doesn't create widespread problems before applying it to the global base job.

Change-Id: Iff0249ae09da3c591746ce6300c033f6f06f58e6
@@ -34,6 +34,9 @@
  hosts: all
  roles:
    - validate-host
    # Hardware-check runs after validate-host as validate-host gathers host
    # facts which are used to check the hardware.
    - hardware-check
    - test-prepare-workspace-git
    - mirror-info
    - role: configure-mirrors
@@ -0,0 +1,11 @@
An Ansible role to check the runtime environment and fail the job if
criteria are not met. Currently only supports checking for a minimum
CPU count.

.. zuul:rolevar:: minimum_cpu_count
   :default: 2

   The minimum CPU count to consider this a valid testing environment.
   If there are fewer CPUs an error will be raised. Note this defaults
   to 2 because you always have at least 1 and in that case wouldn't
   need an explicit check.
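As a rough illustration of how the role variable described above might be overridden, the sketch below raises the threshold for a single play. The value 4 is a hypothetical choice and does not come from this change. The role ordering follows the comment in the playbook hunk: validate-host runs first because it gathers the host facts the check relies on.

```yaml
# Sketch only: the override value of 4 is a hypothetical example.
- hosts: all
  roles:
    # Gathers host facts that hardware-check reads.
    - validate-host
    # Fails the job if fewer CPUs than minimum_cpu_count are present.
    - role: hardware-check
      vars:
        minimum_cpu_count: 4
```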
@@ -0,0 +1 @@
minimum_cpu_count: 2
@@ -0,0 +1,4 @@
- name: Raise an error if CPU count is too low
  when: ansible_processor_count < minimum_cpu_count
  fail:
    msg: "CPU count {{ ansible_processor_count }} is less than minimum value {{ minimum_cpu_count }}"
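When investigating a node that trips this check, it may help to print the CPU-related facts next to each other. The task below is a hypothetical debugging aid and is not part of this change. It assumes facts have already been gathered (for example by validate-host) and that the Linux hardware facts `ansible_processor_cores` and `ansible_processor_vcpus` are available on the node.

```yaml
# Hypothetical debugging task, not part of this change.
- name: Show CPU-related facts gathered by Ansible
  debug:
    msg: >-
      processor_count={{ ansible_processor_count }}
      processor_cores={{ ansible_processor_cores }}
      processor_vcpus={{ ansible_processor_vcpus }}
```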