dd9a25646f
This is my recollection of the consensus from some infra folk on a train late at night: it's probably wrong, but I wanted something I can point contributors at. Change-Id: Ic1ad99335ce41481995322f0ee5daadb08a09c2a
75 lines
3.0 KiB
ReStructuredText
Test Infrastructure Requirements
################################

Overview
========

Tests can be run in several different ways, each with different trade-offs
between cost, reliability and test coverage.

The primary goal for OpenStack test infrastructure is to deliver highly
reliable testing: with 500 patches successfully getting through the OpenStack
gate on a peak day, even short service interruptions have a significant impact
on project velocity.

The same velocity makes it extremely risky to disable tests: once disabled, a
test is likely to bitrot quickly, which makes re-enabling it hard.

This gives the following principle:

* Test runs that can stop a patch landing must be highly available - there must
  be at least two distinct places the test can be run, with no shared failure
  domains other than things that the infra team itself is responsible for.

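As a rough illustration of the principle, the sketch below checks whether a job
has at least two run sites whose only shared failure domains are ones the infra
team owns. It reads "two distinct places" as "some pair of sites", which is one
interpretation of the wording. The ``RunSite`` type, the function name and all
site and domain names are hypothetical, invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSite:
    """A place where a job can run, plus the failure domains it depends on."""
    name: str
    failure_domains: frozenset


def is_highly_available(sites, infra_owned):
    """Return True if some pair of sites shares only infra-owned failure domains."""
    for i, first in enumerate(sites):
        for second in sites[i + 1:]:
            shared = first.failure_domains & second.failure_domains
            if shared <= infra_owned:
                return True
    return False


# Hypothetical data: domain names are made up for illustration.
infra_owned = frozenset({"infra-review", "infra-scheduler"})
site_a = RunSite("cloud-a", frozenset({"cloud-a-dc", "infra-review", "infra-scheduler"}))
site_b = RunSite("cloud-b", frozenset({"cloud-b-dc", "infra-review", "infra-scheduler"}))
site_c = RunSite("cloud-a-region-2", frozenset({"cloud-a-dc", "infra-scheduler"}))

print(is_highly_available([site_a, site_b], infra_owned))  # True
print(is_highly_available([site_a, site_c], infra_owned))  # False: both need cloud-a-dc
print(is_highly_available([site_a], infra_owned))          # False: only one site
```

Under this reading, a job that runs in two regions of the same provider does
not qualify if both regions depend on something outside infra's control.
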
Test run styles
===============

Experimental
------------

Experimental jobs have low reliability requirements: they are run by hand on
developer request, typically as part of bringing up a new job definition.
Failures in experimental jobs are not the responsibility of openstack-infra,
though the infra team will offer best-effort assistance to developers.

Silent
------

Silent jobs are jobs that are not yet ready to vote on code changes. They might
not be ready because of known failures, a lack of redundancy in the
infrastructure or some other reason. In all other respects they are the same
as Check jobs, so their results show how reliable the tests are and let us
accurately assess whether a job is ready to be promoted to Check status.

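A minimal sketch of how such an assessment might be automated, assuming a
silent job's history is available as a list of pass/fail results. The function
name, the minimum number of runs and the pass-rate threshold are all
assumptions made up for this example; this document does not define any of
them.

```python
def ready_for_check(results, has_redundancy, min_runs=50, min_pass_rate=0.95):
    """Decide whether a silent job looks ready for promotion to Check.

    results: list of booleans, True for each passing run of the silent job.
    has_redundancy: True if the job can already run in a highly available way.
    min_runs, min_pass_rate: hypothetical thresholds, not project policy.
    """
    if not has_redundancy:
        return False  # a lack of redundancy blocks promotion on its own
    if len(results) < min_runs:
        return False  # too little history to judge reliability
    return sum(results) / len(results) >= min_pass_rate


history = [True] * 99 + [False]
print(ready_for_check(history, has_redundancy=True))   # True
print(ready_for_check(history, has_redundancy=False))  # False
```

A raw pass rate also counts failures caused by genuinely broken patches, so a
real assessment would need to separate those from failures of the job itself.
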
Third party
-----------

Third party test jobs are able to vote on code changes (+/- 1 only). These jobs
are run by third parties on code pushes, but are not able to prevent code
landing. (Project developers usually take negative votes from third party
systems seriously, however.) Third party test jobs cannot be gates, and cannot
set the '+2 verified' flag on review.

Check
-----

Check jobs are used to verify each patch pushed to Gerrit. Like a third party
test job they run against a single pushed patch, rather than the proposed
merged state of the repository. A failure reported by a check job will prevent
the patch being approved. As such, check jobs have to run in a highly available
environment with only infra-controlled components permitted to have shared
failure domains.

Gate
----

Gate jobs are used to detect failures in patches after they are approved. They
run against the state the OpenStack projects will have if the code is merged
(rather than the state of the pushed code). This allows detection of semantic
conflicts between patches, which is crucial because the usual state for
OpenStack is that multiple patches are going through the gate at once. Failures
in the gate both prevent the patch landing and cause all the pending patches
after it to be retested. Gate jobs, like check jobs, have to run in a highly
available environment with only infra-controlled components permitted to have
shared failure domains.

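The gate behaviour described above can be modelled with a small simulation.
This is a simplified sketch, not a description of how the real gate is
implemented: ``run_gate`` and the ``passes`` callback are hypothetical names,
and each pending patch is assumed to be tested on top of everything already
merged plus the patches ahead of it in the queue.

```python
def run_gate(queue, passes):
    """Simulate a gate queue and return (merged, runs).

    queue: approved patches, in the order they entered the gate.
    passes: hypothetical callback passes(base, patch) -> bool, reporting
        whether the job passes for `patch` applied on top of the tuple `base`.
    runs: every (base, patch) test run that was performed, in order.
    """
    merged = []
    runs = []
    pending = list(queue)
    while pending:
        # Test each pending patch against the state it would merge into.
        results = []
        for i, patch in enumerate(pending):
            base = tuple(merged) + tuple(pending[:i])
            runs.append((base, patch))
            results.append(passes(base, patch))
        if all(results):
            merged.extend(pending)
            pending = []
        else:
            first_bad = results.index(False)
            merged.extend(pending[:first_bad])
            # The failing patch does not land; everything behind it is retested.
            pending = pending[first_bad + 1:]
    return merged, runs


# Patch B is simply broken: C gets tested twice, once behind B and once without it.
merged, runs = run_gate(["A", "B", "C"], lambda base, patch: patch != "B")
print(merged)     # ['A', 'C']
print(len(runs))  # 4

# Patch C only fails once A is part of the merged state (a semantic conflict).
merged, _ = run_gate(["A", "B", "C"],
                     lambda base, patch: not (patch == "C" and "A" in base))
print(merged)     # ['A', 'B']
```

The second example is the case a check job cannot catch: C passes in isolation
and fails only against the state the repository will have once A has merged.
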