Document what it takes to be a check/gate test.
This is my recollection of the consensus from some infra folk on a train late at night: it's probably wrong, but I wanted something I can point contributors at. Change-Id: Ic1ad99335ce41481995322f0ee5daadb08a09c2a
This commit is contained in:
parent
c53349587b
commit
dd9a25646f
@ -29,6 +29,7 @@ Contents:
|
|||||||
:maxdepth: 2
|
:maxdepth: 2
|
||||||
|
|
||||||
project
|
project
|
||||||
|
test-infra-requirements
|
||||||
sysadmin
|
sysadmin
|
||||||
systems
|
systems
|
||||||
|
|
||||||
|
74
doc/source/test-infra-requirements.rst
Normal file
74
doc/source/test-infra-requirements.rst
Normal file
@ -0,0 +1,74 @@
|
|||||||
|
Test infrastructure Requirements
|
||||||
|
################################
|
||||||
|
|
||||||
|
Overview
|
||||||
|
========
|
||||||
|
|
||||||
|
There are multiple different ways that tests can be run. Each has different
|
||||||
|
trade-offs between cost, reliability and test coverage.
|
||||||
|
|
||||||
|
The primary goal for OpenStack test infrastructure is to deliver highly
|
||||||
|
reliable testing: with 500 patches successfully getting through the OpenStack
|
||||||
|
gate on a peak day, even short service interruptions have a significant impact
|
||||||
|
on project velocity.
|
||||||
|
|
||||||
|
The same velocity makes it extremely risky to disable tests: once disabled a
|
||||||
|
test is likely to bitrot quickly, making re-enabling such tests hard.
|
||||||
|
|
||||||
|
This gives the following principle:
|
||||||
|
|
||||||
|
* Test runs that can stop a patch landing must be highly available - there must
|
||||||
|
be at least two distinct places the test can be run, with no shared failure
|
||||||
|
domains other than things that the infra team itself is responsible for.
|
||||||
|
|
||||||
|
Test run styles
|
||||||
|
===============
|
||||||
|
|
||||||
|
Experimental
|
||||||
|
------------
|
||||||
|
|
||||||
|
Experimental jobs have low reliability requirements: they are run by hand on
|
||||||
|
developer request, typically as part of bringing up a new job definition.
|
||||||
|
Failures in experimental jobs are not the responsibility of openstack-infra,
|
||||||
|
though they will offer best-effort assistance to developers.
|
||||||
|
|
||||||
|
Silent
|
||||||
|
------
|
||||||
|
|
||||||
|
Silent jobs are jobs that are not yet ready to vote on code changes. They might
|
||||||
|
not be ready because of known failures, a lack of redundancy in the
|
||||||
|
infrastructure or some other reason. In all other regards they are the same
|
||||||
|
as Check jobs, which means we find out about the test reliability and can
|
||||||
|
accurately assess whether the job is ready to promote to Check status.
|
||||||
|
|
||||||
|
Third party
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Third party test jobs are able to vote on code changes (+/- 1 only). These jobs
|
||||||
|
are run by third parties on code pushes, but are not able to prevent code
|
||||||
|
landing. (Developers of projects usually take negative votes from third party
|
||||||
|
systems seriously however). Third party test jobs cannot be gates, and cannot
|
||||||
|
set the '+2 verified' flag on review.
|
||||||
|
|
||||||
|
Check
|
||||||
|
-----
|
||||||
|
|
||||||
|
Check jobs are used to verify each patch pushed to Gerrit. Like a third party
|
||||||
|
test job they run against a single pushed patch, rather than the proposed
|
||||||
|
merged state of the repository. A failure reported by a check job will prevent
|
||||||
|
the patch being approved. As such check jobs have to run in a highly available
|
||||||
|
environment with only infra controlled components permitted to have shared
|
||||||
|
failure domains.
|
||||||
|
|
||||||
|
Gate
|
||||||
|
----
|
||||||
|
|
||||||
|
Gate jobs are used to detect failures in patches after they are approved. They
|
||||||
|
run against the state the OpenStack projects will have if the code is merged
|
||||||
|
(rather than the state of the pushed code). This allows detection of semantic
|
||||||
|
conflicts cross-patch (and the usual state for OpenStack is that multiple
|
||||||
|
patches are going through the gate at once, so this is crucial). Failures in
|
||||||
|
the gate both prevent the patch landing and cause all the pending patches after
|
||||||
|
it to be retested. Gate jobs, like check jobs, have to run in a highly
|
||||||
|
available environment with only infra controlled components permitted to have
|
||||||
|
shared failure domains.
|
Loading…
Reference in New Issue
Block a user