From dd9a25646f1711861b1eaaf6911fcf4ee8fa5505 Mon Sep 17 00:00:00 2001 From: Robert Collins Date: Mon, 11 Nov 2013 09:26:53 +1300 Subject: [PATCH] Document what it takes to be a check/gate test. This is my recollection of the consensus from some infra folk on a train late at night: it's probably wrong, but I wanted something I can point contributors at. Change-Id: Ic1ad99335ce41481995322f0ee5daadb08a09c2a --- doc/source/index.rst | 1 + doc/source/test-infra-requirements.rst | 74 ++++++++++++++++++++++++++ 2 files changed, 75 insertions(+) create mode 100644 doc/source/test-infra-requirements.rst diff --git a/doc/source/index.rst b/doc/source/index.rst index 3306c108de..16a5f0b1e8 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -29,6 +29,7 @@ Contents: :maxdepth: 2 project + test-infra-requirements sysadmin systems diff --git a/doc/source/test-infra-requirements.rst b/doc/source/test-infra-requirements.rst new file mode 100644 index 0000000000..ba38f1b186 --- /dev/null +++ b/doc/source/test-infra-requirements.rst @@ -0,0 +1,74 @@ +Test infrastructure Requirements +################################ + +Overview +======== + +There are multiple different ways that tests can be run. Each has different +trade-offs between cost, reliability and test coverage. + +The primary goal for OpenStack test infrastructure is to deliver highly +reliable testing: with 500 patches successfully getting through the OpenStack +gate on a peak day, even short service interruptions have a significant impact +on project velocity. + +The same velocity makes it extremely risky to disable tests: once disabled a +test is likely to bitrot quickly, making re-enabling such tests hard. + +This gives the following principle: + +* Test runs that can stop a patch landing must be highly available - there must + be at least two distinct places the test can be run, with no shared failure + domains other than things that the infra team itself is responsible for. + +Test run styles +=============== + +Experimental +------------ + +Experimental jobs have low reliability requirements: they are run by hand on +developer request, typically as part of bringing up a new job definition. +Failures in experimental jobs are not the responsibility of openstack-infra, +though they will offer best-effort assistance to developers. + +Silent +------ + +Silent jobs are jobs that are not yet ready to vote on code changes. They might +not be ready because of known failures, a lack of redundancy in the +infrastructure or some other reason. In all other regards they are the same +as Check jobs, which means we find out about the test reliability and can +accurately assess whether the job is ready to promote to Check status. + +Third party +----------- + +Third party test jobs are able to vote on code changes (+/- 1 only). These jobs +are run by third parties on code pushes, but are not able to prevent code +landing. (Developers of projects usually take negative votes from third party +systems seriously however). Third party test jobs cannot be gates, and cannot +set the '+2 verified' flag on review. + +Check +----- + +Check jobs are used to verify each patch pushed to Gerrit. Like a third party +test job they run against a single pushed patch, rather than the proposed +merged state of the repository. A failure reported by a check job will prevent +the patch being approved. As such check jobs have to run in a highly available +environment with only infra controlled components permitted to have shared +failure domains. + +Gate +---- + +Gate jobs are used to detect failures in patches after they are approved. They +run against the state the OpenStack projects will have if the code is merged +(rather than the state of the pushed code). This allows detection of semantic +conflicts cross-patch (and the usual state for OpenStack is that multiple +patches are going through the gate at once, so this is crucial). Failures in +the gate both prevent the patch landing and cause all the pending patches after +it to be retested. Gate jobs, like check jobs, have to run in a highly +available environment with only infra controlled components permitted to have +shared failure domains.