====================
Adding New CI Jobs
====================

New CI jobs need to be added following a specific process in order to ensure
they don't block patches unnecessarily and that they aren't ignored by
developers.

Problem Description
===================

We need a process for adding CI jobs that does not result in a flood of
spurious failures from the new jobs. Bogus CI results force additional
rechecks and reduce developer and reviewer confidence in the results.

In addition, maintaining CI jobs is a non-trivial task, and each one we add
increases the load on the team. A process that requires the involvement of
the new job's proposer should make it clear that the person or team adding
the job has a responsibility to help maintain it. CI is everyone's problem.

Policy
======

The following steps should be completed in the order listed when adding a new
job:

#. Create an experimental job or hijack an existing job for a single Gerrit
   change. See the References section for details on how to add a new job.
   This job should be passing before moving on to the next step.

#. Verify that the new job provides a reasonable level of logging: not too
   much, not too little. Important logs, such as the OpenStack service logs
   and basic system logs, are necessary to determine why jobs fail. However,
   OpenStack Infra has to store the logs from an enormous number of jobs, so
   it is also important to keep our log artifact sizes under control. When in
   doubt, try to capture about the same amount of logs as the existing jobs.

#. Promote the job to the check queue as non-voting. While the job should have
   been passing prior to this, it most likely has not been run a significant
   number of times, so its overall stability is still unknown.

   "Stable" in this case is defined as not having significantly more spurious
   failures than the ovb-ha job. Because of the additional complexity of an HA
   deployment, that job fails for reasons unrelated to the patch being tested
   more often than the other jobs do. We do not want to add any jobs that are
   less stable. Note that failures caused by the new job catching legitimate
   problems should not count against its stability.

   .. important:: Before adding OVB jobs to the check queue, even as
      non-voting, please check with the CI admins to ensure there is enough
      OVB capacity to run a large number of new jobs. As of this writing,
      the OVB cloud capacity is significantly more constrained than regular
      OpenStack Infra.

   A job should remain in this state until it has been proven stable over a
   period of time. A good rule of thumb is that after a week of stability the
   job can and should move to the next step.

   .. important:: Jobs should not remain non-voting indefinitely. Reviewers
      learn to ignore the results of jobs that stay non-voting, so such jobs
      become a waste of resources. Once a job is believed to be stable, it
      should be made voting as soon as possible.

#. To help confirm the stability of a job, add it to the
   `CI Status <http://tripleo.org/cistatus.html>`_ page at this point. This
   can be done at any time after the job is moved to the check queue, but it
   must be done before the job becomes voting.

   Additionally, contact Sagi Shnaidman (sshnaidm on IRC) to get the job
   added to the `Extended CI Status <http://status-tripleoci.rhcloud.com/>`_
   page.

#. Send an e-mail to the openstack-dev mailing list, tagged with [tripleo] in
   the subject line, that explains the purpose of the new job and notifies
   people that it is about to be made voting.

#. Make the job voting. At this point there should be sufficient confidence
   in the job that reviewers can trust the results and should not merge
   anything that does not pass it.

   In addition, be aware that voting multinode jobs are also gating: if the
   job fails, the patch cannot merge. This means a broken job can block all
   TripleO changes from merging.

#. Keep an eye on the `CI Status <http://tripleo.org/cistatus.html>`_ page to
   ensure the job keeps running smoothly. If it starts failing at an unusual
   rate, please investigate.

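The stability criterion from step 3 can be expressed as a small calculation.
The sketch below is a minimal, hypothetical illustration and is not part of
any TripleO or OpenStack Infra tooling. It assumes run results have already
been collected into simple ``JobRun`` records (a made-up format) and that
someone has marked which failures were legitimate. ``min_days=7`` mirrors the
one-week rule of thumb, while ``tolerance=1.25`` is an arbitrary stand-in for
"not significantly more spurious failures than ovb-ha".

.. code-block:: python

   from dataclasses import dataclass
   from datetime import datetime, timedelta


   @dataclass
   class JobRun:
       """One finished CI run of a job (hypothetical record format)."""

       finished: datetime
       passed: bool
       # True when the failure was caused by a real bug in the patch under
       # test. Such failures must not count against the job's stability.
       legitimate_failure: bool = False


   def spurious_failure_rate(runs):
       """Fraction of judged runs that failed for reasons unrelated to the patch.

       Legitimate failures are left out entirely: they neither count against
       the job nor inflate the number of runs it is judged on.
       """
       judged = [r for r in runs if not r.legitimate_failure]
       if not judged:
           raise ValueError("no runs to judge stability on")
       spurious = sum(1 for r in judged if not r.passed)
       return spurious / len(judged)


   def ready_to_vote(new_runs, baseline_runs, now, tolerance=1.25, min_days=7):
       """Hypothetical rule-of-thumb check before making a job voting.

       The new job must have been observed for at least ``min_days`` (the
       one-week rule of thumb) and must not fail spuriously more than
       ``tolerance`` times as often as the baseline ovb-ha job. The
       ``tolerance`` value is an assumption, not part of the policy.
       """
       if not new_runs:
           return False
       observed_for = now - min(r.finished for r in new_runs)
       if observed_for < timedelta(days=min_days):
           return False
       limit = tolerance * spurious_failure_rate(baseline_runs)
       return spurious_failure_rate(new_runs) <= limit


   if __name__ == "__main__":
       start = datetime(2017, 5, 1)
       # Made-up new job: one spurious and one legitimate failure in 10 runs.
       new_job = [JobRun(start + timedelta(days=i), passed=True) for i in range(8)]
       new_job.append(JobRun(start + timedelta(days=8), passed=False))
       new_job.append(
           JobRun(start + timedelta(days=9), passed=False, legitimate_failure=True)
       )
       # Made-up ovb-ha baseline: two spurious failures in 10 runs.
       ovb_ha = [
           JobRun(start + timedelta(days=i), passed=i not in (3, 7))
           for i in range(10)
       ]
       print(ready_to_vote(new_job, ovb_ha, now=start + timedelta(days=10)))

With this made-up history, the new job fails spuriously in 1 of 9 judged runs
against 2 of 10 for the ovb-ha baseline, so the script prints ``True``. The
arithmetic is the easy part: deciding whether an individual failure was
legitimate still requires someone to read the job's logs.
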
Alternatives & History
======================

Historically, a number of jobs have been added to the check queue while they
were completely broken. This erodes developer and reviewer confidence in the
CI results. It can also block TripleO changes from merging if the broken job
is gating.

We also have a bad habit of leaving jobs in the non-voting state, which makes
them fairly worthless since reviewers will not respect the results. Per this
policy, we should clean up all of the non-voting jobs by either moving them
back to experimental or stabilizing them and making them voting.

Implementation
==============

Author(s)
---------

Primary author:
  bnemec

Milestones
----------

This policy would go into effect immediately.

Work Items
----------

This policy is mostly targeted at new jobs, but we do have a number of
non-voting jobs that should be brought into compliance with it.

References
==========

`OpenStack Infra Manual <https://docs.openstack.org/infra/manual/>`_

`Adding a New Job <https://docs.openstack.org/infra/manual/drivers.html#running-jobs-with-zuul>`_

Revision History
================

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Pike
     - Introduced

.. note::

  This work is licensed under a Creative Commons Attribution 3.0
  Unported License.
  http://creativecommons.org/licenses/by/3.0/legalcode