Policy on Adding CI Jobs

New CI jobs need to be added following a specific process in order to ensure they don't block patches unnecessarily and that they aren't ignored by developers.

Change-Id: I1cf1964c221819e9ad3da661d0423bd0885ce9ba
2017-03-15 18:18:27 +00:00 · 2017-03-15 18:18:27 +00:00 · 2cc2c0b313
commit 2cc2c0b313
parent ced8320a2b
1 changed file with 146 additions and 0 deletions
--- a/specs/policy/adding-ci-jobs.rst
+++ b/specs/policy/adding-ci-jobs.rst
@@ -0,0 +1,146 @@
+====================
+ Adding New CI Jobs
+====================
+
+New CI jobs need to be added following a specific process in order to ensure
+they don't block patches unnecessarily and that they aren't ignored by
+developers.
+
+Problem Description
+===================
+
+We need a process for adding CI jobs that does not result in a large number
+of spurious failures caused by the new jobs.  Bogus CI results force
+additional rechecks and reduce developer and reviewer confidence in the
+results.
+
+In addition, maintaining CI jobs is a non-trivial task, and each one we add
+increases the load on the team.  A process that requires the involvement of
+the new job's proposer should make it clear that the person or team adding
+the job is responsible for helping to maintain it.  CI is everyone's
+problem.
+
+Policy
+======
+
+The following steps should be completed in the order listed when adding a new
+job:
+
+#. Create an experimental job, or temporarily hijack an existing job in a
+   single Gerrit change.  See the References section for details on how to
+   add a new job.  The job should be passing before you move on to the next
+   step.
+
+#. Verify that the new job provides a reasonable level of logging: neither
+   too much nor too little.  Important logs, such as the OpenStack service
+   logs and basic system logs, are necessary to determine why jobs fail.
+   However, OpenStack Infra has to store the logs from an enormous number of
+   jobs, so it is also important to keep our log artifact sizes under control.
+   When in doubt, try to capture roughly the same volume of logs as the
+   existing jobs.
+
+#. Promote the job to the check queue as non-voting.  Although the job should
+   already be passing, it has most likely not run a significant number of
+   times, so its overall stability is still unknown.
+
+   "Stable" in this case means not having significantly more spurious
+   failures than the ovb-ha job.  Because of the additional complexity of an
+   HA deployment, that job fails for reasons unrelated to the patch under
+   test more often than the other jobs do.  We do not want to add any jobs
+   that are less stable than it.  Note that failures caused by legitimate
+   problems that the new job catches should not count against its stability.
+
+   .. important:: Before adding OVB jobs to the check queue, even as
+      non-voting, please check with the CI admins to ensure there is enough
+      OVB capacity to run a large number of new jobs.  As of this writing,
+      OVB cloud capacity is significantly more constrained than that of
+      regular OpenStack Infra.
+
+   A job should remain in this state until it has proven stable over a
+   period of time.  As a rule of thumb, after a week of stability the job
+   can, and should, move to the next step.
+
+   .. important:: Jobs should not remain non-voting indefinitely.  Reviewers
+      tend to ignore non-voting results, so such jobs become a waste of
+      resources.  Once a job is believed to be stable, it should be made
+      voting as soon as possible.
+
+#. To help confirm the stability of the job, add it to the
+   `CI Status <http://tripleo.org/cistatus.html>`_ page at this point.  This
+   can be done at any time after the job is moved to the check queue, but it
+   must be done before the job becomes voting.
+
+   Additionally, contact Sagi Shnaidman (sshnaidm on IRC) to get the job
+   added to the `Extended CI Status <http://status-tripleoci.rhcloud.com/>`_
+   page.
+
+#. Send an e-mail to the openstack-dev mailing list, tagged with [tripleo],
+   that explains the purpose of the new job and notifies people that it is
+   about to be made voting.
+
+#. Make the job voting.  At this point there should be sufficient confidence
+   in the job that reviewers can trust its results and should not merge
+   anything that does not pass it.
+
+   In addition, be aware that voting multinode jobs also run in the gate
+   queue: if the job fails, the patch cannot merge.  This means a broken job
+   can block all TripleO changes from merging.
+
+#. Keep an eye on the `CI Status <http://tripleo.org/cistatus.html>`_ page to
+   ensure the job keeps running smoothly.  If it starts failing more often
+   than usual, please investigate.
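+
+One way to make the stability criterion from the "check non-voting" step
+concrete is to compare spurious failure rates.  The symbols below are
+illustrative only.  This policy does not prescribe a formula or a numeric
+threshold.  Consider a job observed over some period, for example the
+one-week rule of thumb.  Let :math:`N` be its number of runs, :math:`F` its
+total number of failures, and :math:`F_{\text{legit}}` the number of failures
+caused by legitimate problems in the patches under test.  Its spurious
+failure rate is then:
+
+.. math::
+
+   r = \frac{F - F_{\text{legit}}}{N}
+
+Subtracting :math:`F_{\text{legit}}` reflects the rule that real problems
+caught by the job do not count against it.  The new job meets the bar when
+:math:`r_{\text{new}}` is not significantly higher than
+:math:`r_{\text{ovb-ha}}` measured over the same period.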
+
+Alternatives & History
+======================
+
+Historically, a number of jobs have been added to the check queue while they
+were still completely broken.  This reduces developer and reviewer
+confidence in the CI results, and if the broken job is gating, it can also
+block TripleO changes from merging.
+
+We also have a bad habit of leaving jobs in the non-voting state, which makes
+them fairly worthless because reviewers do not trust their results.  Per
+this policy, we should clean up all of the non-voting jobs by either moving
+them back to experimental, or stabilizing them and making them voting.
+
+Implementation
+==============
+
+Author(s)
+---------
+
+Primary author:
+  bnemec
+
+Milestones
+----------
+
+This policy goes into effect immediately.
+
+Work Items
+----------
+
+This policy is mostly targeted at new jobs, but we do have a number of
+non-voting jobs that should be brought into compliance with it.
+
+References
+==========
+
+`OpenStack Infra Manual <https://docs.openstack.org/infra/manual/>`_
+
+`Adding a New Job <https://docs.openstack.org/infra/manual/drivers.html#running-jobs-with-zuul>`_
+
+Revision History
+================
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - Pike
+     - Introduced
+
+.. note::
+
+  This work is licensed under a Creative Commons Attribution 3.0
+  Unported License.
+  http://creativecommons.org/licenses/by/3.0/legalcode