Merge "policies: Add policy for rechecking failed jobs on Gerrit"
commit
be4571704f
@@ -40,8 +40,10 @@ story is to check for `uncategorized <http://status.openstack.org/elastic-rechec
 failures. This is where failures for new (unknown) gate-breaking bugs end up; on the other hand, infra
 errors causing job failures also end up here. It should be the duty of the diligent Neutron developer to ensure the
 classification rate for neutron jobs is as close as possible to 100%. To this aim, the diligent Neutron
-developer should adopt the following procedure:
+developer should adopt the procedure outlined in the following sections.

+Troubleshooting Tempest jobs
+----------------------------
 1. Open logs for failed jobs and look for logs/testr_results.html.gz.
 2. If that file is missing, check console.html and see where the job failed.
    1. If there is a failure in devstack-gate-cleanup-host.txt it's likely to be an infra issue.
@@ -50,10 +52,24 @@ developer should adopt the following procedure:
    logstash.
 4. On logstash, search for occurrences of this error message, and try to identify the root cause for the failure
    (see below).
-5. File a bug for this failure, and push a elastic-recheck query for it (see below).
+5. File a bug for this failure, and push an `Elastic Recheck Query <http://docs.openstack.org/developer/neutron/policies/gate-failure-triage.html#filing-an-elastic-recheck-query>`_ for it.
 6. If you are confident with the area of this bug, and you have time, assign it to yourself; otherwise look for an
    assignee or talk to Neutron's bug czar to find an assignee.
+
+Troubleshooting functional/fullstack job
+----------------------------------------
+1. Go to the job link provided by Jenkins CI.
+2. Look at logs/testr_results.html.gz to see which particular test failed.
+3. More logs from a particular test are stored at
+   logs/dsvm-functional-logs/<path_of_the_test> (or dsvm-fullstack-logs
+   for the fullstack job).
+4. Find the error in the logs and search for similar errors in existing
+   Launchpad bugs. If no bugs were reported, create a new bug report. Don't
+   forget to put a snippet of the trace into the new Launchpad bug. If the
+   log file for a particular job doesn't contain any trace, pick the one
+   from testr_results.html.gz.
+5. Create an `Elastic Recheck Query <http://docs.openstack.org/developer/neutron/policies/gate-failure-triage.html#filing-an-elastic-recheck-query>`_ for it.

 Root Causing a Gate Failure
 ---------------------------
 Time-based identification, i.e. find the naughty patch by log scavenging.
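The triage steps in the hunks above reduce to two small decisions: which log to open first, and where the per-test logs of a functional or fullstack job live. The following is only a minimal Python sketch of those decisions, assuming the list of files published for a job is already known. The function names and the `job_kind` keys are hypothetical; only the file and directory names come from the documentation text.

```python
from typing import Iterable

# Paths taken from the triage steps; everything else here is illustrative.
TESTR_RESULTS = "logs/testr_results.html.gz"
CONSOLE_LOG = "console.html"

# Hypothetical job_kind keys mapped to the log directories named in the text.
PER_TEST_LOG_DIRS = {
    "functional": "dsvm-functional-logs",
    "fullstack": "dsvm-fullstack-logs",
}


def first_log_to_open(published_files: Iterable[str]) -> str:
    """Return testr_results if the job published it, else fall back to console.html."""
    if TESTR_RESULTS in set(published_files):
        return TESTR_RESULTS
    return CONSOLE_LOG


def per_test_log_dir(job_kind: str, test_path: str) -> str:
    """Build the directory holding the extra logs for one functional/fullstack test."""
    try:
        log_dir = PER_TEST_LOG_DIRS[job_kind]
    except KeyError:
        raise ValueError(f"unknown job kind: {job_kind!r}") from None
    return f"logs/{log_dir}/{test_path}"
```

For example, `per_test_log_dir("fullstack", "<path_of_the_test>")` yields the fullstack counterpart of the functional path quoted in step 3. How a test name maps to `<path_of_the_test>` is not specified in the text, so the sketch leaves it to the caller.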
26
doc/source/policies/gerrit-recheck.rst
Normal file
@@ -0,0 +1,26 @@
+Recheck Failed CI jobs in Neutron
+=================================
+
+This document provides guidelines on what to do in case your patch fails one of
+the Jenkins CI jobs. In order to discover potential bugs hidden in the code or
+in the tests themselves, it's very helpful to check failed scenarios and
+investigate the cause of the failure. Sometimes the failure will be caused by
+the patch being tested, while other times the failure can be caused by a
+previously untracked bug. Such failures are usually related to tests that
+interact with a live system, like functional, fullstack and tempest jobs.
+
+Before issuing a recheck on your patch, make sure that the gate failure is not
+caused by your patch. A failed job can also be caused by an infra issue, for
+example an inability to fetch things from external resources like git or pip
+due to an outage. Such failures outside of the OpenStack world are not worth
+tracking in Launchpad, and you can recheck leaving a couple of words about what
+went wrong. Data about gate stability is collected and visualized via
+`Grafana <http://grafana.openstack.org/dashboard/db/neutron-failure-rate>`_.
+
+Please do not recheck without providing the bug number for the failed job.
+For example, do not just put an empty "recheck" comment but find the related
+bug number and put a "recheck bug ######" comment instead. If a bug does not
+exist yet, create one so other team members can have a look. It helps us
+maintain better visibility of gate failures. You can find how to troubleshoot
+gate failures in the `Gate Failure Triage <http://docs.openstack.org/developer/neutron/policies/gate-failure-triage.html#troubleshooting-tempest-job>`_
+documentation.
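The comment format requested by the new policy can be illustrated with a small helper. This is a hedged Python sketch, not something Gerrit or the CI system provides: both function names are hypothetical, and the only format taken from the document is the "recheck bug ######" comment.

```python
import re


def recheck_comment(bug_number) -> str:
    """Build a 'recheck bug <number>' comment, refusing a missing bug number."""
    # Accept ints or strings such as "1234567" or "#1234567".
    text = str(bug_number).strip().lstrip("#")
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError("a recheck needs a numeric Launchpad bug number")
    return f"recheck bug {text}"


def is_bare_recheck(comment: str) -> bool:
    """True for an empty 'recheck' comment, which the policy asks you to avoid."""
    return comment.strip().lower() == "recheck"
```

A reviewer-side script could call `is_bare_recheck` on a comment before posting it and ask for a bug number instead. The free-text recheck that the policy allows for infra outages is deliberately not modelled here, because the document does not specify its format.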
@@ -32,3 +32,4 @@ items.
    code-reviews
    release-checklist
    thirdparty-ci
+   gerrit-recheck