Commit Graph

3747 Commits (master)

Author SHA1 Message Date
Zuul d1ffc204e4 Merge "Emit stats for more build results" 2023-09-27 13:02:29 +00:00
Zuul a6cf0e07e9 Merge "Surface mutiple configuration issues as warnings" 2023-09-27 09:13:05 +00:00
James E. Blair 1a226acbd0 Emit stats for more build results
We currently typically only emit build stats for success and failure,
but do not emit stats for NODE_FAILURE, CANCELED, SKIPPED, etc.

To round out the feature so that all build results are emitted, this
includes a small refactor to report build stats at the same time we
report build results to the database (it almost all cases).  One
exception to that is when we receive a non-current build result --
that generally happens after a build is canceled, so we don't need
to emit the stats again.

Change-Id: I3bdf4e2643e151e2eae9f204f62cdc106df876b4
2023-09-26 11:32:03 -07:00
Simon Westphahl 169465ae41
Surface mutiple configuration issues as warnings
This change adds a warning (viewable in the web UI) when multiple
configurations are found for a project-branch combination.

Change-Id: I3defb2b1e6d7b59c450361c11d709802fe193373
2023-09-25 08:59:33 +02:00
Simon Westphahl 1da1c5e014
Fix child job skip with paused deduplicated parent
When a build pauses, it can also return a list of child jobs to execute.
If the paused build was deduplicated we need to call `setResult()` on
all items that have that particular build.

Change-Id: Iead5c02032bccf46852ee6b2c8adf714689aa2f5
2023-09-22 12:28:45 +02:00
Zuul a06d4110f5 Merge "Add more safety check tests" 2023-09-22 06:14:10 +00:00
James E. Blair 01ac88f3d3 Add more safety check tests
These tests would fail without the safety check introduced in the
parent change.

They were originally written under the assumption that we would
have the optimal behavior for the pipeline processor, where some
builds may be canceled but others may proceed after certain updates
to circular dependency bundles.  However, fixing that is proving
impractical without a refactor, so these tests are added in updated
form which mostly asserts that a lot of jobs are aborted.

The original form of the tests are also added in this change, but
with skip decorators attached.  It is hoped that after some
refactoring of circular dependency handling, we can use the original
form of these tests to validate the desired behavior.

Change-Id: Ie45da9fc1848a717bf5308e595edd27e598d6882
2023-09-21 13:24:04 -07:00
Zuul ebd223cf07 Merge "github: fallback to api_token when can't find installation" 2023-09-19 17:45:41 +00:00
Zuul a75d640b8e Merge "Add a bundle-changed safety check" 2023-09-19 17:05:03 +00:00
Ian Wienand 3c2e518c52 github: fallback to api_token when can't find installation
graphql queries (I77be4f16cf7eb5c8035ce0312f792f4e8d4c3e10) require
authentication. Enqueueing changes from GitHub (including Depends-On)
requires we run a graphql query. This means that Zuul must be able to
authenticate either via an application or api_token to support features
like Depends-On.

If the app is setup (app_id in config) but we aren't installed with
permissions on the project we're looking up, then fall back to using a
specified api_token. This will make Depends-On work.

Logging is updated to reflect whether or not we are able to fallback to
the api_token if the application is not installed. We log the lack of an
application installation at info level if we can fallback to the token,
and log at error level if we're falling back to anonymous access.

For backward compatibility we continue to fallback to anonymous access
if neither an application install or api_token are present. The reason
for this is features like Job required-projects: work fine anonymously,
and there may be Zuul installations that don't need additional
functionality.

Keep in mind that authenticated requests to GitHub get larger API rate
limits. Zuul installations should consider setting an API token even
when using an application for this reason. This gives Zuul the best
chance that fallback requests will not be rate limited.

Documentation is updated, a changelog added and several test
configuration files are padded with the required info.

Story: #2008940
Change-Id: I2107aeafc55591eea790244701567569fa6e80d4
2023-09-18 09:29:38 -07:00
Zuul d44b9875b0 Merge "Fix deduplicating child jobs in check" 2023-09-15 22:22:02 +00:00
Zuul 5b7b0aed5f Merge "Fix deduplication with paused jobs" 2023-09-15 22:13:52 +00:00
Zuul 4b347ce91b Merge "Avoid leaked items caused by config errors" 2023-09-15 18:41:33 +00:00
Zuul 086151de37 Merge "Add more check deduplication tests" 2023-09-15 18:24:17 +00:00
Zuul 5294c582b1 Merge "Fix deduplication of child jobs in check" 2023-09-15 18:24:14 +00:00
Zuul 2d76e35129 Merge "Add more deduplication tests" 2023-09-15 18:17:40 +00:00
Zuul d543ae2f88 Merge "Make Ansible variable freezing more efficient" 2023-09-15 08:13:59 +00:00
Zuul 502b178ad9 Merge "Use cyaml when reading/writing Ansible" 2023-09-15 08:03:13 +00:00
James E. Blair e55748ba69 Make Ansible variable freezing more efficient
We currently iterate over every job/host/etc variable in the freeze
playbook.  The reason is because if any value in any variable is
Undefined according to jinja, the Ansible combine filter throws
an error.  What we want to do in Zuul is merge any variable we can,
but if any is undefined, we skip it.  Thus, the process of combining
the variables one at a time in a task and ignoring errors.

This process can be slow, especially if we have start with a large
amount of data in one of the early variables.  The combine filter
needs to reprocess the large data repeatedly for each additional
variable.

To improve the process, we create a new action plugin, "zuul_freeze"
which takes a list of variables we want to freeze, then templates
them one at a time and stores the result in a cacheable fact.  This
is the essence of what we were trying to accomplish with the combine
filter.

Change-Id: Ie41f404762daa1b1a5ae47f6ec1aa1954ad36a39
2023-09-14 14:00:45 -07:00
James E. Blair 9406bcc2d3 Add a bundle-changed safety check
Several recent bugs and attempted fixes have shown that there may
be some edge cases in the handling of dependency cycles that have
the potential to cause jobs to run with the wrong changes in place.

While we work on longer-term fixes to those, add a safety check to
the pipeline processor so that if we detect a change to the bundle
contents of a queue item, we remove the item from the queue.  We
may not necessarily perform the optimal behavior with this, but it
should keep us from running jobs with known incorrect changes.

This change requires some minor adjustment to some existing unit
tests (it doesn't significantly change the outcome, but it does
cause some jobs to be aborted sooner).  A followup change will add
some more tests which would fail without this change but merit
separate review.

Change-Id: Ia7b1d5b7e3d6910a709478082929f96364ca996b
2023-09-13 14:07:19 -07:00
Simon Westphahl f88a69c7b3
Avoid infinite recursion with topic dependencies
When updating topic dependencies we only consider the current patchset.
However, when there are changes that have a git-level dependency to an
outdated patchset and those dependencies are in a different topic we can
run into max. recursion depth issues.

To fix this problem, we will keep a history of topics that we've already
processed and not call `getChangesByTopic()` again when we've already
seen a topic.

Change-Id: I6c15f502cfd593a44d7adda930670692151b6713
2023-09-13 14:54:15 +02:00
James E. Blair 930c42cd28 Fix deduplicating child jobs in check
If a second change is enqueued in check significantly after the
first, then node requests for child jobs may not be deduplicated
correctly.

Before deduplicating a build, Zuul applies parent data to child
jobs, then compares the child jobs to determine if they are
equivalent.  If so, then they are deduplicated.

This only happens one level in the hierarchy at a time.  Consider
the case where a change is enqueued and both the parent and child
jobs have completed (but the change is still in the queue waiting
on a third, unrelated, job).

If the second change in the bundle is enqueued, Zuul will:
1) Attempt to apply parent data to child jobs.
   Since no jobs have completed yet for this item, no parent data
   are applied.
2) Deduplicate jobs in the second change.
   Zuul will deduplicate the parent job at this point.
3) Zuul will compare the child jobs in the two changes and determine
   they are different because one has parent data and the other does
   not.
4) Zuul submits a node request for the child job.
5) On the next pipeline process, Zuul applies the parent data from
   the deduplicated parent job to the new child job.
6) Zuul deduplicates the child job, and the nodepool request is
   orphaned.

To correct this, we will repeat the process of applying parent data
to child jobs each time we find at least one build to deduplicate.

That means that all existing parent data will be applied to all jobs
on each pass through the pipeline processor no matter how deep the
dependency hierarchy.

Change-Id: Ifff17df40f0d59447f74cdde619246171279b553
2023-09-08 14:20:46 -07:00
James E. Blair 68f80f9749 Fix deduplication with paused jobs
When a deduplicated job paused, it would not wait for all children
across all queue items to complete before resuming; instead it
would wait only for the children in its own queue item.

Check all queue items a build is in before resuming it.

Change-Id: Ic2dec3a6dc58230b0873d7e8ba474bc39ed28385
2023-09-08 12:54:33 -07:00
James E. Blair 493a136dba Add more check deduplication tests
There is enough difference between how jobs are deduplicated in
check vs gate that we should test both code paths.  This duplicates
the remaining tests.

Change-Id: I539a61a575b036021fd46b363a4f7f09262db3a7
2023-09-07 16:58:19 -07:00
James E. Blair 742669ab09 Fix deduplication of child jobs in check
When we deduplicate jobs, we intend to call setResult on all of
the queue items with the deduplicated build.  This only worked
in dependent pipelines because we only looked for queue items in
the current bundle.  In independent pipelines, the queue items
can be in different bundles.

To resolve this, search for items with deduplicated builds in
across the whole queue in independent pipelines (using the approach
we use when deduplicating them to begin with).

Change-Id: I16436710c47b4f22df39e0cd82d0e289b2293c32
2023-09-07 16:58:19 -07:00
James E. Blair 77633e0005 Add more deduplication tests
This adds more test cases for automatic job deduplication, as well
as some explanatory comments.

Change-Id: I5ca96ddf655e501af3c9490ea86e8cd6a13d7e44
2023-09-07 14:11:30 -07:00
James E. Blair 70c34607f5 Add support for limiting dependency processing
To protect Zuul servers from accidental DoS attacks in case someone,
say, uploads a 1k change tree to gerrit, add an option to limit the
dependency processing in the Gerrit driver and in Zuul itself (since
those are the two places where we recursively process deps).

Change-Id: I568bd80bbc75284a8e63c2e414c5ac940fc1429a
2023-09-07 11:01:29 -07:00
Felix Edel 7ba9307f11 Avoid leaked items caused by config errors
The _reportNonEqueuedItem() method is used to temporarily enqueue a
change, report it and directly dequeue it. However, when the reporting
fails e.g. due to a config error, the item will never be dequeued.

This results in a leaked change that causes the queue processor to
loop over it indefinitely.

In our case the config error was caused by disabling the branch
protection in GitHub for a release branch in a certain repository. This
branch also defined a project-template which could not be found by Zuul
anymore after the branch protection was disabled [1].

This behaviour can be reproduced in a unit test by enforcing a broken
tenant configuration that references a non-existing project template
during a pipeline run with a circular dependency.

To fix this, ensure that the temporary enqueued item in
_reportNonEqueuedItem() will be dequeued in any case.

Although this fixes the endless loop in the queue processor, the same
exception will still be raised on pipeline level ("exception processing
pipeline...").

[1]:
2023-08-28 15:28:53,507 ERROR zuul.Pipeline.example-tenant.check: [e: 06d1ab80-45b7-11ee-8c99-721bf9f22e8c] Unable to re-enqueue change <Change 0x7f066ba36090 example-tenant/project 1234,80b4068eb1fe485df59185f0c93059fe7b15c23e> which is missing dependencies
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1614, in _processOneItem
    quiet_dequeue = self.addChange(
                    ^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 611, in addChange
    self._reportNonEqueuedItem(change_queue,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 676, in _reportNonEqueuedItem
    if self.pipeline.tenant.layout.getProjectPipelineConfig(ci):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 8194, in getProjectPipelineConfig
    templates = self.getProjectTemplates(template_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 8127, in getProjectTemplates
    raise TemplateNotFoundError("Project template %s not found" % name)
zuul.model.TemplateNotFoundError: Project template template-foo not found

Change-Id: I2514b783b646caae2863ee1ccbac4600772fe4d6
2023-09-07 18:26:49 +02:00
James E. Blair eee6ef3fcf Strip refs/heads from gerrit default branches
The HEAD Gerrit API endpoint returns '/refs/heads/master', not
'master' as the test fixture was constructed with.  Correct this.

Change-Id: I98b0759516bd50c0eddeb9245fc951c58e80ee45
2023-09-06 07:03:27 -07:00
Zuul 3532828c82 Merge "Fix zk host env var for tests" 2023-09-05 12:52:49 +00:00
Clark Boylan 8869d6b1d4 Add comment to TestValidateWarinngs
Explain why tenant-one is the correct tenant to validate unlike
the test above it.

Change-Id: Iab2a3459b28edeec2c9761a32d424384f64f18af
2023-09-04 14:20:55 -07:00
James E. Blair 267f675533 Allow new warnings when errors exist
If a configuration error existed for a project on one branch
and a change was proposed to update the config on another branch,
that would activate a code path in the manager which attempts to
determine whether errors are relevant.  An error (or warning) is
relevant if it is not in a parent change, and is on the same
project+branch as the current patch.  This is pretty generous.

This means that if a patch touches Zuul configuration with a
warning, all warnings on that branch must be updated.  This was
not the intended behavior.

To correct that, we no longer consider warnings in any of the
places where we check that a queue item is failing due to
configuration errors.

An existing test is updated to include sufficient setup to trigger
the case where a new valid configuration is added to a project
with existing errors and warnings.

A new test case is added to show that we can add new deprecations
as well, and that they are reported to users as warnings.

Change-Id: Id901a540fce7be6fedae668390418aca06a950af
2023-09-04 14:02:13 -07:00
Clark Boylan 4effa487f5 Allow new configs to be used when warnings are present
Prior to this change we checked if there are any errors in the config
(which includes warnings by default) and return a build error if there
are. Now we only check that proper errors are present when returning
errors.

This allows users to push config updates that don't fix all warnings
immediately. Without this any project with warnings present would need
to fix all warnings before newly proposed configs can take effect. This
is particularly problematic for speculative testing, but in general it
seems like warnings shouldn't be fatal.

Change-Id: I31b094fb366328696708b019354b843c4b94ffc0
2023-09-04 11:20:13 -07:00
Zuul 90dce8ed12 Merge "Add pipeline queue stats" 2023-08-30 01:28:50 +00:00
Zuul fc622866ec Merge "Add window-ceiling pipeline parameter" 2023-08-30 01:28:43 +00:00
James E. Blair a316015f56 Add pipeline queue stats
Also add the configured window size to the pipeline stats.

Remove the ambiguous phrasing "since Zuul started" from some of
the counter documentation.

Change-Id: Icbb7bcfbf25a1e34d26dd865fa29f61faceb4683
2023-08-29 15:49:52 -07:00
James E. Blair 7044963857 Add window-ceiling pipeline parameter
This allows users to set a maximum value for the active window
in the event they have a project that has long stretches of
passing tests but they still don't want to commit too many resources
in case of a failure.

We should all be so lucky.

Change-Id: I52b5f3a9e7262b88fb16afc4520b35854e8df184
2023-08-29 15:43:28 -07:00
James E. Blair ef946ff4e1 Add a test for Job._deduplicateSecrets
This method has gotten somewhat complicated, so add a unit test
that verifies that deduplication works and also verify that two
secrets with the same name but different contents are not
deduplicated.

Change-Id: I17ced6eea3c867a1119251631113550581c40bf4
2023-08-29 07:28:10 -07:00
James E. Blair d4fac1a0e8 Register RE2 syntax errors as warnings
This adds a configuration warning (viewable in the web UI) for any
regular expressions found in in-repo configuration that can not
be compiled and executed with RE2.

Change-Id: I092b47e9b43e9548cafdcb65d5d21712fc6cc3af
2023-08-28 15:04:49 -07:00
James E. Blair 3d5f87359d Add configuration support for negative regex
The re2 library does not support negative lookahead expressions.
Expressions such as "(?!stable/)", "(?!master)", and "(?!refs/)" are
very useful branch specifiers with likely many instances in the wild.
We need to provide a migration path for these.

This updates the configuration options which currently accepts Python
regular expressions to additionally accept a nested dictionary which
allows specifying that the regex should be negated.  In the future,
other options (global, multiline, etc) could be added.

A very few options are currently already compiled with re2.  These are
left alone for now, but once the transition to re2 is complete, they
can be upgraded to use this syntax as well.

Change-Id: I509c9821993e1886cef1708ddee6d62d1a160bb0
2023-08-28 15:03:58 -07:00
Zuul e660979a8c Merge "Use tenant-level layout locks" 2023-08-25 14:44:38 +00:00
James E. Blair eb803984a0 Use tenant-level layout locks
The current "layout_lock" in the scheduler is really an "abide" lock.
We lock it every time we change something in the abide (including
tenant layouts).  The name is inherited from pre-multi-tenant Zuul.

This can cause some less-than-optimal behavior when we need to wait to
acquire the "layout_lock" for a tenant reconfiguration event in one
thread while another thread holds the same lock because it is
reloading the configuration for a different tenant.  Ideally we should
be able to have finer-grained tenant-level locking instead, allowing
for less time waiting to reconfigure.

The following sections describe the layout lock use prior to this
commit and how this commit adjusts the code to make it safe for
finer-grained locking.

1) Tenant iteration

The layout lock is used in some places (notably some cleanup methods)
to avoid having the tenant list change during the method.  However,
the configloader already performs an atomic replacement of the tenant
list making it safe for iteration.  This change adds a lock around
updates to the tenant list to prevent corruption if two threads update
it at the same time.

The semaphore cleanup method indirectly references the abide and
layout for use in global and local semaphores.  This is just for path
construction, and the semaphores exist apart from the abide and layout
configurations and so should not be affected by either changing while
the cleanup method is running.

The node request cleanup method could end up running with an outdated
layout objects, including pipelines, however it should not be a
problem if these orphaned objects end up refreshing data from ZK right
before they are removed.

In these cases, we can simply remove the layout lock.

2) Protecting the unparsed project branch cache

The config cache cleanup method uses the unparsed project branch cache
(that is, the in-memory cache of the contents of zuul config files) to
determine what the active projects are.

Within the configloader, the cache is updated and then also used while
loading tenant configuration.  The layout lock would have made sure
all of these actions were mutually exclusive.  In order to remove the
layout lock here, we need to make sure the Abide's
unparsed_project_branch_cache is safe for concurrent updates.

The unparsed_project_branch_cache attribute is a dictionary that
conains references to UnparsedBranchCache objects.  Previously, the
configloader would delete each UnparsedBranchCache object from the
dictionary, reinitialize it, then incrementially add to it.

This current process has a benign flaw.  The branch cache is cleared,
and then loaded with data based on the tenant project config (TPC)
currently being processed.  Because the cache is loaded based on data
from the TPC, it is really only valid for one tenant at a time despite
our intention that it be valid for the entire abide.  However, since
we do check whether it is valid for a given TPC, and then clear and
reload it if it is not, there is no error in data, merely an
incomplete utilization of the cache.

In order to make the cache safe for use by different tenants at the
same time, we address this problem (and effectively make it so that it
is also *effective* for different tenants, even at different times).
The cache is updated to store the ltime for each entry in the cache,
and also to store null entries (with ltimes) for files and paths that
have been checked but are not present in the project-cache.  This
means that at any given time we can determine whether the cache is
valid for a given TPC, and support multiple TPCs (i.e., multiple
tenants).

It's okay for the cache to be updated simultaneously by two tenants
since we don't allow the cache contents to go backwards in ltime.  The
cache will either have the data with at least the ltime required, or
if not, that particular tenant load will spawn cat jobs and update it.

3) Protecting Tenant Project Configs (TPCs)

The loadTPC method on the ConfigLoader would similarly clear the TPCs
for a tenant, then add them back.  This could be problematic for any
other thread which might be referencing or iterating over TPCs.  To
correct this, we take a similar approach of atomic replacement.

Because there are two flavors of TPCs (config and untrusted) and they
are stored in two separate dictionaries, in order to atomically update
a complete tenant at once, the storage hierarchy is restructured as
"tenant -> {config/untrusted} -> project" rather than
"{config/untrusted} -> tenant -> project".  A new class named
TenantTPCRegistry holds both flavors of TPCs for a given tenant, and
it is this object that is atomically replaced.

Now that these issues are dealt with, we can implement a tenant-level
thread lock that is used simply to ensure that two threads don't
update the configuration for the same tenant at the same time.

The scheduler's unparsed abide is updated in two places: upon full
reconfiguration, or when another scheduler has performed a full
reconfiguration and updated the copy in ZK.  To prevent these two
methods from performing the same update simultaneously, we add an
"unparsed_abide_lock" and mutually exclude them.

Change-Id: Ifba261b206db85611c16bab6157f8d1f4349535d
2023-08-24 17:32:25 -07:00
James E. Blair 5c12ea68c6 Add default branch support to the Gerrit driver
This extends the previous change to include project default branch
support for the Gerrit driver as well as GitHub.

Change-Id: I2b1f6feed72277f5e61a2789d8af5276ee4c7b05
2023-08-23 11:07:09 -07:00
James E. Blair 57a9c13197 Use the GitHub default branch as the default branch
This supplies a per-project default value for Zuul's default-branch
based on what the default branch is set to in GitHub.  This means
that if users omit the default-branch setting on a Zuul project
stanza, Zuul will automatically use the correct value.

If the value in GitHub is changed, an event is emitted which allows
us to automatically reconfigure the tenant.

This could be expanded to other drivers that support an indication
of which branch is default.

Change-Id: I660376ecb3f382785d3bf96459384cfafef200c9
2023-08-23 11:07:08 -07:00
James E. Blair 1d07a097ee Use cyaml when reading/writing Ansible
We subclass the yaml.SafeDumper class to adjust its behavior with an
override of the ignore_aliases method.  It is possible to subclass
the cyaml.CSafeDumper class as well.  The "C" part is actually the
Parser and Emitter, not the Dumper/Representer, so our override
is still effective whether we use the C or Python versions.

This can produce a significant performance increase when exchanging
large amounts of data with Ansible.

The C emitter is more aggressive about not using unecessary quotes,
so the ansible dumper test assertions need to change.  To add some
extra assurance, that test is also updated to check that the round-trip
load is as expected as well.

Change-Id: I30fd82c0b9472120d010f3f4a65e17fb426b0f7e
2023-08-22 16:15:19 -07:00
James E. Blair 1b042ba4ab Add job failure output detection regexes
This allows users to trigger the new early failure detection by
matching regexes in the streaming job output.

For example, if a unit test job outputs something sufficiently
unique on failure, one could write a regex that matches that and
triggers the early failure detection before the playbook completes.

For hour-long unit test jobs, this could save a considerable amount
of time.

Note that this adds the google-re2 library to the Ansible venvs.  It
has manylinux wheels available, so is easy to install with
zuul-manage-ansible.  In Zuul itself, we use the fb-re2 library which
requires compilation and is therefore more difficult to use with
zuul-manage-ansible.  Presumably using fb-re2 to validate the syntax
and then later actually using google-re2 to run the regexes is
sufficient.  We may want to switch Zuul to use google-re2 later for
consistency.

Change-Id: Ifc9454767385de4c96e6da6d6f41bcb936aa24cd
2023-08-21 16:41:21 -07:00
Simon Westphahl 3b011296e6 Keep task stdout/stderr separate in result object
Combining stdout/stderr in the result can lead to problems when e.g.
the stdout of a task is used as an input for another task.

This is also different from the normal Ansible behavior and can be
surprising and hard to debug for users.

The new behavior is configurable and off by default to retain backward
compatibility.

Change-Id: Icaced970650913f9632a8db75a5970a38d3a6bc4
Co-Authored-By: James E. Blair <jim@acmegating.com>
2023-08-17 16:22:41 -07:00
Zuul c339a97e4d Merge "Add test for reporting of transient build errors" 2023-08-15 09:53:10 +00:00
Zuul 0a82e72521 Merge "Don't cancel Github check-run during re-enqueue" 2023-08-15 07:12:39 +00:00
Simon Westphahl d1d886bc03
Report error details on Ansible failure
In case of a retry there might be no logs available to help the user
understand the reason for a failure. To improve this we can the details
of the failure as part of the build result.

Change-Id: Ib9fdbdec5d783a347d1b6e5ce6510d50acfe1286
2023-08-07 10:13:16 +02:00