When the git command crashes or is aborted due to a timeout we might end
up with a leaked index.lock file in the affected repository.
This has the effect that all subsequent git operations that try to
create the lock will fail. Since Zuul maintains a separate lock for
serializing operations on a repository, we can be sure that the lock
file was leaked in a previous operation and can be removed safely.
Unable to checkout 8a87ff7cc0d0c73ac14217b653f9773a7cfce3a7
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 1045, in _mergeChange
    repo.checkout(ref, zuul_event_id=zuul_event_id)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 561, in checkout
    repo.head.reset(working_tree=True)
  File "/opt/zuul/lib/python3.10/site-packages/git/refs/head.py", line 82, in reset
    self.repo.git.reset(mode, commit, '--', paths, **kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 542, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 1005, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 822, in execute
    raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git reset --hard HEAD --
  stderr: 'fatal: Unable to create '/var/lib/zuul/merger-git/github/foo/foo%2Fbar/.git/index.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.'
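The cleanup described above could look roughly like the following. This is a minimal sketch, not the code from this change: the function name `remove_leaked_index_lock` is hypothetical, and it assumes the caller already holds the separate per-repository lock, which is what makes deleting the file safe.

```python
import os


def remove_leaked_index_lock(repo_path: str) -> bool:
    """Delete a leftover .git/index.lock; return True if one was removed.

    Assumption: the caller already holds the external lock that
    serializes operations on this repository, so no live git process
    can own the lock file.
    """
    lock_path = os.path.join(repo_path, ".git", "index.lock")
    try:
        os.unlink(lock_path)
    except FileNotFoundError:
        # Nothing was leaked by a previous operation.
        return False
    return True
```

Calling this before each git operation (while holding the repository lock) is one way to recover from a crashed or timed-out git command.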
Change-Id: I97334383df476809c39e0d03b1af50cb59ee0cc7
GitHub supports a "rebase" merge mode where it will rebase the PR
onto the target branch and fast-forward the target branch to the
result of the rebase.
Add support for this process to the merger so that it can prepare
an effective simulated repo, and map the merge-mode to the merge
operation in the reporter so that gating behavior matches.
This change also makes a few tweaks to the merger to improve
consistency (including renaming a variable ref->base), and corrects
some typos in the similar squash merge test methods.
Change-Id: I9db1d163bafda38204360648bb6781800d2a09b4
To avoid issues with outdated GitHub access tokens in the Git config, we
only update the remote URL on the repo object after the config update
was successful.
This also adds a missing repo lock when building the repo state.
Change-Id: I8e1b5b26f03cb75727d2b2e3c9310214a3eac447
Merges cannot be cherry-picked in git, so if a change is a merge, do a
`git merge` instead of a cherry-pick to match how Gerrit will merge the
change.
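As an illustration of the decision described above, a merge commit can be recognized by having more than one parent. The helper below is a hypothetical sketch (the name `pick_operation` is not from the source); with GitPython, the parent list would typically come from `commit.parents`.

```python
from typing import Sequence


def pick_operation(parent_shas: Sequence[str]) -> str:
    """Return the git operation to use for a commit with these parents."""
    # A commit with more than one parent is a merge commit, which
    # cannot be cherry-picked without choosing a mainline parent.
    if len(parent_shas) > 1:
        return "merge"
    return "cherry-pick"
```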
Change-Id: I9bc7025d2371913b63f0a6723aff480e7e63d8a3
Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
The fix consists of two parts:
1. For GitHub, we pass the base_sha instead of the target branch as
the "tosha" parameter to get the precise set of changed files.
2. In method getFilesChanges(), use the diff() result to filter out
those files that were changed and reverted between commits.
The reason we do not directly use diff() is that for drivers
other than GitHub, the "base_sha" is not available yet, so
using diff() may include unexpected files when the target branch
has diverged from the feature branch.
This solution works for 99.9% of the cases, but it may still produce an
incorrect list of changed files in the following corner case:
1. In a non-GitHub connection, whose base_sha is not implemented, and
2. Files changed and reverted between commits in the change, and
3. The same file has also diverged in the target branch.
The above corner case can be fixed by making base_sha available in
other drivers.
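The filtering idea can be sketched as a set intersection. This is an illustrative assumption about the logic, not the actual implementation; `filter_changed_files` and its parameters are hypothetical names.

```python
from typing import Iterable, List


def filter_changed_files(per_commit_files: Iterable[Iterable[str]],
                         net_diff_files: Iterable[str]) -> List[str]:
    """Keep only files that the commits touched AND the diff still shows."""
    touched = set()
    for files in per_commit_files:
        touched.update(files)
    # Files added and then reverted are absent from the net diff;
    # files that only differ because the target branch diverged were
    # never touched by the commits. The intersection drops both.
    return sorted(touched & set(net_diff_files))
```

For example, with commits touching `a.txt` and a temporary `tmp.txt`, and a diff that lists `a.txt` plus an unrelated `upstream.txt`, only `a.txt` remains. A file that is both reverted in the change and diverged upstream would still slip through, which is the corner case noted above.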
Change-Id: Ifae7018a8078c16f2caf759ae675648d8b33c538
If a merger or executor is unable to reset a repo, we currently
simply log the message "Unable to reset repo". Instead, let's
assume that it is permanently broken and rmtree it so that future
attempts will automatically recover.
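A minimal sketch of this recovery strategy, under the assumption that the reset operation is passed in as a callable (`reset_or_remove` and `reset` are hypothetical names, not the merger's API):

```python
import logging
import shutil
from typing import Callable

log = logging.getLogger("example.repo")


def reset_or_remove(repo_path: str, reset: Callable[[str], None]) -> bool:
    """Try to reset the repo; on failure delete it so it is re-cloned later."""
    try:
        reset(repo_path)
        return True
    except Exception:
        # Treat the repo as permanently broken and remove the whole tree.
        log.exception("Unable to reset repo %s, removing it", repo_path)
        shutil.rmtree(repo_path, ignore_errors=True)
        return False
```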
Change-Id: I17b051d70a9c5800019bf9ef7e0800558614cadd
Some merge operations catch overly generic exceptions, which prevents
BrokenProcessPool exceptions from ever reaching the executor and so
keeps the executor from recovering.
Bubble these exceptions up to the executor so they can be handled.
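The pattern can be illustrated as follows. This is a sketch only: `run_merge_step` is a hypothetical wrapper, while `BrokenProcessPool` is the real exception from `concurrent.futures.process`.

```python
import logging
from concurrent.futures.process import BrokenProcessPool
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("example.merger")


def run_merge_step(step: Callable[[], T]) -> Optional[T]:
    """Run one merge step, swallowing ordinary errors but not a dead pool."""
    try:
        return step()
    except BrokenProcessPool:
        # Re-raise so the caller can see the broken pool and recover.
        raise
    except Exception:
        log.exception("Merge step failed")
        return None
```

The more specific `except` clause has to come first; otherwise the generic handler would swallow the pool failure again.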
Change-Id: I77d4d381e12195bcfe7d831a2b9e6d361b90f5a2
It can happen that the remote ref (corresponding to the branch in
cache) is not available when local workspace is cloned.
Fix this issue by creating the remote ref when it does not exist.
Change-Id: I68244e0b5aa3c8b6e15693ffc2897d4f416e0d5c
This change adds support for configuring non-top-level directories
(e.g. `foobar/zuul.d/`) as an extra config path which did not work so
far.
It's not clear if this was a bug or intended behavior that was just not
documented.
Change-Id: I1bc468130c9324a2e1b5d7f50b42fdc045eaa741
This is an attempt to avoid getting the refs twice on the repo and
simplify _saveRepoState by directly using getPackedRefs.
Change-Id: I27876571451554caca19bdf9ae7ff502d2d4e062
When the initial merge job for a queue item fails, users typically
see a message saying "this project or one of dependencies failed
to merge". To help users and/or administrators more quickly identify
the problem, include connection, project, and change information in
a warning message posted to the code review system.
Change-Id: If1bced80b87b908f63867083efb306ebe02ed1ee
The reverted change can lead to the listing of files that are not
changed in the referenced commit(s). This can e.g. happen if the base
branch (e.g. master) has diverged from the feature branch.
This is now also tested to avoid regressions in the future. The issue
related to files that are added/removed in the same range of commits
(e.g. a PR) needs to be addressed in a separate change.
This reverts commit e63d7b0cdb.
Change-Id: I07bc4a09bf162fdbc4c2daeecb19e12d81241801
In the case of a large repository with more than 10k refs,
this method currently uses an async call from GitPython to retrieve
each sha1, and GitPython opens a file on the filesystem for each ref.
For example, with a repository with 18k tags,
a merger instance takes 100% of one CPU (not threadless) for ~3 min
to perform the loop.
To improve this, we store the sha1 of every tag directly from
a single git command (for_each_ref); this opens the packed refs
of the repository once to extract all refs.
If a ref is not in the dict, we fall back to `ref.object`.
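The lookup-table approach might be sketched like this. It assumes the output was produced with `git for-each-ref --format='%(objectname) %(refname)'`; the helper names and the `fallback` callable (standing in for `ref.object`) are hypothetical.

```python
from typing import Callable, Dict


def parse_for_each_ref(output: str) -> Dict[str, str]:
    """Build a {refname: sha} dict from '<sha> <refname>' lines."""
    shas: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, refname = line.split(" ", 1)
        shas[refname] = sha
    return shas


def sha_for_ref(refname: str, shas: Dict[str, str],
                fallback: Callable[[str], str]) -> str:
    """Use the precomputed dict, falling back to a per-ref lookup."""
    try:
        return shas[refname]
    except KeyError:
        return fallback(refname)
```

Building the dict costs one git invocation regardless of the number of refs, so the slow per-ref path is only taken for refs missing from the dict.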
Change-Id: I8b52b39cb79527791a34ac98a25e7ee41c8d4956
This adds a variable which may be useful for debugging or auditing
the repo state of playbooks or roles for a job.
Change-Id: I86429a06ed8625faa72db6a19630de633f1694b6
The original implementation takes into account the changed files from
all commits of a PR.
This causes a bug when files get changed and reverted in those commits.
E.g., if a file is added in the first commit and then removed in the
second commit, it should not be considered a changed file in the PR.
Change-Id: I7db8b9d3f3267073c5e1a71f52e75939ffa91773
The scheduler depends on merge completed events in order to advance
the lifecycle of a queue item. Without them, items can be stuck in
the queue indefinitely.
In the case of certain merge errors, we may not have submitted a
result to the event queue. This change corrects that.
Change-Id: I9527c79868ede31f1fa68faf93ff113ac786462b
We aren't generally using type annotations, and mypy itself is
somewhat flawed, occasionally producing false positives and rarely
catching errors. Stop running it.
Change-Id: I6f24457f7d99ca11ec9228e505e6edec558baf9e
This puts the merger result events (MergeCompleted, FilesChanged) into
ZooKeeper.
This is the first step to put the merge jobs into ZooKeeper, similar to
the builds.
This doesn't change the logic how the result events are processed. The
main change is done in the scheduler callbacks (onMergeCompleted,
onFilesChanged) which now put a serializable merge result event into the
ZooKeeper result event queue rather than the local queue. The event
handler methods are adapted to work with those new events.
As the buildset is not serializable in its current state, we provide the
buildset UUID to the events and look up the corresponding buildset in
the event processing methods based on the provided pipeline and queue.
Change-Id: I033cf27bc8035afbd743e37292da37fde6d0e0b8
With some of the newer changes (most probably the global repo state),
these log messages could increase the amount of log messages in a Zuul
deployment quite heavily.
Thus move them into their own sub-logger so they can be stripped or split
out via log config if needed without sacrificing other useful logs
from zuul.Repo.
Change-Id: I6206a5938e788733950451292403d2b2525753ed
This adds the concept of a 'scheme' to the merger. Up to this point,
the merger has used the 'golang' scheme in all cases. However it is
possible with Gerrit to create a set of git repositories which collide
with each other using that scheme:
root/example.com/component
root/example.com/component/subcomponent
The users which brought this to our attention intend to use their repos
in a flat layout, like:
root/component
root/subcomponent
To resolve this we need to do two things: avoid collisions in all cases
in the internal git repo caches of the mergers and executors, and give
users options to resolve collisions in workspace checkouts.
In this change, mergers are updated to support three schemes:
* golang (the current behavior)
* flat (new behavior described above)
* unique
The unique scheme is not intended to be user-visible. It produces a
truly unique and non-conflicting name by using urllib.parse.quote_plus. It
sacrifices legibility in order to obtain uniqueness.
The mergers and executors are updated to use the unique scheme in their
internal repo caches.
A new job attribute, 'workspace-scheme' is added to allow the user to
select between 'golang' and 'flat' when Zuul prepares the repos for
checkout.
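To make the three schemes concrete, here is a rough sketch of how a path could be derived for each one. The function `repo_path` and its exact layout are assumptions for illustration, not the merger's actual code; only the use of `quote_plus` for the unique scheme comes from the description above.

```python
import posixpath
import urllib.parse


def repo_path(root: str, hostname: str, project_name: str,
              scheme: str = "golang") -> str:
    """Return an illustrative checkout path for the given scheme."""
    if scheme == "golang":
        # Hostname plus the full project name; nested names can collide.
        return posixpath.join(root, hostname, project_name)
    if scheme == "flat":
        # Only the last path component of the project name.
        return posixpath.join(root, posixpath.basename(project_name))
    if scheme == "unique":
        # quote_plus encodes '/' so the name is a single path component.
        return posixpath.join(
            root, hostname, urllib.parse.quote_plus(project_name))
    raise ValueError("unknown scheme: %s" % scheme)
```

With the `unique` scheme, `component/subcomponent` becomes `component%2Fsubcomponent`, so it can no longer land inside the directory of `component`.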
There is one more kind of repo that Zuul prepares: the playbook repo.
Each project that supplies a playbook to a job gets a copy of its repo
checked out into a dedicated directory (with no sibling repos). In that
case there is no risk of collision, and so we retain the current behavior
of using the golang scheme for these checkouts. This allows the playbook
paths to continue to be self-explanatory. For example:
trusted/project_0/example.com/org/project/playbooks/run.yaml
Documentation and a release note are added as well.
Change-Id: I3fa1fd3c04626bfb7159aefce0f4dcb10bbaf5d9
Several changes in an attempt to clarify exactly when updates and
resets should and do happen:
* Remove the repo_state argument from Merger.getRepo()
It was unclear under what circumstances the low-level repo object
honored repo_state (not much). Remove it entirely and rely on
high-level Merger methods to deal with repo_state.
* Have merger.setRepoState() operate on one project instead of a
list of items
Part of the reason we were passing repo_state to low-level
methods was to reset the state for required projects in the
executor. Essentially there were three cases: projects of change
items, projects of non-change items, and projects of neither but
in required-projects. The low-level repo_state usage only
handled the last, the first is easy, and the second we handled by
creating a list of non-change items and passing it to
setRepoState on the merger.
A simpler method of handling all of that is to reduce it to two
cases: projects of change items (which need to be merged) and the
rest (which need to be restored). If we do that, we can maintain
a set of projects we've seen while merging in the first case,
then iterate over all the remaining projects and call
setRepoState on each in the second.
* Remove the update call from Repo.reset()
This lets us call Repo.reset() frequently (i.e., at the start of
any operation that writes to the merger's git repo working dir)
without performing a git fetch. We need to make sure we call
Repo.update() where necessary.
* Remove the reset call from Merger.updateRepo()
This will now only call repo.update(), and even that will only
happen if the repo_state says we should. So we can safely call
this before any significant operations and know that it will
update the repo if necessary.
* Add an update() call to getRepoState()
Because we removed the update() call from Repo.reset(), we need
to add one here next to the existing call to reset().
* Add a reset call to getFiles()
It relied on the reset in updateRepo.
* Set execution_context to False on the executor's main merger
The execution_context parameter determines whether we manipulate the
origin remotes to point at the previous commit. This should be set
for mergers that operate on the build work dir, but it should not
be set for the main merger within the executor (so the main merger
behaves just like a standalone merger). It was previously erroneously
set for the executor's main merger and this change corrects that.
* Add Merger.updateRepo() calls in the merger server merge method
The merger needs to update and reset each repo before merging changes.
Currently _mergeItem resets the repo the first time it encounters it.
But we still need to update the repo. We don't want to update within
the merger method because the executor performs batch updates in
parallel before starting a merge and we don't want to re-do that work.
So instead we add it to the merger server invocation, so it's only
used in the merger:merge gearman function code path.
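The two-case handling described above (merge the projects of change items, restore everything else) could be sketched as follows. All names here are hypothetical; `merge_item` and `set_repo_state` stand in for the corresponding merger operations and are passed in as callables so the sketch stays self-contained.

```python
from typing import Callable, Iterable, List, Set, Tuple

Project = Tuple[str, str]  # (connection name, project name)


def prepare_repos(change_items: Iterable[Tuple[Project, str]],
                  all_projects: Iterable[Project],
                  merge_item: Callable[[Project, str], None],
                  set_repo_state: Callable[[Project], None]) -> List[Project]:
    """Merge change items, then restore every project not seen while merging."""
    merged: Set[Project] = set()
    for project, change in change_items:
        merge_item(project, change)
        merged.add(project)
    restored: List[Project] = []
    for project in all_projects:
        if project not in merged:
            # Not part of any change item: restore it from the repo state.
            set_repo_state(project)
            restored.append(project)
    return restored
```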
Change-Id: I740e958357dc7bf0a6506474c5991da12ab6264e
If we pass a repo state to checkoutBranch, we should check out the
repo/branch as specified by the repo state.
FYI, there are two theoretical ways we could call checkoutBranch:
1) Without a repo state (in which case we would expect to checkout
the current value of the branch in the repo; presumably this would
be done after an update() call also without a repo state, which means
the current branch would match upstream).
2) With a repo state, in which case we would expect to check out
the branch as specified by the repo state.
Currently Zuul only uses option 2.
Change-Id: Icad68a337b3ff5fc80af32ee0a4845cc83daa14b
This is two changes squashed. First:
Fix missing repo state restore
The global repo state handling misses the restoration of the repo
states of projects that are not part of the dependency chain. This can
be generically fixed by ensuring that the repo state is restored
immediately after clone into the job workspace.
Original Change-Id: I61db67edb3952cdba7709b5b597dac93be4b6dde
Second:
Keep jobgraphs frozen across reconfiguration
This removes test cases which are no longer relevant.
Many of these were testing various mutations across reconfigurations,
but with job graphs frozen, about the only thing that we expect to
change now is when a pipeline, project, or tenant is deleted. Test
cases are modified or added to test these.
It appears even the current code may have some bugs related to deleting
pipelines and tenants. The improved testing in this change highlighted
that. The scheduler is updated to ensure that it cancels all jobs on
pipelines or tenants that are removed from a running configuration. This
should ensure we don't leak nodes or semaphores.
Change-Id: I2e4bd2fb9222b49cb10661d28d4c52a3c994ba62
Co-Authored-By: James E. Blair <jim@acmegating.com>
This reverts commit 02ca9aeb8f.
This makes a couple of changes to make sure we're passing in the
full repo_state to updateRepo rather than the project repo state.
Change-Id: Ifca2cd48f24b9cf8eec718034c879ffe75fb6ecc
The isUpdateNeeded call was not operating on the actual dictionary
format that is passed in. The tests did not catch this because they
pass in the format that is expected. Update both the tests and the
calling code in the merger to fix.
This breaks the just-added fast-forward test, which shows us that
the current behavior really is broken.
Change-Id: I34b7dbe1d4f7032d217bca30ca9a8d3c986c1915
We discovered a regression in the global repo state that can lead to
wrong commits being checked out on required projects. Furthermore, a fix
for this needs a slight redesign of the reconfiguration process. In order
to have some more time to do this, revert it for now.
This reverts commit 175990ec42.
Change-Id: Ibcf3758ab886a01468095a8c588cf78db209529e
Store the repo state globally for the whole buildset,
including inherited and required projects.
This is necessary to avoid inconsistencies in case,
e.g., a required project's HEAD changes between two
dependent job executions in the same buildset.
Change-Id: I872d4272d8a594b2a40dee0c627f14c990399dd5
The bugfix to disambiguate filepath from ref has been merged for a long
time so it should be safe to replace the method with gitpython.
See:
https://github.com/gitpython-developers/GitPython/pull/319
Change-Id: Ic0c5253273ea47da6567047c9172adbf513c1500
A badly configured .gitmodules committed to a repo can currently break
zuul by not allowing it to git fetch [1]. There is already logic handling
merge conflicts in .gitmodules by resetting the repo but this is not
enough if the current commit is faulty. Instead fix this by trying to
reset to a commit before .gitmodules was introduced.
[1] Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/zuul/executor/server.py", line 2935, in _innerUpdateLoop
    self.merger.updateRepo(
  File "/usr/local/lib/python3.8/site-packages/zuul/merger/merger.py", line 805, in updateRepo
    repo.reset(zuul_event_id=zuul_event_id, build=build,
  File "/usr/local/lib/python3.8/site-packages/zuul/merger/merger.py", line 397, in reset
    self.update(zuul_event_id=zuul_event_id, build=build)
  File "/usr/local/lib/python3.8/site-packages/zuul/merger/merger.py", line 601, in update
    self._git_fetch(repo, 'origin', zuul_event_id, tags=True, prune=True)
  File "/usr/local/lib/python3.8/site-packages/zuul/merger/merger.py", line 265, in _git_fetch
    repo.git.fetch(remote, ref_to_fetch,
  File "/usr/local/lib/python3.8/site-packages/git/cmd.py", line 542, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/git/cmd.py", line 1006, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/usr/local/lib/python3.8/site-packages/git/cmd.py", line 823, in execute
    raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git fetch -f --prune --tags origin
  stderr: 'fatal: bad config line 9 in file /var/lib/zuul/executor-git/.../.gitmodules'
Change-Id: I33f8f5905167ebb95ab58f9ef192359573495927
Part of point 5 in https://etherpad.openstack.org/p/zuulv4
Connection is idle for now.
Also update component documentation.
Change-Id: I97a97f61940fab2a555c3651e78fa7a929e8ebfb
In case a branch name contained the '@' character this was interpreted
by Gitpython according to the format supported by git-rev-parse when
resolving the ref name.
Since we already have the commit for the ref we can use it directly,
that way avoiding any issues caused by parsing the ref name.
Change-Id: I49665c62389245f937317e70f093d33a4bf759d3
GitPython is trying to be nice and thinks that we want the head '<name>' when trying
to create the head 'refs/heads/<name>'.
Avoid this by always prepending 'refs/heads/'.
Change-Id: I90768fd678b02a7fbfc0675456dc4105b51d7f06
Change I99f0c8edaae2185c5dbf855398ace2522237226d allowed offloading the
repo reset to a process pool. We can also take advantage of this for
the repo reset during merge, as this also takes longer under high
load.
Change-Id: I14e704ab818c8c2e0405f0adfc55cee564ed4d1e
Since executors can also handle merge jobs and those merges happen in
the executor's repo cache we need to protect temporary merger refs from
being garbage collected.
Because the executor's update jobs might reset the local branch heads in
between merges, we create the refs for the speculative branch state in
'refs/zuul' instead. Those refs are cleaned up when the related branch
no longer exists.
Branch names for the Zuul refs are hashed (SHA1) in order to avoid
issues with empty directories when the branch name contains slashes.
E.g. the speculative state of the master branch will be referenced by
'refs/zuul/4f26aeafdb2367620a393c973eddbe8f8b846ebd'
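The ref naming could be sketched as below. This is an assumption about how the hash is derived (SHA1 over the UTF-8 encoded branch name); `zuul_ref_for_branch` is a hypothetical helper name.

```python
import hashlib


def zuul_ref_for_branch(branch: str) -> str:
    """Return a 'refs/zuul/<sha1>' ref name for a branch."""
    # Hashing removes slashes from the name, so no nested directories
    # (and no leftover empty directories) are created under refs/zuul.
    digest = hashlib.sha1(branch.encode("utf-8")).hexdigest()
    return "refs/zuul/" + digest
```

A branch such as `feature/foo/bar` therefore maps to a single flat entry below `refs/zuul/`.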
Change-Id: Idd2b0bd2dfeba22f3961f851f8a463bc5c9d37ff
Some projects would like to keep anything Zuul related, including
playbooks/roles, in the .zuul.d directory. Currently Zuul considers
anything in these directories ending with .yaml as configuration,
leading to parsing errors if you put playbook or role YAML files in
subdirectories there.
Allow this by adding a check for .zuul.ignore stamp files in
subdirectories of .zuul.d/zuul.d. If found, that directory prefix
will be pruned and not considered for config file read.
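A sketch of the pruning behavior, assuming a plain directory walk (the real implementation reads files from git rather than the filesystem; `find_config_files` is a hypothetical name):

```python
import os
from typing import List


def find_config_files(config_root: str) -> List[str]:
    """List .yaml files under config_root, skipping ignored subdirectories."""
    found: List[str] = []
    for dirpath, dirnames, filenames in os.walk(config_root):
        if dirpath != config_root and ".zuul.ignore" in filenames:
            # Stamp file found: skip this directory and everything below it.
            dirnames[:] = []
            continue
        for name in filenames:
            if name.endswith(".yaml"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Clearing `dirnames` in place is what stops `os.walk` from descending further, so a single stamp file at the top of a playbook or role tree is enough.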
Change-Id: I7dbd3bb23648b7d17be0e2a0ea24e51c160e7940