This adds support for deduplicating jobs within dependency cycles.
By default, this will happen automatically if we can determine that the
results of two builds would be expected to be identical. This uses a
heuristic which should almost always be correct; the behavior can be
overridden otherwise.
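As an illustration, the opt-out might look like the following job
configuration sketch; the "deduplicate" attribute name and its values
are assumptions based on this change's description, not confirmed
syntax:

```yaml
# Illustrative job definition (attribute name assumed).
- job:
    name: integration-test
    # auto (the default) deduplicates when results would be
    # identical; true/false force the behavior either way.
    deduplicate: false
```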
Change-Id: I890407df822035d52ead3516942fd95e3633094b
If a role is applied to a host more than once (via either play
roles or include_roles, but not via an include_role loop), it will
have the same task UUID from ansible which means Zuul's command
plugin will write the streaming output to the same filename, and
the log streaming will request the same file. That means the file
might look like this after the second invocation:
2022-05-19 17:06:23.673625 | one
2022-05-19 17:06:23.673781 | [Zuul] Task exit code: 0
2022-05-19 17:06:29.226463 | two
2022-05-19 17:06:29.226605 | [Zuul] Task exit code: 0
But since we stop reading the log after "Task exit code", the user
would see "one" twice, and never see "two".
Here are some potential fixes for this that don't work:
* Accessing the task vars from zuul_stream to store any additional
information: the callback plugins are not given the task vars.
* Setting the log id on the task args in zuul_stream instead of
command: the same Task object is used for each host and therefore
the command module might see the task object after it has been
further modified (in other words, nothing host-specific can be
set on the task object).
* Setting an even more unique uuid than Task._uuid on the Task
object in zuul_stream and using that in the command module instead
of Task._uuid: in some rare cases, the actual task Python object
may be different between the callback and command plugin, yet still
have the same _uuid; therefore the new attribute would be missing.
Instead, a global variable is used in order to transfer data between
zuul_stream and command. This variable holds a counter for each
task+host combination. Most of the time it will be 1, but if we run
the same task on the same host again, it will increment. Since Ansible
will not run more than one task on a host simultaneously, there is
no race between the counter being incremented in zuul_stream and used
in command.
Because Ansible is re-invoked for each playbook, the memory usage is
not a concern.
There may be a fork between zuul_stream and command, but that's fine
as long as we treat it as read-only in the command plugin. It will
have the data for our current task+host from the most recent zuul_stream
callback invocation.
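The counter mechanism described above can be sketched as follows; the
variable and function names are hypothetical, not the actual
identifiers used in zuul_stream or the command module:

```python
from collections import defaultdict

# Sketch (names hypothetical) of the shared module-level counter that
# carries data from the zuul_stream callback to the command module
# within a single Ansible process.
ZUUL_LOG_ID_MAP = defaultdict(int)

def on_task_start(task_uuid, host):
    # Called from the callback plugin; Ansible runs at most one task
    # per host at a time, so this increment cannot race with a read
    # in the command module.
    ZUUL_LOG_ID_MAP[(task_uuid, host)] += 1
    return ZUUL_LOG_ID_MAP[(task_uuid, host)]

def console_log_path(task_uuid, host):
    # Called (read-only) from the command module, possibly after a
    # fork; the counter makes repeated task+host invocations write to
    # distinct files.
    count = ZUUL_LOG_ID_MAP[(task_uuid, host)]
    return "/tmp/console-%s-%s-%s.log" % (task_uuid, count, host)
```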
This change also includes a somewhat unrelated change to the test
infrastructure. Because we were not setting the log stream port on
the executor in tests, we were actually relying on the "real" OpenDev
Zuul starting zuul_console on the test nodes rather than the
zuul_console we set up for each specific Ansible version from the tests.
This corrects that and uses the correct zuul_console port, so that if we
make any changes to zuul_console in the future, we will test the
changed version, not the one from the Zuul which actually runs the
tox-remote job.
Change-Id: Ia656db5f3dade52c8dbd0505b24049fe0fff67a5
We have two CLIs: zuul-client for REST-related operations, which cover
tenant-scoped, workflow-modifying actions such as enqueue, dequeue and
promote; and zuul, which supersedes zuul-client and also covers true admin
operations like ZooKeeper maintenance, config checking and issuing auth tokens.
This is a bit confusing for users and operators, and can induce code
duplication.
* Rename the zuul CLI to zuul-admin. zuul is still a valid endpoint
and will be removed after the next release.
* Print a deprecation warning when invoking the admin CLI as zuul
instead of zuul-admin, and when running autohold-*, enqueue-*,
dequeue and promote subcommands. These subcommands will need to be
run with zuul-client after the next release.
* Clarify the scopes and deprecations in the documentation.
Change-Id: I90cf6f2be4e4c8180ad0f5e2696b7eaa7380b411
If the list of branches for a project includes items that are not (yet)
in the min. ltimes for a layout state we can end up in a situation where
a scheduler is unable to start.
2022-05-16 17:34:50,895 ERROR zuul.Scheduler: Error starting Zuul:
Traceback (most recent call last):
File "/opt/zuul/lib/python3.8/site-packages/zuul/cmd/scheduler.py", line 98, in run
self.sched.prime(self.config)
File "/opt/zuul/lib/python3.8/site-packages/zuul/scheduler.py", line 931, in prime
tenant = loader.loadTenant(
File "/opt/zuul/lib/python3.8/site-packages/zuul/configloader.py", line 2530, in loadTenant
new_tenant = self.tenant_parser.fromYaml(
File "/opt/zuul/lib/python3.8/site-packages/zuul/configloader.py", line 1631, in fromYaml
self._cacheTenantYAML(abide, tenant, loading_errors, min_ltimes,
File "/opt/zuul/lib/python3.8/site-packages/zuul/configloader.py", line 1896, in _cacheTenantYAML
pb_ltime = min_ltimes[project.canonical_name][branch]
KeyError: 'new_branch'
The difference could be due to a missed branch creation event or a
simple race condition. The latter case might fix itself after the
reconfig triggered by the branch creation event was processed.
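One way to tolerate the missing branch is a defensive lookup with a
sentinel fallback; this is only an illustrative sketch (function name
and the -1 convention are assumptions), not the change's actual code:

```python
def resolve_pb_ltime(min_ltimes, canonical_name, branch):
    # Hypothetical defensive lookup: if a branch (e.g. one created
    # after the layout state was written) is missing from min_ltimes,
    # fall back to -1 as a "no cached data, reload this project-branch
    # from the source" marker instead of raising KeyError.
    return min_ltimes.get(canonical_name, {}).get(branch, -1)
```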
Change-Id: I1838e66bc5296f153aa4c7a83ac0addb6c4db1aa
This allows operators to filter the set of branches from which
Zuul loads configuration. These filters are similar to exclude-unprotected-branches
but apply to all drivers.
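A tenant configuration using such a filter might look like the
following sketch; the include-branches option name and placement are
assumptions based on this change's description:

```yaml
# Illustrative tenant configuration (option name assumed).
- tenant:
    name: example
    source:
      gerrit:
        untrusted-projects:
          - example/project:
              include-branches:
                - master
                - ^stable/.*$
```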
Change-Id: I8201b3a19efb266298decb4851430b7205e855a1
The enqueue-message attribute was missing from the schema, which made it
an error to actually set it in a pipeline.
Change-Id: Icf0a01c7f4dbbc07e480d7d319a7c5f433a85fa1
Merges cannot be cherry-picked in git, so if a change is a merge, do a
`git merge` instead of a cherry-pick to match how Gerrit will merge the
change.
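The decision can be sketched as follows; the helper name is
hypothetical and merge detection via parent count is an assumption
about how a merge commit would be identified:

```python
def git_apply_command(sha, num_parents):
    # Hypothetical helper: a commit with more than one parent is a
    # merge and cannot be cherry-picked, so replay it with "git merge"
    # (mirroring how Gerrit merges the change); otherwise cherry-pick.
    if num_parents > 1:
        return ["git", "merge", sha]
    return ["git", "cherry-pick", sha]
```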
Change-Id: I9bc7025d2371913b63f0a6723aff480e7e63d8a3
Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
Setting the logfile isFetching initial state to true more accurately
reflects reality: when the logfile object is first instantiated it has
not yet been fetched, but is about to be. This is important to ensure our
useEffect() callbacks fire and apply at the right times; otherwise they
will update values too early, fire, then not fire again when the
fetching is complete. If they fire before fetching is complete, they
do not function.
Change-Id: Ic8cd2e4ab2d2d7fd5f74ff6862f719c0aaa756dc
We did not clearly identify all of the exceptions to the standard
inheritance behavior ("override"). Nor did we explicitly indicate
that was the standard. Correct that by adjusting the documentation
accordingly.
Also, we did not document the idea of a global repo state. Correct
that as well.
Change-Id: If8848145bf353483df08c1ee6e3fcaafe572a2b6
The passlib library is needed to generate bcrypt password hashes.
If passlib is not installed, Zuul will print a message such as:
*0
or
crypt.crypt does not support 'bcrypt' algorithm
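For reference, generating a bcrypt hash with passlib looks roughly
like this sketch; the ImportError guard models the situation that
produced the confusing errors above:

```python
# Sketch: use passlib for bcrypt hashing if it is available; fall
# back to None when passlib is absent (the failure mode this change
# addresses by requiring the library).
try:
    from passlib.hash import bcrypt

    def bcrypt_hash(password):
        # Returns a bcrypt hash string (e.g. starting with "$2b$").
        return bcrypt.hash(password)
except ImportError:
    bcrypt_hash = None
```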
Change-Id: Ib1adc385bea519ac55fd23ba9da21e0d78f14dcb
A couple of locations continue to reference actiongeneral which has been
removed. Update these locations to use action as the current location
for these plugins.
Change-Id: I71c03d2c0a84592be66fa0d84bc684684a392a27
Node request failures cause a queue item to fail (naturally). In a normal
queue without cycles, that just means that we would cancel jobs behind and
wait for the current item to finish the remaining jobs. But with cycles,
the items in the bundle detect that items ahead (which are part of the bundle)
are failing and so they cancel their own jobs more aggressively. If they do
this before all the jobs have started (i.e., because we are waiting on an
unfulfilled node request), they can end up in a situation where they never
run builds, yet they don't report because they are still expecting
those builds.
This likely points to a larger problem in that we should probably not be
canceling those jobs so aggressively. However, the more serious and immediate
problem is the race condition that can cause items not to report.
To correct this immediate problem, tell the scheduler to create fake build
objects with a result of "CANCELED" when the pipeline manager cancels builds
and there is no existing build already. This will at least mean that all
expected builds are present regardless of whether the node request has been
fulfilled.
A later change can be made to avoid canceling jobs in the first place without
needing to change this behavior.
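The placeholder-build idea can be sketched as follows; the function
name and dict-based build set are hypothetical simplifications of the
scheduler's actual data structures:

```python
def cancel_build(builds, job_name):
    # Sketch (names hypothetical): when the manager cancels a job that
    # never started (e.g. its node request is unfulfilled), record a
    # fake build with result "CANCELED" so every expected build is
    # present and the item can still report.
    build = builds.get(job_name)
    if build is None:
        build = {"job": job_name, "result": "CANCELED", "fake": True}
        builds[job_name] = build
    else:
        build["result"] = "CANCELED"
    return build
```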
Change-Id: I1e1150ef67c03452b9a98f9366434c53a5ad26fb
Change I733e48127f2b1cf7d2d52153844098163e48bae8 removed ARA, which
was indirectly depending on netaddr. Without netaddr, Ansible IPv6
tasks will break, even though Ansible doesn't itself declare a
dependency on it.
Explicitly add netaddr to our Ansible venvs, so we can perform tasks
which require it.
Change-Id: Ic214377c3e50acc93c2a4a9e564818169b8e2552