If ansible can't parse the yaml, it dies immediately with return code 4.
This happens before callback plugins, so nothing gets captured to
the log. Save the first 200 lines in case the result is a syntax
error. If we detect one, append them directly to the job's log file.
Change-Id: I059a61357c471f61485f141f92c35b2bcf2d168d
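The capture-and-append idea above can be sketched roughly as follows (names and structure are illustrative, not Zuul's actual API):

```python
# Keep the first lines of ansible-playbook output so a syntax error
# (return code 4, which occurs before callback plugins run) can be
# written to the job's log file by hand.

BUFFER_LINES = 200

def run_playbook(proc, job_log_path):
    first_lines = []
    for line in proc.stdout:
        if len(first_lines) < BUFFER_LINES:
            first_lines.append(line)
    ret = proc.wait()
    if ret == 4:
        # Callback plugins never ran, so append the captured
        # output directly to the job's log file.
        with open(job_log_path, 'a') as log:
            log.writelines(first_lines)
    return ret
```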
We support configuring an alternate port for finger. Make sure that
port makes its way into the URL we provide when it is configured.
Change-Id: I5f511e15c031755d5c90627830ed29b80c6285fd
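A minimal sketch of the URL construction, assuming the standard finger port of 79 as the default (function name and URL shape are illustrative):

```python
# Include the configured finger port in the streaming URL only when
# it differs from the protocol default.

DEFAULT_FINGER_PORT = 79

def finger_url(host, build_uuid, port=DEFAULT_FINGER_PORT):
    if port != DEFAULT_FINGER_PORT:
        return 'finger://%s:%s/%s' % (host, port, build_uuid)
    return 'finger://%s/%s' % (host, build_uuid)
```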
The word "manager" in zuul primarily means "PipelineManager" and is
confusing here. We also send back "worker_name" which is intended to be
the unique name for the worker.
For now, send back hostname for the worker_name. In the future, we can
update this to be overridable per-executor in the config.
While we're at it, remove no-longer valid references to other worker
data from the docs.
Change-Id: Ibe5cf7295f133c9dc48162b40ae1f625a19643dc
This lets things like ~.ssh/known_hosts work as expected.
Change-Id: Id2ea5c672135dbf8314a8a07efe4e48fb23fb22f
Co-Authored-By: Paul Belanger <pabelanger@redhat.com>
It's difficult to start zuul in the normal way with an extra
argument for debugging. Make this an IPC accessible toggle.
The argument is kept for the time being because it's used by the
unit tests, and it's the only way to specify the value on startup,
but we should replace it later with a config file setting.
Change-Id: Ia59a9383fcf90a00e1475977629b7d71d3a40cb0
These repos may be used by the job, but may not show up in the
dependency chain, or as required-projects for the job. To make
sure that the executor always runs the most recent content,
add them to the list of projects to update before running a job.
Change-Id: Ia6c454e52f0ecb8b6d1b80124692ab7d63f81bd1
It exists only for py2/py3 compat. We do not need it any more.
This will explicitly break Zuul v3 for python2, which is different than
simply ceasing to test it and no longer declaring we support it. Since
we're not testing it any longer, it's bound to degrade over time without
us noticing, so hopefully a clean and explicit break will prevent people
from running under python2 and it working for a minute, then breaking
later.
Change-Id: Ia16bb399a2869ab37a183f3f2197275bb3acafee
This change adds 'ssh_port' to the Node class so that zuul-executor can
use a custom ssh server when it is set by the nodepool provider.
Change-Id: Icdac6cd41dded0e46fba5d14c31f40810f73b74a
This change renames untrusted_wrapper to execution_wrapper and uses
bubblewrap for both trusted and untrusted playbooks by default.
This change adds new options to the zuul.conf executor section to let
operators define what directories to mount ro or rw for both contexts:
* trusted_ro_dirs/trusted_rw_dirs, and
* untrusted_ro_dirs/untrusted_rw_dirs
Change-Id: I9a8a74a338a8a837913db5e2effeef1bd949a49c
Story: 2001070
Task: 4687
This change adds a new get_default library procedure to simplify getting
the default value of a config object.
Change-Id: I0546b1175b259472a10690273af611ef4bad5a99
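A minimal version of the helper described above, assuming a configparser.ConfigParser-style config object (the exact signature is a sketch, not necessarily Zuul's):

```python
def get_default(config, section, option, default=None):
    """Return config[section][option], or default if unset."""
    if config.has_section(section) and config.has_option(section, option):
        return config.get(section, option)
    return default
```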
Because we already have the known_hosts for all nodes we are planning
to use for a job, it makes sense to prime bubblewrap with this
information. This is specifically needed if using the verify_host
field in synchronize, because it relies on the ssh client knowing how
to read the default known_hosts file.
Change-Id: Ifdb3ac1eb7443beacb9277b5749d773b0c6aa4ad
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
The new upcoming log streaming will be based on a protocol nearly
identical to finger. So much so, that it's actually finger. Isn't that
cool?
Change-Id: I51ef51ee236227e7816effe6683733ae3f29750a
To try to understand the delay after a playbook completes, add
some debug entries, some of which may be useful on their own.
Change-Id: I7a207574c333aceb0d4d7f028ada0eb10cfdc8b7
Now that we have SSH Agents setup, we no longer need to reference our
private key directly.
Change-Id: I15b0127e97214b330ff61353729ac7053cad124f
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Enable SSL support for gearman. We also created a new SSLZuulBaseTest
class to provide a simple way to use SSL end to end where possible. A
future patch will enable support in zookeeper.
Change-Id: Ia8b89bab475d758cc6a021988f8d79ead8836a9d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This moves our nodepool variables more in line with how we handle our
zuul ansible variables. Both will now be a dict.
Change-Id: I069203328cce0bc2d4bf31f31351209bf2b6cb5a
Depends-On: Ia13e6e9e89d24ac3c9c62a0286fba0279b5408b3
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
If we split the string then we end up with a list, which will never work
when looking up a driver. I assume this was supposed to have been strip
initially.
Change-Id: Ib6b3990743981853c8369ee6db0ef606fca3a5f0
Signed-off-by: Jamie Lennox <jamielennox@gmail.com>
ansible-playbook uses basicConfig at the top of the command, which
essentially takes over everything if you send things to log_path.
Instead, go ahead and log things ourselves so that our log lines can all
look consistent (and so that we can easily html-ify them). We pass in
the location via an environment variable since we're not using the
log_dir option in ansible.cfg.
As of this patch, there will be less output. A follow up will add back
the missing callback plugin bits.
Change-Id: Ic3ff57bba7a3a23dc5d0055e8e9888f24641f7d5
This was testing some things that seemingly weren't even plumbed through
in Zuul v2.5 except in the fake build. Remove them from the model
and supporting commands as well. We can always add back in the things we
need.
Change-Id: I47ee260e2e0a1cb5350b2f22a9b4c61dd1521aae
Story: 2000773
Task: 4617
When starting executor a second time, a traceback was happening because
the destination for the ansible files already existed. shutil.copytree
requires that the destination path does not exist. Since what we seem to
want is a fresh copy of the ansible files at start up, and no left over
cruft, removing the tree seems correct.
Change-Id: I92418f0e6233558f8aaee796a679d638f7c4c4c7
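The fix can be sketched as follows (function name is illustrative); since shutil.copytree refuses to write into an existing destination, a restart must clear the old tree first:

```python
import os
import shutil

def install_ansible_files(source_dir, target_dir):
    # shutil.copytree requires that the destination not exist, so
    # remove any copy left over from a previous start-up first.
    if os.path.exists(target_dir):
        shutil.rmtree(target_dir)
    shutil.copytree(source_dir, target_dir)
```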
Instead of using a vars.yaml file and a -e argument, just put the
variables into the all group in the inventory file. One less file to
manage, and a single file to look at for debugging.
Change-Id: I5b1f149ecca649b1434488392cc8232de20cd4fc
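The combined structure might look like this (a sketch; the dict shape mirrors Ansible's YAML inventory format, and the field names are illustrative):

```python
# Instead of a separate vars.yaml passed with -e, the variables live
# under the 'all' group of the single inventory file.

def make_inventory(nodes, zuul_vars):
    return {
        'all': {
            'hosts': {n['name']: {'ansible_host': n['ip']} for n in nodes},
            'vars': zuul_vars,
        }
    }
```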
All of the zuul config files are yaml. Ansible supports yaml formatted
inventory files. If we use yaml, then displaying inventories to users
in raw log files should be easier to read for non-Ansible users.
Variables are set under the hostname, rather than as key=value strings
on the same line. We can also stop writing the vars file and just add
the variables to the top level vars of the same inventory file, but
let's do that next.
Construct the dict in a standalone function rather than a method to
allow for easy interactive unit testing, as context isn't needed to
validate inputs and outputs.
Change-Id: Ife7a909ea0e54015bddbf5426343dd5c5911953c
If users are expected to be able to use ansible content written for
production, it is important to be able to define arbitrary groups of
nodes in their inventory. For instance, a playbook to deploy OpenStack
may want groups called controller, compute, ceph-osd and ceph-monitor,
but a job to test that playbook may want three nodes, one called
compute, one called controller1 and one called controller2. For the test
job, I would want to put controller1 in the ceph-osd group and
controller1 and controller2 in the ceph-monitor group.
nodepool does not need to know anything about these - they are just
logical names the user is describing to make it into the inventory.
There are currently no tests of the inventory we're writing out. The
next patch adds a test to ensure that inventories are written out
properly.
Change-Id: I5555c86ffa96e6a43df5e46302f4e76840372999
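Expanding those logical groups into the inventory dict could look like this (a sketch with assumed names; nodepool knows nothing about these groups):

```python
# groups: e.g. [{'name': 'ceph-monitor',
#                'nodes': ['controller1', 'controller2']}]
# Each group becomes an inventory section listing its member hosts.

def add_groups(inventory, groups):
    for group in groups:
        inventory[group['name']] = {
            'hosts': {node: None for node in group['nodes']},
        }
    return inventory
```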
The executor maintains a set of all of the repos known to Zuul.
This is primarily for the purpose of having a local cache from which
to clone the repos which are used in jobs. To that end, before each
job starts, it submits a request to a queue to update all the repos
in the job. This makes sure they are up to date before being cloned
for the job.
Since we have that set of repos handy, we can also use them to perform
merge operations. In other words, the executor can also act as a
merger. This can be useful under heavy load, or in the case of a
very simple Zuul installation which has no other merger.
However, merge and update operations must not run at the same time, as
simultaneous access to the git repo may cause errors. To that end,
set up a mutex around merge jobs and update tasks.
Since the primary purpose of the repos is to perform update tasks,
create a special gearman worker for the merge tasks so that we
can de-prioritize them in the executor. In this, we delay response
to a NOOP packet until the update queue is empty. That means that
if the gearman server notifies us that a merge job is ready, we
won't grab it unless our merger is otherwise idle (we can still
race here and get an update task between NOOP and GRAB_JOB, but
there's little we can do about that, and the worst case is that
we briefly delay a merge job).
Since the executor jobs are now in a different worker, their NOOP/
GRAB_JOB cycle is unimpeded, even if the merge worker is waiting.
Change-Id: Icf3663b1a2ce5309e496b1106d5adee6579e37c7
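The exclusion described above amounts to a single lock shared by merge jobs and update tasks (a minimal sketch; the classes and method names are assumed, not Zuul's actual ones):

```python
import threading

# Merge jobs and update tasks share one lock so they never touch
# the same git repos at the same time.
repo_lock = threading.Lock()

def run_update_task(repo):
    with repo_lock:
        repo.update()

def run_merge_job(repo, items):
    with repo_lock:
        return repo.merge(items)
```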
On startup the executor copies the ansible modules into its state
dir. In case of restarting the executor it could happen that it tries
to copy the __pycache__ folder which fails with [1]. This can be fixed
by not copying it as it's not needed.
[1] Example error trace:
Traceback (most recent call last):
File "/usr/bin/zuul-executor", line 10, in <module>
sys.exit(main())
File "/opt/zuul/lib/python3.5/site-packages/zuul/cmd/executor.py", line 170, in main
server.main(False)
File "/opt/zuul/lib/python3.5/site-packages/zuul/cmd/executor.py", line 136, in main
keep_jobdir=self.args.keep_jobdir)
File "/opt/zuul/lib/python3.5/site-packages/zuul/executor/server.py", line 312, in __init__
_copy_ansible_files(zuul.ansible.library, self.library_dir)
File "/opt/zuul/lib/python3.5/site-packages/zuul/executor/server.py", line 240, in _copy_ansible_files
shutil.copytree(full_path, os.path.join(target_dir, fn))
File "/usr/lib/python3.5/shutil.py", line 309, in copytree
os.makedirs(dst)
File "/usr/lib/python3.5/os.py", line 241, in makedirs
mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/mnt/zuul/state/ansible/library/__pycache__'
Change-Id: I1ed334d67ee59e2f9157eca34c9376f7af9ea457
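shutil.copytree accepts an ignore callback for exactly this; a sketch of the fix (function name illustrative):

```python
import shutil

def copy_ansible_files(source_dir, target_dir):
    # Skip __pycache__ so a restart doesn't trip over directories
    # created by a previous run.
    shutil.copytree(source_dir, target_dir,
                    ignore=shutil.ignore_patterns('__pycache__'))
```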
This way we can bind mount it in and set it as a plugin path in bwrap
without needing all of zuul.
Change-Id: Ibb81167895c73b64bb49809d007f4768013c7220
Co-Authored-By: James E. Blair <jeblair@redhat.com>
Co-Authored-By: Monty Taylor <mordred@inaugust.com>
This will be the minimum "batteries included" bubblewrap driver. It does
not do any MAC configuration, since these vary by system. Operators
may wish to wrap it further in a MAC wrapper driver.
Because we set bubblewrap as the default wrapper, test_playbooks tests
it. However, it lacks a negative test, so we won't know if we're not
actually containing things.
Users who don't have bubblewrap or don't wish to use it can set the
untrusted_wrapper to 'nullwrap' which will just execute things as
they're done before this change.
Change-Id: I84dd7c8cc55d2110b58609784007ffda0d135716
Story: 2000910
Task: 3540
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This lets us use the build UUID as the job temp dir name
so that the finger log streamer can search the job root dir
to find the requested log by build ID.
Also, since the JobDir parameters aren't really optional, just
make that explicit.
Change-Id: Ifeb9f37c6c9c1ce792079e63a6c507461081f03c
Create a logger adapter for executor jobs that ensures that every
log entry has the job uuid associated with it. Pass this into
the associated mergers as well, since they are also performing
interesting tasks in parallel on the executor.
The uuid is also added to the extras dict so that it's available
to logging systems which can handle extra data.
Change-Id: I28e0bcd0f030361659e5a72c162d549c1f0d6acb
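A minimal adapter of the kind described, using the standard library's logging.LoggerAdapter (the prefix format and key names are illustrative):

```python
import logging

class BuildLogAdapter(logging.LoggerAdapter):
    """Prefix every message with the build UUID and also expose it
    via the extra dict for logging systems that handle extra data."""

    def process(self, msg, kwargs):
        build = self.extra['build']
        kwargs.setdefault('extra', {})['build'] = build
        return '[build: %s] %s' % (build, msg), kwargs
```

Usage would be something like `log = BuildLogAdapter(logging.getLogger('zuul.Executor'), {'build': uuid})`.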
Debugging some issues revealed a problem in the __eq__ method that is
patched here. This produced some red herring backtraces unnecessarily.
It's worth noting that close_fds on this subprocess.Popen call is
critical to the health of any other processes being spawned from Zuul.
Without it, git processes run by the git module went defunct and locked
things up in weird ways.
Change-Id: I6875568f4b7ccf261491c45086727250e58f5ed8
This mimics the behavior of Zuul cloner and allows the user to
specify what branches of each repo should be checked out in the
jobdir at the start of the job.
(Of course, the job is free to check out other branches as needed;
all of them will have the appropriate future state.)
Change-Id: I93af5c49cb0404944636c7e63d203cdb564b267c
Currently, we clone the job's working repos from the repo cache that
the executor maintains. However, some playbook and role repos do
not use that cache.
To facilitate more operations using the merger class, add the concept
of a repo cache to it, and use that in all job-related git repo operations
on the executor.
Change-Id: I747b6602540458506e1f6d480a95b80c6543c5a8
Currently the results (i.e., the exact commits on each project-branch)
of a speculative merge operation are only available by examining the
Zuul refs created by the merger. The executor appears to use the results
of the speculative merge in its tests, but that's only surface deep --
it simply leaves the repo in the last merge state, which is sufficient
for simple tests, but not for multi-branch cross-repo dependencies.
This change returns the branch names and commits for each project-branch
involved in a speculative merge. The executor then updates the branch
head references in the job's working source directory to reflect the
merger results. In other words, even if the job is running for a change
on master, if that change depends on a change in stable then the stable
branch of the repo will now include the stable change.
Note: this change does not yet alter the branch checked out -- we're still
on a detached head. That will be addressed in a future change.
Change-Id: Id842d64312d87709f0ed93121735fe97faccc189
When the initial speculative merge for a change is performed at
the request of the pipeline manager, the repo state used to
construct that merge is saved in a data structure. Pass that
structure to the executor when running jobs so that, after cloning
each repo into the jobdir, the repos are made to appear the same
as those on the merger before it started its merge. The subsequent
merge operations on the executor will repeat the same operations
producing the same content (though the actual commits will be
different due to timestamps).
It would be more efficient to have the executors pull changes from
the mergers, however, that would require the mergers to run an
accessible git service, which is one of the things that adds
significant complexity to a zuul deployment. This method only
requires that the mergers be able to initiate outgoing connections
to gearman and sources.
Because the initial merge may happen well before jobs are executed,
save the dependency chain for a given BuildSet when its configuration
is being finalized. This will cause us to save not only the repository
configuration that the merger uses, but also the exact sequence of
changes applied on top of that state. (Currently, we build the series
of changes we apply before running each job, however, the queue state
can change (especially if items are merged) in the period between the
initial merge and job launch).
The initial merge is performed before we have a shadow layout for the
item, yet, we must specify a merge mode for each project for which we
merge a change. Currently, we are defaulting to the 'merge-resolve'
merge mode for every project during the initial speculative merge, but
then the secondary merge on the executor will use the correct merge
mode since we have a layout at that point. With this change, where
we are trying to replicate the initial merge exactly, we can't rely
on that behavior any more. Instead, when attempting to find the merge
mode to use for a project, we use the shadow layout of the nearest
item ahead, or else the current live layout, to find the merge mode,
and only if those fail, do we use the default. This means that a change
to a project's merge-mode will not use that merge mode. However,
subsequent changes will. This seems to be the best we can do, short
of detecting this case and merging such changes twice. This seems
rare enough that we don't need to do that.
The test_delayed_merge_conflict method is updated to essentially invert
the meaning of the test. Since the old behavior was for the initial
merge check to be completely independent of the executor merge, this
test examined the case where the initial merge worked but between that
time and when the executor performed its merge, a conflicting change
landed. That should no longer be possible since the executor merge
now uses the results of the initial merge. We keep the test, but invert
its final assertion -- instead of checking for a merge conflict being
reported, we check that no merge conflict is reported.
Change-Id: I34cd58ec9775c1d151db02034c342bd971af036f
When we ask a merger to speculatively merge changes, record the
complete starting state of each repo (defined as all of the refs
other than Zuul refs) and return that at the completion of all
of the merges.
This will later be used so that when a pipeline manager asks a
merger to speculatively merge a change, the process can later
be repeated by the (potentially multiple) executors which will
end up running jobs for the change. Between the time that the
merger runs and the jobs run, the underlying repos may have changed.
This ensures a consistent state throughout.
The facility which used saved zuul refs within the merger repo
to short-cut the merge sequence for an additional change added to
a previously completed merge sequence is removed, because in that
case, we would not be able to know the original repo state for the
earlier merge sequence. This is slightly less efficient; however,
we are proposing removing zuul refs anyway due to the maintenance
burden they cause.
Change-Id: If0215d53c3b08877ded7276955a55fc5e617b244
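Capturing the starting state amounts to snapshotting every ref except Zuul's own; one way to do it, shown here with the `git for-each-ref` CLI rather than Zuul's actual merger code:

```python
import subprocess

def save_repo_state(path):
    # Record every ref except refs/zuul/* as {ref: sha}, so the same
    # starting point can be recreated on each executor later.
    out = subprocess.check_output(
        ['git', 'for-each-ref', '--format=%(refname) %(objectname)'],
        cwd=path).decode('utf-8')
    state = {}
    for line in out.splitlines():
        ref, sha = line.split()
        if not ref.startswith('refs/zuul/'):
            state[ref] = sha
    return state
```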
This makes the transition to python3 much smoother.
Change-Id: I9d8638dd98502bdd91cbe6caf3d94ce197f06c6f
Depends-On: If6bfc35d916cfb84d630af59f4fde4ccae5187d4
Depends-On: I93bfe33f898294f30a82c0a24a18a081f9752354
The logic here has grown enough that a series of copypasta is now fairly
beefy. Not that beef in pasta is bad or anything, but maybe let's do a
utility function.
Change-Id: Ibd91acac936f3dd52ccd57816de167e1487499ed
Some clouds have availability zones with spaces in their
names. Currently zuul generates something like this as ansible
inventory:
node nodepool_region=None (...) nodepool_az=Failure domain 1
This breaks ansible when trying to read the inventory. Quoting the
inventory variable values solves this issue.
Change-Id: I3e97f40986689b3779efc448eb0d5f1db009e796
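For the old key=value inventory lines, the fix is simply to quote each value (a sketch with an assumed helper name):

```python
# Wrap each value in quotes so spaces (e.g. an availability zone
# named "Failure domain 1") survive Ansible's inventory parsing.

def format_host_vars(host, host_vars):
    pairs = " ".join('%s="%s"' % (k, v)
                     for k, v in sorted(host_vars.items()))
    return "%s %s" % (host, pairs)
```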
When the executor checks out and runs content, the security context
(trusted or untrusted) comes in to play in two ways: whether
speculative merging should be used when checking out the content
and the level of access to ansible.
This is straightforward for playbooks: when running an untrusted
playbook, use the speculatively merged repo and the untrusted
ansible environment. When running a trusted playbook, only use
the branch tip of the playbook's repo, and the trusted ansible
environment.
When we consider roles, we also need to consider whether to use
the speculatively merged role repo, or the branch tip. The current
code uses the security context of the role repo to decide which to
do (untrusted role repo uses the speculatively merged repo,
trusted role repo uses branch tip). However this presents a problem.
Consider a job defined in a trusted repo which uses a role defined
in an untrusted repo. The playbook will be run in the trusted
execution context, but the role it depends on will come from the
speculatively-merged role repo. This means a user could propose a
change which Depends-On a change to the role repo and cause mischief.
The author of the job in the trusted repo should be able to rely on
the fact that when that job runs, both the playbook and the roles
used by that playbook only contain code that is actually in the
respective repositories. Likewise, should a job in an untrusted
repo inherit from that job, when its playbook runs, it should be
able to use a speculative change to the role repo.
In short, when we are running a trusted playbook, we should use the
branch tip of all role repos used by that playbook. And when we run
an untrusted playbook, we should use any speculatively merged changes
to those roles.
Since we can run both kinds of playbooks in a single job, this change
prepares roles in both manners, if necessary. If any playbook run by
the job is untrusted, we will prepare the speculatively-merged repo
as a role. If any playbook is trusted (or the role does not appear in
the dependency chain for the change), we prepare the branch tip of the
role repo. When we run the playbooks, we use the appropriate version
of each role based on the security context of the playbook.
Change-Id: I06dd3851a8f805dba9afe1b4a0eaa1b2fdd4efa2
Fully qualify projects in the merger with connection names.
This lets us drop the URL parameter (which always seemed
unnecessary, as the merger can figure that out on its own given a
uniquely identified project).
On disk, use the canonical hostname, so that the checked out
versions of repositories include the canonical hostname, and so that
repos on mergers survive changes in connection names.
This simplifies both the API and the JSON data structure passed to
the merger.
The addProject method of the merger is flagged as an internal method
now, as all "public" API methods indirectly call it.
In the executor, after cloning and merging are completed, the 'origin'
remote is removed from the resulting repositories since it may not
be valid for use within a running job.
Change-Id: Idcc9808948b018a271b32492766a96876979d1fa
This corrects a logic error where timed out builds were recorded as
ABORTED. They should be recorded as TIMED_OUT instead. Update the tests
which asserted the incorrect result as well.
Change-Id: Ibc07a87a42dbd8de3ae78dfeedc5c8260c9c0153
The watchdog threads check their status every 10 seconds which means
they can only exit as quickly as once every 10 seconds. This results in
the threads appearing to be leaked by tests but they should go away in
no more than 10 seconds.
Note that we don't reduce this poll time as there could be many watchdog
threads running during normal zuul execution and typically 10 second
resolution there is plenty. We just have to accommodate that in our
tests.
Change-Id: If3cf86c4af0b2fbcaf51a233c766c75749ca1d1d
Sometimes we need to log in to a nodepool node using a username of
something other than zuul. This used to be possible by setting the
[launcher] username= property. Re-enable it with the [executor]
default_username= property.
default_username is used instead of simply username because it is
likely that in the future this information will be supplied by nodepool
or another source in a node- or image-specific way. At that point, that
information will take priority over the default specified in zuul.
Change-Id: Icf657b4f0bbe34e182307b9eea0cd64a8d813464
If the executor stops while jobs are running, those jobs are not
explicitly aborted. In production, the process exit would cause
all of the jobs to terminate and the gearman disconnection would
report a failure, however, in tests the python process may continue
and the ansible threads would essentially leak into a subsequent
test. This is especially likely to happen if a test holds jobs in
build, and then fails while those jobs are still held. Those threads
will continue to wait to be released while further tests continue
to run. Because all tests assert that git.Repo objects are not
leaked, the outstanding reference that the leaked threads have
to a git.Repo object trips that assertion and all subsequent tests
in the same test runner fail.
This adds code to the executor shutdown to stop all jobs at the start
of the shutdown process. It also adds a test which shuts down the
executor while jobs are held and asserts that after shutdown, those
threads are stopped, and no git repo objects are leaked.
Change-Id: I9d73775a13c289ef922c27b29162efcfca3950a9