Some of the lookup plugins access files on the executor host. Obviously
that's not what we want, so block them like we block action plugins.
password.py is banned, although it could be filtered. However, the
upstream code is fairly intense and slated for refactoring - so let's
wait until someone gets upset about it.
Change-Id: I6260b4658619a972b588c8bfba40ec33557bf2f6
One of the useful parts of the executor debug output is that if you have
the keep-jobdir flag on and are holding the node, you can run the command
again to debug any problems.
Unfortunately, to make this work you have to set the ANSIBLE_CONFIG
environment variable to point at the config file - and there is no CLI
equivalent for ANSIBLE_CONFIG. Include the config file in the debug
output and unquote the cmd for easier copying and pasting.
Change-Id: If968f8f0c20f0d043653e8c127603b1725875d89
We need to prefix our node IP address to the host_keys we get from
nodepool. Otherwise, ansible will fail to properly SSH to the node.
Change-Id: I400e2fdde507b742d133dc2b851a5d12686eb551
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Move merge scheduling to its own function in the pipeline manager,
and call it in the main _processOneItem loop once the item has
entered the active window, in addition to the previous location in
the executor when getting the layout information.
Since we are now scheduling merges for all items in any pipeline,
make sure we properly handle both Ref and Change objects.
Also, if the executor encounters a merger failure, immediately report
that result.
Change-Id: I1c9db6993994bf8e841ecd8554c37a3ec0afc798
Co-Authored-By: Adam Gandelman <adamg@ubuntu.com>
Story: 2000773
Task: 3468
PyYAML doesn't automatically use the much faster and more memory
efficient libyaml bindings, even if the extension is available. So we
provide our own module that exports the pieces needed to use the faster
implementation, falling back to the pure Python one when it is
unavailable.
Change-Id: I7ee99f5017cb83153ab8fa9bc23548ed639777c1
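The fallback pattern described here is typically a thin wrapper module; a
minimal sketch (names are illustrative, not necessarily those used in the
actual change):

```python
# Sketch of a libyaml-preferring YAML wrapper module: use the C-backed
# classes when available, otherwise the pure Python ones.
import yaml

try:
    # These C-backed classes only exist when PyYAML was built
    # against libyaml.
    from yaml import CSafeLoader as SafeLoader
    from yaml import CSafeDumper as SafeDumper
except ImportError:
    # Fall back to the pure Python implementation.
    from yaml import SafeLoader, SafeDumper


def safe_load(stream):
    return yaml.load(stream, Loader=SafeLoader)


def safe_dump(data, stream=None, **kwargs):
    return yaml.dump(data, stream, Dumper=SafeDumper, **kwargs)
```

Callers then import safe_load/safe_dump from this module instead of from
yaml directly, and get libyaml speed wherever the extension is present.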
With nodepool putting interface_ip into the node record, which has
already had the v4/v6 calculated, we should just consume that here for
the inventory building.
Change-Id: Ia64aa826ba8f3c54ca883a246163db2639471170
Depends-On: I2b4d992e3b21c00cefe98023267347c02dd961dc
It will be helpful to know which executor ansible-playbook is run
from, so pass this info into vars.yaml.
Additionally, update our test_v3 playbook test to also validate our
other executor ansible variables.
Change-Id: I22091c8e764ad519878e5d530e5bc72ffd2a4870
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Ansible gives users helpful warnings if they have a shell or command
invocation that seems similar to an existing ansible module. That's
great, but maybe not what we need in a CI system.
Change-Id: Ib0f901d9cdcfcbea14ea31f9f3142d31050b53c2
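Those warnings can be switched off with a single ansible.cfg setting; a
sketch of the fragment (assuming the config-file route rather than a
per-task override):

```ini
[defaults]
# A shell/command task that resembles an existing module is usually
# deliberate in a CI system, so skip the "consider using module X"
# warnings.
command_warnings = False
```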
Now that we have the host keys of a node in zookeeper, we can drop
ssh-keyscan from zuul and simply write the data to known_hosts.
Change-Id: I7a130a9cb47d0e248b7c5d5e3d576bee58a81d73
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
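Writing the entries amounts to prefixing each key with the node's
address; a hypothetical sketch (field and function names are
illustrative):

```python
def build_known_hosts(nodes):
    """Render known_hosts lines from node records.

    Each node record is assumed to carry the address ansible will
    connect to plus the raw host keys handed over by nodepool
    (e.g. 'ssh-rsa AAAA...').
    """
    lines = []
    for node in nodes:
        for key in node['host_keys']:
            # known_hosts format: "<host> <keytype> <base64-key>"
            lines.append('%s %s' % (node['interface_ip'], key))
    return '\n'.join(lines) + '\n'
```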
Because we want jobs to know something about the provider they are
running on, expose nodepool variables in the inventory file.
Change-Id: I18c8b414b1bbb114d55d21c5ae77d6348b3e9080
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
To avoid confusion with nodepool-launcher, we've decided to rename
zuul-launcher to zuul-executor.
Change-Id: I7d03cf0f0093400f4ba2e4beb1c92694224a3e8c
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Plumb through support for a timeout for jobs. By default, no timeout is
set, which means jobs can run forever.
Change-Id: Ice4fedffc6086676f54da0f06630a0ff7ad7d916
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
If a job configuration gives a list of repos, add them to the list of
projects to update on the slave.
Test using a mock openstack dsvm job which should clone both nova and
keystone. Put this new mock job in the check pipeline rather than the
gate pipeline to keep the build history small, and assert that both the
launcher and the worker have cloned the project that did not trigger
the job.
Change-Id: I3ccf8713906d65cbd27929548499e81f798cea82
This is step one towards supporting GOPATH natively. With this, one can
set GOPATH={work_dir} and go will find src that we've already checked
out, and will put artifacts in a sibling dir to src that is in the
GOPATH. We'll follow this up with putting the repos into
{src_dir}/{connection}/{full_repo} - such as
"git.openstack.org/openstack-infra/zuul"
Change-Id: I5acefd212587d18d0d3cc4ccd436555734e56e63
One of the main reasons we made the update thread in the launcher
was so that we could keep current copies of all the git repos
needed by jobs in the launcher, and then clone from there to the
jobdir so that we generally clone from cache. We weren't actually
doing that, but instead were cloning from the source for each job.
This changes the jobdir merger to clone from the cached repos.
Change-Id: I2be41424c26028068671aecec2520a4a6ad7ae66
We want things to execute in the context of the work dir, and for the
ansible config files not to be inside of it.
Change-Id: Ie299532a5c50048a0d4a0c90496996f8d0a1307b
Conceptually, we're trying to express whether we trust the authors of a
job or not, and whether or not we trust the job to be able to have
access to secrets or request plugin execution. Reading and writing
narrative text using the word secure for that starts to hurt the head
sometimes. Switch to trusted.
Change-Id: Ic6a9fe7406f808f965a0ed5ef099fdea92f52c25
This is basically the default callback plugin from ansible cleaned up
for pep8 and plumbed in. Changes we make here should directly affect
formatting and whatnot.
Change-Id: Ic807a80a13ad14e080e50307f6f78619235dfe7f
This way the pre-playbook that pushes the git repo onto the worker
knows where to find the git repos.
Change-Id: Id7c924540add867e720a77a019e86a842e27457b
When creating the inventory file, perform an ssh keyscan on each
host and add it to the known_hosts file for the job.
Change-Id: If4b362edf9a1ef280e1a8261b91a73ba92932d18
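A sketch of that keyscan step (helper names are hypothetical; the real
change lives in the inventory setup code):

```python
import subprocess


def parse_keyscan_output(output):
    """Drop blank lines and '#' comment lines, keeping only the
    known_hosts entries."""
    return [line for line in output.splitlines()
            if line and not line.startswith('#')]


def scan_host_keys(host):
    """Run ssh-keyscan against a host and return known_hosts lines."""
    out = subprocess.check_output(
        ['ssh-keyscan', host], stderr=subprocess.DEVNULL)
    return parse_keyscan_output(out.decode('utf-8'))
```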
There is weird black magic with action plugins and overriding them that
is causing carnage. Run everything in the secure context until we can
sort that out.
Change-Id: I45ae26a33c9fe0ff0a2e4f6f37fe47b22854ff35
Ansible has a config flag that is accessible from callback plugins that
is an indicator for whether or not to display args in the log output.
It should be noted that this can wind up being a general flag in our
callback plugin that will let us know if the job being run is secure or
insecure.
Change-Id: Ie0b45ca533e71610cc18950edd735dc3258bd604
This adds support for Ansible roles in Zuul-managed repos. It
is currently limited to repos within the same source, which is
something we should fix.
We also plan to add support for roles from Ansible Galaxy in a
future change.
Change-Id: I7af4dc1333db0dcb9d4a8318a4a95b9564cd1dd8
First, it was a little hard to follow what was going on here because
of the main -> required variable change. So let's call them both
'required'.
Second, it sure does look like we were saying that we required the
first instance of the main playbook to be present, but not the
pre and post playbooks. The opposite is true -- every instance
of pre and post playbooks is required, but we only require that
one of the main playbooks exists. Reverse the logic.
Change-Id: Ibb1ffb725e09b65bcfaac4878ae3d8284864f5fb
There are actions undertaken by action plugins in normal ansible that
allow for executing code on the host that ansible is executing on. We do
not want to allow that for untrusted code, so add a set of action
plugins that override the upstream ones and simply return errors.
Additionally, we can trap for attempts to execute local commands in the
normal action plugin by looking at remote_addr, connection and
delegate_to.
Change-Id: I57dbe5648a9dc6ec9147c8698ad46c4fa1326e5a
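The remote_addr/connection/delegate_to trap boils down to a check like
this (a pure-Python sketch with hypothetical names; the real version
sits inside the overridden action plugin):

```python
LOCAL_ADDRS = ('localhost', '127.0.0.1', '::1')


def is_local_execution(remote_addr, connection, delegate_to):
    """Return True if the task would run code on the host ansible is
    executing on, which untrusted jobs must not be allowed to do."""
    if connection == 'local':
        return True
    if remote_addr in LOCAL_ADDRS:
        return True
    if delegate_to is not None and delegate_to in LOCAL_ADDRS:
        return True
    return False
```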
This makes zuul-launcher run the actual zuul v3 launcher rather
than the zuul v2.5 ansible launcher. It also cleans up the
shutdown process a bit.
Change-Id: Iedb1c23c3cb46090859c520e10043f0b7083a862
An earlier change dealt with inheritance for pre and post playbooks;
they are nested so that parent job pre and post playbooks run first
and last respectively.
As for the actual playbook, since it's implied by the job name, it's
not clear whether it should be overridden or not. We could drop that
and say that if you specify a 'run' attribute, it means you want to
set the playbook for a job, but if you omit it, you want to use the
parent's playbook.
However, we could keep the implied playbook behavior by making the
'run' attribute a list and adding a playbook context to the list each
time a job is inherited. Then the launcher can walk the list in order
and the first playbook it finds, it runs.
This is what is implemented here.
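The walk can be sketched as (illustrative names; in the real change this
lives in the launcher):

```python
import os


def choose_playbook(run_list, root):
    """Walk the inherited 'run' list in order and return the first
    playbook that actually exists on disk, or None if no entry does."""
    for playbook in run_list:
        path = os.path.join(root, playbook)
        if os.path.exists(path):
            return path
    return None
```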
However, we need to restrict playbooks or other execution-related
job attributes from being overridden by out-of-repo variants (such
as the implicit variant which is created by every entry in a
project-pipeline). To do this, we make more of a distinction
between inheritance and variance, implementing each with its own
method on Job. This way we can better control when certain
attributes are allowed to be set. The 'final' job attribute is
added to indicate that a job should not accept any further
modifications to execution-related attributes.
The attribute storage in Job is altered so that each Job object
explicitly stores whether an attribute was set on it. This makes
it easier to start with a job and apply only the specified
attributes of each variant in turn. Default values are still
handled.
Essentially, each "job" appearance in the configuration will
create a new Job entry with exactly those attributes (with the
exception that a job where "parent" is set will first copy
attributes which are explicitly set on its parent).
When a job is frozen after an item is enqueued, the first
matching job is copied, and each subsequent matching job is
applied as a variant. When that is completed, if the job has
un-inheritable auth information, it is set as final, and then the
project-pipeline variant is applied.
New tests are added to exercise the new methods on Job.
Change-Id: Iaf6d661a7bd0085e55bc301f83fe158fd0a70166
In case a user runs a merger and a launcher on the same host, make
sure that they don't share a git directory (used by the launcher's
internal merger). They could end up colliding.
Incidentally, that's basically the configuration used in tests, so
update the test configuration likewise.
Change-Id: I64a690c706d00583973bd2d542a5f42ae6e9ef36
With this, the ansible launch server will automatically honor the
test root when it creates its tempdirs.
Change-Id: I4794c59cf7db63a992415a8933259bd0a2e4af54
Make sure all clients are identified.
Log the port on which the gearman server is listening in tests.
Log the arguments for the launch job.
Change-Id: Ia99ea5272241799aa8dd089bdb99f6058838ddff
Make sure that the launcher always completes a job even if it
encounters an exception when trying to launch it.
Change-Id: I4f691fb61d3fd54cab69d49bfdb3a82c6230c4ff
This allows jobs to specify pre and post playbooks. Jobs which inherit
from parents or variants add their pre and post playbooks to their
parents in onion fashion -- the outermost pre playbooks run first and post
playbooks run last.
Change-Id: Ic844dcac77d87481534745a220664d72be2ffa7c
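The onion nesting can be sketched as (hypothetical helper; the real
logic is part of job inheritance):

```python
def inherit_playbooks(parent_pre, parent_post, own_pre, own_post):
    """Nest pre/post playbooks onion-style: the parent's (outermost)
    pre playbooks run first and its post playbooks run last."""
    pre = list(parent_pre) + list(own_pre)
    post = list(own_post) + list(parent_post)
    return pre, post
```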
This adds methods for running pre and post playbooks. They are
not actually run yet.
The jobdir is no longer used as a context manager so that it can
be added as an attribute of the AnsibleJob. This makes it easier
to access from tests.
The way results are passed around inside the launcher is changed
to be more clear and to potentially allow for expansion in the
future.
The synthetic 'RUN_ERROR' result that test_rerun_on_error relied
upon is removed. In its place we simply set the requeue attribute
and check for a 'None' result. That is a simpler method of testing
the same thing (that the launcher failed to get a result from the
main body of the job).
Change-Id: I335807576ffb76600ed8a3ac2355a8b5f8729240
Help keep the state of each job the launcher is managing in its own
class. This will make stopping, pre/post playbooks and handling
failures easier.
Change-Id: I8fe77025ca443adcc5c8ca61f3a6b3abde0ba690
Detecting whether to look for a .yaml or .yml file for a playbook,
and detecting whether that playbook exists would ideally be done
in the scheduler. However, doing so involves passing and storing
quite a bit of extra data (file lists for each project-branch
combination), and might put a crimp in making playbook specification
more sophisticated later. So for now, let's do it in the launcher
where it's easy for us to test for the presence of files right
before we run a job. Even though that means we won't detect errors
until later, for many changes this will still be self testing and
should prevent many config errors from landing.
If need be, we can do the extra work to move it into the scheduler
later.
Change-Id: I1ad2eb4a5d0ff08fbd2070f55e352633dd6de81b
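The extension check in the launcher can be as simple as this sketch (the
names, and the .yaml-before-.yml preference, are assumptions):

```python
import os


def find_playbook(path_without_ext):
    """Given a playbook path without its extension, return the
    existing .yaml or .yml variant (trying .yaml first), or None
    if neither file is present."""
    for ext in ('.yaml', '.yml'):
        candidate = path_without_ext + ext
        if os.path.exists(candidate):
            return candidate
    return None
```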
This adds a vars file for use by ansible playbooks with all vars
scoped under 'zuul'. The existing environment variables will be
moved into this section in a later change. Currently, the only
supplied variable is 'uuid' for use in a test.
Also, add a test-specific vars entry (zuul._test) so that we can
pass information such as the test chroot into playbooks used in
our tests. This is used by a test to set a flag file inside of
the test chroot to verify that the ansible playbook in a job
actually ran and did something.
Change-Id: Ie5d950d051accad4ec9dc90a9e1b01b3095a1e5c
Co-Authored-By: Monty Taylor <mordred@inaugust.com>
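The resulting vars file might look like this (a sketch; only 'uuid' and
the test-only '_test' section are described here, the key names under
'_test' are hypothetical, and all values are placeholders):

```yaml
# Shape of the vars file handed to the playbook run; all values
# are illustrative placeholders.
zuul:
  uuid: 0123456789abcdef0123456789abcdef
  _test:
    test_root: /path/to/test/chroot
```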
This replaces the stubbed-out 'hello world' Ansible playbook with
an implementation which actually runs the corresponding playbook
defined in the repo where the job is defined.
Change-Id: I73a6b3b067c7d61bb2a2b2140ab98c4944a6adfe
Story: 2000772
These are emitted when the command socket is being shut down and are
only used to wake the thread; they should not be accepted as commands.
Change-Id: I0e7b30cbe60f5a96daec3697f24b9a97516027e2
In coercing this test to run, it's clear there are a number of "TODO"
items in zuul.launcher.server.LaunchServer that seem simple enough but
that I don't really understand. I've filled in enough to make the test
pass, but I am not at all confident that this is actually the way we
want this to work long-term.
get_running_jobs also had to be extended to add tenants. I have to
wonder if we should change the payload to return the tenants somehow.
Change-Id: If91cb662ceef8e1d7660974df07b821720f210d4
Story: 2000773
Task: 3414
This test is making sure that when something terrible goes wrong in the
launcher and it just returns 'None', the scheduler retries.
During refactoring of the launcher, some of the logic that handled this
special case of run_error was removed. Also the launcher wasn't properly
handling a return of None and would have failed with a TypeError instead
of sending a failure to the zuul client.
Change-Id: I6b063ba913bf72087d2cc027f08e02304310c2be
Story: 2000773
Task: 3403
In all cases, the launcher-merger updates the repos involved in a
job before running it. If there are pre-merge changes, it then
merges those changes into the repos. If the job does not involve
pre-merge changes, then nothing further needs to happen. Avoid
attempting to merge changes which are already merged in this case.
Change-Id: Ie0c0d258b4edad4afc3b569f8ea222523bc769c1
This brings forward most of our improvements related to executing
ansible, and also the command socket processor. It includes
several TODO notes for items which are not straightforward
translations.
Change-Id: Id0c58f7f2e3f78e1edf3d373b65b564568e52f1f
This fleshes out the nodepool stub a little more. It passes some
node information to the launcher after node requests have been
fulfilled. It also corrects some logic errors in the node request
framework. It moves data structures related to node requests into
the model. Finally, it adds nodes to the configuration of some
tests to exercise the system, and adds a test to verify the correct
node is supplied on a job that has a branch variant.
Change-Id: I395ce23ae865df3a55436ee92d04e0eae07c963a
The scheduler tests need to be able to abort jobs. Even though
our Ansible launcher is not capable of aborting jobs yet, implement
the method so that we can override it in the RecordingLaunchServer
and exercise the scheduler.
Instead of using name+number to specify which build to abort, use
the UUID of the build.
Remove the build number entirely, since it isn't required. In some
places it was used to determine whether the job had started. In those
cases, use the build URL instead.
Also correct the launch server used in tests so that it does not send
two initial WORK_DATA packets.
Change-Id: I75525a2b48eb2761d599c039ba3084f09609dfbe