This will prevent the executor from taking on too many jobs at once,
which could cause the system to become unstable. It adds a config
option to allow scaling based on the observed capability of the
system, since load average can be subject to factors outside of
Zuul's control. The default of 2.5 was selected based on highly
scientific observation of a single graph from OpenStack infra's Cacti
that looks about right.
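A rough sketch of the kind of check involved; the option name
load_multiplier and the helper are illustrative, not the actual
implementation:

    import multiprocessing
    import os

    def ok_to_accept_more_jobs(load_multiplier=2.5):
        # Stop registering for new jobs once the 1-minute load average
        # exceeds load_multiplier times the number of CPUs.
        load_avg = os.getloadavg()[0]
        return load_avg < load_multiplier * multiprocessing.cpu_count()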
Change-Id: I346a92fe7185d7ef216cd303e19acaa59c6d6101
This is something we had in Zuul v2.5 but which slipped through in v3.
Executors delay taking new jobs from Gearman in proportion to the
number of jobs they are already running, so as to avoid a single
executor ending up running all of the jobs.
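A hypothetical illustration of the idea (the delay constant and the
helper name are made up):

    import time

    def delay_before_registering(running_builds, delay_per_build=0.5):
        # The more builds this executor is already running, the longer
        # it waits before asking Gearman for another job, so idle
        # executors tend to pick up new work first.
        time.sleep(running_builds * delay_per_build)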
Change-Id: I75b87c09d507ee7c4785acf52e75e918bb0ab16f
Run the Ansible 'setup' module on all hosts in the inventory at the
start of the job, with a 60-second timeout. If we aren't able to
connect to all of the hosts and gather facts within that timeout,
there is likely a network problem between here and the hosts in the
inventory; return the nodes and reschedule the job.
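A minimal sketch of the connectivity check, assuming we shell out to
the ansible CLI against the job inventory (names and handling here
are illustrative only):

    import subprocess

    def hosts_reachable(inventory_path, timeout=60):
        # Run the 'setup' module against every host; a failure or a
        # timeout suggests a network problem, in which case the caller
        # should return the nodes and reschedule the job.
        try:
            proc = subprocess.run(
                ['ansible', 'all', '-i', inventory_path, '-m', 'setup'],
                timeout=timeout)
        except subprocess.TimeoutExpired:
            return False
        return proc.returncode == 0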
Change-Id: I121514aa1f3c9ac1a664322c8d7624703146f52f
We only get messages in the debug log on unexpected exceptions. Add a
message to the job log as well, so that people know there was a
problem.
Change-Id: I4cb6c5d22724b55c265577dffa581a3c8e74ec25
Scope our 'success' variable from previous runs as zuul_success.
Change-Id: I96af116b9a830e5d67a9b4fd7fa7451620ad3b16
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We log Ansible parse errors specially, but the current loop produces
a bunch of lines prefixed with b'' (undecoded bytes), which isn't as
nice looking as it should be.
Change-Id: I3e62e6a938422ee02099b2dfe3e1350311c722b4
This is needed for (at least) the log processor jobs, which
communicate information about which labels are used.
Change-Id: I20dc54fcd56c3dfebabe4b7900ac06b729582a83
When running or debugging the tests in PyCharm, the logconfig module
isn't found, which leads to broken test cases [1]. Running in tox
seems to work, though. Changing the import to zuul.ansible.logconfig
fixes this for both.
[1] AttributeError: module 'zuul.ansible' has no attribute 'logconfig'
Change-Id: I38f624de8338fa8bdc8fd5fa58bf0b8859fbcf4a
Sending SIGTERM to the ssh-agent is a standard part of shutting down
the executor. It stands out in the logs when everything else is at
debug, because a TERM signal looks more important than it is. This is
not something that needs to be looked at, so it should just be logged
at debug.
Change-Id: I31f15cc24bb185a0f95ac9d1c30ba852fd5cbd53
Repos can contain secrets, which shouldn't be logged.
Change-Id: I579201beab67ee6d8180596d76eeb512ca2d0410
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We need to pass a working logging config to zuul_stream and ara so
that alembic migrations don't step on pre-playbook output.
Write a logging config as JSON, then pass its location in environment
variables so that zuul_stream and ara can pick it up and pass it to
dictConfig.
In support of this, create a LoggingConfig class so that we don't
have to copy key names and logic between executor.server, zuul_stream
and zuul_json. Since we have one, go ahead and use it for the server
logging config too, providing a slightly richer default logging
config for folks who don't provide a logging config file of their
own.
The log config processing has to go into zuul.ansible because it is
needed inside the bubblewrap environment and would not otherwise be
on the Python path.
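A simplified sketch of the mechanism; the environment variable name
and file path here are stand-ins for whatever the executor actually
uses:

    import json
    import logging.config
    import os

    # Executor side: write the config and point at it via the environment.
    log_config = {
        'version': 1,
        'formatters': {'plain': {'format': '%(message)s'}},
        'handlers': {'console': {'class': 'logging.StreamHandler',
                                 'formatter': 'plain'}},
        'root': {'handlers': ['console'], 'level': 'DEBUG'},
    }
    with open('/tmp/logging.json', 'w') as f:
        json.dump(log_config, f)
    os.environ['ZUUL_JOB_LOG_CONFIG'] = '/tmp/logging.json'

    # Callback side (zuul_stream / ara): read it back and apply it.
    with open(os.environ['ZUUL_JOB_LOG_CONFIG']) as f:
        logging.config.dictConfig(json.load(f))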
Change-Id: I3d7ac797fd2ee2c53f5fbd79d3ee048be6ca9366
We can block this during config loading, before jobs start. Leave the
other validation in place as well, to prevent jobs from passing
variables as part of the return process.
Change-Id: I071a1fcd6037ab0dca78d83ff69b77907d0ccae6
Since the mountpoint for the tmpfs used for playbook secrets may
itself be on a read-only bind mount inside the bubblewrap environment,
ensure that it is created before bwrap runs.
Change-Id: I493d1b33500c23d4e2c1458247345cc751757a0b
So that we may avoid writing the decrypted contents of secrets to
disk, write them to a file in a tmpfs.
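Illustrative only; the path and helper name are made up:

    import json
    import os

    def write_secrets(secrets, path='/tmp/secrets/vars.json'):
        # The parent directory is expected to be a tmpfs mount, so the
        # decrypted contents never touch a real disk.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        with os.fdopen(fd, 'w') as f:
            json.dump(secrets, f)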
Change-Id: I7c029b67d0fc2fa3827dc811137dd4f3a90706d8
We recently began altering the mount map used by the wrapper driver
for each execution run (so that we include only the current
playbook). However, the setMountsMap method operates on the global
driver object rather than on an object more closely bound to the
lifetime of the playbook run. The fact that this works at all is just
luck (the execution process is slow enough that hitting a race
condition where the wrong directories are mounted is unlikely).
To correct this, add a new layer which contains the context for the
current playbook execution.
Change-Id: I3a06f19e88435a49c7b9aea4e1221b812f5a43d0
This is incompatible with having the jobdir root be anything other
than /tmp, as it is unlikely to exist on the remote host.
This was originally added in commit
176431ec14, but all the reasons we needed
it are no longer relevant.
Change-Id: Ib3edf0abb0db33143b92e6947f08e22a39ff0c77
It is possible we want to know the name of the cloud, which could be
different from nodepool.provider. In the case of openstack-infra,
this fixes a DNS issue when constructing the mirror name for our
regional mirrors.
Change-Id: I3ac65744356e3fa25d10208d11be95dc16b1e2e7
Depends-On: Idc7686167d131d8e74d55b8f7f50224a1b782091
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Secrets are proving less useful than originally hoped because they
cannot be effectively used in any jobs with untrusted children. This
change binds the secrets to the playbooks which use them, so that
child jobs are unable to access the secrets. This allows us to create
jobs with pre/post playbooks that use secrets, which are suitable for
other jobs to inherit from.
Change-Id: I67dd12563f3abd242d6356675afed1de0cb144cf
We were incorrectly preparing the current state of the repo for
ref-updated (e.g., post) jobs. This ensures that we run with the
actual supplied ref, even if the remote has moved on since then.
Change-Id: I52f05406246e6e39805fd8365412f3cb77fe3a0a
It is helpful to expose the work_root in the inventory. Today, we
would need to use a relative path from log_root or src_root. Having
the absolute path is easier to read when debugging.
Change-Id: I86ff459d283eaf348821c2f11c1f8575598f088d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
The content in these can be a file or a directory - so _dirs is
confusing. Change it to _paths and document it.
Change-Id: Ida38766cd3d440d75a6dc55035a54e0804e03760
If a job were pointed at an abnormally large git repository (or a
maliciously large one), a clone would fill the disk. The same goes
for anything else that writes data onto the executor's disk.
We run a single thread which periodically runs du on the root of all
jobs on this executor. This is called the DiskAccountant.
We add a per-executor config item for the per-job limit. This won't
actually save a server from a full disk if many thousands of
concurrent changes are submitted, but it will prevent any accidental
filling of the disk and make malicious disk filling much harder.
We also ignore hard links from the merge root, which exempts bits
cloned from the merge root from disk accounting.
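A rough sketch of the idea; names, units and the du invocation are
illustrative rather than the actual DiskAccountant code:

    import subprocess
    import threading
    import time

    def disk_accountant(jobs_root, limit_mb, over_limit, interval=60):
        # Periodically measure each job directory and call back for
        # any job exceeding its per-job limit.
        while True:
            output = subprocess.check_output(
                ['du', '-m', '--max-depth=1', jobs_root])
            for line in output.decode().splitlines():
                size, path = line.split('\t', 1)
                if path != jobs_root and int(size) > limit_mb:
                    over_limit(path)
            time.sleep(interval)

    thread = threading.Thread(
        target=disk_accountant,
        args=('/var/lib/zuul/builds', 250, print),
        daemon=True)
    thread.start()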
Change-Id: I415e5930cc3ebe2c7e1a84316e78578d6b9ecf30
Story: 2000879
Task: 3504
All callback plugins listed in the explicit callback_plugins path are
loaded. callback_whitelist is apparently only for plugins that are
shipped with Ansible.
Change-Id: I9ffd00cbd5aeffdb0c46f5aec19f54c6f16cf68b
As we start to work on getting devstack-gate running in zuulv3, we have
an execution context shift issue looming. That is, in current
devstack-gate, there is a shell script that runs on the remote node that
then, for some tasks, runs ansible.
In v3, we want ALL of that to be ansible run by the executor. However,
we'll lose our current ara reports in devstack-gate if we shift all the
ansible to the executor before we add ara support to zuul.
This will result in ara collecting log information into an SQLite
database in self.jobdir.work_root. We can then, as we choose, write a
post-playbook task to do 'ara generate html' as we do in
devstack-gate today.
It's implemented in a try/except block so that installing ara is a
deployer choice. Later we can extract that into a plugin interface.
ara was added to test-requirements.txt, though, so that in the unit
tests we can at least catch cases where something about ara might
prevent ansible-playbook from executing.
Change-Id: I8facdf0b95b83d43c337058d70fe6bf71e17d570
There is a logging adapter that prepends the job id to log lines, so
putting it into the message as well is duplication. This should turn
this:
2017-07-27 19:45:00,836 DEBUG zuul.AnsibleJob: [build:
68c5833049c34d4c9e564811c43ad303] Job 68c5833049c34d4c9e564811c43ad303:
git updates complete
into:
2017-07-27 19:45:00,836 DEBUG zuul.AnsibleJob: [build:
68c5833049c34d4c9e564811c43ad303] Git updates complete
Change-Id: I5e25510d87f195f4f241a83d8faf6515232c6b4d
It would be useful to allow deployment-specific configuration that
can be fed into the project-config deployments so that we can
customize things like host IP without having to change job
definitions for each site.
Also, add a method to display the build log from a failed assertion in
the Ansible test (this was used in the development of the tests for
this change).
Change-Id: I87e8bffc540bcafab543c46244f3d5327b56fcae
Co-Authored-By: James E. Blair <jeblair@redhat.com>
The setMountsMap command required the state_dir argument, presumably
so that the zuul ansible path (i.e., our custom modules) is
available. Unfortunately, it set it up as a read-write bind, not a
read-only one. We certainly don't want jobs (even trusted jobs)
modifying the ansible code that we run.
Switch it to a read-only bind mount.
Also, remove it from the special handling inside the setMountsMap
method and instead handle it on the executor side for increased
visibility.
Finally, add options to the zuul-bwrap command to set the ro and
rw binds to make interactive testing easier.
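Conceptually the bubblewrap invocation gains arguments along these
lines (a hand-written approximation with example paths, not the
driver code):

    ro_paths = ['/var/lib/zuul/ansible']     # e.g. the state_dir tree
    rw_paths = ['/var/lib/zuul/builds/1234']

    bwrap_args = ['bwrap']
    for path in ro_paths:
        bwrap_args.extend(['--ro-bind', path, path])
    for path in rw_paths:
        bwrap_args.extend(['--bind', path, path])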
Change-Id: I4a0fdae546a2307d78a5c29b5a62a6d223ecb9e9
This internal-only attribute is basically the same as "ref" but
spelled differently only in the case of a change. Just use
the "ref" name in all cases for improved developer sanity.
Change-Id: I476f8d32dae37309ab0c9e11c8a5337b213f985e
So that we may re-use the same jobs for pre- and post-merge tests,
enqueue an item for every branch of every timer-triggered project and
check out that branch before running the job. This means that rather
than having a job for gate plus a job for each stable branch, we have
just a single job which runs with different content.
The old method is still supported using override branches.
This updates the model to include Change, Branch, Tag, and Ref
objects which can be used as the value of Item.change. Branch, Tag,
and Ref are all very similar, but the distinction may help us ensure
that we're encoding the right information about the items we are
enqueuing. This is important for branch matching in pipelines and is
also used to provide job variables.
Change-Id: I5c41d2dcbbbd1c17d68074cd7480e6ab83f884ea
Turn on fact caching and smart gathering, but not for localhost! We
do not want to leak information about zuul executors to untrusted
playbooks.
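The relevant ansible.cfg settings, written out here as a sketch from
Python (the cache path is an example; how localhost is excluded is
handled elsewhere):

    import configparser

    config = configparser.ConfigParser()
    config['defaults'] = {
        'gathering': 'smart',
        'fact_caching': 'jsonfile',
        'fact_caching_connection': '/var/lib/zuul/facts',
    }
    with open('ansible.cfg', 'w') as f:
        config.write(f)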
Change-Id: I40941c0f15d801d91c60ff5af33d047044052154
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This meant bubblewrap needed to have a zuul user on the host system,
which, when gathering facts on localhost, would cause an Ansible
failure. Additionally, this is no longer needed because our logging
is now handled by zuul_stream.
Change-Id: I75cd050e26438dbe6d47ff216ffe6e08b9687aca
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Pass in and use info from the JobDirPlaybook rather than trying to strip
path elements from the playbook name.
Change-Id: Ifcd6f05e27c987d40db23b3dcec344c2eb786d7c
Add the project in which a job is defined as an implicit role
project. This is a convenience for job authors who may want to put
roles in the root of the project (i.e., not adjacent to job
playbooks, since, after all, the roles may be useful outside of the
job playbooks). In that case, they will not need to specify the job's
own project in the roles: section of the job.
Change-Id: Ia382c2da9f7eb7139ceb0b61cb986aace8dc8d8f
There are some errors the executor may encounter where it will be
unable to, or will refuse to, run a job. We know that these errors
will not be corrected by retrying the build, so return them as errors
to the user. The build result will be "ERROR", and the message, which
is brief but hopefully sufficient to illuminate the problem, will be
added to the job report.
Change-Id: Iad486199de19583eb1e9f67c89a8ed8dac75dea1
Story: 2001105
Story: 2001106
Tried first with the upstream callback plugin, but it is a stdout
plugin, so it needs to take over stdout to work, and we need stdout
for executor communication. Then tried subclassing, but the magical
Ansible plugin-loading fun happened again. Just copy it in and modify
it slightly for now.
We add playbook, phase, and index information. We also read the
previous file back in and append to it on subsequent runs. This may
be a memory issue; however, the current construction holds all of an
individual play in memory anyway. Most of our content-size concerns
are around devstack jobs, where the bulk of the content will be in a
single playbook anyway, so although RAM pressure may be a real thing,
we may need to solve it at the single-playbook level anyway. But for
now, this should get us the data.
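The append behaviour amounts to something like this sketch (the file
name and structure are simplified):

    import json
    import os

    def append_playbook_results(output_path, playbook_results):
        # Read any results written by earlier playbook runs, append
        # this run's results, and write the whole list back out.
        existing = []
        if os.path.exists(output_path):
            with open(output_path) as f:
                existing = json.load(f)
        existing.append(playbook_results)
        with open(output_path, 'w') as f:
            json.dump(existing, f, indent=2)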
Change-Id: Ic1becaf2f3ab345da22fa62314f1296d76777fec
Publishing the inventory as part of publishing logs would be handy
for checking that everything is as one expects. However, secrets
currently go into the inventory, which means publishing it is not
safe. Write the secrets to their own file instead.
Also, return errors if users try to define zuul vars or secrets. To
support that, we need to pass zuul vars as a top-level parameter,
rather than inside of vars as we do today.
Change-Id: If58b89882a817ff219ed5f8faf2bde31cc8e1a6a
Due to a bug in the equality check of the ZuulRoles class, we
were unable to add more than one roles path. This corrects that and
adds a test of role inheritance which exercises this.
Change-Id: Icf6daa312405ed56d2fecb89fc6aee69b4b80e41
So that a job lower in the inheritance hierarchy does not alter
the behavior of playbooks defined higher in the hierarchy, run
each playbook with only the roles that were present on the job
at the point in the inheritance hierarchy that playbook was
defined.
Change-Id: I06f4aff5340f48a09dae2cd95180531fa572b85e
We now pass the execution phase and the index of the phase, so emit
them into the play banners in the log. This will allow us to
post-process the logs and put in smart things like "collapse pre
tasks".
Also include information about include statements.
Also rename zuul_execution_phase_count to zuul_execution_phase_index -
mainly just because it's an index not a count. My shed is red.
Change-Id: I975ed9547bbcdbb70d5a25c9be398888bdcdb07a
This loads a JSON file (work/results.json) that the job can write
to. It will be loaded by the executor after the job completes and
returned to the scheduler.
We can use the data in this file as the reported log URL for the
build. Later we can use it to supply file/line comments in
reviews.
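On the executor side the handling is essentially the following (a
minimal sketch, assuming only the work/results.json path mentioned
above):

    import json
    import os

    def read_returned_data(work_root):
        # Jobs may write arbitrary data (e.g. a log URL) into
        # work/results.json; pick it up after the run completes.
        path = os.path.join(work_root, 'results.json')
        if not os.path.exists(path):
            return {}
        with open(path) as f:
            return json.load(f)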
Change-Id: Ib4eb743405f337c5bd541dd147e687fd44699713
We don't need as many redundant exception and cleanup handlers if we
move the ssh-agent start and key add into the already existing
handler for the AnsibleJob execute method.
Change-Id: I6f5a97b831b535a58fbc6c4f0e9f05f1c670870d
I'm getting an error when executing ssh-agent add. I'm not sure why;
the logs don't provide any useful information, nor does the
application clean up the ssh-agents when it fails.
It turns out I didn't have permission to read the private SSH key on
the filesystem.
Provide useful logging and cleanup here.
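A sketch of the intent, assuming the agent's SSH_AUTH_SOCK and
SSH_AGENT_PID variables are passed in via env (names and structure
are illustrative):

    import logging
    import subprocess

    log = logging.getLogger('zuul.ExecutorServer')

    def add_key(env, private_key_path):
        # Log the real output from ssh-add on failure and kill the
        # agent we started rather than leaving it running.
        try:
            subprocess.check_output(['ssh-add', private_key_path],
                                    env=env, stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as e:
            log.error('ssh-add failed: %s', e.output.decode())
            subprocess.call(['ssh-agent', '-k'], env=env)
            raise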
Change-Id: I1788a57f51e3516c91e12d1e0a20a4b842cedb20
Signed-off-by: Jamie Lennox <jamielennox@gmail.com>
We generally expect a repo either to be a single bare role, or to
have a "roles/" directory which is a collection of roles. We
previously also supported the repo itself being a collection of roles
at the root of the repo, but that layout is not widely used and is
potentially confusing, especially since we want to make a follow-up
change to implicitly treat some repos as roles repos.
Stop supporting that for now. If folks want to support this, we might
consider doing so only if there is a meta/main.ya?ml file in the repo
(as Galaxy accidentally supports that, though it is not an explicit
contract they guarantee).
Change-Id: Ie70e9fae4da57d4aefa01442a9f60526163b27c0