Add an Ansible handler to send a hangup signal through
docker-compose to the running haproxy daemon any time the task to
update its configuration fires.
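A minimal sketch of such a handler; the service name and compose directory here are assumptions, not the real playbook layout:

```yaml
handlers:
  - name: Reload haproxy
    # The haproxy container reloads its configuration when it
    # receives a hangup signal.
    shell: docker-compose kill -s SIGHUP haproxy
    args:
      chdir: /etc/haproxy-docker
```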
Change-Id: I1946c1e7eaaa8a8e2209007b5d065dba952ec6e2
This adds the simplest form of health checking to haproxy, a tcp check
to the backends. We can do more sophisticated checks, like verifying
ssl negotiation or even making HTTP requests, but for now this is
probably a good
improvement.
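For illustration, the simplest form is just the check keyword on the server lines (backend and server names below are made up):

```
backend balance_git_http
    # 'check' with no further options is a plain TCP connect probe;
    # no ssl negotiation or HTTP request is involved.
    server gitea01 gitea01.opendev.org:3080 check
```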
Change-Id: I3c6b07df4b3e0c380c757e1e5cb51ae0be655f34
Zuul has hit a scenario where a git repo update was unable to talk to
gerrit via ssh because it had reached its per user connection limit [0].
This then led to some OpenStack job failures [1].
The default limit (which we were using) is 64 connections per user.
Apparently this is not quite enough for a busy Zuul, so increase it by
50% to 96.
[0] http://paste.openstack.org/show/754741/
[1] http://lists.openstack.org/pipermail/release-job-failures/2019-July/001193.html
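The knob in question is sshd.maxConnectionsPerUser in gerrit.config, whose default is 64:

```
[sshd]
  maxConnectionsPerUser = 96
```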
Change-Id: Ibeca2208485608f3b61aa716184165342bfcc3c9
We ended up running into a problem with nodepool built control plane
images (has to do with boot from volume not allowing us to delete images
that are in use by a nova instance). We have decided to clean this up
and go back to not doing this until we can do it more properly.
Note this isn't a revert because having a group for access to control
plane clouds does seem like a good idea in general and I believe there
have been changes we'd have to resolve in the clouds.yaml files anyway.
Depends-On: https://review.opendev.org/#/c/665012/
Change-Id: I5e72928ec2dec37afa9c8567eff30eb6e9c04f1d
The global inventory is used when launching nodes so if we want to
replace a server we have to remove it from the inventory first. This is
that step for replacing gitea01.
Note that when adding it back for the new server there are some edits to
make to the playbooks as noted in the gitea sysadmin docs.
We also remove this instance from haproxy to prevent unwanted
connections while we flip things over.
Change-Id: If32405b1302353f1f262a30b7392533f86fec1e4
This is a follow on to I67870f6d439af2d2a63a5048ef52cecff3e75275 to do
the same for files.openstack.org (as
http://files.openstack.org/mirror/logs/ is a handy central place to
point people at).
Change-Id: I07c707d45ab3e3c6f87460b3346efd7026467c56
The OpenStack/OpenDev PPA repositories are currently undocumented.
Add some information on where to find things.
Change-Id: Iea03c5d558b3dd6af9f7c860dfcc75a71dc59d9f
This tool scans gerrit changes for comments from zuul over the last 30
days to build out success rates for check and gate pipelines. This only
looks at changes that have merged to avoid those that can never merge
because they only fail or are expected to fail.
This tool emits information like:
Changes: 4475
Check Failures: 5317.0
Check Successes: 9173.0
Check Rate of failure: 0.3669427191166322
Gate Failures: 687.0
Gate Successes: 4450.0
Gate Rate of failure: 0.13373564337161767
Total Failures: 6004.0
Total Successes: 13623.0
Total Rate of failure: 0.3059051306873185
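The rates shown are simply failures divided by total results; a trivial sketch of the arithmetic:

```python
def failure_rate(failures, successes):
    """Fraction of runs that failed, e.g. the check pipeline above:
    5317 failures out of 5317 + 9173 results."""
    total = failures + successes
    return failures / total if total else 0.0
```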
Change-Id: I759ba670c6b81f4425ce618c412db9cbd0e51401
Haproxy wants to log to syslog (and not stdout for performance reasons,
see https://github.com/dockerfile/haproxy/issues/3). However there is no
running syslog in our haproxy container. What we can do is mount in the
host's /dev/log and have haproxy write to the hosts syslog to get
logging.
Do this via a docker compose volume bind mount.
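The relevant docker-compose fragment is just a bind mount (service name assumed):

```yaml
services:
  haproxy:
    volumes:
      # Share the host's syslog socket so haproxy's log lines land
      # in the host's syslog.
      - /dev/log:/dev/log
```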
Change-Id: Icf4a91c2bc5f5dbb0bfb9d36e7ec0210c6dc4e90
We are booting instances outside of rax and they don't always come with
extra devices that can be repurposed for swap. In that case, create a
swapfile instead.
Note we do not use fallocate as swapon's manpage says this is suboptimal
with the linux kernel's swap implementation.
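The swapfile setup amounts to something like this (size and path are illustrative; dd rather than fallocate, per the swapon manpage):

```
dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
```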
Change-Id: I8b9ce18c18e4069aba7de27bb6a9927627b15b49
We're making these requests to localhost over an ssh connection.
The password warning, on the other hand, is a real thing. Let's not
log the gitea password when we run this in prod.
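In Ansible terms this is no_log on the task; a hypothetical sketch (the URL and variable names are invented):

```yaml
- name: Set gitea admin password
  uri:
    url: "https://localhost:3000/api/v1/admin/users/root"  # hypothetical
    method: POST
    body_format: json
    body:
      password: "{{ gitea_root_password }}"
  # Keep the request body, and with it the password, out of the logs.
  no_log: true
```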
Change-Id: I2157e4027dce5ab9ebceb3f78dbeff22a83d9fad
This runs repo creation across two orgs at the same time. It doesn't
help to parallelize more than 2 since openstack runs the entire time
in one thread (so the other thread handles all the other orgs).
Parallelizing by org avoids database contention for updating the user
table, since each org is a different user. However, there's a weird
locking thing going on with the first update to the settings table,
so this does some extra work to serialize actions until we perform
that first update, then switches to parallel.
This is the maximum we can parallelize repo creation at the moment,
and it also maximizes settings updates (the settings updates take less
time than repo creation, so no further optimization helps).
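The shape of the parallelization could be sketched like this; the function and argument names are illustrative, not the real playbook interface:

```python
import concurrent.futures

def create_repos_by_org(orgs, create_repo, max_workers=2):
    """orgs maps an org name to its list of repo names; create_repo
    does the actual API work for one repo."""
    def one_org(org):
        # Each org is a distinct gitea user, so repos within an org
        # stay serialized to avoid contention on the user table.
        for name in orgs[org]:
            create_repo(org, name)

    # At most two orgs run at once; more workers don't help because
    # the openstack org occupies one thread for the entire run.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(one_org, orgs))
```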
Change-Id: I7f83dcdb4531a547ae5281434d7cda825dd50059
This keeps repo creation serialized (because of a bug in gitea),
but it parallelizes updating the settings. This should reduce
our time by about half.
It also uses a requests session, though I'm not sure if that
really gets us anything.
It eliminates a couple of extraneous GET calls following 302
redirect responses from the POSTs on setting updates.
This will automatically parallelize to nproc * 5 threads.
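The "nproc * 5" figure matches ThreadPoolExecutor's default worker count in the Python 3.5-3.7 era; a sketch of the pattern, with invented names:

```python
import concurrent.futures

def update_all_settings(session, projects, update_one):
    """update_one(session, project) performs the POSTs for one project,
    reusing the shared requests-style session."""
    # max_workers omitted: on Python 3.5-3.7 this defaults to
    # os.cpu_count() * 5 threads.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        list(pool.map(lambda p: update_one(session, p), projects))
```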
Change-Id: I5549562d667c0939d0af1151d44b9190774196f9
This takes a similar approach to the extant ansible_cron_install_cron
variable to disable the cron job for the cloud launcher when running
under CI.
If your CI jobs happen to run when the cron job decides to fire,
you end up with a harmless but confusing failed run of the cloud
launcher (that has tried to contact real clouds) in the ARA results.
Use the "disabled" flag to ensure the cron job doesn't run. Using
"disabled" means we can still check that the job was installed via
testinfra, however.
Convert ansible_cron_install_cron to a similar method using disabled,
document the variable in the README and add a test for the run_all.sh
script in crontab too.
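With the cron module that looks roughly like the following (names and schedule are invented for illustration):

```yaml
- name: Install the cloud launcher cron job
  cron:
    name: cloud-launcher
    hour: "*"
    minute: "0"
    job: /opt/system-config/run_cloud_launcher.sh
    # Written to the crontab commented out, so testinfra can still
    # assert it was installed but it never fires in CI.
    disabled: "{{ cloud_launcher_disable_cron | default(false) }}"
```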
Change-Id: If4911a5fa4116130c39b5a9717d610867ada7eb1
Default apache mimetypes don't include .log as text/plain; add it.
Log export was added with I67870f6d439af2d2a63a5048ef52cecff3e75275 so
match the .log.1 file that logrotate creates for our rsync mirror logs
too.
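One way to express that in Apache config; a sketch only, as the real change may use mod_mime's AddType instead:

```
# Serve foo.log and logrotate's foo.log.1 both as plain text.
<FilesMatch "\.log(\.\d+)?$">
    ForceType text/plain
</FilesMatch>
```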
Change-Id: Iaf3f19d26f3a6fda7ef3571573af219a31f1dced
Apply the exclusion for trusted CI comments to the hide function's
conditional case as well as the toggle function's.
Change-Id: Ia4e5ec22a097a8b8cb564c237fd0aa48ab6f8724
It looks like I forgot to add this in
I525ac18b55f0e11b0a541b51fa97ee5d6512bf70 so the mirror-update
specific roles aren't running automatically.
Change-Id: Iee60906c367c9dec1143ee5ce2735ed72160e13d
When determining whether a project exists, we need to compare to
just the name, not the full data structure about the project.
Also, if the project exists, don't try to create it again; that
will return a 409 conflict error.
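The comparison in question reduces to matching on names; an illustrative sketch, not the real module code:

```python
def projects_to_create(existing, desired):
    """existing is the list of project dicts the API returned; desired
    is our project list.  Compare on the name alone."""
    existing_names = {p["name"] for p in existing}
    # Creating a project that already exists would return 409 Conflict,
    # so skip those rather than retrying the create.
    return [d for d in desired if d["name"] not in existing_names]
```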
Change-Id: I0b8affac96b17fa73253082b1b87d4c00bf23463
When filtering CI system comments, don't hide those from Zuul, our
gating CI system. It is important to see these comments as not all
results may match the patterns used to expose them as rows in the CI
table. Rename the "Toggle CI" button to "Toggle Extra CI" so that
the name remains accurate without being too verbose.
Change-Id: Id0cd8429ee5ce914aebbbc4a24bef9ebf675e21c
Add the full remote_puppet_git playbook that we actually use in
production so that we can test the whole kit and caboodle. For
now don't add a review.o.o server to the mix, because we aren't
testing anything about it.
Change-Id: If1112a363e96148c06f8edf1e3adeaa45fc7271c
This used to be mirrored; however, there were issues when upstream
dropped the PC1 repositories a few months back. The puppet openstack
jobs are still trying to leverage this mirror but it does not exist in
some regions because it was disabled on the afs content. This change
fixes the reprepro configuration to still pull down puppet5/6 for xenial
and stretch and adds the symlink back to the mirrors.
Change-Id: I71ad5afe086a503d75a365543ad8869e35ef873b
Sadly, as readable as the use of the uri module to do the interactions
with gitea is, more recent Ansible changed how subprocesses are forked
and this makes iterating over all the projects in projects.yaml take
an incredibly long amount of time.
Instead of doing it in yaml, make a python module that takes the list
one time and does the looping and requests calls. This should make it
possible to run the actual gitea creation playbook in integration tests.
Change-Id: Ifff3291c1092e6df09ae339c9e7dddb5ee692685
Zuul now includes an ansible_python_interpreter hostvar in every
host in its inventory. It defaults to python2. The write-inventory
role, which takes the Zuul inventory and makes an inventory for
the fake bridge server in the gate passes that through. Because it's
in /etc/ansible/inventory.yaml, it overrides any settings which may
arrive via group vars, but this is the way we set the interpreter
for all the hosts on bridge (we do not do so in the actual inventory
file).
To correct this, tell write-inventory to strip the
ansible_python_interpreter variable when it writes out the new
inventory. This restores the behavior to match what happens on
the real bridge host. One instance of setting the interpreter
for the fake "trusty" host used in base platform tests is moved to
a hostvars file to match the rest of the real hosts.
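The stripping step itself is small; a sketch assuming the usual YAML-inventory layout of hosts under "all":

```python
import copy

def strip_interpreter(inventory):
    """Drop the ansible_python_interpreter hostvar from every host so
    group vars on the fake bridge decide the interpreter instead."""
    inv = copy.deepcopy(inventory)
    for hostvars in inv.get("all", {}).get("hosts", {}).values():
        hostvars.pop("ansible_python_interpreter", None)
    return inv
```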
Change-Id: I60f0acb64e7b90ed8af266f21f2114fd598f4a3c