422 Commits

Author SHA1 Message Date
Zuul
b879e5fad7 Merge "Fork the maxking/docker-mailman images" 2022-11-22 18:11:24 +00:00
Zuul
b7b2157133 Merge "Add a mailman3 list server" 2022-11-22 18:00:30 +00:00
Ian Wienand
9445fccb55
system-config-run-gitea: use standard bridge host
In what looks like a typo, we are overriding the bridge node for this
test to a bionic host.  Remove this.  This was detected by testing an
upgraded Ansible, which wouldn't install on the lower python on
Bionic.

Change-Id: Ie3e754598c6da1812e74afa914f50d91972012cd
2022-11-22 11:26:14 +11:00
Zuul
be9db368af Merge "openafs: copy dkms log directory" 2022-11-21 21:12:41 +00:00
Clark Boylan
12d4355385 Fork the maxking/docker-mailman images
These images have a number of issues we've identified and worked
around. The current iteration of this change is essentially
identical to upstream but with a minor tweak to allow the latest
mailman version, and adjusts the paths for hyperkitty and postorius
URLs to match those in the upstream mailman-web codebase, but
doesn't try to address the other items. However, we should consider
moving our fixes from ansible into the docker images where possible
and upstream those updates.

Unfortunately upstream hasn't been super responsive so far hence this
fork. For tracking purposes here are the issues/PRs we've already filed
upstream:

  https://github.com/maxking/docker-mailman/pull/552
  https://github.com/maxking/docker-mailman/issues/548
  https://github.com/maxking/docker-mailman/issues/549
  https://github.com/maxking/docker-mailman/issues/550

Change-Id: I3314037d46c2ef2086a06dea0321d9f8cdd35c73
2022-11-21 16:51:02 +00:00
Ian Wienand
039aae5fa7
openafs: copy dkms log directory
Grab the make logs from the dkms directory.  This is helpful if the
modules are failing to build.

The /var/lib/dkms directory contains all the source and object files,
etc., which seems unnecessary to store in general.  Thus we just trim
this to the log directory.

Change-Id: I9b5abc9cf4cd59305470a04dda487dfdfd1b395a
2022-11-21 10:33:11 +11:00
Zuul
94cb35a7f6 Merge "Update Gerrit images to 3.5.4 and 3.6.3" 2022-11-16 01:29:42 +00:00
Clark Boylan
c1c91886b4 Add a mailman3 list server
This should now be a largely functional deployment of mailman 3. There
are still some bits that need testing but we'll use followup changes to
force failure and hold nodes.

This deployment of mailman3 uses upstream docker container images. We
currently hack up uids and gids to accommodate that. We also hack up the
settings file and bind mount it over the upstream file in order to use
host networking. We override the hyperkitty index type to xapian. All
list domains are hosted in a single installation and we use native
vhosting to handle that.

We'll deploy this to a new server and migrate one mailing list domain at
a time. This will allow us to start with lists.opendev.org and test
things like dmarc settings before expanding to the remaining lists.

A migration script is also included, which has seen extensive
testing on held nodes for importing copies of the production data
sets.

Change-Id: Ic9bf5cfaf0b87c100a6ce003a6645010a7b50358
2022-11-11 23:20:19 +00:00
Ian Wienand
9c76ebf4af
Update a few s/bridge01/bridge99 references
These were forgotten in I137ab824b9a09ccb067b8d5f0bb2896192291883
when we switched the testing bridge host to bridge99.

Change-Id: I742965c61ed00be05f1daea2d6110413cff99e2a
2022-11-11 15:05:39 +11:00
Clark Boylan
5e8d704278
Update Gerrit images to 3.5.4 and 3.6.3
Gerrit made new releases and we should update to them. Release notes can
be found here:

  https://www.gerritcodereview.com/3.5.html#354
  https://www.gerritcodereview.com/3.6.html#363

The main improvement for us is likely to be the copy approvals
performance boosts and error handling. We still need to run that prior
to our 3.6 upgrade.

Note we currently only run 3.5 in production but we test the 3.6 upgrade
from our current production version so it makes sense to update the 3.6
image as well.

Change-Id: Idf9a16b443907a2d0c19c1b6ec016f5d16583ad2
2022-11-11 13:20:36 +11:00
Ian Wienand
0c90c128d7
Reference bastion through prod_bastion group
In thinking harder about the bootstrap process, it struck me that the
"bastion" group we have is two separate ideas that become a bit
confusing because they share a name.

We have the testing and production paths that need to find a single
bridge node so they can run their nested Ansible.  We've recently
merged changes to the setup playbooks to not hard-code the bridge node
and they now use groups["bastion"][0] to find the bastion host -- but
this group is actually orthogonal to the group of the same name
defined in inventory/service/groups.yaml.

The testing and production paths are running on the executor, and, as
mentioned, need to know the bridge node to log into.  For the testing
path this is happening via the group created in the job definition
from zuul.d/system-config-run.yaml.  For the production jobs, this
group is populated via the add-bastion-host role which dynamically
adds the bridge host and group.

Only the *nested* Ansible running on the bastion host reads
s-c:inventory/service/groups.yaml.  None of the nested-ansible
playbooks need to target only the currently active bastion host.  For
example, we can define as many bridge nodes as we like in the
inventory and run service-bridge.yaml against them.  It won't matter
because the production jobs know the host that is the currently active
bridge as described above.

So, instead of using the same group name in two contexts, rename the
testing/production group "prod_bastion".  groups["prod_bastion"][0]
will be the host that the testing/production jobs use as the bastion
host -- references are updated in this change (i.e. the two places
this group is defined -- the group name in the system-config-run jobs,
and add-bastion-host for production).

We then can return the "bastion" group match to bridge*.opendev.org in
inventory/service/groups.yaml.

This fixes a bootstrapping problem -- if you launch, say,
bridge03.opendev.org the launch node script will now apply the
base.yaml playbook against it, and correctly apply all variables from
the "bastion" group which now matches this new host.  This is what we
want to ensure, e.g. the zuul user and keys are correctly populated.

The other thing we can do here is change the testing path
"prod_bastion" hostname to "bridge99.opendev.org".  By doing this we
ensure we're not hard-coding for the production bridge host in any way
(since if both testing and production are called bridge01.opendev.org
we can hide problems).  This is a big advantage when we want to rotate
the production bridge host, as we can be certain there are no hidden
dependencies.

Change-Id: I137ab824b9a09ccb067b8d5f0bb2896192291883
2022-11-04 09:18:35 +11:00
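The group split described in this commit can be sketched roughly as follows. This is only an illustration: the `resolve_group` helper, the host list, and the use of shell-style matching are assumptions, not the actual Ansible inventory code. Only the group names and the bridge*.opendev.org pattern come from the commit message.

```python
from fnmatch import fnmatch


def resolve_group(patterns, hosts):
    """Return the hosts matching any shell-style pattern, sorted by name.

    Hypothetical helper that stands in for inventory group matching.
    """
    return sorted(h for h in hosts if any(fnmatch(h, p) for p in patterns))


# Hypothetical inventory: two bridge hosts plus an unrelated server.
inventory_hosts = [
    "bridge01.opendev.org",
    "bridge03.opendev.org",
    "gitea-lb02.opendev.org",
]

# Nested Ansible view: "bastion" matches every bridge host, so a newly
# launched bridge picks up the group variables automatically.
groups = {"bastion": resolve_group(["bridge*.opendev.org"], inventory_hosts)}

# Executor view: "prod_bastion" is populated separately (by the job
# definition or a role) and names the one currently active bridge.
# The value used here is an assumption for illustration.
groups["prod_bastion"] = ["bridge01.opendev.org"]
active_bastion = groups["prod_bastion"][0]

print(groups["bastion"])
print(active_bastion)
```

Keeping the two names apart means adding another bridge host to the inventory changes `groups["bastion"]` but never changes which host the jobs log into.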
Ian Wienand
730fcf0171 Remove old bridge testing
We don't need to test on a bionic bridge any more, remove the old test
job.

Change-Id: I826f740b004bdc8b85977ba08f4e17e92f40a316
2022-11-03 04:10:31 +00:00
Zuul
b8326dcc9d Merge "Add python 3.11 docker images" 2022-10-27 23:11:29 +00:00
Zuul
7aa0ef1304 Merge "Switch bridge to bridge01.opendev.org" 2022-10-26 01:13:52 +00:00
Zuul
39387607bc Merge "Drop python 3.8 base image builds" 2022-10-25 22:06:30 +00:00
Clark Boylan
ee359c7e3b Add python 3.11 docker images
Python 3.11 has been released. Once the parent commit of this commit
lands we will have removed our python3.8 images making room for
python3.11 in our image list. Add these new images which will make way
for running and testing our software on this new version of python.

Change-Id: Idcea3d6fa22839390f63cd1722bc4cb46a6ccd53
2022-10-25 10:43:29 -07:00
Ian Wienand
102534fdb8
Switch bridge to bridge01.opendev.org
This switches the bridge name to bridge01.opendev.org.

The testing path is updated along with some final references still in
testinfra.

The production jobs are updated in add-bastion-host, and will have the
correct setup on the new host after the dependent change.

Everything else is abstracted behind the "bastion" group; the entry is
changed here which will make all the relevant playbooks run on the new
host.

Depends-On:  https://review.opendev.org/c/opendev/base-jobs/+/862551
Change-Id: I21df81e45a57f1a4aa5bc290e9884e6dc9b4ca13
2022-10-25 16:08:10 +11:00
Ian Wienand
dc18968927
Run a base test against "old" bridge
Run a base test against a Bionic bridge to ensure we don't break
things on the current production host while we move to a new
Focal-based environment.

Change-Id: I1f745a06c4428cf31a166b3d53dd6321bfd41ebc
2022-10-20 09:49:10 +11:00
Ian Wienand
51611845d4
Convert production playbooks to bastion host group
Following-on from Iffb462371939989b03e5d6ac6c5df63aa7708513, instead
of directly referring to a hostname when adding the bastion host to
the inventory for the production playbooks, this finds it from the
first element of the "bastion" group.

As we do this twice for the run and post playbooks, abstract it into a
role.

The host value is currently "bridge.openstack.org" -- as is the
existing hard-coding -- thus this is intended to be a no-op change.
It is setting the foundation to make replacing the bastion host a
simpler process in the future.

Change-Id: I286796ebd71173019a627f8fe8d9a25d0bfc575a
2022-10-20 09:49:10 +11:00
Ian Wienand
d4c46ecdef
Abstract name of bastion host for testing path
This replaces hard-coding of the host "bridge.openstack.org" with
hard-coding of the first (and only) host in the group "bastion".

The idea here is that we can, as much as possible, simply switch one
place to an alternative hostname for the bastion such as
"bridge.opendev.org" when we upgrade.  This is just the testing path,
for now; a follow-on will modify the production path (which doesn't
really get speculatively tested).

This needs to be defined in two places:

 1) We need to define this in the run jobs for Zuul to use in the
    playbooks/zuul/run-*.yaml playbooks, as it sets up and collects
    logs from the testing bastion host.

 2) The nested Ansible run will then use inventory
    inventory/service/groups.yaml

Various other places are updated to use this abstracted group as the
bastion host.

Variables are moved into the bastion group (which only has one host --
the actual bastion host) which means we only have to update the group
mapping to the new host.

This is intended to be a no-op change; all the jobs should work the
same, but just using the new abstractions.

Change-Id: Iffb462371939989b03e5d6ac6c5df63aa7708513
2022-10-20 09:00:43 +11:00
Ian Wienand
deed697853
testinfra: Update selenium calls
Now that all the bridge nodes are Jammy (3.10), we can uncap this
dependency which will bring in the latest selenium.  Unfortunately
after investigation the easier way to do things I hoped this would
allow doesn't work; comments are added and small updates for new API.

Update the users file-match so they run too.

Change-Id: I6a9d02bfc79b90417b1f5b3d9431f4305864869c
2022-10-20 09:00:43 +11:00
Ian Wienand
34dc0f2679
Run jobs with a jammy bridge.openstack.org
In preparation for upgrading this host, run jobs with a Jammy-based
bridge.openstack.org.

Since this has a much later Python, it brings in a later version of
selenium when testing (used for screenshots) which has dropped some of
the APIs we use.  Pin it to the old version; we will fix this in a
follow-on just to address one thing at a time
(I6a9d02bfc79b90417b1f5b3d9431f4305864869c).

Change-Id: If53286c284f8d25248abf4a1b2edd6951437dec2
2022-10-20 09:00:43 +11:00
Ian Wienand
8efaf8da93
infra-prod-bootstrap-bridge: fix typo in playbook name
Introduced with Iebaeed5028050d890ab541818f405978afd60124

Change-Id: I2e06221d03589dc6bcb5fb060b439e35e3d604dc
2022-10-19 11:10:21 +11:00
Ian Wienand
77ebe6e0b7
infra-prod-bootstrap-bridge: run directly on bridge
In discussion of other changes, I realised that the bridge bootstrap
job is running via zuul/run-production-playbook.yaml.  This means it
uses the Ansible installed on bridge to run against itself -- which
isn't much of a bootstrap.

What should happen is that the bootstrap-bridge.yaml playbook, which
sets up ansible and keys on the bridge node, should run directly from
the executor against the bridge node.

To achieve this we reparent the job to opendev-infra-prod-setup-keys,
which sets up the executor to be able to log into the bridge node.  We
then add the host dynamically and run the bootstrap-bridge.yaml
playbook against it.

This is similar to the gate testing path, where bootstrap-bridge.yaml
is run from the executor against the ephemeral bridge testing node
before the nested-Ansible is used.

The root key deployment is updated to use the nested Ansible directly,
so that it can read the variable from the on-host secrets.

Change-Id: Iebaeed5028050d890ab541818f405978afd60124
2022-10-15 10:39:53 +11:00
Clark Boylan
df4f11393b Drop python 3.8 base image builds
Python 3.11 is coming up and running image builds for all the pythons
gets overwhelming fast. We end up with so many jobs that landing any one
change to our base images becomes difficult. To reduce the total count
we remove builds for 3.8 to make room for 3.11. Only a few things appear
to still be using the 3.8 images and their updates are all listed as
depends on below.

Depends-On: https://review.opendev.org/c/opendev/gerritbot/+/861474
Depends-On: https://review.opendev.org/c/opendev/grafyaml/+/861475
Depends-On: https://review.opendev.org/c/opendev/statusbot/+/861476
Depends-On: https://review.opendev.org/c/zuul/zuul-client/+/861477
Depends-On: https://review.opendev.org/c/zuul/zuul-operator/+/861478
Depends-On: https://review.opendev.org/c/zuul/zuul-registry/+/861479
Change-Id: Ifa44ed0586f54b7ee4d6e37ba32235d63a30addb
2022-10-14 14:34:03 -07:00
Zuul
46ea560259 Merge "Add Jammy gitea-lb02 to our inventory" 2022-10-14 16:44:12 +00:00
Clark Boylan
3e3e053f49 Resync gerrit plugin versions to latest gerrit releases
This was missed in the effort to push out Gerrit 3.5.3 as well as the
ssh rsa sha2 fixes. That said it should be mostly fine as all of the
plugins tagged 3.5.2 have tagged the same commit with 3.5.3, making this
largely a bookkeeping change.

There is one bit that isn't strictly bookkeeping and that is the
plugins/its-base checkout. Against gerrit 3.5 we convert from a master
checkout [0] to a stable-3.5 [1] checkout as this branch exists now.
Against gerrit 3.6 we convert from a stable-3.6 checkout to a master
checkout. I suspect that a stable-3.6 branch existed for a short period
of time and was cleaned up and zuul is using an old cached state.

The change for its-base on gerrit 3.5 does represent a reversion of
three commits but they all seem related to gerrit 3.6 so I expect this
is fine.

[0] https://gerrit.googlesource.com/plugins/its-base/+log/refs/heads/master
[1] https://gerrit.googlesource.com/plugins/its-base/+log/refs/heads/stable-3.5

Change-Id: I619b28fe642ca8b57eb533157ec0a441f6b66890
2022-10-13 16:54:12 -07:00
Clark Boylan
8d4f1c719e Add Jammy gitea-lb02 to our inventory
This adds our first Jammy production server to the mix. We update the
gitea load balancer as it is a fairly simple service which will allow us
to focus on Jammy updates and not various server updates.

We update testing to shift testing to a jammy node as well. We don't
remove gitea-lb01 yet as this will happen after we switch DNS over to
the new server and are happy with it.

Change-Id: I8fb992e23abf9e97756a3cfef996be4c85da9e6f
2022-10-13 13:09:13 -07:00
Ian Wienand
14b85ea1b8
afs-release: better info when can not get lockfile
For some reason this is failing in the gate -- the some reason bit is
hard to determine at the moment.  Log the exception.

Change-Id: I13c60c5dfc4ab19d8dec589c96338adc7461c992
2022-10-11 10:53:02 +11:00
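A minimal sketch of the idea of logging the exception when a lockfile cannot be taken is shown below. It assumes a POSIX system and a non-blocking `fcntl.flock` lock; the `try_lock` helper and the logger name are hypothetical and are not taken from the afs-release script itself.

```python
import fcntl
import logging

log = logging.getLogger("afs-release-sketch")  # hypothetical logger name


def try_lock(path):
    """Return an open, exclusively locked file object, or None on failure.

    Hypothetical helper: the lock is released when the returned file
    object is closed.
    """
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock raise immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # log.exception records the message plus the full traceback, so
        # the underlying reason for the failure ends up in the logs.
        log.exception("Could not get lockfile %s", path)
        handle.close()
        return None
    return handle
```

Logging the traceback rather than a fixed "could not lock" message is what makes the underlying errno visible when the failure is hard to reproduce.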
Ian Wienand
66e510f0ee
run-selenium: Use latest tag on firefox image
I'm not sure why I used this tag; I probably copied it from [1] at the
time?  Let's just try latest.

Update matchers so the screenshot jobs run

[1] https://github.com/SeleniumHQ/docker-selenium

Change-Id: I8ea7981dac54883822f3b6076b6f0f564571f018
2022-10-11 10:53:00 +11:00
Clark Boylan
4170a94be1 Collect apache logs from gitea99 host in testing
We want to ensure that the logging apache does for us is sufficient to
trace requests from the load balancer to apache to gitea. To do that we
need to gather the logs and look at them.

Change-Id: I468d37709c1a3c2255b1bfcf38a23bb1a2a75899
2022-09-30 12:48:51 -07:00
Clark Boylan
8e618c4a95 Remove ansible-version: 2.9
Zuul is removing support for old ansible versions. Remove our pin to
old ansible. There shouldn't be any reason for these pins at this point.

Change-Id: I0e0998e0d29d55695c6cd92b10feeb910b086d0a
2022-09-21 08:47:41 -07:00
James E. Blair
c661fb0972 Add Jaeger tracing server
Change-Id: I1aa68b1d5f99364fa09776301894b922ed169a3a
2022-09-15 19:21:33 -07:00
Clark Boylan
970d5f6a06 Update python builder and base image
It is a good idea to periodically update our base python images. Now is
a good time to do it as we've got debian bullseye updates and python
minor releases. The bullseye updates fix a glibc bug that was affecting
Ansible in the zuul images. With this update we'll be able to remove the
workaround for that issue.

We also update the builder image's apt-get process to include a clean to
match the base image. This is more for consistency than anything else.

Finally update job timeouts for builds as it seems we occasionally need
more time particularly for emulated arm64 builds.

Change-Id: I31483ff434f19f408aef3b63cb2cd24044a8bf29
2022-09-13 11:39:10 -07:00
Zuul
a8a19abf2c Merge "system-config-run-borg-backup: add to gate" 2022-08-12 07:11:38 +00:00
Zuul
74389454ce Merge "system-config-run-borg-backup: rename hosts to distro" 2022-08-11 23:57:30 +00:00
Zuul
00df4d06c0 Merge "system-config-run-borg-backup: add jammy test host" 2022-08-11 05:32:30 +00:00
Ian Wienand
46bb73d947 system-config-run-borg-backup: add to gate
We must have missed this, I noticed when it didn't run on the gate job
for I949c40e9046008d4f442b322a267ce0c967a99dc

Change-Id: I62c5c0f262d9bd53580367dc9f1ad00fe7b6f6f2
2022-08-11 13:54:52 +10:00
Ian Wienand
55654851bc system-config-run-borg-backup: rename hosts to distro
Rename the testing hosts to be clearer that they are different
distros.

Change-Id: Ic4b2b4a1b1fa8bc9a9eb62dc2ccba529958f19cd
2022-08-11 13:32:49 +10:00
Zuul
4ee5be00d9 Merge "Also pin pip/setuptools when creating Xenial venvs" 2022-08-11 00:19:46 +00:00
Jeremy Stanley
2d9d24d07d Also pin pip/setuptools when creating Xenial venvs
We still have some Ubuntu Xenial servers, so cap the max usable pip
and setuptools versions in their venvs like we already do for
Bionic, in order to avoid broken installations. Switch our
conditionals from release name comparisons to version numbers in
order to more cleanly support ranges. Also make sure the borg run
test is triggered by changes to the create-venv role.

Change-Id: I5dd064c37786c47099bf2da66b907facb517c92a
2022-08-10 19:35:10 +00:00
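The advantage of comparing version numbers rather than release names can be sketched as below. The `needs_pin` helper and the exact range (16.04 through 18.04) are assumptions for illustration; the real conditionals live in the create-venv role and may differ.

```python
def needs_pin(version):
    """Return True if pip/setuptools should be capped for this release.

    Hypothetical helper: assumes the cap applies to Ubuntu 16.04 (Xenial)
    through 18.04 (Bionic) inclusive.
    """
    major, minor = (int(part) for part in version.split("."))
    # Tuples compare element by element, so this is a true range check.
    return (16, 4) <= (major, minor) <= (18, 4)


# Release names sort alphabetically, not by age, so they cannot express
# "Bionic or older": the older Xenial sorts after Bionic.
name_order_matches_age = "xenial" < "bionic"

for release in ("16.04", "18.04", "20.04", "22.04"):
    print(release, needs_pin(release))
```

With names, each affected release needs its own equality test; with numbers, one range covers all of them.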
Ian Wienand
a36ee527c8 system-config-run-borg-backup: add jammy test host
With Jammy production nodes coming, add testing to the backup roles on
this distro.

Change-Id: I7d7733c7a52918b1faa65c3d0dcfd2cf94e66066
2022-08-10 10:14:56 +10:00
Ian Wienand
57939b40d9 system-config-run: bump base timeout to 3600
Many of our tests are actually running with a timeout of 3600; I think
between a combination of bumping timeouts for failures and
copy-pasting jobs.

We are seeing frequent timeouts of other jobs without this,
particularly on OVH GRA1.  Let's bump the base timeout to 3600 to
account for this.  The only job that overrides this now is gitea,
which runs for 4800 due to its long import process.

Change-Id: I762f0f7c7a53a456d9269530c9ae5a9c85903c9c
2022-08-10 10:14:56 +10:00
Ian Wienand
08644ae925 mirror-update: move testing to mirror-update99
Keeping the testing nodes at the other end of the namespace separates
them from production hosts.  This one isn't really referencing itself
in testing like many others, but move it anyway.

Change-Id: I2130829a5f913f8c7ecd8b8dfd0a11da3ce245a9
2022-08-05 08:18:55 +10:00
Ian Wienand
5ba37ced60 paste: move certificate to group variable
Similar to Id98768e29a06cebaf645eb75b39e4dc5adb8830d, move the
certificate variables to the group definition file, so that we don't
have to duplicate handlers or definitions for the testing host.

Change-Id: I6650f5621a4969582f40700232a596d84e2b4a06
2022-08-05 08:18:55 +10:00
Ian Wienand
e70c1e581c static: move certs to group, update testing name to static99
Currently we define the letsencrypt certs for each host in its
individual host variables.

With recent work we have a trusted CA and SAN names set up in
our testing environment; introducing the possibility that we could
accidentally reference the production host during testing (both have
valid certs, as far as the testing hosts are concerned).

To avoid this, we can use our naming scheme to move our testing hosts
to "99" and avoid collision with the production hosts.  As a bonus,
this really makes you think more about your group/host split to get
things right and keep the environment as abstract as possible.

One example of this is that with letsencrypt certificates defined in
host vars, testing and production need to use the same hostname to get
the right certificates created.  Really, this should be group-level
information so it applies equally to host01 and host99.  To cover
"hostXX.opendev.org" as a SAN we can include the inventory_hostname in
the group variables.

This updates one of the more tricky hosts, static, as a proof of
concept.  We rename the handlers to be generic, and update the testing
targets.

Change-Id: Id98768e29a06cebaf645eb75b39e4dc5adb8830d
2022-08-05 08:18:55 +10:00
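The group-level certificate idea from this commit can be sketched as follows. The `certificate_names` helper and the service names are hypothetical; the only detail taken from the commit message is that the group definition includes the inventory hostname so that the same definition works for host01 and host99.

```python
def certificate_names(inventory_hostname, service_names):
    """Build the name list for one certificate from group-level data.

    Hypothetical helper: the host's own inventory name goes first,
    followed by the shared service names, without duplicates.
    """
    names = [inventory_hostname]
    for name in service_names:
        if name not in names:
            names.append(name)
    return names


# Hypothetical group-level service names shared by every host in the group.
group_service_names = ["static.opendev.org", "docs.example.org"]

production = certificate_names("static01.opendev.org", group_service_names)
testing = certificate_names("static99.opendev.org", group_service_names)

print(production)
print(testing)
```

Because only the first entry depends on the host, the testing host never needs to borrow the production hostname to get a matching certificate definition.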
Zuul
11494a31a4 Merge "system-config-run-gitea: increase timeout" 2022-08-04 17:06:06 +00:00
Zuul
13d65b07a1 Merge "Run our base playbook on jammy" 2022-08-04 11:34:10 +00:00
Ian Wienand
53da4a3fb2 system-config-run-gitea: increase timeout
I've seen a couple of jobs timeout on this for no apparent reason.
Loading all the repos just seems to take a long time.  Looking at the
logs [1], depending on the cloud taking 55m - 1h is not terribly
uncommon.  Increase the timeout on this by 20 minutes to give it
enough headroom over an hour.

[1] https://zuul.opendev.org/t/openstack/builds?job_name=system-config-run-gitea&project=opendev%2Fsystem-config

Change-Id: I51080820bae35ac615a3b8b7ee1b8890e0df8410
2022-08-04 20:38:08 +10:00
Zuul
187e4307a1 Merge "paste : move testing host to paste99, remove https hacks" 2022-08-04 07:19:05 +00:00