This has our change to open etherpad on join, so we should no longer need
to run a fork of the web server. Switch to the upstream container image
and stop building our own.
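A minimal sketch of what the docker-compose change implies (the image
name and tag here are assumptions, not the exact production values):

  services:
    etherpad:
      # Pull upstream directly instead of building our own image.
      image: docker.io/etherpad/etherpad:latest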
Change-Id: I3e8da211c78b6486a3dcbd362ae7eb03cc9f5a48
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.
For reference, the servers being backed up at this time are:
borg-ask01
borg-ethercalc02
borg-etherpad01
borg-gitea01
borg-lists
borg-review-dev01
borg-review01
borg-storyboard01
borg-translate01
borg-wiki-update-test
borg-zuul01
This removes the old bup backup hosts, the no-longer-used ansible
roles for the bup backup server and client, and any remaining
bup-related configuration.
For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.
Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
We need to depend on the buildset registry as we are building this image
in a separate job. We also don't need to depend on the build job in
gate; we only need the upload job.
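As a sketch, the project-pipeline config ends up shaped like this
(the image and run job names are illustrative):

  - project:
      check:
        jobs:
          - opendev-buildset-registry
          - system-config-run-foo:
              dependencies:
                - opendev-buildset-registry
                - system-config-build-image-foo
      gate:
        jobs:
          - system-config-run-foo:
              dependencies:
                - opendev-buildset-registry
                - system-config-upload-image-foo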
Change-Id: Ie7c2ed29c028f8c23d67ad38edbe04b12e22d026
This change splits our existing system-config-run-review job into two
jobs, one for gerrit 3.2 and another for 3.3. The biggest change is that
we use a var called zuul_test_gerrit_version to select which version we
want and that ends up in the fake group file written out by Zuul for the
nested ansible run. The nested ansible run will then populate the
docker-compose file with the appropriate version for us.
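The split looks roughly like this (the exact job names are
illustrative; the var is the one described above):

  - job:
      name: system-config-run-review-3.2
      vars:
        zuul_test_gerrit_version: '3.2'

  - job:
      name: system-config-run-review-3.3
      vars:
        zuul_test_gerrit_version: '3.3'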
Change-Id: I00b52c0f4aa8df3ecface964007fcf5724887e5e
This adds a dockerfile to build an opendevorg/refstack image as well as
the jobs to build and publish it.
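As a sketch, the job pair follows our usual image pattern (the
parents and build context shown are assumptions):

  - job:
      name: system-config-build-image-refstack
      parent: system-config-build-image
      vars: &refstack_image_vars
        docker_images:
          - context: docker/refstack
            repository: opendevorg/refstack

  - job:
      name: system-config-upload-image-refstack
      parent: system-config-upload-image
      vars: *refstack_image_vars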
Change-Id: Icade6c713fa9bf6ab508fd4d8d65debada2ddb30
This starts migrating OpenAFS server setup to Ansible.
Firstly we split up the groups and explicitly name hosts, as we will
be migrating each one step-by-step. We split out 1.8 hosts into a new
afs-1.8 group; the first host is afs01.ord.openstack.org which already
has openafs 1.8 installed manually.
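As a sketch, the group split looks something like this (format
abbreviated; only the host named above is certain):

  afs-1.8:
    hosts:
      - afs01.ord.openstack.org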
An openafs-server role is introduced that does the same setup as the
extant puppet.
The AFS job is renamed to infra-prod-afs as the puppet component will
eventually disappear. Otherwise it runs in the same way, but also
runs the openafs-server role for the 1.8 servers.
Once this is merged, we can run it against afs01.ord.openstack.org to
ensure it works and is idempotent. We can then take on upgrading the
other file servers, and work further on the database servers.
Change-Id: I7998af43961999412f58a78214f4b5387713d30e
Having upgraded to 3.2, we don't need these versions any more.
Change-Id: Ifc37a75aa62b2498e649a4c81b589a04c794184a
Depends-On: https://review.opendev.org/763617
The hound project has undergone a small re-birth and moved to
https://github.com/hound-search/hound
which has broken our deployment. We've talked about leaving
codesearch up to gitea, but it's not quite there yet. There seems to
be no point working on the puppet now.
This builds a container that runs houndd. It's an opendev specific
container; the config is pulled from project-config directly.
There are some custom scripts that drive things. Some points for
reviewers:
- update-hound-config.sh uses "create-hound-config" (which is in
jeepyb for historical reasons) to generate the config file. It
grabs the latest projects.yaml from project-config and exits with a
return code to indicate if things changed.
- when the container starts, it runs update-hound-config.sh to
populate the initial config. There is a testing environment flag
and a small config so it doesn't have to clone the entire opendev for
functional testing.
- it runs under supervisord so we can restart the daemon when
projects are updated. Unlike earlier versions that didn't start
listening till indexing was done, this version now puts up a "Hound
is not ready yet" message while it is working; so we can drop
all the magic we were doing to probe if hound is listening via
netstat and making Apache redirect to a status page.
- resync-hound.sh is run from an external cron job daily, and does
this update and restart check. Since it only reloads if changes
are made, this should be relatively rare anyway.
- There is a PR to monitor the config file
(https://github.com/hound-search/hound/pull/357) which would mean
the restart is unnecessary. This would be good in the near term and
we could remove the cron job.
- playbooks/roles/codesearch is unexciting and deploys the container,
certificates and an apache proxy back to localhost:6080 where hound
is listening.
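As a rough sketch, the role's compose file looks something like this
(the image tag and networking details are assumptions):

  services:
    hound:
      image: docker.io/opendevorg/hound:latest
      network_mode: host   # houndd listens on localhost:6080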
I've combined removal of the old puppet bits here as the "-codesearch"
namespace was already being used.
Change-Id: I8c773b5ea6b87e8f7dfd8db2556626f7b2500473
This will scale up our meetpad install by 50% giving us more capacity
for PTG sessions.
We also increase the tox linters job timeout as it is slow to pip
install and then slow to run ansible-lint. Do this until we can sort
out why it is slow.
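The timeout bump itself is a one-liner of this shape (job name and
value illustrative):

  - job:
      name: tox-linters
      timeout: 3600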
Change-Id: Ieceafefa27266f0bc0f427af790f920a8c44326c
Now that gerritbot is deployed from containers on eavesdrop we want to
run the infra-prod-service-eavesdrop job hourly to ensure that we keep
the docker image up to date there.
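The hourly entry is a small addition of this shape (pipeline name
illustrative):

  - project:
      opendev-prod-hourly:
        jobs:
          - infra-prod-service-eavesdrop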
We haven't added the service-eavesdrop job to a deploy pipeline in
gerritbot because that would require us to add gerritbot's project ssh
key to bridge.
Change-Id: I5aba91f2ae5c018ee9b2d0481a53b630fc5d1ab7
This adds roles to implement backup with borg [1].
Our current tool "bup" has no Python 3 support and is not packaged for
Ubuntu Focal. This means it is effectively end-of-life. borg fits
our model of servers backing themselves up to a central location, is
well documented and seems well supported. It also has the clarkb seal
of approval :)
As mentioned, borg works in the same manner as bup by doing an
efficient backup over ssh to a remote server. The core of these
roles is the same as the bup-based ones, in terms of creating a
separate user for each host and deploying keys and ssh config.
This chooses to install borg in a virtualenv under /opt. This was
chosen for a number of reasons. Firstly, reading the history of
borg, there have been incompatible updates (although they provide a
tool to update repository formats); it seems important that we both
pin the version we are using and keep clients and server in sync.
Since we have a heterogeneous distribution collection, we don't want
to rely on the packaged tools, which may differ. I don't feel like
this is a great application for a container; we actually don't want
it that isolated from the base system, because its goal is to read
the system and copy it offsite with as little chance of things going
wrong as possible.
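A minimal sketch of the install step, assuming Ansible's pip module
(the version pin and paths are illustrative):

  - name: Install borg into a virtualenv under /opt
    pip:
      name: borgbackup==1.1.14   # pin illustrative; keep client/server in sync
      virtualenv: /opt/borg
      virtualenv_command: python3 -m venv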
Borg has a lot of support for encrypting the data at rest in various
ways. However, that introduces the possibility we could lose both the
key and the backup data. Really the only thing stopping this is key
management, and if we want to go down this path we can do it as a
follow-on.
The remote end server is configured via ssh command rules to run in
append-only mode. This means a misbehaving client can't delete its
old backups. In theory we can prune backups on the server side --
something we could not do with bup. The documentation has been
updated but is vague on this part; I think we should get some hosts in
operation, see how the de-duplication is working out and then decide
how we want to manage things long term.
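The append-only restriction is applied via authorized_keys options; a
rough sketch with Ansible's authorized_key module (user names, paths
and the variable are illustrative):

  - name: Restrict backup client to append-only borg serve
    authorized_key:
      user: borg-example01              # illustrative per-host backup user
      key: "{{ borg_client_pubkey }}"   # assumed variable
      key_options: >-
        command="/opt/borg/bin/borg serve --append-only
        --restrict-to-path /opt/backups/borg-example01",restrict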
Testing is added; a focal and bionic host both run a full backup of
themselves to the backup server. Pretty cool, the logs are in
/var/log/borg-backup-<host>.log.
No hosts are currently in the borg groups, so this can be applied
without affecting production. I'd suggest the next steps are to bring
up a borg-based backup server and put a few hosts into this. After
running for a while, we can add all hosts, and then deprecate the
current bup-based backup server in vexxhost and replace that with a
borg-based one; giving us dual offsite backups.
[1] https://borgbackup.readthedocs.io/en/stable/
Change-Id: I2a125f2fac11d8e3a3279eb7fa7adb33a3acaa4e
There is a new release; update the base container. Add the promote
job that was forgotten with the original commit
Iddfafe852166fe95b3e433420e2e2a4a6380fc64.
Change-Id: Ie0d7febd2686d267903b29dfeda54e7cd6ad77a3
This deploys graphite from the upstream container.
We override the statsd configuration to have it listen on ipv6.
Similarly we override the nginx config to listen on ipv6, enable ssl,
forward port 80 to 443, and block the /admin page (we don't use it).
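A hypothetical docker-compose excerpt showing the overrides (the
in-container paths are assumptions):

  services:
    graphite:
      image: docker.io/graphiteapp/graphite-statsd:latest
      volumes:
        - /opt/graphite/storage:/opt/graphite/storage
        - ./statsd.config.js:/opt/statsd/config/udp.js
        - ./graphite.conf:/etc/nginx/sites-enabled/graphite.conf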
For production we will just want to put some cinder storage in
/opt/graphite/storage on the production host and figure out how to
migrate the old stats. There is also a bit of cleanup that will follow,
because we half-converted grafana01.opendev.org -- so everything can't
be in the same group till that is gone.
Testing has been added to push some stats and ensure they are seen.
Change-Id: Ie843b3d90a72564ef90805f820c8abc61a71017d
This uses the Grafana container created with
Iddfafe852166fe95b3e433420e2e2a4a6380fc64 to run the
grafana.opendev.org service.
We retain the old model of an Apache reverse-proxy; it's well tested
and understood, it's much easier than trying to map all the SSL
termination/renewal/etc. into the Grafana container and we don't have
to convince ourselves the container is safe to be directly web-facing.
Otherwise this is a fairly straightforward deployment of the
container. As before, it uses the graph configuration kept in
project-config which is loaded in with grafyaml, which is included in
the container.
One nice advantage is that it makes it quite easy to develop graphs
locally, using the container which can talk to the public graphite
instance. The documentation has been updated with a reference on how
to do this.
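For example, a local instance can be pointed at the public graphite
via standard Grafana datasource provisioning (the URL is assumed):

  apiVersion: 1
  datasources:
    - name: graphite
      type: graphite
      access: proxy
      url: https://graphite.opendev.org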
Change-Id: I0cc76d29b6911aecfebc71e5fdfe7cf4fcd071a4
This is a docker image based on the latest upstream Grafana with
grafyaml also installed inside. It includes a small script to run a
refresh of the dashboards.
Change-Id: Iddfafe852166fe95b3e433420e2e2a4a6380fc64
Make inventory/service for service-specific things, including the
groups.yaml group definitions, and inventory/base for hostvars
related to the base system, including the list of hosts.
Move the existing host_vars into inventory/service, since most of
them are likely service-specific. Move group_vars/all.yaml into
base/group_vars as almost all of it is related to base things,
with the exception of the gerrit public key.
A followup patch will move host-specific values into equivalent
files in inventory/base.
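The resulting layout looks roughly like this (some file names are
assumptions):

  inventory/
    base/
      hosts.yaml            # the list of hosts
      group_vars/
        all.yaml            # base-system vars
    service/
      groups.yaml           # group definitions
      host_vars/            # service-specific host vars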
This should let us override hostvars in gate jobs. It should also
allow us to do better file matchers, and to be able to organize
our playbooks more if we want to.
Depends-On: https://review.opendev.org/731583
Change-Id: Iddf57b5be47c2e9de16b83a1bc83bee25db995cf
It's the only part of base that's important to run when we run a
service. Run it in the service playbooks and get rid of the
dependency on infra-prod-base.
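The pattern in each service playbook is simply (names illustrative):

  - hosts: mirror
    roles:
      - iptables
      - mirror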
Continue running it in base so that new nodes are brought up
with iptables in place.
Bump the timeout for the mirror job, because the iptables addition
seems to have just bumped it over the edge.
Change-Id: I4608216f7a59cfa96d3bdb191edd9bc7bb9cca39
This is required by the accessbot job which is in periodic. We
moved it to hourly so that ptgbot could be updated more often, but
without it being in periodic, no periodic jobs are running, and that
seems more critical at the moment.
Change-Id: I0c7dbc0db77f295820302441e495fe4e9ea7d726
Since changes to some services on eavesdrop, for example ptgbot, may
need to take effect fairly quickly, run the playbook hourly rather
than daily. We can't easily trigger on changes merging to the ptgbot
repo in the future when it's in a different Zuul tenant from
system-config.
Change-Id: I90ddc555ded0ac1d3134fd075d816155a475c6d2
We already run accessbot in project-config when the accessbot
script changes. We don't need to run it whenever any of the puppet
or other config on eavesdrop runs, nor do we need to run it
hourly. Just run it nightly and on changes to the actual
accessbot config.
Change-Id: Idd47f7c96f677fd1e1c8da3be262a52a70646acd
Our .zuul.yaml file has grown quite large. Try to make this more
manageable by splitting it into a zuul.d/ directory with jobs organized by
function.
Change-Id: I0739eb1e2bc64dcacebf92e25503f67302f7c882