We're going to want Mailman 3 served over HTTPS for security
reasons, so start by generating certificates for each of the sites
we host in Mailman v2. Also collect the acme.sh logs for verification.
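As a rough sketch, each site's request might look like this in the
host vars our acme.sh roles consume (the cert key and domain list
here are illustrative, not the final set):

  letsencrypt_certs:
    lists-openstack-org-main:
      - lists.openstack.org
      - lists.opendev.org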
Change-Id: I261ae55c6bc0a414beb473abcb30f9a86c63db85
Having two groups here was confusing. We seem to use the review group
for most of our ansible configuration, so prefer that one. We move the
contents of the gerrit group_vars into the review group_vars and then
clean up the use of the old group_vars file.
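In sketch form (the variable name is illustrative, not the real
contents):

  # group_vars/gerrit.yaml is deleted; what it held now lives in
  # group_vars/review.yaml next to the existing review variables, e.g.:
  gerrit_vhost_name: review.opendev.org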
Change-Id: I7fa7467f703f5cec075e8e60472868c60ac031f7
Start backing up the new review server. Stop backing up the old
server. Fix the group matching test for the new server.
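In groups.yaml terms the swap is roughly (entries illustrative):

  borg-backup:
    - review02.opendev.org
    # the old review server is dropped from the group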
Change-Id: I8d84b80099d5c4ff7630aca9df312eb388665b86
This moves review02 out of the review-staging group and into the main
review group. At this point, review01.openstack.org is inactive so we
can remove all references to openstack.org from the groups. We update
the system-config job to run against a focal production server, and
remove the unneeded rsync setup used to move data.
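The group move itself is roughly (patterns illustrative):

  groups:
    review:
      - review02.opendev.org
    review-staging: []   # review02 moves out of staging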
This additionally enables replication; this should be a no-op when
applied, as part of the transition process is to manually apply this
so that DNS setup can pull zone changes from opendev.org.
It also switches to the mysql connector; as noted inline, we found
some issues with the mariadb connector.
Note backups follow in a separate step to avoid doing too much at
once, hence dropping the backup group from the testing list.
Change-Id: I7ee3e3051ea8f3237fd5f6bf1dcc3e5996c16d10
This converts our existing puppeted mailman configuration into a set of
ansible roles and a new playbook. We don't try to do anything new and
instead do our best to map from puppet to ansible as closely as
possible. This helps reduce churn and will help us find problems more
quickly if they happen.
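The new playbook is, in rough shape, something like this (role names
illustrative):

  - hosts: mailman
    become: yes
    roles:
      - iptables
      - exim
      - mailman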
Followups will further clean up the puppetry.
Change-Id: If8cdb1164c9000438d1977d8965a92ca8eebe4df
We will be rotating zk01-03.openstack.org out and replacing them with
zk04-06.opendev.org. This is the first change in that process which puts
zk04 into the rotation. This should only be landed when operators are
ready to manually stop zookeeper on zk03 (which is being replaced by
zk04 in this change).
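In inventory terms this first step is roughly (entries illustrative):

  zookeeper:
    - zk01.openstack.org
    - zk02.openstack.org
    - zk04.opendev.org   # takes over from zk03.openstack.org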
Change-Id: Iea69130f6b3b2c8e54e3938c60e4a3295601c46f
Once we are satisfied that we have disabled the inputs to firehose we
can land this change to stop managing it in config management. Once that
is complete the server can be removed.
Change-Id: I7ebd54f566f8d6f940a921b38139b54a9c4569d8
We duplicate the KDC settings over all our kerberos clients. Add
clients to a "kerberos-client" group and set the variables in a group
file.
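A sketch of what the consolidated group file might hold (variable and
server names illustrative):

  # group_vars/kerberos-client.yaml
  kerberos_realm: OPENSTACK.ORG
  kerberos_kdcs:
    - kdc03.openstack.org
    - kdc04.openstack.org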
Change-Id: I25ed5f8c68065060205dfbb634c6558488003a38
These are new focal replacement servers. Because this is the last set
of replacements for the executors, we also clean up the testing of the
old servers in the system-config-run-zuul job and the inventory group
checker job.
Change-Id: I111d42c9dfd6488ef69ff1a7f76062a73d1f37bf
This server has been replaced by ze01.opendev.org running Focal. Let's
remove the old ze01.openstack.org from inventory so that we can delete
the server. We will follow this up with a rotation of new focal servers
being put in place.
This also renames the xenial executor in testing to ze12.openstack.org
as that will be the last one to be rotated out in production. We will
remove it from testing at that point as well.
We also remove a completely unused zuul-executor-opendev.yaml group_vars
file to avoid confusion.
Change-Id: Ida9c9a5a11578d32a6de2434a41b5d3c54fb7e0c
This is a focal replacement for ze01.openstack.org. Cleanup for
ze01.openstack.org will happen in a followup when we are happy with the
results of running zuul-executor on focal.
Change-Id: If1fef88e2f4778c6e6fbae6b4a5e7621694b64c5
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.
For reference, the servers being backed up at this time are:
borg-ask01
borg-ethercalc02
borg-etherpad01
borg-gitea01
borg-lists
borg-review-dev01
borg-review01
borg-storyboard01
borg-translate01
borg-wiki-update-test
borg-zuul01
This removes the old bup backup hosts, the no-longer-used ansible
roles for the bup backup server and client, and any remaining
bup-related configuration.
For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.
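If we ever wanted to script that instead, a task along these lines
would do it (the cron entry name is hypothetical):

  - name: Remove leftover bup cron job
    cron:
      name: bup-backup   # hypothetical entry name
      user: root
      state: absent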
Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
Both the fileservers and db servers have common key material deployed
by the openafs-server-config role. Put both types of server in a new
group "afs-server-common" so we can define this key material in just
one group file on bridge.
Then separate out the two into afs-<file|db>-server groups for
consistent naming.
Rename afs-admin for consistent naming.
The service file is updated to reflect the new groups.
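The resulting layout is roughly (host patterns illustrative):

  groups:
    afs-server-common:
      - afs[0-9]*.open*.org
      - afsdb[0-9]*.open*.org
    afs-file-server:
      - afs[0-9]*.open*.org
    afs-db-server:
      - afsdb[0-9]*.open*.org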
Change-Id: Ifa5f251fdfb8de737ad2ed96491d45294ce23a0c
With all AFS file-servers upgraded to 1.8, we can move afs01.dfw back
and rename the group to just "afs".
Change-Id: Ib31bde124e01cd07d6ff7eb31679c55728b95222
Backups have been going well on ethercalc02, so add borg backup runs
to all backed-up servers. Port in some additional excludes for Zuul
and slightly modify the /var/ matching.
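The kind of excludes being ported, as a sketch (variable name and
paths illustrative):

  borg_backup_excludes:
    - /var/cache
    - /var/lib/zuul/builds
    - /var/log/journal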
Change-Id: Ic3adfd162fa9bedd84402e3c25b5c1bebb21f3cb
This deploys graphite from the upstream container.
We override the statsd configuration to have it listen on ipv6.
Similarly we override the nginx config to listen on ipv6, enable ssl,
redirect port 80 to 443, and block the /admin page (we don't use it).
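A compose-file sketch of that deployment (image tag and mount paths
illustrative):

  services:
    graphite:
      image: graphiteapp/graphite-statsd   # upstream container
      network_mode: host
      volumes:
        - /opt/graphite/storage:/opt/graphite/storage
        - /etc/graphite-statsd/nginx.conf:/etc/nginx/nginx.conf  # our override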
For production we will just want to put some cinder storage in
/opt/graphite/storage on the production host and figure out how to
migrate the old stats. There is also a bit of cleanup that will follow,
because we half-converted grafana01.opendev.org -- so everything can't
be in the same group until that host is gone.
Testing has been added to push some stats and ensure they are seen.
Change-Id: Ie843b3d90a72564ef90805f820c8abc61a71017d
Rename the separate "mirror_opendev" group to just "mirror". Update
various parts to reflect that change.
We no longer deploy any mirror hosts with puppet, so remove the
various configuration files.
Depends-On: https://review.opendev.org/728345
Change-Id: Ia982fe9cb4357447989664f033df976b528aaf84
Zuul is publishing lovely container images, so we should
go ahead and start using them.
We can't use containers for zuul-executor because of the
docker->bubblewrap->AFS issue, so install from pip there.
Don't start any of the containers by default, which should
let us safely roll this out and then do a rolling restart.
For things (like web or mergers) where it's safe to do so,
a followup change will swap the flag.
Change-Id: I37dcce3a67477ad3b2c36f2fd3657af18bc25c40
Migration plan:
* add zk* to emergency
* copy data files on each node to a safe place for DR backup
* make a json data backup: zk-shell localhost:2181 --run-once 'mirror / json://!tmp!zookeeper-backup.json/'
* manually run a modified playbook to set up the docker infra without starting containers
* rolling restart; for each node:
  * stop zk
  * split data and log files and move them to new locations
  * remove zk packages
  * start zk containers
* remove from emergency; land this change.
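For reference, the container setup the playbook lays down per node is
roughly (image tag and paths illustrative):

  services:
    zk:
      image: zookeeper:3.5
      network_mode: host
      restart: always
      volumes:
        - /var/zookeeper/data:/data       # data files, split out
        - /var/zookeeper/datalog:/datalog # log files, split out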
Change-Id: Ic06c9cf9604402aa8eb4bb79238021c14c5d9563
This should be mostly a no-op - but we will need to do a shutdown
in emergency mode.
Tell the gerrit role to not run compose up when run as part of
remote_puppet_git.
Change-Id: Id45376c2697656a12afeacf317b6f26c85c08dad
We have LE dns entries for review.o.o, but we're not actually
requesting the cert. Go ahead and request it - it'll make the
apache config easier to sort out.
Get the openstack.org certs for review-dev while we're at it.
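The request is just another entry in the host's cert list (cert key
and domain list illustrative):

  letsencrypt_certs:
    review-openstack-org-main:
      - review.openstack.org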
Change-Id: I91d06c97993ba37204bd1fc326ae823e1b9c0c1a
Depends-On: https://review.opendev.org/707267
Depends-On: https://review.opendev.org/707255
We ended up running into a problem with nodepool-built control plane
images (it has to do with boot-from-volume not allowing us to delete
images that are in use by a nova instance). We have decided to clean
this up and go back to not doing this until we can do it more properly.
Note this isn't a revert, because having a group for access to control
plane clouds does seem like a good idea in general, and I believe there
have been changes we'd have to resolve in the clouds.yaml files anyway.
Depends-On: https://review.opendev.org/#/c/665012/
Change-Id: I5e72928ec2dec37afa9c8567eff30eb6e9c04f1d
In order to have nodepool build images and upload them to control
plane clouds, add them to the clouds.yaml on the nodepool-builder
hosts. Keep them out of the launcher configs by splitting the config
templates. So that we can keep our copies of things to a minimum,
create a group called "control-plane-clouds" and put bridge and nb0*
in it.
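The new group, as a sketch (host patterns illustrative):

  groups:
    control-plane-clouds:
      - bridge.openstack.org
      - nb0*.open*.org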
There are mentions of clouds in here that we no longer use; a followup
patch will clean those up.
NOTE: Requires shifting the clouds config dict from
host_vars/bridge.openstack.org.yaml to group_vars/control-plane-clouds.yaml
in the secrets on bridge.
Needed-By: https://review.opendev.org/640044
Change-Id: Id1161bca8f23129202599dba299c288a6aa29212
The server has been removed, remove it from inventory.
While we're here, s/graphite.openstack.org/graphite.opendev.org/
... it's a CNAME redirect but we might as well clean up.
Change-Id: I36c951c85316cd65dde748b1e50ffa2e058c9a88
This leaves ask.o.o and lists.o.o, which are still running Trusty, and
the cgit servers, which are likely to be decommissioned soon.
Change-Id: I78e7fd9e3079cc760da0aad955f6eeb32d442fc3
Two related changes that need to go together because we test with the
production groups.yaml.
Confusingly, there are arm64 PC1 puppet repos, but it turns out they
contain only the common java parts. The puppet-agent package is not
available, and it doesn't seem like it will be [1]. I think this means
we cannot run puppet4 on our arm64 xenial ci hosts.
The problem is that the mirrors have been updated to puppet4 -- runs
are now breaking on the arm mirrors because they don't have
puppet-agent packages. It seems all we can really do at this point is
continue to run them on puppet3.
This is hard (impossible?) to express with an fnmatch in the existing
yamlgroups syntax. We could do something like list all the mirror
hosts and use anchors etc, but we would have to keep that maintained.
Add a feature to the inventory plugin: if a list entry starts with
a ^ it is considered a full regex and passed to re.match. This
allows us to write more complex matchers where required -- in this
case the arm64 ci mirror hosts are excluded from the puppet4 group.
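For example, an entry along these lines (pattern illustrative, not
the exact one used) matches the non-arm64 mirrors while leaving the
arm64 ci mirrors out of puppet4:

  puppet4:
    # a leading ^ marks the entry as a full regex handed to re.match
    - '^mirror\d+\.(?!.*arm64).*\.openstack\.org'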
Testing is updated.
[1] https://groups.google.com/forum/#!msg/puppet-dev/iBMYJpvhaWM/WTGmJvXxAgAJ
Change-Id: I828e0c524f8d5ca866786978486bc04829464b47
In roughly lexicographical order, upgrade a batch of servers to puppet
4. We skip ask-staging because, although it is in the futureparser
group, it was temporarily disabled in puppet and so hasn't actually
gone through the futureparser validation stage yet.
Depends-On: https://review.openstack.org/643465
Change-Id: I3971ffb9800e95aaaba0076ec3bd6a05cd92a750
Remove the puppetry for managing nameservers as we now use ansible
configured name servers without puppet.
We will need to follow this up with deletion of the existing
ns*.openstack.org and adns1.openstack.org servers.
Change-Id: Id7ec8fa58c9e37ce94ec71e4562607914e5c3ea4
An upcoming change will remove review.openstack.org and
puppetmaster.openstack.org from our hostgroups, since these servers
have been deleted from the provider already. We were explicitly
testing the hostgroup membership for the former, so replace that
with a couple of new ones which should provide more stable coverage
going forward.
Change-Id: Ida28b65e9f1dc01f233cc9bff4ce32aef70e347a
We removed the mirror nodes from the webservers group to fix iptables
rule application on the nodes. Unfortunately we didn't update our test
that tries to assert mirrors should be in the webservers group. Update
the test results fixture to remove webservers as a valid group for a
mirror node.
Change-Id: Iba18e54f4df4a36c0247f65642faacca9d195769