These new servers are running focal and are ready to be configured as
executors. Once these are up and running I will ask
ze02-04.openstack.org to pause, then stop them once they are paused,
and finally delete their servers.
Change-Id: I3c8377a68605d45da9759ce5b24e769a2427aff2
This server has been replaced by ze01.opendev.org running Focal. Let's
remove the old ze01.openstack.org from inventory so that we can delete
the server. We will follow this up with a rotation of new focal servers
being put in place.
This also renames the xenial executor in testing to ze12.openstack.org
as that will be the last one to be rotated out in production. We will
remove it from testing at that point as well.
We also remove a completely unused zuul-executor-opendev.yaml group_vars
file to avoid confusion.
Change-Id: Ida9c9a5a11578d32a6de2434a41b5d3c54fb7e0c
Since we have SRV DNS entries for our afsdb services, we don't need to
explicitly list their IP addresses here. From the man page:
For the client CellServDB, it may be desirable to make the client
aware of a cell (so that it's listed by default in /afs when the
-dynroot flag to afsd is in use, for instance) without specifying
the database server machines for that cell. This can be done by
including only the cell line (starting with ">") and omitting any
following database server machine lines. afsd must be configured
with the -afsdb option to use DNS SRV or AFSDB record lookups to
locate database server machines. If the cell has such records and
the client is configured to use them, this configuration won't
require updates to the client CellServDB file when the IP addresses
of the database server machines change.
Thus we just keep the openstack.org entry. We have not been keeping
the list here up-to-date with the grand.central.org version (well, not
since 2014 anyway). Since we don't really need to track any of these,
just remove them.
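With afsd started with -afsdb, the client CellServDB then only needs
the cell line itself, along the lines of:

  >openstack.org          # OpenStack Infrastructure cell

and the database server machines are discovered via the DNS SRV/AFSDB
records instead of hard-coded addresses.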
Change-Id: Id358e373c4c804ebe32b7447e5880015119926a5
This group no longer does anything. This used to deploy a bunch of
keytabs for mirror-update, but that has all moved into
"mirror_update_keytab_*".
Change-Id: I3e2110a621d6946bc4838bfa2f743f0e9db391f3
We are in the process of upgrading the AFS servers to focal. As
explained by auristor (extracted from IRC below) we need 3 servers to
actually perform HA with the ubik protocol:
the ubik quorum is defined by the list of voting primary ip addresses
as specified in the ubik service's CellServDB file. The server with
the lowest ip address gets 1.5 votes and the others 1 vote. To win
election requires greater than 50% of the votes. In a two server
configuration there are a total of 2.5 votes to cast. 1.5 > 2.5/2 so
afsdb02.openstack.org always wins regardless of what
afsdb01.openstack.org says. And afsdb01.openstack.org can never win
because 1 < 2.5/2. By adding a third ubik server to the quorum, the
total votes cast are 3.5 and it always requires the vote of two
servers to elect a winner ... if afsdb03 is added with the highest
ip address, then either afsdb01 or afsdb02 can be elected
Add a third server which is a focal host and related configuration.
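As a quick sanity check of that arithmetic (a throwaway sketch, not
anything deployed):

  # Lowest-IP server gets 1.5 votes, the rest 1 vote each; winning an
  # election requires strictly more than half of the total votes.
  def can_win_alone(n_servers, has_lowest_ip):
      total = 1.5 + (n_servers - 1) * 1.0
      own = 1.5 if has_lowest_ip else 1.0
      return own > total / 2

  print(can_win_alone(2, True))   # True:  2.5 total, 1.5 > 1.25
  print(can_win_alone(2, False))  # False: 1.0 < 1.25
  print(can_win_alone(3, True))   # False: 3.5 total, 1.5 < 1.75
  print(can_win_alone(3, False))  # False: 1.0 < 1.75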
Change-Id: I59e562dd56d6cbabd2560e4205b3bd36045d48c2
This is a focal replacement for ze01.openstack.org. Cleanup for
ze01.openstack.org will happen in a followup when we are happy with the
results of running zuul-executor on focal.
Change-Id: If1fef88e2f4778c6e6fbae6b4a5e7621694b64c5
Our Mailman site templates and similar content contain links to an
old openstack-security page on the foundation-run site which no
longer exists. Correct this to the OpenStack community's security
site, which should be much more stable.
Change-Id: I9577540319c53f76afc40a33b2c5697280397149
This file is now removed (I0cbcd4694a4796573fe48383756be03597d2da0f);
get rid of this to avoid any confusion.
Change-Id: I837d1fccbfa2461eb1315eac54c2a017fcb86511
This syslog configuration is what sends any logs with a program-name
of "docker-<foo>" to /var/log/containers/foo.log. However, at 98-
level the rules are after the default 50- rules, so we're seeing the
logs copied to both syslog and /var/log/containers. Since this
contains a "stop" command, we should move this earlier before the
default rules and the docker logs will not be duplicated.
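For reference, the rule in question is roughly of this shape (the
template/regex details are illustrative, not the exact file contents):

  # route docker-<name> program logs to /var/log/containers/<name>.log
  # and stop further processing so they don't also land in the default
  # syslog files.
  template(name="dockerlog" type="string"
           string="/var/log/containers/%programname:R,ERE,1,FIELD:docker-(.*)--end%.log")
  if $programname startswith "docker-" then {
      action(type="omfile" dynaFile="dockerlog")
      stop
  }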
Change-Id: I0cbcd4694a4796573fe48383756be03597d2da0f
These servers are all up and running and should be ready to go after the
zm01 canary. Note there will be a followup change to remove
zm02-zm08.openstack.org from the inventory. We split this up so that we
can keep those servers around until we're happy with the replacements.
Change-Id: Ic2671da104df2b01986d1b65c8d13507d6792c40
As described inline, ensure that minimal facts for the backup servers
are loaded before running the backup roles on hosts, so they can read
the ansible_ssh_host_key_ed25519_public fact for each backup server
and ensure it is accepted.
Update the other comments slightly as well.
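The shape of this is roughly as follows (play and group names here are
illustrative only):

  # Fact-gathering-only play for the backup servers, so that
  # ansible_ssh_host_key_ed25519_public is present in hostvars for the
  # plays below even though no tasks run against them here.
  - hosts: borg-backup-server
    gather_facts: true
    tasks: []

  - hosts: borg-backup
    roles:
      - borg-backup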
Change-Id: I1f207ca0770d58f61a89f9ade0bd26cebc982c62
This should be called "_extra" ... currently it overrides the default
exclude list. This means /var/lxcfs gets incorrectly included in the
backup and makes it error out as it has sockets and weird stuff that
can't be backed up; this is why we are getting failure mail.
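Roughly the difference is as follows (variable names and values here
are only illustrative):

  # role defaults: excludes that must always apply
  borg_backup_excludes:
    - /var/lxcfs
    - /proc
    - /sys

  # host/group vars: host-specific additions belong in a separate
  # "_extra" variable that the role appends, rather than redefining
  # the default list and silently dropping /var/lxcfs from it.
  borg_backup_excludes_extra:
    - /opt/some-local-cache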
Change-Id: Idea70c32b2d42f77fee2b35487d88a8ee982c856
I introduced this typo with I500062c1c52c74a567621df9aaa716de804ffae7.
Luckily Ibb63f19817782c25a5929781b0f6342fe4c82cf0 has alerted us to
this problem.
Change-Id: I02bf2f4fa1041642a719100e9591bf5cd1a0bf49
This is a Focal server that will replace zm01.openstack.org. Once this
is deployed and happy we can also move forward and do the remainder of
the mergers.
Change-Id: I139c52e26d17ac8d9b604366a3333556d23c5536
So that we can stop/pull/start, move the pull tasks into their own
files and add a playbook that invokes them.
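Very roughly, the new layout looks like this (file, group and role
names are illustrative, not the exact ones in this change):

  # playbooks/zuul_pull.yaml: pull fresh images without touching the
  # running containers, so a later stop/start picks them up.
  - hosts: zuul-executor
    tasks:
      - name: Pull current container images
        include_role:
          name: zuul-executor
          tasks_from: pull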
Change-Id: I4f351c1d28e5e4606e0a778e545a3a805525ac71
This change adds comments to the python-base and python-builder
dockerfiles to force a rebuild of these images. We do this periodically
to pull in updates.
Change-Id: I109a46603a74a376cc36fdfbd35734f6dc835abe
Add the two new borg hosts to cacti. Also remove the old bup server
which was still lurking there.
Change-Id: I2bf9e401f93b59ecef162db7020f97ba1498e027
This includes a fix for I216528a76307189d8d87bd2fcfeff95c6ceb53cc.
Now that it's released, we can be a bit more explicit about why we added the
workaround.
Change-Id: Ibaf1850549b5e7ec3622418b650bc5e59a289ab6
We have seen some poor performance from gitea which may be related to
manage project updates. Start a dstat service which logs to a csv file
on our system-config-run job hosts in order to collect performance info
from our services in pre merge testing. This will include gitea and
should help us evaluate service upgrades and other changes from a
performance perspective before they hit production.
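For reference, the sort of invocation the service wraps is something
like the following (flags and output path are illustrative):

  # sample cpu/memory/network/disk stats once a second into a CSV
  # file that can be graphed after the job completes
  dstat --time --cpu --mem --net --disk --io \
        --output /var/log/dstat-csv.log 1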
Change-Id: I7bdaab0a0aeb9e1c00fcfcca3d114ae13a76ccc9
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.
For reference, the servers being backed up at this time are:
borg-ask01
borg-ethercalc02
borg-etherpad01
borg-gitea01
borg-lists
borg-review-dev01
borg-review01
borg-storyboard01
borg-translate01
borg-wiki-update-test
borg-zuul01
This removes the old bup backup hosts, the no-longer-used Ansible
roles for the bup backup server and client, and any remaining
bup related configuration.
For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.
Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
This sets a global BORG_UNDER_CRON=1 environment variable for
production hosts and makes the borg-backup script send an email if any
part of the backup job appears to fail (this avoids spamming ourselves
if we're testing backups, etc.).
We should ideally never get this email, but if we do it's something we
want to investigate quickly. There's nothing worse than thinking
backups are working when they aren't.
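The logic amounts to roughly the following (a simplified sketch; the
real script's borg invocation, recipients and paths differ):

  #!/bin/sh
  # Only mail on failure when running under cron, so manual test runs
  # don't generate mail.
  ret=0
  borg create "ssh://backup-server/backups::{hostname}-{now}" /etc /var || ret=$?

  if [ "$ret" -ne 0 ] && [ "${BORG_UNDER_CRON:-0}" = "1" ]; then
      echo "borg backup failed on $(hostname), exit code $ret" \
          | mail -s "backup failed: $(hostname)" root
  fi
  exit $ret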
Change-Id: Ibb63f19817782c25a5929781b0f6342fe4c82cf0