This change adds a proxy config for registry.access.redhat, which should
assist us when gating with images provided by the publicly available
registry.
Change-Id: Ica7477d63659610de852d305a63f3e78d0dd8c4f
Signed-off-by: Kevin Carter <kecarter@redhat.com>
The change at https://review.opendev.org/669752 will cause the
self-testing behavior we wanted from this, but will apply more
narrowly, so that jobs are run when their own configuration
changes. Since this is no longer needed, remove it.
Change-Id: I50a863cab3bd7a3535fd0185d4ec9d1307b1b7d6
This adds a periodic job to copy logs to a mirror volume, and export
it via the usual mirror http.
I have precreated the log volume; just as a R/W volume because this is
expected to be very low volume access.
Change-Id: I67870f6d439af2d2a63a5048ef52cecff3e75275
We need to supply information to ansible about how to provision LE certs
for the new fortnebula mirror. Add this dict to host_vars for
mirror01.regionone.fortnebula.opendev.org.
Change-Id: I02218e26ab6e9fad67e634f22de207740506d9e1
We've had some changes to our cloud landscape over the past little while
and cloud launcher is bailing out early when it hits clouds it can't
talk to. Fix this by removing the clouds/regions that no longer
function/exist so that we can configure the clouds that are listed
later.
Change-Id: I803655325d3a92c6d228499800b29332b5b32741
Note we depend on the DNS updates so that LE cert provisioning works
on the first pass.
Depends-On: https://review.opendev.org/668929
Change-Id: I953938b77bfce67be0cb55af5cf4bd64044100f4
We have not put /usr/local/bin into the cron path, so it does not find
the update scripts. Since the scripts were working with this path on
the old host, restore it as a cron variable to maintain the status
quo.
Change-Id: Id9b7533720d3ccd9251055dec5b452cf5963dc85
Currently we start all jobs at the same time, which was not the
intent. Switch this to seed on the unique name of the job, which
should space jobs out randomly.
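As a rough illustration of the idea (a Python sketch, not the code in
this change): seeding a random generator with the job name gives each
job a start minute that is stable across runs but differs between jobs.
The function name `start_minute` and the job names are hypothetical.

```python
import random


def start_minute(job_name: str, period: int = 60) -> int:
    """Return a stable pseudo-random minute in [0, period) for a job name."""
    # Seeding with the name makes the result repeatable for that job,
    # while different names land on (mostly) different minutes.
    return random.Random(job_name).randrange(period)


# Hypothetical job names, for illustration only.
for name in ("mirror-a", "mirror-b", "mirror-c"):
    print(name, start_minute(name))
```

Two jobs can still collide on the same minute; seeding only spreads
them out, it does not guarantee unique slots.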
Change-Id: Ib41d8ca10aefe4a29bdd02935de8a588ab881958
Prior to https://review.opendev.org/#/c/656871/ this code was executed
by run_all.sh in every pass but seems to have been missed as part of
656871's base.yaml split up.
Add service-bridge.yaml to run_all.sh to get these updates applying to
bridge again. In particular things like clouds.yaml updates are missing
otherwise.
Note I've not merged bridge.yaml and service-bridge.yaml as it appears
we want all of the service stuff to happen after base.yaml but
bridge.yaml needs to happen before. I think this is why they were split
in the first place.
Change-Id: I0a7ce1a65cd19459bbaf244b94a23ddde360da1a
We've noticed that openafs was not getting upgraded to the PPA version
on one of our opendev.org mirrors. Switch install of packages to
"latest" to make sure it upgrades (rebooting to actually apply the
change is an unresolved issue, but at least the package is there).
Also, while looking at this, reorder this to install the PPA first,
then ensure we have the kernel headers, then build the openafs kernel
modules, then install. Add a note about having to install/build the
modules first.
Change-Id: I058f5aa52359276a4013c44acfeb980efe4375a1
Add the new mirror-update server as a follow-on to
I525ac18b55f0e11b0a541b51fa97ee5d6512bf70.
Also ensure that the new mirror server isn't in the puppet groups by
only matching the openstack.org one.
Also remove from the afsadmin group. This group is only used for
keytabs stored on bridge.o.o. I don't think that we need a group for
the keytabs -- a keytab should only ever be in use on one host at a
time, so we are better off keeping the keytabs in a specific host_var
for the host they are used on, rather than being in a group and
possibly deployed on servers where they are not used.
Depends-On: https://review.opendev.org/668610
Change-Id: Icda92bb234adc00f6718c1c656e8f069ce2704c4
Keytabs are slightly longer than what is being tested; up to 100 bytes
or so. This means the encoded data breaks over lines, which means you
need to be more careful about quoting.
Update the testing to a longer keytab (100 bytes of random data) and
fix up the quoting. Also enable no_logging to avoid putting key
material into the logs.
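A small Python sketch of why length matters, assuming the encoding is
base64 (the message above does not name it): 100 bytes of random data
encode to more than one line, so anything consuming the value has to
cope with embedded newlines.

```python
import base64
import os

keytab = os.urandom(100)  # stand-in for real key material
encoded = base64.encodebytes(keytab)  # wraps output at 76 characters

line_count = len(encoded.splitlines())
decoded = base64.b64decode(encoded)  # embedded newlines are ignored
print(line_count, decoded == keytab)
```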
Change-Id: I73c391a2ebd2c962dc9a422f9d44265160210852
This move was prompted by wishing to expose the mirror update logs for
the rsync updates so that debugging problems does not require a root
user (note: not actually done in this change; will be a follow-on).
Rather than start hacking at puppet, the rsync mirror scripts make a
nice delineation point for starting an Ansible-first/Bionic update.
Most magic is included in the scripts, so there is not much more to do
than copy them. The host uses the existing kerberos and openafs roles
and copies the key material into place (to be added before merge).
Note the scripts are removed from the extant puppet so we don't have
two updates happening simultaneously. This will also require a manual
clean to remove the cron jobs as a once-off when merging.
The other part of mirror-update is the reprepro based scripts for the
various debuntu repositories. They are left as future work for now.
Testing is added to ensure dependencies and scripts are all in place.
Change-Id: I525ac18b55f0e11b0a541b51fa97ee5d6512bf70
We are seeing:
fatal: [adns1.opendev.org]: FAILED! => {"msg": "The task includes an
option with an undefined variable. The error was:
'ansible.vars.hostvars.HostVarsVars object' has no attribute
'acme_txt_required'
I believe this is because we have a disabled mirror host now. So the
iad.rx.opendev.org mirror is in the "letsencrypt" group, but because
it is also disabled the prior role (letsencrypt-request-certs) has not
run and it has not populated its "acme_txt_required" variable.
We should skip disabled hosts when inspecting the hosts for this
variable. Add this to the "with_inventory_hostnames" match.
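The failure mode can be sketched in Python (a model of the logic only,
not the Ansible change itself). The host names, the shape of
"acme_txt_required", and the function name `collect_txt_records` are
assumptions made for illustration.

```python
def collect_txt_records(hostvars, letsencrypt_group, disabled):
    """Gather acme_txt_required from every enabled host in the group."""
    records = []
    for host in letsencrypt_group:
        if host in disabled:
            # The request-certs role never ran here, so the variable
            # was never set; reading it would fail.
            continue
        records.extend(hostvars[host]["acme_txt_required"])
    return records


# Hypothetical inventory: the second host is disabled and has no variable.
hostvars = {
    "mirror01.example.org": {"acme_txt_required": ["txt-record-1"]},
    "mirror02.example.org": {},
}
group = ["mirror01.example.org", "mirror02.example.org"]
disabled = {"mirror02.example.org"}

print(collect_txt_records(hostvars, group, disabled))
```

Without the disabled-host check, the lookup on the second host raises
an error much like the one quoted above.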
Change-Id: I33a1c8b6f7e8499248e370f69a9f573a2bf106a5
Donnyd has kindly offered us access to fortnebula's test cloud. This
adds clouds.yaml entries to bridge and nodepool so that we can take
advantage of these resources.
Change-Id: I4ebc261c6f548aca0b3f37dc9b60ffac08029e67
As documented in [1]:
If the number next to "GotSomeSpaces" or any of the "GSS*" fields is
greater than 0, then the fileserver ran out of callback space and had
to prematurely revoke callback promises from clients in order to free
up space.
Here's our stats on afs01:
$ xstat_fs_test localhost -collID 3 -onceonly
Starting up the xstat_fs service, no debugging, one-shot operation
------------------------------------------------------------
13547865 DeleteFiles
1849223729 DeleteCallBacks
45049055 BreakCallBacks
2098382037 AddCallBack
174 GotSomeSpaces
7800 DeleteAllCallBacks
20778 nFEs
21184 nCBs
1500000 nblks
43425561 CBsTimedOut
0 nbreakers
8 GSS1
4 GSS2
5 GSS3
169 GSS4
4 GSS5
So as noted, the server ran out of callback spaces a few times.
Raising it takes only a little memory, but will help performance.
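For anyone repeating this check, a Python sketch that picks the
relevant counters out of the output. It assumes each counter appears as
"<number> <name>" on its own line, as in the listing above; the function
name `callback_pressure` is hypothetical.

```python
SAMPLE = """\
174 GotSomeSpaces
7800 DeleteAllCallBacks
0 nbreakers
8 GSS1
4 GSS2
5 GSS3
169 GSS4
4 GSS5
"""


def callback_pressure(text):
    """Return the GotSomeSpaces/GSS* counters that are greater than zero."""
    hits = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip banners, separators, and anything unexpected
        value, name = int(parts[0]), parts[1]
        if (name == "GotSomeSpaces" or name.startswith("GSS")) and value > 0:
            hits[name] = value
    return hits


print(callback_pressure(SAMPLE))
```

A non-empty result means the fileserver ran out of callback space at
least once.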
Thanks to Jeffrey Altman (auristor) for pointing this out.
[1] https://www.openafs.org/pages/newsletter/newsletter-2013-03-volume004-issue05.html
Change-Id: I2ad33dd8918cb559634d2c5b8c4e4e7f2d6d4051
We have gitea state now so deploying a new server requires a bit of
process. Document that process.
Change-Id: I946f9880b66efdfb39bc9894950cd02058ed987a
A db recovery to rebuild a host using the existing db backups
resulted in a corrupt mysql.proc table. The issue seemed to be
attempting to restore the mysql database. Instead of dumping all
databases, let's just back up the one we care about: gitea.
Change-Id: Ia2c87b62736fda1c8a9ce77126e383ec74990b4a
This mirror will be manually configured with kafs (see
https://review.opendev.org/623974). This should be a nice distant
geographic counterpoint to the IAD RAX server.
This will need to be manually configured with a custom kernel for now,
but fixes are making their way upstream and this host will be
converted when available.
Depends-On: https://review.opendev.org/667529
Change-Id: I6a22933029c096c781c93c33e6edf03bf59223c9
This server was replaced and has had its db restored from backup on
gitea01, repo dirs recreated via gitea admin ui function, and gerrit has
replicated all repo content to this server.
Put this back into the rotation in haproxy as well as the ansible
management of gitea git repos.
Change-Id: I424d0db0adf0787d5d46e264b6552d79b48f27ef
This reverts commit b3ce1c52dc7ca455ffd94ea07d8a4fb1b6905fa8.
It removed the AFS mirror at the same time it added the proxy,
but jobs don't know to look for the proxy since it's on a
totally different TCP port.
Change-Id: I87cc03eb3322bd7b093dd6fe798aadb48f319805
As noted inline, this needs to be skipped on OVH (and I always forget,
and debug this over and over when launching a mirror node there :).
Change-Id: I07780e29f5fef75cdbab3b504f278387ddc4b13f
We add the new host so that it will get configured as a gitea backend
server. We exclude this server from the list of gitea hosts to configure
git repos on because we want to recover its DB from one of the other
sibling nodes first. This should preserve the http redirects for us.
Once we have the db recovered we can enable replication from gerrit then
re-add this host to the haproxy load balancer.
Change-Id: Ia2a98e5ded43cad044db36ca8d0da5a96277afee
Note we don't fully remove it from cacti and hiera and so on because we
are replacing this server and we just want ansible to ignore the old
gitea06 for a bit while we bootstrap the new server.
Change-Id: Iaa89e77c055d8099a7d3d511723782fead43ce74