I'm not sure if something changed in dkms, but this log file is
helpful on centos 9-stream and the other check doesn't match anything.
Also update the README.rst slightly to be more in line with reality.
Change-Id: Ic8cab980ef43490eb1b3ca0b7a0d0c2329bb94ce
Starting the openafs-client service is an intensive operation as it
walks the cache registering various things. We've seen on our
production ARM64 mirror this can take longer than the 1:30 default
timeout. This is a fatal issue, as the module will try to unload
while afsd is still spinning and working resulting in completely
corrupt kernel state.
This is about double the longest time we've seen, so should give
plenty of overhead.
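The bump can be expressed as a systemd drop-in along these lines (the path and the exact value here are illustrative, not necessarily the role's actual numbers):

```ini
# /etc/systemd/system/openafs-client.service.d/timeout.conf (illustrative)
[Service]
# Default is 1min 30s; raise it well above the slowest start we've observed.
TimeoutStartSec=10min
```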
Change-Id: I37186494b9afd72eab3a092279579f1a5fa5d22c
After the latest Bazel upgrade, --spawn_strategy=standalone no longer
shows the output of the subprocesses it creates, making
troubleshooting failures impossible.
Since release 0.27, Bazel auto-detects the execution strategy if no
strategy flag is provided. If none of the strategy flags is used,
Bazel generates a default list of strategies (in this order):
remote,worker,sandboxed,local
and, for every action it wants to execute, will pick up the first
strategy that can execute it.
See this blog entry for more details: [1].
[1] https://blog.bazel.build/2019/06/19/list-strategy.html
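If explicit control over a particular action type is still wanted, per-mnemonic strategy flags remain available; for example (the Javac mnemonic here is just an illustration):

```
bazel build --strategy=Javac=worker --spawn_strategy=local //...
```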
Change-Id: I4be8375cee88f3565bae5c53cd1a3484ce398aba
Now that https://bugs.debian.org/980115 has been fixed in
1.8.2-1+deb10u1 for Buster and appears in the 10.9 stable point
release (2021-03-27), we no longer need our special backport PPA of
the patched packages and are able to safely drop it from the role.
Change-Id: Id062fef9461e8f6ac66585ccf25f85a588782177
Since we have SRV DNS entries for our afsdb services, we don't need to
explicitly list their IP addresses here. From the man page:
For the client CellServDB, it may be desirable to make the client
aware of a cell (so that it's listed by default in /afs when the
-dynroot flag to afsd is in use, for instance) without specifying
the database server machines for that cell. This can be done by
including only the cell line (starting with ">") and omitting any
following database server machine lines. afsd must be configured
with the -afsdb option to use DNS SRV or AFSDB record lookups to
locate database server machines. If the cell has such records and
the client is configured to use them, this configuration won't
require updates to the client CellServDB file when the IP addresses
of the database server machines change.
Thus we just keep the openstack.org entry. We have not been
keeping the list in here up-to-date with the grand.central.org version
(well, not since 2014 anyway). Since we don't really need to track
any of these, just remove them.
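With only the cell line retained, the client CellServDB reduces to something like (comment text illustrative):

```
>openstack.org          # OpenStack AFS cell
```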
Change-Id: Id358e373c4c804ebe32b7447e5880015119926a5
We are in the process of upgrading the AFS servers to focal. As
explained by auristor (extracted from IRC below) we need 3 servers to
actually perform HA with the ubik protocol:
the ubik quorum is defined by the list of voting primary ip addresses
as specified in the ubik service's CellServDB file. The server with
the lowest ip address gets 1.5 votes and the others 1 vote. To win
election requires greater than 50% of the votes. In a two server
configuration there are a total of 2.5 votes to cast. 1.5 > 2.5/2 so
afsdb02.openstack.org always wins regardless of what
afsdb01.openstack.org says. And afsdb01.openstack.org can never win
because 1 < 2.5/2. by adding a third ubik server to the quorum, the
total votes cast are 3.5 and it always requires the vote of two
servers to elect a winner ... if afsdb03 is added with the highest
ip address, then either afsdb01 or afsdb02 can be elected
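The arithmetic above can be sketched as follows (this is just the vote math from the IRC explanation, not OpenAFS code):

```python
def ubik_votes(num_servers):
    """Total votes in a ubik quorum: one per server, plus an extra
    0.5 for the server with the lowest IP address."""
    return num_servers + 0.5

def wins(votes, total):
    """Winning an election requires strictly more than half the votes."""
    return votes > total / 2

# Two servers: the lowest-IP server (1.5 votes) always wins alone,
# and the other (1 vote) can never win.
total = ubik_votes(2)
print(wins(1.5, total), wins(1.0, total))  # True False

# Three servers: no single server can win alone; any two together can.
total = ubik_votes(3)
print(wins(1.5, total), wins(1.5 + 1.0, total))  # False True
```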
Add a third server which is a focal host and related configuration.
Change-Id: I59e562dd56d6cbabd2560e4205b3bd36045d48c2
Over time we've had various different reasons for installing our own
OpenAFS packages in various situations and we haven't kept the when:
flags totally up to date. Currently, we need the 1.8.6-5 packages
with the January 2021 timestamp fix installed everywhere; fix this.
I think that long-running servers have the PPA installed from
prior iterations (I don't think we've ever *removed* it). So things
like the executors are still running with our packages, perhaps
somewhat unintentionally.
Update the comments a little to reflect what's going on.
Change-Id: I6a58c23daf85cf8fa005e3dad84a665343a947bc
Stable Debian hasn't updated its openafs packages yet to fix the bit
masking problem. This breaks our testing for Zuul jobs. Try the
bionic package from our PPA on Debian instead.
Change-Id: I2ab469c984ae7d90d2a87abb2e4b29250c9bc8c2
bazel likes to build everything in ~/.cache and then create bazel-*
"convenience symlinks" in the workspace/build directory. This causes a
problem for building docker images where we run in the context of the
build directory; docker will not follow the symlinks out of build
directory.
Currently the bazelisk-build copies parts of the build to the
top-level; this means the bazelisk-build role is gerrit specific,
rather than generic as the name implies.
We modify the gerrit build step to break the build output symlink and
move it into the top level of the build tree, which is the context the
docker build runs in later. Since this is now just a normal
directory, we can copy from it at will.
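The symlink-breaking step is essentially the following (the layout is simulated here; the real role operates on the gerrit workspace):

```shell
set -e
# Simulate bazel's layout: real output lives in a cache dir, with a
# convenience symlink left behind in the build context.
tmp=$(mktemp -d)
mkdir -p "$tmp/cache/out" && touch "$tmp/cache/out/release.war"
cd "$tmp"
ln -s "$tmp/cache/out" bazel-bin

# Replace the symlink with a real copy of the output tree, so a later
# "docker build ." can COPY from inside the build context.
real=$(readlink -f bazel-bin)
rm bazel-bin
cp -pr "$real" bazel-bin
```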
This is useful in follow-on builds where we want to start copying more
than just the release.war file from the build tree, e.g. polygerrit
plugin output.
While we're here, remove the javamelody things that were only for 2.X
series gerrit, which we don't build any more.
[1] https://docs.bazel.build/versions/master/output_directories.html
Change-Id: I00abe437925d805bd88824d653eec38fa95e4fcd
Specify bazelisk_targets as a list, and join the targets as
space-separated in the build command. This is used in the follow-on
Ie0a165cc6ffc765c03457691901a1dd41ce99d5a.
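The shape of the change is roughly this (the target names are hypothetical):

```yaml
# Callers set the targets as a list ...
bazelisk_targets:
  - "release"
  - "plugins:all"
# ... and the role joins them in the build command, e.g.
#   bazelisk build {{ bazelisk_targets | join(' ') }}
```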
While we are here, remove the build-gerrit.sh script that isn't used
any more, along with the step that installs it.
Also, refactor the tasks to use include_role (this is also used in the
follow on).
Change-Id: I4f3908e75cbbb7673135a2717f9e51f099a4860e
We name the base image we build gerrit-base and we expose port 8081
not 8080 as opendev's gerrit listens on 8081.
Also explicitly build the javamelody plugin deps jar and copy it into
the review_site/lib dir on Gerrit 2 bazel builds. This is necessary
according to javamelody plugin build docs. In order to split Gerrit 2.x
and 3.x behavior in the Bazel builds we convert our Dockerfile into a
multi stage build.
All this came out of pulling on a thread: the script in the Dockerfile
dir called build-gerrit.sh isn't actually used to build gerrit :/
Clarify that. The script may be useful for local builds so we haven't
removed it yet.
Finally update gerrit plugin checkouts to tags or master as appropriate
where stable branches don't exist for the specified version.
Change-Id: I155a20685b3462e965c4216d134b3b36978fbcc7
Xenial ARM64 doesn't have openafs-client built; we have 1.8.5 built in
our PPA. Leave our production Xenial x86_64 systems with the inbuilt
1.6 client until we've thought about AFS server upgrades.
Change-Id: I7dad812a714133ffe54d4ecc1978f09abb39eb72
This tests the openafs client installation on all the arm64 types that
build wheels, where we currently need the client to copy the binary
wheel output.
Depends-On: https://review.opendev.org/733755
Change-Id: I278db0b6c8fad04ebf2f971bc7b0c007ee92ac31
We have pivoted to Ansible and we don't use puppet5 anywhere. Stop
testing on Bionic as we're not really interested in maintaining it,
and remove the puppet-install installation path there so we don't
have code that isn't being tested.
Change-Id: Ia2d05f7c75e46bc01717d11457b832e42522fa95
The lookup() happens on the local host, not the remote host; ergo we
were never using the Debian.aarch64.yaml file in production anyway
(where bridge is x86, so only the x86 file is included).
So it is clearly not necessary, as we have production ARM64 mirrors
using the base file. This is OK because we build the packages in the
PPA for both x86 and arm64.
We can drop openafs_client_apt_repo which isn't used any more.
Follow-on will improve the testing of this.
Change-Id: I298cdfefc813006f7f4218dd37015992556c8498
If we move these into a subdir, it cleans up the number of things
we have to match on with files matchers.
Stop running disable-puppet-agent in base. We run it in run-puppet
which should be fine.
Change-Id: Ia16adb96b11d25a097490882c4c59a50a0b7b23d
Extract eavesdrop into its own service playbook and
puppet manifest. While doing that, stop using jenkinsuser
on eavesdrop in favor of zuul-user.
Add the ability to override the keys for the zuul user.
Remove openstack_project::server, it doesn't do anything.
Containerize and ansiblize accessbot. The structure of
how we're doing it in puppet makes it hard to actually
run the puppet in the gate. Run the script in its own
playbook so that we can avoid running it in the gate.
Change-Id: I53cb63ffa4ae50575d4fa37b24323ad13ec1bac3
As part of OpenDev rename, a lot of links were changed.
A couple of URLs point to old locations, update them.
This list was compiled while grepping for "openstack-infra" and
fixing the locations that were wrong.
Change-Id: I313d76284bb549f1b2c636ce17fa662c233c0af9
We need to use bazelisk to build gerrit so that we can properly
track bazel versions in the job. Use the roles developed for
gerrit-review to do that, then simplify the dockerfile to have
it simply copy the war into the target image.
Also add polymer-bridges.
Depends-On: https://review.opendev.org/709256
Change-Id: I7c13df51d3b8c117bcc9aab9caad59687471d622
We are seeing some failures that seem to add up to the yum module not
detecting a failure installing the kernel modules for openafs. See if
this works better with "dnf", which is the native package installer on
CentOS 8.
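A sketch of the switch (dnf is a real Ansible module; the package list variable name here is hypothetical):

```yaml
# Before: the yum module sometimes missed failures building the
# openafs kernel modules. After: use dnf, the native CentOS 8
# package manager module.
- name: Install OpenAFS packages
  dnf:
    name: "{{ openafs_client_packages }}"
    state: present
```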
Change-Id: I82588ed5a02e5dff601b41b27b28a663611bfe89
Our control plane servers generally have large ephemeral storage
attached at /opt; for many uses this is enough space that we don't
need to add extra cinder volumes for a reasonable cache (as we usually
do on mirror nodes; but there we create large caches for both openafs
and httpd reverse proxy whose needs exceed even what we get from
ephemeral storage).
Add an option to set the cache location, and use /opt for our new
static01.opendev.org server.
Change-Id: I16eed1734a0a7e855e27105931a131ce4dbd0793
All our AFS release roles use "kinit" for authentication. The only
scripts using k5start are the mirror scripts, but since that doesn't
run on CentOS we don't need it there.
This avoids us having to use EPEL or, on 8, an unsupported build.
Anything needing to be portable should use kinit from now on.
Change-Id: I6323cb835cedf9974cf8d96faa7eb55b8aaafd9a
Linux default udp buffer sizes are somewhat small if sending much udp
traffic. Openafs uses udp for all of its traffic so we increase the
buffer size to 25MB.
Change-Id: Ie6cb7467c186d5471c71ca876ea9e29a90423bed
For whatever reason, the modules package recommends the client
package:
Package: openafs-modules-dkms
Depends: dkms (>= 2.1.0.0), perl:any, libc6-dev
Recommends: openafs-client (>= 1.8.0~pre5-1ubuntu1)
However, if that gets installed before the modules are ready, the
service tries to start and fails, but maybe fools systemd into
thinking it started correctly; so our sanity checks seem to fail on
new servers without a manual restart of the openafs client services.
By ignoring this recommends, we should install the modules, then the
client (which should start OK) in that order only.
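In Ansible apt-module terms the ordering looks something like this (install_recommends is a real option of the apt module; the task names are illustrative):

```yaml
- name: Install OpenAFS kernel modules without dragging in the client
  apt:
    name: openafs-modules-dkms
    install_recommends: no

- name: Install the client once the modules are built
  apt:
    name: openafs-client
```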
Change-Id: I6d69ac0bd2ade95fede33c5f82e7df218da9458b
We've noticed that openafs was not getting upgraded to the PPA version
on one of our opendev.org mirrors. Switch install of packages to
"latest" to make sure it upgrades (reboots to actually apply change
unresolved issue, but at least package is there).
Also, while looking at this, reorder this to install the PPA first,
then ensure we have the kernel headers, then build the openafs kernel
modules, then install. Add a note about having to install/build the
modules first.
Change-Id: I058f5aa52359276a4013c44acfeb980efe4375a1
This requires an external program and only works on Debian hosts.
Newer versions of exim (4.91) have SPF functionality built-in, but
they are not yet available to us.
Change-Id: Idfe6bfa5a404b61c8761aa1bfa2212e4b4e32be9
In a follow-on change (I9bf74df351e056791ed817180436617048224d2c) I
want to use #noqa to ignore an ansible-lint rule on a task; however,
empirical testing shows that this doesn't work with 3.5.1 but does
with 4.1.0, where whatever was wrong seems to have been fixed.
This, therefore, requires upgrading to 4.1.0.
I've been through the errors ... the comments inline I think justify
what has been turned off. The two legitimate variable space issues I
have rolled into this change; all other hits were false positives as
described.
Change-Id: I7752648aa2d1728749390cf4f38459c1032c0877
Currently ansible fails on most puppet4 hosts with
TASK [puppet-install : Install puppetlabs repo] ********************************
fatal: [...]: FAILED! => {"changed": false, "msg": "A later version is already installed"}
As described inline, the version at the "top level" we are installing
via ansible here is actually lower than the version in the repo this
package installs (inception). Thus once an upgrade has been run on
the host, we are now trying to *downgrade* the puppetlabs-release
package. This stops the ansible run and makes everything unhappy.
If we have the puppet repo, just skip trying to install it again.
We do this for just trusty and xenial; at this point we don't have any
puppet5 hosts (and none are planned) and I haven't checked if it has
the same issues.
Change-Id: I55ea8bfbfc40befb1d138e9bc0f95b120f8f5dbd
The ansible-role-puppet role manages puppet.conf for us. These two roles
are currently fighting each other over the presence of the server line
in puppet.conf. Avoid this by removing the removal of this line and the
templatedir line from the new puppet-install role since
ansible-role-puppet was there first. Basically just trust
ansible-role-puppet to write a working puppet.conf for us.
Change-Id: Ifb1dff31a61071bd867d3a7cc3cbcc496177e3ce
Talking to clarkb, it was decided we can remove this logic in favor of
having ansible-role-puppet push system-config and modules to the remote
nodes.
Change-Id: I59b8a713cdf2b4c1fede44e977c49be5e8cc08fa
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We can directly pass a list of packages to the package task in
ansible; this will save us some time on runs.
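i.e. one task with a list, rather than a loop of single installs (package names illustrative):

```yaml
- name: Install packages
  package:
    name:
      - git
      - rsync
    state: present
```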
Change-Id: I9b26f4f4f9731dc7d32186584620f1cec04b7a81
Signed-off-by: Paul Belanger <pabelanger@redhat.com>