In thinking harder about the bootstrap process, it struck me that our
"bastion" group is really two separate ideas that become confusing
because they share a name.
The testing and production paths both need to find a single bridge
node so they can run their nested Ansible. We recently merged changes
to the setup playbooks to stop hard-coding the bridge node; they now
use groups["bastion"][0] to find the bastion host. However, this group
is entirely separate from the group of the same name defined in
inventory/service/groups.yaml.
The testing and production paths run on the executor and, as
mentioned, need to know which bridge node to log into. For the
testing path this happens via the group created in the job definition
in zuul.d/system-config-run.yaml. For the production jobs, the group
is populated by the add-bastion-host role, which dynamically adds the
bridge host and group.
Only the *nested* Ansible running on the bastion host reads
s-c:inventory/service/groups.yaml. None of the nested-ansible
playbooks need to target only the currently active bastion host. For
example, we can define as many bridge nodes as we like in the
inventory and run service-bridge.yaml against them. It won't matter
because the production jobs know the host that is the currently active
bridge as described above.
So, instead of using the same group name in two contexts, this
renames the testing/production group to "prod_bastion".
groups["prod_bastion"][0] will be the host that the
testing/production jobs use as the bastion host. References are
updated in this change, i.e. the two places this group is defined:
the group name in the system-config-run jobs, and add-bastion-host
for production.
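Roughly, the two definitions look something like this (a sketch only;
the node label, variable names and task wording are illustrative, not
the exact configuration):

    # zuul.d/system-config-run.yaml -- testing path (sketch)
    nodeset:
      nodes:
        - name: bridge99.opendev.org
          label: ubuntu-focal            # label is illustrative
      groups:
        - name: prod_bastion
          nodes:
            - bridge99.opendev.org

    # add-bastion-host -- production path (sketch)
    - name: Add the currently active bridge to the prod_bastion group
      add_host:
        name: "{{ bastion_host }}"       # hostname variable is illustrative
        groups: prod_bastion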
We can then return the "bastion" group match to bridge*.opendev.org
in inventory/service/groups.yaml.
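As a sketch, the relevant entry then looks something like:

    # inventory/service/groups.yaml (sketch of the relevant entry only)
    groups:
      bastion:
        - bridge*.opendev.org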
This fixes a bootstrapping problem: if you launch, say,
bridge03.opendev.org, the launch-node script will now apply the
base.yaml playbook against it and correctly apply all variables from
the "bastion" group, which now matches the new host. This is what we
want, ensuring e.g. that the zuul user and keys are correctly
populated.
The other thing we can do here is change the testing-path
"prod_bastion" hostname to "bridge99.opendev.org". This ensures we
are not hard-coding the production bridge host anywhere (if both
testing and production use bridge01.opendev.org, problems can stay
hidden). This is a big advantage when we want to rotate the
production bridge host, as we can be certain there are no hidden
dependencies.
Change-Id: I137ab824b9a09ccb067b8d5f0bb2896192291883
This replaces hard-coding of the host "bridge.openstack.org" with
hard-coding of the first (and only) host in the group "bastion".
The idea here is that we can, as much as possible, simply switch one
place to an alternative hostname for the bastion such as
"bridge.opendev.org" when we upgrade. This is just the testing path,
for now; a follow-on will modify the production path (which doesn't
really get speculatively tested).
This group needs to be defined in two places:
1) In the Zuul run jobs, for use by the playbooks/zuul/run-*.yaml
   playbooks, which set up and collect logs from the testing bastion
   host.
2) In the inventory at inventory/service/groups.yaml, which the
   nested Ansible run then uses.
Various other places are updated to use this abstracted group as the
bastion host.
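In play terms the abstraction is roughly the following (a sketch with
the tasks omitted; task-level references use groups['bastion'][0] in
the same way):

    # before: hard-coded hostname
    - hosts: bridge.openstack.org

    # after: the first (and only) host in the "bastion" group
    - hosts: bastion[0]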
Variables are moved into the bastion group (which has only one host,
the actual bastion host), which means we only have to update the
group mapping to point at the new host.
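As a sketch of the idea (the file location and the variable shown are
illustrative, not the exact layout):

    # group_vars/bastion.yaml (sketch)
    # variables formerly tied to the bridge.openstack.org hostname now
    # live with the group, so swapping bastions only changes the group
    # membership
    example_bastion_setting: true        # illustrative variable only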
This is intended to be a no-op change; all the jobs should work the
same, but just using the new abstractions.
Change-Id: Iffb462371939989b03e5d6ac6c5df63aa7708513
The pip3 role installs the latest upstream pip, overwriting the
packaged versions. We would prefer to install things in
venv/virtualenvs moving forward to keep better isolation.
Unfortunately, with the passage of time, the Bionic-era packaged pip
is so old that it can't install anything modern like Ansible. Thus we
have to squash installing Ansible into a separate venv into this
change as well.
Although the venv created by default on the Bionic host also has an
old pip, luckily we already worked around that in
I81fd268a9354685496a75e33a6f038a32b686352 which provides a create-venv
role that creates a fully updated venv for us.
To minimise other changes, this symlinks ansible/ansible-playbook into
/usr/local/bin. On our current production bastion host this will make
a bit of a mess -- but we are looking at replacing that with a fresh
system soon. The idea is that this new system will not be
bootstrapped with a globally installed Ansible, so we won't have
things lying around in multiple places.
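A rough sketch of what this ends up doing (the venv path and the role
variable name are assumptions, not the exact implementation):

    - name: Create a fully updated venv for Ansible
      include_role:
        name: create-venv
      vars:
        create_venv_path: /usr/ansible-venv    # path is illustrative

    - name: Install Ansible into the venv
      pip:
        name: ansible
        virtualenv: /usr/ansible-venv

    - name: Symlink ansible-playbook into /usr/local/bin
      file:
        src: /usr/ansible-venv/bin/ansible-playbook
        dest: /usr/local/bin/ansible-playbook
        state: link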
Change-Id: I7551eb92bb6dc5918c367cc347f046ff562eab0c
The ara-report role used to add this but it hasn't been updated for
the latest ARA (I008b35562994f1205a4f66e53f93b9885a6b8754). Add it
back here.
Change-Id: I2d56e7cde32cd7adabb359a35ecdaa9f0880f7d5
ARA's master branch now has static site generation, so we can move
away from the stable branch and get the new reports.
In the meantime ARA upstream has moved to GitHub, so this updates the
references for the -devel job.
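For reference, the static generation is invoked roughly like this
(the output path is illustrative; see the ARA documentation for the
exact command):

    - name: Generate a static ARA report
      command: ara-manage generate /var/log/ara-report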
Depends-On: https://review.opendev.org/c/openstack/project-config/+/793530
Change-Id: I008b35562994f1205a4f66e53f93b9885a6b8754
Collect the tox logs from the testinfra run on bridge.openstack.org.
The dependent change helps if we have errors installing things into
the tox environment, and this change lets us see the results.
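Something along these lines (the source path here is illustrative):

    - name: Pull testinfra tox logs from bridge.openstack.org
      synchronize:
        mode: pull
        src: /home/zuul/src/opendev.org/opendev/system-config/.tox/testinfra/log/
        dest: "{{ zuul.executor.log_root }}/tox/"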
Depends-On: https://review.opendev.org/747325
Change-Id: Id3c39d4287d7dc9705890c73a230b1935d349b9f
We can't run ARA on the executor because that involves running
arbitrary commands; instead, generate the reports on the remote
(bridge) node and put them where the normal fetch-output will find
them later.
Change-Id: I20d88a7f03872d19f6bd014bc687a1bf16e4e80e
When testing our system-config configuration we don't actually add
zuul to the docker group. This means the zuul user cannot access the
docker socket, which breaks docker container log collection. Address
this by becoming root when collecting logs.
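A sketch of the collection step (the container list and output path
are illustrative):

    - name: Collect docker container logs as root
      become: true
      shell: docker logs "{{ item }}" > "/var/log/docker/{{ item }}.log" 2>&1
      loop: "{{ container_names }}"    # list of containers is illustrative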
Change-Id: Ic0232f7ef458cdd07fb0853f97f2dc22ce137c71
This change switches the post bits to use a new centralized
role to collect all container logs.
Depends-On: https://review.opendev.org/701867
Change-Id: I9e982b37518c22e6d5358f7604ebc7f56b0626e3
This runs gerrit in a container on review-dev01 using podman.
Remove an unused web_server.py file that was carried over when
copying things from puppet to ansible.
Change-Id: I399d3cf8471bc8063022b0db0ff81718b2ee2941
In a follow-on change (I9bf74df351e056791ed817180436617048224d2c) I
want to use #noqa to ignore an ansible-lint rule on a task; however,
empirical testing shows that this doesn't work with 3.5.1. With 4.1.0
whatever was wrong appears to be fixed, so this change upgrades to
4.1.0.
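For reference, the kind of annotation the follow-on wants to use (the
rule number and command are illustrative):

    - name: Task that trips a lint false positive
      command: /usr/local/bin/some-tool --some-flag    # noqa 303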
I've been through the errors; I think the inline comments justify
what has been turned off. The two legitimate variable-spacing issues
are rolled into this change; all other hits were false positives, as
described.
Change-Id: I7752648aa2d1728749390cf4f38459c1032c0877
Change I754637115f8c7469efbc1856e88bbcb6fb83b4ce moved a bunch of log
collection to use "stage-output". This uses "fetch-output", which
automatically puts these logs in hostname subdirectories, but it does
not have an option to put them under hosts/<hostname> as we were
doing with the other logs.
Although we could add such support, it probably doesn't make sense as
most other multinode jobs will have the same layout with the host logs
at the top level. Remove the intermediate "/hosts/" directory on
system-config jobs so all logs remain at the top level, and we don't
have this confusing split as to where logs are for each host.
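For context, the staging side looks roughly like this (the staged
path is illustrative):

    - hosts: all
      vars:
        zuul_copy_output:
          /var/log/syslog: logs    # example of a staged log
      roles:
        - stage-output             # fetch-output later pulls this to the
                                   # executor under a hostname subdirectory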
Change-Id: I56bd67c659ffb26a460d9406f6f090d431c8aa79
This collects syslogs from nodes running in our ansible gate tests.
The nodes' logs are grouped under a "hosts" directory (the bridge.o.o
logs are moved there for consistency too).
Change-Id: I3869946888f09e189c61be4afb280673aa3a3f2e
This change takes the ARA report from the "inner" run of the base
playbooks on our bridge.o.o node and publishes it into the final log
output. This is then displayed by the middleware.
Create a new log hierarchy with a "bridge.o.o" directory to make it
clear that the logs there relate to the test running on that node.
Move the Ansible config under there too.
Change-Id: I74122db09f0f712836a0ee820c6fac87c3c9c734
And collect it in post; it is helpful to see the results.
Change-Id: I0dbecf57bf9182168eb6f99cdf88329fcdeb1bdc
Signed-off-by: Paul Belanger <pabelanger@redhat.com>