We previously auto-updated nodepool builders, but not launchers, when new
container images were present. This created confusion over which versions
of nodepool OpenDev is running. Use the same behavior for both services
now and automatically restart both.
There is a small chance that we pull in an update that breaks things,
so we run serially to avoid the most egregious instances of this
scenario.
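A minimal sketch of a serial pull-and-restart play. It assumes a
hypothetical "nodepool" host group and a hypothetical
/etc/nodepool-docker directory holding the docker-compose file; neither
name comes from the real playbook.

```yaml
# Hypothetical sketch: group name and compose directory are assumptions.
- hosts: nodepool
  become: true
  # One host per batch. If every host in a batch fails, Ansible aborts
  # the play, so a broken update stops before reaching the other hosts.
  serial: 1
  tasks:
    - name: Pull the latest nodepool container images
      command:
        cmd: docker-compose pull
        chdir: /etc/nodepool-docker

    - name: Recreate containers whose image has changed
      command:
        cmd: docker-compose up -d
        chdir: /etc/nodepool-docker
```

`docker-compose up -d` only recreates containers whose image or
configuration changed, which is what turns the pull into a restart.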
Change-Id: Ifc3ca375553527f9a72e4bb1bdb617523a3f269e
This should only land after we've launched a new nb03.opendev.org
running with the new nodepool arm64 docker image. Once that happens and
we are happy with how it is running we can safely stop managing the
existing nb03.openstack.org server with puppet.
Change-Id: I8d224f9775bd461b43a2631897babd9e351ab6ae
It turns out you can't use "run_once" with the "free" strategy in
Ansible. It actually warns you about this, if you're looking in the
right place.
The existing run-puppet role calls two things with "run_once:", both
delegated to localhost: cloning the ansible-role-puppet repo (so we
can include_role: puppet) and installing the puppet modules (via the
install-ansible-roles role), which are copied from bridge to the
remote side and run by ansible-role-puppet.
With remote_puppet_else.yaml we are running all the puppet hosts at
once with the "free" strategy. This means that these two tasks, both
delegated to localhost (bridge), are actually running for every host.
install-ansible-roles does a git clone, and thus we often see one of
the clones bailing out with a git locking error, because another
host's clone is running simultaneously.
I8585a1af2dcc294c0e61fc45d9febb044e42151d tried to stop this with
"run_once:", but as noted, because it's running under the "free"
strategy this is silently ignored.
To get around this, split out the two copying steps into a new role,
"puppet-setup". To maintain the namespace, the "run-puppet" role is
renamed to "puppet-run". Before each call of the (now) "puppet-run"
role, make sure we run "puppet-setup" just on localhost.
Remove the run_once and delegation from "install-ansible-roles",
because it is now called from the playbook in a localhost context.
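A sketch of the resulting playbook shape. The two-play split and the
role names follow the description above; the "puppet" host group is a
hypothetical placeholder for whatever remote_puppet_else.yaml targets.

```yaml
# Setup runs exactly once, on bridge, because the play itself targets
# only localhost; no run_once or delegate_to is needed.
- hosts: localhost
  roles:
    - puppet-setup

# Under the "free" strategy each host proceeds independently and
# run_once is not honoured, so nothing here relies on it.
- hosts: puppet   # hypothetical group name
  strategy: free
  roles:
    - puppet-run
```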
Change-Id: I3b1cea5a25974f56ea9202e252af7b8420f4adc9
The iptables role is the only part of base that's important to run
when we run a service. Run it in the service playbooks and get rid of
the dependency on infra-prod-base.
Continue running it in base so that new nodes are brought up
with iptables in place.
Bump the timeout for the mirror job, because the iptables addition
seems to have just bumped it over the edge.
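A sketch of a service playbook after this change, assuming the role is
named iptables; "example-service" is a hypothetical placeholder for the
real host group and service role.

```yaml
# Hypothetical service playbook: firewall rules are applied on every
# service run, without waiting for the base playbook.
- hosts: example-service
  become: true
  roles:
    - iptables
    - example-service
```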
Change-Id: I4608216f7a59cfa96d3bdb191edd9bc7bb9cca39
We have two standalone roles, puppet and cloud-launcher, but we
currently install them with Galaxy, so Depends-On patches don't
work. We also install them every time we run anything, even if
we don't need them for the playbook in question.
Add two roles, one to install a set of ansible roles needed by
the host in question, and the other to encapsulate the sequence
of running puppet, which now includes installing the puppet
role, installing puppet, disabling the puppet agent and then
running puppet.
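A sketch of the sequence the puppet-running role encapsulates. The
install-ansible-roles and puppet role names come from this series; the
install-puppet role name and the roles_to_install variable are
hypothetical, and stopping the agent with the service module is one
assumed way to disable it.

```yaml
# Hypothetical tasks/main.yaml for the role that runs puppet.
- name: Install the puppet role (ansible-role-puppet)
  include_role:
    name: install-ansible-roles
  vars:
    roles_to_install:   # hypothetical variable name
      - puppet

- name: Install puppet itself
  include_role:
    name: install-puppet   # hypothetical role name

- name: Disable the puppet agent so it never runs on its own schedule
  become: true
  service:
    name: puppet
    state: stopped
    enabled: false

- name: Run puppet via ansible-role-puppet
  include_role:
    name: puppet
  # role variables omitted in this sketch
```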
As a followup, we'll do the same thing with the puppet modules,
so that we aren't cloning and rsyncing ALL of the puppet modules
all the time, no matter what.
Change-Id: I69a2e99e869ee39a3da573af421b18ad93056d5b
We still need to run puppet on these hosts until they're replaced, and
we're triggering service-nodepool on nodepool changes in
project-config, so run puppet as part of it.
Change-Id: Ib0bdaeee98e19921b8c4117c12f8a0c05e64af57
We rolled out review-dev with podman and it worked fine for us. It
worked less well for nodepool-builder, although we may still be
able to solve that. Maybe right now isn't the time to make this switch.
Gitea, gitea-lb and zuul-registry all use docker instead of podman.
The only thing running with podman right now is review-dev. We can
do a manual cleanup of podman there before running this to keep
things simple:
- stop gerrit service
- uninstall podman and podman-compose
- uninstall podman ppa config
- uninstall pip3
Then let Ansible install docker and run docker-compose up.
Story: #2007407
Task: #39062
Change-Id: I9bf99b18559d49d11ba99a96f02a4a45a4f65a86
This is a start at ansible-deployed nodepool environments.
We rename the minimal-nodepool element to nodepool-base-legacy, and
keep running that for the old nodes.
The groups are updated so that only the .openstack.org hosts will run
puppet. Essentially, those hosts should remain unchanged.
We start a nodepool-base element that will replace the current
puppet-<openstackci|nodepool> deployment parts. For step one, this
grabs project-config and links in the elements and config file.
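A sketch of that first step. It assumes project-config is cloned from
https://opendev.org/openstack/project-config and that the elements and
a per-host config file live under its nodepool/ directory; every
destination path here is hypothetical.

```yaml
# Hypothetical tasks for the nodepool-base first step.
- name: Clone project-config
  git:
    repo: https://opendev.org/openstack/project-config
    dest: /opt/project-config   # hypothetical path

- name: Ensure the nodepool config directory exists
  file:
    path: /etc/nodepool
    state: directory

- name: Link in the diskimage-builder elements
  file:
    src: /opt/project-config/nodepool/elements
    dest: /etc/nodepool/elements
    state: link

- name: Link in this host's nodepool config file
  file:
    # assumed naming scheme: one config file per host
    src: "/opt/project-config/nodepool/{{ inventory_hostname }}.yaml"
    dest: /etc/nodepool/nodepool.yaml
    state: link
```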
A test host is added for gate testing, which should trigger these
roles. This will build into a full deployment test of the builder
container.
Change-Id: If0eb9f02763535bf200062c51a8a0f8793b1e1aa
Depends-On: https://review.opendev.org/#/c/710700/
This is a first step toward making smaller playbooks which can be
run by Zuul in CD.
Zuul should be able to handle missing projects now, so move it out
of the puppet_git playbook and into puppet.
Make the base playbook contain only the base roles.
Make service playbooks for each service.
Remove the run-docker job because it's covered by service jobs.
Stop testing that puppet is installed in testinfra. The test only
passes by accident: the non-puppeted hosts selected are all bionic
nodes, and we don't install puppet on bionic. Instead, we can now
rely on actually *running* puppet when it's important, such as in the
eavesdrop job. Also remove the installation of puppet on the nodes in
the base job, since all it tests is a synthetic install of puppet on
nodes we don't use.
Don't run remote_puppet_git on gitea for now - it's too slow. A
followup patch will rework gitea project creation to not take hours.
Change-Id: Ibb78341c2c6be28005cea73542e829d8f7cfab08