The backup roles have been debugged and are ready to run.
A note is added about having the backup server in a default disabled
state. This was discussed at an infra meeting where consensus was to
keep it disabled [1].
[1] http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-06-11-19.01.log.html#l-184
Change-Id: I2a3d2d08a9d1514bf6bdcf15bc5bc95689f3020f
It looks like I forgot to add this in
I525ac18b55f0e11b0a541b51fa97ee5d6512bf70, so the mirror-update-specific
roles aren't running automatically.
Change-Id: Iee60906c367c9dec1143ee5ce2735ed72160e13d
Prior to https://review.opendev.org/#/c/656871/ this code was executed
by run_all.sh on every pass, but it seems to have been missed as part of
656871's base.yaml split-up.
Add service-bridge.yaml to run_all.sh so that these updates apply to
bridge again. In particular, things like clouds.yaml updates are missed
otherwise.
Note I've not merged bridge.yaml and service-bridge.yaml as it appears
we want all of the service stuff to happen after base.yaml but
bridge.yaml needs to happen before. I think this is why they were split
in the first place.
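Roughly, the intended ordering in run_all.sh then looks like this
sketch (flags and playbook paths are illustrative, not copied from the
script):
  ansible-playbook -f 5 playbooks/bridge.yaml          # before base
  ansible-playbook -f 5 playbooks/base.yaml
  ansible-playbook -f 5 playbooks/service-bridge.yaml  # after base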
Change-Id: I0a7ce1a65cd19459bbaf244b94a23ddde360da1a
Fix the reported stat name for the mirror playbook.
Run the mirror job in gate.
Set follow=false so that we're telling Ansible to set the perms on the
link rather than on the target, which is the default behaviour.
Change-Id: Id594cf3f7ab1dacae423cd2b7e158a701d086af6
We ignore E006, which is line length longer than 79 characters. We don't
actually care about that. Fix E042 in run_all.sh; this represents a
potential real issue in bash as it will hide errors.
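For reference, E042 flags the pattern sketched below; this is an
illustrative fragment rather than the actual line from run_all.sh, and
some_command is a stand-in:
  demo() {
      # E042: 'local' plus command substitution masks the command's exit
      # status -- $? reflects 'local' (which succeeds), not some_command
      local result=$(some_command)
      # safer: declare and assign separately so a failure is visible
      local result2
      result2=$(some_command) || return 1
  }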
This makes the bashate output much cleaner which should make it easier
for people to understand why it fails when it fails in check.
Change-Id: I2249b76e33003b57a1d2ab5fcdb17eda4e5cd7ad
This implements mirrors that live in the opendev.org namespace. The
implementation is Ansible native for deployment on a Bionic node.
The hostname prefix remains the same (mirrorXX.region.provider.) but
the groups.yaml splits the opendev.org mirrors into a separate group.
The matches in the puppet group are also updated so that puppet is not
run on these hosts.
The kerberos and openafs client parts do not need any updating and
work on the Bionic host.
The hosts are setup to provision certificates for themselves from
letsencrypt. Note we've added a new handler for mirror nodes to use
that restarts apache on certificate issue/renewal.
The new "mirror" role is a port of the existing puppet mirror.pp. It
installs apache, sets up some modules, makes some symlinks, sets up a
cleanup cron job and installs the apache vhost configuration.
The vhost configuration is also ported from the extant puppet. It is
simplified somewhat, but the biggest change is that we have extracted
the main port 80 configuration into a macro which is applied to both
port 80 and 443; i.e. the host will have SSL support. The other ports
are left alone for now, but can be updated in due course.
Thus we should be able to CNAME the existing mirrors to new nodes, and
any existing http access can continue. We can update our mirror setup
scripts to point to https resources as appropriate.
Change-Id: Iec576d631dd5b02f6b9fb445ee600be060f9cf1e
This is a first step toward making smaller playbooks which can be
run by Zuul in CD.
Zuul should be able to handle missing projects now, so move it out of
the puppet_git playbook and into puppet.
Make the base playbook be merely the base roles.
Make service playbooks for each service.
Remove the run-docker job because it's covered by service jobs.
Stop testing that puppet is installed in testinfra. It only works
accidentally, because the non-puppeted hosts happen to all be bionic
nodes and we don't install puppet on bionic. Instead, we can now
rely on actually *running* puppet when it's important, such as in the
eavesdrop job. Also remove the installation of puppet on the nodes in
the base job, since it only amounts to a synthetic test that installing
puppet works on nodes we don't use.
Don't run remote_puppet_git on gitea for now - it's too slow. A
followup patch will rework gitea project creation to not take hours.
Change-Id: Ibb78341c2c6be28005cea73542e829d8f7cfab08
The server has been removed, remove it from inventory.
While we're here, s/graphite.openstack.org/graphite.opendev.org/
... it's a CNAME redirect but we might as well clean up.
Change-Id: I36c951c85316cd65dde748b1e50ffa2e058c9a88
Our old puppet 4 process was to run the install_puppet.sh script to
transition from puppet 3 to puppet 4 but this ran after base.yaml which
enforces a puppet version.
Unfortunately we were enforcing puppet version 3 in the base.yaml
playbook via the puppet-install role, which meant base would install
puppet 3 and our upgrade playbook would install puppet 4 in a loop.
Thankfully we run puppet after the upgrade so we were using the puppet
version we wanted.
To fix this needless reinstall loop we do two things. We move the
upgrade playbook before base.yaml so that we upgrade before we enforce a
version. Then we update group vars for the puppet4 group to enforce the
puppet 4 version.
Change-Id: I97ca81ed5331e664f8e2e65b283793f0919f6033
Most of these playbooks finish much faster than 2 hours. Set
timeouts of approximately 3x their current run times, rounded to the
nearest 10m.
Emit the name of the timer to the log at the end of each run so
that it's clearer which playbook just finished.
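A rough sketch of the mechanism (the helper name, limits and playbook
paths here are illustrative):
  run_playbook() {
      name=$1; limit=$2; shift 2
      timeout -k 2m "$limit" ansible-playbook -f 5 "$@"
      echo "run_${name} completed"
  }
  run_playbook base 50m playbooks/base.yaml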
Correct the timer name for one of the playbooks.
The k8s cluster deployment playbooks are not yet functional --
run times for those are still unknown.
Change-Id: I43a06baaec908cba7d88c4b0932dcc95f1a9a108
First, we need an @ before the extra vars files. Why? Because
an @ is needed.
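(More concretely: the @ prefix tells ansible-playbook to load the
argument as a variables file rather than parse it as a literal
key=value setting. The path below is illustrative.)
  # without the leading @ the path would not be read as a vars file
  ansible-playbook -e @/etc/ansible/hosts/gitea-cluster.yaml playbooks/rook.yaml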
Second, the rook playbook was stringing all 4 commands onto one
exec call, which was working poorly. Instead, make 4 tasks so that
it's slightly better represented in ansible output, each of which
has a (presumably) valid command.
Change-Id: I30efe84d2041237a00da0c0aac02afa92d29c0fb
The current code runs k8s-on-openstack's ansible in an ansible
task. This makes debugging failures especially difficult.
Instead, move the prep task to update-system-config, which will
ensure the repo is cloned, and move the post task to its own
playbook. The cinder storage class k8s action can be removed from
this completely as it's handled in the rook playbook.
Then just run the k8s-on-openstack playbook as usual, but without
the cd first so that our normal ansible.cfg works.
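The relevant detail is that ansible finds ansible.cfg via the current
working directory; illustratively (checkout and playbook paths are
assumed here):
  # before: cd /opt/k8s-on-openstack && ansible-playbook site.yaml
  #         picked up that repo's own ansible.cfg from the cwd
  # after: no cd, so our normal ansible.cfg applies
  ansible-playbook /opt/k8s-on-openstack/site.yaml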
Change-Id: I6015e58daa940914d46602a2cb64ecac5d59fa2e
Since the gitea cluster doesn't appear in any ansible inventory,
we need to create a dedicated file to hold the extra variables.
Change-Id: Ib2365c9204bff549fdc0116243376d6e895f2296
The k8s-on-openstack project produces an opinionated kubernetes
that is correctly set up to be integrated with OpenStack. All of the
patches we've submitted to update it for our environment have been
landed upstream, so just consume it directly.
It's possible we might want to take a more hands-on forky approach in
the future, but for now it seems fairly stable.
Change-Id: I4ff605b6a947ab9b9f3d0a73852dde74c705979f
Add some coarse-grained statsd tracking for the global ansible runs.
Adds a timer for each step, along with an overall timer.
This adds a single argument so that we only try to run stats when
running from the cron job (so if we're debugging by hand or something,
this doesn't trigger). Graphite also needs to accept stats from
bridge.o.o. The plan is to present this via a simple grafana
dashboard.
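A minimal sketch of the idea, assuming plain statsd over UDP; the
argument name, stat name and port are illustrative:
  if [ "$1" = "--cron" ]; then
      SEND_STATS=1
  fi
  start=$(date +%s)
  ansible-playbook -f 5 playbooks/base.yaml
  elapsed=$(( ($(date +%s) - start) * 1000 ))
  if [ -n "$SEND_STATS" ]; then
      # statsd timer format is "<name>:<ms>|ms", sent over UDP
      echo "bridge.ansible.run_base:${elapsed}|ms" | nc -u -w 1 graphite.opendev.org 8125
  fi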
Change-Id: I299c0ab5dc3dea4841e560d8fb95b8f3e7df89f2
In run_all.sh, increase the number of ansible forks to 50 for most
playbooks in an attempt to speed up the process.
Change-Id: I487605fd3b2d20d7b1f19c40d22018deeae9c112
And revert "Set Ansible forks to 50"
This doesn't seem to have helped, and may have made the run longer.
I suspect a problem with the env var, but let's revert back to the
old value and mechanism (cli flag) to re-establish a baseline,
then we'll change the value of the cli flag.
This reverts commit 84199095716da416849ed4a2649ec8a2c878609d.
This reverts commit 97d8f9d0bfaec24413f134fe252d4011fe9e36d4.
Change-Id: I825b2b3db26ce6dd7d70fcc8b33e70b511eb52db
20 is working fine with plenty of ram/cpu to spare; increase to 50
to attempt to speed up the runtime.
The environment variable should be used by default, but the "-f"
option will override that, in the one case where we need it.
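In other words (values illustrative):
  export ANSIBLE_FORKS=50                                   # default for every run
  ansible-playbook playbooks/base.yaml
  ansible-playbook -f 10 playbooks/remote_puppet_git.yaml   # -f overrides the env var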
Change-Id: Ie6a1d991a346702ec58cd716b0b94af5c93554ac
When cron runs this we don't get any delineation between runs in the
output log file. Add begin and end markers.
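Something along these lines (the exact marker text is illustrative):
  echo "=== run_all.sh start: $(date -u '+%Y-%m-%d %H:%M:%S') ==="
  # ... playbook runs ...
  echo "=== run_all.sh end:   $(date -u '+%Y-%m-%d %H:%M:%S') ==="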
Change-Id: I4d73d7a8943a302e229517bc717175cda260282c
We have an ansible logging location defined in ansible.cfg. We don't
need to override it in run_all.sh.
Change-Id: I7f0a8b70a1ccd7a43ce47a3f452b6d0d5c57e96a
The production directory is a relic from the puppet environment concept,
which we do not use. Remove it.
The puppet apply tests run puppet locally, where the production
environment is still needed, so don't update the paths in
tools/prep-apply.sh.
Depends-On: https://review.openstack.org/592946
Change-Id: I82572cc616e3c994eab38b0de8c3c72cb5ec5413
Now that we've got base server stuff rewritten in ansible, remove the
old puppet versions.
Depends-On: https://review.openstack.org/588326
Change-Id: I5c82fe6fd25b9ddaa77747db377ffa7e8bf23c7b
The purpose of the playbook is to update the system-config checkout, as
well as installing puppet modules and ansible roles.
Rename it, so that it's clearer what it does. Also, clean it up a bit.
We've gotten better at playbooks since we originally wrote this.
Change-Id: I793914ca3fc7f89cf019cf4cdf52acb7e0c93e60
Add a playbook to rerun install_puppet.sh with PUPPET_VERSION=4. Also
make the install_modules.sh script smarter about figuring out the puppet
version so that the update_puppet.yaml playbook, which updates the
puppet config and puppet modules but not the puppet package, does not
need to be changed.
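A sketch of the sort of detection meant here (the real script's
variable names and paths may differ):
  # ask the installed puppet for its major version and choose module
  # paths accordingly, so the same script serves puppet 3 and 4
  PUPPET_MAJOR=$(puppet --version | cut -d. -f1)
  if [ "$PUPPET_MAJOR" -ge 4 ]; then
      MODULE_PATH=/etc/puppetlabs/code/environments/production/modules
  else
      MODULE_PATH=/etc/puppet/modules
  fi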
When we're ready to start upgrading nodes, we'll add them to the puppet4
group in `modules/openstack_project/files/puppetmaster/groups.txt`.
Change-Id: Ic41d277b2d70e7c25669e0c07e668fb9479b8abf
We are running into memory contention and OOMing on
ansible-playbook. Fewer workers = more ram, hopefully.
We can also move puppetmaster.o.o to a host with more ram (it only has
2G right now). We can also disable the apache/passenger/puppet that is
running on the host.
Change-Id: Id5ade889748d5e8f65a8ea68cc64b0c071c6a627
Add a separate playbook for infracloud nodes to ensure they run in the
correct order: baremetal -> controller -> compute.
Baremetal is intentionally left out; it is not ready yet.
All 'disabled' flags on infracloud hosts are turned off. This patch
landing turns on management of the infracloud.
Co-Authored-By: Yolanda Robla <info@ysoft.biz>
Co-Authored-By: Spencer Krum <nibz@spencerkrum.com>
Change-Id: Ieeda072d45f7454d6412295c2c6a0cf7ce61d952
There are a few things that are run as part of run_all.sh that are
not logged into puppet_run_all.log - namely git cloning, module installation
and ansible role installation. Let's go ahead and do those in a playbook
so that we can see their output while we're watching the log file.
Change-Id: I6982452f1e572b7bc5a7b7d167c1ccc159c94e66
If we're going to run puppet apply on all of our nodes, they need
the puppet modules installed on them first.
Change-Id: I84b80818fa54d1ddc4d46fead663ed4212bb6ff3
/etc/ansible/playbooks isn't actually a thing; it was just a convenient
place to put things. However, to enable puppet apply, we're going to
want a group_vars directory adjacent to the playbooks, so having them be
a subdirectory of the puppet module and installed by it is just extra
complexity. Also, if we run these out of system-config, it will be
easier to do things like what we do with puppet environments for
testing.
Change-Id: I947521a73051a44036e7f4c45ce74a79637f5a8b
Our current puppet run_all.sh script takes almost 45 minutes to run
puppet agent on all of our nodes. We are using the default concurrency
of 5. Our puppet master should be able to handle a bit more than that.
Run the git/gerrit playbook with a concurrency of 10 and everything else
with a concurrency of 20.
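i.e. roughly (playbook names here are indicative):
  ansible-playbook -f 10 playbooks/remote_puppet_git.yaml
  ansible-playbook -f 20 playbooks/remote_puppet_else.yaml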
Change-Id: Ia09abb6fa8c699e156aed38d86ce6fd193f3a42d
Ansible galaxy will not, by default, overwrite a role that already
exists. To keep our ansible puppet role up to date, force its
installation.
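For example (the role source shown is illustrative):
  # --force makes ansible-galaxy overwrite an already-installed role
  ansible-galaxy install --force git+https://git.openstack.org/openstack-infra/ansible-role-puppet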
Change-Id: I75eda8600f666895f9be8711d089615e57b3f3c5
Similar to how we install puppet modules from standalone repos, start
using the ansible-galaxy command to install roles from standalone role
repos.
Change-Id: Iae7d8e4626479e565bc194496de289027a4668ed
Depends-On: I76d5cab55942beaff44ea5f289f93ff6ce772c5f