The constructed inventory plugin allows expressing additional groups,
but it's too heavyweight for our needs. Additionally, it is a full
inventory plugin that will add hosts to the inventory if they don't
exist.
What we want instead is something that will associate existing hosts
(that would have come from another source) with groups.
This also switches to using emergency.yaml instead of emergency, which
uses the same format.
We add an extra groups file for gate testing to ensure the CI nodes
get puppet installed.
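A sketch of the shape such a groups file might take (group names and
host patterns here are illustrative only):

  groups:
    git-servers:
      - 'git0*.openstack.org'
    afs:
      - 'afs0*.openstack.org'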
Change-Id: Iea8b2eb2e9c723aca06f75d3d3307893e320cced
We've only been using nodepool.o.o as a zookeeper server for the past
year or so. Last week we transitioned to a three node zookeeper cluster
and stopped using nodepool.o.o. This server has since been deleted.
This is the last bit of cleanup to remove it from config management.
Change-Id: I9d0363393ed20ee59f40b210ea14fb105a492e20
citycloud is rolling out per-region keystone. The latest openstacksdk
contains a change with an error in it, so put the correct auth_url into
the files directly while we fix it and release it again.
Additionally, Sto2 and Lon1 each have a different domain id. The domain
names are the same though - and that's good, because logical names are
nicer in config files anyway.
Restore the config for those clouds.
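Roughly, the hard-coded entries end up looking like this (URLs and
domain name are illustrative, not the real values):

  clouds:
    citycloud-sto2:
      region_name: Sto2
      auth:
        auth_url: https://sto2.example.citycloud.com:5000/v3
        user_domain_name: example-domain
        project_domain_name: example-domain
    citycloud-lon1:
      region_name: Lon1
      auth:
        auth_url: https://lon1.example.citycloud.com:5000/v3
        user_domain_name: example-domain
        project_domain_name: example-domain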
Change-Id: If55d27defc164bd38af2ffd1e7739120389422af
This region does not show up in catalog listings anymore and is causing
inventory generation for ansible to fail. This change removes Sto2 from
the management side of things so that we can get ansible and puppet
running again.
This does not clean up nodepool; we can do that in a followup once
ansible and puppet are running again.
Change-Id: Ifeea238592b897aa4cea47b723513d7f38d6374b
The mailman verp router handles remote addresses in the same way
dnslookup does.
It needs to run before dnslookup in order to be effective, so run
it first. It's only for outgoing messages, not incoming, so won't
affect the blackhole aliases we have for incoming fake bounce
messages.
Note that the verp router hasn't been used in about a year due to
this oversight, so we should merge this change with caution.
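A minimal sketch of the intended ordering in the exim routers section
(transport name and sender pattern are illustrative, other options
trimmed):

  begin routers

  mailman_verp_router:
    driver = dnslookup
    domains = ! +local_domains
    senders = *-bounces@lists.openstack.org
    transport = mailman_verp_smtp
    no_more

  dnslookup:
    driver = dnslookup
    domains = ! +local_domains
    transport = remote_smtp
    no_more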
Change-Id: I7d2a0f05f82485a54c1e7048f09b4edf6e0f0612
This region does not show up in catalog listings anymore and is causing
inventory generation for ansible to fail. This change removes Lon1 from
the management side of things so that we can get ansible and puppet
running again.
This does not clean up nodepool; we can do that in a followup once
ansible and puppet are running again.
Change-Id: Icf3b19381ebba3498dfc204a48dc1ea52ae9d951
We don't use snappy to install software on our servers, but it started
being installed by default. We don't need it, so remove it.
Change-Id: I043d4335916276476350d9ac605fed1e67362e15
The options are deprecated and don't do anything - but they do put
warnings into the service logs.
Change-Id: If53bc8aecc7df75c99ae71e5adb8189790405795
This is going to require some work to port several puppet things
to Ansible. To test the execution mechanism, let's just stub it
out for now.
Change-Id: Ief09ca30b19afffd106c98018cb23a9715fc9a69
After adding iptables configuration to allow bridge.o.o to send stats
to graphite.o.o in I299c0ab5dc3dea4841e560d8fb95b8f3e7df89f2, I
encountered the weird failure that ipv6 rules seemed to be applied on
graphite.o.o, but not the ipv4 ones.
Eventually I realised that the dns_a filter as written is using
socket.getaddrinfo() on bridge.o.o and querying for itself. It thus
matches the loopback entry in /etc/hosts and passes along a rule
for 127.0.1.1 or similar. The ipv6 hostname is not in /etc/hosts so
this works there.
What we really want the dns_<a|aaaa> filters to do is look up the
address in DNS rather than via the local resolver. Without wanting to
get involved in new libraries, etc. the simplest option seems to be to
use the well-known 'host' tool. We can easily parse the output of
this to ensure we're getting the actual DNS addresses for hostnames.
An ipv6 match is added to the existing test. This is effectively
tested by the existing usage of the iptables role which sets up rules
for cacti.o.o access.
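A rough sketch of the approach (as a Python filter; the exact plugin
code differs):

  import subprocess

  def dns_a(hostname):
      # 'host -t A' queries DNS directly, so loopback entries in
      # /etc/hosts (e.g. 127.0.1.1) are never returned.
      out = subprocess.check_output(
          ['host', '-t', 'A', hostname], universal_newlines=True)
      return [line.split()[-1] for line in out.splitlines()
              if ' has address ' in line]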
Change-Id: Ia7988626e9b1fba998fee796d4016fc66332ec03
We don't want to run ansible if we don't get a complete inventory from
our clouds. The reason for this is we cannot be sure that the ordering
of git servers, gerrit, and zuul or our serialized updates of afs
servers will work correctly if we have an incomplete inventory.
Instead we just want ansible to fail and try again in the future (we can
then debug why our clouds are not working).
From the ansible docs for any_unparsed_is_failed:
If 'true', it is a fatal error when any given inventory source
cannot be successfully parsed by any available inventory plugin;
otherwise, this situation only attracts a warning.
Additionally we tell the openstack inventory plugin to report failures
rather than returning an empty inventory, so that the unparsed failures
actually occur.
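Concretely, that is roughly (exact file locations aside):

  # ansible.cfg
  [inventory]
  any_unparsed_is_failed = True

  # openstack inventory plugin configuration
  plugin: openstack
  fail_on_errors: true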
Change-Id: I9025776af4316fbdd2c910566883eb3a2530852a
Keystone auth and openstacksdk/openstackclient do not do the correct
thing without this setting. They try v2 even though the discovery
doc at the root url does not list that version as valid. Force version 3
so that things will work again.
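For reference, the knob in clouds.yaml is roughly (cloud name and URL
illustrative):

  clouds:
    examplecloud:
      identity_api_version: 3
      auth:
        auth_url: https://keystone.example.com:5000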
Change-Id: I7e1b0189c842bbf9640e2cd50873c9f7992dc8d3
This new job is a parent job allowing us to CD from Zuul via
bridge.openstack.org. Using the Zuul project ssh keys we add_host
bridge.o.o to our running inventory on the executor, then run ansible
against bridge.o.o to execute an ansible playbook from
bridge.openstack.org:/opt/system-config/playbooks.
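A sketch of the playbook run on the executor (the user and the playbook
variable are illustrative):

  - hosts: localhost
    tasks:
      - name: Add bridge to the inventory
        add_host:
          name: bridge.openstack.org
          ansible_user: zuul

  - hosts: bridge.openstack.org
    tasks:
      - name: Run the requested playbook from system-config
        command: >
          ansible-playbook
          /opt/system-config/playbooks/{{ infra_playbook }}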
Change-Id: I5cd2dcc53ac480459a22d9e19ef38af78a9e90f7
Allow post-review jobs running under system-config and project-config
to ssh into bridge in order to run Ansible.
Change-Id: I841f87425349722ee69e2f4265b99b5ee0b5a2c8
Add some coarse-grained statsd tracking for the global ansible runs.
Adds a timer for each step, along with an overall timer.
This adds a single argument so that we only try to run stats when
running from the cron job (so if we're debugging by hand or something,
this doesn't trigger). Graphite also needs to accept stats from
bridge.o.o. The plan is to present this via a simple grafana
dashboard.
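The shape of the timing is roughly as follows (metric names are
illustrative; 8125 is the standard statsd port):

  start=$(date +%s%3N)
  ansible-playbook -f 20 /opt/system-config/playbooks/remote_puppet_else.yaml
  end=$(date +%s%3N)
  echo "ansible.run_all.remote_puppet_else:$((end - start))|ms" \
    > /dev/udp/graphite.openstack.org/8125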
Change-Id: I299c0ab5dc3dea4841e560d8fb95b8f3e7df89f2
Let's abandon the idea that we'll treat the backup server specially.
As long as we allow *any* automated remote access via ansible, we
have opened the door to potential compromise of the backup systems
if bridge is compromised. Rather than pretending that this separation
gives us any benefit, remove it.
Change-Id: I751060dc05918c440374e80ffb483d948f048f36
In run_all, we start a bunch of plays in sequence, but it's difficult
to tell what they're doing until you see the tasks. Name the plays
themselves to produce a better narrative structure.
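For example, a play-level name becomes a header in the output (role
name illustrative):

  - name: Run puppet on the git servers
    hosts: git-servers
    roles:
      - puppet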
Change-Id: I0597eab2c06c6963601dec689714c38101a4d470
We use the git-servers group in remote_puppet_git to positively select
the git nodes in that playbook, but a !git0* glob to exclude those nodes
in remote_puppet_else. Use !git-servers in remote_puppet_else so that
the two playbooks refer to the same group.
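So the two playbooks now select with matching patterns, roughly (plays
trimmed, role name illustrative):

  # remote_puppet_git.yaml
  - hosts: git-servers
    roles:
      - puppet

  # remote_puppet_else.yaml (other exclusions omitted)
  - hosts: 'all:!git-servers'
    roles:
      - puppet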
Change-Id: I023f8262a86117b2dec1ff5b762082e09e601e74
We were matching afs* as a glob to serialize puppet runs on afs servers.
This was fine until we added the afs-client and afsadmin groups to our
inventory, which also match afs*. Those groups include many nodes,
including our mirror nodes and zuul executors, all of which were then
running puppet serially, which is slow.
Fix this by explicitly using the afs and afsdb groups instead of a glob.
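That is, roughly (role name illustrative):

  - hosts: 'afs:afsdb'
    serial: 1
    roles:
      - puppet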
Change-Id: If21bbc48b19806343617e18fb03416c642e00ed2
This account is an admin account and sees every project's default
security group. This leads to:
FAILED! => {"changed": false, "msg": "Multiple matches found for default"}
when attempting to set the properties of the default security group for
this account. There doesn't appear to be a good way to filter out the
other projects' default security groups currently, so avoid setting them
for now.
Change-Id: I9a8cc7d59c0295caa71bf107b9b78745a4617981
Some of our summaries need to display more than 20 tasks to show
complete information. Bump the limit to 50, which should be enough for
anyone.
Change-Id: I3ae3bb714ea7f5fb094f85c33c19ea3c8a81f6c3
This formerly ran on puppetmaster.openstack.org but needs to be
transitioned to bridge.openstack.org so that we properly configure new
clouds.
Depends-On: https://review.openstack.org/#/c/598404
Change-Id: I2d1067ef5176ecabb52815752407fa70b64a001b
Deployment of the nodepool cloud.yaml file is currently failing with
FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'rackspace_username' is undefined"}
This is because the variables in the group_vars on bridge.o.o are all
prefixed with "nodepool_". Switch the template to the prefixed names.
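i.e. in the template, references move roughly from:

  username: "{{ rackspace_username }}"

to:

  username: "{{ nodepool_rackspace_username }}"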
Change-Id: I524cc628138d85e3a31c216d04e4f49bcfaaa4a8