Now that we can confirm this hasn't broken for gitea01, set check on
all the remaining server lines as well.
Change-Id: I11f1f15210dafed66e1209329ddf7f3838592881
Apparently the check-ssl option only modifies check behavior, but
does not actually turn it on. The check option also needs to be set
in order to activate checks of the server. See §5.2 of the haproxy
docs for details:
https://git.haproxy.org/?p=haproxy-2.5.git;a=blob;f=doc/configuration.txt;h=e3949d1eebe171920c451b4cad1d5fcd07d0bfb5;hb=HEAD#l14396
Turn it on for all of our balance_zuul_https server entries.
Also set this on the gitea01 server entry in balance_git_https, so
we can make sure it's still seen as "up" once this change takes
effect. A follow-up change will turn it on for the other
balance_git_https servers out of an abundance of caution around that
service.
Change-Id: I4018507f6e0ee1b5c30139de301e09b3ec6fc494
Switch the port 80 and 443 endpoints over to doing http checks instead
of tcp checks. This ensures that both apache and the zuul-web backend
are functional before balancing to them.
The fingergw remains a tcp check.
Change-Id: Iabe2d7822c9ef7e4514b9a0eb627f15b93ad48e2
The docker v1 protocol proxy listened on these ports and was removed
by 9b6398394d5d5d9e9e9aff244ccac2f98a4317d1 as everything uses v2 now.
The firewall holes were left open though. Clean that up.
Change-Id: Ie00acd5bfb657153b9bc49222ae5d9778ad36e70
Previously we were only checking that Apache can open TCP connections to
determine if Gitea is up or down on a backend. This is insufficient
because Gitea itself may be down while Apache is up. In this situation
TCP connection to Apache will function, but if we make an HTTP request
we should get back an error.
To check if both Apache and Gitea are working properly we switch to
using http checks instead. Then if Gitea is down Apache can return a 500
and the Gitea backend will be removed from the pool. Similarly if Apache
is non functional the check will fail to connect via TCP.
Note we don't verify ssl certs for simplicity as checking these in
testing is not straightforward. We didn't have verification with the old
tcp checks so this isn't a regression, but does represent something we
could try and improve in the future.
Change-Id: Id47a1f9028c7575e8fbbd10fabfc9730095cb541
We never finished puppeting the OpenStack wiki, and if we do manage
to get it under configuration management in the future it will
likely not use Puppet anyway. The dev server is already gone, and
deployment has been explicitly disabled for the other, so let's go
ahead and remove the references here and then we should be able to
retire the separate Puppet module we've been hosting.
Change-Id: I3f9ada3eb3d6f16545270135fab994ac460be94b
The sql connection is no longer supported, we need to use "database"
instead. The corresponding hostvars change has already been made
on bridge.
Change-Id: Ibcac56568f263bd50b2be43baa26c8c514c5272b
The OpenStackID project has been rebranded, and the old
openstackid.org deployment is being retained temporarily in order to
ease transition, but id.openinfra.dev is in place now and intended
as its successor.
Note that when this merges, a manual database edit will be required
to associate every user's new ID with their existing accounts, so
this should only be merged when we're ready to do that part just
prior to deploying and then check it again after to make sure we
didn't race any user additions.
Change-Id: I2716e469bc61e53645c23d362b8637bab0a32bb1
Add secondary vhosts for HTTPS to each mailman site, but don't
remove the plain HTTP ones for now. Before switching to Mailman 3
we'll replace the current HTTP vhosts with blanket redirects to
HTTPS.
Add tests to make sure this is working, and also add a command-line
test for the lists.openinfra.dev site now that it's got a first
non-default list of its own. Also collect Apache logs from the test
nodes so we can see for sure what might break.
Change-Id: I4d93d643381f17c9a968595587909f0ba3dd6f92
This makes the haproxy role more generic so we can run another (or
potentially even more) haproxy instance(s) to manage other services.
The config file is moved to a variable for the haproxy role. The
gitea specific config is then installed for the gitea-lb service by a
new gitea-lb role.
statsd reporting is made optional with an argument. This
enables/disables the service in the docker compose.
Role documenation is updated.
Needed-By: https://review.opendev.org/678159
Change-Id: I3506ebbed9dda17d910001e71b17a865eba4225d
Having two groups here was confusing. We seem to use the review group
for most ansible stuff so we prefer that one. We move contents of the
gerrit group_vars into the review group_vars and then clean up the use
of the old group vars file.
Change-Id: I7fa7467f703f5cec075e8e60472868c60ac031f7
Previously we had a test specific group vars file for the review Ansible
group. This provided junk secrets to our test installations of Gerrit
then we relied on the review02.opendev.org production host vars file to
set values that are public.
Unfortunately, this meant we were using the production heapLimit value
which is far too large for our test instances leading to the occasionaly
failure:
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 9596567552 bytes for committing reserved memory.
We cannot set the heapLimit in the group var file because the hostvar
file overrides those values. To fix this we need to replace the test
specific group var contents with a test specific host var file instead.
To avoid repeating ourselves we also create a new review.yaml group_vars
file to capture common settings between testing and prod. Note we should
look at combining this new file with the gerrit.yaml group_vars.
On the testing side of things we set the heapLimit to 6GB, we change the
serverid value to prevent any unexpected notedb confusion, and we remove
replication config.
Change-Id: Id8ec5cae967cc38acf79ecf18d3a0faac3a9c4b3
Set the channel we want ptgbot joining in production with a group
var, like we do for statusbot's channel list. Correct the password
var name to match what's used in the template for production (and
matches the override set in our private hostvars on the bastion).
Clean up the unnecessary auth nicks list which was copied from the
statusbot config but is entirely unused. Also get rid of some
unnecessary empty lines in the defaults as they really don't make
the file any more readable.
Change-Id: Id026b89d642eae13feba374e4f3ec610b543e530
We now depend on the reverse proxy not only for abuse mitigation but
also for serving .well-known files with specific CORS headers. To
reduce complexity and avoid traps in the future, make it non-optional.
Change-Id: I54760cb0907483eee6dd9707bfda88b205fa0fed
Zuul is moving to an unbridged Matrix room. Remove eavesdrop from
the OFTC room, and add the Matrix room to the two new Matrix bots.
Change-Id: I9bf34c1f67c6dac41c3761f8ccde4d7fa76bbf89
This isn't necessary since it's hard-coded into the file. Let's
not add it where it isn't needed lest we confuse ourselves into
thinking it's necessary.
Change-Id: I011c647bb85e145e55fb6feb19facdedec180bf1
We merged change I9459e47ecfd19b27b7adcaee9ce91f80d51c124d which
should have opened this port but did not. Add testing for it.
Remove eavesdrop from webservers group
This was overridding the custom iptables ports that were being set
in the eavesdrop group vars file. There appears to be no other use
for the webservers group.
Change-Id: I7109f1472176ff39482f9bdfc8462e5f525f791c
Previously we were only managing root's known_hosts via ansible but even
then this wasn't happening because the gerrit_self_hostkey var wasn't
set anywhere. On top of that we need to manage multiple known_hosts
because gerrit must recognize itself and all of the gitea servers.
Update the code to take a dict of host key values and add each entry to
known_hosts for both the root and gerrit2 user.
We remove keyscans from tests to ensure that this update is actually
working.
Change-Id: If64c34322f64c1fb63bf2ebdcc04355fff6ebba2
Let's avoid changing this and breaking Depends-On again by adding an
explicit warning to the code that sets the config.
Change-Id: Idcb77d8b0b53c56ea7f15f18e001f8bc9a001c98
Talking to review01.o.o in the Zuul gerrit connection config broke
depends-on handling as the urls would all need to be
https://review01.opendev.org/123456 and then later
https://review02.opendev.org/123456 but people use
https://review.opendev.org/123456.
This change was made to simplify DNS updates during the gerrit server
move but we should be able to handle those via manual landing of changes
and running of playbooks instead. Partially revert
e05257e1b7b70b18cb7b1349278e2c786a565512 to fix the depends-on handling.
Change-Id: Ie628b2627c263d88e466205af2a3d0418d6df7d3
Point the Zuul scheduler at review01.opendev.org instead of the CNAME
review.opendev.org. This avoids chicken-egg issues because Zuul
actually updates the DNS entries.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/798242
Change-Id: I1f6054fdab0fe0fcb311686d6af6454b6a714666
This enables the new eavesdrop01.opendev.org server in all current
channels. Puppet has been disabled on the old server and we will
manually stop supybot/meetbot and mirgrate logs before this applies.
Change-Id: I4a422bb9589c8a8761191313a656f8377e93422f
This cleans up zuul01 as it should no longer be used at this point. We
also make the inventory groups a bit more clear that all zuul servers
are under the opendev.org domain now.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/790483
Change-Id: I7885fe60028fbd87688f3ae920a24bce4d1a3acd
This zuul02 instance will replace zuul01. There are a few items to
coordinate when doing an actual switch so we haven't removed zuul01 from
inventory here. In particular we need to update gearman server config
values in the zuul cluster and we need to save queues, shutdown zuul01,
then start zuul02's scheduler and restore queues there.
I believe landing this change is safe as we don't appear to start zuul
on new instances by default. Reviewers should double check this.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/791039
Change-Id: I524b456e494124d8293fbe8e1468de40f3800772
Once we are satisfied that we have disabled the inputs to firehose we
can land this change to stop managing it in config management. Once that
is complete the server can be removed.
Change-Id: I7ebd54f566f8d6f940a921b38139b54a9c4569d8
When we cleaned up the puppet in
I6b6dfd0f8ef89a5362f64cfbc8016ba5b1a346b3 we renamed the group
s/refstack-docker/refstack/ but didn't move the variables and some
other references too.
Change-Id: Ib07d1e9ede628c43b4d5d94b64ec35c101e11be8