80 Commits

Author SHA1 Message Date
Jeremy Stanley
f5268d956c Add check to remainder of balance_git_https
Now that we can confirm this hasn't broken for gitea01, set check on
all the remaining server lines as well.

Change-Id: I11f1f15210dafed66e1209329ddf7f3838592881
2022-03-07 18:14:19 +00:00
Jeremy Stanley
4061acd3e7 Add check keyword to balance_zuul_https servers
Apparently the check-ssl option only modifies check behavior, but
does not actually turn it on. The check option also needs to be set
in order to activate checks of the server. See §5.2 of the haproxy
docs for details:
https://git.haproxy.org/?p=haproxy-2.5.git;a=blob;f=doc/configuration.txt;h=e3949d1eebe171920c451b4cad1d5fcd07d0bfb5;hb=HEAD#l14396

Turn it on for all of our balance_zuul_https server entries.

Also set this on the gitea01 server entry in balance_git_https, so
we can make sure it's still seen as "up" once this change takes
effect. A follow-up change will turn it on for the other
balance_git_https servers out of an abundance of caution around that
service.
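
The distinction described above can be illustrated with a small lint-style sketch. It is not part of the change itself: the function name, the example backend section, and the addresses are hypothetical, and it assumes only that health checks are active when the standalone `check` keyword appears among a server line's options.

```python
def servers_missing_check(config_text: str) -> list[str]:
    """Return names of server lines that never enable health checks.

    A line counts as checked only if the standalone "check" keyword is
    present; "check-ssl" on its own does not count.
    """
    missing = []
    for raw_line in config_text.splitlines():
        # Drop trailing comments, then split into whitespace-separated tokens.
        tokens = raw_line.split("#", 1)[0].split()
        if len(tokens) < 3 or tokens[0] != "server":
            continue
        name, options = tokens[1], tokens[3:]
        if "check" not in options:
            missing.append(name)
    return missing


# Hypothetical backend section for illustration only.
EXAMPLE = """
listen balance_example_https
    server backend01 192.0.2.10:3081 check-ssl verify none
    server backend02 192.0.2.11:3081 check check-ssl verify none
"""

print(servers_missing_check(EXAMPLE))
```

Here only `backend01` is reported, because its line modifies check behavior with `check-ssl` without ever turning checks on.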

Change-Id: I4018507f6e0ee1b5c30139de301e09b3ec6fc494
2022-03-07 18:11:46 +00:00
Zuul
53fbf72fdd Merge "Allow zuul-lb to send stats to graphite" 2022-03-07 05:23:29 +00:00
James E. Blair
e97efd5fd5 Allow zuul-lb to send stats to graphite
Change-Id: Ib6bcd21555d34f80e1ace58cbd1cc7f479f92f7a
2022-03-04 15:26:21 -08:00
Clark Boylan
f24bbf97a7 Do more robust checks against zuul-web with haproxy
Switch the port 80 and 443 endpoints over to doing http checks instead
of tcp checks. This ensures that both apache and the zuul-web backend
are functional before balancing to them.

The fingergw remains a tcp check.

Change-Id: Iabe2d7822c9ef7e4514b9a0eb627f15b93ad48e2
2022-03-04 14:17:51 -08:00
James E. Blair
3f8acefbe1 Run zuul-web on zuul01 and add to load balancer
Change-Id: Ia8b10338fa3a1876993404276e0759f4b10d6b54
2022-03-04 13:11:09 -08:00
Jack Morgan
ded27cbb5d Adds support for running zuul-registry as a non-root user
Signed-off-by: Jack Morgan <jack@jento.io>
Change-Id: I89594affb04639b49b409a569036d6afac997251
2022-03-03 09:06:51 -08:00
Clark Boylan
5b5be7cd02 Remove mirror ports 4444 and 8081 from the firewall
The docker v1 protocol proxy listened on these ports and was removed
by 9b6398394d5d5d9e9e9aff244ccac2f98a4317d1 as everything uses v2 now.
The firewall holes were left open though. Clean that up.

Change-Id: Ie00acd5bfb657153b9bc49222ae5d9778ad36e70
2022-02-17 08:31:58 -08:00
Zuul
7dfa0f5fa8 Merge "Haproxy http checks for Gitea" 2022-02-16 22:08:26 +00:00
Zuul
d0a4710eb0 Merge "Remove configuration management for wiki servers" 2022-02-16 17:58:05 +00:00
Clark Boylan
df335525ab Haproxy http checks for Gitea
Previously we were only checking that Apache can open TCP connections to
determine if Gitea is up or down on a backend. This is insufficient
because Gitea itself may be down while Apache is up. In this situation
TCP connection to Apache will function, but if we make an HTTP request
we should get back an error.

To check if both Apache and Gitea are working properly we switch to
using http checks instead. Then if Gitea is down Apache can return a 500
and the Gitea backend will be removed from the pool. Similarly, if Apache
is non-functional the check will fail to connect via TCP.

Note we don't verify ssl certs for simplicity as checking these in
testing is not straightforward. We didn't have verification with the old
tcp checks so this isn't a regression, but does represent something we
could try and improve in the future.
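
The difference between the two kinds of check can be sketched in a few lines. This is an illustration under stated assumptions, not haproxy's implementation: the function names are hypothetical, plain HTTP is used instead of TLS to keep the example short, and a 2xx or 3xx status is treated as healthy.

```python
import http.client
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Pass if a TCP connection can be opened (the old style of check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host: str, port: int, path: str = "/", timeout: float = 2.0) -> bool:
    """Pass only if an HTTP request gets a 2xx or 3xx response."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Covers both a failed TCP connection and a broken HTTP exchange.
        return False
    finally:
        conn.close()
```

Against a front end that accepts connections but answers with a 500, `tcp_check` passes while `http_check` fails, which is the case the old checks could not detect.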

Change-Id: Id47a1f9028c7575e8fbbd10fabfc9730095cb541
2022-02-15 09:59:52 -08:00
Zuul
9db437f2ba Merge "Switch refstack's IDP to OpenInfraID" 2022-02-15 04:52:24 +00:00
Jeremy Stanley
89c4fd9b3d Remove configuration management for wiki servers
We never finished puppeting the OpenStack wiki, and if we do manage
to get it under configuration management in the future it will
likely not use Puppet anyway. The dev server is already gone, and
deployment has been explicitly disabled for the other, so let's go
ahead and remove the references here and then we should be able to
retire the separate Puppet module we've been hosting.

Change-Id: I3f9ada3eb3d6f16545270135fab994ac460be94b
2022-02-14 22:32:18 +00:00
James E. Blair
2a9553ef25 Add Zuul load balancer
This adds a load balancer for zuul-web and fingergw.

Change-Id: Id5aa01151f64f3c85e1532ad66999ef9471c5896
2022-02-10 13:24:42 -08:00
Zuul
8dafc621d7 Merge "Remove gearman from Zuul" 2022-02-01 23:11:30 +00:00
James E. Blair
14f4a20628 Remove gearman from Zuul
Zuul no longer uses gearman, so we can remove the infrastructure
around it.

Change-Id: I3613d812971add4733d3fe509ee22835e5814ec6
2022-02-01 13:52:47 -08:00
Zuul
6d25c4a5c3 Merge "Add openstack-skyline channel in statusbot/meetbot/logging" 2022-01-31 00:57:14 +00:00
James E. Blair
535b7162a1 Move Zuul SQL connection to "database"
The sql connection is no longer supported; we need to use "database"
instead.  The corresponding hostvars change has already been made
on bridge.

Change-Id: Ibcac56568f263bd50b2be43baa26c8c514c5272b
2022-01-27 16:46:32 -08:00
Ghanshyam Mann
b57f954456 Add openstack-skyline channel in statusbot/meetbot/logging
The openstack/skyline project is newly added
- https://review.opendev.org/c/openstack/governance/+/814037
and its channel is being added to the project-config accessbot
by the Depends-On patch.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/825881
Change-Id: I5df1704b4dade9bf3c5b0ee717a72f6d04fac43a
2022-01-21 15:21:20 -06:00
Jeremy Stanley
2d450e29bc Switch refstack's IDP to OpenInfraID
The OpenStackID project has been rebranded, and the old
openstackid.org deployment is being retained temporarily in order to
ease transition, but id.openinfra.dev is in place now and intended
as its successor.

Note that when this merges, a manual database edit will be required
to associate every user's new ID with their existing accounts, so
this should only be merged when we're ready to do that part just
prior to deploying and then check it again after to make sure we
didn't race any user additions.

Change-Id: I2716e469bc61e53645c23d362b8637bab0a32bb1
2022-01-10 21:21:28 +00:00
Jeremy Stanley
81f8cdfb7b Add HTTPS vhosts to mailman servers
Add secondary vhosts for HTTPS to each mailman site, but don't
remove the plain HTTP ones for now. Before switching to Mailman 3
we'll replace the current HTTP vhosts with blanket redirects to
HTTPS.

Add tests to make sure this is working, and also add a command-line
test for the lists.openinfra.dev site now that it's got a first
non-default list of its own. Also collect Apache logs from the test
nodes so we can see for sure what might break.

Change-Id: I4d93d643381f17c9a968595587909f0ba3dd6f92
2021-12-20 20:35:14 +00:00
Ghanshyam Mann
9dde035e8a Add openstack-venus channel in statusbot
The openstack/venus project is newly added
- https://review.opendev.org/c/openstack/project-config/+/808149
and its channel is being added to the project-config
accessbot by the Depends-On patch.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/821875
Change-Id: Ibf98e54850f65968710a5161d77d3d0880642f38
2021-12-15 15:29:44 -06:00
Ian Wienand
f29aa2da16 Make haproxy role more generic
This makes the haproxy role more generic so we can run another (or
potentially even more) haproxy instance(s) to manage other services.

The config file is moved to a variable for the haproxy role.  The
gitea specific config is then installed for the gitea-lb service by a
new gitea-lb role.

statsd reporting is made optional with an argument.  This
enables/disables the service in the docker compose.

Role documentation is updated.

Needed-By: https://review.opendev.org/678159
Change-Id: I3506ebbed9dda17d910001e71b17a865eba4225d
2021-12-01 09:55:45 +11:00
Clark Boylan
fd88087335 Run gerritbot with a user that will be shared with matrix-gerritbot
They have roughly the same level of access so let's align things.

Change-Id: Ifbe9dae7038345e20e8b498c87a37c519829a8cc
2021-11-05 11:24:05 -07:00
Clark Boylan
cf91bc0971 Remove the gerrit group in favor of the review group
Having two groups here was confusing. We seem to use the review group
for most ansible stuff so we prefer that one. We move contents of the
gerrit group_vars into the review group_vars and then clean up the use
of the old group vars file.

Change-Id: I7fa7467f703f5cec075e8e60472868c60ac031f7
2021-10-12 09:48:53 -07:00
Clark Boylan
76baae4e3f Replace testing group vars with host vars for review02
Previously we had a test specific group vars file for the review Ansible
group. This provided junk secrets to our test installations of Gerrit,
and we then relied on the review02.opendev.org production host vars file
to set values that are public.

Unfortunately, this meant we were using the production heapLimit value,
which is far too large for our test instances, leading to the occasional
failure:

  There is insufficient memory for the Java Runtime Environment to continue.
  Native memory allocation (mmap) failed to map 9596567552 bytes for committing reserved memory.

We cannot set the heapLimit in the group var file because the hostvar
file overrides those values. To fix this we need to replace the test
specific group var contents with a test specific host var file instead.
To avoid repeating ourselves we also create a new review.yaml group_vars
file to capture common settings between testing and prod. Note we should
look at combining this new file with the gerrit.yaml group_vars.

On the testing side of things we set the heapLimit to 6GB, we change the
serverid value to prevent any unexpected notedb confusion, and we remove
replication config.
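
The precedence problem described above can be sketched as a dictionary merge. This is a simplified model that assumes only two levels (group vars and host vars); Ansible's real precedence rules have more levels. The key name `heap_limit`, the `serverid` value, and the production value are placeholders rather than the names and values used in the repository.

```python
def effective_vars(group_vars: dict, host_vars: dict) -> dict:
    """Merge variables so that host-level values win over group-level ones."""
    merged = dict(group_vars)
    merged.update(host_vars)
    return merged


# Before: the test override lives in group vars, so the production
# host vars file silently replaces it.
before = effective_vars(
    group_vars={"heap_limit": "6g"},                 # test-specific group vars
    host_vars={"heap_limit": "production-sized"},    # production host vars
)

# After: shared settings stay in group vars and the test override moves
# into a test-specific host vars file, which has the final say.
after = effective_vars(
    group_vars={"serverid": "shared-placeholder"},   # common review group vars
    host_vars={"heap_limit": "6g"},                  # test-specific host vars
)

print(before["heap_limit"], after["heap_limit"])
```

In the first merge the test value is lost to the host-level setting; in the second the test value survives because it is itself set at the host level.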

Change-Id: Id8ec5cae967cc38acf79ecf18d3a0faac3a9c4b3
2021-10-12 09:48:45 -07:00
Ian Wienand
d48ec532d5 ptgbot: add leading # to channel name
Change-Id: I90ecb705d237e0117d0aaef1b8abee23a981ff65
2021-10-07 09:09:52 +11:00
Jeremy Stanley
ad16067ae8 Finish ptgbot configuration
Set the channel we want ptgbot joining in production with a group
var, like we do for statusbot's channel list. Correct the password
var name to match what's used in the template for production (which also
matches the override set in our private hostvars on the bastion).
Clean up the unnecessary auth nicks list which was copied from the
statusbot config but is entirely unused. Also get rid of some
unnecessary empty lines in the defaults as they really don't make
the file any more readable.

Change-Id: Id026b89d642eae13feba374e4f3ec610b543e530
2021-10-06 19:06:39 +00:00
Zuul
668aa77c9b Merge "Move #zuul from OFTC to Matrix" 2021-08-21 14:57:09 +00:00
James E. Blair
ac1dd4eedd Assume gitea reverse proxy
We now depend on the reverse proxy not only for abuse mitigation but
also for serving .well-known files with specific CORS headers.  To
reduce complexity and avoid traps in the future, make it non-optional.

Change-Id: I54760cb0907483eee6dd9707bfda88b205fa0fed
2021-08-20 22:06:03 -07:00
James E. Blair
cdbfe6b97e Move #zuul from OFTC to Matrix
Zuul is moving to an unbridged Matrix room.  Remove eavesdrop from
the OFTC room, and add the Matrix room to the two new Matrix bots.

Change-Id: I9bf34c1f67c6dac41c3761f8ccde4d7fa76bbf89
2021-08-20 14:44:44 -07:00
James E. Blair
fd4fd57409 Remove port 22 from webservers extra ports
This isn't necessary since it's hard-coded into the file.  Let's
not add it where it isn't needed lest we confuse ourselves into
thinking it's necessary.

Change-Id: I011c647bb85e145e55fb6feb19facdedec180bf1
2021-08-11 14:21:34 -07:00
James E. Blair
8d76a7cd99 Test port 9001 on eavesdrop
We merged change I9459e47ecfd19b27b7adcaee9ce91f80d51c124d which
should have opened this port but did not.  Add testing for it.

Remove eavesdrop from webservers group

This was overriding the custom iptables ports that were being set
in the eavesdrop group vars file.  There appears to be no other use
for the webservers group.

Change-Id: I7109f1472176ff39482f9bdfc8462e5f525f791c
2021-08-11 14:20:41 -07:00
Tristan Cacqueray
32a38a4b83 Add gerritbot-matrix health check and expose prometheus monitoring
This change enables monitoring the gerritbot-matrix service metrics.

Change-Id: I9459e47ecfd19b27b7adcaee9ce91f80d51c124d
2021-08-08 17:35:45 +00:00
Clark Boylan
f6a0bf7be5 Improve gerrit known_hosts management
Previously we were only managing root's known_hosts via ansible but even
then this wasn't happening because the gerrit_self_hostkey var wasn't
set anywhere. On top of that we need to manage multiple known_hosts
because gerrit must recognize itself and all of the gitea servers.
Update the code to take a dict of host key values and add each entry to
known_hosts for both the root and gerrit2 user.
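
As a rough illustration of the data shape, the sketch below turns a dict of host keys into known_hosts lines for each user. It is not the Ansible code from this change: the function name, host names, and key strings are hypothetical placeholders, and it assumes the plain `hostname keytype key` line format.

```python
def known_hosts_entries(host_keys: dict[str, str], users: list[str]) -> dict[str, list[str]]:
    """Build the same known_hosts lines for every listed user."""
    lines = [f"{host} {key}" for host, key in sorted(host_keys.items())]
    # Each user gets an independent copy of the full list of entries.
    return {user: list(lines) for user in users}


# Placeholder host names and keys, for illustration only.
HOST_KEYS = {
    "review.example.org": "ssh-ed25519 PLACEHOLDERKEY1",
    "gitea01.example.org": "ssh-ed25519 PLACEHOLDERKEY2",
}

for user, entries in known_hosts_entries(HOST_KEYS, ["root", "gerrit2"]).items():
    print(user, entries)
```

Passing one dict for all hosts keeps the entries for the two users identical, so Gerrit recognizes itself and every Gitea server regardless of which account opens the connection.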

We remove keyscans from tests to ensure that this update is actually
working.

Change-Id: If64c34322f64c1fb63bf2ebdcc04355fff6ebba2
2021-08-02 09:53:27 -07:00
Ian Wienand
e79e3a2f04 Remove review01 references
This server is no longer in production, so remove the various
references to it.

Change-Id: I2cdd8052c48713e9ba648be20ccad5069d5fe40e
2021-07-20 11:57:10 +10:00
Clark Boylan
25d2fdcc3f Add warning to inventory about zuul gerrit server config
Let's avoid changing this and breaking Depends-On again by adding an
explicit warning to the code that sets the config.

Change-Id: Idcb77d8b0b53c56ea7f15f18e001f8bc9a001c98
2021-07-13 10:32:45 -07:00
Clark Boylan
2c06a86915 Talk to review.o.o instead of review01.o.o
Talking to review01.o.o in the Zuul gerrit connection config broke
depends-on handling as the urls would all need to be
https://review01.opendev.org/123456 and then later
https://review02.opendev.org/123456 but people use
https://review.opendev.org/123456.

This change was made to simplify DNS updates during the gerrit server
move but we should be able to handle those via manual landing of changes
and running of playbooks instead. Partially revert
e05257e1b7b70b18cb7b1349278e2c786a565512 to fix the depends-on handling.
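
The failure mode can be sketched as a hostname comparison. This is a deliberately simplified model and not Zuul's actual matching logic; the function name is hypothetical, and it assumes a dependency URL is only recognized when its host equals the host configured for the connection.

```python
from urllib.parse import urlparse


def url_matches_connection(depends_on_url: str, connection_baseurl: str) -> bool:
    """Return True if the Depends-On URL points at the configured server."""
    return urlparse(depends_on_url).netloc == urlparse(connection_baseurl).netloc


# URL form that people actually write in commit messages.
url = "https://review.opendev.org/123456"

print(url_matches_connection(url, "https://review01.opendev.org"))
print(url_matches_connection(url, "https://review.opendev.org"))
```

With the connection pointed at review01.opendev.org the commonly used URL no longer matches, while pointing it back at review.opendev.org restores the match.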

Change-Id: Ie628b2627c263d88e466205af2a3d0418d6df7d3
2021-07-13 10:27:36 -07:00
Zuul
f45f5f9626 Merge "Connect Zuul to review01.opendev.org" 2021-07-12 00:11:27 +00:00
James E. Blair
066c2ec4e1 Add gating.dev zone to ADNS
Depends-On: https://review.opendev.org/798374
Change-Id: I901d79c1fceec5566dfd4917b2c7903ffc443acf
2021-06-28 19:39:41 +00:00
Ian Wienand
e05257e1b7 Connect Zuul to review01.opendev.org
Point the Zuul scheduler at review01.opendev.org instead of the CNAME
review.opendev.org.  This avoids chicken-egg issues because Zuul
actually updates the DNS entries.

Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/798242
Change-Id: I1f6054fdab0fe0fcb311686d6af6454b6a714666
2021-06-28 14:36:08 +10:00
Ian Wienand
868a42a85a Move statusbot channels out of hiera
This makes I246b2723372594e65bcd1ba90215d6831d4c0c72 active

Change-Id: I5a9efa2edc2fe6fb70e21d4b58fd4283d2d5972d
2021-06-11 18:15:48 +10:00
Ian Wienand
ccda6d08a1 Move meetbot config to eavesdrop01.opendev.org
This enables the new eavesdrop01.opendev.org server in all current
channels.  Puppet has been disabled on the old server and we will
manually stop supybot/meetbot and migrate logs before this applies.

Change-Id: I4a422bb9589c8a8761191313a656f8377e93422f
2021-06-10 09:02:23 +10:00
Clark Boylan
c743b7e484 Clean up zuul01 from inventory
This cleans up zuul01 as it should no longer be used at this point. We
also make the inventory groups a bit more clear that all zuul servers
are under the opendev.org domain now.

Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/790483
Change-Id: I7885fe60028fbd87688f3ae920a24bce4d1a3acd
2021-05-13 06:58:36 -07:00
Clark Boylan
533594d959 Add zuul02 to inventory
This zuul02 instance will replace zuul01. There are a few items to
coordinate when doing an actual switch so we haven't removed zuul01 from
inventory here. In particular we need to update gearman server config
values in the zuul cluster and we need to save queues, shutdown zuul01,
then start zuul02's scheduler and restore queues there.

I believe landing this change is safe as we don't appear to start zuul
on new instances by default. Reviewers should double check this.

Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/791039
Change-Id: I524b456e494124d8293fbe8e1468de40f3800772
2021-05-13 06:58:30 -07:00
Clark Boylan
2eebb858af Remove firehose.openstack.org
Once we are satisfied that we have disabled the inputs to firehose we
can land this change to stop managing it in config management. Once that
is complete the server can be removed.

Change-Id: I7ebd54f566f8d6f940a921b38139b54a9c4569d8
2021-04-13 13:51:48 -07:00
Zuul
3180086559 Merge "Rename refstack group variables" 2021-03-29 21:33:02 +00:00
Ian Wienand
aa94f2d831 Rename refstack group variables
When we cleaned up the puppet in
I6b6dfd0f8ef89a5362f64cfbc8016ba5b1a346b3 we renamed the group
s/refstack-docker/refstack/ but didn't move the variables and some
other references too.

Change-Id: Ib07d1e9ede628c43b4d5d94b64ec35c101e11be8
2021-03-19 16:01:46 +11:00
Zuul
f917044497 Merge "kerberos-kdc: add realm value" 2021-03-18 05:45:01 +00:00
Ian Wienand
ef62e1df31 kerberos-kdc: add realm value
I missed this in the production variables as it is set differently for
testing.

Change-Id: Ie9508cbcb11f8b342f05c98e8e85bc158e5ee4c1
2021-03-18 16:04:51 +11:00