This reverts commit 95d9b838140e44c9547ad1fa28bc88206823198c.
We've found that we run out of memory at 44g. Bump back up to 48g as
that should give us a bit more headroom.
Change-Id: I14a8f2b298aa1d3cb5c0829508ee137a6769675b
We had been setting this to 48GB on java 8, but recent gerrit service
issues indicate that this may be too large for our current system on
java 11. In particular it appears the non-heap portions of the jvm may
be in the ~8GB range, leaving only about 5-6GB of usable system memory
for other activities like web servers, backups, and garbage collection.
Reduce this to 44GB to increase headroom to see if that helps us. Java
11 is reported to be much more efficient at garbage collecting so
hopefully that makes up the difference between lower memory and where we
were on java 8. As a side note we could revert back to java 8 as another
option.
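For a rough sense of the arithmetic (the total RAM figure below is an
assumption inferred from the numbers above, not a measured value):

    # Approximate memory left for web servers, backups and garbage
    # collection once the JVM heap and non-heap usage are accounted for.
    total_ram_gb = 62      # assumed host size, not measured
    jvm_non_heap_gb = 8    # observed non-heap usage on java 11 (approx.)

    for heap_gb in (48, 44):
        headroom = total_ram_gb - heap_gb - jvm_non_heap_gb
        print(f"heap={heap_gb}GB -> ~{headroom}GB of headroom")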
Change-Id: Ie326aad2a9895098b484924a26c9257cd009d89e
The hound project has undergone a small re-birth and moved to
https://github.com/hound-search/hound
which has broken our deployment. We've talked about leaving
codesearch up to gitea, but it's not quite there yet. There seems to
be no point working on the puppet now.
This builds a container that runs houndd. It's an opendev specific
container; the config is pulled from project-config directly.
There are some custom scripts that drive things. Some points for
reviewers:
- update-hound-config.sh uses "create-hound-config" (which is in
jeepyb for historical reasons) to generate the config file. It
grabs the latest projects.yaml from project-config and exits with a
return code to indicate if things changed.
- when the container starts, it runs update-hound-config.sh to
populate the initial config. There is a testing environment flag
and small config so it doesn't have to clone the entire opendev for
functional testing.
- it runs under supervisord so we can restart the daemon when
projects are updated. Unlike earlier versions that didn't start
listening till indexing was done, this version now puts up a "Hound
is not ready yet" message when while it is working; so we can drop
all the magic we were doing to probe if hound is listening via
netstat and making Apache redirect to a status page.
- resync-hound.sh is run from an external cron job daily, and does
this update and restart check (see the sketch after this list).
Since it only reloads if changes are made, this should be relatively
rare anyway.
- There is a PR to monitor the config file
(https://github.com/hound-search/hound/pull/357) which would mean
the restart is unnecessary. This would be good in the near future
and we could remove the cron job.
- playbooks/roles/codesearch is unexciting and deploys the container,
certificates and an apache proxy back to localhost:6080 where hound
is listening.
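To make the update/restart flow concrete, a minimal Python sketch of
what resync-hound.sh does; the script path, the meaning of the exit
code and the supervisord program name are assumptions for illustration,
not taken from this change:

    # Regenerate the hound config from project-config and restart the
    # daemon only if the generated config actually changed.
    import subprocess

    result = subprocess.run(["/usr/local/bin/update-hound-config.sh"])
    if result.returncode != 0:
        # Assumed convention: non-zero exit means the config changed,
        # so restart houndd under supervisord to pick it up.
        subprocess.run(["supervisorctl", "restart", "hound"], check=True)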
I've combined removal of the old puppet bits here as the "-codesearch"
namespace was already being used.
Change-Id: I8c773b5ea6b87e8f7dfd8db2556626f7b2500473
These changes are squashed together to simplify applying them to config
management, and to avoid zuul and ansible running one of these without
the others. We essentially need them all in place at the same time to
accurately reflect the post-upgrade state.
We stop blocking /p/ in gerrit's apache vhost. /p/ is used for
dashboards.
We add a few java options that new gerrit sets by default.
We update the gerrit image in docker compose to 3.2.
We update zuul to use basic auth instead of digest auth when talking to
Gerrit.
Change-Id: I6ea38313544ce1ecbc4cfd914b1f33e77d0d2d03
Follow-on to Ia9579c7b3204b47d453fc51388265bf1867af20c, this also
matches the web-debug* log files
Change-Id: Ibabbfa3b01317528a75eeec17ea28168da57123a
This cuts out the bulk of the storage expense, but leaves us with the
regular logs for enhanced audit trails.
Change-Id: Ia9579c7b3204b47d453fc51388265bf1867af20c
This should help reduce the bulk of the review site backups
* launchpadlib cache has ~650,000 files which we don't need to track
* review_site/tmp has ~50,000 files
* review_site/cache is about 9gb
* review_site/index is optional to back up, but a) it's very unlikely
to be useful in a full restore situation; we'd have to re-create
them, and b) things seem to come and go under this directory during
the backup, causing it to exit with an error status.
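Purely as an illustration (the real change edits the server's backup
configuration; the tool invocation, paths and repository below are
assumptions), excluding these paths might look roughly like:

    # Hypothetical sketch: pass the paths above to borg as excludes.
    import subprocess

    excludes = [
        "/home/gerrit2/.launchpadlib",    # assumed launchpadlib cache path
        "/home/gerrit2/review_site/tmp",
        "/home/gerrit2/review_site/cache",
        "/home/gerrit2/review_site/index",
    ]

    cmd = ["borg", "create", "backup@backup-server:/backup::review-{now}",
           "/home/gerrit2"]
    for path in excludes:
        cmd += ["--exclude", path]
    subprocess.run(cmd, check=True)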
Change-Id: If7009cfcd5a3a07c07108149772cc8c1873bf277
This serverId value is used by notedb to identify the gerrit cluster
that notedb contents belong to. By default a random uuid is generated by
gerrit for this value. In order to avoid config management and gerrit
fighting over this value after we upgrade we set a value now.
This should be safe to land on 2.13 as old gerrit should ignore the
value.
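For reference, a minimal sketch of pinning the value; the config path
and the way the id is generated here are illustrative, the real change
sets it through config management:

    # gerrit.config is git-config format, so "git config -f" can set
    # gerrit.serverId to a fixed value that notedb will then reuse.
    import subprocess
    import uuid

    gerrit_config = "/home/gerrit2/review_site/etc/gerrit.config"  # assumed
    server_id = str(uuid.uuid4())  # in practice a fixed, recorded value

    subprocess.run(
        ["git", "config", "-f", gerrit_config, "gerrit.serverId", server_id],
        check=True,
    )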
Change-Id: I57c9b436a9d0d1dfe77eee907d50fc1dcda6ab12
bup is going crazy and filling the disk when making its backups. We
have moved this into the borg backup group and run some backups, so
rather than spending time debugging this, we are just going to disable
bup on the server.
Change-Id: I1daad4eb05f8222131dc84c12577dec924874466
Backups have been going well on ethercalc02, so add borg backup runs
to all backed-up servers. Port in some additional excludes for Zuul
and slightly modify the /var/ matching.
Change-Id: Ic3adfd162fa9bedd84402e3c25b5c1bebb21f3cb
As done for ORD, see Ic1e64a9f0de7bca2659404243d3a004b70888e89
Change-Id: I01a0d259abfed00745dd4cf5957ee3cfd14b9449
Depends-On: https://review.opendev.org/760493
We don't need a duplicate name; we need a mirror-int.ord.rax.opendev.org
name. I think this was a copy/paste failure. Simple fix.
Change-Id: Ibe079da6d9393d30e8a664cc67355336d27105e4
Logs show that the nameservers are being notified via ipv6 and
rejecting the request:
nsd[18851]: notify for acme.opendev.org. \
from 2001:4800:7819:104:be76:4eff:fe04:43d0 refused, no acl matches.
Modify the nsd ACL to allow the ipv6 of the master to trigger updates.
This is important for the letsencrypt process, where we need the
acme.opendev.org domain updated in a timely fashion so that TXT
authentication works.
Change-Id: I785f9636dd05e15b8ffd211845f439be7e8344a3
This should create a certificate that also covers the -int hostnames,
which are records that point to the RAX internal network, rather than
public network.
Change-Id: Ic1e64a9f0de7bca2659404243d3a004b70888e89
Depends-On: https://review.opendev.org/759970
We seem to be under a similar attack to last time. The new apache filter
in front of gitea was implemented to be used if this happened again.
Switch to it.
Change-Id: Ib9ed3029dad7fc26cca209fece547a2a94d8da4a
We previously disabled access to the local gerrit git mirrors at the
/p/ prefix, as newer gerrit uses that path for something else. The next
step is to stop replicating to that location entirely.
Another reason for this is that when we switch to notedb, this local
replication will replicate everything; if we then expose it, we'd
potentially expose content we don't want to via git (rather than the
gerrit APIs).
Change-Id: I795466af3e1608eefe506ca56828327491f73c27
To catch up -- because this work is moving slowly ... the two backup
servers are currently the vexxhost and RAX ORD hosts. The vexxhost
node is deployed with Ansible on Bionic, but the old ORD host still
needs to be upgraded and moved out of puppet. Instead of dealing with
the unmaintained bup and getting it to work on the current LTS Focal,
we are doing an initial borg deployment with plans to switch to it
globally.
This adds the backup02.ca-ymq-1.vexxhost.opendev.org to the inventory
and borg-backup-server group, so it will be deployed as a borg backup
server (note, no hosts are backing up to it, yet).
To avoid the original bup roles matching, we restrict the
backup-server group to backup01.ca-ymq-1.vexxhost.opendev.org
explicitly.
Change-Id: Id30a2ffad75236fc23ed51b2c67d0028da988de5
This was a host used to transition to running nodepool builders in
docker. That transition has been completed for nb01.opendev.org and
nb02.opendev.org and we don't need the third x86 builder.
Change-Id: I93c7fc9b24476527b451415e7c138cd17f3fdf9f
We keep port 2181 listening in zookeeper so that we can easily use the
zkshell tool to debug and navigate the database. But now that all zuul
and nodepool nodes are using tls we don't need to expose this insecure
port publicly.
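For example, the kind of local-only poking around the plaintext port is
kept for (kazoo is used here purely for illustration; the commit itself
refers to the zkshell tool):

    # Browse the zookeeper tree from the zookeeper host itself; once the
    # firewall change lands this port is no longer reachable externally.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="localhost:2181")
    zk.start()
    print(zk.get_children("/"))
    zk.stop()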
Change-Id: I2a5ab8a9aee8f2739953e859ea52e6e9fd440790
This should only land after we've launched a new nb03.opendev.org
running with the new nodepool arm64 docker image. Once that happens and
we are happy with how it is running we can safely stop managing the
existing nb03.openstack.org server with puppet.
Change-Id: I8d224f9775bd461b43a2631897babd9e351ab6ae
This server is going to be our new arm64 nodepool-builder running on the
new arm64 docker images for nodepool.
Depends-On: https://review.opendev.org/750037
Change-Id: I3b46ff901eb92c7f09b79c22441c3f80bc6f9d15
It turns out you can't use "run_once" with the "free" strategy in
Ansible. It actually warns you about this, if you're looking in the
right place.
The existing run-puppet role calls two things with "run_once:", both
delegated to localhost -- cloning the ansible-role-puppet repo (so we
can include_role: puppet) and installing the puppet modules (via
install-ansible-roles role), which are copied from bridge to the
remote side and run by ansible-role-puppet.
With remote_puppet_else.yaml we are running all the puppet hosts at
once with the "free" strategy. This means that these two tasks, both
delegated to localhost (bridge) are actually running for every host.
install-ansible-roles does a git clone, and thus we often see one of
the clones bailing out with a git locking error, because the other
host is running simultaneously.
I8585a1af2dcc294c0e61fc45d9febb044e42151d tried to stop this with
"run_once:" -- but as noted because it's running under the "free"
strategy this is silently ignored.
To get around this, split out the two copying steps into a new role
"puppet-setup". To maintain the namespace, the "run-puppet" module is
renamed to "puppet-run". Before each call of (now) "puppet-run", make
sure we run "puppet-setup" just on localhost.
Remove the run_once and delegation on "install-ansible-roles", because
this is now called from the playbook with localhost context.
Change-Id: I3b1cea5a25974f56ea9202e252af7b8420f4adc9
The zuul01.openstack.org server is not matching the Ansible backup
group, which specifies opendev.org. This means it is not backing up
to the "new" vexxhost server like everything else.
Change-Id: I07ac19f7cb5597950886c01806189e479e7a3724
The process of switching hosts to Ansible backups got a little
... backed up. I think the idea was that we would move these legacy
hosts to an all-Ansible configuration a little faster than what has
ended up happening.
In the mean time, we have done a better job of merging our environment
so puppet hosts are just a regular host that runs a puppet step rather
than separate entities.
So there is no problem running these roles on these older servers.
This will bring consistency to our backup story with everything being
managed from Ansible.
This will currently set up these hosts to back up to the only opendev
backup server in vexxhost. As a follow-on, we will add another
opendev backup host in another provider to provide dual-redundancy.
After that, we can remove the bup::site calls from these hosts and
retire the puppet-based backups.
Change-Id: Ieaea46d312056bf34992826d673356c56abfc87a
With I37dcce3a67477ad3b2c36f2fd3657af18bc25c40 we removed the
configuration management of backups on the zuul server, which was
happening via puppet. So the server continues in its last state, but
if we ever built a fresh server it would not have backups.
Add it into the Ansible backup group, and uncomment the backup-server
group to get a run and set up the Ansible-managed backups.
Change-Id: I0af6b7fedc2f8f5a7f214771918138f72d298325
The host is review-test.opendev.org, so hostvars for
review-test.openstack.org are not so much going to do anything.
It's easier if we just ssh as root from review to gerrit2
on review-test.
review-test needs to be in letsencrypt group and have a
handler.
We need to install mysql - it's on the existing review
servers but not in ansible; it's just left over from
puppet.
The db credentials are in /root/.gerrit_db.cnf
Change-Id: I90e3c9d1b398cc16fea9f7056cfb059c7140160e
I476674036748d284b9f51e30cc2ffc9650a50541 did not open port 3081 so
the proxy isn't visible. Also this group variable is a better place
to update the setting.
Change-Id: Iad0696221bb9a19852e4ce7cbe06b06ab360cf11
The OpenStack Infrastructure team has disbanded, replaced by the
OpenDev community and the OpenStack TaCT SIG. As OpenStack-specific
community infrastructure discussion now happens under TaCT's banner
and they use the openstack-discuss ML, redirect any future messages
for the openstack-infra ML there so we can close down the old list.
Change-Id: I0aea3b36668a92e47a6510880196589b94576cdf
This deploys graphite from the upstream container.
We override the statsd configuration to have it listen on ipv6.
Similarly we override the nginx config to listen on ipv6, enable ssl,
forward port 80 to 443, and block the /admin page (we don't use it).
For production we will just want to put some cinder storage in
/opt/graphite/storage on the production host and figure out how to
migrate the old stats. There is also a bit of cleanup that will follow,
because we half-converted grafana01.opendev.org -- so everything can't
be in the same group till that is gone.
Testing has been added to push some stats and ensure they are seen.
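A rough sketch of that kind of check (the hostname, metric name and
wait time are illustrative; the real test runs against the test node):

    # Push a counter over the statsd UDP protocol, then ask the graphite
    # render API whether the metric showed up.
    import socket
    import time
    import urllib.request

    host = "graphite.opendev.org"       # illustrative target
    metric = "test.functional.counter"

    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    sock.sendto(f"{metric}:1|c".encode(), (host, 8125))

    time.sleep(60)   # give statsd/carbon time to flush and store it

    url = (f"https://{host}/render"
           f"?target=stats_counts.{metric}&format=json")
    with urllib.request.urlopen(url) as resp:
        assert metric in resp.read().decode()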
Change-Id: Ie843b3d90a72564ef90805f820c8abc61a71017d
This uses the Grafana container created with
Iddfafe852166fe95b3e433420e2e2a4a6380fc64 to run the
grafana.opendev.org service.
We retain the old model of an Apache reverse-proxy; it's well tested
and understood, it's much easier than trying to map all the SSL
termination/renewal/etc. into the Grafana container and we don't have
to convince ourselves the container is safe to be directly web-facing.
Otherwise this is a fairly straightforward deployment of the
container. As before, it uses the graph configuration kept in
project-config which is loaded in with grafyaml, which is included in
the container.
One nice advantage is that it makes it quite easy to develop graphs
locally, using the container which can talk to the public graphite
instance. The documentation has been updated with a reference on how
to do this.
Change-Id: I0cc76d29b6911aecfebc71e5fdfe7cf4fcd071a4