Update the backup instructions for some recent changes. Make a note
of the streaming backup method, discuss some caveats with append-only
mode, and discuss the pruning scripts and when to run them
(cf. I9559bb8aeeef06b95fb9e172a2c5bfb5be5b480e,
I250d84c4a9f707e63fef6f70cfdcc1fb7807d3a7).
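As a reminder, a server-side prune looks something like the following
(repository path and retention values are placeholders; see the
pruning scripts for the real ones):

  borg prune --verbose --keep-daily 7 --keep-weekly 4 \
      --keep-monthly 6 /opt/backups/borg-<host>/backup

Note the append-only caveat: while the repository is served
append-only, space freed by a prune is not actually reclaimed.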
Change-Id: Idb04ebfa5666cd3c20bc0132683d187e705da3f1
Add the FUSE dependencies for our hosts backed up with borg, along
with a small script to make mounting the backups easier. This is the
best way to recover something quickly in what is sure to be a
stressful situation.
Documentation and testing are updated.
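For illustration, the recovery workflow with the FUSE support is
roughly (paths here are placeholders):

  borg mount /opt/backups/borg-<host>/backup /mnt/backup
  # browse /mnt/backup and copy out what you need, then
  borg umount /mnt/backup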
Change-Id: I1f409b2df952281deedff2ff8f09e3132a2aff08
Our Gerrit admins follow this model of access management now, in
order to shield the Administrators permission from risks associated
with external identity providers.
Change-Id: I3070c28c26548d364da38d366bfa2ac8b2fb4668
This adds roles to implement backup with borg [1].
Our current tool "bup" has no Python 3 support and is not packaged for
Ubuntu Focal. This means it is effectively end-of-life. borg fits
our model of servers backing themselves up to a central location, is
well documented and seems well supported. It also has the clarkb seal
of approval :)
As mentioned, borg works in the same manner as bup by doing an
efficient backup over ssh to a remote server. The core of these
roles is the same as the bup-based ones in terms of creating a
separate user for each host and deploying keys and ssh config.
This chooses to install borg in a virtualenv under /opt. This was
chosen for a number of reasons: firstly, reading borg's history there
have been incompatible updates (although they provide a tool to update
repository formats), so it seems important that we both pin the version
we are using and keep clients and server in sync. Since we have a
heterogeneous collection of distributions, we don't want to rely on
the packaged tools, which may differ. I don't feel like this is a
great application for a container; we actually don't want it that
isolated from the base system, because its goal is to read the system
and copy it offsite with as little chance of things going wrong as
possible.
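For illustration, the install amounts to something like the following
(the pinned version here is a placeholder; the roles control the real
one):

  python3 -m venv /opt/borg
  /opt/borg/bin/pip install 'borgbackup==1.1.13'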
Borg has a lot of support for encrypting the data at rest in various
ways. However, that introduces the possibility we could lose both the
key and the backup data. Really the only thing stopping this is key
management, and if we want to go down this path we can do it as a
follow-on.
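For reference, this means repositories are initialised along the
lines of (path hypothetical):

  borg init --encryption=none /opt/backups/borg-<host>/backup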
The remote server is configured via ssh command rules to run borg in
append-only mode. This means a misbehaving client can't delete its
old backups. In theory we can prune backups on the server side --
something we could not do with bup. The documentation has been
updated but is vague on this part; I think we should get some hosts in
operation, see how the de-duplication is working out and then decide
how we want to manage things long term.
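A sketch of the server-side forced command in authorized_keys (key
and paths shortened/hypothetical):

  command="borg serve --append-only --restrict-to-path /opt/backups/borg-<host>",restrict ssh-ed25519 AAAA... borg-<host>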
Testing is added; a Focal and a Bionic host both run a full backup of
themselves to the backup server. Pretty cool -- the logs are in
/var/log/borg-backup-<host>.log.
No hosts are currently in the borg groups, so this can be applied
without affecting production. I'd suggest the next steps are to bring
up a borg-based backup server and put a few hosts into this. After
running for a while, we can add all hosts, and then deprecate the
current bup-based backup server in vexxhost and replace that with a
borg-based one, giving us dual offsite backups.
[1] https://borgbackup.readthedocs.io/en/stable/
Change-Id: I2a125f2fac11d8e3a3279eb7fa7adb33a3acaa4e
We've got a section on using the emergency file and disabled ansible
group. Add info about the special DISABLE-ANSIBLE file there to help
make that info easier to find.
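For the record, disabling looks something like this (the path is
assumed from the current bridge setup):

  touch /home/zuul/DISABLE-ANSIBLE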
Change-Id: I2e750b9b87ca7a4f800d3ac161a195d49543a7da
Make inventory/service for service-specific things, including the
groups.yaml group definitions, and inventory/base for hostvars
related to the base system, including the list of hosts.
Move the existing host_vars into inventory/service, since most of
them are likely service-specific. Move group_vars/all.yaml into
base/group_vars, as almost all of it is related to base things,
with the exception of the gerrit public key.
A followup patch will move host-specific values into equivalent
files in inventory/base.
This should let us override hostvars in gate jobs. It should also
allow us to do better file matchers - and to be able to organize
our playbooks more if we want to.
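The resulting layout looks roughly like:

  inventory/
    base/
      hosts.yaml
      group_vars/
        all.yaml
    service/
      groups.yaml
      host_vars/
      group_vars/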
Depends-On: https://review.opendev.org/731583
Change-Id: Iddf57b5be47c2e9de16b83a1bc83bee25db995cf
We've got some old, out-of-date docs in some places. This isn't even
a full reworking, but it at least tries to remove some of the more
egregiously wrong things.
Change-Id: I9033acb9572e1ce1b3e4426564b92706a4385dcb
We had the clouds split from back when we used the openstack
dynamic inventory plugin. We don't use that anymore, so we don't
need these to be split. Any other usage we have directly references
a cloud.
Change-Id: I5d95bf910fb8e2cbca64f92c6ad4acd3aaeed1a3
With the move from OpenStack governance to our own OpenDev team, we
should also move to use the #opendev IRC channel in preference to
the #openstack-infra channel which will remain in use for OpenStack
specific discussions.
Update the references in our docs accordingly.
Change-Id: I448704f5d2664fd233a69a2ad12578ca24d9878a
This introduces two new roles for managing the backup-server and hosts
that we wish to back up.
Firstly the "backup" role runs on hosts we wish to backup. This
generates and configures a separate ssh key for running bup and
installs the appropriate cron job to run the backup daily.
The "backup-server" job runs on the backup server (or, indeed
servers). It creates users for each backup host, accepts the remote
keys mentioned above and initalises bup. It is then ready to receive
backups from the remote hosts.
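For illustration, the moving pieces are roughly as follows (host and
server names, plus the backed-up paths, are placeholders):

  # on the host being backed up, from cron:
  bup index /etc /home /root
  bup save -r bup-<host>@backup01.example.com: -n root /etc /home /root

  # on the backup server, once per host:
  useradd -m bup-<host>
  sudo -u bup-<host> bup init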
This eliminates a fairly long-standing requirement for manual setup of
the backup server users and keys; this section is removed from the
documentation.
testinfra coverage is added.
Change-Id: I9bf74df351e056791ed817180436617048224d2c
The launch script is referring to the wrong path for the emergency
inventory. Also correct the references in the sysadmin guide and
update the example for using it.
Change-Id: I80bdbd440ec451bcd6fb1a3eb552ffda32407c44
Reorder some of the commands used to set up and configure the bup
user on backup servers so the process is more straightforward and
requires fewer mental context switches.
Change-Id: I73cb80a04b8b5a74bb0857b4c8b6fb09030d6306
In sphinx, we have a :cgit_file: role that makes links to files.
Thing is - we're not using cgit anymore, so just rename it to
:git_file:.
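Usage is otherwise unchanged, e.g. (file path illustrative):

  :git_file:`doc/source/sysadmin.rst`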
Change-Id: I80aca5fb3cc84281e29843944fea33e6f4d9fe6f
The zuul and zuulv3 docs need to be merged, but that seemed like
too much for this change. Also, the 3rd-party CI doc is out of date,
but this patch only removes sections that linked to docs or files
that don't exist anymore.
Change-Id: Ie5497edd762d2146165608f3227b0bac88a913df
This change describes the shared github administrator account.
This is inspired by I0c61f192a6b5164af7babde5c99e5ee2b77a652c. As
described there, this allows for admins to have private accounts in
the organisation, but requires that 2FA be turned on. If people wish
to keep this as a single account with which they do "real" work
(commits, etc.) that is probably OK, but a note is added that you'll
end up with a lot of mostly irrelevant stuff in your feeds.
Change-Id: Ic408250571133796b4b4639715fe8d01f91898f2
Add some details about how we integrate a new cloud into the
ecosystem. I feel like this is an appropriate level of detail given
we're dealing with clueful admins who just need a rough guide on what
to do and can fill in the gaps.
Fix up the formatting a bit while we're here.
Change-Id: Iba3440e67ab798d5018b9dffb835601bb5c0c6c7
Fix the indents of some pages; the wrong indent led to gray bars
beside them.
Also, fix a typo and add some markup.
Change-Id: I6e7126ef7b782b376efcc7c6d69c6de9a504ddb5
We have a bunch of this handled now in ansible, so remove the old stuff.
Remove puppetmaster group management files. It's confusing for there to
be two files. Remove the old one.
Remove mqtt config. This isn't really a thing currently, and we're
eyeing running things from zuul anyway, so no need to port to ansible.
Change-Id: I8b64d21eadcc4a08bd5e5440fc5f756ae5bcd46b
Now that we've got base server stuff rewritten in ansible, remove the
old puppet versions.
Depends-On: https://review.openstack.org/588326
Change-Id: I5c82fe6fd25b9ddaa77747db377ffa7e8bf23c7b
This modernises the openstack-infra documentation by switching to
openstackdocstheme. Update dependencies as required.
To remove non-relevant stuff from conf.py, I have just taken the demo
file from openstackdocstheme and lightly modified it.
It seems later sphinx has included its own ":file:" role, which now
conflicts. Change ours to ":cgit_file:" in our documentation. Remove
the custom header template which no longer applies. Add the
post-2.0-pbr sphinx-based warning-as-error, which fixes the original
problem I actually noticed: errors could slip through the
gate tests :)
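For illustration, the relevant configuration ends up looking
something like this trimmed sketch:

  # conf.py
  extensions = ['openstackdocstheme']
  html_theme = 'openstackdocs'

  # setup.cfg
  [build_sphinx]
  warning-is-error = 1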
Change-Id: Ic7bec57b971bb4c75fc839e7269d1f69a576b85c
With the switch to Zuul v3, we need to resolve some configuration
catch-22s where project renames and related in-repository job
definitions can't happen without a complex multi-stage removal and
reintroduction process to get them through speculative testing
successfully. For now, just punt and use monolithic changes
bypassing CI in code review. As an up side, the Ansible automation
of this process coupled with Zuul v3's increased resilience to
on-the-fly configuration changes means we can skip stopping/starting
it now and significantly simplify the process.
Since we're here, correct the section heading level for
"Force-Merging a Change" in the sysadmin document.
Change-Id: I335c23abd0b5706f43bbea2dd8cfffa4280dd5db
Migrate backups to new backup01.ord.rax.ci.openstack.org
We decided to start fresh backups on the new server, so this is ready
to go. I have performed an initial backup on each server, so each has
accepted the host key of the new server and been tested (I also fixed
up review-dev.o.o, which was rebuilt but its keys not updated ... todo:
add this to puppet, but since it changes so infrequently it's not high
priority).
Change-Id: I0872f9fcf4a334d32f632b3cb04801deefab4fd1
We usually want to do these steps to avoid volume outages when
rackspace is doing updates.
Change-Id: Ie5de97484dddb9136c240baf46724646e39df67e
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This adds the now-required bup init command to the server to be backed
up. Also remove the now-gone HPCloud backup server and fix the quotes
around the command for catting the public ssh key.
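For reference, the added step on the host to be backed up is along
the lines of (server name is a placeholder for the current backup
server):

  bup init -r bup-$(hostname)@<backup-server>: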
Change-Id: I607a7c079b16d7f1e94d6b0888cd6e302a04f68f
As discussed during the "Launch Node, Ansible and Puppet" summit
session in Austin, we're making things unnecessarily hard on
ourselves by insisting on having multiple servers in our inventory
with the same name. In order to make server addition and replacement
automation simpler, start using an ordinal suffix on server short
names to differentiate them (we can still easily rely on DNS for
their non-numbered convenience names).
Change-Id: I040a5c3b5e1abc50c3e4676bcab0bf4eaa550f4b
Sometimes we want to extend a logical volume to the entire size of the
volume group. The command to do this is quite strange and I am tired
of googling it, so it is now documented.
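For reference, the strange command in question (device names
illustrative; the filesystem resize step assumes ext4):

  lvextend -l +100%FREE /dev/main/main
  resize2fs /dev/main/main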
Change-Id: I600ceb41c57e27eaaf68a1643be848cd331130a5
We already have a dynamic system for managing static groups.
Use it for the disabled group so that the rules for managing the
members are not different.
Also, update the disabled list to match reality.
Also, update the docs because hosts are no longer groups.
The upstream OpenStack Inventory in Ansible was fixed to no longer
return each cloud host as its own group unless there are duplicates for
the host in question. This means it's no longer right to put hosts
into disabled:children; plain disabled is just fine.
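In plain YAML-inventory terms, the difference is roughly (hostname
illustrative):

  # before: each host was its own group
  disabled:
    children:
      somehost.openstack.org:

  # after: hosts go straight into the group
  disabled:
    hosts:
      somehost.openstack.org: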
Change-Id: I95c83ed64801db15ad99a14547895f3520356f99