This introduces two new roles for managing the backup-server and hosts
that we wish to back up.
Firstly, the "backup" role runs on hosts we wish to back up. This
generates and configures a separate ssh key for running bup and
installs the appropriate cron job to run the backup daily.
The "backup-server" role runs on the backup server (or, indeed,
servers). It creates users for each backup host, accepts the remote
keys mentioned above and initialises bup. It is then ready to receive
backups from the remote hosts.
This eliminates a fairly long-standing requirement for manual setup of
the backup server users and keys; this section is removed from the
documentation.
testinfra coverage is added.
Change-Id: I9bf74df351e056791ed817180436617048224d2c
The launch script is referring to the wrong path for the emergency
inventory. Also correct the references in the sysadmin guide and
update the example for using it.
Change-Id: I80bdbd440ec451bcd6fb1a3eb552ffda32407c44
Reorder some of the commands used to set up and configure the bup
user on backup servers so the process is more straightforward and
requires fewer mental context switches.
Change-Id: I73cb80a04b8b5a74bb0857b4c8b6fb09030d6306
In sphinx, we have a :cgit_file: directive that makes links to files.
Thing is - we're not using cgit anymore. So just rename it to git_file.
Change-Id: I80aca5fb3cc84281e29843944fea33e6f4d9fe6f
The zuul and zuulv3 docs need to be merged, but that seemed like
too much for this. Also, the 3rd party CI doc is out of date, but
this patch only removes the sections that linked to docs or files
that don't exist anymore.
Change-Id: Ie5497edd762d2146165608f3227b0bac88a913df
This change describes the shared github administrator account.
This is inspired by I0c61f192a6b5164af7babde5c99e5ee2b77a652c. As
described there, this allows for admins to have private accounts in
the organisation, but requires that 2FA be turned on. If people wish
to keep this as a single account which they do "real" work with
(commits, etc) that is probably OK, but add a note that you'll end up
with a lot of mostly irrelevant stuff in your feeds.
Change-Id: Ic408250571133796b4b4639715fe8d01f91898f2
Add some details about how we integrate a new cloud into the
ecosystem. I feel like this is an appropriate level of detail given
we're dealing with clueful admins who just need a rough guide on what
to do and can fill in the gaps.
Fix up the formatting a bit while we're here.
Change-Id: Iba3440e67ab798d5018b9dffb835601bb5c0c6c7
Fix indents of some pages; the wrong indent led to gray bars beside
them.
Also, fix a typo and add some markup.
Change-Id: I6e7126ef7b782b376efcc7c6d69c6de9a504ddb5
We have a bunch of this handled now in ansible, so remove the old stuff.
Remove puppetmaster group management files. It's confusing for there to
be two files. Remove the old one.
Remove mqtt config. This isn't really a thing currently, and we're
eyeing running things from zuul anyway, so no need to port to ansible.
Change-Id: I8b64d21eadcc4a08bd5e5440fc5f756ae5bcd46b
Now that we've got base server stuff rewritten in ansible, remove the
old puppet versions.
Depends-On: https://review.openstack.org/588326
Change-Id: I5c82fe6fd25b9ddaa77747db377ffa7e8bf23c7b
This modernises the openstack-infra documentation by switching to
openstackdocstheme. Update dependencies as required.
To remove non-relevant stuff from conf.py, I have just taken the demo
file from openstackdocstheme and lightly modified it.
It seems later sphinx has included its own ":file:" role which now
conflicts. Change it to ":cgit_file:" in our documentation. Remove
the custom header template which no longer applies. Add the
post-2.0-pbr sphinx-based warning-as-error, which fixes the original
problem I actually noticed: errors could slip through the gate
tests :)
Change-Id: Ic7bec57b971bb4c75fc839e7269d1f69a576b85c
With the switch to Zuul v3, we need to resolve some configuration
catch-22s where project names and related in-repository job
definitions can't happen without a complex multi-stage removal and
reintroduction process to get it through speculative testing
successfully. For now, just punt and use monolithic changes
bypassing CI in code review. As an upside, the Ansible automation
of this process coupled with Zuul v3's increased resilience to
on-the-fly configuration changes means we can skip stopping/starting
it now and significantly simplify the process.
Since we're here, correct the section heading level for
"Force-Merging a Change" in the sysadmin document.
Change-Id: I335c23abd0b5706f43bbea2dd8cfffa4280dd5db
Migrate backups to new backup01.ord.rax.ci.openstack.org
We decided to start fresh backups on the new server, so this is ready
to go. I have performed an initial backup on each server so it has
accepted the host key of the new server and been tested (I also fixed
up review-dev.o.o, which was rebuilt but keys not updated ... todo:
add this to puppet, but since it changes so infrequently not high
priority).
Change-Id: I0872f9fcf4a334d32f632b3cb04801deefab4fd1
We usually want to do these steps to avoid volume outages when
rackspace is doing updates.
Change-Id: Ie5de97484dddb9136c240baf46724646e39df67e
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This adds the now required bup init command to the server to be backed
up. Also remove the now-gone HPCloud backup server and fix the quotes
around the command for catting the public ssh key.
Change-Id: I607a7c079b16d7f1e94d6b0888cd6e302a04f68f
As discussed during the "Launch Node, Ansible and Puppet" summit
session in Austin, we're making things unnecessarily hard on
ourselves by insisting on having multiple servers in our inventory
with the same name. In order to make server addition and replacement
automation simpler, start using an ordinal suffix on server short
names to differentiate them (we can still easily rely on DNS for
their non-numbered convenience names).
Change-Id: I040a5c3b5e1abc50c3e4676bcab0bf4eaa550f4b
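As a rough illustration of the ordinal-suffix scheme, the sketch below
picks the next numbered name for a server short name. The function
name, the two-digit zero padding and the policy of never reusing a
number are assumptions made for this example; the change itself does
not specify them.

```python
import re


def next_ordinal_name(short_name, existing_names, width=2):
    """Return short_name plus the next unused ordinal suffix.

    Hypothetical helper: numbering starts at 1 and always goes one
    past the highest suffix already in use, so names are not reused.
    """
    pattern = re.compile(r"^" + re.escape(short_name) + r"(\d+)$")
    highest = 0
    for name in existing_names:
        # Only look at the first DNS label so FQDNs can be passed in.
        label = name.split(".", 1)[0]
        match = pattern.match(label)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"{short_name}{highest + 1:0{width}d}"
```

The non-numbered convenience name would still be handled in DNS; this
only covers choosing the inventory name.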
Sometimes we want to extend a logical volume to the entire size of the
volume group. The command to do this is quite strange and I am tired of
googling it. It is so documented.
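For reference, one common way to grow a logical volume into all the
remaining free space of its volume group is lvextend with
"-l +100%FREE". The sketch below only builds that command line and does
not run it. The helper name and the volume group and logical volume
names are hypothetical, and this is not necessarily the exact command
the documentation records. The filesystem on the volume still has to be
resized separately afterwards.

```python
import shlex


def lvextend_to_full_vg(volume_group, logical_volume):
    """Build the argv that extends an LV over all free extents in its VG.

    Hypothetical helper for illustration; it only constructs the
    command and leaves running it (as root) to the caller.
    """
    device = f"/dev/{volume_group}/{logical_volume}"
    return ["lvextend", "-l", "+100%FREE", device]


if __name__ == "__main__":
    # "main" and "backups" are made-up names for this example.
    print(shlex.join(lvextend_to_full_vg("main", "backups")))
```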
Change-Id: I600ceb41c57e27eaaf68a1643be848cd331130a5
We already have a dynamic system for static group management.
Use it for the disabled group so that the rules for managing the members
are not different.
Also, update the disabled list to match reality.
Also, update docs because hosts are no longer groups.
The upstream OpenStack Inventory in Ansible was fixed to no longer
return each cloud host as its own group unless there are duplicates for
the host in question. This means it's no longer the right thing to do
to put hosts into disabled:children - disabled is just fine.
Change-Id: I95c83ed64801db15ad99a14547895f3520356f99
At long last, the day of reckoning is here. Run puppet apply and then
copy the log files back and post them to puppetdb.
Change-Id: I919fea64df0fbb8681e91ac9425b4c43760bb3dd
The way disabling works with puppet and openstack inventory in ansible
can be confusing at first. Some examples hopefully clarify the
situation.
Change-Id: Ib85feebce309896c6f3d139318dd5d204d9cb8ec
With the puppetmaster not there anymore, we should consume inventory
from OpenStack rather than from puppet.
It turns out that because of the way static and dynamic inventories get
merged, the static file needs to stand alone. So, if you need to
disable a dynamic host from OpenStack (pretty much all of our hosts) you
need to not only add it to dynamic:children, you need to add an empty
group into the static file too, otherwise you'll get an error like:
root@puppetmaster:~# ansible -i newinv '!disabled' --list-hosts
ERROR: newinv/static:4: child group is not defined: (jenkins-dev.openstack.org)
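To make the failure mode concrete, here is a minimal sketch that scans
an INI-style static inventory for child groups that are listed under a
":children" section but never defined as a group of their own. It is a
simplified model of the check, not Ansible's actual parser, and the
function name and the "disabled" parent group in the usage example are
assumptions for illustration.

```python
def undefined_child_groups(inventory_text):
    """Return child group names that have no section of their own."""
    defined = set()
    children = []
    section = None
    for raw in inventory_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            # "[name]", "[name:children]" and "[name:vars]" all
            # count as defining the group "name".
            defined.add(section.split(":", 1)[0])
            continue
        if section is not None and section.endswith(":children"):
            children.append(line.split()[0])
    return [name for name in children if name not in defined]


if __name__ == "__main__":
    broken = "[disabled:children]\njenkins-dev.openstack.org\n"
    # Adding an empty group for the host makes the file stand alone.
    fixed = "[jenkins-dev.openstack.org]\n\n" + broken
    print(undefined_child_groups(broken))
    print(undefined_child_groups(fixed))
```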
Change-Id: Ic6809ed0b7014d7aebd414bf3a342e3a37eb10b6
We're not ready to move from puppet inventory to openstack inventory
just yet, so don't actually swap the dynamic inventory plugin. But, add
it to the system so that running manual tests of all of the pieces is
possible.
Add the currently administratively disabled hosts to the disabled group
so that we can verify this works.
Change-Id: I73931332b2917b71a008f9213365f7594f69c41e
Update the sample puppet code block for etherpad
on the System Administration wiki, as it referred
to old etherpad puppet code.
Change-Id: Ibd2d2ee1febf909d5851b829a4a9c5f2d620a20f
We're now putting bup backups into /opt/backups on a cinder
volume. Update the documentation to move home directories for
servers we back up there.
Change-Id: I81e68dfb3fd9fd92dfb41ea3415a44db37f6c3af
The current sysadmin.html assumes there is a volume group (vg) present
when adding a new cinder volume. On a new server a vg won't exist, so add
instructions for how to create it.
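As a hedged sketch of what creating the missing volume group involves,
the helper below lists the usual LVM steps: initialise the new block
device as a physical volume, create the volume group on it, then create
a logical volume using all of its free space. The helper name, the
device path and the group and volume names are hypothetical, and the
commands are only built here, not run.

```python
def new_volume_group_commands(device, volume_group, logical_volume):
    """Build the LVM commands for a server with no existing VG.

    Hypothetical helper for illustration; every returned argv would
    need to be run as root, in order.
    """
    return [
        ["pvcreate", device],
        ["vgcreate", volume_group, device],
        ["lvcreate", "-l", "100%FREE", "-n", logical_volume, volume_group],
    ]


if __name__ == "__main__":
    # "/dev/xvdb", "main" and "backups" are made-up example values.
    for argv in new_volume_group_commands("/dev/xvdb", "main", "backups"):
        print(" ".join(argv))
```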
Change-Id: I3171819fb5aea8a5edfab28f29ba35f9d0f5d461
We are renaming the openstack-infra/config repo to
openstack-infra/system-config. This patch edits the docs files.
Change-Id: Ic594f1b5438a400fb6c1071c3045adb7a0b7e441