docs: Update some of sysadmin details
Give a little more detail on the current CI/CD setup; remove Puppet cruft.

Change-Id: I684df4459cf5940d70b89e4c05103f8a8352af87
parent 642c6c2d88
commit e3fb7d2be0

@@ -6,147 +6,89 @@
System Administration
#####################

Our infrastructure is code and contributions to it are handled just
like the rest of OpenDev. This means that anyone can contribute to
the installation and long-running maintenance of systems without shell
access, and anyone who is interested can provide feedback and
collaborate on code reviews.

The configuration of every system operated by the infrastructure team
is managed by Ansible and driven by continuous integration and
deployment by Zuul.

https://opendev.org/opendev/system-config

All system configuration should be encoded in that repository so that
anyone may propose a change in the running configuration to Gerrit.

Guide to CI and CD
==================

All development work is based around Zuul jobs and a continuous
integration and deployment workflow.

The starting point for all services is generally the playbooks and
roles kept in :git_file:`playbooks`. Most playbooks are named
``service-<name>.yaml`` and indicate which production areas they
drive.
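
For example, a service playbook is a normal Ansible playbook that
applies roles to a group of hosts. The sketch below is only
illustrative -- the group and role names are placeholders, not the
contents of any actual playbook in :git_file:`playbooks`::

  # Hypothetical service-example.yaml -- names are placeholders.
  - hosts: example
    roles:
      - iptables         # assumed shared base role
      - example-service  # assumed role that installs and configures the service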

These playbooks run on groups of hosts which are defined in
:git_file:`inventory/service/groups`. The production hosts are kept
in an inventory at :git_file:`inventory/base/hosts.yaml`. During
testing, these same playbooks are run against the test nodes. Note
that the testing hosts are given names that match the group
configuration in the jobs defined in
:git_file:`zuul.d/system-config-run.yaml`.
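
For illustration, a group gathers hosts by name so the same playbook
can target production and test nodes alike. The mapping below is a
hypothetical sketch, not the actual schema or contents of the groups
file::

  # Hypothetical group definitions -- illustrative only.
  groups:
    etherpad:
      - etherpad[0-9]*.opendev.org
    zuul-executor:
      - ze[0-9]*.opendev.org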

Deployment is run through a bastion host, ``bridge.openstack.org``.
After changes are approved, Zuul runs Ansible on this host, which then
connects to the production hosts and runs the orchestration using the
latest committed code. The bridge is a special host because it holds
production secrets, such as passwords or API keys, and unredacted
logs. As many logs as possible are provided in the public Zuul job
results, but they need to be audited to ensure they do not leak
secrets and thus in some cases may not be published.
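
As a rough picture of how this is wired together, a deploy job runs
one of the service playbooks from the bastion once a change merges.
The job below is a hedged sketch -- the job and parent names are
assumptions for illustration, not the real definitions in the
system-config Zuul configuration::

  # Illustrative only; see the zuul.d/ directory for the real jobs.
  - job:
      name: infra-prod-service-example  # hypothetical job name
      parent: infra-prod-playbook       # assumed base job that runs Ansible on bridge
      description: Run service-example.yaml against the production hosts.
      vars:
        playbook_name: service-example.yaml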

For CI testing, each job creates a "fake" bridge, along with the
servers required for orchestration. Thus CI testing is performed by a
"nested" Ansible -- Zuul initially connects to the testing bridge node
and deploys it, and then this node runs its own Ansible that tests the
orchestration of the other testing nodes, simulating the production
environment. This is driven by playbooks kept in
:git_file:`playbooks/zuul`. Here you will also find testing
definitions of host variables that are kept secret for production
hosts.
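
Such a test job pairs a throwaway bridge node with the nodes the
service needs. The snippet below sketches the general shape only; the
job name, labels and variables are placeholders rather than the
definitions actually found in :git_file:`zuul.d/system-config-run.yaml`::

  # Hypothetical shape of a system-config-run style job.
  - job:
      name: system-config-run-example
      parent: system-config-run          # assumed shared parent job
      nodeset:
        nodes:
          - name: bridge99.opendev.org   # the "fake" bridge for this job
            label: ubuntu-jammy
          - name: example01.opendev.org  # node the service playbook targets
            label: ubuntu-jammy
      vars:
        run_playbooks:
          - playbooks/service-example.yaml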

After the test environment is orchestrated, the
`testinfra <https://testinfra.readthedocs.io/en/latest/>`__ tests from
:git_file:`testinfra` are run. This validates the complete
orchestration testing environment; checks such as user creation,
container readiness and service wellness are all performed.

.. _adding_new_server:

Adding a New Server
===================

Creating a new server for your service requires discussion with the
OpenDev administrators to ensure donor resources are being used
effectively. A sketch of a hypothetical inventory entry follows the
list below.

* Hosts should only be configured by Ansible. Nonetheless, in some
  cases SSH access can be granted. Add your public key to
  :git_file:`inventory/base/group_vars/all.yaml` and include a stanza
  like this in your server ``host_vars``::

    extra_users:
      - your_user_name

* Add an RST file with documentation about the server and services in
  :git_file:`doc/source` and add it to the index in that directory.
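
As referenced above, a new host also ends up in the production
inventory. The entry below is purely a hypothetical illustration of
what that might look like; the hostname and address are made up and
the real file may carry additional fields::

  # Hypothetical entry in inventory/base/hosts.yaml -- illustrative only.
  all:
    hosts:
      fooserver01.opendev.org:
        ansible_host: 203.0.113.10   # public address of the new server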

SSH Access
==========

For any of the systems managed by the OpenDev Infrastructure team, the
following practices must be observed for SSH access:

* SSH access is only permitted with SSH public/private key
@@ -171,14 +113,13 @@ following practices must be observed for SSH access:
  is received should be used, and the SSH keys should be added with
  the confirmation constraint ('ssh-add -c').
* The number of SSH keys that are configured to permit access to
  OpenDev machines should be kept to a minimum.
* OpenDev Infrastructure machines must use Ansible to centrally manage
  and configure user accounts, and the SSH authorized_keys files from
  the opendev/system-config repository.
* SSH keys should be periodically rotated (at least once per year).
  During rotation, a new key can be added to the configuration for a
  time, and then the old one removed.

GitHub Access
=============

To ensure that code review and testing are not bypassed in the public
Git repositories, only Gerrit will be permitted to commit code to
OpenDev repositories. Because GitHub always allows project
administrators to commit code, accounts that have access to manage the
GitHub projects necessarily will have commit access to the
repositories.

@@ -197,7 +138,7 @@ would prefer to keep a separate account, it can be added to the
organisation after discussion and noting the caveats around elevated
access. The account must have 2FA enabled.

In either case, the administrator accounts should not be used to check
out or commit code for any project.

Note that it is unlikely to be useful to use an account also used for
@@ -207,26 +148,16 @@ for all projects.

Root only information
#####################

Below is information relevant to members of the core team with root
access.

Accessing Clouds
================

As an unprivileged user who is a member of the `sudo` group on bridge,
you can inspect any of the clouds with::

  sudo openstack --os-cloud <cloud name> --os-region-name <region name>
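
The cloud and region names correspond to entries in the
``clouds.yaml`` configuration kept on bridge. A hypothetical entry
(the names, URL and credentials here are placeholders, not production
configuration) looks roughly like::

  # Illustrative clouds.yaml entry -- real credentials live only on bridge.
  clouds:
    examplecloud:
      region_name: EXAMPLE-REGION-1
      auth:
        auth_url: https://identity.cloud.example.org/v3
        username: opendev-ci
        password: not-a-real-password
        project_name: opendev-ci
        user_domain_name: Default
        project_domain_name: Default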

Backups
=======
@@ -477,9 +408,8 @@ from misspelling the name of the file and is recommended.
Examples
--------

To disable an OpenDev instance called `foo.opendev.org` temporarily,
ensure the following is in `/etc/ansible/hosts/emergency.yaml`

::

@@ -489,6 +419,21 @@
  disabled:
    - foo.opendev.org # 2020-05-23 bob is testing change 654321

Ad-hoc Ansible runs
===================

If you need to run Ansible manually against a host, you should

* disable automated Ansible runs following the section above
* ``su`` to the ``zuul`` user and run the playbook with something like
  ``ansible-playbook -vv
  src/opendev.org/opendev/system-config/playbooks/service-<name>.yaml``
* restore automated Ansible runs.
* You can also use the ``--limit`` flag to restrict which hosts run
  when there are many in a group. However, be aware that some
  roles/playbooks like ``letsencrypt`` and ``backup`` run across
  multiple hosts (deploying DNS records or authorization keys), so
  incorrect ``--limit`` flags could cause further failures.

.. _cinder: