docs: reorganise around an open infrastructure overview
This introduces an "Open Infrastructure" page designed to give a moderately experienced developer, with some understanding of Zuul, Ansible and basic Linux admin skills, an entry point for navigating the system-config and related repositories. It is designed to reinforce the idea of open infrastructure, and to explain how development, testing and production come together at a level high enough to be understood, but with links to or descriptions of specific places in the code to get started. It moves a little of what was in the sysadmin page into this, and leaves that page as more low-level descriptions of various tasks.
Change-Id: I60a9299df455b98ad549ac0075a59d381722bc06
This commit is contained in:
parent
3f6cd427d7
commit
4c86706e5e
@ -183,9 +183,8 @@ After the cloud is configured, it can be added as a resource for
|
||||
nodepool to use for testing nodes.
|
||||
|
||||
Firstly, an ``infra-root`` member will need to make the region-local
|
||||
mirror server, configure any required storage for it and setup DNS
|
||||
(see :ref:`adding_new_server`). With this active, the cloud is ready
|
||||
to start running testing nodes.
|
||||
mirror server, configure any required storage for it and set up DNS.
|
||||
With this active, the cloud is ready to start running testing nodes.
|
||||
|
||||
At this point, the cloud needs to be added to nodepool configuration
|
||||
in `project-config
|
||||
|
@ -34,6 +34,7 @@ Contents:
|
||||
:maxdepth: 2
|
||||
|
||||
project
|
||||
open-infrastructure
|
||||
test-infra-requirements
|
||||
sysadmin
|
||||
systems
|
||||
|
301
doc/source/open-infrastructure.rst
Normal file
@ -0,0 +1,301 @@
|
||||
:title: Open Infrastructure Technical Overview
|
||||
|
||||
.. _opendev-infra-overview:
|
||||
|
||||
Open Infrastructure Technical Overview
|
||||
######################################
|
||||
|
||||
The OpenDev system administration team strives to run the services
|
||||
behind the OpenDev Collaboratory as an open source project; we term
|
||||
this *open infrastructure*.
|
||||
|
||||
Our infrastructure is code and contributions to it are handled just
|
||||
like the rest of OpenDev. This means that anyone can contribute to
|
||||
the installation and long-running maintenance of systems without shell
|
||||
access, and anyone who is interested can provide feedback and
|
||||
collaborate on code reviews. There are no permissions or special
|
||||
privileges required to contribute to the OpenDev infrastructure
|
||||
project.
|
||||
|
||||
Below is a short guide to the major pieces of the project. Some
|
||||
knowledge of Zuul job configuration, Ansible, interaction with the
|
||||
Gerrit code-review system and general Linux administration is
|
||||
assumed; however, expertise is not required.
|
||||
|
||||
Operating environment
|
||||
---------------------
|
||||
|
||||
The OpenDev production systems run in resources (compute, network,
|
||||
storage) provided by donations from companies who support the project.
|
||||
|
||||
Our standard production system is based on the latest Ubuntu LTS
|
||||
release.
|
||||
|
||||
Production systems are deployed by Ansible. Most production
|
||||
applications run from containers; some are custom built and others we
|
||||
use unmodified from upstream sources.
|
||||
|
||||
Zuul handles the testing and deployment of all changes. Current
|
||||
trends would refer to this as a *gitops* model -- all production
|
||||
changes are ultimately driven by a change proposed to the code-review
|
||||
system. This means we do not have bespoke production systems and any
|
||||
modifications we make are reviewed by peers and logged with change
|
||||
history.
|
||||
|
||||
We have a *bastion host*, or *bridge*, which is a static host with
|
||||
permissions to deploy to the production systems. Zuul will run
|
||||
Ansible on the production systems via this host to deploy new changes
|
||||
into production.
|
||||
|
||||
Getting started - CI
|
||||
--------------------
|
||||
|
||||
The configuration of every system operated by the OpenDev sysadmins is
|
||||
managed by Ansible and driven by continuous integration and deployment
|
||||
by Zuul. This is almost exclusively driven by code kept in the
|
||||
``system-config`` repository, which can be browsed at:
|
||||
|
||||
https://opendev.org/opendev/system-config
|
||||
|
||||
All system configuration should be encoded in that repository so that
|
||||
anyone may propose a change in the running configuration to Gerrit.
|
||||
|
||||
Any change to the OpenDev infrastructure system is first proposed as a
|
||||
review to this repository at ``review.opendev.org``. The current open
|
||||
reviews can be seen at
|
||||
|
||||
https://review.opendev.org/q/project:opendev/system-config
|
||||
|
||||
Zuul will first run CI on all incoming changes. Each service
|
||||
generally has its own CI job that runs when relevant files
|
||||
(configuration, Ansible roles, playbooks, etc.) are updated. These
|
||||
are generally called ``system-config-run-<service>``; Zuul will post a
|
||||
comment when the change has been tested, or you can see in-flight
|
||||
testing at the status page
|
||||
|
||||
https://zuul.opendev.org/t/openstack/status
|
||||
|
||||
These jobs are crafted in a way that they replicate production as much
|
||||
as possible. Reading the job definitions in
|
||||
:git_file:`zuul.d/system-config-run.yaml` will give you a feel for the
|
||||
hosts that are set up with each job. When you view the job results in
|
||||
the Zuul UI, you will see many logs collected from a number of hosts
|
||||
that simulate the production environment. This has all the
|
||||
information you generally need to debug problems, but the best place
|
||||
to start is with the *artifacts* tab, which has some curated links to
|
||||
useful overviews.
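
As a rough illustration of the kind of job described above, a minimal
sketch of a Zuul job definition follows. The service name ``example``,
the parent job, the node names, the node label and the file matchers
are all hypothetical placeholders chosen for illustration; consult
:git_file:`zuul.d/system-config-run.yaml` for the real definitions.

.. code-block:: yaml

   # Hypothetical sketch only; not copied from system-config.
   - job:
       name: system-config-run-example
       # Assumed parent job name; check the real file for the actual base job.
       parent: system-config-run
       description: Run the playbook for a hypothetical "example" service.
       nodeset:
         nodes:
           # An ephemeral bastion host plus one ephemeral service host.
           - name: bridge.example.org
             label: placeholder-node-label
           - name: example01.example.org
             label: placeholder-node-label
       # Only run the job when files relevant to the service change.
       files:
         - playbooks/service-example.yaml
         - playbooks/roles/example/
         - testinfra/test_example.py

The ``files`` list is what restricts a job to changes that touch the
relevant configuration, roles and playbooks.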
|
||||
|
||||
One of the job artifacts is the `ARA report
|
||||
<https://ara.readthedocs.io/en/latest/>`__. This is a graphical view
|
||||
of the *nested* Ansible run on the (ephemeral) bastion host against
|
||||
the (ephemeral) production-test nodes. This is generally the first
|
||||
stop for finding deployment issues.
|
||||
|
||||
Another artifact is the ``testinfra results``. `Testinfra
|
||||
<https://testinfra.readthedocs.io>`__ allows us to define
|
||||
unit-test-like behaviour to test functionality such as service and API
|
||||
status, correct deployment of users and files and other interesting
|
||||
details. Failures here would indicate that the deployment steps
|
||||
worked, but some part of the operation of that system is not as we
|
||||
expect. The ``testinfra`` code driving this is kept in
|
||||
:git_file:`testinfra` and test files are named for the service they
|
||||
test.
|
||||
|
||||
Finally there is a ``screenshots`` artifact, which is a link to a
|
||||
directory that some tests populate with image files. Tests that are
|
||||
bringing up interactive services will use a headless browser to take
|
||||
shots of important pages to verify correct operation.
|
||||
|
||||
The logs tab has links to the raw logs; this collects much more
|
||||
detail such as ``syslog``, Apache logs, database dumps, etc. Once you
|
||||
have identified the general problem from the above steps, these logs
|
||||
provide the in-depth details for further analysis.
|
||||
|
||||
Playbooks and roles
|
||||
-------------------
|
||||
|
||||
The starting point for all services is generally the playbooks and
|
||||
roles kept in :git_file:`playbooks/`. Most playbooks are named
|
||||
``service-<name>.yaml`` and will indicate from their naming which
|
||||
production areas they drive.
|
||||
|
||||
During testing, these same playbooks are run against the test nodes.
|
||||
Note that the testing hosts are given names that match the
|
||||
group configuration in the jobs defined in
|
||||
:git_file:`zuul.d/system-config-run.yaml`.
|
||||
|
||||
These playbooks are usually small and they call out to roles where
|
||||
most of the work is done. Roles are kept in
|
||||
:git_file:`playbooks/roles/`. These roles are written to be as
|
||||
generic as possible, but they are not expected to be used outside the
|
||||
OpenDev production deployment system.
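
To make the split between playbooks and roles concrete, here is a
minimal sketch of what a small service playbook might look like. The
``example`` group and role names are hypothetical and do not refer to
an actual service in the repository.

.. code-block:: yaml

   # playbooks/service-example.yaml (hypothetical file, for illustration)
   - hosts: example
     name: Configure the example service
     roles:
       # The role, which would live under playbooks/roles/example/,
       # does most of the work.
       - example

The playbook itself only selects a group of hosts and hands off to a
role, which is why most playbooks stay short.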
|
||||
|
||||
These playbooks and roles are the same for CI and deployment.
|
||||
|
||||
Hosts and variables
|
||||
-------------------
|
||||
|
||||
The playbooks above run on groups of hosts which are defined in
|
||||
:git_file:`inventory/service/groups.yaml`.
|
||||
|
||||
The production hosts are kept in an inventory at
|
||||
:git_file:`inventory/base/hosts.yaml`. In CI, the inventory is
|
||||
generated by Zuul (as it is allocating ephemeral nodes from the
|
||||
testing pool).
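
As a sketch of how a host might be associated with a group, the
fragment below assumes a simple mapping from a group name to host-name
patterns. Both the layout and the ``example`` names are assumptions
made for illustration; the actual structure of
:git_file:`inventory/service/groups.yaml` may differ.

.. code-block:: yaml

   # Hypothetical group definition; the layout is assumed, not verified.
   groups:
     example:
       # Any host whose name matches this pattern joins the group.
       - example[0-9]*.opendev.org

Under this assumption, a CI node named to match the pattern would fall
into the same group as its production counterpart, so the same
playbook applies to both.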
|
||||
|
||||
Public production and testing variables are kept under
|
||||
:git_file:`inventory/`. The one difference between CI and production
|
||||
is *secrets* such as API keys, tokens and passwords; in production the
|
||||
*nested* Ansible will populate these variables for the deployment
|
||||
directly from values stored on the bastion host. In CI, dummy values
|
||||
should be populated into the templates under
|
||||
:git_file:`playbooks/zuul/templates/`.
|
||||
|
||||
Production secrets are currently managed manually by OpenDev
|
||||
administrators on the bastion host.
|
||||
|
||||
Deployment
|
||||
----------
|
||||
|
||||
After review and approval of a change, Zuul will perform final gate
|
||||
testing and merge the change on your behalf.
|
||||
|
||||
Just as uploading a new change triggers Zuul to run CI tests in the
|
||||
*check* pipeline, and approving a change triggers Zuul to run gate
|
||||
tests and merge in the *gate* pipeline, the merge of a change triggers
|
||||
Zuul to run the deployment jobs in the *deploy* pipeline.
|
||||
|
||||
These jobs are named ``infra-prod-<service>`` and run the same
|
||||
playbooks and roles as in the CI system, except against the production
|
||||
services. Zuul will deploy the merged changes to the bastion host,
|
||||
and then trigger the bastion host to run a *nested* Ansible deployment
|
||||
against the production host.
|
||||
|
||||
Since the production run logs may leak sensitive information, they are
|
||||
not published openly. You can add a GPG public key to
|
||||
:git_file:`playbooks/zuul/roles/encrypt-logs/defaults/main.yaml` and
|
||||
then ensure the ``infra-prod-<service>`` production job has your name in
|
||||
its ``encrypt_logs_job_recipients`` variable. Once approved and
|
||||
committed, you will then be able to view the encrypted production log
|
||||
output provided via the Zuul build page for the production run.
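
A minimal sketch of the job side of this might look like the
following. It assumes ``encrypt_logs_job_recipients`` is a list of
names set as a job variable; the job name and user name are
hypothetical placeholders.

.. code-block:: yaml

   # Hypothetical sketch; assumes the variable is a list of names.
   - job:
       name: infra-prod-service-example
       vars:
         encrypt_logs_job_recipients:
           # Should match the name given to your GPG public key entry.
           - your_user_name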
|
||||
|
||||
Containers
|
||||
----------
|
||||
|
||||
Most services are containerised. When looking at the
|
||||
``system-config-run-*`` and ``infra-prod-*`` jobs you may see dependencies
|
||||
on container build/upload/promote jobs; this indicates we have jobs
|
||||
that build a bespoke container for this environment.
|
||||
|
||||
The base ``Dockerfile`` for these containers is found under
|
||||
:git_file:`docker/`. Most are straightforward, but some of the more
|
||||
complicated services have multiple steps and layers. Any changes to
|
||||
the ``Dockerfile`` will be tested as usual, and when approved the
|
||||
containers will be rebuilt, published and pulled onto the production
|
||||
systems automatically.
|
||||
|
||||
Certificates
|
||||
------------
|
||||
|
||||
We provision SSL certificates from Let's Encrypt; see
|
||||
:ref:`letsencrypt`.
|
||||
|
||||
DNS
|
||||
---
|
||||
|
||||
DNS for ``opendev.org`` (and some other domains) is also handled through
|
||||
the review system; see the
|
||||
`<https://opendev.org/opendev/zone-opendev.org/>`__ project.
|
||||
|
||||
Backups
|
||||
-------
|
||||
|
||||
Any host in the ``backup`` group will have backups to two
|
||||
geographically distinct locations set up by the deployment
|
||||
infrastructure. See the ``borg-backup`` role for details on including
|
||||
or excluding various data.
|
||||
|
||||
Remote access
|
||||
-------------
|
||||
|
||||
Hosts are only configured by Ansible, but they can be set up for
|
||||
interactive access if required.
|
||||
|
||||
Add your public key to :git_file:`inventory/base/group_vars/all.yaml`
|
||||
and include a stanza like this in your server ``host_vars``::
|
||||
|
||||
extra_users:
|
||||
- your_user_name
|
||||
|
||||
See :ref:`ssh-access` for details on keys.
|
||||
|
||||
Documentation
|
||||
-------------
|
||||
|
||||
Each service should have an RST file with documentation about the
|
||||
server and services in :git_file:`doc/source/`.
|
||||
|
||||
Submitting Changes
|
||||
------------------
|
||||
|
||||
If you are not familiar with submitting changes to Gerrit, you can
|
||||
start with any of the various developer guides such as ::
|
||||
|
||||
https://docs.opendev.org/opendev/infra-manual/latest/gettingstarted.html
|
||||
https://docs.openstack.org/doc-contrib-guide/quickstart/first-timers.html
|
||||
https://docs.opendev.org/opendev/infra-manual/latest/developers.html
|
||||
|
||||
The change description is very important and the major source of
|
||||
historical information. It is expected a developer can read the
|
||||
description of a change and have the context to generally understand
|
||||
why it was introduced. Comments in the code-review system are useful
|
||||
to understand the deeper history of each change, but each change
|
||||
should stand alone once committed. Only the most trivial of changes
|
||||
that are completely self-evident (e.g. typo fixes) would be expected
|
||||
to have less than a few sentences of context in their change log.
|
||||
|
||||
Lifecycle
|
||||
---------
|
||||
|
||||
We welcome all changes and contributions to the project.
|
||||
|
||||
Before starting work to deploy a new service that will require
|
||||
resources, you should do some preparation work. Putting an item on
|
||||
the `weekly team meeting agenda
|
||||
<https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting>`__
|
||||
is always welcome. Logs of previous meetings can be seen at
|
||||
`<https://meetings.opendev.org/#OpenDev_Meeting>`__. More complicated
|
||||
changes may justify going through the spec process; see
|
||||
`<https://opendev.org/opendev/infra-specs>`_. If the existing admins
|
||||
are aware of the details before reviews start appearing it makes the
|
||||
process much smoother.
|
||||
|
||||
All preliminary work can be done in an iterative fashion using the CI
|
||||
jobs at your own pace. The ``#opendev`` IRC channel on ``OFTC`` is a
|
||||
good place to find help during this process. Alternatively, questions
|
||||
are welcome on the `service-discuss list
|
||||
<http://lists.opendev.org/cgi-bin/mailman/listinfo/service-discuss>`__.
|
||||
This change (or changes) will be reviewed and may take a few rounds
|
||||
before final approval (in Gerrit terms, a ``+2`` vote). Most changes
|
||||
will receive a few ``-1`` votes from reviewers during development.
|
||||
This is really just a flag to note that some further discussion is
|
||||
required; it is not a rejection.
|
||||
|
||||
You can set ``Workflow`` to ``-1`` in Gerrit on changes you are
|
||||
working on, or some developers like to put ``[WIP]`` at the front of
|
||||
their change description to indicate to reviewers they probably
|
||||
shouldn't spend much time on this yet, as you are still working on it.
|
||||
Small, stand-alone sequential changes are encouraged, and Zuul makes
|
||||
testing such "stacks" of changes trivial.
|
||||
|
||||
We currently have admins manually deploy production virtual machines,
|
||||
storage attached to those machines and secrets to the bastion host.
|
||||
This will need to happen before changes are put into production.
|
||||
Discussion with the admins will help decide on which cloud provider,
|
||||
the VM storage/size and other such matters.
|
||||
|
||||
Once resources are allocated and the new host is available in the
|
||||
inventory, the production jobs can deploy. After this the service
|
||||
moves into a maintenance phase; changes can be proposed and, after
|
||||
review, deployed.
|
||||
|
@ -1,89 +1,15 @@
|
||||
:title: System Administration
|
||||
|
||||
This page collects technical information of relevance to those
|
||||
interested in admin of OpenDev services. For a higher-level overview,
|
||||
see :ref:`opendev-infra-overview`.
|
||||
|
||||
.. _sysadmin:
|
||||
|
||||
System Administration
|
||||
#####################
|
||||
|
||||
Our infrastructure is code and contributions to it are handled just
|
||||
like the rest of OpenDev. This means that anyone can contribute to
|
||||
the installation and long-running maintenance of systems without shell
|
||||
access, and anyone who is interested can provide feedback and
|
||||
collaborate on code reviews.
|
||||
|
||||
The configuration of every system operated by the infrastructure team
|
||||
is managed by Ansible and driven by continuous integration and
|
||||
deployment by Zuul.
|
||||
|
||||
https://opendev.org/opendev/system-config
|
||||
|
||||
All system configuration should be encoded in that repository so that
|
||||
anyone may propose a change in the running configuration to Gerrit.
|
||||
|
||||
Guide to CI and CD
|
||||
==================
|
||||
|
||||
All development work is based around Zuul jobs and a continuous
|
||||
integration and development workflow.
|
||||
|
||||
The starting point for all services is generally the playbooks and
|
||||
roles kept in :git_file:`playbooks`.
|
||||
Most playbooks are named ``service-<name>.yaml`` and will indicate
|
||||
which production areas they drive.
|
||||
|
||||
These playbooks run on groups of hosts which are defined in
|
||||
:git_file:`inventory/service/groups.yaml`. The production hosts are kept
|
||||
in an inventory at :git_file:`inventory/base/hosts.yaml`. During
|
||||
testing, these same playbooks are run against the test nodes. You can
|
||||
note that the testing hosts are given names that match the group
|
||||
configuration in the jobs defined in
|
||||
:git_file:`zuul.d/system-config-run.yaml`.
|
||||
|
||||
Deployment is run through a bastion host ``bridge.openstack.org``.
|
||||
After changes are approved, Zuul will run Ansible on this host; which
|
||||
will then connect to the production hosts and run the orchestration
|
||||
using the latest committed code. The bridge is a special host because
|
||||
it holds production secrets, such as passwords or API keys, and
|
||||
unredacted logs. As many logs as possible are provided in the public
|
||||
Zuul job results, but they need to be audited to ensure they do not
|
||||
leak secrets and thus in some cases may not be published.
|
||||
|
||||
For CI testing, each job creates a "fake" bridge, along with the
|
||||
servers required for orchestration. Thus CI testing is performed by a
|
||||
"nested" Ansible -- Zuul initially connects to the testing bridge node
|
||||
and deploys it, and then this node runs its own Ansible that tests the
|
||||
orchestration to the other testing nodes, simulating the production
|
||||
environment. This is driven by playbooks kept in
|
||||
:git_file:`playbooks/zuul`. Here you will also find testing
|
||||
definitions of host variables that are kept secret for production
|
||||
hosts.
|
||||
|
||||
After the test environment is orchestrated, the
|
||||
`testinfra <https://testinfra.readthedocs.io/en/latest/>`__ tests from
|
||||
:git_file:`testinfra` are run. This validates the complete
|
||||
orchestration testing environment; things such as ensuring user
|
||||
creation, container readiness and service wellness checks are all
|
||||
performed.
|
||||
|
||||
.. _adding_new_server:
|
||||
|
||||
Adding a New Server
|
||||
===================
|
||||
|
||||
Creating a new server for your service requires discussion with the
|
||||
OpenDev administrators to ensure donor resources are being used
|
||||
effectively.
|
||||
|
||||
* Hosts should only be configured by Ansible. Nonetheless, in some
|
||||
cases SSH access can be granted. Add your public key to
|
||||
:git_file:`inventory/base/group_vars/all.yaml` and include a stanza
|
||||
like this in your server ``host_vars``::
|
||||
|
||||
extra_users:
|
||||
- your_user_name
|
||||
|
||||
* Add an RST file with documentation about the server and services in
|
||||
:git_file:`doc/source` and add it to the index in that directory.
|
||||
.. _ssh-access:
|
||||
|
||||
SSH Access
|
||||
==========
|
||||
|