docs: Update some of sysadmin details

Give a little more detail on the current CI/CD setup; remove Puppet cruft.

Change-Id: I684df4459cf5940d70b89e4c05103f8a8352af87
2020-09-03 13:54:14 +10:00 · e3fb7d2be0
commit e3fb7d2be0
parent 642c6c2d88
1 changed files with 82 additions and 137 deletions
--- a/doc/source/sysadmin.rst
+++ b/doc/source/sysadmin.rst
@ -6,147 +6,89 @@ System Administration
 #####################

 Our infrastructure is code and contributions to it are handled just
-like the rest of OpenStack.  This means that anyone can contribute to
+like the rest of OpenDev.  This means that anyone can contribute to
 the installation and long-running maintenance of systems without shell
 access, and anyone who is interested can provide feedback and
 collaborate on code reviews.

 The configuration of every system operated by the infrastructure team
-is managed by a combination of Ansible and Puppet:
+is managed by Ansible and driven by continuous integration and
+deployment by Zuul.

  https://opendev.org/opendev/system-config

 All system configuration should be encoded in that repository so that
 anyone may propose a change in the running configuration to Gerrit.

-Making a Change in Puppet
-=========================
+Guide to CI and CD
+==================

-Many changes to the Puppet configuration can safely be made while only
-performing syntax checks.  Some more complicated changes merit local
-testing and an interactive development cycle.  The system-config repo
-is structured to facilitate local testing before proposing a change
-for review.  This is accomplished by separating the puppet
-configuration into several layers with increasing specificity about
-site configuration higher in the stack.
+All development work is based around Zuul jobs and a continuous
+integration and deployment workflow.

-The `modules/` directory holds puppet modules that abstractly describe
-the configuration of a service.  Ideally, these should have no
-OpenStack-specific information in them, and eventually they should all
-become modules that are directly consumed from PuppetForge, only
-existing in the system-config repo during an initial incubation period.
-This is not yet the case, so you may find OpenStack-specific
-configuration in these modules, though we are working to reduce it.
+The starting point for all services is generally the playbooks and
+roles kept in :git_file:`playbooks`.
+Most playbooks are named ``service-<name>.yaml`` and will indicate
+which production areas they drive.

-The `modules/openstack_project/manifests/` directory holds
-configuration for each of the servers that the OpenStack project runs.
-Think of these manifests as describing how OpenStack runs a particular
-service.  However, no site-specific configuration such as hostnames or
-credentials should be included in these files.  This is what lets you
-easily test an OpenStack project manifest on your own server.
+These playbooks run on groups of hosts which are defined in
+:git_file:`inventory/service/groups`.  The production hosts are kept
+in an inventory at :git_file:`inventory/base/hosts.yaml`.  During
+testing, these same playbooks are run against the test nodes.  Note
+that the testing hosts are given names that match the group
+configuration in the jobs defined in
+:git_file:`zuul.d/system-config-run.yaml`.

-Finally, the `manifests/site.pp` file contains the information that is
-specific to the actual servers that OpenStack runs.  These should be
-very simple node definitions that largely exist simply to provide
-private data from hiera to the more robust manifests in the
-`openstack_project` modules.
+Deployment is run through a bastion host, ``bridge.openstack.org``.
+After changes are approved, Zuul will run Ansible on this host, which
+will then connect to the production hosts and run the orchestration
+using the latest committed code.  The bridge is a special host because
+it holds production secrets, such as passwords or API keys, and
+unredacted logs.  As many logs as possible are provided in the public
+Zuul job results, but they need to be audited to ensure they do not
+leak secrets and thus in some cases may not be published.

-This means that you can run the same configuration on your own server
-simply by providing a different manifest file instead of site.pp.
-
-.. note::
-   The example below is for Debian / Ubuntu systems.  If you are using a
-   Red Hat based system be sure to setup sudo or simply run the commands as
-   the root user.
-
-As an example, to run the etherpad configuration on your own server,
-start by ensuring `git` is installed and then cloning the system-config
-Git repo::
-
-  sudo su -
-  apt-get install git
-  git clone https://opendev.org/opendev/system-config
-  cd system-config
-
-Then copy the etherpad node definition from `manifests/site.pp` to a new
-file (be sure to specify the FQDN of the host you are working with in
-the node specifier).  It might look something like this::
-
-  # local.pp
-  class { 'openstack_project::etherpad':
-    ssl_cert_file_contents  => hiera('etherpad_ssl_cert_file_contents'),
-    ssl_key_file_contents   => hiera('etherpad_ssl_key_file_contents'),
-    ssl_chain_file_contents => hiera('etherpad_ssl_chain_file_contents'),
-    mysql_host              => hiera('etherpad_db_host', 'localhost'),
-    mysql_user              => hiera('etherpad_db_user', 'username'),
-    mysql_password          => hiera('etherpad_db_password'),
-  }
-
-.. note::
-   Be sure not to use any of the hiera functionality from manifests/site.pp
-   since it is not installed yet. You should be able to comment out the logic
-   safely.
-
-Then to apply that configuration, run the following from the root of the
-system-config repository::
-
-  ./install_puppet.sh
-  ./install_modules.sh
-  puppet apply -l /tmp/manifest.log --modulepath=modules:/etc/puppet/modules manifests/local.pp
-
-That should turn the system you are logged into into an etherpad
-server with the same configuration as that used by the OpenStack
-project. You can edit the contents of the system-config repo and
-iterate ``puppet apply`` as needed. When you're ready to propose the
-change for review, you can propose the change with git-review. See the
-`Development workflow section in the Developer's Guide
-<https://docs.opendev.org/opendev/infra-manual/latest/developers.html#development-workflow>`_
-for more information.
-
-Accessing Clouds
-================
-
-As an unprivileged user who is a member of the `sudo` group on
-bridge, you can access any of the clouds with::
-
-  sudo openstack --os-cloud <cloud name> --os-cloud-region <region name>
+For CI testing, each job creates a "fake" bridge, along with the
+servers required for orchestration.  Thus CI testing is performed by a
+"nested" Ansible -- Zuul initially connects to the testing bridge node
+and deploys it, and then this node runs its own Ansible that tests the
+orchestration to the other testing nodes, simulating the production
+environment.  This is driven by playbooks kept in
+:git_file:`playbooks/zuul`.  Here you will also find testing
+definitions of host variables that are kept secret for production
+hosts.

+After the test environment is orchestrated, the
+`testinfra <https://testinfra.readthedocs.io/en/latest/>`__ tests from
+:git_file:`testinfra` are run.  These validate the complete
+orchestrated testing environment, performing checks such as
+verifying user creation, container readiness and service
+health.

 .. _adding_new_server:

 Adding a New Server
 ===================

-To create a new server, do the following:
+Creating a new server for your service requires discussion with the
+OpenDev administrators to ensure donor resources are being used
+effectively.

-* Add a file in :git_file:`modules/openstack_project/manifests/` that defines a
-  class which specifies the configuration of the server.
-
-* Add a node pattern entry in :git_file:`manifests/site.pp` for the server
-  that uses that class. Make sure it supports an ordinal naming pattern
-  (e.g., fooserver01.openstack.org not just fooserver.openstack.org, even
-  if you're replacing an existing server) and that another server with the
-  same does not already exist in the ansible inventory.
-
-* If your server needs private information such as passwords, use
-  hiera calls in the site manifest, and ask an infra-core team member
-  to manually add the private information to hiera.
-
-* You should be able to install and configure most software only with
-  ansible or puppet.  Nonetheless, if you need SSH access to the host,
-  add your public key to :git_file:`inventory/service/group_vars/all.yaml` and
-  include a stanza like this in your server class::
+* Hosts should only be configured by Ansible.  Nonetheless, in some
+  cases SSH access can be granted.  Add your public key to
+  :git_file:`inventory/base/group_vars/all.yaml` and include a stanza
+  like this in your server ``host_vars``::

    extra_users:
      - your_user_name

-* Add an RST file with documentation about the server in :git_file:`doc/source`
-  and add it to the index in that directory.
+* Add an RST file with documentation about the server and services in
+  :git_file:`doc/source` and add it to the index in that directory.

 SSH Access
 ==========

-For any of the systems managed by the OpenStack Infrastructure team, the
+For any of the systems managed by the OpenDev Infrastructure team, the
 following practices must be observed for SSH access:

 * SSH access is only permitted with SSH public/private key
@ -171,14 +113,13 @@ following practices must be observed for SSH access:
  is received should be used, and the SSH keys should be added with
  the confirmation constraint ('ssh-add -c').
 * The number of SSH keys that are configured to permit access to
-  OpenStack machines should be kept to a minimum.
-* OpenStack Infrastructure machines must use puppet to centrally manage and
-  configure user accounts, and the SSH authorized_keys files from the
-  openstack-infra/system-config repository.
+  OpenDev machines should be kept to a minimum.
+* OpenDev Infrastructure machines must use Ansible to centrally manage
+  and configure user accounts, and the SSH authorized_keys files from
+  the opendev/system-config repository.
 * SSH keys should be periodically rotated (at least once per year).
  During rotation, a new key can be added to puppet for a time, and
-  then the old one removed.  Be sure to run puppet on the backup
-  servers to make sure they are updated.
+  then the old one removed.


 GitHub Access
@ -186,7 +127,7 @@ GitHub Access

 To ensure that code review and testing are not bypassed in the public
 Git repositories, only Gerrit will be permitted to commit code to
-OpenStack repositories.  Because GitHub always allows project
+OpenDev repositories.  Because GitHub always allows project
 administrators to commit code, accounts that have access to manage the
 GitHub projects necessarily will have commit access to the
 repositories.
@ -197,7 +138,7 @@ would prefer to keep a separate account, it can be added to the
 organisation after discussion and noting the caveats around elevated
 access.  The account must have 2FA enabled.

-In either case, the adminstrator accounts should not be used to check
+In either case, the administrator accounts should not be used to check
 out or commit code for any project.

 Note that it is unlikely to be useful to use an account also used for
@ -207,26 +148,16 @@ for all projects.
 Root only information
 #####################

-Some information is only relevant if you have root access to the system - e.g.
-you are an OpenStack CI root operator, or you are running a clone of the
-OpenStack CI infrastructure for another project.
+Below is information relevant to members of the core team with root
+access.

-Force configuration run on a server
-===================================
+Accessing Clouds
+================

-If you need to force a configuration run on a single server before the
-usual cron job time, you can use the ``kick.sh`` script on
-``bridge.openstack.org``.
+As an unprivileged user who is a member of the `sudo` group on bridge,
+you can inspect any of the clouds with::

-You could do a single server::
-
-  # /opt/system-config/production/tools/kick.sh 'review.openstack.org'
-
-Or use matching to cover a range of servers::
-
-  # /opt/system-config/production/tools/kick.sh 'ze*.openstack.org'
-
-  # /opt/system-config/production/tools/kick.sh 'ze0[1-4].openstack.org'
+  sudo openstack --os-cloud <cloud name> --os-cloud-region <region name>

 Backups
 =======
@ -477,9 +408,8 @@ from misspelling the name of the file and is recommended.
 Examples
 --------

-To disable an OpenStack instance called `amazing.openstack.org` temporarily
-without landing a puppet change, ensure the following is in
-`/etc/ansible/hosts/emergency.yaml`
+To disable an OpenDev instance called `foo.opendev.org` temporarily,
+ensure the following is in `/etc/ansible/hosts/emergency.yaml`

 ::

@ -489,6 +419,21 @@ without landing a puppet change, ensure the following is in
    disabled:
      - foo.opendev.org # 2020-05-23 bob is testing change 654321

+Ad-hoc Ansible runs
+===================
+
+If you need to run Ansible manually against a host, you should:
+
+* Disable automated Ansible runs following the section above.
+* ``su`` to the ``zuul`` user and run the playbook with something like
+  ``ansible-playbook -vv
+  src/opendev.org/opendev/system-config/playbooks/service-<name>.yaml``.
+* Restore automated Ansible runs.
+* You can also use the ``--limit`` flag to restrict which hosts run
+  when there are many in a group.  However, be aware that some
+  roles/playbooks like ``letsencrypt`` and ``backup`` run across
+  multiple hosts (deploying DNS records or authorization keys), so
+  incorrect ``--limit`` flags could cause further failures.

 .. _cinder: