:title: System Administration
.. _sysadmin:
System Administration
#####################
Our infrastructure is code and contributions to it are handled just
like the rest of OpenDev. This means that anyone can contribute to
the installation and long-running maintenance of systems without shell
access, and anyone who is interested can provide feedback and
collaborate on code reviews.
The configuration of every system operated by the infrastructure team
is managed by Ansible and driven by continuous integration and
deployment by Zuul.
https://opendev.org/opendev/system-config
All system configuration should be encoded in that repository so that
anyone may propose a change in the running configuration to Gerrit.
Guide to CI and CD
==================
All development work is based around Zuul jobs and a continuous
integration and deployment workflow.
The starting point for all services is generally the playbooks and
roles kept in :git_file:`playbooks`.
Most playbooks are named ``service-<name>.yaml`` and will indicate
which production areas they drive.
These playbooks run on groups of hosts which are defined in
:git_file:`inventory/service/groups`. The production hosts are kept
in an inventory at :git_file:`inventory/base/hosts.yaml`. During
testing, these same playbooks are run against the test nodes.  Note
that the testing hosts are given names matching the group
configuration in the jobs defined in
:git_file:`zuul.d/system-config-run.yaml`.
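For orientation, a service playbook is usually just a mapping of a
host group onto the roles that configure it.  A minimal sketch (the
``foo`` group and role names here are hypothetical)::

  - hosts: foo
    roles:
      - foo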
Deployment is run through a bastion host ``bridge.openstack.org``.
After changes are approved, Zuul runs Ansible on this host, which
then connects to the production hosts and runs the orchestration
using the latest committed code.  The bridge is a special host because
it holds production secrets, such as passwords or API keys, and
unredacted logs. As many logs as possible are provided in the public
Zuul job results, but they need to be audited to ensure they do not
leak secrets and thus in some cases may not be published.
For CI testing, each job creates a "fake" bridge, along with the
servers required for orchestration. Thus CI testing is performed by a
"nested" Ansible -- Zuul initially connects to the testing bridge node
and deploys it, and then this node runs its own Ansible that tests the
orchestration to the other testing nodes, simulating the production
environment. This is driven by playbooks kept in
:git_file:`playbooks/zuul`. Here you will also find testing
definitions of host variables that are kept secret for production
hosts.
After the test environment is orchestrated, the
`testinfra <https://testinfra.readthedocs.io/en/latest/>`__ tests from
:git_file:`testinfra` are run. This validates the complete
orchestration testing environment; checks such as user creation,
container readiness and service health are all performed.
.. _adding_new_server:
Adding a New Server
===================
Creating a new server for your service requires discussion with the
OpenDev administrators to ensure donor resources are being used
effectively.
* Hosts should only be configured by Ansible. Nonetheless, in some
cases SSH access can be granted. Add your public key to
:git_file:`inventory/base/group_vars/all.yaml` and include a stanza
like this in your server ``host_vars``::

  extra_users:
    - your_user_name
* Add an RST file with documentation about the server and services in
:git_file:`doc/source` and add it to the index in that directory.
SSH Access
==========
For any of the systems managed by the OpenDev Infrastructure team, the
following practices must be observed for SSH access:
* SSH access is only permitted with SSH public/private key
authentication.
* Users must use a strong passphrase to protect their private key. A
passphrase of several words, at least one of which is not in a
dictionary is advised, or a random string of at least 16
characters.
* To mitigate the inconvenience of using a long passphrase, users may
want to use an SSH agent so that the passphrase is only requested
once per desktop session.
* Users' private keys must never be stored anywhere except their own
workstation(s). In particular, they must never be stored on any
remote server.
* If users need to 'hop' from a server or bastion host to another
machine, they must not copy a private key to the intermediate
machine (see above).  Instead, SSH agent forwarding may be used.
However, because a compromised intermediate machine can ask the
agent to sign requests without the user's knowledge, only an SSH
agent that interactively prompts the user on each signing request
should be used (i.e. ssh-agent, but not gnome-keyring), and the SSH
keys should be added with the confirmation constraint (``ssh-add
-c``); see the example after this list.
* The number of SSH keys that are configured to permit access to
OpenDev machines should be kept to a minimum.
* OpenDev Infrastructure machines must use Ansible to centrally manage
and configure user accounts, and the SSH authorized_keys files from
the opendev/system-config repository.
* SSH keys should be periodically rotated (at least once per year).
During rotation, a new key can be added to the Ansible configuration
for a time, and then the old one removed.
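Agent forwarding combined with the confirmation constraint looks
like this (hostnames hypothetical)::

  # load the key with the confirmation constraint; every signing
  # request now pops an interactive prompt
  ssh-add -c ~/.ssh/id_ed25519
  # forward the agent only for the hop that needs it
  ssh -A bastion.example.org
  # from the bastion, hop onward without copying any private key
  ssh internal.example.org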
GitHub Access
=============
To ensure that code review and testing are not bypassed in the public
Git repositories, only Gerrit will be permitted to commit code to
OpenDev repositories. Because GitHub always allows project
administrators to commit code, accounts that have access to manage the
GitHub projects necessarily will have commit access to the
repositories.
A shared GitHub administrative account is available (credentials
stored in the global authentication location). If administrators
would prefer to keep a separate account, it can be added to the
organisation after discussion and noting the caveats around elevated
access. The account must have 2FA enabled.
In either case, the administrator accounts should not be used to check
out or commit code for any project.
Note that it is unlikely to be useful to use an account also used for
active development, as you will be subscribed to many notifications
for all projects.
Root only information
#####################
Below is information relevant to members of the core team with root
access.
Accessing Clouds
================
As an unprivileged user who is a member of the ``sudo`` group on
bridge, you can inspect any of the clouds with::

  sudo openstack --os-cloud <cloud name> --os-region-name <region name> <command>
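For example, to list servers using the cloud and region names that
appear later in this document::

  sudo openstack --os-cloud openstackci-rax --os-region-name DFW server list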
Backups
=======
Infra uses the `borg <https://borgbackup.readthedocs.io>`__ backup
tool.
Hosts in the ``borg-backup`` Ansible inventory group will be backed up
to servers in the ``borg-backup-server`` group with ``borg``. The
``playbooks/roles/borg-backup`` and
``playbooks/roles/borg-backup-server`` roles implement the required
setup.
The backup server has a unique Unix user for each host to be backed
up.  The roles will set up the required users, their home
directories in the backup volume and the relevant ``authorized_keys``.
Host backup happens via a daily cron job (managed by Ansible) on each
individual host to be backed up. The host to be backed up initiates
the backup process to the remote backup server(s) using a separate
ssh key set up just for backup communication (see
``/root/.ssh/config``).
Restore from Backup
-------------------
``borg`` has many options for restoring, but a basic way to dump a
host's files at a particular time is to

* log into the backup server
* ``sudo su -`` to switch to the backup user for the host to be restored
* you will now be in the home directory of that user
* run ``/opt/borg/bin/borg list ./backup`` to list the archives available
* these should look like ``hostname-YYYY-MM-DDTHH:MM:SS``
* move to a suitable working directory
* extract one of the appropriate archives with
  ``/opt/borg/bin/borg extract ~/backup::<archive-tag>``
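A worked example (the host and archive names here are hypothetical,
and assume the per-host users follow a ``borg-<host>`` naming
scheme)::

  sudo su - borg-review99
  /opt/borg/bin/borg list ./backup      # pick an archive from the output
  mkdir /tmp/restore && cd /tmp/restore
  /opt/borg/bin/borg extract ~/backup::review99-2020-09-01T03:04:05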
Rotating backup storage
-----------------------
We run ``borg`` in append-only mode so that clients cannot remove
old backups on the server.
TODO(ianw) : Write instructions on how to prune server side. We
should monitor growth to see if automatic pruning would be
appropriate, or periodic manual pruning, or something similar to this
existing system where we keep a historic archive and start fresh.
The backup server keeps an active volume and the previously rotated
volume. Each consists of 3 x 1TiB volumes grouped with LVM. The
volumes are mounted at ``/opt/backups-YYYYMM`` for the date it was
created; ``/opt/backups`` is a symlink to the latest volume.
Periodically we rotate the active volume for a fresh one. Follow this
procedure:
#. Create the new volumes via API (on ``bridge.o.o``).  Create 3
   volumes, named for the server with the year and month added::

     DATE=$(date +%Y%m)
     export OS_VOLUME_API_VERSION=1
     OS_CMD="./env/bin/openstack --os-cloud openstackci-rax --os-region-name ORD"
     SERVER="backup01.ord.rax.ci.openstack.org"
     ${OS_CMD} volume create --size 1024 ${SERVER}/main01-${DATE}
     ${OS_CMD} volume create --size 1024 ${SERVER}/main02-${DATE}
     ${OS_CMD} volume create --size 1024 ${SERVER}/main03-${DATE}
#. Attach the volumes to the backup server::

     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main01-${DATE}
     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main02-${DATE}
     ${OS_CMD} server add volume ${SERVER} ${SERVER}/main03-${DATE}
#. Now on the backup server, create the new backup LVM volume (get
   the device names from ``dmesg`` when they were attached).  For
   simplicity we create a new volume group for each backup series,
   and a single logical volume on top::

     DATE=$(date +%Y%m)
     pvcreate /dev/xvd<DRIVE1> /dev/xvd<DRIVE2> /dev/xvd<DRIVE3>
     vgcreate main-${DATE} /dev/xvd<DRIVE1> /dev/xvd<DRIVE2> /dev/xvd<DRIVE3>
     lvcreate -l 100%FREE -n backups-${DATE} main-${DATE}
     mkfs.ext4 -m 0 -j -L "backups-${DATE}" /dev/main-${DATE}/backups-${DATE}
     tune2fs -i 0 -c 0 /dev/main-${DATE}/backups-${DATE}
     mkdir /opt/backups-${DATE}
     # manually add mount details to /etc/fstab
     mount /opt/backups-${DATE}
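   It may be worth sanity-checking the new group and volume at this
   point::

     vgs main-${DATE}
     lvs main-${DATE}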
#. Making sure there are no backups currently running, you can now
   begin to switch the backups (you can stop the ssh service, but be
   careful not to then drop your connection and lock yourself out;
   you can always reboot via the API if you do).  Firstly, edit
   ``/etc/fstab`` and make the current (soon to be *old*) backup
   volume mount read-only.  Unmount the old volume and then remount
   it (now as read-only).  This should prevent any accidental removal
   of the existing backups during the following procedures; a sketch
   with a hypothetical old volume follows.
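   For example, with a hypothetical old volume from January 2020::

     # first edit /etc/fstab so the old volume's options include "ro", e.g.
     #   LABEL=backups-202001  /opt/backups-202001  ext4  ro  0  0
     umount /opt/backups-202001
     mount /opt/backups-202001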
#. Pre-seed the new backup directory (same terminal as above).  This
   will copy all the per-host user directories and authentication
   details (but none of the actual backups) and initialise fresh
   repositories.  The sketch below assumes the per-host users are
   named ``borg-*`` with their repository in ``~/backup``; use the
   encryption mode configured by the ``borg-backup`` roles::

     cd /opt/backups-${DATE}
     rsync -avz --exclude backup /opt/backups/ .
     for dir in borg-*; do
       su $dir -c "/opt/borg/bin/borg init -e <mode> --append-only /opt/backups-${DATE}/$dir/backup"
     done
#. The ``/opt/backups`` symlink can now be switched to the new
   volume::

     # -n replaces the symlink itself rather than creating a link
     # inside its current target
     ln -sfn /opt/backups-${DATE} /opt/backups
#. ssh can be re-enabled and the new backup volume is effectively
active.
#. Now run a test backup from a server manually. Choose one, get the
backup command from cron and run it manually in a screen (it might
take a while), ensuring everything seems to be writing correctly to
the new volume.
#. You can now clean up the oldest backups (the volume *before* the
   one you just rotated out).  Remove the mount from ``/etc/fstab``,
   unmount the volume and clean up the LVM components::

     DATE=<INSERT OLD DATE CODE HERE>
     umount /opt/backups-${DATE}
     lvremove /dev/main-${DATE}/backups-${DATE}
     vgremove main-${DATE}
     # pvremove the volumes; they will show PFree @ 1024.00g as
     # they are now not assigned to anything
     pvremove /dev/xvd<DRIVE1>
     pvremove /dev/xvd<DRIVE2>
     pvremove /dev/xvd<DRIVE3>
#. Remove the volumes via API (the opposite of adding above: ``server
   remove volume`` then ``volume delete``).
#. Done! Come back and rotate it again next year.
.. _force-merging-a-change:
Force-Merging a Change
======================
Occasionally it is necessary to bypass the CI system and merge a
change directly. Usually, this is only required if we have a hole in
our testing of the CI or related systems themselves and have merged a
change which causes them to be unable to operate normally and
therefore unable to merge a reversion of the problematic change. In
these cases, use the following procedure to force-merge a change.
* Add yourself to the *Project Bootstrappers* group in Gerrit.
* Navigate to the change which needs to be merged and reload the page.
* Remove any -2 votes on the change.
* Add +2 Code-Review, and +1 Workflow votes if necessary, then add +2
Verified. Also leave a review comment briefly explaining why this
was necessary, and make sure to mention it in the #opendev
IRC channel (ideally as a #status log entry for the benefit of
those not paying close attention to scrollback).
* At this point, a *Submit* Button should appear, click it. The
change should now be merged.
* Remove yourself from *Project Bootstrappers*.
This procedure is the safest way to force-merge a change, ensuring
that all of the normal steps that Gerrit performs on repos still
happen.
Launching New Servers
=====================
New servers are launched using the ``launch/launch-node.py`` tool from the git
repository ``https://opendev.org/opendev/system-config``. This
tool is run from a checkout on the bridge - please see :git_file:`launch/README.rst`
for detailed instructions.
.. _disable-enable-ansible:
Disable/Enable Ansible
======================
You should normally not make manual changes to servers, but instead,
make changes through ansible or puppet. However, under some circumstances,
you may need to temporarily make a manual change to a managed
resource on a server.
OpenDev uses a Static Inventory in Ansible to control execution of Ansible
on hosts. A full understanding
of the concepts in
`Ansible Inventory Introduction
<http://docs.ansible.com/ansible/intro_inventory.html>`_
is essential for being able to make informed decisions about actions
to take.
In the case of needing to disable the running of ansible or puppet on
a node, it's a simple matter of adding an entry to the ansible
inventory "disabled" group in :git_file:`inventory/service/groups.yaml`.
The disabled entry is an input to ``ansible --list-hosts``, so you can
check your entry simply by running ``ansible $hostlist --list-hosts``
as root on the bridge host and ensuring that the list of hosts
returned is as expected.  Globs, group names and server UUIDs are all
acceptable input.
If you need to disable a host immediately without waiting for a patch
to land to ``system-config``, there is a file on the bridge host,
``/etc/ansible/hosts/emergency.yaml``, that can be edited directly.
This file should normally be empty, but its contents are not managed
by ansible.  Its purpose is to allow for disabling ansible at times
when landing a change to the ansible repo would be either
unreasonable or impossible.
Disabling puppet via the ansible inventory does not prevent puppet
from being run directly on the host; it merely prevents ansible from
attempting to run it during the regular zuul jobs.  If you choose to run
puppet manually on a host, take care to ensure that it has not been disabled
at the bridge level first.
If you need to pause all execution of ansible playbooks by Zuul you can
run the utility script ``disable-ansible``. The script touches the file
``/home/zuul/DISABLE-ANSIBLE`` on bridge.openstack.org. Doing
this forces the Zuul jobs that run ansible for us to wait until that file is
removed.  This acts like a global pause.  The script exists to
prevent admins from misspelling the file name, so using it rather
than touching the file directly is recommended.
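A typical maintenance sequence, assuming the script is on the
administrator's path, might look like::

  sudo disable-ansible                  # touches /home/zuul/DISABLE-ANSIBLE
  # ... perform and verify the manual changes ...
  sudo rm /home/zuul/DISABLE-ANSIBLE    # let Zuul-driven runs resume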
Examples
--------
To disable an OpenDev instance called ``foo.opendev.org`` temporarily,
ensure the following is in ``/etc/ansible/hosts/emergency.yaml``::

  # Please add an inline comment so we know who added the host and why
  plugin: yamlgroup
  groups:
    disabled:
      - foo.opendev.org  # 2020-05-23 bob is testing change 654321
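You can then confirm the entry is effective::

  sudo ansible disabled --list-hosts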
Ad-hoc Ansible runs
===================
If you need to run Ansible manually against a host, you should

* disable automated Ansible runs following the section above
* ``su`` to the ``zuul`` user and run the playbook with something like
  ``ansible-playbook -vv
  src/opendev.org/opendev/system-config/playbooks/service-<name>.yaml``
* restore automated Ansible runs
* you can also use the ``--limit`` flag to restrict which hosts run
  when there are many in a group (see the sketch after this list).
  However, be aware that some roles/playbooks like ``letsencrypt``
  and ``backup`` run across multiple hosts (deploying DNS records or
  authorization keys), so incorrect ``--limit`` flags could cause
  further failures.
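A sketch of such a limited run (the service and host names here are
hypothetical)::

  sudo su - zuul
  ansible-playbook -vv \
      src/opendev.org/opendev/system-config/playbooks/service-foo.yaml \
      --limit foo01.opendev.org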
.. _cinder:
Cinder Volume Management
========================
Adding a New Device
-------------------
If the main volume group doesn't have enough space for what you want
to do, this is how you can add a new volume.
Log into bridge.openstack.org and run::

  export OS_CLOUD=openstackci-rax
  export OS_REGION_NAME=DFW

  openstack server list
  openstack volume list

Change the variables to use a different environment, for example
ORD::

  export OS_CLOUD=openstackci-rax
  export OS_REGION_NAME=ORD
* Add a new 1024G cinder volume (substitute the hostname and the next
  number in series for NN)::

    openstack volume create --size 1024 "HOSTNAME.openstack.org/mainNN"
    openstack server add volume "HOSTNAME.openstack.org" "HOSTNAME.openstack.org/mainNN"

* or to add a 100G SSD volume::

    openstack volume create --type SSD --size 100 "HOSTNAME.openstack.org/mainNN"
    openstack server add volume "HOSTNAME.openstack.org" "HOSTNAME.openstack.org/mainNN"
* Then, on the host, create the partition table::

    DEVICE=/dev/xvdX
    sudo parted $DEVICE mklabel msdos mkpart primary 0% 100% set 1 lvm on
    sudo pvcreate ${DEVICE}1

* It should show up in ``pvs``::

    $ sudo pvs
      PV         VG   Fmt  Attr PSize    PFree
      /dev/xvdX1      lvm2 a-   1024.00g 1024.00g

* Add it to the main volume group::

    sudo vgextend main ${DEVICE}1

* However, if the volume group does not exist yet, you can create it::

    sudo vgcreate main ${DEVICE}1
Creating a New Logical Volume
-----------------------------
Make sure there is enough space in the volume group::

  $ sudo vgs
    VG   #PV #LV #SN Attr   VSize VFree
    main   4   2   0 wz--n- 2.00t 347.98g

If not, see `Adding a New Device`_.

Create the new logical volume and initialize the filesystem::

  NAME=newvolumename
  sudo lvcreate -L1500GB -n $NAME main
  sudo mkfs.ext4 -m 0 -j -L $NAME /dev/main/$NAME
  sudo tune2fs -i 0 -c 0 /dev/main/$NAME

Be sure to add it to ``/etc/fstab``.
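For example, one hypothetical entry using the filesystem label set
above (adjust the mountpoint and options to suit the service)::

  LABEL=newvolumename  /srv/newvolumename  ext4  noatime  0  2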
Expanding an Existing Logical Volume
------------------------------------
Make sure there is enough space in the volume group::

  $ sudo vgs
    VG   #PV #LV #SN Attr   VSize VFree
    main   4   2   0 wz--n- 2.00t 347.98g

If not, see `Adding a New Device`_.

The following example increases the size of a volume by 100G::

  NAME=volumename
  sudo lvextend -L+100G /dev/main/$NAME
  sudo resize2fs /dev/main/$NAME

The following example increases the size of a volume to the maximum
allowable::

  NAME=volumename
  sudo lvextend -l +100%FREE /dev/main/$NAME
  sudo resize2fs /dev/main/$NAME
Replace an Existing Device
--------------------------
We generally need to do this if our cloud provider is planning
maintenance on a volume.  We usually get a few days' heads-up on the
maintenance window, so depending on the size of the volume, it may
take some time to replace.

The first thing to do is add the replacement device to the server;
see `Adding a New Device`_.  Be sure the replacement volume is the
same type and size as the existing one.

If the steps above were followed, you should see something like::

  $ sudo pvs
    PV         VG   Fmt  Attr PSize  PFree
    /dev/xvdb1 main lvm2 a--  50.00g     0
    /dev/xvdc1 main lvm2 a--  50.00g 50.00g

Be sure both devices are in the same VG (volume group); if not, you
did not properly extend the device.
.. note::

   Be sure to use a screen session for the following step!

Next, move the data from one device to the other::

  $ sudo pvmove /dev/xvdb1 /dev/xvdc1
    /dev/xvdb1: Moved: 0.0%
    /dev/xvdb1: Moved: 1.8%
    ...
    ...
    /dev/xvdb1: Moved: 99.4%
    /dev/xvdb1: Moved: 100.0%
Confirm all the data was moved and the original device is empty (its
PFree equals its PSize)::

  $ sudo pvs
    PV         VG   Fmt  Attr PSize  PFree
    /dev/xvdb1 main lvm2 a--  50.00g 50.00g
    /dev/xvdc1 main lvm2 a--  50.00g     0

And remove the device from the main volume group::

  $ sudo vgreduce main /dev/xvdb1
    Removed "/dev/xvdb1" from volume group "main"

To be safe, we can also wipe the LVM label from the device::

  $ sudo pvremove /dev/xvdb1
    Labels on physical volume "/dev/xvdb1" successfully wiped

Leaving us with just a single device::

  $ sudo pvs
    PV         VG   Fmt  Attr PSize  PFree
    /dev/xvdc1 main lvm2 a--  50.00g     0
At this point, you can remove the original volume in OpenStack if it
is no longer needed; a sketch follows.
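Following the volume naming used earlier (hostnames hypothetical)::

  openstack server remove volume "HOSTNAME.openstack.org" "HOSTNAME.openstack.org/mainNN"
  openstack volume delete "HOSTNAME.openstack.org/mainNN"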
Email
=====
There is a shared email account used for Infrastructure related mail
(account sign-ups, support tickets, etc). Root admins should ensure
they have access to this account; access credentials are available
from any existing member.