[DOCS] Creating new folder for proposed operations guide
Moves pre-existing operations content to new folder

Change-Id: I5c177dda2bba47e835fbd77cd63df3b52864c4d4
Implements: blueprint create-ops-guide
This commit is contained in:
parent
3070530e88
commit
a7f25d8162
306 doc/source/draft-operations-guide/extending.rst Normal file
@@ -0,0 +1,306 @@
===========================
Extending OpenStack-Ansible
===========================

The OpenStack-Ansible project provides a basic OpenStack environment, but
many deployers will wish to extend the environment based on their needs. This
could include installing extra services, changing package versions, or
overriding existing variables.

Using these extension points, deployers can provide a more 'opinionated'
installation of OpenStack that may include their own software.
Including OpenStack-Ansible in your project
-------------------------------------------

Including the openstack-ansible repository within another project can be
done in several ways:

1. A git submodule pointed to a released tag (a sketch follows this
   section).
2. A script to automatically perform a git checkout of
   openstack-ansible.

When including OpenStack-Ansible in a project, consider using a parallel
directory structure as shown in the `ansible.cfg files`_ section.

Also note that copying files into directories such as `env.d`_ or
`conf.d`_ should be handled via some sort of script within the extension
project.
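For example, pinning a released tag as a git submodule might look like the
following sketch (the tag ``12.0.0`` and the surrounding layout are
illustrative, not fixed conventions):

.. code-block:: shell-session

   $ git submodule add https://git.openstack.org/openstack/openstack-ansible
   $ cd openstack-ansible
   $ git checkout 12.0.0
   $ cd ..
   $ git add .gitmodules openstack-ansible
   $ git commit -m "Pin openstack-ansible to 12.0.0"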
ansible.cfg files
-----------------

You can create your own playbook, variable, and role structure while still
including the OpenStack-Ansible roles and libraries by putting an
``ansible.cfg`` file in your ``playbooks`` directory.

The relevant options for Ansible 1.9 (included in OpenStack-Ansible)
are as follows:

``library``
    This variable should point to
    ``openstack-ansible/playbooks/library``. Doing so allows roles and
    playbooks to access OpenStack-Ansible's included Ansible modules.

``roles_path``
    This variable should point to
    ``openstack-ansible/playbooks/roles``. This allows Ansible to
    properly look up any OpenStack-Ansible roles that extension roles
    may reference.

``inventory``
    This variable should point to
    ``openstack-ansible/playbooks/inventory``. With this setting,
    extensions have access to the same dynamic inventory that
    OpenStack-Ansible uses.

Note that the paths to the ``openstack-ansible`` top level directory can be
relative in this file.
Consider this directory structure::

    my_project
    |
    |- custom_stuff
    |  |
    |  |- playbooks
    |
    |- openstack-ansible
    |  |
    |  |- playbooks

The variables in ``my_project/custom_stuff/playbooks/ansible.cfg`` would use
``../openstack-ansible/playbooks/<directory>``.
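Putting the pieces together, a minimal ``ansible.cfg`` for this layout could
look like the sketch below. The ``[defaults]`` section and these three keys
are standard Ansible configuration; the relative paths assume the directory
structure above:

.. code-block:: ini

    [defaults]
    ; Re-use OpenStack-Ansible's modules, roles, and dynamic inventory
    library = ../openstack-ansible/playbooks/library
    roles_path = ../openstack-ansible/playbooks/roles
    inventory = ../openstack-ansible/playbooks/inventory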
env.d
-----

The ``/etc/openstack_deploy/env.d`` directory sources all YAML files into the
deployed environment, allowing a deployer to define additional group mappings.

This directory is used to extend the environment skeleton, or modify the
defaults defined in the ``playbooks/inventory/env.d`` directory.

See also `Understanding Container Groups`_ in Appendix C.

.. _Understanding Container Groups: ../install-guide/app-custom-layouts.html#understanding-container-groups
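As an illustration only, a hypothetical ``env.d`` file adding a new container
group might look like the following sketch. The service and group names are
invented for the example; the ``component_skel`` and ``container_skel`` keys
follow the layout of the files shipped in ``playbooks/inventory/env.d``:

.. code-block:: yaml

    # /etc/openstack_deploy/env.d/mycustom.yml (hypothetical)
    component_skel:
      mycustom_api:
        belongs_to:
          - mycustom_all

    container_skel:
      mycustom_container:
        belongs_to:
          - infra_containers
        contains:
          - mycustom_api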
conf.d
------

Common OpenStack services and their configuration are defined by
OpenStack-Ansible in the
``/etc/openstack_deploy/openstack_user_config.yml`` settings file.

Additional services should be defined with a YAML file in
``/etc/openstack_deploy/conf.d``, in order to manage file size.

See also `Understanding Host Groups`_ in Appendix C.

.. _Understanding Host Groups: ../install-guide/app-custom-layouts.html#understanding-host-groups
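A ``conf.d`` file follows the same host-group layout as
``openstack_user_config.yml``. As a hedged sketch, with an invented group
name and a placeholder management address:

.. code-block:: yaml

    # /etc/openstack_deploy/conf.d/mycustom.yml (hypothetical)
    mycustom-infra_hosts:
      infra1:
        ip: 172.29.236.101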
user\_*.yml files
-----------------

Files in ``/etc/openstack_deploy`` beginning with ``user_`` will be
automatically sourced in any ``openstack-ansible`` command. Alternatively,
the files can be sourced with the ``-e`` parameter of the ``ansible-playbook``
command.

``user_variables.yml`` and ``user_secrets.yml`` are used directly by
OpenStack-Ansible. Adding custom variables used by your own roles and
playbooks to these files is not recommended. Doing so will complicate your
upgrade path by making comparison of your existing files with later versions
of these files more arduous. Rather, recommended practice is to place your own
variables in files named following the ``user_*.yml`` pattern so they will be
sourced alongside those used exclusively by OpenStack-Ansible.

Ordering and Precedence
+++++++++++++++++++++++

``user_*.yml`` variables are just YAML variable files. They will be sourced
in alphanumeric order by ``openstack-ansible``.
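For instance, a deployer-specific file could sit alongside the standard ones
as a plain variable file. The file name and variables here are purely
illustrative:

.. code-block:: yaml

    # /etc/openstack_deploy/user_mycustom_variables.yml (hypothetical)
    # Sourced automatically, in alphanumeric order with the other
    # user_*.yml files.
    mycustom_feature_enabled: true
    mycustom_endpoint: "https://example.com/api"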
.. _adding-galaxy-roles:

Adding Galaxy roles
-------------------

Any roles defined in ``openstack-ansible/ansible-role-requirements.yml``
will be installed by the
``openstack-ansible/scripts/bootstrap-ansible.sh`` script.
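Entries in that file are standard Ansible role-requirement items. A sketch of
one entry, with an invented role name and source URL:

.. code-block:: yaml

    # Hypothetical entry in ansible-role-requirements.yml
    - name: mycustom_role
      scm: git
      src: https://github.com/example-org/mycustom_role
      version: master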
Setting overrides in configuration files
----------------------------------------

All of the services that use YAML, JSON, or INI for configuration can receive
overrides through the use of an Ansible action plugin named ``config_template``.
The configuration template engine allows a deployer to use a simple dictionary
to modify or add items into configuration files at run time that may not have a
preset template option. All OpenStack-Ansible roles allow for this
functionality where applicable. Files available to receive overrides can be
seen in the ``defaults/main.yml`` file as standard empty dictionaries (hashes).

Practical guidance for using this feature is available in the `Install Guide`_.

This module has been `submitted for consideration`_ into Ansible Core.

.. _Install Guide: ../install-guide/app-advanced-config-override.html
.. _submitted for consideration: https://github.com/ansible/ansible/pull/12555
Build the environment with additional python packages
+++++++++++++++++++++++++++++++++++++++++++++++++++++

The system will allow you to install and build any package that is a python
installable. The repository infrastructure will look for and create any
git based or PyPi installable package. When the package is built the repo-build
role will create the sources as Python wheels to extend the base system and
requirements.

While the packages pre-built in the repository-infrastructure are
comprehensive, it may be necessary to change the source locations and versions
of packages to suit different deployment needs. Adding additional repositories
as overrides is as simple as listing entries within the variable file of your
choice. Any ``user_*.yml`` file within the ``/etc/openstack_deploy``
directory will work to facilitate the addition of new packages.
.. code-block:: yaml

    swift_git_repo: https://private-git.example.org/example-org/swift
    swift_git_install_branch: master
Additional lists of python packages can also be overridden using a
``user_*.yml`` variable file.

.. code-block:: yaml

    swift_requires_pip_packages:
      - virtualenv
      - virtualenv-tools
      - python-keystoneclient
      - NEW-SPECIAL-PACKAGE
Once the variables are set, run the ``repo-build.yml`` play to build all of
the wheels within the repository infrastructure. When ready, run the target
plays to deploy your overridden source code.
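Assuming the standard ``/opt/openstack-ansible`` checkout used elsewhere in
this guide, that sequence might look like:

.. code-block:: shell-session

    # cd /opt/openstack-ansible/playbooks
    # openstack-ansible repo-build.yml
    # openstack-ansible os-swift-install.yml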
Module documentation
++++++++++++++++++++

These are the options available as found within the virtual module
documentation section.
.. code-block:: yaml

    module: config_template
    version_added: 1.9.2
    short_description: >
        Renders template files providing a create/update override interface
    description:
      - The module contains the template functionality with the ability to
        override items in config, in transit, through the use of a simple
        dictionary without having to write out various temp files on target
        machines. The module renders all of the potential jinja a user could
        provide in both the template file and in the override dictionary which
        is ideal for deployers who may have lots of different configs using a
        similar code base.
      - The module is an extension of the **copy** module and all of the
        attributes that can be set there are available to be set here.
    options:
      src:
        description:
          - Path of a Jinja2 formatted template on the local server. This can
            be a relative or absolute path.
        required: true
        default: null
      dest:
        description:
          - Location to render the template to on the remote machine.
        required: true
        default: null
      config_overrides:
        description:
          - A dictionary used to update or override items within a
            configuration template. The dictionary data structure may be
            nested. If the target config file is an ini file the nested keys
            in the ``config_overrides`` will be used as section headers.
      config_type:
        description:
          - A string value describing the target config type.
        choices:
          - ini
          - json
          - yaml
Example task using the "config_template" module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    - name: Run config template ini
      config_template:
        src: test.ini.j2
        dest: /tmp/test.ini
        config_overrides: "{{ test_overrides }}"
        config_type: ini
Example overrides dictionary (hash)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    test_overrides:
      DEFAULT:
        new_item: 12345
Original template file "test.ini.j2"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: ini

    [DEFAULT]
    value1 = abc
    value2 = 123
Rendered on disk file "/tmp/test.ini"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: ini

    [DEFAULT]
    value1 = abc
    value2 = 123
    new_item = 12345
In this task the ``test.ini.j2`` file is a template which will be rendered and
written to disk at ``/tmp/test.ini``. The **config_overrides** entry is a
dictionary (hash) which allows a deployer to set arbitrary data as overrides
to be written into the configuration file at run time. The **config_type**
entry specifies the type of configuration file the module will be interacting
with; available options are "yaml", "json", and "ini".
Discovering Available Overrides
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All of these options can be specified in any way that suits your deployment.
For ease of use and flexibility, we recommend that you define your
overrides in a user variable file such as
``/etc/openstack_deploy/user_variables.yml``.

The list of overrides available may be found by executing:

.. code-block:: bash

    find . -name "main.yml" -exec grep '_.*_overrides:' {} \; \
        | grep -v "^#" \
        | sort -u
18 doc/source/draft-operations-guide/index.rst Normal file
@@ -0,0 +1,18 @@
==================================
OpenStack-Ansible operations guide
==================================

This is a draft index page for the proposed OpenStack-Ansible
operations guide.

.. toctree::
   :maxdepth: 2

   ops-lxc-commands.rst
   ops-add-computehost.rst
   ops-remove-computehost.rst
   ops-galera.rst
   ops-tips.rst
   ops-troubleshooting.rst
   extending.rst
29 doc/source/draft-operations-guide/ops-add-computehost.rst Normal file
@@ -0,0 +1,29 @@
=====================
Adding a compute host
=====================

Use the following procedure to add a compute host to an operational
cluster.

#. Configure the host as a target host. See `Prepare target hosts
   <http://docs.openstack.org/developer/openstack-ansible/install-guide/targethosts.html>`_
   for more information.

#. Edit the ``/etc/openstack_deploy/openstack_user_config.yml`` file and
   add the host to the ``compute_hosts`` stanza (see the sketch after
   this procedure).

   If necessary, also modify the ``used_ips`` stanza.

#. If the cluster is utilizing Telemetry/Metering (Ceilometer),
   edit the ``/etc/openstack_deploy/conf.d/ceilometer.yml`` file and add the
   host to the ``metering-compute_hosts`` stanza.

#. Run the following commands to add the host. Replace
   ``NEW_HOST_NAME`` with the name of the new host.

   .. code-block:: shell-session

      # cd /opt/openstack-ansible/playbooks
      # openstack-ansible setup-hosts.yml --limit NEW_HOST_NAME
      # openstack-ansible setup-openstack.yml --skip-tags nova-key-distribute --limit NEW_HOST_NAME
      # openstack-ansible setup-openstack.yml --tags nova-key --limit compute_hosts
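The ``compute_hosts`` stanza mentioned in step 2 takes the host name and its
management address. A minimal sketch, assuming an illustrative management IP:

.. code-block:: yaml

    compute_hosts:
      NEW_HOST_NAME:
        ip: 172.29.236.200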
302 doc/source/draft-operations-guide/ops-galera-recovery.rst Normal file
@@ -0,0 +1,302 @@
=======================
Galera cluster recovery
=======================

Run the ``galera-install`` playbook using the ``galera-bootstrap`` tag
to automatically recover a node or an entire environment.

#. Run the following Ansible command to recover the failed node or
   environment:

   .. code-block:: shell-session

      # openstack-ansible galera-install.yml --tags galera-bootstrap

   The cluster comes back online after completion of this command.
Single-node failure
~~~~~~~~~~~~~~~~~~~

If a single node fails, the other nodes maintain quorum and
continue to process SQL requests.

#. Run the following Ansible command to determine the failed node:

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "mysql -h localhost \
        -e 'show status like \"%wsrep_cluster_%\";'"
      node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server through
      socket '/var/run/mysqld/mysqld.sock' (111)

      node2_galera_container-49a47d25 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     17
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     17
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

   In this example, node 3 has failed.

#. Restart MariaDB on the failed node and verify that it rejoins the
   cluster (a restart sketch follows this procedure).

#. If MariaDB fails to start, run the ``mysqld`` command and perform
   further analysis on the output. As a last resort, rebuild the container
   for the node.
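The restart in step 2 can be driven from the deployment host. A minimal
sketch, assuming the sysvinit service name used elsewhere in this guide and
the illustrative container name from the example output:

.. code-block:: shell-session

    # ansible node3_galera_container-3ea2cbd3 -m shell \
      -a "/etc/init.d/mysql restart"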
Multi-node failure
~~~~~~~~~~~~~~~~~~

When all but one node fails, the remaining node cannot achieve quorum and
stops processing SQL requests. In this situation, failed nodes that
recover cannot join the cluster because it no longer exists.

#. Run the following Ansible command to show the failed nodes:

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "mysql \
        -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
      node2_galera_container-49a47d25 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (111)

      node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (111)

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     18446744073709551615
      wsrep_cluster_size        1
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      non-Primary

   In this example, nodes 2 and 3 have failed. The remaining operational
   server indicates ``non-Primary`` because it cannot achieve quorum.

#. Run the following command to
   `rebootstrap <http://galeracluster.com/documentation-webpages/quorumreset.html#id1>`_
   the operational node into the cluster:

   .. code-block:: shell-session

      # mysql -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=yes';"
      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     15
      wsrep_cluster_size        1
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (111)

      node2_galera_container-49a47d25 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (111)

   The remaining operational node becomes the primary node and begins
   processing SQL requests.

#. Restart MariaDB on the failed nodes and verify that they rejoin the
   cluster:

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "mysql \
        -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
      node3_galera_container-3ea2cbd3 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     17
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node2_galera_container-49a47d25 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     17
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     17
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

#. If MariaDB fails to start on any of the failed nodes, run the
   ``mysqld`` command and perform further analysis on the output. As a
   last resort, rebuild the container for the node.
Complete failure
~~~~~~~~~~~~~~~~

Restore from backup if all of the nodes in a Galera cluster fail (do not
shut down gracefully). Run the following command to determine if all nodes in
the cluster have failed:

.. code-block:: shell-session

   # ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"
   node3_galera_container-3ea2cbd3 | success | rc=0 >>
   # GALERA saved state
   version: 2.1
   uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
   seqno: -1
   cert_index:

   node2_galera_container-49a47d25 | success | rc=0 >>
   # GALERA saved state
   version: 2.1
   uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
   seqno: -1
   cert_index:

   node4_galera_container-76275635 | success | rc=0 >>
   # GALERA saved state
   version: 2.1
   uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
   seqno: -1
   cert_index:

All the nodes have failed if ``mysqld`` is not running on any of the
nodes and all of the nodes contain a ``seqno`` value of -1.

If any single node has a positive ``seqno`` value, then that node can be
used to restart the cluster. However, because there is no guarantee that
each node has an identical copy of the data, we do not recommend
restarting the cluster using the ``--wsrep-new-cluster`` command on one
node.
Rebuilding a container
~~~~~~~~~~~~~~~~~~~~~~

Recovering from certain failures requires rebuilding one or more containers.

#. Disable the failed node on the load balancer.

   .. note::

      Do not rely on the load balancer health checks to disable the node.
      If the node is not disabled, the load balancer sends SQL requests
      to it before it rejoins the cluster, which can cause data
      inconsistencies.

#. Destroy the container and remove MariaDB data stored outside
   of the container:

   .. code-block:: shell-session

      # lxc-stop -n node3_galera_container-3ea2cbd3
      # lxc-destroy -n node3_galera_container-3ea2cbd3
      # rm -rf /openstack/node3_galera_container-3ea2cbd3/*

   In this example, node 3 failed.

#. Run the host setup playbook to rebuild the container on node 3:

   .. code-block:: shell-session

      # openstack-ansible setup-hosts.yml \
        -l node3,node3_galera_container-3ea2cbd3

   The playbook restarts all other containers on the node.

#. Run the infrastructure playbook to configure the container
   specifically on node 3:

   .. code-block:: shell-session

      # openstack-ansible setup-infrastructure.yml \
        -l node3_galera_container-3ea2cbd3

   .. warning::

      The new container runs a single-node Galera cluster, which is a
      dangerous state because the environment contains more than one
      active database with potentially different data.

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "mysql \
        -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
      node3_galera_container-3ea2cbd3 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     1
      wsrep_cluster_size        1
      wsrep_cluster_state_uuid  da078d01-29e5-11e4-a051-03d896dbdb2d
      wsrep_cluster_status      Primary

      node2_galera_container-49a47d25 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     4
      wsrep_cluster_size        2
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     4
      wsrep_cluster_size        2
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

#. Restart MariaDB in the new container and verify that it rejoins the
   cluster.

   .. note::

      In larger deployments, it may take some time for the MariaDB daemon to
      start in the new container. It will be synchronizing data from the other
      MariaDB servers during this time. You can monitor the status during this
      process by tailing the ``/var/log/mysql_logs/galera_server_error.log``
      log file.

      Lines starting with ``WSREP_SST`` will appear during the sync process
      and you should see a line with ``WSREP: SST complete, seqno: <NUMBER>``
      if the sync was successful.

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "mysql \
        -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
      node2_galera_container-49a47d25 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     5
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node3_galera_container-3ea2cbd3 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     5
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     5
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

#. Enable the failed node on the load balancer.
32 doc/source/draft-operations-guide/ops-galera-remove.rst Normal file
@@ -0,0 +1,32 @@
==============
Removing nodes
==============

In the following example, all but one node was shut down gracefully:

.. code-block:: shell-session

   # ansible galera_container -m shell -a "mysql -h localhost \
     -e 'show status like \"%wsrep_cluster_%\";'"
   node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
   ERROR 2002 (HY000): Can't connect to local MySQL server
   through socket '/var/run/mysqld/mysqld.sock' (2)

   node2_galera_container-49a47d25 | FAILED | rc=1 >>
   ERROR 2002 (HY000): Can't connect to local MySQL server
   through socket '/var/run/mysqld/mysqld.sock' (2)

   node4_galera_container-76275635 | success | rc=0 >>
   Variable_name             Value
   wsrep_cluster_conf_id     7
   wsrep_cluster_size        1
   wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
   wsrep_cluster_status      Primary

Compare this example output with the output from the multi-node failure
scenario where the remaining operational node is non-primary and stops
processing SQL requests. Gracefully shutting down the MariaDB service on
all but one node allows the remaining operational node to continue
processing SQL requests. When gracefully shutting down multiple nodes,
perform the actions sequentially to retain operation.
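A graceful shutdown itself can be issued per node from the deployment host.
A sketch, assuming the sysvinit service name used elsewhere in this guide and
an illustrative container name:

.. code-block:: shell-session

    # ansible node2_galera_container-49a47d25 -m shell \
      -a "/etc/init.d/mysql stop"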
88 doc/source/draft-operations-guide/ops-galera-start.rst Normal file
@@ -0,0 +1,88 @@
==================
Starting a cluster
==================

Gracefully shutting down all nodes destroys the cluster. Starting or
restarting a cluster from zero nodes requires creating a new cluster on
one of the nodes.

#. Start a new cluster on the most advanced node.
   Check the ``seqno`` value in the ``grastate.dat`` file on all of the nodes:

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"
      node2_galera_container-49a47d25 | success | rc=0 >>
      # GALERA saved state
      version: 2.1
      uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
      seqno: 31
      cert_index:

      node3_galera_container-3ea2cbd3 | success | rc=0 >>
      # GALERA saved state
      version: 2.1
      uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
      seqno: 31
      cert_index:

      node4_galera_container-76275635 | success | rc=0 >>
      # GALERA saved state
      version: 2.1
      uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
      seqno: 31
      cert_index:

   In this example, all nodes in the cluster contain the same positive
   ``seqno`` values as they were synchronized just prior to
   graceful shutdown. If all ``seqno`` values are equal, any node can
   start the new cluster.

   .. code-block:: shell-session

      # /etc/init.d/mysql start --wsrep-new-cluster

   This command results in a cluster containing a single node. The
   ``wsrep_cluster_size`` value shows the number of nodes in the
   cluster.

   .. code-block:: shell-session

      node2_galera_container-49a47d25 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (111)

      node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (2)

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     1
      wsrep_cluster_size        1
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

#. Restart MariaDB on the other nodes and verify that they rejoin the
   cluster.

   .. code-block:: shell-session

      node2_galera_container-49a47d25 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     3
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node3_galera_container-3ea2cbd3 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     3
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     3
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary
18 doc/source/draft-operations-guide/ops-galera.rst Normal file
@@ -0,0 +1,18 @@
==========================
Galera cluster maintenance
==========================

.. toctree::

   ops-galera-remove.rst
   ops-galera-start.rst
   ops-galera-recovery.rst

Routine maintenance includes gracefully adding or removing nodes from
the cluster without impacting operation and also starting a cluster
after gracefully shutting down all nodes.

MySQL instances are restarted when creating a cluster, when adding a
node, when the service is not running, or when changes are made to the
``/etc/mysql/my.cnf`` configuration file.
38 doc/source/draft-operations-guide/ops-lxc-commands.rst Normal file
@@ -0,0 +1,38 @@
========================
Linux Container commands
========================

The following are some useful commands to manage LXC:

- List containers and summary information such as operational state and
  network configuration:

  .. code-block:: shell-session

     # lxc-ls --fancy

- Show container details including operational state, resource
  utilization, and ``veth`` pairs:

  .. code-block:: shell-session

     # lxc-info --name container_name

- Start a container:

  .. code-block:: shell-session

     # lxc-start --name container_name

- Attach to a container:

  .. code-block:: shell-session

     # lxc-attach --name container_name

- Stop a container:

  .. code-block:: shell-session

     # lxc-stop --name container_name
49 doc/source/draft-operations-guide/ops-remove-computehost.rst Normal file
@@ -0,0 +1,49 @@
=======================
Removing a compute host
=======================

The `openstack-ansible-ops <https://git.openstack.org/cgit/openstack/openstack-ansible-ops>`_
repository contains a playbook for removing a compute host from an
OpenStack-Ansible (OSA) environment.
To remove a compute host, follow the procedure below.

.. note::

   This guide describes how to remove a compute node from an OSA environment
   completely. Perform these steps with caution, as the compute node will no
   longer be in service after the steps have been completed. This guide assumes
   that all data and instances have been properly migrated.

#. Disable all OpenStack services running on the compute node.
   This can include, but is not limited to, the ``nova-compute`` service
   and the neutron agent service.

   .. note::

      Ensure this step is performed first.

   .. code-block:: console

      # Run these commands on the compute node to be removed
      # stop nova-compute
      # stop neutron-linuxbridge-agent

#. Clone the ``openstack-ansible-ops`` repository to your deployment host:

   .. code-block:: console

      $ git clone https://git.openstack.org/openstack/openstack-ansible-ops \
        /opt/openstack-ansible-ops

#. Run the ``remove_compute_node.yml`` Ansible playbook with the
   ``node_to_be_removed`` user variable set:

   .. code-block:: console

      $ cd /opt/openstack-ansible-ops/ansible_tools/playbooks
      $ openstack-ansible remove_compute_node.yml \
        -e node_to_be_removed="<name-of-compute-host>"

#. After the playbook completes, remove the compute node from the
   OpenStack-Ansible configuration file in
   ``/etc/openstack_deploy/openstack_user_config.yml``.
38 doc/source/draft-operations-guide/ops-tips.rst Normal file
@@ -0,0 +1,38 @@
===============
Tips and tricks
===============

Ansible forks
~~~~~~~~~~~~~

The default MaxSessions setting for the OpenSSH Daemon is 10. Each Ansible
fork makes use of a session. By default, Ansible sets the number of forks to
5. However, you can increase the number of forks used in order to improve
deployment performance in large environments.

Note that more than 10 forks will cause issues for any playbooks
which use ``delegate_to`` or ``local_action`` in the tasks. It is
recommended that the number of forks is not raised when executing against the
Control Plane, as this is where delegation is most often used.

The number of forks used may be changed on a permanent basis by setting
``ANSIBLE_FORKS`` in your ``.bashrc`` file.
Alternatively it can be changed for a particular playbook execution by using
the ``--forks`` CLI parameter. For example, the following executes the nova
playbook against the control plane with 10 forks, then against the compute
nodes with 50 forks.

.. code-block:: shell-session

   # openstack-ansible --forks 10 os-nova-install.yml --limit compute_containers
   # openstack-ansible --forks 50 os-nova-install.yml --limit compute_hosts
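For the persistent option, a minimal sketch, assuming the
``openstack-ansible`` wrapper honors the ``ANSIBLE_FORKS`` environment
variable as described above:

.. code-block:: bash

   # Illustrative ~/.bashrc entry
   export ANSIBLE_FORKS=15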
For more information about forks, please see the following references:

* OpenStack-Ansible `Bug 1479812`_
* Ansible `forks`_ entry for ansible.cfg
* `Ansible Performance Tuning`_

.. _Bug 1479812: https://bugs.launchpad.net/openstack-ansible/+bug/1479812
.. _forks: http://docs.ansible.com/ansible/intro_configuration.html#forks
.. _Ansible Performance Tuning: https://www.ansible.com/blog/ansible-performance-tuning
125 doc/source/draft-operations-guide/ops-troubleshooting.rst Normal file
@@ -0,0 +1,125 @@
===============
Troubleshooting
===============

Host kernel upgrade from version 3.13
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ubuntu kernel packages newer than version 3.13 contain a change in
module naming from ``nf_conntrack`` to ``br_netfilter``. After
upgrading the kernel, re-run the ``openstack-hosts-setup.yml``
playbook against those hosts. See `OSA bug 1579963`_ for more
information.

.. _OSA bug 1579963: https://bugs.launchpad.net/openstack-ansible/+bug/1579963
Container networking issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~

All LXC containers on the host have two virtual Ethernet interfaces:

* `eth0` in the container connects to `lxcbr0` on the host
* `eth1` in the container connects to `br-mgmt` on the host

.. note::

   Some containers, such as ``cinder``, ``glance``, ``neutron_agents``, and
   ``swift_proxy``, have more than two interfaces to support their
   functions.
Predictable interface naming
----------------------------

On the host, all virtual Ethernet devices are named based on their
container as well as the name of the interface inside the container:

.. code-block:: shell-session

   ${CONTAINER_UNIQUE_ID}_${NETWORK_DEVICE_NAME}

As an example, an all-in-one (AIO) build might provide a utility
container called `aio1_utility_container-d13b7132`. That container
will have two network interfaces: `d13b7132_eth0` and `d13b7132_eth1`.

Another option would be to use the LXC tools to retrieve information
about the utility container:

.. code-block:: shell-session

   # lxc-info -n aio1_utility_container-d13b7132

   Name:           aio1_utility_container-d13b7132
   State:          RUNNING
   PID:            8245
   IP:             10.0.3.201
   IP:             172.29.237.204
   CPU use:        79.18 seconds
   BlkIO use:      678.26 MiB
   Memory use:     613.33 MiB
   KMem use:       0 bytes
   Link:           d13b7132_eth0
    TX bytes:      743.48 KiB
    RX bytes:      88.78 MiB
    Total bytes:   89.51 MiB
   Link:           d13b7132_eth1
    TX bytes:      412.42 KiB
    RX bytes:      17.32 MiB
    Total bytes:   17.73 MiB

The ``Link:`` lines will show the network interfaces that are attached
to the utility container.
Reviewing container networking traffic
--------------------------------------

To dump traffic on the ``br-mgmt`` bridge, use ``tcpdump`` to see all
communications between the various containers. To narrow the focus,
run ``tcpdump`` only on the desired network interface of the
containers.
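As a sketch, the first command watches everything crossing the bridge and
the second narrows to a single container interface (the interface name
reuses the illustrative utility container above):

.. code-block:: shell-session

   # tcpdump -ni br-mgmt
   # tcpdump -ni d13b7132_eth1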
Cached Ansible facts issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~

At the beginning of a playbook run, information about each host is gathered.
Examples of the information gathered are:

* Linux distribution
* Kernel version
* Network interfaces

To improve performance, particularly in large deployments, you can
cache host facts and information.

OpenStack-Ansible enables fact caching by default. The facts are
cached in JSON files within ``/etc/openstack_deploy/ansible_facts``.

Fact caching can be disabled by commenting out the ``fact_caching``
parameter in ``playbooks/ansible.cfg``. Refer to the Ansible
documentation on `fact caching`_ for more details.

.. _fact caching: http://docs.ansible.com/ansible/playbooks_variables.html#fact-caching

Forcing regeneration of cached facts
------------------------------------

Cached facts may be incorrect if the host receives a kernel upgrade or new
network interfaces. Newly created bridges also disrupt cached facts.

This can lead to unexpected errors while running playbooks, and
require that the cached facts be regenerated.

Run the following command to remove all currently cached facts for all hosts:

.. code-block:: shell-session

   # rm /etc/openstack_deploy/ansible_facts/*

New facts will be gathered and cached during the next playbook run.

To clear facts for a single host, find its file within
``/etc/openstack_deploy/ansible_facts/`` and remove it. Each host has
a JSON file that is named after its hostname. The facts for that host
will be regenerated on the next playbook run.