[DOCS] Creating new folder for proposed operations guide

Moves pre-existing operations content to new folder

Change-Id: I5c177dda2bba47e835fbd77cd63df3b52864c4d4
Implements: blueprint create-ops-guide
Alexandra Settle 2016-11-16 16:16:23 +00:00
parent 3070530e88
commit a7f25d8162
11 changed files with 1043 additions and 0 deletions

extending.rst

@@ -0,0 +1,306 @@
===========================
Extending OpenStack-Ansible
===========================
The OpenStack-Ansible project provides a basic OpenStack environment, but
many deployers will wish to extend the environment based on their needs. This
could include installing extra services, changing package versions, or
overriding existing variables.
Using these extension points, deployers can provide a more 'opinionated'
installation of OpenStack that may include their own software.
Including OpenStack-Ansible in your project
-------------------------------------------
Including the openstack-ansible repository within another project can be
done in several ways.
1. A git submodule pointed to a released tag.
2. A script to automatically perform a git checkout of
openstack-ansible.
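For example, the submodule approach (option 1) might look like the following
sketch; the tag ``14.0.0`` is a placeholder for whichever release you target:
.. code-block:: shell-session
$ git submodule add https://git.openstack.org/openstack/openstack-ansible
$ cd openstack-ansible
$ git checkout 14.0.0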
When including OpenStack-Ansible in a project, consider using a parallel
directory structure as shown in the `ansible.cfg files`_ section.
Also note that copying files into directories such as `env.d`_ or
`conf.d`_ should be handled via some sort of script within the extension
project.
ansible.cfg files
-----------------
You can create your own playbook, variable, and role structure while still
including the OpenStack-Ansible roles and libraries by putting an
``ansible.cfg`` file in your ``playbooks`` directory.
The relevant options for Ansible 1.9 (included in OpenStack-Ansible)
are as follows:
``library``
This variable should point to
``openstack-ansible/playbooks/library``. Doing so allows roles and
playbooks to access OpenStack-Ansible's included Ansible modules.
``roles_path``
This variable should point to
``openstack-ansible/playbooks/roles``. This allows Ansible to
properly look up any OpenStack-Ansible roles that extension roles
may reference.
``inventory``
This variable should point to
``openstack-ansible/playbooks/inventory``. With this setting,
extensions have access to the same dynamic inventory that
OpenStack-Ansible uses.
Note that the paths to the ``openstack-ansible`` top-level directory can be
relative in this file.
Consider this directory structure::
my_project
|
|- custom_stuff
| |
| |- playbooks
|- openstack-ansible
| |
| |- playbooks
The variables in ``my_project/custom_stuff/playbooks/ansible.cfg`` would use
``../../openstack-ansible/playbooks/<directory>``.
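Assuming that layout, a minimal ``my_project/custom_stuff/playbooks/ansible.cfg``
might look like the following sketch:
.. code-block:: ini
[defaults]
library = ../../openstack-ansible/playbooks/library
roles_path = ../../openstack-ansible/playbooks/roles
inventory = ../../openstack-ansible/playbooks/inventory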
env.d
-----
The ``/etc/openstack_deploy/env.d`` directory sources all YAML files into the
deployed environment, allowing a deployer to define additional group mappings.
This directory is used to extend the environment skeleton, or modify the
defaults defined in the ``playbooks/inventory/env.d`` directory.
See also `Understanding Container Groups`_ in Appendix C.
.. _Understanding Container Groups: ../install-guide/app-custom-layouts.html#understanding-container-groups
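As a sketch, a hypothetical ``/etc/openstack_deploy/env.d/example.yml`` that
maps a new ``example_api`` component into its own container group might look
like the following; the service and group names are illustrative, and the
skeleton keys should be checked against ``playbooks/inventory/env.d``:
.. code-block:: yaml
component_skel:
  example_api:
    belongs_to:
      - example_all
container_skel:
  example_api_container:
    belongs_to:
      - infra_containers
    contains:
      - example_api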
conf.d
------
Common OpenStack services and their configuration are defined by
OpenStack-Ansible in the
``/etc/openstack_deploy/openstack_user_config.yml`` settings file.
Additional services should be defined with a YAML file in
``/etc/openstack_deploy/conf.d``, in order to manage file size.
See also `Understanding Host Groups`_ in Appendix C.
.. _Understanding Host Groups: ../install-guide/app-custom-layouts.html#understanding-host-groups
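For example, a hypothetical ``/etc/openstack_deploy/conf.d/example.yml`` could
map the new service's host group to physical hosts; the host name and IP
address are placeholders:
.. code-block:: yaml
example_api_hosts:
  infra1:
    ip: 172.29.236.101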
user\_*.yml files
-----------------
Files in ``/etc/openstack_deploy`` beginning with ``user_`` will be
automatically sourced in any ``openstack-ansible`` command. Alternatively,
the files can be sourced with the ``-e`` parameter of the ``ansible-playbook``
command.
``user_variables.yml`` and ``user_secrets.yml`` are used directly by
OpenStack-Ansible. Adding custom variables used by your own roles and
playbooks to these files is not recommended. Doing so will complicate your
upgrade path by making comparison of your existing files with later versions
of these files more arduous. Rather, recommended practice is to place your own
variables in files named following the ``user_*.yml`` pattern so they will be
sourced alongside those used exclusively by OpenStack-Ansible.
Ordering and Precedence
+++++++++++++++++++++++
``user_*.yml`` files are plain YAML variable files. They will be sourced
in alphanumeric order by ``openstack-ansible``.
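For example, custom variables could live in a hypothetical
``/etc/openstack_deploy/user_zzz_custom.yml``, which sorts after (and can
therefore override) the standard ``user_variables.yml``:
.. code-block:: yaml
# Hypothetical deployer-specific variables
my_custom_role_enabled: true
my_custom_package_list:
  - vim
  - tmux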
.. _adding-galaxy-roles:
Adding Galaxy roles
-------------------
Any roles defined in ``openstack-ansible/ansible-role-requirements.yml``
will be installed by the
``openstack-ansible/scripts/bootstrap-ansible.sh`` script.
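Entries in ``ansible-role-requirements.yml`` follow the standard Ansible
Galaxy requirements format; a sketch with a hypothetical role:
.. code-block:: yaml
- name: my_custom_role
  scm: git
  src: https://git.example.org/example-org/ansible-role-my-custom-role
  version: master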
Setting overrides in configuration files
----------------------------------------
All of the services that use YAML, JSON, or INI for configuration can receive
overrides through the use of an Ansible action plugin named ``config_template``.
The configuration template engine allows a deployer to use a simple dictionary
to modify or add items into configuration files at run time that may not have a
preset template option. All OpenStack-Ansible roles allow for this
functionality where applicable. Files available to receive overrides can be
seen in the ``defaults/main.yml`` file as standard empty dictionaries (hashes).
Practical guidance for using this feature is available in the `Install Guide`_.
This module has been `submitted for consideration`_ into Ansible Core.
.. _Install Guide: ../install-guide/app-advanced-config-override.html
.. _submitted for consideration: https://github.com/ansible/ansible/pull/12555
Build the environment with additional Python packages
+++++++++++++++++++++++++++++++++++++++++++++++++++++
The system allows you to install and build any Python package that is
installable with pip. The repository infrastructure looks for and builds any
Git-based or PyPI-installable package. When the package is built, the
repo-build role creates the sources as Python wheels to extend the base system
and requirements.
While the packages pre-built in the repository infrastructure are
comprehensive, you may need to change the source locations and versions of
packages to suit different deployment needs. Adding additional repositories as
overrides is as simple as listing entries within the variable file of your
choice. Any ``user_*.yml`` file within the ``/etc/openstack_deploy``
directory will work to facilitate the addition of new packages.
.. code-block:: yaml
swift_git_repo: https://private-git.example.org/example-org/swift
swift_git_install_branch: master
Additional lists of Python packages can also be overridden using a
``user_*.yml`` variable file.
.. code-block:: yaml
swift_requires_pip_packages:
- virtualenv
- virtualenv-tools
- python-keystoneclient
- NEW-SPECIAL-PACKAGE
Once the variables are set, run the ``repo-build.yml`` play to build all of the
wheels within the repository infrastructure. When ready, run the target plays to
deploy your overridden source code.
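For example, assuming the playbooks live in the usual
``/opt/openstack-ansible/playbooks`` location:
.. code-block:: shell-session
# cd /opt/openstack-ansible/playbooks
# openstack-ansible repo-build.yml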
Module documentation
++++++++++++++++++++
These are the options available, as found within the module's embedded
documentation section.
.. code-block:: yaml
module: config_template
version_added: 1.9.2
short_description: >
Renders template files providing a create/update override interface
description:
- The module contains the template functionality with the ability to
override items in config, in transit, through the use of a simple
dictionary without having to write out various temp files on target
machines. The module renders all of the potential jinja a user could
provide in both the template file and in the override dictionary which
is ideal for deployers who may have lots of different configs using a
similar code base.
- The module is an extension of the **copy** module and all of the attributes
that can be set there are available to be set here.
options:
src:
description:
- Path of a Jinja2 formatted template on the local server. This can
be a relative or absolute path.
required: true
default: null
dest:
description:
- Location to render the template to on the remote machine.
required: true
default: null
config_overrides:
description:
- A dictionary used to update or override items within a configuration
template. The dictionary data structure may be nested. If the target
config file is an INI file, the nested keys in the ``config_overrides``
will be used as section headers.
config_type:
description:
- A string value describing the target config type.
choices:
- ini
- json
- yaml
Example task using the "config_template" module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
- name: Run config template ini
config_template:
src: test.ini.j2
dest: /tmp/test.ini
config_overrides: "{{ test_overrides }}"
config_type: ini
Example overrides dictionary (hash)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
test_overrides:
DEFAULT:
new_item: 12345
Original template file "test.ini.j2"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: ini
[DEFAULT]
value1 = abc
value2 = 123
Rendered on disk file "/tmp/test.ini"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: ini
[DEFAULT]
value1 = abc
value2 = 123
new_item = 12345
In this task the ``test.ini.j2`` file is a template which will be rendered and
written to disk at ``/tmp/test.ini``. The **config_overrides** entry is a
dictionary (hash) that allows a deployer to set arbitrary data as overrides to
be written into the configuration file at run time. The **config_type** entry
specifies the type of configuration file the module will be interacting with;
available options are "yaml", "json", and "ini".
Discovering Available Overrides
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
All of these options can be specified in any way that suits your deployment.
For ease of use and flexibility, we recommend that you define your
overrides in a user variable file such as
``/etc/openstack_deploy/user_variables.yml``.
The list of overrides available may be found by executing:
.. code-block:: bash
find . -name "main.yml" -exec grep '_.*_overrides:' {} \; \
| grep -v "^#" \
| sort -u

index.rst

@@ -0,0 +1,18 @@
==================================
OpenStack-Ansible operations guide
==================================
This is a draft index page for the proposed OpenStack-Ansible
operations guide.
.. toctree::
:maxdepth: 2
ops-lxc-commands.rst
ops-add-computehost.rst
ops-remove-computehost.rst
ops-galera.rst
ops-tips.rst
ops-troubleshooting.rst
extending.rst

ops-add-computehost.rst

@@ -0,0 +1,29 @@
=====================
Adding a compute host
=====================
Use the following procedure to add a compute host to an operational
cluster.
#. Configure the host as a target host. See `Prepare target hosts
<http://docs.openstack.org/developer/openstack-ansible/install-guide/targethosts.html>`_
for more information.
#. Edit the ``/etc/openstack_deploy/openstack_user_config.yml`` file and
add the host to the ``compute_hosts`` stanza (a sketch of this stanza
appears after this procedure).
If necessary, also modify the ``used_ips`` stanza.
#. If the cluster is using Telemetry/Metering (Ceilometer),
edit the ``/etc/openstack_deploy/conf.d/ceilometer.yml`` file and add the
host to the ``metering-compute_hosts`` stanza.
#. Run the following commands to add the host. Replace
``NEW_HOST_NAME`` with the name of the new host.
.. code-block:: shell-session
# cd /opt/openstack-ansible/playbooks
# openstack-ansible setup-hosts.yml --limit NEW_HOST_NAME
# openstack-ansible setup-openstack.yml --skip-tags nova-key-distribute --limit NEW_HOST_NAME
# openstack-ansible setup-openstack.yml --tags nova-key --limit compute_hosts
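As referenced in step 2, the ``compute_hosts`` stanza might look like the
following sketch; the IP address is a placeholder for your environment:
.. code-block:: yaml
compute_hosts:
  NEW_HOST_NAME:
    ip: 172.29.236.200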

ops-galera-recovery.rst

@@ -0,0 +1,302 @@
=======================
Galera cluster recovery
=======================
Run the ``galera-install`` playbook using the ``galera-bootstrap`` tag to
automatically recover a node or an entire environment.
#. Run the following Ansible command to show the failed nodes:
.. code-block:: shell-session
# openstack-ansible galera-install.yml --tags galera-bootstrap
The cluster comes back online after completion of this command.
Single-node failure
~~~~~~~~~~~~~~~~~~~
If a single node fails, the other nodes maintain quorum and
continue to process SQL requests.
#. Run the following Ansible command to determine the failed node:
.. code-block:: shell-session
# ansible galera_container -m shell -a "mysql -h localhost \
-e 'show status like \"%wsrep_cluster_%\";'"
node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server through
socket '/var/run/mysqld/mysqld.sock' (111)
node2_galera_container-49a47d25 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 17
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
node4_galera_container-76275635 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 17
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
In this example, node 3 has failed.
#. Restart MariaDB on the failed node and verify that it rejoins the
cluster (see the sketch after this procedure).
#. If MariaDB fails to start, run the ``mysqld`` command and perform
further analysis on the output. As a last resort, rebuild the container
for the node.
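A sketch of the restart in step 2, assuming the failed container from the
example output above and the ``mysql`` init service name:
.. code-block:: shell-session
# lxc-attach --name node3_galera_container-3ea2cbd3
# service mysql restart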
Multi-node failure
~~~~~~~~~~~~~~~~~~
When all but one node fails, the remaining node cannot achieve quorum and
stops processing SQL requests. In this situation, failed nodes that
recover cannot join the cluster because it no longer exists.
#. Run the following Ansible command to show the failed nodes:
.. code-block:: shell-session
# ansible galera_container -m shell -a "mysql \
-h localhost -e 'show status like \"%wsrep_cluster_%\";'"
node2_galera_container-49a47d25 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (111)
node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (111)
node4_galera_container-76275635 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 18446744073709551615
wsrep_cluster_size 1
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status non-Primary
In this example, nodes 2 and 3 have failed. The remaining operational
server indicates ``non-Primary`` because it cannot achieve quorum.
#. Run the following command to
`rebootstrap <http://galeracluster.com/documentation-webpages/quorumreset.html#id1>`_
the operational node into the cluster:
.. code-block:: shell-session
# mysql -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=yes';"
node4_galera_container-76275635 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 15
wsrep_cluster_size 1
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (111)
node2_galera_container-49a47d25 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (111)
The remaining operational node becomes the primary node and begins
processing SQL requests.
#. Restart MariaDB on the failed nodes and verify that they rejoin the
cluster:
.. code-block:: shell-session
# ansible galera_container -m shell -a "mysql \
-h localhost -e 'show status like \"%wsrep_cluster_%\";'"
node3_galera_container-3ea2cbd3 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 17
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
node2_galera_container-49a47d25 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 17
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
node4_galera_container-76275635 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 17
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
#. If MariaDB fails to start on any of the failed nodes, run the
``mysqld`` command and perform further analysis on the output. As a
last resort, rebuild the container for the node.
Complete failure
~~~~~~~~~~~~~~~~
If all of the nodes in a Galera cluster fail (that is, they do not shut down
gracefully), restore from backup. Run the following command to determine
whether all nodes in the cluster have failed:
.. code-block:: shell-session
# ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"
node3_galera_container-3ea2cbd3 | success | rc=0 >>
# GALERA saved state
version: 2.1
uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno: -1
cert_index:
node2_galera_container-49a47d25 | success | rc=0 >>
# GALERA saved state
version: 2.1
uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno: -1
cert_index:
node4_galera_container-76275635 | success | rc=0 >>
# GALERA saved state
version: 2.1
uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno: -1
cert_index:
All the nodes have failed if ``mysqld`` is not running on any of the
nodes and all of the nodes contain a ``seqno`` value of -1.
If any single node has a positive ``seqno`` value, then that node can be
used to restart the cluster. However, because there is no guarantee that
each node has an identical copy of the data, we do not recommend
restarting the cluster by using the ``--wsrep-new-cluster`` option on one
node.
Rebuilding a container
~~~~~~~~~~~~~~~~~~~~~~
Recovering from certain failures requires rebuilding one or more containers.
#. Disable the failed node on the load balancer.
.. note::
Do not rely on the load balancer health checks to disable the node.
If the node is not disabled, the load balancer sends SQL requests
to it before it rejoins the cluster, which can cause data inconsistencies.
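For example, with the default HAProxy load balancer, the node could be
disabled through the admin socket; the backend name, server name, and socket
path here are illustrative and depend on your configuration:
.. code-block:: shell-session
# echo "disable server galera-back/node3_galera_container-3ea2cbd3" \
| socat /var/run/haproxy.stat stdio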
#. Destroy the container and remove MariaDB data stored outside
of the container:
.. code-block:: shell-session
# lxc-stop -n node3_galera_container-3ea2cbd3
# lxc-destroy -n node3_galera_container-3ea2cbd3
# rm -rf /openstack/node3_galera_container-3ea2cbd3/*
In this example, node 3 failed.
#. Run the host setup playbook to rebuild the container on node 3:
.. code-block:: shell-session
# openstack-ansible setup-hosts.yml -l node3 \
-l node3_galera_container-3ea2cbd3
The playbook restarts all other containers on the node.
#. Run the infrastructure playbook to configure the container
specifically on node 3:
.. code-block:: shell-session
# openstack-ansible setup-infrastructure.yml \
-l node3_galera_container-3ea2cbd3
.. warning::
The new container runs a single-node Galera cluster, which is a dangerous
state because the environment contains more than one active database
with potentially different data.
.. code-block:: shell-session
# ansible galera_container -m shell -a "mysql \
-h localhost -e 'show status like \"%wsrep_cluster_%\";'"
node3_galera_container-3ea2cbd3 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 1
wsrep_cluster_size 1
wsrep_cluster_state_uuid da078d01-29e5-11e4-a051-03d896dbdb2d
wsrep_cluster_status Primary
node2_galera_container-49a47d25 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 4
wsrep_cluster_size 2
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
node4_galera_container-76275635 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 4
wsrep_cluster_size 2
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
#. Restart MariaDB in the new container and verify that it rejoins the
cluster.
.. note::
In larger deployments, it may take some time for the MariaDB daemon to
start in the new container. It will be synchronizing data from the other
MariaDB servers during this time. You can monitor the status during this
process by tailing the ``/var/log/mysql_logs/galera_server_error.log``
log file.
Lines starting with ``WSREP_SST`` will appear during the sync process
and you should see a line with ``WSREP: SST complete, seqno: <NUMBER>``
if the sync was successful.
.. code-block:: shell-session
# ansible galera_container -m shell -a "mysql \
-h localhost -e 'show status like \"%wsrep_cluster_%\";'"
node2_galera_container-49a47d25 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 5
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
node3_galera_container-3ea2cbd3 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 5
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
node4_galera_container-76275635 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 5
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
#. Enable the failed node on the load balancer.

ops-galera-remove.rst

@@ -0,0 +1,32 @@
==============
Removing nodes
==============
In the following example, all but one node was shut down gracefully:
.. code-block:: shell-session
# ansible galera_container -m shell -a "mysql -h localhost \
-e 'show status like \"%wsrep_cluster_%\";'"
node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (2)
node2_galera_container-49a47d25 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (2)
node4_galera_container-76275635 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 7
wsrep_cluster_size 1
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
Compare this example output with the output from the multi-node failure
scenario where the remaining operational node is non-primary and stops
processing SQL requests. Gracefully shutting down the MariaDB service on
all but one node allows the remaining operational node to continue
processing SQL requests. When gracefully shutting down multiple nodes,
perform the actions sequentially to retain operation.
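One way to gracefully stop MariaDB on a single node is an ad hoc Ansible
command against that node's container; this is a sketch, and the init command
may differ by distribution:
.. code-block:: shell-session
# ansible node2_galera_container-49a47d25 -m shell \
-a "/etc/init.d/mysql stop"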

ops-galera-start.rst

@@ -0,0 +1,88 @@
==================
Starting a cluster
==================
Gracefully shutting down all nodes destroys the cluster. Starting or
restarting a cluster from zero nodes requires creating a new cluster on
one of the nodes.
#. Start a new cluster on the most advanced node.
Check the ``seqno`` value in the ``grastate.dat`` file on all of the nodes:
.. code-block:: shell-session
# ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"
node2_galera_container-49a47d25 | success | rc=0 >>
# GALERA saved state
version: 2.1
uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno: 31
cert_index:
node3_galera_container-3ea2cbd3 | success | rc=0 >>
# GALERA saved state
version: 2.1
uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno: 31
cert_index:
node4_galera_container-76275635 | success | rc=0 >>
# GALERA saved state
version: 2.1
uuid: 338b06b0-2948-11e4-9d06-bef42f6c52f1
seqno: 31
cert_index:
In this example, all nodes in the cluster contain the same positive
``seqno`` values as they were synchronized just prior to
graceful shutdown. If all ``seqno`` values are equal, any node can
start the new cluster.
.. code-block:: shell-session
# /etc/init.d/mysql start --wsrep-new-cluster
This command results in a cluster containing a single node. The
``wsrep_cluster_size`` value shows the number of nodes in the
cluster.
.. code-block:: shell-session
# ansible galera_container -m shell -a "mysql \
-h localhost -e 'show status like \"%wsrep_cluster_%\";'"
node2_galera_container-49a47d25 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (111)
node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
ERROR 2002 (HY000): Can't connect to local MySQL server
through socket '/var/run/mysqld/mysqld.sock' (2)
node4_galera_container-76275635 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 1
wsrep_cluster_size 1
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
#. Restart MariaDB on the other nodes and verify that they rejoin the
cluster.
.. code-block:: shell-session
# ansible galera_container -m shell -a "mysql \
-h localhost -e 'show status like \"%wsrep_cluster_%\";'"
node2_galera_container-49a47d25 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 3
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
node3_galera_container-3ea2cbd3 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 3
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary
node4_galera_container-76275635 | success | rc=0 >>
Variable_name Value
wsrep_cluster_conf_id 3
wsrep_cluster_size 3
wsrep_cluster_state_uuid 338b06b0-2948-11e4-9d06-bef42f6c52f1
wsrep_cluster_status Primary

ops-galera.rst

@@ -0,0 +1,18 @@
==========================
Galera cluster maintenance
==========================
.. toctree::
ops-galera-remove.rst
ops-galera-start.rst
ops-galera-recovery.rst
Routine maintenance includes gracefully adding or removing nodes from
the cluster without impacting operation and also starting a cluster
after gracefully shutting down all nodes.
MySQL instances are restarted when creating a cluster, when adding a
node, when the service is not running, or when changes are made to the
``/etc/mysql/my.cnf`` configuration file.

ops-lxc-commands.rst

@@ -0,0 +1,38 @@
========================
Linux Container commands
========================
The following are some useful commands to manage LXC:
- List containers and summary information such as operational state and
network configuration:
.. code-block:: shell-session
# lxc-ls --fancy
- Show container details including operational state, resource
utilization, and ``veth`` pairs:
.. code-block:: shell-session
# lxc-info --name container_name
- Start a container:
.. code-block:: shell-session
# lxc-start --name container_name
- Attach to a container:
.. code-block:: shell-session
# lxc-attach --name container_name
- Stop a container:
.. code-block:: shell-session
# lxc-stop --name container_name

ops-remove-computehost.rst

@@ -0,0 +1,49 @@
=======================
Removing a compute host
=======================
The `openstack-ansible-ops <https://git.openstack.org/cgit/openstack/openstack-ansible-ops>`_
repository contains a playbook for removing a compute host from an
OpenStack-Ansible (OSA) environment.
To remove a compute host, follow the procedure below.
.. note::
This guide describes how to remove a compute node from an OSA environment
completely. Perform these steps with caution, as the compute node will no
longer be in service after the steps have been completed. This guide assumes
that all data and instances have been properly migrated.
#. Disable all OpenStack services running on the compute node.
This can include, but is not limited to, the ``nova-compute`` service
and the neutron agent service.
.. note::
Ensure that this step is performed first.
.. code-block:: console
# Run these commands on the compute node to be removed
# stop nova-compute
# stop neutron-linuxbridge-agent
#. Clone the ``openstack-ansible-ops`` repository to your deployment host:
.. code-block:: console
$ git clone https://git.openstack.org/openstack/openstack-ansible-ops \
/opt/openstack-ansible-ops
#. Run the ``remove_compute_node.yml`` Ansible playbook with the
``node_to_be_removed`` user variable set:
.. code-block:: console
$ cd /opt/openstack-ansible-ops/ansible_tools/playbooks
$ openstack-ansible remove_compute_node.yml \
-e node_to_be_removed="<name-of-compute-host>"
#. After the playbook completes, remove the compute node from the
OpenStack-Ansible configuration file in
``/etc/openstack_deploy/openstack_user_config.yml``.

ops-tips.rst

@@ -0,0 +1,38 @@
===============
Tips and tricks
===============
Ansible forks
~~~~~~~~~~~~~
The default ``MaxSessions`` setting for the OpenSSH daemon is 10. Each Ansible
fork makes use of a session. By default, Ansible sets the number of forks to
5. However, you can increase the number of forks used in order to improve
deployment performance in large environments.
Note that more than 10 forks will cause issues for any playbooks
which use ``delegate_to`` or ``local_action`` in the tasks. It is
recommended that you do not raise the number of forks when executing against
the control plane, as this is where delegation is most often used.
The number of forks used may be changed on a permanent basis by setting
the ``ANSIBLE_FORKS`` environment variable in your ``.bashrc`` file.
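For example, the following appends the setting to ``.bashrc``; the value 25 is
illustrative:
.. code-block:: shell-session
# echo 'export ANSIBLE_FORKS=25' >> ~/.bashrc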
Alternatively, it can be changed for a particular playbook execution by using
the ``--forks`` CLI parameter. For example, the following executes the nova
playbook against the control plane with 10 forks, then against the compute
nodes with 50 forks.
.. code-block:: shell-session
# openstack-ansible --forks 10 os-nova-install.yml --limit compute_containers
# openstack-ansible --forks 50 os-nova-install.yml --limit compute_hosts
For more information about forks, please see the following references:
* OpenStack-Ansible `Bug 1479812`_
* Ansible `forks`_ entry for ansible.cfg
* `Ansible Performance Tuning`_
.. _Bug 1479812: https://bugs.launchpad.net/openstack-ansible/+bug/1479812
.. _forks: http://docs.ansible.com/ansible/intro_configuration.html#forks
.. _Ansible Performance Tuning: https://www.ansible.com/blog/ansible-performance-tuning

ops-troubleshooting.rst

@@ -0,0 +1,125 @@
===============
Troubleshooting
===============
Host kernel upgrade from version 3.13
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ubuntu kernel packages newer than version 3.13 contain a change in
module naming from ``nf_conntrack`` to ``br_netfilter``. After
upgrading the kernel, re-run the ``openstack-hosts-setup.yml``
playbook against those hosts. See `OSA bug 1579963`_ for more
information.
.. _OSA bug 1579963: https://bugs.launchpad.net/openstack-ansible/+bug/1579963
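For example, assuming the usual playbook location; add a ``--limit`` for the
upgraded hosts as appropriate:
.. code-block:: shell-session
# cd /opt/openstack-ansible/playbooks
# openstack-ansible openstack-hosts-setup.yml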
Container networking issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~
All LXC containers on the host have two virtual Ethernet interfaces:
* ``eth0`` in the container connects to ``lxcbr0`` on the host
* ``eth1`` in the container connects to ``br-mgmt`` on the host
.. note::
Some containers, such as ``cinder``, ``glance``, ``neutron_agents``, and
``swift_proxy``, have more than two interfaces to support their
functions.
Predictable interface naming
----------------------------
On the host, all virtual Ethernet devices are named based on their
container as well as the name of the interface inside the container:
.. code-block:: shell-session
${CONTAINER_UNIQUE_ID}_${NETWORK_DEVICE_NAME}
As an example, an all-in-one (AIO) build might provide a utility
container called ``aio1_utility_container-d13b7132``. That container
will have two network interfaces: ``d13b7132_eth0`` and ``d13b7132_eth1``.
Another option would be to use the LXC tools to retrieve information
about the utility container:
.. code-block:: shell-session
# lxc-info -n aio1_utility_container-d13b7132
Name: aio1_utility_container-d13b7132
State: RUNNING
PID: 8245
IP: 10.0.3.201
IP: 172.29.237.204
CPU use: 79.18 seconds
BlkIO use: 678.26 MiB
Memory use: 613.33 MiB
KMem use: 0 bytes
Link: d13b7132_eth0
TX bytes: 743.48 KiB
RX bytes: 88.78 MiB
Total bytes: 89.51 MiB
Link: d13b7132_eth1
TX bytes: 412.42 KiB
RX bytes: 17.32 MiB
Total bytes: 17.73 MiB
The ``Link:`` lines will show the network interfaces that are attached
to the utility container.
Reviewing container networking traffic
--------------------------------------
To dump traffic on the ``br-mgmt`` bridge, use ``tcpdump`` to see all
communications between the various containers. To narrow the focus,
run ``tcpdump`` only on the desired network interface of the
containers.
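For example, using the interface names described above; ``-n`` disables name
resolution:
.. code-block:: shell-session
# tcpdump -i br-mgmt -n
# tcpdump -i d13b7132_eth1 -n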
Cached Ansible facts issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~
At the beginning of a playbook run, information about each host is gathered.
Examples of the information gathered are:
* Linux distribution
* Kernel version
* Network interfaces
To improve performance, particularly in large deployments, you can
cache host facts and information.
OpenStack-Ansible enables fact caching by default. The facts are
cached in JSON files within ``/etc/openstack_deploy/ansible_facts``.
Fact caching can be disabled by commenting out the ``fact_caching``
parameter in ``playbooks/ansible.cfg``. Refer to the Ansible
documentation on `fact caching`_ for more details.
.. _fact caching: http://docs.ansible.com/ansible/playbooks_variables.html#fact-caching
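The relevant section of ``playbooks/ansible.cfg`` likely resembles the
following sketch, using Ansible's JSON-file fact cache; the exact values in
your checkout may differ:
.. code-block:: ini
[defaults]
fact_caching = jsonfile
fact_caching_connection = /etc/openstack_deploy/ansible_facts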
Forcing regeneration of cached facts
------------------------------------
Cached facts may be incorrect if the host receives a kernel upgrade or new
network interfaces. Newly created bridges also disrupt cached facts.
This can lead to unexpected errors while running playbooks and may
require that the cached facts be regenerated.
Run the following command to remove all currently cached facts for all hosts:
.. code-block:: shell-session
# rm /etc/openstack_deploy/ansible_facts/*
New facts will be gathered and cached during the next playbook run.
To clear facts for a single host, find its file within
``/etc/openstack_deploy/ansible_facts/`` and remove it. Each host has
a JSON file that is named after its hostname. The facts for that host
will be regenerated on the next playbook run.
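For example, to clear the cached facts for the utility container from the
earlier examples:
.. code-block:: shell-session
# rm /etc/openstack_deploy/ansible_facts/aio1_utility_container-d13b7132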