[DOCS] Creating new folder for proposed operations guide
Moves pre-existing operations content to new folder

Change-Id: I5c177dda2bba47e835fbd77cd63df3b52864c4d4
Implements: blueprint create-ops-guide
parent 3070530e88
commit a7f25d8162
306 doc/source/draft-operations-guide/extending.rst Normal file
@@ -0,0 +1,306 @@
===========================
Extending OpenStack-Ansible
===========================

The OpenStack-Ansible project provides a basic OpenStack environment, but
many deployers will wish to extend the environment based on their needs. This
could include installing extra services, changing package versions, or
overriding existing variables.

Using these extension points, deployers can provide a more 'opinionated'
installation of OpenStack that may include their own software.

Including OpenStack-Ansible in your project
-------------------------------------------

Including the openstack-ansible repository within another project can be
done in several ways:

1. A git submodule pointed to a released tag.
2. A script to automatically perform a git checkout of
   openstack-ansible.

When including OpenStack-Ansible in a project, consider using a parallel
directory structure as shown in the `ansible.cfg files`_ section.

Also note that copying files into directories such as `env.d`_ or
`conf.d`_ should be handled via some sort of script within the extension
project.

ansible.cfg files
-----------------

You can create your own playbook, variable, and role structure while still
including the OpenStack-Ansible roles and libraries by putting an
``ansible.cfg`` file in your ``playbooks`` directory.

The relevant options for Ansible 1.9 (included in OpenStack-Ansible)
are as follows:

``library``
  This variable should point to
  ``openstack-ansible/playbooks/library``. Doing so allows roles and
  playbooks to access OpenStack-Ansible's included Ansible modules.
``roles_path``
  This variable should point to
  ``openstack-ansible/playbooks/roles``. This allows Ansible to
  properly look up any OpenStack-Ansible roles that extension roles
  may reference.
``inventory``
  This variable should point to
  ``openstack-ansible/playbooks/inventory``. With this setting,
  extensions have access to the same dynamic inventory that
  OpenStack-Ansible uses.

Note that the paths to the ``openstack-ansible`` top-level directory can be
relative in this file.

Consider this directory structure::

    my_project
    |
    |- custom_stuff
    |  |
    |  |- playbooks
    |- openstack-ansible
    |  |
    |  |- playbooks

The variables in ``my_project/custom_stuff/playbooks/ansible.cfg`` would use
``../openstack-ansible/playbooks/<directory>``.
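
For example, a minimal ``ansible.cfg`` for a layout like the one above might
contain the following entries. The relative paths are an assumption based on
the example directory structure; adjust them to match where your
``openstack-ansible`` checkout actually lives.

.. code-block:: ini

    [defaults]
    # Modules shipped with OpenStack-Ansible
    library = ../openstack-ansible/playbooks/library
    # Roles shipped with OpenStack-Ansible
    roles_path = ../openstack-ansible/playbooks/roles
    # Dynamic inventory used by OpenStack-Ansible
    inventory = ../openstack-ansible/playbooks/inventory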

env.d
-----

The ``/etc/openstack_deploy/env.d`` directory sources all YAML files into the
deployed environment, allowing a deployer to define additional group mappings.

This directory is used to extend the environment skeleton, or modify the
defaults defined in the ``playbooks/inventory/env.d`` directory.

See also `Understanding Container Groups`_ in Appendix C.

.. _Understanding Container Groups: ../install-guide/app-custom-layouts.html#understanding-container-groups

conf.d
------

Common OpenStack services and their configuration are defined by
OpenStack-Ansible in the
``/etc/openstack_deploy/openstack_user_config.yml`` settings file.

Additional services should be defined with a YAML file in
``/etc/openstack_deploy/conf.d``, in order to manage file size.

See also `Understanding Host Groups`_ in Appendix C.

.. _Understanding Host Groups: ../install-guide/app-custom-layouts.html#understanding-host-groups

user\_*.yml files
-----------------

Files in ``/etc/openstack_deploy`` beginning with ``user_`` will be
automatically sourced in any ``openstack-ansible`` command. Alternatively,
the files can be sourced with the ``-e`` parameter of the ``ansible-playbook``
command.

``user_variables.yml`` and ``user_secrets.yml`` are used directly by
OpenStack-Ansible. Adding custom variables used by your own roles and
playbooks to these files is not recommended. Doing so will complicate your
upgrade path by making comparison of your existing files with later versions
of these files more arduous. Rather, recommended practice is to place your own
variables in files named following the ``user_*.yml`` pattern so they will be
sourced alongside those used exclusively by OpenStack-Ansible.
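
For example, variables consumed only by your own roles and playbooks could
live in a separate file such as ``/etc/openstack_deploy/user_extras.yml``.
The file name and the variables below are purely illustrative:

.. code-block:: yaml

    # /etc/openstack_deploy/user_extras.yml
    # Hypothetical variables used only by your own playbooks and roles
    my_custom_service_port: 8999
    my_custom_package_state: latest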

Ordering and Precedence
+++++++++++++++++++++++

``user_*.yml`` variables are just YAML variable files. They will be sourced
in alphanumeric order by ``openstack-ansible``.

.. _adding-galaxy-roles:

Adding Galaxy roles
-------------------

Any roles defined in ``openstack-ansible/ansible-role-requirements.yml``
will be installed by the
``openstack-ansible/scripts/bootstrap-ansible.sh`` script.
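
Entries in ``ansible-role-requirements.yml`` follow the Ansible Galaxy
requirements file format. The role name and source URL below are an
illustrative sketch, not an entry from the real file:

.. code-block:: yaml

    - name: my_extension_role
      scm: git
      src: https://git.example.org/example-org/my_extension_role
      version: master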

Setting overrides in configuration files
----------------------------------------

All of the services that use YAML, JSON, or INI for configuration can receive
overrides through the use of an Ansible action plugin named ``config_template``.
The configuration template engine allows a deployer to use a simple dictionary
to modify or add items in configuration files at run time that may not have a
preset template option. All OpenStack-Ansible roles allow for this
functionality where applicable. Files available to receive overrides can be
seen in the ``defaults/main.yml`` file as standard empty dictionaries (hashes).

Practical guidance for using this feature is available in the `Install Guide`_.

This module has been `submitted for consideration`_ into Ansible Core.

.. _Install Guide: ../install-guide/app-advanced-config-override.html
.. _submitted for consideration: https://github.com/ansible/ansible/pull/12555


Build the environment with additional python packages
++++++++++++++++++++++++++++++++++++++++++++++++++++++

The system will allow you to install and build any package that is a Python
installable. The repository infrastructure will look for and create any
git-based or PyPI installable package. When the package is built, the
repo-build role will create the sources as Python wheels to extend the base
system and requirements.

While the packages pre-built in the repository infrastructure are
comprehensive, you may need to change the source locations and versions of
packages to suit different deployment needs. Adding additional repositories as
overrides is as simple as listing entries within the variable file of your
choice. Any ``user_*.yml`` file within the ``/etc/openstack_deploy`` directory
will work to facilitate the addition of new packages.

.. code-block:: yaml

    swift_git_repo: https://private-git.example.org/example-org/swift
    swift_git_install_branch: master


Additional lists of Python packages can also be overridden using a
``user_*.yml`` variable file.

.. code-block:: yaml

    swift_requires_pip_packages:
      - virtualenv
      - virtualenv-tools
      - python-keystoneclient
      - NEW-SPECIAL-PACKAGE


Once the variables are set, call the ``repo-build.yml`` play to build all of
the wheels within the repository infrastructure. When ready, run the target
plays to deploy your overridden source code.
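
For example, assuming the standard deployment layout used elsewhere in this
guide, the rebuild could be triggered as follows:

.. code-block:: shell-session

    # cd /opt/openstack-ansible/playbooks
    # openstack-ansible repo-build.yml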


Module documentation
++++++++++++++++++++

These are the options available as found within the virtual module
documentation section.

.. code-block:: yaml

    module: config_template
    version_added: 1.9.2
    short_description: >
      Renders template files providing a create/update override interface
    description:
      - The module contains the template functionality with the ability to
        override items in config, in transit, through the use of a simple
        dictionary without having to write out various temp files on target
        machines. The module renders all of the potential jinja a user could
        provide in both the template file and in the override dictionary which
        is ideal for deployers who may have lots of different configs using a
        similar code base.
      - The module is an extension of the **copy** module and all of the
        attributes that can be set there are available to be set here.
    options:
      src:
        description:
          - Path of a Jinja2 formatted template on the local server. This can
            be a relative or absolute path.
        required: true
        default: null
      dest:
        description:
          - Location to render the template to on the remote machine.
        required: true
        default: null
      config_overrides:
        description:
          - A dictionary used to update or override items within a configuration
            template. The dictionary data structure may be nested. If the target
            config file is an ini file the nested keys in the ``config_overrides``
            will be used as section headers.
      config_type:
        description:
          - A string value describing the target config type.
        choices:
          - ini
          - json
          - yaml


Example task using the "config_template" module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    - name: Run config template ini
      config_template:
        src: test.ini.j2
        dest: /tmp/test.ini
        config_overrides: "{{ test_overrides }}"
        config_type: ini


Example overrides dictionary (hash)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

    test_overrides:
      DEFAULT:
        new_item: 12345


Original template file "test.ini.j2"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: ini

    [DEFAULT]
    value1 = abc
    value2 = 123


Rendered on disk file "/tmp/test.ini"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: ini

    [DEFAULT]
    value1 = abc
    value2 = 123
    new_item = 12345


In this task the ``test.ini.j2`` file is a template which will be rendered and
written to disk at ``/tmp/test.ini``. The **config_overrides** entry is a
dictionary (hash) which allows a deployer to set arbitrary data as overrides to
be written into the configuration file at run time. The **config_type** entry
specifies the type of configuration file the module will be interacting with;
available options are "yaml", "json", and "ini".


Discovering Available Overrides
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All of these options can be specified in any way that suits your deployment.
In terms of ease of use and flexibility it's recommended that you define your
overrides in a user variable file such as
``/etc/openstack_deploy/user_variables.yml``.
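
For example, an override for ``nova.conf`` could be added to
``/etc/openstack_deploy/user_variables.yml``. The variable name below follows
the ``_overrides`` pattern that the ``find`` command at the end of this section
searches for; treat it as an illustration and confirm the exact variable name
available in your release:

.. code-block:: yaml

    nova_nova_conf_overrides:
      DEFAULT:
        remove_unused_original_minimum_age_seconds: 43200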

The list of overrides available may be found by executing:

.. code-block:: bash

    find . -name "main.yml" -exec grep '_.*_overrides:' {} \; \
        | grep -v "^#" \
        | sort -u
18 doc/source/draft-operations-guide/index.rst Normal file
@@ -0,0 +1,18 @@
==================================
OpenStack-Ansible operations guide
==================================

This is a draft index page for the proposed OpenStack-Ansible
operations guide.

.. toctree::
   :maxdepth: 2

   ops-lxc-commands.rst
   ops-add-computehost.rst
   ops-remove-computehost.rst
   ops-galera.rst
   ops-tips.rst
   ops-troubleshooting.rst
   extending.rst
29 doc/source/draft-operations-guide/ops-add-computehost.rst Normal file
@@ -0,0 +1,29 @@
=====================
Adding a compute host
=====================

Use the following procedure to add a compute host to an operational
cluster.

#. Configure the host as a target host. See `Prepare target hosts
   <http://docs.openstack.org/developer/openstack-ansible/install-guide/targethosts.html>`_
   for more information.

#. Edit the ``/etc/openstack_deploy/openstack_user_config.yml`` file and
   add the host to the ``compute_hosts`` stanza (an example stanza is shown
   after this procedure).

   If necessary, also modify the ``used_ips`` stanza.

#. If the cluster is utilizing Telemetry/Metering (Ceilometer),
   edit the ``/etc/openstack_deploy/conf.d/ceilometer.yml`` file and add the
   host to the ``metering-compute_hosts`` stanza.

#. Run the following commands to add the host. Replace
   ``NEW_HOST_NAME`` with the name of the new host.

   .. code-block:: shell-session

      # cd /opt/openstack-ansible/playbooks
      # openstack-ansible setup-hosts.yml --limit NEW_HOST_NAME
      # openstack-ansible setup-openstack.yml --skip-tags nova-key-distribute --limit NEW_HOST_NAME
      # openstack-ansible setup-openstack.yml --tags nova-key --limit compute_hosts
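
For reference, the ``compute_hosts`` stanza edited in the second step generally
follows this pattern. The host name and management IP address below are
illustrative only:

.. code-block:: yaml

    compute_hosts:
      NEW_HOST_NAME:
        ip: 172.29.236.200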
302 doc/source/draft-operations-guide/ops-galera-recovery.rst Normal file
@@ -0,0 +1,302 @@
=======================
Galera cluster recovery
=======================

Run the ``galera-install`` playbook using the ``galera-bootstrap`` tag to
automatically recover a node or an entire environment.

#. Run the following Ansible command to recover the failed node or nodes:

   .. code-block:: shell-session

      # openstack-ansible galera-install.yml --tags galera-bootstrap

   The cluster comes back online after completion of this command.

Single-node failure
~~~~~~~~~~~~~~~~~~~

If a single node fails, the other nodes maintain quorum and
continue to process SQL requests.

#. Run the following Ansible command to determine the failed node:

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "mysql -h localhost \
        -e 'show status like \"%wsrep_cluster_%\";'"
      node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server through
      socket '/var/run/mysqld/mysqld.sock' (111)

      node2_galera_container-49a47d25 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     17
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     17
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

   In this example, node 3 has failed.

#. Restart MariaDB on the failed node and verify that it rejoins the
   cluster (see the sketch after this procedure).

#. If MariaDB fails to start, run the ``mysqld`` command and perform
   further analysis on the output. As a last resort, rebuild the container
   for the node.
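
One way to perform the restart in the second step, assuming the container name
from the example output above and the ``mysql`` init script shown elsewhere in
this guide, is the following sketch:

.. code-block:: shell-session

   # lxc-attach -n node3_galera_container-3ea2cbd3
   # service mysql restart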

Multi-node failure
~~~~~~~~~~~~~~~~~~

When all but one node fails, the remaining node cannot achieve quorum and
stops processing SQL requests. In this situation, failed nodes that
recover cannot join the cluster because it no longer exists.

#. Run the following Ansible command to show the failed nodes:

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "mysql \
        -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
      node2_galera_container-49a47d25 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (111)

      node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (111)

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     18446744073709551615
      wsrep_cluster_size        1
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      non-Primary

   In this example, nodes 2 and 3 have failed. The remaining operational
   server indicates ``non-Primary`` because it cannot achieve quorum.

#. Run the following command to
   `rebootstrap <http://galeracluster.com/documentation-webpages/quorumreset.html#id1>`_
   the operational node into the cluster:

   .. code-block:: shell-session

      # mysql -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=yes';"
      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     15
      wsrep_cluster_size        1
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (111)

      node2_galera_container-49a47d25 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (111)

   The remaining operational node becomes the primary node and begins
   processing SQL requests.

#. Restart MariaDB on the failed nodes and verify that they rejoin the
   cluster:

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "mysql \
        -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
      node3_galera_container-3ea2cbd3 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     17
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node2_galera_container-49a47d25 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     17
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     17
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

#. If MariaDB fails to start on any of the failed nodes, run the
   ``mysqld`` command and perform further analysis on the output. As a
   last resort, rebuild the container for the node.

Complete failure
~~~~~~~~~~~~~~~~

Restore from backup if all of the nodes in a Galera cluster fail (do not
shut down gracefully). Run the following command to determine if all nodes in
the cluster have failed:

.. code-block:: shell-session

    # ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"
    node3_galera_container-3ea2cbd3 | success | rc=0 >>
    # GALERA saved state
    version: 2.1
    uuid:    338b06b0-2948-11e4-9d06-bef42f6c52f1
    seqno:   -1
    cert_index:

    node2_galera_container-49a47d25 | success | rc=0 >>
    # GALERA saved state
    version: 2.1
    uuid:    338b06b0-2948-11e4-9d06-bef42f6c52f1
    seqno:   -1
    cert_index:

    node4_galera_container-76275635 | success | rc=0 >>
    # GALERA saved state
    version: 2.1
    uuid:    338b06b0-2948-11e4-9d06-bef42f6c52f1
    seqno:   -1
    cert_index:


All the nodes have failed if ``mysqld`` is not running on any of the
nodes and all of the nodes contain a ``seqno`` value of -1.

If any single node has a positive ``seqno`` value, then that node can be
used to restart the cluster. However, because there is no guarantee that
each node has an identical copy of the data, we do not recommend
restarting the cluster using the ``--wsrep-new-cluster`` command on one
node.

Rebuilding a container
~~~~~~~~~~~~~~~~~~~~~~

Recovering from certain failures requires rebuilding one or more containers.

#. Disable the failed node on the load balancer.

   .. note::

      Do not rely on the load balancer health checks to disable the node.
      If the node is not disabled, the load balancer sends SQL requests
      to it before it rejoins the cluster, which can cause data
      inconsistencies.

#. Destroy the container and remove MariaDB data stored outside
   of the container:

   .. code-block:: shell-session

      # lxc-stop -n node3_galera_container-3ea2cbd3
      # lxc-destroy -n node3_galera_container-3ea2cbd3
      # rm -rf /openstack/node3_galera_container-3ea2cbd3/*

   In this example, node 3 failed.

#. Run the host setup playbook to rebuild the container on node 3:

   .. code-block:: shell-session

      # openstack-ansible setup-hosts.yml -l node3 \
        -l node3_galera_container-3ea2cbd3

   The playbook restarts all other containers on the node.

#. Run the infrastructure playbook to configure the container
   specifically on node 3:

   .. code-block:: shell-session

      # openstack-ansible setup-infrastructure.yml \
        -l node3_galera_container-3ea2cbd3

   .. warning::

      The new container runs a single-node Galera cluster, which is a
      dangerous state because the environment contains more than one
      active database with potentially different data.

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "mysql \
        -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
      node3_galera_container-3ea2cbd3 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     1
      wsrep_cluster_size        1
      wsrep_cluster_state_uuid  da078d01-29e5-11e4-a051-03d896dbdb2d
      wsrep_cluster_status      Primary

      node2_galera_container-49a47d25 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     4
      wsrep_cluster_size        2
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     4
      wsrep_cluster_size        2
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

#. Restart MariaDB in the new container and verify that it rejoins the
   cluster.

   .. note::

      In larger deployments, it may take some time for the MariaDB daemon to
      start in the new container. It will be synchronizing data from the other
      MariaDB servers during this time. You can monitor the status during this
      process by tailing the ``/var/log/mysql_logs/galera_server_error.log``
      log file.

      Lines starting with ``WSREP_SST`` will appear during the sync process
      and you should see a line with ``WSREP: SST complete, seqno: <NUMBER>``
      if the sync was successful.

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "mysql \
        -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
      node2_galera_container-49a47d25 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     5
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node3_galera_container-3ea2cbd3 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     5
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     5
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

#. Enable the failed node on the load balancer.
32 doc/source/draft-operations-guide/ops-galera-remove.rst Normal file
@@ -0,0 +1,32 @@
==============
Removing nodes
==============

In the following example, all but one node was shut down gracefully:

.. code-block:: shell-session

    # ansible galera_container -m shell -a "mysql -h localhost \
      -e 'show status like \"%wsrep_cluster_%\";'"
    node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
    ERROR 2002 (HY000): Can't connect to local MySQL server
    through socket '/var/run/mysqld/mysqld.sock' (2)

    node2_galera_container-49a47d25 | FAILED | rc=1 >>
    ERROR 2002 (HY000): Can't connect to local MySQL server
    through socket '/var/run/mysqld/mysqld.sock' (2)

    node4_galera_container-76275635 | success | rc=0 >>
    Variable_name             Value
    wsrep_cluster_conf_id     7
    wsrep_cluster_size        1
    wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
    wsrep_cluster_status      Primary


Compare this example output with the output from the multi-node failure
scenario where the remaining operational node is non-primary and stops
processing SQL requests. Gracefully shutting down the MariaDB service on
all but one node allows the remaining operational node to continue
processing SQL requests. When gracefully shutting down multiple nodes,
perform the actions sequentially to retain operation.
88 doc/source/draft-operations-guide/ops-galera-start.rst Normal file
@@ -0,0 +1,88 @@
==================
Starting a cluster
==================

Gracefully shutting down all nodes destroys the cluster. Starting or
restarting a cluster from zero nodes requires creating a new cluster on
one of the nodes.

#. Start a new cluster on the most advanced node.
   Check the ``seqno`` value in the ``grastate.dat`` file on all of the nodes:

   .. code-block:: shell-session

      # ansible galera_container -m shell -a "cat /var/lib/mysql/grastate.dat"
      node2_galera_container-49a47d25 | success | rc=0 >>
      # GALERA saved state
      version: 2.1
      uuid:    338b06b0-2948-11e4-9d06-bef42f6c52f1
      seqno:   31
      cert_index:

      node3_galera_container-3ea2cbd3 | success | rc=0 >>
      # GALERA saved state
      version: 2.1
      uuid:    338b06b0-2948-11e4-9d06-bef42f6c52f1
      seqno:   31
      cert_index:

      node4_galera_container-76275635 | success | rc=0 >>
      # GALERA saved state
      version: 2.1
      uuid:    338b06b0-2948-11e4-9d06-bef42f6c52f1
      seqno:   31
      cert_index:

   In this example, all nodes in the cluster contain the same positive
   ``seqno`` values as they were synchronized just prior to
   graceful shutdown. If all ``seqno`` values are equal, any node can
   start the new cluster.

   .. code-block:: shell-session

      # /etc/init.d/mysql start --wsrep-new-cluster

   This command results in a cluster containing a single node. The
   ``wsrep_cluster_size`` value shows the number of nodes in the
   cluster.

   .. code-block:: shell-session

      node2_galera_container-49a47d25 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (111)

      node3_galera_container-3ea2cbd3 | FAILED | rc=1 >>
      ERROR 2002 (HY000): Can't connect to local MySQL server
      through socket '/var/run/mysqld/mysqld.sock' (2)

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     1
      wsrep_cluster_size        1
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

#. Restart MariaDB on the other nodes and verify that they rejoin the
   cluster.

   .. code-block:: shell-session

      node2_galera_container-49a47d25 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     3
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node3_galera_container-3ea2cbd3 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     3
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary

      node4_galera_container-76275635 | success | rc=0 >>
      Variable_name             Value
      wsrep_cluster_conf_id     3
      wsrep_cluster_size        3
      wsrep_cluster_state_uuid  338b06b0-2948-11e4-9d06-bef42f6c52f1
      wsrep_cluster_status      Primary
18 doc/source/draft-operations-guide/ops-galera.rst Normal file
@@ -0,0 +1,18 @@
==========================
Galera cluster maintenance
==========================

.. toctree::

   ops-galera-remove.rst
   ops-galera-start.rst
   ops-galera-recovery.rst

Routine maintenance includes gracefully adding or removing nodes from
the cluster without impacting operation and also starting a cluster
after gracefully shutting down all nodes.

MySQL instances are restarted when creating a cluster, when adding a
node, when the service is not running, or when changes are made to the
``/etc/mysql/my.cnf`` configuration file.
38 doc/source/draft-operations-guide/ops-lxc-commands.rst Normal file
@@ -0,0 +1,38 @@
========================
Linux Container commands
========================

The following are some useful commands to manage LXC:

- List containers and summary information such as operational state and
  network configuration:

  .. code-block:: shell-session

     # lxc-ls --fancy

- Show container details including operational state, resource
  utilization, and ``veth`` pairs:

  .. code-block:: shell-session

     # lxc-info --name container_name

- Start a container:

  .. code-block:: shell-session

     # lxc-start --name container_name

- Attach to a container:

  .. code-block:: shell-session

     # lxc-attach --name container_name

- Stop a container:

  .. code-block:: shell-session

     # lxc-stop --name container_name
49 doc/source/draft-operations-guide/ops-remove-computehost.rst Normal file
@@ -0,0 +1,49 @@
=======================
Removing a compute host
=======================

The `openstack-ansible-ops <https://git.openstack.org/cgit/openstack/openstack-ansible-ops>`_
repository contains a playbook for removing a compute host from an
OpenStack-Ansible (OSA) environment.
To remove a compute host, follow the procedure below.

.. note::

   This guide describes how to remove a compute node from an OSA environment
   completely. Perform these steps with caution, as the compute node will no
   longer be in service after the steps have been completed. This guide assumes
   that all data and instances have been properly migrated.

#. Disable all OpenStack services running on the compute node.
   This can include, but is not limited to, the ``nova-compute`` service
   and the neutron agent service.

   .. note::

      Ensure this step is performed first.

   .. code-block:: console

      # Run these commands on the compute node to be removed
      # stop nova-compute
      # stop neutron-linuxbridge-agent

#. Clone the ``openstack-ansible-ops`` repository to your deployment host:

   .. code-block:: console

      $ git clone https://git.openstack.org/openstack/openstack-ansible-ops \
        /opt/openstack-ansible-ops

#. Run the ``remove_compute_node.yml`` Ansible playbook with the
   ``node_to_be_removed`` user variable set:

   .. code-block:: console

      $ cd /opt/openstack-ansible-ops/ansible_tools/playbooks
      $ openstack-ansible remove_compute_node.yml \
        -e node_to_be_removed="<name-of-compute-host>"

#. After the playbook completes, remove the compute node from the
   OpenStack-Ansible configuration file in
   ``/etc/openstack_deploy/openstack_user_config.yml``.
38 doc/source/draft-operations-guide/ops-tips.rst Normal file
@@ -0,0 +1,38 @@
===============
Tips and tricks
===============

Ansible forks
~~~~~~~~~~~~~

The default MaxSessions setting for the OpenSSH daemon is 10. Each Ansible
fork makes use of a session. By default, Ansible sets the number of forks to
5. However, you can increase the number of forks used in order to improve
deployment performance in large environments.

Note that more than 10 forks will cause issues for any playbooks
which use ``delegate_to`` or ``local_action`` in the tasks. It is
recommended that the number of forks is not raised when executing against the
control plane, as this is where delegation is most often used.

The number of forks used may be changed on a permanent basis by setting the
``ANSIBLE_FORKS`` environment variable in your ``.bashrc`` file.
Alternatively it can be changed for a particular playbook execution by using
the ``--forks`` CLI parameter. For example, the following executes the nova
playbook against the control plane with 10 forks, then against the compute
nodes with 50 forks.

.. code-block:: shell-session

    # openstack-ansible --forks 10 os-nova-install.yml --limit compute_containers
    # openstack-ansible --forks 50 os-nova-install.yml --limit compute_hosts
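
To make a higher fork count permanent, the environment variable mentioned
above can be exported from ``.bashrc``. This is a minimal sketch; pick a value
appropriate for your environment:

.. code-block:: bash

    # Raise the default fork count for all openstack-ansible runs
    export ANSIBLE_FORKS=20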

For more information about forks, please see the following references:

* OpenStack-Ansible `Bug 1479812`_
* Ansible `forks`_ entry for ansible.cfg
* `Ansible Performance Tuning`_

.. _Bug 1479812: https://bugs.launchpad.net/openstack-ansible/+bug/1479812
.. _forks: http://docs.ansible.com/ansible/intro_configuration.html#forks
.. _Ansible Performance Tuning: https://www.ansible.com/blog/ansible-performance-tuning
125 doc/source/draft-operations-guide/ops-troubleshooting.rst Normal file
@@ -0,0 +1,125 @@
===============
Troubleshooting
===============

Host kernel upgrade from version 3.13
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ubuntu kernel packages newer than version 3.13 contain a change in
module naming from ``nf_conntrack`` to ``br_netfilter``. After
upgrading the kernel, re-run the ``openstack-hosts-setup.yml``
playbook against those hosts. See `OSA bug 1579963`_ for more
information.

.. _OSA bug 1579963: https://bugs.launchpad.net/openstack-ansible/+bug/1579963


Container networking issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~

All LXC containers on the host have two virtual Ethernet interfaces:

* `eth0` in the container connects to `lxcbr0` on the host
* `eth1` in the container connects to `br-mgmt` on the host

.. note::

   Some containers, such as ``cinder``, ``glance``, ``neutron_agents``, and
   ``swift_proxy``, have more than two interfaces to support their
   functions.

Predictable interface naming
----------------------------

On the host, all virtual Ethernet devices are named based on their
container as well as the name of the interface inside the container:

.. code-block:: shell-session

    ${CONTAINER_UNIQUE_ID}_${NETWORK_DEVICE_NAME}

As an example, an all-in-one (AIO) build might provide a utility
container called `aio1_utility_container-d13b7132`. That container
will have two network interfaces: `d13b7132_eth0` and `d13b7132_eth1`.

Another option would be to use the LXC tools to retrieve information
about the utility container:

.. code-block:: shell-session

    # lxc-info -n aio1_utility_container-d13b7132

    Name:           aio1_utility_container-d13b7132
    State:          RUNNING
    PID:            8245
    IP:             10.0.3.201
    IP:             172.29.237.204
    CPU use:        79.18 seconds
    BlkIO use:      678.26 MiB
    Memory use:     613.33 MiB
    KMem use:       0 bytes
    Link:           d13b7132_eth0
     TX bytes:      743.48 KiB
     RX bytes:      88.78 MiB
     Total bytes:   89.51 MiB
    Link:           d13b7132_eth1
     TX bytes:      412.42 KiB
     RX bytes:      17.32 MiB
     Total bytes:   17.73 MiB

The ``Link:`` lines will show the network interfaces that are attached
to the utility container.

Reviewing container networking traffic
--------------------------------------

To dump traffic on the ``br-mgmt`` bridge, use ``tcpdump`` to see all
communications between the various containers. To narrow the focus,
run ``tcpdump`` only on the desired network interface of the
containers.
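
For example, a capture of all traffic on the bridge, or on a single
container's management interface, might look like the following. The interface
name is the illustrative one from the ``lxc-info`` output above:

.. code-block:: shell-session

    # tcpdump -n -i br-mgmt
    # tcpdump -n -i d13b7132_eth1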

Cached Ansible facts issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~

At the beginning of a playbook run, information about each host is gathered.
Examples of the information gathered are:

* Linux distribution
* Kernel version
* Network interfaces

To improve performance, particularly in large deployments, you can
cache host facts and information.

OpenStack-Ansible enables fact caching by default. The facts are
cached in JSON files within ``/etc/openstack_deploy/ansible_facts``.

Fact caching can be disabled by commenting out the ``fact_caching``
parameter in ``playbooks/ansible.cfg``. Refer to the Ansible
documentation on `fact caching`_ for more details.

.. _fact caching: http://docs.ansible.com/ansible/playbooks_variables.html#fact-caching

Forcing regeneration of cached facts
------------------------------------

Cached facts may be incorrect if the host receives a kernel upgrade or new
network interfaces. Newly created bridges also disrupt cached facts.

This can lead to unexpected errors while running playbooks, and
require that the cached facts be regenerated.

Run the following command to remove all currently cached facts for all hosts:

.. code-block:: shell-session

    # rm /etc/openstack_deploy/ansible_facts/*

New facts will be gathered and cached during the next playbook run.

To clear facts for a single host, find its file within
``/etc/openstack_deploy/ansible_facts/`` and remove it. Each host has
a JSON file that is named after its hostname. The facts for that host
will be regenerated on the next playbook run.
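
For example, to clear the facts for the illustrative utility container used
earlier in this guide:

.. code-block:: shell-session

    # rm /etc/openstack_deploy/ansible_facts/aio1_utility_container-d13b7132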