import draft from openstack-manuals/doc/ha-guide-draft/
Move the draft of the restructured HA guide to this dedicated repository, in order to focus reviews on a smaller, more specialist audience and accelerate development.

Change-Id: I95a4b46fecaafafd1beb8314d1cf795b60fb17a8

parent 34c59c7cb5
commit 393604c6b7
doc/common/app-support.rst (new file, 230 lines)
@@ -0,0 +1,230 @@
.. ## WARNING ##########################################################
.. This file is synced from openstack/openstack-manuals repository to
.. other related repositories. If you need to make changes to this file,
.. make the changes in openstack-manuals. After any change is merged to
.. openstack-manuals, a patch for the other repositories will be proposed
.. automatically.
.. #####################################################################

=================
Community support
=================

The following resources are available to help you run and use OpenStack.
The OpenStack community constantly improves and adds to the main
features of OpenStack, but if you have any questions, do not hesitate to
ask. Use the following resources to get OpenStack support and
troubleshoot your installations.

Documentation
~~~~~~~~~~~~~

For the available OpenStack documentation, see
`docs.openstack.org <https://docs.openstack.org>`_.

The following guides explain how to install a Proof-of-Concept OpenStack cloud
and its associated components:

* `Rocky Installation Guides <https://docs.openstack.org/rocky/install/>`_

The following books explain how to configure and run an OpenStack cloud:

* `Architecture Design Guide <https://docs.openstack.org/arch-design/>`_

* `Rocky Administrator Guides <https://docs.openstack.org/rocky/admin/>`_

* `Rocky Configuration Guides <https://docs.openstack.org/rocky/configuration/>`_

* `Rocky Networking Guide <https://docs.openstack.org/neutron/rocky/admin/>`_

* `High Availability Guide <https://docs.openstack.org/ha-guide/>`_

* `Security Guide <https://docs.openstack.org/security-guide/>`_

* `Virtual Machine Image Guide <https://docs.openstack.org/image-guide/>`_

The following book explains how to use the command-line clients:

* `Rocky API Bindings
  <https://docs.openstack.org/rocky/language-bindings.html>`_

The following documentation provides reference and guidance information
for the OpenStack APIs:

* `API Documentation <https://developer.openstack.org/api-guide/quick-start/>`_

The following guide provides information on how to contribute to OpenStack
documentation:

* `Documentation Contributor Guide <https://docs.openstack.org/doc-contrib-guide/>`_

ask.openstack.org
~~~~~~~~~~~~~~~~~

During the setup or testing of OpenStack, you might have questions
about how a specific task is completed or be in a situation where a
feature does not work correctly. Use the
`ask.openstack.org <https://ask.openstack.org>`_ site to ask questions
and get answers. When you visit the `Ask OpenStack
<https://ask.openstack.org>`_ site, scan
the recently asked questions to see whether your question has already
been answered. If not, ask a new question. Be sure to give a clear,
concise summary in the title and provide as much detail as possible in
the description. Paste in your command output or stack traces, links to
screenshots, and any other information which might be useful.

The OpenStack wiki
~~~~~~~~~~~~~~~~~~

The `OpenStack wiki <https://wiki.openstack.org/>`_ contains a broad
range of topics but some of the information can be difficult to find or
is a few pages deep. Fortunately, the wiki search feature enables you to
search by title or content. If you search for specific information, such
as about networking or OpenStack Compute, you can find a large amount
of relevant material. More is being added all the time, so be sure to
check back often. You can find the search box in the upper-right corner
of any OpenStack wiki page.

The Launchpad bugs area
~~~~~~~~~~~~~~~~~~~~~~~

The OpenStack community values your setup and testing efforts and wants
your feedback. To log a bug, you must `sign up for a Launchpad account
<https://launchpad.net/+login>`_. You can view existing bugs and report bugs
in the Launchpad Bugs area. Use the search feature to determine whether
the bug has already been reported or already been fixed. If it still
seems like your bug is unreported, fill out a bug report.

Some tips:

* Give a clear, concise summary.

* Provide as much detail as possible in the description. Paste in your
  command output or stack traces, links to screenshots, and any other
  information which might be useful.

* Be sure to include the software and package versions that you are
  using, especially if you are using a development branch, such as
  ``"Kilo release" vs git commit bc79c3ecc55929bac585d04a03475b72e06a3208``.

* Any deployment-specific information is helpful, such as whether you
  are using Ubuntu 14.04 or are performing a multi-node installation.

The following Launchpad Bugs areas are available:

* `Bugs: OpenStack Block Storage
  (cinder) <https://bugs.launchpad.net/cinder>`_

* `Bugs: OpenStack Compute (nova) <https://bugs.launchpad.net/nova>`_

* `Bugs: OpenStack Dashboard
  (horizon) <https://bugs.launchpad.net/horizon>`_

* `Bugs: OpenStack Identity
  (keystone) <https://bugs.launchpad.net/keystone>`_

* `Bugs: OpenStack Image service
  (glance) <https://bugs.launchpad.net/glance>`_

* `Bugs: OpenStack Networking
  (neutron) <https://bugs.launchpad.net/neutron>`_

* `Bugs: OpenStack Object Storage
  (swift) <https://bugs.launchpad.net/swift>`_

* `Bugs: Application catalog (murano) <https://bugs.launchpad.net/murano>`_

* `Bugs: Bare metal service (ironic) <https://bugs.launchpad.net/ironic>`_

* `Bugs: Clustering service (senlin) <https://bugs.launchpad.net/senlin>`_

* `Bugs: Container Infrastructure Management service (magnum) <https://bugs.launchpad.net/magnum>`_

* `Bugs: Data processing service
  (sahara) <https://bugs.launchpad.net/sahara>`_

* `Bugs: Database service (trove) <https://bugs.launchpad.net/trove>`_

* `Bugs: DNS service (designate) <https://bugs.launchpad.net/designate>`_

* `Bugs: Key Manager Service (barbican) <https://bugs.launchpad.net/barbican>`_

* `Bugs: Monitoring (monasca) <https://bugs.launchpad.net/monasca>`_

* `Bugs: Orchestration (heat) <https://bugs.launchpad.net/heat>`_

* `Bugs: Rating (cloudkitty) <https://bugs.launchpad.net/cloudkitty>`_

* `Bugs: Shared file systems (manila) <https://bugs.launchpad.net/manila>`_

* `Bugs: Telemetry
  (ceilometer) <https://bugs.launchpad.net/ceilometer>`_

* `Bugs: Telemetry v3
  (gnocchi) <https://bugs.launchpad.net/gnocchi>`_

* `Bugs: Workflow service
  (mistral) <https://bugs.launchpad.net/mistral>`_

* `Bugs: Messaging service
  (zaqar) <https://bugs.launchpad.net/zaqar>`_

* `Bugs: Container service
  (zun) <https://bugs.launchpad.net/zun>`_

* `Bugs: OpenStack API Documentation
  (developer.openstack.org) <https://bugs.launchpad.net/openstack-api-site>`_

* `Bugs: OpenStack Documentation
  (docs.openstack.org) <https://bugs.launchpad.net/openstack-manuals>`_

Documentation feedback
~~~~~~~~~~~~~~~~~~~~~~

To provide feedback on documentation, join our IRC channel ``#openstack-doc``
on the Freenode IRC network, or `report a bug in Launchpad
<https://bugs.launchpad.net/openstack/+filebug>`_ and choose the particular
project that the documentation is a part of.

The OpenStack IRC channel
~~~~~~~~~~~~~~~~~~~~~~~~~

The OpenStack community lives in the #openstack IRC channel on the
Freenode network. You can hang out, ask questions, or get immediate
feedback for urgent and pressing issues. To install an IRC client or use
a browser-based client, go to
`https://webchat.freenode.net/ <https://webchat.freenode.net>`_. You can
also use `Colloquy <http://colloquy.info/>`_ (Mac OS X),
`mIRC <http://www.mirc.com/>`_ (Windows),
or XChat (Linux). When you are in the IRC channel
and want to share code or command output, the generally accepted method
is to use a Paste Bin. The OpenStack project has one at `Paste
<http://paste.openstack.org>`_. Just paste your longer amounts of text or
logs in the web form and you get a URL that you can paste into the
channel. The OpenStack IRC channel is ``#openstack`` on
``irc.freenode.net``. You can find a list of all OpenStack IRC channels on
the `IRC page on the wiki <https://wiki.openstack.org/wiki/IRC>`_.

OpenStack mailing lists
~~~~~~~~~~~~~~~~~~~~~~~

A great way to get answers and insights is to post your question or
problematic scenario to the OpenStack mailing list. You can learn from
and help others who might have similar issues. To subscribe or view the
archives, go to the `general OpenStack mailing list
<http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack>`_. If you are
interested in the other mailing lists for specific projects or development,
refer to `Mailing Lists <https://wiki.openstack.org/wiki/Mailing_Lists>`_.

OpenStack distribution packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following Linux distributions provide community-supported packages
for OpenStack:

* **CentOS, Fedora, and Red Hat Enterprise Linux:**
  https://www.rdoproject.org/

* **openSUSE and SUSE Linux Enterprise Server:**
  https://en.opensuse.org/Portal:OpenStack

* **Ubuntu:** https://wiki.ubuntu.com/OpenStack/CloudArchive
doc/common/appendix.rst (new file, 8 lines)
@@ -0,0 +1,8 @@

Appendix
~~~~~~~~

.. toctree::
   :maxdepth: 1

   app-support.rst
   glossary.rst
doc/common/conventions.rst (new file, 47 lines)
@@ -0,0 +1,47 @@

.. ## WARNING ##########################################################
.. This file is synced from openstack/openstack-manuals repository to
.. other related repositories. If you need to make changes to this file,
.. make the changes in openstack-manuals. After any change is merged to
.. openstack-manuals, a patch for the other repositories will be proposed
.. automatically.
.. #####################################################################

===========
Conventions
===========

The OpenStack documentation uses several typesetting conventions.

Notices
~~~~~~~

Notices take these forms:

.. note:: A comment with additional information that explains a part of the
          text.

.. important:: Something you must be aware of before proceeding.

.. tip:: An extra but helpful piece of practical advice.

.. caution:: Helpful information that prevents the user from making mistakes.

.. warning:: Critical information about the risk of data loss or security
             issues.

Command prompts
~~~~~~~~~~~~~~~

.. code-block:: console

   $ command

Any user, including the ``root`` user, can run commands that are
prefixed with the ``$`` prompt.

.. code-block:: console

   # command

The ``root`` user must run commands that are prefixed with the ``#``
prompt. You can also prefix these commands with the :command:`sudo`
command, if available, to run them.
doc/common/glossary.rst (new file, 4164 lines)
(File diff suppressed because it is too large.)
doc/common/source/conf.py (new file, 110 lines)
@@ -0,0 +1,110 @@

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

import os
# import sys


# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
# sys.path.insert(0, os.path.abspath('.'))

# -- General configuration ------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['openstackdocstheme']

# Add any paths that contain templates here, relative to this directory.
# templates_path = ['_templates']

# The suffix of source filenames.
source_suffix = '.rst'

# The encoding of source files.
# source_encoding = 'utf-8-sig'

# The master toctree document.
master_doc = 'index'

# General information about the project.
repository_name = "openstack/openstack-manuals"
bug_project = 'openstack-manuals'
project = u'Common documents'
bug_tag = u'common'

copyright = u'2015-2018, OpenStack contributors'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = ''
# The full version, including alpha/beta/rc tags.
release = ''

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = []

# The reST default role (used for this markup: `text`) to use for all
# documents.
# default_role = None

# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True

# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True

# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
# show_authors = False

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []

# If true, keep warnings as "system message" paragraphs in the built documents.
# keep_warnings = False

# -- Options for Internationalization output ------------------------------
locale_dirs = ['locale/']
doc/source/common (symbolic link)
@@ -0,0 +1 @@
../common
doc/source/compute-node-ha.rst (new file, 55 lines)
@@ -0,0 +1,55 @@

============================
Configuring the compute node
============================

The `Installation Guides
<https://docs.openstack.org/ocata/install/>`_
provide instructions for installing multiple compute nodes.
To make the compute nodes highly available, you must configure the
environment to include multiple instances of the API and other services.

Configuring high availability for instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As of September 2016, the OpenStack High Availability community is
designing and developing an official and unified way to provide high
availability for instances. We are developing automatic
recovery from failures of hardware or hypervisor-related software on
the compute node, or other failures that could prevent instances from
functioning correctly, such as issues with a cinder volume I/O path.

More details are available in the `user story
<http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html>`_
co-authored by OpenStack's HA community and `Product Working Group
<https://wiki.openstack.org/wiki/ProductTeam>`_ (PWG), where this feature is
identified as missing functionality in OpenStack, which
should be addressed with high priority.

Existing solutions
~~~~~~~~~~~~~~~~~~

The architectural challenges of instance HA and several currently
existing solutions were presented in `a talk at the Austin summit
<https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation>`_,
for which `slides are also available <http://aspiers.github.io/openstack-summit-2016-austin-compute-ha/>`_.

The code for three of these solutions can be found online at the following
links:

* `a mistral-based auto-recovery workflow
  <https://github.com/gryf/mistral-evacuate>`_, by Intel
* `masakari <https://launchpad.net/masakari>`_, by NTT
* `OCF RAs
  <http://aspiers.github.io/openstack-summit-2016-austin-compute-ha/#/ocf-pros-cons>`_,
  as used by Red Hat and SUSE

Current upstream work
~~~~~~~~~~~~~~~~~~~~~

Work is in progress on a unified approach, which combines the best
aspects of existing upstream solutions. More details are available on
`the HA VMs user story wiki
<https://wiki.openstack.org/wiki/ProductTeam/User_Stories/HA_VMs>`_.

To get involved with this work, see the section on the
:doc:`ha-community`.
doc/source/conf.py (modified)
@@ -1,3 +1,16 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This file is execfile()d with the current directory set to its
# containing dir.
#
@@ -8,8 +21,7 @@
# serve to show the default.

import os
# import sys
import openstackdocstheme

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
@@ -26,6 +38,15 @@ import openstackdocstheme
# ones.
extensions = ['openstackdocstheme']

# Add any paths that contain templates here, relative to this directory.
# templates_path = ['_templates']

# The suffix of source filenames.
source_suffix = '.rst'

# The encoding of source files.
# source_encoding = 'utf-8-sig'

# The master toctree document.
master_doc = 'index'

@@ -36,12 +57,97 @@ project = u'High Availability Guide'
bug_tag = u'ha-guide'
copyright = u'2016-present, OpenStack contributors'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = ''
# The full version, including alpha/beta/rc tags.
release = ''

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['common/cli*', 'common/nova*',
                    'common/get-started*', 'common/dashboard*']

# The reST default role (used for this markup: `text`) to use for all
# documents.
# default_role = None

# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True

# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True

# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
# show_authors = False

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []

# If true, keep warnings as "system message" paragraphs in the built documents.
# keep_warnings = False


# -- Options for HTML output ----------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'openstackdocs'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
html_theme_options = {
    'display_badge': False
}

# Add any paths that contain custom themes here, relative to this directory.
# html_theme_path = [openstackdocstheme.get_html_theme_path()]

# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
# html_title = None

# A shorter title for the navigation bar. Default is the same as html_title.
# html_short_title = None

# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# html_logo = None

# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = []

# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
# html_extra_path = []

# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# So that we can enable "log-a-bug" links from each output HTML page, this
@@ -49,6 +155,73 @@ html_theme = 'openstackdocs'
# minutes.
html_last_updated_fmt = '%Y-%m-%d %H:%M'

# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True

# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}

# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}

# If false, no module index is generated.
# html_domain_indices = True

# If false, no index is generated.
html_use_index = False

# If true, the index is split into individual pages for each letter.
# html_split_index = False

# If true, links to the reST sources are added to the pages.
html_show_sourcelink = False

# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
# html_show_sphinx = True

# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
# html_show_copyright = True

# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''

# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None

# Output file base name for HTML help builder.
htmlhelp_basename = 'ha-guide'

# If true, publish source files
html_copy_source = False

# -- Options for LaTeX output ---------------------------------------------

latex_engine = 'xelatex'

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    # 'papersize': 'letterpaper',

    # set font (TODO: different fonts for translated PDF document builds)
    'fontenc': '\\usepackage{fontspec}',
    'fontpkg': '''\
\defaultfontfeatures{Scale=MatchLowercase}
\setmainfont{Liberation Serif}
\setsansfont{Liberation Sans}
\setmonofont[SmallCapsFont={Liberation Mono}]{Liberation Mono}
''',

    # The font size ('10pt', '11pt' or '12pt').
    # 'pointsize': '10pt',

    # Additional stuff for the LaTeX preamble.
    # 'preamble': '',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
@@ -57,5 +230,63 @@ latex_documents = [
     u'OpenStack contributors', 'manual'),
]

# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None

# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False

# If true, show page references after internal links.
# latex_show_pagerefs = False

# If true, show URL addresses after external links.
# latex_show_urls = False

# Documents to append as an appendix to all manuals.
# latex_appendices = []

# If false, no module index is generated.
# latex_domain_indices = True


# -- Options for manual page output ---------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    ('index', 'haguide', u'High Availability Guide',
     [u'OpenStack contributors'], 1)
]

# If true, show URL addresses after external links.
# man_show_urls = False


# -- Options for Texinfo output -------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
    ('index', 'HAGuide', u'High Availability Guide',
     u'OpenStack contributors', 'HAGuide',
     'This guide shows OpenStack operators and deployers how to configure '
     'OpenStack to be robust and fault-tolerant.', 'Miscellaneous'),
]

# Documents to append as an appendix to all manuals.
# texinfo_appendices = []

# If false, no module index is generated.
# texinfo_domain_indices = True

# How to display URL addresses: 'footnote', 'no', or 'inline'.
# texinfo_show_urls = 'footnote'

# If true, do not generate a @detailmenu in the "Top" node's menu.
# texinfo_no_detailmenu = False

# -- Options for Internationalization output ------------------------------
locale_dirs = ['locale/']
doc/source/control-plane-stateful.rst (new file, 342 lines)
@@ -0,0 +1,342 @@

=================================
Configuring the stateful services
=================================

.. to do: scope how in depth we want these sections to be

Database for high availability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Galera
------

The first step is to install the database that sits at the heart of the
cluster. To implement high availability, run an instance of the database on
each controller node and use Galera Cluster to provide replication between
them. Galera Cluster is a synchronous multi-master database cluster, based
on MySQL and the InnoDB storage engine. It is a high-availability service
that provides high system uptime, no data loss, and scalability for growth.

You can achieve high availability for the OpenStack database in many
different ways, depending on the type of database that you want to use.
There are three implementations of Galera Cluster available to you:

- `Galera Cluster for MySQL <http://galeracluster.com/>`_: The MySQL
  reference implementation from Codership, Oy.
- `MariaDB Galera Cluster <https://mariadb.org/>`_: The MariaDB
  implementation of Galera Cluster, which is commonly supported in
  environments based on Red Hat distributions.
- `Percona XtraDB Cluster <https://www.percona.com/>`_: The XtraDB
  implementation of Galera Cluster from Percona.

In addition to Galera Cluster, you can also achieve high availability
through other database options, such as PostgreSQL, which has its own
replication system.
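
As a minimal sketch of what the Galera-specific settings look like (shown
for MariaDB Galera Cluster; the file path, cluster name, and controller
addresses are illustrative assumptions, not a tested configuration):

.. code-block:: ini

   # /etc/mysql/conf.d/galera.cnf on each controller node
   [mysqld]
   binlog_format = ROW
   default_storage_engine = InnoDB
   innodb_autoinc_lock_mode = 2

   wsrep_provider = /usr/lib/galera/libgalera_smm.so
   wsrep_cluster_name = "openstack_db_cluster"
   # Every controller node that participates in the cluster
   wsrep_cluster_address = "gcomm://10.0.0.12,10.0.0.13,10.0.0.14"
   wsrep_node_name = "controller1"
   wsrep_node_address = "10.0.0.12"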

Pacemaker active/passive with HAproxy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Replicated storage
------------------

For example: DRBD

Shared storage
--------------

Messaging service for high availability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RabbitMQ
--------

An AMQP (Advanced Message Queuing Protocol) compliant message bus is
required for most OpenStack components in order to coordinate the
execution of jobs entered into the system.

The most popular AMQP implementation used in OpenStack installations
is RabbitMQ.

RabbitMQ nodes fail over on both the application and the infrastructure
layers.

The application layer is controlled by the ``oslo.messaging``
configuration options for multiple AMQP hosts. If the AMQP node fails,
the application reconnects to the next one configured within the
specified reconnect interval. The specified reconnect interval
constitutes its SLA.

On the infrastructure layer, the SLA is the time it takes the RabbitMQ
cluster to reassemble. Several cases are possible. The Mnesia keeper
node is the master of the corresponding Pacemaker resource for
RabbitMQ. When it fails, the result is a full AMQP cluster downtime
interval. Normally, its SLA is no more than several minutes. Failure
of another node that is a slave of the corresponding Pacemaker
resource for RabbitMQ results in no AMQP cluster downtime at all.

.. until we've determined the content depth, I've transferred RabbitMQ
   configuration below from the old HA guide (darrenc)

Making the RabbitMQ service highly available involves the following steps:

- :ref:`Install RabbitMQ<rabbitmq-install>`

- :ref:`Configure RabbitMQ for HA queues<rabbitmq-configure>`

- :ref:`Configure OpenStack services to use RabbitMQ HA queues
  <rabbitmq-services>`

.. note::

   Access to RabbitMQ is not normally handled by HAProxy. Instead,
   consumers must be supplied with the full list of hosts running
   RabbitMQ with ``rabbit_hosts`` and turn on the ``rabbit_ha_queues``
   option. For more information, read the `core issue
   <http://people.redhat.com/jeckersb/private/vip-failover-tcp-persist.html>`_.
   For more detail, read the `history and solution
   <http://john.eckersberg.com/improving-ha-failures-with-tcp-timeouts.html>`_.

.. _rabbitmq-install:

Install RabbitMQ
^^^^^^^^^^^^^^^^

The commands for installing RabbitMQ are specific to the Linux distribution
you are using.

For Ubuntu or Debian:

.. code-block:: console

   # apt-get install rabbitmq-server

For RHEL, Fedora, or CentOS:

.. code-block:: console

   # yum install rabbitmq-server

For openSUSE:

.. code-block:: console

   # zypper install rabbitmq-server

For SLES 12:

.. code-block:: console

   # zypper addrepo -f obs://Cloud:OpenStack:Kilo/SLE_12 Kilo
   [Verify the fingerprint of the imported GPG key. See below.]
   # zypper install rabbitmq-server

.. note::

   For SLES 12, the packages are signed by GPG key 893A90DAD85F9316.
   You should verify the fingerprint of the imported GPG key before using it.

   .. code-block:: none

      Key ID: 893A90DAD85F9316
      Key Name: Cloud:OpenStack OBS Project <Cloud:OpenStack@build.opensuse.org>
      Key Fingerprint: 35B34E18ABC1076D66D5A86B893A90DAD85F9316
      Key Created: Tue Oct 8 13:34:21 2013
      Key Expires: Thu Dec 17 13:34:21 2015

For more information, see the official installation manual for the
distribution:

- `Debian and Ubuntu <https://www.rabbitmq.com/install-debian.html>`_
- `RPM based <https://www.rabbitmq.com/install-rpm.html>`_
  (RHEL, Fedora, CentOS, openSUSE)

.. _rabbitmq-configure:

Configure RabbitMQ for HA queues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. [TODO: This section should begin with a brief mention
.. about what HA queues are and why they are valuable, etc]

.. [TODO: replace "currently" with specific release names]

.. [TODO: Does this list need to be updated? Perhaps we need a table
.. that shows each component and the earliest release that allows it
.. to work with HA queues.]

The following components/services can work with HA queues:

- OpenStack Compute
- OpenStack Block Storage
- OpenStack Networking
- Telemetry

Consider that, while exchanges and bindings survive the loss of individual
nodes, queues and their messages do not, because a queue and its contents
are located on one node. If we lose this node, we also lose the queue.

Mirrored queues in RabbitMQ improve the availability of the service, since
they are resilient to failures.

We recommend running at least three RabbitMQ servers in production;
for testing and demonstration purposes, it is possible to run only two.
In this section, we configure two nodes, called ``rabbit1`` and ``rabbit2``.
To build a broker, ensure that all nodes have the same Erlang cookie file.

.. [TODO: Should the example instead use a minimum of three nodes?]

#. Stop RabbitMQ and copy the cookie from the first node to each of the
   other node(s):

   .. code-block:: console

      # scp /var/lib/rabbitmq/.erlang.cookie root@NODE:/var/lib/rabbitmq/.erlang.cookie

#. On each target node, verify the correct owner,
   group, and permissions of the file :file:`erlang.cookie`:

   .. code-block:: console

      # chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
      # chmod 400 /var/lib/rabbitmq/.erlang.cookie

#. Start the message queue service on all nodes and configure it to start
   when the system boots. On Ubuntu, it is configured by default.

   On CentOS, RHEL, openSUSE, and SLES:

   .. code-block:: console

      # systemctl enable rabbitmq-server.service
      # systemctl start rabbitmq-server.service

#. Verify that the nodes are running:

   .. code-block:: console

      # rabbitmqctl cluster_status
      Cluster status of node rabbit@NODE...
      [{nodes,[{disc,[rabbit@NODE]}]},
       {running_nodes,[rabbit@NODE]},
       {partitions,[]}]
      ...done.

#. Run the following commands on each node except the first one:

   .. code-block:: console

      # rabbitmqctl stop_app
      Stopping node rabbit@NODE...
      ...done.
      # rabbitmqctl join_cluster --ram rabbit@rabbit1
      # rabbitmqctl start_app
      Starting node rabbit@NODE ...
      ...done.

   .. note::

      The default node type is a disc node. In this guide, nodes
      join the cluster as RAM nodes.

#. Verify the cluster status:

   .. code-block:: console

      # rabbitmqctl cluster_status
      Cluster status of node rabbit@NODE...
      [{nodes,[{disc,[rabbit@rabbit1]},{ram,[rabbit@NODE]}]}, \
       {running_nodes,[rabbit@NODE,rabbit@rabbit1]}]

   If the cluster is working, you can create usernames and passwords
   for the queues.

#. To ensure that all queues except those with auto-generated names
   are mirrored across all running nodes,
   set the ``ha-mode`` policy key to all
   by running the following command on one of the nodes:

   .. code-block:: console

      # rabbitmqctl set_policy ha-all '^(?!amq\.).*' '{"ha-mode": "all"}'

More information is available in the RabbitMQ documentation:

- `Highly Available Queues <https://www.rabbitmq.com/ha.html>`_
- `Clustering Guide <https://www.rabbitmq.com/clustering.html>`_

.. note::

   As another option to make RabbitMQ highly available, RabbitMQ has
   included OCF scripts for the Pacemaker cluster resource agents since
   version 3.5.7. They provide an active/active RabbitMQ cluster with
   mirrored queues. For more information, see `Auto-configuration of a
   cluster with a Pacemaker <https://www.rabbitmq.com/pacemaker.html>`_.

.. _rabbitmq-services:

Configure OpenStack services to use Rabbit HA queues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Configure the OpenStack components to use at least two RabbitMQ nodes.

Use these steps to configure all services that use RabbitMQ:

#. RabbitMQ HA cluster ``host:port`` pairs:

   .. code-block:: ini

      rabbit_hosts=rabbit1:5672,rabbit2:5672,rabbit3:5672

#. Retry connecting with RabbitMQ:

   .. code-block:: ini

      rabbit_retry_interval=1

#. How long to back off between retries when connecting to RabbitMQ:

   .. code-block:: ini

      rabbit_retry_backoff=2

#. Maximum retries when trying to connect to RabbitMQ (infinite by default):

   .. code-block:: ini

      rabbit_max_retries=0

#. Use durable queues in RabbitMQ:

   .. code-block:: ini

      rabbit_durable_queues=true

#. Use HA queues in RabbitMQ (``x-ha-policy: all``):

   .. code-block:: ini

      rabbit_ha_queues=true

.. note::

   If you change the configuration from an old set-up
   that did not use HA queues, restart the service:

   .. code-block:: console

      # rabbitmqctl stop_app
      # rabbitmqctl reset
      # rabbitmqctl start_app
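
Putting the options above together, a minimal sketch of the relevant part
of a service configuration file (for example ``nova.conf``) might look as
follows. The host names and the ``[DEFAULT]`` section placement are
illustrative assumptions; the exact section can vary by service and
release:

.. code-block:: ini

   [DEFAULT]
   # All RabbitMQ cluster members, so the client can fail over between them
   rabbit_hosts=rabbit1:5672,rabbit2:5672,rabbit3:5672
   rabbit_retry_interval=1
   rabbit_retry_backoff=2
   rabbit_max_retries=0
   rabbit_durable_queues=true
   rabbit_ha_queues=true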

Pacemaker active/passive
------------------------

Mirrored queues
---------------

Qpid
----
doc/source/control-plane-stateless.rst (new file, 518 lines)
@@ -0,0 +1,518 @@

==============================
Configuring stateless services
==============================

.. to do: scope what details we want on the following services

API services
~~~~~~~~~~~~

Load-balancer
~~~~~~~~~~~~~

HAProxy
-------

HAProxy provides a fast and reliable HTTP reverse proxy and load balancer
for TCP or HTTP applications. It is particularly suited for web sites
under very high load that need persistence or Layer 7 processing.
It realistically supports tens of thousands of connections with recent
hardware.

Each instance of HAProxy configures its front end to accept connections only
to the virtual IP (VIP) address. The HAProxy back end (termination
point) is a list of all the IP addresses of instances for load balancing.

.. note::

   Ensure that your HAProxy installation is not a single point of failure;
   it is advisable to have multiple HAProxy instances running.

   You can also ensure availability by other means, using Keepalived
   or Pacemaker.
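
As a minimal sketch of the Keepalived approach, a VRRP instance can float
the VIP between the HAProxy nodes. The interface name, router ID,
priorities, and VIP below are illustrative assumptions:

.. code-block:: none

   # /etc/keepalived/keepalived.conf on the primary HAProxy node
   vrrp_instance haproxy_vip {
       state MASTER
       interface eth0
       virtual_router_id 51
       priority 101        # use a lower priority (e.g. 100) on the backups
       virtual_ipaddress {
           10.0.0.11/24    # the VIP that the HAProxy front ends bind to
       }
   }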
|
||||||
|
|
||||||
|
Alternatively, you can use a commercial load balancer, which is hardware
|
||||||
|
or software. We recommend a hardware load balancer as it generally has
|
||||||
|
good performance.
|
||||||
|
|
||||||
|
For detailed instructions about installing HAProxy on your nodes,
|
||||||
|
see the HAProxy `official documentation <http://www.haproxy.org/#docs>`_.
|
||||||
|
|
||||||
|
Configuring HAProxy
|
||||||
|
^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
#. Restart the HAProxy service.
|
||||||
|
|
||||||
|
#. Locate your HAProxy instance on each OpenStack controller in your
|
||||||
|
environment. The following is an example ``/etc/haproxy/haproxy.cfg``
|
||||||
|
configuration file. Configure your instance using the following
|
||||||
|
configuration file, you will need a copy of it on each
|
||||||
|
controller node.
|
||||||
|
|
||||||
|
|
||||||
|
.. code-block:: none
|
||||||
|
|
||||||
|
global
|
||||||
|
chroot /var/lib/haproxy
|
||||||
|
daemon
|
||||||
|
group haproxy
|
||||||
|
maxconn 4000
|
||||||
|
pidfile /var/run/haproxy.pid
|
||||||
|
user haproxy
|
||||||
|
|
||||||
|
defaults
|
||||||
|
log global
|
||||||
|
maxconn 4000
|
||||||
|
option redispatch
|
||||||
|
retries 3
|
||||||
|
timeout http-request 10s
|
||||||
|
timeout queue 1m
|
||||||
|
timeout connect 10s
|
||||||
|
timeout client 1m
|
||||||
|
timeout server 1m
|
||||||
|
timeout check 10s
|
||||||
|
|
||||||
|
listen dashboard_cluster
|
||||||
|
bind <Virtual IP>:443
|
||||||
|
balance source
|
||||||
|
option tcpka
|
||||||
|
option httpchk
|
||||||
|
option tcplog
|
||||||
|
server controller1 10.0.0.12:443 check inter 2000 rise 2 fall 5
|
||||||
|
server controller2 10.0.0.13:443 check inter 2000 rise 2 fall 5
|
||||||
|
server controller3 10.0.0.14:443 check inter 2000 rise 2 fall 5
|
||||||
|
|
||||||
|
listen galera_cluster
|
||||||
|
bind <Virtual IP>:3306
|
||||||
|
balance source
|
||||||
|
option mysql-check
|
||||||
|
server controller1 10.0.0.12:3306 check port 9200 inter 2000 rise 2 fall 5
|
||||||
|
server controller2 10.0.0.13:3306 backup check port 9200 inter 2000 rise 2 fall 5
|
||||||
|
server controller3 10.0.0.14:3306 backup check port 9200 inter 2000 rise 2 fall 5
|
||||||
|
|
||||||
|
listen glance_api_cluster
|
||||||
|
bind <Virtual IP>:9292
|
||||||
|
balance source
|
||||||
|
option tcpka
|
||||||
|
option httpchk
|
||||||
|
option tcplog
|
||||||
|
server controller1 10.0.0.12:9292 check inter 2000 rise 2 fall 5
|
||||||
|
server controller2 10.0.0.13:9292 check inter 2000 rise 2 fall 5
|
||||||
|
server controller3 10.0.0.14:9292 check inter 2000 rise 2 fall 5
|
||||||
|
|
||||||
|
listen glance_registry_cluster
|
||||||
|
bind <Virtual IP>:9191
|
||||||
|
balance source
|
||||||
|
option tcpka
|
||||||
|
option tcplog
|
||||||
|
server controller1 10.0.0.12:9191 check inter 2000 rise 2 fall 5
|
||||||
|
server controller2 10.0.0.13:9191 check inter 2000 rise 2 fall 5
|
||||||
|
server controller3 10.0.0.14:9191 check inter 2000 rise 2 fall 5
|
||||||
 listen keystone_admin_cluster
   bind <Virtual IP>:35357
   balance source
   option tcpka
   option httpchk
   option tcplog
   server controller1 10.0.0.12:35357 check inter 2000 rise 2 fall 5
   server controller2 10.0.0.13:35357 check inter 2000 rise 2 fall 5
   server controller3 10.0.0.14:35357 check inter 2000 rise 2 fall 5

 listen keystone_public_internal_cluster
   bind <Virtual IP>:5000
   balance source
   option tcpka
   option httpchk
   option tcplog
   server controller1 10.0.0.12:5000 check inter 2000 rise 2 fall 5
   server controller2 10.0.0.13:5000 check inter 2000 rise 2 fall 5
   server controller3 10.0.0.14:5000 check inter 2000 rise 2 fall 5

 listen nova_ec2_api_cluster
   bind <Virtual IP>:8773
   balance source
   option tcpka
   option tcplog
   server controller1 10.0.0.12:8773 check inter 2000 rise 2 fall 5
   server controller2 10.0.0.13:8773 check inter 2000 rise 2 fall 5
   server controller3 10.0.0.14:8773 check inter 2000 rise 2 fall 5

 listen nova_compute_api_cluster
   bind <Virtual IP>:8774
   balance source
   option tcpka
   option httpchk
   option tcplog
   server controller1 10.0.0.12:8774 check inter 2000 rise 2 fall 5
   server controller2 10.0.0.13:8774 check inter 2000 rise 2 fall 5
   server controller3 10.0.0.14:8774 check inter 2000 rise 2 fall 5

 listen nova_metadata_api_cluster
   bind <Virtual IP>:8775
   balance source
   option tcpka
   option tcplog
   server controller1 10.0.0.12:8775 check inter 2000 rise 2 fall 5
   server controller2 10.0.0.13:8775 check inter 2000 rise 2 fall 5
   server controller3 10.0.0.14:8775 check inter 2000 rise 2 fall 5

 listen cinder_api_cluster
   bind <Virtual IP>:8776
   balance source
   option tcpka
   option httpchk
   option tcplog
   server controller1 10.0.0.12:8776 check inter 2000 rise 2 fall 5
   server controller2 10.0.0.13:8776 check inter 2000 rise 2 fall 5
   server controller3 10.0.0.14:8776 check inter 2000 rise 2 fall 5

 listen ceilometer_api_cluster
   bind <Virtual IP>:8777
   balance source
   option tcpka
   option tcplog
   server controller1 10.0.0.12:8777 check inter 2000 rise 2 fall 5
   server controller2 10.0.0.13:8777 check inter 2000 rise 2 fall 5
   server controller3 10.0.0.14:8777 check inter 2000 rise 2 fall 5

 listen nova_vncproxy_cluster
   bind <Virtual IP>:6080
   balance source
   option tcpka
   option tcplog
   server controller1 10.0.0.12:6080 check inter 2000 rise 2 fall 5
   server controller2 10.0.0.13:6080 check inter 2000 rise 2 fall 5
   server controller3 10.0.0.14:6080 check inter 2000 rise 2 fall 5

 listen neutron_api_cluster
   bind <Virtual IP>:9696
   balance source
   option tcpka
   option httpchk
   option tcplog
   server controller1 10.0.0.12:9696 check inter 2000 rise 2 fall 5
   server controller2 10.0.0.13:9696 check inter 2000 rise 2 fall 5
   server controller3 10.0.0.14:9696 check inter 2000 rise 2 fall 5

 listen swift_proxy_cluster
   bind <Virtual IP>:8080
   balance source
   option tcplog
   option tcpka
   server controller1 10.0.0.12:8080 check inter 2000 rise 2 fall 5
   server controller2 10.0.0.13:8080 check inter 2000 rise 2 fall 5
   server controller3 10.0.0.14:8080 check inter 2000 rise 2 fall 5

.. note::

   The Galera cluster configuration directive ``backup`` indicates
   that two of the three controllers are standby nodes.
   This ensures that only one node services write requests
   because OpenStack support for multi-node writes is not yet production-ready.

.. note::

   The Telemetry API service configuration does not have the ``option httpchk``
   directive as it cannot process this check properly.

.. TODO: explain why the Telemetry API is so special

#. Configure the kernel parameter to allow non-local IP binding. This allows
   running HAProxy instances to bind to a VIP for failover. Add the following
   line to ``/etc/sysctl.conf``:

   .. code-block:: none

      net.ipv4.ip_nonlocal_bind = 1

#. Restart the host or, to make the changes work immediately, invoke:

   .. code-block:: console

      $ sysctl -p

#. Add HAProxy to the cluster and ensure the VIPs can only run on machines
   where HAProxy is active:

   ``pcs``

   .. code-block:: console

      $ pcs resource create lb-haproxy systemd:haproxy --clone
      $ pcs constraint order start vip then lb-haproxy-clone kind=Optional
      $ pcs constraint colocation add lb-haproxy-clone with vip

   ``crmsh``

   .. code-block:: console

      $ crm cib new conf-haproxy
      $ crm configure primitive haproxy lsb:haproxy op monitor interval="1s"
      $ crm configure clone haproxy-clone haproxy
      $ crm configure colocation vip-with-haproxy inf: vip haproxy-clone
      $ crm configure order haproxy-after-vip mandatory: vip haproxy-clone

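Once the resources are defined, you can check that the HAProxy clone is
running and co-located with the VIP. As a minimal verification sketch,
assuming the resource names used above:

.. code-block:: console

   $ pcs status resources
   $ crm status
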
Pacemaker versus systemd
------------------------

Memcached
---------

Memcached is a general-purpose distributed memory caching system. It
is used to speed up dynamic database-driven websites by caching data
and objects in RAM to reduce the number of times an external data
source must be read.

Memcached is a memory cache daemon that can be used by most OpenStack
services to store ephemeral data, such as tokens.

Access to Memcached is not handled by HAProxy because replicated
access is currently in an experimental state. Instead, OpenStack
services must be supplied with the full list of hosts running
Memcached, as illustrated below.

The Memcached client implements hashing to balance objects among the
instances. Failure of an instance impacts only a percentage of the
objects, and the client automatically removes it from the list of
instances. The SLA is several minutes.

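As an illustrative sketch of such a host list (the section and option names
come from ``oslo.cache`` and are assumptions, not taken from this guide),
a service's configuration file enumerates every Memcached host rather than a
load-balanced VIP:

.. code-block:: ini

   [cache]
   enabled = true
   backend = oslo_cache.memcache_pool
   # Full list of Memcached hosts; the client hashes keys across them
   memcache_servers = 10.0.0.12:11211,10.0.0.13:11211,10.0.0.14:11211
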
Highly available API services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Identity API
------------

Ensure you have read the
`OpenStack Identity service getting started documentation
<https://docs.openstack.org/admin-guide/common/get-started-identity.html>`_.

.. to do: reference controller-ha-identity and see if section involving
   adding to pacemaker is in scope

Add OpenStack Identity resource to Pacemaker
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following sections detail how to add the Identity service
to Pacemaker on SUSE and Red Hat.

SUSE
----

SUSE Linux Enterprise and SUSE-based distributions, such as openSUSE,
use a set of OCF agents for controlling OpenStack services.

#. Run the following commands to download the OpenStack Identity resource
   to Pacemaker:

   .. code-block:: console

      # cd /usr/lib/ocf/resource.d
      # mkdir openstack
      # cd openstack
      # wget https://git.openstack.org/cgit/openstack/openstack-resource-agents/plain/ocf/keystone
      # chmod a+rx *

#. Add the Pacemaker configuration for the OpenStack Identity resource
   by running the following command to connect to the Pacemaker cluster:

   .. code-block:: console

      # crm configure

#. Add the following cluster resources:

   .. code-block:: console

      primitive p_keystone ocf:openstack:keystone \
       params config="/etc/keystone/keystone.conf" os_password="secretsecret" os_username="admin" os_tenant_name="admin" os_auth_url="http://10.0.0.11:5000/v2.0/" \
       op monitor interval="30s" timeout="30s"

   .. note::

      This configuration creates ``p_keystone``,
      a resource for managing the OpenStack Identity service.

#. Commit your configuration changes from the :command:`crm configure` menu
   with the following command:

   .. code-block:: console

      # commit

The :command:`crm configure` menu supports batch input. You may have to copy
and paste the above lines into your live Pacemaker configuration, and then
make changes as required.

For example, you may enter ``edit p_ip_keystone`` from the
:command:`crm configure` menu and edit the resource to match your preferred
virtual IP address.

Pacemaker now starts the OpenStack Identity service and its dependent
resources on all of your nodes.

Red Hat
-------

For Red Hat Enterprise Linux and Red Hat-based Linux distributions,
the following process uses systemd unit files:

.. code-block:: console

   # pcs resource create openstack-keystone systemd:openstack-keystone --clone interleave=true

.. _identity-config-identity:

Configure OpenStack Identity service
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#. Edit the :file:`keystone.conf` file
   to change the values of the :manpage:`bind(2)` parameters:

   .. code-block:: ini

      bind_host = 10.0.0.12
      public_bind_host = 10.0.0.12
      admin_bind_host = 10.0.0.12

   The ``admin_bind_host`` parameter
   lets you use a private network for admin access.

#. To be sure that all data is highly available,
   ensure that everything is stored in the MySQL database
   (which is also highly available):

   .. code-block:: ini

      [catalog]
      driver = keystone.catalog.backends.sql.Catalog
      # ...
      [identity]
      driver = keystone.identity.backends.sql.Identity
      # ...

#. If the Identity service will be sending ceilometer notifications
   and your message bus is configured for high availability, you will
   need to ensure that the Identity service is correctly configured to
   use it.

.. _identity-services-config:

Configure OpenStack services to use the highly available OpenStack Identity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Your OpenStack services must now point their OpenStack Identity configuration
to the highly available virtual cluster IP address.

#. For the OpenStack Compute service (if your OpenStack Identity service
   IP address is 10.0.0.11), use the following configuration in the
   :file:`api-paste.ini` file:

   .. code-block:: ini

      auth_host = 10.0.0.11

#. Create the OpenStack Identity endpoint with this IP address.

   .. note::

      If you are using both private and public IP addresses,
      create two virtual IP addresses and define the endpoints. For
      example:

      .. code-block:: console

         $ openstack endpoint create --region $KEYSTONE_REGION \
           $service-type public http://PUBLIC_VIP:5000/v2.0
         $ openstack endpoint create --region $KEYSTONE_REGION \
           $service-type admin http://10.0.0.11:35357/v2.0
         $ openstack endpoint create --region $KEYSTONE_REGION \
           $service-type internal http://10.0.0.11:5000/v2.0

#. If you are using Dashboard (horizon), edit the :file:`local_settings.py`
   file to include the following:

   .. code-block:: ini

      OPENSTACK_HOST = 10.0.0.11

Telemetry API
-------------

The Telemetry polling agent can be configured to partition its polling
workload between multiple agents. This enables high availability (HA).

Both the central and the compute agent can run in an HA deployment.
This means that multiple instances of these services can run in
parallel with workload partitioning among these running instances.

The `Tooz <https://pypi.org/project/tooz>`_ library provides
the coordination within the groups of service instances.
It provides an API above several back ends that can be used for building
distributed applications.

Tooz supports
`various drivers <https://docs.openstack.org/tooz/latest/user/drivers.html>`_
including the following back end solutions:

* `Zookeeper <http://zookeeper.apache.org/>`_:
  Recommended solution by the Tooz project.

* `Redis <http://redis.io/>`_:
  Recommended solution by the Tooz project.

* `Memcached <http://memcached.org/>`_:
  Recommended for testing.

You must configure a supported Tooz driver for the HA deployment of
the Telemetry services.

For information about the required configuration options
to set in the :file:`ceilometer.conf`, see the `coordination section
<https://docs.openstack.org/ocata/config-reference/telemetry.html>`_
in the OpenStack Configuration Reference.

.. note::

   Only one instance of the central and compute agent services is able
   to run and function correctly if the ``backend_url`` option is not set.

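As an illustrative sketch (the Redis URL is an assumed example, not a value
taken from this guide), pointing the agents at a coordination back end is a
single option in :file:`ceilometer.conf`:

.. code-block:: ini

   [coordination]
   # Tooz coordination back end shared by all polling agent instances
   backend_url = redis://10.0.0.11:6379
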
The availability check of the instances is provided by heartbeat messages.
When the connection with an instance is lost, the workload will be
reassigned within the remaining instances in the next polling cycle.

.. note::

   Memcached uses a timeout value, which should always be set to
   a value that is higher than the heartbeat value set for Telemetry.

For backward compatibility and supporting existing deployments, the central
agent configuration supports using different configuration files. This is for
groups of service instances that are running in parallel.
To enable this configuration, set a value for the
``partitioning_group_prefix`` option in the
`polling section <https://docs.openstack.org/ocata/config-reference/telemetry/telemetry-config-options.html>`_
in the OpenStack Configuration Reference.

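A minimal sketch of such an entry (the section placement and the prefix value
are assumptions for illustration):

.. code-block:: ini

   [polling]
   # All agents sharing this prefix form one partitioning sub-group
   partitioning_group_prefix = central-prod
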
.. warning::

   For each sub-group of the central agent pool with the same
   ``partitioning_group_prefix``, a disjoint subset of meters must be polled
   to avoid samples being missing or duplicated. The list of meters to poll
   can be set in the :file:`/etc/ceilometer/pipeline.yaml` configuration file.
   For more information about pipelines see the `Data processing and pipelines
   <https://docs.openstack.org/admin-guide/telemetry-data-pipelines.html>`_
   section.

To enable the compute agent to run multiple instances simultaneously with
workload partitioning, the ``workload_partitioning`` option must be set to
``True`` under the `compute section <https://docs.openstack.org/ocata/config-reference/telemetry.html>`_
in the :file:`ceilometer.conf` configuration file.

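For example, as a sketch of that setting:

.. code-block:: ini

   [compute]
   # Allow multiple compute agents to share the polling workload
   workload_partitioning = True
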
.. To Do: Cover any other projects here with API services which require specific
   HA details.

9
doc/source/control-plane.rst
Normal file
@ -0,0 +1,9 @@
===========================
Configuring a control plane
===========================

.. toctree::
   :maxdepth: 2

   control-plane-stateless.rst
   control-plane-stateful.rst

BIN
doc/source/figures/Cluster-deployment-collapsed.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 223 KiB |
BIN
doc/source/figures/Cluster-deployment-segregated.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 215 KiB |
15
doc/source/ha-community.rst
Normal file
@ -0,0 +1,15 @@
============
HA community
============

The OpenStack HA community holds `weekly IRC meetings
<https://wiki.openstack.org/wiki/Meetings/HATeamMeeting>`_ to discuss
a range of topics relating to HA in OpenStack. Everyone interested is
encouraged to attend. The `logs of all previous meetings
<http://eavesdrop.openstack.org/meetings/ha/>`_ are available to read.

You can contact the HA community directly in `the #openstack-ha
channel on Freenode IRC <https://wiki.openstack.org/wiki/IRC>`_, or by
sending mail to the `openstack-dev
<https://wiki.openstack.org/wiki/Mailing_Lists#Future_Development>`_
mailing list with the ``[HA]`` prefix in the ``Subject`` header.

@ -5,8 +5,31 @@ OpenStack High Availability Guide

Abstract
~~~~~~~~

This guide describes how to install and configure OpenStack for high
availability. It supplements the Installation Guides
and assumes that you are familiar with the material in those guides.

.. warning::

   This guide is a work-in-progress and changing rapidly
   while we continue to test and enhance the guidance. There are
   open `TODO` items throughout and available on the OpenStack manuals
   `bug list <https://bugs.launchpad.net/openstack-manuals?field.tag=ha-guide>`_.
   Please help where you are able.

.. toctree::
   :maxdepth: 1

   common/conventions.rst
   overview.rst
   intro-ha.rst
   intro-os-ha.rst
   control-plane.rst
   networking-ha.rst
   storage-ha.rst
   compute-node-ha.rst
   monitoring.rst
   testing.rst
   ref-arch-examples.rst
   ha-community.rst
   common/appendix.rst

127
doc/source/intro-ha-common-tech.rst
Normal file
@ -0,0 +1,127 @@
========================
Commonly used technology
========================

High availability can be achieved only at the system level, and both hardware
and software components contribute to the system-level availability.
This document lists the most common hardware and software technologies
that can be used to build a highly available system.

Hardware
~~~~~~~~

Using different technologies to enable high availability on the hardware
level provides a good basis for building a highly available system. The next
sections discuss the most common technologies used in this field.

Redundant switches
------------------

Network switches are single points of failure because networking is critical
to operating all other basic domains of the infrastructure, like compute and
storage. Network switches need to be able to forward the network traffic
and to forward it to a working next hop.
For these reasons, consider the following two factors when making a network
switch redundant:

#. The network switch itself should synchronize its internal state to a
   redundant switch in either an active/active or active/passive way.

#. The network topology should be designed in a way that the network router
   can use at least two paths in every critical direction.

Bonded interfaces
-----------------

Bonded interfaces are two independent physical network interfaces handled as
one interface in active/passive or in active/active redundancy mode. In
active/passive mode, if an error happens in the active network interface or in
the remote end of the interface, the interfaces are switched over. In
active/active mode, when an error happens in an interface or in the remote end
of an interface, the interface is marked as unavailable and ceases to be
used, as sketched below.

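As a minimal illustration of creating an active/passive bond with
``iproute2`` (the interface names ``eth0``/``eth1`` are examples, and
distributions usually persist this through their own network configuration
instead):

.. code-block:: console

   # ip link add bond0 type bond mode active-backup
   # ip link set eth0 down
   # ip link set eth0 master bond0
   # ip link set eth1 down
   # ip link set eth1 master bond0
   # ip link set bond0 up
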
Load balancers
--------------

Physical load balancers are special routers that direct the traffic in
different directions based on a set of rules. Load balancers can run in
redundant mode, similarly to the physical switches.
Load balancers are also important for distributing the traffic to the
different active/active components of the system.

Storage
-------

Physical storage high availability can be achieved at different scopes:

#. High availability within a hardware unit, with redundant disks (mostly
   organized into different RAID configurations), redundant control
   components, redundant I/O interfaces, and redundant power supplies.

#. System-level high availability, with redundant hardware units and data
   replication.

Software
~~~~~~~~

HAProxy
-------

HAProxy provides a fast and reliable HTTP reverse proxy and load balancer
for TCP or HTTP applications. It is particularly suited for web sites with
very high loads that need persistence or Layer 7 processing.
It realistically supports tens of thousands of connections with recent
hardware.

.. note::

   To ensure that your HAProxy installation is not a single point of failure,
   it is advisable to have multiple HAProxy instances running.

   You can also ensure availability by other means, using Keepalived
   or Pacemaker.

Alternatively, you can use a commercial load balancer, which is hardware
or software. We recommend a hardware load balancer as it generally has
good performance.

For detailed instructions about installing HAProxy on your nodes,
see the HAProxy `official documentation <http://www.haproxy.org/#docs>`_.

keepalived
----------

`keepalived <http://www.keepalived.org/>`_ is routing software that
provides facilities for load balancing and high availability to Linux
systems and Linux-based infrastructures.

Keepalived implements a set of checkers to dynamically and
adaptively maintain and manage a load-balanced server pool according
to the health of its members.

The keepalived daemon can be used to monitor services or systems and
to automatically fail over to a standby if problems occur.

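For illustration, a minimal keepalived VRRP instance that floats a virtual IP
between two hosts could look like the sketch below (the interface name,
router ID, and addresses are assumed example values):

.. code-block:: none

   vrrp_instance VI_1 {
       state MASTER
       interface eth0
       virtual_router_id 51
       priority 101
       virtual_ipaddress {
           10.0.0.11
       }
   }
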
Pacemaker
---------

The `Pacemaker <http://clusterlabs.org/>`_ cluster stack is a state-of-the-art
high availability and load balancing stack for the Linux platform.
Pacemaker is used to make OpenStack infrastructure highly available.

Pacemaker relies on the
`Corosync <http://corosync.github.io/corosync/>`_ messaging layer
for reliable cluster communications. Corosync implements the Totem single-ring
ordering and membership protocol. It also provides UDP and InfiniBand based
messaging, quorum, and cluster membership to Pacemaker.

Pacemaker does not inherently understand the applications it manages.
Instead, it relies on resource agents (RAs) that are scripts that encapsulate
the knowledge of how to start, stop, and check the health of each application
managed by the cluster.

These agents must conform to one of the `OCF <https://github.com/ClusterLabs/
OCF-spec/blob/master/ra/resource-agent-api.md>`_,
`SysV Init <http://refspecs.linux-foundation.org/LSB_3.0.0/LSB-Core-generic/
LSB-Core-generic/iniscrptact.html>`_, Upstart, or Systemd standards.

Pacemaker ships with a large set of OCF agents (such as those managing
MySQL databases, virtual IP addresses, and RabbitMQ), but can also use
any agents already installed on your system and can be extended with
your own (see the
`developer guide <http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html>`_).

147
doc/source/intro-ha-key-concepts.rst
Normal file
@ -0,0 +1,147 @@
============
Key concepts
============

Redundancy and failover
~~~~~~~~~~~~~~~~~~~~~~~

High availability is implemented with redundant hardware
running redundant instances of each service.
If one piece of hardware running one instance of a service fails,
the system can then fail over to use another instance of a service
that is running on hardware that did not fail.

A crucial aspect of high availability
is the elimination of single points of failure (SPOFs).
A SPOF is an individual piece of equipment or software
that causes system downtime or data loss if it fails.
In order to eliminate SPOFs, check that mechanisms exist for redundancy of:

- Network components, such as switches and routers

- Applications and automatic service migration

- Storage components

- Facility services such as power, air conditioning, and fire protection

In the event that a component fails and a back-up system must take on
its load, most high availability systems will replace the failed
component as quickly as possible to maintain necessary redundancy. This
way, time spent in a degraded protection state is minimized.

Most high availability systems fail in the event of multiple
independent (non-consequential) failures. In this case, most
implementations favor protecting data over maintaining availability.

High availability systems typically achieve an uptime percentage of
99.99% or more, which roughly equates to less than an hour of
cumulative downtime per year. In order to achieve this, high
availability systems should keep recovery times after a failure to
about one to two minutes, sometimes significantly less.

OpenStack currently meets such availability requirements for its own
infrastructure services, meaning that an uptime of 99.99% is feasible
for the OpenStack infrastructure proper. However, OpenStack does not
guarantee 99.99% availability for individual guest instances.

This document discusses some common methods of implementing highly
available systems, with an emphasis on the core OpenStack services and
other open source services that are closely aligned with OpenStack.

You will need to address high availability concerns for any applications
software that you run on your OpenStack environment. The important thing is
to make sure that your services are redundant and available.
How you achieve that is up to you.

Active/passive versus active/active
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Stateful services can be configured as active/passive or active/active,
which are defined as follows:

:term:`active/passive configuration`
  Maintains a redundant instance
  that can be brought online when the active service fails.
  For example, OpenStack writes to the main database
  while maintaining a disaster recovery database that can be brought online
  if the main database fails.

  A typical active/passive installation for a stateful service maintains
  a replacement resource that can be brought online when required.
  Requests are handled using a :term:`virtual IP address (VIP)` that
  facilitates returning to service with minimal reconfiguration.
  A separate application (such as Pacemaker or Corosync) monitors
  these services, bringing the backup online as necessary.

:term:`active/active configuration`
  Each service also has a backup but manages both the main and
  redundant systems concurrently.
  This way, if there is a failure, the user is unlikely to notice.
  The backup system is already online and takes on increased load
  while the main system is fixed and brought back online.

  Typically, an active/active installation for a stateless service
  maintains a redundant instance, and requests are load balanced using
  a virtual IP address and a load balancer such as HAProxy.

  A typical active/active installation for a stateful service includes
  redundant services, with all instances having an identical state. In
  other words, updates to one instance of a database update all other
  instances. This way a request to one instance is the same as a
  request to any other. A load balancer manages the traffic to these
  systems, ensuring that operational systems always handle the
  request.

Clusters and quorums
~~~~~~~~~~~~~~~~~~~~

The quorum specifies the minimal number of nodes
that must be functional in a cluster of redundant nodes
in order for the cluster to remain functional.
When one node fails and failover transfers control to other nodes,
the system must ensure that data and processes remain sane.
To determine this, the contents of the remaining nodes are compared
and, if there are discrepancies, a majority rules algorithm is implemented.

For this reason, each cluster in a high availability environment should
have an odd number of nodes and the quorum is defined as more than a half
of the nodes.
If multiple nodes fail so that the cluster size falls below the quorum
value, the cluster itself fails.

For example, in a seven-node cluster, the quorum should be set to
``floor(7/2) + 1 == 4``. If quorum is four and four nodes fail
simultaneously, the cluster itself would fail, whereas it would continue to
function if no more than three nodes fail. If split into partitions of three
and four nodes respectively, the quorum of four nodes would continue to
operate the majority partition and stop or fence the minority one (depending
on the no-quorum-policy cluster configuration).

As a configuration example, the quorum could also have been set to three.

.. note::

   We do not recommend setting the quorum to a value less than
   ``floor(n/2) + 1`` as it would likely cause a split-brain in the face of
   network partitions.

With a quorum of three, when four nodes fail simultaneously, the cluster
would continue to function as well. But if split into partitions of three and
four nodes respectively, the quorum of three would have made both sides
attempt to fence the other and host resources. Without fencing enabled, it
would go straight to running two copies of each resource.

This is why setting the quorum to a value less than ``floor(n/2) + 1`` is
dangerous. However, it may be required for some specific cases, such as a
temporary measure at a point when it is known with 100% certainty that the
other nodes are down.

When configuring an OpenStack environment for study or demonstration
purposes, it is possible to turn off the quorum checking. Production systems
should always run with quorum enabled.

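As an illustrative sketch of where this value lives (the provider name is
standard Corosync syntax; the vote count is an example for a seven-node
cluster), quorum behaviour is driven by the ``quorum`` section of
:file:`/etc/corosync/corosync.conf`:

.. code-block:: none

   quorum {
     provider: corosync_votequorum
     # Total votes in the cluster; quorum is floor(7/2) + 1 == 4
     expected_votes: 7
   }
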
Load balancing
~~~~~~~~~~~~~~

.. to do: definition and description of need within HA

24
doc/source/intro-ha.rst
Normal file
@ -0,0 +1,24 @@
=================================
Introduction to high availability
=================================

High availability systems seek to minimize the following issues:

#. System downtime: Occurs when a user-facing service is unavailable
   beyond a specified maximum amount of time.

#. Data loss: Accidental deletion or destruction of data.

Most high availability systems guarantee protection against system downtime
and data loss only in the event of a single failure.
However, they are also expected to protect against cascading failures,
where a single failure deteriorates into a series of consequential failures.
Many service providers guarantee a :term:`Service Level Agreement (SLA)`
including an uptime percentage of the computing service, which is calculated
based on the available time and system downtime excluding planned outage time.

.. toctree::
   :maxdepth: 2

   intro-ha-key-concepts.rst
   intro-ha-common-tech.rst

67
doc/source/intro-os-ha-cluster.rst
Normal file
@ -0,0 +1,67 @@
================
Cluster managers
================

At its core, a cluster is a distributed finite state machine capable
of co-ordinating the startup and recovery of inter-related services
across a set of machines.

Even a distributed or replicated application that is able to survive failures
on one or more machines can benefit from a cluster manager because a cluster
manager has the following capabilities:

#. Awareness of other applications in the stack

   While SYS-V init replacements like systemd can provide
   deterministic recovery of a complex stack of services, the
   recovery is limited to one machine and lacks the context of what
   is happening on other machines. This context is crucial to
   determine the difference between a local failure, and clean startup
   and recovery after a total site failure.

#. Awareness of instances on other machines

   Services like RabbitMQ and Galera have complicated boot-up
   sequences that require co-ordination, and often serialization, of
   startup operations across all machines in the cluster. This is
   especially true after a site-wide failure or shutdown where you must
   first determine the last machine to be active.

#. A shared implementation and calculation of `quorum
   <https://en.wikipedia.org/wiki/Quorum_(Distributed_Systems)>`_

   It is very important that all members of the system share the same
   view of who their peers are and whether or not they are in the
   majority. Failure to do this leads very quickly to an internal
   `split-brain <https://en.wikipedia.org/wiki/Split-brain_(computing)>`_
   state. This is where different parts of the system are pulling in
   different and incompatible directions.

#. Data integrity through fencing (a non-responsive process does not
   imply it is not doing anything)

   A single application does not have sufficient context to know the
   difference between failure of a machine and failure of the
   application on a machine. The usual practice is to assume the
   machine is dead and continue working; however, this is highly risky. A
   rogue process or machine could still be responding to requests and
   generally causing havoc. The safer approach is to make use of
   remotely accessible power switches and/or network switches and SAN
   controllers to fence (isolate) the machine before continuing.

#. Automated recovery of failed instances

   While the application can still run after the failure of several
   instances, it may not have sufficient capacity to serve the
   required volume of requests. A cluster can automatically recover
   failed instances to prevent additional load-induced failures.

Pacemaker
~~~~~~~~~

.. to do: description and point to ref arch example using pacemaker

`Pacemaker <http://clusterlabs.org>`_.

Systemd
~~~~~~~

.. to do: description and point to ref arch example using Systemd and link

35
doc/source/intro-os-ha-memcached.rst
Normal file
@ -0,0 +1,35 @@
=========
Memcached
=========

Most OpenStack services can use Memcached to store ephemeral data such as
tokens. Although Memcached does not support typical forms of redundancy such
as clustering, OpenStack services can use almost any number of instances
by configuring multiple hostnames or IP addresses.

The Memcached client implements hashing to balance objects among the
instances. Failure of an instance only impacts a percentage of the objects,
and the client automatically removes it from the list of instances.

Installation
~~~~~~~~~~~~

To install and configure Memcached, read the
`official documentation <https://github.com/Memcached/Memcached/wiki#getting-started>`_.

Memory caching is managed by `oslo.cache
<http://specs.openstack.org/openstack/oslo-specs/specs/kilo/oslo-cache-using-dogpile.html>`_.
This ensures consistency across all projects when using multiple Memcached
servers. The following is an example configuration with three hosts:

.. code-block:: ini

   memcached_servers = controller1:11211,controller2:11211,controller3:11211

By default, ``controller1`` handles the caching service. If the host goes
down, ``controller2`` or ``controller3`` handles the service instead.

For more information about Memcached installation, see the
*Environment -> Memcached* section in the
`Installation Guides <https://docs.openstack.org/ocata/install/>`_
depending on your distribution.

52
doc/source/intro-os-ha-state.rst
Normal file
@ -0,0 +1,52 @@
==================================
Stateless versus stateful services
==================================

OpenStack components can be divided into three categories:

- OpenStack APIs: stateless HTTP(S) services written in Python,
  easy to duplicate and mostly easy to load balance.

- The SQL relational database server provides stateful storage consumed by
  other components. Supported databases are MySQL, MariaDB, and PostgreSQL.
  Making the SQL database redundant is complex.

- :term:`Advanced Message Queuing Protocol (AMQP)` provides OpenStack's
  internal stateful communication service.

.. to do: Ensure the difference between stateless and stateful services
.. is clear

Stateless services
~~~~~~~~~~~~~~~~~~

A stateless service provides a response to your request and then
requires no further attention. To make a stateless service highly
available, you need to provide redundant instances and load balance them.

Stateless OpenStack services
----------------------------

OpenStack services that are stateless include ``nova-api``,
``nova-conductor``, ``glance-api``, ``keystone-api``, ``neutron-api``,
and ``nova-scheduler``.

Stateful services
~~~~~~~~~~~~~~~~~

A stateful service is one where subsequent requests to the service
depend on the results of the first request.
Stateful services are more difficult to manage because a single
action typically involves more than one request. Providing
additional instances and load balancing does not solve the problem.
For example, if the horizon user interface reset itself every time
you went to a new page, it would not be very useful.
OpenStack services that are stateful include the OpenStack database
and message queue.
Making stateful services highly available can depend on whether you choose
an active/passive or active/active configuration.

Stateful OpenStack services
---------------------------

.. to do: create list of stateful services

12
doc/source/intro-os-ha.rst
Normal file
@ -0,0 +1,12 @@
================================================
Introduction to high availability with OpenStack
================================================

.. to do: description of section & improvement of title (intro to OS HA)

.. toctree::
   :maxdepth: 2

   intro-os-ha-state.rst
   intro-os-ha-cluster.rst
   intro-os-ha-memcached.rst

6
doc/source/monitoring.rst
Normal file
@ -0,0 +1,6 @@
==========
Monitoring
==========

20
doc/source/networking-ha-l3-agent.rst
Normal file
@ -0,0 +1,20 @@
========
L3 Agent
========

.. TODO: Introduce L3 agent

HA Routers
~~~~~~~~~~

.. TODO: content for HA routers

Networking DHCP agent
~~~~~~~~~~~~~~~~~~~~~

The OpenStack Networking (neutron) service has a scheduler that lets you run
multiple agents across nodes. The DHCP agent can be natively highly available.

To configure the number of DHCP agents per network, modify the
``dhcp_agents_per_network`` parameter in the :file:`/etc/neutron/neutron.conf`
file. By default this is set to 1. To achieve high availability, assign more
than one DHCP agent per network, as sketched below. For more information, see
`High-availability for DHCP
<https://docs.openstack.org/newton/networking-guide/config-dhcp-ha.html>`_.

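A minimal sketch of that setting (the value of 3 is an example matching a
three-node control plane):

.. code-block:: ini

   [DEFAULT]
   # Schedule each network to three DHCP agents
   dhcp_agents_per_network = 3
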
6
doc/source/networking-ha-neutron-l3-analysis.rst
Normal file
@ -0,0 +1,6 @@
==========
Neutron L3
==========

.. TODO: create and import Neutron L3 analysis
   Introduce the Networking (neutron) service L3 agent

5
doc/source/networking-ha-neutron-server.rst
Normal file
@ -0,0 +1,5 @@
=========================
Neutron Networking server
=========================

.. TODO: Create content similar to other API sections

29
doc/source/networking-ha.rst
Normal file
@ -0,0 +1,29 @@
===================================
Configuring the networking services
===================================

Configure networking on each node. See the basic information about
configuring networking in the Networking service section of the
`Install Guides <https://docs.openstack.org/ocata/install/>`_,
depending on your distribution.

OpenStack network nodes contain:

- Networking DHCP agent
- Neutron L3 agent
- Networking L2 agent

.. note::

   The L2 agent cannot be distributed and highly available. Instead, it
   must be installed on each data forwarding node to control the virtual
   network driver such as Open vSwitch or Linux Bridge. One L2 agent runs
   per node and controls its virtual interfaces.

.. toctree::
   :maxdepth: 2

   networking-ha-neutron-server.rst
   networking-ha-neutron-l3-analysis.rst
   networking-ha-l3-agent.rst

24
doc/source/overview.rst
Normal file
@ -0,0 +1,24 @@
========
Overview
========

This guide can be split into two parts:

#. High-level architecture
#. Reference architecture examples, monitoring, and testing

.. warning::

   We recommend using this guide for assistance when considering your HA
   cloud. We do not recommend using this guide for manually building your HA
   cloud. We recommend starting with a pre-validated solution and adjusting
   it to your needs.

High availability is not for every user. It presents some challenges.
High availability may be too complex for databases or
systems with large amounts of data. Replication can slow large systems
down. Different setups have different prerequisites. Read the guidelines
for each setup.

.. important::

   High availability is turned off by default in OpenStack setups.

3
doc/source/ref-arch-examples.rst
Normal file
@ -0,0 +1,3 @@
======================
Reference Architecture
======================

59
doc/source/storage-ha-backend.rst
Normal file
@ -0,0 +1,59 @@
.. _storage-ha-backend:

================
Storage back end
================

An OpenStack environment includes multiple data pools for the VMs:

- Ephemeral storage is allocated for an instance and is deleted when the
  instance is deleted. The Compute service manages ephemeral storage and,
  by default, Compute stores ephemeral drives as files on local disks on the
  compute node. As an alternative, you can use Ceph RBD as the storage back
  end for ephemeral storage.

- Persistent storage exists outside all instances. Two types of persistent
  storage are provided:

  - The Block Storage service (cinder), which can use LVM or Ceph RBD as the
    storage back end.
  - The Image service (glance), which can use the Object Storage service
    (swift) or Ceph RBD as the storage back end.

For more information about configuring storage back ends for
the different storage options, see `Manage volumes
<https://docs.openstack.org/admin-guide/blockstorage-manage-volumes.html>`_
in the OpenStack Administrator Guide.

This section discusses ways to protect against data loss in your OpenStack
environment.

RAID drives
-----------

Configuring RAID on the hard drives that implement storage protects your data
against a hard drive failure. If the node itself fails, data may be lost.
In particular, all volumes stored on an LVM node can be lost.

Ceph
----

`Ceph RBD <http://ceph.com/>`_ is an innately highly available storage back
end. It creates a storage cluster with multiple nodes that communicate with
each other to replicate and redistribute data dynamically.
A Ceph RBD storage cluster provides a single shared set of storage nodes that
can handle all classes of persistent and ephemeral data (glance, cinder, and
nova) that are required for OpenStack instances.

Ceph RBD provides object replication capabilities by storing Block Storage
volumes as Ceph RBD objects. Ceph RBD ensures that each replica of an object
is stored on a different node. This means that your volumes are protected
against hard drive and node failures, or even the failure of the data center
itself.

When Ceph RBD is used for ephemeral volumes as well as block and image
storage, it supports `live migration
<https://docs.openstack.org/admin-guide/compute-live-migration-usage.html>`_
of VMs with ephemeral drives. LVM only supports live migration of
volume-backed VMs.

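As an illustrative sketch of selecting Ceph RBD as a Block Storage back end
(the pool name is an example value, and this is not a complete
configuration), :file:`cinder.conf` would carry:

.. code-block:: ini

   volume_driver = cinder.volume.drivers.rbd.RBDDriver
   rbd_pool = volumes
   rbd_ceph_conf = /etc/ceph/ceph.conf
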
192
doc/source/storage-ha-block.rst
Normal file
@ -0,0 +1,192 @@
==================================
Highly available Block Storage API
==================================

Cinder provides Block-Storage-as-a-Service suitable for performance
sensitive scenarios such as databases, expandable file systems, or
providing a server with access to raw block-level storage.

Persistent block storage can survive instance termination and can also
be moved across instances like any external storage device. Cinder
also has volume snapshot capability for backing up the volumes.

Making the Block Storage API service highly available in
active/passive mode involves:

- :ref:`ha-blockstorage-pacemaker`
- :ref:`ha-blockstorage-configure`
- :ref:`ha-blockstorage-services`

In theory, you can run the Block Storage service as active/active.
However, because of outstanding concerns, we recommend running
the volume component as active/passive only.

You can read more about these concerns on the
`Red Hat Bugzilla <https://bugzilla.redhat.com/show_bug.cgi?id=1193229>`_
and there is a
`pseudo roadmap <https://etherpad.openstack.org/p/cinder-kilo-stabilisation-work>`_
for addressing them upstream.

.. _ha-blockstorage-pacemaker:

Add Block Storage API resource to Pacemaker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On RHEL-based systems, create resources for cinder's systemd agents and create
constraints to enforce startup/shutdown ordering:

.. code-block:: console

   pcs resource create openstack-cinder-api systemd:openstack-cinder-api --clone interleave=true
   pcs resource create openstack-cinder-scheduler systemd:openstack-cinder-scheduler --clone interleave=true
   pcs resource create openstack-cinder-volume systemd:openstack-cinder-volume

   pcs constraint order start openstack-cinder-api-clone then openstack-cinder-scheduler-clone
   pcs constraint colocation add openstack-cinder-scheduler-clone with openstack-cinder-api-clone
   pcs constraint order start openstack-cinder-scheduler-clone then openstack-cinder-volume
   pcs constraint colocation add openstack-cinder-volume with openstack-cinder-scheduler-clone

If the Block Storage service runs on the same nodes as the other services,
then it is advisable to also include:

.. code-block:: console

   pcs constraint order start openstack-keystone-clone then openstack-cinder-api-clone

Alternatively, instead of using systemd agents, download and
install the OCF resource agent:

.. code-block:: console

   # cd /usr/lib/ocf/resource.d/openstack
   # wget https://git.openstack.org/cgit/openstack/openstack-resource-agents/plain/ocf/cinder-api
   # chmod a+rx *

You can now add the Pacemaker configuration for the Block Storage API
resource. Connect to the Pacemaker cluster with the :command:`crm configure`
command and add the following cluster resources:

.. code-block:: none

   primitive p_cinder-api ocf:openstack:cinder-api \
      params config="/etc/cinder/cinder.conf" \
      os_password="secretsecret" \
      os_username="admin" \
      os_tenant_name="admin" \
      keystone_get_token_url="http://10.0.0.11:5000/v2.0/tokens" \
      op monitor interval="30s" timeout="30s"

This configuration creates ``p_cinder-api``, a resource for managing the
Block Storage API service.

The :command:`crm configure` command supports batch input. Copy and paste the
lines above into your live Pacemaker configuration and then make changes as
required. For example, you may enter ``edit p_ip_cinder-api`` from the
:command:`crm configure` menu and edit the resource to match your preferred
virtual IP address.

Once completed, commit your configuration changes by entering
:command:`commit` from the :command:`crm configure` menu. Pacemaker then
starts the Block Storage API service and its dependent resources on one of
your nodes.

.. _ha-blockstorage-configure:

Configure Block Storage API service
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Edit the ``/etc/cinder/cinder.conf`` file. For example, on a RHEL-based
system:

.. code-block:: ini
   :linenos:

   [DEFAULT]
   # This is the name which we should advertise ourselves as and for
   # A/P installations it should be the same everywhere
   host = cinder-cluster-1

   # Listen on the Block Storage VIP
   osapi_volume_listen = 10.0.0.11

   auth_strategy = keystone
   control_exchange = cinder

   volume_driver = cinder.volume.drivers.nfs.NfsDriver
   nfs_shares_config = /etc/cinder/nfs_exports
   nfs_sparsed_volumes = true
   nfs_mount_options = v3

   [database]
   connection = mysql+pymysql://cinder:CINDER_DBPASS@10.0.0.11/cinder
   max_retries = -1

   [keystone_authtoken]
   # 10.0.0.11 is the Keystone VIP
   identity_uri = http://10.0.0.11:35357/
   www_authenticate_uri = http://10.0.0.11:5000/
   admin_tenant_name = service
   admin_user = cinder
   admin_password = CINDER_PASS

   [oslo_messaging_rabbit]
   # Explicitly list the rabbit hosts as it doesn't play well with HAProxy
   rabbit_hosts = 10.0.0.12,10.0.0.13,10.0.0.14
   # As a consequence, we also need HA queues
   rabbit_ha_queues = True
   heartbeat_timeout_threshold = 60
   heartbeat_rate = 2

Replace ``CINDER_DBPASS`` with the password you chose for the Block Storage
database. Replace ``CINDER_PASS`` with the password you chose for the
``cinder`` user in the Identity service.

This example assumes that you are using NFS for the physical storage, which
will almost never be true in a production installation.

If you are using the Block Storage service OCF agent, some settings will
|
||||||
|
be filled in for you, resulting in a shorter configuration file:
|
||||||
|
|
||||||
|
.. code-block:: ini
|
||||||
|
:linenos:
|
||||||
|
|
||||||
   # We have to use a MySQL connection to store data. The pymysql driver
   # is Python 3 compatible, so we are ready when everything moves to
   # Python 3.
   # Ref: https://wiki.openstack.org/wiki/PyMySQL_evaluation
   connection = mysql+pymysql://cinder:CINDER_DBPASS@10.0.0.11/cinder

   # We bind the Block Storage API to the VIP:
   osapi_volume_listen = 10.0.0.11

   # We send notifications to the highly available RabbitMQ:
   notifier_strategy = rabbit
   rabbit_host = 10.0.0.11

Replace ``CINDER_DBPASS`` with the password you chose for the Block Storage
database.

.. _ha-blockstorage-services:

Configure OpenStack services to use the highly available Block Storage API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Your OpenStack services must now point their Block Storage API configuration
to the highly available, virtual cluster IP address rather than a Block
Storage API server’s physical IP address as you would for a non-HA
environment.

Create the Block Storage API endpoint with this IP.

If you are using both private and public IP addresses, create two virtual IPs
and define your endpoint. For example:

.. code-block:: console

   $ openstack endpoint create --region $KEYSTONE_REGION \
     volumev2 public http://PUBLIC_VIP:8776/v2/%\(project_id\)s
   $ openstack endpoint create --region $KEYSTONE_REGION \
     volumev2 admin http://10.0.0.11:8776/v2/%\(project_id\)s
   $ openstack endpoint create --region $KEYSTONE_REGION \
     volumev2 internal http://10.0.0.11:8776/v2/%\(project_id\)s
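
You can then confirm that the catalog points at the virtual IPs by listing
the Block Storage endpoints:

.. code-block:: console

   $ openstack endpoint list --service volumev2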

114 doc/source/storage-ha-file-systems.rst Normal file
@ -0,0 +1,114 @@

========================================
Highly available Shared File Systems API
========================================

Making the Shared File Systems (manila) API service highly available
in active/passive mode involves:

- :ref:`ha-sharedfilesystems-configure`
- :ref:`ha-sharedfilesystems-services`
- :ref:`ha-sharedfilesystems-pacemaker`

.. _ha-sharedfilesystems-configure:

Configure Shared File Systems API service
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Edit the :file:`/etc/manila/manila.conf` file:

.. code-block:: ini
   :linenos:

   # We have to use a MySQL connection to store data:
   sql_connection = mysql+pymysql://manila:password@10.0.0.11/manila?charset=utf8

   # We bind the Shared File Systems API to the VIP:
   osapi_share_listen = 10.0.0.11

   # We send notifications to the highly available RabbitMQ:
   notifier_strategy = rabbit
   rabbit_host = 10.0.0.11

.. _ha-sharedfilesystems-services:

Configure OpenStack services to use Shared File Systems API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Your OpenStack services must now point their Shared File Systems API
configuration to the highly available, virtual cluster IP address rather than
a Shared File Systems API server’s physical IP address as you would
for a non-HA environment.

You must create the Shared File Systems API endpoint with this IP.

If you are using both private and public IP addresses, you should create two
virtual IPs and define your endpoints like this:

.. code-block:: console

   $ openstack endpoint create --region RegionOne \
     sharev2 public 'http://PUBLIC_VIP:8786/v2/%(tenant_id)s'

   $ openstack endpoint create --region RegionOne \
     sharev2 internal 'http://10.0.0.11:8786/v2/%(tenant_id)s'

   $ openstack endpoint create --region RegionOne \
     sharev2 admin 'http://10.0.0.11:8786/v2/%(tenant_id)s'
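
As with the Block Storage service, you can confirm the result by listing the
``sharev2`` endpoints:

.. code-block:: console

   $ openstack endpoint list --service sharev2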

.. _ha-sharedfilesystems-pacemaker:

Add Shared File Systems API resource to Pacemaker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. Download the resource agent to your system:

   .. code-block:: console

      # cd /usr/lib/ocf/resource.d/openstack
      # wget https://git.openstack.org/cgit/openstack/openstack-resource-agents/plain/ocf/manila-api
      # chmod a+rx *

#. Add the Pacemaker configuration for the Shared File Systems
   API resource. Connect to the Pacemaker cluster with the following
   command:

   .. code-block:: console

      # crm configure

   .. note::

      The :command:`crm configure` command supports batch input. Copy and
      paste the lines in the next step into your live Pacemaker configuration
      and then make changes as required.

      For example, you may enter ``edit p_ip_manila-api`` from the
      :command:`crm configure` menu and edit the resource to match your
      preferred virtual IP address.

#. Add the following cluster resources:

   .. code-block:: none

      primitive p_manila-api ocf:openstack:manila-api \
        params config="/etc/manila/manila.conf" \
        os_password="secretsecret" \
        os_username="admin" \
        os_tenant_name="admin" \
        keystone_get_token_url="http://10.0.0.11:5000/v2.0/tokens" \
        op monitor interval="30s" timeout="30s"

   This configuration creates ``p_manila-api``, a resource for managing the
   Shared File Systems API service.

#. Commit your configuration changes by entering the following command
   from the :command:`crm configure` menu:

   .. code-block:: console

      # commit

Pacemaker now starts the Shared File Systems API service and its
dependent resources on one of your nodes.
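
You can check the state of the new resource from the cluster shell; for
example:

.. code-block:: console

   # crm status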

141 doc/source/storage-ha-image.rst Normal file
@ -0,0 +1,141 @@

==========================
Highly available Image API
==========================

The OpenStack Image service provides discovery, registration, and
retrieval services for virtual machine images. To make the OpenStack
Image API service highly available in active/passive mode, you must:

- :ref:`glance-api-pacemaker`
- :ref:`glance-api-configure`
- :ref:`glance-services`

Prerequisites
~~~~~~~~~~~~~

Before beginning, ensure that you are familiar with the
documentation for installing the OpenStack Image API service.
See the *Image service* section in the
`Installation Guides <https://docs.openstack.org/ocata/install>`_,
depending on your distribution.

.. _glance-api-pacemaker:

Add OpenStack Image API resource to Pacemaker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. Download the resource agent to your system:

   .. code-block:: console

      # cd /usr/lib/ocf/resource.d/openstack
      # wget https://git.openstack.org/cgit/openstack/openstack-resource-agents/plain/ocf/glance-api
      # chmod a+rx *

#. Add the Pacemaker configuration for the OpenStack Image API resource.
   Use the following command to connect to the Pacemaker cluster:

   .. code-block:: console

      # crm configure

   .. note::

      The :command:`crm configure` command supports batch input. Copy and
      paste the lines in the next step into your live Pacemaker configuration
      and then make changes as required.

      For example, you may enter ``edit p_ip_glance-api`` from the
      :command:`crm configure` menu and edit the resource to match your
      preferred virtual IP address.

#. Add the following cluster resources:

   .. code-block:: none

      primitive p_glance-api ocf:openstack:glance-api \
        params config="/etc/glance/glance-api.conf" \
        os_password="secretsecret" \
        os_username="admin" os_tenant_name="admin" \
        os_auth_url="http://10.0.0.11:5000/v2.0/" \
        op monitor interval="30s" timeout="30s"

   This configuration creates ``p_glance-api``, a resource for managing the
   OpenStack Image API service.

#. Commit your configuration changes by entering the following command from
   the :command:`crm configure` menu:

   .. code-block:: console

      # commit

Pacemaker then starts the OpenStack Image API service and its dependent
resources on one of your nodes.

.. _glance-api-configure:

Configure OpenStack Image service API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Edit the :file:`/etc/glance/glance-api.conf` file
to configure the OpenStack Image service:

.. code-block:: ini

   # We have to use a MySQL connection to store data:
   sql_connection = mysql://glance:password@10.0.0.11/glance
   # Alternatively, you can switch to pymysql,
   # a new Python 3 compatible library, and use
   # sql_connection = mysql+pymysql://glance:password@10.0.0.11/glance
   # to be ready when everything moves to Python 3.
   # Ref: https://wiki.openstack.org/wiki/PyMySQL_evaluation

   # We bind the OpenStack Image API to the VIP:
   bind_host = 10.0.0.11

   # Connect to the OpenStack Image registry service:
   registry_host = 10.0.0.11

   # We send notifications to the highly available RabbitMQ:
   notifier_strategy = rabbit
   rabbit_host = 10.0.0.11

[TODO: need more discussion of these parameters]

.. _glance-services:

Configure OpenStack services to use the highly available OpenStack Image API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Your OpenStack services must now point their OpenStack Image API configuration
to the highly available, virtual cluster IP address instead of pointing to the
physical IP address of an OpenStack Image API server as you would in a non-HA
cluster.

For example, if your OpenStack Image API service IP address is 10.0.0.11
(as in the configuration explained here), you would use the following
configuration in your :file:`nova.conf` file:

.. code-block:: ini

   [glance]
   # ...
   api_servers = 10.0.0.11
   # ...
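
After updating :file:`nova.conf`, restart the Compute services so that they
pick up the new setting. For example, on a RHEL-based system (service names
vary by distribution):

.. code-block:: console

   # systemctl restart openstack-nova-api
   # systemctl restart openstack-nova-compute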

You must also create the OpenStack Image API endpoint with this IP address.
If you are using both private and public IP addresses, create two virtual IP
addresses and define your endpoint. For example:

.. code-block:: console

   $ openstack endpoint create --region $KEYSTONE_REGION \
     image public http://PUBLIC_VIP:9292

   $ openstack endpoint create --region $KEYSTONE_REGION \
     image admin http://10.0.0.11:9292

   $ openstack endpoint create --region $KEYSTONE_REGION \
     image internal http://10.0.0.11:9292
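
Listing the ``image`` endpoints afterwards is again a quick way to confirm
the catalog entries:

.. code-block:: console

   $ openstack endpoint list --service image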

22 doc/source/storage-ha.rst Normal file
@ -0,0 +1,22 @@

===================
Configuring storage
===================

.. toctree::
   :maxdepth: 2

   storage-ha-image.rst
   storage-ha-block.rst
   storage-ha-file-systems.rst
   storage-ha-backend.rst

Making the Block Storage (cinder) API service highly available in
active/active mode involves:

* Configuring Block Storage to listen on the VIP address

* Managing the Block Storage API daemon with the Pacemaker cluster manager

* Configuring OpenStack services to use this IP address

.. To Do: HA without Pacemaker

6 doc/source/testing.rst Normal file
@ -0,0 +1,6 @@

=======
Testing
=======

27 setup.cfg Normal file
@ -0,0 +1,27 @@

[metadata]
name = openstackhaguide
summary = OpenStack High Availability Guide
author = OpenStack
author-email = openstack-dev@lists.openstack.org
home-page = https://docs.openstack.org/
classifier =
    Environment :: OpenStack
    Intended Audience :: Information Technology
    Intended Audience :: System Administrators
    License :: OSI Approved :: Apache Software License
    Operating System :: POSIX :: Linux
    Topic :: Documentation

[global]
setup-hooks =
    pbr.hooks.setup_hook

[files]

[build_sphinx]
warning-is-error = 1
build-dir = build
source-dir = source

[wheel]
universal = 1

30 setup.py Normal file
@ -0,0 +1,30 @@

#!/usr/bin/env python
# Copyright (c) 2013 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# THIS FILE IS MANAGED BY THE GLOBAL REQUIREMENTS REPO - DO NOT EDIT
import setuptools

# In python < 2.7.4, a lazy loading of package `pbr` will break
# setuptools if some other modules registered functions in `atexit`.
# solution from: http://bugs.python.org/issue15881#msg170215
try:
    import multiprocessing  # noqa
except ImportError:
    pass

setuptools.setup(
    setup_requires=['pbr'],
    pbr=True)

13 tox.ini
@ -14,3 +14,16 @@ deps =
commands =
  doc8 doc/source -e txt -e rst
  sphinx-build -E -W -b html doc/source doc/build/html

[doc8]
# Settings for doc8:
# Ignore target directories and autogenerated files
ignore-path = doc/*/target,doc/*/build*
# File extensions to use
extensions = .rst,.txt
# The maximal line length should be 79, but we have some overlong lines.
# Let's not go any further over.
max-line-length = 79
# Disable some doc8 checks:
# D000: Check RST validity (cannot handle the "linenos" directive)
ignore = D000
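
# With these settings in place, you can run the same checks locally that
# the docs job runs, using the commands defined above:
#
#   doc8 doc/source -e txt -e rst
#   sphinx-build -E -W -b html doc/source doc/build/html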