
import draft from openstack-manuals/doc/ha-guide-draft/

Move the draft of the restructured HA guide to this dedicated
repository, in order to focus reviews on a smaller, more specialist
audience and accelerate development.

Change-Id: I95a4b46fecaafafd1beb8314d1cf795b60fb17a8
Adam Spiers 5 months ago
commit 393604c6b7
38 changed files with 6919 additions and 5 deletions
  1. doc/common/app-support.rst (+230, -0)
  2. doc/common/appendix.rst (+8, -0)
  3. doc/common/conventions.rst (+47, -0)
  4. doc/common/glossary.rst (+4164, -0)
  5. doc/common/source/conf.py (+110, -0)
  6. doc/source/common (+1, -0)
  7. doc/source/compute-node-ha.rst (+55, -0)
  8. doc/source/conf.py (+233, -2)
  9. doc/source/control-plane-stateful.rst (+342, -0)
  10. doc/source/control-plane-stateless.rst (+518, -0)
  11. doc/source/control-plane.rst (+9, -0)
  12. doc/source/figures/Cluster-deployment-collapsed.png (BIN)
  13. doc/source/figures/Cluster-deployment-segregated.png (BIN)
  14. doc/source/ha-community.rst (+15, -0)
  15. doc/source/index.rst (+26, -3)
  16. doc/source/intro-ha-common-tech.rst (+127, -0)
  17. doc/source/intro-ha-key-concepts.rst (+147, -0)
  18. doc/source/intro-ha.rst (+24, -0)
  19. doc/source/intro-os-ha-cluster.rst (+67, -0)
  20. doc/source/intro-os-ha-memcached.rst (+35, -0)
  21. doc/source/intro-os-ha-state.rst (+52, -0)
  22. doc/source/intro-os-ha.rst (+12, -0)
  23. doc/source/monitoring.rst (+6, -0)
  24. doc/source/networking-ha-l3-agent.rst (+20, -0)
  25. doc/source/networking-ha-neutron-l3-analysis.rst (+6, -0)
  26. doc/source/networking-ha-neutron-server.rst (+5, -0)
  27. doc/source/networking-ha.rst (+29, -0)
  28. doc/source/overview.rst (+24, -0)
  29. doc/source/ref-arch-examples.rst (+3, -0)
  30. doc/source/storage-ha-backend.rst (+59, -0)
  31. doc/source/storage-ha-block.rst (+192, -0)
  32. doc/source/storage-ha-file-systems.rst (+114, -0)
  33. doc/source/storage-ha-image.rst (+141, -0)
  34. doc/source/storage-ha.rst (+22, -0)
  35. doc/source/testing.rst (+6, -0)
  36. setup.cfg (+27, -0)
  37. setup.py (+30, -0)
  38. tox.ini (+13, -0)

doc/common/app-support.rst (+230, -0)

@@ -0,0 +1,230 @@
1
+.. ## WARNING ##########################################################
2
+.. This file is synced from openstack/openstack-manuals repository to
3
+.. other related repositories. If you need to make changes to this file,
4
+.. make the changes in openstack-manuals. After any change is merged to
5
+.. openstack-manuals, a patch for the others will be proposed automatically.
6
+.. #####################################################################
7
+
8
+=================
9
+Community support
10
+=================
11
+
12
+The following resources are available to help you run and use OpenStack.
13
+The OpenStack community constantly improves and adds to the main
14
+features of OpenStack, but if you have any questions, do not hesitate to
15
+ask. Use the following resources to get OpenStack support and
16
+troubleshoot your installations.
17
+
18
+Documentation
19
+~~~~~~~~~~~~~
20
+
21
+For the available OpenStack documentation, see
22
+`docs.openstack.org <https://docs.openstack.org>`_.
23
+
24
+The following guides explain how to install a Proof-of-Concept OpenStack cloud
25
+and its associated components:
26
+
27
+* `Rocky Installation Guides <https://docs.openstack.org/rocky/install/>`_
28
+
29
+The following books explain how to configure and run an OpenStack cloud:
30
+
31
+*  `Architecture Design Guide <https://docs.openstack.org/arch-design/>`_
32
+
33
+*  `Rocky Administrator Guides <https://docs.openstack.org/rocky/admin/>`_
34
+
35
+*  `Rocky Configuration Guides <https://docs.openstack.org/rocky/configuration/>`_
36
+
37
+*  `Rocky Networking Guide <https://docs.openstack.org/neutron/rocky/admin/>`_
38
+
39
+*  `High Availability Guide <https://docs.openstack.org/ha-guide/>`_
40
+
41
+*  `Security Guide <https://docs.openstack.org/security-guide/>`_
42
+
43
+*  `Virtual Machine Image Guide <https://docs.openstack.org/image-guide/>`_
44
+
45
+The following book explains how to use the command-line clients:
46
+
47
+*  `Rocky API Bindings
48
+   <https://docs.openstack.org/rocky/language-bindings.html>`_
49
+
50
+The following documentation provides reference and guidance information
51
+for the OpenStack APIs:
52
+
53
+*  `API Documentation <https://developer.openstack.org/api-guide/quick-start/>`_
54
+
55
+The following guide provides information on how to contribute to OpenStack
56
+documentation:
57
+
58
+*  `Documentation Contributor Guide <https://docs.openstack.org/doc-contrib-guide/>`_
59
+
60
+ask.openstack.org
61
+~~~~~~~~~~~~~~~~~
62
+
63
+During the set up or testing of OpenStack, you might have questions
64
+about how a specific task is completed or be in a situation where a
65
+feature does not work correctly. Use the
66
+`ask.openstack.org <https://ask.openstack.org>`_ site to ask questions
67
+and get answers. When you visit the `Ask OpenStack
68
+<https://ask.openstack.org>`_ site, scan
69
+the recently asked questions to see whether your question has already
70
+been answered. If not, ask a new question. Be sure to give a clear,
71
+concise summary in the title and provide as much detail as possible in
72
+the description. Paste in your command output or stack traces, links to
73
+screen shots, and any other information which might be useful.
74
+
75
+The OpenStack wiki
76
+~~~~~~~~~~~~~~~~~~
77
+
78
+The `OpenStack wiki <https://wiki.openstack.org/>`_ contains a broad
79
+range of topics but some of the information can be difficult to find or
80
+is a few pages deep. Fortunately, the wiki search feature enables you to
81
+search by title or content. If you search for specific information, such
82
+as about networking or OpenStack Compute, you can find a large amount
83
+of relevant material. More is being added all the time, so be sure to
84
+check back often. You can find the search box in the upper-right corner
85
+of any OpenStack wiki page.
86
+
87
+The Launchpad bugs area
88
+~~~~~~~~~~~~~~~~~~~~~~~
89
+
90
+The OpenStack community values your set up and testing efforts and wants
91
+your feedback. To log a bug, you must `sign up for a Launchpad account
92
+<https://launchpad.net/+login>`_. You can view existing bugs and report bugs
93
+in the Launchpad Bugs area. Use the search feature to determine whether
94
+the bug has already been reported or already been fixed. If it still
95
+seems like your bug is unreported, fill out a bug report.
96
+
97
+Some tips:
98
+
99
+*  Give a clear, concise summary.
100
+
101
+*  Provide as much detail as possible in the description. Paste in your
102
+   command output or stack traces, links to screen shots, and any other
103
+   information which might be useful.
104
+
105
+*  Be sure to include the software and package versions that you are
106
+   using, especially if you are using a development branch, such as,
107
+   ``"Kilo release" vs git commit bc79c3ecc55929bac585d04a03475b72e06a3208``.
108
+
109
+*  Any deployment-specific information is helpful, such as whether you
110
+   are using Ubuntu 14.04 or are performing a multi-node installation.
111
+
112
+The following Launchpad Bugs areas are available:
113
+
114
+*  `Bugs: OpenStack Block Storage
115
+   (cinder) <https://bugs.launchpad.net/cinder>`_
116
+
117
+*  `Bugs: OpenStack Compute (nova) <https://bugs.launchpad.net/nova>`_
118
+
119
+*  `Bugs: OpenStack Dashboard
120
+   (horizon) <https://bugs.launchpad.net/horizon>`_
121
+
122
+*  `Bugs: OpenStack Identity
123
+   (keystone) <https://bugs.launchpad.net/keystone>`_
124
+
125
+*  `Bugs: OpenStack Image service
126
+   (glance) <https://bugs.launchpad.net/glance>`_
127
+
128
+*  `Bugs: OpenStack Networking
129
+   (neutron) <https://bugs.launchpad.net/neutron>`_
130
+
131
+*  `Bugs: OpenStack Object Storage
132
+   (swift) <https://bugs.launchpad.net/swift>`_
133
+
134
+*  `Bugs: Application catalog (murano) <https://bugs.launchpad.net/murano>`_
135
+
136
+*  `Bugs: Bare metal service (ironic) <https://bugs.launchpad.net/ironic>`_
137
+
138
+*  `Bugs: Clustering service (senlin) <https://bugs.launchpad.net/senlin>`_
139
+
140
+*  `Bugs: Container Infrastructure Management service (magnum) <https://bugs.launchpad.net/magnum>`_
141
+
142
+*  `Bugs: Data processing service
143
+   (sahara) <https://bugs.launchpad.net/sahara>`_
144
+
145
+*  `Bugs: Database service (trove) <https://bugs.launchpad.net/trove>`_
146
+
147
+*  `Bugs: DNS service (designate) <https://bugs.launchpad.net/designate>`_
148
+
149
+*  `Bugs: Key Manager Service (barbican) <https://bugs.launchpad.net/barbican>`_
150
+
151
+*  `Bugs: Monitoring (monasca) <https://bugs.launchpad.net/monasca>`_
152
+
153
+*  `Bugs: Orchestration (heat) <https://bugs.launchpad.net/heat>`_
154
+
155
+*  `Bugs: Rating (cloudkitty) <https://bugs.launchpad.net/cloudkitty>`_
156
+
157
+*  `Bugs: Shared file systems (manila) <https://bugs.launchpad.net/manila>`_
158
+
159
+*  `Bugs: Telemetry
160
+   (ceilometer) <https://bugs.launchpad.net/ceilometer>`_
161
+
162
+*  `Bugs: Telemetry v3
163
+   (gnocchi) <https://bugs.launchpad.net/gnocchi>`_
164
+
165
+*  `Bugs: Workflow service
166
+   (mistral) <https://bugs.launchpad.net/mistral>`_
167
+
168
+*  `Bugs: Messaging service
169
+   (zaqar) <https://bugs.launchpad.net/zaqar>`_
170
+
171
+*  `Bugs: Container service
172
+   (zun) <https://bugs.launchpad.net/zun>`_
173
+
174
+*  `Bugs: OpenStack API Documentation
175
+   (developer.openstack.org) <https://bugs.launchpad.net/openstack-api-site>`_
176
+
177
+*  `Bugs: OpenStack Documentation
178
+   (docs.openstack.org) <https://bugs.launchpad.net/openstack-manuals>`_
179
+
180
+Documentation feedback
181
+~~~~~~~~~~~~~~~~~~~~~~
182
+
183
+To provide feedback on documentation, join our IRC channel ``#openstack-doc``
184
+on the Freenode IRC network, or `report a bug in Launchpad
185
+<https://bugs.launchpad.net/openstack/+filebug>`_ and choose the particular
186
+project that the documentation is a part of.
187
+
188
+The OpenStack IRC channel
189
+~~~~~~~~~~~~~~~~~~~~~~~~~
190
+
191
+The OpenStack community lives in the #openstack IRC channel on the
192
+Freenode network. You can hang out, ask questions, or get immediate
193
+feedback for urgent and pressing issues. To install an IRC client or use
194
+a browser-based client, go to
195
+`https://webchat.freenode.net/ <https://webchat.freenode.net>`_. You can
196
+also use `Colloquy <http://colloquy.info/>`_ (Mac OS X),
197
+`mIRC <http://www.mirc.com/>`_ (Windows),
198
+or XChat (Linux). When you are in the IRC channel
199
+and want to share code or command output, the generally accepted method
200
+is to use a Paste Bin. The OpenStack project has one at `Paste
201
+<http://paste.openstack.org>`_. Just paste your longer amounts of text or
202
+logs in the web form and you get a URL that you can paste into the
203
+channel. The OpenStack IRC channel is ``#openstack`` on
204
+``irc.freenode.net``. You can find a list of all OpenStack IRC channels on
205
+the `IRC page on the wiki <https://wiki.openstack.org/wiki/IRC>`_.
206
+
207
+OpenStack mailing lists
208
+~~~~~~~~~~~~~~~~~~~~~~~
209
+
210
+A great way to get answers and insights is to post your question or
211
+problematic scenario to the OpenStack mailing list. You can learn from
212
+and help others who might have similar issues. To subscribe or view the
213
+archives, go to the `general OpenStack mailing list
214
+<http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack>`_. If you are
215
+interested in the other mailing lists for specific projects or development,
216
+refer to `Mailing Lists <https://wiki.openstack.org/wiki/Mailing_Lists>`_.
217
+
218
+OpenStack distribution packages
219
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
220
+
221
+The following Linux distributions provide community-supported packages
222
+for OpenStack:
223
+
224
+*  **CentOS, Fedora, and Red Hat Enterprise Linux:**
225
+   https://www.rdoproject.org/
226
+
227
+*  **openSUSE and SUSE Linux Enterprise Server:**
228
+   https://en.opensuse.org/Portal:OpenStack
229
+
230
+*  **Ubuntu:** https://wiki.ubuntu.com/OpenStack/CloudArchive

doc/common/appendix.rst (+8, -0)

@@ -0,0 +1,8 @@
1
+Appendix
2
+~~~~~~~~
3
+
4
+.. toctree::
5
+   :maxdepth: 1
6
+
7
+   app-support.rst
8
+   glossary.rst

doc/common/conventions.rst (+47, -0)

@@ -0,0 +1,47 @@
1
+.. ## WARNING ##########################################################
2
+.. This file is synced from openstack/openstack-manuals repository to
3
+.. other related repositories. If you need to make changes to this file,
4
+.. make the changes in openstack-manuals. After any change is merged to
5
+.. openstack-manuals, a patch for the others will be proposed automatically.
6
+.. #####################################################################
7
+
8
+===========
9
+Conventions
10
+===========
11
+
12
+The OpenStack documentation uses several typesetting conventions.
13
+
14
+Notices
15
+~~~~~~~
16
+
17
+Notices take these forms:
18
+
19
+.. note:: A comment with additional information that explains a part of the
20
+          text.
21
+
22
+.. important:: Something you must be aware of before proceeding.
23
+
24
+.. tip:: An extra but helpful piece of practical advice.
25
+
26
+.. caution:: Helpful information that prevents the user from making mistakes.
27
+
28
+.. warning:: Critical information about the risk of data loss or security
29
+             issues.
30
+
31
+Command prompts
32
+~~~~~~~~~~~~~~~
33
+
34
+.. code-block:: console
35
+
36
+   $ command
37
+
38
+Any user, including the ``root`` user, can run commands that are
39
+prefixed with the ``$`` prompt.
40
+
41
+.. code-block:: console
42
+
43
+   # command
44
+
45
+The ``root`` user must run commands that are prefixed with the ``#``
46
+prompt. You can also prefix these commands with the :command:`sudo`
47
+command, if available, to run them.

doc/common/glossary.rst (+4164, -0): file diff suppressed because it is too large


doc/common/source/conf.py (+110, -0)

@@ -0,0 +1,110 @@
1
+# Licensed under the Apache License, Version 2.0 (the "License");
2
+# you may not use this file except in compliance with the License.
3
+# You may obtain a copy of the License at
4
+#
5
+#    http://www.apache.org/licenses/LICENSE-2.0
6
+#
7
+# Unless required by applicable law or agreed to in writing, software
8
+# distributed under the License is distributed on an "AS IS" BASIS,
9
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
10
+# implied.
11
+# See the License for the specific language governing permissions and
12
+# limitations under the License.
13
+
14
+# This file is execfile()d with the current directory set to its
15
+# containing dir.
16
+#
17
+# Note that not all possible configuration values are present in this
18
+# autogenerated file.
19
+#
20
+# All configuration values have a default; values that are commented out
21
+# serve to show the default.
22
+
23
+import os
24
+# import sys
25
+
26
+
27
+# If extensions (or modules to document with autodoc) are in another directory,
28
+# add these directories to sys.path here. If the directory is relative to the
29
+# documentation root, use os.path.abspath to make it absolute, like shown here.
30
+# sys.path.insert(0, os.path.abspath('.'))
31
+
32
+# -- General configuration ------------------------------------------------
33
+
34
+# If your documentation needs a minimal Sphinx version, state it here.
35
+# needs_sphinx = '1.0'
36
+
37
+# Add any Sphinx extension module names here, as strings. They can be
38
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
39
+# ones.
40
+extensions = ['openstackdocstheme']
41
+
42
+# Add any paths that contain templates here, relative to this directory.
43
+# templates_path = ['_templates']
44
+
45
+# The suffix of source filenames.
46
+source_suffix = '.rst'
47
+
48
+# The encoding of source files.
49
+# source_encoding = 'utf-8-sig'
50
+
51
+# The master toctree document.
52
+master_doc = 'index'
53
+
54
+# General information about the project.
55
+repository_name = "openstack/openstack-manuals"
56
+bug_project = 'openstack-manuals'
57
+project = u'Common documents'
58
+bug_tag = u'common'
59
+
60
+copyright = u'2015-2018, OpenStack contributors'
61
+
62
+# The version info for the project you're documenting, acts as replacement for
63
+# |version| and |release|, also used in various other places throughout the
64
+# built documents.
65
+#
66
+# The short X.Y version.
67
+version = ''
68
+# The full version, including alpha/beta/rc tags.
69
+release = ''
70
+
71
+# The language for content autogenerated by Sphinx. Refer to documentation
72
+# for a list of supported languages.
73
+# language = None
74
+
75
+# There are two options for replacing |today|: either, you set today to some
76
+# non-false value, then it is used:
77
+# today = ''
78
+# Else, today_fmt is used as the format for a strftime call.
79
+# today_fmt = '%B %d, %Y'
80
+
81
+# List of patterns, relative to source directory, that match files and
82
+# directories to ignore when looking for source files.
83
+exclude_patterns = []
84
+
85
+# The reST default role (used for this markup: `text`) to use for all
86
+# documents.
87
+# default_role = None
88
+
89
+# If true, '()' will be appended to :func: etc. cross-reference text.
90
+# add_function_parentheses = True
91
+
92
+# If true, the current module name will be prepended to all description
93
+# unit titles (such as .. function::).
94
+# add_module_names = True
95
+
96
+# If true, sectionauthor and moduleauthor directives will be shown in the
97
+# output. They are ignored by default.
98
+# show_authors = False
99
+
100
+# The name of the Pygments (syntax highlighting) style to use.
101
+pygments_style = 'sphinx'
102
+
103
+# A list of ignored prefixes for module index sorting.
104
+# modindex_common_prefix = []
105
+
106
+# If true, keep warnings as "system message" paragraphs in the built documents.
107
+# keep_warnings = False
108
+
109
+# -- Options for Internationalization output ------------------------------
110
+locale_dirs = ['locale/']

doc/source/common (+1, -0)

@@ -0,0 +1 @@
1
+../common

doc/source/compute-node-ha.rst (+55, -0)

@@ -0,0 +1,55 @@
1
+============================
2
+Configuring the compute node
3
+============================
4
+
5
+The `Installation Guides
6
+<https://docs.openstack.org/ocata/install/>`_
7
+provide instructions for installing multiple compute nodes.
8
+To make the compute nodes highly available, you must configure the
9
+environment to include multiple instances of the API and other services.
10
+
11
+Configuring high availability for instances
12
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
13
+
14
+As of September 2016, the OpenStack High Availability community is
15
+designing and developing an official and unified way to provide high
16
+availability for instances. We are developing automatic
17
+recovery from failures of hardware or hypervisor-related software on
18
+the compute node, or other failures that could prevent instances from
19
+functioning correctly, such as issues with a cinder volume I/O path.
20
+
21
+More details are available in the `user story
22
+<http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html>`_
23
+co-authored by OpenStack's HA community and `Product Working Group
24
+<https://wiki.openstack.org/wiki/ProductTeam>`_ (PWG), where this feature is
25
+identified as missing functionality in OpenStack, which
26
+should be addressed with high priority.
27
+
28
+Existing solutions
29
+~~~~~~~~~~~~~~~~~~
30
+
31
+The architectural challenges of instance HA and several currently
32
+existing solutions were presented in `a talk at the Austin summit
33
+<https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation>`_,
34
+for which `slides are also available <http://aspiers.github.io/openstack-summit-2016-austin-compute-ha/>`_.
35
+
36
+The code for three of these solutions can be found online at the following
37
+links:
38
+
39
+* `a mistral-based auto-recovery workflow
40
+  <https://github.com/gryf/mistral-evacuate>`_, by Intel
41
+* `masakari <https://launchpad.net/masakari>`_, by NTT
42
+* `OCF RAs
43
+  <http://aspiers.github.io/openstack-summit-2016-austin-compute-ha/#/ocf-pros-cons>`_,
44
+  as used by Red Hat and SUSE
45
+
46
+Current upstream work
47
+~~~~~~~~~~~~~~~~~~~~~
48
+
49
+Work is in progress on a unified approach, which combines the best
50
+aspects of existing upstream solutions. More details are available on
51
+`the HA VMs user story wiki
52
+<https://wiki.openstack.org/wiki/ProductTeam/User_Stories/HA_VMs>`_.
53
+
54
+To get involved with this work, see the section on the
55
+:doc:`ha-community`.

doc/source/conf.py (+233, -2)

@@ -1,3 +1,16 @@
1
+# Licensed under the Apache License, Version 2.0 (the "License");
2
+# you may not use this file except in compliance with the License.
3
+# You may obtain a copy of the License at
4
+#
5
+#    http://www.apache.org/licenses/LICENSE-2.0
6
+#
7
+# Unless required by applicable law or agreed to in writing, software
8
+# distributed under the License is distributed on an "AS IS" BASIS,
9
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
10
+# implied.
11
+# See the License for the specific language governing permissions and
12
+# limitations under the License.
13
+
1 14
 # This file is execfile()d with the current directory set to its
2 15
 # containing dir.
3 16
 #
@@ -8,8 +21,7 @@
8 21
 # serve to show the default.
9 22
 
10 23
 import os
11
-
12
-import openstackdocstheme
24
+# import sys
13 25
 
14 26
 # If extensions (or modules to document with autodoc) are in another directory,
15 27
 # add these directories to sys.path here. If the directory is relative to the
@@ -26,6 +38,15 @@ import openstackdocstheme
26 38
 # ones.
27 39
 extensions = ['openstackdocstheme']
28 40
 
41
+# Add any paths that contain templates here, relative to this directory.
42
+# templates_path = ['_templates']
43
+
44
+# The suffix of source filenames.
45
+source_suffix = '.rst'
46
+
47
+# The encoding of source files.
48
+# source_encoding = 'utf-8-sig'
49
+
29 50
 # The master toctree document.
30 51
 master_doc = 'index'
31 52
 
@@ -36,12 +57,97 @@ project = u'High Availability Guide'
36 57
 bug_tag = u'ha-guide'
37 58
 copyright = u'2016-present, OpenStack contributors'
38 59
 
60
+# The version info for the project you're documenting, acts as replacement for
61
+# |version| and |release|, also used in various other places throughout the
62
+# built documents.
63
+#
64
+# The short X.Y version.
65
+version = ''
66
+# The full version, including alpha/beta/rc tags.
67
+release = ''
68
+
69
+# The language for content autogenerated by Sphinx. Refer to documentation
70
+# for a list of supported languages.
71
+# language = None
72
+
73
+# There are two options for replacing |today|: either, you set today to some
74
+# non-false value, then it is used:
75
+# today = ''
76
+# Else, today_fmt is used as the format for a strftime call.
77
+# today_fmt = '%B %d, %Y'
78
+
79
+# List of patterns, relative to source directory, that match files and
80
+# directories to ignore when looking for source files.
81
+exclude_patterns = ['common/cli*', 'common/nova*',
82
+                    'common/get-started*', 'common/dashboard*']
83
+
84
+# The reST default role (used for this markup: `text`) to use for all
85
+# documents.
86
+# default_role = None
87
+
88
+# If true, '()' will be appended to :func: etc. cross-reference text.
89
+# add_function_parentheses = True
90
+
91
+# If true, the current module name will be prepended to all description
92
+# unit titles (such as .. function::).
93
+# add_module_names = True
94
+
95
+# If true, sectionauthor and moduleauthor directives will be shown in the
96
+# output. They are ignored by default.
97
+# show_authors = False
98
+
99
+# The name of the Pygments (syntax highlighting) style to use.
100
+pygments_style = 'sphinx'
101
+
102
+# A list of ignored prefixes for module index sorting.
103
+# modindex_common_prefix = []
104
+
105
+# If true, keep warnings as "system message" paragraphs in the built documents.
106
+# keep_warnings = False
107
+
108
+
39 109
 # -- Options for HTML output ----------------------------------------------
40 110
 
41 111
 # The theme to use for HTML and HTML Help pages.  See the documentation for
42 112
 # a list of builtin themes.
43 113
 html_theme = 'openstackdocs'
44 114
 
115
+# Theme options are theme-specific and customize the look and feel of a theme
116
+# further.  For a list of options available for each theme, see the
117
+# documentation.
118
+html_theme_options = {
119
+    'display_badge': False
120
+}
121
+
122
+# Add any paths that contain custom themes here, relative to this directory.
123
+# html_theme_path = [openstackdocstheme.get_html_theme_path()]
124
+
125
+# The name for this set of Sphinx documents.  If None, it defaults to
126
+# "<project> v<release> documentation".
127
+# html_title = None
128
+
129
+# A shorter title for the navigation bar.  Default is the same as html_title.
130
+# html_short_title = None
131
+
132
+# The name of an image file (relative to this directory) to place at the top
133
+# of the sidebar.
134
+# html_logo = None
135
+
136
+# The name of an image file (within the static path) to use as favicon of the
137
+# docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
138
+# pixels large.
139
+# html_favicon = None
140
+
141
+# Add any paths that contain custom static files (such as style sheets) here,
142
+# relative to this directory. They are copied after the builtin static files,
143
+# so a file named "default.css" will overwrite the builtin "default.css".
144
+# html_static_path = []
145
+
146
+# Add any extra paths that contain custom files (such as robots.txt or
147
+# .htaccess) here, relative to this directory. These files are copied
148
+# directly to the root of the documentation.
149
+# html_extra_path = []
150
+
45 151
 # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
46 152
 # using the given strftime format.
47 153
 # So that we can enable "log-a-bug" links from each output HTML page, this
@@ -49,6 +155,73 @@ html_theme = 'openstackdocs'
49 155
 # minutes.
50 156
 html_last_updated_fmt = '%Y-%m-%d %H:%M'
51 157
 
158
+# If true, SmartyPants will be used to convert quotes and dashes to
159
+# typographically correct entities.
160
+# html_use_smartypants = True
161
+
162
+# Custom sidebar templates, maps document names to template names.
163
+# html_sidebars = {}
164
+
165
+# Additional templates that should be rendered to pages, maps page names to
166
+# template names.
167
+# html_additional_pages = {}
168
+
169
+# If false, no module index is generated.
170
+# html_domain_indices = True
171
+
172
+# If false, no index is generated.
173
+html_use_index = False
174
+
175
+# If true, the index is split into individual pages for each letter.
176
+# html_split_index = False
177
+
178
+# If true, links to the reST sources are added to the pages.
179
+html_show_sourcelink = False
180
+
181
+# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
182
+# html_show_sphinx = True
183
+
184
+# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
185
+# html_show_copyright = True
186
+
187
+# If true, an OpenSearch description file will be output, and all pages will
188
+# contain a <link> tag referring to it.  The value of this option must be the
189
+# base URL from which the finished HTML is served.
190
+# html_use_opensearch = ''
191
+
192
+# This is the file name suffix for HTML files (e.g. ".xhtml").
193
+# html_file_suffix = None
194
+
195
+# Output file base name for HTML help builder.
196
+htmlhelp_basename = 'ha-guide'
197
+
198
+# If true, publish source files
199
+html_copy_source = False
200
+
201
+# -- Options for LaTeX output ---------------------------------------------
202
+
203
+latex_engine = 'xelatex'
204
+
205
+latex_elements = {
206
+    # The paper size ('letterpaper' or 'a4paper').
207
+    # 'papersize': 'letterpaper',
208
+
209
+    # set font (TODO: different fonts for translated PDF document builds)
210
+    'fontenc': '\\usepackage{fontspec}',
211
+    'fontpkg': '''\
212
+\defaultfontfeatures{Scale=MatchLowercase}
213
+\setmainfont{Liberation Serif}
214
+\setsansfont{Liberation Sans}
215
+\setmonofont[SmallCapsFont={Liberation Mono}]{Liberation Mono}
216
+''',
217
+
218
+    # The font size ('10pt', '11pt' or '12pt').
219
+    # 'pointsize': '10pt',
220
+
221
+    # Additional stuff for the LaTeX preamble.
222
+    # 'preamble': '',
223
+}
224
+
52 225
 # Grouping the document tree into LaTeX files. List of tuples
53 226
 # (source start file, target name, title,
54 227
 #  author, documentclass [howto, manual, or own class]).
@@ -57,5 +230,63 @@ latex_documents = [
57 230
      u'OpenStack contributors', 'manual'),
58 231
 ]
59 232
 
233
+# The name of an image file (relative to this directory) to place at the top of
234
+# the title page.
235
+# latex_logo = None
236
+
237
+# For "manual" documents, if this is true, then toplevel headings are parts,
238
+# not chapters.
239
+# latex_use_parts = False
240
+
241
+# If true, show page references after internal links.
242
+# latex_show_pagerefs = False
243
+
244
+# If true, show URL addresses after external links.
245
+# latex_show_urls = False
246
+
247
+# Documents to append as an appendix to all manuals.
248
+# latex_appendices = []
249
+
250
+# If false, no module index is generated.
251
+# latex_domain_indices = True
252
+
253
+
254
+# -- Options for manual page output ---------------------------------------
255
+
256
+# One entry per manual page. List of tuples
257
+# (source start file, name, description, authors, manual section).
258
+man_pages = [
259
+    ('index', 'haguide', u'High Availability Guide',
260
+     [u'OpenStack contributors'], 1)
261
+]
262
+
263
+# If true, show URL addresses after external links.
264
+# man_show_urls = False
265
+
266
+
267
+# -- Options for Texinfo output -------------------------------------------
268
+
269
+# Grouping the document tree into Texinfo files. List of tuples
270
+# (source start file, target name, title, author,
271
+#  dir menu entry, description, category)
272
+texinfo_documents = [
273
+    ('index', 'HAGuide', u'High Availability Guide',
274
+     u'OpenStack contributors', 'HAGuide',
275
+     'This guide shows OpenStack operators and deployers how to configure '
276
+     'OpenStack to be robust and fault-tolerant.', 'Miscellaneous'),
277
+]
278
+
279
+# Documents to append as an appendix to all manuals.
280
+# texinfo_appendices = []
281
+
282
+# If false, no module index is generated.
283
+# texinfo_domain_indices = True
284
+
285
+# How to display URL addresses: 'footnote', 'no', or 'inline'.
286
+# texinfo_show_urls = 'footnote'
287
+
288
+# If true, do not generate a @detailmenu in the "Top" node's menu.
289
+# texinfo_no_detailmenu = False
290
+
60 291
 # -- Options for Internationalization output ------------------------------
61 292
 locale_dirs = ['locale/']

doc/source/control-plane-stateful.rst (+342, -0)

@@ -0,0 +1,342 @@
1
+=================================
2
+Configuring the stateful services
3
+=================================
4
+.. to do: scope how in depth we want these sections to be
5
+
6
+Database for high availability
7
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8
+
9
+Galera
10
+------
11
+
12
+The first step is to install the database that sits at the heart of the
13
+cluster. To implement high availability, run an instance of the database on
14
+each controller node and use Galera Cluster to provide replication between
15
+them. Galera Cluster is a synchronous multi-master database cluster, based
16
+on MySQL and the InnoDB storage engine. It is a high-availability service
17
+that provides high system uptime, no data loss, and scalability for growth.
18
+
19
+You can achieve high availability for the OpenStack database in many
20
+different ways, depending on the type of database that you want to use.
21
+There are three implementations of Galera Cluster available to you:
22
+
23
+- `Galera Cluster for MySQL <http://galeracluster.com/>`_: The MySQL
24
+  reference implementation from Codership, Oy.
25
+- `MariaDB Galera Cluster <https://mariadb.org/>`_: The MariaDB
26
+  implementation of Galera Cluster, which is commonly supported in
27
+  environments based on Red Hat distributions.
28
+- `Percona XtraDB Cluster <https://www.percona.com/>`_: The XtraDB
29
+  implementation of Galera Cluster from Percona.
30
+
31
+In addition to Galera Cluster, you can also achieve high availability
32
+through other database options, such as PostgreSQL, which has its own
33
+replication system.
34
+
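For illustration only, the Galera-specific part of the database configuration on
each controller node often boils down to a handful of ``wsrep`` options. This is a
minimal sketch assuming MariaDB Galera Cluster; the provider path, cluster name,
and node addresses are placeholders to adapt to your environment:

.. code-block:: ini

   [mysqld]
   # Galera requires row-based replication and InnoDB.
   binlog_format = ROW
   default_storage_engine = InnoDB
   innodb_autoinc_lock_mode = 2

   # Replication settings; list every controller node in the cluster address.
   wsrep_on = ON
   wsrep_provider = /usr/lib64/galera/libgalera_smm.so
   wsrep_cluster_name = openstack_db_cluster
   wsrep_cluster_address = gcomm://10.0.0.12,10.0.0.13,10.0.0.14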
35
+Pacemaker active/passive with HAProxy
36
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
37
+
38
+Replicated storage
39
+------------------
40
+
41
+For example: DRBD
42
+
43
+Shared storage
44
+--------------
45
+
46
+Messaging service for high availability
47
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
48
+
49
+RabbitMQ
50
+--------
51
+
52
+An AMQP (Advanced Message Queuing Protocol) compliant message bus is
53
+required for most OpenStack components in order to coordinate the
54
+execution of jobs entered into the system.
55
+
56
+The most popular AMQP implementation used in OpenStack installations
57
+is RabbitMQ.
58
+
59
+RabbitMQ nodes fail over on the application and the infrastructure layers.
60
+
61
+The application layer is controlled by the ``oslo.messaging``
62
+configuration options for multiple AMQP hosts. If the AMQP node fails,
63
+the application reconnects to the next one configured within the
64
+specified reconnect interval. The specified reconnect interval
65
+constitutes its SLA.
66
+
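For example, on recent releases the ``oslo.messaging`` options reduce to a single
``transport_url`` that lists every RabbitMQ cluster member, so the client can fail
over between them (older releases use the ``rabbit_hosts``-style options shown
later in this guide; the host names and credentials here are placeholders):

.. code-block:: ini

   [DEFAULT]
   transport_url = rabbit://openstack:RABBIT_PASS@rabbit1:5672,openstack:RABBIT_PASS@rabbit2:5672,openstack:RABBIT_PASS@rabbit3:5672/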
67
+On the infrastructure layer, the SLA is the time it takes for the RabbitMQ
68
+cluster to reassemble. Several cases are possible. The Mnesia keeper
69
+node is the master of the corresponding Pacemaker resource for
70
+RabbitMQ. When it fails, the result is a full AMQP cluster downtime
71
+interval. Normally, its SLA is no more than several minutes. Failure
72
+of another node that is a slave of the corresponding Pacemaker
73
+resource for RabbitMQ results in no AMQP cluster downtime at all.
74
+
75
+.. until we've determined the content depth, I've transferred RabbitMQ
76
+   configuration below from the old HA guide (darrenc)
77
+
78
+Making the RabbitMQ service highly available involves the following steps:
79
+
80
+- :ref:`Install RabbitMQ<rabbitmq-install>`
81
+
82
+- :ref:`Configure RabbitMQ for HA queues<rabbitmq-configure>`
83
+
84
+- :ref:`Configure OpenStack services to use RabbitMQ HA queues
85
+  <rabbitmq-services>`
86
+
87
+.. note::
88
+
89
+   Access to RabbitMQ is not normally handled by HAProxy. Instead,
90
+   consumers must be supplied with the full list of hosts running
91
+   RabbitMQ with ``rabbit_hosts`` and turn on the ``rabbit_ha_queues``
92
+   option. For more information, read the `core issue
93
+   <http://people.redhat.com/jeckersb/private/vip-failover-tcp-persist.html>`_.
94
+   For more detail, read the `history and solution
95
+   <http://john.eckersberg.com/improving-ha-failures-with-tcp-timeouts.html>`_.
96
+
97
+.. _rabbitmq-install:
98
+
99
+Install RabbitMQ
100
+^^^^^^^^^^^^^^^^
101
+
102
+The commands for installing RabbitMQ are specific to the Linux distribution
103
+you are using.
104
+
105
+For Ubuntu or Debian:
106
+
107
+.. code-block:: console
108
+
109
+   # apt-get install rabbitmq-server
110
+
111
+For RHEL, Fedora, or CentOS:
112
+
113
+.. code-block:: console
114
+
115
+   # yum install rabbitmq-server
116
+
117
+For openSUSE:
118
+
119
+.. code-block:: console
120
+
121
+   # zypper install rabbitmq-server
122
+
123
+For SLES 12:
124
+
125
+.. code-block:: console
126
+
127
+   # zypper addrepo -f obs://Cloud:OpenStack:Kilo/SLE_12 Kilo
128
+   [Verify the fingerprint of the imported GPG key. See below.]
129
+   # zypper install rabbitmq-server
130
+
131
+.. note::
132
+
133
+   For SLES 12, the packages are signed by GPG key 893A90DAD85F9316.
134
+   You should verify the fingerprint of the imported GPG key before using it.
135
+
136
+   .. code-block:: none
137
+
138
+      Key ID: 893A90DAD85F9316
139
+      Key Name: Cloud:OpenStack OBS Project <Cloud:OpenStack@build.opensuse.org>
140
+      Key Fingerprint: 35B34E18ABC1076D66D5A86B893A90DAD85F9316
141
+      Key Created: Tue Oct  8 13:34:21 2013
142
+      Key Expires: Thu Dec 17 13:34:21 2015
143
+
144
+For more information, see the official installation manual for the
145
+distribution:
146
+
147
+- `Debian and Ubuntu <https://www.rabbitmq.com/install-debian.html>`_
148
+- `RPM based <https://www.rabbitmq.com/install-rpm.html>`_
149
+  (RHEL, Fedora, CentOS, openSUSE)
150
+
151
+.. _rabbitmq-configure:
152
+
153
+Configure RabbitMQ for HA queues
154
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
155
+
156
+.. [TODO: This section should begin with a brief mention
157
+.. about what HA queues are and why they are valuable, etc]
158
+
159
+.. [TODO: replace "currently" with specific release names]
160
+
161
+.. [TODO: Does this list need to be updated? Perhaps we need a table
162
+.. that shows each component and the earliest release that allows it
163
+.. to work with HA queues.]
164
+
165
+The following components/services can work with HA queues:
166
+
167
+- OpenStack Compute
168
+- OpenStack Block Storage
169
+- OpenStack Networking
170
+- Telemetry
171
+
172
+Consider that, while exchanges and bindings survive the loss of individual
173
+nodes, queues and their messages do not because a queue and its contents
174
+are located on one node. If we lose this node, we also lose the queue.
175
+
176
+Mirrored queues in RabbitMQ improve the availability of the service, since
177
+they are resilient to failures.
178
+
179
+Production servers should run at least three RabbitMQ servers; for testing
180
+and demonstration purposes, it is possible to run only two servers.
181
+In this section, we configure two nodes, called ``rabbit1`` and ``rabbit2``.
182
+To build a broker, ensure that all nodes have the same Erlang cookie file.
183
+
184
+.. [TODO: Should the example instead use a minimum of three nodes?]
185
+
186
+#. Stop RabbitMQ and copy the cookie from the first node to each of the
187
+   other node(s):
188
+
189
+   .. code-block:: console
190
+
191
+      # scp /var/lib/rabbitmq/.erlang.cookie root@NODE:/var/lib/rabbitmq/.erlang.cookie
192
+
193
+#. On each target node, verify the correct owner,
194
+   group, and permissions of the file :file:`erlang.cookie`:
195
+
196
+   .. code-block:: console
197
+
198
+      # chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
199
+      # chmod 400 /var/lib/rabbitmq/.erlang.cookie
200
+
201
+#. Start the message queue service on all nodes and configure it to start
202
+   when the system boots. On Ubuntu, it is configured by default.
203
+
204
+   On CentOS, RHEL, openSUSE, and SLES:
205
+
206
+   .. code-block:: console
207
+
208
+      # systemctl enable rabbitmq-server.service
209
+      # systemctl start rabbitmq-server.service
210
+
211
+#. Verify that the nodes are running:
212
+
213
+   .. code-block:: console
214
+
215
+      # rabbitmqctl cluster_status
216
+      Cluster status of node rabbit@NODE...
217
+      [{nodes,[{disc,[rabbit@NODE]}]},
218
+       {running_nodes,[rabbit@NODE]},
219
+       {partitions,[]}]
220
+      ...done.
221
+
222
+#. Run the following commands on each node except the first one:
223
+
224
+   .. code-block:: console
225
+
226
+      # rabbitmqctl stop_app
227
+      Stopping node rabbit@NODE...
228
+      ...done.
229
+      # rabbitmqctl join_cluster --ram rabbit@rabbit1
230
+      # rabbitmqctl start_app
231
+      Starting node rabbit@NODE ...
232
+      ...done.
233
+
234
+.. note::
235
+
236
+   The default node type is a disc node. In this guide, nodes
237
+   join the cluster as RAM nodes.
238
+
239
+#. Verify the cluster status:
240
+
241
+   .. code-block:: console
242
+
243
+      # rabbitmqctl cluster_status
244
+      Cluster status of node rabbit@NODE...
245
+      [{nodes,[{disc,[rabbit@rabbit1]},{ram,[rabbit@NODE]}]}, \
246
+          {running_nodes,[rabbit@NODE,rabbit@rabbit1]}]
247
+
248
+   If the cluster is working, you can create usernames and passwords
249
+   for the queues.
250
+
251
+#. To ensure that all queues except those with auto-generated names
252
+   are mirrored across all running nodes,
253
+   set the ``ha-mode`` policy key to all
254
+   by running the following command on one of the nodes:
255
+
256
+   .. code-block:: console
257
+
258
+      # rabbitmqctl set_policy ha-all '^(?!amq\.).*' '{"ha-mode": "all"}'
259
+
260
+More information is available in the RabbitMQ documentation:
261
+
262
+- `Highly Available Queues <https://www.rabbitmq.com/ha.html>`_
263
+- `Clustering Guide <https://www.rabbitmq.com/clustering.html>`_
264
+
265
+.. note::
266
+
267
+   As another option to make RabbitMQ highly available, RabbitMQ has shipped the
268
+   OCF scripts for the Pacemaker cluster resource agents since version 3.5.7.
269
+   It provides the active/active RabbitMQ cluster with mirrored queues.
270
+   For more information, see `Auto-configuration of a cluster with
271
+   a Pacemaker <https://www.rabbitmq.com/pacemaker.html>`_.
272
+
273
+.. _rabbitmq-services:
274
+
275
+Configure OpenStack services to use Rabbit HA queues
276
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
277
+
278
+Configure the OpenStack components to use at least two RabbitMQ nodes.
279
+
280
+Use these steps to configure all services that use RabbitMQ:
281
+
282
+#. RabbitMQ HA cluster ``host:port`` pairs:
283
+
284
+   .. code-block:: console
285
+
286
+      rabbit_hosts=rabbit1:5672,rabbit2:5672,rabbit3:5672
287
+
288
+#. Retry connecting with RabbitMQ:
289
+
290
+   .. code-block:: console
291
+
292
+      rabbit_retry_interval=1
293
+
294
+#. How long to back-off for between retries when connecting to RabbitMQ:
295
+
296
+   .. code-block:: console
297
+
298
+      rabbit_retry_backoff=2
299
+
300
+#. Maximum retries with trying to connect to RabbitMQ (infinite by default):
301
+
302
+   .. code-block:: console
303
+
304
+      rabbit_max_retries=0
305
+
306
+#. Use durable queues in RabbitMQ:
307
+
308
+   .. code-block:: console
309
+
310
+      rabbit_durable_queues=true
311
+
312
+#. Use HA queues in RabbitMQ (``x-ha-policy: all``):
313
+
314
+   .. code-block:: console
315
+
316
+      rabbit_ha_queues=true
317
+
318
+.. note::
319
+
320
+   If you change the configuration from an old set-up
321
+   that did not use HA queues, restart the service:
322
+
323
+   .. code-block:: console
324
+
325
+      # rabbitmqctl stop_app
326
+      # rabbitmqctl reset
327
+      # rabbitmqctl start_app
328
+
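Taken together, these settings might look as follows in a service's configuration
file. This is only a sketch: depending on the release, the options live in
``[DEFAULT]`` or ``[oslo_messaging_rabbit]``, and the host names are examples:

.. code-block:: ini

   [oslo_messaging_rabbit]
   rabbit_hosts = rabbit1:5672,rabbit2:5672,rabbit3:5672
   rabbit_retry_interval = 1
   rabbit_retry_backoff = 2
   rabbit_max_retries = 0
   rabbit_durable_queues = true
   rabbit_ha_queues = true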
329
+
330
+
331
+
332
+
333
+Pacemaker active/passive
334
+------------------------
335
+
336
+
337
+
338
+Mirrored queues
339
+---------------
340
+
341
+Qpid
342
+----

doc/source/control-plane-stateless.rst (+518, -0)

@@ -0,0 +1,518 @@
1
+==============================
2
+Configuring stateless services
3
+==============================
4
+
5
+.. to do: scope what details we want on the following services
6
+
7
+API services
8
+~~~~~~~~~~~~
9
+
10
+Load-balancer
11
+~~~~~~~~~~~~~
12
+
13
+HAProxy
14
+-------
15
+
16
+HAProxy provides a fast and reliable HTTP reverse proxy and load balancer
17
+for TCP or HTTP applications. It is particularly suited for web sites
18
+under very high load that need persistence or Layer 7 processing.
19
+It realistically supports tens of thousands of connections with recent
20
+hardware.
21
+
22
+Each instance of HAProxy configures its front end to accept connections only
23
+to the virtual IP (VIP) address. The HAProxy back end (termination
24
+point) is a list of all the IP addresses of instances for load balancing.
25
+
26
+.. note::
27
+
28
+   To ensure that your HAProxy installation is not a single point of
29
+   failure, it is advisable to run multiple HAProxy instances.
30
+
31
+   You can also ensure the availability by other means, using Keepalived
32
+   or Pacemaker.
33
+
34
+Alternatively, you can use a commercial load balancer, either hardware-based
35
+or software-based. We recommend a hardware load balancer, as it generally
36
+offers better performance.
37
+
38
+For detailed instructions about installing HAProxy on your nodes,
39
+see the HAProxy `official documentation <http://www.haproxy.org/#docs>`_.
40
+
41
+Configuring HAProxy
42
+^^^^^^^^^^^^^^^^^^^
43
+
44
+#. Restart the HAProxy service.
45
+
46
+#. Locate your HAProxy instance on each OpenStack controller in your
47
+   environment. The following is an example ``/etc/haproxy/haproxy.cfg``
48
+   configuration file. Configure your instance using this file;
49
+   you will need a copy of it on each
50
+   controller node.
51
+
52
+
53
+   .. code-block:: none
54
+
55
+        global
56
+         chroot  /var/lib/haproxy
57
+         daemon
58
+         group  haproxy
59
+         maxconn  4000
60
+         pidfile  /var/run/haproxy.pid
61
+         user  haproxy
62
+
63
+       defaults
64
+         log  global
65
+         maxconn  4000
66
+         option  redispatch
67
+         retries  3
68
+         timeout  http-request 10s
69
+         timeout  queue 1m
70
+         timeout  connect 10s
71
+         timeout  client 1m
72
+         timeout  server 1m
73
+         timeout  check 10s
74
+
75
+        listen dashboard_cluster
76
+         bind <Virtual IP>:443
77
+         balance  source
78
+         option  tcpka
79
+         option  httpchk
80
+         option  tcplog
81
+         server controller1 10.0.0.12:443 check inter 2000 rise 2 fall 5
82
+         server controller2 10.0.0.13:443 check inter 2000 rise 2 fall 5
83
+         server controller3 10.0.0.14:443 check inter 2000 rise 2 fall 5
84
+
85
+        listen galera_cluster
86
+         bind <Virtual IP>:3306
87
+         balance  source
88
+         option  mysql-check
89
+         server controller1 10.0.0.12:3306 check port 9200 inter 2000 rise 2 fall 5
90
+         server controller2 10.0.0.13:3306 backup check port 9200 inter 2000 rise 2 fall 5
91
+         server controller3 10.0.0.14:3306 backup check port 9200 inter 2000 rise 2 fall 5
92
+
93
+        listen glance_api_cluster
94
+         bind <Virtual IP>:9292
95
+         balance  source
96
+         option  tcpka
97
+         option  httpchk
98
+         option  tcplog
99
+         server controller1 10.0.0.12:9292 check inter 2000 rise 2 fall 5
100
+         server controller2 10.0.0.13:9292 check inter 2000 rise 2 fall 5
101
+         server controller3 10.0.0.14:9292 check inter 2000 rise 2 fall 5
102
+
103
+        listen glance_registry_cluster
104
+         bind <Virtual IP>:9191
105
+         balance  source
106
+         option  tcpka
107
+         option  tcplog
108
+         server controller1 10.0.0.12:9191 check inter 2000 rise 2 fall 5
109
+         server controller2 10.0.0.13:9191 check inter 2000 rise 2 fall 5
110
+         server controller3 10.0.0.14:9191 check inter 2000 rise 2 fall 5
111
+
112
+        listen keystone_admin_cluster
113
+         bind <Virtual IP>:35357
114
+         balance  source
115
+         option  tcpka
116
+         option  httpchk
117
+         option  tcplog
118
+         server controller1 10.0.0.12:35357 check inter 2000 rise 2 fall 5
119
+         server controller2 10.0.0.13:35357 check inter 2000 rise 2 fall 5
120
+         server controller3 10.0.0.14:35357 check inter 2000 rise 2 fall 5
121
+
122
+        listen keystone_public_internal_cluster
123
+         bind <Virtual IP>:5000
124
+         balance  source
125
+         option  tcpka
126
+         option  httpchk
127
+         option  tcplog
128
+         server controller1 10.0.0.12:5000 check inter 2000 rise 2 fall 5
129
+         server controller2 10.0.0.13:5000 check inter 2000 rise 2 fall 5
130
+         server controller3 10.0.0.14:5000 check inter 2000 rise 2 fall 5
131
+
132
+        listen nova_ec2_api_cluster
133
+         bind <Virtual IP>:8773
134
+         balance  source
135
+         option  tcpka
136
+         option  tcplog
137
+         server controller1 10.0.0.12:8773 check inter 2000 rise 2 fall 5
138
+         server controller2 10.0.0.13:8773 check inter 2000 rise 2 fall 5
139
+         server controller3 10.0.0.14:8773 check inter 2000 rise 2 fall 5
140
+
141
+        listen nova_compute_api_cluster
142
+         bind <Virtual IP>:8774
143
+         balance  source
144
+         option  tcpka
145
+         option  httpchk
146
+         option  tcplog
147
+         server controller1 10.0.0.12:8774 check inter 2000 rise 2 fall 5
148
+         server controller2 10.0.0.13:8774 check inter 2000 rise 2 fall 5
149
+         server controller3 10.0.0.14:8774 check inter 2000 rise 2 fall 5
150
+
151
+        listen nova_metadata_api_cluster
152
+         bind <Virtual IP>:8775
153
+         balance  source
154
+         option  tcpka
155
+         option  tcplog
156
+         server controller1 10.0.0.12:8775 check inter 2000 rise 2 fall 5
157
+         server controller2 10.0.0.13:8775 check inter 2000 rise 2 fall 5
158
+         server controller3 10.0.0.14:8775 check inter 2000 rise 2 fall 5
159
+
160
+        listen cinder_api_cluster
161
+         bind <Virtual IP>:8776
162
+         balance  source
163
+         option  tcpka
164
+         option  httpchk
165
+         option  tcplog
166
+         server controller1 10.0.0.12:8776 check inter 2000 rise 2 fall 5
167
+         server controller2 10.0.0.13:8776 check inter 2000 rise 2 fall 5
168
+         server controller3 10.0.0.14:8776 check inter 2000 rise 2 fall 5
169
+
170
+        listen ceilometer_api_cluster
171
+         bind <Virtual IP>:8777
172
+         balance  source
173
+         option  tcpka
174
+         option  tcplog
175
+         server controller1 10.0.0.12:8777 check inter 2000 rise 2 fall 5
176
+         server controller2 10.0.0.13:8777 check inter 2000 rise 2 fall 5
177
+         server controller3 10.0.0.14:8777 check inter 2000 rise 2 fall 5
178
+
179
+        listen nova_vncproxy_cluster
180
+         bind <Virtual IP>:6080
181
+         balance  source
182
+         option  tcpka
183
+         option  tcplog
184
+         server controller1 10.0.0.12:6080 check inter 2000 rise 2 fall 5
185
+         server controller2 10.0.0.13:6080 check inter 2000 rise 2 fall 5
186
+         server controller3 10.0.0.14:6080 check inter 2000 rise 2 fall 5
187
+
188
+        listen neutron_api_cluster
189
+         bind <Virtual IP>:9696
190
+         balance  source
191
+         option  tcpka
192
+         option  httpchk
193
+         option  tcplog
194
+         server controller1 10.0.0.12:9696 check inter 2000 rise 2 fall 5
195
+         server controller2 10.0.0.13:9696 check inter 2000 rise 2 fall 5
196
+         server controller3 10.0.0.14:9696 check inter 2000 rise 2 fall 5
197
+
198
+        listen swift_proxy_cluster
199
+         bind <Virtual IP>:8080
200
+         balance  source
201
+         option  tcplog
202
+         option  tcpka
203
+         server controller1 10.0.0.12:8080 check inter 2000 rise 2 fall 5
204
+         server controller2 10.0.0.13:8080 check inter 2000 rise 2 fall 5
205
+         server controller3 10.0.0.14:8080 check inter 2000 rise 2 fall 5
206
+
207
+   .. note::
208
+
209
+      The Galera cluster configuration directive ``backup`` indicates
210
+      that two of the three controllers are standby nodes.
211
+      This ensures that only one node services write requests
212
+      because OpenStack support for multi-node writes is not yet production-ready.
213
+
214
+   .. note::
215
+
216
+      The Telemetry API service configuration does not have the ``option httpchk``
217
+      directive as it cannot process this check properly.
218
+
219
+.. TODO: explain why the Telemetry API is so special
220
+
221
+#. Configure the kernel parameter to allow non-local IP binding. This allows
222
+   running HAProxy instances to bind to a VIP for failover. Add following line
223
+   to ``/etc/sysctl.conf``:
224
+
225
+   .. code-block:: none
226
+
227
+      net.ipv4.ip_nonlocal_bind = 1
228
+
229
+#. Restart the host or, to make changes work immediately, invoke:
230
+
231
+   .. code-block:: console
232
+
233
+      $ sysctl -p
234
+
235
+#. Add HAProxy to the cluster and ensure the VIPs can only run on machines
236
+   where HAProxy is active:
237
+
238
+   ``pcs``
239
+
240
+   .. code-block:: console
241
+
242
+      $ pcs resource create lb-haproxy systemd:haproxy --clone
243
+      $ pcs constraint order start vip then lb-haproxy-clone kind=Optional
244
+      $ pcs constraint colocation add lb-haproxy-clone with vip
245
+
246
+   ``crmsh``
247
+
248
+   .. code-block:: console
249
+
250
+      $ crm cib new conf-haproxy
251
+      $ crm configure primitive haproxy lsb:haproxy op monitor interval="1s"
252
+      $ crm configure clone haproxy-clone haproxy
253
+      $ crm configure colocation vip-with-haproxy inf: vip haproxy-clone
254
+      $ crm configure order haproxy-after-vip mandatory: vip haproxy-clone
255
+
256
+
257
+Pacemaker versus systemd
258
+------------------------
259
+
260
+Memcached
261
+---------
262
+
263
+Memcached is a general-purpose distributed memory caching system. It
264
+is used to speed up dynamic database-driven websites by caching data
265
+and objects in RAM to reduce the number of times an external data
266
+source must be read.
267
+
268
+Memcached is a memory cache daemon that can be used by most OpenStack
269
+services to store ephemeral data, such as tokens.
270
+
271
+Access to Memcached is not handled by HAProxy because replicated
272
+access is currently in an experimental state. Instead, OpenStack
273
+services must be supplied with the full list of hosts running
274
+Memcached.
275
+
276
+The Memcached client implements hashing to balance objects among the
277
+instances. Failure of an instance impacts only a percentage of the
278
+objects and the client automatically removes it from the list of
279
+instances. The SLA is several minutes.
280
+
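As a sketch of what this means in practice, a service that caches through the
``oslo.cache`` library would list every Memcached instance explicitly (the host
names here are examples):

.. code-block:: ini

   [cache]
   enabled = true
   backend = oslo_cache.memcache_pool
   memcache_servers = controller1:11211,controller2:11211,controller3:11211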
281
+
282
+Highly available API services
283
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
284
+
285
+Identity API
286
+------------
287
+
288
+Ensure you have read the
289
+`OpenStack Identity service getting started documentation
290
+<https://docs.openstack.org/admin-guide/common/get-started-identity.html>`_.
291
+
292
+.. to do: reference controller-ha-identity and see if section involving
293
+   adding to pacemaker is in scope
294
+
295
+
296
+Add OpenStack Identity resource to Pacemaker
297
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
298
+
299
+The following sections detail how to add the Identity service
300
+to Pacemaker on SUSE and Red Hat.
301
+
302
+SUSE
303
+----
304
+
305
+SUSE Enterprise Linux and SUSE-based distributions, such as openSUSE,
306
+use a set of OCF agents for controlling OpenStack services.
307
+
308
+#. Run the following commands to download the OpenStack Identity resource
309
+   to Pacemaker:
310
+
311
+   .. code-block:: console
312
+
313
+      # cd /usr/lib/ocf/resource.d
314
+      # mkdir openstack
315
+      # cd openstack
316
+      # wget https://git.openstack.org/cgit/openstack/openstack-resource-agents/plain/ocf/keystone
317
+      # chmod a+rx *
318
+
319
+#. Add the Pacemaker configuration for the OpenStack Identity resource
320
+   by running the following command to connect to the Pacemaker cluster:
321
+
322
+   .. code-block:: console
323
+
324
+      # crm configure
325
+
326
+#. Add the following cluster resources:
327
+
328
+   .. code-block:: console
329
+
330
+      clone p_keystone ocf:openstack:keystone \
331
+      params config="/etc/keystone/keystone.conf" os_password="secretsecret" os_username="admin" os_tenant_name="admin" os_auth_url="http://10.0.0.11:5000/v2.0/" \
332
+      op monitor interval="30s" timeout="30s"
333
+
334
+   .. note::
335
+
336
+      This configuration creates ``p_keystone``,
337
+      a resource for managing the OpenStack Identity service.
338
+
339
+#. Commit your configuration changes from the :command:`crm configure` menu
340
+   with the following command:
341
+
342
+   .. code-block:: console
343
+
344
+      # commit
345
+
346
+   The :command:`crm configure` command supports batch input. You can copy and
347
+   paste the above lines into your live Pacemaker configuration and then make
348
+   changes as required.
349
+
350
+   For example, you may enter ``edit p_ip_keystone`` from the
351
+   :command:`crm configure` menu and edit the resource to match your preferred
352
+   virtual IP address.
353
+
354
+   Pacemaker now starts the OpenStack Identity service and its dependent
355
+   resources on all of your nodes.
356
+
357
+Red Hat
358
+--------
359
+
360
+For Red Hat Enterprise Linux and Red Hat-based Linux distributions,
361
+the following process uses systemd unit files.
362
+
363
+.. code-block:: console
364
+
365
+   # pcs resource create openstack-keystone systemd:openstack-keystone --clone interleave=true
366
+
367
+.. _identity-config-identity:
368
+
369
+Configure OpenStack Identity service
370
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
371
+
372
+#. Edit the :file:`keystone.conf` file
373
+   to change the values of the :manpage:`bind(2)` parameters:
374
+
375
+   .. code-block:: ini
376
+
377
+      bind_host = 10.0.0.12
378
+      public_bind_host = 10.0.0.12
379
+      admin_bind_host = 10.0.0.12
380
+
381
+   The ``admin_bind_host`` parameter
382
+   lets you use a private network for admin access.
383
+
384
+#. To be sure that all data is highly available,
385
+   ensure that everything is stored in the MySQL database
386
+   (which is also highly available):
387
+
388
+   .. code-block:: ini
389
+
390
+      [catalog]
391
+      driver = keystone.catalog.backends.sql.Catalog
392
+      # ...
393
+      [identity]
394
+      driver = keystone.identity.backends.sql.Identity
395
+      # ...
396
+
397
+#. If the Identity service will be sending ceilometer notifications
398
+   and your message bus is configured for high availability, ensure
399
+   that the Identity service is correctly configured to use it (see
400
+   the sketch following this list).
401
+
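+A minimal sketch of such a configuration in :file:`keystone.conf`, reusing the
+example RabbitMQ host addresses from elsewhere in this guide (adjust them to
+your own message bus layout):
+
+.. code-block:: ini
+
+   [oslo_messaging_rabbit]
+   rabbit_hosts = 10.0.0.12,10.0.0.13,10.0.0.14
+   rabbit_ha_queues = True
+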
402
+.. _identity-services-config:
403
+
404
+Configure OpenStack services to use the highly available OpenStack Identity
405
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
406
+
407
+Your OpenStack services must now point their OpenStack Identity configuration
408
+to the highly available virtual cluster IP address.
409
+
410
+#. For the OpenStack Compute service (if your OpenStack Identity service
411
+   IP address is 10.0.0.11), use the following configuration in the
412
+   :file:`api-paste.ini` file:
413
+
414
+   .. code-block:: ini
415
+
416
+      auth_host = 10.0.0.11
417
+
418
+#. Create the OpenStack Identity Endpoint with this IP address.
419
+
420
+   .. note::
421
+
422
+      If you are using both private and public IP addresses,
423
+      create two virtual IP addresses and define the endpoint. For
424
+      example:
425
+
426
+   .. code-block:: console
427
+
428
+      $ openstack endpoint create --region $KEYSTONE_REGION \
429
+      $service-type public http://PUBLIC_VIP:5000/v2.0
430
+      $ openstack endpoint create --region $KEYSTONE_REGION \
431
+      $service-type admin http://10.0.0.11:35357/v2.0
432
+      $ openstack endpoint create --region $KEYSTONE_REGION \
433
+      $service-type internal http://10.0.0.11:5000/v2.0
434
+
435
+#. If you are using Dashboard (horizon), edit the :file:`local_settings.py`
436
+   file to include the following:
437
+
438
+      .. code-block:: ini
439
+
440
+         OPENSTACK_HOST = 10.0.0.11
441
+
442
+
443
+Telemetry API
444
+-------------
445
+
446
+The Telemetry polling agent can be configured to partition its polling
447
+workload between multiple agents. This enables high availability (HA).
448
+
449
+Both the central and the compute agent can run in an HA deployment.
450
+This means that multiple instances of these services can run in
451
+parallel with workload partitioning among these running instances.
452
+
453
+The `Tooz <https://pypi.org/project/tooz>`_ library provides
454
+the coordination within the groups of service instances.
455
+It provides an API above several back ends that can be used for building
456
+distributed applications.
457
+
458
+Tooz supports
459
+`various drivers <https://docs.openstack.org/tooz/latest/user/drivers.html>`_
460
+including the following back end solutions:
461
+
462
+* `Zookeeper <http://zookeeper.apache.org/>`_:
463
+    Recommended solution by the Tooz project.
464
+
465
+* `Redis <http://redis.io/>`_:
466
+    Recommended solution by the Tooz project.
467
+
468
+* `Memcached <http://memcached.org/>`_:
469
+    Recommended for testing.
470
+
471
+You must configure a supported Tooz driver for the HA deployment of
472
+the Telemetry services.
473
+
474
+For information about the required configuration options
475
+to set in the :file:`ceilometer.conf`, see the `coordination section
476
+<https://docs.openstack.org/ocata/config-reference/telemetry.html>`_
477
+in the OpenStack Configuration Reference.
478
+
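+As a sketch only (the driver choice and address are assumptions, not
+requirements), the back end is selected through the ``backend_url`` option in
+the ``[coordination]`` section, for example pointing at a ZooKeeper member;
+check the Tooz driver documentation for the exact URL syntax of your back end:
+
+.. code-block:: ini
+
+   [coordination]
+   backend_url = zookeeper://10.0.0.12:2181
+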
479
+.. note::
480
+
481
+   If the ``backend_url`` option is not set, only one instance of the
482
+   central and compute agent services can run and function correctly.
483
+
484
+The availability check of the instances is provided by heartbeat messages.
485
+When the connection with an instance is lost, the workload will be
486
+reassigned within the remaining instances in the next polling cycle.
487
+
488
+.. note::
489
+
490
+   Memcached uses a timeout value, which should always be set to
491
+   a value that is higher than the heartbeat value set for Telemetry.
492
+
493
+For backward compatibility and to support existing deployments, the central
494
+agent configuration supports using different configuration files for groups
495
+of service instances that run in parallel.
496
+To enable this configuration, set a value for the
497
+``partitioning_group_prefix`` option in the
498
+`polling section <https://docs.openstack.org/ocata/config-reference/telemetry/telemetry-config-options.html>`_
499
+in the OpenStack Configuration Reference.
500
+
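+For example, a minimal sketch with a purely illustrative prefix value:
+
+.. code-block:: ini
+
+   [polling]
+   partitioning_group_prefix = my-group
+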
501
+.. warning::
502
+
503
+   For each sub-group of the central agent pool with the same
504
+   ``partitioning_group_prefix``, a disjoint subset of meters must be polled
505
+   to avoid samples being missing or duplicated. The list of meters to poll
506
+   can be set in the :file:`/etc/ceilometer/pipeline.yaml` configuration file.
507
+   For more information about pipelines see the `Data processing and pipelines
508
+   <https://docs.openstack.org/admin-guide/telemetry-data-pipelines.html>`_
509
+   section.
510
+
511
+To enable the compute agent to run multiple instances simultaneously with
512
+workload partitioning, the ``workload_partitioning`` option must be set to
513
+``True`` under the `compute section <https://docs.openstack.org/ocata/config-reference/telemetry.html>`_
514
+in the :file:`ceilometer.conf` configuration file.
515
+
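+That is, a minimal sketch of the relevant snippet:
+
+.. code-block:: ini
+
+   [compute]
+   workload_partitioning = True
+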
516
+
517
+.. To Do: Cover any other projects here with API services which require specific
518
+   HA details.

+ 9
- 0
doc/source/control-plane.rst View File

@@ -0,0 +1,9 @@
1
+===========================
2
+Configuring a control plane
3
+===========================
4
+
5
+.. toctree::
6
+   :maxdepth: 2
7
+
8
+   control-plane-stateless.rst
9
+   control-plane-stateful.rst

BIN
doc/source/figures/Cluster-deployment-collapsed.png View File


BIN
doc/source/figures/Cluster-deployment-segregated.png View File


+ 15
- 0
doc/source/ha-community.rst View File

@@ -0,0 +1,15 @@
1
+============
2
+HA community
3
+============
4
+
5
+The OpenStack HA community holds `weekly IRC meetings
6
+<https://wiki.openstack.org/wiki/Meetings/HATeamMeeting>`_ to discuss
7
+a range of topics relating to HA in OpenStack. Everyone interested is
8
+encouraged to attend. The `logs of all previous meetings
9
+<http://eavesdrop.openstack.org/meetings/ha/>`_ are available to read.
10
+
11
+You can contact the HA community directly in `the #openstack-ha
12
+channel on Freenode IRC <https://wiki.openstack.org/wiki/IRC>`_, or by
13
+sending mail to the `openstack-dev
14
+<https://wiki.openstack.org/wiki/Mailing_Lists#Future_Development>`_
15
+mailing list with the ``[HA]`` prefix in the ``Subject`` header.

+ 26
- 3
doc/source/index.rst View File

@@ -5,8 +5,31 @@ OpenStack High Availability Guide
5 5
 Abstract
6 6
 ~~~~~~~~
7 7
 
8
-This guide provides information about configuring OpenStack services for high
9
-availability.
8
+This guide describes how to install and configure OpenStack for high
9
+availability. It supplements the Installation Guides
10
+and assumes that you are familiar with the material in those guides.
10 11
 
11
-This is a placeholder while we migrate information over from another repo.
12
+.. warning::
12 13
 
14
+   This guide is a work-in-progress and changing rapidly
15
+   while we continue to test and enhance the guidance. There are
16
+   open `TODO` items throughout and available on the OpenStack manuals
17
+   `bug list <https://bugs.launchpad.net/openstack-manuals?field.tag=ha-guide>`_.
18
+   Please help where you are able.
19
+
20
+.. toctree::
21
+   :maxdepth: 1
22
+
23
+   common/conventions.rst
24
+   overview.rst
25
+   intro-ha.rst
26
+   intro-os-ha.rst
27
+   control-plane.rst
28
+   networking-ha.rst
29
+   storage-ha.rst
30
+   compute-node-ha.rst
31
+   monitoring.rst
32
+   testing.rst
33
+   ref-arch-examples.rst
34
+   ha-community.rst
35
+   common/appendix.rst

+ 127
- 0
doc/source/intro-ha-common-tech.rst View File

@@ -0,0 +1,127 @@
1
+========================
2
+Commonly used technology
3
+========================
4
+High availability can only be achieved at the system level, with both hardware
5
+and software components contributing to the overall system availability.
6
+This document lists the most common hardware and software technologies
7
+that can be used to build a highly available system.
8
+
9
+Hardware
10
+~~~~~~~~
11
+Using different technologies to enable high availability at the hardware
12
+level provides a good basis for building a highly available system. The
13
+following sections discuss the most common technologies used in this field.
14
+
15
+Redundant switches
16
+------------------
17
+Network switches are single points of failure because networking is critical to
18
+the operation of all other basic domains of the infrastructure, such as compute
19
+and storage. Network switches must be able to forward the network traffic
20
+and to deliver it to a working next hop.
21
+For these reasons, consider the following two factors when making a network
22
+switch redundant:
23
+
24
+#. The network switch itself should synchronize its internal state to a
25
+   redundant switch, in either an active/active or active/passive manner.
26
+
27
+#. The network topology should be designed in a way that the network router can
28
+   use at least two paths in every critical direction.
29
+
30
+Bonded interfaces
31
+-----------------
32
+Bonded interfaces are two independent physical network interfaces handled as
33
+one interface in active/passive or active/active redundancy mode. In
34
+active/passive mode, if an error occurs in the active network interface or at
35
+the remote end of the interface, traffic is switched over to the passive
36
+interface. In active/active mode, when an error occurs in an interface or at
37
+the remote end of an interface, that interface is marked as unavailable and
38
+ceases to be used.
39
+
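+As an illustrative sketch only (the interface names and bonding mode are
+assumptions that depend on your hardware and distribution tooling), an
+active/passive bond can be created with ``iproute2`` along these lines:
+
+.. code-block:: console
+
+   # ip link add bond0 type bond mode active-backup miimon 100
+   # ip link set eth0 down
+   # ip link set eth0 master bond0
+   # ip link set eth1 down
+   # ip link set eth1 master bond0
+   # ip link set bond0 up
+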
40
+Load balancers
41
+--------------
42
+Physical load balancers are special routers which direct the traffic in
43
+different directions based on a set of rules. Load balancers can be in
44
+redundant mode similarly to the physical switches.
45
+Load balancers are also important for distributing the traffic to the different
46
+active/active components of the system.
47
+
48
+Storage
49
+-------
50
+Physical storage high availability can be achieved with different scopes:
51
+
52
+#. High availability within a hardware unit with redundant disks (mostly
53
+   organized into different RAID configurations), redundant control components,
54
+   redundant I/O interfaces and redundant power supply.
55
+
56
+#. System level high availability with redundant hardware units with data
57
+   replication.
58
+
59
+Software
60
+~~~~~~~~
61
+
62
+HAProxy
63
+-------
64
+
65
+HAProxy provides a fast and reliable HTTP reverse proxy and load balancer
66
+for TCP or HTTP applications. It is particularly suited for web sites with
67
+very high load that need persistence or Layer 7 processing.
68
+It realistically supports tens of thousands of connections with recent
69
+hardware.
70
+
71
+.. note::
72
+
73
+   Ensure your HAProxy installation is not a single point of failure,
74
+   it is advisable to have multiple HAProxy instances running.
75
+
76
+   You can also ensure the availability by other means, using Keepalived
77
+   or Pacemaker.
78
+
79
+Alternatively, you can use a commercial load balancer, which may be hardware-
80
+or software-based. We recommend a hardware load balancer as it generally
81
+offers good performance.
82
+
83
+For detailed instructions about installing HAProxy on your nodes,
84
+see the HAProxy `official documentation <http://www.haproxy.org/#docs>`_.
85
+
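+As a hedged illustration (the virtual IP, back-end addresses, and service shown
+are examples only), a front end for an OpenStack API behind HAProxy typically
+looks similar to the following:
+
+.. code-block:: none
+
+   listen keystone_public
+     bind 10.0.0.10:5000
+     balance source
+     option tcpka
+     option httpchk
+     server controller1 10.0.0.12:5000 check inter 2000 rise 2 fall 5
+     server controller2 10.0.0.13:5000 check inter 2000 rise 2 fall 5
+     server controller3 10.0.0.14:5000 check inter 2000 rise 2 fall 5
+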
86
+keepalived
87
+----------
88
+
89
+`keepalived <http://www.keepalived.org/>`_ is routing software that
90
+provides load balancing and high availability facilities to Linux
91
+systems and Linux-based infrastructures.
92
+
93
+Keepalived implements a set of checkers to dynamically and
94
+adaptively maintain and manage a load-balanced server pool according
95
+to the health of its members.
96
+
97
+The keepalived daemon can be used to monitor services or systems and
98
+to automatically failover to a standby if problems occur.
99
+
100
+Pacemaker
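+A minimal sketch of a VRRP instance managing a virtual IP address (the
+interface name, router ID, priority, and address are assumptions to adapt to
+your environment):
+
+.. code-block:: none
+
+   vrrp_instance VIP_1 {
+       state MASTER
+       interface eth0
+       virtual_router_id 51
+       priority 101
+       virtual_ipaddress {
+           10.0.0.10
+       }
+   }
+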
101
+---------
102
+
103
+The `Pacemaker <http://clusterlabs.org/>`_ cluster stack is a state-of-the-art
104
+high availability and load balancing stack for the Linux platform.
105
+Pacemaker is used to make OpenStack infrastructure highly available.
106
+
107
+Pacemaker relies on the
108
+`Corosync <http://corosync.github.io/corosync/>`_ messaging layer
109
+for reliable cluster communications. Corosync implements the Totem single-ring
110
+ordering and membership protocol. It also provides UDP and InfiniBand based
111
+messaging, quorum, and cluster membership to Pacemaker.
112
+
113
+Pacemaker does not inherently understand the applications it manages.
114
+Instead, it relies on resource agents (RAs) that are scripts that encapsulate
115
+the knowledge of how to start, stop, and check the health of each application
116
+managed by the cluster.
117
+
118
+These agents must conform to one of the `OCF <https://github.com/ClusterLabs/
119
+OCF-spec/blob/master/ra/resource-agent-api.md>`_,
120
+`SysV Init <http://refspecs.linux-foundation.org/LSB_3.0.0/LSB-Core-generic/
121
+LSB-Core-generic/iniscrptact.html>`_, Upstart, or Systemd standards.
122
+
123
+Pacemaker ships with a large set of OCF agents (such as those managing
124
+MySQL databases, virtual IP addresses, and RabbitMQ), but can also use
125
+any agents already installed on your system and can be extended with
126
+your own (see the
127
+`developer guide <http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html>`_).
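+
+For example, a sketch of managing a virtual IP address with the stock
+``IPaddr2`` OCF agent using the ``pcs`` tooling (the address and netmask are
+placeholders; the ``crm`` shell offers an equivalent):
+
+.. code-block:: console
+
+   # pcs resource create vip ocf:heartbeat:IPaddr2 \
+     ip=10.0.0.10 cidr_netmask=24 op monitor interval=30s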

+ 147
- 0
doc/source/intro-ha-key-concepts.rst View File

@@ -0,0 +1,147 @@
1
+============
2
+Key concepts
3
+============
4
+
5
+Redundancy and failover
6
+~~~~~~~~~~~~~~~~~~~~~~~
7
+
8
+High availability is implemented with redundant hardware
9
+running redundant instances of each service.
10
+If one piece of hardware running one instance of a service fails,
11
+the system can then failover to use another instance of a service
12
+that is running on hardware that did not fail.
13
+
14
+A crucial aspect of high availability
15
+is the elimination of single points of failure (SPOFs).
16
+A SPOF is an individual piece of equipment or software
17
+that causes system downtime or data loss if it fails.
18
+In order to eliminate SPOFs, check that mechanisms exist for redundancy of:
19
+
20
+- Network components, such as switches and routers
21
+
22
+- Applications and automatic service migration
23
+
24
+- Storage components
25
+
26
+- Facility services such as power, air conditioning, and fire protection
27
+
28
+In the event that a component fails and a back-up system must take on
29
+its load, most high availability systems will replace the failed
30
+component as quickly as possible to maintain necessary redundancy. This
31
+way time spent in a degraded protection state is minimized.
32
+
33
+Most high availability systems fail in the event of multiple
34
+independent (non-consequential) failures. In this case, most
35
+implementations favor protecting data over maintaining availability.
36
+
37
+High availability systems typically achieve an uptime percentage of
38
+99.99% or more, which roughly equates to less than an hour of
39
+cumulative downtime per year. In order to achieve this, high
40
+availability systems should keep recovery times after a failure to
41
+about one to two minutes, sometimes significantly less.
42
+
43
+OpenStack currently meets such availability requirements for its own
44
+infrastructure services, meaning that an uptime of 99.99% is feasible
45
+for the OpenStack infrastructure proper. However, OpenStack does not
46
+guarantee 99.99% availability for individual guest instances.
47
+
48
+This document discusses some common methods of implementing highly
49
+available systems, with an emphasis on the core OpenStack services and
50
+other open source services that are closely aligned with OpenStack.
51
+
52
+You will need to address high availability concerns for any applications
53
+software that you run on your OpenStack environment. The important thing is
54
+to make sure that your services are redundant and available.
55
+How you achieve that is up to you.
56
+
57
+Active/passive versus active/active
58
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
59
+
60
+Stateful services can be configured as active/passive or active/active,
61
+which are defined as follows:
62
+
63
+:term:`active/passive configuration`
64
+  Maintains a redundant instance
65
+  that can be brought online when the active service fails.
66
+  For example, OpenStack writes to the main database
67
+  while maintaining a disaster recovery database that can be brought online
68
+  if the main database fails.
69
+
70
+  A typical active/passive installation for a stateful service maintains
71
+  a replacement resource that can be brought online when required.
72
+  Requests are handled using a :term:`virtual IP address (VIP)` that
73
+  facilitates returning to service with minimal reconfiguration.
74
+  A separate application (such as Pacemaker or Corosync) monitors
75
+  these services, bringing the backup online as necessary.
76
+
77
+:term:`active/active configuration`
78
+  Each service also has a backup but manages both the main and
79
+  redundant systems concurrently.
80
+  This way, if there is a failure, the user is unlikely to notice.
81
+  The backup system is already online and takes on increased load
82
+  while the main system is fixed and brought back online.
83
+
84
+  Typically, an active/active installation for a stateless service
85
+  maintains a redundant instance, and requests are load balanced using
86
+  a virtual IP address and a load balancer such as HAProxy.
87
+
88
+  A typical active/active installation for a stateful service includes
89
+  redundant services, with all instances having an identical state. In
90
+  other words, updates to one instance of a database update all other
91
+  instances. This way a request to one instance is the same as a
92
+  request to any other. A load balancer manages the traffic to these
93
+  systems, ensuring that operational systems always handle the
94
+  request.
95
+
96
+Clusters and quorums
97
+~~~~~~~~~~~~~~~~~~~~
98
+
99
+The quorum specifies the minimal number of nodes
100
+that must be functional in a cluster of redundant nodes
101
+in order for the cluster to remain functional.
102
+When one node fails and failover transfers control to other nodes,
103
+the system must ensure that data and processes remain sane.
104
+To determine this, the contents of the remaining nodes are compared
105
+and, if there are discrepancies, a majority rules algorithm is implemented.
106
+
107
+For this reason, each cluster in a high availability environment should
108
+have an odd number of nodes and the quorum is defined as more than a half
109
+of the nodes.
110
+If multiple nodes fail so that the cluster size falls below the quorum
111
+value, the cluster itself fails.
112
+
113
+For example, in a seven-node cluster, the quorum should be set to
114
+``floor(7/2) + 1 == 4``. If quorum is four and four nodes fail simultaneously,
115
+the cluster itself would fail, whereas it would continue to function if
116
+no more than three nodes fail. If split into partitions of three and four
117
+nodes respectively, the four-node partition would retain quorum and continue
118
+to operate, stopping or fencing the minority partition (depending on the
119
+no-quorum-policy cluster configuration).
120
+
121
+As an alternative configuration example, the quorum could also have been set
122
+to three.
123
+
124
+.. note::
125
+
126
+  We do not recommend setting the quorum to a value less than ``floor(n/2) + 1``
127
+  as it would likely cause a split-brain in the face of network partitions.
128
+
129
+With a quorum of three, the cluster would also continue to function when four
130
+nodes fail simultaneously. However, if split into partitions of three and four
131
+nodes respectively, a quorum of three would cause both sides to attempt to
132
+fence the other and host the resources. Without fencing enabled, the cluster
133
+would go straight to running two copies of each resource.
134
+
135
+This is why setting the quorum to a value less than ``floor(n/2) + 1`` is
136
+dangerous. However, it may be required in some specific cases, such as a
137
+temporary measure at a point when it is known with 100% certainty that the other
138
+nodes are down.
139
+
140
+When configuring an OpenStack environment for study or demonstration purposes,
141
+it is possible to turn off the quorum checking. Production systems should
142
+always run with quorum enabled.
143
+
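+For such non-production clusters only, one way to relax quorum handling is
+through Pacemaker's ``no-quorum-policy`` cluster property, sketched here with
+the ``pcs`` tooling (verify the equivalent for your own cluster stack):
+
+.. code-block:: console
+
+   # pcs property set no-quorum-policy=ignore
+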
144
+Load balancing
145
+~~~~~~~~~~~~~~
146
+
147
+.. to do: definition and description of need within HA

+ 24
- 0
doc/source/intro-ha.rst View File

@@ -0,0 +1,24 @@
1
+=================================
2
+Introduction to high availability
3
+=================================
4
+
5
+High availability systems seek to minimize the following issues:
6
+
7
+#. System downtime: Occurs when a user-facing service is unavailable
8
+   beyond a specified maximum amount of time.
9
+
10
+#. Data loss: Accidental deletion or destruction of data.
11
+
12
+Most high availability systems guarantee protection against system downtime
13
+and data loss only in the event of a single failure.
14
+However, they are also expected to protect against cascading failures,
15
+where a single failure deteriorates into a series of consequential failures.
16
+Many service providers guarantee a :term:`Service Level Agreement (SLA)`
17
+that includes an uptime percentage for the computing service, calculated from
18
+the available time and system downtime, excluding planned outage time.
19
+
20
+.. toctree::
21
+   :maxdepth: 2
22
+
23
+   intro-ha-key-concepts.rst
24
+   intro-ha-common-tech.rst

+ 67
- 0
doc/source/intro-os-ha-cluster.rst View File

@@ -0,0 +1,67 @@
1
+================
2
+Cluster managers
3
+================
4
+
5
+At its core, a cluster is a distributed finite state machine capable
6
+of co-ordinating the startup and recovery of inter-related services
7
+across a set of machines.
8
+
9
+Even a distributed or replicated application that is able to survive failures
10
+on one or more machines can benefit from a cluster manager because a cluster
11
+manager has the following capabilities:
12
+
13
+#. Awareness of other applications in the stack
14
+
15
+   While SysV init replacements like systemd can provide
16
+   deterministic recovery of a complex stack of services, the
17
+   recovery is limited to one machine and lacks the context of what
18
+   is happening on other machines. This context is crucial to
19
+   determine the difference between a local failure, and clean startup
20
+   and recovery after a total site failure.
21
+
22
+#. Awareness of instances on other machines
23
+
24
+   Services like RabbitMQ and Galera have complicated boot-up
25
+   sequences that require co-ordination, and often serialization, of
26
+   startup operations across all machines in the cluster. This is
27
+   especially true after a site-wide failure or shutdown where you must
28
+   first determine the last machine to be active.
29
+
30
+#. A shared implementation and calculation of `quorum
31
+   <https://en.wikipedia.org/wiki/Quorum_(Distributed_Systems)>`_
32
+
33
+   It is very important that all members of the system share the same
34
+   view of who their peers are and whether or not they are in the
35
+   majority. Failure to do this leads very quickly to an internal
36
+   `split-brain <https://en.wikipedia.org/wiki/Split-brain_(computing)>`_
37
+   state. This is where different parts of the system are pulling in
38
+   different and incompatible directions.
39
+
40
+#. Data integrity through fencing (a non-responsive process does not
41
+   imply it is not doing anything)
42
+
43
+   A single application does not have sufficient context to know the
44
+   difference between failure of a machine and failure of the
45
+   application on a machine. The usual practice is to assume the
46
+   machine is dead and continue working. However, this is highly risky. A
47
+   rogue process or machine could still be responding to requests and
48
+   generally causing havoc. The safer approach is to make use of
49
+   remotely accessible power switches and/or network switches and SAN
50
+   controllers to fence (isolate) the machine before continuing.
51
+
52
+#. Automated recovery of failed instances
53
+
54
+   While the application can still run after the failure of several
55
+   instances, it may not have sufficient capacity to serve the
56
+   required volume of requests. A cluster can automatically recover
57
+   failed instances to prevent additional load induced failures.
58
+
59
+Pacemaker
60
+~~~~~~~~~
61
+.. to do: description and point to ref arch example using pacemaker
62
+
63
+`Pacemaker <http://clusterlabs.org>`_.
64
+
65
+Systemd
66
+~~~~~~~
67
+.. to do: description and point to ref arch example using Systemd and link

+ 35
- 0
doc/source/intro-os-ha-memcached.rst View File

@@ -0,0 +1,35 @@
1
+=========
2
+Memcached
3
+=========
4
+
5
+Most OpenStack services can use Memcached to store ephemeral data such as
6
+tokens. Although Memcached does not support typical forms of redundancy such
7
+as clustering, OpenStack services can use almost any number of instances
8
+by configuring multiple hostnames or IP addresses.
9
+
10
+The Memcached client implements hashing to balance objects among the instances.
11
+Failure of an instance only impacts a percentage of the objects,
12
+and the client automatically removes it from the list of instances.
13
+
14
+Installation
15
+~~~~~~~~~~~~
16
+
17
+To install and configure Memcached, read the
18
+`official documentation <https://github.com/Memcached/Memcached/wiki#getting-started>`_.
19
+
20
+Memory caching is managed by `oslo.cache
21
+<http://specs.openstack.org/openstack/oslo-specs/specs/kilo/oslo-cache-using-dogpile.html>`_.
22
+This ensures consistency across all projects when using multiple Memcached
23
+servers. The following is an example configuration with three hosts:
24
+
25
+.. code-block:: ini
26
+
27
+  memcached_servers = controller1:11211,controller2:11211,controller3:11211
28
+
29
+By default, ``controller1`` handles the caching service. If the host goes down,
30
+``controller2`` or ``controller3`` takes over the service.
31
+
32
+For more information about Memcached installation, see the
33
+*Environment -> Memcached* section in the
34
+`Installation Guides <https://docs.openstack.org/ocata/install/>`_
35
+depending on your distribution.

+ 52
- 0
doc/source/intro-os-ha-state.rst View File

@@ -0,0 +1,52 @@
1
+==================================
2
+Stateless versus stateful services
3
+==================================
4
+
5
+OpenStack components can be divided into three categories:
6
+
7
+- OpenStack APIs: HTTP(S) stateless services written in Python, easy to
8
+  duplicate and mostly easy to load balance.
9
+
10
+- The SQL relational database server provides stateful services consumed by other
11
+  components. Supported databases are MySQL, MariaDB, and PostgreSQL.
12
+  Making the SQL database redundant is complex.
13
+
14
+- :term:`Advanced Message Queuing Protocol (AMQP)` provides OpenStack
15
+  internal stateful communication service.
16
+
17
+.. to do: Ensure the difference between stateless and stateful services
18
+.. is clear
19
+
20
+Stateless services
21
+~~~~~~~~~~~~~~~~~~
22
+
23
+A stateless service is one that provides a response to a request and then
24
+requires no further attention. To make a stateless service highly
25
+available, you need to provide redundant instances and load balance them.
26
+
27
+Stateless OpenStack services
28
+----------------------------
29
+
30
+OpenStack services that are stateless include ``nova-api``,
31
+``nova-conductor``, ``glance-api``, ``keystone-api``, ``neutron-api``,
32
+and ``nova-scheduler``.
33
+
34
+Stateful services
35
+~~~~~~~~~~~~~~~~~
36
+
37
+A stateful service is one where subsequent requests to the service
38
+depend on the results of the first request.
39
+Stateful services are more difficult to manage because a single
40
+action typically involves more than one request. Providing
41
+additional instances and load balancing does not solve the problem.
42
+For example, if the horizon user interface reset itself every time
43
+you went to a new page, it would not be very useful.
44
+OpenStack services that are stateful include the OpenStack database
45
+and message queue.
46
+Making stateful services highly available can depend on whether you choose
47
+an active/passive or active/active configuration.
48
+
49
+Stateful OpenStack services
50
+----------------------------
51
+
52
+.. to do: create list of stateful services

+ 12
- 0
doc/source/intro-os-ha.rst View File

@@ -0,0 +1,12 @@
1
+================================================
2
+Introduction to high availability with OpenStack
3
+================================================
4
+
5
+.. to do: description of section & improvement of title (intro to OS HA)
6
+
7
+.. toctree::
8
+   :maxdepth: 2
9
+
10
+   intro-os-ha-state.rst
11
+   intro-os-ha-cluster.rst
12
+   intro-os-ha-memcached.rst

+ 6
- 0
doc/source/monitoring.rst View File

@@ -0,0 +1,6 @@
1
+==========
2
+Monitoring
3
+==========
4
+
5
+
6
+

+ 20
- 0
doc/source/networking-ha-l3-agent.rst View File

@@ -0,0 +1,20 @@
1
+========
2
+L3 Agent
3
+========
4
+.. TODO: Introduce L3 agent
5
+
6
+HA Routers
7
+~~~~~~~~~~
8
+.. TODO: content for HA routers
9
+
10
+Networking DHCP agent
11
+~~~~~~~~~~~~~~~~~~~~~
12
+The OpenStack Networking (neutron) service has a scheduler that lets you run
13
+multiple agents across nodes. The DHCP agent can be natively highly available.
14
+
15
+To configure the number of DHCP agents per network, modify the
16
+``dhcp_agents_per_network`` parameter in the :file:`/etc/neutron/neutron.conf`
17
+file. By default this is set to 1. To achieve high availability, assign more
18
+than one DHCP agent per network. For more information, see
19
+`High-availability for DHCP
20
+<https://docs.openstack.org/newton/networking-guide/config-dhcp-ha.html>`_.
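+
+For example, a sketch that assigns three DHCP agents to each network (the
+value is illustrative; choose one that matches the number of network nodes
+you run):
+
+.. code-block:: ini
+
+   [DEFAULT]
+   dhcp_agents_per_network = 3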

+ 6
- 0
doc/source/networking-ha-neutron-l3-analysis.rst View File

@@ -0,0 +1,6 @@
1
+==========
2
+Neutron L3
3
+==========
4
+
5
+.. TODO: create and import Neutron L3 analysis
6
+   Introduce the Networking (neutron) service L3 agent

+ 5
- 0
doc/source/networking-ha-neutron-server.rst View File

@@ -0,0 +1,5 @@
1
+=========================
2
+Neutron Networking server
3
+=========================
4
+
5
+.. TODO: Create content similar to other API sections

+ 29
- 0
doc/source/networking-ha.rst View File

@@ -0,0 +1,29 @@
1
+===================================
2
+Configuring the networking services
3
+===================================
4
+
5
+Configure networking on each node. See the basic information about
6
+configuring networking in the Networking service section of the
7
+`Install Guides <https://docs.openstack.org/ocata/install/>`_,
8
+depending on your distribution.
9
+
10
+OpenStack network nodes contain:
11
+
12
+- Networking DHCP agent
13
+- Neutron L3 agent
14
+- Networking L2 agent
15
+
16
+.. note::
17
+
18
+   The L2 agent cannot be distributed and highly available. Instead, it
19
+   must be installed on each data forwarding node to control the virtual
20
+   network driver such as Open vSwitch or Linux Bridge. One L2 agent runs
21
+   per node and controls its virtual interfaces.
22
+
23
+.. toctree::
24
+   :maxdepth: 2
25
+
26
+   networking-ha-neutron-server.rst
27
+   networking-ha-neutron-l3-analysis.rst
28
+   networking-ha-l3-agent.rst
29
+

+ 24
- 0
doc/source/overview.rst View File

@@ -0,0 +1,24 @@
1
+========
2
+Overview
3
+========
4
+
5
+This guide can be split into two parts:
6
+
7
+#. High level architecture
8
+#. Reference architecture examples, monitoring, and testing
9
+
10
+.. warning::
11
+   We recommend using this guide for assistance when considering your HA cloud.
12
+   We do not recommend using this guide for manually building your HA cloud.
13
+   We recommend starting with a pre-validated solution and adjusting to your
14
+   needs.
15
+
16
+High availability is not for every user, and it presents some challenges.
17
+It may be too complex for databases or
18
+systems with large amounts of data, and replication can slow large systems
19
+down. Different setups have different prerequisites, so read the guidelines
20
+for each setup.
21
+
22
+.. important::
23
+
24
+   High availability is not enabled by default in OpenStack setups.

+ 3
- 0
doc/source/ref-arch-examples.rst View File

@@ -0,0 +1,3 @@
1
+======================
2
+Reference Architecture
3
+======================

+ 59
- 0
doc/source/storage-ha-backend.rst View File

@@ -0,0 +1,59 @@
1
+
2
+.. _storage-ha-backend:
3
+
4
+================
5
+Storage back end
6
+================
7
+
8
+An OpenStack environment includes multiple data pools for the VMs:
9
+
10
+- Ephemeral storage is allocated for an instance and is deleted when the
11
+  instance is deleted. The Compute service manages ephemeral storage and
12
+  by default, Compute stores ephemeral drives as files on local disks on the
13
+  compute node. As an alternative, you can use Ceph RBD as the storage back
14
+  end for ephemeral storage.
15
+
16
+- Persistent storage exists outside all instances. Two types of persistent
17
+  storage are provided:
18
+
19
+  - The Block Storage service (cinder) that can use LVM or Ceph RBD as the
20
+    storage back end.
21
+  - The Image service (glance) that can use the Object Storage service (swift)
22
+    or Ceph RBD as the storage back end.
23
+
24
+For more information about configuring storage back ends for
25
+the different storage options, see `Manage volumes
26
+<https://docs.openstack.org/admin-guide/blockstorage-manage-volumes.html>`_
27
+in the OpenStack Administrator Guide.
28
+
29
+This section discusses ways to protect against data loss in your OpenStack
30
+environment.
31
+
32
+RAID drives
33
+-----------
34
+
35
+Configuring RAID on the hard drives that implement storage protects your data
36
+against a hard drive failure. If the node itself fails, data may be lost.
37
+In particular, all volumes stored on an LVM node can be lost.
38
+
39
+Ceph
40
+----
41
+
42
+`Ceph RBD <http://ceph.com/>`_ is an innately highly available storage back
43
+end. It creates a storage cluster with multiple nodes that communicate with
44
+each other to replicate and redistribute data dynamically.
45
+A Ceph RBD storage cluster provides a single shared set of storage nodes that
46
+can handle all classes of persistent and ephemeral data (glance, cinder, and
47
+nova) that are required for OpenStack instances.
48
+
49
+Ceph RBD provides object replication capabilities by storing Block Storage
50
+volumes as Ceph RBD objects. Ceph RBD ensures that each replica of an object
51
+is stored on a different node. This means that your volumes are protected
52
+against hard drive and node failures, or even the failure of the data center
53
+itself.
54
+
55
+When Ceph RBD is used for ephemeral volumes as well as block and image storage,
56
+it supports `live migration
57
+<https://docs.openstack.org/admin-guide/compute-live-migration-usage.html>`_
58
+of VMs with ephemeral drives. LVM only supports live migration of
59
+volume-backed VMs.

+ 192
- 0
doc/source/storage-ha-block.rst View File

@@ -0,0 +1,192 @@
1
+==================================
2
+Highly available Block Storage API
3
+==================================
4
+
5
+Cinder provides Block-Storage-as-a-Service suitable for performance
6
+sensitive scenarios such as databases, expandable file systems, or
7
+providing a server with access to raw block level storage.
8
+
9
+Persistent block storage can survive instance termination and can also
10
+be moved across instances like any external storage device. Cinder
11
+also has volume snapshots capability for backing up the volumes.
12
+
13
+Making the Block Storage API service highly available in
14
+active/passive mode involves:
15
+
16
+- :ref:`ha-blockstorage-pacemaker`
17
+- :ref:`ha-blockstorage-configure`
18
+- :ref:`ha-blockstorage-services`
19
+
20
+In theory, you can run the Block Storage service as active/active.
21
+However, because of outstanding concerns, we recommend running
22
+the volume component as active/passive only.
23
+
24
+You can read more about these concerns on the
25
+`Red Hat Bugzilla <https://bugzilla.redhat.com/show_bug.cgi?id=1193229>`_
26
+and there is a
27
+`pseudo roadmap <https://etherpad.openstack.org/p/cinder-kilo-stabilisation-work>`_
28
+for addressing them upstream.
29
+
30
+.. _ha-blockstorage-pacemaker:
31
+
32
+Add Block Storage API resource to Pacemaker
33
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
34
+
35
+On RHEL-based systems, create resources for cinder's systemd agents and create
36
+constraints to enforce startup/shutdown ordering:
37
+
38
+.. code-block:: console
39
+
40
+  pcs resource create openstack-cinder-api systemd:openstack-cinder-api --clone interleave=true
41
+  pcs resource create openstack-cinder-scheduler systemd:openstack-cinder-scheduler --clone interleave=true
42
+  pcs resource create openstack-cinder-volume systemd:openstack-cinder-volume
43
+
44
+  pcs constraint order start openstack-cinder-api-clone then openstack-cinder-scheduler-clone
45
+  pcs constraint colocation add openstack-cinder-scheduler-clone with openstack-cinder-api-clone
46
+  pcs constraint order start openstack-cinder-scheduler-clone then openstack-cinder-volume
47
+  pcs constraint colocation add openstack-cinder-volume with openstack-cinder-scheduler-clone
48
+
49
+
50
+If the Block Storage service runs on the same nodes as the other services,
51
+then it is advisable to also include:
52
+
53
+.. code-block:: console
54
+
55
+   pcs constraint order start openstack-keystone-clone then openstack-cinder-api-clone
56
+
57
+Alternatively, instead of using systemd agents, download and
58
+install the OCF resource agent:
59
+
60
+.. code-block:: console
61
+
62
+   # cd /usr/lib/ocf/resource.d/openstack
63
+   # wget https://git.openstack.org/cgit/openstack/openstack-resource-agents/plain/ocf/cinder-api
64
+   # chmod a+rx *
65
+
66
+You can now add the Pacemaker configuration for Block Storage API resource.
67
+Connect to the Pacemaker cluster with the :command:`crm configure` command
68
+and add the following cluster resources:
69
+
70
+.. code-block:: none
71
+
72
+   primitive p_cinder-api ocf:openstack:cinder-api \
73
+      params config="/etc/cinder/cinder.conf" \
74
+      os_password="secretsecret" \
75
+      os_username="admin" \
76
+      os_tenant_name="admin" \
77
+      keystone_get_token_url="http://10.0.0.11:5000/v2.0/tokens" \
78
+      op monitor interval="30s" timeout="30s"
79
+
80
+This configuration creates ``p_cinder-api``, a resource for managing the
81
+Block Storage API service.
82
+
83
+The :command:`crm configure` command supports batch input, so you can copy and
84
+paste the lines above into your live Pacemaker configuration and then make
85
+changes as required. For example, you may enter ``edit p_ip_cinder-api`` from the
86
+:command:`crm configure` menu and edit the resource to match your preferred
87
+virtual IP address.
88
+
89
+Once completed, commit your configuration changes by entering :command:`commit`
90
+from the :command:`crm configure` menu. Pacemaker then starts the Block Storage
91
+API service and its dependent resources on one of your nodes.
92
+
93
+.. _ha-blockstorage-configure:
94
+
95
+Configure Block Storage API service
96
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
97
+
98
+Edit the ``/etc/cinder/cinder.conf`` file. For example, on a RHEL-based system:
99
+
100
+.. code-block:: ini
101
+   :linenos:
102
+
103
+   [DEFAULT]
104
+   # This is the name which we should advertise ourselves as and for
105
+   # A/P installations it should be the same everywhere
106
+   host = cinder-cluster-1
107
+
108
+   # Listen on the Block Storage VIP
109
+   osapi_volume_listen = 10.0.0.11
110
+
111
+   auth_strategy = keystone
112
+   control_exchange = cinder
113
+
114
+   volume_driver = cinder.volume.drivers.nfs.NfsDriver
115
+   nfs_shares_config = /etc/cinder/nfs_exports
116
+   nfs_sparsed_volumes = true
117
+   nfs_mount_options = v3
118
+
119
+   [database]
120
+   connection = mysql+pymysql://cinder:CINDER_DBPASS@10.0.0.11/cinder
121
+   max_retries = -1
122
+
123
+   [keystone_authtoken]
124
+   # 10.0.0.11 is the Keystone VIP
125
+   identity_uri = http://10.0.0.11:35357/
126
+   www_authenticate_uri = http://10.0.0.11:5000/
127
+   admin_tenant_name = service
128
+   admin_user = cinder
129
+   admin_password = CINDER_PASS
130
+
131
+   [oslo_messaging_rabbit]
132
+   # Explicitly list the rabbit hosts as it doesn't play well with HAProxy
133
+   rabbit_hosts = 10.0.0.12,10.0.0.13,10.0.0.14
134
+   # As a consequence, we also need HA queues
135
+   rabbit_ha_queues = True
136
+   heartbeat_timeout_threshold = 60
137
+   heartbeat_rate = 2
138
+
139
+Replace ``CINDER_DBPASS`` with the password you chose for the Block Storage
140
+database. Replace ``CINDER_PASS`` with the password you chose for the
141
+``cinder`` user in the Identity service.
142
+
143
+This example assumes that you are using NFS for the physical storage, which
144
+will almost never be true in a production installation.
145
+
146
+If you are using the Block Storage service OCF agent, some settings will
147
+be filled in for you, resulting in a shorter configuration file:
148
+
149
+.. code-block:: ini
150
+   :linenos:
151
+
152
+   # We have to use a MySQL connection to store data:
153
+   connection = mysql+pymysql://cinder:CINDER_DBPASS@10.0.0.11/cinder
154
+   # The connection string above already uses pymysql,
155
+   # a Python 3 compatible library,
156
+   # so the configuration is ready
157
+   # when everything moves to Python 3.
158
+   # Ref: https://wiki.openstack.org/wiki/PyMySQL_evaluation
159
+
160
+   # We bind Block Storage API to the VIP:
161
+   osapi_volume_listen = 10.0.0.11
162
+
163
+   # We send notifications to High Available RabbitMQ:
164
+   notifier_strategy = rabbit
165
+   rabbit_host = 10.0.0.11
166
+
167
+Replace ``CINDER_DBPASS`` with the password you chose for the Block Storage
168
+database.
169
+
170
+.. _ha-blockstorage-services:
171
+
172
+Configure OpenStack services to use the highly available Block Storage API
173
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
174
+
175
+Your OpenStack services must now point their Block Storage API configuration
176
+to the highly available, virtual cluster IP address rather than a Block Storage
177
+API server’s physical IP address as you would for a non-HA environment.
178
+
179
+Create the Block Storage API endpoint with this IP.
180
+
181
+If you are using both private and public IP addresses, create two virtual IPs
182
+and define your endpoint. For example:
183
+
184
+.. code-block:: console
185
+
186
+   $ openstack endpoint create --region $KEYSTONE_REGION \
187
+     volumev2 public http://PUBLIC_VIP:8776/v2/%\(project_id\)s
188
+   $ openstack endpoint create --region $KEYSTONE_REGION \
189
+     volumev2 admin http://10.0.0.11:8776/v2/%\(project_id\)s
190
+   $ openstack endpoint create --region $KEYSTONE_REGION \
191
+     volumev2 internal http://10.0.0.11:8776/v2/%\(project_id\)s
192
+

+ 114
- 0
doc/source/storage-ha-file-systems.rst View File

@@ -0,0 +1,114 @@
1
+========================================
2
+Highly available Shared File Systems API
3
+========================================
4
+
5
+Making the Shared File Systems (manila) API service highly available
6
+in active/passive mode involves:
7
+
8
+- :ref:`ha-sharedfilesystems-configure`
9
+- :ref:`ha-sharedfilesystems-services`
10
+- :ref:`ha-sharedfilesystems-pacemaker`
11
+
12
+.. _ha-sharedfilesystems-configure:
13
+
14
+Configure Shared File Systems API service
15
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
16
+
17
+Edit the :file:`/etc/manila/manila.conf` file:
18
+
19
+.. code-block:: ini
20
+   :linenos:
21
+
22
+   # We have to use MySQL connection to store data:
23
+   sql_connection = mysql+pymysql://manila:password@10.0.0.11/manila?charset=utf8
24
+
25
+   # We bind Shared File Systems API to the VIP:
26
+   osapi_share_listen = 10.0.0.11
27
+
28
+   # We send notifications to High Available RabbitMQ:
29
+   notifier_strategy = rabbit
30
+   rabbit_host = 10.0.0.11
31
+
32
+
33
+.. _ha-sharedfilesystems-services:
34
+
35
+Configure OpenStack services to use Shared File Systems API
36
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
37
+
38
+Your OpenStack services must now point their Shared File Systems API
39
+configuration to the highly available, virtual cluster IP address rather than
40
+a Shared File Systems API server’s physical IP address as you would
41
+for a non-HA environment.
42
+
43
+You must create the Shared File Systems API endpoint with this IP.
44
+
45
+If you are using both private and public IP addresses, you should create two
46
+virtual IPs and define your endpoints like this:
47
+
48
+.. code-block:: console
49
+
50
+   $ openstack endpoint create --region RegionOne \
51
+     sharev2 public 'http://PUBLIC_VIP:8786/v2/%(tenant_id)s'
52
+
53
+   $ openstack endpoint create --region RegionOne \
54
+     sharev2 internal 'http://10.0.0.11:8786/v2/%(tenant_id)s'
55
+
56
+   $ openstack endpoint create --region RegionOne \
57
+     sharev2 admin 'http://10.0.0.11:8786/v2/%(tenant_id)s'
58
+
59
+.. _ha-sharedfilesystems-pacemaker:
60
+
61
+Add Shared File Systems API resource to Pacemaker
62
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
63
+
64
+#. Download the resource agent to your system:
65
+
66
+   .. code-block:: console
67
+
68
+      # cd /usr/lib/ocf/resource.d/openstack
69
+      # wget https://git.openstack.org/cgit/openstack/openstack-resource-agents/plain/ocf/manila-api
70
+      # chmod a+rx *