Edits to the full ha-guide document

Cleaning up the ha-guide for minor errors and restructure of content.
Some bugs have been filed to draw attention to the TODOs inline.

Change-Id: Id6cdff494db905826ae87be3e38d587e9829d6da
This commit is contained in:
Alexandra Settle 2016-11-28 15:46:36 +00:00 committed by Alexandra Settle
parent 79a5ba2125
commit f6451c96b1
33 changed files with 924 additions and 1078 deletions


@ -1,12 +1,9 @@
==============================================
Configuring high availability on compute nodes
==============================================
The `Newton Installation Tutorials and Guides
<http://docs.openstack.org/project-install-guide/newton/>`_
provide instructions for installing multiple compute nodes.
To make the compute nodes highly available, you must configure the
environment to include multiple instances of the API and other services.


@ -1,4 +1,3 @@
==================================================
Configuring the compute node for high availability
==================================================


@ -8,228 +8,222 @@ under very high loads while needing persistence or Layer 7 processing.
It realistically supports tens of thousands of connections with recent
hardware.
Each instance of HAProxy configures its front end to accept connections only
to the virtual IP (VIP) address. The HAProxy back end (termination
point) is a list of all the IP addresses of instances for load balancing.
This makes the instances of HAProxy act independently and fail over
transparently together with the network endpoints (VIP addresses),
and therefore share the same SLA.
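The balancing behaviour described above can be illustrated with a short
sketch. This is an illustration of the idea only, not HAProxy's actual
algorithm; the controller addresses are the example values used later in
this guide:

```python
# Illustrative sketch of source-address balancing: hash the client IP
# onto the list of back-end instances so that a given client always
# reaches the same controller while any HAProxy instance holds the VIP.
import hashlib

BACKENDS = ["10.0.0.12", "10.0.0.13", "10.0.0.14"]  # example controllers

def pick_backend(client_ip: str, backends=BACKENDS) -> str:
    """Deterministically map a client address to one back end."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

# Requests from the same source land on the same back end:
assert pick_backend("198.51.100.7") == pick_backend("198.51.100.7")
```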
.. note::
Alternatively, you can use a commercial load balancer, which is hardware
or software. We recommend a hardware load balancer as it generally has
good performance.
For detailed instructions about installing HAProxy on your nodes,
see the HAProxy `official documentation <http://www.haproxy.org/#docs>`_.
.. note::
HAProxy should not be a single point of failure.
It is advisable to have multiple HAProxy instances running,
where the number of these instances is a small odd number like 3 or 5.
You need to ensure its availability by other means,
such as Keepalived or Pacemaker.
Configuring HAProxy
~~~~~~~~~~~~~~~~~~~
#. Locate your HAProxy instance on each OpenStack controller in your
environment. The following is an example ``/etc/haproxy/haproxy.cfg``
configuration file. Configure your instance using this configuration
file; you need a copy of it on each controller node.
.. note::
To implement any changes made to this file, you must restart the
HAProxy service.
.. code-block:: none
global
chroot /var/lib/haproxy
daemon
group haproxy
maxconn 4000
pidfile /var/run/haproxy.pid
user haproxy
defaults
log global
maxconn 4000
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout check 10s
listen dashboard_cluster
bind <Virtual IP>:443
balance source
option tcpka
option httpchk
option tcplog
server controller1 10.0.0.12:443 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:443 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:443 check inter 2000 rise 2 fall 5
listen galera_cluster
bind <Virtual IP>:3306
balance source
option mysql-check
server controller1 10.0.0.12:3306 check port 9200 inter 2000 rise 2 fall 5
server controller2 10.0.0.13:3306 backup check port 9200 inter 2000 rise 2 fall 5
server controller3 10.0.0.14:3306 backup check port 9200 inter 2000 rise 2 fall 5
listen glance_api_cluster
bind <Virtual IP>:9292
balance source
option tcpka
option httpchk
option tcplog
server controller1 10.0.0.12:9292 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:9292 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:9292 check inter 2000 rise 2 fall 5
listen glance_registry_cluster
bind <Virtual IP>:9191
balance source
option tcpka
option tcplog
server controller1 10.0.0.12:9191 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:9191 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:9191 check inter 2000 rise 2 fall 5
listen keystone_admin_cluster
bind <Virtual IP>:35357
balance source
option tcpka
option httpchk
option tcplog
server controller1 10.0.0.12:35357 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:35357 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:35357 check inter 2000 rise 2 fall 5
listen keystone_public_internal_cluster
bind <Virtual IP>:5000
balance source
option tcpka
option httpchk
option tcplog
server controller1 10.0.0.12:5000 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:5000 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:5000 check inter 2000 rise 2 fall 5
listen nova_ec2_api_cluster
bind <Virtual IP>:8773
balance source
option tcpka
option tcplog
server controller1 10.0.0.12:8773 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:8773 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:8773 check inter 2000 rise 2 fall 5
listen nova_compute_api_cluster
bind <Virtual IP>:8774
balance source
option tcpka
option httpchk
option tcplog
server controller1 10.0.0.12:8774 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:8774 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:8774 check inter 2000 rise 2 fall 5
listen nova_metadata_api_cluster
bind <Virtual IP>:8775
balance source
option tcpka
option tcplog
server controller1 10.0.0.12:8775 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:8775 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:8775 check inter 2000 rise 2 fall 5
listen cinder_api_cluster
bind <Virtual IP>:8776
balance source
option tcpka
option httpchk
option tcplog
server controller1 10.0.0.12:8776 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:8776 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:8776 check inter 2000 rise 2 fall 5
listen ceilometer_api_cluster
bind <Virtual IP>:8777
balance source
option tcpka
option tcplog
server controller1 10.0.0.12:8777 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:8777 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:8777 check inter 2000 rise 2 fall 5
listen nova_vncproxy_cluster
bind <Virtual IP>:6080
balance source
option tcpka
option tcplog
server controller1 10.0.0.12:6080 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:6080 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:6080 check inter 2000 rise 2 fall 5
listen neutron_api_cluster
bind <Virtual IP>:9696
balance source
option tcpka
option httpchk
option tcplog
server controller1 10.0.0.12:9696 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:9696 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:9696 check inter 2000 rise 2 fall 5
listen swift_proxy_cluster
bind <Virtual IP>:8080
balance source
option tcplog
option tcpka
server controller1 10.0.0.12:8080 check inter 2000 rise 2 fall 5
server controller2 10.0.0.13:8080 check inter 2000 rise 2 fall 5
server controller3 10.0.0.14:8080 check inter 2000 rise 2 fall 5
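As a quick sanity check on a file like the one above, a short script can
enumerate each ``listen`` section and the port its front end binds. This is
a sketch only; the section names and ports below are copied from the example,
and the parser handles only this simple layout:

```python
# Minimal parser for the "listen <name>" / "bind <addr>:<port>" pattern
# used in the example haproxy.cfg above.
SAMPLE = """\
listen dashboard_cluster
  bind <Virtual IP>:443
listen galera_cluster
  bind <Virtual IP>:3306
listen keystone_public_internal_cluster
  bind <Virtual IP>:5000
"""

def bound_ports(cfg: str) -> dict:
    """Map each listen section to its front-end port."""
    ports, section = {}, None
    for line in cfg.splitlines():
        stripped = line.strip()
        if stripped.startswith("listen "):
            section = stripped.split()[1]
        elif section and stripped.startswith("bind "):
            ports[section] = int(stripped.rsplit(":", 1)[1])
    return ports

print(bound_ports(SAMPLE))
# → {'dashboard_cluster': 443, 'galera_cluster': 3306,
#    'keystone_public_internal_cluster': 5000}
```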
.. note::
The Galera cluster configuration directive ``backup`` indicates
that two of the three controllers are standby nodes.
This ensures that only one node services write requests
because OpenStack support for multi-node writes is not yet production-ready.
.. note::
The Telemetry API service configuration does not have the ``option httpchk``
directive as it cannot process this check properly.
[TODO: we need more commentary about the contents and format of this file]
.. TODO: explain why the Telemetry API is so special
#. Add HAProxy to the cluster and ensure the VIPs can only run on machines
where HAProxy is active:
``pcs``
.. code-block:: console
$ pcs resource create lb-haproxy systemd:haproxy --clone
$ pcs constraint order start vip then lb-haproxy-clone kind=Optional
$ pcs constraint colocation add lb-haproxy-clone with vip
``crmsh``
.. code-block:: console
$ crm cib new conf-haproxy
$ crm configure primitive haproxy lsb:haproxy op monitor interval="1s"
$ crm configure clone haproxy-clone haproxy
$ crm configure colocation vip-with-haproxy inf: vip haproxy-clone
$ crm configure order haproxy-after-vip mandatory: vip haproxy-clone


@ -2,13 +2,8 @@
Highly available Identity API
=============================
Making the OpenStack Identity service highly available
in active/passive mode involves:
- :ref:`identity-pacemaker`
- :ref:`identity-config-identity`
@ -16,17 +11,28 @@ in active / passive mode involves:
.. _identity-pacemaker:
Prerequisites
~~~~~~~~~~~~~
Before beginning, ensure you have read the
`OpenStack Identity service getting started documentation
<http://docs.openstack.org/admin-guide/common/get-started-identity.html>`_.
Add OpenStack Identity resource to Pacemaker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following sections detail how to add the OpenStack Identity
resource to Pacemaker on SUSE and Red Hat.
SUSE
-----
SUSE Enterprise Linux and SUSE-based distributions, such as openSUSE,
use a set of OCF agents for controlling OpenStack services.
#. Run the following commands to download the OpenStack Identity resource
to Pacemaker:
.. code-block:: console
@ -36,40 +42,49 @@ use a set of OCF agents for controlling OpenStack services.
# wget https://git.openstack.org/cgit/openstack/openstack-resource-agents/plain/ocf/keystone
# chmod a+rx *
#. Add the Pacemaker configuration for the OpenStack Identity resource
by running the following command to connect to the Pacemaker cluster:
.. code-block:: console
# crm configure
#. Add the following cluster resources:
.. code-block:: console
primitive p_keystone ocf:openstack:keystone \
params config="/etc/keystone/keystone.conf" os_password="secretsecret" os_username="admin" os_tenant_name="admin" os_auth_url="http://10.0.0.11:5000/v2.0/" \
op monitor interval="30s" timeout="30s"
This configuration creates ``p_keystone``,
a resource for managing the OpenStack Identity service.
#. Commit your configuration changes from the :command:`crm configure` menu
with the following command:
.. code-block:: console
# commit
The :command:`crm configure` command supports batch input. You can copy and
paste the above lines into your live Pacemaker configuration, and then make
changes as required.
For example, you may enter ``edit p_ip_keystone`` from the
:command:`crm configure` menu and edit the resource to match your preferred
virtual IP address.
Pacemaker now starts the OpenStack Identity service and its dependent
resources on all of your nodes.
Red Hat
--------
For Red Hat Enterprise Linux and Red Hat-based Linux distributions,
the following process uses Systemd unit files.
.. code-block:: console
@ -116,29 +131,24 @@ Configure OpenStack Identity service
Configure OpenStack services to use the highly available OpenStack Identity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Your OpenStack services must now point their OpenStack Identity configuration
to the highly available virtual cluster IP address.
#. For OpenStack Compute, for example, if your OpenStack Identity service
IP address is 10.0.0.11, use the following configuration in the
:file:`api-paste.ini` file:
.. code-block:: ini
auth_host = 10.0.0.11
#. Create the OpenStack Identity Endpoint with this IP address.
.. note::
If you are using both private and public IP addresses,
create two virtual IP addresses and define the endpoint. For
example:
.. code-block:: console
@ -150,12 +160,9 @@ in a non-HA environment.
$service-type internal http://10.0.0.11:5000/v2.0
#. If you are using the horizon Dashboard, edit the :file:`local_settings.py`
file to include the following:
.. code-block:: ini
OPENSTACK_HOST = 10.0.0.11


@ -1,6 +1,6 @@
=========
Memcached
=========
Memcached is a general-purpose distributed memory caching system. It
is used to speed up dynamic database-driven websites by caching data
@ -10,12 +10,12 @@ source must be read.
Memcached is a memory cache demon that can be used by most OpenStack
services to store ephemeral data, such as tokens.
Access to Memcached is not handled by HAProxy because replicated
access is currently in an experimental state. Instead, OpenStack
services must be supplied with the full list of hosts running
Memcached.
The Memcached client implements hashing to balance objects among the
instances. Failure of an instance impacts only a percentage of the
objects, and the client automatically removes it from the list of
instances. The SLA is several minutes.
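The failure behaviour described above can be sketched with rendezvous
hashing, one of the schemes a caching client may use. This is an
illustration only, not the exact algorithm of any particular Memcached
client library, and the host names are hypothetical:

```python
# Sketch: keys are hashed across Memcached instances; when one instance
# fails and is removed from the list, only the keys that lived on that
# instance are remapped, so a failure impacts only a percentage of objects.
import hashlib

def node_for(key: str, nodes: list) -> str:
    """Rendezvous (highest-random-weight) hashing."""
    return max(nodes, key=lambda n: hashlib.md5((key + "@" + n).encode()).digest())

nodes = ["controller1:11211", "controller2:11211", "controller3:11211"]
keys = ["token-%d" % i for i in range(300)]
before = {k: node_for(k, nodes) for k in keys}

nodes = [n for n in nodes if n != "controller2:11211"]  # instance fails
after = {k: node_for(k, nodes) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Only keys that were on the failed instance move:
assert all(before[k] == "controller2:11211" for k in moved)
```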


@ -2,23 +2,24 @@
Pacemaker cluster stack
=======================
`Pacemaker <http://clusterlabs.org/>`_ cluster stack is a state-of-the-art
high availability and load balancing stack for the Linux platform.
Pacemaker is used to make OpenStack infrastructure highly available.
.. note::
It is storage and application-agnostic, and in no way specific to OpenStack.
Pacemaker relies on the
`Corosync <http://corosync.github.io/corosync/>`_ messaging layer
for reliable cluster communications. Corosync implements the Totem single-ring
ordering and membership protocol. It also provides UDP and InfiniBand based
messaging, quorum, and cluster membership to Pacemaker.
Pacemaker does not inherently understand the applications it manages.
Instead, it relies on resource agents (RAs) that are scripts that encapsulate
the knowledge of how to start, stop, and check the health of each application
managed by the cluster.
These agents must conform to one of the `OCF <https://github.com/ClusterLabs/
OCF-spec/blob/master/ra/resource-agent-api.md>`_,
@ -44,57 +45,61 @@ The steps to implement the Pacemaker cluster stack are:
Install packages
~~~~~~~~~~~~~~~~
On any host that is meant to be part of a Pacemaker cluster, establish cluster
communications through the Corosync messaging layer.
This involves installing the following packages (and their dependencies, which
your package manager usually installs automatically):
- `pacemaker`
- `pcs` (CentOS or RHEL) or crmsh
- `corosync`
- `fence-agents` (CentOS or RHEL) or cluster-glue
- `resource-agents`
- `libqb0`
.. _pacemaker-corosync-setup:
Set up the cluster with pcs
~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. Make sure `pcs` is running and configured to start at boot time:
.. code-block:: console
$ systemctl enable pcsd
$ systemctl start pcsd
#. Set a password for hacluster user on each host:
.. code-block:: console
$ echo my-secret-password-no-dont-use-this-one \
| passwd --stdin hacluster
.. note::
Since the cluster is a single administrative domain, it is
acceptable to use the same password on all nodes.
#. Use that password to authenticate to the nodes that will
make up the cluster:
.. code-block:: console
$ pcs cluster auth controller1 controller2 controller3 \
-u hacluster -p my-secret-password-no-dont-use-this-one --force
.. note::
The :option:`-p` option is used to give the password on command
line and makes it easier to script.
#. Create and name the cluster, and then start it:
.. code-block:: console
@ -115,12 +120,12 @@ After installing the Corosync package, you must create
the :file:`/etc/corosync/corosync.conf` configuration file.
.. note::
For Ubuntu, you should also enable the Corosync service in the
``/etc/default/corosync`` configuration file.
Corosync can be configured to work with either multicast or unicast IP
addresses or to use the votequorum library.
- :ref:`corosync-multicast`
- :ref:`corosync-unicast`
@ -132,11 +137,10 @@ Set up Corosync with multicast
------------------------------
Most distributions ship an example configuration file
(:file:`corosync.conf.example`) as part of the documentation bundled with
the Corosync package. An example Corosync configuration file is shown below:
**Example Corosync configuration file for multicast (``corosync.conf``)**
.. code-block:: ini
@ -215,26 +219,26 @@ Note the following:
When this timeout expires, the token is declared lost,
and after ``token_retransmits_before_loss_const lost`` tokens,
the non-responding processor (cluster node) is declared dead.
``token × token_retransmits_before_loss_const``
is the maximum time a node is allowed to not respond to cluster messages
before being considered dead.
The default for token is 1000 milliseconds (1 second),
with 4 allowed retransmits.
These defaults are intended to minimize failover times,
but can cause frequent false alarms and unintended failovers
in case of short network interruptions. The values used here are safer,
albeit with slightly extended failover times.
- With ``secauth`` enabled,
Corosync nodes mutually authenticate using a 128-byte shared secret
stored in the :file:`/etc/corosync/authkey` file.
This can be generated with the :command:`corosync-keygen` utility.
Cluster communications are encrypted when using ``secauth``.
- In Corosync configurations using redundant networking
(with more than one interface), you must select a Redundant
Ring Protocol (RRP) mode other than none. We recommend ``active`` as
the RRP mode.
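The failure-detection arithmetic from the token notes above can be
checked directly. The values are the documented Corosync defaults:

```python
# token * token_retransmits_before_loss_const bounds how long a node may
# ignore cluster messages before being declared dead.
token_ms = 1000   # default token timeout: 1000 ms (1 second)
retransmits = 4   # default token_retransmits_before_loss_const

max_silence_ms = token_ms * retransmits
print(max_silence_ms)  # → 4000
```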
Note the following about the recommended interface configuration:
@ -245,61 +249,57 @@ Note the following:
The example uses two network addresses of /24 IPv4 subnets.
- Multicast groups (``mcastaddr``) must not be reused
across cluster boundaries. No two distinct clusters
should ever use the same multicast group.
Be sure to select multicast addresses compliant with
`RFC 2365, "Administratively Scoped IP Multicast"
<http://www.ietf.org/rfc/rfc2365.txt>`_.
- For firewall configurations, Corosync communicates over UDP only,
and uses ``mcastport`` (for receives) and ``mcastport - 1`` (for sends).
- The service declaration for the Pacemaker service
may be placed in the :file:`corosync.conf` file directly
or in its own separate file, :file:`/etc/corosync/service.d/pacemaker`.
.. note::
If you are using Corosync version 2 on Ubuntu 14.04,
remove or comment out lines under the service stanza.
These stanzas enable Pacemaker to start up. Another potential
problem is the boot and shutdown order of Corosync and
Pacemaker. To force Pacemaker to start after Corosync and
stop before Corosync, fix the start and kill symlinks manually:
.. code-block:: console
# update-rc.d pacemaker start 20 2 3 4 5 . stop 00 0 1 6 .
The Pacemaker service also requires an additional
configuration file ``/etc/corosync/uidgid.d/pacemaker``
to be created with the following content:
.. code-block:: ini
uidgid {
uid: hacluster
gid: haclient
}
- Once created, synchronize the :file:`corosync.conf` file
(and the :file:`authkey` file if the secauth option is enabled)
across all cluster nodes.
.. _corosync-unicast:
Set up Corosync with unicast
----------------------------
For environments that do not support multicast, Corosync should be configured
for unicast. An example fragment of the :file:`corosync.conf` file
for unicast is shown below:
**Corosync configuration file fragment for unicast (``corosync.conf``)**
.. code-block:: ini
@ -341,45 +341,38 @@ for unicastis shown below:
Note the following:
- If the ``broadcast`` parameter is set to ``yes``, the broadcast address is
used for communication. If this option is set, the ``mcastaddr`` parameter
should not be set.
- The ``transport`` directive controls the transport mechanism.
To avoid the use of multicast entirely, specify the ``udpu`` unicast
transport parameter. This requires specifying the list of members in the
``nodelist`` directive. This allows the cluster membership to be defined
before deployment. The default is ``udp``. The transport type can also be
set to ``udpu`` or ``iba``.
- Within the ``nodelist`` directive, it is possible to specify specific
information about the nodes in the cluster. The directive can contain only
the node sub-directive, which specifies every node that should be a member
of the membership, and where non-default options are needed. Every node must
have at least the ``ring0_addr`` field filled.
.. note::
For UDPU, every node that should be a member of the membership must be specified.
Possible options are:
- ``ring{X}_addr`` specifies the IP address of one of the nodes.
``{X}`` is the ring number.
- ``nodeid`` is optional when using IPv4 and required when using IPv6.
This is a 32-bit value specifying the node identifier delivered to the
cluster membership service. If this is not specified with IPv4,
the node ID is determined from the 32-bit IP address of the system to which
the system is bound with ring identifier of 0. The node identifier value of
zero is reserved and should not be used.
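Putting these directives together, a minimal unicast fragment might look like the following sketch (the node addresses and node IDs shown here are illustrative assumptions, not values from your cluster):

```ini
totem {
    version: 2
    # Use unicast UDP instead of multicast.
    transport: udpu
}

nodelist {
    node {
        ring0_addr: 10.0.0.12
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.13
        nodeid: 2
    }
    node {
        ring0_addr: 10.0.0.14
        nodeid: 3
    }
}
```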
.. _corosync-votequorum:
Set up Corosync with votequorum library
---------------------------------------
The votequorum library is part of the Corosync project. It provides an
interface to the vote-based quorum service and it must be explicitly enabled
in the Corosync configuration file. The main role of votequorum library is to
avoid split-brain situations, but it also provides a mechanism to:
- Query the quorum status
- List the nodes known to the quorum service
- Receive notifications of quorum state changes
- Change the number of expected votes for a cluster to be quorate
- Connect an additional quorum device to allow small clusters to remain
quorate during node outages
The votequorum library has been created to replace and eliminate ``qdisk``, the
disk-based quorum daemon for CMAN, from advanced cluster configurations.
A sample votequorum service configuration in the :file:`corosync.conf` file is:
.. code-block:: ini
Note the following:
- Specifying ``corosync_votequorum`` enables the votequorum library.
This is the only required option.
- The cluster is fully operational with ``expected_votes`` set to 7 nodes
(each node has 1 vote), quorum: 4. If a list of nodes is specified as
``nodelist``, the ``expected_votes`` value is ignored.
- When you start up a cluster (all nodes down) and set ``wait_for_all`` to 1,
the cluster quorum is held until all nodes are online and have joined the
cluster for the first time. This parameter is new in Corosync 2.0.
- Setting ``last_man_standing`` to 1 enables the Last Man Standing (LMS)
feature. By default, it is disabled (set to 0).
If a cluster is on the quorum edge (``expected_votes:`` set to 7;
``online nodes:`` set to 4) for longer than the time specified
for the ``last_man_standing_window`` parameter, the cluster can recalculate
quorum and continue operating even if the next node will be lost.
This logic is repeated until the number of online nodes in the cluster
reaches 2. In order to allow the cluster to step down from 2 members to only
1, the ``auto_tie_breaker`` parameter needs to be set.
We do not recommend this for production environments.
- ``last_man_standing_window`` specifies the time, in milliseconds,
required to recalculate quorum after one or more hosts
have been lost from the cluster. To perform a new quorum recalculation,
the cluster must have quorum for at least the interval
specified for ``last_man_standing_window``. The default is 10000ms.
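Taken together, the parameters above might be combined as in this sketch (the values mirror the examples given in the notes and are illustrative):

```ini
quorum {
    provider: corosync_votequorum
    # 7 nodes at 1 vote each; quorum is 4.
    expected_votes: 7
    # Hold quorum until all nodes have joined at least once.
    wait_for_all: 1
    # Allow quorum recalculation as nodes are lost, down to 2 members.
    last_man_standing: 1
    last_man_standing_window: 10000
}
```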
.. _pacemaker-corosync-start:
Start Corosync
--------------
Corosync is started as a regular system service. Depending on your
distribution, it may ship with an LSB init script, an upstart job, or
a systemd unit file. Either way, the service is usually named ``corosync``:
- Start ``corosync`` with the LSB init script:
.. code-block:: console
# /etc/init.d/corosync start
Alternatively:
.. code-block:: console
# service corosync start
- Start ``corosync`` with upstart:
.. code-block:: console
# start corosync
- Start ``corosync`` with the systemd unit file:
.. code-block:: console
Use the :command:`corosync-cfgtool` utility with the ``-s`` option
to get a summary of the health of the communication rings:
id = 10.0.42.100
status = ring 1 active with no faults
Use the :command:`corosync-objctl` utility to dump the Corosync cluster
member list:
.. code-block:: console
runtime.totem.pg.mrp.srp.983895584.join_count=1
runtime.totem.pg.mrp.srp.983895584.status=joined
You should see a ``status=joined`` entry for each of your constituent
cluster nodes.
.. note::
Start Pacemaker
---------------
After the ``corosync`` service has been started and you have verified that the
cluster is communicating properly, you can start :command:`pacemakerd`, the
Pacemaker master control process. Choose one from the following four ways to
start it:
#. Start ``pacemaker`` with the LSB init script:
.. code-block:: console
# /etc/init.d/pacemaker start
Alternatively:
.. code-block:: console
# service pacemaker start
#. Start ``pacemaker`` with upstart:
.. code-block:: console
# start pacemaker
#. Start ``pacemaker`` with the systemd unit file:
.. code-block:: console
# systemctl start pacemaker
After the ``pacemaker`` service has started, Pacemaker creates a default empty
cluster configuration with no resources. Use the :command:`crm_mon` utility to
observe the status of ``pacemaker``:
.. code-block:: console
Set basic cluster properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After you set up your Pacemaker cluster, set a few basic cluster properties:
- ``crmsh``

.. code-block:: console
$ crm configure property pe-warn-series-max="1000" \
pe-input-series-max="1000" \
pe-error-series-max="1000" \
cluster-recheck-interval="5min"
- ``pcs``

.. code-block:: console
$ pcs property set pe-warn-series-max=1000 \
pe-input-series-max=1000 \
pe-error-series-max=1000 \
cluster-recheck-interval=5min
Note the following:
- Setting the ``pe-warn-series-max``, ``pe-input-series-max``,
and ``pe-error-series-max`` parameters to 1000
instructs Pacemaker to keep a longer history of the inputs processed
and errors and warnings generated by its Policy Engine.
The ``cluster-recheck-interval`` parameter defaults to 15 minutes.
It is usually prudent to reduce this to a shorter interval,
such as 5 or 3 minutes.
After you make these changes, commit the updated configuration.
Highly available Telemetry API
==============================
The `Telemetry service
<http://docs.openstack.org/admin-guide/common/get-started-telemetry.html>`_
provides a data collection service and an alarming service.
Telemetry central agent
~~~~~~~~~~~~~~~~~~~~~~~
The Telemetry central agent can be configured to partition its polling
workload between multiple agents. This enables high availability (HA).
Both the central and the compute agent can run in an HA deployment.
This means that multiple instances of these services can run in
parallel with workload partitioning among these running instances.
The `Tooz <https://pypi.python.org/pypi/tooz>`_ library provides
the coordination within the groups of service instances.
It provides an API above several back ends that can be used for building
distributed applications.
Tooz supports
`various drivers <http://docs.openstack.org/developer/tooz/drivers.html>`_
including the following back end solutions:
* `Zookeeper <http://zookeeper.apache.org/>`_:
Recommended solution by the Tooz project.
* `Redis <http://redis.io/>`_:
Recommended solution by the Tooz project.
* `Memcached <http://memcached.org/>`_:
Recommended for testing.
You must configure a supported Tooz driver for the HA deployment of
the Telemetry services.
For information about the required configuration options
to set in the :file:`ceilometer.conf`, see the `coordination section
<http://docs.openstack.org/newton/config-reference/telemetry.html>`_
in the OpenStack Configuration Reference.
.. note::
Only one instance for the central and compute agent service(s) is able
to run and function correctly if the ``backend_url`` option is not set.
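For example, a Redis-backed coordination setup might look like the following (the host name and port shown are assumptions for illustration):

```ini
[coordination]
backend_url = redis://controller:6379
```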
The availability check of the instances is provided by heartbeat messages.
When the connection with an instance is lost, the workload will be
reassigned within the remaining instances in the next polling cycle.
.. note::
Memcached uses a timeout value, which should always be set to
a value that is higher than the heartbeat value set for Telemetry.
For backward compatibility and supporting existing deployments, the central
agent configuration supports using different configuration files. This is for
groups of service instances that are running in parallel.
To enable this configuration, set a value for the
``partitioning_group_prefix`` option in the
`polling section <http://docs.openstack.org/newton/config-reference/telemetry/telemetry-config-options.html>`_
in the OpenStack Configuration Reference.
.. warning::
For each sub-group of the central agent pool with the same
``partitioning_group_prefix``, a disjoint subset of meters must be polled
to avoid samples being missing or duplicated. The list of meters to poll
can be set in the :file:`/etc/ceilometer/pipeline.yaml` configuration file.
For more information about pipelines see the `Data collection and
processing
<http://docs.openstack.org/admin-guide/telemetry-data-collection.html#data-collection-and-processing>`_
section.
To enable the compute agent to run multiple instances simultaneously with
workload partitioning, the ``workload_partitioning`` option must be set to
``True`` under the `compute section <http://docs.openstack.org/newton/config-reference/telemetry.html>`_
in the :file:`ceilometer.conf` configuration file.
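As a sketch, the resulting fragment of :file:`ceilometer.conf` is:

```ini
[compute]
workload_partitioning = True
```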
=================
Configure the VIP
=================
You must select and assign a virtual IP address (VIP) that can freely float
between cluster nodes.
This configuration creates ``vip``, a virtual IP address for use by the
API node (``10.0.0.11``).
For ``crmsh``:
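A typical ``crmsh`` resource definition for such a VIP looks like the following sketch (the netmask and monitor interval are assumptions; adjust them for your network):

```console
$ crm configure primitive vip ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.11" cidr_netmask="24" \
    op monitor interval="30s"
```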
Configuring the controller for high availability
================================================
The cloud controller runs on the management network and must talk to
all other services.
.. toctree::
:maxdepth: 2
Hardware considerations for high availability
=============================================
.. TODO: Provide a minimal architecture example for HA, expanded on that
given in the *Environment* section of
http://docs.openstack.org/project-install-guide/newton (depending
on the distribution) for easy comparison.
When you use high availability, consider the hardware requirements needed
for your application.
Hardware setup
~~~~~~~~~~~~~~
The following are the standard hardware requirements:
- Provider networks: See the *Overview -> Networking Option 1: Provider
networks* section of the
`Install Tutorials and Guides <http://docs.openstack.org/project-install-guide/newton>`_
depending on your distribution.
- Self-service networks: See the *Overview -> Networking Option 2:
Self-service networks* section of the
`Install Tutorials and Guides <http://docs.openstack.org/project-install-guide/newton>`_
depending on your distribution.
OpenStack does not require a significant amount of resources and the following
minimum requirements should support a proof-of-concept high availability
environment with core services and several instances:
+-------------------+------------------+----------+-----------+------+
| Node type | Processor Cores | Memory | Storage | NIC |
nodes is 2 milliseconds. Although the cluster software can be tuned to
operate at higher latencies, some vendors insist on this value before
agreeing to support the installation.
You can use the :command:`ping` command to find the latency between two servers.
Virtualized hardware
~~~~~~~~~~~~~~~~~~~~
For demonstrations and studying, you can set up a test environment on virtual
machines (VMs). This has the following benefits:
- One physical server can support multiple nodes,
each of which supports almost any number of network interfaces.
- You can take periodic snapshots throughout the installation process
and roll back to a working configuration in the event of a problem.
However, running an OpenStack environment on VMs degrades the performance of
your instances, particularly if your hypervisor or processor lacks
support for hardware acceleration of nested VMs.
.. note::
====================
Installing Memcached
====================
[TODO: Verify that Oslo supports hash synchronization;
if so, this should not take more than load balancing.]
[TODO: This hands off to two different docs for install information.
We should choose one or explain the specific purpose of each.]
Most OpenStack services can use Memcached to store ephemeral data such as
tokens. Although Memcached does not support typical forms of redundancy such
as clustering, OpenStack services can use almost any number of instances
by configuring multiple hostnames or IP addresses.
The Memcached client implements hashing to balance objects among the instances.
Failure of an instance only impacts a percentage of the objects,
and the client automatically removes it from the list of instances.
To install and configure Memcached, read the
`official documentation <https://github.com/memcached/memcached/wiki#getting-started>`_.
Memory caching is managed by `oslo.cache
<http://specs.openstack.org/openstack/oslo-specs/specs/kilo/oslo-cache-using-dogpile.html>`_.
This ensures consistency across all projects when using multiple Memcached
servers. The following is an example configuration with three hosts:
.. code-block:: ini
memcached_servers = controller1:11211,controller2:11211,controller3:11211
By default, ``controller1`` handles the caching service. If the host goes down,
``controller2`` or ``controller3`` takes over the caching service.
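The hashing behavior described above can be sketched in a few lines of Python. This is an illustration of client-side hashing in general, not the actual oslo.cache or Memcached client implementation:

```python
import hashlib

SERVERS = ["controller1:11211", "controller2:11211", "controller3:11211"]

def pick_server(key, servers):
    """Deterministically map a cache key onto one of the live servers."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Every client computes the same mapping, so they agree on key placement.
home = pick_server("token-abc123", SERVERS)

# When an instance fails, the client drops it and rehashes; only keys
# that lived on the failed instance move to a surviving server.
survivors = [s for s in SERVERS if s != home]
print(pick_server("token-abc123", survivors))
```

Real clients use more elaborate schemes (such as consistent hashing) to limit how many keys move on failure, but the failover principle is the same.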
For more information about Memcached installation, see the
*Environment -> Memcached* section in the
`Installation Tutorials and Guides <http://docs.openstack.org/project-install-guide/newton>`_
depending on your distribution.
===============================
Installing the operating system
===============================
The first step in setting up your highly available OpenStack cluster
is to install the operating system on each node.
HA community
============
Weekly IRC meetings
~~~~~~~~~~~~~~~~~~~
The OpenStack HA community holds `weekly IRC meetings
<https://wiki.openstack.org/wiki/Meetings/HATeamMeeting>`_ to discuss
encouraged to attend. The `logs of all previous meetings
<http://eavesdrop.openstack.org/meetings/ha/>`_ are available to read.
Contacting the community
~~~~~~~~~~~~~~~~~~~~~~~~
You can contact the HA community directly in `the #openstack-ha
channel on Freenode IRC <https://wiki.openstack.org/wiki/IRC>`_, or by
=================================
OpenStack High Availability Guide
=================================
Abstract
~~~~~~~~
This guide describes how to install and configure OpenStack for high
availability. It supplements the Installation Tutorials and Guides
and assumes that you are familiar with the material in those guides.
This guide documents OpenStack Newton, Mitaka, and Liberty releases.
.. warning::
This guide is a work-in-progress and changing rapidly
while we continue to test and enhance the guidance. There are
open `TODO` items throughout and available on the OpenStack manuals
`bug list <https://bugs.launchpad.net/openstack-manuals/>`_.
Please help where you are able.
Contents
~~~~~~~~
Configure high availability of instances
========================================
As of September 2016, the OpenStack High Availability community is
designing and developing an official and unified way to provide high
availability for instances. We are developing automatic
recovery from failures of hardware or hypervisor-related software on
the compute node, or other failures that could prevent instances from
functioning correctly, such as issues with a cinder volume I/O path.
More details are available in the `user story
<http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html>`_
co-authored by OpenStack's HA community and `Product Working Group
<https://wiki.openstack.org/wiki/ProductTeam>`_ (PWG), where this feature is
identified as missing functionality in OpenStack, which
should be addressed with high priority.
Existing solutions
~~~~~~~~~~~~~~~~~~
The architectural challenges of instance HA and several currently
existing solutions were presented in `a talk at the Austin summit
<https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation>`_,
for which `slides are also available <http://aspiers.github.io/openstack-summit-2016-austin-compute-ha/>`_.
The code for three of these solutions can be found online at the following
links:
* `a mistral-based auto-recovery workflow
<https://github.com/gryf/mistral-evacuate>`_, by Intel
as used by Red Hat and SUSE
Current upstream work
~~~~~~~~~~~~~~~~~~~~~
Work is in progress on a unified approach, which combines the best
aspects of existing upstream solutions. More details are available on
The Pacemaker architecture
==========================
What is a cluster manager?
~~~~~~~~~~~~~~~~~~~~~~~~~~
At its core, a cluster is a distributed finite state machine capable
of co-ordinating the startup and recovery of inter-related services
across a set of machines.
Even a distributed or replicated application that is able to survive failures
on one or more machines can benefit from a cluster manager because a cluster
manager has the following capabilities:
#. Awareness of other applications in the stack
While SYS-V init replacements like systemd can provide
deterministic recovery of a complex stack of services, the
recovery is limited to one machine and lacks the context of what
is happening on other machines. This context is crucial to
determine the difference between a local failure, and clean startup
and recovery after a total site failure.
#. Awareness of instances on other machines
Services like RabbitMQ and Galera have complicated boot-up
sequences that require co-ordination, and often serialization, of
startup operations across all machines in the cluster. This is
especially true after a site-wide failure or shutdown where you must
first determine the last machine to be active.
#. A shared implementation and calculation of `quorum
<http://en.wikipedia.org/wiki/Quorum_(Distributed_Systems)>`_
It is very important that all members of the system share the same
view of who their peers are and whether or not they are in the
majority. Failure to do this leads very quickly to an internal
`split-brain <http://en.wikipedia.org/wiki/Split-brain_(computing)>`_
state. This is where different parts of the system are pulling in
different and incompatible directions.
#. Data integrity through fencing (a non-responsive process does not
A single application does not have sufficient context to know the
difference between failure of a machine and failure of the
application on a machine. The usual practice is to assume the
machine is dead and continue working, however this is highly risky. A
rogue process or machine could still be responding to requests and
generally causing havoc. The safer approach is to make use of
remotely accessible power switches and/or network switches and SAN
required volume of requests. A cluster can automatically recover
failed instances to prevent additional load induced failures.
For these reasons, we highly recommend the use of a cluster manager like
`Pacemaker <http://clusterlabs.org>`_.
Deployment flavors
~~~~~~~~~~~~~~~~~~
It is possible to deploy three different flavors of the Pacemaker
architecture. The two extremes are ``Collapsed`` (where every
component runs on every node) and ``Segregated`` (where every
component runs in its own 3+ node cluster).
Regardless of which flavor you choose, we recommend that
clusters contain at least three nodes so that you can take advantage of
`quorum <quorum_>`_.
Quorum becomes important when a failure causes the cluster to split in
two or more partitions. In this situation, you want the majority members of
the system to ensure the minority are truly dead (through fencing) and continue
to host resources. For a two-node cluster, no side has the majority and
you can end up in a situation where both sides fence each other, or
both sides are running the same services. This can lead to data corruption.
Clusters with an even number of hosts suffer from similar issues. A
single network failure could easily cause a N:N split where neither
side retains a majority. For this reason, we recommend an odd number
of cluster members when scaling up.
You can have up to 16 cluster members (this is currently limited by
the ability of corosync to scale higher). In extreme cases, 32 and
even up to 64 nodes could be possible. However, this is not well tested.
Collapsed
---------
In a collapsed configuration, there is a single cluster of 3 or more
nodes on which every component is running.
This scenario has the advantage of requiring far fewer, if more
powerful, machines. Additionally, being part of a single cluster
allows you to accurately model the ordering dependencies between
components.
This scenario can be visualized as below.
It is also possible to follow a segregated approach for one or more
components that are expected to be a bottleneck and use a collapsed
approach for the remainder.
Proxy server
~~~~~~~~~~~~
Almost all services in this stack benefit from being proxied.
Using a proxy server provides the following capabilities:
#. Load distribution
#. API isolation
By sending all API access through the proxy, you can clearly
identify service interdependencies. You can also move them to
locations other than ``localhost`` to increase capacity if the
need arises.
The proxy can be configured as a secondary mechanism for detecting
service failures. It can even be configured to look for nodes in
a degraded state (such as being too far behind in the
replication) and take them out of circulation.
The following components are currently unable to benefit from the use
of a proxy server:
* Memcached
* MongoDB
We recommend HAProxy as the load balancer, however, there are many alternative
load balancing solutions in the marketplace.
Generally, we use round-robin to distribute load amongst instances of
active/active services. Alternatively, Galera uses the ``stick-table`` options
to ensure that incoming connections to the virtual IP (VIP) are directed to
only one of the available back ends. This helps avoid lock contention and
prevents deadlocks, although Galera can run active/active. Used in combination
with the ``httpchk`` option, this ensures only nodes that are in sync with
their peers are allowed to handle requests.
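As an illustration only, an HAProxy listener following this pattern might look
like the following sketch (the VIP, server names, addresses, and check port
are assumed values, not taken from this guide):

.. code-block:: none

listen galera_cluster
bind 192.168.1.100:3306
balance roundrobin
option httpchk
stick-table type ip size 1000
stick on dst
server galera1 192.168.1.1:3306 check port 9200
server galera2 192.168.1.2:3306 check port 9200
server galera3 192.168.1.3:3306 check port 9200

With ``stick on dst``, all connections arriving at the single VIP stick to the
same back end until a health check marks it down.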

High availability concepts
==========================
High availability systems seek to minimize the following issues:
#. System downtime: Occurs when a user-facing service is unavailable
beyond a specified maximum amount of time.
#. Data loss: Accidental deletion or destruction of data.
Most high availability systems guarantee protection against system downtime
and data loss only in the event of a single failure.
However, they are also expected to protect against cascading failures,
where a single failure deteriorates into a series of consequential failures.
Many service providers guarantee a :term:`Service Level Agreement (SLA)`
including uptime percentage of computing service, which is calculated based
on the available time and system downtime excluding planned outage time.
This document discusses some common methods of implementing highly
available systems, with an emphasis on the core OpenStack services and
other open source services that are closely aligned with OpenStack.
You will need to address high availability concerns for any applications
software that you run on your OpenStack environment. The important thing is
to make sure that your services are redundant and available.
How you achieve that is up to you.
Preventing single points of failure can depend on whether or not a
service is stateless.
Stateless versus stateful services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following are the definitions of stateless and stateful services:
Stateless service
A service that provides a response after your request
and then requires no further attention. To make a stateless service
highly available,
you need to provide redundant instances and load balance them.
OpenStack services that are stateless include ``nova-api``,
``nova-conductor``, ``glance-api``, ``keystone-api``,
``neutron-api``, and ``nova-scheduler``.
Stateful service
A service where subsequent requests to the service
depend on the results of the first request.
Stateful services are more difficult to manage because a single
action typically involves more than one request. Providing
additional instances and load balancing does not solve the problem.
For example, if the horizon user interface reset itself every time
you went to a new page, it would not be very useful.
Making stateful services highly available can depend on whether you choose
an active/passive or active/active configuration.
Active/passive versus active/active
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Stateful services can be configured as active/passive or active/active,
which are defined as follows:
:term:`active/passive configuration`
Maintains a redundant instance
When one node fails and failover transfers control to other nodes,
the system must ensure that data and processes remain sane.
To determine this, the contents of the remaining nodes are compared
and, if there are discrepancies, a majority rules algorithm is implemented.
For this reason, each cluster in a high availability environment should
have an odd number of nodes and the quorum is defined as more than a half
If multiple nodes fail so that the cluster size falls below the quorum
value, the cluster itself fails.
For example, in a seven-node cluster, the quorum should be set to
``floor(7/2) + 1 == 4``. If quorum is four and four nodes fail simultaneously,
the cluster itself would fail, whereas it would continue to function, if
no more than three nodes fail. If split to partitions of three and four nodes
respectively, the quorum of four nodes would continue to operate the majority
.. note::
We do not recommend setting the quorum to a value less than ``floor(n/2) + 1``
as it would likely cause a split-brain in the face of network partitions.
When four nodes fail simultaneously, the cluster would continue to function as
well. But if split to partitions of three and four nodes respectively, a
quorum of three would cause both sides to attempt to fence the other and
host resources. Without fencing enabled, it would go straight to running
two copies of each resource.
This is why setting the quorum to a value less than ``floor(n/2) + 1`` is
dangerous. However, it may be required for some specific cases, such as a
temporary measure when it is known with 100% certainty that the other
nodes are down.
When configuring an OpenStack environment for study or demonstration purposes,
it is possible to turn off the quorum checking. Production systems should
always run with quorum enabled.
Single-controller high availability mode
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is possible to add controllers to such an environment
to convert it into a truly highly available environment.
High availability is not for every user. It presents some challenges.
High availability may be too complex for databases or
systems with large amounts of data. Replication can slow large systems
down. Different setups have different prerequisites. Read the guidelines
for each setup.
.. important::
High availability is turned off as the default in OpenStack setups.

========================================
Overview of highly available controllers
========================================
OpenStack is a set of multiple services exposed to the end users
as HTTP(s) APIs. Additionally, for your own internal usage, OpenStack
requires an SQL database server and AMQP broker. The physical servers,
where all the components are running, are called controllers.
This modular OpenStack architecture allows you to duplicate all the
components and run them on different controllers.
By making all the components redundant it is possible to make
OpenStack highly available.
In general, we can divide all the OpenStack components into three categories:
- OpenStack APIs: These are HTTP(s) stateless services written in Python,
easy to duplicate and mostly easy to load balance.
- SQL relational database server provides stateful type consumed by other
Common deployment architectures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We recommend two primary architectures for making OpenStack highly available.
The architectures differ in the sets of services managed by the
cluster.
Both use a cluster manager, such as Pacemaker or Veritas, to
orchestrate the actions of the various services across a set of
machines. Because we are focused on FOSS, we refer to these as
Pacemaker architectures.
Traditionally, Pacemaker has been positioned as an all-encompassing
solution. However, as OpenStack services have matured, they are
increasingly able to run in an active/active configuration and
With this in mind, some vendors are restricting Pacemaker's use to
services that must operate in an active/passive mode (such as
``cinder-volume``), those with multiple states (for example, Galera), and
those with complex bootstrapping procedures (such as RabbitMQ).
The majority of services, needing no real orchestration, are handled

======================================
High availability for other components
======================================

===========================================
Introduction to OpenStack high availability
===========================================
.. toctree::
:maxdepth: 2

Run Networking DHCP agent
=========================
The OpenStack Networking (neutron) service has a scheduler that lets you run
multiple agents across nodes. The DHCP agent can be natively highly available.
To configure the number of DHCP agents per network, modify the
``dhcp_agents_per_network`` parameter in the :file:`/etc/neutron/neutron.conf`
file. By default this is set to 1. To achieve high availability, assign more
than one DHCP agent per network.
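For example, to have multiple DHCP agents serve each network, you might set
the option in the ``[DEFAULT]`` section (the value ``2`` is an illustrative
assumption, not a recommendation from this guide):

.. code-block:: ini

[DEFAULT]
dhcp_agents_per_network = 2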

Run Networking L3 agent
=======================
The Networking (neutron) service L3 agent is scalable, due to the scheduler
that supports Virtual Router Redundancy Protocol (VRRP) to distribute virtual
routers across multiple nodes.
To enable high availability for configured routers, edit the
:file:`/etc/neutron/neutron.conf` file to set the following values:
.. list-table:: /etc/neutron/neutron.conf parameters for high availability
:widths: 15 10 30
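As a sketch, the relevant :file:`/etc/neutron/neutron.conf` settings might
look like the following (the values are illustrative assumptions, not
recommendations from this guide):

.. code-block:: ini

[DEFAULT]
l3_ha = True
max_l3_agents_per_network = 3
min_l3_agents_per_network = 2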

Run Networking LBaaS agent
==========================
Currently, no native feature is provided to make the LBaaS agent highly
available using the default plug-in HAProxy. A common way to make HAProxy
highly available is to use the VRRP (Virtual Router Redundancy Protocol).
Unfortunately, this is not yet implemented in the LBaaS HAProxy plug-in.
[TODO: update this section.]

Run Networking metadata agent
=============================
Currently, no native feature is available to make this service highly
available. At this time, the active/passive solution exists to run the
neutron metadata agent in failover mode with Pacemaker.
[TODO: Update this information.
Can this service now be made HA in active/active mode

Networking services for high availability
=========================================
Configure networking on each node. See the basic information
about configuring networking in the *Networking service*
section of the
`Install Tutorials and Guides <http://docs.openstack.org/project-install-guide/newton>`_,
depending on your distribution.
Notes from planning outline:

Certain services running on the underlying operating system of your
OpenStack database may block Galera Cluster from normal operation
or prevent ``mysqld`` from achieving network connectivity with the cluster.
Firewall
---------
Galera Cluster requires that you open the following ports to network traffic:
- On ``3306``, Galera Cluster uses TCP for database client connections
and State Snapshot Transfer methods that require the client
(that is, ``mysqldump``).
- On ``4567``, Galera Cluster uses TCP for replication traffic. Multicast
replication uses both TCP and UDP on this port.
- On ``4568``, Galera Cluster uses TCP for Incremental State Transfers.
- On ``4444``, Galera Cluster uses TCP for all other State Snapshot Transfer
methods.
.. seealso::
For more information on firewalls, see `Firewalls and default ports
<http://docs.openstack.org/newton/config-reference/firewalls-default-ports.html>`_
in the Configuration Reference.
This can be achieved using the :command:`iptables` command:
.. code-block:: console
# iptables --append INPUT --in-interface eth0 \
--protocol tcp --match tcp --dport ${PORT} \
--source ${NODE-IP-ADDRESS} --jump ACCEPT
Make sure to save the changes once you are done. This will vary
depending on your distribution:
- For `Ubuntu <http://askubuntu.com/questions/66890/how-can-i-make-a-specific-set-of-iptables-rules-permanent#66905>`_
- For `Fedora <https://fedoraproject.org/wiki/How_to_edit_iptables_rules>`_
Alternatively, make modifications using the ``firewall-cmd`` utility for
FirewallD that is available on many Linux distributions. For example, to open
the required ports (a sketch; adjust zones and interfaces to your deployment):

.. code-block:: console

# firewall-cmd --add-port=3306/tcp --permanent
# firewall-cmd --add-port=4567/tcp --permanent
# firewall-cmd --add-port=4568/tcp --permanent
# firewall-cmd --add-port=4444/tcp --permanent
# firewall-cmd --reload
SELinux
-------
Security-Enhanced Linux is a kernel module for improving security on Linux
operating systems. It is commonly enabled and configured by default on
Red Hat-based distributions. In the context of Galera Cluster, systems with
SELinux may block the database service, keep it from starting, or prevent it
from establishing network connections with the cluster.
To configure SELinux to permit Galera Cluster to operate, you may need
to use the ``semanage`` utility to open the ports it uses. For
example:
.. code-block:: console

# semanage port -a -t mysqld_port_t -p tcp 3306
# semanage port -a -t mysqld_port_t -p tcp 4567
# semanage port -a -t mysqld_port_t -p tcp 4568
# semanage port -a -t mysqld_port_t -p tcp 4444
Alternatively, you can configure SELinux to be more
relaxed about database access and actions:
# semanage permissive -a mysqld_t
.. note::
Bear in mind, leaving SELinux in permissive mode is not a good
security practice. Over the longer term, you need to develop a
security policy for Galera Cluster and then switch SELinux back
into enforcing mode.
For more information on configuring SELinux to work with
Galera Cluster, see the `SELinux Documentation
<http://galeracluster.com/documentation-webpages/selinux.html>`_.
AppArmor
---------
# service apparmor restart
For servers that use ``systemd``, run the following command:
.. code-block:: console
# systemctl restart apparmor
AppArmor now permits Galera Cluster to operate.
Database configuration
~~~~~~~~~~~~~~~~~~~~~~~
wsrep_sst_method=rsync
Configuring mysqld
-------------------
While all of the configuration parameters available to the standard MySQL,
MariaDB, or Percona XtraDB database servers are available in Galera Cluster,
there are some that you must define at the outset to avoid conflict or
unexpected behavior.
- Ensure that the database server is not bound only to the localhost:
``127.0.0.1``. Also, do not bind it to ``0.0.0.0``. Binding to the localhost
or ``0.0.0.0`` makes MySQL bind to all IP addresses on the machine,
including the virtual IP address, causing ``HAProxy`` not to start. Instead,
bind to the management IP address of the controller node to enable access by
other nodes through the management network:
.. code-block:: ini

# Replace with the management IP address of this controller node
bind-address = 10.0.0.12
default_storage_engine=InnoDB
- Ensure that the InnoDB locking mode for generating auto-increment values
is set to ``2``, which is the interleaved locking mode:
.. code-block:: ini

innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
Setting this parameter to ``1`` or ``2`` can improve
performance, but it introduces certain dangers. Operating system failures can
erase the last second of transactions. While you can recover this data
from another node, if the cluster goes down at the same time
(in the event of a data center power outage), you lose this data permanently.
Configuring wsrep replication
------------------------------
Galera Cluster configuration parameters all have the ``wsrep_`` prefix.
You must define the following parameters for each cluster node in your
OpenStack database.
- **wsrep Provider**: The Galera Replication Plugin serves as the ``wsrep``
provider for Galera Cluster. It is installed on your system as the
``libgalera_smm.so`` file. Define the path to this file in
your ``my.cnf``:
.. code-block:: ini
wsrep_provider="/usr/lib/libgalera_smm.so"
- **Cluster Name**: Define an arbitrary name for your cluster.
.. code-block:: ini
wsrep_cluster_name="my_example_cluster"
You must use the same name on every cluster node. The connection fails
when this value does not match.
- **Cluster Address**: List the IP addresses for each cluster node.
.. code-block:: ini
wsrep_cluster_address="gcomm://192.168.1.1,192.168.1.2,192.168.1.3"
Replace the IP addresses given here with a comma-separated list of each
OpenStack database server in your cluster.
- **Node Name**: Define the logical name of the cluster node.
.. code-block:: ini
wsrep_node_name="Galera1"
- **Node Address**: Define the IP address of the cluster node.
.. code-block:: ini
wsrep_node_address="192.168.1.1"
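Taken together, a minimal set of ``wsrep_`` settings in ``my.cnf`` for one
node might look like the following sketch (all names and addresses are
illustrative values, not recommendations from this guide):

.. code-block:: ini

wsrep_provider="/usr/lib/libgalera_smm.so"
wsrep_cluster_name="my_example_cluster"
wsrep_cluster_address="gcomm://192.168.1.1,192.168.1.2,192.168.1.3"
wsrep_node_name="Galera1"
wsrep_node_address="192.168.1.1"

Repeat this on each node, changing only ``wsrep_node_name`` and
``wsrep_node_address``.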
Additional parameters
^^^^^^^^^^^^^^^^^^^^^^
For a complete list of the available parameters, run the
``SHOW VARIABLES LIKE 'wsrep_%'`` query from within the database client.
For example:
+------------------------------+-------+
| wsrep_sync_wait              | 0     |
+------------------------------+-------+
For documentation about these parameters, ``wsrep`` provider option, and status
variables available in Galera Cluster, see the Galera cluster `Reference
<http://galeracluster.com/documentation-webpages/reference.html>`_.

Management
==========
When you finish installing and configuring the OpenStack database,
you can initialize the Galera Cluster.
Before you attempt this, verify that you have the following ready:
Prerequisites
~~~~~~~~~~~~~
- Database hosts with Galera Cluster installed
- A minimum of three hosts
- No firewalls between the hosts
- SELinux and AppArmor set to permit access to ``mysqld``
- The correct path to ``libgalera_smm.so`` given to the
``wsrep_provider`` parameter.
``wsrep_provider`` parameter
Initializing the cluster
~~~~~~~~~~~~~~~~~~~~~~~~~
In the Galera Cluster, the Primary Component is the cluster of database
servers that replicate into each other. In the event that a
cluster node loses connectivity with the Primary Component, it
defaults into a non-operational state, to avoid creating or serving
inconsistent data.
By default, cluster nodes do not start as part of a Primary Component.
In the Primary Component, replication and state transfers bring all databases
to the same state.
To start the cluster, complete the following steps:
# service mysql start --wsrep-new-cluster
For servers that use ``systemd``, run the following command:
.. code-block:: console
# service mysql start
For servers that use ``systemd``, run the following command:
.. code-block:: console
# systemctl start mariadb
#. When you have all cluster nodes started, log into the database
client of any cluster node and check the ``wsrep_cluster_size``
status variable again:
.. code-block:: mysql
SHOW STATUS LIKE 'wsrep_cluster_size';

+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
When each cluster node starts, it checks the IP addresses given to
the ``wsrep_cluster_address`` parameter. It then attempts to establish
network connectivity with a database server running there. Once it
establishes a connection, it attempts to join the Primary
Component, requesting a state transfer as needed to bring itself
into sync with the cluster.
.. note::
In the event that you need to restart any cluster node, you can do
so. When the database server comes back up, it establishes
connectivity with the Primary Component and updates itself to any
changes it may have missed while down.
Restarting the cluster
-----------------------
Individual cluster nodes can stop and be restarted without issue.
When a database loses its connection or restarts, the Galera Cluster
brings it back into sync once it reestablishes connection with the
Primary Component. In the event that you need to restart the
entire cluster, identify the most advanced cluster node and
initialize the Primary Component on that node.
To find the most advanced cluster node, you need to check the
sequence numbers, or the ``seqnos``, on the last committed transaction for
each. You can find this by viewing the ``grastate.dat`` file in
the database directory:
.. code-block:: console

$ cat /path/to/datadir/grastate.dat

# GALERA saved state
version: 2.1
uuid:    5ee99582-bb8d-11e2-b8e3-23de375c1d30
seqno:   8204503945773
Alternatively, if the database server is running, use the
``wsrep_last_committed`` status variable:

.. code-block:: mysql

SHOW STATUS LIKE 'wsrep_last_committed';

+----------------------+--------+
| Variable_name        | Value  |
+----------------------+--------+
| wsrep_last_committed | 409745 |
+----------------------+--------+
This value increments with each transaction, so the most advanced
node has the highest sequence number and therefore is the most up to date.
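To compare nodes, you can read the ``seqno`` directly from each node's file.
The following is a minimal sketch, assuming the default ``/var/lib/mysql``
data directory:

```shell
# galera_seqno: print the seqno recorded in a grastate.dat file.
# Run this on each node and compare the results; the highest seqno
# marks the most advanced node. The default path assumes a standard
# MySQL data directory; pass another path as the first argument.
galera_seqno() {
    awk '/^seqno:/ {print $2}' "${1:-/var/lib/mysql/grastate.dat}"
}
```

Start the new Primary Component on the node reporting the highest value.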
Deployment strategies
---------------------
Galera can be configured using one of the following
strategies:
- Each instance has its own IP address:
OpenStack services are configured with the list of these IP
addresses so they can select one of the addresses from those
available.
- Galera runs behind HAProxy:
HAProxy load balances incoming requests and exposes just one IP
address for all the clients.
@ -166,32 +161,25 @@ strategies:
Galera synchronous replication guarantees a zero slave lag. The
failover procedure completes once HAProxy detects that the active
back end has gone down and switches to the backup one, which is
then marked as ``UP``. If no back ends are ``UP``, the failover
procedure finishes only when the Galera Cluster has been
successfully reassembled. The SLA is normally no more than 5
minutes.
- Use MySQL/Galera in active/passive mode to avoid deadlocks on
``SELECT ... FOR UPDATE`` type queries (used, for example, by nova
and neutron). This issue is discussed in the following:
- `IMPORTANT: MySQL Galera does *not* support SELECT ... FOR UPDATE
<http://lists.openstack.org/pipermail/openstack-dev/2014-May/035264.html>`_
- `Understanding reservations, concurrency, and locking in Nova
<http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/>`_
Of these options, the second one is highly recommended. Although Galera
supports active/active configurations, we recommend active/passive
(enforced by the load balancer) in order to avoid lock contention.
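As an illustration, active/passive behavior can be enforced in HAProxy with
the ``backup`` keyword, which keeps all traffic on a single Galera node and
fails over only when that node goes down. The following sketch assumes three
controllers and a ``clustercheck`` health check listening on port 9200; all
names and addresses are examples:

```
# haproxy.cfg sketch: active/passive front end for Galera (example values).
listen galera_cluster
    bind 10.0.0.10:3306
    balance source
    option httpchk
    # All client traffic goes to controller1; the others are hot standbys.
    server controller1 10.0.0.1:3306 check port 9200 inter 2000 rise 2 fall 5
    server controller2 10.0.0.2:3306 backup check port 9200 inter 2000 rise 2 fall 5
    server controller3 10.0.0.3:3306 backup check port 9200 inter 2000 rise 2 fall 5
```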
Configuring HAProxy
--------------------
If you use HAProxy to load-balance client access to the Galera
Cluster, as described in :doc:`controller-ha-haproxy`, you can
use the ``clustercheck`` utility to improve health checks.
#. Create a configuration file for ``clustercheck`` at
@ -205,7 +193,7 @@ use the ``clustercheck`` utility to improve health checks.
MYSQL_PORT="3306"
#. Log in to the database client and grant the ``clustercheck`` user
``PROCESS`` privileges:
.. code-block:: mysql
@ -248,12 +236,10 @@ use the ``clustercheck`` utility to improve health checks.
# service xinetd enable
# service xinetd start
For servers that use ``systemd``, run the following commands:
.. code-block:: console
# systemctl daemon-reload
# systemctl enable xinetd
# systemctl start xinetd


@ -13,19 +13,18 @@ You can achieve high availability for the OpenStack database in many
different ways, depending on the type of database that you want to use.
There are three implementations of Galera Cluster available to you:
- `Galera Cluster for MySQL <http://galeracluster.com/>`_: The MySQL
  reference implementation from Codership, Oy.
- `MariaDB Galera Cluster <https://mariadb.org/>`_: The MariaDB
  implementation of Galera Cluster, which is commonly supported in
  environments based on Red Hat distributions.
- `Percona XtraDB Cluster <http://www.percona.com/>`_: The XtraDB
  implementation of Galera Cluster from Percona.
In addition to Galera Cluster, you can also achieve high availability
through other database options, such as PostgreSQL, which has its own
replication system.
.. toctree::
:maxdepth: 2


@ -9,8 +9,7 @@ execution of jobs entered into the system.
The most popular AMQP implementation used in OpenStack installations
is RabbitMQ.
RabbitMQ nodes fail over on the application and the infrastructure layers.
The application layer is controlled by the ``oslo.messaging``
configuration options for multiple AMQP hosts. If the AMQP node fails,
@ -21,7 +20,7 @@ constitutes its SLA.
On the infrastructure layer, the SLA is the time for which RabbitMQ
cluster reassembles. Several cases are possible. The Mnesia keeper
node is the master of the corresponding Pacemaker resource for
RabbitMQ. When it fails, the result is a full AMQP cluster downtime
interval. Normally, its SLA is no more than several minutes. Failure
of another node that is a slave of the corresponding Pacemaker
resource for RabbitMQ results in no AMQP cluster downtime at all.
@ -32,43 +31,18 @@ Making the RabbitMQ service highly available involves the following steps:
- :ref:`Configure RabbitMQ for HA queues<rabbitmq-configure>`
- :ref:`Configure OpenStack services to use RabbitMQ HA queues
<rabbitmq-services>`
.. note::
Access to RabbitMQ is not normally handled by HAProxy. Instead,
consumers must be supplied with the full list of hosts running
RabbitMQ with ``rabbit_hosts`` and turn on the ``rabbit_ha_queues``
option. For more information, read the `core issue
<http://people.redhat.com/jeckersb/private/vip-failover-tcp-persist.html>`_.
For more detail, read the `history and solution
<http://john.eckersberg.com/improving-ha-failures-with-tcp-timeouts.html>`_.
.. _rabbitmq-install:
@ -93,17 +67,16 @@ you are using:
* - SLES 12
- :command:`# zypper addrepo -f obs://Cloud:OpenStack:Kilo/SLE_12 Kilo`
[Verify the fingerprint of the imported GPG key. See below.]
:command:`# zypper install rabbitmq-server`
.. note::
For SLES 12, the packages are signed by GPG key 893A90DAD85F9316.
You should verify the fingerprint of the imported GPG key before using it.
.. code-block:: none
Key ID: 893A90DAD85F9316
Key Name: Cloud:OpenStack OBS Project <Cloud:OpenStack@build.opensuse.org>
@ -111,8 +84,8 @@ you are using:
Key Created: Tue Oct 8 13:34:21 2013
Key Expires: Thu Dec 17 13:34:21 2015
For more information, see the official installation manual for the
distribution:
- `Debian and Ubuntu <http://www.rabbitmq.com/install-debian.html>`_
- `RPM based <http://www.rabbitmq.com/install-rpm.html>`_
@ -123,53 +96,45 @@ see the official installation manual for the distribution:
Configure RabbitMQ for HA queues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. [TODO: This section should begin with a brief mention
.. about what HA queues are and why they are valuable, etc]
We are building a cluster of RabbitMQ nodes to construct a RabbitMQ broker,
which is a logical grouping of several Erlang nodes.
.. [TODO: replace "currently" with specific release names]
.. [TODO: Does this list need to be updated? Perhaps we need a table
.. that shows each component and the earliest release that allows it
.. to work with HA queues.]
The following components/services can work with HA queues:
- OpenStack Compute
- OpenStack Block Storage
- OpenStack Networking
- Telemetry
Consider that, while exchanges and bindings survive the loss of individual
nodes, queues and their messages do not because a queue and its contents
are located on one node. If we lose this node, we also lose the queue.
Mirrored queues in RabbitMQ improve the availability of the service
since they are resilient to failures.
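Mirroring is enabled by defining a RabbitMQ policy. As a sketch, the
following command (run on any cluster node) mirrors every queue whose name
does not start with ``amq.``; the policy name ``ha-all`` is arbitrary:

```console
# rabbitmqctl set_policy ha-all '^(?!amq\.).*' '{"ha-mode": "all"}'
```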
Production servers should run (at least) three RabbitMQ servers. However,
for testing and demonstration purposes, it is possible to run only two
servers. In this section, we configure two nodes, called ``rabbit1`` and
``rabbit2``. To build a broker, ensure that all nodes have the same Erlang
cookie file.
.. [TODO: Should the example instead use a minimum of three nodes?]
#. Stop RabbitMQ and copy the cookie from the first node to each of the
   other node(s):
.. code-block:: console
# scp /var/lib/rabbitmq/.erlang.cookie root@NODE:/var/lib/rabbitmq/.erlang.cookie
#. On each target node, verify the correct owner,
group, and permissions of the file :file:`erlang.cookie`:
.. code-block:: console
@ -177,9 +142,7 @@ that all nodes have the same Erlang cookie file.
# chmod 400 /var/lib/rabbitmq/.erlang.cookie
#. Start the message queue service on all nodes and configure it to start
when the system boots. On Ubuntu, it is configured by default.
On CentOS, RHEL, openSUSE, and SLES:
@ -216,7 +179,7 @@ that all nodes have the same Erlang cookie file.
The default node type is a disc node. In this guide, nodes
join the cluster as RAM nodes.
#. Verify the cluster status:
.. code-block:: console
@ -225,8 +188,8 @@ that all nodes have the same Erlang cookie file.
[{nodes,[{disc,[rabbit@rabbit1]},{ram,[rabbit@NODE]}]}, \
{running_nodes,[rabbit@NODE,rabbit@rabbit1]}]
If the cluster is working, you can create usernames and passwords
for the queues.
#. To ensure that all queues except those with auto-generated names
are mirrored across all running nodes,
@ -255,53 +218,50 @@ More information is available in the RabbitMQ documentation:
Configure OpenStack services to use RabbitMQ HA queues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Configure the OpenStack components to use at least two RabbitMQ nodes.
Use these steps to configure all services using RabbitMQ:
#. RabbitMQ HA cluster ``host:port`` pairs:

.. code-block:: ini
rabbit_hosts=rabbit1:5672,rabbit2:5672,rabbit3:5672
#. How frequently to retry connecting with RabbitMQ:

.. code-block:: ini
rabbit_retry_interval=1
#. How long to back off between retries when connecting to RabbitMQ:

.. code-block:: ini
rabbit_retry_backoff=2
#. Maximum number of retries when connecting to RabbitMQ (infinite by default):

.. code-block:: ini
rabbit_max_retries=0
#. Use durable queues in RabbitMQ:
.. code-block:: ini
rabbit_durable_queues=true
#. Use HA queues in RabbitMQ (``x-ha-policy: all``):

.. code-block:: ini
rabbit_ha_queues=true
.. note::
If you change the configuration from an old set-up
that did not use HA queues, restart the service:
.. code-block:: console


@ -5,29 +5,20 @@
Storage back end
================
Most of this guide concerns the control plane of high availability:
ensuring that services continue to run even if a component fails.
Ensuring that data is not lost is the data plane component of high
availability and is discussed in this section.
An OpenStack environment includes multiple data pools for the VMs:
- Ephemeral storage is allocated for an instance and is deleted when the
  instance is deleted. The Compute service manages ephemeral storage and,
  by default, Compute stores ephemeral drives as files on local disks on the
  Compute node. As an alternative, you can use Ceph RBD as the storage back
  end for ephemeral storage.
- Persistent storage exists outside all instances. Two types of persistent
  storage are provided:
  - The Block Storage service (cinder), which can use LVM or Ceph RBD as
    the storage back end.
  - The Image service (glance), which can use the Object Storage service
    (swift) or Ceph RBD as the storage back end.
For more information about configuring storage back ends for
@ -35,45 +26,37 @@ the different storage options, see `Manage volumes
<http://docs.openstack.org/admin-guide/blockstorage-manage-volumes.html>`_
in the OpenStack Administrator Guide.
This section discusses ways to protect against data loss in your OpenStack
environment.
RAID drives
-----------
Configuring RAID on the hard drives that implement storage protects your data
against a hard drive failure. If the node itself fails, data may be lost.
In particular, all volumes stored on an LVM node can be lost.
Ceph
----
`Ceph RBD <http://ceph.com/>`_ is an innately highly available storage back
end. It creates a storage cluster with multiple nodes that communicate with
each other to replicate and redistribute data dynamically.

A Ceph RBD storage cluster provides a single shared set of storage nodes that
can handle all classes of persistent and ephemeral data (glance, cinder, and
nova) that are required for OpenStack instances.
Ceph RBD provides object replication capabilities by storing Block Storage
volumes as Ceph RBD objects. Ceph RBD ensures that each replica of an object
is stored on a different node. This means that your volumes are protected
against hard drive and node failures, or even the failure of the data center
itself.
When Ceph RBD is used for ephemeral volumes as well as block and image storage,
it supports `live migration
<http://docs.openstack.org/admin-guide/compute-live-migration-usage.html>`_
of VMs with ephemeral drives. LVM only supports live migration of
volume-backed VMs.
Remote backup facilities
------------------------


@ -2,7 +2,7 @@
Highly available Block Storage API
==================================
Cinder provides Block-Storage-as-a-Service suitable for performance
sensitive scenarios such as databases, expandable file systems, or
providing a server with access to raw block level storage.
@ -10,7 +10,7 @@ Persistent block storage can survive instance termination and can also
be moved across instances like any external storage device. Cinder
also has volume snapshots capability for backing up the volumes.
Making the Block Storage API service highly available in
active/passive mode involves:
- :ref:`ha-blockstorage-pacemaker`
@ -18,60 +18,22 @@ active/passive mode involves:
- :ref:`ha-blockstorage-services`
In theory, you can run the Block Storage service as active/active.
However, there are sufficient concerns that we recommend running
the volume component as active/passive only.
.. _ha-blockstorage-pacemaker:
Add Block Storage API resource to Pacemaker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On RHEL-based systems, create resources for cinder's systemd agents and create
constraints to enforce startup/shutdown ordering:
.. code-block:: console
@ -115,29 +77,25 @@ and add the following cluster resources:
keystone_get_token_url="http://10.0.0.11:5000/v2.0/tokens" \
op monitor interval="30s" timeout="30s"
This configuration creates ``p_cinder-api``, a resource for managing the
Block Storage API service.
The command :command:`crm configure` supports batch input. Copy and paste the
lines above into your live Pacemaker configuration, and then make changes as
required. For example, you may enter ``edit p_ip_cinder-api`` from the
:command:`crm configure` menu and edit the resource to match your preferred
virtual IP address.
Once completed, commit your configuration changes by entering :command:`commit`
from the :command:`crm configure` menu. Pacemaker then starts the Block Storage
API service and its dependent resources on one of your nodes.
.. _ha-blockstorage-configure:
Configure Block Storage API service
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Edit the ``/etc/cinder/cinder.conf`` file. For example, on a RHEL-based system:
.. code-block:: ini
:linenos:
@ -211,19 +169,17 @@ database.
.. _ha-blockstorage-services:
Configure OpenStack services to use the highly available Block Storage API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Your OpenStack services must now point their Block Storage API configuration
to the highly available, virtual cluster IP address rather than a Block Storage
API server's physical IP address as you would for a non-HA environment.
Create the Block Storage API endpoint with this IP.
If you are using both private and public IP addresses, create two virtual IPs
and define your endpoint. For example:
.. code-block:: console


@ -14,41 +14,56 @@ in active/passive mode involves:
Add Shared File Systems API resource to Pacemaker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. Download the resource agent to your system:

   .. code-block:: console

      # cd /usr/lib/ocf/resource.d/openstack
      # wget https://git.openstack.org/cgit/openstack/openstack-resource-agents/plain/ocf/manila-api
      # chmod a+rx *
#. Add the Pacemaker configuration for the Shared File Systems
   API resource. Connect to the Pacemaker cluster with the following
   command:

   .. code-block:: console

      # crm configure
.. note::
   The :command:`crm configure` supports batch input. Copy and paste
   the lines in the next step into your live Pacemaker configuration and then
   make changes as required.

   For example, you may enter ``edit p_ip_manila-api`` from the
   :command:`crm configure` menu and edit the resource to match your preferred
   virtual IP address.
#. Add the following cluster resources:
.. code-block:: ini
primitive p_manila-api ocf:openstack:manila-api \
params config="/etc/manila/manila.conf" \
os_password="secretsecret" \
os_username="admin" \
os_tenant_name="admin" \
keystone_get_token_url="http://10.0.0.11:5000/v2.0/tokens" \
op monitor interval="30s" timeout="30s"
This configuration creates ``p_manila-api``, a resource for managing the
Shared File Systems API service.
#. Commit your configuration changes by entering the following command
from the :command:`crm configure` menu:
.. code-block:: console
# commit
Pacemaker now starts the Shared File Systems API service and its
dependent resources on one of your nodes.
.. _ha-sharedfilesystems-configure:


@ -2,19 +2,21 @@
Highly available Image API
==========================
The OpenStack Image service offers a service for discovering, registering, and
retrieving virtual machine images. To make the OpenStack Image API service
highly available in active/passive mode, you must:
- :ref:`glance-api-pacemaker`
- :ref:`glance-api-configure`
- :ref:`glance-services`
Prerequisites
~~~~~~~~~~~~~

Before beginning, ensure that you are familiar with the
documentation for installing the OpenStack Image API service.
See the *Image service* section in the
`Installation Tutorials and Guides <http://docs.openstack.org/project-install-guide/newton>`_,
depending on your distribution.
.. _glance-api-pacemaker:
@ -22,44 +24,54 @@ depending on your distribution.
Add OpenStack Image API resource to Pacemaker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. Download the resource agent to your system:

   .. code-block:: console

      # cd /usr/lib/ocf/resource.d/openstack
      # wget https://git.openstack.org/cgit/openstack/openstack-resource-agents/plain/ocf/glance-api
      # chmod a+rx *
#. Add the Pacemaker configuration for the OpenStack Image API resource.
   Use the following command to connect to the Pacemaker cluster:

   .. code-block:: console

      crm configure
.. note::
   The :command:`crm configure` command supports batch input. Copy and paste
   the lines in the next step into your live Pacemaker configuration and
   then make changes as required.

   For example, you may enter ``edit p_ip_glance-api`` from the
   :command:`crm configure` menu and edit the resource to match your
   preferred virtual IP address.
#. Add the following cluster resources:
.. code-block:: console
primitive p_glance-api ocf:openstack:glance-api \
params config="/etc/glance/glance-api.conf" \
os_password="secretsecret" \
os_username="admin" os_tenant_name="admin" \
os_auth_url="http://10.0.0.11:5000/v2.0/" \
op monitor interval="30s" timeout="30s"
This configuration creates ``p_glance-api``, a resource for managing the
OpenStack Image API service.
#. Commit your configuration changes by entering the following command from
the :command:`crm configure` menu:
.. code-block:: console
commit
Pacemaker then starts the OpenStack Image API service and its dependent
resources on one of your nodes.
.. _glance-api-configure:
@ -67,7 +79,7 @@ Configure OpenStack Image service API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Edit the :file:`/etc/glance/glance-api.conf` file
to configure the OpenStack Image service:
.. code-block:: ini
@ -93,20 +105,17 @@ to configure the OpenStack image service:
.. _glance-services:
Configure OpenStack services to use the highly available OpenStack Image API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Your OpenStack services must now point their OpenStack Image API configuration
to the highly available, virtual cluster IP address instead of pointing to the
physical IP address of an OpenStack Image API server as you would in a non-HA
cluster.
For example, if your OpenStack Image API service IP address is 10.0.0.11
(as in the configuration explained here), you would use the following
configuration in your :file:`nova.conf` file:
.. code-block:: ini
@ -117,9 +126,8 @@ you would use the following configuration in your :file:`nova.conf` file:
You must also create the OpenStack Image API endpoint with this IP address.
If you are using both private and public IP addresses, create two virtual IP
addresses and define your endpoint. For example:
.. code-block:: console