Merge "Update haht docs" into stable/newton
commit 6550c5eaf5
@@ -215,7 +215,7 @@ A bare-bones congress.conf is as follows (adapt MySQL root password):
drivers = congress.datasources.neutronv2_driver.NeutronV2Driver,congress.datasources.glancev2_driver.GlanceV2Driver,congress.datasources.nova_driver.NovaDriver,congress.datasources.keystone_driver.KeystoneDriver,congress.datasources.ceilometer_driver.CeilometerDriver,congress.datasources.cinder_driver.CinderDriver,congress.datasources.swift_driver.SwiftDriver,congress.datasources.plexxi_driver.PlexxiDriver,congress.datasources.vCenter_driver.VCenterDriver,congress.datasources.murano_driver.MuranoDriver,congress.datasources.ironic_driver.IronicDriver
auth_strategy = noauth

[database]
connection = mysql://root:password@127.0.0.1/congress?charset=utf8
connection = mysql+pymysql://root:password@127.0.0.1/congress?charset=utf8

For a detailed sample, please refer to README-congress.conf.txt
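Once the ``connection`` string points at a reachable database, the schema is typically created with the ``congress-db-manage`` utility before the server is started; the config path below is illustrative:

.. code-block:: console

    $ congress-db-manage --config-file /etc/congress/congress.conf upgrade head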
@@ -88,5 +88,15 @@ are specified in the [DEFAULT] section of the configuration file.
``debug``
Whether or not the DEBUG-level of logging is enabled. Default is false.

``transport_url``
URL to the shared messaging service. It is not needed in a single-process
Congress deployment, but must be specified in a multi-process Congress
deployment.

.. code-block:: text

    [DEFAULT]
    transport_url = rabbit://<rabbit-userid>:<rabbit-password>@<rabbit-host-address>:<port>
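As a sketch of the broker-side setup this URL assumes (a plain RabbitMQ install with a dedicated ``congress`` account; the user name and password are placeholders, not anything Congress requires):

.. code-block:: console

    # create a RabbitMQ account for Congress and grant it full permissions
    $ sudo rabbitmqctl add_user congress CONGRESS_RABBIT_PASS
    $ sudo rabbitmqctl set_permissions congress ".*" ".*" ".*"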

.. include:: ha-overview.rst
.. include:: ha-deployment.rst
@@ -8,72 +8,122 @@ HA Deployment
-------------

Overview
--------
==================

This section shows how to deploy Congress with High Availability (HA). In HA,
Congress is divided into two parts. The first part is the API and Policy
Engine node, which is replicated in an active-active style. The other part is
the DataSource node, which is deployed in a warm-standby style. Please see the
:ref:`HA Overview <ha_overview>` for details.

This section shows how to deploy Congress with High Availability (HA). For an
architectural overview, please see the :ref:`HA Overview <ha_overview>`.

An HA deployment of Congress involves five main steps.

#. Deploy messaging and database infrastructure to be shared by all the
   Congress nodes.
#. Prepare the hosts to run Congress nodes.
#. Deploy N (at least 2) policy-engine nodes.
#. Deploy one datasource-drivers node.
#. Deploy a load-balancer to load-balance between the N policy-engine nodes.

The following sections describe each step in more detail.

Shared Services
==================

All the Congress nodes share a database backend. To set up a database backend
for Congress, please follow the database portion of
`separate install instructions`__.

__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install

Various solutions exist to avoid creating a single point of failure with the
database backend.

Note: If a replicated database solution is used, it must support table
locking. Galera, for example, would not work. This limitation is expected to
be removed in the Ocata release.

A shared messaging service is also required. Refer to `Shared Messaging`__ for
instructions for installing and configuring RabbitMQ.

__ http://docs.openstack.org/ha-guide/shared-messaging.html
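As an illustration of the shared-database step (the account name and password here are placeholders to adapt), creating a database that all Congress hosts can reach might look like:

.. code-block:: console

    $ mysql -u root -p
    mysql> CREATE DATABASE congress;
    mysql> GRANT ALL PRIVILEGES ON congress.* TO 'congress'@'%' IDENTIFIED BY 'CONGRESS_DBPASS';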

Hosts Preparation
==================

Congress should be installed on each host expected to run a Congress node.
Please follow the directions in `separate install instructions`__ to install
Congress on each host, skipping the local database portion.

__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install

In the configuration file, a ``transport_url`` should be specified to use the
RabbitMQ messaging service configured in step 1.

For example:

.. code-block:: text

    [DEFAULT]
    transport_url = rabbit://<rabbit-userid>:<rabbit-password>@<rabbit-host-address>:5672

The overall deployment looks like this:

+-------------------------------------+      +--------------+
| Load Balancer (eg. HAProxy)         | <----+ Push client  |
+----+-------------+-------------+----+      +--------------+
     |             |             |
PE   |        PE   |        PE   |        all+DSDs node
+---------+   +---------+   +---------+   +-----------------+
| +-----+ |   | +-----+ |   | +-----+ |   | +-----+ +-----+ |
| | API | |   | | API | |   | | API | |   | | DSD | | DSD | |
| +-----+ |   | +-----+ |   | +-----+ |   | +-----+ +-----+ |
| +-----+ |   | +-----+ |   | +-----+ |   | +-----+ +-----+ |
| | PE  | |   | | PE  | |   | | PE  | |   | | DSD | | DSD | |
| +-----+ |   | +-----+ |   | +-----+ |   | +-----+ +-----+ |
+---------+   +---------+   +---------+   +--------+--------+
     |             |             |                 |
     |             |             |                 |
     +--+----------+-------------+--------+--------+
        |                                 |
        |                                 |
+-------+----+   +------------------------+-----------------+
|  Oslo Msg  |   | DBs (policy, config, push data, exec log)|
+------------+   +------------------------------------------+

All hosts should be configured with a database connection that points to the
shared database deployed in step 1, not the local address shown in
`separate install instructions`__.

__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install

For example:

.. code-block:: text

    [database]
    connection = mysql+pymysql://root:<database-password>@<shared-database-ip-address>/congress?charset=utf8
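Putting the two settings together, a minimal per-host configuration for an HA member might look like the following sketch (credentials and addresses are placeholders):

.. code-block:: text

    [DEFAULT]
    transport_url = rabbit://<rabbit-userid>:<rabbit-password>@<rabbit-host-address>:5672

    [database]
    connection = mysql+pymysql://root:<database-password>@<shared-database-ip-address>/congress?charset=utf8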
HA for API and Policy Engine Node
---------------------------------

Policy Engine Nodes
=====================

New configuration settings for the DSE node type:

- N (>= 2; an even number is okay) PE+API nodes
- One single DSD node

In this step, we deploy N (at least 2) policy-engine nodes, each with an
associated API server. Each node can be started as follows:

.. code-block:: console

    $ python /usr/local/bin/congress-server --api --policy-engine --node-id=<api_unique_id>
    $ python /usr/local/bin/congress-server --api --policy-engine --node-id=<unique_node_id>

Each node must have a unique node-id specified as a command-line option.

For high availability, each node is usually deployed on a different host. If
multiple nodes are to be deployed on the same host, each node must have a
different port specified using the ``bind_port`` configuration option in the
Congress configuration file.
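For instance, two policy-engine nodes could share one host by giving each its own config file that differs only in ``bind_port``; the node ids, file names, and ports below are illustrative:

.. code-block:: console

    $ python /usr/local/bin/congress-server --api --policy-engine --node-id=pe-node-1 \
        --config-file /etc/congress/congress-pe1.conf    # bind_port = 1789
    $ python /usr/local/bin/congress-server --api --policy-engine --node-id=pe-node-2 \
        --config-file /etc/congress/congress-pe2.conf    # bind_port = 1790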
HA for DataSource Node
----------------------

Datasource Drivers Node
========================

In this step, we deploy a single datasource-drivers node in warm-standby style.

The datasource-drivers node can be started directly with the following command:

.. code-block:: console

    $ python /usr/local/bin/congress-server --datasources --node-id=<datasource_unique_id>
    $ python /usr/local/bin/congress-server --datasources --node-id=<unique_node_id>

A unique node-id (distinct from all the policy-engine nodes) must be specified.
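If the python-congressclient is installed and OpenStack credentials are loaded, one quick sanity check (not a required step) is to list the datasources the node is hosting:

.. code-block:: console

    $ openstack congress datasource list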
The nodes on which the DataSourceDriver runs use a warm-standby style. Congress
assumes a cluster manager handles the active-standby cluster. In this document,
we describe how to provide HA for the DataSourceDriver node with `Pacemaker`_ .

For warm-standby deployment, an external manager is used to launch and manage
the datasource-drivers node. In this document, we sketch how to deploy the
datasource-drivers node with `Pacemaker`_ .

See the `OpenStack High Availability Guide`__ for general usage of Pacemaker
and how to deploy Pacemaker cluster stack. The guide also has some HA
configuration guidance for other OpenStack projects.

__ http://docs.openstack.org/ha-guide/index.html

.. _Pacemaker: http://clusterlabs.org/
Prepare OCF resource agent
==========================
----------------------------

You need a custom Resource Agent (RA) for DataSource Node HA. The custom RA is
located in the Congress repository at
``/path/to/congress/script/ocf/congress-datasource``.

@@ -87,8 +137,8 @@ Install the RA with the following steps.

$ cp /path/to/congress/script/ocf/congress-datasource ./congress-datasource
$ chmod a+rx congress-datasource
Configure RA
============

Configuring the Resource Agent
-------------------------------

You can now add the Pacemaker configuration for the Congress DataSource Node
resource. Connect to the Pacemaker cluster with the *crm configure* command
and add the

@@ -111,4 +161,18 @@ The RA has the following configurable parameters.

* config: the path of Congress's config file
* node_id (optional): the node id of the datasource node. Default is "datasource-node".
* binary (optional): the path of the Congress binary. Default is "/usr/local/bin/congress-server".
* additional_parameters (optional): additional parameters of congress-server
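As a sketch of what the primitive definition might look like, assuming the RA was installed under an ``openstack`` provider directory (adjust the resource class path, timings, and parameter values to your installation):

.. code-block:: console

    $ crm configure primitive congress-datasource ocf:openstack:congress-datasource \
        params config="/etc/congress/congress.conf" node_id="datasource-node" \
        op monitor interval="30s" timeout="30s"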
Load-balancer
==============

A load-balancer should be used to distribute incoming API requests to the N
policy-engine (and API service) nodes deployed in step 3.
It is recommended that a sticky configuration be used to avoid exposing a user
to out-of-sync artifacts when the user hits different policy-engine nodes.

`HAProxy <http://www.haproxy.org/>`_ is a popular load-balancer for this
purpose. The HAProxy section of the `OpenStack High Availability Guide`__
has instructions for deploying HAProxy for high availability.

__ http://docs.openstack.org/ha-guide/index.html
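As an illustration only (the VIP, backend addresses, and the 1789 API port are assumptions to adapt), a sticky HAProxy frontend for the policy-engine nodes could look roughly like:

.. code-block:: text

    listen congress_api
        bind <vip-address>:1789
        balance source            # source-IP stickiness keeps a client on one PE node
        server congress-pe1 <pe1-address>:1789 check
        server congress-pe2 <pe2-address>:1789 check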
@@ -20,7 +20,7 @@ HA Types
========

Warm Standby
~~~~~~~~~~~~
-------------

Warm Standby is when a software component is installed and available on the
secondary node. The secondary node is up and running. In the case of a
failure on the primary node, the software component is started on the

@@ -29,7 +29,7 @@ Data is regularly mirrored to the secondary system using disk based replication

or shared disk. This generally provides a recovery time of a few minutes.

Active-Active (Load-Balanced)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
------------------------------

In this method, both the primary and secondary systems are active and
processing requests in parallel. Data replication happens through software
capabilities and would be bi-directional. This generally provides a recovery
@@ -72,24 +72,9 @@ oslo-messaging to all policy engines.
|  Oslo Msg  |   | DBs (policy, config, push data, exec log)|
+------------+   +------------------------------------------+

- Performance impact of HAHT deployment:

  - Downtime: < 1s for queries, ~2s for reactive enforcement
  - Throughput and latency: leverages multi-process and multi-node parallelism
  - DSD nodes are separated from the PEs, allowing high-load DSDs to operate
    more smoothly and avoid affecting PE performance.
  - PE nodes are symmetric in configuration, making it easy to load balance
    evenly.
  - No redundant data-pulling load on datasources

- Requirements for HAHT deployment:

  - Cluster manager (e.g. Pacemaker + Corosync) to manage warm standby
  - Does not require global leader election

Details
~~~~~~~
-------------

- Datasource Drivers (DSDs):

@@ -156,24 +141,30 @@ Details
caller to a particular node. This configuration avoids the experience of
going back in time.
- External components (load balancer, DBs, and oslo messaging bus) can be made
  highly available using standard solutions (e.g. clustered LB, HA rabbitMQ)

Performance Impact
==================

- In a single-node deployment, there is generally no performance impact.
- Increased latency due to network communication required by multi-node
  deployment
- Increased reactive enforcement latency if action executions are persistently
  logged to facilitate smoother failover
- PE replication can achieve greater query throughput

End User Impact
===============

Different PE instances may be out-of-sync in their data and policies (eventual
consistency). The issue is generally made transparent to the end user by
making each user sticky to a particular PE instance. But if a PE instance
goes down, the end user reaches a different instance and may experience
out-of-sync artifacts.
Cautions and Limitations
============================

- Replicated PE deployment is new in the Newton release and a major departure
  from the previous model. As a result, the deployer may be more likely to
  experience unexpected issues.
- In the Newton release, creating a new policy requires locking a database
  table. As a result, it should not be deployed with a database backend that
  does not support table locking (e.g., Galera). The limitation is expected to
  be removed in the Ocata release.
- Different PE instances may be out-of-sync in their data and policies
  (eventual consistency). The issue is generally made transparent to the end
  user by configuring the load balancer to make each user sticky to a
  particular PE instance. But if a user reaches a different PE instance (say,
  because of load balancer configuration or because the original instance went
  down), the user may experience out-of-sync artifacts.