Update haht docs

Change-Id: If4e3f95c51632b593f313986a05aa43f5f3f5169
Eric K 2016-09-29 17:40:06 -07:00
parent f4488be27a
commit bcd73c413a
4 changed files with 141 additions and 76 deletions

View File

@ -215,7 +215,7 @@ A bare-bones congress.conf is as follows (adapt MySQL root password):
drivers = congress.datasources.neutronv2_driver.NeutronV2Driver,congress.datasources.glancev2_driver.GlanceV2Driver,congress.datasources.nova_driver.NovaDriver,congress.datasources.keystone_driver.KeystoneDriver,congress.datasources.ceilometer_driver.CeilometerDriver,congress.datasources.cinder_driver.CinderDriver,congress.datasources.swift_driver.SwiftDriver,congress.datasources.plexxi_driver.PlexxiDriver,congress.datasources.vCenter_driver.VCenterDriver,congress.datasources.murano_driver.MuranoDriver,congress.datasources.ironic_driver.IronicDriver
auth_strategy = noauth
[database]
connection = mysql://root:password@127.0.0.1/congress?charset=utf8
connection = mysql+pymysql://root:password@127.0.0.1/congress?charset=utf8
For a detailed sample, please follow README-congress.conf.txt

View File

@ -88,5 +88,15 @@ are specified in the [DEFAULT] section of the configuration file.
``debug``
Whether or not DEBUG-level logging is enabled. Default is false.
``transport_url``
URL to the shared messaging service. It is not needed in a single-process
Congress deployment, but must be specified in a multi-process Congress
deployment.
.. code-block:: text
[DEFAULT]
transport_url = rabbit://<rabbit-userid>:<rabbit-password>@<rabbit-host-address>:<port>
.. include:: ha-overview.rst
.. include:: ha-deployment.rst

View File

@ -8,72 +8,122 @@ HA Deployment
-------------
Overview
--------
==================
This section shows how to deploy Congress with High Availability (HA).
In HA, Congress is divided into two parts. The first part is the API and
PolicyEngine node, which is replicated in an active-active style. The other
part is the DataSource node, which is deployed in a warm-standby style. Please
see the :ref:`HA Overview <ha_overview>` for details.
This section shows how to deploy Congress with High Availability (HA). For an
architectural overview, please see the :ref:`HA Overview <ha_overview>`.
An HA deployment of Congress involves five main steps.
#. Deploy messaging and database infrastructure to be shared by all the
Congress nodes.
#. Prepare the hosts to run Congress nodes.
#. Deploy N (at least 2) policy-engine nodes.
#. Deploy one datasource-drivers node.
#. Deploy a load-balancer to load-balance between the N policy-engine nodes.
The following sections describe each step in more detail.
Shared Services
==================
All the Congress nodes share a database backend. To set up a database backend
for Congress, please follow the database portion of
`separate install instructions`__.
__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install
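As a minimal sketch (assuming a MySQL backend and the ``root`` account used in
the connection strings later in this guide; adapt the user and password to your
environment), the shared database can be created on the database host as
follows:
.. code-block:: console
$ mysql -u root -p
mysql> CREATE DATABASE congress;
mysql> GRANT ALL PRIVILEGES ON congress.* TO 'root'@'%' IDENTIFIED BY '<database-password>';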
Various solutions exist to avoid creating a single point of failure with the
database backend.
Note: If a replicated database solution is used, it must support table
locking. Galera, for example, would not work. This limitation is expected to
be removed in the Ocata release.
A shared messaging service is also required. Refer to `Shared Messaging`__ for
instructions for installing and configuring RabbitMQ.
__ http://docs.openstack.org/ha-guide/shared-messaging.html
Hosts Preparation
==================
Congress should be installed on each host expected to run a Congress node.
Please follow the directions in `separate install instructions`__ to install
Congress on each host, skipping the local database portion.
__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install
In the configuration file, a ``transport_url`` should be specified to use the
RabbitMQ messaging service configured in step 1.
For example:
.. code-block:: text
+-------------------------------------+ +--------------+
| Load Balancer (eg. HAProxy) | <----+ Push client |
+----+-------------+-------------+----+ +--------------+
| | |
PE | PE | PE | all+DSDs node
+---------+ +---------+ +---------+ +-----------------+
| +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ |
| | API | | | | API | | | | API | | | | DSD | | DSD | |
| +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ |
| +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ |
| | PE | | | | PE | | | | PE | | | | DSD | | DSD | |
| +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ |
+---------+ +---------+ +---------+ +--------+--------+
| | | |
| | | |
+--+----------+-------------+--------+--------+
| |
| |
+-------+----+ +------------------------+-----------------+
| Oslo Msg | | DBs (policy, config, push data, exec log)|
+------------+ +------------------------------------------+
[DEFAULT]
transport_url = rabbit://<rabbit-userid>:<rabbit-password>@<rabbit-host-address>:5672
All hosts should be configured with a database connection that points to the
shared database deployed in step 1, not the local address shown in
`separate install instructions`__.
__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install
For example:
.. code-block:: text
[database]
connection = mysql+pymysql://root:<database-password>@<shared-database-ip-address>/congress?charset=utf8
HA for API and Policy Engine Node
---------------------------------
Policy Engine Nodes
=====================
New config settings for the DSE node type:
- N (>= 2) PE+API nodes
In this step, we deploy N (at least 2) policy-engine nodes, each with an
associated API server. Each node can be started as follows:
.. code-block:: console
$ python /usr/local/bin/congress-server --api --policy-engine --node-id=<api_unique_id>
$ python /usr/local/bin/congress-server --api --policy-engine --node-id=<unique_node_id>
- A single DSD node
Each node must have a unique node-id specified as a command-line option.
For high availability, each node is usually deployed on a different host. If
multiple nodes are to be deployed on the same host, each node must have a
different port specified using the ``bind_port`` configuration option in the
congress configuration file.
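For instance, if two policy-engine nodes must share a host, the second node can
be pointed at its own configuration file that overrides the API port (a sketch;
the port value is arbitrary and the first node is assumed to keep the default):
.. code-block:: text
[DEFAULT]
bind_port = 1790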
Datasource Drivers Node
========================
In this step, we deploy a single datasource-drivers node in warm-standby style.
The datasource-drivers node can be started directly with the following command:
.. code-block:: console
$ python /usr/local/bin/congress-server --datasources --node-id=<datasource_unique_id>
$ python /usr/local/bin/congress-server --datasources --node-id=<unique_node_id>
HA for DataSource Node
----------------------
A unique node-id (distinct from all the policy-engine nodes) must be specified.
The node on which the DataSourceDriver runs uses a warm-standby style. Congress
assumes that a cluster manager handles the active-standby cluster. In this
document, we describe how to provide HA for the DataSourceDriver node with
`Pacemaker`_ .
For warm-standby deployment, an external manager is used to launch and manage
the datasource-drivers node. In this document, we sketch how to deploy the
datasource-drivers node with `Pacemaker`_ .
See the `OpenStack High Availability Guide`__ for general usage of Pacemaker
and how to deploy Pacemaker cluster stack. The guide has some HA configuration
for other OpenStack projects.
and how to deploy Pacemaker cluster stack. The guide also has some HA
configuration guidance for other OpenStack projects.
__ http://docs.openstack.org/ha-guide/index.html
.. _Pacemaker: http://clusterlabs.org/
Prepare OCF resource agent
==========================
----------------------------
You need a custom Resource Agent (RA) for DataSource Node HA. The custom RA is
located in the Congress repository at
``/path/to/congress/script/ocf/congress-datasource``.
@ -87,8 +137,8 @@ Install the RA with following steps.
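Installation is typically performed inside the cluster's OCF resource
directory. As a sketch (assuming the conventional ``/usr/lib/ocf/resource.d``
path and an ``openstack`` provider subdirectory, which the example primitive
further below also assumes), first change into that directory:
.. code-block:: console
$ cd /usr/lib/ocf/resource.d
$ mkdir openstack
$ cd openstack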
$ cp /path/to/congress/script/ocf/congress-datasource ./congress-datasource
$ chmod a+rx congress-datasource
Configure RA
============
Configuring the Resource Agent
-------------------------------
You can now add the Pacemaker configuration for the Congress DataSource Node resource.
Connect to the Pacemaker cluster with the *crm configure* command and add the
@ -111,4 +161,18 @@ The RA has following configurable parameters.
* config: the path to Congress's config file
* node_id(Option): a node id of the datasource node. Default is "datasource-node".
* binary(Option): the path to the Congress binary. Default is "/usr/local/bin/congress-server".
* additional_parameters(Option): additional parameters of congress-server
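As an illustrative sketch (assuming the RA was installed under the
``openstack`` provider directory as above; the config path, node id, and
timings are examples to adapt), the resource could be added via
*crm configure* as follows:
.. code-block:: text
primitive congress-datasource ocf:openstack:congress-datasource \
   params config="/etc/congress/congress.conf" node_id="datasource-node" \
   op monitor interval="30s" timeout="20s"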
Load-balancer
==============
A load-balancer should be used to distribute incoming API requests to the N
policy-engine (and API service) nodes deployed in step 3.
It is recommended that a sticky configuration be used to avoid exposing a user
to out-of-sync artifacts when the user hits different policy-engine nodes.
`HAProxy <http://www.haproxy.org/>`_ is a popular load-balancer for this
purpose. The HAProxy section of the `OpenStack High Availability Guide`__
has instructions for deploying HAProxy for high availability.
__ http://docs.openstack.org/ha-guide/index.html
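As an illustrative sketch only (addresses are placeholders, the port assumes
Congress's default API port, and ``balance source`` is one simple way to obtain
stickiness), an HAProxy frontend/backend for the policy-engine nodes might look
like:
.. code-block:: text
frontend congress-api
    bind <virtual-ip-address>:1789
    mode http
    default_backend congress-policy-engines
backend congress-policy-engines
    mode http
    balance source
    server pe1 <pe1-host-address>:1789 check
    server pe2 <pe2-host-address>:1789 check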

View File

@ -20,7 +20,7 @@ HA Types
========
Warm Standby
~~~~~~~~~~~~
-------------
Warm Standby is when a software component is installed and available on the
secondary node. The secondary node is up and running. In the case of a
failure on the primary node, the software component is started on the
@ -29,7 +29,7 @@ Data is regularly mirrored to the secondary system using disk based replication
or shared disk. This generally provides a recovery time of a few minutes.
Active-Active (Load-Balanced)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
------------------------------
In this method, both the primary and secondary systems are active and
processing requests in parallel. Data replication happens through software
capabilities and would be bi-directional. This generally provides a recovery
@ -72,24 +72,9 @@ oslo-messaging to all policy engines.
| Oslo Msg | | DBs (policy, config, push data, exec log)|
+------------+ +------------------------------------------+
- Performance impact of HAHT deployment:
- Downtime: < 1s for queries, ~2s for reactive enforcement
- Throughput and latency: leverages multi-process and multi-node parallelism
- DSD nodes are separated from the PE nodes, allowing high-load DSDs to operate
more smoothly without affecting PE performance.
- PE nodes are symmetric in configuration, making it easy to load balance
evenly.
- No redundant data-pulling load on datasources
- Requirements for HAHT deployment
- Cluster manager (e.g. Pacemaker + Corosync) to manage warm standby
- Does not require global leader election
Details
~~~~~~~
-------------
- Datasource Drivers (DSDs):
@ -156,24 +141,30 @@ Details
caller to a particular node. This configuration avoids the experience of
going back in time.
- External components (load balancer, DBs, and oslo messaging bus) can be made
highly available using standard solutions (e.g. clustered LB, Galera MySQL
cluster, HA rabbitMQ)
highly available using standard solutions (e.g. clustered LB, HA rabbitMQ)
Performance Impact
==================
- In a single-node deployment, there is generally no performance impact.
- Increased latency due to network communication required by multi-node
deployment
- Increased reactive enforcement latency if action executions are persistently
logged to facilitate smoother failover
- PE replication can achieve greater query throughput
End User Impact
===============
Different PE instances may be out-of-sync in their data and policies (eventual
consistency). The issue is generally made transparent to the end user by
making each user sticky to a particular PE instance. But if a PE instance
goes down, the end user reaches a different instance and may experience
out-of-sync artifacts.
Cautions and Limitations
============================
- Replicated PE deployment is new in the Newton release and a major departure
from the previous model. As a result, the deployer may be more likely to
experience unexpected issues.
- In the Newton release, creating a new policy requires locking a database
table. As a result, it should not be deployed with a database backend that
does not support table locking (e.g., Galera). The limitation is expected to
be removed in the Ocata release.
- Different PE instances may be out-of-sync in their data and policies
(eventual consistency).
The issue is generally made transparent to the end user by
configuring the load balancer to make each user sticky to a particular PE
instance. But if a user reaches a different PE instance (say, because of load
balancer configuration or because the original instance went down), the user
may experience out-of-sync artifacts.