Update haht docs

Change-Id: If4e3f95c51632b593f313986a05aa43f5f3f5169
Eric K 2016-09-29 17:40:06 -07:00
parent f4488be27a
commit bcd73c413a
4 changed files with 141 additions and 76 deletions

View File

@ -215,7 +215,7 @@ A bare-bones congress.conf is as follows (adapt MySQL root password):
drivers = congress.datasources.neutronv2_driver.NeutronV2Driver,congress.datasources.glancev2_driver.GlanceV2Driver,congress.datasources.nova_driver.NovaDriver,congress.datasources.keystone_driver.KeystoneDriver,congress.datasources.ceilometer_driver.CeilometerDriver,congress.datasources.cinder_driver.CinderDriver,congress.datasources.swift_driver.SwiftDriver,congress.datasources.plexxi_driver.PlexxiDriver,congress.datasources.vCenter_driver.VCenterDriver,congress.datasources.murano_driver.MuranoDriver,congress.datasources.ironic_driver.IronicDriver
auth_strategy = noauth
[database]
connection = mysql://root:password@127.0.0.1/congress?charset=utf8
connection = mysql+pymysql://root:password@127.0.0.1/congress?charset=utf8
For a detailed sample, please follow README-congress.conf.txt

View File

@ -88,5 +88,15 @@ are specified in the [DEFAULT] section of the configuration file.
``debug``
Whether or not DEBUG-level logging is enabled. Default is false.
``transport_url``
URL to the shared messaging service. It is not needed in a single-process
Congress deployment, but must be specified in a multi-process Congress
deployment.
.. code-block:: text
[DEFAULT]
transport_url = rabbit://<rabbit-userid>:<rabbit-password>@<rabbit-host-address>:<port>
.. include:: ha-overview.rst
.. include:: ha-deployment.rst

View File

@ -8,72 +8,122 @@ HA Deployment
-------------
Overview
--------
==================
This section shows how to deploy Congress with High Availability (HA).
In HA, Congress is divided into two parts. The first part is the API and
PolicyEngine node, which is replicated in an active-active style. The other
part is the DataSource node, which is deployed in a warm-standby style. Please
see the :ref:`HA Overview <ha_overview>` for details.
This section shows how to deploy Congress with High Availability (HA). For an
architectural overview, please see the :ref:`HA Overview <ha_overview>`.
An HA deployment of Congress involves five main steps.
#. Deploy messaging and database infrastructure to be shared by all the
Congress nodes.
#. Prepare the hosts to run Congress nodes.
#. Deploy N (at least 2) policy-engine nodes.
#. Deploy one datasource-drivers node.
#. Deploy a load-balancer to load-balance between the N policy-engine nodes.
The following sections describe each step in more detail.
Shared Services
==================
All the Congress nodes share a database backend. To set up a database backend
for Congress, please follow the database portion of
`separate install instructions`__.
__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install
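As a minimal sketch (assuming a MySQL backend and the ``root`` account used in
the connection strings later in this guide; adapt the user and password to your
environment), the shared database can be created on the database host as
follows:
.. code-block:: console
$ mysql -u root -p
mysql> CREATE DATABASE congress;
mysql> GRANT ALL PRIVILEGES ON congress.* TO 'root'@'%' IDENTIFIED BY '<database-password>';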
Various solutions exist to avoid creating a single point of failure with the
database backend.
Note: If a replicated database solution is used, it must support table
locking. Galera, for example, would not work. This limitation is expected to
be removed in the Ocata release.
A shared messaging service is also required. Refer to `Shared Messaging`__ for
instructions for installing and configuring RabbitMQ.
__ http://docs.openstack.org/ha-guide/shared-messaging.html
Hosts Preparation
==================
Congress should be installed on each host expected to run a Congress node.
Please follow the directions in `separate install instructions`__ to install
Congress on each host, skipping the local database portion.
__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install
In the configuration file, a ``transport_url`` should be specified to use the
RabbitMQ messaging service configured in step 1.
For example:
.. code-block:: text
+-------------------------------------+ +--------------+
| Load Balancer (eg. HAProxy) | <----+ Push client |
+----+-------------+-------------+----+ +--------------+
| | |
PE | PE | PE | all+DSDs node
+---------+ +---------+ +---------+ +-----------------+
| +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ |
| | API | | | | API | | | | API | | | | DSD | | DSD | |
| +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ |
| +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ |
| | PE | | | | PE | | | | PE | | | | DSD | | DSD | |
| +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ |
+---------+ +---------+ +---------+ +--------+--------+
| | | |
| | | |
+--+----------+-------------+--------+--------+
| |
| |
+-------+----+ +------------------------+-----------------+
| Oslo Msg | | DBs (policy, config, push data, exec log)|
+------------+ +------------------------------------------+
[DEFAULT]
transport_url = rabbit://<rabbit-userid>:<rabbit-password>@<rabbit-host-address>:5672
All hosts should be configured with a database connection that points to the
shared database deployed in step 1, not the local address shown in
`separate install instructions`__.
__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install
For example:
.. code-block:: text
[database]
connection = mysql+pymysql://root:<database-password>@<shared-database-ip-address>/congress?charset=utf8
HA for API and Policy Engine Node
---------------------------------
Policy Engine Nodes
=====================
New config settings for the DSE node type:
- N (>= 2) PE+API nodes
In this step, we deploy N (at least 2) policy-engine nodes, each with an
associated API server. Each node can be started as follows:
.. code-block:: console
$ python /usr/local/bin/congress-server --api --policy-engine --node-id=<api_unique_id>
$ python /usr/local/bin/congress-server --api --policy-engine --node-id=<unique_node_id>
- A single DSD node
Each node must have a unique node-id specified as a command-line option.
For high availability, each node is usually deployed on a different host. If
multiple nodes are to be deployed on the same host, each node must have a
different port specified using the ``bind_port`` configuration option in the
congress configuration file.
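For instance, if two policy-engine nodes must share a host, the second node can
be pointed at its own configuration file that overrides the API port (a sketch;
the port value is arbitrary and the first node is assumed to keep the default):
.. code-block:: text
[DEFAULT]
bind_port = 1790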
Datasource Drivers Node
========================
In this step, we deploy a single datasource-drivers node in warm-standby style.
The datasource-drivers node can be started directly with the following command:
.. code-block:: console
$ python /usr/local/bin/congress-server --datasources --node-id=<datasource_unique_id>
$ python /usr/local/bin/congress-server --datasources --node-id=<unique_node_id>
HA for DataSource Node
----------------------
A unique node-id (distinct from all the policy-engine nodes) must be specified.
The node on which the DataSourceDriver runs uses a warm-standby style. Congress
assumes that a cluster manager handles the active-standby cluster. In this
document, we describe how to provide HA for the DataSourceDriver node with
`Pacemaker`_ .
For warm-standby deployment, an external manager is used to launch and manage
the datasource-drivers node. In this document, we sketch how to deploy the
datasource-drivers node with `Pacemaker`_ .
See the `OpenStack High Availability Guide`__ for general usage of Pacemaker
and how to deploy Pacemaker cluster stack. The guide has some HA configuration
for other OpenStack projects.
and how to deploy Pacemaker cluster stack. The guide also has some HA
configuration guidance for other OpenStack projects.
__ http://docs.openstack.org/ha-guide/index.html
.. _Pacemaker: http://clusterlabs.org/
Prepare OCF resource agent
==========================
----------------------------
You need a custom Resource Agent (RA) for DataSource Node HA. The custom RA is
located in the Congress repository at
``/path/to/congress/script/ocf/congress-datasource``.
@ -87,8 +137,8 @@ Install the RA with following steps.
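Installation is typically performed inside the cluster's OCF resource
directory. As a sketch (assuming the conventional ``/usr/lib/ocf/resource.d``
path and an ``openstack`` provider subdirectory, which the example primitive
further below also assumes), first change into that directory:
.. code-block:: console
$ cd /usr/lib/ocf/resource.d
$ mkdir openstack
$ cd openstack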
$ cp /path/to/congress/script/ocf/congress-datasource ./congress-datasource
$ chmod a+rx congress-datasource
Configure RA
============
Configuring the Resource Agent
-------------------------------
You can now add the Pacemaker configuration for the Congress DataSource Node resource.
Connect to the Pacemaker cluster with the *crm configure* command and add the
@ -111,4 +161,18 @@ The RA has following configurable parameters.
* config: the path to Congress's config file
* node_id(Option): a node id of the datasource node. Default is "datasource-node".
* binary(Option): the path to the Congress binary. Default is "/usr/local/bin/congress-server".
* additional_parameters(Option): additional parameters of congress-server
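As an illustrative sketch (assuming the RA was installed under the
``openstack`` provider directory as above; the config path, node id, and
timings are examples to adapt), the resource could be added via
*crm configure* as follows:
.. code-block:: text
primitive congress-datasource ocf:openstack:congress-datasource \
   params config="/etc/congress/congress.conf" node_id="datasource-node" \
   op monitor interval="30s" timeout="20s"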
Load-balancer
==============
A load-balancer should be used to distribute incoming API requests to the N
policy-engine (and API service) nodes deployed in step 3.
It is recommended that a sticky configuration be used to avoid exposing a user
to out-of-sync artifacts when the user hits different policy-engine nodes.
`HAProxy <http://www.haproxy.org/>`_ is a popular load-balancer for this
purpose. The HAProxy section of the `OpenStack High Availability Guide`__
has instructions for deploying HAProxy for high availability.
__ http://docs.openstack.org/ha-guide/index.html
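As an illustrative sketch only (addresses are placeholders, the port assumes
Congress's default API port, and ``balance source`` is one simple way to obtain
stickiness), an HAProxy frontend/backend for the policy-engine nodes might look
like:
.. code-block:: text
frontend congress-api
    bind <virtual-ip-address>:1789
    mode http
    default_backend congress-policy-engines
backend congress-policy-engines
    mode http
    balance source
    server pe1 <pe1-host-address>:1789 check
    server pe2 <pe2-host-address>:1789 check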

View File

@ -20,7 +20,7 @@ HA Types
========
Warm Standby
~~~~~~~~~~~~
-------------
Warm Standby is when a software component is installed and available on the
secondary node. The secondary node is up and running. In the case of a
failure on the primary node, the software component is started on the
@ -29,7 +29,7 @@ Data is regularly mirrored to the secondary system using disk based replication
or shared disk. This generally provides a recovery time of a few minutes.
Active-Active (Load-Balanced)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
------------------------------
In this method, both the primary and secondary systems are active and
processing requests in parallel. Data replication happens through software
capabilities and would be bi-directional. This generally provides a recovery
@ -72,24 +72,9 @@ oslo-messaging to all policy engines.
| Oslo Msg | | DBs (policy, config, push data, exec log)|
+------------+ +------------------------------------------+
- Performance impact of HAHT deployment:
- Downtime: < 1s for queries, ~2s for reactive enforcement
- Throughput and latency: leverages multi-process and multi-node parallelism
- DSD nodes are separated from the PE nodes, allowing high-load DSDs to operate
more smoothly without affecting PE performance.
- PE nodes are symmetric in configuration, making it easy to load balance
evenly.
- No redundant data-pulling load on datasources
- Requirements for HAHT deployment
- Cluster manager (e.g. Pacemaker + Corosync) to manage warm standby
- Does not require global leader election
Details
~~~~~~~
-------------
- Datasource Drivers (DSDs):
@ -156,24 +141,30 @@ Details
caller to a particular node. This configuration avoids the experience of
going back in time.
- External components (load balancer, DBs, and oslo messaging bus) can be made
highly available using standard solutions (e.g. clustered LB, Galera MySQL
cluster, HA rabbitMQ)
highly available using standard solutions (e.g. clustered LB, HA rabbitMQ)
Performance Impact
==================
- In a single-node deployment, there is generally no performance impact.
- Increased latency due to network communication required by multi-node
deployment
- Increased reactive enforcement latency if action executions are persistently
logged to facilitate smoother failover
- PE replication can achieve greater query throughput
End User Impact
===============
Different PE instances may be out-of-sync in their data and policies (eventual
consistency). The issue is generally made transparent to the end user by
making each user sticky to a particular PE instance. But if a PE instance
goes down, the end user reaches a different instance and may experience
out-of-sync artifacts.
Cautions and Limitations
============================
- Replicated PE deployment is new in the Newton release and a major departure
from the previous model. As a result, the deployer may be more likely to
experience unexpected issues.
- In the Newton release, creating a new policy requires locking a database
table. As a result, it should not be deployed with a database backend that
does not support table locking (e.g., Galera). The limitation is expected to
be removed in the Ocata release.
- Different PE instances may be out-of-sync in their data and policies
(eventual consistency).
The issue is generally made transparent to the end user by
configuring the load balancer to make each user sticky to a particular PE
instance. But if a user reaches a different PE instance (say, because of load
balancer configuration or because the original instance went down), the user
may experience out-of-sync artifacts.