From 6b8288d015f320e585bf5bc8675246f3383d1464 Mon Sep 17 00:00:00 2001 From: Eric K Date: Thu, 29 Sep 2016 17:40:06 -0700 Subject: [PATCH] Update haht docs Change-Id: If4e3f95c51632b593f313986a05aa43f5f3f5169 --- README.rst | 2 +- doc/source/deployment.rst | 10 +++ doc/source/ha-deployment.rst | 156 ++++++++++++++++++++++++----------- doc/source/ha-overview.rst | 49 +++++------ 4 files changed, 141 insertions(+), 76 deletions(-) diff --git a/README.rst b/README.rst index bab2e93f1..e40edeb72 100644 --- a/README.rst +++ b/README.rst @@ -215,7 +215,7 @@ A bare-bones congress.conf is as follows (adapt MySQL root password): drivers = congress.datasources.neutronv2_driver.NeutronV2Driver,congress.datasources.glancev2_driver.GlanceV2Driver,congress.datasources.nova_driver.NovaDriver,congress.datasources.keystone_driver.KeystoneDriver,congress.datasources.ceilometer_driver.CeilometerDriver,congress.datasources.cinder_driver.CinderDriver,congress.datasources.swift_driver.SwiftDriver,congress.datasources.plexxi_driver.PlexxiDriver,congress.datasources.vCenter_driver.VCenterDriver,congress.datasources.murano_driver.MuranoDriver,congress.datasources.ironic_driver.IronicDriver auth_strategy = noauth [database] - connection = mysql://root:password@127.0.0.1/congress?charset=utf8 + connection = mysql+pymysql://root:password@127.0.0.1/congress?charset=utf8 For a detailed sample, please follow README-congress.conf.txt diff --git a/doc/source/deployment.rst b/doc/source/deployment.rst index 700aedb11..69cfee0c3 100644 --- a/doc/source/deployment.rst +++ b/doc/source/deployment.rst @@ -88,5 +88,15 @@ are specified in the [DEFAULT] section of the configuration file. ``debug`` Whether or not the DEBUG-level of logging is enabled. Default is false. +``transport_url`` + URL to the shared messaging service. It is not needed in a single-process + Congress deployment, but must be specified in a multi-process Congress + deployment. + +.. code-block:: text + + [DEFAULT] + transport_url = rabbit://:@: + .. include:: ha-overview.rst .. include:: ha-deployment.rst diff --git a/doc/source/ha-deployment.rst b/doc/source/ha-deployment.rst index e607c6255..bf168664c 100644 --- a/doc/source/ha-deployment.rst +++ b/doc/source/ha-deployment.rst @@ -8,72 +8,122 @@ HA Deployment ------------- Overview --------- +================== -This section shows how to deploy Congress with High Availability (HA). -Congress is divided to 2 parts in HA. First part is API and PolicyEngine -Node which is replicated with Active-Active style. Another part is -DataSource Node which is deployed with warm-standby style. Please see the -:ref:`HA Overview ` for details. +This section shows how to deploy Congress with High Availability (HA). For an +architectural overview, please see the :ref:`HA Overview `. + +An HA deployment of Congress involves five main steps. + +#. Deploy messaging and database infrastructure to be shared by all the + Congress nodes. +#. Prepare the hosts to run Congress nodes. +#. Deploy N (at least 2) policy-engine nodes. +#. Deploy one datasource-drivers node. +#. Deploy a load-balancer to load-balance between the N policy-engine nodes. + +The following sections describe each step in more detail. + + +Shared Services +================== + +All the Congress nodes share a database backend. To setup a database backend +for Congress, please follow the database portion of +`separate install instructions`__. 
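+
+For illustration, assuming a MySQL backend and example names and credentials
+(adapt them to your environment), the shared database could be created as
+follows:
+
+.. code-block:: console
+
+  $ mysql -u root -p
+  mysql> CREATE DATABASE congress;
+  mysql> GRANT ALL PRIVILEGES ON congress.* TO 'congress'@'%' IDENTIFIED BY 'CONGRESS_DBPASS';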
+ +__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install + +Various solutions exist to avoid creating a single point of failure with the +database backend. + +Note: If a replicated database solution is used, it must support table +locking. Galera, for example, would not work. This limitation is expected to +be removed in the Ocata release. + +A shared messaging service is also required. Refer to `Shared Messaging`__ for +instructions for installing and configuring RabbitMQ. + +__ http://docs.openstack.org/ha-guide/shared-messaging.html + + +Hosts Preparation +================== + +Congress should be installed on each host expected to run a Congress node. +Please follow the directions in `separate install instructions`__ to install +Congress on each host, skipping the local database portion. + +__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install + +In the configuration file, a ``transport_url`` should be specified to use the +RabbitMQ messaging service configured in step 1. + +For example: .. code-block:: text - +-------------------------------------+ +--------------+ - | Load Balancer (eg. HAProxy) | <----+ Push client | - +----+-------------+-------------+----+ +--------------+ - | | | - PE | PE | PE | all+DSDs node - +---------+ +---------+ +---------+ +-----------------+ - | +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ | - | | API | | | | API | | | | API | | | | DSD | | DSD | | - | +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ | - | +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ | - | | PE | | | | PE | | | | PE | | | | DSD | | DSD | | - | +-----+ | | +-----+ | | +-----+ | | +-----+ +-----+ | - +---------+ +---------+ +---------+ +--------+--------+ - | | | | - | | | | - +--+----------+-------------+--------+--------+ - | | - | | - +-------+----+ +------------------------+-----------------+ - | Oslo Msg | | DBs (policy, config, push data, exec log)| - +------------+ +------------------------------------------+ + [DEFAULT] + transport_url = rabbit://:@:5672 + +All hosts should be configured with a database connection that points to the +shared database deployed in step 1, not the local address shown in +`separate install instructions`__. + +__ http://docs.openstack.org/developer/congress/README.html?highlight=readme#separate-install + +For example: + +.. code-block:: text + + [database] + connection = mysql+pymysql://root:@/congress?charset=utf8 -HA for API and Policy Engine Node ---------------------------------- +Policy Engine Nodes +===================== -New config settings for setting the DSE node type: - -- N (>=2 even okay) nodes of PE+API node +In this step, we deploy N (at least 2) policy-engine nodes, each with an +associated API server. Each node can be started as follows: .. code-block:: console - $ python /usr/local/bin/congress-server --api --policy-engine --node-id= + $ python /usr/local/bin/congress-server --api --policy-engine --node-id= -- One single DSD node +Each node must have a unique node-id specified as a commandline option. + +For high availability, each node is usually deployed on a different host. If +multiple nodes are to be deployed on the same host, each node must have a +different port specified using the ``bind_port`` configuration option in the +congress configuration file. + + +Datasource Drivers Node +======================== + +In this step, we deploy a single datasource-drivers node in warm-standby style. 
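+
+The node reuses the shared-service configuration prepared in step 2. As an
+illustrative sketch (example host names and credentials, not defaults), the
+relevant portion of its ``congress.conf`` looks like:
+
+.. code-block:: text
+
+  [DEFAULT]
+  transport_url = rabbit://RABBIT_USER:RABBIT_PASS@RABBIT_HOST:5672
+
+  [database]
+  connection = mysql+pymysql://congress:CONGRESS_DBPASS@DB_HOST/congress?charset=utf8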
+
+The datasource-drivers node can be started directly with the following command:

 .. code-block:: console

-    $ python /usr/local/bin/congress-server --datasources --node-id=
+    $ python /usr/local/bin/congress-server --datasources --node-id=

-HA for DataSource Node
-----------------------
+A unique node-id (distinct from all the policy-engine nodes) must be specified.

-Nodes which DataSourceDriver runs on takes warm-standby style. Congress assumes
-cluster manager handles the active-standby cluster. In this document, we describe
-how to make HA of DataSourceDriver node by `Pacemaker`_ .
+For warm-standby deployment, an external manager is used to launch and manage
+the datasource-drivers node. In this document, we sketch how to deploy the
+datasource-drivers node with `Pacemaker`_.

 See the `OpenStack High Availability Guide`__ for general usage of Pacemaker
-and how to deploy Pacemaker cluster stack. The guide has some HA configuration
-for other OpenStack projects.
+and how to deploy Pacemaker cluster stack. The guide also has some HA
+configuration guidance for other OpenStack projects.

 __ http://docs.openstack.org/ha-guide/index.html

 .. _Pacemaker: http://clusterlabs.org/

 Prepare OCF resource agent
-==========================
+----------------------------

 You need a custom Resource Agent (RA) for DataSource Node HA. The custom RA is
 located in the Congress repository, ``/path/to/congress/script/ocf/congress-datasource``.
 Install the RA with the following steps.
@@ -87,8 +137,8 @@ Install the RA with the following steps.
   $ cp /path/to/congress/script/ocf/congress-datasource ./congress-datasource
   $ chmod a+rx congress-datasource

-Configure RA
-============
+Configuring the Resource Agent
+-------------------------------

 You can now add the Pacemaker configuration for Congress DataSource Node resource.
 Connect to the Pacemaker cluster with the *crm configure* command and add the
@@ -111,4 +161,18 @@ The RA has following configurable parameters.
 * config: a path of Congress's config file
 * node_id(Option): a node id of the datasource node. Default is "datasource-node".
 * binary(Option): a path of the Congress binary. Default is "/usr/local/bin/congress-server".
-* additional_parameters(Option): additional parameters of congress-server
\ No newline at end of file
+* additional_parameters(Option): additional parameters of congress-server
+
+Load-balancer
+==============
+
+A load-balancer should be used to distribute incoming API requests to the N
+policy-engine (and API service) nodes deployed in step 3.
+It is recommended that a sticky configuration be used to avoid exposing a user
+to out-of-sync artifacts when the user hits different policy-engine nodes.
+
+`HAProxy `_ is a popular load-balancer for this
+purpose. The HAProxy section of the `OpenStack High Availability Guide`__
+has instructions for deploying HAProxy for high availability.
+
+__ http://docs.openstack.org/ha-guide/index.html
\ No newline at end of file
diff --git a/doc/source/ha-overview.rst b/doc/source/ha-overview.rst
index ba65a0a17..9fb0ce82c 100644
--- a/doc/source/ha-overview.rst
+++ b/doc/source/ha-overview.rst
@@ -20,7 +20,7 @@ HA Types
 ========

 Warm Standby
-~~~~~~~~~~~~
+-------------
 Warm Standby is when a software component is installed and available on the
 secondary node. The secondary node is up and running. In the case of a
 failure on the primary node, the software component is started on the
@@ -29,7 +29,7 @@
 Data is regularly mirrored to the secondary system using disk based
 replication or shared disk. This generally provides a recovery time of a few
 minutes.

 Active-Active (Load-Balanced)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+------------------------------
 In this method, both the primary and secondary systems are active and
 processing requests in parallel. Data replication happens through software
 capabilities and would be bi-directional. This generally provides a recovery
@@ -72,24 +72,9 @@ oslo-messaging to all policy engines.
 |  Oslo Msg  |  | DBs (policy, config, push data, exec log)|
 +------------+  +------------------------------------------+

-- Performance impact of HAHT deployment:
-
-  - Downtime: < 1s for queries, ~2s for reactive enforcement
-  - Throughput and latency: leverages multi-process and multi-node parallelism
-  - DSDs nodes are separated from PE, allowing high load DSDs to operate more
-    smoothly and avoid affecting PE performance.
-  - PE nodes are symmetric in configuration, making it easy to load balance
-    evenly.
-  - No redundant data-pulling load on datasources
-
-- Requirements for HAHT deployment
-
-  - Cluster manager (eg. Pacemaker + Corosync) to manage warm
-    standby
-  - Does not require global leader election

 Details
-~~~~~~~
+-------------

 - Datasource Drivers (DSDs):
@@ -156,24 +141,30 @@ Details
   caller to a particular node. This configuration avoids the experience of
   going back in time.
 - External components (load balancer, DBs, and oslo messaging bus) can be made
-  highly available using standard solutions (e.g. clustered LB, Galera MySQL
-  cluster, HA rabbitMQ)
+  highly available using standard solutions (e.g. clustered LB, HA rabbitMQ)

 Performance Impact
 ==================
-- In single node deployment, there is generally no performance impact.
 - Increased latency due to network communication required by multi-node
   deployment
 - Increased reactive enforcement latency if action executions are persistently
   logged to facilitate smoother failover
 - PE replication can achieve greater query throughput

-End User Impact
-===============
-Different PE instances may be out-of-sync in their data and policies (eventual
-consistency). The issue is generally made transparent to the end user by
-making each user sticky to a particular PE instance. But if a PE instance
-goes down, the end user reaches a different instance and may experience
-out-of-sync artifacts.
-
+Cautions and Limitations
+============================
+- Replicated PE deployment is new in the Newton release and a major departure
+  from the previous model. As a result, the deployer may be more likely to
+  experience unexpected issues.
+- In the Newton release, creating a new policy requires locking a database
+  table. As a result, Congress should not be deployed with a database backend
+  that does not support table locking (e.g., Galera). The limitation is
+  expected to be removed in the Ocata release.
+- Different PE instances may be out-of-sync in their data and policies
+  (eventual consistency). The issue is generally made transparent to the end
+  user by configuring the load balancer to make each user sticky to a
+  particular PE instance. But if a user reaches a different PE instance (say,
+  because of the load-balancer configuration or because the original instance
+  went down), that user may experience out-of-sync artifacts.