Spec to Add Support for High Availability

Previously, a spec did not exist to define how Kolla will support highly
available OpenStack services.

Change-Id: I8fcc60f26d2cb98179be6b520c13abb22a372ecf

This commit is contained in:
parent 035d14ff98
commit 2aebba26a4

specs/ha.svg                    Normal file (binary image, 268 KiB; diff suppressed)
specs/high-availability.rst     Normal file (286 lines)

@@ -0,0 +1,286 @@

..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

=================
High-Availability
=================

Kolla deployments will mature from evaluation-level environments into
highly available, scalable environments supporting production applications
and services. The goal for Kolla HA is to provide active/active, highly
available and independently scalable OpenStack services. This spec focuses
on providing high availability within a single Kolla-based OpenStack
deployment.

Problem description
===================

OpenStack consists of several core and infrastructure services. The
services work in concert to provide a cloud computing environment. By
default, the services are single points of failure and are unable to scale
beyond a single instance.

Use cases
---------

1. Fail any one or a combination of OpenStack core and/or infrastructure
   services and the overall system continues to operate.
2. Scale any one or a combination of OpenStack core and/or infrastructure
   services without any interruption of service.

Proposed change
===============

The spec consists of the following components for providing high
availability and scalability to Kolla-based OpenStack services:

MySQL Galera: A synchronous multi-master database clustering technology
for MySQL/InnoDB databases that includes features such as:

* Synchronous replication
* Active/active multi-master topology
* Read and write to any cluster node
* True parallel replication, on row level
* Direct client connections, native MySQL look & feel
* No slave lag or integrity issues
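
A minimal sketch of the wsrep settings such a Galera-enabled MariaDB
deployment would carry is shown below. The library path, cluster name and
host names are illustrative assumptions, not values mandated by this
spec::

    # galera.cnf -- wsrep settings for a Galera-enabled MariaDB instance (sketch)
    [mysqld]
    binlog_format = ROW                 # Galera replicates row-based events only
    default_storage_engine = InnoDB     # only InnoDB tables are replicated
    innodb_autoinc_lock_mode = 2        # required for parallel replication

    wsrep_provider = /usr/lib64/galera/libgalera_smm.so
    wsrep_cluster_name = kolla_galera
    wsrep_cluster_address = gcomm://kolla1,kolla2,kolla3
    wsrep_sst_method = rsync            # state snapshot transfer between nodes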

**Note:** Although Galera supports an active/active multi-master topology
with multiple active writers, a few OpenStack services cause database
deadlocks in this scenario. This is because these services use the
SELECT... FOR UPDATE SQL construct, even though it is [documented][] that
they should not. It appears that [Nova][] has recently fixed the issue and
[Neutron][] is making progress.

[documented]: https://github.com/openstack/nova/blob/da59d3228125d7e7427c0ba70180db17c597e8fb/nova/openstack/common/db/sqlalchemy/session.py#L180-196
[Nova]: http://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/lock-free-quota-management.html
[Neutron]: https://bugs.launchpad.net/neutron/+bug/1364358 https://bugs.launchpad.net/neutron/+bug/1331564

Testing should be performed as part of the Galera implementation to verify
whether the recent patches address the deadlock issues. If so, multiple
active/active writers can exist within the Galera cluster. If not, the
initial implementation should continue using the well-documented
workaround of directing all writes to a single Galera node until the issue
is completely resolved.
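
One hedged sketch of that workaround, expressed as an HAProxy listen block
in which only one Galera node actively receives traffic and the remaining
nodes are marked as ``backup``, is shown below. The host names and virtual
IP are illustrative assumptions::

    # haproxy.cfg fragment -- single-writer Galera front end (sketch)
    listen mariadb
        bind 10.0.0.250:3306
        mode tcp
        option tcpka
        # all writes land on kolla1; kolla2/kolla3 only take over on failure
        server kolla1 kolla1:3306 check
        server kolla2 kolla2:3306 check backup
        server kolla3 kolla3:3306 check backup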

Several OpenStack services utilize a message queuing system to send and
receive messages between software components. The intention of this spec
is to leverage RabbitMQ as the messaging system since it is most commonly
used within the OpenStack community. Clustering and RabbitMQ Mirrored
Queues provide active/active and highly scalable message queuing for
OpenStack services.

* RabbitMQ Clustering: If the RabbitMQ broker consists of a single node,
  then a failure of that node will cause downtime, temporary unavailability
  of service, and potentially loss of messages. A cluster of RabbitMQ nodes
  can be used to construct the RabbitMQ broker, and such a cluster is
  resilient to the loss of individual nodes in terms of the overall
  availability of the service. All data/state required for the operation of
  a RabbitMQ broker is replicated across all nodes, for reliability and
  scaling. An exception to this are message queues, which by default reside
  on the node that created them, though they are visible and reachable from
  all nodes.

* RabbitMQ Mirrored Queues: While exchanges and bindings survive the loss
  of individual nodes, message queues and their messages do not. This is
  because a queue and its contents reside on exactly one node, thus the
  loss of a node will render its queues unavailable. To solve these various
  problems, RabbitMQ has developed active/active high availability for
  message queues. This works by allowing queues to be mirrored on other
  nodes within a RabbitMQ cluster. The result is that should one node of a
  cluster fail, the queue can automatically switch to one of the mirrors
  and continue to operate, with no unavailability of service. This solution
  still requires a RabbitMQ cluster, which means that it will not cope
  seamlessly with network partitions within the cluster and, for that
  reason, is not recommended for use across a WAN (though of course,
  clients can still connect from as near and as far as needed).
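
As a sketch of how queue mirroring might be enabled once the cluster is
formed, a policy similar to the following could be applied; the policy
name and pattern are illustrative assumptions::

    # Mirror every queue except those whose names begin with "amq." (sketch)
    rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode": "all"}'

The OpenStack services would then be pointed at every broker rather than a
single node, for example by listing all cluster members in the
oslo.messaging ``rabbit_hosts`` option and enabling ``rabbit_ha_queues``;
the exact option names used by each service should be confirmed during
implementation.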

HAProxy provides load-balancing between clients and OpenStack API
endpoints. HAProxy is a free, very fast and reliable software solution
offering high availability, load balancing, and proxying for TCP and
HTTP-based applications. HAProxy implements an event-driven,
single-process model which enables support for a high number of
simultaneous connections.
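
As an illustration, an API endpoint such as the Keystone public API could
be exposed on the virtual IP through a listen block similar to the sketch
below; the VIP, host names, port and health-check parameters are
assumptions made for the example::

    # haproxy.cfg fragment -- load-balance an API endpoint (sketch)
    listen keystone_public
        bind 10.0.0.250:5000
        mode http
        balance roundrobin
        option httpchk
        server kolla1 kolla1:5000 check inter 2000 rise 2 fall 5
        server kolla2 kolla2:5000 check inter 2000 rise 2 fall 5
        server kolla3 kolla3:5000 check inter 2000 rise 2 fall 5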

Memcached is required by nova-consoleauth, Horizon and Swift Proxy to
store ephemeral data such as tokens. Memcached does not support typical
forms of redundancy such as clustering. However, OpenStack services can
support more than one Memcached instance by configuring multiple hostnames
or IP addresses. The Memcached client implements hashing to balance
objects among the instances. Failure of a Memcached instance only impacts
a portion of the objects, and the client automatically removes it from the
list of instances.

An example configuration with two hosts::

    memcached_servers = kolla1:11211,kolla2:11211

Memcached does not implement authentication and is therefore insecure.
This risk is reduced in default deployments because Memcached is not
exposed outside of localhost. Since deploying Memcached in a highly
available manner requires exposing Memcached outside of localhost,
precautions should be taken to reduce this risk.

Keepalived is routing software that provides simple and robust facilities
for load balancing and high availability on Linux systems. Keepalived will
track the HAProxy process and fail over (and fail back) between HAProxy
instances with minimal service interruption. Keepalived implements the
Virtual Router Redundancy Protocol (VRRP). VRRP creates virtual routers,
which are an abstract representation of multiple routers, i.e. master and
backup routers, acting as a group. The default gateway of a participating
host is assigned to the virtual router instead of a physical router. If
the physical router that is routing packets on behalf of the virtual
router fails, another physical router is selected to automatically replace
it. The physical router that is forwarding packets at any given time is
called the master router.
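
One possible shape for the Keepalived configuration is sketched below: a
check script that watches the HAProxy process, and a VRRP instance that
holds the API virtual IP and moves it to a backup node when the check
fails. The interface name, VIP, router ID and priorities are assumptions
for the example::

    # keepalived.conf -- sketch of HAProxy tracking and VIP failover
    vrrp_script check_haproxy {
        script "pidof haproxy"      # succeeds only while HAProxy is running
        interval 2
        weight -20                  # drop priority if the check fails
    }

    vrrp_instance kolla_api_vip {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 101                # backup nodes use a lower priority
        virtual_ipaddress {
            10.0.0.250
        }
        track_script {
            check_haproxy
        }
    }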

Neutron: HAProxy will provide high availability and scalability to the
neutron-server service by load-balancing traffic across multiple instances
of the service. RabbitMQ clustering and mirrored queues will be used to
provide high availability and scalability for RPC messaging among Neutron
components. Galera will provide high availability and scalability for
Neutron persistent data. Multiple Neutron agents will be deployed across
separate nodes. [Multiple L3/DHCP Agents][] will be assigned per tenant
network to provide network high availability for tenant networks.

[Multiple L3/DHCP Agents]: https://wiki.openstack.org/wiki/Neutron/L3_High_Availability_VRRP
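
A hedged sketch of the Neutron options this implies is shown below; the
exact option names and values should be validated against the Neutron
release Kolla targets::

    # neutron.conf fragment -- HA for tenant routers and DHCP (sketch)
    [DEFAULT]
    l3_ha = True                   # create VRRP-backed HA routers
    max_l3_agents_per_router = 3   # spread each HA router over several agents
    min_l3_agents_per_router = 2
    dhcp_agents_per_network = 2    # schedule each network to two DHCP agents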

Glance: Glance can use different back ends to store OpenStack images.
Examples include Swift, Ceph, Cinder and the default filesystem back end.
Although it is not required, it is highly recommended to use a back end
that is scalable and fault-tolerant.
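
As an illustrative assumption, selecting such a back end is a small amount
of Glance configuration, for example pointing ``glance-api`` at Swift
rather than the local filesystem. The option's section has moved between
Glance releases, so it should be confirmed against the targeted version::

    # glance-api.conf fragment -- use a replicated store (sketch)
    [DEFAULT]
    default_store = swift          # filesystem is the default, but is not HA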

Just as with the rest of the OpenStack APIs, HAProxy and Keepalived can
provide high availability and scalability to the Glance API and Registry
endpoints.

Swift: Multiple Swift Proxy services can be used to provide high
availability to the Swift object, account and container storage rings.
Standard Swift replication provides high availability to data stored
within a Swift object storage system. The replication processes compare
local data with each remote copy to ensure they all contain the latest
version. Object replication uses a hash list to quickly compare
subsections of each partition. Container and account replication use a
combination of hashes and shared high water marks.
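
The replica count that makes this possible is fixed when the rings are
built. As a sketch (the partition power, replica count, device names and
addresses below are assumptions for the example), building a three-replica
object ring might look like::

    # Build an object ring with 2^10 partitions, 3 replicas and a 1 hour
    # minimum between partition moves, then add devices and rebalance.
    swift-ring-builder object.builder create 10 3 1
    swift-ring-builder object.builder add r1z1-10.0.0.11:6000/sdb 100
    swift-ring-builder object.builder add r1z2-10.0.0.12:6000/sdb 100
    swift-ring-builder object.builder add r1z3-10.0.0.13:6000/sdb 100
    swift-ring-builder object.builder rebalance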

Cinder: As with other stateless services, HAProxy can provide high
availability and scalability to cinder-api. RabbitMQ clustering and
mirrored queues can provide high availability and scalability for RPC
messaging among Cinder services. At the time of this writing, the only
Cinder backend supported is LVM. LVM can be made [highly-available][], or
a new Cinder backend, such as [Ceph][], can be added to Kolla which
supports high availability and scalability for tenant-facing block storage
services. Due to limitations described [here][], the Cinder volume manager
cannot be reliably deployed in an active/active or active/passive fashion.

[highly-available]: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/ap-ha-halvm-CA.html
[Ceph]: http://ceph.com/
[here]: https://bugs.launchpad.net/cinder/+bug/1322190

In general, the Kolla HA environment looks like:

![Image](https://raw.githubusercontent.com/stackforge/kolla/master/specs/ha.svg)

Security impact
---------------

Keystone UUID vs PKI tokens: tokens are used as a mechanism to
authenticate API requests of users. Keystone supports UUID and PKI token
formats. PKI tokens provide better security, but are more difficult to
deploy in an active/active manner. Therefore, it is recommended to start
with UUID tokens and add PKI tokens in a future iteration.
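
As a hedged illustration of that starting point, the Keystone token
provider would be left on (or explicitly set to) UUID tokens; the provider
class path below reflects current Keystone releases and should be
re-checked during implementation::

    # keystone.conf fragment -- start with UUID tokens (sketch)
    [token]
    provider = keystone.token.providers.uuid.Provider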

Performance Impact
------------------

The proposed high-availability spec should increase the performance of
Kolla-based OpenStack clouds. A slight performance decrease can be
expected due to the additional hop introduced by the load-balancing layer.
However, the additional latency introduced by this layer is expected to be
negligible compared to overall API response times. Since this layer
provides intelligent load-balancing of services, improved performance can
be expected for services under moderate-to-heavy load. Without the
intelligence provided by the load-balancing layer, overloaded services can
become degraded and a decrease in performance can be expected.

Performance tests should be conducted for the following scenarios to
validate and/or improve the HA spec:

1. The HA environment is functioning as expected.
2. One or more API services are in a failed state.
3. One or more Galera instances are in a failed state.
4. One or more RabbitMQ brokers are in a failed state.
5. Adding services to, or removing services from, the HA environment.

Implementation
==============

Generally, the implementation should start off simple and add capabilities
through development iterations. The implementation can be organized as
follows:

1. Multi-node: In order to implement HA, Kolla must first support being
   deployed to multiple hosts.

2. Database: Implement a Galera container image that follows best
   practices used by existing Kolla images. Use existing tools to manage
   the image, the configuration file(s) and deployment of the Galera
   service in a highly available and scalable fashion.

3. RabbitMQ: Implement a RabbitMQ container image that follows best
   practices used by existing Kolla images. Use existing tools to manage
   the image, the configuration file(s) and deployment of the RabbitMQ
   service in a highly available and scalable fashion.

4. APIs: Implement HAProxy/Keepalived container images that follow best
   practices used by existing Kolla images. Use existing tools to manage
   the images, the configuration file(s) and deployment of the
   HAProxy/Keepalived services in a highly available and scalable fashion.

5. Update Existing Images: Update existing container images with the
   necessary configuration to support HA. For example, OpenStack services
   should use the virtual IP of the Galera cluster to communicate with the
   database instead of an IP assigned to an individual database instance
   (see the sketch following this list).

6. Update Existing Deployment Automation: Update automation scripts,
   playbooks, etc. to support additional container images, configuration
   parameters, etc. introduced by the HA components.

7. Testing: Follow the guidance provided in the performance impact section
   to test how the OpenStack environment responds and performs in various
   failure scenarios.
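
The kind of change item 5 describes is sketched below for nova.conf; the
virtual IP, credentials and database name are placeholders, and the exact
section used depends on the service and release::

    # nova.conf fragment -- point the service at the Galera VIP (sketch)
    [database]
    connection = mysql://nova:NOVA_DBPASS@10.0.0.250/nova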

Assignee(s)
-----------

Primary assignee:

  kolla maintainers

Work Items
----------

1. Deploy existing Kolla across multiple physical hosts.
2. Create containers for HAProxy and Keepalived.
3. Add Galera support to the existing MariaDB container set.
4. Add clustering/mirrored queue support to the RabbitMQ container set.
5. Add L3/DHCP agent HA to the existing Neutron agent container set.
6. Create Swift containers.
7. Add/configure the Glance backend to support HA and scalability.
8. Add/configure HAProxy for services such as Keystone and Horizon.

Testing
=======

It is currently unclear whether the gating system allows deployments
consisting of more than one VM, so we do not yet know how to test
multi-node deployments in CI/CD. As a result, we will rely on manual
testing of the solution as a starting point.

Documentation Impact
====================

The integration-guide.md should be updated to include the additional
key/value pairs introduced by HA. Additionally, a document should be
developed explaining how to deploy and configure Kolla in a highly
available fashion.

References
==========

* [VRRP] http://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol