..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

======================
High-Availability
======================

Kolla deployments will be maturing from evaluation-level environments to
highly available and scalable environments supporting production
applications and services. The goal for Kolla HA is to provide active/active,
highly available and independently scalable OpenStack services. This spec
focuses on providing high availability within a single Kolla-based
OpenStack deployment.

Problem description
===================

OpenStack consists of several core and infrastructure services. The services
work in concert to provide a cloud computing environment. By default, the
services are single points of failure and are unable to scale beyond a single
instance.

Use cases
---------
1. Fail any one or a combination of OpenStack core and/or infrastructure
   services and the overall system continues to operate.
2. Scale any one or a combination of OpenStack core and/or infrastructure
   services without any interruption of service.

Proposed change
===============

The spec consists of the following components for providing high
availability and scalability to Kolla-based OpenStack services:

MySQL Galera: A synchronous multi-master database clustering technology
for MySQL/InnoDB databases that includes features such as:

* Synchronous replication
* Active/active multi-master topology
* Read and write to any cluster node
* True parallel replication, on row level
* Direct client connections, native MySQL look & feel
* No slave lag or integrity issues

**Note:** Although Galera supports an active/active multi-master topology
with multiple active writers, a few OpenStack services cause database
deadlocks in this scenario. This is because these services use the
SELECT ... FOR UPDATE SQL construct, even though it is [documented][]
that they should not. It appears that [Nova][] has recently fixed the issue
and [Neutron][] is making progress.

[documented]: https://github.com/openstack/nova/blob/da59d3228125d7e7427c0ba70180db17c597e8fb/nova/openstack/common/db/sqlalchemy/session.py#L180-196
[Nova]: https://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/lock-free-quota-management.html
[Neutron]: https://bugs.launchpad.net/neutron/+bug/1364358 https://bugs.launchpad.net/neutron/+bug/1331564

Testing should be performed as part of the Galera implementation to verify
whether the recent patches address the deadlock issues. If so, multiple
active/active writers can exist within the Galera cluster. If not, then
the initial implementation should continue using the well-documented
workaround until the issue is completely resolved.

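While that testing is pending, one way to reason about the deadlock behaviour
is to look at how a caller might react to it. The following is a minimal
sketch, not the workaround the spec refers to and not code from any OpenStack
project: `DeadlockError`, `retry_on_deadlock` and its parameters are
hypothetical names introduced only for illustration. It retries a database
operation a bounded number of times when a deadlock is reported.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the deadlock error raised by a DB driver."""


def retry_on_deadlock(operation, max_retries=5, base_delay=0.0):
    """Call operation(); retry with exponential backoff on DeadlockError.

    max_retries and base_delay are illustrative defaults, not tuned values.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == max_retries:
                # Out of retries: let the caller see the failure.
                raise
            time.sleep(base_delay * (2 ** attempt))
```

A caller would wrap a whole transaction in `operation`, so that a retry
re-runs the reads as well as the writes.
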
Several OpenStack services utilize a message queuing system to send and
receive messages between software components. The intention of this
spec is to leverage RabbitMQ as the messaging system since it is most
commonly used within the OpenStack community. Clustering and RabbitMQ
Mirrored Queues provide active/active and highly scalable message
queuing for OpenStack services.

* RabbitMQ Clustering: If the RabbitMQ broker consists of a single node,
  then a failure of that node will cause downtime, temporary
  unavailability of service, and potentially loss of messages. A cluster
  of RabbitMQ nodes can be used to construct the RabbitMQ broker.
  A cluster of RabbitMQ nodes is resilient to the loss of individual nodes
  in terms of overall availability of the service. All data/state
  required for the operation of a RabbitMQ broker is replicated across
  all nodes, for reliability and scaling. The exception is message
  queues, which by default reside on the node that created them, though
  they are visible and reachable from all nodes.

* RabbitMQ Mirrored Queues: While exchanges and bindings survive the loss
  of individual nodes, message queues and their messages do not. This is
  because a queue and its contents reside on exactly one node, so the
  loss of that node renders its queues unavailable. To solve this
  problem, RabbitMQ has developed active/active high-availability
  for message queues. This works by allowing queues to be mirrored on
  other nodes within a RabbitMQ cluster. The result is that should one
  node of a cluster fail, the queue can automatically switch to one of the
  mirrors and continue to operate, with no unavailability of service. This
  solution still requires a RabbitMQ cluster, which means that it will not
  cope seamlessly with network partitions within the cluster and, for that
  reason, is not recommended for use across a WAN (though of course,
  clients can still connect from as near and as far as needed).

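The difference between a plain clustered queue and a mirrored queue can be
sketched with a toy model. This is an illustration only and does not use any
RabbitMQ API: the `MirroredQueue` class, the node names and the rule of
promoting the first mirror in the list are assumptions made for the example.

```python
class MirroredQueue:
    """Toy model of a queue whose contents are copied to mirror nodes."""

    def __init__(self, name, master, mirrors=()):
        self.name = name
        self.master = master           # node currently serving the queue
        self.mirrors = list(mirrors)   # nodes holding a copy, in join order
        self.messages = []             # contents shared by master and mirrors

    def publish(self, message):
        self.messages.append(message)

    def node_failed(self, node):
        """React to the loss of a cluster node."""
        if node in self.mirrors:
            self.mirrors.remove(node)
        elif node == self.master:
            if self.mirrors:
                # A mirror takes over and the messages stay available.
                self.master = self.mirrors.pop(0)
            else:
                # No mirror: the queue and its contents are gone.
                self.master = None
                self.messages = []
```

With no mirrors the model behaves like the clustering-only case described
above: losing the hosting node loses the queue. With at least one mirror, the
same failure leaves the queue reachable on another node.
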
HAProxy provides load-balancing between clients and OpenStack API endpoints.
HAProxy is a free, very fast and reliable software solution offering high
availability, load balancing, and proxying for TCP and HTTP-based
applications. HAProxy implements an event-driven, single-process model
which enables support for a high number of simultaneous connections.

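As a rough illustration of what the load-balancing layer does for an API
endpoint, the sketch below spreads requests over several backends and skips
any backend marked as failed. The spec does not name a balancing algorithm, so
round-robin is an assumption here, and `RoundRobinBalancer` and the backend
names are hypothetical. It is not how HAProxy is implemented or configured.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in turn, skipping those marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")
```

Marking a backend down removes it from rotation without affecting the others,
which is the property the use cases in this spec rely on.
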
Memcached is required by nova-consoleauth, Horizon and Swift Proxy to store
ephemeral data such as tokens. Memcached does not support typical forms of
redundancy such as clustering. However, OpenStack services can support more
than one Memcached instance by configuring multiple hostnames or IP addresses.
The Memcached client implements hashing to balance objects among the
instances. Failure of a Memcached instance only impacts a portion of the objects
and the client automatically removes it from the list of instances.

An example configuration with two hosts:
'memcached_servers = kolla1:11211,kolla2:11211'

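The client-side behaviour described above can be sketched as follows. This is
a simplified illustration, not the code of any particular Memcached client
library: `parse_servers` and `HashingClient` are hypothetical names, and the
CRC32-modulo scheme is an assumption chosen for brevity.

```python
import zlib


def parse_servers(value):
    """Split 'host:port,host:port' into a list of (host, port) tuples."""
    servers = []
    for item in value.split(","):
        host, _, port = item.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


class HashingClient:
    """Pick a Memcached instance for each key by hashing the key."""

    def __init__(self, servers):
        self.servers = list(servers)

    def server_for(self, key):
        if not self.servers:
            raise RuntimeError("no Memcached instance available")
        index = zlib.crc32(key.encode("utf-8")) % len(self.servers)
        return self.servers[index]

    def mark_dead(self, server):
        # Drop a failed instance; later lookups only use the survivors.
        if server in self.servers:
            self.servers.remove(server)
```

For example, `HashingClient(parse_servers("kolla1:11211,kolla2:11211"))`
spreads keys over both hosts. With plain modulo hashing, removing an instance
can also move keys that lived on surviving instances; clients that use
consistent hashing reduce that effect.
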
Memcached does not implement authentication and therefore is insecure.
This risk is reduced in default deployments because Memcached is not exposed
outside of localhost. Since deploying Memcached in a highly-available manner
requires exposing Memcached outside of localhost, precautions should be taken
to reduce this risk.

Keepalived is routing software that provides simple and robust facilities
for load-balancing Linux systems. Keepalived will track the HAProxy process
and fail over and back between HAProxy instances with minimal service
interruption.
Keepalived implements the Virtual Router Redundancy Protocol (VRRP).
VRRP creates virtual routers, which are an abstract representation of
multiple routers, i.e. master and backup routers, acting as a group.
The default gateway of a participating host is assigned to the
virtual router instead of a physical router. If the physical router that
is routing packets on behalf of the virtual router fails, another physical
router is selected to automatically replace it. The physical router that
is forwarding packets at any given time is called the master router.

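The master selection described above can be sketched as a small election
function. This is a simplified model, not Keepalived code: the `Router` class,
the node names and the addresses are hypothetical. It follows the VRRP rule
that the live router with the highest priority becomes master, with the higher
IP address breaking ties.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    priority: int       # higher value wins the election
    address: str        # primary IP address, used as a tie-breaker
    alive: bool = True


def elect_master(routers):
    """Return the router that should hold the virtual IP, or None."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (r.priority, ipaddress.ip_address(r.address)),
    )
```

Re-running `elect_master` after setting `alive = False` on the current master
models the failover: the next-best router takes over the virtual address.
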
Neutron: HAProxy will provide high-availability and scalability to the
neutron-server service by load-balancing traffic across multiple instances
of the service. RabbitMQ clustering and mirrored queues will be used to
provide high-availability and scalability for RPC messaging among Neutron
components. Galera will provide high-availability and scalability to Neutron
persistent data. Multiple Neutron Agents will be deployed across separate
nodes. [Multiple L3/DHCP Agents][] will be assigned per tenant network to
provide network high-availability for tenant networks.

[Multiple L3/DHCP Agents]: https://wiki.openstack.org/wiki/Neutron/L3_High_Availability_VRRP

Glance: Glance can use different back ends to store OpenStack images. Examples
include Swift, Ceph, Cinder and the default filesystem back end. Although
it is not required, it is highly recommended to use a back end that is highly
scalable and fault-tolerant.

Just as with the rest of the OpenStack APIs, HAProxy and Keepalived can
provide high-availability and scalability to the Glance API and Registry
endpoints.

Swift: Multiple Swift Proxy services can be used to provide high
availability to the Swift object, account and container storage rings.
Standard Swift replication provides high-availability to data stored within
a Swift object storage system. The replication processes compare local data
with each remote copy to ensure they all contain the latest version. Object
replication uses a hash list to quickly compare subsections of each
partition. Container and account replication use a combination of
hashes and shared high water marks.

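The hash-list comparison used by object replication can be illustrated with a
small sketch. This is a simplification under stated assumptions and not
Swift's implementation: the in-memory partition layout (subsection name to
object name to content), the function names and the use of SHA-256 are all
choices made for the example.

```python
import hashlib


def subsection_hashes(partition):
    """Map each subsection of a partition to a hash of its contents.

    partition: dict of subsection name -> dict of object name -> bytes.
    """
    hashes = {}
    for subsection, objects in partition.items():
        digest = hashlib.sha256()
        for name in sorted(objects):
            digest.update(name.encode("utf-8"))
            digest.update(b"\0")
            digest.update(objects[name])
            digest.update(b"\0")
        hashes[subsection] = digest.hexdigest()
    return hashes


def subsections_to_sync(local, remote):
    """Return the local subsections whose hash differs from the remote copy."""
    local_hashes = subsection_hashes(local)
    remote_hashes = subsection_hashes(remote)
    return sorted(
        name for name, value in local_hashes.items()
        if remote_hashes.get(name) != value
    )
```

Only the subsections returned by `subsections_to_sync` would need to be
examined in detail, which is why comparing hash lists is quicker than
comparing every object.
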
Cinder: As with other stateless services, HAProxy can provide high
availability and scalability to cinder-api. RabbitMQ clustering and mirrored
queues can provide high-availability and scalability for RPC messaging among
Cinder services. At the time of this writing, the only Cinder back end
supported is LVM. LVM can be made [highly-available][], or a new Cinder
back end that supports high availability and scalability for tenant-facing
block storage services, such as [Ceph][], can be added to Kolla.
Due to limitations described [here][], the Cinder volume manager cannot
be reliably deployed in an active/active or active/passive fashion.

[highly-available]: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/ap-ha-halvm-CA.html
[Ceph]: http://ceph.com/
[here]: https://bugs.launchpad.net/cinder/+bug/1322190

In general, the Kolla HA environment looks like:

![Image](https://opendev.org/openstack/kolla/raw/branch/master/specs/ha.svg)

Security impact
---------------

Keystone UUID versus PKI tokens: tokens are used as a mechanism to
authenticate API requests of users. Keystone supports UUID and
PKI token formats. PKI tokens provide better security, but are more
difficult to deploy in an active/active manner. Therefore,
it is recommended to start with UUID tokens and add PKI tokens
in a future iteration.

Performance Impact
------------------

The proposed high-availability spec should improve the overall performance of
Kolla-based OpenStack clouds under load. A slight increase in per-request
latency can be expected due to the additional hop introduced by the
load-balancing layer, but this added latency is insignificant. Since this layer
provides intelligent load-balancing of services, improved performance can be
expected for services under moderate-to-heavy load. Without the intelligence
provided by the load-balancing layer, overloaded services can become degraded
and a decrease in performance can be expected.

Performance tests should be conducted for the following scenarios to validate
and/or improve the HA spec:

1. The HA environment is functioning as expected.
2. One or more API services are in a failed state.
3. One or more Galera instances are in a failed state.
4. One or more RabbitMQ Brokers are in a failed state.
5. Adding services to or removing services from the HA environment.

Implementation
==============

Generally, the implementation should start off simple and add capabilities
through development iterations. The implementation can be organized as follows:

1. Multi-node: In order to implement HA, Kolla must first support being deployed
   to multiple hosts.

2. Database: Implement a Galera container image that follows best practices
   used by existing Kolla images. Use existing tools to manage the image,
   the configuration file(s) and deployment of the Galera service in a highly
   available and scalable fashion.

3. RabbitMQ: Implement a RabbitMQ container image that follows best practices
   used by existing Kolla images. Use existing tools to manage the image,
   the configuration file(s) and deployment of the RabbitMQ service in a highly
   available and scalable fashion.

4. APIs: Implement HAProxy/Keepalived container images that follow best practices
   used by existing Kolla images. Use existing tools to manage the images,
   the configuration file(s) and deployment of the HAProxy and Keepalived
   services in a highly available and scalable fashion.

5. Update Existing Images: Update existing container images with the necessary
   configuration to support HA. For example, OpenStack services should use the
   Virtual IP of the Galera cluster to communicate with the DB instead of an
   IP assigned to an individual DB instance.

6. Update Existing Deployment Automation: Update automation scripts, playbooks,
   etc. to support additional container images, configuration parameters, etc.
   introduced by the HA components.

7. Testing: Follow the guidance provided in the performance impact section to
   test how the OpenStack environment responds and performs in various failure
   scenarios.

Assignee(s)
-----------

Primary assignee:

kolla maintainers

Work Items
----------

1. Deploy existing Kolla across multiple physical hosts.
2. Create containers for HAProxy and Keepalived.
3. Add Galera support to the existing MariaDB container set.
4. Add clustering/mirrored queue support to the RabbitMQ container set.
5. Add L3/DHCP agent HA to the existing Neutron agent container set.
6. Create Swift containers.
7. Add/configure the Glance back end to support HA and scalability.
8. Add/configure HAProxy for services such as Keystone or Horizon.

Testing
=======

It is not yet clear how to test multi-node deployment in CI/CD, because
we are unsure whether the gating system allows for deployments
consisting of more than one VM. As a result, we will rely on manual
testing of the solution as a starting point.

Documentation Impact
====================

The integration-guide.md should be updated to include additional K/V
pairs introduced by HA. Additionally, a document should be developed
explaining how to deploy and configure Kolla in a highly-available
fashion.

References
==========

* [VRRP] http://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol