Composable HA services
Spec to implement composable services for pacemaker-managed services

Change-Id: Ia28d3fe85b8a53e630df42198cafd9eccce595fb
specs/ocata/composable-ha-architecture.rst | 201 (new file)
@@ -0,0 +1,201 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================
Composable HA architecture
==========================

https://blueprints.launchpad.net/tripleo/+spec/composable-ha

Since Newton, we have the following services managed by pacemaker:

* Cloned and master/slave resources:
  galera, redis, haproxy, rabbitmq

* Active/Passive resources:
  VIPs, cinder-volume, cinder-backup, manila-share

It is currently not possible to compose the above services in the same way we
do today via composable roles for the non-pacemaker services. This spec aims
to address this limitation and let the operator be more flexible in the
composition of the control plane.

Problem Description
===================

Currently TripleO has implemented no logic whatsoever to assign specific
pacemaker-managed services to roles/nodes.

* Since we do not have a lot in terms of hard performance data, we typically
  support three controller nodes. This is perceived as a scalability-limiting
  factor and there is a general desire to be able to assign specific nodes to
  specific pacemaker-managed services (e.g. three nodes only for galera, five
  nodes only for rabbitmq).

* Right now, if the operator deploys on N controllers, they will get N cloned
  instances of the non-A/P pacemaker services on the same N nodes. We want to
  be much more flexible, e.g. deploy galera on the first 3 nodes, rabbitmq on
  the remaining 5 nodes, etc.

* It is also desirable for the operator to be able to choose on which nodes
  the A/P resources will run.

* We also currently have a scalability limit of 16 nodes for the pacemaker
  cluster.

Proposed Change
===============

Overview
--------

The proposal here is to keep the existing cluster in its current form, but to
extend it in two ways:

A) Allow the operator to include a specific service in a custom node and have
   pacemaker run that resource only on that node. E.g. the operator can
   define the following custom nodes:

* Node A

  * pacemaker
  * galera

* Node B

  * pacemaker
  * rabbitmq

* Node C

  * pacemaker
  * VIPs, cinder-volume, cinder-backup, manila-share, redis, haproxy

With the above definition the operator can instantiate any number of A, B or
C nodes and scale up to a total of 16 nodes. Pacemaker will place the
resources only on the appropriate nodes.
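
To make this concrete, a minimal sketch of how nodes could be tagged with
pacemaker node properties (node and attribute names here are purely
illustrative, not the final implementation)::

    # Tag each deployed node with a property describing the services
    # its role carries
    pcs property set --node overcloud-nodea-0 galera-role=true
    pcs property set --node overcloud-nodeb-0 rabbitmq-role=true
    pcs property set --node overcloud-nodec-0 haproxy-role=true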

B) Allow the operator to extend the cluster beyond 16 nodes via pacemaker
   remote. For example an operator could define the following:

* Node A

  * pacemaker
  * galera
  * rabbitmq

* Node B

  * pacemaker-remote
  * redis

* Node C

  * pacemaker-remote
  * VIPs, cinder-volume, cinder-backup, manila-share, redis, haproxy

This second scenario would allow an operator to extend beyond the 16-node
limit. The only difference to scenario A) is that the quorum of the cluster
is formed only by the Node A nodes.
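
As an illustrative sketch (names and addresses are made up), a node such as
Node B could be joined to the cluster through the stock ocf:pacemaker:remote
resource agent::

    # Run on a full cluster member: registers overcloud-nodeb-0 as a
    # pacemaker remote node reachable at the given address
    pcs resource create overcloud-nodeb-0 ocf:pacemaker:remote \
        server=192.168.24.15 reconnect_interval=60 op monitor interval=20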

The way this would work is that the placement on nodes would be controlled by
location rules that match on node properties.
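
A hedged sketch of such a location rule, matching the node properties from
the earlier tagging example (resource and attribute names are illustrative)::

    # Run galera only on nodes tagged with galera-role=true;
    # resource-discovery=exclusive also prevents pacemaker from probing
    # the resource on non-matching nodes
    pcs constraint location galera-master rule resource-discovery=exclusive \
        score=0 galera-role eq true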

Alternatives
------------

A number of alternative designs were discussed and evaluated:

A) A cluster per service:

   One possible architecture would be to create a separate pacemaker cluster
   for each HA service. This has been ruled out mainly for the following
   reasons:

   * It cannot be done outside of containers
   * It would create a lot of network traffic
   * It would increase the management/monitoring burden of the pacemaker
     resources and clusters exponentially
   * Each service would still be limited to 16 nodes
   * A new container fencing agent would have to be written
B) A single cluster where only the clone-max property is set for the non-A/P
   services:

   This would still be a single cluster but, unlike today where the cloned
   and master/slave resources run on every controller, we would introduce
   variables to control the maximum number of nodes a resource could run on.
   E.g. GaleraResourceCount would set clone-max to a value different from the
   number of controllers. Example: 10 controllers, galera has clone-max set
   to 3, rabbit to 5 and redis to 3.
   While this would be rather simple to implement and would change very
   little in the current semantics, this design was ruled out because:

   * We'd still have the 16-node limit
   * It would not provide fine-grained control over which services live on
     which nodes
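
   For reference, clone-max is a standard pacemaker clone meta attribute, so
   this alternative would have amounted to little more than the following
   (resource name is illustrative)::

       # Cap the galera clone at 3 instances on a 10-controller cluster
       pcs resource meta galera-master clone-max=3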

Security Impact
---------------

No changes regarding security compared to the status quo.

Other End User Impact
---------------------

No particular impact, except added flexibility in placing pacemaker-managed
resources.

Performance Impact
------------------

The performance impact here is that, with the added scalability, it becomes
possible for an operator to dedicate specific nodes to certain
pacemaker-managed services. There are no changes in terms of code, only a
more flexible and scalable way to deploy services on the control plane.

Other Deployer Impact
---------------------

This proposal aims to use the same method that the custom roles introduced in
Newton use to tailor the services running on a node. With the very same
method, it will be possible to do that for the HA services managed by
pacemaker today.

Developer Impact
----------------

No impact

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  michele

Other contributors:
  cmsj, abeekhof

Work Items
----------

We need to work on the following:

1. Add location rule constraints support in puppet
2. Make puppet-tripleo set node properties on the nodes where a service
   profile is applied
3. Create the corresponding location rules
4. Add a puppet-tripleo pacemaker-remote profile

Dependencies
============

No additional dependencies are required.

Testing
=======

We will need to test the flexible placement of the pacemaker-managed services
in the CI. This can be done within today's CI limitations (i.e. in the
three-controller HA job we can make sure that the placement is customized and
working).

Documentation Impact
====================

No impact

References
==========

Mostly internal discussions within the HA team at Red Hat.