Composable HA services
Spec to implement composable services for pacemaker-managed services

Change-Id: Ia28d3fe85b8a53e630df42198cafd9eccce595fb
parent 5becb56cc0
commit 9b4df72acd

specs/ocata/composable-ha-architecture.rst | 201 lines (new file)
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================
Composable HA architecture
==========================

https://blueprints.launchpad.net/tripleo/+spec/composable-ha
Since Newton, we have the following services managed by pacemaker:

* Cloned and master/slave resources:
  galera, redis, haproxy, rabbitmq

* Active/Passive resources:
  VIPs, cinder-volume, cinder-backup, manila-share

It is currently not possible to compose the above services in the same way
as we do today via composable roles for the non-pacemaker services. This
spec aims to address this limitation and let the operator be more flexible
in the composition of the control plane.
Problem Description
===================

Currently TripleO implements no logic to assign specific pacemaker-managed
services to roles/nodes.

* Since we do not have much hard performance data, we typically support
  three controller nodes. This is perceived as a factor limiting scalability,
  and there is a general desire to be able to assign specific nodes to
  specific pacemaker-managed services (e.g. three nodes only for galera,
  five nodes only for rabbitmq).

* Right now, if the operator deploys on N controllers, they will get N cloned
  instances of the non-A/P pacemaker services on the same N nodes. We want to
  be much more flexible: e.g. deploy galera on the first 3 nodes, rabbitmq on
  the remaining 5 nodes, etc.

* It is also desirable for the operator to be able to choose on which nodes
  the A/P resources will run.

* We also currently have a scalability limit of 16 nodes for the pacemaker
  cluster.
Proposed Change
===============

Overview
--------

The proposal here is to keep the existing cluster in its current form, but to
extend it in two ways:

A) Allow the operator to include a specific service in a custom node type and
   have pacemaker run that resource only on nodes of that type. E.g. the
   operator can define the following custom nodes:

   * Node A
     pacemaker
     galera

   * Node B
     pacemaker
     rabbitmq

   * Node C
     pacemaker
     VIPs, cinder-volume, cinder-backup, manila-share, redis, haproxy

   With the above definition the operator can instantiate any number of A, B
   or C nodes and scale up to a total of 16 nodes. Pacemaker will place the
   resources only on the appropriate nodes.
B) Allow the operator to extend the cluster beyond 16 nodes via pacemaker
   remote. For example, an operator could define the following:

   * Node A
     pacemaker
     galera
     rabbitmq

   * Node B
     pacemaker-remote
     redis

   * Node C
     pacemaker-remote
     VIPs, cinder-volume, cinder-backup, manila-share, redis, haproxy

   This second scenario would allow an operator to extend beyond the 16-node
   limit. The only difference from scenario A) is that the quorum of the
   cluster is formed only by the Node A nodes.
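
   To make the quorum point concrete: a pacemaker-remote node joins the
   cluster through an ``ocf:pacemaker:remote`` resource instead of becoming
   a corosync member, so it can host resources without counting towards
   quorum or the 16-node corosync limit. A minimal sketch with pcs (the
   resource name and address are hypothetical, and in practice puppet-tripleo
   would drive this rather than the operator)::

     # Run on one of the Node A cluster members: register a "Node B" machine
     # as a remote node by creating the connection resource for it.
     pcs resource create overcloud-noderemote-0 ocf:pacemaker:remote \
         server=192.168.24.15 reconnect_interval=60 op monitor interval=20

   The remote node can then be targeted by the same location rules as any
   full cluster member, while quorum is still computed only from the Node A
   nodes.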

The way this would work is that the placement on nodes would be controlled
by location rules that match on node properties.
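
As an illustration of the mechanism (the node, property and resource names
below are examples only, and the commands would be issued by puppet-tripleo
rather than by hand), nodes of the "Node A" and "Node B" types from the
overview would be tagged with a property, and each resource would get a
location rule matching it::

  # Tag the nodes according to the services their node type carries.
  pcs property set --node overcloud-nodea-0 galera-role=true
  pcs property set --node overcloud-nodeb-0 rabbitmq-role=true

  # Constrain each resource to the nodes carrying the matching property;
  # resource-discovery=exclusive also keeps pacemaker from probing the
  # resource on nodes that will never run it.
  pcs constraint location galera-master rule resource-discovery=exclusive \
      score=0 galera-role eq true
  pcs constraint location rabbitmq-clone rule resource-discovery=exclusive \
      score=0 rabbitmq-role eq true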

Alternatives
------------

A number of alternative designs were discussed and evaluated:

A) A cluster per service:

   One possible architecture would be to create a separate pacemaker cluster
   for each HA service. This has been ruled out mainly for the following
   reasons:

   * It cannot be done outside of containers
   * It would create a lot of network traffic
   * It would increase the management/monitoring overhead of the pacemaker
     resources and clusters considerably
   * Each service would still be limited to 16 nodes
   * A new container fencing agent would have to be written

B) A single cluster where only the clone-max property is set for the non-A/P
   services:

   This would still be a single cluster, but unlike today, where the cloned
   and master/slave resources run on every controller, we would introduce
   variables to control the maximum number of nodes a resource could run on.
   E.g. a GaleraResourceCount parameter would set clone-max to a value
   different from the number of controllers. Example: 10 controllers, galera
   has clone-max set to 3, rabbitmq to 5 and redis to 3.
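
   For illustration, this would boil down to setting the clone-max meta
   attribute on the existing resources (resource names are examples only)::

     # 10 controllers, but only 3 galera, 5 rabbitmq and 3 redis instances;
     # pacemaker, not the operator, picks which of the 10 nodes run them.
     pcs resource meta galera-master clone-max=3
     pcs resource meta rabbitmq-clone clone-max=5
     pcs resource meta redis-master clone-max=3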

   While this would be rather simple to implement and would change very
   little in the current semantics, this design was ruled out:

   * We would still have the 16-node limit
   * It would not provide fine-grained control over which services live on
     which nodes
Security Impact
---------------

No changes regarding security aspects compared to the existing status quo.

Other End User Impact
---------------------

No particular impact except added flexibility in placing pacemaker-managed
resources.

Performance Impact
------------------

The performance impact here is that, with the added scalability, it will be
possible for an operator to dedicate specific nodes to certain
pacemaker-managed services. There are no changes in terms of code, only a
more flexible and scalable way to deploy services on the control plane.

Other Deployer Impact
---------------------

This proposal aims to use the same method that the custom roles introduced
in Newton use to tailor the services running on a node. With the very same
method it will be possible to do that for the HA services managed by
pacemaker today.

Developer Impact
----------------

No impact
Implementation
==============

Assignee(s)
-----------

Primary assignee:
  michele

Other contributors:
  cmsj, abeekhof

Work Items
----------

We need to work on the following:

1. Add location rule constraints support in puppet
2. Make puppet-tripleo set node properties on the nodes where a service
   profile is applied
3. Create the corresponding location rules
4. Add a puppet-tripleo pacemaker-remote profile
Dependencies
============

No additional dependencies are required.

Testing
=======

We will need to test the flexible placement of the pacemaker-managed services
within the CI. This can be done within today's CI limitations (i.e. in the
three-controller HA job we can make sure that the placement is customized and
working).
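
For example, after deployment the HA job could assert the placement with
checks along these lines (resource and property names as in the illustrative
examples above)::

  # The location rules must exist and every pacemaker-managed resource must
  # only be running on the nodes that carry the matching property.
  sudo pcs constraint location
  sudo pcs status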

Documentation Impact
====================

No impact

References
==========

Mostly internal discussions within the HA team at Red Hat