[[ch-pacemaker]] === The Pacemaker Cluster Stack OpenStack infrastructure high availability relies on the http://www.clusterlabs.org[Pacemaker] cluster stack, the state-of-the-art high availability and load balancing stack for the Linux platform. Pacemaker is storage and application-agnostic, and is in no way specific to OpenStack. Pacemaker relies on the http://www.corosync.org[Corosync] messaging layer for reliable cluster communications. Corosync implements the Totem single-ring ordering and membership protocol. It also provides UDP and InfiniBand based messaging, quorum, and cluster membership to Pacemaker. Pacemaker interacts with applications through _resource agents_ (RAs), of which it supports over 70 natively. Pacemaker can also easily use third-party RAs. An OpenStack high-availability configuration uses existing native Pacemaker RAs (such as those managing MySQL databases or virtual IP addresses), existing third-party RAs (such as for RabbitMQ), and native OpenStack RAs (such as those managing the OpenStack Identity and Image Services). ==== Installing Packages On any host that is meant to be part of a Pacemaker cluster, you must first establish cluster communications through the Corosync messaging layer. This involves installing the following packages (and their dependencies, which your package manager will normally install automatically): * +pacemaker+ * +corosync+ * +cluster-glue+ * +fence-agents+ (Fedora only; all other distributions use fencing agents from +cluster-glue+) * +resource-agents+ ==== Setting up Corosync Besides installing the +corosync+ package, you will also have to create a configuration file, stored in +/etc/corosync/corosync.conf+. Most distributions ship an example configuration file (+corosync.conf.example+) as part of the documentation bundled with the +corosync+ package. An example Corosync configuration file is shown below: .Corosync configuration file (+corosync.conf+) ---- include::includes/corosync.conf[] ---- <1> The +token+ value specifies the time, in milliseconds, during which the Corosync token is expected to be transmitted around the ring. When this timeout expires, the token is declared lost, and after +token_retransmits_before_loss_const+ lost tokens the non-responding _processor_ (cluster node) is declared dead. In other words, +token+ × +token_retransmits_before_loss_const+ is the maximum time a node is allowed to not respond to cluster messages before being considered dead. The default for +token+ is 1000 (1 second), with 4 allowed retransmits. These defaults are intended to minimize failover times, but can cause frequent "false alarms" and unintended failovers in case of short network interruptions. The values used here are safer, albeit with slightly extended failover times. <2> With +secauth+ enabled, Corosync nodes mutually authenticate using a 128-byte shared secret stored in +/etc/corosync/authkey+, which may be generated with the +corosync-keygen+ utility. When using +secauth+, cluster communications are also encrypted. <3> In Corosync configurations using redundant networking (with more than one +interface+), you must select a Redundant Ring Protocol (RRP) mode other than +none+. +active+ is the recommended RRP mode. <4> There are several things to note about the recommended interface configuration: * The +ringnumber+ must differ between all configured interfaces, starting with 0. * The +bindnetaddr+ is the _network_ address of the interfaces to bind to. The example uses two network addresses of +/24+ IPv4 subnets. * Multicast groups (+mcastaddr+) _must not_ be reused across cluster boundaries. In other words, no two distinct clusters should ever use the same multicast group. Be sure to select multicast addresses compliant with http://www.ietf.org/rfc/rfc2365.txt[RFC 2365, "Administratively Scoped IP Multicast"]. * For firewall configurations, note that Corosync communicates over UDP only, and uses +mcastport+ (for receives) and +mcastport+-1 (for sends). <5> The +service+ declaration for the +pacemaker+ service may be placed in the +corosync.conf+ file directly, or in its own separate file, +/etc/corosync/service.d/pacemaker+. Once created, the +corosync.conf+ file (and the +authkey+ file if the +secauth+ option is enabled) must be synchronized across all cluster nodes. ==== Starting Corosync Corosync is started as a regular system service. Depending on your distribution, it may ship with a LSB (System V style) init script, an upstart job, or a systemd unit file. Either way, the service is usually named +corosync+: * +/etc/init.d/corosync start+ (LSB) * +service corosync start+ (LSB, alternate) * +start corosync+ (upstart) * +systemctl start corosync+ (systemd) You can now check the Corosync connectivity with two tools. The +corosync-cfgtool+ utility, when invoked with the +-s+ option, gives a summary of the health of the communication rings: ---- # corosync-cfgtool -s Printing ring status. Local node ID 435324542 RING ID 0 id = 192.168.42.82 status = ring 0 active with no faults RING ID 1 id = 10.0.42.100 status = ring 1 active with no faults ---- The +corosync-objctl+ utility can be used to dump the Corosync cluster member list: ---- # corosync-objctl runtime.totem.pg.mrp.srp.members runtime.totem.pg.mrp.srp.435324542.ip=r(0) ip(192.168.42.82) r(1) ip(10.0.42.100) runtime.totem.pg.mrp.srp.435324542.join_count=1 runtime.totem.pg.mrp.srp.435324542.status=joined runtime.totem.pg.mrp.srp.983895584.ip=r(0) ip(192.168.42.87) r(1) ip(10.0.42.254) runtime.totem.pg.mrp.srp.983895584.join_count=1 runtime.totem.pg.mrp.srp.983895584.status=joined ---- You should see a +status=joined+ entry for each of your constituent cluster nodes. ==== Starting Pacemaker Once the Corosync services have been started, and you have established that the cluster is communicating properly, it is safe to start +pacemakerd+, the Pacemaker master control process: * +/etc/init.d/pacemaker start+ (LSB) * +service pacemaker start+ (LSB, alternate) * +start pacemaker+ (upstart) * +systemctl start pacemaker+ (systemd) Once Pacemaker services have started, Pacemaker will create a default empty cluster configuration with no resources. You may observe Pacemaker's status with the +crm_mon+ utility: ---- ============ Last updated: Sun Oct 7 21:07:52 2012 Last change: Sun Oct 7 20:46:00 2012 via cibadmin on node2 Stack: openais Current DC: node2 - partition with quorum Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c 2 Nodes configured, 2 expected votes 0 Resources configured. ============ Online: [ node2 node1 ] ---- ==== Setting basic cluster properties Once your Pacemaker cluster is set up, it is recommended to set a few basic cluster properties. To do so, start the +crm+ shell and change into the configuration menu by entering +configure+. Alternatively. you may jump straight into the Pacemaker configuration menu by typing +crm configure+ directly from a shell prompt. Then, set the following properties: ---- include::includes/pacemaker-properties.crm[] ---- <1> Setting +no-quorum-policy="ignore"+ is required in 2-node Pacemaker clusters for the following reason: if quorum enforcement is enabled, and one of the two nodes fails, then the remaining node can not establish a _majority_ of quorum votes necessary to run services, and thus it is unable to take over any resources. The appropriate workaround is to ignore loss of quorum in the cluster. This is safe and necessary _only_ in 2-node clusters. Do not set this property in Pacemaker clusters with more than two nodes. <2> Setting +pe-warn-series-max+, +pe-input-series-max+ and +pe-error-series-max+ to 1000 instructs Pacemaker to keep a longer history of the inputs processed, and errors and warnings generated, by its Policy Engine. This history is typically useful in case cluster troubleshooting becomes necessary. <3> Pacemaker uses an event-driven approach to cluster state processing. However, certain Pacemaker actions occur at a configurable interval, +cluster-recheck-interval+, which defaults to 15 minutes. It is usually prudent to reduce this to a shorter interval, such as 5 or 3 minutes. Once you have made these changes, you may +commit+ the updated configuration.