[[ch-pacemaker]]
=== The Pacemaker Cluster Stack

OpenStack infrastructure high availability relies on the
http://www.clusterlabs.org[Pacemaker] cluster stack, the
state-of-the-art high availability and load balancing stack for the
Linux platform. Pacemaker is storage- and application-agnostic, and is
in no way specific to OpenStack.

Pacemaker relies on the http://www.corosync.org[Corosync] messaging
layer for reliable cluster communications. Corosync implements the
Totem single-ring ordering and membership protocol. It also provides
UDP- and InfiniBand-based messaging, quorum, and cluster membership to
Pacemaker.

Pacemaker interacts with applications through _resource agents_ (RAs),
of which it supports over 70 natively. Pacemaker can also easily use
third-party RAs. An OpenStack high-availability configuration uses
existing native Pacemaker RAs (such as those managing MySQL
databases or virtual IP addresses), existing third-party RAs (such as
the one for RabbitMQ), and native OpenStack RAs (such as those managing
the OpenStack Identity and Image Services).

==== Installing Packages

On any host that is meant to be part of a Pacemaker cluster, you must
first establish cluster communications through the Corosync messaging
layer. This involves installing the following packages (and their
dependencies, which your package manager will normally install
automatically):

* +pacemaker+
* +crmsh+ (the +crm+ shell, which is not part of the +pacemaker+
package and must be downloaded separately)
* +corosync+
* +cluster-glue+
* +fence-agents+ (Fedora only; all other distributions use fencing
agents from +cluster-glue+)
* +resource-agents+

==== Setting up Corosync

Besides installing the +corosync+ package, you will also have to
create a configuration file, stored in
+/etc/corosync/corosync.conf+. Most distributions ship an example
configuration file (+corosync.conf.example+) as part of the
documentation bundled with the +corosync+ package. An example Corosync
configuration file is shown below:

.Corosync configuration file (+corosync.conf+)
----
include::includes/corosync.conf[]
----

<1> The +token+ value specifies the time, in milliseconds, after which a
Corosync token that has not been received is declared lost. Before that
happens, Corosync retransmits the token up to
+token_retransmits_before_loss_const+ times, at an interval derived
from both values. Once the timeout expires, the non-responding
_processor_ (cluster node) is declared dead. In other words, +token+ is
the maximum time a node is allowed to not respond to cluster messages
before being considered dead, and +token_retransmits_before_loss_const+
determines how many retransmission attempts are made within that time.
The default for +token+ is 1000 (1 second), with 4 allowed retransmits.
These defaults are intended to minimize failover times, but can cause
frequent "false alarms" and unintended failovers in case of short
network interruptions. The values used here are safer, albeit with
slightly extended failover times.

<2> With +secauth+ enabled, Corosync nodes mutually authenticate using
a 128-byte shared secret stored in +/etc/corosync/authkey+, which may
be generated with the +corosync-keygen+ utility. When using +secauth+,
cluster communications are also encrypted.

<3> In Corosync configurations using redundant networking (with more
than one +interface+), you must select a Redundant Ring Protocol (RRP)
mode other than +none+. +active+ is the recommended RRP mode.

<4> There are several things to note about the recommended interface
configuration:
* The +ringnumber+ must differ between all configured interfaces,
starting with 0.
* The +bindnetaddr+ is the _network_ address of the interfaces to bind
to. The example uses two network addresses of +/24+ IPv4 subnets.
* Multicast groups (+mcastaddr+) _must not_ be reused across cluster
boundaries. In other words, no two distinct clusters should ever use
the same multicast group. Be sure to select multicast addresses
compliant with http://www.ietf.org/rfc/rfc2365.txt[RFC 2365,
"Administratively Scoped IP Multicast"].
* For firewall configurations, note that Corosync communicates over
UDP only, and uses +mcastport+ (for receives) and +mcastport+ - 1 (for
sends).

<5> The +service+ declaration for the +pacemaker+ service may be
placed in the +corosync.conf+ file directly, or in its own separate
file, +/etc/corosync/service.d/pacemaker+.

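The interplay between +token+ and +token_retransmits_before_loss_const+
described in callout 1 can be made concrete with a little arithmetic.
The following Python sketch assumes that, when +token_retransmit+ is not
set explicitly, Corosync derives the retransmit interval as +token+ /
(+token_retransmits_before_loss_const+ + 0.2). This matches the 238 ms
default for +token_retransmit+ listed in the +corosync.conf(5)+ man
page, but verify it against your Corosync version. The 10000 ms / 10
retransmits pair is hypothetical and is not taken from the included
configuration file.

[source,python]
----
def token_retransmit_ms(token_ms: int, retransmits: int) -> int:
    """Approximate interval (ms) between token retransmissions.

    Assumption: when token_retransmit is not set explicitly, Corosync
    derives it as token / (token_retransmits_before_loss_const + 0.2).
    """
    if token_ms <= 0 or retransmits <= 0:
        raise ValueError("token and retransmit count must be positive")
    return int(token_ms / (retransmits + 0.2))


if __name__ == "__main__":
    # Corosync defaults: token = 1000 ms, 4 retransmits.
    print(token_retransmit_ms(1000, 4))    # 238
    # Hypothetical, more conservative values (not from includes/corosync.conf).
    print(token_retransmit_ms(10000, 10))  # 980
----

Raising +token+ while keeping the retransmit count constant stretches
the gap between retransmissions, which is what makes the cluster more
tolerant of brief network hiccups.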
Once created, the +corosync.conf+ file (and the +authkey+ file if the
+secauth+ option is enabled) must be synchronized across all cluster
nodes.

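Before copying +corosync.conf+ to the other nodes, you may want to
sanity-check the interface settings described in callout 4. This Python
sketch uses only the standard +ipaddress+ module and treats 239.0.0.0/8
as the RFC 2365 administratively scoped IPv4 range. The helper names,
the multicast address 239.255.42.1 and the port 5405 are illustrative
assumptions, not values from the included configuration file.

[source,python]
----
import ipaddress
from typing import List, Tuple

# RFC 2365 administratively scoped IPv4 multicast range.
ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")


def bindnetaddr(interface_cidr: str) -> str:
    """Network address to use as bindnetaddr, e.g. for '192.168.42.82/24'."""
    return str(ipaddress.ip_interface(interface_cidr).network.network_address)


def is_admin_scoped(mcastaddr: str) -> bool:
    """True if mcastaddr lies in the RFC 2365 administratively scoped range."""
    return ipaddress.ip_address(mcastaddr) in ADMIN_SCOPED


def ringnumbers_ok(ringnumbers: List[int]) -> bool:
    """Ring numbers must be distinct, and numbering starts at 0."""
    return len(set(ringnumbers)) == len(ringnumbers) and 0 in ringnumbers


def firewall_ports(mcastport: int) -> Tuple[int, int]:
    """UDP ports Corosync uses per ring: (receive, send)."""
    return mcastport, mcastport - 1


if __name__ == "__main__":
    print(bindnetaddr("192.168.42.82/24"))  # 192.168.42.0
    print(bindnetaddr("10.0.42.100/24"))    # 10.0.42.0
    print(is_admin_scoped("239.255.42.1"))  # True
    print(ringnumbers_ok([0, 1]))           # True
    print(firewall_ports(5405))             # (5405, 5404)
----

Remember that two different clusters passing +is_admin_scoped+ is not
enough: they must also use different multicast groups.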
==== Starting Corosync

Corosync is started as a regular system service. Depending on your
distribution, it may ship with an LSB (System V style) init script, an
upstart job, or a systemd unit file. Either way, the service is
usually named +corosync+:

* +/etc/init.d/corosync start+ (LSB)
* +service corosync start+ (LSB, alternate)
* +start corosync+ (upstart)
* +systemctl start corosync+ (systemd)

You can now check the Corosync connectivity with two tools.

The +corosync-cfgtool+ utility, when invoked with the +-s+ option,
gives a summary of the health of the communication rings:

----
# corosync-cfgtool -s
Printing ring status.
Local node ID 435324542
RING ID 0
id = 192.168.42.82
status = ring 0 active with no faults
RING ID 1
id = 10.0.42.100
status = ring 1 active with no faults
----

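If you want to automate this check, for example in a monitoring
script, the ring status can be parsed from the command's output. The
following is a hypothetical Python helper that assumes the output layout
shown above; the exact wording may differ between Corosync versions, so
treat the patterns as a starting point.

[source,python]
----
import re
from typing import Dict

RING_HEADER = re.compile(r"^RING ID (\d+)$")
STATUS_LINE = re.compile(r"^status\s*=\s*(.+)$")


def ring_status(output: str) -> Dict[int, str]:
    """Map each ring number to its status text in `corosync-cfgtool -s` output."""
    rings: Dict[int, str] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()  # real output may indent id/status lines with tabs
        header = RING_HEADER.match(line)
        if header:
            current = int(header.group(1))
            continue
        status = STATUS_LINE.match(line)
        if status and current is not None:
            rings[current] = status.group(1).strip()
    return rings


def all_rings_healthy(output: str) -> bool:
    """True if at least one ring is reported and every ring has no faults."""
    rings = ring_status(output)
    return bool(rings) and all(
        text.endswith("active with no faults") for text in rings.values()
    )


# The sample from above; in practice you might capture it with
# subprocess.run(["corosync-cfgtool", "-s"], capture_output=True, text=True).stdout
SAMPLE = """\
Printing ring status.
Local node ID 435324542
RING ID 0
id = 192.168.42.82
status = ring 0 active with no faults
RING ID 1
id = 10.0.42.100
status = ring 1 active with no faults
"""

if __name__ == "__main__":
    print(ring_status(SAMPLE))
    print(all_rings_healthy(SAMPLE))  # True
----

Pairing each +status+ line with the preceding +RING ID+ header, rather
than matching a fixed healthy phrase, means a ring reporting a fault is
still captured and flagged as unhealthy.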
The +corosync-objctl+ utility can be used to dump the Corosync cluster
member list:

----
# corosync-objctl runtime.totem.pg.mrp.srp.members
runtime.totem.pg.mrp.srp.435324542.ip=r(0) ip(192.168.42.82) r(1) ip(10.0.42.100)
runtime.totem.pg.mrp.srp.435324542.join_count=1
runtime.totem.pg.mrp.srp.435324542.status=joined
runtime.totem.pg.mrp.srp.983895584.ip=r(0) ip(192.168.42.87) r(1) ip(10.0.42.254)
runtime.totem.pg.mrp.srp.983895584.join_count=1
runtime.totem.pg.mrp.srp.983895584.status=joined
----

You should see a +status=joined+ entry for each of your constituent
cluster nodes.

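The same check can be scripted by grouping the +corosync-objctl+ keys
by node ID and flagging any node whose status is not +joined+. This is
a hypothetical Python sketch that assumes the key layout shown in the
sample output above.

[source,python]
----
import re
from typing import Dict, List

# Matches e.g. "runtime.totem.pg.mrp.srp.435324542.status=joined".
MEMBER_KEY = re.compile(r"^runtime\.totem\.pg\.mrp\.srp\.(\d+)\.(\w+)=(.*)$")


def members(output: str) -> Dict[str, Dict[str, str]]:
    """Group corosync-objctl member keys by node ID."""
    nodes: Dict[str, Dict[str, str]] = {}
    for line in output.splitlines():
        match = MEMBER_KEY.match(line.strip())
        if match:
            node_id, key, value = match.groups()
            nodes.setdefault(node_id, {})[key] = value
    return nodes


def not_joined(output: str) -> List[str]:
    """Node IDs whose status is anything other than 'joined'."""
    return sorted(
        node_id
        for node_id, attrs in members(output).items()
        if attrs.get("status") != "joined"
    )


SAMPLE = """\
runtime.totem.pg.mrp.srp.435324542.ip=r(0) ip(192.168.42.82) r(1) ip(10.0.42.100)
runtime.totem.pg.mrp.srp.435324542.join_count=1
runtime.totem.pg.mrp.srp.435324542.status=joined
runtime.totem.pg.mrp.srp.983895584.ip=r(0) ip(192.168.42.87) r(1) ip(10.0.42.254)
runtime.totem.pg.mrp.srp.983895584.join_count=1
runtime.totem.pg.mrp.srp.983895584.status=joined
"""

if __name__ == "__main__":
    print(sorted(members(SAMPLE)))  # ['435324542', '983895584']
    print(not_joined(SAMPLE))       # []
----

An empty list from +not_joined+ only means no listed node is missing;
compare the number of node IDs against the cluster size you expect as
well.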
==== Starting Pacemaker

Once the Corosync services have been started, and you have established
that the cluster is communicating properly, it is safe to start
+pacemakerd+, the Pacemaker master control process:

* +/etc/init.d/pacemaker start+ (LSB)
* +service pacemaker start+ (LSB, alternate)
* +start pacemaker+ (upstart)
* +systemctl start pacemaker+ (systemd)

Once Pacemaker services have started, Pacemaker will create a default
empty cluster configuration with no resources. You may observe
Pacemaker's status with the +crm_mon+ utility:

----
============
Last updated: Sun Oct 7 21:07:52 2012
Last change: Sun Oct 7 20:46:00 2012 via cibadmin on node2
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node2 node1 ]
----

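For scripted checks, you might capture a one-shot snapshot with
+crm_mon -1+ and inspect it. The Python helpers below are a
hypothetical sketch that assumes the output layout shown above;
+crm_mon+ output has changed across Pacemaker releases, so treat the
patterns as examples rather than a stable interface.

[source,python]
----
import re
from typing import List


def online_nodes(output: str) -> List[str]:
    """Node names listed on the 'Online: [ ... ]' line of crm_mon output."""
    match = re.search(r"^Online:\s*\[\s*(.*?)\s*\]", output, re.MULTILINE)
    return match.group(1).split() if match else []


def dc_has_quorum(output: str) -> bool:
    """True if the 'Current DC' line reports a partition with quorum."""
    pattern = r"^Current DC:.*partition with quorum"
    return re.search(pattern, output, re.MULTILINE) is not None


SAMPLE = """\
Stack: openais
Current DC: node2 - partition with quorum
2 Nodes configured, 2 expected votes
0 Resources configured.

Online: [ node2 node1 ]
"""

if __name__ == "__main__":
    print(online_nodes(SAMPLE))   # ['node2', 'node1']
    print(dc_has_quorum(SAMPLE))  # True
----

The quorum pattern is case-sensitive on purpose, so a line reading
+partition WITHOUT quorum+ does not match.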
==== Setting basic cluster properties

Once your Pacemaker cluster is set up, it is recommended to set a few
basic cluster properties. To do so, start the +crm+ shell and change
into the configuration menu by entering +configure+. Alternatively, you
may jump straight into the Pacemaker configuration menu by typing
+crm configure+ directly from a shell prompt.

Then, set the following properties:

----
include::includes/pacemaker-properties.crm[]
----

<1> Setting +no-quorum-policy="ignore"+ is required in 2-node Pacemaker
clusters for the following reason: if quorum enforcement is enabled,
and one of the two nodes fails, then the remaining node cannot
establish a _majority_ of quorum votes necessary to run services, and
thus it is unable to take over any resources. The appropriate
workaround is to ignore loss of quorum in the cluster. This is safe
and necessary _only_ in 2-node clusters. Do not set this property in
Pacemaker clusters with more than two nodes.

<2> Setting +pe-warn-series-max+, +pe-input-series-max+ and
+pe-error-series-max+ to 1000 instructs Pacemaker to keep a longer
history of the inputs processed, and errors and warnings generated, by
its Policy Engine. This history is typically useful in case cluster
troubleshooting becomes necessary.

<3> Pacemaker uses an event-driven approach to cluster state
processing. However, certain Pacemaker actions occur at a configurable
interval, +cluster-recheck-interval+, which defaults to 15 minutes. It
is usually prudent to reduce this to a shorter interval, such as 5 or
3 minutes.

Once you have made these changes, you may +commit+ the updated
configuration.

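The majority arithmetic behind the +no-quorum-policy+ recommendation
can be sketched in a few lines of Python, assuming each node contributes
exactly one vote (as the +2 expected votes+ line in the +crm_mon+ output
suggests). This models simple majority quorum only and does not
reproduce Corosync's or Pacemaker's quorum implementation.

[source,python]
----
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the surviving partition holds a strict majority of votes."""
    return votes_present >= votes_needed(total_votes)


if __name__ == "__main__":
    # Assumes one vote per node.
    for nodes in (2, 3, 4, 5):
        needed = votes_needed(nodes)
        print(f"{nodes} nodes: {needed} votes needed, "
              f"tolerates {nodes - needed} node failure(s)")
----

In a two-node cluster the surviving node holds 1 of 2 votes, below the
2 required, so it can never regain quorum on its own; from three nodes
upward, a single failure leaves a majority intact.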