Improve chapter "HA using active/passive - The Pacemaker cluster stack"

Change-Id: I3c8fd184cdfe5da6031a3ee7b1bf78f04a4de16c
Christian Berendt
2014-09-23 21:36:16 +02:00
parent 383e25ccdd
commit c4acf1ce6a
6 changed files with 275 additions and 278 deletions

View File

@@ -1,34 +1,31 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="ch-pacemaker">
<title>The Pacemaker cluster stack</title>
<para>OpenStack infrastructure high availability relies on the
<link xlink:href="http://www.clusterlabs.org">Pacemaker</link> cluster
stack, the state-of-the-art high availability and load balancing stack
for the Linux platform. Pacemaker is storage and application-agnostic,
and is in no way specific to OpenStack.</para>
<para>Pacemaker relies on the
<link xlink:href="http://www.corosync.org">Corosync</link> messaging
layer for reliable cluster communications. Corosync implements the
Totem single-ring ordering and membership protocol. It also provides
UDP and InfiniBand based messaging, quorum, and cluster membership to
Pacemaker.</para>
<para>Pacemaker interacts with applications through resource agents
(RAs), of which it supports over 70 natively. Pacemaker can also
easily use third-party RAs. An OpenStack high-availability
configuration uses existing native Pacemaker RAs (such as those
managing MySQL databases or virtual IP addresses), existing third-party
RAs (such as for RabbitMQ), and native OpenStack RAs (such as those
managing the OpenStack Identity and Image Services).</para>
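<para>For illustration, a highly available virtual IP address managed
through the native <literal>ocf:heartbeat:IPaddr2</literal> resource
agent might be defined in the <command>crm</command> shell as follows
(the resource name and address here are examples only):</para>
<programlisting>primitive p_ip_example ocf:heartbeat:IPaddr2 \
  params ip="192.168.42.103" cidr_netmask="24" \
  op monitor interval="30s"</programlisting>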
<xi:include href="pacemaker/section_install_packages.xml"/>
<xi:include href="pacemaker/section_set_up_corosync.xml"/>
<xi:include href="pacemaker/section_starting_corosync.xml"/>
<xi:include href="pacemaker/section_start_pacemaker.xml"/>
<xi:include href="pacemaker/section_set_basic_cluster_properties.xml"/>
</chapter>

View File

@@ -1,43 +1,44 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="_install_packages">
<title>Install packages</title>
<para>On any host that is meant to be part of a Pacemaker cluster, you must
first establish cluster communications through the Corosync messaging
layer. This involves installing the following packages (and their
dependencies, which your package manager will normally install
automatically):</para>
<itemizedlist>
<listitem>
<para><package>pacemaker</package> (Note that the crm shell should be
downloaded separately.)</para>
</listitem>
<listitem>
<para>
<package>crmsh</package>
</para>
</listitem>
<listitem>
<para>
<package>corosync</package>
</para>
</listitem>
<listitem>
<para>
<package>cluster-glue</package>
</para>
</listitem>
<listitem>
<para><package>fence-agents</package> (Fedora only; all other
distributions use fencing agents from
<package>cluster-glue</package>)</para>
</listitem>
<listitem>
<para>
<package>resource-agents</package>
</para>
</listitem>
</itemizedlist>
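<para>As an example, on a Debian or Ubuntu based node the whole set can
usually be installed in one step (exact package names may vary between
distributions and releases):</para>
<screen><prompt>#</prompt> <userinput>apt-get install pacemaker crmsh corosync cluster-glue resource-agents</userinput></screen>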
</section>

View File

@@ -1,54 +1,54 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="_set_basic_cluster_properties">
<title>Set basic cluster properties</title>
<para>Once your Pacemaker cluster is set up, it is recommended to set a few
basic cluster properties. To do so, start the <command>crm</command> shell
and change into the configuration menu by entering
<literal>configure</literal>. Alternatively, you may jump straight into
the Pacemaker configuration menu by typing <command>crm configure</command>
directly from a shell prompt.</para>
<para>Then, set the following properties:</para>
<programlisting>property no-quorum-policy="ignore" \ # <co xml:id="CO2-1"/>
pe-warn-series-max="1000" \ # <co xml:id="CO2-2"/>
pe-input-series-max="1000" \
pe-error-series-max="1000" \
cluster-recheck-interval="5min" # <co xml:id="CO2-3"/></programlisting>
<calloutlist>
<callout arearefs="CO2-1">
<para>Setting <option>no-quorum-policy="ignore"</option> is required
in 2-node Pacemaker clusters for the following reason: if quorum
enforcement is enabled, and one of the two nodes fails, then the
remaining node cannot establish a majority of quorum votes necessary
to run services, and thus it is unable to take over any resources. In
this case, the appropriate workaround is to ignore loss of quorum in
the cluster. This should only be done in 2-node clusters: do not
set this property in Pacemaker clusters with more than two nodes. Note
that a two-node cluster with this setting exposes a risk of
split-brain because either half of the cluster, or both, are able to
become active in the event that both nodes remain online but lose
communication with one another. The preferred configuration is 3 or
more nodes per cluster.</para>
</callout>
<callout arearefs="CO2-2">
<para>Setting <option>pe-warn-series-max</option>,
<option>pe-input-series-max</option> and
<option>pe-error-series-max</option> to 1000 instructs Pacemaker to
keep a longer history of the inputs processed, and errors and warnings
generated, by its Policy Engine. This history is typically useful in
case cluster troubleshooting becomes necessary.</para>
</callout>
<callout arearefs="CO2-3">
<para>Pacemaker uses an event-driven approach to cluster state
processing. However, certain Pacemaker actions occur at a configurable
interval, <option>cluster-recheck-interval</option>, which defaults to
15 minutes. It is usually prudent to reduce this to a shorter interval,
such as 5 or 3 minutes.</para>
</callout>
</calloutlist>
<para>Once you have made these changes, you may <literal>commit</literal>
the updated configuration.</para>
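<para>For example, a complete session setting and committing these
properties (shown here as a hypothetical transcript) might look like
this:</para>
<screen><prompt>$</prompt> <userinput>crm configure</userinput>
<prompt>crm(live)configure#</prompt> <userinput>property no-quorum-policy="ignore" \
  pe-warn-series-max="1000" \
  pe-input-series-max="1000" \
  pe-error-series-max="1000" \
  cluster-recheck-interval="5min"</userinput>
<prompt>crm(live)configure#</prompt> <userinput>verify</userinput>
<prompt>crm(live)configure#</prompt> <userinput>commit</userinput></screen>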
</section>

View File

@@ -1,21 +1,20 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="_set_up_corosync">
<title>Set up Corosync</title>
<para>Besides installing the <package>corosync</package> package, you must
also create a configuration file, stored in
<filename>/etc/corosync/corosync.conf</filename>. Most distributions ship
an example configuration file (<filename>corosync.conf.example</filename>)
as part of the documentation bundled with the <package>corosync</package>
package. An example Corosync configuration file is shown below:</para>
<formalpara>
<title>Corosync configuration file (<filename>corosync.conf</filename>)</title>
<para>
<programlisting language="ini">totem {
version: 2
# Time (in ms) to wait for a token <co xml:id="CO1-1"/>
@@ -80,87 +79,77 @@ logging {
subsys: AMF
debug: off
tags: enter|leave|trace1|trace2|trace3|trace4|trace6
}
}</programlisting>
</para>
</formalpara>
<calloutlist>
<callout arearefs="CO1-1">
<para>The <option>token</option> value specifies the time, in
milliseconds, during which the Corosync token is expected to be
transmitted around the ring. When this timeout expires, the token is
declared lost, and after <option>token_retransmits_before_loss_const</option>
lost tokens the non-responding processor (cluster node) is declared
dead. In other words,
<option>token</option> × <option>token_retransmits_before_loss_const</option>
is the maximum time a node is allowed to not respond to cluster
messages before being considered dead. The default for
<option>token</option> is 1000 (1 second), with 4 allowed
retransmits. These defaults are intended to minimize failover times,
but can cause frequent "false alarms" and unintended failovers in case
of short network interruptions. The values used here are safer, albeit
with slightly extended failover times.</para>
</callout>
<callout arearefs="CO1-2">
<para>With <option>secauth</option> enabled, Corosync nodes mutually
authenticate using a 128-byte shared secret stored in
<filename>/etc/corosync/authkey</filename>, which may be generated with
the <command>corosync-keygen</command> utility. When using
<option>secauth</option>, cluster communications are also
encrypted.</para>
</callout>
<callout arearefs="CO1-3">
<para>In Corosync configurations using redundant networking (with more
than one <option>interface</option>), you must select a Redundant
Ring Protocol (RRP) mode other than <literal>none</literal>.
<literal>active</literal> is the recommended RRP mode.</para>
</callout>
<callout arearefs="CO1-4">
<para>There are several things to note about the recommended interface
configuration:</para>
<itemizedlist>
<listitem>
<para>The <option>ringnumber</option> must differ between all
configured interfaces, starting with 0.</para>
</listitem>
<listitem>
<para>The <option>bindnetaddr</option> is the network address of
the interfaces to bind to. The example uses two network addresses
of <literal>/24</literal> IPv4 subnets.</para>
</listitem>
<listitem>
<para>Multicast groups (<option>mcastaddr</option>) must not be
reused across cluster boundaries. In other words, no two distinct
clusters should ever use the same multicast group. Be sure to
select multicast addresses compliant with
<link xlink:href="http://www.ietf.org/rfc/rfc2365.txt">RFC 2365,
"Administratively Scoped IP Multicast"</link>.</para>
</listitem>
<listitem>
<para>For firewall configurations, note that Corosync communicates
over UDP only, and uses <literal>mcastport</literal> (for receives)
and <literal>mcastport - 1</literal> (for sends).</para>
</listitem>
</itemizedlist>
</callout>
<callout arearefs="CO1-5">
<para>The <literal>service</literal> declaration for the
<literal>pacemaker</literal> service may be placed in the
<filename>corosync.conf</filename> file directly, or in its own
separate file,
<filename>/etc/corosync/service.d/pacemaker</filename>.</para>
</callout>
</calloutlist>
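<para>As a sketch of the firewall note above, and assuming the commonly
used default <literal>mcastport</literal> of 5405, a matching
<command>iptables</command> rule could look like this:</para>
<screen><prompt>#</prompt> <userinput>iptables -A INPUT -p udp -m udp --dport 5404:5405 -j ACCEPT</userinput></screen>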
<para>Once created, the <filename>corosync.conf</filename> file (and the
<filename>authkey</filename> file if the <option>secauth</option> option
is enabled) must be synchronized across all cluster nodes.</para>
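<para>One way to do this, assuming a second node reachable over SSH as
<literal>node2</literal>, is to generate the key on the first node and
then copy both files across:</para>
<screen><prompt>#</prompt> <userinput>corosync-keygen</userinput>
<prompt>#</prompt> <userinput>scp /etc/corosync/corosync.conf /etc/corosync/authkey node2:/etc/corosync/</userinput></screen>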
</section>

View File

@@ -1,34 +1,40 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="_start_pacemaker">
<title>Start Pacemaker</title>
<para>Once the Corosync services have been started and you have established
that the cluster is communicating properly, it is safe to start
<systemitem class="service">pacemakerd</systemitem>, the Pacemaker
master control process:</para>
<itemizedlist>
<listitem>
<para>
<command>/etc/init.d/pacemaker start</command> (LSB)
</para>
</listitem>
<listitem>
<para>
<command>service pacemaker start</command> (LSB, alternate)
</para>
</listitem>
<listitem>
<para>
<command>start pacemaker</command> (upstart)
</para>
</listitem>
<listitem>
<para>
<command>systemctl start pacemaker</command> (systemd)
</para>
</listitem>
</itemizedlist>
<para>Once Pacemaker services have started, Pacemaker will create a default
empty cluster configuration with no resources. You may observe
Pacemaker's status with the <command>crm_mon</command> utility:</para>
<screen><computeroutput>============
Last updated: Sun Oct 7 21:07:52 2012
Last change: Sun Oct 7 20:46:00 2012 via cibadmin on node2
Stack: openais
@@ -39,4 +45,4 @@ Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
============
Online: [ node2 node1 ]</computeroutput></screen>
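<para>By default, <command>crm_mon</command> runs interactively and
keeps refreshing the display; to print the cluster status once and
exit, invoke it with the <option>-1</option> option instead:</para>
<screen><prompt>#</prompt> <userinput>crm_mon -1</userinput></screen>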
</section>

View File

@@ -1,38 +1,42 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="_starting_corosync">
<title>Starting Corosync</title>
<para>Corosync is started as a regular system service. Depending on your
distribution, it may ship with an LSB init script, an
upstart job, or a systemd unit file. Either way, the service is
usually named <systemitem class="service">corosync</systemitem>:</para>
<itemizedlist>
<listitem>
<para>
<command>/etc/init.d/corosync start</command> (LSB)
</para>
</listitem>
<listitem>
<para>
<command>service corosync start</command> (LSB, alternate)
</para>
</listitem>
<listitem>
<para>
<command>start corosync</command> (upstart)
</para>
</listitem>
<listitem>
<para>
<command>systemctl start corosync</command> (systemd)
</para>
</listitem>
</itemizedlist>
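<para>On systemd based distributions you will usually also want
Corosync to start automatically at boot, for example:</para>
<screen><prompt>#</prompt> <userinput>systemctl enable corosync</userinput></screen>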
<para>You can now check the Corosync connectivity with two tools.</para>
<para>The <command>corosync-cfgtool</command> utility, when invoked with
the <option>-s</option> option, gives a summary of the health of the
communication rings:</para>
<screen><prompt>#</prompt> <userinput>corosync-cfgtool -s</userinput>
<computeroutput>Printing ring status.
Local node ID 435324542
RING ID 0
id = 192.168.42.82
@@ -40,15 +44,15 @@ RING ID 0
RING ID 1
id = 10.0.42.100
status = ring 1 active with no faults</computeroutput></screen>
<para>The <command>corosync-objctl</command> utility can be used to dump the
Corosync cluster member list:</para>
<screen><prompt>#</prompt> <userinput>corosync-objctl runtime.totem.pg.mrp.srp.members</userinput>
<computeroutput>runtime.totem.pg.mrp.srp.435324542.ip=r(0) ip(192.168.42.82) r(1) ip(10.0.42.100)
runtime.totem.pg.mrp.srp.435324542.join_count=1
runtime.totem.pg.mrp.srp.435324542.status=joined
runtime.totem.pg.mrp.srp.983895584.ip=r(0) ip(192.168.42.87) r(1) ip(10.0.42.254)
runtime.totem.pg.mrp.srp.983895584.join_count=1
runtime.totem.pg.mrp.srp.983895584.status=joined</computeroutput></screen>
<para>You should see a <literal>status=joined</literal> entry for each of
your constituent cluster nodes.</para>
</section>