<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section [
<!ENTITY % openstack SYSTEM "../openstack.ent">
%openstack;
]>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="s-rabbitmq">
<title>Highly available RabbitMQ</title>
<para>RabbitMQ is the default AMQP server used by many OpenStack
services. Making the RabbitMQ service highly available involves:</para>
<itemizedlist>
<listitem>
<para>configuring a DRBD device for use by RabbitMQ,</para>
</listitem>
<listitem>
<para>configuring RabbitMQ to use a data directory residing on
that DRBD device,</para>
</listitem>
<listitem>
<para>selecting and assigning a virtual IP address (VIP) that can freely
float between cluster nodes,</para>
</listitem>
<listitem>
<para>configuring RabbitMQ to listen on that IP address,</para>
</listitem>
<listitem>
<para>managing all resources, including the RabbitMQ daemon itself, with
the Pacemaker cluster manager.</para>
</listitem>
</itemizedlist>
<note>
<para><link xlink:href="http://www.rabbitmq.com/ha.html">Active-active mirrored queues</link>
are another method for configuring RabbitMQ versions 3.3.0 and later
for high availability. You can also manage a RabbitMQ cluster with
active-active mirrored queues using the Pacemaker cluster manager.</para>
</note>
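<para>If you choose the mirrored-queues approach instead of the
DRBD-based setup described below, queue mirroring is enabled through a
RabbitMQ policy rather than through the steps that follow. As a minimal
sketch (the policy name <literal>ha-all</literal> and the catch-all
queue pattern are illustrative values, not requirements), you would run
something like:</para>
<screen><prompt>#</prompt> <userinput>rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}'</userinput></screen>
<para>The remainder of this section describes the DRBD and Pacemaker
based approach.</para>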
<section xml:id="_configure_drbd_2">
<title>Configure DRBD</title>
<para>The Pacemaker-based RabbitMQ server requires a DRBD resource from
which it mounts the <filename>/var/lib/rabbitmq</filename> directory.
In this example, the DRBD resource is simply named
<literal>rabbitmq</literal>:</para>
<formalpara>
<title><literal>rabbitmq</literal> DRBD resource configuration
(<filename>/etc/drbd.d/rabbitmq.res</filename>)</title>
<para>
<programlisting>resource rabbitmq {
  device    minor 1;
  disk      "/dev/data/rabbitmq";
  meta-disk internal;
  on node1 {
    address ipv4 10.0.42.100:7701;
  }
  on node2 {
    address ipv4 10.0.42.254:7701;
  }
}</programlisting>
</para>
</formalpara>
<para>This resource uses an underlying local disk (in DRBD terminology, a
backing device) named <filename>/dev/data/rabbitmq</filename> on both
cluster nodes, <literal>node1</literal> and <literal>node2</literal>.
Normally, this would be an LVM logical volume specifically set aside for
this purpose. The DRBD meta-disk is internal, meaning DRBD-specific
metadata is stored at the end of the disk device itself. The device
is configured to communicate between IPv4 addresses
<literal>10.0.42.100</literal> and <literal>10.0.42.254</literal>,
using TCP port 7701. Once enabled, it will map to a local DRBD block
device with the device minor number 1, that is,
<filename>/dev/drbd1</filename>.</para>
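<para>If such a backing device does not yet exist, you can create it as
an LVM logical volume. The path <filename>/dev/data/rabbitmq</filename>
used above implies a volume group named <literal>data</literal>; the
size below is only an example, so adjust it to your needs and run the
command on both nodes:</para>
<screen><prompt>#</prompt> <userinput>lvcreate --name rabbitmq --size 4G data</userinput></screen>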
<para>Enabling a DRBD resource is explained in detail in
<link xlink:href="http://www.drbd.org/users-guide-8.3/s-first-time-up.html">the DRBD
User's Guide</link>. In brief, the proper sequence of commands is this:</para>
<screen><prompt>#</prompt> <userinput>drbdadm create-md rabbitmq</userinput><co xml:id="CO4-1"/>
<prompt>#</prompt> <userinput>drbdadm up rabbitmq</userinput><co xml:id="CO4-2"/>
<prompt>#</prompt> <userinput>drbdadm -- --force primary rabbitmq</userinput><co xml:id="CO4-3"/></screen>
<calloutlist>
<callout arearefs="CO4-1">
<para>Initializes DRBD metadata and writes the initial set of
metadata to <filename>/dev/data/rabbitmq</filename>. Must be
completed on both nodes.</para>
</callout>
<callout arearefs="CO4-2">
<para>Creates the <filename>/dev/drbd1</filename> device node,
attaches the DRBD device to its backing store, and connects
the DRBD node to its peer. Must be completed on both nodes.</para>
</callout>
<callout arearefs="CO4-3">
<para>Kicks off the initial device synchronization, and puts the
device into the <literal>primary</literal> (readable and writable)
role. See <link xlink:href="http://www.drbd.org/users-guide-8.3/ch-admin.html#s-roles">
Resource roles</link> (from the DRBD User's Guide) for a more
detailed description of the primary and secondary roles in DRBD.
Must be completed on one node only, namely the one where you
are about to continue with creating your filesystem.</para>
</callout>
</calloutlist>
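<para>Before moving on, you can optionally confirm that the resource is
connected and synchronizing; with DRBD 8.3, as referenced in this guide,
the resource status is exposed through <filename>/proc/drbd</filename>:</para>
<screen><prompt>#</prompt> <userinput>cat /proc/drbd</userinput></screen>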
</section>
<section xml:id="_create_a_file_system">
<title>Create a file system</title>
<para>Once the DRBD resource is running and in the primary role (and
potentially still in the process of running the initial device
synchronization), you may proceed with creating the filesystem for
RabbitMQ data. XFS is generally the recommended filesystem:</para>
<screen><prompt>#</prompt> <userinput>mkfs -t xfs /dev/drbd1</userinput></screen>
<para>You may also use the alternate device path for the DRBD device,
which may be easier to remember as it includes the self-explanatory
resource name:</para>
<screen><prompt>#</prompt> <userinput>mkfs -t xfs /dev/drbd/by-res/rabbitmq</userinput></screen>
<para>Once completed, you may safely return the device to the secondary
role. Any ongoing device synchronization will continue in the
background:</para>
<screen><prompt>#</prompt> <userinput>drbdadm secondary rabbitmq</userinput></screen>
</section>
<section xml:id="_prepare_rabbitmq_for_pacemaker_high_availability">
<title>Prepare RabbitMQ for Pacemaker high availability</title>
<para>In order for Pacemaker monitoring to function properly, you must
ensure that RabbitMQ's <filename>.erlang.cookie</filename> files
are identical on all nodes, regardless of whether DRBD is mounted
there or not. The simplest way of doing so is to take an existing
<filename>.erlang.cookie</filename> from one of your nodes, copy
it to the RabbitMQ data directory on the other node, and also
copy it to the DRBD-backed filesystem.</para>
<screen><prompt>#</prompt> <userinput>scp -p /var/lib/rabbitmq/.erlang.cookie node2:/var/lib/rabbitmq/</userinput>
<prompt>#</prompt> <userinput>mount /dev/drbd/by-res/rabbitmq /mnt</userinput>
<prompt>#</prompt> <userinput>cp -a /var/lib/rabbitmq/.erlang.cookie /mnt</userinput>
<prompt>#</prompt> <userinput>umount /mnt</userinput></screen>
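<para>Optionally, confirm that the cookies now match by comparing their
checksums on both nodes, for example:</para>
<screen><prompt>#</prompt> <userinput>md5sum /var/lib/rabbitmq/.erlang.cookie</userinput>
<prompt>#</prompt> <userinput>ssh node2 md5sum /var/lib/rabbitmq/.erlang.cookie</userinput></screen>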
</section>
<section xml:id="_add_rabbitmq_resources_to_pacemaker">
<title>Add RabbitMQ resources to Pacemaker</title>
<para>You may now proceed with adding the Pacemaker configuration for
RabbitMQ resources. Connect to the Pacemaker cluster with
<command>crm configure</command>, and add the following cluster
resources:</para>
<programlisting>primitive p_ip_rabbitmq ocf:heartbeat:IPaddr2 \
  params ip="192.168.42.100" cidr_netmask="24" \
  op monitor interval="10s"
primitive p_drbd_rabbitmq ocf:linbit:drbd \
  params drbd_resource="rabbitmq" \
  op start timeout="90s" \
  op stop timeout="180s" \
  op promote timeout="180s" \
  op demote timeout="180s" \
  op monitor interval="30s" role="Slave" \
  op monitor interval="29s" role="Master"
primitive p_fs_rabbitmq ocf:heartbeat:Filesystem \
  params device="/dev/drbd/by-res/rabbitmq" \
    directory="/var/lib/rabbitmq" \
    fstype="xfs" options="relatime" \
  op start timeout="60s" \
  op stop timeout="180s" \
  op monitor interval="60s" timeout="60s"
primitive p_rabbitmq ocf:rabbitmq:rabbitmq-server \
  params nodename="rabbit@localhost" \
    mnesia_base="/var/lib/rabbitmq" \
  op monitor interval="20s" timeout="10s"
group g_rabbitmq p_ip_rabbitmq p_fs_rabbitmq p_rabbitmq
ms ms_drbd_rabbitmq p_drbd_rabbitmq \
  meta notify="true" master-max="1" clone-max="2"
colocation c_rabbitmq_on_drbd inf: g_rabbitmq ms_drbd_rabbitmq:Master
order o_drbd_before_rabbitmq inf: ms_drbd_rabbitmq:promote g_rabbitmq:start</programlisting>
<para>This configuration creates</para>
<itemizedlist>
<listitem>
<para><literal>p_ip_rabbitmq</literal>, a virtual IP address for
use by RabbitMQ (<literal>192.168.42.100</literal>),</para>
</listitem>
<listitem>
<para><literal>p_fs_rabbitmq</literal>, a Pacemaker-managed
filesystem mounted to <filename>/var/lib/rabbitmq</filename>
on whatever node currently runs the RabbitMQ service,</para>
</listitem>
<listitem>
<para><literal>ms_drbd_rabbitmq</literal>, the master/slave set
managing the <literal>rabbitmq</literal> DRBD resource,</para>
</listitem>
<listitem>
<para>a service group and order and colocation constraints to
ensure resources are started on the correct nodes, and in the
correct sequence.</para>
</listitem>
</itemizedlist>
<para><command>crm configure</command> supports batch input, so you
may copy and paste the above into your live Pacemaker configuration,
and then make changes as required. For example, you may enter
<literal>edit p_ip_rabbitmq</literal> from the
<command>crm configure</command> menu and edit the resource to
match your preferred virtual IP address.</para>
<para>Once completed, commit your configuration changes by entering
<literal>commit</literal> from the <command>crm configure</command>
menu. Pacemaker will then start the RabbitMQ service, and its
dependent resources, on one of your nodes.</para>
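<para>You can then check that the resources have come up on one of the
nodes with a one-shot cluster status query:</para>
<screen><prompt>#</prompt> <userinput>crm_mon -1</userinput></screen>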
</section>
<section xml:id="_configure_openstack_services_for_highly_available_rabbitmq">
<title>Configure OpenStack services for highly available RabbitMQ</title>
<para>Your OpenStack services must now point their RabbitMQ
configuration to the highly available, virtual cluster IP
address, rather than a RabbitMQ server's physical IP address
as you normally would.</para>
<para>For OpenStack Image, for example, if your RabbitMQ service
IP address is <literal>192.168.42.100</literal> as in the
configuration explained here, you would use the following line
in your OpenStack Image API configuration file
(<filename>glance-api.conf</filename>):</para>
<programlisting language="ini">rabbit_host = 192.168.42.100</programlisting>
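<para>Other OpenStack services that use RabbitMQ are pointed at the
virtual IP address in the same way. As an illustration only (the exact
option name and configuration section depend on your release, so verify
this against your configuration reference), OpenStack Compute would carry
the same value in <filename>nova.conf</filename>:</para>
<programlisting language="ini">rabbit_host = 192.168.42.100</programlisting>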
<para>No other changes are necessary to your OpenStack configuration.
If the node currently hosting your RabbitMQ server experiences a problem
necessitating service failover, your OpenStack services may
experience a brief RabbitMQ interruption, as they would in the
event of a network hiccup, and then continue to run normally.</para>
</section>
</section>