[[s-rabbitmq]]
==== Highly available RabbitMQ

RabbitMQ is the default AMQP server used by many OpenStack
services. Making the RabbitMQ service highly available involves:

* configuring a DRBD device for use by RabbitMQ,
* configuring RabbitMQ to use a data directory residing on that DRBD
  device,
* selecting and assigning a virtual IP address (VIP) that can freely
  float between cluster nodes,
* configuring RabbitMQ to listen on that IP address,
* managing all resources, including the RabbitMQ daemon itself, with
  the Pacemaker cluster manager.

NOTE: There is an alternative method of configuring RabbitMQ for high
availability. That approach, known as
http://www.rabbitmq.com/ha.html[active-active mirrored queues],
happens to be the one preferred by the RabbitMQ developers -- however,
it has shown less than ideal consistency and reliability in OpenStack
clusters. Thus, at the time of writing, the Pacemaker/DRBD-based
approach remains the recommended one for OpenStack environments,
although this may change in the near future as RabbitMQ active-active
mirrored queues mature.

===== Configuring DRBD

The Pacemaker-based RabbitMQ server requires a DRBD resource from
which it mounts the +/var/lib/rabbitmq+ directory. In this example,
the DRBD resource is simply named +rabbitmq+:

.+rabbitmq+ DRBD resource configuration (+/etc/drbd.d/rabbitmq.res+)
----
include::includes/rabbitmq.res[]
----

This resource uses an underlying local disk (in DRBD terminology, a
_backing device_) named +/dev/data/rabbitmq+ on both cluster nodes,
+node1+ and +node2+. Normally, this would be an LVM Logical Volume
specifically set aside for this purpose. The DRBD +meta-disk+ is
+internal+, meaning DRBD-specific metadata is stored at the end of
the +disk+ device itself. The device is configured to communicate
between IPv4 addresses 10.0.42.100 and 10.0.42.254, using TCP port
7701. Once enabled, it will map to a local DRBD block device with
the device minor number 1, that is, +/dev/drbd1+.
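
The contents of the included file are not reproduced in this guide.
Purely as an illustration, and using only the example values described
above (node names, backing device, addresses, port, and device minor
number), such a resource definition might look roughly like this:

----
# illustrative sketch only -- adjust names and addresses to your environment
resource rabbitmq {
  device    /dev/drbd1;
  disk      /dev/data/rabbitmq;
  meta-disk internal;
  on node1 {
    address 10.0.42.100:7701;
  }
  on node2 {
    address 10.0.42.254:7701;
  }
}
----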

Enabling a DRBD resource is explained in detail in
http://www.drbd.org/users-guide-8.3/s-first-time-up.html[the DRBD
User's Guide]. In brief, the proper sequence of commands is this:

----
drbdadm create-md rabbitmq <1>
drbdadm up rabbitmq <2>
drbdadm -- --force primary rabbitmq <3>
----

<1> Initializes DRBD metadata and writes the initial set of metadata
to +/dev/data/rabbitmq+. Must be completed on both nodes.

<2> Creates the +/dev/drbd1+ device node, _attaches_ the DRBD device
to its backing store, and _connects_ the DRBD node to its peer. Must
be completed on both nodes.

<3> Kicks off the initial device synchronization, and puts the device
into the +primary+ (readable and writable) role. See
http://www.drbd.org/users-guide-8.3/ch-admin.html#s-roles[Resource
roles] (from the DRBD User's Guide) for a more detailed description of
the primary and secondary roles in DRBD. Must be completed _on one
node only,_ namely the one where you are about to continue with
creating your filesystem.
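
To verify that the resource has come up as expected, you can query its
state on either node. These are standard +drbdadm+ subcommands; the
exact output depends on your DRBD version and configuration:

----
drbdadm role rabbitmq    # e.g. Primary/Secondary on the node you promoted
drbdadm dstate rabbitmq  # disk state, e.g. UpToDate/Inconsistent while syncing
cat /proc/drbd           # overall status, including synchronization progress
----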

===== Creating a file system

Once the DRBD resource is running and in the primary role (and
potentially still in the process of running the initial device
synchronization), you may proceed with creating the filesystem for
RabbitMQ data. XFS is generally the recommended filesystem:

----
mkfs -t xfs /dev/drbd1
----

You may also use the alternate device path for the DRBD device, which
may be easier to remember as it includes the self-explanatory resource
name:

----
mkfs -t xfs /dev/drbd/by-res/rabbitmq
----

Once completed, you may safely return the device to the secondary
role. Any ongoing device synchronization will continue in the
background:

----
drbdadm secondary rabbitmq
----

===== Preparing RabbitMQ for Pacemaker high availability

In order for Pacemaker monitoring to function properly, you must
ensure that RabbitMQ's +.erlang.cookie+ files are identical on all
nodes, regardless of whether DRBD is mounted there or not. The
simplest way of doing so is to take an existing +.erlang.cookie+ from
one of your nodes, copy it to the RabbitMQ data directory on the
other node, and also copy it to the DRBD-backed filesystem:

----
node1:# scp -p /var/lib/rabbitmq/.erlang.cookie node2:/var/lib/rabbitmq/
node1:# mount /dev/drbd/by-res/rabbitmq /mnt
node1:# cp -a /var/lib/rabbitmq/.erlang.cookie /mnt
node1:# umount /mnt
----
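
A quick way to confirm that the cookies really are identical on both
nodes is to compare their checksums (any checksum tool will do; this
example assumes +md5sum+ is available):

----
node1:# md5sum /var/lib/rabbitmq/.erlang.cookie
node2:# md5sum /var/lib/rabbitmq/.erlang.cookie
----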

===== Adding RabbitMQ resources to Pacemaker

You may now proceed with adding the Pacemaker configuration for
RabbitMQ resources. Connect to the Pacemaker cluster with +crm
configure+, and add the following cluster resources:

----
include::includes/pacemaker-rabbitmq.crm[]
----

This configuration creates:

* +p_ip_rabbitmq+, a virtual IP address for use by RabbitMQ
  (192.168.42.100),
* +p_fs_rabbitmq+, a Pacemaker-managed filesystem mounted to
  +/var/lib/rabbitmq+ on whatever node currently runs the RabbitMQ
  service,
* +ms_drbd_rabbitmq+, the _master/slave set_ managing the +rabbitmq+
  DRBD resource,
* a service +group+ and +order+ and +colocation+ constraints to ensure
  that resources are started on the correct nodes, and in the correct
  sequence (see the sketch after this list).
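
The included file is not reproduced here. Purely as an illustration of
the structure described above, a +crm configure+ snippet along these
lines could be used. The resource names match the list above, while
the netmask, the monitor intervals, and the parameters of the
RabbitMQ primitive itself are assumptions you would adjust for your
environment:

----
# illustrative sketch only -- adapt parameters to your cluster
primitive p_ip_rabbitmq ocf:heartbeat:IPaddr2 \
  params ip="192.168.42.100" cidr_netmask="24" \
  op monitor interval="10s"
primitive p_drbd_rabbitmq ocf:linbit:drbd \
  params drbd_resource="rabbitmq" \
  op monitor interval="30s" role="Slave" \
  op monitor interval="29s" role="Master"
primitive p_fs_rabbitmq ocf:heartbeat:Filesystem \
  params device="/dev/drbd/by-res/rabbitmq" \
    directory="/var/lib/rabbitmq" fstype="xfs" \
  op monitor interval="60s"
primitive p_rabbitmq ocf:rabbitmq:rabbitmq-server \
  params mnesia_base="/var/lib/rabbitmq" \
  op monitor interval="20s"
ms ms_drbd_rabbitmq p_drbd_rabbitmq \
  meta notify="true" master-max="1" clone-max="2"
group g_rabbitmq p_ip_rabbitmq p_fs_rabbitmq p_rabbitmq
colocation c_rabbitmq_on_drbd inf: g_rabbitmq ms_drbd_rabbitmq:Master
order o_drbd_before_rabbitmq inf: ms_drbd_rabbitmq:promote g_rabbitmq:start
----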

+crm configure+ supports batch input, so you may copy and paste the
above into your live Pacemaker configuration, and then make changes as
required. For example, you may enter +edit p_ip_rabbitmq+ from the
+crm configure+ menu and edit the resource to match your preferred
virtual IP address.

Once completed, commit your configuration changes by entering +commit+
from the +crm configure+ menu. Pacemaker will then start the RabbitMQ
service, and its dependent resources, on one of your nodes.
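
Put together, an interactive session for these steps might look like
this (the prompt shown is the one the +crm+ shell typically displays;
yours may differ):

----
node1:# crm configure
crm(live)configure# edit p_ip_rabbitmq
crm(live)configure# commit
crm(live)configure# quit
----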

===== Configuring OpenStack services for highly available RabbitMQ

Your OpenStack services must now point their RabbitMQ configuration to
the highly available, virtual cluster IP address -- rather than a
RabbitMQ server's physical IP address as you normally would.

For OpenStack Image, for example, if your RabbitMQ service IP address is
192.168.42.100 as in the configuration explained here, you would use
the following line in your OpenStack Image API configuration file
(+glance-api.conf+):

----
rabbit_host = 192.168.42.100
----
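
Other services that communicate over RabbitMQ take the same setting.
Assuming, for example, that your Compute services read +nova.conf+,
the corresponding entry would be:

----
rabbit_host = 192.168.42.100
----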

No other changes are necessary to your OpenStack configuration. If the
node currently hosting your RabbitMQ experiences a problem
necessitating service failover, your OpenStack services may experience
a brief RabbitMQ interruption, as they would in the event of a network
hiccup, and then continue to run normally.