This is needed because it was discovered that the "admin guide" link from upstream (docs.openstack.org) cannot be a redirect (to our current howto dir). I like the new organisation better anyway. Change-Id: I516429a93563205148d18b96e32d428d424a681a
99 lines
4.0 KiB
ReStructuredText
99 lines
4.0 KiB
ReStructuredText
:orphan:
|
|
|
|
=======================
|
|
Replace a RabbitMQ node
|
|
=======================
|
|
|
|
Introduction
|
|
~~~~~~~~~~~~
|
|
|
|
There may be times when a cloud's RabbitMQ cluster experiences difficulties.
|
|
When this happens one option to bringing the cluster back to a healthy state is
|
|
to replace one of its nodes.
|
|
|
|
.. important::
|
|
|
|
This procedure will not result in cloud downtime providing that there is at
|
|
least one functional RabbitMQ node present.
|
|
|
|
However, it will trigger the restart of every client application (cloud
|
|
service) that has a relation to the rabbitmq-server application. This may
|
|
have adverse effects on the cloud's ability to service its current workload.
|
|
|
|
The above issue can be mitigated through the use of :doc:`Deferred service
|
|
events <deferred-events>`. If restarts are deferred, a client will
|
|
nonetheless experience a slight delay (due to timeouts) if it tries to
|
|
contact a non-existent node.
|
|
|
|
.. note::
|
|
|
|
An alternative approach to replacing a node is to repair it. See operation
|
|
:doc:`Repair a RabbitMQ node <ops-repair-rabbitmq-node>`, which includes a
|
|
comparison of the repair and replacement methods.
|
|
|
|
Initial state of the cluster
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
For this example, the :command:`juju status rabbitmq-server` command describes
|
|
the initial state of the cluster:
|
|
|
|
.. code-block:: console
|
|
|
|
Model Controller Cloud/Region Version SLA Timestamp
|
|
openstack maas-controller maas-one/default 2.9.18 unsupported 23:18:34Z
|
|
|
|
App Version Status Scale Charm Store Channel Rev OS Message
|
|
rabbitmq-server 3.8.2 active 3 rabbitmq-server charmstore stable 118 ubuntu Unit is ready and clustered
|
|
|
|
Unit Workload Agent Machine Public address Ports Message
|
|
rabbitmq-server/0 error idle 0/lxd/0 10.0.0.177 5672/tcp ????????????????
|
|
rabbitmq-server/1* active idle 1/lxd/0 10.0.0.175 5672/tcp Unit is ready and clustered
|
|
rabbitmq-server/2 active idle 2/lxd/0 10.0.0.176 5672/tcp Unit is ready and clustered
|
|
|
|
Machine State DNS Inst id Series AZ Message
|
|
0 started 10.0.0.172 node2 focal default Deployed
|
|
0/lxd/0 started 10.0.0.177 juju-64dabb-0-lxd-0 focal default Container started
|
|
1 started 10.0.0.173 node4 focal default Deployed
|
|
1/lxd/0 started 10.0.0.175 juju-64dabb-1-lxd-0 focal default Container started
|
|
2 started 10.0.0.174 node3 focal default Deployed
|
|
2/lxd/0 started 10.0.0.176 juju-64dabb-2-lxd-0 focal default Container started
|
|
|
|
In this example the unhealthy node is assumed to be ``rabbitmq-server/0`` with
|
|
an arbitrary message indicative of a problem. This is the node that is to be
|
|
replaced.
|
|
|
|
Replace the unhealthy node
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
First remove the unhealthy node:
|
|
|
|
.. code-block:: none
|
|
|
|
juju remove-unit rabbitmq-server/0
|
|
|
|
Then add a node. Here we will be deploying to a new container on existing
|
|
machine '5':
|
|
|
|
.. code-block:: none
|
|
|
|
juju add-unit --to lxd:5 rabbitmq-server
|
|
|
|
Please be patient as the node joins the cluster. In this example a machine (a
|
|
LXD container) must also be provisioned.
|
|
|
|
Verify model health
|
|
~~~~~~~~~~~~~~~~~~~
|
|
|
|
Verify the model's health with the :command:`juju status rabbitmq-server`
|
|
command. Its partial output should look like:
|
|
|
|
.. code-block:: console
|
|
|
|
App Version Status Scale Charm Store Channel Rev OS Message
|
|
rabbitmq-server 3.8.2 active 3 rabbitmq-server charmstore stable 118 ubuntu Unit is ready and clustered
|
|
|
|
Unit Workload Agent Machine Public address Ports Message
|
|
rabbitmq-server/0 active idle 5/lxd/1 10.0.0.178 5672/tcp Unit is ready and clustered
|
|
rabbitmq-server/1* active idle 1/lxd/0 10.0.0.175 5672/tcp Unit is ready and clustered
|
|
rabbitmq-server/2 active idle 2/lxd/0 10.0.0.176 5672/tcp Unit is ready and clustered
|