charm-guide/doc/source/admin/ops-replace-rabbitmq-node.rst
Peter Matulis 8bf87e2c8a Move howto contents to admin directory
This is needed because it was discovered that the
"admin guide" link from upstream (docs.openstack.org)
cannot be a redirect (to our current howto dir). I
like the new organisation better anyway.

Change-Id: I516429a93563205148d18b96e32d428d424a681a
2022-01-07 14:07:01 -05:00


:orphan:

=======================
Replace a RabbitMQ node
=======================

Introduction
~~~~~~~~~~~~

There may be times when a cloud's RabbitMQ cluster experiences difficulties.
When this happens, one option for bringing the cluster back to a healthy state
is to replace one of its nodes.

.. important::

   This procedure will not result in cloud downtime provided that at least one
   functional RabbitMQ node is present.

   However, it will trigger the restart of every client application (cloud
   service) that has a relation to the rabbitmq-server application. This may
   have adverse effects on the cloud's ability to service its current workload.

   The above issue can be mitigated through the use of :doc:`Deferred service
   events <deferred-events>`. If restarts are deferred, a client will
   nonetheless experience a slight delay (due to timeouts) if it tries to
   contact a non-existent node.

.. note::

   An alternative approach to replacing a node is to repair it. See operation
   :doc:`Repair a RabbitMQ node <ops-repair-rabbitmq-node>`, which includes a
   comparison of the repair and replacement methods.

Initial state of the cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For this example, the :command:`juju status rabbitmq-server` command describes
the initial state of the cluster:

.. code-block:: console

   Model      Controller       Cloud/Region      Version  SLA          Timestamp
   openstack  maas-controller  maas-one/default  2.9.18   unsupported  23:18:34Z

   App              Version  Status  Scale  Charm            Store       Channel  Rev  OS      Message
   rabbitmq-server  3.8.2    active      3  rabbitmq-server  charmstore  stable   118  ubuntu  Unit is ready and clustered

   Unit                Workload  Agent  Machine  Public address  Ports     Message
   rabbitmq-server/0   error     idle   0/lxd/0  10.0.0.177      5672/tcp  ????????????????
   rabbitmq-server/1*  active    idle   1/lxd/0  10.0.0.175      5672/tcp  Unit is ready and clustered
   rabbitmq-server/2   active    idle   2/lxd/0  10.0.0.176      5672/tcp  Unit is ready and clustered

   Machine  State    DNS         Inst id              Series  AZ       Message
   0        started  10.0.0.172  node2                focal   default  Deployed
   0/lxd/0  started  10.0.0.177  juju-64dabb-0-lxd-0  focal   default  Container started
   1        started  10.0.0.173  node4                focal   default  Deployed
   1/lxd/0  started  10.0.0.175  juju-64dabb-1-lxd-0  focal   default  Container started
   2        started  10.0.0.174  node3                focal   default  Deployed
   2/lxd/0  started  10.0.0.176  juju-64dabb-2-lxd-0  focal   default  Container started

In this example the unhealthy node is assumed to be ``rabbitmq-server/0`` with
an arbitrary message indicative of a problem. This is the node that is to be
replaced.
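
On a larger model it can help to pick out the problem units mechanically. The
following is a minimal sketch, not part of the charm or of Juju: the helper
name ``unhealthy_units`` is hypothetical, and it assumes the tabular
:command:`juju status` layout shown above, where the second column of a unit
row is the workload status.

.. code-block:: bash

   # Hypothetical helper: read `juju status` text on stdin and print every
   # rabbitmq-server unit whose workload status is not "active".
   # The trailing "*" that marks the leader is stripped from the unit name.
   unhealthy_units() {
       awk '$1 ~ /^rabbitmq-server\/[0-9]+\*?$/ && $2 != "active" {
                sub(/\*$/, "", $1)
                print $1
            }'
   }

   # Example use (needs a Juju client, so it is left commented out):
   #   juju status rabbitmq-server | unhealthy_units

An "active" workload status alone does not prove that a node is healthy, so
treat the result as a starting point for inspection.
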
Replace the unhealthy node
~~~~~~~~~~~~~~~~~~~~~~~~~~
First remove the unhealthy node:

.. code-block:: none

   juju remove-unit rabbitmq-server/0

Then add a node. Here we will be deploying to a new container on existing
machine '5':

.. code-block:: none

   juju add-unit --to lxd:5 rabbitmq-server

Please be patient as the node joins the cluster. In this example a machine (an
LXD container) must also be provisioned.
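
Rather than re-running the status command by hand, the wait can be scripted.
The following is a rough sketch under stated assumptions: the helper name
``enough_active_units`` is hypothetical, the 30-second interval is arbitrary,
and the check relies only on the workload column of the tabular
:command:`juju status` output.

.. code-block:: bash

   # Hypothetical helper: succeed when at least "$1" rabbitmq-server units in
   # the `juju status` text on stdin report an "active" workload status.
   enough_active_units() {
       awk -v want="$1" '
           $1 ~ /^rabbitmq-server\/[0-9]+\*?$/ && $2 == "active" { n++ }
           END { exit (n + 0 >= want + 0) ? 0 : 1 }
       '
   }

   # Example use (needs a Juju client, so it is left commented out):
   #   until juju status rabbitmq-server | enough_active_units 3; do
   #       sleep 30
   #   done
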
Verify model health
~~~~~~~~~~~~~~~~~~~
Verify the model's health with the :command:`juju status rabbitmq-server`
command. Its partial output should look like:

.. code-block:: console

   App              Version  Status  Scale  Charm            Store       Channel  Rev  OS      Message
   rabbitmq-server  3.8.2    active      3  rabbitmq-server  charmstore  stable   118  ubuntu  Unit is ready and clustered

   Unit                Workload  Agent  Machine  Public address  Ports     Message
   rabbitmq-server/1*  active    idle   1/lxd/0  10.0.0.175      5672/tcp  Unit is ready and clustered
   rabbitmq-server/2   active    idle   2/lxd/0  10.0.0.176      5672/tcp  Unit is ready and clustered
   rabbitmq-server/3   active    idle   5/lxd/1  10.0.0.178      5672/tcp  Unit is ready and clustered
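
The Juju status reflects what the charm reports. As an optional cross-check,
RabbitMQ's own view of cluster membership can be queried from any healthy
unit. The following is a sketch only: the wrapper name ``show_cluster_status``
is hypothetical, and it assumes that :command:`rabbitmqctl` is available on the
unit and that the user can run :command:`sudo` there.

.. code-block:: bash

   # Hypothetical wrapper: print RabbitMQ's cluster membership as seen from
   # one unit (defaults to rabbitmq-server/1, a healthy unit in this example).
   show_cluster_status() {
       juju ssh "${1:-rabbitmq-server/1}" sudo rabbitmqctl cluster_status
   }

   # Example use (needs a Juju client, so it is left commented out):
   #   show_cluster_status rabbitmq-server/2

The removed node should no longer be listed, and the new node should appear
among the running nodes.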