:orphan:

==========================
Replace Vault cluster node
==========================

.. important::

   This page has been identified as being affected by the breaking changes
   introduced between versions 2.9.x and 3.x of the Juju client. Read
   support note :ref:`juju_29_3x_changes` before continuing.

Introduction
------------

This article shows how to replace a Vault node in a cluster made highly
available by means of the subordinate hacluster charm. It involves the removal
and then the addition of a vault unit. This is done with generic Juju commands
and actions available to the hacluster charm.

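The actions exposed by the hacluster charm (the procedure below uses its
``pause`` action) can be listed with Juju. The subordinate application is
named ``vault-hacluster`` in the example deployment used on this page:

.. code-block:: none

   juju actions vault-hacluster
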
.. important::

   This procedure will not result in cloud downtime provided that there is at
   least one functional Vault node present at all times.

.. warning::

   This procedure will involve a sealed Vault instance. Please ensure that the
   requisite number of unseal keys is available before continuing.

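If you are unsure how many unseal keys are required, one way to check is to
query the seal status of an existing, healthy node before starting; the
``Threshold`` field in the output gives the number of key shares needed. The
address below belongs to a unit in the example deployment shown later on this
page:

.. code-block:: none

   export VAULT_ADDR="http://10.246.114.76:8200"
   vault status
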
Procedure
---------

If the unit being removed is in a 'lost' state (as seen in :command:`juju
status`), please first see the `Notes`_ section.

List the application units
~~~~~~~~~~~~~~~~~~~~~~~~~~

Display the units, in this case for the vault application:

.. code-block:: none

   juju status vault

This article will be based on the following (partial) output:

.. code-block:: console

   Unit                     Workload  Agent  Machine  Public address  Ports     Message
   vault/0*                 active    idle   1/lxd/4  10.246.114.76   8200/tcp  Unit is ready (active: false, mlock: disabled)
     vault-hacluster/1      active    idle            10.246.114.76             Unit is ready and clustered
     vault-mysql-router/0*  active    idle            10.246.114.76             Unit is ready
   vault/3                  active    idle   0/lxd/8  10.246.114.83   8200/tcp  Unit is ready (active: true, mlock: disabled)
     vault-hacluster/2      active    idle            10.246.114.83             Unit is ready and clustered
     vault-mysql-router/25  active    idle            10.246.114.83             Unit is ready
   vault/4                  active    idle   2/lxd/9  10.246.114.84   8200/tcp  Unit is ready (active: false, mlock: disabled)
     vault-hacluster/0*     active    idle            10.246.114.84             Unit is ready and clustered
     vault-mysql-router/24  active    idle            10.246.114.84             Unit is ready

In this example, unit ``vault/3`` will be removed.

Pause the subordinate hacluster unit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pause the hacluster unit that corresponds to the principal application unit
being removed. Here, unit ``vault-hacluster/2`` corresponds to unit
``vault/3``:

.. code-block:: none

   juju run vault-hacluster/2 pause

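To confirm that the action has taken effect, inspect the unit's status; a
paused hacluster unit is expected to report a non-active (typically
``maintenance``) workload status with a message indicating that it is paused:

.. code-block:: none

   juju status vault-hacluster/2
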
Remove the principal application unit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Remove the principal application unit:

.. code-block:: none

   juju remove-unit vault/3

This will also remove the hacluster subordinate unit (and any other
subordinate units).

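Removal is not instantaneous. Re-running the status command and waiting for
``vault/3`` (and its subordinates) to disappear from the output is a simple
way to confirm that the removal has completed before continuing:

.. code-block:: none

   juju status vault
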
Add a principal application unit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Scale out the existing vault application and place the new (containerised)
unit on the same host that the removed unit was on (machine 0):

.. code-block:: none

   juju add-unit --to lxd:0 vault

.. caution::

   If network spaces are in use, the above command will not succeed. See Juju
   issue `LP #1969523`_ for a workaround.

The new :command:`juju status` output now contains:

.. code-block:: console

   Unit                     Workload  Agent  Machine  Public address  Ports     Message
   vault/0*                 active    idle   1/lxd/4  10.246.114.76   8200/tcp  Unit is ready (active: false, mlock: disabled)
     vault-hacluster/1      active    idle            10.246.114.76             Unit is ready and clustered
     vault-mysql-router/0*  active    idle            10.246.114.76             Unit is ready
   vault/4                  active    idle   2/lxd/9  10.246.114.84   8200/tcp  Unit is ready (active: true, mlock: disabled)
     vault-hacluster/0*     active    idle            10.246.114.84             Unit is ready and clustered
     vault-mysql-router/24  active    idle            10.246.114.84             Unit is ready
   vault/6                  blocked   idle   0/lxd/9  10.246.114.83   8200/tcp  Unit is sealed
     vault-hacluster/28     active    idle            10.246.114.83             Unit is ready and clustered
     vault-mysql-router/40  active    idle            10.246.114.83             Unit is ready

Notice that the new vault unit (``vault/6``) is sealed.

Unseal the new Vault instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we will assume that the original Vault deployment was initialised with a
requirement of three unseal keys.

Set an environment variable based on the address of the newly-introduced unit,
and unseal the instance:

.. code-block:: none

   export VAULT_ADDR="http://10.246.114.83:8200"
   vault operator unseal
   vault operator unseal
   vault operator unseal

For more information on unsealing Vault, see cloud operation :doc:`Unseal Vault
<ops-unseal-vault>`.

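Once the requisite number of keys has been entered, the seal state can be
checked directly (using the same ``VAULT_ADDR`` exported above); the
``Sealed`` field in the output should read ``false``, and the unit's message
in :command:`juju status` should change from 'Unit is sealed':

.. code-block:: none

   vault status
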
Verify cloud services
~~~~~~~~~~~~~~~~~~~~~

The final :command:`juju status vault` (partial) output is:

.. code-block:: console

   Unit                     Workload  Agent  Machine  Public address  Ports     Message
   vault/0*                 active    idle   1/lxd/4  10.246.114.76   8200/tcp  Unit is ready (active: false, mlock: disabled)
     vault-hacluster/1      active    idle            10.246.114.76             Unit is ready and clustered
     vault-mysql-router/0*  active    idle            10.246.114.76             Unit is ready
   vault/4                  active    idle   2/lxd/9  10.246.114.84   8200/tcp  Unit is ready (active: true, mlock: disabled)
     vault-hacluster/0*     active    idle            10.246.114.84             Unit is ready and clustered
     vault-mysql-router/24  active    idle            10.246.114.84             Unit is ready
   vault/6                  active    idle   0/lxd/9  10.246.114.83   8200/tcp  Unit is ready (active: false, mlock: disabled)
     vault-hacluster/28     active    idle            10.246.114.83             Unit is ready and clustered
     vault-mysql-router/40  active    idle            10.246.114.83             Unit is ready

Ensure that all cloud services are working as expected.

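What "working as expected" means is deployment-specific, but a rough first
pass is to confirm that no unit in the model is reporting an error, blocked,
or stuck maintenance workload status. This does not replace service-level
checks:

.. code-block:: none

   juju status | grep -E 'error|blocked|maintenance'
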
Notes
-----

Pre-removal, in the case where the principal application unit has transitioned
to a 'lost' state (e.g. dropped off the network due to a hardware failure):

#. the first step (pause the hacluster unit) can be skipped
#. the second step (remove the principal unit) can be replaced by:

   .. code-block:: none

      juju remove-machine N --force

N is the Juju machine ID (see :command:`juju status`) where the unit to be
removed is running.

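If it is not obvious which machine hosts the lost unit, filtering
:command:`juju status` by the unit name (here, the example unit from this
page) is one way to read off the machine ID from the ``Machine`` column; for
a containerised unit this will be a container ID such as ``0/lxd/8``:

.. code-block:: none

   juju status vault/3
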
.. warning::

   Removing the machine by force will naturally remove any other units that
   may be present, including those from an entirely different application.

.. LINKS
.. _LP #1969523: https://bugs.launchpad.net/juju/+bug/1969523