Extend scale back hacluster operation - retry

Extend the scale back hacluster cloud operation from
using Vault as an example to replacing a Vault cluster
node. This consists essentially of adding a section on
scaling out the application.

Improve the Unseal Vault operation by linking to the
Vault TLS cloud operation.

Change-Id: Id2e146e7e5dbad8f1df4acb85f60728257ba526d
Peter Matulis 2022-04-21 13:45:59 -04:00
parent 1fa01acd73
commit ab54e97433
4 changed files with 183 additions and 160 deletions

@@ -15,7 +15,7 @@ General cloud operations:
ops-unseal-vault
ops-config-tls-vault-api
ops-live-migrate-vms
ops-scale-back-with-hacluster
ops-replace-vault-node
ops-scale-out-nova-compute
ops-start-innodb-from-outage
ops-auto-glance-image-updates

@@ -0,0 +1,175 @@
:orphan:

==================================================
Scale back an application with the hacluster charm
==================================================

Introduction
------------

This article shows how to replace a Vault node in a cluster made highly
available by means of the subordinate hacluster charm. It entails the removal
and then the addition of a vault unit. This is done with generic Juju commands
and actions available to the hacluster charm.

.. important::

   This procedure will not result in cloud downtime provided that there is at
   least one functional Vault node present at all times.

.. warning::

   This procedure involves a sealed Vault instance. Please ensure that the
   requisite number of unseal keys is available before continuing.
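
If in doubt, the number of keys needed (the unseal key threshold) can be
confirmed by querying an existing unit with the vault client. The following is
a minimal sketch, assuming the client is installed locally and the API is not
TLS-encrypted:

.. code-block:: none

   # Address of any existing vault unit; the 'Threshold' field in the output
   # shows how many unseal keys are required
   export VAULT_ADDR="http://10.246.114.76:8200"
   vault status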

Procedure
---------

If the unit being removed is in a 'lost' state (as seen in :command:`juju
status`) please first see the `Notes`_ section.

List the application units
~~~~~~~~~~~~~~~~~~~~~~~~~~

Display the units, in this case for the vault application:

.. code-block:: none

   juju status vault

This article will be based on the following (partial) output:

.. code-block:: console

   Unit Workload Agent Machine Public address Ports Message
   vault/0* active idle 1/lxd/4 10.246.114.76 8200/tcp Unit is ready (active: false, mlock: disabled)
     vault-hacluster/1 active idle 10.246.114.76 Unit is ready and clustered
     vault-mysql-router/0* active idle 10.246.114.76 Unit is ready
   vault/3 active idle 0/lxd/8 10.246.114.83 8200/tcp Unit is ready (active: true, mlock: disabled)
     vault-hacluster/2 active idle 10.246.114.83 Unit is ready and clustered
     vault-mysql-router/25 active idle 10.246.114.83 Unit is ready
   vault/4 active idle 2/lxd/9 10.246.114.84 8200/tcp Unit is ready (active: false, mlock: disabled)
     vault-hacluster/0* active idle 10.246.114.84 Unit is ready and clustered
     vault-mysql-router/24 active idle 10.246.114.84 Unit is ready

In this example, unit ``vault/3`` will be removed.

Pause the subordinate hacluster unit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pause the hacluster unit that corresponds to the principal application unit
being removed. Here, unit ``vault-hacluster/2`` corresponds to unit
``vault/3``:

.. code-block:: none

   juju run-action --wait vault-hacluster/2 pause
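
Note that subordinate unit numbers do not necessarily match those of their
principal units. One way to confirm the pairing (a sketch using standard Juju
commands) is to filter the status output by the principal unit, which lists its
subordinates nested beneath it:

.. code-block:: none

   # Show only vault/3 and its subordinate units
   juju status vault/3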

Remove the principal application unit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Remove the principal application unit:

.. code-block:: none

   juju remove-unit vault/3

This will also remove the hacluster subordinate unit (and any other subordinate
units).
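
Removal is not instantaneous. A quick way to watch for the unit (and its
subordinates) disappearing from the model, shown as an illustration only, is to
poll the status output:

.. code-block:: none

   # Re-run until vault/3 no longer appears in the output
   watch -n 5 juju status vault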

Add a principal application unit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Add a principal application unit. We accomplish this by scaling out the
existing vault application and placing the new (containerised) unit on the same
host that the removed unit was on (machine 0):

.. code-block:: none

   juju add-unit --to lxd:0 vault

.. caution::

   If network spaces are in use, the above command will not succeed. See Juju
   issue `LP #1969523`_ for a workaround.
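
Whether the model uses network spaces can be checked beforehand with a standard
Juju command (a minimal sketch; the output will vary by deployment):

.. code-block:: none

   # Lists any network spaces defined in the current model
   juju spaces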

The new :command:`juju status` output now contains:

.. code-block:: console

   Unit Workload Agent Machine Public address Ports Message
   vault/0* active idle 1/lxd/4 10.246.114.76 8200/tcp Unit is ready (active: false, mlock: disabled)
     vault-hacluster/1 active idle 10.246.114.76 Unit is ready and clustered
     vault-mysql-router/0* active idle 10.246.114.76 Unit is ready
   vault/4 active idle 2/lxd/9 10.246.114.84 8200/tcp Unit is ready (active: true, mlock: disabled)
     vault-hacluster/0* active idle 10.246.114.84 Unit is ready and clustered
     vault-mysql-router/24 active idle 10.246.114.84 Unit is ready
   vault/6 blocked idle 0/lxd/9 10.246.114.83 8200/tcp Unit is sealed
     vault-hacluster/28 active idle 10.246.114.83 Unit is ready and clustered
     vault-mysql-router/40 active idle 10.246.114.83 Unit is ready

Notice that the new vault unit (``vault/6``) is sealed.
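
The seal state can also be confirmed directly with the vault client (a sketch,
assuming the client is installed on the machine issuing the commands):

.. code-block:: none

   # 'Sealed: true' is expected for the new instance at this point
   export VAULT_ADDR="http://10.246.114.83:8200"
   vault status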

Unseal the new Vault instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we will assume that the original Vault deployment was initialised with a
requirement of three unseal keys.

Set an environment variable based on the address of the newly-introduced unit,
and unseal the instance:

.. code-block:: none

   export VAULT_ADDR="http://10.246.114.83:8200"
   vault operator unseal
   vault operator unseal
   vault operator unseal

For more information on unsealing Vault, see cloud operation :doc:`Unseal Vault
<ops-unseal-vault>`.

Verify cloud services
~~~~~~~~~~~~~~~~~~~~~

The final :command:`juju status vault` (partial) output is:

.. code-block:: console

   Unit Workload Agent Machine Public address Ports Message
   vault/0* active idle 1/lxd/4 10.246.114.76 8200/tcp Unit is ready (active: false, mlock: disabled)
     vault-hacluster/1 active idle 10.246.114.76 Unit is ready and clustered
     vault-mysql-router/0* active idle 10.246.114.76 Unit is ready
   vault/4 active idle 2/lxd/9 10.246.114.84 8200/tcp Unit is ready (active: true, mlock: disabled)
     vault-hacluster/0* active idle 10.246.114.84 Unit is ready and clustered
     vault-mysql-router/24 active idle 10.246.114.84 Unit is ready
   vault/6 active idle 0/lxd/9 10.246.114.83 8200/tcp Unit is ready (active: false, mlock: disabled)
     vault-hacluster/28 active idle 10.246.114.83 Unit is ready and clustered
     vault-mysql-router/40 active idle 10.246.114.83 Unit is ready

Ensure that all cloud services are working as expected.
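
A basic model-wide spot-check, shown for illustration only and no substitute
for exercising the cloud's actual workloads, is to scan the status output for
units that are not in the expected state:

.. code-block:: none

   # Flag any units reporting an error or blocked workload state
   juju status | grep -E 'error|blocked' || echo "No units in an error or blocked state"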

Notes
-----

Pre-removal, in the case where the principal application unit has transitioned
to a 'lost' state (e.g. dropped off the network due to a hardware failure),

#. the first step (pause the hacluster unit) can be skipped
#. the second step (remove the principal unit) can be replaced by:

   .. code-block:: none

      juju remove-machine N --force

   N is the Juju machine ID (see the :command:`juju status` command) where the
   unit to be removed is running.
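
For the example used in this article, where unit ``vault/3`` resides on machine
``0/lxd/8``, the command would look like the following (illustrative only;
substitute your own machine ID):

.. code-block:: none

   juju remove-machine 0/lxd/8 --force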

.. warning::

   Removing the machine by force will naturally remove any other units that
   may be present, including those from an entirely different application.

.. LINKS
.. _LP #1969523: https://bugs.launchpad.net/juju/+bug/1969523

@@ -1,159 +0,0 @@
:orphan:

==================================================
Scale back an application with the hacluster charm
==================================================

Introduction
------------

This article shows how to scale back an application that is made highly
available by means of the subordinate hacluster charm. It implies the removal
of one or more of the principal application's units. This is easily done with
generic Juju commands and actions available to the hacluster charm.

.. note::

   Since the application being scaled back is already in HA mode the removal of
   one of its cluster members should not cause any immediate interruption of
   cloud services.

   Scaling back an application will also remove its associated hacluster unit.
   It is best practice to have at least three hacluster units per application
   at all times. An odd number is also recommended.

Procedure
---------

If the unit being removed is in a 'lost' state (as seen in :command:`juju
status`) please first see the `Notes`_ section.

List the application units
~~~~~~~~~~~~~~~~~~~~~~~~~~

Display the units, in this case for the vault application:

.. code-block:: none

   juju status vault

This article will be based on the following output:

.. code-block:: console

   Unit Workload Agent Machine Public address Ports Message
   vault/0* active idle 0/lxd/5 10.0.0.227 8200/tcp Unit is ready (active: true, mlock: disabled)
     vault-hacluster/0* active idle 10.0.0.227 Unit is ready and clustered
     vault-mysql-router/0* active idle 10.0.0.227 Unit is ready
   vault/1 active idle 1/lxd/5 10.0.0.234 8200/tcp Unit is ready (active: true, mlock: disabled)
     vault-hacluster/1 active idle 10.0.0.234 Unit is ready and clustered
     vault-mysql-router/1 active idle 10.0.0.234 Unit is ready
   vault/2 active idle 2/lxd/6 10.0.0.233 8200/tcp Unit is ready (active: true, mlock: disabled)
     vault-hacluster/2 active idle 10.0.0.233 Unit is ready and clustered
     vault-mysql-router/2 active idle 10.0.0.233 Unit is ready

In the below example, unit ``vault/1`` will be removed.

Pause the subordinate hacluster unit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pause the hacluster unit that corresponds to the principle application unit
being removed. Here, unit ``vault-hacluster/1`` corresponds to unit
``vault/1``:

.. code-block:: none

   juju run-action --wait vault-hacluster/1 pause

.. caution::

   Unit numbers for a subordinate unit and its corresponding principal unit are
   not necessarily the same (e.g. it is possible to have ``vault-hacluster/2``
   correspond to ``vault/1``).

Remove the principal application unit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Remove the principal application unit:

.. code-block:: none

   juju remove-unit vault/1

This will also remove the hacluster subordinate unit (and any other subordinate
units).

Update the ``cluster_count`` value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Inform the hacluster charm about the new number of hacluster units, two here:

.. code-block:: none

   juju config vault-hacluster cluster_count=2

In this example a count of two (less than three) removes quorum functionality
and enables a two-node cluster. This is a sub-optimal state and is shown as an
example only.

Update Corosync
~~~~~~~~~~~~~~~

Remove Corosync nodes from its ring and update ``corosync.conf`` to reflect the
new number of nodes (``min_quorum`` is recalculated):

.. code-block:: none

   juju run-action --wait vault-hacluster/leader update-ring i-really-mean-it=true

Check the status of the Corosync cluster by querying a remaining hacluster
unit:

.. code-block:: none

   juju ssh vault-hacluster/leader sudo crm status

There should not be any node listed as OFFLINE.

.. note::

   With Juju client < 2.9 a subordinate leader unit must be referenced via its
   machine ID (e.g. 0/lxd/5) when using the :command:`juju ssh` command.

Verify cloud services
~~~~~~~~~~~~~~~~~~~~~

For this example, the final :command:`juju status vault` output is:

.. code-block:: console

   Unit Workload Agent Machine Public address Ports Message
   vault/0* active idle 0/lxd/5 10.0.0.227 8200/tcp Unit is ready (active: true, mlock: disabled)
     vault-hacluster/0* active idle 10.0.0.227 Unit is ready and clustered
     vault-mysql-router/0* active idle 10.0.0.227 Unit is ready
   vault/2 active idle 2/lxd/6 10.0.0.233 8200/tcp Unit is ready (active: true, mlock: disabled)
     vault-hacluster/2 active idle 10.0.0.233 Unit is ready and clustered
     vault-mysql-router/2 active idle 10.0.0.233 Unit is ready

Ensure that all cloud services are working as expected.

Notes
-----

Pre-removal, in the case where the principal application unit has transitioned
to a 'lost' state (e.g. dropped off the network due to a hardware failure),

#. the first step (pause the hacluster unit) can be skipped
#. the second step (remove the principal unit) can be replaced by:

   .. code-block:: none

      juju remove-machine N --force

   N is the Juju machine ID (see the :command:`juju status` command) where the
   unit to be removed is running.

.. warning::

   Removing the machine by force will naturally remove any other units that
   may be present, including those from an entirely different application.

@@ -51,6 +51,13 @@ For a single unit requiring three keys (``vault/0`` with IP address
   vault operator unseal
   vault operator unseal

.. note::

   If the Vault API is encrypted, you will need to inform your Vault client of
   the associated CA certificate via an additional environment variable
   (VAULT_CACERT). See cloud operation :doc:`Configure TLS for the Vault API
   <ops-config-tls-vault-api>`.

You will be prompted for the unseal keys. The information will not be echoed
back to the screen nor captured in the shell's history.
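
As an illustration only, pointing the client at the CA certificate might look
as follows; the certificate path and unit address are hypothetical and will
differ per deployment:

.. code-block:: none

   # Hypothetical path; use the CA certificate that signed the Vault API certificate
   export VAULT_CACERT="/path/to/vault-ca.pem"
   export VAULT_ADDR="https://<vault-unit-address>:8200"
   vault operator unseal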