Merge "Rewrite Masakari appendix"

2020-07-07 14:10:27 +00:00 · 2020-07-07 14:10:27 +00:00 · a8aaaa71fa
parent aff7a42a14 3e5e038fd2
commit a8aaaa71fa
1 changed files with 440 additions and 236 deletions
--- a/deploy-guide/source/app-masakari.rst
+++ b/deploy-guide/source/app-masakari.rst
@@ -1,301 +1,505 @@
-Appendix L: Automated Instance Recovery
-=======================================
+====================
+Appendix L: Masakari
+====================

 Overview
-++++++++
+--------

-As of the 19.04 charm release, with OpenStack Stein and later, Masakari can be
-a deployed to provide automated instance recovery for guests using shared
-storage. Masakari responds to two different failures types, individual guest
-failure and the loss of an entire compute node.
+As of the 20.05 charm release, Masakari can be deployed to provide automated
+instance recovery for clouds that use shared storage for their instances. The
+following functionality is provided:

-.. warning::
+#. **Evacuation of instances** (supported since OpenStack Stein)

-    These charms bring forward upstream Masakari features which need to be
-    carefully considered and pre-validated in test labs by cloud operators.
-    Further upstream Masakari development, charm feature work and scenario
-    validation is likely going to be necessary before the solution can be
-    considered mature on the whole.
+   In the event of a hypervisor software failure, the associated compute node
+   is shut down and its instances are started on another hypervisor.

-STONITH
-+++++++
+#. **Restarting of instances** (supported since OpenStack Ussuri)

-It is important that guests using shared storage cannot continue to run in the
-event of a compute node becoming isolated. The risk being that masakari
-attempts to bring the same guest up on a new compute node when the old one
-is still running which could lead to data corruption. To ensure that this does
-not occur stonith can be setup for the compute nodes. For stonith to be
-configured the **maas_url** and **maas_credentials** config option must be
-set in the hacluster charm related to the masakari charm. Also the
-**enable-stonith** config option should be set to **True** in the
-pacemaker-remote charm.
+   A failed instance can be restarted on its current hypervisor.
+
+See the `Masakari charm`_ for an overview of the charms involved.
+
+.. note::
+
+   `MAAS`_ is required when enabling Masakari on Charmed OpenStack.
+
+Software
+--------
+
+Install the software necessary for configuring Masakari:
+
+.. code-block:: none
+
+   sudo snap install openstackclients
+
+Verify that the ``segment`` sub-command is available (this is provided by the
+``python-masakariclient`` plugin):
+
+.. code-block:: none
+
+   openstack segment --help
+
+.. important::
+
+   If the ``segment`` sub-command is not available, you will need a more recent
+   version of the ``openstackclients`` snap. For example, you may need to use
+   the 'edge' channel: :command:`sudo snap refresh openstackclients
+   --channel=edge`.
+
+Instance evacuation mechanics
+-----------------------------
+
+In order for an instance to be relocated to another hypervisor, some form of
+shared storage must be implemented. As a result, the scenario where a
+hypervisor has lost network access to its peers yet continues to access that
+shared storage must be considered.
+
+The mechanics of instance evacuation are as follows:
+
+Masakari Monitors, on a peer hypervisor, detects that its peer is unavailable
+and notifies the Masakari API server. This in turn triggers the Masakari engine
+to initiate a failover of the instance via Nova. Assuming that Nova concurs
+that the hypervisor is absent, it will attempt to start the instance on another
+hypervisor. If the original hypervisor is in fact still running (e.g. it has
+only lost network connectivity), there are now two instances competing for the
+same disk image, which can lead to data corruption.
+
+The solution is to enable a STONITH Pacemaker plugin, which will power off the
+compute node via the MAAS API when Pacemaker detects the hypervisor as being
+offline.
+
+.. caution::
+
+   Since nova-compute is typically deployed on bare metal, which may also host
+   containerised applications or other applications alongside nova-compute
+   (e.g. ceph-osd), care is advised when designing a cloud with Masakari to
+   prevent a powered-off compute node from disrupting crucial non-HA cloud
+   services.
+
+   Ensure that Masakari functionality has been fully validated in a staging
+   environment prior to using it in production.
+
+Usage
+-----
+
+Configuration
+~~~~~~~~~~~~~
+
+The overlay bundle below can be used to deploy Masakari when OpenStack is
+deployed via a bundle.
+
+Ensure that the ``machines`` section and the placement directives (i.e. the
+``to`` option under the masakari application) can co-exist with your OpenStack
+bundle.
+
+Provide values for the ``maas_url`` and ``maas_credentials`` hacluster charm
+options, and for the ``vip`` masakari charm option. A VIP is a virtual IP
+needed for Masakari to enable HA (a requirement when using the masakari charm).
+If multiple networks are used, multiple (space-separated) VIPs should be
+provided. See `OpenStack high availability`_ for HA guidance.
+
+Enable STONITH via the ``enable-stonith`` pacemaker-remote charm option.
+
+Provide values for the ``binding`` (network spaces) masakari charm option
+according to your local environment. For simplicity (or for testing), the same
+network space can be used for all Masakari bindings.
+
+.. code-block:: yaml
+
+   machines:
+     '0':
+       series: bionic
+     '1':
+       series: bionic
+     '2':
+       series: bionic
+     '3':
+       series: bionic
+   relations:
+   - - nova-compute:juju-info
+     - masakari-monitors:container
+   - - masakari:ha
+     - hacluster:ha
+   - - keystone:identity-credentials
+     - masakari-monitors:identity-credentials
+   - - nova-compute:juju-info
+     - pacemaker-remote:juju-info
+   - - hacluster:pacemaker-remote
+     - pacemaker-remote:pacemaker-remote
+   - - masakari:identity-service
+     - keystone:identity-service
+   - - masakari:shared-db
+     - mysql:shared-db
+   - - masakari:amqp
+     - rabbitmq-server:amqp
+   series: bionic
+   applications:
+     masakari-monitors:
+       charm: cs:masakari-monitors
+     hacluster:
+       charm: cs:hacluster
+       options:
+         maas_url: <INSERT MAAS URL>
+         maas_credentials: <INSERT MAAS API KEY>
+     pacemaker-remote:
+       charm: cs:pacemaker-remote
+       options:
+         enable-stonith: True
+         enable-resources: False
+     masakari:
+       charm: cs:masakari
+       series: bionic
+       num_units: 3
+       options:
+         openstack-origin: cloud:bionic-stein
+         vip: <INSERT VIP(S)>
+       bindings:
+         public: public
+         admin: admin
+         internal: internal
+         shared-db: internal
+         amqp: internal
+       to:
+       - 'lxd:1'
+       - 'lxd:2'
+       - 'lxd:3'

 Deployment
-++++++++++
+~~~~~~~~~~

-Three new charms are needeed to deploy this solution: masakari,
-masakari-monitors and pacemaker-remote. The masakari charm provides api
-services and is a principal or standalone charm. The masakari-monitors charm is
-deployed as a subordinate to the nova-compute charm as it monitors
-nova-compute directly and sends messages to the masakari API charm. The
-pacemaker-remote charm is also a subordinate to the nova-compute charm and is
-required to monitor the compute nodes health.
+To deploy Masakari during the deployment of a new cloud (e.g. via the
+`openstack-base`_ bundle):

-Below is an overlay which can be used to add masakari to an existing
-deployment:
+.. code-block:: none

-.. code::
+   juju deploy ./bundle.yaml --overlay masakari-overlay.yaml

-    machines:
-      '0':
-        series: bionic
-      '1':
-        series: bionic
-      '2':
-        series: bionic
-      '3':
-        series: bionic
-    relations:
-    - - nova-compute:juju-info
-      - masakari-monitors:container
-    - - masakari:ha
-      - hacluster:ha
-    - - keystone:identity-credentials
-      - masakari-monitors:identity-credentials
-    - - nova-compute:juju-info
-      - pacemaker-remote:juju-info
-    - - hacluster:pacemaker-remote
-      - pacemaker-remote:pacemaker-remote
-    - - masakari:identity-service
-      - keystone:identity-service
-    - - masakari:shared-db
-      - mysql:shared-db
-    - - masakari:amqp
-      - rabbitmq-server:amqp
-    series: bionic
-    applications:
-      masakari-monitors:
-        charm: cs:masakari-monitors
-      hacluster:
-        charm: cs:hacluster
-        options:
-          maas_url: <INSERT MAAS URL>
-          maas_credentials: <INSERT MAAS API KEY>
-      pacemaker-remote:
-        charm: cs:pacemaker-remote
-        options:
-          enable-stonith: True
-          enable-resources: False
-      masakari:
-        charm: cs:masakari
-        series: bionic
-        num_units: 3
-        options:
-          openstack-origin: cloud:bionic-stein
-          vip: <INSERT VIP(S)>
-        bindings:
-          public: public
-          admin: admin
-          internal: internal
-          shared-db: internal
-          amqp: internal
-        to:
-        - 'lxd:1'
-        - 'lxd:2'
-        - 'lxd:3'
+To add Masakari to an existing deployment (i.e. the Juju model has pre-existing
+machines), the ``--map-machines`` option should be used (e.g.
+``--map-machines=existing``).

-.. warning::
+The cloud should then be configured for usage. See `Configure OpenStack`_ for
+assistance.

-    The bundle above with need customising to correct maas_url,
-    maas_credentials and vip settings. The machine mappings will almost
-    certainly need updating too.
+For the purposes of this document, the following hypervisors are assumed:

-To use the overlay with an existing model remember to use the
-**--map-machines** switch to juju
+.. code-block:: console

-.. code::
+   +-------------------+---------+-------+
+   | Host              | Status  | State |
+   +-------------------+---------+-------+
+   | virt-node-01.maas | enabled | up    |
+   | virt-node-10.maas | enabled | up    |
+   | virt-node-02.maas | enabled | up    |
+   +-------------------+---------+-------+

-    $ juju deploy base.yaml --overlay masakari-overlay.yaml --map-machines=existing
+In addition, assume that instance 'bionic-1' resides on host
+'virt-node-02.maas':

-Configuring Masakari
-++++++++++++++++++++
+.. code-block:: console

-In Masakari the compute nodes are grouped into failover segments. In the event
-of a failure guests are moved onto other nodes within the same segment. Which
-compute node is chosen to house the evacuated guests is determined by the
-recovery method of that segment.
+   +----------------------+-------------------+
+   | Field                | Value             |
+   +----------------------+-------------------+
+   | OS-EXT-SRV-ATTR:host | virt-node-02.maas |
+   +----------------------+-------------------+

-'AUTO' Recovery Method
----------------------
+The above information was obtained by the following two commands,
+respectively:

-With auto recovery the guests are relocated to any of the available nodes in
-the same segment. The problem with this approach is that there is no guarantee
-that resources will be available to accommodate guests from a failed compute
-node.
+.. code-block:: none

-To configure a group of compute hosts for auto recovery, first create a segment
-with the recovery method set to auto:
+   openstack compute service list -c Host -c Status -c State --service nova-compute
+   openstack server show bionic-1 -c OS-EXT-SRV-ATTR:host

-.. code::
+Instance evacuation recovery methods
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-    $ openstack segment create segment1 auto COMPUTE
-    +-----------------+--------------------------------------+
-    | Field           | Value                                |
-    +-----------------+--------------------------------------+
-    | created_at      | 2019-04-12T13:59:50.000000           |
-    | updated_at      | None                                 |
-    | uuid            | 691b8ef3-7481-48b2-afb6-908a98c8a768 |
-    | name            | segment1                             |
-    | description     | None                                 |
-    | id              | 1                                    |
-    | service_type    | COMPUTE                              |
-    | recovery_method | auto                                 |
-    +-----------------+--------------------------------------+
+With Masakari, compute nodes are grouped into failover segments. In the event
+of a compute node failure, that node's instances are moved onto another compute
+node within the same segment.

+The destination node is determined by the recovery method configured for the
+affected segment. There are four methods:

-Next the hypervisors need to be added into the segment, these should be
-referenced by their unqualified hostname:
+* ``reserved_host``
+* ``auto``
+* ``rh_priority``
+* ``auto_priority``

-.. code::
+A compute node failure can be simulated by bringing down its primary network
+interface (``br-ens3`` in this example cloud; the name will vary). For example,
+to bring down the interface on the node that corresponds to unit
+``nova-compute/2``:

-    $ openstack segment host create tidy-goose COMPUTE SSH 691b8ef3-7481-48b2-afb6-908a98c8a768
-    +---------------------+--------------------------------------+
-    | Field               | Value                                |
-    +---------------------+--------------------------------------+
-    | created_at          | 2019-04-12T14:18:24.000000           |
-    | updated_at          | None                                 |
-    | uuid                | 11b85c9d-2b97-4b83-b773-0e9565e407b5 |
-    | name                | tidy-goose                           |
-    | type                | COMPUTE                              |
-    | control_attributes  | SSH                                  |
-    | reserved            | False                                |
-    | on_maintenance      | False                                |
-    | failover_segment_id | 691b8ef3-7481-48b2-afb6-908a98c8a768 |
-    +---------------------+--------------------------------------+
+.. code-block:: none

-Repeat above for all remaining hypervisors:
+   juju run --unit nova-compute/2 'sudo ip link set br-ens3 down'

+'reserved_host'
+^^^^^^^^^^^^^^^

-.. code::
+The ``reserved_host`` recovery method relocates instances to a subset of
+non-active nodes. Because these nodes are not active and are typically
+resourced adequately for failover duty, sufficient resources should exist on a
+reserved node to accommodate migrated instances.

-    $ openstack segment host list 691b8ef3-7481-48b2-afb6-908a98c8a768
-    +--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
-    | uuid                                 | name       | type    | control_attributes | reserved | on_maintenance | failover_segment_id                  |
-    +--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
-    | 75afadbb-67cc-47b2-914e-e3bf848028e4 | frank-colt | COMPUTE | SSH                | False    | False          | 691b8ef3-7481-48b2-afb6-908a98c8a768 |
-    | 11b85c9d-2b97-4b83-b773-0e9565e407b5 | tidy-goose | COMPUTE | SSH                | False    | False          | 691b8ef3-7481-48b2-afb6-908a98c8a768 |
-    | f1e9b0b4-3ac9-4f07-9f83-5af2f9151109 | model-crow | COMPUTE | SSH                | False    | False          | 691b8ef3-7481-48b2-afb6-908a98c8a768 |
-    +--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
+For example, to create segment 'S1', configure it to use the ``reserved_host``
+method, and assign it three compute nodes, with one being tagged as a reserved
+node:

-'RESERVED_HOST' Recovery Method
-------------------------------
+.. code-block:: none

-With reserved_host recovery compute hosts are allocated as reserved which
-allows an operator to guarantee there is sufficient capacity available for any
-guests in need of evacuation.
+   openstack segment create S1 reserved_host COMPUTE
+   openstack segment host create virt-node-10.maas COMPUTE SSH S1
+   openstack segment host create virt-node-02.maas COMPUTE SSH S1
+   openstack segment host create --reserved True virt-node-01.maas COMPUTE SSH S1

-Firstly create a segment with the reserved_host recovery method:
+List the segments:

-.. code::
+.. code-block:: none

-    $ openstack segment create segment1 reserved_host COMPUTE -c uuid -f value
-    2598f8aa-3612-4731-9716-e126ca6cc280
+   openstack segment list

+Sample output:

-Add a host using the --reserved switch to indicate that it will act as a
-standby:
+.. code-block:: console

-.. code::
+   +--------------------------------------+------+-------------+--------------+-----------------+
+   | uuid                                 | name | description | service_type | recovery_method |
+   +--------------------------------------+------+-------------+--------------+-----------------+
+   | 3af6dfe7-1619-486f-a2c6-8453488c6a66 | S1   | None        | COMPUTE      | reserved_host   |
+   +--------------------------------------+------+-------------+--------------+-----------------+

-    $ openstack segment host create model-crow --reserved True COMPUTE SSH 2598f8aa-3612-4731-9716-e126ca6cc280
+A segment's hosts can be listed like this:

+.. code-block:: none

-Add the remaining hypervisors as before:
+   openstack segment host list -c name -c reserved -c on_maintenance S1

-.. code::
+The output should show a value of 'True' in the 'reserved' column for the
+appropriate node:

-    $ openstack segment host create frank-colt COMPUTE SSH 2598f8aa-3612-4731-9716-e126ca6cc280
-    $ openstack segment host create tidy-goose COMPUTE SSH 2598f8aa-3612-4731-9716-e126ca6cc280
+.. code-block:: console

+   +-------------------+----------+----------------+
+   | name              | reserved | on_maintenance |
+   +-------------------+----------+----------------+
+   | virt-node-01.maas | True     | False          |
+   | virt-node-10.maas | False    | False          |
+   | virt-node-02.maas | False    | False          |
+   +-------------------+----------+----------------+

-Listing the segment hosts shows that model-crow is a reserved host:
+Finally, disable the reserved node in Nova so that it becomes non-active, and
+thus available for failover:

-.. code::
+.. code-block:: none

-    $ openstack segment host list 2598f8aa-3612-4731-9716-e126ca6cc280
-    +--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
-    | uuid                                 | name       | type    | control_attributes | reserved | on_maintenance | failover_segment_id                  |
-    +--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
-    | 4769e08c-ed52-440a-866e-832b977aa5e2 | tidy-goose | COMPUTE | SSH                | False    | False          | 2598f8aa-3612-4731-9716-e126ca6cc280 |
-    | 90aedbd2-e03b-4dbd-b330-a1c848f300df | frank-colt | COMPUTE | SSH                | False    | False          | 2598f8aa-3612-4731-9716-e126ca6cc280 |
-    | c77574cc-b6e7-440e-9c86-84e91981f15e | model-crow | COMPUTE | SSH                | True     | False          | 2598f8aa-3612-4731-9716-e126ca6cc280 |
-    +--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
+   openstack compute service set --disable virt-node-01.maas nova-compute

-Finally disable the reserved host in nova so that it remains available for
-failover:
+The cloud's compute node list should show a status of 'disabled' for the
+appropriate node:

-.. code::
+.. code-block:: console

-    $ openstack compute service set --disable model-crow nova-compute
-    $ openstack compute service list
-    +----+----------------+---------------------+----------+----------+-------+----------------------------+
-    | ID | Binary         | Host                | Zone     | Status   | State | Updated At                 |
-    +----+----------------+---------------------+----------+----------+-------+----------------------------+
-    |  1 | nova-scheduler | juju-44b912-3-lxd-3 | internal | enabled  | up    | 2019-04-13T10:59:10.000000 |
-    |  5 | nova-conductor | juju-44b912-3-lxd-3 | internal | enabled  | up    | 2019-04-13T10:59:08.000000 |
-    |  7 | nova-compute   | tidy-goose          | nova     | enabled  | up    | 2019-04-13T10:59:11.000000 |
-    |  8 | nova-compute   | frank-colt          | nova     | enabled  | up    | 2019-04-13T10:59:05.000000 |
-    |  9 | nova-compute   | model-crow          | nova     | disabled | up    | 2019-04-13T10:59:12.000000 |
-    +----+----------------+---------------------+----------+----------+-------+----------------------------+
+   +-------------------+----------+-------+
+   | Host              | Status   | State |
+   +-------------------+----------+-------+
+   | virt-node-01.maas | disabled | up    |
+   | virt-node-10.maas | enabled  | up    |
+   | virt-node-02.maas | enabled  | up    |
+   +-------------------+----------+-------+

-When a compute node failure is detected, masakari will disable the failed node
-and enable the reserve node in nova. After simulating a failure of frank-colt
-the service list now looks like this:
+When a compute node failure is detected, Masakari will, in Nova, disable the
+failed node and enable a reserved node. The state of the failed node should
+also show as 'down'.

-.. code::
+Presuming that node 'virt-node-02.maas' has failed, the cloud's compute node
+list should become:

-    $ openstack compute service list
-    +----+----------------+---------------------+----------+----------+-------+----------------------------+
-    | ID | Binary         | Host                | Zone     | Status   | State | Updated At                 |
-    +----+----------------+---------------------+----------+----------+-------+----------------------------+
-    |  1 | nova-scheduler | juju-44b912-3-lxd-3 | internal | enabled  | up    | 2019-04-13T11:05:20.000000 |
-    |  5 | nova-conductor | juju-44b912-3-lxd-3 | internal | enabled  | up    | 2019-04-13T11:05:28.000000 |
-    |  7 | nova-compute   | tidy-goose          | nova     | enabled  | up    | 2019-04-13T11:05:21.000000 |
-    |  8 | nova-compute   | frank-colt          | nova     | disabled | down  | 2019-04-13T11:03:56.000000 |
-    |  9 | nova-compute   | model-crow          | nova     | enabled  | up    | 2019-04-13T11:05:22.000000 |
-    +----+----------------+---------------------+----------+----------+-------+----------------------------+
+.. code-block:: console

-Since the reserved host has now been enabled and is hosting evacuated guests,
-masakari has removed the reserved flag from it. Masakari has also placed the
-failed node in maintenance mode.
+   +-------------------+----------+-------+
+   | Host              | Status   | State |
+   +-------------------+----------+-------+
+   | virt-node-01.maas | enabled  | up    |
+   | virt-node-10.maas | enabled  | up    |
+   | virt-node-02.maas | disabled | down  |
+   +-------------------+----------+-------+

-.. code::
+The reserved node will begin hosting evacuated instances and Masakari will
+remove the reserved flag from it. It will also place the failed node in
+maintenance mode.

-    $ openstack segment host list 2598f8aa-3612-4731-9716-e126ca6cc280
-    +--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
-    | uuid                                 | name       | type    | control_attributes | reserved | on_maintenance | failover_segment_id                  |
-    +--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
-    | 4769e08c-ed52-440a-866e-832b977aa5e2 | tidy-goose | COMPUTE | SSH                | False    | False          | 2598f8aa-3612-4731-9716-e126ca6cc280 |
-    | 90aedbd2-e03b-4dbd-b330-a1c848f300df | frank-colt | COMPUTE | SSH                | False    | True           | 2598f8aa-3612-4731-9716-e126ca6cc280 |
-    | c77574cc-b6e7-440e-9c86-84e91981f15e | model-crow | COMPUTE | SSH                | False    | False          | 2598f8aa-3612-4731-9716-e126ca6cc280 |
-    +--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
+The segment's host list should show:

-‘AUTO_PRIORITY’ and ‘RH_PRIORITY’ Recovery Methods
--------------------------------------------------
+.. code-block:: console

-These methods appear to chain the previous methods together. So, auto_priority
-attempts to move the guest using the auto method first and if that fails it
-tries the reserved_host method. rh_priority does the same thing but in the
-reverse order. See
-`Masakari Pike Release Note <https://docs.openstack.org/releasenotes/masakari/pike.html>`_  for details.
+   +-------------------+----------+----------------+
+   | name              | reserved | on_maintenance |
+   +-------------------+----------+----------------+
+   | virt-node-01.maas | False    | False          |
+   | virt-node-10.maas | False    | False          |
+   | virt-node-02.maas | False    | True           |
+   +-------------------+----------+----------------+
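+
+The flags in the table above can also be checked from a script. The sketch
+below is illustrative only. It assumes that the client's generic ``-f json``
+output option returns one object per host, keyed by the selected column names.
+It also accepts the flags either as JSON booleans or as the strings
+'True'/'False'. The function names are hypothetical.
+
+.. code-block:: python
+
+   import json
+   import subprocess
+
+
+   def _as_bool(value):
+       """Accept a JSON boolean or a 'True'/'False' string."""
+       if isinstance(value, bool):
+           return value
+       return str(value).strip().lower() == "true"
+
+
+   def parse_host_flags(json_text):
+       """Map host name -> (reserved, on_maintenance) from JSON output."""
+       hosts = json.loads(json_text)
+       return {
+           h["name"]: (_as_bool(h["reserved"]), _as_bool(h["on_maintenance"]))
+           for h in hosts
+       }
+
+
+   def segment_host_flags(segment):
+       """Run the segment host listing shown above and parse its output."""
+       out = subprocess.run(
+           ["openstack", "segment", "host", "list", "-f", "json",
+            "-c", "name", "-c", "reserved", "-c", "on_maintenance", segment],
+           check=True, capture_output=True, text=True,
+       ).stdout
+       return parse_host_flags(out)
+
+After the simulated failure, ``segment_host_flags('S1')`` would be expected to
+map 'virt-node-02.maas' to ``(False, True)`` and 'virt-node-01.maas' to
+``(False, False)``.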

-Individual Instance Recovery
----------------------------
+Instance 'bionic-1' should now have been moved from 'virt-node-02.maas' to the
+reserved node, host 'virt-node-01.maas':

-Finally, to use the masakari feature which reacts to a single guest failing
-rather than a whole hypervisor, the guest(s) need to be marked with a small
-piece of metadata:
+.. code-block:: console

-.. code::
+   +----------------------+-------------------+
+   | Field                | Value             |
+   +----------------------+-------------------+
+   | OS-EXT-SRV-ATTR:host | virt-node-01.maas |
+   +----------------------+-------------------+
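+
+Evacuation takes some time, so a test script may need to wait for the host to
+change rather than check it only once. The following is a minimal polling
+sketch and is not part of any Masakari tooling. The function names, timeout,
+and polling interval are illustrative assumptions. It reuses the ``openstack
+server show`` command from above with the generic ``-f value`` output format,
+which requires admin rights to see the host field. The lookup is passed in as
+a function so that the polling logic can be exercised without a cloud.
+
+.. code-block:: python
+
+   import subprocess
+   import time
+
+
+   def current_host(server):
+       """Return the hypervisor currently hosting ``server`` (admin only)."""
+       return subprocess.run(
+           ["openstack", "server", "show", server,
+            "-f", "value", "-c", "OS-EXT-SRV-ATTR:host"],
+           check=True, capture_output=True, text=True,
+       ).stdout.strip()
+
+
+   def wait_for_new_host(get_host, old_host, timeout=600.0, interval=10.0,
+                         sleep=time.sleep, clock=time.monotonic):
+       """Poll ``get_host()`` until it returns a host other than ``old_host``.
+
+       Returns the new host name, or raises TimeoutError.
+       """
+       deadline = clock() + timeout
+       while True:
+           host = get_host()
+           if host and host != old_host:
+               return host
+           if clock() >= deadline:
+               raise TimeoutError(f"still on {old_host} after {timeout}s")
+           sleep(interval)
+
+In this example, ``wait_for_new_host(lambda: current_host('bionic-1'),
+'virt-node-02.maas')`` would be expected to return 'virt-node-01.maas'.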

-    $ openstack server set --property HA_Enabled=True server_120419134342
+'auto'
+^^^^^^
+
+The ``auto`` recovery method relocates instances to any available node in the
+same segment. Because all the nodes are active, unlike with the
+``reserved_host`` method, there is no guarantee that sufficient resources will
+exist on the destination node to accommodate migrated instances.
+
+For example, to create segment 'S2', configure it to use the ``auto`` method,
+and assign it three compute nodes:
+
+.. code-block:: none
+
+   openstack segment create S2 auto COMPUTE
+   openstack segment host create virt-node-01.maas COMPUTE SSH S2
+   openstack segment host create virt-node-02.maas COMPUTE SSH S2
+   openstack segment host create virt-node-10.maas COMPUTE SSH S2
+
+In contrast to the ``reserved_host`` method, all the nodes show as active (i.e.
+none are reserved):
+
+.. code-block:: console
+
+   +-------------------+----------+----------------+
+   | name              | reserved | on_maintenance |
+   +-------------------+----------+----------------+
+   | virt-node-10.maas | False    | False          |
+   | virt-node-02.maas | False    | False          |
+   | virt-node-01.maas | False    | False          |
+   +-------------------+----------+----------------+
+
+Since no node is reserved, there is no hypervisor for Masakari to enable in
+Nova upon node failure. The failed node (here, 'virt-node-01.maas') will,
+however, be put into maintenance mode (``on_maintenance``) in Masakari:
+
+.. code-block:: console
+
+   +-------------------+----------+----------------+
+   | name              | reserved | on_maintenance |
+   +-------------------+----------+----------------+
+   | virt-node-10.maas | False    | False          |
+   | virt-node-02.maas | False    | False          |
+   | virt-node-01.maas | False    | True           |
+   +-------------------+----------+----------------+
+
+'rh_priority' and 'auto_priority'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following recovery methods use one of the previously described methods and
+fall back to the other if the first is unsuccessful.
+
+* ``rh_priority``
+
+  Attempts to evacuate instances using the ``reserved_host`` method. If the
+  latter is unsuccessful the ``auto`` method will be used.
+
+* ``auto_priority``
+
+  Attempts to evacuate instances using the ``auto`` method. If the latter is
+  unsuccessful the ``reserved_host`` method will be used.
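+
+The following is a deliberately simplified model of how the four methods
+relate to one another. It is not Masakari's implementation. The
+``SegmentHost`` type and ``pick_destination`` function are hypothetical, and
+capacity is ignored. With the real ``auto`` method, Nova selects the
+destination host rather than Masakari picking it from a list. The priority
+methods are modelled as falling back when the first method finds no candidate
+host.
+
+.. code-block:: python
+
+   from dataclasses import dataclass
+   from typing import Callable, Dict, List, Optional
+
+
+   @dataclass
+   class SegmentHost:
+       """Hypothetical stand-in for a host in a failover segment."""
+       name: str
+       reserved: bool = False
+       on_maintenance: bool = False
+
+
+   def reserved_candidates(hosts: List[SegmentHost], failed: str) -> List[SegmentHost]:
+       """Reserved hosts that are neither the failed host nor in maintenance."""
+       return [h for h in hosts
+               if h.reserved and h.name != failed and not h.on_maintenance]
+
+
+   def auto_candidates(hosts: List[SegmentHost], failed: str) -> List[SegmentHost]:
+       """Active (non-reserved) hosts other than the failed host."""
+       return [h for h in hosts
+               if not h.reserved and h.name != failed and not h.on_maintenance]
+
+
+   Finder = Callable[[List[SegmentHost], str], List[SegmentHost]]
+
+   # Each method is an ordered list of strategies; later ones are fallbacks.
+   METHODS: Dict[str, List[Finder]] = {
+       "reserved_host": [reserved_candidates],
+       "auto": [auto_candidates],
+       "rh_priority": [reserved_candidates, auto_candidates],
+       "auto_priority": [auto_candidates, reserved_candidates],
+   }
+
+
+   def pick_destination(method: str, hosts: List[SegmentHost],
+                        failed: str) -> Optional[str]:
+       """Return the first candidate host name, or None if there is none."""
+       if method not in METHODS:
+           raise ValueError(f"unknown recovery method: {method}")
+       for find in METHODS[method]:
+           candidates = find(hosts, failed)
+           if candidates:
+               return candidates[0].name
+       return None
+
+Take the hosts of segment S1 above, with 'virt-node-01.maas' reserved, and
+assume that 'virt-node-02.maas' fails. Under this model, ``reserved_host`` and
+``rh_priority`` choose 'virt-node-01.maas', while ``auto`` and
+``auto_priority`` choose 'virt-node-10.maas'.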
+
+Instance restart
+~~~~~~~~~~~~~~~~
+
+The instance restart feature is enabled on a per-instance basis.
+
+For example, tag instance 'bionic-1' as HA-enabled in order to have it
+restarted automatically on its hypervisor:
+
+.. code-block:: none
+
+   openstack server set --property HA_Enabled=True bionic-1
+
+.. important::
+
+   Perhaps counterintuitively, even if the instance evacuation feature is not
+   desired, a hypervisor must still be assigned to a failover segment for the
+   restart feature to be available to its instances.
+
+An instance failure can be simulated by killing its process. First determine
+its hypervisor and ``qemu`` guest name:
+
+.. code-block:: none
+
+   openstack server show bionic-1 -c OS-EXT-SRV-ATTR:host -c OS-EXT-SRV-ATTR:instance_name
+
+Output:
+
+.. code-block:: console
+
+   +-------------------------------+-------------------+
+   | Field                         | Value             |
+   +-------------------------------+-------------------+
+   | OS-EXT-SRV-ATTR:host          | virt-node-02.maas |
+   | OS-EXT-SRV-ATTR:instance_name | instance-00000001 |
+   +-------------------------------+-------------------+
+
+If you do not have admin rights in the cloud, the above fields may not be
+visible.
+
+This hypervisor corresponds to unit ``nova-compute/2`` in this example cloud.
+
+Check the current PID, kill the process, wait a minute, and verify that a new
+process gets started:
+
+.. code-block:: none
+
+   juju run --unit nova-compute/2 'pgrep -f guest=instance-00000001'
+   juju run --unit nova-compute/2 'sudo pkill -f -9 guest=instance-00000001'
+   juju run --unit nova-compute/2 'pgrep -f guest=instance-00000001'
+
+Supplementary information
+-------------------------
+
+This section contains information that can be useful when working with
+Masakari.
+
+* Once a failed node has been re-inserted into the cloud it will show, in
+  Nova, as 'disabled' but 'up' and, in Masakari, as 'on_maintenance'. It can
+  become an active hypervisor with:
+
+  .. code-block:: none
+
+     openstack compute service set --enable <host-name> nova-compute
+     openstack segment host update --on_maintenance=False <segment-name> <host-name>
+
+* A segment's recovery method can be updated with:
+
+  .. code-block:: none
+
+     openstack segment update --recovery_method <method> --service_type COMPUTE <segment-name>
+
+* A node cannot be assigned to a segment while it is assigned to another
+  segment. It must first be removed from its current segment with:
+
+  .. code-block:: none
+
+     openstack segment host delete <segment-name> <host-name>
+
+* A node's reserved status can be updated with:
+
+  .. code-block:: none
+
+     openstack segment host update --reserved=<boolean> <segment-name> <host-name>
+
+.. LINKS
+.. _MAAS: https://maas.io
+.. _Masakari charm: http://jaas.ai/masakari
+.. _openstack-base: https://jaas.ai/openstack-base
+.. _OpenStack high availability: app-ha.html#ha-applications
+.. _Configure OpenStack: config-openstack.html