:orphan:

======================================================
Replace a hyperconverged Ceph storage and compute node
======================================================

.. important::

   This page has been identified as being affected by the breaking changes
   introduced between versions 2.9.x and 3.x of the Juju client. Read support
   note :ref:`juju_29_3x_changes` before continuing.

Introduction
------------

A common topology for Charmed OpenStack is the co-location of the
nova-compute and the ceph-osd applications. This article covers the removal
and redeployment of such a data plane cloud node.

.. important::

   For the target cloud node, only nova-compute, ceph-osd, and their
   subordinate applications are assumed to be deployed.

.. warning::

   Migration involves disabling compute services on the source host,
   effectively removing the hypervisor from the cloud.

Procedure
---------

First ensure that the cloud is in a healthy state, the Compute services and
Ceph cluster in particular.

Identify cloud node specifics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Identify the unit and hypervisor name of the compute node:

.. code-block:: none

   juju status nova-compute
   openstack hypervisor list

Identify the unit of the storage node and the IDs of the associated OSDs:

.. code-block:: none

   juju status ceph-osd
   juju exec -a ceph-osd mount | grep ceph
   juju ssh ceph-mon/leader sudo ceph osd tree

In this example,

* the existing compute node:

  * is hosted on unit ``nova-compute/0``
  * has a hypervisor name of node2.maas

* the existing storage node:

  * is hosted on unit ``ceph-osd/2``
  * has two OSDs present and their IDs are 0 and 1

* the new storage node:

  * will be hosted on unit ``ceph-osd/10``
  * will have storage disks ``/dev/nvme0n1`` and ``/dev/nvme0n2``

The ID of the existing Juju machine is common to both applications and is
assumed to be 14. The ID of the replacement Juju machine is assumed to be 21.

.. warning::

   You must replace the values in this example with the values of your
   actual environment.

Disable nova-compute services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Disable nova-compute services on the node:

.. code-block:: none

   juju run nova-compute/0 disable

Respawn any Octavia VMs
~~~~~~~~~~~~~~~~~~~~~~~

Skip this section if Octavia is not deployed in the cloud.

Any Octavia load balancer VMs (amphorae) hosted on the node need to be
identified and respawned.

.. note::

   Migrating the amphorae like any other VMs (see next section) may work,
   but the Octavia project recommends respawning (failing over) its VMs.
   This is because migration may take longer than expected, which may in
   turn cause Octavia to see its VMs as lost/stale. See `Evacuating a
   Specific Amphora from a Host`_ in the upstream documentation.

List the amphorae hosted on the node:

.. code-block:: none

   openstack server list --host node2.maas --all-projects | grep amphora

The amphora ID is appended to the VM name.

For each VM,

#. gather the load balancer ID:

   .. code-block:: none

      openstack loadbalancer amphora show <amphora-id>

#. respawn the Octavia VM and monitor its progress:

   .. code-block:: none

      openstack loadbalancer failover <load-balancer-id>
      watch 'openstack loadbalancer amphora list | grep <load-balancer-id>'

   The original VM will be removed from the compute node.

Live migrate the compute node VMs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Evacuate the compute node's VMs by live migration:

.. code-block:: none

   nova host-evacuate-live node2.maas

See cloud operation :doc:`Live migrate VMs from a running compute node ` for
in-depth coverage of live migration.

Ensure that all VMs have been evacuated:

.. code-block:: none

   juju ssh nova-compute/0 sudo virsh list --all
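As an optional cross-check (a minimal sketch reusing the example hypervisor
name node2.maas from above; substitute your own), confirm from the API side
that no VMs remain on the source host:

.. code-block:: none

   # Should return an empty list once evacuation has completed
   openstack server list --host node2.maas --all-projects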
Unregister objects from the cloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Unregister the compute node
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Unregister the compute node from the cloud:

.. code-block:: none

   juju run nova-compute/0 remove-from-cloud

See cloud operation :ref:`Scale back the nova-compute application ` for more
details on this step.

Unregister the neutron agents
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Unregister the associated neutron agent from the cloud. The agent's ID
should be the compute node's name. Verify this by first listing the agents:

.. code-block:: none

   openstack network agent list
   openstack network agent delete node2.maas

Remove OSD storage devices
~~~~~~~~~~~~~~~~~~~~~~~~~~

Remove the storage node's OSDs from the Ceph cluster:

.. code-block:: none

   juju run ceph-osd/2 remove-disk osd-ids=osd.0 purge=true
   juju run ceph-osd/2 remove-disk osd-ids=osd.1 purge=true

.. note::

   The Ceph operation `Removing OSDs`_ has more details on the
   ``remove-disk`` action.

Remove and add a Juju machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Remove the affected Juju machine from the model:

.. code-block:: none

   juju remove-machine 14

Add a Juju machine:

.. code-block:: none

   juju add-machine

The machine's hardware requirements can be stated via the ``--constraints``
option. This option can also be used to select a particular MAAS node by
specifying a MAAS tag. The chosen machine should have the storage devices
necessary to compensate for the Ceph OSDs that were removed.

Add Ceph storage and compute services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Add Ceph storage and compute services to the new Juju machine:

.. code-block:: none

   juju add-unit nova-compute --to 21
   juju add-unit ceph-osd --to 21

Integrate the new Ceph disks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The current value of the ceph-osd charm option ``osd-devices`` may already
match the two storage devices belonging to the new cloud node. In that case,
there is nothing else to do; the disks will be integrated into the cluster
automatically.

First list all the disks on the new storage node:

.. code-block:: none

   juju run ceph-osd/10 list-disks

Then query the charm option:

.. code-block:: none

   juju config ceph-osd osd-devices

If the new disks are not represented by the option's value, you can either
change the value (which applies to the entire cluster) or use the
``add-disk`` action against the new ceph-osd unit. Here, we'll use the
action with our previously assumed values:

.. code-block:: none

   juju run ceph-osd/10 add-disk \
      osd-devices='/dev/nvme0n1 /dev/nvme0n2'

Inspect Ceph cluster changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is recommended to get a summary of the Ceph cluster using the commands
used previously. In particular, the ceph-osd unit number will have changed:

.. code-block:: none

   juju status ceph-osd
   juju exec -a ceph-osd mount | grep ceph
   juju ssh ceph-mon/leader sudo ceph osd tree

Customise the local environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Perform any customisations required by the local environment. These may
include:

#. Adding the new compute node to a Nova aggregate or availability zone
#. Setting CRUSH device classes for the new Ceph OSDs

Verify the new cloud node
~~~~~~~~~~~~~~~~~~~~~~~~~

The hyperconverged Ceph storage and compute node has now been replaced.

Verify that the new compute node is functional. See the verification step in
cloud operation `Scale out the nova-compute application ` for guidance.

Verify that the Ceph cluster is healthy:

.. code-block:: none

   juju ssh ceph-mon/leader sudo ceph status
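As an additional, optional check (a minimal sketch based on the commands
shown earlier; output will vary per environment), confirm that the
replacement node is registered as a hypervisor and that its compute service
is up and enabled:

.. code-block:: none

   # The new node should appear in the hypervisor list and its
   # nova-compute service should be reported as enabled and up
   openstack hypervisor list
   openstack compute service list --service nova-compute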
.. LINKS
.. _nova-compute charm: https://charmhub.io/nova-compute
.. _Evacuating a Specific Amphora from a Host: https://docs.openstack.org/octavia/latest/admin/guides/operator-maintenance.html#evacuating-a-specific-amphora-from-a-host
.. _Removing OSDs: https://ubuntu.com/ceph/docs/removing-osds