Replace a hyperconverged Ceph storage and compute node
Important
This page has been identified as being affected by the breaking
changes introduced between versions 2.9.x and 3.x of the Juju client.
Read support note juju_29_3x_changes before continuing.
Introduction
A common topology for Charmed OpenStack is the co-location of the nova-compute and the ceph-osd applications. This article covers the removal and redeployment of such a data plane cloud node.
Important
For the target cloud node, only nova-compute, ceph-osd, and their subordinate applications are assumed to be deployed.
Warning
Migration involves disabling compute services on the source host, effectively removing the hypervisor from the cloud.
Procedure
First ensure that the cloud is in a healthy state, the Compute services and Ceph cluster in particular.
Identify cloud node specifics
Identify the unit and hypervisor name of the compute node:
juju status nova-compute
openstack hypervisor list
Identify the unit of the storage node and the IDs of the associated OSDs:
juju status ceph-osd
juju exec -a ceph-osd mount | grep ceph
juju ssh ceph-mon/leader sudo ceph osd tree
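If it is not obvious which OSD IDs reside on the target unit, a supplementary check is to list the local OSD volumes from the unit itself with standard Ceph tooling (ceph-volume):
juju ssh ceph-osd/2 sudo ceph-volume lvm list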
In this example,
- the existing compute node:
  - is hosted on unit nova-compute/0
  - has a hypervisor name of node2.maas
- the existing storage node:
  - is hosted on unit ceph-osd/2
  - has two OSDs present and their IDs are 0 and 1
- the new storage node:
  - will be hosted on unit ceph-osd/10
  - will have storage disks /dev/nvme0n1 and /dev/nvme0n2
The ID of the existing Juju machine is common to both applications and is assumed to be 14. The ID of the replacement Juju machine is assumed to be 21.
Warning
You must replace the values in this example with the values of your actual environment.
Disable nova-compute services
Disable nova-compute services on the node:
juju run nova-compute/0 disable
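To confirm that the node's compute service is now disabled, the Compute service list can be filtered by the example hostname:
openstack compute service list --host node2.maas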
Respawn any Octavia VMs
Skip this section if Octavia is not deployed in the cloud.
Any possible Octavia load balancer VMs (amphorae) need to be identified and respawned.
Note
Migrating the amphorae like any other VMs (see next section) may work but the Octavia project recommends respawning (failing over) its VMs. This is because migration may take longer than expected, which may in turn cause Octavia to see its VMs as lost/stale. See Evacuating a Specific Amphora from a Host in the upstream documentation.
List the amphorae hosted on the node:
openstack server list --host node2.maas --all-projects | grep amphora
The Amphora ID is appended to the VM name.
For each VM,
- gather the load balancer ID:
openstack loadbalancer amphora show <Amphora ID>
- respawn the Octavia VM and monitor its progress:
openstack loadbalancer failover <LB ID>
watch 'openstack loadbalancer amphora list | grep <LB ID>'
The original VM will be removed from the compute node.
Live migrate the compute node VMs
Evacuate the compute node's VMs by live migration:
nova host-evacuate-live node2.maas
See cloud operation Live migrate VMs from a running compute node for in-depth coverage of live migration.
Ensure that all VMs have been evacuated:
juju ssh nova-compute/0 sudo virsh list --all
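As an additional check, the Compute API can be queried for servers remaining on the example host; the list should come back empty:
openstack server list --host node2.maas --all-projects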
Unregister objects from the cloud
Unregister the compute node
Unregister the compute node from the cloud:
juju run nova-compute/0 remove-from-cloud
See cloud operation Scale back the nova-compute application for more details on this step.
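One way to verify this step is to list the hypervisors again; in this example, node2.maas should no longer appear:
openstack hypervisor list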
Unregister the neutron agents
Unregister the associated neutron agent from the cloud. The agent's ID should be the compute node's name. Verify this by first listing the agents:
openstack network agent list
openstack network agent delete node2.maas
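To confirm the removal, the agent list can be filtered on the example hostname; no entries should remain:
openstack network agent list | grep node2.maas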
Remove OSD storage devices
Remove the OSDs identified earlier (IDs 0 and 1 in this example) from the Ceph cluster, purging them in the process:
juju run ceph-osd/2 remove-disk osd-ids=osd.0 purge=true
juju run ceph-osd/2 remove-disk osd-ids=osd.1 purge=true
Note
The Ceph operation Removing OSDs has
more details on the remove-disk action.
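To confirm that the OSDs have been removed, inspect the OSD tree again; OSDs 0 and 1 from this example should no longer be listed under the node:
juju ssh ceph-mon/leader sudo ceph osd tree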
Remove and add a Juju machine
Remove the affected Juju machine from the model:
juju remove-machine 14
Add a new Juju machine:
juju add-machine
The machine's hardware requirements can be stated via the
--constraints option. This option can also be used to
select a particular MAAS node by specifying a MAAS tag. The chosen
machine should have the storage devices necessary to compensate for the
Ceph OSDs that were removed.
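For example, assuming a MAAS tag named hyperconverged has been applied to the desired node (the tag name and memory value here are purely illustrative):
juju add-machine --constraints "tags=hyperconverged mem=64G"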
Add Ceph storage and compute services
Add Ceph storage and compute services to the new Juju machine:
juju add-unit nova-compute --to 21
juju add-unit ceph-osd --to 21
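Assuming Juju 3.x, where the status command supports a --watch interval, the deployment can be monitored until both new units become active (the 5s interval is arbitrary):
juju status nova-compute ceph-osd --watch 5s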
Integrate the new Ceph disks
The current value of the ceph-osd charm option
osd-devices may match the two storage devices belonging to
the new cloud node. In such a case, there is nothing else to do; the
disks will be integrated into the cluster automatically.
First list all the disks on the new storage node:
juju run ceph-osd/10 list-disks
Then query the charm option:
juju config ceph-osd osd-devices
If the new disks are not represented by the option's value you can either change the value (which applies to the entire cluster) or use the add-disk action against the new ceph-osd unit. Here, we'll use the action with our previously assumed values:
juju run ceph-osd/10 add-disk \
osd-devices='/dev/nvme0n1 /dev/nvme0n2'
Inspect Ceph cluster changes
It is recommended to get a summary of the Ceph cluster with the same commands used earlier. In particular, the ceph-osd unit number will have changed:
juju status ceph-osd
juju exec -a ceph-osd mount | grep ceph
juju ssh ceph-mon/leader sudo ceph osd tree
Customise the local environment
Perform any customisations that may be required by the local environment (illustrative commands are given after this list). This may include:
- Adding the new compute node to a Nova aggregate or availability zone
- Setting CRUSH device classes for the new Ceph OSDs
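As an illustrative sketch only (the aggregate name, hypervisor name, and OSD IDs are placeholders, and the nvme device class simply matches this example's NVMe disks), such customisations might look like the following. Note that Ceph auto-assigns a device class to a new OSD, so any existing class must be removed before a different one can be set:
openstack aggregate add host <aggregate> <new hypervisor name>
juju ssh ceph-mon/leader sudo ceph osd crush rm-device-class <OSD ID>
juju ssh ceph-mon/leader sudo ceph osd crush set-device-class nvme <OSD ID>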
Verify the new cloud node
The hyperconverged Ceph storage and compute node has now been replaced.
Verify that the new compute node is functional. See the verification step in cloud operation Scale out the nova-compute application for guidance.
Verify that the Ceph cluster is healthy:
juju ssh ceph-mon/leader sudo ceph status