tripleo-docs/deploy-guide/source/post_deployment/quiesce_cephstorage.rst
Alex Schultz c6918e5da6 Migrate install to deploy-guide
The deployment guide is currently pointed at tripleo-docs but it has been
requested that we actually publish a deployment guide. This change
extracts many of the installation doc pages and moves them into the
deploy-guide source tree.  Once the deploy-guide is published, we will
follow up to reference the deployment guide from tripleo-docs.

Change-Id: I0ebd26f014180a92c6cf4ab0929d99b2d860796f
2019-08-16 15:42:17 -06:00

Quiescing a CephStorage Node

Quiescing a cephstorage node means informing the Ceph cluster that one or more OSDs will be permanently removed, so that the node can be shut down without affecting data availability.

Take the OSDs out of the cluster

Before you remove an OSD, you need to take it out of the cluster so that Ceph can begin rebalancing and copying its data to other OSDs. Running the following commands on a given cephstorage node sets the CRUSH weight of every OSD hosted on it to zero, which causes Ceph to move all data off those OSDs:

OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
for OSD_ID in $OSD_IDS; do ceph osd crush reweight osd.$OSD_ID 0.0; done
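
The one-liner above relies on the OSD data directories being named `<cluster>-<id>` (for example `ceph-0`). As a minimal sketch of that ID extraction, the helper below does the same job with shell parameter expansion. The name `list_osd_ids` is hypothetical and is not part of Ceph or TripleO. It takes the directory as an argument so that it can be tried against a scratch directory before it is pointed at a real node.

```shell
# Hypothetical helper: print the OSD IDs found under a directory whose
# entries are named <cluster>-<id>, e.g. ceph-0 or ceph-12.
# Assumption: the default location is /var/lib/ceph/osd, as in the text above.
# Unlike the awk one-liner, this keeps what follows the LAST "-", so a
# cluster name that itself contains a dash still yields the numeric ID.
list_osd_ids() {
    osd_dir="${1:-/var/lib/ceph/osd}"
    for path in "$osd_dir"/*-*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$path" ] || continue
        name="${path##*/}"            # strip the leading directory
        printf '%s\n' "${name##*-}"   # keep the part after the last "-"
    done
}
```

With this in place, the loop could be written as `for OSD_ID in $(list_osd_ids); do ...; done`.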

Ceph will begin rebalancing the cluster by migrating placement groups out of the OSDs. You can observe this process with the ceph tool:

ceph -w

You should see the placement group states change from active+clean to active with some degraded objects, and finally back to active+clean when migration completes.
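
Instead of watching `ceph -w` by hand, the wait can be scripted. The following is a rough sketch only: it polls `ceph health` until the cluster reports `HEALTH_OK`. It assumes the cluster has no unrelated warnings, since otherwise `HEALTH_OK` is never reached even after migration finishes. The function name `wait_for_health_ok` and the default attempt count and interval are illustrative choices, not values from the TripleO documentation.

```shell
# Hypothetical helper: poll "ceph health" until it reports HEALTH_OK.
# Usage: wait_for_health_ok [max_attempts] [sleep_seconds]
# Returns 0 once the cluster is healthy, 1 if the attempts run out.
wait_for_health_ok() {
    max_attempts="${1:-360}"
    interval="${2:-10}"
    attempt=0
    status=""
    while [ "$attempt" -lt "$max_attempts" ]; do
        status="$(ceph health 2>/dev/null)"
        case "$status" in
            HEALTH_OK*) return 0 ;;
        esac
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    echo "cluster did not reach HEALTH_OK: ${status}" >&2
    return 1
}
```

For example, `wait_for_health_ok 720 5` would check every five seconds for up to an hour before giving up.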

Removing the OSDs

After rebalancing completes, the OSDs are still running. Running the following on the same cephstorage node marks every OSD hosted on it as out, stops it, removes it from the CRUSH map, deletes its authentication key, and removes it from the OSD map:

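# Collect the numeric IDs from directory names such as ceph-0, ceph-1
OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
for OSD_ID in $OSD_IDS; do
  # Mark the OSD out so that no data is mapped to it
  ceph osd out $OSD_ID
  # Stop the OSD daemon on this node
  systemctl stop ceph-osd@$OSD_ID
  # Remove the OSD from the CRUSH map
  ceph osd crush remove osd.$OSD_ID
  # Delete the OSD's authentication key
  ceph auth del osd.$OSD_ID
  # Remove the OSD from the OSD map
  ceph osd rm $OSD_ID
done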
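
Two optional checks can be wrapped around the removal loop above, sketched here under stated assumptions. The pre-check assumes a Ceph release that provides `ceph osd safe-to-destroy` (Luminous or later). The post-check assumes that `ceph osd ls` prints one OSD ID per line. Both function names are hypothetical.

```shell
# Hypothetical pre-check: succeed only if Ceph reports that the given OSD
# can be destroyed without reducing data durability.
# Assumption: the cluster's release has "ceph osd safe-to-destroy".
osd_safe_to_destroy() {
    ceph osd safe-to-destroy "osd.$1" >/dev/null 2>&1
}

# Hypothetical post-check: succeed if the OSD ID no longer appears in the
# OSD map, as listed by "ceph osd ls".
osd_is_removed() {
    for id in $(ceph osd ls); do
        [ "$id" = "$1" ] && return 1
    done
    return 0
}
```

One way to use these is to call `osd_safe_to_destroy "$OSD_ID"` at the top of the loop body and skip the OSD if the call fails, then call `osd_is_removed "$OSD_ID"` once the loop has finished.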

You are now free to reboot or shut down the node (using the Ironic API), or even remove it from the overcloud altogether by scaling down the overcloud deployment (see delete_nodes).