diff --git a/doc/source/storage/kubernetes/index.rst b/doc/source/storage/kubernetes/index.rst
index e3bde1865..0d39a0b16 100644
--- a/doc/source/storage/kubernetes/index.rst
+++ b/doc/source/storage/kubernetes/index.rst
@@ -119,9 +119,10 @@ Configure Ceph OSDs on a Host
    add-ssd-backed-journals-using-horizon
    add-ssd-backed-journals-using-the-cli
    add-a-storage-tier-using-the-cli
-   replace-osds-and-journal-disks
    provision-storage-on-a-controller-or-storage-host-using-horizon
    provision-storage-on-a-storage-host-using-the-cli
+   replace-osds-and-journal-disks
+   replace-osds-on-an-aio-dx-system-319b0bc2f7e6
 
 -------------------------
 Persistent Volume Support
diff --git a/doc/source/storage/kubernetes/replace-osds-and-journal-disks.rst b/doc/source/storage/kubernetes/replace-osds-and-journal-disks.rst
index 5868deeeb..c6f6d7802 100644
--- a/doc/source/storage/kubernetes/replace-osds-and-journal-disks.rst
+++ b/doc/source/storage/kubernetes/replace-osds-and-journal-disks.rst
@@ -13,8 +13,19 @@
 You can replace failed storage devices on storage nodes. For best results,
 ensure the replacement disk is the same size as others in the same peer
 group. Do not substitute a smaller disk than the original.
 
-The replacement disk is automatically formatted and updated with data when the
-storage host is unlocked. For more information, see |node-doc|: :ref:`Change
-Hardware Components for a Storage Host
-`.
+.. note::
+   Due to a limitation in **udev**, the device path of a disk connected through
+   a SAS controller changes when the disk is replaced. Therefore, in the
+   general procedure below, you must lock, delete, and re-install the node.
+   However, for an |AIO-DX| system, use the following alternative procedure to
+   replace |OSDs| without reinstalling the host:
+   :ref:`Replace OSDs on an AIO-DX System <replace-osds-on-an-aio-dx-system-319b0bc2f7e6>`.
+
+.. rubric:: |proc|
+
+Follow the procedure located at |node-doc|: :ref:`Change
+Hardware Components for a Storage Host `.
+
+The replacement disk is automatically formatted and updated with data when the
+storage host is unlocked.
diff --git a/doc/source/storage/kubernetes/replace-osds-on-an-aio-dx-system-319b0bc2f7e6.rst b/doc/source/storage/kubernetes/replace-osds-on-an-aio-dx-system-319b0bc2f7e6.rst
new file mode 100644
index 000000000..285373221
--- /dev/null
+++ b/doc/source/storage/kubernetes/replace-osds-on-an-aio-dx-system-319b0bc2f7e6.rst
@@ -0,0 +1,132 @@
+.. _replace-osds-on-an-aio-dx-system-319b0bc2f7e6:
+
+================================
+Replace OSDs on an AIO-DX System
+================================
+
+On systems that use a Ceph backend for persistent storage, you can replace
+or swap storage disks on an |AIO-DX| node while the system is running, even
+if the storage resources are in active use.
+
+.. note::
+   All storage alarms must be cleared before starting this procedure.
+
+.. rubric:: |context|
+
+You can replace |OSDs| in an |AIO-DX| system to increase capacity, or to
+replace faulty disks on the host, without reinstalling the host.
+
+.. rubric:: |proc|
+
+#. Ensure that the controller with the |OSD| to be replaced is the standby
+   controller.
+
+   For example, if the disk replacement has to be done on controller-1 and
+   it is the active controller, use the following commands to check its
+   role and then swact it so that controller-0 becomes active:
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ system host-show controller-1 | fgrep capabilities
+      ~(keystone_admin)$ system host-swact controller-1
+
+   After the swact completes, you will have to reconnect via SSH to log in
+   to the newly active controller-0.
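+
+   Optionally, confirm the swact by re-running the query from controller-0;
+   the capabilities field should now report controller-1 as the standby
+   controller:
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ system host-show controller-1 | fgrep capabilities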
+
+#. Determine the **osdid** of the disk that is to be replaced.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ system host-stor-list controller-1
+
+#. Lock the standby controller-1 to prepare it for the disk replacement.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ system host-lock controller-1
+
+#. Run the :command:`ceph osd destroy osd.<osdid> --yes-i-really-mean-it`
+   command, where ``<osdid>`` is the ID determined in the earlier step.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ ceph osd destroy osd.<osdid> --yes-i-really-mean-it
+
+#. Power down controller-1.
+
+#. Replace the storage disk.
+
+#. Power on controller-1.
+
+#. Unlock controller-1.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ system host-unlock controller-1
+
+#. Wait for the recovery process in the Ceph cluster to complete.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ ceph -s
+
+        cluster:
+          id:     50ce952f-bd16-4864-9487-6c7e959be95e
+          health: HEALTH_WARN
+                  Degraded data redundancy: 13/50 objects degraded (26.000%), 10 pgs degraded
+
+        services:
+          mon: 1 daemons, quorum controller (age 68m)
+          mgr: controller-0(active, since 66m)
+          mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
+          osd: 2 osds: 2 up (since 9s), 2 in (since 9s)
+
+        data:
+          pools:   3 pools, 192 pgs
+          objects: 25 objects, 300 MiB
+          usage:   655 MiB used, 15 GiB / 16 GiB avail
+          pgs:     13/50 objects degraded (26.000%)
+                   182 active+clean
+                   8 active+recovery_wait+degraded
+                   2 active+recovering+degraded
+
+        io:
+          recovery: 24 B/s, 1 keys/s, 1 objects/s
+
+#. Ensure that the Ceph cluster is healthy.
+
+   .. code-block:: none
+
+      ~(keystone_admin)$ ceph -s
+
+        cluster:
+          id:     50ce952f-bd16-4864-9487-6c7e959be95e
+          health: HEALTH_OK
+
+        services:
+          mon: 1 daemons, quorum controller (age 68m)
+          mgr: controller-0(active, since 66m), standbys: controller-1
+          mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
+          osd: 2 osds: 2 up (since 36s), 2 in (since 36s)
+
+        data:
+          pools:   3 pools, 192 pgs
+          objects: 25 objects, 300 MiB
+          usage:   815 MiB used, 15 GiB / 16 GiB avail
+          pgs:     192 active+clean
+
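+As an optional final check, confirm that both |OSDs| are reported as ``up``;
+:command:`ceph osd tree` is a standard Ceph command and is shown here only
+as a suggested verification:
+
+.. code-block:: none
+
+   ~(keystone_admin)$ ceph osd tree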