New disk replacement procedures
Change-Id: I703a61da792e59fdd19dfcaef5376b7f2f2ca975
Signed-off-by: Keane Lim <keane.lim@windriver.com>
@@ -123,7 +123,11 @@ Configure Ceph OSDs on a Host

   provision-storage-on-a-controller-or-storage-host-using-horizon
   provision-storage-on-a-storage-host-using-the-cli
   replace-osds-and-journal-disks
   replace-osds-on-a-standard-system-f3b1e376304c
   replace-osds-on-an-aio-dx-system-319b0bc2f7e6
   replace-osds-on-an-aio-sx-multi-disk-system-b4ddd1c1257c
   replace-osds-on-an-aio-sx-single-disk-system-without-backup-951eefebd1f2
   replace-osds-on-an-aio-sx-single-disk-system-with-backup-770c9324f372

-------------------------
Persistent Volume Support
@@ -17,9 +17,23 @@ the same peer group. Do not substitute a smaller disk than the original.

Due to a limitation in **udev**, the device path of a disk connected through
a SAS controller changes when the disk is replaced. Therefore, in the
general procedure below, you must lock, delete, and re-install the node.
However, for standard, |AIO-SX|, and |AIO-DX| systems, use the following
alternative procedures to replace |OSDs| without reinstalling the host:

- :ref:`Replace OSDs on a Standard System
  <replace-osds-on-a-standard-system-f3b1e376304c>`

- :ref:`Replace OSDs on an AIO-DX System
  <replace-osds-on-an-aio-dx-system-319b0bc2f7e6>`

- :ref:`Replace OSDs on an AIO-SX Multi-Disk System
  <replace-osds-on-an-aio-sx-multi-disk-system-b4ddd1c1257c>`

- :ref:`Replace OSDs on an AIO-SX Single Disk System without Backup
  <replace-osds-on-an-aio-sx-single-disk-system-without-backup-951eefebd1f2>`

- :ref:`Replace OSDs on an AIO-SX Single Disk System with Backup
  <replace-osds-on-an-aio-sx-single-disk-system-with-backup-770c9324f372>`

.. rubric:: |proc|
@@ -0,0 +1,176 @@

.. _replace-osds-on-a-standard-system-f3b1e376304c:

=================================
Replace OSDs on a Standard System
=================================

You can replace |OSDs| in a standard system to increase capacity or to replace
faulty disks, without reinstalling the host.

.. rubric:: |prereq|

For standard systems with controller storage, ensure that the controller with
the |OSD| to be replaced is the standby controller. For example, if the disk
replacement has to be done on controller-1 and controller-1 is the active
controller, swact to controller-0:

.. code-block:: none

   ~(keystone_admin)$ system host-swact controller-1

After the swact, reconnect via SSH to the <oam-floating-ip> to reach the newly
active controller-0.
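
For example, assuming the default ``sysadmin`` account:

.. code-block:: none

   $ ssh sysadmin@<oam-floating-ip>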

.. rubric:: |proc|

**Standard systems with controller storage**

#. If controller-1 has the |OSD| to be replaced, lock it.

   .. code-block:: none

      ~(keystone_admin)$ system host-lock controller-1

#. Destroy the |OSD| to be replaced with the :command:`ceph osd destroy` command.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd destroy osd.<id> --yes-i-really-mean-it

#. Power down controller-1.

#. Replace the storage disk.

#. Power on controller-1.

#. Unlock controller-1.

   .. code-block:: none

      ~(keystone_admin)$ system host-unlock controller-1

#. Wait for the recovery process in the Ceph cluster to start and finish.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     50ce952f-bd16-4864-9487-6c7e959be95e
          health: HEALTH_WARN
                  Degraded data redundancy: 13/50 objects degraded (26.000%), 10 pgs degraded

        services:
          mon: 1 daemons, quorum controller (age 68m)
          mgr: controller-0(active, since 66m)
          mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
          osd: 2 osds: 2 up (since 9s), 2 in (since 9s)

        data:
          pools:   3 pools, 192 pgs
          objects: 25 objects, 300 MiB
          usage:   655 MiB used, 15 GiB / 16 GiB avail
          pgs:     13/50 objects degraded (26.000%)
                   182 active+clean
                   8   active+recovery_wait+degraded
                   2   active+recovering+degraded

        io:
          recovery: 24 B/s, 1 keys/s, 1 objects/s

#. Ensure that the Ceph cluster is healthy.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     50ce952f-bd16-4864-9487-6c7e959be95e
          health: HEALTH_OK

        services:
          mon: 1 daemons, quorum controller (age 68m)
          mgr: controller-0(active, since 66m), standbys: controller-1
          mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
          osd: 2 osds: 2 up (since 36s), 2 in (since 36s)

        data:
          pools:   3 pools, 192 pgs
          objects: 25 objects, 300 MiB
          usage:   815 MiB used, 15 GiB / 16 GiB avail
          pgs:     192 active+clean

**Standard systems with dedicated storage nodes**

#. If storage-1 has the |OSD| to be replaced, lock it.

   .. code-block:: none

      ~(keystone_admin)$ system host-lock storage-1

#. Destroy the |OSD| to be replaced with the :command:`ceph osd destroy` command.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd destroy osd.<id> --yes-i-really-mean-it

#. Power down storage-1.

#. Replace the storage disk.

#. Power on storage-1.

#. Unlock storage-1.

   .. code-block:: none

      ~(keystone_admin)$ system host-unlock storage-1

#. Wait for the recovery process in the Ceph cluster to start and finish.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     50ce952f-bd16-4864-9487-6c7e959be95e
          health: HEALTH_WARN
                  Degraded data redundancy: 13/50 objects degraded (26.000%), 10 pgs degraded

        services:
          mon: 1 daemons, quorum controller (age 68m)
          mgr: controller-0(active, since 66m)
          mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
          osd: 2 osds: 2 up (since 9s), 2 in (since 9s)

        data:
          pools:   3 pools, 192 pgs
          objects: 25 objects, 300 MiB
          usage:   655 MiB used, 15 GiB / 16 GiB avail
          pgs:     13/50 objects degraded (26.000%)
                   182 active+clean
                   8   active+recovery_wait+degraded
                   2   active+recovering+degraded

        io:
          recovery: 24 B/s, 1 keys/s, 1 objects/s

#. Ensure that the Ceph cluster is healthy.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     50ce952f-bd16-4864-9487-6c7e959be95e
          health: HEALTH_OK

        services:
          mon: 1 daemons, quorum controller (age 68m)
          mgr: controller-0(active, since 66m), standbys: controller-1
          mds: kube-cephfs:1 {0=controller-0=up:active} 1 up:standby
          osd: 2 osds: 2 up (since 36s), 2 in (since 36s)

        data:
          pools:   3 pools, 192 pgs
          objects: 25 objects, 300 MiB
          usage:   815 MiB used, 15 GiB / 16 GiB avail
          pgs:     192 active+clean
@@ -0,0 +1,203 @@

.. _replace-osds-on-an-aio-sx-multi-disk-system-b4ddd1c1257c:

===========================================
Replace OSDs on an AIO-SX Multi-Disk System
===========================================

You can replace |OSDs| in an |AIO-SX| system to increase capacity or to replace
faulty disks, without reinstalling the host.

.. rubric:: |proc|

**Replication factor > 1**

#. Make sure there is more than one |OSD| installed, otherwise there could be
   data loss.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd tree

#. Verify that all Ceph pools are present.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd lspools

#. For each pool, make sure its size attribute is larger than 1, otherwise
   there could be data loss.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd pool get <pool-name> size

#. Disable pool size changes for the duration of the procedure. This must be
   run for all pools.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange true

#. Verify that the Ceph cluster is healthy.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     50ce952f-bd16-4864-9487-6c7e959be95e
          health: HEALTH_OK

#. Lock the controller.

   .. code-block:: none

      ~(keystone_admin)$ system host-lock controller-0

#. Power down the controller.

#. Replace the disk.

#. Power on the controller.

#. Unlock the controller.

   .. code-block:: none

      ~(keystone_admin)$ system host-unlock controller-0

#. Wait for the recovery process in the Ceph cluster to start and finish.
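
   For example, you can monitor the recovery with repeated :command:`ceph -s`
   calls (a minimal sketch; any refresh interval works):

   .. code-block:: none

      ~(keystone_admin)$ watch -n 10 ceph -s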

#. Ensure that the Ceph cluster is healthy.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     50ce952f-bd16-4864-9487-6c7e959be95e
          health: HEALTH_OK

#. Enable pool size changes again.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange false

**Replication factor 1 with space to backup**

#. Make sure there is more than one |OSD| installed, otherwise there could be
   data loss.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd tree

#. Verify that all Ceph pools are present.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd lspools

#. For each pool, make sure its size attribute is larger than 1, otherwise
   there could be data loss.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd pool get <pool-name> size

#. Disable pool size changes for the duration of the procedure. This must be
   run for all pools.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange true

#. Verify that the Ceph cluster is healthy.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     50ce952f-bd16-4864-9487-6c7e959be95e
          health: HEALTH_OK

#. Lock the controller.

   .. code-block:: none

      ~(keystone_admin)$ system host-lock controller-0

#. Power down the controller.

#. Replace the disk.

#. Power on the controller.

#. Unlock the controller.

   .. code-block:: none

      ~(keystone_admin)$ system host-unlock controller-0

#. Wait for the recovery process in the Ceph cluster to start and finish.
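
   Run :command:`ceph -s` periodically; while recovery is in progress the
   cluster reports ``HEALTH_WARN`` with degraded placement groups, as in the
   examples above.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s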

#. Ensure that the Ceph cluster is healthy.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     50ce952f-bd16-4864-9487-6c7e959be95e
          health: HEALTH_OK

#. Enable pool size changes again.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange false

#. Set the replication factor to 1 for all pools.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd pool set <pool-name> size 1

**Replication factor 1 without space to backup**

#. Lock the controller.

   .. code-block:: none

      ~(keystone_admin)$ system host-lock controller-0

#. Back up the file /etc/pmon.d/ceph.conf, then remove it.
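
   A minimal sketch, assuming the home directory is used as the backup
   location (any safe location outside /etc/pmon.d works):

   .. code-block:: none

      ~(keystone_admin)$ sudo cp /etc/pmon.d/ceph.conf ~/
      ~(keystone_admin)$ sudo rm /etc/pmon.d/ceph.conf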

#. Mark the |OSD| as out and down, stop it, and destroy it.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd out osd.<id>
      ~(keystone_admin)$ ceph osd down osd.<id>
      ~(keystone_admin)$ sudo /etc/init.d/ceph stop osd.<id>
      ~(keystone_admin)$ ceph osd destroy osd.<id> --yes-i-really-mean-it

#. Shut down the machine, replace the disk, power the machine back on, and
   wait for the boot to finish.

#. Unlock the controller.

   .. code-block:: none

      ~(keystone_admin)$ system host-unlock controller-0

#. Copy the backed-up ceph.conf back to /etc/pmon.d/.
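
   For example, if the backup was placed in the home directory:

   .. code-block:: none

      ~(keystone_admin)$ sudo cp ~/ceph.conf /etc/pmon.d/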

#. Verify that the Ceph cluster is healthy.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s
@@ -0,0 +1,236 @@

.. _replace-osds-on-an-aio-sx-single-disk-system-with-backup-770c9324f372:

========================================================
Replace OSDs on an AIO-SX Single Disk System with Backup
========================================================

When replacing |OSDs| on an |AIO-SX| system with replication factor 1, it is
possible to back up the data on a second |OSD| before replacing the disk.

.. rubric:: |prereq|

Verify that there is an available disk that can be used to create a new |OSD|
to back up the data from the existing |OSD|. Make sure this disk is at least
the same size as the disk to be replaced.

.. code-block:: none

   ~(keystone_admin)$ system host-disk-list controller-0

.. rubric:: |proc|

#. Add the new |OSD|, using the disk UUID of the available disk identified in
   the prerequisite step.

   .. code-block:: none

      ~(keystone_admin)$ system host-stor-add controller-0 <disk uuid>

#. Wait for the new |OSD| to be configured. Run :command:`ceph -s` to verify
   that the output shows two |OSDs| and that the cluster has finished
   recovery. Make sure the Ceph cluster is healthy (``HEALTH_OK``) before
   proceeding.

#. Change the replication factor of the pools to 2 and disable pool size
   changes.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd lspools   # lists all Ceph pools
      ~(keystone_admin)$ ceph osd pool set <pool-name> size 2
      ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange true

   This will make the cluster enter a recovery state:

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     38563514-4726-4664-9155-5efd5701de86
          health: HEALTH_WARN
                  Degraded data redundancy: 3/57 objects degraded (5.263%), 3 pgs degraded

        services:
          mon: 1 daemons, quorum controller-0 (age 28m)
          mgr: controller-0(active, since 27m)
          mds: kube-cephfs:1 {0=controller-0=up:active}
          osd: 2 osds: 2 up (since 6m), 2 in (since 6m)

        data:
          pools:   3 pools, 192 pgs
          objects: 32 objects, 1000 MiB
          usage:   1.2 GiB used, 16 GiB / 18 GiB avail
          pgs:     2.604% pgs not active
                   3/57 objects degraded (5.263%)
                   184 active+clean
                   5   activating
                   2   active+recovery_wait+degraded
                   1   active+recovering+degraded

        io:
          recovery: 323 B/s, 1 keys/s, 3 objects/s

#. Wait for the recovery to end and the Ceph cluster to become healthy.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     38563514-4726-4664-9155-5efd5701de86
          health: HEALTH_OK

        services:
          mon: 1 daemons, quorum controller-0 (age 28m)
          mgr: controller-0(active, since 28m)
          mds: kube-cephfs:1 {0=controller-0=up:active}
          osd: 2 osds: 2 up (since 7m), 2 in (since 7m)

        data:
          pools:   3 pools, 192 pgs
          objects: 32 objects, 1000 MiB
          usage:   2.2 GiB used, 15 GiB / 18 GiB avail
          pgs:     192 active+clean

#. Lock the controller.

   .. code-block:: none

      ~(keystone_admin)$ system host-lock controller-0

#. Mark the |OSD| to be replaced as out.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd out osd.<id>

#. Wait for the rebalance to finish.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

        cluster:
          id:     38563514-4726-4664-9155-5efd5701de86
          health: HEALTH_OK

        services:
          mon: 1 daemons, quorum controller-0 (age 37m)
          mgr: controller-0(active, since 36m)
          mds: kube-cephfs:1 {0=controller-0=up:active}
          osd: 2 osds: 2 up (since 15m), 1 in (since 2s)

        data:
          pools:   3 pools, 192 pgs
          objects: 32 objects, 1000 MiB
          usage:   808 MiB used, 8.0 GiB / 8.8 GiB avail
          pgs:     192 active+clean

        progress:
          Rebalancing after osd.0 marked out
            [..............................]

#. Move ceph.conf out of /etc/pmon.d and stop the |OSD|.

   .. code-block:: none

      ~(keystone_admin)$ sudo mv /etc/pmon.d/ceph.conf ~/
      ~(keystone_admin)$ sudo /etc/init.d/ceph stop osd.<id>

#. Obtain the stor UUID of the |OSD| and delete it from the platform.

   .. code-block:: none

      ~(keystone_admin)$ system host-stor-list controller-0   # list all stors
      ~(keystone_admin)$ system host-stor-delete <stor uuid>  # delete the stor

#. Purge the |OSD| from the Ceph cluster.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd purge osd.<id> --yes-i-really-mean-it

#. Remove the |OSD| entry from /etc/ceph/ceph.conf.
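
   For example, open the file in an editor and delete the entry for the |OSD|
   that was just purged (``vi`` is only an example editor):

   .. code-block:: none

      ~(keystone_admin)$ sudo vi /etc/ceph/ceph.conf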

#. Unmount the |OSD| directory and remove any remaining files.

   .. code-block:: none

      ~(keystone_admin)$ sudo umount /var/lib/ceph/osd/ceph-<id>
      ~(keystone_admin)$ sudo rm -rf /var/lib/ceph/osd/ceph-<id>/

#. Allow pool size changes again.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd pool set <pool-name> nosizechange false

#. Unlock the controller.

   .. code-block:: none

      ~(keystone_admin)$ system host-unlock controller-0

#. Verify that the Ceph cluster is healthy.

   .. code-block:: none

      ~(keystone_admin)$ ceph -s

   If you see a ``HEALTH_ERR`` message like the following:

   .. code-block:: none

      controller-0:~$ ceph -s
        cluster:
          id:     38563514-4726-4664-9155-5efd5701de86
          health: HEALTH_ERR
                  1 filesystem is degraded
                  1 filesystem has a failed mds daemon
                  1 filesystem is offline
                  no active mgr

        services:
          mon: 1 daemons, quorum controller-0 (age 38s)
          mgr: no daemons active (since 3s)
          mds: kube-cephfs:0/1, 1 failed
          osd: 1 osds: 1 up (since 14m), 1 in (since 15m)

        data:
          pools:   3 pools, 192 pgs
          objects: 32 objects, 1000 MiB
          usage:   1.1 GiB used, 7.7 GiB / 8.8 GiB avail
          pgs:     192 active+clean

   wait a few minutes until the Ceph cluster shows ``HEALTH_OK``:

   .. code-block:: none

      controller-0:~$ ceph -s
        cluster:
          id:     38563514-4726-4664-9155-5efd5701de86
          health: HEALTH_OK

        services:
          mon: 1 daemons, quorum controller-0 (age 2m)
          mgr: controller-0(active, since 96s)
          mds: kube-cephfs:1 {0=controller-0=up:active}
          osd: 1 osds: 1 up (since 46s), 1 in (since 17m)

        task status:

        data:
          pools:   3 pools, 192 pgs
          objects: 32 objects, 1000 MiB
          usage:   1.1 GiB used, 7.7 GiB / 8.8 GiB avail
          pgs:     192 active+clean

#. Verify that the |OSD| tree displays the new |OSD| and not the previous one.

   .. code-block:: none

      controller-0:~$ ceph osd tree
      ID CLASS WEIGHT  TYPE NAME                  STATUS REWEIGHT PRI-AFF
      -1       0.00850 root storage-tier
      -2       0.00850     chassis group-0
      -3       0.00850         host controller-0
       1   hdd 0.00850             osd.1              up  1.00000 1.00000
@@ -0,0 +1,71 @@

.. _replace-osds-on-an-aio-sx-single-disk-system-without-backup-951eefebd1f2:

============================================================
Replace OSDs on an AIO-SX Single Disk System without Backup
============================================================

.. rubric:: |proc|

#. Get a list of all pools and their settings (size, min_size, pg_num,
   pgp_num).

   .. code-block:: none

      ~(keystone_admin)$ ceph osd lspools                     # list all pools
      ~(keystone_admin)$ ceph osd pool get $POOLNAME $SETTING

   Keep the pool names and settings, as they will be used in step 12.

#. Lock the controller.

   .. code-block:: none

      ~(keystone_admin)$ system host-lock controller-0

#. Remove all applications that use Ceph pools.

   .. code-block:: none

      ~(keystone_admin)$ system application-list                      # list the applications
      ~(keystone_admin)$ system application-remove $APPLICATION_NAME  # remove an application

   Keep the names of the removed applications, as they will be used in step 11.

#. Make a backup of /etc/pmon.d/ceph.conf to a safe location and remove the
   ceph.conf file from the /etc/pmon.d folder.
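
   A minimal sketch, assuming the home directory is used as the backup
   location:

   .. code-block:: none

      ~(keystone_admin)$ sudo mv /etc/pmon.d/ceph.conf ~/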

#. Stop ``ceph-mds``.

   .. code-block:: none

      ~(keystone_admin)$ sudo /etc/init.d/ceph stop mds

#. Mark the Ceph filesystem as failed and delete it.

   .. code-block:: none

      ~(keystone_admin)$ ceph mds fail 0
      ~(keystone_admin)$ ceph fs rm <cephfs name> --yes-i-really-mean-it

#. Allow Ceph pools to be deleted.

   .. code-block:: none

      ~(keystone_admin)$ ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'

#. Remove all the pools.

   .. code-block:: none

      ~(keystone_admin)$ ceph osd pool ls | xargs -i ceph osd pool delete {} {} --yes-i-really-really-mean-it

#. Shut down the machine, replace the disk, power the machine back on, and
   wait for the boot to finish.

#. Move the ceph.conf backed up in step 4 back to /etc/pmon.d and unlock the
   controller.
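
   A minimal sketch, assuming the backup was placed in the home directory in
   step 4:

   .. code-block:: none

      ~(keystone_admin)$ sudo mv ~/ceph.conf /etc/pmon.d/
      ~(keystone_admin)$ system host-unlock controller-0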

#. Add back the applications that were removed in step 3.
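
   For example, assuming each application only needs to be re-applied (it
   remains uploaded after :command:`system application-remove`):

   .. code-block:: none

      ~(keystone_admin)$ system application-apply $APPLICATION_NAME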

#. Verify that all the pools and settings listed in step 1 have been
   recreated.
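
   The same commands used in step 1 can be used to check:

   .. code-block:: none

      ~(keystone_admin)$ ceph osd lspools
      ~(keystone_admin)$ ceph osd pool get $POOLNAME $SETTING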