Merge "Separated Ceph Content into separate section"

This commit is contained in:
Zuul 2020-06-15 16:33:41 +00:00 committed by Gerrit Code Review
commit 742bc5b3c1
2 changed files with 90 additions and 59 deletions

View File

@ -0,0 +1,83 @@
..
Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
--------------------
Troubleshooting Ceph
--------------------
.. contents:: Table of Contents
:depth: 3
Initial Troubleshooting
-----------------------
Many stateful services in Airship rely on Ceph to function correctly.
For more information on Ceph debugging, follow the official
`Ceph debugging guide <http://docs.ceph.com/docs/mimic/rados/troubleshooting/log-and-debug/>`__.
Although Ceph tolerates failures of multiple OSDs, it is important
to make sure that your Ceph cluster is healthy.
::
# Many commands require the name of the Ceph Monitor pod. Use the following
# shell command to assign the pod name to an environment variable for ease
# of use.
CEPH_MON=$(sudo kubectl get --no-headers pods -n=ceph \
-l="application=ceph,component=mon" | awk '{ print $1; exit }')
# Get the status of the Ceph cluster.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph -s
# Get the health of the Ceph cluster.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph health detail
The health indicators for Ceph are:
* `HEALTH_OK`: Indicates the cluster is healthy.
* `HEALTH_WARN`: Indicates there may be an issue, but all the data stored in the
cluster remains accessible. In some cases Ceph returns to `HEALTH_OK`
automatically, e.g. when it finishes the rebalancing process.
* `HEALTH_ERR`: Indicates a more serious problem that requires immediate
attention, as part or all of your data has become inaccessible.
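Because Ceph often returns to `HEALTH_OK` on its own once rebalancing finishes,
it can be convenient to wait for the cluster to settle before continuing with
other maintenance. The loop below is a minimal sketch of such a wait; the
10-minute budget is an arbitrary choice and the ``CEPH_MON`` variable is
assumed to be set as shown above.
::
# Minimal sketch: poll the cluster health until it reports HEALTH_OK,
# giving up after roughly 10 minutes. The timeout is an arbitrary choice.
for attempt in $(seq 1 60); do
  STATUS=$(sudo kubectl exec -n ceph ${CEPH_MON} -- ceph health)
  echo "Attempt ${attempt}: ${STATUS}"
  [ "${STATUS}" = "HEALTH_OK" ] && break
  sleep 10
done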
When the cluster is unhealthy and some Placement Groups are reported to be in
degraded or down states, determine the problem by inspecting the logs of the
Ceph OSD that is down using ``kubectl``, as illustrated below.
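As a concrete starting point, the commands below list the OSD pods and pull the
logs of the one that is failing. This is a sketch only: the
``application=ceph,component=osd`` label selector mirrors the Monitor selector
used above, and ``ceph-osd-default-abcde`` is a placeholder pod name that will
differ in your cluster.
::
# List the Ceph OSD pods and the nodes they are scheduled on.
sudo kubectl get pods -n ceph -l application=ceph,component=osd -o wide
# Inspect the logs of the OSD pod that is down or restarting.
# Note: "ceph-osd-default-abcde" is a placeholder pod name.
sudo kubectl logs -n ceph ceph-osd-default-abcde
# If the pod keeps restarting, the previous container's logs may show the crash.
sudo kubectl logs -n ceph ceph-osd-default-abcde --previous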
There are a few other commands that may be useful during debugging:
::
# Make sure your CEPH_MON variable is set, as described above.
echo ${CEPH_MON}
# List the hierarchy of OSDs in the cluster to see which OSDs are down.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph osd tree
# Get detailed information on the status of every Placement Group.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph pg dump
# List allocated block devices.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- rbd ls
# See which client uses the device.
# Note: The PVC name will be different in your cluster.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- rbd status \
kubernetes-dynamic-pvc-e71e65a9-3b99-11e9-bf31-e65b6238af01
# List all Ceph block devices mounted on a specific host.
mount | grep rbd
# Exec into the Monitor pod and check the cluster status.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph -s
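To connect an ``rbd`` image back to the Kubernetes workload that uses it, the
PersistentVolume bound to a claim records the image name. The query below is a
sketch that assumes the in-tree RBD provisioner (i.e. a ``spec.rbd.image``
field on the PersistentVolume); the claim name ``mysql-data-mariadb-server-0``
and the ``openstack`` namespace are placeholders.
::
# Resolve a PVC to its PersistentVolume, then print the backing RBD image.
# Note: the PVC name and namespace below are placeholders; the jsonpath
# assumes an in-tree RBD volume (spec.rbd.image).
PV_NAME=$(sudo kubectl get pvc mysql-data-mariadb-server-0 -n openstack \
  -o jsonpath='{.spec.volumeName}')
sudo kubectl get pv ${PV_NAME} -o jsonpath='{.spec.rbd.image}{"\n"}'
# The printed image name can then be passed to "rbd status" as shown above.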

View File

@ -18,6 +18,13 @@ to search and create issues.
.. contents:: Table of Contents
:depth: 3
**Additional Troubleshooting**
.. toctree::
:maxdepth: 3
troubleshooting_ceph.rst
---------------------
Perform Health Checks
---------------------
@ -232,62 +239,3 @@ by Kubernetes to satisfy replication factor.
# Restart Armada API service.
kubectl delete pod -n ucp armada-api-d5f757d5-6z6nv
----
Ceph
----
Many stateful services in Airship rely on Ceph to function correctly.
For more information on Ceph debugging, follow the official
`Ceph debugging guide <http://docs.ceph.com/docs/mimic/rados/troubleshooting/log-and-debug/>`__.
Although Ceph tolerates failures of multiple OSDs, it is important
to make sure that your Ceph cluster is healthy.
::
# Get a name of Ceph Monitor pod.
CEPH_MON=$(sudo kubectl get pods --all-namespaces -o=name | \
grep ceph-mon | sed -n 1p | sed 's|pod/||')
# Get the status of the Ceph cluster.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph -s
The cluster is in a healthy state when the ``health`` parameter is set to ``HEALTH_OK``.
When the cluster is unhealthy, and some Placement Groups are reported to be in
degraded or down states, determine the problem by inspecting the logs of
Ceph OSD that is down using ``kubectl``.
::
# Get a name of Ceph Monitor pod.
CEPH_MON=$(sudo kubectl get pods --all-namespaces -o=name | \
grep ceph-mon | sed -n 1p | sed 's|pod/||')
# List a hierarchy of OSDs in the cluster to see what OSDs are down.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph osd tree
There are a few other commands that may be useful during the debugging:
::
# Get a name of Ceph Monitor pod.
CEPH_MON=$(sudo kubectl get pods --all-namespaces -o=name | \
grep ceph-mon | sed -n 1p | sed 's|pod/||')
# Get detailed information on the status of every Placement Group.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph pg dump
# List allocated block devices.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- rbd ls
# See what client uses the device.
sudo kubectl exec -it -n ceph ${CEPH_MON} -- rbd status \
kubernetes-dynamic-pvc-e71e65a9-3b99-11e9-bf31-e65b6238af01
# List all Ceph block devices mounted on a specific host.
mount | grep rbd
# Exec into the Monitor pod
MON_POD=$(sudo kubectl get --no-headers pods -n=ceph \
-l="application=ceph,component=mon" | awk '{ print $1; exit }')
echo $MON_POD
sudo kubectl exec -n ceph ${MON_POD} -- ceph -s