Merge "Separated Ceph Content into separate section"
commit 742bc5b3c1

doc/source/troubleshooting_ceph.rst (new file, 83 lines added)
@@ -0,0 +1,83 @@
..
  Licensed under the Apache License, Version 2.0 (the "License"); you may
  not use this file except in compliance with the License. You may obtain
  a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  License for the specific language governing permissions and limitations
  under the License.

--------------------
Troubleshooting Ceph
--------------------

.. contents:: Table of Contents
   :depth: 3

Initial Troubleshooting
-----------------------

Many stateful services in Airship rely on Ceph to function correctly.
For more information on Ceph debugging, follow the official
`Ceph debugging guide <http://docs.ceph.com/docs/mimic/rados/troubleshooting/log-and-debug/>`__.

Although Ceph tolerates failures of multiple OSDs, it is important
to make sure that your Ceph cluster is healthy.

::

  # Many commands require the name of the Ceph monitor pod; use the following
  # shell command to assign the pod name to an environment variable for ease
  # of use.
  CEPH_MON=$(sudo kubectl get --no-headers pods -n=ceph \
    -l="application=ceph,component=mon" | awk '{ print $1; exit }')

  # Get the status of the Ceph cluster.
  sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph -s

  # Get the health of the Ceph cluster.
  sudo kubectl -n ceph exec ${CEPH_MON} -- ceph health detail

The health indicators for Ceph are:

* ``HEALTH_OK``: Indicates the cluster is healthy.
* ``HEALTH_WARN``: Indicates there may be an issue, but all the data stored in the
  cluster remains accessible. In some cases Ceph will return to ``HEALTH_OK``
  automatically, e.g. when Ceph finishes the rebalancing process.
* ``HEALTH_ERR``: Indicates a more serious problem that requires immediate
  attention, as part or all of your data has become inaccessible.
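
When automating a check around these values (for example, after maintenance or
a node reboot), a small wait loop around ``ceph health`` avoids manual polling.
The following is a minimal sketch that reuses the ``CEPH_MON`` variable defined
above; the 30 attempts and 10-second interval are arbitrary choices, not values
required by Airship.

::

  # Poll the cluster until it reports HEALTH_OK or roughly 5 minutes have
  # passed (30 attempts x 10 seconds).
  for attempt in $(seq 1 30); do
    STATUS=$(sudo kubectl exec -n ceph ${CEPH_MON} -- ceph health)
    echo "Attempt ${attempt}: ${STATUS}"
    [ "${STATUS%% *}" = "HEALTH_OK" ] && break
    sleep 10
  done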

When the cluster is unhealthy and some Placement Groups are reported to be in
degraded or down states, determine the problem by inspecting the logs of the
Ceph OSD that is down using ``kubectl``, as sketched below.
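
As a sketch of that workflow (the OSD label selector below is an assumption
based on how this guide selects the monitor pod, and the pod name is a
placeholder), first identify which OSDs are down, then read the logs of the
corresponding OSD pod:

::

  # Show only the OSDs that the cluster reports as down.
  sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph osd tree | grep -w down

  # List the OSD pods and the nodes they run on, assuming they carry the
  # analogous "application=ceph,component=osd" labels.
  sudo kubectl get pods -n ceph -o wide \
    -l="application=ceph,component=osd"

  # Inspect the logs of the pod backing the down OSD. Substitute the pod
  # name reported by the previous command; the one below is a placeholder.
  sudo kubectl logs -n ceph ceph-osd-default-example-pod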

There are a few other commands that may be useful during debugging:

::

  # Make sure your CEPH_MON variable is still set, as described above.
  echo ${CEPH_MON}

  # List a hierarchy of OSDs in the cluster to see what OSDs are down.
  sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph osd tree

  # Get detailed information on the status of every Placement Group.
  sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph pg dump

  # List allocated block devices.
  sudo kubectl exec -it -n ceph ${CEPH_MON} -- rbd ls

  # See what client uses the device.
  # Note: the PVC name will be different in your cluster.
  sudo kubectl exec -it -n ceph ${CEPH_MON} -- rbd status \
    kubernetes-dynamic-pvc-e71e65a9-3b99-11e9-bf31-e65b6238af01

  # List all Ceph block devices mounted on a specific host.
  mount | grep rbd

  # Run a command directly in the Monitor pod, e.g. to re-check the
  # cluster status.
  sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph -s
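
If you need to trace a mounted RBD device back to the PersistentVolume and the
Ceph image behind it, the sketch below may help. It assumes the volume was
provisioned by the in-tree RBD provisioner (as the ``kubernetes-dynamic-pvc-...``
image name above suggests); volumes provisioned through a CSI driver expose
this information under different fields, and the PV name is a placeholder.

::

  # List PersistentVolumes and pick the one of interest.
  sudo kubectl get pv

  # Print the Ceph pool and image backing that PV (in-tree RBD volumes only).
  sudo kubectl get pv <pv-name> \
    -o jsonpath='{.spec.rbd.pool}/{.spec.rbd.image}{"\n"}'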

@@ -18,6 +18,13 @@ to search and create issues.
 .. contents:: Table of Contents
    :depth: 3
+
+**Additional Troubleshooting**
+
+.. toctree::
+   :maxdepth: 3
+
+   troubleshooting_ceph.rst
 
 ---------------------
 Perform Health Checks
 ---------------------
@@ -232,62 +239,3 @@ by Kubernetes to satisfy replication factor.
 
 # Restart Armada API service.
 kubectl delete pod -n ucp armada-api-d5f757d5-6z6nv
-
-----
-Ceph
-----
-
-Many stateful services in Airship rely on Ceph to function correctly.
-For more information on Ceph debugging follow an official
-`Ceph debugging guide <http://docs.ceph.com/docs/mimic/rados/troubleshooting/log-and-debug/>`__.
-
-Although Ceph tolerates failures of multiple OSDs, it is important
-to make sure that your Ceph cluster is healthy.
-
-::
-
-  # Get a name of Ceph Monitor pod.
-  CEPH_MON=$(sudo kubectl get pods --all-namespaces -o=name | \
-    grep ceph-mon | sed -n 1p | sed 's|pod/||')
-  # Get the status of the Ceph cluster.
-  sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph -s
-
-Cluster is in a helthy state when ``health`` parameter is set to ``HEALTH_OK``.
-
-When the cluster is unhealthy, and some Placement Groups are reported to be in
-degraded or down states, determine the problem by inspecting the logs of
-Ceph OSD that is down using ``kubectl``.
-
-::
-
-  # Get a name of Ceph Monitor pod.
-  CEPH_MON=$(sudo kubectl get pods --all-namespaces -o=name | \
-    grep ceph-mon | sed -n 1p | sed 's|pod/||')
-  # List a hierarchy of OSDs in the cluster to see what OSDs are down.
-  sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph osd tree
-
-There are a few other commands that may be useful during the debugging:
-
-::
-
-  # Get a name of Ceph Monitor pod.
-  CEPH_MON=$(sudo kubectl get pods --all-namespaces -o=name | \
-    grep ceph-mon | sed -n 1p | sed 's|pod/||')
-
-  # Get a detailed information on the status of every Placement Group.
-  sudo kubectl exec -it -n ceph ${CEPH_MON} -- ceph pg dump
-
-  # List allocated block devices.
-  sudo kubectl exec -it -n ceph ${CEPH_MON} -- rbd ls
-  # See what client uses the device.
-  sudo kubectl exec -it -n ceph ${CEPH_MON} -- rbd status \
-    kubernetes-dynamic-pvc-e71e65a9-3b99-11e9-bf31-e65b6238af01
-
-  # List all Ceph block devices mounted on a specific host.
-  mount | grep rbd
-
-  # Exec into the Monitor pod
-  MON_POD=$(sudo kubectl get --no-headers pods -n=ceph \
-    -l="application=ceph,component=mon" | awk '{ print $1; exit }')
-  echo $MON_POD
-  sudo kubectl exec -n ceph ${MON_POD} -- ceph -s