Merge "Added Health Checks to the troubleshooting guide"
commit
15de25b1c7
|
@ -4,3 +4,13 @@ peggles/
|
|||
# Unit test / coverage reports
|
||||
.tox/
|
||||
config-ssh
|
||||
|
||||
# Sphinx Build Files
|
||||
_build
|
||||
|
||||
# Various user specific files
|
||||
.DS_Store
|
||||
.idea/
|
||||
.vimrc
|
||||
*.swp
|
||||
.vscode/
|
|
@ -1,3 +1,4 @@
|
|||
=====================
|
||||
Troubleshooting Guide
|
||||
=====================
|
||||
|
||||
|
@ -10,9 +11,117 @@ root cause of the problem.
|
|||
|
||||
For additional support you can contact the Airship team via
|
||||
`IRC or mailing list <https://www.airshipit.org/community/>`__,
|
||||
or use the `Airship bug tracker <https://storyboard.openstack.org/#!/
|
||||
project_group/Airship>`__
|
||||
to search and create issues.
|
||||
|
||||
.. contents:: Table of Contents
|
||||
:depth: 3
|
||||
|
||||
---------------------
|
||||
Perform Health Checks
|
||||
---------------------
|
||||
|
||||
The first step in troubleshooting an Airship deployment is to identify unhealthy
|
||||
services by performing health checks.
|
||||
|
||||
Verify Peering is established
|
||||
-----------------------------
|
||||
|
||||
::
|
||||
|
||||
sudo /opt/cni/bin/calicoctl node status
|
||||
|
||||
Calico process is running.
|
||||
IPv4 BGP status
|
||||
+--------------+-----------+-------+------------+-------------+
|
||||
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
|
||||
+--------------+-----------+-------+------------+-------------+
|
||||
| 172.29.0.2 | global | up | 2018-05-22 | Established |
|
||||
| 172.29.0.3 | global | up | 2018-05-22 | Established |
|
||||
+--------------+-----------+-------+------------+-------------+
|
||||
IPv6 BGP status
No IPv6 peers found.
|
||||
|
||||
Verify that **STATE** is ``up`` and **INFO** is ``Established``. However, if
|
||||
**STATE** is ``start`` and **INFO** is ``Connect``, peering has failed.
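The check described above can also be scripted. The following is a minimal sketch, not part of the Airship tooling: ``unhealthy_peers`` is a hypothetical helper name, and it assumes the table layout shown in the sample output (IPv4 peers only, with STATE in the third column and INFO in the fifth).

```shell
# Hypothetical helper: reads `calicoctl node status` output on stdin and
# prints every IPv4 peer whose STATE is not "up" or INFO is not "Established".
unhealthy_peers() {
    awk -F'|' '
        # Peer rows start with "|" and carry an IPv4 address in the first cell;
        # border rows ("+---") and the header row are skipped by this test.
        /^[ \t]*[|]/ && $2 ~ /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/ {
            gsub(/ /, "", $2)   # peer address
            gsub(/ /, "", $4)   # STATE
            gsub(/ /, "", $6)   # INFO
            if ($4 != "up" || $6 != "Established")
                print $2, $4, $6
        }
    '
}

# Usage (prints nothing when all peers are healthy):
#   sudo /opt/cni/bin/calicoctl node status | unhealthy_peers
```

An empty result suggests that every listed peer is established; any printed line names a peer worth investigating.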
|
||||
|
||||
For more information on Calico troubleshooting, visit the
|
||||
`Calico Documentation <https://docs.projectcalico.org/introduction/>`__.
|
||||
|
||||
Verify the Health of Kubernetes
|
||||
-------------------------------
|
||||
|
||||
::
|
||||
|
||||
# Verify that for all nodes, STATUS is Ready.
|
||||
#
|
||||
# Note: After a reboot, it may take as long as 30 minutes for
|
||||
# a node to stabilize and reach a Ready condition.
|
||||
kubectl get nodes
|
||||
|
||||
# Verify that liveness probes for all pods are working.
|
||||
# This command exposes pods whose liveness probe is failing.
|
||||
kubectl get pods --all-namespaces | grep Running | grep 0/
|
||||
|
||||
# Verify that all pods are in the Running or Completed state.
|
||||
# This command exposes pods that are not running or completed.
|
||||
kubectl get pods --all-namespaces | grep -v -E "Running|Completed"
|
||||
|
||||
# Look for crashed pods.
|
||||
kubectl get pods --all-namespaces -o wide | grep Crash
|
||||
|
||||
# Check the health of core services.
|
||||
kubectl get pods --all-namespaces -o wide | grep core
|
||||
kubectl get services --all-namespaces | grep core
|
||||
|
||||
# Check the health of proxy services.
|
||||
kubectl get pods --all-namespaces -o wide | grep proxy
|
||||
|
||||
# Get all pod details.
|
||||
kubectl get pods --all-namespaces -o wide -w
|
||||
|
||||
# Look for failed jobs.
|
||||
kubectl get jobs --all-namespaces -o wide | grep -v "1 1"
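The ``grep 0/`` filter used above is a quick approximation: it can miss a pod that reports ``1/2`` and can match a healthy ``10/10``. As a hedged alternative, the sketch below compares the two halves of the READY column directly. ``not_ready_pods`` is a hypothetical helper name, and the column order (NAMESPACE, NAME, READY, STATUS) is assumed to be that of ``kubectl get pods --all-namespaces``.

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on
# stdin and prints Running pods whose ready count is below the container total.
not_ready_pods() {
    awk '
        NR > 1 && $4 == "Running" {      # skip the header row
            split($3, ready, "/")        # READY column looks like "1/2"
            if (ready[1] != ready[2])
                print $1 "/" $2, $3      # namespace/name and READY value
        }
    '
}

# Usage:
#   kubectl get pods --all-namespaces | not_ready_pods
```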
|
||||
|
||||
Verify the Health of OpenStack
|
||||
------------------------------
|
||||
|
||||
Check OpenStack's health by issuing the following commands at the terminal.
|
||||
To do so, you must first source an OpenStack RC file. For details, see
|
||||
`here <https://docs.openstack.org/mitaka/cli-reference/common/cli_set_
|
||||
environment_variables_using_openstack_rc.html#download-and-source-the-
|
||||
openstack-rc-file>`__
|
||||
|
||||
::
|
||||
|
||||
# Verify Keystone by requesting a token.
|
||||
openstack token issue
|
||||
|
||||
# Verify networks.
|
||||
openstack network list
|
||||
|
||||
# Verify subnets.
|
||||
openstack subnet list
|
||||
|
||||
# Verify VMs.
|
||||
openstack server list
|
||||
|
||||
# Verify compute hypervisors.
|
||||
openstack hypervisor list
|
||||
|
||||
# Verify images.
|
||||
openstack image list
|
||||
|
||||
Check for kube-proxy iptables NAT Issues
|
||||
----------------------------------------
|
||||
|
||||
::
|
||||
|
||||
# Check the iptables NAT rules and make sure the CoreDNS IP addresses
# they reference match the IP addresses of the CoreDNS pods:
|
||||
% iptables -n -t nat -L | grep coredns
|
||||
% kubectl -n kube-system get -o wide pod | grep coredns
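To make the comparison less error-prone, the target addresses can be extracted from the NAT rules and listed on their own. This is a rough sketch under stated assumptions: ``coredns_dnat_ips`` is a hypothetical helper name, and it assumes kube-proxy writes CoreDNS DNAT rules with a comment containing ``coredns`` and a ``to:IP:PORT`` target.

```shell
# Hypothetical helper: reads `iptables -n -t nat -L` output on stdin and
# prints the unique DNAT target IPs found in rules that mention coredns.
coredns_dnat_ips() {
    grep coredns \
        | grep -oE 'to:[0-9]+(\.[0-9]+){3}' \
        | cut -d: -f2 \
        | sort -u
}

# Usage: the two lists below should contain the same addresses.
# The awk column ($6 = IP) assumes the default `-o wide` layout.
#   sudo iptables -n -t nat -L | coredns_dnat_ips
#   kubectl -n kube-system get pod -o wide | grep coredns | awk '{ print $6 }' | sort -u
```

If the two lists differ, the NAT rules are likely stale relative to the running CoreDNS pods.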
|
||||
|
||||
-----------------------
|
||||
Configuring Airship CLI
|
||||
-----------------------
|
||||
|
||||
|
@ -32,6 +141,7 @@ how to get it configured on your environment.
|
|||
# Run it without arguments to get a help message.
|
||||
sudo ./treasuremap/tools/airship
|
||||
|
||||
---------------------
|
||||
Manifests Preparation
|
||||
---------------------
|
||||
|
||||
|
@ -62,6 +172,7 @@ Example:
|
|||
sudo ./treasuremap/tools/airship pegleg site -r treasuremap/ \
|
||||
render -o rendered.txt ${SITE}
|
||||
|
||||
------------------
|
||||
Deployment Failure
|
||||
------------------
|
||||
|
||||
|
@ -122,6 +233,7 @@ by Kubernetes to satisfy replication factor.
|
|||
# Restart Armada API service.
|
||||
kubectl delete pod -n ucp armada-api-d5f757d5-6z6nv
|
||||
|
||||
----
|
||||
Ceph
|
||||
----
|
||||
|
||||
|
@ -132,8 +244,6 @@ For more information on Ceph debugging follow an official
|
|||
Although Ceph tolerates failures of multiple OSDs, it is important
|
||||
to make sure that your Ceph cluster is healthy.
|
||||
|
||||
Example:
|
||||
|
||||
::
|
||||
|
||||
# Get the name of a Ceph Monitor pod.
|
||||
|
@ -175,3 +285,9 @@ There are a few other commands that may be useful during the debugging:
|
|||
|
||||
# List all Ceph block devices mounted on a specific host.
|
||||
mount | grep rbd
|
||||
|
||||
# Run "ceph -s" inside a Monitor pod.
|
||||
MON_POD=$(sudo kubectl get --no-headers pods -n=ceph \
|
||||
-l "application=ceph,component=mon" | awk '{ print $1; exit }')
|
||||
echo $MON_POD
|
||||
sudo kubectl exec -n ceph ${MON_POD} -- ceph -s
|
||||
|
|