Added Health Checks to the troubleshooting guide

Added information about health checks that can be used to verify an Airship 2 deployment. Also made two additions to .gitignore to stop tracking Sphinx build files and JetBrains .idea files.

Change-Id: Icbf39860e9e137261b302ad5649fb48b095f6220
2020-05-13 09:51:15 -05:00 · 2020-05-13 09:51:15 -05:00 · d2e23503e7
parent 7fe5ca0645
commit d2e23503e7
2 changed files with 115 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -4,3 +4,9 @@ peggles/
 # Unit test / coverage reports
 .tox/
 config-ssh
+
+# Sphinx Build Files
+_build
+
+# IDE Files
+.idea
--- a/doc/source/troubleshooting_guide.rst
+++ b/doc/source/troubleshooting_guide.rst
@@ -175,3 +175,112 @@ There are a few other commands that may be useful during the debugging:

    # List all Ceph block devices mounted on a specific host.
    mount | grep rbd
+
+Health Checks
+-------------
+
+
+**Verify Peering is established**
+
+``% sudo /opt/cni/bin/calicoctl node status``
+
+Example:
+
+::
+
+    % sudo /opt/cni/bin/calicoctl node status
+    Calico process is running.
+    IPv4 BGP status
+    +--------------+-----------+-------+------------+-------------+
+    | PEER ADDRESS | PEER TYPE | STATE |   SINCE    |    INFO     |
+    +--------------+-----------+-------+------------+-------------+
+    | 172.29.0.2   | global    | up    | 2018-05-22 | Established |
+    | 172.29.0.3   | global    | up    | 2018-05-22 | Established |
+    +--------------+-----------+-------+------------+-------------+
+    IPv6 BGP status
+    No IPv6 peers found.
+
+Verify that **STATE** is ``up`` and **INFO** is ``Established``. However, if **STATE** is ``start`` and **INFO** is
+``Connect``, peering has failed.
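
The peering check can also be scripted. The following is a minimal sketch, assuming the five-column table layout shown in the example output and IPv4 peers only. ``check_bgp_peers`` is a hypothetical helper name used for illustration; it is not part of ``calicoctl``.

```shell
# check_bgp_peers: read `calicoctl node status` output on stdin and report
# every peer whose STATE is not "up" or whose INFO is not "Established".
# Hypothetical helper; assumes the table layout shown in the example.
check_bgp_peers() {
    awk -F'|' '
        BEGIN { bad = 0 }
        # Data rows carry an IPv4 address in the first cell; header and
        # border rows do not match and are skipped.
        $2 ~ /^[ \t]*[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[ \t]*$/ {
            addr = $2; state = $4; info = $6
            gsub(/[ \t]/, "", addr)
            gsub(/[ \t]/, "", state)
            gsub(/[ \t]/, "", info)
            if (state != "up" || info != "Established") {
                print "peer " addr ": state=" state " info=" info
                bad = 1
            }
        }
        # Exit status 1 if any peer failed the check, 0 otherwise.
        END { exit bad }
    '
}
```

Possible usage: ``sudo /opt/cni/bin/calicoctl node status | check_bgp_peers && echo "peering OK"``. Note that the sketch also exits with status 0 when the output lists no peers at all, so an empty peer table still needs a manual look.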
+
+For more information on Calico troubleshooting, see the Calico documentation:
+https://docs.projectcalico.org/v3.4/usage/troubleshooting/
+
+
+**Kubernetes Health Checks**
+
++--------------------------------------+-------------------------------------------------------------------+
+| Command                              | Description                                                       |
++======================================+===================================================================+
+| | ``% kubectl get nodes``            | | Verify that ``STATUS`` is ``Ready`` for all nodes.              |
+| |                                    | |                                                                 |
+| |                                    | | **Note**: After a reboot, it may take as long as 30 minutes for |
+| |                                    | | a node to stabilize and reach a ``Ready`` condition.            |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% kubectl get pods``             | | Verify that containers in all running pods are ready. This      |
+| | ``--all-namespaces | grep Running``| | command exposes ``Running`` pods with no ready containers,      |
+| | ``| grep 0/``                      | | which usually points to a failing readiness probe.              |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% kubectl get pods``             | | Verify that all pods are in the ``Running`` or ``Completed``    |
+| | ``--all-namespaces | grep -v``     | | state. This command exposes pods that are not running or        |
+| | ``Running | grep -v Completed``    | | completed.                                                      |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% kubectl get pods``             | | Look for crashed pods.                                          |
+| | ``--all-namespaces -o wide | grep``| |                                                                 |
+| | ``Crash``                          | |                                                                 |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% kubectl get pods``             | | Check the health of core services.                              |
+| | ``--all-namespaces -o wide | grep``| |                                                                 |
+| | ``core``                           | |                                                                 |
+| | ``% kubectl get services``         | |                                                                 |
+| | ``--all-namespaces | grep core``   | |                                                                 |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% kubectl get pods``             | | Check the health of proxy services.                             |
+| | ``--all-namespaces -o wide | grep``| |                                                                 |
+| | ``proxy``                          | |                                                                 |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% kubectl get pods``             | | Get all pod details and watch for changes.                      |
+| | ``--all-namespaces -o wide -w``    | |                                                                 |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% kubectl get jobs``             | | Look for failed jobs.                                           |
+| | ``--all-namespaces -o wide | grep``| |                                                                 |
+| | ``-v "1    1"``                    | |                                                                 |
++--------------------------------------+-------------------------------------------------------------------+
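
The ``grep``-based pod checks in the table are quick but coarse: ``grep 0/`` also matches a healthy ``10/10`` pod and misses a pod that is only partly ready, such as ``1/2``. The following is a minimal sketch of a stricter filter that compares the two halves of the ``READY`` column. ``list_unready_pods`` is a hypothetical helper name, and the sketch assumes the default column order ``NAMESPACE NAME READY STATUS ...``.

```shell
# list_unready_pods: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print every pod that is neither Completed nor
# Running with all of its containers ready.
# Hypothetical helper; assumes columns NAMESPACE NAME READY STATUS ...
list_unready_pods() {
    awk '
        BEGIN { bad = 0 }
        {
            # Finished job pods are expected to show 0/N ready.
            if ($4 == "Completed") next
            # Split the READY column, for example "1/2", into both halves.
            split($3, ready, "/")
            if ($4 != "Running" || ready[1] != ready[2]) {
                print $1 "/" $2, $3, $4
                bad = 1
            }
        }
        # Exit status 1 if any pod was reported, 0 otherwise.
        END { exit bad }
    '
}
```

Possible usage: ``kubectl get pods --all-namespaces --no-headers | list_unready_pods``. An empty result with exit status 0 means every pod passed the check.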
+
+
+**OpenStack Health Checks**
+
+Check OpenStack health by issuing the following commands at the terminal:
+
++--------------------------------------+-------------------------------------------------------------------+
+| Command                              | Description                                                       |
++======================================+===================================================================+
+| | ``% openstack token issue``        | | Verify Keystone by requesting a token.                          |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% openstack network list``       | | Verify networks.                                                |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% openstack subnet list``        | | Verify subnets.                                                 |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% openstack server list``        | | Verify VMs.                                                     |
++--------------------------------------+-------------------------------------------------------------------+
+| | ``% openstack hypervisor list``    | | Verify compute hypervisors.                                     |
++--------------------------------------+-------------------------------------------------------------------+
+
+
+**Check Ceph Status**
+
+::
+
+    % MON_POD=$(sudo kubectl get --no-headers pods -n ceph -l "application=ceph,component=mon" | awk '{ print $1; exit }')
+    % echo $MON_POD
+    % sudo kubectl exec -n ceph ${MON_POD} -- ceph -s
+
+For troubleshooting information, see the Ceph Troubleshooting section of this guide.
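
To use the Ceph status in a script, the health token can be pulled out of the ``ceph -s`` output. The following is a minimal sketch, assuming the output contains a ``health:`` line followed by a token such as ``HEALTH_OK`` or ``HEALTH_WARN``, as recent Ceph releases print it. Older releases may format this line differently. ``ceph_health`` is a hypothetical helper name.

```shell
# ceph_health: read `ceph -s` output on stdin and print the health token.
# Hypothetical helper; assumes a line of the form "health: HEALTH_OK".
ceph_health() {
    awk '$1 == "health:" { print $2 }'
}
```

Possible usage: ``sudo kubectl exec -n ceph ${MON_POD} -- ceph -s | ceph_health``. Any result other than ``HEALTH_OK`` is a reason to read the full status output.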
+
+**Check for kube-proxy iptables NAT Issues**
+
+
+
+::
+
+    # Check the iptables NAT rules and make sure the CoreDNS IP addresses match the pod IPs:
+    % iptables -n -t nat -L | grep coredns
+    % kubectl -n kube-system get -o wide pod | grep coredns