Added Health Checks to the troubleshooting guide

Added information about health checks that can be used to verify a
deployment of Airship 2. Also made two additions to .gitignore
to stop tracking Sphinx build files and IDE files.

Change-Id: Icbf39860e9e137261b302ad5649fb48b095f6220
dm470r 2020-05-13 09:51:15 -05:00
parent 7fe5ca0645
commit 992a4fa89f
2 changed files with 115 additions and 0 deletions

.gitignore

@@ -4,3 +4,13 @@ peggles/
# Unit test / coverage reports
.tox/
config-ssh
# Sphinx Build Files
_build
# Various user specific files
.DS_Store
.idea/
.vimrc
*.swp
.vscode/


@@ -13,6 +13,111 @@ For additional support you can contact the Airship team via
use `Airship bug tracker <https://storyboard.openstack.org/#!/project_group/Airship>`__
to search and create issues.
Perform Health Checks
---------------------
The first step in troubleshooting an Airship deployment is to identify unhealthy services by performing health checks.

**Verify Peering is established**

::

    % sudo /opt/cni/bin/calicoctl node status
    Calico process is running.

    IPv4 BGP status
    +--------------+-----------+-------+------------+-------------+
    | PEER ADDRESS | PEER TYPE | STATE |   SINCE    |    INFO     |
    +--------------+-----------+-------+------------+-------------+
    | 172.29.0.2   | global    | up    | 2018-05-22 | Established |
    | 172.29.0.3   | global    | up    | 2018-05-22 | Established |
    +--------------+-----------+-------+------------+-------------+

    IPv6 BGP status
    No IPv6 peers found.

Verify that **STATE** is ``up`` and **INFO** is ``Established``. If **STATE** is ``start`` and **INFO** is
``Connect``, peering has failed.

For more information on Calico troubleshooting, see the Calico documentation:
https://docs.projectcalico.org/v3.4/usage/troubleshooting/

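The peering check above can also be scripted. The following is a minimal sketch, not part of Airship or Calico: the function name ``failed_bgp_peers`` is hypothetical, and the parsing assumes the table layout shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of Airship or Calico.
# Reads `calicoctl node status` output on stdin and prints "PEER STATE INFO"
# for every peer row whose INFO column is not "Established".
failed_bgp_peers() {
    awk -F'|' '
        # Peer rows begin with a pipe followed by an IPv4 or IPv6 address;
        # header and border rows do not match this pattern.
        /^[ \t]*\|[ \t]*[0-9a-fA-F:.]+[ \t]*\|/ {
            peer = $2; state = $4; info = $6
            gsub(/[ \t]/, "", peer)
            gsub(/[ \t]/, "", state)
            gsub(/^[ \t]+|[ \t]+$/, "", info)
            if (info != "Established") print peer, state, info
        }
    '
}
```

A possible invocation is ``sudo /opt/cni/bin/calicoctl node status | failed_bgp_peers``. Empty output means that every listed peer reports ``Established``.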
**Verify the Health of Kubernetes**

+----------------------------------------+--------------------------------------------------------------------+
| Command                                | Description                                                        |
+========================================+====================================================================+
| | ``% kubectl get nodes``              | | Verify that for all nodes, STATUS is ``Ready``.                  |
| |                                      | |                                                                  |
| |                                      | | **Note**: After a reboot, it may take as long as 30 minutes for  |
| |                                      | | a node to stabilize and reach a ``Ready`` condition.             |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% kubectl get pods``               | | Verify that probes for all pods are passing. This command        |
| | ``--all-namespaces | grep Running``  | | exposes ``Running`` pods with no ready containers (``0/N``),     |
| | ``| grep 0/``                        | | which usually indicates a failing readiness or liveness probe.   |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% kubectl get pods``               | | Verify that all pods are in the ``Running`` or ``Completed``     |
| | ``--all-namespaces | grep -v``       | | state. This command exposes pods that are not running or         |
| | ``Running | grep -v Completed``      | | completed.                                                       |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% kubectl get pods``               | | Look for crashed pods.                                           |
| | ``--all-namespaces -o wide | grep``  | |                                                                  |
| | ``Crash``                            | |                                                                  |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% kubectl get pods``               | | Check the health of core services.                               |
| | ``--all-namespaces -o wide | grep``  | |                                                                  |
| | ``core``                             | |                                                                  |
| | ``% kubectl get services``           | |                                                                  |
| | ``--all-namespaces | grep core``     | |                                                                  |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% kubectl get pods``               | | Check the health of proxy services.                              |
| | ``--all-namespaces -o wide | grep``  | |                                                                  |
| | ``proxy``                            | |                                                                  |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% kubectl get pods``               | | Get details for all pods and watch for changes.                  |
| | ``--all-namespaces -o wide -w``      | |                                                                  |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% kubectl get jobs``               | | Look for failed jobs.                                            |
| | ``--all-namespaces -o wide | grep``  | |                                                                  |
| | ``-v "1 1"``                         | |                                                                  |
+----------------------------------------+--------------------------------------------------------------------+

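Several of the pod checks in the table can be combined into one filter. This is a rough sketch under stated assumptions: the function name ``unhealthy_pods`` is hypothetical, and the input is expected to be ``kubectl get pods --all-namespaces --no-headers`` output with the default column order (NAMESPACE, NAME, READY, STATUS). Unlike the ``grep 0/`` check, it flags any ``Running`` pod whose ready count is below its container count.

```shell
#!/bin/sh
# Hypothetical helper, not part of Airship or kubectl.
# Reads `kubectl get pods --all-namespaces --no-headers` output on stdin and
# prints "NAMESPACE/NAME READY STATUS" for pods that need attention:
#   - any pod whose STATUS is neither Running nor Completed
#   - any Running pod whose READY count is lower than its container count
unhealthy_pods() {
    awk '
        {
            split($3, ready, "/")          # READY column, for example "1/2"
            status = $4
            if (status == "Completed") next
            if (status != "Running" || ready[1] != ready[2])
                print $1 "/" $2, $3, status
        }
    '
}
```

A possible invocation is ``kubectl get pods --all-namespaces --no-headers | unhealthy_pods``. Empty output means that no pod matched either condition.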
**Verify the Health of OpenStack**

Check OpenStack's health by issuing the following commands at the terminal:

+----------------------------------------+--------------------------------------------------------------------+
| Command                                | Description                                                        |
+========================================+====================================================================+
| | ``% openstack token issue``          | | Verify Keystone by requesting a token.                           |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% openstack network list``         | | Verify networks.                                                 |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% openstack subnet list``          | | Verify subnets.                                                  |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% openstack server list``          | | Verify VMs.                                                      |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% openstack hypervisor list``      | | Verify compute hypervisors.                                      |
+----------------------------------------+--------------------------------------------------------------------+
| | ``% openstack image list``           | | Verify images.                                                   |
+----------------------------------------+--------------------------------------------------------------------+

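The OpenStack commands in the table can be run in one pass. The sketch below is illustrative only: ``run_check`` and ``openstack_health`` are hypothetical names, and the script assumes that OpenStack credentials are already loaded in the environment. A passing check only shows that the API answered; it does not confirm that the listed resources are the expected ones.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Airship or the OpenStack client.

# Runs one command quietly and reports whether it exited successfully.
run_check() {
    desc=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $desc"
    else
        echo "FAIL $desc"
        return 1
    fi
}

# Runs every command from the table above and returns non-zero if any failed.
openstack_health() {
    rc=0
    run_check "Keystone token"      openstack token issue     || rc=1
    run_check "networks"            openstack network list    || rc=1
    run_check "subnets"             openstack subnet list     || rc=1
    run_check "servers (VMs)"       openstack server list     || rc=1
    run_check "compute hypervisors" openstack hypervisor list || rc=1
    run_check "images"              openstack image list      || rc=1
    return $rc
}
```

Calling ``openstack_health`` prints one ``OK`` or ``FAIL`` line per service, so a failing API is visible at a glance before running the individual command by hand.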
**Check Ceph Status**

::

    % MON_POD=$(sudo kubectl get --no-headers pods -n=ceph -l "application=ceph,component=mon" | awk '{ print $1; exit }')
    % echo $MON_POD
    % sudo kubectl exec -n ceph ${MON_POD} -- ceph -s

For troubleshooting information, see the Ceph Troubleshooting section of this guide.

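The ``ceph -s`` output reports an overall health state of ``HEALTH_OK``, ``HEALTH_WARN``, or ``HEALTH_ERR``. As a small sketch with hypothetical function names, that state can be extracted and tested like this:

```shell
#!/bin/sh
# Hypothetical helpers, not part of Airship or Ceph.

# Prints the first HEALTH_* token found in `ceph -s` output read from stdin.
ceph_health_state() {
    grep -o 'HEALTH_[A-Z]*' | head -n 1
}

# Succeeds only when the state read from stdin is HEALTH_OK.
ceph_is_healthy() {
    [ "$(ceph_health_state)" = "HEALTH_OK" ]
}
```

For example, ``sudo kubectl exec -n ceph ${MON_POD} -- ceph -s | ceph_is_healthy || echo "Ceph needs attention"`` prints a message when the cluster is in any state other than ``HEALTH_OK``.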
**Check for kube-proxy iptables NAT Issues**

::

    # Check the IP tables and make sure the IP addresses are the same:
    % iptables -n -t nat -L | grep coredns
    % kubectl -n kube-system get -o wide pod | grep coredns

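The comparison of the two outputs above can be automated. This is a sketch only: ``missing_pod_ips`` is a hypothetical name, and the caller is assumed to have collected the coredns pod IPs and the matching NAT rules beforehand.

```shell
#!/bin/sh
# Hypothetical helper, not part of Airship, kubectl, or iptables.
# $1: whitespace-separated pod IP addresses
# $2: text of the NAT rules (for example, the grep output shown above)
# Prints every pod IP that does not occur as a whole word in the NAT rules.
missing_pod_ips() {
    pod_ips=$1
    nat_rules=$2
    for ip in $pod_ips; do
        # -w avoids treating 10.0.0.2 as present when only 10.0.0.20 is listed
        printf '%s\n' "$nat_rules" | grep -qwF -- "$ip" || echo "$ip"
    done
}
```

One way to use it, assuming the pod IP is the sixth column of the ``-o wide`` output, is ``missing_pod_ips "$(kubectl -n kube-system get pod -o wide --no-headers | awk '/coredns/ { print $6 }')" "$(iptables -n -t nat -L | grep coredns)"``. Any address printed is a coredns pod IP with no matching NAT rule.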
Configuring Airship CLI
-----------------------