diff --git a/.gitignore b/.gitignore
index ee3c0fba8..13e9d592a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,3 +4,13 @@ peggles/
 # Unit test / coverage reports
 .tox/
 config-ssh
+
+# Sphinx Build Files
+_build
+
+# Various user specific files
+.DS_Store
+.idea/
+.vimrc
+*.swp
+.vscode/
\ No newline at end of file
diff --git a/doc/source/troubleshooting_guide.rst b/doc/source/troubleshooting_guide.rst
index 072a4897d..fd5048312 100644
--- a/doc/source/troubleshooting_guide.rst
+++ b/doc/source/troubleshooting_guide.rst
@@ -13,6 +13,112 @@ For additional support you can contact the Airship team via
 use `Airship bug tracker `__ to search and create issues.
 
 
+Perform Health Checks
+---------------------
+
+The first step in troubleshooting an Airship deployment is to identify unhealthy services by performing health checks.
+
+**Verify Calico BGP Peering**
+
+::
+
+    % sudo /opt/cni/bin/calicoctl node status
+
+    Calico process is running.
+    IPv4 BGP status
+    +--------------+-----------+-------+------------+-------------+
+    | PEER ADDRESS | PEER TYPE | STATE | SINCE      | INFO        |
+    +--------------+-----------+-------+------------+-------------+
+    | 172.29.0.2   | global    | up    | 2018-05-22 | Established |
+    | 172.29.0.3   | global    | up    | 2018-05-22 | Established |
+    +--------------+-----------+-------+------------+-------------+
+    IPv6 BGP status
+    No IPv6 peers found.
+
+Verify that **STATE** is ``up`` and **INFO** is ``Established`` for every peer. If **STATE** is ``start`` and
+**INFO** is ``Connect``, peering has failed.
+
+For more information on Calico troubleshooting, see the latest version of:
+https://docs.projectcalico.org/v3.4/usage/troubleshooting/
+
+**Verify the Health of Kubernetes**
+
++----------------------------------------------------+------------------------------------------------------+
+| Command                                            | Description                                          |
++====================================================+======================================================+
+| ``% kubectl get nodes``                            | Verify that the **STATUS** of every node is          |
+|                                                    | ``Ready``.                                           |
+|                                                    |                                                      |
+|                                                    | **Note**: After a reboot, it may take as long as 30  |
+|                                                    | minutes for a node to stabilize and reach the        |
+|                                                    | ``Ready`` condition.                                 |
++----------------------------------------------------+------------------------------------------------------+
+| ``% kubectl get pods --all-namespaces |            | Verify that readiness probes are passing. This       |
+| grep Running | grep 0/``                           | command lists running pods that report zero ready    |
+|                                                    | containers.                                          |
++----------------------------------------------------+------------------------------------------------------+
+| ``% kubectl get pods --all-namespaces |            | Verify that all pods are in the ``Running`` or       |
+| grep -v Running | grep -v Completed``              | ``Completed`` state. This command lists pods that    |
+|                                                    | are neither running nor completed.                   |
++----------------------------------------------------+------------------------------------------------------+
+| ``% kubectl get pods --all-namespaces -o wide |    | Look for crashed pods, such as pods in the           |
+| grep Crash``                                       | ``CrashLoopBackOff`` state.                          |
++----------------------------------------------------+------------------------------------------------------+
+| ``% kubectl get pods --all-namespaces -o wide |    | Check the health of core services.                   |
+| grep core``                                        |                                                      |
+|                                                    |                                                      |
+| ``% kubectl get services --all-namespaces |        |                                                      |
+| grep core``                                        |                                                      |
++----------------------------------------------------+------------------------------------------------------+
+| ``% kubectl get pods --all-namespaces -o wide |    | Check the health of proxy services.                  |
+| grep proxy``                                       |                                                      |
++----------------------------------------------------+------------------------------------------------------+
+| ``% kubectl get pods --all-namespaces -o wide -w`` | Watch the status of all pods. ``-w`` keeps the       |
+|                                                    | command running and prints updates until you         |
+|                                                    | press Ctrl+C.                                        |
++----------------------------------------------------+------------------------------------------------------+
+| ``% kubectl get jobs --all-namespaces -o wide |    | Look for failed jobs.                                |
+| grep -v "1 1"``                                    |                                                      |
++----------------------------------------------------+------------------------------------------------------+
+
+**Verify the Health of OpenStack**
+
+Check OpenStack's health by issuing the following commands at the terminal:
+
++----------------------------------------------------+------------------------------------------------------+
+| Command                                            | Description                                          |
++====================================================+======================================================+
+| ``% openstack token issue``                        | Verify Keystone by requesting a token.               |
++----------------------------------------------------+------------------------------------------------------+
+| ``% openstack network list``                       | Verify networks.                                     |
++----------------------------------------------------+------------------------------------------------------+
+| ``% openstack subnet list``                        | Verify subnets.                                      |
++----------------------------------------------------+------------------------------------------------------+
+| ``% openstack server list``                        | Verify VMs.                                          |
++----------------------------------------------------+------------------------------------------------------+
+| ``% openstack hypervisor list``                    | Verify compute hypervisors.                          |
++----------------------------------------------------+------------------------------------------------------+
+| ``% openstack image list``                         | Verify images.                                       |
++----------------------------------------------------+------------------------------------------------------+
+
+**Check Ceph Status**
+
+::
+
+    % MON_POD=$(sudo kubectl get --no-headers pods -n ceph -l "application=ceph,component=mon" | awk '{ print $1; exit }')
+    % echo $MON_POD
+    % sudo kubectl exec -n ceph ${MON_POD} -- ceph -s
+
+For troubleshooting information, see the Ceph Troubleshooting section of this guide.
+
+**Check for kube-proxy iptables NAT Issues**
+
+::
+
+    # Verify that the pod IP addresses in the NAT rules match the IPs of the running CoreDNS pods:
+    % sudo iptables -n -t nat -L | grep coredns
+    % kubectl -n kube-system get -o wide pod | grep coredns
+
 Configuring Airship CLI
 -----------------------
 
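The Calico peering check in this patch can be scripted so that a failed peer is reported explicitly instead of relying on a visual scan of the table. The following is a minimal POSIX shell sketch, not part of Calico or Airship: it assumes the pipe-delimited table layout shown in the sample `calicoctl node status` output, and `check_bgp_peers` is a hypothetical helper name.

```shell
#!/bin/sh
# check_bgp_peers: hypothetical helper (not part of Calico or Airship).
# Reads `calicoctl node status` output on stdin and exits non-zero unless
# every peer row reports STATE "up" and INFO "Established".
# Assumes the pipe-delimited table layout shown in the sample output.
check_bgp_peers() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      addr = trim($2)
      # Peer rows start with an IP address; borders and the header do not.
      if (addr !~ /^[0-9a-fA-F:.]+$/) next
      peers++
      state = trim($4)
      info = trim($6)
      if (state != "up" || info != "Established") {
        printf "unhealthy peer %s: STATE=%s INFO=%s\n", addr, state, info
        bad++
      }
    }
    END {
      if (peers == 0) { print "no BGP peers found"; exit 1 }
      if (bad > 0) exit 1
      printf "%d BGP peer(s) up and Established\n", peers
    }'
}

# Usage, on a node where calicoctl is installed at the documented path:
#   sudo /opt/cni/bin/calicoctl node status | check_bgp_peers
```

Because the helper exits non-zero when any peer is not `up`/`Established` (or when no peers are listed at all), it can be chained with `&&` in a larger health-check script.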
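The pod checks in the Kubernetes table can be combined into a single filter. This sketch assumes the default column order of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, and so on); `list_unhealthy_pods` is a hypothetical helper, not part of kubectl or Airship. Unlike `grep 0/`, it compares the ready and total container counts numerically, so a partially ready pod such as `1/2` is reported and a healthy `10/10` is not.

```shell
#!/bin/sh
# list_unhealthy_pods: hypothetical helper (not part of kubectl or Airship).
# Reads `kubectl get pods --all-namespaces` output on stdin, assuming the
# default column order NAMESPACE NAME READY STATUS RESTARTS AGE.
# Prints namespace/pod for every pod that is neither Completed nor fully
# ready and Running; exits 1 if any such pod is found.
list_unhealthy_pods() {
  awk '
    $1 == "NAMESPACE" { next }   # skip the header row
    NF < 4 { next }              # skip blank or truncated lines
    {
      ready = $3
      status = $4
      # Pods of finished Jobs report 0/N ready by design.
      if (status == "Completed") next
      split(ready, r, "/")       # r[1] = ready containers, r[2] = total
      if (status != "Running" || r[1] + 0 < r[2] + 0) {
        printf "%s/%s READY=%s STATUS=%s\n", $1, $2, ready, status
        bad++
      }
    }
    END { exit (bad > 0) }'
}

# Usage:
#   kubectl get pods --all-namespaces | list_unhealthy_pods
```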