Merge "Add Flannel troubleshooting"

2016-03-06 02:50:19 +00:00
parent 76468f1eea 30a9d40999
commit 3b24dfd02e
1 changed files with 125 additions and 3 deletions
--- a/doc/source/troubleshooting-guide.rst
+++ b/doc/source/troubleshooting-guide.rst
@@ -265,7 +265,7 @@ If the ping is not successful, check the following:
  Flannel subnet.  If this is not correct, the docker daemon is not configured
  correctly with the parameter *--bip*.  Check the systemd service for docker.

- Is Flannel running properly?  check the `flannel service`_.
+- Is Flannel running properly?  Check `Running Flannel`_.

 - Ping and try `tcpdump
  <http://docs.openstack.org/openstack-ops/content/network_troubleshooting.html#tcpdump>`_
@@ -446,10 +446,132 @@ If etcd continues to fail, check the following:
  If the public discovery service is not reachable, check the
  `Cluster internet access`_.

-flannel service
+Running Flannel
 ---------------
-*To be filled in*

+When deploying a COE, Flannel is available as a network driver for
+certain COE types.  Magnum currently supports Flannel for a Kubernetes
+or Swarm bay.
+
+Flannel provides a flat network space for the containers in the bay:
+each container is allocated an IP address in this network space and can
+reach the others.  Therefore, if Flannel fails, some containers will not
+be able to access services from other containers in the bay.  This can be
+confirmed by running *ping* or *curl* from one container to another.
+
+The Flannel daemon is run as a systemd service on each node of the bay.
+To check Flannel, run on each node::
+
+    sudo service flanneld status
+
+If the daemon is running, the status should show the service as
+active::
+
+    Active: active (running) since ....
+
+If the daemon is not running, the status will show the service as failed
+or inactive, for example::
+
+    Active: failed (Result: timeout) ....
+
+or::
+
+    Active: inactive (dead) ....
+
+The Flannel daemon may also be running but not functioning correctly.
+Check the following:
+
+- Check the log for Flannel::
+
+    sudo journalctl -u flanneld
+
+- Since Flannel relies on etcd, a common cause for failure is that the
+  etcd service is not running on the master nodes.  Check the `etcd service`_.
+  If the etcd service failed, restart the Flannel service once etcd has
+  been restored::
+
+    sudo service flanneld restart
+
+- Magnum writes the configuration for Flannel in a local file on each master
+  node.  Check for this file on the master nodes by::
+
+    cat /etc/sysconfig/flannel-network.json
+
+  The content should be something like::
+
+    {
+      "Network": "10.100.0.0/16",
+      "Subnetlen": 24,
+      "Backend": {
+        "Type": "udp"
+      }
+    }
+
+  where the values for the parameters must match the corresponding
+  parameters from the bay model.
+
+  Magnum also loads this configuration into etcd, so verify the
+  configuration in etcd by running *etcdctl* on the master nodes::
+
+    etcdctl get /coreos.com/network/config
+
+- Each node is allocated a segment of the network space.  Check
+  for this segment on each node by::
+
+    grep FLANNEL_SUBNET /run/flannel/subnet.env
+
+  The containers on this node should be assigned an IP in this range.
+  The nodes negotiate for their segment through etcd, and you can use
+  *etcdctl* on the master node to query the network segment associated
+  with each node::
+
+    for s in $(etcdctl ls /coreos.com/network/subnets)
+    do
+        echo "$s"
+        etcdctl get "$s"
+    done
+
+    /coreos.com/network/subnets/10.100.14.0-24
+    {"PublicIP":"10.0.0.5"}
+    /coreos.com/network/subnets/10.100.61.0-24
+    {"PublicIP":"10.0.0.6"}
+    /coreos.com/network/subnets/10.100.92.0-24
+    {"PublicIP":"10.0.0.7"}
+
+  Alternatively, you can read the full record in etcd by::
+
+    curl http://<master_node_ip>:2379/v2/keys/coreos.com/network/subnets
+
+  You should receive a JSON snippet that describes all the allocated
+  segments.
+
+- This network segment is passed to Docker via the parameter *--bip*.
+  If this is not configured correctly, Docker will not assign the correct
+  IP in the Flannel network segment to the container.  Check by::
+
+    cat /run/flannel/docker
+    ps aux | grep docker
+
+- Check the interface for Flannel::
+
+    ifconfig flannel0
+
+  The IP should be the first address in the Flannel subnet for this node.
+
+- Flannel has several different backend implementations and they have
+  specific requirements.  The *udp* backend is the most general and has
+  no requirements on the network.  The *vxlan* backend requires vxlan
+  support in the kernel, so ensure that the image used provides
+  vxlan support.  The *host-gw* backend requires that all the hosts are
+  on the same L2 network.  This is currently met by the private Neutron
+  subnet created by Magnum;  however, if a different network topology is
+  used, ensure that this requirement is met if *host-gw* is used.
+
+Current known limitation:  the image fedora-21-atomic-5.qcow2 has
+Flannel version 0.5.0.  This version has known bugs that prevent the
+*vxlan* and *host-gw* backends from working correctly.  Only the *udp*
+backend works for this image.  Version 0.5.3 and later should work correctly.
+The image fedora-21-atomic-7.qcow2 has Flannel version 0.5.5.

 Kubernetes services
 -------------------