Merge "Add Flannel troubleshooting"

This commit is contained in:
Jenkins 2016-03-06 02:50:19 +00:00 committed by Gerrit Code Review
commit 3b24dfd02e
1 changed files with 125 additions and 3 deletions

View File

@ -265,7 +265,7 @@ If the ping is not successful, check the following:
Flannel subnet. If this is not correct, the docker daemon is not configured
correctly with the parameter *--bip*. Check the systemd service for docker.
- Is Flannel running properly? check the `flannel service`_.
- Is Flannel running properly? check the `Running Flannel`_.
- Ping and try `tcpdump
<http://docs.openstack.org/openstack-ops/content/network_troubleshooting.html#tcpdump>`_
@ -446,10 +446,132 @@ If etcd continues to fail, check the following:
If the public discovery service is not reachable, check the
`Cluster internet access`_.
flannel service
Running Flannel
---------------
*To be filled in*
When deploying a COE, Flannel is available as a network driver for
certain COE type. Magnum currently supports Flannel for a Kubernetes
or Swarm bay.
Flannel provides a flat network space for the containers in the bay:
they are allocated IP in this network space and they will have connectivity
to each other. Therefore, if Flannel fails, some containers will not
be able to access services from other containers in the bay. This can be
confirmed by running *ping* or *curl* from one container to another.
The Flannel daemon is run as a systemd service on each node of the bay.
To check Flannel, run on each node::
sudo service flanneld status
If the daemon is running, you should see that the service is successfully
deployed::
Active: active (running) since ....
If the daemon is not running, the status will show the service as failed,
something like::
Active: failed (Result: timeout) ....
or::
Active: inactive (dead) ....
Flannel daemon may also be running but not functioning correctly.
Check the following:
- Check the log for Flannel::
sudo journalctl -u flanneld
- Since Flannel relies on etcd, a common cause for failure is that the
etcd service is not running on the master nodes. Check the `etcd service`_.
If the etcd service failed, once it has been restored successfully, the
Flannel service can be restarted by::
sudo service flanneld restart
- Magnum writes the configuration for Flannel in a local file on each master
node. Check for this file on the master nodes by::
cat /etc/sysconfig/flannel-network.json
The content should be something like::
{
"Network": "10.100.0.0/16",
"Subnetlen": 24,
"Backend": {
"Type": "udp"
}
}
where the values for the parameters must match the corresponding
parameters from the bay model.
Magnum also loads this configuration into etcd, therefore, verify
the configuration in etcd by running *etcdctl* on the master nodes::
etcdctl get /coreos.com/network/config
- Each node is allocated a segment of the network space. Check
for this segment on each node by::
grep FLANNEL_SUBNET /run/flannel/subnet.env
The containers on this node should be assigned an IP in this range.
The nodes negotiate for their segment through etcd, and you can use
*etcdctl* on the master node to query the network segment associated
with each node::
for s in `etcdctl ls /coreos.com/network/subnets`
do
echo $s
etcdctl get $s
done
/coreos.com/network/subnets/10.100.14.0-24
{"PublicIP":"10.0.0.5"}
/coreos.com/network/subnets/10.100.61.0-24
{"PublicIP":"10.0.0.6"}
/coreos.com/network/subnets/10.100.92.0-24
{"PublicIP":"10.0.0.7"}
Alternatively, you can read the full record in ectd by::
curl http://<master_node_ip>:2379/v2/keys/coreos.com/network/subnets
You should receive a json snippet that describes all the segments
allocated.
- This network segment is passed to Docker via the parameter *--bip*.
If this is not configured correctly, Docker would not assign the correct
IP in the Flannel network segment to the container. Check by::
cat /run/flannel/docker
ps -aux | grep docker
- Check the interface for Flannel::
ifconfig flannel0
The IP should be the first address in the Flannel subnet for this node.
- Flannel has several different backend implementations and they have
specific requirements. The *udp* backend is the most general and have
no requirement on the network. The *vxlan* backend requires vxlan
support in the kernel, so ensure that the image used does provide
vxlan support. The *host-gw* backend requires that all the hosts are
on the same L2 network. This is currently met by the private Neutron
subnet created by Magnum; however, if other network topology is used
instead, ensure that this requirement is met if *host-gw* is used.
Current known limitation: the image fedora-21-atomic-5.qcow2 has
Flannel version 0.5.0. This version has known bugs that prevent the
backend vxland and host-gw to work correctly. Only the backend udp
works for this image. Version 0.5.3 and later should work correctly.
The image fedora-21-atomic-7.qcow2 has Flannel version 0.5.5.
Kubernetes services
-------------------