Add Flannel troubleshooting
Add section on Flannel to troubleshooting guide. Flannel provides the overlay network and is the default network driver for Kubernetes COE. It is also an optional network driver for Swarm COE. Failure in Flannel would mean that the containers cannot communicate with each other. This section covers common failures, how to verify configuration, how to check for correct operation, logs, known limitations. If you have encountered other scenarios or have additional tips, please comment or add your own patch. Partially implements: blueprint magnum-troubleshooting-guide Change-Id: Icfc61d532a3381f3b70d36d8e765f0b6548e0c3f
This commit is contained in:
@@ -265,7 +265,7 @@ If the ping is not successful, check the following:
|
||||
Flannel subnet. If this is not correct, the docker daemon is not configured
|
||||
correctly with the parameter *--bip*. Check the systemd service for docker.
|
||||
|
||||
- Is Flannel running properly? check the `flannel service`_.
|
||||
- Is Flannel running properly? check the `Running Flannel`_.
|
||||
|
||||
- Ping and try `tcpdump
|
||||
<http://docs.openstack.org/openstack-ops/content/network_troubleshooting.html#tcpdump>`_
|
||||
@@ -382,10 +382,132 @@ etcd service
|
||||
*To be filled in*
|
||||
|
||||
|
||||
flannel service
|
||||
Running Flannel
|
||||
---------------
|
||||
*To be filled in*
|
||||
|
||||
When deploying a COE, Flannel is available as a network driver for
|
||||
certain COE type. Magnum currently supports Flannel for a Kubernetes
|
||||
or Swarm bay.
|
||||
|
||||
Flannel provides a flat network space for the containers in the bay:
|
||||
they are allocated IP in this network space and they will have connectivity
|
||||
to each other. Therefore, if Flannel fails, some containers will not
|
||||
be able to access services from other containers in the bay. This can be
|
||||
confirmed by running *ping* or *curl* from one container to another.
|
||||
|
||||
The Flannel daemon is run as a systemd service on each node of the bay.
|
||||
To check Flannel, run on each node::
|
||||
|
||||
sudo service flanneld status
|
||||
|
||||
If the daemon is running, you should see that the service is successfully
|
||||
deployed::
|
||||
|
||||
Active: active (running) since ....
|
||||
|
||||
If the daemon is not running, the status will show the service as failed,
|
||||
something like::
|
||||
|
||||
Active: failed (Result: timeout) ....
|
||||
|
||||
or::
|
||||
|
||||
Active: inactive (dead) ....
|
||||
|
||||
Flannel daemon may also be running but not functioning correctly.
|
||||
Check the following:
|
||||
|
||||
- Check the log for Flannel::
|
||||
|
||||
sudo journalctl -u flanneld
|
||||
|
||||
- Since Flannel relies on etcd, a common cause for failure is that the
|
||||
etcd service is not running on the master nodes. Check the `etcd service`_.
|
||||
If the etcd service failed, once it has been restored successfully, the
|
||||
Flannel service can be restarted by::
|
||||
|
||||
sudo service flanneld restart
|
||||
|
||||
- Magnum writes the configuration for Flannel in a local file on each master
|
||||
node. Check for this file on the master nodes by::
|
||||
|
||||
cat /etc/sysconfig/flannel-network.json
|
||||
|
||||
The content should be something like::
|
||||
|
||||
{
|
||||
"Network": "10.100.0.0/16",
|
||||
"Subnetlen": 24,
|
||||
"Backend": {
|
||||
"Type": "udp"
|
||||
}
|
||||
}
|
||||
|
||||
where the values for the parameters must match the corresponding
|
||||
parameters from the bay model.
|
||||
|
||||
Magnum also loads this configuration into etcd, therefore, verify
|
||||
the configuration in etcd by running *etcdctl* on the master nodes::
|
||||
|
||||
etcdctl get /coreos.com/network/config
|
||||
|
||||
- Each node is allocated a segment of the network space. Check
|
||||
for this segment on each node by::
|
||||
|
||||
grep FLANNEL_SUBNET /run/flannel/subnet.env
|
||||
|
||||
The containers on this node should be assigned an IP in this range.
|
||||
The nodes negotiate for their segment through etcd, and you can use
|
||||
*etcdctl* on the master node to query the network segment associated
|
||||
with each node::
|
||||
|
||||
for s in `etcdctl ls /coreos.com/network/subnets`
|
||||
do
|
||||
echo $s
|
||||
etcdctl get $s
|
||||
done
|
||||
|
||||
/coreos.com/network/subnets/10.100.14.0-24
|
||||
{"PublicIP":"10.0.0.5"}
|
||||
/coreos.com/network/subnets/10.100.61.0-24
|
||||
{"PublicIP":"10.0.0.6"}
|
||||
/coreos.com/network/subnets/10.100.92.0-24
|
||||
{"PublicIP":"10.0.0.7"}
|
||||
|
||||
Alternatively, you can read the full record in ectd by::
|
||||
|
||||
curl http://<master_node_ip>:2379/v2/keys/coreos.com/network/subnets
|
||||
|
||||
You should receive a json snippet that describes all the segments
|
||||
allocated.
|
||||
|
||||
- This network segment is passed to Docker via the parameter *--bip*.
|
||||
If this is not configured correctly, Docker would not assign the correct
|
||||
IP in the Flannel network segment to the container. Check by::
|
||||
|
||||
cat /run/flannel/docker
|
||||
ps -aux | grep docker
|
||||
|
||||
- Check the interface for Flannel::
|
||||
|
||||
ifconfig flannel0
|
||||
|
||||
The IP should be the first address in the Flannel subnet for this node.
|
||||
|
||||
- Flannel has several different backend implementations and they have
|
||||
specific requirements. The *udp* backend is the most general and have
|
||||
no requirement on the network. The *vxlan* backend requires vxlan
|
||||
support in the kernel, so ensure that the image used does provide
|
||||
vxlan support. The *host-gw* backend requires that all the hosts are
|
||||
on the same L2 network. This is currently met by the private Neutron
|
||||
subnet created by Magnum; however, if other network topology is used
|
||||
instead, ensure that this requirement is met if *host-gw* is used.
|
||||
|
||||
Current known limitation: the image fedora-21-atomic-5.qcow2 has
|
||||
Flannel version 0.5.0. This version has known bugs that prevent the
|
||||
backend vxland and host-gw to work correctly. Only the backend udp
|
||||
works for this image. Version 0.5.3 and later should work correctly.
|
||||
The image fedora-21-atomic-7.qcow2 has Flannel version 0.5.5.
|
||||
|
||||
Kubernetes services
|
||||
-------------------
|
||||
|
Reference in New Issue
Block a user