Nested CNI: Remove interfaces on DEL requests

Historically we haven't been doing any actions on CNI DEL request when
running in nested mode because all the interfaces we created will be
gone when netns will get deleted. There is an issue with that approach:

1. We get ADD request for pod A in netns X and correctly bind it using
   VLAN ID V.
2. We get DEL request for same pod A and netns X. We don't do anything.
3. Netns is not yet deleted…
4. Another ADD request comes for pod A but now with netns Y (normal K8s
   behavior).
   a) We try binding it, but there is a veth pair with VLAN ID V already
   on the host namespace, so we end up with VLAN ID conflict.
   b) Binding fails with EEXIST error.

The problem with this hypothesis is that the EEXIST error seems to be
persistent and in theory it should disappear after the netns gets
deleted but maybe there's a longer delay in the netns deletion.

This commit solves this problem by making sure we always clean up all
interfaces on DEL requests.

Change-Id: I01b66dcd5c7c0e8193150a0a1b20878aaa8ec9e3
Closes-Bug: 1854928
This commit is contained in:
Michał Dulko 2020-03-04 14:26:22 +01:00
parent ded6b6debc
commit a0a6e65bb0
2 changed files with 13 additions and 4 deletions

View File

@ -110,9 +110,13 @@ class NestedDriver(health.HealthHandler, b_base.BaseBindingDriver,
iface.up() iface.up()
def disconnect(self, vif, ifname, netns, container_id): def disconnect(self, vif, ifname, netns, container_id):
# NOTE(vikasc): device will get deleted with container namespace, so # NOTE(dulek): Interfaces should get deleted with the netns, but it may
# nothing to be done here. # happen that kubelet or crio will call new CNI ADD before
pass # the old netns is deleted. This might result in VLAN ID
# conflict. In oder to protect from that let's remove the
# netns ifaces here anyway.
with b_base.get_ipdb(netns) as c_ipdb:
self._remove_ifaces(c_ipdb, (vif.vif_name, ifname), netns)
class VlanDriver(NestedDriver): class VlanDriver(NestedDriver):

View File

@ -97,8 +97,13 @@ class TestDriverMixin(test_base.TestCase):
if report: if report:
report.assert_called_once() report.assert_called_once()
@mock.patch('kuryr_kubernetes.cni.binding.base.get_ipdb')
@mock.patch('os_vif.unplug') @mock.patch('os_vif.unplug')
def _test_disconnect(self, m_vif_unplug, report=None): def _test_disconnect(self, m_vif_unplug, m_get_ipdb, report=None):
def get_ipdb(netns=None):
return self.ipdbs[netns]
m_get_ipdb.side_effect = get_ipdb
base.disconnect(self.vif, self.instance_info, self.ifname, self.netns, base.disconnect(self.vif, self.instance_info, self.ifname, self.netns,
report) report)
m_vif_unplug.assert_called_once_with(self.vif, self.instance_info) m_vif_unplug.assert_called_once_with(self.vif, self.instance_info)