kuryr-kubernetes


Michał Dulko 8d8b84ca13 CNI: Confirm pods in cache before connecting

In a highly distributed environment like a Kubernetes installation with
Kuryr, we need to plan for network outages in any case. If we don't, we
end up with bugs like the one this patch tries to fix.

If we lose a Pod delete event on kuryr-daemon, the following can happen:

    1. Pod A of name "foo" gets created.
    2. It gets annotated normally and CNI ADD request gives it an IP X.
    3. Pod A gets deleted.
    4. Somehow the delete event gets lost on kuryr-daemon's watcher.
    5. CRI sends CNI DEL request and pod gets unplugged successfully. It
       never gets deleted from the daemon's registry, because we never
       got the Pod delete event from K8s API.
    6. Pod B of the same name "foo" gets created.
    7. CNI looks up the registry by <namespace>/<pod>, finds the old VIF
       there and plugs pod B with pod A's VIF X.
    8. kuryr-controller never notices that and assigns IP X to another
       pod.
    9. We get an IP conflict.
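
The steps above can be reproduced with a small model. This is only an
illustrative sketch: the `registry` dict, the callback names and the IP
address are hypothetical and do not mirror Kuryr's actual code.

```python
# Hypothetical, simplified model of the daemon's registry (not Kuryr's code).
registry = {}  # "<namespace>/<pod>" -> {"uid": ..., "vif_ip": ...}


def on_pod_annotated(namespace, name, uid, vif_ip):
    """Watcher callback: cache the VIF once the pod is annotated."""
    registry[f"{namespace}/{name}"] = {"uid": uid, "vif_ip": vif_ip}


def on_pod_deleted(namespace, name):
    """Watcher callback: drop the entry. Never runs if the event is lost."""
    registry.pop(f"{namespace}/{name}", None)


def cni_add(namespace, name):
    """Name-only lookup, as in step 7: returns whatever VIF is cached."""
    return registry[f"{namespace}/{name}"]["vif_ip"]


# Steps 1-2: pod A "foo" is created and gets IP X.
on_pod_annotated("default", "foo", uid="uid-a", vif_ip="10.0.0.5")
ip_a = cni_add("default", "foo")

# Steps 3-5: pod A is deleted, but on_pod_deleted() is never called
# because the delete event was lost.

# Steps 6-7: pod B "foo" (a new uid) is created; CNI ADD finds the stale
# entry and hands pod A's IP to pod B.
ip_b = cni_add("default", "foo")
```

Because the lookup key contains only the namespace and name, nothing
distinguishes pod B from pod A, and `ip_b` ends up equal to `ip_a`.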

To solve the issue, this patch makes sure that when handling CNI ADD calls we
always get the pod from the K8s API first, and if the uid of the pod returned
by the API doesn't match the one in the registry, we remove the registry entry.
That way we can make sure the pod we've cached isn't stale. This adds one K8s
API call per CNI ADD
request, which is a significant load increase, but hopefully the K8s API can
handle it.
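
A minimal sketch of the uid check described above. The function name
`confirm_cached_pod`, the registry layout and the `get_pod` callable are
assumptions made for illustration; the real implementation lives in the
patch itself and may differ.

```python
def confirm_cached_pod(registry, get_pod, namespace, name):
    """Return the cached entry only if it belongs to the live pod.

    `registry` maps "<namespace>/<pod>" to a dict holding the cached pod
    (hypothetical layout). `get_pod` stands in for a K8s API lookup and is
    called once per CNI ADD request.
    """
    key = f"{namespace}/{name}"
    live_uid = get_pod(namespace, name)["metadata"]["uid"]
    cached = registry.get(key)
    if cached is not None and cached["pod"]["metadata"]["uid"] != live_uid:
        # Same name, different pod: the cached entry is stale, drop it.
        del registry[key]
        return None
    return cached


# Usage with a fake API that reports a new pod "foo" with uid "uid-b".
def fake_get_pod(namespace, name):
    return {"metadata": {"namespace": namespace, "name": name, "uid": "uid-b"}}


registry = {
    "default/foo": {"pod": {"metadata": {"uid": "uid-a"}}, "vif_ip": "10.0.0.5"},
}
stale = confirm_cached_pod(registry, fake_get_pod, "default", "foo")

# Once the registry holds an entry for the live pod, it is returned as-is.
registry["default/foo"] = {"pod": {"metadata": {"uid": "uid-b"}}, "vif_ip": "10.0.0.9"}
fresh = confirm_cached_pod(registry, fake_get_pod, "default", "foo")
```

Comparing uids rather than names is what makes the check reliable: the
name "foo" can be reused, while K8s assigns each pod object its own uid.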

Closes-Bug: 1854928

Change-Id: I9916fca41bd917d85be973b8625b65a61139c3b3

2020-05-20 17:58:08 +02:00

cmd

Basic Python 3 compatibility fixes

2019-11-22 09:19:14 +01:00

cni

CNI: Confirm pods in cache before connecting

2020-05-20 17:58:08 +02:00

controller

Fix pep8 job after flake8 upgrade

2020-05-12 11:55:06 +02:00

handlers

Run on_finalize() for ADDED events