It can happen that we get the CNI request, but the pod gets deleted
before kuryr-controller is able to create a KuryrPort for it. If
kuryr-daemon only watches for KuryrPort events, it will not notice the
deletion and will wait until the timeout, which doesn't play well with
some K8s tests.
This commit adds a separate Service that watches Pod events. If a Pod
gets deleted, we put a sentinel value (None) into the registry so that
the thread waiting for the KuryrPort to appear there knows that it has
to stop and raise an error.
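A minimal sketch of the sentinel idea follows. It assumes a plain dict
guarded by a lock as the registry; the names registry, PodDeletedError,
on_pod_deleted and wait_for_kuryrport are hypothetical and chosen for
illustration only, they are not the actual kuryr-daemon code.

```python
import threading
import time

# Hypothetical stand-in for the shared registry (assumption).
registry = {}
registry_lock = threading.Lock()


class PodDeletedError(Exception):
    """Hypothetical error raised when the pod is gone before wiring."""


def on_pod_deleted(pod_key):
    # Pod watcher side: mark the pod as deleted with a None sentinel.
    with registry_lock:
        registry[pod_key] = None


def wait_for_kuryrport(pod_key, timeout=5.0, interval=0.05):
    # CNI side: poll until an entry shows up, the sentinel appears,
    # or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with registry_lock:
            if pod_key in registry:
                entry = registry[pod_key]
                if entry is None:
                    # Sentinel found: stop waiting right away.
                    raise PodDeletedError(pod_key)
                return entry
        time.sleep(interval)
    raise TimeoutError(pod_key)
```

With the sentinel in place the waiting thread fails fast instead of
sleeping until the full timeout has elapsed.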
Closes-Bug: 1963678
Change-Id: I52fc1805ec47f24c1da88fd13e79e928a3693419
Recent versions of cri-o and containerd pass K8S_POD_UID as a CNI
argument, alongside K8S_POD_NAMESPACE and K8S_POD_NAME. As the latter
two variables cannot be used to safely identify a pod in the API
(StatefulSet recreates pods with the same name), we were prone to race
conditions in the CNI code that we could only work around. The end
effect was mostly IP conflicts.
Now that the UID argument is passed, we're able to compare the UID from
the request with the one in the API to make sure we're wiring the
correct pod. This commit implements that by moving the check to the
code that actually waits for the pod to appear in the registry. If
K8S_POD_UID is missing from the CNI request, an API call to retrieve
the Pod is used as a fallback.
We also know that this check doesn't work for static pods, so the CRD
and controller needed to be updated to record on the KuryrPort spec
whether the pod is static, so that we can skip the check for static
pods without fetching the Pod from the API.
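The check described above could be sketched roughly as below. This is
an illustration under assumptions, not the real CNI code: the function
name, the dict keys pod_uid and pod_static, and the get_pod_uid
callable are all hypothetical.

```python
def is_correct_pod(cni_args, kuryrport, get_pod_uid):
    """Return True if the registry entry belongs to the requesting pod.

    All names are hypothetical: cni_args is a dict of parsed CNI
    arguments, kuryrport a dict carrying the pod UID and a static-pod
    flag, get_pod_uid a callable fetching the UID from the K8s API.
    """
    if kuryrport.get("pod_static"):
        # The UID check does not work for static pods, so skip it.
        return True
    request_uid = cni_args.get("K8S_POD_UID")
    if request_uid is None:
        # Runtime did not pass the UID: fall back to an API call.
        request_uid = get_pod_uid(cni_args["K8S_POD_NAMESPACE"],
                                  cni_args["K8S_POD_NAME"])
    return request_uid == kuryrport.get("pod_uid")
```

Note that the API is only queried when the runtime did not provide
K8S_POD_UID, and never for static pods.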
Closes-Bug: 1963677
Change-Id: I5ef6a8212c535e90dee049a579c1483644d56db8
Besides that, added ownerReferences to the KuryrNetworkPolicy, so that
we don't need to query the Kubernetes API for the uid of the
NetworkPolicy object. A side effect of that field is that the
KuryrNetworkPolicy will be removed alongside the NetworkPolicy, but
that's acceptable, since we remove it anyway after NetworkPolicy
removal.
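For illustration, an ownerReferences entry can be built from the owning
object and later read back without another API call. The helper names
below are hypothetical; only the ownerReferences field layout
(apiVersion, kind, name, uid) comes from the Kubernetes API.

```python
def owner_reference(owner):
    """Build an ownerReferences entry pointing at a K8s object."""
    return {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": owner["metadata"]["name"],
        "uid": owner["metadata"]["uid"],
    }


def build_knp_metadata(policy):
    # Hypothetical helper: metadata for the dependent CRD object.
    return {
        "name": policy["metadata"]["name"],
        "namespace": policy["metadata"]["namespace"],
        "ownerReferences": [owner_reference(policy)],
    }


def owner_uid(metadata, kind):
    """Return the uid of the first owner of the given kind, or None."""
    for ref in metadata.get("ownerReferences", []):
        if ref["kind"] == kind:
            return ref["uid"]
    return None
```

Kubernetes garbage collection removes dependents once their owner is
gone, which is the side effect mentioned above.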
Change-Id: Ia9fb5cac516bc042c20897d8527afdfb8661b42b
This commit implements adding Events for various things that may happen
when Kuryr handles a Service - either incidents or just informative
messages helpful to the user.
As adding an Event requires the uid of the object, and in most of
KuryrLoadBalancerHandler we don't have the Service representation,
this commit populates ownerReferences of the KuryrLoadBalancer object
with a reference to the Service. This has a major side effect: the KLB
will be garbage collected if the corresponding Service doesn't exist,
which is probably a good thing (and we manage that ourselves using
finalizers anyway).
Another set of refactorings is to remove KLB creation from
EndpointsHandler in order to stop it fighting with ServiceHandler over
creation - EndpointsHandler cannot add ownerReference as it has no uid
of the Service.
Other refactorings related to the error messages are also included.
Change-Id: I36b4d62e6fc7ace00909e9b1aa7681f6d59ca455
There are 2 places we can time out in CNI:
1. KuryrPort hasn't been created for a pod we got an ADD request for.
2. Ports we plugged for a pod aren't moving to ACTIVE state.
This commit improves logging in these two cases by making sure an
exception with a meaningful message is raised when such a timeout
occurs. The messages include clues on where to look for the root cause
next - kuryr-controller in case of 1 and Neutron in case of 2.
I'm also disabling putting kuryr-cni into unhealthy state in these
cases. Those errors cannot be cleared by a restart of the pod.
Change-Id: I5b881b58fe7d6dfed66a7bb6e3473b5b7939854d
We've discovered that running kuryr-daemon with [cni_daemon]worker_num=1
breaks pyroute2.IPDB's ability to correctly close threads, leading to a
process leak. This commit makes sure kuryr-daemon will fail to start
when worker_num <= 1.
This required a few more changes in order to make sure that when any
kuryr-daemon subservice dies, kuryr-daemon will shut down too.
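The startup check might look roughly like the sketch below. The
exception and function names are hypothetical; only the option name
[cni_daemon]worker_num and the "<= 1" condition come from this commit.

```python
class InvalidWorkerNumError(Exception):
    """Hypothetical error type used only in this sketch."""


def validate_worker_num(worker_num):
    # Refuse to start when worker_num <= 1, as a single worker breaks
    # thread cleanup in pyroute2.IPDB and leaks processes.
    if worker_num <= 1:
        raise InvalidWorkerNumError(
            "[cni_daemon]worker_num must be greater than 1, got %d"
            % worker_num)
    return worker_num
```

Failing at startup makes the misconfiguration visible immediately
instead of surfacing later as leaked processes.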
Change-Id: I41afc6fa67abfff62d2f0017db508051a1e7edf4
If an Octavia load balancer is stuck in PENDING_UPDATE state or a
Neutron port is DOWN despite being plugged, there's not much Kuryr can
do. In such cases we need to clearly tell the user that the error
they're seeing is caused by an OpenStack service misbehaving and not by
Kuryr.
This commit does so by making sure in such cases we raise a distinct
version of ResourceNotReady exception.
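As a sketch of the idea, a subclass lets callers tell the two cases
apart while existing "except ResourceNotReady" code keeps working. The
classes below are stand-ins written for this example, and the subclass
name OpenStackResourceNotReady is an assumption, not necessarily the
name used in the code.

```python
class ResourceNotReady(Exception):
    """Stand-in for the existing exception (simplified)."""

    def __init__(self, resource):
        super().__init__("Resource not ready: %r" % (resource,))


class OpenStackResourceNotReady(ResourceNotReady):
    """Hypothetical distinct version blaming an OpenStack service."""

    def __init__(self, resource, service):
        super().__init__(resource)
        self.service = service


def describe(exc):
    # Pick a user-facing message depending on the exception type.
    if isinstance(exc, OpenStackResourceNotReady):
        return "%s; check the %s service, not Kuryr" % (exc, exc.service)
    return str(exc)
```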
Change-Id: I2dd1e8989caf004b3dee0cb51780a45ce8d9353c
Closes-Bug: 1918711
This is not the proper way of informing the user that Octavia returns
503. We should have a nice message or we'll start getting bug reports
filed against us.
Closes-bug: 1918708
Change-Id: I871c3998edb5b1d594067b60e908c453ad122dde
During tests it turned out that we didn't catch Forbidden exceptions,
since no Forbidden HTTP code was sent by the Kubernetes API. This patch
introduces a new exception, K8sFieldValueForbidden, which is raised
when the K8s API returns 422 Unprocessable Entity.
Also, the remove_finalizer method now takes care of objects in the
Terminating state.
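A simplified sketch of the status code mapping is shown below. The
class hierarchy and the raise_for_status helper are assumptions made
for this example; only the exception names and the 422 code are taken
from this commit.

```python
class K8sClientError(Exception):
    """Stand-in base class for K8s client errors (assumption)."""


class K8sFieldValueForbidden(K8sClientError):
    """Raised when the API rejects a field value with HTTP 422."""


def raise_for_status(status_code, text):
    # Hypothetical helper translating HTTP status codes to exceptions.
    if status_code == 422:
        raise K8sFieldValueForbidden(text)
    if status_code >= 400:
        raise K8sClientError(text)
```

Callers that need to react specifically to a rejected field value can
then catch K8sFieldValueForbidden before the generic K8sClientError.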
Closes-Bug: 1895124
Change-Id: If4ac93190db3a56ee6b94ca122bfd2e95c29ffb9
When a lb transitions to ERROR or the IP on the Service
spec differs from the lb VIP, the lb is released but
the CRD doesn't get updated, causing Not Found exceptions
when handling the creation of other load balancer
resources. This commit fixes the issue by ensuring the
status field is cleaned up upon lb release.
It also adds protection in case we still get a
nonexistent lb on the CRD.
Closes-Bug: 1894758
Change-Id: I484ece6a7b52b51d878f724bd4fad0494eb759d6
This is another attempt at getting the useless tracebacks out of logs
for any level higher than debug. In particular:
* All pool logs regarding "Kuryr-controller not yet ready to *" are now
  on debug level.
* The ugly ResourceNotReady raised from _populate_pool is now suppressed
  if the method is run from eventlet.spawn(), which prevents that
  exception from being logged by eventlet.
* ResourceNotReady will only print namespace/name or name, not the full
  resource representation.
* ConnectionResetError is suppressed on the Retry handler level just as
  any other K8sClientError.
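To illustrate the third point, a short identifier can be derived from
the resource instead of dumping the whole object. The helper name
short_resource_name is hypothetical and this is only a sketch of the
idea, not the actual exception code.

```python
def short_resource_name(resource):
    """Return namespace/name or name for a K8s object dict.

    Anything that is not a dict is returned unchanged.
    """
    if not isinstance(resource, dict):
        return resource
    metadata = resource.get("metadata", {})
    name = metadata.get("name")
    namespace = metadata.get("namespace")
    if namespace:
        return "%s/%s" % (namespace, name)
    return name
```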
Change-Id: Ic6e6ee556f36ef4fe3429e8e1e4a2ddc7e8251dc
We observed a situation where a pod was created, triggered KuryrPort
CRD creation, and was removed just before the KuryrPort on_present
event was handled, resulting in errors on the KuryrPort side.
The idea is to set the finalizer on the pod as quickly as possible, so
that it won't disappear before we correctly set up the CRD.
Change-Id: Iac82bb05a465e94e47356c3c873e11f00e5d0cd9
This commit adds helper client methods that will aid in working with
finalizers of the CRDs and other resources. Also, the one place where
we already remove a finalizer is updated to use these methods.
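The core of such helpers is idempotent editing of metadata.finalizers.
The sketch below works on plain dicts and leaves out the API calls; the
function names and the finalizer string are hypothetical.

```python
# Hypothetical finalizer name used only in this example.
EXAMPLE_FINALIZER = "example.org/hypothetical-finalizer"


def add_finalizer(obj, finalizer):
    """Add the finalizer to obj; return True if obj was changed."""
    finalizers = obj.setdefault("metadata", {}).setdefault(
        "finalizers", [])
    if finalizer in finalizers:
        return False
    finalizers.append(finalizer)
    return True


def remove_finalizer(obj, finalizer):
    """Remove the finalizer from obj; return True if obj was changed."""
    finalizers = obj.get("metadata", {}).get("finalizers", [])
    if finalizer not in finalizers:
        return False
    finalizers.remove(finalizer)
    return True
```

Returning whether anything changed lets the caller skip the API update
when the object is already in the desired state.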
Change-Id: I665e03f80102a08b2c3ec412a4417c3a32f9384b
This patch moves the namespace handling to be more aligned
with the k8s style.
Depends-on: If0aaf748d13027b3d660aa0f74c4f6653e911250
Change-Id: Ia2811d743f6c4791321b05977118d0b4276787b5
Apparently it is possible to override Neutron's MTU setting through the
DHCP agent. This may lead to a situation where the node (VM) network
has a different MTU than the pod network. In such a case, setting the
pod network's MTU on a Pod's veth pair will fail due to the MTU
mismatch.
This commit makes sure we detect such a situation early and produce a
log message with a hint about the root cause.
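A rough sketch of such an early check is given below. It assumes the
problematic case is a pod network MTU larger than the MTU of the node
interface; that condition, the names and the message wording are
assumptions of this example and not taken from the code.

```python
class MTUMismatchError(Exception):
    """Hypothetical error type used only in this sketch."""


def check_pod_mtu(node_mtu, pod_mtu):
    # Assumption: the failure happens when the pod network MTU exceeds
    # the MTU of the node (VM) interface.
    if pod_mtu > node_mtu:
        raise MTUMismatchError(
            "Pod network MTU %d is larger than node interface MTU %d. "
            "Check whether the MTU is overridden through the DHCP agent."
            % (pod_mtu, node_mtu))
```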
Change-Id: Ib694950c77ac7c3fd480f579b627dc79bfceac85
Closes-Bug: 1863212
This patch creates an npwg multi-vif driver which can parse the
Pod annotations and CRDs defined in the Network Plumbing Working
Group CRD spec.
Implements: blueprint kuryr-npwg-spec-support
Change-Id: I9ee9643b468a5fe453541b9cf1acf31ca872a313
This is the second patch of the Ingress Controller capability.
In order for the K8S Ingress and OpenShift Route resources to work,
the cluster must have an Ingress Controller running.
This patch extends the LBaaS driver to support L7 load balancing and
verifies, retrieves and stores the details of the L7 router LB
(pre-created by the admin or Devstack).
The OCP-route and K8S-endpoint handlers (implemented in the next patch)
will query the ingress controller for the L7 router details.
Partially Implements: blueprint openshift-router-support
Change-Id: Id55169f6c9c1c607b2aa54c92711dfbd04a9e39d
This patch adds a new subnet driver that creates a new network
for each created k8s namespace. It makes use of K8s CRDs to store
the information about the network resources created for each
namespace.
Partially Implements: blueprint network-namespace
Change-Id: I7988e1da7a9ed57f29c85ddcd99bb2c87808010e
This patch adds support for nodes with different vif drivers as
well as different pool drivers for each vif driver type.
Closes-Bug: 1747406
Change-Id: I842fd4b513a5f325d598d677e5008f9ea51adab9
Upon K8S service creation, the LBaaS handler creates all LB resources
in Neutron (LB, Listener, Pool, etc.) and stores them on the K8S
resource using an annotation.
When a K8S service is deleted, the LBaaS handler retrieves the LB
resource details from the annotation and releases them in Neutron.
This patch handles the case in which the K8S service resource was
deleted before the LBaaS handler stored the OpenStack resource details.
Closes-Bug: 1748890
Change-Id: Iea806d32c99cd3cf51a832b576ff4054fc522bd3
Currently nested containers can only be run by using trunk support and
vlan based interfaces. This patch introduces the additional option of
MACVLAN slave interfaces for pods running in VMs.
This patch includes both a new VIF driver on the controller side and the
binding driver for the CNI plugin.
Implements: blueprint macvlan-pod-in-vm
Depends-On: Ib71204d2d14d3d4f15beada701094e37d89d7801
Co-Authored-By: Marco Chiappero <marco.chiappero@intel.com>
Change-Id: I03c536bb0057bba0a5eb4d1c135baa8ab625e400
In order to better organize nested drivers (VLAN and MACVLAN),
refactor the class hierarchy of VIF drivers, providing better locations
for shared code. In particular:
- add an additional abstract class named NestedPodVIFDriver for nested
  drivers to share common code, to accommodate the upcoming MACVLAN
  driver
- rename GenericPodVIFDriver to NeutronPodVIFDriver (all the drivers are
  Neutron specific)
This change is part of the MACVLAN based pod-in-VM spec and should be
applied before any following MACVLAN related patches.
Implements: blueprint
https://blueprints.launchpad.net/kuryr-kubernetes/+spec/macvlan-pod-in-vm
Change-Id: Ib71204d2d14d3d4f15beada701094e37d89d7801
Signed-off-by: Marco Chiappero <marco.chiappero@intel.com>
This patch provides an experimental CNI driver. Its primary purpose
is to enable development of other components (e.g. functional tests,
service/LBaaSv2 support). It is expected to be replaced with a daemon
that configures the VIF and connects it to the pods, and a small
lightweight client that serves as the CNI driver called by Kubernetes.
NOTE: unit tests are not provided as part of this patch as it is not
yet clear what parts of it will be reused in the daemon-based
implementation.
Change-Id: Iacc8439dd3aee910d542e48ed013d6d3f354786e
Partially-Implements: blueprint kuryr-k8s-integration
This patch introduces a driver that manages normal Neutron ports to
provide VIFs for Kubernetes Pods.
Change-Id: Ice32e96e107f7b7331caca3b79c488532710b4a2
Partially-Implements: blueprint kuryr-k8s-integration
This patch introduces Port-to-VIF translation to 'os_vif_util' and
implements a translator that supports hybrid OpenVSwitch plugging
case.
Change-Id: I9f5c36fa32b51da8cccf377455b096270f23a782
Partially-Implements: blueprint kuryr-k8s-integration
This patch adds the Retry handler that can be used as part of the
event handling pipeline to retry failed handlers.
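The general shape of such a handler can be sketched as a callable that
wraps another handler. The class below is a simplified illustration
with assumed parameter names (exceptions, retries, interval); it is not
the implementation added by this patch.

```python
import time


class Retry:
    """Call the wrapped handler again when it raises a known error."""

    def __init__(self, handler, exceptions=Exception, retries=3,
                 interval=0.0):
        self._handler = handler
        self._exceptions = exceptions
        self._retries = retries
        self._interval = interval

    def __call__(self, event):
        attempt = 0
        while True:
            try:
                return self._handler(event)
            except self._exceptions:
                attempt += 1
                if attempt > self._retries:
                    # Out of retries: let the error propagate.
                    raise
                time.sleep(self._interval)
```

Because the wrapper is itself a callable taking an event, it can be
placed in front of any other handler in the pipeline.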
Change-Id: Ia86790de8efa6a3ef5b677a70ffbd2d8201f9d95
Partially-Implements: blueprint kuryr-k8s-integration
Adds basic K8s client implementation and CONF-based singletons for
both Neutron and K8s clients.
The K8s client added by this patch should be considered a temporary
solution that only implements the necessary parts to let us move
forward with kuryr-kubernetes. Eventually it will be replaced by either
[1] or [2].
The problem with [1] is that it does not yet support the streaming API
that we need for WATCH. And [2] is outside of the OSt umbrella, so [1]
is preferred over [2] unless [2] makes it into global-requirements.txt.
[1] https://github.com/openstack/python-k8sclient
[2] https://pypi.python.org/pypi/pykube
NOTE: Removed py3-related code from config and top-level __init__.
How to properly deal with that code is TBD.
Change-Id: Ib4eb410eaf9725c296fcdddd8857eb24b8929915
Partially-Implements: blueprint kuryr-k8s-integration