When a lb transitions to ERROR or the IP on the Service
spec differs from the lb VIP, the lb is released but
the CRD doesn't get updated, causing Not Found exceptions
when handling the creation of other load balancer
resources. This commit fixes the issue by ensuring the
status field is cleaned up upon lb release.
It also adds protection in case a nonexistent lb is
still referenced on the CRD.
Closes-Bug: 1894758
Change-Id: I484ece6a7b52b51d878f724bd4fad0494eb759d6
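A rough sketch of the intended flow follows; the helper names
(release_loadbalancer, patch_crd, create_listener) and the NotFoundError
class are illustrative stand-ins, not the actual kuryr-kubernetes or
Octavia API.

    class NotFoundError(Exception):
        """Stand-in for the driver's 'load balancer not found' error."""


    def release_lb_and_reset_status(lbaas, k8s, klb_crd):
        """Release the Octavia lb and wipe the CRD status (illustrative)."""
        lb = klb_crd.get('status', {}).get('loadbalancer')
        if lb:
            lbaas.release_loadbalancer(lb)   # assumed Octavia wrapper call
        # Clearing the status forces the handler to recreate the lb from
        # the spec instead of referencing a lb that is already gone.
        k8s.patch_crd('status', klb_crd['metadata']['selfLink'], {})


    def ensure_listener(lbaas, klb_crd, listener_spec):
        """Tolerate a stale (nonexistent) lb still present on the CRD."""
        lb = klb_crd.get('status', {}).get('loadbalancer')
        if not lb:
            return None                      # nothing to attach to yet
        try:
            return lbaas.create_listener(lb, listener_spec)
        except NotFoundError:
            # The lb vanished underneath us; drop the stale reference so
            # the next iteration starts from a clean status.
            klb_crd['status'].pop('loadbalancer', None)
            return None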
By default, the Octavia health manager will create a number of
threads based on the number of cores available on the controller.
Since the kuryr test jobs are memory constrained and have a limited
scale, let's pin the number of health manager threads to a number
lower than the cores would dictate. This should save RAM for the
kuryr test jobs.
Change-Id: Idf53ed23664b684a4e378783aabef5898593af77
This fixes 3 issues in order to unblock CI:
1. The OVN job is fixed by updating its configs to match what Neutron
uses to configure DevStack with OVN.
2. The lower-constraints job will run on Ubuntu Bionic and Python 3.6,
allowing it to work with deps that don't support 3.8.
3. Swap size in our jobs is set to 8 GB, as the default recently got
shrunk to 1 GB, which is way too little for our Amphora jobs.
Also, some additional OVN debugging info will get gathered.
Change-Id: I70c867ac21004586c014e9eb797dbf528dd6e3f2
This is another attempt at getting the useless tracebacks out of logs
for any level higher than debug. In particular:
* All pools logs regarding "Kuryr-controller not yet ready to *" are now
on debug level.
* The ugly ResourceNotReady raised from _populate_pool is now suppressed
if the method is run from eventlet.spawn(), which prevents that
exception from being logged by eventlet.
* ResourceNotReady will only print namespace/name or name, not the full
resource representation (see the sketch after this list).
* ConnectionResetError is suppressed on the Retry handler level just as
any other K8sClientError.
Change-Id: Ic6e6ee556f36ef4fe3429e8e1e4a2ddc7e8251dc
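The sketch below illustrates the third bullet: an exception message
built only from namespace/name instead of the full resource dump. The
class is a simplified stand-in, not kuryr's actual ResourceNotReady.

    class ResourceNotReady(Exception):
        """Simplified stand-in showing the shortened message format."""

        def __init__(self, resource):
            # Accept either a full K8s object dict or a plain name string.
            if isinstance(resource, dict):
                meta = resource.get('metadata', {})
                name = meta.get('name', 'unknown')
                namespace = meta.get('namespace')
                resource = '%s/%s' % (namespace, name) if namespace else name
            super().__init__('Resource not ready: %r' % resource)

With that, a log line mentions e.g. 'default/my-pod' rather than the
whole serialized object.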
Apparently Barbican is an Octavia requirement only if we're doing TLS
offload on the LB. As we're just doing HTTPS passthrough, it shouldn't
be required for our use case and should be safe to remove.
Change-Id: Ic2f6297691cd6bfe9def6d5d6ba0dea24579bfcc
When syncing the load balancer security groups,
the klb CRD is updated with the new security groups,
but the event proceeds to be handled with the outdated sgs.
This commit ensures the klb CRD resource being handled is updated.
Change-Id: Ia7064bc4b599b8dc05572717f4d80acc03583c20
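A minimal sketch of the fix, assuming a patch_crd helper that returns
the updated object; the names are illustrative rather than the exact
kuryr-kubernetes client API.

    def sync_lb_security_groups(k8s, klb_crd, new_sgs):
        """Patch the CRD with the new sgs and keep handling the fresh copy."""
        if klb_crd['spec'].get('security_groups_ids') == new_sgs:
            return klb_crd
        updated = k8s.patch_crd('spec', klb_crd['metadata']['selfLink'],
                                {'security_groups_ids': new_sgs})
        # The rest of the event handling must use the updated resource,
        # not the outdated in-memory copy the handler was called with.
        return updated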
It might happen that when a pod is added and instantly removed, the vif
handler will not be able to set the finalizer on such an object,
resulting in an error. In this patch we guard against such a case.
Closes-Bug: 1894212
Change-Id: I5b3b4b36ae0c2fca7c16153f75c62607efbc3ce4
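Roughly, the guard looks like the sketch below; PodGoneError is a
placeholder for the client's not-found error, and add_finalizer stands
in for the real helper.

    class PodGoneError(Exception):
        """Placeholder for the K8s client's 'resource not found' error."""


    def add_finalizer_safely(k8s, pod, finalizer):
        """Try to add a finalizer, tolerating the pod disappearing."""
        try:
            return k8s.add_finalizer(pod, finalizer)
        except PodGoneError:
            # The pod was created and removed almost instantly; there is
            # nothing left to protect, so give up quietly instead of
            # failing the handler.
            return False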
It may happen that between the time we list some resources and start
iterating through them, some get deleted. This commit makes sure we
ignore errors coming from such situations in the NP code.
Closes-Bug: 1894194
Change-Id: I082ab9d5881eab5a4686f4f3ec43b1cd0d8e8ad8
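In effect the NP handling loop gains a guard like the sketch below
(ResourceGoneError is a placeholder for the not-found error raised by
the client):

    class ResourceGoneError(Exception):
        """Placeholder for the client's 'resource no longer exists' error."""


    def process_listed_pods(pods, handle_pod):
        """Iterate pods from a list call, skipping ones deleted meanwhile."""
        for pod in pods:
            try:
                handle_pod(pod)
            except ResourceGoneError:
                # Deleted between listing and processing; the network
                # policy no longer needs to account for this pod.
                continue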
Neutron clears device_owner when a port is detached. This means that with
pools we need to consider ports without device_owner set when doing
cleanup on namespace deletion.
Change-Id: Ic38015cba27d8418175027ec4e433df32eae4706
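A hedged sketch of the selection logic; the 'compute:kuryr' owner
string is an assumption about how kuryr tags its ports.

    # Detached ports have device_owner cleared by Neutron, so an empty
    # owner must be treated as a cleanup candidate too (owner tag is an
    # assumption for illustration).
    ELIGIBLE_OWNERS = ('', 'compute:kuryr')


    def namespace_cleanup_ports(ports):
        """Select ports eligible for cleanup on namespace deletion."""
        return [p for p in ports
                if (p.get('device_owner') or '') in ELIGIBLE_OWNERS]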
Generate the 10-kuryr.conflist file from the template file
kuryr.conflist.template.
1. Currently kubelet's CNI config file in kuryr is 10-kuryr.conf.
Kubernetes supports config files with two suffixes, ".conf" and
".conflist"; the latter can be a list containing multiple CNI plugins.
So I want to change 10-kuryr.conf to 10-kuryr.conflist to satisfy the
need for multiple plugin support.
2. If kuryr-cni is installed as a pod, 10-kuryr.conf will only be
copied and overwritten from the kuryr/cni image container. What I
expected is that we can customize the config file more freely. So I
think we can add another file, kuryr.conflist.template, as the
template, and then generate the 10-kuryr.conflist from it.
Change-Id: Ie3669db4e60a57b24012124cd24b524eb03f55cf
When creating a Service of LoadBalancer type, the external_svc_net
config must be set, otherwise the Floating IP for the respective
LoadBalancer won't get created and an exception about the missing
config will be raised. This commit logs and skips the FIP creation when
the needed config is not set.
Change-Id: I155d11f7080d2d4dbeca6de85b2f284aaacd7d8d
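A sketch of the new behaviour, assuming an openstacksdk-like network
client and treating external_svc_net as the already-resolved network
ID; the function name is hypothetical.

    import logging

    LOG = logging.getLogger(__name__)


    def allocate_service_fip(os_net, vip_port_id, external_svc_net=None):
        """Allocate a FIP for a LoadBalancer Service, or skip gracefully."""
        if not external_svc_net:
            LOG.warning('external_svc_net is not configured, skipping '
                        'Floating IP creation for the LoadBalancer '
                        'Service.')
            return None
        return os_net.create_ip(floating_network_id=external_svc_net,
                                port_id=vip_port_id)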
If a loadBalancerIP is defined on the Service it should be used
on the created Load Balancer. However, that is not happening, as
the LB CRD can be missing the lb_ip field definition; consequently
a new Floating IP is allocated to the Load Balancer. This commit
fixes the issue by ensuring the lb_ip is present on the CRD if
defined on the Service.
Closes-Bug: 1893927
Change-Id: Ie626328972587772dbeeaf52f96b312de41026f2
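Sketch of the spec-building step, with the lb_ip field name taken from
the commit text and the rest simplified for illustration.

    def build_klb_spec(service):
        """Build a KuryrLoadBalancer spec fragment from a Service object."""
        svc_spec = service.get('spec', {})
        crd_spec = {
            'type': svc_spec.get('type'),
            'ports': svc_spec.get('ports', []),
        }
        lb_ip = svc_spec.get('loadBalancerIP')
        if lb_ip:
            # Propagate the user-requested IP so the existing Floating IP
            # is reused instead of a new one being allocated.
            crd_spec['lb_ip'] = lb_ip
        return crd_spec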
When a Service is created before the Pods, no security group to be
enforced on the Load Balancer exists, as no matching Pods, which are
the ones providing the sgs, are present yet. Consequently, the LB
gets created with admin_state down due to no sg being present and ACLs
blocking the traffic. This commit ensures the LB sg CRD holds the
updated Pods' sgs before enforcing them on the LB members.
Closes-Bug: 1893910
Change-Id: I9eb52a4d6b82d70493d109e9eb9be8706142bc7d
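The ordering change can be pictured as below; get_current_pod_sgs and
add_members are hypothetical callables standing in for the real
drivers.

    def ensure_members(k8s, klb_crd, get_current_pod_sgs, add_members):
        """Refresh the CRD's sgs from matching Pods before adding members."""
        current_sgs = get_current_pod_sgs()
        if current_sgs != klb_crd['spec'].get('security_groups_ids'):
            # Pods may have appeared after the Service, so the sgs
            # recorded at LB-creation time can be empty or stale.
            klb_crd = k8s.patch_crd('spec',
                                    klb_crd['metadata']['selfLink'],
                                    {'security_groups_ids': current_sgs})
        add_members(klb_crd)   # members are enforced with the fresh sgs
        return klb_crd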
There is a delay between namespace termination and having all the
objects from that namespace marked as being terminated. This means that
if on CRD creation we get a response stating that the namespace is
being terminated, we should ignore it, not fail. This commit makes sure
we only log a warning for those errors.
Change-Id: I87006ffbabea90babd0245753f0d3a0b860b88b5
Closes-Bug: 1893772
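Roughly, the creation path gains a guard like this; K8sConflictError
and the matched message are assumptions about how the API error
surfaces, not the exact kuryr code.

    import logging

    LOG = logging.getLogger(__name__)

    TERMINATING_MSG = 'is being terminated'   # assumed API error text


    class K8sConflictError(Exception):
        """Placeholder for the client error raised on such a POST."""


    def create_crd(k8s, path, crd):
        """Create a CRD, downgrading namespace termination to a warning."""
        try:
            k8s.post(path, crd)
        except K8sConflictError as exc:
            if TERMINATING_MSG not in str(exc):
                raise
            LOG.warning('Namespace %s is being terminated, not creating '
                        '%s.', crd['metadata']['namespace'],
                        crd['metadata']['name'])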
Namespace deletion can be stuck if not all the ports belonging
to the associated namespace networks are deleted.
This patch enforces proper cleanup of subports by:
- Ensuring no request/release VIF actions are processed until the pools
are recovered by the controller
- Ensuring that if there are leftover subports when removing a
namespace, those are detached from the pool and deleted regardless of
their status (a sketch of both behaviours follows this list)
Change-Id: I2cb1586fa1f88bab191af0ead22a2b8afca91a3b
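A combined sketch of both bullets, using an openstacksdk-like client
and a hypothetical pool driver with is_recovered()/forget_port()
helpers.

    class ResourceNotReady(Exception):
        """Raised so the caller retries the VIF action later."""


    def request_vif(pool_driver, pod):
        """Refuse to hand out VIFs until the pools are recovered."""
        if not pool_driver.is_recovered():
            raise ResourceNotReady(pod['metadata']['name'])
        return pool_driver.get_vif(pod)


    def cleanup_namespace_subports(os_net, pool_driver, trunk_id, leftovers):
        """Detach and delete leftover subports of a removed namespace."""
        if leftovers:
            # Detach first so the ports can be deleted regardless of
            # their status.
            os_net.delete_trunk_subports(
                trunk_id, [{'port_id': p['id']} for p in leftovers])
        for port in leftovers:
            pool_driver.forget_port(port['id'])
            os_net.delete_port(port['id'])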
If there are no resources already created, just return without further processing
Change-Id: I2c433d5b4cece373004a3b4099e10b5d6e664ff4
Closes-Bug: #1893213
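The early return is as simple as it sounds; sketched here with a
generic, hypothetical handler.

    def sync_existing_resources(resources, handle):
        if not resources:
            # Nothing was created for this object yet, so there is
            # nothing to reconcile; skip any further processing.
            return
        for resource in resources:
            handle(resource)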
It might happen that an operator would manually remove the KuryrPort
CRD corresponding to a pod, and then the pod would become broken - there
would be no connectivity to it.
In this patch we prevent removing the KP in case the Pod is not in a
deleting state. Now, the deletion will be deferred till the Pod deletion.
Change-Id: Idd6383296723c41ed6f29715969c452ef3fcdb1f
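The deferral check roughly amounts to the sketch below; returning False
means the KuryrPort deletion is requeued until the Pod itself is being
deleted.

    def can_release_kuryrport(pod):
        """Allow KuryrPort removal only when its Pod is really going away."""
        if pod is None:
            return True   # the Pod is already gone, nothing left to break
        # A live Pod without deletionTimestamp still needs its VIF, so
        # deleting the KuryrPort now would cut its connectivity.
        return bool(pod['metadata'].get('deletionTimestamp'))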
It seems that latest crio versions moved removal of the network
namespace to later in the container lifecycle. This means that sometimes
the network namespace of a container will hang around after the pod is
already gone from the API and Kuryr has assigned its VIF to another pod.
This might cause VLAN ID conflicts, but normally that's not an issue as
kuryr-daemon removes interfaces from the container namespace on CNI DEL.
It may however happen that if kuryr-daemon was down when CNI DEL
happened, the info about the VIF saved in KuryrPort will already be
gone. In that case we simply returned success to the CNI without doing
any unplugging. Now if the netns is not removed immediately after that,
we might end up with VLAN ID conflicts.
This commit makes sure that even if the VIF info is gone, kuryr-daemon
will at least attempt to remove the container interface from the
container netns. This should limit the problem.
Change-Id: Ie7d4966473c83554786e79aea0d28a26de902a66
Closes-Bug: 1892388
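The DEL path change can be sketched as follows; unplug_vif and
strip_interface are hypothetical callables standing for the normal
detach path and the best-effort fallback.

    def handle_cni_del(vifs, netns, ifname, unplug_vif, strip_interface):
        """Handle CNI DEL even when the KuryrPort/VIF info is gone."""
        if vifs:
            for vif in vifs.values():
                unplug_vif(vif, netns, ifname)
            return
        # VIF info was lost (e.g. kuryr-daemon was down when DEL first
        # fired); still try to remove the interface from the container
        # netns so a lingering crio namespace can't hold a stale VLAN.
        strip_interface(netns, ifname)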
The DevStack plugin is really designed just for gate testing and
developer usage. In both use cases normally we won't run K8s services on
multiple nodes. Because of that this commit disables leader election for
kube-scheduler and kube-controller-manager. The motivation behind this
is that on the OpenInfra gate VMs we often see degraded storage
performance that affects etcd. If that happens during leader-election of
a service, it'll crash, leaving us with broken and unrecoverable K8s.
This should eliminate the issue.
Change-Id: Ib84ed289c9aaf3ae27af1c7ffa8aab2f8ba48671
In case of a missing pod, the KuryrPort should remove itself by removing
its own finalizer, not the nonexistent pod's finalizer.
Closes-Bug: 1892863
Change-Id: I10fc315b6ff456282d71d84e6cee4c226ac5cdba
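In other words, the finalization path should look roughly like this
(the finalizer names are assumed for illustration):

    KURYRPORT_FINALIZER = 'kuryr.openstack.org/kuryrport-finalizer'
    POD_FINALIZER = 'kuryr.openstack.org/pod-finalizer'


    def finalize_kuryrport(k8s, kuryrport, pod):
        """On a missing pod, drop the KuryrPort's own finalizer."""
        if pod is None:
            # There is no pod object left to edit; removing the finalizer
            # from the KuryrPort itself lets the CRD object go away.
            k8s.remove_finalizer(kuryrport, KURYRPORT_FINALIZER)
            return
        k8s.remove_finalizer(pod, POD_FINALIZER)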
We use affectedPods to comfortably track the list of the pods that the
NetworkPolicy indirectly targets (i.e. matches their ports). It doesn't
make sense to put pods without an IP there, and it is also impossible
now with the new KuryrNetworkPolicy CRD.
We haven't seen that problem on the previous CRD as we've used a weird
format to save that info: {'<pod-ip>': '<pod-namespace>'}. If <pod-ip>
was None, json.dumps serialized that into {'null': '<pod-namespace>'},
which was as happily accepted by the K8s API as it was utterly useless.
This commit makes sure we only put pods with an IP in the affectedPods
field.
Please also note that we already have protection in place to make sure
we won't create rules for pods without an IP (those rules would
effectively open too much traffic), so that is already covered.
Change-Id: Ie82a153c89119fc8f70071353c8e46b27d643935
Closes-Bug: 1892208
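The filtering can be sketched as below; the entry layout
({'podIP': ..., 'namespace': ...}) is an assumption for illustration.

    def build_affected_pods(pods):
        """Collect affectedPods entries only for pods that have an IP."""
        affected = []
        for pod in pods:
            pod_ip = pod.get('status', {}).get('podIP')
            if not pod_ip:
                # A pod without an IP can't be matched by any rule yet,
                # and the KuryrNetworkPolicy CRD would not accept a null
                # IP anyway.
                continue
            affected.append({'podIP': pod_ip,
                             'namespace': pod['metadata']['namespace']})
        return affected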