Updating a port may take an excessive number of seconds
to complete if DVR routers are running on more than 100
compute nodes. This patch tries to save some time by removing
unnecessary calls from inside the loop over hosts.
Change-Id: Ide740e0c5c43c2d2b842460a37c8ce125da12b28
Closes-Bug: #1830456
(cherry picked from commit 00eb6f26f6)
We have a problem with SNAT when too many connections use the
same source and destination on the network nodes.
In addition, we can see in the conntrack table that the
"insert_failed" counter increases.
This might be a generic problem with conntrack and Linux.
We suspect that we are hitting the following "limitation / bug"
in the kernel.
There seems to be a workaround that alleviates this behavior:
setting the --random-fully flag in iptables for port consumption.
This patch fixes the problem by adding --random-fully to
the SNAT rules.
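As a rough illustration of the change, the sketch below builds the arguments of a SNAT rule and appends --random-fully only when the caller says the local iptables supports it. The function name build_snat_rule and its parameters are hypothetical and are not taken from iptables_manager.py.

```python
def build_snat_rule(interface, snat_ip, use_random_fully):
    """Return iptables arguments for a SNAT rule on the given interface.

    Hypothetical helper for illustration only; not Neutron's actual code.
    """
    rule = ['-o', interface, '-j', 'SNAT', '--to-source', snat_ip]
    if use_random_fully:
        # Ask the kernel to fully randomize source port selection.
        rule.append('--random-fully')
    return ' '.join(rule)


print(build_snat_rule('qg-example', '203.0.113.5', True))
```

Keeping the flag conditional lets the same code run against iptables builds that do not accept the option.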
Conflicts:
neutron/agent/linux/iptables_manager.py
neutron/common/constants.py
neutron/tests/unit/agent/l3/test_agent.py
Change-Id: I246c1f56df889bad9c7e140b56c3614124d80a19
Closes-Bug: #1814002
(cherry picked from commit 30f35e08f9)
A subnet may be connected to a DVR router using an IP address
different from the subnet's gateway_ip.
So in br-tun, ARP to the DVR router's port should be dropped instead of
ARP to the subnet's gateway_ip (or MAC in case of IPv6).
Conflicts:
neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/test_ovs_neutron_agent.py
Change-Id: Ida6b7ae53f3fc76f54e389c5f7131b5a66f533ce
Closes-bug: #1831575
(cherry picked from commit ae3aa28f5a)
bandit is a linter and is listed in the "blacklist" from the
requirements repo, so it does not appear in the constraints lists.
Project teams are expected to manage the version(s) allowed on their
own, to allow different teams to roll ahead to new versions as they can
rather than having the entire community do it in lock-step. This change
caps the version of bandit to the one available during the rocky
development cycle to avoid introducing the new rules from newer releases
into a stable branch.
This patch also switches the functional tests to an older keepalived
version.
This issue is reported in bug 1788185.
It looks like the current keepalived version available in the
Ubuntu Xenial repositories (1:1.2.24-1ubuntu0.16.04.1) is broken
and causes some functional tests in Neutron to fail.
Details are in [1].
The older version works fine, so as a temporary solution we can use
it in the functional tests.
This issue doesn't happen on the master and stable/rocky branches, as
they use the newer cloud-archive repo, which has a newer version of
keepalived that works fine.
[1] https://bugs.launchpad.net/ubuntu/+source/keepalived/+bug/1789045
Change-Id: Ia59de069b29f584cce21163a77812ec0ed243e65
Closes-Bug: #1788185
(cherry picked from commit 159490502e)
In functional tests for the L3 HA agent, e.g.
L3HATestFailover.test_ha_router_failover,
it may happen that the L3 agent has not yet changed the ipv6 accept_ra
knob, and the test fails because it checks the value only once, just
after the router state has changed.
This patch fixes that race by waiting up to 60 seconds for the
ipv6 accept_ra change.
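The waiting approach can be sketched as a small polling helper. This is a minimal Python 3 sketch, not the code in framework.py; the name wait_for and its parameters are made up for the example.

```python
import time


def wait_for(predicate, timeout=60, sleep=1):
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'condition not met within %s seconds' % timeout)
        time.sleep(sleep)


# Example: a fake accept_ra reader that reports the new value only on
# the third read, which a single immediate check would miss.
reads = []


def accept_ra_changed():
    reads.append(1)
    return len(reads) >= 3


wait_for(accept_ra_changed, timeout=5, sleep=0.01)
```

A test that polls this way passes as soon as the value changes and only fails after the full timeout.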
Conflicts:
neutron/tests/functional/agent/l3/framework.py
Change-Id: I459ce4b791c27b1e3d977e0de9fbdb21a8a379f5
Closes-Bug: #1829889
(cherry picked from commit 62b2f2b1b1)
Some extreme conditions can leave the router gateway port
unbound. All the centralized floating IPs then become
unreachable, since the gateway port is set to tag 4095.
This patch adds the HA status to the router-related port
processing code path. If it is an HA router, the gateway port
goes to the right HA router processing code branch.
Closes-Bug: #1827754
Change-Id: Ida1c9f3a38171ea82adc2f11cb17945d6e2434be
(cherry picked from commit 3d99147e73)
The RPC notifier method can sometimes be time-consuming,
which causes other resources being processed in parallel
to fail to send notifications in time. This patch makes
the notification asynchronous.
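The idea of not blocking on a slow notifier can be sketched with the standard library thread pool. This is only an illustration: Neutron runs on eventlet and the actual patch may use a different mechanism, and notify_async is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor

# Small shared pool; the worker count is an arbitrary choice here.
_executor = ThreadPoolExecutor(max_workers=4)


def notify_async(notifier, *args, **kwargs):
    """Submit the (possibly slow) notifier call and return immediately.

    The returned future can be ignored or used to inspect errors later.
    """
    return _executor.submit(notifier, *args, **kwargs)


sent = []
future = notify_async(sent.append, 'port-updated')
future.result(timeout=5)  # only the example waits; callers need not
```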
Closes-Bug: #1824911
Change-Id: I3f555a0c78fbc02d8214f12b62c37d140bc71da1
(cherry picked from commit 0f471a47c0)
Once the HA port is set, it must keep this value no matter
what the server returns, because there is a race condition
between the l3-agent syncing router info for processing
and the server deleting the router.
This patch adds a helper function for every ha_port set
action. If ha_port is not None, it always keeps its
original value.
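A minimal sketch of such a set-once helper is shown below. The class HaRouterSketch and the method set_ha_port are hypothetical stand-ins and do not reproduce the code in ha_router.py.

```python
class HaRouterSketch(object):
    """Hypothetical stand-in for an HA router object."""

    def __init__(self):
        self.ha_port = None

    def set_ha_port(self, port):
        # Keep the first non-None value. Later updates, including None
        # from a server that is deleting the router, are ignored.
        if self.ha_port is None:
            self.ha_port = port


router = HaRouterSketch()
router.set_ha_port({'id': 'ha-port-1'})
router.set_ha_port(None)  # stale sync data does not clear the port
```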
Conflicts:
neutron/tests/unit/agent/l3/test_ha_router.py
Closes-Bug: #1826726
Change-Id: I96a088d25048be02a9c5b12c1d087df075b36fc4
(cherry picked from commit 45957f12c8)
(cherry picked from commit 13cb3cd34c)
For auto-address IPv6 subnets, postcommit performs an update port
action if the network already has ports. This results in a
"cannot be called within a transaction" error for bulk IPv6 subnet
creation.
Closes-Bug: #1822582
Change-Id: Ia32ec4c11c0793e7df07dcce19c122b3c7f865e1
(cherry picked from commit 14c76d3181)
1. give each HA failover case an independent vrrp_id
2. give each HA port an independent IP address, so the
interface IPs for router HA ports will be:
169.254.192.100 and 169.254.192.101
169.254.192.102 and 169.254.192.103
169.254.192.104 and 169.254.192.105
169.254.192.106 and 169.254.192.107
VIP of each case will be:
169.254.0.10/24
169.254.0.11/24
169.254.0.12/24
169.254.0.13/24
169.254.0.14/24
Conflicts:
neutron/tests/functional/agent/l3/test_dvr_router.py
Closes-Bug: #1819160
Change-Id: I1216d96af40449ec16a852cc1f6c4f15c85f4546
(cherry picked from commit c69a87405a)
(cherry picked from commit 2c5957f56d)
(cherry picked from commit c50bdf2329)
(cherry picked from commit 7b2a8f795f)
When two routers are created at the same time, we can't assume the
status of each one. Instead of this, the status of each router is
first checked and then compared to the other router status.
Conflicts:
neutron/tests/functional/agent/l3/test_dvr_router.py
Change-Id: If20a3a414986ea29fbfd50616761c14e5b249b2c
Closes-Bug: #1819160
(cherry picked from commit 8f35331c91)
(cherry picked from commit 8ba5899942)
The test bridge veth pair devices are not up, so the
VRRP advertisement packets cannot reach each HA port. As a result,
multiple master routers come up. This patch just sets the veth
pair devices up.
Closes-Bug: #1819160
Change-Id: I0e0d0311d73bce83d3c7341e7a0167917818b1ff
(cherry picked from commit 8cc480bd01)
This change adds the required configuration in neutron.conf
to set the lock_path parameter, which was missing in
compute-install-ubuntu.rst.
Change-Id: If090bdf060dfe21d11b1a5dfd010dc8167d9e45e
Closes-Bug: #1796976
(cherry picked from commit f4d438019e)
Removing an active or a standby HA router from an agent that has a
valid DVR serviceable port (such as DHCP) does not remove the
HA interface associated with the router in the SNAT namespace.
When we try to add the HA router back to the agent, it
adds more than one HA interface to the SNAT namespace, causing
more problems, and we sometimes also see multiple active routers.
This bug might have been introduced by this patch [1].
Fix the problem by adding the router namespaces without HA
interfaces when there is no HA, and re-inserting the HA interfaces
into the namespace when the HA router is bound to the agent.
[1] https://review.openstack.org/#/c/522362/
Conflicts:
neutron/agent/l3/agent.py
Closes-Bug: #1816698
Change-Id: Ie625abcb73f8185bb2bee06dcd26a01d8af0b0d1
(cherry picked from commit d9e0bab6ac)
The ovs-agent can process ports in large sets, and all
of these ports then need a DB status or attribute update.
But the neutron server is centralized. It may be busy with
something else, or the database processing itself can be
time-consuming. Because of this, it sometimes returns
an RPC timeout exception to the ovs-agent, and a full sync
is triggered in the next RPC loop. The restart time
becomes longer and longer.
Add a default step for updating ports to reduce
the probability of an RPC timeout.
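Updating in steps amounts to splitting the port list into bounded batches, so that each RPC call carries a limited amount of work. The sketch below shows the batching only; iter_batches and the step value are hypothetical and not taken from the patch.

```python
def iter_batches(items, step):
    """Yield consecutive slices of items with at most step elements."""
    if step <= 0:
        raise ValueError('step must be positive')
    for start in range(0, len(items), step):
        yield items[start:start + step]


ports = ['port-%d' % i for i in range(5)]
for batch in iter_batches(ports, 2):
    # Each batch would go into its own RPC call instead of one big one.
    print(batch)
```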
Related-Bug: #1813703
Related-Bug: #1813704
Related-Bug: #1813706
Related-Bug: #1813707
Conflicts:
neutron/tests/unit/plugins/ml2/test_rpc.py
Change-Id: Ie37f4a4869969e235ce16b73cdfcbdc98626823e
(cherry picked from commit 8408af4f17)
(cherry picked from commit d7d30ea950)
(cherry picked from commit 5d705468de)
HA routers use keepalived and need to have a virtual_router_id
configured. As routers which belong to the same tenant use the same
HA network, those values have to be different for each router.
Before this patch, this value was always the first available value
from the available_vr_ids range.
In some (rare) cases, when more than one router is created in parallel
for the same tenant, those routers may get the same vr_id,
so keepalived treats them as a single application and only
one router is ACTIVE on one of the L3 agents.
This patch changes this behaviour so that a random value from the
available vr_ids is chosen instead of always taking the first value.
That should mitigate this rare race condition so that it is (almost)
not noticeable for users.
However, a proper fix should probably be done as an additional
constraint in the database layer. Such a solution could not be
backported to stable branches, so I decided to propose this easy patch
first.
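The difference between taking the first free id and a random one can be sketched as follows. The function pick_vr_id and the 1 to 254 default range are assumptions made for the example, not the code in l3_hamode_db.py.

```python
import random


def pick_vr_id(allocated, first=1, last=254):
    """Return a random VRRP id in [first, last] that is not allocated."""
    available = set(range(first, last + 1)) - set(allocated)
    if not available:
        raise ValueError('no virtual_router_id left')
    # Two parallel requests that see the same allocations will now
    # rarely pick the same id, unlike with min(available).
    return random.choice(sorted(available))


vr_id = pick_vr_id(allocated={1, 2, 3})
```

This narrows the race window but does not close it, which is why a database constraint remains the proper fix.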
Conflicts:
neutron/db/l3_hamode_db.py
Change-Id: Idb0ed744e54976dca23593fb2d7317bf77442e65
Related-Bug: #1823314
(cherry picked from commit a8d0f557d5)
(cherry picked from commit ee2ed681c4)
(cherry picked from commit 72c9a7ef84)
The code that ensures the fpr/rfp veth pair exists
between the qrouter and fip namespace was only setting
the mtu of the devices if it had to create them. Set
it all the time to support the mtu being changed.
Change-Id: I176b5f4d4f12cf09f930e2c1944e98082a09bcc6
Closes-bug: #1823798
(cherry picked from commit 6ded6d217a)
In some cases, when a db test fails due to a timeout,
oslo_db.exception.DBConnectionError is raised
instead of sqlalchemy_exc.InterfaceError.
This patch handles that case in the skip_if_timeout decorator.
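A simplified sketch of a decorator that treats both exceptions as a timeout is shown below. It is not the actual skip_if_timeout implementation: the two exception classes are local stand-ins so the example runs without oslo.db or SQLAlchemy installed, and the real decorator may catch other exceptions as well.

```python
import functools
import unittest


class InterfaceError(Exception):
    """Stand-in for sqlalchemy.exc.InterfaceError."""


class DBConnectionError(Exception):
    """Stand-in for oslo_db.exception.DBConnectionError."""


def skip_if_timeout(reason):
    def decorator(test_method):
        @functools.wraps(test_method)
        def wrapper(*args, **kwargs):
            try:
                return test_method(*args, **kwargs)
            except (InterfaceError, DBConnectionError):
                # Either exception may surface when the DB test times out.
                raise unittest.SkipTest(reason)
        return wrapper
    return decorator
```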
Change-Id: I7350d5c884784317c94ff42f28526065ff399b40
Related-Bug: #1687027
(cherry picked from commit b7458b6159)
Kernel 4.4.0-145 backported a change to the IPv6 fragmentation API, so
update the OVS version checked out for fullstack tests to a hash
including the needed compatibility layer changes.
Change-Id: Ia9383c02e1c62e31db9493729aedbed5b94a3a3f
Closes-bug: #1823155
(cherry picked from commit 004caf773a)
The original fix for bug 1818614 added two new CLI args
when spawning neutron-keepalived-state-change, but if
e.g. self.agent_conf.AGENT.root_helper_daemon is unset,
the string "None" is passed, which breaks the
neutron-keepalived-state-change daemon.
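The failure mode comes from formatting an unset value into a string: '%s' % None yields 'None'. The sketch below shows one way to add an argument only when its value is set. The function build_cmd and the exact option spellings are assumptions for illustration, not the code from the patch.

```python
def build_cmd(base_cmd, root_helper=None, root_helper_daemon=None):
    """Append helper options only when they have a value."""
    cmd = list(base_cmd)
    if root_helper:
        cmd.append('--AGENT-root_helper=%s' % root_helper)
    if root_helper_daemon:
        # Without this check the daemon would receive the text "None".
        cmd.append('--AGENT-root_helper_daemon=%s' % root_helper_daemon)
    return cmd


cmd = build_cmd(['neutron-keepalived-state-change'], root_helper='sudo')
```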
Change-Id: I4afcdbbf2f3d2dafcad241ba3fc0778b52b8fc85
Related-Bug: #1818614
Related-Bug: #1823038
(cherry picked from commit afbbec83a2)