In case when L3 agent is hosting routers which have got subnets
with Prefix Delegation enabled, agent couldn't properly handle
IpAddressAlreadyExists exception raised when pd module tries to
configure link local IPv6 addresses.
Now this is fixed and L3 agent can restart without problems in such
case.
Change-Id: Icc995f7b2b465921e41342711d17539f16ead0ce
Closes-Bug: #1892362
Now that we are python3 only, we should move to using the built
in version of mock that supports all of our testing needs and
remove the dependency on the "mock" package.
This patch moves all references to "import mock" to
"from unittest import mock". It also cleans up some new line
inconsistency.
Fixed an inconsistency in the OVSBridge.deferred() definition
as it needs to also have an *args argument.
Fixed an issue where an l3-agent test was mocking
functools.partial, causing a python3.8 failure.
Unit tests only, removing from tests/base.py affects
functional tests which need additional work.
Change-Id: I40e8a8410840c3774c72ae1a8054574445d66ece
When the L3 agent get a router update notification, it will try to
retrieve the router info from neutron server. But at this time, if
the message queue is down/unreachable. It will get exceptions related
message queue. The resync actions will be run then. Sometimes, rabbitMQ
cluster is not so much easy to recover. Then Long time MQ recover time
will cause the router info sync RPC never get successful until it meets
the max retry time. Then the bad thing happens, L3 agent is trying to
remove the router now. It basically shutdown all the existing L3 traffic
of this router.
This patch directly removes the final router removal action, let the
router run as it is.
Closes-Bug: #1871850
Change-Id: I9062638366b45a7a930f31185cd6e23901a43957
Seems that is_enabled_and_bind_by_default() from
neutron.common.ipv6_utils was copied directly into
oslo_utils.netutils, so start using it instead.
Trivialfix
Change-Id: I00fa441e7a20fcd1115485bb8ab75750e6a8cf07
In some deployments, the "neutron" user does not have the permissions
to modify the kernel interfaces. In those cases the radvd user should
be defined. This patch introduces a new config option: "radvd_user".
This config option is the username passed to radvd, used to drop root
privileges and change user ID to username and group ID to the primary
group of username. If no user specified (by default is an empty string),
the user executing the L3 agent will be passed. If "root" specified,
because radvd is spawned as root, no "username" parameter will be
passed.
Change-Id: Ie9a6fbf04d453a3c1c0bddf9ecaa3d4d6467e8ff
Closes-Bug: #1844688
L3 agent supports multiple external networks from a long
time ago, so remove this RPC call since it is not used.
According to codesearch of [1] and [2], we just remove
neutron built-in L3 agent RPC. For neutron server side
or RPC callback classes, the function is still remained.
[1] http://codesearch.openstack.org/?q=get_external_network_id
[2] http://codesearch.openstack.org/?q=L3RpcCallback
Change-Id: I764423e175d6e82729a647e415a9f267f495916f
Closes-Bug: #1844168
As described in the bug, when a HA router transitions from "master" to
"backup", "keepalived" processes will set the virtual IP in all other
HA routers. Each HA router will then advert it and "keepalived" will
decide, according to a trivial algorithm (higher interface IP), which
one should be "master". At this point, the other "keepalived" processes
running in the other servers, will remove the HA router virtual IP
assigned an instant before
To avoid transitioning some routers form "backup" to "master" and then
to "backup" in a very short period, this patch delays the "backup" to
"master" transition, waiting for a possible new "backup" state. If
during the waiting period (set to the HA VRRP advert time, 2 seconds
default) to set the HA state to "master", the L3 agent receives a new
"backup" HA state, the L3 agent does nothing.
Closes-Bug: #1837635
Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
In order to support out of tree interface drivers it is required
to pass a callback to allow the drivers to query information about
the network.
- Allow passing **kwargs to interface drivers
- Pass get_networks() as `get_networks_cb` kw arg
`get_networks_cb` has the same API as
`neutron.neutron_plugin_base_v2.NeutronPluginBaseV2.get_networks()`
minus the the request context which will be embeded in the callback
itself.
The out of tree interface drivers in question are:
MultiInterfaceDriver - a per-physnet interface driver that delegates
operations on a per-physnet basis.
IPoIBInterfaceDriver - an interface driver for IPoIB (IP over Infiniband)
networks.
Those drivers are a part of networking-mlnx[1], Their implementation
is vendor agnostic so they can later be moved to a more common place
if desired.
[1] https://github.com/openstack/networking-mlnx
Change-Id: I74d9f449fb24f64548b0f6db4d5562f7447efb25
Closes-Bug: #1834176
This option was deprecated since couple of releases already.
In Stein we removed 'external_network_bridge' option from L3 agent's
config so now it's time to remove also this one.
There is also new upgrade check introduced to check and warn
users if gateway_external_network_id was used in the deployment.
This patch also removes method _check_router_needs_rescheduling() from
neutron/db/l3_db.py module as it is not needed anymore.
This patch also removes unit tests:
test_update_gateway_agent_exists_supporting_network
test_update_gateway_agent_exists_supporting_multiple_network
test_router_update_gateway_no_eligible_l3_agent
from neutron/tests/unit/extensions/test_l3.py module as those
tests are not needed when there is no "gateway_external_network_id"
config option anymore.
Change-Id: Id01571cd42cfe9c5ce91e90159917c7d3c963878
Removed E125 (continuation line does not distinguish itself
from next logical line) from the ignore list and fixed all
the indentation issues. Didn't think it was going to be
close to 100 files when I started.
Change-Id: I0a6f5efec4b7d8d3632dd9dbb43e0ab58af9dff3
In case router is concurrently deleted, so the HA
state change LOG is not necessary. It sometimes
makes us confusing.
Also print the log for the pid of router
keepalived-state-change child process.
Change-Id: Id57dd787c254994af967db17647a3a28925714da
Related-Bug: #1798475
There was mock of ri._get_floatingips_bound_to_host() in
L3 test_agent unit test module. This method was removed long time
ago in [1] so this mock is not necessary anymore.
TrivialFix
[1] https://review.opendev.org/#/c/499725/
Change-Id: Ia93cab667f8154663ba62b78bc0329ee16b8202c
Change [1] removed the deprecated option external_network_bridge. Per
commit message in change [2], "l3 agent can handle any networks by
setting the neutron parameter external_network_bridge and
gateway_external_network_id to empty". So the consequence of [1] was to
introduce a regression whereby multiple external networks are not
supported by the L3 agent anymore.
This change proposes a new simplified rule. If
gateway_external_network_id is defined, that is the network that the L3
agent will use. If not and multiple external networks exist, the L3
agent will handle any of them.
[1] https://review.opendev.org/#/c/567369/
[2] https://review.opendev.org/#/c/59359
Change-Id: Idd766bd069eda85ab6876a78b8b050ee5ab66cf6
Closes-Bug: #1824571
Currently, most implementations override the L3NatAgent class itself
for their own logic since there is no proper interface to extend
RouterInfo class. This adds unnecessary complexity for developers
who just want to extend router mechanism instead of whole RPC.
Add a RouterFactory class that developer can registers RouterInfo class
and delegate it for RouterInfo creation. Seperate functions and variables
which currently used externally to abstract class from RouterInfo, so that
extension can use the basic interface.
Provide the router registration function to the l3 extension API so that
extension can extend RouterInfo itself which correspond to each features
(ha, distribtued, ha + distributed)
Depends-On: https://review.openstack.org/#/c/620348/
Closes-Bug: #1804634
Partially-Implements: blueprint openflow-based-dvr
Change-Id: I1eff726900a8e67596814ca9a5f392938f154d7b
We have a problem with SNAT with too many connections using the
same source and destination on the network nodes.
In addition we can see in the conntrack table that the who
"instert_failed" increases.
This might be a generic problem with conntrack and linux.
We suspect that we encounter the following "limitation / bug"
in the kernel.
There seems to be a workaround to alleviate this behavior by
setting the -random-fully flag in iptables for port consumption.
This patch fixes the problem by adding the --random-fully to
the SNAT rules.
Change-Id: I246c1f56df889bad9c7e140b56c3614124d80a19
Closes-Bug: #1814002
Without this mock UT related configure_ipv6 method were failing
e.g. on systems when IPv6 was disabled or on systems where such
check should be done differently, like MacOS.
Change-Id: I6d13ab1db1d5465b2ff6abf6499e0d17e1ee8bbb
All of the externally consumed variables from neutron.common.constants
now live in neutron-lib. This patch removes neutron.common.constants
and switches all uses over to lib.
NeutronLibImpact
Depends-On: https://review.openstack.org/#/c/647836/
Change-Id: I3c2f28ecd18996a1cee1ae3af399166defe9da87
When HA router is created in "stanby" mode, ipv6 forwarding is
disabled by default in its namespace.
But when router is transitioned to be "master" on node, ipv6
forwarding should be enabled. This was fine for routers with
configured gateway but we somehow missed the case when router don't
have gateway configured.
Because of that missing ipv6 forwarding setting in such case, IPv6
W-E traffic between 2 subnets was not working fine in L3 HA case.
This patch fixes it by adding configuring ipv6_forwarding on
"all" interface in router's namespace always, even if it don't have
gateway configured.
Change-Id: I8b1b2b426f7a26a4b2407a83f9bf29dd6e9ba7b0
CLoses-Bug: #1818224
Reduces E128 warnings by ~260 to just ~900,
no way we're getting rid of all of them at once (or ever).
Files under neutron/tests still have a ton of E128 warnings.
Change-Id: I9137150ccf129bf443e33428267cd4bc9c323b54
Co-Authored-By: Akihiro Motoki <amotoki@gmail.com>
This option is deprecated and marked to be deleted in Ocata. So
as we are now in Stein development cycle I think that it's good time
to remove it.
Change-Id: I07474713206c218710544ad98c08caaa37dbf53a
Removing an active or a standby HA router from an agent that has a
valid DVR serviceable port (such as DHCP), does not remove the
HA interface associated with the Router in the SNAT namespace.
When we try to add the HA router back to the agent, then it
adds more than one HA interface to the SNAT Namespace causing
more problems and we sometimes also see multiple active routers.
This bug might have been introduced by this patch [1].
Fix the problem by just adding the router namespaces without HA
interfaces when there is no HA and re-insert the HA interfaces
when HA router is bound to the agent into the namespace.
[1] https://review.openstack.org/#/c/522362/
Closes-Bug: #1816698
Change-Id: Ie625abcb73f8185bb2bee06dcd26a01d8af0b0d1
In case when L3 agent is running in dvr_snat mode on compute node,
it is like that e.g. in some of the gate jobs, it may happen that
same router is scheduled to be in standby mode on compute node and
on same compute node there is instance connected to it.
So in such case metadata proxy needs to be spawned in router namespace
even if it is in standby mode.
Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
Closes-Bug: #1817956
Closes-Bug: #1606741
Need to pass centralized floating IPs as preserve_ips to
_external_gateway_added during DVR router update.
Otherwise IP addresses will be deleted from gw device in certain case.
The case is when a router with active centralized floating IPs is
being scheduled to a new dvr_snat L3 agent (rescheduled from a down one).
Please see corresponding traces in the bug description.
Change-Id: Iaeb9fbed73144df6fcd9092c665ed19986e85f4d
Closes-bug: #1817306
If l3-agent was restarted by a regular action, such as config change,
package upgrade, manually service restart etc. We should not set the
HA port down during such scenarios. Unless the physical host was
rebooted, aka the VRRP processes were all terminated.
This patch adds a new RPC call during l3 agent init, it will try to
retrieve the HA router count first. And then compare the VRRP process
(keepalived) count and 'neutron-keepalived-state-change' count
with the hosting router count. If the count matches, then that
set HA port to 'DOWN' state action will not be triggered anymore.
Closes-Bug: #1798475
Change-Id: I5e2bb64df0aaab11a640a798963372c8d91a06a8
Today the neutron common exceptions already live in neutron-lib and are
shimmed from neutron. This patch removes the neutron.common.exceptions
module and changes neutron's imports over to use their respective
neutron-lib exception module instead.
NeutronLibImpact
Change-Id: I9704f20eb21da85d2cf024d83338b3d94593671e
The L3AgentExtension class delete_router() method expects a
dict as it's 'data' argument, but the l3-agent code that
deletes a router was passing just the router ID. Change to
correctly pass a router dictionary if one exists.
Change-Id: I112d1f8dce9defddfbd8fbfa75bf538e308e1561
Closes-bug: #1809134
With DVR routers, if a port is associated with a FloatingIP,
before it is used by a VM, the FloatingIP will be initially
started at the Network Node SNAT Namespace, since the port
is not bound to any host.
Then when the port is attached to a VM, the port gets its
host binding, and then the FloatingIP setup should be migrated
to the Compute host and the original FloatingIP in the Network
Node SNAT Namespace should be cleared.
But the original FloatingIP setup in SNAT Namespace was not
cleared by the agent.
This patch addresses the issue.
Change-Id: I55a16bcc0020087aa1abe76f5bc85cd64ccdaecd
Closes-Bug: #1796491
In case when 2 dvr routers are connected to each other with
tenant network, those routers needs to be always deployed
on same compute nodes.
So this patch changes dvr routers scheduler that it will create
dvr router on each host on which there are vms or other dvr routers
connected to same subnets.
Co-Authored-By: Swaminathan Vasudevan <SVasudevan@suse.com>
Closes-Bug: #1786272
Change-Id: I579c2522f8aed2b4388afacba34d9ffdc26708e3
For L3 DVR HA router, the centralized floating IP nat rules are not
installed in every HA node snat namespace. So, install the rules to
all the router snat-namespace on every scheduled HA router host.
Closes-Bug: #1793527
Change-Id: I08132510b3ed374a3f85146498f3624a103873d7
On l3-agent restart, prefix delegation subnets weren't always
inserted into the local router_info cache, leading to a missing
ip6tables rule. Add it when the internal network is configured
if the prefix has already been assigned.
Change-Id: Ic045e2763ba2772bcaf037591821501e84e40878
Closes-bug: #1789403
This patch contains the l3 agent extension and agent part code.
This patch introduce a new l3 agent extension named "port_forwarding",
to process the binding of the port forwarding resources, manage its own
floatingip configuration on router interface and floatingip status.
Currrently, we support all Neutron Router reference implementations.
This extension uses the period router sync task and PortForwarding OVO
rpc.
* The main idea about this new extension is using the generic router sync
rpc to maintain the host port forwarding resources,
* For a single port forwarding create/update/delete, process it one by one
in smaller scope for forbidding refresh the iptables with a larger
scope frequently.
Partially-Implements: blueprint port-forwarding
Partial-Bug: #1491317
Change-Id: Ic56e67d428f6177099c285a9d1bccabc1e710f2b
Moved the router processing queue code to the agent/common
directory and renamed it "resource processing queue". This
way it can be consumed by other agents, or possibly even
moved to neutron-lib in the future.
Change-Id: I735cf5b0a915828c420c3316b78a48f6d54035e6
radvd needs to run as root, but has the capability to drop privileges on
linux hosts. Currently, radvd process is not using this feature and
this can be considered a serious risk.
In addition, some distributions like SUSE, radvd process runs as a non
privileged user by default, causing radvd failure to daemonize
because it can't write the pid in the corresponding neutron folder and
break the IPv6 functionality.
This patch allows radvd process to run with the same user used by
neutron. In order to allow this, it changes the radvd config file
permissions to 444 because radvd doesn't allow that this file can be
writeable by self/group. The readonly mode is not a problem updating the
file because of the way the neutron_lib replace_file function handles
the files operations.
Closes-Bug: #1777922
Change-Id: Ic5d976ba71a966a537d1f31888f82997a7ccb0de
Signed-off-by: aojeagarcia <aojeagarcia@suse.com>