When the L3 agent gets a router update notification, it tries to
retrieve the router info from the neutron server. If the message
queue is down or unreachable at that moment, it gets message-queue
related exceptions and the resync action is run. Sometimes a rabbitMQ
cluster is not easy to recover, and a long MQ recovery time means the
router info sync RPC never succeeds before the maximum retry count is
reached. Then the bad thing happens: the L3 agent tries to remove the
router, which basically shuts down all the existing L3 traffic of
this router.
This patch simply removes the final router removal action and lets
the router keep running as it is.
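The change boils down to dropping that removal call; a minimal sketch of
the resulting behaviour (illustrative only, not the actual agent code or
method names):

    def handle_resync_exhausted(router_id, log):
        # Hedged sketch: once the per-update retry limit is reached, log and
        # give up instead of removing the router and killing its traffic.
        log.warning("Could not fetch info for router %s after the maximum "
                    "number of retries; leaving its existing configuration "
                    "untouched.", router_id)
        # Previously the agent's router removal path was invoked here.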
Closes-Bug: #1871850
Change-Id: I9062638366b45a7a930f31185cd6e23901a43957
(cherry picked from commit 12b9149e20)
As described in the bug, when an HA router transitions from "master" to
"backup", the "keepalived" processes will set the virtual IP in all other
HA routers. Each HA router will then advertise it and "keepalived" will
decide, according to a trivial algorithm (higher interface IP), which
one should be "master". At that point, the other "keepalived" processes
running on the other servers will remove the HA router virtual IP
assigned an instant before.
To avoid transitioning some routers from "backup" to "master" and then
back to "backup" in a very short period, this patch delays the "backup"
to "master" transition, waiting for a possible new "backup" state. If,
during the waiting period before setting the HA state to "master" (the
HA VRRP advert time, 2 seconds by default), the L3 agent receives a new
"backup" HA state, it does nothing.
Conflicts:
neutron/agent/l3/agent.py
Closes-Bug: #1837635
Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
(cherry picked from commit 3f022a193f)
(cherry picked from commit adac5d9b7a)
Without this mock, the UTs related to the configure_ipv6 method were
failing, e.g. on systems where IPv6 was disabled or on systems where
such a check should be done differently, like MacOS.
Change-Id: I6d13ab1db1d5465b2ff6abf6499e0d17e1ee8bbb
(cherry picked from commit c20b5e347d)
In some deployments, the "neutron" user does not have permission
to modify the kernel interfaces. In those cases the radvd user should
be defined. This patch introduces a new config option: "radvd_user".
This config option is the username passed to radvd, used to drop root
privileges and change the user ID to username and the group ID to the
primary group of username. If no user is specified (the default is an
empty string), the user executing the L3 agent will be passed. If "root"
is specified, no "username" parameter will be passed, because radvd is
already spawned as root.
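As a hedged sketch, the new option roughly corresponds to an oslo.config
definition like the following (the exact help text and hosting module in
the tree may differ):

    from oslo_config import cfg

    radvd_user_opt = cfg.StrOpt(
        'radvd_user',
        default='',
        help='The username passed to radvd, used to drop root privileges '
             'and change the user ID to username and the group ID to the '
             'primary group of username. If not set, the user running the '
             'L3 agent is used; if set to "root", no username is passed '
             'to radvd.')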
Conflicts:
neutron/tests/unit/agent/l3/test_agent.py
Change-Id: Ie9a6fbf04d453a3c1c0bddf9ecaa3d4d6467e8ff
Closes-Bug: #1844688
(cherry picked from commit 6a5a75d5a6)
(cherry picked from commit 5b6b040d07)
(cherry picked from commit 9921c96218)
We have a problem with SNAT on the network nodes when too many
connections use the same source and destination.
In addition, we can see in the conntrack table that the
"insert_failed" counter increases.
This might be a generic problem with conntrack and Linux.
We suspect that we encounter the following "limitation / bug"
in the kernel.
There seems to be a workaround to alleviate this behavior by
setting the --random-fully flag in iptables for port consumption.
This patch fixes the problem by adding --random-fully to
the SNAT rules.
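For illustration, this is roughly what the flag looks like appended to an
SNAT rule; the interface name and address below are placeholders and the
real rule text in the agent may carry additional matches:

    # Build an example SNAT rule string using fully randomized port selection.
    snat_rule = ('-o %(iface)s -j SNAT --to-source %(gw_ip)s --random-fully'
                 % {'iface': 'qg-1234abcd', 'gw_ip': '203.0.113.10'})
    print(snat_rule)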
Conflicts:
neutron/agent/linux/iptables_manager.py
neutron/common/constants.py
neutron/tests/unit/agent/l3/test_agent.py
Change-Id: I246c1f56df889bad9c7e140b56c3614124d80a19
Closes-Bug: #1814002
(cherry picked from commit 30f35e08f9)
Moved the router processing queue code to the agent/common
directory and renamed it "resource processing queue". This
way it can be consumed by other agents, or possibly even
moved to neutron-lib in the future.
Conflicts:
neutron/agent/l3/agent.py
neutron/tests/unit/agent/l3/test_agent.py
Change-Id: I735cf5b0a915828c420c3316b78a48f6d54035e6
(cherry picked from commit f24f3b6b7b)
When an HA router is created in "standby" mode, IPv6 forwarding is
disabled by default in its namespace.
But when the router transitions to "master" on a node, IPv6
forwarding should be enabled. This was fine for routers with a
configured gateway, but we somehow missed the case where the router
doesn't have a gateway configured.
Because of that missing IPv6 forwarding setting, IPv6 east-west
traffic between 2 subnets was not working correctly in the L3 HA case.
This patch fixes it by always configuring ipv6_forwarding on the
"all" interface in the router's namespace, even if it doesn't have a
gateway configured.
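The sysctl knob involved, shown as a hedged standalone example run against
an example router namespace (the namespace name is a placeholder):

    import subprocess

    # Enable IPv6 forwarding on the "all" interface inside the namespace.
    subprocess.check_call([
        'ip', 'netns', 'exec', 'qrouter-example',
        'sysctl', '-w', 'net.ipv6.conf.all.forwarding=1',
    ])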
Conflicts:
neutron/tests/functional/agent/l3/framework.py
Change-Id: I8b1b2b426f7a26a4b2407a83f9bf29dd6e9ba7b0
Closes-Bug: #1818224
(cherry picked from commit b119247bea)
When the L3 agent is running in dvr_snat mode on a compute node, as is
the case e.g. in some of the gate jobs, it may happen that the same
router is scheduled in standby mode on a compute node while an instance
connected to it runs on that same node.
In such a case the metadata proxy needs to be spawned in the router
namespace even if the router is in standby mode.
Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
Closes-Bug: #1817956
Closes-Bug: #1606741
(cherry picked from commit 6ae228cc2e)
Removing an active or a standby HA router from an agent that has a
valid DVR serviceable port (such as DHCP) does not remove the
HA interface associated with the router in the SNAT namespace.
When we try to add the HA router back to the agent, it adds more than
one HA interface to the SNAT namespace, causing more problems, and we
sometimes also see multiple active routers.
This bug might have been introduced by this patch [1].
Fix the problem by adding the router namespace without HA interfaces
when there is no HA, and re-inserting the HA interfaces into the
namespace when the HA router is bound to the agent.
[1] https://review.openstack.org/#/c/522362/
Closes-Bug: #1816698
Change-Id: Ie625abcb73f8185bb2bee06dcd26a01d8af0b0d1
(cherry picked from commit d9e0bab6ac)
Need to pass centralized floating IPs as preserve_ips to
_external_gateway_added during DVR router update.
Otherwise IP addresses will be deleted from the gateway device in a
certain case: when a router with active centralized floating IPs is
being scheduled to a new dvr_snat L3 agent (rescheduled from an agent
that went down).
Please see the corresponding traces in the bug description.
Change-Id: Iaeb9fbed73144df6fcd9092c665ed19986e85f4d
Closes-bug: #1817306
(cherry picked from commit 1ee18775a9)
With DVR routers, if a port is associated with a FloatingIP
before it is used by a VM, the FloatingIP is initially
configured in the Network Node SNAT namespace, since the port
is not bound to any host.
When the port is then attached to a VM, the port gets its
host binding, the FloatingIP setup should be migrated to the
Compute host, and the original FloatingIP in the Network
Node SNAT namespace should be cleared.
But the original FloatingIP setup in the SNAT namespace was not
cleared by the agent.
This patch addresses the issue.
Change-Id: I55a16bcc0020087aa1abe76f5bc85cd64ccdaecd
Closes-Bug: #1796491
(cherry picked from commit cd0cc47a6a)
When two DVR routers are connected to each other with a
tenant network, those routers always need to be deployed
on the same compute nodes.
So this patch changes the DVR router scheduler so that it will create
a DVR router on each host that has VMs or other DVR routers
connected to the same subnets.
Co-Authored-By: Swaminathan Vasudevan <SVasudevan@suse.com>
Closes-Bug: #1786272
Change-Id: I579c2522f8aed2b4388afacba34d9ffdc26708e3
(cherry picked from commit 5018d70241)
For an L3 DVR HA router, the centralized floating IP NAT rules are not
installed in every HA node's snat namespace. So, install the rules in
the router snat namespace on every host where the HA router is scheduled.
Conflicts:
neutron/tests/common/l3_test_common.py
neutron/tests/functional/agent/l3/test_dvr_router.py
Closes-Bug: #1793527
Change-Id: I08132510b3ed374a3f85146498f3624a103873d7
(cherry picked from commit ee7660f593)
On l3-agent restart, prefix delegation subnets weren't always
inserted into the local router_info cache, leading to a missing
ip6tables rule. Add it when the internal network is configured
if the prefix has already been assigned.
Change-Id: Ic045e2763ba2772bcaf037591821501e84e40878
Closes-bug: #1789403
(cherry picked from commit d19dcf1ef2)
This patch contains the L3 agent extension and agent-side code.
It introduces a new L3 agent extension named "port_forwarding"
to process the binding of port forwarding resources, and to manage its
own floating IP configuration on the router interface as well as the
floating IP status.
Currently, we support all Neutron Router reference implementations.
This extension uses the periodic router sync task and the PortForwarding
OVO RPC.
* The main idea of this new extension is to use the generic router sync
  RPC to maintain the host's port forwarding resources.
* A single port forwarding create/update/delete is processed individually,
  in a smaller scope, to avoid frequently refreshing iptables with a
  larger scope.
Partially-Implements: blueprint port-forwarding
Partial-Bug: #1491317
Change-Id: Ic56e67d428f6177099c285a9d1bccabc1e710f2b
radvd needs to run as root, but has the capability to drop privileges on
Linux hosts. Currently, the radvd process is not using this feature,
which can be considered a serious risk.
In addition, in some distributions like SUSE the radvd process runs as a
non-privileged user by default, causing radvd to fail to daemonize
because it can't write the pid file in the corresponding neutron folder,
breaking the IPv6 functionality.
This patch allows the radvd process to run with the same user used by
neutron. In order to allow this, it changes the radvd config file
permissions to 444 because radvd doesn't allow this file to be
writeable by self/group. The read-only mode is not a problem when
updating the file because of the way the neutron_lib replace_file
function handles file operations.
Closes-Bug: #1777922
Change-Id: Ic5d976ba71a966a537d1f31888f82997a7ccb0de
Signed-off-by: aojeagarcia <aojeagarcia@suse.com>
Fix W503 (line break before binary operator) pep8 warnings
and no longer ignore new failures.
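For reference, a minimal example of the pattern being fixed:

    first_value, second_value = 1, 2

    # W503: line break *before* the binary operator.
    total = (first_value
             + second_value)

    # Accepted style after the fix: the operator stays at the end of the line.
    total = (first_value +
             second_value)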
Trivialfix
Change-Id: I7539f3b7187f2ad40681781f74b6e05a01bac474
l3-agent checks the HA state of routers when a router is updated.
To ensure that the HA state is only checked on HA routers the following
check is performed: `if router.get('ha') and not is_dvr_only_agent`.
This check should ensure that the check is only performed on
DvrEdgeHaRouter and HaRouter objects.
Unfortunately, there are cases where we have DvrEdgeRouter objects
running on 'dvr_snat' agents. E.g. when deploying a loadbalancer with
neutron-lbaas in a landscape with 6 network nodes and
max_l3_agents_per_router set to 3, it may happen that the loadbalancer
is placed on a network node that does not have a DvrEdgeHaRouter running
on it. In such a case, neutron will deploy a DvrEdgeRouter object on the
network node to serve the loadbalancer, just like it would deploy a
DvrEdgeRouter on a compute node when deploying a VM.
Under such circumstances each update to the router will lead to an
AttributeError, because the DvrEdgeRouter object does not have the
ha_state attribute.
This patch circumvents the issue by doing an additional check on the
router object to ensure that it actually has the ha_state attribute.
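A minimal sketch of the extra guard, reusing the names from the check
quoted above (not a verbatim copy of the agent code):

    def should_check_ha_state(router, router_info, is_dvr_only_agent):
        # Only consult ha_state when the local router object actually has
        # it; DvrEdgeRouter instances on a dvr_snat agent do not.
        return (router.get('ha') and
                not is_dvr_only_agent and
                hasattr(router_info, 'ha_state'))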
Change-Id: I755990324db445efd0ee0b8a9db1f4d7bfb58e26
Closes-Bug: #1755243
Move the iptables metadata marking rule earlier in
router init, so that any stray metadata requests
that arrive before the filter metadata redirect rule is
installed will just be dropped. We do this regardless
of whether we will be running the metadata proxy.
Partial-bug: #1735724
Change-Id: I8982523dbb94a7c5b8a4db88a196fabc4dd2873f
When an HA router initialization fails early, it can lead to:
AttributeError: 'HaRouter' object has no attribute 'process_monitor'
Add init of 'self.process_monitor' to the RouterInfo init code in
case we try to clean up early.
Change-Id: Iddeaeef13adee10f7b130e3f9e584b6e9f037030
Closes-bug: #1735557
Before this change, DVR_SNAT agents would get no routers when
asking for updates due to provisioning of DHCP ports on the
node they are running on. This means that there's no connectivity
between the DHCP port and the network gateway (that may be
hosted on a different node), and therefore things like DNS may
break when a VM attempts resolution when talking to the affected
DHCP port.
This change relaxes a conditional that prevented the right list of
routers from being compiled and returned from the server to the agent.
The agent on the other hand needs to make sure to allocate the
right type of router based on what is being returned from the server.
Closes-bug: #1733987
Change-Id: I6124738c3324e0cc3f7998e3a541ff7547f2a8a7
Commit I9642ed9b513a43c5558f9611f43227299707284a rehomed the
PROVISIONAL_IPV6_PD_PREFIX constant into neutron-lib. This patch
consumes it by removing the constant in neutron and using lib's version
of it instead.
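Consumers end up with something along these lines (a hedged before/after
sketch):

    # Before (neutron-internal constant):
    #   from neutron.common import constants
    #   prefix = constants.PROVISIONAL_IPV6_PD_PREFIX
    # After (neutron-lib version):
    from neutron_lib import constants as lib_constants

    prefix = lib_constants.PROVISIONAL_IPV6_PD_PREFIX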
NeutronLibImpact
Change-Id: I107cb5e0ff2f3e2c5bb9dc501f420d0be08735a0
As soon as we call router_info.initialize(), we could
possibly try and process a router. If it is HA, and
we have not fully initialized the HA port or keepalived
manager, we could trigger an exception.
Move the call to check_ha_state_for_router() into the
update notification code so it's done after the router
has been created. Updated the functional tests for this
since the unit tests are now invalid.
Also added a retry counter to the RouterUpdate object so
the l3-agent code will stop re-enqueuing the same update
in an infinite loop. We will delete the router if the
limit is reached.
Finally, have the L3 HA code verify that ha_port and
keepalived_manager objects are valid during deletion since
there is no need to do additional work if they are not.
Change-Id: Iae65305cbc04b7af482032ddf06b6f2162a9c862
Closes-bug: #1726370
While running the l3-agent tests, noticed there were
some missing mocks that were either causing exceptions
to be thrown, or generating large log messages. Clean
up what we can.
Trivialfix
Change-Id: I8f21bc6aa3a7706cb15d360356c10f0273039bff
Since ri.ex_gw_port can be None, the l3-agent can throw an
exception when looking for ports it might have in a given
network.
Change-Id: I3ab3e9c012022cd7eefa5c609ca9540649079ad3
Closes-bug: #1724043
This is needed by VPNaaS agent extension and other advanced
services in case they support L3 HA router.
Change-Id: Ice1b1c2ca97f47312e37379106ed5a6580f100dc
Needed-By: I0b86c432e4b2210e5f2a73a7e3ba16d10467f0f2
Related-Bug: #1692128
The neutron-lib commit I360545b6ee4291547e0c5c8e668ad03d3efa4725 moved
the externally consumed globals from neutron.common.constants into lib.
With the exception of PROVISIONAL_IPV6_PD_PREFIX all other constants
in neutron.common.constants should only be used in neutron, and will
hopefully remain that way. External consumers needing access to other
common constants should move them into lib first.
NeutronLibImpact
Change-Id: Ie4bcffccf626a6e1de84af01f3487feb825f8b65
Change network namespace add/delete/list code to use
pyroute2 library instead of calling /sbin/ip.
Also changed all in-tree callers to use the new calls.
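A small standalone illustration of the pyroute2 calls that replace
shelling out to /sbin/ip for namespace management (the namespace name
is a placeholder):

    from pyroute2 import netns

    netns.create('qrouter-example')    # replaces "ip netns add"
    print(netns.listnetns())           # replaces "ip netns list"
    netns.remove('qrouter-example')    # replaces "ip netns delete"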
Closes-bug: #1717582
Related-bug: #1492714
Change-Id: Id802e77543177fbb95ff15c2c7361172e8824633
With the current change in allowing the unbound fip
to be associated with the snat node, we are seeing
that all floating IPs that are associated with an
unbound port are created at the snat node.
This is also applicable for floating IPs that are
created just before associating the port to a VM.
We have seen such scenarios in the test cases.
This is the right behavior as per design. But when
the port is bound to a host, the floating IP should
be migrated to the respective host.
This patch fixes the issue by sending a notification to
the respective node when the port is bound, and also
clearing the FIP from the SNAT node.
Closes-Bug: #1718788
Change-Id: I6b1f3ffc3c3336035632f6a82d3a87b3be57b403
get_router_cidrs is overridden in dvr_edge_router. This function
is not returning the centralized floating IP CIDRs when called.
This patch fixes the issue.
Change-Id: I95a2e3b1b1474aba15a7cdceeb4ed951b672e2ca
Closes-Bug: #1712728
The agent is not currently checking the host binding
before configuring the floating IP. That leads to
floating IPs being configured on multiple hosts.
This is a partial fix on the agent side to prevent
configuring a floating IP that is not bound to
this host.
Related-Bug: #1712412
Related-Bug: #1713927
Change-Id: I1bc8c42425f97234f56412a2f109a996d9f896de
All Newton+ servers using L3RouterPlugin expose the endpoint. Even back
in Newton, when the patch that introduced the new RPC entry point landed,
there was no need for this special handling, because neutron-server
is always upgraded before agents.
TrivialFix
Change-Id: I2afa84d6b5771600068f8e98c407bbdce2f266b0
Refactoring neutron agent linux and ovsdb config opts
to be in neutron/conf/agent so that all the config options
reside in a centralized location. This simplifies the
process of looking up the config opts and provides an easy
way to import.
NeutronLibImpact
Change-Id: Ib1e0e63dec2985c417412d1ecc68e2a74ef87182
Partial-Bug: #1563069
The _get_floatingips_bound_to_host function was introduced
recently in dvr_local_router to retrieve the external
interface name for centralizing the floating IP.
This function was throwing a 'KeyError' on fip['host'] and
is not required for centralized floating IPs anymore.
get_external_device_interface_name in dvr_local_router
will try to get the 'fg' interface that is required for
the bound floating IPs to clear up some of the rules.
In the case of the centralized unbound floating IPs, the
'qg' external interface is retrieved from
get_snat_external_device_interface_name, which is defined
in 'dvr_edge_router' and based on the namespace.
So _get_floatingips_bound_to_host can be removed from
get_external_device_interface_name.
Closes-Bug: 1712412
Change-Id: I94c0a071df32f572745a2c29942956c3da9f309b
This patch makes the L3 agent update its ports' MTU when it's changed on
the core plugin side.
Related-Bug: #1671634
Change-Id: I4444da6358e8b8420a3a365e1107b02f5bb1161d
This is the agent-side patch that takes care of configuring
the centralized floating IPs for the unbound ports in the snat_namespace.
Change-Id: I595ce4d6520adfd57bacbdf20ed03ffefd0b190a
Closes-Bug: #1583694