When HA router is created in "stanby" mode, ipv6 forwarding is
disabled by default in its namespace.
But when router is transitioned to be "master" on node, ipv6
forwarding should be enabled. This was fine for routers with
configured gateway but we somehow missed the case when router don't
have gateway configured.
Because of that missing ipv6 forwarding setting in such case, IPv6
W-E traffic between 2 subnets was not working fine in L3 HA case.
This patch fixes it by adding configuring ipv6_forwarding on
"all" interface in router's namespace always, even if it don't have
gateway configured.
Conflicts:
neutron/tests/functional/agent/l3/framework.py
neutron/tests/unit/agent/l3/test_agent.py
Change-Id: I8b1b2b426f7a26a4b2407a83f9bf29dd6e9ba7b0
CLoses-Bug: #1818224
(cherry picked from commit b119247bea)
(cherry picked from commit 270912a8c7)
Sometimes in case of HA routers it may happend that
keepalived will set status of router to MASTER before
neutron-keepalived-state-change daemon will spawn "ip monitor"
to monitor changes of IPs in router's namespace.
In such case neutron-keepalived-state-change process will never
notice that keepalived set router to be MASTER and L3 agent will
not be notified about that so router will not be configured properly.
To avoid such race condition neutron-keepalived-state-change will
now check if VIP address is already configured on ha interface
before it will spawn "ip monitor". If it is already configured
by keepalived, it will notify L3 agent that router is set to
MASTER.
Change-Id: Ie3fe825d65408fc969c478767b411fe0156e9fbc
Closes-Bug: #1818614
(cherry picked from commit 8fec1ffc83)
When the external gateway is plugged and we enable IPv6
forwarding on it, make sure the 'all' sysctl knob is also
enabled, else IPv6 packets will not be forwarded. This
seems to only affect HA routers that default to disabling
this 'all' knob on creation.
Also, when we are removing all the IPv6 addresses from a
HA router internal interface, set 'accept_ra' to zero so
it doesn't accidentally auto-configure an address. Set
it back to one when adding them back.
Re-homed newly added _wait_until_ipv6_forwarding_has_state()
accordingly.
Conflicts:
neutron/tests/functional/agent/l3/test_ha_router.py
Closes-bug: #1787919
Change-Id: Ia1f311ee31d1479089685367a97bf13cf170b342
(cherry picked from commit b847cd02c5)
(cherry picked from commit dfedafe5f6)
It may happen that L3 agent works in dvr_snat mode but
it handles some router as "normal" dvr router because
snat for this router is handled on other node.
In such case we shouldn't try to get floating IPs cidrs
from snat namespace as it doesn't exists on host.
Change-Id: Ib27dc223fcca56030ebb528625cc927fc60553e1
Related-Bug: #1717302
(cherry picked from commit 7d0e1ccd34)
With DVR routers, if a port is associated with a FloatingIP,
before it is used by a VM, the FloatingIP will be initially
started at the Network Node SNAT Namespace, since the port
is not bound to any host.
Then when the port is attached to a VM, the port gets its
host binding, and then the FloatingIP setup should be migrated
to the Compute host and the original FloatingIP in the Network
Node SNAT Namespace should be cleared.
But the original FloatingIP setup in SNAT Namespace was not
cleared by the agent.
This patch addresses the issue.
Change-Id: I55a16bcc0020087aa1abe76f5bc85cd64ccdaecd
Closes-Bug: #1796491
(cherry picked from commit cd0cc47a6a)
In test test_ha_router_namespace_has_ipv6_forwarding_disabled
functional test it may happen that L3 agent will not change ipv6
forwarding and test fails because it checks that only once just
after router state is change to master.
This patch fixes that race by adding wait for 60 seconds to
ipv6 forwarding change.
Change-Id: I85a602561ebe9b7ab135913af49a3f010b09f196
Closes-Bug: #1801930
(cherry picked from commit 916e774516)
For L3 DVR HA router, the centralized floating IP nat rules are not
installed in every HA node snat namespace. So, install the rules to
all the router snat-namespace on every scheduled HA router host.
Conflicts:
neutron/tests/common/l3_test_common.py
neutron/tests/functional/agent/l3/test_dvr_router.py
Conflicts:
neutron/tests/common/l3_test_common.py
Closes-Bug: #1793527
Change-Id: I08132510b3ed374a3f85146498f3624a103873d7
(cherry picked from commit ee7660f593)
(cherry picked from commit 2a1cdf01b5)
(cherry picked from commit b93ef2f7e8)
All FloatingIP for DVR_NO_EXTERNAL agents will be configured
in the SNAT Namespace. So there is no need to configure the
address scope related routes in the router namespace when the
agent is configured as DVR_NO_EXTERNAL.
Change-Id: I009dae9e7f485641f2f19dce8dd575da04bfb044
Related-Bug: #1753434
(cherry picked from commit 7c4da6fb75)
Extra routes are not configured on Router namespaces in dvr_snat
node with DVR-HA configuration.
This patch fixes the problem.
Change-Id: If620b23564479042aa6f58640bcd6705e5eb52cf
Closes-Bug: #1797037
(cherry picked from commit 81652cd939)
Sometimes we have seen the 'fg' ports within the fip-namespace
either goes down, not created in time or getting deleted due to
some race conditions.
When this happens, the code tries to recover itself after couple
of exceptions when there is a router_update message.
But after recovery we could see that the fip-namespace is
recreated and the 'fg-' port is plugged in and active, but the
'fpr' and the 'rfp' ports are missing which leads to the
FloatingIP failure.
This patch will fix this issue by checking for the missing devices
in all router_updates.
Change-Id: I78c7ea9f3b6a1cf5b208286eb372da05dc1ba379
Closes-Bug: #1776984
(cherry picked from commit 5a7c12f245)
In case of HA routers IPv6 forwarding is not disabled by default and
then enabled only on master node.
Before this patch it was done in opposite way, so forwarding was
enabled by default and then disabled on backup nodes.
When forwarding was enabled/disabled for qg- port, MLDv2 packets are
sent and that might lead to temportary packets loss as packets to
FIP were sent to this backup node instead of master one.
Related-Bug: #1771841
Change-Id: Ia6b772e91c1f94612ca29d7082eca999372e60d6
(cherry picked from commit 3e9e2a5b4b)
Agent OVS interface code adds ports without a vlan tag,
if neutron-openvswitch-agent fails to set the tag, or takes
too long, the port will be a trunk port, receiving
traffic from the external network or any other port
sending traffic on br-int.
Also, those kinds of ports are triggering a code path
on the ovs-vswitchd revalidator thread which can eventually
hog the CPU of the host (that's a bug under investigation [1])
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1558336
Conflicts:
neutron/tests/functional/agent/test_ovs_lib.py
needed the addition of the following import:
from neutron.plugins.ml2.drivers.openvswitch.agent.common import (
constants as agent_const)
Co-Authored-By: Slawek Kaplonski <skaplons@redhat.com>
Change-Id: I024bbbdf7059835b2f23c264b48478c71633a43c
Closes-Bug: 1767422
(cherry picked from commit 88f5e11d8b)
(cherry picked from commit 2b1d413ee9)
Post-binding information about router ports is missing in results of RPC
calls made by l3 agents. sync_routers code ensures that bindings are
present, however, it does not refresh router objects before returning
them - for RPC clients ports remain unbound before the next sync and
there is no necessary address scope information present to create routes
from fip namespaces to qrouter namespaces.
Conflicts:
neutron/api/rpc/handlers/l3_rpc.py
Change-Id: Ia135f0ed7ca99887d5208fa78fe4df1ff6412c26
Closes-Bug: #1759971
(cherry picked from commit ff5e8d7d6c)
When l3 agent is restarted on a dvr_snat node that is configured
for L3_HA and has a centralized FloatingIP configured to the
qg-interface in the snat_namespace, that FloatingIP is not
re-configured to the qg-interface when agent starts.
The reason being, the cidr is not being retrieved from the
keepalived instance and only retrieved from the
centralized_fip_cidr_set.
If 'L3_HA' is configured we need to retrieve it from the keepalived
instance.
This patch fixes the problem by retrieving the cidrs from the
keepalived instance for the qg-interface.
Change-Id: I848a20d06e2d344503a4cb1776dbe2617d91bc41
Closes-Bug: #1740450
(cherry picked from commit 64028a389f)
l3-agent checks the HA state of routers when a router is updated.
To ensure that the HA state is only checked on HA routers the following
check is performed: `if router.get('ha') and not is_dvr_only_agent`.
This check should ensure that the check is only performed on
DvrEdgeHaRouter and HaRouter objects.
Unfortunately, there are cases where we have DvrEdgeRouter objects
running on 'dvr_snat' agents. E.g. when deploying a loadbalancer with
neutron-lbaas in a landscape with 6 network nodes and
max_l3_agents_per_router set to 3, it may happen that the loadbalancer
is placed on a network node that does not have a DvrEdgeHaRouter running
on it. In such a case, neutron will deploy a DvrEdgeRouter object on the
network node to serve the loadbalancer, just like it would deploy a
DvrEdgeRouter on a compute node when deploying a VM.
Under such circumstances each update to the router will lead to an
AttributeError, because the DvrEdgeRouter object does not have the
ha_state attribute.
This patch circumvents the issue by doing an additional check on the
router object to ensure that it actually has the ha_state attribute.
Closes-Bug: #1755243
Change-Id: I755990324db445efd0ee0b8a9db1f4d7bfb58e26
(cherry picked from commit 8c2dae659a)
As soon as we call router_info.initialize(), we could
possibly try and process a router. If it is HA, and
we have not fully initialized the HA port or keepalived
manager, we could trigger an exception.
Move the call to check_ha_state_for_router() into the
update notification code so it's done after the router
has been created. Updated the functional tests for this
since the unit tests are now invalid.
Also added a retry counter to the RouterUpdate object so
the l3-agent code will stop re-enqueuing the same update
in an infinite loop. We will delete the router if the
limit is reached.
Finally, have the L3 HA code verify that ha_port and
keepalived_manager objects are valid during deletion since
there is no need to do additional work if they are not.
Change-Id: Iae65305cbc04b7af482032ddf06b6f2162a9c862
Closes-bug: #1726370
(cherry picked from commit d2b909f533)
With the current change in allowing the unbound fip
to be associated with the snat node, we are seeing
that all floating IPs that are associated with an
unbound port are created at the snat node.
This is also applicable for floating IPs that are
created just before associating the port to a VM.
We have seen such scenarios in the test cases.
This is the right behavior as per design. But when
the port is bound to a host, the floating IP should
be migrated to the respective host.
This patch fixes the issue by sending notification to
the respective node, when the port is bound and also
clear the fip from the snat node.
Closes-Bug: #1718788
Change-Id: I6b1f3ffc3c3336035632f6a82d3a87b3be57b403
(cherry picked from commit 27fcf86bcb)
When HA is enabled with DVR routers the centralized floating
IPs are not configured properly in the DVR snat namespace
for the master router namespace.
The reason is we were not calling the add_centralized_floatingip
and the remove_centralized_floatingip in the DvrEdgeHaRouter
class.
This patch overrides the add_centralized_floatingip and
remove_centralized_floatingip in dvr_edge_ha_router.py file
to add the cidr to the vips.
Closes-Bug: #1716829
Change-Id: Icc8c5d4e22313448e2066a29dbe509e4345b364c
(cherry picked from commit b9ecb3804c)
We thought _get_floatingips_bound_to_host is not needed but removing the
method caused sending garps for fip that doesn't belong to node during
the full-sync.
This patch just replaces dict lookup with get() method, so fips are
filtered based on presence on the host and if host is not set on fip, it
won't raise a KeyError.
Note: This patch hasn't been merged in master yet because of KeyError happening in grenade job now.
Co-Authored-By: Swaminathan Vasudevan <SVasudevan@suse.com>
Related-bug: #1712412
Related-bug: #1713927
Change-Id: I0fbc772d757fb13b788f9df8d6d7d28d288d054a
_get_floatingips_bound_to_host function was introduced
recently in dvr_local_router to retrieve the external
interface name for centralizing the floatingip.
This function was throwing a 'KeyError' on fip['host'] and
not required for centralized floatingips anymore.
The get_external_device_interface_name in dvr_local_router
will try to get the 'fg' interface that is required for
the bound floating-ips to clear up some of the rules.
In the case of the centralized unbound floating-ips, the
'qg' external interface is retreived from
get_snat_external_device_interface_name that is defined
in 'dvr_edge_router' and based on the namespace.
So _get_floatingips_bound_to_host can be removed from
get_external_device_inteface_name.
Closes-Bug: 1712412
Change-Id: I94c0a071df32f572745a2c29942956c3da9f309b
(cherry picked from commit 47fbc6157a)
DVR supports both East/West and North/South routing. While the
SNAT is centralized the DNAT is mostly distributed. There are
certain circumstances where the DNAT might be centralized when
the ports are unbound.
In order to have a well defined behavior and when there are
no external network connectivity available in the compute host,
the DNAT functionality is centralized.
In order to achieve this we are introducing a new agent type
option 'dvr_no_external' to centralize the DNAT.
This new L3 agent type ('dvr_no_external') would only allow the East/West
routing to occur in the compute host and the DNAT or Floating IP will be
configured in the centralized network node.
Change-Id: Ia5d7336e478e0fa5ba62b7ae5ed0c56656116d94
Partial-Bug: #1667877
(cherry picked from commit 9515c771e7)
This patch is the agent side patch that takes care of configuring
the centralized floatingips for the unbound ports in the snat_namespace.
Change-Id: I595ce4d6520adfd57bacbdf20ed03ffefd0b190a
Closes-Bug: #1583694
Router update task fails when agent restarts with DVR routers
as it was failing adding an IP rule to the namespace.
The IP rule matching code was not finding a match for
a rule for an interface since we are not specifying an
IP address, but the resulting rule does have the "any" IP
address in its output, for example, 0.0.0.0/0. Change
to always supply the IP address.
Change-Id: Ic2e80ebb59ac9e0e0063e5f6e69f3d66abe775a1
Closes-Bug: #1702790
In reviews we usually check import grouping but it is boring.
By using flake8-import-order plugin, we can avoid this.
It enforces loose checking so it sounds good to use it.
This flake8 plugin is already used in tempest.
Note that flake8-import-order version is pinned to avoid unexpected
breakage of pep8 job.
Setup for unit tests of hacking rules is tweaked to disable
flake8-import-order checks. This extension assumes an actual file exists
and causes hacking rule unit tests.
Change-Id: Ib51bd97dc4394ef2b46d4dbb7fb36a9aa9f8fe3d
In some unit and functional tests self.assertTrue(False) was used
instead of self.fail(), which might be against readability.
Using assertTrue(False) gives the following message on fail:
File "C:\Python361\lib\unittest\case.py", line 678, in assertTrue
raise self.failureException(msg)
AssertionError: False is not true
After replacing assertTrue(False) with fail():
File "C:\Python361\lib\unittest\case.py", line 666, in fail
raise self.failureException(msg)
AssertionError: None
Although the results are the same (both tests failed), the
message 'False is not true' is unnecessary, and can be omitted
from the log by using fail().
Change-Id: I81e21040fd6a2f9713889912fafd2b19bd056b5a
In Python 3 str is unicode by default and hence there is no unicode
global function. This switches our use of it to to six.u() which handles
the py2/py3 difference under the covers.
Change-Id: I3cfff2fe8e07e2a9ed8b89c93d24351b1f440b00
When we create agent gateway port on all the nodes irrespective
of the floatingips we can basically use that agent gateway port to
forward traffic in and out of the nodes if the address_scopes match,
since we don't need SNAT functionality if address scopes match.
If a gateway is configured and if it has internal ports that belong
to the same address_scopes then no need to add the redirect rules.
At the same we should also add a static route in the fip namespace
for every interface that is connected to the router that belongs to
the same address scope.
Change-Id: I617e2fc5a70852c6f2e925ac7244f2a205d60de4
Closes-Bug: #1577488
Doing so causes a eventlet.sleep(MagicMock()) call, which results
in comparing MagicMock with float and generates a TypeError in python3.
Change-Id: I970c215433fda05fa4570b2250a36aab4c18c616
This reverts commit fb2093c365.
This patch started spamming logstash like crazy with ERRORs.
Closes-Bug: #1693539
Change-Id: I81627f1bac1b981f930b66c126abd8285653bf49
Currently when there is an IPv6 gateway set, IPv6 forwarding
configuration is skipped. This patch fixes it and also adds test
coverage to check the values of `accept_ra` and `forwarding` with
and without an IPv6 gateway.
Related-bug: #1667756
Change-Id: I0297aa6d0afeb56a8c69be41424d4b70509377d2
When we create agent gateway port on all the nodes irrespective
of the floatingips we can basically use that agent gateway port to
forward traffic in and out of the nodes if the address_scopes match,
since we don't need SNAT functionality if address scopes match.
If a gateway is configured and if it has internal ports that belong
to the same address_scopes then no need to add the redirect rules.
At the same we should also add a static route in the fip namespace
for every interface that is connected to the router that belongs to
the same address scope.
Change-Id: Iaf6d3b38b1fb45772cf0b88706586c057ddb0230
Closes-Bug: #1577488
The callback modules have been available in neutron-lib since commit [1]
and are ready for consumption.
As the callback registry is implemented with a singleton manager
instance, sync complications can arise ensuring all consumers switch to
lib's implementation at the same time. Therefore this consumption has
been broken down:
1) Shim neutron's callbacks using lib's callback system and remove
existing neutron internals related to callbacks (devref, UTs, etc.).
2) Switch all neutron's callback imports over to neutron-lib's.
3) Have all sub-projects using callbacks move their imports over to use
neutron-lib's callbacks implementation.
4) Remove the callback shims in neutron-lib once sub-projects are moved
over to lib's callbacks.
5) Follow-on patches moving our existing uses of callbacks to the new
event payload model provided by neutron-lib.callback.events
This patch implements #2 from above, moving all neutron's callback
imports to use neutron-lib's callbacks.
There are also a few places in the UT code that still patch callbacks,
we can address those in step #4 which may need [2].
NeutronLibImpact
[1] fea8bb64ba7ff52632c2bd3e3298eaedf623ee4f
[2] I9966c90e3f90552b41ed84a68b19f3e540426432
Change-Id: I8dae56f0f5c009bdf3e8ebfa1b360756216ab886
Without this commit, the run_as_root parameter is always True when
stopping a process, which leads to the usage of unnecessary sudo such as
in some functional tests, like the keepalived ones.
This commit fixes the aforemetioned problem by taking run_as_root into
account when stopping a process. However, run_as_root will still always
be True if the process is spawned in a netns.
Closes-Bug: #1491581
Change-Id: Ib40e1e3357b9a38e760f4e552bf615cdfd54ee5a
Signed-off-by: Hunt Xu <mhuntxu@gmail.com>
according to https://wiki.openstack.org/wiki/Python3,
now we should avoid using six.iteritems and replace
it with dict.items.
Change-Id: I8753e80b34c0f86cf70aebc3bcbd3392ee933f62
Partial-Bug: #1680761
In order to route traffic between the internal subnets and the
external subnet that belong to the same address_scopes we need
to create the gateway port and the fip namespace irrespective of
the configured floatingips for the internal subnet.
This will consume an additional IP from the external subnet on
all nodes, but with the introduction of service_type networks,
this will not be an issue any more.
This patch is the first in series that creates the agent gateway
port and the fip namespace on every node when the gateway is set
for the router. For every router created it will connect the
router namespace to the fip namespace.
Partial-Bug: #1577488
DocImpact: Document the change in behavior for fip-agent-gw create
Change-Id: I30c4f7fc250e486fe9a71b68540e783e90a6cf15
Neutron-lib 1.1.0 is now out and contains the portbindings
API definition (as per commit [1]). This patch moves neutron
references over to the neutron-lib version.
NeutronLibImpact
- Consumers using the public constants within neutron's
portbindings API extension must now use the values
from neutron-lib.
[1] 87e42f993c07ae320159d5123662ee9f3bd4d903
Change-Id: I669af9b4c712877772d91a03857ab108714001d4
Neutron does not disable ipv6 forwarding for HA routers and it's
enabled by default in all router namespaces. For ipv6, this means
that it will automatically join the following groups:
* link-local all-routers multicast group (ff02::2)
* interface-local all routers multicast group (ff01::2)
* site-local all routers multicast group (ff05::2))
As a side effect it will answer to multicast listener queries, thus
causing external switch to learn its MAC address and disrupting traffic
to the master instance.
This patch will enable ipv6 forwarding on the gateway interface only
for master instances and disable it otherwise to fix the issue.
Also, the accept_ra procfs entry was enabled under certain
circumstances but it wasn't disabled otherwise. This patch, will
disable RA on the gateway interface for non master instances.
Closes-Bug: #1667756
Change-Id: I9bc890b43f750cad68fc67f4c79f1426c3506863
Refactoring Neutron configuration options for agent common config to be
in neutron/conf/agent/common. This will allow centralization of all
configuration options and provide an easy way to import.
Partial-Bug: #1563069
Change-Id: Iebac0cdd3bcfd0135349128921b7ad7a1a939ab8
Needed-By: Ib676003bbe909b5a9013a3178b12dbe291d936af