When we manually move a router from one dvr_snat node to
another dvr_snat node, the snat_namespace should be removed from
the originating node by its agent and re-created on the
destination node by the destination agent.
But if the agent dies, the router_update message reaches the
agent only after it restarts. At that point the agent should
remove the snat_namespace, since the router is no longer hosted
by the current agent.
Even though the agent has logic to clean up the snat namespaces
when the gw_port_host does not match the agent's host, in this
particular use case self.snat_namespace is always set to 'None'
in the dvr_edge_router init call when the agent restarts.
This patch fixes the issue by initializing the snat namespace
object during router_init. With a valid snat namespace object,
the agent can clean up the namespace when the gw_port_host does
not match.
Change-Id: I30524dc77b743429ef70941479c9b6cccb21c23c
Closes-Bug: #1557909
(cherry picked from commit 9dc70ed77e)
DvrEdgeRouter.process_address_scope() currently assumes that
snat_iptables_manager was initialized; however, this is only done when
an external gateway is added. If a new DVR+HA router is created
without an external gateway, the l3 agent raises an exception and
does not create the router correctly. This patch adds a simple check
to make sure snat_iptables_manager is defined before it is used.
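A minimal sketch of the kind of guard described, using a hypothetical stand-in class (DvrEdgeRouterSketch is not Neutron's class). Applying address-scope iptables rules is reduced here to recording a marker; only the "is the SNAT manager defined yet?" check is modeled.

```python
class DvrEdgeRouterSketch:
    """Hypothetical stand-in for DvrEdgeRouter; only the guard is modeled."""

    def __init__(self):
        # Only populated once an external gateway is added.
        self.snat_iptables_manager = None
        self.applied = []

    def external_gateway_added(self):
        # Hypothetical: the real agent builds an iptables manager for the
        # SNAT namespace here; a placeholder object is enough for the sketch.
        self.snat_iptables_manager = object()

    def process_address_scope(self):
        # Guard: a DVR+HA router created without a gateway has no SNAT
        # iptables manager yet, so there is nothing to program in SNAT.
        if self.snat_iptables_manager is None:
            return False
        self.applied.append("address-scope rules")
        return True
```

With the guard in place, processing a gateway-less router is a no-op instead of an exception, and the rules are applied once a gateway appears.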
Closes-Bug: #1560945
(cherry picked from commit a8b6067115)
Change-Id: I677e0837956a6d008a3935d961f078987a07d0c4
For legacy routers, some iptables rules are added for the external
gateway port. Some of these rules are for shared SNAT, and some are
for floating IPs. When the user disables shared SNAT on the router
gateway, some of the iptables rules that floating IPs need are not
added to the router. This causes the reported bug: pinging a floating
IP gets a reply from the fixed IP.
The fix adds the iptables rules that floating IPs need regardless of
whether the router enables shared SNAT. A functional test is also
added for the issue.
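The split between "always needed" and "SNAT only" rules can be sketched as follows. The function name and the exact rule strings are illustrative assumptions modeled on typical gateway NAT rules, not copied from Neutron: the first rule accepts (skips NAT for) traffic that neither enters nor leaves through the gateway device and was not DNATed, which keeps fixed source addresses for traffic between fixed IPs.

```python
def gateway_nat_rules(interface_name, ex_gw_ip, enable_snat):
    """Return (chain, rule) pairs for a legacy router gateway (illustrative)."""
    # Needed by floating IPs no matter what: traffic that does not cross the
    # gateway device and was not DNATed is accepted without NAT, so
    # fixed-IP-to-fixed-IP traffic keeps its fixed source address.
    rules = [
        ("POSTROUTING",
         "! -i %(dev)s ! -o %(dev)s -m conntrack ! --ctstate DNAT -j ACCEPT"
         % {"dev": interface_name}),
    ]
    # Shared SNAT rules are added only when SNAT is enabled on the gateway.
    if enable_snat:
        rules.append(("snat", "-o %s -j SNAT --to-source %s"
                      % (interface_name, ex_gw_ip)))
    return rules
```

The design point is that `enable_snat` only controls the second group; the floating-IP rule is emitted unconditionally.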
Change-Id: I3cf4dff90f47a720a2e6a92c9ede2bc067ebd6e7
Closes-Bug: #1551530
This changes the 'plug' and 'plug_new' interfaces of the
LinuxInterfaceDriver to accept an MTU argument. It then
updates the dhcp agent and l3 agent to pass the MTU that
is set on the network that the port belongs to. This allows
it to take into account the overhead calculations that are
done for encapsulation types.
It's necessary for the L3 agent to have the MTU because it
must recognize when fragmentation is needed so it can fragment
or generate an ICMP error.
It's necessary for the DHCP agent to have the MTU so it doesn't
interfere when it plugs into a bridge with an MTU larger than 1500
(the bridge would otherwise reduce its MTU to match the agent's device).
If an operator sets 'network_device_mtu', the value of that
will be used instead to preserve previous behavior.
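The MTU selection rule described above can be sketched as a small helper. The function name is hypothetical; `network_device_mtu` stands for the operator override mentioned in the text, and `network_mtu` for the value stored on the port's network, which already accounts for encapsulation overhead.

```python
from typing import Optional


def select_device_mtu(network_mtu: Optional[int],
                      network_device_mtu: Optional[int] = None) -> Optional[int]:
    """Pick the MTU to pass to plug()/plug_new() (hypothetical helper)."""
    # An operator-set network_device_mtu wins, preserving previous behavior.
    if network_device_mtu:
        return network_device_mtu
    # Otherwise use the network's MTU, which already reflects encapsulation
    # overhead (for example 1450 for VXLAN over a 1500-byte IPv4 underlay).
    return network_mtu
```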
Closes-Bug: #1549470
Closes-Bug: #1542108
Closes-Bug: #1542475
DocImpact: Neutron agents now support arbitrary MTU
configurations on each network (including
jumbo frames). This is accomplished by checking
the MTU value defined for each network on which
they are wiring VIFs.
Co-Authored-By: Matt Kassawara <mkassawara@gmail.com>
Change-Id: Ic091fa78dfd133179c71cbc847bf955a06cb248a
Running 'tox -e py27' creates files in the local directory.
This fix ensures the /tmp directory is used instead.
Change-Id: If545b8795570eb424483f554fef8ad6170fa7ed9
Closes-Bug: #1541119
Currently the 'force_gateway_on_subnet' configuration option is set
to True by default and requires the gateway to be on the subnet. With
this fix, 'force_gateway_on_subnet' can be set to False, and a gateway
outside the subnet can be added.
Before adding the default route, a route to the gateway IP is
added. This applies to both external and internal networks.
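A sketch of the route ordering described above, expressed as `ip route` argument lists. The helper name is hypothetical and Neutron builds these routes through its own ip_lib wrappers; the point shown is that an on-link host route to an off-subnet gateway must exist before the kernel will accept a default route via that gateway.

```python
import ipaddress


def gateway_route_commands(cidr, gateway_ip, device):
    """Build `ip route` argument lists for a subnet gateway (sketch)."""
    net = ipaddress.ip_network(cidr)
    gw = ipaddress.ip_address(gateway_ip)
    base = ["ip", "-6", "route"] if gw.version == 6 else ["ip", "route"]
    cmds = []
    if gw not in net:
        # Off-subnet gateway: add a device (on-link) host route to it first,
        # otherwise the default route via this next hop is rejected.
        cmds.append(base + ["replace", "%s/%d" % (gw, gw.max_prefixlen),
                            "dev", device])
    cmds.append(base + ["replace", "default", "via", str(gw), "dev", device])
    return cmds
```

For a gateway inside the subnet only the default route is produced; for one outside it, the host route comes first.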
This configuration option is deprecated and should be removed
in a future release; a gateway outside the subnet should always
be allowed. That change is done in a separate patch:
https://review.openstack.org/#/c/277303/
Change-Id: I3a942cf98d263681802729cf09527f06c80fab2b
Closes-Bug: #1335023
Closes-Bug: #1398768
RFC4861 allows us to specify the Link MTU using IPv6 RAs.
When advertise_mtu is set in the config, this patch advertises
the link MTU in Router Advertisements.
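A minimal sketch of how an interface stanza for radvd.conf could carry the link MTU. The function and its parameters are hypothetical and this is not Neutron's radvd template; `AdvLinkMTU` is radvd's directive for the RFC 4861 MTU option.

```python
def radvd_interface_block(interface, prefixes, mtu=None, advertise_mtu=False):
    """Render a radvd.conf interface stanza (sketch, not Neutron's template)."""
    lines = ["interface %s" % interface, "{", "   AdvSendAdvert on;"]
    if advertise_mtu and mtu:
        # RFC 4861 MTU option: tells hosts on the link which MTU to use.
        lines.append("   AdvLinkMTU %d;" % mtu)
    for prefix in prefixes:
        lines.append("   prefix %s { };" % prefix)
    lines.append("};")
    return "\n".join(lines)
```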
Partially Implements: blueprint mtu-selection-and-advertisement
Closes-Bug: #1495444
Change-Id: I50d40cd3b8eabf1899461a80e729d5bd1e727f28
For networks in the same address scope, network traffic routes
directly. This happens not only between internal networks, but also
between internal network and external network. No SNAT is applied
when routing traffic to the external network because addresses on the
internal network are assumed to be viable on the external network.
For networks in different scopes, network traffic can't route
directly. Between internal networks in different scopes, traffic is
blocked. DNAT for floating IPs will still work. Also, shared SNAT to
the external network will still work as it does today.
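The routing decisions above can be summarized as a small classifier. This is a simplified, hypothetical sketch: it assumes every network belongs to some address scope (the unscoped legacy case is not modeled), and the returned labels are illustrative, not Neutron identifiers.

```python
def scope_action(src_scope, dst_scope, dst_is_external, has_floating_ip=False):
    """Classify routed traffic by address scope (simplified sketch).

    Assumes every network has an address scope; unscoped networks are
    not modeled.
    """
    if src_scope == dst_scope:
        # Same scope: addresses are assumed viable end to end, so traffic
        # routes directly with no SNAT, even toward the external network.
        return "route"
    if not dst_is_external:
        # Internal networks in different scopes: traffic is blocked.
        return "block"
    # Toward an external network in another scope: floating IP DNAT, or
    # the router's shared SNAT, still applies as before.
    return "floating-ip" if has_floating_ip else "snat"
```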
Change-Id: I439633ebef432b1a2eecee09b647207d5a271bf6
Co-Authored-By: Hong Hui Xiao <xiaohhui@cn.ibm.com>
Implements: blueprint address-scopes
A 'Duplicate iptables rule detected' warning message is sometimes
seen in the l3 agent logs.
This happens when multiple floating IPs are created on the same
node for different routers, or when a floating IP is disassociated
and re-associated to a fixed IP on the same node.
The fip namespace is retained on the compute node even after the
floating IP is disassociated, but when we re-associate it or create
a new floating IP, the l3 agent checks whether this is the 'first'
floating IP and, if so, tries to re-create the fip namespace and
the rules within it.
This happens because we unsubscribe the fip namespace count for
every associated router that we delete.
This duplicate call to create the fip namespace should be avoided
if a fip namespace already exists on the compute node, and the fip
namespace should be unsubscribed only when the external network is
removed, before the actual fip namespace is deleted.
The change proposed in this fix only unsubscribes the fip
namespace before it is deleted.
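The subscription behavior described above can be modeled with a hypothetical tracker (not Neutron's FipNamespace API). `subscribe()` reports whether this is the first subscriber, which is when the namespace and its rules must be built; deleting a router or disassociating a floating IP does not unsubscribe, so re-association does not trigger a duplicate rebuild.

```python
class FipNamespaceTracker:
    """Hypothetical model of per-external-network FIP namespace subscriptions."""

    def __init__(self):
        self._subscribers = {}  # ext_net_id -> set of router ids

    def subscribe(self, ext_net_id, router_id):
        """Return True only for the first subscription (namespace must be built)."""
        routers = self._subscribers.setdefault(ext_net_id, set())
        first = not routers
        routers.add(router_id)
        return first

    def unsubscribe_before_delete(self, ext_net_id):
        """Drop all subscriptions; called only right before namespace deletion."""
        self._subscribers.pop(ext_net_id, None)
```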
Closes-Bug: #1535928
Change-Id: I24016382091cad485f65e7753972f4b71702ff9f
This is currently a global setting applied to all managed radvd
processes. A per-process setting could be added in the future.
For large clouds, it may be useful to increase the intervals to
reduce multicast storms.
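A sketch of rendering configurable advertisement intervals into radvd directives, with the bounds RFC 4861 section 6.2.1 places on them (MaxRtrAdvInterval between 4 and 1800 seconds, MinRtrAdvInterval between 3 seconds and 0.75 times MaxRtrAdvInterval). The helper name and its default values are assumptions for illustration, not Neutron's option defaults.

```python
def radvd_interval_lines(min_interval=30, max_interval=100):
    """Render radvd interval directives after RFC 4861 bound checks (sketch)."""
    if not 4 <= max_interval <= 1800:
        raise ValueError("max_interval must be between 4 and 1800 seconds")
    if not 3 <= min_interval <= 0.75 * max_interval:
        raise ValueError("min_interval must be between 3 and 0.75 * max_interval")
    # Longer intervals mean fewer multicast RAs on large deployments.
    return ["MinRtrAdvInterval %d;" % min_interval,
            "MaxRtrAdvInterval %d;" % max_interval]
```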
Co-Authored-By: Brian Haley <brian.haley@hpe.com>
DocImpact Router advertisement intervals for radvd are now configurable
Related-Bug: #1532338
Change-Id: I6cc313599f0ee12f7d51d073a22321221fca263f
Currently, static routes for DVR routers are added to the SNAT
namespace, but not to the qrouter namespace.
Also, while configuring the static routes in the SNAT namespace,
the agent does not check whether the router has a gateway.
When routes are added to a router without a gateway, the routes
are only configured in the router namespace, but when a gateway
is set later, those routes have to be populated in the
snat_namespace as well.
This patch addresses the issues mentioned above.
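The placement rules above can be sketched as a plan of which namespaces receive a DVR router's routes. The function and the namespace labels are hypothetical; the real agent applies routes through its own namespace objects.

```python
def plan_route_updates(routes, has_gateway):
    """Map namespace label -> routes to (re)apply for a DVR router (sketch)."""
    # Routes always belong in the qrouter namespace.
    plan = {"qrouter": list(routes)}
    if has_gateway:
        # Only a router with a gateway has a SNAT namespace. This also covers
        # routes added earlier without a gateway, which must be populated
        # once the gateway is set.
        plan["snat"] = list(routes)
    return plan
```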
Closes-Bug: #1499785
Closes-Bug: #1499787
Change-Id: I37e0d0d723fcc727faa09028045b776957c75a82
In order for the l3-agent to see the RA and PD config options,
it needs to register them when it starts. I noticed this when I
tried to override an option for a test and the override didn't
take effect. The agent now passes the config down to radvd on
start so the correct values are picked up.
Change-Id: Iec0e0d16eed4f12af77fcd4f0b93b641b1146293
Related-Bug: #1532338
When there are thousands of routers attached to thousands of
networks, a sync_routers request might take a long time and time out
on the agent side, so the agent initiates another resync. This may lead
to an endless loop, overloading the server and leaving the agent unable
to sync its state.
This patch makes the l3 agent first check how many routers are assigned
to it and then fetch the routers in chunks.
The initial chunk size is set to 256 but may be decreased dynamically if
timeouts happen while waiting for a response from the server.
This approach reduces the load on the server side and speeds up
resync on the agent side, because processing starts right after the
first chunk is received.
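A sketch of chunked fetching with a shrinking chunk size. `fetch` and `ServerTimeout` are hypothetical stand-ins for the RPC call and its timeout error, and `router_ids` is assumed to have been obtained first. Halving on each timeout is one reasonable way to "decrease dynamically"; the patch may use a different rule. Yielding each chunk lets the caller start processing right after the first one arrives.

```python
class ServerTimeout(Exception):
    """Stand-in for an RPC timeout (hypothetical)."""


def iter_router_chunks(router_ids, fetch, chunk_size=256, min_chunk=1):
    """Yield routers chunk by chunk, halving the chunk size on timeouts."""
    start = 0
    while start < len(router_ids):
        chunk = router_ids[start:start + chunk_size]
        try:
            routers = fetch(chunk)
        except ServerTimeout:
            if chunk_size <= min_chunk:
                raise  # cannot shrink further; let the caller handle it
            # Server is struggling: retry this range with a smaller chunk.
            chunk_size = max(min_chunk, chunk_size // 2)
            continue
        start += len(chunk)
        yield routers
```

For example, if the server times out on requests larger than 100 routers, the first chunk is retried at 128 and then 64, and all later chunks use 64.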
Closes-Bug: #1516260
Change-Id: Id675910c2a0b862bfb9e6f4fdaf3cd9fe337e52f
When an IPv6 subnet's ipv6_ra_mode is set to DHCPV6_STATEFUL,
the hosts on that subnet rely on router advertisement for the
prefix length. This is important for subnets where the lengths
of the prefixes are not 64.
Closes-Bug: #1531093
Change-Id: Ied8d390a05ee1a2e544e39e887abf11c8a56abc3
This patch makes use of the constant defined in the extension.
In addition to having the value defined in one place, this also
makes it clear to the caller that the portbindings extension is
required.
Note: the constant is not used in the API tests, because importing
it there causes import issues.
TrivialFix
Change-Id: I7bfe2528dbbd8017ddbdcf949dbb6264ce1eb5d8
If the L3 agent fails to configure a router, commit
4957b5b435 changed the agent so that, instead of performing an
expensive full sync, only that router is reconfigured. However,
it tries to reconfigure the cached router, which is a change of
behavior from the full-sync days. The retry is more likely to
succeed if the router is retrieved from the server instead of
using the locally cached version, in case the user or operator
fixed bad input, or if the router was retrieved in a bad state
due to a server-side race condition.
Note that this is only relevant to full syncs, as those retrieve
routers from the server and queue updates with the router object.
Incremental updates are queued without router objects, so if one
of those fails, the router is always fetched from the server again
on the second attempt.
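A hypothetical model of the retry behavior described above (not Neutron's RouterProcessingQueue API). An update carries either a router object or `None`, meaning "fetch from the server". On failure, only that router is requeued, and without its cached object, so the retry sees any server-side fixes.

```python
from collections import deque


class RouterUpdateQueueSketch:
    """Hypothetical model of the agent's router update queue."""

    def __init__(self):
        self.queue = deque()

    def add(self, router_id, router=None):
        # router=None means "fetch the router from the server when processing".
        self.queue.append((router_id, router))

    def process_one(self, get_from_server, configure):
        router_id, router = self.queue.popleft()
        if router is None:
            router = get_from_server(router_id)
        try:
            configure(router)
        except Exception:
            # Retry only this router, and drop the cached object so the
            # retry re-reads it from the server.
            self.add(router_id, router=None)
            return False
        return True
```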
Related-Bug: #1494682
Change-Id: Id0565e11b3023a639589f2734488029f194e2f9d
If an exception occurs while processing a router update in the
_process_router_update method, we currently do a full_sync.
We only need to re-sync the router whose update failed.
This also addresses a TODO along similar lines in the same method.
Change-Id: I7c43a508adf46d8524f1cc48b83f1e1c276a2de0
Closes-Bug: #1494682
RFC 6106 standardizes support for Recursive DNS Server (RDNSS)
information in IPv6 Router Advertisements. RDNSS information allows
an IPv6 host to configure its DNS settings via RA messages without
needing DHCPv6 for DNS configuration.
This patch configures the radvd daemon to include RDNSS entries in
the Router Advertisements when the IPv6 subnet has dns_nameservers.
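A sketch of emitting an RDNSS statement for radvd.conf from a subnet's dns_nameservers. The helper name is hypothetical and the layout is not Neutron's template. Only IPv6 server addresses are advertised, and no RDNSS statement is produced when there are none.

```python
import ipaddress


def radvd_rdnss_lines(dns_nameservers):
    """Render an RFC 6106 RDNSS statement for radvd.conf (sketch)."""
    # RDNSS carries IPv6 addresses only; skip IPv4 nameservers.
    servers = [ns for ns in dns_nameservers
               if ipaddress.ip_address(ns).version == 6]
    if not servers:
        return []
    return ["RDNSS %s" % " ".join(servers), "{", "};"]
```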
Closes-Bug: #1495465
Change-Id: Ia516d40b1c7a83cd7046b2b7f42d1204f44288a9
The use_namespaces option was defined as a workaround for kernels
that did not properly support namespaces. This limitation is behind
us, so it's time to remove use_namespaces, deprecated in Kilo, in
order to simplify the code and remove a poorly tested case
(use_namespaces=False).
This change prepares for the removal of the pullup_route method[1],
which was only used when use_namespaces=False.
[1] neutron.agent.linux.ip_lib
DocImpact
UpgradeImpact
Closes-Bug: #1508188
Related-Bug: #1435382
Depends-On: I303038eec560a6d99421140c2822aed8b518470b
Depends-On: I4feb2a15c7e1e4bfdbed2531b18b8e7d798ab3cc
Change-Id: I2fbf65df1250d9f9f1656b3964ee3b6de1ef1118
In big and busy clusters, the rabbitmq clustering mechanism may
synchronize queues, and during this period agents connected to that
rabbitmq instance can't communicate with the server. The server then
considers them dead and moves resources away. When an agent becomes
active again, it needs to clean up stale state entries and synchronize
its state with neutron-server.
The solution is to make agents aware of their state from the
neutron-server point of view. This is done by changing state
reports from cast to call, so that they return the agent's status.
When an agent was dead and becomes alive, it receives a special
AGENT_REVIVED status indicating that it should refresh its
local data, which it would not do otherwise.
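A sketch of the agent-side handling, with the RPC modeled as a plain callable that returns a status string. The status values and class name are assumptions for illustration; only the idea that a "revived" answer to a state-report call triggers a full resync comes from the text.

```python
AGENT_ALIVE = "alive"      # hypothetical status values
AGENT_REVIVED = "revived"


class AgentStateSketch:
    """Hypothetical agent that reports state via a call, not a cast."""

    def __init__(self, report_state_call):
        # report_state_call models an RPC *call* that returns the status
        # the server has for this agent.
        self._report = report_state_call
        self.fullsync = False

    def report_state(self):
        status = self._report()
        if status == AGENT_REVIVED:
            # The server considered us dead and may have moved resources
            # away: schedule a full resync to refresh local data.
            self.fullsync = True
        return status
```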
Closes-Bug: #1505166
Change-Id: Id28248f4f75821fbacf46e2c44e40f27f59172a9
Now that we have the constant defined, we should reuse it from other
code to avoid potential typos.
Change-Id: Id7a941c1a461264ba44893d97cc6226f092e9888
neutron.agent.linux.utils:replace_file() and
neutron.common.utils:replace_file() have the same functionality.
This is the first patch in a series of four.
It modifies neutron.common.utils:replace_file()
so it can be used by all components as a replacement
for neutron.agent.linux.utils:replace_file().
A new keyword parameter, 'file_mode=0o644', is added
to neutron.common.utils:replace_file().
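A minimal sketch of an atomic replace_file() with a `file_mode` keyword, following the usual write-to-temp-then-rename pattern. It is a sketch under that assumption, not the exact Neutron code, and it uses `os.replace` (Python 3) for the final rename.

```python
import os
import tempfile


def replace_file(file_name, data, file_mode=0o644):
    """Atomically replace file_name with data (sketch of the described helper)."""
    base_dir = os.path.dirname(os.path.abspath(file_name))
    # Write to a temp file in the same directory so the final rename stays
    # on one filesystem and is atomic.
    with tempfile.NamedTemporaryFile("w", dir=base_dir, delete=False) as tmp:
        tmp.write(data)
    # NamedTemporaryFile creates the file with mode 0o600; apply the
    # requested mode before it becomes visible under the final name.
    os.chmod(tmp.name, file_mode)
    os.replace(tmp.name, file_name)
```

Readers of `file_name` therefore see either the old contents or the new contents, never a partially written file.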
Partial-bug: #1504477
Change-Id: Id1a7f1236786e8606c91bb9925cd9ac8e95892b3
This fixes a bug where an iptables rule that prevents SNAT of traffic
between fixed IPs is only added if enable_snat=true. We should add
this rule regardless of the value of enable_snat.
Without this patch, the current code breaks the following use case:
two fixed IPs behind the same router both have floating IPs associated,
and the router has enable_snat=false. When fixed IP A pings
fixed IP B, fixed IP A gets the reply from fixed IP B's floating
IP.
More details can be found in the bug description.
Change-Id: I322e8d454ef1d529ceda541fb5fe577cd70b412f
Closes-bug: #1505781
When enable_metadata_proxy is false, the agent instance does
not have a metadata_driver, so the agent should avoid using it.
Change-Id: Ia18dc5dea23de49b97c8f225532531eb9232fb51
Closes-Bug: #1510399
The dhcp/router_delete_namespaces[1] options were defined as a
workaround for an iproute2 limitation that was corrected two years ago.
That's why this change removes these options after their deprecation
in Liberty.
[1] in neutron.agent.dhcp/l3.config
DocImpact
Closes-Bug: #1508189
Related-Bug: #1418079
Change-Id: I2a879213c3b095a007a4531f430a33cea9fdf1bd
Given the context, the exception to catch here should be KeyError;
AttributeError cannot be raised here. More details can be found
in the bug report.
Change-Id: Id6351172703ac492e86475f75bf1be03f4e4e8a3
Closes-bug: #1506934
There seems to be a timing issue between the ARP entries
that arrive from the server to the agent and the creation of
the internal qr-device by the agent.
As a result, ARP entries that arrive before the device exists
cannot be applied and are dropped.
This patch makes sure that these early ARP entries are
cached in the agent and then applied when the internal
device is up.
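A hypothetical model of the caching described above (not Neutron's code). Entries for a device that does not exist yet are held per device and replayed when the device comes up, instead of being dropped. `device_exists` and `apply_entry` are stand-in callables.

```python
class ArpCacheSketch:
    """Hypothetical model: hold ARP entries until the qr- device exists."""

    def __init__(self, device_exists, apply_entry):
        self._device_exists = device_exists
        self._apply = apply_entry
        self._pending = {}  # device name -> list of (ip, mac)

    def add_arp_entry(self, device, ip, mac):
        if self._device_exists(device):
            self._apply(device, ip, mac)
        else:
            # Device not created yet: cache instead of dropping the entry.
            self._pending.setdefault(device, []).append((ip, mac))

    def device_up(self, device):
        # Replay everything cached for this device, then forget it.
        for ip, mac in self._pending.pop(device, []):
            self._apply(device, ip, mac)
```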
Closes-Bug: #1501086
Change-Id: I9ec5412f14808de73e8dd86e3d51593946d312a0
Currently the l3 agent skips the status update for floating IPs
whose status didn't change: this might be wrong if the status
changed on the server side while the agent was processing. See
the bug for details.
The l3 agent skips floating IP processing if the IP address already
exists on the external device, so we can still skip the status
update for such floating IPs.
Closes-Bug: #1505557
Change-Id: I908fe5a0555f68ab85e7d199c36a903b915e103f
An explicit call to the periodic resync after start may lead to
double syncing. See the bug for details.
Closes-Bug: #1505282
Change-Id: Ib5e481d579039b2c3e87d4f12cad1241d02fe060