Sometimes with HA routers it may happen that
keepalived sets the status of a router to MASTER before
the neutron-keepalived-state-change daemon spawns "ip monitor"
to monitor changes to the IPs in the router's namespace.
In such a case the neutron-keepalived-state-change process never
notices that keepalived made the router MASTER, the L3 agent is
not notified, and the router is not configured properly.
To avoid this race condition, neutron-keepalived-state-change
now checks whether the VIP address is already configured on the
HA interface before it spawns "ip monitor". If it is already
configured by keepalived, it notifies the L3 agent that the
router has been set to MASTER.
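A condensed sketch of the new pre-check, with illustrative names
(the daemon's real code paths differ):

    import ipaddress

    def vip_already_configured(vip_cidr, ha_device_addresses):
        """True if keepalived already put the VIP on the HA interface,
        i.e. the router became MASTER before "ip monitor" started."""
        vip = ipaddress.ip_interface(vip_cidr).ip
        return any(ipaddress.ip_interface(addr).ip == vip
                   for addr in ha_device_addresses)

    # If this returns True, notify the L3 agent of MASTER first,
    # then spawn "ip monitor" as usual.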
Change-Id: Ie3fe825d65408fc969c478767b411fe0156e9fbc
Closes-Bug: #1818614
(cherry picked from commit 8fec1ffc83)
Since iptables-restore doesn't support --dport with protocol vrrp,
it errors out when setting the security groups on the hypervisor.
Marking this as a partial fix, since we still need a change to
prevent adding those incompatible rules in the first place, but this
patch will stop the bleeding.
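For illustration, a rendered rule of roughly this shape is what makes
iptables-restore abort, because --dport is only valid for protocols
that actually carry ports (tcp, udp, and friends), which vrrp does not
(the chain name and port value here are hypothetical):

    -A neutron-openvswi-i12345678 -p vrrp --dport 8080 -j RETURN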
Change-Id: If5e557a8e61c3aa364ba1e2c60be4cbe74c1ec8f
Partial-Bug: #1818385
(cherry picked from commit 8c213e4590)
When the L3 agent runs in dvr_snat mode on a compute node, as it
does e.g. in some of the gate jobs, it may happen that the same
router is scheduled in standby mode on a compute node which also
hosts an instance connected to that router.
In such a case the metadata proxy needs to be spawned in the router
namespace even if the router is in standby mode.
Conflicts:
neutron/tests/unit/agent/l3/test_agent.py
Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
Closes-Bug: #1817956
Closes-Bug: #1606741
(cherry picked from commit 6ae228cc2e)
Need to pass centralized floating IPs as preserve_ips to
_external_gateway_added during a DVR router update.
Otherwise IP addresses will be deleted from the gateway device in a
certain case: when a router with active centralized floating IPs is
being rescheduled to a new dvr_snat L3 agent (from one that went down).
Please see the corresponding traces in the bug description.
Change-Id: Iaeb9fbed73144df6fcd9092c665ed19986e85f4d
Closes-bug: #1817306
(cherry picked from commit 1ee18775a9)
The firewall no longer attempts to initialize a port on update in case
the port hasn't been initialized by sg_agent yet. This fixes a race
where an update RPC call arrives between wiring the tap device to the
integration bridge and firewall initialization.
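A minimal sketch of the guard, with illustrative names
(initialized_ports stands in for the firewall's cache of ports
already prepared via prepare_port_filter()):

    def update_port_filter(initialized_ports, port, apply_rules):
        """Ignore updates for ports the firewall has not prepared yet;
        the pending prepare_port_filter() call will see current state."""
        if port['device'] not in initialized_ports:
            return
        apply_rules(port)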
Change-Id: Ice0667df606ae23061acebceea23ab6e49dadbcf
Closes-bug: #1740885
(cherry picked from commit ed57c3de42)
In some cases our db migration tests, which run on MySQL, fail
with a timeout, and this happens due to the slow VMs the job
runs on.
Sometimes it may also happen that the timeout exception is raised
in the middle of a sqlalchemy operation, so
sqlalchemy.InterfaceError is raised as the final exception.
Details about this exception can be found in [1].
To avoid many rechecks for this reason, this patch introduces a
new decorator which is very similar to "unstable_test" but skips
the test only if one of the exceptions mentioned above is raised.
In all other cases it fails the test.
That should be a bit safer for us because we will not miss other
failures raised in those tests, while still avoiding rechecks due
to this "well-known" reason described in the related bug.
[1] http://sqlalche.me/e/rvf5
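A minimal sketch of such a decorator (names and the exact exception
list are illustrative; the real helper lives in neutron.tests.base):

    import functools

    import fixtures
    import sqlalchemy.exc

    def skip_if_timeout(bug):
        """Skip the test only for the timeout-related exceptions named
        below; any other failure still fails the test."""
        def decor(f):
            @functools.wraps(f)
            def inner(self, *args, **kwargs):
                try:
                    return f(self, *args, **kwargs)
                except (fixtures.TimeoutException,
                        sqlalchemy.exc.InterfaceError) as e:
                    self.skipTest(
                        'Skipped due to %s. Related bug: %s' % (e, bug))
            return inner
        return decor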
Conflicts:
neutron/tests/functional/db/test_migrations.py
neutron/tests/base.py
Change-Id: Ie291fda7d23a696aaa1160d126a3cf72b08c522f
Related-Bug: #1687027
(cherry picked from commit c0fec67672)
(cherry picked from commit e6f22ce81c)
In the method _generate_arp_table_and_notify_agent in the
neutron.db.l3_dvr_db module, notifications about the ARP table were
sent to only one router connected to the subnet.
Now it checks whether the subnet is connected to more than one
dvr router and sends the same notification to all such routers.
Closes-Bug: #1815913
Change-Id: I6a7d7f6645a8a7b5219788d51e17d54844d145bc
(cherry picked from commit 1f104a093c)
Oslo_concurrency needs the lock_path option; make it consistent in
the documentation for the SUSE, Red Hat and Ubuntu installation guides.
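The setting the guides converge on looks like this in neutron.conf
(the exact path is distribution-specific):

    [oslo_concurrency]
    lock_path = /var/lib/neutron/tmp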
Change-Id: Ib675d7bf399f2aa7eba9d343fa0f06281d33089a
Related-Bug: #1796976
Closes-Bug: #1812497
(cherry picked from commit 534e850392)
(cherry picked from commit 573b0be3e8)
(cherry picked from commit de9f813928)
The RouterInfo class has an internal_ports cache which is updated
in the _process_internal_ports() method.
There was an issue in this update logic: it iterated with
enumerate() over the local variable "internal_ports", which holds
the router's current ports, and when such a current port was found
in the updated_ports list it was stored in the
RouterInfo().internal_ports list under the same index at which it
was found in the "internal_ports" local variable.
This sometimes caused an issue because the same port can be stored
under different indexes in the internal_ports and
RouterInfo().internal_ports lists, so the wrong port in
RouterInfo().internal_ports was overwritten.
That in turn broke generation of the radvd config file: the ports
cache list contained duplicate info about the same port, so the
radvd config file contained duplicate interface definitions too.
This should be properly fixed by changing RouterInfo.internal_ports
to be a dict instead of a list of ports, but such a patch would be
much bigger and (possibly) harder to backport to stable branches.
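A condensed, hypothetical illustration of the indexing bug (real port
dicts carry much more state):

    updated_ids = {'b'}
    current = [{'id': 'b'}, {'id': 'a'}]   # local "internal_ports" snapshot

    # Buggy pattern: the snapshot index (0 for 'b') is used to write
    # into the cache, whose ordering differs, so 'a' is overwritten and
    # 'b' ends up twice -> duplicate radvd interface definitions.
    cache = [{'id': 'a'}, {'id': 'b'}]     # RouterInfo().internal_ports
    for index, port in enumerate(current):
        if port['id'] in updated_ids:
            cache[index] = port            # cache is now [{'id': 'b'}, {'id': 'b'}]

    # Fixed pattern: locate the cached entry by id before replacing it.
    cache = [{'id': 'a'}, {'id': 'b'}]
    for port in current:
        if port['id'] in updated_ids:
            idx = next(i for i, cp in enumerate(cache)
                       if cp['id'] == port['id'])
            cache[idx] = port              # only 'b' is replaced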
Change-Id: I2e38457942518c8a3e07e606091bb6720317b77e
Closes-Bug: #1813279
(cherry picked from commit 21cddc47b4)
The Dnsmasq driver used by the DHCP agent has a restart() method
which calls disable() and then enable()s the dnsmasq process again.
As can be observed in functional tests, from time to time it may
happen that the new dnsmasq process is started before the old process
is really down. That leads to an error because the IP address
dnsmasq wants to bind to is already in use, so it fails to start.
This patch adds the possibility to call the disable() method with a
block flag set to True. In that case the driver ensures in disable()
that the process is really no longer active.
This blocking disable() is now used in the restart() method.
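A minimal sketch of the blocking behaviour, assuming a process handle
shaped like the agent's ProcessManager (a stop() method and an active
property); the real driver uses neutron's own wait helpers:

    import time

    def blocking_disable(process, timeout=60):
        """Stop the process and poll until it is really gone, so a
        subsequent enable() can bind the address without it being
        reported as already in use."""
        process.stop()
        deadline = time.time() + timeout
        while process.active:
            if time.time() > deadline:
                raise RuntimeError(
                    'dnsmasq did not exit within %s seconds' % timeout)
            time.sleep(1)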
Change-Id: I419a451633badbc3d32edcee1945fca3e3d9f6be
Closes-Bug: #1811126
(cherry picked from commit d471a85931)
Current DHCP port management in Neutron has the server clear the
device_id while the agent is responsible for setting it.
This may cause a race condition, for example during network
rescheduling: the server aims to clear the device_id on a DHCP port
and assign the network to another agent while the old agent might
just be taking possession of the port. If the DHCP agent takes
possession of the port (i.e., updates the port to set the device_id)
before the server clears it, then there is no issue. However, if this
happens after the clear operation by the server, then the DHCP port
is updated/marked as owned by the old agent.
When the new agent takes over the network scheduled to it, it won't
be able to find a port to reuse, so an extra port might need to be
created. This leads to two issues:
1) an extra port is created and never deleted;
2) the extra port creation may fail if there are no available IP
addresses.
This patch proposes a validation check to prevent an agent from
updating a DHCP port unless the network is bound to that agent.
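A minimal sketch of the server-side check; port_update_allowed is a
hypothetical name, while get_dhcp_agents_hosting_networks is the
scheduler-mixin call the real RPC handler can rely on:

    def port_update_allowed(plugin, context, network_id, agent_host):
        """Only let an agent update a DHCP port if the network is
        currently scheduled to that agent."""
        agents = plugin.get_dhcp_agents_hosting_networks(
            context, [network_id])
        return any(agent.host == agent_host for agent in agents)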
Co-authored-by: Allain Legacy <Allain.legacy@windriver.com>
Conflicts:
neutron/api/rpc/handlers/dhcp_rpc.py
Note(elod.illes): Conflict caused by missing patch (that consumes
constants from neutron_lib), which should not be backported:
Ie4bcffccf626a6e1de84af01f3487feb825f8b65
Closes-Bug: #1795126
Story: 2003919
Change-Id: Ie619516c07fb3dc9d025f64c0e1e59d5d808cb6f
(cherry picked from commit b70ee4df88)
(cherry picked from commit b9f9c021c9)
Sometime between Liberty and Pike, adding rules to SGs got
slow, and slower with every rule. Streamline the rule create path,
and get close to the old performance back.
Two performance fixes:
1. Get rid of an n^2 duplicate check on bulk creates, using a hash
table instead (see the sketch below). This is more memory intensive
than the previous loop, but remains usable far past the point where
the other becomes too slow to be useful.
2. Use an object existence check in a few places where we do not
want to load all of the child rules.
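A condensed sketch of the hash-based duplicate check; rule_key is a
hypothetical stand-in for whatever canonical form the real code builds
per rule:

    def rule_key(rule):
        """Canonical, hashable form of a rule dict (illustrative)."""
        return tuple(sorted(rule.items()))

    def find_duplicate(rules):
        """O(n) duplicate detection for a bulk create, replacing the
        old compare-each-pair approach that was O(n^2)."""
        seen = set()
        for rule in rules:
            key = rule_key(rule)
            if key in seen:
                return rule          # duplicate found
            seen.add(key)
        return None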
Also squashed in:
Restore the tenant_id check on security group rule adds to its
previous semantic.
We had switched from swapping the tenant_id in the context to
explicitly checking the db column. Switch back, and add a test that
checks that we do not break this rather odd behavior. At least,
until we decide to fix it as a bug.
Co-Authored-By: William Hager <whager@salesforce.com>
Change-Id: I34e41a128f28211f2e7ab814a2611ce22620fcf3
Closes-bug: 1810563
(cherry picked from commit 2eb31f84c9)
(squashed patch from commit bd4c291cdf)
Bug #1244589 re-appeared for IPv6.
This change adds an ip6tables rule to fix the checksum of DHCPv6
response packets. Those checksums are left unfilled by virtio (as a
hypervisor-internal optimization), but some picky DHCP clients (AFAIU
particularly ISC dhclient) try verifying the checksums, so they fail
to acquire an address if the checksums are left incorrect.
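Roughly, the added rule is the IPv6 counterpart of the existing IPv4
checksum-fill rule, targeting the DHCPv6 client port (546); the exact
chain placement shown here is illustrative:

    ip6tables -t mangle -A POSTROUTING -p udp --dport 546 -j CHECKSUM --checksum-fill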
Change-Id: I4a045e0dcfcbd3c7959a78f1460d5bf7da0252ff
Closes-Bug: #1811639
Related-Bug: #1244589
(cherry picked from commit 26eb2509fe)
Currently any dhcp agent instance will work as an open resolver. For
deployments using publicly routed addresses for tenant networks, this
allows the agent to be abused in DDoS attacks, see [1].
By setting the `--local-service` option, dnsmasq will filter DNS
queries and reply only to queries from directly attached networks.
[1] https://bugs.launchpad.net/neutron/+bug/1501206
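In the driver this boils down to one extra entry when the dnsmasq
command line is assembled (cmd here is a stand-in for the driver's
option list):

    cmd.append('--local-service')  # answer DNS only for attached subnets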
Conflicts:
neutron/cmd/sanity_check.py
Closes-Bug: 1501206
Change-Id: I76d810aad2ce0f15a88bd798963012fa0efca74e
(cherry picked from commit 0fce3ca2c1)
When associating a floating IP to a port and the router is distributed,
the VNIC type of this port must be "normal" only. In any other case,
the floating IP can't be assigned. For example, an SR-IOV port can't
have a floating IP if the router is distributed (the router would be
in the same host as the port).
This patch also adds the function can_port_be_bound_to_virtual_bridge
to the neutron/db/l3_db.py module.
Originally this function was introduced in neutron-lib with [1], but
the stable branch uses an older neutron-lib, so it isn't available
from neutron-lib there.
[1] https://review.openstack.org/#/c/615126/
Closes-Bug: #1566951
Change-Id: I4944041df81e24683bc612560808bcdcc2db6bf2
(cherry picked from commit 1966ad3945)
(cherry picked from commit 0f14e30fa4)
(cherry picked from commit 1c573bb8b9)
When the external gateway is plugged and we enable IPv6
forwarding on it, make sure the 'all' sysctl knob is also
enabled, else IPv6 packets will not be forwarded. This
seems to only affect HA routers, which default to disabling
the 'all' knob on creation.
Also, when we are removing all the IPv6 addresses from an
HA router internal interface, set 'accept_ra' to zero so
it doesn't accidentally auto-configure an address, and set
it back to one when adding them back.
Re-homed the newly added _wait_until_ipv6_forwarding_has_state()
accordingly.
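The knobs in question, as they would be set from inside the router
namespace (the agent drives them through its sysctl helpers; <device>
is a placeholder for the internal interface name):

    sysctl -w net.ipv6.conf.all.forwarding=1
    sysctl -w net.ipv6.conf.<device>.accept_ra=0   # while removing addresses
    sysctl -w net.ipv6.conf.<device>.accept_ra=1   # when adding them back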
Conflicts:
neutron/tests/functional/agent/l3/test_ha_router.py
Closes-bug: #1787919
Change-Id: Ia1f311ee31d1479089685367a97bf13cf170b342
(cherry picked from commit b847cd02c5)
(cherry picked from commit dfedafe5f6)
If the DHCP agent's port cache is out of sync with the neutron server,
the dnsmasq entries are wrong and VMs may not acquire an IP because of
duplicate entries.
When the DHCP agent executes the port_create_end method, the port's
IP should be checked before being used; if there are duplicate IP
addresses in the same network in the cache, we should resync.
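A condensed sketch of the duplicate check (the cache interface and
dict shapes are illustrative):

    def has_duplicate_ips(cached_ports, new_port):
        """True if any cached port on the same network already uses one
        of the new port's fixed IPs; the agent should then resync."""
        new_ips = {ip['ip_address'] for ip in new_port['fixed_ips']}
        for port in cached_ports:
            if port['network_id'] != new_port['network_id']:
                continue
            if new_ips & {ip['ip_address'] for ip in port['fixed_ips']}:
                return True
        return False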
Co-Authored-By: doreilly@suse.com
Closes-Bug: #1645835
Change-Id: Icc555050283420fddfb90bb67e02bc303e989e27
The AsyncProcess.stop() method now has an additional parameter,
kill_timeout. If this is set to a value other than None,
eventlet.green.subprocess.Popen.wait() will be called with this
timeout, so a TimeoutExpired exception is raised if the process is
not killed within this "kill_timeout" time.
In that case the process is killed "again" with the SIGKILL signal
to make sure that it is gone.
This should fix the problem with failing fullstack tests, where the
ovs_agent process was sometimes not killed and the test timeout was
reached in this wait() method.
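A minimal sketch of the pattern (the standard-library subprocess is
shown for brevity; the real code goes through
eventlet.green.subprocess and the agent's own signal helpers):

    import signal
    import subprocess

    def kill_process(process, kill_signal=signal.SIGTERM,
                     kill_timeout=None):
        """Send kill_signal; if the process survives kill_timeout
        seconds, escalate to SIGKILL."""
        process.send_signal(kill_signal)
        if kill_timeout is None:
            return
        try:
            process.wait(timeout=kill_timeout)
        except subprocess.TimeoutExpired:
            process.kill()   # SIGKILL cannot be caught or ignored
            process.wait()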
Conflicts:
neutron/agent/linux/async_process.py
Change-Id: I1e12255e5e142c395adf4e67be9d9da0f7a3d4fd
Closes-Bug: #1798472
(cherry picked from commit 9b23abbdb6)
The trunk driver does not need to be initialized when the "trunk"
service plugin is not enabled.
In production environments it's not possible to rely on the
"service_plugins" config option on the L2 agent's side, so the
driver is always initialized there.
This causes problems in fullstack tests because there is a race
condition between the different ovs agents consuming events from
the Openvswitch monitor.
In fullstack tests, however, we can assume that the agents' and
server's configs are in sync, so the trunk driver is initialized
only if the "trunk" service plugin is enabled on the server side.
Change-Id: I3ad8d6e7b8f103867ee277078d03f3a01c20ac0d
Closes-Bug: #1687709
(cherry picked from commit 806cf71eb5)
The unit test
test_enable_dhcp_helper_enable_metadata_nonisolated_dist_network
modifies the global variables fake_port1 and fake_port2, creating
flakiness in unit tests that use those variables when executed in
environments with high concurrency.
Creating a deepcopy of the variables avoids propagating those
changes to other unit tests.
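The gist of the fix, with an illustrative mutation (fake_port1 is the
module-level fixture named above):

    import copy

    def test_nonisolated_dist_network(self):
        # Work on a private copy so in-place edits cannot leak into
        # other tests that share the module-level fixture.
        port = copy.deepcopy(fake_port1)
        port['device_owner'] = 'network:dhcp'  # illustrative, side-effect free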
Closes-Bug: #1809643
Change-Id: Idfd0e99739952baf4d7b545b406cd1b251deb5f8
Signed-off-by: aojeagarcia <aojeagarcia@suse.com>
(cherry picked from commit e83e5618b7)
This fixes a race condition leading to a lack of fdb entries
on the agent after an OVS restart, if the agent managed to handle all
ports before sending the state report with the start_flag set to True.
Change-Id: I943f8d805630cdfbefff9cff1fb4bce89210618b
Closes-Bug: #1808136
(cherry picked from commit 3995abefb1)
When a deployment has instance ports that are neutron trunk ports with
DPDK vhu in vhostuserclient mode, nova will delete the ovs port when
the instance reboots and then recreate it when the host comes back
from the reboot. This quick transition can trigger a race condition
that causes the tbr trunk bridge to be deleted after the port has been
recreated. See the bug for more details.
This change mitigates the race condition by adding a check for active
service ports within the trunk port deletion function.
Change-Id: I70b9c26990e6902f8888449bfd7483c25e5bff46
Closes-Bug: #1807239
(cherry picked from commit bd2a1bc6c3)
When a subnet's enable_dhcp attribute is updated, we must restart
the dhcp device. So when deciding between 'restart' and
'reload_allocations' in the refresh_dhcp_helper function, we now
compare only the cidrs of subnets which have dhcp enabled.
The previous logic called 'restart' only when deleting or adding a
subnet. This could leave the dhcp port not updated when a subnet's
enable_dhcp was updated to True.
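The decision, condensed (the subnet structures are illustrative):

    def needs_restart(old_subnets, new_subnets):
        """Restart dnsmasq when the set of DHCP-enabled cidrs changed;
        otherwise the cheaper reload_allocations is enough."""
        old = {s['cidr'] for s in old_subnets if s['enable_dhcp']}
        new = {s['cidr'] for s in new_subnets if s['enable_dhcp']}
        return old != new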
Change-Id: Ic547946ac786c5fab82b4ee7078bf86483f51eb5
Closes-Bug: #1805824
(cherry picked from commit 9aa7af8221)
When the ovs-vswitchd process is restarted, neutron-ovs-agent
handles it and reconfigures all ports and openflows in its bridges.
Unfortunately, when tunnel networks are used together with the
L2pop mechanism driver, the driver does not notice that the agent
lost all of its openflow config and will not send all the fdb
entries which should be added on the host.
In such a case the L2pop mechanism driver should behave the same way
as when neutron-ovs-agent is restarted, and send all fdb_entries to
the agent.
This patch "simulates" the agent start flag when an ovs restart is
handled, so that neutron-server sends all fdb_entries to the agent
and the tunnels' openflow rules can be reconfigured properly.
Change-Id: I5f1471e20bbad90c4cdcbc6c06d3a4412db55b2a
Closes-bug: #1804842
(cherry picked from commit ae031d1886)
It may happen that an L3 agent works in dvr_snat mode but
handles some router as a "normal" dvr router because
snat for this router is handled on another node.
In such a case we shouldn't try to get floating IP cidrs
from the snat namespace, as it doesn't exist on this host.
Change-Id: Ib27dc223fcca56030ebb528625cc927fc60553e1
Related-Bug: #1717302
(cherry picked from commit 7d0e1ccd34)
If the host OS is using an older kernel and invokes the compile_ovs
function from the DevStack OVS library (devstack/lib/ovs), that
function will try to install the kernel-dev and kernel-headers
packages even if the "build_modules" parameter is set to False.
That can fail because the kernel-* packages for the specific version
of the running kernel may no longer be present in the distro's
repository. Plus, if the kernel modules are not going to be compiled,
there's no reason to install those packages.
This patch fixes the problem by using the "build_modules" parameter
as a flag for whether or not to install those kernel-* packages.
Change-Id: I11af0e22d25973e6334e867ab2659fbdf9f10d86
Closes-Bug: #1802101
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
(cherry picked from commit cdfeeaf2bc)
The current method of specifying each rootwrap filter
in the file list is prone to errors when adding or
removing filters. Instead of relying on a manually
maintained list, this patch simply includes all files
matching the naming convention from the applicable
folder. This is simpler and easier to maintain.
Closes-Bug: #1718356
Change-Id: I7f8c55f63d1c5a85a6a92062e918426f7d2d3c35
(cherry picked from commit 45f1404c68)
With DVR routers, if a port is associated with a FloatingIP
before it is used by a VM, the FloatingIP is initially set up
in the Network Node SNAT Namespace, since the port is not bound
to any host.
Then, when the port is attached to a VM, the port gets its host
binding, the FloatingIP setup should be migrated to the Compute
host, and the original FloatingIP in the Network Node SNAT
Namespace should be cleared.
But the original FloatingIP setup in the SNAT Namespace was not
cleared by the agent.
This patch addresses the issue.
Change-Id: I55a16bcc0020087aa1abe76f5bc85cd64ccdaecd
Closes-Bug: #1796491
(cherry picked from commit cd0cc47a6a)
When two dvr routers are connected to each other via a tenant
network, those routers need to always be deployed on the same
compute nodes.
So this patch changes the dvr router scheduler so that it creates a
dvr router on each host on which there are VMs or other dvr routers
connected to the same subnets.
Co-Authored-By: Swaminathan Vasudevan <SVasudevan@suse.com>
Closes-Bug: #1786272
Conflicts:
neutron/agent/l3/agent.py
neutron/db/l3_dvr_db.py
neutron/tests/unit/agent/l3/test_agent.py
Change-Id: I579c2522f8aed2b4388afacba34d9ffdc26708e3
(cherry picked from commit 5018d70241)
(cherry picked from commit b127433f38)