In order to reduce the number of elements retrieved from the DB, this
patch deletes the records belonging to any unconfigured physical
network before processing the VLAN allocations per physical network.
The VLAN records per physical network are deleted using a bulk delete
operation, to speed up the process.
The missing VLAN records per network are now created using a bulk
insert operation available in the ORM. This bulk operation speeds up
the sync process.
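A minimal sketch of the two bulk operations, assuming SQLAlchemy and a
simplified VlanAllocation model (names are illustrative, not the real
ML2 type driver code):

    import sqlalchemy as sa
    from sqlalchemy.ext import declarative

    Base = declarative.declarative_base()

    class VlanAllocation(Base):
        # Simplified stand-in for the ml2_vlan_allocations table.
        __tablename__ = 'ml2_vlan_allocations'
        physical_network = sa.Column(sa.String(64), primary_key=True)
        vlan_id = sa.Column(sa.Integer, primary_key=True)
        allocated = sa.Column(sa.Boolean, default=False)

    def sync_allocations(session, configured_physnets, missing):
        # Bulk delete: one statement removes every allocation whose
        # physical network is no longer configured.
        session.query(VlanAllocation).filter(
            ~VlanAllocation.physical_network.in_(configured_physnets)
        ).delete(synchronize_session=False)
        # Bulk insert: create all missing allocations at once instead
        # of issuing one INSERT per VLAN ID.
        session.bulk_insert_mappings(
            VlanAllocation,
            [{'physical_network': net, 'vlan_id': vid,
              'allocated': False} for net, vid in missing])
        session.commit()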
Conflicts:
neutron/plugins/ml2/drivers/type_vlan.py
Change-Id: I8568e2277e157754aaff87a059a40e34e6a43e2b
Partial-Bug: #1862178
(cherry picked from commit 016e7826f1)
(cherry picked from commit 651eb12bec)
(cherry picked from commit 4fff732b76)
Retrieving the SG rules now uses the admin context. This allows
retrieving all possible rules, regardless of the calling user. The
filters passed and the RBAC policies then narrow those results,
returning only:
- The SG rules belonging to the user.
- The SG rules belonging to a SG owned by the user.
However, if the SG list is very long, the query can take a lot of time.
Instead, the filtering is now done in the DB query itself. If no
filters are passed to "get_security_group_rules" and the context is
not the admin context, only the rules described above are retrieved.
Because overriding the method "get_objects" is too complex, an
intermediate query is done to retrieve the SG rule IDs. Those IDs
are then used as a filter in the "get_objects" call.
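A sketch of the two-step filtering, with simplified stand-ins for the
Neutron models and the OVO "get_objects" call:

    import sqlalchemy as sa
    from sqlalchemy.ext import declarative

    Base = declarative.declarative_base()

    class SecurityGroup(Base):
        __tablename__ = 'securitygroups'
        id = sa.Column(sa.String(36), primary_key=True)
        project_id = sa.Column(sa.String(255))

    class SecurityGroupRule(Base):
        __tablename__ = 'securitygrouprules'
        id = sa.Column(sa.String(36), primary_key=True)
        project_id = sa.Column(sa.String(255))
        security_group_id = sa.Column(
            sa.String(36), sa.ForeignKey('securitygroups.id'))

    def rule_ids_for_project(session, project_id):
        # Intermediate query: fetch only the IDs of the rules the
        # caller may see, i.e. its own rules or the rules of the SGs
        # it owns.
        query = session.query(SecurityGroupRule.id).join(
            SecurityGroup,
            SecurityGroup.id == SecurityGroupRule.security_group_id)
        query = query.filter(sa.or_(
            SecurityGroupRule.project_id == project_id,
            SecurityGroup.project_id == project_id))
        return [row.id for row in query]

    # The IDs then become an ordinary filter for the generic getter:
    #   SecurityGroupRule.get_objects(admin_context, id=rule_ids)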
Conflicts:
neutron/objects/securitygroup.py
neutron/tests/unit/db/test_securitygroups_db.py
neutron/tests/unit/objects/test_securitygroup.py
Closes-Bug: #1863201
Change-Id: I25d3da929f8d0b6ee15d7b90ec59b9d58a4ae6a5
(cherry picked from commit d874c46bff)
(cherry picked from commit d3905264b7)
(cherry picked from commit 61dc621c1b)
iproute2 version 4.15 added support for the chain index in tc_filter
[1]. This version is available e.g. in Ubuntu 18.04, and the l3_tc_lib
regex has to account for it in order to properly match the output of
the "tc filter" command.
[1] https://lwn.net/Articles/745643/
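An illustration of the regex change; the sample lines and the pattern
are assumptions, not the exact l3_tc_lib code:

    import re

    # Since iproute2 4.15 the filter dump may include a
    # "chain <index>" token that the old pattern did not expect.
    OLD = ('filter protocol ip u32 fh 800::800 '
           'order 2048 key ht 800 bkt 0 flowid :1')
    NEW = ('filter protocol ip u32 chain 0 fh 800::800 '
           'order 2048 key ht 800 bkt 0 flowid :1')

    # Making the chain token optional lets one pattern match both.
    FILTER_RE = re.compile(
        r'filter protocol ip u32 (?:chain \d+ )?fh (?P<fh>\S+)')

    for line in (OLD, NEW):
        print(FILTER_RE.search(line).group('fh'))  # '800::800' twice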
Closes-bug: #1809497
Change-Id: Id4066b5cff933ccd0dd3c751bf67b5d58af662d1
(cherry picked from commit e788d29458)
As described in the bug, when a HA router transitions from "master" to
"backup", the "keepalived" processes in all the other HA routers will
set the virtual IP. Each HA router will then advertise it, and
"keepalived" will decide, according to a trivial algorithm (higher
interface IP), which one should be "master". At this point, the
"keepalived" processes running in the other servers will remove the HA
router virtual IP assigned an instant before.
To avoid transitioning some routers from "backup" to "master" and then
back to "backup" in a very short period, this patch delays the "backup"
to "master" transition, waiting for a possible new "backup" state. If,
during the waiting period (set to the HA VRRP advert time, 2 seconds by
default) before setting the HA state to "master", the L3 agent receives
a new "backup" HA state, the L3 agent does nothing.
Conflicts:
neutron/agent/l3/agent.py
Closes-Bug: #1837635
Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
(cherry picked from commit 3f022a193f)
(cherry picked from commit adac5d9b7a)
Increase the waiting time in "test_reschedule_network_on_new_agent",
when the network is being rescheduled from a dead DHCP agent to a
running one.
According to [1], 120 seconds is a conservative but sufficient value
to wait for this transition.
[1] https://bugs.launchpad.net/neutron/+bug/1799555/comments/23
Change-Id: I2fec6015b56fd1b5d21b75f7432c40b2110fe6bc
Related-Bug: #1799555
(cherry picked from commit d15ad2e481)
After spawning the "dnsmasq" process in the method
"Dnsmasq._spawn_or_reload_process", we need to check that the "dnsmasq"
process is running and can be detected by the ProcessManager instance
controlling it.
ProcessManager considers a process "active" if:
- The network ID is in the cmdline used to execute the process.
- The process is detected by psutil.Process(pid), which returns the
  cmdline needed for the first condition.
- The PID file exists; it is written by the dnsmasq process once it
  is started and is needed for the second condition.
To make this feature available for any other process using
ProcessManager, the implementation is done in this class.
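A condensed sketch of the three conditions (the helper name and
arguments are illustrative, not the ProcessManager API):

    import psutil

    def process_is_active(pid, network_id):
        # "pid" comes from the PID file written by dnsmasq once it
        # starts (third condition).
        if pid is None:
            return False
        try:
            # Second condition: psutil can see the process and return
            # its cmdline.
            cmdline = ' '.join(psutil.Process(pid).cmdline())
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            return False
        # First condition: the network ID is in the cmdline used to
        # execute the process.
        return network_id in cmdline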
Change-Id: I51dc9d342c613afcbcfdc50a1d2811502748f170
Closes-Bug: #1849502
(cherry picked from commit 7c5ce50a0c)
(cherry picked from commit ce3f2f7d26)
(cherry picked from commit 2d8613e3c4)
The DHCP agent prioritizes RPC messages based on the priority field
sent by neutron-server, but then groups them all in the same
dhcp_ready_ports set when sending them back to the server to clear
the provisioning block(s). Priority should be given to new and changed
ports, since those are most likely to be associated with new
instances, which can fail to boot if they are not handled quickly when
the agent is very busy, for example right after it was restarted.
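A sketch of the idea, with illustrative names standing in for the
agent-side bookkeeping:

    class ReadyPorts(object):
        def __init__(self, batch_size=64):
            self._batch_size = batch_size
            self._priority = set()  # ports from new/updated events
            self._other = set()     # e.g. ports re-sent after restart

        def add(self, port_id, high_priority):
            (self._priority if high_priority else self._other).add(
                port_id)

        def next_batch(self):
            # Drain high-priority ports first so new instances get
            # their provisioning blocks cleared quickly on a busy
            # agent.
            batch = set()
            for source in (self._priority, self._other):
                while source and len(batch) < self._batch_size:
                    batch.add(source.pop())
            return batch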
Conflicts:
neutron/tests/unit/agent/dhcp/test_agent.py
Change-Id: Ib5074abadd7189bb4bdd5e46c677f1bfb071221e
Closes-bug: #1864675
(cherry picked from commit 113dfac608)
When "trunk:subport" wasn't added to the list of device owners which
are supported by dvr, there was no proper config in br-int's openflow
rules for such port, e.g. there was no dvr_to_src_mac rule in table 1
added and traffic from such port was never going through br-int.
Trunk ports should be added to this dvr serviced device owners list and
that patch is adding it there.
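A simplified sketch of the check (the constants mirror Neutron's
device_owner strings, but the helper itself is illustrative):

    DEVICE_OWNER_COMPUTE_PREFIX = 'compute:'
    DVR_SERVICED_DEVICE_OWNERS = (
        'network:dhcp',
        'trunk:subport',  # the addition: subports now get DVR flows
    )

    def is_dvr_serviced(device_owner):
        # DVR installs the dvr_to_src_mac rules only for serviced
        # owners.
        return (device_owner.startswith(DEVICE_OWNER_COMPUTE_PREFIX)
                or device_owner in DVR_SERVICED_DEVICE_OWNERS)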
Conflicts:
neutron/common/utils.py
Change-Id: Ic21089adfa32dbf5d0e29a89713e6e2bf28f0f05
Closes-Bug: #1870114
(cherry picked from commit d0a1652227)
DPDK vhostuser mode (DPDK/vhu) means that when an instance is powered
off the port is deleted, and when an instance is powered on a port is
created. This means a reboot is functionally a super fast
delete-then-create. Neutron trunking mode in combination with DPDK/vhu
implements a trunk bridge for each tenant, and the ports for the
instances are created as subports of that bridge. The standard way a
trunk bridge works is that when all the subports are deleted, a thread
is spawned to delete the trunk bridge, because that is an expensive and
time-consuming operation. That means that if the port in question is
the only port on the trunk on that compute node, this happens:
1. The port is deleted
2. A thread is spawned to delete the trunk
3. The port is recreated
If the trunk is deleted after #3 happens then the instance has no
networking and is inaccessible; this is the scenario that was dealt with
in a previous change [1]. But there continue to be issues with errors
"RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X". What is
happening in this case is that the trunk is being deleted in the middle
of the execution of #3, so that it stops existing in the middle of the
port creation logic but before the port is actually recreated.
Since this is a timing issue between two different threads it's
difficult to stamp out entirely, but I think the best way to do it is to
add a slight delay in the trunk deletion thread, just a second or two.
That will give the port time to come back online and avoid the trunk
deletion entirely.
[1] https://review.opendev.org/623275
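A sketch of the delay, assuming an OVSBridge-like object (the grace
period and names are illustrative):

    import time

    TRUNK_DELETE_GRACE = 2  # seconds: covers a fast delete-then-create

    def delete_trunk_bridge_if_still_empty(bridge):
        # Give a rebooting DPDK/vhu instance time to recreate its
        # subport; if a port came back, the bridge is in use again
        # and must be kept.
        time.sleep(TRUNK_DELETE_GRACE)
        if not bridge.get_port_name_list():
            bridge.destroy()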
Related-Bug: #1869244
Change-Id: I36a98fe5da85da1f3a0315dd1a470f062de6f38b
(cherry picked from commit e37722c0f5)
When users run keepalived on their instances, they often create an
additional port in Neutron to allocate an IP address which is then
used as a VIP in keepalived and configured in the
allowed_address_pairs of the other ports plugged into the instances
running keepalived.
This is e.g. Octavia's use case.
Together with DVR this caused connectivity problems to such a VIP, as
it was populated in the router's ARP cache with the MAC address from
the Neutron DB.
As this port isn't bound, it is only a Neutron DB entry, so there is
no need to set it in the ARP cache of the router.
This patch does exactly that: it filters such "unbound" and
"binding_failed" ports out of the list.
Conflicts:
neutron/tests/unit/db/test_l3_dvr_db.py
Change-Id: Ia885ce00dbb5f2968859e8d0850bc511016f0846
Closes-Bug: #1869887
(cherry picked from commit eb775458c6)
Skip the "order by" clause in the port query when fixed_ips are not
required, and avoid some needless queries for the DVR subnet MAC.
Conflicts:
neutron/db/db_base_plugin_common.py
neutron/db/l3_dvr_db.py
Closes-Bug: #1834308
Change-Id: I6836840edcaa5a21fd2ba9f65ffd24f7e5038fa3
(cherry picked from commit dd96f37759)
(cherry picked from commit 2d0adf4a05)
Create a method for bulk assignment of IP addresses within the ipam
driver, to support bulk creation of ports.
This also changes the logic for how the window of available IP
addresses to assign from is calculated within the neutrondb IPAM
driver. The Python random module is used to produce a statistically
sampled set of IP addresses out of the set of available IPs; this
facilitates collision avoidance. When requesting multiple IP
addresses, the selection window size is increased significantly to
ensure a larger number of available IPs, but caps are placed on it to
make sure we do not transgress system limits when building pools of
IPv6 addresses.
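A sketch of the sampling idea; the window sizing numbers are
illustrative, not the driver's real constants:

    import itertools
    import random

    def pick_ips(available_ips, num_requested):
        # Widen the window for bulk requests, but cap it so huge IPv6
        # pools do not blow up memory.
        window = min(max(num_requested * 10, 10), 1000)
        candidates = list(itertools.islice(available_ips, window))
        if len(candidates) < num_requested:
            raise ValueError('not enough free IP addresses')
        # Statistical sampling spreads concurrent allocators across
        # the window, reducing the chance that two of them pick the
        # same address.
        return random.sample(candidates, num_requested)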
Change-Id: Iad8088eaa261b07153fa358ae34b9a2442bc2a3e
Implements: blueprint speed-up-neutron-bulk-creation
(cherry picked from commit 06e38be42e)
Do not flood packets to the bridge: since we have the bridge port
list, we can add a simple direct flow to the right port only.
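A sketch of such a flow, assuming an OVSBridge-like "br" object; the
table and priority values are illustrative:

    def add_direct_flow(br, table, mac, ofport):
        # With the bridge port list known, output straight to the
        # right port instead of flooding the packet everywhere.
        br.add_flow(table=table,
                    priority=10,
                    dl_dst=mac,
                    actions='output:%d' % ofport)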
Conflicts:
neutron/agent/linux/openvswitch_firewall/firewall.py
neutron/conf/plugins/ml2/drivers/ovs_conf.py
Closes-Bug: #1732067
Related-Bug: #1841622
Change-Id: I14fefe289a19b718b247bf0740ca9bc47f8903f4
(cherry picked from commit efa8dd0895)
For VLAN type networks, we add a segment match flow to the OpenFlow
security group ingress table. The packets are then recorded in the
conntrack table, and the reply packets can be processed properly.
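A sketch of such a flow; the table numbers are hypothetical, not the
firewall driver's real layout:

    def add_vlan_ingress_conntrack_flow(br, vlan_tag):
        # Match the VLAN segment so ingress packets are sent through
        # conntrack; reply packets will then hit an ESTABLISHED entry.
        br.add_flow(table=72,
                    priority=90,
                    dl_vlan=vlan_tag,
                    actions='ct(table=73)')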
Conflicts:
doc/source/contributor/internals/openvswitch_firewall.rst
Change-Id: Ieded0654d0ad16235ec923b822dcd842bd7735e5
Closes-Bug: #1831534
(cherry picked from commit aa58542e82)
The master branch now has requirements incompatible with the stable
branch constraints [0], so this switches to the last compatible tag.
A longer term fix will be to run rally in a virtualenv, but this is
lower priority than other tasks. This can be revisited then, though
for older branches we will probably stick with the pinned version.
[0] 5776e015f1
Change-Id: I9ec8331bae54d191955a843f9d3f8c5537dc37f6
Closes-Bug: #1868691
If a user specifies a header in their request for metadata, it could
override what the proxy would have inserted on their behalf. Make sure
to remove any headers we don't want, and override anything that might
be present in the request.
If the agent somehow gets a request with both headers, it will
silently drop it.
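A sketch of the sanitization; the header names mirror the ones the
metadata proxy inserts, the helper itself is illustrative:

    UNSAFE_HEADERS = ('X-Instance-ID', 'X-Tenant-ID',
                      'X-Instance-ID-Signature')

    def sanitize(req_headers, instance_id, tenant_id, signature):
        # Drop anything the client tried to smuggle in, then set the
        # values the proxy itself derived; never trust the request's
        # copy of these headers.
        for header in UNSAFE_HEADERS:
            req_headers.pop(header, None)
        req_headers['X-Instance-ID'] = instance_id
        req_headers['X-Tenant-ID'] = tenant_id
        req_headers['X-Instance-ID-Signature'] = signature
        return req_headers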
Change-Id: Id6c103b7bcebe441c27c6049d349d84ba7fd15a6
Closes-bug: #1865036
(cherry picked from commit 5af046fd4e)
While processing a security group rule list API call, Neutron now
ensures that the default security group of the project given in the
filters or in the context exists.
This is similar to what is done when listing security groups or
creating a new network/port in the project.
Conflicts:
neutron/db/securitygroups_db.py
Change-Id: Id6fee5a752968b356b884d939b708a420016c9bc
Closes-Bug: #1864171
(cherry picked from commit 4739a4febb)
If a DVR+HA router has an external gateway, the SNAT namespace will be
initialized twice during agent restart. That namespace initialization
function runs many external resource processing actions, which
noticeably increases the start time of the L3 agent. This patch
addresses this issue.
Change-Id: I7719491275fa1ebfa7e881366e5cb066e3d4185c
Closes-Bug: #1850779
(cherry picked from commit 7a9d6d2641)
Patch https://review.opendev.org/#/c/697655/ cannot be backported
because it includes an RPC version change. This patch is for the
stable branches.
Currently the OVS agent calls update_device_list with the
agent_restarted flag set only on the first loop iteration. The server
then knows to send the l2pop flooding entries for the network to the
agent. But when a compute node with many instances on many networks
reboots, it takes time to re-add all the active devices, and some may
be re-added after the first loop iteration. The server can then fail
to send the flooding entries, which means there will be no
flood_to_tuns flow, and broadcasts like DHCP will fail.
This patch fixes that by also setting the agent_restarted flag if
the agent has not received the flooding entries for a network.
Change-Id: Iccc4fe4a785ee042fd76a663d0e76a27facd1809
Closes-Bug: #1853613
(cherry picked from commit bc0ab0fcd7)
(cherry picked from commit aee87e72b1)
Port deletion triggers disassociate_floatingips. This patch ensures
that the method not only clears the port association for a floating
IP, but also removes any DNS record associated with it.
Change-Id: Ia6202610c09811f240af35e2523126447bf02ca5
Closes-Bug: #1812168
(cherry picked from commit 4379310846)
Without this mock, the unit tests related to the configure_ipv6
method were failing, e.g. on systems where IPv6 was disabled or on
systems where such a check should be done differently, like macOS.
Change-Id: I6d13ab1db1d5465b2ff6abf6499e0d17e1ee8bbb
(cherry picked from commit c20b5e347d)
During the HA router state change event, only the L2 binding host of
the gateway port changes, so the L3 agent already has the entire
gateway port information. It is not necessary to send a router_update
message to the L3 agent again.
Depends-On: https://review.opendev.org/708825/
Closes-Bug: #1795127
Change-Id: Ia332421aff995f42e7a6e6e96b74be1338d54fe1
(cherry picked from commit 452b282412)
If both are run under the same process and api_workers >= 2, the
server process will instantiate two oslo_service.ProcessLauncher
instances.
This should be avoided [0], and indeed it causes issues with
subprocess and signal handling: killed RPC workers not respawning,
SIGHUP on the master process leaving the server unresponsive, signals
not properly sent to all child processes, etc.
To avoid this, use the WSGI ProcessLauncher instance if it exists.
[0] https://docs.openstack.org/oslo.service/latest/user/usage.html#launchers
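A sketch of the reuse, where "wsgi_launcher" is assumed to be the
ProcessLauncher already created for the API workers:

    from oslo_config import cfg
    from oslo_service import service

    CONF = cfg.CONF

    def get_rpc_launcher(wsgi_launcher=None):
        # Only one ProcessLauncher should exist per process, otherwise
        # signal handling and child respawning misbehave; reuse the
        # WSGI one when the API runs in this process.
        if wsgi_launcher is not None:
            return wsgi_launcher
        return service.ProcessLauncher(CONF, restart_method='mutate')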
Change-Id: Ic821f8ca84add9c8137ef712031afb43e491591c
Closes-Bug: #1780139
(cherry picked from commit 13aa00026f)