We observed an excessive number of routers created on compute nodes
where some virtual machines got a fixed IP on the floating network.
RPC servers should filter out those unnecessary routers during
syncing.
Change-Id: I299031a505f05cd0469e2476b867b9dbca59c5bf
Partial-Bug: #1840579
(cherry picked from commit 480b04ce04)
When a physical bridge is removed and created again, it is
reinitialized by neutron-ovs-agent.
But if the agent has distributed routing enabled, the DVR related
flows were not configured again, which led to connectivity issues
for DVR routers.
This patch fixes that by configuring the DVR related flows if
distributed routing is enabled in the agent's configuration.
It also resets the list of phys_brs in dvr_agent. Without that,
different objects were used in the ovs agent and dvr_agent classes,
so e.g. 2 different cookie ids were set on flows in the physical
bridge.
The same issue occurred when openvswitch was restarted and all
bridges were reconfigured.
Now in such cases the new cookie_id is correctly configured for all
flows.
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I710f00f0f542bcf7fa2fc60800797b90f9f77e14
Closes-Bug: #1864822
(cherry picked from commit 91f0bf3c85)
1. Make grenade jobs experimental for EM branches
As discussed in the ML thread [1], we are making grenade jobs
non-voting for all EM stable branches and the oldest stable branch.
The grenade jobs are failing right now and it might take time to fix
them, if they can be fixed at all. Once the jobs are working it is up
to the project team to bring them back to voting or keep them
non-voting.
If those jobs keep failing consistently and nobody fixes them, then
removing those n-v jobs in the future is also fine.
Additionally, it was proposed in the neutron CI meeting [2] that
non-voting jobs be moved to experimental, so the grenade jobs are
moved there instead of being kept non-voting.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015499.html
[2] http://eavesdrop.openstack.org/meetings/neutron_ci/2020/neutron_ci.2020-07-01-15.00.log.html#l-101
StableOnly
Conflicts:
.zuul.yaml
(cherry picked from commit 9313dce459)
2. Install pip2 for functional/fullstack/neutron-tempest-iptables_hybrid
Otherwise both jobs fail with "sudo: pip: command not found"
3. Add ensure-tox for functional/fullstack/neutron-tempest-iptables_hybrid
Otherwise tox fails with a similar error message
4. Disable OVS compilation for fullstack and move job to experimental
Compilation fails similarly to recent master failures:
/opt/stack/new/ovs/datapath/linux/geneve.c:943:15: error: ‘const struct ipv6_stub’ has no member named ‘ipv6_dst_lookup’
But branch 2.9 is not updated anymore, so use the official package
instead.
This triggers a few test failures, so move the job to experimental
(instead of marking it non-voting), the same as the grenade jobs.
Change-Id: Ie846a8cb481da65999b12f5547b407cc7bdc3138
When a Port is deleted, the QoS extension will reset any rule (QoS
and Queue registers) applied to this port or will reset the
related Interface policing parameters.
If the Port and the related Interface are deleted during the QoS
extension operation, those commands will fail. This patch makes those
operations more resilient by not checking for errors when writing to
the Port or the Interface register.
NOTE: this patch is squashed with [1]. That will fix the problem
with empty "vsctl" transactions when using this OVS DB implementation.
[1]https://review.opendev.org/#/c/738574/
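A hedged sketch of the idea, using ovsdbapp-style calls rather than
the exact QoS extension code: clear the QoS state while tolerating a
Port or Interface row that has already been deleted.
    def clear_port_qos(ovsdb, port_name):
        # if the Port/Interface row vanished, do not raise on the write
        ovsdb.db_clear('Port', port_name, 'qos').execute(check_error=False)
        ovsdb.db_set('Interface', port_name,
                     ('ingress_policing_rate', 0),
                     ('ingress_policing_burst', 0)).execute(check_error=False)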
Change-Id: I2cc4cdf5be25fab6adbc64acabb3fffebb693fa6
Closes-Bug: #1884512
(cherry picked from commit e2d1c2869a)
(cherry picked from commit 84ac8cf9ff)
(cherry picked from commit 3785868bfb)
(cherry picked from commit 7edfb0ef4a)
Neutron-ovs-agent can now enable IGMP snooping on the integration
bridge if the config option "igmp_snooping_enable" in the OVS section
of the config is set to True.
It will also set mcast-snooping-disable-flood-unregistered=true
so that flooding of multicast packets to all unregistered ports is
disabled as well.
Both changes are applied to the integration bridge.
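As an illustration only (the set_db_attribute call mirrors neutron's
ovs_lib helper, but this is a sketch, not the agent's exact code),
the two settings could be applied like this:
    def enable_igmp_snooping(int_br):
        # turn multicast snooping on for the integration bridge ...
        int_br.set_db_attribute('Bridge', int_br.br_name,
                                'mcast_snooping_enable', True)
        # ... and stop flooding unregistered multicast to every port
        int_br.set_db_attribute(
            'Bridge', int_br.br_name, 'other_config',
            {'mcast-snooping-disable-flood-unregistered': 'true'})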
Change-Id: I12f4030a35d10d1715d3b4bfb3ed5efb9aa28f2b
Closes-Bug: #1840136
(cherry picked from commit 5b341150e2)
There is a race condition between nova-compute booting an instance
and the l3-agent processing the DVR (local) router on the compute
node. This issue can be seen when a large number of instances are
booted on the same host and the instances are under different DVR
routers, so the l3-agent will process all these DVR routers on that
host concurrently.
For now we have a green pool for the router ResourceProcessingQueue
with 8 greenlets, but some of these routers can still be left
waiting; even worse, there are time-consuming actions during the
router processing procedure, for instance installing arp entries,
iptables rules, route rules, etc.
So when the VM is up, it tries to get metadata via the local proxy
hosted by the DVR router. But the router is not ready yet on that
host, and finally those instances are not able to set up some
config in the guest OS.
This patch adds a new measurement, based on the router quantity, to
set the L3 router processing queue green pool size. The pool size
will be limited to between 8 (the original value) and 32, because we
do not want the L3 agent to consume too much host resource for
processing routers on the compute node.
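A minimal sketch of the sizing rule described above (the bounds match
the values mentioned in this message; the helper name is
illustrative, not the agent's exact code):
    import eventlet

    ROUTER_POOL_MIN = 8   # original pool size
    ROUTER_POOL_MAX = 32  # upper bound to limit host resource usage

    def _pool_size(num_routers):
        # grow with the number of hosted routers, but stay within bounds
        return max(ROUTER_POOL_MIN, min(num_routers, ROUTER_POOL_MAX))

    pool = eventlet.GreenPool(_pool_size(20))  # 20 routers -> pool of 20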
Conflicts:
neutron/tests/functional/agent/l3/test_legacy_router.py
Related-Bug: #1813787
Change-Id: I62393864a103d666d5d9d379073f5fc23ac7d114
(cherry picked from commit 837c9283ab)
In the patch [1] we changed the definition of the abstract method
"plug" in the LinuxInterfaceDriver class.
That broke e.g. 3rd-party drivers which don't yet accept the
new parameter called "link_up" in the plug_new method.
So this patch fixes this so that such legacy drivers still work
with the new base interface driver class.
This commit also marks such a definition of the plug_new method as
deprecated. The possibility of using it without accepting the
link_up parameter will be removed in the "W" release of OpenStack.
[1] https://review.opendev.org/#/c/707406/
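A hedged sketch of the compatibility shim (names and the warning text
are illustrative, not the exact Neutron code): inspect the driver's
plug_new() signature and only pass link_up when it is accepted.
    import inspect
    import warnings

    def _call_plug_new(driver, network_id, port_id, device_name,
                       mac_address, link_up=True):
        if 'link_up' in inspect.signature(driver.plug_new).parameters:
            driver.plug_new(network_id, port_id, device_name, mac_address,
                            link_up=link_up)
        else:
            warnings.warn('plug_new() without a "link_up" parameter is '
                          'deprecated and support for it will be removed '
                          'in the "W" release.', DeprecationWarning)
            driver.plug_new(network_id, port_id, device_name, mac_address)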
Change-Id: Icd555987a1a57ca0b31fa7e4e830583d6c69c861
Closes-Bug: #1879307
(cherry picked from commit 30d573d5ab)
(cherry picked from commit 9c242a0329)
(cherry picked from commit bc8c38bda8)
Although notify_nova_on_port_status_changes defaults to true, it
could be set to false, making the nova_notifier attribute unsafe to
use without checking.
This patch checks both the config option and that the attribute
exists, since the config could be changed after the plugin is
already initialized without the nova_notifier attribute being set.
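A minimal sketch of the double guard, assuming the usual oslo.config
option; the notifier call shown is only illustrative:
    from oslo_config import cfg

    def _notify_nova(plugin, port):
        # guard on the option AND on the attribute actually existing
        if (cfg.CONF.notify_nova_on_port_status_changes and
                hasattr(plugin, 'nova_notifier')):
            plugin.nova_notifier.notify_port_active_direct(port)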
Change-Id: Ide0f93275e60dffda10b7da59f6d81c5582c3849
Closes-bug: #1843269
(cherry picked from commit ab4320edb4)
In order to reduce the number of elements retrieved from the DB, this
patch, before processing the VLAN allocations per physical network,
deletes those registers belonging to any unconfigured physical
network.
The VLAN registers per physical network are deleted using a bulk
delete operation, to speed up the process.
The missing VLAN registers per network are now created using a bulk
insert operation, available in the ORM. This bulk operation speeds up
the sync process.
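A generic SQLAlchemy sketch of the two bulk operations (model,
session and column names are placeholders, not Neutron's exact
type_vlan code):
    def sync_vlan_allocations(session, VlanAllocation, configured_physnets,
                              missing_allocations):
        # bulk delete of unallocated registers from unconfigured physnets
        (session.query(VlanAllocation)
            .filter(~VlanAllocation.physical_network.in_(configured_physnets))
            .filter(VlanAllocation.allocated == False)  # noqa: E712
            .delete(synchronize_session=False))
        # bulk insert of the missing (physnet, vlan_id) registers
        session.bulk_insert_mappings(
            VlanAllocation,
            [{'physical_network': net, 'vlan_id': vid, 'allocated': False}
             for net, vid in missing_allocations])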
Conflicts:
neutron/plugins/ml2/drivers/type_vlan.py
Change-Id: I8568e2277e157754aaff87a059a40e34e6a43e2b
Partial-Bug: #1862178
(cherry picked from commit 016e7826f1)
(cherry picked from commit 651eb12bec)
(cherry picked from commit 4fff732b76)
Patch [1] introduced a new mechanism which only brings UP interfaces
on the master node of an HA router. It works fine with keepalived 1.x
but is broken when keepalived 2.x is used (e.g. on CentOS 8), as
in this new version of keepalived all interfaces of VIPs and routes
are tracked by default, and if one of them is DOWN, keepalived goes
into FAULT state. Because of that the router would never be
transitioned to MASTER on any node.
This patch fixes it by adding the "no_track" option to all VIPs
and routes in keepalived's config file.
This "no_track" option isn't added to the ha interface, so that one
is still tracked by keepalived.
[1] https://review.opendev.org/#/c/707406/
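A simplified sketch of how a config line could be rendered with the
option (the helper and device names are illustrative; only the HA
interface itself keeps being tracked):
    def render_vip_line(ip_cidr, device, track=False):
        line = '%s dev %s' % (ip_cidr, device)
        if not track:
            line += ' no_track'   # keeps keepalived 2.x out of FAULT state
        return line

    render_vip_line('10.0.0.10/24', 'qg-aaa')                  # VIP
    render_vip_line('169.254.192.5/18', 'ha-bbb', track=True)  # HA iface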
Closes-bug: #1874211
Change-Id: Ic16cf83fe1d1576d91047adb2d4f9e07d57185b6
(cherry picked from commit dc9084a8ec)
Retrieving the SG rules is now done using the admin context. This
allows retrieving all possible rules, independently of the calling
user. The filters passed and the RBAC policies filter those results,
returning only:
- The SG rules belonging to the user.
- The SG rules belonging to a SG owned by the user.
However, if the SG list is too long, the query can take a lot of
time. Instead of this, the filtering is now done in the DB query. If
no filters are passed to "get_security_group_rules" and the context
is not the admin context, only the rules specified in the first
paragraph will be retrieved.
Because overriding the method "get_objects" is too complex, an
intermediate query is done to retrieve the SG rule IDs. Those IDs
are then used as a filter in the "get_objects" call.
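A hedged sketch of the two-step approach; the rule-ID helper and the
filter names below are assumptions used only to illustrate the flow:
    def get_security_group_rules(context, rule_cls, filters=None):
        filters = filters or {}
        if filters or context.is_admin:
            return rule_cls.get_objects(context, **filters)
        # intermediate query: IDs of the rules owned by the project or
        # belonging to a SG owned by the project (hypothetical helper)
        rule_ids = rule_cls.get_security_group_rule_ids(context.project_id)
        return rule_cls.get_objects(context.elevated(), id=rule_ids)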
Conflicts:
neutron/objects/securitygroup.py
neutron/tests/unit/db/test_securitygroups_db.py
neutron/tests/unit/objects/test_securitygroup.py
Closes-Bug: #1863201
Change-Id: I25d3da929f8d0b6ee15d7b90ec59b9d58a4ae6a5
(cherry picked from commit d874c46bff)
(cherry picked from commit d3905264b7)
(cherry picked from commit 61dc621c1b)
In version 4.15 of iproute2 support for a chain index in tc_filter
was added [1].
Such a version is available e.g. in Ubuntu 18.04 and it
has to be supported in the l3_tc_lib regex to properly match
the output of the "tc filter" command.
[1] https://lwn.net/Articles/745643/
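A hedged, self-contained example of a regex tolerant of the optional
"chain N" token; the pattern and the (simplified) output lines are
illustrative, not the exact l3_tc_lib code:
    import re

    FILTER_ID_REGEX = re.compile(
        r'filter protocol ip u32 (?:chain \d+ )?fh (\w+::\w+)')

    old = 'filter protocol ip u32 fh 800::800 order 2048 flowid :1'
    new = 'filter protocol ip u32 chain 0 fh 800::800 order 2048 flowid :1'
    assert FILTER_ID_REGEX.search(old).group(1) == '800::800'
    assert FILTER_ID_REGEX.search(new).group(1) == '800::800'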
Closes-bug: #1809497
Change-Id: Id4066b5cff933ccd0dd3c751bf67b5d58af662d1
(cherry picked from commit e788d29458)
This patch is the first one of a series of patches improving how the
L3 agents update the router HA state to the Neutron server.
This patch partially reverts the previous patch [1]. When the batch
notifier sends events, it calls the callback method passed during
initialization, in this case AgentMixin.notify_server. The batch
notifier spawns a new thread in charge of sending the notifications
and then waits the specified "batch_interval" time. If the callback
method is not synchronous with the notify thread execution (which is
what [1] implemented), the thread can finish while the RPC client is
still sending the HA router states. If another HA state update is
received, both updates can be executed at the same time. It is then
possible that a new router state is overwritten with an old one that
has not yet been sent or processed.
The batch notifier is refactored to improve what was initially
implemented [2] and then updated [3]. Currently, each new event
thread can update the "pending_events" list. Then a new thread is
spawned to process this event list. This thread decouples the current
execution from the calling thread, making the event processing a
non-blocking process.
But with the current implementation, each new event will spawn a new
thread, synchronized with the previous and new ones (using a
synchronized decorator). That means that during the batch interval
time the system can have as many waiting threads as events received.
Those threads end sequentially when the previous threads finish the
batch interval sleep time.
Instead of this, this patch receives and enqueues each new event and
allows only one thread to be alive while processing the event list.
If at the end of the processing loop new events are stored, the
thread will process them.
[1] I3f555a0c78fbc02d8214f12b62c37d140bc71da1
[2] I2f8cf261f48bdb632ac0bd643a337290b5297fce
[3] I82f403441564955345f47877151e0c457712dd2f
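A condensed, illustrative version of the single-worker idea (not the
actual BatchNotifier code): events are only enqueued, and at most one
greenthread drains the queue per batch interval, staying alive while
new events keep arriving.
    import eventlet
    from eventlet import queue

    class BatchNotifier(object):
        def __init__(self, batch_interval, callback):
            self._interval = batch_interval
            self._callback = callback
            self._pending = queue.Queue()
            self._worker = None

        def queue_event(self, event):
            self._pending.put(event)
            if self._worker is None:
                self._worker = eventlet.spawn(self._process_loop)

        def _process_loop(self):
            while not self._pending.empty():
                eventlet.sleep(self._interval)
                batch = [self._pending.get()
                         for _ in range(self._pending.qsize())]
                if batch:
                    self._callback(batch)
            self._worker = None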
Partial-Bug: #1837635
Change-Id: I20cfa1cf5281198079f5e0dbf195755abc919581
(cherry picked from commit 8b7d2c8a93)
L3 routers set their devices link up by default.
For HA routers, the gateway device will be plugged
on all scheduled hosts. When the gateway device is
up on a backup node, it will send out IPv6 related
packets (MLDv2) depending on some kernel config.
This will make the physical fabric think that the
gateway MAC is now working on the backup node, and
finally the master node L3 traffic will be broken.
This patch sets the backup gateway device link down
by default. When VRRP sets the master state on
one host, the L3 agent state change procedure will
link up the gateway device.
Conflicts:
neutron/agent/l3/router_info.py
Closes-Bug: #1859832
Change-Id: I8dca2c1a2f8cb467cfb44420f0eea54ca0932b05
(cherry picked from commit c52029c39a)
(cherry picked from commit b9a2968100)
As described in the bug, when a HA router transitions from "master"
to "backup", the "keepalived" processes will set the virtual IP on
all other HA routers. Each HA router will then advertise it and
"keepalived" will decide, according to a trivial algorithm (higher
interface IP), which one should be "master". At this point, the other
"keepalived" processes running on the other servers will remove the
HA router virtual IP assigned an instant before.
To avoid transitioning some routers from "backup" to "master" and
then back to "backup" in a very short period, this patch delays the
"backup" to "master" transition, waiting for a possible new "backup"
state. If, during the waiting period to set the HA state to "master"
(set to the HA VRRP advert time, 2 seconds by default), the L3 agent
receives a new "backup" HA state, the L3 agent does nothing.
Conflicts:
neutron/agent/l3/agent.py
Closes-Bug: #1837635
Change-Id: I70037da9cdd0f8448e0af8dd96b4e3f5de5728ad
(cherry picked from commit 3f022a193f)
(cherry picked from commit adac5d9b7a)
Increase the waiting time in "test_reschedule_network_on_new_agent",
when the network is being rescheduled from a dead DHCP agent to a
running one.
According to [1], 120 seconds should be a conservative and sufficient
value to wait for this transition.
[1] https://bugs.launchpad.net/neutron/+bug/1799555/comments/23
Change-Id: I2fec6015b56fd1b5d21b75f7432c40b2110fe6bc
Related-Bug: #1799555
(cherry picked from commit d15ad2e481)
After spawning the "dnsmasq" process in the method
"Dnsmasq._spawn_or_reload_process", we need to check that the
"dnsmasq" process is running and can be detected by the
ProcessManager instance controlling it.
ProcessManager determines that a process is "active" if:
- The network ID is in the cmdline used to execute the process.
- The process is detected by psutil.Process(pid), returning the
  cmdline needed for the first condition.
- The PID file exists; this is written by the dnsmasq process
  once it is started and is needed for the second condition.
To make this feature available for any other process using
ProcessManager, the implementation is done in that class.
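A standalone sketch of that "active" check (hedged; the real
ProcessManager code is structured differently), evaluating the three
conditions in reverse order:
    import psutil

    def process_is_active(pid_file, uuid):
        try:
            with open(pid_file) as f:          # the PID file must exist
                pid = int(f.read().strip())
            # the process must be detectable and expose its cmdline
            cmdline = ' '.join(psutil.Process(pid).cmdline())
        except (OSError, ValueError, psutil.NoSuchProcess):
            return False
        return uuid in cmdline                 # the ID must be in the cmdline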
Change-Id: I51dc9d342c613afcbcfdc50a1d2811502748f170
Closes-Bug: #1849502
(cherry picked from commit 7c5ce50a0c)
(cherry picked from commit ce3f2f7d26)
(cherry picked from commit 2d8613e3c4)
The DHCP agent prioritizes RPC messages based on the
priority field sent from neutron-server, but then groups
them all into the same dhcp_ready_ports set when sending
them back to the server to clear the provisioning block(s).
Priority should be given to new and changed ports, since
those are the most likely to be associated with new instances,
which can fail to boot if they are not handled quickly when
the agent is very busy, for example right after it was
restarted.
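An illustrative sketch only (attribute and method names are
assumptions): keep a separate set for high-priority ports and drain
it first when reporting ready ports back to the server.
    class ReadyPortsTracker(object):
        def __init__(self):
            self.dhcp_ready_ports = set()
            self.dhcp_prio_ready_ports = set()

        def add_ready_port(self, port_id, high_priority=False):
            target = (self.dhcp_prio_ready_ports if high_priority
                      else self.dhcp_ready_ports)
            target.add(port_id)

        def pop_ready_ports(self, max_ports):
            batch = set()
            # new/updated ports first, then the rest
            for pool in (self.dhcp_prio_ready_ports, self.dhcp_ready_ports):
                while pool and len(batch) < max_ports:
                    batch.add(pool.pop())
            return batch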
Conflicts:
neutron/tests/unit/agent/dhcp/test_agent.py
Change-Id: Ib5074abadd7189bb4bdd5e46c677f1bfb071221e
Closes-bug: #1864675
(cherry picked from commit 113dfac608)
When users run keepalived on their instances, they often create an
additional port in Neutron to allocate an IP address which is
then used as a VIP in keepalived and is configured in the
allowed_address_pairs of other ports plugged into instances running
keepalived.
This is e.g. Octavia's use case.
Together with DVR this caused problems with connectivity to such a
VIP, as it was populated in the router's arp cache with a MAC address
from the Neutron db.
As this port isn't bound, it is only a Neutron db entry, so there is
no need to set it in the arp cache of the router.
This patch does exactly that by filtering such "unbound" and
"binding_failed" ports from the list.
Conflicts:
neutron/tests/unit/db/test_l3_dvr_db.py
Change-Id: Ia885ce00dbb5f2968859e8d0850bc511016f0846
Closes-Bug: #1869887
(cherry picked from commit eb775458c6)
Create a method for bulk assignment of IP addresses within the ipam
driver, to support bulk creation of ports.
This also changes the logic for how the window of available IP
addresses to assign from is calculated within the neutrondb IPAM
driver. The Python random module is used to produce a statistically
sampled set of IP addresses out of the set of available IPs; this
will facilitate collision avoidance. When requesting multiple IP
addresses the selection window size is increased significantly to
ensure a larger number of available IPs, but caps are placed on the
amount to make sure we do not transgress system limits when building
pools of IPv6 addresses.
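A rough sketch of the sampling idea, assuming available_ips is a list
of candidate addresses; the multiplier and cap values are
illustrative placeholders, not the driver's actual constants:
    import random

    def select_candidate_ips(available_ips, num_requested,
                             window_multiplier=10, window_cap=1000):
        # widen the window for bulk requests, but never past the cap
        window = min(max(num_requested * window_multiplier, num_requested),
                     window_cap)
        candidates = available_ips[:window]
        # random sampling reduces collisions between concurrent requests
        return random.sample(candidates, min(num_requested, len(candidates)))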
Change-Id: Iad8088eaa261b07153fa358ae34b9a2442bc2a3e
Implements: blueprint speed-up-neutron-bulk-creation
(cherry picked from commit 06e38be42e)
Do not flood the packets to the bridge; since we have the
bridge port list, we can add a simple direct flow to
the right port only.
Conflicts:
neutron/agent/linux/openvswitch_firewall/firewall.py
neutron/conf/plugins/ml2/drivers/ovs_conf.py
Closes-Bug: #1732067
Related-Bug: #1841622
Change-Id: I14fefe289a19b718b247bf0740ca9bc47f8903f4
(cherry picked from commit efa8dd0895)
For vlan type networks, we add a segment match flow
to the openflow security group ingress table. Then
the packets will be recorded in the conntrack table, and
the reply packets can be processed properly.
Conflicts:
doc/source/contributor/internals/openvswitch_firewall.rst
Change-Id: Ieded0654d0ad16235ec923b822dcd842bd7735e5
Closes-Bug: #1831534
(cherry picked from commit aa58542e82)
If a user specifies a header in their request for metadata,
it could override what the proxy would have inserted on their
behalf. Make sure to remove any headers we don't want, and
override anything that might be present in the request.
If the agent somehow gets a request with both headers it will
silently drop it.
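A minimal sketch of the header sanitizing (the header names are the
usual metadata proxy headers; the helper itself is illustrative):
    def sanitize_headers(request_headers, instance_id, tenant_id):
        headers = {k: v for k, v in request_headers.items()
                   if k not in ('X-Instance-ID', 'X-Tenant-ID',
                                'X-Instance-ID-Signature')}
        # always use the proxy's own values, never the client's
        headers['X-Instance-ID'] = instance_id
        headers['X-Tenant-ID'] = tenant_id
        return headers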
Change-Id: Id6c103b7bcebe441c27c6049d349d84ba7fd15a6
Closes-bug: #1865036
(cherry picked from commit 5af046fd4e)
If the DVR+HA router has an external gateway, the snat-namespace will
be initialized twice during agent restart. That namespace
initialization function runs many external resource processing
actions which definitely increase the startup time of the L3 agent.
This patch addresses this issue.
Change-Id: I7719491275fa1ebfa7e881366e5cb066e3d4185c
Closes-Bug: #1850779
(cherry picked from commit 7a9d6d2641)
Without this mock, UTs related to the configure_ipv6 method were
failing e.g. on systems where IPv6 was disabled or on systems where
such a check should be done differently, like MacOS.
Change-Id: I6d13ab1db1d5465b2ff6abf6499e0d17e1ee8bbb
(cherry picked from commit c20b5e347d)
During the HA router state change event, the gateway port only
changes its L2 binding host, so the l3 agent already has the entire
gateway port information. It is not necessary to send a
router_update message to the l3 agent again.
Depends-On: https://review.opendev.org/708825/
Closes-Bug: #1795127
Change-Id: Ia332421aff995f42e7a6e6e96b74be1338d54fe1
(cherry picked from commit 452b282412)
A security group can have a state of empty ports but non-empty
members, so we need to skip the flow update only when the members
dict is empty.
Change-Id: I429edb3d2dea5fa97441909b4d2c776f97f0516f
Closes-Bug: #1862703
Related-Bug: #1854131
(cherry picked from commit 6dbba8d5ce)
A common neutron resource (e.g. Port) consists of:
1. Resource attributes, e.g. Port.mac_address, etc.
2. Standard attributes, e.g. created_at, which are shared among all
   neutron resources.
The `sort` option only supports a limited set of attributes. We need
to filter attributes that are defined with `is_sort_key=True`, and it
is preferable to explicitly warn CLI & API users about illegal sort
keys rather than accept them without check, pass them forward and
then hit an internal error, which is quite confusing.
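An illustrative sketch of the validation idea (function and structure
names are assumptions): accept only keys whose attribute definition
carries is_sort_key=True.
    def validate_sort_keys(sort_keys, attr_info):
        valid_keys = {name for name, spec in attr_info.items()
                      if spec.get('is_sort_key')}
        invalid_keys = set(sort_keys) - valid_keys
        if invalid_keys:
            raise ValueError("Invalid sort keys %s, valid keys are: %s"
                             % (sorted(invalid_keys), sorted(valid_keys)))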
Depends-on: https://review.opendev.org/#/c/660097/
Change-Id: I8d206f909b09f1279dfcdc25c39989a67bff93d5
Closes-Bug: #1659175
(cherry picked from commit 335ac4e2d9)