When the L3 agent gets a router update notification, it tries to
retrieve the router info from the neutron server. If the message
queue is down or unreachable at that moment, it gets message queue
related exceptions and the resync actions are run. Sometimes a
RabbitMQ cluster is not that easy to recover, and a long MQ recovery
time means the router info sync RPC never succeeds before the max
retry count is reached. Then the bad thing happens: the L3 agent
tries to remove the router, which basically shuts down all the
existing L3 traffic of this router.
This patch removes the final router removal action and lets the
router keep running as it is.
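The change in behaviour can be pictured with a minimal sketch (the
names process_router_update, get_router_info and resync below are
hypothetical stand-ins, not the actual L3 agent code): on hitting the
retry limit the agent now only logs and keeps the router instead of
tearing it down.

    import logging

    LOG = logging.getLogger(__name__)
    MAX_RETRIES = 5


    def process_router_update(agent, update):
        """Illustrative retry handling; not the real L3 agent code."""
        try:
            router = agent.get_router_info(update.router_id)  # RPC call
        except Exception:  # e.g. message queue unreachable
            if update.retries < MAX_RETRIES:
                update.retries += 1
                agent.resync(update)  # try again in a later iteration
            else:
                # Before the fix the agent removed the router here, which
                # dropped all existing L3 traffic of that router.
                LOG.warning("Hit retry limit for router %s, keeping the "
                            "router as it is.", update.router_id)
            return
        agent.apply_router(router)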
Closes-Bug: #1871850
Change-Id: I9062638366b45a7a930f31185cd6e23901a43957
(cherry picked from commit 12b9149e20)
When any port in the OVS agent is using a security group (SG) and
this SG is removed, it is marked to be deleted. This deletion process
is done in [1].
The SG deletion process consists of removing any reference to this SG
from the firewall and the SG port map. The firewall removes this SG in
[2].
The information of a SG is stored in:
* ConjIPFlowManager.conj_id_map = ConjIdMap(). This class stores the
conjunction IDs (conj_ids) in a dictionary using the following keys:
ConjIdMap.id_map[(sg_id, remote_sg_id, direction, ethertype,
conj_ids)] = conj_id_XXX
* ConjIPFlowManager.conj_ids is a nested dictionary, built in the
following way:
self.conj_ids[vlan_tag][(direction, ethertype)][remote_sg_id] = \
set([conj_id_1, conj_id_2, ...])
This patch stores all conjunction IDs generated and assigned to the
tuple (sg_id, remote_sg_id, direction, ethertype). When a SG is
removed, the deletion method will look for this SG in the new storage
variable created, ConjIdMap.id_map_group, and will mark all related
conjunction IDs to be removed. That will clean up the rules left in
OVS matching:
action=conjunction(conj_id, 1/2)
[1] 118930f03d/neutron/agent/linux/openvswitch_firewall/firewall.py (L731)
[2] 118930f03d/neutron/agent/linux/openvswitch_firewall/firewall.py (L399)
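As a rough illustration of the bookkeeping described above (a
simplified sketch, not the actual ConjIdMap implementation; only the
id_map_group name mirrors the variable mentioned in this commit):

    import collections


    class ConjIdMapSketch:
        """Simplified tracking of conjunction IDs per security group."""

        def __init__(self):
            self.id_map = {}                                  # key tuple -> conj_id
            self.id_map_group = collections.defaultdict(set)  # sg_id -> {conj_ids}
            self._next_id = 0

        def get_conj_id(self, sg_id, remote_sg_id, direction, ethertype):
            key = (sg_id, remote_sg_id, direction, ethertype)
            if key not in self.id_map:
                self.id_map[key] = self._next_id
                self._next_id += 8  # step of 8 chosen only for illustration
            # Remember every conj_id handed out for this SG, so deleting the
            # SG can also reclaim flows matching action=conjunction(id, 1/2).
            self.id_map_group[sg_id].add(self.id_map[key])
            return self.id_map[key]

        def delete_sg(self, sg_id):
            """Return all conj_ids that belong to the removed SG."""
            self.id_map = {k: v for k, v in self.id_map.items()
                           if sg_id not in (k[0], k[1])}
            return self.id_map_group.pop(sg_id, set())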
Conflicts:
neutron/tests/unit/agent/linux/openvswitch_firewall/test_firewall.py
Change-Id: I63e446a30cf10e7bcd34a6f0d6ba1711301efcbe
Related-Bug: #1881157
(cherry picked from commit 0eebd002cc)
(cherry picked from commit ed22f7a2ff)
(cherry picked from commit 6615f248e2)
These jobs are used to test neutron against current master branches of
external projects (ryu).
Stable branches are not expected to run with this setup so we can drop
the jobs there (especially as some of these jobs are periodic ones).
Conflicts:
.zuul.yaml
This is adapted to the older single .zuul.yaml file layout.
Change-Id: I2b1f4c4e951b691fb0c66119699b26540b0babd8
(cherry picked from commit 21719a875d)
(cherry picked from commit da0bad4acc)
Add a validator to update_floatingip_port_forwarding so the codepath
does not attempt to perform an invalid database operation. With that,
the operation fails right away, with a hint on the offending argument(s).
This is a backport that combines two changes that go together:
https://review.opendev.org/#/c/738145/
https://review.opendev.org/#/c/744993/
Note: pep8 failed with the following error on both
./neutron/tests/unit/services/portforwarding/test_pf_plugin.py:237:9 and
./neutron/tests/unit/services/portforwarding/test_pf_plugin.py:261:9:
    N322 Possible use of no-op mock method. please use assert_called_once_with.
        mock_pf_get_objects.assert_called_once()
        ^
So additional changes were needed for the backport.
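The adjustment the N322 check asks for is roughly the following
(illustrative snippet, not the actual test code; the mock name and
arguments are made up):

    from unittest import mock

    pf_plugin = mock.Mock()

    # Code under test fetches the port forwarding objects exactly once:
    pf_plugin.get_objects(context=None, floatingip_id='fip-id')

    # Flagged by pep8 rule N322 as a possible no-op mock method:
    # pf_plugin.get_objects.assert_called_once()

    # Preferred form, which also asserts the call arguments:
    pf_plugin.get_objects.assert_called_once_with(context=None,
                                                  floatingip_id='fip-id')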
Change-Id: I8284b22c5d691bfd9eadeb8590c3d4b27d261b04
Closes-Bug: #1878299
(cherry picked from commit f379740348)
(cherry picked from commit 838399f0a4)
When a subnet that is expected to get RA messages from an external
router is added directly to the router, there is a validation and
such a subnet can't be added to the Neutron router.
But when a port was first created manually and then plugged into the
router, there was no such validation. This patch fixes it by adding
the same validation when adding a router interface by port.
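A minimal sketch of the idea (the helper names are hypothetical and
the exact condition checked here, ipv6_ra_mode unset while
ipv6_address_mode is set, is an assumption about the existing
by-subnet validation):

    def validate_subnet_for_router(subnet):
        """Hypothetical stand-in for the existing by-subnet validation."""
        if (subnet['ip_version'] == 6
                and subnet.get('ipv6_ra_mode') is None
                and subnet.get('ipv6_address_mode') is not None):
            # The subnet expects RAs from an external router.
            raise ValueError('subnet %s cannot be attached to a Neutron '
                             'router' % subnet['id'])


    def add_router_interface_by_port(port, subnets_by_id):
        # Previously only the "by subnet" path ran this check; now the
        # "by port" path validates every subnet the port has an IP on.
        for fixed_ip in port['fixed_ips']:
            validate_subnet_for_router(subnets_by_id[fixed_ip['subnet_id']])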
Change-Id: I054296c790b697198550acbeae29546758b422c2
Closes-Bug: #1889619
(cherry picked from commit 38c7fd7cef)
When openvswitch is restarted, a full sync of all bridges is always
triggered by neutron-ovs-agent, so there is no need to check in the
same rpc_loop iteration whether the bridges were recreated.
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I3cc1f1b7dc480d54a7cee369e4638f9fd597c759
Related-bug: #1864822
(cherry picked from commit 45482e300a)
(cherry picked from commit b8e7886d8b)
When the allowed-address-pair 0.0.0.0/0 is added to one port, it
unexpectedly opens all protocols for the other ports in the same
security group. IPv6 has the same problem.
The root cause is the OpenFlow rules calculation for the security
group, which unexpectedly allows all IP (v4 & v6) traffic to get
through.
For the openvswitch OpenFlow firewall, this patch adds a source MAC
address match for allowed-address-pairs with prefix length 0, which
means all ethernet packets from this MAC will be accepted. That
exactly meets the request of accepting any IP address from the
configured VM.
Test results show that the remote security group and allowed address
pair work:
1. A port with the 0.0.0.0/0 allowed-address-pair can send packets
with any source IP.
2. A port with an x.x.x.x/y allowed-address-pair is accepted by VMs
under the same security group.
3. Ports in the same network can reach each other (remote security
group).
4. A protocol port number can be accessed only when there is a
related rule.
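A rough sketch of the match-building idea (illustrative only; the
real firewall builds conjunction flows rather than a simple dict):

    import netaddr


    def allowed_address_pair_match(mac, ip_cidr):
        """Build OVS flow match fields for one allowed address pair."""
        net = netaddr.IPNetwork(ip_cidr)
        src_field = 'nw_src' if net.version == 4 else 'ipv6_src'
        if net.prefixlen == 0:
            # 0.0.0.0/0 or ::/0 matches any source IP, which opened traffic
            # for every port in the group.  Matching the source MAC instead
            # keeps the "accept anything from this VM" semantics without
            # opening the other ports.
            return {'dl_src': mac}
        return {'dl_src': mac, src_field: str(net)}


    print(allowed_address_pair_match('fa:16:3e:aa:bb:cc', '0.0.0.0/0'))
    print(allowed_address_pair_match('fa:16:3e:aa:bb:cc', '10.0.0.0/24'))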
Closes-bug: #1867119
Change-Id: I2e3aa7c400d7bb17cc117b65faaa160b41013dde
(cherry picked from commit 00298fe6e8)
Some of the tool's dependencies expect the AGENT group to be present
in the configuration, but at present it is not initialized; this
patch addresses that.
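Registering the option group looks roughly like this (a sketch using
oslo.config; the exact options registered and the module doing it are
assumptions):

    from oslo_config import cfg

    agent_opts = [
        cfg.IntOpt('report_interval', default=30,
                   help='Seconds between node status reports to the server.'),
    ]


    def register_agent_opts(conf):
        # Make the [AGENT] group and its options known to the parser so
        # that code reading conf.AGENT.* does not fail with NoSuchOptError.
        conf.register_opts(agent_opts, group='AGENT')


    if __name__ == '__main__':
        register_agent_opts(cfg.CONF)
        cfg.CONF([], project='neutron')         # parse an empty command line
        print(cfg.CONF.AGENT.report_interval)   # -> 30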
Change-Id: I1a50e77749aaecc3966c9d238f91a1968ed454ef
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Closes-Bug: #1881771
(cherry picked from commit d57735ae0f)
Commit 90212b12 changed the OVS agent so that adding the vital drop
flows on br-int (table 0, priority 2) for packets from physical
bridges was deferred until DVR initialization later on. But if br-int
has no flows from a previous run (e.g. after a host reboot), these
packets will hit the NORMAL flow in table 60. And if there is more
than one physical bridge, the physical interfaces from the different
bridges are then essentially connected at layer 2 and a network loop
is possible in the time before the flows are added by DVR. Also, the
DVR code won't add them until after RPC calls to the server, so a
loop is more likely if the server is not available.
This patch restores adding these flows when the physical bridges are
first configured. It also updates a comment that was no longer
correct and updates the unit test.
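Conceptually the restored rule is of the kind shown below (an
illustrative ovs-ofctl invocation; the agent programs it through its
bridge classes instead):

    import subprocess


    def add_drop_flow_for_physical_patch_port(int_br, patch_ofport):
        """Drop packets entering br-int from a physical bridge patch port.

        Table 0, priority 2, matched on in_port: without such a rule the
        packets fall through to the table 60 NORMAL action before DVR
        initialization has finished.
        """
        flow = 'table=0,priority=2,in_port=%d,actions=drop' % patch_ofport
        subprocess.check_call(['ovs-ofctl', 'add-flow', int_br, flow])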
Change-Id: I42c33fefaae6a7bee134779c840f35632823472e
Closes-Bug: #1887148
Related-Bug: #1869808
(cherry picked from commit c1a77ef8b7)
(cherry picked from commit 143fe8ff89)
(cherry picked from commit 6a861b8c8c28e5675ec2208057298b811ba2b649)
(cherry picked from commit 8181c5dbfe)
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
In versions prior to Train, the "keepalived-state-change" monitor
does not format the log messages correctly. That happens when the
"Daemon.run()" method executes "unwatch_log". After the privileges
are dropped, the logging should be configured again.
Change-Id: Ief52fac479d4b3cfa5f90118235c241a14b1011f
Closes-Bug: #1886216
(cherry picked from commit 6fd89dacf3)
When a related DVR router is configured by the L3 agent, it is first
added to the tasks queue and then processed like any other router
hosted on the L3 agent.
But when the L3 agent asked the neutron server for the details of
such a router, nothing was returned, because this router wasn't
really scheduled to the compute node which was asking for it. It was
"only" related to some other router scheduled to that compute node.
Because of that, the router's info wasn't found in the reply from the
neutron-server and the L3 agent removed it from the compute node.
Now the _get_router_ids_for_agent method from the l3_dvrscheduler_db
module checks the router serviceable ports for each DVR router hosted
on the compute node and then finds all routers related to it. Thanks
to that it also returns routers which are on the compute node only
because of other related routers scheduled to this host, and such
routers are not deleted anymore.
Change-Id: I689d5135b7194475c846731d846ccf6b25b80b4a
Closes-Bug: #1884527
(cherry picked from commit 38286dbd2e)
When VLAN and VXLAN networks both exist in the environment, and
l2population and arp_responder are enabled, updating a port's IP
address on the VLAN network adds ARP responder related flows into
br-tun. This causes too many ARP replies for one ARP request, and VM
connections become abnormal.
Closes-Bug: #1824504
Change-Id: I1b6154b9433a9442d3e0118dedfa01c4a9b4740b
(cherry picked from commit 5301ecf41b)
This option allows configuring the number of times the nova or
ironic client should retry on any failed HTTP call.
The default value for this new option is "3".
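The new option has roughly this shape (a sketch with oslo.config; the
help text and the group the option lives in are assumptions):

    from oslo_config import cfg

    http_retries_opt = cfg.IntOpt(
        'http_retries',
        default=3,
        min=0,
        help='Number of times nova or ironic client should retry on any '
             'failed http call.')

    cfg.CONF.register_opts([http_retries_opt])

The notifiers then read cfg.CONF.http_retries when constructing their
clients; the exact wiring is omitted here.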
Conflicts:
neutron/notifiers/ironic.py
neutron/notifiers/nova.py
neutron/tests/unit/notifiers/test_nova.py
Change-Id: I795ee7ca729646be0411a1232bf218015c65010f
Closes-Bug: #1883712
(cherry picked from commit e94511cd25)
During the CI meeting we agreed that non-voting jobs in the branches
which are in the Extended Maintenance (EM) phase should be moved from
the check queue to the experimental queue.
This patch is doing exactly that.
Change-Id: Ie8a63eacf479ac6871af448a1741598584de8de8
We observe an excessive number of routers created on compute nodes
on which some virtual machines got a fixed IP on the floating
network.
RPC servers should filter out those unnecessary routers during
syncing.
Change-Id: I299031a505f05cd0469e2476b867b9dbca59c5bf
Partial-Bug: #1840579
(cherry picked from commit 480b04ce04)
The _ensure_default_security_group method wasn't atomic: it first
tries to get the default SG and, if it doesn't exist in the DB, tries
to create it.
It may happen, e.g. with the Calico plugin, that between the
get_default_sg_id and create_security_group calls the default SG is
created by another neutron worker. In such a case a duplicate entry
exception is raised.
So this patch adds handling of that exception.
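The handling is conceptually along these lines (a sketch; the plugin
helper names are hypothetical and the exact exception type caught by
the real patch may differ from oslo.db's DBDuplicateEntry):

    from oslo_db import exception as db_exc


    def ensure_default_security_group(context, tenant_id, plugin):
        """Sketch of the race-tolerant lookup-or-create logic."""
        sg_id = plugin.get_default_sg_id(context, tenant_id)
        if sg_id is not None:
            return sg_id
        try:
            return plugin.create_default_security_group(context, tenant_id)
        except db_exc.DBDuplicateEntry:
            # Another worker created the default SG between our lookup and
            # the create call; re-read it instead of failing the request.
            return plugin.get_default_sg_id(context, tenant_id)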
Conflicts:
neutron/db/securitygroups_db.py
Change-Id: I515c310f221e7d9ae3be59a26260538d1bc591c2
Closes-Bug: #1883730
(cherry picked from commit 7019c5cf50)
When neutron-ovs-agent notices that any of the physical bridges was
"re-created", we should also ensure that stale OpenFlow rules (with
the old cookie id) are cleaned up.
This patch is doing exactly that.
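The cleanup amounts to removing every flow whose cookie is not the
agent's current one, roughly as in this sketch (illustrative
ovs-ofctl usage; the agent does this through its bridge classes):

    import re
    import subprocess


    def cleanup_stale_flows(bridge, current_cookie):
        """Delete flows whose cookie differs from the current cookie."""
        dump = subprocess.check_output(
            ['ovs-ofctl', 'dump-flows', bridge]).decode()
        stale = {int(c, 16) for c in re.findall(r'cookie=0x([0-9a-f]+)', dump)}
        stale.discard(current_cookie)
        for cookie in stale:
            # "cookie=VALUE/-1" matches flows with exactly this cookie.
            subprocess.check_call(
                ['ovs-ofctl', 'del-flows', bridge, 'cookie=%#x/-1' % cookie])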
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I7c7c8a4c371d6f4afdaab51ed50950e2b20db30f
Related-Bug: #1864822
(cherry picked from commit 63c45b3766)
Blocking traffic between br-int and br-physical is overkill and will
at least:
1. interrupt the VLAN traffic during startup, particularly so if DVR
is enabled;
2. if, say, rabbitmq is not stable, possibly affect the data plane so
that VLAN networking never works.
Running OpenStack on k8s particularly amplifies the problem because
the pod can be killed pretty easily by liveness probes.
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I51050c600ba7090fea71213687d94340bac0674a
Closes-Bug: #1869808
(cherry picked from commit 90212b12cd)
In some cases it may be useful to log the new VLAN tag which is
found on the port when it loses the old VLAN tag that is expected to
be there.
So this patch adds that value to the log message.
TrivialFix
Depends-On: https://review.opendev.org/735615
Change-Id: I231e624f460510decc6d2237040c8bef207e2e8e
(cherry picked from commit 3ac63422ea)
When a physical bridge is removed and created again, it is
initialized by neutron-ovs-agent.
But if the agent has distributed routing enabled, the DVR related
flows weren't configured again, and that led to connectivity issues
for DVR routers.
This patch fixes it by configuring the DVR related flows again if
distributed routing is enabled in the agent's configuration.
It also resets the list of phys_brs in dvr_agent. Without that,
different objects were used in the ovs agent and dvr_agent classes,
and thus e.g. two different cookie ids were set on flows in the
physical bridge.
The same issue occurred when openvswitch was restarted and all
bridges were reconfigured.
Now in such cases the new cookie_id is correctly configured for all
flows.
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I710f00f0f542bcf7fa2fc60800797b90f9f77e14
Closes-Bug: #1864822
(cherry picked from commit 91f0bf3c85)
1. Make grenade jobs experimental for EM branches
As discussed in the ML thread [1], we are going to make grenade jobs
non-voting for all EM stable branches and the oldest stable branch.
The grenade jobs are failing now and it might take time to fix them,
if we are able to fix them at all. Once the jobs are working it is up
to the project team to bring them back to voting or keep them
non-voting.
If those jobs fail consistently and no one fixes them, removing those
n-v jobs in the future is also fine.
Additionally, it was proposed in the neutron CI meeting [2] that
non-voting jobs be moved to experimental, so move the grenade jobs
there instead of keeping them non-voting.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015499.html
[2] http://eavesdrop.openstack.org/meetings/neutron_ci/2020/neutron_ci.2020-07-01-15.00.log.html#l-101
StableOnly
Conflicts:
.zuul.yaml
(cherry picked from commit 9313dce459)
2. Install pip2 for functional/fullstack/neutron-tempest-iptables_hybrid
Otherwise these jobs fail with "sudo: pip: command not found".
3. Add ensure-tox for functional/fullstack/neutron-tempest-iptables_hybrid
A similar error message appears for tox.
4. Disable OVS compilation for fullstack and move the job to experimental
Compilation fails similarly to recent master failures:
/opt/stack/new/ovs/datapath/linux/geneve.c:943:15: error: ‘const struct ipv6_stub’ has no member named ‘ipv6_dst_lookup’
But the 2.9 branch is not updated anymore, so use the official
package.
This triggers a few test failures, so move the job to experimental
(instead of marking it non-voting), same as the grenade jobs.
Change-Id: Ie846a8cb481da65999b12f5547b407cc7bdc3138
When a Port is deleted, the QoS extension will reset any rule (QoS
and Queue registers) applied on this port or will reset the related
Interface policing parameters.
If the Port and the related Interface are deleted during the QoS
extension operation, those commands will fail. This patch makes those
operations more resilient by not checking the errors when writing to
the Port or the Interface register.
NOTE: this patch is squashed with [1]. That will fix the problem with
empty "vsctl" transactions when using this OVS DB implementation.
[1] https://review.opendev.org/#/c/738574/
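The "do not check errors" behaviour is roughly equivalent to the
following ovs-vsctl sketch (illustrative only; the extension drives
this through the agent's OVSDB API instead):

    import subprocess


    def clear_port_qos(port_name):
        """Best-effort QoS cleanup for a port that may already be gone."""
        commands = [
            # Reset policing parameters on the Interface register.
            ['ovs-vsctl', 'set', 'Interface', port_name,
             'ingress_policing_rate=0', 'ingress_policing_burst=0'],
            # Detach any QoS register from the Port register.
            ['ovs-vsctl', 'clear', 'Port', port_name, 'qos'],
        ]
        for cmd in commands:
            # subprocess.call() does not raise when ovs-vsctl exits with an
            # error, e.g. because the row was deleted mid-operation.
            subprocess.call(cmd)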
Change-Id: I2cc4cdf5be25fab6adbc64acabb3fffebb693fa6
Closes-Bug: #1884512
(cherry picked from commit e2d1c2869a)
(cherry picked from commit 84ac8cf9ff)
(cherry picked from commit 3785868bfb)
(cherry picked from commit 7edfb0ef4a)
1. It is best not to use 'netaddr.IPSet.add', because
_compact_single_network in 'IPSet.add' is quite time consuming.
2. When the current address pool does not have enough addresses, all
addresses are allocated from the current pool, and allocation
continues from the next address pool until all requested addresses
are assigned.
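A simplified illustration of both points (not the actual IPAM driver
code):

    import netaddr


    def build_available_set(cidrs):
        # Point 1: building the IPSet in one go avoids calling IPSet.add()
        # (and its compaction step) once per network.
        return netaddr.IPSet(cidrs)


    def allocate(pools, count):
        """Point 2: drain the current pool, then continue with the next."""
        allocated = []
        for pool in pools:              # each pool is an iterable of IPs
            for ip in pool:
                allocated.append(str(ip))
                if len(allocated) == count:
                    return allocated
        raise ValueError('not enough addresses in the given pools')


    pools = [netaddr.iter_iprange('10.0.0.10', '10.0.0.11'),
             netaddr.iter_iprange('10.0.1.10', '10.0.1.20')]
    print(allocate(pools, 5))   # spills over from the first pool to the next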
Change-Id: I804a95fdaa3552c785e85ffab7b8ac849c634a87
Closes-Bug: #1813253
(cherry picked from commit 1746d7e0e6)
Neutron-ovs-agent can now enable IGMP snooping on the integration
bridge if the config option "igmp_snooping_enable" in the OVS section
of the config is set to True.
It will also set mcast-snooping-disable-flood-unregistered=true so
that flooding of multicast packets to all unregistered ports is
disabled as well.
Both changes are applied to the integration bridge.
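In OVS terms the two settings correspond roughly to the following (a
sketch using ovs-vsctl; the agent applies them through its bridge
API):

    import subprocess


    def enable_igmp_snooping(bridge='br-int'):
        # Turn IGMP snooping on for the bridge.
        subprocess.check_call(
            ['ovs-vsctl', 'set', 'Bridge', bridge,
             'mcast_snooping_enable=true'])
        # Do not flood unregistered multicast traffic to every port.
        subprocess.check_call(
            ['ovs-vsctl', 'set', 'Bridge', bridge,
             'other_config:mcast-snooping-disable-flood-unregistered=true'])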
Change-Id: I12f4030a35d10d1715d3b4bfb3ed5efb9aa28f2b
Closes-Bug: #1840136
(cherry picked from commit 5b341150e2)
There is a race condition between nova-compute booting an instance
and the l3-agent processing the DVR (local) router on the compute
node. This issue can be seen when a large number of instances are
booted on one host and the instances are under different DVR routers,
so the l3-agent has to process all these DVR routers on this host
concurrently.
For now we have a green pool for the router ResourceProcessingQueue
with 8 greenlets, but some of these routers can still be left
waiting. Even worse, there are time-consuming actions during the
router processing procedure, for instance installing ARP entries,
iptables rules, route rules etc.
So when the VM is up, it tries to get metadata via the local proxy
hosted by the DVR router, but the router is not ready yet on that
host. In the end those instances are not able to set up some config
in the guest OS.
This patch sizes the L3 router processing queue green pool based on
the router quantity. The pool size is limited to between 8 (the
original value) and 32, because we do not want the L3 agent to cost
too much host resource on processing routers on the compute node.
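The sizing rule amounts to something like the following sketch (the
constant names are assumptions, not necessarily those in the patch):

    import eventlet

    ROUTER_PROCESS_GREENLET_MIN = 8
    ROUTER_PROCESS_GREENLET_MAX = 32


    def router_pool_size(num_routers):
        """Scale the pool with the hosted router count, clamped to [8, 32]."""
        return max(ROUTER_PROCESS_GREENLET_MIN,
                   min(num_routers, ROUTER_PROCESS_GREENLET_MAX))


    # An agent hosting 20 DVR routers would get a pool of 20 greenlets:
    pool = eventlet.GreenPool(size=router_pool_size(20))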
Conflicts:
neutron/tests/functional/agent/l3/test_legacy_router.py
Related-Bug: #1813787
Change-Id: I62393864a103d666d5d9d379073f5fc23ac7d114
(cherry picked from commit 837c9283ab)
In patch [1] we changed the definition of the abstract method "plug"
in the LinuxInterfaceDriver class.
That broke e.g. 3rd-party drivers which don't yet accept the new
parameter called "link_up" in the plug_new method.
So this patch fixes this so that such legacy drivers keep working
with the new base interface driver class.
This commit also marks such a definition of the plug_new method as
deprecated. The possibility of using it without accepting the link_up
parameter will be removed in the "W" release of OpenStack.
[1] https://review.opendev.org/#/c/707406/
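The backward-compatible dispatch can be pictured like this (a sketch,
not the actual neutron implementation; the parameter list is
approximated):

    import inspect
    import warnings


    class LinuxInterfaceDriverSketch:

        def plug(self, network_id, port_id, device_name, mac_address,
                 bridge=None, namespace=None, prefix=None, mtu=None,
                 link_up=True):
            kwargs = dict(bridge=bridge, namespace=namespace,
                          prefix=prefix, mtu=mtu)
            if 'link_up' in inspect.signature(self.plug_new).parameters:
                kwargs['link_up'] = link_up
            else:
                # Legacy 3rd-party driver: keep calling it the old way,
                # but warn that this form goes away in the "W" release.
                warnings.warn('plug_new() without a link_up parameter is '
                              'deprecated', DeprecationWarning)
            self.plug_new(network_id, port_id, device_name, mac_address,
                          **kwargs)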
Change-Id: Icd555987a1a57ca0b31fa7e4e830583d6c69c861
Closes-Bug: #1879307
(cherry picked from commit 30d573d5ab)
(cherry picked from commit 9c242a0329)
(cherry picked from commit bc8c38bda8)
Although notify_nova_on_port_status_changes defaults to true, it
could be set to false, making the nova_notifier attribute unsafe to
use without checking.
This patch checks both the config option and that the attribute
exists, since the config could be changed after the plugin is already
initialized without the nova_notifier attribute being set.
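The guarded access looks roughly like this (a sketch; the notifier
call is hypothetical and the option is assumed to be registered by
neutron already):

    from oslo_config import cfg


    def maybe_notify_nova(plugin, port):
        # Both checks are needed: the option may have been flipped to
        # false after startup, and the attribute is never set when the
        # option was false at plugin initialization time.
        if (cfg.CONF.notify_nova_on_port_status_changes
                and hasattr(plugin, 'nova_notifier')):
            plugin.nova_notifier.notify_port_status_changed(port)  # hypothetical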
Change-Id: Ide0f93275e60dffda10b7da59f6d81c5582c3849
Closes-bug: #1843269
(cherry picked from commit ab4320edb4)