When a DVR router is migrated from distributed to
centralized, the router is unbound from the agents,
but the ML2 distributed port bindings for the router
port still remain intact.
This patch fixes the issue by deleting the binding
entries for all the involved hosts.
Closes-Bug: #1718345
Change-Id: If139790eb336ff13b07b094151946af30322ad3e
(cherry picked from commit 32bfc3edec)
The type of lvm.vlan is int while other_config.get('tag') is a string,
so they can never be equal. We should convert the type before
comparing to avoid unnecessary OVSDB operations and flow updates.
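An illustrative sketch of the comparison (the helper name is made up;
the real check lives in the OVS agent's port tag handling):

    def tag_needs_update(lvm_vlan, other_config):
        # other_config values read back from OVSDB are strings, while the
        # local VLAN mapping (lvm.vlan) is an int, so convert before comparing.
        tag = other_config.get('tag')
        if tag is None:
            return True
        try:
            return int(tag) != lvm_vlan
        except (TypeError, ValueError):
            return True

    # Without the int() conversion '5' != 5 is always True, so the agent
    # would needlessly rewrite the tag and the related flows every time.
    assert tag_needs_update(5, {'tag': '5'}) is False
    assert tag_needs_update(5, {'tag': '7'}) is True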
Change-Id: Ib84da6296ddf3c95be9e9f370eb574bf92ceec15
Closes-Bug: #1843425
(cherry picked from commit 0550c0e1f6)
When a VLAN network was created with segmentation_id=0 and without a
physical_network given, it passed the provider segment validation
and the first available segmentation_id was chosen for the network.
The problem was that in such a case all available segmentation ids
were allocated and no other VLAN network could be created later.
This patch fixes the validation of segmentation_id when it is set to 0.
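A minimal sketch of the tightened check (names and range constants are
illustrative, not the actual ML2 VLAN type driver code):

    MIN_VLAN_TAG = 1
    MAX_VLAN_TAG = 4094

    def validate_vlan_segmentation_id(segmentation_id):
        if segmentation_id is None:
            # Nothing requested: an id may be allocated from the pool.
            return
        if not MIN_VLAN_TAG <= segmentation_id <= MAX_VLAN_TAG:
            raise ValueError(
                'segmentation_id %s is outside the valid VLAN range %s-%s'
                % (segmentation_id, MIN_VLAN_TAG, MAX_VLAN_TAG))

    # segmentation_id=0 is now rejected instead of silently consuming an
    # id from the allocation pool:
    # validate_vlan_segmentation_id(0)  -> ValueError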
Change-Id: Ic768deb84d544db832367f9a4b84a92729eee620
Closes-bug: #1840895
(cherry picked from commit f01f3ae5dd)
The DHCP agent sends information to neutron-server about ports
for which DHCP configuration is finished.
However, there was no log message about the ports whose DHCP
configuration had been finished.
This patch adds such a log at INFO level, in the same way as it is
currently done in e.g. the neutron-ovs-agent.
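A sketch of the kind of message added, using oslo.log the way Neutron
agents usually do (the function name and call site are illustrative):

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def report_dhcp_ready_ports(plugin_rpc, port_ids):
        plugin_rpc.dhcp_ready_on_ports(port_ids)
        LOG.info('DHCP configuration for ports %s is completed', port_ids)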
Change-Id: I9506f855af118bbbd45b55a711504d6ad0f863cc
(cherry picked from commit 6367141155)
Increased timeouts for OVSDB connection:
- ovsdb_timeout = 30
This patch will mitigate the intermittent timeouts the CI is
experiencing while running the functional tests.
Change-Id: I97a1d170926bb8a69dc6f7bb78a785bdea80936a
Closes-Bug: #1815142
(cherry picked from commit 30e901242f)
When a network is deleted, precommit handlers are notified prior to the
deletion of the network from the database. One handler exists in the ML2
plugin - _network_delete_precommit_handler. This handler queries the
database for the current state of the network and uses it to create a
NetworkContext which it saves under context._mech_context. When the
postcommit handler _network_delete_after_delete_handler is triggered
later, it passes the saved context._mech_context to mechanism drivers.
A problem can occur with provider networks since the segments service
also registers a precommit handler - _delete_segments_for_network. Both
precommit handlers use the default priority, so the order in which they
are called is random, and determined by dict ordering. If the segment
precommit handler executes first, it will delete the segments associated
with the network. When the ML2 plugin precommit handler runs it then
sees no segments for the network and sets the provider attributes of the
network in the NetworkContext to None.
A mechanism driver that is passed a NetworkContext without provider
attributes in its delete_network_postcommit method will not have the
information to perform the necessary actions. In the case of the
networking-generic-switch mechanism driver where this was observed, this
resulted in the driver ignoring the event, because the network did not
look like a VLAN.
This change uses a priority of zero for the ML2 network delete precommit
handler, to ensure it queries the network and stores the NetworkContext
before the segments service has a chance to delete the segments.
A similar change has been made for subnets, both to keep the pattern
consistent and avoid any similar issues.
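A simplified model of why the priority matters (this is not the
neutron_lib callbacks API, just the ordering idea: subscribers with a
lower priority value run first):

    import collections

    class Registry(object):
        def __init__(self):
            self._subscriptions = collections.defaultdict(list)

        def subscribe(self, event, callback, priority):
            self._subscriptions[event].append((priority, callback))

        def notify(self, event, **kwargs):
            # Invoke subscribers in ascending priority order.
            for _priority, callback in sorted(self._subscriptions[event],
                                              key=lambda sub: sub[0]):
                callback(**kwargs)

    def snapshot_network_context(**kwargs):
        print('ML2: snapshot NetworkContext (segments still present)')

    def delete_segments(**kwargs):
        print('segments service: delete segments')

    registry = Registry()
    # Priority 0: the ML2 handler snapshots the network while its segments
    # (and thus the provider attributes) are still available.
    registry.subscribe('network.precommit_delete', snapshot_network_context, 0)
    # Default (higher) priority: the segments service runs afterwards.
    registry.subscribe('network.precommit_delete', delete_segments, 1000)
    registry.notify('network.precommit_delete')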
Change-Id: I6482223ed2a479de4f5ef4cef056c311c0281408
Closes-Bug: #1841967
Depends-On: https://review.opendev.org/680001
(cherry picked from commit fea2d9091f)
As with https://review.opendev.org/#/c/656066/, if the limit is applied
anywhere other than after all the filters, SQLAlchemy will return
an error, and we could possibly return fewer results than intended.
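A hedged SQLAlchemy sketch of the constraint (the model is made up):
the LIMIT has to be applied once all filters are in place, since
SQLAlchemy rejects filters added to a query that already has a LIMIT.

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Port(Base):
        __tablename__ = 'ports'
        id = sa.Column(sa.String(36), primary_key=True)
        network_id = sa.Column(sa.String(36))

    def list_ports(session, network_id, limit=None):
        query = session.query(Port).filter(Port.network_id == network_id)
        # Apply the limit last; applying it earlier and filtering afterwards
        # raises an error and could also drop rows that should have matched.
        if limit is not None:
            query = query.limit(limit)
        return query.all()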
Change-Id: I9a54ae99d2d5dfda63cb0061bcf3d727ed7cc992
Closes-Bug: #1827363
(cherry picked from commit 94bc403078)
We sometimes hit a problem in "test_port_ip_update_revises" [1].
This happens because the created port doesn't belong to the previously
created subnet. We need to enforce that the port is created in the
subnet specifically created in this test.
[1]http://logs.openstack.org/69/650269/12/check/openstack-tox-lower-constraints/7adf36e/testr_results.html.gz
Conflicts:
neutron/tests/unit/services/revisions/test_revision_plugin.py
Change-Id: I399f100fe30b6a03248cef5e6026204d4d1ffb2e
Closes-Bug: #1828865
(cherry picked from commit 872dd7f484)
Currently there is a delay (around 20 seconds) between the agent
update call and the server reply, due to the load on the testing
servers. This time should be higher than the agent-server communication
delay but still short enough to detect that the DHCP agent is dead
during the active wait of the DHCP agent network rescheduling.
"log_agent_heartbeats" is activated to add information about when the
server has processed the agent report state call. This log allows
checking the difference between the server update time and the previous
agent heartbeat timestamp.
Conflicts:
neutron/tests/fullstack/resources/config.py
Change-Id: Icf9a8802585c908fd4a70d0508139a81d5ac90ee
Related-Bug: #1799555
(cherry picked from commit d7c5ae8a03)
In patch [1] a retry mechanism was proposed as a partial fix for
bug 1828375.
We noticed that in heavily loaded environments the 3 retries
defined in [1] are sometimes not enough.
So this patch switches to using the neutron_lib.db.api.MAX_RETRIES
constant as the number of retries when processing trunk subport
bindings.
This MAX_RETRIES constant is set to 20, and in our cases it "fixed"
the problem.
[1] https://review.opendev.org/#/c/662236/
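Roughly, the shape of the retry loop (the real code goes through
neutron's DB retry helpers; this only illustrates swapping the
hard-coded 3 for MAX_RETRIES):

    MAX_RETRIES = 20  # mirrors neutron_lib.db.api.MAX_RETRIES noted above

    def update_subport_bindings_with_retries(bind_subports, *args, **kwargs):
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                return bind_subports(*args, **kwargs)
            except Exception:
                # Hypothetical: only retriable DB errors should be caught.
                if attempt == MAX_RETRIES:
                    raise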
Change-Id: I016ef3d7ccbb89b68d4a3d509162b3046a9c2f98
Related-Bug: #1828375
(cherry picked from commit d1f8888843)
The openSUSE 42.3 distribution is EOL; remove this experimental job so
that it can be removed from Zuul.
Note that master has a job for newer openSUSE running.
Change-Id: I0d26d1b1d1c4ca64c7a1dd077752d191fd3a28fb
Neutron-ovs-agent configures the physical bridges so that they work
in fail_mode=secure. This means that only packets which match some
OpenFlow rule in the bridge can be processed.
This may cause problems on hosts with only one physical NIC,
where the same bridge is used to provide control plane connectivity,
like the connection to rabbitmq, and data plane connectivity for VMs.
After e.g. a host reboot the bridge will still be in fail_mode=secure
but there will be no OpenFlow rules on it, thus there will be
no communication with rabbitmq.
With the current order of actions in the __init__ method of the
OVSNeutronAgent class, it first tries to establish the connection to
rabbitmq and only later configures the physical bridges with some
initial OpenFlow rules.
In the case described above this will fail, as there is no connectivity
to rabbitmq through the physical bridge.
So this patch changes the order of actions in the __init__ method so
that it first sets up the physical bridges and then configures the RPC
connection.
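Schematic view of the reordering (names are abbreviated, not lifted
verbatim from OVSNeutronAgent):

    class OVSNeutronAgentSketch(object):

        def __init__(self, bridge_mappings):
            # Before: setup_rpc() ran first and could block indefinitely
            # when a secure-mode physical bridge had no flows after reboot.
            self.setup_physical_bridges(bridge_mappings)
            # The bridge now forwards control-plane traffic again, so the
            # RPC (rabbitmq) connection can be established safely.
            self.setup_rpc()

        def setup_physical_bridges(self, bridge_mappings):
            """Install the initial OpenFlow rules on the physical bridges."""

        def setup_rpc(self):
            """Connect to rabbitmq and register the RPC endpoints."""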
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
Change-Id: I41c02b0164537c5b1c766feab8117cc88487bc77
Closes-Bug: #1840443
(cherry picked from commit d41bd58f31)
(cherry picked from commit 3a2842bdd8)
Concurrent calls to _bind_port_if_needed may lead to a missing RPC
notification which can cause a port stuck in a DOWN state. If the only
caller that succeeds in the concurrency does not specify that an RPC
notification is allowed then no RPC would be sent to the agent. The
other caller which needs to send an RPC notification will fail since the
resulting PortContext instance will not have any binding levels set.
The failure has negative effects on consumers of the L2Population
functionality because the L2Population mechanism driver will not be
triggered to publish that a port is UP on a given compute node. Manual
intervention is required in this case.
This patch proposes to handle this by populating the PortContext with
the current binding levels so that the caller can continue on and have
an RPC notification sent out.
Closes-Bug: #1755810
Story: 2003922
Change-Id: Ie2b813b2bdf181fb3c24743dbd13487ace6ee76a
(cherry picked from commit 0dc730c7c0)
Looks like by default OVS tunnels inherit skb marks from
tunneled packets. As a result Neutron IPTables marks set in
qrouter namespace are inherited by VXLAN encapsulating packets.
These marks may conflict with marks used by underlying networking
(like Calico) and lead to VXLAN tunneled packets being dropped.
This patch ensures that skb marks are cleared by OVS before entering
a tunnel to avoid conflicts with IPTables rules in default namespace.
Closes-Bug: #1839252
Change-Id: Id029be51bffe4188dd7f2155db16b21d19da1698
(cherry picked from commit 7627735252)
In TestOVSAgent, there are two tests where the OVS agent is
configured and started twice per test. Before the second call,
the agent should be stopped first.
Depends-On: https://review.opendev.org/667216/
Change-Id: I30c2bd4ce3715cde60bc0cd3736bd9c75edc1df3
Closes-Bug: #1830895
(cherry picked from commit b77c79e5e8)
(cherry picked from commit ff66205081)
The current code removes the port from sg_port_map but then never
adds it back into the map. When we resize/migrate this instance,
the related OpenFlow rules won't be deleted, which causes VM
connectivity problems.
Closes-Bug: #1825295
Change-Id: I94ddddda3c1960d43893c7a367a81279d429e469
(cherry picked from commit 82782d3763)
When a new external network is set as the gateway network for a
DVR router, neutron tries to create a floating IP agent gateway port.
There should always be at most 1 such port per network per L3 agent,
but when 2 requests to set the external gateway for 2 different routers
are executed almost at the same time, it may happen that 2 such ports
get created.
That causes an error in the configuration of one of the routers on the
L3 agent, which leads to e.g. problems with access from VMs to the
metadata service.
Such issues are visible in DVR CI jobs from time to time. Please check
the related bug for details.
This patch adds a lock around the creation of such FIP gateway ports.
This doesn't fully solve the existing race condition: if the 2 requests
are processed by API workers running on 2 different nodes, the race can
still happen.
But it should mitigate the issue and at least fix the problem in the
U/S gates.
For a proper fix we should probably add a database-level constraint to
prevent the creation of 2 such ports for one network and one host, but
such a solution would not be easy to backport to stable branches, so I
would prefer to go with this simple workaround first.
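A hedged sketch of the lock (the function names and the lookup/create
helpers are hypothetical; Neutron commonly uses oslo.concurrency for
such locks). Note that a lock like this only serializes requests
handled by one neutron-server host, which is why the change is a
mitigation rather than a complete fix:

    from oslo_concurrency import lockutils

    def get_or_create_fip_agent_gw_port(plugin, context, network_id, host):
        @lockutils.synchronized('fip-gw-port-%s-%s' % (network_id, host))
        def _get_or_create():
            # Hypothetical helpers standing in for the DVR DB mixin calls.
            port = plugin.get_fip_agent_gw_port(context, network_id, host)
            if port:
                return port
            return plugin.create_fip_agent_gw_port(context, network_id, host)
        return _get_or_create()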
Conflicts:
neutron/db/l3_dvr_db.py
Change-Id: Iabab7e4d36c7d6a876b2b74423efd7106a5f63f6
Related-Bug: #1830763
(cherry picked from commit 7b81c1bc67)
(cherry picked from commit f7532f0c92)
The test creates a list of networks and then acts on a list of
NetworkDhcpAgentBindings obtained from get_objects(), which is not
guaranteed to follow the original build order (based on the
network_ids list).
Make sure that the returned list is sorted by network_id, and that
network_ids itself is sorted, so both lists match.
Change-Id: I9b07255988f7ba6609af1961b3429c3ce12d5186
Closes-Bug: #1839595
(cherry picked from commit f59b6a4706)
This is a stable-only fix since code around the change was removed
in master: https://review.opendev.org/#/c/641866
Commit a5244d6d44 changed the check order so that regular non-DVR ports
are also checked for agent_restarted. However, regular ports may
already be unbound, which leads to the error in the bug description:
the agent_restarted check is done against a 'None' agent.
This patch restores the previous logic - only check agent_restarted for
DVR ports.
It also adds some logging to give a clue about why the port up/down
update fails.
Change-Id: I3ad59864eeb42916d2cf15a5292d5aa9484f6e91
Closes-Bug: #1835731
(cherry picked from commit c3a3031f78)
When a port is updated with body={"port": {}}, neutron-server returns
a 500 error.
In _process_port_binding_attributes (plugins/ml2/plugin.py), when the
update body is {"port": {}}, attrs is {} and
    vnic_type = attrs and attrs.get(portbindings.VNIC_TYPE)
evaluates to {}: because attrs is falsy, attrs.get(portbindings.VNIC_TYPE)
is never executed and vnic_type ends up set to attrs itself (an empty
dict) instead of None.
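A self-contained illustration of the short-circuit bug and one possible
fix (the constant is inlined here; the exact shape of the patch may
differ):

    VNIC_TYPE = 'binding:vnic_type'  # stands in for portbindings.VNIC_TYPE

    attrs = {}  # what the helper receives for body={"port": {}}

    # Buggy: "attrs and ..." short-circuits on the empty dict, so vnic_type
    # becomes {} instead of None, and later code expecting a string breaks.
    vnic_type = attrs and attrs.get(VNIC_TYPE)
    assert vnic_type == {}

    # Fixed: only look up the key when attrs is non-empty, otherwise None.
    vnic_type = attrs.get(VNIC_TYPE) if attrs else None
    assert vnic_type is None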
Change-Id: I40d388543387ebdd72f26d761339c1829bef9413
Partial-bug: #1838396
(cherry picked from commit dd080c70b4)
This is an approximate partial fix to #1828375.
update_trunk_status and update_subport_bindings rpc messages are
processed concurrently and possibly out of order on the server side.
Therefore they may race with each other.
The status update race combined with
1) the versioning feature of sqlalchemy used in the standardattributes
table and
2) the less than serializable isolation level of some DB backends (like
MySQL InnoDB)
raises StaleDataErrors and thereby leaves some trunk subports in
DOWN status.
This change blindly retries the trunk status update (to BUILD) when a
StaleDataError is caught. In my local testbed this practically
fixes #1828375.
However, theoretically the retry may cover up other real errors (when
the cause of the StaleDataError was a different status, not just a
different revision count).
To the best of my understanding a proper fix would entail guaranteeing
in-order processing of the above RPC messages - which likely won't
ever happen.
I'm not sure at all if this change is worth merging - let me know what
you think.
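A sketch of the blind retry (trunk.update()/trunk.refresh() are
placeholder helpers; only the retry-on-StaleDataError shape matters):

    from sqlalchemy.orm import exc as orm_exc

    def set_trunk_status_with_retry(trunk, status, retries=3):
        for attempt in range(retries):
            try:
                # Bumps the standardattributes revision as a side effect.
                trunk.update(status=status)
                return
            except orm_exc.StaleDataError:
                if attempt == retries - 1:
                    raise
                # Another RPC handler touched the row concurrently; re-read
                # and try again instead of leaving subports stuck in DOWN.
                trunk.refresh()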
Conflicts:
neutron/services/trunk/rpc/server.py
neutron/tests/unit/services/trunk/rpc/test_server.py
Change-Id: Ie581809f24f9547b55a87423dac7db933862d66a
Partial-Bug: #1828375
(cherry picked from commit 618e24e241)
(cherry picked from commit d090fb9a3c)
When a physical bridge is recreated on a host, the ovs agent
tries to reconfigure it.
If there is e.g. a timeout while getting the bridge's datapath_id,
a RuntimeError is raised, and that crashed the whole agent.
This patch changes that so the agent does not crash in such a case but
instead tries to reconfigure everything again in the next rpc_loop
iteration.
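The shape of the change, simplified (the attribute name is an
assumption; the idea is to flag a resync instead of letting the
exception kill the agent):

    def reconfigure_physical_bridges(agent, bridges):
        try:
            agent.setup_physical_bridges(bridges)
        except RuntimeError:
            # e.g. ovsdb timed out while reading the bridge's datapath_id.
            # Ask for a full resync on the next rpc_loop iteration instead
            # of letting the exception propagate and crash the whole agent.
            agent.fullsync = True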
Conflicts:
neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py
neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/test_ovs_neutron_agent.py
Change-Id: Ic9b17a420068c0c76748e2c24d97be1ed7c460c7
Closes-Bug: #1837380
(cherry picked from commit b63809715a)
This is a backport of If65b1da79dfa73c73d91af457b2a5f93c6b2eedc, which
fixes the silent ignoring of the endpoint_type option in the placement
section.
Change-Id: I18819722b3c7835df60f04d6e3d8182a93f7a1ca
Related-Bug: #1818943
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
Dnsmasq emits a warning when started in most neutron deployments:
dnsmasq[27287]: LOUD WARNING: use --bind-dynamic rather than
--bind-interfaces to avoid DNS amplification attacks via
these interface(s)
Since the --bind-dynamic option has been available since dnsmasq 2.63
(https://github.com/liquidm/dnsmasq/blob/master/FAQ#L239) and
we require 2.67, switch to using this option instead.
Change-Id: Id7971bd99b04aca38180ff109f542422b1a925d5
Closes-bug: #1828473
(cherry picked from commit 09ee934786)
process_trusted_ports() turned out to be greenthread-unfriendly, so
if there are many trusted ports on a node, the openvswitch agent may
"hang" for a significant time.
This patch adds an explicit yield.
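A minimal illustration of the explicit yield with eventlet (the
per-port helper is made up): eventlet.sleep(0) gives other greenthreads
a chance to run between ports.

    import eventlet

    def process_trusted_ports(firewall, port_ids):
        for port_id in port_ids:
            firewall.process_trusted_port(port_id)
            # Cooperative yield so a long list of trusted ports does not
            # keep the whole agent busy in one greenthread.
            eventlet.sleep(0)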
Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
Closes-Bug: #1836023
(cherry picked from commit da539da378)
The ovs-agent scans and processes the ports during the
first rpc_loop, and a local port update notification
is sent out. This causes these ports to
be processed again in the ovs-agent's next (second)
rpc_loop.
This patch passes the restart flag (iteration number 0)
down the local port_update call trace. After this patch,
the local port_update notification is ignored in
the first RPC loop.
Related-Bug: #1813703
Change-Id: Ic5bf718cfd056f805741892a91a8d45f7a6e0db3
(cherry picked from commit eaf3ff5786)
The Neutron DHCP agent reports all ready ports to the Neutron
server via the dhcp_ready_on_ports() RPC call. When the DHCP agent
gets ports ready faster than the server can process them, the number
of ports per RPC call can grow so large (e.g. 10000 ports) that the
neutron server never has a chance of processing the request before
the RPC timeout kills it, leading to the DHCP agent
sending the request again and resulting in an endless loop of
dhcp_ready_on_ports() calls. This happens especially on agent startup.
To mitigate this problem we now limit the number of ports sent
per dhcp_ready_on_ports() call.
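A simple chunking sketch (the 128-port cap is an assumption for the
example, not necessarily the limit chosen by the patch):

    MAX_READY_PORTS_PER_CALL = 128

    def send_ready_ports(plugin_rpc, ready_port_ids):
        port_ids = list(ready_port_ids)
        for start in range(0, len(port_ids), MAX_READY_PORTS_PER_CALL):
            chunk = port_ids[start:start + MAX_READY_PORTS_PER_CALL]
            plugin_rpc.dhcp_ready_on_ports(chunk)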
Closes-bug: #1834257
Change-Id: I407e126e760ebf6aca4c31b9c3ff58dcfa55107f
(cherry picked from commit 76ccdb35d4)
The OVS Firewall blocks traffic that does not have either the IPv4 or
IPv6 ethertypes at present. This is a behavior change compared to the
iptables_hybrid firewall, which only operates on IP packets and thus
does not address other ethertypes.
This is a lightweight change that sets a configuration option in the
neutron openvswitch agent configuration file for permitted ethertypes
and then ensures that the requested ethertypes are permitted on
initialization. This addresses the security and usability concerns on
both master and stable branches while a full-fledged extension to the
security groups API is considered.
Change-Id: Ide78b0b90cf6d6069ce3787fc60766be52062da0
Related-Bug: #1832758
(cherry picked from commit 9ea6a61665)
Patch [1] fixed the handling of networks with the "shared" flag set to
True, and it is now possible to use "rule:shared" in API policy for
actions related e.g. to ports or subnets.
But a network can also be shared with one specific tenant only, through
the RBAC mechanism, and in such a case [1] alone wasn't enough.
That was because context.get_admin_context() was used to fetch the
network, so the returned network had shared=False even when the request
came from a tenant for which the network was shared through RBAC.
Now the network is always fetched with a context that has the proper
tenant_id set, so the "shared" flag is set correctly even when the
network is shared through RBAC.
[1] https://review.opendev.org/#/c/652636/
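A hedged sketch of the difference (get_network is the standard core
plugin call; the wrapper name is made up). The 'shared' attribute is
computed against the tenant in the context, so fetching with an admin
context hides per-tenant RBAC sharing:

    def get_network_for_policy_check(plugin, context, network_id):
        # Before: the network was fetched with context.get_admin_context(),
        # so it came back with shared=False even when it was RBAC-shared
        # with the requesting tenant.
        # After: use the request's own context so RBAC sharing is reflected.
        return plugin.get_network(context, network_id)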
Change-Id: I38615c0d18bb5a1f22f3e7865ce24615a540aa9a
Closes-Bug: #1833455
(cherry picked from commit d5edb080b0)
Use netaddr.IPNetwork to convert CIDRs in a query
into proper subnets.
When creating subnets, the CIDR is converted into
a proper subnet, e.g. a subnet can be created with
cidr=10.0.0.11/24 in the create request; on create
the CIDR is turned into a proper subnet, resulting
in a subnet with cidr=10.0.0.0/24.
This change does the same to CIDRs when used as
query filters during a list subnet request, so
that the same value that was used to create the
subnet can be used to find the subnet.
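With netaddr the normalization is a one-liner; the same conversion done
on create is now applied to the list filter value as well:

    import netaddr

    def normalize_cidr(cidr):
        return str(netaddr.IPNetwork(cidr).cidr)

    assert normalize_cidr('10.0.0.11/24') == '10.0.0.0/24'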
Conflicts:
neutron/db/db_base_plugin_common.py
Closes-Bug: #1831811
Change-Id: I8ae478a03ceedc6c3b1ae1d40081b5e5158813e6
(cherry picked from commit af77355732)