From e37722c0f5f0b746135200db6f654674dc0f6f12 Mon Sep 17 00:00:00 2001
From: Nate Johnston
Date: Tue, 24 Mar 2020 18:05:16 -0400
Subject: [PATCH] Wait before deleting trunk bridges for DPDK vhu

DPDK vhostuser mode (DPDK/vhu) means that when an instance is powered
off the port is deleted, and when an instance is powered on a port is
created.  This means a reboot is functionally a super fast
delete-then-create.  Neutron trunking mode in combination with DPDK/vhu
implements a trunk bridge for each tenant, and the ports for the
instances are created as subports of that bridge.  The standard way a
trunk bridge works is that when all the subports are deleted, a thread
is spawned to delete the trunk bridge, because that is an expensive and
time-consuming operation.  That means that if the port in question is
the only port on the trunk on that compute node, this happens:

1. The port is deleted
2. A thread is spawned to delete the trunk
3. The port is recreated

If the trunk is deleted after #3 happens then the instance has no
networking and is inaccessible; this is the scenario that was dealt
with in a previous change [1].  But there continue to be issues with
errors "RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X".
What is happening in this case is that the trunk is being deleted in
the middle of the execution of #3, so that it stops existing in the
middle of the port creation logic but before the port is actually
recreated.

Since this is a timing issue between two different threads it's
difficult to stamp out entirely, but I think the best way to do it is
to add a slight delay in the trunk deletion thread, just a second or
two.  That will give the port time to come back online and avoid the
trunk deletion entirely.

[1] https://review.opendev.org/623275

Related-Bug: #1869244
Change-Id: I36a98fe5da85da1f3a0315dd1a470f062de6f38b
---
 .../services/trunk/drivers/openvswitch/agent/ovsdb_handler.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/neutron/services/trunk/drivers/openvswitch/agent/ovsdb_handler.py b/neutron/services/trunk/drivers/openvswitch/agent/ovsdb_handler.py
index c789314fbb9..8b6e3b67078 100644
--- a/neutron/services/trunk/drivers/openvswitch/agent/ovsdb_handler.py
+++ b/neutron/services/trunk/drivers/openvswitch/agent/ovsdb_handler.py
@@ -14,6 +14,7 @@
 #    under the License.
 
 import functools
+import time
 
 import eventlet
 from neutron_lib.callbacks import events
@@ -43,6 +44,7 @@ from neutron.services.trunk.rpc import agent
 LOG = logging.getLogger(__name__)
 
 DEFAULT_WAIT_FOR_PORT_TIMEOUT = 60
+WAIT_BEFORE_TRUNK_DELETE = 3
 
 
 def lock_on_bridge_name(required_parameter):
@@ -216,6 +218,7 @@ class OVSDBHandler(object):
         # try to mitigate the issue by checking if there is a port on the
         # bridge and if so then do not remove it.
         bridge = ovs_lib.OVSBridge(bridge_name)
+        time.sleep(WAIT_BEFORE_TRUNK_DELETE)
         if bridge_has_instance_port(bridge):
             LOG.debug("The bridge %s has instances attached so it will not "
                       "be deleted.", bridge_name)
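
Note: the timing argument in the commit message can be illustrated with
a small standalone simulation.  This is only a sketch under stated
assumptions, not Neutron code: FakeBridge, delete_trunk_bridge and
simulate_reboot are hypothetical names, the bridge is a plain in-memory
object rather than an OVSDB row, the check-and-delete step is made
atomic with a lock for simplicity, and the wait is shortened well below
the 3 seconds used in the patch.

```python
import threading
import time


class FakeBridge:
    """Hypothetical in-memory stand-in for a trunk bridge (not ovs_lib)."""

    def __init__(self, name):
        self.name = name
        self.ports = set()
        self.exists = True
        self._lock = threading.Lock()

    def add_port(self, port):
        with self._lock:
            if not self.exists:
                # Mimics the failure mode described in the commit message.
                raise RuntimeError(
                    "Cannot find Bridge with name=%s" % self.name)
            self.ports.add(port)

    def remove_port(self, port):
        with self._lock:
            self.ports.discard(port)

    def destroy_if_empty(self):
        """Delete the bridge only if no port is attached."""
        with self._lock:
            if self.ports:
                return False
            self.exists = False
            return True


def delete_trunk_bridge(bridge, wait):
    # Same ordering as the patch: wait first, then check for ports.
    time.sleep(wait)
    return bridge.destroy_if_empty()


def simulate_reboot(wait, recreate_delay):
    """Delete the only port, spawn the deleter, then recreate the port."""
    bridge = FakeBridge("tbr-demo")
    bridge.add_port("vhu-demo")
    bridge.remove_port("vhu-demo")          # step 1: port is deleted

    result = {}

    def worker():
        result["deleted"] = delete_trunk_bridge(bridge, wait)

    deleter = threading.Thread(target=worker)
    deleter.start()                          # step 2: deletion thread spawned

    time.sleep(recreate_delay)               # the instance powers back on
    try:
        bridge.add_port("vhu-demo")          # step 3: port is recreated
        result["port_recreated"] = True
    except RuntimeError:
        result["port_recreated"] = False

    deleter.join()
    return result


if __name__ == "__main__":
    print("with wait:   ", simulate_reboot(wait=0.5, recreate_delay=0.05))
    print("without wait:", simulate_reboot(wait=0.0, recreate_delay=0.2))
```

When the wait is longer than the time the port takes to come back, the
deleter finds the port attached and leaves the bridge alone; with no
wait, the bridge is gone before the port can be recreated.  The delay
narrows the window rather than closing it, which matches the commit
message's remark that the race is difficult to stamp out entirely.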