Wait before deleting trunk bridges for DPDK vhu
DPDK vhostuser mode (DPDK/vhu) means that when an instance is powered
off the port is deleted, and when an instance is powered on a port is
created. This means a reboot is functionally a super fast
delete-then-create. Neutron trunking mode in combination with DPDK/vhu
implements a trunk bridge for each tenant, and the ports for the
instances are created as subports of that bridge. The standard way a
trunk bridge works is that when all the subports are deleted, a thread
is spawned to delete the trunk bridge, because that is an expensive and
time-consuming operation. That means that if the port in question is
the only port on the trunk on that compute node, this happens:

1. The port is deleted
2. A thread is spawned to delete the trunk
3. The port is recreated

If the trunk is deleted after #3 happens then the instance has no
networking and is inaccessible; this is the scenario that was dealt
with in a previous change [1]. But there continue to be issues with
errors "RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X".

What is happening in this case is that the trunk is being deleted in
the middle of the execution of #3, so that it stops existing in the
middle of the port creation logic but before the port is actually
recreated.

Since this is a timing issue between two different threads it's
difficult to stamp out entirely, but I think the best way to do it is
to add a slight delay in the trunk deletion thread, just a second or
two. That will give the port time to come back online and avoid the
trunk deletion entirely.

[1] https://review.opendev.org/623275

Related-Bug: #1869244
Change-Id: I36a98fe5da85da1f3a0315dd1a470f062de6f38b
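The three-step sequence in the commit message can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `FakeTrunkBridge`, `delayed_trunk_remove`, and `simulate_reboot` are hypothetical names, plain `threading` stands in for the agent's own threads, and the delay is shortened so the demo runs quickly. It is not neutron code.

```python
import threading
import time

# Shortened for the demo; the change itself uses a delay of 3 seconds.
WAIT_BEFORE_TRUNK_DELETE = 0.3


class FakeTrunkBridge:
    """Hypothetical stand-in for a trunk bridge; not a neutron class."""

    def __init__(self, name):
        self.name = name
        self.ports = set()
        self.exists = True
        self.lock = threading.Lock()

    def add_port(self, port):
        with self.lock:
            if not self.exists:
                # Mimics the "Cannot find Bridge" failure from the message.
                raise LookupError(
                    "Cannot find Bridge with name=%s" % self.name)
            self.ports.add(port)

    def delete_port(self, port):
        with self.lock:
            self.ports.discard(port)

    def delete_if_empty(self):
        """Delete the bridge only if no port is attached."""
        with self.lock:
            if self.ports:
                return False
            self.exists = False
            return True


def delayed_trunk_remove(bridge, delay, result):
    # Wait first, so a rebooting instance can re-plug its port.
    time.sleep(delay)
    result["deleted"] = bridge.delete_if_empty()


def simulate_reboot(delay):
    bridge = FakeTrunkBridge("tbr-demo")
    bridge.add_port("vhu-demo")
    result = {}

    bridge.delete_port("vhu-demo")            # 1. the port is deleted
    worker = threading.Thread(
        target=delayed_trunk_remove, args=(bridge, delay, result))
    worker.start()                            # 2. deletion thread spawned
    time.sleep(0.05)                          # recreation takes a moment
    bridge.add_port("vhu-demo")               # 3. the port is recreated
    worker.join()
    return bridge.exists, result["deleted"]


if __name__ == "__main__":
    print(simulate_reboot(WAIT_BEFORE_TRUNK_DELETE))
```

When the delay is longer than the time the port needs to come back, the deletion thread finds the port attached and leaves the bridge alone. With a delay of zero, the same toy is likely to raise the lookup error in step 3, which mirrors the failure described above.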
parent 56c7db5e7f
commit e37722c0f5
@@ -14,6 +14,7 @@
 # under the License.
 
 import functools
+import time
 
 import eventlet
 from neutron_lib.callbacks import events
@@ -43,6 +44,7 @@ from neutron.services.trunk.rpc import agent
 LOG = logging.getLogger(__name__)
 
 DEFAULT_WAIT_FOR_PORT_TIMEOUT = 60
+WAIT_BEFORE_TRUNK_DELETE = 3
 
 
 def lock_on_bridge_name(required_parameter):
@@ -216,6 +218,7 @@ class OVSDBHandler(object):
         # try to mitigate the issue by checking if there is a port on the
         # bridge and if so then do not remove it.
         bridge = ovs_lib.OVSBridge(bridge_name)
+        time.sleep(WAIT_BEFORE_TRUNK_DELETE)
         if bridge_has_instance_port(bridge):
             LOG.debug("The bridge %s has instances attached so it will not "
                       "be deleted.", bridge_name)
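Because the change adds a real `time.sleep` call to the removal path, unit tests that exercise it may want to patch the sleep out. The following is a minimal sketch of that idea, assuming a simplified, hypothetical `handle_trunk_remove` that takes its collaborators as callables; the real handler in the patched file has a different signature and is not reproduced here.

```python
import time
from unittest import mock

WAIT_BEFORE_TRUNK_DELETE = 3


def handle_trunk_remove(bridge_has_instance_port, destroy_bridge):
    """Hypothetical, simplified shape of the delayed removal path."""
    # Give a rebooting instance time to re-plug its port.
    time.sleep(WAIT_BEFORE_TRUNK_DELETE)
    if bridge_has_instance_port():
        # A port came back, so the bridge must stay.
        return False
    destroy_bridge()
    return True


def run_without_waiting(has_port):
    """Run the handler with time.sleep patched so no real delay occurs."""
    destroy = mock.Mock()
    with mock.patch("time.sleep") as fake_sleep:
        deleted = handle_trunk_remove(lambda: has_port, destroy)
    # The delay must still be requested exactly once, with the constant.
    fake_sleep.assert_called_once_with(WAIT_BEFORE_TRUNK_DELETE)
    return deleted, destroy.called
```

Patching `time.sleep` keeps the test fast while still asserting that the delay happens before the port check.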