Wait before deleting trunk bridges for DPDK vhu

In DPDK vhostuser mode (DPDK/vhu), powering off an instance deletes its
port and powering it on creates a new one, so a reboot is functionally a
very fast delete-then-create. Neutron trunking mode in combination with
DPDK/vhu
implements a trunk bridge for each tenant, and the ports for the
instances are created as subports of that bridge. The standard way a
trunk bridge works is that when all the subports are deleted, a thread
is spawned to delete the trunk bridge, because that is an expensive and
time-consuming operation. That means that if the port in question is
the only port on the trunk on that compute node, this happens:
1. The port is deleted
2. A thread is spawned to delete the trunk
3. The port is recreated

If the trunk is deleted after step 3, the instance has no networking
and is inaccessible; that scenario was dealt with in a previous
change [1]. However, errors of the form
"RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X" still occur.
In this case the trunk is deleted in the middle of step 3, so the bridge
stops existing partway through the port creation logic, before the port
is actually recreated.

Since this is a timing issue between two different threads it's
difficult to stamp out entirely, but I think the best way to do it is to
add a short delay in the trunk deletion thread (WAIT_BEFORE_TRUNK_DELETE,
3 seconds in this change). That gives the port time to come back online,
so the existing check for instance ports on the bridge finds it and the
trunk deletion is skipped entirely.

[1] https://review.opendev.org/623275

Related-Bug: #1869244
Change-Id: I36a98fe5da85da1f3a0315dd1a470f062de6f38b
(cherry picked from commit e37722c0f5)
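
The race described in the commit message, and the effect of delaying the deletion thread, can be illustrated with a small self-contained simulation. This is only a sketch under stated assumptions: `FakeTrunkBridge`, `delete_trunk_later` and `reboot` are hypothetical names invented here, plain `threading` stands in for the agent's real concurrency, and the delay is scaled down from the 3 seconds used in the change so the example runs quickly.

```python
import threading
import time


class FakeTrunkBridge:
    """Hypothetical in-memory stand-in for a trunk bridge (not ovs_lib)."""

    def __init__(self):
        self.ports = set()
        self.exists = True
        self._lock = threading.Lock()

    def add_port(self, name):
        with self._lock:
            if not self.exists:
                # Loosely mirrors the "Cannot find Bridge" failure mode.
                raise LookupError("Cannot find Bridge")
            self.ports.add(name)

    def delete_port(self, name):
        with self._lock:
            self.ports.discard(name)

    def delete_if_empty(self):
        """Delete the bridge only if no port is attached to it."""
        with self._lock:
            if self.ports:
                return False
            self.exists = False
            return True


def delete_trunk_later(bridge, delay):
    # Wait first, then re-check for ports before deleting.
    time.sleep(delay)
    bridge.delete_if_empty()


def reboot(delay, recreate_after=0.05):
    """Simulate a reboot; return (port_recreated, bridge_still_exists)."""
    bridge = FakeTrunkBridge()
    bridge.add_port("vhu-port")
    bridge.delete_port("vhu-port")              # 1. the port is deleted
    deleter = threading.Thread(
        target=delete_trunk_later, args=(bridge, delay))
    deleter.start()                             # 2. deletion thread spawned
    time.sleep(recreate_after)
    try:
        bridge.add_port("vhu-port")             # 3. the port is recreated
        recreated = True
    except LookupError:
        recreated = False
    deleter.join()
    return recreated, bridge.exists
```

Calling `reboot(0)` models the situation without a wait: the bridge is gone before the port comes back. Calling `reboot(0.5)` models the delayed deletion: the port is recreated first, so the emptiness check fails and the bridge is kept.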
parent 7b406a832c
commit 92b2d9c25a
@@ -14,6 +14,7 @@
 # under the License.
 
 import functools
+import time
 
 import eventlet
 from neutron_lib.callbacks import events
@@ -43,6 +44,7 @@ from neutron.services.trunk.rpc import agent
 LOG = logging.getLogger(__name__)
 
 DEFAULT_WAIT_FOR_PORT_TIMEOUT = 60
+WAIT_BEFORE_TRUNK_DELETE = 3
 
 
 def lock_on_bridge_name(required_parameter):
@@ -216,6 +218,7 @@ class OVSDBHandler(object):
         # try to mitigate the issue by checking if there is a port on the
         # bridge and if so then do not remove it.
         bridge = ovs_lib.OVSBridge(bridge_name)
+        time.sleep(WAIT_BEFORE_TRUNK_DELETE)
         if bridge_has_instance_port(bridge):
             LOG.debug("The bridge %s has instances attached so it will not "
                       "be deleted.", bridge_name)
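
The ordering that the last hunk relies on (wait first, then check for instance ports, then delete only if none are found) can be checked without actually sleeping. The following is a minimal sketch, not the code under review: `maybe_delete_bridge`, its injectable `sleep` parameter, the `destroy()` call on the mock bridge and `run_with_recorded_calls` are all hypothetical names used for illustration.

```python
import time
from unittest import mock

WAIT_BEFORE_TRUNK_DELETE = 3


def maybe_delete_bridge(bridge, has_instance_port, sleep=time.sleep):
    """Hypothetical stand-in for the patched deletion path."""
    # Give a rebooting instance time to bring its port back.
    sleep(WAIT_BEFORE_TRUNK_DELETE)
    if has_instance_port(bridge):
        return False        # a port is attached, keep the bridge
    bridge.destroy()        # hypothetical deletion call on the stand-in
    return True


def run_with_recorded_calls(port_present):
    """Run the helper against mocks and return (deleted, call_names)."""
    calls = mock.Mock()
    calls.has_port.return_value = port_present
    deleted = maybe_delete_bridge(
        calls.bridge, calls.has_port, sleep=calls.sleep)
    # Each recorded call is a (name, args, kwargs) tuple; keep the names.
    return deleted, [c[0] for c in calls.mock_calls]
```

Passing the sleep function as a parameter is a design choice made for this sketch only; it lets a test record the call order through a single parent mock instead of waiting three real seconds.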