nova/nova
Balazs Gibizer 61fc81a676 Prevent leaked eventlets to send notifications
In our functional tests we run nova services as eventlets. Those
services can also spawn their own eventlets for RPC or other parallel
processing. The test case executor only sees and tracks the main
eventlet where the code of the test case is running. When that
finishes, the test executor considers the test case to be finished
regardless of the other spawned eventlets. This can lead to leaked
eventlets that keep running in parallel with later test cases.

One way this can cause trouble is via the global variables in the
nova.rpc module. Those globals are re-initialized for each test case, so
they do not directly leak information between test cases. However, if
a late eventlet calls nova.rpc.get_versioned_notifier(), it gets a
fully usable FakeVersionedNotifier object regardless of which test
case the notifier belongs to or which test case the eventlet belongs
to. This way the late eventlet can send a notification to the currently
running test case and can therefore make it fail.

The concrete case we saw is the following:

1) The test case
   nova.tests.functional.test_servers.ServersTestV219.test_description_errors
   creates a server but doesn't wait for it to reach a terminal state
   (ACTIVE / ERROR). This test case finishes quickly but leaks running
   eventlets in the background that wait for some RPC call to return.
2) As the test case has finished, the cleanup code deletes the test case
   specific setup, including the DB.
3) The test executor moves forward and starts running another test case.
4) 60 seconds later the leaked eventlet times out waiting for the RPC
   call to return and tries to continue, but fails as the DB is already
   gone. It then tries to report this as an error notification: it calls
   nova.rpc.get_versioned_notifier(), gets a fresh notifier that is
   connected to the currently running test case, and emits the error
   notification there.
5) The currently running test case is also waiting for an error
   notification to be triggered by its own test code, but it receives the
   notification from the late eventlet first. As the content of that
   notification does not match the expectations, the currently running
   test case fails. The late eventlet also prints a lot of errors about
   the DB being gone, which makes troubleshooting pretty hard.

This patch proposes a way to fix this by marking each eventlet at spawn
time with the id of the test case that directly or indirectly
started it.

Then, when the NotificationFixture gets a notification, it compares the
test case id stored in the calling eventlet with the id of the test case
that initialized the NotificationFixture. If the two ids do not match,
the fixture ignores the notification and raises an exception in the
calling eventlet to make it terminate.

Change-Id: I012dcf63306bae624dc4f66aae6c6d96a20d4327
Closes-Bug: #1946339
2021-10-14 18:27:30 +02:00
accelerator smartnic support - reject server move and suspend 2021-08-05 15:58:41 +08:00
api Merge "Support interface attach / detach with new resource request format" 2021-09-02 19:03:47 +00:00
cmd nova-manage: Ensure mountpoint is passed when updating attachment 2021-09-29 11:53:02 +01:00
compute Store old_flavor already on source host during resize 2021-09-27 12:01:20 +02:00
conductor Merge "Support move ops with extended resource request" 2021-08-31 21:38:24 +00:00
conf Merge "workarounds: Remove rbd_volume_local_attach" 2021-09-02 12:16:53 +00:00
console Merge "console: Improve logging" 2021-09-07 14:29:08 +00:00
db Add missing __init__.py in nova/db/api 2021-09-20 11:28:46 +02:00
hacking Add two new hacking rules 2021-09-01 12:26:52 +01:00
image glance: Remove [glance]/allowed_direct_url_schemes 2021-01-28 12:46:57 +00:00
keymgr
locale Imported Translations from Zanata 2020-04-26 07:51:21 +00:00
network Support interface attach / detach with new resource request format 2021-09-01 15:51:47 +02:00
notifications Merge "Allow 'bochs' as a display device option" 2021-09-03 15:07:35 +00:00
objects Update min supported service version for Yoga 2021-10-01 13:09:02 +00:00
pci mypy: Add type annotations to 'nova.pci' 2021-04-26 18:06:21 +01:00
policies policy: Deprecate field from 'os-extended-server-attributes' policy 2021-08-26 10:54:25 +01:00
privsep Retry lvm volume and volume group query 2021-06-15 12:39:26 +02:00
scheduler Support interface attach / detach with new resource request format 2021-09-01 15:51:47 +02:00
servicegroup Remove six.binary_type/integer_types/string_types 2020-12-13 11:25:14 +00:00
storage Stop leaking ceph df cmd in RBD utils 2021-05-11 17:28:56 +02:00
tests Prevent leaked eventlets to send notifications 2021-10-14 18:27:30 +02:00
virt Merge "hardware: Add TODO to remove '(un)pin_cpu_with_siblings'" 2021-09-11 09:15:52 +00:00
volume Remove six.text_type (1/2) 2020-12-13 11:25:31 +00:00
__init__.py
availability_zones.py Remove six.PY2 and six.PY3 2020-08-15 07:45:23 +00:00
baserpc.py
block_device.py fup: Remove unused legacy block_device_info format 2021-08-20 13:26:46 +01:00
cache_utils.py trivial: Remove unused 'cache_utils' APIs 2020-02-05 17:20:28 +00:00
config.py db: Post reshuffle cleanup 2021-08-09 15:34:40 +01:00
context.py db: Unify 'nova.db.api', 'nova.db.sqlalchemy.api' 2021-08-09 15:34:40 +01:00
crypto.py Replace md5 for fips 2021-02-25 16:01:43 -05:00
debugger.py trivial: Remove remaining '_LW' instances 2020-05-18 17:00:41 +01:00
exception.py Convert features not supported error to HTTPBadRequest 2021-09-01 09:09:58 -05:00
exception_wrapper.py rpc: Rework 'get_notifier', 'wrap_exception' 2021-03-01 11:06:48 +00:00
filters.py trivial: Remove remaining '_LI' instances 2020-05-18 17:00:57 +01:00
i18n.py trivial: Remove remaining '_LI' instances 2020-05-18 17:00:57 +01:00
loadables.py
manager.py db: Unify 'nova.db.api', 'nova.db.sqlalchemy.api' 2021-08-09 15:34:40 +01:00
middleware.py Allow X-OpenStack-Nova-API-Version header in CORS 2021-06-15 07:35:36 -04:00
monkey_patch.py Correctly disable greendns 2020-09-11 12:42:04 -04:00
policy.py Reuse code from oslo lib for JSON policy migration 2021-01-14 22:41:33 +00:00
profiler.py
quota.py db: Post reshuffle cleanup 2021-08-09 15:34:40 +01:00
rpc.py rpc: Rework 'get_notifier', 'wrap_exception' 2021-03-01 11:06:48 +00:00
safe_utils.py
service.py Restore retrying the RPC connection to conductor 2020-11-13 18:02:00 +01:00
service_auth.py
test.py Prevent leaked eventlets to send notifications 2021-10-14 18:27:30 +02:00
utils.py Replace getargspec with getfullargspec 2021-05-12 10:50:52 +08:00
version.py Change API unexpected exception message 2021-02-17 21:30:07 +00:00
weights.py Remove six.add_metaclass 2020-08-15 07:45:39 +00:00
wsgi.py trivial: Remove remaining '_LI' instances 2020-05-18 17:00:57 +01:00