From 94f752bdee85183fcf9f17d41af64d19678dcae5 Mon Sep 17 00:00:00 2001
From: Slawek Kaplonski
Date: Thu, 26 Nov 2020 17:15:33 +0100
Subject: [PATCH] [Docs] Guide about running and debugging fullstack tests

This patch moves the detailed description of fullstack tests from the
general TESTING.rst document to the fullstack guide, which is in
contributor/testing/fullstack.rst.
It also adds new sections about running and debugging fullstack tests
locally and about how to investigate failures which happen in the
fullstack jobs in the gate.

Change-Id: I2539420411e8fb2f54a5da9d9047171fd37bfb11
---
 TESTING.rst                                  |  78 +-----
 doc/source/contributor/testing/fullstack.rst | 266 ++++++++++++++++++-
 2 files changed, 255 insertions(+), 89 deletions(-)

diff --git a/TESTING.rst b/TESTING.rst
index ba927d6f6e4..c3857b06a72 100644
--- a/TESTING.rst
+++ b/TESTING.rst
@@ -234,83 +234,7 @@ that the test requires.
 Developers further benefit from full stack testing as it can sufficiently
 simulate a real environment and provide a rapidly reproducible way to verify
 code while you're still writing it.
 
-How?
-++++
-
-Full stack tests set up their own Neutron processes (Server & agents). They
-assume a working Rabbit and MySQL server before the run starts. Instructions
-on how to run fullstack tests on a VM are available below.
-
-Each test defines its own topology (What and how many servers and agents should
-be running).
-
-Since the test runs on the machine itself, full stack testing enables
-"white box" testing. This means that you can, for example, create a router
-through the API and then assert that a namespace was created for it.
-
-Full stack tests run in the Neutron tree with Neutron resources alone. You
-may use the Neutron API (The Neutron server is set to NOAUTH so that Keystone
-is out of the picture). VMs may be simulated with a container-like class:
-neutron.tests.fullstack.resources.machine.FakeFullstackMachine.
-An example of its usage may be found at: -neutron/tests/fullstack/test_connectivity.py. - -Full stack testing can simulate multi node testing by starting an agent -multiple times. Specifically, each node would have its own copy of the -OVS/LinuxBridge/DHCP/L3 agents, all configured with the same "host" value. -Each OVS agent is connected to its own pair of br-int/br-ex, and those bridges -are then interconnected. -For LinuxBridge agent each agent is started in its own namespace, called -"host-". Such namespaces are connected with OVS "central" -bridge to each other. - -.. image:: images/fullstack_multinode_simulation.png - -Segmentation at the database layer is guaranteed by creating a database -per test. The messaging layer achieves segmentation by utilizing a RabbitMQ -feature called 'vhosts'. In short, just like a MySQL server serve multiple -databases, so can a RabbitMQ server serve multiple messaging domains. -Exchanges and queues in one 'vhost' are segmented from those in another -'vhost'. - -Please note that if the change you would like to test using fullstack tests -involves a change to python-neutronclient as well as neutron, then you should -make sure your fullstack tests are in a separate third change that depends on -the python-neutronclient change using the 'Depends-On' tag in the commit -message. You will need to wait for the next release of python-neutronclient, -and a minimum version bump for python-neutronclient in the global requirements, -before your fullstack tests will work in the gate. This is because tox uses -the version of python-neutronclient listed in the upper-constraints.txt file in -the openstack/requirements repository. - -When? -+++++ - -1) You'd like to test the interaction between Neutron components (Server - and agents) and have already tested each component in isolation via unit or - functional tests. You should have many unit tests, fewer tests to test - a component and even fewer to test their interaction. 
Edge cases should
-   not be tested with full stack testing.
-2) You'd like to increase coverage by testing features that require multi node
-   testing such as l2pop, L3 HA and DVR.
-3) You'd like to test agent restarts. We've found bugs in the OVS, DHCP and
-   L3 agents and haven't found an effective way to test these scenarios. Full
-   stack testing can help here as the full stack infrastructure can restart an
-   agent during the test.
-
-Example
-+++++++
-
-Neutron offers a Quality of Service API, initially offering bandwidth
-capping at the port level. In the reference implementation, it does this by
-utilizing an OVS feature.
-neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle
-is a positive example of how the fullstack testing infrastructure should be used.
-It creates a network, subnet, QoS policy & rule and a port utilizing that policy.
-It then asserts that the expected bandwidth limitation is present on the OVS
-bridge connected to that port. The test is a true integration test, in the
-sense that it invokes the API and then asserts that Neutron interacted with
-the hypervisor appropriately.
+More details can be found in the :ref:`fullstack_testing` guide.
 
 Gate exceptions
 +++++++++++++++
 
diff --git a/doc/source/contributor/testing/fullstack.rst b/doc/source/contributor/testing/fullstack.rst
index fa642f1ba55..50ffd5aacc2 100644
--- a/doc/source/contributor/testing/fullstack.rst
+++ b/doc/source/contributor/testing/fullstack.rst
@@ -20,20 +20,262 @@ '''''''
 Heading 4 (Avoid deeper levels because they do not render well.)
 
+.. _fullstack_testing:
+
 Full Stack Testing
 ==================
 
-Goals
------
-
-* Stabilize the job:
-  - Fix L3 HA failure
-  - Look in to non-deterministic failures when adding a large amount of
-    tests (Possibly bug 1486199).
-  - Switch to kill signal 15 to terminate agents (Bug 1487548).
-* Convert the L3 HA failover functional test to a full stack test
-* Write DVR tests
-* Write additional L3 HA tests
-* Write a test that validates DVR + L3 HA integration after
-  https://bugs.launchpad.net/neutron/+bug/1365473 is fixed.
+How?
+++++
+
+Full stack tests set up their own Neutron processes (Server & agents). They
+assume working RabbitMQ and MySQL servers before the run starts. Instructions
+on how to run fullstack tests on a VM are available below.
+
+Each test defines its own topology (what servers and agents should be
+running, and how many of them).
+
+Since the test runs on the machine itself, full stack testing enables
+"white box" testing. This means that you can, for example, create a router
+through the API and then assert that a namespace was created for it.
+
+Full stack tests run in the Neutron tree with Neutron resources alone. You
+may use the Neutron API (the Neutron server is set to NOAUTH so that Keystone
+is out of the picture). VMs may be simulated with a container-like class:
+neutron.tests.fullstack.resources.machine.FakeFullstackMachine.
+An example of its usage may be found at:
+neutron/tests/fullstack/test_connectivity.py.
+
+Full stack testing can simulate multi node testing by starting an agent
+multiple times. Specifically, each node would have its own copy of the
+OVS/LinuxBridge/DHCP/L3 agents, all configured with the same "host" value.
+Each OVS agent is connected to its own pair of br-int/br-ex, and those bridges
+are then interconnected.
+In the LinuxBridge case, each agent is started in its own namespace, called
+"host-". Such namespaces are connected to each other through a
+"central" OVS bridge.
+
+.. image:: images/fullstack_multinode_simulation.png
+
+Segmentation at the database layer is guaranteed by creating a database
+per test. The messaging layer achieves segmentation by utilizing a RabbitMQ
+feature called 'vhosts'. In short, just like a MySQL server serves multiple
+databases, so can a RabbitMQ server serve multiple messaging domains.
+Exchanges and queues in one 'vhost' are segmented from those in another +'vhost'. + +Please note that if the change you would like to test using fullstack tests +involves a change to python-neutronclient as well as neutron, then you should +make sure your fullstack tests are in a separate third change that depends on +the python-neutronclient change using the 'Depends-On' tag in the commit +message. You will need to wait for the next release of python-neutronclient, +and a minimum version bump for python-neutronclient in the global requirements, +before your fullstack tests will work in the gate. This is because tox uses +the version of python-neutronclient listed in the upper-constraints.txt file in +the openstack/requirements repository. + +When? ++++++ + +1) You'd like to test the interaction between Neutron components (Server + and agents) and have already tested each component in isolation via unit or + functional tests. You should have many unit tests, fewer tests to test + a component and even fewer to test their interaction. Edge cases should + not be tested with full stack testing. +2) You'd like to increase coverage by testing features that require multi node + testing such as l2pop, L3 HA and DVR. +3) You'd like to test agent restarts. We've found bugs in the OVS, DHCP and + L3 agents and haven't found an effective way to test these scenarios. Full + stack testing can help here as the full stack infrastructure can restart an + agent during the test. + +Example ++++++++ + +Neutron offers a Quality of Service API, initially offering bandwidth +capping at the port level. In the reference implementation, it does this by +utilizing an OVS feature. +neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle +is a positive example of how the fullstack testing infrastructure should be used. +It creates a network, subnet, QoS policy & rule and a port utilizing that policy. 
+It then asserts that the expected bandwidth limitation is present on the OVS
+bridge connected to that port. The test is a true integration test, in the
+sense that it invokes the API and then asserts that Neutron interacted with
+the hypervisor appropriately.
+
+How to run fullstack tests locally?
++++++++++++++++++++++++++++++++++++
+
+Fullstack tests can be run locally. That makes it much easier to understand
+exactly how they work, to debug issues in the existing tests or to write new
+ones. To run fullstack tests locally, you should clone the Devstack and
+Neutron repositories. Once both repositories are available locally, the
+first thing to do is prepare the environment. Neutron provides a simple
+script for that.
+
+.. code-block:: console
+
+    $ export VENV=dsvm-fullstack
+    $ tools/configure_for_func_testing.sh /opt/stack/devstack -i
+
+This prepares the needed files, installs the required packages, etc. When it
+is done you should see a message like:
+
+.. code-block:: console
+
+    Phew, we're done!
+
+That means all went well and you should be ready to run fullstack tests
+locally. There are many tests and running all of them can take a long time,
+so let's try running just one:
+
+.. code-block:: console
+
+    $ tox -e dsvm-fullstack neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle
+    dsvm-fullstack create: /opt/stack/neutron/.tox/dsvm-fullstack
+    dsvm-fullstack installdeps: -chttps://releases.openstack.org/constraints/upper/master, -r/opt/stack/neutron/requirements.txt, -r/opt/stack/neutron/test-requirements.txt, -r/opt/stack/neutron/neutron/tests/functional/requirements.txt
+    dsvm-fullstack develop-inst: /opt/stack/neutron
+    {0} neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle(ingress) [40.395436s] ... ok
+    {1} neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle(egress) [43.277898s] ... 
ok
+    Stopping rootwrap daemon process with pid=12657
+    Running upgrade for neutron ...
+    OK
+    /usr/lib/python3.8/subprocess.py:942: ResourceWarning: subprocess 13475 is still running
+      _warn("subprocess %s is still running" % self.pid,
+    ResourceWarning: Enable tracemalloc to get the object allocation traceback
+    Stopping rootwrap daemon process with pid=12669
+    Running upgrade for neutron ...
+    OK
+    /usr/lib/python3.8/subprocess.py:942: ResourceWarning: subprocess 13477 is still running
+      _warn("subprocess %s is still running" % self.pid,
+    ResourceWarning: Enable tracemalloc to get the object allocation traceback
+
+    ======
+    Totals
+    ======
+    Ran: 2 tests in 43.3367 sec.
+     - Passed: 2
+     - Skipped: 0
+     - Expected Fail: 0
+     - Unexpected Success: 0
+     - Failed: 0
+    Sum of execute time for each test: 83.6733 sec.
+
+    ==============
+    Worker Balance
+    ==============
+     - Worker 0 (1 tests) => 0:00:40.395436
+     - Worker 1 (1 tests) => 0:00:43.277898
+    ____________________________________________________ summary _____________________________________________________
+    dsvm-fullstack: commands succeeded
+    congratulations :)
+
+That means the test ran successfully.
+Now you can start hacking: write new fullstack tests or debug failing ones as
+needed.
+
+Debugging tests locally
++++++++++++++++++++++++
+
+If you need to debug a fullstack test locally you can use the ``remote_pdb``
+module. First, you need to install the ``remote_pdb`` module in the virtual
+environment that tox created for fullstack testing.
+
+.. code-block:: console
+
+    $ .tox/dsvm-fullstack/bin/pip install remote_pdb
+
+Then you need to set a breakpoint in your code. 
For example, let's set one
+in the
+neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle
+test:
+
+.. code-block:: python
+
+    def test_bw_limit_qos_policy_rule_lifecycle(self):
+        import remote_pdb; remote_pdb.set_trace(port=1234)
+        new_limit = BANDWIDTH_LIMIT + 100
+
+Now you can run the test again:
+
+.. code-block:: console
+
+    $ tox -e dsvm-fullstack neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle
+
+It will pause with a message like:
+
+.. code-block:: console
+
+    RemotePdb session open at 127.0.0.1:1234, waiting for connection ...
+
+Now you can start debugging using the ``telnet`` tool:
+
+.. code-block:: console
+
+    $ telnet 127.0.0.1 1234
+    Trying 127.0.0.1...
+    Connected to 127.0.0.1.
+    Escape character is '^]'.
+    >
+    /opt/stack/neutron/neutron/tests/fullstack/test_qos.py(208)test_bw_limit_qos_policy_rule_lifecycle()
+    -> new_limit = BANDWIDTH_LIMIT + 100
+    (Pdb)
+
+From that point you can start debugging your code in the same way you
+usually do with the ``pdb`` module.
+
+Checking test logs
+++++++++++++++++++
+
+Each fullstack test spawns its own isolated environment with the services it
+needs, for example ``neutron-server``, ``neutron-ovs-agent`` or
+``neutron-dhcp-agent``. Often there is a need to check the logs of some of
+those processes. That is of course possible when running fullstack tests
+locally. By default, logs are stored in ``/opt/stack/logs/dsvm-fullstack-logs``.
+The logs directory can be defined by the environment variable ``OS_LOG_PATH``.
+That directory contains subdirectories whose names match the names of the
+tests, for example:
+
+.. 
code-block:: console
+
+    $ ls -l
+    total 224
+    drwxr-xr-x 2 vagrant vagrant   4096 Nov 26 16:49 TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_
+    -rw-rw-r-- 1 vagrant vagrant  94928 Nov 26 16:50 TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_.txt
+    drwxr-xr-x 2 vagrant vagrant   4096 Nov 26 16:49 TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_ingress_
+    -rw-rw-r-- 1 vagrant vagrant 121027 Nov 26 16:54 TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_ingress_.txt
+
+For each test there is a directory and a txt file with the same name. The txt
+file contains the log from the test runner, so you can check exactly what the
+test did when it ran. It accumulates logs from all runs of the same test, so
+if you run the test 10 times, you will have the logs from all 10 runs in it.
+The directory with the same name contains the logs from the Neutron services
+run during the test, for example:
+
+.. code-block:: console
+
+    $ ls -l TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_ingress_/
+    total 1836
+    -rw-rw-r-- 1 vagrant vagrant 333371 Nov 26 16:40 neutron-openvswitch-agent--2020-11-26--16-40-38-818499.log
+    -rw-rw-r-- 1 vagrant vagrant 552097 Nov 26 16:53 neutron-openvswitch-agent--2020-11-26--16-49-29-716615.log
+    -rw-rw-r-- 1 vagrant vagrant 461483 Nov 26 16:41 neutron-server--2020-11-26--16-40-35-875937.log
+    -rw-rw-r-- 1 vagrant vagrant 526070 Nov 26 16:54 neutron-server--2020-11-26--16-49-26-758447.log
+
+Here each file comes from a single run of a single service, and the file name
+includes a timestamp of when the service was started.
+
+Debugging fullstack failures in the gate
+++++++++++++++++++++++++++++++++++++++++
+
+Sometimes there is a need to investigate the reason why a test failed in the
+gate. After every ``neutron-fullstack`` job run, logs are available on the
+Zuul job page. 
In the directory ``controller/logs/dsvm-fullstack-logs`` you can find
+exactly the same per-test log files as described above.
+
+You can also check, for example, the journal log from the node where the tests
+were run. All those logs are available in the file
+``controller/logs/devstack.journal.xz`` in the job's logs.
+In ``controller/logs/devstack.journal.README.txt`` there are also
+instructions on how to download and check those journal logs locally.
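+
+The README file is the authoritative reference for those steps, but as a
+rough, hypothetical sketch (assuming the decompressed file is a journal
+export that first needs converting to a native journal file), checking the
+downloaded journal locally could look like:
+
+.. code-block:: console
+
+    $ # decompress the downloaded journal
+    $ xz -d devstack.journal.xz
+    $ # if the file is in the journal export format, convert it first
+    $ # (the output file name must end with .journal):
+    $ /usr/lib/systemd/systemd-journal-remote -o devstack.native.journal devstack.journal
+    $ # then browse it like any other journal
+    $ journalctl --file devstack.native.journal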