[Docs] Guide about running and debugging fullstack tests

This patch moves the detailed description of fullstack tests from
the general TESTING.rst document to the fullstack guide, which is in
contributor/testing/fullstack.rst

It also adds new sections about running and debugging fullstack tests
locally, and about how to investigate failures which happen in the
fullstack jobs in the gate.

Change-Id: I2539420411e8fb2f54a5da9d9047171fd37bfb11
Slawek Kaplonski 2020-11-26 17:15:33 +01:00 committed by Brian Haley
parent e59a7c9aca
commit 94f752bdee
2 changed files with 255 additions and 89 deletions


@@ -234,83 +234,7 @@ that the test requires. Developers further benefit from full stack testing as
it can sufficiently simulate a real environment and provide a rapidly
reproducible way to verify code while you're still writing it.
How?
++++
Full stack tests set up their own Neutron processes (Server & agents). They
assume a working Rabbit and MySQL server before the run starts. Instructions
on how to run fullstack tests on a VM are available below.
Each test defines its own topology (What and how many servers and agents should
be running).
Since the test runs on the machine itself, full stack testing enables
"white box" testing. This means that you can, for example, create a router
through the API and then assert that a namespace was created for it.
Full stack tests run in the Neutron tree with Neutron resources alone. You
may use the Neutron API (The Neutron server is set to NOAUTH so that Keystone
is out of the picture). VMs may be simulated with a container-like class:
neutron.tests.fullstack.resources.machine.FakeFullstackMachine.
An example of its usage may be found at:
neutron/tests/fullstack/test_connectivity.py.
Full stack testing can simulate multi node testing by starting an agent
multiple times. Specifically, each node would have its own copy of the
OVS/LinuxBridge/DHCP/L3 agents, all configured with the same "host" value.
Each OVS agent is connected to its own pair of br-int/br-ex, and those bridges
are then interconnected.
In the case of the LinuxBridge agent, each agent is started in its own
namespace, called "host-<some_random_value>". Such namespaces are connected
to each other with a "central" OVS bridge.
.. image:: images/fullstack_multinode_simulation.png
Segmentation at the database layer is guaranteed by creating a database
per test. The messaging layer achieves segmentation by utilizing a RabbitMQ
feature called 'vhosts'. In short, just as a MySQL server can serve multiple
databases, a RabbitMQ server can serve multiple messaging domains.
Exchanges and queues in one 'vhost' are segmented from those in another
'vhost'.
Please note that if the change you would like to test using fullstack tests
involves a change to python-neutronclient as well as neutron, then you should
make sure your fullstack tests are in a separate third change that depends on
the python-neutronclient change using the 'Depends-On' tag in the commit
message. You will need to wait for the next release of python-neutronclient,
and a minimum version bump for python-neutronclient in the global requirements,
before your fullstack tests will work in the gate. This is because tox uses
the version of python-neutronclient listed in the upper-constraints.txt file in
the openstack/requirements repository.
When?
+++++
1) You'd like to test the interaction between Neutron components (Server
and agents) and have already tested each component in isolation via unit or
functional tests. You should have many unit tests, fewer tests to test
a component and even fewer to test their interaction. Edge cases should
not be tested with full stack testing.
2) You'd like to increase coverage by testing features that require multi node
testing such as l2pop, L3 HA and DVR.
3) You'd like to test agent restarts. We've found bugs in the OVS, DHCP and
L3 agents and haven't found an effective way to test these scenarios. Full
stack testing can help here as the full stack infrastructure can restart an
agent during the test.
Example
+++++++
Neutron offers a Quality of Service API, initially offering bandwidth
capping at the port level. In the reference implementation, it does this by
utilizing an OVS feature.
neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle
is a positive example of how the fullstack testing infrastructure should be used.
It creates a network, subnet, QoS policy & rule and a port utilizing that policy.
It then asserts that the expected bandwidth limitation is present on the OVS
bridge connected to that port. The test is a true integration test, in the
sense that it invokes the API and then asserts that Neutron interacted with
the hypervisor appropriately.
More details can be found in :ref:`FullStack Testing<fullstack_testing>` guide.
Gate exceptions
+++++++++++++++


@@ -20,20 +20,262 @@
''''''' Heading 4
(Avoid deeper levels because they do not render well.)
.. _fullstack_testing:
Full Stack Testing
==================
Goals
-----
How?
++++
* Stabilize the job:
- Fix L3 HA failure
- Look into non-deterministic failures when adding a large number of
tests (possibly bug 1486199).
- Switch to kill signal 15 to terminate agents (Bug 1487548).
* Convert the L3 HA failover functional test to a full stack test
* Write DVR tests
* Write additional L3 HA tests
* Write a test that validates DVR + L3 HA integration after
https://bugs.launchpad.net/neutron/+bug/1365473 is fixed.
Full stack tests set up their own Neutron processes (Server & agents). They
assume a working Rabbit and MySQL server before the run starts. Instructions
on how to run fullstack tests on a VM are available below.
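Whether these prerequisites are in place can be checked up front, for example
with systemd (a minimal check; the unit names ``rabbitmq-server`` and ``mysql``
depend on the distribution):
.. code-block:: console
$ systemctl is-active rabbitmq-server mysql
active
active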
Each test defines its own topology (What and how many servers and agents should
be running).
Since the test runs on the machine itself, full stack testing enables
"white box" testing. This means that you can, for example, create a router
through the API and then assert that a namespace was created for it.
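For instance, after a test creates a router through the API, the matching
namespace can be observed directly on the host (an illustrative check;
``qrouter-<router_id>`` is the naming scheme used by the L3 agent):
.. code-block:: console
$ ip netns list | grep qrouter-
qrouter-<router_id>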
Full stack tests run in the Neutron tree with Neutron resources alone. You
may use the Neutron API (The Neutron server is set to NOAUTH so that Keystone
is out of the picture). VMs may be simulated with a container-like class:
neutron.tests.fullstack.resources.machine.FakeFullstackMachine.
An example of its usage may be found at:
neutron/tests/fullstack/test_connectivity.py.
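Those connectivity tests can also be run on their own once the local
environment is prepared, using the same ``dsvm-fullstack`` tox environment
described later in this document:
.. code-block:: console
$ tox -e dsvm-fullstack neutron.tests.fullstack.test_connectivity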
Full stack testing can simulate multi node testing by starting an agent
multiple times. Specifically, each node would have its own copy of the
OVS/LinuxBridge/DHCP/L3 agents, all configured with the same "host" value.
Each OVS agent is connected to its own pair of br-int/br-ex, and those bridges
are then interconnected.
In the case of the LinuxBridge agent, each agent is started in its own
namespace, called "host-<some_random_value>". Such namespaces are connected
to each other with a "central" OVS bridge.
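While such a multinode test is running, the simulated hosts can be listed as
namespaces on the machine, for example:
.. code-block:: console
$ ip netns list | grep host-
host-<some_random_value>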
.. image:: images/fullstack_multinode_simulation.png
Segmentation at the database layer is guaranteed by creating a database
per test. The messaging layer achieves segmentation by utilizing a RabbitMQ
feature called 'vhosts'. In short, just as a MySQL server can serve multiple
databases, a RabbitMQ server can serve multiple messaging domains.
Exchanges and queues in one 'vhost' are segmented from those in another
'vhost'.
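The per-test vhosts can be listed on the RabbitMQ server while the tests are
running, for example (this usually requires root privileges):
.. code-block:: console
$ sudo rabbitmqctl list_vhosts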
Please note that if the change you would like to test using fullstack tests
involves a change to python-neutronclient as well as neutron, then you should
make sure your fullstack tests are in a separate third change that depends on
the python-neutronclient change using the 'Depends-On' tag in the commit
message. You will need to wait for the next release of python-neutronclient,
and a minimum version bump for python-neutronclient in the global requirements,
before your fullstack tests will work in the gate. This is because tox uses
the version of python-neutronclient listed in the upper-constraints.txt file in
the openstack/requirements repository.
When?
+++++
1) You'd like to test the interaction between Neutron components (Server
and agents) and have already tested each component in isolation via unit or
functional tests. You should have many unit tests, fewer tests to test
a component and even fewer to test their interaction. Edge cases should
not be tested with full stack testing.
2) You'd like to increase coverage by testing features that require multi node
testing such as l2pop, L3 HA and DVR.
3) You'd like to test agent restarts. We've found bugs in the OVS, DHCP and
L3 agents and haven't found an effective way to test these scenarios. Full
stack testing can help here as the full stack infrastructure can restart an
agent during the test.
Example
+++++++
Neutron offers a Quality of Service API, initially offering bandwidth
capping at the port level. In the reference implementation, it does this by
utilizing an OVS feature.
neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle
is a positive example of how the fullstack testing infrastructure should be used.
It creates a network, subnet, QoS policy & rule and a port utilizing that policy.
It then asserts that the expected bandwidth limitation is present on the OVS
bridge connected to that port. The test is a true integration test, in the
sense that it invokes the API and then asserts that Neutron interacted with
the hypervisor appropriately.
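As a rough illustration of what such a test asserts, the limit configured by
Neutron can also be inspected manually on the OVS interface wired to the port
(a sketch; the interface name is generated by the test, and the exact OVS
mechanism depends on the rule direction):
.. code-block:: console
$ sudo ovs-vsctl list interface <interface-name> | grep ingress_policing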
How to run fullstack tests locally?
+++++++++++++++++++++++++++++++++++
Fullstack tests can be run locally. That makes it much easier to understand
exactly how they work, to debug issues in existing tests, or to write new ones.
To run fullstack tests locally, you should clone the
`Devstack <https://opendev.org/openstack/devstack/>`_ and
`Neutron <https://opendev.org/openstack/neutron>`_ repositories. Once the
repositories are available locally, the first thing to do is to prepare the
environment. There is a simple script in Neutron to do that.
.. code-block:: console
$ export VENV=dsvm-fullstack
$ tools/configure_for_func_testing.sh /opt/stack/devstack -i
This will prepare the needed files, install the required packages, etc. When
it is done, you should see a message like:
.. code-block:: console
Phew, we're done!
That means that all went well and you should be ready to run fullstack tests
locally. Of course there are many tests and running all of them can take a
pretty long time, so let's try to run just one:
.. code-block:: console
$ tox -e dsvm-fullstack neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle
dsvm-fullstack create: /opt/stack/neutron/.tox/dsvm-fullstack
dsvm-fullstack installdeps: -chttps://releases.openstack.org/constraints/upper/master, -r/opt/stack/neutron/requirements.txt, -r/opt/stack/neutron/test-requirements.txt, -r/opt/stack/neutron/neutron/tests/functional/requirements.txt
dsvm-fullstack develop-inst: /opt/stack/neutron
{0} neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle(ingress) [40.395436s] ... ok
{1} neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle(egress) [43.277898s] ... ok
Stopping rootwrap daemon process with pid=12657
Running upgrade for neutron ...
OK
/usr/lib/python3.8/subprocess.py:942: ResourceWarning: subprocess 13475 is still running
_warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Stopping rootwrap daemon process with pid=12669
Running upgrade for neutron ...
OK
/usr/lib/python3.8/subprocess.py:942: ResourceWarning: subprocess 13477 is still running
_warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
======
Totals
======
Ran: 2 tests in 43.3367 sec.
- Passed: 2
- Skipped: 0
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 0
Sum of execute time for each test: 83.6733 sec.
==============
Worker Balance
==============
- Worker 0 (1 tests) => 0:00:40.395436
- Worker 1 (1 tests) => 0:00:43.277898
___________________________________________________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________________________________________________
dsvm-fullstack: commands succeeded
congratulations :)
That means that our test ran successfully. Now you can start hacking: write
new fullstack tests or debug failing ones as needed.
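In the same way you can pass a broader filter to the test runner to run a
whole module instead of a single test case, for example all of the fullstack
QoS tests:
.. code-block:: console
$ tox -e dsvm-fullstack neutron.tests.fullstack.test_qos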
Debugging tests locally
+++++++++++++++++++++++
If you need to debug a fullstack test locally, you can use the ``remote_pdb``
module. First, you need to install the ``remote_pdb`` module in the virtual
environment created by tox for fullstack testing.
.. code-block:: console
$ .tox/dsvm-fullstack/bin/pip install remote_pdb
Then you need to set a breakpoint in your code. For example, let's do that in
the
neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle
test:
.. code-block:: python
def test_bw_limit_qos_policy_rule_lifecycle(self):
import remote_pdb; remote_pdb.set_trace(port=1234)
new_limit = BANDWIDTH_LIMIT + 100
Now you can run the test again:
.. code-block:: console
$ tox -e dsvm-fullstack neutron.tests.fullstack.test_qos.TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle
It will pause with a message like:
.. code-block:: console
RemotePdb session open at 127.0.0.1:1234, waiting for connection ...
Now you can start debugging using the ``telnet`` tool:
.. code-block:: console
$ telnet 127.0.0.1 1234
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
> /opt/stack/neutron/neutron/tests/fullstack/test_qos.py(208)test_bw_limit_qos_policy_rule_lifecycle()
-> new_limit = BANDWIDTH_LIMIT + 100
(Pdb)
From that point, you can debug your code in the same way you usually do with
the ``pdb`` module.
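For example, ``list`` shows the surrounding source, ``p`` prints a variable,
``next`` executes the next line and ``continue`` resumes the test:
.. code-block:: console
(Pdb) list
(Pdb) p BANDWIDTH_LIMIT
(Pdb) next
(Pdb) continue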
Checking test logs
++++++++++++++++++
Each fullstack test spawns its own, isolated environment with the needed
services, for example ``neutron-server``, ``neutron-ovs-agent`` or
``neutron-dhcp-agent``. Often there is a need to check the logs of some of
those processes. That is of course possible when running fullstack tests
locally. By default, logs are stored in ``/opt/stack/logs/dsvm-fullstack-logs``;
the logs directory can be changed with the environment variable ``OS_LOG_PATH``.
In that directory there are subdirectories with names matching the names of the
tests, for example:
.. code-block:: console
$ ls -l
total 224
drwxr-xr-x 2 vagrant vagrant 4096 Nov 26 16:49 TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_
-rw-rw-r-- 1 vagrant vagrant 94928 Nov 26 16:50 TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_.txt
drwxr-xr-x 2 vagrant vagrant 4096 Nov 26 16:49 TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_ingress_
-rw-rw-r-- 1 vagrant vagrant 121027 Nov 26 16:54 TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_ingress_.txt
For each test there is a directory and a txt file with the same name. The txt
file contains the log from the test runner, so you can check exactly what the
test did when it ran. It accumulates the logs from all runs of the same test:
if you run the test 10 times, you will have the logs of all 10 runs in it.
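Because all runs end up in the same file, it is often easier to search it than
to read it as a whole, for example to look for errors (the file name comes from
the listing above):
.. code-block:: console
$ grep -n "ERROR" TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_.txt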
In the directory with the same name as the test there are logs from the Neutron
services run during the test, for example:
.. code-block:: console
$ ls -l TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_ingress_/
total 1836
-rw-rw-r-- 1 vagrant vagrant 333371 Nov 26 16:40 neutron-openvswitch-agent--2020-11-26--16-40-38-818499.log
-rw-rw-r-- 1 vagrant vagrant 552097 Nov 26 16:53 neutron-openvswitch-agent--2020-11-26--16-49-29-716615.log
-rw-rw-r-- 1 vagrant vagrant 461483 Nov 26 16:41 neutron-server--2020-11-26--16-40-35-875937.log
-rw-rw-r-- 1 vagrant vagrant 526070 Nov 26 16:54 neutron-server--2020-11-26--16-49-26-758447.log
Here each file comes from a single run of a single service. The file name
includes a timestamp of when the service was started.
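To check, for example, what the Neutron server logged during a particular run,
simply open the corresponding file (the name below comes from the listing
above):
.. code-block:: console
$ less TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_ingress_/neutron-server--2020-11-26--16-49-26-758447.log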
Debugging fullstack failures in the gate
++++++++++++++++++++++++++++++++++++++++
Sometimes there is a need to investigate the reason why a test failed in the
gate. After every ``neutron-fullstack`` job run, logs are available on the Zuul
job page. In the directory ``controller/logs/dsvm-fullstack-logs`` you can find
exactly the same per-test-case log files as mentioned above.
You can also check, for example, the journal log from the node where the tests
were run. All those logs are available in the file
``controller/logs/devstack.journal.xz`` in the job's logs.
In ``controller/logs/devstack.journal.README.txt`` there are also
instructions on how to download and inspect those journal logs locally.
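Those instructions typically boil down to decompressing the file and reading it
with ``journalctl``, roughly as follows (a sketch; check the README for the
exact steps):
.. code-block:: console
$ xz -d devstack.journal.xz
$ journalctl --file devstack.journal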