Neutron control plane performance and agent restart

The patch set contains the test plan and the report.

Change-Id: I5236b1a3e13b2e54666457c0f271af4246e9b60d
Ilya Shakhat 2016-09-30 19:17:14 +03:00 committed by Ilya Shakhat
parent 9494d3b4b2
commit 89f62fa919
16 changed files with 14776 additions and 0 deletions
doc/source
test_plans/neutron_features
agent_restart
index.rst
test_results/neutron_features
agent_restart
create_and_delete_networks_with_restart_neutron_l3_agent_service
create_and_delete_networks_with_restart_neutron_openvswitch_agent_service
create_and_delete_ports_with_restart_neutron_l3_agent_service
create_and_delete_ports_with_restart_neutron_openvswitch_agent_service
create_and_delete_subnets_with_restart_neutron_l3_agent_service
create_and_delete_subnets_with_restart_neutron_openvswitch_agent_service
index.rst
index.rst

@ -0,0 +1,145 @@
.. _neutron_agent_restart_test_plan:

=============================================================
OpenStack Neutron Control Plane Performance and Agent Restart
=============================================================

:status: **draft**
:version: 1.0

Test Plan
=========
Neutron Server is the core of the Neutron control plane. It processes requests
from the public API and the internal RPC API. The latter is used to communicate
with agents. Normally RPC is used to notify agents about configuration updates.
However, after an agent restart or a communication failure the agent requests
all of its data from the server, and the amount of data may be significant.

The goal of this test plan is to measure how restarting a large number of
agents affects the performance of the Neutron control plane.
Test Environment
----------------
Preparation
^^^^^^^^^^^
This test plan is performed against an existing OpenStack cloud.
Environment description
^^^^^^^^^^^^^^^^^^^^^^^
The environment description includes the hardware specification of the servers,
network parameters, the operating system and OpenStack deployment
characteristics.
Hardware
~~~~~~~~
This section contains a list of all types of hardware nodes.

+-----------+-------+----------------------------------------------------+
| Parameter | Value | Comments                                           |
+-----------+-------+----------------------------------------------------+
| model     |       | e.g. Supermicro X9SRD-F                            |
+-----------+-------+----------------------------------------------------+
| CPU       |       | e.g. 6 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz |
+-----------+-------+----------------------------------------------------+
| role      |       | e.g. compute or network                            |
+-----------+-------+----------------------------------------------------+
Network
~~~~~~~
This section contains a list of interfaces and network parameters.
For complicated cases this section may include a topology diagram and switch
parameters.

+------------------+-------+-------------------------+
| Parameter        | Value | Comments                |
+------------------+-------+-------------------------+
| network role     |       | e.g. provider or public |
+------------------+-------+-------------------------+
| card model       |       | e.g. Intel              |
+------------------+-------+-------------------------+
| driver           |       | e.g. ixgbe              |
+------------------+-------+-------------------------+
| speed            |       | e.g. 10G or 1G          |
+------------------+-------+-------------------------+
| MTU              |       | e.g. 9000               |
+------------------+-------+-------------------------+
| offloading modes |       | e.g. default            |
+------------------+-------+-------------------------+
Software
~~~~~~~~
This section describes the installed software.

+-----------------+-------+---------------------------+
| Parameter       | Value | Comments                  |
+-----------------+-------+---------------------------+
| OS              |       | e.g. Ubuntu 14.04.3       |
+-----------------+-------+---------------------------+
| OpenStack       |       | e.g. Liberty              |
+-----------------+-------+---------------------------+
| Hypervisor      |       | e.g. KVM                  |
+-----------------+-------+---------------------------+
| Neutron plugin  |       | e.g. ML2 + OVS            |
+-----------------+-------+---------------------------+
| L2 segmentation |       | e.g. VLAN or VXLAN or GRE |
+-----------------+-------+---------------------------+
| virtual routers |       | HA                        |
+-----------------+-------+---------------------------+
Test Case: mass restart of agents
---------------------------------
Description
^^^^^^^^^^^
Measurements can be performed using the methodology described in
:ref:`reliability_testing_version_2`. The following metrics need to be
collected:
.. list-table::
   :header-rows: 1

   *
     - Priority
     - Value
     - Measurement Unit
     - Description
   *
     - 1
     - Service downtime
     - sec
     - How long the service was not available and operations were in error
       state.
   *
     - 1
     - MTTR
     - sec
     - How long it takes to recover service performance after the failure.
   *
     - 1
     - Operation Degradation
     - sec
     - The mean difference between operation performance during the recovery
       period and operation performance when the service operates normally.
   *
     - 1
     - Operation Degradation Ratio
     - ratio
     - The ratio between operation performance during the recovery period and
       operation performance when the service operates normally.
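
The metric definitions in the table can be illustrated with a short sketch.
It is not the implementation used by the reliability testing methodology; the
function names, the ``(timestamp, ok)`` sample format and the rule for
delimiting a failed period are assumptions made for this example only. MTTR is
left out because it depends on how the end of the recovery period is detected.

.. code-block:: python

    from statistics import mean


    def degradation_metrics(baseline, recovery):
        """Return (degradation in seconds, degradation ratio).

        baseline: operation durations measured while the service is normal.
        recovery: operation durations measured during the recovery period.
        Both arguments are hypothetical inputs chosen for this sketch.
        """
        if not baseline or not recovery:
            raise ValueError("both sample lists must be non-empty")
        normal = mean(baseline)
        degraded = mean(recovery)
        # Mean difference and ratio, as described in the metrics table.
        return degraded - normal, degraded / normal


    def service_downtime(samples):
        """Sum the length of the periods in which operations failed.

        samples: list of (timestamp_sec, ok) tuples sorted by timestamp.
        Assumption: a failed period runs from the first failed sample to
        the next successful one, or to the last sample if the service
        never recovers.
        """
        downtime = 0.0
        failed_since = None
        for timestamp, ok in samples:
            if not ok and failed_since is None:
                failed_since = timestamp
            elif ok and failed_since is not None:
                downtime += timestamp - failed_since
                failed_since = None
        if failed_since is not None:
            downtime += samples[-1][0] - failed_since
        return downtime

With baseline durations of 1.0 s and recovery durations of 2.0 s and 4.0 s,
``degradation_metrics`` yields a degradation of 2.0 s and a ratio of 3.0.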
Reports
=======
Test plan execution reports:
* :ref:`neutron_agent_restart_test_report`

@ -11,3 +11,4 @@ Neutron features test plans
l3_ha/plan
resource_density/plan
agent_restart/plan

@ -0,0 +1,77 @@
Networks operations and L3-agent restart
========================================
In this scenario we restart all L3 agents while Neutron creates and deletes
networks.
This report is generated from the results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_networks:
        -
          args:
            network_create_args: {}
          runner:
            type: "constant_for_duration"
            duration: 120
            concurrency: 4
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-l3-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [100]

Summary
-------
Neither errors nor performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.
+-----------+-------------+-----------+-----------+---------------------+
|   Samples |   Median, s |   Mean, s |   Std dev |   95% percentile, s |
+===========+=============+===========+===========+=====================+
|        85 |        0.36 |       0.4 |     0.068 |                0.52 |
+-----------+-------------+-----------+-----------+---------------------+
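
The columns of the baseline table can be derived from the raw operation
durations roughly as in the sketch below. The report does not state which
percentile method or which standard deviation estimator was used, so the
nearest-rank percentile and the population standard deviation here are
assumptions, and ``baseline_summary`` is a hypothetical helper name.

.. code-block:: python

    import math
    import statistics


    def baseline_summary(durations):
        """Summarize operation durations (seconds) collected before the fault.

        Requires at least one sample.
        """
        ordered = sorted(durations)
        # Nearest-rank 95th percentile (assumed method): the smallest value
        # with at least 95% of the samples at or below it.
        rank = max(1, math.ceil(0.95 * len(ordered)))
        return {
            "samples": len(ordered),
            "median": statistics.median(ordered),
            "mean": statistics.mean(ordered),
            # Population standard deviation (assumed estimator).
            "std_dev": statistics.pstdev(ordered),
            "p95": ordered[rank - 1],
        }

The baseline is only meaningful if it is built exclusively from iterations
that completed before the fault injection hook fired.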

@ -0,0 +1,76 @@
Networks operations and OVS agent restart
=========================================
In this scenario we restart all OVS agents while Neutron creates and deletes
networks.
This report is generated from the results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_networks:
        -
          args:
            network_create_args: {}
          runner:
            type: "constant_for_duration"
            duration: 120
            concurrency: 4
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-openvswitch-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [100]

Summary
-------
Neither errors nor performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.
+-----------+-------------+-----------+-----------+---------------------+
|   Samples |   Median, s |   Mean, s |   Std dev |   95% percentile, s |
+===========+=============+===========+===========+=====================+
|        86 |        0.38 |       0.4 |     0.063 |                 0.5 |
+-----------+-------------+-----------+-----------+---------------------+

@ -0,0 +1,79 @@
Ports operations and L3-agent restart
=====================================
In this scenario we restart all L3 agents while Neutron creates and deletes
ports.
This report is generated from the results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_ports:
        -
          args:
            network_create_args: {}
            port_create_args: {}
            ports_per_network: 10
          runner:
            type: "constant_for_duration"
            duration: 300
            concurrency: 6
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
                port: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-l3-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [80]

Summary
-------
Neither errors nor performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.
+-----------+-------------+-----------+-----------+---------------------+
|   Samples |   Median, s |   Mean, s |   Std dev |   95% percentile, s |
+===========+=============+===========+===========+=====================+
|        63 |         8.5 |       8.5 |       0.4 |                 9.3 |
+-----------+-------------+-----------+-----------+---------------------+

@ -0,0 +1,79 @@
Ports operations and OVS agent restart
======================================
In this scenario we restart all OVS agents while Neutron creates and deletes
ports.
This report is generated from the results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_ports:
        -
          args:
            network_create_args: {}
            port_create_args: {}
            ports_per_network: 10
          runner:
            type: "constant_for_duration"
            duration: 300
            concurrency: 4
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
                port: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-openvswitch-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [80]

Summary
-------
Neither errors nor performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.
+-----------+-------------+-----------+-----------+---------------------+
|   Samples |   Median, s |   Mean, s |   Std dev |   95% percentile, s |
+===========+=============+===========+===========+=====================+
|        65 |         8.7 |       8.8 |      0.31 |                 9.3 |
+-----------+-------------+-----------+-----------+---------------------+

@ -0,0 +1,80 @@
Subnets operations and L3-agent restart
=======================================
In this scenario we restart all L3 agents while Neutron creates and deletes
subnets.
This report is generated from the results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_subnets:
        -
          args:
            network_create_args: {}
            subnet_create_args: {}
            subnet_cidr_start: "1.1.0.0/28"
            subnets_per_network: 2
          runner:
            type: "constant_for_duration"
            duration: 120
            concurrency: 4
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
                subnet: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-l3-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [100]

Summary
-------
Neither errors nor performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.
+-----------+-------------+-----------+-----------+---------------------+
|   Samples |   Median, s |   Mean, s |   Std dev |   95% percentile, s |
+===========+=============+===========+===========+=====================+
|        85 |           2 |         2 |      0.16 |                 2.4 |
+-----------+-------------+-----------+-----------+---------------------+

@ -0,0 +1,80 @@
Subnets operations and OVS-agent restart
========================================
In this scenario we restart all OVS agents while Neutron creates and deletes
subnets.
This report is generated from the results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
      NeutronNetworks.create_and_delete_subnets:
        -
          args:
            network_create_args: {}
            subnet_create_args: {}
            subnet_cidr_start: "1.1.0.0/28"
            subnets_per_network: 2
          runner:
            type: "constant_for_duration"
            duration: 120
            concurrency: 4
          context:
            users:
              tenants: 1
              users_per_tenant: 1
            quotas:
              neutron:
                network: -1
                subnet: -1
          hooks:
            -
              name: fault_injection
              args:
                action: restart neutron-openvswitch-agent service
              trigger:
                name: event
                args:
                  unit: iteration
                  at: [100]

Summary
-------
Neither errors nor performance degradation were observed.
Details
-------
This section contains individual data for particular scenario runs.
Run #1
^^^^^^
.. image:: plot_1.svg
Baseline
~~~~~~~~
Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.
+-----------+-------------+-----------+-----------+---------------------+
|   Samples |   Median, s |   Mean, s |   Std dev |   95% percentile, s |
+===========+=============+===========+===========+=====================+
|        85 |         1.3 |       1.4 |      0.14 |                 1.6 |
+-----------+-------------+-----------+-----------+---------------------+

@ -0,0 +1,50 @@
.. _neutron_agent_restart_test_report:

=========================================================================
OpenStack Neutron Control Plane Performance and Agent Restart Test Report
=========================================================================

This report is generated for :ref:`neutron_agent_restart_test_plan`.
Environment description
=======================
Cluster description
-------------------
* 3 controllers
* 3 compute nodes
Software versions
-----------------
**OpenStack/System**:
  Fuel/MOS 9.0, Ubuntu 14.04, Linux kernel 3.13, OVS 2.4.1

**Networking**:
  Neutron ML2 + OVS plugin, DVR, L2pop, MTU 1500

Hardware configuration of each server
-------------------------------------
Description of the server hardware:

**Compute Vendor**:
  HP ProLiant DL380 Gen9

**CPU**:
  2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (48 cores)

**RAM**:
  256 GB

**NIC**:
  2 x Intel Corporation Ethernet 10G 2P X710

Reports
=======
Reports are collected on an OpenStack cloud with 100 instances, 100 routers
and 100 networks.

.. toctree::
    :glob:
    :maxdepth: 1

    */index

@ -12,3 +12,4 @@ Neutron features scale testing
l3_ha/test_results_liberty
l3_ha/test_results_mitaka
resource_density/index
agent_restart/index