Publish reports for reliability testing 2.0
Change-Id: Ibe31d2674dfde70c0c2d349154c1f94c8e4fc86e
@ -1,4 +1,4 @@
|
||||
.. _reliability_testing:
|
||||
.. _reliability_testing_version_2:
|
||||
|
||||
==========================================
|
||||
OpenStack reliability testing. Version 2.0
|
||||
@ -18,11 +18,13 @@ OpenStack reliability testing. Version 2.0
|
||||
|
||||
- **MTTR** - mean time to recover service performance after the fault.
|
||||
|
||||
- **Service Downtime** - the time when service was not available and number
|
||||
of errors is more than defined by SLA.
|
||||
- **Service Downtime** - the time when service was not available.
|
||||
|
||||
- **Operation Degradation** - the difference in operation performance
|
||||
compared with performance when service operates normally.
|
||||
- **Absolute performance degradation** - is an absolute difference between
|
||||
the mean of operation duration during recovery period and the baseline's.
|
||||
|
||||
- **Relative performance degradation** - is the ratio between the mean
|
||||
of operation duration during recovery period and the baseline's.
|
||||
|
||||
- **Fault injection** - the function that emulates failure in software or
|
||||
hardware.
|
||||
@ -201,14 +203,14 @@ Overall the following metrics need to be collected:
|
||||
- How long it takes to recover service performance after the failure.
|
||||
*
|
||||
- 1
|
||||
- Operation Degradation
|
||||
- Absolute performance degradation
|
||||
- sec
|
||||
- the mean of difference in operation performance during recovery period
|
||||
and operation performance when service operates normally.
|
||||
*
|
||||
- 1
|
||||
- Operation Degradation Ratio
|
||||
- sec
|
||||
- Relative performance degradation
|
||||
- ratio
|
||||
- the ratio between operation performance during recovery period and
|
||||
operation performance when service operates normally.
|
||||
|
||||
@ -252,13 +254,45 @@ succeed operation.
|
||||
To find the recovery period we first calculate the mean duration of
consecutive operations using a sliding window. A period is treated as the
`Recovery period` when the mean operation duration is significantly higher than
|
||||
the mean operation duration in the baseline. `Operation degradation` is
|
||||
calculated as difference between mean of operation duration during Recovery
|
||||
period and the baseline's. `Operation ratio` is the ratio between mean of
|
||||
operation duration during Recovery period and the baseline's.
|
||||
the mean operation duration in the baseline. The average duration of the
Recovery period is the `MTTR` value. `Absolute performance degradation` is
calculated as the difference between the mean operation duration during the
Recovery period and the baseline's. `Relative performance degradation` is the
ratio between the mean operation duration during the Recovery period and the
baseline's.
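
The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the reports: the function names, the ``window`` size and the ``factor`` threshold are all assumptions, since the text only says the mean must be "significantly more" than the baseline.

```python
from statistics import mean


def sliding_means(durations, window):
    """Mean duration of each run of `window` consecutive operations."""
    return [mean(durations[i:i + window])
            for i in range(len(durations) - window + 1)]


def recovery_periods(timestamps, durations, baseline_mean, window=3, factor=2.0):
    """Return (start_time, end_time, mean_duration) for each degraded stretch.

    A window counts as degraded when its mean exceeds factor * baseline_mean.
    The multiplier is an assumed stand-in for "significantly more".
    """
    means = sliding_means(durations, window)
    periods, first = [], None
    # The -inf sentinel closes a period that is still open at the end.
    for i, m in enumerate(means + [float("-inf")]):
        if m > factor * baseline_mean:
            if first is None:
                first = i
        elif first is not None:
            # Index of the last operation covered by the last degraded window.
            last = i - 1 + window - 1
            ops = durations[first:last + 1]
            periods.append((timestamps[first], timestamps[last], mean(ops)))
            first = None
    return periods


def summarize(periods, baseline_mean):
    """MTTR plus absolute and relative degradation over all recovery periods."""
    mttr = mean(end - start for start, end, _ in periods)
    degraded_mean = mean(m for _, _, m in periods)
    return mttr, degraded_mean - baseline_mean, degraded_mean / baseline_mean
```

Note that ``summarize`` raises ``statistics.StatisticsError`` when no recovery period was found, which corresponds to the ``N/A`` cells in the reports.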
|
||||
|
||||
|
||||
How to run
|
||||
^^^^^^^^^^
|
||||
|
||||
Prerequisites:
|
||||
* Install `Rally` tool and configure deployment parameters
|
||||
|
||||
* Verify that Rally is properly installed by running ``rally show flavors``
|
||||
|
||||
* Install `os-faults` library: ``pip install os-faults``
|
||||
|
||||
* Configure cloud and power management parameters; refer to `os-faults-cfg`_
|
||||
* Verify parameters by running ``os-inject-fault -v``
|
||||
|
||||
* Install `RallyRunners` tool: ``pip install rally-runners``
|
||||
|
||||
Run scenarios:
|
||||
``rally-reliability -s SCENARIO -o OUTPUT -b BOOK``
|
||||
|
||||
To show the full list of scenarios:
|
||||
``rally-reliability -h``
|
||||
|
||||
|
||||
Reports
|
||||
=======
|
||||
|
||||
Test plan execution reports:
|
||||
* :ref:`reliability_test_results_version_2`
|
||||
|
||||
|
||||
.. references:
|
||||
|
||||
.. _Rally: https://rally.readthedocs.io/
|
||||
.. _os-faults: https://os-faults.readthedocs.io/
|
||||
.. _os-faults-cfg: http://os-faults.readthedocs.io/en/latest/readme.html#usage
|
||||
.. _RallyRunners: https://github.com/shakhat/rally-runners
|
||||
|
42
doc/source/test_results/reliability/version_2/index.rst
Normal file
@ -0,0 +1,42 @@
|
||||
.. _reliability_test_results_version_2:
|
||||
|
||||
========================================
|
||||
OpenStack reliability testing. Version 2
|
||||
========================================
|
||||
|
||||
Test results
|
||||
============
|
||||
|
||||
Environment description
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
This report contains results for :ref:`reliability_testing_version_2`
|
||||
test plan. The data is collected in :ref:`intel_mirantis_performance_lab`.
|
||||
|
||||
|
||||
Software
|
||||
~~~~~~~~
|
||||
|
||||
This section describes installed software.
|
||||
|
||||
+-----------------+--------------------------------------------+
| Parameter       | Value                                      |
+-----------------+--------------------------------------------+
| OS              | Ubuntu 14.04.3                             |
+-----------------+--------------------------------------------+
| OpenStack       | Fuel 9.0 (Mitaka)                          |
+-----------------+--------------------------------------------+
| Networking      | Neutron OVS ML2 plugin with VxLAN and DVR  |
+-----------------+--------------------------------------------+
|
||||
|
||||
|
||||
Reports
|
||||
^^^^^^^
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:glob:
|
||||
|
||||
reports/*/*/index
|
||||
|
||||
Reports are calculated from the :download:`Raw Rally data <raw/raw_data.tar.xz>`
|
@ -0,0 +1,296 @@
|
||||
Keystone authentication with kill of Keystone on one node
|
||||
=========================================================
|
||||
|
||||
This report is generated on results collected by execution of the following
|
||||
Rally scenario:
|
||||
|
||||
.. code-block:: yaml

    ---
    {% set repeat = repeat|default(5) %}
    Authenticate.keystone:
    {% for iteration in range(repeat) %}
      -
        runner:
          type: "constant_for_duration"
          duration: 30
          concurrency: 20
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: kill keystone service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [100]
    {% endfor %}
|
||||
|
||||
|
||||
Summary
|
||||
-------
|
||||
|
||||
In the Fuel architecture, Keystone is deployed behind Apache2, which in turn
sits behind the NGINX front end. In this scenario we kill the Keystone processes
running on one of the controller nodes.
|
||||
|
||||
+-----------------------+------------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s    | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+============+=======================================+===========================================+
| 0.038 ±0.081          | 2.28 ±0.23 | 1.21 ±0.35                            | 9.1 ±2.3                                  |
+-----------------------+------------+---------------------------------------+-------------------------------------------+
|
||||
|
||||
Metrics:
|
||||
* `Service downtime` is the time interval between the first and
|
||||
the last errors.
|
||||
* `MTTR` is the mean time to recover service performance after
|
||||
the fault.
|
||||
* `Absolute performance degradation` is an absolute difference between
|
||||
the mean of operation duration during recovery period and the baseline's.
|
||||
* `Relative performance degradation` is the ratio between the mean
|
||||
of operation duration during recovery period and the baseline's.
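
As an illustration of the `Service downtime` definition above, here is a minimal sketch. It assumes each finished operation is available as a hypothetical ``(timestamp, ok)`` pair and treats all errors as one downtime period. The per-run tables may list several periods, and how errors are grouped into periods is not described here.

```python
def service_downtime(results):
    """Interval between the first and the last failed operation.

    `results` is an assumed list of (timestamp_in_seconds, ok) tuples.
    Returns None when no operation failed, 0.0 when exactly one failed.
    """
    error_times = [t for t, ok in results if not ok]
    if not error_times:
        return None
    return max(error_times) - min(error_times)
```

For example, errors at 1.0 s and 1.5 s give a downtime of 0.5 s under this simplified reading.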
|
||||
|
||||
Details
|
||||
-------
|
||||
|
||||
This section contains individual data for particular scenario runs.
|
||||
|
||||
|
||||
|
||||
Run #1
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_1.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 78 | 0.12 | 0.13 | 0.041 | 0.23 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
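
The baseline columns above could be derived from the pre-fault operation durations along these lines. This is only a sketch: the function name is hypothetical, and it is an assumption that the report uses the sample standard deviation and the default percentile interpolation of ``statistics.quantiles``.

```python
import statistics


def baseline_stats(durations):
    """Summary statistics for operation durations (seconds) before the fault."""
    return {
        "samples": len(durations),
        "median": statistics.median(durations),
        "mean": statistics.mean(durations),
        # Sample standard deviation; the report may use a different estimator.
        "std_dev": statistics.stdev(durations),
        # The last of the 19 cut points for n=20 is the 95th percentile.
        "p95": statistics.quantiles(durations, n=20)[-1],
    }
```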
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+----------------+
|
||||
| # | Downtime, s |
|
||||
+=====+================+
|
||||
| 1 | 0.0034 ±0.0034 |
|
||||
+-----+----------------+
|
||||
| 2 | 0.0282 ±0.0014 |
|
||||
+-----+----------------+
|
||||
|
||||
|
||||
|
||||
Service performance degradation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service has measurable performance degradation during the
|
||||
following time period(s).
|
||||
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||
+=====+======================+===========================+========================+
|
||||
| 1 | 2.711 ±0.023 | 1.30 ±0.39 | 10.8 ±3.0 |
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
|
||||
|
||||
|
||||
|
||||
Run #2
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_2.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 70 | 0.14 | 0.15 | 0.048 | 0.24 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+----------------+
|
||||
| # | Downtime, s |
|
||||
+=====+================+
|
||||
| 1 | 0.0047 ±0.0047 |
|
||||
+-----+----------------+
|
||||
|
||||
|
||||
|
||||
Service performance degradation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service has measurable performance degradation during the
|
||||
following time period(s).
|
||||
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||
+=====+======================+===========================+========================+
|
||||
| 1 | 2.722 ±0.026 | 1.66 ±0.43 | 11.9 ±2.9 |
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
|
||||
|
||||
|
||||
|
||||
Run #3
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_3.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 84 | 0.15 | 0.16 | 0.058 | 0.27 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+----------------+
|
||||
| # | Downtime, s |
|
||||
+=====+================+
|
||||
| 1 | 0.1147 ±0.0067 |
|
||||
+-----+----------------+
|
||||
|
||||
|
||||
|
||||
Service performance degradation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service has measurable performance degradation during the
|
||||
following time period(s).
|
||||
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||
+=====+======================+===========================+========================+
|
||||
| 1 | 2.317 ±0.019 | 1.07 ±0.35 | 7.5 ±2.1 |
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
|
||||
|
||||
|
||||
|
||||
Run #4
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_4.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 87 | 0.14 | 0.16 | 0.051 | 0.25 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+----------------+
|
||||
| # | Downtime, s |
|
||||
+=====+================+
|
||||
| 1 | 0.0057 ±0.0057 |
|
||||
+-----+----------------+
|
||||
|
||||
|
||||
|
||||
Service performance degradation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service has measurable performance degradation during the
|
||||
following time period(s).
|
||||
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||
+=====+======================+===========================+========================+
|
||||
| 1 | 1.695 ±0.015 | 1.11 ±0.29 | 8.0 ±1.8 |
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
|
||||
|
||||
|
||||
|
||||
Run #5
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_5.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 87 | 0.14 | 0.15 | 0.051 | 0.26 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+----------------+
|
||||
| # | Downtime, s |
|
||||
+=====+================+
|
||||
| 1 | 0.0166 ±0.0044 |
|
||||
+-----+----------------+
|
||||
| 2 | 0.0162 ±0.0044 |
|
||||
+-----+----------------+
|
||||
|
||||
|
||||
|
||||
Service performance degradation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service has measurable performance degradation during the
|
||||
following time period(s).
|
||||
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||
+=====+======================+===========================+========================+
|
||||
| 1 | 1.976 ±0.015 | 0.93 ±0.29 | 7.1 ±1.9 |
|
||||
+-----+----------------------+---------------------------+------------------------+
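
As a quick consistency check, the summary values for this scenario match the plain means of the per-run values listed in the five runs above. The sketch below covers only the central values; how the ± terms are derived is not stated in this report, so they are left out.

```python
from statistics import mean

# Per-run values copied from the "Service performance degradation" tables.
time_to_recover = [2.711, 2.722, 2.317, 1.695, 1.976]
absolute_degradation = [1.30, 1.66, 1.07, 1.11, 0.93]
relative_degradation = [10.8, 11.9, 7.5, 8.0, 7.1]

mttr = mean(time_to_recover)
abs_mean = mean(absolute_degradation)
rel_mean = mean(relative_degradation)

print(f"MTTR: {mttr:.2f} s")
print(f"Absolute degradation: {abs_mean:.2f} s")
print(f"Relative degradation: {rel_mean:.1f}")
```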
|
||||
|
||||
|
@ -0,0 +1,98 @@
|
||||
Keystone authentication with kill of MySQL on one node
|
||||
======================================================
|
||||
|
||||
This report is generated on results collected by execution of the following
|
||||
Rally scenario:
|
||||
|
||||
.. code-block:: yaml

    ---
    Authenticate.keystone:
      -
        runner:
          type: "constant_for_duration"
          duration: 60
          concurrency: 5
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: kill mysql service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [150]
|
||||
|
||||
|
||||
Summary
|
||||
-------
|
||||
|
||||
In this scenario we kill one of the MySQL servers while working with the
Keystone API. In the Fuel architecture, MySQL is deployed with Galera in
active-active mode; however, Keystone loses its connection to the DB with the
following trace::

    (_mysql_exceptions.OperationalError) (2013, "Lost connection to MySQL
    server at 'reading initial communication packet', system error: 0")
|
||||
|
||||
+-----------------------+-----------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s   | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+===========+=======================================+===========================================+
| 14.7 ±1.4             | N/A       | N/A                                   | N/A                                       |
+-----------------------+-----------+---------------------------------------+-------------------------------------------+
|
||||
|
||||
Metrics:
|
||||
* `Service downtime` is the time interval between the first and
|
||||
the last errors.
|
||||
* `MTTR` is the mean time to recover service performance after
|
||||
the fault.
|
||||
* `Absolute performance degradation` is an absolute difference between
|
||||
the mean of operation duration during recovery period and the baseline's.
|
||||
* `Relative performance degradation` is the ratio between the mean
|
||||
of operation duration during recovery period and the baseline's.
|
||||
|
||||
|
||||
|
||||
Details
|
||||
-------
|
||||
|
||||
This section contains individual data for particular scenario runs.
|
||||
|
||||
|
||||
|
||||
Run #1
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_1.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 135 | 0.071 | 0.074 | 0.012 | 0.09 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+---------------+
|
||||
| # | Downtime, s |
|
||||
+=====+===============+
|
||||
| 1 | 14.7 ±2.0 |
|
||||
+-----+---------------+
|
||||
|
||||
|
||||
|
||||
|
@ -0,0 +1,292 @@
|
||||
Keystone authentication with Keystone API restart on one node
|
||||
=============================================================
|
||||
|
||||
This report is generated on results collected by execution of the following
|
||||
Rally scenario:
|
||||
|
||||
.. code-block:: yaml

    ---
    {% set repeat = repeat|default(5) %}
    Authenticate.keystone:
    {% for iteration in range(repeat) %}
      -
        runner:
          type: "constant_for_duration"
          duration: 30
          concurrency: 5
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: restart keystone service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [100]
    {% endfor %}
|
||||
|
||||
|
||||
Summary
|
||||
-------
|
||||
|
||||
In the Fuel architecture, Keystone is deployed behind Apache2, which in turn
sits behind the NGINX front end. In this scenario we restart the Apache2 service;
as a result, Keystone becomes unavailable on one of the controller nodes.
|
||||
|
||||
+-----------------------+------------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s    | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+============+=======================================+===========================================+
| 1.07 ±0.76            | 5.44 ±0.47 | 0.41 ±0.22                            | 4.7 ±2.0                                  |
+-----------------------+------------+---------------------------------------+-------------------------------------------+
|
||||
|
||||
Metrics:
|
||||
* `Service downtime` is the time interval between the first and
|
||||
the last errors.
|
||||
* `MTTR` is the mean time to recover service performance after
|
||||
the fault.
|
||||
* `Absolute performance degradation` is an absolute difference between
|
||||
the mean of operation duration during recovery period and the baseline's.
|
||||
* `Relative performance degradation` is the ratio between the mean
|
||||
of operation duration during recovery period and the baseline's.
|
||||
|
||||
Details
|
||||
-------
|
||||
|
||||
This section contains individual data for particular scenario runs.
|
||||
|
||||
|
||||
|
||||
Run #1
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_1.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 84 | 0.071 | 0.077 | 0.017 | 0.13 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+---------------+
|
||||
| # | Downtime, s |
|
||||
+=====+===============+
|
||||
| 1 | 0.88 ±0.75 |
|
||||
+-----+---------------+
|
||||
|
||||
|
||||
|
||||
Service performance degradation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service has measurable performance degradation during the
|
||||
following time period(s).
|
||||
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||
+=====+======================+===========================+========================+
|
||||
| 1 | 3.549 ±0.034 | 0.51 ±0.25 | 7.6 ±3.3 |
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
|
||||
|
||||
|
||||
|
||||
Run #2
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_2.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 84 | 0.13 | 0.13 | 0.0086 | 0.14 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+---------------+
|
||||
| # | Downtime, s |
|
||||
+=====+===============+
|
||||
| 1 | 1.00 ±0.87 |
|
||||
+-----+---------------+
|
||||
|
||||
|
||||
|
||||
Service performance degradation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service has measurable performance degradation during the
|
||||
following time period(s).
|
||||
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||
+=====+======================+===========================+========================+
|
||||
| 1 | 6.038 ±0.034 | 0.35 ±0.17 | 3.7 ±1.3 |
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
|
||||
|
||||
|
||||
|
||||
Run #3
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_3.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 84 | 0.13 | 0.12 | 0.0077 | 0.14 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+---------------+
|
||||
| # | Downtime, s |
|
||||
+=====+===============+
|
||||
| 1 | 0.26 ±0.12 |
|
||||
+-----+---------------+
|
||||
|
||||
|
||||
|
||||
Service performance degradation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service has measurable performance degradation during the
|
||||
following time period(s).
|
||||
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||
+=====+======================+===========================+========================+
|
||||
| 1 | 6.123 ±0.037 | 0.43 ±0.25 | 4.4 ±2.0 |
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
|
||||
|
||||
|
||||
|
||||
Run #4
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_4.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 84 | 0.13 | 0.13 | 0.0089 | 0.14 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+---------------+
|
||||
| # | Downtime, s |
|
||||
+=====+===============+
|
||||
| 1 | 1.02 ±0.73 |
|
||||
+-----+---------------+
|
||||
|
||||
|
||||
|
||||
Service performance degradation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service has measurable performance degradation during the
|
||||
following time period(s).
|
||||
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||
+=====+======================+===========================+========================+
|
||||
| 1 | 5.860 ±0.027 | 0.25 ±0.13 | 2.9 ±1.1 |
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
|
||||
|
||||
|
||||
|
||||
Run #5
|
||||
^^^^^^
|
||||
|
||||
.. image:: plot_5.svg
|
||||
|
||||
Baseline
|
||||
~~~~~~~~
|
||||
|
||||
Baseline samples are collected before the start of fault injection. They are
|
||||
used to estimate service performance degradation after the fault.
|
||||
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
| Samples | Median, s | Mean, s | Std dev | 95% percentile, s |
|
||||
+===========+=============+===========+===========+=====================+
|
||||
| 87 | 0.13 | 0.13 | 0.019 | 0.14 |
|
||||
+-----------+-------------+-----------+-----------+---------------------+
|
||||
|
||||
|
||||
Service downtime
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service is not available during the following time period(s).
|
||||
|
||||
+-----+---------------+
|
||||
| # | Downtime, s |
|
||||
+=====+===============+
|
||||
| 1 | 2.173 ±0.067 |
|
||||
+-----+---------------+
|
||||
|
||||
|
||||
|
||||
Service performance degradation
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The tested service has measurable performance degradation during the
|
||||
following time period(s).
|
||||
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
| # | Time to recover, s | Absolute degradation, s | Relative degradation |
|
||||
+=====+======================+===========================+========================+
|
||||
| 1 | 5.630 ±0.048 | 0.52 ±0.30 | 5.0 ±2.3 |
|
||||
+-----+----------------------+---------------------------+------------------------+
|
||||
|
||||
|
@ -0,0 +1,201 @@
|
||||
Keystone authentication with memcached restart on one node
==========================================================
|
||||
|
||||
This report is generated on results collected by execution of the following
|
||||
Rally scenario:
|
||||
|
||||
.. code-block:: yaml

    ---
    {% set repeat = repeat|default(5) %}
    Authenticate.keystone:
    {% for iteration in range(repeat) %}
      -
        runner:
          type: "constant_for_duration"
          duration: 30
          concurrency: 5
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: restart memcached service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [100]
    {% endfor %}
|
||||
|
||||
|
||||
Summary
|
||||
-------
|
||||
|
||||
In this scenario we restart the Memcached service on one of the controller
nodes. Memcached is used as a caching backend for Keystone, so Keystone
performance may degrade.
|
||||
|
||||
+-----------------------+--------------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s      | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+==============+=======================================+===========================================+
| N/A                   | 0.458 ±0.068 | 0.057 ±0.034                          | 1.46 ±0.27                                |
+-----------------------+--------------+---------------------------------------+-------------------------------------------+
|
||||
|
||||
Metrics:

* `Service downtime` is the time interval between the first and
  the last errors.
* `MTTR` is the mean time to recover service performance after
  the fault.
* `Absolute performance degradation` is the absolute difference between
  the mean operation duration during the recovery period and the
  baseline mean.
* `Relative performance degradation` is the ratio of the mean
  operation duration during the recovery period to the baseline mean.

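
The two degradation metrics can be illustrated with a minimal Python sketch.
It assumes that "absolute difference" means the recovery-period mean minus
the baseline mean. The function name ``degradation`` and the sample values are
hypothetical and are not taken from the tool that produced this report.

.. code-block:: python

    from statistics import mean


    def degradation(baseline, recovery):
        """Return (absolute, relative) degradation of operation duration.

        baseline: durations in seconds sampled before the fault
        recovery: durations in seconds sampled during the recovery period
        """
        base_mean = mean(baseline)
        recovery_mean = mean(recovery)
        # Assumption: "absolute" is recovery mean minus baseline mean.
        return recovery_mean - base_mean, recovery_mean / base_mean


    # Hypothetical samples, not taken from this report.
    absolute, relative = degradation([0.10, 0.12, 0.14], [0.16, 0.18, 0.20])
    print(f"absolute: {absolute:.3f} s, relative: {relative:.2f}")

As a rough cross-check against run #2 of this report, a baseline mean of
0.12 s and an absolute degradation of 0.069 s give a ratio of about
(0.12 + 0.069) / 0.12 = 1.575, in line with the reported 1.57.
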
Details
-------

This section contains individual data for particular scenario runs.

Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 88        | 0.12        | 0.12      | 0.014     | 0.13                |
+-----------+-------------+-----------+-----------+---------------------+

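
The columns of the baseline table could be computed along the following
lines. This is a sketch under assumptions: it uses the sample standard
deviation and a nearest-rank percentile, which may differ from what the report
generator does. The name ``baseline_summary`` and the input values are
hypothetical.

.. code-block:: python

    import statistics


    def baseline_summary(durations):
        """Summarise baseline operation durations given in seconds."""
        ordered = sorted(durations)
        # Nearest-rank 95th percentile: ceil(0.95 * n) in integer arithmetic.
        rank = max(1, -(-95 * len(ordered) // 100))
        return {
            "samples": len(ordered),
            "median": statistics.median(ordered),
            "mean": statistics.mean(ordered),
            "stddev": statistics.stdev(ordered),  # needs at least 2 samples
            "p95": ordered[rank - 1],
        }


    # Hypothetical durations, not taken from this report.
    summary = baseline_summary(
        [0.10, 0.11, 0.12, 0.12, 0.13, 0.12, 0.11, 0.13, 0.12, 0.14])
    print(summary)
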
Run #2
^^^^^^

.. image:: plot_2.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 84        | 0.12        | 0.12      | 0.0078    | 0.13                |
+-----------+-------------+-----------+-----------+---------------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 0.4059 ±0.0027       | 0.069 ±0.030              | 1.57 ±0.25             |
+-----+----------------------+---------------------------+------------------------+

Run #3
^^^^^^

.. image:: plot_3.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 88        | 0.12        | 0.13      | 0.017     | 0.15                |
+-----------+-------------+-----------+-----------+---------------------+

Run #4
^^^^^^

.. image:: plot_4.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 84        | 0.12        | 0.12      | 0.01      | 0.14                |
+-----------+-------------+-----------+-----------+---------------------+

Run #5
^^^^^^

.. image:: plot_5.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 84        | 0.13        | 0.13      | 0.0086    | 0.14                |
+-----------+-------------+-----------+-----------+---------------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 0.5110 ±0.0037       | 0.045 ±0.037              | 1.35 ±0.29             |
+-----+----------------------+---------------------------+------------------------+

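
The figures in the summary table appear consistent with the arithmetic mean of
the per-run values from runs #2 and #5, the only runs in which degradation was
detected. The sketch below reproduces the central values only. How the ±
intervals are combined is not stated in this report and is not modelled here.
The helper name ``summarize`` is hypothetical.

.. code-block:: python

    from statistics import mean


    def summarize(per_run_values, digits):
        """Average per-run values and round to the given number of digits."""
        return round(mean(per_run_values), digits)


    # Values copied from the degradation tables of runs #2 and #5.
    print(summarize([0.4059, 0.5110], 3))  # time to recover, s
    print(summarize([0.069, 0.045], 3))    # absolute degradation, s
    print(summarize([1.57, 1.35], 2))      # relative degradation
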
@ -0,0 +1,160 @@
Create and list networks with kill of one of MySQL servers
==========================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    {% set repeat = repeat|default(3) %}
    NeutronNetworks.create_and_list_networks:
    {% for iteration in range(repeat) %}
      -
        args:
          network_create_args: {}
        runner:
          type: "constant_for_duration"
          duration: 60
          concurrency: 4
        context:
          users:
            tenants: 1
            users_per_tenant: 1
          quotas:
            neutron:
              network: -1
        hooks:
          -
            name: fault_injection
            args:
              action: kill mysql service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [100]
    {% endfor %}

Summary
-------

In this scenario we kill one of the MySQL servers while working with the
Neutron API. In the Fuel architecture MySQL is deployed with Galera in
active-active mode, so no dramatic impact is expected.

+-----------------------+------------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s    | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+============+=======================================+===========================================+
| N/A                   | 7.73 ±0.72 | 1.4 ±1.1                              | 3.8 ±2.3                                  |
+-----------------------+------------+---------------------------------------+-------------------------------------------+

Metrics:

* `Service downtime` is the time interval between the first and
  the last errors.
* `MTTR` is the mean time to recover service performance after
  the fault.
* `Absolute performance degradation` is the absolute difference between
  the mean operation duration during the recovery period and the
  baseline mean.
* `Relative performance degradation` is the ratio of the mean
  operation duration during the recovery period to the baseline mean.

Details
-------

This section contains individual data for particular scenario runs.

Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 86        | 0.48        | 0.8       | 0.49      | 1.6                 |
+-----------+-------------+-----------+-----------+---------------------+

Run #2
^^^^^^

.. image:: plot_2.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 85        | 0.46        | 0.5       | 0.12      | 0.7                 |
+-----------+-------------+-----------+-----------+---------------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 6.824 ±0.093         | 1.5 ±1.2                  | 4.1 ±2.5               |
+-----+----------------------+---------------------------+------------------------+

Run #3
^^^^^^

.. image:: plot_3.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 85        | 0.45        | 0.47      | 0.065     | 0.61                |
+-----------+-------------+-----------+-----------+---------------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 8.63 ±0.12           | 1.18 ±1.00                | 3.5 ±2.1               |
+-----+----------------------+---------------------------+------------------------+

@ -0,0 +1,119 @@
Boot and delete VM with disabling management network on one of controllers
==========================================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    NovaServers.boot_and_delete_server:
      -
        args:
          flavor:
            name: "m1.micro"
          image:
            name: "(^cirros.*uec$|TestVM)"
          force_delete: false
        runner:
          type: "constant_for_duration"
          duration: 600
          concurrency: 4
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: disconnect management network on one node with nova-scheduler service
            trigger:
              name: event
              args:
                unit: iteration
                at: [50]

Summary
-------

In this scenario we disable the management network interface on one of the
controllers (in the Fuel architecture a controller runs the DB, MQ and API
services and the scheduler). This emulates a networking outage (a network
port failure on the machine or on the switch).

The outage makes all services unreachable from outside. Moreover, the cluster
remains broken even 10 minutes after the fault.

+-----------------------+------------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s    | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+============+=======================================+===========================================+
| 358.0 ±2.7            | 149.0 ±2.1 | 24 ±17                                | 5.7 ±3.4                                  |
+-----------------------+------------+---------------------------------------+-------------------------------------------+

Metrics:

* `Service downtime` is the time interval between the first and
  the last errors.
* `MTTR` is the mean time to recover service performance after
  the fault.
* `Absolute performance degradation` is the absolute difference between
  the mean operation duration during the recovery period and the
  baseline mean.
* `Relative performance degradation` is the ratio of the mean
  operation duration during the recovery period to the baseline mean.

Details
-------

This section contains individual data for particular scenario runs.

Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 36        | 5.5         | 5.2       | 0.6       | 6                   |
+-----------+-------------+-----------+-----------+---------------------+

Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+---------------+
| #   | Downtime, s   |
+=====+===============+
| 1   | 126.32 ±0.82  |
+-----+---------------+
| 2   | 231.7 ±6.5    |
+-----+---------------+

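
One way to derive such downtime periods from raw samples is sketched below.
It assumes that a period spans a run of consecutive failed operations and that
a successful operation ends the period. How the report generator actually
separates periods is not stated here, and the name ``downtime_periods`` and the
sample data are hypothetical.

.. code-block:: python

    def downtime_periods(samples):
        """Return (start, end) pairs for each run of consecutive failures.

        samples: iterable of (timestamp_s, failed) tuples ordered by time
        """
        periods = []
        start = last = None
        for timestamp, failed in samples:
            if failed:
                if start is None:
                    start = timestamp  # first error of a new period
                last = timestamp       # latest error seen so far
            elif start is not None:
                periods.append((start, last))
                start = None
        if start is not None:
            periods.append((start, last))  # errors lasted until the end
        return periods


    # Hypothetical samples: two error bursts separated by one success.
    samples = [(0, False), (1, True), (3, True),
               (4, False), (5, True), (9, True)]
    periods = downtime_periods(samples)
    total = sum(end - begin for begin, end in periods)
    print(periods, total)

In this run the two periods listed above add up to 126.32 + 231.7 = 358.02 s,
which matches the 358.0 s service downtime in the summary table.
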
Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 149.0 ±4.6           | 24 ±17                    | 5.7 ±3.4               |
+-----+----------------------+---------------------------+------------------------+

@ -0,0 +1,81 @@
Boot and delete VM with kill of RabbitMQ on one of nodes
========================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    NovaServers.boot_and_delete_server:
      -
        args:
          flavor:
            name: "m1.micro"
          image:
            name: "(^cirros.*uec$|TestVM)"
          force_delete: false
        runner:
          type: "constant_for_duration"
          duration: 240
          concurrency: 4
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: kill rabbitmq service on one node
            trigger:
              name: event
              args:
                unit: iteration
                at: [60]

Summary
-------

In this scenario we kill one of the running RabbitMQ servers. Once killed,
RabbitMQ is restarted automatically by Pacemaker.

The cloud stays stable: neither errors nor significant performance degradation
are observed. The oslo.messaging library handles the loss of connection to
RabbitMQ and reconnects to one of the other servers automatically::

    AMQP server on 10.43.0.3:5673 is unreachable: timed out. Trying again in
    1 seconds.
    ...
    Reconnected to AMQP server on 10.43.0.6:5673 via [amqp] client

Details
-------

This section contains individual data for particular scenario runs.

Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 45        | 5.8         | 5.8       | 0.3       | 6.1                 |
+-----------+-------------+-----------+-----------+---------------------+

@ -0,0 +1,113 @@
Boot and delete VM with reboot of one of controllers
====================================================

This report is generated from results collected by executing the following
Rally scenario:

.. code-block:: yaml

    ---
    NovaServers.boot_and_delete_server:
      -
        args:
          flavor:
            name: "m1.micro"
          image:
            name: "(^cirros.*uec$|TestVM)"
          force_delete: false
        runner:
          type: "constant_for_duration"
          duration: 600
          concurrency: 4
        context:
          users:
            tenants: 1
            users_per_tenant: 1
        hooks:
          -
            name: fault_injection
            args:
              action: reboot one node with rabbitmq service
            trigger:
              name: event
              args:
                unit: iteration
                at: [50]

Summary
-------

In this scenario we reboot one of the controllers (in the Fuel architecture a
controller runs the DB, MQ and API services and the scheduler). The observed
recovery period corresponds to the time needed for the node to reboot, start
its services and return to a synchronized state.

+-----------------------+--------------+---------------------------------------+-------------------------------------------+
| Service downtime, s   | MTTR, s      | Absolute performance degradation, s   | Relative performance degradation, ratio   |
+=======================+==============+=======================================+===========================================+
| 8.7 ±1.6              | 286.89 ±0.87 | 14.7 ±4.7                             | 3.85 ±0.91                                |
+-----------------------+--------------+---------------------------------------+-------------------------------------------+

Metrics:

* `Service downtime` is the time interval between the first and
  the last errors.
* `MTTR` is the mean time to recover service performance after
  the fault.
* `Absolute performance degradation` is the absolute difference between
  the mean operation duration during the recovery period and the
  baseline mean.
* `Relative performance degradation` is the ratio of the mean
  operation duration during the recovery period to the baseline mean.

Details
-------

This section contains individual data for particular scenario runs.

Run #1
^^^^^^

.. image:: plot_1.svg

Baseline
~~~~~~~~

Baseline samples are collected before the start of fault injection. They are
used to estimate service performance degradation after the fault.

+-----------+-------------+-----------+-----------+---------------------+
| Samples   | Median, s   | Mean, s   | Std dev   | 95% percentile, s   |
+===========+=============+===========+===========+=====================+
| 36        | 5.1         | 5.2       | 0.63      | 6.1                 |
+-----------+-------------+-----------+-----------+---------------------+

Service downtime
~~~~~~~~~~~~~~~~

The tested service is not available during the following time period(s).

+-----+---------------+
| #   | Downtime, s   |
+=====+===============+
| 1   | 8.7 ±2.5      |
+-----+---------------+

Service performance degradation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tested service has measurable performance degradation during the
following time period(s).

+-----+----------------------+---------------------------+------------------------+
| #   | Time to recover, s   | Absolute degradation, s   | Relative degradation   |
+=====+======================+===========================+========================+
| 1   | 286.89 ±0.76         | 14.7 ±4.7                 | 3.85 ±0.91             |
+-----+----------------------+---------------------------+------------------------+
