Add reliability test results
This commit adds part of the reliability testing results. The scope of this commit is testing the Nova API under several factors. Change-Id: Id3cb644ccf4bd315846399e6ac40a446297787f3
This commit is contained in:
parent
92816644f7
commit
f84ec2ce07
@ -14,43 +14,7 @@ OpenStack reliability testing
|
|||||||
|
|
||||||
:Conventions:
|
:Conventions:
|
||||||
|
|
||||||
- **OpenStack cluster:** consists of server nodes with deployed and fully
|
.. include:: plan_conventions.rst
|
||||||
operational OpenStack environment in high-availability configuration.
|
|
||||||
|
|
||||||
- **Fault-injection operation:** represents common types of failures which can
|
|
||||||
occur in production environment: service-hang, service-crash,
|
|
||||||
network-partition, network-flapping, and node-crash.
|
|
||||||
|
|
||||||
- **Service-hang:** faults are injected into specified OpenStack service by
|
|
||||||
sending -SIGSTOP and -SIGCONT POSIX signals.
|
|
||||||
|
|
||||||
- **Service-crash:** faults are injected by sending -SIGKILL signal into
|
|
||||||
specified OpenStack service.
|
|
||||||
|
|
||||||
- **Node-crash:** faults are injected to an OpenStack cluster by rebooting
|
|
||||||
or shutting down a server node.
|
|
||||||
|
|
||||||
- **Network-partition:** faults are injected by inserting iptables rules to
|
|
||||||
OpenStack cluster nodes to a corresponding service that should be
|
|
||||||
network-partitioned.
|
|
||||||
|
|
||||||
- **Network-flapping:** faults are injected into OpenStack cluster nodes by
|
|
||||||
inserting/deleting iptables rules on the fly which will affect
|
|
||||||
corresponding service that should be tested.
|
|
||||||
|
|
||||||
- **Factor:** consists of a set of atomic fault-injection operations. For
|
|
||||||
example: reboot-random-controller, reboot-random-rabbitmq.
|
|
||||||
|
|
||||||
- **Test plan:** contains two elements: test scenario
|
|
||||||
execution graph and fault-injection factors.
|
|
||||||
|
|
||||||
- **SLA**: Service-level agreement
|
|
||||||
|
|
||||||
- **Testing-cycles**: number of test cycles of each factor
|
|
||||||
|
|
||||||
- **Inf**: assumes infinite time to auto-healing of cluster
|
|
||||||
after fault-factor injection.
|
|
||||||
|
|
||||||
|
|
||||||
Test Plan
|
Test Plan
|
||||||
=========
|
=========
|
||||||
|
36
doc/source/test_plans/reliability/plan_conventions.rst
Normal file
36
doc/source/test_plans/reliability/plan_conventions.rst
Normal file
@ -0,0 +1,36 @@
|
|||||||
|
- **OpenStack cluster:** consists of server nodes with deployed and fully
|
||||||
|
operational OpenStack environment in high-availability configuration.
|
||||||
|
|
||||||
|
- **Fault-injection operation:** represents common types of failures which can
|
||||||
|
occur in production environment: service-hang, service-crash,
|
||||||
|
network-partition, network-flapping, and node-crash.
|
||||||
|
|
||||||
|
- **Service-hang:** faults are injected into specified OpenStack service by
|
||||||
|
sending the SIGSTOP and SIGCONT POSIX signals.
|
||||||
|
|
||||||
|
- **Service-crash:** faults are injected by sending the SIGKILL signal to the
|
||||||
|
specified OpenStack service.
|
||||||
|
|
||||||
|
- **Node-crash:** faults are injected to an OpenStack cluster by rebooting
|
||||||
|
or shutting down a server node.
|
||||||
|
|
||||||
|
- **Network-partition:** faults are injected by inserting iptables rules on
OpenStack cluster nodes for the corresponding service that should be
network-partitioned.
|
||||||
|
|
||||||
|
- **Network-flapping:** faults are injected into OpenStack cluster nodes by
|
||||||
|
inserting/deleting iptables rules on the fly which will affect
|
||||||
|
corresponding service that should be tested.
|
||||||
|
|
||||||
|
- **Factor:** consists of a set of atomic fault-injection operations. For
|
||||||
|
example: reboot-random-controller, reboot-random-rabbitmq.
|
||||||
|
|
||||||
|
- **Test plan:** contains two elements: test scenario
|
||||||
|
execution graph and fault-injection factors.
|
||||||
|
|
||||||
|
- **SLA**: Service-level agreement
|
||||||
|
|
||||||
|
- **Testing-cycles**: number of test cycles of each factor
|
||||||
|
|
||||||
|
- **Inf**: assumes infinite time for the cluster to auto-heal
|
||||||
|
after fault-factor injection.
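
As a minimal sketch of the service-hang convention, the following Python function suspends one process with SIGSTOP and resumes it with SIGCONT after a fixed delay. The function name ``freeze_process`` and its parameters are hypothetical and are not taken from the scrappy framework; the sketch assumes a POSIX system and sufficient privileges to signal the target process.

```python
import os
import signal
import time


def freeze_process(pid: int, seconds: float) -> None:
    """Suspend a process with SIGSTOP, wait, then resume it with SIGCONT."""
    # SIGSTOP cannot be caught or ignored, so the process hangs immediately.
    os.kill(pid, signal.SIGSTOP)
    try:
        # Duration of the simulated hang.
        time.sleep(seconds)
    finally:
        # Always resume the process, even if the wait is interrupted.
        os.kill(pid, signal.SIGCONT)
```

Calling ``freeze_process(pid, 150)`` for every PID of a service would approximate a 150-second hang of that service on a single node.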
|
@ -20,3 +20,4 @@ Test Results
|
|||||||
hardware_features/index
|
hardware_features/index
|
||||||
provisioning/index
|
provisioning/index
|
||||||
1000_nodes/index
|
1000_nodes/index
|
||||||
|
reliability/index
|
||||||
|
BIN
doc/source/test_results/reliability/images/Network_Scheme.png
Normal file
BIN
doc/source/test_results/reliability/images/Network_Scheme.png
Normal file
Binary file not shown.
650
doc/source/test_results/reliability/index.rst
Normal file
650
doc/source/test_results/reliability/index.rst
Normal file
@ -0,0 +1,650 @@
|
|||||||
|
.. _reliability_testing_results:
|
||||||
|
|
||||||
|
=============================
|
||||||
|
OpenStack reliability testing
|
||||||
|
=============================
|
||||||
|
|
||||||
|
:status: draft
|
||||||
|
:version: 0
|
||||||
|
|
||||||
|
:Abstract:
|
||||||
|
This document describes an abstract methodology for OpenStack cluster
|
||||||
|
high-availability testing and analysis. OpenStack data plane testing
|
||||||
|
is currently out of scope but will be described in the future.
|
||||||
|
|
||||||
|
:Conventions:
|
||||||
|
|
||||||
|
.. include:: ../../test_plans/reliability/plan_conventions.rst
|
||||||
|
|
||||||
|
|
||||||
|
Test results
|
||||||
|
============
|
||||||
|
|
||||||
|
Test environment
|
||||||
|
----------------
|
||||||
|
|
||||||
|
Software configuration on servers with OpenStack
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. table:: **Basic cluster configuration**
|
||||||
|
|
||||||
|
+-------------------------+---------------------------------------------+
|
||||||
|
|Name | Build-9.0.0-451 |
|
||||||
|
+-------------------------+---------------------------------------------+
|
||||||
|
|OpenStack release | Mitaka on Ubuntu 14.04 |
|
||||||
|
+-------------------------+---------------------------------------------+
|
||||||
|
|Total nodes | 6 nodes |
|
||||||
|
+-------------------------+---------------------------------------------+
|
||||||
|
|Controller | 3 nodes |
|
||||||
|
+-------------------------+---------------------------------------------+
|
||||||
|
|Compute, Ceph OSD | 3 nodes with KVM hypervisor |
|
||||||
|
+-------------------------+---------------------------------------------+
|
||||||
|
|Network | Neutron with tunneling segmentation |
|
||||||
|
+-------------------------+---------------------------------------------+
|
||||||
|
|Storage back ends | | Ceph RBD for volumes (Cinder) |
|
||||||
|
| | | Ceph RadosGW for objects (Swift API) |
|
||||||
|
| | | Ceph RBD for ephemeral volumes (Nova) |
|
||||||
|
| | | Ceph RBD for images (Glance) |
|
||||||
|
+-------------------------+---------------------------------------------+
|
||||||
|
|
||||||
|
Software configuration on servers with Rally role
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Before you start configuring a server with Rally role, verify that Rally
|
||||||
|
is installed. For more information, see `Rally installation documentation`_.
|
||||||
|
|
||||||
|
.. table:: **Software version of Rally server**
|
||||||
|
|
||||||
|
+------------+-------------------+
|
||||||
|
|Software |Version |
|
||||||
|
+============+===================+
|
||||||
|
|Rally |0.4.0 |
|
||||||
|
+------------+-------------------+
|
||||||
|
|Ubuntu |14.04.3 LTS |
|
||||||
|
+------------+-------------------+
|
||||||
|
|
||||||
|
Environment description
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
Hardware
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
.. table:: **Description of server hardware**
|
||||||
|
|
||||||
|
+--------+-----------------+------------------------+------------------------+
|
||||||
|
|SERVER |name | | 728997-comp-disk-228 | 729017-comp-disk-255 |
|
||||||
|
| | | | 728998-comp-disk-227 | |
|
||||||
|
| | | | 728999-comp-disk-226 | |
|
||||||
|
| | | | 729000-comp-disk-225 | |
|
||||||
|
| | | | 729001-comp-disk-224 | |
|
||||||
|
| | | | 729002-comp-disk-223 | |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |role | | controller | Rally |
|
||||||
|
| | | | controller | |
|
||||||
|
| | | | controller | |
|
||||||
|
| | | | compute, ceph-osd | |
|
||||||
|
| | | | compute, ceph-osd | |
|
||||||
|
| | | | compute, ceph-osd | |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |vendor, model |HP, DL380 Gen9 |HP, DL380 Gen9 |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |operating_system | | 3.13.0-87-generic | | 3.13.0-87-generic |
|
||||||
|
| | | | Ubuntu-trusty | | Ubuntu-trusty |
|
||||||
|
| | | | x86_64 | | x86_64 |
|
||||||
|
+--------+-----------------+------------------------+------------------------+
|
||||||
|
|CPU |vendor, model |Intel, E5-2680 v3 |Intel, E5-2680 v3 |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |processor_count |2 |2 |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |core_count |12 |12 |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |frequency_MHz |2500 |2500 |
|
||||||
|
+--------+-----------------+------------------------+------------------------+
|
||||||
|
|RAM |vendor, model |HP, 752369-081 |HP, 752369-081 |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |amount_MB |262144 |262144 |
|
||||||
|
+--------+-----------------+------------------------+------------------------+
|
||||||
|
|NETWORK |interface_name |p1p1 |p1p1 |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |vendor, model |Intel, X710 Dual Port |Intel, X710 Dual Port |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |bandwidth |10 Gbit |10 Gbit |
|
||||||
|
+--------+-----------------+------------------------+------------------------+
|
||||||
|
|STORAGE |dev_name |/dev/sda |/dev/sda |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |vendor, model | | raid10 - HP P840 | | raid10 - HP P840 |
|
||||||
|
| | | | 12 disks EH0600JEDHE | | 12 disks EH0600JEDHE |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |SSD/HDD |HDD |HDD |
|
||||||
|
| +-----------------+------------------------+------------------------+
|
||||||
|
| |size | 3,6 TB | 3,6 TB |
|
||||||
|
+--------+-----------------+------------------------+------------------------+
|
||||||
|
|
||||||
|
Software
|
||||||
|
~~~~~~~~
|
||||||
|
|
||||||
|
.. table:: **Services on servers by role**
|
||||||
|
|
||||||
|
+------------+----------------------------+
|
||||||
|
|Role |Service name |
|
||||||
|
+============+============================+
|
||||||
|
|controller || horizon |
|
||||||
|
| || keystone |
|
||||||
|
| || nova-api |
|
||||||
|
| || nova-scheduler |
|
||||||
|
| || nova-cert |
|
||||||
|
| || nova-conductor |
|
||||||
|
| || nova-consoleauth |
|
||||||
|
| || nova-consoleproxy |
|
||||||
|
| || cinder-api |
|
||||||
|
| || cinder-backup |
|
||||||
|
| || cinder-scheduler |
|
||||||
|
| || cinder-volume |
|
||||||
|
| || glance-api |
|
||||||
|
| || glance-glare |
|
||||||
|
| || glance-registry |
|
||||||
|
| || neutron-dhcp-agent |
|
||||||
|
| || neutron-l3-agent |
|
||||||
|
| || neutron-metadata-agent |
|
||||||
|
| || neutron-openvswitch-agent |
|
||||||
|
| || neutron-server |
|
||||||
|
| || heat-api |
|
||||||
|
| || heat-api-cfn |
|
||||||
|
| || heat-api-cloudwatch |
|
||||||
|
| || ceph-mon |
|
||||||
|
| || rados-gw |
|
||||||
|
| || heat-engine |
|
||||||
|
| || memcached |
|
||||||
|
| || rabbitmq_server |
|
||||||
|
| || mysqld |
|
||||||
|
| || galera |
|
||||||
|
| || corosync |
|
||||||
|
| || pacemaker |
|
||||||
|
| || haproxy |
|
||||||
|
+------------+----------------------------+
|
||||||
|
|compute-osd || nova-compute |
|
||||||
|
| || neutron-l3-agent |
|
||||||
|
| || neutron-metadata-agent |
|
||||||
|
| || neutron-openvswitch-agent |
|
||||||
|
| || ceph-osd |
|
||||||
|
+------------+----------------------------+
|
||||||
|
|
||||||
|
|
||||||
|
High availability cluster architecture
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Controller nodes:
|
||||||
|
|
||||||
|
.. image:: https://docs.mirantis.com/openstack/fuel/fuel-8.0/_images/logical-diagram-controller.svg
|
||||||
|
:height: 700px
|
||||||
|
:width: 600px
|
||||||
|
:alt: Mirantis reference HA architecture
|
||||||
|
|
||||||
|
Compute nodes:
|
||||||
|
|
||||||
|
.. image:: https://docs.mirantis.com/openstack/fuel/fuel-8.0/_images/logical-diagram-compute.svg
|
||||||
|
:height: 250px
|
||||||
|
:width: 350px
|
||||||
|
:alt: Mirantis reference HA architecture
|
||||||
|
|
||||||
|
|
||||||
|
Networking
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
All servers have a similar network configuration:
|
||||||
|
|
||||||
|
.. image:: images/Network_Scheme.png
|
||||||
|
:alt: Network Scheme of the environment
|
||||||
|
|
||||||
|
The following example shows a part of a switch configuration for each switch
|
||||||
|
port that is connected to ens1f0 interface of a server:
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
switchport mode trunk
|
||||||
|
switchport trunk native vlan 600
|
||||||
|
switchport trunk allowed vlan 600-602,630-649
|
||||||
|
spanning-tree port type edge trunk
|
||||||
|
spanning-tree bpduguard enable
|
||||||
|
no snmp trap link-status
|
||||||
|
|
||||||
|
|
||||||
|
Factors description
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
- **reboot-random-controller:** consists of a node-crash fault injection on a
|
||||||
|
random OpenStack controller node.
|
||||||
|
|
||||||
|
- **sigkill-random-rabbitmq:** consists of a service-crash fault injection on
|
||||||
|
a random slave RabbitMQ messaging node.
|
||||||
|
|
||||||
|
- **sigkill-random-mysql:** consists of a service-crash fault injection on a
|
||||||
|
random MySQL node.
|
||||||
|
|
||||||
|
- **freeze-random-nova-api:** consists of a service-hang fault injection to
|
||||||
|
all nova-api processes on a random controller node for a 150-second period.
|
||||||
|
|
||||||
|
- **freeze-random-memcached:** consists of a service-hang fault injection to
|
||||||
|
the memcached service on a random controller node for a 150-second period.
|
||||||
|
|
||||||
|
- **freeze-random-keystone:** consists of a service-hang fault injection to
|
||||||
|
the keystone (public and admin endpoints) service on a random controller
|
||||||
|
node for a 150-second period.
|
||||||
|
|
||||||
|
|
||||||
|
Testing process
|
||||||
|
===============
|
||||||
|
|
||||||
|
Use the following VM parameters for testing purposes:
|
||||||
|
|
||||||
|
.. table:: **Test parameters**
|
||||||
|
|
||||||
|
+--------------------------------+--------+
|
||||||
|
|Name |Value |
|
||||||
|
+================================+========+
|
||||||
|
|Flavor to create VM from |m1.tiny |
|
||||||
|
+--------------------------------+--------+
|
||||||
|
|Image name to create VM from |cirros |
|
||||||
|
+--------------------------------+--------+
|
||||||
|
|
||||||
|
#. Create a work directory on a server with Rally role.
|
||||||
|
In this document, this directory is referred to as ``WORK_DIR``. An example
path is ``/data/rally``.
|
||||||
|
|
||||||
|
#. Create a directory ``plugins`` in ``WORK_DIR`` and copy the
|
||||||
|
:download:`scrappy.py <rally_plugins/scrappy.py>` plugin into that directory.
|
||||||
|
|
||||||
|
#. Download the bash framework :download:`scrappy.sh <rally_plugins/scrappy.sh>`
|
||||||
|
and :download:`scrappy.conf <rally_plugins/scrappy.conf>` to
|
||||||
|
``WORK_DIR/plugins``.
|
||||||
|
|
||||||
|
#. Modify the ``scrappy.conf`` file with appropriate values. For example:
|
||||||
|
|
||||||
|
.. literalinclude:: rally_plugins/scrappy.conf
|
||||||
|
:language: bash
|
||||||
|
|
||||||
|
#. Create a ``scenarios`` directory in ``WORK_DIR`` and copy all Rally
|
||||||
|
scenarios with factors that you are planning to test to that directory.
|
||||||
|
For example:
|
||||||
|
:download:`random_controller_reboot_factor.json
|
||||||
|
<rally_scenarios/NovaServers/boot_and_delete_server/random_controller_reboot_factor.json/>`.
|
||||||
|
|
||||||
|
#. Create a ``deployment.json`` file in the ``WORK_DIR`` and fill it with
|
||||||
|
your OpenStack environment info. It should look like this:
|
||||||
|
|
||||||
|
.. code:: json
|
||||||
|
|
||||||
|
{
|
||||||
|
"admin": {
|
||||||
|
"password": "password",
|
||||||
|
"tenant_name": "tenant",
|
||||||
|
"username": "user"
|
||||||
|
},
|
||||||
|
"auth_url": "http://1.2.3.4:5000/v2.0",
|
||||||
|
"region_name": "RegionOne",
|
||||||
|
"type": "ExistingCloud",
|
||||||
|
"endpoint_type": "internal",
|
||||||
|
"admin_port": 35357,
|
||||||
|
"https_insecure": true
|
||||||
|
}
|
||||||
|
|
||||||
|
#. Prepare for tests:
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
: "${WORK_DIR:?WORK_DIR must be set}"  # fail early if WORK_DIR is unset or empty
|
||||||
|
DEPLOYMENT_NAME="$(uuidgen)"
|
||||||
|
DEPLOYMENT_CONFIG="${WORK_DIR}/deployment.json"
|
||||||
|
rally deployment create --filename "${DEPLOYMENT_CONFIG}" --name "${DEPLOYMENT_NAME}"
|
||||||
|
|
||||||
|
#. Create a ``/root/scrappy`` directory on every node in your OpenStack
|
||||||
|
environment and copy :download:`scrappy_host.sh <rally_plugins/scrappy_host.sh>`
|
||||||
|
to that directory.
|
||||||
|
|
||||||
|
#. Perform tests:
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
PLUGIN_PATH="${WORK_DIR}/plugins"
|
||||||
|
SCENARIOS="random_controller_reboot_factor.json"
|
||||||
|
for scenario in ${SCENARIOS}; do
|
||||||
|
rally --plugin-paths "${PLUGIN_PATH}" task start --tag "${scenario}" "${WORK_DIR}/scenarios/${scenario}"
|
||||||
|
done
|
||||||
|
task_list="$(rally task list --uuids-only)"
|
||||||
|
rally task report --tasks ${task_list} --out=${WORK_DIR}/rally_report.html
|
||||||
|
|
||||||
|
Once these steps are done, you get an HTML file with Rally test results.
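
The ``deployment.json`` step above can also be scripted. The following is a minimal sketch in Python that writes the same structure as the example file; the helper name ``write_deployment_config`` is hypothetical, and the fixed values (region, endpoint type, admin port) are simply copied from the example and should be adjusted to your environment.

```python
import json
from pathlib import Path


def write_deployment_config(work_dir: str, auth_url: str, username: str,
                            password: str, tenant_name: str) -> Path:
    """Write a deployment.json file into work_dir and return its path."""
    config = {
        "admin": {
            "password": password,
            "tenant_name": tenant_name,
            "username": username,
        },
        "auth_url": auth_url,
        # The values below mirror the example file; they are assumptions
        # about the target environment, not requirements.
        "region_name": "RegionOne",
        "type": "ExistingCloud",
        "endpoint_type": "internal",
        "admin_port": 35357,
        "https_insecure": True,
    }
    path = Path(work_dir) / "deployment.json"
    path.write_text(json.dumps(config, indent=4) + "\n")
    return path
```

The returned path can then be passed to ``rally deployment create --filename``.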
|
||||||
|
|
||||||
|
|
||||||
|
Test case 1: NovaServers.boot_and_delete_server
|
||||||
|
-----------------------------------------------
|
||||||
|
|
||||||
|
**Description**
|
||||||
|
|
||||||
|
This Rally scenario boots and deletes virtual instances with injected fault
|
||||||
|
factors using OpenStack Nova API.
|
||||||
|
|
||||||
|
**Service-level agreement**
|
||||||
|
|
||||||
|
=================== ========
|
||||||
|
Parameter Value
|
||||||
|
=================== ========
|
||||||
|
MTTR (sec) <=240
|
||||||
|
Failure rate (%) <=95
|
||||||
|
Auto-healing Yes
|
||||||
|
=================== ========
|
||||||
|
|
||||||
|
**Parameters**
|
||||||
|
|
||||||
|
=================== ========
|
||||||
|
Parameter Value
|
||||||
|
=================== ========
|
||||||
|
Runner constant
|
||||||
|
Concurrency 5
|
||||||
|
Times 100
|
||||||
|
Injection-iteration 20
|
||||||
|
Testing-cycles 5
|
||||||
|
=================== ========
|
||||||
|
|
||||||
|
**List of reliability metrics**
|
||||||
|
|
||||||
|
======== ============== ================= =================================================
|
||||||
|
Priority Value Measurement Units Description
|
||||||
|
======== ============== ================= =================================================
|
||||||
|
1 SLA Boolean Service-level agreement result
|
||||||
|
2 Auto-healing Boolean Is cluster auto-healed after fault-injection
|
||||||
|
3 Failure rate Percents Test iteration failure ratio
|
||||||
|
4 MTTR (auto) Seconds Automatic mean time to repair
|
||||||
|
5 MTTR (manual) Seconds Manual mean time to repair, if Auto MTTR is Inf.
|
||||||
|
======== ============== ================= =================================================
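
To illustrate how the service-level agreement and the metrics above could be combined, the following Python sketch checks one testing cycle against the SLA thresholds. The names ``CycleResult`` and ``meets_sla`` are hypothetical; the default thresholds are taken from the SLA table, and representing "Inf." as ``math.inf`` is an assumption of this sketch, not a description of how the results in this document were produced.

```python
import math
from dataclasses import dataclass


@dataclass
class CycleResult:
    mttr_sec: float          # use math.inf when the cluster never recovers
    failure_rate_pct: float  # share of failed test iterations, in percent
    auto_healed: bool        # True if the cluster recovered on its own


def meets_sla(cycle: CycleResult,
              max_mttr_sec: float = 240.0,
              max_failure_rate_pct: float = 95.0) -> bool:
    """Return True if a cycle satisfies all three SLA parameters."""
    return (cycle.auto_healed
            and cycle.mttr_sec <= max_mttr_sec
            and cycle.failure_rate_pct <= max_failure_rate_pct)
```

A cycle without auto-healing fails the check regardless of its failure rate, because its MTTR is treated as infinite.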
|
||||||
|
|
||||||
|
Test case 1 results
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
reboot-random-controller
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
**Rally scenario used during factor testing:**
|
||||||
|
|
||||||
|
.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_reboot_factor.json
|
||||||
|
:language: bash
|
||||||
|
|
||||||
|
**Factor testing results:**
|
||||||
|
|
||||||
|
.. table:: **Full description of cyclic execution results**
|
||||||
|
|
||||||
|
+--------+-----------+-----------------+--------------+-------------------------+
|
||||||
|
| Cycles | MTTR(sec) | Failure rate(%) | Auto-healing | Performance degradation |
|
||||||
|
+--------+-----------+-----------------+--------------+-------------------------+
|
||||||
|
| 1 | 4.31 | 2 | Yes | Yes, up to 148.52 sec. |
|
||||||
|
+--------+-----------+-----------------+--------------+-------------------------+
|
||||||
|
| 2 | 19.88 | 14 | Yes | Yes, up to 150.946 sec. |
|
||||||
|
+--------+-----------+-----------------+--------------+-------------------------+
|
||||||
|
| 3 | 7.31 | 8 | Yes | Yes, up to 124.593 sec. |
|
||||||
|
+--------+-----------+-----------------+--------------+-------------------------+
|
||||||
|
| 4 | 95.07 | 9 | Yes | Yes, up to 240.893 sec. |
|
||||||
|
+--------+-----------+-----------------+--------------+-------------------------+
|
||||||
|
| 5 | Inf. | 80.00 | No | Inf. |
|
||||||
|
+--------+-----------+-----------------+--------------+-------------------------+
|
||||||
|
|
||||||
|
**Rally report:** :download:`reboot_random_controller.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/reboot_random_controller.html>`
|
||||||
|
|
||||||
|
.. table:: **Testing results summary**
|
||||||
|
|
||||||
|
+---------------+-----------------+------------------+
|
||||||
|
| Value | MTTR(sec) | Failure rate |
|
||||||
|
+---------------+-----------------+------------------+
|
||||||
|
| Min | 4.31 | 2 |
|
||||||
|
+---------------+-----------------+------------------+
|
||||||
|
| Max | 95.07 | 80 |
|
||||||
|
+---------------+-----------------+------------------+
|
||||||
|
| SLA | Yes | No |
|
||||||
|
+---------------+-----------------+------------------+
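
The Min and Max rows above appear to leave out the cycle whose MTTR is "Inf.". Assuming that reading, the following Python sketch computes the same kind of summary from per-cycle values; the function name ``summarize`` is hypothetical and the sketch is not the tooling used to build this table.

```python
import math


def summarize(values):
    """Return (min, max) over the finite values, or (None, None) if none."""
    # Cycles recorded as "Inf." (math.inf here) are skipped so that they
    # do not hide the largest measured value.
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return None, None
    return min(finite), max(finite)
```

For example, ``summarize([4.31, 19.88, 7.31, 95.07, math.inf])`` reports the smallest and largest measured MTTR and ignores the infinite entry.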
|
||||||
|
|
||||||
|
**Detailed results description**
|
||||||
|
|
||||||
|
This factor affects OpenStack cluster operation on every run.
Auto-healing works, but may take a long time. In our test results, during the
fifth testing cycle the cluster recovered only after Rally had completed
testing with an error status. Therefore, the performance degradation
is very significant during cluster recovery.
|
||||||
|
|
||||||
|
sigkill-random-rabbitmq
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
**Rally scenario used during factor testing:**
|
||||||
|
|
||||||
|
.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_kill_rabbitmq.json
|
||||||
|
:language: bash
|
||||||
|
|
||||||
|
**Factor testing results:**
|
||||||
|
|
||||||
|
.. table:: **Full description of cyclic execution results**
|
||||||
|
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| Cycles | MTTR(sec) | Failure rate(%) | Auto-healing | Performance degradation |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 1 | 0 | 0 | Yes | Yes, up to 12.266 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 2 | 0 | 0 | Yes | Yes, up to 15.775 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 3 | 98.52 | 1 | Yes | Yes, up to 145.115 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 4 | 0 | 0 | Yes | No |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 5 | 0 | 0 | Yes | Yes, up to 65.926 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
|
||||||
|
**Rally report:** :download:`random_controller_kill_rabbitmq.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_kill_rabbitmq.html>`
|
||||||
|
|
||||||
|
.. table:: **Testing results summary**
|
||||||
|
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| Value | MTTR(sec) | Failure rate |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| Min | 0 | 0 |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| Max | 98.52 | 1 |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| SLA | Yes | Yes |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
|
||||||
|
**Detailed results description**
|
||||||
|
|
||||||
|
This factor may affect OpenStack cluster operation.
|
||||||
|
Auto-healing works fine.
|
||||||
|
Performance degradation is significant during cluster recovery.
|
||||||
|
|
||||||
|
sigkill-random-mysql
|
||||||
|
~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
**Rally scenario used during factor testing:**
|
||||||
|
|
||||||
|
.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_kill_mysqld.json
|
||||||
|
:language: bash
|
||||||
|
|
||||||
|
**Factor testing results:**
|
||||||
|
|
||||||
|
.. table:: **Full description of cyclic execution results**
|
||||||
|
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| Cycles | MTTR(sec) | Failure rate(%) | Auto-healing | Performance degradation |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 1 | 2.31 | 0 | Yes | Yes, up to 12.928 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 2 | 0 | 0 | Yes | Yes, up to 11.156 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 3 | 0 | 1 | Yes | Yes, up to 13.592 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 4 | 0 | 0 | Yes | Yes, up to 11.864 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 5 | 0 | 0 | Yes | Yes, up to 12.715 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
|
||||||
|
**Rally report:** :download:`random_controller_kill_mysqld.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_kill_mysqld.html>`
|
||||||
|
|
||||||
|
.. table:: **Testing results summary**
|
||||||
|
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| Value | MTTR(sec) | Failure rate |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| Min | 0 | 0 |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| Max | 2.31 | 1 |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| SLA | Yes | Yes |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
|
||||||
|
**Detailed results description**
|
||||||
|
|
||||||
|
This factor may affect OpenStack cluster operation.
|
||||||
|
Auto-healing works fine.
|
||||||
|
Performance degradation is not significant.
|
||||||
|
|
||||||
|
|
||||||
|
freeze-random-nova-api
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
**Rally scenario used during factor testing:**
|
||||||
|
|
||||||
|
.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_freeze_nova-api_150_sec.json
|
||||||
|
:language: bash
|
||||||
|
|
||||||
|
**Factor testing results:**
|
||||||
|
|
||||||
|
.. table:: **Full description of cyclic execution results**
|
||||||
|
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| Cycles | MTTR(sec) | Failure rate(%) | Auto-healing | Performance degradation |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 1 | 0 | 0 | Yes | Yes, up to 156.935 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 2 | 0 | 0 | Yes | Yes, up to 155.085 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 3 | 0 | 0 | Yes | Yes, up to 156.93 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 4 | 0 | 0 | Yes | Yes, up to 156.782 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
| 5 | 150.55 | 1 | Yes | Yes, up to 154.741 sec. |
|
||||||
|
+--------------------+----------------+---------------------+------------------+-----------------------------+
|
||||||
|
|
||||||
|
**Rally report:** :download:`random_controller_freeze_nova_api_150_sec.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_freeze_nova_api_150_sec.html>`
|
||||||
|
|
||||||
|
.. table:: **Testing results summary**
|
||||||
|
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| Value | MTTR(sec) | Failure rate |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| Min | 0 | 0 |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| Max | 150.55 | 1 |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
| SLA | Yes | Yes |
|
||||||
|
+--------+-----------+--------------+
|
||||||
|
|
||||||
|
**Detailed results description**
|
||||||
|
|
||||||
|
This factor affects OpenStack cluster operation.
|
||||||
|
Auto-healing does not work. Cluster operation was recovered
|
||||||
|
only after sending the SIGCONT POSIX signal to all frozen nova-api
|
||||||
|
processes. Performance degradation is determined by the factor duration time.
|
||||||
|
This behaviour is not normal for an HA OpenStack configuration
|
||||||
|
and should be investigated.
|
||||||
|
|
||||||
|
|
||||||
|
freeze-random-memcached
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
**Rally scenario used during factor testing:**
|
||||||
|
|
||||||
|
.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_freeze_memcached_150_sec.json
|
||||||
|
:language: bash
|
||||||
|
|
||||||
|
**Factor testing results:**
|
||||||
|
|
||||||
|
.. table:: **Full description of cyclic execution results**
|
||||||
|
|
||||||
|
  +--------------------+----------------+---------------------+------------------+-----------------------------+
  | Cycles             | MTTR(sec)      | Failure rate(%)     | Auto-healing     | Performance degradation     |
  +--------------------+----------------+---------------------+------------------+-----------------------------+
  | 1                  | 0              | 0                   | Yes              | Yes, up to 26.679 sec.      |
  +--------------------+----------------+---------------------+------------------+-----------------------------+
  | 2                  | 0              | 0                   | Yes              | Yes, up to 23.726 sec.      |
  +--------------------+----------------+---------------------+------------------+-----------------------------+
  | 3                  | 0              | 0                   | Yes              | Yes, up to 21.893 sec.      |
  +--------------------+----------------+---------------------+------------------+-----------------------------+
  | 4                  | 0              | 0                   | Yes              | Yes, up to 22.796 sec.      |
  +--------------------+----------------+---------------------+------------------+-----------------------------+
  | 5                  | 0              | 0                   | Yes              | Yes, up to 27.737 sec.      |
  +--------------------+----------------+---------------------+------------------+-----------------------------+

**Rally report:** :download:`random_controller_freeze_memcached_150_sec.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_freeze_memcached_150_sec.html>`

.. table:: **Testing results summary**

  +--------+-----------+--------------+
  | Value  | MTTR(sec) | Failure rate |
  +--------+-----------+--------------+
  | Min    | 0         | 0            |
  +--------+-----------+--------------+
  | Max    | 0         | 0            |
  +--------+-----------+--------------+
  | SLA    | Yes       | Yes          |
  +--------+-----------+--------------+

**Detailed results description**

This factor does not affect OpenStack cluster operation.
During the factor testing, only a small performance degradation was observed.

freeze-random-keystone
~~~~~~~~~~~~~~~~~~~~~~

**Rally scenario used during factor testing:**

.. literalinclude:: rally_scenarios/NovaServers/boot_and_delete_server/random_controller_freeze_keystone_150_sec.json
  :language: bash

**Factor testing results:**

.. table:: **Full description of cyclic execution results**

  +--------+-------------+-----------------+--------------+-------------------------+
  | Cycles | MTTR(sec)   | Failure rate(%) | Auto-healing | Performance degradation |
  +--------+-------------+-----------------+--------------+-------------------------+
  | 1      | 97.19       | 7               | Yes          | No                      |
  +--------+-------------+-----------------+--------------+-------------------------+
  | 2      | 93.87       | 6               | Yes          | No                      |
  +--------+-------------+-----------------+--------------+-------------------------+
  | 3      | 92.12       | 8               | Yes          | No                      |
  +--------+-------------+-----------------+--------------+-------------------------+
  | 4      | 94.51       | 6               | Yes          | No                      |
  +--------+-------------+-----------------+--------------+-------------------------+
  | 5      | 98.37       | 7               | Yes          | No                      |
  +--------+-------------+-----------------+--------------+-------------------------+

**Rally report:** :download:`random_controller_freeze_keystone_150_sec.html <../../../../raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_freeze_keystone_150_sec.html>`

.. table:: **Testing results summary**

  +--------+-----------+--------------+
  | Value  | MTTR(sec) | Failure rate |
  +--------+-----------+--------------+
  | Min    | 92.12     | 6            |
  +--------+-----------+--------------+
  | Max    | 98.37     | 8            |
  +--------+-----------+--------------+
  | SLA    | Yes       | No           |
  +--------+-----------+--------------+

**Detailed results description**

This factor affects OpenStack cluster operation.
After the keystone processes freeze on controllers, the HA
logic needs approximately 95 seconds to recover service operation.
After recovery, no performance degradation is observed, but this
was verified only at low concurrency. This behaviour is not normal
for an HA OpenStack configuration and should be investigated in the future.

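The summary rows can be reproduced from the per-cycle values. The sketch below is illustrative only: the ``cycles`` list is copied from the keystone table above, the 5 % threshold is the one hard-coded in the scrappy SLA plugin (``error_rate <= 5``), and the helper name ``summarize`` as well as the rule "the SLA holds only if every cycle passes" are assumptions, not part of the test tooling.

```python
# Per-cycle (MTTR in seconds, failure rate in %) from the keystone table.
cycles = [(97.19, 7), (93.87, 6), (92.12, 8), (94.51, 6), (98.37, 7)]

# Threshold used by the scrappy SLA plugin: failure rate of at most 5 %.
FAILURE_RATE_SLA = 5


def summarize(results):
    """Reduce per-cycle results to the Min/Max/SLA summary values."""
    mttrs = [mttr for mttr, _ in results]
    rates = [rate for _, rate in results]
    return {
        "mttr_min": min(mttrs),
        "mttr_max": max(mttrs),
        "mttr_mean": round(sum(mttrs) / len(mttrs), 2),
        "rate_min": min(rates),
        "rate_max": max(rates),
        # Assumption: the SLA holds only if every cycle is within the limit.
        "rate_sla": all(rate <= FAILURE_RATE_SLA for rate in rates),
    }


summary = summarize(cycles)
print(summary)
```

The mean MTTR of the five cycles is where the "approximately 95 seconds" figure in the description comes from.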
.. references:
.. _Rally installation documentation: https://rally.readthedocs.io/en/latest/install.html
@ -0,0 +1,16 @@
# SSH credentials
SSH_LOGIN="root"
SSH_PASS="r00tme"

# Controller nodes
CONTROLLERS[0]="10.44.0.7"
CONTROLLERS[1]="10.44.0.6"
CONTROLLERS[2]="10.44.0.5"

# Compute nodes
COMPUTES[0]="10.44.0.3"
COMPUTES[1]="10.44.0.4"
COMPUTES[2]="10.44.0.8"

# Scrappy base path
SCRAPPY_BASE="/root/scrappy"
115
doc/source/test_results/reliability/rally_plugins/scrappy.py
Normal file
@ -0,0 +1,115 @@
# Copyright 2014: Mirantis Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.


"""
Rally scrappy plugin.

This plugin was designed for OpenStack
reliability testing.
"""

import os

from rally.common.i18n import _
from rally.common import logging
from rally.common import streaming_algorithms as streaming
from rally import consts
from rally.task import sla


LOG = logging.getLogger(__name__)

class MttrCalculation(object):
    """Track the recovery time (MTTR) spanned by failed iterations."""

    def __init__(self):
        self.min_timestamp = streaming.MinComputation()
        self.max_timestamp = streaming.MaxComputation()
        self.mttr = 0
        self.last_error_duration = 0
        self.last_iteration = None

    def add(self, iteration):
        if iteration["error"]:
            # Store the duration of the last failed iteration
            if self.max_timestamp.result() < iteration["timestamp"]:
                self.last_error_duration = iteration["duration"]

            self.min_timestamp.add(iteration["timestamp"])
            self.max_timestamp.add(iteration["timestamp"])
            LOG.info("TIMESTAMP: %s" % iteration["timestamp"])

        self.last_iteration = iteration

    def result(self):
        # MTTR spans from the first failed iteration to the end of the
        # last failed iteration.
        self.mttr = round(self.max_timestamp.result() -
                          self.min_timestamp.result() +
                          self.last_error_duration, 2)
        # The SLA context has no information about the iteration count,
        # so assume that if the last iteration completed with an error,
        # the cluster was not auto-healed.
        if self.last_iteration["error"]:
            self.mttr = "Inf."
        return self.mttr


@sla.configure(name="scrappy")
class Scrappy(sla.SLA):
    """Scrappy events."""
    CONFIG_SCHEMA = {
        "type": "object",
        "$schema": consts.JSON_SCHEMA,
        "properties": {
            "on_iter": {"type": "number"},
            "execute": {"type": "string"},
            "cycle": {"type": "number"}
        }
    }

    def __init__(self, criterion_value):
        super(Scrappy, self).__init__(criterion_value)
        self.on_iter = self.criterion_value.get("on_iter", None)
        self.execute = self.criterion_value.get("execute", None)
        self.cycle = self.criterion_value.get("cycle", 0)
        self.errors = 0
        self.total = 0
        self.error_rate = 0.0
        self.mttr = MttrCalculation()

    def add_iteration(self, iteration):
        self.total += 1
        if iteration["error"]:
            self.errors += 1

        self.mttr.add(iteration)

        # Start iteration event: run the fault-injection command once the
        # configured iteration number is reached.
        if self.on_iter == self.total:
            LOG.info("Scrappy testing cycle: ITER: %s" % self.cycle)
            LOG.info("Scrappy executing: %s" % self.execute)
            os.system(self.execute)

        # The SLA passes while at most 5% of iterations have failed.
        self.error_rate = self.errors * 100.0 / self.total
        self.success = self.error_rate <= 5
        return self.success

    def merge(self, other):
        self.total += other.total
        self.errors += other.errors
        if self.total:
            self.error_rate = self.errors * 100.0 / self.total
        self.success = self.error_rate <= 5
        return self.success

    def details(self):
        return (_("Scrappy failure rate %.2f%% MTTR %s seconds - %s") %
                (self.error_rate, self.mttr.result(), self.status()))
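The MTTR arithmetic in `MttrCalculation` can be tried without a Rally installation. The following is a minimal sketch, not the plugin itself: the function name `mttr_from_iterations` and the sample data are hypothetical, list order stands in for arrival order, and returning `0` when no iteration failed is an assumption chosen to match the zero MTTR values reported in the result tables.

```python
def mttr_from_iterations(iterations):
    """Estimate recovery time from a list of iteration dicts.

    Each dict needs "timestamp", "duration" and "error" keys, mirroring
    the fields the plugin reads from Rally iteration results.
    """
    failed = [it for it in iterations if it["error"]]
    if not failed:
        # Assumption: no failures means there was nothing to recover from.
        return 0
    if iterations[-1]["error"]:
        # The run ended while still failing, so recovery was never observed.
        return "Inf."
    first_ts = min(it["timestamp"] for it in failed)
    last = max(failed, key=lambda it: it["timestamp"])
    # From the first failure to the end of the last failed iteration.
    return round(last["timestamp"] - first_ts + last["duration"], 2)


sample = [
    {"timestamp": 100.0, "duration": 2.0, "error": False},
    {"timestamp": 110.0, "duration": 5.0, "error": True},
    {"timestamp": 150.0, "duration": 4.5, "error": True},
    {"timestamp": 160.0, "duration": 2.0, "error": False},
]
print(mttr_from_iterations(sample))
```

With the sample data the failures span 110.0 to 150.0 and the last failed iteration took 4.5 seconds, so the estimate is 44.5 seconds.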
116
doc/source/test_results/reliability/rally_plugins/scrappy.sh
Executable file
@ -0,0 +1,116 @@
# Copyright 2014: Mirantis Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

#!/bin/bash -xe

# Source credentials
if [ -f /data/rally/rally_plugins/scrappy/scrappy.conf ];
then
    . /data/rally/rally_plugins/scrappy/scrappy.conf
else
    echo "scrappy.conf not found" >&2
    exit 1
fi

#
# Execute a command over ssh.
# Login & password are stored in scrappy.conf
#
function ssh_exec() {
    local ssh_node=$1
    local ssh_cmd=$2
    local ssh_options='-oConnectTimeout=5 -oStrictHostKeyChecking=no -oCheckHostIP=no -oUserKnownHostsFile=/dev/null -oRSAAuthentication=no'
    echo "sshpass -p ${SSH_PASS} ssh ${ssh_options} ${SSH_LOGIN}@${ssh_node} ${ssh_cmd}"
    local ssh_result=$(sshpass -p ${SSH_PASS} ssh ${ssh_options} ${SSH_LOGIN}@${ssh_node} ${ssh_cmd})
    echo "$ssh_result"
}

#
# Return a random controller node from the Fuel cluster
#
function get_random_controller() {
    local random_controller=${CONTROLLERS[$RANDOM % ${#CONTROLLERS[@]}]}
    echo "$random_controller"
}

#
# Return a random compute node from the Fuel cluster
#
function get_random_compute() {
    local random_compute=${COMPUTES[$RANDOM % ${#COMPUTES[@]}]}
    echo "$random_compute"
}

#
# Factors
#
function random_controller_kill_rabbitmq() {
    local action=$1  # currently unused
    local controller_node=$(get_random_controller)
    local result=$(ssh_exec ${controller_node} "${SCRAPPY_BASE}/scrappy_host.sh send_signal rabbitmq_server -KILL")
    echo "$result"
}

function random_controller_freeze_process_random_interval() {
    local process_name=$1
    local interval=$2
    local controller_node=$(get_random_controller)
    local result=$(ssh_exec ${controller_node} "${SCRAPPY_BASE}/scrappy_host.sh freeze_process_random_interval ${process_name} ${interval}")
    echo "$result"
}

function random_controller_freeze_process_fixed_interval() {
    local process_name=$1
    local interval=$2
    local controller_node=$(get_random_controller)
    local result=$(ssh_exec ${controller_node} "${SCRAPPY_BASE}/scrappy_host.sh freeze_process_fixed_interval ${process_name} ${interval}")
    echo "$result"
}

function random_controller_reboot() {
    local controller_node=$(get_random_controller)
    local result=$(ssh_exec ${controller_node} "${SCRAPPY_BASE}/scrappy_host.sh reboot_node")
    echo "$result"
}

function usage() {
    echo "usage"
    echo "TODO"
}

#
# Main
#
function main() {
    local factor=$1
    case ${factor} in
        random_controller_kill_rabbitmq)
            random_controller_kill_rabbitmq $2
            ;;
        random_controller_freeze_process_random_interval)
            random_controller_freeze_process_random_interval $2 $3
            ;;
        random_controller_freeze_process_fixed_interval)
            random_controller_freeze_process_fixed_interval $2 $3
            ;;
        random_controller_reboot)
            random_controller_reboot
            ;;
        *)
            usage
            ;;
    esac
}

main "$@"
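For readers who want to see what `get_random_controller` and `ssh_exec` assemble, here is a rough Python equivalent. It only builds the command line and does not open a connection. The function names `pick_controller` and `build_ssh_command` are hypothetical; the addresses, credentials, and ssh options are the ones shown in `scrappy.conf` and `scrappy.sh`.

```python
import random
import shlex

# Values copied from scrappy.conf and scrappy.sh.
CONTROLLERS = ["10.44.0.7", "10.44.0.6", "10.44.0.5"]
SSH_OPTIONS = [
    "-oConnectTimeout=5",
    "-oStrictHostKeyChecking=no",
    "-oCheckHostIP=no",
    "-oUserKnownHostsFile=/dev/null",
    "-oRSAAuthentication=no",
]


def pick_controller(rng=random):
    """Pick one controller at random, like get_random_controller."""
    return rng.choice(CONTROLLERS)


def build_ssh_command(node, remote_cmd, login="root", password="r00tme"):
    """Return the sshpass/ssh argument list that ssh_exec would run."""
    return (["sshpass", "-p", password, "ssh"] + SSH_OPTIONS +
            ["%s@%s" % (login, node), remote_cmd])


node = pick_controller()
command = build_ssh_command(
    node, "/root/scrappy/scrappy_host.sh freeze_process_fixed_interval keystone 150")
# Print a shell-safe rendering of the argument list.
print(" ".join(shlex.quote(part) for part in command))
```

Keeping the command as an argument list rather than one string means the remote command stays a single argument regardless of the spaces it contains.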
132
doc/source/test_results/reliability/rally_plugins/scrappy_host.sh
Executable file
@ -0,0 +1,132 @@
# Copyright 2014: Mirantis Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

#!/bin/bash -xe

LOG_FILE="/var/log/scrappy.log"

#
# Logging function
#
function log() {
    echo "$(date -u) scrappy_host: $1" >> ${LOG_FILE}
}

#
# Send the specified signal
# to all processes with the given name
#
function send_signal() {
    local process_name=$1
    local signal=$2
    local pids=$(ps -ef | grep $process_name | grep -v grep | grep -v scrappy_host | awk '{print $2}')
    for each_pid in ${pids};
    do
        log "sending signal: ${signal} to ${process_name} with pid:$each_pid"
        kill ${signal} ${each_pid}
    done
}

#
# Control a service (start, stop, restart, ...)
#
function service_control() {
    local service_name=$1
    local service_action=$2
    log "service control: $service_name action: $service_action"
    service $service_name $service_action
}

#
# Reboot the node
#
function reboot_node() {
    log "reboot"
    shutdown -r now
}

#
# This factor freezes the specified process for a random interval
# between 1 and max_interval seconds
#
function freeze_process_random_interval {
    local process_name=$1
    local max_interval=$2
    local interval=$(( ($RANDOM % ${max_interval}) + 1))
    log "freeze_process_random_interval: freezing process ${process_name} freeze interval ${interval}"
    send_signal ${process_name} '-STOP'
    sleep ${interval}
    log "freeze_process_random_interval: unfreezing process ${process_name}"
    send_signal ${process_name} '-CONT'
}

#
# This factor freezes the specified process for a fixed interval
#
function freeze_process_fixed_interval {
    local process_name=$1
    local interval=$2
    log "freeze_process_fixed_interval: freezing process ${process_name} freeze interval ${interval}"
    send_signal ${process_name} '-STOP'
    sleep ${interval}
    log "freeze_process_fixed_interval: unfreezing process ${process_name}"
    send_signal ${process_name} '-CONT'
}

#
# Show usage
#
function usage() {
    echo "scrappy_host usage:"
    echo "scrappy_host commands:"
    echo -e "\t send_signal process_name signal"
    echo -e "\t service_control service_name action"
    echo -e "\t freeze_process_random_interval process max_interval"
    echo -e "\t freeze_process_fixed_interval process interval"
    echo -e "\t reboot_node"
}

#
# main
#
function main() {
    local command=$1
    case $command in
        send_signal)
            send_signal $2 $3
            ;;
        service_control)
            service_control $2 $3
            ;;
        reboot_node)
            reboot_node
            ;;
        freeze_process_random_interval)
            set +xe
            freeze_process_random_interval $2 $3 &
            set -xe
            ;;
        freeze_process_fixed_interval)
            set +xe
            freeze_process_fixed_interval $2 $3 &
            set -xe
            ;;
        *)
            usage
            exit 1
            ;;
    esac
}

main "$@"
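The freeze factors rely on the SIGSTOP/SIGCONT pair: a stopped process keeps its PID and sockets but does no work until it is continued. A minimal sketch of the same idea in Python, run against a throwaway child process rather than a real service, could look as follows. The helper name `freeze_for` is hypothetical and the example assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import time


def freeze_for(pid, interval):
    """Stop a process, wait `interval` seconds, then resume it."""
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(interval)
    finally:
        # Always resume, even if the sleep is interrupted.
        os.kill(pid, signal.SIGCONT)


# Throwaway child standing in for a service process.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"])
freeze_for(child.pid, 0.2)

# The child survived the freeze and is running again.
still_running = child.poll() is None

# Clean up the demo process.
child.terminate()
exit_code = child.wait()
print(still_running, exit_code)
```

Resuming in a `finally` clause is a deliberate choice in this sketch: a process left stopped stays frozen until something sends SIGCONT, which is the situation described for the nova-api factor.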
37
doc/source/test_results/reliability/rally_scenarios/NovaServers/boot_and_delete_server/random_controller_freeze_keystone_150_sec.json
Normal file
@ -0,0 +1,37 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range (0, 5, 1) %}
        {
            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh random_controller_freeze_process_fixed_interval keystone 150",
                    "cycle": {{i}}
                },
            }
        },
        {% endfor %}
    ]
}
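The `{% for %}` loop in the template expands into five workloads that differ only in their `cycle` value. As an illustration of the rendered structure, the sketch below builds the same task as a plain Python dictionary using only the standard library. The function name `build_task` is hypothetical, and the defaults are the template's fallback values; this does not reproduce how Rally itself renders templates.

```python
import json

# Fault-injection command taken from the keystone scenario above.
EXECUTE = ("/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh "
           "random_controller_freeze_process_fixed_interval keystone 150")


def build_task(cycles=5, flavor="m1.tiny", image="^(cirros.*uec|TestVM)$"):
    """Build the expanded task: one workload per testing cycle."""
    workloads = []
    for cycle in range(cycles):
        workloads.append({
            "args": {
                "flavor": {"name": flavor},
                "image": {"name": image},
                "force_delete": False,
            },
            "runner": {"type": "constant", "times": 100, "concurrency": 5},
            "context": {"users": {"tenants": 1, "users_per_tenant": 1}},
            # The fault is injected on iteration 20 of each cycle.
            "sla": {"scrappy": {"on_iter": 20, "execute": EXECUTE,
                                "cycle": cycle}},
        })
    return {"NovaServers.boot_and_delete_server": workloads}


task = build_task()
cycle_numbers = [w["sla"]["scrappy"]["cycle"]
                 for w in task["NovaServers.boot_and_delete_server"]]
print(len(cycle_numbers), cycle_numbers)
print(json.dumps(task["NovaServers.boot_and_delete_server"][0]["sla"], indent=2))
```

Each workload runs 100 iterations, so a failure rate of 7 % in the results tables corresponds to 7 failed iterations in that cycle.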
37
doc/source/test_results/reliability/rally_scenarios/NovaServers/boot_and_delete_server/random_controller_freeze_memcached_150_sec.json
Normal file
@ -0,0 +1,37 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range (0, 5, 1) %}
        {
            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh random_controller_freeze_process_fixed_interval memcached 150",
                    "cycle": {{i}}
                },
            }
        },
        {% endfor %}
    ]
}
37
doc/source/test_results/reliability/rally_scenarios/NovaServers/boot_and_delete_server/random_controller_freeze_nova-api_150_sec.json
Normal file
@ -0,0 +1,37 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range (0, 5, 1) %}
        {
            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 15
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh random_controller_freeze_process_fixed_interval nova-api 150",
                    "cycle": {{i}}
                },
            }
        },
        {% endfor %}
    ]
}
38
doc/source/test_results/reliability/rally_scenarios/NovaServers/boot_and_delete_server/random_controller_kill_mysqld.json
Normal file
@ -0,0 +1,38 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range (0, 5, 1) %}
        {

            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh send_signal /usr/sbin/mysqld -KILL",
                    "cycle": {{i}}
                }
            }
        },
        {% endfor %}
    ]
}
38
doc/source/test_results/reliability/rally_scenarios/NovaServers/boot_and_delete_server/random_controller_kill_rabbitmq.json
Normal file
@ -0,0 +1,38 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range (0, 5, 1) %}
        {

            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh random_controller_kill_rabbitmq",
                    "cycle": {{i}}
                }
            }
        },
        {% endfor %}
    ]
}
38
doc/source/test_results/reliability/rally_scenarios/NovaServers/boot_and_delete_server/random_controller_reboot_factor.json
Normal file
@ -0,0 +1,38 @@
{% set flavor_name = flavor_name or "m1.tiny" %}
{% set image_name = image_name or "^(cirros.*uec|TestVM)$" %}
{
    "NovaServers.boot_and_delete_server": [
        {% for i in range (0, 5, 1) %}
        {

            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "image": {
                    "name": "{{image_name}}"
                },
                "force_delete": false
            },
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                }
            },
            "sla": {
                "scrappy": {
                    "on_iter": 20,
                    "execute": "/bin/bash /data/rally/rally_plugins/scrappy/scrappy.sh random_controller_reboot",
                    "cycle": {{i}}
                }
            }
        },
        {% endfor %}
    ]
}
856
raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_freeze_keystone_150_sec.html
Normal file
File diff suppressed because one or more lines are too long
856
raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_freeze_memcached_150_sec.html
Normal file
File diff suppressed because one or more lines are too long
856
raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_freeze_nova_api_150_sec.html
Normal file
File diff suppressed because one or more lines are too long
856
raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_kill_mysqld.html
Normal file
File diff suppressed because one or more lines are too long
856
raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/random_controller_kill_rabbitmq.html
Normal file
File diff suppressed because one or more lines are too long
856
raw_results/reliability/rally_results/NovaServers/boot_and_delete_server/reboot_random_controller.html
Normal file
File diff suppressed because one or more lines are too long