Test plan for provisioning systems

Add a test plan which describes how to measure the performance
of provisioning systems.

* Add table titles
* Delete configuration management template
* Delete author info
* Fix abstract and conventions sections
* Improve the layout of the hardware info table
* Fix the reference to the script
Change-Id: I8cb3524dbd12bcd67502e3c6fd9c003b495f18bb
Leontii Istomin 2015-12-08 10:57:49 +00:00
parent 8d0ce71e72
commit a101c13368
4 changed files with 432 additions and 1 deletions

@@ -11,4 +11,3 @@ Performance Documentation
.. raw:: pdf

   PageBreak oneColumn

@@ -10,3 +10,4 @@ Test Plans
   :maxdepth: 2

   mq/index
   provisioning/main

@@ -0,0 +1,345 @@
.. _Measuring_performance_of_provisioning_systems:

=============================================
Measuring performance of provisioning systems
=============================================

:status: draft (in progress)
:version: 0

:Abstract:

  This document describes a test plan for quantifying the performance of
  provisioning systems as a function of the number of nodes to be provisioned.
  The plan includes the collection of several resource utilization metrics,
  which will be used to analyze and understand the overall performance of each
  system. Any resource bottlenecks that are identified will either be fixed,
  or best practices will be developed for system configuration and hardware
  requirements.

:Conventions:

  - **Provisioning:** the entire process of installing and configuring an
    operating system.

  - **Provisioning system:** a service or a set of services which enables the
    installation of an operating system and performs basic operations such as
    configuring network interfaces and partitioning disks. A preliminary
    `list of provisioning systems`_ can be found below in `Applications`_.
    A provisioning system can also include configuration management systems
    such as Puppet or Chef, but this feature is not considered in this
    document. The test plan for configuration management systems is described
    in the "Measuring_performance_of_configuration_management_systems"
    document.

  - **Performance of a provisioning system:** a set of metrics which
    describes how many nodes can be provisioned at the same time and the
    hardware resources required to do so.

  - **Nodes:** the servers to be provisioned.

List of performance metrics
---------------------------

The table below shows the list of test metrics to be collected. The priority
is the relative ranking of the importance of each metric in evaluating the
performance of the system.

.. table:: List of performance metrics

   +--------+------------------------+------------------------------------------+
   |Priority| Parameter              | Description                              |
   +========+========================+==========================================+
   |        |                        | | The elapsed time to provision all      |
   |   1    |PROVISIONING_TIME(NODES)| | nodes, as a function of the number of  |
   |        |                        | | nodes.                                 |
   +--------+------------------------+------------------------------------------+
   |        |                        | | Incoming network bandwidth usage as a  |
   |   2    | INGRESS_NET(NODES)     | | function of the number of nodes.       |
   |        |                        | | Average during provisioning on the host|
   |        |                        | | where the provisioning system is       |
   |        |                        | | installed.                             |
   +--------+------------------------+------------------------------------------+
   |        |                        | | Outgoing network bandwidth usage as a  |
   |   2    | EGRESS_NET(NODES)      | | function of the number of nodes.       |
   |        |                        | | Average during provisioning on the host|
   |        |                        | | where the provisioning system is       |
   |        |                        | | installed.                             |
   +--------+------------------------+------------------------------------------+
   |        |                        | | CPU utilization as a function of the   |
   |   3    | CPU(NODES)             | | number of nodes. Average during        |
   |        |                        | | provisioning on the host where the     |
   |        |                        | | provisioning system is installed.      |
   +--------+------------------------+------------------------------------------+
   |        |                        | | Active memory usage as a function of   |
   |   3    | RAM(NODES)             | | the number of nodes. Average during    |
   |        |                        | | provisioning on the host where the     |
   |        |                        | | provisioning system is installed.      |
   +--------+------------------------+------------------------------------------+
   |        |                        | | Storage write IO bandwidth as a        |
   |   3    | WRITE_IO(NODES)        | | function of the number of nodes.       |
   |        |                        | | Average during provisioning on the host|
   |        |                        | | where the provisioning system is       |
   |        |                        | | installed.                             |
   +--------+------------------------+------------------------------------------+
   |        |                        | | Storage read IO bandwidth as a         |
   |   3    | READ_IO(NODES)         | | function of the number of nodes.       |
   |        |                        | | Average during provisioning on the host|
   |        |                        | | where the provisioning system is       |
   |        |                        | | installed.                             |
   +--------+------------------------+------------------------------------------+
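
Most resource metrics in the table are defined as averages over the provisioning window. As a minimal sketch, assuming the per-second samples have already been reduced to the CSV layout produced by the post-processing step of the measurement script (header ``time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all``), such an average can be computed with awk. The function name ``column_mean`` is hypothetical, and mapping INGRESS_NET to ``net_recv`` and EGRESS_NET to ``net_send`` is an assumption.

```bash
#!/usr/bin/env bash
# column_mean FILE NAME
# Hypothetical helper: print the arithmetic mean of the CSV column whose
# header is NAME. The first line of FILE is assumed to be the header row.
column_mean() {
    awk -F ',' -v name="$2" '
        # Locate the requested column in the header row.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        # Accumulate every non-empty data row.
        col && NF { sum += $col; n++ }
        # Fail if the column is missing or there are no samples.
        END { if (!col || n == 0) exit 1; printf "%.2f\n", sum / n }
    ' "$1"
}
```

For example, ``column_mean /var/log/10_nodes.csv cpu_usage`` would print the average CPU utilization (in percent) of a 10-node run.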

Test Plan
---------

The above performance metrics will be measured for various numbers of
provisioned nodes. The result will be a table that shows the dependence of
these metrics on the number of nodes.

Environment description
^^^^^^^^^^^^^^^^^^^^^^^

Test results MUST include a description of the environment used. The following
items should be included:

- **Hardware configuration of each server.** If virtual machines are used,
  then both physical and virtual hardware should be fully documented.
  An example format is given below:

  .. table:: Description of server hardware

     +-------+----------------+-------+-------+
     |server |name            |       |       |
     |       +----------------+-------+-------+
     |       |role            |       |       |
     |       +----------------+-------+-------+
     |       |vendor,model    |       |       |
     |       +----------------+-------+-------+
     |       |operating_system|       |       |
     +-------+----------------+-------+-------+
     |CPU    |vendor,model    |       |       |
     |       +----------------+-------+-------+
     |       |processor_count |       |       |
     |       +----------------+-------+-------+
     |       |core_count      |       |       |
     |       +----------------+-------+-------+
     |       |frequency_MHz   |       |       |
     +-------+----------------+-------+-------+
     |RAM    |vendor,model    |       |       |
     |       +----------------+-------+-------+
     |       |amount_MB       |       |       |
     +-------+----------------+-------+-------+
     |NETWORK|interface_name  |       |       |
     |       +----------------+-------+-------+
     |       |vendor,model    |       |       |
     |       +----------------+-------+-------+
     |       |bandwidth       |       |       |
     +-------+----------------+-------+-------+
     |STORAGE|dev_name        |       |       |
     |       +----------------+-------+-------+
     |       |vendor,model    |       |       |
     |       +----------------+-------+-------+
     |       |SSD/HDD         |       |       |
     |       +----------------+-------+-------+
     |       |size            |       |       |
     +-------+----------------+-------+-------+

- **Configuration of hardware network switches.** The configuration file from
  the switch can be downloaded and attached.

- **Configuration of virtual machines and virtual networks (if they are used).**
  The configuration files can be attached, along with the mapping of virtual
  machines to host machines.

- **Network scheme.** The plan should show how all hardware is connected and
  how the components communicate. All Ethernet/Fibre Channel and VLAN channels
  should be included. Each interface of every hardware component should be
  matched with the corresponding L2 channel and IP address.

- **Software configuration of the provisioning system.** ``sysctl.conf`` and
  any other kernel file that is changed from the default should be attached.
  The list of installed packages should be attached. The specifications of the
  operating system, the network interface configuration, and the disk
  partitioning configuration should be included. If a distributed
  provisioning system is to be tested, then the parts that are distributed
  need to be described.

- **Desired software configuration of the provisioned nodes.** The operating
  system, disk partitioning scheme, network interface configuration,
  installed packages, and other components of the nodes affect the amount of
  work to be performed by the provisioning system and thus its performance.
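
Part of the hardware description can be gathered automatically. The following Linux-only sketch reads ``/proc`` and ``/sys`` and prints ``key=value`` lines named after rows of the hardware table. The function name and output format are assumptions; ``core_count`` is the per-processor "cpu cores" value reported by the kernel, the frequency is the current rather than the nominal one, and virtual disks may report themselves as rotational. Vendor and model of RAM, NICs and disks are not covered.

```bash
#!/usr/bin/env bash
# hardware_summary
# Hypothetical helper: print some rows of the "Description of server hardware"
# table as key=value lines, using only /proc and /sys (Linux only).
hardware_summary() {
    local cpuinfo=/proc/cpuinfo iface dev kind
    echo "name=$(uname -n)"
    echo "operating_system=$(uname -sr)"
    # CPU vendor and model string of the first logical CPU.
    echo "cpu_vendor_model=$(awk -F ': ' '/^model name/ {print $2; exit}' "$cpuinfo")"
    # Number of distinct "physical id" values, i.e. CPU packages (sockets).
    echo "processor_count=$(awk -F ': ' '/^physical id/ && !($2 in seen) {seen[$2]; n++} END {print n + 0}' "$cpuinfo")"
    # Cores per package, as reported by the kernel.
    echo "core_count=$(awk -F ': ' '/^cpu cores/ {print $2; exit}' "$cpuinfo")"
    # Current (not nominal) frequency of the first logical CPU.
    echo "frequency_MHz=$(awk -F ': ' '/^cpu MHz/ {print $2; exit}' "$cpuinfo")"
    # MemTotal is reported in kB.
    echo "amount_MB=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)"
    for iface in /sys/class/net/*; do
        [ -e "$iface" ] || continue
        # Link speed in Mbit/s; unreadable for some virtual or down interfaces.
        echo "interface_name=${iface##*/} bandwidth_Mbps=$(cat "$iface/speed" 2> /dev/null)"
    done
    for dev in /sys/block/*; do
        [ -e "$dev/size" ] || continue
        # rotational is 1 for spinning disks and 0 for SSDs.
        kind=SSD
        [ "$(cat "$dev/queue/rotational" 2> /dev/null)" = 1 ] && kind=HDD
        # size is counted in 512-byte sectors; print it in decimal gigabytes.
        echo "dev_name=${dev##*/} type=${kind} size_GB=$(( $(cat "$dev/size") * 512 / 1000000000 ))"
    done
}
```

Running ``hardware_summary`` on each server and attaching its output gives a starting point for filling in the table by hand.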

Preparation
^^^^^^^^^^^

1.
   The following package needs to be installed on the provisioning system
   servers to collect performance metrics.

   .. table:: Software to be installed

      +--------------+---------+-----------------------------------+
      | package name | version | source                            |
      +==============+=========+===================================+
      | `dstat`_     | 0.7.2   | Ubuntu trusty universe repository |
      +--------------+---------+-----------------------------------+
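
The awk post-processing in the measurement script selects dstat CSV columns by position, and that layout may differ between dstat releases, so it can be worth checking that each provisioning server runs the version listed above. A minimal sketch, assuming ``dstat --version`` prints ``Dstat <version>`` on its first line; the function and variable names are hypothetical:

```bash
#!/usr/bin/env bash
# Version the test plan was written against (from the table above).
EXPECTED_DSTAT_VERSION=0.7.2

# parse_dstat_version
# Read "dstat --version" output on stdin and print the second word of the
# first line, assumed to look like "Dstat 0.7.2".
parse_dstat_version() {
    awk 'NR == 1 { print $2; exit }'
}

# check_dstat
# Succeed only if dstat is installed and reports the expected version.
check_dstat() {
    local version
    if ! command -v dstat > /dev/null 2>&1; then
        echo "dstat is not installed" >&2
        return 1
    fi
    version=$(dstat --version | parse_dstat_version)
    if [ "$version" != "$EXPECTED_DSTAT_VERSION" ]; then
        echo "dstat ${version} found, ${EXPECTED_DSTAT_VERSION} expected" >&2
        return 1
    fi
}
```

A call such as ``check_dstat || exit 1`` at the top of a measurement run stops it before any data with an unexpected column layout is collected.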

Measuring performance values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The script `Full script for collecting performance metrics`_ can be used for
the first five of the following steps.

.. note::

   If a distributed provisioning system is used, the values need to be
   measured on each provisioning system instance.

1.
   Start the collection of CPU, memory, network, and storage metrics during
   the provisioning process. Use the dstat program, which can collect all of
   these metrics in CSV format into a log file.

2.
   Start the provisioning process and record the wall time at which
   provisioning of the first node starts.

3.
   Wait until the provisioning process has finished (when all nodes are
   reachable via ssh) and record the wall time.

4.
   Stop the dstat program.

5.
   Prepare the collected data for analysis. dstat provides a large amount of
   information, which can be pruned by saving only the following:

   * "system"[time]. Save as given.
   * 100-"total cpu usage"[idl]. dstat provides only the idle CPU value. CPU
     utilization is calculated by subtracting the idle value from 100%.
   * "memory usage"[used]. dstat provides this value in bytes. It is
     converted to megabytes by dividing by 1024*1024=1048576.
   * "net/eth0"[recv]: receive bandwidth on the NIC. dstat provides this
     value in bytes per second. It is converted to megabits per second by
     dividing by 1024*1024/8=131072.
   * "net/eth0"[send]: send bandwidth on the NIC, converted to megabits per
     second in the same way.
   * "net/eth0"[recv]+"net/eth0"[send]: the total receive and transmit
     bandwidth on the NIC, converted to megabits per second in the same way.
   * "io/total"[read]: storage read IO.
   * "io/total"[writ]: storage write IO.
   * "io/total"[read]+"io/total"[writ]: the total read and write storage IO.

   Note that dstat's ``--io`` option reports the number of read and write
   requests per second rather than bytes per second.

   These values will be graphed and maximum values reported.

6.
   Repeat steps 1-5 for provisioning the following numbers of nodes at the
   same time:

   * 10 nodes
   * 20 nodes
   * 40 nodes
   * 80 nodes
   * 160 nodes
   * 320 nodes
   * 640 nodes
   * 1280 nodes
   * 2000 nodes

   Additional tests will be performed if anomalous behaviour is found. These
   may require the collection of additional performance metrics.

7.
   The result of this part of the test will be:

   * to provide the following graphs, one set for each number of provisioned
     nodes:

     #) One graph with three dependencies:

        * INGRESS_NET(TIME): incoming network bandwidth usage over time.
        * EGRESS_NET(TIME): outgoing network bandwidth usage over time.
        * ALL_NET(TIME): total network bandwidth usage over time.

     #) One graph with one dependency:

        * CPU(TIME): CPU utilization over time.

     #) One graph with one dependency:

        * RAM(TIME): active memory usage over time.

     #) One graph with three dependencies:

        * WRITE_IO(TIME): storage write IO bandwidth over time.
        * READ_IO(TIME): storage read IO bandwidth over time.
        * ALL_IO(TIME): total storage IO bandwidth over time.

     .. note::

        If a distributed provisioning system is used, the above graphs should
        be provided for each provisioning system instance.

   * to fill in the following table with the maximum values:

     The resource metrics are obtained from the maxima of the corresponding
     graphs above. The provisioning time is the elapsed time for all nodes to
     be provisioned. One set of metrics will be given for each number of
     provisioned nodes.

     .. table:: Maximum values of performance metrics

        +-------+--------------+---------+---------+---------+---------+
        || nodes|| provisioning|| maximum|| maximum|| maximum|| maximum|
        || count|| time        || CPU    || RAM    || NET    || IO     |
        |       |              || usage  || usage  || usage  || usage  |
        +=======+==============+=========+=========+=========+=========+
        | 10    |              |         |         |         |         |
        +-------+--------------+---------+---------+---------+---------+
        | 20    |              |         |         |         |         |
        +-------+--------------+---------+---------+---------+---------+
        | 40    |              |         |         |         |         |
        +-------+--------------+---------+---------+---------+---------+
        | 80    |              |         |         |         |         |
        +-------+--------------+---------+---------+---------+---------+
        | 160   |              |         |         |         |         |
        +-------+--------------+---------+---------+---------+---------+
        | 320   |              |         |         |         |         |
        +-------+--------------+---------+---------+---------+---------+
        | 640   |              |         |         |         |         |
        +-------+--------------+---------+---------+---------+---------+
        | 1280  |              |         |         |         |         |
        +-------+--------------+---------+---------+---------+---------+
        | 2000  |              |         |         |         |         |
        +-------+--------------+---------+---------+---------+---------+
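
Steps 2 and 3 boil down to recording two timestamps and polling the nodes until all of them answer. The sketch below keeps the polling logic separate from the probe, so any check (for example an ssh call like the one in the full script) can be plugged in. The function names, the round-based timeout and the use of awk instead of bc for the subtraction are assumptions, not part of the test plan.

```bash
#!/usr/bin/env bash
# wait_for_nodes PROBE MAX_ROUNDS INTERVAL NODE...
# Hypothetical helper: call "PROBE NODE" for every node that has not answered
# yet, once per round, until all nodes succeed or MAX_ROUNDS rounds have run.
# Returns 0 when all nodes answered, 1 (listing the rest on stderr) otherwise.
wait_for_nodes() {
    local probe=$1 max_rounds=$2 interval=$3
    shift 3
    local pending=("$@") remaining node round
    for ((round = 1; round <= max_rounds; round++)); do
        remaining=()
        for node in "${pending[@]}"; do
            if "$probe" "$node" > /dev/null 2>&1; then
                echo "Node ${node} is reachable"
            else
                remaining+=("$node")
            fi
        done
        pending=("${remaining[@]}")
        # Stop as soon as every node has answered.
        if ((${#pending[@]} == 0)); then
            return 0
        fi
        # Wait before the next round, but not after the last one.
        if ((round < max_rounds)); then
            sleep "$interval"
        fi
    done
    echo "Still unreachable after ${max_rounds} rounds: ${pending[*]}" >&2
    return 1
}

# elapsed_seconds START END
# Difference between two "date +%s.%N" timestamps, computed with awk.
elapsed_seconds() {
    awk -v s="$1" -v e="$2" 'BEGIN { printf "%.3f\n", e - s }'
}
```

In practice the probe would be a small (hypothetical) function such as ``ssh_probe`` that runs ``hostname`` on the node via ssh; ``start=$(date +%s.%N)`` is taken before provisioning starts and ``elapsed_seconds "$start" "$(date +%s.%N)"`` after ``wait_for_nodes`` returns 0.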
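
The maximum values for step 7 can be read directly from the post-processed CSV file. The following sketch assumes the CSV header written by the measurement script (``time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all``) and prints one row of the "Maximum values of performance metrics" grid table with matching column widths. The helper names are hypothetical, and a value wider than its column would break the grid alignment.

```bash
#!/usr/bin/env bash
# column_max FILE NAME
# Hypothetical helper: print the largest value (one decimal place) of the CSV
# column whose header is NAME; the first line of FILE is the header row.
column_max() {
    awk -F ',' -v name="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        col && NF && (!seen || $col + 0 > max) { max = $col + 0; seen = 1 }
        END { if (!seen) exit 1; printf "%.1f\n", max }
    ' "$1"
}

# max_table_row FILE NODES PROVISIONING_TIME
# Print one row of the "Maximum values of performance metrics" grid table,
# padded to its column widths (7, 14, 9, 9, 9, 9 characters).
max_table_row() {
    local file=$1 nodes=$2 elapsed=$3
    printf '| %-6s| %-13s| %-8s| %-8s| %-8s| %-8s|\n' \
        "$nodes" "$elapsed" \
        "$(column_max "$file" cpu_usage)" \
        "$(column_max "$file" ram_usage)" \
        "$(column_max "$file" net_all)" \
        "$(column_max "$file" dsk_all)"
}
```

For example, ``max_table_row /var/log/10_nodes.csv 10 "$elapsed_time"`` prints the row for the 10-node run, ready to paste between the separator lines of the table.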

Applications
------------

List of provisioning systems
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. table:: List of provisioning systems

   +-----------------------------+---------+
   | Name of provisioning system | Version |
   +=============================+=========+
   | `Cobbler`_                  | 2.4     |
   +-----------------------------+---------+
   | `Razor`_                    | 0.13    |
   +-----------------------------+---------+
   | Image-based provisioning    |         |
   | via downloading images with |    -    |
   | the BitTorrent protocol     |         |
   +-----------------------------+---------+

Full script for collecting performance metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: measure.sh
   :language: bash
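
The script handles one node count per run. To repeat the measurement for every node count listed in step 6 of the measurement procedure, a driver loop along the lines of the following sketch can be used. ``run_all_counts`` and the hook it calls are hypothetical: the hook is expected to provision the given number of nodes (for example with the script above) and write the resulting CSV to the given path.

```bash
#!/usr/bin/env bash
# Node counts from step 6 of the test plan.
NODE_COUNTS=(10 20 40 80 160 320 640 1280 2000)

# run_all_counts HOOK RESULTS_DIR
# Hypothetical driver: call "HOOK COUNT RESULTS_DIR/COUNT_nodes.csv" for each
# node count in turn and stop at the first failing run, so that anomalous
# behaviour can be investigated before larger runs are attempted.
run_all_counts() {
    local hook=$1 results_dir=$2 count
    mkdir -p "$results_dir"
    for count in "${NODE_COUNTS[@]}"; do
        echo "=== ${count} nodes ==="
        if ! "$hook" "$count" "${results_dir}/${count}_nodes.csv"; then
            echo "Measurement for ${count} nodes failed" >&2
            return 1
        fi
    done
}
```

Stopping at the first failure is a design choice of this sketch; a run that continues past failures would simply log the error and move on to the next count.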

.. references:

.. _dstat: http://dag.wiee.rs/home-made/dstat/
.. _Cobbler: http://cobbler.github.io/
.. _Razor: https://github.com/puppetlabs/razor-server

@@ -0,0 +1,86 @@
#!/bin/bash
# Install the required packages on the provisioning system servers:
for PACKAGE in dstat sshpass bc
do
    if ! dpkg-query -W -f='${Status}' "${PACKAGE}" 2> /dev/null | grep -q "install ok installed"
    then
        apt-get -y install "${PACKAGE}"
    fi
done
# Collect CPU, RAM, NET and IO values once per second on the provisioning
# system server. Set the INTERFACE variable to the interface that is connected
# to the nodes and used to communicate with them during provisioning. dstat
# runs in the background and writes the values in CSV format into
# ${OUTPUT_FILE}:
INTERFACE=eth0
OUTPUT_FILE=/var/log/dstat.csv
# dstat appends to an existing output file, which would break the header
# handling below, so start from an empty file.
rm -f "${OUTPUT_FILE}"
dstat --nocolor --time --cpu --mem --net -N "${INTERFACE}" --io --output "${OUTPUT_FILE}" > /dev/null &
DSTAT_PID=$!
# Start the provisioning process and record the start time. Provisioning is
# finished when all nodes are reachable via ssh; the results collected during
# this time window are analyzed. Here is an example for Cobbler:
ENV_NAME=env-1
LOG_FILE=/var/log/provisioning.log
start_time=$(date +%s.%N)
echo "Provisioning started at $(date)" > "${LOG_FILE}"
for SYSTEM in $(cobbler system find --comment="${ENV_NAME}")
do
    cobbler system reboot --name="${SYSTEM}" &
done
# Get the end time: poll the nodes via ssh until all of them respond, then
# write "Provisioning finished at <date/time>" into the log file. Provide the
# IP addresses of the nodes in nodes_ips.list (one per line) and the
# credentials in the SSH_USER and SSH_PASSWORD variables:
SSH_OPTIONS=(-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5)
SSH_PASSWORD="r00tme"
SSH_USER="root"
NODE_IPS=($(< nodes_ips.list))
NODES_COUNT=${#NODE_IPS[@]}
TIMER=0
# Maximum number of polling rounds; each round tries every remaining node.
TIMEOUT=20
while (( ${#NODE_IPS[@]} > 0 && TIMER < TIMEOUT ))
do
    UNREACHABLE=()
    for NODE_IP in "${NODE_IPS[@]}"
    do
        if sshpass -p "${SSH_PASSWORD}" ssh "${SSH_OPTIONS[@]}" "${SSH_USER}@${NODE_IP}" hostname
        then
            echo "Node with ip ${NODE_IP} is reachable via ssh"
        else
            echo "Node with ip ${NODE_IP} is still unreachable via ssh"
            UNREACHABLE+=("${NODE_IP}")
        fi
    done
    NODE_IPS=("${UNREACHABLE[@]}")
    TIMER=$((TIMER + 1))
    # Check the remaining nodes once per second
    (( ${#NODE_IPS[@]} > 0 )) && sleep 1
done
if (( ${#NODE_IPS[@]} > 0 ))
then
    echo "The following ${#NODE_IPS[@]} nodes are unreachable:"
    echo "${NODE_IPS[@]}"
    kill "${DSTAT_PID}"
    exit 1
fi
echo "Provisioning finished at $(date)" >> "${LOG_FILE}"
end_time=$(date +%s.%N)
elapsed_time=$(echo "${end_time} - ${start_time}" | bc -l)
echo "Total elapsed time for provisioning: ${elapsed_time} seconds" >> "${LOG_FILE}"
# Stop the dstat process started above
kill "${DSTAT_PID}"
# Delete the seven header lines written by dstat and convert the remaining
# values to the needed metrics, producing the following CSV format:
# time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all
awk -F "," 'BEGIN {getline;getline;getline;getline;getline;getline;getline;
print "time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all"}
{print $1","100-$4","$8/1048576","$12/131072","$13/131072","($12+$13)/131072","$14","$15","$14+$15}' \
    "${OUTPUT_FILE}" > "/var/log/${NODES_COUNT}_nodes.csv"