Test plan of provisioning systems.

Add a test plan which describes how to measure the performance of provisioning systems.

* Add table titles
* Delete configuration management template
* Delete author info
* Fix abstract and conventions sections
* Improve the appearance of the hardware info table
* Fix the reference to the script

Change-Id: I8cb3524dbd12bcd67502e3c6fd9c003b495f18bb
2015-12-08 10:57:49 +00:00 · 2015-12-08 10:57:49 +00:00 · a101c13368
parent 8d0ce71e72
commit a101c13368
4 changed files with 432 additions and 1 deletions
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -11,4 +11,3 @@ Performance Documentation
 .. raw:: pdf

    PageBreak oneColumn
-
--- a/doc/source/test_plans/index.rst
+++ b/doc/source/test_plans/index.rst
@@ -10,3 +10,4 @@ Test Plans
    :maxdepth: 2

    mq/index
+    provisioning/main
--- a/doc/source/test_plans/provisioning/main.rst
+++ b/doc/source/test_plans/provisioning/main.rst
@@ -0,0 +1,345 @@
+.. _Measuring_performance_of_provisioning_systems:
+
+=============================================
+Measuring performance of provisioning systems
+=============================================
+
+:status: draft is in progress
+:version: 0
+
+:Abstract:
+
+  This document describes a test plan for quantifying the performance of
+  provisioning systems as a function of the number of nodes to be provisioned. The
+  plan includes the collection of several resource utilization metrics, which will
+  be used to analyze and understand the overall performance of each system. In
+  particular, resource bottlenecks will either be fixed, or best practices
+  developed for system configuration and hardware requirements.
+
+:Conventions:
+
+  - **Provisioning:** is the entire process of installing and configuring an
+    operating system.
+
+  - **Provisioning system:** is a service or a set of services which enables the
+    installation of an operating system and performs basic operations such as
+    configuring network interfaces and partitioning disks. A preliminary
+    `list of provisioning systems`_ can be found below in `Applications`_.
+    The provisioning system
+    can include configuration management systems like Puppet or Chef, but
+    this feature will not be considered in this document. The test plan for
+    configuration management systems is described in the
+    "Measuring_performance_of_configuration_management_systems" document.
+
+  - **Performance of a provisioning system:** is a set of metrics which
+    describes how many nodes can be provisioned at the same time and the
+    hardware resources required to do so.
+
+  - **Nodes:** are servers which will be provisioned.
+
+List of performance metrics
+---------------------------
+The table below shows the list of test metrics to be collected. The priority
+is the relative ranking of the importance of each metric in evaluating the
+performance of the system.
+
+.. table:: List of performance metrics
+
+  +--------+------------------------+------------------------------------------+
+  |Priority| Parameter              | Description                              |
+  +========+========================+==========================================+
+  |        |                        | | The elapsed time to provision all      |
+  | 1      |PROVISIONING_TIME(NODES)| | nodes, as a function of the number of  |
+  |        |                        | | nodes                                  |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | Incoming network bandwidth usage as a  |
+  | 2      |INGRESS_NET(NODES)      | | function of the number of nodes.       |
+  |        |                        | | Average during provisioning on the host|
+  |        |                        | | where the provisioning system is       |
+  |        |                        | | installed.                             |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | Outgoing network bandwidth usage as a  |
+  | 2      | EGRESS_NET(NODES)      | | function of the number of nodes.       |
+  |        |                        | | Average during provisioning on the host|
+  |        |                        | | where the provisioning system is       |
+  |        |                        | | installed.                             |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | CPU utilization as a function of the   |
+  | 3      | CPU(NODES)             | | number of nodes. Average during        |
+  |        |                        | | provisioning on the host where the     |
+  |        |                        | | provisioning system is installed.      |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | Active memory usage as a function of   |
+  | 3      | RAM(NODES)             | | the number of nodes. Average during    |
+  |        |                        | | provisioning on the host where the     |
+  |        |                        | | provisioning system is installed.      |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | Storage write IO bandwidth as a        |
+  | 3      | WRITE_IO(NODES)        | | function of the number of nodes.       |
+  |        |                        | | Average during provisioning on the host|
+  |        |                        | | where the provisioning system is       |
+  |        |                        | | installed.                             |
+  +--------+------------------------+------------------------------------------+
+  |        |                        | | Storage read IO bandwidth as a         |
+  | 3      | READ_IO(NODES)         | | function of the number of nodes.       |
+  |        |                        | | Average during provisioning on the host|
+  |        |                        | | where the provisioning system is       |
+  |        |                        | | installed.                             |
+  +--------+------------------------+------------------------------------------+
+
+Test Plan
+---------
+
+The above performance metrics will be measured for various numbers
+of provisioned nodes. The result will be a table that shows the
+dependence of these metrics on the number of nodes.
+
+Environment description
+^^^^^^^^^^^^^^^^^^^^^^^
+Test results MUST include a description of the environment used. The following items
+should be included:
+
+- **Hardware configuration of each server.** If virtual machines are used then both
+  physical and virtual hardware should be fully documented.
+  An example format is given below:
+
+.. table:: Description of server hardware
+
+  +-------+----------------+-------+-------+
+  |server |name            |       |       |
+  |       +----------------+-------+-------+
+  |       |role            |       |       |
+  |       +----------------+-------+-------+
+  |       |vendor,model    |       |       |
+  |       +----------------+-------+-------+
+  |       |operating_system|       |       |
+  +-------+----------------+-------+-------+
+  |CPU    |vendor,model    |       |       |
+  |       +----------------+-------+-------+
+  |       |processor_count |       |       |
+  |       +----------------+-------+-------+
+  |       |core_count      |       |       |
+  |       +----------------+-------+-------+
+  |       |frequency_MHz   |       |       |
+  +-------+----------------+-------+-------+
+  |RAM    |vendor,model    |       |       |
+  |       +----------------+-------+-------+
+  |       |amount_MB       |       |       |
+  +-------+----------------+-------+-------+
+  |NETWORK|interface_name  |       |       |
+  |       +----------------+-------+-------+
+  |       |vendor,model    |       |       |
+  |       +----------------+-------+-------+
+  |       |bandwidth       |       |       |
+  +-------+----------------+-------+-------+
+  |STORAGE|dev_name        |       |       |
+  |       +----------------+-------+-------+
+  |       |vendor,model    |       |       |
+  |       +----------------+-------+-------+
+  |       |SSD/HDD         |       |       |
+  |       +----------------+-------+-------+
+  |       |size            |       |       |
+  +-------+----------------+-------+-------+
+
+- **Configuration of hardware network switches.** The configuration file from the
+  switch can be downloaded and attached.
+
+- **Configuration of virtual machines and virtual networks (if they are used).**
+  The configuration files can be attached, along with the mapping of virtual
+  machines to host machines.
+
+- **Network scheme.** The plan should show how all hardware is connected and
+  how the components communicate. All Ethernet/Fibre Channel and VLAN channels
+  should be included. Each interface of every hardware component should be
+  matched with the corresponding L2 channel and IP address.
+
+- **Software configuration of the provisioning system.** `sysctl.conf` and any
+  other kernel file that is changed from the default should be attached.
+  A list of installed packages should be attached. Specifications of the
+  operating system, network interfaces configuration, and disk partitioning
+  configuration should be included. If distributed provisioning systems are
+  to be tested then the parts that are distributed need to be described.
+
+- **Desired software configuration of the provisioned nodes.**
+  The operating system, disk partitioning scheme, network interface
+  configuration, installed packages and other components of the nodes
+  affect the amount of work to be performed by the provisioning system
+  and thus its performance.
+
+Preparation
+^^^^^^^^^^^
+1.
+  The following package needs to be installed on the provisioning system
+  servers to collect performance metrics.
+
+.. table:: Software to be installed
+
+  +--------------+---------+-----------------------------------+
+  | package name | version | source                            |
+  +==============+=========+===================================+
+  | `dstat`_     | 0.7.2   | Ubuntu trusty universe repository |
+  +--------------+---------+-----------------------------------+
+
+Measuring performance values
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The script
+`Full script for collecting performance metrics`_
+can be used for the first five of the following steps.
+
+.. note::
+  If a distributed provisioning system is used, the values need to be
+  measured on each provisioning system instance.
+
+1.
+  Start the collection of CPU, memory, network, and storage metrics during the
+  provisioning process. Use the dstat program, which can collect all of these
+  metrics in CSV format into a log file.
+2.
+  Start the provisioning process for the first node and record the wall time.
+3.
+  Wait until the provisioning process has finished, that is, until all
+  nodes are reachable via ssh, and then record the wall time. The difference
+  between the two recorded wall times is the provisioning time.
+4.
+  Stop the dstat program.
+5.
+  Prepare collected data for analysis. dstat provides a large amount of
+  information, which can be pruned by saving only the following:
+
+  * "system"[time]. Save as given.
+
+  * 100-"total cpu usage"[idl]. dstat provides only the idle CPU value. CPU
+    utilization is calculated by subtracting the idle value from 100%.
+
+  * "memory usage"[used]. dstat provides this value in Bytes.
+    It is converted to Megabytes by dividing by 1024*1024=1048576.
+
+  * "net/eth0"[recv] receive bandwidth on the NIC. It is converted to Megabits
+    per second by dividing by 1024*1024/8=131072.
+
+  * "net/eth0"[send] send bandwidth on the NIC. It is converted to Megabits
+    per second by dividing by 1024*1024/8=131072.
+
+  * "net/eth0"[recv]+"net/eth0"[send]. The total receive and transmit bandwidth
+    on the NIC. dstat provides these values in Bytes per second. They are
+    converted to Megabits per second by dividing by 1024*1024/8=131072.
+
+  * "io/total"[read] storage read IO bandwidth.
+
+  * "io/total"[writ] storage write IO bandwidth.
+
+  * "io/total"[read]+"io/total"[writ]. The total read and write storage IO
+    bandwidth.
+
+  These values will be graphed and maximum values reported.
+
+6.
+  Repeat steps 1-5, provisioning the following numbers of nodes
+  at the same time:
+
+  * 10 nodes
+  * 20 nodes
+  * 40 nodes
+  * 80 nodes
+  * 160 nodes
+  * 320 nodes
+  * 640 nodes
+  * 1280 nodes
+  * 2000 nodes
+
+  Additional tests will be performed if some anomalous behaviour is found.
+  These may require the collection of additional performance metrics.
+
+7.
+  The results of this part of the test will be:
+
+* to provide the following graphs, one for each number of provisioned nodes:
+
+  #) Three dependencies on one graph.
+
+     * INGRESS_NET(TIME) Dependence on time of incoming network bandwidth usage.
+     * EGRESS_NET(TIME)  Dependence on time of outgoing network bandwidth usage.
+     * ALL_NET(TIME)     Dependence on time of total network bandwidth usage.
+
+  #) One dependence on one graph.
+
+     * CPU(TIME)         Dependence on time of CPU utilization.
+
+  #) One dependence on one graph.
+
+     * RAM(TIME)         Dependence on time of active memory usage.
+
+  #) Three dependencies on one graph.
+
+     * WRITE_IO(TIME)    Dependence on time of storage write IO bandwidth.
+     * READ_IO(TIME)     Dependence on time of storage read IO bandwidth.
+     * ALL_IO(TIME)      Dependence on time of total storage IO bandwidth.
+
+.. note::
+  If a distributed provisioning system is used, the above graphs should be
+  provided for each provisioning system instance.
+
+* to fill in the following table for maximum values:
+
+The resource metrics are obtained from the maxima of the corresponding graphs
+above. The provisioning time is the elapsed time for all nodes to be
+provisioned. One set of metrics will be given for each number of provisioned
+nodes.
+
+.. table:: Maximum values of performance metrics
+
+  +-------+--------------+---------+---------+---------+---------+
+  || nodes|| provisioning|| maximum|| maximum|| maximum|| maximum|
+  || count|| time        || CPU    || RAM    || NET    || IO     |
+  |       |              || usage  || usage  || usage  || usage  |
+  +=======+==============+=========+=========+=========+=========+
+  | 10    |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 20    |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 40    |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 80    |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 160   |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 320   |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 640   |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 1280  |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+  | 2000  |              |         |         |         |         |
+  +-------+--------------+---------+---------+---------+---------+
+
+Applications
+------------
+
+list of provisioning systems
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. table:: list of provisioning systems
+
+  +-----------------------------+---------+
+  | Name of provisioning system | Version |
+  +=============================+=========+
+  | `Cobbler`_                  | 2.4     |
+  +-----------------------------+---------+
+  | `Razor`_                    | 0.13    |
+  +-----------------------------+---------+
+  | Image based provisioning    |         |
+  | via downloading images with | -       |
+  | bittorrent protocol         |         |
+  +-----------------------------+---------+
+
+Full script for collecting performance metrics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. literalinclude:: measure.sh
+    :language: bash
+
+.. references:
+
+.. _dstat: http://dag.wiee.rs/home-made/dstat/
+.. _Cobbler: http://cobbler.github.io/
+.. _Razor: https://github.com/puppetlabs/razor-server
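
As a worked check of the conversion factors listed in step 5 of main.rst, the sketch below applies the same divisors to a few sample values. The helper names `bytes_to_mb`, `bytes_per_sec_to_mbit` and `cpu_usage_from_idle` are hypothetical and are not part of measure.sh, and the input values are invented for illustration only.

```shell
#!/bin/bash
# Hypothetical helpers (not part of measure.sh) that apply the divisors
# described in step 5 of the test plan.

# Bytes -> Megabytes, dividing by 1024*1024 = 1048576.
bytes_to_mb() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1048576 }'
}

# Bytes per second -> Megabits per second, dividing by 1024*1024/8 = 131072.
bytes_per_sec_to_mbit() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 131072 }'
}

# CPU utilization in percent, derived from the idle percentage that dstat reports.
cpu_usage_from_idle() {
  awk -v idl="$1" 'BEGIN { printf "%.2f\n", 100 - idl }'
}

# Invented sample values, for illustration only.
bytes_to_mb 2147483648           # prints 2048.00
bytes_per_sec_to_mbit 13107200   # prints 100.00
cpu_usage_from_idle 87.5         # prints 12.50
```

The awk command at the end of measure.sh performs the same arithmetic inline on the dstat columns; the helpers here only make the individual factors easier to verify by hand.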
--- a/doc/source/test_plans/provisioning/measure.sh
+++ b/doc/source/test_plans/provisioning/measure.sh
@@ -0,0 +1,86 @@
+#!/bin/bash
+
+# Need to install the required packages on provisioning system servers:
+if ! dpkg -l | grep dstat | grep -q '^ii'
+then
+  apt-get -y install dstat
+fi
+
+# Prepare the following command on the provisioning system server to collect
+# CPU, RAM, NET and IO load values once per second. Set the "INTERFACE"
+# variable to the interface that is connected to the nodes and used to
+# communicate with them during the provisioning process. This command starts
+# dstat in the background, where it collects the needed parameters in CSV
+# format into the /var/log/dstat.csv file:
+INTERFACE=eth0
+OUTPUT_FILE=/var/log/dstat.csv
+dstat --nocolor --time --cpu --mem --net -N ${INTERFACE} --io --output ${OUTPUT_FILE} > /dev/null &
+
+# Prepare a script which starts the provisioning process and records the time
+# when provisioning started and when it ended (when all nodes are reachable via
+# ssh). The results collected during this time window will be analyzed. To get
+# the start time, add a "date" command before the API call or CLI command and
+# redirect its output to a log file. Here is an example for cobbler:
+ENV_NAME=env-1
+start_time=`date +%s.%N`
+echo "Provisioning started at "`date` > /var/log/provisioning.log
+for SYSTEM in `cobbler system find --comment=${ENV_NAME}`
+do
+  cobbler system reboot --name=${SYSTEM} &
+done
+
+# To get the end time we can use the script below. It tries to reach the
+# nodes via ssh and writes "Provisioning finished at <date/time>" into the
+# /var/log/provisioning.log file. You need to provide the IP addresses of the
+# nodes (in the file nodes_ips.list, with one IP per line) and the
+# credentials (SSH_PASSWORD and SSH_USER variables):
+SSH_OPTIONS="StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
+SSH_PASSWORD="r00tme"
+SSH_USER="root"
+NODE_IPS=(`cat nodes_ips.list`)
+TIMER=0
+TIMEOUT=20
+while (("${TIMER}" < "${TIMEOUT}"))
+do
+     for NODE_IP in ${NODE_IPS[@]}
+     do
+             SSH_CMD="sshpass -p ${SSH_PASSWORD} ssh -o ${SSH_OPTIONS} ${SSH_USER}@${NODE_IP}"
+             ${SSH_CMD} "hostname" && UNHAPPY_SSH=0 || UNHAPPY_SSH=1
+             if (("${UNHAPPY_SSH}" == "0"))
+             then
+                     echo "Node with ip "${NODE_IP}" is reachable via ssh"
+                     NODE_IPS=($(printf '%s\n' "${NODE_IPS[@]}" | grep -vxF "${NODE_IP}"))
+             else
+                     echo "Node with ip "${NODE_IP}" is still unreachable via ssh"
+             fi
+      done
+      TIMER=$((${TIMER} + 1))
+      if (("${TIMER}" == "${TIMEOUT}"))
+      then
+              echo "The following ${#NODE_IPS[@]} nodes are unreachable:"
+              echo ${NODE_IPS[@]}
+              exit 1
+      fi
+      if ((${#NODE_IPS[@]} == 0 ))
+      then
+              break
+      fi
+      # Check that nodes are reachable once per second
+      sleep 1
+done
+echo "Provisioning finished at "`date` >> /var/log/provisioning.log
+
+end_time=`date +%s.%N`
+elapsed_time=$(echo "$end_time - $start_time" | bc -l)
+echo "Total elapsed time for provisioning: $elapsed_time seconds" >> /var/log/provisioning.log
+
+# Stop dstat command
+killall dstat
+
+# Delete excess values and convert to needed metrics. So, we'll get the
+# following csv format:
+# time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all
+awk -F "," 'BEGIN {getline;getline;getline;getline;getline;getline;getline;
+                   print "time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all"}
+            {print $1","100-$4","$8/1048576","$12/131072","$13/131072","($12+$13)/131072","$14","$15","$14+$15}' \
+$OUTPUT_FILE > /var/log/10_nodes.csv
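
The CSV written by the final awk command can be reduced to the maxima that the test plan's table of maximum values asks for. The following is a minimal sketch under stated assumptions: the function name `max_metrics` is hypothetical, the input is assumed to carry exactly the nine-column header shown in the script comment, and the sample rows are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of measure.sh): print the maximum CPU, RAM,
# total network and total IO values from a CSV with the header
# time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all
max_metrics() {
  awk -F "," '
    NR == 1 { next }    # skip the header line
    {
      # The first data row (NR == 2) initializes each maximum.
      if (NR == 2 || $2 + 0 > cpu) cpu = $2 + 0
      if (NR == 2 || $3 + 0 > ram) ram = $3 + 0
      if (NR == 2 || $6 + 0 > net) net = $6 + 0
      if (NR == 2 || $9 + 0 > io)  io  = $9 + 0
    }
    END {
      printf "max_cpu=%.2f max_ram=%.2f max_net=%.2f max_io=%.2f\n", cpu, ram, net, io
    }
  ' "$1"
}

# Invented sample data, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all
08-12 10:00:01,12.5,512,10,20,30,100,200,300
08-12 10:00:02,80,1024,40,50,90,5,10,15
08-12 10:00:03,35.25,768,5,5,10,400,100,500
EOF
max_metrics "$sample"   # prints max_cpu=80.00 max_ram=1024.00 max_net=90.00 max_io=500.00
rm -f "$sample"
```

Running the helper once per node count (for example on /var/log/10_nodes.csv) would give one row of the table; the provisioning time column still comes from /var/log/provisioning.log.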