Initial checkin of Stress Test for nova.

Change-Id: I1c4c656e3b8ec715524d369c226ec122920f89fb
This commit is contained in:
David Kranz 2012-02-22 09:36:48 -05:00
parent 951959239d
commit 6308ec2d03
17 changed files with 1444 additions and 0 deletions

62
stress/README.rst Normal file
View File

@ -0,0 +1,62 @@
Quanta Research Cambridge OpenStack Stress Test System
======================================================
Nova is a distributed, asynchronous system that is prone to race condition
bugs. These bugs will not be easily found during
functional testing but will be encountered by users in large deployments in a
way that is hard to debug. The stress test tries to cause these bugs to happen
in a more controlled environment.
The basic idea of the test is that there are a number of actions, roughly
corresponding to the Compute API, that are fired pseudo-randomly at a nova
cluster as fast as possible. These actions consist of what to do, how to
verify success, and a state filter to make sure that the operation makes sense.
For example, if the action is to reboot a server and none are active, nothing
should be done. A test case is a set of actions to be performed and the
probability that each action should be selected. There are also parameters
controlling rate of fire and stuff like that.
This test framework is designed to stress test a Nova cluster. Hence,
you must have a working Nova cluster.
Environment
------------
This particular framework assumes your working Nova cluster understands Nova
API 2.0. The stress tests can read the logs from the cluster. To enable this
you have to
provide the private key and user name for ssh to the cluster in the
[stress] section of tempest.conf. You also need to provide the
value of --logdir in nova.conf:
host_private_key_path=<path to private ssh key>
host_admin_user=<name of user for ssh command>
nova_logdir=<value of --logdir in nova.conf>
The stress test needs the top-level tempest directory to be on PYTHONPATH
if you are not using nosetests to run.
For real stress, you need to remove "ratelimit" from the pipeline in
api-paste.ini.
Running the sample test
-----------------------
To test your installation, do the following (from the tempest directory):
PYTHONPATH=. python stress/tests/user_script_sample.py
This sample test tries to create a few VMs and kill a few VMs.
Additional Tools
----------------
Sometimes the tests don't finish, or there are failures. In these
cases, you may want to clean out the nova cluster. We have provided
some scripts to do this in the ``tools`` subdirectory. To use these
tools, you will need to install python-novaclient.
You can then use the following script to destroy any keypairs,
floating ips, and servers::
stress/tools/nova_destroy_all.py

17
stress/__init__.py Normal file
View File

@ -0,0 +1,17 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Basic framework for constructing various simulated workloads for a
nova cluster."""
__author__ = "David Kranz and Eugene Shih"

41
stress/basher.py Normal file
View File

@ -0,0 +1,41 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Class to describe actions to be included in a stress test."""
class BasherAction(object):
"""
Used to describe each action that you would like to include in a test run.
"""
def __init__(self, test_case, probability, pargs=[], kargs={}):
"""
`test_case` : the name of the class that implements the action
`pargs` : positional arguments to the constructor of `test_case`
`kargs` : keyword arguments to the constructor of `test_case`
`probability`: frequency that each action
"""
self.test_case = test_case
self.pargs = pargs
self.kargs = kargs
self.probability = probability
def invoke(self, manager, state):
"""
Calls the `run` method of the `test_case`.
"""
return self.test_case.run(manager, state, *self.pargs, **self.kargs)
def __str__(self):
return self.test_case.__class__.__name__

43
stress/config.py Executable file
View File

@ -0,0 +1,43 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import ConfigParser
class StressConfig(object):
"""Provides configuration information for whitebox stress tests."""
def __init__(self, conf):
self.conf = conf
def get(self, item_name, default_value=None):
try:
return self.conf.get("stress", item_name)
except (ConfigParser.NoSectionError, ConfigParser.NoOptionError):
return default_value
@property
def host_private_key_path(self):
"""Path to ssh key for logging into compute nodes."""
return self.get("host_private_key_path", None)
@property
def host_admin_user(self):
"""Username for logging into compute nodes."""
return self.get("host_admin_user", None)
@property
def nova_logdir(self):
"""Directory containing log files on the compute nodes"""
return self.get("nova_logdir", None)

206
stress/driver.py Normal file
View File

@ -0,0 +1,206 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""The entry point for the execution of a workloadTo execute a workload.
Users pass in a description of the workload and a nova manager object
to the bash_openstack function call"""
import random
import datetime
import time
# local imports
from test_case import *
from state import State
import utils.util
from config import StressConfig
# setup logging to file
logging.basicConfig(
format='%(asctime)s %(name)-20s %(levelname)-8s %(message)s',
datefmt='%m-%d %H:%M:%S',
filename="stress.debug.log",
filemode="w",
level=logging.DEBUG,
)
# define a Handler which writes INFO messages or higher to the sys.stdout
_console = logging.StreamHandler()
_console.setLevel(logging.INFO)
# set a format which is simpler for console use
_formatter = logging.Formatter('%(name)-20s: %(levelname)-8s %(message)s')
# tell the handler to use this format
_console.setFormatter(_formatter)
# add the handler to the root logger
logging.getLogger('').addHandler(_console)
def _create_cases(choice_spec):
"""
Generate a workload of tests from workload description
"""
cases = []
count = 0
for choice in choice_spec:
p = choice.probability
for i in range(p):
cases.append(choice)
i = i + p
count = count + p
assert(count == 100)
return cases
def _get_compute_nodes(keypath, user, controller):
"""
Returns a list of active compute nodes. List is generated by running
nova-manage on the controller.
"""
nodes = []
if keypath == None or user == None:
return nodes
lines = utils.util.ssh(keypath, user, controller,
"nova-manage service list | grep ^nova-compute").\
split('\n')
# For example: nova-compute xg11eth0 nova enabled :-) 2011-10-31 18:57:46
# This is fragile but there is, at present, no other way to get this info.
for line in lines:
words = line.split()
if len(words) > 0 and words[4] == ":-)":
nodes.append(words[1])
return nodes
def _error_in_logs(keypath, logdir, user, nodes):
"""
Detect errors in the nova log files on the controller and compute nodes.
"""
grep = 'egrep "ERROR\|TRACE" %s/*.log' % logdir
for node in nodes:
errors = utils.util.ssh(keypath, user, node, grep, check=False)
if len(errors) > 0:
logging.error('%s: %s' % (node, errors))
return True
return False
def bash_openstack(manager,
choice_spec,
**kwargs):
"""
Workload driver. Executes a workload as specified by the `choice_spec`
parameter against a nova-cluster.
`manager` : Manager object
`choice_spec` : list of BasherChoice actions to run on the cluster
`kargs` : keyword arguments to the constructor of `test_case`
`duration` = how long this test should last (3 sec)
`sleep_time` = time to sleep between actions (in msec)
`test_name` = human readable workload description
(default: unnamed test)
`max_vms` = maximum number of instances to launch
(default: 32)
`seed` = random seed (default: None)
"""
# get keyword arguments
duration = kwargs.get('duration', datetime.timedelta(seconds=10))
seed = kwargs.get('seed', None)
sleep_time = float(kwargs.get('sleep_time', 3000)) / 1000
max_vms = int(kwargs.get('max_vms', 32))
test_name = kwargs.get('test_name', 'unamed test')
stress_config = StressConfig(manager.config._conf)
keypath = stress_config.host_private_key_path
user = stress_config.host_admin_user
logdir = stress_config.nova_logdir
computes = _get_compute_nodes(keypath, user, manager.config.identity.host)
utils.util.execute_on_all(keypath, user, computes,
"rm -f %s/*.log" % logdir)
random.seed(seed)
cases = _create_cases(choice_spec)
test_end_time = time.time() + duration.seconds
state = State(max_vms=max_vms)
retry_list = []
last_retry = time.time()
cooldown = False
logcheck_count = 0
test_succeeded = True
logging.debug('=== Test \"%s\" on %s ===' %
(test_name, time.asctime(time.localtime())))
for kw in kwargs:
logging.debug('\t%s = %s', kw, kwargs[kw])
while True:
if not cooldown:
if time.time() < test_end_time:
case = random.choice(cases)
logging.debug('Chose %s' % case)
retry = case.invoke(manager, state)
if retry != None:
retry_list.append(retry)
else:
logging.info('Cooling down...')
cooldown = True
if cooldown and len(retry_list) == 0:
if _error_in_logs(keypath, logdir, user, computes):
test_succeeded = False
break
# Retry verifications every 5 seconds.
if time.time() - last_retry > 5:
logging.debug('retry verifications for %d tasks', len(retry_list))
new_retry_list = []
for v in retry_list:
if not v.retry():
new_retry_list.append(v)
retry_list = new_retry_list
last_retry = time.time()
time.sleep(sleep_time)
# Check error logs after 100 actions
if logcheck_count > 100:
if _error_in_logs(keypath, logdir, user, computes):
test_succeeded = False
break
else:
logcheck_count = 0
else:
logcheck_count = logcheck_count + 1
# Cleanup
logging.info('Cleaning up: terminating virtual machines...')
vms = state.get_instances()
active_vms = [v for _k, v in vms.iteritems() if v and v[1] == 'ACTIVE']
for target in active_vms:
manager.servers_client.delete_server(target[0]['id'])
# check to see that the server was actually killed
for target in active_vms:
kill_id = target[0]['id']
i = 0
while True:
try:
manager.servers_client.get_server(kill_id)
except Exception:
break
i += 1
if i > 60:
raise
time.sleep(1)
logging.info('killed %s' % kill_id)
state.delete_instance_state(kill_id)
if test_succeeded:
logging.info('*** Test succeeded ***')
else:
logging.info('*** Test had errors ***')
return test_succeeded

65
stress/pending_action.py Normal file
View File

@ -0,0 +1,65 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Describe follow-up actions using `PendingAction` class to verify
that nova API calls such as create/delete are completed"""
import logging
import time
class PendingAction(object):
"""
Initialize and describe actions to verify that a Nova API call
is successful.
"""
def __init__(self, nova_manager, state, target_server, timeout=600):
"""
`nova_manager` : Manager object.
`state` : externally maintained data structure about
state of VMs or other persistent objects in
the nova cluster
`target_server` : server that actions were performed on
`target_server` : time before we declare a TimeoutException
`pargs` : positional arguments
`kargs` : keyword arguments
"""
self._manager = nova_manager
self._state = state
self._target = target_server
self._logger = logging.getLogger(self.__class__.__name__)
self._start_time = time.time()
self._timeout = timeout
def _check_for_status(self, state_string):
"""Check to see if the machine has transitioned states"""
t = time.time() # for debugging
target = self._target
_resp, body = self._manager.servers_client.get_server(target['id'])
if body['status'] != state_string:
# grab the actual state as we think it is
temp_obj = self._state.get_instances()[target['id']]
self._logger.debug("machine %s in state %s" %
(target['id'], temp_obj[1]))
self._logger.debug('%s, time: %d' % (temp_obj[1], time.time() - t))
return temp_obj[1]
self._logger.debug('%s, time: %d' % (state_string, time.time() - t))
return state_string
def retry(self):
"""Invoked by user of this class to verify completion of"""
"""previous TestCase actions"""
return False

41
stress/state.py Normal file
View File

@ -0,0 +1,41 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""A class to store the state of various persistent objects in the Nova
cluster, e.g. instances, volumes. Use methods to query to state which than
can be compared to the current state of the objects in Nova"""
class State(object):
def __init__(self, **kwargs):
self._max_vms = kwargs.get('max_vms', 32)
self._instances = {}
self._volumes = {}
# machine state methods
def get_instances(self):
"""return the instances dictionary that we believe are in cluster."""
return self._instances
def get_max_instances(self):
"""return the maximum number of instances we can create."""
return self._max_vms
def set_instance_state(self, key, val):
"""Store `val` in the dictionary indexed at `key`."""
self._instances[key] = val
def delete_instance_state(self, key):
"""Delete state indexed at `key`."""
del self._instances[key]

29
stress/test_case.py Normal file
View File

@ -0,0 +1,29 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Abstract class for implementing an action. You only need to override
the `run` method which specifies all the actual nova API class you wish
to make."""
import logging
class StressTestCase(object):
def __init__(self):
self._logger = logging.getLogger(self.__class__.__name__)
def run(self, nova_manager, state_obj, *pargs, **kargs):
"""Nova API methods to call that would modify state of the cluster"""
return

View File

@ -0,0 +1,311 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Defines various sub-classes of the `StressTestCase` and
`PendingAction` class. The sub-classes of StressTestCase implement various
API calls on the Nova cluster having to do with Server Actions. Each
sub-class will have a corresponding PendingAction. These pending
actions veriy that the API call was successful or not."""
# system imports
import random
import time
# local imports
import test_case
import pending_action
from tempest.exceptions import TimeoutException
from utils.util import *
class TestRebootVM(test_case.StressTestCase):
"""Reboot a server"""
def run(self, manager, state, *pargs, **kwargs):
"""
Send an HTTP POST request to the nova cluster to reboot a random
server. Update state of object in `state` variable to indicate that
it is rebooting.
`manager` : Manager object
`state` : `State` object describing our view of state of cluster
`pargs` : positional arguments
`kwargs` : keyword arguments, which include:
`timeout` : how long to wait before issuing Exception
`type` : reboot type [SOFT or HARD] (default is SOFT)
"""
vms = state.get_instances()
active_vms = [v for k, v in vms.iteritems() if v and v[1] == 'ACTIVE']
# no active vms, so return null
if not active_vms:
self._logger.info('no ACTIVE instances to reboot')
return
_reboot_type = kwargs.get('type', 'SOFT')
# select active vm to reboot and then send request to nova controller
target = random.choice(active_vms)
reboot_target = target[0]
response, body = manager.servers_client.reboot(
reboot_target['id'],
_reboot_type)
if (response.status != 202):
self._logger.error("response: %s" % response)
raise Exception
if _reboot_type == 'SOFT':
state_name = 'REBOOT'
else:
state_name = 'REBOOT' # this is a bug, should be HARD_REBOOT
self._logger.info('waiting for machine %s to change to %s' %
(reboot_target['id'], state_name))
# check for state transition
_resp, body = manager.servers_client.get_server(reboot_target['id'])
if body['status'] == state_name:
state_string = state_name
else:
# grab the actual state as we think it is
temp_obj = state.get_instances()[self._target['id']]
self._logger.debug(
"machine %s in state %s" %
(reboot_target['id'], temp_obj[1])
)
state_string = temp_obj[1]
if state_string == state_name:
self._logger.info('machine %s ACTIVE -> %s' %
(reboot_target['id'], state_name))
state.set_instance_state(reboot_target['id'],
(reboot_target, state_name))
return VerifyRebootVM(manager,
state,
reboot_target,
reboot_type=_reboot_type,
state_name=state_string)
class VerifyRebootVM(pending_action.PendingAction):
"""Class to verify that the reboot completed."""
States = enum('REBOOT_CHECK', 'ACTIVE_CHECK')
def __init__(self, manager, state, target_server,
reboot_type=None,
state_name=None,
ip_addr=None):
super(VerifyRebootVM, self).__init__(manager,
state,
target_server)
# FIX ME: this is a nova bug
if reboot_type == 'SOFT':
self._reboot_state = 'REBOOT'
else:
self._reboot_state = 'REBOOT' # should be HARD REBOOT
if state_name == 'ACTIVE': # was still active, check to see if REBOOT
self._retry_state = self.States.REBOOT_CHECK
else: # was REBOOT, so now check for ACTIVE
self._retry_state = self.States.ACTIVE_CHECK
def retry(self):
"""
Check to see that the server of interest has actually rebooted. Update
state to indicate that server is running again.
"""
# don't run reboot verification if target machine has been
# deleted or is going to be deleted
if (self._target['id'] not in self._state.get_instances().keys() or
self._state.get_instances()[self._target['id']][1] ==
'TERMINATING'):
self._logger.debug('machine %s is deleted or TERMINATING' %
self._target['id'])
return True
if time.time() - self._start_time > self._timeout:
raise TimeoutException
reboot_state = self._reboot_state
if self._retry_state == self.States.REBOOT_CHECK:
server_state = self._check_for_status(reboot_state)
if server_state == reboot_state:
self._logger.info('machine %s ACTIVE -> %s' %
(self._target['id'], reboot_state))
self._state.set_instance_state(self._target['id'],
(self._target, reboot_state)
)
self._retry_state = self.States.ACTIVE_CHECK
elif server_state == 'ACTIVE':
# machine must have gone ACTIVE -> REBOOT ->ACTIVE
self._retry_state = self.States.ACTIVE_CHECK
elif self._retry_state == self.States.ACTIVE_CHECK:
if not self._check_for_status('ACTIVE'):
return False
target = self._target
self._logger.info('machine %s REBOOT -> ACTIVE [%.1f secs elapsed]' %
(target['id'], time.time() - self._start_time))
self._state.set_instance_state(target['id'],
(target, 'ACTIVE'))
return True
# This code needs to be tested against a cluster that supports resize.
#class TestResizeVM(test_case.StressTestCase):
# """Resize a server (change flavors)"""
#
# def run(self, manager, state, *pargs, **kwargs):
# """
# Send an HTTP POST request to the nova cluster to resize a random
# server. Update `state` to indicate server is rebooting.
#
# `manager` : Manager object.
# `state` : `State` object describing our view of state of cluster
# `pargs` : positional arguments
# `kwargs` : keyword arguments, which include:
# `timeout` : how long to wait before issuing Exception
# """
#
# vms = state.get_instances()
# active_vms = [v for k, v in vms.iteritems() if v and v[1] == 'ACTIVE']
# # no active vms, so return null
# if not active_vms:
# self._logger.debug('no ACTIVE instances to resize')
# return
#
# target = random.choice(active_vms)
# resize_target = target[0]
# print resize_target
#
# _timeout = kwargs.get('timeout', 600)
#
# # determine current flavor type, and resize to a different type
# # m1.tiny -> m1.small, m1.small -> m1.tiny
# curr_size = int(resize_target['flavor']['id'])
# if curr_size == 1:
# new_size = 2
# else:
# new_size = 1
# flavor_type = { 'flavorRef': new_size } # resize to m1.small
#
# post_body = json.dumps({'resize' : flavor_type})
# url = '/servers/%s/action' % resize_target['id']
# (response, body) = manager.request('POST',
# url,
# body=post_body)
#
# if (response.status != 202):
# self._logger.error("response: %s" % response)
# raise Exception
#
# state_name = check_for_status(manager, resize_target, 'RESIZE')
#
# if state_name == 'RESIZE':
# self._logger.info('machine %s: ACTIVE -> RESIZE' %
# resize_target['id'])
# state.set_instance_state(resize_target['id'],
# (resize_target, 'RESIZE'))
#
# return VerifyResizeVM(manager,
# state,
# resize_target,
# state_name=state_name,
# timeout=_timeout)
#
#class VerifyResizeVM(pending_action.PendingAction):
# """Verify that resizing of a VM was successful"""
# States = enum('VERIFY_RESIZE_CHECK', 'ACTIVE_CHECK')
#
# def __init__(self, manager, state, created_server,
# state_name=None,
# timeout=300):
# super(VerifyResizeVM, self).__init__(manager,
# state,
# created_server,
# timeout=timeout)
# self._retry_state = self.States.VERIFY_RESIZE_CHECK
# self._state_name = state_name
#
# def retry(self):
# """
# Check to see that the server was actually resized. And change `state`
# of server to running again.
# """
# # don't run resize if target machine has been deleted
# # or is going to be deleted
# if (self._target['id'] not in self._state.get_instances().keys() or
# self._state.get_instances()[self._target['id']][1] ==
# 'TERMINATING'):
# self._logger.debug('machine %s is deleted or TERMINATING' %
# self._target['id'])
# return True
#
# if time.time() - self._start_time > self._timeout:
# raise TimeoutException
#
# if self._retry_state == self.States.VERIFY_RESIZE_CHECK:
# if self._check_for_status('VERIFY_RESIZE') == 'VERIFY_RESIZE':
# # now issue command to CONFIRM RESIZE
# post_body = json.dumps({'confirmResize' : null})
# url = '/servers/%s/action' % self._target['id']
# (response, body) = manager.request('POST',
# url,
# body=post_body)
# if (response.status != 204):
# self._logger.error("response: %s" % response)
# raise Exception
#
# self._logger.info(
# 'CONFIRMING RESIZE of machine %s [%.1f secs elapsed]' %
# (self._target['id'], time.time() - self._start_time)
# )
# state.set_instance_state(self._target['id'],
# (self._target, 'CONFIRM_RESIZE'))
#
# # change states
# self._retry_state = self.States.ACTIVE_CHECK
#
# return False
#
# elif self._retry_state == self.States.ACTIVE_CHECK:
# if not self._check_manager("ACTIVE"):
# return False
# else:
# server = self._manager.get_server(self._target['id'])
#
# # Find private IP of server?
# try:
# (_, network) = server['addresses'].popitem()
# ip = network[0]['addr']
# except KeyError:
# self._logger.error(
# 'could not get ip address for machine %s' %
# self._target['id']
# )
# raise Exception
#
# self._logger.info(
# 'machine %s: VERIFY_RESIZE -> ACTIVE [%.1f sec elapsed]' %
# (self._target['id'], time.time() - self._start_time)
# )
# self._state.set_instance_state(self._target['id'],
# (self._target, 'ACTIVE'))
#
# return True
#
# else:
# # should never get here
# self._logger.error('Unexpected state')
# raise Exception

342
stress/test_servers.py Normal file
View File

@ -0,0 +1,342 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Defines various sub-classes of the `StressTestCase` and
`PendingAction` class. The sub-classes of StressTestCase implement various
API calls on the Nova cluster having to do with creating and deleting VMs.
Each sub-class will have a corresponding PendingAction. These pending
actions veriy that the API call was successful or not."""
# system imports
import random
import time
# local imports
import test_case
import pending_action
from tempest.exceptions import TimeoutException
class TestCreateVM(test_case.StressTestCase):
"""Create a virtual machine in the Nova cluster."""
_vm_id = 0
def run(self, manager, state, *pargs, **kwargs):
"""
Send an HTTP POST request to the nova cluster to build a
server. Update the state variable to track state of new server
and set to PENDING state.
`manager` : Manager object.
`state` : `State` object describing our view of state of cluster
`pargs` : positional arguments
`kwargs` : keyword arguments, which include:
`key_name` : name of keypair
`timeout` : how long to wait before issuing Exception
`image_ref` : index to image types availablexs
`flavor_ref`: index to flavor types available
(default = 1, which is tiny)
"""
# restrict number of instances we can launch
if len(state.get_instances()) >= state.get_max_instances():
self._logger.debug("maximum number of instances created: %d" %
state.get_max_instances())
return None
_key_name = kwargs.get('key_name', '')
_timeout = int(kwargs.get('timeout', 60))
_image_ref = kwargs.get('image_ref', manager.config.compute.image_ref)
_flavor_ref = kwargs.get('flavor_ref',
manager.config.compute.flavor_ref)
expected_server = {
'name': 'server' + str(TestCreateVM._vm_id),
'metadata': {
'key1': 'value1',
'key2': 'value2',
},
'imageRef': _image_ref,
'flavorRef': _flavor_ref,
'adminPass': 'testpwd',
'key_name': _key_name
}
TestCreateVM._vm_id = TestCreateVM._vm_id + 1
response, body = manager.servers_client.create_server(
expected_server['name'],
_image_ref,
_flavor_ref,
meta=expected_server['metadata'],
adminPass=expected_server['adminPass']
)
if (response.status != 202):
self._logger.error("response: %s" % response)
self._logger.error("body: %s" % body)
raise Exception
created_server = body
self._logger.info('setting machine %s to BUILD' %
created_server['id'])
state.set_instance_state(created_server['id'],
(created_server, 'BUILD'))
return VerifyCreateVM(manager,
state,
created_server,
expected_server)
class VerifyCreateVM(pending_action.PendingAction):
"""Verify that VM was built and is running"""
def __init__(self, manager,
state,
created_server,
expected_server):
super(VerifyCreateVM, self).__init__(manager,
state,
created_server,
)
self._expected = expected_server
def retry(self):
"""
Check to see that the server was created and is running.
Update local view of state to indicate that it is running.
"""
# don't run create verification
# if target machine has been deleted or is going to be deleted
if (self._target['id'] not in self._state.get_instances().keys() or
self._state.get_instances()[self._target['id']][1] ==
'TERMINATING'):
self._logger.info('machine %s is deleted or TERMINATING' %
self._target['id'])
return True
time_diff = time.time() - self._start_time
if time_diff > self._timeout:
self._logger.error('%d exceeded launch server timeout of %d' %
(time_diff, self._timeout))
raise TimeoutException
admin_pass = self._target['adminPass']
# Could check more things here.
if (self._expected['adminPass'] != admin_pass):
self._logger.error('expected: %s' %
(self._expected['adminPass']))
self._logger.error('returned: %s' %
(admin_pass))
raise Exception
if self._check_for_status('ACTIVE') != 'ACTIVE':
return False
self._logger.info('machine %s: BUILD -> ACTIVE [%.1f secs elapsed]' %
(self._target['id'], time.time() - self._start_time))
self._state.set_instance_state(self._target['id'],
(self._target, 'ACTIVE'))
return True
class TestKillActiveVM(test_case.StressTestCase):
"""Class to destroy a random ACTIVE server."""
def run(self, manager, state, *pargs, **kwargs):
"""
Send an HTTP POST request to the nova cluster to destroy
a random ACTIVE server. Update `state` to indicate TERMINATING.
`manager` : Manager object.
`state` : `State` object describing our view of state of cluster
`pargs` : positional arguments
`kwargs` : keyword arguments, which include:
`timeout` : how long to wait before issuing Exception
"""
# check for active instances
vms = state.get_instances()
active_vms = [v for k, v in vms.iteritems() if v and v[1] == 'ACTIVE']
# no active vms, so return null
if not active_vms:
self._logger.info('no ACTIVE instances to delete')
return
_timeout = kwargs.get('timeout', 600)
target = random.choice(active_vms)
killtarget = target[0]
manager.servers_client.delete_server(killtarget['id'])
self._logger.info('machine %s: ACTIVE -> TERMINATING' %
killtarget['id'])
state.set_instance_state(killtarget['id'],
(killtarget, 'TERMINATING'))
return VerifyKillActiveVM(manager, state,
killtarget, timeout=_timeout)
class VerifyKillActiveVM(pending_action.PendingAction):
"""Verify that server was destroyed"""
def retry(self):
"""
Check to see that the server of interest is destroyed. Update
state to indicate that server is destroyed by deleting it from local
view of state.
"""
tid = self._target['id']
# if target machine has been deleted from the state, then it was
# already verified to be deleted
if (not tid in self._state.get_instances().keys()):
return False
time_diff = time.time() - self._start_time
if time_diff > self._timeout:
self._logger.error('server %s: %d exceeds terminate timeout %d' %
(tid, time_diff, self._timeout))
raise TimeoutException
try:
self._manager.servers_client.get_server(tid)
except Exception:
# if we get a 404 response, is the machine really gone?
target = self._target
self._logger.info('machine %s: DELETED [%.1f secs elapsed]' %
(target['id'], time.time() - self._start_time))
self._state.delete_machine_state(target['id'])
return True
return False
class TestKillAnyVM(test_case.StressTestCase):
"""Class to destroy a random server regardless of state."""
def run(self, manager, state, *pargs, **kwargs):
"""
Send an HTTP POST request to the nova cluster to destroy
a random server. Update state to TERMINATING.
`manager` : Manager object.
`state` : `State` object describing our view of state of cluster
`pargs` : positional arguments
`kwargs` : keyword arguments, which include:
`timeout` : how long to wait before issuing Exception
"""
vms = state.get_instances()
# no vms, so return null
if not vms:
self._logger.info('no active instances to delete')
return
_timeout = kwargs.get('timeout', 60)
target = random.choice(vms)
killtarget = target[0]
manager.servers_client.delete_server(killtarget['id'])
self._state.set_instance_state(killtarget['id'],
(killtarget, 'TERMINATING'))
# verify object will do the same thing as the active VM
return VerifyKillAnyVM(manager, state, killtarget, timeout=_timeout)
VerifyKillAnyVM = VerifyKillActiveVM
class TestUpdateVMName(test_case.StressTestCase):
"""Class to change the name of the active server"""
def run(self, manager, state, *pargs, **kwargs):
"""
Issue HTTP POST request to change the name of active server.
Update state of server to reflect name changing.
`manager` : Manager object.
`state` : `State` object describing our view of state of cluster
`pargs` : positional arguments
`kwargs` : keyword arguments, which include:
`timeout` : how long to wait before issuing Exception
"""
# select one machine from active ones
vms = state.get_instances()
active_vms = [v for k, v in vms.iteritems() if v and v[1] == 'ACTIVE']
# no active vms, so return null
if not active_vms:
self._logger.info('no active instances to update')
return
_timeout = kwargs.get('timeout', 600)
target = random.choice(active_vms)
update_target = target[0]
# Update name by appending '_updated' to the name
new_name = update_target['name'] + '_updated'
(response, body) = \
manager.servers_client.update_server(update_target['id'],
name=new_name)
if (response.status != 200):
self._logger.error("response: %s " % response)
self._logger.error("body: %s " % body)
raise Exception
assert(new_name == body['name'])
self._logger.info('machine %s: ACTIVE -> UPDATING_NAME' %
body['id'])
state.set_instance_state(body['id'],
(body, 'UPDATING_NAME'))
return VerifyUpdateVMName(manager,
state,
body,
timeout=_timeout)
class VerifyUpdateVMName(pending_action.PendingAction):
"""Check that VM has new name"""
def retry(self):
"""
Check that VM has new name. Update local view of `state` to RUNNING.
"""
# don't run update verification
# if target machine has been deleted or is going to be deleted
if (not self._target['id'] in self._state.get_instances().keys() or
self._state.get_instances()[self._target['id']][1] ==
'TERMINATING'):
return False
if time.time() - self._start_time > self._timeout:
raise TimeoutException
response, body = \
self._manager.serverse_client.get_server(self._target['id'])
if (response.status != 200):
self._logger.error("response: %s " % response)
self._logger.error("body: %s " % body)
raise Exception
if self._target['name'] != body['name']:
self._logger.error(self._target['name'] +
' vs. ' +
body['name'])
raise Exception
# log the update
self._logger.info('machine %s: UPDATING_NAME -> ACTIVE' %
self._target['id'])
self._state.set_instance_state(self._target['id'],
(body,
'ACTIVE'))
return True

View File

@ -0,0 +1,40 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""More aggressive test that creates and destroys VMs with shorter
sleep times"""
from stress.test_servers import *
from stress.basher import BasherAction
from stress.driver import *
from tempest import openstack
choice_spec = [
BasherAction(TestCreateVM(), 50,
kargs={'timeout': '600',
'image_ref': 2,
'flavor_ref': 1}
),
BasherAction(TestKillActiveVM(), 50,
kargs={'timeout': '600'})
]
nova = openstack.Manager()
bash_openstack(nova,
choice_spec,
duration=datetime.timedelta(seconds=180),
sleep_time=100, # in milliseconds
seed=int(time.time()),
test_name="create and delete",
max_vms=32)

View File

@ -0,0 +1,38 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Test that reboots random instances in a Nova cluster"""
from stress.test_servers import *
from stress.test_server_actions import *
from stress.basher import BasherAction
from stress.driver import *
from tempest import openstack
choice_spec = [
BasherAction(TestCreateVM(), 50,
kargs={'timeout': '600'}),
BasherAction(TestRebootVM(), 50,
kargs={'type': 'HARD'}),
]
nova = openstack.Manager()
bash_openstack(nova,
choice_spec,
duration=datetime.timedelta(seconds=180),
sleep_time=500, # in milliseconds
seed=int(time.time()),
test_name="hard reboots",
max_vms=32)

View File

@ -0,0 +1,38 @@
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Sample stress test that creates a few virtual machines and then
destroys them"""
from stress.test_servers import *
from stress.basher import BasherAction
from stress.driver import *
from tempest import openstack
choice_spec = [
BasherAction(TestCreateVM(), 50,
kargs={'timeout': '60'}),
BasherAction(TestKillActiveVM(), 50)
]
nova = openstack.Manager()
bash_openstack(nova,
choice_spec,
duration=datetime.timedelta(seconds=10),
sleep_time=1000, # in milliseconds
seed=None,
test_name="simple create and delete",
max_vms=10)

View File

@ -0,0 +1,53 @@
#!/usr/bin/env python
# vim: tabstop=4 shiftwidth=4 softtabstop=4
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from novaclient.v1_1 import client
import tempest.config
# get the environment variables for credentials
identity = tempest.config.TempestConfig().identity
nt = client.Client(identity.username, identity.password,
identity.tenant_name, identity.auth_url)
flavor_list = nt.flavors.list()
server_list = nt.servers.list()
images_list = nt.images.list()
keypairs_list = nt.keypairs.list()
floating_ips_list = nt.floating_ips.list()
print "total servers: %3d, total flavors: %3d, total images: %3d," % \
(len(server_list),
len(flavor_list),
len(images_list)),
print "total keypairs: %3d, total floating ips: %3d" % \
(len(keypairs_list),
len(floating_ips_list))
print "deleting all servers"
for s in server_list:
s.delete()
print "deleting all keypairs"
for s in keypairs_list:
s.delete()
print "deleting all floating_ips"
for s in floating_ips_list:
s.delete()

49
stress/tools/nova_status.py Executable file
View File

@ -0,0 +1,49 @@
#!/usr/bin/env python
# vim: tabstop=4 shiftwidth=4 softtabstop=4
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from novaclient.v1_1 import client
import tempest.config
# get the environment variables for credentials
identity = tempest.config.TempestConfig().identity
print identity.username, identity.password,\
identity.tenant_name, identity.auth_url
nt = client.Client(identity.username, identity.password,
identity.tenant_name, identity.auth_url)
flavor_list = nt.flavors.list()
server_list = nt.servers.list()
images_list = nt.images.list()
keypairs_list = nt.keypairs.list()
floating_ips_list = nt.floating_ips.list()
print "total servers: %3d, total flavors: %3d, total images: %3d" % \
(len(server_list),
len(flavor_list),
len(images_list))
print "total keypairs: %3d, total floating ips: %3d" % \
(len(keypairs_list),
len(floating_ips_list))
print "flavors:\t", flavor_list
print "servers:\t", server_list
print "images: \t", images_list
print "keypairs:\t", keypairs_list
print "floating ips:\t", floating_ips_list

15
stress/utils/__init__.py Normal file
View File

@ -0,0 +1,15 @@
# vim: tabstop=4 shiftwidth=4 softtabstop=4
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

54
stress/utils/util.py Normal file
View File

@ -0,0 +1,54 @@
# vim: tabstop=4 shiftwidth=4 softtabstop=4
# Copyright 2011 Quanta Research Cambridge, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import subprocess
import shlex
SSH_OPTIONS = (" -q" +
" -o UserKnownHostsFile=/dev/null" +
" -o StrictHostKeyChecking=no -i ")
def get_ssh_options(keypath):
return SSH_OPTIONS + keypath
def scp(keypath, args):
options = get_ssh_options(keypath)
return subprocess.check_call(shlex.split("scp" + options + args))
def ssh(keypath, user, node, command, check=True):
command = "ssh %s %s@%s %s" % (get_ssh_options(keypath), user,
node, command)
popenargs = shlex.split(command)
process = subprocess.Popen(popenargs, stdout=subprocess.PIPE)
output, unused_err = process.communicate()
retcode = process.poll()
if retcode and check:
raise Exception("%s: ssh failed with retcode: %s" % (node, retcode))
return output
def execute_on_all(keypath, user, nodes, command):
for node in nodes:
ssh(keypath, user, node, command)
def enum(*sequential, **named):
"""Create auto-incremented enumerated types"""
enums = dict(zip(sequential, range(len(sequential))), **named)
return type('Enum', (), enums)