An OpenStack fault injection library



OpenStack fault-injection library

The library performs destructive actions inside an OpenStack cloud. It provides an abstraction layer over different types of cloud deployments. The actions are implemented as drivers (e.g. DevStack driver, Fuel driver, Libvirt driver, IPMI driver, Universal driver).



Ansible is required and should be installed manually, either system-wide or in a virtual environment. Please refer to the Ansible documentation for installation instructions.

Regular installation:

pip install os-faults

The library contains an optional libvirt driver. If you plan to use it, use the following command to install os-faults with the extra dependencies:

pip install os-faults libvirt-python


The cloud deployment configuration is specified in JSON/YAML format or Python dictionary.

The library operates with 3 types of objects:
  • service - software that runs in the cloud, e.g. nova-api
  • container - software that runs in the cloud inside a container, e.g. neutron_api
  • nodes - nodes that host the cloud, e.g. a server with a hostname

Example 1. DevStack

Connection to DevStack can be specified using the following YAML file:

cloud_management:
  driver: devstack
  args:
    address: devstack.local
    auth:
      username: stack
      private_key_file: cloud_key
    iface: enp0s8

The OS-Faults library will connect to DevStack at address devstack.local as user stack, using the SSH key located in file cloud_key. The default network interface is specified with the iface parameter. Note that the user must have sudo permissions (the default DevStack user has them).
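Since the configuration may also be given as a Python dictionary or as JSON, the same DevStack connection could be written as follows. This is a minimal sketch: it assumes the connection settings are nested under `cloud_management`/`args` and the credentials under `auth`, mirroring the layout used in Example 2. The variable names `devstack_config` and `config_json` are illustrative only.

```python
import json

# Assumed layout: driver arguments live under 'args', credentials under 'auth'.
devstack_config = {
    'cloud_management': {
        'driver': 'devstack',
        'args': {
            'address': 'devstack.local',
            'auth': {
                'username': 'stack',
                'private_key_file': 'cloud_key',
            },
            'iface': 'enp0s8',
        },
    },
}

# Serialize the dictionary to JSON so it could be stored in a config file.
config_json = json.dumps(devstack_config, indent=2, sort_keys=True)
print(config_json)
```

The dictionary form is convenient when the address or key file is computed at run time; the JSON form suits a static file checked in next to the tests.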

The DevStack driver is responsible for service discovery. For more details, please refer to the driver documentation.

Example 2. An OpenStack with services, containers and power management

An arbitrary OpenStack deployment can also be handled with the help of the universal driver. In this example os-faults is used as a Python library.

cloud_config = {
    'cloud_management': {
        'driver': 'universal',
    },
    # Every node of the cloud, with the credentials used to reach it
    'node_discover': {
        'driver': 'node_list',
        'args': [
            {
                'ip': '',
                'auth': {
                    'username': 'root',
                    'private_key_file': 'openstack_key',
                },
            },
            {
                'ip': '',
                'auth': {
                    'username': 'root',
                    'private_key_file': 'openstack_key',
                },
            },
        ],
    },
    'services': {
        'memcached': {
            'driver': 'system_service',
            'args': {
                'service_name': 'memcached',
                'grep': 'memcached',
            },
        },
    },
    'containers': {
        'neutron_api': {
            'driver': 'docker_container',
            'args': {
                'container_name': 'neutron_api',
            },
        },
    },
    # A list, so bare-metal and virtualized power drivers can be mixed
    'power_managements': [
        {
            'driver': 'libvirt',
            'args': {
                'connection_uri': 'qemu+unix:///system',
            },
        },
    ],
}

The config contains all OpenStack nodes with credentials and all services/containers. OS-Faults will automatically figure out the mapping between services/containers and nodes. Power management configuration is flexible and supports mixed bare-metal / virtualized deployments.

First, let's establish a connection to the cloud and verify it:

import os_faults

cloud_management = os_faults.connect(cloud_config)
cloud_management.verify()

The library can also read configuration from a file in YAML or JSON format. The configuration file can be specified in the OS_FAULTS_CONFIG environment variable. By default, the library searches for a file named os-faults.{json,yaml,yml} in one of the following locations:

  • current directory
  • ~/.config/os-faults
  • /etc/openstack
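The lookup described above can be sketched roughly as follows. This is not the library's own implementation: it assumes that OS_FAULTS_CONFIG takes precedence when set, and that directories and extensions are tried in the order listed. The names `find_config` and `default_locations` are hypothetical.

```python
import os

CONFIG_BASENAME = 'os-faults'
CONFIG_EXTENSIONS = ('json', 'yaml', 'yml')


def default_locations():
    """Directories from the list above, in the order they are listed."""
    return [
        os.getcwd(),
        os.path.expanduser('~/.config/os-faults'),
        '/etc/openstack',
    ]


def find_config(environ=None, locations=None):
    """Return the path of the config file to use, or None if none is found."""
    environ = os.environ if environ is None else environ
    # Assumption: an explicitly configured file wins over the default search.
    explicit = environ.get('OS_FAULTS_CONFIG')
    if explicit:
        return explicit
    directories = default_locations() if locations is None else locations
    for directory in directories:
        for ext in CONFIG_EXTENSIONS:
            candidate = os.path.join(
                directory, '%s.%s' % (CONFIG_BASENAME, ext))
            if os.path.isfile(candidate):
                return candidate
    return None
```

Passing `environ` and `locations` explicitly keeps the function easy to exercise in tests without touching the real environment or file system layout.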

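Now let's perform a destructive action, for example restarting the memcached service defined in the config above:

cloud_management.get_service(name='memcached').restart()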


Human API

Human API is simplified and self-descriptive. It includes multiple commands that are written like normal English sentences.

A service-oriented command performs the specified action against a service on all nodes, on one random node, or on the node specified by FQDN:

<action> <service> service [on (random|one|single|<fqdn> node[s])]
  • Restart Keystone service - restarts Keystone service on all nodes.
  • kill nova-api service on one node - kills Nova API on one randomly-picked node.

A container-oriented command performs the specified action against a container on all nodes, on one random node, or on the node specified by FQDN:

<action> <container> container [on (random|one|single|<fqdn> node[s])]
  • Restart neutron_ovs_agent container - restarts neutron_ovs_agent container on all nodes.
  • Terminate neutron_api container on one node - stops Neutron API container on one randomly-picked node.
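To make the service-oriented and container-oriented grammars above concrete, here is a small illustrative parser. It is only a sketch of how such sentences decompose, not the parser os-faults actually uses; the function name `parse_command` and the returned dictionary keys are hypothetical.

```python
import re

# <action> <name> service|container [on (random|one|single|<fqdn>) node[s]]
COMMAND_RE = re.compile(
    r'^(?P<action>\w+)\s+(?P<target>\S+)\s+(?P<kind>service|container)'
    r'(?:\s+on\s+(?P<selector>\S+)\s+nodes?)?$',
    re.IGNORECASE,
)

RANDOM_SELECTORS = ('random', 'one', 'single')


def parse_command(command):
    """Split a service/container command into its parts."""
    match = COMMAND_RE.match(command.strip())
    if match is None:
        raise ValueError('Unrecognized command: %r' % command)
    selector = match.group('selector')
    if selector is None:
        scope = 'all'        # no "on ..." clause: act on all nodes
    elif selector.lower() in RANDOM_SELECTORS:
        scope = 'random'     # one randomly picked node
    else:
        scope = 'fqdn'       # a specific node given by FQDN
    return {
        'action': match.group('action').lower(),
        'target': match.group('target'),
        'kind': match.group('kind').lower(),
        'scope': scope,
        'fqdn': selector if scope == 'fqdn' else None,
    }


print(parse_command('kill nova-api service on one node'))
```

Node-oriented and network-oriented commands follow a different word order and would need their own patterns.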

A node-oriented command performs the specified action on the node specified by FQDN or on the set of nodes that run a given service:

<action> [random|one|single|<fqdn>] node[s] [with <service> service]
  • Reboot one node with mysql - reboots one random node with MySQL.
  • Reset node-2.domain.tld node - resets node node-2.domain.tld.

A network-oriented command is a subset of the node-oriented commands and performs a network management operation on the selected nodes:

<action> <network> network on [random|one|single|<fqdn>] node[s]
    [with <service> service]
  • Disconnect management network on nodes with rabbitmq service - shuts down management network interface on all nodes where rabbitmq runs.
  • Connect storage network on node-1.domain.tld node - enables storage network interface on node-1.domain.tld.

Extended API

1. Service actions

Get a service and restart it:

cloud_management = os_faults.connect(cloud_config)
service = cloud_management.get_service(name='glance-api')
service.restart()

Available actions:
  • start - start Service
  • terminate - terminate Service gracefully
  • restart - restart Service
  • kill - terminate Service abruptly
  • unplug - unplug Service out of network
  • plug - plug Service into network

2. Container actions

Get a container and restart it:

cloud_management = os_faults.connect(cloud_config)
container = cloud_management.get_container(name='neutron_api')
container.restart()

Available actions:
  • start - start Container
  • terminate - terminate Container gracefully
  • restart - restart Container

3. Node actions

Get all nodes in the cloud and reboot them:

nodes = cloud_management.get_nodes()
nodes.reboot()

Available actions:
  • reboot - reboot all nodes gracefully
  • poweroff - power off all nodes abruptly
  • reset - reset (cold restart) all nodes
  • disconnect - disable network with the specified name on all nodes
  • connect - enable network with the specified name on all nodes
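When the action name comes from test parameters rather than being hard-coded, a thin dispatcher over the actions listed above can help. The sketch below assumes the node collection exposes one method per listed action and that `disconnect` and `connect` take a `network_name` keyword argument. The helper `run_node_action` is hypothetical and not part of os-faults.

```python
NODE_ACTIONS = ('reboot', 'poweroff', 'reset', 'disconnect', 'connect')
NETWORK_ACTIONS = ('disconnect', 'connect')


def run_node_action(nodes, action, network_name=None):
    """Call the named action on a node collection, rejecting unknown names."""
    if action not in NODE_ACTIONS:
        raise ValueError('Unknown node action: %r' % action)
    method = getattr(nodes, action)
    if action in NETWORK_ACTIONS:
        # Assumption: network actions need the name of the network to toggle.
        if network_name is None:
            raise ValueError('%s requires a network name' % action)
        return method(network_name=network_name)
    return method()


# Usage against a real cloud (requires a working os-faults connection):
# run_node_action(cloud_management.get_nodes(), 'reboot')
# run_node_action(cloud_management.get_nodes(), 'disconnect',
#                 network_name='management')
```

Validating the name up front avoids calling an arbitrary attribute by mistake, which matters for operations as destructive as these.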

4. Operate with nodes

Get all nodes where a service runs, pick one of them and reset:

nodes = service.get_nodes()
one = nodes.pick()
one.reset()

Get nodes where l3-agent runs and disable the management network on them:

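# neutron.l3_agent_list_hosting_router stands for a call returning the FQDNs of the hosts running the router's l3-agent
fqdns = neutron.l3_agent_list_hosting_router(router_id)
nodes = cloud_management.get_nodes(fqdns=fqdns)
nodes.disconnect(network_name='management')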

5. Operate with services

Restart a service on a single node:

service = cloud_management.get_service(name='keystone')
nodes = service.get_nodes().pick()
service.restart(nodes)

6. Operate with containers

Terminate a container on a random node:

container = cloud_management.get_container(name='neutron_ovs_agent')
nodes = container.get_nodes().pick()
container.terminate(nodes)
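The pattern in sections 5 and 6 is the same for services and containers: get the nodes, pick one, and pass it to the action. A small helper could capture it, as sketched below. It assumes that service and container action methods accept a node collection argument; `act_on_one_node` is a hypothetical name, not an os-faults function.

```python
ALLOWED_ACTIONS = ('start', 'terminate', 'restart', 'kill', 'unplug', 'plug')


def act_on_one_node(target, action):
    """Run `action` of a service or container on one randomly picked node.

    `target` is a service or container object obtained from os-faults.
    Returns the picked node collection so the caller can log or reuse it.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError('Unknown action: %r' % action)
    one = target.get_nodes().pick()
    # Assumption: the action method takes the nodes to operate on.
    getattr(target, action)(one)
    return one


# Usage against a real cloud (requires a working os-faults connection):
# container = cloud_management.get_container(name='neutron_ovs_agent')
# act_on_one_node(container, 'terminate')
```

Note that kill, unplug and plug are listed only for services, so calling them on a container would be expected to fail.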

License notes

Ansible is distributed under the GPL-3.0 license, and thus all programs that link with its code are subject to GPL restrictions [1]. However, these restrictions do not apply to the os-faults library, since it invokes Ansible as a process [2][3].

Ansible modules are provided under the Apache license (compatible with the GPL) [4]. These modules import part of the Ansible runtime (the modules API) and are executed on remote hosts. The os-faults library does not import these modules, either statically or dynamically.

[1] [2] [3] [4]