From 52bc239da1b8375493beed8a9853f742a86db029 Mon Sep 17 00:00:00 2001
From: asledzinskiy
Date: Tue, 30 Sep 2014 17:31:42 +0300
Subject: [PATCH] Spec with test cases to extend HA testing

Change-Id: I36b498843b707709afa0f880ab209cfaa0bb34a6
Implements: blueprint ha-test-improvements
---
 specs/6.0/ha_tests.rst | 258 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)
 create mode 100644 specs/6.0/ha_tests.rst

diff --git a/specs/6.0/ha_tests.rst b/specs/6.0/ha_tests.rst
new file mode 100644
index 00000000..b58320a1
--- /dev/null
+++ b/specs/6.0/ha_tests.rst
@@ -0,0 +1,258 @@

==========================================
HA tests improvements
==========================================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/fuel/+spec/ha-test-improvements


Problem description
===================

We need to add new HA tests and modify the existing ones.


Proposed change
===============

We need to clarify the list of new tests and new checks,
and then implement them in the system tests.

Alternatives
------------

No alternatives

Data model impact
-----------------

No impact

REST API impact
---------------

No impact

Upgrade impact
--------------

No impact

Security impact
---------------

No impact

Notifications impact
--------------------

No impact

Other end user impact
---------------------

No impact

Performance Impact
------------------

No impact

Other deployer impact
---------------------

No impact

Developer impact
----------------

No impact

Implementation
==============

Assignee(s)
-----------

Can be implemented by the fuel-qa team in parallel.

Work Items
----------

1. Shut down the public VIP two times
(link to bug https://bugs.launchpad.net/fuel/+bug/1311749)

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. Find the node hosting the public VIP
3. Shut down the interface that carries the public VIP
4. Check that the VIP is recovered (see the sketch after work item 5)
5. Find the node on which the VIP was recovered
6. Shut down the interface with the public VIP one more time
7. Check that the VIP is recovered
8. Run OSTF
9. Repeat the same steps for the management VIP

2. Galera does not reassemble on Galera quorum loss
(link to bug https://bugs.launchpad.net/fuel/+bug/1350545)

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. Shut down one controller
3. Wait for the Galera cluster to reassemble (HA health check passes)
4. Kill mysqld on the second controller
5. Start the first controller
6. Wait up to 5 minutes for Galera to reassemble and check that it does
7. Run OSTF
8. Check RabbitMQ status with the MOS script

3. Corrupt the root file system on the primary controller

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. Corrupt the root file system on the primary controller
3. Run OSTF

4. Block corosync traffic
(link to bug https://bugs.launchpad.net/fuel/+bug/1354520)

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. Log in to the RabbitMQ master node
3. Block corosync traffic by removing the interface from the management bridge
4. Unblock corosync traffic by adding the interface back
5. Check rabbitmqctl cluster_status on the RabbitMQ master node
6. Run OSTF HA tests

5. HA scalability for MongoDB

Steps:

1. Deploy an HA cluster with Nova-network, 1 controller and 3 mongo nodes
2. Add 2 controller nodes and re-deploy the cluster
3. Run OSTF
4. Add 2 mongo nodes and re-deploy the cluster
5. Run OSTF
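
The VIP recovery checks in work item 1 can be automated along the following
lines. This is a minimal illustrative sketch, not the fuel-qa implementation:
it assumes the pacemaker VIP resource is named ``vip__public``, that the test
host has passwordless SSH access to a controller, and that the VIP answers
ICMP pings once recovered::

    import subprocess
    import time

    def locate_vip_node(controller, resource='vip__public'):
        # Ask pacemaker which node currently hosts the VIP resource;
        # crm_resource prints "resource <name> is running on: <node>",
        # so the node name is the last token of the output.
        out = subprocess.check_output(
            ['ssh', controller,
             'crm_resource', '--locate', '--resource', resource])
        return out.decode().strip().rsplit(' ', 1)[-1]

    def wait_vip_recovered(vip, timeout=300):
        # Poll the VIP with a single ICMP echo until it responds
        # or the timeout expires.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if subprocess.call(['ping', '-c', '1', '-W', '2', vip]) == 0:
                return True
            time.sleep(5)
        return False

After the interface holding the VIP is shut down (step 3), the test would
call ``wait_vip_recovered`` and then ``locate_vip_node`` again to find the
new holder before repeating the failure; the same pair of checks can be
reused for the management VIP.
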
6. Lock DB access on the primary controller

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. Lock DB access on the primary controller
3. Run OSTF

7. Test HA failover on clusters with bonding

Steps:

1. Deploy an HA cluster with Neutron VLAN, 3 controllers, 2 computes,
with the eth1-eth4 interfaces bonded in active-backup mode
2. Destroy the primary controller
3. Check pacemaker status
4. Run OSTF
5. Check RabbitMQ status with the MOS script
(retry for 5 minutes until a successful result)

8. HA load testing with Rally
(may not be a part of this blueprint)

9. Test an HA Neutron cluster under high load with simultaneous
removal of virtual router ports
(related link: http://lists.openstack.org/pipermail/openstack-operators/2014-September/005165.html)

10. Cinder Neutron plugin

Steps:

1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes,
with the cinder-neutron plugin enabled
2. Run network verification
3. Run OSTF

11. RabbitMQ failover test for the compute service

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers,
2 computes with cinder roles
2. Disable one compute node with
nova-manage service disable --host= --service=nova-compute
3. On the controller node under test (the one the compute node under test
is connected to via RabbitMQ port 5673) repeat spawn/destroy instance
requests continuously (sleep 60) while the test is running
4. Add an iptables rule blocking traffic from the compute IP to the
controller IP on port 5673 (take care of conntrack as well):
iptables -I INPUT 1 -s compute_IP -p tcp --dport 5673 -m state
--state NEW,ESTABLISHED,RELATED -j DROP
5. Wait 3 minutes for the compute node under test to be marked as down
in nova service-list
6. Wait another 3 minutes for it to be brought back up
7. Check the queue of the compute node under test - it should contain
zero messages
8. Check that an instance can be spawned on the node

12. Check monit on compute nodes

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. SSH to every compute node
3. Kill the nova-compute service
4. Check that the service was restarted by monit

13. Check that pacemaker restarts heat-engine if it loses its AMQP connection

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. SSH to the controller with the running heat-engine
3. Check heat-engine status
4. Block the heat-engine AMQP connections
5. Check whether heat-engine was moved to another controller or stopped
on the current controller
6. If moved - SSH to the node with the running heat-engine
6.1 Check that heat-engine is running
6.2 Check that heat-engine has some AMQP connections
7. If stopped - check that the heat-engine process is running with a new pid
7.1 Unblock the heat-engine AMQP connections
7.2 Check that AMQP connections re-appear for heat-engine

14. Neutron agent rescheduling

Steps:

1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes
2. Check the neutron agent list consistency (no duplicates,
alive statuses, etc.)
3. On the host with the l3 agent create one more router
4. Check that there are 2 namespaces
5. Destroy the controller with the l3 agent
6. Check that the agent was moved to another controller, and that all
routers and namespaces were moved with it
7. Check that the metadata agent was also moved and that a process in the
router namespace listens on port 8775 (see the sketch below)
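
A minimal sketch of the namespace and metadata checks from steps 6-7 above.
This is illustrative rather than the fuel-qa implementation: it assumes
passwordless SSH access to the controllers and that the metadata proxy is
visible to ``netstat`` inside the ``qrouter-`` namespace::

    import subprocess

    def qrouter_namespaces(node):
        # List the qrouter-* namespaces present on the given controller.
        out = subprocess.check_output(['ssh', node, 'ip', 'netns']).decode()
        return [line.split()[0] for line in out.splitlines()
                if line.startswith('qrouter-')]

    def metadata_proxy_listening(node, namespace):
        # True if a process inside the namespace listens on the
        # metadata port 8775.
        out = subprocess.check_output(
            ['ssh', node, 'ip', 'netns', 'exec', namespace,
             'netstat', '-lnt']).decode()
        return any(':8775' in line for line in out.splitlines())

The same helpers apply to work item 15 below, with ``qdhcp-`` substituted
for ``qrouter-``.
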
15. DHCP agent rescheduling

Steps:

1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes
2. Destroy the controller with the dhcp agent
3. Check that the agent was moved to another controller
4. Check that the metadata agent was also moved and that a process in the
dhcp namespace listens on port 8775

Dependencies
============


Testing
=======


Documentation Impact
====================


References
==========