Merge "Spec with test cases to extend HA testing"
==========================================
HA tests improvements
==========================================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/fuel/+spec/ha-test-improvements

Problem description
===================

We need to add new HA tests and modify the existing ones.

Proposed change
===============

We need to clarify the list of new tests and new checks,
and then implement them in the system tests.

Alternatives
------------

No alternatives

Data model impact
-----------------

No impact

REST API impact
---------------

No impact

Upgrade impact
--------------

No impact

Security impact
---------------

No impact

Notifications impact
--------------------

No impact

Other end user impact
---------------------

No impact

Performance Impact
------------------

No impact

Other deployer impact
---------------------

No impact

Developer impact
----------------

No impact

Implementation
==============

Assignee(s)
-----------

Can be implemented by the fuel-qa team in parallel

Work Items
----------

1. Shut down public vip two times
   (link to bug: https://bugs.launchpad.net/fuel/+bug/1311749)

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
   2. Find the node holding the public vip
   3. Shut down the eth interface carrying the public vip
   4. Check that the vip is recovered
   5. Find the node on which the vip was recovered
   6. Shut down the eth interface with the public vip one more time
   7. Check that the vip is recovered
   8. Run OSTF
   9. Do the same for the management vip

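   A minimal sketch of steps 2-3 and 5-6, assuming Pacemaker manages the
   public vip as a resource named ``vip__public`` (the Fuel default) and
   that interface names are discovered on the node; placeholders are in
   angle brackets::

       # Locate the node currently holding the public vip resource
       crm_resource --resource vip__public --locate

       # On that node: find the interface carrying the vip and shut it down
       ip addr show | grep -B 2 "<public_vip_address>"
       ip link set dev <iface_with_vip> down

       # After failover, the resource should report a different node
       crm_resource --resource vip__public --locate
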
2. Galera does not reassemble on galera quorum loss
   (link to bug: https://bugs.launchpad.net/fuel/+bug/1350545)

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
   2. Shut down one controller
   3. Wait for the galera cluster to reassemble (the OSTF HA health check passes)
   4. Kill mysqld on the second controller
   5. Start the first controller
   6. Wait up to 5 minutes for galera to reassemble and verify that it does
   7. Run OSTF
   8. Check rabbit status with the MOS script

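   A possible shell-level check for steps 4 and 6; wsrep_cluster_size is a
   standard Galera status variable::

       # Step 4: hard-kill mysqld on the second controller
       pkill -9 mysqld

       # Step 6: poll for up to 5 minutes until all 3 controllers rejoin
       for i in $(seq 1 30); do
           size=$(mysql -Nse "SHOW STATUS LIKE 'wsrep_cluster_size'" | awk '{print $2}')
           [ "$size" = "3" ] && echo "galera reassembled" && break
           sleep 10
       done
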
3. Corrupt root file system on primary controller

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
   2. Corrupt the root file system on the primary controller
   3. Run OSTF

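   One illustrative (and intentionally destructive) way to perform step 2,
   assuming the controller's root filesystem lives on /dev/sda; the device
   name is an assumption::

       # Overwrite part of the root block device to corrupt the filesystem
       dd if=/dev/urandom of=/dev/sda bs=1M count=100
       # Reboot immediately without sync so the damage takes effect
       echo b > /proc/sysrq-trigger
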
4. Block corosync traffic
   (link to bug: https://bugs.launchpad.net/fuel/+bug/1354520)

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
   2. Log in to the rabbit master node
   3. Block corosync traffic by extracting the interface from the management bridge
   4. Unblock corosync traffic
   5. Check rabbitmqctl cluster_status on the rabbit master node
   6. Run OSTF HA tests

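   A sketch of steps 3-5, assuming the management network sits on a Linux
   bridge named br-mgmt with a tagged member interface; bridge and
   interface names are illustrative::

       # Step 3: pull the interface out of the management bridge
       brctl delif br-mgmt eth0.101

       # Step 4: put it back
       brctl addif br-mgmt eth0.101

       # Step 5: all three controllers should be listed as running nodes
       rabbitmqctl cluster_status
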
5. HA scalability for mongo

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 1 controller and 3 mongo nodes
   2. Add 2 controller nodes and re-deploy the cluster
   3. Run OSTF
   4. Add 2 mongo nodes and re-deploy the cluster
   5. Run OSTF

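   Steps 2 and 4 could be scripted with the Fuel CLI along these lines; the
   environment id, node ids, and exact flag spellings are assumptions that
   may differ between Fuel client versions::

       # Step 2: add two more controllers and redeploy
       fuel --env 1 node set --node 4,5 --role controller
       fuel --env 1 deploy-changes

       # Step 4: add two more mongo nodes and redeploy
       fuel --env 1 node set --node 6,7 --role mongo
       fuel --env 1 deploy-changes
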
6. Lock DB access on primary controller

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
   2. Lock DB access on the primary controller
   3. Run OSTF

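   One way to realize step 2, assuming MySQL/Galera listens on its standard
   ports (3306 for clients, 4567 for group communication)::

       # Drop incoming MySQL client and Galera replication traffic
       iptables -I INPUT -p tcp --dport 3306 -j DROP
       iptables -I INPUT -p tcp --dport 4567 -j DROP
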
7. Test HA failover on clusters with bonding

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Neutron VLAN, 3 controllers, 2 computes;
      eth1-eth4 interfaces bonded in active-backup mode
   2. Destroy the primary controller
   3. Check pacemaker status
   4. Run OSTF
   5. Check rabbit status with the MOS script
      (retry for up to 5 minutes until a successful result)

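   A sketch of the retry in step 5; the MOS health-check script is not
   named in this spec, so rabbitmqctl is used as a stand-in::

       # Retry for up to 5 minutes until rabbit reports a healthy cluster
       for i in $(seq 1 30); do
           rabbitmqctl cluster_status && break
           sleep 10
       done
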
8. HA load testing with Rally
   (may not be a part of this blueprint)

9. Test an HA Neutron cluster under high load with simultaneous
   removal of virtual router ports
   (related link: http://lists.openstack.org/pipermail/openstack-operators/2014-September/005165.html)

10. Cinder Neutron Plugin

    Steps:

    1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes;
       cinder-neutron plugin enabled
    2. Run network verification
    3. Run OSTF

11. RabbitMQ failover test for the compute service

    Steps (a command sketch follows the list):

    1. Deploy an HA cluster with Nova-network, 3 controllers,
       2 computes with cinder roles
    2. Disable one compute node with::

           nova-manage service disable --host=<compute_node_name> --service=nova-compute

    3. On the controller node under test (the one the compute node under
       test is connected to via rmq port 5673), repeat spawn/destroy
       instance requests continuously (sleep 60) while the test is running
    4. Add an iptables rule blocking traffic from the compute IP to
       controller_IP:5673 (take care of conntrack as well)::

           iptables -I INPUT 1 -s <compute_IP> -p tcp --dport 5673 -m state \
               --state NEW,ESTABLISHED,RELATED -j DROP

    5. Wait 3 minutes for the compute node under test to be marked as down
       in nova service-list
    6. Wait another 3 minutes for it to be brought back up
    7. Check the queue of the compute node under test - it should contain
       zero messages
    8. Check that an instance can be spawned on the node

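    For the conntrack caveat in step 4 and the queue check in step 7, one
    possible approach; the queue name filter is an assumption::

        # Step 4: also flush established conntrack entries for the blocked flow
        conntrack -D -p tcp --dport 5673

        # Step 7: the compute node's queue should show 0 messages
        rabbitmqctl list_queues name messages | grep <compute_node_name>
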
12. Check monit on compute nodes

    Steps (a command sketch follows the list):

    1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
    2. SSH to every compute node
    3. Kill the nova-compute service
    4. Check that the service was restarted by monit

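    A minimal sketch of steps 3-4 on a compute node; the process match
    pattern and monit's polling interval are assumptions::

        # Step 3: hard-kill the nova-compute process
        pkill -9 -f nova-compute

        # Step 4: within monit's polling interval the service should be back
        sleep 60
        pgrep -f nova-compute && echo "restarted by monit"
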
13. Check that pacemaker restarts heat-engine if the amqp connection is lost

    Steps (a command sketch follows the list):

    1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
    2. SSH to the controller running heat-engine
    3. Check heat-engine status
    4. Block the heat-engine amqp connections
    5. Check whether heat-engine was moved to another controller or stopped
       on the current controller
    6. If moved, ssh to the node now running heat-engine:

       6.1. Check that heat-engine is running
       6.2. Check that heat-engine has some amqp connections

    7. If stopped, check that the heat-engine process is running with a new pid:

       7.1. Unblock the heat-engine amqp connections
       7.2. Check that the amqp connection re-appears for heat-engine

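    A sketch of steps 3-4, assuming heat-engine is a pacemaker resource
    named p_heat-engine (the resource name is an assumption) and that amqp
    listens on port 5673; note the iptables rule is coarse and blocks all
    amqp clients on this controller::

        # Step 3: where is heat-engine running now?
        crm_resource --resource p_heat-engine --locate

        # Step 4: block outgoing amqp connections on this controller
        iptables -I OUTPUT -p tcp --dport 5673 -j DROP

        # Later: check connections owned by the heat-engine pid
        netstat -tnp | grep 5673 | grep "$(pgrep -f heat-engine | head -1)"
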
14. Neutron agent rescheduling

    Steps (a command sketch follows the list):

    1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes
    2. Check the consistency of the neutron agent list (no duplicates,
       alive statuses, etc.)
    3. On the host with the l3 agent, create one more router
    4. Check that there are 2 namespaces
    5. Destroy the controller with the l3 agent
    6. Check that the agent was moved to another controller; check that all
       routers and namespaces were moved
    7. Check that the metadata agent was also moved and that there is a
       process in the router namespace listening on port 8775

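    A sketch of the checks in steps 2, 4, and 6, using the neutron CLI of
    that release and ip netns; the router id is a placeholder::

        # Step 2: agents should be unique and alive
        neutron agent-list

        # Step 4: two qrouter namespaces should exist on the l3-agent host
        ip netns list | grep qrouter

        # Step 6: after failover the router should sit on a live l3 agent
        neutron l3-agent-list-hosting-router <router_id>
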
15. DHCP agent rescheduling

    Steps (a command sketch follows the list):

    1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes
    2. Destroy the controller with the dhcp agent
    3. Check that the agent was moved to another controller
    4. Check that the metadata agent was also moved and that there is a
       process in the router namespace listening on port 8775

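    The checks in steps 3-4 could look like this; network and router ids
    are placeholders::

        # Step 3: the network should be hosted by a dhcp agent on a live node
        neutron dhcp-agent-list-hosting-net <network_id>

        # Step 4: the metadata proxy should listen on 8775 in the namespace
        ip netns exec qrouter-<router_id> netstat -tnlp | grep 8775
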
Dependencies
============


Testing
=======


Documentation Impact
====================


References
==========