From 52bc239da1b8375493beed8a9853f742a86db029 Mon Sep 17 00:00:00 2001
From: asledzinskiy
Date: Tue, 30 Sep 2014 17:31:42 +0300
Subject: [PATCH] Spec with test cases to extend HA testing

Change-Id: I36b498843b707709afa0f880ab209cfaa0bb34a6
Implements: blueprint ha-test-improvements
---
 specs/6.0/ha_tests.rst | 258 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)
 create mode 100644 specs/6.0/ha_tests.rst

diff --git a/specs/6.0/ha_tests.rst b/specs/6.0/ha_tests.rst
new file mode 100644
index 00000000..b58320a1
--- /dev/null
+++ b/specs/6.0/ha_tests.rst
@@ -0,0 +1,258 @@

==========================================
HA tests improvements
==========================================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/fuel/+spec/ha-test-improvements


Problem description
===================

We need to add new HA tests and modify the existing ones.


Proposed change
===============

We need to clarify the list of new tests and new checks,
and then implement them in the system tests.

Alternatives
------------

No alternatives

Data model impact
-----------------

No impact

REST API impact
---------------

No impact

Upgrade impact
--------------

No impact

Security impact
---------------

No impact

Notifications impact
--------------------

No impact

Other end user impact
---------------------

No impact

Performance Impact
------------------

No impact

Other deployer impact
---------------------

No impact

Developer impact
----------------

No impact

Implementation
==============

Assignee(s)
-----------

Can be implemented by the fuel-qa team in parallel.

Work Items
----------

1. Shut down the public VIP two times
(link to bug https://bugs.launchpad.net/fuel/+bug/1311749)

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. Find the node hosting the public VIP
3. Shut down the interface that carries the public VIP
4. Check that the VIP is recovered (see the sketch after work item 5)
5. Find the node on which the VIP was recovered
6. Shut down the interface with the public VIP one more time
7. Check that the VIP is recovered
8. Run OSTF
9. Repeat the same steps for the management VIP

2. Galera does not reassemble on Galera quorum loss
(link to bug https://bugs.launchpad.net/fuel/+bug/1350545)

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. Shut down one controller
3. Wait for the Galera cluster to reassemble (HA health check passes)
4. Kill mysqld on the second controller
5. Start the first controller
6. Wait up to 5 minutes for Galera to reassemble and check that it does
7. Run OSTF
8. Check RabbitMQ status with the MOS script

3. Corrupt the root file system on the primary controller

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. Corrupt the root file system on the primary controller
3. Run OSTF

4. Block corosync traffic
(link to bug https://bugs.launchpad.net/fuel/+bug/1354520)

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. Log in to the RabbitMQ master node
3. Block corosync traffic by removing the interface from the management bridge
4. Unblock corosync traffic by adding the interface back
5. Check rabbitmqctl cluster_status on the RabbitMQ master node
6. Run OSTF HA tests

5. HA scalability for MongoDB

Steps:

1. Deploy an HA cluster with Nova-network, 1 controller and 3 mongo nodes
2. Add 2 controller nodes and re-deploy the cluster
3. Run OSTF
4. Add 2 mongo nodes and re-deploy the cluster
5. Run OSTF
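
The VIP recovery checks in work item 1 can be automated along the following
lines. This is a minimal illustrative sketch, not the fuel-qa implementation:
it assumes the pacemaker VIP resource is named ``vip__public``, that the test
host has passwordless SSH access to a controller, and that the VIP answers
ICMP pings once recovered::

    import subprocess
    import time

    def locate_vip_node(controller, resource='vip__public'):
        # Ask pacemaker which node currently hosts the VIP resource;
        # crm_resource prints "resource <name> is running on: <node>",
        # so the node name is the last token of the output.
        out = subprocess.check_output(
            ['ssh', controller,
             'crm_resource', '--locate', '--resource', resource])
        return out.decode().strip().rsplit(' ', 1)[-1]

    def wait_vip_recovered(vip, timeout=300):
        # Poll the VIP with a single ICMP echo until it responds
        # or the timeout expires.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if subprocess.call(['ping', '-c', '1', '-W', '2', vip]) == 0:
                return True
            time.sleep(5)
        return False

After the interface holding the VIP is shut down (step 3), the test would
call ``wait_vip_recovered`` and then ``locate_vip_node`` again to find the
new holder before repeating the failure; the same pair of checks can be
reused for the management VIP.
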
6. Lock DB access on the primary controller

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. Lock DB access on the primary controller
3. Run OSTF

7. Test HA failover on clusters with bonding

Steps:

1. Deploy an HA cluster with Neutron VLAN, 3 controllers, 2 computes,
with the eth1-eth4 interfaces bonded in active-backup mode
2. Destroy the primary controller
3. Check pacemaker status
4. Run OSTF
5. Check RabbitMQ status with the MOS script
(retry for 5 minutes until a successful result)

8. HA load testing with Rally
(may not be a part of this blueprint)

9. Test an HA Neutron cluster under high load with simultaneous
removal of virtual router ports
(related link: http://lists.openstack.org/pipermail/openstack-operators/2014-September/005165.html)

10. Cinder Neutron plugin

Steps:

1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes,
with the cinder-neutron plugin enabled
2. Run network verification
3. Run OSTF

11. RabbitMQ failover test for the compute service

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers,
2 computes with cinder roles
2. Disable one compute node with
nova-manage service disable --host= --service=nova-compute
3. On the controller node under test (the one the compute node under test
is connected to via RabbitMQ port 5673) repeat spawn/destroy instance
requests continuously (sleep 60) while the test is running
4. Add an iptables rule blocking traffic from the compute IP to the
controller IP on port 5673 (take care of conntrack as well):
iptables -I INPUT 1 -s compute_IP -p tcp --dport 5673 -m state
--state NEW,ESTABLISHED,RELATED -j DROP
5. Wait 3 minutes for the compute node under test to be marked as down
in nova service-list
6. Wait another 3 minutes for it to be brought back up
7. Check the queue of the compute node under test - it should contain
zero messages
8. Check that an instance can be spawned on the node

12. Check monit on compute nodes

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. SSH to every compute node
3. Kill the nova-compute service
4. Check that the service was restarted by monit

13. Check that pacemaker restarts heat-engine if it loses its AMQP connection

Steps:

1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
2. SSH to the controller with the running heat-engine
3. Check heat-engine status
4. Block the heat-engine AMQP connections
5. Check whether heat-engine was moved to another controller or stopped
on the current controller
6. If moved - SSH to the node with the running heat-engine
6.1 Check that heat-engine is running
6.2 Check that heat-engine has some AMQP connections
7. If stopped - check that the heat-engine process is running with a new pid
7.1 Unblock the heat-engine AMQP connections
7.2 Check that AMQP connections re-appear for heat-engine

14. Neutron agent rescheduling

Steps:

1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes
2. Check the neutron agent list consistency (no duplicates,
alive statuses, etc.)
3. On the host with the l3 agent create one more router
4. Check that there are 2 namespaces
5. Destroy the controller with the l3 agent
6. Check that the agent was moved to another controller, and that all
routers and namespaces were moved with it
7. Check that the metadata agent was also moved and that a process in the
router namespace listens on port 8775 (see the sketch below)
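
A minimal sketch of the namespace and metadata checks from steps 6-7 above.
This is illustrative rather than the fuel-qa implementation: it assumes
passwordless SSH access to the controllers and that the metadata proxy is
visible to ``netstat`` inside the ``qrouter-`` namespace::

    import subprocess

    def qrouter_namespaces(node):
        # List the qrouter-* namespaces present on the given controller.
        out = subprocess.check_output(['ssh', node, 'ip', 'netns']).decode()
        return [line.split()[0] for line in out.splitlines()
                if line.startswith('qrouter-')]

    def metadata_proxy_listening(node, namespace):
        # True if a process inside the namespace listens on the
        # metadata port 8775.
        out = subprocess.check_output(
            ['ssh', node, 'ip', 'netns', 'exec', namespace,
             'netstat', '-lnt']).decode()
        return any(':8775' in line for line in out.splitlines())

The same helpers apply to work item 15 below, with ``qdhcp-`` substituted
for ``qrouter-``.
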
15. DHCP agent rescheduling

Steps:

1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes
2. Destroy the controller with the dhcp agent
3. Check that the agent was moved to another controller
4. Check that the metadata agent was also moved and that a process in the
dhcp namespace listens on port 8775

Dependencies
============


Testing
=======


Documentation Impact
====================


References
==========