==========================================
HA tests improvements
==========================================
Launchpad blueprint:
https://blueprints.launchpad.net/fuel/+spec/ha-test-improvements
Problem description
===================
We need to add new HA tests and modify the existing ones
Proposed change
===============
We need to define the list of new tests and new checks
and then implement them in the system tests
Alternatives
------------
No alternatives
Data model impact
-----------------
No impact
REST API impact
---------------
No impact
Upgrade impact
--------------
No impact
Security impact
---------------
No impact
Notifications impact
--------------------
No impact
Other end user impact
---------------------
No impact
Performance Impact
------------------
No impact
Other deployer impact
---------------------
No impact
Developer impact
----------------
No impact
Implementation
==============
Assignee(s)
-----------
Can be implemented by the fuel-qa team in parallel
Work Items
----------
1. Shut down public vip two times
(link to bug https://bugs.launchpad.net/fuel/+bug/1311749)
Steps:
1. Deploy HA cluster with Nova-network, 3 controllers, 2 compute
2. Find node with public vip
3. Shut down eth with public vip
4. Check vip is recovered
5. Find node on which vip is recovered
6. Shut down eth with public vip one more time
7. Check vip is recovered
8. Run OSTF
9. Do the same for management vip
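
A minimal sketch of these checks from a controller shell, assuming the
Pacemaker resources are named vip__public / vip__management and the public
vip lives on br-ex (names are assumptions and may differ per environment)::

  # step 2: find the node currently holding the public vip
  crm_resource --resource vip__public --locate

  # step 3: on that node, bring the interface carrying the vip down
  ip link set br-ex down

  # steps 4-5: from another controller, verify the vip was restarted elsewhere
  crm_resource --resource vip__public --locate
  crm status | grep vip__public
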
2. Galera does not reassemble on galera quorum loss
(link to bug https://bugs.launchpad.net/fuel/+bug/1350545)
Steps:
1. Deploy HA cluster with Nova-network, 3 controllers, 2 compute
2. Shut down one controller
3. Wait for the Galera cluster to reassemble (HA health check passes)
4. Kill mysqld on second controller
5. Start first controller
6. Wait up to 5 minutes for Galera to reassemble and check that it has
7. Run OSTF
8. Check rabbit status with MOS script
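
The Galera and RabbitMQ state can be verified with standard commands, for
example (a sketch; the MOS rabbit check script itself is not reproduced
here)::

  # cluster size and state as seen from a surviving controller
  mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
  mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status';"   # expect "Primary"

  # step 4: kill mysqld on the second controller
  pkill -9 mysqld

  # step 8: RabbitMQ cluster membership
  rabbitmqctl cluster_status
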
3. Corrupt root file system on primary controller
Steps:
1. Deploy HA cluster with Nova-network, 3 controllers, 2 compute
2. Corrupt root file system on primary controller
3. Run OSTF
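
One way to emulate root file system corruption on the primary controller
(a destructive sketch; the device name is an assumption and depends on the
environment)::

  # overwrite the beginning of the root device
  dd if=/dev/zero of=/dev/sda bs=1M count=100
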
4. Block corosync traffic
(link to bug https://bugs.launchpad.net/fuel/+bug/1354520)
Steps:
1. Deploy HA cluster with Nova-network, 3 controllers, 2 compute
2. Login to rabbit master node
3. Block corosync traffic by removing the interface from the management bridge
4. Unblock corosync traffic by adding the interface back
5. Check rabbitmqctl cluster_status at rabbit master node
6. Run OSTF HA tests
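
A possible sketch of steps 2-5, assuming the management network sits on the
br-mgmt OVS bridge (bridge, port and resource names are assumptions)::

  # step 2: the rabbit master is the master of the multi-state
  # p_rabbitmq-server resource
  crm status | grep -A 2 rabbitmq

  # step 3: block corosync traffic by pulling the management port
  # out of the bridge
  ovs-vsctl del-port br-mgmt eth0.101

  # step 4: restore it
  ovs-vsctl add-port br-mgmt eth0.101

  # step 5: verify the cluster state
  rabbitmqctl cluster_status
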
5. HA scalability for mongo
Steps:
1. Deploy HA cluster with Nova-network, 1 controller and 3 mongo nodes
2. Add 2 controller nodes and re-deploy cluster
3. Run OSTF
4. Add 2 mongo nodes and re-deploy cluster
5. Run OSTF
6. Lock DB access on primary controller
Steps:
1. Deploy HA cluster with Nova-network, 3 controllers, 2 compute
2. Lock DB access on primary controller
3. Run OSTF
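
Locking DB access can be emulated by dropping MySQL and Galera traffic on
the primary controller, for example (the exact rules are an assumption)::

  iptables -I INPUT -p tcp --dport 3306 -j DROP   # MySQL client connections
  iptables -I INPUT -p tcp --dport 4567 -j DROP   # Galera replication
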
7. Need to test HA failover on clusters with bonding
Steps:
1. Deploy HA cluster with Neutron Vlan, 3 controllers, 2 compute,
eth1-eth4 interfaces are bonded in active-backup mode
2. Destroy primary controller
3. Check pacemaker status
4. Run OSTF
5. Check rabbit status with MOS script
(retry for 5 minutes until a successful result)
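
Example checks after the primary controller is destroyed (a sketch;
rabbitmqctl cluster_status stands in for the MOS rabbit status script)::

  # step 3: pacemaker view from a surviving controller
  crm status

  # step 5: retry the rabbit check for up to 5 minutes
  for i in $(seq 1 10); do
    rabbitmqctl cluster_status && break
    sleep 30
  done
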
8. HA load testing with rally
(may not be a part of this blueprint)
9. Need to test an HA Neutron cluster under high load with simultaneous
removal of virtual router ports (related link:
http://lists.openstack.org/pipermail/openstack-operators/2014-September/005165.html)
10. Cinder Neutron Plugin
Steps:
1. Deploy HA cluster with Neutron GRE, 3 controllers, 2 compute,
cinder-neutron plugin enabled
2. Run network verification
3. Run OSTF
11. Rmq failover test for compute service
Steps:
1. Deploy HA cluster with Nova-network, 3 controllers,
2 compute with cinder roles
2. Disable one compute node with
nova-manage service disable --host=<compute_node_name> --service=nova-compute
3. On the controller node under test (the one the compute node under test
connects to via rmq port 5673), repeat spawn / destroy instance requests
continuously (sleep 60) while the test is running
4. Add iptables block rule from compute IP to controller IP:5673
(take care of conntrack as well)
iptables -I INPUT 1 -s compute_IP -p tcp --dport 5673 -m state
--state NEW,ESTABLISHED,RELATED -j DROP
5. Wait 3 minutes for the compute node under test to be marked as down
in nova service-list
6. Wait another 3 minutes for it to be brought back up
7. Check the queue of the compute node under test - it should contain
zero messages
8. Check that an instance can be spawned on the node
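
The block rule and the service / queue checks might look like this (the
queue name pattern is an assumption)::

  # step 4: block the compute's AMQP traffic and flush the matching
  # conntrack entries so the established connection is cut as well
  iptables -I INPUT 1 -s <compute_IP> -p tcp --dport 5673 \
    -m state --state NEW,ESTABLISHED,RELATED -j DROP
  conntrack -D -p tcp --orig-port-dst 5673

  # steps 5-7: service state and queue depth
  nova service-list | grep nova-compute
  rabbitmqctl list_queues name messages | grep <compute_node_name>
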
12. Check monit on compute nodes
Steps:
1. Deploy HA cluster with Nova-network, 3 controllers, 2 compute
2. Ssh to every compute node
3. Kill nova-compute service
4. Check that service was restarted by monit
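
A sketch of the check on each compute node (the monit service name is an
assumption)::

  pgrep -f nova-compute              # note the current pid
  pkill -9 -f nova-compute           # step 3: kill the service
  sleep 60
  pgrep -f nova-compute              # a new pid means monit restarted it
  monit summary | grep nova-compute
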
13. Check pacemaker restarts heat-engine in case of losing amqp connection
Steps:
1. Deploy HA cluster with Nova-network, 3 controllers, 2 compute
2. SSH to controller with running heat-engine
3. Check heat-engine status
4. Block heat-engine amqp connections
5. Check if heat-engine was moved to another controller or stopped
on current controller
6. If moved - ssh to node with running heat-engine
6.1 Check heat-engine is running
6.2 Check heat-engine has some amqp connections
7. If stopped - check heat-engine process is running with new pid
7.1 Unblock heat-engine amqp connections
7.2 Check amqp connection re-appears for heat-engine
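
Example commands for the heat-engine checks (the Pacemaker resource name,
the heat system user and the AMQP port are assumptions based on a typical
Fuel deployment)::

  # step 3: where heat-engine is running
  crm_resource --resource p_heat-engine --locate
  pgrep -f heat-engine

  # step 4: drop AMQP traffic originating from the heat user only
  iptables -I OUTPUT -p tcp --dport 5673 -m owner --uid-owner heat -j DROP

  # steps 6-7: AMQP connections of the heat-engine process
  ss -ntp | grep 5673 | grep heat-engine
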
14. Neutron agent rescheduling
Steps:
1. Deploy HA cluster with Neutron GRE, 3 controllers, 2 compute
2. Check the neutron agent list consistency (no duplicates,
all agents alive, etc.)
3. On host with l3 agent create one more router
4. Check there are 2 namespaces
5. Destroy controller with l3 agent
6. Check the l3 agent was moved to another controller and that all routers
and namespaces were moved
7. Check the metadata agent was also moved and there is a process in the
router namespace listening on port 8775
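
Example neutron and namespace checks (the router name is an assumption)::

  # step 2: agent list consistency
  neutron agent-list

  # steps 3-4: create one more router and check the namespaces
  neutron router-create test_router
  ip netns list | grep qrouter

  # step 6: which node hosts the router after the failover
  neutron l3-agent-list-hosting-router test_router

  # step 7: metadata proxy listening inside the router namespace
  ip netns exec qrouter-<router_id> netstat -lnpt | grep 8775
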
15. DHCP agent rescheduling
Steps:
1. Deploy HA cluster with Neutron GRE, 3 controllers, 2 compute
2. Destroy controller with dhcp agent
3. Check the dhcp agent was moved to another controller
4. Check the metadata agent was also moved and there is a process in the
router namespace listening on port 8775
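
Example check after the controller is destroyed (the network name is an
assumption)::

  # which node currently hosts the DHCP agent for the network
  neutron dhcp-agent-list-hosting-net net04
  neutron agent-list | grep -i dhcp
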
Dependencies
============
Testing
=======
Documentation Impact
====================
References
==========