Merge "Spec with test cases to extend HA testing"
==========================================
HA tests improvements
==========================================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/fuel/+spec/ha-test-improvements

Problem description
===================

We need to add new HA tests and modify the existing ones.

Proposed change
===============

We need to clarify the list of new tests and new checks,
and then implement them in the system tests.

Alternatives
------------

No alternatives

Data model impact
-----------------

No impact

REST API impact
---------------

No impact

Upgrade impact
--------------

No impact

Security impact
---------------

No impact

Notifications impact
--------------------

No impact

Other end user impact
---------------------

No impact

Performance Impact
------------------

No impact

Other deployer impact
---------------------

No impact

Developer impact
----------------

No impact

Implementation
==============

Assignee(s)
-----------

Can be implemented by the fuel-qa team in parallel

Work Items
----------

1. Shut down public vip two times
   (link to bug: https://bugs.launchpad.net/fuel/+bug/1311749)

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
   2. Find the node holding the public vip
   3. Shut down the eth interface carrying the public vip
   4. Check that the vip is recovered
   5. Find the node on which the vip was recovered
   6. Shut down the eth interface with the public vip one more time
   7. Check that the vip is recovered
   8. Run OSTF
   9. Do the same for the management vip

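   A minimal sketch of steps 2-3 and 5-6, assuming Pacemaker manages the
   public vip as a resource named ``vip__public`` (the Fuel default) and
   that interface names are discovered on the node; placeholders are in
   angle brackets::

       # Locate the node currently holding the public vip resource
       crm_resource --resource vip__public --locate

       # On that node: find the interface carrying the vip and shut it down
       ip addr show | grep -B 2 "<public_vip_address>"
       ip link set dev <iface_with_vip> down

       # After failover, the resource should report a different node
       crm_resource --resource vip__public --locate
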
2. Galera does not reassemble on galera quorum loss
   (link to bug: https://bugs.launchpad.net/fuel/+bug/1350545)

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
   2. Shut down one controller
   3. Wait for the galera cluster to reassemble (the OSTF HA health check passes)
   4. Kill mysqld on the second controller
   5. Start the first controller
   6. Wait up to 5 minutes for galera to reassemble and verify that it does
   7. Run OSTF
   8. Check rabbit status with the MOS script

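   A possible shell-level check for steps 4 and 6; wsrep_cluster_size is a
   standard Galera status variable::

       # Step 4: hard-kill mysqld on the second controller
       pkill -9 mysqld

       # Step 6: poll for up to 5 minutes until all 3 controllers rejoin
       for i in $(seq 1 30); do
           size=$(mysql -Nse "SHOW STATUS LIKE 'wsrep_cluster_size'" | awk '{print $2}')
           [ "$size" = "3" ] && echo "galera reassembled" && break
           sleep 10
       done
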
3. Corrupt root file system on primary controller

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
   2. Corrupt the root file system on the primary controller
   3. Run OSTF

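   One illustrative (and intentionally destructive) way to perform step 2,
   assuming the controller's root filesystem lives on /dev/sda; the device
   name is an assumption::

       # Overwrite part of the root block device to corrupt the filesystem
       dd if=/dev/urandom of=/dev/sda bs=1M count=100
       # Reboot immediately without sync so the damage takes effect
       echo b > /proc/sysrq-trigger
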
4. Block corosync traffic
   (link to bug: https://bugs.launchpad.net/fuel/+bug/1354520)

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
   2. Log in to the rabbit master node
   3. Block corosync traffic by extracting the interface from the management bridge
   4. Unblock corosync traffic
   5. Check rabbitmqctl cluster_status on the rabbit master node
   6. Run OSTF HA tests

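   A sketch of steps 3-5, assuming the management network sits on a Linux
   bridge named br-mgmt with a tagged member interface; bridge and
   interface names are illustrative::

       # Step 3: pull the interface out of the management bridge
       brctl delif br-mgmt eth0.101

       # Step 4: put it back
       brctl addif br-mgmt eth0.101

       # Step 5: all three controllers should be listed as running nodes
       rabbitmqctl cluster_status
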
5. HA scalability for mongo

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 1 controller and 3 mongo nodes
   2. Add 2 controller nodes and re-deploy the cluster
   3. Run OSTF
   4. Add 2 mongo nodes and re-deploy the cluster
   5. Run OSTF

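   Steps 2 and 4 could be scripted with the Fuel CLI along these lines; the
   environment id, node ids, and exact flag spellings are assumptions that
   may differ between Fuel client versions::

       # Step 2: add two more controllers and redeploy
       fuel --env 1 node set --node 4,5 --role controller
       fuel --env 1 deploy-changes

       # Step 4: add two more mongo nodes and redeploy
       fuel --env 1 node set --node 6,7 --role mongo
       fuel --env 1 deploy-changes
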
6. Lock DB access on primary controller

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
   2. Lock DB access on the primary controller
   3. Run OSTF

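   One way to realize step 2, assuming MySQL/Galera listens on its standard
   ports (3306 for clients, 4567 for group communication)::

       # Drop incoming MySQL client and Galera replication traffic
       iptables -I INPUT -p tcp --dport 3306 -j DROP
       iptables -I INPUT -p tcp --dport 4567 -j DROP
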
7. Test HA failover on clusters with bonding

   Steps (a command sketch follows the list):

   1. Deploy an HA cluster with Neutron VLAN, 3 controllers, 2 computes;
      eth1-eth4 interfaces bonded in active-backup mode
   2. Destroy the primary controller
   3. Check pacemaker status
   4. Run OSTF
   5. Check rabbit status with the MOS script
      (retry for up to 5 minutes until a successful result)

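   A sketch of the retry in step 5; the MOS health-check script is not
   named in this spec, so rabbitmqctl is used as a stand-in::

       # Retry for up to 5 minutes until rabbit reports a healthy cluster
       for i in $(seq 1 30); do
           rabbitmqctl cluster_status && break
           sleep 10
       done
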
8. HA load testing with Rally
   (may not be a part of this blueprint)

9. Test an HA Neutron cluster under high load with simultaneous
   removal of virtual router ports
   (related link: http://lists.openstack.org/pipermail/openstack-operators/2014-September/005165.html)

10. Cinder Neutron Plugin

    Steps:

    1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes;
       cinder-neutron plugin enabled
    2. Run network verification
    3. Run OSTF

11. RabbitMQ failover test for the compute service

    Steps (a command sketch follows the list):

    1. Deploy an HA cluster with Nova-network, 3 controllers,
       2 computes with cinder roles
    2. Disable one compute node with::

           nova-manage service disable --host=<compute_node_name> --service=nova-compute

    3. On the controller node under test (the one the compute node under
       test is connected to via rmq port 5673), repeat spawn/destroy
       instance requests continuously (sleep 60) while the test is running
    4. Add an iptables rule blocking traffic from the compute IP to
       controller_IP:5673 (take care of conntrack as well)::

           iptables -I INPUT 1 -s <compute_IP> -p tcp --dport 5673 -m state \
               --state NEW,ESTABLISHED,RELATED -j DROP

    5. Wait 3 minutes for the compute node under test to be marked as down
       in nova service-list
    6. Wait another 3 minutes for it to be brought back up
    7. Check the queue of the compute node under test - it should contain
       zero messages
    8. Check that an instance can be spawned on the node

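    For the conntrack caveat in step 4 and the queue check in step 7, one
    possible approach; the queue name filter is an assumption::

        # Step 4: also flush established conntrack entries for the blocked flow
        conntrack -D -p tcp --dport 5673

        # Step 7: the compute node's queue should show 0 messages
        rabbitmqctl list_queues name messages | grep <compute_node_name>
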
12. Check monit on compute nodes

    Steps (a command sketch follows the list):

    1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
    2. SSH to every compute node
    3. Kill the nova-compute service
    4. Check that the service was restarted by monit

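    A minimal sketch of steps 3-4 on a compute node; the process match
    pattern and monit's polling interval are assumptions::

        # Step 3: hard-kill the nova-compute process
        pkill -9 -f nova-compute

        # Step 4: within monit's polling interval the service should be back
        sleep 60
        pgrep -f nova-compute && echo "restarted by monit"
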
13. Check that pacemaker restarts heat-engine if the amqp connection is lost

    Steps (a command sketch follows the list):

    1. Deploy an HA cluster with Nova-network, 3 controllers, 2 computes
    2. SSH to the controller running heat-engine
    3. Check heat-engine status
    4. Block the heat-engine amqp connections
    5. Check whether heat-engine was moved to another controller or stopped
       on the current controller
    6. If moved, ssh to the node now running heat-engine:

       6.1. Check that heat-engine is running
       6.2. Check that heat-engine has some amqp connections

    7. If stopped, check that the heat-engine process is running with a new pid:

       7.1. Unblock the heat-engine amqp connections
       7.2. Check that the amqp connection re-appears for heat-engine

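    A sketch of steps 3-4, assuming heat-engine is a pacemaker resource
    named p_heat-engine (the resource name is an assumption) and that amqp
    listens on port 5673; note the iptables rule is coarse and blocks all
    amqp clients on this controller::

        # Step 3: where is heat-engine running now?
        crm_resource --resource p_heat-engine --locate

        # Step 4: block outgoing amqp connections on this controller
        iptables -I OUTPUT -p tcp --dport 5673 -j DROP

        # Later: check connections owned by the heat-engine pid
        netstat -tnp | grep 5673 | grep "$(pgrep -f heat-engine | head -1)"
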
14. Neutron agent rescheduling

    Steps (a command sketch follows the list):

    1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes
    2. Check the consistency of the neutron agent list (no duplicates,
       alive statuses, etc.)
    3. On the host with the l3 agent, create one more router
    4. Check that there are 2 namespaces
    5. Destroy the controller with the l3 agent
    6. Check that the agent was moved to another controller; check that all
       routers and namespaces were moved
    7. Check that the metadata agent was also moved and that there is a
       process in the router namespace listening on port 8775

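    A sketch of the checks in steps 2, 4, and 6, using the neutron CLI of
    that release and ip netns; the router id is a placeholder::

        # Step 2: agents should be unique and alive
        neutron agent-list

        # Step 4: two qrouter namespaces should exist on the l3-agent host
        ip netns list | grep qrouter

        # Step 6: after failover the router should sit on a live l3 agent
        neutron l3-agent-list-hosting-router <router_id>
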
15. DHCP agent rescheduling

    Steps (a command sketch follows the list):

    1. Deploy an HA cluster with Neutron GRE, 3 controllers, 2 computes
    2. Destroy the controller with the dhcp agent
    3. Check that the agent was moved to another controller
    4. Check that the metadata agent was also moved and that there is a
       process in the router namespace listening on port 8775

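    The checks in steps 3-4 could look like this; network and router ids
    are placeholders::

        # Step 3: the network should be hosted by a dhcp agent on a live node
        neutron dhcp-agent-list-hosting-net <network_id>

        # Step 4: the metadata proxy should listen on 8775 in the namespace
        ip netns exec qrouter-<router_id> netstat -tnlp | grep 8775
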
Dependencies
============


Testing
=======


Documentation Impact
====================


References
==========