Reproposes the Hyper-V Cluster spec to Newton

Previously-approved: Liberty, Mitaka

Change-Id: Id45d86990fe69d4ef4d7b7d5bbb92b7b7d4b2713
Implements: blueprint hyper-v-cluster
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
===============
Hyper-V Cluster
===============
https://blueprints.launchpad.net/nova/+spec/hyper-v-cluster
Hyper-V Clustering has been available since Windows / Hyper-V Server 2008
and it provides several benefits such as highly available VMs, better
performance, faster live migrations and other features. [1][2][3]
Problem description
===================
Hyper-V Clustering brings a set of advantages that are not available
otherwise and also improves the performance of existing features. A few
examples would be highly available VMs, faster live migrations, network
health detection, etc. A more detailed list of features can be found in the
References section. [1][2][3]
Currently, there is no support for Hyper-V Clusters in OpenStack. This
blueprint addresses this issue by adding an implementation.
Use Cases
----------
This feature is particularly useful for its increased performance, highly
available VMs, and virtual machine network health detection.
Proposed change
===============
There are two methods for creating and deploying a Hyper-V Cluster, each with
its own advantages and disadvantages:
* Option A. Hyper-V Cluster controlled by a single nova-compute service. This
means that the nova-compute service will run on a single Hyper-V Node in a
Cluster and can manipulate WMI objects remotely on all the Cluster Nodes.
Advantages:
* Consistent disk resource tracking. The Cluster Shared Storage is only
tracked by a single compute service.
* Smaller overhead, as only one nova-compute service will be necessary, as
opposed to one nova-compute service per node.
Disadvantages:
* The neutron-hyperv-agent is still mandatory on every Node. Even though its
performance has been enhanced over the past release cycles, it won't be able
to efficiently handle port binding, VLAN tagging and security group rule
creation for each new port (up to thousands of ports in some scenarios).
* ceilometer-agent-compute will have to run on each Node, or a Hyper-V Cluster
Inspector will have to be implemented, in order to poll the metrics of all
the resources.
* Free memory tracking issue. Consider this example: a 16 Node Cluster, each
Node having 1 GB of free memory => the ResourceTracker will report 16 GB of
free memory. Deploying a 2 GB instance in the Cluster fails, as no single
Node actually has 2 GB of free memory.
* Free vCPU tracking issue. Same as above.
* The nova-compute service might perform poorly, as it will spawn console
logging threads for a considerably larger number of instances, which will
make serial console access less responsive.
* When performing actions on an instance, extra queries will be necessary in
the Hyper-V Cluster Driver to determine on which Node the instance resides,
in order to properly manipulate it.
* The Hyper-V Cluster will act as a scheduler in choosing a node for a new
instance, resulting in poor allocation choices.
* The underlying cluster infrastructure will be opaque and the user won't be
able to know on which physical node the instance resides using the Nova API.
* Users cannot choose to live-migrate within the same cluster. As there is
only one compute node reported in nova, every instance 'foo' will appear to
be deployed on the host 'bar', and running the command:
nova live-migration foo bar
will result in an UnableToMigrateToSelf exception. This negates one of the
Hyper-V Cluster's advantages: faster live migrations within the same
Cluster.
* Option B. nova-compute service on each Hyper-V Cluster Node.
Advantages:
* Correct memory and vCPU tracking.
* nova-scheduler will properly schedule the instances in the Cluster.
* No decrease in nova-compute service's performance.
* Live migrations within the same cluster are faster.
Disadvantages:
* Free disk resource tracking. Since all the nova-compute services will
report on the same Cluster Shared Storage, each ResourceTracker will report
a different amount of storage used. For example, having 500 GB of shared
storage and 2 instances with 200 GB of used storage each on a single node in
the cluster, that node will report 100 GB of free storage space, while the
other nodes, with no instances, will report 500 GB free. Trying to deploy
another 200 GB instance would then fail. (WIP) Both resource tracking issues
are illustrated in the sketch below.
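The two resource tracking issues above can be illustrated with a short,
hypothetical Python sketch. The figures below are only the examples used in
this section, not values taken from a real deployment::

    # Hypothetical illustration of the resource tracking issues; not driver
    # code. The figures match the examples given above.

    node_free_mem_gb = [1] * 16        # Option A: 16 Nodes, 1 GB free each
    shared_storage_gb = 500            # Option B: Cluster Shared Storage
    instance_disks_gb = [200, 200]     # two instances on the shared storage

    # Option A: a single nova-compute sums the memory of the whole Cluster.
    reported_free_mem = sum(node_free_mem_gb)               # 16 GB reported
    fits_somewhere = any(mem >= 2 for mem in node_free_mem_gb)
    print(reported_free_mem, fits_somewhere)                # 16 False

    # Option B: every nova-compute reports the shared storage independently.
    busy_node_free_disk = shared_storage_gb - sum(instance_disks_gb)   # 100
    empty_node_free_disk = shared_storage_gb                           # 500
    print(busy_node_free_disk, empty_node_free_disk)
    # A scheduler picking the "empty" node for a new 200 GB instance will
    # fail, as the shared storage really only has 100 GB left.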
This blueprint will address Option B, as its value far outweighs that of
Option A.
Almost all the existing Hyper-V code in nova is reusable for the purpose of
creating the Hyper-V Cluster Driver, though a few changes are necessary for
Option B:
* Instances will have to be added to the cluster when they are spawned.
* Before a live migration, the driver needs to check whether the new host is
in the same Cluster. If it is, a cluster live migration will have to be
performed; otherwise, the instance will have to be unclustered before doing
a classic live migration (see the sketch after this list).
* Cold migrations are still possible in Hyper-V Clusters; the same conditions
as for live migration apply.
* The instance must be unclustered before it is destroyed.
* When a new instance is added to the Cluster via live migration or cold
migration from a non-clustered Hyper-V Server or from another Cluster,
the instance will have to be clustered.
* Develop a method to query the free / available disk space of a Cluster
Shared Storage, which will be reported to the Resource Tracker.
* Develop a method to ensure that only one Hyper-V compute node will fetch a
certain glance image.
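As a rough illustration of the live migration check described above, a
minimal sketch follows. The helper names (is_node_in_cluster,
cluster_live_migrate, remove_from_cluster, classic_live_migrate) are
hypothetical placeholders, not the actual driver API::

    # Minimal sketch of the intended live migration flow, assuming
    # hypothetical helpers; the real driver will use Hyper-V Cluster WMI
    # objects for these operations.

    def live_migrate(instance_name, dest_host, is_node_in_cluster,
                     cluster_live_migrate, remove_from_cluster,
                     classic_live_migrate):
        if is_node_in_cluster(dest_host):
            # The destination is part of the same Hyper-V Cluster: use the
            # faster cluster live migration.
            cluster_live_migrate(instance_name, dest_host)
        else:
            # The destination is outside the Cluster: the instance has to be
            # unclustered first, then migrated the classic way.
            remove_from_cluster(instance_name)
            classic_live_migrate(instance_name, dest_host)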
Alternatives
------------
None. In order to take advantage of the benefits offered by the Hyper-V
Cluster, the instances have to be clustered.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
The nova-compute service will have to run under an Active Directory user
which has Hyper-V Management privileges on all the Hyper-V nodes.
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
* Because of the cluster shared storage, the images will have to be cached
only once per cluster, instead of once per node, resulting in less storage
used for caching and less time spent doing it.
* Because of the cluster shared storage, live migration and cold migration
duration is greatly reduced.
* Host evacuation takes place automatically when a clustered compute node is
put into maintenance mode or is taken down. The instances are live-migrated,
ensuring high availability.
Other deployer impact
---------------------
* Hyper-V Cluster requirements: [4]
* Creating Hyper-V Cluster: [5]
* Hyper-V nodes will have to be joined to an Active Directory domain.
* Hyper-V nodes will have to be joined to a Failover Cluster and the setup
has to be validated. [6][7]
* Only nodes running the same version can be joined to the same cluster. For
example, a cluster can contain only Windows / Hyper-V Server 2012 nodes,
only Windows / Hyper-V Server 2012 R2 nodes, or only Windows / Hyper-V
Server 2008 R2 nodes.
* All Hyper-V nodes in the cluster must have access to the same shared cluster
storage.
* The path to the shared storage will have to be set in the compute
nodes' nova.conf file as follows:
instances_path=\\SHARED_STORAGE\OpenStack\Instances
* The compute_driver option in the compute nodes' nova.conf file will have to
be set as follows (a combined nova.conf example is given after this list):
compute_driver=nova.virt.hyperv.cluster.driver.HyperVClusterDriver
* The WMI namespace for the Hyper-V Cluster is '/root/MSCluster'. When using
that namespace, the driver will fail to start due to a stack overflow
exception while instantiating the namespace. This happens because of a
missing magic method (__nonzero__) in the python wmi module, in versions
1.4.9 and older.
* Hyper-V nodes in the same Cluster should be added to the same host aggregate.
This will ensure that the scheduler will opt for a host in the same aggregate
for cold migration.
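Putting the two nova.conf settings above together, a compute node's
configuration would contain something along these lines (the share path is
only a placeholder and has to match the actual Cluster Shared Storage)::

    [DEFAULT]
    # Cluster Shared Storage path (placeholder, adjust to the deployment)
    instances_path = \\SHARED_STORAGE\OpenStack\Instances
    # Use the Hyper-V Cluster driver instead of the plain Hyper-V driver
    compute_driver = nova.virt.hyperv.cluster.driver.HyperVClusterDriver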
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Claudiu Belu <cbelu@cloudbasesolutions.com>
Work Items
----------
As described in the Proposed change section.
Dependencies
============
None
Testing
=======
* Unit tests.
* Tempest tests will be able to validate this feature and they will run as part
of the Hyper-V CI.
Documentation Impact
====================
Documentation about HyperVClusterDriver will be added.
References
==========
[1] Windows Hyper-V / Server 2012 Cluster features:
https://technet.microsoft.com/en-us/library/dn265972.aspx#BKMK_2012
[2] Windows Hyper-V / Server 2012 R2 Cluster features:
https://technet.microsoft.com/en-us/library/dn265972.aspx#BKMK_2012R2
[3] Hyper-V Cluster live migration:
https://technet.microsoft.com/en-us/library/dd759249.aspx#BKMK_live
[4] Hyper-V Cluster requirements:
https://technet.microsoft.com/en-us/library/jj612869.aspx
[5] Creating Hyper-V Cluster:
http://blogs.technet.com/b/keithmayer/archive/2012/12/12/step-by-step-building-a-free-hyper-v-server-2012-cluster-part-1-of-2.aspx
[6] Hyper-V Cluster validation:
https://technet.microsoft.com/en-us/library/jj134244.aspx
[7] Windows Hyper-V / Server 2012 R2 Cluster validation:
https://technet.microsoft.com/en-us/library/hh847274%28v=wps.630%29.aspx
History
=======