Reproposes the Hyper-V Cluster spec to Newton

Previously-approved: Liberty, Mitaka

Change-Id: Id45d86990fe69d4ef4d7b7d5bbb92b7b7d4b2713
Implements: blueprint hyper-v-cluster
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
===============
Hyper-V Cluster
===============
https://blueprints.launchpad.net/nova/+spec/hyper-v-cluster
Hyper-V Clustering has been available since Windows / Hyper-V Server 2008
and it provides several benefits such as highly available VMs, better
performance, faster live migrations and other features. [1][2][3]
Problem description
===================
Hyper-V Clustering brings a set of advantages that are not available
otherwise and also improves the performance of existing features. A few
examples would be highly available VMs, faster live migrations, network
health detection, etc. A more detailed list of features can be found in the
References section. [1][2][3]
Currently, there is no support for Hyper-V Clusters in OpenStack. This
blueprint addresses this issue by adding an implementation.
Use Cases
----------
This feature is particularly useful for its increased performance, highly
available VMs, and virtual machine network health detection.
Proposed change
===============
There are two methods for creating and deploying a Hyper-V Cluster, each with
its own advantages and disadvantages:
* Option A. Hyper-V Cluster controlled by a single nova-compute service. This
means that the nova-compute service will run on a single Hyper-V Node in a
Cluster and can manipulate WMI objects remotely on all the Cluster Nodes.
Advantages:
* Consistent disk resource tracking. The Cluster Shared Storage is only
tracked by a single compute service.
* Smaller overhead, as only one nova-compute service will be necessary, as
opposed to one nova-compute service per node.
Disadvantages:
* The neutron-hyperv-agent is still mandatory on every Node. Even though its
performance has been enhanced over the past release cycles, it won't be able
to efficiently handle port binding, VLAN tagging and security group rule
creation for each new port (up to thousands of ports in some scenarios).
* ceilometer-agent-compute will have to run on each Node, or a Hyper-V Cluster
Inspector will have to be implemented, in order to poll the metrics of all
the resources.
* Free memory tracking issue. Consider this example: a 16 Node Cluster, each
Node having 1 GB of free memory => the ResourceTracker will report 16 GB of
free memory. Deploying a 2 GB instance in the Cluster fails, as no single
Node actually has 2 GB of free memory.
* Free vCPU tracking issue. Same as above.
* The nova-compute service might perform poorly, as it will spawn console
logging threads for a considerably larger number of instances, which will
make serial console access less responsive.
* When performing actions on an instance, extra queries will be necessary in
the Hyper-V Cluster Driver to determine on which Node the instance resides,
in order to properly manipulate it.
* The Hyper-V Cluster will act as a scheduler in choosing a node for a new
instance, resulting in poor allocation choices.
* The underlying cluster infrastructure will be opaque and the user won't be
able to know on which physical node the instance resides using the Nova API.
* Users cannot choose to live-migrate within the same cluster. As there is
only one compute node reported in nova, every instance 'foo' will appear to
be deployed on the host 'bar', and running the command:
nova live-migration foo bar
will result in an UnableToMigrateToSelf exception. This negates one of the
Hyper-V Cluster's advantages: faster live migrations within the same
Cluster.
* Option B. nova-compute service on each Hyper-V Cluster Node.
Advantages:
* Correct memory and vCPU tracking.
* nova-scheduler will properly schedule the instances in the Cluster.
* No decrease in nova-compute service's performance.
* Live migrations within the same cluster are faster.
Disadvantages:
* Free disk resource tracking. Since all the nova-compute services will
report on the same Cluster Shared Storage, each ResourceTracker will report
a different amount of storage used. For example, having 500 GB of shared
storage and 2 instances with 200 GB of used storage each on a single node in
the cluster, that node will report 100 GB of free storage space, while the
other nodes, with no instances, will report 500 GB free. Trying to deploy
another 200 GB instance would then fail. (WIP) Both resource tracking issues
are illustrated in the sketch below.
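The two resource tracking issues above can be illustrated with a short,
hypothetical Python sketch. The figures below are only the examples used in
this section, not values taken from a real deployment::

    # Hypothetical illustration of the resource tracking issues; not driver
    # code. The figures match the examples given above.

    node_free_mem_gb = [1] * 16        # Option A: 16 Nodes, 1 GB free each
    shared_storage_gb = 500            # Option B: Cluster Shared Storage
    instance_disks_gb = [200, 200]     # two instances on the shared storage

    # Option A: a single nova-compute sums the memory of the whole Cluster.
    reported_free_mem = sum(node_free_mem_gb)               # 16 GB reported
    fits_somewhere = any(mem >= 2 for mem in node_free_mem_gb)
    print(reported_free_mem, fits_somewhere)                # 16 False

    # Option B: every nova-compute reports the shared storage independently.
    busy_node_free_disk = shared_storage_gb - sum(instance_disks_gb)   # 100
    empty_node_free_disk = shared_storage_gb                           # 500
    print(busy_node_free_disk, empty_node_free_disk)
    # A scheduler picking the "empty" node for a new 200 GB instance will
    # fail, as the shared storage really only has 100 GB left.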
This blueprint will address Option B, as its value far outweighs that of
Option A.
Almost all the existing Hyper-V code in nova is reusable for the purpose of
creating the Hyper-V Cluster Driver, though a few changes are necessary for
Option B:
* Instances will have to be added to the cluster when they are spawned.
* Before a live migration, the driver needs to check whether the new host is
in the same Cluster. If it is, a cluster live migration will have to be
performed; otherwise, the instance will have to be unclustered before doing
a classic live migration (see the sketch after this list).
* Cold migrations are still possible in Hyper-V Clusters; the same conditions
as for live migration apply.
* The instance must be unclustered before it is destroyed.
* When a new instance is added to the Cluster via live migration or cold
migration from a non-clustered Hyper-V Server or from another Cluster,
the instance will have to be clustered.
* Develop a method to query the free / available disk space of a Cluster
Shared Storage, which will be reported to the Resource Tracker.
* Develop a method to ensure that only one Hyper-V compute node will fetch a
certain glance image.
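As a rough illustration of the live migration check described above, a
minimal sketch follows. The helper names (is_node_in_cluster,
cluster_live_migrate, remove_from_cluster, classic_live_migrate) are
hypothetical placeholders, not the actual driver API::

    # Minimal sketch of the intended live migration flow, assuming
    # hypothetical helpers; the real driver will use Hyper-V Cluster WMI
    # objects for these operations.

    def live_migrate(instance_name, dest_host, is_node_in_cluster,
                     cluster_live_migrate, remove_from_cluster,
                     classic_live_migrate):
        if is_node_in_cluster(dest_host):
            # The destination is part of the same Hyper-V Cluster: use the
            # faster cluster live migration.
            cluster_live_migrate(instance_name, dest_host)
        else:
            # The destination is outside the Cluster: the instance has to be
            # unclustered first, then migrated the classic way.
            remove_from_cluster(instance_name)
            classic_live_migrate(instance_name, dest_host)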
Alternatives
------------
None. In order to take advantage of the benefits offered by the Hyper-V
Cluster, the instances have to be clustered.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
The nova-compute service will have to run under an Active Directory user
which has Hyper-V Management privileges on all the Hyper-V nodes.
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
* Because of the cluster shared storage, the images will have to be cached
only once per cluster, instead of once per node, resulting in less storage
used for caching and less time spent doing it.
* Because of the cluster shared storage, live migration and cold migration
duration is greatly reduced.
* Host evacuation takes place automatically when a clustered compute node is
put into maintenance mode or is taken down. The instances are live-migrated,
ensuring high availability.
Other deployer impact
---------------------
* Hyper-V Cluster requirements: [4]
* Creating Hyper-V Cluster: [5]
* Hyper-V nodes will have to be joined to an Active Directory domain.
* Hyper-V nodes will have to be joined to a Failover Cluster and the setup
has to be validated. [6][7]
* Only nodes running the same version can be joined to the same cluster. For
example, a cluster can contain only Windows / Hyper-V Server 2012 nodes,
only Windows / Hyper-V Server 2012 R2 nodes, or only Windows / Hyper-V
Server 2008 R2 nodes.
* All Hyper-V nodes in the cluster must have access to the same shared cluster
storage.
* The path to the shared storage will have to be set in the compute
nodes' nova.conf file as follows:
instances_path=\\SHARED_STORAGE\OpenStack\Instances
* The compute_driver option in the compute nodes' nova.conf file will have to
be set as follows (a combined nova.conf example is given after this list):
compute_driver=nova.virt.hyperv.cluster.driver.HyperVClusterDriver
* The WMI namespace for the Hyper-V Cluster is '/root/MSCluster'. When using
that namespace, the driver will fail to start due to a stack overflow
exception while instantiating the namespace. This happens because of a
missing magic method (__nonzero__) in the python wmi module, in versions
1.4.9 and older.
* Hyper-V nodes in the same Cluster should be added to the same host aggregate.
This will ensure that the scheduler will opt for a host in the same aggregate
for cold migration.
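Putting the two nova.conf settings above together, a compute node's
configuration would contain something along these lines (the share path is
only a placeholder and has to match the actual Cluster Shared Storage)::

    [DEFAULT]
    # Cluster Shared Storage path (placeholder, adjust to the deployment)
    instances_path = \\SHARED_STORAGE\OpenStack\Instances
    # Use the Hyper-V Cluster driver instead of the plain Hyper-V driver
    compute_driver = nova.virt.hyperv.cluster.driver.HyperVClusterDriver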
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
Claudiu Belu <cbelu@cloudbasesolutions.com>
Work Items
----------
As described in the Proposed change section.
Dependencies
============
None
Testing
=======
* Unit tests.
* Tempest tests will be able to validate this feature and they will run as part
of the Hyper-V CI.
Documentation Impact
====================
Documentation about HyperVClusterDriver will be added.
References
==========
[1] Windows Hyper-V / Server 2012 Cluster features:
https://technet.microsoft.com/en-us/library/dn265972.aspx#BKMK_2012
[2] Windows Hyper-V / Server 2012 R2 Cluster features:
https://technet.microsoft.com/en-us/library/dn265972.aspx#BKMK_2012R2
[3] Hyper-V Cluster live migration:
https://technet.microsoft.com/en-us/library/dd759249.aspx#BKMK_live
[4] Hyper-V Cluster requirements:
https://technet.microsoft.com/en-us/library/jj612869.aspx
[5] Creating Hyper-V Cluster:
http://blogs.technet.com/b/keithmayer/archive/2012/12/12/step-by-step-building-a-free-hyper-v-server-2012-cluster-part-1-of-2.aspx
[6] Hyper-V Cluster validation:
https://technet.microsoft.com/en-us/library/jj134244.aspx
[7] Windows Hyper-V / Server 2012 R2 Cluster validation:
https://technet.microsoft.com/en-us/library/hh847274%28v=wps.630%29.aspx
History
=======