From 682de53f105f90833dc6ccb829d99e3d455e4f98 Mon Sep 17 00:00:00 2001
From: Ben Silverman
Date: Mon, 20 Feb 2017 00:08:08 -0500
Subject: [PATCH] [arch-design-draft] Compute design - storage solutions updated

- Updated content in compute arch storage solutions
- This completes the Ocata updates for compute-design in arch-design-draft

Change-Id: I1b6b484b7b76b5bd9ff05bf7a7de1340f43e4376
Implements: blueprint arch-guide-restructure-ocata
---
 .../design-compute/design-compute-storage.rst | 92 ++++++++++---------
 1 file changed, 51 insertions(+), 41 deletions(-)

diff --git a/doc/arch-design-draft/source/design-compute/design-compute-storage.rst b/doc/arch-design-draft/source/design-compute/design-compute-storage.rst
index af5e6577b5..c525478a61 100644
--- a/doc/arch-design-draft/source/design-compute/design-compute-storage.rst
+++ b/doc/arch-design-draft/source/design-compute/design-compute-storage.rst
@@ -1,13 +1,10 @@
-===========================
+==========================
 Instance storage solutions
-===========================
+==========================

-As part of the procurement for a compute cluster, you must specify some
-storage for the disk on which the instantiated instance runs. There are
-three main approaches to providing this temporary-style storage, and it
-is important to understand the implications of the choice.
-
-They are:
+As part of the architecture design for a compute cluster, you must specify some
+storage for the disk on which the instantiated instance runs. There are three
+main approaches to providing temporary storage:

 * Off compute node storage—shared file system
 * On compute node storage—shared file system
@@ -16,34 +13,38 @@ They are:
 In general, the questions you should ask when selecting storage are as
 follows:

-* What is the platter count you can achieve?
-* Do more spindles result in better I/O despite network access?
-* Which one results in the best cost-performance scenario you are aiming for?
-* How do you manage the storage operationally?
+* What are my workloads?
+* Do my workloads have IOPS requirements?
+* Are there read, write, or random access performance requirements?
+* What is my forecast for the scaling of storage for compute?
+* What storage is my enterprise currently using? Can it be re-purposed?
+* How do I manage the storage operationally?

-Many operators use separate compute and storage hosts. Compute services
-and storage services have different requirements, and compute hosts
-typically require more CPU and RAM than storage hosts. Therefore, for a
-fixed budget, it makes sense to have different configurations for your
-compute nodes and your storage nodes. Compute nodes will be invested in
-CPU and RAM, and storage nodes will be invested in block storage.
+Many operators use separate compute and storage hosts instead of a
+hyperconverged solution. Compute services and storage services have different
+requirements, and compute hosts typically require more CPU and RAM than storage
+hosts. Therefore, for a fixed budget, it makes sense to have different
+configurations for your compute nodes and your storage nodes. Compute nodes
+will be invested in CPU and RAM, and storage nodes will be invested in block
+storage.

-However, if you are more restricted in the number of physical hosts you
-have available for creating your cloud and you want to be able to
-dedicate as many of your hosts as possible to running instances, it
-makes sense to run compute and storage on the same machines.
+However, if you are more restricted in the number of physical hosts you have
+available for creating your cloud and you want to be able to dedicate as many
+of your hosts as possible to running instances, it makes sense to run compute
+and storage on the same machines or use an existing storage array that is
+available.

 The three main approaches to instance storage are provided in the next
 few sections.

-Off compute node storage—shared file system
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Non-compute node based shared file system
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In this option, the disks storing the running instances are hosted in
 servers outside of the compute nodes.

 If you use separate compute and storage hosts, you can treat your
-compute hosts as "stateless." As long as you do not have any instances
+compute hosts as "stateless". As long as you do not have any instances
 currently running on a compute host, you can take it offline or wipe it
 completely without having any effect on the rest of your cloud. This
 simplifies maintenance for the compute hosts.
@@ -60,6 +61,7 @@ The main disadvantages to this approach are:
 * Depending on design, heavy I/O usage from some instances can affect
   unrelated instances.
 * Use of the network can decrease performance.
+* Scalability can be affected by network architecture.

 On compute node storage—shared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -79,36 +81,37 @@ However, this option has several disadvantages:
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Use of the network can decrease performance.
+* Loss of compute nodes decreases storage availability for all hosts.

 On compute node storage—nonshared file system
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-In this option, each compute node is specified with enough disks to
-store the instances it hosts.
+In this option, each compute node is specified with enough disks to store the
+instances it hosts.

 There are two main advantages:

-* Heavy I/O usage on one compute node does not affect instances on
-  other compute nodes.
-* Direct I/O access can increase performance.
+* Heavy I/O usage on one compute node does not affect instances on other
+  compute nodes. Direct I/O access can increase performance.
+* Each host can have different storage profiles for host aggregation and
+  availability zones.

-This has several disadvantages:
+There are several disadvantages:

-* If a compute node fails, the instances running on that node are lost.
+* If a compute node fails, the data associated with the instances running on
+  that node is lost.
 * The chassis size of the compute node can limit the number of spindles
   able to be used in a compute node.
 * Migrations of instances from one node to another are more complicated
   and rely on features that may not continue to be developed.
 * If additional storage is required, this option does not scale.

-Running a shared file system on a storage system apart from the computes
-nodes is ideal for clouds where reliability and scalability are the most
-important factors. Running a shared file system on the compute nodes
-themselves may be best in a scenario where you have to deploy to
-preexisting servers for which you have little to no control over their
-specifications. Running a nonshared file system on the compute nodes
-themselves is a good option for clouds with high I/O requirements and
-low concern for reliability.
+Running a shared file system on a storage system apart from the compute nodes
+is ideal for clouds where reliability and scalability are the most important
+factors. Running a shared file system on the compute nodes themselves may be
+best in a scenario where you have to deploy to pre-existing servers over whose
+specifications you have little to no control, or where you have specific
+storage performance needs but do not need persistent storage.

 Issues with live migration
 --------------------------
@@ -123,7 +126,14 @@ Live migration can also be done with nonshared storage, using a feature
 known as *KVM live block migration*. While an earlier implementation of
 block-based migration in KVM and QEMU was considered unreliable, there
 is a newer, more reliable implementation of block-based live migration
-as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
+as of the Mitaka release.
+
+Live migration and block migration still have some issues:
+
+* Error reporting has received some attention in Mitaka and Newton, but
+  further improvements are needed.
+* Live migration resource tracking issues.
+* Live migration of rescued images.

 Choice of file system
 ---------------------
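
A minimal sketch of how the shared and nonshared options described in this
patch surface operationally. The NFS server address, export path, and the
instance and host names below are placeholder assumptions, not values from
this patch; the mount point shown is the default Nova ``instances_path``.

.. code-block:: console

   # Shared file system option: every compute node mounts the same backing
   # store at the Nova instances path (192.0.2.10:/srv/nova is a placeholder).
   $ sudo mount -t nfs 192.0.2.10:/srv/nova /var/lib/nova/instances

   # With shared instance storage, live migration moves only memory and state.
   $ nova live-migration INSTANCE_NAME TARGET_HOST

   # With nonshared, on-node storage, block migration also copies the instance
   # disk over the network.
   $ nova live-migration --block-migrate INSTANCE_NAME TARGET_HOST

Whether the extra disk copy of block migration is acceptable depends on the
instance disk sizes and on the network limitations listed in the disadvantages
above.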