Remove passive voice Arch Guide Ch4 Storage Focus

Change-Id: I95f80c5f4ff7181790e6ee789ca08c29999901e0
Closes-bug: #1402462
This commit is contained in:
kallimachos
2015-03-10 16:45:46 +10:00
parent 53f45d1bf4
commit a4b0bcff03
6 changed files with 235 additions and 266 deletions

@@ -6,34 +6,34 @@
xml:id="storage_focus">
<title>Storage focused</title>
<para>Cloud storage is a model of data storage where digital data
is stored in logical pools and physical storage that spans
<para>Cloud storage is a model of data storage that stores digital
data in logical pools and physical storage that spans
across multiple servers and locations. Cloud storage commonly
refers to a hosted object storage service, however the term
has extended to include other types of data storage that are
also includes other types of data storage that are
available as a service, for example block storage.</para>
<para>Cloud storage is based on virtualized infrastructure and
<para>Cloud storage runs on virtualized infrastructure and
resembles broader cloud computing in terms of accessible
interfaces, elasticity, scalability, multi-tenancy, and
metered resources. Cloud storage services can be utilized from
an off-premises service or deployed on-premises.</para>
<para>Cloud storage is made up of many distributed, yet still
synonymous resources, and is often referred to as integrated
metered resources. You can use cloud storage services from
an off-premises service or deploy them on-premises.</para>
<para>Cloud storage consists of many distributed, synonymous
resources, which are often referred to as integrated
storage clouds. Cloud storage is highly fault tolerant through
redundancy and the distribution of data. It is highly durable
through the creation of versioned copies, and can be
consistent with regard to data replicas.</para>
<para>At a certain scale, management of data operations can become
a resource intensive process to an organization. Hierarchical
storage management (HSM) systems and data grids can help
<para>At large scale, management of data operations is
a resource intensive process for an organization. Hierarchical
storage management (HSM) systems and data grids help
annotate and report a baseline data valuation to make
intelligent decisions and automate data decisions. HSM allows
for automating tiering and movement, as well as orchestration
intelligent decisions and automate data decisions. HSM enables
automated tiering and movement, as well as orchestration
of data operations. A data grid is an architecture, or an evolving
set of services, that brings together sets of
services allowing users to manage large data sets.</para>
<para>Examples of applications that can be deployed with cloud
storage characteristics are:</para>
services enabling users to manage large data sets.</para>
<para>Example applications deployed with cloud
storage characteristics:</para>
<itemizedlist>
<listitem>
<para>Active archive, backups and hierarchical storage

@@ -27,60 +27,53 @@
heavily utilized to transfer storage, but they are not
otherwise network intensive.</para>
<para>For a storage-focused OpenStack design architecture, the
selection of storage hardware will determine the overall
performance and scalability of the design architecture. A
number of different factors must be considered in the design
process:</para>
selection of storage hardware determines the overall
performance and scalability of the design architecture. Several factors
impact the design process:</para>
<variablelist>
<varlistentry>
<term>Cost</term>
<listitem>
<para>The cost of components can change which storage
<para>The cost of components affects which storage
architecture and hardware you choose.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Performance</term>
<listitem>
<para>Performance is measured by observing the latency of
storage I/O requests. Performance requirements can change
which solution is implemented.</para>
<para>The latency of storage I/O requests indicates performance.
Performance requirements affect which solution you choose.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Scalability</term>
<listitem>
<para>Scalability refers to how well the
storage solution performs as it is expanded up to its
maximum size. Storage solutions that perform well in
small configurations but have degraded performance
would not be considered scalable.
However, a solution that continues to perform well
at maximum expansion would be considered scalable. The
ability of the storage solution to continue to perform
well as it expands is important.</para>
<para>Scalability refers to how the storage solution performs
as it expands to its maximum size. Storage solutions
that perform well in small configurations but have
degraded performance in large configurations are not scalable.
A solution that performs well at maximum expansion is
scalable. Large deployments require a storage solution
that performs well as it expands.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Expandability</term>
<listitem>
<para>Expandability is the overall
ability of the solution to grow. A storage solution
that expands to 50 PB is considered more expandable
than a solution that only scales to 10 PB.</para>
<para>Expandability is the overall ability of the solution
to grow. A storage solution that expands to 50 PB is
more expandable than a solution that only scales to 10 PB.</para>
<note>
<para>This metric is related to but different from
scalability which is a measure of the solution's
performance as it expands.
<para>This metric is related to scalability.
</para>
</note>
</listitem>
</varlistentry>
</variablelist>
<para>Latency is one of the key considerations in a
<para>Latency is a key consideration in a
storage-focused OpenStack cloud. Using solid-state disks
(SSDs) to minimize latency for instance storage, and reduce
CPU delays caused by waiting for the storage, will increase
(SSDs) to minimize latency for instance storage, and to reduce
CPU delays caused by waiting for the storage, increases
performance. We recommend evaluating the
gains from using RAID controller cards in compute hosts to
improve the performance of the underlying disk
@@ -89,36 +82,33 @@
solution should be used or if a single, highly expandable and
scalable centralized storage array would be a better choice.
If a centralized storage array is the right fit for the requirements
then the hardware will be determined by the array vendor. It is possible
then the array vendor determines the hardware selection. It is possible
to build a storage array using commodity hardware with Open Source
software, but there needs to be access to people with expertise
to build such a system.</para>
software, but doing so requires people with expertise to build such a system.</para>
<para>On the other hand, a scale-out storage solution that
uses direct-attached storage (DAS) in the servers may be an
appropriate choice. If this is true, then the server hardware
needs to be configured to support the storage solution.</para>
<para>Some potential impacts that might affect a particular
storage architecture (and corresponding storage hardware) of a
Storage-focused OpenStack cloud:</para>
appropriate choice. This requires configuration of the server
hardware to support the storage solution.</para>
<para>Considerations affecting storage architecture (and corresponding
storage hardware) of a storage-focused OpenStack cloud:</para>
<variablelist>
<varlistentry>
<term>Connectivity</term>
<listitem>
<para>Based on the storage solution
selected, ensure the connectivity matches the storage
solution requirements. If a centralized storage array
is selected, it is important to determine how the
hypervisors will connect to the storage array.
<para>Based on the selected storage solution, ensure the
connectivity matches the storage solution requirements.
If you select a centralized storage array, determine how the
hypervisors connect to the storage array.
Connectivity can affect latency and thus performance.
We recommended you check that the network
characteristics will minimize latency to boost the
We recommend confirming that the network
characteristics minimize latency to boost the
overall performance of the design.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Latency</term>
<listitem>
<para>Determine if the use case will have
<para>Determine if the use case has
consistent or highly variable latency.</para>
</listitem>
</varlistentry>
@@ -143,7 +133,7 @@
<section xml:id="compute-server-hardware-selection">
<title>Compute (server) hardware selection</title>
<para>Compute (server) hardware must be evaluated against four
<para>Evaluate Compute (server) hardware against four
opposing dimensions:</para>
<variablelist>
<varlistentry>
@@ -158,16 +148,14 @@
<term>Resource capacity</term>
<listitem>
<para>The number of CPU cores, how much
RAM, or how much storage a given server will
deliver.</para>
RAM, or how much storage a given server delivers.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Expandability</term>
<listitem>
<para>The number of additional resources
that can be added to a server before it has reached
its limit.</para>
<para>The number of additional resources you can add to a server
before it reaches capacity.</para>
</listitem>
</varlistentry>
<varlistentry>
@@ -179,7 +167,7 @@
</listitem>
</varlistentry>
</variablelist>
<para>The dimensions need to be weighed against each other to
<para>You must weigh the dimensions against each other to
determine the best design for the desired purpose. For
example, increasing server density can mean sacrificing
resource capacity or expandability. Increasing resource
@@ -192,7 +180,7 @@
a result, the required server hardware must supply adequate
CPU sockets, additional CPU cores, and more RAM; network
connectivity and storage capacity are not as critical. The
hardware will need to provide enough network connectivity and
hardware needs to provide enough network connectivity and
storage capacity to meet the user requirements, however they
are not the primary consideration.</para>
<para>Some server hardware form factors are better
@@ -242,7 +230,7 @@
<para>Larger rack-mounted servers, such as 4U servers,
often provide even greater CPU capacity, commonly
supporting four or even eight CPU sockets. These
servers have greater expandability capacity but such
servers have greater expandability but such
servers have much lower server density and usually
greater hardware cost.</para>
</listitem>
@@ -258,7 +246,7 @@
additional cost and configuration complexity.</para>
</listitem>
</itemizedlist>
<para>Other factors will strongly influence server hardware
<para>Other factors strongly influence server hardware
selection for a storage-focused OpenStack design
architecture. The following is a list of these factors:</para>
<variablelist>
@@ -266,8 +254,8 @@
<term>Instance density</term>
<listitem>
<para>In this architecture, instance
density and CPU-RAM oversubscription are lower. More
hosts will be required to support the anticipated
density and CPU-RAM oversubscription are lower. You
require more hosts to support the anticipated
scale, especially if the design uses dual-socket
hardware designs.</para>
</listitem>
@@ -277,7 +265,7 @@
<listitem>
<para>Another option to address the higher
host count is to use a quad socket platform. Taking
this approach will decrease host density which also
this approach decreases host density which also
increases rack count. This configuration affects the
number of power connections and also impacts network
and cooling requirements.</para>
@@ -297,31 +285,30 @@
</variablelist>
<para>Storage-focused OpenStack design architecture server
hardware selection should focus on a "scale up" versus "scale
out" solution. The determination of which will be the best
out" solution. The determination of which is the best
solution, a smaller number of larger hosts or a larger number of
smaller hosts, will depend on a combination of factors
smaller hosts, depends on a combination of factors
including cost, power, cooling, physical rack and floor space,
support-warranty, and manageability.</para>
</section>
<section xml:id="networking-hardware-selections">
<title>Networking hardware selection</title>
<para>Some of the key considerations that should be included in
the selection of networking hardware include:</para>
<para>Key considerations for the selection of networking hardware include:</para>
<variablelist>
<varlistentry>
<term>Port count</term>
<listitem>
<para>The user will require networking
<para>The user requires networking
hardware that has the requisite port count.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Port density</term>
<listitem>
<para>The network design will be affected by
the physical space that is required to provide the
requisite port count. A switch that can provide 48 10&nbsp;GbE
<para>The physical space required to provide the
requisite port count affects the network design.
A switch that provides 48 10&nbsp;GbE
ports in 1U has a much higher port density than a
switch that provides 24 10&nbsp;GbE ports in 2U. On a
general scale, a higher port density leaves more rack
@@ -343,15 +330,14 @@
<varlistentry>
<term>Redundancy</term>
<listitem>
<para>The level of network hardware redundancy
required is influenced by the user requirements for
high availability and cost considerations. Network
redundancy can be achieved by adding redundant power
supplies or paired switches.</para>
<para>User requirements for high availability and cost
considerations influence the required level of network
hardware redundancy. Achieve network redundancy by adding
redundant power supplies or paired switches.</para>
<note>
<para>If this is a requirement
the hardware will need to support this configuration.
User requirements will determine if a completely
<para>If this is a requirement,
the hardware must support this configuration.
User requirements determine if a completely
redundant network infrastructure is required.</para>
</note>
</listitem>
@@ -382,7 +368,7 @@
<section xml:id="software-selection-arch-storage">
<title>Software selection</title>
<para>Selecting software to be included in a storage-focused
<para>Selecting software for a storage-focused
OpenStack architecture design includes three areas:</para>
<itemizedlist>
<listitem>
@@ -403,25 +389,22 @@
<title>Operating system and hypervisor</title>
<para>Selecting the OS and hypervisor has a significant impact
on the overall design and also affects server hardware
selection. Ensure that the storage hardware is supported by
the selected operating system and hypervisor combination and
that the networking hardware selection and topology will work
with the chosen operating system and hypervisor combination.
For example, if the design uses Link Aggregation Control
Protocol (LACP), the OS and hypervisor are both required to
support it.</para>
<para>Some areas that could be impacted by the selection of OS and
hypervisor include:</para>
selection. Ensure that the selected operating system and
hypervisor combination supports the storage hardware and works
with the networking hardware selection and topology.
For example, Link Aggregation Control Protocol (LACP) requires
support from both the OS and hypervisor.</para>
<para>OS and hypervisor selection affects the following areas:</para>
<variablelist>
<varlistentry>
<term>Cost</term>
<listitem>
<para>Selection of a commercially supported
hypervisor, such as Microsoft Hyper-V, will result in
a different cost model rather than selected a
hypervisor, such as Microsoft Hyper-V, results in
a different cost model than a
community-supported open source hypervisor like
KVM or Xen. Similarly, choosing Ubuntu over Red
Hat (or vice versa) will have an impact on cost due to
Hat (or vice versa) impacts cost due to
support contracts. However, business or application
requirements might dictate a specific or commercially
supported hypervisor.</para>
@@ -431,8 +414,8 @@
<term>Supportability</term>
<listitem>
<para>Staff must have training with the chosen hypervisor.
The cost of training should be considered when choosing
the solution. The support of a commercial product
Consider the cost of training when choosing
a solution. The support of a commercial product
such as Red Hat, SUSE, or Windows, is the
responsibility of the OS vendor. If an open source
platform is chosen, the support comes from in-house
@@ -442,11 +425,10 @@
<varlistentry>
<term>Management tools</term>
<listitem>
<para>The management tools used for
Ubuntu and KVM differ from the management tools
for VMware vSphere. Although both OS and hypervisor
combinations are supported by OpenStack, there will
be varying impacts to the rest of the
<para>Ubuntu and KVM use different management tools
than VMware vSphere. Although both OS and hypervisor
combinations are supported by OpenStack, there are
varying impacts to the rest of the
design as a result of the selection of one combination
versus the other.</para>
</listitem>
@@ -454,36 +436,36 @@
<varlistentry>
<term>Scale and performance</term>
<listitem>
<para>Make sure that selected OS
<para>Ensure that the selected OS
and hypervisor combination meets the appropriate scale
and performance requirements needed for this storage
focused OpenStack cloud. The chosen architecture will
need to meet the targeted instance-host ratios with
focused OpenStack cloud. The chosen architecture must
meet the targeted instance-host ratios with
the selected OS-hypervisor combination.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Security</term>
<listitem>
<para>Make sure that the design can accommodate
<para>Ensure that the design can accommodate
the regular periodic installation of application
security patches while maintaining the required
workloads. The frequency of security patches for the
proposed OS-hypervisor combination will have an impact
on performance and the patch installation process
proposed OS-hypervisor combination impacts
performance, and the patch installation process
could affect maintenance windows.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Supported features</term>
<listitem>
<para>Determine what features of
OpenStack are required. This will often determine the
<para>Determine the required features of
OpenStack. This often determines the
selection of the OS-hypervisor combination. Certain
features are only available with specific OSes or
hypervisors. For example, if certain features are not
available, the design might need to be modified to
meet the user requirements.</para>
available, you might need to modify the design to
meet user requirements.</para>
</listitem>
</varlistentry>
<varlistentry>
@@ -494,7 +476,7 @@
OS-hypervisor combinations. Operational and troubleshooting
tools for one OS-hypervisor combination may differ
from the tools used for another OS-hypervisor
combination. As a result, the design will need to
combination. As a result, the design must
address whether the two sets of tools need to interoperate.
</para>
</listitem>
@@ -506,8 +488,8 @@
<title>OpenStack components</title>
<para>Which OpenStack components you choose can have a significant
impact on the overall design. While there are certain
components that will always be present, (Compute and Image Service, for
example) there are other services that may not need to be
components that are always present, such as Compute and the Image Service,
there are other services that may not need to be
present. As an example, a certain design may not require
the Orchestration module. Omitting Orchestration would not typically have a
significant impact on the overall design, however, if the
@@ -517,8 +499,8 @@
<para>A storage-focused design might require the ability to use
Orchestration to launch instances with Block Storage volumes to perform
storage-intensive processing.</para>
<para>For a storage-focused OpenStack design architecture, the
following components would typically be used:</para>
<para>A storage-focused OpenStack design architecture typically uses the
following components:</para>
<itemizedlist>
<listitem>
<para>OpenStack Identity (keystone)</para>
@@ -546,20 +528,19 @@
<para>Excluding certain OpenStack components may limit or
constrain the functionality of other components. If a design
opts to include Orchestration but exclude Telemetry, then the design
will not be able to take advantage of Orchestration's auto scaling
cannot take advantage of Orchestration's auto scaling
functionality (which relies on information from Telemetry).
Because you can use Orchestration to spin up a large
number of instances to perform the compute-intensive
processing, including Orchestration in a compute-focused architecture
design is strongly recommended.</para>
processing, we strongly recommend including Orchestration in a
compute-focused architecture design.</para>
</section>
<section xml:id="supplemental-software-arch-storage">
<title>Supplemental software</title>
<para>While OpenStack is a fairly complete collection of software
projects for building a platform for cloud services, there are
additional pieces of software that might need to be added to
any given OpenStack design.</para>
projects for building a platform for cloud services, you may need
to add other pieces of software.</para>
</section>
<section xml:id="networking-software-arch-storage">
@@ -568,13 +549,12 @@
services for instances. There are many additional networking
software packages that may be useful to manage the OpenStack
components themselves. Some examples include HAProxy,
keepalived, and various routing daemons (like Quagga). Some of
these software packages, HAProxy in particular, are described
in more detail in the <citetitle>OpenStack High Availability
Guide</citetitle> (refer to the <link
keepalived, and various routing daemons (like Quagga). The
<citetitle>OpenStack High Availability Guide</citetitle> describes
some of these software packages, HAProxy in particular. See the <link
xlink:href="http://docs.openstack.org/high-availability-guide/content/ch-network.html">Network
controller cluster stack chapter</link> of the OpenStack High
Availability Guide).</para>
Availability Guide.</para>
</section>
<section xml:id="management-software-arch-storage">
@@ -604,30 +584,29 @@
</itemizedlist>
<important>
<para>The factors for determining which
software packages in this category should be selected is
software packages in this category to select are
outside the scope of this design guide.</para>
</important>
<para>Clustering Software, such as Corosync or Pacemaker, is
determined primarily by the availability design requirements.
The impact of including (or not including) these
software packages is determined by the availability of the
cloud infrastructure and the complexity of supporting the
configuration after it is deployed. The <citetitle>OpenStack High
Availability Guide</citetitle> provides more details on the installation
and configuration of Corosync and Pacemaker, should these
packages need to be included in the design.</para>
<para>Requirements for logging, monitoring, and alerting are
determined by operational considerations. Each of these
sub-categories includes a number of various options. For
example, in the logging sub-category one might consider
<para>The availability design requirements determine the selection of
clustering software, such as Corosync or Pacemaker.
The availability of the cloud infrastructure and the complexity
of supporting the configuration after deployment determine
the impact of including these software packages. The
<citetitle>OpenStack High Availability Guide</citetitle> provides
more details on the installation and configuration of Corosync
and Pacemaker.</para>
<para>Operational considerations determine the requirements for
logging, monitoring, and alerting. Each of these
sub-categories includes options. For
example, in the logging sub-category you could select
Logstash, Splunk, Log Insight, or another log
aggregation-consolidation tool. Logs should be stored in a
centralized location to make it easier to perform analytics
aggregation-consolidation tool. Store logs in a
centralized location to facilitate performing analytics
against the data. Log data analytics engines can also provide
automation and issue notification, by providing a mechanism to
both alert and automatically attempt to remediate some of the
more commonly known issues.</para>
<para>If any of these software packages are needed, then the
<para>If you require any of these software packages, the
design must account for the additional resource consumption
(CPU, RAM, storage, and network bandwidth for a log
aggregation solution, for example). Some other potential
@@ -648,40 +627,37 @@
<section xml:id="database-software-arch-storage">
<title>Database software</title>
<para>Virtually all of the OpenStack components require access to
<para>Most OpenStack components require access to
back-end database services to store state and configuration
information. Choose an appropriate back-end database which
will satisfy the availability and fault tolerance requirements
satisfies the availability and fault tolerance requirements
of the OpenStack services.</para>
<para>MySQL is generally considered to be the de facto database
for OpenStack, however, other compatible databases are also
known to work.</para>
<para>MySQL is the default database for OpenStack, but other
compatible databases are available.</para>
<note>
<para>
Telemetry uses MongoDB.
</para>
</note>
<para>The solution selected to provide high availability for the
database will change based on the selected database. If MySQL
is selected, then a number of options are available. For
active-active clustering a replication technology such as
Galera can be used. For active-passive some form of shared
storage must be used. Each of these potential solutions has an
<para>The chosen high availability database solution changes
according to the selected database. MySQL, for example, provides
several options. Use a replication technology such as Galera
for active-active clustering. For active-passive clustering, use some form of
shared storage. Each of these potential solutions has an
impact on the design:</para>
<itemizedlist>
<listitem>
<para>Solutions that employ Galera/MariaDB will require at
<para>Solutions that employ Galera/MariaDB require at
least three MySQL nodes.</para>
</listitem>
<listitem>
<para>MongoDB will have its own design considerations,
with regards to making the database highly
available.</para>
<para>MongoDB has its own design considerations for high
availability.</para>
</listitem>
<listitem>
<para>OpenStack design, generally, does not include shared
storage but for a high availability design some
components might require it depending on the specific
storage. However, for some high availability designs,
certain components might require it depending on the specific
implementation.</para>
</listitem>
</itemizedlist>
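The three-node minimum for Galera noted above is consistent with a strict-majority quorum rule: a cluster keeps operating only while more than half of its nodes can still see each other. The following is a minimal sketch of that arithmetic, assuming equal node weights; the function name is hypothetical and is not part of any OpenStack or Galera API.

```python
def tolerated_failures(nodes: int) -> int:
    """Return how many nodes can fail while a strict majority remains.

    Assumes every node carries equal weight in the quorum vote.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    quorum = nodes // 2 + 1  # smallest strict majority
    return nodes - quorum


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(f"{n} nodes tolerate {tolerated_failures(n)} failure(s)")
```

Under this rule a two-node cluster cannot lose any node without losing its majority, which is why three nodes is the smallest size that survives a single failure.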

@@ -83,7 +83,7 @@
</listitem>
<listitem>
<para>Alerting and notification of responsible teams or
automated systems which will remediate problems with
automated systems which remediate problems with
storage as they arise.</para>
</listitem>
<listitem>
@@ -94,7 +94,7 @@
<section xml:id="management-efficiency">
<title>Management efficiency</title>
<para>Operations personnel will often be required to replace failed
<para>Operations personnel are often required to replace failed
drives or nodes and provide ongoing maintenance of the storage hardware.</para>
<para>Provisioning and configuration of new or upgraded storage is
another important consideration when it comes to management of
@@ -109,8 +109,8 @@
<title>Application awareness</title>
<para>Well-designed applications should be aware of underlying storage
subsystems, in order to use cloud storage solutions effectively.</para>
<para>If natively available replication is not available, the application
must be able to be modified by operations personnel so that they
<para>If native replication is not available, operations personnel
must be able to modify the application so that they
can provide their own replication service. In the event that
replication is unavailable, operations personnel can design applications
to react such that they can provide their own replication services.
@@ -125,22 +125,21 @@
<para>Designing for fault tolerance and availability of storage
systems in an OpenStack cloud is vastly different when
comparing the Block Storage and Object Storage services. The
Object Storage service is designed to have consistency and
Object Storage service design features consistency and
partition tolerance as a function of the application.
Therefore, it does not have any reliance on hardware RAID
controllers to provide redundancy for physical disks.</para>
<section xml:id="block-storage-fault-tolerance-and-availability">
<title>Block Storage fault tolerance and availability</title>
<para>Block Storage resource nodes are commonly configured
with advanced RAID controllers and high performance disks that
are designed to provide fault tolerance at the hardware
level.</para>
<para>Deploy high performing storage solutions
<para>Block Storage resource nodes are commonly configured
with advanced RAID controllers and high performance disks to
provide fault tolerance at the hardware level.</para>
<para>Deploy high performing storage solutions
such as SSD disk drives or flash storage systems in cases where applications
require extreme performance out of Block Storage devices.</para>
<para>In environments that place extreme demands on Block Storage,
it is advisable to take advantage of multiple storage pools.
<para>In environments that place extreme demands on Block Storage,
we recommend using multiple storage pools.
In this case, each pool of devices should have a similar
hardware design and disk configuration across all hardware
nodes in that pool. This allows for a design that provides
@@ -152,13 +151,13 @@
storage across resource nodes. Ensuring that applications can
schedule volumes in multiple regions, each with their own
network, power, and cooling infrastructure, can give tenants
the ability to build fault tolerant applications that will be
the ability to build fault tolerant applications that are
distributed across multiple availability zones.</para>
<para>In addition to the Block Storage resource nodes, it is
important to design for high availability and redundancy of
the APIs and related services that are responsible for
provisioning and providing access to storage. We
recommend desiging a layer of hardware or software load
recommend designing a layer of hardware or software load
balancers in order to achieve high availability of the
appropriate REST API services to provide uninterrupted
service. In some cases, it may also be necessary to deploy an
@@ -172,8 +171,8 @@
so that tenants can manage Block Storage volumes.</para>
<para>In a cloud with extreme demands on Block Storage, the network
architecture should take into account the amount of East-West
bandwidth that will be required for instances to make use of
the available storage resources. Network devices selected
bandwidth required for instances to make use of
the available storage resources. The selected network devices
should support jumbo frames for transferring large blocks of
data. In some cases, it may be necessary to create an
additional back-end storage network dedicated to providing
@@ -184,38 +183,37 @@
<title>Object Storage fault tolerance and availability</title>
<para>While consistency and partition tolerance are both inherent
features of the Object Storage service, it is important to
design the overall storage architecture to ensure that those
goals are met by the system being implemented. The
design the overall storage architecture to ensure that the
implemented system meets those goals. The
OpenStack Object Storage service places a specific number of
data replicas as objects on resource nodes. These replicas are
distributed throughout the cluster based on a consistent hash
ring which exists on all nodes in the cluster.</para>
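<para>The following is a minimal sketch of how a consistent hash ring can map
an object to a partition, and a partition to devices. It is loosely modeled on
the hash-and-shift approach used by Object Storage rings. The function names,
the device labels, and the round-robin placement are assumptions made for
illustration. A real ring also mixes a cluster-specific secret into the hash
and balances partitions by zone and device weight.</para>

```python
import hashlib
import struct


def partition_for(path: str, part_power: int) -> int:
    """Map an object path to a partition using the top bits of its MD5 hash."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Read the first four bytes as an unsigned big-endian integer,
    # then keep only the top `part_power` bits.
    top32 = struct.unpack_from(">I", digest)[0]
    return top32 >> (32 - part_power)


def devices_for(partition: int, devices: list[str], replicas: int) -> list[str]:
    """Pick `replicas` distinct devices for a partition.

    Hypothetical round-robin placement, not the real ring builder logic.
    """
    if replicas > len(devices):
        raise ValueError("need at least as many devices as replicas")
    return [devices[(partition + i) % len(devices)] for i in range(replicas)]


if __name__ == "__main__":
    part = partition_for("/account/container/object", part_power=3)
    print(part, devices_for(part, ["d0", "d1", "d2", "d3"], replicas=3))
```

<para>Because every node holds the same ring, any node can compute the same
partition and device list for a given object without consulting a central
lookup service.</para>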
<para>The Object Storage system should be designed with sufficient
<para>Design the Object Storage system with a sufficient
number of zones to provide quorum for the number of replicas
defined. As an example, with three replicas configured in the
defined. For example, with three replicas configured in the
Swift cluster, the recommended number of zones to configure
within the Object Storage cluster in order to achieve quorum
is 5. While it is possible to deploy a solution with fewer
zones, the implied risk of doing so is that some data may not
be available and API requests to certain objects stored in the
cluster might fail. For this reason, ensure the number of
zones in the Object Storage cluster is properly accounted for.</para>
cluster might fail. For this reason, ensure you properly account
for the number of zones in the Object Storage cluster.</para>
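<para>As a rough illustration of the quorum reasoning above, the sketch below
assumes a simple majority quorum and one replica per zone. Both assumptions,
and the function names, are made for illustration only. The sketch does not
derive the recommended figure of 5 zones. It only shows how many zone failures
a given replica count can absorb, and how many zones remain free of a given
object's replicas.</para>

```python
def quorum_size(replicas: int) -> int:
    """Majority quorum: more than half of the replicas must respond."""
    return replicas // 2 + 1


def keeps_quorum(replicas: int, failed_zones: int) -> bool:
    """True if a quorum of replicas remains reachable.

    Assumes each replica sits in its own zone and that every failed zone
    held one replica of the object (the worst case).
    """
    remaining = replicas - min(failed_zones, replicas)
    return remaining >= quorum_size(replicas)


def spare_zones(zones: int, replicas: int) -> int:
    """Zones that hold no replica of a given object under one-replica-per-zone."""
    return max(zones - replicas, 0)


if __name__ == "__main__":
    print(quorum_size(3), keeps_quorum(3, 1), keeps_quorum(3, 2), spare_zones(5, 3))
```

<para>Under these assumptions, three replicas tolerate the loss of one zone
but not two, and a five-zone cluster leaves two zones without a copy of any
given object.</para>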
<para>Each Object Storage zone should be self-contained within its
own availability zone. Each availability zone should have
independent access to network, power and cooling
infrastructure to ensure uninterrupted access to data. In
addition, each availability zone should be serviced by a pool
of Object Storage proxy servers which will provide access to
data stored on the object nodes. Object proxies in each region
should leverage local read and write affinity so that access
to objects is facilitated by local storage resources wherever
possible. We recommend that upstream load balancing be
deployed to ensure that proxy services can be distributed
across the multiple zones and, in some cases, it may be
necessary to make use of third party solutions to aid with
geographical distribution of services.</para>
addition, a pool of Object Storage proxy servers providing access
to data stored on the object nodes should service
each availability zone. Object proxies in each region
should leverage local read and write affinity so that local storage
resources facilitate access to objects wherever
possible. We recommend deploying upstream load balancing to ensure
that proxy services are distributed across the multiple zones and,
in some cases, it may be necessary to make use of third party
solutions to aid with geographical distribution of services.</para>
<para>A zone within an Object Storage cluster is a logical
division. A zone can be represented as any of the following:</para>
division. Any of the following may represent a zone:</para>
<itemizedlist>
<listitem>
<para>
@@ -243,7 +241,7 @@
</para>
</listitem>
</itemizedlist>
<para>Deciding the proper zone design is crucial for allowing the Object
<para>Selecting the proper zone design is crucial for allowing the Object
Storage cluster to scale while providing an available and
redundant storage system. It may be necessary to
configure storage policies that have different requirements
@@ -263,9 +261,9 @@
consideration during the design phase.</para>
<section xml:id="scaling-block-storage">
<title>Scaling Block Storage</title>
<para>Block Storage pools can be upgraded to add storage capacity
rather easily without interruption to the overall Block
Storage service. Nodes can be added to the pool by simply
<para>You can upgrade Block Storage pools to add storage capacity
without interruption to the overall Block
Storage service. Add nodes to the pool by
installing and configuring the appropriate hardware and
software and then allowing that node to report in to the
proper storage pool via the message bus. This is because Block
@@ -276,10 +274,10 @@
<para>In some cases, the demand on Block Storage from instances
may exhaust the available network bandwidth. As a result,
design network infrastructure that services Block Storage
resources in such a way that capacity and bandwidth can be
added relatively easily. This often involves the use of
resources in such a way that you can add capacity and
bandwidth easily. This often involves the use of
dynamic routing protocols or advanced networking solutions to
allow capacity to be added to downstream devices easily. Both
add capacity to downstream devices easily. Both
the front-end and back-end storage network designs should
encompass the ability to quickly and easily add capacity and
bandwidth.</para>
@@ -297,23 +295,23 @@
disks.</para>
<para>For example, a system that starts with a single disk and a
partition power of 3 can have 8 (2^3) partitions. Adding a
second disk means that each will have 4 partitions.
second disk means that each has 4 partitions.
The one-disk-per-partition limit means that this system can
never have more than 8 disks, limiting its scalability.
However, a system that starts with a single disk and a
partition power of 10 can have up to 1024 (2^10) disks.</para>
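<para>The arithmetic in this example can be checked with a short sketch. The
function names are hypothetical, and the even split across disks is a
simplification that ignores device weights.</para>

```python
def partition_count(part_power: int) -> int:
    """Total number of partitions for a given partition power."""
    return 2 ** part_power


def partitions_per_disk(part_power: int, disks: int) -> float:
    """Average partitions per disk, assuming an even split.

    Raises ValueError when there are more disks than partitions, which
    reflects the one-disk-per-partition limit described above.
    """
    total = partition_count(part_power)
    if disks > total:
        raise ValueError("more disks than partitions")
    return total / disks


if __name__ == "__main__":
    print(partition_count(3))         # partitions at power 3
    print(partitions_per_disk(3, 2))  # partitions on each of two disks
    print(partition_count(10))        # upper bound on disks at power 10
```

<para>Choosing the partition power therefore sets an upper bound on the number
of disks the cluster can ever use.</para>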
<para>As back-end storage capacity is added to the system, the
partition maps cause data to be redistributed amongst storage
nodes. In some cases, this replication can consist of
extremely large data sets. In these cases, we recommended
making use of back-end replication links which will not
<para>As you add back-end storage capacity to the system, the
partition maps redistribute data amongst the storage
nodes. In some cases, this replication consists of
extremely large data sets. In these cases, we recommend
using back-end replication links that do not
contend with tenants' access to data.</para>
<para>As more tenants begin to access data within the cluster and
their data sets grow it will become necessary to add front-end
their data sets grow, it becomes necessary to add front-end
bandwidth to service data access requests. Adding front-end
bandwidth to an Object Storage cluster requires careful
planning and design of the Object Storage proxies that will be
used by tenants to gain access to the data, along with the
planning and design of the Object Storage proxies that tenants
use to gain access to the data, along with the
high availability solutions that enable easy scaling of the
proxy layer. We recommend designing a front-end load
balancing layer that tenants and consumers use to gain access
@@ -321,9 +319,9 @@
may be distributed across zones, regions or even across
geographic boundaries, which may also require that the design
encompass geo-location solutions.</para>
<para>In some cases, adding bandwidth and capacity to the network
<para>In some cases, you must add bandwidth and capacity to the network
resources servicing requests between proxy servers and storage
nodes will be required. For this reason, the network
nodes. For this reason, the network
architecture used for access to storage nodes and proxy
servers should make use of a design which is scalable.</para>
</section>


@@ -7,8 +7,8 @@
<?dbhtml stop-chunking?>
<title>Prescriptive examples</title>
<para>Storage-focused architecture depends heavily on the
specific use case. Three specific example use cases are
discussed in this section:</para>
specific use case. This section discusses three
specific example use cases:</para>
<itemizedlist>
<listitem>
<para>
@@ -38,9 +38,9 @@
</imageobject>
</mediaobject>
</para>
<para>The presented REST interface does not require a high performance
caching tier, and is presented as a traditional Object store running
on traditional spindles.</para>
<para>The example REST interface, presented as a traditional Object store running
on traditional spindles, does not require a high performance
caching tier.</para>
<para>This example uses the following components:</para>
<para>Network:</para>
<itemizedlist>
@@ -52,7 +52,7 @@
<para>Storage hardware:</para>
<itemizedlist>
<listitem>
<para>10 storage servers each with 12x4 TB disks equalling
<para>10 storage servers each with 12x4 TB disks equaling
480 TB total space with approximately 160 TB of
usable space after replicas.</para>
</listitem>
@@ -87,7 +87,7 @@
</para>
<para>One potential solution to this problem is the implementation of storage
systems designed for performance. Parallel file systems have previously
filled this need in the HPC space and as a result could be considered
filled this need in the HPC space and are suitable
for large scale performance-orientated systems.</para>
<para>OpenStack has integration with Hadoop to manage the Hadoop cluster
within the cloud. This diagram shows an OpenStack store with a high
@@ -112,37 +112,36 @@
<title>High performance database with Database service</title>
<para>Databases are a common workload that benefits from high performance
storage back ends. Although enterprise storage is not a requirement,
many environments have existing storage that can be used as back ends for
OpenStack cloud. A storage pool can be created to provide block devices
many environments have existing storage that an OpenStack cloud can use as
back ends. You can create a storage pool to provide block devices
with OpenStack Block Storage for instances as well as object interfaces.
In this example, the database I-O requirements were high and demanded
In this example, the database I/O requirements are high and demand
storage presented from a fast SSD pool.</para>
<para>A storage system is used to present a LUN that is backed by
<para>A storage system presents a LUN backed by
a set of SSDs using a traditional storage array with OpenStack
Block Storage integration or a storage platform such as Ceph
or Gluster.</para>
<para>This system can provide additional performance. For example,
in the database example below, a portion of the SSD pool can act
as a block device to the Database server. In the high performance analytics
example, the REST interface would be accelerated by the inline
SSD cache layer.</para>
example, the inline SSD cache layer accelerates the REST interface.</para>
<mediaobject>
<imageobject>
<imagedata contentwidth="4in"
fileref="../figures/Storage_Database_+_Object5.png"/>
</imageobject>
</mediaobject>
<para>Ceph was selected to present a Swift-compatible REST
<para>In this example, Ceph presents a Swift-compatible REST
interface, as well as a block level storage from a distributed
storage cluster. It is highly flexible and has features that
allow to reduce cost of operations such as self healing and
reduce the cost of operations, such as self healing and
auto balancing. Using erasure coded pools is a suitable way of
maximizing the amount of usable space.</para>
<note>
<para>There are special considerations around erasure coded pools.
For example, higher computational requirements and limitations on
the operations allowed on an object; partial writes are not
supported in an erasure coded pool.
the operations allowed on an object; erasure coded pools do not
support partial writes.
</para>
</note>
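<para>To make the usable-space trade-off concrete, the following sketch
compares replication with erasure coding. The raw capacity of 480 TB and the
profile of 4 data chunks (k) plus 2 coding chunks (m) are illustrative
assumptions, and the function names are hypothetical. The calculation ignores
file system and metadata overhead.</para>

```python
def usable_replicated(raw_tb: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb / replicas


def usable_erasure_coded(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity with k data chunks and m coding chunks per object."""
    return raw_tb * k / (k + m)


if __name__ == "__main__":
    raw = 480.0  # illustrative raw capacity in TB
    print(usable_replicated(raw, replicas=3))
    print(usable_erasure_coded(raw, k=4, m=2))
```

<para>With these figures, three-way replication leaves one third of the raw
capacity usable, while the assumed erasure coding profile leaves two thirds
usable and still tolerates the loss of two chunks.</para>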
<para>Using Ceph as an applicable example, a potential architecture
@@ -183,8 +182,8 @@
</listitem>
</itemizedlist>
<para>Using an SSD cache layer, you can present block devices
directly to Hypervisors or instances. The SSD cache systems
can also be used as an inline cache for the REST interface.
directly to Hypervisors or instances. The REST interface can
also use the SSD cache systems as an inline cache.
</para>
</section>
</section>


@@ -23,7 +23,7 @@
Running scripted smaller benchmarks during the
life cycle of the architecture helps record the system
health at different points in time. The data from
these scripted benchmarks will assist in future
these scripted benchmarks assist in future
scoping and gaining a deeper understanding of an
organization's needs.</para>
</listitem>
@@ -32,14 +32,14 @@
<term>Scale</term>
<listitem>
<para>Scaling storage solutions in a storage focused
OpenStack architecture design is driven by both initial
OpenStack architecture design is driven by initial
requirements, including <glossterm>IOPS</glossterm>,
capacity, and bandwidth, and future needs. Planning
capacity based on projected needs over the
course of a budget cycle is important for a design.
The architecture should balance cost
and capacity, while also allowing flexibility
for new technologies and methods to be implemented as
and capacity, while also allowing flexibility to
implement new technologies and methods as
they become available.</para>
</listitem>
</varlistentry>
@@ -49,10 +49,9 @@
<para>Designing security around data has multiple
points of focus that vary depending on SLAs, legal
requirements, industry regulations, and certifications
needed for systems or people. HIPPA, ISO9000, and SOX
compliance should be considered based on the type of
data. Levels of access control can be important for
certain organizations.</para>
needed for systems or people. Consider compliance with HIPAA,
ISO9000, and SOX based on the type of data. For certain
organizations, levels of access control are important.</para>
</listitem>
</varlistentry>
<varlistentry>
@@ -71,8 +70,8 @@
<varlistentry>
<term>Storage management</term>
<listitem>
<para>A range of storage
management-related considerations must be addressed in
<para>You must address a range of storage
management-related considerations in
the design of a storage focused OpenStack cloud. These
considerations include, but are not limited to, backup
strategy (and restore strategy, since a backup that
@@ -94,7 +93,7 @@
</varlistentry>
</variablelist>
<para>When building a storage focused OpenStack architecture,
strive to build a flexible design that is based on an
strive to build a flexible design based on an
industry standard core. One way of accomplishing this might be
through the use of different back ends serving different use
cases.</para>


@@ -6,8 +6,7 @@
xml:id="user-requirements-storage-focus">
<?dbhtml stop-chunking?>
<title>User requirements</title>
<para>Storage-focused clouds are defined by their requirements for
data. These include:</para>
<para>Requirements for data define storage-focused clouds. These include:</para>
<itemizedlist>
<listitem>
<para>
@@ -25,9 +24,8 @@
</para>
</listitem>
</itemizedlist>
<para>A balance between cost and user
requirements dictate what methods and technologies will be
used in a cloud architecture.</para>
<para>A balance between cost and user requirements dictates
what methods and technologies to use in a cloud architecture.</para>
<variablelist>
<varlistentry>
<term>Cost</term>
@@ -94,8 +92,8 @@
<term>Data compliance</term>
<listitem>
<para>Policies governing types of
information that are required to reside in certain
locations due to regular issues and cannot reside in
information that must reside in certain
locations due to regulatory issues and cannot reside in
other locations for the same reason.</para>
</listitem>
</varlistentry>
@@ -104,17 +102,17 @@
<section xml:id="technical-requirements-storage-focus">
<title>Technical requirements</title>
<para>The following are technical requirements that could be
incorporated into the architecture design:</para>
<para>You can incorporate the following technical requirements
into the architecture design:</para>
<variablelist>
<varlistentry>
<term>Storage proximity</term>
<listitem>
<para>In order to provide high
performance or large amounts of storage space, the
design may have to accommodate storage that is each of
the hypervisors or served from a central storage
device.</para>
design may have to accommodate storage that is
attached to each hypervisor or served from a
central storage device.</para>
</listitem>
</varlistentry>
<varlistentry>
@@ -129,16 +127,15 @@
<term>Availability</term>
<listitem>
<para>Specific requirements regarding
availability will influence the technology used to
store and protect data. These requirements will
influence the cost and solution that will be
implemented.</para>
availability influence the technology used to
store and protect data. These requirements
influence cost and the implemented solution.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Security</term>
<listitem>
<para>Data will need to be protected both in
<para>You must protect data both in
transit and at rest.</para>
</listitem>
</varlistentry>