Remove passive voice from Arch Guide Chap 2

Closes-Bug: #1400552

Change-Id: I5b1572d7b5cf3321dcfa2836d7f777fe450a87e3
Brian Moss 2015-04-23 09:48:06 +10:00
parent 2a93d184c5
commit 60002fc5be

<para>A compute-focused cloud is a specialized subset of the general purpose
OpenStack cloud architecture. Unlike the general purpose OpenStack
architecture, which hosts a wide variety of workloads and
applications and does not heavily tax any particular computing aspect,
a compute-focused cloud specifically supports
compute intensive workloads. Compute intensive
workloads may be CPU intensive, RAM intensive, or both; they are
not typically storage intensive or network intensive. Compute-focused
workloads may include the following use cases:</para>
<itemizedlist>
</itemizedlist>
<para>Based on the use case requirements, such clouds might need to provide
additional services such as a virtual machine disk library, file or object
storage, firewalls, load balancers, IP addresses, or network connectivity
in the form of overlays or virtual local area networks (VLANs). A
compute-focused OpenStack cloud does not typically use raw block storage
services as it does not generally host applications that require
persistent block storage.</para>
<xi:include href="compute_focus/section_user_requirements_compute_focus.xml"/>
<xi:include href="compute_focus/section_tech_considerations_compute_focus.xml"/>


</itemizedlist>
<para>
An OpenStack cloud with extreme demands on processor and memory
resources is compute-focused, and requires hardware that
can handle these demands. This can mean choosing hardware which might
not perform as well on storage or network capabilities. In a
compute-focused architecture, storage and networking are needed when
loading a
data set into the computational cluster, but are not otherwise in heavy
demand.
</para>
<para>
Consider the following factors when selecting compute (server) hardware:
</para>
<variablelist>
<varlistentry>
<term>Resource capacity</term>
<listitem>
<para>The number of CPU cores, how much RAM, or how
much storage a given server delivers.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Expandability</term>
<listitem>
<para>The number of additional resources you can add to a
server before it reaches its limit.</para>
</listitem>
</varlistentry>
<varlistentry>
</listitem>
</varlistentry>
</variablelist>
<para>Weigh these considerations against each other to determine the
best design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing resource
capacity and expandability can increase cost but decreases server density.
resource capacity, and expandability.</para>
<para>A compute-focused cloud should have an emphasis on server hardware
that can offer more CPU sockets, more CPU cores, and more RAM. Network
connectivity and storage capacity are less critical. The hardware must
provide enough network connectivity and storage
capacity to meet minimum user requirements, but they are not the primary
consideration.</para>
<para>Some server hardware form factors suit a compute-focused architecture
better than others. CPU and RAM capacity have the highest priority. Some
considerations for selecting hardware:</para>
<itemizedlist>
<listitem>
<para>Most blade servers can support dual-socket multi-core CPUs. To
avoid this CPU limit, select "full width" or "full height" blades.
Be aware, however, that this also decreases server density. For example,
high density blade servers such as HP BladeSystem or Dell PowerEdge
M1000e support up to 16 servers in only ten rack units. Using
half-height blades is twice as dense as using full-height blades,
which results in only eight servers per ten rack units.</para>
</listitem>
<listitem>
<para>1U rack-mounted servers that occupy only a single rack
unit may offer greater server density than a blade server
solution. It is possible to place forty 1U servers in a rack, providing
space for the top of rack (ToR) switches, compared to 32 full width
blade servers. However, as of the Icehouse release, 1U servers from
the major vendors have only dual-socket, multi-core CPU
configurations. To obtain greater than dual-socket support in a 1U
rack-mount form factor, purchase systems from original design
manufacturers (ODMs) or second-tier manufacturers.</para>
</listitem>
<listitem>
<para>2U rack-mounted servers provide quad-socket, multi-core CPU
support, but with a corresponding decrease in server density (half
the density that 1U rack-mounted servers offer).</para>
</listitem>
<listitem>
<para>Larger rack-mounted servers, such as 4U servers, often provide
have much lower server density and are often more expensive.</para>
</listitem>
<listitem>
<para>"Sled servers" are rack-mounted servers that support multiple
independent servers in a single 2U or 3U enclosure. These deliver higher
density as compared to typical 1U or 2U rack-mounted servers. For
example, many sled servers offer four independent dual-socket
nodes in 2U for a total of eight CPU sockets in 2U. However, the
<listitem>
<para>In a compute-focused architecture, instance density is
lower, which means CPU and RAM over-subscription ratios are
also lower. You require more hosts to support the anticipated
scale, especially if the design uses dual-socket hardware.</para>
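<para>As a rough sketch, you can lower the over-subscription ratios in
<filename>nova.conf</filename> on the compute nodes. The values below
are illustrative assumptions for a compute-focused design, not
recommendations:</para>
<programlisting>[DEFAULT]
# Defaults are 16.0 (CPU) and 1.5 (RAM); compute-intensive
# workloads typically warrant more conservative ratios.
cpu_allocation_ratio = 4.0
ram_allocation_ratio = 1.0</programlisting>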
</listitem>
<term>Host density</term>
<listitem>
<para>Another option to address the higher host count
of dual socket designs is to use a quad
socket platform. Taking this approach decreases host density,
which increases rack count. This configuration may
affect the network requirements, the number of power connections,
and the cooling requirements.</para>
<term>Power and cooling density</term>
<listitem>
<para>The power and cooling density
requirements for 2U, 3U or even 4U server designs might be lower
than for blade, sled, or 1U server designs because of lower host
density. For data centers with older infrastructure, this may
be a desirable feature.</para>
</listitem>
</varlistentry>
</variablelist>
<para>When designing a compute-focused OpenStack architecture, you must
consider whether you intend to scale up or scale out.
Selecting a smaller number of larger hosts, or a
larger number of smaller hosts, depends on a combination of factors:
cost, power, cooling, physical rack and floor space, support-warranty,
and manageability.</para>
<section xml:id="storage-hardware-selection">
<title>Storage hardware selection</title>
<para>For a compute-focused OpenStack architecture, the
selection of storage hardware is not critical as it is not a primary
consideration. Nonetheless, there are several factors
to consider:</para>
<variablelist>
<varlistentry>
<term>Cost</term>
<listitem>
<para>The overall cost of the solution plays a major role
in what storage architecture and storage hardware you select.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Performance</term>
<listitem>
<para>The performance of the storage solution is important; you can
measure it by observing the latency of storage I-O
requests. In a compute-focused OpenStack cloud, storage latency
can be a major consideration. In some compute-intensive
workloads, minimizing the delays that the CPU experiences while
fetching data from storage can significantly impact
the overall performance of the application.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Scalability</term>
<listitem>
<para>Scalability refers to the performance of a storage solution
as it expands to its maximum size. A solution that performs
well in small configurations but has degrading
performance as it expands is not scalable. On
the other hand, a solution that continues to perform well at
maximum expansion is scalable.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Expandability</term>
<listitem>
<para>Expandability refers to the overall ability of
a storage solution to grow. A solution that expands to 50 PB is
more expandable than a solution that only scales to 10 PB.
Note that this metric is related to, but different
from, scalability, which is a measure of the solution's
performance as it expands.</para>
</variablelist>
<para>For a compute-focused OpenStack cloud, latency of storage is a
major consideration. Using solid-state disks (SSDs) to minimize
latency for instance storage reduces CPU delays related to storage
and improves performance. Consider using RAID
controller cards in compute hosts to improve the performance of the
underlying disk subsystem.</para>
<para>Evaluate solutions against the key factors above when considering
your storage architecture. This determines if a scale-out solution
such as Ceph or GlusterFS is suitable, or if a single, highly expandable,
scalable, centralized storage array is better. If a centralized
storage array suits the requirements, the array vendor determines the
hardware. You can build a storage array using commodity hardware with
Open Source software, but you require people with expertise to build
such a system. Conversely, a scale-out storage solution that uses
direct-attached storage (DAS) in the servers may be an appropriate
choice. If so, then the server hardware must
support the storage solution.</para>
<para>The following lists some of the potential impacts that may affect a
particular storage architecture, and the corresponding storage hardware,
<varlistentry>
<term>Connectivity</term>
<listitem>
<para>Ensure connectivity matches the storage solution requirements.
If you select a centralized storage array, determine how the
hypervisors should connect to the storage array. Connectivity
can affect latency and thus performance, so ensure that the network
characteristics minimize latency to boost the overall
performance of the design.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Latency</term>
<listitem>
<para>Determine if the use case has consistent or
highly variable latency.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Throughput</term>
<listitem>
<para>To improve overall performance, ensure that the storage
solution's throughput is optimized. While a compute-focused cloud
does not usually have major data I-O to and from storage, this is an
important factor to consider.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Server Hardware</term>
<listitem>
<para>If the solution uses DAS, this impacts the server hardware choice,
host density, instance density, power density, OS-hypervisor, and
management tools.</para>
</listitem>
</varlistentry>
</variablelist>
<para>When instances must be highly available or capable of migration
between hosts, use a shared storage file-system
to store instance ephemeral data to ensure that
compute services can run uninterrupted in the event of a node
failure.</para>
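<para>For example, one common approach is to mount a shared NFS export
at the instances directory on every compute node. The server name and
export path below are hypothetical:</para>
<programlisting># /etc/fstab entry on each compute node
nfs-server:/srv/nova-instances  /var/lib/nova/instances  nfs4  defaults  0  0</programlisting>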
</section>
<section xml:id="selecting-networking-hardware-arch">
<title>Selecting networking hardware</title>
<para>Some of the key considerations for networking hardware selection
include:</para>
<variablelist>
<varlistentry>
<term>Port count</term>
<listitem>
<para>The design requires networking hardware that
has the requisite port count.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Port density</term>
<listitem>
<para>The required port count affects the physical space that a
network design requires.
A switch that can provide 48 10 GbE ports in 1U has a much higher
port density than a switch that provides 24 10 GbE ports in 2U. A
higher port density is better, as it leaves more rack space for
compute or storage components. You must also consider fault
domains and power density. Although more expensive, you can also
consider higher density switches as it is important not to
design the network beyond requirements.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Port speed</term>
<listitem>
<para>The networking hardware must support the proposed
network speed, for example: 1 GbE, 10 GbE, 40 GbE, or 100
GbE.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Redundancy</term>
<listitem>
<para>User requirements for high availability and cost considerations
influence the level of network hardware redundancy you require.
You can achieve network redundancy by adding
redundant power supplies or paired switches. If this is a
requirement, the hardware must support this configuration.
User requirements determine if you require a completely redundant network
infrastructure.</para>
</listitem>
</varlistentry>
<varlistentry>
</listitem>
</varlistentry>
</variablelist>
<para>We recommend designing the network architecture using
a scalable network model that makes it easy to add capacity and
bandwidth. A good example of such a model is the leaf-spine model. In
this type of network design, it is possible to easily add additional
bandwidth as well as scale out to additional racks of gear. It is
important to select network hardware that supports the required
port count, port speed, and port density while also allowing for future
growth as workload demands increase. It is also important to evaluate
where in the network architecture it is valuable to provide redundancy.
Increased network availability and redundancy comes at a cost, therefore
we recommend weighing the cost versus the benefit gained from
utilizing and deploying redundant network switches and using bonded
interfaces at the host level.</para>
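<para>As an illustrative sketch only, an LACP (802.3ad) bond on an
Ubuntu host might look like the following. The interface names and
address are assumptions, and the switch ports must be configured for
LACP to match:</para>
<programlisting># /etc/network/interfaces fragment (hypothetical addressing)
auto bond0
iface bond0 inet static
    address 192.0.2.21
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100</programlisting>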
</section>
<section xml:id="software-selection-arch">
<title>Software selection</title>
<para>Software selection for a compute-focused OpenStack
architecture involves three main areas:</para>
<itemizedlist>
<listitem>
<para>Operating system (OS) and hypervisor</para>
<para>The selection of operating system (OS) and hypervisor has a
significant impact on the end point design. Selecting a particular
operating system and hypervisor could affect server hardware selection.
The node, networking, and storage hardware must support the selected
combination. For example, if the design uses Link Aggregation
Control Protocol (LACP), the hypervisor must support it.</para>
<para>OS and hypervisor selection impact the following areas:</para>
<variablelist>
<varlistentry>
<term>Cost</term>
<listitem>
<para>Selecting a commercially supported hypervisor such as
Microsoft Hyper-V results in a different cost model from
choosing a community-supported, open source hypervisor like Kinstance
or Xen. Even within the ranks of open source solutions, choosing
one solution over another can impact cost due
to support contracts. On the other hand, business or application
requirements might dictate a specific or commercially supported
hypervisor.</para>
<varlistentry>
<term>Supportability</term>
<listitem>
<para>Staff require appropriate training and knowledge to support the
selected OS and hypervisor combination. Consideration of training
costs may impact the design.</para>
</listitem>
</varlistentry>
<varlistentry>
<listitem>
<para>The management tools used for Ubuntu and
Kinstance differ from the management tools for VMware vSphere.
Although OpenStack supports both OS and hypervisor combinations,
the choice of tool impacts the rest of the design.</para>
</listitem>
</varlistentry>
<varlistentry>
<listitem>
<para>Ensure that selected OS and hypervisor
combinations meet the appropriate scale and performance
requirements. The chosen architecture must meet the targeted
instance-host ratios with the selected OS-hypervisor
combination.</para>
</listitem>
<term>Security</term>
<listitem>
<para>Ensure that the design can accommodate the regular
installation of application security patches while
maintaining the required workloads. The frequency of security
patches for the proposed OS-hypervisor combination has an
impact on performance and the patch installation process can
affect maintenance windows.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Supported features</term>
<listitem>
<para>Determine what features of OpenStack you require.
The choice of features often determines the selection of the
OS-hypervisor combination. Certain features are only available with
specific OSs or hypervisors. If certain features are
not available, modify the design to meet user requirements.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Interoperability</term>
<listitem>
<para>Consider the ability of the selected OS-hypervisor combination
to interoperate or co-exist with other OS-hypervisors, or with
other software solutions in the overall design. Operational and
troubleshooting tools for one OS-hypervisor combination may differ
from the tools for another OS-hypervisor combination. The design
must address if the two sets of tools need to interoperate.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
<section xml:id="openstack-components-arch">
<title>OpenStack components</title>
<para>The selection of OpenStack components has a significant impact.
Certain components are always present, for example the compute
and image services, but others, such as the orchestration module, may
not be present. Omitting heat does not typically have a significant impact
on the overall design. However, if the architecture uses a replacement for
OpenStack Object Storage for its storage component, this could have
significant impacts on the rest of the design.</para>
<para>For a compute-focused OpenStack design architecture, the
following components may be present:</para>
<itemizedlist>
<listitem>
<para>Identity (keystone)</para>
<para>Orchestration (heat)</para>
</listitem>
</itemizedlist>
<para>A compute-focused design is less likely to include OpenStack
Block Storage because persistent block storage is not a significant
requirement for the expected workloads. However, there may be
situations where performance requirements dictate the use of a block
storage component to improve data I-O.</para>
<para>The exclusion of certain OpenStack components might also limit the
functionality of other components. If a design opts to
include the Orchestration module but excludes the Telemetry module, then
the design cannot take advantage of Orchestration's auto
scaling functionality as this relies on information from Telemetry.</para>
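<para>A minimal sketch of this Orchestration and Telemetry pairing
follows. The resource names, image, and flavor are hypothetical, and
you would tune the thresholds to the workload:</para>
<programlisting>heat_template_version: 2013-05-23
resources:
  workers:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 10
      resource:
        type: OS::Nova::Server
        properties:
          image: compute-worker  # hypothetical image
          flavor: m1.xlarge
  scale_up:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: workers}
      scaling_adjustment: 1
      cooldown: 60
  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 80
      comparison_operator: gt
      alarm_actions:
        - {get_attr: [scale_up, alarm_url]}</programlisting>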
</section>
<section xml:id="supplemental-software">
<title>Supplemental software</title>
<para>While OpenStack is a fairly complete collection of software
projects for building a platform for cloud services, there are
invariably additional pieces of software that you might add
to an OpenStack design.</para>
<section xml:id="networking-software-arch">
<title>Networking software</title>
<para>OpenStack Networking provides a wide variety of networking services
for instances. There are many additional networking software packages
that might be useful to manage the OpenStack components themselves.
Some examples include software to provide load balancing,
network redundancy protocols, and routing daemons. The
<citetitle>OpenStack High Availability Guide</citetitle> (<link
xlink:href="http://docs.openstack.org/high-availability-guide/content">http://docs.openstack.org/high-availability-guide/content</link>)
describes some of these software packages in more detail.
</para>
<para>For a compute-focused OpenStack cloud, the OpenStack infrastructure
components must be highly available. If the design does not
include hardware load balancing, you must add networking software packages
like HAProxy.</para>
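<para>A minimal HAProxy sketch for one API endpoint follows. The
addresses, port, and server names are hypothetical:</para>
<programlisting>listen nova-api
    bind 192.0.2.10:8774
    balance roundrobin
    option tcpka
    server controller1 192.0.2.11:8774 check inter 2000 rise 2 fall 5
    server controller2 192.0.2.12:8774 check inter 2000 rise 2 fall 5</programlisting>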
</section>
<section xml:id="management-software-arch">
<title>Management software</title>
<para>The selected supplemental software solution impacts and affects
the overall OpenStack cloud design. This includes software for
providing clustering, logging, monitoring and alerting.</para>
<para>The availability design requirements primarily determine whether
to include clustering software, such as Corosync or Pacemaker.
The impact of including these software packages therefore depends on
the availability requirements of the cloud infrastructure and on the
complexity of supporting the configuration after deployment. The
OpenStack High Availability Guide provides more details on the
installation and configuration of Corosync and Pacemaker.
</para>
<para>Operational considerations determine the requirements for logging,
monitoring, and alerting. Each of these sub-categories includes
various options. For example, in the logging sub-category
consider Logstash, Splunk, Log Insight, or some other log
aggregation-consolidation tool. Store logs in a centralized
location to ease analysis of the data. Log
data analytics engines can also provide automation and issue
notification by alerting and
attempting to remediate some of the more commonly known issues.</para>
<para>If you require any of these software packages, then the design
must account for the additional resource consumption.
Some other potential design impacts include:</para>
<itemizedlist>
<listitem>
<para>OS-hypervisor combination: ensure that the selected logging,
monitoring, or alerting tools support the proposed OS-hypervisor
combination.</para>
</listitem>
<listitem>
<para>Network hardware: the logging, monitoring, and alerting software
must support the network hardware selection.</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="database-software-arch">
<title>Database software</title>
<para>A large majority of OpenStack components require access to
back-end database services to store state and configuration
information. Select an appropriate back-end database that
satisfies the availability and fault tolerance requirements of the
OpenStack services. OpenStack services support connecting
to any database that the SQLAlchemy Python drivers support,
however most common database deployments make use of MySQL or some
variation of it. We recommend that you make the database that provides
back-end services within a general-purpose cloud highly
available. Some of the more common software solutions include Galera,
MariaDB, and MySQL with multi-master replication.</para>
</section>
</section>
</section>

cloud service providers or telecom providers. Smaller implementations
are more inclined to rely on smaller support teams that need
to combine the engineering, design, and operation roles.</para>
<para>The maintenance of OpenStack installations requires a variety
of technical skills. To ease the operational burden, consider
incorporating features into the architecture and
design. Some examples include:</para>
<para>Automating the operations functions</para>
</listitem>
<listitem>
<para>Utilizing a third party management company</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="expected-unexpected-server-downtime">
<title>Expected and unexpected server downtime</title>
<para>Unexpected server downtime is inevitable, and SLAs can
address how long it takes to recover from failure.
Recovery of a failed host means restoring instances from a snapshot, or
respawning that instance on another available host.</para>
<para>It is acceptable to design a compute-focused cloud
<para>Adding extra capacity to an OpenStack cloud is a
horizontally scaling process.</para>
<note>
<para>Be mindful, however, of additional work to place the nodes into
appropriate Availability Zones and Host Aggregates.</para>
</note>
<para>We recommend the same or very similar CPUs
when adding extra nodes to the environment because they reduce
the chance of breaking live-migration features if they are
present. Scaling out hypervisor hosts also has a direct effect
<para>Changing the internal components of a Compute host to account for
increases in demand is a process known as vertical scaling.
Swapping a CPU for one with more cores, or
increasing the memory in a server, can help add extra
capacity for running applications.</para>
<para>Another option is to assess the average workloads and
increase the number of instances that can run within the
compute environment by adjusting the overcommit ratio. While

<?dbhtml stop-chunking?>
<title>Prescriptive examples</title>
<para>The Conseil Européen pour la Recherche Nucléaire (CERN),
also known as the European Organization for Nuclear Research,
provides particle accelerators and other infrastructure for
high-energy physics research.</para>
<para>As of 2011 CERN operated these two compute centers in Europe
</tr>
</tbody>
</informaltable>
<para>To support a growing number of compute-heavy users of
experiments related to the Large Hadron Collider (LHC), CERN
ultimately elected to deploy an OpenStack cloud using
Scientific Linux and RDO. This effort aimed to simplify the
management of the center's compute resources with a view to
doubling compute capacity through the addition of a
data center in 2013 while maintaining the same
levels of compute staff.</para>
<para>The CERN solution uses <glossterm baseform="cell">cells</glossterm>
for segregation of compute
resources and for transparently scaling between different data
centers. This decision meant trading off support for security
groups and live migration. In addition, they must manually replicate
some details, like flavors, across cells. In
spite of these drawbacks, cells provide the
required scale while exposing a single public API endpoint to
users.</para>
<para>CERN created a compute cell for each of the two original data
centers and created a third when it added a new data center
in 2013. Each cell contains three availability zones to
further segregate compute resources and at least three
RabbitMQ message brokers configured for clustering with
mirrored queues for high availability.</para>
<para>The API cell, which resides behind an HAProxy load balancer,
is in the data center in Switzerland and directs API
calls to compute cells using a customized variation of the
cell scheduler. The customizations allow certain workloads to
route to a specific data center or all data centers,
with cell RAM availability determining cell selection in the
latter case.</para>
<mediaobject>
<imageobject>
single default.
</para></listitem>
</itemizedlist>
<para>A central database team manages the MySQL database server in each cell
in an active/passive configuration with a NetApp storage back end.
Backups run every 6 hours.</para>
<section xml:id="network-architecture">
<title>Network architecture</title>
<para>To integrate with existing networking infrastructure, CERN
made customizations to legacy networking (nova-network). This was in the
form of a driver to integrate with CERN's existing database
for tracking MAC and IP address assignments.</para>
<para>The driver facilitates selection of a MAC address and IP for
new instances based on the compute node where the scheduler places
the instance.</para>
<para>The driver considers the compute node where the scheduler
placed an instance and selects a MAC address and IP
from the pre-registered list associated with that node in the
database. The database updates to reflect the address assignment to
that instance.</para></section>
<section xml:id="storage-architecture">
<title>Storage architecture</title>
<para>CERN deploys the OpenStack Image service in the API cell and
configures it to expose version 1 (V1) of the API. This also requires
the image registry. The storage back end in
use is a 3 PB Ceph cluster.</para>
<para>CERN maintains a small set of Scientific Linux 5 and 6 images onto
which orchestration tools can place applications. Puppet manages
instance configuration and customization.</para></section>
<section xml:id="monitoring">
<title>Monitoring</title>
<para>CERN does not require direct billing, but uses the Telemetry module
to perform metering for the purposes of adjusting
project quotas. CERN uses a sharded, replicated, MongoDB back-end.
To spread API load, CERN deploys instances of the nova-api service
within the child cells for Telemetry to query
against. This also requires the configuration of supporting services
such as keystone, glance-api, and glance-registry in the child cells.
</para>
<mediaobject>
<imageobject>
<imagedata contentwidth="4in"

<?dbhtml stop-chunking?>
<title>Technical considerations</title>
<para>In a compute-focused OpenStack cloud, the type of instance
workloads you provision heavily influences technical
decision making. For example, specific use cases that demand
multiple, short-running jobs present different requirements
than those that specify long-running jobs, even though both
situations are compute focused.</para>
<para>Public and private clouds require deterministic capacity
planning to support elastic growth in order to meet user SLA
expectations. Deterministic capacity planning is the path to
predicting the effort and expense of making a given process
consistently performant. This process is important because,
when a service becomes a critical part of a user's
infrastructure, the user's experience links directly to the SLAs of
the cloud itself. In cloud computing, it is not average speed but
speed consistency that determines a service's performance.
There are two aspects of capacity planning to consider:</para>
<itemizedlist>
<listitem>
<para>planning the initial deployment footprint</para>
</listitem>
<listitem>
<para>planning expansion of it to stay ahead of the demands of cloud
users</para>
</listitem>
</itemizedlist>
<para>Plan the initial footprint for an OpenStack deployment
based on existing infrastructure workloads
and estimates based on expected uptake.</para>
<para>The starting point is the core count of the cloud. By
applying relevant ratios, the user can gather information
about:</para>
<itemizedlist>
<listitem>
<para>The number of expected concurrent instances:
(overcommit fraction × cores) / virtual cores per instance</para>
</listitem>
<listitem>
<para>Required storage: flavor disk size × number of instances</para>
</listitem>
</itemizedlist>
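The two ratios above can be sketched in Python. The overcommit fraction, core count, and flavor disk size below are illustrative values chosen to reproduce the 1600-instance example that follows, not recommendations:

```python
# Hedged sketch of the capacity-planning ratios described above.
def concurrent_instances(overcommit_fraction, cores, vcpus_per_instance):
    """(overcommit fraction x cores) / virtual cores per instance"""
    return (overcommit_fraction * cores) // vcpus_per_instance

def required_storage_gb(flavor_disk_gb, instance_count):
    """flavor disk size x number of instances"""
    return flavor_disk_gb * instance_count

# Example: 16:1 CPU overcommit across 200 physical cores,
# 2 vCPUs per instance, and a 50 GB disk per flavor.
instances = concurrent_instances(16, 200, 2)   # 1600 instances
storage = required_storage_gb(50, instances)   # 80000 GB
```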
<para>Use these ratios to determine the amount of
additional infrastructure needed to support the cloud. For
example, consider a situation in which you require 1600
instances, each with 2 vCPU and 50 GB of storage. Assuming the
services, database servers, and queue servers are likely to
encounter.</para>
<para>Consider, for example, the differences between a cloud that
supports a managed web-hosting platform and one running
integration tests for a development project that creates one
instance per code commit. In the former, the heavy work of
creating an instance happens only every few months, whereas
the latter puts constant heavy load on the cloud controller.
The average instance lifetime is significant, as a larger
number generally means less load on the cloud
controller.</para>
<para>Aside from the creation and termination of instances, consider the
impact of users accessing the service,
particularly on nova-api and its associated database. Listing
instances gathers a great deal of information and, given the
frequency with which users run this operation, a cloud with a
large number of users can increase the load significantly.
This can even occur unintentionally. For example, the
instances every 30 seconds, so leaving it open in a browser
window can cause unexpected load.</para>
<para>Consideration of these factors can help determine how many
cloud controller cores you require. A server with 8 CPU cores
and 8 GB of RAM would be sufficient for a rack of
compute nodes, given the above caveats.</para>
<para>Key hardware specifications are also crucial to the
performance of user instances. Be sure to consider budget and
bandwidth (Gbps/core), and overall CPU performance
(CPU/core).</para>
<para>The cloud resource calculator is a useful tool in examining
the impacts of different hardware and instance load outs. See:
<link xlink:href="https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods">https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods</link>
</para>
<section xml:id="expansion-planning-compute-focus">
<title>Expansion planning</title>
<para>A key challenge for planning the expansion of cloud
compute services is the elastic nature of cloud infrastructure
demands. Previously, new users or customers had to
plan for and request the infrastructure they required ahead of
time, allowing time for reactive procurement processes. Cloud
computing users have come to expect the agility of having
instant access to new resources as required.
Consequently, plan for typical usage and for sudden bursts in
usage.</para>
<para>Planning for expansion is a balancing act.
Planning too conservatively can lead to unexpected
oversubscription of the cloud and dissatisfied users. Planning
for cloud expansion too aggressively can lead to unexpected
underutilization of the cloud and funds spent unnecessarily on operating
infrastructure.</para>
<para>The key is to carefully monitor the trends in
cloud usage over time. The intent is to measure the
consistency with which you deliver services, not the
average speed or capacity of the cloud. Using this information
to model capacity performance enables users to more
accurately determine the current and future capacity of the
cloud.</para></section>
<section xml:id="cpu-and-ram-compute-focus"><title>CPU and RAM</title>
<para>Adapted from:
<link xlink:href="http://docs.openstack.org/openstack-ops/content/compute_nodes.html#cpu_choice">http://docs.openstack.org/openstack-ops/content/compute_nodes.html#cpu_choice</link></para>
<para>In current generations, CPUs have up to 12 cores. If an
Intel CPU supports Hyper-Threading, those 12 cores double
to 24 cores. A server that supports multiple CPUs multiplies
the number of available cores.
Hyper-Threading is Intel's proprietary simultaneous
multi-threading implementation, used to improve
parallelization on their CPUs. Consider enabling
Hyper-Threading to improve the performance of multithreaded
applications.</para>
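The core arithmetic above can be sketched as follows; the socket and per-socket core counts are illustrative values, not hardware recommendations:

```python
# Illustrative logical core count: sockets x cores per socket,
# doubled when Hyper-Threading is enabled.
def logical_cores(sockets, cores_per_socket, hyper_threading=True):
    threads_per_core = 2 if hyper_threading else 1
    return sockets * cores_per_socket * threads_per_core

print(logical_cores(1, 12))   # a single 12-core CPU with HT: 24
print(logical_cores(2, 12))   # a dual-socket server with HT: 48
```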
<para>Whether the user should enable Hyper-Threading on a CPU
depends on the use case. For example, disabling
Hyper-Threading can be beneficial in intense computing
environments. Running performance tests using local
workloads with and without Hyper-Threading can help
determine which option is more appropriate in any particular
case.</para>
<para>If the cloud runs the Libvirt/KVM hypervisor driver,
then the compute node CPUs must support
virtualization by way of the VT-x extensions for Intel chips
and AMD-v extensions for AMD chips to provide full
performance.</para>
<para>OpenStack enables users to overcommit CPU and RAM on
compute nodes. This allows an increase in the number of
instances running on the cloud at the cost of reducing the
performance of the instances. OpenStack Compute uses the
the RAM associated with the instances reaches 72 GB (such as
nine instances, in the case where each instance has 8 GB of
RAM).</para>
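As a sketch of the RAM overcommit arithmetic, assuming a compute node with 48 GB of physical RAM and a 1.5:1 RAM allocation ratio, values consistent with the 72 GB figure above but not stated explicitly in this guide:

```python
# Illustrative RAM overcommit: the scheduler can keep placing instances
# until allocated RAM reaches physical RAM x allocation ratio.
def max_instances_by_ram(physical_ram_gb, allocation_ratio, instance_ram_gb):
    schedulable_ram = physical_ram_gb * allocation_ratio  # 48 * 1.5 = 72 GB
    return int(schedulable_ram // instance_ram_gb)

print(max_instances_by_ram(48, 1.5, 8))  # 9 instances of 8 GB each
```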
<para>You must select the appropriate CPU and RAM allocation ratio
based on particular use cases.</para></section>
<section xml:id="additional-hardware-compute-focus">
<title>Additional hardware</title>
<para>Certain use cases may benefit from exposure to additional
<listitem>
<para>Database management systems that benefit from the
availability of SSDs for ephemeral storage to maximize
read/write time.</para>
</listitem>
</itemizedlist>
<para>Host aggregates group hosts that share similar
characteristics, which can include hardware similarities. The
addition of specialized hardware to a cloud deployment is
likely to add to the cost of each node, so carefully
consider whether all compute nodes, or
just a subset targeted by flavors, need the
additional customization to support the desired
workloads.</para></section>
<section xml:id="utilization"><title>Utilization</title>
instances while making the best use of the available physical
resources.</para>
<para>In order to facilitate packing of virtual machines onto
physical hosts, the default selection of flavors provides a
second largest flavor that is half the size
of the largest flavor in every dimension. It has half the
vCPUs, half the vRAM, and half the ephemeral disk space. The
next largest flavor is half that size again. The following figure
provides a visual representation of this concept for a general
purpose computing design:
<mediaobject>
<imageobject>
<imagedata contentwidth="4in"
/>
</imageobject>
</mediaobject></para>
<para>The following figure displays a CPU-optimized, packed server:
<mediaobject>
<imageobject>
<imagedata contentwidth="4in"
/>
</imageobject>
</mediaobject></para>
<para>These default flavors are well suited to typical configurations
of commodity server hardware. To maximize utilization,
however, it may be necessary to customize the flavors or
create new ones in order to better align instance sizes to the
available hardware.</para>
<para>Workload characteristics may also influence hardware choices
and flavor configuration, particularly where they present
different ratios of CPU versus RAM versus HDD
requirements.</para>
<para>For more information on flavors, see:
<link xlink:href="http://docs.openstack.org/openstack-ops/content/flavors.html">http://docs.openstack.org/openstack-ops/content/flavors.html</link></para>
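The halving scheme described above can be sketched as follows. The starting flavor dimensions are illustrative, not the actual default flavors:

```python
# Illustrative flavor series: each flavor is half the previous one in
# every dimension (vCPUs, RAM, ephemeral disk).
def flavor_series(vcpus, ram_mb, disk_gb, count):
    series = []
    for _ in range(count):
        series.append({"vcpus": vcpus, "ram_mb": ram_mb, "disk_gb": disk_gb})
        vcpus //= 2
        ram_mb //= 2
        disk_gb //= 2
    return series

for flavor in flavor_series(8, 16384, 160, 4):
    print(flavor)
```

Two of the largest flavor pack exactly into the same host capacity as one of the largest, which is what makes dense packing straightforward.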
</section>
<section xml:id="performance-compute-focus"><title>Performance</title>
<para>So that workloads can consume as many resources
as are available, do not share cloud infrastructure. Ensure you accommodate
large scale workloads.</para>
<para>The duration of batch processing differs depending on
individual workloads. Time limits range from
seconds to hours, and as a result it is difficult to predict resource
use.</para>
</section>
<section xml:id="security-compute-focus"><title>Security</title>
<para>The security considerations for this scenario are
similar to those of the other scenarios in this guide.</para>
<para>A security domain comprises users, applications, servers,
and networks that share common trust requirements and
expectations within a system. Typically they have the same
authentication and authorization requirements and
users.</para>
<para>Data</para>
</listitem>
</orderedlist>
<para>You can map these security domains individually to the
installation, or combine them. For example, some
deployment topologies combine both guest and data domains onto
one physical network, whereas in other cases these networks
are physically separate. In each case, the cloud operator
should be aware of the appropriate security concerns. Map out
security domains against the specific OpenStack
deployment topology. The domains and their trust requirements
depend on whether the cloud instance is public, private, or
hybrid.</para>
<para>The public security domain is an untrusted area of
the cloud infrastructure. It can refer to the Internet as a
whole or simply to networks over which the user has no
authority. Always consider this domain untrusted.</para>
<para>Typically used for compute instance-to-instance traffic, the
guest security domain handles compute data generated by
instances on the cloud. It does not handle services that support the
operation of the cloud, for example API calls. Public cloud
providers and private cloud providers who do not have
stringent controls on instance use or who allow unrestricted
Internet access to instances should consider this an untrusted domain.
Private cloud providers may want to consider this
an internal network and therefore trusted only if they have
controls in place to assert that they trust instances and all
their tenants.</para>
<para>The management security domain is where services interact.
Sometimes referred to as the "control plane", the networks in
this domain transport confidential data such as configuration
parameters, user names, and passwords. In most deployments this
is a trusted domain.</para>
<para>The data security domain deals with
information pertaining to the storage services within
OpenStack. Much of the data that crosses this network has high
integrity and confidentiality requirements and depending on
the type of deployment there may also be strong availability
requirements. The trust level of this network is heavily
dependent on deployment decisions and as such we do not assign
this a default level of trust.</para>
<para>When deploying OpenStack in an enterprise as a private cloud, you can
generally assume it is behind a firewall and within the trusted
network alongside existing systems. Users of the cloud are
typically employees or trusted individuals that are bound by
the security requirements set forth by the company. This tends
to push most of the security domains towards a more trusted
model. However, when deploying OpenStack in a public-facing
role, you cannot make these assumptions and the number of attack vectors
significantly increases. For example, the API endpoints and the
software behind them become vulnerable to hostile
entities attempting to gain unauthorized access or prevent access
to services. This can result in loss of reputation and you must
protect against it through auditing and appropriate
filtering.</para>
<para>Take care when managing the users of the
system, whether in public or private
clouds. The Identity service allows LDAP to be part of the
authentication process; integrating an OpenStack deployment
with such existing systems may ease user management.</para>
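<para>As an illustrative sketch only, LDAP integration is
configured in the Identity service configuration file; the
directory URL and suffix below are placeholder values, not
recommendations from this guide:</para>
<programlisting>[identity]
driver = keystone.identity.backends.ldap.Identity

[ldap]
# Placeholder values; substitute your own directory details
url = ldap://ldap.example.com
suffix = dc=example,dc=com
user_tree_dn = ou=Users,dc=example,dc=com</programlisting>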
<para>We recommend placing API services behind hardware that
performs SSL termination. API services
transmit user names, passwords, and generated tokens between
client machines and API endpoints and therefore must be
secure.</para>
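<para>As a sketch of one common approach, a load balancer such
as HAProxy can terminate SSL in front of the API services; the
addresses, port, and certificate path below are placeholders:</para>
<programlisting>frontend api-ssl
    # Placeholder certificate path and listen address
    bind 0.0.0.0:5000 ssl crt /etc/haproxy/certs/api.pem
    default_backend api-servers

backend api-servers
    # Placeholder back-end API server addresses
    server api1 192.0.2.10:5000 check
    server api2 192.0.2.11:5000 check</programlisting>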
<para>For more information on OpenStack Security, see
<link xlink:href="http://docs.openstack.org/security-guide/">http://docs.openstack.org/security-guide/</link>
</para>
</section>
<section xml:id="openstack-components-compute-focus">
<title>OpenStack components</title>
<para>Due to the nature of the workloads in this
scenario, a number of components are highly beneficial for
a compute-focused cloud. This includes the typical OpenStack
components:</para>
<itemizedlist>
@ -381,10 +379,10 @@
</listitem>
</itemizedlist>
<para>It is safe to assume that, given the nature of the
applications involved in this scenario, these are heavily
automated deployments. Making use of Orchestration is highly
beneficial in this case. You can script the deployment of a
batch of instances and the running of tests, but it
makes sense to use the Orchestration module
to handle all these actions.</para>
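<para>The actions above can be sketched as an Orchestration
template; the flavor, image name, and instance count shown here
are illustrative assumptions, not recommendations:</para>
<programlisting>heat_template_version: 2013-05-23
description: Deploy a batch of compute workers (illustrative sketch)

resources:
  batch_workers:
    type: OS::Heat::ResourceGroup
    properties:
      count: 10
      resource_def:
        type: OS::Nova::Server
        properties:
          # Placeholder flavor and image names
          flavor: m1.xlarge
          image: compute-worker-image</programlisting>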
<itemizedlist>
@ -392,11 +390,10 @@
<para>Telemetry module (ceilometer)</para>
</listitem>
</itemizedlist>
<para>Telemetry and the alarms it generates support autoscaling
of instances using Orchestration. Users that are not using the
Orchestration module do not need to deploy the Telemetry module and
may choose to use external solutions to fulfill their
metering and monitoring requirements.</para>
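<para>For example, a Telemetry threshold alarm such as the
following can feed scaling decisions; the meter, threshold, and
alarm action are example values only:</para>
<programlisting>$ ceilometer alarm-threshold-create --name cpu-high \
    --meter-name cpu_util --threshold 70.0 \
    --comparison-operator gt --statistic avg \
    --period 600 --evaluation-periods 1 \
    --alarm-action 'log://'</programlisting>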
<para>See also:
<link xlink:href="http://docs.openstack.org/openstack-ops/content/logging_monitoring.html">http://docs.openstack.org/openstack-ops/content/logging_monitoring.html</link></para>
@ -406,12 +403,12 @@
</listitem>
</itemizedlist>
<para>Due to the burst-able nature of the workloads and the
applications and instances that perform batch
processing, this cloud mainly uses memory or CPU, so
the need for add-on storage to each instance is not a likely
requirement. This does not mean that you do not use
OpenStack Block Storage (cinder) in the infrastructure, but
typically it is not a central component.</para>
<itemizedlist>
<listitem>
<para>Networking</para>
@ -419,7 +416,7 @@
</itemizedlist>
<para>When choosing a networking platform, ensure that it either
works with all desired hypervisor and container technologies
and their OpenStack drivers, or that it includes an implementation of
an ML2 mechanism driver. You can mix networking platforms
that provide ML2 mechanism drivers.</para></section>
</section>

View File

@ -10,10 +10,9 @@
xml:id="user-requirements-compute-focus">
<?dbhtml stop-chunking?>
<title>User requirements</title>
<para>High utilization of CPU, RAM, or both defines compute
intensive workloads. User requirements determine the performance
demands for the cloud.
</para>
<variablelist>
<varlistentry>
@ -23,25 +22,23 @@
compute-focused cloud, however some organizations
might be concerned with cost avoidance. Repurposing
existing resources to tackle compute-intensive tasks
instead of acquiring additional resources may
offer cost reduction opportunities.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Time to market</term>
<listitem>
<para>Compute-focused clouds can deliver products more quickly,
for example by speeding up a company's software development
life cycle (SDLC) for building products and applications.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Revenue opportunity</term>
<listitem>
<para>Companies that want to build services or products that
rely on the power of compute resources benefit from a
compute-focused cloud. Examples include the analysis
of large data sets (via Hadoop or Cassandra) or
completing computational intensive tasks such as
@ -71,9 +68,9 @@
jurisdictions.</para>
</listitem>
<listitem>
<para>Data compliance: certain types of information need
to reside in certain locations due to regulatory issues and,
more importantly, cannot reside in other locations
for the same reason.</para>
</listitem>
</itemizedlist>
@ -88,15 +85,14 @@
information.</para></section>
<section xml:id="technical-considerations-compute-focus-user">
<title>Technical considerations</title>
<para>The following are some technical requirements you must consider
in the architecture design:
</para>
<variablelist>
<varlistentry>
<term>Performance</term>
<listitem>
<para>If a primary technical concern is to deliver high performance
capability, then a compute-focused design is an
obvious choice because it is specifically designed to
host compute-intensive workloads.</para>
@ -106,24 +102,23 @@
<term>Workload persistence</term>
<listitem>
<para>Workloads can be either
short-lived or long-running. Short-lived workloads
can include continuous integration and continuous
deployment (CI-CD) jobs, which create large numbers of
compute instances simultaneously to
perform a set of compute-intensive tasks. The environment then
copies the results or artifacts from each instance into
long-term storage before destroying the instance.
Long-running workloads, like a Hadoop or
high-performance computing (HPC) cluster, typically
ingest large data sets, perform the computational work
on those data sets, then push the results into long-term
storage. When the computational work finishes, the instances
remain idle until they receive another job. Environments
for long-running workloads are often larger and more complex,
but you can offset the cost of building them by keeping them
active between jobs. Another example of long-running
workloads is legacy applications that are
persistent over time.</para>
</listitem>
</varlistentry>
@ -132,14 +127,14 @@
<listitem>
<para>Workloads targeted for a compute-focused
OpenStack cloud generally do not require any
persistent block storage, although some uses of
Hadoop with HDFS may require persistent
block storage. A shared filesystem or object store
maintains the initial data sets and serves as the
destination for saving the computational results. By
avoiding the input-output (IO) overhead, you can significantly
enhance workload performance. Depending on
the size of the data sets, it may be necessary to
scale the object store or shared file system to match
the storage demand.</para>
</listitem>
@ -150,7 +145,7 @@
<para>Like any other cloud architecture, a
compute-focused OpenStack cloud requires an on-demand
and self-service user interface. End users must be
able to provision computing power, storage, networks,
and software simply and flexibly. This includes
scaling the infrastructure up to a substantial level
without disrupting host operations.</para>
@ -159,12 +154,12 @@
<varlistentry>
<term>Security</term>
<listitem>
<para>Security is highly dependent
on business requirements. For example, a
computationally intense drug discovery application
has much higher security requirements
than a cloud for processing market
data for a retailer. As a general rule, the security
recommendations and guidelines provided in the
OpenStack Security Guide are applicable.</para>
</listitem>
@ -173,9 +168,8 @@
</section>
<section xml:id="operational-considerations-compute-focus-user">
<title>Operational considerations</title>
<para>From an operational perspective, a compute intensive cloud
is similar to a general-purpose cloud. See the general-purpose
design section for more details on operational requirements.</para>
</section>
</section>