<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section [
<!ENTITY % openstack SYSTEM "../../common/entities/openstack.ent">
%openstack;
]>
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="technical-considerations-compute-focus">
  <?dbhtml stop-chunking?>
  <title>Technical considerations</title>
  <para>In a compute-focused OpenStack cloud, the type of instance
    workloads you provision heavily influences technical decision
    making.</para>
  <para>Public and private clouds require deterministic capacity
    planning to support elastic growth in order to meet user SLA
    expectations. Deterministic capacity planning is the path to
    predicting the effort and expense of making a given process
    perform consistently. This process is important because, when a
    service becomes a critical part of a user's infrastructure, the
    user's experience links directly to the SLAs of the cloud
    itself.</para>
  <para>There are two aspects of capacity planning to consider:</para>
  <itemizedlist>
    <listitem>
      <para>Planning the initial deployment footprint</para>
    </listitem>
    <listitem>
      <para>Planning expansion of the environment to stay ahead of
        the demands of cloud users</para>
    </listitem>
  </itemizedlist>
  <para>Begin planning an initial OpenStack deployment footprint with
    estimations of expected uptake and existing infrastructure
    workloads.</para>
  <para>The starting point is the core count of the cloud. By
    applying relevant ratios, the user can gather information
    about:</para>
  <itemizedlist>
    <listitem>
      <para>The number of expected concurrent instances:
        (overcommit fraction × cores) / virtual cores per
        instance</para>
    </listitem>
    <listitem>
      <para>Required storage: flavor disk size × number of
        instances</para>
    </listitem>
  </itemizedlist>
  <para>These ratios determine the amount of additional
    infrastructure needed to support the cloud. For example, consider
    a situation in which you require 1600 instances, each with 2 vCPU
    and 50 GB of storage. Assuming the default overcommit ratio of
    16:1, the math works out as follows:</para>
  <itemizedlist>
    <listitem>
      <para>1600 = (16 × (number of physical cores)) / 2</para>
    </listitem>
    <listitem>
      <para>Storage required = 50 GB × 1600</para>
    </listitem>
  </itemizedlist>
  <para>On the surface, the equations reveal the need for 200
    physical cores and 80 TB of storage for
    <filename>/var/lib/nova/instances/</filename>. However, it is
    also important to look at patterns of usage to estimate the load
    that the API services, database servers, and queue servers are
    likely to encounter.</para>
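  <para>For illustration only, the arithmetic above can be reproduced
    with a short Python sketch. The variable names are arbitrary and
    the values are the ones from the example, not OpenStack defaults
    apart from the 16:1 CPU overcommit ratio:</para>
  <programlisting language="python"><![CDATA[
# Illustrative capacity estimate using the figures from the example above.
cpu_overcommit = 16           # default CPU allocation ratio (16:1)
vcpus_per_instance = 2        # flavor vCPUs in the example
disk_gb_per_instance = 50     # flavor disk in the example
required_instances = 1600

# instances = (overcommit * physical cores) / vCPUs per instance,
# rearranged to solve for the physical core count
physical_cores = required_instances * vcpus_per_instance / cpu_overcommit
storage_gb = required_instances * disk_gb_per_instance

print(f"physical cores needed: {physical_cores:.0f}")           # 200
print(f"ephemeral storage needed: {storage_gb / 1000:.0f} TB")  # 80
]]></programlisting>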
  <para>Aside from the creation and termination of instances,
    consider the impact of users accessing the service, particularly
    on nova-api and its associated database. Listing instances
    gathers a great deal of information and, given the frequency with
    which users run this operation, a cloud with a large number of
    users can increase the load significantly. This can even occur
    unintentionally. For example, the OpenStack Dashboard instances
    tab refreshes the list of instances every 30 seconds, so leaving
    it open in a browser window can cause unexpected load.</para>
  <para>Consideration of these factors can help determine how many
    cloud controller cores you require. A server with 8 CPU cores and
    8 GB of RAM would be sufficient for a rack of compute nodes,
    given the above caveats.</para>
  <para>Key hardware specifications are also crucial to the
    performance of user instances. Be sure to consider budget and
    performance needs, including storage performance (spindles/core),
    memory availability (RAM/core), network bandwidth (Gbps/core),
    and overall CPU performance (CPU/core).</para>
  <para>The cloud resource calculator is a useful tool for examining
    the impact of different hardware and instance load-outs. See:
    <link xlink:href="https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods">https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods</link>
  </para>
<section xml:id="expansion-planning-compute-focus">
|
||
<title>Expansion planning</title>
|
||
<para>A key challenge for planning the expansion of cloud
|
||
compute services is the elastic nature of cloud infrastructure
|
||
demands.</para>
|
||
<para>Planning for expansion is a balancing act.
|
||
Planning too conservatively can lead to unexpected
|
||
oversubscription of the cloud and dissatisfied users. Planning
|
||
for cloud expansion too aggressively can lead to unexpected
|
||
underutilization of the cloud and funds spent unnecessarily on operating
|
||
infrastructure.</para>
|
||
<para>The key is to carefully monitor the trends in
|
||
cloud usage over time. The intent is to measure the
|
||
consistency with which you deliver services, not the
|
||
average speed or capacity of the cloud. Using this information
|
||
to model capacity performance enables users to more
|
||
accurately determine the current and future capacity of the
|
||
cloud.</para>
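    <para>As a purely hypothetical sketch of trend-based planning,
      the following Python example fits a linear trend to periodic
      usage samples and projects when current capacity would be
      exhausted; the sample data and the capacity figure are invented
      for illustration:</para>
    <programlisting language="python"><![CDATA[
# Hypothetical example: fit a linear trend to periodic vCPU-usage samples
# and estimate when the cloud reaches its currently schedulable capacity.
# The samples and the capacity figure below are made-up illustration data.
samples = [(0, 900), (30, 1050), (60, 1230), (90, 1380)]  # (day, vCPUs in use)
capacity_vcpus = 3200                                     # schedulable vCPUs today

n = len(samples)
mean_x = sum(d for d, _ in samples) / n
mean_y = sum(u for _, u in samples) / n
slope = (sum((d - mean_x) * (u - mean_y) for d, u in samples)
         / sum((d - mean_x) ** 2 for d, _ in samples))
intercept = mean_y - slope * mean_x

days_until_full = (capacity_vcpus - intercept) / slope
print(f"growth: {slope:.1f} vCPUs/day, "
      f"capacity reached in roughly {days_until_full:.0f} days")
]]></programlisting>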
  </section>

  <section xml:id="cpu-and-ram-compute-focus">
    <title>CPU and RAM</title>
    <para>OpenStack enables users to overcommit CPU and RAM on
      compute nodes. This allows an increase in the number of
      instances running on the cloud at the cost of reducing the
      performance of the instances. OpenStack Compute uses the
      following ratios by default:</para>
    <itemizedlist>
      <listitem>
        <para>CPU allocation ratio: 16:1</para>
      </listitem>
      <listitem>
        <para>RAM allocation ratio: 1.5:1</para>
      </listitem>
    </itemizedlist>
    <para>The default CPU allocation ratio of 16:1 means that the
      scheduler allocates up to 16 virtual cores per physical core.
      For example, if a physical node has 12 cores, the scheduler
      sees 192 available virtual cores. With typical flavor
      definitions of 4 virtual cores per instance, this ratio would
      provide 48 instances on a physical node.</para>
    <para>Similarly, the default RAM allocation ratio of 1.5:1 means
      that the scheduler allocates instances to a physical node as
      long as the total amount of RAM associated with the instances
      is less than 1.5 times the amount of RAM available on the
      physical node.</para>
    <para>You must select the appropriate CPU and RAM allocation
      ratio based on particular use cases.</para>
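    <para>The scheduler arithmetic described above, which corresponds
      to the <option>cpu_allocation_ratio</option> and
      <option>ram_allocation_ratio</option> settings in
      <filename>nova.conf</filename>, can be sketched as follows; the
      node and flavor sizes are example values, not
      recommendations:</para>
    <programlisting language="python"><![CDATA[
# Sketch of the scheduler's view of one compute node under the default
# allocation ratios. The node and flavor sizes are example values only.
cpu_allocation_ratio = 16.0
ram_allocation_ratio = 1.5

physical_cores = 12
physical_ram_mb = 96 * 1024

flavor_vcpus = 4
flavor_ram_mb = 8 * 1024

schedulable_vcpus = physical_cores * cpu_allocation_ratio      # 192 virtual cores
schedulable_ram_mb = physical_ram_mb * ram_allocation_ratio

instances_by_cpu = int(schedulable_vcpus // flavor_vcpus)      # 48 instances
instances_by_ram = int(schedulable_ram_mb // flavor_ram_mb)    # 18 instances

# The node fills up on whichever resource is exhausted first.
print(min(instances_by_cpu, instances_by_ram))
]]></programlisting>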
  </section>

  <section xml:id="additional-hardware-compute-focus">
    <title>Additional hardware</title>
    <para>Certain use cases may benefit from exposure to additional
      devices on the compute node. Examples might include:</para>
    <itemizedlist>
      <listitem>
        <para>High performance computing jobs that benefit from the
          availability of graphics processing units (GPUs) for
          general-purpose computing.</para>
      </listitem>
      <listitem>
        <para>Cryptographic routines that benefit from the
          availability of hardware random number generators to avoid
          entropy starvation.</para>
      </listitem>
      <listitem>
        <para>Database management systems that benefit from the
          availability of SSDs for ephemeral storage to maximize
          read/write time.</para>
      </listitem>
    </itemizedlist>
    <para>Host aggregates group hosts that share similar
      characteristics, which can include hardware similarities. The
      addition of specialized hardware to a cloud deployment is
      likely to add to the cost of each node, so consider carefully
      whether all compute nodes, or just a subset targeted by
      flavors, need the additional customization to support the
      desired workloads.</para>
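    <para>As a simplified, hypothetical illustration of how a flavor
      can target such a subset of hosts through host aggregate
      metadata, consider the following Python sketch. It mimics the
      idea behind aggregate-based scheduling rather than the actual
      scheduler filter code, and the aggregate names and metadata key
      are invented:</para>
    <programlisting language="python"><![CDATA[
# Hypothetical illustration: match a flavor's extra specs against host
# aggregate metadata to find the hosts that can serve it. This mimics the
# concept of aggregate-based scheduling, not the real filter implementation.
aggregates = {
    "gpu-nodes": {"hosts": ["compute-10", "compute-11"],
                  "metadata": {"special_hardware": "gpu"}},
    "general":   {"hosts": ["compute-01", "compute-02"],
                  "metadata": {}},
}

# Invented flavor extra spec requesting GPU-capable hosts.
flavor_extra_specs = {"special_hardware": "gpu"}

def candidate_hosts(extra_specs, aggregates):
    """Return hosts whose aggregate metadata satisfies every extra spec."""
    hosts = []
    for aggregate in aggregates.values():
        if all(aggregate["metadata"].get(key) == value
               for key, value in extra_specs.items()):
            hosts.extend(aggregate["hosts"])
    return hosts

print(candidate_hosts(flavor_extra_specs, aggregates))  # ['compute-10', 'compute-11']
]]></programlisting>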
  </section>

  <section xml:id="utilization">
    <title>Utilization</title>
    <para>Infrastructure-as-a-Service offerings, including OpenStack,
      use flavors to provide standardized views of virtual machine
      resource requirements that simplify the problem of scheduling
      instances while making the best use of the available physical
      resources.</para>
    <para>In order to facilitate packing of virtual machines onto
      physical hosts, the default selection of flavors provides a
      second largest flavor that is half the size of the largest
      flavor in every dimension. It has half the vCPUs, half the
      vRAM, and half the ephemeral disk space. The next largest
      flavor is half that size again. The following figure provides a
      visual representation of this concept for a general purpose
      computing design:
      <mediaobject>
        <imageobject>
          <imagedata contentwidth="4in"
            fileref="../figures/Compute_Tech_Bin_Packing_General1.png"
            />
        </imageobject>
      </mediaobject></para>
    <para>The following figure displays a CPU-optimized, packed
      server:
      <mediaobject>
        <imageobject>
          <imagedata contentwidth="4in"
            fileref="../figures/Compute_Tech_Bin_Packing_CPU_optimized1.png"
            />
        </imageobject>
      </mediaobject></para>
    <para>These default flavors are well suited to typical
      configurations of commodity server hardware. To maximize
      utilization, however, it may be necessary to customize the
      flavors or create new ones in order to better align instance
      sizes to the available hardware.</para>
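    <para>The halving pattern can be expressed directly when planning
      a custom flavor ladder, as in the following sketch; the largest
      flavor used as the starting point is an assumption chosen for
      the example, not an OpenStack default:</para>
    <programlisting language="python"><![CDATA[
# Sketch of the halving pattern: each flavor is half the size of the next
# larger one in every dimension. The starting (largest) flavor is an
# example chosen to match a hypothetical compute node, not a default.
largest = {"vcpus": 16, "ram_mb": 64 * 1024, "disk_gb": 320}

ladder = [largest]
for _ in range(3):
    previous = ladder[-1]
    ladder.append({key: value // 2 for key, value in previous.items()})

for flavor in ladder:
    print(flavor)
# Prints four sizes, from 16 vCPU / 64 GB RAM / 320 GB disk
# down to 2 vCPU / 8 GB RAM / 40 GB disk.
]]></programlisting>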
    <para>Workload characteristics may also influence hardware
      choices and flavor configuration, particularly where they
      present different ratios of CPU versus RAM versus HDD
      requirements.</para>
    <para>For more information on flavors, see the
      <link xlink:href="http://docs.openstack.org/openstack-ops/content/flavors.html">OpenStack Operations Guide: Flavors</link>.</para>
  </section>
  <section xml:id="openstack-components-compute-focus">
    <title>OpenStack components</title>
    <para>Due to the nature of the workloads in this scenario, a
      number of components are highly beneficial for a
      compute-focused cloud. These include the typical OpenStack
      components:</para>
    <itemizedlist>
      <listitem>
        <para>OpenStack Compute (nova)</para>
      </listitem>
      <listitem>
        <para>OpenStack Image service (glance)</para>
      </listitem>
      <listitem>
        <para>OpenStack Identity (keystone)</para>
      </listitem>
    </itemizedlist>
    <para>Also consider several specialized components:</para>
    <itemizedlist>
      <listitem>
        <para><glossterm>Orchestration</glossterm> module
          (<glossterm>heat</glossterm>)</para>
        <para>The applications involved in this scenario are
          typically deployed in a heavily automated fashion, which
          makes Orchestration highly beneficial. Although you can
          script the deployment of a batch of instances and the
          running of tests, it makes more sense to use the
          Orchestration module to handle these actions.</para>
      </listitem>
      <listitem>
        <para>Telemetry module (ceilometer)</para>
        <para>Telemetry and the alarms it generates support
          autoscaling of instances using Orchestration. Users that
          are not using the Orchestration module do not need to
          deploy the Telemetry module and may choose to use external
          solutions to fulfill their metering and monitoring
          requirements.</para>
      </listitem>
      <listitem>
        <para>OpenStack Block Storage (cinder)</para>
        <para>Because the workloads are burstable and the
          applications and instances perform batch processing, this
          cloud mainly consumes memory and CPU, so add-on storage for
          each instance is unlikely to be a requirement. This does
          not mean that you do not use OpenStack Block Storage
          (cinder) in the infrastructure, but typically it is not a
          central component.</para>
      </listitem>
      <listitem>
        <para>Networking</para>
        <para>When choosing a networking platform, ensure that it
          either works with all desired hypervisor and container
          technologies and their OpenStack drivers, or that it
          includes an implementation of an ML2 mechanism driver. You
          can mix networking platforms that provide ML2 mechanism
          drivers.</para>
      </listitem>
    </itemizedlist>
  </section>
</section>