[arch-design] Update structure of compute section to match networking and storage

1. Added new directory for compute concepts
2. Populated sections with existing content
3. Removed old compute-design-tech section
4. Added landing page for compute to mimic networking and storage

Change-Id: I9633b1d8bd30194026fcaf71b6335fac54e946d2
Implements: blueprint arch-guide-restructure
parent 8deae1289a
commit 374d0fa214

@@ -1,36 +0,0 @@

=============================
Compute node technical detail
=============================

This chapter describes the technical details that should be explored when
architecting OpenStack compute nodes.

Compute node design overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Hardware selection
~~~~~~~~~~~~~~~~~~

Storage selection
~~~~~~~~~~~~~~~~~

Local storage
-------------

Remote storage
--------------

Networking
~~~~~~~~~~

Hardware
--------

Firmware
--------

Special features
----------------

High availability
-----------------

@@ -5,624 +5,15 @@ Compute nodes

 .. toctree::
    :maxdepth: 3

-   design-compute-tech
+   design-compute/design-compute-concepts
+   design-compute/design-compute-cpu
+   design-compute/design-compute-hypervisor
+   design-compute/design-compute-hardware
+   design-compute/design-compute-overcommit
+   design-compute/design-compute-storage
+   design-compute/design-compute-networking

-This chapter describes some of the choices you need to consider
+This section describes some of the choices you need to consider
 when designing and building your compute nodes. Compute nodes form the
 resource core of the OpenStack Compute cloud, providing the processing, memory,
 network and storage resources to run instances.

Overview
~~~~~~~~

When designing compute resource pools, consider the number of processors,
amount of memory, and the quantity of storage required for each hypervisor.

Determine whether compute resources will be provided in a single pool or in
multiple pools. In most cases, multiple pools of resources can be allocated
and addressed on demand, commonly referred to as bin packing.

In a bin packing design, each independent resource pool provides service
for specific flavors. Since instances are scheduled onto compute hypervisors,
each independent node's resources will be allocated to efficiently use the
available hardware. Bin packing also requires a common hardware design,
with all hardware nodes within a compute resource pool sharing a common
processor, memory, and storage layout. This makes it easier to deploy,
support, and maintain nodes throughout their lifecycle.

Increasing the size of the supporting compute environment increases the
network traffic and messages, adding load to the controller or
networking nodes. Effective monitoring of the environment will help with
capacity decisions on scaling.

Compute nodes automatically attach to OpenStack clouds, resulting in a
horizontally scaling process when adding extra compute capacity to an
OpenStack cloud. Additional processes are required to place nodes into
appropriate availability zones and host aggregates. When adding
additional compute nodes to environments, ensure identical or functionally
compatible CPUs are used, otherwise live migration features will break.
It is necessary to add rack capacity or network switches as scaling out
compute hosts directly affects data center resources.

Compute host components can also be upgraded to account for increases in
demand, known as vertical scaling. Upgrading CPUs with more
cores, or increasing the overall server memory, can add extra needed
capacity depending on whether the running applications are more CPU
intensive or memory intensive.

When selecting a processor, compare features and performance
characteristics. Some processors include features specific to
virtualized compute hosts, such as hardware-assisted virtualization, and
technology related to memory paging (also known as EPT shadowing). These
types of features can have a significant impact on the performance of
your virtual machine.

The number of processor cores and threads impacts the number of worker
threads which can be run on a resource node. Design decisions must
relate directly to the service being run on it, as well as provide a
balanced infrastructure for all services.

Another option is to assess the average workloads and increase the
number of instances that can run within the compute environment by
adjusting the overcommit ratio. This ratio is configurable for CPU and
memory. The default CPU overcommit ratio is 16:1, and the default memory
overcommit ratio is 1.5:1. Determining the tuning of the overcommit
ratios during the design phase is important as it has a direct impact on
the hardware layout of your compute nodes.

.. note::

   Changing the CPU overcommit ratio can have a detrimental effect
   and cause a potential increase in a noisy neighbor.

Insufficient disk capacity could also have a negative effect on overall
performance including CPU and memory usage. Depending on the back-end
architecture of the OpenStack Block Storage layer, capacity includes
adding disk shelves to enterprise storage systems or installing
additional Block Storage nodes. Upgrading directly attached storage
installed in Compute hosts, and adding capacity to the shared storage
for additional ephemeral storage to instances, may be necessary.

Consider the Compute requirements of non-hypervisor nodes (also referred to as
resource nodes). This includes controller, Object Storage nodes, Block Storage
nodes, and networking services.

The ability to add Compute resource pools for unpredictable workloads should
be considered. In some cases, the demand for certain instance types or flavors
may not justify individual hardware design. Allocate hardware designs that are
capable of servicing the most common instance requests. Adding hardware to the
overall architecture can be done later.
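
The sizing trade-offs above can be explored with a small calculation. The
following sketch is a minimal, hypothetical illustration (the flavor names and
node specifications are made up, and the default 16:1 CPU and 1.5:1 RAM
overcommit ratios quoted above are assumed); it estimates how many instances of
each flavor a single node in a common-hardware resource pool can hold, in the
spirit of the bin packing approach described above.

.. code-block:: python

   # Sketch: estimate per-node instance capacity for a homogeneous resource pool.
   # Assumes the default overcommit ratios quoted above (CPU 16:1, RAM 1.5:1).

   CPU_ALLOCATION_RATIO = 16.0
   RAM_ALLOCATION_RATIO = 1.5

   # Hypothetical node in the pool: 2 sockets x 12 cores, 256 GB RAM.
   node_cores = 24
   node_ram_gb = 256

   # Hypothetical flavors served by this pool.
   flavors = {
       "m1.medium": {"vcpus": 2, "ram_gb": 4},
       "m1.xlarge": {"vcpus": 8, "ram_gb": 16},
   }

   for name, flavor in flavors.items():
       by_cpu = (node_cores * CPU_ALLOCATION_RATIO) // flavor["vcpus"]
       by_ram = (node_ram_gb * RAM_ALLOCATION_RATIO) // flavor["ram_gb"]
       # The scarcer resource determines how many instances fit on one node.
       print(f"{name}: {int(min(by_cpu, by_ram))} instances per node "
             f"(CPU allows {int(by_cpu)}, RAM allows {int(by_ram)})")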

Choosing a CPU
~~~~~~~~~~~~~~

The type of CPU in your compute node is a very important choice. First,
ensure that the CPU supports virtualization by way of *VT-x* for Intel
chips and *AMD-v* for AMD chips.

.. tip::

   Consult the vendor documentation to check for virtualization
   support. For Intel, read `“Does my processor support Intel® Virtualization
   Technology?” <http://www.intel.com/support/processors/sb/cs-030729.htm>`_.
   For AMD, read `AMD Virtualization
   <http://www.amd.com/en-us/innovations/software-technologies/server-solution/virtualization>`_.
   Note that your CPU may support virtualization but it may be
   disabled. Consult your BIOS documentation for how to enable CPU
   features.

The number of cores that the CPU has also affects the decision. It's
common for current CPUs to have up to 12 cores. Additionally, if an
Intel CPU supports hyperthreading, those 12 cores are doubled to 24
cores. If you purchase a server that supports multiple CPUs, the number
of cores is further multiplied.

.. note::

   **Multithread Considerations**

   Hyper-Threading is Intel's proprietary simultaneous multithreading
   implementation used to improve parallelization on their CPUs. You might
   consider enabling Hyper-Threading to improve the performance of
   multithreaded applications.

   Whether you should enable Hyper-Threading on your CPUs depends upon your
   use case. For example, disabling Hyper-Threading can be beneficial in
   intense computing environments. We recommend that you do performance
   testing with your local workload with both Hyper-Threading on and off to
   determine what is more appropriate in your case.
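
On an existing Linux host you can verify both points above (hardware
virtualization support and the core and thread count) before committing to a
hardware design. The following sketch only reads ``/proc/cpuinfo``; it is an
illustration, not an official OpenStack tool.

.. code-block:: python

   # Sketch: inspect /proc/cpuinfo for virtualization flags and thread count.
   # Works on Linux only; run it on a candidate compute node.

   def cpu_summary(path="/proc/cpuinfo"):
       flags = set()
       logical_cpus = 0
       with open(path) as cpuinfo:
           for line in cpuinfo:
               if line.startswith("processor"):
                   logical_cpus += 1
               elif line.startswith("flags"):
                   flags.update(line.split(":", 1)[1].split())
       return {
           "logical_cpus": logical_cpus,
           "vt_x": "vmx" in flags,      # Intel VT-x
           "amd_v": "svm" in flags,     # AMD-V
           "hyperthreading": "ht" in flags,
       }

   if __name__ == "__main__":
       summary = cpu_summary()
       print(summary)
       if not (summary["vt_x"] or summary["amd_v"]):
           print("Hardware virtualization is unsupported or disabled in the BIOS.")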

Choosing a hypervisor
~~~~~~~~~~~~~~~~~~~~~

A hypervisor provides software to manage virtual machine access to the
underlying hardware. The hypervisor creates, manages, and monitors
virtual machines. OpenStack Compute supports many hypervisors to various
degrees, including:

* `KVM <http://www.linux-kvm.org/page/Main_Page>`_
* `LXC <https://linuxcontainers.org/>`_
* `QEMU <http://wiki.qemu.org/Main_Page>`_
* `VMware ESX/ESXi <https://www.vmware.com/support/vsphere-hypervisor>`_
* `Xen <http://www.xenproject.org/>`_
* `Hyper-V <http://technet.microsoft.com/en-us/library/hh831531.aspx>`_
* `Docker <https://www.docker.com/>`_

Probably the most important factor in your choice of hypervisor is your
current usage or experience. Aside from that, there are practical
concerns to do with feature parity, documentation, and the level of
community experience.

For example, KVM is the most widely adopted hypervisor in the OpenStack
community. Besides KVM, more deployments run Xen, LXC, VMware, and
Hyper-V than the others listed. However, each of these is lacking some
feature support or the documentation on how to use them with OpenStack
is out of date.

The best information available to support your choice is found on the
`Hypervisor Support Matrix
<http://docs.openstack.org/developer/nova/support-matrix.html>`_
and in the `configuration reference
<http://docs.openstack.org/mitaka/config-reference/compute/hypervisors.html>`_.

.. note::

   It is also possible to run multiple hypervisors in a single
   deployment using host aggregates or cells. However, an individual
   compute node can run only a single hypervisor at a time.
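
If you are evaluating a KVM or QEMU based deployment, the libvirt Python
bindings give a quick view of what a candidate host can actually run. This is a
minimal sketch that assumes the ``libvirt-python`` package is installed and a
local libvirt daemon is running; it simply lists the guest architectures and
domain types the host advertises.

.. code-block:: python

   # Sketch: query a libvirt host for its supported guest types (KVM/QEMU hosts).
   # Assumes the libvirt-python bindings and a running libvirtd on this machine.
   import xml.etree.ElementTree as ET

   import libvirt  # pip install libvirt-python

   conn = libvirt.openReadOnly("qemu:///system")  # read-only connection to the local hypervisor
   try:
       capabilities = ET.fromstring(conn.getCapabilities())
       for guest in capabilities.findall("guest"):
           os_type = guest.findtext("os_type")        # e.g. hvm
           arch = guest.find("arch").get("name")      # e.g. x86_64
           domains = [d.get("type") for d in guest.findall(".//domain")]
           print(f"{arch} ({os_type}): {', '.join(domains)}")  # e.g. qemu, kvm
   finally:
       conn.close()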

Choosing server hardware
~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following in selecting server hardware form factor suited for
your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers have the ability to offer greater server density
  than a blade server solution, but are often limited to dual-socket,
  multi-core CPU configurations. It is possible to place forty 1U servers
  in a rack, providing space for the top of rack (ToR) switches, compared
  to 32 full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other hardware considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
   More hosts are required to support the anticipated scale
   if the design architecture uses dual-socket hardware designs.

   For a general purpose OpenStack cloud, sizing is an important consideration.
   The expected or anticipated number of instances that each hypervisor can
   host is a common meter used in sizing the deployment. The selected server
   hardware needs to support the expected or anticipated instance density.

Host density
   Another option to address the higher host count is to use a
   quad-socket platform. Taking this approach decreases host density
   which also increases rack count. This configuration affects the
   number of power connections and also impacts network and cooling
   requirements.

   Physical data centers have limited physical space, power, and
   cooling. The number of hosts (or hypervisors) that can be fitted
   into a given metric (rack, rack unit, or floor tile) is another
   important method of sizing. Floor weight is an often overlooked
   consideration. The data center floor must be able to support the
   weight of the proposed number of hosts within a rack or set of
   racks. These factors need to be applied as part of the host density
   calculation and server hardware selection.

Power and cooling density
   The power and cooling density requirements might be lower than with
   blade, sled, or 1U server designs due to lower host density (by
   using 2U, 3U or even 4U server designs). For data centers with older
   infrastructure, this might be a desirable feature.

   Data centers have a specified amount of power fed to a given rack or
   set of racks. Older data centers may have a power density as low as
   20 amps per rack, while more recent data centers can be architected
   to support power densities as high as 120 amps per rack. The selected
   server hardware must take power density into account.

Network connectivity
   The selected server hardware must have the appropriate number of
   network connections, as well as the right type of network
   connections, in order to support the proposed architecture. Ensure
   that, at a minimum, there are at least two diverse network
   connections coming into each rack.

The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly, the
network architecture impacts the server hardware selection and vice
versa.
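
The density, power, and sizing factors above interact, and it can help to put
rough numbers on them early. The following sketch is a back-of-the-envelope
calculation with hypothetical values (the form factors, power figures, and
instance density are illustrative, not recommendations); it compares how many
hypervisors and instances fit in a rack under a given power budget.

.. code-block:: python

   # Sketch: rough rack-level sizing for a few hypothetical form factors.
   # All numbers are illustrative; substitute figures from your own data center.

   RACK_UNITS = 42
   TOR_SWITCH_UNITS = 2           # reserve space for top-of-rack switches
   RACK_POWER_AMPS = 30           # power fed to the rack; older facilities may allow less
   INSTANCES_PER_HYPERVISOR = 40  # anticipated instance density per host

   form_factors = {
       # name: (rack units per server, amps drawn per server)
       "1U dual-socket": (1.0, 0.8),
       "2U quad-socket": (2.0, 1.5),
       "sled (4 nodes in 2U)": (0.5, 0.7),
   }

   usable_units = RACK_UNITS - TOR_SWITCH_UNITS
   for name, (units, amps) in form_factors.items():
       by_space = int(usable_units / units)
       by_power = int(RACK_POWER_AMPS / amps)
       hosts = min(by_space, by_power)
       limit = "power" if by_power < by_space else "space"
       print(f"{name}: {hosts} hosts per rack ({limit}-limited), "
             f"~{hosts * INSTANCES_PER_HYPERVISOR} instances")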

Instance Storage Solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~

As part of the procurement for a compute cluster, you must specify some
storage for the disk on which the instantiated instance runs. There are
three main approaches to providing this temporary-style storage, and it
is important to understand the implications of the choice.

They are:

* Off compute node storage—shared file system
* On compute node storage—shared file system
* On compute node storage—nonshared file system

In general, the questions you should ask when selecting storage are as
follows:

* What is the platter count you can achieve?
* Do more spindles result in better I/O despite network access?
* Which one results in the best cost-performance scenario you are aiming for?
* How do you manage the storage operationally?

Many operators use separate compute and storage hosts. Compute services
and storage services have different requirements, and compute hosts
typically require more CPU and RAM than storage hosts. Therefore, for a
fixed budget, it makes sense to have different configurations for your
compute nodes and your storage nodes. Compute nodes will be invested in
CPU and RAM, and storage nodes will be invested in block storage.

However, if you are more restricted in the number of physical hosts you
have available for creating your cloud and you want to be able to
dedicate as many of your hosts as possible to running instances, it
makes sense to run compute and storage on the same machines.

The three main approaches to instance storage are provided in the next
few sections.

Off Compute Node Storage—Shared File System
-------------------------------------------

In this option, the disks storing the running instances are hosted in
servers outside of the compute nodes.

If you use separate compute and storage hosts, you can treat your
compute hosts as "stateless." As long as you don't have any instances
currently running on a compute host, you can take it offline or wipe it
completely without having any effect on the rest of your cloud. This
simplifies maintenance for the compute hosts.

There are several advantages to this approach:

* If a compute node fails, instances are usually easily recoverable.
* Running a dedicated storage system can be operationally simpler.
* You can scale to any number of spindles.
* It may be possible to share the external storage for other purposes.

The main downsides to this approach are:

* Depending on design, heavy I/O usage from some instances can affect
  unrelated instances.
* Use of the network can decrease performance.

On Compute Node Storage—Shared File System
------------------------------------------

In this option, each compute node is specified with a significant amount
of disk space, but a distributed file system ties the disks from each
compute node into a single mount.

The main advantage of this option is that it scales to external storage
when you require additional storage.

However, this option has several downsides:

* Running a distributed file system can make you lose your data
  locality compared with nonshared storage.
* Recovery of instances is complicated by depending on multiple hosts.
* The chassis size of the compute node can limit the number of spindles
  able to be used in a compute node.
* Use of the network can decrease performance.

On Compute Node Storage—Nonshared File System
---------------------------------------------

In this option, each compute node is specified with enough disks to
store the instances it hosts.

There are two main reasons why this is a good idea:

* Heavy I/O usage on one compute node does not affect instances on
  other compute nodes.
* Direct I/O access can increase performance.

This has several downsides:

* If a compute node fails, the instances running on that node are lost.
* The chassis size of the compute node can limit the number of spindles
  able to be used in a compute node.
* Migrations of instances from one node to another are more complicated
  and rely on features that may not continue to be developed.
* If additional storage is required, this option does not scale.

Running a shared file system on a storage system apart from the compute
nodes is ideal for clouds where reliability and scalability are the most
important factors. Running a shared file system on the compute nodes
themselves may be best in a scenario where you have to deploy to
preexisting servers for which you have little to no control over their
specifications. Running a nonshared file system on the compute nodes
themselves is a good option for clouds with high I/O requirements and
low concern for reliability.

Issues with Live Migration
--------------------------

Live migration is an integral part of the operations of the
cloud. This feature provides the ability to seamlessly move instances
from one physical host to another, a necessity for performing upgrades
that require reboots of the compute hosts, but only works well with
shared storage.

Live migration can also be done with nonshared storage, using a feature
known as *KVM live block migration*. While an earlier implementation of
block-based migration in KVM and QEMU was considered unreliable, there
is a newer, more reliable implementation of block-based live migration
as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.

Choice of File System
---------------------

If you want to support shared-storage live migration, you need to
configure a distributed file system.

Possible options include:

* NFS (default for Linux)
* GlusterFS
* MooseFS
* Lustre

We recommend that you choose the option operators are most familiar with.
NFS is the easiest to set up and there is extensive community knowledge
about it.
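
Whether shared-storage live migration will work as expected depends on whether
the instances directory really sits on shared storage. The short sketch below
checks this on a compute host; it assumes the commonly used default Nova
instances path of ``/var/lib/nova/instances`` (your deployment may use a
different path) and simply inspects the mount table.

.. code-block:: python

   # Sketch: check whether the instances directory sits on a network file system.
   # Assumes the common default path /var/lib/nova/instances; adjust as needed.
   import os

   INSTANCES_PATH = "/var/lib/nova/instances"
   NETWORK_FS = {"nfs", "nfs4", "glusterfs", "fuse.glusterfs", "moosefs", "lustre"}

   def mount_for(path):
       """Return (mount_point, fs_type) of the mount that contains ``path``."""
       best = ("/", "unknown")
       with open("/proc/mounts") as mounts:
           for line in mounts:
               _device, mount_point, fs_type = line.split()[:3]
               if path.startswith(mount_point) and len(mount_point) >= len(best[0]):
                   best = (mount_point, fs_type)
       return best

   mount_point, fs_type = mount_for(os.path.realpath(INSTANCES_PATH))
   shared = fs_type in NETWORK_FS
   print(f"{INSTANCES_PATH} is on {fs_type} (mounted at {mount_point})")
   print("Shared-storage live migration looks", "possible." if shared else
         "unlikely; consider block migration instead.")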

Overcommitting
~~~~~~~~~~~~~~

OpenStack allows you to overcommit CPU and RAM on compute nodes. This
allows you to increase the number of instances you can have running on
your cloud, at the cost of reducing the performance of the instances.
OpenStack Compute uses the following ratios by default:

* CPU allocation ratio: 16:1
* RAM allocation ratio: 1.5:1

The default CPU allocation ratio of 16:1 means that the scheduler
allocates up to 16 virtual cores per physical core. For example, if a
physical node has 12 cores, the scheduler sees 192 available virtual
cores. With typical flavor definitions of 4 virtual cores per instance,
this ratio would provide 48 instances on a physical node.

The formula for the number of virtual instances on a compute node is
``(OR*PC)/VC``, where:

OR
   CPU overcommit ratio (virtual cores per physical core)

PC
   Number of physical cores

VC
   Number of virtual cores per instance

Similarly, the default RAM allocation ratio of 1.5:1 means that the
scheduler allocates instances to a physical node as long as the total
amount of RAM associated with the instances is less than 1.5 times the
amount of RAM available on the physical node.

For example, if a physical node has 48 GB of RAM, the scheduler
allocates instances to that node until the sum of the RAM associated
with the instances reaches 72 GB (such as nine instances, in the case
where each instance has 8 GB of RAM).

.. note::

   Regardless of the overcommit ratio, an instance cannot be placed
   on any physical node with fewer raw (pre-overcommit) resources than
   the instance flavor requires.

You must select the appropriate CPU and RAM allocation ratio for your
particular use case.

Logging
~~~~~~~

Logging is described in more detail in `Logging and Monitoring
<http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`_. However,
it is an important design consideration to take into account before
commencing operations of your cloud.

OpenStack produces a great deal of useful logging information. However,
for the information to be useful for operations purposes, you should
consider having a central logging server to send logs to, and a log
parsing/analysis system (such as logstash).

Networking
~~~~~~~~~~

Networking in OpenStack is a complex, multifaceted challenge. See
:doc:`design-networking`.

Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following factors when selecting compute (server) hardware:

* Server density
  A measure of how many servers can fit into a given measure of
  physical space, such as a rack unit [U].

* Resource capacity
  The number of CPU cores, how much RAM, or how much storage a given
  server delivers.

* Expandability
  The number of additional resources you can add to a server before it
  reaches capacity.

* Cost
  The relative cost of the hardware weighed against the level of
  design effort needed to build the system.

Weigh these considerations against each other to determine the best
design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing resource
capacity and expandability can increase cost but decrease server density.
Decreasing cost often means decreasing supportability, server density,
resource capacity, and expandability.

Compute capacity (CPU cores and RAM capacity) is a secondary
consideration for selecting server hardware. The required
server hardware must supply adequate CPU sockets, additional CPU cores,
and more RAM; network connectivity and storage capacity are not as
critical. The hardware needs to provide enough network connectivity and
storage capacity to meet the user requirements.

For a compute-focused cloud, emphasis should be on server
hardware that can offer more CPU sockets, more CPU cores, and more RAM.
Network connectivity and storage capacity are less critical.

When designing an OpenStack cloud architecture, you must
consider whether you intend to scale up or scale out. Selecting a
smaller number of larger hosts, or a larger number of smaller hosts,
depends on a combination of factors: cost, power, cooling, physical rack
and floor space, support-warranty, and manageability.

Consider the following in selecting server hardware form factor suited for
your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers have the ability to offer greater server density
  than a blade server solution, but are often limited to dual-socket,
  multi-core CPU configurations. It is possible to place forty 1U servers
  in a rack, providing space for the top of rack (ToR) switches, compared
  to 32 full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
   More hosts are required to support the anticipated scale
   if the design architecture uses dual-socket hardware designs.

   For a general purpose OpenStack cloud, sizing is an important consideration.
   The expected or anticipated number of instances that each hypervisor can
   host is a common meter used in sizing the deployment. The selected server
   hardware needs to support the expected or anticipated instance density.

Host density
   Another option to address the higher host count is to use a
   quad-socket platform. Taking this approach decreases host density
   which also increases rack count. This configuration affects the
   number of power connections and also impacts network and cooling
   requirements.

   Physical data centers have limited physical space, power, and
   cooling. The number of hosts (or hypervisors) that can be fitted
   into a given metric (rack, rack unit, or floor tile) is another
   important method of sizing. Floor weight is an often overlooked
   consideration. The data center floor must be able to support the
   weight of the proposed number of hosts within a rack or set of
   racks. These factors need to be applied as part of the host density
   calculation and server hardware selection.

Power and cooling density
   The power and cooling density requirements might be lower than with
   blade, sled, or 1U server designs due to lower host density (by
   using 2U, 3U or even 4U server designs). For data centers with older
   infrastructure, this might be a desirable feature.

   Data centers have a specified amount of power fed to a given rack or
   set of racks. Older data centers may have a power density as low as
   20 amps per rack, while more recent data centers can be architected
   to support power densities as high as 120 amps per rack. The selected
   server hardware must take power density into account.

Network connectivity
   The selected server hardware must have the appropriate number of
   network connections, as well as the right type of network
   connections, in order to support the proposed architecture. Ensure
   that, at a minimum, there are at least two diverse network
   connections coming into each rack.

The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly, the
network architecture impacts the server hardware selection and vice
versa.

@@ -0,0 +1,81 @@

=========
Overview
=========

When designing compute resource pools, consider the number of processors,
amount of memory, and the quantity of storage required for each hypervisor.

Determine whether compute resources will be provided in a single pool or in
multiple pools. In most cases, multiple pools of resources can be allocated
and addressed on demand, commonly referred to as bin packing.

In a bin packing design, each independent resource pool provides service
for specific flavors. Since instances are scheduled onto compute hypervisors,
each independent node's resources will be allocated to efficiently use the
available hardware. Bin packing also requires a common hardware design,
with all hardware nodes within a compute resource pool sharing a common
processor, memory, and storage layout. This makes it easier to deploy,
support, and maintain nodes throughout their lifecycle.

Increasing the size of the supporting compute environment increases the
network traffic and messages, adding load to the controller or
networking nodes. Effective monitoring of the environment will help with
capacity decisions on scaling.

Compute nodes automatically attach to OpenStack clouds, resulting in a
horizontally scaling process when adding extra compute capacity to an
OpenStack cloud. Additional processes are required to place nodes into
appropriate availability zones and host aggregates. When adding
additional compute nodes to environments, ensure identical or functionally
compatible CPUs are used, otherwise live migration features will break.
It is necessary to add rack capacity or network switches as scaling out
compute hosts directly affects data center resources.

Compute host components can also be upgraded to account for increases in
demand, known as vertical scaling. Upgrading CPUs with more
cores, or increasing the overall server memory, can add extra needed
capacity depending on whether the running applications are more CPU
intensive or memory intensive.

When selecting a processor, compare features and performance
characteristics. Some processors include features specific to
virtualized compute hosts, such as hardware-assisted virtualization, and
technology related to memory paging (also known as EPT shadowing). These
types of features can have a significant impact on the performance of
your virtual machine.

The number of processor cores and threads impacts the number of worker
threads which can be run on a resource node. Design decisions must
relate directly to the service being run on it, as well as provide a
balanced infrastructure for all services.

Another option is to assess the average workloads and increase the
number of instances that can run within the compute environment by
adjusting the overcommit ratio. This ratio is configurable for CPU and
memory. The default CPU overcommit ratio is 16:1, and the default memory
overcommit ratio is 1.5:1. Determining the tuning of the overcommit
ratios during the design phase is important as it has a direct impact on
the hardware layout of your compute nodes.

.. note::

   Changing the CPU overcommit ratio can have a detrimental effect
   and cause a potential increase in a noisy neighbor.

Insufficient disk capacity could also have a negative effect on overall
performance including CPU and memory usage. Depending on the back-end
architecture of the OpenStack Block Storage layer, capacity includes
adding disk shelves to enterprise storage systems or installing
additional Block Storage nodes. Upgrading directly attached storage
installed in Compute hosts, and adding capacity to the shared storage
for additional ephemeral storage to instances, may be necessary.

Consider the Compute requirements of non-hypervisor nodes (also referred to as
resource nodes). This includes controller, Object Storage nodes, Block Storage
nodes, and networking services.

The ability to add Compute resource pools for unpredictable workloads should
be considered. In some cases, the demand for certain instance types or flavors
may not justify individual hardware design. Allocate hardware designs that are
capable of servicing the most common instance requests. Adding hardware to the
overall architecture can be done later.

@@ -0,0 +1,37 @@

==============
Choosing a CPU
==============

The type of CPU in your compute node is a very important choice. First, ensure
that the CPU supports virtualization by way of *VT-x* for Intel chips and
*AMD-v* for AMD chips.

.. tip::

   Consult the vendor documentation to check for virtualization support. For
   Intel, read `“Does my processor support Intel® Virtualization Technology?”
   <http://www.intel.com/support/processors/sb/cs-030729.htm>`_. For AMD, read
   `AMD Virtualization
   <http://www.amd.com/en-us/innovations/software-technologies/server-solution/virtualization>`_.
   Your CPU may support virtualization but it may be disabled.
   Consult your BIOS documentation for how to enable CPU features.

The number of cores that the CPU has also affects the decision. It is common
for current CPUs to have up to 24 cores. Additionally, if an Intel CPU supports
hyperthreading, those 24 cores are doubled to 48 cores. If you purchase a
server that supports multiple CPUs, the number of cores is further multiplied.

.. note::

   **Multithread Considerations**

   Hyper-Threading is Intel's proprietary simultaneous multithreading
   implementation used to improve parallelization on their CPUs. You might
   consider enabling Hyper-Threading to improve the performance of
   multithreaded applications.

   Whether you should enable Hyper-Threading on your CPUs depends upon your use
   case. For example, disabling Hyper-Threading can be beneficial in intense
   computing environments. We recommend performance testing with
   your local workload with both Hyper-Threading on and off to determine what
   is more appropriate in your case.

@@ -0,0 +1,176 @@

=========================
Choosing server hardware
=========================

Consider the following factors when selecting compute (server) hardware:

* Server density
  A measure of how many servers can fit into a given measure of
  physical space, such as a rack unit [U].

* Resource capacity
  The number of CPU cores, how much RAM, or how much storage a given
  server delivers.

* Expandability
  The number of additional resources you can add to a server before it
  reaches capacity.

* Cost
  The relative cost of the hardware weighed against the level of
  design effort needed to build the system.

Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Weigh these considerations against each other to determine the best
design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing resource
capacity and expandability can increase cost but decrease server density.
Decreasing cost often means decreasing supportability, server density,
resource capacity, and expandability.

Compute capacity (CPU cores and RAM capacity) is a secondary
consideration for selecting server hardware. The required
server hardware must supply adequate CPU sockets, additional CPU cores,
and more RAM. Network connectivity and storage capacity are not as
critical. Your hardware will need to provide enough network connectivity and
storage capacity to meet the user requirements.

For a compute-focused cloud, emphasis should be on server
hardware that can offer more CPU sockets, more CPU cores, and more RAM.
Network connectivity and storage capacity are less critical.

When designing an OpenStack cloud architecture, you must
consider whether you intend to scale up or scale out. Selecting a
smaller number of larger hosts, or a larger number of smaller hosts,
depends on a combination of factors: cost, power, cooling, physical rack
and floor space, support-warranty, and manageability.
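
To make the scale-up versus scale-out trade-off concrete, it can be worth
roughing out both options for the same target capacity. The sketch below uses
hypothetical host sizes and per-host costs (none of these figures come from
this guide); it only illustrates how the host count, rack space, and
failure-domain size move in opposite directions.

.. code-block:: python

   # Sketch: compare scale-up vs. scale-out host designs for the same target capacity.
   # The host specifications and costs below are purely illustrative assumptions.

   TARGET_VCPUS = 4096           # total virtual cores the pool must provide
   CPU_ALLOCATION_RATIO = 16     # default CPU overcommit ratio

   designs = {
       # name: (physical cores per host, rack units per host, relative cost per host)
       "scale up (4-socket 4U)": (96, 4, 9.0),
       "scale out (2-socket 1U)": (24, 1, 2.0),
   }

   physical_cores_needed = TARGET_VCPUS / CPU_ALLOCATION_RATIO
   for name, (cores, rack_units, cost) in designs.items():
       hosts = -(-physical_cores_needed // cores)   # ceiling division
       print(f"{name}: {int(hosts)} hosts, {int(hosts * rack_units)}U of rack space, "
             f"relative cost {hosts * cost:.0f}, "
             f"{TARGET_VCPUS / hosts:.0f} vCPUs of capacity lost if one host fails")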

Consider the following in selecting server hardware form factor suited for
your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers have the ability to offer greater server density
  than a blade server solution, but are often limited to dual-socket,
  multi-core CPU configurations. It is possible to place forty 1U servers
  in a rack, providing space for the top of rack (ToR) switches, compared
  to 32 full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
   More hosts are required to support the anticipated scale
   if the design architecture uses dual-socket hardware designs.

   For a general purpose OpenStack cloud, sizing is an important consideration.
   The expected or anticipated number of instances that each hypervisor can
   host is a common meter used in sizing the deployment. The selected server
   hardware needs to support the expected or anticipated instance density.

Host density
   Another option to address the higher host count is to use a
   quad-socket platform. Taking this approach decreases host density
   which also increases rack count. This configuration affects the
   number of power connections and also impacts network and cooling
   requirements.

   Physical data centers have limited physical space, power, and
   cooling. The number of hosts (or hypervisors) that can be fitted
   into a given metric (rack, rack unit, or floor tile) is another
   important method of sizing. Floor weight is an often overlooked
   consideration. The data center floor must be able to support the
   weight of the proposed number of hosts within a rack or set of
   racks. These factors need to be applied as part of the host density
   calculation and server hardware selection.

Power and cooling density
   The power and cooling density requirements might be lower than with
   blade, sled, or 1U server designs due to lower host density (by
   using 2U, 3U or even 4U server designs). For data centers with older
   infrastructure, this might be a desirable feature.

   Data centers have a specified amount of power fed to a given rack or
   set of racks. Older data centers may have a power density as low as
   20 amps per rack, while more recent data centers can be architected
   to support power densities as high as 120 amps per rack. The selected
   server hardware must take power density into account.

Specific hardware concepts
~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following in selecting server hardware form factor suited for
your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers have the ability to offer greater server density
  than a blade server solution, but are often limited to dual-socket,
  multi-core CPU configurations. It is possible to place forty 1U servers
  in a rack, providing space for the top of rack (ToR) switches, compared
  to 32 full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

@@ -0,0 +1,39 @@

======================
Choosing a hypervisor
======================

A hypervisor provides software to manage virtual machine access to the
underlying hardware. The hypervisor creates, manages, and monitors
virtual machines. OpenStack Compute (nova) supports many hypervisors to various
degrees, including:

* `KVM <http://www.linux-kvm.org/page/Main_Page>`_
* `LXC <https://linuxcontainers.org/>`_
* `QEMU <http://wiki.qemu.org/Main_Page>`_
* `VMware ESX/ESXi <https://www.vmware.com/support/vsphere-hypervisor>`_
* `Xen <http://www.xenproject.org/>`_
* `Hyper-V <http://technet.microsoft.com/en-us/library/hh831531.aspx>`_
* `Docker <https://www.docker.com/>`_

Probably the most important factor in your choice of hypervisor is your
current usage or experience. Aside from that, there are practical
concerns to do with feature parity, documentation, and the level of
community experience.

For example, KVM is the most widely adopted hypervisor in the OpenStack
community. Besides KVM, more deployments run Xen, LXC, VMware, and
Hyper-V than the others listed. However, each of these is lacking some
feature support or the documentation on how to use them with OpenStack
is out of date.

The best information available to support your choice is found on the
`Hypervisor Support Matrix
<http://docs.openstack.org/developer/nova/support-matrix.html>`_
and in the `configuration reference
<http://docs.openstack.org/mitaka/config-reference/compute/hypervisors.html>`_.

.. note::

   It is also possible to run multiple hypervisors in a single
   deployment using host aggregates or cells. However, an individual
   compute node can run only a single hypervisor at a time.

@@ -0,0 +1,16 @@

=====================
Network connectivity
=====================

The selected server hardware must have the appropriate number of
network connections, as well as the right type of network
connections, in order to support the proposed architecture. Ensure
that, at a minimum, there are at least two diverse network
connections coming into each rack.

The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly, the
network architecture impacts the server hardware selection and vice
versa.
@ -0,0 +1,66 @@
==============
Overcommitting
==============

OpenStack allows you to overcommit CPU and RAM on compute nodes. This
allows you to increase the number of instances you can have running on
your cloud, at the cost of reducing the performance of the instances.
OpenStack Compute uses the following ratios by default:

* CPU allocation ratio: 16:1
* RAM allocation ratio: 1.5:1

The default CPU allocation ratio of 16:1 means that the scheduler
allocates up to 16 virtual cores per physical core. For example, if a
physical node has 12 cores, the scheduler sees 192 available virtual
cores. With typical flavor definitions of 4 virtual cores per instance,
this ratio would provide 48 instances on a physical node.

The formula for the number of virtual instances on a compute node is
``(OR*PC)/VC``, where:

OR
    CPU overcommit ratio (virtual cores per physical core)

PC
    Number of physical cores

VC
    Number of virtual cores per instance

Similarly, the default RAM allocation ratio of 1.5:1 means that the
scheduler allocates instances to a physical node as long as the total
amount of RAM associated with the instances is less than 1.5 times the
amount of RAM available on the physical node.

For example, if a physical node has 48 GB of RAM, the scheduler
allocates instances to that node until the sum of the RAM associated
with the instances reaches 72 GB (such as nine instances, in the case
where each instance has 8 GB of RAM).
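
As a quick check of these formulas, the short Python sketch below (an
illustration, not part of any OpenStack project) computes how many
instances of a given flavor fit on one node under both the CPU and RAM
allocation ratios; the node and flavor sizes are the hypothetical values
from the examples above.

.. code-block:: python

   # Illustrative capacity estimate for one compute node; the numbers
   # below are the example values from this section, not real limits.

   def instances_per_node(physical_cores, ram_gb,
                          flavor_vcpus, flavor_ram_gb,
                          cpu_ratio=16.0, ram_ratio=1.5):
       """Return how many instances of one flavor fit on a single node."""
       by_cpu = (cpu_ratio * physical_cores) // flavor_vcpus   # (OR*PC)/VC
       by_ram = (ram_ratio * ram_gb) // flavor_ram_gb
       # The scheduler stops at whichever resource runs out first.
       return int(min(by_cpu, by_ram))

   # 12 physical cores and 48 GB RAM with a 4-vCPU / 8 GB flavor:
   # CPU allows 48 instances, RAM allows 9, so RAM is the limiting factor.
   print(instances_per_node(12, 48, 4, 8))  # -> 9

As the example shows, with the default ratios a typical flavor tends to
exhaust RAM before CPU, which is worth keeping in mind when sizing nodes.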

.. note::

   Regardless of the overcommit ratio, an instance cannot be placed
   on any physical node with fewer raw (pre-overcommit) resources than
   the instance flavor requires.

You must select the appropriate CPU and RAM allocation ratio for your
particular use case.

Logging
~~~~~~~

Logging is described in more detail in `Logging and Monitoring
<http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`_. However,
it is an important design consideration to take into account before
commencing operations of your cloud.

OpenStack produces a great deal of useful logging information. However,
for that information to be useful for operations purposes, you should
consider having a central logging server to send logs to, and a log
parsing/analysis system (such as logstash).
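
As a minimal illustration of centralized logging, the sketch below
forwards a message to a remote syslog collector using Python's standard
library. The host name ``logs.example.com`` is a placeholder; in a
production OpenStack deployment you would typically configure rsyslog or
the services' oslo.log options to forward logs instead.

.. code-block:: python

   import logging
   import logging.handlers

   # Forward log records to a central syslog collector over UDP.
   # "logs.example.com" is a placeholder for your logging server.
   handler = logging.handlers.SysLogHandler(address=("logs.example.com", 514))
   handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

   logger = logging.getLogger("cloud-tooling")
   logger.setLevel(logging.INFO)
   logger.addHandler(handler)

   logger.info("compute node maintenance window starting")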

Networking
~~~~~~~~~~

Networking in OpenStack is a complex, multifaceted challenge. See
:doc:`../design-networking/design-networking-concepts`.
@ -0,0 +1,143 @@
===========================
Instance storage solutions
===========================

As part of the procurement for a compute cluster, you must specify some
storage for the disk on which the instantiated instance runs. There are
three main approaches to providing this temporary-style storage, and it
is important to understand the implications of the choice.

They are:

* Off compute node storage—shared file system
* On compute node storage—shared file system
* On compute node storage—nonshared file system

In general, the questions you should ask when selecting storage are as
follows:

* What is the platter count you can achieve?
* Do more spindles result in better I/O despite network access?
* Which one results in the best cost-performance scenario you are aiming for?
* How do you manage the storage operationally?

Many operators use separate compute and storage hosts. Compute services
and storage services have different requirements, and compute hosts
typically require more CPU and RAM than storage hosts. Therefore, for a
fixed budget, it makes sense to have different configurations for your
compute nodes and your storage nodes. Compute nodes will be invested in
CPU and RAM, and storage nodes will be invested in block storage.

However, if you are more restricted in the number of physical hosts you
have available for creating your cloud and you want to be able to
dedicate as many of your hosts as possible to running instances, it
makes sense to run compute and storage on the same machines.

The three main approaches to instance storage are provided in the next
few sections.

Off compute node storage—shared file system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this option, the disks storing the running instances are hosted in
servers outside of the compute nodes.

If you use separate compute and storage hosts, you can treat your
compute hosts as "stateless." As long as you do not have any instances
currently running on a compute host, you can take it offline or wipe it
completely without having any effect on the rest of your cloud. This
simplifies maintenance for the compute hosts.

There are several advantages to this approach:

* If a compute node fails, instances are usually easily recoverable.
* Running a dedicated storage system can be operationally simpler.
* You can scale to any number of spindles.
* It may be possible to share the external storage for other purposes.

The main disadvantages to this approach are:

* Depending on design, heavy I/O usage from some instances can affect
  unrelated instances.
* Use of the network can decrease performance.

On compute node storage—shared file system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this option, each compute node is specified with a significant amount
of disk space, but a distributed file system ties the disks from each
compute node into a single mount.

The main advantage of this option is that it scales to external storage
when you require additional storage.

However, this option has several disadvantages:

* Running a distributed file system can make you lose your data
  locality compared with nonshared storage.
* Recovery of instances is complicated because it depends on multiple
  hosts.
* The chassis size of the compute node can limit the number of spindles
  that can be used in a compute node.
* Use of the network can decrease performance.

On compute node storage—nonshared file system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this option, each compute node is specified with enough disks to
store the instances it hosts.

There are two main advantages:

* Heavy I/O usage on one compute node does not affect instances on
  other compute nodes.
* Direct I/O access can increase performance.

This has several disadvantages:

* If a compute node fails, the instances running on that node are lost.
* The chassis size of the compute node can limit the number of spindles
  that can be used in a compute node.
* Migrations of instances from one node to another are more complicated
  and rely on features that may not continue to be developed.
* If additional storage is required, this option does not scale.

Running a shared file system on a storage system separate from the
compute nodes is ideal for clouds where reliability and scalability are
the most important factors. Running a shared file system on the compute
nodes themselves may be best in a scenario where you have to deploy to
preexisting servers over whose specifications you have little or no
control. Running a nonshared file system on the compute nodes themselves
is a good option for clouds with high I/O requirements and low concern
for reliability.

Issues with live migration
--------------------------

Live migration is an integral part of cloud operations. It lets you
seamlessly move instances from one physical host to another, which is a
necessity for performing upgrades that require reboots of the compute
hosts. However, it only works well with shared storage.

Live migration can also be done with nonshared storage, using a feature
known as *KVM live block migration*. While an earlier implementation of
block-based migration in KVM and QEMU was considered unreliable, there
is a newer, more reliable implementation of block-based live migration
as of QEMU 1.4 and libvirt 1.0.2 that is also compatible with OpenStack.
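
For illustration only, the following sketch shows one way an operator
might drain a compute host by live-migrating its instances with the
python-novaclient API. The credentials, host names, and authentication
details are assumptions, and block migration would only be requested
when the instance disks are not on shared storage.

.. code-block:: python

   # Illustrative only: drain a compute host by live-migrating its
   # instances. Credentials and host names below are placeholders.
   from novaclient import client

   nova = client.Client("2", "admin", "password", "admin",
                        "http://controller:5000/v2.0")

   # Set to True when instance disks are not on shared storage, so that
   # KVM live block migration is used instead.
   block_migration = False

   for server in nova.servers.list(search_opts={"host": "compute-01",
                                                "all_tenants": 1}):
       # Let the scheduler pick a destination host (host=None).
       server.live_migrate(host=None, block_migration=block_migration)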

Choice of file system
---------------------

If you want to support shared-storage live migration, you need to
configure a distributed file system.

Possible options include:

* NFS (default for Linux)
* GlusterFS
* MooseFS
* Lustre

We recommend that you choose the option with which your operators are
most familiar. NFS is the easiest to set up, and there is extensive
community knowledge about it.
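
Whichever file system you choose, it is worth verifying that the
instances directory really is shared before relying on shared-storage
live migration. The sketch below is one hedged way to do that: it writes
a marker file on one compute host and checks that it is visible from the
others over SSH. The path ``/var/lib/nova/instances`` and the host names
are assumptions for illustration.

.. code-block:: python

   # Illustration only: verify that the (assumed) instances path is
   # shared between compute hosts. Host names are placeholders.
   import subprocess
   import uuid

   INSTANCES_PATH = "/var/lib/nova/instances"
   OTHER_HOSTS = ["compute-02", "compute-03"]

   marker = "{}/shared-storage-check-{}".format(INSTANCES_PATH, uuid.uuid4())
   subprocess.check_call(["touch", marker])

   try:
       for host in OTHER_HOSTS:
           # If the file is not visible remotely, the storage is not
           # shared and shared-storage live migration will not work.
           result = subprocess.call(["ssh", host, "test", "-f", marker])
           print("{}: {}".format(host, "shared" if result == 0 else "NOT shared"))
   finally:
       subprocess.check_call(["rm", "-f", marker])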