<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section [
<!ENTITY % openstack SYSTEM "../../common/entities/openstack.ent">
%openstack;
]>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="technical-considerations-general-purpose">
  <?dbhtml stop-chunking?>
  <title>Technical considerations</title>
  <para>When designing a general purpose cloud, there is an implied
    requirement to design for all of the base services generally
    associated with providing Infrastructure-as-a-Service:
    compute, network, and storage. Each of these services has
    different resource requirements. As a result, it is important
    to make design decisions relating directly to the service
    currently under design, while delivering a balanced
    infrastructure that supports all services.</para>
  <para>When designing an OpenStack cloud as a general purpose
    cloud, the hardware selection process can be lengthy and
    involved due to the sheer number of services which need to be
    designed and the unique characteristics and requirements of
    each service within the cloud. Hardware designs need to be
    generated for each type of resource pool; specifically,
    compute, network, and storage. In addition to the hardware
    designs, which affect the resource nodes themselves, there are
    also a number of additional hardware decisions to be made
    related to network architecture and facilities planning. These
    factors play heavily into the overall architecture of an
    OpenStack cloud.</para>
  <section xml:id="designing-compute-resources-tech-considerations">
    <title>Designing compute resources</title>
    <para>We recommend you design compute resources as pools of
      resources which will be addressed on-demand. When designing
      compute resource pools, a number of factors impact your design
      decisions. For example, decisions related to processors,
      memory, and storage within each hypervisor are just one
      element of designing compute resources. In addition, it is
      necessary to decide whether compute resources will be provided
      in a single pool or in multiple pools.</para>
    <para>To design for the best use of available resources by
      applications running in the cloud, we recommend you design
      more than one compute resource pool. Each independent resource
      pool should be designed to provide service for specific
      flavors of instances or groupings of flavors. For the purpose
      of this book, "instance" refers to a virtual machine and the
      operating system running on the virtual machine. Designing
      multiple resource pools helps to ensure that, as instances are
      scheduled onto compute hypervisors, each independent node's
      resources will be allocated in a way that makes the most
      efficient use of available hardware. This is commonly referred
      to as bin packing.</para>
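    <para>As a purely illustrative sketch of the bin packing idea
      (the host and instance structures here are invented for the
      example, and this is not how the Compute scheduler is
      implemented), a simple first-fit placement packs instances onto
      the fewest hosts:</para>
    <programlisting language="python"># Illustrative only: first-fit bin packing of instance vCPU demands
# onto hosts. Real scheduling also weighs RAM, disk, and filters.
def first_fit(instances, hosts):
    """Place each instance on the first host with enough free vCPUs."""
    placements = {}
    for name, vcpus in instances:
        for host in hosts:
            if host["free_vcpus"] >= vcpus:
                host["free_vcpus"] -= vcpus
                placements[name] = host["name"]
                break
    return placements

hosts = [{"name": "node%d" % i, "free_vcpus": 32} for i in (1, 2)]
instances = [("web1", 4), ("db1", 8), ("web2", 4)]
print(first_fit(instances, hosts))   # all three fit on node1</programlisting>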
    <para>Using a consistent hardware design among the nodes that are
      placed within a resource pool also helps support bin packing.
      Hardware nodes selected for being a part of a compute resource
      pool should share a common processor, memory, and storage
      layout. By choosing a common hardware design, it becomes
      easier to deploy, support, and maintain those nodes throughout
      their life cycle in the cloud.</para>
    <para>OpenStack provides the ability to configure the overcommit
      ratio (the ratio of virtual resources available for allocation
      to physical resources present) for both CPU and memory. The
      default CPU overcommit ratio is 16:1 and the default memory
      overcommit ratio is 1.5:1. Determine the tuning of the
      overcommit ratios for both of these options during the design
      phase, as this has a direct impact on the hardware layout of
      your compute nodes.</para>
    <para>As an example, consider that an
      <literal>m1.small</literal> instance uses 1 vCPU, 20 GB of
      ephemeral storage, and 2,048 MB of RAM. When designing a
      hardware node as a compute resource pool to service instances,
      take into consideration the number of processor cores
      available on the node as well as the required disk and memory
      to service instances running at capacity. For a server with
      2 CPUs of 10 cores each, with hyperthreading turned on, the
      default CPU overcommit ratio of 16:1 would allow for 640
      (2 × 10 × 2 × 16) total <literal>m1.small</literal> instances.
      By the same reasoning, using the default memory overcommit
      ratio of 1.5:1 you can determine that the server will need at
      least 853 GB (640 × 2,048 MB / 1.5) of RAM. When sizing nodes
      for memory, it is also important to consider the additional
      memory required to service operating system and service
      needs.</para>
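    <para>The same arithmetic, worked through in Python below, makes
      the sizing easy to recompute for other hardware. In OpenStack
      Compute these ratios correspond to the
      <option>cpu_allocation_ratio</option> and
      <option>ram_allocation_ratio</option> configuration options;
      the hardware figures are this example's assumptions.</para>
    <programlisting language="python"># Worked sizing example for the figures quoted above.
sockets, cores, threads = 2, 10, 2   # 2 CPUs, 10 cores each, hyperthreading
cpu_overcommit = 16                  # default CPU allocation ratio (16:1)
ram_overcommit = 1.5                 # default RAM allocation ratio (1.5:1)
ram_per_instance_mb = 2048           # m1.small

max_instances = sockets * cores * threads * cpu_overcommit
ram_needed_gb = max_instances * ram_per_instance_mb / ram_overcommit / 1024

print(max_instances)         # 640
print(round(ram_needed_gb))  # 853</programlisting>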
    <para>Processor selection is an extremely important consideration
      in hardware design, especially when comparing the features and
      performance characteristics of different processors. Some
      newly released processors include features specific to
      virtualized compute hosts including hardware assisted
      virtualization and technology related to memory paging (also
      known as EPT shadowing). These features have a tremendous
      positive impact on the performance of virtual machines running
      in the cloud.</para>
    <para>In addition to the impact on actual compute services, it is
      also important to consider the compute requirements of
      resource nodes within the cloud. Resource nodes are
      non-hypervisor nodes providing controller, object storage,
      block storage, or networking services in the cloud. The number
      of processor cores and threads has a direct correlation to the
      number of worker threads which can be run on a resource node.
      It is important to ensure sufficient compute capacity and
      memory are planned on resource nodes.</para>
    <para>Workload profiles are unpredictable in a general purpose
      cloud, so it may be difficult to design for every specific use
      case. Additional compute resource pools can be added to the
      cloud at a later time, so this unpredictability should not be
      a problem. In some cases, the demand on certain instance types
      or flavors may not justify an individual hardware design. In
      either of these cases, start by providing hardware designs
      capable of servicing the most common instance requests, then
      add new hardware node designs and resource pools to the
      overall architecture as they become justified.</para>
  </section>
  <section xml:id="designing-network-resources-tech-considerations">
    <title>Designing network resources</title>
    <para>An OpenStack cloud traditionally has multiple network
      segments, each of which provides access to resources within
      the cloud to both operators and tenants. In addition, the
      network services themselves require communication paths which
      should be separated from the other networks. When designing
      network services for a general purpose cloud, we recommend
      planning for either a physical or logical separation of the
      network segments used by operators and tenants. We further
      suggest creating an additional network segment for access to
      internal services such as the message bus and database used
      by the various cloud services. Segregating these services
      onto separate networks helps to protect sensitive data and
      also protects against unauthorized access to
      services.</para>
    <para>Based on the requirements of instances being serviced in
      the cloud, the next choice that affects your design is which
      network service will be used to service instances in the
      cloud. The choice between legacy networking (nova-network), as
      a part of OpenStack Compute, and OpenStack Networking
      (neutron) has tremendous implications for the architecture and
      design of the cloud network infrastructure.</para>
    <para>The legacy networking (nova-network) service is primarily
      a layer-2 networking service that functions in two modes. In
      legacy networking, the two modes differ in their use of VLANs.
      When using legacy networking in a flat network mode, all
      network hardware nodes and devices throughout the cloud are
      connected to a single layer-2 network segment that provides
      access to application data.</para>
    <para>When the network devices in the cloud support segmentation
      using VLANs, legacy networking can operate in the second mode.
      In this design model, each tenant within the cloud is assigned
      a network subnet which is mapped to a VLAN on the physical
      network. It is especially important to remember that only 4096
      VLAN IDs are available within a spanning tree domain. This
      limitation places a hard limit on the amount of growth
      possible within the data center. When designing a general
      purpose cloud intended to support multiple tenants, we
      particularly recommend using legacy networking with VLANs, and
      not in flat network mode.</para>
    <para>Another network consideration is that legacy networking is
      entirely managed by the cloud operator; tenants do not have
      control over network resources. If tenants require the ability
      to manage and create network resources such as network
      segments and subnets, it will be necessary to install the
      OpenStack Networking service to provide network access to
      instances.</para>
    <para>OpenStack Networking is a first-class networking service
      that gives full control over creation of virtual network
      resources to tenants. This is often accomplished in the form
      of tunneling protocols which establish encapsulated
      communication paths over existing network infrastructure in
      order to segment tenant traffic. These methods vary depending
      on the specific implementation, but some of the more common
      methods include tunneling over GRE, encapsulating with VXLAN,
      and VLAN tags.</para>
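    <para>As a minimal sketch of this tenant self-service, the
      following example uses python-neutronclient with placeholder
      credentials, endpoints, and names; which tunneling method
      carries the traffic (GRE, VXLAN, or VLAN) is determined by the
      operator's plug-in configuration, not by the tenant:</para>
    <programlisting language="python"># A tenant creating its own network and subnet through OpenStack
# Networking. Credentials, endpoints, and names are placeholders.
from neutronclient.v2_0 import client

neutron = client.Client(username="demo", password="secret",
                        tenant_name="demo",
                        auth_url="http://controller:5000/v2.0")

network = neutron.create_network(
    {"network": {"name": "app-net", "admin_state_up": True}})
neutron.create_subnet(
    {"subnet": {"network_id": network["network"]["id"],
                "ip_version": 4, "cidr": "192.168.10.0/24"}})</programlisting>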
    <para>Initially, we suggest designing at least three network
      segments, the first of which will be used for access to the
      cloud's REST APIs by tenants and operators. This is generally
      referred to as a public network. In most cases, the controller
      nodes and swift proxies within the cloud will be the only
      devices necessary to connect to this network segment. In some
      cases, this network might also be serviced by hardware load
      balancers and other network devices.</para>
    <para>The next segment is used by cloud administrators to manage
      hardware resources and is also used by configuration
      management tools when deploying software and services onto new
      hardware. In some cases, this network segment might also be
      used for internal services, including the message bus and
      database services, to communicate with each other. Due to the
      sensitive nature of this network segment, it may be desirable
      to secure this network from unauthorized access. This network
      will likely need to communicate with every hardware node
      within the cloud.</para>
    <para>The last network segment is used by applications and
      consumers to provide access to the physical network and also
      for users accessing applications running within the cloud.
      This network is generally segregated from the one used to
      access the cloud APIs and is not capable of communicating
      directly with the hardware resources in the cloud. Compute
      resource nodes will need to communicate on this network
      segment, as will any network gateway services which allow
      application data to access the physical network outside of
      the cloud.</para>
  </section>
  <section xml:id="designing-storage-resources-tech-considerations">
    <title>Designing storage resources</title>
    <para>OpenStack has two independent storage services to
      consider, each with its own specific design requirements and
      goals. In addition to services which provide storage as their
      primary function, there are additional design considerations
      with regard to compute and controller nodes which will affect
      the overall cloud architecture.</para>
  </section>
  <section xml:id="designing-openstack-object-storage-tech-considerations">
    <title>Designing OpenStack Object Storage</title>
    <para>When designing hardware resources for OpenStack Object
      Storage, the primary goal is to maximize the amount of storage
      in each resource node while also ensuring that the cost per
      terabyte is kept to a minimum. This often involves utilizing
      servers which can hold a large number of spinning disks.
      Whether choosing to use 2U server form factors with directly
      attached storage or an external chassis that holds a larger
      number of drives, the main goal is to maximize the storage
      available in each node.</para>
    <para>We do not recommend investing in enterprise class drives
      for an OpenStack Object Storage cluster. The consistency and
      partition tolerance characteristics of OpenStack Object
      Storage will ensure that data stays up to date and survives
      hardware faults without the use of any specialized data
      replication devices.</para>
    <para>A great benefit of OpenStack Object Storage is the ability
      to mix and match drives by utilizing weighting within the
      swift ring. When designing your swift storage cluster, we
      recommend making use of the most cost effective storage
      solution available at the time. Many server chassis on the
      market can hold 60 or more drives in 4U of rack space;
      therefore, we recommend maximizing the amount of storage per
      rack unit at the best cost per terabyte. Furthermore, we do
      not recommend the use of RAID controllers in an object storage
      node.</para>
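    <para>As a brief sketch of how mixed drive sizes can share one
      cluster, the following example builds a ring with swift's
      <literal>RingBuilder</literal> class (assuming the swift
      Python package is installed); the addresses and sizes are
      placeholders, and partitions are assigned to drives in
      proportion to their weight:</para>
    <programlisting language="python"># Weighting mixed drive sizes in a swift ring (illustrative values).
from swift.common.ring import RingBuilder

builder = RingBuilder(10, 3, 1)   # 2**10 partitions, 3 replicas, 1h lock
builder.add_dev({"id": 0, "region": 1, "zone": 1, "ip": "10.0.0.1",
                 "port": 6000, "device": "sda", "weight": 4000})
builder.add_dev({"id": 1, "region": 1, "zone": 2, "ip": "10.0.0.2",
                 "port": 6000, "device": "sdb", "weight": 4000})
builder.add_dev({"id": 2, "region": 1, "zone": 3, "ip": "10.0.0.3",
                 "port": 6000, "device": "sdc", "weight": 4000})
builder.add_dev({"id": 3, "region": 1, "zone": 4, "ip": "10.0.0.4",
                 "port": 6000, "device": "sdd", "weight": 6000})
builder.rebalance()   # the heavier drive receives a larger share</programlisting>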
    <para>To achieve the intended durability and availability of
      data stored as objects, it is important to design object
      storage resource pools in a way that provides the suggested
      availability that the service can provide. Beyond designing at
      the hardware node level, it is important to consider
      rack-level and zone-level designs to accommodate the number of
      replicas configured to be stored in the Object Storage service
      (the default number of replicas is three). Each replica of
      data should exist in its own availability zone with its own
      power, cooling, and network resources available to service
      that specific zone.</para>
    <para>Object storage nodes should be designed so that the number
      of requests does not hinder the performance of the cluster.
      The object storage service uses a chatty protocol; therefore,
      making use of multiple processors that have higher core counts
      will help ensure the IO requests do not inundate the
      server.</para>
  </section>
  <section xml:id="designing-openstack-block-storage">
    <title>Designing OpenStack Block Storage</title>
    <para>When designing OpenStack Block Storage resource nodes, it
      is helpful to understand the workloads and requirements that
      will drive the use of block storage in the cloud. In a general
      purpose cloud these use patterns are often unknown. We
      recommend designing block storage pools so that tenants can
      choose the appropriate storage solution for their
      applications. By creating multiple storage pools of different
      types, in conjunction with configuring an advanced storage
      scheduler for the block storage service, it is possible to
      provide tenants with a large catalog of storage services with
      a variety of performance levels and redundancy options.</para>
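    <para>As a minimal sketch of exposing such a catalog, the
      following example defines volume types with
      python-cinderclient; the names and credentials are
      placeholders, and each <literal>volume_backend_name</literal>
      must match a backend defined in the Block Storage
      configuration so the scheduler can route requests to the right
      pool:</para>
    <programlisting language="python"># Publishing two block storage pools as volume types (illustrative).
from cinderclient.v2 import client

cinder = client.Client("admin", "secret", "admin",
                       "http://controller:5000/v2.0")

for name, backend in [("standard", "lvm-pool"),
                      ("performance", "ssd-pool")]:
    vtype = cinder.volume_types.create(name)
    vtype.set_keys({"volume_backend_name": backend})</programlisting>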
    <para>In addition to directly attached storage populated in
      servers, block storage can also take advantage of a number of
      enterprise storage solutions. These are addressed via a
      plug-in driver developed by the hardware vendor. A large
      number of enterprise storage plug-in drivers ship
      out-of-the-box with OpenStack Block Storage (and many more are
      available via third party channels). While a general purpose
      cloud would likely use directly attached storage in the
      majority of block storage nodes, it may also be necessary to
      provide additional levels of service to tenants which can only
      be provided by enterprise class storage solutions.</para>
    <para>The determination to use a RAID controller card in block
      storage nodes is impacted primarily by the redundancy and
      availability requirements of the application. Applications
      which have a higher demand for input/output operations per
      second (IOPS) will influence both the choice to use a RAID
      controller and the level of RAID configured on the volume.
      Where performance is a consideration, we suggest making use of
      higher performing RAID volumes. In contrast, where redundancy
      of block storage volumes is more important, we recommend a
      redundant RAID configuration such as RAID 5 or RAID 6. Some
      specialized features, such as automated replication of block
      storage volumes, may require the use of third-party plug-ins
      and enterprise block storage solutions in order to meet the
      high demand on storage. Furthermore, where extreme performance
      is a requirement, it may also be necessary to make use of high
      speed SSD disk drives or high performing flash storage
      solutions.</para>
  </section>
  <section xml:id="software-selection-tech-considerations">
    <title>Software selection</title>
    <para>The software selection process can play a large role in
      the architecture of a general purpose cloud. Choice of
      operating system, selection of OpenStack software components,
      choice of hypervisor and selection of supplemental software
      will have a large impact on the design of the cloud.</para>
    <para>Operating system (OS) selection plays a large role in the
      design and architecture of a cloud. There are a number of OSes
      which have native support for OpenStack including Ubuntu, Red
      Hat Enterprise Linux (RHEL), CentOS, and SUSE Linux Enterprise
      Server (SLES). "Native support" in this context means that the
      distribution provides distribution-native packages by which to
      install OpenStack in their repositories. Note that "native
      support" is not a constraint on the choice of OS; users are
      free to choose just about any Linux distribution (or even
      Microsoft Windows) and install OpenStack directly from source
      (or compile their own packages). However, the reality is that
      many organizations will prefer to install OpenStack from
      distribution-supplied packages or repositories (although using
      the distribution vendor's OpenStack packages might be a
      requirement for support).</para>
    <para>OS selection also directly influences hypervisor
      selection. A cloud architect who selects Ubuntu, RHEL, or SLES
      has some flexibility in hypervisor; KVM, Xen, and LXC are
      supported virtualization methods available under OpenStack
      Compute (nova) on these Linux distributions. A cloud architect
      who selects Hyper-V, on the other hand, is limited to Windows
      Server. Similarly, a cloud architect who selects XenServer is
      limited to the CentOS-based dom0 operating system provided
      with XenServer.</para>
    <para>The primary factors that play into OS-hypervisor selection
      include:</para>
    <variablelist>
      <varlistentry>
        <term>User requirements</term>
        <listitem>
          <para>The selection of OS-hypervisor combination first and
            foremost needs to support the user requirements.</para>
        </listitem>
      </varlistentry>
      <varlistentry>
        <term>Support</term>
        <listitem>
          <para>The selected OS-hypervisor combination needs to be
            supported by OpenStack.</para>
        </listitem>
      </varlistentry>
      <varlistentry>
        <term>Interoperability</term>
        <listitem>
          <para>The OS-hypervisor needs to be interoperable with
            other features and services in the OpenStack design in
            order to meet the user requirements.</para>
        </listitem>
      </varlistentry>
    </variablelist>
  </section>
  <section xml:id="hypervisor-tech-considerations">
    <title>Hypervisor</title>
    <para>OpenStack supports a wide variety of hypervisors, one or
      more of which can be used in a single cloud. These hypervisors
      include:</para>
    <itemizedlist>
      <listitem>
        <para>KVM (and QEMU)</para>
      </listitem>
      <listitem>
        <para>XCP/XenServer</para>
      </listitem>
      <listitem>
        <para>vSphere (vCenter and ESXi)</para>
      </listitem>
      <listitem>
        <para>Hyper-V</para>
      </listitem>
      <listitem>
        <para>LXC</para>
      </listitem>
      <listitem>
        <para>Docker</para>
      </listitem>
      <listitem>
        <para>Bare-metal</para>
      </listitem>
    </itemizedlist>
    <para>A complete list of supported hypervisors and their
      capabilities can be found at
      <link xlink:href="https://wiki.openstack.org/wiki/HypervisorSupportMatrix">https://wiki.openstack.org/wiki/HypervisorSupportMatrix</link>.</para>
    <para>General purpose clouds should make use of hypervisors that
      support the most general purpose use cases, such as KVM and
      Xen. More specific hypervisors should then be chosen to
      account for specific functionality or a supported feature
      requirement. In some cases, there may also be a mandated
      requirement to run software on a certified hypervisor
      including solutions from VMware, Microsoft, and
      Citrix.</para>
    <para>The features offered through the OpenStack cloud platform
      determine the best choice of a hypervisor. As an example, for
      a general purpose cloud that predominantly supports a
      Microsoft-based migration, or is managed by staff that has a
      particular skill for managing certain hypervisors and
      operating systems, Hyper-V might be the best available choice.
      While the decision to use Hyper-V does not limit the ability
      to run alternative operating systems, be mindful of those that
      are deemed supported. Each hypervisor also has its own
      hardware requirements which may affect the decisions around
      designing a general purpose cloud. For example, utilizing the
      live migration feature of VMware, vMotion, requires the
      installation of vCenter/vSphere and the use of the ESXi
      hypervisor, which increases the infrastructure
      requirements.</para>
    <para>In a mixed hypervisor environment, specific aggregates of
      compute resources, each with defined capabilities, enable
      workloads to utilize software and hardware specific to their
      particular requirements. This functionality can be exposed
      explicitly to the end user, or accessed through defined
      metadata within a particular flavor of an instance.</para>
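    <para>A minimal sketch of this pattern with python-novaclient
      follows. All names are placeholders, and it assumes the
      <literal>AggregateInstanceExtraSpecsFilter</literal> scheduler
      filter is enabled so that flavor extra specs are matched
      against host aggregate metadata:</para>
    <programlisting language="python"># Steering a flavor to a pool of hosts via aggregate metadata.
from novaclient import client

nova = client.Client("2", "admin", "secret", "admin",
                     "http://controller:5000/v2.0")

aggregate = nova.aggregates.create("hyperv-pool", "nova")
nova.aggregates.set_metadata(aggregate, {"hypervisor": "hyperv"})
nova.aggregates.add_host(aggregate, "compute-hv-01")

flavor = nova.flavors.create("m1.small.hyperv", 2048, 1, 20)
flavor.set_keys({"aggregate_instance_extra_specs:hypervisor": "hyperv"})</programlisting>
  </section>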
  <section xml:id="openstack-components-tech-considerations">
    <title>OpenStack components</title>
    <para>A general purpose OpenStack cloud design should
      incorporate the core OpenStack services to provide a wide
      range of services to end-users. The OpenStack core services
      recommended in a general purpose cloud are:</para>
    <itemizedlist>
      <listitem>
        <para>OpenStack <glossterm>Compute</glossterm>
          (<glossterm>nova</glossterm>)</para>
      </listitem>
      <listitem>
        <para>OpenStack <glossterm>Networking</glossterm>
          (<glossterm>neutron</glossterm>)</para>
      </listitem>
      <listitem>
        <para>OpenStack <glossterm>Image Service</glossterm>
          (<glossterm>glance</glossterm>)</para>
      </listitem>
      <listitem>
        <para>OpenStack <glossterm>Identity</glossterm>
          (<glossterm>keystone</glossterm>)</para>
      </listitem>
      <listitem>
        <para>OpenStack <glossterm>dashboard</glossterm>
          (<glossterm>horizon</glossterm>)</para>
      </listitem>
      <listitem>
        <para><glossterm>Telemetry</glossterm> module
          (<glossterm>ceilometer</glossterm>)</para>
      </listitem>
    </itemizedlist>
    <para>A general purpose cloud may also include OpenStack
      <glossterm>Object Storage</glossterm>
      (<glossterm>swift</glossterm>). OpenStack
      <glossterm>Block Storage</glossterm>
      (<glossterm>cinder</glossterm>) may be selected to provide
      persistent storage to applications and instances although,
      depending on the use case, this could be optional.</para>
  </section>
  <section xml:id="supplemental-software-tech-considerations">
    <title>Supplemental software</title>
    <para>A general purpose OpenStack deployment consists of more
      than just OpenStack-specific components. A typical deployment
      involves services that provide supporting functionality,
      including databases and message queues, and may also involve
      software to provide high availability of the OpenStack
      environment. Design decisions around the underlying message
      queue might affect the required number of controller services,
      as well as the technology to provide highly resilient database
      functionality, such as MariaDB with Galera. In such a
      scenario, replication of services relies on quorum. Therefore,
      the underlying database nodes, for example, should consist of
      at least three nodes to account for the recovery of a failed
      Galera node. When increasing the number of nodes to support a
      feature of the software, consideration of rack space and
      switch port density becomes important.</para>
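    <para>The quorum arithmetic behind the three-node recommendation
      can be sketched in a few lines; the figures below are simple
      majority counts, not output from any OpenStack tool:</para>
    <programlisting language="python"># Why two nodes cannot survive a failure but three (or five) can:
# a Galera cluster stays writable only while a strict majority of
# nodes remain in contact.
def quorum(cluster_size):
    return cluster_size // 2 + 1

for nodes in (2, 3, 5):
    tolerated = nodes - quorum(nodes)
    print("%d nodes: quorum=%d, tolerated failures=%d"
          % (nodes, quorum(nodes), tolerated))
# 2 nodes: quorum=2, tolerated failures=0
# 3 nodes: quorum=2, tolerated failures=1
# 5 nodes: quorum=3, tolerated failures=2</programlisting>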
    <para>Where many general purpose deployments use hardware load
      balancers to provide highly available API access and SSL
      termination, software solutions, for example HAProxy, can
      also be considered. It is vital to ensure that such software
      implementations are also made highly available. This high
      availability can be achieved by using software such as
      Keepalived or Pacemaker with Corosync. Pacemaker and Corosync
      can provide active-active or active-passive highly available
      configuration depending on the specific service in the
      OpenStack environment. Using this software can affect the
      design as it assumes at least a 2-node controller
      infrastructure where one of those nodes may be running certain
      services in standby mode.</para>
    <para>Memcached is a distributed memory object caching system,
      and Redis is a key-value store. Both are usually deployed on
      general purpose clouds to assist in alleviating load to the
      Identity service. The memcached service caches tokens, and due
      to its distributed nature it can help alleviate some
      bottlenecks to the underlying authentication system. Using
      memcached or Redis does not affect the overall design of your
      architecture as they tend to be deployed onto the
      infrastructure nodes providing the OpenStack services.</para>
  </section>
  <section xml:id="performance-tech-considerations">
    <title>Performance</title>
    <para>Performance of an OpenStack deployment is dependent on a
      number of factors related to the infrastructure and controller
      services. The user requirements can be split into general
      network performance, performance of compute resources, and
      performance of storage systems.</para>
  </section>
  <section xml:id="controller-infrastructure-tech-considerations">
    <title>Controller infrastructure</title>
    <para>The Controller infrastructure nodes provide management
      services to the end-user as well as providing services
      internally for the operation of the cloud. The Controllers
      typically run message queuing services that carry system
      messages between each service. Performance issues related to
      the message bus would lead to delays in delivering messages to
      their destinations. The result of this condition would be
      delays in operational functions such as spinning up and
      deleting instances, provisioning new storage volumes, and
      managing network resources. Such delays could adversely affect
      an application's ability to react to certain conditions,
      especially when using auto-scaling features. It is important
      to properly design the hardware used to run the controller
      infrastructure as outlined above in the Hardware Selection
      section.</para>
    <para>Performance of the controller services is not just limited
      to processing power; restrictions may emerge in serving
      concurrent users. Ensure that the APIs and Horizon services
      are load tested to ensure that you are able to serve your
      customers. Particular attention should be paid to the
      OpenStack Identity Service (keystone), which provides the
      authentication and authorization for all services, both
      internally to OpenStack itself and to end-users. This service
      can lead to a degradation of overall performance if it is not
      sized appropriately.</para>
  </section>
  <section xml:id="network-performance-tech-considerations">
    <title>Network performance</title>
    <para>In a general purpose OpenStack cloud, the requirements of
      the network help determine its performance capabilities. For
      example, small deployments may employ 1 Gigabit Ethernet (GbE)
      networking, whereas larger installations serving multiple
      departments or many users would be better architected with
      10 GbE networking. The performance of the running instances
      will be limited by these speeds. It is possible to design
      OpenStack environments that run a mix of networking
      capabilities. By utilizing the different interface speeds, the
      users of the OpenStack environment can choose networks that
      are fit for their purpose. For example, web application
      instances may run on a public network presented through
      OpenStack Networking that has 1 GbE capability, whereas the
      back-end database uses an OpenStack Networking network that
      has 10 GbE capability to replicate its data or, in some cases,
      the design may incorporate link aggregation for greater
      throughput.</para>
    <para>Network performance can be boosted considerably by
      implementing hardware load balancers to provide front-end
      service to the cloud APIs. The hardware load balancers also
      perform SSL termination if that is a requirement of your
      environment. When implementing SSL offloading, it is important
      to understand the SSL offloading capabilities of the devices
      selected.</para>
  </section>
  <section xml:id="compute-host-tech-considerations">
    <title>Compute host</title>
    <para>The choice of hardware specifications used in compute
      nodes including CPU, memory, and disk type directly affects
      the performance of the instances. Other factors which can
      directly affect performance include tunable parameters within
      the OpenStack services, for example the overcommit ratio
      applied to resources. The defaults in OpenStack Compute set a
      16:1 over-commit of the CPU and a 1.5:1 over-commit of the
      memory. Running at such high ratios leads to an increase in
      "noisy-neighbor" activity. Care must be taken when sizing your
      Compute environment to avoid this scenario. For running
      general purpose OpenStack environments it is possible to keep
      to the defaults, but make sure to monitor your environment as
      usage increases.</para>
  </section>
  <section xml:id="storage-performance-tech-considerations">
    <title>Storage performance</title>
    <para>When considering performance of OpenStack Block Storage,
      hardware and architecture choice is important. Block Storage
      can use enterprise back-end systems such as NetApp or EMC,
      use scale-out storage such as GlusterFS and Ceph, or simply
      use the capabilities of directly attached storage in the
      nodes themselves. Block Storage may be deployed so that
      traffic traverses the host network, which could affect, and
      be adversely affected by, the front-side API traffic
      performance. As such, consider using a dedicated data storage
      network with dedicated interfaces on the Controller and
      Compute hosts.</para>
    <para>When considering performance of OpenStack Object Storage,
      a number of design choices will affect performance. A user's
      access to the Object Storage is through the proxy services,
      which typically sit behind hardware load balancers. By the
      very nature of a highly resilient storage system, replication
      of the data would affect performance of the overall system.
      In this case, 10 GbE (or better) networking is recommended
      throughout the storage network architecture.</para>
  </section>
  <section xml:id="availability-tech-considerations">
    <title>Availability</title>
    <para>In OpenStack, the infrastructure is integral to providing
      services and should always be available, especially when
      operating with SLAs. Ensuring network availability is
      accomplished by designing the network architecture so that no
      single point of failure exists. The number of switches,
      routes, and redundant power supplies should be factored into
      core infrastructure, as well as the associated bonding of
      networks to provide diverse routes to your highly available
      switch infrastructure.</para>
    <para>The OpenStack services themselves should be deployed
      across multiple servers that do not represent a single point
      of failure. Ensuring API availability can be achieved by
      placing these services behind highly available load balancers
      that have multiple OpenStack servers as members.</para>
    <para>OpenStack lends itself to deployment in a highly available
      manner where it is expected that at least two servers be
      utilized. These can run all of the services involved, from the
      message queuing service (for example, RabbitMQ or QPID) to an
      appropriately deployed database service (such as MySQL or
      MariaDB). As services in the cloud are scaled out, back-end
      services will need to scale too. Monitoring and reporting on
      server utilization and response times, as well as load testing
      your systems, will help determine scale out decisions.</para>
    <para>Care must be taken when deciding network functionality.
      Currently, OpenStack supports both the legacy networking
      (nova-network) system and the newer, extensible OpenStack
      Networking. Both have their pros and cons when it comes to
      providing highly available access. Legacy networking, which
      provides networking access maintained in the OpenStack Compute
      code, provides a feature that removes a single point of
      failure when it comes to routing, and this feature is
      currently missing in OpenStack Networking. Legacy networking's
      multi-host functionality restricts failure domains to the host
      running that instance.</para>
    <para>On the other hand, when using OpenStack Networking, the
      OpenStack controller servers or separate Networking hosts
      handle routing. For a deployment that requires features
      available in only Networking, it is possible to remove this
      restriction by using third party software that helps maintain
      highly available L3 routes. Doing so allows for common APIs to
      control network hardware, or to provide complex multi-tier web
      applications in a secure manner. It is also possible to
      completely remove routing from Networking, and instead rely on
      hardware routing capabilities. In this case, the switching
      infrastructure must support L3 routing.</para>
    <para>OpenStack Networking (neutron) and legacy networking
      (nova-network) both have their advantages and disadvantages.
      They are both valid and supported options that fit different
      network deployment models described in the
      <citetitle><link
      xlink:href="http://docs.openstack.org/openstack-ops/content/network_design.html#network_deployment_options"
      >OpenStack Operations Guide</link></citetitle>.</para>
    <para>Ensure your deployment has adequate back-up capabilities.
      As an example, in a deployment that has two infrastructure
      controller nodes, the design should include controller
      availability. In the event of the loss of a single controller,
      cloud services will run from the remaining controller. Where
      the design has higher availability requirements, it is
      important to meet those requirements by designing the proper
      redundancy and availability of controller nodes.</para>
    <para>Application design must also be factored into the
      capabilities of the underlying cloud infrastructure. If the
      compute hosts do not provide a seamless live migration
      capability, then it must be expected that when a compute host
      fails, that instance and any data local to that instance will
      be deleted. Conversely, when providing an expectation to users
      that instances have a high-level of uptime guarantees, the
      infrastructure must be deployed in a way that eliminates any
      single point of failure when a compute host disappears. This
      may include utilizing shared file systems on enterprise
      storage or OpenStack Block Storage to provide a level of
      guarantee to match service features.</para>
    <para>For more information on high availability in OpenStack,
      see the <link
      xlink:href="http://docs.openstack.org/high-availability-guide"><citetitle>OpenStack
      High Availability Guide</citetitle></link>.</para>
  </section>
  <section xml:id="security-tech-considerations">
    <title>Security</title>
    <para>A security domain comprises users, applications, servers,
      or networks that share common trust requirements and
      expectations within a system. Typically they have the same
      authentication and authorization requirements and
      users.</para>
    <para>These security domains are:</para>
    <itemizedlist>
      <listitem>
        <para>Public</para>
      </listitem>
      <listitem>
        <para>Guest</para>
      </listitem>
      <listitem>
        <para>Management</para>
      </listitem>
      <listitem>
        <para>Data</para>
      </listitem>
    </itemizedlist>
    <para>These security domains can be mapped to an OpenStack
      deployment individually, or combined. For example, some
      deployment topologies combine both guest and data domains onto
      one physical network, whereas in other cases these networks
      are physically separated. In each case, the cloud operator
      should be aware of the appropriate security concerns. Security
      domains should be mapped out against your specific OpenStack
      deployment topology. The domains and their trust requirements
      depend upon whether the cloud instance is public, private, or
      hybrid.</para>
    <para>The public security domain is an entirely untrusted area
      of the cloud infrastructure. It can refer to the Internet as a
      whole or simply to networks over which you have no authority.
      This domain should always be considered untrusted.</para>
    <para>Typically used for compute instance-to-instance traffic,
      the guest security domain handles compute data generated by
      instances on the cloud but not services that support the
      operation of the cloud, such as API calls. Public cloud
      providers and private cloud providers who do not have
      stringent controls on instance use or who allow unrestricted
      Internet access to instances should consider this domain to
      be untrusted. Private cloud providers may want to consider
      this network as internal and therefore trusted only if they
      have controls in place to assert that they trust instances
      and all their tenants.</para>
    <para>The management security domain is where services interact.
      Sometimes referred to as the "control plane", the networks in
      this domain transport confidential data such as configuration
      parameters, user names, and passwords. In most deployments
      this domain is considered trusted.</para>
    <para>The data security domain is concerned primarily with
      information pertaining to the storage services within
      OpenStack. Much of the data that crosses this network has high
      integrity and confidentiality requirements and, depending on
      the type of deployment, may also have strong availability
      requirements. The trust level of this network is heavily
      dependent on other deployment decisions.</para>
    <para>When deploying OpenStack in an enterprise as a private
      cloud, it is usually behind the firewall and within the
      trusted network alongside existing systems. Users of the cloud
      are, traditionally, employees that are bound by the security
      requirements set forth by the company. This tends to push most
      of the security domains towards a more trusted model. However,
      when deploying OpenStack in a public facing role, no
      assumptions can be made and the attack vectors significantly
      increase. For example, the API endpoints, along with the
      software behind them, become vulnerable to bad actors wanting
      to gain unauthorized access or prevent access to services,
      which could lead to loss of data, functionality, and
      reputation. These services must be protected against such
      attacks through auditing and appropriate filtering.</para>
    <para>Consideration must be taken when managing the users of the
      system for both public and private clouds. The Identity
      service allows for LDAP to be part of the authentication
      process. Including such systems in an OpenStack deployment may
      ease user management if integrating into existing
      systems.</para>
    <para>It's important to understand that user authentication
      requests include sensitive information including user names,
      passwords, and authentication tokens. For this reason, placing
      the API services behind hardware that performs SSL termination
      is strongly recommended.</para>
    <para>For more information on OpenStack security, see the <link
      xlink:href="http://docs.openstack.org/security-guide/"><citetitle>OpenStack
      Security Guide</citetitle></link>.</para>
  </section>
</section>