<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section [
<!ENTITY % openstack SYSTEM "../../common/entities/openstack.ent">
%openstack;
]>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="technical-considerations-general-purpose">
<?dbhtml stop-chunking?>
<title>Technical considerations</title>
<para>General purpose clouds are often expected to
include these base services:</para>
<itemizedlist>
<listitem>
<para>
Compute
</para>
</listitem>
<listitem>
<para>
Network
</para>
</listitem>
<listitem>
<para>
Storage
</para>
</listitem>
</itemizedlist>
<para>Each of these services has different resource requirements.
As a result, you must make design decisions relating directly
to the service, as well as provide a balanced infrastructure for all services.</para>
<para>Take into consideration the unique aspects of each service, as individual characteristics and
service mass can impact the hardware selection process. Create a hardware design
for each of the services.</para>
<para>Hardware decisions are also made in relation to network architecture
and facilities planning. These factors play heavily into
the overall architecture of an OpenStack cloud.</para>
<section xml:id="designing-compute-resources-tech-considerations">
<title>Designing compute resources</title>
<para>When designing compute resource pools, a number of factors
can impact your design decisions.
For example, decisions related to processors, memory, and
storage within each hypervisor are just one element of designing
compute resources. In addition, decide whether to provide compute
resources in a single pool or in multiple pools.
We recommend a compute design that allocates multiple pools of resources to
be addressed on demand.</para>
<para>A compute design that allocates multiple pools of resources makes best
use of application resources running in the cloud.
Each independent resource pool should be designed to provide service for specific
flavors of instances or groupings of flavors.
Designing multiple resource pools helps to ensure that, as instances are
scheduled onto compute hypervisors, each independent node's
resources are allocated to make the most efficient use of available
hardware. This is commonly referred to as bin packing.</para>
<para>Using a consistent hardware design among the nodes that are
placed within a resource pool also helps support bin packing.
Hardware nodes selected to be part of a compute resource
pool should share a common processor, memory, and storage
layout. Choosing a common hardware design makes it
easier to deploy, support, and maintain those nodes throughout
their life cycle in the cloud.</para>
<para>An <firstterm>overcommit ratio</firstterm> is the ratio of available
virtual resources to available physical resources.
OpenStack lets you configure the overcommit ratio for CPU and memory.
The default CPU overcommit ratio is 16:1 and the default memory
overcommit ratio is 1.5:1. Determine the overcommit
ratios for both during the design phase, because they have
a direct impact on the hardware layout of your compute nodes.</para>
<para>For example, consider an <literal>m1.small</literal> instance that uses 1
vCPU, 20 GB of ephemeral storage, and 2,048 MB of RAM. When
designing a hardware node as a compute resource pool to
service instances, take into consideration the number of
processor cores available on the node as well as the required
disk and memory to service instances running at capacity. For
a server with 2 CPUs of 10 cores each, with hyperthreading
turned on, the default CPU overcommit ratio of 16:1 would
allow for 640 (2 × 10 × 2 × 16) total <literal>m1.small</literal> instances. By
the same reasoning, using the default memory overcommit ratio
of 1.5:1 you can determine that the server will need at least
853 GB (640 × 2,048 MB / 1.5) of RAM. When sizing nodes for
memory, it is also important to consider the additional memory
required to service operating system and service needs.</para>
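<para>The sizing arithmetic in this example can be sketched in a few lines
of Python. This is a minimal illustration, not part of OpenStack: the
function names are hypothetical, and the sketch assumes two hardware threads
per core when hyperthreading is enabled and a flavor of 1 vCPU and
2,048 MB of RAM.</para>
<programlisting language="python"><![CDATA[
```python
# Hypothetical sizing helpers; the names are illustrative, not OpenStack APIs.

def max_instances(sockets, cores_per_socket, threads_per_core,
                  cpu_ratio, vcpus_per_instance=1):
    """Instances that fit on one node when only vCPUs are considered."""
    logical_cpus = sockets * cores_per_socket * threads_per_core
    return int(logical_cpus * cpu_ratio) // vcpus_per_instance


def required_ram_mb(instances, ram_mb_per_instance, ram_ratio):
    """Physical RAM (MB) needed to back the instances at the given ratio."""
    return instances * ram_mb_per_instance / ram_ratio


# Worked example from the text: 2 sockets x 10 cores, hyperthreading on.
instances = max_instances(2, 10, 2, cpu_ratio=16.0)
ram_mb = required_ram_mb(instances, 2048, ram_ratio=1.5)
print(instances)             # 640
print(round(ram_mb / 1024))  # 853 (GB, before OS and service overhead)
```
]]></programlisting>
<para>Lowering either ratio in the sketch shows how quickly the instance
count or the memory requirement per node changes, which is why the ratios
are best settled during the design phase.</para>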
<para>Processor selection is an extremely important consideration
in hardware design, especially when comparing the features and
performance characteristics of different processors. Processors
can include features specific to virtualized compute hosts, including
hardware-assisted virtualization and technology related to memory paging (also
known as EPT shadowing). These types of features can have a significant impact on the
performance of virtual machines running in the cloud.</para>
<para>It is also important to consider the compute requirements of
resource nodes within the cloud. Resource nodes refer to
non-hypervisor nodes providing the following in the cloud:</para>
<itemizedlist>
<listitem>
<para>
Controller
</para>
</listitem>
<listitem>
<para>
Object storage
</para>
</listitem>
<listitem>
<para>
Block storage
</para>
</listitem>
<listitem>
<para>
Networking services
</para>
</listitem>
</itemizedlist>
<para>The number of processor cores and threads directly affects the
number of worker threads that can run on a resource node.
As a result, you must make design decisions relating directly to
the service, as well as provide a balanced infrastructure for all services.</para>
<para>Workload profiles are unpredictable in a general purpose
cloud. Additional compute resource pools can be added to the cloud
later, reducing the stress of unpredictability. In some cases, the demand on certain
instance types or flavors may not justify individual
hardware design. In either of these cases, initiate the design by
allocating hardware designs that are capable of servicing the most
common instance requests. You can add further
hardware designs to the overall architecture at
a later time.</para>
</section>
<section xml:id="designing-network-resources-tech-considerations">
<title>Designing network resources</title>
<para>OpenStack clouds traditionally have multiple network
segments, each of which provides access to resources within
the cloud to both operators and tenants. The network services
themselves also require network communication paths which should
be separated from the other networks. When designing network services
for a general purpose cloud, we recommend planning for a physical or
logical separation of network segments that will be used by operators
and tenants. We further suggest the creation of an additional network
segment for access to internal services such as the message bus and
database used by the various cloud services.
Segregating these services onto separate networks helps to protect
sensitive data and protects against unauthorized access to services.</para>
<para>Based on the requirements of instances being serviced in the cloud,
the choice of network service will be the next decision that affects
your design architecture.</para>
<para>The choice between legacy networking (nova-network), as a
part of OpenStack Compute, and OpenStack Networking
(neutron), has a huge impact on the architecture and design of the cloud
network infrastructure.</para>
<variablelist>
<varlistentry>
<term>Legacy networking (nova-network)</term>
<listitem>
<para>The legacy networking (nova-network) service is primarily a
layer-2 networking service that functions in two modes. In
legacy networking, the two modes differ in their use of VLANs.
When using legacy networking in a flat network mode, all network
hardware nodes and devices throughout the cloud are connected to
a single layer-2 network segment that provides access to
application data.</para>
<para>When the network devices in the cloud support segmentation
using VLANs, legacy networking can operate in the second mode. In
this design model, each tenant within the cloud is assigned a
network subnet which is mapped to a VLAN on the physical
network. It is especially important to remember that a maximum
of 4096 VLANs can be used within a spanning tree
domain. This limit places a hard cap on the amount of
growth possible within the data center. When designing a
general purpose cloud intended to support multiple tenants, we
recommend the use of legacy networking with VLANs, and
not in flat network mode.</para>
</listitem>
</varlistentry>
</variablelist>
<para>Another networking consideration is that
legacy networking is entirely managed by the cloud operator;
tenants do not have control over network resources. If tenants
require the ability to manage and create network resources
such as network segments and subnets, it will be necessary to
install the OpenStack Networking service to provide network
access to instances.</para>
<variablelist>
<varlistentry>
<term>OpenStack Networking (neutron)</term>
<listitem>
<para>OpenStack Networking (neutron) is a first-class networking
service that gives tenants full control over the creation of virtual
network resources. This is often accomplished with
tunneling protocols that establish
encapsulated communication paths over existing network
infrastructure in order to segment tenant traffic. These
methods vary depending on the specific implementation, but
some of the more common methods include tunneling over GRE,
encapsulating with VXLAN, and VLAN tags.</para>
</listitem>
</varlistentry>
</variablelist>
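<para>The difference in scale between VLAN-based and tunnel-based
segmentation can be illustrated with a short calculation. This sketch is
not taken from OpenStack: the function names are illustrative, and the
field widths (a 12-bit VLAN ID in IEEE 802.1Q and a 24-bit network
identifier in VXLAN) come from those specifications rather than from this
guide.</para>
<programlisting language="python"><![CDATA[
```python
# Segment ID field widths: the 802.1Q VLAN ID is 12 bits wide,
# the VXLAN network identifier (VNI) is 24 bits wide.
VLAN_ID_BITS = 12
VXLAN_VNI_BITS = 24


def id_space(bits):
    """Number of distinct identifiers a field of the given width can carry."""
    return 2 ** bits


def usable_vlans():
    """VLAN IDs 0 and 4095 are reserved by IEEE 802.1Q."""
    return id_space(VLAN_ID_BITS) - 2


print(id_space(VLAN_ID_BITS))    # 4096
print(usable_vlans())            # 4094
print(id_space(VXLAN_VNI_BITS))  # 16777216
```
]]></programlisting>
<para>If each tenant network consumes one segment ID, the VLAN figure is the
ceiling on tenant networks in a single layer-2 domain, whereas an overlay
such as VXLAN leaves far more headroom.</para>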
<para>We suggest starting with at least three network
segments. The first is used for access to the
cloud's REST APIs by tenants and operators. This is
referred to as a public network. In most cases, the controller
nodes and swift proxies within the cloud will be the only
devices necessary to connect to this network segment. In some
cases, this network might also be serviced by hardware load
balancers and other network devices.</para>
<para>The next segment is used by cloud administrators to manage
hardware resources and is also used by configuration
management tools when deploying software and services onto new
hardware. In some cases, this network segment might also be
used for internal services, including the message bus and
database services, to communicate with each other. Because of the
sensitive nature of this network segment, it may be
desirable to secure this network from unauthorized access.
This network will likely need to communicate with every
hardware node within the cloud.</para>
<para>The last network segment is used by applications and
consumers to provide access to the physical network and also
for users accessing applications running within the cloud.
This network is generally segregated from the one used to
access the cloud APIs and is not capable of communicating
directly with the hardware resources in the cloud. Compute
resource nodes will need to communicate on this network
segment, as will any network gateway services which allow
application data to access the physical network outside of the
cloud.</para>
</section>
<section xml:id="designing-storage-resources-tech-considerations">
<title>Designing storage resources</title>
<para>OpenStack has two independent storage services to consider,
each with its own specific design requirements and goals. In
addition to services which provide storage as their primary
function, there are additional design considerations with
regard to compute and controller nodes which will affect the
overall cloud architecture.</para> <!--Could there be room for more tech content here? A.S-->
</section>
<section xml:id="designing-openstack-object-storage-tech-considerations">
<title>Designing OpenStack Object Storage</title>
<para>When designing hardware resources for OpenStack Object
Storage, the primary goal is to maximize the amount of storage
in each resource node while also ensuring that the cost per
terabyte is kept to a minimum. This often involves utilizing
servers which can hold a large number of spinning disks.
Whether choosing to use 2U server form factors with directly
attached storage or an external chassis that holds a larger
number of drives, the main goal is to maximize the storage
available in each node.</para>
<para>We do not recommend investing in enterprise class drives
for an OpenStack Object Storage cluster. The consistency and
partition tolerance characteristics of OpenStack Object
Storage will ensure that data stays up to date and survives
hardware faults without the use of any specialized data
replication devices.</para>
<para>One of the benefits of OpenStack Object Storage is the ability
to mix and match drives by making use of weighting within the
swift ring. When designing your swift storage cluster, we
recommend making use of the most cost-effective storage
solution available at the time. Many server chassis on the
market can hold 60 or more drives in 4U of rack space;
therefore, we recommend maximizing the amount of storage
per rack unit at the best cost per terabyte. Furthermore, we
do not recommend the use of RAID controllers in an object storage
node.</para>
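<para>The effect of ring weighting on mixed drive sizes can be approximated
as a proportional split. The following Python sketch only illustrates that
idea: it does not call or reproduce the swift ring builder, and the device
names, weights, and partition count are hypothetical.</para>
<programlisting language="python"><![CDATA[
```python
# Illustrative only: approximates how weights divide partitions among
# drives. It does not call or reproduce the swift ring builder.

def partition_shares(weights, partitions):
    """Split `partitions` across devices in proportion to their weights."""
    total = sum(weights.values())
    return {dev: partitions * w / total for dev, w in weights.items()}


# Hypothetical devices, weighted by capacity in TB.
weights = {"d1-2tb": 2.0, "d2-4tb": 4.0, "d3-2tb": 2.0}
shares = partition_shares(weights, partitions=1024)
print(shares)  # {'d1-2tb': 256.0, 'd2-4tb': 512.0, 'd3-2tb': 256.0}
```
]]></programlisting>
<para>Weighting each drive by its capacity, as assumed here, lets a larger
drive take a proportionally larger share of data, so newer and older drives
can coexist in one cluster.</para>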
<para>To achieve durability and availability of data stored as objects,
design object storage resource pools so that they can
provide the suggested availability. When designing beyond the hardware
node level, consider rack-level and zone-level
designs that accommodate the number of replicas configured to be stored in the
Object Storage service (the default number of replicas is three).
Each replica of
data should exist in its own availability zone with its own
power, cooling, and network resources available to service
that specific zone.</para>
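<para>Replication also determines how much raw disk space is needed and how
many zones a design requires. The following Python sketch shows the
arithmetic under stated assumptions: the helper names are hypothetical, the
4 TB drive size is an example value, and the replica count of three is the
default mentioned in the text.</para>
<programlisting language="python"><![CDATA[
```python
# Hypothetical planning helpers; not part of OpenStack Object Storage.

def usable_capacity_tb(raw_tb, replicas=3):
    """Capacity left for unique data when each object is stored `replicas` times."""
    return raw_tb / replicas


def zones_cover_replicas(zone_count, replicas=3):
    """True when each replica can be placed in its own zone."""
    return zone_count >= replicas


# Example: three zones, each with one 60-drive chassis of assumed 4 TB drives.
raw_tb = 3 * 60 * 4
print(usable_capacity_tb(raw_tb))  # 240.0
print(zones_cover_replicas(3))     # True
print(zones_cover_replicas(2))     # False
```
]]></programlisting>
<para>With fewer zones than replicas, at least two copies of an object share
the same power, cooling, or network failure domain, which is the situation
the zone-level design is meant to avoid.</para>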
<para>Object storage nodes should be designed so that the number
of requests does not hinder the performance of the cluster.
The object storage service uses a chatty protocol;
making use of multiple processors that have higher core counts
helps ensure that I/O requests do not inundate the server.</para>
</section>
<section xml:id="designing-openstack-block-storage">
<title>Designing OpenStack Block Storage</title>
<para>When designing OpenStack Block Storage resource nodes, it is
helpful to understand the workloads and requirements that will
drive the use of block storage in the cloud. We recommend designing
block storage pools so that tenants can choose appropriate storage
solutions for their applications. By creating multiple storage pools of different
types, in conjunction with configuring an advanced storage
scheduler for the block storage service, it is possible to
provide tenants with a large catalog of storage services with
a variety of performance levels and redundancy options.</para>
<para>Block storage also takes advantage of a number of enterprise storage
solutions. These are addressed via a plug-in driver developed by the
hardware vendor. A large number of
enterprise storage plug-in drivers ship out-of-the-box with
OpenStack Block Storage (and many more are available via third
party channels). General purpose clouds are more likely to use
directly attached storage in the majority of block storage nodes,
but it may also be necessary to provide additional levels of service to tenants
which can only be provided by enterprise class storage solutions.</para>
<para>Redundancy and availability requirements impact the decision to use
a RAID controller card in block storage nodes. The input/output operations per second (IOPS)
demand of your application will influence whether or not you should use a RAID
controller, and which level of RAID is required.
Making use of higher performing RAID volumes is suggested when
considering performance. However, where redundancy of
block storage volumes is more important, we recommend
making use of a redundant RAID configuration such as RAID 5 or
RAID 6. Some specialized features, such as automated
replication of block storage volumes, may require the use of
third-party plug-ins and enterprise block storage solutions in
order to meet the high demand on storage. Furthermore,
where extreme performance is a requirement, it may also be
necessary to make use of high speed SSD disk drives or other high
performing flash storage solutions.</para>
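<para>The capacity cost of redundancy can be compared with a small
calculation. This Python sketch is illustrative only: the helper names, the
eight-drive array, and the 2 TB drive size are assumptions, and RAID 0 is
included purely as a no-redundancy baseline rather than as a recommendation
from this guide.</para>
<programlisting language="python"><![CDATA[
```python
# Illustrative comparison of RAID levels; drive count and size are assumptions.
PARITY_DRIVES = {"RAID 0": 0, "RAID 5": 1, "RAID 6": 2}


def usable_tb(level, drives, drive_tb):
    """Usable capacity: total capacity minus the parity overhead of the level."""
    return (drives - PARITY_DRIVES[level]) * drive_tb


def tolerated_drive_failures(level):
    """Drive failures a single array of this level survives without data loss."""
    return PARITY_DRIVES[level]


for level in PARITY_DRIVES:
    print(level, usable_tb(level, drives=8, drive_tb=2),
          tolerated_drive_failures(level))
# RAID 0 16 0
# RAID 5 14 1
# RAID 6 12 2
```
]]></programlisting>
<para>The sketch covers capacity and fault tolerance only. Write performance
also differs between levels and should be measured against the IOPS demand
of the application.</para>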
</section>
<section xml:id="software-selection-tech-considerations">
<title>Software selection</title>
<para>The software selection process plays a large role in the
architecture of a general purpose cloud. The following have
a large impact on the design of the cloud:</para>
<itemizedlist>
<listitem>
<para>
Choice of operating system
</para>
</listitem>
<listitem>
<para>
Selection of OpenStack software components
</para>
</listitem>
<listitem>
<para>
Choice of hypervisor
</para>
</listitem>
<listitem>
<para>
Selection of supplemental software
</para>
</listitem>
</itemizedlist>
<para>Operating system (OS) selection plays a large role in the
design and architecture of a cloud. There are a number of OSes
which have native support for OpenStack including:</para>
<itemizedlist>
<listitem>
<para>
Ubuntu
</para>
</listitem>
<listitem>
<para>
Red Hat Enterprise Linux (RHEL)
</para>
</listitem>
<listitem>
<para>
CentOS
</para>
</listitem>
<listitem>
<para>
SUSE Linux Enterprise Server (SLES)
</para>
</listitem>
</itemizedlist>
<note>
<para>Native support is not a constraint on the choice of OS; users are
free to choose just about any Linux distribution (or even
Microsoft Windows) and install OpenStack directly from source
(or compile their own packages). However, many organizations will
prefer to install OpenStack from distribution-supplied packages or
repositories (although using the distribution vendor's OpenStack
packages might be a requirement for support).
</para>
</note>
<para>OS selection also directly influences hypervisor selection.
A cloud architect who selects Ubuntu, RHEL, or SLES has some
flexibility in hypervisor; KVM, Xen, and LXC are supported
virtualization methods available under OpenStack Compute
(nova) on these Linux distributions. However, a cloud architect
who selects Hyper-V is limited to Windows Server. Similarly, a
cloud architect who selects XenServer is limited to the CentOS-based
dom0 operating system provided with XenServer.</para>
<para>The primary factors that play into OS-hypervisor selection
include:</para>
<variablelist>
<varlistentry>
<term>User requirements</term>
<listitem>
<para>The selection of OS-hypervisor
combination first and foremost needs to support the
user requirements.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Support</term>
<listitem>
<para>The selected OS-hypervisor combination
needs to be supported by OpenStack.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Interoperability</term>
<listitem>
<para>The OS-hypervisor needs to be
interoperable with other features and services in the
OpenStack design in order to meet the user
requirements.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
<section xml:id="hypervisor-tech-considerations">
<title>Hypervisor</title>
<para>OpenStack supports a wide variety of hypervisors, one or
more of which can be used in a single cloud. These hypervisors
include:</para>
<itemizedlist>
<listitem>
<para>KVM (and QEMU)</para>
</listitem>
<listitem>
<para>XCP/XenServer</para>
</listitem>
<listitem>
<para>vSphere (vCenter and ESXi)</para>
</listitem>
<listitem>
<para>Hyper-V</para>
</listitem>
<listitem>
<para>LXC</para>
</listitem>
<listitem>
<para>Docker</para>
</listitem>
<listitem>
<para>Bare-metal</para>
</listitem>
</itemizedlist>
<para>A complete list of supported hypervisors and their
capabilities can be found at
<link xlink:href="https://wiki.openstack.org/wiki/HypervisorSupportMatrix">OpenStack Hypervisor Support Matrix</link>.
</para>
<para>We recommend general purpose clouds use hypervisors that
support the most general purpose use cases, such as KVM and
Xen. More specific hypervisors should be chosen to account
for specific functionality or a supported feature requirement.
In some cases, there may also be a mandated
requirement to run software on a certified hypervisor,
including solutions from VMware, Microsoft, and Citrix.</para>
<para>The features offered through the OpenStack cloud platform
determine the best choice of a hypervisor. As an example, for
a general purpose cloud that predominantly supports a
Microsoft-based migration, or is managed by staff that has a
particular skill for managing certain hypervisors and
operating systems, Hyper-V would be the best available choice.
While the decision to use Hyper-V does not limit the ability
to run alternative operating systems, be mindful of those that
are deemed supported. Each hypervisor also has its
own hardware requirements, which may affect the decisions
around designing a general purpose cloud. For example,
VMware's live migration feature, vMotion,
requires an installation of vCenter/vSphere and the use of the
ESXi hypervisor, which increases the infrastructure
requirements.</para>
<para>In a mixed hypervisor environment, specific aggregates of
compute resources, each with defined capabilities, enable
workloads to utilize software and hardware specific to their
particular requirements. This functionality can be exposed
explicitly to the end user, or accessed through defined
metadata within a particular flavor of an instance.</para>
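<para>The idea of steering workloads through flavor metadata can be sketched
as a simple matching step. This Python example is a hypothetical
illustration of the concept, not the OpenStack Compute scheduler: the
aggregate names, metadata keys, and the <literal>matching_aggregates</literal>
function are all invented for the example.</para>
<programlisting language="python"><![CDATA[
```python
# Hypothetical sketch of matching flavor metadata to aggregate capabilities.
# It mirrors the idea only; it is not the OpenStack Compute scheduler.

def matching_aggregates(flavor_specs, aggregates):
    """Return names of aggregates whose metadata satisfies every flavor spec."""
    return [
        name for name, metadata in aggregates.items()
        if all(metadata.get(key) == value
               for key, value in flavor_specs.items())
    ]


aggregates = {
    "kvm-general": {"hypervisor": "kvm", "ssd": "false"},
    "kvm-fast-disk": {"hypervisor": "kvm", "ssd": "true"},
    "hyperv-windows": {"hypervisor": "hyperv", "ssd": "false"},
}
print(matching_aggregates({"hypervisor": "kvm", "ssd": "true"}, aggregates))
# ['kvm-fast-disk']
print(matching_aggregates({"hypervisor": "kvm"}, aggregates))
# ['kvm-general', 'kvm-fast-disk']
```
]]></programlisting>
<para>A flavor that names fewer requirements matches more aggregates, which
keeps general workloads flexible while specialized flavors land only on
hardware that advertises the needed capability.</para>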
</section>
<section xml:id="openstack-components-tech-considerations">
<title>OpenStack components</title>
<para>A general purpose OpenStack cloud design should incorporate
the core OpenStack services to provide a wide range of
services to end-users. The OpenStack core services recommended
in a general purpose cloud are:</para>
<itemizedlist>
<listitem>
<para>OpenStack <glossterm>Compute</glossterm>
(<glossterm>nova</glossterm>)</para>
</listitem>
<listitem>
<para>OpenStack <glossterm>Networking</glossterm>
(<glossterm>neutron</glossterm>)</para>
</listitem>
<listitem>
<para>OpenStack <glossterm>Image service</glossterm>
(<glossterm>glance</glossterm>)</para>
</listitem>
<listitem>
<para>OpenStack <glossterm>Identity</glossterm>
(<glossterm>keystone</glossterm>)</para>
</listitem>
<listitem>
<para>OpenStack <glossterm>dashboard</glossterm>
(<glossterm>horizon</glossterm>)</para>
</listitem>
<listitem>
<para><glossterm>Telemetry</glossterm> module
(<glossterm>ceilometer</glossterm>)</para>
</listitem>
</itemizedlist>
<para>A general purpose cloud may also include OpenStack
<glossterm>Object Storage</glossterm> (<glossterm>swift</glossterm>)
and OpenStack <glossterm>Block Storage</glossterm>
(<glossterm>cinder</glossterm>). These may be
selected to provide storage to applications and
instances.</para>
<note>
<para>However, depending on the use case, these could be
optional.</para>
</note>
</section>
<section xml:id="supplemental-software-tech-considerations">
<title>Supplemental software</title>
<para>A general purpose OpenStack deployment consists of more than
just OpenStack-specific components. A typical deployment
involves services that provide supporting functionality,
including databases and message queues, and may also involve
software to provide high availability of the OpenStack
environment. Design decisions around the underlying message
queue might affect the required number of controller services,
as well as the technology to provide highly resilient database
functionality, such as MariaDB with Galera. In such a
scenario, replication of services relies on quorum. Therefore,
the underlying database cluster, for example, should consist of
at least 3 nodes to account for the recovery of a failed
Galera node. When increasing the number of nodes to support a
feature of the software, consideration of rack space and
switch port density becomes important.</para>
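<para>The reason for the three-node minimum follows from majority-quorum
arithmetic. The following Python sketch is a generic illustration, not
Galera code: it assumes that every node carries equal weight and that a
strict majority is required, and the function names are hypothetical.</para>
<programlisting language="python"><![CDATA[
```python
# Majority-quorum arithmetic; a generic sketch, not Galera code.

def quorum_size(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """Nodes that can fail while the remainder still holds quorum."""
    return nodes - quorum_size(nodes)


for n in (2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 2 2 0
# 3 2 1
# 5 3 2
```
]]></programlisting>
<para>Under these assumptions a two-node cluster cannot lose a node without
losing quorum, while three nodes survive one failure, which is why odd
cluster sizes are the usual choice.</para>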
<para>While many general purpose deployments use hardware load
balancers to provide highly available API access and SSL
termination, software solutions, for example HAProxy, can also
be considered. It is vital to ensure that such software
implementations are also made highly available. High
availability can be achieved by using software such as
Keepalived or Pacemaker with Corosync. Pacemaker and Corosync
can provide active-active or active-passive highly available
configuration depending on the specific service in the
OpenStack environment. Using this software can affect the
design as it assumes at least a 2-node controller
infrastructure where one of those nodes may be running certain
services in standby mode.</para>
<para>Memcached is a distributed memory object caching system, and
Redis is a key-value store. Both are deployed on
general purpose clouds to assist in alleviating load to the
Identity service. The memcached service caches tokens, and due
to its distributed nature it can help alleviate some
bottlenecks to the underlying authentication system. Using
memcached or Redis does not affect the overall design of your
architecture as they tend to be deployed onto the
infrastructure nodes providing the OpenStack services.</para>
</section>
<section xml:id="performance-tech-considerations">
<title>Performance</title>
<para>Performance of an OpenStack deployment is dependent on a
number of factors related to the infrastructure and controller
services. The user requirements can be split into general
network performance, performance of compute resources, and
performance of storage systems.</para>
</section>
<section xml:id="controller-infrastructure-tech-considerations">
<title>Controller infrastructure</title>
<para>The Controller infrastructure nodes provide management
services to the end-user as well as providing services
internally for the operation of the cloud. The Controllers
run message queuing services that carry system
messages between each service. Performance issues related to
the message bus would lead to delays in delivering those messages
to their destinations. The result of this condition would be
delays in operation functions such as spinning up and deleting
instances, provisioning new storage volumes, and managing
network resources. Such delays could adversely affect an
application’s ability to react to certain conditions,
especially when using auto-scaling features. It is important
to properly design the hardware used to run the controller
infrastructure as outlined above in the Hardware Selection
section.</para>
<para>Performance of the controller services is not limited
to processing power; restrictions may also emerge in serving
concurrent users. Load test the APIs and Horizon services
to ensure that you are able to serve your
customers. Pay particular attention to the
OpenStack Identity Service (Keystone), which provides the
authentication and authorization for all services, both
internally to OpenStack itself and to end-users. This service
can degrade overall performance if it is
not sized appropriately.</para>
</section>
<section xml:id="network-performance-tech-considerations">
<title>Network performance</title>
<para>In a general purpose OpenStack cloud, the requirements of
the network help determine performance capabilities. For
example, small deployments may employ 1 Gigabit Ethernet (GbE)
networking, whereas larger installations serving multiple
departments or many users would be better architected with
10 GbE networking. The performance of the running instances will
be limited by these speeds. It is possible to design OpenStack
environments that run a mix of networking capabilities. By
utilizing the different interface speeds, the users of the
OpenStack environment can choose networks that are fit for
their purpose.</para>
<para>For example, web application instances may run
on a public network presented through OpenStack Networking
that has 1 GbE capability, whereas the back-end database uses
an OpenStack Networking network that has 10 GbE capability to
replicate its data. In some cases, the design may
incorporate link aggregation for greater throughput.</para>
<para>Network performance can be boosted considerably by
implementing hardware load balancers to provide front-end
service to the cloud APIs. The hardware load balancers also
perform SSL termination if that is a requirement of your
environment. When implementing SSL offloading, it is important
to understand the SSL offloading capabilities of the devices
selected.</para>
</section>
<section xml:id="compute-host-tech-considerations">
|
||
<title>Compute host</title>
|
||
<para>The choice of hardware specifications used in compute nodes,
including CPU, memory, and disk type, directly affects the
performance of the instances. Other factors that can directly
affect performance include tunable parameters within the
OpenStack services, for example the overcommit ratio applied
to resources. The defaults in OpenStack Compute set a 16:1
overcommit of the CPU and a 1.5:1 overcommit of the memory.
Running at such high ratios leads to an increase in
"noisy-neighbor" activity. Care must be taken when sizing your
Compute environment to avoid this scenario. For
general purpose OpenStack environments it is possible to keep
to the defaults, but make sure to monitor your environment as
usage increases.</para>
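<para>As a minimal sketch, the two ratios mentioned above correspond to
the <literal>cpu_allocation_ratio</literal> and
<literal>ram_allocation_ratio</literal> options in
<filename>nova.conf</filename>. The values shown are the defaults
stated in this section, not tuning advice, and the configuration
section that holds these options is an assumption here, so check the
configuration reference for your release. As an illustration, a
hypothetical host that exposes 24 cores and 128 GB of RAM could be
scheduled up to 384 vCPUs (24 x 16) and 192 GB of instance memory
(128 x 1.5) at these ratios.</para>
<programlisting language="ini"># nova.conf (sketch; option placement is an assumption)
[DEFAULT]
# Virtual CPUs that may be allocated per host CPU core
cpu_allocation_ratio = 16.0
# Virtual RAM that may be allocated per unit of physical RAM
ram_allocation_ratio = 1.5</programlisting>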
</section>

<section xml:id="storage-performance-tech-considerations">
<title>Storage performance</title>
<para>When considering performance of OpenStack Block Storage,
hardware and architecture choices are important. Block Storage
can use enterprise back-end systems such as NetApp or EMC,
scale-out storage such as GlusterFS and Ceph, or simply use
the capabilities of directly attached storage in the nodes
themselves. Block Storage may be deployed so that traffic
traverses the host network, which could affect, and be
adversely affected by, the front-side API traffic performance.
As such, consider using a dedicated data storage network with
dedicated interfaces on the Controller and Compute
hosts.</para>
<para>When considering performance of OpenStack Object Storage, a
number of design choices will affect performance. Users
access Object Storage through the proxy services,
which sit behind hardware load balancers. Because a
highly resilient storage system replicates data by design,
replication traffic affects the performance of the overall system.
For this reason, 10 GbE (or better) networking is recommended
throughout the storage network architecture.</para>
</section>

<section xml:id="availability-tech-considerations">
<title>Availability</title>
<para>In OpenStack, the infrastructure is integral to providing
services and should always be available, especially when
operating with SLAs. Ensuring network availability is
accomplished by designing the network architecture so that no
single point of failure exists. Factor the number
of switches, routes, and redundant power supplies into the
core infrastructure design, along with the
bonding of networks to provide diverse routes to your highly
available switch infrastructure.</para>
<para>The OpenStack services themselves should be deployed across
multiple servers so that no server represents a single point of
failure. You can ensure API availability by placing
these services behind highly available load balancers that
have multiple OpenStack servers as members.</para>
<para>OpenStack lends itself to deployment in a highly available
manner, where at least two servers are expected to be
used. These servers can run all of the services involved, from the
message queuing service, for example RabbitMQ or Qpid, to an
appropriately deployed database service such as MySQL or
MariaDB. As services in the cloud are scaled out, back-end
services will need to scale too. Monitoring and reporting on
server utilization and response times, as well as load testing
your systems, will help determine scale-out decisions.</para>
<para>Care must be taken when deciding network functionality.
Currently, OpenStack supports both the legacy networking (nova-network)
system and the newer, extensible OpenStack Networking (neutron). Both
have their pros and cons when it comes to providing highly
available access. Legacy networking, which provides networking
access maintained in the OpenStack Compute code, provides a
feature that removes a single point of failure when it comes
to routing, and this feature is currently missing in OpenStack
Networking. Legacy networking’s multi-host
functionality restricts the failure domain to the host running
the affected instance.</para>
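<para>For illustration only, multi-host mode in legacy networking has
typically been switched on through the <literal>multi_host</literal>
option in <filename>nova.conf</filename>, which sets the default for
newly created networks. The network manager shown below is an
assumption made for the example, and both option names should be
checked against the configuration reference for the release in
use.</para>
<programlisting language="ini"># nova.conf on each compute host (sketch, legacy networking only)
[DEFAULT]
# Example network manager; an assumption for this sketch
network_manager = nova.network.manager.FlatDHCPManager
# Default new networks to multi-host mode, so that a host failure
# affects only the instances running on that host
multi_host = True</programlisting>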
<para>When using OpenStack Networking, the
OpenStack controller servers or separate Networking
hosts handle routing. For a deployment that requires features
available only in Networking, it is possible to
remove this single point of failure by using third-party software that
helps maintain highly available L3 routes. Doing so allows for
common APIs to control network hardware, or to provide complex
multi-tier web applications in a secure manner. It is also
possible to completely remove routing from
Networking, and instead rely on hardware routing capabilities.
In this case, the switching infrastructure must support L3
routing.</para>
<para>OpenStack Networking and legacy networking
both have their advantages and
disadvantages. They are both valid and supported options that
fit different network deployment models described in the
<citetitle><link
xlink:href="http://docs.openstack.org/openstack-ops/content/network_design.html#network_deployment_options"
>OpenStack Operations Guide</link></citetitle>.</para>
<para>Ensure your deployment has adequate backup capabilities. As
an example, in a deployment that has two infrastructure
controller nodes, the design should include controller
availability. If one controller is lost,
cloud services run from the remaining controller.
Where the design has higher availability
requirements, it is important to meet those requirements by
designing the proper redundancy and availability of controller
nodes.</para>
<para>Application design must also be factored into the
capabilities of the underlying cloud infrastructure. If the
compute hosts do not provide a seamless live migration
capability, then it must be expected that when a compute host
fails, the instances on that host and any data local to those
instances will be lost. Conversely, when providing an expectation to users
that instances have a high level of uptime, the
infrastructure must be deployed in a way that eliminates any
single point of failure when a compute host disappears. This
may include utilizing shared file systems on enterprise
storage or OpenStack Block Storage to provide a level of
guarantee to match service features.</para>
<para>For more information on high availability in OpenStack, see the <link
xlink:href="http://docs.openstack.org/high-availability-guide"><citetitle>OpenStack
High Availability Guide</citetitle></link>.
</para>
</section>

<section xml:id="security-tech-considerations">
<title>Security</title>
<para>A security domain comprises users, applications, servers or
networks that share common trust requirements and expectations
within a system. Typically they have the same authentication
and authorization requirements and users.</para>
<para>These security domains are:</para>
<itemizedlist>
<listitem>
<para>Public</para>
</listitem>
<listitem>
<para>Guest</para>
</listitem>
<listitem>
<para>Management</para>
</listitem>
<listitem>
<para>Data</para>
</listitem>
</itemizedlist>
<para>These security domains can be mapped to an OpenStack
deployment individually, or combined. For example, some
deployment topologies combine both guest and data domains onto
one physical network, whereas in other cases these networks
are physically separated. In each case, the cloud operator
should be aware of the appropriate security concerns. Security
domains should be mapped out against your specific OpenStack
deployment topology. The domains and their trust requirements
depend upon whether the cloud instance is public, private, or
hybrid.</para>
<para>The public security domain is an entirely untrusted area of
the cloud infrastructure. It can refer to the Internet as a
whole or simply to networks over which you have no authority.
This domain should always be considered untrusted.</para>
<para>Typically used for compute instance-to-instance traffic, the
guest security domain handles compute data generated by
instances on the cloud but not services that support the
operation of the cloud, such as API calls. Public cloud
providers and private cloud providers who do not have
stringent controls on instance use or who allow unrestricted
Internet access to instances should consider this domain to be
untrusted. Private cloud providers may want to consider this
network as internal and therefore trusted only if they have
controls in place to assert that they trust instances and all
their tenants.</para>
<para>The management security domain is where services interact.
Sometimes referred to as the "control plane", the networks in
this domain transport confidential data such as configuration
parameters, user names, and passwords. In most deployments this
domain is considered trusted.</para>
<para>The data security domain is concerned primarily with
information pertaining to the storage services within
OpenStack. Much of the data that crosses this network has high
integrity and confidentiality requirements and, depending on
the type of deployment, may also have strong availability
requirements. The trust level of this network is heavily
dependent on other deployment decisions.</para>
<para>When you deploy OpenStack in an enterprise as a private cloud,
it is usually behind the firewall and within the trusted
network alongside existing systems. Users of the cloud are,
traditionally, employees who are bound by the security
requirements set forth by the company. This tends to push most
of the security domains towards a more trusted model. However,
when you deploy OpenStack in a public-facing role, no
assumptions can be made and the number of attack vectors
increases significantly. For example, the API endpoints, along with the
software behind them, become vulnerable to bad actors wanting
to gain unauthorized access or prevent access to services,
which could lead to loss of data, functionality, and
reputation. These services must be protected through
auditing and appropriate filtering.</para>
<para>Take care when managing the users of the
system for both public and private clouds. The Identity
service allows LDAP to be part of the authentication
process. Including such systems in an OpenStack deployment may
ease user management when integrating with existing
systems.</para>
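<para>As a hedged sketch of what LDAP integration can look like, the
fragment below points the Identity service at a directory through
<filename>keystone.conf</filename>. The host name, distinguished names,
and password are hypothetical placeholders, and the value expected by
the <literal>driver</literal> option differs between releases (older
releases use a full Python class path), so treat this as an outline
rather than a working configuration.</para>
<programlisting language="ini"># keystone.conf (sketch; all values are placeholders)
[identity]
# Use the LDAP identity back end instead of SQL
driver = ldap

[ldap]
url = ldap://ldap.example.com
user = cn=admin,dc=example,dc=com
password = CHANGE_ME
suffix = dc=example,dc=com
user_tree_dn = ou=Users,dc=example,dc=com
user_objectclass = inetOrgPerson</programlisting>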
<para>It is important to understand that user authentication
requests contain sensitive information, including user names,
passwords, and authentication tokens. For this reason, placing
the API services behind hardware that performs SSL termination
is strongly recommended.</para>
<para>
For more information on OpenStack security, see the <link
xlink:href="http://docs.openstack.org/security-guide/"><citetitle>OpenStack
Security Guide</citetitle></link>.
</para>
</section>
</section>