<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE section [
<!ENTITY % openstack SYSTEM "../../common/entities/openstack.ent">
%openstack;
]>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="technical-considerations-general-purpose">
<?dbhtml stop-chunking?>
<title>Technical considerations</title>
<para>General purpose clouds are expected to
include these base services:</para>
<itemizedlist>
<listitem>
<para>
Compute
</para>
</listitem>
<listitem>
<para>
Network
</para>
</listitem>
<listitem>
<para>
Storage
</para>
</listitem>
</itemizedlist>
<para>Each of these services has different resource requirements.
As a result, you must make design decisions relating directly
to each service, as well as provide a balanced infrastructure for
all services.</para>
<para>Take into consideration the unique aspects of each service, as
the individual characteristics and scale of a service can impact the
hardware selection process. Generate hardware designs for each of the
services.</para>
<para>Hardware decisions are also made in relation to network architecture
and facilities planning. These factors play heavily into
the overall architecture of an OpenStack cloud.</para>
<section xml:id="designing-compute-resources-tech-considerations">
<title>Compute resource design</title>
<para>When designing compute resource pools, a number of factors
can impact your design decisions, such as the number of processors,
the amount of memory, and the quantity of storage required for each
hypervisor.</para>
<para>You will also need to decide whether to provide compute resources
in a single pool or in multiple pools. In most cases, multiple pools
of resources can be allocated and addressed on demand. A compute design
that allocates multiple pools of resources makes best use of application
resources, and is commonly referred to as
<firstterm>bin packing</firstterm>.</para>
<para>In a bin packing design, each independent resource pool provides service
for specific flavors. This helps to ensure that, as instances are scheduled
onto compute hypervisors, each independent node's resources will be allocated
in a way that makes the most efficient use of the available hardware. Bin
packing also requires a common hardware design, with all hardware nodes within
a compute resource pool sharing a common processor, memory, and storage layout.
This makes it easier to deploy, support, and maintain nodes throughout their
life cycle.</para>
<para>An <firstterm>overcommit ratio</firstterm> is the ratio of available
virtual resources to available physical resources. This ratio is
configurable for CPU and memory. The default CPU overcommit ratio is 16:1, and
the default memory overcommit ratio is 1.5:1. Tuning the
overcommit ratios during the design phase is important because it has a direct
impact on the hardware layout of your compute nodes.</para>
<para>For example, if you wanted to design a hardware node as a compute resource
pool to service instances, consider the number of processor cores available
on the node as well as the required disk and memory to service instances
running at capacity. For a server with 2 CPUs of 10 cores each, with
hyperthreading turned on, the default CPU overcommit ratio of 16:1 would allow
for 640 (2 &times; 10 &times; 2 &times; 16) total <literal>m1.small</literal>
instances, where each instance uses 1 vCPU, 20&nbsp;GB of ephemeral storage
and 2,048&nbsp;MB of RAM. By the same reasoning, using the default memory
overcommit ratio of 1.5:1 you can determine that the server will need at least
853&nbsp;GB (640 &times; 2,048&nbsp;MB / 1.5) of RAM. When sizing nodes for
memory, it is also important to consider the additional memory required to
service operating system and service needs.</para>
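<para>For illustration, these ratios map to the
<literal>cpu_allocation_ratio</literal> and
<literal>ram_allocation_ratio</literal> options in
<filename>nova.conf</filename>. A minimal sketch showing the default
values discussed above:</para>
<programlisting language="ini">[DEFAULT]
# Virtual-to-physical CPU allocation ratio (default 16:1)
cpu_allocation_ratio = 16.0
# Virtual-to-physical RAM allocation ratio (default 1.5:1)
ram_allocation_ratio = 1.5</programlisting>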
<para>When selecting a processor, compare features and performance
characteristics. Some processors include features specific to virtualized
compute hosts, such as hardware-assisted virtualization and related
memory-paging technology, for example Extended Page Tables (EPT). These types
of features can have a significant impact on the performance of your virtual
machines.</para>
<para>You will also need to consider the compute requirements of non-hypervisor
nodes (sometimes referred to as resource nodes). This includes controller
nodes, object storage nodes, block storage nodes, and networking services.</para>
<para>The number of processor cores and threads impacts the number of worker
threads which can be run on a resource node. Design decisions must relate
directly to the service being run on it, as well as provide a balanced
infrastructure for all services.</para>
<para>Workloads can be unpredictable in a general purpose cloud, so consider
including the ability to add additional compute resource pools on demand.
In some cases, however, the demand for certain instance types or flavors may not
justify individual hardware design. In either case, start by allocating
hardware designs that are capable of servicing the most common instance
requests. If you want to add additional hardware to the overall architecture,
this can be done later.</para>
</section>
<section xml:id="designing-network-resources-tech-considerations">
<title>Designing network resources</title>
<para>OpenStack clouds generally have multiple network segments, with
each segment providing access to particular resources. The network services
themselves also require network communication paths which should
be separated from the other networks. When designing network services
for a general purpose cloud, plan for either a physical or logical
separation of network segments used by operators and tenants. You can also
create an additional network segment for access to internal services such as
the message bus and database used by various services. Segregating these
services onto separate networks helps to protect sensitive data and protects
against unauthorized access to services.</para>
<para>Choose a networking service based on the requirements of your instances.
The architecture and design of your cloud will impact whether you choose
OpenStack Networking (neutron) or legacy networking (nova-network).</para>
<variablelist>
<varlistentry>
<term>Legacy networking (nova-network)</term>
<listitem>
<para>The legacy networking (nova-network) service is primarily a
layer-2 networking service that operates in one of two modes, which
differ in their use of VLANs. In flat network mode, all
network hardware nodes and devices throughout the cloud are connected
to a single layer-2 network segment that provides access to
application data.</para>
<para>When the network devices in the cloud support segmentation
using VLANs, legacy networking can operate in the second mode. In
this design model, each tenant within the cloud is assigned a
network subnet which is mapped to a VLAN on the physical
network. It is especially important to remember the hard limit of
4096 VLANs that can be used within a spanning tree
domain, which caps the growth possible within the data center.
When designing a general purpose cloud intended to support multiple
tenants, we recommend the use of legacy networking with VLANs, not
flat network mode. A configuration sketch follows this list.</para>
</listitem>
</varlistentry>
</variablelist>
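<para>As a sketch, VLAN mode is selected in legacy networking through
the network manager option in <filename>nova.conf</filename>; the
starting VLAN ID shown is an illustrative value:</para>
<programlisting language="ini">[DEFAULT]
# Use the VLAN network manager rather than flat networking
network_manager = nova.network.manager.VlanManager
# First VLAN ID to assign to tenant networks (illustrative)
vlan_start = 100</programlisting>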
<para>Another network consideration is that
legacy networking is entirely managed by the cloud operator;
tenants do not have control over network resources. If tenants
require the ability to manage and create network resources
such as network segments and subnets, install the OpenStack
Networking service to provide network access to instances.</para>
<variablelist>
<varlistentry>
<term>OpenStack Networking (neutron)</term>
<listitem>
<para>OpenStack Networking (neutron) is a first-class networking
service that gives tenants full control over the creation of
virtual network resources. This is often accomplished with
tunneling protocols, which establish encapsulated communication
paths over existing network infrastructure in order to segment
tenant traffic. The methods vary depending on the specific
implementation, but some of the more common ones include tunneling
with GRE, encapsulation with VXLAN, and VLAN tagging; a
configuration sketch follows this list.</para>
</listitem>
</varlistentry>
</variablelist>
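<para>For example, a minimal sketch of an ML2 plug-in configuration
that enables these segmentation methods (the driver selection and
ordering are assumptions for illustration):</para>
<programlisting language="ini">[ml2]
# Network types offered to tenants, in order of preference
tenant_network_types = vxlan,gre,vlan
type_drivers = flat,vlan,gre,vxlan
mechanism_drivers = openvswitch</programlisting>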
<para>Initially, we recommend you design at least three network segments.</para>
<para>The first segment is a public network, used for access to REST APIs
by tenants and operators. Generally, the controller nodes and swift
proxies will be the only devices connecting to this network segment. In some
cases, this network might also be serviced by hardware load balancers
and other network devices.</para>
<para>The second segment is used by administrators to manage hardware resources.
It is also used by configuration management tools for deploying software and
services onto new hardware. In some cases, this network segment might also be
used for internal services, including the message bus and database services.
This network will probably need to communicate with every hardware node.
Due to the highly secure nature of this network segment, you will also need to
secure this network from unauthorized access.</para>
<para>The third network segment is used by applications and consumers to access
the physical network, and for users to access applications. This network is
generally segregated from the one used to access the cloud APIs and is not
capable of communicating directly with the hardware resources in the cloud.
Compute resource nodes will need to communicate on this network segment, as
will any network gateway services which allow application data to access the
physical network from outside of the cloud.</para>
</section>
<section xml:id="designing-storage-resources-tech-considerations">
<title>Designing storage resources</title>
<para>OpenStack has two independent storage services to consider,
each with its own specific design requirements and goals. In
addition to services which provide storage as their primary
function, there are additional design considerations with
regard to compute and controller nodes which will affect the
overall cloud architecture.</para>
</section>
<section xml:id="designing-openstack-object-storage-tech-considerations">
<title>Designing OpenStack Object Storage</title>
<para>When designing hardware resources for OpenStack Object
Storage, the primary goal is to maximize the amount of storage
in each resource node while also ensuring that the cost per
terabyte is kept to a minimum. This often involves utilizing
servers which can hold a large number of spinning disks.
Whether choosing to use 2U server form factors with directly
attached storage or an external chassis that holds a larger
number of drives, the main goal is to maximize the storage
available in each node.</para>
<para>We do not recommend investing in enterprise-class drives
for an OpenStack Object Storage cluster. The consistency and
partition tolerance characteristics of OpenStack Object
Storage will ensure that data stays up to date and survives
hardware faults without the use of any specialized data
replication devices.</para>
<para>One of the benefits of OpenStack Object Storage is the ability
to mix and match drives by making use of weighting within the
swift ring. When designing your swift storage cluster, we
recommend making use of the most cost effective storage
solution available at the time. Many server chassis on the
market can hold 60 or more drives in 4U of rack space,
therefore we recommend maximizing the amount of storage
per rack unit at the best cost per terabyte. Furthermore, we
do not recommend the use of RAID controllers in an object storage
node.</para>
<para>To achieve durability and availability of data stored as objects,
it is important to design object storage resource pools to ensure they can
provide the suggested availability. When designing beyond the hardware node
level, consider rack-level and zone-level designs that accommodate the number
of replicas configured to be stored in the Object Storage service (the default
number of replicas is three). Each replica of
data should exist in its own availability zone with its own
power, cooling, and network resources available to service
that specific zone.</para>
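<para>As an illustration, the replica count is fixed when the rings
are built. A minimal sketch using
<command>swift-ring-builder</command>, where the partition power of
10 and the one-hour minimum between partition moves are illustrative
values and the replica count is three:</para>
<screen><prompt>$</prompt> <userinput>swift-ring-builder object.builder create 10 3 1</userinput></screen>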
<para>Object storage nodes should be designed so that the number
of requests does not hinder the performance of the cluster.
The Object Storage service uses a chatty protocol, therefore
making use of multiple processors with higher core counts
will ensure the I/O requests do not inundate the server.</para>
</section>
<section xml:id="designing-openstack-block-storage">
<title>Designing OpenStack Block Storage</title>
<para>When designing OpenStack Block Storage resource nodes, it is
helpful to understand the workloads and requirements that will
drive the use of block storage in the cloud. We recommend designing
block storage pools so that tenants can choose appropriate storage
solutions for their applications. By creating multiple storage pools of different
types, in conjunction with configuring an advanced storage
scheduler for the block storage service, it is possible to
provide tenants with a large catalog of storage services with
a variety of performance levels and redundancy options.</para>
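<para>For example, a minimal sketch of a multi-backend configuration
in <filename>cinder.conf</filename> that exposes two pools; the
backend names are assumptions for illustration:</para>
<programlisting language="ini">[DEFAULT]
enabled_backends = lvm-fast,lvm-standard

[lvm-fast]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name = LVM_FAST

[lvm-standard]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name = LVM_STANDARD</programlisting>
<para>Tenants can then select a pool through volume types, for
example by creating a type with <command>cinder type-create
fast</command> and mapping it with <command>cinder type-key fast set
volume_backend_name=LVM_FAST</command>.</para>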
<para>Block storage also takes advantage of a number of enterprise storage
solutions, addressed through plug-in drivers developed by the hardware
vendors. A large number of enterprise storage plug-in drivers ship
out-of-the-box with OpenStack Block Storage, and many more are available
through third-party channels. While general purpose clouds are more likely to
use directly attached storage in the majority of block storage nodes, it can
be necessary to also provide additional levels of service to tenants, which
only enterprise-class storage solutions can deliver.</para>
<para>Redundancy and availability requirements impact the decision to use
a RAID controller card in block storage nodes. The input/output operations
per second (IOPS) demand of your application will influence whether or not
you should use a RAID controller, and which RAID level is required.
Where performance is the main consideration, use higher performing RAID
volumes. Where redundancy of block storage volumes is more important, we
recommend a redundant RAID configuration such as RAID 5 or
RAID 6. Some specialized features, such as automated
replication of block storage volumes, may require the use of
third-party plug-ins and enterprise block storage solutions to
handle the high demand on storage. Furthermore,
where extreme performance is a requirement, it may also be
necessary to make use of high-speed SSD drives or high-performing
flash storage solutions.</para>
</section>
<section xml:id="software-selection-tech-considerations">
<title>Software selection</title>
<para>The software selection process plays a large role in the
architecture of a general purpose cloud. The following have
a large impact on the design of the cloud:</para>
<itemizedlist>
<listitem>
<para>
Choice of operating system
</para>
</listitem>
<listitem>
<para>
Selection of OpenStack software components
</para>
</listitem>
<listitem>
<para>
Choice of hypervisor
</para>
</listitem>
<listitem>
<para>
Selection of supplemental software
</para>
</listitem>
</itemizedlist>
<para>Operating system (OS) selection plays a large role in the
design and architecture of a cloud. There are a number of OSes
which have native support for OpenStack including:</para>
<itemizedlist>
<listitem>
<para>
Ubuntu
</para>
</listitem>
<listitem>
<para>
Red Hat Enterprise Linux (RHEL)
</para>
</listitem>
<listitem>
<para>
CentOS
</para>
</listitem>
<listitem>
<para>
SUSE Linux Enterprise Server (SLES)
</para>
</listitem>
</itemizedlist>
<note>
<para>Native support is not a constraint on the choice of OS; users are
free to choose just about any Linux distribution (or even
Microsoft Windows) and install OpenStack directly from source
(or compile their own packages). However, many organizations will
prefer to install OpenStack from distribution-supplied packages or
repositories (although using the distribution vendor's OpenStack
packages might be a requirement for support).
</para>
</note>
<para>OS selection also directly influences hypervisor selection.
A cloud architect who selects Ubuntu, RHEL, or SLES has some
flexibility in hypervisor; KVM, Xen, and LXC are supported
virtualization methods available under OpenStack Compute
(nova) on these Linux distributions. However, a cloud architect
who selects Hyper-V is limited to Windows Server. Similarly, a
cloud architect who selects XenServer is limited to the CentOS-based
dom0 operating system provided with XenServer.</para>
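<para>As a sketch, the hypervisor is selected on each compute node
in <filename>nova.conf</filename>; for KVM on one of the Linux
distributions above, this looks like the following:</para>
<programlisting language="ini">[DEFAULT]
compute_driver = libvirt.LibvirtDriver

[libvirt]
# Set to qemu on hosts without hardware virtualization support
virt_type = kvm</programlisting>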
<para>The primary factors that play into OS-hypervisor selection
include:</para>
<variablelist>
<varlistentry>
<term>User requirements</term>
<listitem>
<para>The selection of OS-hypervisor
combination first and foremost needs to support the
user requirements.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Support</term>
<listitem>
<para>The selected OS-hypervisor combination
needs to be supported by OpenStack.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Interoperability</term>
<listitem>
<para>The OS-hypervisor needs to be
interoperable with other features and services in the
OpenStack design in order to meet the user
requirements.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
<section xml:id="hypervisor-tech-considerations">
<title>Hypervisor</title>
<para>OpenStack supports a wide variety of hypervisors, one or
more of which can be used in a single cloud. These hypervisors
include:</para>
<itemizedlist>
<listitem>
<para>KVM (and QEMU)</para>
</listitem>
<listitem>
<para>XCP/XenServer</para>
</listitem>
<listitem>
<para>vSphere (vCenter and ESXi)</para>
</listitem>
<listitem>
<para>Hyper-V</para>
</listitem>
<listitem>
<para>LXC</para>
</listitem>
<listitem>
<para>Docker</para>
</listitem>
<listitem>
<para>Bare-metal</para>
</listitem>
</itemizedlist>
<para>A complete list of supported hypervisors and their
capabilities can be found at
<link xlink:href="https://wiki.openstack.org/wiki/HypervisorSupportMatrix">OpenStack Hypervisor Support Matrix</link>.
</para>
<para>We recommend general purpose clouds use hypervisors that
support the most general purpose use cases, such as KVM and
Xen. Choose a more specific hypervisor to account
for particular functionality or a supported feature requirement.
In some cases, there may also be a mandated
requirement to run software on a certified hypervisor,
including solutions from VMware, Microsoft, and Citrix.</para>
<para>The features offered through the OpenStack cloud platform
determine the best choice of a hypervisor. As an example, for
a general purpose cloud that predominantly supports a
Microsoft-based migration, or is managed by staff with
particular skills in certain hypervisors and
operating systems, Hyper-V would be the best available choice.
While the decision to use Hyper-V does not limit the ability
to run alternative operating systems, be mindful of those that
are deemed supported. Each hypervisor also has its
own hardware requirements, which may affect the decisions
around designing a general purpose cloud. For example,
utilizing VMware's live migration feature, vMotion, requires
the installation of vCenter/vSphere and the use of the
ESXi hypervisor, which increases the infrastructure
requirements.</para>
<para>In a mixed hypervisor environment, specific aggregates of
compute resources, each with defined capabilities, enable
workloads to utilize software and hardware specific to their
particular requirements. This functionality can be exposed
explicitly to the end user, or accessed through defined
metadata within a particular flavor of an instance.</para>
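<para>For example, a hypothetical sketch of exposing a KVM-only
aggregate through flavor metadata, assuming the
<literal>AggregateInstanceExtraSpecsFilter</literal> scheduler
filter is enabled; the aggregate and flavor names are
illustrative:</para>
<screen><prompt>$</prompt> <userinput>nova aggregate-create kvm-hosts</userinput>
<prompt>$</prompt> <userinput>nova aggregate-set-metadata kvm-hosts hypervisor=kvm</userinput>
<prompt>$</prompt> <userinput>nova flavor-key m1.kvm set aggregate_instance_extra_specs:hypervisor=kvm</userinput></screen>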
</section>
<section xml:id="openstack-components-tech-considerations">
<title>OpenStack components</title>
<para>A general purpose OpenStack cloud design should incorporate
the core OpenStack services to provide a wide range of
services to end-users. The OpenStack core services recommended
in a general purpose cloud are:</para>
<itemizedlist>
<listitem>
<para>OpenStack <glossterm>Compute</glossterm>
(<glossterm>nova</glossterm>)</para>
</listitem>
<listitem>
<para>OpenStack <glossterm>Networking</glossterm>
(<glossterm>neutron</glossterm>)</para>
</listitem>
<listitem>
<para>OpenStack <glossterm>Image service</glossterm>
(<glossterm>glance</glossterm>)</para>
</listitem>
<listitem>
<para>OpenStack <glossterm>Identity</glossterm>
(<glossterm>keystone</glossterm>)</para>
</listitem>
<listitem>
<para>OpenStack <glossterm>dashboard</glossterm>
(<glossterm>horizon</glossterm>)</para>
</listitem>
<listitem>
<para><glossterm>Telemetry</glossterm> module
(<glossterm>ceilometer</glossterm>)</para>
</listitem>
</itemizedlist>
<para>A general purpose cloud may also include OpenStack
<glossterm>Object Storage</glossterm> (<glossterm>swift</glossterm>)
and OpenStack <glossterm>Block Storage</glossterm>
(<glossterm>cinder</glossterm>). These may be
selected to provide storage to applications and
instances.</para>
<note>
<para>Depending on the use case, these services could be
optional.</para>
</note>
</section>
<section xml:id="supplemental-software-tech-considerations">
<title>Supplemental software</title>
<para>A general purpose OpenStack deployment consists of more than
just OpenStack-specific components. A typical deployment
involves services that provide supporting functionality,
including databases and message queues, and may also involve
software to provide high availability of the OpenStack
environment. Design decisions around the underlying message
queue might affect the required number of controller services,
as might the technology chosen to provide highly resilient database
functionality, such as MariaDB with Galera. In such a
scenario, replication of services relies on quorum. Therefore,
the underlying database cluster, for example, should consist of
at least three nodes to allow for the recovery of a failed
Galera node. When increasing the number of nodes to support a
feature of the software, consideration of rack space and
switch port density becomes important.</para>
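<para>As an illustration, a three-node Galera cluster is defined by
listing all members in the wsrep cluster address; the host names and
library path are assumptions for illustration:</para>
<programlisting language="ini">[mysqld]
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_address = gcomm://controller1,controller2,controller3</programlisting>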
<para>Where many general purpose deployments use hardware load
balancers to provide highly available API access and SSL
termination, software solutions, for example HAProxy, can also
be considered. It is vital to ensure that such software
implementations are also made highly available. High
availability can be achieved by using software such as
Keepalived or Pacemaker with Corosync. Pacemaker and Corosync
can provide active-active or active-passive highly available
configuration depending on the specific service in the
OpenStack environment. Using this software can affect the
design, as it assumes at least a two-node controller
infrastructure where one of those nodes may be running certain
services in standby mode.</para>
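<para>For example, a minimal HAProxy sketch that load balances the
Identity API across two controllers; the virtual IP address,
back-end addresses, and node names are assumptions for
illustration:</para>
<programlisting>listen keystone-api
    bind 192.168.1.100:5000
    balance roundrobin
    server controller1 192.168.1.11:5000 check
    server controller2 192.168.1.12:5000 check</programlisting>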
<para>Memcached is a distributed memory object caching system, and
Redis is a key-value store. Both are deployed on
general purpose clouds to assist in alleviating load to the
Identity service. The memcached service caches tokens, and due
to its distributed nature it can help alleviate some
bottlenecks to the underlying authentication system. Using
memcached or Redis does not affect the overall design of your
architecture as they tend to be deployed onto the
infrastructure nodes providing the OpenStack services.</para>
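<para>A minimal sketch of pointing the Identity service at memcached
in <filename>keystone.conf</filename>; the host names are
illustrative, and the token driver shown is the memcache persistence
back end available in this era of OpenStack:</para>
<programlisting language="ini">[memcache]
servers = controller1:11211,controller2:11211

[token]
driver = keystone.token.persistence.backends.memcache.Token</programlisting>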
</section>
<section xml:id="performance-tech-considerations">
<title>Performance</title>
<para>Performance of an OpenStack deployment is dependent on a
number of factors related to the infrastructure and controller
services. The user requirements can be split into general
network performance, performance of compute resources, and
performance of storage systems.</para>
</section>
<section xml:id="controller-infrastructure-tech-considerations">
<title>Controller infrastructure</title>
<para>The controller infrastructure nodes provide management
services to the end-user, as well as providing services
internally for the operation of the cloud. The controllers
run message queuing services that carry system
messages between each service. Performance issues related to
the message bus would lead to delays in delivering messages to
where they need to go. The result would be
delays in operational functions such as spinning up and deleting
instances, provisioning new storage volumes, and managing
network resources. Such delays could adversely affect an
application's ability to react to certain conditions,
especially when using auto-scaling features. It is important
to properly design the hardware used to run the controller
infrastructure, as outlined above in the Hardware Selection
section.</para>
<para>Performance of the controller services is not limited
to processing power; restrictions may emerge in serving
concurrent users. Ensure that the APIs and Horizon services
are load tested so that you are able to serve your
customers. Pay particular attention to the
OpenStack Identity service (keystone), which provides
authentication and authorization for all services, both
internally to OpenStack itself and to end-users. This service
can lead to a degradation of overall performance if it is
not sized appropriately.</para>
</section>
<section xml:id="network-performance-tech-considerations">
<title>Network performance</title>
<para>In a general purpose OpenStack cloud, the requirements of
the network help determine performance capabilities. For
example, small deployments may employ 1 Gigabit Ethernet (GbE)
networking, whereas larger installations serving multiple
departments or many users would be better architected with
10&nbsp;GbE networking. The performance of the running instances will
be limited by these speeds. It is possible to design OpenStack
environments that run a mix of networking capabilities. By
utilizing the different interface speeds, the users of the
OpenStack environment can choose networks that are fit for
their purpose.</para>
<para>For example, web application instances may run
on a public network presented through OpenStack Networking
that has 1 GbE capability, whereas the back-end database uses
an OpenStack Networking network that has 10&nbsp;GbE capability to
replicate its data or, in some cases, the design may
incorporate link aggregation for greater throughput.</para>
<para>Network performance can be boosted considerably by
implementing hardware load balancers to provide front-end
service to the cloud APIs. The hardware load balancers also
perform SSL termination if that is a requirement of your
environment. When implementing SSL offloading, it is important
to understand the SSL offloading capabilities of the devices
selected.</para>
</section>
<section xml:id="compute-host-tech-considerations">
<title>Compute host</title>
<para>The choice of hardware specifications used in compute nodes
including CPU, memory and disk type directly affects the
performance of the instances. Other factors which can directly
affect performance include tunable parameters within the
OpenStack services, for example the overcommit ratio applied
to resources. The defaults in OpenStack Compute set a 16:1
overcommit of the CPU and a 1.5:1 overcommit of the memory.
Running at such high ratios can lead to an increase in
"noisy-neighbor" activity. Care must be taken when sizing your
Compute environment to avoid this scenario. For running
general purpose OpenStack environments it is possible to keep
to the defaults, but make sure to monitor your environment as
usage increases.</para>
</section>
<section xml:id="storage-performance-tech-considerations">
<title>Storage performance</title>
<para>When considering performance of OpenStack Block Storage,
hardware and architecture choice is important. Block Storage
can use enterprise back-end systems such as NetApp or EMC,
scale out storage such as GlusterFS and Ceph, or simply use
the capabilities of directly attached storage in the nodes
themselves. Block Storage may be deployed so that traffic
traverses the host network, which could affect, and be
adversely affected by, the front-side API traffic performance.
As such, consider using a dedicated data storage network with
dedicated interfaces on the Controller and Compute
hosts.</para>
<para>When considering performance of OpenStack Object Storage, a
number of design choices will affect performance. A user's
access to the Object Storage is through the proxy services,
which sit behind hardware load balancers. By the
very nature of a highly resilient storage system, replication
of the data affects performance of the overall system. In
this case, 10&nbsp;GbE (or better) networking is recommended
throughout the storage network architecture.</para>
</section>
<section xml:id="availability-tech-considerations">
<title>Availability</title>
<para>In OpenStack, the infrastructure is integral to providing
services and should always be available, especially when
operating with SLAs. Ensuring network availability is
accomplished by designing the network architecture so that no
single point of failure exists. Factor the number of switches,
routes, and power redundancies into the core infrastructure,
as well as the associated bonding of networks to provide
diverse routes to your highly available switch
infrastructure.</para>
<para>The OpenStack services themselves should be deployed across
multiple servers that do not represent a single point of
failure. Ensuring API availability can be achieved by placing
these services behind highly available load balancers that
have multiple OpenStack servers as members.</para>
<para>OpenStack lends itself to deployment in a highly available
manner, where it is expected that at least two servers be
utilized. These can run all of the services involved, from the
message queuing service, for example RabbitMQ or Qpid, to an
appropriately deployed database service such as MySQL or
MariaDB. As services in the cloud are scaled out, back-end
services will need to scale too. Monitoring and reporting on
server utilization and response times, as well as load testing
your systems, will help determine scale-out decisions.</para>
<para>Care must be taken when deciding network functionality.
Currently, OpenStack supports both the legacy networking (nova-network)
system and the newer, extensible OpenStack Networking (neutron). Both
have their pros and cons when it comes to providing highly
available access. Legacy networking, which provides networking
access maintained in the OpenStack Compute code, provides a
multi-host feature that removes a single point of failure when it
comes to routing; this feature is currently missing in OpenStack
Networking. Legacy networking's multi-host functionality restricts
failure domains to the host running that instance.</para>
<para>When using OpenStack Networking, the
OpenStack controller servers or separate Networking
hosts handle routing. For a deployment that requires features
available only in OpenStack Networking, it is possible to
remove this restriction by using third-party software that
helps maintain highly available L3 routes. Doing so allows for
common APIs to control network hardware, or to provide complex
multi-tier web applications in a secure manner. It is also
possible to completely remove routing from OpenStack
Networking, and instead rely on hardware routing capabilities.
In this case, the switching infrastructure must support L3
routing.</para>
<para>OpenStack Networking and legacy networking
both have their advantages and
disadvantages. They are both valid and supported options that
fit different network deployment models described in the
<citetitle><link
xlink:href="http://docs.openstack.org/openstack-ops/content/network_design.html#network_deployment_options"
>OpenStack Operations Guide</link></citetitle>.</para>
<para>Ensure your deployment has adequate back-up capabilities. As
an example, in a deployment that has two infrastructure
controller nodes, the design should account for controller
availability: in the event of the loss of a single controller,
cloud services will run from the remaining controller. Where the
design has higher availability requirements, it is important to
meet those requirements by designing the proper redundancy and
availability of controller nodes.</para>
<para>Application design must also be factored into the
capabilities of the underlying cloud infrastructure. If the
compute hosts do not provide a seamless live migration
capability, then it must be expected that when a compute host
fails, the instances on it and any data local to those instances
will be lost. Conversely, when providing an expectation to users
that instances have a high level of uptime guarantee, the
infrastructure must be deployed in a way that eliminates any
single point of failure when a compute host disappears. This
may include utilizing shared file systems on enterprise
storage or OpenStack Block Storage to provide a level of
guarantee that matches the service features.</para>
<para>For more information on high availability in OpenStack, see the <link
xlink:href="http://docs.openstack.org/high-availability-guide"><citetitle>OpenStack
High Availability Guide</citetitle></link>.
</para>
</section>
<section xml:id="security-tech-considerations">
<title>Security</title>
<para>A security domain comprises users, applications, servers or
networks that share common trust requirements and expectations
within a system. Typically they have the same authentication
and authorization requirements and users.</para>
<para>These security domains are:</para>
<itemizedlist>
<listitem>
<para>Public</para>
</listitem>
<listitem>
<para>Guest</para>
</listitem>
<listitem>
<para>Management</para>
</listitem>
<listitem>
<para>Data</para>
</listitem>
</itemizedlist>
<para>These security domains can be mapped to an OpenStack
deployment individually, or combined. For example, some
deployment topologies combine both guest and data domains onto
one physical network, whereas in other cases these networks
are physically separated. In each case, the cloud operator
should be aware of the appropriate security concerns. Security
domains should be mapped out against your specific OpenStack
deployment topology. The domains and their trust requirements
depend upon whether the cloud instance is public, private, or
hybrid.</para>
<para>The public security domain is an entirely untrusted area of
the cloud infrastructure. It can refer to the Internet as a
whole or simply to networks over which you have no authority.
This domain should always be considered untrusted.</para>
<para>Typically used for compute instance-to-instance traffic, the
guest security domain handles compute data generated by
instances on the cloud but not services that support the
operation of the cloud, such as API calls. Public cloud
providers and private cloud providers who do not have
stringent controls on instance use or who allow unrestricted
Internet access to instances should consider this domain to be
untrusted. Private cloud providers may want to consider this
network as internal and therefore trusted only if they have
controls in place to assert that they trust instances and all
their tenants.</para>
<para>The management security domain is where services interact.
Sometimes referred to as the "control plane", the networks in
this domain transport confidential data such as configuration
parameters, user names, and passwords. In most deployments this
domain is considered trusted.</para>
<para>The data security domain is concerned primarily with
information pertaining to the storage services within
OpenStack. Much of the data that crosses this network has high
integrity and confidentiality requirements and, depending on
the type of deployment, may also have strong availability
requirements. The trust level of this network is heavily
dependent on other deployment decisions.</para>
<para>When deploying OpenStack in an enterprise as a private cloud,
it is usually behind the firewall and within the trusted
network, alongside existing systems. Users of the cloud are,
traditionally, employees that are bound by the security
requirements set forth by the company. This tends to push most
of the security domains towards a more trusted model. However,
when deploying OpenStack in a public-facing role, no
assumptions can be made and the attack vectors significantly
increase. For example, the API endpoints, along with the
software behind them, become vulnerable to bad actors wanting
to gain unauthorized access or prevent access to services,
which could lead to loss of data, functionality, and
reputation. These services must be protected through
auditing and appropriate filtering.</para>
<para>Care must be taken when managing the users of the
system for both public and private clouds. The Identity
service allows for LDAP to be part of the authentication
process. Including such systems in an OpenStack deployment may
ease user management by integrating with existing
systems.</para>
<para>It is important to understand that user authentication
requests include sensitive information such as user names,
passwords, and authentication tokens. For this reason, placing
the API services behind hardware that performs SSL termination
is strongly recommended.</para>
<para>
For more information on OpenStack security, see the <link
xlink:href="http://docs.openstack.org/security-guide/"><citetitle>OpenStack
Security Guide</citetitle></link>.
</para>
</section>
</section>