Adds starting point for Architecture and Design Guide
The areas that still need work are:

- needs double-checking for tables
- see http://docs.openstack.org/arc/OpenStackArchitectureDesignGuide.epub for intended structure

Co-Authored-By: Nick Chase <nchase@mirantis.com>
Co-Authored-By: Beth Cohen <beth.cohen@verizon.com>
Co-Authored-By: Sean Collins <sean_collins2@cable.comcast.com>
Co-Authored-By: Steve Gordon <sgordon@redhat.com>
Co-Authored-By: Sebastian Gutierrez <segutier@redhat.com>
Co-Authored-By: Kevin Jackson <Kevin.Jackson@rackspace.co.uk>
Co-Authored-By: Scott Lowe <slowe@vmware.com>
Co-Authored-By: Maish Saidel-Keesing <msaidelk@cisco.com>
Co-Authored-By: Alexandra Settle <alexandra.settle@rackspace.com>
Co-Authored-By: Vinny Valdez <vvaldez@redhat.com>
Co-Authored-By: Anthony Veiga <Anthony_Veiga@cable.comcast.com>
Co-Authored-By: Sean Winn <sean.winn@cloudscaling.com>
Change-Id: Ia0ca278cd5d2d0ee67b9b7528870c1a2a80fdadf
75
doc/arch-design/bk-openstack-arch-design.xml
Normal file
@@ -0,0 +1,75 @@
<?xml version="1.0" encoding="UTF-8"?>
<book xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="openstack-compute-admin-manual-grizzly">
    <title>OpenStack Architecture Design Guide</title>
    <?rax title.font.size="28px" subtitle.font.size="28px"?>
    <titleabbrev>Architecture Guide</titleabbrev>
    <info>
        <author>
            <personname>
                <firstname/>
                <surname/>
            </personname>
            <affiliation>
                <orgname>OpenStack Foundation</orgname>
            </affiliation>
        </author>
        <copyright>
            <year>2014</year>
            <holder>OpenStack Foundation</holder>
        </copyright>
        <releaseinfo>current</releaseinfo>
        <productname>OpenStack</productname>
        <pubdate/>
        <legalnotice role="apache2">
            <annotation>
                <remark>Copyright details are filled in by the
                    template.</remark>
            </annotation>
        </legalnotice>
        <legalnotice role="cc-by-sa">
            <annotation>
                <remark>Remaining licensing details are filled in by
                    the template.</remark>
            </annotation>
        </legalnotice>
        <abstract>
            <para>To reap the benefits of OpenStack, you should
                plan, design, and architect your cloud properly,
                taking users' needs into account and understanding
                the use cases.</para>
        </abstract>
        <revhistory>
            <!-- ... continue adding more revisions here as you change this document using the markup shown below... -->
            <revision>
                <date>2014-07-21</date>
                <revdescription>
                    <itemizedlist>
                        <listitem>
                            <para>Initial release.</para>
                        </listitem>
                    </itemizedlist>
                </revdescription>
            </revision>
        </revhistory>
    </info>
    <!-- Chapters are referenced from the book file through these
        include statements. You can add additional chapters using
        these types of statements. -->
    <xi:include href="../common/ch_preface.xml"/>
    <xi:include href="ch_introduction.xml"/>
    <xi:include href="ch_generalpurpose.xml"/>
    <xi:include href="ch_compute_focus.xml"/>
    <xi:include href="ch_storage_focus.xml"/>
    <xi:include href="ch_network_focus.xml"/>
    <xi:include href="ch_multi_site.xml"/>
    <xi:include href="ch_hybrid.xml"/>
    <xi:include href="ch_massively_scalable.xml"/>
    <xi:include href="ch_specialized.xml"/>
    <xi:include href="ch_references.xml"/><!--
    <xi:include href="ch_glossary.xml"/>-->
    <xi:include href="../common/app_support.xml"/>
</book>
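The book file wires chapters in through the xi:include statements, as its inline comment notes. A minimal sketch of what adding one more chapter would look like — the filename here is hypothetical, not part of this change:

```xml
<!-- Hypothetical sketch only: ch_my_new_chapter.xml is an assumed
     filename used for illustration, not a file in this change. A new
     chapter is wired into the book with one more include statement,
     placed among the existing ones before </book>. -->
<xi:include href="ch_my_new_chapter.xml"/>
```

The chapter file itself would follow the same DocBook 5 skeleton as the ch_*.xml files in this change: a <chapter> root with the XInclude namespace declared and a unique xml:id.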
16
doc/arch-design/ch_compute_focus.xml
Normal file
@@ -0,0 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="compute_focus">
    <title>Compute Focused</title>

    <xi:include href="compute_focus/section_introduction_compute_focus.xml"/>
    <xi:include href="compute_focus/section_user_requirements_compute_focus.xml"/>
    <xi:include href="compute_focus/section_tech_considerations_compute_focus.xml"/>
    <xi:include href="compute_focus/section_operational_considerations_compute_focus.xml"/>
    <xi:include href="compute_focus/section_architecture_compute_focus.xml"/>
    <xi:include href="compute_focus/section_prescriptive_examples_compute_focus.xml"/>

</chapter>
16
doc/arch-design/ch_generalpurpose.xml
Normal file
@@ -0,0 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="generalpurpose">
    <title>General Purpose</title>

    <xi:include href="generalpurpose/section_introduction_generalpurpose.xml"/>
    <xi:include href="generalpurpose/section_user_requirements_general_purpose.xml"/>
    <xi:include href="generalpurpose/section_tech_considerations_general_purpose.xml"/>
    <xi:include href="generalpurpose/section_operational_considerations_general_purpose.xml"/>
    <xi:include href="generalpurpose/section_architecture_general_purpose.xml"/>
    <xi:include href="generalpurpose/section_prescriptive_example_general_purpose.xml"/>

</chapter>
580
doc/arch-design/ch_glossary.xml
Normal file
@@ -0,0 +1,580 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="arch-design-glossary">
    <title>Glossary</title>
    <itemizedlist>
        <listitem>
            <para>6to4 - A mechanism that allows IPv6 packets to be
                transmitted over an IPv4 network, providing a strategy
                for migrating to IPv6.</para>
        </listitem>
        <listitem>
            <para>AAA - Authentication, authorization, and
                auditing.</para>
        </listitem>
        <listitem>
            <para>Anycast - A network routing methodology that routes
                traffic from a single sender to the nearest node in a
                pool of nodes.</para>
        </listitem>
        <listitem>
            <para>ARP - Address Resolution Protocol - the protocol by
                which layer 3 IP addresses are resolved into layer 2
                link-local addresses.</para>
        </listitem>
        <listitem>
            <para>BGP - Border Gateway Protocol is a dynamic routing
                protocol that connects autonomous systems together.
                Considered the backbone of the Internet, this protocol
                connects disparate networks together to form a larger
                network.</para>
        </listitem>
        <listitem>
            <para>Boot Storm - When hundreds of users log in and
                consume resources at the same time, causing
                significant performance degradation. This problem is
                particularly common in Virtual Desktop Infrastructure
                (VDI) environments.</para>
        </listitem>
        <listitem>
            <para>Broadcast Domain - The layer 2 segment shared by a
                group of network-connected nodes.</para>
        </listitem>
        <listitem>
            <para>Bursting - The practice of utilizing a secondary
                environment to elastically build instances on demand
                when the primary environment is resource
                constrained.</para>
        </listitem>
        <listitem>
            <para>Capital Expenditure (CapEx) - A capital expense, or
                CapEx, is an initial cost for building a product,
                business, or system.</para>
        </listitem>
        <listitem>
            <para>Cascading Failure - A scenario where a single
                failure in a system creates a cascading effect, where
                other systems fail as load is transferred from the
                failing system.</para>
        </listitem>
        <listitem>
            <para>CDN - Content delivery network - a specialized
                network that is used to distribute content to clients,
                typically located close to the client for increased
                performance.</para>
        </listitem>
        <listitem>
            <para>Cells - An OpenStack Compute (Nova) feature, where a
                compute deployment can be split into smaller clusters
                or cells with their own queue and database for
                performance and scalability, while still providing a
                single API endpoint.</para>
        </listitem>
        <listitem>
            <para>CI/CD - Continuous Integration / Continuous
                Deployment. Continuous Integration is a methodology
                where software is continually built and unit tests are
                run for each change that is merged, or proposed for
                merge. Continuous Deployment is a software development
                methodology where changes are deployed into production
                as they are merged into source control, rather than
                being collected into a release and deployed at regular
                intervals.</para>
        </listitem>
        <listitem>
            <para>Cloud Broker - A cloud broker is a third-party
                individual or business that acts as an intermediary
                between the purchaser of a cloud computing service and
                the sellers of that service. In general, a broker is
                someone who acts as an intermediary between two or
                more parties during negotiations.</para>
        </listitem>
        <listitem>
            <para>Cloud Consumer - A user who consumes cloud
                instances, storage, or other resources in a cloud
                environment. This user interacts with OpenStack or
                other cloud management tools.</para>
        </listitem>
        <listitem>
            <para>Cloud Management Platform (CMP) - Products that
                provide a common interface to manage multiple cloud
                environments or platforms.</para>
        </listitem>
        <listitem>
            <para>Connection Broker - In desktop virtualization, a
                connection broker is a software program that allows
                the end user to connect to an available
                desktop.</para>
        </listitem>
        <listitem>
            <para>Direct Attached Storage (DAS) - Data storage that is
                directly connected to a machine.</para>
        </listitem>
        <listitem>
            <para>DefCore - DefCore sets base requirements by defining
                capabilities, code, and must-pass tests for all
                OpenStack products. This definition uses community
                resources and involvement to drive interoperability by
                creating the minimum standards for products labeled
                "OpenStack." See
                https://wiki.openstack.org/wiki/Governance/CoreDefinition
                for more information.</para>
        </listitem>
        <listitem>
            <para>Desktop as a Service (DaaS) - A platform that
                provides a suite of desktop environments that users
                may log in to receive a desktop experience from any
                location. This may provide general use, development,
                or even homogeneous testing environments.</para>
        </listitem>
        <listitem>
            <para>Direct Server Return - A technique in load balancing
                where an initial request is routed through a load
                balancer, and the reply is sent from the responding
                node directly to the requester.</para>
        </listitem>
        <listitem>
            <para>Denial of Service (DoS) - In computing, a
                denial-of-service or distributed denial-of-service
                attack is an attempt to make a machine or network
                resource unavailable to its intended users.</para>
        </listitem>
        <listitem>
            <para>Distributed Replicated Block Device (DRBD) - A
                distributed replicated storage system for the Linux
                platform.</para>
        </listitem>
        <listitem>
            <para>Differentiated Service Code Point (DSCP) - Defined
                in RFC 2474, this field in IPv4 and IPv6 headers is
                used to define classes of network traffic, for quality
                of service purposes.</para>
        </listitem>
        <listitem>
            <para>External Border Gateway Protocol (eBGP) - External
                Border Gateway Protocol describes a specific
                implementation of BGP designed for inter-autonomous
                system communication.</para>
        </listitem>
        <listitem>
            <para>Elastic IP - An Amazon Web Services concept, which
                is an IP address that can be dynamically allocated and
                re-assigned to running instances on the fly. The
                OpenStack equivalent is a Floating IP.</para>
        </listitem>
        <listitem>
            <para>Encapsulation - The practice of placing one packet
                type within another for the purposes of abstracting or
                securing data. Examples include GRE, MPLS, and
                IPsec.</para>
        </listitem>
        <listitem>
            <para>External Cloud - A cloud environment that exists
                outside of the control of an organization. In hybrid
                cloud contexts, the term indicates a public cloud or
                an off-site hosted cloud.</para>
        </listitem>
        <listitem>
            <para>Federated Cloud - A federated cloud describes
                multiple sets of cloud resources, for example compute
                or storage, that are managed by a centralized
                endpoint.</para>
        </listitem>
        <listitem>
            <para>Flow - A series of packets that are stateful in
                nature and represent a session. Usually represented by
                a TCP stream, but can also indicate other packet types
                that when combined comprise a connection between two
                points.</para>
        </listitem>
        <listitem>
            <para>Golden Image - An operating system image that
                contains a set of pre-installed software packages and
                configurations. This may be used to build standardized
                instances that have the same base set of
                configurations, to improve mean time to functional
                application.</para>
        </listitem>
        <listitem>
            <para>Graphics Processing Unit (GPU) - A single-chip
                processor with integrated transform, lighting,
                triangle setup/clipping, and rendering engines that is
                capable of processing a minimum of 10 million polygons
                per second. Traditional uses are any compute problem
                that can be represented as a vector or matrix
                operation.</para>
        </listitem>
        <listitem>
            <para>Hadoop Distributed File System (HDFS) - A
                distributed file system that stores data on commodity
                machines, providing very high aggregate bandwidth
                across the cluster.</para>
        </listitem>
        <listitem>
            <para>High Availability (HA) - A system design approach
                and associated service implementation that ensures a
                prearranged level of operational performance will be
                met during a contractual measurement period.</para>
        </listitem>
        <listitem>
            <para>High Performance Computing (HPC) - Also known as
                distributed computing - used for computation-intensive
                processes run on a large number of
                instances.</para>
        </listitem>
        <listitem>
            <para>Hierarchical Storage Management (HSM) - Hierarchical
                storage management is a data storage technique, which
                automatically moves data between high-cost and
                low-cost storage media.</para>
        </listitem>
        <listitem>
            <para>Hot Standby Router Protocol (HSRP) - Hot Standby
                Router Protocol is a Cisco proprietary redundancy
                protocol for establishing a fault-tolerant default
                gateway, described in detail in RFC 2281.</para>
        </listitem>
        <listitem>
            <para>Hybrid Cloud - Hybrid cloud is a composition of two
                or more clouds (private, community, or public) that
                remain distinct entities but are bound together,
                offering the benefits of multiple deployment models.
                Hybrid cloud can also mean the ability to connect
                colocation, managed, and/or dedicated services with
                cloud resources.</para>
        </listitem>
        <listitem>
            <para>Interior Border Gateway Protocol (iBGP) - Interior
                Border Gateway Protocol is an interior gateway
                protocol designed to exchange routing and reachability
                information within autonomous systems.</para>
        </listitem>
        <listitem>
            <para>Interior Gateway Protocol (IGP) - An Interior
                Gateway Protocol is a type of protocol used for
                exchanging routing information between gateways
                (commonly routers) within an Autonomous System (for
                example, a system of corporate local area networks).
                This routing information can then be used to route
                network-level protocols like IP.</para>
        </listitem>
        <listitem>
            <para>Input/Output Operations Per Second (IOPS) - A common
                performance measurement used to benchmark computer
                storage devices like hard disk drives, solid state
                drives, and storage area networks.</para>
        </listitem>
        <listitem>
            <para>jClouds - An open source multi-cloud toolkit for the
                Java platform that gives you the freedom to create
                applications that are portable across clouds while
                giving you full control to use cloud-specific
                features.</para>
        </listitem>
        <listitem>
            <para>Jitter - The deviation from true periodicity of a
                presumed periodic signal in electronics and
                telecommunications, often in relation to a reference
                clock source.</para>
        </listitem>
        <listitem>
            <para>Jumbo Frame - Ethernet frames with more than 1500
                bytes of payload.</para>
        </listitem>
        <listitem>
            <para>Kernel-based Virtual Machine (KVM) - A full
                virtualization solution for Linux on x86 hardware
                containing virtualization extensions (Intel VT or
                AMD-V). It consists of a loadable kernel module that
                provides the core virtualization infrastructure, and a
                processor-specific module.</para>
        </listitem>
        <listitem>
            <para>LAG - Link aggregation group is a term to describe
                various methods of combining (aggregating) multiple
                network connections in parallel into a group to
                increase throughput beyond what a single connection
                could sustain, and to provide redundancy in case one
                of the links fails.</para>
        </listitem>
        <listitem>
            <para>Layer 2 - The data link layer provides a reliable
                link between two directly connected nodes, by
                detecting and possibly correcting errors that may
                occur in the physical layer.</para>
        </listitem>
        <listitem>
            <para>Layer 3 - The network layer provides the functional
                and procedural means of transferring variable-length
                data sequences (called datagrams) from one node to
                another connected to the same network.</para>
        </listitem>
        <listitem>
            <para>Legacy System - An old method, technology, computer
                system, or application program that is considered
                outdated.</para>
        </listitem>
        <listitem>
            <para>Looking Glass - A tool that provides information on
                backbone routing and network efficiency.</para>
        </listitem>
        <listitem>
            <para>Microsoft Azure - A cloud computing platform and
                infrastructure, created by Microsoft, for building,
                deploying, and managing applications and services
                through a global network of Microsoft-managed data
                centers.</para>
        </listitem>
        <listitem>
            <para>MongoDB - A cross-platform document-oriented
                database. Classified as a NoSQL database, MongoDB
                eschews the traditional table-based relational
                database structure in favor of JSON-like documents
                with dynamic schemas.</para>
        </listitem>
        <listitem>
            <para>Mean Time Between Failures (MTBF) - Mean time
                between failures is the predicted elapsed time between
                inherent failures of a system during operation. MTBF
                can be calculated as the arithmetic mean (average)
                time between failures of a system.</para>
        </listitem>
        <listitem>
            <para>Maximum Transmission Unit (MTU) - The maximum
                transmission unit of a communications protocol of a
                layer is the size (in bytes) of the largest protocol
                data unit that the layer can pass onwards.</para>
        </listitem>
        <listitem>
            <para>NAT64 - NAT64 is a mechanism to allow IPv6 hosts to
                communicate with IPv4 servers. The NAT64 server is the
                endpoint for at least one IPv4 address and an IPv6
                network segment of 32 bits.</para>
        </listitem>
        <listitem>
            <para>Network Functions Virtualization (NFV) - Network
                Functions Virtualization is a network architecture
                concept that proposes using IT virtualization-related
                technologies to virtualize entire classes of network
                node functions into building blocks that may be
                connected, or chained, together to create
                communication services.</para>
        </listitem>
        <listitem>
            <para>NoSQL - A NoSQL or Not Only SQL database provides a
                mechanism for storage and retrieval of data that is
                modeled in means other than the tabular relations used
                in relational databases.</para>
        </listitem>
        <listitem>
            <para>Open vSwitch - Open vSwitch is a production-quality,
                multilayer virtual switch licensed under the open
                source Apache 2.0 license. It is designed to enable
                massive network automation through programmatic
                extension, while still supporting standard management
                interfaces and protocols (e.g. NetFlow, sFlow, SPAN,
                RSPAN, CLI, LACP, 802.1ag).</para>
        </listitem>
        <listitem>
            <para>Operational Expenditure (OPEX) - An operating
                expense, operating expenditure, operational expense,
                operational expenditure, or OPEX is an ongoing cost
                for running a product, business, or system.</para>
        </listitem>
        <listitem>
            <para>Original Design Manufacturer (ODM) - A company that
                designs and manufactures a product that is specified
                and eventually branded by another firm for
                sale.</para>
        </listitem>
        <listitem>
            <para>Overlay Network - An overlay network is a computer
                network that is built on top of another network.
                Nodes in the overlay can be thought of as being
                connected by virtual or logical links, each of which
                corresponds to a path, perhaps through many physical
                links, in the underlying network.</para>
        </listitem>
        <listitem>
            <para>Packet Storm - A cause of degraded service or
                failure that occurs when a network system is
                overwhelmed by continuous multicast or broadcast
                traffic.</para>
        </listitem>
        <listitem>
            <para>Platform as a Service (PaaS) - Platform as a Service
                is a category of cloud computing services that
                provides a computing platform and a solution stack as
                a service.</para>
        </listitem>
        <listitem>
            <para>Power Usage Effectiveness (PUE) - Power usage
                effectiveness is a measure of how efficiently a
                computer data center uses energy; specifically, how
                much energy is used by the computing equipment (in
                contrast to cooling and other overhead).</para>
        </listitem>
        <listitem>
            <para>Quality of Service (QoS) - Quality of Service is the
                overall performance of a telephony or computer
                network, particularly the performance seen by the
                users of the network.</para>
        </listitem>
        <listitem>
            <para>Remote Desktop Host - A server that hosts Remote
                Applications as session-based desktops. Users can
                access a Remote Desktop Host server by using the
                Remote Desktop Connection client.</para>
        </listitem>
        <listitem>
            <para>Renumbering - The exercise of renumbering a network
                consists of changing the IP host addresses, and
                perhaps the network mask, of each device within the
                network that has an address associated with
                it.</para>
        </listitem>
        <listitem>
            <para>Rollback - In database technologies, a rollback is
                an operation which returns the database to some
                previous state. Rollbacks are important for database
                integrity, because they mean that the database can be
                restored to a clean copy even after erroneous
                operations are performed.</para>
        </listitem>
        <listitem>
            <para>Remote Procedure Call (RPC) - A powerful technique
                for constructing distributed, client-server based
                applications. The communicating processes may be on
                the same system, or they may be on different systems
                with a network connecting them.</para>
        </listitem>
        <listitem>
            <para>Recovery Point Objective (RPO) - A recovery point
                objective is defined by business continuity planning.
                It is the maximum tolerable period in which data might
                be lost from an IT service due to a major incident.
                The RPO gives systems designers a limit to work
                to.</para>
        </listitem>
        <listitem>
            <para>Recovery Time Objective (RTO) - The recovery time
                objective is the duration of time and a service level
                within which a business process must be restored after
                a disaster (or disruption) in order to avoid
                unacceptable consequences associated with a break in
                business continuity.</para>
        </listitem>
        <listitem>
            <para>Software Development Kit (SDK) - A software
                development kit is typically a set of software
                development tools that allows for the creation of
                applications for a certain software package, software
                framework, hardware platform, computer system, video
                game console, operating system, or similar development
                platform.</para>
        </listitem>
        <listitem>
            <para>Service Level Agreement (SLA) - A service-level
                agreement is a part of a service contract where a
                service is formally defined. In practice, the term SLA
                is sometimes used to refer to the contracted delivery
                time (of the service or performance).</para>
        </listitem>
        <listitem>
            <para>Software Development Lifecycle (SDLC) - A software
                development process, also known as a software
                development life cycle, is a structure imposed on the
                development of a software product.</para>
        </listitem>
        <listitem>
            <para>Top of Rack Switch (ToR Switch) - A top-of-rack
                (ToR) switch is a small port-count switch that sits at
                or near the top of a rack in a data
                center.</para>
        </listitem>
        <listitem>
            <para>Traffic Shaping - Traffic shaping (also known as
                "packet shaping") is a computer network traffic
                management technique which delays some or all
                datagrams to bring them into compliance with a desired
                traffic profile. Traffic shaping is a form of rate
                limiting.</para>
        </listitem>
        <listitem>
            <para>Tunneling - Computer networks use a tunneling
                protocol when one network protocol (the delivery
                protocol) encapsulates a different payload protocol.
                By using tunneling one can (for example) carry a
                payload over an incompatible delivery network, or
                provide a secure path through an untrusted
                network.</para>
        </listitem>
        <listitem>
            <para>Virtual Desktop Infrastructure (VDI) - Virtual
                Desktop Infrastructure is a desktop-centric service
                that hosts user desktop environments on remote
                servers, which are accessed over a network using a
                remote display protocol. A connection brokering
                service is used to connect users to their assigned
                desktop sessions.</para>
        </listitem>
        <listitem>
            <para>Virtual Local Area Network (VLAN) - In computer
                networking, a single layer-2 network may be
                partitioned to create multiple distinct broadcast
                domains, which are mutually isolated so that packets
                can only pass between them via one or more routers;
                such a domain is referred to as a virtual local area
                network, virtual LAN, or VLAN.</para>
        </listitem>
        <listitem>
            <para>Voice over Internet Protocol (VoIP) -
                Voice-over-Internet Protocol (VoIP) is a methodology
                and group of technologies for the delivery of voice
                communications and multimedia sessions over Internet
                Protocol (IP) networks, such as the Internet.</para>
        </listitem>
        <listitem>
            <para>Virtual Router Redundancy Protocol (VRRP) - The
                Virtual Router Redundancy Protocol (VRRP) is a
                computer networking protocol that provides for
                automatic assignment of available Internet Protocol
                (IP) routers to participating hosts. This increases
                the availability and reliability of routing paths via
                automatic default gateway selections on an IP
                sub-network.</para>
        </listitem>
        <listitem>
            <para>VXLAN Tunnel Endpoint (VTEP) - Used for VXLAN frame
                encapsulation. VTEP functionality can be implemented
                in software, such as a virtual switch, or in the form
                of a physical switch.</para>
        </listitem>
        <listitem>
            <para>Virtual Extensible Local Area Network (VXLAN) -
                Virtual Extensible LAN is a network virtualization
                technology that attempts to ameliorate the scalability
                problems associated with large cloud computing
                deployments. It uses a VLAN-like encapsulation
                technique to encapsulate MAC-based OSI layer 2
                Ethernet frames within layer 3 UDP packets.</para>
        </listitem>
        <listitem>
            <para>Wide Area Network (WAN) - A wide area network is a
                network that covers a broad area using leased or
                private telecommunication lines.</para>
        </listitem>
        <listitem>
            <para>Xen - Xen is a hypervisor using a microkernel
                design, providing services that allow multiple
                computer operating systems to execute on the same
                computer hardware concurrently.</para>
        </listitem>
    </itemizedlist>
</chapter>
17
doc/arch-design/ch_hybrid.xml
Normal file
@@ -0,0 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="hybrid">
    <title>Hybrid</title>

    <xi:include href="hybrid/section_introduction_hybrid.xml"/>
    <xi:include href="hybrid/section_user_requirements_hybrid.xml"/>
    <xi:include href="hybrid/section_tech_considerations_hybrid.xml"/>
    <xi:include href="hybrid/section_operational_considerations_hybrid.xml"/>
    <xi:include href="hybrid/section_architecture_hybrid.xml"/>
    <xi:include href="hybrid/section_prescriptive_examples_hybrid.xml"/>

</chapter>
15
doc/arch-design/ch_introduction.xml
Normal file
@@ -0,0 +1,15 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="introduction">
    <title>Introduction</title>

    <xi:include href="introduction/section_introduction_to_openstack_architecture_design_guide.xml"/>
    <xi:include href="introduction/section_intended_audience.xml"/>
    <xi:include href="introduction/section_how_this_book_is_organized.xml"/>
    <xi:include href="introduction/section_how_this_book_was_written.xml"/>
    <xi:include href="introduction/section_methodology.xml"/>

</chapter>
14
doc/arch-design/ch_massively_scalable.xml
Normal file
@ -0,0 +1,14 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="massively_scalable">
    <title>Massively Scalable</title>

    <xi:include href="massively_scalable/section_introduction_massively_scalable.xml"/>
    <xi:include href="massively_scalable/section_user_requirements_massively_scalable.xml"/>
    <xi:include href="massively_scalable/section_tech_considerations_massively_scalable.xml"/>
    <xi:include href="massively_scalable/section_operational_considerations_massively_scalable.xml"/>

</chapter>
16
doc/arch-design/ch_multi_site.xml
Normal file
@ -0,0 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="multi_site">
    <title>Multi-Site</title>

    <xi:include href="multi_site/section_introduction_multi_site.xml"/>
    <xi:include href="multi_site/section_user_requirements_multi_site.xml"/>
    <xi:include href="multi_site/section_tech_considerations_multi_site.xml"/>
    <xi:include href="multi_site/section_operational_considerations_multi_site.xml"/>
    <xi:include href="multi_site/section_architecture_multi_site.xml"/>
    <xi:include href="multi_site/section_prescriptive_examples_multi_site.xml"/>

</chapter>
16
doc/arch-design/ch_network_focus.xml
Normal file
@ -0,0 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="network_focus">
    <title>Network Focused</title>

    <xi:include href="network_focus/section_introduction_network_focus.xml"/>
    <xi:include href="network_focus/section_user_requirements_network_focus.xml"/>
    <xi:include href="network_focus/section_tech_considerations_network_focus.xml"/>
    <xi:include href="network_focus/section_operational_considerations_network_focus.xml"/>
    <xi:include href="network_focus/section_architecture_network_focus.xml"/>
    <xi:include href="network_focus/section_prescriptive_examples_network_focus.xml"/>

</chapter>
77
doc/arch-design/ch_references.xml
Normal file
@ -0,0 +1,77 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="arch-design-references">
    <?dbhtml stop-chunking?>
    <title>References</title>
    <para>Data Protection framework of the European Union:
        http://ec.europa.eu/justice/data-protection/ - Guidance on
        Data Protection laws governed by the EU</para>
    <para>Depletion of IPv4 Addresses:
        http://www.internetsociety.org/deploy360/blog/2014/05/goodbye-ipv4-iana-starts-allocating-final-address-blocks/ -
        Article describing the depletion of IPv4 addresses and why
        the migration to IPv6 is inevitable</para>
    <para>Ethernet Switch Reliability:
        http://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf -
        Research white paper on Ethernet switch reliability</para>
    <para>Financial Industry Regulatory Authority:
        http://www.finra.org/Industry/Regulation/FINRARules/ -
        Requirements of the Financial Industry Regulatory Authority
        in the USA</para>
    <para>Image Service property keys:
        http://docs.openstack.org/cli-reference/content/chapter_cli-glance-property.html -
        Glance API property keys allow the administrator to attach
        custom characteristics to images</para>
    <para>LibGuestFS Documentation: http://libguestfs.org -
        Official LibGuestFS documentation</para>
    <para>Logging and Monitoring:
        http://docs.openstack.org/openstack-ops/content/logging_monitoring.html -
        Official OpenStack Operations documentation</para>
    <para>ManageIQ Cloud Management Platform: http://manageiq.org/ -
        An open source cloud management platform for managing
        multiple clouds</para>
    <para>N-Tron Network Availability:
        http://www.n-tron.com/pdf/network_availability.pdf -
        Research white paper on network availability</para>
    <para>Nested KVM:
        http://davejingtian.org/2014/03/30/nested-kvm-just-for-fun -
        Blog post on how to nest KVM under KVM</para>
    <para>Open Compute Project: http://www.opencompute.org/ - The
        Open Compute Project Foundation’s mission is to design and
        enable the delivery of the most efficient server, storage and
        data center hardware designs for scalable computing</para>
    <para>OpenStack Flavors:
        http://docs.openstack.org/openstack-ops/content/flavors.html -
        Official OpenStack documentation</para>
    <para>OpenStack High Availability Guide:
        http://docs.openstack.org/high-availability-guide/content/ -
        Information on how to provide redundancy for the OpenStack
        components</para>
    <para>OpenStack Hypervisor Support Matrix:
        https://wiki.openstack.org/wiki/HypervisorSupportMatrix -
        Matrix of supported hypervisors and capabilities when used
        with OpenStack</para>
    <para>OpenStack Object Store (Swift) Replication Reference:
        http://docs.openstack.org/developer/swift/replication_network.html -
        Developer documentation of Swift replication</para>
    <para>OpenStack Operations Guide:
        http://docs.openstack.org/openstack-ops/ - The OpenStack
        Operations Guide provides information on setting up and
        installing OpenStack</para>
    <para>OpenStack Security Guide:
        http://docs.openstack.org/security-guide/ - The OpenStack
        Security Guide provides information on securing OpenStack
        deployments</para>
    <para>OpenStack Training Marketplace:
        http://www.openstack.org/marketplace/training - The
        OpenStack Marketplace for training and vendors providing
        training on OpenStack</para>
    <para>PCI passthrough:
        https://wiki.openstack.org/wiki/Pci_passthrough#How_to_check_PCI_status_with_PCI_api_paches -
        The PCI API patches extend the servers/os-hypervisor to show
        PCI information for instances and compute nodes, and also
        provide a resource endpoint to show PCI information</para>
    <para>TripleO: https://wiki.openstack.org/wiki/TripleO - TripleO
        is a program aimed at installing, upgrading and operating
        OpenStack clouds using OpenStack's own cloud facilities as
        the foundation</para>
</chapter>
17
doc/arch-design/ch_specialized.xml
Normal file
@ -0,0 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="specialized">
    <title>Specialized Cases</title>

    <xi:include href="specialized/section_introduction_specialized.xml"/>
    <xi:include href="specialized/section_multi_hypervisor_specialized.xml"/>
    <xi:include href="specialized/section_networking_specialized.xml"/>
    <xi:include href="specialized/section_software_defined_networking_specialized.xml"/>
    <xi:include href="specialized/section_desktop_as_a_service_specialized.xml"/>
    <xi:include href="specialized/section_openstack_on_openstack_specialized.xml"/>
    <xi:include href="specialized/section_hardware_specialized.xml"/>

</chapter>
16
doc/arch-design/ch_storage_focus.xml
Normal file
@ -0,0 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="storage_focus">
    <title>Storage Focused</title>

    <xi:include href="storage_focus/section_introduction_storage_focus.xml"/>
    <xi:include href="storage_focus/section_user_requirements_storage_focus.xml"/>
    <xi:include href="storage_focus/section_tech_considerations_storage_focus.xml"/>
    <xi:include href="storage_focus/section_operational_considerations_storage_focus.xml"/>
    <xi:include href="storage_focus/section_architecture_storage_focus.xml"/>
    <xi:include href="storage_focus/section_prescriptive_examples_storage_focus.xml"/>

</chapter>
@ -0,0 +1,879 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="arch-design-architecture-hardware">
    <?dbhtml stop-chunking?>
    <title>Architecture</title>
    <para>The hardware selection covers three areas:</para>
    <itemizedlist>
        <listitem>
            <para>Compute</para>
        </listitem>
        <listitem>
            <para>Network</para>
        </listitem>
        <listitem>
            <para>Storage</para>
        </listitem>
    </itemizedlist>
    <para>In a compute-focused OpenStack cloud, the hardware
        selection must reflect the compute-intensive nature of the
        workloads. Compute-focused is defined as having extreme
        demands on processor and memory resources. These workloads
        are not storage intensive, nor are they consistently network
        intensive. The network and storage may be heavily utilized
        while loading a data set into the computational cluster, but
        they are not otherwise intensive.</para>
    <para>Compute (server) hardware must be evaluated against four
        opposing dimensions:</para>
    <itemizedlist>
        <listitem>
            <para>Server density: A measure of how many servers can
                fit into a given measure of physical space, such as
                a rack unit [U].</para>
        </listitem>
        <listitem>
            <para>Resource capacity: The number of CPU cores, how
                much RAM, or how much storage a given server will
                deliver.</para>
        </listitem>
        <listitem>
            <para>Expandability: The number of additional resources
                that can be added to a server before it has reached
                its limit.</para>
        </listitem>
        <listitem>
            <para>Cost: The relative purchase price of the hardware
                weighted against the level of design effort needed
                to build the system.</para>
        </listitem>
    </itemizedlist>
    <para>The dimensions need to be weighted against each other to
        determine the best design for the desired purpose. For
        example, increasing server density means sacrificing
        resource capacity or expandability. Increasing resource
        capacity and expandability can increase cost but decrease
        server density. Decreasing cost can mean decreasing
        supportability, server density, resource capacity, and
        expandability.</para>
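The weighting exercise described above can be made concrete with a small scoring sketch. The dimension weights and the 1-5 candidate scores below are purely illustrative assumptions, not figures from this guide.

```python
# Illustrative weighted decision matrix for compute hardware selection.
# Weights and 1-5 scores are hypothetical examples, not recommendations.
weights = {"density": 0.2, "capacity": 0.4, "expandability": 0.1, "cost": 0.3}

candidates = {
    "blade":   {"density": 5, "capacity": 3, "expandability": 2, "cost": 2},
    "1U rack": {"density": 4, "capacity": 3, "expandability": 2, "cost": 4},
    "4U rack": {"density": 1, "capacity": 5, "expandability": 5, "cost": 2},
}

def score(attrs):
    # Weighted sum across the four opposing dimensions.
    return sum(weights[dim] * value for dim, value in attrs.items())

# Rank candidates by their weighted score, best first.
for name, attrs in sorted(candidates.items(), key=lambda kv: -score(kv[1])):
    print(f"{name}: {score(attrs):.2f}")
```

Changing the weights (for example, raising "capacity" for a compute-focused design) reorders the ranking, which is exactly the trade-off the paragraph describes.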
    <para>Selection of hardware for a compute-focused cloud should
        emphasize server hardware that can offer more CPU sockets,
        more CPU cores, and more RAM; network connectivity and
        storage capacity are less critical. The hardware will need
        to be configured to provide enough network connectivity and
        storage capacity to meet minimum user requirements, but
        these are not the primary consideration.</para>
    <para>Some server hardware form factors are better suited than
        others, as CPU and RAM capacity have the highest
        priority.</para>
    <itemizedlist>
        <listitem>
            <para>Most blade servers can support dual-socket
                multi-core CPUs. Avoiding this CPU limit means
                selecting "full width" or "full height" blades,
                which consequently reduces server density. For
                example, high-density blade servers (such as HP
                BladeSystem or Dell PowerEdge M1000e) that support
                up to 16 servers in only 10 rack units with
                half-height blades lose 50% of their density when
                full-height blades are selected, resulting in only
                8 servers per 10 rack units.</para>
        </listitem>
        <listitem>
            <para>1U rack-mounted servers (servers that occupy only
                a single rack unit) may be able to offer greater
                server density than a blade server solution. It is
                possible to place 40 servers in a rack, leaving
                space for the top of rack (ToR) switches, versus 32
                "full width" or "full height" blade servers in a
                rack, but 1U servers are often limited to
                dual-socket, multi-core CPU configurations. Note
                that, as of the Icehouse release, neither HP, IBM,
                nor Dell offered 1U rack servers with more than 2
                CPU sockets. To obtain greater than dual-socket
                support in a 1U rack-mount form factor, customers
                need to buy their systems from Original Design
                Manufacturers (ODMs) or second-tier manufacturers.
                This may cause issues for organizations that have
                preferred vendor policies or concerns with support
                and hardware warranties of non-tier 1
                vendors.</para>
        </listitem>
        <listitem>
            <para>2U rack-mounted servers provide quad-socket,
                multi-core CPU support, but with a corresponding
                decrease in server density (half the density
                offered by 1U rack-mounted servers).</para>
        </listitem>
        <listitem>
            <para>Larger rack-mounted servers, such as 4U servers,
                often provide even greater CPU capacity, commonly
                supporting four or even eight CPU sockets. These
                servers have greater expandability, but such
                servers have much lower server density and usually
                greater hardware cost.</para>
        </listitem>
        <listitem>
            <para>"Sled servers" (rack-mounted servers that support
                multiple independent servers in a single 2U or 3U
                enclosure) deliver increased density as compared to
                typical 1U or 2U rack-mounted servers. For example,
                many sled servers offer four independent
                dual-socket nodes in 2U for a total of eight CPU
                sockets in 2U. However, the dual-socket limitation
                on individual nodes may not be sufficient to offset
                their additional cost and configuration
                complexity.</para>
        </listitem>
    </itemizedlist>
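As a rough illustration of the density trade-offs in the list above, the figures quoted in this section (16 half-height blades per 10U, 8 full-height blades per 10U, 40 1U servers per rack, four sled nodes per 2U) can be turned into servers-per-rack arithmetic. The assumption of 40 usable rack units in a 42U rack (leaving 2U for ToR switches) is illustrative only.

```python
# Servers obtainable from one rack, using the form-factor figures quoted
# in this section. USABLE_U assumes a 42U rack with 2U left for ToR
# switches (an illustrative assumption, not a requirement).
USABLE_U = 40

# form factor -> (rack units per enclosure, independent servers per enclosure)
form_factors = {
    "half-height blade": (10, 16),
    "full-height blade": (10, 8),
    "1U rack-mount":     (1, 1),
    "2U rack-mount":     (2, 1),
    "sled (4 nodes/2U)": (2, 4),
}

for name, (units, servers) in form_factors.items():
    enclosures = USABLE_U // units       # whole enclosures that fit
    print(f"{name}: {enclosures * servers} servers per rack")
```

This reproduces the text's comparison: 64 half-height blades versus 40 1U servers per rack, with full-height blades dropping to 32.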
    <para>The following factors will strongly influence server
        hardware selection for a compute-focused OpenStack design
        architecture:</para>
    <itemizedlist>
        <listitem>
            <para>Instance density: In this architecture instance
                density is considered lower; therefore CPU and RAM
                over-subscription ratios are also lower. Because
                instance density is lower, more hosts will be
                required to support the anticipated scale,
                especially if the design uses dual-socket
                hardware.</para>
        </listitem>
        <listitem>
            <para>Host density: Another option to address the higher
                host count that might be needed with dual-socket
                designs is to use a quad-socket platform. Taking
                this approach will decrease host density, which
                increases rack count. This configuration may affect
                the network requirements, the number of power
                connections, and possibly the cooling
                requirements.</para>
        </listitem>
        <listitem>
            <para>Power and cooling density: The power and cooling
                density requirements might be lower than with
                blade, sled, or 1U server designs because of the
                lower host density (by using 2U, 3U or even 4U
                server designs). For data centers with older
                infrastructure, this may be a desirable
                feature.</para>
        </listitem>
    </itemizedlist>
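The effect of low over-subscription ratios on host count, noted in the instance-density item above, can be sketched numerically. The instance counts, core counts, and ratios below are hypothetical; only the relationship (lower ratio, more hosts) comes from the text.

```python
import math

def hosts_required(instances, vcpus_per_instance, cores_per_host, cpu_ratio):
    """Estimate compute hosts needed for a target instance count.

    cpu_ratio is the vCPU:pCPU over-subscription ratio. Compute-focused
    designs use low ratios (often 1:1), which raises the host count.
    """
    vcpus_available = cores_per_host * cpu_ratio
    instances_per_host = vcpus_available // vcpus_per_instance
    return math.ceil(instances / instances_per_host)

# Hypothetical dual-socket host with 2 x 12 cores, 4-vCPU instances:
print(hosts_required(1000, 4, 24, 1))  # compute-focused 1:1 ratio
print(hosts_required(1000, 4, 24, 4))  # general-purpose 4:1 ratio
```

At a 1:1 ratio each host holds far fewer instances than at 4:1, so the same 1,000-instance target demands several times as many hosts.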
    <para>Compute-focused OpenStack design architecture server
        hardware selection results in a "scale up" versus "scale
        out" decision. Whether the better solution is a smaller
        number of larger hosts or a larger number of smaller hosts
        depends on a combination of factors: cost, power, cooling,
        physical rack and floor space, support-warranty, and
        manageability.</para>
    <section xml:id="storage-hardware-selection">
        <title>Storage Hardware Selection</title>
        <para>For a compute-focused OpenStack design architecture,
            the selection of storage hardware is not critical, as it
            is not a primary criterion; however, it is still
            important. There are a number of different factors that
            a cloud architect must consider:</para>
        <itemizedlist>
            <listitem>
                <para>Cost: The overall cost of the solution will
                    play a major role in what storage architecture
                    (and resulting storage hardware) is
                    selected.</para>
            </listitem>
            <listitem>
                <para>Performance: The performance of the solution
                    also plays a big role, and can be measured by
                    observing the latency of storage I/O requests.
                    In a compute-focused OpenStack cloud, storage
                    latency can be a major consideration. In some
                    compute-intensive workloads, minimizing the
                    delays that the CPU experiences while fetching
                    data from the storage can have a significant
                    impact on the overall performance of the
                    application.</para>
            </listitem>
            <listitem>
                <para>Scalability: This section uses the term
                    "scalability" to refer to how well the storage
                    solution performs as it is expanded up to its
                    maximum size. A storage solution that performs
                    well in small configurations but has degrading
                    performance as it expands would not be
                    considered scalable. On the other hand, a
                    solution that continues to perform well at
                    maximum expansion would be considered
                    scalable.</para>
            </listitem>
            <listitem>
                <para>Expandability: Expandability refers to the
                    overall ability of the solution to grow. A
                    storage solution that expands to 50 PB is
                    considered more expandable than a solution that
                    only scales to 10 PB. Note that this metric is
                    related to, but different from, scalability,
                    which is a measure of the solution's performance
                    as it expands.</para>
            </listitem>
        </itemizedlist>
        <para>For a compute-focused OpenStack cloud, latency of
            storage is a major consideration. Using solid-state
            disks (SSDs) to minimize latency for instance storage
            and reduce CPU delays caused by waiting for the storage
            will increase performance. Consider using RAID
            controller cards in compute hosts to improve the
            performance of the underlying disk subsystem.</para>
        <para>The selection of storage architecture, and the
            corresponding storage hardware (if there is the option),
            is determined by evaluating possible solutions against
            the key factors listed above. This will determine if a
            scale-out solution (such as Ceph, GlusterFS, or similar)
            should be used, or if a single, highly expandable and
            scalable centralized storage array would be a better
            choice. If a centralized storage array is the right fit
            for the requirements, the hardware will be determined by
            the array vendor. It is also possible to build a storage
            array using commodity hardware with open source
            software, but there needs to be access to people with
            expertise to build such a system. Conversely, a
            scale-out storage solution that uses direct-attached
            storage (DAS) in the servers may be an appropriate
            choice. If so, then the server hardware needs to be
            configured to support the storage solution.</para>
        <para>The following lists some of the potential impacts that
            may affect a particular storage architecture, and the
            corresponding storage hardware, of a compute-focused
            OpenStack cloud:</para>
        <itemizedlist>
            <listitem>
                <para>Connectivity: Based on the storage solution
                    selected, ensure the connectivity matches the
                    storage solution requirements. If a centralized
                    storage array is selected, it is important to
                    determine how the hypervisors will connect to
                    the storage array. Connectivity could affect
                    latency and thus performance, so check that the
                    network characteristics will minimize latency to
                    boost the overall performance of the
                    design.</para>
            </listitem>
            <listitem>
                <para>Latency: Determine if the use case will have
                    consistent or highly variable latency.</para>
            </listitem>
            <listitem>
                <para>Throughput: To improve overall performance,
                    make sure that the storage solution throughput
                    is optimized. While it is not likely that a
                    compute-focused cloud will have major data I/O
                    to and from storage, this is an important factor
                    to consider.</para>
            </listitem>
            <listitem>
                <para>Server hardware: If the solution uses DAS,
                    this impacts the server hardware choice, which
                    in turn ripples into host density, instance
                    density, power density, OS-hypervisor choice,
                    and management tools.</para>
            </listitem>
        </itemizedlist>
        <para>Where instances need to be made highly available, or
            they need to be capable of migration between hosts, a
            shared storage file system should be used to store
            instance ephemeral data to ensure that compute services
            can run uninterrupted in the event of a node
            failure.</para>
    </section>
    <section xml:id="selecting-networking-hardware-arch">
        <title>Selecting Networking Hardware</title>
        <para>Some of the key considerations that should be included
            in the selection of networking hardware include:</para>
        <itemizedlist>
            <listitem>
                <para>Port count: The design will require networking
                    hardware that has the requisite port
                    count.</para>
            </listitem>
            <listitem>
                <para>Port density: The network design will be
                    affected by the physical space that is required
                    to provide the requisite port count. A switch
                    that can provide 48 10 GbE ports in 1U has a
                    much higher port density than a switch that
                    provides 24 10 GbE ports in 2U. A higher port
                    density is preferred, as it leaves more rack
                    space for compute or storage components that
                    might be required by the design. This also leads
                    into concerns about fault domains and power
                    density that must be considered. Higher density
                    switches are also more expensive, so it is
                    important not to over-design the network if it
                    is not required.</para>
            </listitem>
            <listitem>
                <para>Port speed: The networking hardware must
                    support the proposed network speed, for example:
                    1 GbE, 10 GbE, or 40 GbE (or even 100
                    GbE).</para>
            </listitem>
            <listitem>
                <para>Redundancy: The level of network hardware
                    redundancy required is influenced by the user
                    requirements for high availability and cost
                    considerations. Network redundancy can be
                    achieved by adding redundant power supplies or
                    paired switches. If this is a requirement, the
                    hardware will need to support this
                    configuration. User requirements will determine
                    if a completely redundant network infrastructure
                    is required.</para>
            </listitem>
            <listitem>
                <para>Power requirements: Ensure that the physical
                    data center provides the necessary power for the
                    selected network hardware. This is not an issue
                    for top of rack (ToR) switches, but may be an
                    issue for spine switches in a leaf and spine
                    fabric, or end of row (EoR) switches.</para>
            </listitem>
        </itemizedlist>
        <para>It is important to first understand additional factors
            as well as the use case, because these additional
            factors heavily influence the cloud network
            architecture. Once these key considerations have been
            decided, the proper network can be designed to best
            serve the workloads being placed in the cloud.</para>
        <para>It is recommended that the network architecture be
            designed using a scalable network model that makes it
            easy to add capacity and bandwidth. A good example of
            such a model is the leaf-spine model. In this type of
            network design, it is possible to easily add additional
            bandwidth as well as scale out to additional racks of
            gear. It is important to select network hardware that
            will support the required port count, port speed, and
            port density, while also allowing for future growth as
            workload demands increase. It is also important to
            evaluate where in the network architecture it is
            valuable to provide redundancy. Increased network
            availability and redundancy comes at a cost, therefore
            it is recommended to weigh the cost versus the benefit
            gained from utilizing and deploying redundant network
            switches and using bonded interfaces at the host
            level.</para>
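As a rough sketch of the capacity planning implied by a leaf-spine design, the oversubscription ratio of a leaf switch can be computed from its host-facing and spine-facing bandwidth. The port counts and speeds below are hypothetical examples, not figures from this guide.

```python
def leaf_oversubscription(downlinks, downlink_gbps, uplinks, uplink_gbps):
    # Ratio of host-facing bandwidth to spine-facing bandwidth.
    # 1.0 means non-blocking; higher values mean the uplinks are
    # oversubscribed and may become a bottleneck under load.
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)

# Hypothetical leaf: 48 x 10 GbE host ports, 4 x 40 GbE spine uplinks.
print(leaf_oversubscription(48, 10, 4, 40))  # 3.0
```

Adding spine uplinks (or faster uplinks) lowers the ratio, which is how a leaf-spine fabric "adds bandwidth" without redesigning the host-facing side.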
    </section>
    <section xml:id="software-selection-arch">
        <title>Software Selection</title>
        <para>Selecting software to be included in a compute-focused
            OpenStack architecture design must include three main
            areas:</para>
        <itemizedlist>
            <listitem>
                <para>Operating system (OS) and hypervisor</para>
            </listitem>
            <listitem>
                <para>OpenStack components</para>
            </listitem>
            <listitem>
                <para>Supplemental software</para>
            </listitem>
        </itemizedlist>
        <para>Design decisions made in each of these areas impact
            the rest of the OpenStack architecture design.</para>
    </section>
    <section xml:id="os-and-hypervisor-arch">
        <title>OS and Hypervisor</title>
        <para>The selection of OS and hypervisor has a significant
            impact on the end point design. Selecting a particular
            operating system and hypervisor could affect server
            hardware selection; for example, the selected
            combination needs to be supported on the selected
            hardware. Ensure that the storage hardware selection and
            topology support the selected operating system and
            hypervisor combination. Additionally, make sure that the
            networking hardware selection and topology will work
            with the chosen operating system and hypervisor
            combination. For example, if the design uses Link
            Aggregation Control Protocol (LACP), the hypervisor
            needs to support it.</para>
        <para>Some areas that could be impacted by the selection of
            OS and hypervisor include:</para>
        <itemizedlist>
            <listitem>
                <para>Cost: Selecting a commercially supported
                    hypervisor, such as Microsoft Hyper-V, will
                    result in a different cost model than choosing a
                    community-supported open source hypervisor like
                    KVM or Xen. Even within the ranks of open source
                    solutions, choosing Ubuntu over Red Hat (or vice
                    versa) will have an impact on cost due to
                    support contracts. On the other hand, business
                    or application requirements might dictate a
                    specific or commercially supported
                    hypervisor.</para>
            </listitem>
            <listitem>
                <para>Supportability: Depending on the selected
                    hypervisor, the staff should have the
                    appropriate training and knowledge to support
                    the selected OS and hypervisor combination. If
                    they do not, training will need to be provided,
                    which could have a cost impact on the
                    design.</para>
            </listitem>
            <listitem>
                <para>Management tools: The management tools used
                    for Ubuntu and KVM differ from the management
                    tools for VMware vSphere. Although both OS and
                    hypervisor combinations are supported by
                    OpenStack, there will be very different impacts
                    to the rest of the design as a result of the
                    selection of one combination versus the
                    other.</para>
            </listitem>
            <listitem>
                <para>Scale and performance: Ensure that the
                    selected OS and hypervisor combination meets the
                    appropriate scale and performance requirements.
                    The chosen architecture will need to meet the
                    targeted instance-host ratios with the selected
                    OS-hypervisor combination.</para>
            </listitem>
            <listitem>
                <para>Security: Ensure that the design can
                    accommodate the regular periodic installation of
                    application security patches while maintaining
                    the required workloads. The frequency of
                    security patches for the proposed OS-hypervisor
                    combination will have an impact on performance,
                    and the patch installation process could affect
                    maintenance windows.</para>
            </listitem>
            <listitem>
                <para>Supported features: Determine what features of
                    OpenStack are required. This will often
                    determine the selection of the OS-hypervisor
                    combination. Certain features are only available
                    with specific OSs or hypervisors; if required
                    features are not available, the design might
                    need to be modified to meet the user
                    requirements.</para>
            </listitem>
            <listitem>
                <para>Interoperability: Consideration should be
                    given to the ability of the selected
                    OS-hypervisor combination to interoperate or
                    co-exist with other OS-hypervisors, or other
                    software solutions in the overall design (if
                    required). Operational and troubleshooting tools
                    for one OS-hypervisor combination may differ
                    from the tools used for another OS-hypervisor
                    combination and, as a result, the design will
                    need to address whether the two sets of tools
                    need to interoperate.</para>
            </listitem>
        </itemizedlist>
</section>
|
||||
            <section xml:id="openstack-components-arch">
                <title>OpenStack Components</title>
                <para>The selection of which OpenStack components will
                    actually be included in the design and deployed has a
                    significant impact. Certain components will always be
                    present (Compute (Nova) and Image (Glance), for
                    example), while other services might not be needed.
                    For example, a certain design may not require
                    Orchestration (Heat), and omitting Heat would not
                    typically have a significant impact on the overall
                    design. However, if the architecture uses a
                    replacement for OpenStack Object Storage (Swift) as
                    its storage component, that choice could potentially
                    have significant impacts on the rest of the
                    design.</para>
                <para>For a compute-focused OpenStack design
                    architecture, the following components would be
                    used:</para>
                <itemizedlist>
                    <listitem>
                        <para>Identity (Keystone)</para>
                    </listitem>
                    <listitem>
                        <para>Dashboard (Horizon)</para>
                    </listitem>
                    <listitem>
                        <para>Compute (Nova)</para>
                    </listitem>
                    <listitem>
                        <para>Object Storage (Swift, Ceph, or a
                            commercial solution)</para>
                    </listitem>
                    <listitem>
                        <para>Image (Glance)</para>
                    </listitem>
                    <listitem>
                        <para>Networking (Neutron)</para>
                    </listitem>
                    <listitem>
                        <para>Orchestration (Heat)</para>
                    </listitem>
                </itemizedlist>
                <para>OpenStack Block Storage would potentially not be
                    incorporated into a compute-focused design, because
                    persistent block storage is not a significant
                    requirement for the types of workloads that would be
                    deployed onto instances running in a compute-focused
                    cloud. However, there may be some situations where
                    the need for performance dictates that a block
                    storage component be used to improve data
                    I/O.</para>
                <para>The exclusion of certain OpenStack components
                    might also limit or constrain the functionality of
                    other components. If a design opts to include Heat
                    but exclude Ceilometer, the design will not be able
                    to take advantage of Heat's auto-scaling
                    functionality, which relies on information from
                    Ceilometer. Because Heat can be used to spin up a
                    large number of instances to perform
                    compute-intensive processing, including Heat in a
                    compute-focused architecture design is strongly
                    recommended.</para>
            </section>
            <section xml:id="supplemental-software">
                <title>Supplemental Software</title>
                <para>While OpenStack is a fairly complete collection of
                    software projects for building a platform for cloud
                    services, there are invariably additional pieces of
                    software that might need to be added to any given
                    OpenStack design.</para>
                <section xml:id="networking-software-arch">
                    <title>Networking Software</title>
                    <para>OpenStack Networking provides a wide variety
                        of networking services for instances. There are
                        also many additional networking software
                        packages that might be useful to manage the
                        OpenStack components themselves. Some examples
                        include software to provide load balancing,
                        network redundancy protocols, and routing
                        daemons. Some of these software packages are
                        described in more detail in the OpenStack High
                        Availability Guide.</para>
                    <para>For a compute-focused OpenStack cloud, the
                        OpenStack infrastructure components will need to
                        be highly available. If the design does not
                        include hardware load balancing, networking
                        software packages like HAProxy will need to be
                        included.</para>
                </section>
                <section xml:id="management-software-arch">
                    <title>Management Software</title>
                    <para>The selected supplemental software solution
                        impacts and affects the overall OpenStack cloud
                        design. This includes software for providing
                        clustering, logging, monitoring, and
                        alerting.</para>
                    <para>The inclusion of clustering software, such as
                        Corosync or Pacemaker, is determined primarily
                        by the availability design requirements.
                        Therefore, the impact of including (or not
                        including) these software packages is primarily
                        determined by the availability requirements of
                        the cloud infrastructure and the complexity of
                        supporting the configuration after it is
                        deployed. The OpenStack High Availability Guide
                        provides more details on the installation and
                        configuration of Corosync and Pacemaker, should
                        these packages need to be included in the
                        design.</para>
                    <para>Requirements for logging, monitoring, and
                        alerting are determined by operational
                        considerations. Each of these sub-categories
                        includes a number of options. For example, in
                        the logging sub-category one might consider
                        Logstash, Splunk, Log Insight, or some other log
                        aggregation-consolidation tool. Logs should be
                        stored in a centralized location to make it
                        easier to perform analytics against the data.
                        Log data analytics engines can also provide
                        automation and issue notification, by providing
                        a mechanism to both alert and automatically
                        attempt to remediate some of the more commonly
                        known issues.</para>
                    <para>If any of these software packages are needed,
                        then the design must account for the additional
                        resource consumption (CPU, RAM, storage, and
                        network bandwidth for a log aggregation
                        solution, for example). Some other potential
                        design impacts include:</para>
                    <itemizedlist>
                        <listitem>
                            <para>OS-hypervisor combination: Ensure that
                                the selected logging, monitoring, or
                                alerting tools support the proposed
                                OS-hypervisor combination.</para>
                        </listitem>
                        <listitem>
                            <para>Network hardware: The network hardware
                                selection needs to be supported by the
                                logging, monitoring, and alerting
                                software.</para>
                        </listitem>
                    </itemizedlist>
                </section>
                <section xml:id="database-software-arch">
                    <title>Database Software</title>
                    <para>The large majority of OpenStack components
                        require access to back-end database services to
                        store state and configuration information.
                        Selection of an appropriate back-end database
                        that will satisfy the availability and fault
                        tolerance requirements of the OpenStack services
                        is required. OpenStack services support
                        connecting to any database that is supported by
                        the SQLAlchemy Python drivers; however, most
                        common database deployments make use of MySQL or
                        some variation of it. It is recommended that the
                        database which provides back-end services within
                        a general purpose cloud be made highly available
                        using an available technology which can
                        accomplish that goal. Some of the more common
                        software solutions used include Galera, MariaDB,
                        and MySQL with multi-master replication.</para>
                </section>
            </section>
        </section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="arch-guide-intro-compute-focus">
    <title>Introduction</title>
    <para>A compute-focused cloud is a specialized subset of the
        general purpose OpenStack cloud architecture. Unlike the
        general purpose OpenStack architecture, which is built to
        host a wide variety of workloads and applications and does
        not heavily tax any particular computing aspect, a
        compute-focused cloud is built and designed specifically to
        support compute-intensive workloads. As such, the design
        must be specifically tailored to support hosting
        compute-intensive workloads. Such workloads may be CPU
        intensive, RAM intensive, or both; however, they are not
        typically storage intensive or network intensive.
        Compute-focused workloads may include the following use
        cases:</para>
    <itemizedlist>
        <listitem>
            <para>High performance computing (HPC)</para>
        </listitem>
        <listitem>
            <para>Big data analytics using Hadoop or other
                distributed data stores</para>
        </listitem>
        <listitem>
            <para>Continuous integration/continuous deployment
                (CI/CD)</para>
        </listitem>
        <listitem>
            <para>Platform-as-a-Service (PaaS)</para>
        </listitem>
        <listitem>
            <para>Signal processing for Network Function
                Virtualization (NFV)</para>
        </listitem>
    </itemizedlist>
    <para>Based on the use case requirements, such clouds might need
        to provide additional services such as a virtual machine
        disk library, file or object storage, firewalls, load
        balancers, IP addresses, and network connectivity in the
        form of overlays or virtual Local Area Networks (VLANs). A
        compute-focused OpenStack cloud will not typically use raw
        block storage services, since the applications hosted on a
        compute-focused OpenStack cloud generally do not need
        persistent block storage.</para>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="operational-considerations-compute-focus">
    <?dbhtml stop-chunking?>
    <title>Operational Considerations</title>
    <para>Operationally, there are a number of considerations that
        affect the design of compute-focused OpenStack clouds. Some
        examples include enforcing strict API availability
        requirements, understanding and dealing with failure
        scenarios, and managing host maintenance schedules.</para>
    <para>Service-level agreements (SLAs) are contractual
        obligations that give assurances around the availability of
        a provided service. As such, factoring in promises of
        availability implies a certain level of redundancy and
        resiliency when designing an OpenStack cloud.</para>
    <itemizedlist>
        <listitem>
            <para>Guarantees for API availability imply multiple
                infrastructure services combined with highly
                available load balancers.</para>
        </listitem>
        <listitem>
            <para>Network uptime guarantees will affect the switch
                design and might require redundant switching and
                power.</para>
        </listitem>
        <listitem>
            <para>Network security policy requirements need to be
                factored in to deployments.</para>
        </listitem>
    </itemizedlist>
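As a rough illustration of why availability promises drive redundancy, the combined availability of independent redundant components can be modeled as follows. This is a hedged back-of-the-envelope sketch; the 99% per-instance availability figure is an illustrative assumption, not a measured value.

```python
# Illustrative sketch: redundancy versus availability. The 99%
# per-instance availability figure is an assumed example value.
def redundant_availability(per_instance: float, count: int) -> float:
    """Availability of N independent, redundant instances: the
    service is down only when every instance is down at once."""
    return 1 - (1 - per_instance) ** count

single = redundant_availability(0.99, 1)  # one load balancer
paired = redundant_availability(0.99, 2)  # an HA pair
print(f"{single:.4f} -> {paired:.4f}")  # → 0.9900 -> 0.9999
```

Under this simple model, adding a second redundant instance turns "two nines" into "four nines", which is why SLA-driven designs pair their load balancers and API services.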
    <para>Knowing when and where to implement redundancy and high
        availability (HA) is directly affected by the terms
        contained in any associated SLA, if one is present.</para>
    <section xml:id="support-and-maintainability-compute-focus">
        <title>Support and Maintainability</title>
        <para>OpenStack cloud management requires operations staff
            to be able to understand the design architecture content
            on some level. The level of skill and the degree of
            separation between the operations and engineering staff
            depend on the size and purpose of the installation. A
            large cloud service provider or telecom provider is more
            inclined to be managed by a specially trained, dedicated
            operations organization. A smaller implementation is
            more inclined to rely on a smaller support staff that
            might need to take on the combined engineering, design,
            and operations functions.</para>
        <para>Maintaining OpenStack installations requires a variety
            of technical skills. Some of these skills may include
            the ability to debug Python log output to a basic level
            as well as an understanding of networking
            concepts.</para>
        <para>Consider incorporating features into the architecture
            and design that reduce the operational burden. Some
            examples include automating some of the operations
            functions, or alternatively exploring the possibility of
            using a third-party management company with special
            expertise in managing OpenStack deployments.</para>
    </section>
    <section xml:id="montioring-compute-focus">
        <title>Monitoring</title>
        <para>Like any other infrastructure deployment, OpenStack
            clouds need an appropriate monitoring platform to ensure
            errors are caught and managed appropriately. Consider
            leveraging any existing monitoring system to see if it
            will be able to effectively monitor an OpenStack
            environment. While there are many aspects that need to
            be monitored, specific metrics that are critically
            important to capture include image disk utilization and
            response time to the Compute API.</para>
    </section>
    <section xml:id="expected-unexpected-server-downtime">
        <title>Expected and unexpected server downtime</title>
        <para>At some point, servers will fail. The SLAs in place
            affect how the design has to address recovery time.
            Recovery of a failed host may mean restoring instances
            from a snapshot, or respawning an instance on another
            available host, which in turn has consequences for the
            overall application design running on the OpenStack
            cloud.</para>
        <para>It might be acceptable to design a compute-focused
            cloud without the ability to migrate instances from one
            host to another, because the expectation is that the
            application developer must handle failure within the
            application itself. Conversely, a compute-focused cloud
            might be provisioned to provide extra resilience as a
            requirement of the business. In this scenario, it is
            expected that extra supporting services are also
            deployed, such as shared storage attached to hosts, to
            aid in the recovery and resiliency of services in order
            to meet strict SLAs.</para>
    </section>
    <section xml:id="capacity-planning-operational">
        <title>Capacity Planning</title>
        <para>Adding extra capacity to an OpenStack cloud is an easy
            horizontal scaling process, as consistently configured
            nodes automatically attach to an OpenStack cloud. Be
            mindful, however, of any additional work needed to place
            the nodes into appropriate Availability Zones and Host
            Aggregates if necessary. The same (or very similar) CPUs
            are recommended when adding extra nodes to the
            environment, because doing so reduces the chance of
            breaking any live-migration features if they are
            present. Scaling out hypervisor hosts also has a direct
            effect on network and other data center resources, so
            factor in this increase when reaching rack capacity or
            when extra network switches are required.</para>
        <para>Compute hosts can also have internal components
            changed to account for increases in demand, a process
            also known as vertical scaling. Swapping a CPU for one
            with more cores, or increasing the memory in a server,
            can help add extra needed capacity, depending on whether
            the running applications are more CPU intensive or
            memory intensive (as would be expected in a
            compute-focused OpenStack cloud).</para>
        <para>Another option is to assess the average workloads and
            increase the number of instances that can run within the
            compute environment by adjusting the overcommit ratio.
            While only appropriate in some environments, it is
            important to remember that raising the CPU overcommit
            ratio can have a detrimental effect: more instances will
            fail when a compute host fails, and the potential for
            noisy-neighbor issues increases. For these reasons,
            increasing the CPU overcommit ratio in a compute-focused
            OpenStack design architecture is not recommended.</para>
    </section>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="prescriptive-example-compute-focus">
    <?dbhtml stop-chunking?>
    <title>Prescriptive Examples</title>
    <para>The Conseil Européen pour la Recherche Nucléaire (CERN),
        also known as the European Organization for Nuclear
        Research, provides particle accelerators and other
        infrastructure for high-energy physics research.</para>
    <para>As of 2011, CERN operated two compute centers in Europe,
        with plans to add a third.</para>
    <para>To support a growing number of compute-heavy users of
        experiments related to the Large Hadron Collider (LHC),
        CERN ultimately elected to deploy an OpenStack cloud using
        Scientific Linux and RDO. This effort aimed to simplify the
        management of the center's compute resources, with a view
        to doubling compute capacity through the addition of an
        additional data center in 2013 while maintaining the same
        levels of compute staff.</para>
    <para>The CERN solution uses cells for segregation of compute
        resources and to transparently scale between different data
        centers. This decision meant trading off support for
        security groups and live migration. In addition, some
        details such as flavors needed to be manually replicated
        across cells. In spite of these drawbacks, cells were
        determined to provide the required scale while exposing a
        single public API endpoint to users.</para>
    <para>A compute cell was created for each of the two original
        data centers, and a third was created when a new data
        center was added in 2013. Each cell contains three
        availability zones to further segregate compute resources,
        and at least three RabbitMQ message brokers configured to
        be clustered with mirrored queues for high
        availability.</para>
    <para>The API cell, which resides behind an HAProxy load
        balancer, is located in the data center in Switzerland and
        directs API calls to compute cells using a customized
        variation of the cell scheduler. The customizations allow
        certain workloads to be directed to a specific data center
        or to "all" data centers, with cell selection determined by
        cell RAM availability in the latter case.</para>
    <mediaobject>
        <imageobject>
            <imagedata fileref="../images/Generic_CERN_Example.png"/>
        </imageobject>
    </mediaobject>
    <para>There is also some customization of the filter scheduler
        that handles placement within the cells:</para>
    <itemizedlist>
        <listitem>
            <para>ImagePropertiesFilter - provides special handling
                depending on the guest operating system in use
                (Linux-based or Windows-based).</para>
        </listitem>
        <listitem>
            <para>ProjectsToAggregateFilter - provides special
                handling depending on the project the instance is
                associated with.</para>
        </listitem>
        <listitem>
            <para>default_schedule_zones - allows the selection of
                multiple default availability zones, rather than a
                single default.</para>
        </listitem>
    </itemizedlist>
    <para>The MySQL database server in each cell is managed by a
        central database team and configured in an active/passive
        configuration with a NetApp storage back end. Backups are
        performed every six hours.</para>
    <section xml:id="network-architecture">
        <title>Network Architecture</title>
        <para>To integrate with existing CERN networking
            infrastructure, customizations were made to Nova
            Networking, in the form of a driver that integrates
            with CERN's existing database for tracking MAC and IP
            address assignments.</para>
        <para>The driver considers the compute node that the
            scheduler placed an instance on and then selects a MAC
            address and IP from the pre-registered list associated
            with that node in the database. The database is then
            updated to reflect the instance to which the addresses
            were assigned.</para></section>
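The per-node selection behavior described above can be sketched roughly as follows. This is a hypothetical illustration based only on the prose description; the data structures, node names, and addresses are assumptions, not CERN's actual driver code.

```python
# Hypothetical sketch of the per-node MAC/IP selection described
# above. The pool contents and node names are made-up placeholders.
pools = {
    "compute-01": [("02:16:3e:00:00:01", "188.184.0.10"),
                   ("02:16:3e:00:00:02", "188.184.0.11")],
}
assignments = {}  # instance id -> (mac, ip): the database update


def assign_address(node, instance_id):
    """Take the next free pre-registered (MAC, IP) pair for the
    compute node the scheduler chose, and record the assignment."""
    mac, ip = pools[node].pop(0)
    assignments[instance_id] = (mac, ip)
    return mac, ip


print(assign_address("compute-01", "inst-1"))
# → ('02:16:3e:00:00:01', '188.184.0.10')
```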
    <section xml:id="storage-architecture">
        <title>Storage Architecture</title>
        <para>The OpenStack image service is deployed in the API
            cell and configured to expose version 1 (V1) of the
            API. As a result, the image registry is also required.
            The storage back end in use is a 3 PB Ceph
            cluster.</para>
        <para>A small set of "golden" Scientific Linux 5 and 6
            images is maintained, onto which applications can in
            turn be placed using orchestration tools. Puppet is
            used for instance configuration management and
            customization, but use of Heat for deployment is
            expected.</para></section>
    <section xml:id="monitoring">
        <title>Monitoring</title>
        <para>Although direct billing is not required, OpenStack
            Telemetry is used to perform metering for the purposes
            of adjusting project quotas. A sharded, replicated
            MongoDB back end is used. To spread API load, instances
            of the nova-api service were deployed within the child
            cells for Telemetry to query against. This also meant
            that some supporting services, including keystone,
            glance-api, and glance-registry, also needed to be
            configured in the child cells.</para>
        <mediaobject>
            <imageobject>
                <imagedata
                    fileref="../images/Generic_CERN_Architecture.png"/>
            </imageobject>
        </mediaobject>
        <para>Additional monitoring tools in use include Flume
            (http://flume.apache.org/), Elasticsearch, Kibana
            (http://www.elasticsearch.org/overview/kibana/), and
            the CERN-developed Lemon
            (http://lemon.web.cern.ch/lemon/index.shtml)
            project.</para></section>
    <section xml:id="references-cern-resources">
        <title>References</title>
        <para>The authors of the Architecture Design Guide would
            like to thank CERN for publicly documenting their
            OpenStack deployment in these resources, which formed
            the basis for this chapter:</para>
        <itemizedlist>
            <listitem>
                <para>http://openstack-in-production.blogspot.fr/</para>
            </listitem>
            <listitem>
                <para>http://www.openstack.org/assets/presentation-media/Deep-Dive-into-the-CERN-Cloud-Infrastructure.pdf</para>
            </listitem>
        </itemizedlist>
    </section>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="technical-considerations-compute-focus">
    <?dbhtml stop-chunking?>
    <title>Technical Considerations</title>
    <para>In a compute-focused OpenStack cloud, the type of
        instance workloads being provisioned heavily influences
        technical decision making. For example, specific use cases
        that demand multiple short-running jobs present different
        requirements than those that specify long-running jobs,
        even though both situations are considered
        "compute-focused."</para>
    <para>Public and private clouds require deterministic capacity
        planning to support elastic growth in order to meet user
        SLA expectations. Deterministic capacity planning is the
        path to predicting the effort and expense of making a given
        process consistently performant. This process is important
        because, when a service becomes a critical part of a user's
        infrastructure, the user's fate becomes wedded to the SLAs
        of the cloud itself. In cloud computing, a service's
        performance will not be measured by its average speed but
        rather by the consistency of its speed.</para>
    <para>There are two aspects of capacity planning to consider:
        planning the initial deployment footprint, and planning
        expansion of it to stay ahead of the demands of cloud
        users.</para>
    <para>Planning the initial footprint for an OpenStack
        deployment is typically done based on existing
        infrastructure workloads and estimates based on expected
        uptake.</para>
    <para>The starting point is the core count of the cloud. By
        applying relevant ratios, the user can gather information
        about:</para>
    <itemizedlist>
        <listitem>
            <para>The number of instances expected to be available
                concurrently: (overcommit fraction × cores) /
                virtual cores per instance</para>
        </listitem>
        <listitem>
            <para>How much storage is required: flavor disk size ×
                number of instances</para>
        </listitem>
    </itemizedlist>
    <para>These ratios can be used to determine the amount of
        additional infrastructure needed to support the cloud. For
        example, consider a situation in which you require 1600
        instances, each with 2 vCPUs and 50 GB of storage. Assuming
        the default overcommit ratio of 16:1, working out the math
        provides the following equations:</para>
    <itemizedlist>
        <listitem>
            <para>1600 = (16 × number of physical cores) / 2</para>
        </listitem>
        <listitem>
            <para>storage required = 50 GB × 1600</para>
        </listitem>
    </itemizedlist>
    <para>On the surface, the equations reveal the need for 200
        physical cores and 80 TB of storage for
        /var/lib/nova/instances/. However, it is also important to
        look at patterns of usage to estimate the load that the API
        services, database servers, and queue servers are likely to
        encounter.</para>
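The sizing arithmetic above can be checked with a short sketch; the figures are exactly those from the worked example (1600 instances, 2 vCPUs and 50 GB each, 16:1 CPU overcommit):

```python
# Sizing arithmetic from the example above: 1600 instances,
# 2 vCPUs and 50 GB of disk each, 16:1 CPU overcommit.
instances = 1600
vcpus_per_instance = 2
cpu_overcommit = 16
disk_gb_per_instance = 50

# instances = (overcommit × cores) / vcpus, solved for cores:
physical_cores = instances * vcpus_per_instance / cpu_overcommit
storage_tb = instances * disk_gb_per_instance / 1000

print(physical_cores, storage_tb)  # → 200.0 80.0
```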
    <para>Consider, for example, the differences between a cloud
        that supports a managed web-hosting platform and one
        running integration tests for a development project that
        creates one instance per code commit. In the former, the
        heavy work of creating an instance happens only every few
        months, whereas the latter puts constant heavy load on the
        cloud controller. The average instance lifetime must also
        be considered, as a larger number generally means less load
        on the cloud controller.</para>
    <para>Aside from the creation and termination of instances, the
        impact of users accessing the service must be considered,
        particularly on nova-api and its associated database.
        Listing instances garners a great deal of information and,
        given the frequency with which users run this operation, a
        cloud with a large number of users can increase the load
        significantly. This can even occur unintentionally. For
        example, the OpenStack dashboard instances tab refreshes
        the list of instances every 30 seconds, so leaving it open
        in a browser window can cause unexpected load.</para>
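The scale of that unintended load can be estimated with a quick sketch. The 30-second refresh interval comes from the dashboard behavior just described; the number of concurrently open sessions is an illustrative assumption:

```python
# Back-of-the-envelope estimate of list-instances API load from
# open dashboard sessions. 500 sessions is an assumed figure; the
# 30-second refresh interval is the dashboard default noted above.
open_sessions = 500
refresh_interval_s = 30

list_requests_per_second = open_sessions / refresh_interval_s
print(round(list_requests_per_second, 1))  # → 16.7
```

Even a few hundred idle browser tabs thus translate into a steady double-digit request rate against nova-api and its database.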
    <para>Consideration of these factors can help determine how
        many cloud controller cores are required. A server with 8
        CPU cores and 8 GB of RAM would be sufficient for up to a
        rack of compute nodes, given the above caveats.</para>
    <para>Key hardware specifications are also crucial to the
        performance of user instances. Be sure to consider budget
        and performance needs, including storage performance
        (spindles/core), memory availability (RAM/core), network
        bandwidth (Gbps/core), and overall CPU performance
        (CPU/core).</para>
    <para>The cloud resource calculator is a useful tool for
        examining the impact of different hardware and instance
        layouts. It is available at:</para>
    <itemizedlist>
        <listitem>
            <para>https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods</para>
        </listitem>
    </itemizedlist>
    <section xml:id="expansion-planning-compute-focus">
        <title>Expansion Planning</title>
        <para>A key challenge faced when planning the expansion of
            cloud compute services is the elastic nature of cloud
            infrastructure demands. Previously, new users or
            customers were forced to plan for and request the
            infrastructure they required ahead of time, allowing
            time for procurement processes. Cloud computing users
            have come to expect the agility provided by having
            instant access to new resources as they are required.
            Consequently, planning should cover typical usage and,
            more importantly, sudden bursts in usage.</para>
        <para>Planning for expansion can be a delicate balancing
            act. Planning too conservatively can lead to unexpected
            oversubscription of the cloud and dissatisfied users.
            Planning for cloud expansion too aggressively can lead
            to unexpected underutilization of the cloud and funds
            spent on operating infrastructure that is not being
            used efficiently.</para>
        <para>The key is to carefully monitor the spikes and
            valleys in cloud usage over time. The intent is to
            measure the consistency with which services can be
            delivered, not the average speed or capacity of the
            cloud. Using this information to model capacity
            performance enables users to more accurately determine
            the current and future capacity of the
            cloud.</para></section>
    <section xml:id="cpu-and-ram-compute-focus">
        <title>CPU and RAM</title>
        <para>(Adapted from:
            http://docs.openstack.org/openstack-ops/content/compute_nodes.html#cpu_choice)</para>
<para>In current generations, CPUs have up to 12 cores. If an
|
||||
Intel CPU supports Hyper-Threading, those 12 cores are doubled
|
||||
to 24 cores. If a server is purchased that supports multiple
|
||||
CPUs, the number of cores is further multiplied.
|
||||
Hyper-Threading is Intel's proprietary simultaneous
|
||||
multi-threading implementation, used to improve
|
||||
parallelization on their CPUs. Consider enabling
|
||||
Hyper-Threading to improve the performance of multithreaded
|
||||
applications.</para>
<para>Whether the user should enable Hyper-Threading on a CPU
depends upon the use case. For example, disabling
Hyper-Threading can be beneficial in intense computing
environments. Performance testing conducted by running local
workloads with both Hyper-Threading on and off can help
determine what is more appropriate in any particular
case.</para>
<para>If the Libvirt/KVM hypervisor driver is the intended
hypervisor, then the CPUs used in the compute nodes must support
virtualization by way of the VT-x extensions for Intel chips
or the AMD-V extensions for AMD chips to provide full
performance.</para>
<para>OpenStack enables the user to overcommit CPU and RAM on
compute nodes. This allows an increase in the number of
instances running on the cloud at the cost of reducing the
performance of the instances. OpenStack Compute uses the
following ratios by default:</para>
<itemizedlist>
<listitem>
<para>CPU allocation ratio: 16:1</para>
</listitem>
<listitem>
<para>RAM allocation ratio: 1.5:1</para>
</listitem>
</itemizedlist>
<para>The default CPU allocation ratio of 16:1 means that the
scheduler allocates up to 16 virtual cores per physical core.
For example, if a physical node has 12 cores, the scheduler
sees 192 available virtual cores. With typical flavor
definitions of 4 virtual cores per instance, this ratio would
provide 48 instances on a physical node.</para>
<para>Similarly, the default RAM allocation ratio of 1.5:1 means
that the scheduler allocates instances to a physical node as
long as the total amount of RAM associated with the instances
is less than 1.5 times the amount of RAM available on the
physical node.</para>
<para>For example, if a physical node has 48 GB of RAM, the
scheduler allocates instances to that node until the sum of
the RAM associated with the instances reaches 72 GB (such as
nine instances, in the case where each instance has 8 GB of
RAM).</para>
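<para>The overcommit arithmetic above can be sketched in a few lines. The ratios mirror the Compute defaults quoted in the text; the helper name and structure are illustrative, not part of any OpenStack API:</para>

```python
# A minimal sketch of the overcommit arithmetic described above. The
# ratios mirror the Nova defaults quoted in the text; the helper name
# is an illustrative assumption, not an OpenStack API.

def schedulable_instances(phys_cores, phys_ram_gb,
                          vcpus_per_instance, ram_gb_per_instance,
                          cpu_ratio=16.0, ram_ratio=1.5):
    """Instances a node can host under CPU and RAM overcommit limits."""
    virtual_cores = phys_cores * cpu_ratio      # 12 cores -> 192 vcores
    usable_ram = phys_ram_gb * ram_ratio        # 48 GB -> 72 GB
    by_cpu = int(virtual_cores // vcpus_per_instance)
    by_ram = int(usable_ram // ram_gb_per_instance)
    return min(by_cpu, by_ram)  # the tighter constraint wins

# 192 virtual cores allow 48 four-vCPU instances, but 72 GB of
# overcommitted RAM allows only 9 eight-GB instances:
print(schedulable_instances(12, 48, 4, 8))
```

<para>Note that whichever resource is exhausted first, CPU or RAM, caps the instance count, which is why the two ratios must be tuned together.</para>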
<para>The appropriate CPU and RAM allocation ratio must be
selected based on particular use cases.</para></section>
<section xml:id="additional-hardware-compute-focus"><title>Additional Hardware</title>
<para>Certain use cases may benefit from exposure to additional
devices on the compute node. Examples might include:</para>
<itemizedlist>
<listitem>
<para>High performance computing jobs that benefit from
the availability of graphics processing units (GPUs)
for general-purpose computing.</para>
</listitem>
<listitem>
<para>Cryptographic routines that benefit from the
availability of hardware random number generators to
avoid entropy starvation.</para>
</listitem>
<listitem>
<para>Database management systems that benefit from the
availability of SSDs for ephemeral storage to maximize
read/write time when it is required.</para>
</listitem>
</itemizedlist>
<para>Host aggregates are used to group hosts that share similar
characteristics, which can include hardware similarities. The
addition of specialized hardware to a cloud deployment is
likely to add to the cost of each node, so careful
consideration must be given to whether all compute nodes, or
just a subset which is targetable using flavors, need the
additional customization to support the desired
workloads.</para></section>
<section xml:id="utilization"><title>Utilization</title>
<para>Infrastructure-as-a-Service offerings, including OpenStack,
use flavors to provide standardized views of virtual machine
resource requirements that simplify the problem of scheduling
instances while making the best use of the available physical
resources.</para>
<para>In order to facilitate packing of virtual machines onto
physical hosts, the default selection of flavors is
constructed so that the second largest flavor is half the size
of the largest flavor in every dimension. It has half the
vCPUs, half the vRAM, and half the ephemeral disk space. The
next largest flavor is half that size again. As a result,
packing a server for general purpose computing might look
conceptually something like this figure:</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Compute_Tech_Bin_Packing_General1.png"
/>
</imageobject>
</mediaobject>
<para>On the other hand, a CPU optimized packed server might look
like the following figure:</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Compute_Tech_Bin_Packing_CPU_optimized1.png"
/>
</imageobject>
</mediaobject>
<para>These default flavors are well suited to typical load outs
for commodity server hardware. To maximize utilization,
however, it may be necessary to customize the flavors or
create new ones, to better align instance sizes to the
available hardware.</para>
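<para>The "half in every dimension" relationship between flavors described above can be sketched as a simple ladder. The starting sizes below are hypothetical, not the actual default flavor definitions:</para>

```python
# A sketch of the flavor sizing described above: each flavor is half the
# previous one in every dimension. The starting sizes are illustrative
# assumptions, not the real default flavor set.

def flavor_ladder(vcpus, ram_mb, disk_gb, steps=4):
    """Generate flavors, each half the previous in vCPU, RAM, and disk."""
    flavors = []
    for _ in range(steps):
        flavors.append({"vcpus": vcpus, "ram_mb": ram_mb, "disk_gb": disk_gb})
        vcpus, ram_mb, disk_gb = vcpus // 2, ram_mb // 2, disk_gb // 2
    return flavors

for flavor in flavor_ladder(8, 16384, 160):
    print(flavor)
```

<para>Because every dimension halves together, two instances of one flavor fit exactly in the footprint of the next size up, which is what makes the bin packing in the figures work.</para>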
<para>Workload characteristics may also influence hardware choices
and flavor configuration, particularly where they present
different ratios of CPU versus RAM versus HDD
requirements.</para>
<para>For more information on Flavors refer to:
http://docs.openstack.org/openstack-ops/content/flavors.html</para>
</section>
<section xml:id="performance-compute-focus"><title>Performance</title>
<para>The infrastructure of a compute-focused cloud should not be
shared, so that workloads can consume as many resources as are
made available to them, and accommodations should be made to
support large scale workloads.</para>
<para>The duration of batch processing differs depending on the
individual workloads that are launched. Run times range from
seconds to minutes to hours, and as a result it is
difficult to predict when resources will be used, for how
long, and even which resources will be used.</para>
</section>
<section xml:id="security-compute-focus"><title>Security</title>
<para>The security considerations needed for this scenario are
similar to those of the other scenarios discussed in this
book.</para>
<para>A security domain comprises users, applications, servers
or networks that share common trust requirements and
expectations within a system. Typically they have the same
authentication and authorization requirements and
users.</para>
<para>These security domains are:</para>
<orderedlist>
<listitem>
<para>Public</para>
</listitem>
<listitem>
<para>Guest</para>
</listitem>
<listitem>
<para>Management</para>
</listitem>
<listitem>
<para>Data</para>
</listitem>
</orderedlist>
<para>These security domains can be mapped individually to the
installation, or they can also be combined. For example, some
deployment topologies combine both guest and data domains onto
one physical network, whereas in other cases these networks
are physically separated. In each case, the cloud operator
should be aware of the appropriate security concerns. Security
domains should be mapped out against the specific OpenStack
deployment topology. The domains and their trust requirements
depend upon whether the cloud instance is public, private, or
hybrid.</para>
<para>The public security domain is an entirely untrusted area of
the cloud infrastructure. It can refer to the Internet as a
whole or simply to networks over which the user has no
authority. This domain should always be considered
untrusted.</para>
<para>Typically used for compute instance-to-instance traffic, the
guest security domain handles compute data generated by
instances on the cloud, but not services that support the
operation of the cloud, such as API calls. Public cloud
providers and private cloud providers who do not have
stringent controls on instance use or who allow unrestricted
internet access to instances should consider this domain to be
untrusted. Private cloud providers may want to consider this
network as internal and therefore trusted only if they have
controls in place to assert that they trust instances and all
their tenants.</para>
<para>The management security domain is where services interact.
Sometimes referred to as the "control plane", the networks in
this domain transport confidential data such as configuration
parameters, user names, and passwords. In most deployments this
domain is considered trusted.</para>
<para>The data security domain is concerned primarily with
information pertaining to the storage services within
OpenStack. Much of the data that crosses this network has high
integrity and confidentiality requirements and, depending on
the type of deployment, there may also be strong availability
requirements. The trust level of this network is heavily
dependent on deployment decisions and as such we do not assign
it any default level of trust.</para>
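<para>The domain-to-trust mapping described above can be captured as a small data structure during deployment planning. The names and structure below are illustrative assumptions, not an OpenStack configuration format:</para>

```python
# An illustrative mapping of the four security domains described above
# to default trust levels. The names and structure are assumptions for
# planning purposes, not an OpenStack configuration format.

SECURITY_DOMAINS = {
    "public": "untrusted",     # Internet-facing; always untrusted
    "guest": "untrusted",      # instance traffic, unless strictly controlled
    "management": "trusted",   # control plane: configs, credentials
    "data": None,              # trust depends on deployment decisions
}

def review_needed(domains=SECURITY_DOMAINS):
    """Return domains whose trust level must be decided per deployment."""
    return [name for name, trust in domains.items() if trust is None]

print(review_needed())  # ['data']
```

<para>Keeping the data domain explicitly undecided mirrors the text: its trust level must be assigned per deployment rather than assumed.</para>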
<para>When deploying OpenStack in an enterprise as a private cloud
it is assumed to be behind a firewall and within the trusted
network alongside existing systems. Users of the cloud are
typically employees or trusted individuals that are bound by
the security requirements set forth by the company. This tends
to push most of the security domains towards a more trusted
model. However, when deploying OpenStack in a public-facing
role, no assumptions can be made and the attack vectors
significantly increase. For example, the API endpoints and the
software behind them will be vulnerable to potentially hostile
entities wanting to gain unauthorized access or prevent access
to services. This can result in loss of reputation and must be
protected against through auditing and appropriate
filtering.</para>
<para>Consideration must be taken when managing the users of the
system, whether operating a public or private cloud. The
Identity Service allows LDAP to be part of the authentication
process, and integrating such existing user-management systems
into an OpenStack deployment may ease user management.</para>
<para>It is strongly recommended that the API services are placed
behind hardware that performs SSL termination. API services
transmit user names, passwords, and generated tokens between
client machines and API endpoints and therefore must be
secured.</para>
<para>More information on OpenStack Security can be found
at http://docs.openstack.org/security-guide/</para>
</section>
<section xml:id="openstack-components-compute-focus"><title>OpenStack Components</title>
<para>Due to the nature of the workloads that will be used in this
scenario, a number of components will be highly beneficial in
a Compute-focused cloud. This includes the typical OpenStack
components:</para>
<itemizedlist>
<listitem>
<para>OpenStack Compute (Nova)</para>
</listitem>
<listitem>
<para>OpenStack Image Service (Glance)</para>
</listitem>
<listitem>
<para>OpenStack Identity Service (Keystone)</para>
</listitem>
</itemizedlist>
<para>Also consider several specialized components:</para>
<itemizedlist>
<listitem>
<para>OpenStack Orchestration Engine (Heat)</para>
</listitem>
</itemizedlist>
<para>It is safe to assume that, given the nature of the
applications involved in this scenario, these will be heavily
automated deployments. Making use of Heat will be highly
beneficial in this case. Deploying a batch of instances and
running an automated set of tests can be scripted; however, it
makes sense to use the OpenStack Orchestration Engine (Heat)
to handle all these actions.</para>
<itemizedlist>
<listitem>
<para>OpenStack Telemetry (Ceilometer)</para>
</listitem>
</itemizedlist>
<para>OpenStack Telemetry and the alarms it generates are required
to support autoscaling of instances using OpenStack
Orchestration. Users that are not using OpenStack
Orchestration do not need to deploy OpenStack Telemetry and
may choose to use other external solutions to fulfill their
metering and monitoring requirements.</para>
<para>See also:
http://docs.openstack.org/openstack-ops/content/logging_monitoring.html</para>
<itemizedlist>
<listitem>
<para>OpenStack Block Storage (Cinder)</para>
</listitem>
</itemizedlist>
<para>Due to the burstable nature of the workloads and the
applications and instances that will be used for batch
processing, this cloud will mainly utilize memory and CPU, so
the need for add-on storage to each instance is not a likely
requirement. This does not mean the OpenStack Block Storage
service (Cinder) will not be used in the infrastructure, but
typically it will not be used as a central component.</para>
<itemizedlist>
<listitem>
<para>Networking</para>
</listitem>
</itemizedlist>
<para>When choosing a networking platform, ensure that it either
works with all desired hypervisor and container technologies
and their OpenStack drivers, or includes an implementation of
an ML2 mechanism driver. Networking platforms that provide ML2
mechanism drivers can be mixed.</para></section>
</section>
@ -0,0 +1,144 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="user-requirements-compute-focus">
<?dbhtml stop-chunking?>
<title>User Requirements</title>
<para>Compute intensive workloads are defined by their high
utilization of CPU, RAM, or both. User requirements will
determine if a cloud must be built to accommodate anticipated
performance demands.</para>
<itemizedlist>
<listitem>
<para>Cost: Cost is not generally a primary concern for a
compute-focused cloud; however, some organizations
might be concerned with cost avoidance. Repurposing
existing resources to tackle compute-intensive tasks
instead of needing to acquire additional resources may
offer cost reduction opportunities.</para>
</listitem>
<listitem>
<para>Time to Market: Compute-focused clouds can be used
to deliver products more quickly, for example,
speeding up a company's software development life cycle
(SDLC) for building products and applications.</para>
</listitem>
<listitem>
<para>Revenue Opportunity: Companies that are interested
in building services or products that rely on the
power of the compute resources will benefit from a
compute-focused cloud. Examples include the analysis
of large data sets (via Hadoop or Cassandra) or
completing computationally intensive tasks such as
rendering, scientific computation, or
simulations.</para>
</listitem>
</itemizedlist>
<section xml:id="legal-requirements-compute-focus"><title>Legal Requirements</title>
<para>Many jurisdictions have legislative and regulatory
requirements governing the storage and management of data in
cloud environments. Common areas of regulation include:</para>
<itemizedlist>
<listitem>
<para>Data retention policies ensuring storage of
persistent data and records management to meet data
archival requirements.</para>
</listitem>
<listitem>
<para>Data ownership policies governing the possession and
responsibility for data.</para>
</listitem>
<listitem>
<para>Data sovereignty policies governing the storage of
data in foreign countries or otherwise separate
jurisdictions.</para>
</listitem>
<listitem>
<para>Data compliance: certain types of information need
to reside in certain locations due to regulatory
issues, and more importantly cannot reside in other
locations for the same reason.</para>
</listitem>
</itemizedlist>
<para>Examples of such legal frameworks include the data
protection framework of the European Union
(http://ec.europa.eu/justice/data-protection/) and the
requirements of the Financial Industry Regulatory Authority
(http://www.finra.org/Industry/Regulation/FINRARules/) in the
United States. Consult a local regulatory body for more
information.</para></section>
<section xml:id="technical-considerations-compute-focus-user"><title>Technical Considerations</title>
<para>The following are some technical requirements that need to
be incorporated into the architecture design.</para>
<itemizedlist>
<listitem>
<para>Performance: If a primary technical concern is for
the environment to deliver high performance
capability, then a compute-focused design is an
obvious choice because it is specifically designed to
host compute-intensive workloads.</para>
</listitem>
<listitem>
<para>Workload persistence: Workloads can be either
short-lived or long running. Short-lived workloads
might include continuous integration and continuous
deployment (CI-CD) jobs, where large numbers of
compute instances are created simultaneously to
perform a set of compute-intensive tasks. The results
or artifacts are then copied from the instance into
long-term storage before the instance is destroyed.
Long-running workloads, like a Hadoop or
high-performance computing (HPC) cluster, typically
ingest large data sets, perform the computational work
on those data sets, then push the results into
long-term storage. Unlike short-lived workloads, when
the computational work is completed, they will remain
idle until the next job is pushed to them. Long-running
workloads are often larger and more complex, so the
effort of building them is mitigated by keeping them
active between jobs. Another example of long-running
workloads is legacy applications that typically are
persistent over time.</para>
</listitem>
<listitem>
<para>Storage: Workloads targeted for a compute-focused
OpenStack cloud generally do not require any
persistent block storage (although some usages of
Hadoop with HDFS may dictate the use of persistent
block storage). A shared filesystem or object store
will maintain the initial data set(s) and serve as the
destination for saving the computational results. By
avoiding the input/output (I/O) overhead, workload
performance is significantly enhanced. Depending on
the size of the data set(s), it might be necessary to
scale the object store or shared file system to match
the storage demand.</para>
</listitem>
<listitem>
<para>User Interface: Like any other cloud architecture, a
compute-focused OpenStack cloud requires an on-demand
and self-service user interface. End users must be
able to provision computing power, storage, networks
and software simply and flexibly. This includes
scaling the infrastructure up to a substantial level
without disrupting host operations.</para>
</listitem>
<listitem>
<para>Security: Security is going to be highly dependent
on the business requirements. For example, a
computationally intense drug discovery application
will obviously have much higher security requirements
than a cloud that is designed for processing market
data for a retailer. As a general start, the security
recommendations and guidelines provided in the
OpenStack Security Guide are applicable.</para>
</listitem>
</itemizedlist></section>
<section xml:id="operational-considerations-compute-focus-user"><title>Operational Considerations</title>
<para>From an operational perspective, the requirements for a
compute-intensive cloud are similar to those for a
general-purpose cloud. More details on operational
requirements can be found in the general-purpose design
section.</para></section>
</section>
@ -0,0 +1,744 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-guide-architecture-overview">
<?dbhtml stop-chunking?>
<title>Architecture</title>
<para>Hardware selection involves three key areas:</para>
<itemizedlist>
<listitem>
<para>Compute</para>
</listitem>
<listitem>
<para>Network</para>
</listitem>
<listitem>
<para>Storage</para>
</listitem>
</itemizedlist>
<para>For each of these areas, the selection of hardware for a
general purpose OpenStack cloud must reflect the fact that
the cloud has no pre-defined usage model. This means that
there will be a wide variety of applications running on this
cloud that will have varying resource usage requirements. Some
applications will be RAM-intensive, some applications will be
CPU-intensive, while others will be storage-intensive.
Therefore, the hardware chosen for a general purpose OpenStack
cloud must provide balanced access to all major
resources.</para>
<para>Certain hardware form factors may be better suited for use
in a general purpose OpenStack cloud because of the need for
an equal or nearly equal balance of resources. Server hardware
for a general purpose OpenStack architecture design must
provide an equal or nearly equal balance of compute capacity
(RAM and CPU), network capacity (number and speed of links),
and storage capacity (gigabytes or terabytes as well as I/O
operations per second (IOPS)).</para>
<para>Server hardware is evaluated around four conflicting
dimensions:</para>
<itemizedlist>
<listitem>
<para>Server density: A measure of how many servers can
fit into a given measure of physical space, such as a
rack unit [U].</para>
</listitem>
<listitem>
<para>Resource capacity: The number of CPU cores, how much
RAM, or how much storage a given server will
deliver.</para>
</listitem>
<listitem>
<para>Expandability: The number of additional resources
that can be added to a server before it has reached
its limit.</para>
</listitem>
<listitem>
<para>Cost: The relative purchase price of the hardware
weighted against the level of design effort needed to
build the system.</para>
</listitem>
</itemizedlist>
<para>Increasing server density means sacrificing resource
capacity or expandability; conversely, increasing resource
capacity and expandability increases cost and decreases server
density. As a result, determining the best server hardware for
a general purpose OpenStack architecture means understanding
how the choice of form factor will impact the rest of the
design.</para>
<itemizedlist>
<listitem>
<para>Blade servers typically support dual-socket
multi-core CPUs, which is the configuration generally
considered to be the "sweet spot" for a general
purpose cloud deployment. Blades also offer
outstanding density. As an example, both HP
BladeSystem and Dell PowerEdge M1000e support up to 16
servers in only 10 rack units. However, the blade
servers themselves often have limited storage and
networking capacity. Additionally, the expandability
of many blade servers can be limited.</para>
</listitem>
<listitem>
<para>1U rack-mounted servers occupy only a single rack
unit. Their benefits include high density, support for
dual-socket multi-core CPUs, and support for
reasonable RAM amounts. This form factor offers
limited storage capacity, limited network capacity,
and limited expandability.</para>
</listitem>
<listitem>
<para>2U rack-mounted servers offer the expanded storage
and networking capacity that 1U servers tend to lack,
but with a corresponding decrease in server density
(half the density offered by 1U rack-mounted
servers).</para>
</listitem>
<listitem>
<para>Larger rack-mounted servers, such as 4U servers,
will tend to offer even greater CPU capacity, often
supporting four or even eight CPU sockets. These
servers often have much greater expandability, so they
provide the best option for upgradability. This means,
however, that the servers have a much lower server
density and a much greater hardware cost.</para>
</listitem>
<listitem>
<para>"Sled servers" are rack-mounted servers that support
multiple independent servers in a single 2U or 3U
enclosure. This form factor offers increased density
over typical 1U-2U rack-mounted servers but tends to
suffer from limitations in the amount of storage or
network capacity each individual server
supports.</para>
</listitem>
</itemizedlist>
<para>Given the wide selection of hardware and general user
requirements, the best form factor for the server hardware
supporting a general purpose OpenStack cloud is driven by
outside business and cost factors. No single reference
architecture will apply to all implementations; the decision
must flow out of the user requirements, technical
considerations, and operational considerations. Here are some
of the key factors that influence the selection of server
hardware:</para>
<itemizedlist>
<listitem>
<para>Instance density: Sizing is an important
consideration for a general purpose OpenStack cloud.
The expected or anticipated number of instances that
each hypervisor can host is a common metric used in
sizing the deployment. The selected server hardware
needs to support the expected or anticipated instance
density.</para>
</listitem>
<listitem>
<para>Host density: Physical data centers have limited
physical space, power, and cooling. The number of
hosts (or hypervisors) that can be fitted into a given
metric (rack, rack unit, or floor tile) is another
important method of sizing. Floor weight is an often
overlooked consideration. The data center floor must
be able to support the weight of the proposed number
of hosts within a rack or set of racks. These factors
need to be applied as part of the host density
calculation and server hardware selection.</para>
</listitem>
<listitem>
<para>Power density: Data centers have a specified amount
of power fed to a given rack or set of racks. Older
data centers may have a power density as low
as 20 amps per rack, while more recent data centers
can be architected to support power densities as high
as 120 amps per rack. The selected server hardware must
take power density into account.</para>
</listitem>
<listitem>
<para>Network connectivity: The selected server hardware
must have the appropriate number of network
connections, as well as the right type of network
connections, in order to support the proposed
architecture. Ensure that, at a minimum, there are at
least two diverse network connections coming into each
rack. For architectures requiring even more
redundancy, it might be necessary to confirm that the
network connections are from diverse telecom
providers. Many data centers have that capacity
available.</para>
</listitem>
</itemizedlist>
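<para>The density factors above interact: whichever limit a rack hits first, space, power, or floor weight, caps host count. A minimal sketch, with all limits and per-server figures as illustrative assumptions:</para>

```python
# A sketch of the density checks described above: how many hosts a rack
# can actually take once space, power, and floor weight are considered.
# All limits and per-server figures are illustrative assumptions.

def hosts_per_rack(rack_units=42, units_per_host=2,
                   rack_amps=30.0, amps_per_host=3.5,
                   floor_kg=900.0, kg_per_host=25.0):
    """The binding constraint (space, power, or weight) sets the limit."""
    by_space = rack_units // units_per_host
    by_power = int(rack_amps // amps_per_host)
    by_weight = int(floor_kg // kg_per_host)
    return min(by_space, by_power, by_weight)

print(hosts_per_rack())  # power-limited with these example figures
```

<para>Running the same calculation with a 120-amp rack shifts the binding constraint from power back to physical space, which is exactly the old-versus-new data center contrast drawn in the power density bullet.</para>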
<para>The selection of certain form factors or architectures will
affect the selection of server hardware. For example, if the
design calls for a scale-out storage architecture (for
example, leveraging Ceph, Gluster, or a similar commercial
solution), then the server hardware selection will need to be
carefully considered to match the requirements set by the
commercial solution. Ensure that the selected server hardware
is configured to support enough storage capacity (or storage
expandability) to match the requirements of the selected
scale-out storage solution. For example, if a centralized
storage solution is required, such as a centralized storage
array from a storage vendor that has InfiniBand or FDDI
connections, the server hardware will need to have appropriate
network adapters installed to be compatible with the storage
array vendor's specifications.</para>
<para>Similarly, the network architecture will have an impact on
the server hardware selection and vice versa. For example,
make sure that the server is configured with enough additional
network ports and expansion cards to support all of the
networks required. There is variability in network expansion
cards, so it is important to be aware of potential impacts or
interoperability issues with other components in the
architecture. This is especially true if the architecture uses
InfiniBand or another less commonly used networking
protocol.</para>
<section xml:id="selecting-storage-hardware">
<title>Selecting Storage Hardware</title>
<para>The selection of storage hardware is largely determined by
the proposed storage architecture. Factors that need to be
incorporated into the storage architecture include:</para>
<itemizedlist>
|
||||
<listitem>
<para>Cost: Storage can be a significant portion of the
overall system cost and should be factored into the
design decision. For an organization that is concerned
with vendor support, a commercial storage solution is
advisable, although it comes with a higher price tag.
If minimizing initial capital expenditure is a
priority, a design based on commodity hardware is an
option. The trade-off is potentially higher support
costs and a greater risk of incompatibility and
interoperability issues.</para>
</listitem>
<listitem>
<para>Performance: Storage performance, measured by
observing the latency of storage I/O requests, is not
a critical factor for a general purpose OpenStack
cloud, as overall system performance is not a design
priority.</para>
</listitem>
<listitem>
<para>Scalability: The term "scalability" refers to how
well the storage solution performs as it expands up to
its maximum designed size. A solution that continues
to perform well at maximum expansion is considered
scalable. A storage solution that performs well in
small configurations but has degrading performance as
it expands is not considered scalable.
Scalability, along with expandability, is a major
consideration in a general purpose OpenStack cloud. It
might be difficult to predict the final intended size
of the implementation because there are no established
usage patterns for a general purpose cloud. Therefore,
it may become necessary to expand the initial
deployment in order to accommodate growth and user
demand, and the storage solution must continue to
perform well as it does so.</para>
</listitem>
<listitem>
<para>Expandability: This refers to the overall ability of
the solution to grow. A storage solution that expands
to 50 PB is considered more expandable than a solution
that only scales to 10 PB. This metric is related to,
but distinct from, scalability, which is a measure of
the solution's performance as it expands.
Expandability is a major architecture factor for
storage solutions in a general purpose OpenStack
cloud. For example, the storage architecture for a
cloud that is intended as a development platform may
not have the same expandability and scalability
requirements as a cloud that is intended for a
commercial product.</para>
</listitem>
</itemizedlist>
<para>Storage hardware architecture is largely determined by the
selected storage architecture. The selection of storage
architecture, as well as the corresponding storage hardware,
is determined by evaluating possible solutions against the
critical factors, the user requirements, technical
considerations, and operational considerations. A combination
of all the factors and considerations will determine which
approach will be best.</para>
<para>Using a scale-out storage solution with direct-attached
storage (DAS) in the servers is well suited for a general
purpose OpenStack cloud. In this scenario, it is possible to
populate storage either in the compute hosts, similar to a
grid computing solution, or in hosts dedicated to providing
block storage exclusively. When deploying storage in the
compute hosts, appropriate hardware that can support both the
storage and compute services on the same hardware will be
required. This approach is referred to as a grid computing
architecture because there is a grid of modules that have both
compute and storage in a single box.</para>
<para>Understanding the requirements of cloud services will help
determine if Ceph, Gluster, or a similar scale-out solution
should be used. It can then be further determined whether a
single, highly expandable and scalable, centralized storage
array should be included in the design. Once the approach has
been determined, the storage hardware can be chosen based on
these criteria. If a centralized storage array fits the
requirements best, then the array vendor will determine the
hardware. For cost reasons it may be decided to build an open
source storage array using solutions such as OpenFiler,
Nexenta Open Source, or BackBlaze Open Source.</para>
<para>This list expands upon the potential impacts of including a
particular storage architecture (and corresponding storage
hardware) in the design for a general purpose OpenStack
cloud:</para>
<itemizedlist>
<listitem>
<para>Connectivity: Ensure that, if storage protocols
other than Ethernet are part of the storage solution,
the appropriate hardware has been selected. Some
examples include InfiniBand, FDDI, and Fibre Channel.
If a centralized storage array is selected, ensure
that the hypervisor will be able to connect to that
storage array for image storage.</para>
</listitem>
<listitem>
<para>Usage: How the particular storage architecture will
be used is critical for determining the architecture.
Some of the configurations that will influence the
architecture include whether it will be used by the
hypervisors for ephemeral instance storage or whether
OpenStack Swift will use it for object storage. All of
these usage models are affected by the selection of a
particular storage architecture and the corresponding
storage hardware to support that architecture.</para>
</listitem>
<listitem>
<para>Instance and image locations: Where instances and
images will be stored will influence the architecture.
For example, instances can be stored in a number of
locations. OpenStack Cinder is a good location for
instances because it provides persistent block
storage; however, Swift can be used if storage latency
is less of a concern. The same considerations apply to
the choice of image storage location.</para>
</listitem>
<listitem>
<para>Server Hardware: If the solution is a scale-out
storage architecture that includes DAS, naturally that
will affect the server hardware selection. This could
ripple into the decisions that affect host density,
instance density, power density, OS-hypervisor,
management tools and others.</para>
</listitem>
</itemizedlist>
<para>A general purpose OpenStack cloud has multiple options. As
a result, there is no single decision that will apply to all
implementations. The key factors that will influence the
selection of storage hardware for a general purpose OpenStack
cloud are as follows:</para>
<itemizedlist>
<listitem>
<para>Capacity: Hardware resources selected for the
resource nodes should be capable of supporting enough
storage for the cloud services that will use them. It
is important to clearly define the initial
requirements and ensure that the design can support
adding capacity as resources are used in the cloud, as
workloads are relatively unknown. Hardware nodes
selected for object storage should be capable of
supporting a large number of inexpensive disks and
should not have any reliance on RAID controller cards.
Hardware nodes selected for block storage should be
capable of supporting higher speed storage solutions
and RAID controller cards to provide performance and
redundancy to storage at the hardware level. Selecting
hardware RAID controllers that can automatically
repair damaged arrays will further assist with
replacing and repairing degraded or destroyed storage
devices within the cloud.</para>
</listitem>
<listitem>
<para>Performance: Disks selected for the object storage
service do not need to be fast performing disks. It is
recommended that object storage nodes take advantage
of the best cost per terabyte available for storage at
the time of acquisition and avoid enterprise class
drives. In contrast, disks chosen for the block
storage service should take advantage of performance
boosting features and may entail the use of SSDs or
flash storage to provide for high performing block
storage pools. Storage performance of ephemeral disks
used for instances should also be taken into
consideration. If compute pools are expected to have
high utilization of ephemeral storage or require very
high performance, it would be advantageous to deploy
similar hardware solutions to block storage in order
to increase the storage performance.</para>
</listitem>
<listitem>
<para>Fault Tolerance: Object storage resource nodes have
no requirements for hardware fault tolerance or RAID
controllers. It is not necessary to plan for fault
tolerance within the object storage hardware because
the object storage service provides replication
between zones as a feature of the service. Block
storage nodes, compute nodes and cloud controllers
should all have fault tolerance built in at the
hardware level by making use of hardware RAID
controllers and varying levels of RAID configuration.
The level of RAID chosen should be consistent with the
performance and availability requirements of the
cloud.</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="selecting-networking-hardware">
<title>Selecting Networking Hardware</title>
<para>As is the case with storage architecture, selecting a
network architecture often determines which network hardware
will be used. The networking software in use is determined by
the selected networking hardware. Some design impacts are
obvious, for example, selecting networking hardware that only
supports Gigabit Ethernet (GbE) will naturally have an impact
on many different areas of the overall design. Similarly,
deciding to use 10 Gigabit Ethernet (10 GbE) has a number of
impacts on various areas of the overall design.</para>
<para>As an example, selecting Cisco networking hardware implies
that the architecture will be using Cisco networking software
(IOS, NX-OS, and so on). Conversely, selecting Arista
networking hardware means the network devices will use Arista
networking software (EOS). In addition, there are more subtle
design impacts that need to be considered. The selection of
certain networking hardware (and therefore the networking
software) could affect the management tools that can be used.
There are exceptions to this; the rise of "open" networking
software that supports a range of networking hardware means
that there are instances where the relationship between
networking hardware and networking software is not as tightly
defined. An example of this type of software is Cumulus Linux,
which is capable of running on a number of switch vendors'
hardware solutions.</para>
<para>Some of the key considerations that should be included in
the selection of networking hardware include:</para>
<itemizedlist>
<listitem>
<para>Port count: The design will require networking
hardware that has the requisite port count.</para>
</listitem>
<listitem>
<para>Port density: The network design will be affected by
the physical space that is required to provide the
requisite port count. A switch that can provide 48 10
GbE ports in 1U has a much higher port density than a
switch that provides 24 10 GbE ports in 2U. A higher
port density is preferred, as it leaves more rack
space for compute or storage components that may be
required by the design. This can also lead to concerns
about fault domains and power density that should be
considered. Higher density switches are also more
expensive, however, so it is important not to
overdesign the network if it is not required.</para>
</listitem>
<listitem>
<para>Port speed: The networking hardware must support the
proposed network speed, for example: 1 GbE, 10 GbE, or
40 GbE (or even 100 GbE).</para>
</listitem>
<listitem>
<para>Redundancy: The level of network hardware redundancy
required is influenced by the user requirements for
high availability and cost considerations. Network
redundancy can be achieved by adding redundant power
supplies or paired switches. If this is a requirement,
the hardware will need to support this configuration.
User requirements will determine if a completely
redundant network infrastructure is required.</para>
</listitem>
<listitem>
<para>Power requirements: Make sure that the physical data
center provides the necessary power for the selected
network hardware. This is not an issue for top of rack
(ToR) switches, but may be an issue for spine switches
in a leaf and spine fabric, or end of row (EoR)
switches.</para>
</listitem>
</itemizedlist>
<para>There is no single best practice architecture for the
networking hardware supporting a general purpose OpenStack
cloud that will apply to all implementations. Some of the key
factors that will have a strong influence on selection of
networking hardware include:</para>
<itemizedlist>
<listitem>
<para>Connectivity: All nodes within an OpenStack cloud
require some form of network connectivity. In some
cases, nodes require access to more than one network
segment. The design must encompass sufficient network
capacity and bandwidth to ensure that all
communications within the cloud, both north-south and
east-west traffic, have sufficient resources
available.</para>
</listitem>
<listitem>
<para>Scalability: The chosen network design should
encompass a physical and logical network design that
can be easily expanded upon. Network hardware should
offer the appropriate types of interfaces and speeds
that are required by the hardware nodes.</para>
</listitem>
<listitem>
<para>Availability: To ensure that access to nodes within
the cloud is not interrupted, it is recommended that
the network architecture identify any single points of
failure and provide some level of redundancy or fault
tolerance. With regard to the network infrastructure
itself, this often involves use of networking
protocols such as LACP, VRRP or others to achieve a
highly available network connection. In addition, it
is important to consider the networking implications
on API availability. In order to ensure that the APIs,
and potentially other services in the cloud, are
highly available, it is recommended to design load
balancing solutions within the network architecture to
accommodate these requirements.</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="software-selection">
<title>Software Selection</title>
<para>Software selection for a general purpose OpenStack
architecture design needs to include these three areas:</para>
<itemizedlist>
<listitem>
<para>Operating system (OS) and hypervisor</para>
</listitem>
<listitem>
<para>OpenStack components</para>
</listitem>
<listitem>
<para>Supplemental software</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="os-and-hypervisor"><title>OS and Hypervisor</title>
<para>The selection of OS and hypervisor has a tremendous impact
on the overall design. Selecting a particular operating system
and hypervisor can also directly affect server hardware
selection. It is recommended to make sure the storage hardware
selection and topology support the selected operating system
and hypervisor combination. Finally, it is important to ensure
that the networking hardware selection and topology will work
with the chosen operating system and hypervisor combination.
For example, if the design uses Link Aggregation Control
Protocol (LACP), the OS and hypervisor both need to support
it.</para>
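<para>As an illustration of this dependency, LACP on a Linux host
is typically provided by the kernel bonding driver in 802.3ad
mode, and the aggregate must also be configured on the switch.
The following fragment is a minimal sketch of such an
interface configuration; the interface names, addresses, and
the Debian/Ubuntu <filename>/etc/network/interfaces</filename>
format are illustrative assumptions, not a recommendation:</para>
<programlisting># Bond two physical NICs into an 802.3ad (LACP) aggregate.
# The corresponding switch ports must also be configured for LACP.
auto bond0
iface bond0 inet static
    address 192.0.2.10
    netmask 255.255.255.0
    bond-mode 802.3ad        # LACP
    bond-miimon 100          # link monitoring interval (ms)
    bond-lacp-rate fast
    bond-slaves eth0 eth1</programlisting>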
<para>Some areas that could be impacted by the selection of OS and
hypervisor include:</para>
<itemizedlist>
<listitem>
<para>Cost: Selecting a commercially supported hypervisor,
such as Microsoft Hyper-V, will result in a different
cost model than community-supported open source
hypervisors such as KVM or Xen. When comparing open
source OS solutions, choosing Ubuntu over Red Hat (or
vice versa) will have an impact on cost due to support
contracts. On the other hand, business or application
requirements may dictate a specific or commercially
supported hypervisor.</para>
</listitem>
<listitem>
<para>Supportability: Depending on the selected
hypervisor, the staff should have the appropriate
training and knowledge to support the selected OS and
hypervisor combination. If they do not, training will
need to be provided, which could have a cost impact on
the design.</para>
</listitem>
<listitem>
<para>Management tools: The management tools used for
Ubuntu and KVM differ from the management tools for
VMware vSphere. Although both OS and hypervisor
combinations are supported by OpenStack, there will be
very different impacts to the rest of the design as a
result of the selection of one combination versus the
other.</para>
</listitem>
<listitem>
<para>Scale and performance: Ensure that selected OS and
hypervisor combinations meet the appropriate scale and
performance requirements. The chosen architecture will
need to meet the targeted instance-host ratios with
the selected OS-hypervisor combinations.</para>
</listitem>
<listitem>
<para>Security: Ensure that the design can accommodate the
regular periodic installation of application security
patches while maintaining the required workloads. The
frequency of security patches for the proposed
OS-hypervisor combination will have an impact on
performance, and the patch installation process could
affect maintenance windows.</para>
</listitem>
<listitem>
<para>Supported features: Determine which features of
OpenStack are required. This will often determine the
selection of the OS-hypervisor combination. Certain
features are only available with specific OSs or
hypervisors. If certain required features are not
available, the design might need to be modified to
meet the user requirements.</para>
</listitem>
<listitem>
<para>Interoperability: Consideration should be given to
the ability of the selected OS-hypervisor combination
to interoperate or co-exist with other OS-hypervisors,
as well as other software solutions in the overall
design (if required). Operational troubleshooting
tools for one OS-hypervisor combination may differ
from the tools used for another OS-hypervisor
combination and, as a result, the design will need to
address whether the two sets of tools need to
interoperate.</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="openstack-components">
<title>OpenStack Components</title>
<para>The selection of which OpenStack components are included has
a significant impact on the overall design. While there are
certain components that will always be present (Nova and
Glance, for example), there are other services that may not be
required. As an example, a certain design might not need
OpenStack Heat. Omitting Heat would not have a significant
impact on the overall design of a cloud; however, if the
architecture uses a replacement for OpenStack Swift for its
storage component, it could potentially have significant
impacts on the rest of the design.</para>
<para>The exclusion of certain OpenStack components might also
limit or constrain the functionality of other components. If
the architecture includes Heat but excludes Ceilometer, then
the design will not be able to take advantage of Heat's auto
scaling functionality (which relies on information from
Ceilometer). It is important to research the component
interdependencies in conjunction with the technical
requirements before deciding what components need to be
included and what components can be dropped from the final
architecture.</para>
</section>
<section xml:id="supplemental-components"><title>Supplemental Components</title>
<para>While OpenStack is a fairly complete collection of software
projects for building a platform for cloud services, there are
invariably additional pieces of software that need to be
considered in any given OpenStack design.</para>
</section>
<section xml:id="networking-software"><title>Networking Software</title>
<para>OpenStack Neutron provides a wide variety of networking
services for instances. There are many additional networking
software packages that might be useful to manage the OpenStack
components themselves. Some examples include software to
provide load balancing, network redundancy protocols, and
routing daemons. Some of these software packages are described
in more detail in Chapter 8 of the OpenStack High Availability
Guide.</para>
<para>For a general purpose OpenStack cloud, the OpenStack
infrastructure components will need to be highly available. If
the design does not include hardware load balancing,
networking software packages like HAProxy will need to be
included.</para>
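<para>As a minimal sketch of this approach, an HAProxy
configuration fragment for balancing one OpenStack API
endpoint across three controllers might look like the
following; the IP addresses, node names, and the choice of the
Identity API port 5000 are illustrative assumptions:</para>
<programlisting># Balance the Identity API across three controller nodes.
listen keystone_public
    bind 192.0.2.100:5000
    balance roundrobin
    option httpchk           # mark a backend down on failed HTTP checks
    server controller1 192.0.2.11:5000 check inter 2000 rise 2 fall 5
    server controller2 192.0.2.12:5000 check inter 2000 rise 2 fall 5
    server controller3 192.0.2.13:5000 check inter 2000 rise 2 fall 5</programlisting>
<para>A similar stanza would be needed for each API service that
the design requires to be highly available.</para>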
</section>
<section xml:id="management-software"><title>Management Software</title>
<para>The selected supplemental software solutions affect the
overall OpenStack cloud design. This includes software for
providing clustering, logging, monitoring and alerting.</para>
<para>Inclusion of clustering software, such as Corosync or
Pacemaker, is determined primarily by the availability
requirements. Therefore, the impact of including (or not
including) these software packages is primarily determined by
the availability requirements of the cloud infrastructure and
the complexity of supporting the configuration after it is
deployed. The OpenStack High Availability Guide provides more
details on the installation and configuration of Corosync and
Pacemaker, should these packages need to be included in the
design.</para>
<para>Requirements for logging, monitoring, and alerting are
determined by operational considerations. Each of these
sub-categories includes a number of options. For example, in
the logging sub-category one might consider Logstash, Splunk,
VMware Log Insight, or some other log aggregation and
consolidation tool. Logs should be stored in a centralized
location to make it easier to perform analytics against the
data. Log data analytics engines can also provide automation
and issue notification by providing a mechanism to both alert
and automatically attempt to remediate some of the more
commonly known issues.</para>
<para>If any of these software packages are required, then the
design must account for the additional resource consumption
(CPU, RAM, storage, and network bandwidth for a log
aggregation solution, for example). Some other potential
design impacts include:</para>
<itemizedlist>
<listitem>
<para>OS-hypervisor combination: Ensure that the
selected logging, monitoring, or alerting tools
support the proposed OS-hypervisor combination.</para>
</listitem>
<listitem>
<para>Network hardware: The network hardware selection
needs to be supported by the logging, monitoring, and
alerting software.</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="database-software"><title>Database Software</title>
<para>A large majority of the OpenStack components require access
to back-end database services to store state and configuration
information. Selection of an appropriate back-end database
that will satisfy the availability and fault tolerance
requirements of the OpenStack services is required. OpenStack
services support connecting to any database that is supported
by the SQLAlchemy Python drivers; however, most common
database deployments make use of MySQL or variations of it. It
is recommended that the database which provides back-end
services within a general purpose cloud be made highly
available using an available technology which can accomplish
that goal. Some of the more common software solutions used
include Galera, MariaDB, and MySQL with multi-master
replication.</para>
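<para>As a brief illustration, a Galera-based MySQL cluster is
configured largely through a handful of wsrep options in the
server configuration. The fragment below is a sketch of the
relevant section of <filename>my.cnf</filename> on one node;
the cluster name, node names, addresses, and library path are
illustrative assumptions:</para>
<programlisting>[mysqld]
binlog_format=ROW                  # required by Galera replication
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2         # required for multi-master writes
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="openstack_db"
wsrep_cluster_address="gcomm://192.0.2.21,192.0.2.22,192.0.2.23"
wsrep_node_name="controller1"
wsrep_node_address="192.0.2.21"</programlisting>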
</section>
<section xml:id="addressing-performance-sensitive-workloads"><title>Addressing Performance-Sensitive Workloads</title>
<para>Although one of the key defining factors for a general
purpose OpenStack cloud is that performance is not a
determining factor, there may still be some
performance-sensitive workloads deployed on the general
purpose OpenStack cloud. For design guidance on
performance-sensitive workloads, it is recommended to refer to
the focused scenarios later in this guide. The
resource-focused guides can be used as a supplement to this
guide to help with decisions regarding performance-sensitive
workloads.</para>
</section>
<section xml:id="compute-focused-workloads"><title>Compute-Focused Workloads</title>
<para>In an OpenStack cloud that is compute-focused, there are
some design choices that can help accommodate those workloads.
Compute-focused workloads are generally those that would place
a higher demand on CPU and memory resources with lower
priority given to storage and network performance, other than
what is required to support the intended compute workloads.
For guidance on designing for this type of cloud, please refer
to the section on Compute-Focused clouds.</para>
</section>
<section xml:id="network-focused-workloads"><title>Network-Focused Workloads</title>
<para>In a network-focused OpenStack cloud, some design choices
can improve the performance of these types of workloads.
Network-focused workloads have extreme demands on network
bandwidth and services that require specialized consideration
and planning. For guidance on designing for this type of
cloud, please refer to the section on Network-Focused clouds.</para>
</section>
<section xml:id="storage-focused-workloads"><title>Storage-Focused Workloads</title>
<para>Storage-focused OpenStack clouds need to be designed to
accommodate workloads that have extreme demands on either
object or block storage services and that require specialized
consideration and planning. For guidance on designing for this
type of cloud, please refer to the section on Storage-Focused
clouds.</para></section>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-guide-intro-general-purpose">
<title>Introduction</title>
<para>An OpenStack general purpose cloud is often considered a
starting point for building a cloud deployment. General
purpose clouds, by their nature, balance the components and do
not emphasize (or heavily emphasize) any particular aspect of
the overall computing environment. The expectation is that the
compute, network, and storage components will be given equal
weight in the design. General purpose clouds can be found in
private, public, and hybrid environments. They lend themselves
to many different use cases but, since they are homogeneous
deployments, they are not suited to specialized environments
or edge case situations. Common uses to consider for a general
purpose cloud include, but are not limited to, providing a
simple database, a web application runtime environment, a
shared application development platform, or a lab test bed. In
other words, any use case that would benefit from a scale-out
rather than a scale-up approach is a good candidate for a
general purpose cloud architecture.</para>
<para>A general purpose cloud, by definition, is designed to have
a range of potential uses or functions and is not specialized
for a specific use. A general purpose architecture is largely
considered a scenario that would address 80% of the potential
use cases. The infrastructure, in itself, is a specific use
case. It is also a good place to start the design process. As
the most basic cloud service model, general purpose clouds are
designed to be platforms suited for general purpose
applications.</para>
<para>General purpose clouds are limited to the most basic
components, but they can include additional resources such
as:</para>
<itemizedlist>
<listitem>
<para>Virtual-machine disk image library</para>
</listitem>
<listitem>
<para>Raw block storage</para>
</listitem>
<listitem>
<para>File or object storage</para>
</listitem>
<listitem>
<para>Firewalls</para>
</listitem>
<listitem>
<para>Load balancers</para>
</listitem>
<listitem>
<para>IP addresses</para>
</listitem>
<listitem>
<para>Network overlays or virtual local area networks
(VLANs)</para>
</listitem>
<listitem>
<para>Software bundles</para>
</listitem>
</itemizedlist>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="operational-considerations-general-purpose">
<?dbhtml stop-chunking?>
<title>Operational Considerations</title>
<para>Many operational factors will affect general purpose cloud
|
||||
design choices. In larger installations, it is not uncommon
|
||||
for operations staff to be tasked with maintaining cloud
|
||||
environments. This differs from the operations staff that is
|
||||
responsible for building or designing the infrastructure. It
|
||||
is important to include the operations function in the
|
||||
planning and design phases of the build out.</para>
|
||||
<para>Service Level Agreements (SLAs) are contractual obligations
|
||||
that provide assurances for service availability. SLAs define
|
||||
levels of availability that drive the technical design, often
|
||||
with penalties for not meeting the contractual obligations.
|
||||
The strictness of the SLA dictates the level of redundancy and
|
||||
resiliency in the OpenStack cloud design. Knowing when and
|
||||
where to implement redundancy and HA is directly affected by
|
||||
expectations set by the terms of the SLA. Some of the SLA
|
||||
terms that will affect the design include:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Guarantees for API availability imply multiple
|
||||
infrastructure services combined with highly available
|
||||
load balancers.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Network uptime guarantees will affect the switch
|
||||
design and might require redundant switching and
|
||||
power.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Network security policies requirements need to be
|
||||
factored in to deployments.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<section xml:id="support-and-maintainability-general-purpose"><title>Support and Maintainability</title>
|
||||
<para>OpenStack cloud management requires operations staff to be
|
||||
able to understand and comprehend design architecture content
|
||||
on some level. The level of skills and the level of separation
|
||||
of the operations and engineering staff are dependent on the
|
||||
size and purpose of the installation. A large cloud service
|
||||
provider or a telecom provider is more likely to be managed by
|
||||
a specially trained, dedicated operations organization. A
|
||||
smaller implementation is more likely to rely on a smaller
|
||||
support staff that might need to take on the combined
|
||||
engineering, design and operations functions.</para>
|
||||
<para>Furthermore, maintaining OpenStack installations requires a
|
||||
variety of technical skills. Some of these skills may include
|
||||
the ability to debug Python log output to a basic level and an
|
||||
understanding of networking concepts.</para>
|
||||
<para>Consider incorporating features into the architecture and
|
||||
design that reduce the operations burden. This is accomplished
|
||||
by automating some of the operations functions. In some cases
|
||||
it may be beneficial to use a third party management company
|
||||
with special expertise in managing OpenStack
|
||||
deployments.</para></section>
|
||||
<section xml:id="monitoring-general-purpose"><title>Monitoring</title>
|
||||
<para>Like any other infrastructure deployment, OpenStack clouds
|
||||
need an appropriate monitoring platform to ensure any errors
|
||||
are caught and managed appropriately. Consider leveraging any
|
||||
existing monitoring system to see if it will be able to
|
||||
effectively monitor an OpenStack environment. While there are
|
||||
many aspects that need to be monitored, specific metrics that
|
||||
are critically important to capture include image disk
|
||||
utilization, or response time to the Compute API.</para></section>
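The Compute API response time mentioned above can be probed with a simple threshold check. The sketch below is illustrative only: the probe callable and the two-second threshold are assumptions, and a production cloud would rely on a dedicated monitoring platform rather than ad-hoc scripts.

```python
import time

def response_time(probe):
    """Return the wall-clock seconds that calling probe() takes."""
    start = time.monotonic()
    probe()
    return time.monotonic() - start

def check_api(probe, threshold_s=2.0):
    """Classify a single probe as OK or SLOW against a threshold.

    threshold_s is a hypothetical SLA figure, not an OpenStack default.
    """
    elapsed = response_time(probe)
    return ("OK" if elapsed <= threshold_s else "SLOW", elapsed)

# Stand-in probe; against a real cloud this could be an HTTP request
# to the Compute API endpoint, e.g. via urllib.request.urlopen(...).
status, elapsed = check_api(lambda: time.sleep(0.01))
```

A real monitoring platform adds alert routing, retries, and metric storage on top of a point check like this.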
<section xml:id="downtime-general-purpose"><title>Downtime</title>
    <para>No matter how robust the architecture is, at some point
        components will fail. Designing for high availability (HA)
        can have significant cost ramifications; therefore, the
        resiliency of the overall system and the individual
        components is going to be dictated by the requirements of
        the SLA. Downtime planning includes creating processes and
        architectures that support planned (maintenance) and
        unplanned (system faults) downtime.</para>
    <para>An example of an operational consideration is the
        recovery of a failed compute host. This might mean
        requiring the restoration of instances from a snapshot or
        respawning an instance on another available compute host,
        which could have consequences for the overall application
        design. A general purpose cloud should not need to provide
        an ability to migrate instances from one host to another.
        If the expectation is that the application will be designed
        to tolerate failure, additional considerations need to be
        made around supporting instance migration. In this
        scenario, extra supporting services, including shared
        storage attached to compute hosts, might need to be
        deployed.</para></section>
<section xml:id="capacity-planning"><title>Capacity Planning</title>
    <para>Capacity planning for future growth is a critically
        important and often overlooked consideration. Capacity
        constraints in a general purpose cloud environment include
        compute and storage limits. There is a relationship between
        the size of the compute environment and the supporting
        OpenStack infrastructure controller nodes required to
        support it. As the size of the supporting compute
        environment increases, the network traffic and messages
        will increase, which will add load to the controller or
        networking nodes. While no hard and fast rule exists,
        effective monitoring of the environment will help with
        capacity decisions on when to scale the back-end
        infrastructure as part of the scaling of the compute
        resources.</para>
    <para>Adding extra compute capacity to an OpenStack cloud is a
        horizontally scaling process, as consistently configured
        compute nodes automatically attach to an OpenStack cloud.
        Be mindful of any additional work that is needed to place
        the nodes into appropriate availability zones and host
        aggregates. Make sure to use identical or functionally
        compatible CPUs when adding compute nodes to the
        environment, otherwise live migration features will break.
        Scaling out compute hosts will directly affect network and
        other data center resources, so it will be necessary to add
        rack capacity or network switches.</para>
    <para>Another option is to assess the average workloads and
        increase the number of instances that can run within the
        compute environment by adjusting the overcommit ratio.
        While only appropriate in some environments, it is
        important to remember that changing the CPU overcommit
        ratio can have a detrimental effect and cause a potential
        increase in noisy neighbor problems. The added risk of
        increasing the overcommit ratio is that more instances will
        fail when a compute host fails.</para>
    <para>Compute host components can also be upgraded to account
        for increases in demand; this is known as vertical scaling.
        Upgrading CPUs with more cores, or increasing the overall
        server memory, can add extra needed capacity depending on
        whether the running applications are more CPU intensive or
        memory intensive.</para>
    <para>Insufficient disk capacity could also have a negative
        effect on overall performance, including CPU and memory
        usage. Depending on the back-end architecture of the
        OpenStack Block Storage layer, capacity might be added by
        attaching disk shelves to enterprise storage systems or by
        installing additional block storage nodes. It may also be
        necessary to upgrade directly attached storage installed in
        compute hosts or add capacity to the shared storage to
        provide additional ephemeral storage to instances.</para>
    <para>For a deeper discussion on many of these topics, refer to
        the <link xlink:href="http://docs.openstack.org/ops"
        >OpenStack Operations Guide</link>.</para></section>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="prescriptive-example-online-classifieds">
    <?dbhtml stop-chunking?>
    <title>Prescriptive Example</title>
    <para>An online classified advertising company wants to run web
        applications consisting of Tomcat, Nginx, and MariaDB in a
        private cloud. To meet policy requirements, the cloud
        infrastructure will run in their own data center. They have
        predictable load requirements but require an element of
        scaling to cope with nightly increases in demand. Their
        current environment is not flexible enough to align with
        their goal of running an open source, API-driven
        environment. Their current environment consists of the
        following:</para>
    <itemizedlist>
        <listitem>
            <para>Between 120 and 140 installations of Nginx and
                Tomcat, each with 2 vCPUs and 4 GB of RAM</para>
        </listitem>
        <listitem>
            <para>A three-node MariaDB and Galera cluster, each node
                with 4 vCPUs and 8 GB of RAM</para>
        </listitem>
    </itemizedlist>
    <para>The company runs hardware load balancers and multiple web
        applications serving the sites. The company orchestrates its
        environment using a combination of scripts and Puppet. The
        websites generate a large amount of log data each day that
        needs to be archived.</para>
    <para>The solution would consist of the following OpenStack
        components:</para>
    <itemizedlist>
        <listitem>
            <para>A firewall, switches, and load balancers on the
                public-facing network connections.</para>
        </listitem>
        <listitem>
            <para>OpenStack Controller services running Image,
                Identity, and Networking, combined with supporting
                services such as MariaDB and RabbitMQ. The
                controllers will run in a highly available
                configuration on at least three controller
                nodes.</para>
        </listitem>
        <listitem>
            <para>OpenStack Compute nodes running the KVM
                hypervisor.</para>
        </listitem>
        <listitem>
            <para>OpenStack Block Storage for use by compute
                instances that require persistent storage, such as
                databases for dynamic sites.</para>
        </listitem>
        <listitem>
            <para>OpenStack Object Storage for serving static
                objects, such as images.</para>
        </listitem>
    </itemizedlist>
    <para><inlinemediaobject><imageobject><imagedata
        fileref="../images/General_Architecture3.png"
        /></imageobject></inlinemediaobject>Running up to 140 web
        instances and the small number of MariaDB instances requires
        292 vCPUs and 584 GB of RAM. On a typical 1U server using
        dual-socket hex-core Intel CPUs with Hyperthreading, and
        assuming a 2:1 CPU overcommit ratio, this would require 8
        OpenStack Compute nodes.</para>
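The sizing figures in the paragraph above can be reproduced with a short calculation. The instance counts and flavor sizes come from the example itself; the per-node thread math assumes the dual-socket hex-core Hyperthreaded servers described.

```python
import math

# Instance counts and flavor sizes from the example above.
web_instances = 140               # upper bound of Nginx/Tomcat installs
web_vcpus, web_ram_gb = 2, 4      # per web instance
db_nodes = 3                      # MariaDB/Galera cluster size
db_vcpus, db_ram_gb = 4, 8        # per database node

total_vcpus = web_instances * web_vcpus + db_nodes * db_vcpus     # 292
total_ram_gb = web_instances * web_ram_gb + db_nodes * db_ram_gb  # 584

# A dual-socket hex-core server with Hyperthreading exposes
# 2 x 6 x 2 = 24 hardware threads; a 2:1 overcommit ratio yields
# 48 vCPUs per compute node.
vcpus_per_node = 2 * 6 * 2 * 2
min_nodes = math.ceil(total_vcpus / vcpus_per_node)  # 7 at minimum
# The example provisions 8 nodes, leaving roughly a node of headroom.
```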
    <para>The web application instances run from local storage on
        each of the OpenStack Compute nodes. The web application
        instances are stateless, meaning that any of the instances
        can fail and the application will continue to
        function.</para>
    <para>MariaDB server instances store their data on shared
        enterprise storage, such as NetApp or SolidFire devices. If
        a MariaDB instance fails, storage would be expected to be
        re-attached to another instance and rejoined to the Galera
        cluster.</para>
    <para>Logs from the web application servers are shipped to
        OpenStack Object Storage for later processing and
        archiving.</para>
    <para>In this scenario, additional capabilities can be realized
        by moving static web content to be served from OpenStack
        Object Storage containers, and backing the OpenStack Image
        Service with OpenStack Object Storage. Note that increased
        use of OpenStack Object Storage means that network bandwidth
        needs to be taken into consideration. It is best to run
        OpenStack Object Storage with network connections offering
        10 GbE or better connectivity.</para>
    <para>There is also a potential to leverage the Orchestration
        and Telemetry OpenStack modules to provide an auto-scaling,
        orchestrated web application environment. Defining the web
        applications in Heat Orchestration Templates (HOT) would
        negate the reliance on the scripted Puppet solution
        currently employed.</para>
    <para>OpenStack Networking can be used to control hardware load
        balancers through the use of plug-ins and the Networking
        API. This allows a user to control hardware load balancer
        pools and instances as members in these pools, but their
        use in production environments must be carefully weighed
        against current stability.</para>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="technical-considerations-general-purpose">
    <?dbhtml stop-chunking?>
    <title>Technical Considerations</title>
    <para>When designing a general purpose cloud, there is an
        implied requirement to design for all of the base services
        generally associated with providing
        Infrastructure-as-a-Service: compute, network, and storage.
        Each of these services has different resource requirements.
        As a result, it is important to make design decisions
        relating directly to the service currently under design,
        while providing a balanced infrastructure that provides for
        all services.</para>
    <para>When designing an OpenStack cloud as a general purpose
        cloud, the hardware selection process can be lengthy and
        involved due to the sheer number of services which need to
        be designed and the unique characteristics and requirements
        of each service within the cloud. Hardware designs need to
        be generated for each type of resource pool; specifically,
        compute, network, and storage. In addition to the hardware
        designs, which affect the resource nodes themselves, there
        are also a number of additional hardware decisions to be
        made related to network architecture and facilities
        planning. These factors play heavily into the overall
        architecture of an OpenStack cloud.</para>
    <section xml:id="designing-compute-resources-tech-considerations">
        <title>Designing Compute Resources</title>
        <para>It is recommended to design compute resources as pools
            of resources which will be addressed on demand. When
            designing compute resource pools, a number of factors
            impact your design decisions. For example, decisions
            related to processors, memory, and storage within each
            hypervisor are just one element of designing compute
            resources. In addition, it is necessary to decide
            whether compute resources will be provided in a single
            pool or in multiple pools.</para>
        <para>To design for the best use of available resources by
            applications running in the cloud, it is recommended to
            design more than one compute resource pool. Each
            independent resource pool should be designed to provide
            service for specific flavors of instances or groupings
            of flavors. For the purpose of this book, "instance"
            refers to a virtual machine and the operating system
            running on the virtual machine. Designing multiple
            resource pools helps to ensure that, as instances are
            scheduled onto compute hypervisors, each independent
            node's resources will be allocated in a way that makes
            the most efficient use of available hardware. This is
            commonly referred to as bin packing.</para>
        <para>Using a consistent hardware design among the nodes
            that are placed within a resource pool also helps
            support bin packing. Hardware nodes selected for being a
            part of a compute resource pool should share a common
            processor, memory, and storage layout. By choosing a
            common hardware design, it becomes easier to deploy,
            support, and maintain those nodes throughout their life
            cycle in the cloud.</para>
        <para>OpenStack provides the ability to configure the
            overcommit ratio (the ratio of virtual resources
            available for allocation to physical resources present)
            for both CPU and memory. The default CPU overcommit
            ratio is 16:1 and the default memory overcommit ratio is
            1.5:1. Determine the tuning of the overcommit ratios for
            both of these options during the design phase, as this
            has a direct impact on the hardware layout of your
            compute nodes.</para>
        <para>As an example, consider that an m1.small instance uses
            1 vCPU, 20 GB of ephemeral storage, and 2,048 MB of RAM.
            When designing a hardware node as a compute resource
            pool to service instances, take into consideration the
            number of processor cores available on the node as well
            as the required disk and memory to service instances
            running at capacity. For a server with 2 CPUs of 10
            cores each, with hyperthreading turned on, the default
            CPU overcommit ratio of 16:1 would allow for 640 (2 x 10
            x 2 x 16) total m1.small instances. By the same
            reasoning, using the default memory overcommit ratio of
            1.5:1 you can determine that the server will need at
            least 853 GB (640 x 2,048 MB / 1.5) of RAM. When sizing
            nodes for memory, it is also important to consider the
            additional memory required to service operating system
            and service needs.</para>
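The arithmetic in this example can be verified directly. The flavor dimensions and the 16:1 and 1.5:1 defaults are taken from the text above.

```python
# Verify the m1.small sizing example: a dual-socket server with
# 10 cores per socket and hyperthreading, using the default
# overcommit ratios quoted above.
sockets, cores_per_socket, threads_per_core = 2, 10, 2
cpu_overcommit = 16    # default CPU overcommit ratio (16:1)
ram_overcommit = 1.5   # default memory overcommit ratio (1.5:1)
flavor_ram_mb = 2048   # m1.small RAM

hw_threads = sockets * cores_per_socket * threads_per_core  # 40
max_instances = hw_threads * cpu_overcommit                 # 640

# Physical RAM needed to back 640 instances at 1.5:1 overcommit,
# before adding headroom for the host OS and services.
ram_needed_gb = max_instances * flavor_ram_mb / ram_overcommit / 1024
# ~853 GB
```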
        <para>Processor selection is an extremely important
            consideration in hardware design, especially when
            comparing the features and performance characteristics
            of different processors. Some newly released processors
            include features specific to virtualized compute hosts,
            including hardware-assisted virtualization and
            technology related to memory paging (also known as EPT
            shadowing). These features have a tremendous positive
            impact on the performance of virtual machines running in
            the cloud.</para>
        <para>In addition to the impact on actual compute services,
            it is also important to consider the compute
            requirements of resource nodes within the cloud.
            Resource nodes refer to non-hypervisor nodes providing
            controller, object storage, block storage, or networking
            services in the cloud. The number of processor cores and
            threads has a direct correlation to the number of worker
            threads which can be run on a resource node. It is
            important to ensure sufficient compute capacity and
            memory is planned on resource nodes.</para>
        <para>Workload profiles are unpredictable in a general
            purpose cloud, so it may be difficult to design for
            every specific use case. This unpredictability should
            not be a problem, because additional compute resource
            pools can be added to the cloud at a later time. In some
            cases, the demand on certain instance types or flavors
            may not justify an individual hardware design. In either
            of these cases, start by providing hardware designs
            which will be capable of servicing the most common
            instance requests first, looking to add additional
            hardware designs to the overall architecture in the form
            of new hardware node designs and resource pools as they
            become justified at a later time.</para></section>
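The bin-packing behavior described in this section can be illustrated with a first-fit-decreasing sketch. The flavors and the 8-vCPU host size below are hypothetical, and the real nova scheduler uses configurable filters and weights rather than this algorithm.

```python
def first_fit_decreasing(demands, host_vcpus):
    """Pack per-instance vCPU demands onto identical hosts.

    Returns the number of hosts opened. A toy stand-in for the
    scheduler's placement behavior, not the actual nova algorithm.
    """
    hosts = []  # remaining vCPU capacity per opened host
    for demand in sorted(demands, reverse=True):
        for i, free in enumerate(hosts):
            if free >= demand:
                hosts[i] -= demand
                break
        else:
            hosts.append(host_vcpus - demand)  # open a new host
    return len(hosts)

# Six 1-vCPU, three 2-vCPU, and two 4-vCPU instances (20 vCPUs of
# demand) packed onto 8-vCPU hosts fit on three hosts.
demands = [1] * 6 + [2] * 3 + [4] * 2
hosts_used = first_fit_decreasing(demands, 8)  # 3
```

A homogeneous pool makes this packing predictable; mixed host sizes fragment capacity and leave more of it stranded.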
    <section xml:id="designing-network-resources-tech-considerations">
        <title>Designing Network Resources</title>
        <para>An OpenStack cloud traditionally has multiple network
            segments, each of which provides access to resources
            within the cloud to both operators and tenants. In
            addition, the network services themselves also require
            network communication paths which should be separated
            from the other networks. When designing network services
            for a general purpose cloud, it is recommended to plan
            for either a physical or logical separation of network
            segments which will be used by operators and tenants. It
            is further suggested to create an additional network
            segment for access to internal services such as the
            message bus and database used by the various cloud
            services. Segregating these services onto separate
            networks helps to protect sensitive data and also
            protects against unauthorized access to
            services.</para>
        <para>Based on the requirements of instances being serviced
            in the cloud, the next design choice which will affect
            your design is the choice of network service which will
            be used to service instances in the cloud. The choice
            between nova-network, as a part of the OpenStack Compute
            Service, and Neutron, the OpenStack Networking Service,
            has tremendous implications and will have a huge impact
            on the architecture and design of the cloud network
            infrastructure.</para>
        <para>The nova-network service is primarily a layer 2
            networking service which has two main modes in which it
            will function. The difference between the two modes in
            nova-network pertains to whether or not nova-network
            uses VLANs. When using nova-network in a flat network
            mode, all network hardware nodes and devices throughout
            the cloud are connected to a single layer 2 network
            segment which provides access to application
            data.</para>
        <para>When the network devices in the cloud support
            segmentation using VLANs, nova-network can operate in
            the second mode. In this design model, each tenant
            within the cloud is assigned a network subnet which is
            mapped to a VLAN on the physical network. It is
            especially important to remember the maximum of 4096
            VLANs which can be used within a spanning tree domain.
            This limitation places a hard limit on the amount of
            growth possible within the data center. When designing a
            general purpose cloud intended to support multiple
            tenants, it is especially recommended to use
            nova-network with VLANs, and not in flat network
            mode.</para>
        <para>Another consideration regarding network is the fact
            that nova-network is entirely managed by the cloud
            operator; tenants do not have control over network
            resources. If tenants require the ability to manage and
            create network resources such as network segments and
            subnets, it will be necessary to install the OpenStack
            Networking Service to provide network access to
            instances.</para>
        <para>The OpenStack Networking Service is a first class
            networking service that gives full control over creation
            of virtual network resources to tenants. This is often
            accomplished in the form of tunneling protocols which
            will establish encapsulated communication paths over
            existing network infrastructure in order to segment
            tenant traffic. These methods vary depending on the
            specific implementation, but some of the more common
            methods include tunneling over GRE, encapsulating with
            VXLAN, and VLAN tags.</para>
        <para>Initially, it is suggested to design at least three
            network segments. The first segment is used by tenants
            and operators for access to the cloud’s REST APIs, and
            is generally referred to as a public network. In most
            cases, the controller nodes and swift proxies within the
            cloud will be the only devices necessary to connect to
            this network segment. In some cases, this network might
            also be serviced by hardware load balancers and other
            network devices.</para>
        <para>The next segment is used by cloud administrators to
            manage hardware resources and is also used by
            configuration management tools when deploying software
            and services onto new hardware. In some cases, this
            network segment might also be used for internal
            services, including the message bus and database
            services, to communicate with each other. Due to the
            sensitive nature of this network segment, it may be
            desirable to secure this network from unauthorized
            access. This network will likely need to communicate
            with every hardware node within the cloud.</para>
        <para>The last network segment is used by applications and
            consumers to provide access to the physical network and
            also for users accessing applications running within the
            cloud. This network is generally segregated from the one
            used to access the cloud APIs and is not capable of
            communicating directly with the hardware resources in
            the cloud. Compute resource nodes will need to
            communicate on this network segment, as will any network
            gateway services which allow application data to access
            the physical network outside of the cloud.</para></section>
    <section xml:id="designing-storage-resources-tech-considerations"><title>Designing Storage Resources</title>
        <para>OpenStack has two independent storage services to
            consider, each with its own specific design requirements
            and goals. In addition to services which provide storage
            as their primary function, there are additional design
            considerations with regard to compute and controller
            nodes which will affect the overall cloud
            architecture.</para></section>
    <section xml:id="designing-openstack-object-storage-tech-considerations">
        <title>Designing OpenStack Object Storage</title>
        <para>When designing hardware resources for OpenStack Object
            Storage, the primary goal is to maximize the amount of
            storage in each resource node while also ensuring that
            the cost per terabyte is kept to a minimum. This often
            involves utilizing servers which can hold a large number
            of spinning disks. Whether choosing to use 2U server
            form factors with directly attached storage or an
            external chassis that holds a larger number of drives,
            the main goal is to maximize the storage available in
            each node.</para>
        <para>It is not recommended to invest in enterprise class
            drives for an OpenStack Object Storage cluster. The
            consistency and partition tolerance characteristics of
            OpenStack Object Storage will ensure that data stays up
            to date and survives hardware faults without the use of
            any specialized data replication devices.</para>
        <para>A great benefit of OpenStack Object Storage is the
            ability to mix and match drives by utilizing weighting
            within the swift ring. When designing your swift storage
            cluster, it is recommended to make use of the most cost
            effective storage solution available at the time. Many
            server chassis on the market can hold 60 or more drives
            in 4U of rack space, therefore it is recommended to
            maximize the amount of storage per rack unit at the best
            cost per terabyte. Furthermore, the use of RAID
            controllers is not recommended in an object storage
            node.</para>
        <para>To achieve the durability and availability of data
            stored as objects, it is important to design object
            storage resource pools in a way that provides the
            suggested availability that the service can provide.
            Beyond designing at the hardware node level, it is
            important to consider rack-level and zone-level designs
            to accommodate the number of replicas configured to be
            stored in the Object Storage service (the default number
            of replicas is three). Each replica of data should exist
            in its own availability zone with its own power,
            cooling, and network resources available to service that
            specific zone.</para>
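The zone-level placement rule described above can be sketched as follows. This is a deliberately simplified stand-in: the real Object Storage ring uses consistent hashing with per-device weights, and the zone names here are illustrative.

```python
def place_replicas(obj_name, zones, replicas=3):
    """Assign each replica of an object to a distinct availability zone.

    Simplified illustration only; swift's ring builder does this with
    consistent hashing and device weights, not round-robin.
    """
    if replicas > len(zones):
        raise ValueError("need at least as many zones as replicas")
    start = hash(obj_name) % len(zones)
    return [zones[(start + i) % len(zones)] for i in range(replicas)]

zones = ["zone1", "zone2", "zone3", "zone4"]
placement = place_replicas("AUTH_demo/container/object", zones)
# Three replicas, each in its own zone with independent power,
# cooling, and network.
```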
        <para>Object storage nodes should be designed so that the
            number of requests does not hinder the performance of
            the cluster. The object storage service uses a chatty
            protocol, therefore making use of multiple processors
            with higher core counts will ensure that I/O requests do
            not inundate the server.</para></section>
|
||||
<section xml:id="designing-openstack-block-storage"><title>Designing OpenStack Block Storage</title>
|
||||
<para>When designing OpenStack Block Storage resource nodes, it is
|
||||
helpful to understand the workloads and requirements that will
|
||||
drive the use of block storage in the cloud. In a general
|
||||
purpose cloud these use patterns are often unknown. It is
|
||||
recommended to design block storage pools so that tenants can
|
||||
choose the appropriate storage solution for their
|
||||
applications. By creating multiple storage pools of different
|
||||
types, in conjunction with configuring an advanced storage
        scheduler for the block storage service, it is possible to
        provide tenants with a large catalog of storage services with
        a variety of performance levels and redundancy options.</para>
    <para>In addition to directly attached storage populated in
        servers, block storage can also take advantage of a number of
        enterprise storage solutions. These are addressed via a plug-in
        driver developed by the hardware vendor. A large number of
        enterprise storage plug-in drivers ship out-of-the-box with
        OpenStack Block Storage (and many more are available via third
        party channels). While a general purpose cloud would likely
        use directly attached storage in the majority of block storage
        nodes, it may also be necessary to provide additional levels
        of service to tenants which can only be provided by enterprise
        class storage solutions.</para>
    <para>The determination to use a RAID controller card in block
        storage nodes is impacted primarily by the redundancy and
        availability requirements of the application. Applications
        which have a higher demand on input-output per second (IOPS)
        will influence both the choice to use a RAID controller and
        the level of RAID configured on the volume. Where performance
        is a consideration, it is suggested to make use of higher
        performing RAID volumes. In contrast, where redundancy of
        block storage volumes is more important, it is recommended to
        make use of a redundant RAID configuration such as RAID 5 or
        RAID 6. Some specialized features, such as automated
        replication of block storage volumes, may require the use of
        third-party plug-ins and enterprise block storage solutions in
        order to provide the high demand on storage. Furthermore,
        where extreme performance is a requirement, it may also be
        necessary to make use of high speed SSD disk drives or high
        performing flash storage solutions.</para></section>
<section xml:id="software-selection-tech-considerations">
    <title>Software Selection</title>
    <para>The software selection process can play a large role in the
        architecture of a general purpose cloud. The choice of
        operating system, OpenStack components, hypervisor, and
        supplemental software will each have a large impact on the
        design of the cloud.</para>
    <para>Operating system (OS) selection plays a large role in the
        design and architecture of a cloud. There are a number of OSes
        which have native support for OpenStack, including Ubuntu, Red
        Hat Enterprise Linux (RHEL), CentOS, and SUSE Linux Enterprise
        Server (SLES). "Native support" in this context means that the
        distribution provides distribution-native packages by which to
        install OpenStack in their repositories. Note that "native
        support" is not a constraint on the choice of OS; users are
        free to choose just about any Linux distribution (or even
        Microsoft Windows) and install OpenStack directly from source
        (or compile their own packages). However, the reality is that
        many organizations will prefer to install OpenStack from
        distribution-supplied packages or repositories (although using
        the distribution vendor's OpenStack packages might be a
        requirement for support).</para>
    <para>OS selection also directly influences hypervisor selection.
        A cloud architect who selects Ubuntu or RHEL has some
        flexibility in hypervisor; KVM, Xen, and LXC are supported
        virtualization methods available under OpenStack Compute
        (Nova) on these Linux distributions. A cloud architect who
        selects Hyper-V, on the other hand, is limited to Windows
        Server. Similarly, a cloud architect who selects XenServer is
        limited to the CentOS-based dom0 operating system provided
        with XenServer.</para>
    <para>The primary factors that play into OS/hypervisor selection
        include:</para>
    <itemizedlist>
        <listitem>
            <para>User requirements: The selection of the
                OS/hypervisor combination first and foremost needs to
                support the user requirements.</para>
        </listitem>
        <listitem>
            <para>Support: The selected OS/hypervisor combination
                needs to be supported by OpenStack.</para>
        </listitem>
        <listitem>
            <para>Interoperability: The OS/hypervisor needs to be
                interoperable with other features and services in the
                OpenStack design in order to meet the user
                requirements.</para>
        </listitem>
    </itemizedlist></section>
<section xml:id="hypervisor-tech-considerations"><title>Hypervisor</title>
    <para>OpenStack supports a wide variety of hypervisors, one or
        more of which can be used in a single cloud. These hypervisors
        include:</para>
    <itemizedlist>
        <listitem>
            <para>KVM (and QEMU)</para>
        </listitem>
        <listitem>
            <para>XCP/XenServer</para>
        </listitem>
        <listitem>
            <para>vSphere (vCenter and ESXi)</para>
        </listitem>
        <listitem>
            <para>Hyper-V</para>
        </listitem>
        <listitem>
            <para>LXC</para>
        </listitem>
        <listitem>
            <para>Docker</para>
        </listitem>
        <listitem>
            <para>Bare-metal</para>
        </listitem>
    </itemizedlist>
    <para>A complete list of supported hypervisors and their
        capabilities can be found at
        https://wiki.openstack.org/wiki/HypervisorSupportMatrix.</para>
    <para>General purpose clouds should make use of hypervisors that
        support the most general purpose use cases, such as KVM and
        Xen. More specific hypervisors should then be chosen to
        account for specific functionality or a supported feature
        requirement. In some cases, there may also be a mandated
        requirement to run software on a certified hypervisor,
        including solutions from VMware, Microsoft, and Citrix.</para>
    <para>The features offered through the OpenStack cloud platform
        determine the best choice of a hypervisor. As an example, for
        a general purpose cloud that predominantly supports a
        Microsoft-based migration, or is managed by staff that has a
        particular skill for managing certain hypervisors and
        operating systems, Hyper-V might be the best available choice.
        While the decision to use Hyper-V does not limit the ability
        to run alternative operating systems, be mindful of those that
        are deemed supported. Each hypervisor also has its own
        hardware requirements, which may affect the decisions around
        designing a general purpose cloud. For example, utilizing the
        live migration feature of VMware, vMotion, requires an
        installation of vCenter/vSphere and the use of the ESXi
        hypervisor, which increases the infrastructure
        requirements.</para>
    <para>In a mixed hypervisor environment, specific aggregates of
        compute resources, each with defined capabilities, enable
        workloads to utilize software and hardware specific to their
        particular requirements. This functionality can be exposed
        explicitly to the end user, or accessed through defined
        metadata within a particular flavor of an instance.</para></section>
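The flavor-metadata mechanism described above can be sketched conceptually: the scheduler admits a host only if its aggregate metadata satisfies the flavor's extra specs. This is a simplified illustration of the matching idea, not Nova's actual filter code, and the metadata keys used are invented for the example:

```python
def host_matches_flavor(aggregate_metadata, flavor_extra_specs):
    """Simplified aggregate matching: every extra spec requested by
    the flavor must appear, with the same value, in the metadata of
    the aggregate containing the candidate host."""
    return all(aggregate_metadata.get(key) == value
               for key, value in flavor_extra_specs.items())

# Two illustrative aggregates in a mixed hypervisor cloud.
hyperv_aggregate = {"hypervisor_type": "hyperv"}
kvm_aggregate = {"hypervisor_type": "kvm", "ssd": "true"}

# A flavor that should only land on KVM hosts.
flavor_specs = {"hypervisor_type": "kvm"}

print(host_matches_flavor(kvm_aggregate, flavor_specs))     # True
print(host_matches_flavor(hyperv_aggregate, flavor_specs))  # False
```

In a real deployment this matching is performed by the scheduler against host aggregate metadata; the sketch only shows why consistent metadata keys across aggregates and flavors matter.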
<section xml:id="openstack-components-tech-considerations"><title>OpenStack Components</title>
    <para>A general purpose OpenStack cloud design should incorporate
        the core OpenStack services to provide a wide range of
        services to end-users. The OpenStack core services recommended
        in a general purpose cloud are:</para>
    <itemizedlist>
        <listitem>
            <para>OpenStack Compute (Nova)</para>
        </listitem>
        <listitem>
            <para>OpenStack Networking (Neutron)</para>
        </listitem>
        <listitem>
            <para>OpenStack Image Service (Glance)</para>
        </listitem>
        <listitem>
            <para>OpenStack Identity Service (Keystone)</para>
        </listitem>
        <listitem>
            <para>OpenStack Dashboard (Horizon)</para>
        </listitem>
        <listitem>
            <para>OpenStack Telemetry (Ceilometer)</para>
        </listitem>
    </itemizedlist>
    <para>A general purpose cloud may also include OpenStack Object
        Storage (Swift). OpenStack Block Storage (Cinder) may be
        selected to provide persistent storage to applications and
        instances, although, depending on the use case, this could be
        optional.</para></section>
<section xml:id="supplemental-software-tech-considerations"><title>Supplemental Software</title>
    <para>A general purpose OpenStack deployment consists of more than
        just OpenStack-specific components. A typical deployment
        involves services that provide supporting functionality,
        including databases and message queues, and may also involve
        software to provide high availability of the OpenStack
        environment. Design decisions around the underlying message
        queue might affect the required number of controller services,
        as might the technology chosen to provide highly resilient
        database functionality, such as MariaDB with Galera. In such a
        scenario, replication of services relies on quorum. Therefore,
        the underlying database nodes, for example, should consist of
        at least 3 nodes to account for the recovery of a failed
        Galera node. When increasing the number of nodes to support a
        feature of the software, consideration of rack space and
        switch port density becomes important.</para>
    <para>Where many general purpose deployments use hardware load
        balancers to provide highly available API access and SSL
        termination, software solutions, for example HAProxy, can also
        be considered. It is vital to ensure that such software
        implementations are also made highly available. This high
        availability can be achieved by using software such as
        Keepalived or Pacemaker with Corosync. Pacemaker and Corosync
        can provide Active-Active or Active-Passive highly available
        configurations depending on the specific service in the
        OpenStack environment. Using this software can affect the
        design, as it assumes at least a 2-node controller
        infrastructure where one of those nodes may be running certain
        services in standby mode.</para>
    <para>Memcached is a distributed memory object caching system, and
        Redis is a key-value store. Both are usually deployed on
        general purpose clouds to assist in alleviating load on the
        Identity service. The memcached service caches tokens, and due
        to its distributed nature it can help alleviate some
        bottlenecks in the underlying authentication system. Using
        memcached or Redis does not affect the overall design of your
        architecture, as they tend to be deployed onto the
        infrastructure nodes providing the OpenStack services.</para></section>
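The sizing guidance above follows from simple majority-quorum arithmetic, which can be sketched as follows. This is an illustrative calculation only, not part of any OpenStack or Galera API:

```python
def galera_quorum(cluster_size):
    """Return (quorum, tolerable_failures) for a simple
    majority-quorum cluster such as MariaDB with Galera."""
    if cluster_size < 1:
        raise ValueError("cluster must have at least one node")
    quorum = cluster_size // 2 + 1          # strict majority of nodes
    return quorum, cluster_size - quorum

# A 2-node cluster cannot lose any node and keep quorum, which is
# why at least 3 database nodes are recommended above.
print(galera_quorum(2))  # -> (2, 0): no node may fail
print(galera_quorum(3))  # -> (2, 1): one node may fail and recover
```

The same arithmetic explains why clusters are usually grown in odd-numbered steps: going from 3 to 4 nodes raises the quorum without tolerating any additional failure.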
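The token-caching pattern described above can be illustrated with a minimal TTL cache. This is a conceptual sketch in plain Python, not Keystone's actual memcached driver; the class name, TTL value, and injectable clock are invented for the example:

```python
import time

class TokenCache:
    """Minimal TTL cache illustrating how cached tokens spare the
    Identity service from re-validating on every request."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock              # injectable for testing
        self._store = {}

    def set(self, token_id, token_data):
        # Record the token together with its absolute expiry time.
        self._store[token_id] = (token_data, self.clock() + self.ttl)

    def get(self, token_id):
        entry = self._store.get(token_id)
        if entry is None:
            return None                 # cache miss: ask Keystone
        data, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[token_id]
            return None                 # expired: re-validate
        return data
```

In production, memcached or Redis plays the role of `_store`, shared across the infrastructure nodes so that any API worker can reuse a token validated elsewhere.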
<section xml:id="performance-tech-considerations"><title>Performance</title>
    <para>Performance of an OpenStack deployment is dependent on a
        number of factors related to the infrastructure and controller
        services. The user requirements can be split into general
        network performance, performance of compute resources, and
        performance of storage systems.</para></section>
<section xml:id="controller-infrastructure-tech-considerations">
    <title>Controller Infrastructure</title>
    <para>The controller infrastructure nodes provide management
        services to the end-user as well as providing services
        internally for the operation of the cloud. The controllers
        typically run message queuing services that carry system
        messages between each service. Performance issues related to
        the message bus would lead to delays in sending messages to
        where they need to go. The result of this condition would be
        delays in operational functions such as spinning up and
        deleting instances, provisioning new storage volumes, and
        managing network resources. Such delays could adversely affect
        an application’s ability to react to certain conditions,
        especially when using auto-scaling features. It is important
        to properly design the hardware used to run the controller
        infrastructure, as outlined above in the Hardware Selection
        section.</para>
    <para>Performance of the controller services is not limited to
        processing power; restrictions may also emerge in serving
        concurrent users. Load test the APIs and Horizon services to
        ensure that you are able to serve your customers. Particular
        attention should be paid to the OpenStack Identity Service
        (Keystone), which provides the authentication and
        authorization for all services, both internally to OpenStack
        itself and to end-users. This service can lead to a
        degradation of overall performance if it is not sized
        appropriately.</para></section>
<section xml:id="network-performance-tech-considerations"><title>Network Performance</title>
    <para>In a general purpose OpenStack cloud, the requirements of
        the network help determine its performance capabilities. For
        example, small deployments may employ 1 Gigabit Ethernet (GbE)
        networking, whereas larger installations serving multiple
        departments or many users would be better architected with 10
        GbE networking. The performance of the running instances will
        be limited by these speeds. It is possible to design OpenStack
        environments that run a mix of networking capabilities. By
        utilizing the different interface speeds, the users of the
        OpenStack environment can choose networks that are fit for
        their purpose. For example, web application instances may run
        on a public network presented through OpenStack Networking
        that has 1 GbE capability, whereas the back-end database uses
        an OpenStack Networking network that has 10 GbE capability to
        replicate its data. In some cases, the design may also
        incorporate link aggregation for greater throughput.</para>
    <para>Network performance can be boosted considerably by
        implementing hardware load balancers to provide front-end
        service to the cloud APIs. The hardware load balancers can
        also perform SSL termination if that is a requirement of your
        environment. When implementing SSL offloading, it is important
        to understand the SSL offloading capabilities of the devices
        selected.</para></section>
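The impact of interface speed on bulk operations such as database replication can be estimated with back-of-the-envelope arithmetic. The figures below are illustrative only and ignore protocol overhead, contention, and disk limits:

```python
def transfer_seconds(data_gigabytes, link_gbps):
    """Time to move a payload over a dedicated link, ignoring
    protocol overhead and contention."""
    gigabits = data_gigabytes * 8      # bytes -> bits
    return gigabits / link_gbps

# Replicating 100 GB of back-end database state:
print(round(transfer_seconds(100, 1)))    # 1 GbE  -> 800 seconds
print(round(transfer_seconds(100, 10)))   # 10 GbE -> 80 seconds
```

The order-of-magnitude difference is why the back-end replication network in the example above is given 10 GbE capability while the public-facing web tier can remain on 1 GbE.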
<section xml:id="compute-host-tech-considerations"><title>Compute Host</title>
    <para>The choice of hardware specifications used in compute nodes,
        including CPU, memory, and disk type, directly affects the
        performance of the instances. Other factors which can directly
        affect performance include tunable parameters within the
        OpenStack services, for example the overcommit ratios applied
        to resources. The defaults in OpenStack Compute set a 16:1
        overcommit ratio for CPU and a 1.5:1 overcommit ratio for
        memory. Running at such high ratios can lead to an increase in
        "noisy-neighbor" activity. Care must be taken when sizing your
        Compute environment to avoid this scenario. For running
        general purpose OpenStack environments it is possible to keep
        to the defaults, but make sure to monitor your environment as
        usage increases.</para></section>
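The effect of the default overcommit ratios on schedulable capacity can be sketched with simple arithmetic. The host specification in the example is invented for illustration; the 16:1 and 1.5:1 ratios are the OpenStack Compute defaults cited above:

```python
def effective_capacity(physical_cores, physical_ram_mb,
                       cpu_ratio=16.0, ram_ratio=1.5):
    """Schedulable capacity of one compute node under OpenStack
    Compute's default overcommit ratios (16:1 CPU, 1.5:1 memory)."""
    vcpus = int(physical_cores * cpu_ratio)
    ram_mb = int(physical_ram_mb * ram_ratio)
    return vcpus, ram_mb

# A hypothetical 2-socket, 12-cores-per-socket host with 128 GB RAM:
vcpus, ram_mb = effective_capacity(24, 128 * 1024)
print(vcpus)   # 384 schedulable vCPUs
print(ram_mb)  # 196608 MB (192 GB) schedulable memory
```

Note that at these ratios a single busy tenant can consume far more than its "fair" share of a physical core, which is the noisy-neighbor effect the paragraph above warns about.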
<section xml:id="storage-performance-tech-considerations"><title>Storage Performance</title>
    <para>When considering performance of OpenStack Block Storage,
        hardware and architecture choice is important. Block Storage
        can use enterprise back-end systems such as NetApp or EMC, use
        scale-out storage such as GlusterFS and Ceph, or simply use
        the capabilities of directly attached storage in the nodes
        themselves. Block Storage may be deployed so that traffic
        traverses the host network, which could affect, and be
        adversely affected by, the front-side API traffic performance.
        As such, consider using a dedicated data storage network with
        dedicated interfaces on the Controller and Compute
        hosts.</para>
    <para>When considering performance of OpenStack Object Storage, a
        number of design choices will affect performance. A user’s
        access to the Object Storage is through the proxy services,
        which typically sit behind hardware load balancers. By the
        very nature of a highly resilient storage system, replication
        of the data would affect performance of the overall system. In
        this case, 10 GbE (or better) networking is recommended
        throughout the storage network architecture.</para></section>
<section xml:id="availability-tech-considerations"><title>Availability</title>
    <para>In OpenStack, the infrastructure is integral to providing
        services and should always be available, especially when
        operating with SLAs. Ensuring network availability is
        accomplished by designing the network architecture so that no
        single point of failure exists. The number of switches,
        routers, and redundant power supplies should be factored into
        core infrastructure, as well as the associated bonding of
        networks to provide diverse routes to your highly available
        switch infrastructure.</para>
    <para>The OpenStack services themselves should be deployed across
        multiple servers that do not represent a single point of
        failure. Ensuring API availability can be achieved by placing
        these services behind highly available load balancers that
        have multiple OpenStack servers as members.</para>
    <para>OpenStack lends itself to deployment in a highly available
        manner where it is expected that at least 2 servers be
        utilized. These can run all of the services involved, from the
        message queuing service, for example RabbitMQ or Qpid, to an
        appropriately deployed database service such as MySQL or
        MariaDB. As services in the cloud are scaled out, back-end
        services will need to scale too. Monitoring and reporting on
        server utilization and response times, as well as load testing
        your systems, will help determine scale-out decisions.</para>
    <para>Care must be taken when deciding network functionality.
        Currently, OpenStack supports both the legacy Nova-network
        system and the newer, extensible OpenStack Networking. Both
        have their pros and cons when it comes to providing highly
        available access. Nova-network, which provides networking
        access maintained in the OpenStack Compute code, provides a
        feature that removes a single point of failure when it comes
        to routing, and this feature is currently missing in OpenStack
        Networking. Nova-network's multi-host functionality restricts
        the failure domain to the host running that instance.</para>
    <para>On the other hand, when using OpenStack Networking, the
        OpenStack controller servers or separate OpenStack Networking
        hosts handle routing. For a deployment that requires features
        available only in OpenStack Networking, it is possible to
        remove this restriction by using third party software that
        helps maintain highly available L3 routes. Doing so allows for
        common APIs to control network hardware, or to provide complex
        multi-tier web applications in a secure manner. It is also
        possible to completely remove routing from OpenStack
        Networking, and instead rely on hardware routing capabilities.
        In this case, the switching infrastructure must support L3
        routing.</para>
    <para>OpenStack Networking (Neutron) and Nova-network both have
        their advantages and disadvantages. They are both valid and
        supported options that fit different use cases as described in
        the following table.</para>
    <para>Ensure your deployment has adequate back-up capabilities. As
        an example, in a deployment that has two infrastructure
        controller nodes, the design should include controller
        availability: in the event of the loss of a single controller,
        cloud services will continue to run from the remaining
        controller. Where the design has higher availability
        requirements, it is important to meet those requirements by
        designing the proper redundancy and availability of controller
        nodes.</para>
    <para>Application design must also be factored into the
        capabilities of the underlying cloud infrastructure. If the
        compute hosts do not provide a seamless live migration
        capability, then it must be expected that when a compute host
        fails, that instance and any data local to that instance will
        be deleted. Conversely, when providing an expectation to users
        that instances have a high level of uptime guarantees, the
        infrastructure must be deployed in a way that eliminates any
        single point of failure when a compute host disappears. This
        may include utilizing shared file systems on enterprise
        storage or OpenStack Block Storage to provide a level of
        guarantee to match service features.</para>
    <para>For more information on HA in OpenStack, see the OpenStack
        High Availability Guide found at
        http://docs.openstack.org/high-availability-guide.</para></section>
<section xml:id="security-tech-considerations"><title>Security</title>
    <para>A security domain comprises users, applications, servers, or
        networks that share common trust requirements and expectations
        within a system. Typically they have the same authentication
        and authorization requirements and users.</para>
    <para>These security domains are:</para>
    <itemizedlist>
        <listitem>
            <para>Public</para>
        </listitem>
        <listitem>
            <para>Guest</para>
        </listitem>
        <listitem>
            <para>Management</para>
        </listitem>
        <listitem>
            <para>Data</para>
        </listitem>
    </itemizedlist>
    <para>These security domains can be mapped to an OpenStack
        deployment individually, or combined. For example, some
        deployment topologies combine both guest and data domains onto
        one physical network, whereas in other cases these networks
        are physically separated. In each case, the cloud operator
        should be aware of the appropriate security concerns. Security
        domains should be mapped out against your specific OpenStack
        deployment topology. The domains and their trust requirements
        depend upon whether the cloud instance is public, private, or
        hybrid.</para>
    <para>The public security domain is an entirely untrusted area of
        the cloud infrastructure. It can refer to the Internet as a
        whole or simply to networks over which you have no authority.
        This domain should always be considered untrusted.</para>
    <para>Typically used for compute instance-to-instance traffic, the
        guest security domain handles compute data generated by
        instances on the cloud, but not services that support the
        operation of the cloud, such as API calls. Public cloud
        providers and private cloud providers who do not have
        stringent controls on instance use or who allow unrestricted
        internet access to instances should consider this domain to be
        untrusted. Private cloud providers may want to consider this
        network as internal and therefore trusted only if they have
        controls in place to assert that they trust instances and all
        their tenants.</para>
    <para>The management security domain is where services interact.
        Sometimes referred to as the "control plane", the networks in
        this domain transport confidential data such as configuration
        parameters, user names, and passwords. In most deployments
        this domain is considered trusted.</para>
    <para>The data security domain is concerned primarily with
        information pertaining to the storage services within
        OpenStack. Much of the data that crosses this network has high
        integrity and confidentiality requirements and, depending on
        the type of deployment, may also have strong availability
        requirements. The trust level of this network is heavily
        dependent on other deployment decisions.</para>
    <para>When deploying OpenStack in an enterprise as a private
        cloud, it is usually behind the firewall and within the
        trusted network alongside existing systems. Users of the cloud
        are, traditionally, employees that are bound by the security
        requirements set forth by the company. This tends to push most
        of the security domains towards a more trusted model. However,
        when deploying OpenStack in a public-facing role, no
        assumptions can be made and the attack vectors significantly
        increase. For example, the API endpoints, along with the
        software behind them, become vulnerable to bad actors wanting
        to gain unauthorized access or prevent access to services,
        which could lead to loss of data, functionality, and
        reputation. These services must be protected through auditing
        and appropriate filtering.</para>
    <para>Consideration must be taken when managing the users of the
        system for both public and private clouds. The Identity
        service allows for LDAP to be part of the authentication
        process. Including such systems in an OpenStack deployment may
        ease user management when integrating into existing
        systems.</para>
    <para>It's important to understand that user authentication
        requests include sensitive information such as user names,
        passwords, and authentication tokens. For this reason, placing
        the API services behind hardware that performs SSL termination
        is strongly recommended.</para>
    <para>For more information on OpenStack security, see the
        OpenStack Security Guide, at
        http://docs.openstack.org/security-guide/.</para>
    </section>
</section>
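One way to reason about these domains is as a mapping from domain to a default trust assumption, which each deployment then adjusts to its own topology. The values below are illustrative starting points, not prescriptions, and follow the domain descriptions in this section:

```python
# Illustrative default trust assumptions per security domain;
# a locked-down private cloud might upgrade "guest" to trusted,
# while "data" depends entirely on other deployment decisions.
SECURITY_DOMAINS = {
    "public":     "untrusted",
    "guest":      "untrusted",
    "management": "trusted",
    "data":       "deployment-dependent",
}

def untrusted_domains(domains=SECURITY_DOMAINS):
    """Return the domains that must always be treated as hostile."""
    return sorted(k for k, v in domains.items() if v == "untrusted")

print(untrusted_domains())  # ['guest', 'public']
```

Enumerating the mapping explicitly, even in a design document, forces the operator to decide the trust level of every network before domains are combined onto shared physical infrastructure.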
@ -0,0 +1,175 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="user-requirements-general-purpose">
    <?dbhtml stop-chunking?>
    <title>User Requirements</title>
    <para>The general purpose cloud is built following the
        Infrastructure-as-a-Service (IaaS) model, as a platform best
        suited for use cases with simple requirements. The general
        purpose cloud user requirements themselves are typically not
        complex. However, it is still important to capture them even
        if the project has minimal business and technical
        requirements, such as a Proof of Concept (PoC) or a small lab
        platform.</para>
    <para>These user considerations are written from the perspective
        of the organization that is building the cloud, not from the
        perspective of the end-users who will consume cloud services
        provided by this design.</para>
    <itemizedlist>
        <listitem>
            <para>Cost: Financial factors are a primary concern for
                any organization. Since general purpose clouds are
                considered the baseline from which all other cloud
                architecture environments derive, cost will commonly
                be an important criterion. This type of cloud,
                however, does not always provide the most
                cost-effective environment for a specialized
                application or situation. Unless razor-thin margins
                and costs have been mandated as a critical factor,
                cost should not be the sole consideration when
                choosing or designing a general purpose
                architecture.</para>
        </listitem>
        <listitem>
            <para>Time to market: Another common business factor in
                building a general purpose cloud is the ability to
                deliver a service or product more quickly and
                flexibly. In the modern hyper-fast business world,
                being able to deliver a product in six months instead
                of two years is often a major driving force behind the
                decision to build a general purpose cloud. General
                purpose clouds allow users to self-provision and gain
                access to compute, network, and storage resources
                on-demand, thus decreasing time to market. It may
                potentially make more sense to build a general purpose
                PoC as opposed to waiting to finalize the ultimate use
                case for the system. The tradeoff of taking this
                approach is the risk that the general purpose cloud is
                not optimized for the actual final workloads. The
                final decision on which approach to take will be
                dependent on the specifics of the business objectives
                and time frame for the project.</para>
        </listitem>
        <listitem>
            <para>Revenue opportunity: The revenue opportunity for a
                given cloud will vary greatly based on the intended
                use case of that particular cloud. Some general
                purpose clouds are built for commercial
                customer-facing products, but there are plenty of
                other reasons that might make the general purpose
                cloud the right choice. A small cloud service provider
                (CSP) might want to build a general purpose cloud
                rather than a massively scalable cloud because they do
                not have the deep financial resources needed, or
                because they do not or will not know in advance the
                purposes for which their customers are going to use
                the cloud. For some users, the advantages the cloud
                itself offers mean an enhancement of revenue
                opportunity. For others, the fact that a general
                purpose cloud provides only baseline functionality
                will be a disincentive for use, leading to a
                stagnation of potential revenue opportunities.</para>
        </listitem>
    </itemizedlist>
<section xml:id="legal-requirements-general-purpose"><title>Legal Requirements</title>
    <para>Many jurisdictions have legislative and regulatory
        requirements governing the storage and management of data in
        cloud environments. Common areas of regulation include:</para>
    <itemizedlist>
        <listitem>
            <para>Data retention policies ensuring storage of
                persistent data and records management to meet data
                archival requirements.</para>
        </listitem>
        <listitem>
            <para>Data ownership policies governing the possession and
                responsibility for data.</para>
        </listitem>
        <listitem>
            <para>Data sovereignty policies governing the storage of
                data in foreign countries or otherwise separate
                jurisdictions.</para>
        </listitem>
        <listitem>
            <para>Data compliance policies governing certain types of
                information that need to reside in certain locations
                due to regulatory issues, and, just as importantly,
                cannot reside in other locations for the same
                reason.</para>
        </listitem>
    </itemizedlist>
    <para>Examples of such legal frameworks include the data
        protection framework of the European Union
        (http://ec.europa.eu/justice/data-protection/) and the
        requirements of the Financial Industry Regulatory Authority
        (http://www.finra.org/Industry/Regulation/FINRARules/) in the
        United States. Consult a local regulatory body for more
        information.</para></section>
<section xml:id="technical-requirements"><title>Technical Requirements</title>
|
||||
<para>Technical cloud architecture requirements should be weighted
|
||||
against the business requirements.</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Performance: As a baseline product, general purpose
|
||||
clouds do not provide optimized performance for any
|
||||
particular function. While a general purpose cloud
|
||||
should provide enough performance to satisfy average
|
||||
user considerations, performance is not a general
|
||||
purpose cloud customer driver.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>No predefined usage model: The lack of a pre-defined
|
||||
usage model enables the user to run a wide variety of
|
||||
applications without having to know the application
|
||||
requirements in advance. This provides a degree of
|
||||
independence and flexibility that no other cloud
|
||||
scenarios are able to provide.</para>
|
||||
</listitem>
|
||||
<listitem>
<para>On-demand and self-service application: By
definition, a cloud provides end users with the
ability to self-provision computing power, storage,
networks, and software in a simple and flexible way.
The user must be able to scale their resources up to a
substantial level without disrupting the underlying
host operations. One of the benefits of using a
general purpose cloud architecture is the ability to
start with limited resources and increase them over
time as user demand grows.</para>
</listitem>
<listitem>
<para>Public cloud: For a company interested in building a
commercial public cloud offering based on OpenStack,
the general purpose architecture model might be the
best choice because the designers will not know the
purposes or workloads for which the end users will use
the cloud.</para>
</listitem>
<listitem>
<para>Internal consumption (private) cloud: Organizations
need to determine whether it makes sense to create
their own clouds internally. The main advantage of a
private cloud is that it allows the organization to
maintain complete control over the architecture and
the cloud components. One caution is to consider the
possibility that users will want to combine the
internal cloud with access to an external cloud. If
that is likely, it might be worth exploring a
multi-cloud approach for at least some of the
architectural elements. Designs that incorporate the
use of multiple clouds, such as a private cloud and a
public cloud offering, are described in the
"Multi-Cloud" scenario.</para>
</listitem>
<listitem>
<para>Security: Security should be implemented according
to asset, threat, and vulnerability risk assessment
matrices. For cloud domains that require increased
computer security, network security, or information
security, a general purpose cloud is not considered an
appropriate choice.</para>
</listitem>
</itemizedlist></section>
</section>
186
doc/arch-design/hybrid/section_architecture_hybrid.xml
Normal file
@ -0,0 +1,186 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-guide-architecture-hybrid">
<?dbhtml stop-chunking?>
<title>Architecture</title>
<para>Once business and application requirements have been
defined, the first step for designing a hybrid cloud solution
is to map out the dependencies between the expected workloads
and the diverse cloud infrastructures that need to support
them. By mapping the applications and the targeted cloud
environments, you can architect a solution that enables the
broadest compatibility between cloud platforms and minimizes
the need to create workarounds and processes to fill
identified gaps. Be sure to evaluate the monitoring and
orchestration APIs available on each cloud platform and the
relative level of support for them in the chosen Cloud
Management Platform.</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Multi-Cloud_Priv-AWS4.png"
/>
</imageobject>
</mediaobject>
<section xml:id="image-portability"><title>Image portability</title>
<para>The majority of cloud workloads currently run on instances
using hypervisor technologies such as KVM, Xen, or ESXi. The
challenge is that each of these hypervisors uses an image
format that is mostly or wholly incompatible with the others.
In a private or hybrid cloud solution, this can be mitigated
by standardizing on the same hypervisor and instance image
format, but this is not always feasible. This is particularly
evident if one of the clouds in the architecture is a public
cloud that is outside of the control of the designers.</para>
<para>There are conversion tools, such as virt-v2v
(http://libguestfs.org/virt-v2v/) and virt-edit
(http://libguestfs.org/virt-edit.1.html), that can be used in
those scenarios, but they are often not suitable beyond very
basic cloud instance specifications. An alternative is to
build a thin operating system image as the base for new
instances. This facilitates rapid creation of cloud instances
using cloud orchestration or configuration management tools,
driven by the CMP, for more specific templating. Another, more
expensive, option is to use a commercial image migration tool.
The issue of image portability is not limited to a one-time
migration. If the intention is to use multiple clouds for
disaster recovery, application diversity, or high availability,
the images and instances are likely to be moved between the
different cloud platforms regularly.</para></section>
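As a rough illustration of the format conversion discussed above, the sketch below builds the command line for qemu-img, a common open source tool for converting between hypervisor image formats. qemu-img is not named in the text (which cites virt-v2v and virt-edit, tools that additionally perform guest-level fixes a plain format conversion does not); the paths and formats here are hypothetical.

```python
# Illustrative sketch (not from the guide): construct a qemu-img argv that
# converts an instance image between hypervisor formats, e.g. a VMware VMDK
# into a KVM-friendly qcow2. The paths below are hypothetical examples.

def conversion_command(src_path, dst_path, src_fmt="vmdk", dst_fmt="qcow2"):
    """Return the qemu-img argv that converts src_path into dst_path."""
    return [
        "qemu-img", "convert",
        "-f", src_fmt,   # input image format
        "-O", dst_fmt,   # output image format
        src_path, dst_path,
    ]

cmd = conversion_command("/images/app-server.vmdk", "/images/app-server.qcow2")
print(" ".join(cmd))
```

A real migration would then run this command (for example via subprocess) and, for anything beyond basic images, fall back to virt-v2v to adjust drivers and boot configuration inside the guest.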
<section xml:id="upper-layer-services"><title>Upper-Layer Services</title>
<para>Many clouds offer complementary services over and above the
basic compute, network, and storage components. These
additional services are often used to simplify the deployment
and management of applications on a cloud platform.</para>
<para>Consideration must be given to moving workloads that may
have upper-layer service dependencies on the source cloud
platform to a destination cloud platform that does not have a
comparable service, or that implements it in a different way
or with a different technology. For example, moving an
application that uses a NoSQL database service such as MongoDB
that is delivered as a service on the source cloud, to a
destination cloud that does not offer that service or only
offers a relational database such as MySQL, could cause
difficulties in maintaining the application between the
platforms.</para>
<para>There are a number of options that might be appropriate for
the hybrid cloud use case:</para>
<itemizedlist>
<listitem>
<para>Create a baseline of upper-layer services that are
implemented across all of the cloud platforms. For
platforms that do not support a given service, create
a service on top of that platform and apply it to the
workloads as they are launched on that cloud. For
example, OpenStack, via Trove, supports MySQL as a
service but does not support NoSQL databases in
production. To move a NoSQL workload from AWS, or to
run it alongside AWS, would require recreating the
NoSQL database on top of OpenStack and automating the
process of implementing it using a tool such as
OpenStack Orchestration (Heat).</para>
</listitem>
<listitem>
<para>Deploy a Platform as a Service (PaaS) technology
such as Cloud Foundry or OpenShift that abstracts the
upper-layer services from the underlying cloud
platform. The unit of application deployment and
migration is then the PaaS; the application leverages
the services of the PaaS and consumes only the base
infrastructure services of the cloud platform. The
downside to this approach is that the PaaS itself then
potentially becomes a source of lock-in.</para>
</listitem>
<listitem>
<para>Use only the base infrastructure services that are
common across all cloud platforms. Use automation
tools to create the required upper-layer services
which are portable across all cloud platforms. For
example, instead of using any database services that
are inherent in the cloud platforms, launch cloud
instances and deploy the databases on to those
instances using scripts or various configuration and
application deployment tools.</para>
</listitem>
</itemizedlist></section>
<section xml:id="network-services"><title>Network Services</title>
<para>Network services functionality is a significant barrier for
multiple cloud architectures. It can be an important factor
to assess when choosing a CMP and cloud provider.
Considerations include functionality, security, scalability,
and high availability (HA). Verification and ongoing testing
of the critical features of the cloud endpoint used by the
architecture are important tasks.</para>
<itemizedlist>
<listitem>
<para>Once the network functionality framework has been
decided, a minimum functionality test should be
designed to confirm that the functionality is in fact
compatible. This ensures that functionality persists
during and after upgrades. Note that over time, the
diverse cloud platforms are likely to de-synchronize
if care is not taken to maintain compatibility. This
is a particular issue with APIs.</para>
</listitem>
<listitem>
<para>Scalability across multiple cloud providers may
dictate which underlying network framework is chosen
for the different cloud providers. It is important to
have the network API functions presented and to verify
that the desired functionality persists across all
chosen cloud endpoints.</para>
</listitem>
<listitem>
<para>High availability (HA) implementations vary in
functionality and design. Examples of some common
methods are Active-Hot-Standby, Active-Passive, and
Active-Active. A test framework needs to be developed
to ensure that the functionality and limitations of
each HA implementation are well understood.</para>
</listitem>
<listitem>
<para>Security considerations must also be addressed, such
as how data is secured between the client and the
endpoint, and how traffic that traverses the multiple
clouds is protected against threats ranging from
eavesdropping to DoS activities. Business and
regulatory requirements dictate the security approach
that needs to be taken.</para>
</listitem>
</itemizedlist></section>
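The minimum functionality test described in the list above can be sketched as a simple feature-matrix check: compare what each cloud endpoint advertises against the baseline the hybrid design requires. The endpoint names, feature labels, and required set below are hypothetical placeholders, not actual OpenStack API extension names.

```python
# Hypothetical sketch: cross-cloud minimum functionality check. A real test
# would query each cloud's network API (e.g. its extensions list); here the
# advertised feature sets are hard-coded stand-ins.

REQUIRED_FEATURES = {"security-group", "router", "floating-ip"}

def check_endpoint(advertised_features):
    """Return the set of required features the endpoint is missing."""
    return REQUIRED_FEATURES - set(advertised_features)

endpoints = {
    "private-openstack": ["security-group", "router", "floating-ip", "lbaas"],
    "public-cloud": ["security-group", "floating-ip"],
}

missing = {name: check_endpoint(feats) for name, feats in endpoints.items()}
for name, gaps in missing.items():
    print(name, "OK" if not gaps else "missing: %s" % sorted(gaps))
```

Re-running such a check after every upgrade of either cloud is one way to catch the API de-synchronization the text warns about.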
<section xml:id="data"><title>Data</title>
<para>Replication has been the traditional method for protecting
object store implementations. A variety of replication
implementations have existed in storage architectures, for
example both synchronous and asynchronous mirroring. Most
object stores and back-end storage systems have a method for
replication that can be implemented at the storage subsystem
layer. Object stores have also implemented replication
techniques that can be tailored to fit a cloud's needs. An
organization must find the right balance between data
integrity and data availability. The replication strategy may
also influence the disaster recovery methods
implemented.</para>
<para>Replication across different racks, data centers, and
geographical regions has led to an increased focus on
determining and ensuring data locality. The ability to
guarantee that data is accessed from the nearest or fastest
storage can be necessary for applications to perform well. An
example of this is Hadoop running in a cloud: the user either
runs with a native HDFS, when applicable, or on a separate
parallel file system such as those provided by Hitachi and
IBM. Special consideration should be taken when running
embedded object store methods so as not to cause extra data
replication, which can create unnecessary performance issues.
Another example of ensuring data locality is using Ceph. Ceph
has a data container abstraction called a pool. Pools can be
created with replicas or with erasure coding. Replica-based
pools can also have a ruleset defined to have data written to
a “local” set of hardware, which would be the primary access
and modification point.</para>
</section>
</section>
68
doc/arch-design/hybrid/section_introduction_hybrid.xml
Normal file
@ -0,0 +1,68 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-guide-intro-hybrid">
<title>Introduction</title>
<para>Hybrid cloud, by definition, means that the design spans
more than one cloud. An example of this kind of architecture
may include a situation in which the design involves more than
one OpenStack cloud (for example, an OpenStack-based private
cloud and an OpenStack-based public cloud), or it may be a
situation incorporating an OpenStack cloud and a non-OpenStack
cloud (for example, an OpenStack-based private cloud that
interacts with Amazon Web Services). Bursting into an external
cloud is the practice of creating new instances to alleviate
extra load where there is no available capacity in the private
cloud.</para>
<para>Some situations that could involve hybrid cloud architecture
include:</para>
<itemizedlist>
<listitem>
<para>Bursting from a private cloud to a public
cloud</para>
</listitem>
<listitem>
<para>Disaster recovery</para>
</listitem>
<listitem>
<para>Development and testing</para>
</listitem>
<listitem>
<para>Federated cloud, enabling users to choose resources
from multiple providers</para>
</listitem>
<listitem>
<para>Hybrid clouds built to support legacy systems as
they transition to cloud</para>
</listitem>
</itemizedlist>
<para>As a hybrid cloud design deals with systems that are outside
of the control of the cloud architect or organization, a
hybrid cloud architecture requires considering aspects of the
architecture that might not have otherwise been necessary. For
example, the design may need to deal with hardware, software,
and APIs under the control of a separate organization.</para>
<para>Similarly, the degree to which the architecture is
OpenStack-based will have an effect on the cloud operator or
cloud consumer's ability to accomplish tasks with native
OpenStack tools. By definition, this is a situation in which
no single cloud can provide all of the necessary
functionality. In order to manage the entire system, users,
operators, and consumers will need an overarching tool known
as a cloud management platform (CMP). Any organization that is
working with multiple clouds already has a CMP, even if that
CMP is the operator who logs into an external web portal and
launches a public cloud instance.</para>
<para>There are commercially available options, such as
RightScale, and open source options, such as ManageIQ
(http://manageiq.org/), but there is no single CMP that can
address all needs in all scenarios. Whereas most of the
sections of this book discuss the aspects of OpenStack an
architect needs to consider when designing an OpenStack
architecture, this section also discusses the things the
architect must address when choosing or building a CMP to run
a hybrid cloud design, even if the CMP will be a manually
built solution.</para>
</section>
@ -0,0 +1,99 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-guide-hybrid-operational-considerations">
<?dbhtml stop-chunking?>
<title>Operational Considerations</title>
<para>Hybrid cloud deployments present complex operational
challenges. There are several factors to consider that affect
the way each cloud is deployed and how users and operators
will interact with each cloud. Not every cloud provider
implements infrastructure components the same way, which may
lead to incompatible interactions with workloads or with a
specific Cloud Management Platform (CMP). Different cloud
providers may also offer different levels of integration with
competing cloud offerings.</para>
<para>When selecting a CMP, one of the most important aspects to
consider is monitoring. Gaining valuable insight into each
cloud is critical to gaining a holistic view of all involved
clouds. When choosing an existing CMP, it is vital to
determine whether it supports monitoring of all the clouds
involved, or whether compatible APIs are available that can be
queried for the necessary information. Once all the
information about each cloud can be gathered and stored in a
searchable database, proper actions can be taken on that data
offline so workloads will not be impacted.</para>
<section xml:id="agility"><title>Agility</title>
<para>Implementing a hybrid cloud solution can provide application
availability across disparate cloud environments and
technologies. This availability enables the deployment to
survive a complete disaster in any single cloud environment.
Each cloud should provide the means to quickly spin up new
instances in the case of capacity issues or complete
unavailability of a single cloud installation.</para></section>
<section xml:id="application-readiness-hybrid"><title>Application Readiness</title>
<para>It is important to understand the type of application
workloads that will be deployed across the hybrid cloud
environment. Enterprise workloads that depend on the
underlying infrastructure for availability are not designed to
run on OpenStack. Although these types of applications can run
on an OpenStack cloud, if the application is not able to
tolerate infrastructure failures, it is likely to require
significant operator intervention to recover. Cloud workloads,
however, are designed with fault tolerance in mind, and the
SLA of the application is not tied to the underlying
infrastructure. Ideally, cloud applications will be designed
to recover even when entire racks or data centers full of
infrastructure experience an outage.</para></section>
<section xml:id="upgrades"><title>Upgrades</title>
<para>OpenStack is a complex and constantly evolving collection of
software. Upgrades may be performed to one or more of the
cloud environments involved. If a public cloud is involved in
the deployment, predicting upgrades may not be possible. Be
sure to examine the advertised SLA for any public cloud
provider being used. Note that at massive scale, even when
dealing with a cloud that offers an SLA with a high percentage
of uptime, workloads must be able to recover at short
notice.</para>
<para>Similarly, when upgrading private cloud deployments, care
must be taken to minimize disruption by making incremental
changes and providing a facility to either roll back or
continue to roll forward when using a continuous delivery
model.</para>
<para>Another consideration is upgrades to the CMP, which may need
to be completed in coordination with any of the hybrid cloud
upgrades. This may be necessary whenever API changes are made
in one of the cloud solutions in use to support the new
functionality.</para></section>
<section xml:id="network-operation-center-noc"><title>Network Operation Center (NOC)</title>
<para>When planning the Network Operation Center for a hybrid
cloud environment, it is important to recognize where control
over each piece of infrastructure resides. If a significant
portion of the cloud is on externally managed systems, be
prepared for situations in which it may not be possible to
make changes at all, or at the most convenient time.
Additionally, situations of conflict may arise in which
multiple providers have differing points of view on the way
infrastructure must be managed and exposed. This can lead to
delays in root cause analysis where each provider insists the
blame lies with the other.</para>
<para>It is important to ensure that the structure put in place
enables connection of the networking of both clouds to form an
integrated system, keeping in mind the state of handoffs.
These handoffs must be as reliable as possible and introduce
as little latency as possible to ensure the best performance
of the overall system.</para></section>
<section xml:id="maintainability"><title>Maintainability</title>
<para>Operating hybrid clouds involves a greater reliance on
third-party systems and processes. As a result of a lack of
control of various pieces of a hybrid cloud environment, it is
not always possible to guarantee proper maintenance of the
overall system. Instead, the user must be prepared to abandon
workloads and spin them up again in an improved state. Having
a hybrid cloud deployment does, however, provide agility for
these situations by allowing the migration of workloads to
alternative clouds in response to cloud-specific
issues.</para></section>
</section>
175
doc/arch-design/hybrid/section_prescriptive_examples_hybrid.xml
Normal file
@ -0,0 +1,175 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="prescriptive-examples-multi-cloud">
<?dbhtml stop-chunking?>
<title>Prescriptive Examples</title>
<para>Multi-cloud environments are typically created to facilitate
these use cases:</para>
<itemizedlist>
<listitem>
<para>Bursting workloads from private to public OpenStack
clouds</para>
</listitem>
<listitem>
<para>Bursting workloads from private to public
non-OpenStack clouds</para>
</listitem>
<listitem>
<para>High availability across clouds (for technical
diversity)</para>
</listitem>
</itemizedlist>
<para>Examples of environments that address each of these use
cases will be discussed in this chapter.</para>
<para>Company A's data center is running dangerously low on
capacity. Expanding the data center will not be possible in
the foreseeable future. In order to accommodate the
continuously growing need for development resources in the
organization, the decision was made to use resources in the
public cloud.</para>
<para>The company has an internal cloud management platform that
directs requests to the appropriate cloud, depending on the
current local capacity. This is a custom in-house application
written for this specific purpose.</para>
<para>An example of such a solution is described in the figure
below.</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Multi-Cloud_Priv-Pub3.png"
/>
</imageobject>
</mediaobject>
<para>This example shows two clouds, with a Cloud Management
Platform (CMP) connecting them. This guide does not attempt to
cover a specific CMP, but describes how workloads are
typically orchestrated using the Orchestration and Telemetry
services as shown in the diagram above. It is also possible to
connect directly to the other OpenStack APIs with a
CMP.</para>
<para>The private cloud is an OpenStack cloud with one or more
controllers and one or more compute nodes. It includes
metering provided by OpenStack Telemetry. As load increases,
Telemetry captures this and the information is in turn
processed by the CMP. As long as capacity is available, the
CMP uses the OpenStack API to call the Orchestration service
to create instances on the private cloud in response to user
requests. When capacity is not available on the private cloud,
the CMP issues a request to the Orchestration service API of
the public cloud to create the instance on the public
cloud.</para>
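The CMP placement decision described above can be reduced to a small sketch: place the instance on the private cloud while capacity remains, and burst to the public cloud otherwise. The capacity numbers and cloud names are hypothetical; a real CMP would query Telemetry for load data and call the Orchestration API instead of these stand-in values.

```python
# Hedged sketch of the CMP bursting decision (not an actual CMP API).
# Each request is checked statelessly against the free capacity the
# private cloud currently reports; a real CMP would track reservations
# and query Telemetry rather than take a number as an argument.

def choose_cloud(requested_vcpus, private_free_vcpus):
    """Return which cloud should receive the new instance."""
    if requested_vcpus <= private_free_vcpus:
        return "private"   # capacity available: keep the workload local
    return "public"        # burst to the public cloud otherwise

# Example: the private cloud reports 8 vCPUs free.
placements = [choose_cloud(req, 8) for req in (2, 4, 16)]
print(placements)
```

In the scenario above, the "private" branch corresponds to a Heat stack-create call against the private cloud and the "public" branch to the same call against the public cloud's Orchestration endpoint.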
<para>In this example, the whole deployment was not directed to an
external public cloud because of the company's concerns about
loss of resource control, security, and increased operational
expense.</para>
<para>In addition, Company A has already established a data center
with a substantial amount of hardware, and migrating all the
workloads out to a public cloud was not feasible.</para>
<section xml:id="bursting-to-public-nonopenstack-cloud"><title>Bursting to a Public non-OpenStack Cloud</title>
<para>Another common scenario is bursting workloads from the
private cloud into a non-OpenStack public cloud such as Amazon
Web Services (AWS) to take advantage of additional capacity
and scale applications as needed.</para>
<para>For an OpenStack-to-AWS hybrid cloud, the architecture looks
similar to the figure below:</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Multi-Cloud_Priv-AWS4.png"
/>
</imageobject>
</mediaobject>
<para>In this scenario, Company A has an additional requirement:
the developers were already using AWS for some of their work
and did not want to change providers, primarily due to the
excessive overhead of the network firewall rules that would
need to be created and the corporate financial procedures that
required entering into an agreement with a new
provider.</para>
<para>As long as the CMP is capable of connecting to the external
cloud provider with the appropriate API, the workflow process
remains the same as in the previous scenario. The actions the
CMP takes, such as monitoring load and creating new instances,
are the same, but they are performed in the public cloud using
the appropriate API calls. For example, if the public cloud is
Amazon Web Services, the CMP would use the EC2 API to create a
new instance and assign an Elastic IP. That IP can then be
added to HAProxy in the private cloud, just as before. The CMP
can also reference AWS-specific tools such as CloudWatch and
CloudFormation.</para>
<para>Several open source tool kits for building CMPs are
available that can handle this kind of translation, including
ManageIQ, jClouds, and JumpGate.</para></section>
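The EC2 calls mentioned above can be sketched as the parameter dictionaries a boto3 EC2 client would accept for `run_instances`, `allocate_address`, and `associate_address`. The AMI ID, instance type, and IDs below are hypothetical placeholders, and no AWS call is actually made here.

```python
# Hypothetical sketch of the CMP's burst-to-AWS step, expressed as the
# parameter dicts a boto3 EC2 client would take. The AMI ID, instance
# type, and resource IDs are made-up examples; nothing is sent to AWS.

def run_instances_params(image_id, instance_type="t3.medium"):
    """Parameters for ec2.run_instances: launch exactly one instance."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }

def associate_address_params(instance_id, allocation_id):
    """Parameters for ec2.associate_address: bind an Elastic IP
    (previously obtained via ec2.allocate_address) to the instance,
    so the IP can be added to HAProxy in the private cloud."""
    return {"InstanceId": instance_id, "AllocationId": allocation_id}

launch = run_instances_params("ami-0123456789abcdef0")
print(launch["InstanceType"], launch["MinCount"], launch["MaxCount"])
```

With a real boto3 client, the CMP would pass these dicts as keyword arguments, e.g. `boto3.client("ec2").run_instances(**launch)`.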
<section xml:id="high-availability-disaster-recovery"><title>High Availability/Disaster Recovery</title>
<para>Company A has a requirement to be able to recover from
catastrophic failure in its local data center. Some of the
workloads currently in use are running on its private
OpenStack cloud. Protecting the data involves block storage,
object storage, and a database. The architecture is designed
to support the failure of large components of the system while
ensuring that the system continues to deliver services. While
the services remain available to users, the failed components
are restored in the background based on standard best practice
DR policies. To achieve the objectives, data is replicated to
a second cloud in a geographically distant location. The
logical diagram of the system is described in the figure
below:</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Multi-Cloud_failover2.png"
/>
</imageobject>
</mediaobject>
<para>This example includes two private OpenStack clouds connected
with a Cloud Management Platform (CMP). The source cloud,
OpenStack Cloud 1, includes a controller and at least one
instance running MySQL. It also includes at least one block
storage volume and one object storage volume so that the data
is available to the users at all times. The details of the
method for protecting each of these sources of data
differ.</para>
<para>The object storage relies on the replication capabilities of
the object storage provider. OpenStack Object Storage is
enabled so that it creates geographically separated replicas
that take advantage of this feature. It is configured so that
at least one replica exists in each cloud. In order to make
this work, a single array spanning both clouds is configured
with OpenStack Identity using Federated Identity, which talks
to both clouds, communicating with OpenStack Object Storage
through the Swift proxy.</para>
<para>For block storage, the replication is a little more
difficult, and involves tools outside of OpenStack itself. The
OpenStack Block Storage volume is not set as the drive itself
but as a logical object that points to a physical back end.
Disaster recovery is configured for Block Storage with
synchronous backup for the highest level of data protection,
but asynchronous backup could have been chosen as an
alternative that is not as latency sensitive. For asynchronous
backup, the Cinder API makes it possible to export the data
and also the metadata of a particular volume, so that it can
be moved and replicated elsewhere. More information can be
found here:
https://blueprints.launchpad.net/cinder/+spec/cinder-backup-volume-metadata-support.</para>
<para>The synchronous backups create an identical volume in both
clouds and choose the appropriate flavor so that each cloud
has an identical back end. This was done by creating volumes
through the CMP, because the CMP knows to create identical
volumes in both clouds. Once this is configured, a solution
involving DRBD is used to synchronize the actual physical
drives.</para>
<para>The database component is backed up using synchronous
backups. MySQL does not support geographically diverse
replication, so disaster recovery is provided by replicating
the file itself. As it is not possible to use object storage
as the back end of a database like MySQL, Swift replication
was not an option. It was decided not to store the data on
another geo-tiered storage system, such as Ceph, as block
storage, although this would have given another layer of
protection. Another option would have been to store the
database on an OpenStack Block Storage volume and back it up
just as any other block storage volume.</para></section>
</section>
325
doc/arch-design/hybrid/section_tech_considerations_hybrid.xml
Normal file
@ -0,0 +1,325 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="technical-considerations-hybrid">
<?dbhtml stop-chunking?>
<title>Technical Considerations</title>
<para>A hybrid cloud environment requires inspection and
understanding of technical issues that are not only outside of
an organization's data center, but potentially outside of an
organization's control. In many cases, it is necessary to
ensure that the architecture and CMP chosen can adapt not only
to different environments, but also to the possibility of
change. In this situation, applications are crossing diverse
platforms and are likely to be located in diverse locations.
All of these factors will influence and add complexity to the
design of a hybrid cloud architecture.</para>
<para>The only situation where cloud platform incompatibilities
are not going to be an issue is when working with clouds that
are based on the same version and the same distribution of
OpenStack. Otherwise, incompatibilities are virtually
inevitable.</para>
<para>Incompatibility should be less of an issue for clouds that
exclusively use the same version of OpenStack, even if they
use different distributions. The newer the distribution in
question, the less likely it is that there will be
incompatibilities between versions. This is because the
OpenStack community has established an initiative to
define core functions that need to remain backward compatible
between supported versions. The DefCore initiative defines
basic functions that every distribution must support in order
to bear the name "OpenStack".</para>
<para>Some vendors, however, add proprietary customizations to
their distributions. If an application or architecture makes
use of these features, it will be difficult to migrate to or
use other types of environments. Anyone considering
incorporating versions of OpenStack prior to Havana
should think carefully before attempting to incorporate
functionality between versions. Internal differences in older
versions may be so great that the best approach might be to
consider the versions to be essentially diverse platforms, as
different as OpenStack and Amazon Web Services or Microsoft
Azure.</para>
<para>The situation is more predictable if the use of different
cloud platforms is incorporated from inception. If the other clouds
are not based on OpenStack, then all pretense of compatibility
vanishes, and CMP tools must account for the myriad of
differences in the way operations are handled and services are
implemented. Some situations in which these incompatibilities
can arise include differences between the way in which a
cloud:</para>
<itemizedlist>
<listitem>
<para>Deploys instances</para>
</listitem>
<listitem>
<para>Manages networks</para>
</listitem>
<listitem>
<para>Treats applications</para>
</listitem>
<listitem>
<para>Implements services</para>
</listitem>
</itemizedlist>
<section xml:id="capacity-planning-hybrid"><title>Capacity planning</title>
<para>One of the primary reasons many organizations turn to a
hybrid cloud system is to increase capacity without having to
make large capital investments. However, capacity planning is
still necessary when designing an OpenStack installation even
if it is augmented with external clouds.</para>
<para>Specifically, overall capacity and placement of workloads
need to be accounted for when designing for a mostly
internally-operated cloud with the occasional capacity burst.
The long-term capacity plan for such a design needs to
incorporate growth over time to prevent the need to
permanently burst into, and occupy, a potentially more
expensive external cloud. In order to avoid this scenario,
account for the future applications and capacity requirements
and plan growth appropriately.</para>
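The growth projection described above can be sketched as simple compound-growth arithmetic (an illustrative sketch only; the starting demand, growth rate, and internal capacity figures are invented for the example and do not come from this guide):

```python
def project_capacity(current_units, annual_growth, years):
    """Project resource demand (e.g. vCPUs) with compound annual growth."""
    return [round(current_units * (1 + annual_growth) ** y) for y in range(years + 1)]

def years_until_exceeded(current_units, annual_growth, internal_limit):
    """First year in which projected demand exceeds internal capacity,
    i.e. when bursting into an external cloud would become permanent
    occupancy rather than an occasional peak."""
    year = 0
    while current_units * (1 + annual_growth) ** year <= internal_limit:
        year += 1
    return year

# Hypothetical example: 400 vCPUs of demand today, 25% annual growth,
# 1000 vCPUs of internal capacity.
demand = project_capacity(400, 0.25, 5)          # [400, 500, 625, 781, 977, 1221]
crossover = years_until_exceeded(400, 0.25, 1000)  # demand outgrows capacity in year 5
```

A plan like this makes the "permanent burst" point explicit, so hardware purchases can be scheduled before the external cloud becomes a standing cost.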
<para>One of the drawbacks of capacity planning is
unpredictability. It is difficult to predict the amount of
load a particular application might incur if the number of
users fluctuates or the application experiences an unexpected
increase in popularity. It is possible to define application
requirements in terms of vCPU, RAM, bandwidth or other
resources and plan appropriately, but other clouds may not use
the same metric or even the same oversubscription
rates.</para>
<para>Oversubscription is a method to emulate more capacity than
may physically be present. For example, a physical
hypervisor node with 32 gigabytes of RAM may host 24
instances, each provisioned with 2 gigabytes of RAM. As long
as all 24 of them are not concurrently utilizing 2 full
gigabytes, this arrangement is a non-issue. However, some
hosts take oversubscription to extremes and, as a result,
performance can frequently be inconsistent. If at all
possible, determine what the oversubscription rates of each
host are and plan capacity accordingly.</para></section>
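The RAM example above reduces to a ratio calculation; a minimal sketch (the 1.5 target ratio is an assumed planning value, not a recommendation from this guide):

```python
def ram_oversubscription_ratio(instances, ram_per_instance_gb, physical_ram_gb):
    """Ratio of RAM promised to instances versus RAM physically present."""
    return (instances * ram_per_instance_gb) / physical_ram_gb

def max_instances(physical_ram_gb, ram_per_instance_gb, target_ratio):
    """Instances a host can carry while staying at or under a target ratio."""
    return int(physical_ram_gb * target_ratio // ram_per_instance_gb)

# The example from the text: 24 instances x 2 GB on a 32 GB hypervisor node
# promises 48 GB on 32 GB of hardware, a ratio of 1.5.
ratio = ram_oversubscription_ratio(24, 2, 32)
```

Comparing this ratio per host (and per cloud, since providers differ) is what makes the "plan capacity accordingly" step concrete.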
<section xml:id="security-hybrid"><title>Security</title>
<para>The nature of a hybrid cloud environment removes complete
control over the infrastructure. Security becomes a stronger
requirement because data or applications may exist in a cloud
that is outside of an organization's control. Security domains
become an important distinction when planning for a hybrid
cloud environment and its capabilities. A security domain
comprises users, applications, servers or networks that share
common trust requirements and expectations within a
system.</para>
<para>The security domains are:</para>
<orderedlist>
<listitem>
<para>Public</para>
</listitem>
<listitem>
<para>Guest</para>
</listitem>
<listitem>
<para>Management</para>
</listitem>
<listitem>
<para>Data</para>
</listitem>
</orderedlist>
<para>These security domains can be mapped individually to the
organization's installation or combined. For example, some
deployment topologies combine both guest and data domains onto
one physical network, whereas other topologies may physically
separate these networks. In each case, the cloud operator
should be aware of the appropriate security concerns. Security
domains should be mapped out against the specific OpenStack
deployment topology. The domains and their trust requirements
depend upon whether the cloud instance is public, private, or
hybrid.</para>
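The mapping of domains to default trust levels described in this section can be captured in a small table-like structure (a sketch; the trust labels summarize the prose that follows, and "variable" marks domains whose trust depends on deployment decisions):

```python
# Default trust posture of each security domain, summarizing this section:
# public is always untrusted; the others vary with deployment decisions.
SECURITY_DOMAINS = {
    "public": "untrusted",     # Internet and inter-cloud traffic
    "guest": "variable",       # trusted only with instance/tenant controls
    "management": "variable",  # trusted when behind the organization's firewall
    "data": "variable",        # depends entirely on deployment decisions
}

def requires_protection(domain):
    """A conservative rule: protect traffic in any domain not known trusted."""
    return SECURITY_DOMAINS.get(domain, "untrusted") != "trusted"
```

Enumerating the mapping this way forces each deployment topology to state explicitly which domains it promotes to "trusted" and why.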
<para>The public security domain is an entirely untrusted area of
the cloud infrastructure. It can refer to the Internet as a
whole or simply to networks over which an organization has no
authority. This domain should always be considered untrusted.
When considering hybrid cloud deployments, any traffic
traversing beyond and between the multiple clouds should
always be considered to reside in this security domain and is
therefore untrusted.</para>
<para>Typically used for instance-to-instance traffic within a
single data center, the guest security domain handles compute
data generated by instances on the cloud but not services that
support the operation of the cloud, such as API calls. Public
cloud providers used in a hybrid cloud configuration
that an organization does not control, and private cloud
providers that do not have stringent controls on instance use
or that allow unrestricted Internet access to instances, should
consider this domain to be untrusted. Private cloud providers
may consider this network as internal and therefore trusted
only if there are controls in place to assert that instances
and tenants are trusted.</para>
<para>The management security domain is where services interact.
Sometimes referred to as the "control plane", the networks in
this domain transport confidential data such as configuration
parameters, user names, and passwords. In deployments behind an
organization's firewall, this domain is considered trusted. In
a public cloud model that could be part of an architecture,
this would have to be assessed with the public cloud provider
to understand the controls in place.</para>
<para>The data security domain is concerned primarily with
information pertaining to the storage services within
OpenStack. Much of the data that crosses this network has high
integrity and confidentiality requirements, and depending on
the type of deployment there may also be strong availability
requirements. The trust level of this network is heavily
dependent on deployment decisions, and as such it is not
assigned a default level of trust.</para>
<para>Care must be taken when managing the users of the
system, whether operating or utilizing public or private
clouds. The Identity service allows LDAP to be part of the
authentication process. Including such systems in your
OpenStack deployments may ease user management when
integrating with existing systems. When utilizing third-party
clouds, be mindful to explore the authentication options
applicable to the installation to help manage and keep user
authentication consistent.</para>
<para>Because user names, passwords, and generated tokens pass
between client machines and API endpoints,
placing API services behind hardware that performs SSL
termination is strongly recommended.</para>
<para>Within the cloud itself, another component that
needs security scrutiny is the hypervisor. In a public cloud,
organizations typically do not have control over the choice of
hypervisor. (Amazon uses its own particular version of Xen,
for example.) In some cases, hypervisors may be vulnerable to
a type of attack called "hypervisor breakout" if they are not
properly secured. Hypervisor breakout describes the event of a
compromised or malicious instance breaking out of the resource
controls of the hypervisor and gaining access to the bare
metal operating system and hardware resources.</para>
<para>If the security of instances is not considered important,
there may not be an issue. In most cases, however, enterprises
need to avoid this kind of vulnerability, and the only way to
do that is to avoid a situation in which the instances are
running on a public cloud. That does not mean that there is a
need to own all of the infrastructure on which an OpenStack
installation operates; it suggests avoiding situations in
which hardware may be shared with others.</para>
<para>There are other services worth considering that provide a
bare metal instance instead of a cloud. In other cases, it is
possible to replicate a second private cloud by integrating
with a Private Cloud as a Service deployment, in which an
organization does not buy hardware, but also does not share it
with other tenants. It is also possible to use a provider that
hosts a bare-metal "public" cloud instance for which the
hardware is dedicated only to one customer, or a provider that
offers Private Cloud as a Service.</para>
<para>Finally, it is important to realize that each cloud
implements services differently. What keeps data secure in one
cloud may not do the same in another. Be sure to know the
security requirements of every cloud that handles the
organization's data or workloads.</para>
<para>More information on OpenStack security can be found at
http://docs.openstack.org/security-guide/</para></section>
<section xml:id="utilization-hybrid"><title>Utilization</title>
<para>When it comes to utilization, it is important that the CMP
understands what workloads are running, where they are
running, and their preferred utilizations. For example, in
most cases it is desirable to run as many workloads internally
as possible, utilizing other resources only when necessary. On
the other hand, situations exist in which the opposite is
true, for example when the internal cloud is only for
development and stressing it is undesirable. In most cases, a
cost model of various scenarios helps with this decision;
however, this analysis is heavily influenced by internal
priorities. The important thing is the ability to efficiently
make those decisions on a programmatic basis.</para>
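The programmatic placement decision described above can be sketched as a minimal cost model (illustrative only; the per-vCPU-hour rates and capacity figures are invented for the example, and a real CMP weighs many more factors, such as data locality and compliance):

```python
def place_workload(vcpus_needed, internal_free_vcpus,
                   internal_cost_per_vcpu_hr, external_cost_per_vcpu_hr):
    """Prefer the internal cloud while it has free capacity, bursting to
    the external cloud only when internal capacity is exhausted or the
    external platform is genuinely cheaper for this workload."""
    fits_internally = vcpus_needed <= internal_free_vcpus
    internal_cheaper = internal_cost_per_vcpu_hr <= external_cost_per_vcpu_hr
    return "internal" if fits_internally and internal_cheaper else "external"

# Fits internally and internal is cheaper: run it internally.
choice = place_workload(8, 64, 0.02, 0.05)
# Demand exceeds internal free capacity: burst externally.
burst = place_workload(128, 64, 0.02, 0.05)
```

Inverting the cost inputs also captures the opposite case from the text: if the internal (development) cloud is effectively "expensive" to stress, the same rule sends workloads outward.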
<para>The OpenStack Telemetry (Ceilometer) project is designed to
provide information on the usage of various OpenStack
components. There are two limitations to consider: first, if
there is a large amount of data (for example, if
monitoring a large or very active cloud) it is
desirable to use a NoSQL back end for Ceilometer, such as
MongoDB. Second, when connecting to a non-OpenStack cloud,
there will need to be a way to monitor that usage and to
provide that monitoring data back to the CMP.</para></section>
<section xml:id="performace-hybrid"><title>Performance</title>
<para>Performance is of primary importance in the design of a
cloud. When it comes to a hybrid cloud deployment, many of the
same issues for multi-site deployments apply, such as network
latency between sites. It is also important to think about the
speed at which a workload can be spun up in another cloud, and
what can be done to reduce the time necessary to accomplish
that task. That may mean moving data closer to applications,
or conversely, applications closer to the data they process.
It may mean grouping functionality so that connections that
require low latency take place over a single cloud rather than
spanning clouds. That may also mean ensuring that the CMP has
the intelligence to know which cloud can most efficiently run
which types of workloads.</para>
<para>As with utilization, native OpenStack tools are available to
assist. Ceilometer can measure performance and, if necessary,
OpenStack Orchestration via the Heat project can be used to
react to changes in demand by spinning up more resources. It
is important to note, however, that Orchestration requires
special configurations in the client to enable functioning
with solution offerings from Amazon Web Services. When dealing
with other types of clouds, it is necessary to rely on the
features of the CMP.</para></section>
<section xml:id="components"><title>Components</title>
<para>The number and types of native OpenStack components that are
available for use depend on whether the deployment is
exclusively an OpenStack cloud or not. If so, all of the
OpenStack components will be available for use, and in many
ways the issues that need to be considered will be similar to
those that need to be considered for a multi-site
deployment.</para>
<para>That said, in any situation in which more than one cloud is
being used, at least four OpenStack tools will be
considered:</para>
<itemizedlist>
<listitem>
<para>OpenStack Compute (Nova): Regardless of deployment
location, hypervisor choice has a direct effect on how
difficult it is to integrate with one or more
additional clouds. For example, integrating a Hyper-V
based OpenStack cloud with Azure will have fewer
compatibility issues than if KVM is used.</para>
</listitem>
<listitem>
<para>Networking: Whether OpenStack Networking (Neutron)
or Nova-network is used, the network is one place
where integration capabilities need to be understood
in order to connect between clouds.</para>
</listitem>
<listitem>
<para>OpenStack Telemetry (Ceilometer): Use of Ceilometer
depends, in large part, on what the other parts of the
cloud are using.</para>
</listitem>
<listitem>
<para>Orchestration module (Heat): Similarly, Heat can
be a valuable tool in orchestrating tasks a CMP
decides are necessary in an OpenStack-based
cloud.</para>
</listitem>
</itemizedlist></section>
<section xml:id="special-considerations-hybrid"><title>Special considerations</title>
<para>Hybrid cloud deployments also involve two more issues that
are not common in other situations:</para>
<para>Image portability: Note that, as of the Icehouse release,
there is no single common image format that is usable by all
clouds. This means that images will need to be converted or
recreated when porting between clouds. To make things simpler,
launch the smallest and simplest images feasible, installing
only what is necessary, preferably using a deployment manager
such as Chef or Puppet. This generally means not using golden
images to speed up the process; however, if the same images are
being repeatedly deployed, it may make more sense to utilize
this technique instead of provisioning applications on lighter
images each time.</para>
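Image conversion between clouds is commonly done with the qemu-img tool. A sketch that builds (but deliberately does not execute) the conversion command line, so the mapping from image formats to arguments is explicit (the file names are hypothetical; note that qemu-img calls the Hyper-V VHD format "vpc"):

```python
def qemu_img_convert_args(src_path, src_format, dst_path, dst_format):
    """Build an argument vector for `qemu-img convert` to translate an
    image between formats (e.g. qcow2 for KVM, vmdk for VMware,
    vpc for Hyper-V VHD)."""
    supported = {"qcow2", "raw", "vmdk", "vdi", "vpc"}
    if src_format not in supported or dst_format not in supported:
        raise ValueError("unsupported image format")
    return ["qemu-img", "convert",
            "-f", src_format,   # input format
            "-O", dst_format,   # output format
            src_path, dst_path]

# Port a KVM image to a VMware-based cloud (hypothetical file names);
# the resulting list could be passed to subprocess.run().
args = qemu_img_convert_args("web.qcow2", "qcow2", "web.vmdk", "vmdk")
```

Keeping the conversion logic in one place like this is one way a CMP or deployment pipeline can make image portability repeatable rather than a manual step.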
<para>API differences: The most profound issue that cannot be
avoided when using a hybrid cloud deployment with more than
just OpenStack (or with different versions of OpenStack) is
that the APIs needed to perform certain functions are
different. The CMP needs to know how to handle all necessary
versions. To get around this issue, some implementers build
portals to achieve a hybrid cloud environment, but a heavily
developer-focused organization will get more use out of a
hybrid cloud broker SDK such as jClouds.</para></section>
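The API differences described above are commonly absorbed with a per-cloud driver (adapter) layer, which is roughly the pattern broker SDKs such as jClouds provide. A minimal sketch of the pattern (the driver classes, method names, and return payloads are invented for illustration and are not any real SDK's API):

```python
class OpenStackDriver:
    """Hypothetical adapter normalizing one cloud's instance-launch call."""
    def launch(self, name, flavor):
        return {"cloud": "openstack", "server": name, "flavor": flavor}

class AwsDriver:
    """Hypothetical adapter for an AWS-style API with different vocabulary."""
    def launch(self, name, flavor):
        return {"cloud": "aws", "instance": name, "instance_type": flavor}

class Broker:
    """The CMP talks to one uniform interface; drivers absorb the
    per-cloud API differences behind it."""
    def __init__(self, drivers):
        self.drivers = drivers

    def launch(self, cloud, name, flavor):
        return self.drivers[cloud].launch(name, flavor)

broker = Broker({"private": OpenStackDriver(), "public": AwsDriver()})
result = broker.launch("public", "web01", "m1.small")
```

Adding support for another cloud, or for a changed API version, then means writing one new driver rather than touching every caller.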
</section>
doc/arch-design/hybrid/section_user_requirements_hybrid.xml
Normal file
@@ -0,0 +1,314 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="user-requirements-hybrid">
<?dbhtml stop-chunking?>
<title>User Requirements</title>
<para>Hybrid cloud architectures introduce additional
complexities, particularly those that use heterogeneous cloud
platforms. As a result, it is important to make sure that
design choices match requirements in such a way that the
benefits outweigh the inherent additional complexity and
risks.</para>
<para>Business considerations to make when designing a hybrid
cloud deployment include:</para>
<itemizedlist>
<listitem>
<para>Cost: A hybrid cloud architecture involves multiple
vendors and technical architectures. These
architectures may be more expensive to deploy and
maintain. Operational costs can be higher because of
the need for more sophisticated orchestration and
brokerage tools than in other architectures. In
contrast, overall operational costs might be lower by
virtue of using a cloud brokerage tool to deploy the
workloads to the most cost-effective platform.</para>
</listitem>
<listitem>
<para>Revenue opportunity: Revenue opportunities vary
greatly based on the intent and use case of the cloud.
If it is being built as a commercial customer-facing
product, consider the drivers for building it over
multiple platforms and whether the use of multiple
platforms makes the design more attractive to target
customers, thus enhancing the revenue
opportunity.</para>
</listitem>
<listitem>
<para>Time to Market: One of the most common reasons to
use cloud platforms is to speed the time to market of
a new product or application. A business requirement
to use multiple cloud platforms may be because there
is an existing investment in several applications and
it is faster to tie them together rather than
migrating components and refactoring to a single
platform.</para>
</listitem>
<listitem>
<para>Business or technical diversity: Organizations
already leveraging cloud-based services may wish to
embrace business diversity and utilize a hybrid cloud
design to spread their workloads across multiple cloud
providers so that no application is hosted in a single
cloud provider.</para>
</listitem>
<listitem>
<para>Application momentum: A business with existing
applications that are already in production on
multiple cloud environments may find that it is more
cost effective to integrate the applications on
multiple cloud platforms rather than migrate them to a
single platform.</para>
</listitem>
</itemizedlist>
<section xml:id="legal-requirements-hybrid"><title>Legal Requirements</title>
<para>Many jurisdictions have legislative and regulatory
requirements governing the storage and management of data in
cloud environments. Common areas of regulation include:</para>
<itemizedlist>
<listitem>
<para>Data retention policies ensuring storage of
persistent data and records management to meet data
archival requirements.</para>
</listitem>
<listitem>
<para>Data ownership policies governing the possession and
responsibility for data.</para>
</listitem>
<listitem>
<para>Data sovereignty policies governing the storage of
data in foreign countries or otherwise separate
jurisdictions.</para>
</listitem>
<listitem>
<para>Data compliance policies governing where certain
types of information must reside due to regulatory
issues and, more importantly, where it cannot reside
for the same reason.</para>
</listitem>
</itemizedlist>
<para>Examples of such legal frameworks include the data
protection framework of the European Union
(http://ec.europa.eu/justice/data-protection/) and the
requirements of the Financial Industry Regulatory Authority
(http://www.finra.org/Industry/Regulation/FINRARules/) in the
United States. Consult a local regulatory body for more
information.</para></section>
<section xml:id="workload-considerations"><title>Workload Considerations</title>
<para>Defining what the word "workload" means in the context of a
hybrid cloud environment is important. Workload can be defined
as the intended way the systems will be utilized, which is
often referred to as a “use case.” A workload can be a single
application or a suite of applications that work in concert.
It can also be a duplicate set of applications that need to
run on multiple cloud environments. In a hybrid cloud
deployment, the same workload will often need to function
equally well on radically different public and private cloud
environments. The architecture needs to address these
potential conflicts, complexity, and platform
incompatibilities.</para>
<para>Some possible use cases for a hybrid cloud architecture
include:</para>
<itemizedlist>
<listitem>
<para>Dynamic resource expansion or "bursting": Another
common reason to use a multiple cloud architecture is
a "bursty" application that needs additional resources
at times. An example of this case could be a retailer
that needs additional resources during the holiday
selling season, but does not want to build expensive
cloud resources to meet the peak demand. They might
have an OpenStack private cloud but want to burst to
AWS or some other public cloud for these peak load
periods. These bursts could be for long or short
cycles, ranging from hourly to monthly or yearly.</para>
</listitem>
<listitem>
<para>Disaster recovery-business continuity: The cheaper
storage and instance management make a good case for
using the cloud as a secondary site. The public cloud
is already heavily used for these purposes in
combination with an OpenStack public or private
cloud.</para>
</listitem>
<listitem>
<para>Federated hypervisor-instance management: Adding
self-service, charge back and transparent delivery of
the right resources from a federated pool can be cost
effective. In a hybrid cloud environment, this is a
particularly important consideration. Look for a cloud
that provides cross-platform hypervisor support and
robust instance management tools.</para>
</listitem>
<listitem>
<para>Application portfolio integration: An enterprise
cloud delivers better application portfolio management
and more efficient deployment by leveraging
self-service features and rules for deployments based
on types of use. A common driver for building hybrid
cloud architecture is to stitch together multiple
existing cloud environments that are already in
production or development.<!-- In the interest of time to
market, the requirements may be to maintain the
multiple clouds and just integrate the pieces
together, not rationalize to one cloud environment, but
instead to --></para>
</listitem>
<listitem>
<para>Migration scenarios: A common reason to create a
hybrid cloud architecture is to allow the migration of
applications between different clouds. This may be
because the application will be migrated permanently
to a new platform, or it might be because the
application needs to be supported on multiple
platforms going forward.</para>
</listitem>
<listitem>
<para>High availability: Another important reason for
wanting a multiple cloud architecture is to address
the needs for high availability. By using a
combination of multiple locations and platforms, a
design can achieve a level of availability that is not
possible with a single platform. This approach does
add a significant amount of complexity.</para>
</listitem>
</itemizedlist>
<para>In addition to thinking about how the workload will work on
a single cloud, the design must accommodate the added
complexity of needing the workload to run on multiple cloud
platforms. The complexity of transferring workloads across
clouds needs to be explored at the application, instance,
cloud platform, hypervisor, and network levels.</para></section>
<section xml:id="tools-considerations-hybrid"><title>Tools Considerations</title>
<para>When working with designs spanning multiple clouds, the
design must incorporate tools to facilitate working across
those multiple clouds. Some of the user requirements drive the
need for tools that provide the following functions:</para>
<itemizedlist>
<listitem>
<para>Broker between clouds: Since the multiple cloud
architecture assumes that there will be at least two
different and possibly incompatible platforms that are
likely to have different costs, brokering software is
designed to evaluate relative costs between different
cloud platforms. These solutions are sometimes
referred to as Cloud Management Platforms (CMPs).
Examples include Rightscale, Gravitent, Scalr,
CloudForms, and ManageIQ. These tools allow the
designer to determine the right location for the
workload based on predetermined criteria.</para>
</listitem>
<listitem>
<para>Facilitate orchestration across the clouds: CMPs are
tools used to tie everything together. Cloud
orchestration tools are used to improve the management
of IT application portfolios as they migrate onto
public, private, and hybrid cloud platforms. These
tools are an important consideration. Cloud
orchestration tools are used for managing a diverse
portfolio of installed systems across multiple cloud
platforms. The typical enterprise IT application
portfolio still comprises a few thousand
applications scattered over legacy hardware,
virtualized infrastructure, and now dozens of
disjointed shadow public Infrastructure-as-a-Service
(IaaS) and Software-as-a-Service (SaaS) providers and
offerings.</para>
</listitem>
</itemizedlist></section>
<section xml:id="network-considerations-hybrid"><title>Network Considerations</title>
<para>The network services functionality is an important factor to
assess when choosing a CMP and cloud provider. Considerations
include functionality, security, scalability, and high
availability (HA). Verification and ongoing testing of the
critical features of the cloud
endpoint used by the architecture are important tasks.</para>
<itemizedlist>
<listitem>
<para>Once the network functionality framework has been
decided, a minimum functionality test should be
designed. This will ensure that functionality
persists during and after upgrades.</para>
</listitem>
<listitem>
<para>Scalability across multiple cloud providers may
dictate which underlying network framework you will
choose in different cloud providers. It is important
to have the network API functions presented and to
verify that functionality persists across all cloud
endpoints chosen.</para>
</listitem>
<listitem>
<para>High availability implementations vary in
functionality and design. Examples of some common
methods are Active-Hot-Standby, Active-Passive and
Active-Active. High availability and a test framework
need to be developed to ensure that the functionality
and limitations are well understood.</para>
</listitem>
<listitem>
<para>Security considerations include how data is secured
between client and endpoint and any traffic that
traverses the multiple clouds, from eavesdropping to
DoS activities.</para>
</listitem>
</itemizedlist></section>
<section xml:id="risk-mitigation-management-hybrid"><title>Risk Mitigation and Management
Considerations</title>
<para>Hybrid cloud architectures introduce additional risk because
they add additional complexity and potentially conflicting or
incompatible components or tools. However, they also reduce
risk by spreading workloads over multiple providers. This
means that if one provider goes out of business, the
organization can remain operational.</para>
<para>Risks that will be heightened by using a hybrid cloud
architecture include:</para>
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Provider availability or implementation details:
|
||||
This can range from the company going out of business
|
||||
to the company changing how it delivers its services.
|
||||
Cloud architectures are inherently designed to be
|
||||
flexible and changeable; paradoxically, the cloud is
|
||||
both perceived to be rock solid and ever flexible at
|
||||
the same time.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Differing SLAs: Users of hybrid cloud environments
|
||||
potentially encounter some losses through differences
|
||||
in service level agreements. A hybrid cloud design
|
||||
needs to accommodate the different SLAs provided by
|
||||
the various clouds involved in the design, and must
|
||||
address the actual enforceability of the providers'
|
||||
SLAs.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Security levels: Securing multiple cloud
|
||||
environments is more complex than securing a single
|
||||
cloud environment. Concerns need to be addressed at,
|
||||
but not limited to, the application, network, and
|
||||
cloud platform levels. One issue is that different
|
||||
cloud platforms approach security differently, and a
|
||||
hybrid cloud design must address and compensate for
|
||||
differences in security approaches. For example, AWS
|
||||
uses a relatively simple model that relies on user
|
||||
privilege combined with firewalls.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Provider API changes: APIs are crucial in a hybrid
|
||||
cloud environment. As a consumer of a provider's cloud
|
||||
services, an organization will rarely have any control
|
||||
over provider changes to APIs. Cloud services that
|
||||
might have previously had compatible APIs may no
|
||||
longer work. This is particularly a problem with AWS
|
||||
and OpenStack AWS-compatible APIs. OpenStack was
|
||||
originally planned to maintain compatibility with
|
||||
changes in AWS APIs. However, over time, the APIs have
|
||||
become more divergent in functionality. One way to
|
||||
address this issue is to focus on using only the most
|
||||
common and basic APIs to minimize potential
|
||||
conflicts.</para>
|
||||
</listitem>
|
||||
</itemizedlist></section>
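One way to picture the last mitigation above, restricting consumption to the most common and basic APIs, is a thin facade that deliberately blocks provider-specific calls. This is a hedged sketch: the operation names and the driver object are invented stand-ins, not a real SDK interface.

```python
# Sketch: expose only a lowest-common-denominator operation set so that
# provider API drift is confined to the drivers behind this facade.
# BASIC_OPS and the driver protocol are assumptions for illustration.
class CommonCloudFacade:
    BASIC_OPS = {"create_server", "delete_server", "list_servers"}

    def __init__(self, driver):
        self._driver = driver  # provider-specific client (stand-in)

    def __getattr__(self, name):
        # Only forward operations every supported provider implements.
        if name not in self.BASIC_OPS:
            raise AttributeError(
                f"{name!r} is outside the common API subset")
        return getattr(self._driver, name)
```

Provider-specific extensions then fail loudly at the facade rather than silently diverging per cloud.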
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-guide-how-this-book-is-organized">
<title>How this Book is Organized</title>
<para>This book is organized into chapters that describe
the use cases associated with making architectural
choices related to an OpenStack cloud installation. Each
chapter is intended to stand alone to encourage individual
chapter readability; however, each chapter also contains
information that may be applicable in situations covered
by other chapters. Cloud architects may use
this book as a comprehensive guide by reading all of the use
cases, but it is also possible to review only the chapters
that pertain to a specific use case. When reading
specific use cases, note that it may be necessary to read more
than one section of the guide to formulate a complete design
for the cloud. The use cases covered in this guide
include:</para>
<itemizedlist>
<listitem>
<para>General purpose: A cloud built with common
components that should address 80% of common use
cases.</para>
</listitem>
<listitem>
<para>Compute focused: A cloud designed to address compute
intensive workloads such as high performance computing
(HPC).</para>
</listitem>
<listitem>
<para>Storage focused: A cloud focused on storage
intensive workloads such as data analytics with
parallel file systems.</para>
</listitem>
<listitem>
<para>Network focused: A cloud depending on high
performance and reliable networking, such as a content
delivery network (CDN).</para>
</listitem>
<listitem>
<para>Multi-site: A cloud built with multiple sites
available for application deployments for
geographical, reliability or data locality
reasons.</para>
</listitem>
<listitem>
<para>Hybrid cloud: An architecture where multiple
disparate clouds are connected either for failover,
hybrid cloud bursting, or availability.</para>
</listitem>
<listitem>
<para>Massively scalable: An architecture that is intended
for cloud service providers or other extremely large
installations.</para>
</listitem>
</itemizedlist>
<para>A section titled Specialized Use Cases provides information
on architectures that are not covered by the defined use
cases.</para>
<para>Each chapter in the guide is further broken down into
the following sections:</para>
<itemizedlist>
<listitem>
<para>Introduction: Provides an overview of the
architectural use case.</para>
</listitem>
<listitem>
<para>User requirements: Defines the set of user
considerations that typically come into play for that
use case.</para>
</listitem>
<listitem>
<para>Technical considerations: Covers the technical
issues that must be accounted for when dealing with
this use case.</para>
</listitem>
<listitem>
<para>Operational considerations: Covers the ongoing
operational tasks associated with this use case and
architecture.</para>
</listitem>
<listitem>
<para>Architecture: Covers the overall architecture
associated with the use case.</para>
</listitem>
<listitem>
<para>Prescriptive examples: Presents one or more
scenarios where this architecture could be
deployed.</para>
</listitem>
</itemizedlist>
<para>A Glossary covers the terms and phrases used in the
book.</para>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-guide-why-and-who-we-wrote-this-book">
<title>Why and How We Wrote this Book</title>
<para>The velocity at which OpenStack environments are moving from
proof-of-concept to production deployments is leading to
increasing questions and issues related to architecture design
considerations. By and large these considerations are not
addressed in the existing documentation, which typically
focuses on the specifics of deployment and configuration
options or operational considerations, rather than the bigger
picture.</para>
<para>We wrote this book to guide readers in designing an
OpenStack architecture that meets the needs of their
organization. This guide concentrates on identifying important
design considerations for common cloud use cases and provides
examples based on these design guidelines. This guide does not
aim to provide explicit instructions for installing and
configuring the cloud, but rather focuses on design principles
as they relate to user requirements as well as technical and
operational considerations. For specific guidance on
installation and configuration, there are a number of resources
already available in the OpenStack documentation that help in
that area.</para>
<para>This book was written in a book sprint format, which is a
facilitated, rapid development production method for books.
For more information, see the Book Sprints website
(www.booksprints.net).</para>
<para>This book was written in five days during July 2014 while
exhausting the M&amp;M, Mountain Dew, and healthy options
supply, complete with juggling entertainment during lunches, at
VMware's headquarters in Palo Alto. The event was also
documented on Twitter using the #OpenStackDesign hashtag. The
Book Sprint was facilitated by Faith Bosworth and Adam
Hyde.</para>
<para>We would like to thank VMware for their generous
hospitality, as well as our employers, Cisco, Cloudscaling,
Comcast, EMC, Mirantis, Rackspace, Red Hat, Verizon, and
VMware, for enabling us to contribute our time. We would
especially like to thank Anne Gentle and Kenneth Hui for all
of their shepherding and organization in making this
happen.</para>
<para>The author team includes:</para>
<itemizedlist>
<listitem>
<para>Kenneth Hui (EMC) @hui_kenneth</para>
</listitem>
<listitem>
<para>Alexandra Settle (Rackspace) @dewsday</para>
</listitem>
<listitem>
<para>Anthony Veiga (Comcast) @daaelar</para>
</listitem>
<listitem>
<para>Beth Cohen (Verizon) @bfcohen</para>
</listitem>
<listitem>
<para>Kevin Jackson (Rackspace) @itarchitectkev</para>
</listitem>
<listitem>
<para>Maish Saidel-Keesing (Cisco) @maishsk</para>
</listitem>
<listitem>
<para>Nick Chase (Mirantis) @NickChase</para>
</listitem>
<listitem>
<para>Scott Lowe (VMware) @scott_lowe</para>
</listitem>
<listitem>
<para>Sean Collins (Comcast) @sc68cal</para>
</listitem>
<listitem>
<para>Sean Winn (Cloudscaling) @seanmwinn</para>
</listitem>
<listitem>
<para>Sebastian Gutierrez (Red Hat) @gutseb</para>
</listitem>
<listitem>
<para>Stephen Gordon (Red Hat) @xsgordon</para>
</listitem>
<listitem>
<para>Vinny Valdez (Red Hat) @VinnyValdez</para>
</listitem>
</itemizedlist>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-guide-intended-audience">
<title>Intended Audience</title>
<para>This book has been written for architects and designers of
OpenStack clouds. It is not intended for people who are
deploying OpenStack. For a guide on deploying and operating
OpenStack, refer to the <link
xlink:href="http://docs.openstack.org/openstack-ops">OpenStack
Operations Guide</link>.</para>
<para>The reader should have prior knowledge of cloud architecture
and principles, experience in enterprise system design, Linux
and virtualization experience, and a basic understanding of
networking principles and protocols.</para>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-guide-intro-to-openstack-arch-design-guide">
<title>Introduction to the OpenStack Architecture Design
Guide</title>
<para>OpenStack is a leader in the cloud technology gold rush, as
organizations of all stripes discover the increased
flexibility and speed to market that self-service cloud and
Infrastructure as a Service (IaaS) provide. To truly reap
those benefits, however, the cloud must be designed and
architected properly.</para>
<para>A well-architected cloud provides a stable IT environment
that offers easy access to needed resources, usage-based
expenses, extra capacity on demand, disaster recovery, and a
secure environment, but a well-architected cloud does not
magically build itself. It requires careful consideration of a
multitude of factors, both technical and non-technical.</para>
<para>There is no single architecture that is "right" for an
OpenStack cloud deployment. OpenStack can be used for any
number of different purposes, each with its own
particular requirements and architectural
peculiarities.</para>
<para>This book examines some of the most common
uses for OpenStack clouds (and even some that are less common
but provide good examples) and explains which issues need to
be considered and why, along with advice to help an
organization design and build a well-architected OpenStack
cloud that fits its unique requirements.</para>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="methodology">
<title>Methodology</title>
<para>The magic of the cloud is that it can do anything. It is both robust
and flexible, the best of both worlds. Yes, the cloud is highly flexible
and it can do almost anything, but to get the most out of a cloud
investment, it is important to define how the cloud will be used by
creating and testing use cases. This chapter describes the thought
process behind designing a cloud architecture that best suits the
intended use.</para>
<mediaobject>
<imageobject>
<imagedata fileref="../images/Methodology.png"/>
</imageobject>
</mediaobject>
<para>The diagram shows, at a very abstract level, the process for capturing
requirements and building use cases. Once a set of use cases has been
defined, it can then be used to design the cloud architecture.</para>
<para>Use case planning can seem counter-intuitive. After all, it takes
about five minutes to sign up for a server with Amazon. Amazon does not
know in advance what any given user is planning on doing with it, right?
Wrong. Amazon’s product management department spends plenty of time
figuring out exactly what would be attractive to their typical customer
and honing the service to deliver it. For the enterprise, the planning
process is no different, but instead of planning for an external paying
customer, for example, the use could be for internal application
developers or a web portal. The following is a list of the high-level
objectives that need to be incorporated into the thinking behind
creating a use case.</para>
<para>Overall business objectives:</para>
<itemizedlist>
<listitem>
<para>Develop a clear definition of business goals and
requirements.</para>
</listitem>
<listitem>
<para>Increase project support and engagement with business,
customers, and end users.</para>
</listitem>
</itemizedlist>
<para>Technology:</para>
<itemizedlist>
<listitem>
<para>Coordinate the OpenStack architecture across the project and
leverage OpenStack community efforts more effectively.</para>
</listitem>
<listitem>
<para>Architect for automation as much as possible to speed up
development and deployment.</para>
</listitem>
<listitem>
<para>Use the appropriate tools for the development effort.</para>
</listitem>
<listitem>
<para>Create better and more test metrics and test harnesses to
support continuous and integrated development, test processes,
and automation.</para>
</listitem>
</itemizedlist>
<para>Organization:</para>
<itemizedlist>
<listitem>
<para>Better messaging of management support of team efforts.</para>
</listitem>
<listitem>
<para>Develop a better cultural understanding of open source, cloud
architectures, Agile methodologies, continuous development, test
and integration, and overall development concepts in general.</para>
</listitem>
</itemizedlist>
<para>As an example of how this works, consider a business goal of using the
cloud for the company’s e-commerce website. This goal means planning for
applications that will support thousands of sessions per second,
variable workloads, and lots of complex and changing data. By
identifying the key metrics, such as the number of concurrent
transactions per second and the size of the database, it is possible to
build a method for testing the assumptions.</para>
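As a minimal illustration of turning such key metrics into something testable, the following sketch records the assumed ceilings and flags any measured value that breaches them. The metric names and numbers are invented placeholders, not recommendations.

```python
# Illustrative only: assumed key-metric ceilings for a hypothetical
# e-commerce use case. The names and numbers are placeholders.
ASSUMPTIONS = {
    "peak_transactions_per_second": 2000,
    "database_size_gb": 500,
    "concurrent_sessions": 10000,
}

def breached_assumptions(measured, assumptions=ASSUMPTIONS):
    """Return {metric: (measured, ceiling)} for every exceeded ceiling."""
    return {
        name: (measured[name], ceiling)
        for name, ceiling in assumptions.items()
        if name in measured and measured[name] > ceiling
    }
```

Feeding load-test measurements through a check like this turns the planning assumptions into a repeatable pass/fail signal rather than a one-time estimate.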
<para>Develop functional user scenarios: Develop functional user scenarios
that can be used to develop test cases for measuring the overall project
trajectory. If the organization is not ready to commit
to an application or applications that can be used to develop user
requirements, it needs to create requirements to build valid test
harnesses and develop usable metrics. Once the metrics are established,
as requirements change it is easier to respond to the changes quickly
without having to worry overly much about setting the exact requirements
in advance. Think of this as creating ways to configure the system,
rather than redesigning it every time there is a requirements change.</para>
<para>Limit the cloud feature set: Create requirements that address the pain
points, but do not recreate the entire OpenStack tool suite. The
requirement to build OpenStack, only better, is self-defeating. It is
important to limit scope creep by concentrating on developing a platform
that will address tool limitations for the requirements, but not
recreate the entire suite of tools. Work with technical product owners
to establish the critical features that are needed for a successful
cloud deployment.</para>
<section xml:id="application-cloud-readiness-methods">
<title>Application Cloud Readiness</title>
<para>Although the cloud is designed to make things easier, it is
important to realize that "using cloud" is more than just firing up
an instance and dropping an application on it. The "lift and shift"
approach works in certain situations, but there is a fundamental
difference between clouds and traditional bare-metal-based
environments, or even traditional virtualized environments.</para>
<para>In traditional environments, with traditional enterprise
applications, the applications and the servers that they run on are
"pets". They're lovingly crafted and cared for, the servers have
names like Gandalf or Tardis, and if they get sick, someone nurses
them back to health. All of this is designed so that the application
does not experience an outage.</para>
<para>In cloud environments, on the other hand, servers are more like
cattle. There are thousands of them, they get names like NY-1138-Q,
and if they get sick, they get put down and a sysadmin installs
another one. Traditional applications that are unprepared for this
kind of environment will naturally suffer outages, lost data, or
worse.</para>
<para>There are other reasons to design applications with the cloud in
mind. Some are defensive: because applications cannot be
certain of exactly where or on what hardware they will be launched,
they need to be flexible, or at least adaptable. Others are
proactive. For example, one of the advantages of using the cloud is
scalability, so applications need to be designed in such a way that
they can take advantage of those and other opportunities.</para>
</section>
<section xml:id="determining-whether-an-application-is-cloud-ready">
<title>Determining whether an application is cloud-ready</title>
<para>There are several factors to take into consideration when looking
at whether an application is a good fit for the cloud.</para>
<para>Structure: A large, monolithic, single-tiered legacy application
typically isn't a good fit for the cloud. Efficiencies are gained
when load can be spread over several instances, so that a failure in
one part of the system can be mitigated without affecting other
parts of the system, or so that scaling can take place where the app
needs it.</para>
<para>Dependencies: Applications that depend on specific hardware --
such as a particular chipset or an external device such as a
fingerprint reader -- might not be a good fit for the cloud, unless
those dependencies are specifically addressed. Similarly, if an
application depends on an operating system or set of libraries that
cannot be used in the cloud, or cannot be virtualized, that is a
problem.</para>
<para>Connectivity: Self-contained applications, or those that depend on
resources that are not reachable by the cloud in question, will not
run. In some situations, these issues can be worked around with a
custom network setup, but how well this works depends on the chosen
cloud environment.</para>
<para>Durability and resilience: Despite the existence of SLAs, the one
reality of the cloud is that things break. Servers go down, network
connections are disrupted, and other tenants on a server can ramp up
their load enough to make the server unusable. Any number of things
can happen, and an application that isn't built to withstand this
kind of disruption isn't going to work properly.</para>
</section>
<section xml:id="designing-for-the-cloud">
<title>Designing for the cloud</title>
<para>Here are some guidelines to keep in mind when designing an
application for the cloud:</para>
<itemizedlist>
<listitem>
<para>Be a pessimist: Assume everything fails and design
backwards. Love your chaos monkey.</para>
</listitem>
<listitem>
<para>Put your eggs in multiple baskets: Leverage multiple
providers, geographic regions, and availability zones to
accommodate local availability issues. Design for
portability.</para>
</listitem>
<listitem>
<para>Think efficiency: Inefficient designs will not scale.
Efficient designs become cheaper as they scale. Kill off
unneeded components or capacity.</para>
</listitem>
<listitem>
<para>Be paranoid: Design for defense in depth and zero
tolerance by building in security at every level and between
every component. Trust no one.</para>
</listitem>
<listitem>
<para>But not too paranoid: Not every application needs the
platinum solution. Architect for different SLAs, service
tiers, and security levels.</para>
</listitem>
<listitem>
<para>Manage the data: Data is usually the most inflexible and
complex area of a cloud and cloud integration architecture.
Don’t shortchange the effort of analyzing and addressing
data needs.</para>
</listitem>
<listitem>
<para>Hands off: Leverage automation to increase consistency and
quality and reduce response times.</para>
</listitem>
<listitem>
<para>Divide and conquer: Pursue partitioning and
parallel layering wherever possible. Make components as small
and portable as possible. Use load balancing between
layers.</para>
</listitem>
<listitem>
<para>Think elasticity: Increasing resources should result in a
proportional increase in performance and scalability.
Decreasing resources should have the opposite
effect.</para>
</listitem>
<listitem>
<para>Be dynamic: Enable dynamic configuration changes such as
auto scaling, failure recovery, and resource discovery to
adapt to changing environments, faults, and workload
volumes.</para>
</listitem>
<listitem>
<para>Stay close: Reduce latency by moving highly interactive
components and data near each other.</para>
</listitem>
<listitem>
<para>Keep it loose: Loose coupling, service interfaces,
separation of concerns, abstraction, and well defined APIs
deliver flexibility.</para>
</listitem>
<listitem>
<para>Be cost aware: Auto scaling, data transmission, virtual
software licenses, reserved instances, and so on can rapidly
increase monthly usage charges. Monitor usage
closely.</para>
</listitem>
</itemizedlist>
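The "think elasticity" and "be dynamic" guidelines above can be made concrete with a toy scaling policy. This is a sketch under invented assumptions: the CPU thresholds, growth factors, and instance bounds are placeholders, not tuning advice.

```python
# Toy auto-scaling policy: grow the instance count under high average CPU
# load and shrink it under low load, clamped to fixed bounds. All numeric
# thresholds here are illustrative placeholders.
def desired_instances(current, avg_cpu_percent,
                      high=80, low=30, floor=2, ceiling=100):
    """Return the next instance count given current count and CPU load."""
    if avg_cpu_percent > high:
        target = current + max(1, current // 2)   # scale out ~50%
    elif avg_cpu_percent < low:
        target = current - max(1, current // 4)   # scale in ~25%
    else:
        target = current                          # within the comfort band
    return max(floor, min(ceiling, target))
```

A policy like this embodies the elasticity principle: resources track demand in both directions, and the clamping keeps failures and misreadings from scaling the deployment to zero or to an unaffordable size.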
</section>
</section>
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<section xmlns="http://docbook.org/ns/docbook"
|
||||
xmlns:xi="http://www.w3.org/2001/XInclude"
|
||||
xmlns:xlink="http://www.w3.org/1999/xlink"
|
||||
version="5.0"
|
||||
xml:id="arch-guide-intro-massive-scale">
|
||||
<title>Introduction</title>
|
||||
<para>A massively scalable architecture is defined as a cloud
|
||||
implementation that is either a very large deployment, such as
|
||||
one that would be built by a commercial service provider, or
|
||||
one that has the capability to support user requests for large
|
||||
amounts of cloud resources. An example would be an
|
||||
infrastructure in which requests to service 500 instances or
|
||||
more at a time is not uncommon. In a massively scalable
|
||||
infrastructure, such a request is fulfilled without completely
|
||||
consuming all of the available cloud infrastructure resources.
|
||||
While the high capital cost of implementing such a cloud
|
||||
architecture makes it cost prohibitive and is only spearheaded
|
||||
by few organizations, many organizations are planning for
|
||||
massive scalability moving toward the future.</para>
|
||||
<para>A massively scalable OpenStack cloud design presents a
|
||||
unique set of challenges and considerations. For the most part
|
||||
it is similar to a general purpose cloud architecture, as it
|
||||
is built to address a non-specific range of potential use
|
||||
cases or functions. Typically, it is rare that massively
|
||||
scalable clouds are designed or specialized for particular
|
||||
workloads. Like the general purpose cloud, the massively
|
||||
scalable cloud is most often built as a platform for a variety
|
||||
of workloads. Massively scalable OpenStack clouds are
|
||||
generally built as commercial public cloud offerings since
|
||||
single private organizations rarely have the resources or need
|
||||
for this scale.</para>
|
||||
<para>Services provided by a massively scalable OpenStack cloud
|
||||
will include:</para>
|
||||
    <itemizedlist>
        <listitem>
            <para>Virtual-machine disk image library</para>
        </listitem>
        <listitem>
            <para>Raw block storage</para>
        </listitem>
        <listitem>
            <para>File or object storage</para>
        </listitem>
        <listitem>
            <para>Firewall functionality</para>
        </listitem>
        <listitem>
            <para>Load balancing functionality</para>
        </listitem>
        <listitem>
            <para>Private (non-routable) and public (floating) IP
                addresses</para>
        </listitem>
        <listitem>
            <para>Virtualized network topologies</para>
        </listitem>
        <listitem>
            <para>Software bundles</para>
        </listitem>
        <listitem>
            <para>Virtual compute resources</para>
        </listitem>
    </itemizedlist>
    <para>Like a general purpose cloud, the instances deployed in a
        massively scalable OpenStack cloud do not necessarily use
        any specific aspect of the cloud offering (compute, network,
        or storage). As the cloud grows in scale, the sheer number of
        workloads can place stress on all of the cloud components.
        Additional stresses are introduced to supporting
        infrastructure such as databases and message brokers. The
        architecture design for such a cloud must account for these
        performance pressures without negatively impacting user
        experience.</para>
</section>
@ -0,0 +1,99 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="operational-considerations-massive-scale">
    <?dbhtml stop-chunking?>
    <title>Operational Considerations</title>
    <para>In order to run at massive scale, it is important to plan
        for the automation of as many of the operational processes as
        possible. Automation includes the configuration of
        provisioning, monitoring, and alerting systems. Part of the
        automation process includes the capability to determine when
        human intervention is required and who should act. The
        objective is to increase the ratio of running systems to
        operational staff as much as possible in order to reduce
        maintenance costs. In a massively scaled environment, it is
        impossible for staff to give each system individual
        care.</para>
    <para>Configuration management tools such as Puppet or Chef allow
        operations staff to categorize systems into groups based on
        their role and thus create configurations and system states
        that are enforced through the provisioning system. Systems
        that fall out of the defined state due to errors or failures
        are quickly removed from the pool of active nodes and
        replaced.</para>
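The group-by-role and replace-on-drift policy described above can be sketched in a few lines. This is a minimal illustration, not tied to Puppet or Chef; the node names, roles, and desired states are invented for demonstration.

```python
# Minimal sketch of role-based grouping and replace-on-drift.
# Node names, roles, and the DESIRED_STATE table are illustrative.
from collections import defaultdict

DESIRED_STATE = {"compute": "configured", "controller": "configured"}

def group_by_role(nodes):
    """Categorize systems into groups based on their role."""
    groups = defaultdict(list)
    for name, role, state in nodes:
        groups[role].append((name, state))
    return groups

def nodes_to_replace(nodes):
    """Nodes whose state drifted from the enforced configuration are
    removed from the active pool and queued for automated replacement."""
    return [name for name, role, state in nodes
            if state != DESIRED_STATE.get(role)]

inventory = [
    ("node-01", "compute", "configured"),
    ("node-02", "compute", "error"),
    ("ctl-01", "controller", "configured"),
]
print(nodes_to_replace(inventory))  # ['node-02']
```

A real deployment would drive the same decision from the configuration management tool's own reports rather than a static inventory list.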
    <para>At large scale the resource cost of diagnosing individual
        systems that have failed is far greater than the cost of
        replacement. It is more economical to immediately replace the
        system with a new system that can be provisioned and
        configured automatically and quickly brought back into the
        pool of active nodes. By automating tasks that are labor
        intensive, repetitive, and critical to operations, cloud
        operations teams can work more efficiently because fewer
        staff are needed for these routine tasks. Administrators are
        then free to tackle tasks that cannot be easily automated and
        that have longer-term impacts on the business, such as
        capacity planning.</para>
    <section xml:id="the-bleeding-edge"><title>The Bleeding Edge</title>
    <para>Running OpenStack at massive scale requires striking a
        balance between stability and features. For example, it might
        be tempting to run an older stable release branch of OpenStack
        to make deployments easier. However, when running at massive
        scale, known issues that may be of only minimal concern or
        impact in smaller deployments could become serious pain
        points. If an issue is well known, in many cases it may
        already be resolved in more recent releases. The OpenStack
        community can help resolve reported issues by applying the
        collective expertise of the OpenStack developers.</para>
    <para>When issues crop up, the number of organizations running at
        a similar scale is a relatively tiny proportion of the
        OpenStack community; therefore it is important to share these
        issues with the community and be a vocal advocate for
        resolving them. Some issues only manifest when operating at
        large scale, and the number of organizations able to duplicate
        and validate an issue is small, so it is important to
        document such issues and dedicate resources to their
        resolution.</para>
    <para>In some cases, the resolution to the problem is ultimately
        to deploy a more recent version of OpenStack. Alternatively,
        when the issue needs to be resolved in a production
        environment where rebuilding the entire environment is not an
        option, it is sometimes possible to deploy only the more
        recent underlying components required to resolve issues or
        gain significant performance improvements. At first glance,
        this could be perceived as exposing the deployment to
        increased risk and instability. However, in many cases the
        newer component resolves issues that simply have not yet been
        discovered in the older release.</para>
    <para>It is advisable to cultivate a development and operations
        organization that is responsible for creating desired
        features, diagnosing and resolving issues, and building the
        infrastructure for large-scale continuous integration tests
        and continuous deployment. This helps catch bugs early and
        makes deployments quicker and less painful. In addition to
        development resources, the recruitment of experts in the
        fields of message queues, databases, distributed systems,
        networking, cloud, and storage is also advisable.</para></section>
    <section xml:id="growth-and-capacity-planning"><title>Growth and Capacity Planning</title>
    <para>An important consideration in running at massive scale is
        projecting growth and utilization trends to plan capital
        expenditures for the near and long term. Utilization metrics
        for compute, network, and storage, as well as a historical
        record of these metrics, are required. While securing major
        anchor tenants can lead to rapid jumps in the utilization
        rates of all resources, the steady adoption of the cloud
        inside an organization, or by public consumers in a public
        offering, will also create a steady trend of increased
        utilization.</para></section>
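Projecting a utilization trend can be as simple as fitting a least-squares line through historical samples and extrapolating to the capacity limit. The monthly figures below are invented purely to illustrate the arithmetic.

```python
# Capacity-trend sketch: fit a line through monthly utilization samples
# and project when the resource reaches capacity. Sample data is invented.
def fit_line(samples):
    """Ordinary least-squares fit y = a + b*x over x = 0..n-1."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) \
        / sum((x - mean_x) ** 2 for x in xs)
    return mean_y - b * mean_x, b

def months_until_full(samples, capacity):
    a, b = fit_line(samples)
    if b <= 0:
        return None                      # flat or shrinking utilization
    x_full = (capacity - a) / b          # month index where the line hits capacity
    return x_full - (len(samples) - 1)   # months from the latest sample

usage = [40, 46, 52, 58, 64, 70]         # e.g. percent of compute capacity used
print(round(months_until_full(usage, 100), 1))  # 5.0
```

Real capacity planning would of course use richer models and per-resource data, but even this rough projection gives a lead time for hardware procurement.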
    <section xml:id="skills-and-training"><title>Skills and Training</title>
    <para>Projecting growth for storage, networking, and compute is
        only one aspect of a growth plan for running OpenStack at
        massive scale. Growing and nurturing development and
        operational staff is an additional consideration. Sending team
        members to OpenStack conferences and meetup events, and
        encouraging active participation in the mailing lists and
        committees, is a very important way to maintain skills and
        forge relationships in the community. A list of OpenStack
        training providers in the marketplace can be found here:
        http://www.openstack.org/marketplace/training/.</para>
    </section>
</section>
@ -0,0 +1,127 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="technical-considerations-massive-scale">
    <?dbhtml stop-chunking?>
    <title>Technical Considerations</title>
    <para>Converting an existing OpenStack environment that was
        designed for a different purpose to be massively scalable is a
        formidable task. When building a massively scalable
        environment from the ground up, make sure the initial
        deployment is built with the same principles and choices that
        apply as the environment grows. For example, a good approach
        is to deploy the first site as a multi-site environment. This
        allows the same deployment and segregation methods to be used
        as the environment grows to separate locations across
        dedicated links or wide area networks. In a hyperscale cloud,
        scale trumps redundancy. Applications must be modified with
        this in mind, relying on the scale and homogeneity of the
        environment to provide reliability rather than redundant
        infrastructure provided by non-commodity hardware
        solutions.</para>
    <section xml:id="infrastructure-segregation-massive-scale"><title>Infrastructure Segregation</title>
    <para>Fortunately, OpenStack services are designed to support
        massive horizontal scale. Be aware that this is not the case
        for the entire supporting infrastructure. This is particularly
        a problem for the database management systems and message
        queues used by the various OpenStack services for data storage
        and remote procedure call communications.</para>
    <para>Traditional clustering techniques are typically used to
        provide high availability and some additional scale for these
        environments. In the quest for massive scale, however,
        additional steps need to be taken to relieve the performance
        pressure on these components to prevent them from negatively
        impacting the overall performance of the environment. It is
        important to make sure that all the components are in balance
        so that, if and when the massively scalable environment fails,
        all the components are at, or close to, maximum
        capacity.</para>
    <para>Regions are used to segregate completely independent
        installations linked only by an Identity and Dashboard
        (optional) installation. Services are installed with separate
        API endpoints for each region, complete with separate database
        and queue installations. This exposes some awareness of the
        environment's fault domains to users and gives them the
        ability to ensure some degree of application resiliency, while
        also imposing the requirement to specify which region their
        actions must be applied to.</para>
    <para>Environments operating at massive scale typically need their
        regions or sites subdivided further without exposing the
        requirement to specify the failure domain to the user. This
        provides the ability to further divide the installation into
        failure domains while also providing a logical unit for
        maintenance and the addition of new hardware. At hyperscale,
        instead of adding single compute nodes, administrators may add
        entire racks or even groups of racks at a time, with each new
        addition of nodes exposed via one of the segregation concepts
        mentioned herein.</para>
    <para>Cells provide the ability to subdivide the compute portion
        of an OpenStack installation, including regions, while still
        exposing a single endpoint. In each region an API cell is
        created along with a number of compute cells where the
        workloads actually run. Each cell gets its own database and
        message queue setup (ideally clustered), providing the ability
        to subdivide the load on these subsystems, improving overall
        performance.</para>
    <para>Within each compute cell a complete compute installation is
        provided, complete with full database and queue installations,
        scheduler, conductor, and multiple compute hosts. The cells
        scheduler handles placement of user requests from the single
        API endpoint to a specific cell from those available. The
        normal filter scheduler then handles placement within the
        cell.</para>
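The two-level placement just described (a cell scheduler choosing a cell, then a filter scheduler choosing a host inside it) can be sketched as a toy model. The cell names, load figures, host capacities, and "least loaded" / "most free RAM" policies are illustrative assumptions, not Nova's actual data model.

```python
# Toy two-level placement mirroring the cells model. All names, numbers,
# and selection policies here are invented for illustration.
def pick_cell(cells):
    """Cell scheduler: route the request to the least-loaded cell."""
    return min(cells, key=lambda c: c["load"])

def pick_host(cell, ram_needed):
    """Filter scheduler inside the cell: filter on free RAM, weigh by most free."""
    candidates = [h for h in cell["hosts"] if h["free_ram"] >= ram_needed]
    if not candidates:
        raise RuntimeError("no valid host in cell %s" % cell["name"])
    return max(candidates, key=lambda h: h["free_ram"])

cells = [
    {"name": "cell-a", "load": 0.7,
     "hosts": [{"name": "a1", "free_ram": 8192}]},
    {"name": "cell-b", "load": 0.3,
     "hosts": [{"name": "b1", "free_ram": 2048},
               {"name": "b2", "free_ram": 16384}]},
]
cell = pick_cell(cells)
host = pick_host(cell, ram_needed=4096)
print(cell["name"], host["name"])  # cell-b b2
```

The key property the sketch captures is that the database and queue load of host selection stays inside each cell; only the coarse cell choice happens at the API level.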
    <para>The downside of using cells is that they are not well
        supported by any of the OpenStack services other than compute.
        Also, they do not adequately support some relatively standard
        OpenStack functionality such as security groups and host
        aggregates. Due to their relative newness and specialized use,
        they receive relatively little testing in the OpenStack gate.
        Despite these issues, however, cells are used in some very
        well known OpenStack installations operating at massive scale,
        including those at CERN and Rackspace.</para></section>
    <section xml:id="host-aggregates"><title>Host Aggregates</title>
    <para>Host aggregates enable partitioning of OpenStack Compute
        deployments into logical groups for load balancing and
        instance distribution. Host aggregates may also be used to
        further partition an availability zone. Consider a cloud which
        might use host aggregates to partition an availability zone
        into groups of hosts that either share common resources, such
        as storage and network, or have a special property, such as
        trusted computing hardware. Host aggregates are not explicitly
        user-targetable; instead they are implicitly targeted via the
        selection of instance flavors with extra specifications that
        map to host aggregate metadata.</para></section>
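The implicit targeting described above can be illustrated with a small matching function: a host qualifies only if its aggregate's metadata satisfies every extra specification on the requested flavor. The `ssd` key and the aggregate names below are hypothetical; only the match-every-extra-spec rule reflects the mechanism described in the text.

```python
# Sketch of flavor extra specs implicitly selecting host aggregates.
# The "ssd" key and aggregate names are invented for illustration.
def host_matches(flavor_extra_specs, aggregate_metadata):
    """True when every flavor extra spec is met by the aggregate metadata."""
    return all(aggregate_metadata.get(k) == v
               for k, v in flavor_extra_specs.items())

ssd_flavor = {"ssd": "true"}          # extra specs attached to a flavor
aggregates = {
    "fast-storage": {"ssd": "true"},  # aggregate metadata set by the operator
    "general": {},
}
eligible = [name for name, meta in aggregates.items()
            if host_matches(ssd_flavor, meta)]
print(eligible)  # ['fast-storage']
```

A flavor with no extra specs matches every aggregate, which is why ordinary instances remain schedulable everywhere.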
    <section xml:id="availability-zones"><title>Availability Zones</title>
    <para>Availability zones provide another mechanism for subdividing
        an installation or region. They are, in effect, host
        aggregates that are exposed for (optional) explicit targeting
        by users.</para>
    <para>Unlike cells, they do not have their own database server or
        queue broker but simply represent an arbitrary grouping of
        compute nodes. Typically, grouping of nodes into availability
        zones is based on a shared failure domain defined by a physical
        characteristic such as a shared power source, physical network
        connection, and so on. Availability zones are exposed to the
        user because they can be targeted; however, users are not
        required to target them. An alternate approach is for the
        operator to set a default availability zone so that instances
        are scheduled to a zone other than the default nova
        availability zone.</para></section>
    <section xml:id="segregation-example"><title>Segregation Example</title>
    <para>In this example the cloud is divided into two regions, one
        for each site, with two availability zones in each based on
        the power layout of the data centers. A number of host
        aggregates have also been defined to allow targeting, via
        flavors, of virtual machine instances that require special
        capabilities shared by the target hosts, such as SSDs, 10 GbE
        networks, or GPU cards.</para>
    <mediaobject>
        <imageobject>
            <imagedata
                fileref="../images/Massively_Scalable_Cells_+_regions_+_azs.png"
                />
        </imageobject>
    </mediaobject></section>
</section>
@ -0,0 +1,173 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="user-requirements-massive-scale-overview">
    <?dbhtml stop-chunking?>
    <title>User Requirements</title>
    <para>More so than in other scenarios, defining user requirements
        for a massively scalable OpenStack design architecture
        dictates approaching the design from two different, yet
        sometimes opposing, perspectives: the cloud user and the cloud
        operator. The expectations and perceptions of the consumption
        and management of resources of a massively scalable OpenStack
        cloud from the user point of view are distinctly different
        from those of the cloud operator.</para>
    <para>Many jurisdictions have legislative and regulatory
        requirements governing the storage and management of data in
        cloud environments. Common areas of regulation include:</para>
    <itemizedlist>
        <listitem>
            <para>Data retention policies ensuring storage of
                persistent data and records management to meet data
                archival requirements.</para>
        </listitem>
        <listitem>
            <para>Data ownership policies governing the possession and
                responsibility for data.</para>
        </listitem>
        <listitem>
            <para>Data sovereignty policies governing the storage of
                data in foreign countries or otherwise separate
                jurisdictions.</para>
        </listitem>
        <listitem>
            <para>Data compliance policies governing which types of
                information must reside in certain locations due to
                regulatory issues and, more importantly, cannot reside
                in other locations for the same reason.</para>
        </listitem>
    </itemizedlist>
    <para>Examples of such legal frameworks include the data
        protection framework of the European Union
        (http://ec.europa.eu/justice/data-protection/) and the
        requirements of the Financial Industry Regulatory Authority
        (http://www.finra.org/Industry/Regulation/FINRARules/) in the
        United States. Consult a local regulatory body for more
        information.</para>
    <section xml:id="user-requirements-massive-scale"><title>User Requirements</title>
    <para>Massively scalable OpenStack clouds have the following user
        requirements:</para>
    <itemizedlist>
        <listitem>
            <para>The cloud user expects repeatable, dependable, and
                deterministic processes for launching and deploying
                cloud resources. This could be delivered through a
                web-based interface or publicly available API
                endpoints. All appropriate options for requesting
                cloud resources need to be available through some type
                of user interface, a command-line interface (CLI), or
                API endpoints.</para>
        </listitem>
        <listitem>
            <para>Cloud users expect a fully self-service and
                on-demand consumption model. When an OpenStack cloud
                reaches the "massively scalable" size, it is
                expected to be consumed "as a service" in each and
                every way.</para>
        </listitem>
        <listitem>
            <para>A user of a massively scalable OpenStack public
                cloud has no expectation of control over security,
                performance, or availability. Users expect only SLAs
                related to the uptime of API services, and very basic
                SLAs for the services offered. Users understand it is
                their responsibility to address these issues on their
                own. The exception to this expectation is the rare
                case of a massively scalable cloud infrastructure
                built for a private or government organization that
                has specific requirements.</para>
        </listitem>
    </itemizedlist>
    <para>As might be expected, the cloud user requirements or
        expectations that determine the design are all focused on the
        consumption model. The user expects to be able to easily
        consume cloud resources in an automated and deterministic way,
        without any need for knowledge of the capacity, scalability,
        or other attributes of the cloud's underlying
        infrastructure.</para></section>
    <section xml:id="operator-requirements-massive-scale"><title>Operator Requirements</title>
    <para>Whereas the cloud user should be completely unaware of the
        underlying infrastructure of the cloud and its attributes, the
        operator must be able to build and support the infrastructure
        and understand how it needs to operate at scale. This presents
        a very demanding set of requirements for building such a cloud
        from the operator's perspective:</para>
    <itemizedlist>
        <listitem>
            <para>First and foremost, everything must be capable of
                automation. From the deployment of new hardware,
                whether compute, storage, or networking hardware, to
                the installation and configuration of the supporting
                software, everything must be capable of being
                automated. Manual processes will not suffice in a
                massively scalable OpenStack design
                architecture.</para>
        </listitem>
        <listitem>
            <para>The cloud operator requires that capital expenditure
                (CapEx) is minimized at all layers of the stack.
                Operators of massively scalable OpenStack clouds
                require the use of dependable commodity hardware and
                freely available open source software components to
                reduce deployment costs and operational expenses.
                Initiatives like OpenCompute (more information
                available at http://www.opencompute.org) provide
                additional information and pointers. To cut costs,
                many operators sacrifice redundancy, for example,
                redundant power supplies, network connections, and
                rack switches.</para>
        </listitem>
        <listitem>
            <para>Companies operating a massively scalable OpenStack
                cloud also require that operational expenditures
                (OpEx) be minimized as much as possible.
                Cloud-optimized hardware is a good approach when
                managing operational overhead. Some of the factors
                that need to be considered include power, cooling, and
                the physical design of the chassis. Because of the
                scale of these implementations, it is possible to
                customize the hardware and systems so they are
                optimized for this type of workload.</para>
        </listitem>
        <listitem>
            <para>Massively scalable OpenStack clouds require
                extensive metering and monitoring functionality to
                maximize operational efficiency by keeping the
                operator informed about the status and state of the
                infrastructure. This includes full-scale metering of
                the hardware and software status. A corresponding
                framework of logging and alerting is also required to
                store and allow operations staff to act upon the
                metrics provided by the metering and monitoring
                solutions. The cloud operator also needs a solution
                that uses the data provided by the metering and
                monitoring solution to provide capacity planning and
                capacity trending analysis.</para>
        </listitem>
        <listitem>
            <para>A massively scalable OpenStack cloud will be a
                multi-site cloud. Therefore, the user-operator
                requirements for a multi-site OpenStack architecture
                design are also applicable here. These include various
                legal requirements for data storage, data placement,
                and data retention; other jurisdictional legal or
                compliance requirements; image consistency and
                availability; storage replication and availability
                (for both block and file/object storage); and
                authentication, authorization, and auditing (AAA),
                just to name a few. Refer to the "Multi-Site" section
                for more details on requirements and considerations
                for multi-site OpenStack clouds.</para>
        </listitem>
        <listitem>
            <para>Considerations around physical facilities such as
                space, floor weight, rack height and type,
                environmental considerations, power usage and power
                usage effectiveness (PUE), and physical security must
                also be addressed by the design architecture of a
                massively scalable OpenStack cloud.</para>
        </listitem>
    </itemizedlist></section>
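The metering-and-alerting loop implied above reduces to checking collected metrics against operator-defined thresholds and surfacing only the breaches. The metric names and limits below are invented for illustration; a production system would sit on top of the deployment's actual metering solution.

```python
# Minimal metering/alerting sketch: metrics are compared against
# thresholds and only breaches become alerts. Names and limits are
# illustrative, not tied to any real monitoring product.
THRESHOLDS = {"cpu_util": 90.0, "disk_used_pct": 85.0, "queue_depth": 1000}

def evaluate(metrics):
    """Return (metric, value, limit) for every threshold breach."""
    return [(name, value, THRESHOLDS[name])
            for name, value in metrics.items()
            if name in THRESHOLDS and value > THRESHOLDS[name]]

sample = {"cpu_util": 97.2, "disk_used_pct": 60.0, "queue_depth": 1500}
for name, value, limit in evaluate(sample):
    print("ALERT: %s=%s exceeds %s" % (name, value, limit))
```

Retaining the raw samples alongside the alerts is what makes the capacity trending analysis mentioned above possible later.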
</section>
118
doc/arch-design/multi_site/section_architecture_multi_site.xml
Normal file
@ -0,0 +1,118 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
    xml:id="arch-design-architecture-multiple-site">
    <?dbhtml stop-chunking?>
    <title>Architecture</title>
    <para>This graphic is a high-level diagram of a multiple-site
        OpenStack architecture. Each site is an OpenStack cloud, but
        it may be necessary to architect the sites on different
        versions. For example, if the second site is intended to be a
        replacement for the first site, the versions would be
        different. Another common design is a private OpenStack cloud
        with a replicated site used for high availability or disaster
        recovery. The most important design decision is how to
        configure the storage. It can be configured as a single shared
        pool or as separate pools, depending on the user and technical
        requirements.</para>
    <mediaobject>
        <imageobject>
            <imagedata
                fileref="../images/Multi-Site_shared_keystone_horizon_swift1.png"/>
        </imageobject>
    </mediaobject>
    <section xml:id="openstack-services-architecture">
        <title>OpenStack Services Architecture</title>
        <para>The OpenStack Identity service, which is used by all other
            OpenStack components for authorization and the catalog of
            service endpoints, supports the concept of regions. A region
            is a logical construct that can be used to group OpenStack
            services that are in close proximity to one another. The
            concept of regions is flexible; a region may contain
            OpenStack service endpoints located within one or more
            distinct geographic regions. It may also be smaller in
            scope, where a region is a single rack within a data center
            or even a single blade chassis, with multiple regions
            existing in adjacent racks in the same data center.</para>
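A region-scoped service catalog can be modeled as a nested mapping from service to per-region endpoints, with a shared Identity deployment serving all regions. The URLs and region names below are invented; only the shape of the lookup mirrors the catalog concept described above.

```python
# Toy model of a region-scoped service catalog. URLs and region names
# are invented; the per-region endpoint lookup is the point.
CATALOG = {
    "compute": {"region-one": "https://r1.example.com:8774/v2",
                "region-two": "https://r2.example.com:8774/v2"},
    "identity": {"region-one": "https://ks.example.com:5000/v2.0",
                 "region-two": "https://ks.example.com:5000/v2.0"},
}

def endpoint_for(service, region):
    """Resolve the endpoint a client in `region` should use."""
    try:
        return CATALOG[service][region]
    except KeyError:
        raise LookupError("%s not registered in %s" % (service, region))

print(endpoint_for("compute", "region-two"))  # https://r2.example.com:8774/v2
```

Note that the identity entries point at the same shared deployment in both regions, which is what allows Identity (and optionally the Dashboard) to be deployed centrally.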
        <para>The majority of OpenStack components are designed to run
            within the context of a single region. The OpenStack
            Compute service is designed to manage compute resources
            within a region, with support for subdivisions of compute
            resources by using availability zones and cells. The
            OpenStack Networking service can be used to manage network
            resources in the same broadcast domain or collection of
            switches that are linked. The OpenStack Block Storage
            service controls storage resources within a region, with
            all storage resources residing on the same storage network.
            Like the OpenStack Compute service, the OpenStack Block
            Storage service also supports the availability zone
            construct, which can be used to subdivide storage
            resources.</para>
        <para>The OpenStack Dashboard, OpenStack Identity service, and
            OpenStack Object Storage services are components that can
            each be deployed centrally in order to serve multiple
            regions.</para>
    </section>
    <section xml:id="arch-multi-storage">
        <title>Storage</title>
        <para>With multiple OpenStack regions, having a single OpenStack
            Object Storage service endpoint that delivers shared object
            storage for all regions is desirable. The Object Storage
            service internally replicates files to multiple nodes. The
            advantage of this is that, because a file placed into the
            Object Storage service is visible to all regions, it can be
            used by applications or workloads in any or all of the
            regions. This simplifies high availability failover and
            disaster recovery rollback.</para>
        <para>In order to scale the Object Storage service to meet the
            workload of multiple regions, multiple proxy workers are
            run and load-balanced, storage nodes are installed in each
            region, and the entire Object Storage service can be
            fronted by an HTTP caching layer. This is done so client
            requests for objects can be served out of caches rather
            than directly from the storage modules themselves, reducing
            the actual load on the storage network. In addition to an
            HTTP caching layer, use a caching layer like memcached to
            cache objects between the proxy and storage nodes.</para>
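The caching layer described above follows the cache-aside pattern: serve an object from cache when possible, and fall back to the storage nodes on a miss, populating the cache for later reads. The backend contents below are invented; the class simply stands in for a proxy with a memcached-style cache in front of storage.

```python
# Cache-aside sketch for a caching layer between proxy and storage
# nodes. The backing-store contents are invented for illustration.
class CachingProxy:
    def __init__(self, backend):
        self.backend = backend   # stands in for the storage nodes
        self.cache = {}          # stands in for memcached / HTTP cache
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backend[key]   # fetch from storage on a miss
        self.cache[key] = value     # populate the cache for later reads
        return value

proxy = CachingProxy({"image-1234": b"...bytes..."})
proxy.get("image-1234")   # miss: read from storage, fill the cache
proxy.get("image-1234")   # hit: served from the cache
print(proxy.hits, proxy.misses)  # 1 1
```

Every repeat read served from the cache is a request the storage network never sees, which is the load reduction the paragraph describes.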
        <para>If the cloud is designed without a single Object Storage
            service endpoint for multiple regions, and instead a
            separate Object Storage service endpoint is made available
            in each region, applications are required to handle
            synchronization (if desired) and other management
            operations to ensure consistency across the nodes. For some
            applications, having multiple Object Storage service
            endpoints located in the same region as the application may
            be desirable due to reduced latency, lower cross-region
            bandwidth usage, and ease of deployment.</para>
        <para>For the Block Storage service, the most important
            decisions are the selection of the storage technology and
            whether or not a dedicated network is used to carry storage
            traffic from the storage service to the compute
            nodes.</para>
    </section>
    <section xml:id="arch-networking-multiple">
        <title>Networking</title>
        <para>When connecting multiple regions together there are
            several design considerations. The overlay network
            technology choice determines how packets are transmitted
            between regions and how the logical network and addresses
            are presented to the application. If there are security or
            regulatory requirements, encryption should be implemented
            to secure the traffic between regions. For networking
            inside a region, the overlay network technology for tenant
            networks is equally important. The overlay technology and
            the network traffic an application generates or receives
            can be either complementary or at cross purposes. For
            example, using an overlay technology for an application
            that transmits a large amount of small packets could add
            excessive latency or overhead to each packet if not
            configured properly.</para>
    </section>
    <section xml:id="arch-dependencies-multiple">
        <title>Dependencies</title>
        <para>The architecture for a multi-site installation of
            OpenStack is dependent on a number of factors. One major
            dependency to consider is storage. When designing the
            storage system, the storage mechanism needs to be
            determined. Once the storage type is determined, how it
            will be accessed is critical. For example, it is
            recommended that storage utilize a dedicated network.
            Another concern is how the storage is configured to protect
            the data, expressed as the recovery point objective (RPO)
            and the recovery time objective (RTO). How much data loss
            can be tolerated determines how often the replication of
            data is required. Ensure that enough storage is allocated
            to support the data protection strategy.</para>
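The relationship between the RPO and replication frequency mentioned above reduces to simple arithmetic: to lose at most a given window of data, replication must complete at least that often. The RPO values below are examples only.

```python
# Back-of-the-envelope sketch relating RPO to replication frequency.
# The RPO figures are illustrative examples.
import math

def replications_per_day(rpo_minutes):
    """Minimum replication runs per day needed to honor the RPO."""
    return math.ceil(24 * 60 / rpo_minutes)

print(replications_per_day(60))   # hourly replication -> 24 runs/day
print(replications_per_day(15))   # 15-minute RPO -> 96 runs/day
```

Tighter RPOs multiply both replication traffic and the storage needed to hold in-flight copies, which is why the text stresses allocating enough storage for the data protection strategy.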
        <para>Networking decisions include the encapsulation mechanism
            that will be used for the tenant networks, how large the
            broadcast domains should be, and the contracted SLAs for
            the interconnects.</para>
    </section>
</section>
@ -0,0 +1,33 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="arch-guide-intro-multi">
    <title>Introduction</title>
    <para>A multi-site OpenStack environment is one in which services
        located in more than one data center are used to provide the
        overall solution. Usage requirements of different multi-site
        clouds may vary widely; however, they share some common needs.
        OpenStack is capable of running in a multi-region
        configuration, allowing some parts of OpenStack to effectively
        manage a group of sites as a single cloud. With some careful
        planning in the design phase, OpenStack can act as an
        excellent multi-site cloud solution for a multitude of
        needs.</para>
<para>Some use cases that might indicate a need for a multi-site
|
||||
deployment of OpenStack include:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>An organization with a diverse geographic
|
||||
footprint.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Geo-location sensitive data.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Data locality, in which specific data or
|
||||
functionality should be close to users.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
</section>
|
@ -0,0 +1,178 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="operational-considerations-multi-site">
    <?dbhtml stop-chunking?>
    <title>Operational Considerations</title>
    <para>Deployment of a multi-site OpenStack cloud using regions
        requires that the service catalog contain per-region entries for
        each deployed service other than the Identity service itself.
        Currently available off-the-shelf OpenStack deployment tools
        provide only limited support for defining multiple regions in
        this fashion.</para>
    <para>Deployers must be aware of this and provide the appropriate
        customization of the service catalog for their site, either
        manually or by customizing the deployment tools in use.</para>
    <para>Note that, as of the Icehouse release, documentation for
        implementing this feature is in progress. See this bug for more
        information:
        https://bugs.launchpad.net/openstack-manuals/+bug/1340509</para>
    <section xml:id="licensing"><title>Licensing</title>
        <para>Multi-site OpenStack deployments present additional
            licensing considerations over and above regular OpenStack
            clouds, particularly where site licenses are in use to
            provide cost-efficient access to software licenses. The
            licensing for host operating systems, guest operating
            systems, OpenStack distributions (if applicable),
            software-defined infrastructure including network
            controllers and storage systems, and even individual
            applications needs to be evaluated in light of the
            multi-site nature of the cloud.</para>
        <para>Topics to consider include:</para>
        <itemizedlist>
            <listitem>
                <para>The specific definition of what constitutes a
                    site in the relevant licenses, as the term does not
                    necessarily denote a geographic or otherwise
                    physically isolated location in the traditional
                    sense.</para>
            </listitem>
            <listitem>
                <para>Differentiations between "hot" (active) and
                    "cold" (inactive) sites, where significant savings
                    may be made in situations where one site is a cold
                    standby for disaster recovery purposes only.</para>
            </listitem>
            <listitem>
                <para>The requirement in certain locations for local
                    vendors to provide support and services for each
                    site, which can present challenges depending on the
                    licensing agreement in place.</para>
            </listitem>
        </itemizedlist></section>
    <section xml:id="logging-and-monitoring-multi-site"><title>Logging and Monitoring</title>
        <para>Logging and monitoring do not significantly differ for a
            multi-site OpenStack cloud. The same well-known tools
            described in the Operations Guide
            (http://docs.openstack.org/openstack-ops/content/logging_monitoring.html)
            remain applicable. Logging and monitoring can be provided
            both on a per-site basis and in a common centralized
            location.</para>
        <para>When attempting to deploy logging and monitoring
            facilities to a centralized location, care must be taken
            with regard to the load placed on the inter-site networking
            links.</para></section>
    <section xml:id="upgrades-multi-site"><title>Upgrades</title>
        <para>In multi-site OpenStack clouds deployed using regions,
            each site is effectively an independent OpenStack
            installation that is linked to the others by using
            centralized services, such as Identity, which are shared
            between sites. At a high level, the recommended order of
            operations to upgrade an individual OpenStack environment
            is
            (http://docs.openstack.org/openstack-ops/content/ops_upgrades-general-steps.html):</para>
        <orderedlist>
            <listitem>
                <para>Upgrade the OpenStack Identity Service
                    (Keystone).</para>
            </listitem>
            <listitem>
                <para>Upgrade the OpenStack Image Service
                    (Glance).</para>
            </listitem>
            <listitem>
                <para>Upgrade OpenStack Compute (Nova), including
                    networking components.</para>
            </listitem>
            <listitem>
                <para>Upgrade OpenStack Block Storage (Cinder).</para>
            </listitem>
            <listitem>
                <para>Upgrade the OpenStack dashboard
                    (Horizon).</para>
            </listitem>
        </orderedlist>
        <para>The process for upgrading a multi-site environment is not
            significantly different:</para>
        <orderedlist>
            <listitem>
                <para>Upgrade the shared OpenStack Identity Service
                    (Keystone) deployment.</para>
            </listitem>
            <listitem>
                <para>Upgrade the OpenStack Image Service (Glance) at
                    each site.</para>
            </listitem>
            <listitem>
                <para>Upgrade OpenStack Compute (Nova), including
                    networking components, at each site.</para>
            </listitem>
            <listitem>
                <para>Upgrade OpenStack Block Storage (Cinder) at each
                    site.</para>
            </listitem>
            <listitem>
                <para>Upgrade the OpenStack dashboard (Horizon), at
                    each site, or in the single central location if it
                    is shared.</para>
            </listitem>
        </orderedlist>
        <para>Note that, as of the OpenStack Icehouse release, compute
            upgrades within each site can also be performed in a
            rolling fashion. Compute controller services (API,
            Scheduler, and Conductor) can be upgraded prior to
            upgrading the individual compute nodes. This maximizes the
            ability of operations staff to keep a site operational for
            users of compute services while performing an
            upgrade.</para></section>
    <section xml:id="quota-management-multi-site"><title>Quota Management</title>
        <para>To prevent system capacities from being exhausted without
            notification, OpenStack provides operators with the ability
            to define quotas. Quotas are used to set operational limits
            and are currently enforced at the tenant (or project) level
            rather than at the user level.</para>
        <para>Quotas are defined on a per-region basis. Operators may
            wish to define identical quotas for tenants in each region
            of the cloud to provide a consistent experience, or even
            create a process for synchronizing allocated quotas across
            regions. It is important to note that only the operational
            limits imposed by the quotas will be aligned; consumption
            of quotas by users will not be reflected between
            regions.</para>
        <para>For example, given a cloud with two regions, if the
            operator grants a user a quota of 25 instances in each
            region then that user may launch a total of 50 instances
            spread across both regions. They may not, however, launch
            more than 25 instances in any single region.</para>
        <para>For more information on managing quotas refer to Chapter
            9, Managing Projects and Users
            (http://docs.openstack.org/openstack-ops/content/projects_users.html),
            of the OpenStack Operations Guide.</para></section>
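<para>The per-region quota arithmetic described above can be sketched
    in a few lines. This is an illustrative sketch only: the region
    names and quota values are assumptions, and a real process for
    synchronizing quotas would read the limits from each region's API
    rather than from a literal dictionary.</para>

```python
# Illustrative sketch of per-region quota arithmetic. Region names and
# quota values here are assumptions, not taken from a real deployment.
def total_capacity(quota_by_region):
    """Instances a tenant may launch in total, across all regions."""
    return sum(quota_by_region.values())

def quotas_aligned(quota_by_region):
    """True when every region grants the same per-region limit."""
    return len(set(quota_by_region.values())) <= 1

quotas = {"region-one": 25, "region-two": 25}
print(total_capacity(quotas))   # 50 in total, but at most 25 per region
print(quotas_aligned(quotas))   # True
```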
    <section xml:id="policy-management-multi-site"><title>Policy Management</title>
        <para>OpenStack provides a default set of Role Based Access
            Control (RBAC) policies, defined in a
            <filename>policy.json</filename> file, for each service.
            Operators edit these files to customize the policies for
            their OpenStack installation. If the application of
            consistent RBAC policies across sites is a requirement,
            then it is necessary to ensure proper synchronization of
            the <filename>policy.json</filename> files to all
            installations.</para>
        <para>This must be done using normal system administration
            tools such as rsync, as no functionality for synchronizing
            policies across regions is currently provided within
            OpenStack.</para></section>
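<para>Before synchronizing the files with a tool such as rsync, it can
    be useful to check whether two sites have drifted apart. A minimal
    sketch, assuming the policy files are small enough to compare in
    memory; the rules shown are illustrative, not a real policy
    set.</para>

```python
import json

def policies_match(policy_a, policy_b):
    """Compare two policy.json documents structurally, so that
    whitespace and key-order differences are ignored."""
    return json.loads(policy_a) == json.loads(policy_b)

# Same rules, different formatting: still a match.
site_a = '{"compute:create": "rule:admin_or_owner"}'
site_b = '{"compute:create":"rule:admin_or_owner"}'
print(policies_match(site_a, site_b))  # True
```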
    <section xml:id="documentation-multi-site"><title>Documentation</title>
        <para>Users must be able to leverage the cloud infrastructure
            and provision new resources in the environment. It is
            important that user documentation is accessible to users of
            the cloud infrastructure to ensure they are given
            sufficient information to help them leverage the cloud. As
            an example, by default OpenStack schedules instances on a
            compute node automatically. However, when multiple regions
            are available, it is left to the end user to decide in
            which region to schedule the new instance. Horizon presents
            the user with the first region in your configuration. The
            API and CLI tools will not execute commands unless a valid
            region is specified. It is therefore important to provide
            documentation to your users describing the region layout,
            as well as calling out that quotas are region-specific. If
            a user reaches his or her quota in one region, OpenStack
            will not automatically build new instances in another.
            Documenting specific examples will help users understand
            how to operate the cloud, thereby reducing calls and
            tickets filed with the help desk.</para></section>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="prescriptive-example-multisite">
    <?dbhtml stop-chunking?>
    <title>Prescriptive Examples</title>
    <para>Based on the needs of the intended workloads, there are
        multiple ways to build a multi-site OpenStack installation.
        Below are example architectures based on different
        requirements. These examples are meant as a reference, not as
        hard-and-fast rules for deployments. Use the previous sections
        of this chapter to assist in selecting specific components and
        implementations based on specific needs.</para>
    <para>A large content provider needs to deliver content to
        customers that are geographically dispersed. The workload is
        very sensitive to latency and needs a rapid response to
        end-users. After reviewing the user, technical, and operational
        considerations, it is determined beneficial to build a number
        of regions local to the customers' edge. Rather than build a
        few large, centralized data centers, the intent of the
        architecture is to provide a pair of small data centers in
        locations that are closer to the customer. In this use case,
        spreading applications out allows for different horizontal
        scaling than a traditional compute workload scale. The intent
        is to scale by creating more copies of the application in
        closer proximity to the users that need it most, in order to
        ensure faster response time to user requests. This provider
        will deploy two data centers at each of the four chosen
        regions. The implications of this design are based around the
        method of placing copies of resources in each of the remote
        regions. Swift objects, Glance images, and block storage will
        need to be manually replicated into each region. This may be
        beneficial for some systems, for instance a content service
        where only some of the content needs to exist in some but not
        all regions. A centralized Keystone is recommended to ensure
        that authentication and access to the API endpoints are easily
        manageable.</para>
    <para>Installation of an automated DNS system such as Designate is
        highly recommended. Unless an external Dynamic DNS system is
        available, application administrators will need a way to manage
        the mapping of which application copy exists in each region and
        how to reach it. Designate will assist by making the process
        automatic and by populating the records in each region's
        zone.</para>
    <para>Telemetry for each region is also deployed, as each region
        may grow differently or be used at a different rate. Ceilometer
        will run to collect each region's metrics from each of the
        controllers and report them back to a central location. This is
        useful both to the end user and to the administrator of the
        OpenStack environment. The end user will find this method
        useful, in that it is possible to determine if certain
        locations are experiencing higher load than others, and take
        appropriate action. Administrators will also benefit by
        possibly being able to forecast growth per region, rather than
        expanding the capacity of all regions simultaneously, therefore
        maximizing the cost-effectiveness of the multi-site
        design.</para>
    <para>One of the key decisions in running this sort of
        infrastructure is whether or not to provide a redundancy model.
        Two types of redundancy and high availability models will be
        implemented in this configuration. The first type revolves
        around the availability of the central OpenStack components.
        Keystone will be made highly available in three central data
        centers that will host the centralized OpenStack components.
        This prevents the loss of any one of the regions from causing
        an outage in service. It also has the added benefit of being
        able to run a central storage repository as a primary cache for
        distributing content to each of the regions.</para>
    <para>The second redundancy topic is that of the edge data center
        itself. A second data center in each of the edge regional
        locations will house a second region near the first. This
        ensures that the application will not suffer degraded
        performance in terms of latency and availability.</para>
    <para>This figure depicts the solution designed to have both a
        centralized set of core data centers for OpenStack services and
        paired edge data centers:</para>
    <mediaobject>
        <imageobject>
            <imagedata
                fileref="../images/Multi-Site_Customer_Edge.png"/>
        </imageobject>
    </mediaobject>
    <section xml:id="geo-redundant-load-balancing"><title>Geo-redundant load balancing</title>
        <para>A large-scale web application has been designed with
            cloud principles in mind. The application is designed to
            provide service to an application store on a 24/7 basis.
            The company has a typical two-tier architecture, with a web
            front end servicing the customer requests and a NoSQL
            database back end storing the information.</para>
        <para>Recently there have been several outages at a number of
            major public cloud providers, usually because the affected
            applications were running out of a single geographical
            location. The design therefore should mitigate the chance
            of a single site causing an outage for the business.</para>
        <para>The solution would consist of the following OpenStack
            components:</para>
        <itemizedlist>
            <listitem>
                <para>A firewall, switches, and load balancers on the
                    public-facing network connections.</para>
            </listitem>
            <listitem>
                <para>OpenStack controller services running Networking,
                    Horizon, Cinder, and Nova compute locally in each
                    of the three regions. The other services, Keystone,
                    Heat, Ceilometer, Glance, and Swift, will be
                    installed centrally, with nodes in each of the
                    regions providing a redundant OpenStack controller
                    plane throughout the globe.</para>
            </listitem>
            <listitem>
                <para>OpenStack Compute nodes running the KVM
                    hypervisor.</para>
            </listitem>
            <listitem>
                <para>OpenStack Object Storage for serving static
                    objects such as images. It will be used to ensure
                    that all images are standardized across all the
                    regions, and replicated on a regular basis.</para>
            </listitem>
            <listitem>
                <para>A distributed DNS service available to all
                    regions that allows for dynamic update of DNS
                    records of deployed instances.</para>
            </listitem>
            <listitem>
                <para>A geo-redundant load balancing service that
                    services the requests from the customers based on
                    their origin.</para>
            </listitem>
        </itemizedlist>
        <para>An autoscaling Heat template will be used to deploy the
            application in the three regions. This template will
            include:</para>
        <itemizedlist>
            <listitem>
                <para>Web servers, running Apache.</para>
            </listitem>
            <listitem>
                <para>Appropriate user_data to populate the central DNS
                    servers upon instance launch.</para>
            </listitem>
            <listitem>
                <para>Appropriate Ceilometer alarms that maintain the
                    state of the application and allow for handling of
                    region or instance failure.</para>
            </listitem>
        </itemizedlist>
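<para>As an illustration only, the skeleton of such a template might
    look like the fragment below. The resource name, image, and flavor
    are assumptions, not part of the design described here; a real
    template would also carry the user_data and the Ceilometer alarms
    listed above.</para>

```yaml
heat_template_version: 2013-05-23
description: Illustrative autoscaling group of Apache web servers
resources:
  web_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 10
      resource:
        type: OS::Nova::Server
        properties:
          image: apache-web-image    # assumption: a prebuilt web image
          flavor: m1.small
```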
        <para>Another autoscaling Heat template will be used to deploy
            a distributed MongoDB shard over the three locations, with
            the option of storing required data on a globally available
            Swift container. According to the usage and load on the
            database server, additional shards will be provisioned
            according to the thresholds defined in Ceilometer.</para>
        <para>Three regions were selected because of the concern about
            abnormal load landing on a single region in the event of a
            failure. Two data centers would have been sufficient had
            the requirements been met.</para>
        <para>Heat is used because of its built-in functionality for
            autoscaling and auto-healing in the event of increased
            load. Additional configuration management tools, such as
            Puppet or Chef, could also have been used in this scenario,
            but were not chosen because Heat had the appropriate
            built-in hooks into the OpenStack cloud, whereas the other
            tools were external and not native to OpenStack. In
            addition, since this deployment scenario was relatively
            straightforward, the external tools were not needed.</para>
        <para>Swift is used here to serve as a back end for Glance and
            Object Storage since it was the most suitable solution for
            globally distributed storage, with its own replication
            mechanism. Home-grown solutions could also have been used,
            including the handling of replication, but were not chosen
            because Swift is already an integral part of the
            infrastructure and a proven solution.</para>
        <para>An external load balancing service was used instead of
            the LBaaS in OpenStack because the solution in OpenStack is
            not redundant and does not have any awareness of
            geolocation.</para>
        <mediaobject>
            <imageobject>
                <imagedata
                    fileref="../images/Multi-site_Geo_Redundant_LB.png"/>
            </imageobject>
        </mediaobject></section>
    <section xml:id="location-local-services"><title>Location-local service</title>
        <para>A common use for a multi-site deployment of OpenStack is
            the creation of a Content Delivery Network. An application
            that uses a location-local architecture requires low
            network latency and proximity to the user in order to
            provide an optimal user experience, in addition to reducing
            the cost of bandwidth and transit, since the content
            resides on sites closer to the customer instead of a
            centralized content store that would require utilizing
            higher-cost cross-country links.</para>
        <para>This architecture usually includes a geo-location
            component that places user requests at the closest possible
            node. In this scenario, 100% redundancy of content across
            every site is a goal rather than a requirement, with the
            intent being to maximize the amount of content available
            within a minimum number of network hops for any given end
            user. Despite these differences, the storage replication
            configuration has significant overlap with that of a
            geo-redundant load balancing use case.</para>
        <para>In this example, the location-aware application utilizing
            this multi-site OpenStack installation would launch web
            server or content-serving instances on the compute cluster
            in each site. Requests from clients are first sent to a
            global services load balancer that determines the location
            of the client, then routes the request to the closest
            OpenStack site, where the application completes the
            request.</para>
        <mediaobject>
            <imageobject>
                <imagedata
                    fileref="../images/Multi-Site_shared_keystone1.png"/>
            </imageobject>
        </mediaobject></section>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="technical-considerations-multi-site">
    <?dbhtml stop-chunking?>
    <title>Technical Considerations</title>
    <para>There are many technical considerations to take into account
        when designing a multi-site OpenStack implementation. An
        OpenStack cloud can be designed in a variety of ways to handle
        individual application needs. A multi-site deployment has
        additional challenges compared to single-site installations and
        is therefore a more complex solution.</para>
    <para>When determining capacity options, be sure to take into
        account not just the technical issues, but also the economic or
        operational issues that might arise from specific
        decisions.</para>
    <para>Inter-site link capacity describes the capabilities of the
        connectivity between the different OpenStack sites. This
        includes parameters such as bandwidth, latency, whether or not
        a link is dedicated, and any business policies applied to the
        connection. The capability and number of the links between
        sites will determine what kind of options are available for
        deployment. For example, if two sites have a pair of
        high-bandwidth links available between them, it may be wise to
        configure a separate storage replication network between the
        two sites to support a single Swift endpoint and a shared
        object storage capability between them. (An example of this
        technique, as well as a configuration walk-through, is
        available at
        http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network.)
        Another option in this scenario is to build a dedicated set of
        tenant private networks across the secondary link, using
        overlay networks with a third party mapping the site overlays
        to each other.</para>
    <para>The capacity requirements of the links between sites will be
        driven by application behavior. If the latency of the links is
        too high, certain applications that use a large number of
        small packets, for example RPC calls, may encounter issues
        communicating with each other or operating properly.
        Additionally, OpenStack may encounter similar types of issues.
        To mitigate this, tuning of the Keystone call timeouts may be
        necessary to prevent issues authenticating against a central
        Identity service.</para>
    <para>Another capacity consideration when it comes to networking
        for a multi-site deployment is the available amount and
        performance of overlay networks for tenant networks. If using
        shared tenant networks across zones, it is imperative that an
        external overlay manager or controller be used to map these
        overlays together. It is also necessary to ensure that the
        number of possible IDs is identical between the zones. Note
        that, as of the Icehouse release, Neutron was not capable of
        managing tunnel IDs across installations. This means that if
        one site runs out of IDs but the other does not, that tenant's
        network will be unable to reach the other site.</para>
    <para>Capacity can take other forms as well. The ability for a
        region to grow depends on scaling out the number of available
        compute nodes. This topic is covered in greater detail in the
        section for compute-focused deployments. However, it should be
        noted that cells may be necessary to grow an individual region
        beyond a certain point. This point depends on the size of your
        cluster and the ratio of virtual machines per
        hypervisor.</para>
    <para>A third form of capacity comes in the multi-region-capable
        components of OpenStack. Centralized Object Storage is capable
        of serving objects through a single namespace across multiple
        regions. Since this works by accessing the object store via the
        Swift proxy, it is possible to overload the proxies. There are
        two options available to mitigate this issue. The first is to
        deploy a large number of Swift proxies. The drawback is that
        the proxies are not load-balanced and a large file request
        could continually hit the same proxy. The other way to mitigate
        this is to front the proxies with a caching HTTP proxy and
        load balancer. Since Swift objects are returned to the
        requester via HTTP, this load balancer would alleviate the load
        placed on the Swift proxies.</para>
    <section xml:id="utilization-multi-site"><title>Utilization</title>
        <para>While constructing a multi-site OpenStack environment is
            the goal of this guide, the real test is whether an
            application can utilize it.</para>
        <para>Identity is normally the first interface for the majority
            of OpenStack users. Interacting with Keystone is required
            for almost all major operations within OpenStack.
            Therefore, it is important to ensure that you provide users
            with a single URL for Keystone authentication. Equally
            important is proper documentation and configuration of
            regions within Keystone. Each of the sites defined in your
            installation is considered to be a region in Keystone
            nomenclature. This is important for the users of the
            system, when reading Keystone documentation, as it is
            required to define the region name when directing actions
            to an API endpoint or in Horizon.</para>
        <para>Load balancing is another common issue with multi-site
            installations. While it is still possible to run HAProxy
            instances with Load Balancing as a Service, these will be
            local to a specific region. Some applications may be able
            to cope with this via internal mechanisms. Others, however,
            may require the implementation of an external system,
            including global services load balancers or
            anycast-advertised DNS.</para>
        <para>Depending on the storage model chosen during site design,
            storage replication and availability will also be a concern
            for end-users. If an application is capable of
            understanding regions, then it is possible to keep the
            object storage system separated by region. In this case,
            users who want to have an object available to more than one
            region will need to perform the cross-site replication
            themselves. With a centralized Swift proxy, however, the
            user may need to benchmark the replication timing of the
            Swift back end. Benchmarking allows the operational staff
            to provide users with an understanding of the amount of
            time required for a stored or modified object to become
            available to the entire environment.</para></section>
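<para>Because every API action must name a region, client-side tooling
    ultimately resolves a service and region pair to an endpoint from
    the Keystone catalog. A minimal sketch of that lookup follows; the
    catalog layout and URLs are simplified assumptions, not the exact
    format returned by Keystone.</para>

```python
# Simplified sketch of region-aware endpoint selection. The catalog
# structure and URLs below are illustrative assumptions only.
catalog = {
    "compute": {
        "region-one": "https://region-one.example.com:8774/v2",
        "region-two": "https://region-two.example.com:8774/v2",
    },
}

def endpoint_for(service, region):
    """Resolve a (service, region) pair to an endpoint URL. A missing
    region raises KeyError, mirroring the CLI's refusal to act without
    a valid region."""
    return catalog[service][region]

print(endpoint_for("compute", "region-two"))
```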
    <section xml:id="performance"><title>Performance</title>
        <para>Determining the performance of a multi-site installation
            involves considerations that do not come into play in a
            single-site deployment. Being distributed, multi-site
            deployments incur a few extra performance penalties in
            certain situations.</para>
        <para>Since multi-site systems can be geographically separated,
            they may have worse-than-normal latency or jitter when
            communicating across regions. This can especially impact
            systems like the OpenStack Identity service when making
            authentication attempts from regions that do not contain
            the centralized Keystone implementation. It can also affect
            certain applications which rely on remote procedure calls
            (RPC) for normal operation, for example High Performance
            Computing workloads.</para>
        <para>Storage availability can also be impacted by the
            architecture of a multi-site deployment. A centralized
            Object Storage service requires more time for an object to
            become available to instances locally in regions where the
            object was not created. Some applications may need to be
            tuned to account for this effect. Block Storage does not
            currently have a method for replicating data across
            multiple regions, so applications that depend on available
            block storage will need to manually cope with this
            limitation by creating duplicate block storage entries in
            each region.</para></section>
<section xml:id="security-multi-site"><title>Security</title>
<para>Securing a multi-site OpenStack installation also brings
extra challenges. Tenants may expect a tenant-created network
to be secure. In a multi-site installation the use of a
non-private connection between sites may be required. This may
mean that traffic would be visible to third parties and, in
cases where an application requires security, this issue will
require mitigation. Installing a VPN or encrypted connection
between sites is recommended in such instances.</para>
<para>Another security consideration with regard to multi-site
deployments is Identity. Authentication in a multi-site
deployment should be centralized. Centralization provides a
single authentication point for users across the deployment,
as well as a single point of administration for traditional
create, read, update and delete operations. Centralized
authentication is also useful for auditing purposes because
all authentication tokens originate from the same
source.</para>
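As a concrete sketch of what the single authentication point looks like from a client's perspective, the following Python fragment is an illustration rather than part of the guide: it assumes the Identity v3 API and uses made-up credential values. It builds the JSON body a client would POST once to the central Keystone at /v3/auth/tokens; the token returned can then be presented to services in any region.

```python
import json


def keystone_v3_auth_request(username, password, project_name,
                             user_domain="Default", project_domain="Default"):
    """Build the JSON body for a Keystone v3 password-authentication
    request (POST /v3/auth/tokens). Centralized Keystone means this
    happens once, against a single endpoint, for every region."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"name": user_domain},
                        "password": password,
                    }
                },
            },
            "scope": {
                "project": {
                    "name": project_name,
                    "domain": {"name": project_domain},
                }
            },
        }
    }


# Illustrative values only; a real client would read these from config.
body = keystone_v3_auth_request("alice", "secret", "demo")
print(json.dumps(body, indent=2))
```

Because every region validates tokens against the same central Keystone, the audit trail has a single source, as noted above.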
<para>Just as tenants in a single-site deployment need isolation
from each other, so do tenants in multi-site installations.
The extra challenges in multi-site designs revolve around
ensuring that tenant networks function across regions.
Unfortunately, OpenStack Networking does not presently support
a mechanism to provide this functionality, therefore an
external system may be necessary to manage these mappings.
Tenant networks may contain sensitive information requiring
that this mapping be accurate and consistent to ensure that a
tenant in one site does not connect to a different tenant in
another site.</para></section>
<section xml:id="openstack-components-multi-site"><title>OpenStack Components</title>
<para>Most OpenStack installations require a bare minimum set of
components to function. These include Keystone for
authentication, Nova for compute, Glance for image storage,
Neutron for networking, and potentially an object store in the
form of Swift. Bringing multi-site into play also demands
extra components in order to coordinate between regions. A
centralized Keystone is necessary to provide the single
authentication point. A centralized Horizon is also
recommended to provide a single login point and a mapped
experience to the API and CLI options available. If necessary,
a centralized Swift may be used, which will require the
installation of the Swift proxy service.</para>
<para>It may also be helpful to install a few extra options in
order to facilitate certain use cases. For instance,
installing Designate may assist in automatically generating
DNS domains for each region with an automatically populated
zone full of resource records for each instance. This
facilitates using DNS as a mechanism for determining which
region would be selected for certain applications.</para>
<para>Another useful tool for managing a multi-site installation
is Heat. Heat allows the use of templates to define a set of
instances to be launched together or for scaling existing
sets. It can also be used to set up matching or differentiated
groupings based on regions. For instance, if an application
requires an equally balanced number of nodes across sites, the
same Heat template can be used to cover each site with small
alterations to only the region name.</para></section>
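The small-alterations idea can be sketched as follows. All names here are hypothetical: the template file, region list, and parameter keys are illustrative and not a documented Heat interface; the point is only that a single shared template is driven per site by varying the region-derived values.

```python
# Hypothetical sketch: reuse one Heat template across every region,
# changing only the region-derived arguments.

TEMPLATE = "app_stack.yaml"            # assumed shared template file
REGIONS = ["region-one", "region-two"]  # illustrative region names


def stack_arguments(region, node_count=4):
    """Build create-stack arguments for one region; only the
    region-derived values differ between sites."""
    return {
        "stack_name": "app-%s" % region,
        "template_file": TEMPLATE,
        "parameters": {"region_name": region, "node_count": node_count},
    }


for region in REGIONS:
    args = stack_arguments(region)
    # A real deployment would hand these to a per-region Heat client.
    print(args["stack_name"], args["parameters"])
```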
</section>
@ -0,0 +1,213 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="user-requirements-multi-site">
<?dbhtml stop-chunking?>
<title>User Requirements</title>
<para>A multi-site architecture is complex and has its own risks
and considerations, therefore it is important when
contemplating the design of such an architecture to make sure
it meets the user and business requirements.</para>
<para>Many jurisdictions have legislative and regulatory
requirements governing the storage and management of data in
cloud environments. Common areas of regulation include:</para>
<itemizedlist>
<listitem>
<para>Data retention policies ensuring storage of
persistent data and records management to meet data
archival requirements.</para>
</listitem>
<listitem>
<para>Data ownership policies governing the possession and
responsibility for data.</para>
</listitem>
<listitem>
<para>Data sovereignty policies governing the storage of
data in foreign countries or otherwise separate
jurisdictions.</para>
</listitem>
<listitem>
<para>Data compliance policies governing the types of
information that need to reside in certain locations
due to regulatory requirements and, more importantly,
cannot reside in other locations for the same
reason.</para>
</listitem>
</itemizedlist>
<para>Examples of such legal frameworks include the data
protection framework of the European Union
(http://ec.europa.eu/justice/data-protection) and the
requirements of the Financial Industry Regulatory Authority
(http://www.finra.org/Industry/Regulation/FINRARules) in the
United States. Consult a local regulatory body for more
information.</para>
<section xml:id="workload-characteristics"><title>Workload Characteristics</title>
<para>The expected workload is a critical requirement that needs
to be captured to guide decision-making. An understanding of
the workloads in the context of the desired multi-site
environment and use case is important. Another way of thinking
about a workload is to think of it as the way the systems are
used. A workload could be a single application or a suite of
applications that work together. It could also be a duplicate
set of applications that need to run in multiple cloud
environments. Often in a multi-site deployment the same
workload will need to work identically in more than one
physical location.</para>
<para>This multi-site scenario likely includes one or more of the
other scenarios in this book, with the additional requirement
of having the workloads in two or more locations.</para>
<para>For many use cases the proximity of the user to their
workloads has a direct influence on the performance of the
application and therefore should be taken into consideration
in the design. Certain applications require zero to minimal
latency that can only be achieved by deploying the cloud in
multiple locations. These locations could be in different data
centers, cities, countries or geographical regions, depending
on the user requirement and location of the users.</para></section>
<section xml:id="consistency-images-templates-across-sites">
<title>Consistency of images and templates across different
sites</title>
<para>It is essential that the deployment of instances is
consistent across the different sites. This needs to be built
into the infrastructure. If OpenStack Object Store is used as
a back end for Glance, it is possible to create repositories of
consistent images across multiple sites. Having a central
endpoint with multiple storage nodes allows for consistent,
centralized storage for each and every site.</para>
<para>Not using a centralized object store increases the
operational overhead of maintaining a consistent image
library. This could include developing a replication
mechanism to handle the transport of images and the changes to
the images across multiple sites.</para></section>
<section xml:id="high-availability-multi-site"><title>High Availability</title>
<para>If high availability is a requirement to provide continuous
infrastructure operations, a basic definition of what
constitutes high availability for the deployment should be
agreed upon.</para>
<para>The OpenStack management components need to have a basic and
minimal level of redundancy. The simplest example is that the
loss of any single site should have no significant impact on
the availability of the OpenStack services of the entire
infrastructure.</para>
<para>The OpenStack High Availability Guide
(http://docs.openstack.org/high-availability-guide/content/)
contains more information on how to provide redundancy for the
OpenStack components.</para>
<para>Multiple network links should be deployed between sites to
provide redundancy for all components. This includes storage
replication, which should be isolated to a dedicated network
or VLAN with the ability to assign QoS to control the
replication traffic or provide priority for this traffic. Note
that if the data store is highly changeable, the network
requirements could have a significant effect on the
operational cost of maintaining the sites.</para>
<para>The ability to maintain object availability in both sites
has significant implications on the object storage design and
implementation. It will also have a significant impact on the
WAN network design between the sites.</para>
<para>Connecting more than two sites increases the challenges and
adds more complexity to the design considerations. Multi-site
implementations require extra planning to address the
additional topology complexity used for internal and external
connectivity. Some options include full mesh topology,
hub-and-spoke, spine-leaf, or 3D torus.</para>
<para>Not all the applications running in a cloud are cloud-aware.
If that is the case, there should be clear measures and
expectations to define what the infrastructure can support
and, more importantly, what it cannot. An example would be
shared storage between sites. It is possible, however such a
solution is not native to OpenStack and requires a third-party
hardware vendor to fulfill such a requirement. Another example
can be seen in applications that are able to consume resources
in object storage directly. These applications need to be
cloud-aware to make good use of an OpenStack Object
Store.</para></section>
<section xml:id="application-readiness"><title>Application readiness</title>
<para>Some applications are tolerant of the lack of synchronized
object storage, while others may need those objects to be
replicated and available across regions. Understanding of how
the cloud implementation impacts new and existing applications
is important for risk mitigation and the overall success of a
cloud project. Applications may have to be written to expect
an infrastructure with little to no redundancy. Existing
applications not developed with the cloud in mind may need to
be rewritten.</para></section>
<section xml:id="cost-multi-site"><title>Cost</title>
<para>The requirement of having more than one site has a cost
attached to it. The greater the number of sites, the greater
the cost and complexity. Costs can be broken down into the
following categories:</para>
<itemizedlist>
<listitem>
<para>Compute resources</para>
</listitem>
<listitem>
<para>Networking resources</para>
</listitem>
<listitem>
<para>Replication</para>
</listitem>
<listitem>
<para>Storage</para>
</listitem>
<listitem>
<para>Management</para>
</listitem>
<listitem>
<para>Operational costs</para>
</listitem>
</itemizedlist></section>
<section xml:id="site-loss-and-recovery"><title>Site Loss and Recovery</title>
<para>Outages can cause loss of partial or full functionality of a
site. Strategies should be implemented to understand and plan
for recovery scenarios.</para>
<itemizedlist>
<listitem>
<para>The deployed applications need to continue to
function and, more importantly, consideration should
be taken of the impact on the performance and
reliability of the application when a site is
unavailable.</para>
</listitem>
<listitem>
<para>It is important to understand what will happen to
the replication of objects and data between the sites
when a site goes down. If this causes queues to start
building up, consider how long these queues can safely
exist before data is lost.</para>
</listitem>
<listitem>
<para>Determine the method for resuming proper operations
of a site when it comes back online after a disaster.
It is recommended to architect the recovery to avoid
race conditions.</para>
</listitem>
</itemizedlist></section>
<section xml:id="compliance-and-geo-location-multi-site"><title>Compliance and Geo-location</title>
<para>An organization could have certain legal obligations and
regulatory compliance measures which could require certain
workloads or data to not be located in certain regions.</para></section>
<section xml:id="auditing-multi-site"><title>Auditing</title>
<para>A well thought-out auditing strategy is important in order
to be able to quickly track down issues. Keeping track of
changes made to security groups and tenant changes can be
useful in rolling back the changes if they affect production.
For example, if all security group rules for a tenant
disappeared, the ability to quickly track down the issue would
be important for operational and legal reasons.</para></section>
<section xml:id="separation-of-duties"><title>Separation of duties</title>
<para>A common requirement is to define different roles for the
different cloud administration functions. An example would be
a requirement to segregate the duties and permissions by
site.</para></section>
<section xml:id="authentication-between-sites">
<title>Authentication between sites</title>
<para>Ideally it is best to have a single authentication domain
and not need a separate implementation for each and every
site. This will, of course, require an authentication
mechanism that is highly available and distributed to ensure
continuous operation. Authentication server locality might
also be needed and should be planned for.</para></section>
</section>
@ -0,0 +1,215 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="architecture-network-focus">
<title>Architecture</title>
<para>Network-focused OpenStack architectures have many
similarities to other OpenStack architecture use cases. There
are a number of very specific considerations to keep in mind
when designing for a network-centric or network-heavy
application environment.</para>
<para>Networks exist to serve as a medium of transporting data
between systems. It is inevitable that an OpenStack design
will have interdependencies with non-network portions of
OpenStack as well as with external systems. Depending on the
specific workload, there may be major interactions with
storage systems both within and external to the OpenStack
environment. For example, if the workload is a content
delivery network, then the interactions with storage will be
two-fold. There will be traffic flowing to and from the
storage array for ingesting and serving content in a
north-south direction. In addition, there is replication
traffic flowing in an east-west direction.</para>
<para>Compute-heavy workloads may also induce interactions with
the network. Some high performance compute applications
require network-based memory mapping and data sharing and, as
a result, will induce a higher network load when they transfer
results and data sets. Others may be highly transactional and
issue transaction locks, perform their functions and rescind
transaction locks at very high rates. This also has an impact
on the network performance.</para>
<para>Some network dependencies are going to be external to
OpenStack. While Neutron is capable of providing network
ports, IP addresses, some level of routing, and overlay
networks, there are some other functions that it cannot
provide. For many of these, external systems or equipment may
be required to fill in the functional gaps. Hardware load
balancers are an example of equipment that may be necessary to
distribute workloads or offload certain functions. Note that,
as of the Icehouse release, dynamic routing is currently in
its infancy within OpenStack and may need to be implemented
either by an external device or a specialized service instance
within OpenStack. Tunneling is a feature provided by Neutron,
however it is constrained to a Neutron-managed region. If the
need arises to extend a tunnel beyond the OpenStack region to
either another region or an external system, it is necessary
to implement the tunnel itself outside OpenStack or to use a
tunnel management system to map the tunnel or overlay to an
external tunnel. OpenStack does not currently provide quotas
for network resources. Where network quotas are required, it
is necessary to implement quality of service management
outside of OpenStack. In many of these instances, similar
solutions for traffic shaping or other network functions will
be needed.</para>
<para>Depending on the selected design, Neutron itself may not
even support the required layer 3 network functionality. If it
is necessary or advantageous to use the provider networking
mode of Neutron without running the layer 3 agent, then an
external router will be required to provide layer 3
connectivity to outside systems.</para>
<para>Interaction with orchestration services is inevitable in
larger-scale deployments. Heat is capable of allocating
network resources defined in templates to map to tenant
networks and for port creation, as well as allocating floating
IPs. If there is a requirement to define and manage network
resources using orchestration, it is recommended that the
design include OpenStack Orchestration to meet the demands of
users.</para>
<section xml:id="desing-impacts"><title>Design Impacts</title>
<para>A wide variety of factors can affect a network focused
OpenStack architecture. While there are some considerations
shared with a general use case, specific workloads related to
network requirements will influence network design
decisions.</para>
<para>One decision includes whether or not to use Network Address
Translation (NAT) and where to implement it. If there is a
requirement for floating IPs to be available instead of using
public fixed addresses then NAT is required. This can be seen
in network management applications that rely on an IP
endpoint. An example of this is a DHCP relay that needs to
know the IP of the actual DHCP server. In these cases it is
easier to automate the infrastructure to apply the target IP
to a new instance rather than reconfigure legacy or external
systems for each new instance.</para>
<para>NAT for floating IPs managed by Neutron will reside within
the hypervisor, but there are also versions of NAT that may be
running elsewhere. If there is a shortage of IPv4 addresses
there are two common methods to mitigate this externally to
OpenStack. The first is to run a load balancer, either within
OpenStack as an instance or as an external load balancing
solution. In the internal scenario, load balancing software,
such as HAProxy, can be managed with Neutron's Load Balancer
as a Service (LBaaS). This is specifically to manage the
virtual IPs (VIPs) while a dual-homed connection from the
HAProxy instance connects the public network with the tenant
private network that hosts all of the content servers. In the
external scenario, a load balancer would need to serve the VIP
and also be joined to the tenant overlay network through
external means or routed to it via private addresses.</para>
<para>Another kind of NAT that may be useful is protocol NAT. In
some cases it may be desirable to use only IPv6 addresses on
instances and operate either an instance or an external
service to provide a NAT-based transition technology such as
NAT64 and DNS64. This provides the ability to have a globally
routable IPv6 address while only consuming IPv4 addresses as
necessary or in a shared manner.</para>
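The address arithmetic behind NAT64 is simple: under the well-known prefix 64:ff9b::/96 defined in RFC 6052, the IPv4 address occupies the low 32 bits of the synthesized IPv6 address. A minimal sketch using only the Python standard library:

```python
import ipaddress

# Well-known NAT64 prefix (RFC 6052); the IPv4 address is embedded in
# the low 32 bits of the IPv6 address.
NAT64_PREFIX = ipaddress.IPv6Address("64:ff9b::")


def nat64_address(ipv4):
    """Return the IPv6 address a NAT64/DNS64 service would synthesize
    for the given IPv4 address under the well-known prefix."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(NAT64_PREFIX) | int(v4))


def embedded_ipv4(ipv6):
    """Recover the IPv4 address embedded in a NAT64 IPv6 address."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(ipv6)) & 0xFFFFFFFF)


print(nat64_address("192.0.2.33"))  # 64:ff9b::c000:221
```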
<para>Application workloads will affect the design of the
underlying network architecture. If a workload requires
network-level redundancy, the routing and switching
architecture will have to accommodate this. There are
differing methods for providing this that are dependent on the
selected network hardware, the performance of the hardware,
and which networking model is deployed. Some examples are the
use of link aggregation (LAG) or Hot Standby Router Protocol
(HSRP). There are also the considerations of whether to deploy
Neutron or Nova-network and which plug-in to select for
Neutron. If using an external system, Neutron will need to be
configured to run layer 2 with a provider network
configuration. For example, it may be necessary to implement
HSRP to terminate layer 3 connectivity.</para>
<para>Depending on the workload, overlay networks may or may not
be a recommended configuration. Where application network
connections are small, short-lived, or bursty, the overhead of
running a dynamic overlay can consume as much bandwidth as the
packets it carries. It also can induce enough latency to cause
issues with certain applications. There is an impact on the
device generating the overlay which, in most installations,
will be the hypervisor. This will cause degradation of
packet-per-second and connection-per-second rates.</para>
<para>Overlays also come with a secondary option that may or may
not be appropriate to a specific workload. While all of them
will operate in full mesh by default, there might be good
reasons to disable this function because it may cause
excessive overhead for some workloads. Conversely, other
workloads will operate without issue. For example, most web
services applications will not have major issues with a full
mesh overlay network, while some network monitoring tools or
storage replication workloads will have performance issues
with throughput or excessive broadcast traffic.</para>
<para>A design decision that many overlook is the choice of layer
3 protocols. While OpenStack was initially built with only
IPv4 support, Neutron now supports IPv6 and dual-stacked
networks. Note that, as of the Icehouse release, this only
includes stateless address autoconfiguration, but work is in
progress to support stateless and stateful DHCPv6 as well as
IPv6 floating IPs without NAT. Because these options are
available, some workloads become possible through the use of
IPv6 and IPv6-to-IPv4 reverse transition mechanisms such as
NAT64 and DNS64 or 6to4. This will alter the requirements for
any address plan, as single-stacked and transitional IPv6
deployments can alleviate the need for IPv4 addresses.</para>
<para>As of the Icehouse release, OpenStack has limited support
for dynamic routing, however there are a number of options
available by incorporating third-party solutions to implement
routing within the cloud, including network equipment, hardware
nodes, and instances. Some workloads will perform well with
nothing more than static routes and default gateways
configured at the layer 3 termination point. In most cases
this will suffice, however some cases require the addition of
at least one type of dynamic routing protocol, if not multiple
protocols. Having a form of interior gateway protocol (IGP)
available to the instances inside an OpenStack installation
opens up the possibility of use cases such as anycast route
injection for services that need to use it as a geographic
location or failover mechanism. Other applications may wish to
directly participate in a routing protocol, either as a
passive observer, as in the case of a looking glass, or as an
active participant in the form of a route reflector. Since an
instance might have a large amount of compute and memory
resources, it is trivial to hold an entire unpartitioned
routing table and use it to provide services such as network
path visibility to other applications or as a monitoring
tool.</para>
<para>A lesser-known, but harder to diagnose, issue is that of
path Maximum Transmission Unit (MTU) failures. It is less an
optional design consideration and more a design warning: the
MTU must be at least large enough to handle normal traffic,
plus any overhead from an overlay network and the desired
layer 3 protocol. Adding externally built tunnels further
reduces the effective MTU, making it imperative to pay
attention to the fully calculated MTU, as some systems may be
configured to ignore or drop path MTU discovery
packets.</para></section>
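The MTU arithmetic above is worth making explicit. Using commonly cited per-encapsulation overheads (sketch values; verify them against the exact headers in use on your path), the MTU available to instances is the physical MTU minus the sum of all encapsulation overheads:

```python
# Commonly cited per-packet overheads, in bytes, for encapsulations
# found in OpenStack deployments; check these against the actual
# header options in use (e.g. GRE with or without a key field).
OVERHEAD_BYTES = {
    "vxlan": 50,  # outer Ethernet 14 + IPv4 20 + UDP 8 + VXLAN 8
    "gre": 42,    # outer Ethernet 14 + IPv4 20 + GRE 4 + key 4
}


def inner_mtu(physical_mtu, encapsulations):
    """Largest packet an instance can send without fragmentation once
    each encapsulation's overhead is subtracted from the path MTU."""
    return physical_mtu - sum(OVERHEAD_BYTES[e] for e in encapsulations)


print(inner_mtu(1500, ["vxlan"]))          # 1450
print(inner_mtu(9000, ["vxlan", "gre"]))   # 8908
```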
<section xml:id="tunables">
<title>Tunable networking components</title>
<para>Configurable networking components to consider when
designing for network-intensive workloads include MTU and QoS.
Some workloads will require a larger MTU than normal based on
a requirement to transfer large blocks of data. When providing
network service for applications such as video streaming or
storage replication, it is recommended to ensure that both
OpenStack hardware nodes and the supporting network equipment
are configured for jumbo frames where possible. This will
allow for better utilization of the available bandwidth.
Configuration of jumbo frames should be done across the
complete path the packets will traverse. If one network
component is not capable of handling jumbo frames then the
entire path will revert to the default MTU.</para>
<para>Quality of Service (QoS) also has a great impact on
network-intensive workloads by prioritizing packets from
applications that are particularly susceptible to poor network
performance. In applications such as Voice over IP (VoIP),
Differentiated Services Code Point (DSCP) markings are close
to a requirement for proper operation. QoS can also be used in
the opposite direction for mixed workloads, to prevent
low-priority but high-bandwidth applications, for example
backup services, video conferencing, or file sharing, from
blocking bandwidth that is needed for the proper operation of
other workloads. It is possible to tag file storage traffic as
a lower class, such as best effort or scavenger, to allow the
higher-priority traffic through. In cases where regions within
a cloud might be geographically distributed, it may also be
necessary to plan accordingly to implement WAN optimization to
combat latency or packet loss.</para>
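As a minimal sketch of host-side marking (assuming the network equipment is configured to trust host markings): the 6-bit DSCP value occupies the upper six bits of the 8-bit TOS/Traffic Class byte, and an application can set it per socket:

```python
import socket


def dscp_to_tos(dscp):
    """The 6-bit DSCP value sits in the upper six bits of the 8-bit
    TOS/Traffic Class byte, so the byte value is DSCP shifted left
    by two."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return dscp << 2


EF = 46   # Expedited Forwarding, commonly used for VoIP media
CS1 = 8   # low-priority "scavenger"-style class

# Mark an application socket so equipment that honors DSCP can
# prioritize (or deprioritize) its traffic.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(EF))
sock.close()
```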
</section>
</section>
@ -0,0 +1,138 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="arch-guide-intro-network-focus">
<title>Introduction</title>
<para>All OpenStack deployments are dependent, to some extent, on
network communication in order to function properly due to
their service-based nature. In some cases, however, use cases
dictate that the network is elevated beyond simple
infrastructure. This section is a discussion of architectures
that are more reliant or focused on network services. These
architectures are heavily dependent on the network
infrastructure and need to be architected so that the network
services perform and are reliable in order to satisfy user and
application requirements.</para>
<para>Some possible use cases include:</para>
<itemizedlist>
<listitem>
<para>Content Delivery Network: This could include
streaming video, photographs, or any other cloud-based
repository of data that is distributed to a large
number of end users. Mass-market streaming video is
heavily affected by the network configuration choices
that influence latency, bandwidth, and the
distribution of instances. Not all video streaming is
consumer-focused. For example, multicast videos (used
for media, press conferences, corporate presentations,
web conferencing services, and so on) can also utilize
a content delivery network. Content delivery is
affected by the location of the video repository and
its relationship to end users. Performance is also
affected by the network throughput of the back-end
systems, as well as the WAN architecture and the cache
methodology.</para>
</listitem>
<listitem>
<para>Network Management Functions: A cloud that provides
network service functions would be built to support
the delivery of back-end network services such as DNS,
NTP, or SNMP, and would be used by a company for
internal network management.</para>
</listitem>
<listitem>
<para>Network Service Offerings: A cloud can be used to
run customer-facing network tools to support services,
for example VPNs, MPLS private networks, and GRE
tunnels.</para>
</listitem>
<listitem>
<para>Web portals / Web Services: Web servers are a common
application for cloud services, and it is recommended
to have an understanding of their network requirements.
The network will need to be able to scale out to meet
user demand and deliver web pages with a minimum of
latency. Internal east-west and north-south network
bandwidth must be considered depending on the details
of the portal architecture.</para>
</listitem>
<listitem>
<para>High Speed and High Volume Transactional Systems:
These types of applications are very sensitive to
network configurations. Examples include many
financial systems, credit card transaction
applications, and trading and other extremely high
volume systems. These systems are sensitive to network
jitter and latency. They also have a high volume of
both east-west and north-south network traffic that
needs to be balanced to maximize efficiency of the
data delivery. Many of these systems have large high
performance database back ends that need to be
accessed.</para>
</listitem>
<listitem>
<para>High Availability: These types of use cases are
highly dependent on the proper sizing of the network
to maintain replication of data between sites for high
availability. If one site becomes unavailable, the
remaining sites must be able to serve the displaced
load until the original site returns to service. It is
important to size network capacity to handle the
desired loads.</para>
</listitem>
<listitem>
<para>Big Data: Clouds used for the management and
collection of big data (data ingest) place a
significant demand on network resources. Big data
often uses partial replicas of the data to maintain
integrity over large distributed clouds. Other big
data applications that require a large amount of
network resources are Hadoop, Cassandra, NuoDB, Riak,
and other NoSQL and distributed databases.</para>
</listitem>
<listitem>
<para>Virtual Desktop Infrastructure (VDI): This use case
is very sensitive to network congestion, latency,
jitter, and other network characteristics. Like video
streaming, the user experience is very important;
however, unlike video streaming, caching is not an
option to offset the network issues. VDI requires both
upstream and downstream traffic and cannot rely on
caching to deliver the application to the end
user.</para>
</listitem>
<listitem>
<para>Voice over IP (VoIP): This is extremely sensitive to
network congestion, latency, jitter, and other network
characteristics. VoIP has a symmetrical traffic
pattern and it requires network quality of service
|
||||
(QoS) for best performance. It may also require an
|
||||
active queue management implementation to ensure
|
||||
delivery. Users are very sensitive to latency and
|
||||
jitter fluctuations and can detect them at very low
|
||||
levels.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Video Conference / Web Conference: This also is
|
||||
extremely sensitive to network congestion, latency,
|
||||
jitter and other network flaws. Video Conferencing has
|
||||
a symmetrical traffic pattern, but unless the network
|
||||
is on an MPLS private network, it cannot use network
|
||||
quality of service (QoS) to improve performance.
|
||||
Similar to VOIP, users will be sensitive to network
|
||||
performance issues even at low levels.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>High Performance Computing (HPC): This is a complex
|
||||
use case that requires careful consideration of the
|
||||
traffic flows and usage patterns to address the needs
|
||||
of cloud clusters. It has high East-West traffic
|
||||
patterns for distributed computing, but there can be
|
||||
substantial North-South traffic depending on the
|
||||
specific application.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="operational-considerations-networking-focus">
<?dbhtml stop-chunking?>
<title>Operational Considerations</title>
<para>Network focused OpenStack clouds have a number of
operational considerations that will influence the selected
design. Topics including, but not limited to, dynamic routing
of static routes, service level agreements, and ownership of
user management all need to be considered.</para>
<para>One of the first required decisions is the selection of a
telecom company or transit provider. This is especially true
if the network requirements include external or site-to-site
network connectivity.</para>
<para>Additional design decisions need to be made about monitoring
and alarming. These can be an internal responsibility or the
responsibility of the external provider. In the case of using
an external provider, SLAs will likely apply. In addition,
other operational considerations such as bandwidth, latency,
and jitter can be part of a service level agreement.</para>
<para>The ability to upgrade the infrastructure is another subject
for consideration. As demand for network resources increases,
operators will be required to add additional IP address blocks
and additional bandwidth capacity. Managing hardware and
software life cycle events, for example upgrades,
decommissioning, and outages, while avoiding service
interruptions for tenants, will also need to be
considered.</para>
<para>Maintainability will also need to be factored into the
overall network design. This includes the ability to manage
and maintain IP addresses as well as the use of overlay
identifiers such as VLAN tag IDs, GRE tunnel IDs, and MPLS
tags. As an example, if all of the IP addresses on a network
have to be changed, a process known as renumbering, then the
design needs to support that ability.</para>
<para>Network focused applications also need to be considered in
light of certain operational realities, for example
the impending exhaustion of IPv4 addresses, the migration to
IPv6, and the use of private networks to segregate
different types of traffic that an application receives or
generates. In the case of IPv4 to IPv6 migrations,
applications should follow best practices for storing IP
addresses. It is further recommended to avoid relying on IPv4
features that were not carried over to the IPv6 protocol or
that have differences in implementation.</para>
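The practice of storing and comparing IP addresses in a version-agnostic way can be sketched with Python's standard ipaddress module; the helper names below are illustrative, not part of the guide:

```python
import ipaddress

def normalize(addr: str) -> str:
    """Parse an IPv4 or IPv6 address and return its canonical
    string form, so stored values compare consistently."""
    return str(ipaddress.ip_address(addr))

def in_network(addr: str, network: str) -> bool:
    """Version-agnostic membership test; addresses of a different
    family than the network simply do not match."""
    ip = ipaddress.ip_address(addr)
    net = ipaddress.ip_network(network)
    return ip.version == net.version and ip in net

# The same helpers work unchanged for both protocol versions.
print(normalize("2001:DB8::1"))                    # canonical lowercase form
print(in_network("192.0.2.10", "192.0.2.0/24"))
```

Code written this way needs no separate IPv4 and IPv6 branches, which is the point of the best practice described above.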
<para>When using private networks to segregate traffic,
applications should create private tenant networks for
database and data storage network traffic, and utilize public
networks for client-facing traffic. By segregating this
traffic, quality of service and security decisions can be made
to ensure that each network receives the level of service
it requires.</para>
<para>Finally, decisions must be made about the routing of network
traffic. For some applications, a more complex policy
framework for routing must be developed. The economic cost of
transmitting traffic over expensive links versus cheaper
links, in addition to bandwidth, latency, and jitter
requirements, can be used to create a routing policy that
satisfies business requirements.</para>
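Such a policy can be reduced to a scoring function over candidate links. The sketch below, using entirely hypothetical link data and weights, ranks links by a weighted combination of monetary cost and latency:

```python
# Hypothetical link records: (name, cost per GB in dollars, latency in ms).
LINKS = [
    ("mpls-primary", 0.12, 8.0),
    ("internet-transit", 0.02, 35.0),
    ("metro-wave", 0.08, 4.0),
]

def score(cost_per_gb: float, latency_ms: float,
          cost_weight: float = 1.0, latency_weight: float = 0.05) -> float:
    """Lower is better; the weights encode how much one dollar per GB
    is worth relative to one millisecond of latency."""
    return cost_weight * cost_per_gb + latency_weight * latency_ms

def best_link(max_latency_ms: float):
    """Pick the best-scoring link that meets the latency bound,
    or None if no link qualifies."""
    candidates = [l for l in LINKS if l[2] <= max_latency_ms]
    return min(candidates, key=lambda l: score(l[1], l[2]), default=None)
```

In a real deployment this logic lives in router policy, not application code; the sketch only shows how business requirements become a ranking.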
<para>How to respond to network events must also be taken into
consideration. As an example, how load is transferred from one
link to another during a failure scenario could be a factor in
the design. If network capacity is not planned correctly,
failover traffic could overwhelm other ports or network links
and create a cascading failure scenario, in which traffic
that fails over to one link overwhelms that link and then
moves on to subsequent links until all network traffic
stops.</para>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="prescriptive-example-large-scale-web-app">
<?dbhtml stop-chunking?>
<title>Prescriptive Examples</title>
<para>A large-scale web application has been designed with cloud
principles in mind. The application is designed to scale
horizontally in a bursting fashion and will generate a high
instance count. The application requires an SSL connection to
secure data and must not lose connection state to individual
servers.</para>
<para>An example design for this workload is depicted in the
figure below. In this example, a hardware load balancer is
configured to provide SSL offload functionality and to connect
to tenant networks in order to reduce address consumption.
This load balancer is linked to the routing architecture
because it services the VIP for the application. The router and
load balancer are configured with the GRE tunnel ID of the
application's tenant network and given an IP address within
the tenant subnet, but outside of the address pool. This
ensures that the load balancer can communicate with the
application's HTTP servers without requiring the consumption
of a public IP address.</para>
<para>Because sessions persist until closed, the routing and
switching architecture is designed for high availability.
Switches are meshed to each hypervisor and to each other, and
also provide an MLAG implementation to ensure that layer 2
connectivity does not fail. Routers are configured with VRRP
and fully meshed with switches to ensure layer 3 connectivity.
Since GRE is used as an overlay network, Neutron is installed
and configured to use the Open vSwitch agent in GRE tunnel
mode. This ensures that all devices can reach all other devices
and that tenant networks can be created for private addressing
links to the load balancer.</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Network_Web_Services1.png"
/>
</imageobject>
</mediaobject>
<para>A web service architecture has many options and optional
components. Because of this, it can fit into a large number of
other OpenStack designs; however, a few key components need
to be in place to handle the nature of most web-scale
workloads. The user needs the following components:</para>
<itemizedlist>
<listitem>
<para>OpenStack Controller services (Image, Identity,
Networking and supporting services such as MariaDB and
RabbitMQ)</para>
</listitem>
<listitem>
<para>OpenStack Compute running the KVM hypervisor</para>
</listitem>
<listitem>
<para>OpenStack Object Storage</para>
</listitem>
<listitem>
<para>OpenStack Orchestration</para>
</listitem>
<listitem>
<para>OpenStack Telemetry</para>
</listitem>
</itemizedlist>
<para>Beyond the normal Keystone, Nova, Glance and Swift
components, Heat is a recommended component to properly
scale the workloads to adjust to demand. Ceilometer will
also need to be included in the design due to the requirement
for auto-scaling. Web services tend to be bursty in load, have
very defined peak and valley usage patterns, and, as a result,
benefit from automatic scaling of instances based upon
traffic. At a network level, a split network configuration
works well with databases residing on private tenant
networks, since these do not emit a large quantity of broadcast
traffic and may need to interconnect to some databases for
content.</para>
<section xml:id="load-balancing"><title>Load Balancing</title>
<para>Load balancing was included in this design to spread
requests across multiple instances. This workload scales well
horizontally across large numbers of instances, which allows
instances to run without publicly routed IP addresses and to
simply rely on the load balancer to make the service
globally reachable. Many of these services do not require
direct server return. This aids in address planning and
utilization at scale, since only the virtual IP (VIP) must be
public.</para></section>
<section xml:id="overlay-networks"><title>Overlay Networks</title>
<para>OpenStack Networking using the Open vSwitch GRE tunnel mode
was included in the design to provide overlay functionality.
In this case, the layer 3 external routers are paired
with VRRP, and switches should be paired with an implementation
of MLAG to ensure that there is no loss of
connectivity with the upstream routing infrastructure.</para></section>
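One practical consequence of a GRE overlay, implicit in the design above, is encapsulation overhead: each tunneled packet carries an extra outer IP header and GRE header, which shrinks the payload that fits in a standard frame. A quick sketch of the arithmetic, assuming a basic GRE header with no optional fields:

```python
# Standard Ethernet payload MTU, in bytes.
ETHERNET_MTU = 1500
OUTER_IPV4_HEADER = 20   # outer IPv4 header added by the tunnel
GRE_HEADER = 4           # basic GRE header, no checksum/key/sequence options

def inner_mtu(outer_mtu: int = ETHERNET_MTU) -> int:
    """Largest inner packet that fits in one outer frame without
    fragmentation."""
    return outer_mtu - OUTER_IPV4_HEADER - GRE_HEADER

print(inner_mtu())        # 1476 with a standard 1500-byte MTU
print(inner_mtu(9000))    # 8976 when the fabric carries jumbo frames
```

This is why deployments using GRE overlays either lower the instance MTU or raise the physical network MTU.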
<section xml:id="performance-tuning"><title>Performance Tuning</title>
<para>Network level tuning for this workload is minimal.
Quality of Service (QoS) will be applied to these workloads
using a middle-ground Class Selector, depending on existing
policies. It will be higher than a best effort queue but lower
than an Expedited Forwarding or Assured Forwarding queue.
Since this type of application generates larger packets with
longer-lived connections, bandwidth utilization can be
optimized for long-duration TCP. Normal bandwidth planning
applies here: benchmark a single session's usage, then
multiply by the expected number of concurrent sessions and
add overhead.</para></section>
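The planning rule just described, a benchmarked per-session rate multiplied by expected concurrency plus headroom, can be written out directly. All numbers below are hypothetical placeholders, not figures from this design:

```python
def required_bandwidth_gbps(per_session_kbps: float,
                            concurrent_sessions: int,
                            overhead_fraction: float = 0.2) -> float:
    """Aggregate bandwidth in gigabits per second: benchmarked
    per-session rate x concurrency, plus a headroom fraction."""
    total_kbps = per_session_kbps * concurrent_sessions
    return total_kbps * (1 + overhead_fraction) / 1_000_000

# Example: 200 kbit/s per session, 50,000 concurrent sessions,
# 20% overhead -> 12.0 Gbit/s of planned capacity.
print(required_bandwidth_gbps(200, 50_000))
```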
<section xml:id="network-functions"><title>Network Functions</title>
<para>Network functions are a broad category, but encompass
workloads that support the rest of a system's network. These
workloads tend to consist of large amounts of small packets
that are very short lived, such as DNS queries or SNMP traps.
These messages need to arrive quickly and do not handle
packet loss well, as there can be a very large volume of them.
There are a few extra considerations to take into account for
this type of workload, and this can change a configuration all
the way down to the hypervisor level. For an application that
generates 10 TCP sessions per user with an average bandwidth of
512 kilobytes per second per flow and an expected user count of
ten thousand concurrent users, the expected bandwidth plan is
approximately 4.88 gigabits per second.</para>
<para>The supporting network for this type of configuration needs
to have low latency and evenly distributed availability.
This workload benefits from having services local to the
consumers of the service. A multi-site approach is used, as
well as deploying many copies of the application to handle
load as close as possible to consumers. Since these
applications function independently, they do not warrant
running overlays to interconnect tenant networks. Overlays
also have the drawback of performing poorly with rapid flow
setup and may incur too much overhead with large quantities of
small packets, and are therefore not recommended.</para>
<para>QoS is desired for some workloads to ensure delivery. DNS
has a major impact on the load times of other services and
needs to be reliable and provide rapid responses. It is
recommended to configure rules in upstream devices to apply a
higher Class Selector to DNS to ensure faster delivery or a
better spot in queuing algorithms.</para></section>
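At the host level, a service can request such treatment by marking its own packets with a DSCP value in the IP TOS field; upstream devices must still be configured to honor the mark. A minimal sketch using the standard socket API (the CS5 value is only an example; the class actually applied is a local policy decision, and behavior is platform-dependent):

```python
import socket

# DSCP Class Selector 5 (CS5) occupies the upper six bits of the
# IPv4 TOS byte: DSCP value 40 shifted left by the two ECN bits.
CS5_TOS = 40 << 2  # 0xA0 == 160

# A UDP socket such as a DNS resolver might use; every datagram it
# sends will carry the requested marking.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, CS5_TOS)

# The kernel reports the value back, confirming the option took effect.
tos = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
print(tos)
sock.close()
```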
<section xml:id="cloud-storage"><title>Cloud Storage</title>
<para>Another common use case for OpenStack environments is to
provide a cloud-based file storage and sharing service. While
this may initially be considered a storage-focused use
case, there are also major requirements on the network side
that place it in the realm of requiring a network-focused
architecture. An example of this application is cloud
backup.</para>
<para>There are two specific behaviors of this workload that have
major and different impacts on the network. Since this is both
an externally facing service and an internally replicating
application, there are both north-south and east-west traffic
considerations.</para>
<para>North-south traffic is primarily user facing. When a user
uploads content for storage, it comes into the OpenStack
installation; users who download content draw traffic out of
the OpenStack installation. Since the service is intended
primarily as a backup, the majority of the traffic is
southbound into the environment. In this case, it is
beneficial to configure a network to be asymmetric downstream,
as the traffic entering the OpenStack installation will be
greater than the traffic leaving it.</para>
<para>East-west traffic is likely to be fully symmetric. Since
replication will originate from any node and may target
multiple other nodes algorithmically, it is less likely for
this traffic to have a larger volume in any specific
direction. However, this traffic may interfere with
north-south traffic.</para>
<mediaobject>
<imageobject>
<imagedata
fileref="../images/Network_Cloud_Storage2.png"
/>
</imageobject>
</mediaobject>
<para>This application prioritizes north-south traffic
over east-west traffic, as the former is the customer-facing
data. Because of this, QoS is implemented on east-west traffic
with a lower-priority Class Selector, while north-south
traffic receives a higher position in the priority
queue.</para>
<para>The network design in this case is less dependent on
availability and more dependent on being able to handle high
bandwidth. As a direct result, it is beneficial to forgo
redundant links in favor of bonding those connections, which
increases the available bandwidth. It is also beneficial to
configure all devices in the path, including OpenStack, to
generate and pass jumbo frames.</para></section>
</section>
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="technical-considerations-network-focus">
<?dbhtml stop-chunking?>
<title>Technical Considerations</title>
<para>Designing an OpenStack network architecture involves a
combination of layer 2 and layer 3 considerations. Layer 2
decisions involve those made at the data-link layer, such as
the decision to use Ethernet versus Token Ring. Layer 3
decisions involve those made about the protocol layer and the
point at which IP comes into the picture. As an example, a
completely internal OpenStack network can exist at layer 2 and
ignore layer 3; however, in order for any traffic to go
outside of that cloud, to another network, or to the Internet,
a layer 3 router or switch must be involved.</para>
<para>The past few years have seen two competing trends in
networking. One trend is towards building data center network
architectures based on layer 2 networking; the other approach
is to treat the cloud environment essentially as a miniature
version of the Internet. The latter represents a radically
different approach to the network architecture from what is
currently installed in the staging environment, because the
Internet is based entirely on layer 3 routing rather than
layer 2 switching.</para>
<para>In the data center context, there are advantages to
designing the network on layer 2 protocols rather than layer
3. In spite of the difficulties of using a bridge to perform
the network role of a router, many vendors, customers, and
service providers are attracted to the idea of using Ethernet
in as many parts of their networks as possible. The benefits
of selecting a layer 2 design are:</para>
<itemizedlist>
<listitem>
<para>Ethernet frames contain all the essentials for
networking. These include, but are not limited to,
globally unique source addresses, globally unique
destination addresses, and error control.</para>
</listitem>
<listitem>
<para>Ethernet frames can carry any kind of packet.
Networking at layer 2 is independent of the layer 3
protocol.</para>
</listitem>
<listitem>
<para>Adding more layers to the Ethernet frame only slows
the networking process down (known as 'nodal
processing delay').</para>
</listitem>
<listitem>
<para>Adjunct networking features, for example class of
service (CoS) or multicasting, can be added to
Ethernet as readily as to IP networks.</para>
</listitem>
<listitem>
<para>VLANs are an easy mechanism for isolating
networks.</para>
</listitem>
</itemizedlist>
<para>Most information starts and ends inside Ethernet frames.
Today this applies to data, voice (for example, VoIP), and
video (for example, web cameras). The concept is that, if more
of the end-to-end transfer of information from a source to a
destination can be done in the form of Ethernet frames, more
of the benefits of Ethernet can be realized on the network.
Though it is not a substitute for IP networking, networking at
layer 2 can be a powerful adjunct to IP networking.</para>
<para>The basic reasoning behind using layer 2 Ethernet over layer
3 IP networks is the speed, the reduced overhead of the IP
hierarchy, and the lack of a requirement to keep track of IP
address configuration as systems are moved around. Whereas the
simplicity of layer 2 protocols might work well in a data
center with hundreds of physical machines, cloud data centers
have the additional burden of needing to keep track of all
virtual machine addresses and networks. In these data centers,
it is not uncommon for one physical node to support 30-40
instances.</para>
<para>Important note: networking at the frame level says nothing
about the presence or absence of IP addresses at the packet
level. Almost all ports, links, and devices on a network of
LAN switches still have IP addresses, as do all the source and
destination hosts. There are many reasons for the continued
need for IP addressing. The largest one is the need to manage
the network. A device or link without an IP address is usually
invisible to most management applications. Utilities including
remote access for diagnostics, file transfer of configurations
and software, and similar applications cannot run without IP
addresses as well as MAC addresses.</para>
<section xml:id="layer-2-arch-limitations"><title>Layer 2 Architecture Limitations</title>
<para>Outside of the traditional data center, the limitations of
layer 2 network architectures become more obvious.</para>
<itemizedlist>
<listitem>
<para>The number of VLANs is limited to 4096.</para>
</listitem>
<listitem>
<para>The number of MACs stored in switch tables is
limited.</para>
</listitem>
<listitem>
<para>The need to maintain a set of layer 4 devices to
handle traffic control must be accommodated.</para>
</listitem>
<listitem>
<para>MLAG, often used for switch redundancy, is a
proprietary solution that does not scale beyond two
devices and forces vendor lock-in.</para>
</listitem>
<listitem>
<para>It can be difficult to troubleshoot a network
without IP addresses and ICMP.</para>
</listitem>
<listitem>
<para>Configuring ARP is considered complicated on large
layer 2 networks.</para>
</listitem>
<listitem>
<para>All network devices need to be aware of all MACs,
even instance MACs, so there is constant churn in MAC
tables and network state changes as instances are
started or stopped.</para>
</listitem>
<listitem>
<para>Migrating MACs (instance migration) to different
physical locations is a potential problem if ARP
table timeouts are not set properly.</para>
</listitem>
</itemizedlist>
<para>It is important to know that layer 2 has a very limited set
of network management tools. It is very difficult to control
traffic, as there are no mechanisms to manage the network
or shape the traffic, and network troubleshooting is very
difficult. One reason for this difficulty is that network
devices have no IP addresses. As a result, there is no
reasonable way to check network delay in a layer 2
network.</para>
<para>On large layer 2 networks, configuring ARP learning can also
be complicated. The setting for the MAC address timer on
switches is critical and, if set incorrectly, can cause
significant performance problems. As an example, the Cisco
default MAC address timer is extremely long. Migrating MACs to
different physical locations to support instance migration can
be a significant problem. In this case, the network
information maintained in the switches could be out of sync
with the new location of the instance.</para>
<para>In a layer 2 network, all devices are aware of all MACs,
even those that belong to instances. The network state
information in the backbone changes whenever an instance is
started or stopped. As a result, there is far too much churn in
the MAC tables on the backbone switches.</para></section>
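The first limitation above, the 4096-VLAN ceiling, follows directly from the 12-bit VLAN ID field in the 802.1Q tag, and is one reason overlay encapsulations with larger identifier fields are used at cloud scale. A quick comparison (the 24-bit VXLAN VNI and 32-bit GRE key are properties of those protocols, stated here for illustration):

```python
# Identifier field widths, in bits, for common segment ID schemes.
ID_BITS = {
    "802.1Q VLAN": 12,   # VLAN ID field in the 802.1Q tag
    "VXLAN VNI": 24,     # VXLAN Network Identifier
    "GRE key": 32,       # optional key field in the GRE header
}

for name, bits in ID_BITS.items():
    # Each extra bit doubles the number of isolated segments available.
    print(f"{name}: {2 ** bits:,} possible segments")
```

The jump from roughly four thousand to millions of segments is what makes per-tenant isolation feasible in large multi-tenant clouds.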
<section xml:id="layer-3-arch-advantages"><title>Layer 3 Architecture Advantages</title>
<para>In the layer 3 case, there is no churn in the routing tables
due to instances starting and stopping. The only time there
would be a routing state change is in the case of a Top
of Rack (ToR) switch failure or a link failure in the backbone
itself. Other advantages of using a layer 3 architecture
include:</para>
<itemizedlist>
<listitem>
<para>Layer 3 networks provide the same level of
resiliency and scalability as the Internet.</para>
</listitem>
<listitem>
<para>Controlling traffic with routing metrics is
straightforward.</para>
</listitem>
<listitem>
<para>Layer 3 can be configured to use BGP confederation
for scalability, so core routers have state
proportional to the number of racks, not to the number
of servers or instances.</para>
</listitem>
<listitem>
<para>Routing keeps instance MAC and IP addresses
out of the network core, reducing state churn. Routing
state changes only occur in the case of a ToR switch
failure or backbone link failure.</para>
</listitem>
<listitem>
<para>There are a variety of well tested tools, for
example ICMP, to monitor and manage traffic.</para>
</listitem>
<listitem>
<para>Layer 3 architectures allow for the use of Quality
of Service (QoS) to manage network performance.</para>
</listitem>
</itemizedlist>
<section xml:id="layer-3-arch-limitations"><title>Layer 3 Architecture Limitations</title>
<para>The main limitation of layer 3 is that there is no built-in
isolation mechanism comparable to the VLANs of layer 2
networks. Furthermore, the hierarchical nature of IP addresses
means that an instance is on the same subnet as its
physical host, so it cannot be migrated outside
of the subnet easily. For these reasons, network
virtualization needs to use IP encapsulation and software at
the end hosts, both for isolation and for separation of
the addressing in the virtual layer from the addressing in the
physical layer. Other potential disadvantages of layer 3
include the need to design an IP addressing scheme rather than
relying on the switches to automatically keep track of MAC
addresses, and the need to configure the interior gateway
routing protocol in the switches.</para></section></section>
|
||||
<section xml:id="network-recommendations-overview">
|
||||
<title>Network Recommendations Overview</title>
|
||||
<para>OpenStack has complex networking requirements for several
|
||||
reasons. Many components interact at different levels of the
|
||||
system stack that adds complexity. Data flows are complex.
|
||||
Data in an OpenStack cloud moves both between instances across
|
||||
the network (also known as East-West), as well as in and out
|
||||
of the system (also known as North-South). Physical server
|
||||
nodes have network requirements that are independent of those
|
||||
used by instances which need to be isolated from the core
|
||||
network to account for scalability. It is also recommended to
|
||||
functionally separate the networks for security purposes and
|
||||
tune performance through traffic shaping.</para>
|
||||
<para>A number of important general technical and business factors
|
||||
need to be taken into consideration when planning and
            designing an OpenStack network. They include:</para>
        <itemizedlist>
            <listitem>
                <para>A requirement for vendor independence. To avoid
                    hardware or software vendor lock-in, the design should
                    not rely on specific features of a vendor’s router or
                    switch.</para>
            </listitem>
            <listitem>
                <para>A requirement to massively scale the ecosystem to
                    support millions of end users.</para>
            </listitem>
            <listitem>
                <para>A requirement to support indeterminate platforms and
                    applications.</para>
            </listitem>
            <listitem>
                <para>A requirement to design for cost-efficient
                    operations to take advantage of massive scale.</para>
            </listitem>
            <listitem>
                <para>A requirement to ensure that there is no single
                    point of failure in the cloud ecosystem.</para>
            </listitem>
            <listitem>
                <para>A requirement for a high availability architecture to
                    meet customer SLA requirements.</para>
            </listitem>
            <listitem>
                <para>A requirement to be tolerant of rack-level
                    failure.</para>
            </listitem>
            <listitem>
                <para>A requirement to maximize flexibility to architect
                    future production environments.</para>
            </listitem>
        </itemizedlist>
        <para>Keeping all of these in mind, the following network design
            recommendations can be made:</para>
        <itemizedlist>
            <listitem>
                <para>Layer 3 designs are preferred over layer 2
                    architectures.</para>
            </listitem>
            <listitem>
                <para>Design a dense multi-path network core to support
                    multi-directional scaling and flexibility.</para>
            </listitem>
            <listitem>
                <para>Use hierarchical addressing because it is the only
                    viable option to scale the network ecosystem.</para>
            </listitem>
            <listitem>
                <para>Use virtual networking to isolate instance service
                    network traffic from the management and internal
                    network traffic.</para>
            </listitem>
            <listitem>
                <para>Isolate virtual networks using encapsulation
                    technologies.</para>
            </listitem>
            <listitem>
                <para>Use traffic shaping for performance tuning.</para>
            </listitem>
            <listitem>
                <para>Use eBGP to connect to the Internet up-link.</para>
            </listitem>
            <listitem>
                <para>Use iBGP to flatten the internal traffic on the
                    layer 3 mesh.</para>
            </listitem>
            <listitem>
                <para>Determine the most effective configuration for the
                    block storage network.</para>
            </listitem>
        </itemizedlist></section>
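The hierarchical addressing recommendation above can be sketched with Python's standard ipaddress module. The 10.0.0.0/16 supernet and the per-rack /24 size are illustrative assumptions, not prescriptions from this guide:

```python
import ipaddress

# Assumption: one 10.0.0.0/16 supernet per data center, one /24 per rack.
# Carving racks out of a single summarizable block keeps routing tables
# small: the core only needs a route for the /16, not one per rack.
supernet = ipaddress.ip_network("10.0.0.0/16")
rack_subnets = list(supernet.subnets(new_prefix=24))

print(len(rack_subnets))    # 256 rack subnets available
print(rack_subnets[0])      # 10.0.0.0/24
print(rack_subnets[1])      # 10.0.1.0/24
```

Because every rack prefix aggregates back into the supernet, adding racks never grows the core routing table, which is the property that makes hierarchical addressing scale.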
    <section xml:id="additional-considerations-network-focus"><title>Additional Considerations</title>
        <para>There are numerous topics to consider when designing a
            network-focused OpenStack cloud.</para>
        <section xml:id="openstack-networking-versus-nova-network"><title>OpenStack Networking versus Nova Network
            Considerations</title>
        <para>Selecting the type of networking technology to implement
            depends on many factors. OpenStack Networking (Neutron) and
            Nova Network both have their advantages and disadvantages.
            They are both valid and supported options that fit different
            use cases as described in the following table.</para></section>
        <section xml:id="redundant-networking-tor-switch-ha"><title>Redundant Networking: ToR Switch High Availability
            Risk Analysis</title>
        <para>A technical consideration of networking is whether the
            switching gear in the data center should be installed
            with backup switches in case of hardware failure.</para>
        <para>Research indicates that the mean time between failures (MTBF)
            on switches is between 100,000 and 200,000 hours. This number is
            dependent on the ambient temperature of the switch in the data
            center. When properly cooled and maintained, this translates
            to between 11 and 22 years before failure. Even in the worst
            case of poor ventilation and high ambient temperatures in the
            data center, the MTBF is still 2-3 years. This is based on
            published research found at
            <link xlink:href="http://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf">http://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf</link>
            and <link xlink:href="http://www.n-tron.com/pdf/network_availability.pdf">http://www.n-tron.com/pdf/network_availability.pdf</link>.</para>
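The 11-to-22-year figure above is simple arithmetic on the published MTBF range; a quick check:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def mtbf_years(mtbf_hours):
    """Convert a published MTBF figure in hours to years of expected service."""
    return mtbf_hours / HOURS_PER_YEAR

print(round(mtbf_years(100_000), 1))  # 11.4 -> the "11 years" lower bound
print(round(mtbf_years(200_000), 1))  # 22.8 -> the "22 years" upper bound
```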
        <para>In most cases, it is much more economical to use a
            single switch with a small pool of spare switches to replace
            failed units than it is to outfit an entire data center with
            redundant switches. Applications should also be able to
            tolerate rack-level outages without affecting normal
            operations, since network and compute resources are easily
            provisioned and plentiful.</para></section>
        <section xml:id="preparing-for-future-ipv6-support"><title>Preparing for the future: IPv6 Support</title>
        <para>One of the most important networking topics today is the
            impending exhaustion of IPv4 addresses. In early 2014, ICANN
            announced that it had started allocating the final IPv4 address
            blocks to the Regional Internet Registries
            (<link xlink:href="http://www.internetsociety.org/deploy360/blog/2014/05/goodbye-ipv4-iana-starts-allocating-final-address-blocks/">http://www.internetsociety.org/deploy360/blog/2014/05/goodbye-ipv4-iana-starts-allocating-final-address-blocks/</link>).
            This means the IPv4 address space is close to being fully
            allocated. As a result, it will soon become difficult to
            allocate more IPv4 addresses to an application that has
            experienced growth, or is expected to scale out, due to the
            lack of unallocated IPv4 address blocks.</para>
        <para>For network-focused applications, the future is the IPv6
            protocol. IPv6 increases the address space significantly,
            fixes long-standing issues in the IPv4 protocol, and will
            become essential for network-focused applications in the
            future.</para>
        <para>Neutron supports IPv6 when configured to take advantage of
            the feature. To enable it, create an IPv6 subnet in
            OpenStack Neutron and use IPv6 prefixes when creating security
            groups.</para></section>
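To make the scale difference concrete, the snippet below compares the two address spaces and carves per-network /64 prefixes from a site /48. The 2001:db8::/48 documentation prefix and the /48-per-site, /64-per-network split are common conventions assumed here purely for illustration:

```python
import ipaddress

# IPv6 widens addresses from 32 to 128 bits: a factor of 2**96 more space.
expansion = 2 ** 128 // 2 ** 32
print(expansion == 2 ** 96)  # True

# Assumption: the site holds a /48 and each tenant network receives a /64,
# the prefix length generally expected for SLAAC to work.
site = ipaddress.ip_network("2001:db8::/48")
first_tenant = next(site.subnets(new_prefix=64))
print(first_tenant)                    # 2001:db8::/64
print(site.num_addresses == 2 ** 80)   # True: addresses in the site block
```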
        <section xml:id="asymetric-links"><title>Asymmetric Links</title>
        <para>When designing a network architecture, the traffic patterns
            of an application will heavily influence the allocation of
            total bandwidth and the number of links that are used to send
            and receive traffic. Applications that provide file storage
            for customers will allocate bandwidth and links to favor
            incoming traffic, whereas video streaming applications will
            allocate bandwidth and links to favor outgoing traffic.</para></section>
        <section xml:id="performance-network-focus"><title>Performance</title>
        <para>It is important to analyze the applications' tolerance for
            latency and jitter when designing an environment to support
            network-focused applications. Certain applications, for
            example VoIP, are less tolerant of latency and jitter. Where
            latency and jitter are concerned, certain applications may
            require tuning of QoS parameters and network device queues to
            ensure that they are queued for transmit immediately or
            guaranteed minimum bandwidth. Since OpenStack currently does
            not support these functions, consider the capabilities of the
            selected network plug-in.</para>
        <para>The location of a service may also impact the application or
            consumer experience. If an application is designed to serve
            differing content to differing users, it will need to be
            designed to properly direct connections to those specific
            locations. Use a multi-site installation for these situations,
            where appropriate.</para>
        <para>OpenStack networking can be implemented in two separate
            ways. The legacy nova-network provides a flat DHCP network
            with a single broadcast domain. This implementation does not
            support tenant isolation networks or advanced plug-ins, but it
            is currently the only way to implement a distributed layer 3
            agent using the multi_host configuration. Neutron is the
            official current implementation of OpenStack Networking. It
            provides a pluggable architecture that supports a large
            variety of network methods. Some of these include a layer 2
            only provider network model, external device plug-ins, or even
            OpenFlow controllers.</para>
        <para>Networking at large scales becomes a set of boundary
            questions. The determination of how large a layer 2 domain
            needs to be is based on the number of nodes within the domain
            and the amount of broadcast traffic that passes between
            instances. Breaking layer 2 boundaries may require the
            implementation of overlay networks and tunnels. This decision
            is a balancing act between the need for lower overhead and
            the need for a smaller domain.</para>
        <para>When selecting network devices, be aware that making this
            decision based on the largest port density often comes with a
            drawback. Aggregation switches and routers have not all kept
            pace with Top of Rack switches and may introduce bottlenecks on
            north-south traffic. As a result, it may be possible for
            massive amounts of downstream network utilization to impact
            upstream network devices, impacting service to the cloud.
            Since OpenStack does not currently provide a mechanism for
            traffic shaping or rate limiting, it is necessary to implement
            these features at the network hardware level.</para></section></section>
    </section>
@ -0,0 +1,170 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="user-requirements-network-focus">
    <?dbhtml stop-chunking?>
    <title>User Requirements</title>
    <para>Network-focused architectures vary from general-purpose
        designs. They are heavily influenced by a specific subset of
        applications that interact with the network in a more
        significant way. Some of the business requirements that will
        influence the design include:</para>
    <itemizedlist>
        <listitem>
            <para>User experience: User experience is impacted by
                network latency through slow page loads, degraded
                video streams, and low-quality VoIP sessions. Users
                are often not aware of how network design and
                architecture affects their experiences. Both
                enterprise customers and end users rely on the network
                for delivery of an application. Network performance
                problems can create a negative experience for the
                end user and can also cause productivity and economic
                losses.</para>
        </listitem>
        <listitem>
            <para>Regulatory requirements: Networks need to take into
                consideration any regulatory requirements about the
                physical location of data as it traverses the network.
                For example, Canadian medical records cannot pass
                outside of Canadian sovereign territory. Another
                network consideration is maintaining network
                segregation of private data flows and ensuring that
                the network between cloud locations is encrypted where
                required. Network architectures are affected by
                regulatory requirements for encryption and protection
                of data in flight as the data moves through various
                networks.</para>
        </listitem>
    </itemizedlist>
    <para>Many jurisdictions have legislative and regulatory
        requirements governing the storage and management of data in
        cloud environments. Common areas of regulation include:</para>
    <itemizedlist>
        <listitem>
            <para>Data retention policies ensuring storage of
                persistent data and records management to meet data
                archival requirements.</para>
        </listitem>
        <listitem>
            <para>Data ownership policies governing the possession of and
                responsibility for data.</para>
        </listitem>
        <listitem>
            <para>Data sovereignty policies governing the storage of
                data in foreign countries or otherwise separate
                jurisdictions.</para>
        </listitem>
        <listitem>
            <para>Data compliance policies governing where information
                needs to reside in certain locations due to regulatory
                issues and, more importantly, where it cannot reside
                in other locations for the same reason.</para>
        </listitem>
    </itemizedlist>
    <para>Examples of such legal frameworks include the data
        protection framework of the European Union
        (<link xlink:href="http://ec.europa.eu/justice/data-protection/">http://ec.europa.eu/justice/data-protection/</link>) and the
        requirements of the Financial Industry Regulatory Authority
        (<link xlink:href="http://www.finra.org/Industry/Regulation/FINRARules">http://www.finra.org/Industry/Regulation/FINRARules</link>) in the
        United States. Consult a local regulatory body for more
        information.</para>
    <section xml:id="high-availability-issues-network-focus"><title>High Availability Issues</title>
    <para>OpenStack installations with high demand on network
        resources have high availability requirements that are
        determined by the application and use case. Financial
        transaction systems will have a much higher requirement for
        high availability than a development application. Network
        availability mechanisms, for example quality of service (QoS),
        can be used to improve the network performance of sensitive
        applications such as VoIP and video streaming.</para>
    <para>Often, high performance systems will have SLA requirements
        for a minimum QoS with regard to guaranteed uptime, latency,
        and bandwidth. The level of the SLA can have a significant
        impact on the network architecture and the requirements for
        redundancy in the systems.</para></section>
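How much an SLA level drives redundancy becomes clearer when uptime percentages are converted into allowed downtime; the SLA values below are illustrative:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(uptime_percent):
    """Allowed downtime per year under a given SLA uptime guarantee."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)

for sla in (99.0, 99.9, 99.99):
    print(sla, round(downtime_minutes_per_year(sla)))
# 99.0  -> 5256 minutes (about 3.7 days per year)
# 99.9  -> 526 minutes
# 99.99 -> 53 minutes
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is what pushes higher-tier SLAs toward redundant links, switches, and power.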
    <section xml:id="risks-network-focus"><title>Risks</title>
    <itemizedlist>
        <listitem>
            <para>Network misconfigurations: Configuring incorrect IP
                addresses, VLANs, and routes can cause outages in
                areas of the network or, in the worst-case scenario,
                the entire cloud infrastructure. Misconfigurations can
                cause disruptive problems, so configuration should be
                automated to minimize the opportunity for operator
                error.</para>
        </listitem>
        <listitem>
            <para>Capacity planning: Cloud networks need to be managed
                for capacity and growth over time. There is a risk
                that the network will not grow to support the
                workload. Capacity planning includes the purchase of
                network circuits and hardware that can potentially
                have lead times measured in months or more.</para>
        </listitem>
        <listitem>
            <para>Network tuning: Cloud networks need to be configured
                to minimize link loss, packet loss, packet storms,
                broadcast storms, and loops.</para>
        </listitem>
        <listitem>
            <para>Single point of failure (SPOF): High availability
                must be taken into account even at the physical and
                environmental layers. If there is a single point of
                failure due to only one upstream link or only one
                power supply, an outage becomes unavoidable.</para>
        </listitem>
        <listitem>
            <para>Complexity: An overly complex network design becomes
                difficult to maintain and troubleshoot. While
                automated tools that handle overlay networks or
                device-level configuration can mitigate this,
                non-traditional interconnects between functions and
                specialized hardware need to be well documented or
                avoided to prevent outages.</para>
        </listitem>
        <listitem>
            <para>Non-standard features: There are additional risks
                that arise from configuring the cloud network to take
                advantage of vendor-specific features. One example is
                multi-link aggregation (MLAG), used to provide
                redundancy at the aggregator switch level of
                the network. MLAG is not a standard and, as a result,
                each vendor has its own proprietary implementation
                of the feature. MLAG architectures are not
                interoperable across switch vendors, which leads to
                vendor lock-in and can cause delays or an inability to
                upgrade components.</para>
        </listitem>
    </itemizedlist></section>
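The automation recommended under network misconfigurations above can start with small pre-deployment checks. This sketch flags overlapping subnets and out-of-range VLAN IDs; the input data is illustrative:

```python
import ipaddress

def find_overlaps(cidrs):
    """Return pairs of CIDRs that overlap; an overlap usually signals a typo."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for i, a in enumerate(nets)
            for b in nets[i + 1:] if a.overlaps(b)]

def invalid_vlans(vlan_ids):
    """802.1Q VLAN IDs must fall in the range 1-4094."""
    return [v for v in vlan_ids if not 1 <= v <= 4094]

print(find_overlaps(["10.0.0.0/24", "10.0.1.0/24", "10.0.0.128/25"]))
# [('10.0.0.0/24', '10.0.0.128/25')]
print(invalid_vlans([100, 200, 4095]))  # [4095]
```

Running checks like these in a deployment pipeline catches the error before it causes an outage, rather than after.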
    <section xml:id="security-network-focus"><title>Security</title>
    <para>Security is often overlooked or added after a design has
        been implemented. Consider security implications and
        requirements before designing the physical and logical network
        topologies. Make sure that the networks are properly segregated
        and that traffic flows are going to the correct destinations
        without crossing through undesirable locations. Some examples
        of factors that need to be taken into consideration are:</para>
    <itemizedlist>
        <listitem>
            <para>Firewalls</para>
        </listitem>
        <listitem>
            <para>Overlay interconnects for joining separated tenant
                networks</para>
        </listitem>
        <listitem>
            <para>Routing through or avoiding specific networks</para>
        </listitem>
    </itemizedlist>
    <para>Another security vulnerability that must be taken into
        account is how networks are attached to hypervisors. If a
        network must be separated from other systems at all costs, it
        may be necessary to schedule instances for that network onto
        dedicated compute nodes. This may also be done to mitigate the
        risk of a hypervisor breakout allowing an attacker access to
        networks from a compromised instance.</para>
    </section>
</section>
doc/arch-design/pom.xml
@ -0,0 +1,79 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <parent>
        <groupId>org.openstack.docs</groupId>
        <artifactId>parent-pom</artifactId>
        <version>1.0.0-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>openstack-arch-design</artifactId>
    <packaging>jar</packaging>
    <name>OpenStack Architecture Design Guide</name>
    <properties>
        <!-- This is set by Jenkins according to the branch. -->
        <release.path.name></release.path.name>
        <comments.enabled>0</comments.enabled>
    </properties>
    <!-- ################################################ -->
    <!-- USE "mvn clean generate-sources" to run this POM -->
    <!-- ################################################ -->
    <build>
        <plugins>
            <plugin>
                <groupId>com.rackspace.cloud.api</groupId>
                <artifactId>clouddocs-maven-plugin</artifactId>
                <!-- version set in ../pom.xml -->
                <executions>
                    <execution>
                        <id>generate-webhelp</id>
                        <goals>
                            <goal>generate-webhelp</goal>
                        </goals>
                        <phase>generate-sources</phase>
                        <configuration>
                            <!-- These parameters only apply to webhelp -->
                            <enableDisqus>0</enableDisqus>
                            <disqusShortname>openstack-arch-design</disqusShortname>
                            <enableGoogleAnalytics>1</enableGoogleAnalytics>
                            <googleAnalyticsId>UA-17511903-1</googleAnalyticsId>
                            <generateToc>
                                appendix toc,title
                                article/appendix nop
                                article toc,title
                                book toc,title,figure,table,example,equation
                                chapter toc,title
                                section toc
                                part toc,title
                                qandadiv toc
                                qandaset toc
                                reference toc,title
                                set toc,title
                            </generateToc>
                            <!-- The following element sets the autonumbering of sections in output for chapter numbers but no numbered sections -->
                            <sectionAutolabel>0</sectionAutolabel>
                            <tocSectionDepth>1</tocSectionDepth>
                            <sectionLabelIncludesComponentLabel>0</sectionLabelIncludesComponentLabel>
                            <webhelpDirname>arch-design</webhelpDirname>
                            <pdfFilenameBase>arch-design</pdfFilenameBase>
                        </configuration>
                    </execution>
                </executions>
                <configuration>
                    <!-- These parameters apply to pdf and webhelp -->
                    <xincludeSupported>true</xincludeSupported>
                    <sourceDirectory>.</sourceDirectory>
                    <includes>
                        bk-openstack-arch-design.xml
                    </includes>
                    <canonicalUrlBase>http://docs.openstack.org/openstack-arch-design/content</canonicalUrlBase>
                    <glossaryCollection>${basedir}/../glossary/glossary-terms.xml</glossaryCollection>
                    <branding>openstack</branding>
                    <formalProcedures>0</formalProcedures>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
@ -0,0 +1,62 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="desktop-as-a-service">
    <?dbhtml stop-chunking?>
    <title>Desktop as a Service</title>
    <para>Virtual Desktop Infrastructure (VDI) is a service that hosts
        user desktop environments on remote servers. This application
        is very sensitive to network latency and requires a high
        performance compute environment. Traditionally, these types of
        environments have not been put on cloud environments because
        few clouds are built to support such a demanding workload that
        is so exposed to end users. Recently, as cloud environments
        have become more robust, vendors have started to provide services
        that allow virtual desktops to be hosted in the cloud. In the
        not too distant future, OpenStack could be used as the
        underlying infrastructure to run a virtual desktop
        infrastructure environment, either in-house or in the cloud.</para>
    <section xml:id="challenges"><title>Challenges</title>
    <para>Designing an infrastructure that is suitable to host virtual
        desktops is a very different task from that for most virtual
        workloads. The infrastructure design needs to account for a
        number of factors, for example:</para>
    <itemizedlist>
        <listitem>
            <para>Boot storms: what happens when hundreds or
                thousands of users log in during shift changes
                affects the storage design.</para>
        </listitem>
        <listitem>
            <para>The performance of the applications running in these
                virtual desktops.</para>
        </listitem>
        <listitem>
            <para>The operating system and its compatibility with the
                OpenStack hypervisor.</para>
        </listitem>
    </itemizedlist></section>
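The boot storm item above can be sized with back-of-the-envelope arithmetic. The per-desktop IOPS figure below is an assumption for illustration and must be replaced with measured values for the actual guest image:

```python
def boot_storm_iops(desktops, iops_per_boot=50):
    """Estimate peak storage IOPS if all desktops boot in the same window.

    iops_per_boot is an assumed average for a single booting desktop;
    real values depend on the guest image and must be measured.
    """
    return desktops * iops_per_boot

# 2,000 users logging in at a shift change:
print(boot_storm_iops(2000))  # 100000 IOPS the storage back end must absorb
```

Estimates like this are why VDI storage designs lean on caching and shared base images rather than raw spindle counts.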
    <section xml:id="broker"><title>Broker</title>
    <para>The Connection Broker is a central component of the
        architecture that determines which Remote Desktop Host will be
        assigned or connected to the user. The broker is often a
        full-blown management product that allows for the automated
        deployment and provisioning of Remote Desktop Hosts.</para></section>
    <section xml:id="possible-solutions"><title>Possible Solutions</title>
    <para>There are a number of commercial products available today that
        provide such a broker solution, but nothing that is native to
        the OpenStack project. There is, of course, also the option of
        not providing a broker and managing connections manually, but
        this would not suffice as a large-scale, enterprise
        solution.</para></section>
    <section xml:id="diagram"><title>Diagram</title>
    <mediaobject>
        <imageobject>
            <imagedata
                fileref="../images/Specialized_VDI1.png"
                />
        </imageobject>
    </mediaobject></section>
</section>
doc/arch-design/specialized/section_hardware_specialized.xml
@ -0,0 +1,45 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="specialized-hardware">
    <?dbhtml stop-chunking?>
    <title>Specialized Hardware</title>
    <para>Certain workloads require specialized hardware devices that
        are either difficult to virtualize or impossible to share.
        Applications such as load balancers, highly parallel brute
        force computing, and direct-to-wire networking may need
        capabilities that basic OpenStack components do not
        provide.</para>
    <section xml:id="challenges-specialized-hardware"><title>Challenges</title>
    <para>Some applications need access to hardware devices to either
        improve performance or provide capabilities beyond
        virtual CPU, RAM, network, or storage. These can be a shared
        resource, such as a cryptography processor, or a dedicated
        resource, such as a Graphics Processing Unit. OpenStack has
        ways of providing some of these, while others may need extra
        work.</para></section>
    <section xml:id="solutions-specialized-hardware"><title>Solutions</title>
    <para>In order to provide cryptography offloading to a set of
        instances, it is possible to use Glance configuration options
        to assign the cryptography chip to a device node in the guest.
        The documentation at
        <link xlink:href="http://docs.openstack.org/cli-reference/content/chapter_cli-glance-property.html">http://docs.openstack.org/cli-reference/content/chapter_cli-glance-property.html</link>
        contains further information on configuring this solution.
        Note that it allows all guests using the configured images to
        access the hypervisor cryptography device.</para>
    <para>If direct access to a specific device is required, it can be
        dedicated to a single instance per hypervisor through the use
        of PCI pass-through. The OpenStack administrator needs to
        define a flavor that specifically requests the PCI device in
        order to properly schedule instances. More information regarding
        PCI pass-through, including instructions for implementing and
        using it, is available at
        <link xlink:href="https://wiki.openstack.org/wiki/Pci_passthrough#How_to_check_PCI_status_with_PCI_api_patches">https://wiki.openstack.org/wiki/Pci_passthrough#How_to_check_PCI_status_with_PCI_api_patches</link>.</para>
    <mediaobject>
        <imageobject>
            <imagedata fileref="../images/Specialized_Hardware2.png"/>
        </imageobject>
    </mediaobject></section>
</section>