Merge "[arch-design-draft] Migrate technical requirements content"

==================
Hardware selection
==================

Hardware selection involves three key areas:

* Network

* Compute

* Storage

Network hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

The network architecture determines which network hardware will be
used. Networking software is determined by the selected networking
hardware.

There are more subtle design impacts that need to be considered. The
selection of certain networking hardware (and the networking software)
affects the management tools that can be used. There are exceptions to
this; the rise of *open* networking software that supports a range of
networking hardware means there are instances where the relationship
between networking hardware and networking software is not as tightly
defined.

For a compute-focused architecture, we recommend designing the network
architecture using a scalable network model that makes it easy to add
capacity and bandwidth. A good example of such a model is the
leaf-spine model. In this type of network design, it is possible to
easily add additional bandwidth as well as scale out to additional
racks of gear. It is important to select network hardware that supports
the required port count, port speed, and port density while also
allowing for future growth as workload demands increase. It is also
important to evaluate where in the network architecture it is valuable
to provide redundancy.

Some of the key considerations in the selection of networking hardware
include:

Port count
   The design requires networking hardware that has the requisite
   port count.

Port density
   The network design is affected by the physical space required to
   provide the requisite port count. A higher port density is
   preferred, as it leaves more rack space for compute or storage
   components that may be required by the design. This can also lead
   into considerations about fault domains and power density. Higher
   density switches are more expensive, therefore it is important not
   to over design the network.

Port speed
   The networking hardware must support the proposed network speed,
   for example: 1 GbE, 10 GbE, 40 GbE, or even 100 GbE.

Redundancy
   User requirements for high availability and cost considerations
   influence the required level of network hardware redundancy.
   Network redundancy can be achieved by adding redundant power
   supplies or paired switches.

   .. note::

      If redundancy is a requirement, the hardware must support this
      configuration. User requirements determine if a completely
      redundant network infrastructure is required.

Power requirements
   Ensure that the physical data center provides the necessary power
   for the selected network hardware.

   .. note::

      This is not an issue for top of rack (ToR) switches. It may be
      an issue for spine switches in a leaf-spine fabric, or end of
      row (EoR) switches.

Protocol support
   It is possible to gain more performance out of a single storage
   system by using specialized network technologies such as RDMA, SRP,
   iSER, and SCST. The specifics of using these technologies are
   beyond the scope of this book.
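
To make the port count, port speed, and redundancy trade-offs concrete,
the following minimal sketch computes the oversubscription ratio of a
hypothetical leaf switch. Every figure in it is an illustrative
assumption, not a recommendation:

.. code-block:: python

   # Hypothetical leaf switch; all figures are assumptions.
   leaf_downlinks = 48     # assumed 10 GbE server-facing ports
   leaf_uplinks = 6        # assumed 40 GbE spine-facing ports
   downlink_gbps = 10
   uplink_gbps = 40

   downlink_capacity = leaf_downlinks * downlink_gbps   # 480 Gbps
   uplink_capacity = leaf_uplinks * uplink_gbps         # 240 Gbps

   # Oversubscription is the worst-case contention for the
   # spine-facing links; 1:1 is non-blocking, and higher ratios
   # trade bandwidth for cost.
   ratio = downlink_capacity / uplink_capacity
   print(f"{ratio:.1f}:1 oversubscription")             # 2.0:1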

There is no single best practice architecture for the networking
hardware supporting an OpenStack cloud that will apply to all
implementations. Some of the key factors that will have a major
influence on selection of networking hardware include:

Connectivity
   All nodes within an OpenStack cloud require network connectivity.
   In some cases, nodes require access to more than one network
   segment. The design must encompass sufficient network capacity and
   bandwidth to ensure that all communications within the cloud, both
   north-south and east-west traffic, have sufficient resources
   available.

Scalability
   The network design should encompass a physical and logical network
   design that can be easily expanded upon. Network hardware should
   offer the appropriate types of interfaces and speeds that are
   required by the hardware nodes.

Availability
   To ensure access to nodes within the cloud is not interrupted, we
   recommend that the network architecture identify any single points
   of failure and provide some level of redundancy or fault tolerance.
   The network infrastructure often involves the use of networking
   protocols such as LACP, VRRP, or others to achieve a highly
   available network connection. It is also important to consider the
   networking implications for API availability. We recommend
   designing a load balancing solution within the network architecture
   to ensure that the APIs, and potentially other services in the
   cloud, are highly available.

Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following factors when selecting compute (server)
hardware:

* Server density
  A measure of how many servers can fit into a given measure of
  physical space, such as a rack unit [U].

* Resource capacity
  The number of CPU cores, how much RAM, or how much storage a given
  server delivers.

* Expandability
  The number of additional resources you can add to a server before it
  reaches capacity.

* Cost
  The relative cost of the hardware weighed against the level of
  design effort needed to build the system.

Weigh these considerations against each other to determine the best
design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing
resource capacity and expandability can increase cost but decrease
server density. Decreasing cost often means decreasing supportability,
server density, resource capacity, and expandability.

For a compute-focused cloud, compute capacity (CPU cores and RAM) is
the primary consideration when selecting server hardware. The selected
hardware must supply adequate CPU sockets, CPU cores, and RAM; network
connectivity and storage capacity are less critical, and need only be
sufficient to meet the user requirements.

When designing an OpenStack cloud architecture, you must consider
whether you intend to scale up or scale out. Selecting a smaller number
of larger hosts, or a larger number of smaller hosts, depends on a
combination of factors: cost, power, cooling, physical rack and floor
space, support-warranty, and manageability.

Consider the following when selecting a server hardware form factor
suited for your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For
  example, high density blade servers such as HP BladeSystem or Dell
  PowerEdge M1000e support up to 16 servers in only ten rack units.
  Using half-height blades is twice as dense as using full-height
  blades, which results in only eight servers per ten rack units.

* 1U rack-mounted servers can offer greater server density than a
  blade server solution, but are often limited to dual-socket,
  multi-core CPU configurations. It is possible to place forty 1U
  servers in a rack, providing space for the top of rack (ToR)
  switches, compared to 32 full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the
  density that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other factors that influence server hardware selection for an
OpenStack design architecture include:

Instance density
   More hosts are required to support the anticipated scale if the
   design architecture uses dual-socket hardware designs.

   For a general purpose OpenStack cloud, sizing is an important
   consideration. The expected or anticipated number of instances that
   each hypervisor can host is a common meter used in sizing the
   deployment. The selected server hardware needs to support the
   expected or anticipated instance density.

Host density
   Another option to address the higher host count is to use a
   quad-socket platform. Taking this approach decreases host density
   which also increases rack count. This configuration affects the
   number of power connections and also impacts network and cooling
   requirements.

   Physical data centers have limited physical space, power, and
   cooling. The number of hosts (or hypervisors) that can be fitted
   into a given metric (rack, rack unit, or floor tile) is another
   important method of sizing. Floor weight is an often overlooked
   consideration. The data center floor must be able to support the
   weight of the proposed number of hosts within a rack or set of
   racks. These factors need to be applied as part of the host density
   calculation and server hardware selection.

Power and cooling density
   The power and cooling density requirements might be lower than with
   blade, sled, or 1U server designs due to lower host density (by
   using 2U, 3U, or even 4U server designs). For data centers with
   older infrastructure, this might be a desirable feature.

   Data centers have a specified amount of power fed to a given rack
   or set of racks. Older data centers may have a power density as low
   as 20 amps per rack, while more recent data centers can be
   architected to support power densities as high as 120 amps per
   rack. The selected server hardware must take power density into
   account.

Network connectivity
   The selected server hardware must have the appropriate number of
   network connections, as well as the right type of network
   connections, in order to support the proposed architecture. Ensure
   that, at a minimum, there are at least two diverse network
   connections coming into each rack.
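
As a back-of-the-envelope illustration of how instance density, host
density, and power density interact, the following sketch estimates
host and rack counts from an instance target. Every input is an
illustrative assumption:

.. code-block:: python

   import math

   # All inputs are illustrative assumptions, not recommendations.
   target_instances = 2000
   instances_per_host = 40         # assumed instance density
   hosts_per_rack = 20             # assumed 2U servers per 42U rack
   watts_per_host = 650            # assumed draw under load
   rack_power_budget_w = 20 * 208  # assumed 20 amp feed at 208 V

   hosts = math.ceil(target_instances / instances_per_host)  # 50
   racks = math.ceil(hosts / hosts_per_rack)                 # 3

   # Check the power density constraint before fixing the layout.
   rack_draw_w = hosts_per_rack * watts_per_host             # 13000
   if rack_draw_w > rack_power_budget_w:
       print(f"Rack draw {rack_draw_w} W exceeds the "
             f"{rack_power_budget_w} W budget; spread hosts across "
             "more racks or provision denser power.")
   print(f"{hosts} hosts across {racks} racks")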

The selection of form factors or architectures affects the selection
of server hardware. Ensure that the selected server hardware is
configured to support enough storage capacity (or storage
expandability) to match the requirements of the selected scale-out
storage solution. Similarly, the network architecture impacts the
server hardware selection and vice versa.

Hardware for general purpose OpenStack cloud
--------------------------------------------

Hardware for a general purpose OpenStack cloud should reflect a cloud
with no pre-defined usage model, designed to run a wide variety of
applications with varying resource usage requirements. These
applications include any of the following:

* RAM-intensive

* CPU-intensive

* Storage-intensive

Certain hardware form factors may better suit a general purpose
OpenStack cloud due to the requirement for an equal (or nearly equal)
balance of resources. Server hardware must provide the following:

* Equal (or nearly equal) balance of compute capacity (RAM and CPU)

* Network capacity (number and speed of links)

* Storage capacity (gigabytes or terabytes as well as
  :term:`Input/Output Operations Per Second (IOPS)`)

The best form factor for server hardware supporting a general purpose
OpenStack cloud is driven by outside business and cost factors. No
single reference architecture applies to all implementations; the
decision must flow from user requirements, technical considerations,
and operational considerations.

Selecting storage hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~

The storage hardware architecture is determined by selecting a
specific storage architecture. Determine the selection of storage
architecture by evaluating possible solutions against the critical
factors: the user requirements, technical considerations, and
operational considerations. Consider the following factors when
selecting storage hardware:

Cost
   Storage can be a significant portion of the overall system cost.
   For an organization that is concerned with vendor support, a
   commercial storage solution is advisable, although it comes with a
   higher price tag. If initial capital expenditure requires
   minimization, designing a system based on commodity hardware would
   apply. The trade-off is potentially higher support costs and a
   greater risk of incompatibility and interoperability issues.

Performance
   The latency of storage I/O requests indicates performance.
   Performance requirements affect which solution you choose.

Scalability
   Scalability, along with expandability, is a major consideration in
   a general purpose OpenStack cloud. It might be difficult to predict
   the final intended size of the implementation as there are no
   established usage patterns for a general purpose cloud. It might
   become necessary to expand the initial deployment in order to
   accommodate growth and user demand.

Expandability
   Expandability is a major architecture factor for storage solutions
   with a general purpose OpenStack cloud. A storage solution that
   expands to 50 PB is considered more expandable than a solution that
   only scales to 10 PB. This meter is related to scalability, which
   is the measure of a solution's performance as it expands.

General purpose cloud storage requirements
------------------------------------------

Using a scale-out storage solution with direct-attached storage (DAS)
in the servers is well suited for a general purpose OpenStack cloud.
Cloud services requirements determine your choice of scale-out
solution. You need to determine whether a single, highly expandable
and highly vertically scalable, centralized storage array is suitable
for your design. After determining an approach, select the storage
hardware based on these criteria.

This list expands upon the potential impacts of including a particular
storage architecture (and corresponding storage hardware) in the
design for a general purpose OpenStack cloud:

Connectivity
   If storage protocols other than Ethernet are part of the storage
   solution, ensure the appropriate hardware has been selected. If a
   centralized storage array is selected, ensure that the hypervisor
   will be able to connect to that storage array for image storage.

Usage
   How the particular storage architecture will be used is critical
   for determining the architecture. Some of the configurations that
   will influence the architecture include whether it will be used by
   the hypervisors for ephemeral instance storage, or if OpenStack
   Object Storage will use it for object storage.

Instance and image locations
   Where instances and images will be stored will influence the
   architecture.

Server hardware
   If the solution is a scale-out storage architecture that includes
   DAS, it will affect the server hardware selection. This could
   ripple into the decisions that affect host density, instance
   density, power density, OS-hypervisor, management tools, and
   others.

A general purpose OpenStack cloud has multiple options. The key
factors that will have an influence on the selection of storage
hardware for a general purpose OpenStack cloud are as follows:

Capacity
   Hardware resources selected for the resource nodes should be
   capable of supporting enough storage for the cloud services.
   Defining the initial requirements and ensuring the design can
   support adding capacity is important. Hardware nodes selected for
   object storage should be capable of supporting a large number of
   inexpensive disks with no reliance on RAID controller cards.
   Hardware nodes selected for block storage should be capable of
   supporting high speed storage solutions and RAID controller cards
   to provide performance and redundancy to storage at a hardware
   level. Selecting hardware RAID controllers that automatically
   repair damaged arrays will assist with the replacement and repair
   of degraded or deleted storage devices.

Performance
   Disks selected for object storage services do not need to be fast
   performing disks. We recommend that object storage nodes take
   advantage of the best cost per terabyte available for storage.
   Contrastingly, disks chosen for block storage services should take
   advantage of performance boosting features that may entail the use
   of SSDs or flash storage to provide high performance block storage
   pools. Storage performance of ephemeral disks used for instances
   should also be taken into consideration.

Fault tolerance
   Object storage resource nodes have no requirements for hardware
   fault tolerance or RAID controllers. It is not necessary to plan
   for fault tolerance within the object storage hardware because the
   object storage service provides replication between zones as a
   feature of the service. Block storage nodes, compute nodes, and
   cloud controllers should all have fault tolerance built in at the
   hardware level by making use of hardware RAID controllers and
   varying levels of RAID configuration. The level of RAID chosen
   should be consistent with the performance and availability
   requirements of the cloud.
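
Because object storage relies on replication rather than RAID, raw and
usable capacity diverge quickly. A minimal sketch of that arithmetic,
using assumed node and disk counts and the Object Storage default of
three replicas:

.. code-block:: python

   # Node, disk, and headroom figures are illustrative assumptions;
   # the replica count of three is the Object Storage default.
   nodes = 10
   disks_per_node = 12
   disk_tb = 8
   replicas = 3

   raw_tb = nodes * disks_per_node * disk_tb   # 960 TB raw
   usable_tb = raw_tb / replicas               # 320 TB usable
   # Keep headroom so rebalancing after a disk or node failure has
   # somewhere to place new replicas; 80% fill is a conservative cap.
   planned_tb = usable_tb * 0.8                # 256 TB plannable
   print(f"{raw_tb} TB raw -> ~{planned_tb:.0f} TB plannable")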

Storage-focused cloud storage requirements
-------------------------------------------

Storage-focused OpenStack clouds must address I/O intensive workloads.
These workloads are not CPU intensive, nor are they consistently
network intensive. The network may be heavily utilized to transfer
storage, but the workloads are not otherwise network intensive.

The selection of storage hardware determines the overall performance
and scalability of a storage-focused OpenStack design architecture.
Several factors impact the design process, including:

Latency is a key consideration in a storage-focused OpenStack cloud.
Using solid-state disks (SSDs) minimizes latency, reduces CPU delays
caused by waiting for the storage, and increases performance. Use RAID
controller cards in compute hosts to improve the performance of the
underlying disk subsystem.

Depending on the storage architecture, you can adopt a scale-out
solution, or use a highly expandable and scalable centralized storage
array. If a centralized storage array meets your requirements, then
the array vendor determines the hardware selection. It is possible to
build a storage array using commodity hardware with Open Source
software, but this requires people with expertise to build such a
system.

On the other hand, a scale-out storage solution that uses
direct-attached storage (DAS) in the servers may be an appropriate
choice. This requires configuration of the server hardware to support
the storage solution.

Considerations affecting the storage architecture (and corresponding
storage hardware) of a storage-focused OpenStack cloud include:

Connectivity
   Ensure the connectivity matches the storage solution requirements.
   We recommend confirming that the network characteristics minimize
   latency to boost the overall performance of the design.

Latency
   Determine if the use case has consistent or highly variable
   latency.

Throughput
   Ensure that the storage solution throughput is optimized for your
   application requirements.

Server hardware
   Use of DAS impacts the server hardware choice and affects host
   density, instance density, power density, OS-hypervisor, and
   management tools.

======================
Logging and monitoring
======================

OpenStack clouds require appropriate monitoring platforms to catch and
manage errors.

.. note::

   We recommend leveraging existing monitoring systems to see if they
   are able to effectively monitor an OpenStack environment.

Specific meters that are critically important to capture include:

* Image disk utilization

* Response time to the Compute API

Logging and monitoring does not significantly differ for a multi-site
OpenStack cloud. The tools described in the `Logging and monitoring
chapter <http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`__
of the Operations Guide remain applicable. Logging and monitoring can
be provided on a per-site basis, and in a common centralized location.

When attempting to deploy logging and monitoring facilities to a
centralized location, care must be taken with the load placed on the
inter-site networking links.

==========
Networking
==========

OpenStack clouds generally have multiple network segments, with each
segment providing access to particular resources. The network segments
themselves also require network communication paths that should be
separated from the other networks. When designing network services for
a general purpose cloud, plan for either a physical or logical
separation of network segments used by operators and tenants.
Additional network segments can also be created for access to internal
services such as the message bus and database used by various systems.
Segregating these services onto separate networks helps protect
sensitive data and prevent unauthorized access.

Choose a networking service based on the requirements of your
instances. The architecture and design of your cloud will impact
whether you choose OpenStack Networking (neutron) or legacy networking
(nova-network).

Networking (neutron)
~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) is a first class networking service
that gives tenants full control over the creation of virtual network
resources. This is often accomplished in the form of tunneling
protocols that establish encapsulated communication paths over
existing network infrastructure in order to segment tenant traffic.
These methods vary depending on the specific implementation, but some
of the more common methods include tunneling over GRE, encapsulating
with VXLAN, and VLAN tags.

We recommend designing at least three network segments. The first
segment should be a public network, used by tenants and operators to
access the REST APIs. The controller nodes and swift proxies are the
only devices connecting to this network segment. In some cases, this
public network might also be serviced by hardware load balancers and
other network devices.

The second segment is used by administrators to manage hardware
resources. Configuration management tools also utilize this segment
for deploying software and services onto new hardware. In some cases,
this network segment is also used for internal services, including the
message bus and database services. This segment needs to communicate
with every hardware node. Due to the highly sensitive nature of this
network segment, it needs to be secured from unauthorized access.

The third network segment is used by applications and consumers to
access the physical network, and for users to access applications.
This network is segregated from the one used to access the cloud APIs
and is not capable of communicating directly with the hardware
resources in the cloud. Communication on this network segment is
required by compute resource nodes and network gateway services that
allow application data to access the physical network from outside the
cloud.

Legacy networking (nova-network)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The legacy networking (nova-network) service is primarily a layer-2
networking service. It functions in two modes: flat networking mode
and VLAN mode. In flat network mode, all network hardware nodes and
devices throughout the cloud are connected to a single layer-2 network
segment that provides access to application data.

However, when the network devices in the cloud support segmentation
using VLANs, legacy networking can operate in the second mode. In this
design model, each tenant within the cloud is assigned a network
subnet which is mapped to a VLAN on the physical network. It is
especially important to remember that the maximum number of VLANs that
can be used within a spanning tree domain is 4096. This places a hard
limit on the amount of growth possible within the data center.
Consequently, when designing a general purpose cloud intended to
support multiple tenants, we recommend using legacy networking with
VLANs, and not flat network mode.
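
The 4096 figure comes from the 12-bit VLAN ID field in the 802.1Q
header; after reserving IDs 0 and 4095, slightly fewer segments are
actually usable. A quick sketch of the ceiling this places on tenant
growth, assuming one VLAN per tenant:

.. code-block:: python

   # The 802.1Q VLAN ID is a 12-bit field; IDs 0 and 4095 are reserved.
   vlan_id_bits = 12
   usable_vlans = 2 ** vlan_id_bits - 2   # 4094 usable IDs

   # Assuming each tenant consumes one VLAN, the spanning tree domain
   # caps out at one tenant network per usable VLAN ID.
   print(f"Hard ceiling: {usable_vlans} tenant networks per domain")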

Layer-2 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A network designed on layer-2 protocols has advantages over a network
designed on layer-3 protocols. In spite of the difficulties of using a
bridge to perform the network role of a router, many vendors,
customers, and service providers choose to use Ethernet in as many
parts of their networks as possible. The benefits of selecting a
layer-2 design are:

* Ethernet frames contain all the essentials for networking. These
  include, but are not limited to, globally unique source addresses,
  globally unique destination addresses, and error control.

* Ethernet frames can carry any kind of packet. Networking at layer-2
  is independent of the layer-3 protocol.

* Adding more layers to the Ethernet frame only slows the networking
  process down. This is known as nodal processing delay.

* You can add adjunct networking features, for example class of
  service (CoS) or multicasting, to Ethernet as readily as to IP
  networks.

* VLANs are an easy mechanism for isolating networks.

Most information starts and ends inside Ethernet frames. Today this
applies to data, voice, and video. The concept is that the network
will benefit more from the advantages of Ethernet if the transfer of
information from a source to a destination is in the form of Ethernet
frames.

Although it is not a substitute for IP networking, networking at
layer-2 can be a powerful adjunct to IP networking.

Layer-2 Ethernet usage has these additional advantages over layer-3 IP
network usage:

* Speed.
* Reduced overhead of the IP hierarchy.
* No need to keep track of address configuration as systems move
  around.

Whereas the simplicity of layer-2 protocols might work well in a data
center with hundreds of physical machines, cloud data centers have the
additional burden of needing to keep track of all virtual machine
addresses and networks. In these data centers, it is not uncommon for
one physical node to support 30-40 instances.

.. important::

   Networking at the frame level says nothing about the presence or
   absence of IP addresses at the packet level. Almost all ports,
   links, and devices on a network of LAN switches still have IP
   addresses, as do all the source and destination hosts. There are
   many reasons for the continued need for IP addressing. The largest
   one is the need to manage the network. A device or link without an
   IP address is usually invisible to most management applications.
   Utilities including remote access for diagnostics, file transfer of
   configurations and software, and similar applications cannot run
   without IP addresses as well as MAC addresses.

Layer-2 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Layer-2 network architectures have some limitations that become
noticeable when used outside of traditional data centers:

* The number of VLANs is limited to 4096.
* The number of MACs stored in switch tables is limited.
* You must accommodate the need to maintain a set of layer-4 devices
  to handle traffic control.
* MLAG, often used for switch redundancy, is a proprietary solution
  that does not scale beyond two devices and forces vendor lock-in.
* It can be difficult to troubleshoot a network without IP addresses
  and ICMP.
* Configuring ARP can be complicated on large layer-2 networks.
* All network devices need to be aware of all MACs, even instance
  MACs, so there is constant churn in MAC tables and network state
  changes as instances start and stop.
* Migrating MACs (instance migration) to different physical locations
  is a potential problem if you do not set ARP table timeouts
  properly.

It is important to know that layer-2 has a very limited set of network
management tools. It is difficult to control traffic, as layer-2 does
not have mechanisms to manage the network or shape the traffic.
Network troubleshooting is also troublesome, in part because network
devices have no IP addresses. As a result, there is no reasonable way
to check network delay.

In a layer-2 network all devices are aware of all MACs, even those
that belong to instances. The network state information in the
backbone changes whenever an instance starts or stops. Because of
this, there is far too much churn in the MAC tables on the backbone
switches.

Furthermore, on large layer-2 networks, configuring ARP learning can
be complicated. The setting for the MAC address timer on switches is
critical and, if set incorrectly, can cause significant performance
problems. So when migrating MACs to different physical locations to
support instance migration, problems may arise. As an example, the
Cisco default MAC address timer is extremely long. As such, the
network information maintained in the switches could be out of sync
with the new location of the instance.

Layer-3 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In layer-3 networking, routing takes instance MAC and IP addresses out
of the network core, reducing state churn. The only time there would
be a routing state change is in the case of a Top of Rack (ToR) switch
failure or a link failure in the backbone itself. Other advantages of
using a layer-3 architecture include:

* Layer-3 networks provide the same level of resiliency and
  scalability as the Internet.

* Controlling traffic with routing metrics is straightforward.

* You can configure layer-3 to use BGP confederation for scalability.
  This way core routers have state proportional to the number of
  racks, not to the number of servers or instances.

* There are a variety of well tested tools, such as ICMP, to monitor
  and manage traffic.

* Layer-3 architectures enable the use of :term:`quality of service
  (QoS)` to manage network performance.

Layer-3 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main limitation of layer 3 is that there is no built-in isolation
mechanism comparable to the VLANs in layer-2 networks. Furthermore,
the hierarchical nature of IP addresses means that an instance is on
the same subnet as its physical host, making migration out of the
subnet difficult. For these reasons, network virtualization needs to
use IP encapsulation and software at the end hosts. This provides
isolation and separates the addressing in the virtual layer from the
addressing in the physical layer. Other potential disadvantages of
layer 3 include the need to design an IP addressing scheme rather than
relying on the switches to keep track of the MAC addresses
automatically, and the need to configure the interior gateway routing
protocol in the switches.

Network design
~~~~~~~~~~~~~~

There are many reasons an OpenStack network has complex requirements.
One main factor is the many components that interact at different
levels of the system stack, adding complexity. Data flows are also
complex. Data in an OpenStack cloud moves both between instances
across the network (known as east-west traffic), as well as in and out
of the system (known as north-south traffic). Physical server nodes
have network requirements that are independent of instance network
requirements; you must isolate these from the core network to account
for scalability. We recommend functionally separating the networks for
security purposes and tuning performance through traffic shaping.

You must consider a number of important general technical and business
factors when planning and designing an OpenStack network. These
include:

* A requirement for vendor independence. To avoid hardware or software
  vendor lock-in, the design should not rely on specific features of a
  vendor's router or switch.
* A requirement to massively scale the ecosystem to support millions
  of end users.
* A requirement to support indeterminate platforms and applications.
* A requirement to design for cost efficient operations to take
  advantage of massive scale.
* A requirement to ensure that there is no single point of failure in
  the cloud ecosystem.
* A requirement for high availability architecture to meet customer
  SLA requirements.
* A requirement to be tolerant of rack level failure.
* A requirement to maximize flexibility to architect future production
  environments.

Bearing in mind these considerations, we recommend the following:

* Layer-3 designs are preferable to layer-2 architectures.
* Design a dense multi-path network core to support multi-directional
  scaling and flexibility.
* Use hierarchical addressing because it is the only viable option to
  scale the network ecosystem (see the sketch below).
* Use virtual networking to isolate instance service network traffic
  from the management and internal network traffic.
* Isolate virtual networks using encapsulation technologies.
* Use traffic shaping for performance tuning.
* Use eBGP to connect to the Internet up-link.
* Use iBGP to flatten the internal traffic on the layer-3 mesh.
* Determine the most effective configuration for the block storage
  network.
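
To illustrate why hierarchical addressing keeps core routing state
proportional to racks rather than hosts, here is a minimal sketch
using Python's standard ``ipaddress`` module; the supernet and prefix
sizes are illustrative assumptions:

.. code-block:: python

   import ipaddress

   # Carve an assumed 10.32.0.0/16 supernet into per-rack /24 subnets,
   # so the core advertises one route per rack instead of per host.
   supernet = ipaddress.ip_network("10.32.0.0/16")
   rack_subnets = list(supernet.subnets(new_prefix=24))   # 256 racks

   for rack_id, subnet in enumerate(rack_subnets[:3]):
       print(f"rack-{rack_id:03d}: {subnet}")
   # rack-000: 10.32.0.0/24
   # rack-001: 10.32.1.0/24
   # rack-002: 10.32.2.0/24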

Operator considerations
-----------------------

The network design for an OpenStack cluster includes decisions
regarding the interconnect needs within the cluster, the need to allow
clients to access their resources, and the access requirements for
operators to administer the cluster. You should consider the
bandwidth, latency, and reliability of these networks.

Whether you are using an external provider or an internal team, you
need to consider additional design decisions about monitoring and
alarming. If you are using an external provider, service level
agreements (SLAs) are typically defined in your contract. Operational
considerations such as bandwidth, latency, and jitter can be part of
the SLA.

As demand for network resources increases, make sure your network
design accommodates expansion and upgrades. Operators add additional
IP address blocks and add additional bandwidth capacity. In addition,
consider managing hardware and software lifecycle events, for example
upgrades, decommissioning, and outages, while avoiding service
interruptions for tenants.

Factor maintainability into the overall network design. This includes
the ability to manage and maintain IP addresses as well as the use of
overlay identifiers including VLAN tag IDs, GRE tunnel IDs, and MPLS
tags. As an example, if you need to change all of the IP addresses on
a network, a process known as renumbering, then the design must
support this function.

Address network-focused applications when considering certain
operational realities. For example, consider the impending exhaustion
of IPv4 addresses, the migration to IPv6, and the use of private
networks to segregate different types of traffic that an application
receives or generates. In the case of IPv4 to IPv6 migrations,
applications should follow best practices for storing IP addresses. We
recommend you avoid relying on IPv4 features that did not carry over
to the IPv6 protocol or have differences in implementation.

To segregate traffic, allow applications to create a private tenant
network for database and storage network traffic. Use a public network
for services that require direct client access from the Internet. Upon
segregating the traffic, consider :term:`quality of service (QoS)` and
security to ensure each network has the required level of service.

Finally, consider the routing of network traffic. For some
applications, develop a complex policy framework for routing. To
create a routing policy that satisfies business requirements, consider
the economic cost of transmitting traffic over expensive links versus
cheaper links, in addition to bandwidth, latency, and jitter
requirements.

Additionally, consider how to respond to network events. How load
transfers from one link to another during a failure scenario could be
a factor in the design. If you do not plan network capacity correctly,
failover traffic could overwhelm other ports or network links and
create a cascading failure scenario. In this case, traffic that fails
over to one link overwhelms that link and then moves to the subsequent
links until all network traffic stops.

Additional considerations
-------------------------

There are several further considerations when designing a
network-focused OpenStack cloud. One is redundant networking and the
high availability risk analysis for ToR switches. In most cases, it is
much more economical to use a single switch with a small pool of spare
switches to replace failed units than it is to outfit an entire data
center with redundant switches. Applications should tolerate rack
level outages without affecting normal operations, since network and
compute resources are easily provisioned and plentiful.

Research indicates the mean time between failures (MTBF) on switches
is between 100,000 and 200,000 hours. This number is dependent on the
ambient temperature of the switch in the data center. When properly
cooled and maintained, this translates to between 11 and 22 years
before failure. Even in the worst case of poor ventilation and high
ambient temperatures in the data center, the MTBF is still 2-3 years.
See
https://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf
for further information.

The following table compares the two networking services:

.. list-table:: Networking service comparison
   :header-rows: 1

   * - Legacy networking (nova-network)
     - OpenStack Networking (neutron)
   * - Simple, single agent
     - Complex, multiple agents
   * - Flat or VLAN
     - Flat, VLAN, Overlays, L2-L3, SDN
   * - No plug-in support
     - Plug-in support for 3rd parties
   * - No multi-tier topologies
     - Multi-tier topologies

Preparing for the future: IPv6 support
--------------------------------------

One of the most important networking topics today is the exhaustion of
IPv4 addresses. As of late 2015, ICANN announced that the final IPv4
address blocks have been fully assigned. Because of this, the IPv6
protocol has become the future of network focused applications. IPv6
increases the address space significantly, fixes long standing issues
in the IPv4 protocol, and will become essential for network focused
applications in the future.

OpenStack Networking, when configured for it, supports IPv6. To enable
IPv6, create an IPv6 subnet in Networking and use IPv6 prefixes when
creating security groups.
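
As a sketch of what enabling IPv6 can look like in practice, the
following uses the openstacksdk Python library to create a SLAAC IPv6
subnet. The cloud name, network name, and the documentation prefix
2001:db8:1::/64 are illustrative assumptions:

.. code-block:: python

   import openstack

   # Assumes a clouds.yaml entry named "mycloud" and an existing
   # tenant network named "private".
   conn = openstack.connect(cloud="mycloud")
   network = conn.network.find_network("private")

   subnet = conn.network.create_subnet(
       network_id=network.id,
       name="private-v6",
       ip_version=6,
       cidr="2001:db8:1::/64",   # placeholder documentation prefix
       ipv6_ra_mode="slaac",     # router advertisements assign addresses
       ipv6_address_mode="slaac",
   )
   print(subnet.id)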

Asymmetric links
----------------

When designing a network architecture, the traffic patterns of an
application heavily influence the allocation of total bandwidth and
the number of links that you use to send and receive traffic.
Applications that provide file storage for customers allocate
bandwidth and links to favor incoming traffic, whereas video streaming
applications allocate bandwidth and links to favor outgoing traffic.

Performance
-----------

It is important to analyze an application's tolerance for latency and
jitter when designing an environment to support network focused
applications. Certain applications, for example VoIP, are less
tolerant of latency and jitter. Where latency and jitter are issues,
certain applications may require tuning of QoS parameters and network
device queues to ensure that they queue for transmit immediately or
are guaranteed minimum bandwidth. Since OpenStack currently does not
support these functions, consider your selected network plug-in
carefully.

The location of a service may also impact the application or consumer
experience. If an application serves differing content to different
users, it must properly direct connections to those specific
locations. Where appropriate, use a multi-site installation for these
situations.

You can implement networking in two separate ways. Legacy networking
(nova-network) provides a flat DHCP network with a single broadcast
domain. This implementation does not support tenant isolation networks
or advanced plug-ins, but it is currently the only way to implement a
distributed layer-3 (L3) agent using the multi-host configuration.
OpenStack Networking (neutron) is the official networking
implementation and provides a pluggable architecture that supports a
large variety of network methods. Some of these include a layer-2 only
provider network model, external device plug-ins, or even OpenFlow
controllers.

Networking at large scales becomes a set of boundary questions. How
large a layer-2 domain must be is determined by the number of nodes
within the domain and the amount of broadcast traffic that passes
between instances. Breaking layer-2 boundaries may require the
implementation of overlay networks and tunnels. This decision is a
balancing act between the need for smaller overhead and the need for a
smaller domain.

When selecting network devices, be aware that making a decision based
on the greatest port density often comes with a drawback. Aggregation
switches and routers have not all kept pace with Top of Rack switches
and may induce bottlenecks on north-south traffic. As a result, it may
be possible for massive amounts of downstream network utilization to
impact upstream network devices, impacting service to the cloud. Since
OpenStack does not currently provide a mechanism for traffic shaping
or rate limiting, it is necessary to implement these features at the
network hardware level.

Tunable networking components
-----------------------------

When designing for network intensive workloads, consider configurable
networking components related to an OpenStack architecture design,
such as MTU and QoS. Some workloads require a larger MTU than normal
due to the transfer of large blocks of data. When providing network
service for applications such as video streaming or storage
replication, we recommend that you configure both OpenStack hardware
nodes and the supporting network equipment for jumbo frames where
possible. This allows for better use of available bandwidth. Configure
jumbo frames across the complete path the packets traverse. If one
network component is not capable of handling jumbo frames, then the
entire path reverts to the default MTU.

:term:`Quality of Service (QoS)` also has a great impact on network
intensive workloads, as it provides instant service to packets which
have a higher priority, mitigating the impact of poor network
performance. In applications such as Voice over IP (VoIP),
differentiated services code points are a near requirement for proper
operation. You can also use QoS in the opposite direction for mixed
workloads to prevent low priority but high bandwidth applications, for
example backup services, video conferencing, or file sharing, from
blocking bandwidth that is needed for the proper operation of other
workloads. It is possible to tag file storage traffic as a lower
class, such as best effort or scavenger, to allow the higher priority
traffic through. In cases where regions within a cloud might be
geographically distributed, it may also be necessary to implement WAN
optimization to combat latency or packet loss.
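
The "weakest link" behavior of jumbo frames is worth making explicit:
the usable MTU of a path is the minimum MTU of every hop. A minimal
sketch, with hop names and values as illustrative assumptions:

.. code-block:: python

   # A single default-MTU (1500) device silently defeats jumbo frames
   # end to end. Hop names and MTU values are assumptions.
   path_mtus = {
       "compute-node-nic": 9000,
       "tor-switch": 9000,
       "aggregation-switch": 1500,   # one component left at default
       "storage-node-nic": 9000,
   }

   effective_mtu = min(path_mtus.values())
   print(f"Effective path MTU: {effective_mtu}")   # 1500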

==================
Software selection
==================

Software selection, particularly for a general purpose OpenStack
architecture design, involves three areas:

* Operating system (OS) and hypervisor

* OpenStack components

* Supplemental software

Operating system and hypervisor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The operating system (OS) and hypervisor have a significant impact on
the overall design. Selecting a particular operating system and
hypervisor can directly affect server hardware selection. Make sure
the storage hardware and topology support the selected operating
system and hypervisor combination. Also ensure the networking hardware
selection and topology will work with the chosen operating system and
hypervisor combination.

Some areas that could be impacted by the selection of OS and
hypervisor include:

Cost
   Selecting a commercially supported hypervisor, such as Microsoft
   Hyper-V, will result in a different cost model than a
   community-supported open source hypervisor such as
   :term:`KVM<kernel-based VM (KVM)>` or :term:`Xen`. When comparing
   open source OS solutions, choosing Ubuntu over Red Hat (or vice
   versa) will have an impact on cost due to support contracts.

Support
   Depending on the selected hypervisor, staff should have the
   appropriate training and knowledge to support the selected OS and
   hypervisor combination. If they do not, training will need to be
   provided, which could have a cost impact on the design.

Management tools
   The management tools used for Ubuntu and KVM differ from the
   management tools for VMware vSphere. Although both OS and
   hypervisor combinations are supported by OpenStack, there will be a
   different impact on the rest of the design as a result of selecting
   one combination versus the other.

Scale and performance
   Ensure that the selected OS and hypervisor combinations meet the
   appropriate scale and performance requirements. The chosen
   architecture will need to meet the targeted instance-host ratios
   with the selected OS-hypervisor combinations.

Security
   Ensure that the design can accommodate regular periodic
   installations of application security patches while maintaining
   required workloads. The frequency of security patches for the
   proposed OS-hypervisor combination will have an impact on
   performance, and the patch installation process could affect
   maintenance windows.

Supported features
   Determine which OpenStack features are required. This will often
   determine the selection of the OS-hypervisor combination. Some
   features are only available with specific operating systems or
   hypervisors.

Interoperability
   You will need to consider how the OS and hypervisor combination
   interacts with other operating systems and hypervisors, including
   other software solutions. Operational troubleshooting tools for one
   OS-hypervisor combination may differ from the tools used for
   another OS-hypervisor combination and, as a result, the design will
   need to address whether the two sets of tools need to interoperate.

OpenStack components
~~~~~~~~~~~~~~~~~~~~

Selecting which OpenStack components are included in the overall
design is important. Some OpenStack components, like compute and Image
service, are required in every architecture. Other components, like
Orchestration, are not always required.

A compute-focused OpenStack design architecture may contain the
following components:

* Identity (keystone)

* Dashboard (horizon)

* Compute (nova)

* Object Storage (swift)

* Image (glance)

* Networking (neutron)

* Orchestration (heat)

.. note::

   A compute-focused design is less likely to include OpenStack Block
   Storage. However, there may be some situations where the need for
   performance requires a block storage component to improve data I/O.

Excluding certain OpenStack components can limit or constrain the
functionality of other components. For example, if the architecture
includes Orchestration but excludes Telemetry, then the design will
not be able to take advantage of Orchestration's auto-scaling
functionality. It is important to research the component
interdependencies in conjunction with the technical requirements
before deciding on the final architecture.

Networking software
~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) provides a wide variety of networking
services for instances. There are many additional networking software
packages that can be useful when managing OpenStack components. Some
examples include:

* Software to provide load balancing

* Network redundancy protocols

* Routing daemons

Some of these software packages are described in more detail in the
`OpenStack network nodes chapter
<http://docs.openstack.org/ha-guide/networking-ha.html>`_ in the
OpenStack High Availability Guide.

For both general purpose and compute-focused OpenStack clouds, the
OpenStack infrastructure components must be highly available. If the
design does not include hardware load balancing, you must add
networking software packages, for example, HAProxy.
Management software
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Management software includes software for providing:
|
||||
|
||||
* Clustering
|
||||
|
||||
* Logging
|
||||
|
||||
* Monitoring
|
||||
|
||||
* Alerting
|
||||
|
||||
.. important::
|
||||
|
||||
The factors for determining which software packages in this category
|
||||
to select is outside the scope of this design guide.
|
||||
|
||||
The selected supplemental software solution impacts and affects the overall
|
||||
OpenStack cloud design. This includes software for providing clustering,
|
||||
logging, monitoring and alerting.
|
||||
|
||||
The inclusion of clustering software, such as Corosync or Pacemaker, is
primarily determined by the availability requirements of the cloud
infrastructure and the complexity of supporting the configuration after
it is deployed. The `OpenStack High Availability Guide
<http://docs.openstack.org/ha-guide/>`_ provides more details on the
installation and configuration of Corosync and Pacemaker, should these
packages need to be included in the design.

Operational considerations determine the requirements for logging,
monitoring, and alerting. Each of these sub-categories includes various
options.

For example, in the logging sub-category you could select Logstash,
Splunk, Log Insight, or another log aggregation and consolidation tool.
Store logs in a centralized location to facilitate performing analytics
against the data. Log data analytics engines can also provide automation
and issue notification by providing a mechanism to both alert and
automatically attempt to remediate some of the more commonly known
issues.
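
As a small illustration of centralized logging, the standard library can
forward logs to a syslog-speaking aggregator. This is a minimal sketch;
the collector address ``log.example.com`` and the logger name are
assumptions, not part of any OpenStack default:

.. code-block:: python

   import logging
   import logging.handlers

   logger = logging.getLogger('cloud-ops')
   logger.setLevel(logging.INFO)

   # Ship records to the central collector over UDP syslog (port 514).
   handler = logging.handlers.SysLogHandler(
       address=('log.example.com', 514))
   logger.addHandler(handler)

   logger.info('compute01: instance spawn completed in 42s')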

If these software packages are required, the design must account for the
additional resource consumption (CPU, RAM, storage, and network
bandwidth). Some other potential design impacts include:

* OS-hypervisor combination
  Ensure that the selected logging, monitoring, or alerting tools support
  the proposed OS-hypervisor combination.

* Network hardware
  The network hardware selection needs to be supported by the logging,
  monitoring, and alerting software.

Database software
~~~~~~~~~~~~~~~~~

Most OpenStack components require access to back-end database services
to store state and configuration information. Choose an appropriate
back-end database that satisfies the availability and fault tolerance
requirements of the OpenStack services.

MySQL is the default database for OpenStack, but other compatible
databases are available.

.. note::

   Telemetry uses MongoDB.

The chosen high availability database solution changes according to the
selected database. MySQL, for example, provides several options. Use a
replication technology such as Galera for active-active clustering. For
active-passive clustering, use some form of shared storage. Each of
these potential solutions has an impact on the design:

* Solutions that employ Galera/MariaDB require at least three MySQL
  nodes.

* MongoDB has its own design considerations for high availability.

* OpenStack design, generally, does not include shared storage.
  However, for some high availability designs, certain components might
  require it depending on the specific implementation.
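
For a Galera-based deployment, a quick way to confirm that the cluster
actually has its minimum of three nodes is to read the
``wsrep_cluster_size`` status variable. The following is a minimal
sketch, assuming the ``PyMySQL`` driver and a hypothetical monitoring
account:

.. code-block:: python

   import pymysql

   conn = pymysql.connect(host='db.example.com', user='monitor',
                          password='secret', database='mysql')
   with conn.cursor() as cursor:
       # Galera exposes the number of joined nodes as a status variable.
       cursor.execute("SHOW STATUS LIKE 'wsrep_cluster_size'")
       _, size = cursor.fetchone()
       if int(size) < 3:
           print('Galera cluster degraded: only %s node(s) joined' % size)
   conn.close()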

Licensing
~~~~~~~~~

The many different forms of license agreements for software are often written
with the use of dedicated hardware in mind. This model is relevant for the
cloud platform itself, including the hypervisor operating system, and
supporting software for items such as database, RPC, backup, and so on.
Give careful consideration when offering Compute service instances and
applications to end users of the cloud, since the license terms for that
software may need some adjustment to be able to operate economically in
the cloud.

Multi-site OpenStack deployments present additional licensing
considerations over and above regular OpenStack clouds, particularly
where site licenses are in use to provide cost efficient access to
software licenses. The licensing for host operating systems, guest
operating systems, OpenStack distributions (if applicable),
software-defined infrastructure including network controllers and
storage systems, and even individual applications needs to be evaluated.

Topics to consider include:

* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.

* Differentiations between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made in situations where one site is
  a cold standby for disaster recovery purposes only.

* Certain locations might require local vendors to provide support and
  services for each site, which may vary with the licensing agreement in
  place.

======================
Technical requirements
======================

.. toctree::
   :maxdepth: 2

   technical-requirements-software-selection.rst
   technical-requirements-hardware-selection.rst
   technical-requirements-network-design.rst
   technical-requirements-logging-monitoring.rst

Any given cloud deployment is expected to include these base services:

* Compute

* Networking

* Storage

Each of these services has different software and hardware resource
requirements. As a result, you must make design decisions relating
directly to each service, as well as provide a balanced infrastructure
for all services.

There are many ways to split out an OpenStack deployment, but a two-box
deployment typically consists of:

* A controller node
* A compute node

The controller node will typically host:

* Identity service (for authentication)
* Image service (for image storage)
* Block Storage
* Networking service (the ``nova-network`` service may be used instead)
* Compute service API, conductor, and scheduling services
* Supporting services like the message broker (RabbitMQ)
  and database (MySQL or PostgreSQL)

The compute node will typically host:

* Nova compute
* A networking agent, if using OpenStack Networking

To provide additional block storage in a small environment, you may also
choose to deploy ``cinder-volume`` on the compute node. You may also
choose to run ``nova-compute`` on the controller itself to allow you to
run virtual machines on both hosts in a small environment.
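
Before adding the compute node, it is worth confirming that it can reach
the controller's supporting services. The following is a minimal sketch,
assuming the ``kombu`` AMQP client (the library OpenStack services
themselves use), a host named ``controller``, and the default RabbitMQ
guest credentials (all assumptions for this example):

.. code-block:: python

   import kombu

   # Establish and tear down an AMQP connection to the message broker;
   # an exception here means the node cannot reach RabbitMQ.
   with kombu.Connection('amqp://guest:guest@controller:5672//') as conn:
       conn.connect()
       print('Message broker reachable:', conn.as_uri())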

To expand such an environment, you would add additional compute nodes, a
separate networking node, and eventually a second controller for high
availability. You might also split out storage to dedicated nodes.

The OpenStack Installation Guides provide some guidance on getting a basic
2-3 node deployment installed and running:

* `OpenStack Installation Guide for Ubuntu <http://docs.openstack.org/mitaka/install-guide-ubuntu/>`_
* `OpenStack Installation Guide for Red Hat Enterprise Linux and CentOS <http://docs.openstack.org/mitaka/install-guide-rdo/>`_
* `OpenStack Installation Guide for openSUSE and SUSE Linux Enterprise <http://docs.openstack.org/mitaka/install-guide-obs/>`_

testing with your local workload with both Hyper-Threading on and off to
determine what is more appropriate in your case.

Choosing a hypervisor
~~~~~~~~~~~~~~~~~~~~~

A hypervisor provides software to manage virtual machine access to the
underlying hardware.

It is possible to run multiple hypervisors in a single
deployment using host aggregates or cells. However, an individual
compute node can run only a single hypervisor at a time.

Choosing server hardware
~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following in selecting a server hardware form factor suited
for your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers can offer greater server density than a blade
  server solution, but are often limited to dual-socket, multi-core CPU
  configurations. It is possible to place forty 1U servers in a rack,
  providing space for the top of rack (ToR) switches, compared to 32
  full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other hardware considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
  More hosts are required to support the anticipated scale
  if the design architecture uses dual-socket hardware designs.

  For a general purpose OpenStack cloud, sizing is an important
  consideration. The expected or anticipated number of instances that
  each hypervisor can host is a common meter used in sizing the
  deployment. The selected server hardware needs to support the
  expected or anticipated instance density; see the sizing sketch
  after this list.

Host density
  Another option to address the higher host count is to use a
  quad-socket platform. Taking this approach decreases host density
  which also increases rack count. This configuration affects the
  number of power connections and also impacts network and cooling
  requirements.

  Physical data centers have limited physical space, power, and
  cooling. The number of hosts (or hypervisors) that can be fitted
  into a given metric (rack, rack unit, or floor tile) is another
  important method of sizing. Floor weight is an often overlooked
  consideration. The data center floor must be able to support the
  weight of the proposed number of hosts within a rack or set of
  racks. These factors need to be applied as part of the host density
  calculation and server hardware selection.

Power and cooling density
  The power and cooling density requirements might be lower than with
  blade, sled, or 1U server designs due to lower host density (by
  using 2U, 3U or even 4U server designs). For data centers with older
  infrastructure, this might be a desirable feature.

  Data centers have a specified amount of power fed to a given rack or
  set of racks. Older data centers may have a power density as low as
  20 amps per rack, while more recent data centers can be architected
  to support power densities as high as 120 amps per rack. The selected
  server hardware must take power density into account.

Network connectivity
  The selected server hardware must have the appropriate number of
  network connections, as well as the right type of network
  connections, in order to support the proposed architecture. Ensure
  that, at a minimum, there are at least two diverse network
  connections coming into each rack.
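
The sizing arithmetic referred to above can be made concrete with a few
lines of Python. All of the numbers here are assumptions for
illustration, not recommendations:

.. code-block:: python

   import math

   anticipated_instances = 1200
   instances_per_host = 30    # expected instance density per hypervisor
   hosts_per_rack = 20        # limited by space, power, and cooling

   hosts = math.ceil(anticipated_instances / instances_per_host)
   racks = math.ceil(hosts / hosts_per_rack)
   print('%d hosts in %d racks' % (hosts, racks))   # 40 hosts in 2 racks

   # Doubling per-host capacity (for example, a quad-socket platform)
   # halves the host count but concentrates power, cooling, and failure
   # domains in fewer boxes.
   dense_hosts = math.ceil(anticipated_instances / (instances_per_host * 2))
   print('%d quad-socket hosts' % dense_hosts)      # 20 quad-socket hosts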

The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly,
the network architecture impacts the server hardware selection and vice
versa.

Instance storage solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Networking in OpenStack is a complex, multifaceted challenge. See
:doc:`design-networking`.

Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following factors when selecting compute (server) hardware:

* Server density
  A measure of how many servers can fit into a given measure of
  physical space, such as a rack unit [U].

* Resource capacity
  The number of CPU cores, how much RAM, or how much storage a given
  server delivers.

* Expandability
  The number of additional resources you can add to a server before it
  reaches capacity.

* Cost
  The relative cost of the hardware weighed against the level of
  design effort needed to build the system.

Weigh these considerations against each other to determine the best
design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing resource
capacity and expandability can increase cost but decrease server density.
Decreasing cost often means decreasing supportability, server density,
resource capacity, and expandability.

Compute capacity (CPU cores and RAM) is a secondary consideration for
selecting server hardware. The required server hardware must supply
adequate CPU sockets, CPU cores, and RAM; network connectivity and
storage capacity are less critical, but the hardware still needs to
provide enough of both to meet the user requirements.

For a compute-focused cloud, emphasis should be on server
hardware that can offer more CPU sockets, more CPU cores, and more RAM.
Network connectivity and storage capacity are less critical.

When designing an OpenStack cloud architecture, you must
consider whether you intend to scale up or scale out. Selecting a
smaller number of larger hosts, or a larger number of smaller hosts,
depends on a combination of factors: cost, power, cooling, physical rack
and floor space, support-warranty, and manageability.

Consider the following in selecting a server hardware form factor suited
for your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers can offer greater server density than a blade
  server solution, but are often limited to dual-socket, multi-core CPU
  configurations. It is possible to place forty 1U servers in a rack,
  providing space for the top of rack (ToR) switches, compared to 32
  full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
  More hosts are required to support the anticipated scale
  if the design architecture uses dual-socket hardware designs.

  For a general purpose OpenStack cloud, sizing is an important
  consideration. The expected or anticipated number of instances that
  each hypervisor can host is a common meter used in sizing the
  deployment. The selected server hardware needs to support the
  expected or anticipated instance density.

Host density
  Another option to address the higher host count is to use a
  quad-socket platform. Taking this approach decreases host density
  which also increases rack count. This configuration affects the
  number of power connections and also impacts network and cooling
  requirements.

  Physical data centers have limited physical space, power, and
  cooling. The number of hosts (or hypervisors) that can be fitted
  into a given metric (rack, rack unit, or floor tile) is another
  important method of sizing. Floor weight is an often overlooked
  consideration. The data center floor must be able to support the
  weight of the proposed number of hosts within a rack or set of
  racks. These factors need to be applied as part of the host density
  calculation and server hardware selection.

Power and cooling density
  The power and cooling density requirements might be lower than with
  blade, sled, or 1U server designs due to lower host density (by
  using 2U, 3U or even 4U server designs). For data centers with older
  infrastructure, this might be a desirable feature.

  Data centers have a specified amount of power fed to a given rack or
  set of racks. Older data centers may have a power density as low as
  20 amps per rack, while more recent data centers can be architected
  to support power densities as high as 120 amps per rack. The selected
  server hardware must take power density into account.

Network connectivity
  The selected server hardware must have the appropriate number of
  network connections, as well as the right type of network
  connections, in order to support the proposed architecture. Ensure
  that, at a minimum, there are at least two diverse network
  connections coming into each rack.

The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly,
the network architecture impacts the server hardware selection and vice
versa.

Cloud fundamentally changes the ways that networking is provided and
consumed. Understanding the following concepts and decisions is
imperative when making the right architectural decisions.

OpenStack clouds generally have multiple network segments, with each
segment providing access to particular resources. The network segments
themselves also require network communication paths that should be
separated from the other networks. When designing network services for a
general purpose cloud, plan for either a physical or logical separation
of network segments used by operators and tenants. Additional network
segments can also be created for access to internal services such as the
message bus and database used by various systems. Segregating these
services onto separate networks helps to protect sensitive data and
prevent unauthorized access.

Choose a networking service based on the requirements of your instances.
The architecture and design of your cloud will impact whether you choose
OpenStack Networking (neutron) or legacy networking (nova-network).

Networking (neutron)
~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) is a first-class networking service that
gives tenants full control over the creation of virtual network
resources. This is often accomplished in the form of tunneling protocols
that establish encapsulated communication paths over existing network
infrastructure in order to segment tenant traffic. This method varies
depending on the specific implementation, but some of the more common
methods include tunneling over GRE, encapsulating with VXLAN, and VLAN
tags.

We recommend you design at least three network segments. The first segment
should be a public network, used by tenants and operators to access REST
APIs. The controller nodes and swift proxies are the only devices
connecting to this network segment. In some cases, this public network
might also be serviced by hardware load balancers and other network
devices.

The second segment is used by administrators to manage hardware resources.
Configuration management tools also utilize this segment for deploying
software and services onto new hardware. In some cases, this network
segment is also used for internal services, including the message bus
and database services. The second segment needs to communicate with every
hardware node. Due to the highly sensitive nature of this network segment,
it needs to be secured from unauthorized access.

The third network segment is used by applications and consumers to access the
physical network, and for users to access applications. This network is
segregated from the one used to access the cloud APIs and is not capable
of communicating directly with the hardware resources in the cloud.
Communication on this network segment is required by compute resource
nodes and network gateway services that allow application data to access the
physical network from outside the cloud.

Legacy networking (nova-network)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The legacy networking (nova-network) service is primarily a layer-2
networking service. It functions in two modes: flat networking mode and
VLAN mode. In flat network mode, all network hardware nodes and devices
throughout the cloud are connected to a single layer-2 network segment
that provides access to application data.

However, when the network devices in the cloud support segmentation using
VLANs, legacy networking can operate in the second mode. In this design
model, each tenant within the cloud is assigned a network subnet which is
mapped to a VLAN on the physical network. It is especially important to
remember that the maximum number of VLANs that can be used within a
spanning tree domain is 4096. This places a hard limit on the amount of
growth possible within the data center. Consequently, when designing a
general purpose cloud intended to support multiple tenants, we recommend
using legacy networking in VLAN mode rather than flat network mode.

Layer-2 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A network designed on layer-2 protocols has advantages over a network
designed on layer-3 protocols. In spite of the difficulties of using a
bridge to perform the network role of a router, many vendors, customers,
and service providers choose to use Ethernet in as many parts of their
networks as possible. The benefits of selecting a layer-2 design are:

* Ethernet frames contain all the essentials for networking. These
  include, but are not limited to, globally unique source addresses,
  globally unique destination addresses, and error control.

* Ethernet frames can carry any kind of packet. Networking at layer-2 is
  independent of the layer-3 protocol.

* Adding more layers to the Ethernet frame only slows the networking
  process down. This is known as nodal processing delay.

* You can add adjunct networking features, for example class of service
  (CoS) or multicasting, to Ethernet as readily as to IP networks.

* VLANs are an easy mechanism for isolating networks.

Most information starts and ends inside Ethernet frames. Today this applies
to data, voice, and video. The concept is that the network will benefit more
from the advantages of Ethernet if the transfer of information from a source
to a destination is in the form of Ethernet frames.

Although it is not a substitute for IP networking, networking at layer-2 can
be a powerful adjunct to IP networking.

Layer-2 Ethernet usage has these additional advantages over layer-3 IP
network usage:

* Speed
* Reduced overhead of the IP hierarchy.
* No need to keep track of address configuration as systems move around.

Whereas the simplicity of layer-2 protocols might work well in a data center
with hundreds of physical machines, cloud data centers have the additional
burden of needing to keep track of all virtual machine addresses and
networks. In these data centers, it is not uncommon for one physical node
to support 30-40 instances.

.. important::

   Networking at the frame level says nothing about the presence or
   absence of IP addresses at the packet level. Almost all ports, links, and
   devices on a network of LAN switches still have IP addresses, as do all the
   source and destination hosts. There are many reasons for the continued need
   for IP addressing. The largest one is the need to manage the network. A
   device or link without an IP address is usually invisible to most
   management applications. Utilities including remote access for diagnostics,
   file transfer of configurations and software, and similar applications
   cannot run without IP addresses as well as MAC addresses.

Layer-2 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Layer-2 network architectures have some limitations that become noticeable
when used outside of traditional data centers.

* The number of VLANs is limited to 4096.
* The number of MACs stored in switch tables is limited.
* You must accommodate the need to maintain a set of layer-4 devices to
  handle traffic control.
* MLAG, often used for switch redundancy, is a proprietary solution that
  does not scale beyond two devices and forces vendor lock-in.
* It can be difficult to troubleshoot a network without IP addresses and
  ICMP.
* Configuring ARP can be complicated on large layer-2 networks.
* All network devices need to be aware of all MACs, even instance MACs, so
  there is constant churn in MAC tables and network state changes as
  instances start and stop.
* Migrating MACs (instance migration) to different physical locations is a
  potential problem if you do not set ARP table timeouts properly.

It is important to know that layer-2 has a very limited set of network
management tools. It is difficult to control traffic as it does not have
mechanisms to manage the network or shape the traffic. Network
troubleshooting is also troublesome, in part because network devices have
no IP addresses. As a result, there is no reasonable way to check network
delay.

In a layer-2 network all devices are aware of all MACs, even those that
belong to instances. The network state information in the backbone changes
whenever an instance starts or stops. Because of this, there is far too
much churn in the MAC tables on the backbone switches.

Furthermore, on large layer-2 networks, configuring ARP learning can be
complicated. The setting for the MAC address timer on switches is critical
and, if set incorrectly, can cause significant performance problems. So when
migrating MACs to different physical locations to support instance migration,
problems may arise. As an example, the Cisco default MAC address timer is
extremely long. As such, the network information maintained in the switches
could be out of sync with the new location of the instance.

Layer-3 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In layer-3 networking, routing takes instance MAC and IP addresses out of the
network core, reducing state churn. The only time there would be a routing
state change is in the case of a Top of Rack (ToR) switch failure or a link
failure in the backbone itself. Other advantages of using a layer-3
architecture include:

* Layer-3 networks provide the same level of resiliency and scalability
  as the Internet.

* Controlling traffic with routing metrics is straightforward.

* You can configure layer-3 to use BGP confederation for scalability. This
  way core routers have state proportional to the number of racks, not to
  the number of servers or instances.

* There are a variety of well tested tools, such as ICMP, to monitor and
  manage traffic.

* Layer-3 architectures enable the use of :term:`quality of service (QoS)`
  to manage network performance.

Layer-3 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main limitation of layer-3 networking is that there is no built-in
isolation mechanism comparable to the VLANs in layer-2 networks.
Furthermore, the hierarchical nature of IP addresses means that an
instance is on the same subnet as its physical host, making migration out
of the subnet difficult. For these reasons, network virtualization needs
to use IP encapsulation and software at the end hosts. This provides
isolation and separates the addressing in the virtual layer from the
addressing in the physical layer. Other potential disadvantages of
layer-3 include the need to design an IP addressing scheme rather than
relying on the switches to keep track of the MAC addresses automatically,
and the need to configure the interior gateway routing protocol in the
switches.

Network design
~~~~~~~~~~~~~~

There are many reasons an OpenStack network has complex requirements. One
main factor is that many components interact at different levels of the
system stack, adding complexity. Data flows are also complex. Data in an
OpenStack cloud moves both between instances across the network (also
known as east-west traffic), as well as in and out of the system (also
known as north-south traffic). Physical server nodes have network
requirements that are independent of instance network requirements, and
must be isolated to account for scalability. We recommend separating the
networks for security purposes and tuning performance through traffic
shaping.

You must consider a number of important general technical and business
factors when planning and designing an OpenStack network. These include:

* A requirement for vendor independence. To avoid hardware or software
  vendor lock-in, the design should not rely on specific features of a
  vendor's router or switch.
* A requirement to massively scale the ecosystem to support millions of
  end users.
* A requirement to support indeterminate platforms and applications.
* A requirement to design for cost efficient operations to take advantage
  of massive scale.
* A requirement to ensure that there is no single point of failure in the
  cloud ecosystem.
* A requirement for high availability architecture to meet customer SLA
  requirements.
* A requirement to be tolerant of rack level failure.
* A requirement to maximize flexibility to architect future production
  environments.

Bearing in mind these considerations, we recommend the following:

* Layer-3 designs are preferable to layer-2 architectures.
* Design a dense multi-path network core to support multi-directional
  scaling and flexibility.
* Use hierarchical addressing because it is the only viable option to
  scale the network ecosystem.
* Use virtual networking to isolate instance service network traffic from
  the management and internal network traffic.
* Isolate virtual networks using encapsulation technologies.
* Use traffic shaping for performance tuning.
* Use eBGP to connect to the Internet up-link.
* Use iBGP to flatten the internal traffic on the layer-3 mesh.
* Determine the most effective configuration for the block storage
  network.

Additional considerations
-------------------------

There are several further considerations when designing a network-focused
OpenStack cloud. One is redundant networking, and in particular a ToR
switch high availability risk analysis. In most cases, it is much more
economical to use a single switch with a small pool of spare switches to
replace failed units than it is to outfit an entire data center with
redundant switches. Applications should tolerate rack level outages
without affecting normal operations since network and compute resources
are easily provisioned and plentiful.

Research indicates the mean time between failures (MTBF) on switches is
between 100,000 and 200,000 hours. This number is dependent on the ambient
temperature of the switch in the data center. When properly cooled and
maintained, this translates to between 11 and 22 years before failure. Even
in the worst case of poor ventilation and high ambient temperatures in the
data center, the MTBF is still 2-3 years.
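
The conversion from MTBF hours to years is simple arithmetic, shown here
as a quick sanity check of the figures above:

.. code-block:: python

   HOURS_PER_YEAR = 24 * 365   # 8760

   for mtbf_hours in (100000, 200000):
       years = mtbf_hours / HOURS_PER_YEAR
       print('%d hours is about %.1f years' % (mtbf_hours, years))
   # 100000 hours is about 11.4 years
   # 200000 hours is about 22.8 years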

See
https://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf
for further information.

.. list-table:: Networking service comparison
   :header-rows: 1

   * - Legacy networking (nova-network)
     - OpenStack Networking
   * - Simple, single agent
     - Complex, multiple agents
   * - Flat or VLAN
     - Flat, VLAN, Overlays, L2-L3, SDN
   * - No plug-in support
     - Plug-in support for 3rd parties
   * - No multi-tier topologies
     - Multi-tier topologies

Preparing for the future: IPv6 support
--------------------------------------

One of the most important networking topics today is the exhaustion of
IPv4 addresses. As of late 2015, ICANN announced that the final
IPv4 address blocks have been fully assigned. Because of this, the IPv6
protocol has become the future of network focused applications. IPv6
increases the address space significantly, fixes long standing issues
in the IPv4 protocol, and will become essential for network focused
applications in the future.

OpenStack Networking, when configured for it, supports IPv6. To enable
IPv6, create an IPv6 subnet in Networking and use IPv6 prefixes when
creating security groups.
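
As an illustration, an IPv6 subnet can be created through the Networking
API. This is a minimal sketch, assuming the ``openstacksdk`` library, a
``clouds.yaml`` entry named ``mycloud``, and an existing network named
``private`` (all assumptions for this example):

.. code-block:: python

   import openstack

   conn = openstack.connect(cloud='mycloud')
   network = conn.network.find_network('private')

   subnet = conn.network.create_subnet(
       network_id=network.id,
       ip_version=6,
       cidr='2001:db8::/64',        # documentation prefix; use your own
       ipv6_address_mode='slaac',   # stateless address autoconfiguration
       ipv6_ra_mode='slaac',
   )
   print('Created IPv6 subnet %s' % subnet.id)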

Asymmetric links
----------------

When designing a network architecture, the traffic patterns of an
application heavily influence the allocation of total bandwidth and
the number of links that you use to send and receive traffic. Applications
that provide file storage for customers allocate bandwidth and links to
favor incoming traffic; whereas video streaming applications allocate
bandwidth and links to favor outgoing traffic.

Performance
-----------

It is important to analyze the application's tolerance for latency and
jitter when designing an environment to support network focused
applications. Certain applications, for example VoIP, are less tolerant
of latency and jitter. When latency and jitter are issues, certain
applications may require tuning of QoS parameters and network device
queues to ensure that they queue for transmit immediately or are
guaranteed minimum bandwidth. Since OpenStack currently does not support
these functions, consider carefully your selected network plug-in.

The location of a service may also impact the application or consumer
experience. If an application serves differing content to different users,
it must properly direct connections to those specific locations. Where
appropriate, use a multi-site installation for these situations.

You can implement networking in two separate ways. Legacy networking
(nova-network) provides a flat DHCP network with a single broadcast domain.
This implementation does not support tenant isolation networks or advanced
plug-ins, but it is currently the only way to implement a distributed
layer-3 (L3) agent using the multi-host configuration. OpenStack Networking
(neutron) is the official networking implementation and provides a pluggable
architecture that supports a large variety of network methods. Some of these
include a layer-2 only provider network model, external device plug-ins, or
even OpenFlow controllers.

Networking at large scales becomes a set of boundary questions. The
determination of how large a layer-2 domain must be is based on the
number of nodes within the domain and the amount of broadcast traffic
that passes between instances. Breaking layer-2 boundaries may require
the implementation of overlay networks and tunnels. This decision is a
balancing act between the need for a smaller overhead and the need for a
smaller domain.

When selecting network devices, be aware that making a decision based on the
greatest port density often comes with a drawback. Aggregation switches and
routers have not all kept pace with Top of Rack switches and may induce
bottlenecks on north-south traffic. As a result, it may be possible for
massive amounts of downstream network utilization to impact upstream network
devices, impacting service to the cloud. Since OpenStack does not currently
provide a mechanism for traffic shaping or rate limiting, it is necessary to
implement these features at the network hardware level.

Tunable networking components
-----------------------------

When designing for network intensive workloads, consider configurable
networking components related to an OpenStack architecture design, such
as MTU and QoS. Some workloads require a larger MTU than normal due to
the transfer of large blocks of data. When providing network service for
applications such as video streaming or storage replication, we recommend
that you configure both OpenStack hardware nodes and the supporting
network equipment for jumbo frames where possible. This allows for better
use of available bandwidth. Configure jumbo frames across the complete
path the packets traverse. If one network component is not capable of
handling jumbo frames, then the entire path reverts to the default MTU.
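
Where the deployment's Networking service allows the MTU to be set at
network creation time (the ``net-mtu-writable`` API extension), a
jumbo-frame tenant network can be requested explicitly. The following is
a minimal sketch, assuming the ``openstacksdk`` library and a
``clouds.yaml`` entry named ``mycloud``; the MTU value is an assumption
that leaves room for VXLAN overhead on a 9000-byte physical path:

.. code-block:: python

   import openstack

   conn = openstack.connect(cloud='mycloud')

   # Every switch, router, and host NIC on the path must support
   # this MTU, or the path falls back to the default.
   network = conn.network.create_network(
       name='storage-replication', mtu=8950)
   print('Network %s created with MTU %s' % (network.name, network.mtu))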

:term:`Quality of Service (QoS)` also has a great impact on network
intensive workloads as it provides instant service to packets which have
a higher priority, mitigating the impact of poor network performance. In
applications such as Voice over IP (VoIP), differentiated services code
points are a near requirement for proper operation. You can also use QoS
in the opposite direction for mixed workloads to prevent low priority but
high bandwidth applications, for example backup services, video
conferencing, or file sharing, from blocking bandwidth that is needed for
the proper operation of other workloads. It is possible to tag file
storage traffic as a lower class, such as best effort or scavenger, to
allow the higher priority traffic through. In cases where regions within
a cloud might be geographically distributed, it may also be necessary to
plan accordingly to implement WAN optimization to combat latency or
packet loss.

Network hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

The network architecture determines which network hardware will be
used. Networking software is determined by the selected networking
hardware.

There are more subtle design impacts that need to be considered. The
selection of certain networking hardware (and the networking software)
affects the management tools that can be used. There are exceptions to
this; the rise of *open* networking software that supports a range of
networking hardware means there are instances where the relationship
between networking hardware and networking software is not as tightly
defined.

For a compute-focused architecture, we recommend designing the network
architecture using a scalable network model that makes it easy to add
capacity and bandwidth. A good example of such a model is the leaf-spine
model. In this type of network design, you can add additional
bandwidth as well as scale out to additional racks of gear. It is
important to select network hardware that supports the required port
count, port speed, and port density while allowing for future growth as
workload demands increase. In the network architecture, it is also
important to evaluate where to provide redundancy.

Some of the key considerations in the selection of networking hardware
include:

Port count
  The design will require networking hardware that has the requisite
  port count.

Port density
  The network design will be affected by the physical space that is
  required to provide the requisite port count. A higher port density
  is preferred, as it leaves more rack space for compute or storage
  components. This can also lead into considerations about fault domains
  and power density. Higher density switches are more expensive, therefore
  it is important not to over design the network.

Port speed
  The networking hardware must support the proposed network speed, for
  example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE).

Redundancy
  User requirements for high availability and cost considerations
  influence the level of network hardware redundancy.
  Network redundancy can be achieved by adding redundant power
  supplies or paired switches.

  .. note::

     The hardware must support network redundancy.

Power requirements
  Ensure that the physical data center provides the necessary power
  for the selected network hardware.

  .. note::

     This is not an issue for top of rack (ToR) switches. This may be an
     issue for spine switches in a leaf and spine fabric, or end of row
     (EoR) switches.

Protocol support
  It is possible to gain more performance out of a single storage
  system by using specialized network technologies such as RDMA, SRP,
  iSER, and SCST. The specifics of using these technologies are beyond
  the scope of this book.

There is no single best practice architecture for the networking
hardware supporting an OpenStack cloud. Some of the key factors that will
have a major influence on selection of networking hardware include:

Connectivity
  All nodes within an OpenStack cloud require network connectivity. In
  some cases, nodes require access to more than one network segment.
  The design must encompass sufficient network capacity and bandwidth
  to ensure that all communications within the cloud, both north-south
  and east-west traffic, have sufficient resources available.

Scalability
  The network design should encompass a physical and logical network
  design that can be easily expanded upon. Network hardware should
  offer the appropriate types of interfaces and speeds that are
  required by the hardware nodes.

Availability
  To ensure access to nodes within the cloud is not interrupted,
  we recommend that the network architecture identify any single
  points of failure and provide some level of redundancy or fault
  tolerance. The network infrastructure often involves use of
  networking protocols such as LACP, VRRP, or others to achieve a highly
  available network connection. It is also important to consider the
  networking implications on API availability. We recommend designing a
  load balancing solution within the network architecture to ensure that
  the APIs, and potentially other services in the cloud, are highly
  available.

Networking software selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) provides a wide variety of networking
services for instances. There are many additional networking software
packages that can be useful when managing OpenStack components. Some
examples include:

* Software to provide load balancing

* Network redundancy protocols

* Routing daemons

Some of these software packages are described in more detail in the
`OpenStack network nodes chapter
<http://docs.openstack.org/ha-guide/networking-ha.html>`_ in the
OpenStack High Availability Guide.

For a general purpose OpenStack cloud, the OpenStack infrastructure
components need to be highly available. If the design does not include
hardware load balancing, networking software packages like HAProxy will
need to be included.

For a compute-focused OpenStack cloud, the OpenStack infrastructure
components must be highly available. If the design does not include
hardware load balancing, you must add networking software packages, for
example, HAProxy.

* To provide users with a persistent storage mechanism
* As a scalable, reliable data store for virtual machine images

Selecting storage hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~

Storage hardware architecture is determined by selecting a specific
storage architecture. Determine the selection of storage architecture by
evaluating possible solutions against critical factors: the user
requirements, technical considerations, and operational considerations.
Consider the following factors when selecting storage hardware:

Cost
  Storage can be a significant portion of the overall system cost. For
  an organization that is concerned with vendor support, a commercial
  storage solution is advisable, although it comes with a higher price
  tag. If minimizing initial capital expenditure is a priority, a design
  based on commodity hardware would apply. The trade-off is potentially
  higher support costs and a greater risk of incompatibility and
  interoperability issues.

Performance
  The latency of storage I/O requests indicates performance. Performance
  requirements affect which solution you choose.

Scalability
  Scalability, along with expandability, is a major consideration in a
  general purpose OpenStack cloud. It might be difficult to predict
  the final intended size of the implementation as there are no
  established usage patterns for a general purpose cloud. It might
  become necessary to expand the initial deployment in order to
  accommodate growth and user demand.

Expandability
  Expandability is a major architecture factor for storage solutions
  with a general purpose OpenStack cloud. A storage solution that
  expands to 50 PB is considered more expandable than a solution that
  only scales to 10 PB. This meter is related to scalability, which is
  the measure of a solution's performance as it expands.

General purpose cloud storage requirements
------------------------------------------

Using a scale-out storage solution with direct-attached storage (DAS) in
the servers is well suited for a general purpose OpenStack cloud. Cloud
services requirements determine your choice of scale-out solution. You
need to determine if a single, highly expandable and highly vertically
scalable, centralized storage array is suitable for your design. After
determining an approach, select the storage hardware based on these
criteria.

This list expands upon the potential impacts of including a particular
storage architecture (and corresponding storage hardware) in the
design for a general purpose OpenStack cloud:

Connectivity
  If storage protocols other than Ethernet are part of the storage
  solution, ensure the appropriate hardware has been selected. If a
  centralized storage array is selected, ensure that the hypervisor will
  be able to connect to that storage array for image storage.

Usage
  How the particular storage architecture will be used is critical for
  determining the architecture. Some of the configurations that will
  influence the architecture include whether it will be used by the
  hypervisors for ephemeral instance storage, or if OpenStack Object
  Storage will use it for object storage.

Instance and image locations
  Where instances and images will be stored will influence the
  architecture.

Server hardware
  If the solution is a scale-out storage architecture that includes
  DAS, it will affect the server hardware selection. This could ripple
  into the decisions that affect host density, instance density, power
  density, OS-hypervisor, management tools, and others.

A general purpose OpenStack cloud has multiple options. The key factors
that will have an influence on selection of storage hardware for a
general purpose OpenStack cloud are as follows:

Capacity
  Hardware resources selected for the resource nodes should be capable
  of supporting enough storage for the cloud services. Defining the
  initial requirements and ensuring the design can support adding
  capacity is important; a replication sizing sketch follows this list.
  Hardware nodes selected for object storage should be capable of
  supporting a large number of inexpensive disks with no reliance on
  RAID controller cards. Hardware nodes selected for block storage
  should be capable of supporting high speed storage solutions and RAID
  controller cards to provide performance and redundancy to storage at
  a hardware level. Selecting hardware RAID controllers that
  automatically repair damaged arrays will assist with the replacement
  and repair of degraded or failed storage devices.

Performance
  Disks selected for object storage services do not need to be fast
  performing disks. We recommend that object storage nodes take
  advantage of the best cost per terabyte available for storage.
  Contrastingly, disks chosen for block storage services should take
  advantage of performance boosting features that may entail the use
  of SSDs or flash storage to provide high performance block storage
  pools. Storage performance of ephemeral disks used for instances
  should also be taken into consideration.

Fault tolerance
  Object storage resource nodes have no requirements for hardware
  fault tolerance or RAID controllers. It is not necessary to plan for
  fault tolerance within the object storage hardware because the
  object storage service provides replication between zones as a
  feature of the service. Block storage nodes, compute nodes, and
  cloud controllers should all have fault tolerance built in at the
  hardware level by making use of hardware RAID controllers and
  varying levels of RAID configuration. The level of RAID chosen
  should be consistent with the performance and availability
  requirements of the cloud.
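
The replication sizing sketch promised above: with the Object Storage
service's default of three replicas, raw disk capacity translates to
usable capacity roughly as follows (all figures are assumptions for
illustration):

.. code-block:: python

   raw_tb = 4 * 60 * 6      # 4 TB disks, 60 disks per node, 6 nodes
   replicas = 3             # swift's default replica count
   headroom = 0.85          # keep ~15% free so rebalancing has room

   usable_tb = raw_tb / replicas * headroom
   print('%d TB raw provides about %.0f TB usable' % (raw_tb, usable_tb))
   # 1440 TB raw provides about 408 TB usable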

Storage-focused cloud storage requirements
------------------------------------------

Storage-focused OpenStack clouds must address I/O intensive workloads.
These workloads are not CPU intensive, nor are they consistently network
intensive. The network may be heavily utilized to transfer storage, but
they are not otherwise network intensive.

The selection of storage hardware determines the overall performance and
scalability of a storage-focused OpenStack design architecture. Several
factors impact the design process, including the following.

Latency is a key consideration in a storage-focused OpenStack cloud.
|
||||
Using solid-state disks (SSDs) to minimize latency and, to reduce CPU
|
||||
delays caused by waiting for the storage, increases performance. Use
|
||||
RAID controller cards in compute hosts to improve the performance of the
|
||||
underlying disk subsystem.
|
||||
|
||||
Depending on the storage architecture, you can adopt a scale-out
|
||||
solution, or use a highly expandable and scalable centralized storage
|
||||
array. If a centralized storage array meets your requirements, then the
|
||||
array vendor determines the hardware selection. It is possible to build
|
||||
a storage array using commodity hardware with Open Source software, but
|
||||
requires people with expertise to build such a system.

On the other hand, a scale-out storage solution that uses
direct-attached storage (DAS) in the servers may be an appropriate
choice. This requires configuration of the server hardware to support
the storage solution.

Considerations affecting the storage architecture (and corresponding
storage hardware) of a storage-focused OpenStack cloud include:

Connectivity
    Ensure the connectivity matches the storage solution requirements.
    We recommend confirming that the network characteristics minimize
    latency to boost the overall performance of the design.

Latency
    Determine if the use case has consistent or highly variable latency.

Throughput
    Ensure that the storage solution throughput is optimized for your
    application requirements.

Server hardware
    Use of DAS impacts the server hardware choice and affects host
    density, instance density, power density, OS-hypervisor, and
    management tools.
@ -5,6 +5,63 @@ Operator requirements
This section describes operational factors affecting the design of an
OpenStack cloud.

Network design
~~~~~~~~~~~~~~

The network design for an OpenStack cluster includes decisions
regarding the interconnect needs within the cluster, the need to allow
clients to access their resources, and the access requirements for
operators to administer the cluster. Consider the bandwidth, latency,
and reliability of these networks.

Consider additional design decisions about monitoring and alarming.
If you are using an external provider, service level agreements (SLAs)
are typically defined in your contract. Operational considerations such
as bandwidth, latency, and jitter can be part of the SLA.

As demand for network resources increases, make sure your network
design accommodates expansion and upgrades. Operators add additional IP
address blocks and additional bandwidth capacity as needed. In
addition, consider managing hardware and software lifecycle events, for
example upgrades, decommissioning, and outages, while avoiding service
interruptions for tenants.

Factor maintainability into the overall network design. This includes
the ability to manage and maintain IP addresses as well as the use of
overlay identifiers such as VLAN tag IDs, GRE tunnel IDs, and MPLS
labels. For example, if you need to change all of the IP addresses on
a network, a process known as renumbering, the design must support
this function.
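
As a small illustration of designing for renumbering, the Python
standard library's ipaddress module can map each host deterministically
from an old prefix to a new one; the prefixes below are assumed
RFC 1918 examples.

.. code-block:: python

   # renumber.py -- sketch of deterministic renumbering: preserve each
   # host's offset within the subnet while moving to a new prefix.
   # The old/new prefixes are illustrative RFC 1918 assumptions.
   import ipaddress

   OLD_NET = ipaddress.ip_network("10.10.0.0/24")
   NEW_NET = ipaddress.ip_network("10.20.0.0/24")

   def renumber(addr: str) -> ipaddress.IPv4Address:
       """Map an address in OLD_NET to the same host offset in NEW_NET."""
       ip = ipaddress.ip_address(addr)
       if ip not in OLD_NET:
           raise ValueError(f"{addr} is not in {OLD_NET}")
       offset = int(ip) - int(OLD_NET.network_address)
       return NEW_NET.network_address + offset

   if __name__ == "__main__":
       for host in ("10.10.0.5", "10.10.0.42"):
           print(f"{host} -> {renumber(host)}")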

Address network-focused applications when considering certain
operational realities. For example, consider the impending exhaustion
of IPv4 addresses, the migration to IPv6, and the use of private
networks to segregate different types of traffic that an application
receives or generates. In the case of IPv4 to IPv6 migrations,
applications should follow best practices for storing IP addresses. We
recommend you avoid relying on IPv4 features that did not carry over to
the IPv6 protocol or that differ in implementation.
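
One such best practice is keeping application code address-family
agnostic. The following sketch, with a placeholder endpoint, uses
socket.getaddrinfo so that the same code connects over IPv6 or IPv4,
whichever the resolver returns:

.. code-block:: python

   # dual_stack_connect.py -- connect to whichever address family the
   # name resolves to, in the order returned by the resolver.
   # The host and port are placeholder assumptions.
   import socket

   def connect(host: str, port: int) -> socket.socket:
       """Try each resolved address (IPv6 or IPv4) until one connects."""
       last_err = None
       for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
               host, port, type=socket.SOCK_STREAM):
           try:
               sock = socket.socket(family, socktype, proto)
               sock.connect(sockaddr)
               return sock
           except OSError as err:
               last_err = err
       raise last_err or OSError(f"could not resolve {host}")

   if __name__ == "__main__":
       s = connect("example.com", 80)     # placeholder endpoint
       print("connected via", s.family.name)
       s.close()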

To segregate traffic, allow applications to create a private tenant
network for database and storage network traffic. Use a public network
for services that require direct client access from the Internet. Upon
segregating the traffic, consider :term:`quality of service (QoS)` and
security to ensure each network has the required level of service.
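
As a tenant-side sketch of that workflow, creating such a private
network with the openstacksdk client library might look like the
following; the cloud name, resource names, and CIDR are assumptions to
adapt to your environment.

.. code-block:: python

   # private_net.py -- create a private tenant network for database and
   # storage traffic using openstacksdk. The cloud name, labels, and
   # CIDR are illustrative assumptions; adapt to your clouds.yaml.
   import openstack

   conn = openstack.connect(cloud="mycloud")   # assumed clouds.yaml entry

   net = conn.network.create_network(name="db-storage-net")
   subnet = conn.network.create_subnet(
       name="db-storage-subnet",
       network_id=net.id,
       ip_version=4,
       cidr="192.0.2.0/24",                    # documentation range
   )
   print(f"created {net.name} with subnet {subnet.cidr}")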

Also consider the routing of network traffic. For some applications,
you may need to develop a complex policy framework for routing. To
create a routing policy that satisfies business requirements, consider
the economic cost of transmitting traffic over expensive links versus
cheaper links, in addition to bandwidth, latency, and jitter
requirements.

Finally, consider how to respond to network events. How load transfers
from one link to another during a failure scenario can be a factor in
the design. If you do not plan network capacity correctly, failover
traffic can overwhelm other ports or network links and create a
cascading failure scenario, in which traffic that fails over to one
link overwhelms that link and then moves to the subsequent links until
all network traffic stops.
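
The arithmetic behind such a cascade is simple enough to check at
design time. The following sketch, in which link loads and capacities
are assumed values in Gbps, redistributes a failed link's load and
reports how many links survive:

.. code-block:: python

   # cascade_check.py -- toy model of cascading failover: when a link
   # fails, its load spreads evenly over the survivors; any survivor
   # pushed past capacity fails in turn. Values are assumptions in Gbps.

   def cascade(loads: list[float], capacity: float) -> int:
       """Fail the busiest link, then count how many links survive."""
       links = sorted(loads, reverse=True)
       spill = links.pop(0)                  # initial failure
       while links:
           share = spill / len(links)
           links = [load + share for load in links]
           overloaded = [l for l in links if l > capacity]
           if not overloaded:
               return len(links)             # cascade contained
           spill = sum(overloaded)           # overloaded links fail too
           links = [l for l in links if l <= capacity]
       return 0                              # total outage

   if __name__ == "__main__":
       print(cascade(loads=[6.0, 5.0, 4.0, 3.0], capacity=10.0))  # -> 3
       print(cascade(loads=[9.0, 9.0, 9.0], capacity=10.0))       # -> 0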

SLA considerations
~~~~~~~~~~~~~~~~~~

@ -102,6 +159,89 @@ managing and maintaining your OpenStack environment, see the
`Operations chapter <http://docs.openstack.org/ops-guide/operations.html>`_
in the OpenStack Operations Guide.

Logging and monitoring
----------------------

OpenStack clouds require appropriate monitoring platforms to identify
and manage errors.

.. note::

   We recommend leveraging existing monitoring systems to see if they
   are able to effectively monitor an OpenStack environment.

Specific meters that are critically important to capture include:

* Image disk utilization

* Response time to the Compute API
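
As a sketch of how the second meter might be sampled outside Telemetry,
the following times a simple Compute API list call; the endpoint and
token are placeholder assumptions, and a production probe would obtain
credentials from Keystone:

.. code-block:: python

   # api_probe.py -- sample Compute API response time by timing a
   # simple GET /servers call. NOVA_ENDPOINT and TOKEN are placeholder
   # assumptions; a real probe would authenticate with Keystone first.
   import time

   import requests

   NOVA_ENDPOINT = "http://controller:8774/v2.1"   # assumed endpoint
   TOKEN = "REPLACE_WITH_KEYSTONE_TOKEN"           # assumed credential

   def compute_api_latency_ms() -> float:
       start = time.perf_counter()
       resp = requests.get(
           f"{NOVA_ENDPOINT}/servers",
           headers={"X-Auth-Token": TOKEN},
           timeout=10,
       )
       resp.raise_for_status()
       return (time.perf_counter() - start) * 1000.0

   if __name__ == "__main__":
       print(f"compute API responded in {compute_api_latency_ms():.1f} ms")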

Logging and monitoring do not differ significantly for a multi-site
OpenStack cloud. The tools described in the `Logging and monitoring
chapter <http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`__
of the Operations Guide remain applicable. Logging and monitoring can
be provided on a per-site basis, in a common centralized location, or
both.

When deploying logging and monitoring facilities to a centralized
location, take care with the load placed on the inter-site networking
links.
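
A back-of-the-envelope check of that load is straightforward. The
per-node log volume and node count in the following sketch are
assumptions to adapt:

.. code-block:: python

   # log_bandwidth.py -- estimate steady-state inter-site bandwidth
   # needed to centralize logs. All inputs are illustrative assumptions.

   GB = 8 * 1024 ** 3                 # bits per gigabyte (binary)

   def required_mbps(nodes: int, gb_per_node_per_day: float) -> float:
       """Average bandwidth in Mbit/s to ship one day of logs in a day."""
       bits_per_day = nodes * gb_per_node_per_day * GB
       return bits_per_day / 86_400 / 1e6

   if __name__ == "__main__":
       # assumed: 200 nodes at a remote site, each emitting 2 GB/day
       avg = required_mbps(nodes=200, gb_per_node_per_day=2.0)
       print(f"average: {avg:.1f} Mbit/s; size links for peaks above this")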

Management software
-------------------

Cloud environments often use management software that provides
clustering, logging, monitoring, and alerting. This affects the overall
OpenStack cloud design, which must account for the additional resource
consumption of that software, such as CPU, RAM, storage, and network
bandwidth.

The inclusion of clustering software, such as Corosync or Pacemaker, is
determined primarily by the availability requirements of the cloud
infrastructure and the complexity of supporting the configuration after
it is deployed. The `OpenStack High Availability Guide
<http://docs.openstack.org/ha-guide/>`_ provides more details on the
installation and configuration of Corosync and Pacemaker, should these
packages need to be included in the design.

Some other potential design impacts include:

* OS-hypervisor combination: ensure that the selected logging,
  monitoring, or alerting tools support the proposed OS-hypervisor
  combination.

* Network hardware: the network hardware selection needs to be
  supported by the logging, monitoring, and alerting software.

Database software
-----------------

Most OpenStack components require access to back-end database services
to store state and configuration information. Choose an appropriate
back-end database that satisfies the availability and fault tolerance
requirements of the OpenStack services.

MySQL is the default database for OpenStack, but other compatible
databases are available.

.. note::

   Telemetry uses MongoDB.

The chosen high availability database solution changes according to the
selected database. MySQL, for example, provides several options: use a
replication technology such as Galera for active-active clustering, or
use some form of shared storage for active-passive configurations. Each
of these potential solutions has an impact on the design:

* Solutions that employ Galera/MariaDB require at least three MySQL
  nodes (see the cluster-size check sketch after this list).

* MongoDB has its own design considerations for high availability.

* OpenStack design, generally, does not include shared storage.
  However, for some high availability designs, certain components might
  require it depending on the specific implementation.
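
As a sketch of how the first point might be verified on a running
cluster, Galera exposes its membership count through the
wsrep_cluster_size status variable, which any MySQL client library can
poll; the connection parameters here are assumptions:

.. code-block:: python

   # galera_check.py -- verify that a Galera cluster still has quorum
   # by reading wsrep_cluster_size. Connection details are assumptions.
   import pymysql

   EXPECTED_NODES = 3   # Galera active-active needs at least three nodes

   def cluster_size(host: str) -> int:
       conn = pymysql.connect(host=host, user="monitor",
                              password="REPLACE_ME", database="mysql")
       try:
           with conn.cursor() as cur:
               cur.execute("SHOW STATUS LIKE 'wsrep_cluster_size'")
               _, value = cur.fetchone()
               return int(value)
       finally:
           conn.close()

   if __name__ == "__main__":
       size = cluster_size("db1.example.com")   # assumed node address
       status = "OK" if size >= EXPECTED_NODES else "DEGRADED"
       print(f"cluster size {size}/{EXPECTED_NODES}: {status}")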

Operator access to systems
~~~~~~~~~~~~~~~~~~~~~~~~~~

33 doc/arch-design-draft/source/overview-software-licensing.rst Normal file
@ -0,0 +1,33 @@
==================
Software licensing
==================

The many different forms of license agreements for software are often
written with the use of dedicated hardware in mind. This model is
relevant for the cloud platform itself, including the hypervisor
operating system and supporting software for items such as database,
RPC, backup, and so on. Take care when offering Compute service
instances and applications to end users of the cloud, because the
license terms for that software may need some adjustment to operate
economically in the cloud.

Multi-site OpenStack deployments present additional licensing
considerations over and above regular OpenStack clouds, particularly
where site licenses are in use to provide cost-efficient access to
software licenses. The licensing for host operating systems, guest
operating systems, OpenStack distributions (if applicable),
software-defined infrastructure (including network controllers and
storage systems), and even individual applications needs to be
evaluated.

Topics to consider include:

* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.

* Differentiations between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made in situations where one site is
  a cold standby for disaster recovery purposes only.

* Certain locations might require local vendors to provide support and
  services for each site, which may vary with the licensing agreement
  in place.
@ -55,5 +55,6 @@ covered include:
   overview-planning
   overview-customer-requirements
   overview-legal-requirements
   overview-software-licensing
   overview-security-requirements
   overview-operator-requirements