Merge "[arch-design-draft] Migrate technical requirements content"
This commit is contained in:
commit
25bde8fc08
@ -1,449 +0,0 @@
|
|||||||
==================
Hardware selection
==================

Hardware selection involves three key areas:

* Network

* Compute

* Storage

Network hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

The network architecture determines which network hardware will be
used. Networking software is determined by the selected networking
hardware.

There are more subtle design impacts that need to be considered. The
selection of certain networking hardware (and the networking software)
affects the management tools that can be used. There are exceptions to
this; the rise of *open* networking software that supports a range of
networking hardware means there are instances where the relationship
between networking hardware and networking software is not as tightly
defined.

For a compute-focused architecture, we recommend designing the network
architecture using a scalable network model that makes it easy to add
capacity and bandwidth. A good example of such a model is the leaf-spine
model. In this type of network design, it is possible to easily add additional
bandwidth as well as scale out to additional racks of gear. It is important to
select network hardware that supports the required port count, port speed, and
port density while also allowing for future growth as workload demands
increase. It is also important to evaluate where in the network architecture
it is valuable to provide redundancy.
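
To make the leaf-spine port-count and port-speed trade-off concrete, the
following sketch estimates a leaf switch's oversubscription ratio from its
port layout. All port counts and speeds are hypothetical placeholders, not
recommendations.

.. code-block:: python

   # Hypothetical leaf switch: how much server-facing bandwidth contends
   # for the uplinks to the spine layer?
   server_ports = 48          # 10 GbE ports facing servers
   server_port_gbps = 10
   uplink_ports = 6           # 40 GbE ports facing spine switches
   uplink_gbps = 40

   downstream = server_ports * server_port_gbps   # 480 Gb/s
   upstream = uplink_ports * uplink_gbps          # 240 Gb/s
   print(f"Oversubscription ratio: {downstream / upstream:.1f}:1")  # 2.0:1

A lower ratio costs more uplink ports; a higher ratio risks congestion on
east-west traffic, which is exactly the capacity and redundancy evaluation
the paragraph above calls for.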

Some of the key considerations that should be included in the selection
of networking hardware include:

Port count
   The design will require networking hardware that has the requisite
   port count.

Port density
   The network design will be affected by the physical space that is
   required to provide the requisite port count. A higher port density
   is preferred, as it leaves more rack space for compute or storage
   components that may be required by the design. This can also lead
   into considerations about fault domains and power density. Higher
   density switches are more expensive, therefore it is important not
   to over-design the network.

Port speed
   The networking hardware must support the proposed network speed, for
   example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE).

Redundancy
   User requirements for high availability and cost considerations
   influence the required level of network hardware redundancy.
   Network redundancy can be achieved by adding redundant power
   supplies or paired switches.

   .. note::

      If this is a requirement, the hardware must support this
      configuration. User requirements determine if a completely
      redundant network infrastructure is required.

Power requirements
   Ensure that the physical data center provides the necessary power
   for the selected network hardware.

   .. note::

      This is not an issue for top of rack (ToR) switches. This may be an
      issue for spine switches in a leaf and spine fabric, or end of row
      (EoR) switches.

Protocol support
   It is possible to gain more performance out of a single storage
   system by using specialized network technologies such as RDMA, SRP,
   iSER, and SCST. The specifics of using these technologies are beyond
   the scope of this book.

There is no single best practice architecture for the networking
hardware supporting an OpenStack cloud that will apply to all implementations.
Some of the key factors that will have a major influence on selection of
networking hardware include:

Connectivity
   All nodes within an OpenStack cloud require network connectivity. In
   some cases, nodes require access to more than one network segment.
   The design must encompass sufficient network capacity and bandwidth
   to ensure that all communications within the cloud, both north-south
   and east-west traffic, have sufficient resources available.

Scalability
   The network design should encompass a physical and logical network
   design that can be easily expanded upon. Network hardware should
   offer the appropriate types of interfaces and speeds that are
   required by the hardware nodes.

Availability
   To ensure access to nodes within the cloud is not interrupted,
   we recommend that the network architecture identify any single
   points of failure and provide some level of redundancy or fault
   tolerance. The network infrastructure often involves use of
   networking protocols such as LACP, VRRP, or others to achieve a highly
   available network connection. It is also important to consider the
   networking implications on API availability. We recommend designing a
   load balancing solution within the network architecture to ensure that
   the APIs, and potentially other services in the cloud, are highly
   available.

Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following factors when selecting compute (server) hardware:

* Server density

  A measure of how many servers can fit into a given measure of
  physical space, such as a rack unit [U].

* Resource capacity

  The number of CPU cores, how much RAM, or how much storage a given
  server delivers.

* Expandability

  The number of additional resources you can add to a server before it
  reaches capacity.

* Cost

  The relative cost of the hardware weighed against the level of
  design effort needed to build the system.

Weigh these considerations against each other to determine the best
design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing resource
capacity and expandability can increase cost but decrease server density.
Decreasing cost often means decreasing supportability, server density,
resource capacity, and expandability.

Compute capacity (CPU cores and RAM capacity) is a secondary
consideration for selecting server hardware. The required
server hardware must supply adequate CPU sockets, additional CPU cores,
and more RAM; network connectivity and storage capacity are not as
critical. The hardware needs to provide enough network connectivity and
storage capacity to meet the user requirements.

For a compute-focused cloud, emphasis should be on server
hardware that can offer more CPU sockets, more CPU cores, and more RAM.
Network connectivity and storage capacity are less critical.

When designing an OpenStack cloud architecture, you must
consider whether you intend to scale up or scale out. Selecting a
smaller number of larger hosts, or a larger number of smaller hosts,
depends on a combination of factors: cost, power, cooling, physical rack
and floor space, support-warranty, and manageability.

Consider the following in selecting the server hardware form factor suited
for your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers can offer greater server density
  than a blade server solution, but are often limited to dual-socket,
  multi-core CPU configurations. It is possible to place forty 1U servers
  in a rack, providing space for the top of rack (ToR) switches, compared
  to 32 full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
   More hosts are required to support the anticipated scale
   if the design architecture uses dual-socket hardware designs.

   For a general purpose OpenStack cloud, sizing is an important
   consideration. The expected or anticipated number of instances that each
   hypervisor can host is a common meter used in sizing the deployment. The
   selected server hardware needs to support the expected or anticipated
   instance density.

Host density
   Another option to address the higher host count is to use a
   quad-socket platform. Taking this approach decreases host density
   which also increases rack count. This configuration affects the
   number of power connections and also impacts network and cooling
   requirements.

   Physical data centers have limited physical space, power, and
   cooling. The number of hosts (or hypervisors) that can be fitted
   into a given metric (rack, rack unit, or floor tile) is another
   important method of sizing. Floor weight is an often overlooked
   consideration. The data center floor must be able to support the
   weight of the proposed number of hosts within a rack or set of
   racks. These factors need to be applied as part of the host density
   calculation and server hardware selection; see the sizing sketch
   after this list.

Power and cooling density
   The power and cooling density requirements might be lower than with
   blade, sled, or 1U server designs due to lower host density (by
   using 2U, 3U, or even 4U server designs). For data centers with older
   infrastructure, this might be a desirable feature.

   Data centers have a specified amount of power fed to a given rack or
   set of racks. Older data centers may have a power density as low
   as 20 amps per rack, while more recent data centers can be
   architected to support power densities as high as 120 amps per rack.
   The selected server hardware must take power density into account.

Network connectivity
   The selected server hardware must have the appropriate number of
   network connections, as well as the right type of network
   connections, in order to support the proposed architecture. Ensure
   that, at a minimum, there are at least two diverse network
   connections coming into each rack.
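
To make these sizing factors concrete, the sketch below works through a
simple host, rack, and power calculation. Every figure (instance counts,
densities, wattages) is a hypothetical placeholder, not a recommendation.

.. code-block:: python

   import math

   # Hypothetical sizing inputs.
   target_instances = 1200     # anticipated cloud capacity
   instances_per_host = 30     # expected instance density per hypervisor
   hosts_per_rack = 20         # host density for the chosen form factor
   watts_per_host = 650        # nameplate power draw per server
   rack_power_amps = 30        # power fed to each rack
   rack_voltage = 208

   hosts = math.ceil(target_instances / instances_per_host)   # 40 hosts
   racks = math.ceil(hosts / hosts_per_rack)                  # 2 racks

   # Check the rack power budget against the planned host density.
   rack_budget_watts = rack_power_amps * rack_voltage         # 6240 W
   rack_draw_watts = hosts_per_rack * watts_per_host          # 13000 W
   max_hosts_by_power = rack_budget_watts // watts_per_host   # 9 hosts

   print(f"{hosts} hosts in {racks} racks")
   print(f"Rack draw {rack_draw_watts} W vs budget {rack_budget_watts} W; "
         f"power allows only {max_hosts_by_power} hosts per rack")

In this example the power budget, not physical rack space, is the binding
constraint, which illustrates how power density can override the host
density a form factor appears to offer.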

The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly, the
network architecture impacts the server hardware selection and vice
versa.

Hardware for general purpose OpenStack cloud
--------------------------------------------

Hardware for a general purpose OpenStack cloud should reflect a cloud
with no pre-defined usage model, designed to run a wide variety of
applications with varying resource usage requirements. These
applications include any of the following:

* RAM-intensive

* CPU-intensive

* Storage-intensive

Certain hardware form factors may better suit a general purpose
OpenStack cloud due to the requirement for an equal (or nearly equal)
balance of resources. Server hardware must provide the following:

* Equal (or nearly equal) balance of compute capacity (RAM and CPU)

* Network capacity (number and speed of links)

* Storage capacity (gigabytes or terabytes as well as :term:`Input/Output
  Operations Per Second (IOPS)`)

The best form factor for server hardware supporting a general purpose
OpenStack cloud is driven by outside business and cost factors. No
single reference architecture applies to all implementations; the
decision must flow from user requirements, technical considerations, and
operational considerations.

Selecting storage hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~

The storage hardware architecture is determined by the selected storage
architecture. Determine the storage architecture by evaluating possible
solutions against the critical factors: the user requirements, technical
considerations, and operational considerations.
Consider the following factors when selecting storage hardware:

Cost
   Storage can be a significant portion of the overall system cost. For
   an organization that is concerned with vendor support, a commercial
   storage solution is advisable, although it comes with a higher price
   tag. If initial capital expenditure requires minimization, designing
   a system based on commodity hardware would apply. The trade-off is
   potentially higher support costs and a greater risk of
   incompatibility and interoperability issues.

Performance
   The latency of storage I/O requests indicates performance. Performance
   requirements affect which solution you choose.

Scalability
   Scalability, along with expandability, is a major consideration in a
   general purpose OpenStack cloud. It might be difficult to predict
   the final intended size of the implementation as there are no
   established usage patterns for a general purpose cloud. It might
   become necessary to expand the initial deployment in order to
   accommodate growth and user demand.

Expandability
   Expandability is a major architecture factor for storage solutions
   with general purpose OpenStack cloud. A storage solution that
   expands to 50 PB is considered more expandable than a solution that
   only scales to 10 PB. This meter is related to scalability, which is
   the measure of a solution's performance as it expands.

General purpose cloud storage requirements
------------------------------------------

Using a scale-out storage solution with direct-attached storage (DAS) in
the servers is well suited for a general purpose OpenStack cloud. Cloud
services requirements determine your choice of scale-out solution. You
need to determine if a single, highly expandable and highly vertically
scalable, centralized storage array is suitable for your design. After
determining an approach, select the storage hardware based on these
criteria.

This list expands upon the potential impacts of including a particular
storage architecture (and corresponding storage hardware) in the
design for a general purpose OpenStack cloud:

Connectivity
   If storage protocols other than Ethernet are part of the storage
   solution, ensure the appropriate hardware has been selected. If a
   centralized storage array is selected, ensure that the hypervisor will
   be able to connect to that storage array for image storage.

Usage
   How the particular storage architecture will be used is critical for
   determining the architecture. Some of the configurations that will
   influence the architecture include whether it will be used by the
   hypervisors for ephemeral instance storage, or if OpenStack Object
   Storage will use it for object storage.

Instance and image locations
   Where instances and images will be stored will influence the
   architecture.

Server hardware
   If the solution is a scale-out storage architecture that includes
   DAS, it will affect the server hardware selection. This could ripple
   into the decisions that affect host density, instance density, power
   density, OS-hypervisor, management tools, and others.

A general purpose OpenStack cloud has multiple options. The key factors
that will have an influence on selection of storage hardware for a
general purpose OpenStack cloud are as follows:

Capacity
   Hardware resources selected for the resource nodes should be capable
   of supporting enough storage for the cloud services. Defining the
   initial requirements and ensuring the design can support adding
   capacity is important. Hardware nodes selected for object storage
   should be capable of supporting a large number of inexpensive disks
   with no reliance on RAID controller cards. Hardware nodes selected
   for block storage should be capable of supporting high speed storage
   solutions and RAID controller cards to provide performance and
   redundancy to storage at a hardware level. Selecting hardware RAID
   controllers that automatically repair damaged arrays will assist
   with the replacement and repair of degraded or destroyed storage
   devices.

Performance
   Disks selected for object storage services do not need to be fast
   performing disks. We recommend that object storage nodes take
   advantage of the best cost per terabyte available for storage.
   Contrastingly, disks chosen for block storage services should take
   advantage of performance boosting features that may entail the use
   of SSDs or flash storage to provide high performance block storage
   pools. Storage performance of ephemeral disks used for instances
   should also be taken into consideration.

Fault tolerance
   Object storage resource nodes have no requirements for hardware
   fault tolerance or RAID controllers. It is not necessary to plan for
   fault tolerance within the object storage hardware because the
   object storage service provides replication between zones as a
   feature of the service. Block storage nodes, compute nodes, and
   cloud controllers should all have fault tolerance built in at the
   hardware level by making use of hardware RAID controllers and
   varying levels of RAID configuration. The level of RAID chosen
   should be consistent with the performance and availability
   requirements of the cloud.
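
Because object storage replicates between zones while block storage relies
on RAID, raw disk rarely equals usable capacity. The sketch below shows the
arithmetic under an assumed replica count and RAID level; all figures are
illustrative.

.. code-block:: python

   # Usable capacity under the fault-tolerance schemes described above.
   raw_tb_object = 960        # raw disk across object storage nodes
   swift_replicas = 3         # Object Storage keeps N copies of each object
   usable_object_tb = raw_tb_object / swift_replicas       # 320 TB

   raw_tb_block = 200         # raw disk across block storage nodes
   raid10_efficiency = 0.5    # RAID 10 mirrors, halving the raw space
   usable_block_tb = raw_tb_block * raid10_efficiency      # 100 TB

   print(f"Object: {raw_tb_object} TB raw -> {usable_object_tb:.0f} TB usable")
   print(f"Block:  {raw_tb_block} TB raw -> {usable_block_tb:.0f} TB usable")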

Storage-focused cloud storage requirements
------------------------------------------

Storage-focused OpenStack clouds must address I/O intensive workloads.
These workloads are not CPU intensive, nor are they consistently network
intensive. The network may be heavily utilized to transfer storage data, but
these workloads are not otherwise network intensive.

The selection of storage hardware determines the overall performance and
scalability of a storage-focused OpenStack design architecture. Several
factors impact the design process.

Latency is a key consideration in a storage-focused OpenStack cloud.
Using solid-state disks (SSDs) to minimize latency, and to reduce CPU
delays caused by waiting for storage, increases performance. Use
RAID controller cards in compute hosts to improve the performance of the
underlying disk subsystem.

Depending on the storage architecture, you can adopt a scale-out
solution, or use a highly expandable and scalable centralized storage
array. If a centralized storage array meets your requirements, then the
array vendor determines the hardware selection. It is possible to build
a storage array using commodity hardware with Open Source software, but
doing so requires people with expertise to build such a system.

On the other hand, a scale-out storage solution that uses
direct-attached storage (DAS) in the servers may be an appropriate
choice. This requires configuration of the server hardware to support
the storage solution.

Considerations affecting the storage architecture (and corresponding storage
hardware) of a storage-focused OpenStack cloud include:

Connectivity
   Ensure the connectivity matches the storage solution requirements. We
   recommend confirming that the network characteristics minimize latency
   to boost the overall performance of the design.

Latency
   Determine if the use case has consistent or highly variable latency.

Throughput
   Ensure that the storage solution throughput is optimized for your
   application requirements.

Server hardware
   Use of DAS impacts the server hardware choice and affects host
   density, instance density, power density, OS-hypervisor, and
   management tools.

======================
Logging and monitoring
======================

OpenStack clouds require appropriate monitoring platforms to catch and
manage errors.

.. note::

   We recommend leveraging existing monitoring systems to see if they
   are able to effectively monitor an OpenStack environment.

Specific meters that are critically important to capture include:

* Image disk utilization

* Response time to the Compute API (see the sketch following this list)
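
As a simple illustration of capturing the second meter, this sketch times a
request against the Compute API's unauthenticated version document. The
endpoint URL is a placeholder for your deployment's; a real deployment would
feed the measurement into the monitoring system rather than print it.

.. code-block:: python

   import time

   import requests

   # Placeholder endpoint; substitute your cloud's Compute API URL.
   COMPUTE_API = "http://controller:8774/"

   start = time.monotonic()
   response = requests.get(COMPUTE_API, timeout=5)
   elapsed_ms = (time.monotonic() - start) * 1000
   print(f"Compute API returned {response.status_code} in {elapsed_ms:.1f} ms")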

Logging and monitoring does not significantly differ for a multi-site OpenStack
cloud. The tools described in the `Logging and monitoring chapter
<http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`__ of
the Operations Guide remain applicable. Logging and monitoring can be provided
on a per-site basis, and in a common centralized location.

When attempting to deploy logging and monitoring facilities to a centralized
location, care must be taken with the load placed on the inter-site networking
links.

==========
Networking
==========

OpenStack clouds generally have multiple network segments, with each
segment providing access to particular resources. The network segments
themselves also require network communication paths that should be
separated from the other networks. When designing network services for a
general purpose cloud, plan for either a physical or logical separation
of network segments used by operators and tenants. Additional network
segments can also be created for access to internal services such as the
message bus and database used by various systems. Segregating these
services onto separate networks helps to protect sensitive data and
prevent unauthorized access.

Choose a networking service based on the requirements of your instances.
The architecture and design of your cloud will impact whether you choose
OpenStack Networking (neutron) or legacy networking (nova-network).

Networking (neutron)
~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) is a first class networking service that gives
full control over creation of virtual network resources to tenants. This is
often accomplished in the form of tunneling protocols that establish
encapsulated communication paths over existing network infrastructure in order
to segment tenant traffic. This method varies depending on the specific
implementation, but some of the more common methods include tunneling over
GRE, encapsulating with VXLAN, and VLAN tags.

We recommend you design at least three network segments. The first segment
should be a public network, used to access REST APIs by tenants and operators.
The controller nodes and swift proxies are the only devices connecting to this
network segment. In some cases, this public network might also be serviced by
hardware load balancers and other network devices.

The second segment is used by administrators to manage hardware resources.
Configuration management tools also utilize this segment for deploying
software and services onto new hardware. In some cases, this network
segment is also used for internal services, including the message bus
and database services. The second segment needs to communicate with every
hardware node. Due to the highly sensitive nature of this network segment,
it needs to be secured from unauthorized access.

The third network segment is used by applications and consumers to access the
physical network, and for users to access applications. This network is
segregated from the one used to access the cloud APIs and is not capable
of communicating directly with the hardware resources in the cloud.
Communication on this network segment is required by compute resource
nodes and network gateway services that allow application data to access the
physical network from outside the cloud.
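
As a concrete example of the tenant-controlled virtual resources described
above, this sketch uses the openstacksdk library to create an isolated
tenant network and subnet. The cloud name, resource names, and CIDR are
placeholders to adapt to your deployment.

.. code-block:: python

   import openstack

   # Assumes a cloud named "mycloud" is defined in clouds.yaml.
   conn = openstack.connect(cloud="mycloud")

   # Create an isolated tenant network; the deployment's configuration
   # determines the segmentation method (GRE, VXLAN, or VLAN).
   network = conn.network.create_network(name="app-net")
   subnet = conn.network.create_subnet(
       network_id=network.id,
       name="app-subnet",
       ip_version=4,
       cidr="192.0.2.0/24",
   )
   print(f"Created {network.name} with subnet {subnet.cidr}")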

Legacy networking (nova-network)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The legacy networking (nova-network) service is primarily a layer-2 networking
service. It functions in two modes: flat networking mode and VLAN mode. In
flat network mode, all network hardware nodes and devices throughout the cloud
are connected to a single layer-2 network segment that provides access to
application data.

However, when the network devices in the cloud support segmentation using
VLANs, legacy networking can operate in the second mode. In this design model,
each tenant within the cloud is assigned a network subnet which is mapped to
a VLAN on the physical network. It is especially important to remember that
the maximum number of VLANs that can be used within a spanning tree domain
is 4096. This places a hard limit on the amount of growth possible within the
data center. Consequently, when designing a general purpose cloud intended to
support multiple tenants, we recommend the use of legacy networking with
VLANs, and not flat network mode.
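
The 4096 limit follows directly from the 12-bit VLAN ID field in the 802.1Q
header. The toy comparison below contrasts it with the 24-bit VXLAN network
identifier space that overlay-based designs use; the tenant count is purely
illustrative.

.. code-block:: python

   # 802.1Q VLAN IDs are 12 bits; VXLAN network identifiers are 24 bits.
   VLAN_IDS = 2 ** 12       # 4096, minus a few reserved values in practice
   VXLAN_VNIS = 2 ** 24     # 16,777,216

   tenant_networks = 10_000
   print(f"Fits in VLAN space:  {tenant_networks <= VLAN_IDS}")    # False
   print(f"Fits in VXLAN space: {tenant_networks <= VXLAN_VNIS}")  # True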

Layer-2 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A network designed on layer-2 protocols has advantages over a network designed
on layer-3 protocols. In spite of the difficulties of using a bridge to perform
the network role of a router, many vendors, customers, and service providers
choose to use Ethernet in as many parts of their networks as possible. The
benefits of selecting a layer-2 design are:

* Ethernet frames contain all the essentials for networking. These include, but
  are not limited to, globally unique source addresses, globally unique
  destination addresses, and error control.

* Ethernet frames can carry any kind of packet. Networking at layer-2 is
  independent of the layer-3 protocol.

* Adding more layers to the Ethernet frame only slows the networking process
  down. This is known as nodal processing delay.

* You can add adjunct networking features, for example class of service (CoS)
  or multicasting, to Ethernet as readily as to IP networks.

* VLANs are an easy mechanism for isolating networks.

Most information starts and ends inside Ethernet frames. Today this applies
to data, voice, and video. The concept is that the network will benefit more
from the advantages of Ethernet if the transfer of information from a source
to a destination is in the form of Ethernet frames.

Although it is not a substitute for IP networking, networking at layer-2 can
be a powerful adjunct to IP networking.

Layer-2 Ethernet usage has these additional advantages over layer-3 IP network
usage:

* Speed.
* Reduced overhead of the IP hierarchy.
* No need to keep track of address configuration as systems move around.

Whereas the simplicity of layer-2 protocols might work well in a data center
with hundreds of physical machines, cloud data centers have the additional
burden of needing to keep track of all virtual machine addresses and
networks. In these data centers, it is not uncommon for one physical node
to support 30-40 instances.

.. important::

   Networking at the frame level says nothing about the presence or
   absence of IP addresses at the packet level. Almost all ports, links, and
   devices on a network of LAN switches still have IP addresses, as do all the
   source and destination hosts. There are many reasons for the continued need
   for IP addressing. The largest one is the need to manage the network. A
   device or link without an IP address is usually invisible to most
   management applications. Utilities including remote access for diagnostics,
   file transfer of configurations and software, and similar applications
   cannot run without IP addresses as well as MAC addresses.

Layer-2 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Layer-2 network architectures have some limitations that become noticeable when
used outside of traditional data centers.

* The number of VLANs is limited to 4096.
* The number of MACs stored in switch tables is limited.
* You must accommodate the need to maintain a set of layer-4 devices to handle
  traffic control.
* MLAG, often used for switch redundancy, is a proprietary solution that does
  not scale beyond two devices and forces vendor lock-in.
* It can be difficult to troubleshoot a network without IP addresses and ICMP.
* Configuring ARP can be complicated on large layer-2 networks.
* All network devices need to be aware of all MACs, even instance MACs, so
  there is constant churn in MAC tables and network state changes as instances
  start and stop.
* Migrating MACs (instance migration) to different physical locations is a
  potential problem if you do not set ARP table timeouts properly.

It is important to know that layer-2 has a very limited set of network
management tools. It is difficult to control traffic as it does not have
mechanisms to manage the network or shape the traffic. Network
troubleshooting is also troublesome, in part because network devices have
no IP addresses. As a result, there is no reasonable way to check network
delay.

In a layer-2 network all devices are aware of all MACs, even those that belong
to instances. The network state information in the backbone changes whenever an
instance starts or stops. Because of this, there is far too much churn in the
MAC tables on the backbone switches.

Furthermore, on large layer-2 networks, configuring ARP learning can be
complicated. The setting for the MAC address timer on switches is critical
and, if set incorrectly, can cause significant performance problems. So when
migrating MACs to different physical locations to support instance migration,
problems may arise. As an example, the Cisco default MAC address timer is
extremely long. As such, the network information maintained in the switches
could be out of sync with the new location of the instance.

Layer-3 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In layer-3 networking, routing takes instance MAC and IP addresses out of the
network core, reducing state churn. The only time there would be a routing
state change is in the case of a Top of Rack (ToR) switch failure or a link
failure in the backbone itself. Other advantages of using a layer-3
architecture include:

* Layer-3 networks provide the same level of resiliency and scalability
  as the Internet.

* Controlling traffic with routing metrics is straightforward.

* You can configure layer-3 to use BGP confederation for scalability. This
  way core routers have state proportional to the number of racks, not to the
  number of servers or instances.

* There are a variety of well tested tools, such as ICMP, to monitor and
  manage traffic.

* Layer-3 architectures enable the use of :term:`quality of service (QoS)` to
  manage network performance.

Layer-3 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main limitation of layer 3 is that there is no built-in isolation mechanism
comparable to the VLANs in layer-2 networks. Furthermore, the hierarchical
nature of IP addresses means that an instance is on the same subnet as its
physical host, making migration out of the subnet difficult. For these reasons,
network virtualization needs to use IP encapsulation and software at the end
hosts. This is for isolation and the separation of the addressing in the
virtual layer from the addressing in the physical layer. Other potential
disadvantages of layer 3 include the need to design an IP addressing scheme
rather than relying on the switches to keep track of the MAC addresses
automatically, and to configure the interior gateway routing protocol in the
switches.

Network design
~~~~~~~~~~~~~~

There are many reasons an OpenStack network has complex requirements. However,
one main factor is the many components that interact at different levels of the
system stack, adding complexity. Data flows are also complex. Data in an
OpenStack cloud moves both between instances across the network (also known as
East-West), as well as in and out of the system (also known as North-South).
Physical server nodes have network requirements that are independent of
instance network requirements; these you must isolate from the core network
to account for scalability. We recommend functionally separating the networks
for security purposes and tuning performance through traffic shaping.

You must consider a number of important general technical and business factors
when planning and designing an OpenStack network. These include:

* A requirement for vendor independence. To avoid hardware or software vendor
  lock-in, the design should not rely on specific features of a vendor's
  router or switch.
* A requirement to massively scale the ecosystem to support millions of end
  users.
* A requirement to support indeterminate platforms and applications.
* A requirement to design for cost efficient operations to take advantage of
  massive scale.
* A requirement to ensure that there is no single point of failure in the
  cloud ecosystem.
* A requirement for high availability architecture to meet customer SLA
  requirements.
* A requirement to be tolerant of rack level failure.
* A requirement to maximize flexibility to architect future production
  environments.

Bearing in mind these considerations, we recommend the following:

* Layer-3 designs are preferable to layer-2 architectures.
* Design a dense multi-path network core to support multi-directional
  scaling and flexibility.
* Use hierarchical addressing because it is the only viable option to scale
  the network ecosystem.
* Use virtual networking to isolate instance service network traffic from the
  management and internal network traffic.
* Isolate virtual networks using encapsulation technologies.
* Use traffic shaping for performance tuning.
* Use eBGP to connect to the Internet up-link.
* Use iBGP to flatten the internal traffic on the layer-3 mesh.
* Determine the most effective configuration for the block storage network.

Operator considerations
-----------------------

The network design for an OpenStack cluster includes decisions regarding
the interconnect needs within the cluster, the need to allow clients to
access their resources, and the access requirements for operators to
administrate the cluster. You should consider the bandwidth, latency,
and reliability of these networks.

Whether you are using an external provider or an internal team, you need
to consider additional design decisions about monitoring and alarming.
If you are using an external provider, service level agreements (SLAs)
are typically defined in your contract. Operational considerations such
as bandwidth, latency, and jitter can be part of the SLA.

As demand for network resources increases, make sure your network design
accommodates expansion and upgrades. Operators add additional IP address
blocks and add additional bandwidth capacity. In addition, consider
managing hardware and software lifecycle events, for example upgrades,
decommissioning, and outages, while avoiding service interruptions for
tenants.

Factor maintainability into the overall network design. This includes
the ability to manage and maintain IP addresses as well as the use of
overlay identifiers including VLAN tag IDs, GRE tunnel IDs, and MPLS
tags. As an example, if you need to change all of the IP addresses
on a network, a process known as renumbering, then the design must
support this function.

Address network-focused applications when considering certain
operational realities. For example, consider the impending exhaustion of
IPv4 addresses, the migration to IPv6, and the use of private networks
to segregate different types of traffic that an application receives or
generates. In the case of IPv4 to IPv6 migrations, applications should
follow best practices for storing IP addresses. We recommend you avoid
relying on IPv4 features that did not carry over to the IPv6 protocol or
have differences in implementation.

To segregate traffic, allow applications to create a private tenant
network for database and storage network traffic. Use a public network
for services that require direct client access from the Internet. Upon
segregating the traffic, consider :term:`quality of service (QoS)` and
security to ensure each network has the required level of service.

Finally, consider the routing of network traffic. For some applications,
develop a complex policy framework for routing. To create a routing
policy that satisfies business requirements, consider the economic cost
of transmitting traffic over expensive links versus cheaper links, in
addition to bandwidth, latency, and jitter requirements.

Additionally, consider how to respond to network events. How load
transfers from one link to another during a failure scenario could be
a factor in the design. If you do not plan network capacity
correctly, failover traffic could overwhelm other ports or network
links and create a cascading failure scenario. In this case,
traffic that fails over to one link overwhelms that link and then
moves to the subsequent links until all network traffic stops.

Additional considerations
-------------------------

There are several further considerations when designing a network-focused
OpenStack cloud. One is redundant networking, and the associated ToR switch
high availability risk analysis. In most cases, it is much more economical
to use a single switch with a small pool of spare switches to replace failed
units than it is to outfit an entire data center with redundant switches.
Applications should tolerate rack level outages without affecting normal
operations since network and compute resources are easily provisioned and
plentiful.

Research indicates the mean time between failures (MTBF) on switches is
between 100,000 and 200,000 hours. This number is dependent on the ambient
temperature of the switch in the data center. When properly cooled and
maintained, this translates to between 11 and 22 years before failure. Even
in the worst case of poor ventilation and high ambient temperatures in the data
center, the MTBF is still 2-3 years.
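
The arithmetic behind those figures, extended into a rough spares-planning
estimate, is sketched below; the fleet size is a hypothetical example.

.. code-block:: python

   # Convert MTBF hours to years and estimate annual failures in a fleet.
   HOURS_PER_YEAR = 8760
   mtbf_hours = 100_000     # conservative end of the 100,000-200,000 range
   switches = 200           # hypothetical ToR switch fleet

   years_per_failure = mtbf_hours / HOURS_PER_YEAR                 # ~11.4
   failures_per_year = switches * HOURS_PER_YEAR / mtbf_hours      # ~17.5

   print(f"MTBF is roughly {years_per_failure:.1f} years per switch")
   print(f"Expect about {failures_per_year:.1f} failures per year fleet-wide")

An estimate like this sizes the spare pool that the single-switch approach
above depends on.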

See https://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf
for further information.

.. list-table::
   :header-rows: 1

   * - Legacy networking (nova-network)
     - OpenStack Networking
   * - Simple, single agent
     - Complex, multiple agents
   * - Flat or VLAN
     - Flat, VLAN, Overlays, L2-L3, SDN
   * - No plug-in support
     - Plug-in support for 3rd parties
   * - No multi-tier topologies
     - Multi-tier topologies

Preparing for the future: IPv6 support
--------------------------------------

One of the most important networking topics today is the exhaustion of
IPv4 addresses. As of late 2015, ICANN announced that the final
IPv4 address blocks have been fully assigned. Because of this, the IPv6
protocol has become the future of network focused applications. IPv6
increases the address space significantly, fixes long standing issues
in the IPv4 protocol, and will become essential for network focused
applications in the future.

OpenStack Networking, when configured for it, supports IPv6. To enable
IPv6, create an IPv6 subnet in Networking and use IPv6 prefixes when
creating security groups.
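
A hedged sketch of the first step with the openstacksdk library follows. The
cloud name, network name, prefix, and the choice of SLAAC address modes are
placeholders to adapt to your deployment.

.. code-block:: python

   import openstack

   conn = openstack.connect(cloud="mycloud")
   network = conn.network.find_network("app-net")

   # Create an IPv6 subnet; here both router advertisements and
   # addressing use SLAAC, one of several modes Networking supports.
   subnet_v6 = conn.network.create_subnet(
       network_id=network.id,
       name="app-subnet-v6",
       ip_version=6,
       cidr="2001:db8:1::/64",
       ipv6_ra_mode="slaac",
       ipv6_address_mode="slaac",
   )
   print(f"Created IPv6 subnet {subnet_v6.cidr}")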

Asymmetric links
----------------

When designing a network architecture, the traffic patterns of an
application heavily influence the allocation of total bandwidth and
the number of links that you use to send and receive traffic. Applications
that provide file storage for customers allocate bandwidth and links to
favor incoming traffic; whereas video streaming applications allocate
bandwidth and links to favor outgoing traffic.

Performance
-----------

It is important to analyze the application's tolerance for latency and
jitter when designing an environment to support network focused
applications. Certain applications, for example VoIP, are less tolerant
of latency and jitter. When latency and jitter are issues, certain
applications may require tuning of QoS parameters and network device
queues to ensure that they queue for transmit immediately or guarantee
minimum bandwidth. Since OpenStack currently does not support these functions,
consider carefully your selected network plug-in.

The location of a service may also impact the application or consumer
experience. If an application serves differing content to different users,
it must properly direct connections to those specific locations. Where
appropriate, use a multi-site installation for these situations.

You can implement networking in two separate ways. Legacy networking
(nova-network) provides a flat DHCP network with a single broadcast domain.
This implementation does not support tenant isolation networks or advanced
plug-ins, but it is currently the only way to implement a distributed
layer-3 (L3) agent using the multi-host configuration. OpenStack Networking
(neutron) is the official networking implementation and provides a pluggable
architecture that supports a large variety of network methods. Some of these
include a layer-2 only provider network model, external device plug-ins, or
even OpenFlow controllers.

Networking at large scales becomes a set of boundary questions. The
determination of how large a layer-2 domain must be is based on the
number of nodes within the domain and the amount of broadcast traffic
that passes between instances. Breaking layer-2 boundaries may require
the implementation of overlay networks and tunnels. This decision is a
balancing act between the need for a smaller overhead and the need for a
smaller domain.

When selecting network devices, be aware that making a decision based on the
greatest port density often comes with a drawback. Aggregation switches and
routers have not all kept pace with Top of Rack switches and may induce
bottlenecks on north-south traffic. As a result, it may be possible for
massive amounts of downstream network utilization to impact upstream network
devices, impacting service to the cloud. Since OpenStack does not currently
provide a mechanism for traffic shaping or rate limiting, it is necessary to
implement these features at the network hardware level.

Tunable networking components
-----------------------------

When designing for network intensive workloads, consider the configurable
networking components related to an OpenStack architecture design, which
include MTU and QoS. Some workloads require a larger MTU than normal
due to the transfer of large blocks of data. When providing network
service for applications such as video streaming or storage replication,
we recommend that you configure both OpenStack hardware nodes and the
supporting network equipment for jumbo frames where possible. This
allows for better use of available bandwidth. Configure jumbo frames across the
complete path the packets traverse. If one network component is not capable of
handling jumbo frames then the entire path reverts to the default MTU.
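
This "weakest hop wins" behavior is easy to see in miniature: the effective
MTU of a path is simply the minimum across its components, as in the toy
check below with made-up device values.

.. code-block:: python

   # Effective MTU along a path is the minimum of every hop's MTU.
   path_mtus = {
       "compute-node-nic": 9000,
       "tor-switch": 9000,
       "aggregation-switch": 1500,   # one device left at the default...
       "storage-node-nic": 9000,
   }

   effective_mtu = min(path_mtus.values())
   print(f"Effective path MTU: {effective_mtu}")  # ...caps the path at 1500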

:term:`Quality of Service (QoS)` also has a great impact on network intensive
workloads as it provides instant service to packets which have a higher
priority, offsetting the impact of poor network performance. In applications
such as Voice over IP (VoIP), differentiated services code points are a near
requirement for proper operation. You can also use QoS in the opposite
direction for mixed workloads to prevent low priority but high bandwidth
applications, for example backup services, video conferencing, or file sharing,
from blocking bandwidth that is needed for the proper operation of other
workloads. It is possible to tag file storage traffic as a lower class, such as
best effort or scavenger, to allow the higher priority traffic through. In
cases where regions within a cloud might be geographically distributed, it may
also be necessary to plan accordingly to implement WAN optimization to combat
latency or packet loss.
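
As a small illustration of the differentiated services code points mentioned
above, the sketch below sets the Expedited Forwarding (EF) code point on a
UDP socket, as a VoIP sender might. The destination address and payload are
placeholders, and the network must still be configured to honor the marking.

.. code-block:: python

   import socket

   DSCP_EF = 46  # Expedited Forwarding, the usual class for voice traffic

   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
   # The DSCP value occupies the upper six bits of the legacy TOS byte.
   sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)

   # Outbound datagrams on this socket now carry the EF code point.
   sock.sendto(b"voice frame", ("198.51.100.7", 5004))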
|
|
@ -1,260 +0,0 @@
|
|||||||
==================
|
|
||||||
Software selection
|
|
||||||
==================
|
|
||||||
|
|
||||||
Software selection, particularly for a general purpose OpenStack architecture
|
|
||||||
design involves three areas:
|
|
||||||
|
|
||||||
* Operating system (OS) and hypervisor
|
|
||||||
|
|
||||||
* OpenStack components
|
|
||||||
|
|
||||||
* Supplemental software
|
|
||||||
|
|
||||||
Operating system and hypervisor
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
The operating system (OS) and hypervisor have a significant impact on
|
|
||||||
the overall design. Selecting a particular operating system and
|
|
||||||
hypervisor can directly affect server hardware selection. Make sure the
|
|
||||||
storage hardware and topology support the selected operating system and
|
|
||||||
hypervisor combination. Also ensure the networking hardware selection
|
|
||||||
and topology will work with the chosen operating system and hypervisor
|
|
||||||
combination.
|
|
||||||
|
|
||||||
Some areas that could be impacted by the selection of OS and hypervisor
|
|
||||||
include:
|
|
||||||
|
|
||||||
Cost
|
|
||||||
Selecting a commercially supported hypervisor, such as Microsoft
|
|
||||||
Hyper-V, will result in a different cost model rather than
|
|
||||||
community-supported open source hypervisors including
|
|
||||||
:term:`KVM<kernel-based VM (KVM)>`, Kinstance or :term:`Xen`. When
|
|
||||||
comparing open source OS solutions, choosing Ubuntu over Red Hat
|
|
||||||
(or vice versa) will have an impact on cost due to support
|
|
||||||
contracts.
|
|
||||||
|
|
||||||
Support
|
|
||||||
Depending on the selected hypervisor, staff should have the
|
|
||||||
appropriate training and knowledge to support the selected OS and
|
|
||||||
hypervisor combination. If they do not, training will need to be
|
|
||||||
provided which could have a cost impact on the design.
|
|
||||||
|
|
||||||
Management tools
|
|
||||||
The management tools used for Ubuntu and Kinstance differ from the
|
|
||||||
management tools for VMware vSphere. Although both OS and hypervisor
|
|
||||||
combinations are supported by OpenStack, there is
|
|
||||||
different impact to the rest of the design as a result of the
|
|
||||||
selection of one combination versus the other.
|
|
||||||
|
|
||||||
Scale and performance
|
|
||||||
Ensure that selected OS and hypervisor combinations meet the
|
|
||||||
appropriate scale and performance requirements. The chosen
|
|
||||||
architecture will need to meet the targeted instance-host ratios
|
|
||||||
with the selected OS-hypervisor combinations.

Security
   Ensure that the design can accommodate regular periodic
   installation of application security patches while maintaining
   required workloads. The frequency of security patches for the
   proposed OS-hypervisor combination will have an impact on
   performance, and the patch installation process could affect
   maintenance windows.

Supported features
   Determine which OpenStack features are required. This will often
   determine the selection of the OS-hypervisor combination. Some
   features are only available with specific operating systems or
   hypervisors.

Interoperability
   You will need to consider how the OS and hypervisor combination
   interacts with other operating systems and hypervisors, including
   other software solutions. Operational troubleshooting tools for one
   OS-hypervisor combination may differ from the tools used for another
   OS-hypervisor combination and, as a result, the design will need to
   address whether the two sets of tools need to interoperate.

OpenStack components
~~~~~~~~~~~~~~~~~~~~

Selecting which OpenStack components are included in the overall design
is important. Some OpenStack components, like compute and Image service,
are required in every architecture. Other components, like
Orchestration, are not always required.

A compute-focused OpenStack design architecture may contain the following
components:

* Identity (keystone)

* Dashboard (horizon)

* Compute (nova)

* Object Storage (swift)

* Image (glance)

* Networking (neutron)

* Orchestration (heat)

.. note::

   A compute-focused design is less likely to include OpenStack Block
   Storage. However, there may be some situations where the need for
   performance requires a block storage component to improve data I/O.

Excluding certain OpenStack components can limit or constrain the
functionality of other components. For example, if the architecture
includes Orchestration but excludes Telemetry, then the design will not
be able to take advantage of Orchestration's auto-scaling functionality.
It is important to research the component interdependencies in
conjunction with the technical requirements before deciding on the final
architecture.

Networking software
~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) provides a wide variety of networking
services for instances. There are many additional networking software
packages that can be useful when managing OpenStack components. Some
examples include:

* Software to provide load balancing

* Network redundancy protocols

* Routing daemons

Some of these software packages are described in more detail in the
`OpenStack network nodes chapter
<http://docs.openstack.org/ha-guide/networking-ha.html>`_ in the
OpenStack High Availability Guide.

For a general purpose OpenStack cloud, the OpenStack infrastructure
components need to be highly available. If the design does not include
hardware load balancing, networking software packages like HAProxy will
need to be included.

For a compute-focused OpenStack cloud, the OpenStack infrastructure
components must be highly available. If the design does not include
hardware load balancing, you must add networking software packages, for
example, HAProxy.

Management software
~~~~~~~~~~~~~~~~~~~

Management software includes software for providing:

* Clustering

* Logging

* Monitoring

* Alerting

.. important::

   The factors for determining which software packages in this category
   to select are outside the scope of this design guide.

The selected supplemental software solution impacts the overall
OpenStack cloud design. This includes software for providing clustering,
logging, monitoring and alerting.

The inclusion of clustering software, such as Corosync or Pacemaker, is
primarily determined by the availability requirements of the cloud
infrastructure and the complexity of supporting the configuration after
it is deployed. The
`OpenStack High Availability Guide <http://docs.openstack.org/ha-guide/>`_
provides more details on the installation and configuration of Corosync
and Pacemaker, should these packages need to be included in the design.

Operational considerations determine the requirements for logging,
monitoring, and alerting. Each of these sub-categories includes various
options.

For example, in the logging sub-category you could select Logstash,
Splunk, Log Insight, or another log aggregation-consolidation tool.
Store logs in a centralized location to facilitate performing analytics
against the data. Log data analytics engines can also provide automation
and issue notification by providing a mechanism to both alert and
automatically attempt to remediate some of the more commonly known
issues.

If these software packages are required, the design must account for the
additional resource consumption (CPU, RAM, storage, and network
bandwidth). Some other potential design impacts include:

* OS-hypervisor combination
  Ensure that the selected logging, monitoring, or alerting tools support
  the proposed OS-hypervisor combination.

* Network hardware
  The network hardware selection needs to be supported by the logging,
  monitoring, and alerting software.

Database software
~~~~~~~~~~~~~~~~~

Most OpenStack components require access to back-end database services
to store state and configuration information. Choose an appropriate
back-end database which satisfies the availability and fault tolerance
requirements of the OpenStack services.

MySQL is the default database for OpenStack, but other compatible
databases are available.

.. note::

   Telemetry uses MongoDB.

The chosen high availability database solution changes according to the
selected database. MySQL, for example, provides several options. Use a
replication technology such as Galera for active-active clustering. For
active-passive clustering, use some form of shared storage. Each of
these potential solutions has an impact on the design:

* Solutions that employ Galera/MariaDB require at least three MySQL
  nodes, as the sketch after this list illustrates.

* MongoDB has its own design considerations for high availability.

* OpenStack design, generally, does not include shared storage.
  However, for some high availability designs, certain components might
  require it depending on the specific implementation.
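
The cluster size can be verified programmatically. The following is a
minimal sketch, assuming the PyMySQL driver and placeholder connection
details, that checks the Galera status variable ``wsrep_cluster_size``
against the recommended three-node minimum:

.. code-block:: python

   import pymysql

   # Placeholder credentials; adapt to your environment.
   conn = pymysql.connect(host="db.example.com", user="monitor",
                          password="secret", database="mysql")
   try:
       with conn.cursor() as cursor:
           # wsrep_cluster_size is a standard Galera status variable.
           cursor.execute("SHOW STATUS LIKE 'wsrep_cluster_size'")
           row = cursor.fetchone()
           size = int(row[1]) if row else 0
           if size < 3:
               print("WARNING: cluster has %d node(s); "
                     "at least 3 are recommended" % size)
           else:
               print("Galera cluster size OK: %d nodes" % size)
   finally:
       conn.close()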

Licensing
~~~~~~~~~

The many different forms of license agreements for software are often written
with the use of dedicated hardware in mind. This model is relevant for the
cloud platform itself, including the hypervisor operating system, and
supporting software for items such as database, RPC, backup, and so on.
Consideration must be given when offering Compute service instances and
applications to end users of the cloud, since the license terms for that
software may need some adjustment to be able to operate economically in
the cloud.

Multi-site OpenStack deployments present additional licensing
considerations over and above regular OpenStack clouds, particularly
where site licenses are in use to provide cost efficient access to
software licenses. The licensing for host operating systems, guest
operating systems, OpenStack distributions (if applicable),
software-defined infrastructure including network controllers and
storage systems, and even individual applications need to be evaluated.

Topics to consider include:

* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.

* Differentiations between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made in situations where one site is
  a cold standby for disaster recovery purposes only.

* Certain locations might require local vendors to provide support and
  services for each site, which may vary according to the licensing
  agreement in place.

======================
Technical requirements
======================

.. toctree::
   :maxdepth: 2

   technical-requirements-software-selection.rst
   technical-requirements-hardware-selection.rst
   technical-requirements-network-design.rst
   technical-requirements-logging-monitoring.rst

Any given cloud deployment is expected to include these base services:

* Compute

* Networking

* Storage

Each of these services has different software and hardware resource
requirements. As a result, you must make design decisions relating
directly to each service, as well as provide a balanced infrastructure
for all services.

There are many ways to split out an OpenStack deployment, but a two box
deployment typically consists of:

* A controller node
* A compute node

The controller node will typically host:

* Identity service (for authentication)
* Image service (for image storage)
* Block Storage
* Networking service (the ``nova-network`` service may be used instead)
* Compute service API, conductor, and scheduling services
* Supporting services like the message broker (RabbitMQ)
  and database (MySQL or PostgreSQL)

The compute node will typically host:

* Nova compute
* A networking agent, if using OpenStack Networking

To provide additional block storage in a small environment, you may also
choose to deploy ``cinder-volume`` on the compute node.
You may also choose to run ``nova-compute`` on the controller itself to
allow you to run virtual machines on both hosts in a small environment.

To expand such an environment you would add additional compute nodes,
a separate networking node, and eventually a second controller for high
availability. You might also split out storage to dedicated nodes.

The OpenStack Installation guides provide some guidance on getting a basic
2-3 node deployment installed and running:

* `OpenStack Installation Guide for Ubuntu <http://docs.openstack.org/mitaka/install-guide-ubuntu/>`_
* `OpenStack Installation Guide for Red Hat Enterprise Linux and CentOS <http://docs.openstack.org/mitaka/install-guide-rdo/>`_
* `OpenStack Installation Guide for openSUSE and SUSE Linux Enterprise <http://docs.openstack.org/mitaka/install-guide-obs/>`_

testing with your local workload with both Hyper-Threading on and off to
determine what is more appropriate in your case.
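
A short sketch, assuming the ``psutil`` package is available (an
assumption; it is not part of this guide), compares logical and physical
core counts so you can see how much of the advertised vCPU capacity
comes from Hyper-Threading:

.. code-block:: python

   import psutil

   physical = psutil.cpu_count(logical=False)  # physical cores only
   logical = psutil.cpu_count(logical=True)    # includes SMT siblings

   print("Physical cores: %s" % physical)
   print("Logical CPUs:   %s" % logical)
   if physical and logical and logical > physical:
       print("Hyper-Threading appears enabled: %d threads per core"
             % (logical // physical))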

Choosing a hypervisor
~~~~~~~~~~~~~~~~~~~~~

A hypervisor provides software to manage virtual machine access to the

deployment using host aggregates or cells. However, an individual
compute node can run only a single hypervisor at a time.

Choosing server hardware
~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following in selecting server hardware form factor suited for
your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers have the ability to offer greater server density
  than a blade server solution, but are often limited to dual-socket,
  multi-core CPU configurations. It is possible to place forty 1U servers
  in a rack, providing space for the top of rack (ToR) switches, compared
  to 32 full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other hardware considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
   More hosts are required to support the anticipated scale
   if the design architecture uses dual-socket hardware designs.

   For a general purpose OpenStack cloud, sizing is an important
   consideration. The expected or anticipated number of instances that
   each hypervisor can host is a common meter used in sizing the
   deployment, as the sketch below shows. The selected server hardware
   needs to support the expected or anticipated instance density.
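
A hedged sizing sketch: estimate the number of hypervisor hosts needed
for a target instance count. The density and headroom figures are
illustrative assumptions, not recommendations:

.. code-block:: python

   import math

   target_instances = 1000
   instances_per_host = 40  # anticipated instance density per hypervisor
   headroom = 0.25          # spare capacity for growth and host failure

   hosts = math.ceil(target_instances /
                     (instances_per_host * (1 - headroom)))
   print("Estimated hosts required: %d" % hosts)  # -> 34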

Host density
   Another option to address the higher host count is to use a
   quad-socket platform. Taking this approach decreases host density,
   which also increases rack count. This configuration affects the
   number of power connections and also impacts network and cooling
   requirements.

   Physical data centers have limited physical space, power, and
   cooling. The number of hosts (or hypervisors) that can be fitted
   into a given metric (rack, rack unit, or floor tile) is another
   important method of sizing. Floor weight is an often overlooked
   consideration. The data center floor must be able to support the
   weight of the proposed number of hosts within a rack or set of
   racks. These factors need to be applied as part of the host density
   calculation and server hardware selection.

Power and cooling density
   The power and cooling density requirements might be lower than with
   blade, sled, or 1U server designs due to lower host density (by
   using 2U, 3U or even 4U server designs). For data centers with older
   infrastructure, this might be a desirable feature.

   Data centers have a specified amount of power fed to a given rack or
   set of racks. Older data centers may have a power density as low
   as 20 amps per rack, while more recent data centers can be
   architected to support power densities as high as 120 amps per rack.
   The selected server hardware must take power density into account,
   as the arithmetic sketch below shows.
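
A hedged arithmetic sketch of how a rack's power budget limits server
count; the voltage and per-server wattage figures are illustrative
assumptions:

.. code-block:: python

   rack_amps = 20          # older data center feed, per the text above
   volts = 208             # a common data center distribution voltage
   server_watts = 650      # assumed draw per dual-socket 2U server

   rack_watts = rack_amps * volts             # 4160 W available
   servers = int(rack_watts // server_watts)  # power-limited count
   print("Power-limited servers per rack: %d" % servers)  # -> 6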

Network connectivity
   The selected server hardware must have the appropriate number of
   network connections, as well as the right type of network
   connections, in order to support the proposed architecture. Ensure
   that, at a minimum, there are at least two diverse network
   connections coming into each rack.

The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly,
the network architecture impacts the server hardware selection and vice
versa.

Instance storage solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Networking in OpenStack is a complex, multifaceted challenge. See
:doc:`design-networking`.

Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following factors when selecting compute (server) hardware:

* Server density
  A measure of how many servers can fit into a given measure of
  physical space, such as a rack unit [U].

* Resource capacity
  The number of CPU cores, how much RAM, or how much storage a given
  server delivers.

* Expandability
  The number of additional resources you can add to a server before it
  reaches capacity.

* Cost
  The relative cost of the hardware weighed against the level of
  design effort needed to build the system.

Weigh these considerations against each other to determine the best
design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing resource
capacity and expandability can increase cost but decrease server density.
Decreasing cost often means decreasing supportability, server density,
resource capacity, and expandability.

For a general purpose cloud, compute capacity (CPU cores and RAM) is a
secondary consideration for selecting server hardware. The server
hardware must supply adequate CPU sockets, CPU cores, and RAM, but
network connectivity and storage capacity are not as critical. The
hardware needs to provide enough network connectivity and storage
capacity to meet the user requirements.

For a compute-focused cloud, emphasis should be on server
hardware that can offer more CPU sockets, more CPU cores, and more RAM.
Network connectivity and storage capacity are less critical.

When designing an OpenStack cloud architecture, you must
consider whether you intend to scale up or scale out. Selecting a
smaller number of larger hosts, or a larger number of smaller hosts,
depends on a combination of factors: cost, power, cooling, physical rack
and floor space, support-warranty, and manageability.

Cloud fundamentally changes the ways that networking is provided and consumed.
Understanding the following concepts and decisions is imperative when making
the right architectural decisions.

OpenStack clouds generally have multiple network segments, with each
segment providing access to particular resources. The network segments
themselves also require network communication paths that should be
separated from the other networks. When designing network services for a
general purpose cloud, plan for either a physical or logical separation
of network segments used by operators and tenants. Additional network
segments can also be created for access to internal services such as the
message bus and database used by various systems. Segregating these
services onto separate networks helps to protect sensitive data and
protect against unauthorized access.

Choose a networking service based on the requirements of your instances.
The architecture and design of your cloud will impact whether you choose
OpenStack Networking (neutron) or legacy networking (nova-network).

Networking (neutron)
~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) is a first-class networking service that gives
tenants full control over the creation of virtual network resources. This is
often accomplished in the form of tunneling protocols that establish
encapsulated communication paths over existing network infrastructure in order
to segment tenant traffic. This method varies depending on the specific
implementation, but some of the more common methods include tunneling over
GRE, encapsulating with VXLAN, and VLAN tags.

We recommend you design at least three network segments. The first segment
should be a public network, used by tenants and operators to access REST APIs.
The controller nodes and swift proxies are the only devices connecting to this
network segment. In some cases, this public network might also be serviced by
hardware load balancers and other network devices.

The second segment is used by administrators to manage hardware resources.
Configuration management tools also utilize this segment for deploying
software and services onto new hardware. In some cases, this network
segment is also used for internal services, including the message bus
and database services. The second segment needs to communicate with every
hardware node. Due to the highly sensitive nature of this network segment,
it needs to be secured from unauthorized access.

The third network segment is used by applications and consumers to access the
physical network, and for users to access applications. This network is
segregated from the one used to access the cloud APIs and is not capable
of communicating directly with the hardware resources in the cloud.
Communication on this network segment is required by compute resource
nodes and network gateway services that allow application data to access the
physical network from outside the cloud.

Legacy networking (nova-network)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The legacy networking (nova-network) service is primarily a layer-2 networking
service. It functions in two modes: flat networking mode and VLAN mode. In
flat network mode, all network hardware nodes and devices throughout the cloud
are connected to a single layer-2 network segment that provides access to
application data.

However, when the network devices in the cloud support segmentation using
VLANs, legacy networking can operate in the second mode. In this design model,
each tenant within the cloud is assigned a network subnet which is mapped to
a VLAN on the physical network. It is especially important to remember that
the maximum number of VLANs that can be used within a spanning tree domain
is 4096. This places a hard limit on the amount of growth possible within the
data center. Consequently, when designing a general purpose cloud intended to
support multiple tenants, we recommend using legacy networking in VLAN mode
rather than in flat network mode.
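
The 4096 figure follows from the 12-bit VLAN ID field in the 802.1Q
header, and two of those IDs are reserved by the standard. A quick check
of how far that goes under a one-VLAN-per-tenant model:

.. code-block:: python

   vlan_id_bits = 12
   total_ids = 2 ** vlan_id_bits   # 4096
   usable_vlans = total_ids - 2    # IDs 0 and 4095 are reserved
   print("Maximum tenants at one VLAN each: %d" % usable_vlans)  # 4094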

Layer-2 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A network designed on layer-2 protocols has advantages over a network designed
on layer-3 protocols. In spite of the difficulties of using a bridge to perform
the network role of a router, many vendors, customers, and service providers
choose to use Ethernet in as many parts of their networks as possible. The
benefits of selecting a layer-2 design are:

* Ethernet frames contain all the essentials for networking. These include,
  but are not limited to, globally unique source addresses, globally unique
  destination addresses, and error control.

* Ethernet frames can carry any kind of packet. Networking at layer-2 is
  independent of the layer-3 protocol.

* Adding more layers to the Ethernet frame only slows the networking process
  down. This is known as nodal processing delay.

* You can add adjunct networking features, for example class of service (CoS)
  or multicasting, to Ethernet as readily as to IP networks.

* VLANs are an easy mechanism for isolating networks.

Most information starts and ends inside Ethernet frames. Today this applies
to data, voice, and video. The concept is that the network will benefit more
from the advantages of Ethernet if the transfer of information from a source
to a destination is in the form of Ethernet frames.

Although it is not a substitute for IP networking, networking at layer-2 can
be a powerful adjunct to IP networking.

Layer-2 Ethernet usage has these additional advantages over layer-3 IP network
usage:

* Speed.
* Reduced overhead of the IP hierarchy.
* No need to keep track of address configuration as systems move around.
Whereas the simplicity of layer-2 protocols might work well in a data center
with hundreds of physical machines, cloud data centers have the additional
burden of needing to keep track of all virtual machine addresses and
networks. In these data centers, it is not uncommon for one physical node
to support 30-40 instances.

.. important::

   Networking at the frame level says nothing about the presence or
   absence of IP addresses at the packet level. Almost all ports, links, and
   devices on a network of LAN switches still have IP addresses, as do all the
   source and destination hosts. There are many reasons for the continued need
   for IP addressing. The largest one is the need to manage the network. A
   device or link without an IP address is usually invisible to most
   management applications. Utilities including remote access for diagnostics,
   file transfer of configurations and software, and similar applications
   cannot run without IP addresses as well as MAC addresses.

Layer-2 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Layer-2 network architectures have some limitations that become noticeable when
used outside of traditional data centers:

* The number of VLANs is limited to 4096.
* The number of MACs stored in switch tables is limited.
* You must accommodate the need to maintain a set of layer-4 devices to handle
  traffic control.
* MLAG, often used for switch redundancy, is a proprietary solution that does
  not scale beyond two devices and forces vendor lock-in.
* It can be difficult to troubleshoot a network without IP addresses and ICMP.
* Configuring ARP can be complicated on large layer-2 networks.
* All network devices need to be aware of all MACs, even instance MACs, so
  there is constant churn in MAC tables and network state changes as instances
  start and stop.
* Migrating MACs (instance migration) to different physical locations is a
  potential problem if you do not set ARP table timeouts properly.

It is important to know that layer-2 has a very limited set of network
management tools. It is difficult to control traffic, as layer-2 does not
have mechanisms to manage the network or shape the traffic. Network
troubleshooting is also difficult, in part because network devices have
no IP addresses. As a result, there is no reasonable way to check network
delay.

In a layer-2 network all devices are aware of all MACs, even those that belong
to instances. The network state information in the backbone changes whenever an
instance starts or stops. Because of this, there is far too much churn in the
MAC tables on the backbone switches.

Furthermore, on large layer-2 networks, configuring ARP learning can be
complicated. The setting for the MAC address timer on switches is critical
and, if set incorrectly, can cause significant performance problems. So when
migrating MACs to different physical locations to support instance migration,
problems may arise. As an example, the Cisco default MAC address timer is
extremely long. As such, the network information maintained in the switches
could be out of sync with the new location of the instance.

Layer-3 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In layer-3 networking, routing takes instance MAC and IP addresses out of the
network core, reducing state churn. The only time there would be a routing
state change is in the case of a Top of Rack (ToR) switch failure or a link
failure in the backbone itself. Other advantages of using a layer-3
architecture include:

* Layer-3 networks provide the same level of resiliency and scalability
  as the Internet.

* Controlling traffic with routing metrics is straightforward.

* You can configure layer-3 to use BGP confederation for scalability. This
  way, core routers have state proportional to the number of racks, not to the
  number of servers or instances.

* There are a variety of well tested tools, such as ICMP, to monitor and
  manage traffic.

* Layer-3 architectures enable the use of :term:`quality of service (QoS)` to
  manage network performance.

Layer-3 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main limitation of layer-3 networking is that there is no built-in
isolation mechanism comparable to the VLANs in layer-2 networks. Furthermore,
the hierarchical nature of IP addresses means that an instance is on the same
subnet as its physical host, making migration out of the subnet difficult. For
these reasons, network virtualization needs to use IP encapsulation and
software at the end hosts. This provides isolation and separates the
addressing in the virtual layer from the addressing in the physical layer.
Other potential disadvantages of layer-3 include the need to design an IP
addressing scheme rather than relying on the switches to keep track of the MAC
addresses automatically, and the need to configure the interior gateway
routing protocol in the switches.

Network design
~~~~~~~~~~~~~~

There are many reasons an OpenStack network has complex requirements. However,
one main factor is the many components that interact at different levels of the
system stack, adding complexity. Data flows are also complex. Data in an
OpenStack cloud moves both between instances across the network (also known as
East-West), as well as in and out of the system (also known as North-South).
Physical server nodes have network requirements that are independent of
instance network requirements, and must be isolated to account for
scalability. We recommend separating the networks for security purposes and
tuning performance through traffic shaping.

You must consider a number of important general technical and business factors
when planning and designing an OpenStack network. These include:

* A requirement for vendor independence. To avoid hardware or software vendor
  lock-in, the design should not rely on specific features of a vendor's
  router or switch.
* A requirement to massively scale the ecosystem to support millions of end
  users.
* A requirement to support indeterminate platforms and applications.
* A requirement to design for cost efficient operations to take advantage of
  massive scale.
* A requirement to ensure that there is no single point of failure in the
  cloud ecosystem.
* A requirement for high availability architecture to meet customer SLA
  requirements.
* A requirement to be tolerant of rack level failure.
* A requirement to maximize flexibility to architect future production
  environments.

Bearing in mind these considerations, we recommend the following:

* Layer-3 designs are preferable to layer-2 architectures.
* Design a dense multi-path network core to support multi-directional
  scaling and flexibility.
* Use hierarchical addressing because it is the only viable option to scale
  the network ecosystem.
* Use virtual networking to isolate instance service network traffic from the
  management and internal network traffic.
* Isolate virtual networks using encapsulation technologies.
* Use traffic shaping for performance tuning.
* Use eBGP to connect to the Internet up-link.
* Use iBGP to flatten the internal traffic on the layer-3 mesh.
* Determine the most effective configuration for the block storage network.

Additional considerations
-------------------------

There are several further considerations when designing a network-focused
OpenStack cloud, one of which is redundant networking and the high
availability risk analysis for ToR switches. In most cases, it is much more
economical to use a single switch with a small pool of spare switches to
replace failed units than it is to outfit an entire data center with
redundant switches. Applications should tolerate rack level outages without
affecting normal operations, since network and compute resources are easily
provisioned and plentiful.

Research indicates the mean time between failures (MTBF) on switches is
between 100,000 and 200,000 hours. This number is dependent on the ambient
temperature of the switch in the data center. When properly cooled and
maintained, this translates to between 11 and 22 years before failure. Even
in the worst case of poor ventilation and high ambient temperatures in the
data center, the MTBF is still 2-3 years.
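
A quick check of the quoted figures:

.. code-block:: python

   hours_per_year = 24 * 365   # 8760
   for mtbf_hours in (100_000, 200_000):
       print("MTBF %d h is about %.1f years"
             % (mtbf_hours, mtbf_hours / hours_per_year))
   # -> about 11.4 and 22.8 years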

See
https://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf
for further information.

.. list-table:: Networking service comparison
   :header-rows: 1

   * - Legacy networking (nova-network)
     - OpenStack Networking
   * - Simple, single agent
     - Complex, multiple agents
   * - Flat or VLAN
     - Flat, VLAN, Overlays, L2-L3, SDN
   * - No plug-in support
     - Plug-in support for 3rd parties
   * - No multi-tier topologies
     - Multi-tier topologies

Preparing for the future: IPv6 support
--------------------------------------

One of the most important networking topics today is the exhaustion of
IPv4 addresses. As of late 2015, ICANN announced that the final
IPv4 address blocks have been fully assigned. Because of this, the IPv6
protocol has become the future of network focused applications. IPv6
increases the address space significantly, fixes long standing issues
in the IPv4 protocol, and will become essential for network focused
applications in the future.

OpenStack Networking, when configured for it, supports IPv6. To enable
IPv6, create an IPv6 subnet in Networking and use IPv6 prefixes when
creating security groups.
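
As a minimal sketch, assuming the openstacksdk library, a cloud defined
in ``clouds.yaml``, and an existing network named ``tenant-net`` (the
cloud and network names are hypothetical), an IPv6 subnet might be
created like this:

.. code-block:: python

   import openstack

   conn = openstack.connect(cloud="mycloud")  # reads clouds.yaml

   network = conn.network.find_network("tenant-net")
   subnet = conn.network.create_subnet(
       network_id=network.id,
       ip_version=6,
       cidr="2001:db8:1234::/64",   # documentation prefix (RFC 3849)
       ipv6_address_mode="slaac",   # instances self-configure addresses
       ipv6_ra_mode="slaac",
   )
   print("Created IPv6 subnet %s" % subnet.id)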

Asymmetric links
----------------

When designing a network architecture, the traffic patterns of an
application heavily influence the allocation of total bandwidth and
the number of links that you use to send and receive traffic. Applications
that provide file storage for customers allocate bandwidth and links to
favor incoming traffic, whereas video streaming applications allocate
bandwidth and links to favor outgoing traffic.

Performance
-----------

It is important to analyze an application's tolerance for latency and
jitter when designing an environment to support network focused
applications. Certain applications, for example VoIP, are less tolerant
of latency and jitter. When latency and jitter are issues, certain
applications may require tuning of QoS parameters and network device
queues to ensure that they queue for transmit immediately or are
guaranteed minimum bandwidth. Since OpenStack currently does not support
these functions, consider carefully your selected network plug-in.

The location of a service may also impact the application or consumer
experience. If an application serves differing content to different users,
it must properly direct connections to those specific locations. Where
appropriate, use a multi-site installation for these situations.

You can implement networking in two separate ways. Legacy networking
(nova-network) provides a flat DHCP network with a single broadcast domain.
This implementation does not support tenant isolation networks or advanced
plug-ins, but it is currently the only way to implement a distributed
layer-3 (L3) agent using the multi-host configuration. OpenStack Networking
(neutron) is the official networking implementation and provides a pluggable
architecture that supports a large variety of network methods. Some of these
include a layer-2 only provider network model, external device plug-ins, or
even OpenFlow controllers.

Networking at large scales becomes a set of boundary questions. The
determination of how large a layer-2 domain must be is based on the
number of nodes within the domain and the amount of broadcast traffic
that passes between instances. Breaking layer-2 boundaries may require
the implementation of overlay networks and tunnels. This decision is a
balancing act between the need for smaller overhead and the need for a
smaller domain.

When selecting network devices, be aware that making a decision based on the
greatest port density often comes with a drawback. Aggregation switches and
routers have not all kept pace with Top of Rack switches and may induce
bottlenecks on north-south traffic. As a result, it may be possible for
massive amounts of downstream network utilization to impact upstream network
devices, impacting service to the cloud. Since OpenStack does not currently
provide a mechanism for traffic shaping or rate limiting, it is necessary to
implement these features at the network hardware level.

Tunable networking components
-----------------------------

When designing for network intensive workloads, consider tunable
networking components such as MTU and QoS. Some workloads require a
larger MTU than normal due to the transfer of large blocks of data.
When providing network service for applications such as video streaming
or storage replication, we recommend that you configure both OpenStack
hardware nodes and the supporting network equipment for jumbo frames
where possible. This allows for better use of available bandwidth.
Configure jumbo frames across the complete path the packets traverse. If
one network component is not capable of handling jumbo frames, then the
entire path reverts to the default MTU.
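
A minimal sketch of why end-to-end configuration matters: the effective
MTU of a path is the smallest MTU of any hop. The hop names and values
below are illustrative assumptions:

.. code-block:: python

   path_hops = {
       "compute-node-nic": 9000,    # jumbo frames enabled
       "tor-switch": 9000,
       "aggregation-switch": 1500,  # one device left at the default
       "storage-node-nic": 9000,
   }

   effective_mtu = min(path_hops.values())
   print("Effective path MTU: %d" % effective_mtu)  # -> 1500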

:term:`Quality of Service (QoS)` also has a great impact on network intensive
workloads because it provides expedited delivery for packets with a higher
priority, offsetting the impact of poor network performance. In applications
such as Voice over IP (VoIP), differentiated services code points are a near
requirement for proper operation. You can also use QoS in the opposite
direction for mixed workloads to prevent low priority but high bandwidth
applications, for example backup services, video conferencing, or file sharing,
from blocking bandwidth that is needed for the proper operation of other
workloads. It is possible to tag file storage traffic as a lower class, such as
best effort or scavenger, to allow the higher priority traffic through. In
cases where regions within a cloud might be geographically distributed, it may
also be necessary to plan accordingly to implement WAN optimization to combat
latency or packet loss.
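
As a hedged sketch of DSCP marking from an application on a Linux host:
Expedited Forwarding (EF, DSCP 46) is the class conventionally used for
VoIP, and the TOS byte carries the DSCP value in its top six bits. The
destination address and port below are illustrative:

.. code-block:: python

   import socket

   EF_TOS = 46 << 2  # DSCP 46 (EF) shifted into the TOS byte: 0xB8

   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
   sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_TOS)
   sock.sendto(b"voice payload", ("198.51.100.10", 5004))
   sock.close()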

Network hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

The network architecture determines which network hardware will be
used. Networking software is determined by the selected networking
hardware.

There are more subtle design impacts that need to be considered. The
selection of certain networking hardware (and the networking software)
affects the management tools that can be used. There are exceptions to
this; the rise of *open* networking software that supports a range of
networking hardware means there are instances where the relationship
between networking hardware and networking software is not as tightly
defined.

For a compute-focused architecture, we recommend designing the network
architecture using a scalable network model that makes it easy to add
capacity and bandwidth. A good example of such a model is the leaf-spine
model. In this type of network design, you can add additional
bandwidth as well as scale out to additional racks of gear. It is important to
select network hardware that supports port count, port speed, and
port density while allowing for future growth as workload demands
increase. In the network architecture, it is also important to evaluate
where to provide redundancy.
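
A hedged arithmetic sketch for such a design: the oversubscription ratio
of a leaf switch is its total downlink bandwidth divided by its total
uplink bandwidth. The port counts and speeds below are illustrative
assumptions:

.. code-block:: python

   downlinks = 48       # server-facing ports per leaf switch
   downlink_gbps = 10
   uplinks = 6          # spine-facing ports per leaf switch
   uplink_gbps = 40

   ratio = (downlinks * downlink_gbps) / (uplinks * uplink_gbps)
   print("Leaf oversubscription ratio: %.1f:1" % ratio)  # -> 2.0:1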

Some of the key considerations in the selection of networking hardware
include:

Port count
   The design will require networking hardware that has the requisite
   port count.

Port density
   The network design will be affected by the physical space that is
   required to provide the requisite port count. A higher port density
   is preferred, as it leaves more rack space for compute or storage
   components. This can also lead into considerations about fault domains
   and power density. Higher density switches are more expensive,
   therefore it is important not to over design the network.

Port speed
   The networking hardware must support the proposed network speed, for
   example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE).

Redundancy
   User requirements for high availability and cost considerations
   influence the level of network hardware redundancy.
   Network redundancy can be achieved by adding redundant power
   supplies or paired switches.

   .. note::

      Hardware must support network redundancy.

Power requirements
   Ensure that the physical data center provides the necessary power
   for the selected network hardware.

   .. note::

      This is not an issue for top of rack (ToR) switches. This may be an
      issue for spine switches in a leaf and spine fabric, or end of row
      (EoR) switches.

Protocol support
   It is possible to gain more performance out of a single storage
   system by using specialized network technologies such as RDMA, SRP,
   iSER and SCST. The specifics of using these technologies are beyond
   the scope of this book.

There is no single best practice architecture for the networking
hardware supporting an OpenStack cloud. Some of the key factors that will
have a major influence on selection of networking hardware include:

Connectivity
   All nodes within an OpenStack cloud require network connectivity. In
   some cases, nodes require access to more than one network segment.
   The design must encompass sufficient network capacity and bandwidth
   to ensure that all communications within the cloud, both north-south
   and east-west traffic, have sufficient resources available.

Scalability
   The network design should encompass a physical and logical network
   design that can be easily expanded upon. Network hardware should
   offer the appropriate types of interfaces and speeds that are
   required by the hardware nodes.

Availability
   To ensure access to nodes within the cloud is not interrupted,
   we recommend that the network architecture identify any single
   points of failure and provide some level of redundancy or fault
   tolerance. The network infrastructure often involves use of
   networking protocols such as LACP, VRRP or others to achieve a highly
   available network connection. It is also important to consider the
   networking implications on API availability. We recommend designing a
   load balancing solution within the network architecture to ensure that
   the APIs, and potentially other services in the cloud, are highly
   available.

Networking software selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) provides a wide variety of networking
services for instances. There are many additional networking software
packages that can be useful when managing OpenStack components. Some
examples include:

* Software to provide load balancing

* Network redundancy protocols

* Routing daemons

Some of these software packages are described in more detail in the
`OpenStack network nodes chapter
<http://docs.openstack.org/ha-guide/networking-ha.html>`_ in the
OpenStack High Availability Guide.

For a general purpose OpenStack cloud, the OpenStack infrastructure
components need to be highly available. If the design does not include
hardware load balancing, networking software packages like HAProxy will
need to be included.

For a compute-focused OpenStack cloud, the OpenStack infrastructure
components must be highly available. If the design does not include
hardware load balancing, you must add networking software packages, for
example, HAProxy.
||||||
|

In a compute cloud, storage typically serves two purposes:

* To provide users with a persistent storage mechanism

* As a scalable, reliable data store for virtual machine images

Selecting storage hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~

The storage hardware architecture is determined by the selected storage
architecture. Choose the storage architecture by evaluating possible
solutions against the critical factors: user requirements, technical
considerations, and operational considerations. Consider the following
factors when selecting storage hardware:

Cost
    Storage can be a significant portion of the overall system cost. For
    an organization that is concerned with vendor support, a commercial
    storage solution is advisable, although it comes with a higher price
    tag. If minimizing initial capital expenditure is the priority, a
    design based on commodity hardware would apply. The trade-off is
    potentially higher support costs and a greater risk of
    incompatibility and interoperability issues.

Performance
    The latency of storage I/O requests indicates performance. Performance
    requirements affect which solution you choose.

Scalability
    Scalability, along with expandability, is a major consideration in a
    general purpose OpenStack cloud. It might be difficult to predict
    the final intended size of the implementation as there are no
    established usage patterns for a general purpose cloud. It might
    become necessary to expand the initial deployment in order to
    accommodate growth and user demand.

Expandability
    Expandability is a major architecture factor for storage solutions
    in a general purpose OpenStack cloud. A storage solution that
    expands to 50 PB is considered more expandable than a solution that
    only scales to 10 PB. This meter is related to scalability, which is
    the measure of a solution's performance as it expands.

General purpose cloud storage requirements
------------------------------------------

Using a scale-out storage solution with direct-attached storage (DAS) in
the servers is well suited for a general purpose OpenStack cloud. Cloud
services requirements determine your choice of scale-out solution. You
need to determine whether a single, highly expandable and highly
vertically scalable, centralized storage array is suitable for your
design. After determining an approach, select the storage hardware based
on these criteria.

This list expands upon the potential impacts of including a particular
storage architecture (and corresponding storage hardware) in the design
for a general purpose OpenStack cloud:

Connectivity
    If storage protocols other than Ethernet are part of the storage
    solution, ensure the appropriate hardware has been selected. If a
    centralized storage array is selected, ensure that the hypervisor
    will be able to connect to that storage array for image storage.

Usage
    How the particular storage architecture will be used is critical for
    determining the architecture. Some of the configurations that will
    influence the architecture include whether it will be used by the
    hypervisors for ephemeral instance storage, or if OpenStack Object
    Storage will use it for object storage.

Instance and image locations
    Where instances and images will be stored will influence the
    architecture.

Server hardware
    If the solution is a scale-out storage architecture that includes
    DAS, it will affect the server hardware selection. This could ripple
    into the decisions that affect host density, instance density, power
    density, OS-hypervisor, management tools, and others.

A general purpose OpenStack cloud has multiple options. The key factors
that will have a major influence on the selection of storage hardware
for a general purpose OpenStack cloud are as follows:

Capacity
    Hardware resources selected for the resource nodes should be capable
    of supporting enough storage for the cloud services. Defining the
    initial requirements and ensuring the design can support adding
    capacity is important. Hardware nodes selected for object storage
    should be capable of supporting a large number of inexpensive disks
    with no reliance on RAID controller cards. Hardware nodes selected
    for block storage should be capable of supporting high speed storage
    solutions and RAID controller cards to provide performance and
    redundancy to storage at a hardware level. Selecting hardware RAID
    controllers that automatically repair damaged arrays assists with
    the replacement and repair of degraded or failed storage devices.
    A sketch comparing usable capacity under these two approaches
    follows.

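To make the capacity trade-off concrete, the following sketch compares
the usable capacity of an object storage pool that relies on three-way
replication with a block storage pool built on RAID 6. The node counts,
disk sizes, and replica count are illustrative assumptions, not
recommendations.

.. code-block:: python

    # Rough usable-capacity comparison; all inputs are illustrative.
    def object_store_usable(nodes, disks_per_node, disk_tb, replicas=3):
        """Object storage: whole-disk JBOD, raw capacity divided by replicas."""
        raw = nodes * disks_per_node * disk_tb
        return raw / replicas

    def raid6_usable(nodes, disks_per_node, disk_tb):
        """Block storage: one RAID 6 set per node loses two disks to parity."""
        usable_disks = max(disks_per_node - 2, 0)
        return nodes * usable_disks * disk_tb

    # Hypothetical cluster: 10 nodes, each with 12 x 8 TB disks (960 TB raw).
    print(f"object storage usable: {object_store_usable(10, 12, 8):.0f} TB")
    print(f"block storage usable:  {raid6_usable(10, 12, 8):.0f} TB")
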
Performance
    Disks selected for object storage services do not need to be fast
    performing disks. We recommend that object storage nodes take
    advantage of the best cost per terabyte available for storage.
    In contrast, disks chosen for block storage services should take
    advantage of performance boosting features that may entail the use
    of SSDs or flash storage to provide high performance block storage
    pools. Storage performance of ephemeral disks used for instances
    should also be taken into consideration.

Fault tolerance
    Object storage resource nodes have no requirement for hardware
    fault tolerance or RAID controllers. It is not necessary to plan for
    fault tolerance within the object storage hardware because the
    object storage service provides replication between zones as a
    feature of the service. Block storage nodes, compute nodes, and
    cloud controllers should all have fault tolerance built in at the
    hardware level by making use of hardware RAID controllers and
    varying levels of RAID configuration. The level of RAID chosen
    should be consistent with the performance and availability
    requirements of the cloud.

Storage-focused cloud storage requirements
------------------------------------------

Storage-focused OpenStack clouds must address I/O intensive workloads.
These workloads are not CPU intensive, nor are they consistently network
intensive. The network may be heavily utilized to transfer storage data,
but these workloads are not otherwise network intensive.

The selection of storage hardware determines the overall performance and
scalability of a storage-focused OpenStack design architecture. Several
factors impact the design process, including:

Latency is a key consideration in a storage-focused OpenStack cloud.
Using solid-state disks (SSDs) to minimize latency and reduce CPU delays
caused by waiting for storage increases performance. Use RAID controller
cards in compute hosts to improve the performance of the underlying disk
subsystem.

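As a rough way to compare the latency of candidate disk subsystems, the
short sketch below times small random reads against a scratch file on a
Unix-like host. The file path and sizes are assumptions, and because
reads may be served from the operating system's page cache, treat the
result as an OS-visible latency estimate rather than a raw device
measurement.

.. code-block:: python

    import os
    import random
    import time

    # Assumed scratch location; point this at the storage pool under test.
    PATH = "/tmp/latency-probe.dat"
    BLOCK = 4096          # read size in bytes
    FILE_BLOCKS = 4096    # scratch file size: 16 MiB
    SAMPLES = 256

    # Create a scratch file to sample from.
    with open(PATH, "wb") as f:
        f.write(os.urandom(BLOCK * FILE_BLOCKS))

    fd = os.open(PATH, os.O_RDONLY)
    timings = []
    for _ in range(SAMPLES):
        offset = random.randrange(FILE_BLOCKS) * BLOCK
        start = time.perf_counter()
        os.pread(fd, BLOCK, offset)  # read one block at a random offset
        timings.append(time.perf_counter() - start)
    os.close(fd)
    os.unlink(PATH)

    mean_us = sum(timings) / len(timings) * 1e6
    print(f"mean read latency: {mean_us:.1f} microseconds")
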
Depending on the storage architecture, you can adopt a scale-out
solution, or use a highly expandable and scalable centralized storage
array. If a centralized storage array meets your requirements, then the
array vendor determines the hardware selection. It is possible to build
a storage array using commodity hardware with open source software, but
this requires in-house expertise to build and operate such a system.

On the other hand, a scale-out storage solution that uses
direct-attached storage (DAS) in the servers may be an appropriate
choice. This requires configuration of the server hardware to support
the storage solution.

Considerations affecting the storage architecture (and corresponding
storage hardware) of a storage-focused OpenStack cloud include:

Connectivity
    Ensure the connectivity matches the storage solution requirements.
    We recommend confirming that the network characteristics minimize
    latency to boost the overall performance of the design.

Latency
    Determine if the use case has consistent or highly variable latency.

Throughput
    Ensure that the storage solution throughput is optimized for your
    application requirements.

Server hardware
    Use of DAS impacts the server hardware choice and affects host
    density, instance density, power density, OS-hypervisor, and
    management tools.

=====================
Operator requirements
=====================

This section describes operational factors affecting the design of an
OpenStack cloud.

Network design
~~~~~~~~~~~~~~

The network design for an OpenStack cluster includes decisions regarding
the interconnect needs within the cluster, the need to allow clients to
access their resources, and the access requirements for operators to
administer the cluster. You should consider the bandwidth, latency,
and reliability of these networks.

Consider additional design decisions about monitoring and alarming.
If you are using an external provider, service level agreements (SLAs)
are typically defined in your contract. Operational considerations such
as bandwidth, latency, and jitter can be part of the SLA.

As demand for network resources increases, make sure your network design
accommodates expansion and upgrades. Operators add additional IP address
blocks and additional bandwidth capacity as needed. In addition, consider
managing hardware and software lifecycle events, for example upgrades,
decommissioning, and outages, while avoiding service interruptions for
tenants. A sketch of planning address blocks for growth follows.

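As a minimal illustration of planning address blocks for growth, the
following sketch uses Python's standard ``ipaddress`` module to carve
per-rack subnets out of a supernet and report how many blocks remain for
expansion. The supernet, prefix sizes, and rack count are assumptions
for the example only.

.. code-block:: python

    import ipaddress

    # Assumed allocation: one RFC 1918 supernet for the cloud's
    # management network, carved into /24 blocks, one per rack.
    SUPERNET = ipaddress.ip_network("10.32.0.0/16")
    RACKS_IN_SERVICE = 12

    per_rack = list(SUPERNET.subnets(new_prefix=24))  # 256 x /24 blocks
    assigned = per_rack[:RACKS_IN_SERVICE]

    for rack, block in enumerate(assigned, start=1):
        print(f"rack {rack:2d}: {block} ({block.num_addresses} addresses)")
    print(f"{len(per_rack) - RACKS_IN_SERVICE} /24 blocks remain for expansion")
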
Factor maintainability into the overall network design. This includes
the ability to manage and maintain IP addresses as well as the use of
overlay identifiers such as VLAN tag IDs, GRE tunnel IDs, and MPLS
labels. As an example, if you need to change all of the IP addresses
on a network, a process known as renumbering, the design must
support this function.

Address network-focused applications when considering certain
operational realities. For example, consider the impending exhaustion of
IPv4 addresses, the migration to IPv6, and the use of private networks
to segregate different types of traffic that an application receives or
generates. In the case of IPv4 to IPv6 migrations, applications should
follow best practices for storing IP addresses. We recommend you avoid
relying on IPv4 features that did not carry over to the IPv6 protocol or
that differ in implementation.

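One such best practice is to parse and store addresses in a
version-agnostic way instead of assuming the dotted-quad IPv4 format.
The sketch below uses Python's standard ``ipaddress`` module so that the
same code path handles IPv4 and IPv6; the sample values are
illustrative.

.. code-block:: python

    import ipaddress

    def normalize(address: str) -> str:
        """Parse an IPv4 or IPv6 address and return its canonical form."""
        ip = ipaddress.ip_address(address)  # raises ValueError if invalid
        return ip.compressed                # canonical text representation

    # The same code path accepts both protocol versions.
    for sample in ("192.0.2.10", "2001:DB8::0:1"):
        ip = ipaddress.ip_address(sample)
        print(f"{normalize(sample)}  version={ip.version}  "
              f"private={ip.is_private}")
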
To segregate traffic, allow applications to create a private tenant
network for database and storage network traffic. Use a public network
for services that require direct client access from the Internet. Upon
segregating the traffic, consider :term:`quality of service (QoS)` and
security to ensure each network has the required level of service.

Also consider the routing of network traffic. For some applications,
develop a complex policy framework for routing. To create a routing
policy that satisfies business requirements, consider the economic cost
of transmitting traffic over expensive links versus cheaper links, in
addition to bandwidth, latency, and jitter requirements.

Finally, consider how to respond to network events. How load
transfers from one link to another during a failure scenario could be
a factor in the design. If you do not plan network capacity
correctly, failover traffic could overwhelm other ports or network
links and create a cascading failure scenario. In this case,
traffic that fails over to one link overwhelms that link and then
moves to the subsequent links until all network traffic stops.

SLA considerations
~~~~~~~~~~~~~~~~~~

For more information about managing and maintaining your OpenStack
environment, see the
`Operations chapter <http://docs.openstack.org/ops-guide/operations.html>`_
in the OpenStack Operations Guide.

Logging and monitoring
----------------------

OpenStack clouds require appropriate monitoring platforms to identify
and manage errors.

.. note::

    We recommend leveraging existing monitoring systems to see if they
    are able to effectively monitor an OpenStack environment.

Specific meters that are critically important to capture include:

* Image disk utilization

* Response time to the Compute API (see the sketch after this list)

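A monitoring platform can capture the API response-time meter with a
simple active probe. The sketch below times a GET request against a
Compute API endpoint; the URL is a placeholder for your deployment's
endpoint, and a production probe would authenticate and raise alerts on
thresholds rather than print.

.. code-block:: python

    import time
    import urllib.error
    import urllib.request

    # Placeholder endpoint; substitute your deployment's Compute API URL.
    COMPUTE_API = "http://controller.example.com:8774/"

    def probe(url: str, timeout: float = 5.0) -> float:
        """Return the response time in seconds for a GET against the API."""
        start = time.perf_counter()
        try:
            urllib.request.urlopen(url, timeout=timeout)
        except urllib.error.HTTPError:
            # An HTTP error such as 401 still proves the API answered.
            pass
        return time.perf_counter() - start

    print(f"Compute API responded in {probe(COMPUTE_API) * 1000:.1f} ms")
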
Logging and monitoring does not significantly differ for a multi-site
OpenStack cloud. The tools described in the `Logging and monitoring
chapter <http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`__
of the Operations Guide remain applicable. Logging and monitoring can be
provided on a per-site basis, and in a common centralized location.

When attempting to deploy logging and monitoring facilities to a
centralized location, care must be taken with the load placed on the
inter-site networking links.

Management software
-------------------

Management software that provides clustering, logging, monitoring, and
alerting for a cloud environment is often included in the design. Its
inclusion affects the overall OpenStack cloud design, which must account
for the additional resource consumption, such as CPU, RAM, storage, and
network bandwidth.

The inclusion of clustering software, such as Corosync or Pacemaker, is
primarily determined by the availability requirements of the cloud
infrastructure and the complexity of supporting the configuration after
it is deployed. The
`OpenStack High Availability Guide <http://docs.openstack.org/ha-guide/>`_
provides more details on the installation and configuration of Corosync
and Pacemaker, should these packages need to be included in the design.

Some other potential design impacts include:

* OS-hypervisor combination

  Ensure that the selected logging, monitoring, or alerting tools support
  the proposed OS-hypervisor combination.

* Network hardware

  The network hardware selection needs to be supported by the logging,
  monitoring, and alerting software.

Database software
-----------------

Most OpenStack components require access to back-end database services
to store state and configuration information. Choose an appropriate
back-end database that satisfies the availability and fault tolerance
requirements of the OpenStack services.

MySQL is the default database for OpenStack, but other compatible
databases are available.

.. note::

    Telemetry uses MongoDB.

The chosen high availability database solution changes according to the
selected database. MySQL, for example, provides several options. Use a
replication technology such as Galera for active-active clustering. For
active-passive clustering, use some form of shared storage. Each of
these potential solutions has an impact on the design:

* Solutions that employ Galera/MariaDB require at least three MySQL
  nodes.

* MongoDB has its own design considerations for high availability.

* OpenStack design, generally, does not include shared storage.
  However, for some high availability designs, certain components might
  require it depending on the specific implementation.

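Because an active-active Galera cluster exposes several equally valid
endpoints, clients, or the load balancer fronting them, need a way to
pick a live node. The following sketch is a simplification rather than a
substitute for a proper load balancer: it checks candidate nodes' MySQL
ports using only Python's standard library and returns the first
reachable one. The host names and port are assumptions.

.. code-block:: python

    import socket

    # Hypothetical Galera cluster members; a real deployment would usually
    # front these with HAProxy or a similar load balancer instead.
    DB_NODES = ["db1.example.com", "db2.example.com", "db3.example.com"]
    MYSQL_PORT = 3306

    def first_reachable(nodes, port, timeout=2.0):
        """Return the first node accepting TCP connections on the port."""
        for node in nodes:
            try:
                with socket.create_connection((node, port), timeout=timeout):
                    return node
            except OSError:
                continue  # node down or unreachable; try the next one
        raise RuntimeError("no database node reachable")

    print(f"connecting to {first_reachable(DB_NODES, MYSQL_PORT)}")
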
Operator access to systems
~~~~~~~~~~~~~~~~~~~~~~~~~~

==================
Software licensing
==================

The many different forms of license agreements for software are often
written with the use of dedicated hardware in mind. This model is
relevant for the cloud platform itself, including the hypervisor
operating system, supporting software for items such as database, RPC,
backup, and so on. Consideration must be made when offering Compute
service instances and applications to end users of the cloud, since the
license terms for that software may need some adjustment to be able to
operate economically in the cloud.

Multi-site OpenStack deployments present additional licensing
considerations over and above regular OpenStack clouds, particularly
where site licenses are in use to provide cost efficient access to
software licenses. The licensing for host operating systems, guest
operating systems, OpenStack distributions (if applicable),
software-defined infrastructure including network controllers and
storage systems, and even individual applications needs to be evaluated.

Topics to consider include:

* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.

* Differentiations between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made in situations where one site is
  a cold standby for disaster recovery purposes only.

* Certain locations might require local vendors to provide support and
  services for each site, which may vary with the licensing agreement in
  place.


   overview-planning
   overview-customer-requirements
   overview-legal-requirements
   overview-software-licensing
   overview-security-requirements
   overview-operator-requirements