Remove passive voice from chap 5, arch guide
Change-Id: Iaaa9d2a052c9f81cefebd18bdc866d96d4fed64e
Closes-Bug: #1427935
parent
dbf44fc980
commit
31771bef7b
@@ -5,167 +5,141 @@
|
||||
version="5.0"
|
||||
xml:id="network_focus">
|
||||
<title>Network focused</title>
|
||||
|
||||
<para>All OpenStack deployments are dependent, to some extent, on
|
||||
network communication in order to function properly due to a
|
||||
service-based nature. In some cases, however, use cases
|
||||
dictate that the network is elevated beyond simple
|
||||
infrastructure. This chapter is a discussion of architectures
|
||||
that are more reliant or focused on network services. These
|
||||
architectures are heavily dependent on the network
|
||||
infrastructure and need to be architected so that the network
|
||||
services perform and are reliable in order to satisfy user and
|
||||
application requirements.</para>
|
||||
<para>All OpenStack deployments depend on network communication in order
|
||||
to function properly due to their service-based nature. In some cases,
|
||||
however, the network's role extends beyond simple
|
||||
infrastructure. This chapter discusses architectures that are more
|
||||
reliant or focused on network services. These architectures depend
|
||||
on the network infrastructure and require
|
||||
network services that perform reliably in order to satisfy user and
|
||||
application requirements.</para>
|
||||
<para>Some possible use cases include:</para>
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>Content delivery network</term>
|
||||
<listitem>
|
||||
<para>This could include
|
||||
streaming video, photographs or any other cloud based
|
||||
repository of data that is distributed to a large
|
||||
number of end users. Mass market streaming video will
|
||||
be very heavily affected by the network configurations
|
||||
that would affect latency, bandwidth, and the
|
||||
distribution of instances. Not all video streaming is
|
||||
consumer focused. For example, multicast videos (used
|
||||
for media, press conferences, corporate presentations,
|
||||
web conferencing services, and so on) can also utilize a
|
||||
content delivery network. Content delivery will be
|
||||
affected by the location of the video repository and
|
||||
its relationship to end users. Performance is also
|
||||
affected by network throughput of the back-end systems,
|
||||
as well as the WAN architecture and the cache
|
||||
methodology.</para>
|
||||
<para>This includes streaming video, viewing photographs, or
|
||||
accessing any other cloud-based data repository distributed to
|
||||
a large number of end users. Network configuration affects
|
||||
latency, bandwidth, and the distribution of instances. Therefore,
|
||||
it impacts video streaming. Not all video streaming is
|
||||
consumer-focused. For example, multicast videos (used for media,
|
||||
press conferences, corporate presentations, and web conferencing
|
||||
services) can also use a content delivery network.
|
||||
The location of the video repository and its relationship to end
|
||||
users affect content delivery. Network throughput of the back-end
|
||||
systems, as well as the WAN architecture and the cache methodology,
|
||||
also affects performance.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Network management functions</term>
|
||||
<listitem>
|
||||
<para>A cloud that provides
|
||||
network service functions would be built to support
|
||||
the delivery of back-end network services such as DNS,
|
||||
NTP or SNMP and would be used by a company for
|
||||
internal network management.</para>
|
||||
<para>Use this cloud to provide network service functions built to
|
||||
support the delivery of back-end network services such as DNS,
|
||||
NTP, or SNMP. A company can use these services for internal
|
||||
network management.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Network service offerings</term>
|
||||
<listitem>
|
||||
<para>A cloud can be used to
|
||||
run customer facing network tools to support services.
|
||||
For example, VPNs, MPLS private networks, GRE tunnels
|
||||
and others.</para>
|
||||
<para>Use this cloud to run customer-facing network tools to
|
||||
support services. Examples include VPNs, MPLS private networks,
|
||||
and GRE tunnels.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Web portals or web services</term>
|
||||
<listitem>
|
||||
<para>Web servers are a common
|
||||
application for cloud services and we recommend
|
||||
an understanding of the network requirements.
|
||||
The network will need to be able to scale out to meet
|
||||
user demand and deliver webpages with a minimum of
|
||||
latency. Internal east-west and north-south network
|
||||
bandwidth must be considered depending on the details
|
||||
of the portal architecture.</para>
|
||||
<para>Web servers are a common application for cloud services,
|
||||
and we recommend an understanding of their network requirements.
|
||||
The network requires scaling out to meet user demand and deliver
|
||||
web pages with minimal latency. Depending on the details of
|
||||
the portal architecture, consider the internal east-west and
|
||||
north-south network bandwidth.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>High speed and high volume transactional systems</term>
|
||||
<listitem>
|
||||
<para>
|
||||
These types of applications are very sensitive to
|
||||
network configurations. Examples include many
|
||||
financial systems, credit card transaction
|
||||
applications, trading and other extremely high volume
|
||||
systems. These systems are sensitive to network jitter
|
||||
and latency. They also have a high volume of both
|
||||
east-west and north-south network traffic that needs
|
||||
to be balanced to maximize efficiency of the data
|
||||
delivery. Many of these systems have large high
|
||||
performance database back ends that need to be
|
||||
accessed.</para>
|
||||
These types of applications are sensitive to network
|
||||
configurations. Examples include financial systems,
|
||||
credit card transaction applications, and trading and other
|
||||
extremely high volume systems. These systems are sensitive
|
||||
to network jitter and latency. They must balance a high volume
|
||||
of east-west and north-south network traffic to
|
||||
maximize efficiency of the data delivery.
|
||||
Many of these systems must access large, high-performance
|
||||
database back ends.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>High availability</term>
|
||||
<listitem>
|
||||
<para>These types of use cases are
|
||||
highly dependent on the proper sizing of the network
|
||||
to maintain replication of data between sites for high
|
||||
availability. If one site becomes unavailable, the
|
||||
extra sites will be able to serve the displaced load
|
||||
until the original site returns to service. It is
|
||||
important to size network capacity to handle the loads
|
||||
that are desired.</para>
|
||||
<para>These types of use cases are dependent on the proper sizing
|
||||
of the network to maintain replication of data between sites for
|
||||
high availability. If one site becomes unavailable, the extra
|
||||
sites can serve the displaced load until the original site
|
||||
returns to service. It is important to size network capacity
|
||||
to handle the desired loads.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Big data</term>
|
||||
<listitem>
|
||||
<para>Clouds that will be used for the
|
||||
management and collection of big data (data ingest)
|
||||
will have a significant demand on network resources.
|
||||
Big data often uses partial replicas of the data to
|
||||
maintain data integrity over large distributed clouds.
|
||||
Other big data applications that require a large
|
||||
amount of network resources are Hadoop, Cassandra,
|
||||
NuoDB, RIAK and other No-SQL and distributed
|
||||
databases.</para>
|
||||
<para>Clouds used for the management and collection of big data
|
||||
(data ingest) have a significant demand on network resources.
|
||||
Big data often uses partial replicas of the data to maintain
|
||||
integrity over large distributed clouds. Other big data
|
||||
applications that require a large amount of network resources
|
||||
are Hadoop, Cassandra, NuoDB, Riak, and other NoSQL and
|
||||
distributed databases.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Virtual desktop infrastructure (VDI)</term>
|
||||
<listitem>
|
||||
<para>This use case
|
||||
is very sensitive to network congestion, latency,
|
||||
jitter and other network characteristics. Like video
|
||||
streaming, the user experience is very important
|
||||
however, unlike video streaming, caching is not an
|
||||
option to offset the network issues. VDI requires both
|
||||
upstream and downstream traffic and cannot rely on
|
||||
caching for the delivery of the application to the end
|
||||
user.</para>
|
||||
<para>This use case is sensitive to network congestion, latency,
|
||||
jitter, and other network characteristics. Like video streaming,
|
||||
the user experience is important. However, unlike video
|
||||
streaming, caching is not an option to offset the network issues.
|
||||
VDI requires both upstream and downstream traffic and cannot rely
|
||||
on caching for the delivery of the application to the end user.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Voice over IP (VoIP)</term>
|
||||
<listitem>
|
||||
<para>This is extremely sensitive to
|
||||
network congestion, latency, jitter and other network
|
||||
characteristics. VoIP has a symmetrical traffic
|
||||
pattern and it requires network quality of service
|
||||
(QoS) for best performance. It may also require an
|
||||
active queue management implementation to ensure
|
||||
delivery. Users are very sensitive to latency and
|
||||
jitter fluctuations and can detect them at very low
|
||||
levels.</para>
|
||||
<para>This is sensitive to network congestion, latency, jitter,
|
||||
and other network characteristics. VoIP has a symmetrical traffic
|
||||
pattern and it requires network quality of service (QoS) for best
|
||||
performance. In addition, you can implement active queue management
|
||||
to deliver voice and multimedia content. Users are sensitive to
|
||||
latency and jitter fluctuations and can detect them at very low
|
||||
levels.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Video conference or web conference</term>
|
||||
<listitem>
|
||||
<para>This also is
|
||||
extremely sensitive to network congestion, latency,
|
||||
jitter and other network flaws. Video Conferencing has
|
||||
a symmetrical traffic pattern, but unless the network
|
||||
is on an MPLS private network, it cannot use network
|
||||
quality of service (QoS) to improve performance.
|
||||
Similar to VOIP, users will be sensitive to network
|
||||
performance issues even at low levels.</para>
|
||||
<para>This is sensitive to network congestion, latency, jitter,
|
||||
and other network characteristics. Video conferencing has a
|
||||
symmetrical traffic pattern, but unless the network is on an
|
||||
MPLS private network, it cannot use network quality of service
|
||||
(QoS) to improve performance. Similar to VoIP, users are
|
||||
sensitive to network performance issues even at low levels.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>High performance computing (HPC)</term>
|
||||
<listitem>
|
||||
<para>This is a complex
|
||||
use case that requires careful consideration of the
|
||||
traffic flows and usage patterns to address the needs
|
||||
of cloud clusters. It has high East-West traffic
|
||||
patterns for distributed computing, but there can be
|
||||
substantial North-South traffic depending on the
|
||||
specific application.</para>
|
||||
<para>This is a complex use case that requires careful
|
||||
consideration of the traffic flows and usage patterns to address
|
||||
the needs of cloud clusters. It has high east-west traffic
|
||||
patterns for distributed computing, but there can be substantial
|
||||
north-south traffic depending on the specific application.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
|
@@ -5,222 +5,190 @@
|
||||
version="5.0"
|
||||
xml:id="architecture-network-focus">
|
||||
<title>Architecture</title>
|
||||
<para>Network focused OpenStack architectures have many
|
||||
similarities to other OpenStack architecture use cases. There
|
||||
are a number of very specific considerations to keep in mind when
|
||||
designing for a network-centric or network-heavy application
|
||||
environment.</para>
|
||||
<para>Networks exist to serve as a medium of transporting data
|
||||
between systems. It is inevitable that an OpenStack design
|
||||
has inter-dependencies with non-network portions of OpenStack
|
||||
as well as on external systems. Depending on the specific
|
||||
workload, there may be major interactions with storage systems
|
||||
both within and external to the OpenStack environment. For
|
||||
example, if the workload is a content delivery network, then
|
||||
the interactions with storage will be two-fold. There will be
|
||||
traffic flowing to and from the storage array for ingesting
|
||||
and serving content in a north-south direction. In addition,
|
||||
there is replication traffic flowing in an east-west
|
||||
direction.</para>
|
||||
<para>Compute-heavy workloads may also induce interactions with
|
||||
the network. Some high performance compute applications
|
||||
require network-based memory mapping and data sharing and, as
|
||||
a result, will induce a higher network load when they transfer
|
||||
results and data sets. Others may be highly transactional and
|
||||
issue transaction locks, perform their functions and rescind
|
||||
transaction locks at very high rates. This also has an impact
|
||||
on the network performance.</para>
|
||||
<para>Some network dependencies are going to be external to
|
||||
OpenStack. While OpenStack Networking is capable of providing network
|
||||
ports, IP addresses, some level of routing, and overlay
|
||||
networks, there are some other functions that it cannot
|
||||
provide. For many of these, external systems or equipment may
|
||||
be required to fill in the functional gaps. Hardware load
|
||||
balancers are an example of equipment that may be necessary to
|
||||
distribute workloads or offload certain functions. Note that,
|
||||
as of the Kilo release, dynamic routing is currently in
|
||||
its infancy within OpenStack and may need to be implemented
|
||||
either by an external device or a specialized service instance
|
||||
within OpenStack. Tunneling is a feature provided by OpenStack Networking,
|
||||
however it is constrained to a Networking-managed region. If the
|
||||
need arises to extend a tunnel beyond the OpenStack region to
|
||||
either another region or an external system, it is necessary
|
||||
to implement the tunnel itself outside OpenStack or by using a
|
||||
tunnel management system to map the tunnel or overlay to an
|
||||
external tunnel. OpenStack does not currently provide quotas
|
||||
for network resources. Where network quotas are required, it
|
||||
is necessary to implement quality of service management
|
||||
outside of OpenStack. In many of these instances, similar
|
||||
solutions for traffic shaping or other network functions will
|
||||
be needed.
|
||||
<para>Network-focused OpenStack architectures have many similarities to
|
||||
other OpenStack architecture use cases. There are several factors
|
||||
to consider when designing for a network-centric or network-heavy
|
||||
application environment.</para>
|
||||
<para>Networks exist to serve as a medium of transporting data between
|
||||
systems. It is inevitable that an OpenStack design has inter-dependencies
|
||||
with non-network portions of OpenStack as well as on external systems.
|
||||
Depending on the specific workload, there may be major interactions with
|
||||
storage systems both within and external to the OpenStack environment.
|
||||
For example, in the case of a content delivery network, there is a twofold
|
||||
interaction with storage. Traffic flows to and from the storage array for
|
||||
ingesting and serving content in a north-south direction. In addition,
|
||||
there is replication traffic flowing in an east-west direction.</para>
|
||||
<para>Compute-heavy workloads may also induce interactions with the
|
||||
network. Some high performance compute applications require network-based
|
||||
memory mapping and data sharing and, as a result, induce a higher network
|
||||
load when they transfer results and data sets. Others may be highly
|
||||
transactional and issue transaction locks, perform their functions, and
|
||||
revoke transaction locks at high rates. This also has an impact on the
|
||||
network performance.</para>
|
||||
<para>Some network dependencies are external to OpenStack. While
|
||||
OpenStack Networking is capable of providing network ports, IP addresses,
|
||||
some level of routing, and overlay networks, there are some other
|
||||
functions that it cannot provide. For many of these, you may require
|
||||
external systems or equipment to fill in the functional gaps. Hardware
|
||||
load balancers are an example of equipment that may be necessary to
|
||||
distribute workloads or offload certain functions. As of the Icehouse
|
||||
release, dynamic routing is currently in its infancy within OpenStack and
|
||||
you may require an external device or a specialized service instance
|
||||
within OpenStack to implement it. OpenStack Networking provides a
|
||||
tunneling feature; however, it is constrained to a Networking-managed
|
||||
region. If the need arises to extend a tunnel beyond the OpenStack region
|
||||
to either another region or an external system, implement the tunnel
|
||||
itself outside OpenStack or use a tunnel management system to map the
|
||||
tunnel or overlay to an external tunnel. OpenStack does not currently
|
||||
provide quotas for network resources. Where network quotas are required,
|
||||
implement quality of service management outside of OpenStack. In many of
|
||||
these instances, similar solutions for traffic shaping or other network
|
||||
functions are needed.
|
||||
</para>
|
||||
<para>
|
||||
Depending on the selected design, Networking itself might not
|
||||
even support the required
|
||||
<glossterm baseform="Layer-3 network">layer-3
|
||||
support the required <glossterm baseform="Layer-3 network">layer-3
|
||||
network</glossterm> functionality. If you choose to use the
|
||||
provider networking mode without running the layer-3 agent, you
|
||||
must install an external router to provide layer-3 connectivity
|
||||
to outside systems.
|
||||
</para>
|
||||
<para>Interaction with orchestration services is inevitable in
|
||||
larger-scale deployments. The Orchestration module is capable of allocating
|
||||
network resource defined in templates to map to tenant
|
||||
networks and for port creation, as well as allocating floating
|
||||
IPs. If there is a requirement to define and manage network
|
||||
resources in using orchestration, we recommend that the
|
||||
design include the Orchestration module to meet the demands of
|
||||
users.</para>
|
||||
larger-scale deployments. The Orchestration module is capable of
|
||||
allocating network resources defined in templates to map to tenant
|
||||
networks and for port creation, as well as allocating floating IPs.
|
||||
If there is a requirement to define and manage network resources when
|
||||
using orchestration, we recommend that the design include the
|
||||
Orchestration module to meet the demands of users.</para>
|
||||
<section xml:id="design-impacts">
|
||||
<title>Design impacts</title>
|
||||
<para>A wide variety of factors can affect a network focused
|
||||
OpenStack architecture. While there are some considerations
|
||||
shared with a general use case, specific workloads related to
|
||||
network requirements will influence network design
|
||||
decisions.</para>
|
||||
<para>One decision includes whether or not to use Network Address
|
||||
Translation (NAT) and where to implement it. If there is a
|
||||
requirement for floating IPs to be available instead of using
|
||||
public fixed addresses then NAT is required. This can be seen
|
||||
in network management applications that rely on an IP
|
||||
endpoint. An example of this is a DHCP relay that needs to
|
||||
know the IP of the actual DHCP server. In these cases it is
|
||||
easier to automate the infrastructure to apply the target IP
|
||||
to a new instance rather than reconfigure legacy or external
|
||||
systems for each new instance.</para>
|
||||
<para>NAT for floating IPs managed by Networking will reside within
|
||||
the hypervisor but there are also versions of NAT that may be
|
||||
running elsewhere. If there is a shortage of IPv4 addresses
|
||||
there are two common methods to mitigate this externally to
|
||||
OpenStack. The first is to run a load balancer either within
|
||||
OpenStack as an instance, or use an external load balancing
|
||||
solution. In the internal scenario, load balancing software,
|
||||
such as HAproxy, can be managed with Networking's
|
||||
Load-Balancer-as-a-Service (LBaaS). This is specifically to
|
||||
manage the
|
||||
Virtual IP (VIP) while a dual-homed connection from the
|
||||
HAproxy instance connects the public network with the tenant
|
||||
private network that hosts all of the content servers. In the
|
||||
external scenario, a load balancer would need to serve the VIP
|
||||
and also be joined to the tenant overlay network through
|
||||
external means or routed to it via private addresses.</para>
|
||||
<para>Another kind of NAT that may be useful is protocol NAT. In
|
||||
some cases it may be desirable to use only IPv6 addresses on
|
||||
instances and operate either an instance or an external
|
||||
service to provide a NAT-based transition technology such as
|
||||
NAT64 and DNS64. This provides the ability to have a globally
|
||||
routable IPv6 address while only consuming IPv4 addresses as
|
||||
necessary or in a shared manner.</para>
|
||||
<para>Application workloads will affect the design of the
|
||||
underlying network architecture. If a workload requires
|
||||
network-level redundancy, the routing and switching
|
||||
architecture will have to accommodate this. There are
|
||||
differing methods for providing this that are dependent on the
|
||||
network hardware selected, the performance of the hardware,
|
||||
and which networking model is deployed. Some examples of this
|
||||
are the use of Link aggregation (LAG) or Hot Standby Router
|
||||
Protocol (HSRP). There are also the considerations of whether
|
||||
to deploy OpenStack Networking or legacy networking (nova-network)
|
||||
and which plug-in to select
|
||||
for OpenStack Networking. If using an external system, Networking will need to
|
||||
be configured to run
|
||||
<glossterm baseform="Layer-2 network">layer 2</glossterm>
|
||||
with a provider network
|
||||
configuration. For example, it may be necessary to implement
|
||||
HSRP to terminate layer-3 connectivity.</para>
|
||||
<para>Depending on the workload, overlay networks may or may not
|
||||
be a recommended configuration. Where application network
|
||||
connections are small, short lived or bursty, running a
|
||||
dynamic overlay can generate as much bandwidth as the packets
|
||||
it carries. It also can induce enough latency to cause issues
|
||||
with certain applications. There is an impact to the device
|
||||
generating the overlay which, in most installations, will be
|
||||
the hypervisor. This will cause performance degradation on
|
||||
packet per second and connection per second rates.</para>
|
||||
<para>Overlays also come with a secondary option that may or may
|
||||
not be appropriate to a specific workload. While all of them
|
||||
will operate in full mesh by default, there might be good
|
||||
reasons to disable this function because it may cause
|
||||
excessive overhead for some workloads. Conversely, other
|
||||
workloads will operate without issue. For example, most web
|
||||
services applications will not have major issues with a full
|
||||
mesh overlay network, while some network monitoring tools or
|
||||
storage replication workloads will have performance issues
|
||||
with throughput or excessive broadcast traffic.</para>
|
||||
<para>Many people overlook an important design decision: The choice
|
||||
of layer-3
|
||||
protocols. While OpenStack was initially built with only IPv4
|
||||
<para>A wide variety of factors can affect a network-focused OpenStack
|
||||
architecture. While there are some considerations shared with a general
|
||||
use case, specific workloads related to network requirements influence
|
||||
network design decisions.</para>
|
||||
<para>One decision includes whether or not to use Network Address
|
||||
Translation (NAT) and where to implement it. If there is a requirement
|
||||
for floating IPs instead of public fixed addresses, then you must use
|
||||
NAT. An example of this is a DHCP relay that must know the IP of the
|
||||
DHCP server. In these cases it is easier to automate the infrastructure
|
||||
to apply the target IP to a new instance rather than to reconfigure
|
||||
legacy or external systems for each new instance.</para>
|
||||
<para>NAT for floating IPs managed by Networking resides within the
|
||||
hypervisor, but there are also versions of NAT that may be running
|
||||
elsewhere. If there is a shortage of IPv4 addresses there are two common
|
||||
methods to mitigate this externally to OpenStack. The first is to run a
|
||||
load balancer either within OpenStack as an instance, or use an external
|
||||
load balancing solution. In the internal scenario, Networking's
|
||||
Load-Balancer-as-a-Service (LBaaS) can manage load balancing
|
||||
software, for example HAProxy. This is specifically to manage the
|
||||
Virtual IP (VIP) while a dual-homed connection from the HAProxy instance
|
||||
connects the public network with the tenant private network that hosts
|
||||
all of the content servers. In the external scenario, a load balancer
|
||||
needs to serve the VIP and also connect to the tenant overlay
|
||||
network through external means or through private addresses.</para>
|
||||
<para>Another kind of NAT that may be useful is protocol NAT. In some
|
||||
cases it may be desirable to use only IPv6 addresses on instances and
|
||||
operate either an instance or an external service to provide a NAT-based
|
||||
transition technology such as NAT64 and DNS64. This provides the ability
|
||||
to have a globally routable IPv6 address while only consuming IPv4
|
||||
addresses as necessary or in a shared manner.</para>
|
||||
<para>Application workloads affect the design of the underlying network
|
||||
architecture. If a workload requires network-level redundancy, the
|
||||
routing and switching architecture has to accommodate this. There
|
||||
are differing methods for providing this that are dependent on the
|
||||
selected network hardware, the performance of the hardware, and which
|
||||
networking model you deploy. Examples include
|
||||
link aggregation (LAG) and Hot Standby Router Protocol (HSRP). Also
|
||||
consider whether to deploy OpenStack Networking or
|
||||
legacy networking (nova-network), and which plug-in to select for
|
||||
OpenStack Networking. If using an external system, configure Networking
|
||||
to run <glossterm baseform="Layer-2 network">layer 2</glossterm>
|
||||
with a provider network configuration. For example, implement HSRP
|
||||
to terminate layer-3 connectivity.</para>
|
||||
<para>Depending on the workload, overlay networks may not be the best
|
||||
solution. Where application network connections are
|
||||
small, short-lived, or bursty, running a dynamic overlay can generate
|
||||
as much bandwidth as the packets it carries. It also can induce enough
|
||||
latency to cause issues with certain applications. There is an impact
|
||||
to the device generating the overlay which, in most installations,
|
||||
is the hypervisor. This causes performance degradation on packet
|
||||
per second and connection per second rates.</para>
|
||||
<para>Overlays also come with a secondary option that may not be
|
||||
appropriate to a specific workload. While all of them operate in full
|
||||
mesh by default, there might be good reasons to disable this function
|
||||
because it may cause excessive overhead for some workloads. Conversely,
|
||||
other workloads operate without issue. For example, most web services
|
||||
applications do not have major issues with a full mesh overlay network,
|
||||
while some network monitoring tools or storage replication workloads
|
||||
have performance issues with throughput or excessive broadcast
|
||||
traffic.</para>
|
||||
<para>Many people overlook an important design decision: the choice of
|
||||
layer-3 protocols. While OpenStack was initially built with only IPv4
|
||||
support, Networking now supports IPv6 and dual-stacked networks.
|
||||
Note that, as of the Icehouse release, this only includes
|
||||
stateless address auto configuration but work is in
|
||||
progress to support stateless and stateful DHCPv6 as well as
|
||||
IPv6 floating IPs without NAT. Some workloads become possible
|
||||
through the use of IPv6 and IPv6 to IPv4 reverse transition
|
||||
mechanisms such as NAT64 and DNS64 or <glossterm>6to4</glossterm>,
|
||||
because these
|
||||
options are available. This will alter the requirements for
|
||||
any address plan as single-stacked and transitional IPv6
|
||||
deployments can alleviate the need for IPv4 addresses.</para>
|
||||
<para>As of the Kilo release, OpenStack has limited support
|
||||
for dynamic routing, however there are a number of options
|
||||
available by incorporating third party solutions to implement
|
||||
routing within the cloud including network equipment, hardware
|
||||
nodes, and instances. Some workloads will perform well with
|
||||
nothing more than static routes and default gateways
|
||||
configured at the layer-3 termination point. In most cases
|
||||
this will suffice, however some cases require the addition of
|
||||
at least one type of dynamic routing protocol if not multiple
|
||||
protocols. Having a form of interior gateway protocol (IGP)
|
||||
available to the instances inside an OpenStack installation
|
||||
opens up the possibility of use cases for anycast route
|
||||
injection for services that need to use it as a geographic
|
||||
location or failover mechanism. Other applications may wish to
|
||||
directly participate in a routing protocol, either as a
|
||||
passive observer as in the case of a looking glass, or as an
|
||||
active participant in the form of a route reflector. Since an
|
||||
instance might have a large amount of compute and memory
|
||||
resources, it is trivial to hold an entire unpartitioned
|
||||
routing table and use it to provide services such as network
|
||||
path visibility to other applications or as a monitoring
|
||||
As of the Icehouse release, this only includes stateless
|
||||
address autoconfiguration, but work is in progress to support stateless
|
||||
and stateful DHCPv6 as well as IPv6 floating IPs without NAT. Some
|
||||
workloads are possible through the use of IPv6 and IPv6 to IPv4
|
||||
reverse transition mechanisms such as NAT64 and DNS64 or
|
||||
<glossterm>6to4</glossterm>.
|
||||
This alters the requirements for any address plan as single-stacked and
|
||||
transitional IPv6 deployments can alleviate the need for IPv4
|
||||
addresses.</para>
|
||||
<para>As of the Icehouse release, OpenStack has limited support for
|
||||
dynamic routing; however, there are a number of options available by
|
||||
incorporating third party solutions to implement routing within the
|
||||
cloud including network equipment, hardware nodes, and instances. Some
|
||||
workloads perform well with nothing more than static routes and default
|
||||
gateways configured at the layer-3 termination point. In most cases this
|
||||
is sufficient; however, some cases require the addition of at least one
|
||||
type of dynamic routing protocol if not multiple protocols. Having a
|
||||
form of interior gateway protocol (IGP) available to the instances
|
||||
inside an OpenStack installation opens up the possibility of use cases
|
||||
for anycast route injection for services that need to use it as a
|
||||
geographic location or failover mechanism. Other applications may wish
|
||||
to directly participate in a routing protocol, either as a passive
|
||||
observer, as in the case of a looking glass, or as an active participant
|
||||
in the form of a route reflector. Since an instance might have a large
|
||||
amount of compute and memory resources, it is trivial to hold an entire
|
||||
unpartitioned routing table and use it to provide services such as
|
||||
network path visibility to other applications or as a monitoring
|
||||
tool.</para>
|
||||
<para>
|
||||
Path maximum transmission unit (MTU) failures are lesser known
|
||||
but harder to diagnose. The MTU must be large enough to handle
|
||||
normal traffic, overhead from an overlay network, and the
|
||||
desired layer-3 protocol. When you add externally built tunnels,
|
||||
the MTU packet size is reduced. In this case, you must pay
|
||||
attention to the fully calculated MTU size because some systems
|
||||
are configured to ignore or drop path MTU discovery packets.
|
||||
</para>
|
||||
<para>Path maximum transmission unit (MTU) failures are lesser known but
|
||||
harder to diagnose. The MTU must be large enough to handle normal
|
||||
traffic, overhead from an overlay network, and the desired layer-3
|
||||
protocol. Adding externally built tunnels reduces the MTU packet size.
|
||||
In this case, you must pay attention to the fully
|
||||
calculated MTU size because some systems ignore or
|
||||
drop path MTU discovery packets.</para>
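<para>The MTU arithmetic described above can be sketched as follows. This is a minimal illustration, not an OpenStack API: the function name and the byte counts are assumptions chosen for a GRE-over-IPv4 overlay, and you should substitute the values that match your own encapsulation.</para>

```python
def effective_mtu(path_mtu, overheads):
    """Return the payload MTU left after subtracting encapsulation overhead."""
    remaining = path_mtu - sum(overheads.values())
    if remaining > 0:
        return remaining
    raise ValueError("encapsulation overhead exceeds the path MTU")


# Illustrative byte counts (assumptions for this sketch, not measured values).
GRE_OVER_IPV4 = {
    "outer_ipv4_header": 20,
    "gre_header_with_key": 8,
    "inner_ethernet_header": 14,
}

# MTU available to instances on a tenant network carried over a 1500-byte path.
tenant_mtu = effective_mtu(1500, GRE_OVER_IPV4)
print(tenant_mtu)
```

<para>Each externally built tunnel adds another set of headers, so apply the same subtraction once per layer of encapsulation to obtain the fully calculated MTU.</para>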
|
||||
</section>
|
||||
<section xml:id="tunables">
|
||||
<title>Tunable networking components</title>
|
||||
<para>Consider configurable networking components related to an
|
||||
OpenStack architecture design when designing for network intensive
|
||||
workloads include MTU and QoS. Some workloads will require a larger
|
||||
MTU than normal based on a requirement to transfer large blocks of
|
||||
data. When providing network service for applications such as video
|
||||
streaming or storage replication, it is recommended to ensure that
|
||||
both OpenStack hardware nodes and the supporting network equipment
|
||||
are configured for jumbo frames where possible. This will allow for
|
||||
a better utilization of available bandwidth. Configuration of jumbo
|
||||
frames should be done across the complete path the packets will
|
||||
traverse. If one network component is not capable of handling jumbo
|
||||
frames then the entire path will revert to the default MTU.</para>
|
||||
<para>Quality of Service (QoS) also has a great impact on network
|
||||
intensive workloads by providing instant service to packets which
|
||||
have a higher priority due to their ability to be impacted by poor
|
||||
network performance. In applications such as Voice over IP (VoIP)
|
||||
differentiated services code points are a near requirement for
|
||||
proper operation. QoS can also be used in the opposite direction for
|
||||
mixed workloads to prevent low priority but high bandwidth
|
||||
applications, for example backup services, video conferencing or
|
||||
file sharing, from blocking bandwidth that is needed for the proper
|
||||
operation of other workloads. It is possible to tag file storage
|
||||
traffic as a lower class, such as best effort or scavenger, to allow
|
||||
the higher priority traffic through. In cases where regions within a
|
||||
cloud might be geographically distributed it may also be necessary
|
||||
to plan accordingly to implement WAN optimization to combat latency
|
||||
or packet loss.</para>
|
||||
<title>Tunable networking components</title>
|
||||
<para>Consider configurable networking components related to an
|
||||
OpenStack architecture design when designing for network intensive
|
||||
workloads; these components include MTU and QoS. Some workloads require a larger MTU
|
||||
than normal due to the transfer of large blocks of data.
|
||||
When providing network service for applications such as video
|
||||
streaming or storage replication, we recommend that you configure
|
||||
both OpenStack hardware nodes and the supporting network equipment
|
||||
for jumbo frames where possible. This allows for better use of
|
||||
available bandwidth. Configure jumbo frames
|
||||
across the complete path the packets traverse. If one network
|
||||
component is not capable of handling jumbo frames then the entire
|
||||
path reverts to the default MTU.</para>
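<para>To illustrate why jumbo frames must be configured end to end, the sketch below treats the usable MTU of a path as the smallest MTU of any component on it. The hop names and values are hypothetical.</para>

```python
def limiting_hop(hop_mtus):
    """Return (name, mtu) of the component that constrains the path."""
    if not hop_mtus:
        raise ValueError("at least one hop is required")
    name = min(hop_mtus, key=hop_mtus.get)
    return name, hop_mtus[name]


# Hypothetical path: every device except one switch is set for jumbo frames.
path = {
    "compute_nic": 9000,
    "top_of_rack_switch": 9000,
    "aggregation_switch": 1500,
    "storage_nic": 9000,
}

print(limiting_hop(path))
```

<para>A check of this kind, run against an inventory of interface settings, is one way to find the single device that prevents the rest of the path from benefiting from jumbo frames.</para>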
|
||||
<para>Quality of Service (QoS) also has a great impact on network
|
||||
intensive workloads as it provides instant service to packets which
|
||||
have a higher priority due to the impact of poor
|
||||
network performance. In applications such as Voice over IP (VoIP),
|
||||
differentiated services code points are a near requirement for proper
|
||||
operation. You can also use QoS in the opposite direction for mixed
|
||||
workloads to prevent low priority but high bandwidth applications,
|
||||
for example backup services, video conferencing, or file sharing,
|
||||
from blocking bandwidth that is needed for the proper operation of
|
||||
other workloads. It is possible to tag file storage traffic as a
|
||||
lower class, such as best effort or scavenger, to allow the higher
|
||||
priority traffic through. In cases where regions within a cloud might
|
||||
be geographically distributed it may also be necessary to plan
|
||||
accordingly to implement WAN optimization to combat latency or
|
||||
packet loss.</para>
|
||||
</section>
|
||||
</section>
|
||||
|
@ -6,67 +6,63 @@
|
||||
xml:id="operational-considerations-networking-focus">
|
||||
<?dbhtml stop-chunking?>
|
||||
<title>Operational considerations</title>
|
||||
<para>Network focused OpenStack clouds have a number of
|
||||
operational considerations that will influence the selected
|
||||
design. Topics including, but not limited to, dynamic routing
|
||||
of static routes, service level agreements, and ownership of
|
||||
user management all need to be considered.</para>
|
||||
<para>One of the first required decisions is the selection of a
|
||||
telecom company or transit provider. This is especially true
|
||||
if the network requirements include external or site-to-site
|
||||
network connectivity.</para>
|
||||
<para>Additional design decisions need to be made about monitoring
|
||||
and alarming. These can be an internal responsibility or the
|
||||
responsibility of the external provider. In the case of using
|
||||
an external provider, SLAs will likely apply. In addition,
|
||||
other operational considerations such as bandwidth, latency,
|
||||
and jitter can be part of a service level agreement.</para>
|
||||
<para>The ability to upgrade the infrastructure is another subject
|
||||
for consideration. As demand for network resources increase,
|
||||
operators will be required to add additional IP address blocks
|
||||
and add additional bandwidth capacity. Managing hardware and
|
||||
software life cycle events, for example upgrades,
|
||||
decommissioning, and outages while avoiding service
|
||||
interruptions for tenants, will also need to be
|
||||
considered.</para>
|
||||
<para>Maintainability will also need to be factored into the
|
||||
overall network design. This includes the ability to manage
|
||||
and maintain IP addresses as well as the use of overlay
|
||||
identifiers including VLAN tag IDs, GRE tunnel IDs, and MPLS
|
||||
tags. As an example, if all of the IP addresses have to be
|
||||
changed on a network, a process known as renumbering, then the
|
||||
design needs to support the ability to do so.</para>
|
||||
<para>Network focused applications themselves need to be addressed
|
||||
when concerning certain operational realities. For example,
|
||||
the impending exhaustion of IPv4 addresses, the migration to
|
||||
IPv6 and the utilization of private networks to segregate
|
||||
different types of traffic that an application receives or
|
||||
generates. In the case of IPv4 to IPv6 migrations,
|
||||
applications should follow best practices for storing IP
|
||||
addresses. It is further recommended to avoid relying on IPv4
|
||||
features that were not carried over to the IPv6 protocol or
|
||||
have differences in implementation.</para>
|
||||
<para>When using private networks to segregate traffic,
|
||||
applications should create private tenant networks for
|
||||
database and data storage network traffic, and utilize public
|
||||
networks for client-facing traffic. By segregating this
|
||||
traffic, quality of service and security decisions can be made
|
||||
to ensure that each network has the correct level of service
|
||||
that it requires.</para>
|
||||
<para>Finally, decisions must be made about the routing of network
|
||||
traffic. For some applications, a more complex policy
|
||||
framework for routing must be developed. The economic cost of
|
||||
transmitting traffic over expensive links versus cheaper
|
||||
links, in addition to bandwidth, latency, and jitter
|
||||
requirements, can be used to create a routing policy that will
|
||||
satisfy business requirements.</para>
|
||||
<para>How to respond to network events must also be taken into
|
||||
consideration. As an example, how load is transferred from one
|
||||
link to another during a failure scenario could be a factor in
|
||||
the design. If network capacity is not planned correctly,
|
||||
failover traffic could overwhelm other ports or network links
|
||||
and create a cascading failure scenario. In this case, traffic
|
||||
that fails over to one link overwhelms that link and then
|
||||
moves to the subsequent links until the all network traffic
|
||||
stops.</para>
|
||||
<para>Network-focused OpenStack clouds have a number of operational
|
||||
considerations that influence the selected design, including:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Dynamic routing of static routes</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Service level agreements (SLAs)</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Ownership of user management</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>An initial network consideration is the selection of a telecom
|
||||
company or transit provider.</para>
|
||||
<para>Make additional design decisions about monitoring and alarming.
|
||||
This can be an internal responsibility or the responsibility of the
|
||||
external provider. In the case of using an external provider, service
|
||||
level agreements (SLAs) likely apply. In addition, other operational
|
||||
considerations such as bandwidth, latency, and jitter can be part of an
|
||||
SLA.</para>
|
||||
<para>Consider the ability to upgrade the infrastructure. As demand for
|
||||
network resources increases, operators add additional IP address blocks
|
||||
and add additional bandwidth capacity. In addition, consider managing
|
||||
hardware and software life cycle events, for example upgrades,
|
||||
decommissioning, and outages, while avoiding service interruptions for
|
||||
tenants.</para>
|
||||
<para>Factor maintainability into the overall network design. This
|
||||
includes the ability to manage and maintain IP addresses as well as the
|
||||
use of overlay identifiers including VLAN tag IDs, GRE tunnel IDs, and
|
||||
MPLS tags. As an example, if you need to change all of the IP
|
||||
addresses on a network, a process known as renumbering, then the design
|
||||
must support this function.</para>
|
||||
<para>Address network-focused applications when considering certain
|
||||
operational realities. For example, consider the impending exhaustion
|
||||
of IPv4 addresses, the migration to IPv6, and the use of private
|
||||
networks to segregate different types of traffic that an application
|
||||
receives or generates. In the case of IPv4 to IPv6 migrations,
|
||||
applications should follow best practices for storing IP addresses.
|
||||
We recommend you avoid relying on IPv4 features that did not carry over
|
||||
to the IPv6 protocol or have differences in implementation.</para>
|
||||
<para>To segregate traffic, allow applications to create a private tenant
|
||||
network for database and storage network traffic. Use a public network
|
||||
for services that require direct client access from the internet. Upon
|
||||
segregating the traffic, consider quality of service (QoS) and security
|
||||
to ensure each network has the required level of service.</para>
|
||||
<para>Finally, consider the routing of network traffic.
|
||||
For some applications, develop a complex policy framework for
|
||||
routing. To create a routing policy that satisfies business requirements,
|
||||
consider the economic cost of transmitting traffic over expensive links
|
||||
versus cheaper links, in addition to bandwidth, latency, and jitter
|
||||
requirements.</para>
|
||||
<para>Additionally, consider how to respond to network events. As an
|
||||
example, how load transfers from one link to another during a
|
||||
failure scenario could be a factor in the design. If you do not plan
|
||||
network capacity correctly, failover traffic could overwhelm other ports
|
||||
or network links and create a cascading failure scenario. In this case,
|
||||
traffic that fails over to one link overwhelms that link and then moves
|
||||
to the subsequent links until all network traffic stops.</para>
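<para>The cascading failure described above can be explored with a small model. This sketch assumes, purely for illustration, that the total load is spread evenly across the links that remain up; real traffic distribution depends on the routing and hashing in use. The link names and figures are hypothetical.</para>

```python
def surviving_links(capacities, total_load, failed):
    """Return the sorted names of links still up after `failed` goes down.

    Assumption for this sketch: load is shared evenly across surviving links,
    and any link asked to carry more than its capacity also fails.
    """
    up = {name for name in capacities if name != failed}
    while up:
        share = total_load / len(up)
        overloaded = {name for name in up if share > capacities[name]}
        if not overloaded:
            break
        up -= overloaded
    return sorted(up)


links = {"link_a": 10, "link_b": 10, "link_c": 10}

# 18 units of traffic: the two remaining links absorb 9 units each.
print(surviving_links(links, 18, "link_a"))

# 24 units of traffic: each remaining link is asked for 12 and both fail.
print(surviving_links(links, 24, "link_a"))
```

<para>Running the model with planned peak load and each single-link failure in turn gives a rough indication of whether the design has enough headroom.</para>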
|
||||
</section>
|
||||
|
@ -6,34 +6,34 @@
|
||||
xml:id="prescriptive-example-large-scale-web-app">
|
||||
<?dbhtml stop-chunking?>
|
||||
<title>Prescriptive examples</title>
|
||||
<para>A large-scale web application has been designed with cloud
|
||||
principles in mind. The application is designed to scale
|
||||
horizontally in a bursting fashion and will generate a high
|
||||
<para>An organization designs a large-scale web application with cloud
|
||||
principles in mind. The application scales
|
||||
horizontally in a bursting fashion and generates a high
|
||||
instance count. The application requires an SSL connection to
|
||||
secure data and must not lose connection state to individual
|
||||
servers.</para>
|
||||
<para>An example design for this workload is depicted in the
|
||||
figure below. In this example, a hardware load balancer is
|
||||
configured to provide SSL offload functionality and to connect
|
||||
<para>The figure below depicts an example design for this workload.
|
||||
In this example, a hardware load balancer provides SSL offload
|
||||
functionality and connects
|
||||
to tenant networks in order to reduce address consumption.
|
||||
This load balancer is linked to the routing architecture as it
|
||||
will service the VIP for the application. The router and load
|
||||
balancer are configured with GRE tunnel ID of the
|
||||
application's tenant network and provided an IP address within
|
||||
This load balancer links to the routing architecture as it
|
||||
services the VIP for the application. The router and load
|
||||
balancer use the GRE tunnel ID of the
|
||||
application's tenant network and an IP address within
|
||||
the tenant subnet but outside of the address pool. This is to
|
||||
ensure that the load balancer can communicate with the
|
||||
application's HTTP servers without requiring the consumption
|
||||
of a public IP address.</para>
|
||||
<para>Because sessions persist until they are closed, the routing and
|
||||
switching architecture is designed for high availability.
|
||||
Switches are meshed to each hypervisor and each other, and
|
||||
<para>Because sessions persist until closed, the routing and
|
||||
switching architecture provides high availability.
|
||||
Switches mesh to each hypervisor and each other, and
|
||||
also provide an MLAG implementation to ensure that layer-2
|
||||
connectivity does not fail. Routers are configured with VRRP
|
||||
and fully meshed with switches to ensure layer-3 connectivity.
|
||||
Since GRE is used as an overlay network, Networking is installed
|
||||
and configured to use the Open vSwitch agent in GRE tunnel
|
||||
connectivity does not fail. Routers use VRRP
|
||||
and fully mesh with switches to ensure layer-3 connectivity.
|
||||
Since GRE provides an overlay network, Networking is present
|
||||
and uses the Open vSwitch agent in GRE tunnel
|
||||
mode. This ensures all devices can reach all other devices and
|
||||
that tenant networks can be created for private addressing
|
||||
that you can create tenant networks for private addressing
|
||||
links to the load balancer.
|
||||
<mediaobject>
|
||||
<imageobject>
|
||||
@ -44,9 +44,9 @@
|
||||
</mediaobject></para>
|
||||
<para>A web service architecture has many options and optional
|
||||
components. Due to this, it can fit into a large number of
|
||||
other OpenStack designs however a few key components will need
|
||||
other OpenStack designs. A few key components, however, need
|
||||
to be in place to handle the nature of most web-scale
|
||||
workloads. The user needs the following components:</para>
|
||||
workloads. You require the following components:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>OpenStack Controller services (Image, Identity,
|
||||
@ -66,59 +66,59 @@
|
||||
<para>Telemetry module</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>
|
||||
Beyond the normal Identity, Compute, Image service and Object
|
||||
Storage components, the Orchestration module is a recommended
|
||||
component to handle properly scaling the workloads to adjust to
|
||||
demand. Due to the requirement for auto-scaling,
|
||||
the design includes the Telemetry module. Web services
|
||||
tend to be bursty in load, have very defined peak and valley
|
||||
usage patterns and, as a result, benefit from automatic scaling
|
||||
of instances based upon traffic. At a network level, a split
|
||||
network configuration will work well with databases residing on
|
||||
private tenant networks since these do not emit a large quantity
|
||||
of broadcast traffic and may need to interconnect to some
|
||||
databases for content.
|
||||
<para>Beyond the normal Identity, Compute, Image service, and Object
|
||||
Storage components, we recommend the Orchestration module
|
||||
component to handle the proper scaling of workloads to adjust to
|
||||
demand. Due to the requirement for auto-scaling,
|
||||
the design includes the Telemetry module. Web services
|
||||
tend to be bursty in load, have very defined peak and valley
|
||||
usage patterns and, as a result, benefit from automatic scaling
|
||||
of instances based upon traffic. At a network level, a split
|
||||
network configuration works well with databases residing on
|
||||
private tenant networks since these do not emit a large quantity
|
||||
of broadcast traffic and may need to interconnect to some
|
||||
databases for content.
|
||||
</para>
|
||||
<section xml:id="load-balancing">
|
||||
<title>Load balancing</title>
|
||||
<para>Load balancing was included in this design to spread
|
||||
requests across multiple instances. This workload scales well
|
||||
horizontally across large numbers of instances. This allows
|
||||
instances to run without publicly routed IP addresses and
|
||||
simply rely on the load balancer for the service to be
|
||||
globally reachable. Many of these services do not require
|
||||
<para>Load balancing spreads requests across multiple instances.
|
||||
This workload scales well horizontally across large numbers of
|
||||
instances. This enables instances to run without publicly
|
||||
routed IP addresses and instead to rely on the load
|
||||
balancer to provide a globally reachable service.
|
||||
Many of these services do not require
|
||||
direct server return. This aids in address planning and
|
||||
utilization at scale since only the virtual IP (VIP) must be
|
||||
public.</para></section>
|
||||
|
||||
public.</para>
|
||||
</section>
|
||||
<section xml:id="overlay-networks">
|
||||
<title>Overlay networks</title>
|
||||
<para>
|
||||
The overlay functionality design includes OpenStack Networking
|
||||
in Open vSwitch GRE tunnel mode.
|
||||
In this case, the layer-3 external routers are paired with
|
||||
VRRP and switches should be paired with an implementation of
|
||||
MLAG running to ensure that you do not lose connectivity with
|
||||
In this case, the layer-3 external routers pair with
|
||||
VRRP, and switches pair with an implementation of
|
||||
MLAG to ensure that you do not lose connectivity with
|
||||
the upstream routing infrastructure.
|
||||
</para>
|
||||
</section>
|
||||
<section xml:id="performance-tuning">
|
||||
<title>Performance tuning</title>
|
||||
<para>Network level tuning for this workload is minimal.
|
||||
Quality-of-Service (QoS) will be applied to these workloads
|
||||
<para>Network level tuning for this workload is minimal.
|
||||
Quality-of-Service (QoS) applies to these workloads
|
||||
for a middle ground Class Selector depending on existing
|
||||
policies. It will be higher than a best effort queue but lower
|
||||
policies. It is higher than a best effort queue but lower
|
||||
than an Expedited Forwarding or Assured Forwarding queue.
|
||||
Since this type of application generates larger packets with
|
||||
longer-lived connections, bandwidth utilization can be
|
||||
optimized for long duration TCP. Normal bandwidth planning
|
||||
longer-lived connections, you can optimize bandwidth utilization
|
||||
for long duration TCP. Normal bandwidth planning
|
||||
applies here with regards to benchmarking a session's usage
|
||||
multiplied by the expected number of concurrent sessions with
|
||||
overhead.</para></section>
|
||||
overhead.</para>
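<para>The bandwidth planning rule in this paragraph reduces to a single multiplication. The sketch below is illustrative only: the function name, the per-session figure, and the overhead fraction are assumptions that you would replace with values from your own benchmarks.</para>

```python
def required_bandwidth_mbps(per_session_mbps, concurrent_sessions, overhead_fraction):
    """Benchmarked per-session usage times expected sessions, plus overhead."""
    if min(per_session_mbps, concurrent_sessions, overhead_fraction) >= 0:
        return per_session_mbps * concurrent_sessions * (1 + overhead_fraction)
    raise ValueError("inputs must not be negative")


# Hypothetical inputs: 2 Mbit/s per session, 500 sessions, 25% overhead.
print(required_bandwidth_mbps(2.0, 500, 0.25))
```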
|
||||
</section>
|
||||
<section xml:id="network-functions">
|
||||
<title>Network functions</title>
|
||||
<para>Network functions is a broad category but encompasses
|
||||
<para>Network functions is a broad category but encompasses
|
||||
workloads that support the rest of a system's network. These
|
||||
workloads tend to consist of large amounts of small packets
|
||||
that are very short lived, such as DNS queries or SNMP traps.
|
||||
@ -134,63 +134,57 @@
|
||||
<para>The supporting network for this type of configuration needs
|
||||
to have a low latency and evenly distributed availability.
|
||||
This workload benefits from having services local to the
|
||||
consumers of the service. A multi-site approach is used as
|
||||
consumers of the service. Use a multi-site approach as
|
||||
well as deploying many copies of the application to handle
|
||||
load as close as possible to consumers. Since these
|
||||
applications function independently, they do not warrant
|
||||
running overlays to interconnect tenant networks. Overlays
|
||||
also have the drawback of performing poorly with rapid flow
|
||||
setup and may incur too much overhead with large quantities of
|
||||
small packets and are therefore not recommended.</para>
|
||||
<para>QoS is desired for some workloads to ensure delivery. DNS
|
||||
small packets and therefore we do not recommend them.</para>
|
||||
<para>QoS is desirable for some workloads to ensure delivery. DNS
|
||||
has a major impact on the load times of other services and
|
||||
needs to be reliable and provide rapid responses. It is to
|
||||
configure rules in upstream devices to apply a higher Class
|
||||
needs to be reliable and provide rapid responses. Configure rules
|
||||
in upstream devices to apply a higher Class
|
||||
Selector to DNS to ensure faster delivery or a better spot in
|
||||
queuing algorithms.</para></section>
|
||||
queuing algorithms.</para>
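<para>As a rough aid when writing such rules, the sketch below converts a Class Selector number into its DSCP value and into the value carried in the IP header's traffic class byte. The function names are hypothetical; the arithmetic follows the Differentiated Services definition, in which Class Selector n corresponds to DSCP n multiplied by 8 and the DSCP occupies the upper six bits of the byte.</para>

```python
def class_selector_dscp(precedence):
    """Return the DSCP value for Class Selector `precedence` (0 to 7)."""
    if precedence not in range(8):
        raise ValueError("Class Selector must be between 0 and 7")
    return precedence * 8


def dscp_to_tos_byte(dscp):
    """Return the traffic class byte with the two low-order ECN bits cleared."""
    if dscp not in range(64):
        raise ValueError("DSCP must be between 0 and 63")
    return dscp * 4


# Example: Class Selector 3, a middle-ground class above best effort (CS0).
dscp = class_selector_dscp(3)
print(dscp, dscp_to_tos_byte(dscp))
```

<para>Which Class Selector to assign to DNS depends on the existing policy in the upstream devices; the value 3 here is only an example.</para>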
|
||||
</section>
|
||||
<section xml:id="cloud-storage">
|
||||
<title>Cloud storage</title>
|
||||
<para>
|
||||
Another common use case for OpenStack environments is to provide
|
||||
a cloud-based file storage and sharing service. You might
|
||||
consider this a storage-focused use case, but its network-side
|
||||
requirements make it a network-focused use case.</para>
|
||||
<para>
|
||||
For example, consider a cloud backup application. This workload
|
||||
has two specific behaviors that impact the network. Because this
|
||||
workload is an externally-facing service and an
|
||||
internally-replicating application, it has both <glossterm
|
||||
baseform="north-south traffic">north-south</glossterm> and
|
||||
<glossterm>east-west traffic</glossterm>
|
||||
considerations, as follows:
|
||||
</para>
|
||||
<para>Another common use case for OpenStack environments is providing
|
||||
a cloud-based file storage and sharing service. You might
|
||||
consider this a storage-focused use case, but its network-side
|
||||
requirements make it a network-focused use case.</para>
|
||||
<para>For example, consider a cloud backup application. This workload
|
||||
has two specific behaviors that impact the network. Because this
|
||||
workload is an externally-facing service and an
|
||||
internally-replicating application, it has both <glossterm
|
||||
baseform="north-south traffic">north-south</glossterm> and
|
||||
<glossterm>east-west traffic</glossterm>
|
||||
considerations:</para>
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>north-south traffic</term>
|
||||
<listitem>
|
||||
<para>
|
||||
When a user uploads and stores content, that content moves
|
||||
<para>When a user uploads and stores content, that content moves
|
||||
into the OpenStack installation. When users download this
|
||||
content, the content moves from the OpenStack
|
||||
installation. Because this service is intended primarily
|
||||
content, the content moves out from the OpenStack
|
||||
installation. Because this service operates primarily
|
||||
as a backup, most of the traffic moves southbound into the
|
||||
environment. In this situation, it benefits you to
|
||||
configure a network to be asymmetrically downstream
|
||||
because the traffic that enters the OpenStack installation
|
||||
is greater than the traffic that leaves the installation.
|
||||
</para>
|
||||
is greater than the traffic that leaves the installation.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>east-west traffic</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Likely to be fully symmetric. Because replication
|
||||
<para>Likely to be fully symmetric. Because replication
|
||||
originates from any node and might target multiple other
|
||||
nodes algorithmically, it is less likely for this traffic
|
||||
to have a larger volume in any specific direction. However,
|
||||
this traffic might interfere with north-south traffic.
|
||||
</para>
|
||||
this traffic might interfere with north-south traffic.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
@ -201,16 +195,15 @@
|
||||
/>
|
||||
</imageobject>
|
||||
</mediaobject>
|
||||
<para>
|
||||
This application prioritizes the north-south traffic over
|
||||
<para>This application prioritizes the north-south traffic over
|
||||
east-west traffic: the north-south traffic involves
|
||||
customer-facing data.
|
||||
</para>
|
||||
customer-facing data.</para>
|
||||
<para>The network design in this case is less dependent on
|
||||
availability and more dependent on being able to handle high
|
||||
bandwidth. As a direct result, it is beneficial to forego
|
||||
redundant links in favor of bonding those connections. This
|
||||
increases available bandwidth. It is also beneficial to
|
||||
configure all devices in the path, including OpenStack, to
|
||||
generate and pass jumbo frames.</para></section>
|
||||
availability and more dependent on being able to handle high
|
||||
bandwidth. As a direct result, it is beneficial to forgo
|
||||
redundant links in favor of bonding those connections. This
|
||||
increases available bandwidth. It is also beneficial to
|
||||
configure all devices in the path, including OpenStack, to
|
||||
generate and pass jumbo frames.</para>
|
||||
</section>
|
||||
</section>
|
||||
|
@ -13,27 +13,23 @@
|
||||
involve those made about the protocol layer and the point when
|
||||
IP comes into the picture. As an example, a completely
|
||||
internal OpenStack network can exist at layer 2 and ignore
|
||||
layer 3 however, in order for any traffic to go outside of
|
||||
that cloud, to another network, or to the Internet, a layer-3
|
||||
router or switch must be involved.</para>
|
||||
<para>
|
||||
The past few years have seen two competing trends in
|
||||
layer 3. In order for any traffic to go outside of
|
||||
that cloud, to another network, or to the Internet, however, you must
|
||||
use a layer-3 router or switch.</para>
|
||||
<para>The past few years have seen two competing trends in
|
||||
networking. One trend leans towards building data center network
|
||||
architectures based on layer-2 networking. Another trend treats
|
||||
the cloud environment essentially as a miniature version of the
|
||||
Internet. This approach is radically different from the network
|
||||
architecture approach that is used in the staging environment:
|
||||
the Internet is based entirely on layer-3 routing rather than
|
||||
layer-2 switching.
|
||||
</para>
|
||||
<para>
|
||||
A network designed on layer-2 protocols has advantages over one
|
||||
architecture approach in the staging environment:
|
||||
the Internet only uses layer-3 routing rather than
|
||||
layer-2 switching.</para>
|
||||
<para>A network designed on layer-2 protocols has advantages over one
|
||||
designed on layer-3 protocols. In spite of the difficulties of
|
||||
using a bridge to perform the network role of a router, many
|
||||
vendors, customers, and service providers choose to use Ethernet
|
||||
in as many parts of their networks as possible. The benefits of
|
||||
selecting a layer-2 design are:
|
||||
</para>
|
||||
selecting a layer-2 design are:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Ethernet frames contain all the essentials for
|
||||
@ -47,13 +43,13 @@
|
||||
protocol.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>More layers added to the Ethernet frame only slow
|
||||
<para>Adding more layers to the Ethernet frame only slows
|
||||
the networking process down. This is known as 'nodal
|
||||
processing delay'.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Adjunct networking features, for example class of
|
||||
service (CoS) or multicasting, can be added to
|
||||
<para>You can add adjunct networking features, for
|
||||
example class of service (CoS) or multicasting, to
|
||||
Ethernet as readily as IP networks.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
@ -62,45 +58,37 @@
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>Most information starts and ends inside Ethernet frames.
|
||||
Today this applies to data, voice (for example, VoIP) and
|
||||
video (for example, web cameras). The concept is that, if more
|
||||
of the end-to-end transfer of information from a source to a
|
||||
destination can be done in the form of Ethernet frames, more
|
||||
of the benefits of Ethernet can be realized on the network.
|
||||
Though it is not a substitute for IP networking, networking at
|
||||
layer 2 can be a powerful adjunct to IP networking.
|
||||
</para>
|
||||
Today this applies to data, voice (for example, VoIP), and
|
||||
video (for example, web cameras). The concept is that, if you can
|
||||
perform more of the end-to-end transfer of information from
|
||||
a source to a destination in the form of Ethernet frames, the network
|
||||
benefits more from the advantages of Ethernet.
|
||||
Although it is not a substitute for IP networking, networking at
|
||||
layer 2 can be a powerful adjunct to IP networking.</para>
|
||||
<para>
|
||||
Layer-2 Ethernet usage has these advantages over layer-3 IP
|
||||
network usage:
|
||||
</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
Speed
|
||||
</para>
|
||||
<para>Speed</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Reduced overhead of the IP hierarchy.
|
||||
</para>
|
||||
<para>Reduced overhead of the IP hierarchy.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
No need to keep track of address configuration as systems
|
||||
are moved around. Whereas the simplicity of layer-2
|
||||
<para>No need to keep track of address configuration as systems
|
||||
move around. Whereas the simplicity of layer-2
|
||||
protocols might work well in a data center with hundreds
|
||||
of physical machines, cloud data centers have the
|
||||
additional burden of needing to keep track of all virtual
|
||||
machine addresses and networks. In these data centers, it
|
||||
is not uncommon for one physical node to support 30-40
|
||||
instances.
|
||||
</para>
|
||||
instances.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<important>
|
||||
<para>
|
||||
Networking at the frame level says nothing
|
||||
<para>Networking at the frame level says nothing
|
||||
about the presence or absence of IP addresses at the packet
|
||||
level. Almost all ports, links, and devices on a network of
|
||||
LAN switches still have IP addresses, as do all the source and
|
||||
@ -125,8 +113,8 @@
|
||||
limited.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>The need to maintain a set of layer-4 devices to
|
||||
handle traffic control must be accommodated.</para>
|
||||
<para>You must accommodate the need to maintain a set of
|
||||
layer-4 devices to handle traffic control.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>MLAG, often used for switch redundancy, is a
|
||||
@ -138,21 +126,20 @@
|
||||
without IP addresses and ICMP.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Configuring <glossterm
|
||||
<para>Configuring <glossterm
|
||||
baseform="Address Resolution Protocol (ARP)">ARP</glossterm>
|
||||
is considered complicated on large layer-2 networks.</para>
|
||||
can be complicated on large layer-2 networks.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>All network devices need to be aware of all MACs,
|
||||
even instance MACs, so there is constant churn in MAC
|
||||
tables and network state changes as instances are
|
||||
started or stopped.</para>
|
||||
tables and network state changes as instances start and
|
||||
stop.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Migrating MACs (instance migration) to different
|
||||
physical locations are a potential problem if ARP
|
||||
table timeouts are not set properly.</para>
|
||||
physical locations is a potential problem if you do not
|
||||
set ARP table timeouts properly.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>It is important to know that layer 2 has a very limited set
|
||||
@ -173,14 +160,15 @@
|
||||
with the new location of the instance.</para>
|
||||
<para>In a layer-2 network, all devices are aware of all MACs,
|
||||
even those that belong to instances. The network state
|
||||
information in the backbone changes whenever an instance is
|
||||
started or stopped. As a result there is far too much churn in
|
||||
the MAC tables on the backbone switches.</para></section>
|
||||
information in the backbone changes whenever an instance starts
|
||||
or stops. As a result, there is far too much churn in
|
||||
the MAC tables on the backbone switches.</para>
|
||||
</section>
|
||||
<section xml:id="layer-3-arch-advantages">
|
||||
<title>Layer-3 architecture advantages</title>
|
||||
<para>In the layer 3 case, there is no churn in the routing tables
|
||||
due to instances starting and stopping. The only time there
|
||||
would be a routing state change would be in the case of a Top
|
||||
would be a routing state change is in the case of a Top
|
||||
of Rack (ToR) switch failure or a link failure in the backbone
|
||||
itself. Other advantages of using a layer-3 architecture
|
||||
include:</para>
|
||||
@ -194,15 +182,15 @@
|
||||
straightforward.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Layer 3 can be configured to use <glossterm
|
||||
<para>You can configure layer 3 to use <glossterm
|
||||
baseform="Border Gateway Protocol (BGP)">BGP</glossterm>
|
||||
confederation for scalability so core routers have state
|
||||
proportional to the number of racks, not to the number of
|
||||
servers or instances.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Routing ensures that instance MAC and IP addresses
|
||||
out of the network core reducing state churn. Routing
|
||||
<para>Routing takes instance MAC and IP addresses
|
||||
out of the network core, reducing state churn. Routing
|
||||
state changes only occur in the case of a ToR switch
|
||||
failure or backbone link failure.</para>
|
||||
</listitem>
|
||||
@ -211,7 +199,7 @@
|
||||
example ICMP, to monitor and manage traffic.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Layer-3 architectures allow for the use of Quality
|
||||
<para>Layer-3 architectures enable the use of Quality
|
||||
of Service (QoS) to manage network performance.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
@ -220,17 +208,16 @@
|
||||
<para>The main limitation of layer 3 is that there is no built-in
|
||||
isolation mechanism comparable to the VLANs in layer-2
|
||||
networks. Furthermore, the hierarchical nature of IP addresses
|
||||
means that an instance will also be on the same subnet as its
|
||||
physical host. This means that it cannot be migrated outside
|
||||
means that an instance is on the same subnet as its
|
||||
physical host. This means that you cannot migrate it outside
|
||||
of the subnet easily. For these reasons, network
|
||||
virtualization needs to use IP <glossterm>encapsulation</glossterm>
|
||||
and software at
|
||||
the end hosts for both isolation, as well as for separation of
|
||||
the addressing in the virtual layer from addressing in the
|
||||
and software at the end hosts for isolation and the separation of
|
||||
the addressing in the virtual layer from the addressing in the
|
||||
physical layer. Other potential disadvantages of layer 3
|
||||
include the need to design an IP addressing scheme rather than
|
||||
relying on the switches to automatically keep track of the MAC
|
||||
addresses and to configure the interior gateway routing
|
||||
relying on the switches to keep track of the MAC
|
||||
addresses automatically and to configure the interior gateway routing
|
||||
protocol in the switches.</para>
|
||||
</section>
|
||||
</section>
|
||||
@ -242,13 +229,13 @@
|
||||
Data in an OpenStack cloud moves both between instances across
|
||||
the network (also known as East-West), as well as in and out
|
||||
of the system (also known as North-South). Physical server
|
||||
nodes have network requirements that are independent of those
|
||||
used by instances which need to be isolated from the core
|
||||
network to account for scalability. It is also recommended to
|
||||
functionally separate the networks for security purposes and
|
||||
tune performance through traffic shaping.</para>
|
||||
<para>A number of important general technical and business factors
|
||||
need to be taken into consideration when planning and
|
||||
nodes have network requirements that are independent of instance
|
||||
network requirements, which you must isolate from the core
|
||||
network to account for scalability. We recommend
|
||||
functionally separating the networks for security purposes and
|
||||
tuning performance through traffic shaping.</para>
|
||||
<para>You must consider a number of important general technical
|
||||
and business factors when planning and
|
||||
designing an OpenStack network. They include:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
@ -286,11 +273,10 @@
|
||||
future production environments.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>Keeping all of these in mind, the following network design
|
||||
recommendations can be made:</para>
|
||||
<para>Bearing in mind these considerations, we recommend the following:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Layer-3 designs are preferred over layer-2
|
||||
<para>Layer-3 designs are preferable to layer-2
|
||||
architectures.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
@ -327,16 +313,16 @@
|
||||
</itemizedlist></section>
|
||||
<section xml:id="additional-considerations-network-focus">
|
||||
<title>Additional considerations</title>
|
||||
<para>There are numerous topics to consider when designing a
|
||||
<para>There are several further considerations when designing a
|
||||
network-focused OpenStack cloud.</para>
|
||||
<section xml:id="openstack-networking-versus-nova-network">
|
||||
<title>OpenStack Networking versus legacy networking (nova-network)
|
||||
considerations</title>
|
||||
<para>Selecting the type of networking technology to implement
|
||||
<para>Selecting the type of networking technology to implement
|
||||
depends on many factors. OpenStack Networking (neutron) and
|
||||
legacy networking (nova-network) both have their advantages and disadvantages.
|
||||
They are both valid and supported options that fit different
|
||||
use cases as described in the following table.</para>
|
||||
legacy networking (nova-network) both have their advantages and
|
||||
disadvantages. They are both valid and supported options that fit
|
||||
different use cases:</para>
|
||||
<informaltable rules="all">
|
||||
<col width="40%" />
|
||||
<col width="60%" />
|
||||
@ -375,79 +361,75 @@
|
||||
<title>Redundant networking: ToR switch high availability
|
||||
risk analysis</title>
|
||||
<para>A technical consideration of networking is the idea that
|
||||
switching gear in the data center that should be installed
|
||||
you should install switching gear in a data center
|
||||
with backup switches in case of hardware failure.</para>
|
||||
<para>
|
||||
Research into the mean time between failures (MTBF) on switches
|
||||
<para>Research indicates the mean time between failures (MTBF) on switches
|
||||
is between 100,000 and 200,000 hours. This number is dependent
|
||||
on the ambient temperature of the switch in the data
|
||||
center. When properly cooled and maintained, this translates to
|
||||
between 11 and 22 years before failure. Even in the worst case
|
||||
of poor ventilation and high ambient temperatures in the data
|
||||
center, the MTBF is still 2-3 years. This is based on published
|
||||
research found at <link
|
||||
center, the MTBF is still 2-3 years. See <link
|
||||
xlink:href="http://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf">http://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf</link>
|
||||
and <link
|
||||
xlink:href="http://www.n-tron.com/pdf/network_availability.pdf">http://www.n-tron.com/pdf/network_availability.pdf</link>.
|
||||
</para>
|
||||
<para>In most cases, it is much more economical to only use a
|
||||
for further information.</para>
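<para>The conversion from MTBF hours to years quoted above can be checked with a short calculation. This is a minimal sketch: the helper name <literal>mtbf_years</literal> is hypothetical, and it assumes continuous operation and a 365-day year.</para>

```python
# Convert a mean time between failures (MTBF) figure from hours to years.
# Assumption: the switch runs continuously, and leap days are ignored.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def mtbf_years(mtbf_hours: float) -> float:
    """Return the MTBF expressed in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    # The two bounds quoted in the text for properly cooled switches.
    for hours in (100_000, 200_000):
        print(f"{hours:,} hours is about {mtbf_years(hours):.1f} years")
```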
|
||||
<para>In most cases, it is much more economical to use a
|
||||
single switch with a small pool of spare switches to replace
|
||||
failed units than it is to outfit an entire data center with
|
||||
redundant switches. Applications should also be able to
|
||||
tolerate rack level outages without affecting normal
|
||||
operations since network and compute resources are easily
|
||||
provisioned and plentiful.</para></section>
|
||||
redundant switches. Applications should tolerate rack level
|
||||
outages without affecting normal
|
||||
operations, since network and compute resources are easily
|
||||
provisioned and plentiful.</para>
|
||||
</section>
|
||||
<section xml:id="preparing-for-future-ipv6-support">
|
||||
<title>Preparing for the future: IPv6 support</title>
|
||||
<para>
|
||||
One of the most important networking topics today is the
|
||||
impending exhaustion of IPv4 addresses. In early 2014, ICANN
|
||||
announced that they started allocating the final IPv4 address
|
||||
blocks to the Regional Internet Registries (<link
|
||||
xlink:href="http://www.internetsociety.org/deploy360/blog/2014/05/goodbye-ipv4-iana-starts-allocating-final-address-blocks/">http://www.internetsociety.org/deploy360/blog/2014/05/goodbye-ipv4-iana-starts-allocating-final-address-blocks/</link>).
|
||||
This means the IPv4 address space is close to being fully
|
||||
allocated. As a result, it will soon become difficult to
|
||||
allocate more IPv4 addresses to an application that has
|
||||
experienced growth, or is expected to scale out, due to the lack
|
||||
of unallocated IPv4 address blocks.</para>
|
||||
<para>For network focused applications the future is the IPv6
|
||||
<para>One of the most important networking topics today is the
|
||||
impending exhaustion of IPv4 addresses. In early 2014, ICANN
|
||||
announced that it had started allocating the final IPv4 address
|
||||
blocks to the Regional Internet Registries (<link
|
||||
xlink:href="http://www.internetsociety.org/deploy360/blog/2014/05/goodbye-ipv4-iana-starts-allocating-final-address-blocks/">http://www.internetsociety.org/deploy360/blog/2014/05/goodbye-ipv4-iana-starts-allocating-final-address-blocks/</link>).
|
||||
This means the IPv4 address space is close to being fully
|
||||
allocated. As a result, it will soon become difficult to
|
||||
allocate more IPv4 addresses to an application that has
|
||||
experienced growth, or that you expect to scale out, due to the lack
|
||||
of unallocated IPv4 address blocks.</para>
|
||||
<para>For network-focused applications, the future is the IPv6
|
||||
protocol. IPv6 increases the address space significantly,
|
||||
fixes long standing issues in the IPv4 protocol, and will
|
||||
become essential for network focused applications in the
|
||||
future.</para>
|
||||
<para>OpenStack Networking supports IPv6 when configured to take advantage of
|
||||
the feature. To enable it, simply create an IPv6 subnet in
|
||||
<para>OpenStack Networking supports IPv6 when configured to take
|
||||
advantage of it. To enable IPv6, create an IPv6 subnet in
|
||||
Networking and use IPv6 prefixes when creating security
|
||||
groups.</para></section>
|
||||
<section xml:id="asymmetric-links">
|
||||
<title>Asymmetric links</title>
|
||||
<para>When designing a network architecture, the traffic patterns
|
||||
of an application will heavily influence the allocation of
|
||||
total bandwidth and the number of links that are used to send
|
||||
<para>When designing a network architecture, the traffic patterns
|
||||
of an application heavily influence the allocation of
|
||||
total bandwidth and the number of links that you use to send
|
||||
and receive traffic. Applications that provide file storage
|
||||
for customers will allocate bandwidth and links to favor
|
||||
incoming traffic, whereas video streaming applications will
|
||||
allocate bandwidth and links to favor outgoing traffic.</para></section>
|
||||
for customers allocate bandwidth and links to favor
|
||||
incoming traffic, whereas video streaming applications
|
||||
allocate bandwidth and links to favor outgoing traffic.</para>
|
||||
</section>
|
||||
<section xml:id="performance-network-focus">
|
||||
<title>Performance</title>
|
||||
<para>It is important to analyze the applications' tolerance for
|
||||
<para>It is important to analyze the applications' tolerance for
|
||||
latency and jitter when designing an environment to support
|
||||
network focused applications. Certain applications, for
|
||||
example VoIP, are less tolerant of latency and jitter. Where
|
||||
latency and jitter are concerned, certain applications may
|
||||
require tuning of QoS parameters and network device queues to
|
||||
ensure that they are queued for transmit immediately or
|
||||
guaranteed minimum bandwidth. Since OpenStack currently does
|
||||
not support these functions, some considerations may need to
|
||||
be made for the network plug-in selected.</para>
|
||||
<para>The location of a service may also impact the application or
|
||||
consumer experience. If an application is designed to serve
|
||||
differing content to differing users it will need to be
|
||||
designed to properly direct connections to those specific
|
||||
locations. Use a multi-site installation for these situations,
|
||||
where appropriate.</para>
|
||||
<para>Networking can be implemented in two separate
|
||||
ways. The legacy networking (nova-network) provides a flat DHCP network
|
||||
ensure that they queue for transmission immediately or
|
||||
guarantee minimum bandwidth. Since OpenStack currently does
|
||||
not support these functions, carefully consider your selected
|
||||
network plug-in.</para>
|
||||
<para>The location of a service may also impact the application or
|
||||
consumer experience. If an application serves
|
||||
differing content to different users, it must properly direct
|
||||
connections to those specific locations. Where appropriate,
|
||||
use a multi-site installation for these situations.</para>
|
||||
<para>You can implement networking in two separate
|
||||
ways. Legacy networking (nova-network) provides a flat DHCP network
|
||||
with a single broadcast domain. This implementation does not
|
||||
support tenant isolation networks or advanced plug-ins, but it
|
||||
is currently the only way to implement a distributed layer-3
|
||||
@ -457,15 +439,15 @@
|
||||
variety of network methods. Some of these include a layer-2
|
||||
only provider network model, external device plug-ins, or even
|
||||
OpenFlow controllers.</para>
|
||||
<para>Networking at large scales becomes a set of boundary
|
||||
<para>Networking at large scales becomes a set of boundary
|
||||
questions. The determination of how large a layer-2 domain
|
||||
needs to be is based on the amount of nodes within the domain
|
||||
must be is based on the number of nodes within the domain
|
||||
and the amount of broadcast traffic that passes between
|
||||
instances. Breaking layer-2 boundaries may require the
|
||||
implementation of overlay networks and tunnels. This decision
|
||||
is a balancing act between the need for a smaller overhead or
|
||||
a need for a smaller domain.</para>
|
||||
<para>When selecting network devices, be aware that making this
|
||||
<para>When selecting network devices, be aware that making this
|
||||
decision based on the greatest port density often comes with a
|
||||
drawback. Aggregation switches and routers have not all kept
|
||||
pace with Top of Rack switches and may induce bottlenecks on
|
||||
|
@ -6,187 +6,160 @@
|
||||
xml:id="user-requirements-network-focus">
|
||||
<?dbhtml stop-chunking?>
|
||||
<title>User requirements</title>
|
||||
<para>Network focused architectures vary from the general purpose
|
||||
designs. They are heavily influenced by a specific subset of
|
||||
applications that interact with the network in a more
|
||||
impacting way. Some of the business requirements that will
|
||||
influence the design include:</para>
|
||||
<para>Network-focused architectures vary from the general-purpose
|
||||
architecture designs. Certain network-intensive applications influence
|
||||
these architectures. Some of the business requirements that influence
|
||||
the design include:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>User experience: User experience is impacted by
|
||||
network latency through slow page loads, degraded
|
||||
video streams, and low quality VoIP sessions. Users
|
||||
are often not aware of how network design and
|
||||
architecture affects their experiences. Both
|
||||
enterprise customers and end-users rely on the network
|
||||
for delivery of an application. Network performance
|
||||
problems can provide a negative experience for the
|
||||
end-user, as well as productivity and economic loss.
|
||||
<para>Network latency through slow page loads, degraded video
|
||||
streams, and low quality VoIP sessions impacts the user
|
||||
experience. Users are often not aware of how network design and
|
||||
architecture affect their experiences. Both enterprise customers
|
||||
and end-users rely on the network for delivery of an application.
|
||||
Network performance problems can result in a negative experience
|
||||
for the end-user, as well as productivity and economic loss.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Regulatory requirements: Networks need to take into
|
||||
consideration any regulatory requirements about the
|
||||
physical location of data as it traverses the network.
|
||||
For example, Canadian medical records cannot pass
|
||||
outside of Canadian sovereign territory. Another
|
||||
network consideration is maintaining network
|
||||
segregation of private data flows and ensuring that
|
||||
the network between cloud locations is encrypted where
|
||||
required. Network architectures are affected by
|
||||
regulatory requirements for encryption and protection
|
||||
of data in flight as the data moves through various
|
||||
networks.</para>
|
||||
<para>Regulatory requirements: Consider regulatory
|
||||
requirements about the physical location of data as it traverses
|
||||
the network. In addition, maintain network segregation of private
|
||||
data flows while ensuring an encrypted network between cloud
|
||||
locations where required. Regulatory requirements for encryption
|
||||
and protection of data in flight affect network architectures as
|
||||
the data moves through various networks.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>Many jurisdictions have legislative and regulatory
|
||||
requirements governing the storage and management of data in
|
||||
cloud environments. Common areas of regulation include:</para>
|
||||
<para>Many jurisdictions have legislative and regulatory requirements
|
||||
governing the storage and management of data in cloud environments.
|
||||
Common areas of regulation include:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Data retention policies ensuring storage of
|
||||
persistent data and records management to meet data
|
||||
archival requirements.</para>
|
||||
<para>Data retention policies ensuring storage of persistent data
|
||||
and records management to meet data archival requirements.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Data ownership policies governing the possession and
|
||||
responsibility for data.</para>
|
||||
responsibility for data.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Data sovereignty policies governing the storage of
|
||||
data in foreign countries or otherwise separate
|
||||
jurisdictions.</para>
|
||||
<para>Data sovereignty policies governing the storage of data in
|
||||
foreign countries or otherwise separate jurisdictions.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Data compliance policies governing where information
|
||||
needs to reside in certain locations due to regular
|
||||
issues and, more importantly, where it cannot reside
|
||||
in other locations for the same reason.</para>
|
||||
<para>Data compliance policies governing where information can and
|
||||
cannot reside in certain locations.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>Examples of such legal frameworks include the data
|
||||
protection framework of the European Union (<link
|
||||
xlink:href="http://ec.europa.eu/justice/data-protection/">http://ec.europa.eu/justice/data-protection/</link>)
|
||||
and the requirements of the Financial Industry Regulatory
|
||||
Authority (<link
|
||||
xlink:href="http://www.finra.org/Industry/Regulation/FINRARules">http://www.finra.org/Industry/Regulation/FINRARules</link>)
|
||||
in the United States. Consult a local regulatory body for more
|
||||
information.</para>
|
||||
<para>Examples of such legal frameworks include the data protection
|
||||
framework of the European Union
|
||||
(<link xlink:href="http://ec.europa.eu/justice/data-protection/">http://ec.europa.eu/justice/data-protection/</link>)
|
||||
and the requirements of the Financial Industry Regulatory Authority
|
||||
(<link xlink:href="http://www.finra.org/Industry/Regulation/FINRARules">http://www.finra.org/Industry/Regulation/FINRARules</link>)
|
||||
in the United States. Consult a local regulatory body for more
|
||||
information.</para>
|
||||
<section xml:id="high-availability-issues-network-focus">
|
||||
<title>High availability issues</title>
|
||||
<para>OpenStack installations with high demand on network
|
||||
resources have high availability requirements that are
|
||||
determined by the application and use case. Financial
|
||||
transaction systems will have a much higher requirement for
|
||||
high availability than a development application. Forms of
|
||||
network availability, for example quality of service (QoS),
|
||||
can be used to improve the network performance of sensitive
|
||||
applications, for example VoIP and video streaming.</para>
|
||||
<para>Often, high performance systems will have SLA requirements
|
||||
for a minimum QoS with regard to guaranteed uptime, latency
|
||||
and bandwidth. The level of the SLA can have a significant
|
||||
impact on the network architecture and requirements for
|
||||
redundancy in the systems.</para></section>
|
||||
<para>Depending on the application and use case, network-intensive
|
||||
OpenStack installations can have high availability requirements.
|
||||
Financial transaction systems have a much higher requirement for high
|
||||
availability than a development application. Use network availability
|
||||
technologies, for example quality of service (QoS), to improve the
|
||||
network performance of sensitive applications such as VoIP and video
|
||||
streaming.</para>
|
||||
<para>High performance systems have SLA requirements for a minimum
|
||||
QoS with regard to guaranteed uptime, latency, and bandwidth. The level
|
||||
of the SLA can have a significant impact on the network architecture and
|
||||
requirements for redundancy in the systems.</para>
|
||||
</section>
|
||||
<section xml:id="risks-network-focus">
|
||||
<title>Risks</title>
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>Network misconfigurations</term>
|
||||
<listitem>
|
||||
<para>Configuring incorrect IP
|
||||
addresses, VLANs, and routes can cause outages to
|
||||
areas of the network or, in the worst-case scenario,
|
||||
the entire cloud infrastructure. Misconfigurations can
|
||||
cause disruptive problems and should be automated to
|
||||
minimize the opportunity for operator error.</para>
|
||||
<para>Configuring incorrect IP addresses, VLANs, and routes
|
||||
can cause outages to areas of the network or, in the worst-case
|
||||
scenario, the entire cloud infrastructure. Automate network
|
||||
configurations to minimize the opportunity for operator error
|
||||
as it can cause disruptive problems.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Capacity planning</term>
|
||||
<listitem>
|
||||
<para>Cloud networks need to be managed
|
||||
for capacity and growth over time. There is a risk
|
||||
that the network will not grow to support the
|
||||
workload. Capacity planning includes the purchase of
|
||||
network circuits and hardware that can potentially
|
||||
have lead times measured in months or more.</para>
|
||||
<para>Cloud networks require management for capacity and growth
|
||||
over time. Capacity planning includes the purchase of network
|
||||
circuits and hardware that can potentially have lead times
|
||||
measured in months or years.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Network tuning</term>
|
||||
<listitem>
|
||||
<para>Cloud networks need to be configured
|
||||
to minimize link loss, packet loss, packet storms,
|
||||
broadcast storms, and loops.</para>
|
||||
<para>Configure cloud networks to minimize link loss, packet loss,
|
||||
packet storms, broadcast storms, and loops.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Single Point Of Failure (SPOF)</term>
|
||||
<listitem>
|
||||
<para>High availability
|
||||
must be taken into account even at the physical and
|
||||
environmental layers. If there is a single point of
|
||||
failure due to only one upstream link, or only one
|
||||
power supply, an outage becomes unavoidable.</para>
|
||||
<para>Consider high availability at the physical and environmental
|
||||
layers. If there is a single point of failure due to only one
|
||||
upstream link, or only one power supply, an outage can become
|
||||
unavoidable.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Complexity</term>
|
||||
<listitem>
|
||||
<para>An overly complex network design becomes
|
||||
difficult to maintain and troubleshoot. While
|
||||
automated tools that handle overlay networks or device
|
||||
level configuration can mitigate this, non-traditional
|
||||
interconnects between functions and specialized
|
||||
hardware need to be well documented or avoided to
|
||||
prevent outages.</para>
|
||||
<para>An overly complex network design can be difficult to
|
||||
maintain and troubleshoot. While device-level configuration
|
||||
can ease maintenance concerns and automated tools can handle
|
||||
overlay networks, avoid or document non-traditional interconnects
|
||||
between functions and specialized hardware to prevent
|
||||
outages.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Non-standard features</term>
|
||||
<listitem>
|
||||
<para>There are additional risks
|
||||
that arise from configuring the cloud network to take
|
||||
advantage of vendor specific features. One example is
|
||||
multi-link aggregation (MLAG) that is being used to
|
||||
provide redundancy at the aggregator switch level of
|
||||
the network. MLAG is not a standard and, as a result,
|
||||
each vendor has their own proprietary implementation
|
||||
of the feature. MLAG architectures are not
|
||||
interoperable across switch vendors, which leads to
|
||||
vendor lock-in, and can cause delays or inability when
|
||||
upgrading components.</para>
|
||||
<para>There are additional risks that arise from configuring the
|
||||
cloud network to take advantage of vendor specific features.
|
||||
One example is multi-link aggregation (MLAG) used to provide
|
||||
redundancy at the aggregator switch level of the network. MLAG
|
||||
is not a standard and, as a result, each vendor has their own
|
||||
proprietary implementation of the feature. MLAG architectures
|
||||
are not interoperable across switch vendors, which leads to
|
||||
vendor lock-in, and can delay or even block upgrades of
|
||||
components.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</section>
|
||||
<section xml:id="security-network-focus"><title>Security</title>
|
||||
<para>Security is often overlooked or added after a design has
|
||||
been implemented. Consider security implications and
|
||||
requirements before designing the physical and logical network
|
||||
topologies. Some of the factors that need to be addressed
|
||||
include making sure the networks are properly segregated and
|
||||
traffic flows are going to the correct destinations without
|
||||
crossing through locations that are undesirable. Some examples
|
||||
of factors that need to be taken into consideration are:</para>
|
||||
<para>Users often overlook or add security after a design implementation.
|
||||
Consider security implications and requirements before designing the
|
||||
physical and logical network topologies. Make sure that the networks are
|
||||
properly segregated and traffic flows are going to the correct
|
||||
destinations without crossing through locations that are undesirable.
|
||||
Consider the following example factors:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Firewalls</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Overlay interconnects for joining separated tenant
|
||||
networks</para>
|
||||
<para>Overlay interconnects for joining separated tenant networks</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Routing through or avoiding specific networks</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>Another security vulnerability that must be taken into
|
||||
account is how networks are attached to hypervisors. If a
|
||||
network must be separated from other systems at all costs, it
|
||||
may be necessary to schedule instances for that network onto
|
||||
dedicated compute nodes. This may also be done to mitigate
|
||||
against exploiting a hypervisor breakout allowing the attacker
|
||||
access to networks from a compromised instance.</para>
|
||||
<para>How networks attach to hypervisors can expose security
|
||||
vulnerabilities. To mitigate against exploiting hypervisor breakouts,
|
||||
separate networks from other systems and schedule instances for the
|
||||
network onto dedicated compute nodes. This prevents attackers
|
||||
from having access to the networks from a compromised instance.</para>
|
||||
</section>
|
||||
</section>
|
||||