Multi-site chapter edits

1. Edits to the multi-site chapter
2. Removed duplicated legal content which was added to a common section. See https://review.openstack.org/#/c/212299/

Change-Id: I10e3a04650548454c73024d87cbbb6fda63454e8
Implements: blueprint arch-guide
darrenchan 2015-08-12 23:26:53 +10:00
parent 87ff7002f8
commit 68e8c66e79
6 changed files with 339 additions and 350 deletions


@ -6,16 +6,9 @@
xml:id="multi_site">
<title>Multi-site</title>
<para>OpenStack is capable of running in a multi-region
configuration. This enables some parts of OpenStack to
effectively manage a group of sites as a single cloud.</para>
<para>Some use cases that might indicate a need for a multi-site
deployment of OpenStack include:</para>
<itemizedlist>


@ -6,59 +6,61 @@
xml:id="arch-design-architecture-multiple-site">
<?dbhtml stop-chunking?>
<title>Architecture</title>
<para><xref linkend="multi-site_arch"/>
illustrates a high level multi-site OpenStack
architecture. Each site is an OpenStack cloud but it may be necessary
to architect the sites on different versions. For example, if the
second site is intended to be a replacement for the first site,
they would be different. Another common design would be a private
OpenStack cloud with a replicated site that would be used for high
availability or disaster recovery. The most important design decision
is configuring storage as a single shared pool or separate pools,
depending on user and technical requirements.</para>
<figure xml:id="multi-site_arch">
<title>Multi-site OpenStack architecture</title>
<mediaobject>
<imageobject>
<imagedata contentwidth="6in"
fileref="../figures/Multi-Site_shared_keystone_horizon_swift1.png"/>
</imageobject>
</mediaobject>
</figure>
<section xml:id="openstack-services-architecture">
<title>OpenStack services architecture</title>
<para>The OpenStack Identity service, which is used by all other
OpenStack components for authorization and the catalog of
service endpoints, supports the concept of regions. A region
is a logical construct used to group OpenStack services in
close proximity to one another. The concept of
regions is flexible; it may contain OpenStack service
endpoints located within a distinct geographic region or regions.
It may be smaller in scope, where a region is a single rack
within a data center, with multiple regions existing in adjacent
racks in the same data center.</para>
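The grouping of service endpoints by region can be illustrated with a small model; this is a sketch only, and the region names, service types, and URLs below are hypothetical:

```python
# Toy model of a region-scoped service catalog; region names, service
# types, and URLs are hypothetical, for illustration only.
CATALOG = {
    "RegionOne": {
        "compute": "https://compute.r1.example.com:8774/v2.1",
        "network": "https://network.r1.example.com:9696",
    },
    "RegionTwo": {
        "compute": "https://compute.r2.example.com:8774/v2.1",
        "network": "https://network.r2.example.com:9696",
    },
}

def endpoint_for(region: str, service: str) -> str:
    """Look up the endpoint for a service within one region."""
    try:
        return CATALOG[region][service]
    except KeyError:
        raise LookupError(f"no {service} endpoint in {region}") from None
```

A region here may equally well stand for a rack or chassis, as the text notes; only the grouping matters.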
<para>The majority of OpenStack components are designed to run
within the context of a single region. The OpenStack Compute
service is designed to manage compute resources within a region,
with support for subdivisions of compute resources by using
availability zones and cells. The OpenStack Networking service
can be used to manage network resources in the same broadcast
domain or collection of switches that are linked. The OpenStack
Block Storage service controls storage resources within a region
with all storage resources residing on the same storage network.
Like the OpenStack Compute service, the OpenStack Block Storage
service also supports the availability zone construct which can
be used to subdivide storage resources.</para>
<para>The OpenStack dashboard, OpenStack Identity, and OpenStack
Object Storage services are components that can each be deployed
centrally in order to serve multiple regions.</para>
</section>
<section xml:id="arch-multi-storage">
<title>Storage</title>
<para>With multiple OpenStack regions, it is recommended to configure
a single OpenStack Object Storage service endpoint to deliver
shared file storage for all regions. The Object Storage service
internally replicates files to multiple nodes which can be used
by applications or workloads in multiple regions. This simplifies
high availability failover and disaster recovery rollback.</para>
<para>In order to scale the Object Storage service to meet the workload
of multiple regions, multiple proxy workers are run and
load-balanced, storage nodes are installed in each region, and the
@ -68,19 +70,20 @@
reducing the actual load on the storage network. In addition to an
HTTP caching layer, use a caching layer like Memcache to cache
objects between the proxy and storage nodes.</para>
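The caching layer described above can be sketched as follows; this is a toy stand-in for Memcache, and <code>fetch_object</code> is a hypothetical backend call, not a real swift API:

```python
# Minimal sketch of a cache sitting between the swift proxy and the
# storage nodes, standing in for Memcache. fetch_object is a
# hypothetical backend call, for illustration only.
from typing import Callable, Dict

class ObjectCache:
    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        # On a miss, fall through to a storage node and remember the result.
        if key not in self._store:
            self._store[key] = self._fetch(key)
        return self._store[key]

backend_calls = []

def fetch_object(key: str) -> bytes:
    backend_calls.append(key)   # pretend this reads from a storage node
    return b"object-" + key.encode()

cache = ObjectCache(fetch_object)
```

Repeated requests for the same object hit the cache rather than the storage network, which is the load reduction the paragraph describes.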
<para>If the cloud is designed with a separate Object Storage
Service endpoint made available in each region, applications are
required to handle synchronization (if desired) and other management
operations to ensure consistency across the nodes. For some
applications, having multiple Object Storage Service endpoints
located in the same region as the application may be desirable due
to reduced latency, cross region bandwidth, and ease of
deployment.</para>
<note>
<para>For the Block Storage service, the most important decisions
are the selection of the storage technology, and whether
a dedicated network is used to carry storage traffic
from the storage service to the compute nodes.</para>
</note>
</section>
<section xml:id="arch-networking-multiple">
<title>Networking</title>
@ -100,18 +103,19 @@
</section>
<section xml:id="arch-dependencies-multiple">
<title>Dependencies</title>
<para>The architecture for a multi-site OpenStack installation
is dependent on a number of factors. One major dependency to
consider is storage. When designing the storage system, the
storage mechanism needs to be determined. Once the storage
type is determined, how it is accessed is critical. For example,
we recommend that storage should use a dedicated network.
Another concern is how the storage is configured to protect
the data, for example, the Recovery Point Objective (RPO) and
the Recovery Time Objective (RTO). How quickly recovery from
a fault can be completed determines how often data
replication is required. Ensure that enough storage is
allocated to support the data protection strategy.
</para>
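The RPO-driven replication cadence described above can be sketched as follows; this is a minimal illustration, and the safety factor is an assumed value, not an OpenStack setting:

```python
# Hedged sketch of deriving a replication interval from an RPO target.
# The 0.5 safety factor is an assumption, not an OpenStack default.
def replication_interval(rpo_minutes: float, safety_factor: float = 0.5) -> float:
    """Replicate well within the RPO window so that a single missed
    run still keeps the potential data loss inside the RPO."""
    if rpo_minutes <= 0:
        raise ValueError("RPO must be positive")
    return rpo_minutes * safety_factor
```

For example, a one-hour RPO with the assumed factor implies replicating roughly every 30 minutes; storage allocation then has to cover that replication cadence.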
<para>Networking decisions include the encapsulation mechanism that can
be used for the tenant networks, how large the broadcast domains
should be, and the contracted SLAs for the interconnects.</para>


@ -6,16 +6,14 @@
xml:id="operational-considerations-multi-site">
<?dbhtml stop-chunking?>
<title>Operational considerations</title>
<para>Multi-site OpenStack cloud deployment using regions
requires that the service catalog contains per-region entries
for each service deployed other than the Identity service. Most
off-the-shelf OpenStack deployment tools have limited support
for defining multiple regions in this fashion.</para>
<para>Deployers should be aware of this and provide the appropriate
customization of the service catalog for their site either
manually, or by customizing deployment tools in use.</para>
<note><para>As of the Kilo release, documentation for
implementing this feature is in progress. See this bug for
more information:
@ -31,51 +29,46 @@
host operating systems, guest operating systems, OpenStack
distributions (if applicable), software-defined infrastructure
including network controllers and storage systems, and even
individual applications need to be evaluated.</para>
<para>Topics to consider include:</para>
<itemizedlist>
<listitem>
<para>The definition of what constitutes a site
in the relevant licenses, as the term does not
necessarily denote a geographic or otherwise
physically isolated location.</para>
</listitem>
<listitem>
<para>Differentiations between "hot" (active) and "cold"
(inactive) sites, where significant savings may be made
in situations where one site is a cold standby for
disaster recovery purposes only.</para>
</listitem>
<listitem>
<para>Certain locations might require local vendors to
provide support and services for each site, which may vary
with the licensing agreement in place.</para>
</listitem>
</itemizedlist></section>
<section xml:id="logging-and-monitoring-multi-site">
<title>Logging and monitoring</title>
<para>Logging and monitoring does not significantly differ for a
multi-site OpenStack cloud. The tools described in the <link
xlink:href="http://docs.openstack.org/openstack-ops/content/logging_monitoring.html">Logging
and monitoring chapter</link> of the <citetitle>Operations
Guide</citetitle> remain applicable. Logging and monitoring
can be provided on a per-site basis, and in a common
centralized location.</para>
<para>When attempting to deploy logging and monitoring facilities
to a centralized location, care must be taken with the load
placed on the inter-site networking links.</para></section>
<section xml:id="upgrades-multi-site">
<title>Upgrades</title>
<para>In multi-site OpenStack clouds deployed using regions, sites
are independent OpenStack installations which are linked
together using shared centralized services such as OpenStack
Identity. At a high level the recommended order of operations
to upgrade an individual OpenStack environment is (see the <link
xlink:href="http://docs.openstack.org/openstack-ops/content/ops_upgrades-general-steps.html">Upgrades
chapter</link> of the <citetitle>Operations Guide</citetitle>
for details):</para>
@ -123,22 +116,20 @@
shared.</para>
</listitem>
</orderedlist>
<para>Compute upgrades within each site can also be performed in a rolling
fashion. Compute controller services (API, Scheduler, and
Conductor) can be upgraded prior to upgrading individual
compute nodes. This allows operations staff to keep a site
operational for users of Compute services while performing an
upgrade.</para></section>
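The ordering described above can be sketched as follows; the service and node names are examples only, not a complete upgrade procedure:

```python
# Illustrative sketch of the rolling-upgrade ordering described above:
# controller services (API, Scheduler, Conductor) before individual
# compute nodes. Names are examples only.
def upgrade_plan(controller_services, compute_nodes):
    plan = [("controller", svc) for svc in controller_services]
    # Compute nodes follow, one at a time, so the site stays available.
    plan += [("compute", node) for node in compute_nodes]
    return plan

plan = upgrade_plan(["api", "scheduler", "conductor"], ["compute-1", "compute-2"])
```

Because compute nodes are taken one at a time after the controllers, the rest of the site keeps serving users throughout the upgrade.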
<section xml:id="quota-management-multi-site">
<title>Quota management</title>
<para>Quotas are used to set operational limits to prevent system
capacities from being exhausted without notification. They are
currently enforced at the tenant (or project) level rather than
at the user level.</para>
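The quota-synchronization process mentioned here can be sketched as follows; the quota names mirror common Compute quotas, but the values and the approach are illustrative, not an OpenStack feature:

```python
# Sketch of synchronizing identical tenant quotas across regions by
# applying one template everywhere. Quota names mirror common Compute
# quotas; the values are examples.
QUOTA_TEMPLATE = {"instances": 10, "cores": 20, "ram_mb": 51200}

def sync_quotas(regions):
    """Give each region an identical copy of the tenant quota template."""
    return {region: dict(QUOTA_TEMPLATE) for region in regions}

quotas = sync_quotas(["RegionOne", "RegionTwo", "RegionThree"])
```

Each region receives its own copy of the limits, so later per-region adjustments do not leak into other regions.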
<para>Quotas are defined on a per-region basis. Operators can
define identical quotas for tenants in each region of the
cloud to provide a consistent experience, or even create a
process for synchronizing allocated quotas across regions. It
is important to note that only the operational limits imposed
@ -161,24 +152,22 @@
Control (RBAC) policies, defined in a <filename>policy.json</filename> file, for
each service. Operators edit these files to customize the
policies for their OpenStack installation. If the application
of consistent RBAC policies across sites is a requirement, then
it is necessary to ensure proper synchronization of the
<filename>policy.json</filename> files to all installations.</para>
<para>This must be done using system administration tools
such as rsync as functionality for synchronizing policies
across regions is not currently provided within OpenStack.</para></section>
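One way to check whether the <filename>policy.json</filename> files have drifted apart before synchronizing them with a tool such as rsync is to compare content digests; this is a hedged sketch, and the policy rules shown are examples:

```python
# Hedged sketch: detect policy.json drift between sites by comparing
# content digests. The policy rules shown are examples only.
import hashlib
import json

def policy_digest(policy: dict) -> str:
    # Canonical JSON so key ordering does not affect the digest.
    canonical = json.dumps(policy, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

site_a = {"compute:create": "rule:admin_or_owner"}
site_b = {"compute:create": "rule:admin_or_owner"}
in_sync = policy_digest(site_a) == policy_digest(site_b)
```

Matching digests mean the RBAC policies are consistent; a mismatch flags which site needs a fresh copy pushed to it.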
<section xml:id="documentation-multi-site">
<title>Documentation</title>
<para>Users must be able to leverage cloud infrastructure and
provision new resources in the environment. It is important
that user documentation is accessible by users to ensure they
are given sufficient information to help them leverage the cloud.
As an example, by default OpenStack schedules instances on a compute node
automatically. However, when multiple regions are available,
the end user needs to decide in which region to schedule the
new instance. The dashboard presents the user with
the first region in your configuration. The API and CLI tools
do not execute commands unless a valid region is specified.
It is therefore important to provide documentation to your


@ -22,10 +22,10 @@
very sensitive to latency and needs a rapid response to
end-users. After reviewing the user, technical and operational
considerations, it is determined beneficial to build a number
of regions local to the customer's edge. Rather than build a
few large, centralized data centers, the intent of the architecture
is to provide a pair of small data centers in locations that
are closer to the customer. In this use
case, spreading applications out allows for different
horizontal scaling than a traditional compute workload scale.
The intent is to scale by creating more copies of the
@ -60,44 +60,47 @@
expanding the capacity of all regions simultaneously,
therefore maximizing the cost-effectiveness of the multi-site
design.</para>
<para>One of the key decisions of running this infrastructure is
whether or not to provide a redundancy
model. Two types of redundancy and high availability models in
this configuration can be implemented. The first type
is the availability of central OpenStack
components. Keystone can be made highly available in three
central data centers that host the centralized OpenStack
components. This prevents a loss of any one of the regions
causing an outage in service. It also has the added benefit of
being able to run a central storage repository as a primary
cache for distributing content to each of the regions.</para>
<para>The second redundancy type is the edge data center itself.
A second data center in each of the edge regional
locations houses a second region near the first region. This
ensures that the application does not suffer degraded
performance in terms of latency and availability.</para>
<para><xref linkend="multi-site_customer_edge"/> depicts
the solution designed to have both a centralized set of core
data centers for OpenStack services and paired edge data centers:</para>
<figure xml:id="multi-site_customer_edge">
<title>Multi-site architecture example</title>
<mediaobject>
<imageobject>
<imagedata contentwidth="6in"
fileref="../figures/Multi-Site_Customer_Edge.png"/>
</imageobject>
</mediaobject>
</figure>
<section xml:id="geo-redundant-load-balancing">
<title>Geo-redundant load balancing</title>
<para>A large-scale web application has been designed with cloud
principles in mind. The application is designed to provide
service to an application store on a 24/7 basis. The company has a
typical two tier architecture with a web front-end servicing the
customer requests, and a NoSQL database back end storing the
information.</para>
<para>Lately, there have been several outages in a number of major
public cloud providers due to applications running out of
a single geographical location. The design therefore should
mitigate the chance of a single site causing an outage for their
business.</para>
<para>The solution would consist of the following OpenStack
components:</para>
<itemizedlist>
@ -108,12 +111,11 @@
<listitem>
<para>OpenStack Controller services (Networking,
dashboard, Block Storage, and Compute) running locally in
each of the three regions. Identity service, Orchestration
service, Telemetry service, Image service and
Object Storage can be installed centrally, with
nodes in each of the regions providing a redundant
OpenStack Controller plane throughout the globe.</para>
</listitem>
<listitem>
<para>OpenStack Compute nodes running the KVM
@ -126,9 +128,9 @@
replicated on a regular basis.</para>
</listitem>
<listitem>
<para>A distributed DNS service available to all
regions that allows for dynamic update of DNS
records of deployed instances.</para>
</listitem>
<listitem>
<para>A geo-redundant load balancing service can be used
@ -153,10 +155,10 @@
</listitem>
</itemizedlist>
<para>Another autoscaling Heat template can be used to deploy a
distributed MongoDB shard over the three locations, with the
option of storing required data on a globally available swift
container. According to the usage and load on the database
server, additional shards can be provisioned according to
the thresholds defined in Telemetry.</para>
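The threshold-driven shard provisioning can be sketched as follows; the per-shard capacity and the three-shard minimum (one per region) are assumptions for illustration, not Telemetry defaults:

```python
# Toy model of threshold-driven MongoDB shard provisioning, analogous
# to the Telemetry-triggered scaling described above. ops_per_shard and
# the three-shard minimum are illustrative assumptions.
import math

def shards_needed(load_ops: int, ops_per_shard: int = 1000, minimum: int = 3) -> int:
    """At least one shard per region, plus capacity for the load."""
    return max(minimum, math.ceil(load_ops / ops_per_shard))
```

When the measured load crosses a threshold, the computed shard count rises, and the autoscaling template provisions the difference.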
<!-- <para>The reason that three regions were selected here was because of
the fear of having abnormal load on a single region in the
@ -169,57 +171,66 @@
autoscaling and auto healing in the event of increased load.
Additional configuration management tools, such as Puppet or
Chef could also have been used in this scenario, but were not
chosen since Orchestration had the appropriate built-in
hooks into the OpenStack cloud, whereas the other tools were
external and not native to OpenStack. In addition, external
tools were not needed since this deployment scenario was
straightforward.</para>
<para>OpenStack Object Storage is used here to serve as a back end for
the Image service since it is the most suitable solution for a
globally distributed storage solution with its own
replication mechanism. Home grown solutions could also have
been used including the handling of replication, but were not
chosen, because Object Storage is already an integral part of the
infrastructure and a proven solution.</para>
<para>An external load balancing service was used and not the
LBaaS in OpenStack because the solution in OpenStack is not
redundant and does not have any awareness of geo location.</para>
<figure xml:id="multi-site_geo_redundant">
<title>Multi-site geo-redundant architecture</title>
<mediaobject>
<imageobject>
<imagedata contentwidth="6in"
fileref="../figures/Multi-site_Geo_Redundant_LB.png"/>
</imageobject>
</mediaobject>
</figure>
</section>
<section xml:id="location-local-services">
<title>Location-local service</title>
<para>A common use for multi-site OpenStack deployment is
creating a Content Delivery Network. An application that
uses a location-local architecture requires low network
latency and proximity to the user to provide an
optimal user experience and reduce the cost of bandwidth and
transit. The content resides on sites closer to the customer,
instead of a centralized content store that requires utilizing
higher cost cross-country links.</para>
<para>This architecture includes a geo-location component
that places user requests to the closest possible node. In
this scenario, 100% redundancy of content across every site is
a goal rather than a requirement, with the intent to
maximize the amount of content available within a
minimum number of network hops for end users. Despite
these differences, the storage replication configuration has
significant overlap with that of a geo-redundant load
balancing use case.</para>
<para>In <xref linkend="multi-site_shared_shared_keystone"/>,
the application utilizing this multi-site OpenStack install
that is location-aware would launch web server or content
serving instances on the compute cluster in each site. Requests
from clients are first sent to a global services load balancer
that determines the location of the client, then routes the
request to the closest OpenStack site where the application
completes the request.</para>
<figure xml:id="multi-site_shared_shared_keystone">
<title>Multi-site shared keystone architecture</title>
<mediaobject>
<imageobject>
<imagedata contentwidth="6in"
fileref="../figures/Multi-Site_shared_keystone1.png"/>
</imageobject>
</mediaobject>
</figure>
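The routing decision made by the global services load balancer can be sketched as follows; the region names and latency figures are invented for illustration:

```python
# Hedged sketch of the global services load balancer's routing choice:
# send the client to the region with the lowest measured latency.
# Region names and latency figures are invented for illustration.
def closest_region(latencies_ms: dict) -> str:
    return min(latencies_ms, key=latencies_ms.get)

chosen = closest_region({"RegionOne": 18.0, "RegionTwo": 55.0, "RegionThree": 92.0})
```

Real geo-location services typically combine such latency probes with GeoIP data, but the selection principle is the same.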
</section>
</section>


@ -27,105 +27,108 @@
high-bandwidth links available between them, it may be wise to
configure a separate storage replication network between the
two sites to support a single Swift endpoint and a shared
Object Storage capability between them. An example of this
technique, as well as a configuration walk-through, is
available at <link
xlink:href="http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network">http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network</link>.
Another option in this scenario is to build a dedicated set of
tenant private networks across the secondary link, using
overlay networks with a third party mapping the site overlays
to each other.</para>
<para>The capacity requirements of the links between sites is
driven by application behavior. If the link latency is
too high, certain applications that use a large number of
small packets, for example RPC calls, may encounter issues
communicating with each other or operating properly.
Additionally, OpenStack may encounter similar types of issues.
To mitigate this, Identity service call timeouts can be
tuned to prevent issues authenticating against a central
Identity service.</para>
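For example, assuming the services validate tokens through keystonemiddleware, the relevant timeouts can be raised in each service's configuration file; the option names exist in keystonemiddleware, but the values below are illustrative, not recommendations:

```ini
[keystone_authtoken]
# Tolerate higher-latency links to the central Identity service.
# Values are examples; tune them to the measured inter-site latency.
http_connect_timeout = 10
http_request_max_retries = 5
```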
<para>Another network capacity consideration for a multi-site
deployment is the amount and performance of overlay networks
available for tenant networks. If using shared tenant networks
across zones, it is imperative that an external overlay manager
or controller be used to map these overlays together. It is
necessary to ensure the number of possible IDs between the zones
is identical.</para>
<note>
<para>As of the Kilo release, OpenStack Networking was not
capable of managing tunnel IDs across installations. So if
one site runs out of IDs, but another does not, that tenant's
network is unable to reach the other site.</para>
</note>
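Because the tunnel ID spaces must be kept identical by the operator, a simple pre-deployment check is to intersect each site's configured range. This is a minimal sketch under the assumption that each site's usable IDs form one contiguous inclusive range.

```python
def common_id_range(site_ranges):
    """Return the tunnel/VNI ID range usable across every site.

    ``site_ranges`` maps a site name to an inclusive
    ``(first_id, last_id)`` tuple. Since Networking does not manage
    tunnel IDs across installations, the sites must agree up front;
    the safe shared range is the intersection of the local ranges.
    """
    lows, highs = zip(*site_ranges.values())
    low, high = max(lows), min(highs)
    if low > high:
        raise ValueError("sites share no common tunnel ID range")
    return low, high
```

For example, sites configured with ranges (1, 4000) and (1000, 10000) can only safely share IDs 1000 through 4000.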
<para>Capacity can take other forms as well. The ability for a
region to grow depends on scaling out the number of available
compute nodes. This topic is covered in greater detail in the
section for compute-focused deployments. However, it may be
necessary to grow cells in an individual region, depending on
the size of your cluster and the ratio of virtual machines per
hypervisor.</para>
<para>A third form of capacity comes in the multi-region-capable
components of OpenStack. Centralized Object Storage is capable
of serving objects through a single namespace across multiple
regions. Since this works by accessing the object store through
swift proxy, it is possible to overload the proxies. There are
two options available to mitigate this issue:</para>
<itemizedlist>
<listitem>
<para>Deploy a large number of swift proxies. The drawback is
that the proxies are not load-balanced and a large file
request could continually hit the same proxy.</para>
</listitem>
<listitem>
<para>Add a caching HTTP proxy and load balancer in front of
the swift proxies. Since swift objects are returned to the
requester via HTTP, this load balancer would alleviate the
load required on the swift proxies.</para>
</listitem>
</itemizedlist>
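The drawback of the first option can be illustrated with a toy client: if the client selects a proxy deterministically, for example by hashing the object name, every request for one popular object lands on the same proxy and can overload it. The proxy URLs here are invented for illustration.

```python
import hashlib

# Hypothetical proxy endpoints; in a real deployment these would be
# the URLs of the deployed swift proxy servers.
PROXIES = [
    "https://proxy1.example.com:8080",
    "https://proxy2.example.com:8080",
    "https://proxy3.example.com:8080",
]

def pick_proxy(object_name):
    """Pick a proxy by hashing the object name.

    This spreads *distinct* objects across proxies, but every request
    for the same object always hits the same proxy, which is exactly
    the hotspot a front-end caching load balancer avoids.
    """
    digest = int(hashlib.sha1(object_name.encode()).hexdigest(), 16)
    return PROXIES[digest % len(PROXIES)]
```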
<section xml:id="utilization-multi-site"><title>Utilization</title>
<para>While constructing a multi-site OpenStack environment is the
goal of this guide, the real test is whether an application
can utilize it.</para>
<para>The Identity service is normally the first interface for
OpenStack users and is required for almost all major operations
within OpenStack. Therefore, it is important that you provide users
with a single URL for Identity service authentication, and
document the configuration of regions within the Identity service.
Each of the sites defined in your installation is considered
to be a region in Identity nomenclature. This is important for
the users, as it is required to define the region name when
providing actions to an API endpoint or in the dashboard.</para>
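A client selects its endpoints per region from the service catalog returned by the Identity service. The sketch below uses a deliberately simplified catalog structure with invented URLs to show the region-keyed lookup.

```python
# Simplified, hypothetical view of a service catalog; a real client
# parses the catalog returned by the Identity service at auth time.
CATALOG = {
    "object-store": {
        "RegionOne": "https://swift.site1.example.com:8080/v1",
        "RegionTwo": "https://swift.site2.example.com:8080/v1",
    },
}

def endpoint_for(service_type, region):
    """Look up the endpoint for a service in a named region."""
    try:
        return CATALOG[service_type][region]
    except KeyError:
        raise LookupError(
            "no %s endpoint registered for region %s"
            % (service_type, region))
```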
<para>Load balancing is another common issue with multi-site
installations. While it is still possible to run HAProxy
instances with Load-Balancer-as-a-Service, these are limited
to a specific region. Some applications can manage this using
internal mechanisms. Other applications may require the
implementation of an external system, including global services
load balancers or anycast-advertised DNS.</para>
<para>Depending on the storage model chosen during site design,
storage replication and availability are also a concern
for end-users. If an application can support regions, then it
is possible to keep the object storage system separated by region.
In this case, users who want to have an object available to
more than one region need to perform cross-site replication.
However, with a centralized swift proxy, the user may need to
benchmark the replication timing of the Object Storage back end.
Benchmarking allows the operational staff to provide users with
an understanding of the amount of time required for a stored or
modified object to become available to the entire environment.</para>
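Such a benchmark can be as simple as writing an object via one site's endpoint and polling another site until it appears. This sketch stands in dict-like stores for the two sites' object endpoints; a real benchmark would wrap swift clients pointed at the two regions.

```python
import time

def replication_lag(write_site, read_site, key, value,
                    timeout=60.0, poll_interval=0.5):
    """Write an object at one site and poll another site until it is
    visible there, returning the elapsed seconds.

    ``write_site`` and ``read_site`` are any dict-like views of the
    two sites' object stores (an assumption made for illustration).
    """
    write_site[key] = value
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if read_site.get(key) == value:
            return time.monotonic() - start
        time.sleep(poll_interval)
    raise TimeoutError("object %r not replicated within %ss" % (key, timeout))
```

Running this periodically gives operational staff the replication-lag figures to share with users.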
</section>
<section xml:id="performance"><title>Performance</title>
<para>Determining the performance of a multi-site installation
involves considerations that do not come into play in a
single-site deployment. Because they are distributed,
performance in multi-site deployments may be affected in certain
situations.</para>
<para>Since multi-site systems can be geographically separated,
there may be greater latency or jitter when communicating across
regions. This can especially impact systems like the OpenStack
Identity service when making authentication attempts from regions
that do not contain the centralized Identity implementation. It
can also affect applications which rely on Remote Procedure Call (RPC)
for normal operation. An example of this can be seen in high
performance computing workloads.</para>
<para>Storage availability can also be impacted by the
architecture of a multi-site deployment. A centralized Object
Storage service requires more time for an object to be
@ -137,4 +140,37 @@
to manually cope with this limitation by creating duplicate
block storage entries in each region.</para>
</section>
<section xml:id="openstack-components_multi-site">
<title>OpenStack components</title>
<para>Most OpenStack installations require a bare minimum set of
components to function. These include OpenStack Identity
(keystone) for authentication, OpenStack Compute
(nova) for compute, OpenStack Image service (glance) for image
storage, OpenStack Networking (neutron) for networking, and
potentially an object store in the form of OpenStack Object
Storage (swift). Deploying a multi-site installation also demands extra
components in order to coordinate between regions. A centralized
Identity service is necessary to provide the single authentication
point. A centralized dashboard is also recommended to provide a
single login point and a mapping to the API and CLI
options available. A centralized Object Storage service may also
be used, but will require the installation of the swift proxy
service.</para>
<para>It may also be helpful to install a few optional
services to facilitate certain use cases. For example,
installing Designate may assist in automatically generating
DNS domains for each region with an automatically-populated
zone full of resource records for each instance. This
facilitates using DNS as a mechanism for determining which
region will be selected for certain applications.</para>
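For illustration, a one-zone-per-region naming scheme might produce instance records like the following. The scheme and the `example.com` domain are assumptions; Designate would create and populate such zones automatically.

```python
def instance_fqdn(instance_name, region, zone="example.com"):
    """Build the DNS name for an instance under a per-region zone.

    The one-zone-per-region layout and the ``example.com`` domain are
    illustrative assumptions, not a Designate default.
    """
    return "%s.%s.%s" % (instance_name, region.lower(), zone)
```

An application can then steer traffic to a region simply by resolving the region-qualified name.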
<para>Another useful tool for managing a multi-site installation
is Orchestration (heat). The Orchestration module allows the
use of templates to define a set of instances to be launched
together or for scaling existing sets. It can also be used to
set up matching or differentiated groupings based on
regions. For instance, if an application requires an equally
balanced number of nodes across sites, the same heat template
can be used to cover each site with small alterations to only
the region name.</para>
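The per-region launch described above can be sketched as follows. The region names, template file name, and parameter names are hypothetical; the point is that one template is reused with only region-specific values changing.

```python
# Hypothetical regions and template name for illustration only.
REGIONS = ["RegionOne", "RegionTwo", "RegionThree"]

def stack_request(region, node_count=3):
    """Build the launch parameters for one region's copy of the stack."""
    return {
        "stack_name": "app-%s" % region.lower(),
        "template_file": "app-stack.yaml",  # assumed template
        "parameters": {"region_name": region,
                       "node_count": node_count},
    }

# One equally sized deployment per site:
stack_requests = [stack_request(region) for region in REGIONS]
```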
</section>
</section>


@ -6,55 +6,16 @@
xml:id="user-requirements-multi-site">
<?dbhtml stop-chunking?>
<title>User requirements</title>
<section xml:id="workload-characteristics">
<title>Workload characteristics</title>
<para>An understanding of the expected workloads for a desired
multi-site environment and use case is an important factor in
the decision-making process. In this context, <literal>workload</literal>
refers to the way the systems are used. A workload could be a
single application or a suite of applications that work together.
It could also be a duplicate set of applications that need to
run in multiple cloud environments. Often in a multi-site deployment,
the same workload will need to work identically in more than one
physical location.</para>
<para>This multi-site scenario likely includes one or more of the
other scenarios in this book with the additional requirement
@ -72,26 +33,26 @@
<title>Consistency of images and templates across different
sites</title>
<para>It is essential that the deployment of instances is
consistent across the different sites and built
into the infrastructure. If the OpenStack Object Storage is used as
a back end for the Image service, it is possible to create repositories
of consistent images across multiple sites. Having central
endpoints with multiple storage nodes allows consistent centralized
storage for every site.</para>
<para>Not using a centralized object store increases the operational
overhead of maintaining a consistent image library. This
could include development of a replication mechanism to handle
the transport of images and the changes to the images across
multiple sites.</para></section>
<section xml:id="high-availability-multi-site">
<title>High availability</title>
<para>If high availability is a requirement to provide continuous
infrastructure operations, a basic requirement of high
availability should be defined.</para>
<para>The OpenStack management components need to have a basic and
minimal level of redundancy. The simplest example is that the loss
of any single site should have minimal impact on the
availability of the OpenStack services.</para>
<para>The <link
xlink:href="http://docs.openstack.org/high-availability-guide/content/"><citetitle>OpenStack
High Availability Guide</citetitle></link>
@ -111,14 +72,12 @@
WAN network design between the sites.</para>
<para>Connecting more than two sites increases the challenges and
adds more complexity to the design considerations. Multi-site
implementations require planning to address the additional
topology used for internal and external connectivity. Some options
include full mesh, hub-and-spoke, spine-leaf, and 3D torus.</para>
<para>If applications running in a cloud are not cloud-aware, there
should be clear measures and expectations to define what the
infrastructure can and cannot support. An example would be
shared storage between sites. It is possible; however, such a
solution is not native to OpenStack and requires a third-party
hardware vendor to fulfill the requirement. Another example
@ -126,21 +85,21 @@
in object storage directly. These applications need to be
cloud aware to make good use of an OpenStack Object
Store.</para></section>
<section xml:id="application-readiness">
<title>Application readiness</title>
<para>Some applications are tolerant of the lack of synchronized
object storage, while others may need those objects to be
replicated and available across regions. Understanding how
the cloud implementation impacts new and existing applications
is important for risk mitigation and the overall success of a
cloud project. Applications may have to be written or rewritten
for an infrastructure with little to no redundancy, or with the
cloud in mind.</para></section>
<section xml:id="cost-multi-site">
<title>Cost</title>
<para>A greater number of sites increases cost and complexity for a
multi-site deployment. Costs can be broken down into the following
categories:</para>
<itemizedlist>
<listitem>
<para>Compute resources</para>
@ -163,34 +122,32 @@
</itemizedlist></section>
<section xml:id="site-loss-and-recovery">
<title>Site loss and recovery</title>
<para>Outages can cause partial or full loss of site functionality.
Strategies should be implemented to understand and plan for recovery
scenarios.</para>
<itemizedlist>
<listitem>
<para>The deployed applications need to continue to
function and, more importantly, you must consider the
impact on the performance and reliability of the application
when a site is unavailable.</para>
</listitem>
<listitem>
<para>It is important to understand what happens to the
replication of objects and data between the sites when
a site goes down. If this causes queues to start
building up, consider how long these queues can
safely exist until an error occurs.</para>
</listitem>
<listitem>
<para>After an outage, ensure the method for resuming proper
operations of a site is implemented when it comes back online.
We recommend you architect the recovery to avoid race conditions.</para>
</listitem>
</itemizedlist></section>
<section xml:id="compliance-and-geo-location-multi-site">
<title>Compliance and geo-location</title>
<para>An organization may have certain legal obligations and
regulatory compliance measures which could require that certain
workloads or data not be located in certain regions.</para></section>
<section xml:id="auditing-multi-site">
@ -210,11 +167,10 @@
site.</para></section>
<section xml:id="authentication-between-sites">
<title>Authentication between sites</title>
<para>We recommend a single authentication domain
rather than a separate implementation for each and every
site. This requires an authentication mechanism that is highly
available and distributed to ensure continuous operation.
Authentication server locality might be required and should be
planned for.</para></section>
</section>