Multi-site chapter edits

1. Edits to the multi-site chapter
2. Removed duplicated legal content which was added to a common section. See https://review.openstack.org/#/c/212299/

Change-Id: I10e3a04650548454c73024d87cbbb6fda63454e8
Implements: blueprint arch-guide
darrenchan
2015-08-12 23:26:53 +10:00
parent 87ff7002f8
commit 68e8c66e79
6 changed files with 339 additions and 350 deletions


@@ -6,16 +6,9 @@
xml:id="multi_site"> xml:id="multi_site">
<title>Multi-site</title> <title>Multi-site</title>
<para>A multi-site OpenStack environment is one in which services, <para>OpenStack is capable of running in a multi-region
located in more than one data center, are used to provide the
overall solution. Usage requirements of different multi-site
clouds may vary widely, but they share some common needs.
OpenStack is capable of running in a multi-region
configuration. This enables some parts of OpenStack to configuration. This enables some parts of OpenStack to
effectively manage a group of sites as a single cloud. With effectively manage a group of sites as a single cloud.</para>
careful planning in the design phase, OpenStack can act as an
excellent multi-site cloud solution for a multitude of
needs.</para>
<para>Some use cases that might indicate a need for a multi-site <para>Some use cases that might indicate a need for a multi-site
deployment of OpenStack include:</para> deployment of OpenStack include:</para>
<itemizedlist> <itemizedlist>


@@ -6,59 +6,61 @@
xml:id="arch-design-architecture-multiple-site"> xml:id="arch-design-architecture-multiple-site">
<?dbhtml stop-chunking?> <?dbhtml stop-chunking?>
<title>Architecture</title> <title>Architecture</title>
<para>This graphic is a high level diagram of a multi-site OpenStack <para><xref linkend="multi-site_arch"/>
architecture. Each site is an OpenStack cloud but it may be necessary to illustrates a high level multi-site OpenStack
architect the sites on different versions. For example, if the second architecture. Each site is an OpenStack cloud but it may be necessary
site is intended to be a replacement for the first site, they would be to architect the sites on different versions. For example, if the
different. Another common design would be a private OpenStack cloud with second site is intended to be a replacement for the first site,
replicated site that would be used for high availability or disaster they would be different. Another common design would be a private
recovery. The most important design decision is how to configure the OpenStack cloud with a replicated site that would be used for high
storage. It can be configured as a single shared pool or separate pools, availability or disaster recovery. The most important design decision
depending on the user and technical requirements.</para> is configuring storage as a single shared pool or separate pools,
depending on user and technical requirements.</para>
<figure xml:id="multi-site_arch">
<title>Multi-site OpenStack architecture</title>
<mediaobject> <mediaobject>
<imageobject> <imageobject>
<imagedata contentwidth="4in" <imagedata contentwidth="6in"
fileref="../figures/Multi-Site_shared_keystone_horizon_swift1.png"/> fileref="../figures/Multi-Site_shared_keystone_horizon_swift1.png"/>
</imageobject> </imageobject>
</mediaobject> </mediaobject>
</figure>
<section xml:id="openstack-services-architecture"> <section xml:id="openstack-services-architecture">
<title>OpenStack services architecture</title> <title>OpenStack services architecture</title>
<para>The OpenStack Identity service, which is used by all other <para>The OpenStack Identity service, which is used by all other
OpenStack components for authorization and the catalog of service OpenStack components for authorization and the catalog of
endpoints, supports the concept of regions. A region is a logical service endpoints, supports the concept of regions. A region
construct that can be used to group OpenStack services that are in is a logical construct used to group OpenStack services in
close proximity to one another. The concept of regions is flexible; close proximity to one another. The concept of
it can contain OpenStack service endpoints located within a regions is flexible; it can contain OpenStack service
distinct geographic region, or regions. It may be smaller in scope, endpoints located within a distinct geographic region or regions.
where a region is a single rack within a data center or even a It may be smaller in scope, where a region is a single rack
single blade chassis, with multiple regions existing in adjacent within a data center, with multiple regions existing in adjacent
racks in the same data center.</para> racks in the same data center.</para>
<para>The majority of OpenStack components are designed to run within <para>The majority of OpenStack components are designed to run
the context of a single region. The OpenStack Compute service is within the context of a single region. The OpenStack Compute
designed to manage compute resources within a region, with support service is designed to manage compute resources within a region,
for subdivisions of compute resources by using availability zones with support for subdivisions of compute resources by using
and cells. The OpenStack Networking service can be used to manage availability zones and cells. The OpenStack Networking service
network resources in the same broadcast domain or collection of can be used to manage network resources in the same broadcast
switches that are linked. The OpenStack Block Storage service domain or collection of switches that are linked. The OpenStack
controls storage resources within a region with all storage Block Storage service controls storage resources within a region
resources residing on the same storage network. Like the OpenStack with all storage resources residing on the same storage network.
Compute service, the OpenStack Block Storage service also supports Like the OpenStack Compute service, the OpenStack Block Storage
the availability zone construct which can be used to subdivide service also supports the availability zone construct which can
storage resources.</para> be used to subdivide storage resources.</para>
<para>The OpenStack dashboard, OpenStack Identity, and OpenStack <para>The OpenStack dashboard, OpenStack Identity, and OpenStack
Object Storage services are components that can each be deployed Object Storage services are components that can each be deployed
centrally in order to serve multiple regions.</para> centrally in order to serve multiple regions.</para>
</section> </section>
<section xml:id="arch-multi-storage"> <section xml:id="arch-multi-storage">
<title>Storage</title> <title>Storage</title>
<para>With multiple OpenStack regions, having a single OpenStack Object <para>With multiple OpenStack regions, it is recommended to configure
Storage service endpoint that delivers shared file storage for all a single OpenStack Object Storage service endpoint to deliver
regions is desirable. The Object Storage service internally shared file storage for all regions. The Object Storage service
replicates files to multiple nodes. The advantages of this are that, internally replicates files to multiple nodes which can be used
if a file placed into the Object Storage service is visible to all by applications or workloads in multiple regions. This simplifies
regions, it can be used by applications or workloads in any or all high availability failover and disaster recovery rollback.</para>
of the regions. This simplifies high availability failover and
disaster recovery rollback.</para>
<para>In order to scale the Object Storage service to meet the workload <para>In order to scale the Object Storage service to meet the workload
of multiple regions, multiple proxy workers are run and of multiple regions, multiple proxy workers are run and
load-balanced, storage nodes are installed in each region, and the load-balanced, storage nodes are installed in each region, and the
@@ -68,19 +70,20 @@
reducing the actual load on the storage network. In addition to an reducing the actual load on the storage network. In addition to an
HTTP caching layer, use a caching layer like Memcache to cache HTTP caching layer, use a caching layer like Memcache to cache
objects between the proxy and storage nodes.</para> objects between the proxy and storage nodes.</para>
<para>If the cloud is designed without a single Object Storage Service <para>If the cloud is designed with a separate Object Storage
endpoint for multiple regions, and instead a separate Object Storage Service endpoint made available in each region, applications are
Service endpoint is made available in each region, applications are
required to handle synchronization (if desired) and other management required to handle synchronization (if desired) and other management
operations to ensure consistency across the nodes. For some operations to ensure consistency across the nodes. For some
applications, having multiple Object Storage Service endpoints applications, having multiple Object Storage Service endpoints
located in the same region as the application may be desirable due located in the same region as the application may be desirable due
to reduced latency, cross region bandwidth, and ease of to reduced latency, cross region bandwidth, and ease of
deployment.</para> deployment.</para>
<para>For the Block Storage service, the most important decisions are <note>
the selection of the storage technology and whether or not a <para>For the Block Storage service, the most important decisions
dedicated network is used to carry storage traffic from the storage are the selection of the storage technology, and whether
service to the compute nodes.</para> a dedicated network is used to carry storage traffic
from the storage service to the compute nodes.</para>
</note>
</section> </section>
<section xml:id="arch-networking-multiple"> <section xml:id="arch-networking-multiple">
<title>Networking</title> <title>Networking</title>
@@ -100,18 +103,19 @@
</section> </section>
<section xml:id="arch-dependencies-multiple"> <section xml:id="arch-dependencies-multiple">
<title>Dependencies</title> <title>Dependencies</title>
<para>The architecture for a multi-site installation of OpenStack is <para>The architecture for a multi-site OpenStack installation
dependent on a number of factors. One major dependency to consider is dependent on a number of factors. One major dependency to
is storage. When designing the storage system, the storage mechanism consider is storage. When designing the storage system, the
needs to be determined. Once the storage type is determined, how it storage mechanism needs to be determined. Once the storage
is accessed is critical. For example, we recommend that type is determined, how it is accessed is critical. For example,
storage should use a dedicated network. Another concern is how we recommend that storage should use a dedicated network.
the storage is configured to protect the data. For example, the Another concern is how the storage is configured to protect
recovery point objective (RPO) and the recovery time objective the data. For example, the Recovery Point Objective (RPO) and
(RTO). How quickly can the recovery from a fault be completed, the Recovery Time Objective (RTO). How quickly recovery from
determines how often the replication of data is required. Ensure that a fault can be completed, determines how often the replication of
enough storage is allocated to support the data protection data is required. Ensure that enough storage is allocated to
strategy.</para> support the data protection strategy.
</para>
<para>Networking decisions include the encapsulation mechanism that can <para>Networking decisions include the encapsulation mechanism that can
be used for the tenant networks, how large the broadcast domains be used for the tenant networks, how large the broadcast domains
should be, and the contracted SLAs for the interconnects.</para> should be, and the contracted SLAs for the interconnects.</para>
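
The services-architecture text in the diff above describes regions as logical groupings of service endpoints, with Identity and Object Storage able to serve several regions centrally. The following is a minimal Python sketch of that lookup rule, not Keystone's actual implementation: the catalog layout, the `resolve_endpoint` helper, the region names, and the example URLs are all hypothetical, and using `None` to mark a centrally shared service is a simplification made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified service catalog. A region of None marks a service
# that is deployed centrally and shared by every region (an assumption made
# for this sketch; it is not how a real catalog encodes shared services).
CATALOG: List[Dict[str, Optional[str]]] = [
    {"service": "identity", "region": None,
     "url": "https://identity.example.com:5000/v3"},
    {"service": "object-store", "region": None,
     "url": "https://swift.example.com:8080/v1"},
    {"service": "compute", "region": "RegionOne",
     "url": "https://compute.region-one.example.com:8774/v2.1"},
    {"service": "compute", "region": "RegionTwo",
     "url": "https://compute.region-two.example.com:8774/v2.1"},
    {"service": "volume", "region": "RegionOne",
     "url": "https://volume.region-one.example.com:8776/v2"},
]


def resolve_endpoint(catalog: List[Dict[str, Optional[str]]],
                     service: str, region: str) -> str:
    """Return the endpoint for a service as seen from one region.

    A region-specific entry wins; otherwise fall back to a centrally
    shared entry. Raise LookupError if neither exists.
    """
    shared_url: Optional[str] = None
    for entry in catalog:
        if entry["service"] != service:
            continue
        if entry["region"] == region:
            return str(entry["url"])
        if entry["region"] is None:
            shared_url = entry["url"]
    if shared_url is None:
        raise LookupError(f"no {service!r} endpoint for region {region!r}")
    return shared_url
```

With this layout, a compute request from `RegionTwo` resolves to the `RegionTwo` endpoint, an identity request from any region resolves to the single shared endpoint, and a request for a service that was never deployed in a region fails loudly instead of silently crossing sites.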


@@ -6,16 +6,14 @@
xml:id="operational-considerations-multi-site"> xml:id="operational-considerations-multi-site">
<?dbhtml stop-chunking?> <?dbhtml stop-chunking?>
<title>Operational considerations</title> <title>Operational considerations</title>
<para>Deployment of a multi-site OpenStack cloud using regions <para>Multi-site OpenStack cloud deployment using regions
requires that the service catalog contains per-region entries requires that the service catalog contains per-region entries
for each service deployed other than the Identity service for each service deployed other than the Identity service. Most
itself. There is limited support amongst currently available off-the-shelf OpenStack deployment tools have limited support
off-the-shelf OpenStack deployment tools for defining multiple for defining multiple regions in this fashion.</para>
regions in this fashion.</para> <para>Deployers should be aware of this and provide the appropriate
<para>Deployers must be aware of this and provide the appropriate
customization of the service catalog for their site either customization of the service catalog for their site either
manually or via customization of the deployment tools in manually, or by customizing deployment tools in use.</para>
use.</para>
<note><para>As of the Kilo release, documentation for <note><para>As of the Kilo release, documentation for
implementing this feature is in progress. See this bug for implementing this feature is in progress. See this bug for
more information: more information:
@@ -31,51 +29,46 @@
host operating systems, guest operating systems, OpenStack host operating systems, guest operating systems, OpenStack
distributions (if applicable), software-defined infrastructure distributions (if applicable), software-defined infrastructure
including network controllers and storage systems, and even including network controllers and storage systems, and even
individual applications need to be evaluated in light of the individual applications need to be evaluated.</para>
multi-site nature of the cloud.</para>
<para>Topics to consider include:</para> <para>Topics to consider include:</para>
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para>The specific definition of what constitutes a site <para>The definition of what constitutes a site
in the relevant licenses, as the term does not in the relevant licenses, as the term does not
necessarily denote a geographic or otherwise necessarily denote a geographic or otherwise
physically isolated location in the traditional physically isolated location.</para>
sense.</para>
</listitem> </listitem>
<listitem> <listitem>
<para>Differentiations between "hot" (active) and "cold" <para>Differentiations between "hot" (active) and "cold"
(inactive) sites where significant savings may be made (inactive) sites, where significant savings may be made
in situations where one site is a cold standby for in situations where one site is a cold standby for
disaster recovery purposes only.</para> disaster recovery purposes only.</para>
</listitem> </listitem>
<listitem> <listitem>
<para>Certain locations might require local vendors to <para>Certain locations might require local vendors to
provide support and services for each site provides provide support and services for each site which may vary
challenges, but will vary on the licensing agreement with the licensing agreement in place.</para>
in place.</para>
</listitem> </listitem>
</itemizedlist></section> </itemizedlist></section>
<section xml:id="logging-and-monitoring-multi-site"> <section xml:id="logging-and-monitoring-multi-site">
<title>Logging and monitoring</title> <title>Logging and monitoring</title>
<para>Logging and monitoring does not significantly differ for a <para>Logging and monitoring does not significantly differ for a
multi-site OpenStack cloud. The same well known tools multi-site OpenStack cloud. The tools described in the <link
described in the <link
xlink:href="http://docs.openstack.org/openstack-ops/content/logging_monitoring.html">Logging xlink:href="http://docs.openstack.org/openstack-ops/content/logging_monitoring.html">Logging
and monitoring chapter</link> of the <citetitle>Operations and monitoring chapter</link> of the <citetitle>Operations
Guide</citetitle> remain applicable. Logging and monitoring Guide</citetitle> remain applicable. Logging and monitoring
can be provided both on a per-site basis and in a common can be provided on a per-site basis, and in a common
centralized location.</para> centralized location.</para>
<para>When attempting to deploy logging and monitoring facilities <para>When attempting to deploy logging and monitoring facilities
to a centralized location, care must be taken with regards to to a centralized location, care must be taken with the load
the load placed on the inter-site networking links.</para></section> placed on the inter-site networking links.</para></section>
<section xml:id="upgrades-multi-site"> <section xml:id="upgrades-multi-site">
<title>Upgrades</title> <title>Upgrades</title>
<para>In multi-site OpenStack clouds deployed using regions each <para>In multi-site OpenStack clouds deployed using regions, sites
site is, effectively, an independent OpenStack installation are independent OpenStack installations which are linked
which is linked to the others by using centralized services together using shared centralized services such as OpenStack
such as Identity which are shared between sites. At a high Identity. At a high level the recommended order of operations
level the recommended order of operations to upgrade an to upgrade an individual OpenStack environment is (see the <link
individual OpenStack environment is (see the <link
xlink:href="http://docs.openstack.org/openstack-ops/content/ops_upgrades-general-steps.html">Upgrades xlink:href="http://docs.openstack.org/openstack-ops/content/ops_upgrades-general-steps.html">Upgrades
chapter</link> of the <citetitle>Operations Guide</citetitle> chapter</link> of the <citetitle>Operations Guide</citetitle>
for details):</para> for details):</para>
@@ -123,22 +116,20 @@
shared.</para> shared.</para>
</listitem> </listitem>
</orderedlist> </orderedlist>
<para>Note that Compute <para>Compute upgrades within each site can also be performed in a rolling
upgrades within each site can also be performed in a rolling
fashion. Compute controller services (API, Scheduler, and fashion. Compute controller services (API, Scheduler, and
Conductor) can be upgraded prior to upgrading of individual Conductor) can be upgraded prior to upgrading of individual
compute nodes. This maximizes the ability of operations staff compute nodes. This allows operations staff to keep a site
to keep a site operational for users of compute services while operational for users of Compute services while performing an
performing an upgrade.</para></section> upgrade.</para></section>
<section xml:id="quota-management-multi-site"> <section xml:id="quota-management-multi-site">
<title>Quota management</title> <title>Quota management</title>
<para>To prevent system capacities from being exhausted without <para>Quotas are used to set operational limits to prevent system
notification, OpenStack provides operators with the ability to capacities from being exhausted without notification. They are
define quotas. Quotas are used to set operational limits and currently enforced at the tenant (or project) level rather than
are currently enforced at the tenant (or project) level rather at the user level.</para>
than at the user level.</para> <para>Quotas are defined on a per-region basis. Operators can
<para>Quotas are defined on a per-region basis. Operators may wish define identical quotas for tenants in each region of the
to define identical quotas for tenants in each region of the
cloud to provide a consistent experience, or even create a cloud to provide a consistent experience, or even create a
process for synchronizing allocated quotas across regions. It process for synchronizing allocated quotas across regions. It
is important to note that only the operational limits imposed is important to note that only the operational limits imposed
@@ -161,24 +152,22 @@
Control (RBAC) policies, defined in a <filename>policy.json</filename> file, for Control (RBAC) policies, defined in a <filename>policy.json</filename> file, for
each service. Operators edit these files to customize the each service. Operators edit these files to customize the
policies for their OpenStack installation. If the application policies for their OpenStack installation. If the application
of consistent RBAC policies across sites is considered a of consistent RBAC policies across sites is a requirement, then
requirement, then it is necessary to ensure proper it is necessary to ensure proper synchronization of the
synchronization of the <filename>policy.json</filename> files to all <filename>policy.json</filename> files to all installations.</para>
installations.</para> <para>This must be done using system administration tools
<para>This must be done using normal system administration tools such as rsync as functionality for synchronizing policies
such as rsync as no functionality for synchronizing policies across regions is not currently provided within OpenStack.</para></section>
across regions is currently provided within OpenStack.</para></section>
<section xml:id="documentation-multi-site"> <section xml:id="documentation-multi-site">
<title>Documentation</title> <title>Documentation</title>
<para>Users must be able to leverage cloud infrastructure and <para>Users must be able to leverage cloud infrastructure and
provision new resources in the environment. It is important provision new resources in the environment. It is important
that user documentation is accessible by users of the cloud that user documentation is accessible by users to ensure they
infrastructure to ensure they are given sufficient information are given sufficient information to help them leverage the cloud.
to help them leverage the cloud. As an example, by default As an example, by default OpenStack schedules instances on a compute node
OpenStack schedules instances on a compute node
automatically. However, when multiple regions are available, automatically. However, when multiple regions are available,
it is left to the end user to decide in which region to the end user needs to decide in which region to schedule the
schedule the new instance. The dashboard presents the user with new instance. The dashboard presents the user with
the first region in your configuration. The API and CLI tools the first region in your configuration. The API and CLI tools
do not execute commands unless a valid region is specified. do not execute commands unless a valid region is specified.
It is therefore important to provide documentation to your It is therefore important to provide documentation to your
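
The RBAC text in this file's diff notes that OpenStack does not synchronize `policy.json` files across regions, so operators rely on external tools such as rsync. The following is a minimal Python sketch of a drift check that could run alongside such a tool. The function names and the site labels are hypothetical, and the sketch assumes each site's policy file has already been read into a string.

```python
import hashlib
import json
from collections import defaultdict
from typing import Dict, List


def policy_digest(policy_text: str) -> str:
    """Hash a policy file after normalizing key order and whitespace."""
    canonical = json.dumps(json.loads(policy_text),
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_sites_by_policy(policies: Dict[str, str]) -> List[List[str]]:
    """Group site names whose policy files are semantically identical.

    Returns the groups largest first; more than one group means the
    sites have drifted apart and need to be resynchronized.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for site, text in policies.items():
        groups[policy_digest(text)].append(site)
    return sorted((sorted(sites) for sites in groups.values()),
                  key=len, reverse=True)
```

Comparing normalized digests rather than raw bytes avoids false alarms when two sites hold the same rules with different key order or indentation.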


@@ -22,10 +22,10 @@
very sensitive to latency and needs a rapid response to very sensitive to latency and needs a rapid response to
end-users. After reviewing the user, technical and operational end-users. After reviewing the user, technical and operational
considerations, it is determined beneficial to build a number considerations, it is determined beneficial to build a number
of regions local to the customer's edge. In this case rather of regions local to the customer's edge. Rather than build a
than build a few large, centralized data centers, the intent few large, centralized data centers, the intent of the architecture
of the architecture is to provide a pair of small data centers is to provide a pair of small data centers in locations that
in locations that are closer to the customer. In this use are closer to the customer. In this use
case, spreading applications out allows for different case, spreading applications out allows for different
horizontal scaling than a traditional compute workload scale. horizontal scaling than a traditional compute workload scale.
The intent is to scale by creating more copies of the The intent is to scale by creating more copies of the
@@ -60,44 +60,47 @@
expanding the capacity of all regions simultaneously, expanding the capacity of all regions simultaneously,
therefore maximizing the cost-effectiveness of the multi-site therefore maximizing the cost-effectiveness of the multi-site
design.</para> design.</para>
<para>One of the key decisions of running this sort of <para>One of the key decisions of running this infrastructure is
infrastructure is whether or not to provide a redundancy whether or not to provide a redundancy
model. Two types of redundancy and high availability models in model. Two types of redundancy and high availability models in
this configuration can be implemented. The first type this configuration can be implemented. The first type
revolves around the availability of the central OpenStack is the availability of central OpenStack
components. Keystone can be made highly available in three components. Keystone can be made highly available in three
central data centers that host the centralized OpenStack central data centers that host the centralized OpenStack
components. This prevents a loss of any one of the regions components. This prevents a loss of any one of the regions
causing an outage in service. It also has the added benefit of causing an outage in service. It also has the added benefit of
being able to run a central storage repository as a primary being able to run a central storage repository as a primary
cache for distributing content to each of the regions.</para> cache for distributing content to each of the regions.</para>
<para>The second redundancy topic is that of the edge data center <para>The second redundancy type is the edge data center itself.
itself. A second data center in each of the edge regional A second data center in each of the edge regional
locations houses a second region near the first. This locations houses a second region near the first region. This
ensures that the application does not suffer degraded ensures that the application does not suffer degraded
performance in terms of latency and availability.</para> performance in terms of latency and availability.</para>
<para>This figure depicts the solution designed to have both a <para><xref linkend="multi-site_customer_edge"/> depicts
centralized set of core data centers for OpenStack services the solution designed to have both a centralized set of core
and paired edge data centers:</para> data centers for OpenStack services and paired edge data centers:</para>
<figure xml:id="multi-site_customer_edge">
<title>Multi-site architecture example</title>
<mediaobject> <mediaobject>
<imageobject> <imageobject>
<imagedata contentwidth="4in" <imagedata contentwidth="6in"
fileref="../figures/Multi-Site_Customer_Edge.png"/> fileref="../figures/Multi-Site_Customer_Edge.png"/>
</imageobject> </imageobject>
</mediaobject> </mediaobject>
</figure>
<section xml:id="geo-redundant-load-balancing"> <section xml:id="geo-redundant-load-balancing">
<title>Geo-redundant load balancing</title> <title>Geo-redundant load balancing</title>
<para>A large-scale web application has been designed with cloud <para>A large-scale web application has been designed with cloud
principles in mind. The application is designed to provide principles in mind. The application is designed to provide
service to an application store on a 24/7 basis. The company has a service to an application store on a 24/7 basis. The company has a
typical 2-tier architecture with a web front-end servicing the typical two tier architecture with a web front-end servicing the
customer requests and a NoSQL database back end storing the customer requests, and a NoSQL database back end storing the
information.</para> information.</para>
<para>As of late there have been several outages in a number of major <para>As of late there have been several outages in a number of major
public cloud providers&mdash;usually due to the fact these public cloud providers due to applications running out of
applications were running out of a single geographical a single geographical location. The design therefore should
location. The design therefore should mitigate the chance of a mitigate the chance of a single site causing an outage for their
single site causing an outage for their business.</para> business.</para>
<para>The solution would consist of the following OpenStack <para>The solution would consist of the following OpenStack
components:</para> components:</para>
<itemizedlist> <itemizedlist>
@@ -108,12 +111,11 @@
<listitem> <listitem>
<para>OpenStack Controller services running, Networking, <para>OpenStack Controller services running, Networking,
dashboard, Block Storage and Compute running locally in dashboard, Block Storage and Compute running locally in
each of the three regions. The other services, each of the three regions. Identity service, Orchestration
Identity, Orchestration, Telemetry, Image service and service, Telemetry service, Image service and
Object Storage can be Object Storage can be installed centrally, with
installed centrally&mdash;with nodes in each of the region nodes in each of the region providing a redundant
providing a redundant OpenStack Controller plane OpenStack Controller plane throughout the globe.</para>
throughout the globe.</para>
</listitem> </listitem>
<listitem> <listitem>
<para>OpenStack Compute nodes running the KVM <para>OpenStack Compute nodes running the KVM
@@ -126,9 +128,9 @@
replicated on a regular basis.</para> replicated on a regular basis.</para>
</listitem> </listitem>
<listitem> <listitem>
<para>A Distributed DNS service available to all <para>A distributed DNS service available to all
regions&mdash;that allows for dynamic update of DNS records of regions that allows for dynamic update of DNS
deployed instances.</para> records of deployed instances.</para>
</listitem> </listitem>
<listitem> <listitem>
<para>A geo-redundant load balancing service can be used <para>A geo-redundant load balancing service can be used
@@ -153,10 +155,10 @@
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<para>Another autoscaling Heat template can be used to deploy a <para>Another autoscaling Heat template can be used to deploy a
distributed MongoDB shard over the three locations&mdash;with the distributed MongoDB shard over the three locations, with the
option of storing required data on a globally available swift option of storing required data on a globally available swift
container. According to the usage and load on the database container. According to the usage and load on the database
server&mdash;additional shards can be provisioned according to server, additional shards can be provisioned according to
the thresholds defined in Telemetry.</para> the thresholds defined in Telemetry.</para>
<!-- <para>The reason that three regions were selected here was because of <!-- <para>The reason that three regions were selected here was because of
the fear of having abnormal load on a single region in the the fear of having abnormal load on a single region in the
@@ -169,57 +171,66 @@
autoscaling and auto healing in the event of increased load. autoscaling and auto healing in the event of increased load.
Additional configuration management tools, such as Puppet or Additional configuration management tools, such as Puppet or
Chef could also have been used in this scenario, but were not Chef could also have been used in this scenario, but were not
chosen due to the fact that Orchestration had the appropriate built-in chosen since Orchestration had the appropriate built-in
hooks into the OpenStack cloud&mdash;whereas the other tools were hooks into the OpenStack cloud, whereas the other tools were
external and not native to OpenStack. In addition&mdash;since this external and not native to OpenStack. In addition, external
deployment scenario was relatively straight forward&mdash;the tools were not needed since this deployment scenario was straight
external tools were not needed.</para> forward.</para>
<para> <para>OpenStack Object Storage is used here to serve as a back end for
OpenStack Object Storage is used here to serve as a back end for
the Image service since it is the most suitable solution for a the Image service since it is the most suitable solution for a
globally distributed storage solution&mdash;with its own globally distributed storage solution with its own
replication mechanism. Home grown solutions could also have replication mechanism. Home grown solutions could also have
been used including the handling of replication&mdash;but were not been used including the handling of replication, but were not
chosen, because Object Storage is already an integral part of the chosen, because Object Storage is already an integral part of the
infrastructure&mdash;and proven solution.</para> infrastructure and a proven solution.</para>
<para>An external load balancing service was used and not the <para>An external load balancing service was used and not the
LBaaS in OpenStack because the solution in OpenStack is not LBaaS in OpenStack because the solution in OpenStack is not
redundant and does not have any awareness of geo location.</para> redundant and does not have any awareness of geo location.</para>
<figure xml:id="multi-site_geo_redundant">
<title>Multi-site geo-redundant architecture</title>
<mediaobject> <mediaobject>
<imageobject> <imageobject>
<imagedata contentwidth="4in" <imagedata contentwidth="6in"
fileref="../figures/Multi-site_Geo_Redundant_LB.png"/> fileref="../figures/Multi-site_Geo_Redundant_LB.png"/>
</imageobject> </imageobject>
</mediaobject></section> </mediaobject>
<section xml:id="location-local-services"><title>Location-local service</title> </figure>
<para>A common use for a multi-site deployment of OpenStack, is </section>
for creating a Content Delivery Network. An application that <section xml:id="location-local-services">
<title>Location-local service</title>
<para>A common use for multi-site OpenStack deployment is
creating a Content Delivery Network. An application that
uses a location-local architecture requires low network uses a location-local architecture requires low network
latency and proximity to the user, in order to provide an latency and proximity to the user to provide an
optimal user experience, in addition to reducing the cost of optimal user experience and reduce the cost of bandwidth and
bandwidth and transit, since the content resides on sites transit. The content resides on sites closer to the customer,
closer to the customer, instead of a centralized content store instead of a centralized content store that requires utilizing
that requires utilizing higher cost cross-country links.</para> higher cost cross-country links.</para>
<para>This architecture usually includes a geo-location component <para>This architecture includes a geo-location component
that places user requests at the closest possible node. In that places user requests to the closest possible node. In
this scenario, 100% redundancy of content across every site is this scenario, 100% redundancy of content across every site is
a goal rather than a requirement, with the intent being to a goal rather than a requirement, with the intent to
maximize the amount of content available that is within a maximize the amount of content available within a
minimum number of network hops for any given end user. Despite minimum number of network hops for end users. Despite
these differences, the storage replication configuration has these differences, the storage replication configuration has
significant overlap with that of a geo-redundant load significant overlap with that of a geo-redundant load
balancing use case.</para> balancing use case.</para>
<para>In this example, the application utilizing this multi-site <para>In <xref linkend="multi-site_shared_shared_keystone"/>,
OpenStack install that is location aware would launch web the application utilizing this multi-site OpenStack install
server or content serving instances on the compute cluster in that is location-aware would launch web server or content
each site. Requests from clients are first sent to a serving instances on the compute cluster in each site. Requests
global services load balancer that determines the location of from clients are first sent to a global services load balancer
the client, then routes the request to the closest OpenStack that determines the location of the client, then routes the
site where the application completes the request.</para> request to the closest OpenStack site where the application
completes the request.</para>
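The routing step described in this paragraph can be illustrated with a minimal sketch. It assumes the global load balancer knows a coordinate for each site and for the client. The region names, coordinates, and the `closest_site` helper are hypothetical and are not part of OpenStack; production global load balancers usually rely on GeoIP databases or anycast rather than raw coordinates.

```python
import math

# Hypothetical site catalog: region name -> (latitude, longitude) in degrees.
SITES = {
    "RegionOne": (40.71, -74.01),    # assumed location: New York area
    "RegionTwo": (51.51, -0.13),     # assumed location: London area
    "RegionThree": (35.68, 139.69),  # assumed location: Tokyo area
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_site(client_location, sites=SITES):
    """Return the name of the site nearest to the client location."""
    return min(sites, key=lambda name: haversine_km(client_location, sites[name]))
```

Geographic distance is only a proxy for network latency, so a real deployment might prefer measured round-trip times where they are available.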
<figure xml:id="multi-site_shared_shared_keystone">
<title>Multi-site shared keystone architecture</title>
<mediaobject> <mediaobject>
<imageobject> <imageobject>
<imagedata contentwidth="4in" <imagedata contentwidth="6in"
fileref="../figures/Multi-Site_shared_keystone1.png"/> fileref="../figures/Multi-Site_shared_keystone1.png"/>
</imageobject> </imageobject>
</mediaobject></section> </mediaobject>
</figure>
</section>
</section> </section>


@@ -27,105 +27,108 @@
high-bandwidth links available between them, it may be wise to high-bandwidth links available between them, it may be wise to
configure a separate storage replication network between the configure a separate storage replication network between the
two sites to support a single Swift endpoint and a shared two sites to support a single Swift endpoint and a shared
object storage capability between them. (An example of this Object Storage capability between them. An example of this
technique, as well as a configuration walk-through, is technique, as well as a configuration walk-through, is
available at <link available at <link
xlink:href="http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network">http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network</link>). xlink:href="http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network">http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network</link>.
Another option in this scenario is to build a dedicated set of Another option in this scenario is to build a dedicated set of
tenant private networks across the secondary link using tenant private networks across the secondary link, using
overlay networks with a third party mapping the site overlays overlay networks with a third party mapping the site overlays
to each other.</para> to each other.</para>
<para>The capacity requirements of the links between sites are <para>The capacity requirements of the links between sites are
driven by application behavior. If the latency of the links is driven by application behavior. If the link latency is
too high, certain applications that use a large number of too high, certain applications that use a large number of
small packets, for example RPC calls, may encounter issues small packets, for example RPC calls, may encounter issues
communicating with each other or operating properly. communicating with each other or operating properly.
Additionally, OpenStack may encounter similar types of issues. Additionally, OpenStack may encounter similar types of issues.
To mitigate this, tuning of the Identity service call timeouts may be To mitigate this, Identity service call timeouts can be
necessary to prevent issues authenticating against a central tuned to prevent issues authenticating against a central
Identity service.</para> Identity service.</para>
<para>Another capacity consideration when it comes to networking <para>Another network capacity consideration for a multi-site
for a multi-site deployment is the available amount and deployment is the amount and performance of overlay networks
performance of overlay networks for tenant networks. If using available for tenant networks. If using shared tenant networks
shared tenant networks across zones, it is imperative that an across zones, it is imperative that an external overlay manager
external overlay manager or controller be used to map these or controller be used to map these overlays together. It is
overlays together. It is necessary to ensure the amount of necessary to ensure the amount of possible IDs between the zones
possible IDs between the zones are identical. Note that, as of are identical.</para>
the Kilo release, OpenStack Networking was not capable of managing <note>
tunnel IDs across installations. This means that if one site <para>As of the Kilo release, OpenStack Networking was not
runs out of IDs, but other does not, that tenant's network capable of managing tunnel IDs across installations. So if
is unable to reach the other site.</para> one site runs out of IDs, but another does not, that tenant's
network is unable to reach the other site.</para>
</note>
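Because tunnel IDs are not coordinated across installations, an operator may want to confirm that every site offers the same number of IDs. The following is a minimal sketch of such a check. The function names and the range format (inclusive, non-overlapping `(start, end)` pairs per site) are assumptions for illustration; in practice the ranges would be read from each site's Networking configuration.

```python
def id_count(ranges):
    """Count the IDs covered by inclusive, non-overlapping (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


def mismatched_sites(site_ranges):
    """Return the sites whose ID capacity differs from the reference site.

    The reference is the alphabetically first site name.
    """
    names = sorted(site_ranges)
    expected = id_count(site_ranges[names[0]])
    return [name for name in names if id_count(site_ranges[name]) != expected]
```

An empty result means all sites expose the same number of IDs. It does not show that the ID values themselves line up between sites.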
<para>Capacity can take other forms as well. The ability for a <para>Capacity can take other forms as well. The ability for a
region to grow depends on scaling out the number of available region to grow depends on scaling out the number of available
compute nodes. This topic is covered in greater detail in the compute nodes. This topic is covered in greater detail in the
section for compute-focused deployments. However, it should be section for compute-focused deployments. However, it may be
noted that cells may be necessary to grow an individual region necessary to grow cells in an individual region, depending on
beyond a certain point. This point depends on the size of your the size of your cluster and the ratio of virtual machines per
cluster and the ratio of virtual machines per
hypervisor.</para> hypervisor.</para>
<para>A third form of capacity comes in the multi-region-capable <para>A third form of capacity comes in the multi-region-capable
components of OpenStack. Centralized Object Storage is capable components of OpenStack. Centralized Object Storage is capable
of serving objects through a single namespace across multiple of serving objects through a single namespace across multiple
regions. Since this works by accessing the object store via regions. Since this works by accessing the object store through
swift proxy, it is possible to overload the proxies. There are swift proxy, it is possible to overload the proxies. There are
two options available to mitigate this issue. The first is to two options available to mitigate this issue:</para>
deploy a large number of swift proxies. The drawback to this <itemizedlist>
is that the proxies are not load-balanced and a large file <listitem>
request could continually hit the same proxy. The other way to <para>Deploy a large number of swift proxies. The drawback is
mitigate this is to front-end the proxies with a caching HTTP that the proxies are not load-balanced and a large file
proxy and load balancer. Since swift objects are returned to request could continually hit the same proxy.</para>
the requester via HTTP, this load balancer would alleviate the </listitem>
<listitem>
<para>Add a caching HTTP proxy and load balancer in front of
the swift proxies. Since swift objects are returned to the
requester via HTTP, this load balancer would alleviate the
load required on the swift proxies.</para> load required on the swift proxies.</para>
</listitem>
</itemizedlist>
<section xml:id="utilization-multi-site"><title>Utilization</title> <section xml:id="utilization-multi-site"><title>Utilization</title>
<para>While constructing a multi-site OpenStack environment is the <para>While constructing a multi-site OpenStack environment is the
goal of this guide, the real test is whether an application goal of this guide, the real test is whether an application
can utilize it.</para> can utilize it.</para>
<para>Identity is normally the first interface for the majority of <para>The Identity service is normally the first interface for
OpenStack users. Interacting with the Identity service is required for OpenStack users and is required for almost all major operations
almost all major operations within OpenStack. Therefore, it is within OpenStack. Therefore, it is important that you provide users
important to ensure that you provide users with a single URL with a single URL for Identity service authentication, and
for Identity service authentication. Equally important is proper document the configuration of regions within the Identity service.
documentation and configuration of regions within the Identity service.
Each of the sites defined in your installation is considered Each of the sites defined in your installation is considered
to be a region in Identity nomenclature. This is important for to be a region in Identity nomenclature. This is important for
the users of the system, when reading Identity documentation, the users, as it is required to define the region name when
as it is required to define the region name when providing providing actions to an API endpoint or in the dashboard.</para>
actions to an API endpoint or in the dashboard.</para>
<para>Load balancing is another common issue with multi-site <para>Load balancing is another common issue with multi-site
installations. While it is still possible to run HAproxy installations. While it is still possible to run HAproxy
instances with Load-Balancer-as-a-Service, these are local instances with Load-Balancer-as-a-Service, these are defined
to a specific region. Some applications may be able to cope to a specific region. Some applications can manage this using
with this via internal mechanisms. Others, however, may internal mechanisms. Other applications may require the
require the implementation of an external system including implementation of an external system, including global services
global services load balancers or anycast-advertised load balancers or anycast-advertised DNS.</para>
DNS.</para>
<para>Depending on the storage model chosen during site design, <para>Depending on the storage model chosen during site design,
storage replication and availability are also a concern storage replication and availability are also a concern
for end-users. If an application is capable of understanding for end-users. If an application can support regions, then it
regions, then it is possible to keep the object storage system is possible to keep the object storage system separated by region.
separated by region. In this case, users who want to have an In this case, users who want to have an object available to
object available to more than one region need to do the more than one region need to perform cross-site replication.
cross-site replication themselves. With a centralized swift However, with a centralized swift proxy, the user may need to
proxy, however, the user may need to benchmark the replication benchmark the replication timing of the Object Storage back end.
timing of the Object Storage back end. Benchmarking allows the Benchmarking allows the operational staff to provide users with
operational staff to provide users with an understanding of an understanding of the amount of time required for a stored or
the amount of time required for a stored or modified object to modified object to become available to the entire environment.</para>
become available to the entire environment.</para></section> </section>
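The replication benchmark mentioned above can be sketched as a simple polling loop. This is an illustration under stated assumptions, not an OpenStack tool: `write_object` and `object_visible` are hypothetical callables that the operator would supply, for example thin wrappers around an Object Storage client that upload a test object and check for it through a given site.

```python
import time


def measure_replication_lag(write_object, object_visible, sites,
                            poll_interval=0.5, timeout=60.0):
    """Write an object once, then poll each site until it becomes visible.

    Returns a dict mapping each site to the elapsed seconds until the object
    was first seen there, or None if the timeout expired first.
    """
    start = time.monotonic()
    write_object()
    lag = {site: None for site in sites}
    pending = set(sites)
    while pending and time.monotonic() - start < timeout:
        # Iterate over a sorted copy so that pending can be modified safely.
        for site in sorted(pending):
            if object_visible(site):
                lag[site] = time.monotonic() - start
                pending.discard(site)
        if pending:
            time.sleep(poll_interval)
    return lag
```

The measured values are only accurate to the polling interval. Repeating the run with different object sizes gives a more useful picture than a single measurement.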
<section xml:id="performance"><title>Performance</title> <section xml:id="performance"><title>Performance</title>
<para>Determining the performance of a multi-site installation <para>Determining the performance of a multi-site installation
involves considerations that do not come into play in a involves considerations that do not come into play in a
single-site deployment. Being a distributed deployment, single-site deployment. Being a distributed deployment,
multi-site deployments incur a few extra penalties to performance in multi-site deployments may be affected in certain
performance in certain situations.</para> situations.</para>
<para>Since multi-site systems can be geographically separated, <para>Since multi-site systems can be geographically separated,
they may have worse than normal latency or jitter when there may be greater latency or jitter when communicating across
communicating across regions. This can especially impact regions. This can especially impact systems like the OpenStack
systems like the OpenStack Identity service when making Identity service when making authentication attempts from regions
authentication attempts from regions that do not contain the that do not contain the centralized Identity implementation. It
centralized Identity implementation. It can also affect can also affect applications which rely on Remote Procedure Call (RPC)
certain applications which rely on remote procedure call (RPC) for normal operation. An example of this can be seen in high
for normal operation. An example of this can be seen in High performance computing workloads.</para>
Performance Computing workloads.</para>
<para>Storage availability can also be impacted by the <para>Storage availability can also be impacted by the
architecture of a multi-site deployment. A centralized Object architecture of a multi-site deployment. A centralized Object
Storage service requires more time for an object to be Storage service requires more time for an object to be
@@ -137,4 +140,37 @@
to manually cope with this limitation by creating duplicate to manually cope with this limitation by creating duplicate
block storage entries in each region.</para> block storage entries in each region.</para>
</section> </section>
<section xml:id="openstack-components_multi-site">
<title>OpenStack components</title>
<para>Most OpenStack installations require a bare minimum set of
pieces to function. These include the OpenStack Identity
(keystone) for authentication, OpenStack Compute
(nova) for compute, OpenStack Image service (glance) for image
storage, OpenStack Networking (neutron) for networking, and
potentially an object store in the form of OpenStack Object
Storage (swift). Deploying a multi-site installation also demands extra
components in order to coordinate between regions. A centralized
Identity service is necessary to provide the single authentication
point. A centralized dashboard is also recommended to provide a
single login point and a mapping to the API and CLI
options available. A centralized Object Storage service may also
be used, but will require the installation of the swift proxy
service.</para>
<para>It may also be helpful to install a few extra options in
order to facilitate certain use cases. For example,
installing Designate may assist in automatically generating
DNS domains for each region with an automatically-populated
zone full of resource records for each instance. This
facilitates using DNS as a mechanism for determining which
region will be selected for certain applications.</para>
<para>Another useful tool for managing a multi-site installation
is Orchestration (heat). The Orchestration module allows the
use of templates to define a set of instances to be launched
together or for scaling existing sets. It can also be used to
set up matching or differentiated groupings based on
regions. For instance, if an application requires an equally
balanced number of nodes across sites, the same heat template
can be used to cover each site with small alterations to only
the region name.</para>
</section>
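The idea of reusing one template across sites, changing only the region name, can be sketched as follows. The `stack_requests` helper, the template file name, and the parameter names are hypothetical; the sketch only builds the per-region request data and deliberately does not call any Orchestration client API.

```python
def stack_requests(template_path, regions, parameters):
    """Build one stack request per region; only region_name differs."""
    return [
        {
            "region_name": region,
            "template": template_path,
            # Copy so that later per-region tweaks do not leak between sites.
            "parameters": dict(parameters),
        }
        for region in regions
    ]
```

For example, `stack_requests("web_tier.yaml", ["RegionOne", "RegionTwo"], {"node_count": 3})` yields two requests that are identical apart from the region name, which keeps the node count balanced across sites.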
</section> </section>


@@ -6,55 +6,16 @@
xml:id="user-requirements-multi-site"> xml:id="user-requirements-multi-site">
<?dbhtml stop-chunking?> <?dbhtml stop-chunking?>
<title>User requirements</title> <title>User requirements</title>
<para>A multi-site architecture is complex and has its own risks
and considerations, therefore it is important to make sure
when contemplating the design such an architecture that it
meets the user and business requirements.</para>
<para>Many jurisdictions have legislative and regulatory
requirements governing the storage and management of data in
cloud environments. Common areas of regulation include:</para>
<itemizedlist>
<listitem>
<para>Data retention policies ensuring storage of
persistent data and records management to meet data
archival requirements.</para>
</listitem>
<listitem>
<para>Data ownership policies governing the possession and
responsibility for data.</para>
</listitem>
<listitem>
<para>Data sovereignty policies governing the storage of
data in foreign countries or otherwise separate
jurisdictions.</para>
</listitem>
<listitem>
<para>Data compliance policies governing types of
information that needs to reside in certain locations
due to regular issues and, more importantly, cannot
reside in other locations for the same reason.</para>
</listitem>
</itemizedlist>
<para>Examples of such legal frameworks include the data
protection framework of the European Union (<link
xlink:href="http://ec.europa.eu/justice/data-protection">http://ec.europa.eu/justice/data-protection</link>)
and the requirements of the Financial Industry Regulatory
Authority (<link
xlink:href="http://www.finra.org/Industry/Regulation/FINRARules">http://www.finra.org/Industry/Regulation/FINRARules</link>)
in the United States. Consult a local regulatory body for more
information.</para>
<section xml:id="workload-characteristics"> <section xml:id="workload-characteristics">
<title>Workload characteristics</title> <title>Workload characteristics</title>
<para>The expected workload is a critical requirement that needs <para>An understanding of the expected workloads for a desired
to be captured to guide decision-making. An understanding of multi-site environment and use case is an important factor in
the workloads in the context of the desired multi-site the decision-making process. In this context, <literal>workload</literal>
environment and use case is important. Another way of thinking refers to the way the systems are used. A workload could be a
about a workload is to think of it as the way the systems are single application or a suite of applications that work together.
used. A workload could be a single application or a suite of It could also be a duplicate set of applications that need to
applications that work together. It could also be a duplicate run in multiple cloud environments. Often in a multi-site deployment,
set of applications that need to run in multiple cloud the same workload will need to work identically in more than one
environments. Often in a multi-site deployment the same
workload will need to work identically in more than one
physical location.</para> physical location.</para>
<para>This multi-site scenario likely includes one or more of the <para>This multi-site scenario likely includes one or more of the
other scenarios in this book with the additional requirement other scenarios in this book with the additional requirement
@@ -72,26 +33,26 @@
<title>Consistency of images and templates across different <title>Consistency of images and templates across different
sites</title> sites</title>
<para>It is essential that the deployment of instances is <para>It is essential that the deployment of instances is
consistent across the different sites. This needs to be built consistent across the different sites and built
into the infrastructure. If the OpenStack Object Storage is used as into the infrastructure. If the OpenStack Object Storage is used as
a back end for the Image service, it is possible to create repositories of a back end for the Image service, it is possible to create repositories
consistent images across multiple sites. Having central of consistent images across multiple sites. Having central
endpoints with multiple storage nodes allows consistent centralized endpoints with multiple storage nodes allows consistent centralized
storage for each and every site.</para> storage for every site.</para>
<para>Not using a centralized object store increases operational <para>Not using a centralized object store increases the operational
overhead so that a consistent image library can be maintained. This overhead of maintaining a consistent image library. This
could include development of a replication mechanism to handle could include development of a replication mechanism to handle
the transport of images and the changes to the images across the transport of images and the changes to the images across
multiple sites.</para></section> multiple sites.</para></section>
<section xml:id="high-availability-multi-site"><title>High availability</title> <section xml:id="high-availability-multi-site">
<title>High availability</title>
<para>If high availability is a requirement to provide continuous <para>If high availability is a requirement to provide continuous
infrastructure operations, a basic requirement of high infrastructure operations, a basic requirement of high
availability should be defined.</para> availability should be defined.</para>
<para>The OpenStack management components need to have a basic and <para>The OpenStack management components need to have a basic and
minimal level of redundancy. The simplest example is the loss minimal level of redundancy. The simplest example is the loss
of any single site has no significant impact on the of any single site should have minimal impact on the
availability of the OpenStack services of the entire availability of the OpenStack services.</para>
infrastructure.</para>
<para>The <link <para>The <link
xlink:href="http://docs.openstack.org/high-availability-guide/content/"><citetitle>OpenStack xlink:href="http://docs.openstack.org/high-availability-guide/content/"><citetitle>OpenStack
High Availability Guide</citetitle></link> High Availability Guide</citetitle></link>
@@ -111,14 +72,12 @@
WAN network design between the sites.</para> WAN network design between the sites.</para>
<para>Connecting more than two sites increases the challenges and <para>Connecting more than two sites increases the challenges and
adds more complexity to the design considerations. Multi-site adds more complexity to the design considerations. Multi-site
implementations require extra planning to address the implementations require planning to address the additional
additional topology complexity used for internal and external topology used for internal and external connectivity. Some options
connectivity. Some options include full mesh topology, hub include full mesh topology, hub spoke, spine leaf, and 3D Torus.</para>
spoke, spine leaf, or 3d Torus.</para> <para>If applications running in a cloud are not cloud-aware, there
<para>Not all the applications running in a cloud are cloud-aware. should be clear measures and expectations to define what the
If that is the case, there should be clear measures and infrastructure can and cannot support. An example would be
expectations to define what the infrastructure can support
and, more importantly, what it cannot. An example would be
shared storage between sites. It is possible, however such a shared storage between sites. It is possible, however such a
solution is not native to OpenStack and requires a third-party solution is not native to OpenStack and requires a third-party
hardware vendor to fulfill such a requirement. Another example hardware vendor to fulfill such a requirement. Another example
@@ -126,21 +85,21 @@
in object storage directly. These applications need to be in object storage directly. These applications need to be
cloud aware to make good use of an OpenStack Object cloud aware to make good use of an OpenStack Object
Store.</para></section> Store.</para></section>
<section xml:id="application-readiness"><title>Application readiness</title> <section xml:id="application-readiness">
<title>Application readiness</title>
<para>Some applications are tolerant of the lack of synchronized <para>Some applications are tolerant of the lack of synchronized
object storage, while others may need those objects to be object storage, while others may need those objects to be
replicated and available across regions. Understanding of how replicated and available across regions. Understanding how
the cloud implementation impacts new and existing applications the cloud implementation impacts new and existing applications
is important for risk mitigation and the overall success of a is important for risk mitigation, and the overall success of a
cloud project. Applications may have to be written to expect cloud project. Applications may have to be written or rewritten
an infrastructure with little to no redundancy. Existing for an infrastructure with little to no redundancy, or with the
applications not developed with the cloud in mind may need to cloud in mind.</para></section>
be rewritten.</para></section> <section xml:id="cost-multi-site">
<section xml:id="cost-multi-site"><title>Cost</title> <title>Cost</title>
<para>The requirement of having more than one site has a cost <para>A greater number of sites increase cost and complexity for a
attached to it. The greater the number of sites, the greater multi-site deployment. Costs can be broken down into the following
the cost and complexity. Costs can be broken down into the categories:</para>
following categories:</para>
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para>Compute resources</para> <para>Compute resources</para>
@@ -163,34 +122,32 @@
</itemizedlist></section> </itemizedlist></section>
<section xml:id="site-loss-and-recovery"> <section xml:id="site-loss-and-recovery">
<title>Site loss and recovery</title> <title>Site loss and recovery</title>
<para>Outages can cause loss of partial or full functionality of a <para>Outages can cause partial or full loss of site functionality.
site. Strategies should be implemented to understand and plan Strategies should be implemented to understand and plan for recovery
for recovery scenarios.</para> scenarios.</para>
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para>The deployed applications need to continue to <para>The deployed applications need to continue to
function and, more importantly, consideration should function and, more importantly, you must consider the
be taken of the impact on the performance and impact on the performance and reliability of the application
reliability of the application when a site is when a site is unavailable.</para>
unavailable.</para>
</listitem> </listitem>
<listitem> <listitem>
<para>It is important to understand what happens to the <para>It is important to understand what happens to the
replication of objects and data between the sites when replication of objects and data between the sites when
a site goes down. If this causes queues to start a site goes down. If this causes queues to start
building up, consider how long these queues can building up, consider how long these queues can
safely exist until something explodes.</para> safely exist until an error occurs.</para>
</listitem> </listitem>
<listitem> <listitem>
<para>Ensure determination of the method for resuming <para>After an outage, ensure the method for resuming proper
proper operations of a site when it comes back online operations of a site is implemented when it comes back online.
after a disaster. We recommend you architect the We recommend you architect the recovery to avoid race conditions.</para>
recovery to avoid race conditions.</para>
</listitem> </listitem>
</itemizedlist></section> </itemizedlist></section>
<section xml:id="compliance-and-geo-location-multi-site"> <section xml:id="compliance-and-geo-location-multi-site">
<title>Compliance and geo-location</title> <title>Compliance and geo-location</title>
<para>An organization could have certain legal obligations and <para>An organization may have certain legal obligations and
regulatory compliance measures which could require certain regulatory compliance measures which could require certain
workloads or data to not be located in certain regions.</para></section> workloads or data to not be located in certain regions.</para></section>
<section xml:id="auditing-multi-site"> <section xml:id="auditing-multi-site">
@@ -210,11 +167,10 @@
site.</para></section> site.</para></section>
<section xml:id="authentication-between-sites"> <section xml:id="authentication-between-sites">
<title>Authentication between sites</title> <title>Authentication between sites</title>
<para>Ideally it is best to have a single authentication domain <para>It is recommended to have a single authentication domain
and not need a separate implementation for each and every rather than a separate implementation for each and every
site. This, of course, requires an authentication site. This requires an authentication mechanism that is highly
mechanism that is highly available and distributed to ensure available and distributed to ensure continuous operation.
continuous operation. Authentication server locality is also Authentication server locality might be required and should be
something that might be needed as well and should be planned planned for.</para></section>
for.</para></section>
</section> </section>