Multi-site chapter edits

1. Edits to the multi-site chapter
2. Removed duplicated legal content which was added to a common section.

See https://review.openstack.org/#/c/212299/

Change-Id: I10e3a04650548454c73024d87cbbb6fda63454e8
Implements: blueprint arch-guide
@@ -6,16 +6,9 @@
 xml:id="multi_site">
 <title>Multi-site</title>
 
-<para>A multi-site OpenStack environment is one in which services,
-located in more than one data center, are used to provide the
-overall solution. Usage requirements of different multi-site
-clouds may vary widely, but they share some common needs.
-OpenStack is capable of running in a multi-region
+<para>OpenStack is capable of running in a multi-region
 configuration. This enables some parts of OpenStack to
-effectively manage a group of sites as a single cloud. With
-careful planning in the design phase, OpenStack can act as an
-excellent multi-site cloud solution for a multitude of
-needs.</para>
+effectively manage a group of sites as a single cloud.</para>
 <para>Some use cases that might indicate a need for a multi-site
 deployment of OpenStack include:</para>
 <itemizedlist>
@@ -6,59 +6,61 @@
 xml:id="arch-design-architecture-multiple-site">
 <?dbhtml stop-chunking?>
 <title>Architecture</title>
-<para>This graphic is a high level diagram of a multi-site OpenStack
-architecture. Each site is an OpenStack cloud but it may be necessary to
-architect the sites on different versions. For example, if the second
-site is intended to be a replacement for the first site, they would be
-different. Another common design would be a private OpenStack cloud with
-replicated site that would be used for high availability or disaster
-recovery. The most important design decision is how to configure the
-storage. It can be configured as a single shared pool or separate pools,
-depending on the user and technical requirements.</para>
+<para><xref linkend="multi-site_arch"/>
+illustrates a high level multi-site OpenStack
+architecture. Each site is an OpenStack cloud but it may be necessary
+to architect the sites on different versions. For example, if the
+second site is intended to be a replacement for the first site,
+they would be different. Another common design would be a private
+OpenStack cloud with a replicated site that would be used for high
+availability or disaster recovery. The most important design decision
+is configuring storage as a single shared pool or separate pools,
+depending on user and technical requirements.</para>
+<figure xml:id="multi-site_arch">
+<title>Multi-site OpenStack architecture</title>
 <mediaobject>
 <imageobject>
-<imagedata contentwidth="4in"
+<imagedata contentwidth="6in"
 fileref="../figures/Multi-Site_shared_keystone_horizon_swift1.png"/>
 </imageobject>
 </mediaobject>
+</figure>
 <section xml:id="openstack-services-architecture">
 <title>OpenStack services architecture</title>
 <para>The OpenStack Identity service, which is used by all other
-OpenStack components for authorization and the catalog of service
-endpoints, supports the concept of regions. A region is a logical
-construct that can be used to group OpenStack services that are in
-close proximity to one another. The concept of regions is flexible;
-it may can contain OpenStack service endpoints located within a
-distinct geographic region, or regions. It may be smaller in scope,
-where a region is a single rack within a data center or even a
-single blade chassis, with multiple regions existing in adjacent
+OpenStack components for authorization and the catalog of
+service endpoints, supports the concept of regions. A region
+is a logical construct used to group OpenStack services in
+close proximity to one another. The concept of
+regions is flexible; it may can contain OpenStack service
+endpoints located within a distinct geographic region or regions.
+It may be smaller in scope, where a region is a single rack
+within a data center, with multiple regions existing in adjacent
 racks in the same data center.</para>
-<para>The majority of OpenStack components are designed to run within
-the context of a single region. The OpenStack Compute service is
-designed to manage compute resources within a region, with support
-for subdivisions of compute resources by using availability zones
-and cells. The OpenStack Networking service can be used to manage
-network resources in the same broadcast domain or collection of
-switches that are linked. The OpenStack Block Storage service
-controls storage resources within a region with all storage
-resources residing on the same storage network. Like the OpenStack
-Compute service, the OpenStack Block Storage service also supports
-the availability zone construct which can be used to subdivide
-storage resources.</para>
+<para>The majority of OpenStack components are designed to run
+within the context of a single region. The OpenStack Compute
+service is designed to manage compute resources within a region,
+with support for subdivisions of compute resources by using
+availability zones and cells. The OpenStack Networking service
+can be used to manage network resources in the same broadcast
+domain or collection of switches that are linked. The OpenStack
+Block Storage service controls storage resources within a region
+with all storage resources residing on the same storage network.
+Like the OpenStack Compute service, the OpenStack Block Storage
+service also supports the availability zone construct which can
+be used to subdivide storage resources.</para>
 <para>The OpenStack dashboard, OpenStack Identity, and OpenStack
 Object Storage services are components that can each be deployed
 centrally in order to serve multiple regions.</para>
 </section>
 <section xml:id="arch-multi-storage">
 <title>Storage</title>
-<para>With multiple OpenStack regions, having a single OpenStack Object
-Storage service endpoint that delivers shared file storage for all
-regions is desirable. The Object Storage service internally
-replicates files to multiple nodes. The advantages of this are that,
-if a file placed into the Object Storage service is visible to all
-regions, it can be used by applications or workloads in any or all
-of the regions. This simplifies high availability failover and
-disaster recovery rollback.</para>
+<para>With multiple OpenStack regions, it is recommended to configure
+a single OpenStack Object Storage service endpoint to deliver
+shared file storage for all regions. The Object Storage service
+internally replicates files to multiple nodes which can be used
+by applications or workloads in multiple regions. This simplifies
+high availability failover and disaster recovery rollback.</para>
 <para>In order to scale the Object Storage service to meet the workload
 of multiple regions, multiple proxy workers are run and
 load-balanced, storage nodes are installed in each region, and the
@@ -68,19 +70,20 @@
 reducing the actual load on the storage network. In addition to an
 HTTP caching layer, use a caching layer like Memcache to cache
 objects between the proxy and storage nodes.</para>
-<para>If the cloud is designed without a single Object Storage Service
-endpoint for multiple regions, and instead a separate Object Storage
-Service endpoint is made available in each region, applications are
+<para>If the cloud is designed with a separate Object Storage
+Service endpoint made available in each region, applications are
 required to handle synchronization (if desired) and other management
 operations to ensure consistency across the nodes. For some
 applications, having multiple Object Storage Service endpoints
 located in the same region as the application may be desirable due
 to reduced latency, cross region bandwidth, and ease of
 deployment.</para>
-<para>For the Block Storage service, the most important decisions are
-the selection of the storage technology and whether or not a
-dedicated network is used to carry storage traffic from the storage
-service to the compute nodes.</para>
+<note>
+<para>For the Block Storage service, the most important decisions
+are the selection of the storage technology, and whether
+a dedicated network is used to carry storage traffic
+from the storage service to the compute nodes.</para>
+</note>
 </section>
 <section xml:id="arch-networking-multiple">
 <title>Networking</title>
@@ -100,18 +103,19 @@
 </section>
 <section xml:id="arch-dependencies-multiple">
 <title>Dependencies</title>
-<para>The architecture for a multi-site installation of OpenStack is
-dependent on a number of factors. One major dependency to consider
-is storage. When designing the storage system, the storage mechanism
-needs to be determined. Once the storage type is determined, how it
-is accessed is critical. For example, we recommend that
-storage should use a dedicated network. Another concern is how
-the storage is configured to protect the data. For example, the
-recovery point objective (RPO) and the recovery time objective
-(RTO). How quickly can the recovery from a fault be completed,
-determines how often the replication of data is required. Ensure that
-enough storage is allocated to support the data protection
-strategy.</para>
+<para>The architecture for a multi-site OpenStack installation
+is dependent on a number of factors. One major dependency to
+consider is storage. When designing the storage system, the
+storage mechanism needs to be determined. Once the storage
+type is determined, how it is accessed is critical. For example,
+we recommend that storage should use a dedicated network.
+Another concern is how the storage is configured to protect
+the data. For example, the Recovery Point Objective (RPO) and
+the Recovery Time Objective (RTO). How quickly recovery from
+a fault can be completed, determines how often the replication of
+data is required. Ensure that enough storage is allocated to
+support the data protection strategy.
+</para>
 <para>Networking decisions include the encapsulation mechanism that can
 be used for the tenant networks, how large the broadcast domains
 should be, and the contracted SLAs for the interconnects.</para>
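The Dependencies hunk above ties how often data is replicated to the RPO and RTO. The Python sketch below illustrates one way to reason about that link. It assumes periodic asynchronous replication, so a write made just after one replication run starts is protected only when the next run finishes. Under that model, the worst-case loss window is the replication interval plus the duration of one run. The function names and the 15-minute and 3-minute figures are hypothetical; none of them come from the guide.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Longest span of writes that could be lost if the primary site fails.

    Assumes periodic asynchronous replication: a write made just after a
    replication run starts is only protected once the next run finishes.
    """
    return interval + transfer_time


def meets_rpo(interval: timedelta, transfer_time: timedelta, rpo: timedelta) -> bool:
    """True if the replication schedule keeps worst-case loss within the RPO."""
    return worst_case_data_loss(interval, transfer_time) <= rpo


def max_interval(rpo: timedelta, transfer_time: timedelta) -> timedelta:
    """Longest replication interval that still satisfies the RPO."""
    if transfer_time >= rpo:
        raise ValueError("a single replication run already exceeds the RPO")
    return rpo - transfer_time


if __name__ == "__main__":
    rpo = timedelta(minutes=15)       # hypothetical business requirement
    transfer = timedelta(minutes=3)   # hypothetical duration of one replication run
    print(max_interval(rpo, transfer))                       # 0:12:00
    print(meets_rpo(timedelta(minutes=10), transfer, rpo))   # True
```

Real storage back ends vary: some replicate synchronously and some use snapshots. Treat this as a back-of-the-envelope check, not a sizing method.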
@@ -6,16 +6,14 @@
 xml:id="operational-considerations-multi-site">
 <?dbhtml stop-chunking?>
 <title>Operational considerations</title>
-<para>Deployment of a multi-site OpenStack cloud using regions
+<para>Multi-site OpenStack cloud deployment using regions
 requires that the service catalog contains per-region entries
-for each service deployed other than the Identity service
-itself. There is limited support amongst currently available
-off-the-shelf OpenStack deployment tools for defining multiple
-regions in this fashion.</para>
-<para>Deployers must be aware of this and provide the appropriate
+for each service deployed other than the Identity service. Most
+off-the-shelf OpenStack deployment tools have limited support
+for defining multiple regions in this fashion.</para>
+<para>Deployers should be aware of this and provide the appropriate
 customization of the service catalog for their site either
-manually or via customization of the deployment tools in
-use.</para>
+manually, or by customizing deployment tools in use.</para>
 <note><para>As of the Kilo release, documentation for
 implementing this feature is in progress. See this bug for
 more information:
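The Operational considerations hunk above says the service catalog must hold per-region entries for every deployed service except Identity. It also says deployment tools often do not generate these entries. Below is a minimal sketch of a consistency check that could run after the catalog is customized. It assumes the catalog has already been fetched and follows the shape of the Identity v3 token catalog: a list of services, each with a `type` and a list of `endpoints` that carry a `region_id`. The helper `missing_region_entries` and the region names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def missing_region_entries(
    catalog: list[dict],
    regions: Iterable[str],
    skip_types: frozenset = frozenset({"identity"}),
) -> list[tuple[str, str]]:
    """Return (service_type, region) pairs that have no endpoint in the catalog.

    `catalog` is assumed to be shaped like the Identity v3 token catalog:
    a list of services, each with a "type" and a list of "endpoints" that
    carry a "region_id".
    """
    wanted = list(regions)
    missing = []
    for service in catalog:
        if service["type"] in skip_types:
            continue  # Identity is shared centrally, so it is not checked per region
        present = {ep.get("region_id") for ep in service.get("endpoints", [])}
        missing.extend((service["type"], r) for r in wanted if r not in present)
    return missing


if __name__ == "__main__":
    # Hypothetical two-region catalog: Block Storage has no RegionTwo endpoint.
    catalog = [
        {"type": "identity", "endpoints": [{"region_id": "RegionOne"}]},
        {"type": "compute", "endpoints": [{"region_id": "RegionOne"},
                                          {"region_id": "RegionTwo"}]},
        {"type": "volumev2", "endpoints": [{"region_id": "RegionOne"}]},
    ]
    print(missing_region_entries(catalog, ["RegionOne", "RegionTwo"]))
    # [('volumev2', 'RegionTwo')]
```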
@@ -31,51 +29,46 @@
 host operating systems, guest operating systems, OpenStack
 distributions (if applicable), software-defined infrastructure
 including network controllers and storage systems, and even
-individual applications need to be evaluated in light of the
-multi-site nature of the cloud.</para>
+individual applications need to be evaluated.</para>
 <para>Topics to consider include:</para>
 <itemizedlist>
 <listitem>
-<para>The specific definition of what constitutes a site
+<para>The definition of what constitutes a site
 in the relevant licenses, as the term does not
 necessarily denote a geographic or otherwise
-physically isolated location in the traditional
-sense.</para>
+physically isolated location.</para>
 </listitem>
 <listitem>
 <para>Differentiations between "hot" (active) and "cold"
-(inactive) sites where significant savings may be made
+(inactive) sites, where significant savings may be made
 in situations where one site is a cold standby for
 disaster recovery purposes only.</para>
 </listitem>
 <listitem>
 <para>Certain locations might require local vendors to
-provide support and services for each site provides
-challenges, but will vary on the licensing agreement
-in place.</para>
+provide support and services for each site which may vary
+with the licensing agreement in place.</para>
 </listitem>
 </itemizedlist></section>
 <section xml:id="logging-and-monitoring-multi-site">
 <title>Logging and monitoring</title>
 <para>Logging and monitoring does not significantly differ for a
-multi-site OpenStack cloud. The same well known tools
-described in the <link
+multi-site OpenStack cloud. The tools described in the <link
 xlink:href="http://docs.openstack.org/openstack-ops/content/logging_monitoring.html">Logging
 and monitoring chapter</link> of the <citetitle>Operations
 Guide</citetitle> remain applicable. Logging and monitoring
-can be provided both on a per-site basis and in a common
+can be provided on a per-site basis, and in a common
 centralized location.</para>
 <para>When attempting to deploy logging and monitoring facilities
-to a centralized location, care must be taken with regards to
-the load placed on the inter-site networking links.</para></section>
+to a centralized location, care must be taken with the load
+placed on the inter-site networking links.</para></section>
 <section xml:id="upgrades-multi-site">
 <title>Upgrades</title>
-<para>In multi-site OpenStack clouds deployed using regions each
-site is, effectively, an independent OpenStack installation
-which is linked to the others by using centralized services
-such as Identity which are shared between sites. At a high
-level the recommended order of operations to upgrade an
-individual OpenStack environment is (see the <link
+<para>In multi-site OpenStack clouds deployed using regions, sites
+are independent OpenStack installations which are linked
+together using shared centralized services such as OpenStack
+Identity. At a high level the recommended order of operations
+to upgrade an individual OpenStack environment is (see the <link
 xlink:href="http://docs.openstack.org/openstack-ops/content/ops_upgrades-general-steps.html">Upgrades
 chapter</link> of the <citetitle>Operations Guide</citetitle>
 for details):</para>
@@ -123,22 +116,20 @@
 shared.</para>
 </listitem>
 </orderedlist>
-<para>Note that Compute
-upgrades within each site can also be performed in a rolling
+<para>Compute upgrades within each site can also be performed in a rolling
 fashion. Compute controller services (API, Scheduler, and
 Conductor) can be upgraded prior to upgrading of individual
-compute nodes. This maximizes the ability of operations staff
-to keep a site operational for users of compute services while
-performing an upgrade.</para></section>
+compute nodes. This allows operations staff to keep a site
+operational for users of Compute services while performing an
+upgrade.</para></section>
 <section xml:id="quota-management-multi-site">
 <title>Quota management</title>
-<para>To prevent system capacities from being exhausted without
-notification, OpenStack provides operators with the ability to
-define quotas. Quotas are used to set operational limits and
-are currently enforced at the tenant (or project) level rather
-than at the user level.</para>
-<para>Quotas are defined on a per-region basis. Operators may wish
-to define identical quotas for tenants in each region of the
+<para>Quotas are used to set operational limits to prevent system
+capacities from being exhausted without notification. They are
+currently enforced at the tenant (or project) level rather than
+at the user level.</para>
+<para>Quotas are defined on a per-region basis. Operators can
+define identical quotas for tenants in each region of the
 cloud to provide a consistent experience, or even create a
 process for synchronizing allocated quotas across regions. It
 is important to note that only the operational limits imposed
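The Quota management hunk above notes that quotas are defined per region. Operators may want to keep them identical or synchronize them across regions. The Python sketch below covers only the comparison step of such a process. It assumes each region's quotas for one project have already been read into a plain dictionary. The function `quota_drift`, the region names, and the values are illustrative assumptions, and pushing corrections back to each region is left to the operator's own tooling.

```python
from __future__ import annotations


def quota_drift(
    quotas_by_region: dict[str, dict[str, int]], reference: str
) -> dict[str, dict[str, tuple]]:
    """Report quota values that differ from the reference region.

    Returns {region: {quota_name: (reference_value, region_value)}} for
    every region whose quotas do not match; a missing quota shows as None.
    """
    ref = quotas_by_region[reference]
    drift = {}
    for region, quotas in quotas_by_region.items():
        if region == reference:
            continue
        diffs = {
            name: (ref.get(name), quotas.get(name))
            for name in sorted(set(ref) | set(quotas))
            if ref.get(name) != quotas.get(name)
        }
        if diffs:
            drift[region] = diffs
    return drift


if __name__ == "__main__":
    # Hypothetical per-region quotas for a single project.
    quotas = {
        "RegionOne": {"instances": 10, "cores": 20, "ram": 51200},
        "RegionTwo": {"instances": 10, "cores": 16, "ram": 51200},
        "RegionThree": {"instances": 10, "cores": 20, "ram": 51200},
    }
    print(quota_drift(quotas, "RegionOne"))
    # {'RegionTwo': {'cores': (20, 16)}}
```

Using one region as the reference keeps the report short. A process that synchronizes quotas would then treat that region as the source of truth.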
@@ -161,24 +152,22 @@
 Control (RBAC) policies, defined in a <filename>policy.json</filename> file, for
 each service. Operators edit these files to customize the
 policies for their OpenStack installation. If the application
-of consistent RBAC policies across sites is considered a
-requirement, then it is necessary to ensure proper
-synchronization of the <filename>policy.json</filename> files to all
-installations.</para>
-<para>This must be done using normal system administration tools
-such as rsync as no functionality for synchronizing policies
-across regions is currently provided within OpenStack.</para></section>
+of consistent RBAC policies across sites is a requirement, then
+it is necessary to ensure proper synchronization of the
+<filename>policy.json</filename> files to all installations.</para>
+<para>This must be done using system administration tools
+such as rsync as functionality for synchronizing policies
+across regions is not currently provided within OpenStack.</para></section>
 <section xml:id="documentation-multi-site">
 <title>Documentation</title>
 <para>Users must be able to leverage cloud infrastructure and
 provision new resources in the environment. It is important
-that user documentation is accessible by users of the cloud
-infrastructure to ensure they are given sufficient information
-to help them leverage the cloud. As an example, by default
-OpenStack schedules instances on a compute node
+that user documentation is accessible by users to ensure they
+are given sufficient information to help them leverage the cloud.
+As an example, by default OpenStack schedules instances on a compute node
 automatically. However, when multiple regions are available,
-it is left to the end user to decide in which region to
-schedule the new instance. The dashboard presents the user with
+the end user needs to decide in which region to schedule the
+new instance. The dashboard presents the user with
 the first region in your configuration. The API and CLI tools
 do not execute commands unless a valid region is specified.
 It is therefore important to provide documentation to your
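The policy hunk above says `policy.json` files must be synchronized across regions with tools such as rsync, because OpenStack does not do this itself. A drift check, run before or after each rsync pass, can confirm that every region holds the same rules. Below is a minimal sketch that assumes the file contents have already been collected per region; how they are fetched is out of scope. It compares canonicalized JSON (sorted keys, compact separators) instead of raw bytes, a design choice that stops whitespace or key-order differences from counting as drift. `policy_fingerprint`, `out_of_sync`, and the sample rules are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def policy_fingerprint(content: bytes) -> str:
    """SHA-256 of a policy.json after canonicalizing key order and whitespace."""
    canonical = json.dumps(json.loads(content), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def out_of_sync(policies: Dict[str, bytes], reference: str) -> List[str]:
    """Regions whose policy.json differs from the reference region's copy."""
    want = policy_fingerprint(policies[reference])
    return sorted(region for region, content in policies.items()
                  if policy_fingerprint(content) != want)


if __name__ == "__main__":
    # Hypothetical copies: RegionTwo differs only in formatting, RegionThree in a rule.
    policies = {
        "RegionOne": b'{"compute:start": "rule:admin_or_owner", "default": ""}',
        "RegionTwo": b'{\n  "default": "",\n  "compute:start": "rule:admin_or_owner"\n}',
        "RegionThree": b'{"compute:start": "rule:admin_api", "default": ""}',
    }
    print(out_of_sync(policies, "RegionOne"))  # ['RegionThree']
```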
@@ -22,10 +22,10 @@
 very sensitive to latency and needs a rapid response to
 end-users. After reviewing the user, technical and operational
 considerations, it is determined beneficial to build a number
-of regions local to the customer's edge. In this case rather
-than build a few large, centralized data centers, the intent
-of the architecture is to provide a pair of small data centers
-in locations that are closer to the customer. In this use
+of regions local to the customer's edge. Rather than build a
+few large, centralized data centers, the intent of the architecture
+is to provide a pair of small data centers in locations that
+are closer to the customer. In this use
 case, spreading applications out allows for different
 horizontal scaling than a traditional compute workload scale.
 The intent is to scale by creating more copies of the
@@ -60,44 +60,47 @@
 expanding the capacity of all regions simultaneously,
 therefore maximizing the cost-effectiveness of the multi-site
 design.</para>
-<para>One of the key decisions of running this sort of
-infrastructure is whether or not to provide a redundancy
+<para>One of the key decisions of running this infrastructure is
+whether or not to provide a redundancy
 model. Two types of redundancy and high availability models in
 this configuration can be implemented. The first type
-revolves around the availability of the central OpenStack
+is the availability of central OpenStack
 components. Keystone can be made highly available in three
 central data centers that host the centralized OpenStack
 components. This prevents a loss of any one of the regions
 causing an outage in service. It also has the added benefit of
 being able to run a central storage repository as a primary
 cache for distributing content to each of the regions.</para>
-<para>The second redundancy topic is that of the edge data center
-itself. A second data center in each of the edge regional
-locations house a second region near the first. This
+<para>The second redundancy type is the edge data center itself.
+A second data center in each of the edge regional
+locations house a second region near the first region. This
 ensures that the application does not suffer degraded
 performance in terms of latency and availability.</para>
-<para>This figure depicts the solution designed to have both a
-centralized set of core data centers for OpenStack services
-and paired edge data centers:</para>
+<para><xref linkend="multi-site_customer_edge"/> depicts
+the solution designed to have both a centralized set of core
+data centers for OpenStack services and paired edge data centers:</para>
+<figure xml:id="multi-site_customer_edge">
+<title>Multi-site architecture example</title>
 <mediaobject>
 <imageobject>
-<imagedata contentwidth="4in"
+<imagedata contentwidth="6in"
 fileref="../figures/Multi-Site_Customer_Edge.png"/>
 </imageobject>
 </mediaobject>
+</figure>
 <section xml:id="geo-redundant-load-balancing">
 <title>Geo-redundant load balancing</title>
 <para>A large-scale web application has been designed with cloud
 principles in mind. The application is designed provide
 service to application store, on a 24/7 basis. The company has
-typical 2-tier architecture with a web front-end servicing the
-customer requests and a NoSQL database back end storing the
+typical two tier architecture with a web front-end servicing the
+customer requests, and a NoSQL database back end storing the
 information.</para>
 <para>As of late there has been several outages in number of major
-public cloud providers—usually due to the fact these
-applications were running out of a single geographical
-location. The design therefore should mitigate the chance of a
-single site causing an outage for their business.</para>
+public cloud providers due to applications running out of
+a single geographical location. The design therefore should
+mitigate the chance of a single site causing an outage for their
+business.</para>
 <para>The solution would consist of the following OpenStack
 components:</para>
 <itemizedlist>
@@ -108,12 +111,11 @@
|
|||||||
<listitem>
|
<listitem>
|
||||||
<para>OpenStack Controller services running, Networking,
|
<para>OpenStack Controller services running, Networking,
|
||||||
dashboard, Block Storage and Compute running locally in
|
dashboard, Block Storage and Compute running locally in
|
||||||
each of the three regions. The other services,
|
each of the three regions. Identity service, Orchestration
|
||||||
Identity, Orchestration, Telemetry, Image service and
|
service, Telemetry service, Image service and
|
||||||
Object Storage can be
|
Object Storage can be installed centrally, with
|
||||||
installed centrally—with nodes in each of the region
|
nodes in each of the regions providing a redundant
|
||||||
providing a redundant OpenStack Controller plane
|
OpenStack Controller plane throughout the globe.</para>
|
||||||
throughout the globe.</para>
|
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>OpenStack Compute nodes running the KVM
|
<para>OpenStack Compute nodes running the KVM
|
||||||
@@ -126,9 +128,9 @@
|
|||||||
replicated on a regular basis.</para>
|
replicated on a regular basis.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>A Distributed DNS service available to all
|
<para>A distributed DNS service available to all
|
||||||
regions—that allows for dynamic update of DNS records of
|
regions that allows for dynamic update of DNS
|
||||||
deployed instances.</para>
|
records of deployed instances.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>A geo-redundant load balancing service can be used
|
<para>A geo-redundant load balancing service can be used
|
||||||
@@ -153,10 +155,10 @@
|
|||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
<para>Another autoscaling Heat template can be used to deploy a
|
<para>Another autoscaling Heat template can be used to deploy a
|
||||||
distributed MongoDB shard over the three locations—with the
|
distributed MongoDB shard over the three locations, with the
|
||||||
option of storing required data on a globally available swift
|
option of storing required data on a globally available swift
|
||||||
container. According to the usage and load on the database
|
container. According to the usage and load on the database
|
||||||
server—additional shards can be provisioned according to
|
server, additional shards can be provisioned according to
|
||||||
the thresholds defined in Telemetry.</para>
|
the thresholds defined in Telemetry.</para>
|
||||||
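The threshold-driven provisioning of additional shards described above can be sketched as a small decision function. This is an illustration only: the metric (operations per second), the threshold values, and the minimum of three shards (one per location) are hypothetical assumptions. In the described design, Telemetry evaluates the thresholds and an Orchestration autoscaling group performs the scaling, not this code.

```python
import math

# Hypothetical per-shard thresholds, standing in for alarm thresholds defined in Telemetry.
SCALE_UP_OPS_PER_SHARD = 5000.0
SCALE_DOWN_OPS_PER_SHARD = 1500.0


def desired_shard_count(current_shards: int, total_ops_per_sec: float,
                        min_shards: int = 3, max_shards: int = 12) -> int:
    """Return a shard count for the observed load, clamped to [min_shards, max_shards].

    min_shards defaults to 3 (an assumption) so each of the three locations keeps a shard.
    """
    if current_shards < 1:
        raise ValueError("current_shards must be at least 1")
    per_shard = total_ops_per_sec / current_shards
    if per_shard > SCALE_UP_OPS_PER_SHARD:
        # Provision enough shards to bring per-shard load under the upper threshold.
        target = math.ceil(total_ops_per_sec / SCALE_UP_OPS_PER_SHARD)
    elif per_shard < SCALE_DOWN_OPS_PER_SHARD:
        # Remove a single shard at a time to avoid oscillating.
        target = current_shards - 1
    else:
        target = current_shards
    return max(min_shards, min(max_shards, target))
```

Scaling up jumps straight to the required count, while scaling down removes one shard per evaluation. This is a common way to react quickly to load spikes without thrashing when load falls.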
<!-- <para>The reason that three regions were selected here was because of
|
<!-- <para>The reason that three regions were selected here was because of
|
||||||
the fear of having abnormal load on a single region in the
|
the fear of having abnormal load on a single region in the
|
||||||
@@ -169,57 +171,66 @@
|
|||||||
autoscaling and auto healing in the event of increased load.
|
autoscaling and auto healing in the event of increased load.
|
||||||
Additional configuration management tools, such as Puppet or
|
Additional configuration management tools, such as Puppet or
|
||||||
Chef could also have been used in this scenario, but were not
|
Chef could also have been used in this scenario, but were not
|
||||||
chosen due to the fact that Orchestration had the appropriate built-in
|
chosen since Orchestration had the appropriate built-in
|
||||||
hooks into the OpenStack cloud—whereas the other tools were
|
hooks into the OpenStack cloud, whereas the other tools were
|
||||||
external and not native to OpenStack. In addition—since this
|
external and not native to OpenStack. In addition, external
|
||||||
deployment scenario was relatively straight forward—the
|
tools were not needed since this deployment scenario was
|
||||||
external tools were not needed.</para>
|
straightforward.</para>
|
||||||
<para>
|
<para>OpenStack Object Storage is used here to serve as a back end for
|
||||||
OpenStack Object Storage is used here to serve as a back end for
|
|
||||||
the Image service since it is the most suitable solution for a
|
the Image service since it is the most suitable solution for a
|
||||||
globally distributed storage solution—with its own
|
globally distributed storage solution with its own
|
||||||
replication mechanism. Home grown solutions could also have
|
replication mechanism. Home grown solutions could also have
|
||||||
been used including the handling of replication—but were not
|
been used including the handling of replication, but were not
|
||||||
chosen because Object Storage is already an integral part of the
|
chosen because Object Storage is already an integral part of the
|
||||||
infrastructure—and proven solution.</para>
|
infrastructure and a proven solution.</para>
|
||||||
<para>An external load balancing service was used and not the
|
<para>An external load balancing service was used and not the
|
||||||
LBaaS in OpenStack because the solution in OpenStack is not
|
LBaaS in OpenStack because the solution in OpenStack is not
|
||||||
redundant and does not have any awareness of geo-location.</para>
|
redundant and does not have any awareness of geo-location.</para>
|
||||||
|
<figure xml:id="multi-site_geo_redundant">
|
||||||
|
<title>Multi-site geo-redundant architecture</title>
|
||||||
<mediaobject>
|
<mediaobject>
|
||||||
<imageobject>
|
<imageobject>
|
||||||
<imagedata contentwidth="4in"
|
<imagedata contentwidth="6in"
|
||||||
fileref="../figures/Multi-site_Geo_Redundant_LB.png"/>
|
fileref="../figures/Multi-site_Geo_Redundant_LB.png"/>
|
||||||
</imageobject>
|
</imageobject>
|
||||||
</mediaobject></section>
|
</mediaobject>
|
||||||
<section xml:id="location-local-services"><title>Location-local service</title>
|
</figure>
|
||||||
<para>A common use for a multi-site deployment of OpenStack, is
|
</section>
|
||||||
for creating a Content Delivery Network. An application that
|
<section xml:id="location-local-services">
|
||||||
|
<title>Location-local service</title>
|
||||||
|
<para>A common use for a multi-site OpenStack deployment is
|
||||||
|
creating a Content Delivery Network. An application that
|
||||||
uses a location-local architecture requires low network
|
uses a location-local architecture requires low network
|
||||||
latency and proximity to the user, in order to provide an
|
latency and proximity to the user to provide an
|
||||||
optimal user experience, in addition to reducing the cost of
|
optimal user experience and reduce the cost of bandwidth and
|
||||||
bandwidth and transit, since the content resides on sites
|
transit. The content resides on sites closer to the customer,
|
||||||
closer to the customer, instead of a centralized content store
|
instead of a centralized content store that requires utilizing
|
||||||
that requires utilizing higher cost cross-country links.</para>
|
higher cost cross-country links.</para>
|
||||||
<para>This architecture usually includes a geo-location component
|
<para>This architecture includes a geo-location component
|
||||||
that places user requests at the closest possible node. In
|
that routes user requests to the closest possible node. In
|
||||||
this scenario, 100% redundancy of content across every site is
|
this scenario, 100% redundancy of content across every site is
|
||||||
a goal rather than a requirement, with the intent being to
|
a goal rather than a requirement, with the intent to
|
||||||
maximize the amount of content available that is within a
|
maximize the amount of content available within a
|
||||||
minimum number of network hops for any given end user. Despite
|
minimum number of network hops for end users. Despite
|
||||||
these differences, the storage replication configuration has
|
these differences, the storage replication configuration has
|
||||||
significant overlap with that of a geo-redundant load
|
significant overlap with that of a geo-redundant load
|
||||||
balancing use case.</para>
|
balancing use case.</para>
|
||||||
<para>In this example, the application utilizing this multi-site
|
<para>In <xref linkend="multi-site_shared_shared_keystone"/>,
|
||||||
OpenStack install that is location aware would launch web
|
the application utilizing this multi-site OpenStack install
|
||||||
server or content serving instances on the compute cluster in
|
that is location-aware would launch web server or content
|
||||||
each site. Requests from clients are first sent to a
|
serving instances on the compute cluster in each site. Requests
|
||||||
global services load balancer that determines the location of
|
from clients are first sent to a global services load balancer
|
||||||
the client, then routes the request to the closest OpenStack
|
that determines the location of the client, then routes the
|
||||||
site where the application completes the request.</para>
|
request to the closest OpenStack site where the application
|
||||||
|
completes the request.</para>
|
||||||
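The routing step performed by the global services load balancer can be illustrated with a sketch that picks the OpenStack site geographically closest to a client. The site names and coordinates are hypothetical. The sketch also assumes the client's latitude and longitude have already been resolved, for example through a GeoIP lookup. Real global load balancers usually also weigh site health and measured latency, not distance alone.

```python
import math

# Hypothetical sites: region name -> (latitude, longitude) in decimal degrees.
SITES = {
    "region-us-east": (40.7, -74.0),
    "region-eu-west": (51.5, -0.1),
    "region-ap-south": (1.35, 103.8),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def closest_site(client, sites=SITES):
    """Return the name of the site nearest to the client's (lat, lon)."""
    return min(sites, key=lambda name: haversine_km(client, sites[name]))
```

For example, a client resolved to Paris, `(48.85, 2.35)`, is sent to `region-eu-west`.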
|
<figure xml:id="multi-site_shared_shared_keystone">
|
||||||
|
<title>Multi-site shared keystone architecture</title>
|
||||||
<mediaobject>
|
<mediaobject>
|
||||||
<imageobject>
|
<imageobject>
|
||||||
<imagedata contentwidth="4in"
|
<imagedata contentwidth="6in"
|
||||||
fileref="../figures/Multi-Site_shared_keystone1.png"/>
|
fileref="../figures/Multi-Site_shared_keystone1.png"/>
|
||||||
</imageobject>
|
</imageobject>
|
||||||
</mediaobject></section>
|
</mediaobject>
|
||||||
|
</figure>
|
||||||
|
</section>
|
||||||
</section>
|
</section>
|
||||||
|
|||||||
@@ -27,105 +27,108 @@
|
|||||||
high-bandwidth links available between them, it may be wise to
|
high-bandwidth links available between them, it may be wise to
|
||||||
configure a separate storage replication network between the
|
configure a separate storage replication network between the
|
||||||
two sites to support a single Swift endpoint and a shared
|
two sites to support a single Swift endpoint and a shared
|
||||||
object storage capability between them. (An example of this
|
Object Storage capability between them. An example of this
|
||||||
technique, as well as a configuration walk-through, is
|
technique, as well as a configuration walk-through, is
|
||||||
available at <link
|
available at <link
|
||||||
xlink:href="http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network">http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network</link>).
|
xlink:href="http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network">http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network</link>.
|
||||||
Another option in this scenario is to build a dedicated set of
|
Another option in this scenario is to build a dedicated set of
|
||||||
tenant private networks across the secondary link using
|
tenant private networks across the secondary link, using
|
||||||
overlay networks with a third party mapping the site overlays
|
overlay networks with a third party mapping the site overlays
|
||||||
to each other.</para>
|
to each other.</para>
|
||||||
<para>The capacity requirements of the links between sites are
|
<para>The capacity requirements of the links between sites are
|
||||||
driven by application behavior. If the latency of the links is
|
driven by application behavior. If the link latency is
|
||||||
too high, certain applications that use a large number of
|
too high, certain applications that use a large number of
|
||||||
small packets, for example RPC calls, may encounter issues
|
small packets, for example RPC calls, may encounter issues
|
||||||
communicating with each other or operating properly.
|
communicating with each other or operating properly.
|
||||||
Additionally, OpenStack may encounter similar types of issues.
|
Additionally, OpenStack may encounter similar types of issues.
|
||||||
To mitigate this, tuning of the Identity service call timeouts may be
|
To mitigate this, Identity service call timeouts can be
|
||||||
necessary to prevent issues authenticating against a central
|
tuned to prevent issues authenticating against a central
|
||||||
Identity service.</para>
|
Identity service.</para>
|
||||||
<para>Another capacity consideration when it comes to networking
|
<para>Another network capacity consideration for a multi-site
|
||||||
for a multi-site deployment is the available amount and
|
deployment is the amount and performance of overlay networks
|
||||||
performance of overlay networks for tenant networks. If using
|
available for tenant networks. If using shared tenant networks
|
||||||
shared tenant networks across zones, it is imperative that an
|
across zones, it is imperative that an external overlay manager
|
||||||
external overlay manager or controller be used to map these
|
or controller be used to map these overlays together. It is
|
||||||
overlays together. It is necessary to ensure the amount of
|
necessary to ensure the number of possible IDs in each zone
|
||||||
possible IDs between the zones are identical. Note that, as of
|
is identical.</para>
|
||||||
the Kilo release, OpenStack Networking was not capable of managing
|
<note>
|
||||||
tunnel IDs across installations. This means that if one site
|
<para>As of the Kilo release, OpenStack Networking was not
|
||||||
runs out of IDs, but other does not, that tenant's network
|
capable of managing tunnel IDs across installations. So if
|
||||||
is unable to reach the other site.</para>
|
one site runs out of IDs, but another does not, that tenant's
|
||||||
|
network is unable to reach the other site.</para>
|
||||||
|
</note>
|
||||||
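As of Kilo, OpenStack Networking did not manage tunnel IDs across installations, so operators had to keep the ranges aligned themselves. The sketch below shows one way to check that alignment. The site names and ranges are hypothetical inputs, for example transcribed from each site's ML2 `vni_ranges` setting. The code does not query any OpenStack API.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (first_id, last_id)


def id_count(ranges: List[Range]) -> int:
    """Number of tunnel IDs in a list of non-overlapping inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


def check_sites(site_ranges: Dict[str, List[Range]]) -> List[str]:
    """Compare every site's tunnel ID ranges against a baseline site.

    The alphabetically first site is used as the baseline; an empty result
    means all sites expose identical ranges.
    """
    baseline = min(site_ranges)
    expected = sorted(site_ranges[baseline])
    problems = []
    for site in sorted(site_ranges):
        ranges = sorted(site_ranges[site])
        if ranges != expected:
            problems.append(
                f"{site}: {id_count(ranges)} IDs in {ranges}, expected "
                f"{id_count(expected)} IDs in {expected} (as on {baseline})"
            )
    return problems
```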
<para>Capacity can take other forms as well. The ability for a
|
<para>Capacity can take other forms as well. The ability for a
|
||||||
region to grow depends on scaling out the number of available
|
region to grow depends on scaling out the number of available
|
||||||
compute nodes. This topic is covered in greater detail in the
|
compute nodes. This topic is covered in greater detail in the
|
||||||
section for compute-focused deployments. However, it should be
|
section for compute-focused deployments. However, it may be
|
||||||
noted that cells may be necessary to grow an individual region
|
necessary to use cells to grow an individual region, depending on
|
||||||
beyond a certain point. This point depends on the size of your
|
the size of your cluster and the ratio of virtual machines per
|
||||||
cluster and the ratio of virtual machines per
|
|
||||||
hypervisor.</para>
|
hypervisor.</para>
|
||||||
<para>A third form of capacity comes in the multi-region-capable
|
<para>A third form of capacity comes in the multi-region-capable
|
||||||
components of OpenStack. Centralized Object Storage is capable
|
components of OpenStack. Centralized Object Storage is capable
|
||||||
of serving objects through a single namespace across multiple
|
of serving objects through a single namespace across multiple
|
||||||
regions. Since this works by accessing the object store via
|
regions. Since this works by accessing the object store through
|
||||||
swift proxy, it is possible to overload the proxies. There are
|
swift proxy, it is possible to overload the proxies. There are
|
||||||
two options available to mitigate this issue. The first is to
|
two options available to mitigate this issue:</para>
|
||||||
deploy a large number of swift proxies. The drawback to this
|
<itemizedlist>
|
||||||
is that the proxies are not load-balanced and a large file
|
<listitem>
|
||||||
request could continually hit the same proxy. The other way to
|
<para>Deploy a large number of swift proxies. The drawback is
|
||||||
mitigate this is to front-end the proxies with a caching HTTP
|
that the proxies are not load-balanced and a large file
|
||||||
proxy and load balancer. Since swift objects are returned to
|
request could continually hit the same proxy.</para>
|
||||||
the requester via HTTP, this load balancer would alleviate the
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para>Add a caching HTTP proxy and load balancer in front of
|
||||||
|
the swift proxies. Since swift objects are returned to the
|
||||||
|
requester via HTTP, this load balancer would alleviate the
|
||||||
load required on the swift proxies.</para>
|
load required on the swift proxies.</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
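The second option can be illustrated with a toy model of a caching load balancer. It answers repeated GET requests from a small LRU cache and spreads cache misses round-robin across the swift proxies, so a popular large object no longer hits the same proxy every time. This is a sketch only. The proxy names and the `fetch` callable are hypothetical stand-ins, and a real deployment would use an existing HTTP cache and load balancer rather than this code.

```python
from collections import OrderedDict
from itertools import cycle
from typing import Callable, Dict, List


class CachingBalancer:
    """Toy LRU cache in front of round-robin swift proxies (illustrative only)."""

    def __init__(self, proxies: List[str], fetch: Callable[[str, str], bytes],
                 cache_size: int = 128):
        self._next_proxy = cycle(proxies)   # round-robin over the proxies
        self._fetch = fetch                 # hypothetical fetch(proxy, path) -> body
        self._cache = OrderedDict()         # path -> body, least recently used first
        self._cache_size = cache_size
        self.hits_per_proxy: Dict[str, int] = {p: 0 for p in proxies}

    def get(self, path: str) -> bytes:
        if path in self._cache:
            self._cache.move_to_end(path)   # mark as most recently used
            return self._cache[path]
        proxy = next(self._next_proxy)      # cache miss: pick the next proxy
        self.hits_per_proxy[proxy] += 1
        body = self._fetch(proxy, path)
        self._cache[path] = body
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False) # evict the least recently used entry
        return body
```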
<section xml:id="utilization-multi-site"><title>Utilization</title>
|
<section xml:id="utilization-multi-site"><title>Utilization</title>
|
||||||
<para>While constructing a multi-site OpenStack environment is the
|
<para>While constructing a multi-site OpenStack environment is the
|
||||||
goal of this guide, the real test is whether an application
|
goal of this guide, the real test is whether an application
|
||||||
can utilize it.</para>
|
can utilize it.</para>
|
||||||
<para>Identity is normally the first interface for the majority of
|
<para>The Identity service is normally the first interface for
|
||||||
OpenStack users. Interacting with the Identity service is required for
|
OpenStack users and is required for almost all major operations
|
||||||
almost all major operations within OpenStack. Therefore, it is
|
within OpenStack. Therefore, it is important that you provide users
|
||||||
important to ensure that you provide users with a single URL
|
with a single URL for Identity service authentication, and
|
||||||
for Identity service authentication. Equally important is proper
|
document the configuration of regions within the Identity service.
|
||||||
documentation and configuration of regions within the Identity service.
|
|
||||||
Each of the sites defined in your installation is considered
|
Each of the sites defined in your installation is considered
|
||||||
to be a region in Identity nomenclature. This is important for
|
to be a region in Identity nomenclature. This is important for
|
||||||
the users of the system, when reading Identity documentation,
|
the users, as it is required to define the region name when
|
||||||
as it is required to define the region name when providing
|
performing actions against an API endpoint or in the dashboard.</para>
|
||||||
actions to an API endpoint or in the dashboard.</para>
|
|
||||||
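To show why the region name matters, the sketch below selects an endpoint from a service catalog by service type, region, and interface. The catalog roughly follows the shape of an Identity v3 token catalog, where services have a `type` and a list of `endpoints` carrying `region_id`, `interface`, and `url`. The region names and URLs are hypothetical. Client libraries normally perform this lookup on the user's behalf.

```python
# Hypothetical catalog with one compute endpoint per region.
CATALOG = [
    {
        "type": "compute",
        "endpoints": [
            {"region_id": "RegionOne", "interface": "public",
             "url": "https://compute.site1.example.com:8774/v2.1"},
            {"region_id": "RegionTwo", "interface": "public",
             "url": "https://compute.site2.example.com:8774/v2.1"},
        ],
    },
]


def find_endpoint(catalog, service_type, region, interface="public"):
    """Return the URL for service_type in region, or raise LookupError."""
    for service in catalog:
        if service["type"] != service_type:
            continue
        for endpoint in service["endpoints"]:
            if endpoint["region_id"] == region and endpoint["interface"] == interface:
                return endpoint["url"]
    raise LookupError(f"no {interface} {service_type} endpoint in {region}")
```

Without a region name, both compute endpoints would match, and the client could not tell which site to act on.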
<para>Load balancing is another common issue with multi-site
|
<para>Load balancing is another common issue with multi-site
|
||||||
installations. While it is still possible to run HAProxy
|
installations. While it is still possible to run HAProxy
|
||||||
instances with Load-Balancer-as-a-Service, these are local
|
instances with Load-Balancer-as-a-Service, these are local
|
||||||
to a specific region. Some applications may be able to cope
|
to a specific region. Some applications can manage this using
|
||||||
with this via internal mechanisms. Others, however, may
|
internal mechanisms. Other applications may require the
|
||||||
require the implementation of an external system including
|
implementation of an external system, including global services
|
||||||
global services load balancers or anycast-advertised
|
load balancers or anycast-advertised DNS.</para>
|
||||||
DNS.</para>
|
|
||||||
<para>Depending on the storage model chosen during site design,
|
<para>Depending on the storage model chosen during site design,
|
||||||
storage replication and availability are also a concern
|
storage replication and availability are also a concern
|
||||||
for end-users. If an application is capable of understanding
|
for end-users. If an application can support regions, then it
|
||||||
regions, then it is possible to keep the object storage system
|
is possible to keep the object storage system separated by region.
|
||||||
separated by region. In this case, users who want to have an
|
In this case, users who want to have an object available to
|
||||||
object available to more than one region need to do the
|
more than one region need to perform cross-site replication.
|
||||||
cross-site replication themselves. With a centralized swift
|
However, with a centralized swift proxy, the user may need to
|
||||||
proxy, however, the user may need to benchmark the replication
|
benchmark the replication timing of the Object Storage back end.
|
||||||
timing of the Object Storage back end. Benchmarking allows the
|
Benchmarking allows the operational staff to provide users with
|
||||||
operational staff to provide users with an understanding of
|
an understanding of the amount of time required for a stored or
|
||||||
the amount of time required for a stored or modified object to
|
modified object to become available to the entire environment.</para>
|
||||||
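A benchmark of replication timing can be sketched as follows: write an object once, poll every region until the object is visible everywhere, and record how long each region took. The `is_visible` callables are hypothetical placeholders for whatever per-region check the operators use, such as a HEAD request against each region's endpoint. The sketch covers only the timing logic. The clock and sleep functions are injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Optional


def time_to_full_visibility(
    is_visible: Dict[str, Callable[[], bool]],
    timeout_s: float = 300.0,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[float]]:
    """Return seconds until each region first saw the object (None if it timed out)."""
    start = clock()
    seen: Dict[str, Optional[float]] = {region: None for region in is_visible}
    while True:
        elapsed = clock() - start
        for region, check in is_visible.items():
            if seen[region] is None and check():
                seen[region] = elapsed
        if all(t is not None for t in seen.values()) or elapsed >= timeout_s:
            return seen
        sleep(poll_interval_s)
```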
become available to the entire environment.</para></section>
|
</section>
|
||||||
<section xml:id="performance"><title>Performance</title>
|
<section xml:id="performance"><title>Performance</title>
|
||||||
<para>Determining the performance of a multi-site installation
|
<para>Determining the performance of a multi-site installation
|
||||||
involves considerations that do not come into play in a
|
involves considerations that do not come into play in a
|
||||||
single-site deployment. Being a distributed deployment,
|
single-site deployment. Because multi-site deployments are distributed,
|
||||||
multi-site deployments incur a few extra penalties to
|
their performance may be affected in certain
|
||||||
performance in certain situations.</para>
|
situations.</para>
|
||||||
<para>Since multi-site systems can be geographically separated,
|
<para>Since multi-site systems can be geographically separated,
|
||||||
they may have worse than normal latency or jitter when
|
there may be greater latency or jitter when communicating across
|
||||||
communicating across regions. This can especially impact
|
regions. This can especially impact systems like the OpenStack
|
||||||
systems like the OpenStack Identity service when making
|
Identity service when making authentication attempts from regions
|
||||||
authentication attempts from regions that do not contain the
|
that do not contain the centralized Identity implementation. It
|
||||||
centralized Identity implementation. It can also affect
|
can also affect applications which rely on Remote Procedure Call (RPC)
|
||||||
certain applications which rely on remote procedure call (RPC)
|
for normal operation. An example of this can be seen in high
|
||||||
for normal operation. An example of this can be seen in High
|
performance computing workloads.</para>
|
||||||
Performance Computing workloads.</para>
|
|
||||||
<para>Storage availability can also be impacted by the
|
<para>Storage availability can also be impacted by the
|
||||||
architecture of a multi-site deployment. A centralized Object
|
architecture of a multi-site deployment. A centralized Object
|
||||||
Storage service requires more time for an object to be
|
Storage service requires more time for an object to be
|
||||||
@@ -137,4 +140,37 @@
|
|||||||
to manually cope with this limitation by creating duplicate
|
to manually cope with this limitation by creating duplicate
|
||||||
block storage entries in each region.</para>
|
block storage entries in each region.</para>
|
||||||
</section>
|
</section>
|
||||||
|
<section xml:id="openstack-components_multi-site">
|
||||||
|
<title>OpenStack components</title>
|
||||||
|
<para>Most OpenStack installations require a bare minimum set of
|
||||||
|
pieces to function. These include the OpenStack Identity
|
||||||
|
(keystone) for authentication, OpenStack Compute
|
||||||
|
(nova) for compute, OpenStack Image service (glance) for image
|
||||||
|
storage, OpenStack Networking (neutron) for networking, and
|
||||||
|
potentially an object store in the form of OpenStack Object
|
||||||
|
Storage (swift). Deploying a multi-site installation also demands extra
|
||||||
|
components in order to coordinate between regions. A centralized
|
||||||
|
Identity service is necessary to provide the single authentication
|
||||||
|
point. A centralized dashboard is also recommended to provide a
|
||||||
|
single login point and a mapping to the API and CLI
|
||||||
|
options available. A centralized Object Storage service may also
|
||||||
|
be used, but will require the installation of the swift proxy
|
||||||
|
service.</para>
|
||||||
|
<para>It may also be helpful to install a few extra options in
|
||||||
|
order to facilitate certain use cases. For example,
|
||||||
|
installing Designate may assist in automatically generating
|
||||||
|
DNS domains for each region with an automatically populated
|
||||||
|
zone full of resource records for each instance. This
|
||||||
|
facilitates using DNS as a mechanism for determining which
|
||||||
|
region will be selected for certain applications.</para>
|
||||||
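The idea of one automatically populated zone per region can be illustrated without Designate itself. Given a list of instances and their regions, the sketch builds one zone per region containing an A record for each instance. The zone naming scheme (`<region>.example.com.`) and the instance inventory are hypothetical. With Designate, the service itself would manage these records.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical instance inventory: (instance name, region, IPv4 address).
INSTANCES = [
    ("web-1", "region-a", "192.0.2.10"),
    ("web-2", "region-a", "192.0.2.11"),
    ("web-1", "region-b", "198.51.100.10"),
]


def build_zones(instances, base_domain="example.com."):
    """Map each region's zone name to a list of (fqdn, 'A', address) records."""
    zones: Dict[str, List[Tuple[str, str, str]]] = defaultdict(list)
    for name, region, address in instances:
        zone = f"{region}.{base_domain}"
        zones[zone].append((f"{name}.{zone}", "A", address))
    return dict(zones)
```

Because the same instance name can exist in several regions, the region-specific zone keeps the names distinct (`web-1.region-a.example.com.` versus `web-1.region-b.example.com.`).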
|
<para>Another useful tool for managing a multi-site installation
|
||||||
|
is Orchestration (heat). The Orchestration module allows the
|
||||||
|
use of templates to define a set of instances to be launched
|
||||||
|
together or for scaling existing sets. It can also be used to
|
||||||
|
set up matching or differentiated groupings based on
|
||||||
|
regions. For instance, if an application requires an equally
|
||||||
|
balanced number of nodes across sites, the same heat template
|
||||||
|
can be used to cover each site with small alterations to only
|
||||||
|
the region name.</para>
|
||||||
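When the same heat template is launched in each region with only the region name changed, each region's stack still needs to know how many nodes to run. The sketch below splits a total node count as evenly as possible across regions. The region names and totals are hypothetical. The result could be passed as a parameter, such as an autoscaling group size, to each region's copy of the template.

```python
from typing import Dict, List


def balanced_counts(total_nodes: int, regions: List[str]) -> Dict[str, int]:
    """Split total_nodes across regions so that counts differ by at most one."""
    if not regions:
        raise ValueError("at least one region is required")
    base, extra = divmod(total_nodes, len(regions))
    # The first `extra` regions, in the given order, receive one additional node.
    return {region: base + (1 if i < extra else 0)
            for i, region in enumerate(regions)}
```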
|
</section>
|
||||||
</section>
|
</section>
|
||||||
|
|||||||
@@ -6,55 +6,16 @@
|
|||||||
xml:id="user-requirements-multi-site">
|
xml:id="user-requirements-multi-site">
|
||||||
<?dbhtml stop-chunking?>
|
<?dbhtml stop-chunking?>
|
||||||
<title>User requirements</title>
|
<title>User requirements</title>
|
||||||
<para>A multi-site architecture is complex and has its own risks
|
|
||||||
and considerations, therefore it is important to make sure
|
|
||||||
when contemplating the design such an architecture that it
|
|
||||||
meets the user and business requirements.</para>
|
|
||||||
<para>Many jurisdictions have legislative and regulatory
|
|
||||||
requirements governing the storage and management of data in
|
|
||||||
cloud environments. Common areas of regulation include:</para>
|
|
||||||
<itemizedlist>
|
|
||||||
<listitem>
|
|
||||||
<para>Data retention policies ensuring storage of
|
|
||||||
persistent data and records management to meet data
|
|
||||||
archival requirements.</para>
|
|
||||||
</listitem>
|
|
||||||
<listitem>
|
|
||||||
<para>Data ownership policies governing the possession and
|
|
||||||
responsibility for data.</para>
|
|
||||||
</listitem>
|
|
||||||
<listitem>
|
|
||||||
<para>Data sovereignty policies governing the storage of
|
|
||||||
data in foreign countries or otherwise separate
|
|
||||||
jurisdictions.</para>
|
|
||||||
</listitem>
|
|
||||||
<listitem>
|
|
||||||
<para>Data compliance policies governing types of
|
|
||||||
information that needs to reside in certain locations
|
|
||||||
due to regular issues and, more importantly, cannot
|
|
||||||
reside in other locations for the same reason.</para>
|
|
||||||
</listitem>
|
|
||||||
</itemizedlist>
|
|
||||||
<para>Examples of such legal frameworks include the data
|
|
||||||
protection framework of the European Union (<link
|
|
||||||
xlink:href="http://ec.europa.eu/justice/data-protection">http://ec.europa.eu/justice/data-protection</link>)
|
|
||||||
and the requirements of the Financial Industry Regulatory
|
|
||||||
Authority (<link
|
|
||||||
xlink:href="http://www.finra.org/Industry/Regulation/FINRARules">http://www.finra.org/Industry/Regulation/FINRARules</link>)
|
|
||||||
in the United States. Consult a local regulatory body for more
|
|
||||||
information.</para>
|
|
||||||
<section xml:id="workload-characteristics">
|
<section xml:id="workload-characteristics">
|
||||||
<title>Workload characteristics</title>
|
<title>Workload characteristics</title>
|
||||||
<para>The expected workload is a critical requirement that needs
|
<para>An understanding of the expected workloads for a desired
|
||||||
to be captured to guide decision-making. An understanding of
|
multi-site environment and use case is an important factor in
|
||||||
the workloads in the context of the desired multi-site
|
the decision-making process. In this context, <literal>workload</literal>
|
||||||
environment and use case is important. Another way of thinking
|
refers to the way the systems are used. A workload could be a
|
||||||
about a workload is to think of it as the way the systems are
|
single application or a suite of applications that work together.
|
||||||
used. A workload could be a single application or a suite of
|
It could also be a duplicate set of applications that need to
|
||||||
applications that work together. It could also be a duplicate
|
run in multiple cloud environments. Often in a multi-site deployment,
|
||||||
set of applications that need to run in multiple cloud
|
the same workload will need to work identically in more than one
|
||||||
environments. Often in a multi-site deployment the same
|
|
||||||
workload will need to work identically in more than one
|
|
||||||
physical location.</para>
|
physical location.</para>
|
||||||
<para>This multi-site scenario likely includes one or more of the
|
<para>This multi-site scenario likely includes one or more of the
|
||||||
other scenarios in this book with the additional requirement
|
other scenarios in this book with the additional requirement
|
||||||
@@ -72,26 +33,26 @@
|
|||||||
<title>Consistency of images and templates across different
|
<title>Consistency of images and templates across different
|
||||||
sites</title>
|
sites</title>
|
||||||
<para>It is essential that the deployment of instances is
|
<para>It is essential that the deployment of instances is
|
||||||
consistent across the different sites. This needs to be built
|
consistent across the different sites and built
|
||||||
into the infrastructure. If the OpenStack Object Storage is used as
|
into the infrastructure. If the OpenStack Object Storage is used as
|
||||||
a back end for the Image service, it is possible to create repositories of
|
a back end for the Image service, it is possible to create repositories
|
||||||
consistent images across multiple sites. Having central
|
of consistent images across multiple sites. Having central
|
||||||
endpoints with multiple storage nodes allows consistent centralized
|
endpoints with multiple storage nodes allows consistent centralized
|
||||||
storage for each and every site.</para>
|
storage for every site.</para>
|
||||||
<para>Not using a centralized object store increases operational
|
<para>Not using a centralized object store increases the operational
|
||||||
overhead so that a consistent image library can be maintained. This
|
overhead of maintaining a consistent image library. This
|
||||||
could include development of a replication mechanism to handle
|
could include development of a replication mechanism to handle
|
||||||
the transport of images and the changes to the images across
|
the transport of images and the changes to the images across
|
||||||
multiple sites.</para></section>
|
multiple sites.</para></section>
|
||||||
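Without a centralized object store, a replication mechanism for the image library first has to determine which images are missing or stale at each site. The sketch below compares per-site image inventories by checksum against a chosen reference site. The site names, image names, and checksums are hypothetical, and the actual transfer of images is left out.

```python
from typing import Dict, List

# Hypothetical inventories: site -> {image name: checksum}.
Inventory = Dict[str, Dict[str, str]]


def images_to_sync(inventory: Inventory, reference_site: str) -> Dict[str, List[str]]:
    """For each non-reference site, list images that are missing or have a different checksum."""
    reference = inventory[reference_site]
    plan = {}
    for site, images in inventory.items():
        if site == reference_site:
            continue
        plan[site] = sorted(
            name for name, checksum in reference.items()
            if images.get(name) != checksum
        )
    return plan
```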
<section xml:id="high-availability-multi-site"><title>High availability</title>
|
<section xml:id="high-availability-multi-site">
|
||||||
|
<title>High availability</title>
|
||||||
<para>If high availability is a requirement to provide continuous
|
<para>If high availability is a requirement to provide continuous
|
||||||
infrastructure operations, a basic requirement of high
|
infrastructure operations, a basic requirement of high
|
||||||
availability should be defined.</para>
|
availability should be defined.</para>
|
||||||
<para>The OpenStack management components need to have a basic and
|
<para>The OpenStack management components need to have a basic and
|
||||||
minimal level of redundancy. The simplest example is the loss
|
minimal level of redundancy. The simplest example is the loss
|
||||||
of any single site has no significant impact on the
|
of any single site should have minimal impact on the
|
||||||
availability of the OpenStack services of the entire
|
availability of the OpenStack services.</para>
|
||||||
infrastructure.</para>
|
|
||||||
<para>The <link
|
<para>The <link
|
||||||
xlink:href="http://docs.openstack.org/high-availability-guide/content/"><citetitle>OpenStack
|
xlink:href="http://docs.openstack.org/high-availability-guide/content/"><citetitle>OpenStack
|
||||||
High Availability Guide</citetitle></link>
|
High Availability Guide</citetitle></link>
|
||||||
@@ -111,14 +72,12 @@
|
|||||||
WAN network design between the sites.</para>
|
WAN network design between the sites.</para>
|
||||||
<para>Connecting more than two sites increases the challenges and
|
<para>Connecting more than two sites increases the challenges and
|
||||||
adds more complexity to the design considerations. Multi-site
|
adds more complexity to the design considerations. Multi-site
|
||||||
implementations require extra planning to address the
|
implementations require planning to address the additional
|
||||||
additional topology complexity used for internal and external
|
topology used for internal and external connectivity. Some options
|
||||||
connectivity. Some options include full mesh topology, hub
|
include full mesh topology, hub spoke, spine leaf, and 3D Torus.</para>
|
||||||
spoke, spine leaf, or 3d Torus.</para>
|
<para>If applications running in a cloud are not cloud-aware, there
|
||||||
<para>Not all the applications running in a cloud are cloud-aware.
|
should be clear measures and expectations to define what the
|
||||||
If that is the case, there should be clear measures and
|
infrastructure can and cannot support. An example would be
|
||||||
expectations to define what the infrastructure can support
|
|
||||||
and, more importantly, what it cannot. An example would be
|
|
||||||
shared storage between sites. It is possible, however such a
|
shared storage between sites. It is possible, however such a
|
||||||
solution is not native to OpenStack and requires a third-party
|
solution is not native to OpenStack and requires a third-party
|
||||||
hardware vendor to fulfill such a requirement. Another example
|
hardware vendor to fulfill such a requirement. Another example
|
||||||
@@ -126,21 +85,21 @@
|
|||||||
in object storage directly. These applications need to be
|
in object storage directly. These applications need to be
|
||||||
cloud aware to make good use of an OpenStack Object
|
cloud aware to make good use of an OpenStack Object
|
||||||
Store.</para></section>
|
Store.</para></section>
|
||||||
<section xml:id="application-readiness"><title>Application readiness</title>
|
<section xml:id="application-readiness">
|
||||||
|
<title>Application readiness</title>
|
||||||
<para>Some applications are tolerant of the lack of synchronized
|
<para>Some applications are tolerant of the lack of synchronized
|
||||||
object storage, while others may need those objects to be
|
object storage, while others may need those objects to be
|
||||||
replicated and available across regions. Understanding of how
|
replicated and available across regions. Understanding how
|
||||||
the cloud implementation impacts new and existing applications
|
the cloud implementation impacts new and existing applications
|
||||||
is important for risk mitigation and the overall success of a
|
is important for risk mitigation, and the overall success of a
|
||||||
cloud project. Applications may have to be written to expect
|
cloud project. Applications may have to be written or rewritten
|
||||||
an infrastructure with little to no redundancy. Existing
|
for an infrastructure with little to no redundancy, or with the
|
||||||
applications not developed with the cloud in mind may need to
|
cloud in mind.</para></section>
|
||||||
be rewritten.</para></section>
|
<section xml:id="cost-multi-site">
|
||||||
<section xml:id="cost-multi-site"><title>Cost</title>
|
<title>Cost</title>
|
||||||
<para>The requirement of having more than one site has a cost
|
<para>A greater number of sites increase cost and complexity for a
|
||||||
attached to it. The greater the number of sites, the greater
|
multi-site deployment. Costs can be broken down into the following
|
||||||
the cost and complexity. Costs can be broken down into the
|
categories:</para>
|
||||||
following categories:</para>
|
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>Compute resources</para>
|
<para>Compute resources</para>
|
||||||
@@ -163,34 +122,32 @@
|
|||||||
</itemizedlist></section>
|
</itemizedlist></section>
|
||||||
<section xml:id="site-loss-and-recovery">
|
<section xml:id="site-loss-and-recovery">
|
||||||
<title>Site loss and recovery</title>
|
<title>Site loss and recovery</title>
|
||||||
<para>Outages can cause loss of partial or full functionality of a
|
<para>Outages can cause partial or full loss of site functionality.
|
||||||
site. Strategies should be implemented to understand and plan
|
Strategies should be implemented to understand and plan for recovery
|
||||||
for recovery scenarios.</para>
|
scenarios.</para>
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>The deployed applications need to continue to
|
<para>The deployed applications need to continue to
|
||||||
function and, more importantly, consideration should
|
function and, more importantly, you must consider the
|
||||||
be taken of the impact on the performance and
|
impact on the performance and reliability of the application
|
||||||
reliability of the application when a site is
|
when a site is unavailable.</para>
|
||||||
unavailable.</para>
|
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>It is important to understand what happens to the
|
<para>It is important to understand what happens to the
|
||||||
replication of objects and data between the sites when
|
replication of objects and data between the sites when
|
||||||
a site goes down. If this causes queues to start
|
a site goes down. If this causes queues to start
|
||||||
building up, consider how long these queues can
|
building up, consider how long these queues can
|
||||||
safely exist until something explodes.</para>
|
safely exist until an error occurs.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>Ensure determination of the method for resuming
|
<para>After an outage, ensure the method for resuming proper
|
||||||
proper operations of a site when it comes back online
|
operations of a site is implemented when it comes back online.
|
||||||
after a disaster. We recommend you architect the
|
We recommend you architect the recovery to avoid race conditions.</para>
|
||||||
recovery to avoid race conditions.</para>
|
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist></section>
|
</itemizedlist></section>
|
||||||
<section xml:id="compliance-and-geo-location-multi-site">
|
<section xml:id="compliance-and-geo-location-multi-site">
|
||||||
<title>Compliance and geo-location</title>
|
<title>Compliance and geo-location</title>
|
||||||
<para>An organization could have certain legal obligations and
|
<para>An organization may have certain legal obligations and
|
||||||
regulatory compliance measures which could require certain
|
regulatory compliance measures which could require certain
|
||||||
workloads or data to not be located in certain regions.</para></section>
|
workloads or data to not be located in certain regions.</para></section>
|
||||||
<section xml:id="auditing-multi-site">
|
<section xml:id="auditing-multi-site">
|
||||||
@@ -210,11 +167,10 @@
|
|||||||
site.</para></section>
|
site.</para></section>
|
||||||
<section xml:id="authentication-between-sites">
|
<section xml:id="authentication-between-sites">
|
||||||
<title>Authentication between sites</title>
|
<title>Authentication between sites</title>
|
||||||
<para>Ideally it is best to have a single authentication domain
|
<para>It is recommended to have a single authentication domain
|
||||||
and not need a separate implementation for each and every
|
rather than a separate implementation for each and every
|
||||||
site. This, of course, requires an authentication
|
site. This requires an authentication mechanism that is highly
|
||||||
mechanism that is highly available and distributed to ensure
|
available and distributed to ensure continuous operation.
|
||||||
continuous operation. Authentication server locality is also
|
Authentication server locality might be required and should be
|
||||||
something that might be needed as well and should be planned
|
planned for.</para></section>
|
||||||
for.</para></section>
|
|
||||||
</section>
|
</section>
|
||||||
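The diff notes that, without a centralized object store, keeping the image library consistent across sites may require a custom replication mechanism. The following is a minimal sketch of only the bookkeeping half of such a mechanism: deciding which images each site is missing or holds stale copies of. It assumes each site's image catalog is available as a plain mapping of image name to checksum; the `plan_image_sync` function and the catalog format are hypothetical, and the actual image transfer is out of scope.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog format: image name -> checksum, one catalog per site.
Catalog = Dict[str, str]


def plan_image_sync(reference: Catalog,
                    sites: Dict[str, Catalog]) -> Dict[str, List[Tuple[str, str]]]:
    """Return, per site, the images that must be copied or refreshed.

    Each entry is (image_name, reason), where reason is "missing" when the
    site lacks the image and "stale" when its checksum differs from the
    reference copy. Images present only at a site are left untouched.
    """
    plan: Dict[str, List[Tuple[str, str]]] = {}
    for site, catalog in sites.items():
        actions: List[Tuple[str, str]] = []
        # Walk the reference catalog in name order so plans are deterministic.
        for name, checksum in sorted(reference.items()):
            if name not in catalog:
                actions.append((name, "missing"))
            elif catalog[name] != checksum:
                actions.append((name, "stale"))
        plan[site] = actions
    return plan


if __name__ == "__main__":
    # Hypothetical catalogs for illustration only.
    reference = {"ubuntu-14.04": "a1", "cirros-0.3.4": "b2"}
    sites = {
        "site-a": {"ubuntu-14.04": "a1", "cirros-0.3.4": "b2"},
        "site-b": {"ubuntu-14.04": "old"},
    }
    for site, actions in plan_image_sync(reference, sites).items():
        print(site, actions)
```

A real replication mechanism would also need to handle deletions and transfers that fail partway, which this sketch deliberately ignores.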
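The diff states that connecting more than two sites adds topology complexity. Counting the WAN links each option requires is one rough way to see this. The following is a minimal sketch that assumes one dedicated link per directly connected site pair and a single hub site for hub-and-spoke. Spine-leaf and 3D torus are omitted because their link counts depend on how sites map onto switches, which the guide does not specify.

```python
def full_mesh_links(sites: int) -> int:
    """Every pair of sites is directly connected: n * (n - 1) / 2 links."""
    if sites < 0:
        raise ValueError("site count must be non-negative")
    return sites * (sites - 1) // 2


def hub_spoke_links(sites: int) -> int:
    """One hub site connects to every other site: n - 1 links (0 for no sites)."""
    if sites < 0:
        raise ValueError("site count must be non-negative")
    return max(sites - 1, 0)


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(f"{n:>2} sites: full mesh {full_mesh_links(n):>2}, "
              f"hub-spoke {hub_spoke_links(n):>2}")
```

With two sites, both designs need a single link, so the difference only appears beyond two sites. A full mesh grows quadratically with the number of sites, while hub-and-spoke grows linearly but makes the hub site a single point of failure for inter-site traffic.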
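For the site-loss item about replication queues building up, a back-of-the-envelope estimate of how long a queue can safely grow is its remaining capacity divided by the net rate at which it fills. The following is a minimal sketch that assumes a constant enqueue rate while the remote site is down and a fixed capacity measured in messages. The `time_until_full` function and the figures in the example are hypothetical, not measurements.

```python
import math


def time_until_full(capacity: int, current_depth: int,
                    enqueue_rate: float, drain_rate: float = 0.0) -> float:
    """Seconds until a replication queue reaches capacity.

    Assumes constant rates in messages per second. Returns math.inf when the
    queue drains at least as fast as it fills, and 0.0 if it is already full.
    """
    if capacity <= 0 or current_depth < 0 or enqueue_rate < 0 or drain_rate < 0:
        raise ValueError("capacity must be positive; depth and rates non-negative")
    remaining = capacity - current_depth
    if remaining <= 0:
        return 0.0
    net_growth = enqueue_rate - drain_rate
    if net_growth <= 0:
        return math.inf
    return remaining / net_growth


if __name__ == "__main__":
    # Hypothetical figures: a 1,000,000-message limit, 50,000 already queued,
    # 200 messages/s arriving, nothing drained while the remote site is down.
    seconds = time_until_full(1_000_000, 50_000, 200.0)
    print(f"queue full in about {seconds / 3600:.1f} hours")
```

The result only indicates headroom under the stated assumptions. In practice, the tightest of the message-count, memory, and disk limits applies.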