Multi-site chapter edits

1. Edits to the multi-site chapter
2. Removed duplicated legal content which was added to a common section. See https://review.openstack.org/#/c/212299/

Change-Id: I10e3a04650548454c73024d87cbbb6fda63454e8
Implements: blueprint arch-guide
darrenchan 2015-08-12 23:26:53 +10:00
parent 87ff7002f8
commit 68e8c66e79
6 changed files with 339 additions and 350 deletions


@ -6,16 +6,9 @@
xml:id="multi_site">
<title>Multi-site</title>
<para>OpenStack is capable of running in a multi-region
configuration. This enables some parts of OpenStack to
effectively manage a group of sites as a single cloud.</para>
<para>Some use cases that might indicate a need for a multi-site
deployment of OpenStack include:</para>
<itemizedlist>


@ -6,59 +6,61 @@
xml:id="arch-design-architecture-multiple-site">
<?dbhtml stop-chunking?>
<title>Architecture</title>
<para><xref linkend="multi-site_arch"/>
illustrates a high level multi-site OpenStack
architecture. Each site is an OpenStack cloud but it may be necessary
to architect the sites on different versions. For example, if the
second site is intended to be a replacement for the first site,
they would be different. Another common design would be a private
OpenStack cloud with a replicated site that would be used for high
availability or disaster recovery. The most important design decision
is configuring storage as a single shared pool or separate pools,
depending on user and technical requirements.</para>
<figure xml:id="multi-site_arch">
<title>Multi-site OpenStack architecture</title>
<mediaobject>
<imageobject>
<imagedata contentwidth="6in"
fileref="../figures/Multi-Site_shared_keystone_horizon_swift1.png"/>
</imageobject>
</mediaobject>
</figure>
<section xml:id="openstack-services-architecture">
<title>OpenStack services architecture</title>
<para>The OpenStack Identity service, which is used by all other
OpenStack components for authorization and the catalog of
service endpoints, supports the concept of regions. A region
is a logical construct used to group OpenStack services in
close proximity to one another. The concept of
regions is flexible; it may contain OpenStack service
endpoints located within a distinct geographic region or regions.
It may be smaller in scope, where a region is a single rack
within a data center, with multiple regions existing in adjacent
racks in the same data center.</para>
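The grouping of service endpoints by region can be illustrated with a small model; this is a sketch only, and the region names, service types, and URLs below are hypothetical:

```python
# Toy model of a region-scoped service catalog; region names, service
# types, and URLs are hypothetical, for illustration only.
CATALOG = {
    "RegionOne": {
        "compute": "https://compute.r1.example.com:8774/v2.1",
        "network": "https://network.r1.example.com:9696",
    },
    "RegionTwo": {
        "compute": "https://compute.r2.example.com:8774/v2.1",
        "network": "https://network.r2.example.com:9696",
    },
}

def endpoint_for(region: str, service: str) -> str:
    """Look up the endpoint for a service within one region."""
    try:
        return CATALOG[region][service]
    except KeyError:
        raise LookupError(f"no {service} endpoint in {region}") from None
```

A region here may equally well stand for a rack or chassis, as the text notes; only the grouping matters.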
<para>The majority of OpenStack components are designed to run
within the context of a single region. The OpenStack Compute
service is designed to manage compute resources within a region,
with support for subdivisions of compute resources by using
availability zones and cells. The OpenStack Networking service
can be used to manage network resources in the same broadcast
domain or collection of switches that are linked. The OpenStack
Block Storage service controls storage resources within a region
with all storage resources residing on the same storage network.
Like the OpenStack Compute service, the OpenStack Block Storage
service also supports the availability zone construct which can
be used to subdivide storage resources.</para>
<para>The OpenStack dashboard, OpenStack Identity, and OpenStack
Object Storage services are components that can each be deployed
centrally in order to serve multiple regions.</para>
</section>
<section xml:id="arch-multi-storage">
<title>Storage</title>
<para>With multiple OpenStack regions, it is recommended to configure
a single OpenStack Object Storage service endpoint to deliver
shared file storage for all regions. The Object Storage service
internally replicates files to multiple nodes which can be used
by applications or workloads in multiple regions. This simplifies
high availability failover and disaster recovery rollback.</para>
<para>In order to scale the Object Storage service to meet the workload
of multiple regions, multiple proxy workers are run and
load-balanced, storage nodes are installed in each region, and the
@ -68,19 +70,20 @@
reducing the actual load on the storage network. In addition to an
HTTP caching layer, use a caching layer like Memcache to cache
objects between the proxy and storage nodes.</para>
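The caching layer described above can be sketched as follows; this is a toy stand-in for Memcache, and <code>fetch_object</code> is a hypothetical backend call, not a real swift API:

```python
# Minimal sketch of a cache sitting between the swift proxy and the
# storage nodes, standing in for Memcache. fetch_object is a
# hypothetical backend call, for illustration only.
from typing import Callable, Dict

class ObjectCache:
    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        # On a miss, fall through to a storage node and remember the result.
        if key not in self._store:
            self._store[key] = self._fetch(key)
        return self._store[key]

backend_calls = []

def fetch_object(key: str) -> bytes:
    backend_calls.append(key)   # pretend this reads from a storage node
    return b"object-" + key.encode()

cache = ObjectCache(fetch_object)
```

Repeated requests for the same object hit the cache rather than the storage network, which is the load reduction the paragraph describes.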
<para>If the cloud is designed with a separate Object Storage
Service endpoint made available in each region, applications are
required to handle synchronization (if desired) and other management
operations to ensure consistency across the nodes. For some
applications, having multiple Object Storage Service endpoints
located in the same region as the application may be desirable due
to reduced latency, cross region bandwidth, and ease of
deployment.</para>
<note>
<para>For the Block Storage service, the most important decisions
are the selection of the storage technology, and whether
a dedicated network is used to carry storage traffic
from the storage service to the compute nodes.</para>
</note>
</section>
<section xml:id="arch-networking-multiple">
<title>Networking</title>
@ -100,18 +103,19 @@
</section>
<section xml:id="arch-dependencies-multiple">
<title>Dependencies</title>
<para>The architecture for a multi-site OpenStack installation
is dependent on a number of factors. One major dependency to
consider is storage. When designing the storage system, the
storage mechanism needs to be determined. Once the storage
type is determined, how it is accessed is critical. For example,
we recommend that storage should use a dedicated network.
Another concern is how the storage is configured to protect
the data, for example, the Recovery Point Objective (RPO) and
the Recovery Time Objective (RTO). How quickly recovery from
a fault can be completed determines how often data
replication is required. Ensure that enough storage is
allocated to support the data protection strategy.
</para>
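The RPO-driven replication cadence described above can be sketched as follows; this is a minimal illustration, and the safety factor is an assumed value, not an OpenStack setting:

```python
# Hedged sketch of deriving a replication interval from an RPO target.
# The 0.5 safety factor is an assumption, not an OpenStack default.
def replication_interval(rpo_minutes: float, safety_factor: float = 0.5) -> float:
    """Replicate well within the RPO window so that a single missed
    run still keeps the potential data loss inside the RPO."""
    if rpo_minutes <= 0:
        raise ValueError("RPO must be positive")
    return rpo_minutes * safety_factor
```

For example, a one-hour RPO with the assumed factor implies replicating roughly every 30 minutes; storage allocation then has to cover that replication cadence.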
<para>Networking decisions include the encapsulation mechanism that can
be used for the tenant networks, how large the broadcast domains
should be, and the contracted SLAs for the interconnects.</para>


@ -6,16 +6,14 @@
xml:id="operational-considerations-multi-site">
<?dbhtml stop-chunking?>
<title>Operational considerations</title>
<para>Multi-site OpenStack cloud deployment using regions
requires that the service catalog contains per-region entries
for each service deployed other than the Identity service. Most
off-the-shelf OpenStack deployment tools have limited support
for defining multiple regions in this fashion.</para>
<para>Deployers should be aware of this and provide the appropriate
customization of the service catalog for their site either
manually, or by customizing deployment tools in use.</para>
<note><para>As of the Kilo release, documentation for
implementing this feature is in progress. See this bug for
more information:
@ -31,51 +29,46 @@
host operating systems, guest operating systems, OpenStack
distributions (if applicable), software-defined infrastructure
including network controllers and storage systems, and even
individual applications need to be evaluated.</para>
<para>Topics to consider include:</para>
<itemizedlist>
<listitem>
<para>The definition of what constitutes a site
in the relevant licenses, as the term does not
necessarily denote a geographic or otherwise
physically isolated location.</para>
</listitem>
<listitem>
<para>Differentiations between "hot" (active) and "cold"
(inactive) sites, where significant savings may be made
in situations where one site is a cold standby for
disaster recovery purposes only.</para>
</listitem>
<listitem>
<para>Certain locations might require local vendors to
provide support and services for each site, which may vary
with the licensing agreement in place.</para>
</listitem>
</itemizedlist></section>
<section xml:id="logging-and-monitoring-multi-site">
<title>Logging and monitoring</title>
<para>Logging and monitoring does not significantly differ for a
multi-site OpenStack cloud. The tools described in the <link
xlink:href="http://docs.openstack.org/openstack-ops/content/logging_monitoring.html">Logging
and monitoring chapter</link> of the <citetitle>Operations
Guide</citetitle> remain applicable. Logging and monitoring
can be provided on a per-site basis, and in a common
centralized location.</para>
<para>When attempting to deploy logging and monitoring facilities
to a centralized location, care must be taken with the load
placed on the inter-site networking links.</para></section>
<section xml:id="upgrades-multi-site">
<title>Upgrades</title>
<para>In multi-site OpenStack clouds deployed using regions, sites
are independent OpenStack installations which are linked
together using shared centralized services such as OpenStack
Identity. At a high level the recommended order of operations
to upgrade an individual OpenStack environment is (see the <link
xlink:href="http://docs.openstack.org/openstack-ops/content/ops_upgrades-general-steps.html">Upgrades
chapter</link> of the <citetitle>Operations Guide</citetitle>
for details):</para>
@ -123,22 +116,20 @@
shared.</para>
</listitem>
</orderedlist>
<para>Compute upgrades within each site can also be performed in a rolling
fashion. Compute controller services (API, Scheduler, and
Conductor) can be upgraded prior to upgrading individual
compute nodes. This allows operations staff to keep a site
operational for users of Compute services while performing an
upgrade.</para></section>
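The ordering described above can be sketched as follows; the service and node names are examples only, not a complete upgrade procedure:

```python
# Illustrative sketch of the rolling-upgrade ordering described above:
# controller services (API, Scheduler, Conductor) before individual
# compute nodes. Names are examples only.
def upgrade_plan(controller_services, compute_nodes):
    plan = [("controller", svc) for svc in controller_services]
    # Compute nodes follow, one at a time, so the site stays available.
    plan += [("compute", node) for node in compute_nodes]
    return plan

plan = upgrade_plan(["api", "scheduler", "conductor"], ["compute-1", "compute-2"])
```

Because compute nodes are taken one at a time after the controllers, the rest of the site keeps serving users throughout the upgrade.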
<section xml:id="quota-management-multi-site">
<title>Quota management</title>
<para>Quotas are used to set operational limits to prevent system
capacities from being exhausted without notification. They are
currently enforced at the tenant (or project) level rather than
at the user level.</para>
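The quota-synchronization process mentioned here can be sketched as follows; the quota names mirror common Compute quotas, but the values and the approach are illustrative, not an OpenStack feature:

```python
# Sketch of synchronizing identical tenant quotas across regions by
# applying one template everywhere. Quota names mirror common Compute
# quotas; the values are examples.
QUOTA_TEMPLATE = {"instances": 10, "cores": 20, "ram_mb": 51200}

def sync_quotas(regions):
    """Give each region an identical copy of the tenant quota template."""
    return {region: dict(QUOTA_TEMPLATE) for region in regions}

quotas = sync_quotas(["RegionOne", "RegionTwo", "RegionThree"])
```

Each region receives its own copy of the limits, so later per-region adjustments do not leak into other regions.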
<para>Quotas are defined on a per-region basis. Operators can
define identical quotas for tenants in each region of the
cloud to provide a consistent experience, or even create a
process for synchronizing allocated quotas across regions. It
is important to note that only the operational limits imposed
@ -161,24 +152,22 @@
Control (RBAC) policies, defined in a <filename>policy.json</filename> file, for
each service. Operators edit these files to customize the
policies for their OpenStack installation. If the application
of consistent RBAC policies across sites is a requirement, then
it is necessary to ensure proper synchronization of the
<filename>policy.json</filename> files to all installations.</para>
<para>This must be done using system administration tools
such as rsync as functionality for synchronizing policies
across regions is not currently provided within OpenStack.</para></section>
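One way to check whether the <filename>policy.json</filename> files have drifted apart before synchronizing them with a tool such as rsync is to compare content digests; this is a hedged sketch, and the policy rules shown are examples:

```python
# Hedged sketch: detect policy.json drift between sites by comparing
# content digests. The policy rules shown are examples only.
import hashlib
import json

def policy_digest(policy: dict) -> str:
    # Canonical JSON so key ordering does not affect the digest.
    canonical = json.dumps(policy, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

site_a = {"compute:create": "rule:admin_or_owner"}
site_b = {"compute:create": "rule:admin_or_owner"}
in_sync = policy_digest(site_a) == policy_digest(site_b)
```

Matching digests mean the RBAC policies are consistent; a mismatch flags which site needs a fresh copy pushed to it.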
<section xml:id="documentation-multi-site">
<title>Documentation</title>
<para>Users must be able to leverage cloud infrastructure and
provision new resources in the environment. It is important
that user documentation is accessible by users to ensure they
are given sufficient information to help them leverage the cloud.
As an example, by default OpenStack schedules instances on a compute node
automatically. However, when multiple regions are available,
the end user needs to decide in which region to schedule the
new instance. The dashboard presents the user with
the first region in your configuration. The API and CLI tools
do not execute commands unless a valid region is specified.
It is therefore important to provide documentation to your


@ -22,10 +22,10 @@
very sensitive to latency and needs a rapid response to
end-users. After reviewing the user, technical and operational
considerations, it is determined beneficial to build a number
of regions local to the customer's edge. Rather than build a
few large, centralized data centers, the intent of the architecture
is to provide a pair of small data centers in locations that
are closer to the customer. In this use
case, spreading applications out allows for different
horizontal scaling than a traditional compute workload scale.
The intent is to scale by creating more copies of the
@ -60,44 +60,47 @@
expanding the capacity of all regions simultaneously,
therefore maximizing the cost-effectiveness of the multi-site
design.</para>
<para>One of the key decisions of running this infrastructure is
whether or not to provide a redundancy
model. Two types of redundancy and high availability models in
this configuration can be implemented. The first type
is the availability of central OpenStack
components. Keystone can be made highly available in three
central data centers that host the centralized OpenStack
components. This prevents a loss of any one of the regions
causing an outage in service. It also has the added benefit of
being able to run a central storage repository as a primary
cache for distributing content to each of the regions.</para>
<para>The second redundancy type is the edge data center itself.
A second data center in each of the edge regional
locations houses a second region near the first region. This
ensures that the application does not suffer degraded
performance in terms of latency and availability.</para>
<para><xref linkend="multi-site_customer_edge"/> depicts
the solution designed to have both a centralized set of core
data centers for OpenStack services and paired edge data centers:</para>
<figure xml:id="multi-site_customer_edge">
<title>Multi-site architecture example</title>
<mediaobject>
<imageobject>
<imagedata contentwidth="6in"
fileref="../figures/Multi-Site_Customer_Edge.png"/>
</imageobject>
</mediaobject>
</figure>
<section xml:id="geo-redundant-load-balancing">
<title>Geo-redundant load balancing</title>
<para>A large-scale web application has been designed with cloud
principles in mind. The application is designed to provide
service to an application store on a 24/7 basis. The company has a
typical two tier architecture with a web front-end servicing the
customer requests, and a NoSQL database back end storing the
information.</para>
<para>Lately, there have been several outages in a number of major
public cloud providers due to applications running out of
a single geographical location. The design therefore should
mitigate the chance of a single site causing an outage for their
business.</para>
<para>The solution would consist of the following OpenStack
components:</para>
<itemizedlist>
@ -108,12 +111,11 @@
<listitem>
<para>OpenStack Controller services (Networking,
dashboard, Block Storage, and Compute) running locally in
each of the three regions. Identity service, Orchestration
service, Telemetry service, Image service and
Object Storage can be installed centrally, with
nodes in each of the regions providing a redundant
OpenStack Controller plane throughout the globe.</para>
</listitem>
<listitem>
<para>OpenStack Compute nodes running the KVM
@ -126,9 +128,9 @@
replicated on a regular basis.</para>
</listitem>
<listitem>
<para>A distributed DNS service available to all
regions that allows for dynamic update of DNS
records of deployed instances.</para>
</listitem>
<listitem>
<para>A geo-redundant load balancing service can be used
@ -153,10 +155,10 @@
</listitem>
</itemizedlist>
<para>Another autoscaling Heat template can be used to deploy a
distributed MongoDB shard over the three locations, with the
option of storing required data on a globally available swift
container. According to the usage and load on the database
server, additional shards can be provisioned according to
the thresholds defined in Telemetry.</para>
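The threshold-driven shard provisioning can be sketched as follows; the per-shard capacity and the three-shard minimum (one per region) are assumptions for illustration, not Telemetry defaults:

```python
# Toy model of threshold-driven MongoDB shard provisioning, analogous
# to the Telemetry-triggered scaling described above. ops_per_shard and
# the three-shard minimum are illustrative assumptions.
import math

def shards_needed(load_ops: int, ops_per_shard: int = 1000, minimum: int = 3) -> int:
    """At least one shard per region, plus capacity for the load."""
    return max(minimum, math.ceil(load_ops / ops_per_shard))
```

When the measured load crosses a threshold, the computed shard count rises, and the autoscaling template provisions the difference.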
<!-- <para>The reason that three regions were selected here was because of
the fear of having abnormal load on a single region in the
@ -169,57 +171,66 @@
autoscaling and auto healing in the event of increased load.
Additional configuration management tools, such as Puppet or
Chef could also have been used in this scenario, but were not
chosen since Orchestration had the appropriate built-in
hooks into the OpenStack cloud, whereas the other tools were
external and not native to OpenStack. In addition, external
tools were not needed since this deployment scenario was
straightforward.</para>
<para>OpenStack Object Storage is used here to serve as a back end for
the Image service since it is the most suitable solution for a
globally distributed storage solution with its own
replication mechanism. Home grown solutions could also have
been used including the handling of replication, but were not
chosen, because Object Storage is already an integral part of the
infrastructure and a proven solution.</para>
<para>An external load balancing service was used and not the
LBaaS in OpenStack because the solution in OpenStack is not
redundant and does not have any awareness of geo location.</para>
<figure xml:id="multi-site_geo_redundant">
<title>Multi-site geo-redundant architecture</title>
<mediaobject>
<imageobject>
<imagedata contentwidth="6in"
fileref="../figures/Multi-site_Geo_Redundant_LB.png"/>
</imageobject>
</mediaobject>
</figure>
</section>
<section xml:id="location-local-services">
<title>Location-local service</title>
<para>A common use for multi-site OpenStack deployment is
creating a Content Delivery Network. An application that
uses a location-local architecture requires low network
latency and proximity to the user to provide an
optimal user experience and reduce the cost of bandwidth and
transit. The content resides on sites closer to the customer,
instead of a centralized content store that requires utilizing
higher cost cross-country links.</para>
<para>This architecture includes a geo-location component
that places user requests to the closest possible node. In
this scenario, 100% redundancy of content across every site is
a goal rather than a requirement, with the intent to
maximize the amount of content available within a
minimum number of network hops for end users. Despite
these differences, the storage replication configuration has
significant overlap with that of a geo-redundant load
balancing use case.</para>
<para>In <xref linkend="multi-site_shared_shared_keystone"/>,
the application utilizing this multi-site OpenStack install
that is location-aware would launch web server or content
serving instances on the compute cluster in each site. Requests
from clients are first sent to a global services load balancer
that determines the location of the client, then routes the
request to the closest OpenStack site where the application
completes the request.</para>
<figure xml:id="multi-site_shared_shared_keystone">
<title>Multi-site shared keystone architecture</title>
<mediaobject>
<imageobject>
<imagedata contentwidth="6in"
fileref="../figures/Multi-Site_shared_keystone1.png"/>
</imageobject>
</mediaobject>
</figure>
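The routing decision made by the global services load balancer can be sketched as follows; the region names and latency figures are invented for illustration:

```python
# Hedged sketch of the global services load balancer's routing choice:
# send the client to the region with the lowest measured latency.
# Region names and latency figures are invented for illustration.
def closest_region(latencies_ms: dict) -> str:
    return min(latencies_ms, key=latencies_ms.get)

chosen = closest_region({"RegionOne": 18.0, "RegionTwo": 55.0, "RegionThree": 92.0})
```

Real geo-location services typically combine such latency probes with GeoIP data, but the selection principle is the same.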
</section>
</section>


@ -27,105 +27,108 @@
high-bandwidth links available between them, it may be wise to
configure a separate storage replication network between the
two sites to support a single Swift endpoint and a shared
Object Storage capability between them. An example of this
technique, as well as a configuration walk-through, is
available at <link
xlink:href="http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network">http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network</link>.
Another option in this scenario is to build a dedicated set of
tenant private networks across the secondary link, using
overlay networks with a third party mapping the site overlays
to each other.</para>
<para>The capacity requirements of the links between sites is
driven by application behavior. If the link latency is
too high, certain applications that use a large number of
small packets, for example RPC calls, may encounter issues
communicating with each other or operating properly.
Additionally, OpenStack may encounter similar types of issues.
To mitigate this, Identity service call timeouts can be
tuned to prevent issues authenticating against a central
Identity service.</para>
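For example, assuming the services validate tokens through keystonemiddleware, the relevant timeouts can be raised in each service's configuration file; the option names exist in keystonemiddleware, but the values below are illustrative, not recommendations:

```ini
[keystone_authtoken]
# Tolerate higher-latency links to the central Identity service.
# Values are examples; tune them to the measured inter-site latency.
http_connect_timeout = 10
http_request_max_retries = 5
```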
<para>Another network capacity consideration for a multi-site
deployment is the amount and performance of overlay networks
available for tenant networks. If using shared tenant networks
across zones, it is imperative that an external overlay manager
or controller be used to map these overlays together. It is
necessary to ensure the number of possible IDs between the zones
is identical.</para>
<note>
<para>As of the Kilo release, OpenStack Networking was not
capable of managing tunnel IDs across installations. So if
one site runs out of IDs, but another does not, that tenant's
network is unable to reach the other site.</para>
</note>
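Because the tunnel ID spaces must be kept identical by the operator, a simple pre-deployment check is to intersect each site's configured range. This is a minimal sketch under the assumption that each site's usable IDs form one contiguous inclusive range.

```python
def common_id_range(site_ranges):
    """Return the tunnel/VNI ID range usable across every site.

    ``site_ranges`` maps a site name to an inclusive
    ``(first_id, last_id)`` tuple. Since Networking does not manage
    tunnel IDs across installations, the sites must agree up front;
    the safe shared range is the intersection of the local ranges.
    """
    lows, highs = zip(*site_ranges.values())
    low, high = max(lows), min(highs)
    if low > high:
        raise ValueError("sites share no common tunnel ID range")
    return low, high
```

For example, sites configured with ranges (1, 4000) and (1000, 10000) can only safely share IDs 1000 through 4000.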
<para>Capacity can take other forms as well. The ability for a
region to grow depends on scaling out the number of available
compute nodes. This topic is covered in greater detail in the
section for compute-focused deployments. However, it may be
necessary to grow cells in an individual region, depending on
the size of your cluster and the ratio of virtual machines per
hypervisor.</para>
<para>A third form of capacity comes in the multi-region-capable
components of OpenStack. Centralized Object Storage is capable
of serving objects through a single namespace across multiple
regions. Since this works by accessing the object store through
swift proxy, it is possible to overload the proxies. There are
two options available to mitigate this issue:</para>
<itemizedlist>
<listitem>
<para>Deploy a large number of swift proxies. The drawback is
that the proxies are not load-balanced and a large file
request could continually hit the same proxy.</para>
</listitem>
<listitem>
<para>Add a caching HTTP proxy and load balancer in front of
the swift proxies. Since swift objects are returned to the
requester via HTTP, this load balancer would alleviate the
load required on the swift proxies.</para>
</listitem>
</itemizedlist>
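The drawback of the first option can be illustrated with a toy client: if the client selects a proxy deterministically, for example by hashing the object name, every request for one popular object lands on the same proxy and can overload it. The proxy URLs here are invented for illustration.

```python
import hashlib

# Hypothetical proxy endpoints; in a real deployment these would be
# the URLs of the deployed swift proxy servers.
PROXIES = [
    "https://proxy1.example.com:8080",
    "https://proxy2.example.com:8080",
    "https://proxy3.example.com:8080",
]

def pick_proxy(object_name):
    """Pick a proxy by hashing the object name.

    This spreads *distinct* objects across proxies, but every request
    for the same object always hits the same proxy, which is exactly
    the hotspot a front-end caching load balancer avoids.
    """
    digest = int(hashlib.sha1(object_name.encode()).hexdigest(), 16)
    return PROXIES[digest % len(PROXIES)]
```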
<section xml:id="utilization-multi-site"><title>Utilization</title>
<para>While constructing a multi-site OpenStack environment is the
goal of this guide, the real test is whether an application
can utilize it.</para>
<para>The Identity service is normally the first interface for
OpenStack users and is required for almost all major operations
within OpenStack. Therefore, it is important that you provide users
with a single URL for Identity service authentication, and
document the configuration of regions within the Identity service.
Each of the sites defined in your installation is considered
to be a region in Identity nomenclature. This is important for
the users, as it is required to define the region name when
providing actions to an API endpoint or in the dashboard.</para>
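A client selects its endpoints per region from the service catalog returned by the Identity service. The sketch below uses a deliberately simplified catalog structure with invented URLs to show the region-keyed lookup.

```python
# Simplified, hypothetical view of a service catalog; a real client
# parses the catalog returned by the Identity service at auth time.
CATALOG = {
    "object-store": {
        "RegionOne": "https://swift.site1.example.com:8080/v1",
        "RegionTwo": "https://swift.site2.example.com:8080/v1",
    },
}

def endpoint_for(service_type, region):
    """Look up the endpoint for a service in a named region."""
    try:
        return CATALOG[service_type][region]
    except KeyError:
        raise LookupError(
            "no %s endpoint registered for region %s"
            % (service_type, region))
```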
<para>Load balancing is another common issue with multi-site
installations. While it is still possible to run HAProxy
instances with Load-Balancer-as-a-Service, these are limited
to a specific region. Some applications can manage this using
internal mechanisms. Other applications may require the
implementation of an external system, including global services
load balancers or anycast-advertised DNS.</para>
<para>Depending on the storage model chosen during site design,
storage replication and availability are also a concern
for end-users. If an application can support regions, then it
is possible to keep the object storage system separated by region.
In this case, users who want to have an object available to
more than one region need to perform cross-site replication.
However, with a centralized swift proxy, the user may need to
benchmark the replication timing of the Object Storage back end.
Benchmarking allows the operational staff to provide users with
an understanding of the amount of time required for a stored or
modified object to become available to the entire environment.</para>
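Such a benchmark can be as simple as writing an object via one site's endpoint and polling another site until it appears. This sketch stands in dict-like stores for the two sites' object endpoints; a real benchmark would wrap swift clients pointed at the two regions.

```python
import time

def replication_lag(write_site, read_site, key, value,
                    timeout=60.0, poll_interval=0.5):
    """Write an object at one site and poll another site until it is
    visible there, returning the elapsed seconds.

    ``write_site`` and ``read_site`` are any dict-like views of the
    two sites' object stores (an assumption made for illustration).
    """
    write_site[key] = value
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if read_site.get(key) == value:
            return time.monotonic() - start
        time.sleep(poll_interval)
    raise TimeoutError("object %r not replicated within %ss" % (key, timeout))
```

Running this periodically gives operational staff the replication-lag figures to share with users.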
</section>
<section xml:id="performance"><title>Performance</title>
<para>Determining the performance of a multi-site installation
involves considerations that do not come into play in a
single-site deployment. Because they are distributed,
performance in multi-site deployments may be affected in certain
situations.</para>
<para>Since multi-site systems can be geographically separated,
there may be greater latency or jitter when communicating across
regions. This can especially impact systems like the OpenStack
Identity service when making authentication attempts from regions
that do not contain the centralized Identity implementation. It
can also affect applications which rely on Remote Procedure Call (RPC)
for normal operation. An example of this can be seen in high
performance computing workloads.</para>
<para>Storage availability can also be impacted by the
architecture of a multi-site deployment. A centralized Object
Storage service requires more time for an object to be
@ -137,4 +140,37 @@
to manually cope with this limitation by creating duplicate
block storage entries in each region.</para>
</section>
<section xml:id="openstack-components_multi-site">
<title>OpenStack components</title>
<para>Most OpenStack installations require a bare minimum set of
components to function. These include OpenStack Identity
(keystone) for authentication, OpenStack Compute
(nova) for compute, OpenStack Image service (glance) for image
storage, OpenStack Networking (neutron) for networking, and
potentially an object store in the form of OpenStack Object
Storage (swift). Deploying a multi-site installation also demands extra
components in order to coordinate between regions. A centralized
Identity service is necessary to provide the single authentication
point. A centralized dashboard is also recommended to provide a
single login point and a mapping to the API and CLI
options available. A centralized Object Storage service may also
be used, but will require the installation of the swift proxy
service.</para>
<para>It may also be helpful to install a few optional
services to facilitate certain use cases. For example,
installing Designate may assist in automatically generating
DNS domains for each region with an automatically-populated
zone full of resource records for each instance. This
facilitates using DNS as a mechanism for determining which
region will be selected for certain applications.</para>
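For illustration, a one-zone-per-region naming scheme might produce instance records like the following. The scheme and the `example.com` domain are assumptions; Designate would create and populate such zones automatically.

```python
def instance_fqdn(instance_name, region, zone="example.com"):
    """Build the DNS name for an instance under a per-region zone.

    The one-zone-per-region layout and the ``example.com`` domain are
    illustrative assumptions, not a Designate default.
    """
    return "%s.%s.%s" % (instance_name, region.lower(), zone)
```

An application can then steer traffic to a region simply by resolving the region-qualified name.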
<para>Another useful tool for managing a multi-site installation
is Orchestration (heat). The Orchestration module allows the
use of templates to define a set of instances to be launched
together or for scaling existing sets. It can also be used to
set up matching or differentiated groupings based on
regions. For instance, if an application requires an equally
balanced number of nodes across sites, the same heat template
can be used to cover each site with small alterations to only
the region name.</para>
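The per-region launch described above can be sketched as follows. The region names, template file name, and parameter names are hypothetical; the point is that one template is reused with only region-specific values changing.

```python
# Hypothetical regions and template name for illustration only.
REGIONS = ["RegionOne", "RegionTwo", "RegionThree"]

def stack_request(region, node_count=3):
    """Build the launch parameters for one region's copy of the stack."""
    return {
        "stack_name": "app-%s" % region.lower(),
        "template_file": "app-stack.yaml",  # assumed template
        "parameters": {"region_name": region,
                       "node_count": node_count},
    }

# One equally sized deployment per site:
stack_requests = [stack_request(region) for region in REGIONS]
```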
</section>
</section>


@ -6,55 +6,16 @@
xml:id="user-requirements-multi-site">
<?dbhtml stop-chunking?>
<title>User requirements</title>
<section xml:id="workload-characteristics">
<title>Workload characteristics</title>
<para>An understanding of the expected workloads for a desired
multi-site environment and use case is an important factor in
the decision-making process. In this context, <literal>workload</literal>
refers to the way the systems are used. A workload could be a
single application or a suite of applications that work together.
It could also be a duplicate set of applications that need to
run in multiple cloud environments. Often in a multi-site deployment,
the same workload will need to work identically in more than one
physical location.</para>
<para>This multi-site scenario likely includes one or more of the
other scenarios in this book with the additional requirement
@ -72,26 +33,26 @@
<title>Consistency of images and templates across different
sites</title>
<para>It is essential that the deployment of instances is
consistent across the different sites and built
into the infrastructure. If the OpenStack Object Storage is used as
a back end for the Image service, it is possible to create repositories
of consistent images across multiple sites. Having central
endpoints with multiple storage nodes allows consistent centralized
storage for every site.</para>
<para>Not using a centralized object store increases the operational
overhead of maintaining a consistent image library. This
could include development of a replication mechanism to handle
the transport of images and the changes to the images across
multiple sites.</para></section>
<section xml:id="high-availability-multi-site">
<title>High availability</title>
<para>If high availability is a requirement to provide continuous
infrastructure operations, a basic requirement of high
availability should be defined.</para>
<para>The OpenStack management components need to have a basic and
minimal level of redundancy. The simplest example is that the loss
of any single site should have minimal impact on the
availability of the OpenStack services.</para>
<para>The <link
xlink:href="http://docs.openstack.org/high-availability-guide/content/"><citetitle>OpenStack
High Availability Guide</citetitle></link>
@ -111,14 +72,12 @@
WAN network design between the sites.</para>
<para>Connecting more than two sites increases the challenges and
adds more complexity to the design considerations. Multi-site
implementations require planning to address the additional
topology used for internal and external connectivity. Some options
include full mesh, hub-and-spoke, spine-leaf, and 3D torus.</para>
<para>If applications running in a cloud are not cloud-aware, there
should be clear measures and expectations to define what the
infrastructure can and cannot support. An example would be
shared storage between sites. It is possible; however, such a
solution is not native to OpenStack and requires a third-party
hardware vendor to fulfill the requirement. Another example
@ -126,21 +85,21 @@
in object storage directly. These applications need to be
cloud aware to make good use of an OpenStack Object
Store.</para></section>
<section xml:id="application-readiness">
<title>Application readiness</title>
<para>Some applications are tolerant of the lack of synchronized
object storage, while others may need those objects to be
replicated and available across regions. Understanding how
the cloud implementation impacts new and existing applications
is important for risk mitigation and the overall success of a
cloud project. Applications may have to be written or rewritten
for an infrastructure with little to no redundancy, or with the
cloud in mind.</para></section>
<section xml:id="cost-multi-site">
<title>Cost</title>
<para>A greater number of sites increases cost and complexity for a
multi-site deployment. Costs can be broken down into the following
categories:</para>
<itemizedlist>
<listitem>
<para>Compute resources</para>
@ -163,34 +122,32 @@
</itemizedlist></section>
<section xml:id="site-loss-and-recovery">
<title>Site loss and recovery</title>
<para>Outages can cause partial or full loss of site functionality.
Strategies should be implemented to understand and plan for recovery
scenarios.</para>
<itemizedlist>
<listitem>
<para>The deployed applications need to continue to
function and, more importantly, you must consider the
impact on the performance and reliability of the application
when a site is unavailable.</para>
</listitem>
<listitem>
<para>It is important to understand what happens to the
replication of objects and data between the sites when
a site goes down. If this causes queues to start
building up, consider how long these queues can
safely exist until an error occurs.</para>
</listitem>
<listitem>
<para>After an outage, ensure the method for resuming proper
operations of a site is implemented when it comes back online.
We recommend you architect the recovery to avoid race conditions.</para>
</listitem>
</itemizedlist></section>
<section xml:id="compliance-and-geo-location-multi-site">
<title>Compliance and geo-location</title>
<para>An organization may have certain legal obligations and
regulatory compliance measures which could require that certain
workloads or data not be located in certain regions.</para></section>
<section xml:id="auditing-multi-site">
@ -210,11 +167,10 @@
site.</para></section>
<section xml:id="authentication-between-sites">
<title>Authentication between sites</title>
<para>We recommend a single authentication domain
rather than a separate implementation for each and every
site. This requires an authentication mechanism that is highly
available and distributed to ensure continuous operation.
Authentication server locality might be required and should be
planned for.</para></section>
</section>