Multi-site chapter edits

1. Edits to the multi-site chapter
2. Removed duplicated legal content which was added to a common section.
   See https://review.openstack.org/#/c/212299/

Change-Id: I10e3a04650548454c73024d87cbbb6fda63454e8
Implements: blueprint arch-guide
2015-08-12 23:26:53 +10:00
parent 87ff7002f8
commit 68e8c66e79
6 changed files with 339 additions and 350 deletions
--- a/doc/arch-design/ch_multi_site.xml
+++ b/doc/arch-design/ch_multi_site.xml
@@ -6,16 +6,9 @@
  xml:id="multi_site">
    <title>Multi-site</title>
-    <para>A multi-site OpenStack environment is one in which services,
+    <para>OpenStack is capable of running in a multi-region
        located in more than one data center, are used to provide the
        overall solution. Usage requirements of different multi-site
        clouds may vary widely, but they share some common needs.
        OpenStack is capable of running in a multi-region
        configuration. This enables some parts of OpenStack to
-        effectively manage a group of sites as a single cloud. With
+        effectively manage a group of sites as a single cloud.</para>
        careful planning in the design phase, OpenStack can act as an
        excellent multi-site cloud solution for a multitude of
        needs.</para>
    <para>Some use cases that might indicate a need for a multi-site
        deployment of OpenStack include:</para>
    <itemizedlist>
--- a/doc/arch-design/multi_site/section_architecture_multi_site.xml
+++ b/doc/arch-design/multi_site/section_architecture_multi_site.xml
@@ -6,59 +6,61 @@
  xml:id="arch-design-architecture-multiple-site">
    <?dbhtml stop-chunking?>
    <title>Architecture</title>
-    <para>This graphic is a high level diagram of a multi-site OpenStack
+    <para><xref linkend="multi-site_arch"/>
-        architecture. Each site is an OpenStack cloud but it may be necessary to
+      illustrates a high level multi-site OpenStack
-        architect the sites on different versions. For example, if the second
+      architecture. Each site is an OpenStack cloud but it may be necessary
-        site is intended to be a replacement for the first site, they would be
+      to architect the sites on different versions. For example, if the
-        different. Another common design would be a private OpenStack cloud with
+      second site is intended to be a replacement for the first site,
-        replicated site that would be used for high availability or disaster
+      they would be different. Another common design would be a private
-        recovery. The most important design decision is how to configure the
+      OpenStack cloud with a replicated site that would be used for high
-        storage. It can be configured as a single shared pool or separate pools,
+      availability or disaster recovery. The most important design decision
-        depending on the user and technical requirements.</para>
+      is configuring storage as a single shared pool or separate pools,
      depending on user and technical requirements.</para>
   <figure xml:id="multi-site_arch">
     <title>Multi-site OpenStack architecture</title>
    <mediaobject>
        <imageobject>
-            <imagedata contentwidth="4in"
+            <imagedata contentwidth="6in"
                fileref="../figures/Multi-Site_shared_keystone_horizon_swift1.png"/>
        </imageobject>
    </mediaobject>
  </figure>
    <section xml:id="openstack-services-architecture">
        <title>OpenStack services architecture</title>
        <para>The OpenStack Identity service, which is used by all other
-            OpenStack components for authorization and the catalog of service
+            OpenStack components for authorization and the catalog of
-            endpoints, supports the concept of regions. A region is a logical
+            service endpoints, supports the concept of regions. A region
-            construct that can be used to group OpenStack services that are in
+            is a logical construct used to group OpenStack services in
-            close proximity to one another. The concept of regions is flexible;
+            close proximity to one another. The concept of
-            it may can contain OpenStack service endpoints located within a
+            regions is flexible; it can contain OpenStack service
-            distinct geographic region, or regions. It may be smaller in scope,
+            endpoints located within a distinct geographic region or regions.
-            where a region is a single rack within a data center or even a
+            It may be smaller in scope, where a region is a single rack
-            single blade chassis, with multiple regions existing in adjacent
+            within a data center, with multiple regions existing in adjacent
            racks in the same data center.</para>
-        <para>The majority of OpenStack components are designed to run within
+        <para>The majority of OpenStack components are designed to run
-            the context of a single region. The OpenStack Compute service is
+          within the context of a single region. The OpenStack Compute
-            designed to manage compute resources within a region, with support
+          service is designed to manage compute resources within a region,
-            for subdivisions of compute resources by using availability zones
+          with support for subdivisions of compute resources by using
-            and cells. The OpenStack Networking service can be used to manage
+          availability zones and cells. The OpenStack Networking service
-            network resources in the same broadcast domain or collection of
+          can be used to manage network resources in the same broadcast
-            switches that are linked. The OpenStack Block Storage service
+          domain or collection of switches that are linked. The OpenStack
-            controls storage resources within a region with all storage
+          Block Storage service controls storage resources within a region
-            resources residing on the same storage network. Like the OpenStack
+          with all storage resources residing on the same storage network.
-            Compute service, the OpenStack Block Storage service also supports
+          Like the OpenStack Compute service, the OpenStack Block Storage
-            the availability zone construct which can be used to subdivide
+          service also supports the availability zone construct which can
-            storage resources.</para>
+          be used to subdivide storage resources.</para>
        <para>The OpenStack dashboard, OpenStack Identity, and OpenStack
            Object Storage services are components that can each be deployed
            centrally in order to serve multiple regions.</para>
    </section>
    <section xml:id="arch-multi-storage">
        <title>Storage</title>
-        <para>With multiple OpenStack regions, having a single OpenStack Object
+        <para>With multiple OpenStack regions, it is recommended to configure
-            Storage service endpoint that delivers shared file storage for all
+          a single OpenStack Object Storage service endpoint to deliver
-            regions is desirable. The Object Storage service internally
+          shared file storage for all regions. The Object Storage service
-            replicates files to multiple nodes. The advantages of this are that,
+          internally replicates files to multiple nodes which can be used
-            if a file placed into the Object Storage service is visible to all
+          by applications or workloads in multiple regions. This simplifies
-            regions, it can be used by applications or workloads in any or all
+          high availability failover and disaster recovery rollback.</para>
            of the regions. This simplifies high availability failover and
            disaster recovery rollback.</para>
        <para>In order to scale the Object Storage service to meet the workload
            of multiple regions, multiple proxy workers are run and
            load-balanced, storage nodes are installed in each region, and the
@@ -68,19 +70,20 @@
            reducing the actual load on the storage network. In addition to an
            HTTP caching layer, use a caching layer like Memcache to cache
            objects between the proxy and storage nodes.</para>
-        <para>If the cloud is designed without a single Object Storage Service
+        <para>If the cloud is designed with a separate Object Storage
-            endpoint for multiple regions, and instead a separate Object Storage
+            Service endpoint made available in each region, applications are
            Service endpoint is made available in each region, applications are
            required to handle synchronization (if desired) and other management
            operations to ensure consistency across the nodes. For some
            applications, having multiple Object Storage Service endpoints
            located in the same region as the application may be desirable due
            to reduced latency, cross region bandwidth, and ease of
            deployment.</para>
-        <para>For the Block Storage service, the most important decisions are
+          <note>
-            the selection of the storage technology and whether or not a
+            <para>For the Block Storage service, the most important decisions
-            dedicated network is used to carry storage traffic from the storage
+              are the selection of the storage technology, and whether
-            service to the compute nodes.</para>
+              a dedicated network is used to carry storage traffic
              from the storage service to the compute nodes.</para>
          </note>
    </section>
    <section xml:id="arch-networking-multiple">
        <title>Networking</title>
@@ -100,18 +103,19 @@
    </section>
    <section xml:id="arch-dependencies-multiple">
        <title>Dependencies</title>
-        <para>The architecture for a multi-site installation of OpenStack is
+        <para>The architecture for a multi-site OpenStack installation
-            dependent on a number of factors. One major dependency to consider
+          is dependent on a number of factors. One major dependency to
-            is storage. When designing the storage system, the storage mechanism
+          consider is storage. When designing the storage system, the
-            needs to be determined. Once the storage type is determined, how it
+          storage mechanism needs to be determined. Once the storage
-            is accessed is critical. For example, we recommend that
+          type is determined, how it is accessed is critical. For example,
-            storage should use a dedicated network. Another concern is how
+          we recommend that storage should use a dedicated network.
-            the storage is configured to protect the data. For example, the
+          Another concern is how the storage is configured to protect
-            recovery point objective (RPO) and the recovery time objective
+          the data. Consider, for example, the Recovery Point Objective (RPO) and
-            (RTO). How quickly can the recovery from a fault be completed,
+          the Recovery Time Objective (RTO). How quickly recovery from
-            determines how often the replication of data is required. Ensure that
+          a fault can be completed determines how often the replication of
-            enough storage is allocated to support the data protection
+          data is required. Ensure that enough storage is allocated to
-            strategy.</para>
+          support the data protection strategy.
      </para>
        <para>Networking decisions include the encapsulation mechanism that can
            be used for the tenant networks, how large the broadcast domains
            should be, and the contracted SLAs for the interconnects.</para>
--- a/doc/arch-design/multi_site/section_operational_considerations_multi_site.xml
+++ b/doc/arch-design/multi_site/section_operational_considerations_multi_site.xml
@@ -6,16 +6,14 @@
  xml:id="operational-considerations-multi-site">
    <?dbhtml stop-chunking?>
    <title>Operational considerations</title>
-    <para>Deployment of a multi-site OpenStack cloud using regions
+    <para>Multi-site OpenStack cloud deployment using regions
        requires that the service catalog contains per-region entries
-        for each service deployed other than the Identity service
+        for each service deployed other than the Identity service. Most
-        itself. There is limited support amongst currently available
+        off-the-shelf OpenStack deployment tools have limited support
-        off-the-shelf OpenStack deployment tools for defining multiple
+        for defining multiple regions in this fashion.</para>
-        regions in this fashion.</para>
+    <para>Deployers should be aware of this and provide the appropriate
    <para>Deployers must be aware of this and provide the appropriate
        customization of the service catalog for their site either
-        manually or via customization of the deployment tools in
+        manually, or by customizing deployment tools in use.</para>
        use.</para>
    <note><para>As of the Kilo release, documentation for
        implementing this feature is in progress. See this bug for
        more information:
@@ -31,51 +29,46 @@
        host operating systems, guest operating systems, OpenStack
        distributions (if applicable), software-defined infrastructure
        including network controllers and storage systems, and even
-        individual applications need to be evaluated in light of the
+        individual applications need to be evaluated.</para>
        multi-site nature of the cloud.</para>
    <para>Topics to consider include:</para>
    <itemizedlist>
        <listitem>
-            <para>The specific definition of what constitutes a site
+            <para>The definition of what constitutes a site
                in the relevant licenses, as the term does not
                necessarily denote a geographic or otherwise
-                physically isolated location in the traditional
+                physically isolated location.</para>
                sense.</para>
        </listitem>
        <listitem>
            <para>Differentiations between "hot" (active) and "cold"
-                (inactive) sites where significant savings may be made
+                (inactive) sites, where significant savings may be made
                in situations where one site is a cold standby for
                disaster recovery purposes only.</para>
        </listitem>
        <listitem>
            <para>Certain locations might require local vendors to
-                provide support and services for each site provides
+                provide support and services for each site which may vary
-                challenges, but will vary on the licensing agreement
+                with the licensing agreement in place.</para>
                in place.</para>
        </listitem>
    </itemizedlist></section>
    <section xml:id="logging-and-monitoring-multi-site">
      <title>Logging and monitoring</title>
    <para>Logging and monitoring do not significantly differ for a
-        multi-site OpenStack cloud. The same well known tools
+        multi-site OpenStack cloud. The tools described in the <link
        described in the <link
        xlink:href="http://docs.openstack.org/openstack-ops/content/logging_monitoring.html">Logging
        and monitoring chapter</link> of the <citetitle>Operations
        Guide</citetitle> remain applicable. Logging and monitoring
-        can be provided both on a per-site basis and in a common
+        can be provided on a per-site basis, and in a common
        centralized location.</para>
    <para>When attempting to deploy logging and monitoring facilities
-        to a centralized location, care must be taken with regards to
+        to a centralized location, care must be taken with the load
-        the load placed on the inter-site networking links.</para></section>
+        placed on the inter-site networking links.</para></section>
    <section xml:id="upgrades-multi-site">
      <title>Upgrades</title>
-    <para>In multi-site OpenStack clouds deployed using regions each
+    <para>In multi-site OpenStack clouds deployed using regions, sites
-        site is, effectively, an independent OpenStack installation
+        are independent OpenStack installations which are linked
-        which is linked to the others by using centralized services
+        together using shared centralized services such as OpenStack
-        such as Identity which are shared between sites. At a high
+        Identity. At a high level the recommended order of operations
-        level the recommended order of operations to upgrade an
+        to upgrade an individual OpenStack environment is (see the <link
        individual OpenStack environment is (see the <link
        xlink:href="http://docs.openstack.org/openstack-ops/content/ops_upgrades-general-steps.html">Upgrades
        chapter</link> of the <citetitle>Operations Guide</citetitle>
        for details):</para>
@@ -123,22 +116,20 @@
                shared.</para>
        </listitem>
    </orderedlist>
-    <para>Note that Compute
+    <para>Compute upgrades within each site can also be performed in a rolling
        upgrades within each site can also be performed in a rolling
        fashion. Compute controller services (API, Scheduler, and
        Conductor) can be upgraded prior to upgrading of individual
-        compute nodes. This maximizes the ability of operations staff
+        compute nodes. This allows operations staff to keep a site
-        to keep a site operational for users of compute services while
+        operational for users of Compute services while performing an
-        performing an upgrade.</para></section>
+        upgrade.</para></section>
    <section xml:id="quota-management-multi-site">
      <title>Quota management</title>
-    <para>To prevent system capacities from being exhausted without
+      <para>Quotas are used to set operational limits to prevent system
-        notification, OpenStack provides operators with the ability to
+        capacities from being exhausted without notification. They are
-        define quotas. Quotas are used to set operational limits and
+        currently enforced at the tenant (or project) level rather than
-        are currently enforced at the tenant (or project) level rather
+        at the user level.</para>
-        than at the user level.</para>
+      <para>Quotas are defined on a per-region basis. Operators can
-    <para>Quotas are defined on a per-region basis. Operators may wish
+        define identical quotas for tenants in each region of the
        to define identical quotas for tenants in each region of the
        cloud to provide a consistent experience, or even create a
        process for synchronizing allocated quotas across regions. It
        is important to note that only the operational limits imposed
@@ -161,24 +152,22 @@
        Control (RBAC) policies, defined in a <filename>policy.json</filename> file, for
        each service. Operators edit these files to customize the
        policies for their OpenStack installation. If the application
-        of consistent RBAC policies across sites is considered a
+        of consistent RBAC policies across sites is a requirement, then
-        requirement, then it is necessary to ensure proper
+        it is necessary to ensure proper synchronization of the
-        synchronization of the <filename>policy.json</filename> files to all
+        <filename>policy.json</filename> files to all installations.</para>
-        installations.</para>
+    <para>This must be done using system administration tools
-    <para>This must be done using normal system administration tools
+        such as rsync, as functionality for synchronizing policies
-        such as rsync as no functionality for synchronizing policies
+        across regions is not currently provided within OpenStack.</para></section>
        across regions is currently provided within OpenStack.</para></section>
    <section xml:id="documentation-multi-site">
      <title>Documentation</title>
    <para>Users must be able to leverage cloud infrastructure and
        provision new resources in the environment. It is important
-        that user documentation is accessible by users of the cloud
+        that user documentation is accessible by users to ensure they
-        infrastructure to ensure they are given sufficient information
+        are given sufficient information to help them leverage the cloud.
-        to help them leverage the cloud. As an example, by default
+        As an example, by default OpenStack schedules instances on a compute node
        OpenStack schedules instances on a compute node
        automatically. However, when multiple regions are available,
-        it is left to the end user to decide in which region to
+        the end user needs to decide in which region to schedule the
-        schedule the new instance. The dashboard presents the user with
+        new instance. The dashboard presents the user with
        the first region in your configuration. The API and CLI tools
        do not execute commands unless a valid region is specified.
        It is therefore important to provide documentation to your
--- a/doc/arch-design/multi_site/section_prescriptive_examples_multi_site.xml
+++ b/doc/arch-design/multi_site/section_prescriptive_examples_multi_site.xml
@@ -22,10 +22,10 @@
        very sensitive to latency and needs a rapid response to
        end-users. After reviewing the user, technical and operational
        considerations, it is determined beneficial to build a number
-        of regions local to the customer's edge. In this case rather
+        of regions local to the customer's edge. Rather than build a
-        than build a few large, centralized data centers, the intent
+        few large, centralized data centers, the intent of the architecture
-        of the architecture is to provide a pair of small data centers
+        is to provide a pair of small data centers in locations that
-        in locations that are closer to the customer. In this use
+        are closer to the customer. In this use
        case, spreading applications out allows for different
        horizontal scaling than a traditional compute workload scale.
        The intent is to scale by creating more copies of the
@@ -60,44 +60,47 @@
        expanding the capacity of all regions simultaneously,
        therefore maximizing the cost-effectiveness of the multi-site
        design.</para>
-    <para>One of the key decisions of running this sort of
+    <para>One of the key decisions of running this infrastructure is
-        infrastructure is whether or not to provide a redundancy
+        whether or not to provide a redundancy
        model. Two types of redundancy and high availability models in
        this configuration can be implemented. The first type
-        revolves around the availability of the central OpenStack
+        is the availability of central OpenStack
        components. Keystone can be made highly available in three
        central data centers that host the centralized OpenStack
        components. This prevents a loss of any one of the regions
        causing an outage in service. It also has the added benefit of
        being able to run a central storage repository as a primary
        cache for distributing content to each of the regions.</para>
-    <para>The second redundancy topic is that of the edge data center
+    <para>The second redundancy type is the edge data center itself.
-        itself. A second data center in each of the edge regional
+        A second data center in each of the edge regional
-        locations house a second region near the first. This
+        locations houses a second region near the first region. This
        ensures that the application does not suffer degraded
        performance in terms of latency and availability.</para>
-    <para>This figure depicts the solution designed to have both a
+      <para><xref linkend="multi-site_customer_edge"/> depicts
-        centralized set of core data centers for OpenStack services
+        the solution designed to have both a centralized set of core
-        and paired edge data centers:</para>
+        data centers for OpenStack services and paired edge data centers:</para>
      <figure xml:id="multi-site_customer_edge">
        <title>Multi-site architecture example</title>
        <mediaobject>
        <imageobject>
-            <imagedata contentwidth="4in"
+            <imagedata contentwidth="6in"
                fileref="../figures/Multi-Site_Customer_Edge.png"/>
        </imageobject>
      </mediaobject>
      </figure>
    <section xml:id="geo-redundant-load-balancing">
      <title>Geo-redundant load balancing</title>
    <para>A large-scale web application has been designed with cloud
        principles in mind. The application is designed to provide
        service to an application store on a 24/7 basis. The company has a
-        typical 2-tier architecture with a web front-end servicing the
+        typical two-tier architecture with a web front-end servicing the
-        customer requests and a NoSQL database back end storing the
+        customer requests, and a NoSQL database back end storing the
        information.</para>
    <para>Recently there have been several outages at a number of major
-        public cloud providers&mdash;usually due to the fact these
+        public cloud providers due to applications running out of
-        applications were running out of a single geographical
+        a single geographical location. The design therefore should
-        location. The design therefore should mitigate the chance of a
+        mitigate the chance of a single site causing an outage for their
-        single site causing an outage for their business.</para>
+        business.</para>
    <para>The solution would consist of the following OpenStack
        components:</para>
    <itemizedlist>
@@ -108,12 +111,11 @@
        <listitem>
            <para>OpenStack Controller services running Networking,
                dashboard, Block Storage and Compute locally in
-                each of the three regions. The other services,
+                each of the three regions. Identity service, Orchestration
-                Identity, Orchestration, Telemetry, Image service and
+                service, Telemetry service, Image service and
-                Object Storage can be
+                Object Storage can be installed centrally, with
-                installed centrally&mdash;with nodes in each of the region
+                nodes in each of the regions providing a redundant
-                providing a redundant OpenStack Controller plane
+                OpenStack Controller plane throughout the globe.</para>
                throughout the globe.</para>
        </listitem>
        <listitem>
            <para>OpenStack Compute nodes running the KVM
@@ -126,9 +128,9 @@
                replicated on a regular basis.</para>
        </listitem>
        <listitem>
-            <para>A Distributed DNS service available to all
+            <para>A distributed DNS service available to all
-                regions&mdash;that allows for dynamic update of DNS records of
+                regions that allows for dynamic update of DNS
-                deployed instances.</para>
+                records of deployed instances.</para>
        </listitem>
        <listitem>
            <para>A geo-redundant load balancing service can be used
@@ -153,10 +155,10 @@
        </listitem>
    </itemizedlist>
    <para>Another autoscaling Heat template can be used to deploy a
-        distributed MongoDB shard over the three locations&mdash;with the
+        distributed MongoDB shard over the three locations, with the
        option of storing required data on a globally available swift
        container. According to the usage and load on the database
-        server&mdash;additional shards can be provisioned according to
+        server, additional shards can be provisioned according to
        the thresholds defined in Telemetry.</para>
 <!--    <para>The reason that three regions were selected here was because of
        the fear of having abnormal load on a single region in the
@@ -169,57 +171,66 @@
        autoscaling and auto healing in the event of increased load.
        Additional configuration management tools, such as Puppet or
        Chef could also have been used in this scenario, but were not
-        chosen due to the fact that Orchestration had the appropriate built-in
+        chosen since Orchestration had the appropriate built-in
-        hooks into the OpenStack cloud&mdash;whereas the other tools were
+        hooks into the OpenStack cloud, whereas the other tools were
-        external and not native to OpenStack. In addition&mdash;since this
+        external and not native to OpenStack. In addition, external
-        deployment scenario was relatively straight forward&mdash;the
+        tools were not needed since this deployment scenario was
-        external tools were not needed.</para>
+        straightforward.</para>
-    <para>
+    <para>OpenStack Object Storage is used here to serve as a back end for
        OpenStack Object Storage is used here to serve as a back end for
        the Image service since it is the most suitable solution for a
-        globally distributed storage solution&mdash;with its own
+        globally distributed storage solution with its own
        replication mechanism. Home grown solutions could also have
-        been used including the handling of replication&mdash;but were not
+        been used including the handling of replication, but were not
        chosen, because Object Storage is already an integral part of the
-        infrastructure&mdash;and proven solution.</para>
+        infrastructure and a proven solution.</para>
    <para>An external load balancing service was used and not the
        LBaaS in OpenStack because the solution in OpenStack is not
        redundant and does not have any awareness of geo location.</para>
      <figure xml:id="multi-site_geo_redundant">
        <title>Multi-site geo-redundant architecture</title>
      <mediaobject>
        <imageobject>
-            <imagedata contentwidth="4in"
+            <imagedata contentwidth="6in"
                fileref="../figures/Multi-site_Geo_Redundant_LB.png"/>
        </imageobject>
-    </mediaobject></section>
+      </mediaobject>
-    <section xml:id="location-local-services"><title>Location-local service</title>
+     </figure>
-    <para>A common use for a multi-site deployment of OpenStack, is
+    </section>
-        for creating a Content Delivery Network. An application that
+    <section xml:id="location-local-services">
      <title>Location-local service</title>
    <para>A common use for multi-site OpenStack deployment is
        creating a Content Delivery Network. An application that
        uses a location-local architecture requires low network
-        latency and proximity to the user, in order to provide an
+        latency and proximity to the user to provide an
-        optimal user experience, in addition to reducing the cost of
+        optimal user experience and reduce the cost of bandwidth and
-        bandwidth and transit, since the content resides on sites
+        transit. The content resides on sites closer to the customer,
-        closer to the customer, instead of a centralized content store
+        instead of a centralized content store that requires utilizing
-        that requires utilizing higher cost cross-country links.</para>
+        higher cost cross-country links.</para>
-    <para>This architecture usually includes a geo-location component
+    <para>This architecture includes a geo-location component
-        that places user requests at the closest possible node. In
+        that routes user requests to the closest possible node. In
        this scenario, 100% redundancy of content across every site is
-        a goal rather than a requirement, with the intent being to
+        a goal rather than a requirement, with the intent to
-        maximize the amount of content available that is within a
+        maximize the amount of content available within a
-        minimum number of network hops for any given end user. Despite
+        minimum number of network hops for end users. Despite
        these differences, the storage replication configuration has
        significant overlap with that of a geo-redundant load
        balancing use case.</para>
-    <para>In this example, the application utilizing this multi-site
+      <para>In <xref linkend="multi-site_shared_shared_keystone"/>,
-        OpenStack install that is location aware would launch web
+        the application utilizing this multi-site OpenStack install
-        server or content serving instances on the compute cluster in
+        that is location-aware would launch web server or content
-        each site. Requests from clients are first sent to a
+        serving instances on the compute cluster in each site. Requests
-        global services load balancer that determines the location of
+        from clients are first sent to a global services load balancer
-        the client, then routes the request to the closest OpenStack
+        that determines the location of the client, then routes the
-        site where the application completes the request.</para>
+        request to the closest OpenStack site where the application
+        completes the request.</para>
+      <figure xml:id="multi-site_shared_shared_keystone">
+        <title>Multi-site shared keystone architecture</title>
      <mediaobject>
        <imageobject>
-            <imagedata contentwidth="4in"
+            <imagedata contentwidth="6in"
                fileref="../figures/Multi-Site_shared_keystone1.png"/>
        </imageobject>
-    </mediaobject></section>
+      </mediaobject>
+     </figure>
+    </section>
 </section>
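
The request routing described for the location-local service can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site names and coordinates are hypothetical, and great-circle distance stands in for whatever proximity measure a real global services load balancer uses (GeoIP data, measured latency, site health, and so on).

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical site catalogue: region name -> (latitude, longitude) in degrees.
SITES = {
    "region-east": (40.7, -74.0),
    "region-west": (37.8, -122.4),
    "region-eu": (52.4, 4.9),
}


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def closest_site(client_location, sites=SITES):
    """Return the name of the site geographically nearest to the client."""
    return min(sites, key=lambda name: great_circle_km(client_location, sites[name]))
```

A client located near Paris, for example, would be sent to the hypothetical `region-eu` site. Geographic distance is only a proxy for network proximity, so a production deployment would likely combine it with latency and availability checks.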
--- a/doc/arch-design/multi_site/section_tech_considerations_multi_site.xml
+++ b/doc/arch-design/multi_site/section_tech_considerations_multi_site.xml
@@ -27,105 +27,108 @@
        high-bandwidth links available between them, it may be wise to
        configure a separate storage replication network between the
        two sites to support a single Swift endpoint and a shared
-        object storage capability between them. (An example of this
+        Object Storage capability between them. An example of this
        technique, as well as a configuration walk-through, is
        available at <link
-        xlink:href="http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network">http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network</link>).
+        xlink:href="http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network">http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network</link>.
        Another option in this scenario is to build a dedicated set of
-        tenant private networks across the secondary link using
+        tenant private networks across the secondary link, using
        overlay networks with a third party mapping the site overlays
        to each other.</para>
    <para>The capacity requirements of the links between sites are
-        driven by application behavior. If the latency of the links is
+        driven by application behavior. If the link latency is
        too high, certain applications that use a large number of
        small packets, for example RPC calls, may encounter issues
        communicating with each other or operating properly.
        Additionally, OpenStack may encounter similar types of issues.
-        To mitigate this, tuning of the Identity service call timeouts may be
+        To mitigate this, Identity service call timeouts can be
-        necessary to prevent issues authenticating against a central
+        tuned to prevent issues authenticating against a central
        Identity service.</para>
-    <para>Another capacity consideration when it comes to networking
+    <para>Another network capacity consideration for a multi-site
-        for a multi-site deployment is the available amount and
+        deployment is the amount and performance of overlay networks
-        performance of overlay networks for tenant networks. If using
+        available for tenant networks. If using shared tenant networks
-        shared tenant networks across zones, it is imperative that an
+        across zones, it is imperative that an external overlay manager
-        external overlay manager or controller be used to map these
+        or controller be used to map these overlays together. It is
-        overlays together. It is necessary to ensure the amount of
+        necessary to ensure the number of possible IDs between the zones
-        possible IDs between the zones are identical. Note that, as of
+        is identical.</para>
-        the Kilo release, OpenStack Networking was not capable of managing
+      <note>
-        tunnel IDs across installations. This means that if one site
+        <para>As of the Kilo release, OpenStack Networking was not
-        runs out of IDs, but other does not, that tenant's network
+          capable of managing tunnel IDs across installations. So if
-        is unable to reach the other site.</para>
+          one site runs out of IDs, but another does not, that tenant's
+          network is unable to reach the other site.</para>
+      </note>
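
Because OpenStack Networking does not coordinate tunnel IDs across installations, an operator may want to check the ID ranges out of band. The following is a minimal sketch, assuming each site's range is known as an inclusive `(low, high)` pair; the function names and the data layout are hypothetical and do not correspond to any OpenStack API.

```python
def id_range_mismatches(site_ranges):
    """Return the sites whose tunnel ID range differs from the first site's.

    site_ranges maps a site name to an inclusive (low, high) tuple.
    """
    names = list(site_ranges)
    if not names:
        return []
    reference = site_ranges[names[0]]
    return [name for name in names[1:] if site_ranges[name] != reference]


def remaining_ids(id_range, allocated):
    """Count the IDs still free in an inclusive (low, high) range."""
    low, high = id_range
    return (high - low + 1) - len(allocated)
```

Running the first check at design time, and the second periodically per site, gives early warning before one site exhausts its IDs while another still has capacity.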
    <para>Capacity can take other forms as well. The ability for a
        region to grow depends on scaling out the number of available
        compute nodes. This topic is covered in greater detail in the
-        section for compute-focused deployments. However, it should be
+        section for compute-focused deployments. However, it may be
-        noted that cells may be necessary to grow an individual region
+        necessary to use cells to grow an individual region, depending on
-        beyond a certain point. This point depends on the size of your
+        the size of your cluster and the ratio of virtual machines per
-        cluster and the ratio of virtual machines per
        hypervisor.</para>
    <para>A third form of capacity comes in the multi-region-capable
        components of OpenStack. Centralized Object Storage is capable
        of serving objects through a single namespace across multiple
-        regions. Since this works by accessing the object store via
+        regions. Since this works by accessing the object store through
        swift proxy, it is possible to overload the proxies. There are
-        two options available to mitigate this issue. The first is to
+        two options available to mitigate this issue:</para>
-        deploy a large number of swift proxies. The drawback to this
+      <itemizedlist>
-        is that the proxies are not load-balanced and a large file
+        <listitem>
-        request could continually hit the same proxy. The other way to
+          <para>Deploy a large number of swift proxies. The drawback is
-        mitigate this is to front-end the proxies with a caching HTTP
+            that the proxies are not load-balanced and a large file
-        proxy and load balancer. Since swift objects are returned to
+            request could continually hit the same proxy.</para>
-        the requester via HTTP, this load balancer would alleviate the
+        </listitem>
+        <listitem>
+          <para>Add a caching HTTP proxy and load balancer in front of
+            the swift proxies. Since swift objects are returned to the
+            requester via HTTP, this load balancer would alleviate the
+            load required on the swift proxies.</para>
+         </listitem>
+       </itemizedlist>
    <section xml:id="utilization-multi-site"><title>Utilization</title>
    <para>While constructing a multi-site OpenStack environment is the
        goal of this guide, the real test is whether an application
        can utilize it.</para>
-    <para>Identity is normally the first interface for the majority of
+    <para>The Identity service is normally the first interface for
-        OpenStack users. Interacting with the Identity service is required for
+        OpenStack users and is required for almost all major operations
-        almost all major operations within OpenStack. Therefore, it is
+        within OpenStack. Therefore, it is important that you provide users
-        important to ensure that you provide users with a single URL
+        with a single URL for Identity service authentication, and
-        for Identity service authentication. Equally important is proper
+        document the configuration of regions within the Identity service.
-        documentation and configuration of regions within the Identity service.
        Each of the sites defined in your installation is considered
        to be a region in Identity nomenclature. This is important for
-        the users of the system, when reading Identity documentation,
+        the users, because they must specify the region name when
-        as it is required to define the region name when providing
+        sending requests to an API endpoint or using the dashboard.</para>
-        actions to an API endpoint or in the dashboard.</para>
    <para>Load balancing is another common issue with multi-site
        installations. While it is still possible to run HAProxy
-        instances with Load-Balancer-as-a-Service, these are local
+        instances with Load-Balancer-as-a-Service, these are scoped
-        to a specific region. Some applications may be able to cope
+        to a specific region. Some applications can manage this using
-        with this via internal mechanisms. Others, however, may
+        internal mechanisms. Other applications may require the
-        require the implementation of an external system including
+        implementation of an external system, including global services
-        global services load balancers or anycast-advertised
+        load balancers or anycast-advertised DNS.</para>
-        DNS.</para>
    <para>Depending on the storage model chosen during site design,
        storage replication and availability are also a concern
-        for end-users. If an application is capable of understanding
+        for end-users. If an application can support regions, then it
-        regions, then it is possible to keep the object storage system
+        is possible to keep the object storage system separated by region.
-        separated by region. In this case, users who want to have an
+        In this case, users who want to have an object available to
-        object available to more than one region need to do the
+        more than one region need to perform cross-site replication.
-        cross-site replication themselves. With a centralized swift
+        However, with a centralized swift proxy, the user may need to
-        proxy, however, the user may need to benchmark the replication
+        benchmark the replication timing of the Object Storage back end.
-        timing of the Object Storage back end. Benchmarking allows the
+        Benchmarking allows the operational staff to provide users with
-        operational staff to provide users with an understanding of
+        an understanding of the amount of time required for a stored or
-        the amount of time required for a stored or modified object to
+        modified object to become available to the entire environment.</para>
-        become available to the entire environment.</para></section>
+      </section>
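
The replication benchmark mentioned above can be approximated by writing an object and polling each region until the object becomes visible. The sketch below is an assumption-laden illustration: `is_visible` is a hypothetical callable that the operator would implement against their own Object Storage endpoints, and the region names are placeholders.

```python
import time


def time_until_visible(is_visible, regions, timeout=60.0, interval=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until an object is visible in every region.

    is_visible(region) -> bool is supplied by the caller (hypothetical).
    Returns {region: seconds elapsed, or None if the timeout was reached}.
    """
    start = clock()
    seen = {}
    while len(seen) < len(regions):
        now = clock()
        for region in regions:
            if region not in seen and is_visible(region):
                seen[region] = now - start
        if len(seen) < len(regions):
            if now - start >= timeout:
                break  # give up; unseen regions are reported as None
            sleep(interval)
    return {region: seen.get(region) for region in regions}
```

The `clock` and `sleep` parameters exist only so the function can be exercised without real waiting. Measured times are accurate to within one polling interval, so a shorter interval gives finer results at the cost of more requests.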
    <section xml:id="performance"><title>Performance</title>
    <para>Determining the performance of a multi-site installation
        involves considerations that do not come into play in a
        single-site deployment. Being a distributed deployment,
-        multi-site deployments incur a few extra penalties to
+        performance in multi-site deployments may be affected in certain
-        performance in certain situations.</para>
+        situations.</para>
    <para>Since multi-site systems can be geographically separated,
-        they may have worse than normal latency or jitter when
+        there may be greater latency or jitter when communicating across
-        communicating across regions. This can especially impact
+        regions. This can especially impact systems like the OpenStack
-        systems like the OpenStack Identity service when making
+        Identity service when making authentication attempts from regions
-        authentication attempts from regions that do not contain the
+        that do not contain the centralized Identity implementation. It
-        centralized Identity implementation. It can also affect
+        can also affect applications which rely on Remote Procedure Call (RPC)
-        certain applications which rely on remote procedure call (RPC)
+        for normal operation. An example of this can be seen in high
-        for normal operation. An example of this can be seen in High
+        performance computing workloads.</para>
-        Performance Computing workloads.</para>
    <para>Storage availability can also be impacted by the
        architecture of a multi-site deployment. A centralized Object
        Storage service requires more time for an object to be
@@ -137,4 +140,37 @@
        to manually cope with this limitation by creating duplicate
        block storage entries in each region.</para>
      </section>
    <section xml:id="openstack-components_multi-site">
      <title>OpenStack components</title>
    <para>Most OpenStack installations require a bare minimum set of
        pieces to function. These include OpenStack Identity
        (keystone) for authentication, OpenStack Compute
        (nova) for compute, OpenStack Image service (glance) for image
        storage, OpenStack Networking (neutron) for networking, and
        potentially an object store in the form of OpenStack Object
        Storage (swift). Deploying a multi-site installation also demands extra
        components in order to coordinate between regions. A centralized
        Identity service is necessary to provide the single authentication
        point. A centralized dashboard is also recommended to provide a
        single login point and a mapping to the API and CLI
        options available. A centralized Object Storage service may also
        be used, but will require the installation of the swift proxy
        service.</para>
    <para>It may also be helpful to install a few extra options in
        order to facilitate certain use cases. For example,
        installing Designate may assist in automatically generating
        DNS domains for each region with an automatically-populated
        zone full of resource records for each instance. This
        facilitates using DNS as a mechanism for determining which
        region will be selected for certain applications.</para>
    <para>Another useful tool for managing a multi-site installation
        is Orchestration (heat). The Orchestration module allows the
        use of templates to define a set of instances to be launched
        together or for scaling existing sets. It can also be used to
        set up matching or differentiated groupings based on
        regions. For instance, if an application requires an equally
        balanced number of nodes across sites, the same heat template
        can be used to cover each site with small alterations to only
        the region name.</para>
    </section>
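
To illustrate reusing one template across sites with only the region name changed, the sketch below builds one stack request per region from a single template. It is a hedged example, not a prescribed workflow: the template is a HOT document expressed as a Python dict, the image and flavor names are placeholders, and `plan_stacks` is a hypothetical helper. Submitting the requests to the Orchestration API is deliberately left out.

```python
import copy

# Hypothetical HOT template as a Python dict; image and flavor are placeholders.
BASE_TEMPLATE = {
    "heat_template_version": "2015-04-30",
    "parameters": {"node_count": {"type": "number"}},
    "resources": {
        "web_nodes": {
            "type": "OS::Heat::ResourceGroup",
            "properties": {
                "count": {"get_param": "node_count"},
                "resource_def": {
                    "type": "OS::Nova::Server",
                    "properties": {"image": "web-image", "flavor": "m1.small"},
                },
            },
        }
    },
}


def plan_stacks(regions, nodes_per_site, template=BASE_TEMPLATE):
    """Build one stack request per region; only region-specific fields vary."""
    return [
        {
            "region_name": region,
            "stack_name": "web-" + region,
            "template": copy.deepcopy(template),
            "parameters": {"node_count": nodes_per_site},
        }
        for region in regions
    ]
```

Each resulting request carries the same template and node count, so the sites stay equally balanced. Only the region name, and the stack name derived from it, differ between requests.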
 </section>
--- a/doc/arch-design/multi_site/section_user_requirements_multi_site.xml
+++ b/doc/arch-design/multi_site/section_user_requirements_multi_site.xml
@@ -6,55 +6,16 @@
  xml:id="user-requirements-multi-site">
    <?dbhtml stop-chunking?>
    <title>User requirements</title>
    <para>A multi-site architecture is complex and has its own risks
        and considerations. When designing such an architecture, it is
        important to make sure that it meets the user and business
        requirements.</para>
    <para>Many jurisdictions have legislative and regulatory
        requirements governing the storage and management of data in
        cloud environments. Common areas of regulation include:</para>
    <itemizedlist>
        <listitem>
            <para>Data retention policies ensuring storage of
                persistent data and records management to meet data
                archival requirements.</para>
        </listitem>
        <listitem>
            <para>Data ownership policies governing the possession and
                responsibility for data.</para>
        </listitem>
        <listitem>
            <para>Data sovereignty policies governing the storage of
                data in foreign countries or otherwise separate
                jurisdictions.</para>
        </listitem>
        <listitem>
            <para>Data compliance policies governing types of
                information that need to reside in certain locations
                due to regulatory issues and, more importantly, cannot
                reside in other locations for the same reason.</para>
        </listitem>
    </itemizedlist>
    <para>Examples of such legal frameworks include the data
        protection framework of the European Union (<link
        xlink:href="http://ec.europa.eu/justice/data-protection">http://ec.europa.eu/justice/data-protection</link>)
        and the requirements of the Financial Industry Regulatory
        Authority (<link
        xlink:href="http://www.finra.org/Industry/Regulation/FINRARules">http://www.finra.org/Industry/Regulation/FINRARules</link>)
        in the United States. Consult a local regulatory body for more
        information.</para>
    <section xml:id="workload-characteristics">
      <title>Workload characteristics</title>
-    <para>The expected workload is a critical requirement that needs
+    <para>An understanding of the expected workloads for a desired
-        to be captured to guide decision-making. An understanding of
+        multi-site environment and use case is an important factor in
-        the workloads in the context of the desired multi-site
+        the decision-making process. In this context, <literal>workload</literal>
-        environment and use case is important. Another way of thinking
+        refers to the way the systems are used. A workload could be a
-        about a workload is to think of it as the way the systems are
+        single application or a suite of applications that work together.
-        used. A workload could be a single application or a suite of
+        It could also be a duplicate set of applications that need to
-        applications that work together. It could also be a duplicate
+        run in multiple cloud environments. Often in a multi-site deployment,
-        set of applications that need to run in multiple cloud
+        the same workload will need to work identically in more than one
-        environments. Often in a multi-site deployment the same
-        workload will need to work identically in more than one
        physical location.</para>
    <para>This multi-site scenario likely includes one or more of the
        other scenarios in this book with the additional requirement
@@ -72,26 +33,26 @@
        <title>Consistency of images and templates across different
        sites</title>
    <para>It is essential that the deployment of instances is
-        consistent across the different sites. This needs to be built
+        consistent across the different sites and built
        into the infrastructure. If OpenStack Object Storage is used as
-        a back end for the Image service, it is possible to create repositories of
+        a back end for the Image service, it is possible to create repositories
-        consistent images across multiple sites. Having central
+        of consistent images across multiple sites. Having central
        endpoints with multiple storage nodes allows consistent centralized
-        storage for each and every site.</para>
+        storage for every site.</para>
-    <para>Not using a centralized object store increases operational
+      <para>Not using a centralized object store increases the operational
-        overhead so that a consistent image library can be maintained. This
+        overhead of maintaining a consistent image library. This
        could include development of a replication mechanism to handle
        the transport of images and the changes to the images across
        multiple sites.</para></section>
-    <section xml:id="high-availability-multi-site"><title>High availability</title>
+    <section xml:id="high-availability-multi-site">
      <title>High availability</title>
    <para>If high availability is a requirement to provide continuous
        infrastructure operations, a basic requirement of high
        availability should be defined.</para>
    <para>The OpenStack management components need to have a basic and
        minimal level of redundancy. The simplest example is that the loss
-        of any single site has no significant impact on the
+        of any single site should have minimal impact on the
-        availability of the OpenStack services of the entire
+        availability of the OpenStack services.</para>
-        infrastructure.</para>
    <para>The <link
        xlink:href="http://docs.openstack.org/high-availability-guide/content/"><citetitle>OpenStack
        High Availability Guide</citetitle></link>
@@ -111,14 +72,12 @@
        WAN network design between the sites.</para>
    <para>Connecting more than two sites increases the challenges and
        adds more complexity to the design considerations. Multi-site
-        implementations require extra planning to address the
+        implementations require planning to address the additional
-        additional topology complexity used for internal and external
+        topology used for internal and external connectivity. Some options
-        connectivity. Some options include full mesh topology, hub
+        include full mesh, hub-and-spoke, spine-leaf, and 3D torus topologies.</para>
-        spoke, spine leaf, or 3d Torus.</para>
+    <para>If applications running in a cloud are not cloud-aware, there
-    <para>Not all the applications running in a cloud are cloud-aware.
+        should be clear measures and expectations to define what the
-        If that is the case, there should be clear measures and
+        infrastructure can and cannot support. An example would be
-        expectations to define what the infrastructure can support
-        and, more importantly, what it cannot. An example would be
        shared storage between sites. It is possible; however, such a
        solution is not native to OpenStack and requires a third-party
        hardware vendor to fulfill such a requirement. Another example
@@ -126,21 +85,21 @@
        in object storage directly. These applications need to be
        cloud-aware to make good use of an OpenStack Object
        Store.</para></section>
-    <section xml:id="application-readiness"><title>Application readiness</title>
+    <section xml:id="application-readiness">
      <title>Application readiness</title>
    <para>Some applications are tolerant of the lack of synchronized
        object storage, while others may need those objects to be
-        replicated and available across regions. Understanding of how
+        replicated and available across regions. Understanding how
        the cloud implementation impacts new and existing applications
-        is important for risk mitigation and the overall success of a
+        is important for risk mitigation, and the overall success of a
-        cloud project. Applications may have to be written to expect
+        cloud project. Applications may have to be written or rewritten
-        an infrastructure with little to no redundancy. Existing
+        for an infrastructure with little to no redundancy, or with the
-        applications not developed with the cloud in mind may need to
+        cloud in mind.</para></section>
-        be rewritten.</para></section>
+    <section xml:id="cost-multi-site">
-    <section xml:id="cost-multi-site"><title>Cost</title>
+      <title>Cost</title>
-    <para>The requirement of having more than one site has a cost
+    <para>A greater number of sites increases cost and complexity for a
-        attached to it. The greater the number of sites, the greater
+        multi-site deployment. Costs can be broken down into the following
-        the cost and complexity. Costs can be broken down into the
+        categories:</para>
-        following categories:</para>
    <itemizedlist>
        <listitem>
            <para>Compute resources</para>
@@ -163,34 +122,32 @@
    </itemizedlist></section>
    <section xml:id="site-loss-and-recovery">
      <title>Site loss and recovery</title>
-    <para>Outages can cause loss of partial or full functionality of a
+    <para>Outages can cause partial or full loss of site functionality.
-        site. Strategies should be implemented to understand and plan
+      Strategies should be implemented to understand and plan for recovery
-        for recovery scenarios.</para>
+      scenarios.</para>
    <itemizedlist>
        <listitem>
            <para>The deployed applications need to continue to
-                function and, more importantly, consideration should
+                function and, more importantly, you must consider the
-                be taken of the impact on the performance and
+                impact on the performance and reliability of the application
-                reliability of the application when a site is
+                when a site is unavailable.</para>
-                unavailable.</para>
        </listitem>
        <listitem>
            <para>It is important to understand what happens to the
                replication of objects and data between the sites when
                a site goes down. If this causes queues to start
                building up, consider how long these queues can
-                safely exist until something explodes.</para>
+                safely exist until an error occurs.</para>
        </listitem>
        <listitem>
-            <para>Ensure determination of the method for resuming
+          <para>After an outage, ensure the method for resuming proper
-                proper operations of a site when it comes back online
+            operations of a site is implemented when it comes back online.
-                after a disaster. We recommend you architect the
+            We recommend you architect the recovery to avoid race conditions.</para>
-                recovery to avoid race conditions.</para>
        </listitem>
    </itemizedlist></section>
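
The question of how long replication queues can safely grow during a site outage reduces to simple arithmetic once the queue capacity and the write rate are known. The sketch below assumes a constant write rate and a fixed capacity, both of which are simplifications; the function and parameter names are hypothetical.

```python
def hours_until_queue_full(queue_capacity_gb, backlog_gb, write_rate_gb_per_hour):
    """Estimate the hours left before a replication queue reaches capacity.

    Assumes a constant write rate. Returns infinity if nothing is being written.
    """
    if write_rate_gb_per_hour <= 0:
        return float("inf")
    headroom_gb = max(queue_capacity_gb - backlog_gb, 0)
    return headroom_gb / write_rate_gb_per_hour
```

For instance, a queue with 500 GB of capacity, a 100 GB backlog, and 50 GB of new writes per hour has roughly eight hours of headroom. That figure sets the upper bound for the site recovery time that the design must meet.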
    <section xml:id="compliance-and-geo-location-multi-site">
      <title>Compliance and geo-location</title>
-    <para>An organization could have certain legal obligations and
+    <para>An organization may have certain legal obligations and
        regulatory compliance measures which could require certain
        workloads or data to not be located in certain regions.</para></section>
    <section xml:id="auditing-multi-site">
@@ -210,11 +167,10 @@
        site.</para></section>
    <section xml:id="authentication-between-sites">
        <title>Authentication between sites</title>
-    <para>Ideally it is best to have a single authentication domain
+    <para>It is recommended to have a single authentication domain
-        and not need a separate implementation for each and every
+        rather than a separate implementation for each and every
-        site. This, of course, requires an authentication
+        site. This requires an authentication mechanism that is highly
-        mechanism that is highly available and distributed to ensure
+        available and distributed to ensure continuous operation.
-        continuous operation. Authentication server locality is also
+        Authentication server locality might be required and should be
-        something that might be needed as well and should be planned
+        planned for.</para></section>
-        for.</para></section>
 </section>