Merge "Edits to the prescriptive examples and tech consideration files"

2015-08-18 22:26:33 +00:00 · 2015-08-18 22:26:33 +00:00 · 9964daf06d
commit 9964daf06d
parent d9897717c7 d81f3251a7
2 changed files with 76 additions and 224 deletions
--- a/doc/arch-design/compute_focus/section_prescriptive_examples_compute_focus.xml
+++ b/doc/arch-design/compute_focus/section_prescriptive_examples_compute_focus.xml
@ -80,23 +80,31 @@
    </mediaobject>
    <para>There is also some customization of the filter scheduler
        that handles placement within the cells:</para>
-    <itemizedlist>
-        <listitem><para>ImagePropertiesFilter - To provide special handling
+    <variablelist>
+      <varlistentry><term>ImagePropertiesFilter</term>
+        <listitem>
+          <para>Provides special handling
                depending on the guest operating system in use
                (Linux-based or Windows-based).</para>
        </listitem>
-        <listitem><para>ProjectsToAggregateFilter - To provide special
-                handling depending on the project the instance is
+      </varlistentry>
+      <varlistentry><term>ProjectsToAggregateFilter</term>
+        <listitem><para>Provides special
+                handling depending on which project the instance is
                associated with.</para>
        </listitem>
-        <listitem><para>default_schedule_zones - Allows the selection of
+      </varlistentry>
+      <varlistentry><term>default_schedule_zones</term>
+        <listitem><para>Allows the selection of
                multiple default availability zones, rather than a
-                single default.
-        </para></listitem>
-    </itemizedlist>
+                single default.</para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
    <para>A central database team manages the MySQL database server in each cell
        in an active/passive configuration with a NetApp storage back end.
        Backups run every 6 hours.</para>
+
    <section xml:id="network-architecture">
      <title>Network architecture</title>
    <para>To integrate with existing networking infrastructure, CERN
@ -110,7 +118,9 @@
        placed an instance and selects a MAC address and IP
        from the pre-registered list associated with that node in the
        database. The database updates to reflect the address assignment to
-        that instance.</para></section>
+        that instance.</para>
+    </section>
+
    <section xml:id="storage-architecture">
      <title>Storage architecture</title>
    <para>CERN deploys the OpenStack Image service in the API cell and
@ -119,7 +129,9 @@
        use is a 3 PB Ceph cluster.</para>
    <para>CERN maintains a small set of Scientific Linux 5 and 6 images onto
        which orchestration tools can place applications. Puppet manages
-        instance configuration and customization.</para></section>
+        instance configuration and customization.</para>
+    </section>
+
    <section xml:id="monitoring">
      <title>Monitoring</title>
    <para>CERN does not require direct billing, but uses the Telemetry module
@ -147,18 +159,4 @@
        project.
    </para>
    </section>
-    <section xml:id="references-cern-resources"><title>References</title>
-    <para>The authors of the Architecture Design Guide would like to
-        thank CERN for publicly documenting their OpenStack deployment
-        in these resources, which formed the basis for this
-        chapter:</para>
-    <itemizedlist>
-        <listitem><para><link xlink:href="http://openstack-in-production.blogspot.fr">http://openstack-in-production.blogspot.fr</link>
-        </para></listitem>
-        <listitem><para>
-              <link
-              xlink:href="http://www.openstack.org/assets/presentation-media/Deep-Dive-into-the-CERN-Cloud-Infrastructure.pdf">Deep
-              dive into the CERN Cloud Infrastructure</link></para>
-        </listitem>
-    </itemizedlist></section>
 </section>
--- a/doc/arch-design/compute_focus/section_tech_considerations_compute_focus.xml
+++ b/doc/arch-design/compute_focus/section_tech_considerations_compute_focus.xml
@ -12,10 +12,7 @@
    <title>Technical considerations</title>
    <para>In a compute-focused OpenStack cloud, the type of instance
        workloads you provision heavily influences technical
-        decision making. For example, specific use cases that demand
-        multiple, short-running jobs present different requirements
-        than those that specify long-running jobs, even though both
-        situations are compute focused.</para>
+        decision making.</para>
    <para>Public and private clouds require deterministic capacity
        planning to support elastic growth in order to meet user SLA
        expectations. Deterministic capacity planning is the path to
@ -23,21 +20,19 @@
        perform consistently. This process is important because,
        when a service becomes a critical part of a user's
        infrastructure, the user's experience links directly to the SLAs of
-        the cloud itself. In cloud computing, it is not average speed but
-        speed consistency that determines a service's performance.
-        There are two aspects of capacity planning to consider:</para>
+        the cloud itself.</para>
+    <para>There are two aspects of capacity planning to consider:</para>
    <itemizedlist>
      <listitem>
-        <para>planning the initial deployment footprint</para>
+        <para>Planning the initial deployment footprint</para>
      </listitem>
      <listitem>
-        <para>planning expansion of it to stay ahead of the demands of cloud
-        users</para>
+        <para>Planning expansion of the environment to stay ahead of the
+        demands of cloud users</para>
      </listitem>
    </itemizedlist>
-    <para>Plan the initial footprint for an OpenStack deployment
-        based on existing infrastructure workloads
-        and estimates based on expected uptake.</para>
+    <para>Begin planning an initial OpenStack deployment footprint with
+      estimations of expected uptake, and existing infrastructure workloads.</para>
    <para>The starting point is the core count of the cloud. By
        applying relevant ratios, the user can gather information
        about:</para>
@ -50,7 +45,7 @@
            <para>Required storage: flavor disk size × number of instances</para>
        </listitem>
    </itemizedlist>
-    <para>Use these ratios to determine the amount of
+    <para>These ratios determine the amount of
        additional infrastructure needed to support the cloud. For
        example, consider a situation in which you require 1600
        instances, each with 2 vCPU and 50 GB of storage. Assuming the
@ -61,7 +56,7 @@
            <para>1600 = (16 &times; (number of physical cores)) / 2</para>
        </listitem>
        <listitem>
-            <para>storage required = 50&nbsp;GB &times; 1600</para>
+            <para>Storage required = 50&nbsp;GB &times; 1600</para>
        </listitem>
    </itemizedlist>
    <para>On the surface, the equations reveal the need for 200
@ -71,19 +66,10 @@
        look at patterns of usage to estimate the load that the API
        services, database servers, and queue servers are likely to
        encounter.</para>
-    <para>Consider, for example, the differences between a cloud that
-        supports a managed web-hosting platform and one running
-        integration tests for a development project that creates one
-        instance per code commit. In the former, the heavy work of
-        creating an instance happens only every few months, whereas
-        the latter puts constant heavy load on the cloud controller.
-        The average instance lifetime is significant, as a larger
-        number generally means less load on the cloud
-        controller.</para>
    <para>Aside from the creation and termination of instances, consider the
        impact of users accessing the service,
        particularly on nova-api and its associated database. Listing
-        instances gathers a great deal of information and, given the
+        instances gathers a great deal of information and given the
        frequency with which users run this operation, a cloud with a
        large number of users can increase the load significantly.
        This can even occur unintentionally. For example, the
@ -104,17 +90,12 @@
        the impacts of different hardware and instance load outs. See:
        <link xlink:href="https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods">https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods</link>
    </para>
+
    <section xml:id="expansion-planning-compute-focus">
        <title>Expansion planning</title>
    <para>A key challenge for planning the expansion of cloud
        compute services is the elastic nature of cloud infrastructure
-        demands. Previously, new users or customers had to
-        plan for and request the infrastructure they required ahead of
-        time, allowing time for reactive procurement processes. Cloud
-        computing users have come to expect the agility of having
-        instant access to new resources as required.
-        Consequently, plan for typical usage and for sudden bursts in
-        usage.</para>
+        demands.</para>
    <para>Planning for expansion is a balancing act.
        Planning too conservatively can lead to unexpected
        oversubscription of the cloud and dissatisfied users. Planning
@ -127,31 +108,11 @@
        average speed or capacity of the cloud. Using this information
        to model capacity performance enables users to more
        accurately determine the current and future capacity of the
-        cloud.</para></section>
-    <section xml:id="cpu-and-ram-compute-focus"><title>CPU and RAM</title>
-    <para>Adapted from:
-        <link xlink:href="http://docs.openstack.org/openstack-ops/content/compute_nodes.html#cpu_choice">http://docs.openstack.org/openstack-ops/content/compute_nodes.html#cpu_choice</link></para>
-    <para>In current generations, CPUs have up to 12 cores. If an
-        Intel CPU supports Hyper-Threading, those 12 cores double
-        to 24 cores. A server that supports multiple CPUs multiplies
-        the number of available cores.
-        Hyper-Threading is Intel's proprietary simultaneous
-        multi-threading implementation, used to improve
-        parallelization on their CPUs. Consider enabling
-        Hyper-Threading to improve the performance of multithreaded
-        applications.</para>
-    <para>Whether the user should enable Hyper-Threading on a CPU
-        depends on the use case. For example, disabling
-        Hyper-Threading can be beneficial in intense computing
-        environments. Running performance tests using local
-        workloads with and without Hyper-Threading can help
-        determine which option is more appropriate in any particular
-        case.</para>
-    <para>If they must run the Libvirt or KVM hypervisor drivers,
-        then the compute node CPUs must support
-        virtualization by way of the VT-x extensions for Intel chips
-        and AMD-v extensions for AMD chips to provide full
-        performance.</para>
+        cloud.</para>
+    </section>
+
+    <section xml:id="cpu-and-ram-compute-focus">
+      <title>CPU and RAM</title>
    <para>OpenStack enables users to overcommit CPU and RAM on
        compute nodes. This allows an increase in the number of
        instances running on the cloud at the cost of reducing the
@ -176,13 +137,10 @@
        long as the total amount of RAM associated with the instances
        is less than 1.5 times the amount of RAM available on the
        physical node.</para>
-    <para>For example, if a physical node has 48 GB of RAM, the
-        scheduler allocates instances to that node until the sum of
-        the RAM associated with the instances reaches 72 GB (such as
-        nine instances, in the case where each instance has 8 GB of
-        RAM).</para>
    <para>You must select the appropriate CPU and RAM allocation ratio
-        based on particular use cases.</para></section>
+        based on particular use cases.</para>
+    </section>
+
    <section xml:id="additional-hardware-compute-focus">
      <title>Additional hardware</title>
    <para>Certain use cases may benefit from exposure to additional
@ -193,8 +151,6 @@
                the availability of graphics processing units (GPUs)
                for general-purpose computing.</para>
        </listitem>
-    </itemizedlist>
-    <itemizedlist>
        <listitem>
            <para>Cryptographic routines that benefit from the
                availability of hardware random number generators to
@ -210,11 +166,14 @@
        characteristics, which can include hardware similarities. The
        addition of specialized hardware to a cloud deployment is
        likely to add to the cost of each node, so consider carefully
-        consideration whether all compute nodes, or
+        whether all compute nodes, or
        just a subset targeted by flavors, need the
        additional customization to support the desired
-        workloads.</para></section>
-    <section xml:id="utilization"><title>Utilization</title>
+        workloads.</para>
+    </section>
+
+    <section xml:id="utilization">
+      <title>Utilization</title>
    <para>Infrastructure-as-a-Service offerings, including OpenStack,
        use flavors to provide standardized views of virtual machine
        resource requirements that simplify the problem of scheduling
@ -253,107 +212,9 @@
        different ratios of CPU versus RAM versus HDD
        requirements.</para>
    <para>For more information on Flavors see:
-        <link xlink:href="http://docs.openstack.org/openstack-ops/content/flavors.html">http://docs.openstack.org/openstack-ops/content/flavors.html</link></para>
-    </section>
-    <section xml:id="performance-compute-focus"><title>Performance</title>
-    <para>So that workloads can consume as many resources
-        as are available, do not share cloud infrastructure. Ensure you accommodate
-        large scale workloads.</para>
-    <para>The duration of batch processing differs depending on
-        individual workloads. Time limits range from
-        seconds to hours, and as a result it is difficult to predict resource
-        use.</para>
-    </section>
-    <section xml:id="security-compute-focus"><title>Security</title>
-    <para>The security considerations for this scenario are
-        similar to those of the other scenarios in this guide.</para>
-    <para>A security domain comprises users, applications, servers,
-        and networks that share common trust requirements and
-        expectations within a system. Typically they have the same
-        authentication and authorization requirements and
-        users.</para>
-    <para>These security domains are:</para>
-    <orderedlist>
-        <listitem>
-            <para>Public</para>
-        </listitem>
-        <listitem>
-            <para>Guest</para>
-        </listitem>
-        <listitem>
-            <para>Management</para>
-        </listitem>
-        <listitem>
-            <para>Data</para>
-        </listitem>
-    </orderedlist>
-    <para>You can map these security domains individually to the
-        installation, or combine them. For example, some
-        deployment topologies combine both guest and data domains onto
-        one physical network, whereas in other cases these networks
-        are physically separate. In each case, the cloud operator
-        should be aware of the appropriate security concerns. Map out
-        security domains against specific OpenStack
-        deployment topology. The domains and their trust requirements
-        depend on whether the cloud instance is public, private, or
-        hybrid.</para>
-    <para>The public security domain is an untrusted area of
-        the cloud infrastructure. It can refer to the Internet as a
-        whole or simply to networks over which the user has no
-        authority. Always consider this domain untrusted.</para>
-    <para>Typically used for compute instance-to-instance traffic, the
-        guest security domain handles compute data generated by
-        instances on the cloud. It does not handle services that support the
-        operation of the cloud, for example API calls. Public cloud
-        providers and private cloud providers who do not have
-        stringent controls on instance use or who allow unrestricted
-        Internet access to instances should consider this an untrusted domain.
-        Private cloud providers may want to consider this
-        an internal network and therefore trusted only if they have
-        controls in place to assert that they trust instances and all
-        their tenants.</para>
-    <para>The management security domain is where services interact.
-        Sometimes referred to as the "control plane", the networks in
-        this domain transport confidential data such as configuration
-        parameters, user names, and passwords. In most deployments this
-        is a trusted domain.</para>
-    <para>The data security domain deals with
-        information pertaining to the storage services within
-        OpenStack. Much of the data that crosses this network has high
-        integrity and confidentiality requirements and depending on
-        the type of deployment there may also be strong availability
-        requirements. The trust level of this network is heavily
-        dependent on deployment decisions and as such we do not assign
-        this a default level of trust.</para>
-    <para>When deploying OpenStack in an enterprise as a private cloud, you can
-        generally assume it is behind a firewall and within the trusted
-        network alongside existing systems. Users of the cloud are
-        typically employees or trusted individuals that are bound by
-        the security requirements set forth by the company. This tends
-        to push most of the security domains towards a more trusted
-        model. However, when deploying OpenStack in a public-facing
-        role, you cannot make these assumptions and the number of attack vectors
-        significantly increases. For example, the API endpoints and the
-        software behind it become vulnerable to hostile
-        entities attempting to gain unauthorized access or prevent access
-        to services. This can result in loss of reputation and you must
-        protect against it through auditing and appropriate
-        filtering.</para>
-    <para>Take care when managing the users of the
-        system, whether in public or private
-        clouds. The identity service enables LDAP to be part of the
-        authentication process, and includes such systems as an
-        OpenStack deployment that may ease user management if
-        integrated into existing systems.</para>
-    <para>We recommend placing API services behind hardware that
-        performs SSL termination. API services
-        transmit user names, passwords, and generated tokens between
-        client machines and API endpoints and therefore must be
-        secure.</para>
-    <para>For more information on OpenStack Security, see
-      <link xlink:href="http://docs.openstack.org/security-guide/">http://docs.openstack.org/security-guide/</link>
-    </para>
+        <link xlink:href="http://docs.openstack.org/openstack-ops/content/flavors.html">OpenStack Operations Guide: Flavors</link></para>
    </section>
+
    <section xml:id="openstack-components-compute-focus">
      <title>OpenStack components</title>
    <para>Due to the nature of the workloads in this
@ -376,32 +237,24 @@
        <listitem>
            <para><glossterm>Orchestration</glossterm> module
            (<glossterm>heat</glossterm>)</para>
-        </listitem>
-    </itemizedlist>
-    <para>It is safe to assume that, given the nature of the
+                  <para>Given the nature of the
                    applications involved in this scenario, these are heavily
                    automated deployments. Making use of Orchestration is highly
                    beneficial in this case. You can script the deployment of a
                    batch of instances and the running of tests, but it
                    makes sense to use the Orchestration module
                    to handle all these actions.</para>
-    <itemizedlist>
+        </listitem>
        <listitem>
            <para>Telemetry module (ceilometer)</para>
-        </listitem>
-    </itemizedlist>
                <para>Telemetry and the alarms it generates support autoscaling
                  of instances using Orchestration. Users that are not using the
                  Orchestration module do not need to deploy the Telemetry module and
                  may choose to use external solutions to fulfill their
                  metering and monitoring requirements.</para>
-    <para>See also:
-        <link xlink:href="http://docs.openstack.org/openstack-ops/content/logging_monitoring.html">http://docs.openstack.org/openstack-ops/content/logging_monitoring.html</link></para>
-    <itemizedlist>
+        </listitem>
        <listitem>
            <para>OpenStack Block Storage (cinder)</para>
-        </listitem>
-    </itemizedlist>
                <para>Due to the burst-able nature of the workloads and the
                  applications and instances that perform batch
                  processing, this cloud mainly uses memory or CPU, so
@ -409,14 +262,15 @@
                  requirement. This does not mean that you do not use
                  OpenStack Block Storage (cinder) in the infrastructure, but
                  typically it is not a central component.</para>
-    <itemizedlist>
+        </listitem>
        <listitem>
            <para>Networking</para>
-        </listitem>
-    </itemizedlist>
                <para>When choosing a networking platform, ensure that it either
                  works with all desired hypervisor and container technologies
                  and their OpenStack drivers, or that it includes an implementation of
                  an ML2 mechanism driver. You can mix networking platforms
-        that provide ML2 mechanisms drivers.</para></section>
+                  that provide ML2 mechanisms drivers.</para>
+        </listitem>
+    </itemizedlist>
+  </section>
 </section>