Edits to the prescriptive examples and tech consideration files

1. Removal of duplicated content
2. Removal of unnecessary content
3. Edits to existing content

Change-Id: Ib44cb3364dd7ec199e6134906fc83b575e6e4150
Implements: blueprint arch-guide
asettle 2015-08-18 13:30:25 +10:00
parent 592f0ec820
commit d81f3251a7
2 changed files with 76 additions and 224 deletions

@@ -80,23 +80,31 @@
</mediaobject> </mediaobject>
<para>There is also some customization of the filter scheduler <para>There is also some customization of the filter scheduler
that handles placement within the cells:</para> that handles placement within the cells:</para>
<itemizedlist> <variablelist>
<listitem><para>ImagePropertiesFilter - To provide special handling <varlistentry><term>ImagePropertiesFilter</term>
<listitem>
<para>Provides special handling
depending on the guest operating system in use depending on the guest operating system in use
(Linux-based or Windows-based).</para> (Linux-based or Windows-based).</para>
</listitem> </listitem>
<listitem><para>ProjectsToAggregateFilter - To provide special </varlistentry>
handling depending on the project the instance is <varlistentry><term>ProjectsToAggregateFilter</term>
<listitem><para>Provides special
handling depending on which project the instance is
associated with.</para> associated with.</para>
</listitem> </listitem>
<listitem><para>default_schedule_zones - Allows the selection of </varlistentry>
<varlistentry><term>default_schedule_zones</term>
<listitem><para>Allows the selection of
multiple default availability zones, rather than a multiple default availability zones, rather than a
single default. single default.</para>
</para></listitem> </listitem>
</itemizedlist> </varlistentry>
</variablelist>
<para>A central database team manages the MySQL database server in each cell <para>A central database team manages the MySQL database server in each cell
in an active/passive configuration with a NetApp storage back end. in an active/passive configuration with a NetApp storage back end.
Backups run every 6 hours.</para> Backups run every 6 hours.</para>
<section xml:id="network-architecture"> <section xml:id="network-architecture">
<title>Network architecture</title> <title>Network architecture</title>
<para>To integrate with existing networking infrastructure, CERN <para>To integrate with existing networking infrastructure, CERN
@@ -110,7 +118,9 @@
placed an instance and selects a MAC address and IP placed an instance and selects a MAC address and IP
from the pre-registered list associated with that node in the from the pre-registered list associated with that node in the
database. The database updates to reflect the address assignment to database. The database updates to reflect the address assignment to
that instance.</para></section> that instance.</para>
</section>
<section xml:id="storage-architecture"> <section xml:id="storage-architecture">
<title>Storage architecture</title> <title>Storage architecture</title>
<para>CERN deploys the OpenStack Image service in the API cell and <para>CERN deploys the OpenStack Image service in the API cell and
@@ -119,7 +129,9 @@
use is a 3 PB Ceph cluster.</para> use is a 3 PB Ceph cluster.</para>
<para>CERN maintains a small set of Scientific Linux 5 and 6 images onto <para>CERN maintains a small set of Scientific Linux 5 and 6 images onto
which orchestration tools can place applications. Puppet manages which orchestration tools can place applications. Puppet manages
instance configuration and customization.</para></section> instance configuration and customization.</para>
</section>
<section xml:id="monitoring"> <section xml:id="monitoring">
<title>Monitoring</title> <title>Monitoring</title>
<para>CERN does not require direct billing, but uses the Telemetry module <para>CERN does not require direct billing, but uses the Telemetry module
@@ -147,18 +159,4 @@
project. project.
</para> </para>
</section> </section>
<section xml:id="references-cern-resources"><title>References</title>
<para>The authors of the Architecture Design Guide would like to
thank CERN for publicly documenting their OpenStack deployment
in these resources, which formed the basis for this
chapter:</para>
<itemizedlist>
<listitem><para><link xlink:href="http://openstack-in-production.blogspot.fr">http://openstack-in-production.blogspot.fr</link>
</para></listitem>
<listitem><para>
<link
xlink:href="http://www.openstack.org/assets/presentation-media/Deep-Dive-into-the-CERN-Cloud-Infrastructure.pdf">Deep
dive into the CERN Cloud Infrastructure</link></para>
</listitem>
</itemizedlist></section>
</section> </section>

@@ -12,10 +12,7 @@
<title>Technical considerations</title> <title>Technical considerations</title>
<para>In a compute-focused OpenStack cloud, the type of instance <para>In a compute-focused OpenStack cloud, the type of instance
workloads you provision heavily influences technical workloads you provision heavily influences technical
decision making. For example, specific use cases that demand decision making.</para>
multiple, short-running jobs present different requirements
than those that specify long-running jobs, even though both
situations are compute focused.</para>
<para>Public and private clouds require deterministic capacity <para>Public and private clouds require deterministic capacity
planning to support elastic growth in order to meet user SLA planning to support elastic growth in order to meet user SLA
expectations. Deterministic capacity planning is the path to expectations. Deterministic capacity planning is the path to
@@ -23,21 +20,19 @@
perform consistently. This process is important because, perform consistently. This process is important because,
when a service becomes a critical part of a user's when a service becomes a critical part of a user's
infrastructure, the user's experience links directly to the SLAs of infrastructure, the user's experience links directly to the SLAs of
the cloud itself. In cloud computing, it is not average speed but the cloud itself.</para>
speed consistency that determines a service's performance. <para>There are two aspects of capacity planning to consider:</para>
There are two aspects of capacity planning to consider:</para>
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para>planning the initial deployment footprint</para> <para>Planning the initial deployment footprint</para>
</listitem> </listitem>
<listitem> <listitem>
<para>planning expansion of it to stay ahead of the demands of cloud <para>Planning expansion of the environment to stay ahead of the
users</para> demands of cloud users</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<para>Plan the initial footprint for an OpenStack deployment <para>Begin planning an initial OpenStack deployment footprint with
based on existing infrastructure workloads estimations of expected uptake, and existing infrastructure workloads.</para>
and estimates based on expected uptake.</para>
<para>The starting point is the core count of the cloud. By <para>The starting point is the core count of the cloud. By
applying relevant ratios, the user can gather information applying relevant ratios, the user can gather information
about:</para> about:</para>
@@ -50,7 +45,7 @@
<para>Required storage: flavor disk size × number of instances</para> <para>Required storage: flavor disk size × number of instances</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<para>Use these ratios to determine the amount of <para>These ratios determine the amount of
additional infrastructure needed to support the cloud. For additional infrastructure needed to support the cloud. For
example, consider a situation in which you require 1600 example, consider a situation in which you require 1600
instances, each with 2 vCPU and 50 GB of storage. Assuming the instances, each with 2 vCPU and 50 GB of storage. Assuming the
@@ -61,7 +56,7 @@
<para>1600 = (16 &times; (number of physical cores)) / 2</para> <para>1600 = (16 &times; (number of physical cores)) / 2</para>
</listitem> </listitem>
<listitem> <listitem>
<para>storage required = 50&nbsp;GB &times; 1600</para> <para>Storage required = 50&nbsp;GB &times; 1600</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<para>On the surface, the equations reveal the need for 200 <para>On the surface, the equations reveal the need for 200
@@ -71,19 +66,10 @@
look at patterns of usage to estimate the load that the API look at patterns of usage to estimate the load that the API
services, database servers, and queue servers are likely to services, database servers, and queue servers are likely to
encounter.</para> encounter.</para>
<para>Consider, for example, the differences between a cloud that
supports a managed web-hosting platform and one running
integration tests for a development project that creates one
instance per code commit. In the former, the heavy work of
creating an instance happens only every few months, whereas
the latter puts constant heavy load on the cloud controller.
The average instance lifetime is significant, as a larger
number generally means less load on the cloud
controller.</para>
<para>Aside from the creation and termination of instances, consider the <para>Aside from the creation and termination of instances, consider the
impact of users accessing the service, impact of users accessing the service,
particularly on nova-api and its associated database. Listing particularly on nova-api and its associated database. Listing
instances gathers a great deal of information and, given the instances gathers a great deal of information and given the
frequency with which users run this operation, a cloud with a frequency with which users run this operation, a cloud with a
large number of users can increase the load significantly. large number of users can increase the load significantly.
This can even occur unintentionally. For example, the This can even occur unintentionally. For example, the
@@ -104,17 +90,12 @@
the impacts of different hardware and instance load outs. See: the impacts of different hardware and instance load outs. See:
<link xlink:href="https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods">https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods</link> <link xlink:href="https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods">https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods</link>
</para> </para>
<section xml:id="expansion-planning-compute-focus"> <section xml:id="expansion-planning-compute-focus">
<title>Expansion planning</title> <title>Expansion planning</title>
<para>A key challenge for planning the expansion of cloud <para>A key challenge for planning the expansion of cloud
compute services is the elastic nature of cloud infrastructure compute services is the elastic nature of cloud infrastructure
demands. Previously, new users or customers had to demands.</para>
plan for and request the infrastructure they required ahead of
time, allowing time for reactive procurement processes. Cloud
computing users have come to expect the agility of having
instant access to new resources as required.
Consequently, plan for typical usage and for sudden bursts in
usage.</para>
<para>Planning for expansion is a balancing act. <para>Planning for expansion is a balancing act.
Planning too conservatively can lead to unexpected Planning too conservatively can lead to unexpected
oversubscription of the cloud and dissatisfied users. Planning oversubscription of the cloud and dissatisfied users. Planning
@@ -127,31 +108,11 @@
average speed or capacity of the cloud. Using this information average speed or capacity of the cloud. Using this information
to model capacity performance enables users to more to model capacity performance enables users to more
accurately determine the current and future capacity of the accurately determine the current and future capacity of the
cloud.</para></section> cloud.</para>
<section xml:id="cpu-and-ram-compute-focus"><title>CPU and RAM</title> </section>
<para>Adapted from:
<link xlink:href="http://docs.openstack.org/openstack-ops/content/compute_nodes.html#cpu_choice">http://docs.openstack.org/openstack-ops/content/compute_nodes.html#cpu_choice</link></para> <section xml:id="cpu-and-ram-compute-focus">
<para>In current generations, CPUs have up to 12 cores. If an <title>CPU and RAM</title>
Intel CPU supports Hyper-Threading, those 12 cores double
to 24 cores. A server that supports multiple CPUs multiplies
the number of available cores.
Hyper-Threading is Intel's proprietary simultaneous
multi-threading implementation, used to improve
parallelization on their CPUs. Consider enabling
Hyper-Threading to improve the performance of multithreaded
applications.</para>
<para>Whether the user should enable Hyper-Threading on a CPU
depends on the use case. For example, disabling
Hyper-Threading can be beneficial in intense computing
environments. Running performance tests using local
workloads with and without Hyper-Threading can help
determine which option is more appropriate in any particular
case.</para>
<para>If they must run the Libvirt or KVM hypervisor drivers,
then the compute node CPUs must support
virtualization by way of the VT-x extensions for Intel chips
and AMD-v extensions for AMD chips to provide full
performance.</para>
<para>OpenStack enables users to overcommit CPU and RAM on <para>OpenStack enables users to overcommit CPU and RAM on
compute nodes. This allows an increase in the number of compute nodes. This allows an increase in the number of
instances running on the cloud at the cost of reducing the instances running on the cloud at the cost of reducing the
@@ -176,13 +137,10 @@
long as the total amount of RAM associated with the instances long as the total amount of RAM associated with the instances
is less than 1.5 times the amount of RAM available on the is less than 1.5 times the amount of RAM available on the
physical node.</para> physical node.</para>
<para>For example, if a physical node has 48 GB of RAM, the
scheduler allocates instances to that node until the sum of
the RAM associated with the instances reaches 72 GB (such as
nine instances, in the case where each instance has 8 GB of
RAM).</para>
<para>You must select the appropriate CPU and RAM allocation ratio <para>You must select the appropriate CPU and RAM allocation ratio
based on particular use cases.</para></section> based on particular use cases.</para>
</section>
<section xml:id="additional-hardware-compute-focus"> <section xml:id="additional-hardware-compute-focus">
<title>Additional hardware</title> <title>Additional hardware</title>
<para>Certain use cases may benefit from exposure to additional <para>Certain use cases may benefit from exposure to additional
@@ -193,8 +151,6 @@
the availability of graphics processing units (GPUs) the availability of graphics processing units (GPUs)
for general-purpose computing.</para> for general-purpose computing.</para>
</listitem> </listitem>
</itemizedlist>
<itemizedlist>
<listitem> <listitem>
<para>Cryptographic routines that benefit from the <para>Cryptographic routines that benefit from the
availability of hardware random number generators to availability of hardware random number generators to
@@ -210,11 +166,14 @@
characteristics, which can include hardware similarities. The characteristics, which can include hardware similarities. The
addition of specialized hardware to a cloud deployment is addition of specialized hardware to a cloud deployment is
likely to add to the cost of each node, so consider carefully likely to add to the cost of each node, so consider carefully
consideration whether all compute nodes, or whether all compute nodes, or
just a subset targeted by flavors, need the just a subset targeted by flavors, need the
additional customization to support the desired additional customization to support the desired
workloads.</para></section> workloads.</para>
<section xml:id="utilization"><title>Utilization</title> </section>
<section xml:id="utilization">
<title>Utilization</title>
<para>Infrastructure-as-a-Service offerings, including OpenStack, <para>Infrastructure-as-a-Service offerings, including OpenStack,
use flavors to provide standardized views of virtual machine use flavors to provide standardized views of virtual machine
resource requirements that simplify the problem of scheduling resource requirements that simplify the problem of scheduling
@@ -253,107 +212,9 @@
different ratios of CPU versus RAM versus HDD different ratios of CPU versus RAM versus HDD
requirements.</para> requirements.</para>
<para>For more information on Flavors see: <para>For more information on Flavors see:
<link xlink:href="http://docs.openstack.org/openstack-ops/content/flavors.html">http://docs.openstack.org/openstack-ops/content/flavors.html</link></para> <link xlink:href="http://docs.openstack.org/openstack-ops/content/flavors.html">OpenStack Operations Guide: Flavors</link></para>
</section>
<section xml:id="performance-compute-focus"><title>Performance</title>
<para>So that workloads can consume as many resources
as are available, do not share cloud infrastructure. Ensure you accommodate
large scale workloads.</para>
<para>The duration of batch processing differs depending on
individual workloads. Time limits range from
seconds to hours, and as a result it is difficult to predict resource
use.</para>
</section>
<section xml:id="security-compute-focus"><title>Security</title>
<para>The security considerations for this scenario are
similar to those of the other scenarios in this guide.</para>
<para>A security domain comprises users, applications, servers,
and networks that share common trust requirements and
expectations within a system. Typically they have the same
authentication and authorization requirements and
users.</para>
<para>These security domains are:</para>
<orderedlist>
<listitem>
<para>Public</para>
</listitem>
<listitem>
<para>Guest</para>
</listitem>
<listitem>
<para>Management</para>
</listitem>
<listitem>
<para>Data</para>
</listitem>
</orderedlist>
<para>You can map these security domains individually to the
installation, or combine them. For example, some
deployment topologies combine both guest and data domains onto
one physical network, whereas in other cases these networks
are physically separate. In each case, the cloud operator
should be aware of the appropriate security concerns. Map out
security domains against specific OpenStack
deployment topology. The domains and their trust requirements
depend on whether the cloud instance is public, private, or
hybrid.</para>
<para>The public security domain is an untrusted area of
the cloud infrastructure. It can refer to the Internet as a
whole or simply to networks over which the user has no
authority. Always consider this domain untrusted.</para>
<para>Typically used for compute instance-to-instance traffic, the
guest security domain handles compute data generated by
instances on the cloud. It does not handle services that support the
operation of the cloud, for example API calls. Public cloud
providers and private cloud providers who do not have
stringent controls on instance use or who allow unrestricted
Internet access to instances should consider this an untrusted domain.
Private cloud providers may want to consider this
an internal network and therefore trusted only if they have
controls in place to assert that they trust instances and all
their tenants.</para>
<para>The management security domain is where services interact.
Sometimes referred to as the "control plane", the networks in
this domain transport confidential data such as configuration
parameters, user names, and passwords. In most deployments this
is a trusted domain.</para>
<para>The data security domain deals with
information pertaining to the storage services within
OpenStack. Much of the data that crosses this network has high
integrity and confidentiality requirements and depending on
the type of deployment there may also be strong availability
requirements. The trust level of this network is heavily
dependent on deployment decisions and as such we do not assign
this a default level of trust.</para>
<para>When deploying OpenStack in an enterprise as a private cloud, you can
generally assume it is behind a firewall and within the trusted
network alongside existing systems. Users of the cloud are
typically employees or trusted individuals that are bound by
the security requirements set forth by the company. This tends
to push most of the security domains towards a more trusted
model. However, when deploying OpenStack in a public-facing
role, you cannot make these assumptions and the number of attack vectors
significantly increases. For example, the API endpoints and the
software behind it become vulnerable to hostile
entities attempting to gain unauthorized access or prevent access
to services. This can result in loss of reputation and you must
protect against it through auditing and appropriate
filtering.</para>
<para>Take care when managing the users of the
system, whether in public or private
clouds. The identity service enables LDAP to be part of the
authentication process, and includes such systems as an
OpenStack deployment that may ease user management if
integrated into existing systems.</para>
<para>We recommend placing API services behind hardware that
performs SSL termination. API services
transmit user names, passwords, and generated tokens between
client machines and API endpoints and therefore must be
secure.</para>
<para>For more information on OpenStack Security, see
<link xlink:href="http://docs.openstack.org/security-guide/">http://docs.openstack.org/security-guide/</link>
</para>
</section> </section>
<section xml:id="openstack-components-compute-focus"> <section xml:id="openstack-components-compute-focus">
<title>OpenStack components</title> <title>OpenStack components</title>
<para>Due to the nature of the workloads in this <para>Due to the nature of the workloads in this
@@ -376,47 +237,40 @@
<listitem> <listitem>
<para><glossterm>Orchestration</glossterm> module <para><glossterm>Orchestration</glossterm> module
(<glossterm>heat</glossterm>)</para> (<glossterm>heat</glossterm>)</para>
<para>Given the nature of the
applications involved in this scenario, these are heavily
automated deployments. Making use of Orchestration is highly
beneficial in this case. You can script the deployment of a
batch of instances and the running of tests, but it
makes sense to use the Orchestration module
to handle all these actions.</para>
</listitem> </listitem>
</itemizedlist>
<para>It is safe to assume that, given the nature of the
applications involved in this scenario, these are heavily
automated deployments. Making use of Orchestration is highly
beneficial in this case. You can script the deployment of a
batch of instances and the running of tests, but it
makes sense to use the Orchestration module
to handle all these actions.</para>
<itemizedlist>
<listitem> <listitem>
<para>Telemetry module (ceilometer)</para> <para>Telemetry module (ceilometer)</para>
<para>Telemetry and the alarms it generates support autoscaling
of instances using Orchestration. Users that are not using the
Orchestration module do not need to deploy the Telemetry module and
may choose to use external solutions to fulfill their
metering and monitoring requirements.</para>
</listitem> </listitem>
</itemizedlist>
<para>Telemetry and the alarms it generates support autoscaling
of instances using Orchestration. Users that are not using the
Orchestration module do not need to deploy the Telemetry module and
may choose to use external solutions to fulfill their
metering and monitoring requirements.</para>
<para>See also:
<link xlink:href="http://docs.openstack.org/openstack-ops/content/logging_monitoring.html">http://docs.openstack.org/openstack-ops/content/logging_monitoring.html</link></para>
<itemizedlist>
<listitem> <listitem>
<para>OpenStack Block Storage (cinder)</para> <para>OpenStack Block Storage (cinder)</para>
<para>Due to the burst-able nature of the workloads and the
applications and instances that perform batch
processing, this cloud mainly uses memory or CPU, so
the need for add-on storage to each instance is not a likely
requirement. This does not mean that you do not use
OpenStack Block Storage (cinder) in the infrastructure, but
typically it is not a central component.</para>
</listitem> </listitem>
</itemizedlist>
<para>Due to the burst-able nature of the workloads and the
applications and instances that perform batch
processing, this cloud mainly uses memory or CPU, so
the need for add-on storage to each instance is not a likely
requirement. This does not mean that you do not use
OpenStack Block Storage (cinder) in the infrastructure, but
typically it is not a central component.</para>
<itemizedlist>
<listitem> <listitem>
<para>Networking</para> <para>Networking</para>
<para>When choosing a networking platform, ensure that it either
works with all desired hypervisor and container technologies
and their OpenStack drivers, or that it includes an implementation of
an ML2 mechanism driver. You can mix networking platforms
that provide ML2 mechanisms drivers.</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<para>When choosing a networking platform, ensure that it either </section>
works with all desired hypervisor and container technologies
and their OpenStack drivers, or that it includes an implementation of
an ML2 mechanism driver. You can mix networking platforms
that provide ML2 mechanisms drivers.</para></section>
</section> </section>
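The capacity-planning arithmetic in the technical-considerations file uses 1600 instances of 2 vCPU and 50 GB each, a 16:1 CPU ratio, and a RAM allocation ratio of 1.5. It can be checked with a short Python sketch. The function names are hypothetical helpers for illustration, not part of OpenStack or nova. The default ratios simply mirror the figures used in the text and are assumptions, not recommendations.

```python
"""Sketch of the capacity-planning ratios from the technical-considerations
file. All function names are hypothetical and for illustration only."""

import math


def physical_cores_needed(instances: int, vcpus_per_instance: int,
                          cpu_allocation_ratio: float = 16.0) -> int:
    # Rearranges: instances = (ratio * physical_cores) / vcpus_per_instance
    # Round up, because a fractional core cannot be purchased.
    return math.ceil(instances * vcpus_per_instance / cpu_allocation_ratio)


def storage_needed_gb(instances: int, flavor_disk_gb: int) -> int:
    # Required storage = flavor disk size x number of instances
    return flavor_disk_gb * instances


def ram_schedulable_gb(physical_ram_gb: float,
                       ram_allocation_ratio: float = 1.5) -> float:
    # The scheduler keeps placing instances on a node while the total RAM
    # of those instances stays within ratio x physical RAM.
    return physical_ram_gb * ram_allocation_ratio


def instances_that_fit(physical_ram_gb: float, instance_ram_gb: float,
                       ram_allocation_ratio: float = 1.5) -> int:
    # Whole instances of one size that fit under the overcommitted RAM limit.
    limit = ram_schedulable_gb(physical_ram_gb, ram_allocation_ratio)
    return int(limit // instance_ram_gb)


if __name__ == "__main__":
    print(physical_cores_needed(1600, 2))  # 200 physical cores
    print(storage_needed_gb(1600, 50))     # 80000 GB (80 TB)
    print(ram_schedulable_gb(48))          # 72.0 GB on a 48 GB node
    print(instances_that_fit(48, 8))       # 9 instances of 8 GB
```

Keeping the ratios as parameters makes it easy to rerun the estimate for a different overcommit policy, since the text notes that the appropriate CPU and RAM allocation ratios depend on the particular use case.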