Remove passive voice from Chap 8 Arch Guide
Change-Id: I7b6698e24c7fcc25c78853980c5be1068e7ee002
Closes-Bug: #1431137
parent f1eb453741
commit a444c8510a
@ -6,33 +6,31 @@
|
|||||||
xml:id="massively_scalable">
|
xml:id="massively_scalable">
|
||||||
<title>Massively scalable</title>
|
<title>Massively scalable</title>
|
||||||
|
|
||||||
<para>A massively scalable architecture is defined as a cloud
|
<para>A massively scalable architecture is a cloud
|
||||||
implementation that is either a very large deployment, such as
|
implementation that is either a very large deployment, such as
|
||||||
one that would be built by a commercial service provider, or
|
a commercial service provider might build, or
|
||||||
one that has the capability to support user requests for large
|
one that has the capability to support user requests for large
|
||||||
amounts of cloud resources. An example would be an
|
amounts of cloud resources. An example is an
|
||||||
infrastructure in which requests to service 500 instances or
|
infrastructure in which requests to service 500 or more instances
|
||||||
more at a time is not uncommon. In a massively scalable
|
at a time is common. A massively scalable infrastructure
|
||||||
infrastructure, such a request is fulfilled without completely
|
fulfills such a request without exhausting the available
|
||||||
consuming all of the available cloud infrastructure resources.
|
cloud infrastructure resources. While the high capital cost
|
||||||
While the high capital cost of implementing such a cloud
|
of implementing such a cloud architecture means that it is
|
||||||
architecture makes it cost prohibitive and is only spearheaded
|
currently in limited use, many organizations are planning
|
||||||
by few organizations, many organizations are planning for
|
for massive scalability in the future.</para>
|
||||||
massive scalability moving toward the future.</para>
|
|
||||||
<para>A massively scalable OpenStack cloud design presents a
|
<para>A massively scalable OpenStack cloud design presents a
|
||||||
unique set of challenges and considerations. For the most part
|
unique set of challenges and considerations. For the most part
|
||||||
it is similar to a general purpose cloud architecture, as it
|
it is similar to a general purpose cloud architecture, as it
|
||||||
is built to address a non-specific range of potential use
|
is built to address a non-specific range of potential use
|
||||||
cases or functions. Typically, it is rare that massively
|
cases or functions. Typically, it is rare that particular
|
||||||
scalable clouds are designed or specialized for particular
|
workloads determine the design or configuration of massively
|
||||||
workloads. Like the general purpose cloud, the massively
|
scalable clouds. Like the general purpose cloud, the massively
|
||||||
scalable cloud is most often built as a platform for a variety
|
scalable cloud is most often built as a platform for a variety
|
||||||
of workloads. Massively scalable OpenStack clouds are
|
of workloads. Because private organizations rarely require
|
||||||
generally built as commercial public cloud offerings since
|
or have the resources for them, massively scalable OpenStack clouds
|
||||||
single private organizations rarely have the resources or need
|
are generally built as commercial, public cloud offerings.</para>
|
||||||
for this scale.</para>
|
|
||||||
<para>Services provided by a massively scalable OpenStack cloud
|
<para>Services provided by a massively scalable OpenStack cloud
|
||||||
will include:</para>
|
include:</para>
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>Virtual-machine disk image library</para>
|
<para>Virtual-machine disk image library</para>
|
||||||
@ -64,12 +62,12 @@
|
|||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
<para>Like a general purpose cloud, the instances deployed in a
|
<para>Like a general purpose cloud, the instances deployed in a
|
||||||
massively scalable OpenStack cloud will not necessarily use
|
massively scalable OpenStack cloud do not necessarily use
|
||||||
any specific aspect of the cloud offering (compute, network,
|
any specific aspect of the cloud offering (compute, network,
|
||||||
or storage). As the cloud grows in scale, the scale of the
|
or storage). As the cloud grows in scale, the number of
|
||||||
number of workloads can cause stress on all of the cloud
|
workloads can cause stress on all the cloud
|
||||||
components. Additional stresses are introduced to supporting
|
components. This adds further stresses to supporting
|
||||||
infrastructure including databases and message brokers. The
|
infrastructure such as databases and message brokers. The
|
||||||
architecture design for such a cloud must account for these
|
architecture design for such a cloud must account for these
|
||||||
performance pressures without negatively impacting user
|
performance pressures without negatively impacting user
|
||||||
experience.</para>
|
experience.</para>
|
||||||
|
@ -6,35 +6,35 @@
|
|||||||
xml:id="operational-considerations-massive-scale">
|
xml:id="operational-considerations-massive-scale">
|
||||||
<?dbhtml stop-chunking?>
|
<?dbhtml stop-chunking?>
|
||||||
<title>Operational considerations</title>
|
<title>Operational considerations</title>
|
||||||
<para>In order to run at massive scale, it is important to plan on
|
<para>In order to run efficiently at massive scale, automate
|
||||||
the automation of as many of the operational processes as
|
as many of the operational processes as
|
||||||
possible. Automation includes the configuration of
|
possible. Automation includes the configuration of
|
||||||
provisioning, monitoring and alerting systems. Part of the
|
provisioning, monitoring and alerting systems. Part of the
|
||||||
automation process includes the capability to determine when
|
automation process includes the capability to determine when
|
||||||
human intervention is required and who should act. The
|
human intervention is required and who should act. The
|
||||||
objective is to increase the ratio of operational staff to
|
objective is to increase the ratio of operational staff to
|
||||||
running systems as much as possible to reduce maintenance
|
running systems as much as possible in order to reduce maintenance
|
||||||
costs. In a massively scaled environment, it is impossible for
|
costs. In a massively scaled environment, it is impossible for
|
||||||
staff to give each system individual care.</para>
|
staff to give each system individual care.</para>
|
||||||
<para>Configuration management tools such as Puppet or Chef allow
|
<para>Configuration management tools such as Puppet and Chef enable
|
||||||
operations staff to categorize systems into groups based on
|
operations staff to categorize systems into groups based on
|
||||||
their role and thus create configurations and system states
|
their roles and thus create configurations and system states
|
||||||
that are enforced through the provisioning system. Systems
|
that the provisioning system enforces. Systems
|
||||||
that fall out of the defined state due to errors or failures
|
that fall out of the defined state due to errors or failures
|
||||||
are quickly removed from the pool of active nodes and
|
are quickly removed from the pool of active nodes and
|
||||||
replaced.</para>
|
replaced.</para>
|
||||||
<para>At large scale the resource cost of diagnosing individual
|
<para>At large scale the resource cost of diagnosing failed individual
|
||||||
systems that have failed is far greater than the cost of
|
systems is far greater than the cost of
|
||||||
replacement. It is more economical to immediately replace the
|
replacement. It is more economical to replace the failed
|
||||||
system with a new system that can be provisioned and
|
system with a new system, provisioning and configuring it
|
||||||
configured automatically and quickly brought back into the
|
automatically, then quickly adding it to the
|
||||||
pool of active nodes. By automating tasks that are labor-intensive,
|
pool of active nodes. By automating tasks that are labor-intensive,
|
||||||
repetitive, and critical to operations with
|
repetitive, and critical to operations, cloud operations
|
||||||
automation, cloud operations teams are able to be managed more
|
teams can work more
|
||||||
efficiently because fewer resources are needed for these
|
efficiently because fewer resources are required for these
|
||||||
babysitting tasks. Administrators are then free to tackle
|
common tasks. Administrators are then free to tackle
|
||||||
tasks that cannot be easily automated and have longer-term
|
tasks that are not easy to automate and that have longer-term
|
||||||
impacts on the business such as capacity planning.</para>
|
impacts on the business, for example capacity planning.</para>
|
||||||
<section xml:id="the-bleeding-edge">
|
<section xml:id="the-bleeding-edge">
|
||||||
<title>The bleeding edge</title>
|
<title>The bleeding edge</title>
|
||||||
<para>Running OpenStack at massive scale requires striking a
|
<para>Running OpenStack at massive scale requires striking a
|
||||||
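Review note: the revised paragraphs in this hunk describe systems that fall out of their defined state being removed from the pool of active nodes and replaced rather than diagnosed. A minimal Python sketch of that replace-instead-of-repair loop follows. The `Node` and `Pool` classes, the `desired` mapping, and the naming scheme are all hypothetical and exist only for illustration; they are not part of Puppet, Chef, or any OpenStack API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    name: str
    role: str
    state: dict  # observed configuration of this node


@dataclass
class Pool:
    # Hypothetical mapping of role -> configuration the provisioning system enforces.
    desired: dict
    active: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def provision(self, role):
        """Create a node that starts out in the desired state for its role."""
        node = Node(f"{role}-{next(self._ids)}", role, dict(self.desired[role]))
        self.active.append(node)
        return node

    def reconcile(self):
        """Replace every drifted node; return (old_name, new_name) pairs."""
        replaced = []
        # Iterate over a copy because the pool is modified inside the loop.
        for node in list(self.active):
            if node.state != self.desired[node.role]:
                self.active.remove(node)             # drop the failed node
                fresh = self.provision(node.role)    # build its replacement
                replaced.append((node.name, fresh.name))
        return replaced


pool = Pool(desired={"compute": {"ntp": "enabled"}})
pool.provision("compute")
drifted = pool.provision("compute")
drifted.state["ntp"] = "disabled"  # simulate a node falling out of state
print(pool.reconcile())
```

The sketch never inspects why a node drifted, which mirrors the text's point that diagnosis costs more than replacement at this scale.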
@ -42,49 +42,48 @@
|
|||||||
be tempting to run an older stable release branch of OpenStack
|
be tempting to run an older stable release branch of OpenStack
|
||||||
to make deployments easier. However, when running at massive
|
to make deployments easier. However, when running at massive
|
||||||
scale, known issues that may be of some concern or only have
|
scale, known issues that may be of some concern or only have
|
||||||
minimal impact in smaller deployments could become pain points
|
minimal impact in smaller deployments could become pain points.
|
||||||
at massive scale. If the issue is well known, in many cases,
|
Recent releases may address well known issues. The OpenStack
|
||||||
it may be resolved in more recent releases. The OpenStack
|
community can help resolve reported issues by applying
|
||||||
community can help resolve any issues reported by applying
|
|
||||||
the collective expertise of the OpenStack developers.</para>
|
the collective expertise of the OpenStack developers.</para>
|
||||||
<para>When issues crop up, the number of organizations running at
|
<para>The number of organizations running at
|
||||||
a similar scale is a relatively tiny proportion of the
|
massive scales is a small proportion of the
|
||||||
OpenStack community, therefore it is important to share these
|
OpenStack community, therefore it is important to share
|
||||||
issues with the community and be a vocal advocate for
|
related issues with the community and be a vocal advocate for
|
||||||
resolving them. Some issues only manifest when operating at
|
resolving them. Some issues only manifest when operating at
|
||||||
large scale and the number of organizations able to duplicate
|
large scale, and the number of organizations able to duplicate
|
||||||
and validate an issue is small, so it will be important to
|
and validate an issue is small, so it is important to
|
||||||
document and dedicate resources to their resolution.</para>
|
document and dedicate resources to their resolution.</para>
|
||||||
<para>In some cases, the resolution to the problem is ultimately
|
<para>In some cases, the resolution to the problem is ultimately
|
||||||
to deploy a more recent version of OpenStack. Alternatively,
|
to deploy a more recent version of OpenStack. Alternatively,
|
||||||
when the issue needs to be resolved in a production
|
when you must resolve an issue in a production
|
||||||
environment where rebuilding the entire environment is not an
|
environment where rebuilding the entire environment is not an
|
||||||
option, it is possible to deploy just the more recent separate
|
option, it is sometimes possible to deploy updates to specific
|
||||||
underlying components required to resolve issues or gain
|
underlying components in order to resolve issues or gain
|
||||||
significant performance improvements. At first glance, this
|
significant performance improvements. Although this may appear
|
||||||
could be perceived as potentially exposing the deployment to
|
to expose the deployment to
|
||||||
increased risk and instability. However, in many cases it
|
increased risk and instability, in many cases it
|
||||||
could be an issue that has not been discovered yet.</para>
|
could be an undiscovered issue.</para>
|
||||||
<para>It is advisable to cultivate a development and operations
|
<para>We recommend building a development and operations
|
||||||
organization that is responsible for creating desired
|
organization that is responsible for creating desired
|
||||||
features, diagnose and resolve issues, and also build the
|
features, diagnosing and resolving issues, and building the
|
||||||
infrastructure for large scale continuous integration tests
|
infrastructure for large scale continuous integration tests
|
||||||
and continuous deployment. This helps catch bugs early and
|
and continuous deployment. This helps catch bugs early and
|
||||||
make deployments quicker and less painful. In addition to
|
makes deployments faster and easier. In addition to
|
||||||
development resources, the recruitment of experts in the
|
development resources, we also recommend the recruitment
|
||||||
fields of message queues, databases, distributed systems, and
|
of experts in the fields of message queues, databases, distributed
|
||||||
networking, cloud and storage is also advisable.</para></section>
|
systems, networking, cloud, and storage.</para></section>
|
||||||
<section xml:id="growth-and-capacity-planning">
|
<section xml:id="growth-and-capacity-planning">
|
||||||
<title>Growth and capacity planning</title>
|
<title>Growth and capacity planning</title>
|
||||||
<para>An important consideration in running at massive scale is
|
<para>An important consideration in running at massive scale is
|
||||||
projecting growth and utilization trends to plan capital
|
projecting growth and utilization trends in order to plan capital
|
||||||
expenditures for the near and long term. Utilization metrics
|
expenditures for the short and long term. Gather utilization
|
||||||
for compute, network, and storage as well as a historical
|
metrics for compute, network, and storage, along with historical
|
||||||
record of these metrics are required. While securing major
|
records of these metrics. While securing major
|
||||||
anchor tenants can lead to rapid jumps in the utilization
|
anchor tenants can lead to rapid jumps in the utilization
|
||||||
rates of all resources, the steady adoption of the cloud
|
rates of all resources, the steady adoption of the cloud
|
||||||
inside an organizations or by public consumers in a public
|
inside an organization or by consumers in a public
|
||||||
offering will also create a steady trend of increased
|
offering also creates a steady trend of increased
|
||||||
utilization.</para></section>
|
utilization.</para></section>
|
||||||
<section xml:id="skills-and-training">
|
<section xml:id="skills-and-training">
|
||||||
<title>Skills and training</title>
|
<title>Skills and training</title>
|
||||||
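Review note: the "Growth and capacity planning" text in this hunk asks for utilization metrics plus their historical records so that growth can be projected. One simple way to use such a record is an ordinary least-squares line through evenly spaced samples, as sketched below in Python. The function names and the sample figures are assumptions made for illustration, and a straight line captures only the steady adoption trend, not the sudden jumps that the text attributes to anchor tenants.

```python
def linear_trend(samples):
    """Fit y = slope * x + intercept to samples taken at x = 0, 1, 2, ..."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def periods_until_full(samples, capacity):
    """Estimate how many more periods pass before the trend reaches capacity.

    Returns None when utilization is flat or shrinking.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    latest = len(samples) - 1
    x_full = (capacity - intercept) / slope
    return max(0.0, x_full - latest)


# Hypothetical monthly vCPU usage against a hypothetical 200-vCPU ceiling.
usage = [100, 110, 120, 130]
print(periods_until_full(usage, capacity=200))
```

The same function can be run separately on compute, network, and storage records, since each resource may hit its ceiling at a different time.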
@ -95,8 +94,8 @@
|
|||||||
members to OpenStack conferences, meetup events, and
|
members to OpenStack conferences, meetup events, and
|
||||||
encouraging active participation in the mailing lists and
|
encouraging active participation in the mailing lists and
|
||||||
committees is a very important way to maintain skills and
|
committees is a very important way to maintain skills and
|
||||||
forge relationships in the community. A list of OpenStack
|
forge relationships in the community. For a list of OpenStack
|
||||||
training providers in the marketplace can be found here: <link
|
training providers in the marketplace, see: <link
|
||||||
xlink:href="http://www.openstack.org/marketplace/training/">http://www.openstack.org/marketplace/training/</link>.
|
xlink:href="http://www.openstack.org/marketplace/training/">http://www.openstack.org/marketplace/training/</link>.
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
|
@ -10,119 +10,114 @@
|
|||||||
xml:id="technical-considerations-massive-scale">
|
xml:id="technical-considerations-massive-scale">
|
||||||
<?dbhtml stop-chunking?>
|
<?dbhtml stop-chunking?>
|
||||||
<title>Technical considerations</title>
|
<title>Technical considerations</title>
|
||||||
<para>Converting an existing OpenStack environment that was
|
<para>Repurposing an existing OpenStack environment to be
|
||||||
designed for a different purpose to be massively scalable is a
|
massively scalable is a formidable task. When building
|
||||||
formidable task. When building a massively scalable
|
a massively scalable environment from the ground up, ensure
|
||||||
environment from the ground up, make sure the initial
|
you build the initial deployment with the same principles
|
||||||
deployment is built with the same principles and choices that
|
and choices that apply as the environment grows. For example,
|
||||||
apply as the environment grows. For example, a good approach
|
a good approach is to deploy the first site as a multi-site
|
||||||
is to deploy the first site as a multi-site environment. This
|
environment. This enables you to use the same deployment
|
||||||
allows the same deployment and segregation methods to be used
|
and segregation methods as the environment grows to separate
|
||||||
as the environment grows to separate locations across
|
locations across dedicated links or wide area networks. In
|
||||||
dedicated links or wide area networks. In a hyperscale cloud,
|
a hyperscale cloud, scale trumps redundancy. Modify applications
|
||||||
scale trumps redundancy. Applications must be modified with
|
with this in mind, relying on the scale and homogeneity of the
|
||||||
this in mind, relying on the scale and homogeneity of the
|
|
||||||
environment to provide reliability rather than redundant
|
environment to provide reliability rather than redundant
|
||||||
infrastructure provided by non-commodity hardware
|
infrastructure provided by non-commodity hardware
|
||||||
solutions.</para>
|
solutions.</para>
|
||||||
<section xml:id="infrastructure-segregation-massive-scale">
|
<section xml:id="infrastructure-segregation-massive-scale">
|
||||||
<title>Infrastructure segregation</title>
|
<title>Infrastructure segregation</title>
|
||||||
<para>Fortunately, OpenStack services are designed to support
|
<para>OpenStack services support massive horizontal scale.
|
||||||
massive horizontal scale. Be aware that this is not the case
|
Be aware that this is not the case for the entire supporting
|
||||||
for the entire supporting infrastructure. This is particularly
|
infrastructure. This is particularly a problem for the database
|
||||||
a problem for the database management systems and message
|
management systems and message queues that OpenStack services
|
||||||
queues used by the various OpenStack services for data storage
|
use for data storage and remote procedure call communications.</para>
|
||||||
and remote procedure call communications.</para>
|
<para>Traditional clustering techniques typically
|
||||||
<para>Traditional clustering techniques are typically used to
|
|
||||||
provide high availability and some additional scale for these
|
provide high availability and some additional scale for these
|
||||||
environments. In the quest for massive scale, however,
|
environments. In the quest for massive scale, however, you must
|
||||||
additional steps need to be taken to relieve the performance
|
take additional steps to relieve the performance
|
||||||
pressure on these components to prevent them from negatively
|
pressure on these components in order to prevent them from negatively
|
||||||
impacting the overall performance of the environment. It is
|
impacting the overall performance of the environment. Ensure
|
||||||
important to make sure that all the components are in balance
|
that all the components are in balance so that if the massively
|
||||||
so that, if and when the massively scalable environment fails,
|
scalable environment fails, all the components are near maximum
|
||||||
all the components are at, or close to, maximum
|
|
||||||
capacity.</para>
|
capacity.</para>
|
||||||
<para>Regions are used to segregate completely independent
|
<para>Regions segregate completely independent
|
||||||
installations linked only by an Identity and Dashboard
|
installations linked only by an Identity and Dashboard
|
||||||
(optional) installation. Services are installed with separate
|
(optional) installation. Services have separate
|
||||||
API endpoints for each region, complete with separate database
|
API endpoints for each region, and include separate database
|
||||||
and queue installations. This exposes some awareness of the
|
and queue installations. This exposes some awareness of the
|
||||||
environment's fault domains to users and gives them the
|
environment's fault domains to users and gives them the
|
||||||
ability to ensure some degree of application resiliency while
|
ability to ensure some degree of application resiliency while
|
||||||
also imposing the requirement to specify which region their
|
also imposing the requirement to specify which region to apply
|
||||||
actions must be applied to.</para>
|
their actions to.</para>
|
||||||
<para>Environments operating at massive scale typically need their
|
<para>Environments operating at massive scale typically need their
|
||||||
regions or sites subdivided further without exposing the
|
regions or sites subdivided further without exposing the
|
||||||
requirement to specify the failure domain to the user. This
|
requirement to specify the failure domain to the user. This
|
||||||
provides the ability to further divide the installation into
|
provides the ability to further divide the installation into
|
||||||
failure domains while also providing a logical unit for
|
failure domains while also providing a logical unit for
|
||||||
maintenance and the addition of new hardware. At hyperscale,
|
maintenance and the addition of new hardware. At hyperscale,
|
||||||
instead of adding single compute nodes, administrators may add
|
instead of adding single compute nodes, administrators can add
|
||||||
entire racks or even groups of racks at a time with each new
|
entire racks or even groups of racks at a time with each new
|
||||||
addition of nodes exposed via one of the segregation concepts
|
addition of nodes exposed via one of the segregation concepts
|
||||||
mentioned herein.</para>
|
mentioned herein.</para>
|
||||||
<para><glossterm baseform="cell">Cells</glossterm> provide the ability
|
<para><glossterm baseform="cell">Cells</glossterm> provide the ability
|
||||||
to subdivide the compute portion
|
to subdivide the compute portion
|
||||||
of an OpenStack installation, including regions, while still
|
of an OpenStack installation, including regions, while still
|
||||||
exposing a single endpoint. In each region an API cell is
|
exposing a single endpoint. Each region has an API cell
|
||||||
created along with a number of compute cells where the
|
along with a number of compute cells where the
|
||||||
workloads actually run. Each cell gets its own database and
|
workloads actually run. Each cell has its own database and
|
||||||
message queue setup (ideally clustered), providing the ability
|
message queue setup (ideally clustered), providing the ability
|
||||||
to subdivide the load on these subsystems, improving overall
|
to subdivide the load on these subsystems, improving overall
|
||||||
performance.</para>
|
performance.</para>
|
||||||
<para>Within each compute cell a complete compute installation is
|
<para>Each compute cell provides a complete compute installation,
|
||||||
provided, complete with full database and queue installations,
|
complete with full database and queue installations,
|
||||||
scheduler, conductor, and multiple compute hosts. The cells
|
scheduler, conductor, and multiple compute hosts. The cells
|
||||||
scheduler handles placement of user requests from the single
|
scheduler handles placement of user requests from the single
|
||||||
API endpoint to a specific cell from those available. The
|
API endpoint to a specific cell from those available. The
|
||||||
normal filter scheduler then handles placement within the
|
normal filter scheduler then handles placement within the
|
||||||
cell.</para>
|
cell.</para>
|
||||||
<para>The downside of using cells is that they are not well
|
<para>Unfortunately, Compute is the only OpenStack service that
|
||||||
supported by any of the OpenStack services other than Compute.
|
provides good support for cells. In addition, cells
|
||||||
Also, they do not adequately support some relatively standard
|
do not adequately support some standard
|
||||||
OpenStack functionality such as security groups and host
|
OpenStack functionality such as security groups and host
|
||||||
aggregates. Due to their relative newness and specialized use,
|
aggregates. Due to their relative newness and specialized use,
|
||||||
they receive relatively little testing in the OpenStack gate.
|
cells receive relatively little testing in the OpenStack gate.
|
||||||
Despite these issues, however, cells are used in some very
|
Despite these issues, cells play an important role in
|
||||||
well known OpenStack installations operating at massive scale
|
well known OpenStack installations operating at massive scale,
|
||||||
including those at CERN and Rackspace.</para></section>
|
such as those at CERN and Rackspace.</para></section>
|
||||||
<section xml:id="host-aggregates">
|
<section xml:id="host-aggregates">
|
||||||
<title>Host aggregates</title>
|
<title>Host aggregates</title>
|
||||||
<para>Host aggregates enable partitioning of OpenStack Compute
|
<para>Host aggregates enable partitioning of OpenStack Compute
|
||||||
deployments into logical groups for load balancing and
|
deployments into logical groups for load balancing and
|
||||||
instance distribution. Host aggregates may also be used to
|
instance distribution. You can also use host aggregates to
|
||||||
further partition an availability zone. Consider a cloud which
|
further partition an availability zone. Consider a cloud which
|
||||||
might use host aggregates to partition an availability zone
|
might use host aggregates to partition an availability zone
|
||||||
into groups of hosts that either share common resources, such
|
into groups of hosts that either share common resources, such
|
||||||
as storage and network, or have a special property, such as
|
as storage and network, or have a special property, such as
|
||||||
trusted computing hardware. Host aggregates are not explicitly
|
trusted computing hardware. You cannot target host aggregates
|
||||||
user-targetable; instead they are implicitly targeted via the
|
explicitly. Instead, select instance flavors that map to host
|
||||||
selection of instance flavors with extra specifications that
|
aggregate metadata. These flavors target host aggregates
|
||||||
map to host aggregate metadata.</para></section>
|
implicitly.</para></section>
|
||||||
<section xml:id="availability-zones">
|
<section xml:id="availability-zones">
|
||||||
<title>Availability zones</title>
|
<title>Availability zones</title>
|
||||||
<para>Availability zones provide another mechanism for subdividing
|
<para>Availability zones provide another mechanism for subdividing
|
||||||
an installation or region. They are, in effect, host
|
an installation or region. They are, in effect, host
|
||||||
aggregates that are exposed for (optional) explicit targeting
|
aggregates exposed for (optional) explicit targeting
|
||||||
by users.</para>
|
by users.</para>
|
||||||
<para>Unlike cells, they do not have their own database server or
|
<para>Unlike cells, availability zones do not have their own database
|
||||||
queue broker but simply represent an arbitrary grouping of
|
server or queue broker but represent an arbitrary grouping of
|
||||||
compute nodes. Typically, grouping of nodes into availability
|
compute nodes. Typically, nodes are grouped into availability
|
||||||
zones is based on a shared failure domain based on a physical
|
zones using a shared failure domain based on a physical
|
||||||
characteristic such as a shared power source, physical network
|
characteristic such as a shared power source or physical network
|
||||||
connection, and so on. Availability zones are exposed to the
|
connections. Users can target exposed availability zones; however,
|
||||||
user because they can be targeted; however, users are not
|
this is not a requirement. An alternative approach is to set a default
|
||||||
required to target them. An alternate approach is for the
|
availability zone to schedule instances to a non-default availability
|
||||||
operator to set a default availability zone to schedule
|
zone of nova.</para></section>
|
||||||
instances to other than the default availability zone of
|
|
||||||
nova.</para></section>
|
|
||||||
<section xml:id="segregation-example">
|
<section xml:id="segregation-example">
|
||||||
<title>Segregation example</title>
|
<title>Segregation example</title>
|
||||||
<para>In this example the cloud is divided into two regions, one
|
<para>In this example the cloud is divided into two regions, one
|
||||||
for each site, with two availability zones in each based on
|
for each site, with two availability zones in each based on
|
||||||
the power layout of the data centers. A number of host
|
the power layout of the data centers. A number of host
|
||||||
aggregates have also been defined to allow targeting of
|
aggregates enable targeting of
|
||||||
virtual machine instances using flavors, that require special
|
virtual machine instances using flavors, that require special
|
||||||
capabilities shared by the target hosts such as SSDs, 10 GbE
|
capabilities shared by the target hosts such as SSDs, 10 GbE
|
||||||
networks, or GPU cards.</para>
|
networks, or GPU cards.</para>
|
||||||
|
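Review note: the "Host aggregates" and "Segregation example" text in this hunk says that users reach host aggregates implicitly, by choosing flavors whose extra specifications map to host aggregate metadata. The Python sketch below models that matching rule in a deliberately simplified form. It is not Nova's scheduler code; the dictionary layout, the function name, and the `ssd` and `gpu` keys are assumptions chosen to echo the SSD and GPU examples in the text.

```python
def hosts_for_flavor(extra_specs, aggregates):
    """Return hosts in aggregates whose metadata satisfies every extra spec.

    A flavor with no extra specs matches every aggregate in this model.
    """
    matched = set()
    for aggregate in aggregates:
        metadata = aggregate["metadata"]
        if all(metadata.get(key) == value for key, value in extra_specs.items()):
            matched.update(aggregate["hosts"])
    return sorted(matched)


# Hypothetical aggregates inside one availability zone.
aggregates = [
    {"name": "ssd-hosts", "metadata": {"ssd": "true"}, "hosts": ["c1", "c2"]},
    {"name": "gpu-hosts", "metadata": {"gpu": "true"}, "hosts": ["c3"]},
    {"name": "ssd-gpu", "metadata": {"ssd": "true", "gpu": "true"}, "hosts": ["c4"]},
]

# A hypothetical flavor that asks only for SSD-backed hosts.
print(hosts_for_flavor({"ssd": "true"}, aggregates))
```

The user only ever names the flavor, so the operator can regroup hosts into different aggregates without changing what users request.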
@ -56,48 +56,47 @@
|
|||||||
<listitem>
|
<listitem>
|
||||||
<para>The cloud user expects repeatable, dependable, and
|
<para>The cloud user expects repeatable, dependable, and
|
||||||
deterministic processes for launching and deploying
|
deterministic processes for launching and deploying
|
||||||
cloud resources. This could be delivered through a
|
cloud resources. You could deliver this through a
|
||||||
web-based interface or publicly available API
|
web-based interface or publicly available API
|
||||||
endpoints. All appropriate options for requesting
|
endpoints. All appropriate options for requesting
|
||||||
cloud resources need to be available through some type
|
cloud resources must be available through some type
|
||||||
of user interface, a command-line interface (CLI), or
|
of user interface, a command-line interface (CLI), or
|
||||||
API endpoints.</para>
|
API endpoints.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>Cloud users expect a fully self-service and
|
<para>Cloud users expect a fully self-service and
|
||||||
on-demand consumption model. When an OpenStack cloud
|
on-demand consumption model. When an OpenStack cloud
|
||||||
reaches the "massively scalable" size, it means it is
|
reaches the "massively scalable" size, expect
|
||||||
expected to be consumed "as a service" in each and
|
consumption "as a service" in each and
|
||||||
every way.</para>
|
every way.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>For a user of a massively scalable OpenStack public
|
<para>For a user of a massively scalable OpenStack public
|
||||||
cloud, there will be no expectations for control over
|
cloud, there are no expectations for control over
|
||||||
security, performance, or availability. Only SLAs
|
security, performance, or availability. Users expect
|
||||||
related to uptime of API services are expected, and
|
only SLAs related to uptime of API services, and
|
||||||
very basic SLAs expected of services offered. The user
|
very basic SLAs for services offered. It is the user's
|
||||||
understands it is his or her responsibility to address
|
responsibility to address
|
||||||
these issues on their own. The exception to this
|
these issues on their own. The exception to this
|
||||||
expectation is the rare case of a massively scalable
|
expectation is the rare case of a massively scalable
|
||||||
cloud infrastructure built for a private or government
|
cloud infrastructure built for a private or government
|
||||||
organization that has specific requirements.</para>
|
organization that has specific requirements.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
<para>As might be expected, the cloud user requirements or
|
<para>The cloud user's requirements and expectations that determine
|
||||||
expectations that determine the design are all focused on the
|
the cloud design focus on the
|
||||||
consumption model. The user expects to be able to easily
|
consumption model. The user expects to consume cloud resources
|
||||||
consume cloud resources in an automated and deterministic way,
|
in an automated and deterministic way,
|
||||||
without any need for knowledge of the capacity, scalability,
|
without any need for knowledge of the capacity, scalability,
|
||||||
or other attributes of the cloud's underlying
|
or other attributes of the cloud's underlying
|
||||||
infrastructure.</para></section>
|
infrastructure.</para></section>
|
||||||
<section xml:id="operator-requirements-massive-scale">
|
<section xml:id="operator-requirements-massive-scale">
|
||||||
<title>Operator requirements</title>
|
<title>Operator requirements</title>
|
||||||
<para>Whereas the cloud user should be completely unaware of the
|
<para>While the cloud user can be completely unaware of the
|
||||||
underlying infrastructure of the cloud and its attributes, the
|
underlying infrastructure of the cloud and its attributes, the
|
||||||
operator must be able to build and support the infrastructure,
|
operator must build and support the infrastructure for operating
|
||||||
as well as how it needs to operate at scale. This presents a
|
at scale. This presents a very demanding set of requirements
|
||||||
very demanding set of requirements for building such a cloud
|
for building such a cloud from the operator's perspective:</para>
|
||||||
from the operator's perspective:</para>
|
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>First and foremost, everything must be capable of
|
<para>First and foremost, everything must be capable of
|
||||||
@ -105,7 +104,7 @@
|
|||||||
compute hardware, storage hardware, or networking
|
compute hardware, storage hardware, or networking
|
||||||
hardware, to the installation and configuration of the
|
hardware, to the installation and configuration of the
|
||||||
supporting software, everything must be capable of
|
supporting software, everything must be capable of
|
||||||
being automated. Manual processes will not suffice in
|
automation. Manual processes are impractical in
|
||||||
a massively scalable OpenStack design
|
a massively scalable OpenStack design
|
||||||
architecture.</para>
|
architecture.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
@ -127,13 +126,13 @@
|
|||||||
<listitem>
|
<listitem>
|
||||||
<para>Companies operating a massively scalable OpenStack
|
<para>Companies operating a massively scalable OpenStack
|
||||||
cloud also require that operational expenditures
|
cloud also require that operational expenditures
|
||||||
(OpEx) be minimized as much as possible. It is
|
(OpEx) be minimized as much as possible. We
|
||||||
recommended that cloud-optimized hardware is a good
|
recommend using cloud-optimized hardware when
|
||||||
approach when managing operational overhead. Some of
|
managing operational overhead. Some of
|
||||||
the factors that need to be considered include power,
|
the factors to consider include power,
|
||||||
cooling, and the physical design of the chassis. It is
|
cooling, and the physical design of the chassis. Through
|
||||||
possible to customize the hardware and systems so they
|
customization, it is possible to optimize the hardware
|
||||||
are optimized for this type of workload because of the
|
and systems for this type of workload because of the
|
||||||
scale of these implementations.</para>
|
scale of these implementations.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
@ -144,16 +143,16 @@
|
|||||||
infrastructure. This includes full scale metering of
|
infrastructure. This includes full scale metering of
|
||||||
the hardware and software status. A corresponding
|
the hardware and software status. A corresponding
|
||||||
framework of logging and alerting is also required to
|
framework of logging and alerting is also required to
|
||||||
store and allow operations to act upon the metrics
|
store and enable operations to act on the metrics
|
||||||
provided by the metering and monitoring solution(s).
|
provided by the metering and monitoring solutions.
|
||||||
The cloud operator also needs a solution that uses the
|
The cloud operator also needs a solution that uses the
|
||||||
data provided by the metering and monitoring solution
|
data provided by the metering and monitoring solution
|
||||||
to provide capacity planning and capacity trending
|
to provide capacity planning and capacity trending
|
||||||
analysis.</para>
|
analysis.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>A massively scalable OpenStack cloud will be a
|
<para>Invariably, massively scalable OpenStack clouds extend
|
||||||
multi-site cloud. Therefore, the user-operator
|
over several sites. Therefore, the user-operator
|
||||||
requirements for a multi-site OpenStack architecture
|
requirements for a multi-site OpenStack architecture
|
||||||
design are also applicable here. This includes various
|
design are also applicable here. This includes various
|
||||||
legal requirements for data storage, data placement,
|
legal requirements for data storage, data placement,
|
||||||
@ -161,18 +160,17 @@
|
|||||||
compliance requirements; image
|
compliance requirements; image
|
||||||
consistency-availability; storage replication and
|
consistency-availability; storage replication and
|
||||||
availability (both block and file/object storage); and
|
availability (both block and file/object storage); and
|
||||||
authentication, authorization, and auditing (AAA),
|
authentication, authorization, and auditing (AAA).
|
||||||
just to name a few. Refer to the <xref linkend="multi_site"/>
|
See <xref linkend="multi_site"/>
|
||||||
for more details on requirements and considerations
|
for more details on requirements and considerations
|
||||||
for multi-site OpenStack clouds.</para>
|
for multi-site OpenStack clouds.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>Considerations around physical facilities such as
|
<para>The design architecture of a massively scalable OpenStack
|
||||||
space, floor weight, rack height and type,
|
cloud must address considerations around physical
|
||||||
|
facilities such as space, floor weight, rack height and type,
|
||||||
environmental considerations, power usage and power
|
environmental considerations, power usage and power
|
||||||
usage efficiency (PUE), and physical security must
|
usage efficiency (PUE), and physical security.</para>
|
||||||
also be addressed by the design architecture of a
|
|
||||||
massively scalable OpenStack cloud.</para>
|
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist></section>
|
</itemizedlist></section>
|
||||||
</section>
|
</section>
|
||||||