1137be16ea
1. Removal of extra examples (prescriptive example section covers this content) 2. Grammar/spelling/general edits Change-Id: I662231a295a2ebe18f93ccb9429f9f8e0a8f9a09
157 lines
7.2 KiB
XML
157 lines
7.2 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<section xmlns="http://docbook.org/ns/docbook"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude"
|
|
xmlns:xlink="http://www.w3.org/1999/xlink"
|
|
version="5.0"
|
|
xml:id="operational-considerations-general-purpose">
|
|
<?dbhtml stop-chunking?>
|
|
<title>Operational considerations</title>
|
|
<para>In the planning and design phases of the build out, it is
|
|
important to include the operation's function. Operational
|
|
factors affect the design choices for a general purpose cloud,
|
|
and operations staff are often tasked with the maintenance of
|
|
cloud environments for larger installations.</para>
|
|
<para>Expectations set by the Service Level Agreements (SLAs) directly
|
|
affect knowing when and where you should implement redundancy and
|
|
high availability. SLAs are contractual
|
|
obligations that provide assurances for service availability.
|
|
They define the levels of availability that drive the technical
|
|
design, often with penalties for not meeting contractual obligations.</para>
|
|
<para>SLA terms that affect design include:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>API availability guarantees implying multiple
|
|
infrastructure services and highly available
|
|
load balancers.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Network uptime guarantees affecting switch
|
|
design, which might require redundant switching and
|
|
power.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Factor in networking security policy requirements
|
|
in to your deployments.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<section xml:id="support-and-maintainability-general-purpose">
|
|
<title>Support and maintainability</title>
|
|
<para>To be able to support and maintain an installation, OpenStack
|
|
cloud management requires operations staff to understand and
|
|
comprehend design architecture content. The operations and engineering
|
|
staff skill level, and level of separation, are dependent on size and
|
|
purpose of the installation. Large cloud service providers, or telecom
|
|
providers, are more likely to be managed by specially trained, dedicated
|
|
operations organizations. Smaller implementations are more likely to rely
|
|
on support staff that need to take on combined engineering, design and
|
|
operations functions.</para>
|
|
<para>Maintaining OpenStack installations requires a
|
|
variety of technical skills. You may want to consider using a third-party
|
|
management company with special expertise in managing
|
|
OpenStack deployment.</para>
|
|
</section>
|
|
|
|
<section xml:id="monitoring-general-purpose">
|
|
<title>Monitoring</title>
|
|
<para>OpenStack clouds require appropriate monitoring platforms to
|
|
ensure errors are caught and managed appropriately. Specific
|
|
meters that are critically important to monitor include:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Image disk utilization
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Response time to the Compute API
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>Leveraging existing monitoring systems is an effective check to
|
|
ensure OpenStack environments can be monitored.</para>
|
|
</section>
|
|
|
|
<section xml:id="downtime-general-purpose">
|
|
<title>Downtime</title>
|
|
<para>To effectively run cloud installations, initial downtime planning
|
|
includes creating processes and architectures that support
|
|
the following:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Planned (maintenance)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Unplanned (system faults)
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>Resiliency of overall system and individual components are going
|
|
to be dictated by the requirements of the SLA, meaning designing
|
|
for high availability (HA) can have cost ramifications.</para>
|
|
</section>
|
|
|
|
<section xml:id="capacity-planning">
|
|
<title>Capacity planning</title>
|
|
<para>Capacity constraints for a general purpose cloud environment
|
|
include:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Compute limits
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Storage limits
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>A relationship exists between the size of the compute environment
|
|
and the supporting OpenStack infrastructure controller nodes requiring
|
|
support.</para>
|
|
<para>Increasing the size of the supporting compute environment increases
|
|
the network traffic and messages, adding load to the controller or
|
|
networking nodes. Effective monitoring of the environment will help
|
|
with capacity decisions on scaling.</para>
|
|
<para>Compute nodes automatically attach to OpenStack clouds, resulting in
|
|
a horizontally scaling process when adding extra compute capacity to an
|
|
OpenStack cloud. Additional processes are required to place nodes into
|
|
appropriate availability zones and host aggregates. When adding additional
|
|
compute nodes to environments, ensure identical or functional compatible
|
|
CPUs are used, otherwise live migration features will break. It is necessary
|
|
to add rack capacity or network switches as scaling out compute hosts directly
|
|
affects network and datacenter resources.</para>
|
|
<para>Assessing the average workloads and increasing the number of instances
|
|
that can run within the compute environment by adjusting the overcommit
|
|
ratio is another option. It is important to remember that changing the CPU overcommit
|
|
ratio can have a detrimental effect and cause a potential increase in a
|
|
noisy neighbor. The additional risk of increasing the overcommit ratio is
|
|
more instances failing when a compute host fails.</para>
|
|
<para>Compute host components can also be upgraded to account for
|
|
increases in demand; this is known as vertical scaling.
|
|
Upgrading CPUs with more cores, or increasing the overall
|
|
server memory, can add extra needed capacity depending on
|
|
whether the running applications are more CPU intensive or
|
|
memory intensive.</para>
|
|
<para>Insufficient disk capacity could also have a negative effect
|
|
on overall performance including CPU and memory usage.
|
|
Depending on the back-end architecture of the OpenStack Block
|
|
Storage layer, capacity includes adding disk shelves to
|
|
enterprise storage systems or installing additional block
|
|
storage nodes. Upgrading directly attached storage installed in
|
|
compute hosts, and adding capacity to the shared storage for
|
|
additional ephemeral storage to instances, may be necessary.</para>
|
|
<para>
|
|
For a deeper discussion on many of these topics, refer to the
|
|
<link
|
|
xlink:href="http://docs.openstack.org/ops"><citetitle>OpenStack
|
|
Operations Guide</citetitle></link>.
|
|
</para>
|
|
</section>
|
|
</section>
|