openstack-manuals/doc/arch-design/generalpurpose/section_operational_considerations_general_purpose.xml
asettle 1137be16ea Edits to the general purpose arch guide section
1. Removal of extra examples (prescriptive example
   section covers this content)
2. Grammar/spelling/general edits

Change-Id: I662231a295a2ebe18f93ccb9429f9f8e0a8f9a09
2015-09-17 11:07:56 +10:00

157 lines
7.2 KiB
XML

<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="operational-considerations-general-purpose">
<?dbhtml stop-chunking?>
<title>Operational considerations</title>
<para>In the planning and design phases of the build out, it is
important to include the operation's function. Operational
factors affect the design choices for a general purpose cloud,
and operations staff are often tasked with the maintenance of
cloud environments for larger installations.</para>
<para>Expectations set by the Service Level Agreements (SLAs) directly
affect knowing when and where you should implement redundancy and
high availability. SLAs are contractual
obligations that provide assurances for service availability.
They define the levels of availability that drive the technical
design, often with penalties for not meeting contractual obligations.</para>
<para>SLA terms that affect design include:</para>
<itemizedlist>
<listitem>
<para>API availability guarantees implying multiple
infrastructure services and highly available
load balancers.</para>
</listitem>
<listitem>
<para>Network uptime guarantees affecting switch
design, which might require redundant switching and
power.</para>
</listitem>
<listitem>
<para>Factor in networking security policy requirements
in to your deployments.</para>
</listitem>
</itemizedlist>
<section xml:id="support-and-maintainability-general-purpose">
<title>Support and maintainability</title>
<para>To be able to support and maintain an installation, OpenStack
cloud management requires operations staff to understand and
comprehend design architecture content. The operations and engineering
staff skill level, and level of separation, are dependent on size and
purpose of the installation. Large cloud service providers, or telecom
providers, are more likely to be managed by specially trained, dedicated
operations organizations. Smaller implementations are more likely to rely
on support staff that need to take on combined engineering, design and
operations functions.</para>
<para>Maintaining OpenStack installations requires a
variety of technical skills. You may want to consider using a third-party
management company with special expertise in managing
OpenStack deployment.</para>
</section>
<section xml:id="monitoring-general-purpose">
<title>Monitoring</title>
<para>OpenStack clouds require appropriate monitoring platforms to
ensure errors are caught and managed appropriately. Specific
meters that are critically important to monitor include:</para>
<itemizedlist>
<listitem>
<para>
Image disk utilization
</para>
</listitem>
<listitem>
<para>
Response time to the Compute API
</para>
</listitem>
</itemizedlist>
<para>Leveraging existing monitoring systems is an effective check to
ensure OpenStack environments can be monitored.</para>
</section>
<section xml:id="downtime-general-purpose">
<title>Downtime</title>
<para>To effectively run cloud installations, initial downtime planning
includes creating processes and architectures that support
the following:</para>
<itemizedlist>
<listitem>
<para>
Planned (maintenance)
</para>
</listitem>
<listitem>
<para>
Unplanned (system faults)
</para>
</listitem>
</itemizedlist>
<para>Resiliency of overall system and individual components are going
to be dictated by the requirements of the SLA, meaning designing
for high availability (HA) can have cost ramifications.</para>
</section>
<section xml:id="capacity-planning">
<title>Capacity planning</title>
<para>Capacity constraints for a general purpose cloud environment
include:</para>
<itemizedlist>
<listitem>
<para>
Compute limits
</para>
</listitem>
<listitem>
<para>
Storage limits
</para>
</listitem>
</itemizedlist>
<para>A relationship exists between the size of the compute environment
and the supporting OpenStack infrastructure controller nodes requiring
support.</para>
<para>Increasing the size of the supporting compute environment increases
the network traffic and messages, adding load to the controller or
networking nodes. Effective monitoring of the environment will help
with capacity decisions on scaling.</para>
<para>Compute nodes automatically attach to OpenStack clouds, resulting in
a horizontally scaling process when adding extra compute capacity to an
OpenStack cloud. Additional processes are required to place nodes into
appropriate availability zones and host aggregates. When adding additional
compute nodes to environments, ensure identical or functional compatible
CPUs are used, otherwise live migration features will break. It is necessary
to add rack capacity or network switches as scaling out compute hosts directly
affects network and datacenter resources.</para>
<para>Assessing the average workloads and increasing the number of instances
that can run within the compute environment by adjusting the overcommit
ratio is another option. It is important to remember that changing the CPU overcommit
ratio can have a detrimental effect and cause a potential increase in a
noisy neighbor. The additional risk of increasing the overcommit ratio is
more instances failing when a compute host fails.</para>
<para>Compute host components can also be upgraded to account for
increases in demand; this is known as vertical scaling.
Upgrading CPUs with more cores, or increasing the overall
server memory, can add extra needed capacity depending on
whether the running applications are more CPU intensive or
memory intensive.</para>
<para>Insufficient disk capacity could also have a negative effect
on overall performance including CPU and memory usage.
Depending on the back-end architecture of the OpenStack Block
Storage layer, capacity includes adding disk shelves to
enterprise storage systems or installing additional block
storage nodes. Upgrading directly attached storage installed in
compute hosts, and adding capacity to the shared storage for
additional ephemeral storage to instances, may be necessary.</para>
<para>
For a deeper discussion on many of these topics, refer to the
<link
xlink:href="http://docs.openstack.org/ops"><citetitle>OpenStack
Operations Guide</citetitle></link>.
</para>
</section>
</section>