openstack-manuals/doc/arch-design/compute_focus/section_operational_considerations_compute_focus.xml
Brian Moss 60002fc5be Remove passive voice from Arch Guide Chap 2
Closes-Bug: #1400552

Change-Id: I5b1572d7b5cf3321dcfa2836d7f777fe450a87e3
2015-04-24 15:45:14 +10:00

134 lines
5.8 KiB
XML

<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="operational-considerations-compute-focus">
<?dbhtml stop-chunking?>
<title>Operational considerations</title>
<para>Operationally, there are a number of considerations that affect the
design of compute-focused OpenStack clouds. Some examples include:</para>
<itemizedlist>
<listitem>
<para>
Enforcing strict API availability requirements
</para>
</listitem>
<listitem>
<para>
Understanding and dealing with failure scenarios
</para>
</listitem>
<listitem>
<para>
Managing host maintenance schedules
</para>
</listitem>
</itemizedlist>
<para>Service-level agreements (SLAs) are contractual obligations that
ensure the availability of a service. When designing an OpenStack cloud,
factoring in promises of availability implies a certain level of
redundancy and resiliency.</para>
<itemizedlist>
<listitem>
<para>Guarantees for API availability imply multiple infrastructure
services combined with appropriate, highly available load
balancers.</para>
</listitem>
<listitem>
<para>Network uptime guarantees affect the switch design and might
require redundant switching and power.</para>
</listitem>
<listitem>
<para>Factoring of network security policy requirements in to deployments.</para>
</listitem>
</itemizedlist>
<section xml:id="support-and-maintainability-compute-focus">
<title>Support and maintainability</title>
<para>OpenStack cloud management requires a certain level of
understanding and comprehension of design architecture. Specially trained,
dedicated operations organizations are more likely to manage larger
cloud service providers or telecom providers. Smaller implementations
are more inclined to rely on smaller support teams that need
to combine the engineering, design, and operation roles.</para>
<para>The maintenance of OpenStack installations requires a variety
of technical skills. To ease the operational burden, consider
incorporating features into the architecture and
design. Some examples include:</para>
<itemizedlist>
<listitem>
<para>Automating the operations functions</para>
</listitem>
<listitem>
<para>Utilizing a third party management company</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="montioring-compute-focus">
<title>Monitoring</title>
<para>OpenStack clouds require appropriate monitoring platforms that
help to catch and manage errors adequately. Consider leveraging any
existing monitoring systems to see if they are able to
effectively monitor an OpenStack environment. Specific metrics that
are critically important to capture include:</para>
<itemizedlist>
<listitem>
<para>Image disk utilization</para>
</listitem>
<listitem>
<para>Response time to the Compute API</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="expected-unexpected-server-downtime">
<title>Expected and unexpected server downtime</title>
<para>Unexpected server downtime is inevitable, and SLAs can
address how long it takes to recover from failure.
Recovery of a failed host means restoring instances from a snapshot, or
respawning that instance on another available host.</para>
<para>It is acceptable to design a compute-focused cloud
without the ability to migrate instances from one host to
another. The expectation is that the application
developer must handle failure within the application itself.
However, provisioning a compute-focused cloud
provides extra resilience. In this scenario, the
developer deploys extra support services.</para>
</section>
<section xml:id="capacity-planning-operational">
<title>Capacity planning</title>
<para>Adding extra capacity to an OpenStack cloud is a
horizontally scaling process.</para>
<note>
<para>Be mindful, however, of additional work to place the nodes into
appropriate Availability Zones and Host Aggregates.</para>
</note>
<para>We recommend the same or very similar CPUs
when adding extra nodes to the environment because they reduce
the chance of breaking live-migration features if they are
present. Scaling out hypervisor hosts also has a direct effect
on network and other data center resources. We recommend you
factor in this increase when reaching rack capacity or when requiring
extra network switches.</para>
<para>Changing the internal components of a Compute host to account for
increases in demand is a process known as vertical scaling.
Swapping a CPU for one with more cores, or
increasing the memory in a server, can help add extra
capacity for running applications.</para>
<para>Another option is to assess the average workloads and
increase the number of instances that can run within the
compute environment by adjusting the overcommit ratio. While
only appropriate in some environments, it's important to
remember that changing the CPU overcommit ratio can have a
detrimental effect and cause a potential increase in a noisy
neighbor. The added risk of increasing the overcommit ratio is that
more instances fail when a compute host fails. We do not recommend
that you increase the CPU overcommit ratio in compute-focused
OpenStack design architecture, as it can increase the potential
for noisy neighbor issues.</para>
</section>
</section>