60002fc5be
Closes-Bug: #1400552 Change-Id: I5b1572d7b5cf3321dcfa2836d7f777fe450a87e3
134 lines
5.8 KiB
XML
134 lines
5.8 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<section xmlns="http://docbook.org/ns/docbook"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude"
|
|
xmlns:xlink="http://www.w3.org/1999/xlink"
|
|
version="5.0"
|
|
xml:id="operational-considerations-compute-focus">
|
|
<?dbhtml stop-chunking?>
|
|
<title>Operational considerations</title>
|
|
<para>Operationally, there are a number of considerations that affect the
|
|
design of compute-focused OpenStack clouds. Some examples include:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Enforcing strict API availability requirements
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Understanding and dealing with failure scenarios
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Managing host maintenance schedules
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
<para>Service-level agreements (SLAs) are contractual obligations that
|
|
ensure the availability of a service. When designing an OpenStack cloud,
|
|
factoring in promises of availability implies a certain level of
|
|
redundancy and resiliency.</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Guarantees for API availability imply multiple infrastructure
|
|
services combined with appropriate, highly available load
|
|
balancers.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Network uptime guarantees affect the switch design and might
|
|
require redundant switching and power.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Factoring of network security policy requirements in to deployments.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<section xml:id="support-and-maintainability-compute-focus">
|
|
<title>Support and maintainability</title>
|
|
<para>OpenStack cloud management requires a certain level of
|
|
understanding and comprehension of design architecture. Specially trained,
|
|
dedicated operations organizations are more likely to manage larger
|
|
cloud service providers or telecom providers. Smaller implementations
|
|
are more inclined to rely on smaller support teams that need
|
|
to combine the engineering, design, and operation roles.</para>
|
|
<para>The maintenance of OpenStack installations requires a variety
|
|
of technical skills. To ease the operational burden, consider
|
|
incorporating features into the architecture and
|
|
design. Some examples include:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Automating the operations functions</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Utilizing a third party management company</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</section>
|
|
|
|
<section xml:id="montioring-compute-focus">
|
|
<title>Monitoring</title>
|
|
<para>OpenStack clouds require appropriate monitoring platforms that
|
|
help to catch and manage errors adequately. Consider leveraging any
|
|
existing monitoring systems to see if they are able to
|
|
effectively monitor an OpenStack environment. Specific metrics that
|
|
are critically important to capture include:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Image disk utilization</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Response time to the Compute API</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</section>
|
|
|
|
<section xml:id="expected-unexpected-server-downtime">
|
|
<title>Expected and unexpected server downtime</title>
|
|
<para>Unexpected server downtime is inevitable, and SLAs can
|
|
address how long it takes to recover from failure.
|
|
Recovery of a failed host means restoring instances from a snapshot, or
|
|
respawning that instance on another available host.</para>
|
|
<para>It is acceptable to design a compute-focused cloud
|
|
without the ability to migrate instances from one host to
|
|
another. The expectation is that the application
|
|
developer must handle failure within the application itself.
|
|
However, provisioning a compute-focused cloud
|
|
provides extra resilience. In this scenario, the
|
|
developer deploys extra support services.</para>
|
|
</section>
|
|
|
|
<section xml:id="capacity-planning-operational">
|
|
<title>Capacity planning</title>
|
|
<para>Adding extra capacity to an OpenStack cloud is a
|
|
horizontally scaling process.</para>
|
|
<note>
|
|
<para>Be mindful, however, of additional work to place the nodes into
|
|
appropriate Availability Zones and Host Aggregates.</para>
|
|
</note>
|
|
<para>We recommend the same or very similar CPUs
|
|
when adding extra nodes to the environment because they reduce
|
|
the chance of breaking live-migration features if they are
|
|
present. Scaling out hypervisor hosts also has a direct effect
|
|
on network and other data center resources. We recommend you
|
|
factor in this increase when reaching rack capacity or when requiring
|
|
extra network switches.</para>
|
|
<para>Changing the internal components of a Compute host to account for
|
|
increases in demand is a process known as vertical scaling.
|
|
Swapping a CPU for one with more cores, or
|
|
increasing the memory in a server, can help add extra
|
|
capacity for running applications.</para>
|
|
<para>Another option is to assess the average workloads and
|
|
increase the number of instances that can run within the
|
|
compute environment by adjusting the overcommit ratio. While
|
|
only appropriate in some environments, it's important to
|
|
remember that changing the CPU overcommit ratio can have a
|
|
detrimental effect and cause a potential increase in a noisy
|
|
neighbor. The added risk of increasing the overcommit ratio is that
|
|
more instances fail when a compute host fails. We do not recommend
|
|
that you increase the CPU overcommit ratio in compute-focused
|
|
OpenStack design architecture, as it can increase the potential
|
|
for noisy neighbor issues.</para>
|
|
</section>
|
|
</section>
|