Operational considerations
There are a number of operational considerations that affect the
design of compute-focused OpenStack clouds, including:
Enforcing strict API availability requirements
Understanding and dealing with failure scenarios
Managing host maintenance schedules
Service-level agreements (SLAs) are contractual obligations that
ensure the availability of a service. When designing an OpenStack cloud,
factoring in promises of availability implies a certain level of
redundancy and resiliency.
Monitoring
OpenStack clouds require appropriate monitoring platforms
to catch and manage errors.
We recommend leveraging existing monitoring systems
to see if they are able to effectively monitor an
OpenStack environment.
Specific meters that are critically important to capture
include:
Image disk utilization
Response time to the Compute API
Capacity planning
Adding extra capacity to an OpenStack cloud is a
horizontally scaling process.
We recommend similar (or the same) CPUs
when adding extra nodes to the environment. This reduces
the chance of breaking live-migration features if they are
present. Scaling out hypervisor hosts also has a direct effect
on network and other data center resources. We recommend you
factor in this increase when reaching rack capacity or when requiring
extra network switches.
Changing the internal components of a Compute host to account for
increases in demand is a process known as vertical scaling.
Swapping a CPU for one with more cores, or
increasing the memory in a server, can help add extra
capacity for running applications.
Another option is to assess the average workloads and
increase the number of instances that can run within the
compute environment by adjusting the overcommit ratio.
It is important to remember that changing the CPU
overcommit ratio can have a detrimental effect and cause
a potential increase in a noisy neighbor.
The added risk of increasing the overcommit ratio is that
more instances fail when a compute host fails. We do not recommend
that you increase the CPU overcommit ratio in compute-focused
OpenStack design architecture, as it can increase the potential
for noisy neighbor issues.