Operational considerations
- Operationally, there are a number of considerations that affect the
- design of compute-focused OpenStack clouds. Some examples include:
+ There are a number of operational considerations that affect the
+ design of compute-focused OpenStack clouds, including:
@@ -29,50 +29,18 @@
ensure the availability of a service. When designing an OpenStack cloud,
factoring in promises of availability implies a certain level of
redundancy and resiliency.
-
-
- Guarantees for API availability imply multiple infrastructure
- services combined with appropriate, highly available load
- balancers.
-
-
- Network uptime guarantees affect the switch design and might
- require redundant switching and power.
-
-
- Factoring of network security policy requirements in to deployments.
-
-
-
-
- Support and maintainability
- OpenStack cloud management requires a certain level of
- understanding and comprehension of design architecture. Specially trained,
- dedicated operations organizations are more likely to manage larger
- cloud service providers or telecom providers. Smaller implementations
- are more inclined to rely on smaller support teams that need
- to combine the engineering, design, and operation roles.
- The maintenance of OpenStack installations requires a variety
- of technical skills. To ease the operational burden, consider
- incorporating features into the architecture and
- design. Some examples include:
-
-
- Automating the operations functions
-
-
- Utilizing a third party management company
-
-
- Monitoring
- OpenStack clouds require appropriate monitoring platforms that
- help to catch and manage errors adequately. Consider leveraging any
- existing monitoring systems to see if they are able to
- effectively monitor an OpenStack environment. Specific meters that
- are critically important to capture include:
+ OpenStack clouds require appropriate monitoring platforms
+ to catch and manage errors.
+
+ We recommend evaluating existing monitoring systems
+ to determine whether they can effectively monitor an
+ OpenStack environment.
+
+ Specific meters that are critically important to capture
+ include:
Image disk utilization
@@ -83,31 +51,12 @@
-
- Expected and unexpected server downtime
- Unexpected server downtime is inevitable, and SLAs can
- address how long it takes to recover from failure.
- Recovery of a failed host means restoring instances from a snapshot, or
- respawning that instance on another available host.
- It is acceptable to design a compute-focused cloud
- without the ability to migrate instances from one host to
- another. The expectation is that the application
- developer must handle failure within the application itself.
- However, provisioning a compute-focused cloud
- provides extra resilience. In this scenario, the
- developer deploys extra support services.
-
-
Capacity planning
Adding extra capacity to an OpenStack cloud is a
horizontally scaling process.
-
- Be mindful, however, of additional work to place the nodes into
- appropriate Availability Zones and Host Aggregates.
-
- We recommend the same or very similar CPUs
- when adding extra nodes to the environment because they reduce
+ We recommend similar (or the same) CPUs
+ when adding extra nodes to the environment. This reduces
the chance of breaking live-migration features if they are
present. Scaling out hypervisor hosts also has a direct effect
on network and other data center resources. We recommend you
@@ -120,11 +69,13 @@
capacity for running applications.
Another option is to assess the average workloads and
increase the number of instances that can run within the
- compute environment by adjusting the overcommit ratio. While
- only appropriate in some environments, it's important to
- remember that changing the CPU overcommit ratio can have a
- detrimental effect and cause a potential increase in a noisy
- neighbor. The added risk of increasing the overcommit ratio is that
+ compute environment by adjusting the overcommit ratio.
+
+ Changing the CPU overcommit ratio can have a
+ detrimental effect and increase the likelihood of
+ noisy-neighbor problems.
+
+ The added risk of increasing the overcommit ratio is that
more instances fail when a compute host fails. We do not recommend
that you increase the CPU overcommit ratio in compute-focused
OpenStack design architecture, as it can increase the potential
diff --git a/doc/arch-design/compute_focus/section_user_requirements_compute_focus.xml b/doc/arch-design/compute_focus/section_user_requirements_compute_focus.xml
deleted file mode 100644
index b8920e1585..0000000000
--- a/doc/arch-design/compute_focus/section_user_requirements_compute_focus.xml
+++ /dev/null
@@ -1,175 +0,0 @@
-
-
-%openstack;
-]>
-
-
- User requirements
- High utilization of CPU, RAM, or both defines compute
- intensive workloads. User requirements determine the performance
- demands for the cloud.
-
-
-
- Cost
-
- Cost is not generally a primary concern for a
- compute-focused cloud, however some organizations
- might be concerned with cost avoidance. Repurposing
- existing resources to tackle compute-intensive tasks
- instead of acquiring additional resources may
- offer cost reduction opportunities.
-
-
-
- Time to market
-
- Compute-focused clouds can deliver products more quickly,
- for example by speeding up a company's software development
- life cycle (SDLC) for building products and applications.
-
-
-
- Revenue opportunity
-
- Companies that want to build services or products that
- rely on the power of compute resources benefit from a
- compute-focused cloud. Examples include the analysis
- of large data sets (via Hadoop or Cassandra) or
- completing computational intensive tasks such as
- rendering, scientific computation, or
- simulations.
-
-
-
-
- Legal requirements
- Many jurisdictions have legislative and regulatory
- requirements governing the storage and management of data in
- cloud environments. Common areas of regulation include:
-
-
- Data retention policies ensuring storage of
- persistent data and records management to meet data
- archival requirements.
-
-
- Data ownership policies governing the possession and
- responsibility for data.
-
-
- Data sovereignty policies governing the storage of
- data in foreign countries or otherwise separate
- jurisdictions.
-
-
- Data compliance: certain types of information need
- to reside in certain locations due to regular issues and,
- more importantly, cannot reside in other locations
- for the same reason.
-
-
-
- Examples of such legal frameworks include the data
- protection framework of the European Union and the
- requirements of the Financial
- Industry Regulatory Authority in the United
- States. Consult a local regulatory body for more
- information.
-
- Technical considerations
- The following are some technical requirements you must consider
- in the architecture design:
-
-
-
- Performance
-
- If a primary technical concern is to deliver high performance
- capability, then a compute-focused design is an
- obvious choice because it is specifically designed to
- host compute-intensive workloads.
-
-
-
- Workload persistence
-
- Workloads can be either
- short-lived or long-running. Short-lived workloads
- can include continuous integration and continuous
- deployment (CI-CD) jobs, which create large numbers of
- compute instances simultaneously to
- perform a set of compute-intensive tasks. The environment then
- copies the results or artifacts from each instance into
- long-term storage before destroying the instance.
- Long-running workloads, like a Hadoop or
- high-performance computing (HPC) cluster, typically
- ingest large data sets, perform the computational work
- on those data sets, then push the results into long-term
- storage. When the computational work finishes, the instances
- remain idle until they receive another job. Environments
- for long-running workloads are often larger and more complex,
- but you can offset the cost of building them by keeping them
- active between jobs. Another example of long-running
- workloads is legacy applications that are
- persistent over time.
-
-
-
- Storage
-
- Workloads targeted for a compute-focused
- OpenStack cloud generally do not require any
- persistent block storage, although some uses of
- Hadoop with HDFS may require persistent
- block storage. A shared filesystem or object store
- maintains the initial data sets and serves as the
- destination for saving the computational results. By
- avoiding the input-output (IO) overhead, you can significantly
- enhance workload performance. Depending on
- the size of the data sets, it may be necessary to
- scale the object store or shared file system to match
- the storage demand.
-
-
-
- User interface
-
- Like any other cloud architecture, a
- compute-focused OpenStack cloud requires an on-demand
- and self-service user interface. End users must be
- able to provision computing power, storage, networks,
- and software simply and flexibly. This includes
- scaling the infrastructure up to a substantial level
- without disrupting host operations.
-
-
-
- Security
-
- Security is highly dependent
- on business requirements. For example, a
- computationally intense drug discovery application
- has much higher security requirements
- than a cloud for processing market
- data for a retailer. As a general rule, the security
- recommendations and guidelines provided in the
- OpenStack Security Guide are applicable.
-
-
-
-
-
- Operational considerations
- From an operational perspective, a compute intensive cloud
- is similar to a general-purpose cloud. See the general-purpose
- design section for more details on operational requirements.
-
-