[arch-guide-draft] Update use cases chapter
1. Reorganise chapter structure
2. Migrate arch examples

Change-Id: Id0f4485eaef02a62cde06b84697252183188431e
Implements: blueprint arch-guide-restructure
@ -1,126 +0,0 @@
|
||||
=============================
|
||||
Compute-focused cloud example
|
||||
=============================
|
||||
|
||||
The Conseil Européen pour la Recherche Nucléaire (CERN), also known as
|
||||
the European Organization for Nuclear Research, provides particle
|
||||
accelerators and other infrastructure for high-energy physics research.
|
||||
|
||||
As of 2011 CERN operated these two compute centers in Europe with plans
|
||||
to add a third.
|
||||
|
||||
+-----------------------+------------------------+
|
||||
| Data center | Approximate capacity |
|
||||
+=======================+========================+
|
||||
| Geneva, Switzerland | - 3.5 Mega Watts |
|
||||
| | |
|
||||
| | - 91000 cores |
|
||||
| | |
|
||||
| | - 120 PB HDD |
|
||||
| | |
|
||||
| | - 100 PB Tape |
|
||||
| | |
|
||||
| | - 310 TB Memory |
|
||||
+-----------------------+------------------------+
|
||||
| Budapest, Hungary | - 2.5 Mega Watts |
|
||||
| | |
|
||||
| | - 20000 cores |
|
||||
| | |
|
||||
| | - 6 PB HDD |
|
||||
+-----------------------+------------------------+
|
||||
|
||||
To support a growing number of compute-heavy users of experiments
|
||||
related to the Large Hadron Collider (LHC), CERN ultimately elected to
|
||||
deploy an OpenStack cloud using Scientific Linux and RDO. This effort
|
||||
aimed to simplify the management of the center's compute resources with
|
||||
a view to doubling compute capacity through the addition of a data
|
||||
center in 2013 while maintaining the same levels of compute staff.
|
||||
|
||||
The CERN solution uses :term:`cells <cell>` for segregation of compute
|
||||
resources and for transparently scaling between different data centers.
|
||||
This decision meant trading off support for security groups and live
|
||||
migration. In addition, they must manually replicate some details, like
|
||||
flavors, across cells. In spite of these drawbacks cells provide the
|
||||
required scale while exposing a single public API endpoint to users.
|
||||
|
||||
CERN created a compute cell for each of the two original data centers
|
||||
and created a third when it added a new data center in 2013. Each cell
|
||||
contains three availability zones to further segregate compute resources
|
||||
and at least three RabbitMQ message brokers configured for clustering
|
||||
with mirrored queues for high availability.
|
||||
|
||||
The API cell, which resides behind a HAProxy load balancer, is in the
|
||||
data center in Switzerland and directs API calls to compute cells using
|
||||
a customized variation of the cell scheduler. The customizations allow
|
||||
certain workloads to route to a specific data center or all data
|
||||
centers, with cell RAM availability determining cell selection in the
|
||||
latter case.
|
||||
|
||||
.. figure:: figures/Generic_CERN_Example.png
|
||||
|
||||
There is also some customization of the filter scheduler that handles
|
||||
placement within the cells:
|
||||
|
||||
ImagePropertiesFilter
|
||||
Provides special handling depending on the guest operating system in
|
||||
use (Linux-based or Windows-based).
|
||||
|
||||
ProjectsToAggregateFilter
|
||||
Provides special handling depending on which project the instance is
|
||||
associated with.
|
||||
|
||||
default_schedule_zones
|
||||
Allows the selection of multiple default availability zones, rather
|
||||
than a single default.
|
||||
|
||||
A central database team manages the MySQL database server in each cell
|
||||
in an active/passive configuration with a NetApp storage back end.
|
||||
Backups run every 6 hours.
|
||||
|
||||
Network architecture
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
To integrate with existing networking infrastructure, CERN made
|
||||
customizations to legacy networking (nova-network). This was in the form
|
||||
of a driver to integrate with CERN's existing database for tracking MAC
|
||||
and IP address assignments.
|
||||
|
||||
The driver facilitates selection of a MAC address and IP for new
|
||||
instances based on the compute node where the scheduler places the
|
||||
instance.
|
||||
|
||||
The driver considers the compute node where the scheduler placed an
|
||||
instance and selects a MAC address and IP from the pre-registered list
|
||||
associated with that node in the database. The database updates to
|
||||
reflect the address assignment to that instance.
|
||||
|
||||
Storage architecture
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
CERN deploys the OpenStack Image service in the API cell and configures
|
||||
it to expose version 1 (V1) of the API. This also requires the image
|
||||
registry. The storage back end in use is a 3 PB Ceph cluster.
|
||||
|
||||
CERN maintains a small set of Scientific Linux 5 and 6 images onto which
|
||||
orchestration tools can place applications. Puppet manages instance
|
||||
configuration and customization.
|
||||
|
||||
Monitoring
|
||||
~~~~~~~~~~
|
||||
|
||||
CERN does not require direct billing, but uses the Telemetry service to
|
||||
perform metering for the purposes of adjusting project quotas. CERN uses
|
||||
a sharded, replicated, MongoDB back-end. To spread API load, CERN
|
||||
deploys instances of the nova-api service within the child cells for
|
||||
Telemetry to query against. This also requires the configuration of
|
||||
supporting services such as keystone, glance-api, and glance-registry in
|
||||
the child cells.
|
||||
|
||||
.. figure:: figures/Generic_CERN_Architecture.png
|
||||
|
||||
Additional monitoring tools in use include
|
||||
`Flume <http://flume.apache.org/>`_, `Elastic
|
||||
Search <http://www.elasticsearch.org/>`_,
|
||||
`Kibana <http://www.elasticsearch.org/overview/kibana/>`_, and the CERN
|
||||
developed `Lemon <http://lemon.web.cern.ch/lemon/index.shtml>`_
|
||||
project.
|
@ -1,85 +0,0 @@
|
||||
=====================
|
||||
General cloud example
|
||||
=====================
|
||||
|
||||
An online classified advertising company wants to run web applications
|
||||
consisting of Tomcat, Nginx and MariaDB in a private cloud. To be able
|
||||
to meet policy requirements, the cloud infrastructure will run in their
|
||||
own data center. The company has predictable load requirements, but
|
||||
requires scaling to cope with nightly increases in demand. Their current
|
||||
environment does not have the flexibility to align with their goal of
|
||||
running an open source API environment. The current environment consists
|
||||
of the following:
|
||||
|
||||
* Between 120 and 140 installations of Nginx and Tomcat, each with 2
|
||||
vCPUs and 4 GB of RAM
|
||||
|
||||
* A three-node MariaDB and Galera cluster, each with 4 vCPUs and 8 GB
|
||||
RAM
|
||||
|
||||
The company runs hardware load balancers and multiple web applications
|
||||
serving their websites, and orchestrates environments using combinations
|
||||
of scripts and Puppet. The website generates large amounts of log data
|
||||
daily that requires archiving.
|
||||
|
||||
The solution would consist of the following OpenStack components:
|
||||
|
||||
* A firewall, switches and load balancers on the public facing network
|
||||
connections.
|
||||
|
||||
* OpenStack Controller service running Image, Identity, Networking,
|
||||
combined with support services such as MariaDB and RabbitMQ,
|
||||
configured for high availability on at least three controller nodes.
|
||||
|
||||
* OpenStack Compute nodes running the KVM hypervisor.
|
||||
|
||||
* OpenStack Block Storage for use by compute instances, requiring
|
||||
persistent storage (such as databases for dynamic sites).
|
||||
|
||||
* OpenStack Object Storage for serving static objects (such as images).
|
||||
|
||||
.. figure:: figures/General_Architecture3.png
|
||||
|
||||
Running up to 140 web instances and the small number of MariaDB
|
||||
instances requires 292 vCPUs available, as well as 584 GB RAM. On a
|
||||
typical 1U server using dual-socket hex-core Intel CPUs with
|
||||
Hyperthreading, and assuming 2:1 CPU overcommit ratio, this would
|
||||
require 8 OpenStack Compute nodes.
|
||||
|
||||
The web application instances run from local storage on each of the
|
||||
OpenStack Compute nodes. The web application instances are stateless,
|
||||
meaning that any of the instances can fail and the application will
|
||||
continue to function.
|
||||
|
||||
MariaDB server instances store their data on shared enterprise storage,
|
||||
such as NetApp or Solidfire devices. If a MariaDB instance fails,
|
||||
storage would be expected to be re-attached to another instance and
|
||||
rejoined to the Galera cluster.
|
||||
|
||||
Logs from the web application servers are shipped to OpenStack Object
|
||||
Storage for processing and archiving.
|
||||
|
||||
Additional capabilities can be realized by moving static web content to
|
||||
be served from OpenStack Object Storage containers, and backing the
|
||||
OpenStack Image service with OpenStack Object Storage.
|
||||
|
||||
.. note::
|
||||
|
||||
Increasing OpenStack Object Storage means network bandwidth needs to
|
||||
be taken into consideration. Running OpenStack Object Storage with
|
||||
network connections offering 10 GbE or better connectivity is
|
||||
advised.
|
||||
|
||||
Leveraging Orchestration and Telemetry services is also a potential
|
||||
issue when providing auto-scaling, orchestrated web application
|
||||
environments. Defining the web applications in a
|
||||
:term:`Heat Orchestration Template (HOT)`
|
||||
negates the reliance on the current scripted Puppet
|
||||
solution.
|
||||
|
||||
OpenStack Networking can be used to control hardware load balancers
|
||||
through the use of plug-ins and the Networking API. This allows users to
|
||||
control hardware load balance pools and instances as members in these
|
||||
pools, but their use in production environments must be carefully
|
||||
weighed against current stability.
|
||||
|
@ -1,154 +0,0 @@
|
||||
=====================
|
||||
Hybrid cloud examples
|
||||
=====================
|
||||
|
||||
Hybrid cloud environments are designed for these use cases:
|
||||
|
||||
* Bursting workloads from private to public OpenStack clouds
|
||||
* Bursting workloads from private to public non-OpenStack clouds
|
||||
* High availability across clouds (for technical diversity)
|
||||
|
||||
This chapter provides examples of environments that address
|
||||
each of these use cases.
|
||||
|
||||
Bursting to a public OpenStack cloud
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Company A's data center is running low on capacity.
|
||||
It is not possible to expand the data center in the foreseeable future.
|
||||
In order to accommodate the continuously growing need for
|
||||
development resources in the organization,
|
||||
Company A decides to use resources in the public cloud.
|
||||
|
||||
Company A has an established data center with a substantial amount
|
||||
of hardware. Migrating the workloads to a public cloud is not feasible.
|
||||
|
||||
The company has an internal cloud management platform that directs
|
||||
requests to the appropriate cloud, depending on the local capacity.
|
||||
This is a custom in-house application written for this specific purpose.
|
||||
|
||||
This solution is depicted in the figure below:
|
||||
|
||||
.. figure:: figures/Multi-Cloud_Priv-Pub3.png
|
||||
:width: 100%
|
||||
|
||||
This example shows two clouds with a Cloud Management
|
||||
Platform (CMP) connecting them. This guide does not
|
||||
discuss a specific CMP, but describes how the Orchestration and
|
||||
Telemetry services handle, manage, and control workloads.
|
||||
|
||||
The private OpenStack cloud has at least one controller and at least
|
||||
one compute node. It includes metering using the Telemetry service.
|
||||
The Telemetry service captures the load increase and the CMP
|
||||
processes the information. If there is available capacity,
|
||||
the CMP uses the OpenStack API to call the Orchestration service.
|
||||
This creates instances on the private cloud in response to user requests.
|
||||
When capacity is not available on the private cloud, the CMP issues
|
||||
a request to the Orchestration service API of the public cloud.
|
||||
This creates the instance on the public cloud.
|
||||
|
||||
In this example, Company A does not direct the deployments to an
|
||||
external public cloud due to concerns regarding resource control,
|
||||
security, and increased operational expense.
|
||||
|
||||
Bursting to a public non-OpenStack cloud
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The second example examines bursting workloads from the private cloud
|
||||
into a non-OpenStack public cloud using Amazon Web Services (AWS)
|
||||
to take advantage of additional capacity and to scale applications.
|
||||
|
||||
The following diagram demonstrates an OpenStack-to-AWS hybrid cloud:
|
||||
|
||||
.. figure:: figures/Multi-Cloud_Priv-AWS4.png
|
||||
:width: 100%
|
||||
|
||||
Company B states that its developers are already using AWS
|
||||
and do not want to change to a different provider.
|
||||
|
||||
If the CMP is capable of connecting to an external cloud
|
||||
provider with an appropriate API, the workflow process remains
|
||||
the same as the previous scenario.
|
||||
The actions the CMP takes, such as monitoring loads and
|
||||
creating new instances, stay the same.
|
||||
However, the CMP performs actions in the public cloud
|
||||
using applicable API calls.
|
||||
|
||||
If the public cloud is AWS, the CMP would use the
|
||||
EC2 API to create a new instance and assign an Elastic IP.
|
||||
It can then add that IP to HAProxy in the private cloud.
|
||||
The CMP can also reference AWS-specific
|
||||
tools such as CloudWatch and CloudFormation.
|
||||
|
||||
Several open source tool kits for building CMPs are
|
||||
available and can handle this kind of translation.
|
||||
Examples include ManageIQ, jClouds, and JumpGate.
|
||||
|
||||
High availability and disaster recovery
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Company C requires their local data center to be able to
|
||||
recover from failure. Some of the workloads currently in
|
||||
use are running on their private OpenStack cloud.
|
||||
Protecting the data involves Block Storage, Object Storage,
|
||||
and a database. The architecture supports the failure of
|
||||
large components of the system while ensuring that the
|
||||
system continues to deliver services.
|
||||
While the services remain available to users, the failed
|
||||
components are restored in the background based on standard
|
||||
best practice data replication policies.
|
||||
To achieve these objectives, Company C replicates data to
|
||||
a second cloud in a geographically distant location.
|
||||
The following diagram describes this system:
|
||||
|
||||
.. figure:: figures/Multi-Cloud_failover2.png
|
||||
:width: 100%
|
||||
|
||||
This example includes two private OpenStack clouds connected with a CMP.
|
||||
The source cloud, OpenStack Cloud 1, includes a controller and
|
||||
at least one instance running MySQL. It also includes at least
|
||||
one Block Storage volume and one Object Storage volume.
|
||||
This means that data is available to the users at all times.
|
||||
The details of the method for protecting each of these sources
|
||||
of data differs.
|
||||
|
||||
Object Storage relies on the replication capabilities of
|
||||
the Object Storage provider.
|
||||
Company C enables OpenStack Object Storage so that it creates
|
||||
geographically separated replicas that take advantage of this feature.
|
||||
The company configures storage so that at least one replica
|
||||
exists in each cloud. In order to make this work, the company
|
||||
configures a single array spanning both clouds with OpenStack Identity.
|
||||
Using Federated Identity, the array talks to both clouds, communicating
|
||||
with OpenStack Object Storage through the Swift proxy.
|
||||
|
||||
For Block Storage, the replication is a little more difficult,
|
||||
and involves tools outside of OpenStack itself.
|
||||
The OpenStack Block Storage volume is not set as the drive itself
|
||||
but as a logical object that points to a physical back end.
|
||||
Disaster recovery is configured for Block Storage for
|
||||
synchronous backup for the highest level of data protection,
|
||||
but asynchronous backup could have been set as an alternative
|
||||
that is not as latency sensitive.
|
||||
For asynchronous backup, the Block Storage API makes it possible
|
||||
to export the data and also the metadata of a particular volume,
|
||||
so that it can be moved and replicated elsewhere.
|
||||
More information can be found here:
|
||||
https://blueprints.launchpad.net/cinder/+spec/cinder-backup-volume-metadata-support.
|
||||
|
||||
The synchronous backups create an identical volume in both
|
||||
clouds and chooses the appropriate flavor so that each cloud
|
||||
has an identical back end. This is done by creating volumes
|
||||
through the CMP. After this is configured, a solution
|
||||
involving DRDB synchronizes the physical drives.
|
||||
|
||||
The database component is backed up using synchronous backups.
|
||||
MySQL does not support geographically diverse replication,
|
||||
so disaster recovery is provided by replicating the file itself.
|
||||
As it is not possible to use Object Storage as the back end of
|
||||
a database like MySQL, Swift replication is not an option.
|
||||
Company C decides not to store the data on another geo-tiered
|
||||
storage system, such as Ceph, as Block Storage.
|
||||
This would have given another layer of protection.
|
||||
Another option would have been to store the database on an OpenStack
|
||||
Block Storage volume and backing it up like any other Block Storage.
|
@ -1,14 +0,0 @@
|
||||
===========================
|
||||
Cloud architecture examples
|
||||
===========================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
arch-examples-general.rst
|
||||
arch-examples-compute.rst
|
||||
arch-examples-storage.rst
|
||||
arch-examples-network.rst
|
||||
arch-examples-multi-site.rst
|
||||
arch-examples-hybrid.rst
|
||||
arch-examples-specialized.rst
|
(15 binary figure files carried over unchanged: identical width, height, and size before and after)
@ -4,3 +4,13 @@
|
||||
Use cases
|
||||
=========
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
use-cases/use-case-development
|
||||
use-cases/use-case-general-compute
|
||||
use-cases/use-case-web-scale
|
||||
use-cases/use-case-public
|
||||
use-cases/use-case-storage
|
||||
use-cases/use-case-multisite
|
||||
use-cases/use-case-nfv
|
||||
|
@ -0,0 +1,17 @@
|
||||
.. _development-cloud:
|
||||
|
||||
=================
|
||||
Development cloud
|
||||
=================
|
||||
|
||||
Stakeholder
|
||||
~~~~~~~~~~~
|
||||
|
||||
User stories
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Design model
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Component block diagram
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
@ -0,0 +1,385 @@
|
||||
.. _general-compute-cloud:
|
||||
|
||||
=====================
|
||||
General compute cloud
|
||||
=====================
|
||||
|
||||
Design model
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Hybrid cloud environments are designed for these use cases:
|
||||
|
||||
* Bursting workloads from private to public OpenStack clouds
|
||||
* Bursting workloads from private to public non-OpenStack clouds
|
||||
* High availability across clouds (for technical diversity)
|
||||
|
||||
This chapter provides examples of environments that address
|
||||
each of these use cases.
|
||||
|
||||
|
||||
Component block diagram
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
Stakeholder
|
||||
~~~~~~~~~~~
|
||||
|
||||
|
||||
User stories
|
||||
~~~~~~~~~~~~
|
||||
|
||||
General cloud example
|
||||
---------------------
|
||||
|
||||
An online classified advertising company wants to run web applications
|
||||
consisting of Tomcat, Nginx and MariaDB in a private cloud. To meet the
|
||||
policy requirements, the cloud infrastructure will run in their
|
||||
own data center. The company has predictable load requirements, but
|
||||
requires scaling to cope with nightly increases in demand. Their current
|
||||
environment does not have the flexibility to align with their goal of
|
||||
running an open source API environment. The current environment consists
|
||||
of the following:
|
||||
|
||||
* Between 120 and 140 installations of Nginx and Tomcat, each with 2
|
||||
vCPUs and 4 GB of RAM
|
||||
|
||||
* A three-node MariaDB and Galera cluster, each with 4 vCPUs and 8 GB
|
||||
RAM
|
||||
|
||||
The company runs hardware load balancers and multiple web applications
|
||||
serving their websites, and orchestrates environments using combinations
|
||||
of scripts and Puppet. The website generates large amounts of log data
|
||||
daily that requires archiving.
|
||||
|
||||
The solution would consist of the following OpenStack components:
|
||||
|
||||
* A firewall, switches and load balancers on the public facing network
|
||||
connections.
|
||||
|
||||
* OpenStack Controller service running Image, Identity, Networking,
|
||||
combined with support services such as MariaDB and RabbitMQ,
|
||||
configured for high availability on at least three controller nodes.
|
||||
|
||||
* OpenStack Compute nodes running the KVM hypervisor.
|
||||
|
||||
* OpenStack Block Storage for use by compute instances, requiring
|
||||
persistent storage (such as databases for dynamic sites).
|
||||
|
||||
* OpenStack Object Storage for serving static objects (such as images).
|
||||
|
||||
.. figure:: ../figures/General_Architecture3.png
|
||||
|
||||
Running up to 140 web instances and the small number of MariaDB
|
||||
instances requires 292 vCPUs available, as well as 584 GB RAM. On a
|
||||
typical 1U server using dual-socket hex-core Intel CPUs with
|
||||
Hyperthreading, and assuming a 2:1 CPU overcommit ratio, this would
|
||||
require 8 OpenStack Compute nodes.
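
The arithmetic behind those figures can be checked with a short
calculation. The per-node numbers below follow the stated assumptions
(dual-socket hex-core with Hyperthreading and a 2:1 vCPU overcommit);
all values are taken from the text above.

.. code-block:: python

   # Capacity check for the example workload (numbers from the text above).
   web_count, web_vcpus, web_ram_gb = 140, 2, 4
   db_count, db_vcpus, db_ram_gb = 3, 4, 8

   total_vcpus = web_count * web_vcpus + db_count * db_vcpus      # 292 vCPUs
   total_ram_gb = web_count * web_ram_gb + db_count * db_ram_gb   # 584 GB

   # A dual-socket hex-core node with Hyperthreading exposes 24 hardware
   # threads; a 2:1 vCPU overcommit therefore schedules 48 vCPUs per node.
   vcpus_per_node = 2 * 6 * 2 * 2
   min_nodes = -(-total_vcpus // vcpus_per_node)  # ceiling division

   print(total_vcpus, total_ram_gb, min_nodes)    # 292 584 7

Seven nodes is the bare minimum on CPU alone; the eight nodes quoted
above presumably leave headroom for per-node RAM limits and for host
failure or maintenance.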
|
||||
|
||||
The web application instances run from local storage on each of the
|
||||
OpenStack Compute nodes. The web application instances are stateless,
|
||||
meaning that any of the instances can fail and the application will
|
||||
continue to function.
|
||||
|
||||
MariaDB server instances store their data on shared enterprise storage,
|
||||
such as NetApp or SolidFire devices. If a MariaDB instance fails,
|
||||
storage would be expected to be re-attached to another instance and
|
||||
rejoined to the Galera cluster.
|
||||
|
||||
Logs from the web application servers are shipped to OpenStack Object
|
||||
Storage for processing and archiving.
|
||||
|
||||
Additional capabilities can be realized by moving static web content to
|
||||
be served from OpenStack Object Storage containers, and backing the
|
||||
OpenStack Image service with OpenStack Object Storage.
|
||||
|
||||
.. note::
|
||||
|
||||
Increasing OpenStack Object Storage means network bandwidth needs to
|
||||
be taken into consideration. Running OpenStack Object Storage with
|
||||
network connections offering 10 GbE or better connectivity is
|
||||
advised.
|
||||
|
||||
Leveraging Orchestration and Telemetry services is also worth considering
when providing auto-scaling, orchestrated web application
|
||||
environments. Defining the web applications in a
|
||||
:term:`Heat Orchestration Template (HOT)`
|
||||
negates the reliance on the current scripted Puppet
|
||||
solution.
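
As an illustration of that point, a minimal HOT document for one
stateless web-tier server might look like the following sketch. The
image, flavor, and network names are placeholders rather than values
from the company's environment, and the template is built as a Python
dictionary here only so the structure can be generated or validated
programmatically.

.. code-block:: python

   import yaml  # PyYAML, assumed to be available

   # Minimal HOT skeleton; resource property values are hypothetical.
   template = {
       "heat_template_version": "2013-05-23",
       "description": "One stateless web-tier server for the example workload",
       "resources": {
           "web_server": {
               "type": "OS::Nova::Server",
               "properties": {
                   "image": "web-tier-image",      # placeholder image name
                   "flavor": "m1.medium",          # 2 vCPU / 4 GB class flavor
                   "networks": [{"network": "web-tier-net"}],
               },
           },
       },
   }

   print(yaml.safe_dump(template, default_flow_style=False))

Wrapping such a server definition in an autoscaling group and driving
it from Telemetry alarms is the usual next step, and is what replaces
the current scripted Puppet scaling.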
|
||||
|
||||
OpenStack Networking can be used to control hardware load balancers
|
||||
through the use of plug-ins and the Networking API. This allows users to
|
||||
control hardware load balancer pools and instances as members in these
|
||||
pools, but their use in production environments must be carefully
|
||||
weighed against current stability.
|
||||
|
||||
|
||||
Compute-focused cloud example
|
||||
-----------------------------
|
||||
|
||||
The Conseil Européen pour la Recherche Nucléaire (CERN), also known as
|
||||
the European Organization for Nuclear Research, provides particle
|
||||
accelerators and other infrastructure for high-energy physics research.
|
||||
|
||||
As of 2011 CERN operated these two compute centers in Europe with plans
|
||||
to add a third.
|
||||
|
||||
+-----------------------+------------------------+
|
||||
| Data center | Approximate capacity |
|
||||
+=======================+========================+
|
||||
| Geneva, Switzerland | - 3.5 Mega Watts |
|
||||
| | |
|
||||
| | - 91000 cores |
|
||||
| | |
|
||||
| | - 120 PB HDD |
|
||||
| | |
|
||||
| | - 100 PB Tape |
|
||||
| | |
|
||||
| | - 310 TB Memory |
|
||||
+-----------------------+------------------------+
|
||||
| Budapest, Hungary | - 2.5 Mega Watts |
|
||||
| | |
|
||||
| | - 20000 cores |
|
||||
| | |
|
||||
| | - 6 PB HDD |
|
||||
+-----------------------+------------------------+
|
||||
|
||||
To support a growing number of compute-heavy users of experiments
|
||||
related to the Large Hadron Collider (LHC), CERN ultimately elected to
|
||||
deploy an OpenStack cloud using Scientific Linux and RDO. This effort
|
||||
aimed to simplify the management of the center's compute resources with
|
||||
a view to doubling compute capacity through the addition of a data
|
||||
center in 2013 while maintaining the same levels of compute staff.
|
||||
|
||||
The CERN solution uses :term:`cells <cell>` for segregation of compute
|
||||
resources and for transparently scaling between different data centers.
|
||||
This decision meant trading off support for security groups and live
|
||||
migration. In addition, they must manually replicate some details, like
|
||||
flavors, across cells. In spite of these drawbacks cells provide the
|
||||
required scale while exposing a single public API endpoint to users.
|
||||
|
||||
CERN created a compute cell for each of the two original data centers
|
||||
and created a third when it added a new data center in 2013. Each cell
|
||||
contains three availability zones to further segregate compute resources
|
||||
and at least three RabbitMQ message brokers configured for clustering
|
||||
with mirrored queues for high availability.
|
||||
|
||||
The API cell, which resides behind a HAProxy load balancer, is in the
|
||||
data center in Switzerland and directs API calls to compute cells using
|
||||
a customized variation of the cell scheduler. The customizations allow
|
||||
certain workloads to route to a specific data center or all data
|
||||
centers, with cell RAM availability determining cell selection in the
|
||||
latter case.
|
||||
|
||||
.. figure:: ../figures/Generic_CERN_Example.png
|
||||
|
||||
There is also some customization of the filter scheduler that handles
|
||||
placement within the cells (an illustrative sketch follows this list):
|
||||
|
||||
ImagePropertiesFilter
|
||||
Provides special handling depending on the guest operating system in
|
||||
use (Linux-based or Windows-based).
|
||||
|
||||
ProjectsToAggregateFilter
|
||||
Provides special handling depending on which project the instance is
|
||||
associated with.
|
||||
|
||||
default_schedule_zones
|
||||
Allows the selection of multiple default availability zones, rather
|
||||
than a single default.
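
CERN's actual filter code is not reproduced in this guide. As a
stand-alone illustration of the per-project idea behind
``ProjectsToAggregateFilter``, the sketch below mimics the host-filter
interface with plain dictionaries so it runs without an OpenStack
installation; the ``filter_projects`` metadata key and all other names
are assumptions, not CERN's implementation.

.. code-block:: python

   class ProjectsToAggregateFilterSketch(object):
       """Pass only hosts whose aggregate metadata lists the request's project.

       Illustrative only: a real nova filter would subclass BaseHostFilter
       and read aggregate metadata from the scheduler's host state.
       """

       def host_passes(self, host_metadata, request):
           allowed = host_metadata.get("filter_projects", set())
           # An empty allow-list means the host is open to any project.
           return not allowed or request["project_id"] in allowed


   f = ProjectsToAggregateFilterSketch()
   print(f.host_passes({"filter_projects": {"atlas"}}, {"project_id": "atlas"}))  # True
   print(f.host_passes({"filter_projects": {"atlas"}}, {"project_id": "cms"}))    # False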
|
||||
|
||||
A central database team manages the MySQL database server in each cell
|
||||
in an active/passive configuration with a NetApp storage back end.
|
||||
Backups run every 6 hours.
|
||||
|
||||
Network architecture
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
To integrate with existing networking infrastructure, CERN made
|
||||
customizations to legacy networking (nova-network). This was in the form
|
||||
of a driver to integrate with CERN's existing database for tracking MAC
|
||||
and IP address assignments.
|
||||
|
||||
The driver facilitates selection of a MAC address and IP for new
|
||||
instances based on the compute node where the scheduler places the
|
||||
instance.
|
||||
|
||||
The driver considers the compute node where the scheduler placed an
|
||||
instance and selects a MAC address and IP from the pre-registered list
|
||||
associated with that node in the database. The database updates to
|
||||
reflect the address assignment to that instance.
|
||||
|
||||
Storage architecture
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
CERN deploys the OpenStack Image service in the API cell and configures
|
||||
it to expose version 1 (V1) of the API. This also requires the image
|
||||
registry. The storage back end in use is a 3 PB Ceph cluster.
|
||||
|
||||
CERN maintains a small set of Scientific Linux 5 and 6 images onto which
|
||||
orchestration tools can place applications. Puppet manages instance
|
||||
configuration and customization.
|
||||
|
||||
Monitoring
|
||||
^^^^^^^^^^
|
||||
|
||||
CERN does not require direct billing, but uses the Telemetry service to
|
||||
perform metering for the purposes of adjusting project quotas. CERN uses
|
||||
a sharded, replicated, MongoDB back-end. To spread API load, CERN
|
||||
deploys instances of the nova-api service within the child cells for
|
||||
Telemetry to query against. This also requires the configuration of
|
||||
supporting services such as keystone, glance-api, and glance-registry in
|
||||
the child cells.
|
||||
|
||||
.. figure:: ../figures/Generic_CERN_Architecture.png
|
||||
|
||||
Additional monitoring tools in use include
|
||||
`Flume <http://flume.apache.org/>`_, `Elastic
|
||||
Search <http://www.elasticsearch.org/>`_,
|
||||
`Kibana <http://www.elasticsearch.org/overview/kibana/>`_, and the CERN
|
||||
developed `Lemon <http://lemon.web.cern.ch/lemon/index.shtml>`_
|
||||
project.
|
||||
|
||||
|
||||
|
||||
Hybrid cloud example: bursting to a public OpenStack cloud
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Company A's data center is running low on capacity.
|
||||
It is not possible to expand the data center in the foreseeable future.
|
||||
In order to accommodate the continuously growing need for
|
||||
development resources in the organization,
|
||||
Company A decides to use resources in the public cloud.
|
||||
|
||||
Company A has an established data center with a substantial amount
|
||||
of hardware. Migrating the workloads to a public cloud is not feasible.
|
||||
|
||||
The company has an internal cloud management platform that directs
|
||||
requests to the appropriate cloud, depending on the local capacity.
|
||||
This is a custom in-house application written for this specific purpose.
|
||||
|
||||
This solution is depicted in the figure below:
|
||||
|
||||
.. figure:: ../figures/Multi-Cloud_Priv-Pub3.png
|
||||
:width: 100%
|
||||
|
||||
This example shows two clouds with a Cloud Management
|
||||
Platform (CMP) connecting them. This guide does not
|
||||
discuss a specific CMP, but describes how the Orchestration and
|
||||
Telemetry services handle, manage, and control workloads.
|
||||
|
||||
The private OpenStack cloud has at least one controller and at least
|
||||
one compute node. It includes metering using the Telemetry service.
|
||||
The Telemetry service captures the load increase and the CMP
|
||||
processes the information. If there is available capacity,
|
||||
the CMP uses the OpenStack API to call the Orchestration service.
|
||||
This creates instances on the private cloud in response to user requests.
|
||||
When capacity is not available on the private cloud, the CMP issues
|
||||
a request to the Orchestration service API of the public cloud.
|
||||
This creates the instance on the public cloud.
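
The placement decision itself is simple to express. The sketch below is
a toy model of that CMP logic, not a real cloud management platform:
the class, attribute, and method names are invented for illustration,
and a real system would call each cloud's Orchestration API where the
comment indicates.

.. code-block:: python

   class Cloud(object):
       """Toy view of one cloud as the CMP sees it (names are illustrative)."""

       def __init__(self, name, free_vcpus):
           self.name = name
           self.free_vcpus = free_vcpus

       def launch_stack(self, vcpus):
           # A real CMP would call this cloud's Orchestration (heat) API here.
           self.free_vcpus -= vcpus
           return "stack created on %s" % self.name


   def place(private, public, requested_vcpus):
       """Prefer the private cloud; burst to the public cloud only when full."""
       target = private if private.free_vcpus >= requested_vcpus else public
       return target.launch_stack(requested_vcpus)


   print(place(Cloud("private", 8), Cloud("public", 500), requested_vcpus=16))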
|
||||
|
||||
In this example, Company A does not direct the deployments to an
|
||||
external public cloud due to concerns regarding resource control,
|
||||
security, and increased operational expense.
|
||||
|
||||
Hybrid cloud example: bursting to a public non-OpenStack cloud
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The second example examines bursting workloads from the private cloud
|
||||
into a non-OpenStack public cloud using Amazon Web Services (AWS)
|
||||
to take advantage of additional capacity and to scale applications.
|
||||
|
||||
The following diagram demonstrates an OpenStack-to-AWS hybrid cloud:
|
||||
|
||||
.. figure:: ../figures/Multi-Cloud_Priv-AWS4.png
|
||||
:width: 100%
|
||||
|
||||
Company B states that its developers are already using AWS
|
||||
and do not want to change to a different provider.
|
||||
|
||||
If the CMP is capable of connecting to an external cloud
|
||||
provider with an appropriate API, the workflow process remains
|
||||
the same as the previous scenario.
|
||||
The actions the CMP takes, such as monitoring loads and
|
||||
creating new instances, stay the same.
|
||||
However, the CMP performs actions in the public cloud
|
||||
using applicable API calls.
|
||||
|
||||
If the public cloud is AWS, the CMP would use the
|
||||
EC2 API to create a new instance and assign an Elastic IP.
|
||||
It can then add that IP to HAProxy in the private cloud.
|
||||
The CMP can also reference AWS-specific
|
||||
tools such as CloudWatch and CloudFormation.
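
A fragment of what that AWS-side action could look like is sketched
below using the boto3 library. The AMI ID, instance type, and region
are placeholders, and wiring the resulting Elastic IP into HAProxy is
left as a comment because it depends entirely on the CMP in use.

.. code-block:: python

   import boto3  # AWS SDK for Python, assumed to be installed and configured

   ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region

   # Launch one burst instance from a placeholder image.
   resp = ec2.run_instances(ImageId="ami-00000000", InstanceType="m4.large",
                            MinCount=1, MaxCount=1)
   instance_id = resp["Instances"][0]["InstanceId"]

   # Allocate an Elastic IP and attach it to the new instance.
   alloc = ec2.allocate_address(Domain="vpc")
   ec2.associate_address(InstanceId=instance_id,
                         AllocationId=alloc["AllocationId"])

   # The CMP would now add alloc["PublicIp"] as a backend server in the
   # HAProxy configuration running in the private cloud.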
|
||||
|
||||
Several open source tool kits for building CMPs are
|
||||
available and can handle this kind of translation.
|
||||
Examples include ManageIQ, jClouds, and JumpGate.
|
||||
|
||||
Hybrid cloud example: high availability and disaster recovery
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Company C requires their local data center to be able to
|
||||
recover from failure. Some of the workloads currently in
|
||||
use are running on their private OpenStack cloud.
|
||||
Protecting the data involves Block Storage, Object Storage,
|
||||
and a database. The architecture supports the failure of
|
||||
large components of the system while ensuring that the
|
||||
system continues to deliver services.
|
||||
While the services remain available to users, the failed
|
||||
components are restored in the background based on standard
|
||||
best practice data replication policies.
|
||||
To achieve these objectives, Company C replicates data to
|
||||
a second cloud in a geographically distant location.
|
||||
The following diagram describes this system:
|
||||
|
||||
.. figure:: ../figures/Multi-Cloud_failover2.png
|
||||
:width: 100%
|
||||
|
||||
This example includes two private OpenStack clouds connected with a CMP.
|
||||
The source cloud, OpenStack Cloud 1, includes a controller and
|
||||
at least one instance running MySQL. It also includes at least
|
||||
one Block Storage volume and one Object Storage volume.
|
||||
This means that data is available to the users at all times.
|
||||
The details of the method for protecting each of these sources
|
||||
of data differ.
|
||||
|
||||
Object Storage relies on the replication capabilities of
|
||||
the Object Storage provider.
|
||||
Company C enables OpenStack Object Storage so that it creates
|
||||
geographically separated replicas that take advantage of this feature.
|
||||
The company configures storage so that at least one replica
|
||||
exists in each cloud. In order to make this work, the company
|
||||
configures a single array spanning both clouds with OpenStack Identity.
|
||||
Using Federated Identity, the array talks to both clouds, communicating
|
||||
with OpenStack Object Storage through the Swift proxy.
|
||||
|
||||
For Block Storage, the replication is a little more difficult,
|
||||
and involves tools outside of OpenStack itself.
|
||||
The OpenStack Block Storage volume is not set as the drive itself
|
||||
but as a logical object that points to a physical back end.
|
||||
Disaster recovery is configured for Block Storage for
|
||||
synchronous backup for the highest level of data protection,
|
||||
but asynchronous backup could have been set as an alternative
|
||||
that is not as latency sensitive.
|
||||
For asynchronous backup, the Block Storage API makes it possible
|
||||
to export the data and also the metadata of a particular volume,
|
||||
so that it can be moved and replicated elsewhere.
|
||||
More information can be found here:
|
||||
https://blueprints.launchpad.net/cinder/+spec/cinder-backup-volume-metadata-support.
|
||||
|
||||
The synchronous backups create an identical volume in both
|
||||
clouds and choose the appropriate flavor so that each cloud
|
||||
has an identical back end. This is done by creating volumes
|
||||
through the CMP. After this is configured, a solution
|
||||
involving DRBD synchronizes the physical drives.
|
||||
|
||||
The database component is backed up using synchronous backups.
|
||||
MySQL does not support geographically diverse replication,
|
||||
so disaster recovery is provided by replicating the file itself.
|
||||
As it is not possible to use Object Storage as the back end of
|
||||
a database like MySQL, Swift replication is not an option.
|
||||
Company C decides not to store the data on another geo-tiered
|
||||
storage system, such as Ceph, as Block Storage.
|
||||
This would have given another layer of protection.
|
||||
Another option would have been to store the database on an OpenStack
|
||||
Block Storage volume and back it up like any other Block Storage volume.
|
||||
|
@ -1,13 +1,26 @@
|
||||
=========================
|
||||
Multi-site cloud examples
|
||||
=========================
|
||||
.. _multisite-cloud:
|
||||
|
||||
================
|
||||
Multi-site cloud
|
||||
================
|
||||
|
||||
Design Model
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Component block diagram
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Stakeholder
|
||||
~~~~~~~~~~~
|
||||
|
||||
User stories
|
||||
~~~~~~~~~~~~
|
||||
|
||||
There are multiple ways to build a multi-site OpenStack installation,
|
||||
based on the needs of the intended workloads. Below are example
|
||||
architectures based on different requirements. These examples are meant
|
||||
as a reference, and not a hard and fast rule for deployments. Use the
|
||||
previous sections of this chapter to assist in selecting specific
|
||||
components and implementations based on specific needs.
|
||||
architectures based on different requirements, which are not hard and
|
||||
fast rules for deployment. Refer to previous sections to assist in
|
||||
selecting specific components and implementations based on your needs.
|
||||
|
||||
A large content provider needs to deliver content to customers that are
|
||||
geographically dispersed. The workload is very sensitive to latency and
|
||||
@ -64,18 +77,18 @@ center in each of the edge regional locations house a second region near
|
||||
the first region. This ensures that the application does not suffer
|
||||
degraded performance in terms of latency and availability.
|
||||
|
||||
:ref:`ms-customer-edge` depicts the solution designed to have both a
|
||||
The following figure depicts the solution designed to have both a
|
||||
centralized set of core data centers for OpenStack services and paired edge
|
||||
data centers:
|
||||
data centers.
|
||||
|
||||
.. _ms-customer-edge:
|
||||
|
||||
.. figure:: figures/Multi-Site_Customer_Edge.png
|
||||
|
||||
**Multi-site architecture example**
|
||||
|
||||
Geo-redundant load balancing
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
.. figure:: ../figures/Multi-Site_Customer_Edge.png
|
||||
|
||||
|
||||
Geo-redundant load balancing example
|
||||
------------------------------------
|
||||
|
||||
A large-scale web application has been designed with cloud principles in
|
||||
mind. The application is designed to provide service to an application store,
|
||||
@ -83,7 +96,7 @@ on a 24/7 basis. The company has a typical two-tier architecture with a
|
||||
web front-end servicing the customer requests, and a NoSQL database back
|
||||
end storing the information.
|
||||
|
||||
As of late there has been several outages in number of major public
|
||||
Recently there have been several outages in a number of major public
|
||||
cloud providers due to applications running out of a single geographical
|
||||
location. The design therefore should mitigate the chance of a single
|
||||
site causing an outage for their business.
|
||||
@ -155,12 +168,13 @@ not have any awareness of geo location.
|
||||
|
||||
.. _ms-geo-redundant:
|
||||
|
||||
.. figure:: figures/Multi-site_Geo_Redundant_LB.png
|
||||
|
||||
**Multi-site geo-redundant architecture**
|
||||
|
||||
Location-local service
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
.. figure:: ../figures/Multi-site_Geo_Redundant_LB.png
|
||||
|
||||
|
||||
Location-local service example
|
||||
------------------------------
|
||||
|
||||
A common use for multi-site OpenStack deployment is creating a Content
|
||||
Delivery Network. An application that uses a location-local architecture
|
||||
@ -187,6 +201,6 @@ application completes the request.
|
||||
|
||||
.. _ms-shared-keystone:
|
||||
|
||||
.. figure:: figures/Multi-Site_shared_keystone1.png
|
||||
.. figure:: ../figures/Multi-Site_shared_keystone1.png
|
||||
|
||||
**Multi-site shared keystone architecture**
|
@ -1,12 +1,28 @@
|
||||
.. _nfv-cloud:
|
||||
|
||||
==============================
|
||||
Network-focused cloud examples
|
||||
Network virtual function cloud
|
||||
==============================
|
||||
|
||||
An organization designs a large-scale web application with cloud
|
||||
principles in mind. The application scales horizontally in a bursting
|
||||
fashion and generates a high instance count. The application requires an
|
||||
SSL connection to secure data and must not lose connection state to
|
||||
individual servers.
|
||||
Stakeholder
|
||||
~~~~~~~~~~~
|
||||
|
||||
Design model
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Component block diagram
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
User stories
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Network-focused cloud examples
|
||||
------------------------------
|
||||
|
||||
An organization designs a large-scale cloud-based web application. The
|
||||
application scales horizontally in a bursting fashion and generates a
|
||||
high instance count. The application requires an SSL connection to secure
|
||||
data and must not lose connection state to individual servers.
|
||||
|
||||
The figure below depicts an example design for this workload. In this
|
||||
example, a hardware load balancer provides SSL offload functionality and
|
||||
@ -28,7 +44,7 @@ vSwitch agent in GRE tunnel mode. This ensures all devices can reach all
|
||||
other devices and that you can create tenant networks for private
|
||||
addressing links to the load balancer.
|
||||
|
||||
.. figure:: figures/Network_Web_Services1.png
|
||||
.. figure:: ../figures/Network_Web_Services1.png
|
||||
|
||||
A web service architecture has many options and optional components. Due
|
||||
to this, it can fit into a large number of other OpenStack designs. A
|
||||
@ -153,7 +169,7 @@ east-west traffic
|
||||
specific direction. However this traffic might interfere with
|
||||
north-south traffic.
|
||||
|
||||
.. figure:: figures/Network_Cloud_Storage2.png
|
||||
.. figure:: ../figures/Network_Cloud_Storage2.png
|
||||
|
||||
This application prioritizes the north-south traffic over east-west
|
||||
traffic: the north-south traffic involves customer-facing data.
|
doc/arch-design-draft/source/use-cases/use-case-public.rst (new file, 17 lines)
@ -0,0 +1,17 @@
|
||||
.. _public-cloud:
|
||||
|
||||
============
|
||||
Public cloud
|
||||
============
|
||||
|
||||
Stakeholder
|
||||
~~~~~~~~~~~
|
||||
|
||||
User stories
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Design model
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Component block diagram
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
@ -1,6 +1,11 @@
|
||||
==============================
|
||||
Storage-focused cloud examples
|
||||
==============================
|
||||
.. _storage-cloud:
|
||||
|
||||
=============
|
||||
Storage cloud
|
||||
=============
|
||||
|
||||
Design model
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Storage-focused architecture depends on specific use cases. This section
|
||||
discusses three example use cases:
|
||||
@ -11,15 +16,22 @@ discusses three example use cases:
|
||||
|
||||
* High performance database
|
||||
|
||||
Component block diagram
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
Stakeholder
|
||||
~~~~~~~~~~~
|
||||
|
||||
User stories
|
||||
~~~~~~~~~~~~
|
||||
|
||||
The example below shows a REST interface without a high performance
|
||||
requirement.
|
||||
requirement. The following diagram depicts the example architecture:
|
||||
|
||||
Swift is a highly scalable object store that is part of the OpenStack
|
||||
project. This diagram explains the example architecture:
|
||||
.. figure:: ../figures/Storage_Object.png
|
||||
|
||||
.. figure:: figures/Storage_Object.png
|
||||
|
||||
The example REST interface, presented as a traditional Object store
|
||||
The example REST interface, presented as a traditional Object Store
|
||||
running on traditional spindles, does not require a high performance
|
||||
caching tier.
|
||||
|
||||
@ -48,11 +60,11 @@ Proxy:
|
||||
|
||||
.. note::
|
||||
|
||||
It may be necessary to implement a 3rd-party caching layer for some
|
||||
It may be necessary to implement a third-party caching layer for some
|
||||
applications to achieve suitable performance.
|
||||
|
||||
Compute analytics with Data processing service
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Compute analytics with data processing service
|
||||
----------------------------------------------
|
||||
|
||||
Analytics of large data sets are dependent on the performance of the
|
||||
storage system. Clouds using storage systems such as Hadoop Distributed
|
||||
@ -68,7 +80,7 @@ OpenStack has integration with Hadoop to manage the Hadoop cluster
|
||||
within the cloud. The following diagram shows an OpenStack store with a
|
||||
high performance requirement:
|
||||
|
||||
.. figure:: figures/Storage_Hadoop3.png
|
||||
.. figure:: ../figures/Storage_Hadoop3.png
|
||||
|
||||
The hardware requirements and configuration are similar to those of the
|
||||
High Performance Database example below. In this case, the architecture
|
||||
@ -96,9 +108,9 @@ database example below, a portion of the SSD pool can act as a block
|
||||
device to the Database server. In the high performance analytics
|
||||
example, the inline SSD cache layer accelerates the REST interface.
|
||||
|
||||
.. figure:: figures/Storage_Database_+_Object5.png
|
||||
.. figure:: ../figures/Storage_Database_+_Object5.png
|
||||
|
||||
In this example, Ceph presents a Swift-compatible REST interface, as
|
||||
In this example, Ceph presents a swift-compatible REST interface, as
|
||||
well as a block level storage from a distributed storage cluster. It is
|
||||
highly flexible and has features that enable reduced cost of operations
|
||||
such as self healing and auto balancing. Using erasure coded pools is a
|
@ -0,0 +1,17 @@
|
||||
.. _web-scale-cloud:
|
||||
|
||||
===============
|
||||
Web scale cloud
|
||||
===============
|
||||
|
||||
Stakeholder
|
||||
~~~~~~~~~~~
|
||||
|
||||
User stories
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Design model
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Component block diagram
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|