[arch-design] Migrate content to new structure

Migrate content changes in Mitaka to the new book structure:
1. Move HA content to the High Availability chapter
2. Move Capacity planning and scaling content to the Storage Design chapter
3. Move Compute resource design content to the Compute Nodes chapter

Change-Id: I6407e7e848dbfe3f8cedafa4596df5ab553eb2b7
Implements: blueprint arch-guide-restructure
daz 2016-07-05 15:49:17 +10:00
parent b1e0ff33f5
commit 4b7de72c4e
7 changed files with 755 additions and 751 deletions

@ -1,190 +0,0 @@
.. _high-availability:
=================
High availability
=================
.. toctree::
:maxdepth: 2
Data Plane and Control Plane
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When designing an OpenStack cloud, it is important to consider the needs
dictated by the :term:`Service Level Agreement (SLA)` in terms of the core
services required to maintain availability of running Compute service
instances, networks, storage and additional services running on top of those
resources. These services are often referred to as the Data Plane services,
and are generally expected to be available all the time.
The remaining services, responsible for CRUD operations, metering, monitoring,
and so on, are often referred to as the Control Plane. The SLA is likely to
dictate a lower uptime requirement for these services.
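The difference between a Data Plane and a Control Plane uptime target can be made concrete by converting each target into a yearly downtime budget. The following is a minimal sketch; the 99.99% and 99.9% figures are illustrative assumptions, not values taken from this guide or from any particular SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # simplification: ignores leap years


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Hypothetical targets chosen only for illustration.
    for plane, target in (("Data Plane", 99.99), ("Control Plane", 99.9)):
        minutes = allowed_downtime_minutes(target)
        print(f"{plane} at {target}%: {minutes:.1f} minutes per year")
```

A tenfold difference in the allowed downtime is often what makes it acceptable to reboot a controller for maintenance while instances keep running.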
The services comprising an OpenStack cloud have a number of requirements which
the architect needs to understand in order to be able to meet SLA terms. For
example, to provide the Compute service, a minimum of storage, message
queueing, and database services are necessary, as well as the networking between
them.
Ongoing maintenance operations are made much simpler if there is logical and
physical separation of Data Plane and Control Plane systems. It then becomes
possible to, for example, reboot a controller without affecting customers.
If one service failure affects the operation of an entire server (a 'noisy
neighbor'), the separation between Control and Data Planes enables rapid
maintenance with a limited effect on customer operations.
Eliminating Single Points of Failure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Within each site
----------------
OpenStack lends itself to deployment in a highly available manner where it is
expected that at least 2 servers be utilized. These can run all the services
involved from the message queuing service, for example ``RabbitMQ`` or
``QPID``, and an appropriately deployed database service such as ``MySQL`` or
``MariaDB``. As services in the cloud are scaled out, back-end services will
need to scale too. Monitoring and reporting on server utilization and response
times, as well as load testing your systems, will help determine scale out
decisions.
The OpenStack services themselves should be deployed across multiple servers
that do not represent a single point of failure. Ensuring availability can
be achieved by placing these services behind highly available load balancers
that have multiple OpenStack servers as members.
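The benefit of placing several servers behind a load balancer can be estimated with a simple probability model. The sketch below assumes that members fail independently and that the pool is available while at least one member is up; both are simplifying assumptions, and the 99% per-server figure is hypothetical.

```python
def combined_availability(member_availability: float, members: int) -> float:
    """Availability of a pool that is up while at least one member is up.

    Assumes member failures are independent, which real deployments
    sharing power, network, or software faults may not satisfy.
    """
    if members < 1:
        raise ValueError("a pool needs at least one member")
    if not 0 <= member_availability <= 1:
        raise ValueError("member_availability must be between 0 and 1")
    return 1 - (1 - member_availability) ** members


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, "member(s):", f"{combined_availability(0.99, count):.6f}")
```

The model ignores the availability of the load balancer itself, which is why the load balancers also need to be highly available.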
There are a small number of OpenStack services which are intended to only run
in one place at a time (e.g. the ``ceilometer-agent-central`` service). In
order to prevent these services from becoming a single point of failure, they
can be controlled by clustering software such as ``Pacemaker``.
In OpenStack, the infrastructure is integral to providing services and should
always be available, especially when operating with SLAs. Ensuring network
availability is accomplished by designing the network architecture so that no
single point of failure exists. A consideration of the number of switches,
routes and redundancies of power should be factored into core infrastructure,
as well as the associated bonding of networks to provide diverse routes to your
highly available switch infrastructure.
Care must be taken when deciding network functionality. Currently, OpenStack
supports both the legacy networking (nova-network) system and the newer,
extensible OpenStack Networking (neutron). OpenStack Networking and legacy
networking both have their advantages and disadvantages. They are both valid
and supported options that fit different network deployment models described in
the `OpenStack Operations Guide
<http://docs.openstack.org/ops-guide/arch-network-design.html#network-topology>`_.
When using the Networking service, the OpenStack controller servers or separate
Networking hosts handle routing unless the dynamic virtual routers pattern for
routing is selected. Running routing directly on the controller servers mixes
the Data and Control Planes and can cause complex issues with performance and
troubleshooting. It is possible to use third party software and external
appliances that help maintain highly available layer three routes. Doing so
allows for common application endpoints to control network hardware, or to
provide complex multi-tier web applications in a secure manner. It is also
possible to completely remove routing from Networking, and instead rely on
hardware routing capabilities. In this case, the switching infrastructure must
support layer three routing.
Application design must also be factored into the capabilities of the
underlying cloud infrastructure. If the compute hosts do not provide a seamless
live migration capability, then it must be expected that if a compute host
fails, that instance and any data local to that instance will be deleted.
However, when providing an expectation to users that instances have a
high-level of uptime guaranteed, the infrastructure must be deployed in a way
that eliminates any single point of failure if a compute host disappears.
This may include utilizing shared file systems on enterprise storage or
OpenStack Block storage to provide a level of guarantee to match service
features.
If using a storage design that includes shared access to centralized storage,
ensure that this is also designed without single points of failure and the SLA
for the solution matches or exceeds the expected SLA for the Data Plane.
Between sites in a multi region design
--------------------------------------
Some services are commonly shared between multiple regions, including the
Identity service and the Dashboard. In this case, it is necessary to ensure
that the databases backing the services are replicated, and that access to
multiple workers across each site can be maintained in the event of losing a
single region.
Multiple network links should be deployed between sites to provide redundancy
for all components. This includes storage replication, which should be isolated
to a dedicated network or VLAN with the ability to assign QoS to control the
replication traffic or provide priority for this traffic. Note that if the data
store is highly changeable, the network requirements could have a significant
effect on the operational cost of maintaining the sites.
If the design incorporates more than one site, the ability to maintain object
availability in both sites has significant implications on the object storage
design and implementation. It also has a significant impact on the WAN network
design between the sites.
If applications running in a cloud are not cloud-aware, there should be clear
measures and expectations to define what the infrastructure can and cannot
support. An example would be shared storage between sites. It is possible,
however such a solution is not native to OpenStack and requires a third-party
hardware vendor to fulfill such a requirement. Another example can be seen in
applications that are able to consume resources in object storage directly.
Connecting more than two sites increases the challenges and adds more
complexity to the design considerations. Multi-site implementations require
planning to address the additional topology used for internal and external
connectivity. Some options include full mesh topology, hub-and-spoke,
spine-leaf, and 3D torus.
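One way to see how complexity grows with the number of sites is to count the inter-site links that two of these topologies need. This is a rough sketch that counts one logical link per pair of connected sites and ignores the redundant links recommended earlier.

```python
def full_mesh_links(sites: int) -> int:
    """Every site connects directly to every other site."""
    if sites < 0:
        raise ValueError("sites must not be negative")
    return sites * (sites - 1) // 2


def hub_and_spoke_links(sites: int) -> int:
    """One hub site connects to each of the remaining sites."""
    if sites < 0:
        raise ValueError("sites must not be negative")
    return max(sites - 1, 0)


if __name__ == "__main__":
    for n in (2, 3, 5, 8):
        print(n, "sites:", full_mesh_links(n), "mesh links,",
              hub_and_spoke_links(n), "hub-and-spoke links")
```

Full mesh link counts grow quadratically, while hub-and-spoke grows linearly at the cost of making the hub a potential single point of failure.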
For more information on high availability in OpenStack, see the `OpenStack High
Availability Guide <http://docs.openstack.org/ha-guide/>`_.
Site loss and recovery
~~~~~~~~~~~~~~~~~~~~~~
Outages can cause partial or full loss of site functionality. Strategies
should be implemented to understand and plan for recovery scenarios.
* The deployed applications need to continue to function and, more
importantly, you must consider the impact on the performance and
reliability of the application if a site is unavailable.
* It is important to understand what happens to the replication of
objects and data between the sites when a site goes down. If this
causes queues to start building up, consider how long these queues
can safely exist until an error occurs.
* After an outage, ensure that operations of a site are resumed when it
comes back online. We recommend that you architect the recovery to
avoid race conditions.
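The question of how long replication queues can safely build up during an outage can be approximated with simple arithmetic. The sketch below is an illustration only; the function name, the parameters, and the idea of a fixed queue capacity are assumptions rather than properties of any specific replication system.

```python
def seconds_until_queue_full(capacity_bytes: float,
                             ingest_bytes_per_s: float,
                             drain_bytes_per_s: float = 0.0) -> float:
    """Estimate how long a replication backlog can grow before it is full.

    drain_bytes_per_s is the rate at which the remote site still accepts
    data; it is 0 when the remote site is completely unreachable.
    """
    growth = ingest_bytes_per_s - drain_bytes_per_s
    if growth <= 0:
        return float("inf")  # the backlog never grows
    return capacity_bytes / growth


if __name__ == "__main__":
    # Hypothetical numbers: 500 GB of queue space, 20 MB/s of changes.
    hours = seconds_until_queue_full(500e9, 20e6) / 3600
    print(f"Queue fills after roughly {hours:.1f} hours")
```

Comparing this estimate with the expected time to recover a site indicates whether the queue capacity is sufficient.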
Inter-site replication data
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Traditionally, replication has been the best method of protecting object store
implementations. A variety of replication methods exist in storage
architectures, for example synchronous and asynchronous mirroring. Most object
stores and back-end storage systems implement methods for replication at the
storage subsystem layer. Object stores also tailor replication techniques to
fit a cloud's requirements.
Organizations must find the right balance between data integrity and data
availability. Replication strategy may also influence disaster recovery
methods.
Replication across different racks, data centers, and geographical regions
increases focus on determining and ensuring data locality. The ability to
guarantee data is accessed from the nearest or fastest storage can be necessary
for applications to perform well.
.. note::
When running embedded object store methods, ensure that you do not
instigate extra data replication as this may cause performance issues.

@ -1,5 +1,5 @@
=============
Compute Nodes
Compute nodes
=============
.. toctree::
@ -13,6 +13,88 @@ when designing and building your compute nodes. Compute nodes form the
resource core of the OpenStack Compute cloud, providing the processing, memory,
network and storage resources to run instances.
Overview
~~~~~~~~
When designing compute resource pools, consider the number of processors,
amount of memory, and the quantity of storage required for each hypervisor.
Determine whether compute resources will be provided in a single pool or in
multiple pools. In most cases, multiple pools of resources can be allocated
and addressed on demand, commonly referred to as bin packing.
In a bin packing design, each independent resource pool provides service
for specific flavors. Since instances are scheduled onto compute hypervisors,
each independent node's resources will be allocated to efficiently use the
available hardware. Bin packing also requires a common hardware design,
with all hardware nodes within a compute resource pool sharing a common
processor, memory, and storage layout. This makes it easier to deploy,
support, and maintain nodes throughout their lifecycle.
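The placement idea behind bin packing can be illustrated with a first-fit-decreasing heuristic over a pool of identical hosts. This is a teaching sketch only: the ``Host`` class and ``first_fit`` function are hypothetical names, and the logic is not how the Compute scheduler works.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_vcpus: int
    free_ram_gb: int
    instances: list = field(default_factory=list)


def first_fit(hosts, requests):
    """Place (name, vcpus, ram_gb) requests, largest first.

    Each request goes to the first host with enough free vCPUs and RAM.
    Returns the names of the requests that could not be placed.
    """
    unplaced = []
    ordered = sorted(requests, key=lambda r: (r[1], r[2]), reverse=True)
    for name, vcpus, ram_gb in ordered:
        for host in hosts:
            if host.free_vcpus >= vcpus and host.free_ram_gb >= ram_gb:
                host.free_vcpus -= vcpus
                host.free_ram_gb -= ram_gb
                host.instances.append(name)
                break
        else:
            unplaced.append(name)
    return unplaced


if __name__ == "__main__":
    pool = [Host("node-a", 4, 8), Host("node-b", 4, 8)]
    wanted = [("i1", 2, 4), ("i2", 4, 8), ("i3", 2, 4), ("i4", 1, 1)]
    print("unplaced:", first_fit(pool, wanted))
    for host in pool:
        print(host.name, host.instances)
```

Because every host in the pool has the same layout, the heuristic only needs to compare requests against remaining capacity, which is one reason a common hardware design simplifies placement.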
Increasing the size of the supporting compute environment increases the
network traffic and messages, adding load to the controller or
networking nodes. Effective monitoring of the environment will help with
capacity decisions on scaling.
Compute nodes automatically attach to OpenStack clouds, resulting in a
horizontally scaling process when adding extra compute capacity to an
OpenStack cloud. Additional processes are required to place nodes into
appropriate availability zones and host aggregates. When adding
additional compute nodes to environments, ensure identical or functionally
compatible CPUs are used, otherwise live migration features will break.
It is necessary to add rack capacity or network switches as scaling out
compute hosts directly affects data center resources.
Compute host components can also be upgraded to account for increases in
demand, known as vertical scaling. Upgrading CPUs with more
cores, or increasing the overall server memory, can add extra needed
capacity depending on whether the running applications are more CPU
intensive or memory intensive.
When selecting a processor, compare features and performance
characteristics. Some processors include features specific to
virtualized compute hosts, such as hardware-assisted virtualization, and
technology related to memory paging (also known as EPT shadowing). These
types of features can have a significant impact on the performance of
your virtual machine.
The number of processor cores and threads impacts the number of worker
threads which can be run on a resource node. Design decisions must
relate directly to the service being run on it, as well as provide a
balanced infrastructure for all services.
Another option is to assess the average workloads and increase the
number of instances that can run within the compute environment by
adjusting the overcommit ratio. This ratio is configurable for CPU and
memory. The default CPU overcommit ratio is 16:1, and the default memory
overcommit ratio is 1.5:1. Determining the tuning of the overcommit
ratios during the design phase is important as it has a direct impact on
the hardware layout of your compute nodes.
.. note::
Changing the CPU overcommit ratio can have a detrimental effect
and can increase the likelihood of noisy neighbor problems.
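The effect of the overcommit ratios on hardware layout can be estimated before any hardware is ordered. The sketch below uses the default ratios stated above (16:1 for CPU and 1.5:1 for memory); the host size and flavor are hypothetical, and the calculation ignores memory and CPU reserved for the hypervisor itself.

```python
def schedulable_capacity(physical_cores: int, physical_ram_gb: float,
                         cpu_ratio: float = 16.0, ram_ratio: float = 1.5):
    """Return (vCPUs, RAM in GB) that can be allocated on one host."""
    return physical_cores * cpu_ratio, physical_ram_gb * ram_ratio


def max_instances(physical_cores: int, physical_ram_gb: float,
                  flavor_vcpus: int, flavor_ram_gb: float,
                  cpu_ratio: float = 16.0, ram_ratio: float = 1.5) -> int:
    """Number of identical instances that fit; the scarcer resource wins."""
    vcpus, ram_gb = schedulable_capacity(physical_cores, physical_ram_gb,
                                         cpu_ratio, ram_ratio)
    return int(min(vcpus // flavor_vcpus, ram_gb // flavor_ram_gb))


if __name__ == "__main__":
    # Hypothetical host: 24 cores and 256 GB RAM; flavor: 2 vCPUs, 4 GB RAM.
    print(schedulable_capacity(24, 256))
    print(max_instances(24, 256, flavor_vcpus=2, flavor_ram_gb=4))
```

In this hypothetical case memory, not CPU, limits the instance count, which suggests either more memory per host or a lower CPU overcommit ratio.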
Insufficient disk capacity could also have a negative effect on overall
performance including CPU and memory usage. Depending on the back-end
architecture of the OpenStack Block Storage layer, capacity includes
adding disk shelves to enterprise storage systems or installing
additional Block Storage nodes. Upgrading directly attached storage
installed in Compute hosts, and adding capacity to the shared storage
for additional ephemeral storage to instances, may be necessary.
Consider the Compute requirements of non-hypervisor nodes (also referred to as
resource nodes). This includes controller, Object Storage nodes, Block Storage
nodes, and networking services.
The ability to add Compute resource pools for unpredictable workloads should
be considered. In some cases, the demand for certain instance types or flavors
may not justify individual hardware design. Allocate hardware designs that are
capable of servicing the most common instance requests. Adding hardware to the
overall architecture can be done later.
Choosing a CPU
~~~~~~~~~~~~~~

@ -3,474 +3,13 @@ Storage design
==============
Storage is found in many parts of the OpenStack cloud environment. This
section describes persistent storage options you can configure with
your cloud. It is important to understand the distinction between
:term:`ephemeral <ephemeral volume>` storage and
:term:`persistent <persistent volume>` storage.
chapter describes persistent storage options you can configure with
your cloud.
Ephemeral Storage
~~~~~~~~~~~~~~~~~
If you deploy only the OpenStack :term:`Compute service` (nova), by default
your users do not have access to any form of persistent storage. The disks
associated with VMs are "ephemeral," meaning that from the user's point
of view they disappear when a virtual machine is terminated.
Persistent Storage
~~~~~~~~~~~~~~~~~~
Persistent storage means that the storage resource outlives any other
resource and is always available, regardless of the state of a running
instance.
Today, OpenStack clouds explicitly support three types of persistent
storage: *object storage*, *block storage*, and *file system storage*.
Object Storage
--------------
Object storage is implemented in OpenStack by the
OpenStack Object Storage (swift) project. Users access binary objects
through a REST API. If your intended users need to
archive or manage large datasets, you want to provide them with Object
Storage. In addition, OpenStack can store your virtual machine (VM)
images inside of an object storage system, as an alternative to storing
the images on a file system.
OpenStack Object Storage provides a highly scalable, highly available
storage solution by relaxing some of the constraints of traditional file
systems. In designing and procuring for such a cluster, it is important
to understand some key concepts about its operation. Essentially, this
type of storage is built on the idea that all storage hardware fails, at
every level, at some point. Infrequently encountered failures that would
hamstring other storage systems, such as issues taking down RAID cards
or entire servers, are handled gracefully with OpenStack Object
Storage. For more information, see the `Swift developer
documentation <http://docs.openstack.org/developer/swift/overview_architecture.html>`_.
When designing your cluster, you must consider durability and
availability, which depend on the spread and placement of your data
rather than the reliability of the
hardware. Consider the default value of the number of replicas, which is
three. This means that before an object is marked as having been
written, at least two copies exist in case a single server fails to
write; the third copy may or may not yet exist when the write operation
initially returns. Increasing this number increases the robustness of your
data, but reduces the amount of storage you have available. Look
at the placement of your servers. Consider spreading them widely
throughout your data center's network and power-failure zones. Is a zone
a rack, a server, or a disk?
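The trade-off between replica count and available storage can be quantified directly. This is a minimal sketch that assumes every object is stored with the same number of full replicas and ignores file system overhead; the 300 TB figure is hypothetical.

```python
def usable_capacity_tb(raw_tb: float, replicas: int = 3) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


if __name__ == "__main__":
    raw = 300.0  # hypothetical raw capacity in TB
    for count in (2, 3, 4):
        print(f"{count} replicas: {usable_capacity_tb(raw, count):.1f} TB usable")
```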
Consider these main traffic flows for an Object Storage network:
* Among :term:`object`, :term:`container`, and
:term:`account servers <account server>`
* Between servers and the proxies
* Between the proxies and your users
Object Storage frequently communicates among servers hosting data. Even a small
cluster generates megabytes/second of traffic, which is predominantly, “Do
you have the object?” and “Yes I have the object!” If the answer
to the question is negative or the request times out,
replication of the object begins.
Consider the scenario where an entire server fails and 24 TB of data
needs to be transferred "immediately" to remain at three copies — this can
put significant load on the network.
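To get a feel for that load, the transfer time for the 24 TB scenario can be estimated from the available bandwidth. The sketch below assumes decimal units (1 TB = 10^12 bytes), a single aggregate link speed, and a hypothetical ``utilization`` parameter for the share of the link that replication is allowed to use.

```python
def transfer_hours(data_tb: float, link_gbps: float,
                   utilization: float = 1.0) -> float:
    """Hours needed to move data_tb over a link of link_gbps."""
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_gbps must be > 0 and utilization in (0, 1]")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600


if __name__ == "__main__":
    for gbps in (1, 10, 40):
        print(f"{gbps} Gbps: {transfer_hours(24, gbps):.1f} hours")
```

Real recovery is spread across many servers and disks, so this is a rough bound for planning rather than a prediction.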
Another consideration is that when a new file is being uploaded, the proxy
server must write out as many streams as there are replicas, multiplying network
traffic. For a three-replica cluster, 10 Gbps in means 30 Gbps out. Combining
this with the previous high bandwidth demands of replication is what results in
the recommendation that your private network be of significantly higher
bandwidth than your public network requires. OpenStack Object Storage
communicates internally with unencrypted, unauthenticated rsync for
performance, so the private network is required.
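The write fan-out described above is a simple multiplication, shown here as a small sketch. The function name is hypothetical, and the calculation assumes every incoming byte is written once per replica with no protocol overhead.

```python
def proxy_egress_gbps(ingress_gbps: float, replicas: int = 3) -> float:
    """Private-network traffic a proxy generates for a given upload rate."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return ingress_gbps * replicas


if __name__ == "__main__":
    print(proxy_egress_gbps(10))  # three-replica case from the text
```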
The remaining point on bandwidth is the public-facing portion. The
``swift-proxy`` service is stateless, which means that you can easily
add more and use HTTP load-balancing methods to share bandwidth and
availability between them.
More proxies means more bandwidth, if your storage can keep up.
Block Storage
-------------
Block storage (sometimes referred to as volume storage) provides users
with access to block-storage devices. Users interact with block storage
by attaching volumes to their running VM instances.
These volumes are persistent: they can be detached from one instance and
re-attached to another, and the data remains intact. Block storage is
implemented in OpenStack by the OpenStack Block Storage (cinder) project, which
supports multiple back ends in the form of drivers. Your
choice of a storage back end must be supported by a Block Storage
driver.
Most block storage drivers allow the instance to have direct access to
the underlying storage hardware's block device. This helps increase the
overall read/write IO. However, support for utilizing files as volumes
is also well established, with full support for NFS, GlusterFS and
others.
These drivers work a little differently than a traditional "block"
storage driver. On an NFS or GlusterFS file system, a single file is
created and then mapped as a "virtual" volume into the instance. This
mapping/translation is similar to how OpenStack utilizes QEMU's
file-based virtual machines stored in ``/var/lib/nova/instances``.
Shared File Systems Service
---------------------------
The Shared File Systems service (manila) provides a set of services for
management of shared file systems in a multi-tenant cloud environment.
Users interact with the Shared File Systems service by mounting remote file
systems on their instances and then using those file systems for file
storage and exchange. The Shared File Systems service provides you with
shares, which are remote, mountable file systems. You can mount a
share to and access a share from several hosts by several users at a
time. With shares, users can also:
* Create a share specifying its size, shared file system protocol, and
visibility level.
* Create a share on either a share server or standalone, depending on
the selected back-end mode, with or without using a share network.
* Specify access rules and security services for existing shares.
* Combine several shares in groups to keep data consistency inside the
groups for the following safe group operations.
* Create a snapshot of a selected share or a share group for storing
the existing shares consistently or creating new shares from that
snapshot in a consistent way.
* Create a share from a snapshot.
* Set rate limits and quotas for specific shares and snapshots.
* View usage of share resources.
* Remove shares.
Like Block Storage, the Shared File Systems service is persistent. It
can be:
* Mounted to any number of client machines.
* Detached from one instance and attached to another without data loss.
During this process the data are safe unless the Shared File Systems
service itself is changed or removed.
Shares are provided by the Shared File Systems service. In OpenStack, the
Shared File Systems service is implemented by the Shared File Systems
(manila) project, which supports multiple back ends in the form of
drivers. The Shared File Systems service can be configured to provision
shares from one or more back-ends. Share servers are, mostly, virtual
machines that export file shares using different protocols such as NFS,
CIFS, GlusterFS, or HDFS.
OpenStack Storage Concepts
~~~~~~~~~~~~~~~~~~~~~~~~~~
:ref:`table_openstack_storage` explains the different storage concepts
provided by OpenStack.
.. _table_openstack_storage:
.. list-table:: Table. OpenStack storage
:widths: 20 20 20 20 20
:header-rows: 1
* -
- Ephemeral storage
- Block storage
- Object storage
- Shared File System storage
* - Used to…
- Run operating system and scratch space
- Add additional persistent storage to a virtual machine (VM)
- Store data, including VM images
- Add additional persistent storage to a virtual machine
* - Accessed through…
- A file system
- A block device that can be partitioned, formatted, and mounted
(such as, /dev/vdc)
- The REST API
- A Shared File Systems service share (either manila managed or an
external one registered in manila) that can be partitioned, formatted
and mounted (such as /dev/vdc)
* - Accessible from…
- Within a VM
- Within a VM
- Anywhere
- Within a VM
* - Managed by…
- OpenStack Compute (nova)
- OpenStack Block Storage (cinder)
- OpenStack Object Storage (swift)
- OpenStack Shared File System Storage (manila)
* - Persists until…
- VM is terminated
- Deleted by user
- Deleted by user
- Deleted by user
* - Sizing determined by…
- Administrator configuration of size settings, known as *flavors*
- User specification in initial request
- Amount of available physical storage
- * User specification in initial request
* Requests for extension
* Available user-level quotas
* Limitations applied by Administrator
* - Encryption set by…
- Parameter in nova.conf
- Admin establishing `encrypted volume type
<http://docs.openstack.org/admin-guide/dashboard-manage-volumes.html>`_,
then user selecting encrypted volume
- Not yet available
- Shared File Systems service does not apply any additional encryption
above what the shares back-end storage provides
* - Example of typical usage…
- 10 GB first disk, 30 GB second disk
- 1 TB disk
- 10s of TBs of dataset storage
- Depends completely on the size of back-end storage specified when
a share was being created. In case of thin provisioning it can be
partial space reservation (for more details see
`Capabilities and Extra-Specs
<http://docs.openstack.org/developer/manila/devref/capabilities_and_extra_specs.html?highlight=extra%20specs#common-capabilities>`_
specification)
.. note::
**File-level Storage (for Live Migration)**
With file-level storage, users access stored data using the operating
system's file system interface. Most users, if they have used a network
storage solution before, have encountered this form of networked
storage. In the Unix world, the most common form of this is NFS. In the
Windows world, the most common form is called CIFS (previously, SMB).
OpenStack clouds do not present file-level storage to end users.
However, it is important to consider file-level storage for storing
instances under ``/var/lib/nova/instances`` when designing your cloud,
since you must have a shared file system if you want to support live
migration.
Choosing Storage Back Ends
~~~~~~~~~~~~~~~~~~~~~~~~~~
Users will indicate different needs for their cloud use cases. Some may
need fast access to many objects that do not change often, or want to
set a time-to-live (TTL) value on a file. Others may access only storage
that is mounted with the file system itself, but want it to be
replicated instantly when starting a new instance. For other systems,
ephemeral storage is the preferred choice. When you select
:term:`storage back ends <storage back end>`,
consider the following questions from user's perspective:
* Do my users need block storage?
* Do my users need object storage?
* Do I need to support live migration?
* Should my persistent storage drives be contained in my compute nodes,
or should I use external storage?
* What is the platter count I can achieve? Do more spindles result in
better I/O despite network access?
* Which one results in the best cost-performance scenario I'm aiming for?
* How do I manage the storage operationally?
* How redundant and distributed is the storage? What happens if a
storage node fails? To what extent can it mitigate my data-loss
disaster scenarios?
To deploy your storage by using only commodity hardware, you can use a number
of open-source packages, as shown in :ref:`table_persistent_file_storage`.
.. _table_persistent_file_storage:
.. list-table:: Table. Persistent file-based storage support
:widths: 25 25 25 25
:header-rows: 1
* -
- Object
- Block
- File-level
* - Swift
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
-
* - LVM
-
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
* - Ceph
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- Experimental
* - Gluster
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
* - NFS
-
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
* - ZFS
-
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
* - Sheepdog
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: figures/Check_mark_23x20_02.png
:width: 30%
-
This list of open source file-level shared storage solutions is not
exhaustive; other open source solutions exist (for example, MooseFS). Your
organization may already have deployed a file-level shared storage
solution that you can use.
.. note::
**Storage Driver Support**
In addition to the open source technologies, there are a number of
proprietary solutions that are officially supported by OpenStack Block
Storage. You can find a matrix of the functionality provided by all of the
supported Block Storage drivers on the `OpenStack
wiki <https://wiki.openstack.org/wiki/CinderSupportMatrix>`_.
Also, you need to decide whether you want to support object storage in
your cloud. The two common use cases for providing object storage in a
compute cloud are:
* To provide users with a persistent storage mechanism
* As a scalable, reliable data store for virtual machine images
Commodity Storage Back-end Technologies
---------------------------------------
This section provides a high-level overview of the differences among the
different commodity storage back end technologies. Depending on your
cloud users' needs, you can implement one or many of these technologies
in different combinations:
OpenStack Object Storage (swift)
The official OpenStack Object Store implementation. It is a mature
technology that has been used for several years in production by
Rackspace as the technology behind Rackspace Cloud Files. As it is
highly scalable, it is well-suited to managing petabytes of storage.
OpenStack Object Storage's advantages are better integration with
OpenStack (integrates with OpenStack Identity, works with the
OpenStack dashboard interface) and better support for multiple data
center deployment through support of asynchronous eventual
consistency replication.
Therefore, if you eventually plan on distributing your storage
cluster across multiple data centers, if you need unified accounts
for your users for both compute and object storage, or if you want
to control your object storage with the OpenStack dashboard, you
should consider OpenStack Object Storage. More detail can be found
about OpenStack Object Storage in the section below.
Ceph
A scalable storage solution that replicates data across commodity
storage nodes.
Ceph was designed to expose different types of storage interfaces to
the end user: it supports object storage, block storage, and
file-system interfaces, although the file-system interface is not
production-ready. Ceph supports the same API as swift
for object storage and can be used as a back end for cinder block
storage as well as back-end storage for glance images. Ceph supports
"thin provisioning," implemented using copy-on-write.
This can be useful when booting from volume because a new volume can
be provisioned very quickly. Ceph also supports keystone-based
authentication (as of version 0.56), so it can be a seamless swap in
for the default OpenStack swift implementation.
Ceph's advantages are that it gives the administrator more
fine-grained control over data distribution and replication
strategies, enables you to consolidate your object and block
storage, enables very fast provisioning of boot-from-volume
instances using thin provisioning, and supports a distributed
file-system interface, though this interface is `not yet
recommended <http://ceph.com/docs/master/cephfs/>`_ for use in
production deployment by the Ceph project.
If you want to manage your object and block storage within a single
system, or if you want to support fast boot-from-volume, you should
consider Ceph.
Gluster
A distributed, shared file system. As of Gluster version 3.3, you
can use Gluster to consolidate your object storage and file storage
into one unified file and object storage solution, which is called
Gluster For OpenStack (GFO). GFO uses a customized version of swift
that enables Gluster to be used as the back-end storage.
The main reason to use GFO rather than swift is if you also
want to support a distributed file system, either to support shared
storage live migration or to provide it as a separate service to
your end users. If you want to manage your object and file storage
within a single system, you should consider GFO.
LVM
The Logical Volume Manager is a Linux-based system that provides an
abstraction layer on top of physical disks to expose logical volumes
to the operating system. The LVM back-end implements block storage
as LVM logical partitions.
On each host that will house block storage, an administrator must
initially create a volume group dedicated to Block Storage volumes.
Blocks are created from LVM logical volumes.
.. note::
LVM does *not* provide any replication. Typically,
administrators configure RAID on nodes that use LVM as block
storage to protect against failures of individual hard drives.
However, RAID does not protect against a failure of the entire
host.
ZFS
The Solaris iSCSI driver for OpenStack Block Storage implements
blocks as ZFS entities. ZFS is a file system that also has the
functionality of a volume manager. This is unlike on a Linux system,
where there is a separation of volume manager (LVM) and file system
(such as ext3, ext4, XFS, and Btrfs). ZFS has a number of
advantages over ext4, including improved data-integrity checking.
The ZFS back end for OpenStack Block Storage supports only
Solaris-based systems, such as Illumos. While there is a Linux port
of ZFS, it is not included in any of the standard Linux
distributions, and it has not been tested with OpenStack Block
Storage. As with LVM, ZFS does not provide replication across hosts
on its own; you need to add a replication solution on top of ZFS if
your cloud needs to be able to handle storage-node failures.
We don't recommend ZFS unless you have previous experience with
deploying it, since the ZFS back end for Block Storage requires a
Solaris-based operating system, and we assume that your experience
is primarily with Linux-based systems.
Sheepdog
Sheepdog is a userspace distributed storage system. Sheepdog scales
to several hundred nodes, and has powerful virtual disk management
features such as snapshots, cloning, rollback, and thin provisioning.
It is essentially an object storage system that manages disks and
aggregates their capacity and performance, scaling linearly on
commodity hardware. On top of its object store, Sheepdog provides an
elastic volume service and an HTTP service.
Sheepdog does not depend on a specific kernel version and works
with any file system that supports extended attributes (xattr).
.. toctree::
:maxdepth: 2
design-storage/design-storage-concepts
design-storage/design-storage-backends
design-storage/design-storage-planning-scaling.rst

@ -0,0 +1,222 @@
==========================
Choosing storage back ends
==========================
Users indicate different needs for their cloud use cases. Some may
need fast access to many objects that do not change often, or want to
set a time-to-live (TTL) value on a file. Others may access only storage
that is mounted with the file system itself, but want it to be
replicated instantly when starting a new instance. For other systems,
ephemeral storage is the preferred choice. When you select
:term:`storage back ends <storage back end>`,
consider the following questions from the user's perspective:
* Do my users need Block Storage?
* Do my users need Object Storage?
* Do I need to support live migration?
* Should my persistent storage drives be contained in my Compute nodes,
or should I use external storage?
* What is the platter count I can achieve? Do more spindles result in
better I/O despite network access?
* Which one results in the best cost-performance scenario I am aiming for?
* How do I manage the storage operationally?
* How redundant and distributed is the storage? What happens if a
storage node fails? To what extent can it mitigate my data-loss
disaster scenarios?
To deploy your storage by using only commodity hardware, you can use a number
of open-source packages, as shown in :ref:`table_persistent_file_storage`.
.. _table_persistent_file_storage:
.. list-table:: Table. Persistent file-based storage support
:widths: 25 25 25 25
:header-rows: 1
* -
- Object
- Block
- File-level
* - Swift
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
-
-
* - LVM
-
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
-
* - Ceph
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
- Experimental
* - Gluster
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
* - NFS
-
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
* - ZFS
-
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
-
* - Sheepdog
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
- .. image:: ../figures/Check_mark_23x20_02.png
:width: 30%
-
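
The support matrix above can also be expressed as data and queried when
shortlisting back ends. The following is a minimal sketch that only
transcribes the table; the names ``SUPPORT``, ``EXPERIMENTAL``, and
``candidates`` are hypothetical and are not part of any OpenStack API.

.. code-block:: python

   # Storage types each back end supports, transcribed from the table above.
   SUPPORT = {
       "Swift": {"object"},
       "LVM": {"block"},
       "Ceph": {"object", "block"},
       "Gluster": {"object", "block", "file"},
       "NFS": {"block", "file"},
       "ZFS": {"block"},
       "Sheepdog": {"object", "block"},
   }

   # Ceph file-level support is listed as experimental in the table.
   EXPERIMENTAL = {"Ceph": {"file"}}


   def candidates(required, allow_experimental=False):
       """Return the back ends that cover every required storage type."""
       matches = []
       for name, kinds in SUPPORT.items():
           available = set(kinds)
           if allow_experimental:
               available |= EXPERIMENTAL.get(name, set())
           if set(required) <= available:
               matches.append(name)
       return sorted(matches)


   print(candidates({"object", "block"}))
   print(candidates({"block", "file"}, allow_experimental=True))

A lookup like this only narrows the list. The operational questions above,
such as redundancy and cost, still decide the final choice.
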
Open source file-level shared storage solutions are available, such as
MooseFS. Your organization may already have deployed a file-level
shared storage solution that you can use.
.. note::
**Storage Driver Support**
In addition to the open source technologies, there are a number of
proprietary solutions that are officially supported by OpenStack Block
Storage. You can find a matrix of the functionality provided by all of the
supported Block Storage drivers on the `OpenStack
wiki <https://wiki.openstack.org/wiki/CinderSupportMatrix>`_.
You should also decide whether you want to support Object Storage in
your cloud. The two common use cases for providing Object Storage in a
Compute cloud are:
* To provide users with a persistent storage mechanism
* As a scalable, reliable data store for virtual machine images
Commodity storage back-end technologies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section provides a high-level overview of the differences among the
commodity storage back-end technologies. Depending on your
cloud users' needs, you can implement one or many of these technologies
in different combinations:
OpenStack Object Storage (swift)
The official OpenStack Object Store implementation. It is a mature
technology that has been used for several years in production by
Rackspace as the technology behind Rackspace Cloud Files. As it is
highly scalable, it is well-suited to managing petabytes of storage.
OpenStack Object Storage's advantages are better integration with
OpenStack (integrates with OpenStack Identity, works with the
OpenStack Dashboard interface) and better support for multiple data
center deployment through support of asynchronous eventual
consistency replication.
If you plan on distributing your storage
cluster across multiple data centers, if you need unified accounts
for your users for both compute and object storage, or if you want
to control your object storage with the OpenStack dashboard, you
should consider OpenStack Object Storage. More detail about OpenStack
Object Storage can be found in the section below.
Ceph
A scalable storage solution that replicates data across commodity
storage nodes.
Ceph was designed to expose different types of storage interfaces to
the end user: it supports Object Storage, Block Storage, and
file-system interfaces, although the file-system interface is not
production-ready. Ceph supports the same API as swift
for Object Storage and can be used as a back end for Block
Storage, as well as back-end storage for glance images. Ceph supports
"thin provisioning," implemented using copy-on-write.
This can be useful when booting from volume because a new volume can
be provisioned very quickly. Ceph also supports keystone-based
authentication (as of version 0.56), so it can be a seamless swap in
for the default OpenStack swift implementation.
Ceph's advantages are that it gives the administrator more
fine-grained control over data distribution and replication
strategies, enables you to consolidate your Object and Block
Storage, enables very fast provisioning of boot-from-volume
instances using thin provisioning, and supports a distributed
file-system interface, though this interface is `not yet
recommended <http://ceph.com/docs/master/cephfs/>`_ for use in
production deployment by the Ceph project.
If you want to manage your Object and Block Storage within a single
system, or if you want to support fast boot-from-volume, you should
consider Ceph.
Gluster
A distributed, shared file system. As of Gluster version 3.3, you
can use Gluster to consolidate your object storage and file storage
into one unified file and Object Storage solution, which is called
Gluster For OpenStack (GFO). GFO uses a customized version of swift
that enables Gluster to be used as the back-end storage.
The main reason to use GFO rather than swift is if you also
want to support a distributed file system, either to support shared
storage live migration or to provide it as a separate service to
your end users. If you want to manage your object and file storage
within a single system, you should consider GFO.
LVM
The Logical Volume Manager is a Linux-based system that provides an
abstraction layer on top of physical disks to expose logical volumes
to the operating system. The LVM back end implements block storage
as LVM logical partitions.
On each host that houses Block Storage, an administrator must
initially create a volume group dedicated to Block Storage volumes.
Blocks are created from LVM logical volumes.
.. note::
LVM does *not* provide any replication. Typically,
administrators configure RAID on nodes that use LVM as block
storage to protect against failures of individual hard drives.
However, RAID does not protect against a failure of the entire
host.
ZFS
The Solaris iSCSI driver for OpenStack Block Storage implements
blocks as ZFS entities. ZFS is a file system that also has the
functionality of a volume manager. This is unlike on a Linux system,
where there is a separation of volume manager (LVM) and file system
(such as ext3, ext4, XFS, and Btrfs). ZFS has a number of
advantages over ext4, including improved data-integrity checking.
The ZFS back end for OpenStack Block Storage supports only
Solaris-based systems, such as Illumos. While there is a Linux port
of ZFS, it is not included in any of the standard Linux
distributions, and it has not been tested with OpenStack Block
Storage. As with LVM, ZFS does not provide replication across hosts
on its own; you need to add a replication solution on top of ZFS if
your cloud needs to be able to handle storage-node failures.
We don't recommend ZFS unless you have previous experience with
deploying it, since the ZFS back end for Block Storage requires a
Solaris-based operating system, and we assume that your experience
is primarily with Linux-based systems.
Sheepdog
Sheepdog is a userspace distributed storage system. Sheepdog scales
to several hundred nodes, and has powerful virtual disk management
features such as snapshots, cloning, rollback, and thin provisioning.
It is essentially an object storage system that manages disks and
aggregates their capacity and performance, scaling linearly on
commodity hardware. On top of its object store, Sheepdog provides an
elastic volume service and an HTTP service.
Sheepdog does not depend on a specific kernel version and works
with any file system that supports extended attributes (xattr).
.. TODO Add summary of when Sheepdog is recommended

@ -0,0 +1,257 @@
================
Storage concepts
================
This section describes persistent storage options you can configure with
your cloud. It is important to understand the distinction between
:term:`ephemeral <ephemeral volume>` storage and
:term:`persistent <persistent volume>` storage.
Ephemeral storage
~~~~~~~~~~~~~~~~~
If you deploy only the OpenStack :term:`Compute service` (nova), by default
your users do not have access to any form of persistent storage. The disks
associated with VMs are ephemeral, meaning that from the user's point
of view they disappear when a virtual machine is terminated.
Persistent storage
~~~~~~~~~~~~~~~~~~
Persistent storage is a storage resource that outlives any other
resource and is always available, regardless of the state of a running
instance.
OpenStack clouds explicitly support three types of persistent
storage: *Object Storage*, *Block Storage*, and *file system storage*.
:ref:`table_openstack_storage` explains the different storage concepts
provided by OpenStack.
.. _table_openstack_storage:
.. list-table:: Table. OpenStack storage
:widths: 20 20 20 20 20
:header-rows: 1
* -
- Ephemeral storage
- Block storage
- Object storage
- Shared File System storage
* - Used to…
- Run operating system and scratch space
- Add additional persistent storage to a virtual machine (VM)
- Store data, including VM images
- Add additional persistent storage to a virtual machine
* - Accessed through…
- A file system
- A block device that can be partitioned, formatted, and mounted
(such as /dev/vdc)
- The REST API
- A Shared File Systems service share (either manila managed or an
external one registered in manila) that can be partitioned, formatted
and mounted (such as /dev/vdc)
* - Accessible from…
- Within a VM
- Within a VM
- Anywhere
- Within a VM
* - Managed by…
- OpenStack Compute (nova)
- OpenStack Block Storage (cinder)
- OpenStack Object Storage (swift)
- OpenStack Shared File System Storage (manila)
* - Persists until…
- VM is terminated
- Deleted by user
- Deleted by user
- Deleted by user
* - Sizing determined by…
- Administrator configuration of size settings, known as *flavors*
- User specification in initial request
- Amount of available physical storage
- * User specification in initial request
* Requests for extension
* Available user-level quotas
* Limitations applied by Administrator
* - Encryption set by…
- Parameter in nova.conf
- Admin establishing `encrypted volume type
<http://docs.openstack.org/admin-guide/dashboard_manage_volumes.html>`_,
then user selecting encrypted volume
- Not yet available
- The Shared File Systems service does not apply any additional encryption
beyond what the share's back-end storage provides
* - Example of typical usage…
- 10 GB first disk, 30 GB second disk
- 1 TB disk
- 10s of TBs of dataset storage
- Depends completely on the size of back-end storage specified when
a share was being created. In case of thin provisioning it can be
partial space reservation (for more details see
`Capabilities and Extra-Specs
<http://docs.openstack.org/developer/manila/devref/capabilities_and_extra_specs.html?highlight=extra%20specs#common-capabilities>`_
specification)
.. note::
**File-level Storage (for Live Migration)**
With file-level storage, users access stored data using the operating
system's file system interface. Most users, if they have used a network
storage solution before, have encountered this form of networked
storage. In the Unix world, the most common form of this is NFS. In the
Windows world, the most common form is called CIFS (previously, SMB).
OpenStack clouds do not present file-level storage to end users.
However, it is important to consider file-level storage for storing
instances under ``/var/lib/nova/instances`` when designing your cloud,
since you must have a shared file system if you want to support live
migration.
Object Storage
--------------
.. TODO (shaun) Revise this section. I would start with an abstract of object
storage and then describe how swift fits into it. I think this will match
the rest of the sections better.
Object storage is implemented in OpenStack by the
OpenStack Object Storage (swift) project. Users access binary objects
through a REST API. If your intended users need to
archive or manage large datasets, you want to provide them with Object
Storage. In addition, OpenStack can store your virtual machine (VM)
images inside of an Object Storage system, as an alternative to storing
the images on a file system.
OpenStack Object Storage provides a highly scalable, highly available
storage solution by relaxing some of the constraints of traditional file
systems. In designing and procuring for such a cluster, it is important
to understand some key concepts about its operation. Essentially, this
type of storage is built on the idea that all storage hardware fails, at
every level, at some point. Infrequently encountered failures that would
hamstring other storage systems, such as issues taking down RAID cards
or entire servers, are handled gracefully with OpenStack Object
Storage. For more information, see the `Swift developer
documentation <http://docs.openstack.org/developer/swift/overview_architecture.html>`_.
When designing your cluster, consider:
* Durability and availability, which depend on the spread and
placement of your data rather than on the reliability of the hardware.
* The number of replicas, which defaults to three. This means that
before an object is marked as having been written, at least two copies
exist. If a single server fails to write, the third copy may or may not
yet exist when the write operation initially returns. Increasing this
number increases the robustness of your data, but reduces the amount of
storage you have available.
* Placement of your servers, and whether to spread them widely
throughout your data center's network and power-failure zones. You can
define a zone as a rack, a server, or a disk.
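
The trade-off between replica count and usable space can be estimated with
a few lines of arithmetic. The following is a rough sketch, not code from
swift: the function names are hypothetical, and the simple-majority rule is
an assumption chosen because it matches the two-of-three behavior described
above.

.. code-block:: python

   def write_quorum(replicas):
       """Copies that must exist before a write is acknowledged.

       Assumes a simple majority, which gives 2 for the default of 3.
       """
       return replicas // 2 + 1


   def usable_capacity_tb(raw_tb, replicas):
       """Usable space when every object is stored `replicas` times."""
       return raw_tb / replicas


   for replicas in (3, 4, 5):
       print(replicas, write_quorum(replicas), usable_capacity_tb(300, replicas))

Under these assumptions, 300 TB of raw disk holds 100 TB of objects at three
replicas and 60 TB at five.
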
Consider these main traffic flows for an Object Storage network:
* Among :term:`object`, :term:`container`, and
:term:`account servers <account server>`
* Between servers and the proxies
* Between the proxies and your users
Object Storage frequently communicates among servers hosting data. Even a small
cluster generates megabytes per second of traffic. If an object is not received
or the request times out, replication of the object begins.
.. TODO Above paragraph: describe what Object Storage is communicating. What
is actually communicating? What part of the software is doing the
communicating? Is it all of the servers communicating with one another?
Consider the scenario where an entire server fails and 24 TB of data
need to be transferred immediately to remain at three copies. This can
put significant load on the network.
Another consideration is that when a new file is uploaded, the proxy server
must write out as many streams as there are replicas, multiplying network
traffic. For a three-replica cluster, 10 Gbps in means 30 Gbps out. Combined
with the bandwidth demands of replication, this is why your private network
should have significantly higher bandwidth than your public network
requires. OpenStack Object Storage communicates internally with
unencrypted, unauthenticated rsync for performance, so the private
network is required.
.. TODO Revise the above paragraph for clarity.
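
The two bandwidth scenarios above, re-replicating a failed server and
fanning out uploads to every replica, can be checked with simple arithmetic.
The following is a minimal sketch and not part of swift: the function names
are hypothetical, terabytes are taken as decimal (10^12 bytes), and
``efficiency`` is an assumed allowance for protocol overhead.

.. code-block:: python

   def rereplication_hours(data_tb, link_gbps, efficiency=1.0):
       """Hours needed to copy data_tb terabytes over a link_gbps link."""
       bits = data_tb * 1e12 * 8
       usable_bits_per_second = link_gbps * 1e9 * efficiency
       return bits / usable_bits_per_second / 3600


   def proxy_egress_gbps(ingress_gbps, replicas=3):
       """Traffic a proxy sends to storage nodes for a given upload rate."""
       return ingress_gbps * replicas


   print(round(rereplication_hours(24, 10), 1))  # 5.3
   print(proxy_egress_gbps(10))                  # 30

Even on a dedicated 10 Gbps link, restoring 24 TB takes more than five hours
at full line rate, which is why the private network needs headroom beyond
normal client traffic.
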
The remaining point on bandwidth is the public-facing portion. The
``swift-proxy`` service is stateless, which means that you can easily
add more proxies and use HTTP load-balancing methods to distribute traffic
and maintain availability across them.
Block Storage
-------------
Block storage provides users with access to Block Storage devices. Users
interact with Block Storage by attaching volumes to their running VM instances.
These volumes are persistent: they can be detached from one instance and
re-attached to another, and the data remains intact. Block storage is
implemented in OpenStack by the OpenStack Block Storage (cinder) project,
which supports multiple back ends in the form of drivers. Your
choice of a storage back end must be supported by a Block Storage
driver.
Most Block Storage drivers allow the instance to have direct access to
the underlying storage hardware's block device. This helps increase the
overall read and write IO. However, support for utilizing files as volumes
is also well established, with full support for NFS, GlusterFS, and
others.
These drivers work a little differently than a traditional Block
Storage driver. On an NFS or GlusterFS file system, a single file is
created and then mapped as a virtual volume into the instance. This
mapping or translation is similar to how OpenStack utilizes QEMU's
file-based virtual machines stored in ``/var/lib/nova/instances``.
Shared File Systems Service
---------------------------
The Shared File Systems service (manila) provides a set of services for
management of shared file systems in a multi-tenant cloud environment.
Users interact with the Shared File Systems service by mounting remote file
systems on their instances and then using those file systems to store and
exchange files. The Shared File Systems service provides you with
a share, which is a remote, mountable file system. A share can be mounted
on several hosts and accessed by several users at a
time. With shares, a user can also:
* Create a share specifying its size, shared file system protocol, and
visibility level.
* Create a share on either a share server or standalone, depending on
the selected back-end mode, with or without using a share network.
* Specify access rules and security services for existing shares.
* Combine several shares in groups to keep data consistent within each
group, so that subsequent group operations are safe.
* Create a snapshot of a selected share or share group, either to
preserve the existing shares consistently or to create new shares from
that snapshot.
* Create a share from a snapshot.
* Set rate limits and quotas for specific shares and snapshots.
* View usage of share resources.
* Remove shares.
Like Block Storage, the Shared File Systems service is persistent. It
can be:
* Mounted to any number of client machines.
* Detached from one instance and attached to another without data loss.
During this process the data are safe unless the Shared File Systems
service itself is changed or removed.
Shares are provided by the Shared File Systems service. In OpenStack,
the Shared File Systems service is implemented by the Shared File Systems
(manila) project, which supports multiple back ends in the form of
drivers. The Shared File Systems service can be configured to provision
shares from one or more back ends. Share servers are virtual
machines that export file shares using different protocols such as NFS,
CIFS, GlusterFS, or HDFS.

@ -1,6 +1,6 @@
=============================
Capacity planning and scaling
=============================
=====================================
Storage capacity planning and scaling
=====================================
An important consideration in running a cloud over time is projecting growth
and utilization trends in order to plan capital expenditures for the short and
@ -312,91 +312,3 @@ resources servicing requests between proxy servers and storage nodes.
For this reason, the network architecture used for access to storage
nodes and proxy servers should make use of a design which is scalable.
Compute resource design
~~~~~~~~~~~~~~~~~~~~~~~
When designing compute resource pools, consider the number of processors,
amount of memory, and the quantity of storage required for each hypervisor.
Consider whether compute resources will be provided in a single pool or in
multiple pools. In most cases, multiple pools of resources can be allocated
and addressed on demand, commonly referred to as bin packing.
In a bin packing design, each independent resource pool provides service
for specific flavors. Since instances are scheduled onto compute hypervisors,
each independent node's resources will be allocated to efficiently use the
available hardware. Bin packing also requires a common hardware design,
with all hardware nodes within a compute resource pool sharing a common
processor, memory, and storage layout. This makes it easier to deploy,
support, and maintain nodes throughout their lifecycle.
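
To make the bin packing idea concrete, the following sketch places instances
on identical hosts using a first-fit decreasing heuristic. It is an
illustration only and does not describe how the Compute scheduler works. The
function name, the ``(vcpus, ram_gb)`` tuples, and the host sizes are all
assumptions.

.. code-block:: python

   def first_fit_decreasing(instances, host_vcpus, host_ram_gb):
       """Pack (vcpus, ram_gb) instances onto identical hosts.

       Returns one list of placed instances per host used.
       """
       hosts = []  # each entry: [free_vcpus, free_ram_gb, placed_instances]
       for vcpus, ram_gb in sorted(instances, reverse=True):
           if vcpus > host_vcpus or ram_gb > host_ram_gb:
               raise ValueError("instance does not fit on an empty host")
           for host in hosts:
               if host[0] >= vcpus and host[1] >= ram_gb:
                   host[0] -= vcpus
                   host[1] -= ram_gb
                   host[2].append((vcpus, ram_gb))
                   break
           else:
               # No existing host has room, so bring another host into use.
               hosts.append(
                   [host_vcpus - vcpus, host_ram_gb - ram_gb, [(vcpus, ram_gb)]]
               )
       return [host[2] for host in hosts]


   placement = first_fit_decreasing(
       [(4, 8), (2, 4), (8, 16), (2, 4)], host_vcpus=8, host_ram_gb=16
   )
   print(len(placement), placement)

Because every host in the pool shares the same layout, a single pair of
capacity numbers describes the whole pool, which is what keeps the packing
problem simple.
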
Increasing the size of the supporting compute environment increases the
network traffic and messages, adding load to the controller or
networking nodes. Effective monitoring of the environment will help with
capacity decisions on scaling.
Compute nodes automatically attach to OpenStack clouds, resulting in a
horizontally scaling process when adding extra compute capacity to an
OpenStack cloud. Additional processes are required to place nodes into
appropriate availability zones and host aggregates. When adding
additional compute nodes to environments, ensure identical or functionally
compatible CPUs are used; otherwise, live migration features will break.
Scaling out compute hosts directly affects network and data center
resources, so you may also need to add rack capacity or network switches.
Compute host components can also be upgraded to account for increases in
demand, known as vertical scaling. Upgrading CPUs with more
cores, or increasing the overall server memory, can add extra needed
capacity depending on whether the running applications are more CPU
intensive or memory intensive.
When selecting a processor, compare features and performance
characteristics. Some processors include features specific to
virtualized compute hosts, such as hardware-assisted virtualization, and
technology related to memory paging (also known as EPT shadowing). These
types of features can have a significant impact on the performance of
your virtual machine.
The number of processor cores and threads impacts the number of worker
threads which can be run on a resource node. Design decisions must
relate directly to the service being run on it, as well as provide a
balanced infrastructure for all services.
Another option is to assess the average workloads and increase the
number of instances that can run within the compute environment by
adjusting the overcommit ratio.
An overcommit ratio is the ratio of available virtual resources to
available physical resources. This ratio is configurable for CPU and
memory. The default CPU overcommit ratio is 16:1, and the default memory
overcommit ratio is 1.5:1. Determining the tuning of the overcommit
ratios during the design phase is important as it has a direct impact on
the hardware layout of your compute nodes.
.. note::
Changing the CPU overcommit ratio can have a detrimental effect
and increase the likelihood of noisy neighbor problems.
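
The effect of the overcommit ratios on hardware sizing can be estimated as
follows. This is a minimal sketch using the default ratios quoted above
(16:1 for CPU and 1.5:1 for memory). The function names, the host size, and
the flavor size are hypothetical.

.. code-block:: python

   def schedulable_capacity(cores, ram_gb, cpu_ratio=16.0, ram_ratio=1.5):
       """Virtual CPUs and RAM a host can offer at the given ratios."""
       return cores * cpu_ratio, ram_gb * ram_ratio


   def max_instances(cores, ram_gb, flavor_vcpus, flavor_ram_gb,
                     cpu_ratio=16.0, ram_ratio=1.5):
       """Instances of one flavor that fit, limited by the scarcer resource."""
       vcpus, virtual_ram_gb = schedulable_capacity(
           cores, ram_gb, cpu_ratio, ram_ratio
       )
       return int(min(vcpus // flavor_vcpus, virtual_ram_gb // flavor_ram_gb))


   # A 24-core, 128 GB host running a 2 vCPU / 4 GB flavor.
   print(schedulable_capacity(24, 128))  # (384.0, 192.0)
   print(max_instances(24, 128, 2, 4))   # 48

In this example memory, not CPU, is the limiting resource, so the CPU
overcommit ratio could be lowered substantially without reducing the number
of instances per host.
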
Insufficient disk capacity could also have a negative effect on overall
performance, including CPU and memory usage. Depending on the back-end
architecture of the OpenStack Block Storage layer, adding capacity may
involve adding disk shelves to enterprise storage systems or installing
additional block storage nodes. Upgrading directly attached storage
installed in compute hosts, and adding capacity to the shared storage
for additional ephemeral storage to instances, may be necessary.
Consider the compute requirements of non-hypervisor nodes (also referred to as
resource nodes). This includes controller, object storage, and block storage
nodes, and networking services.
The ability to add compute resource pools for unpredictable workloads should
be considered. In some cases, the demand for certain instance types or flavors
may not justify individual hardware design. Allocate hardware designs that are
capable of servicing the most common instance requests. Adding hardware to the
overall architecture can be done later.
For more information on these topics, refer to the `OpenStack
Operations Guide <http://docs.openstack.org/ops>`_.
.. TODO Add information on control plane API services and horizon.

@ -1,5 +1,187 @@
.. _high-availability:
=================
High Availability
High availability
=================
Data plane and control plane
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When designing an OpenStack cloud, it is important to consider the needs
dictated by the :term:`Service Level Agreement (SLA)`. This includes the core
services required to maintain availability of running Compute service
instances, networks, storage, and additional services running on top of those
resources. These services are often referred to as the Data Plane services,
and are generally expected to be available all the time.
The remaining services, responsible for create, read, update and delete (CRUD)
operations, metering, monitoring, and so on, are often referred to as the
Control Plane. The SLA is likely to dictate a lower uptime requirement for
these services.
The services comprising an OpenStack cloud have a number of requirements that
the architect needs to understand in order to meet SLA terms. For
example, providing the Compute service requires, at a minimum, storage, message
queuing, and database services, as well as the networking between
them.
Ongoing maintenance operations are made much simpler if there is logical and
physical separation of Data Plane and Control Plane systems. It then becomes
possible to, for example, reboot a controller without affecting customers.
If one service failure affects the operation of an entire server (``noisy
neighbor``), the separation between Control and Data Planes enables rapid
maintenance with a limited effect on customer operations.
Eliminating single points of failure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. TODO Add introduction
Within each site
----------------
OpenStack lends itself to deployment in a highly available manner where it is
expected that at least two servers be utilized. These can run all the services
involved, from the message queuing service (for example, ``RabbitMQ`` or
``QPID``) to an appropriately deployed database service such as ``MySQL`` or
``MariaDB``. As services in the cloud are scaled out, back-end services will
need to scale too. Monitoring and reporting on server utilization and response
times, as well as load testing your systems, will help determine scale out
decisions.
The OpenStack services themselves should be deployed across multiple servers
that do not represent a single point of failure. Ensuring availability can
be achieved by placing these services behind highly available load balancers
that have multiple OpenStack servers as members.
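
The benefit of placing several servers behind a load balancer can be
estimated with basic availability arithmetic. The following sketch assumes
that member failures are independent and ignores the availability of the
load balancer itself. Both are simplifying assumptions, and the function
names are hypothetical.

.. code-block:: python

   def combined_availability(single, members):
       """Availability of a pool that works if any one member is up."""
       return 1 - (1 - single) ** members


   def downtime_minutes_per_year(availability):
       """Expected downtime in a 365-day year for a given availability."""
       return (1 - availability) * 365 * 24 * 60


   for members in (1, 2, 3):
       pool = combined_availability(0.99, members)
       print(members, round(pool, 6), round(downtime_minutes_per_year(pool), 1))

Real failures are often correlated, for example through shared power or
network paths, so treat these figures as an upper bound when comparing a
design against an SLA.
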
There are a small number of OpenStack services which are intended to only run
in one place at a time (for example, the ``ceilometer-agent-central`` service).
In order to prevent these services from becoming a single point of failure,
they can be controlled by clustering software such as ``Pacemaker``.
In OpenStack, the infrastructure is integral to providing services and should
always be available, especially when operating with SLAs. Ensuring network
availability is accomplished by designing the network architecture so that no
single point of failure exists. Factor the number of switches, routes, and
redundant power supplies into the core infrastructure design, along with
network bonding to provide diverse routes to your
highly available switch infrastructure.
Care must be taken when deciding network functionality. Currently, OpenStack
supports both the legacy networking (nova-network) system and the newer,
extensible OpenStack Networking (neutron). OpenStack Networking and legacy
networking both have their advantages and disadvantages. They are both valid
and supported options that fit different network deployment models described in
the `OpenStack Operations Guide
<http://docs.openstack.org/ops-guide/arch_network_design.html#network-topology>`_.
When using the Networking service, the OpenStack controller servers or separate
Networking hosts handle routing unless the dynamic virtual routers pattern for
routing is selected. Running routing directly on the controller servers mixes
the Data and Control Planes and can cause complex issues with performance and
troubleshooting. It is possible to use third party software and external
appliances that help maintain highly available layer three routes. Doing so
allows for common application endpoints to control network hardware, or to
provide complex multi-tier web applications in a secure manner. It is also
possible to completely remove routing from Networking, and instead rely on
hardware routing capabilities. In this case, the switching infrastructure must
support layer three routing.
Application design must also be factored into the capabilities of the
underlying cloud infrastructure. If the compute hosts do not provide a seamless
live migration capability, then it must be expected that when a compute host
fails, the instances on it and any data local to those instances will be lost.
However, when providing an expectation to users that instances have a
high level of uptime guaranteed, the infrastructure must be deployed in a way
that eliminates any single point of failure if a compute host disappears.
This may include utilizing shared file systems on enterprise storage or
OpenStack Block Storage to provide a level of guarantee to match service
features.
If using a storage design that includes shared access to centralized storage,
ensure that it is also designed without single points of failure and that the
SLA for the solution matches or exceeds the expected SLA for the Data Plane.
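As a rough illustration of comparing a storage design against a Data Plane
SLA, the following sketch estimates composite availability. It assumes that
component failures are independent, which real deployments rarely guarantee.
The function names and all figures are hypothetical and are not measurements
of any OpenStack component.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability of N independent copies where any one copy suffices."""
    return 1 - (1 - availability) ** copies


# Hypothetical figures, for illustration only.
storage_node = 0.99          # assumed availability of one storage node
data_plane_target = 0.9995   # assumed Data Plane SLA target

# Three redundant storage nodes, any one of which can serve requests.
storage_cluster = redundant_availability(storage_node, copies=3)

# A single shared switch in front of the cluster is a single point of failure.
single_switch = 0.999
storage_path = serial_availability([storage_cluster, single_switch])

meets_target_cluster = storage_cluster >= data_plane_target
meets_target_path = storage_path >= data_plane_target
```

With these assumed numbers, the redundant cluster alone exceeds the target, but
the path through the single switch does not. This is why the whole storage path
should be free of single points of failure, not just the storage nodes.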
Between sites in a multi-region design
--------------------------------------
Some services are commonly shared between multiple regions, including the
Identity service and the Dashboard. In this case, it is necessary to ensure
that the databases backing the services are replicated, and that access to
multiple workers across each site can be maintained in the event of losing a
single region.
Multiple network links should be deployed between sites to provide redundancy
for all components. This includes storage replication, which should be isolated
to a dedicated network or VLAN with the ability to assign QoS to control the
replication traffic or provide priority for this traffic.
.. note::
If the data store is highly changeable, the network requirements could have
a significant effect on the operational cost of maintaining the sites.
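To get a feel for how the data change rate drives inter-site link sizing, the
following sketch converts a daily volume of changed data into an average
bandwidth figure. The function name, the ``overhead`` factor, and the sample
values are assumptions made for illustration. Real replication traffic is
bursty, so treat the result as a lower bound.

```python
def required_replication_mbps(changed_gb_per_day, window_hours=24.0,
                              overhead=1.0):
    """Average bandwidth (Mbit/s) needed to replicate the daily change set.

    changed_gb_per_day: changed data per day, in decimal gigabytes.
    window_hours: hours per day in which replication is allowed to run.
    overhead: multiplier for protocol or encryption overhead (assumed).
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    bits = changed_gb_per_day * 8e9 * overhead
    return bits / (window_hours * 3600) / 1e6


# Hypothetical example: 864 GB of changes per day, replicated continuously.
continuous = required_replication_mbps(864)

# The same change set squeezed into an 8-hour overnight window.
overnight = required_replication_mbps(864, window_hours=8)
```

Shortening the replication window raises the required bandwidth
proportionally, which is one way the change rate feeds into operational cost.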
If the design incorporates more than one site, the ability to maintain object
availability in both sites has significant implications on the Object Storage
design and implementation. It also has a significant impact on the WAN network
design between the sites.
If applications running in a cloud are not cloud-aware, there should be clear
measures and expectations to define what the infrastructure can and cannot
support. An example would be shared storage between sites. This is possible;
however, such a solution is not native to OpenStack and requires a third-party
hardware vendor to fulfill the requirement. Another example can be seen in
applications that are able to consume resources in object storage directly.
Connecting more than two sites increases the challenges and adds more
complexity to the design considerations. Multi-site implementations require
planning to address the additional topology used for internal and external
connectivity. Some options include full mesh, hub-and-spoke, spine-leaf, and
3D torus topologies.
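One way to see how complexity grows with the number of sites is to count the
inter-site links each topology needs. The sketch below compares full mesh with
hub-and-spoke. It assumes a single logical link per site pair and hypothetical
function names. Multiply by the number of redundant links per pair to account
for the redundancy recommended earlier.

```python
def full_mesh_links(sites):
    """Links needed when every site connects directly to every other site."""
    return sites * (sites - 1) // 2


def hub_and_spoke_links(sites):
    """Links needed when every site connects only to one central hub site."""
    return max(sites - 1, 0)


# Print a small comparison table for a few site counts.
for n in (2, 3, 5, 10):
    print(f"{n:>2} sites: full mesh={full_mesh_links(n):>2}, "
          f"hub-and-spoke={hub_and_spoke_links(n)}")
```

Full mesh grows quadratically with the number of sites, while hub-and-spoke
grows linearly but concentrates risk at the hub site.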
For more information on high availability in OpenStack, see the `OpenStack High
Availability Guide <http://docs.openstack.org/ha-guide/>`_.
Site loss and recovery
~~~~~~~~~~~~~~~~~~~~~~
Outages can cause partial or full loss of site functionality. Strategies
should be implemented to understand and plan for recovery scenarios.
* The deployed applications need to continue to function and, more
importantly, you must consider the impact on the performance and
reliability of the application if a site is unavailable.
* It is important to understand what happens to the replication of
objects and data between the sites when a site goes down. If this
causes queues to start building up, consider how long these queues
can safely exist until an error occurs.
* After an outage, ensure that operations of a site are resumed when it
comes back online. We recommend that you architect the recovery to
avoid race conditions.
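The question of how long replication queues can safely build up can be
estimated with simple arithmetic. The following sketch is a minimal model that
assumes constant write and link rates. The function names and numbers are
hypothetical and do not correspond to any OpenStack API.

```python
def seconds_until_queue_full(capacity_bytes, queued_bytes, write_rate):
    """Time until the replication queue overflows during a site outage.

    write_rate is the rate of new data to replicate, in bytes per second.
    """
    if write_rate <= 0:
        return float("inf")
    free = max(capacity_bytes - queued_bytes, 0)
    return free / write_rate


def seconds_to_drain(backlog_bytes, link_rate, write_rate):
    """Time to clear the backlog once the site is back online.

    The backlog only shrinks if the link is faster than incoming writes.
    """
    surplus = link_rate - write_rate
    if surplus <= 0:
        return float("inf")
    return backlog_bytes / surplus


# Hypothetical example: 1 TB queue, 200 GB already queued, 10 MB/s of writes.
outage_budget = seconds_until_queue_full(1e12, 2e11, 10e6)

# After recovery: drain 600 GB over a 30 MB/s link while writes continue.
catch_up = seconds_to_drain(6e11, 30e6, 10e6)
```

If the catch-up time comes out as infinite, the link cannot keep up with
ongoing writes and the site never fully resynchronizes without intervention.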
Inter-site replication data
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Traditionally, replication has been the best method of protecting object store
implementations. A variety of replication methods exist in storage
architectures, for example synchronous and asynchronous mirroring. Most object
stores and back-end storage systems implement methods for replication at the
storage subsystem layer. Object stores also tailor replication techniques to
fit a cloud's requirements.
Organizations must find the right balance between data integrity and data
availability. Replication strategy may also influence disaster recovery
methods.
Replication across different racks, data centers, and geographical regions
makes it more important to determine and ensure data locality. The ability to
guarantee that data is accessed from the nearest or fastest storage can be
necessary for applications to perform well.
.. note::
When running embedded object store methods, ensure that you do not
instigate extra data replication as this may cause performance issues.