[arch-design] Migrate content to new structure
Migrate content changes in Mitaka to the new book structure:

1. Move HA content to the High Availability chapter
2. Move Capacity planning and scaling content to the Storage Design chapter
3. Move Compute resource design content to the Compute Nodes chapter

Change-Id: I6407e7e848dbfe3f8cedafa4596df5ab553eb2b7
Implements: blueprint arch-guide-restructure
parent b1e0ff33f5
commit 4b7de72c4e

doc/arch-design-draft/source
@ -1,190 +0,0 @@
.. _high-availability:

=================
High availability
=================

.. toctree::
   :maxdepth: 2

Data Plane and Control Plane
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When designing an OpenStack cloud, it is important to consider the needs
dictated by the :term:`Service Level Agreement (SLA)` in terms of the core
services required to maintain availability of running Compute service
instances, networks, storage and additional services running on top of those
resources. These services are often referred to as the Data Plane services,
and are generally expected to be available all the time.

The remaining services, responsible for CRUD operations, metering, monitoring,
and so on, are often referred to as the Control Plane. The SLA is likely to
dictate a lower uptime requirement for these services.

The services comprising an OpenStack cloud have a number of requirements which
the architect needs to understand in order to be able to meet SLA terms. For
example, in order to provide the Compute service a minimum of storage, message
queueing, and database services are necessary as well as the networking between
them.

Ongoing maintenance operations are made much simpler if there is logical and
physical separation of Data Plane and Control Plane systems. It then becomes
possible to, for example, reboot a controller without affecting customers.
If one service failure affects the operation of an entire server ('noisy
neighbor'), the separation between Control and Data Planes enables rapid
maintenance with a limited effect on customer operations.

Eliminating Single Points of Failure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Within each site
----------------

OpenStack lends itself to deployment in a highly available manner where it is
expected that at least 2 servers be utilized. These can run all the services
involved from the message queuing service, for example ``RabbitMQ`` or
``QPID``, and an appropriately deployed database service such as ``MySQL`` or
``MariaDB``. As services in the cloud are scaled out, back-end services will
need to scale too. Monitoring and reporting on server utilization and response
times, as well as load testing your systems, will help determine scale out
decisions.

The OpenStack services themselves should be deployed across multiple servers
that do not represent a single point of failure. Ensuring availability can
be achieved by placing these services behind highly available load balancers
that have multiple OpenStack servers as members.
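
The benefit of placing services behind a load balancer with multiple members
can be estimated with simple probability arithmetic. The following sketch is
only illustrative: it assumes independent server failures and a hypothetical
per-server availability figure, and it is not part of any OpenStack API.

.. code-block:: python

   def combined_availability(per_server, servers):
       """Availability of a pool where one healthy member is enough.

       Assumes failures are independent, which real deployments only
       approximate (shared switches, power, and storage correlate).
       """
       return 1.0 - (1.0 - per_server) ** servers

   # Hypothetical figure: each API server alone meets 99.5% availability.
   single = 0.995
   for n in (1, 2, 3):
       print(f"{n} server(s): {combined_availability(single, n):.6f}")
   # 1 server(s): 0.995000
   # 2 server(s): 0.999975
   # 3 server(s): 0.999999875 (prints 1.000000 after rounding)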

There are a small number of OpenStack services which are intended to only run
in one place at a time (e.g. the ``ceilometer-agent-central`` service). In
order to prevent these services from becoming a single point of failure, they
can be controlled by clustering software such as ``Pacemaker``.

In OpenStack, the infrastructure is integral to providing services and should
always be available, especially when operating with SLAs. Ensuring network
availability is accomplished by designing the network architecture so that no
single point of failure exists. A consideration of the number of switches,
routes and redundancies of power should be factored into core infrastructure,
as well as the associated bonding of networks to provide diverse routes to your
highly available switch infrastructure.

Care must be taken when deciding network functionality. Currently, OpenStack
supports both the legacy networking (nova-network) system and the newer,
extensible OpenStack Networking (neutron). OpenStack Networking and legacy
networking both have their advantages and disadvantages. They are both valid
and supported options that fit different network deployment models described in
the `OpenStack Operations Guide
<http://docs.openstack.org/ops-guide/arch-network-design.html#network-topology>`_.

When using the Networking service, the OpenStack controller servers or separate
Networking hosts handle routing unless the dynamic virtual routers pattern for
routing is selected. Running routing directly on the controller servers mixes
the Data and Control Planes and can cause complex issues with performance and
troubleshooting. It is possible to use third-party software and external
appliances that help maintain highly available layer three routes. Doing so
allows for common application endpoints to control network hardware, or to
provide complex multi-tier web applications in a secure manner. It is also
possible to completely remove routing from Networking, and instead rely on
hardware routing capabilities. In this case, the switching infrastructure must
support layer three routing.

Application design must also be factored into the capabilities of the
underlying cloud infrastructure. If the compute hosts do not provide a seamless
live migration capability, then it must be expected that if a compute host
fails, that instance and any data local to that instance will be deleted.
However, when providing an expectation to users that instances have a
high level of uptime guaranteed, the infrastructure must be deployed in a way
that eliminates any single point of failure if a compute host disappears.
This may include utilizing shared file systems on enterprise storage or
OpenStack Block Storage to provide a level of guarantee to match service
features.

If using a storage design that includes shared access to centralized storage,
ensure that this is also designed without single points of failure and the SLA
for the solution matches or exceeds the expected SLA for the Data Plane.

Between sites in a multi-region design
--------------------------------------

Some services are commonly shared between multiple regions, including the
Identity service and the Dashboard. In this case, it is necessary to ensure
that the databases backing the services are replicated, and that access to
multiple workers across each site can be maintained in the event of losing a
single region.

Multiple network links should be deployed between sites to provide redundancy
for all components. This includes storage replication, which should be isolated
to a dedicated network or VLAN with the ability to assign QoS to control the
replication traffic or provide priority for this traffic. Note that if the data
store is highly changeable, the network requirements could have a significant
effect on the operational cost of maintaining the sites.

If the design incorporates more than one site, the ability to maintain object
availability in both sites has significant implications on the object storage
design and implementation. It also has a significant impact on the WAN network
design between the sites.

If applications running in a cloud are not cloud-aware, there should be clear
measures and expectations to define what the infrastructure can and cannot
support. An example would be shared storage between sites. It is possible,
however such a solution is not native to OpenStack and requires a third-party
hardware vendor to fulfill such a requirement. Another example can be seen in
applications that are able to consume resources in object storage directly.

Connecting more than two sites increases the challenges and adds more
complexity to the design considerations. Multi-site implementations require
planning to address the additional topology used for internal and external
connectivity. Some options include full mesh topology, hub spoke, spine leaf,
and 3D Torus.

For more information on high availability in OpenStack, see the `OpenStack High
Availability Guide <http://docs.openstack.org/ha-guide/>`_.

Site loss and recovery
~~~~~~~~~~~~~~~~~~~~~~

Outages can cause partial or full loss of site functionality. Strategies
should be implemented to understand and plan for recovery scenarios.

* The deployed applications need to continue to function and, more
  importantly, you must consider the impact on the performance and
  reliability of the application if a site is unavailable.

* It is important to understand what happens to the replication of
  objects and data between the sites when a site goes down. If this
  causes queues to start building up, consider how long these queues
  can safely exist until an error occurs.

* After an outage, ensure that operations of a site are resumed when it
  comes back online. We recommend that you architect the recovery to
  avoid race conditions.

Inter-site replication data
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Traditionally, replication has been the best method of protecting object store
implementations. A variety of replication methods exist in storage
architectures, for example synchronous and asynchronous mirroring. Most object
stores and back-end storage systems implement methods for replication at the
storage subsystem layer. Object stores also tailor replication techniques to
fit a cloud's requirements.

Organizations must find the right balance between data integrity and data
availability. Replication strategy may also influence disaster recovery
methods.

Replication across different racks, data centers, and geographical regions
increases focus on determining and ensuring data locality. The ability to
guarantee data is accessed from the nearest or fastest storage can be necessary
for applications to perform well.

.. note::

   When running embedded object store methods, ensure that you do not
   instigate extra data replication as this may cause performance issues.
@ -1,5 +1,5 @@
=============
Compute Nodes
Compute nodes
=============

.. toctree::
@ -13,6 +13,88 @@ when designing and building your compute nodes. Compute nodes form the
resource core of the OpenStack Compute cloud, providing the processing, memory,
network and storage resources to run instances.

Overview
~~~~~~~~

When designing compute resource pools, consider the number of processors,
amount of memory, and the quantity of storage required for each hypervisor.

Determine whether compute resources will be provided in a single pool or in
multiple pools. In most cases, multiple pools of resources can be allocated
and addressed on demand, commonly referred to as bin packing.

In a bin packing design, each independent resource pool provides service
for specific flavors. Since instances are scheduled onto compute hypervisors,
each independent node's resources will be allocated to efficiently use the
available hardware. Bin packing also requires a common hardware design,
with all hardware nodes within a compute resource pool sharing a common
processor, memory, and storage layout. This makes it easier to deploy,
support, and maintain nodes throughout their lifecycle.
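
A rough capacity estimate makes the bin packing idea concrete. The sketch
below is a minimal illustration; the node and flavor figures are hypothetical
examples rather than values from this guide.

.. code-block:: python

   # How many instances of each flavor fit on one node of a common
   # hardware design, before any overcommit is applied.
   NODE = {"vcpus": 48, "ram_gb": 256, "disk_gb": 1800}

   FLAVORS = {
       "m1.small":  {"vcpus": 1, "ram_gb": 2,  "disk_gb": 20},
       "m1.large":  {"vcpus": 4, "ram_gb": 8,  "disk_gb": 80},
       "m1.xlarge": {"vcpus": 8, "ram_gb": 16, "disk_gb": 160},
   }

   for name, flavor in FLAVORS.items():
       # The limiting resource is whichever dimension runs out first.
       fit = min(NODE[res] // flavor[res] for res in flavor)
       print(f"{name}: {fit} instances per node")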

Increasing the size of the supporting compute environment increases the
network traffic and messages, adding load to the controller or
networking nodes. Effective monitoring of the environment will help with
capacity decisions on scaling.

Compute nodes automatically attach to OpenStack clouds, resulting in a
horizontally scaling process when adding extra compute capacity to an
OpenStack cloud. Additional processes are required to place nodes into
appropriate availability zones and host aggregates. When adding
additional compute nodes to environments, ensure identical or functionally
compatible CPUs are used, otherwise live migration features will break.
It is necessary to add rack capacity or network switches as scaling out
compute hosts directly affects data center resources.

Compute host components can also be upgraded to account for increases in
demand, known as vertical scaling. Upgrading CPUs with more
cores, or increasing the overall server memory, can add extra needed
capacity depending on whether the running applications are more CPU
intensive or memory intensive.

When selecting a processor, compare features and performance
characteristics. Some processors include features specific to
virtualized compute hosts, such as hardware-assisted virtualization, and
technology related to memory paging (also known as EPT shadowing). These
types of features can have a significant impact on the performance of
your virtual machine.
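
On Linux compute hosts, the processor features mentioned above are visible as
CPU flags. The sketch below is a minimal, Linux-only illustration; the flag
names (``vmx``, ``svm``, ``ept``) are the usual markers for Intel VT-x, AMD-V,
and Intel extended page tables.

.. code-block:: python

   def cpu_flags(path="/proc/cpuinfo"):
       """Return the set of CPU flags reported for the first processor."""
       with open(path) as cpuinfo:
           for line in cpuinfo:
               if line.startswith("flags"):
                   return set(line.split(":", 1)[1].split())
       return set()

   flags = cpu_flags()
   print("hardware-assisted virtualization:", bool(flags & {"vmx", "svm"}))
   print("extended page tables:", "ept" in flags)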

The number of processor cores and threads impacts the number of worker
threads which can be run on a resource node. Design decisions must
relate directly to the service being run on it, as well as provide a
balanced infrastructure for all services.
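
As an illustration only, a simple sizing heuristic derives a worker count from
the host CPU count. The reservation of two cores and the one-worker-per-core
ratio are assumptions for the example; each OpenStack service has its own
worker configuration options and defaults that you should check.

.. code-block:: python

   import os

   def suggested_workers(reserved_cores=2, workers_per_core=1):
       # Keep a couple of cores for the operating system and agents,
       # then run a fixed number of workers on each remaining core.
       usable = max(1, (os.cpu_count() or 1) - reserved_cores)
       return usable * workers_per_core

   print("suggested API workers:", suggested_workers())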

Another option is to assess the average workloads and increase the
number of instances that can run within the compute environment by
adjusting the overcommit ratio. This ratio is configurable for CPU and
memory. The default CPU overcommit ratio is 16:1, and the default memory
overcommit ratio is 1.5:1. Determining the tuning of the overcommit
ratios during the design phase is important as it has a direct impact on
the hardware layout of your compute nodes.

.. note::

   Changing the CPU overcommit ratio can have a detrimental effect
   and cause a potential increase in a noisy neighbor.
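
The following sketch shows how the overcommit ratios quoted above translate
into schedulable capacity. The ratios are the defaults mentioned in the text;
the node hardware figures are hypothetical examples.

.. code-block:: python

   physical_cores = 24          # hypothetical compute node
   physical_ram_gb = 128        # hypothetical compute node

   cpu_allocation_ratio = 16.0  # default CPU overcommit quoted above
   ram_allocation_ratio = 1.5   # default memory overcommit quoted above

   schedulable_vcpus = physical_cores * cpu_allocation_ratio
   schedulable_ram_gb = physical_ram_gb * ram_allocation_ratio

   print(f"vCPUs available to the scheduler: {schedulable_vcpus:.0f}")    # 384
   print(f"RAM available to the scheduler: {schedulable_ram_gb:.0f} GB")  # 192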

Insufficient disk capacity could also have a negative effect on overall
performance including CPU and memory usage. Depending on the back-end
architecture of the OpenStack Block Storage layer, capacity includes
adding disk shelves to enterprise storage systems or installing
additional Block Storage nodes. Upgrading directly attached storage
installed in Compute hosts, and adding capacity to the shared storage
for additional ephemeral storage to instances, may be necessary.

Consider the Compute requirements of non-hypervisor nodes (also referred to as
resource nodes). This includes controller, Object Storage nodes, Block Storage
nodes, and networking services.

The ability to add Compute resource pools for unpredictable workloads should
be considered. In some cases, the demand for certain instance types or flavors
may not justify individual hardware design. Allocate hardware designs that are
capable of servicing the most common instance requests. Adding hardware to the
overall architecture can be done later.


Choosing a CPU
~~~~~~~~~~~~~~
@ -3,474 +3,13 @@ Storage design

==============

Storage is found in many parts of the OpenStack cloud environment. This
section describes persistent storage options you can configure with
your cloud. It is important to understand the distinction between
:term:`ephemeral <ephemeral volume>` storage and
:term:`persistent <persistent volume>` storage.
chapter describes persistent storage options you can configure with
your cloud.

Ephemeral Storage
~~~~~~~~~~~~~~~~~

If you deploy only the OpenStack :term:`Compute service` (nova), by default
your users do not have access to any form of persistent storage. The disks
associated with VMs are "ephemeral," meaning that from the user's point
of view they disappear when a virtual machine is terminated.

Persistent Storage
~~~~~~~~~~~~~~~~~~

Persistent storage means that the storage resource outlives any other
resource and is always available, regardless of the state of a running
instance.

Today, OpenStack clouds explicitly support three types of persistent
storage: *object storage*, *block storage*, and *file system storage*.

Object Storage
--------------

Object storage is implemented in OpenStack by the
OpenStack Object Storage (swift) project. Users access binary objects
through a REST API. If your intended users need to
archive or manage large datasets, you want to provide them with Object
Storage. In addition, OpenStack can store your virtual machine (VM)
images inside of an object storage system, as an alternative to storing
the images on a file system.

OpenStack Object Storage provides a highly scalable, highly available
storage solution by relaxing some of the constraints of traditional file
systems. In designing and procuring for such a cluster, it is important
to understand some key concepts about its operation. Essentially, this
type of storage is built on the idea that all storage hardware fails, at
every level, at some point. Infrequently encountered failures that would
hamstring other storage systems, such as issues taking down RAID cards
or entire servers, are handled gracefully with OpenStack Object
Storage. For more information, see the `Swift developer
documentation <http://docs.openstack.org/developer/swift/overview_architecture.html>`_.

When designing your cluster, you must consider durability and
availability, which depend on the spread and placement of your data,
rather than the reliability of the
hardware. Consider the default value of the number of replicas, which is
three. This means that before an object is marked as having been
written, at least two copies exist; in case a single server fails to
write, the third copy may or may not yet exist when the write operation
initially returns. Altering this number increases the robustness of your
data, but reduces the amount of storage you have available. Look
at the placement of your servers. Consider spreading them widely
throughout your data center's network and power-failure zones. Is a zone
a rack, a server, or a disk?

Consider these main traffic flows for an Object Storage network:

* Among :term:`object`, :term:`container`, and
  :term:`account servers <account server>`
* Between servers and the proxies
* Between the proxies and your users

Object Storage frequently communicates among servers hosting data. Even a small
cluster generates megabytes/second of traffic, which is predominantly, “Do
you have the object?” and “Yes I have the object!” If the answer
to the question is negative or the request times out,
replication of the object begins.

Consider the scenario where an entire server fails and 24 TB of data
needs to be transferred "immediately" to remain at three copies — this can
put significant load on the network.

Another consideration is when a new file is being uploaded, the proxy server
must write out as many streams as there are replicas, multiplying network
traffic. For a three-replica cluster, 10 Gbps in means 30 Gbps out. Combining
this with the bandwidth demands of replication discussed previously is what
results in the recommendation that your private network have significantly
higher bandwidth than your public network requires. OpenStack Object Storage
communicates internally with unencrypted, unauthenticated rsync for
performance, so the private network is required.

The remaining point on bandwidth is the public-facing portion. The
``swift-proxy`` service is stateless, which means that you can easily
add more and use HTTP load-balancing methods to share bandwidth and
availability between them.

More proxies means more bandwidth, if your storage can keep up.

Block Storage
-------------

Block storage (sometimes referred to as volume storage) provides users
with access to block-storage devices. Users interact with block storage
by attaching volumes to their running VM instances.

These volumes are persistent: they can be detached from one instance and
re-attached to another, and the data remains intact. Block storage is
implemented in OpenStack by the OpenStack Block Storage (cinder), which
supports multiple back ends in the form of drivers. Your
choice of a storage back end must be supported by a Block Storage
driver.

Most block storage drivers allow the instance to have direct access to
the underlying storage hardware's block device. This helps increase the
overall read/write IO. However, support for utilizing files as volumes
is also well established, with full support for NFS, GlusterFS and
others.

These drivers work a little differently than a traditional "block"
storage driver. On an NFS or GlusterFS file system, a single file is
created and then mapped as a "virtual" volume into the instance. This
mapping/translation is similar to how OpenStack utilizes QEMU's
file-based virtual machines stored in ``/var/lib/nova/instances``.

Shared File Systems Service
---------------------------

The Shared File Systems service (manila) provides a set of services for
management of shared file systems in a multi-tenant cloud environment.
Users interact with the Shared File Systems service by mounting remote file
systems on their instances and then using those systems for file storage
and exchange. The Shared File Systems service provides shares, which are
remote, mountable file systems. You can mount a
share to and access a share from several hosts by several users at a
time. With shares, users can also:

* Create a share specifying its size, shared file system protocol, and
  visibility level.
* Create a share on either a share server or standalone, depending on
  the selected back-end mode, with or without using a share network.
* Specify access rules and security services for existing shares.
* Combine several shares in groups to keep data consistency inside the
  groups for the following safe group operations.
* Create a snapshot of a selected share or a share group for storing
  the existing shares consistently or creating new shares from that
  snapshot in a consistent way.
* Create a share from a snapshot.
* Set rate limits and quotas for specific shares and snapshots.
* View usage of share resources.
* Remove shares.

Like Block Storage, the Shared File Systems service is persistent. It
can be:

* Mounted to any number of client machines.
* Detached from one instance and attached to another without data loss.
  During this process the data are safe unless the Shared File Systems
  service itself is changed or removed.

Shares are provided by the Shared File Systems service. In OpenStack,
the Shared File Systems service is implemented by the Shared File System
(manila) project, which supports multiple back ends in the form of
drivers. The Shared File Systems service can be configured to provision
shares from one or more back ends. Share servers are, mostly, virtual
machines that export file shares using different protocols such as NFS,
CIFS, GlusterFS, or HDFS.

OpenStack Storage Concepts
~~~~~~~~~~~~~~~~~~~~~~~~~~

:ref:`table_openstack_storage` explains the different storage concepts
provided by OpenStack.

.. _table_openstack_storage:

.. list-table:: Table. OpenStack storage
   :widths: 20 20 20 20 20
   :header-rows: 1

   * -
     - Ephemeral storage
     - Block storage
     - Object storage
     - Shared File System storage
   * - Used to…
     - Run operating system and scratch space
     - Add additional persistent storage to a virtual machine (VM)
     - Store data, including VM images
     - Add additional persistent storage to a virtual machine
   * - Accessed through…
     - A file system
     - A block device that can be partitioned, formatted, and mounted
       (such as, /dev/vdc)
     - The REST API
     - A Shared File Systems service share (either manila managed or an
       external one registered in manila) that can be partitioned, formatted
       and mounted (such as /dev/vdc)
   * - Accessible from…
     - Within a VM
     - Within a VM
     - Anywhere
     - Within a VM
   * - Managed by…
     - OpenStack Compute (nova)
     - OpenStack Block Storage (cinder)
     - OpenStack Object Storage (swift)
     - OpenStack Shared File System Storage (manila)
   * - Persists until…
     - VM is terminated
     - Deleted by user
     - Deleted by user
     - Deleted by user
   * - Sizing determined by…
     - Administrator configuration of size settings, known as *flavors*
     - User specification in initial request
     - Amount of available physical storage
     - * User specification in initial request
       * Requests for extension
       * Available user-level quotas
       * Limitations applied by Administrator
   * - Encryption set by…
     - Parameter in nova.conf
     - Admin establishing `encrypted volume type
       <http://docs.openstack.org/admin-guide/dashboard-manage-volumes.html>`_,
       then user selecting encrypted volume
     - Not yet available
     - Shared File Systems service does not apply any additional encryption
       above what the share's back-end storage provides
   * - Example of typical usage…
     - 10 GB first disk, 30 GB second disk
     - 1 TB disk
     - 10s of TBs of dataset storage
     - Depends completely on the size of back-end storage specified when
       a share was being created. In case of thin provisioning it can be
       partial space reservation (for more details see
       `Capabilities and Extra-Specs
       <http://docs.openstack.org/developer/manila/devref/capabilities_and_extra_specs.html?highlight=extra%20specs#common-capabilities>`_
       specification)

.. note::

   **File-level Storage (for Live Migration)**

   With file-level storage, users access stored data using the operating
   system's file system interface. Most users, if they have used a network
   storage solution before, have encountered this form of networked
   storage. In the Unix world, the most common form of this is NFS. In the
   Windows world, the most common form is called CIFS (previously, SMB).

   OpenStack clouds do not present file-level storage to end users.
   However, it is important to consider file-level storage for storing
   instances under ``/var/lib/nova/instances`` when designing your cloud,
   since you must have a shared file system if you want to support live
   migration.

Choosing Storage Back Ends
~~~~~~~~~~~~~~~~~~~~~~~~~~

Users will indicate different needs for their cloud use cases. Some may
need fast access to many objects that do not change often, or want to
set a time-to-live (TTL) value on a file. Others may access only storage
that is mounted with the file system itself, but want it to be
replicated instantly when starting a new instance. For other systems,
ephemeral storage is the preferred choice. When you select
:term:`storage back ends <storage back end>`,
consider the following questions from the user's perspective:

* Do my users need block storage?
* Do my users need object storage?
* Do I need to support live migration?
* Should my persistent storage drives be contained in my compute nodes,
  or should I use external storage?
* What is the platter count I can achieve? Do more spindles result in
  better I/O despite network access?
* Which one results in the best cost-performance scenario I'm aiming for?
* How do I manage the storage operationally?
* How redundant and distributed is the storage? What happens if a
  storage node fails? To what extent can it mitigate my data-loss
  disaster scenarios?

To deploy your storage by using only commodity hardware, you can use a number
of open-source packages, as shown in :ref:`table_persistent_file_storage`.

.. _table_persistent_file_storage:

.. list-table:: Table. Persistent file-based storage support
   :widths: 25 25 25 25
   :header-rows: 1

   * -
     - Object
     - Block
     - File-level
   * - Swift
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
     -
   * - LVM
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
   * - Ceph
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - Experimental
   * - Gluster
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
   * - NFS
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
   * - ZFS
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
   * - Sheepdog
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -

This list of open source file-level shared storage solutions is not
exhaustive; other open source solutions exist (MooseFS). Your
organization may already have deployed a file-level shared storage
solution that you can use.

.. note::

   **Storage Driver Support**

   In addition to the open source technologies, there are a number of
   proprietary solutions that are officially supported by OpenStack Block
   Storage. You can find a matrix of the functionality provided by all of the
   supported Block Storage drivers on the `OpenStack
   wiki <https://wiki.openstack.org/wiki/CinderSupportMatrix>`_.

Also, you need to decide whether you want to support object storage in
your cloud. The two common use cases for providing object storage in a
compute cloud are:

* To provide users with a persistent storage mechanism
* As a scalable, reliable data store for virtual machine images

Commodity Storage Back-end Technologies
---------------------------------------

This section provides a high-level overview of the differences among the
different commodity storage back-end technologies. Depending on your
cloud user's needs, you can implement one or many of these technologies
in different combinations:

OpenStack Object Storage (swift)
  The official OpenStack Object Store implementation. It is a mature
  technology that has been used for several years in production by
  Rackspace as the technology behind Rackspace Cloud Files. As it is
  highly scalable, it is well-suited to managing petabytes of storage.
  OpenStack Object Storage's advantages are better integration with
  OpenStack (integrates with OpenStack Identity, works with the
  OpenStack dashboard interface) and better support for multiple data
  center deployment through support of asynchronous eventual
  consistency replication.

  Therefore, if you eventually plan on distributing your storage
  cluster across multiple data centers, if you need unified accounts
  for your users for both compute and object storage, or if you want
  to control your object storage with the OpenStack dashboard, you
  should consider OpenStack Object Storage. More detail can be found
  about OpenStack Object Storage in the section below.

Ceph
  A scalable storage solution that replicates data across commodity
  storage nodes.

  Ceph was designed to expose different types of storage interfaces to
  the end user: it supports object storage, block storage, and
  file-system interfaces, although the file-system interface is not
  production-ready. Ceph supports the same API as swift
  for object storage and can be used as a back end for cinder block
  storage as well as back-end storage for glance images. Ceph supports
  "thin provisioning," implemented using copy-on-write.

  This can be useful when booting from volume because a new volume can
  be provisioned very quickly. Ceph also supports keystone-based
  authentication (as of version 0.56), so it can be a seamless swap in
  for the default OpenStack swift implementation.

  Ceph's advantages are that it gives the administrator more
  fine-grained control over data distribution and replication
  strategies, enables you to consolidate your object and block
  storage, enables very fast provisioning of boot-from-volume
  instances using thin provisioning, and supports a distributed
  file-system interface, though this interface is `not yet
  recommended <http://ceph.com/docs/master/cephfs/>`_ for use in
  production deployment by the Ceph project.

  If you want to manage your object and block storage within a single
  system, or if you want to support fast boot-from-volume, you should
  consider Ceph.

Gluster
  A distributed, shared file system. As of Gluster version 3.3, you
  can use Gluster to consolidate your object storage and file storage
  into one unified file and object storage solution, which is called
  Gluster For OpenStack (GFO). GFO uses a customized version of swift
  that enables Gluster to be used as the back-end storage.

  The main reason to use GFO rather than swift is if you also
  want to support a distributed file system, either to support shared
  storage live migration or to provide it as a separate service to
  your end users. If you want to manage your object and file storage
  within a single system, you should consider GFO.

LVM
  The Logical Volume Manager is a Linux-based system that provides an
  abstraction layer on top of physical disks to expose logical volumes
  to the operating system. The LVM back end implements block storage
  as LVM logical partitions.

  On each host that will house block storage, an administrator must
  initially create a volume group dedicated to Block Storage volumes.
  Blocks are created from LVM logical volumes.

  .. note::

     LVM does *not* provide any replication. Typically,
     administrators configure RAID on nodes that use LVM as block
     storage to protect against failures of individual hard drives.
     However, RAID does not protect against a failure of the entire
     host.

ZFS
  The Solaris iSCSI driver for OpenStack Block Storage implements
  blocks as ZFS entities. ZFS is a file system that also has the
  functionality of a volume manager. This is unlike on a Linux system,
  where there is a separation of volume manager (LVM) and file system
  (such as, ext3, ext4, xfs, and btrfs). ZFS has a number of
  advantages over ext4, including improved data-integrity checking.

  The ZFS back end for OpenStack Block Storage supports only
  Solaris-based systems, such as Illumos. While there is a Linux port
  of ZFS, it is not included in any of the standard Linux
  distributions, and it has not been tested with OpenStack Block
  Storage. As with LVM, ZFS does not provide replication across hosts
  on its own; you need to add a replication solution on top of ZFS if
  your cloud needs to be able to handle storage-node failures.

  We don't recommend ZFS unless you have previous experience with
  deploying it, since the ZFS back end for Block Storage requires a
  Solaris-based operating system, and we assume that your experience
  is primarily with Linux-based systems.

Sheepdog
  Sheepdog is a userspace distributed storage system. Sheepdog scales
  to several hundred nodes, and has powerful virtual disk management
  features like snapshot, cloning, rollback, thin provisioning.

  It is essentially an object storage system that manages disks and
  aggregates the space and performance of disks linearly in hyper
  scale on commodity hardware in a smart way. On top of its object
  store, Sheepdog provides elastic volume service and http service.
  Sheepdog does not assume anything about kernel version and can work
  nicely with xattr-supported file systems.

.. toctree::
   :maxdepth: 2

   design-storage/design-storage-concepts
   design-storage/design-storage-backends
   design-storage/design-storage-planning-scaling.rst
@ -0,0 +1,222 @@
==========================
Choosing storage back ends
==========================

Users indicate different needs for their cloud use cases. Some may
need fast access to many objects that do not change often, or want to
set a time-to-live (TTL) value on a file. Others may access only storage
that is mounted with the file system itself, but want it to be
replicated instantly when starting a new instance. For other systems,
ephemeral storage is the preferred choice. When you select
:term:`storage back ends <storage back end>`,
consider the following questions from the user's perspective:

* Do my users need Block Storage?
* Do my users need Object Storage?
* Do I need to support live migration?
* Should my persistent storage drives be contained in my Compute nodes,
  or should I use external storage?
* What is the platter count I can achieve? Do more spindles result in
  better I/O despite network access?
* Which one results in the best cost-performance scenario I am aiming for?
* How do I manage the storage operationally?
* How redundant and distributed is the storage? What happens if a
  storage node fails? To what extent can it mitigate my data-loss
  disaster scenarios?

To deploy your storage by using only commodity hardware, you can use a number
of open-source packages, as shown in :ref:`table_persistent_file_storage`.

.. _table_persistent_file_storage:

.. list-table:: Table. Persistent file-based storage support
   :widths: 25 25 25 25
   :header-rows: 1

   * -
     - Object
     - Block
     - File-level
   * - Swift
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
     -
     -
   * - LVM
     -
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
     -
   * - Ceph
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
     - Experimental
   * - Gluster
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
   * - NFS
     -
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
   * - ZFS
     -
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
     -
   * - Sheepdog
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: ../figures/Check_mark_23x20_02.png
          :width: 30%
     -

Open source file-level shared storage solutions are available, such as
MooseFS. Your organization may already have deployed a file-level
shared storage solution that you can use.

.. note::

   **Storage Driver Support**

   In addition to the open source technologies, there are a number of
   proprietary solutions that are officially supported by OpenStack Block
   Storage. You can find a matrix of the functionality provided by all of the
   supported Block Storage drivers on the `OpenStack
   wiki <https://wiki.openstack.org/wiki/CinderSupportMatrix>`_.

You should also decide whether you want to support Object Storage in
your cloud. The two common use cases for providing Object Storage in a
Compute cloud are:

* To provide users with a persistent storage mechanism
* As a scalable, reliable data store for virtual machine images

Commodity storage back-end technologies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section provides a high-level overview of the differences among the
different commodity storage back-end technologies. Depending on your
cloud user's needs, you can implement one or many of these technologies
in different combinations:

OpenStack Object Storage (swift)
  The official OpenStack Object Store implementation. It is a mature
  technology that has been used for several years in production by
  Rackspace as the technology behind Rackspace Cloud Files. As it is
  highly scalable, it is well-suited to managing petabytes of storage.
  OpenStack Object Storage's advantages are better integration with
  OpenStack (integrates with OpenStack Identity, works with the
  OpenStack Dashboard interface) and better support for multiple data
  center deployment through support of asynchronous eventual
  consistency replication.

  If you plan on distributing your storage
  cluster across multiple data centers, if you need unified accounts
  for your users for both compute and object storage, or if you want
  to control your object storage with the OpenStack dashboard, you
  should consider OpenStack Object Storage. More detail can be found
  about OpenStack Object Storage in the section below.

Ceph
  A scalable storage solution that replicates data across commodity
  storage nodes.

  Ceph was designed to expose different types of storage interfaces to
  the end user: it supports Object Storage, Block Storage, and
  file-system interfaces, although the file-system interface is not
  production-ready. Ceph supports the same API as swift
  for Object Storage and can be used as a back end for Block
  Storage, as well as back-end storage for glance images. Ceph supports
  "thin provisioning," implemented using copy-on-write.

  This can be useful when booting from volume because a new volume can
  be provisioned very quickly. Ceph also supports keystone-based
  authentication (as of version 0.56), so it can be a seamless swap in
  for the default OpenStack swift implementation.

  Ceph's advantages are that it gives the administrator more
  fine-grained control over data distribution and replication
  strategies, enables you to consolidate your Object and Block
  Storage, enables very fast provisioning of boot-from-volume
  instances using thin provisioning, and supports a distributed
  file-system interface, though this interface is `not yet
  recommended <http://ceph.com/docs/master/cephfs/>`_ for use in
  production deployment by the Ceph project.

  If you want to manage your Object and Block Storage within a single
  system, or if you want to support fast boot-from-volume, you should
  consider Ceph.

Gluster
  A distributed, shared file system. As of Gluster version 3.3, you
  can use Gluster to consolidate your object storage and file storage
  into one unified file and Object Storage solution, which is called
  Gluster For OpenStack (GFO). GFO uses a customized version of swift
  that enables Gluster to be used as the back-end storage.

  The main reason to use GFO rather than swift is if you also
  want to support a distributed file system, either to support shared
  storage live migration or to provide it as a separate service to
  your end users. If you want to manage your object and file storage
  within a single system, you should consider GFO.

LVM
  The Logical Volume Manager is a Linux-based system that provides an
  abstraction layer on top of physical disks to expose logical volumes
  to the operating system. The LVM back end implements block storage
  as LVM logical partitions.

  On each host that houses Block Storage, an administrator must
  initially create a volume group dedicated to Block Storage volumes.
  Blocks are created from LVM logical volumes.

  .. note::

     LVM does *not* provide any replication. Typically,
     administrators configure RAID on nodes that use LVM as block
     storage to protect against failures of individual hard drives.
     However, RAID does not protect against a failure of the entire
     host.

ZFS
  The Solaris iSCSI driver for OpenStack Block Storage implements
  blocks as ZFS entities. ZFS is a file system that also has the
  functionality of a volume manager. This is unlike on a Linux system,
  where there is a separation of volume manager (LVM) and file system
  (such as, ext3, ext4, xfs, and btrfs). ZFS has a number of
  advantages over ext4, including improved data-integrity checking.

  The ZFS back end for OpenStack Block Storage supports only
  Solaris-based systems, such as Illumos. While there is a Linux port
  of ZFS, it is not included in any of the standard Linux
  distributions, and it has not been tested with OpenStack Block
  Storage. As with LVM, ZFS does not provide replication across hosts
  on its own; you need to add a replication solution on top of ZFS if
  your cloud needs to be able to handle storage-node failures.

  We don't recommend ZFS unless you have previous experience with
  deploying it, since the ZFS back end for Block Storage requires a
  Solaris-based operating system, and we assume that your experience
  is primarily with Linux-based systems.

Sheepdog
  Sheepdog is a userspace distributed storage system. Sheepdog scales
  to several hundred nodes, and has powerful virtual disk management
  features like snapshot, cloning, rollback, thin provisioning.

  It is essentially an object storage system that manages disks and
  aggregates the space and performance of disks linearly in hyper
  scale on commodity hardware in a smart way. On top of its object
  store, Sheepdog provides elastic volume service and http service.
  Sheepdog does not assume anything about kernel version and can work
  nicely with xattr-supported file systems.

.. TODO Add summary of when Sheepdog is recommended
@ -0,0 +1,257 @@
================
Storage concepts
================

This section describes persistent storage options you can configure with
your cloud. It is important to understand the distinction between
:term:`ephemeral <ephemeral volume>` storage and
:term:`persistent <persistent volume>` storage.

Ephemeral storage
~~~~~~~~~~~~~~~~~

If you deploy only the OpenStack :term:`Compute service` (nova), by default
your users do not have access to any form of persistent storage. The disks
associated with VMs are ephemeral, meaning that from the user's point
of view they disappear when a virtual machine is terminated.

Persistent storage
~~~~~~~~~~~~~~~~~~

Persistent storage is a storage resource that outlives any other
resource and is always available, regardless of the state of a running
instance.

OpenStack clouds explicitly support three types of persistent
storage: *Object Storage*, *Block Storage*, and *file system storage*.

:ref:`table_openstack_storage` explains the different storage concepts
provided by OpenStack.

.. _table_openstack_storage:

.. list-table:: Table. OpenStack storage
   :widths: 20 20 20 20 20
   :header-rows: 1

   * -
     - Ephemeral storage
     - Block storage
     - Object storage
     - Shared File System storage
   * - Used to…
     - Run operating system and scratch space
     - Add additional persistent storage to a virtual machine (VM)
     - Store data, including VM images
     - Add additional persistent storage to a virtual machine
   * - Accessed through…
     - A file system
     - A block device that can be partitioned, formatted, and mounted
       (such as, /dev/vdc)
     - The REST API
     - A Shared File Systems service share (either manila managed or an
       external one registered in manila) that can be partitioned, formatted
       and mounted (such as /dev/vdc)
   * - Accessible from…
     - Within a VM
     - Within a VM
     - Anywhere
     - Within a VM
   * - Managed by…
     - OpenStack Compute (nova)
     - OpenStack Block Storage (cinder)
     - OpenStack Object Storage (swift)
     - OpenStack Shared File System Storage (manila)
   * - Persists until…
     - VM is terminated
     - Deleted by user
     - Deleted by user
     - Deleted by user
   * - Sizing determined by…
     - Administrator configuration of size settings, known as *flavors*
     - User specification in initial request
     - Amount of available physical storage
     - * User specification in initial request
       * Requests for extension
       * Available user-level quotas
       * Limitations applied by Administrator
   * - Encryption set by…
     - Parameter in nova.conf
     - Admin establishing `encrypted volume type
       <http://docs.openstack.org/admin-guide/dashboard_manage_volumes.html>`_,
       then user selecting encrypted volume
     - Not yet available
     - Shared File Systems service does not apply any additional encryption
       above what the share's back-end storage provides
   * - Example of typical usage…
     - 10 GB first disk, 30 GB second disk
     - 1 TB disk
     - 10s of TBs of dataset storage
     - Depends completely on the size of back-end storage specified when
       a share was being created. In case of thin provisioning it can be
       partial space reservation (for more details see
       `Capabilities and Extra-Specs
       <http://docs.openstack.org/developer/manila/devref/capabilities_and_extra_specs.html?highlight=extra%20specs#common-capabilities>`_
       specification)

.. note::

   **File-level Storage (for Live Migration)**

   With file-level storage, users access stored data using the operating
   system's file system interface. Most users, if they have used a network
   storage solution before, have encountered this form of networked
   storage. In the Unix world, the most common form of this is NFS. In the
   Windows world, the most common form is called CIFS (previously, SMB).

   OpenStack clouds do not present file-level storage to end users.
   However, it is important to consider file-level storage for storing
   instances under ``/var/lib/nova/instances`` when designing your cloud,
   since you must have a shared file system if you want to support live
   migration.

Object Storage
--------------

.. TODO (shaun) Revise this section. I would start with an abstract of object
   storage and then describe how swift fits into it. I think this will match
   the rest of the sections better.

Object storage is implemented in OpenStack by the
OpenStack Object Storage (swift) project. Users access binary objects
through a REST API. If your intended users need to
archive or manage large datasets, you want to provide them with Object
Storage. In addition, OpenStack can store your virtual machine (VM)
images inside of an Object Storage system, as an alternative to storing
the images on a file system.

OpenStack Object Storage provides a highly scalable, highly available
storage solution by relaxing some of the constraints of traditional file
systems. In designing and procuring for such a cluster, it is important
to understand some key concepts about its operation. Essentially, this
type of storage is built on the idea that all storage hardware fails, at
every level, at some point. Infrequently encountered failures that would
hamstring other storage systems, such as issues taking down RAID cards
or entire servers, are handled gracefully with OpenStack Object
Storage. For more information, see the `Swift developer
documentation <http://docs.openstack.org/developer/swift/overview_architecture.html>`_.

When designing your cluster, consider:

* Durability and availability, which depend on the spread and
  placement of your data, rather than the reliability of the hardware.

* The default value of the number of replicas, which is
  three. This means that before an object is marked as having been
  written, at least two copies exist; in case a single server fails to
  write, the third copy may or may not yet exist when the write operation
  initially returns. Altering this number increases the robustness of your
  data, but reduces the amount of storage you have available (see the
  sketch after this list for the capacity trade-off).

* Placement of your servers, and whether to spread them widely
  throughout your data center's network and power-failure zones. Define
  a zone as a rack, a server, or a disk.
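
The capacity side of the replica trade-off can be sketched with simple
arithmetic. The raw capacity figure below is a hypothetical example.

.. code-block:: python

   raw_capacity_tb = 600  # hypothetical total disk across the cluster

   for replicas in (2, 3, 4):
       usable_tb = raw_capacity_tb / replicas
       print(f"{replicas} replicas: roughly {usable_tb:.0f} TB usable")
   # 2 replicas: roughly 300 TB usable
   # 3 replicas: roughly 200 TB usable
   # 4 replicas: roughly 150 TB usable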
|
||||
|
||||
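The trade-off between replica count and usable capacity mentioned above is
straightforward to estimate. The following sketch is illustrative only; the
cluster size is a hypothetical example and the figures ignore hand-off
partitions and other overhead.

.. code-block:: python

   # Rough estimate of usable Object Storage capacity per replica count.
   # The raw capacity below is a hypothetical example.

   def usable_capacity_tb(raw_capacity_tb, replicas):
       """Return approximate usable capacity once every object is replicated."""
       return raw_capacity_tb / replicas

   raw_tb = 720   # total raw disk capacity across all storage nodes
   for replicas in (2, 3, 4):
       print("replicas=%d -> ~%.0f TB usable" %
             (replicas, usable_capacity_tb(raw_tb, replicas)))
   # With the default of three replicas, roughly one third of the raw
   # capacity remains usable.
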
Consider these main traffic flows for an Object Storage network:

* Among :term:`object`, :term:`container`, and
  :term:`account servers <account server>`
* Between servers and the proxies
* Between the proxies and your users

Object Storage frequently communicates among servers hosting data. Even a small
cluster generates megabytes per second of traffic. If an object is not received
or the request times out, replication of the object begins.

.. TODO Above paragraph: describe what Object Storage is communicating. What
   is actually communicating? What part of the software is doing the
   communicating? Is it all of the servers communicating with one another?

Consider the scenario where an entire server fails and 24 TB of data
needs to be transferred immediately to remain at three copies; this can
put significant load on the network.

Another consideration is that when a new object is uploaded, the proxy server
must write out as many streams as there are replicas, multiplying network
traffic. For a three-replica cluster, 10 Gbps in means 30 Gbps out. Combined
with the high-bandwidth demands of replication, this is why we recommend that
your private network have significantly higher bandwidth than your public
network requires. OpenStack Object Storage communicates internally with
unencrypted, unauthenticated rsync for performance, so a private
network is required.

.. TODO Revise the above paragraph for clarity.

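The bandwidth arithmetic above can be sketched quickly. The figures below
(replica count, ingest rate, failed-node capacity, and recovery window) are
hypothetical examples, not measurements.

.. code-block:: python

   # Back-of-the-envelope Object Storage network sizing.
   # All input values are illustrative placeholders.

   replicas = 3
   client_ingest_gbps = 10             # traffic arriving at the proxy tier
   internal_write_gbps = client_ingest_gbps * replicas
   print("Proxy fan-out traffic: %d Gbps" % internal_write_gbps)   # 30 Gbps

   # Re-replication after losing a storage node holding 24 TB of data.
   failed_node_tb = 24
   recovery_window_hours = 24          # target time to restore three copies
   tb_to_gbits = 8 * 1000              # 1 TB ~= 8000 gigabits (decimal units)
   required_gbps = failed_node_tb * tb_to_gbits / (recovery_window_hours * 3600)
   print("Sustained replication traffic: %.1f Gbps" % required_gbps)  # ~2.2 Gbps
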
The remaining point on bandwidth is the public-facing portion. The
``swift-proxy`` service is stateless, which means that you can easily
add more proxies and use HTTP load-balancing methods to share bandwidth and
availability between them.

Block Storage
-------------

Block storage provides users with access to block storage devices. Users
interact with Block Storage by attaching volumes to their running VM instances.

These volumes are persistent: they can be detached from one instance and
re-attached to another, and the data remains intact. Block storage is
implemented in OpenStack by the OpenStack Block Storage service (cinder), which
supports multiple back ends in the form of drivers. Your
choice of a storage back end must be supported by a Block Storage
driver.

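For illustration, the following minimal sketch creates a volume with the
python-cinderclient library. The credentials and endpoint are placeholders
and depend on your Identity service setup; the resulting volume could then be
attached to an instance through the Compute API (for example, with
``nova volume-attach``).

.. code-block:: python

   from cinderclient import client as cinder_client
   from keystoneauth1 import loading, session

   # Placeholder credentials; adjust for your Identity service.
   loader = loading.get_plugin_loader('password')
   auth = loader.load_from_options(
       auth_url='http://controller:5000/v3',
       username='demo',
       password='secret',
       project_name='demo',
       user_domain_name='Default',
       project_domain_name='Default')
   sess = session.Session(auth=auth)

   cinder = cinder_client.Client('2', session=sess)

   # Create a 10 GB volume; the chosen back end must have a matching driver.
   volume = cinder.volumes.create(size=10, name='data-volume')
   print(volume.id, volume.status)
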
Most Block Storage drivers allow the instance to have direct access to
the underlying storage hardware's block device. This helps increase the
overall read and write IO. However, support for utilizing files as volumes
is also well established, with full support for NFS, GlusterFS, and
others.

These drivers work a little differently than a traditional Block
Storage driver. On an NFS or GlusterFS file system, a single file is
created and then mapped as a virtual volume into the instance. This
mapping or translation is similar to how OpenStack utilizes QEMU's
file-based virtual machines stored in ``/var/lib/nova/instances``.

Shared File Systems Service
---------------------------

The Shared File Systems service (manila) provides a set of services for
managing shared file systems in a multi-tenant cloud environment.
Users interact with the Shared File Systems service by mounting remote file
systems on their instances, and then using those systems for file storage
and exchange. The Shared File Systems service provides you with
a share, which is a remote, mountable file system. You can mount a
share to, and access a share from, several hosts by several users at a
time. With shares, a user can also:

* Create a share specifying its size, shared file system protocol, and
  visibility level.
* Create a share on either a share server or standalone, depending on
  the selected back-end mode, with or without using a share network.
* Specify access rules and security services for existing shares.
* Combine several shares in groups to keep data consistency inside the
  groups for subsequent safe group operations.
* Create a snapshot of a selected share or a share group to store
  the existing shares consistently or to create new shares from that
  snapshot in a consistent way.
* Create a share from a snapshot.
* Set rate limits and quotas for specific shares and snapshots.
* View usage of share resources.
* Remove shares.

Like Block Storage, the Shared File Systems service is persistent. It
can be:

* Mounted to any number of client machines.
* Detached from one instance and attached to another without data loss.
  During this process the data are safe unless the Shared File Systems
  service itself is changed or removed.

Shares are provided by the Shared File Systems service. In OpenStack,
the Shared File Systems service is implemented by the Shared File Systems
(manila) project, which supports multiple back ends in the form of
drivers. The Shared File Systems service can be configured to provision
shares from one or more back ends. Share servers are virtual
machines that export file shares using different protocols such as NFS,
CIFS, GlusterFS, or HDFS.

@ -1,6 +1,6 @@

=============================
Capacity planning and scaling
=============================
=====================================
Storage capacity planning and scaling
=====================================

An important consideration in running a cloud over time is projecting growth
and utilization trends in order to plan capital expenditures for the short and
@ -312,91 +312,3 @@ resources servicing requests between proxy servers and storage nodes.

For this reason, the network architecture used for access to storage
nodes and proxy servers should make use of a design which is scalable.

Compute resource design
~~~~~~~~~~~~~~~~~~~~~~~

When designing compute resource pools, consider the number of processors,
amount of memory, and the quantity of storage required for each hypervisor.

Consider whether compute resources will be provided in a single pool or in
multiple pools. In most cases, multiple pools of resources can be allocated
and addressed on demand; this approach is commonly referred to as bin packing.

In a bin packing design, each independent resource pool provides service
for specific flavors. Since instances are scheduled onto compute hypervisors,
each independent node's resources will be allocated to efficiently use the
available hardware. Bin packing also requires a common hardware design,
with all hardware nodes within a compute resource pool sharing a common
processor, memory, and storage layout. This makes it easier to deploy,
support, and maintain nodes throughout their lifecycle.

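To make the bin packing idea concrete, the following sketch estimates how many
instances of a single flavor fit on one hypervisor. The node and flavor
figures are hypothetical and ignore host reserved overhead and overcommit
(overcommit ratios are covered later in this section).

.. code-block:: python

   # Hypothetical hardware layout for one node in a compute resource pool.
   node = {'vcpus': 48, 'ram_mb': 256 * 1024, 'disk_gb': 3600}

   # Hypothetical flavor served by this pool.
   flavor = {'vcpus': 4, 'ram_mb': 8192, 'disk_gb': 80}

   # The most constrained resource determines how many instances fit per node.
   instances_per_node = min(node[k] // flavor[k] for k in flavor)
   print("Instances of this flavor per node:", instances_per_node)
   # CPU is the limit here: 12 by CPU, 32 by RAM, 45 by disk.
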
Increasing the size of the supporting compute environment increases the
network traffic and messages, adding load to the controller or
networking nodes. Effective monitoring of the environment will help with
capacity decisions on scaling.

Compute nodes automatically attach to OpenStack clouds, resulting in a
horizontally scaling process when adding extra compute capacity to an
OpenStack cloud. Additional processes are required to place nodes into
appropriate availability zones and host aggregates. When adding
additional compute nodes to environments, ensure identical or functionally
compatible CPUs are used; otherwise, live migration features will break.
It is necessary to add rack capacity or network switches as scaling out
compute hosts directly affects network and data center resources.

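As one illustration of placing new nodes into host aggregates and availability
zones, the sketch below uses python-novaclient. The credentials, aggregate
name, zone name, and host names are placeholders, and your deployment tooling
may handle this step differently.

.. code-block:: python

   from keystoneauth1 import loading, session
   from novaclient import client as nova_client

   # Placeholder admin credentials; adjust for your Identity service.
   auth = loading.get_plugin_loader('password').load_from_options(
       auth_url='http://controller:5000/v3',
       username='admin',
       password='secret',
       project_name='admin',
       user_domain_name='Default',
       project_domain_name='Default')
   nova = nova_client.Client('2', session=session.Session(auth=auth))

   # Group the new hardware into an aggregate exposed as an availability zone.
   aggregate = nova.aggregates.create('pool-general', 'az-general')

   # Add the freshly deployed compute nodes (placeholder host names).
   for host in ('compute21', 'compute22'):
       nova.aggregates.add_host(aggregate, host)
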
Compute host components can also be upgraded to account for increases in
demand, known as vertical scaling. Upgrading CPUs with more
cores, or increasing the overall server memory, can add extra needed
capacity depending on whether the running applications are more CPU
intensive or memory intensive.

When selecting a processor, compare features and performance
characteristics. Some processors include features specific to
virtualized compute hosts, such as hardware-assisted virtualization, and
technology related to memory paging (also known as EPT shadowing). These
types of features can have a significant impact on the performance of
your virtual machine.

The number of processor cores and threads impacts the number of worker
threads which can be run on a resource node. Design decisions must
relate directly to the service being run on it, as well as provide a
balanced infrastructure for all services.

Another option is to assess the average workloads and increase the
number of instances that can run within the compute environment by
adjusting the overcommit ratio.

An overcommit ratio is the ratio of available virtual resources to
available physical resources. This ratio is configurable for CPU and
memory. The default CPU overcommit ratio is 16:1, and the default memory
overcommit ratio is 1.5:1. Determining the tuning of the overcommit
ratios during the design phase is important as it has a direct impact on
the hardware layout of your compute nodes.

.. note::

   Changing the CPU overcommit ratio can have a detrimental effect
   and cause a potential increase in noisy neighbor problems.

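A quick worked example of the overcommit arithmetic follows. These ratios are
typically set with the ``cpu_allocation_ratio`` and ``ram_allocation_ratio``
options in ``nova.conf``; the hardware and flavor figures below are
hypothetical.

.. code-block:: python

   # Effective capacity of one compute node under the default overcommit ratios.
   # The physical core and memory figures are hypothetical examples.

   physical_cores = 24
   physical_ram_gb = 256

   cpu_overcommit = 16.0   # cpu_allocation_ratio (default 16:1)
   ram_overcommit = 1.5    # ram_allocation_ratio (default 1.5:1)

   schedulable_vcpus = physical_cores * cpu_overcommit
   schedulable_ram_gb = physical_ram_gb * ram_overcommit

   print("Schedulable vCPUs: %d" % schedulable_vcpus)      # 384
   print("Schedulable RAM:   %d GB" % schedulable_ram_gb)  # 384 GB

   # With a 4 vCPU / 8 GB flavor, RAM becomes the limiting factor here:
   print(int(min(schedulable_vcpus // 4, schedulable_ram_gb // 8)), "instances")
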
Insufficient disk capacity could also have a negative effect on overall
performance, including CPU and memory usage. Depending on the back-end
architecture of the OpenStack Block Storage layer, adding capacity may mean
adding disk shelves to enterprise storage systems or installing
additional Block Storage nodes. Upgrading the directly attached storage
installed in compute hosts, and adding capacity to the shared storage
that provides additional ephemeral storage to instances, may also be
necessary.

Consider the compute requirements of non-hypervisor nodes (also referred to as
resource nodes). This includes controller, Object Storage, and Block Storage
nodes, and networking services.

The ability to add compute resource pools for unpredictable workloads should
be considered. In some cases, the demand for certain instance types or flavors
may not justify an individual hardware design. Allocate hardware designs that
are capable of servicing the most common instance requests. Adding hardware to
the overall architecture can be done later.

For more information on these topics, refer to the `OpenStack
Operations Guide <http://docs.openstack.org/ops>`_.

.. TODO Add information on control plane API services and horizon.

@ -1,5 +1,187 @@
.. _high-availability:

=================
High Availability
High availability
=================

Data plane and control plane
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When designing an OpenStack cloud, it is important to consider the needs
dictated by the :term:`Service Level Agreement (SLA)`. This includes the core
services required to maintain availability of running Compute service
instances, networks, storage, and additional services running on top of those
resources. These services are often referred to as the Data Plane services,
and are generally expected to be available all the time.

The remaining services, responsible for create, read, update and delete (CRUD)
operations, metering, monitoring, and so on, are often referred to as the
Control Plane. The SLA is likely to dictate a lower uptime requirement for
these services.

The services comprising an OpenStack cloud have a number of requirements which
the architect needs to understand in order to be able to meet SLA terms. For
example, in order to provide the Compute service, a minimum of storage, message
queueing, and database services are necessary, as well as the networking
between them.

Ongoing maintenance operations are made much simpler if there is logical and
physical separation of Data Plane and Control Plane systems. It then becomes
possible to, for example, reboot a controller without affecting customers.
If one service failure affects the operation of an entire server (``noisy
neighbor``), the separation between Control and Data Planes enables rapid
maintenance with a limited effect on customer operations.

Eliminating single points of failure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. TODO Add introduction

Within each site
----------------

OpenStack lends itself to deployment in a highly available manner where it is
expected that at least two servers are utilized. These can run all the services
involved, from the message queuing service (for example, ``RabbitMQ`` or
``QPID``) to an appropriately deployed database service (such as ``MySQL`` or
``MariaDB``). As services in the cloud are scaled out, back-end services will
need to scale too. Monitoring and reporting on server utilization and response
times, as well as load testing your systems, will help determine scale out
decisions.

The OpenStack services themselves should be deployed across multiple servers
that do not represent a single point of failure. Ensuring availability can
be achieved by placing these services behind highly available load balancers
that have multiple OpenStack servers as members.

There are a small number of OpenStack services which are intended to only run
in one place at a time (for example, the ``ceilometer-agent-central`` service).
In order to prevent these services from becoming a single point of failure,
they can be controlled by clustering software such as ``Pacemaker``.

In OpenStack, the infrastructure is integral to providing services and should
always be available, especially when operating with SLAs. Ensuring network
availability is accomplished by designing the network architecture so that no
single point of failure exists. The number of switches, routes, and redundant
power supplies should be factored into the core infrastructure, as well as the
associated bonding of networks to provide diverse routes to your
highly available switch infrastructure.

Care must be taken when deciding network functionality. Currently, OpenStack
supports both the legacy networking (nova-network) system and the newer,
extensible OpenStack Networking (neutron). OpenStack Networking and legacy
networking both have their advantages and disadvantages. They are both valid
and supported options that fit different network deployment models described in
the `OpenStack Operations Guide
<http://docs.openstack.org/ops-guide/arch_network_design.html#network-topology>`_.

When using the Networking service, the OpenStack controller servers or separate
Networking hosts handle routing unless the dynamic virtual routers pattern for
routing is selected. Running routing directly on the controller servers mixes
the Data and Control Planes and can cause complex issues with performance and
troubleshooting. It is possible to use third-party software and external
appliances that help maintain highly available layer three routes. Doing so
allows for common application endpoints to control network hardware, or to
provide complex multi-tier web applications in a secure manner. It is also
possible to completely remove routing from Networking, and instead rely on
hardware routing capabilities. In this case, the switching infrastructure must
support layer three routing.

Application design must also be factored into the capabilities of the
underlying cloud infrastructure. If the compute hosts do not provide a seamless
live migration capability, then it must be expected that if a compute host
fails, that instance and any data local to that instance will be deleted.
However, when providing an expectation to users that instances have a high
level of uptime guaranteed, the infrastructure must be deployed in a way
that eliminates any single point of failure if a compute host disappears.
This may include utilizing shared file systems on enterprise storage or
OpenStack Block Storage to provide a level of guarantee to match service
features.

If using a storage design that includes shared access to centralized storage,
ensure that this is also designed without single points of failure and the SLA
for the solution matches or exceeds the expected SLA for the Data Plane.

Between sites in a multi-region design
--------------------------------------

Some services are commonly shared between multiple regions, including the
Identity service and the Dashboard. In this case, it is necessary to ensure
that the databases backing the services are replicated, and that access to
multiple workers across each site can be maintained in the event of losing a
single region.

Multiple network links should be deployed between sites to provide redundancy
for all components. This includes storage replication, which should be isolated
to a dedicated network or VLAN with the ability to assign QoS to control the
replication traffic or provide priority for this traffic.

.. note::

   If the data store is highly changeable, the network requirements could have
   a significant effect on the operational cost of maintaining the sites.

If the design incorporates more than one site, the ability to maintain object
availability in both sites has significant implications on the Object Storage
design and implementation. It also has a significant impact on the WAN network
design between the sites.

If applications running in a cloud are not cloud-aware, there should be clear
measures and expectations to define what the infrastructure can and cannot
support. An example would be shared storage between sites. It is possible;
however, such a solution is not native to OpenStack and requires a third-party
hardware vendor to fulfill such a requirement. Another example can be seen in
applications that are able to consume resources in object storage directly.

Connecting more than two sites increases the challenges and adds more
complexity to the design considerations. Multi-site implementations require
planning to address the additional topology used for internal and external
connectivity. Some options include full mesh, hub and spoke, spine and leaf,
and 3D torus topologies.

For more information on high availability in OpenStack, see the `OpenStack High
Availability Guide <http://docs.openstack.org/ha-guide/>`_.

Site loss and recovery
~~~~~~~~~~~~~~~~~~~~~~

Outages can cause partial or full loss of site functionality. Strategies
should be implemented to understand and plan for recovery scenarios.

* The deployed applications need to continue to function and, more
  importantly, you must consider the impact on the performance and
  reliability of the application if a site is unavailable.

* It is important to understand what happens to the replication of
  objects and data between the sites when a site goes down. If this
  causes queues to start building up, consider how long these queues
  can safely exist until an error occurs.

* After an outage, ensure that operations of a site are resumed when it
  comes back online. We recommend that you architect the recovery to
  avoid race conditions.

Inter-site replication data
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Traditionally, replication has been the best method of protecting object store
implementations. A variety of replication methods exist in storage
architectures, for example synchronous and asynchronous mirroring. Most object
stores and back-end storage systems implement methods for replication at the
storage subsystem layer. Object stores also tailor replication techniques to
fit a cloud's requirements.

Organizations must find the right balance between data integrity and data
availability. Replication strategy may also influence disaster recovery
methods.

Replication across different racks, data centers, and geographical regions
increases focus on determining and ensuring data locality. The ability to
guarantee data is accessed from the nearest or fastest storage can be necessary
for applications to perform well.

.. note::

   When running embedded object store methods, ensure that you do not
   instigate extra data replication as this may cause performance issues.