[arch-design] Migrate arch content from Ops Guide
1. Migrate storage and scaling content from Ops Guide to Arch Guide
2. Edit migrated content
3. Remove images from mitaka changes

Note: Architecture chapter content in the Ops Guide will remain until
the new Arch Guide is published.

Change-Id: I676b57635be567a0c1b3ea63650e6327d3ea0696
Implements: blueprint arch-guide-restructure
@ -1,3 +1,430 @@
=============================
Capacity planning and scaling
=============================

Whereas traditional applications required larger hardware to scale
(vertical scaling), cloud-based applications typically request more,
discrete hardware (horizontal scaling).

OpenStack is designed to be horizontally scalable. Rather than switching
to larger servers, you procure more servers and simply install identically
configured services. Ideally, you scale out and load balance among groups of
functionally identical services (for example, compute nodes or ``nova-api``
nodes) that communicate on a message bus.

The Starting Point
~~~~~~~~~~~~~~~~~~

Determining the scalability of your cloud and how to improve it requires
balancing many variables. No one solution meets everyone's scalability goals.
However, it is helpful to track a number of metrics. Since you can define
virtual hardware templates, called "flavors" in OpenStack, you can start to
make scaling decisions based on the flavors you'll provide. These templates
define sizes for memory in RAM, root disk size, amount of ephemeral data disk
space available, and number of cores, for starters.

The default OpenStack flavors are shown in :ref:`table_default_flavors`.

.. _table_default_flavors:

.. list-table:: Table. OpenStack default flavors
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - Name
     - Virtual cores
     - Memory
     - Disk
     - Ephemeral
   * - m1.tiny
     - 1
     - 512 MB
     - 1 GB
     - 0 GB
   * - m1.small
     - 1
     - 2 GB
     - 10 GB
     - 20 GB
   * - m1.medium
     - 2
     - 4 GB
     - 10 GB
     - 40 GB
   * - m1.large
     - 4
     - 8 GB
     - 10 GB
     - 80 GB
   * - m1.xlarge
     - 8
     - 16 GB
     - 10 GB
     - 160 GB

The starting point is the core count of your cloud. By applying
some ratios, you can gather information about:

- The number of virtual machines (VMs) you expect to run,
  ``((overcommit fraction × cores) / virtual cores per instance)``

- How much storage is required ``(flavor disk size × number of instances)``

You can use these ratios to determine how much additional infrastructure
you need to support your cloud.

Here is an example using the ratios for gathering scalability
information for the number of VMs expected as well as the storage
needed. The following numbers support (200 / 2) × 16 = 1600 VM instances
and require 80 TB of storage for ``/var/lib/nova/instances``:

- 200 physical cores.

- Most instances are size m1.medium (two virtual cores, 50 GB of
  storage).

- Default CPU overcommit ratio (``cpu_allocation_ratio`` in ``nova.conf``)
  of 16:1.

.. note::
   Regardless of the overcommit ratio, an instance cannot be placed
   on any physical node with fewer raw (pre-overcommit) resources than
   the instance flavor requires.
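
The arithmetic in this example can be sketched as a short calculation.
The following is an illustration only, using the values above (200
physical cores, the m1.medium flavor, and a 16:1 CPU overcommit ratio);
it is not a tool shipped with OpenStack:

.. code-block:: python

   # Rough capacity estimate based on the ratios described above.
   physical_cores = 200
   cpu_allocation_ratio = 16        # CPU overcommit ratio
   flavor_vcpus = 2                 # m1.medium
   flavor_disk_gb = 10 + 40         # m1.medium root disk + ephemeral disk

   max_instances = (physical_cores * cpu_allocation_ratio) // flavor_vcpus
   storage_tb = max_instances * flavor_disk_gb / 1000.0

   print(max_instances)  # 1600 VM instances
   print(storage_tb)     # 80.0 TB for /var/lib/nova/instances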

However, you need more than the core count alone to estimate the load
that the API services, database servers, and queue servers are likely to
encounter. You must also consider the usage patterns of your cloud.

As a specific example, compare a cloud that supports a managed
web-hosting platform with one running integration tests for a
development project that creates one VM per code commit. In the former,
the heavy work of creating a VM happens only every few months, whereas
the latter puts constant heavy load on the cloud controller. You must
consider your average VM lifetime, as a larger number generally means
less load on the cloud controller.

.. TODO Perhaps relocate the above paragraph under the web scale use case?

Aside from the creation and termination of VMs, you must consider the
impact of users accessing the service, particularly on ``nova-api`` and
its associated database. Listing instances garners a great deal of
information and, given the frequency with which users run this
operation, a cloud with a large number of users can increase the load
significantly. This can occur even without their knowledge. For example,
leaving the OpenStack dashboard instances tab open in the browser
refreshes the list of VMs every 30 seconds.

After you consider these factors, you can determine how many cloud
controller cores you require. A typical eight-core, 8 GB of RAM server
is sufficient for up to a rack of compute nodes, given the above
caveats.

You must also consider key hardware specifications for the performance
of user VMs, as well as budget and performance needs, including storage
performance (spindles/core), memory availability (RAM/core), network
bandwidth (Gbps/core), and overall CPU performance (CPU/core).
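
As a simple illustration of these per-core ratios, the following sketch
evaluates a hypothetical compute node specification. The numbers are
placeholders for comparison purposes only, not recommendations:

.. code-block:: python

   # Per-core ratios for a candidate compute node (example values only).
   cores = 24.0            # physical cores per node
   ram_gb = 256.0          # RAM per node
   spindles = 6.0          # local disks per node
   network_gbps = 20.0     # for example, 2 x 10 Gb NICs

   print("RAM/core: %.1f GB" % (ram_gb / cores))
   print("Spindles/core: %.2f" % (spindles / cores))
   print("Network Gbps/core: %.2f" % (network_gbps / cores))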

.. tip::

   For a discussion of metric tracking, including how to extract
   metrics from your cloud, see the `OpenStack Operations Guide
   <http://docs.openstack.org/ops-guide/ops_logging_monitoring.html>`_.

Adding Cloud Controller Nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can facilitate the horizontal expansion of your cloud by adding
nodes. Adding compute nodes is straightforward since they are easily picked up
by the existing installation. However, you must consider some important
points when you design your cluster to be highly available.

A cloud controller node runs several different services. You
can install services that communicate only using the message queue
internally (``nova-scheduler`` and ``nova-console``) on a new server for
expansion. However, other integral parts require more care.

You should load balance user-facing services such as dashboard,
``nova-api``, or the Object Storage proxy. Use any standard HTTP
load-balancing method (DNS round robin, hardware load balancer, or
software such as Pound or HAProxy). One caveat with dashboard is the VNC
proxy, which uses the WebSocket protocol, something that an L7 load
balancer might struggle with. See also `Horizon session storage
<http://docs.openstack.org/developer/horizon/topics/deployment.html#session-storage>`_.

You can configure some services, such as ``nova-api`` and
``glance-api``, to use multiple processes by changing a flag in their
configuration file, allowing them to share work between multiple cores on
one machine.
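
The number of worker processes is typically sized to the host's core
count. The following is a rough sketch of that sizing rule of thumb only;
it is not OpenStack code, and the exact configuration option name depends
on the service:

.. code-block:: python

   # Illustrative rule of thumb: roughly one API worker per core, keeping
   # a couple of cores free for other controller services.
   import multiprocessing

   reserved_cores = 2
   workers = max(1, multiprocessing.cpu_count() - reserved_cores)
   print("suggested worker count: %d" % workers)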

.. tip::

   Several options are available for MySQL load balancing, and the
   supported AMQP brokers have built-in clustering support. Information
   on how to configure these and many of the other services can be
   found in the `operations chapter
   <http://docs.openstack.org/ops-guide/operations.html>`_ in the Operations
   Guide.

Segregating Your Cloud
~~~~~~~~~~~~~~~~~~~~~~

When you want to offer users different regions for legal considerations
of data storage, redundancy across earthquake fault lines, or
low-latency API calls, you segregate your cloud. Use one
of the following OpenStack methods to segregate your cloud: *cells*,
*regions*, *availability zones*, or *host aggregates*.

Each method provides different functionality and can be best divided
into two groups:

- Cells and regions, which segregate an entire cloud and result in
  running separate Compute deployments.

- :term:`Availability zones <availability zone>` and host aggregates,
  which merely divide a single Compute deployment.

:ref:`table_segregation_methods` provides a comparison view of each
segregation method currently provided by OpenStack Compute.

.. _table_segregation_methods:

.. list-table:: Table. OpenStack segregation methods
   :widths: 20 20 20 20 20
   :header-rows: 1

   * -
     - Cells
     - Regions
     - Availability zones
     - Host aggregates
   * - **Use**
     - A single :term:`API endpoint` for compute, or you require a second
       level of scheduling.
     - Discrete regions with separate API endpoints and no coordination
       between regions.
     - Logical separation within your nova deployment for physical isolation
       or redundancy.
     - To schedule a group of hosts with common features.
   * - **Example**
     - A cloud with multiple sites where you can schedule VMs "anywhere" or on
       a particular site.
     - A cloud with multiple sites, where you schedule VMs to a particular
       site and you want a shared infrastructure.
     - A single-site cloud with equipment fed by separate power supplies.
     - Scheduling to hosts with trusted hardware support.
   * - **Overhead**
     - Considered experimental. A new service, nova-cells. Each cell has a full
       nova installation except nova-api.
     - A different API endpoint for every region. Each region has a full nova
       installation.
     - Configuration changes to ``nova.conf``.
     - Configuration changes to ``nova.conf``.
   * - **Shared services**
     - Keystone, ``nova-api``
     - Keystone
     - Keystone, all nova services
     - Keystone, all nova services

Cells and Regions
-----------------

OpenStack Compute cells are designed to allow running the cloud in a
distributed fashion without having to use more complicated technologies,
or being invasive to existing nova installations. Hosts in a cloud are
partitioned into groups called *cells*. Cells are configured in a tree.
The top-level cell ("API cell") has a host that runs the ``nova-api``
service, but no ``nova-compute`` services. Each child cell runs all of
the other typical ``nova-*`` services found in a regular installation,
except for the ``nova-api`` service. Each cell has its own message queue
and database service and also runs ``nova-cells``, which manages the
communication between the API cell and child cells.

This allows a single API server to be used to control access to
multiple cloud installations. Introducing a second level of scheduling
(the cell selection), in addition to the regular ``nova-scheduler``
selection of hosts, provides greater flexibility to control where
virtual machines are run.

Unlike having a single API endpoint, regions have a separate API
endpoint per installation, allowing for a more discrete separation.
Users wanting to run instances across sites have to explicitly select a
region. However, the additional complexity of running a new service is
not required.

The OpenStack dashboard (horizon) can be configured to use multiple
regions. This can be configured through the ``AVAILABLE_REGIONS``
parameter.
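
As an illustration, horizon reads this setting from its
``local_settings.py`` file as a list of (Identity endpoint, region name)
pairs. The endpoints below are placeholders, not real deployments:

.. code-block:: python

   # Example AVAILABLE_REGIONS setting for the dashboard; each entry is
   # (Identity API endpoint URL, display name for the region).
   AVAILABLE_REGIONS = [
       ('http://region-one.example.com:5000/v2.0', 'RegionOne'),
       ('http://region-two.example.com:5000/v2.0', 'RegionTwo'),
   ]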

Availability Zones and Host Aggregates
--------------------------------------

You can use availability zones, host aggregates, or both to partition a
nova deployment.

Availability zones are implemented through and configured in a similar
way to host aggregates.

However, you can use them for different reasons.

Availability zone
^^^^^^^^^^^^^^^^^

This enables you to arrange OpenStack compute hosts into logical groups
and provides a form of physical isolation and redundancy from other
availability zones, such as by using a separate power supply or network
equipment.

You define the availability zone in which a specified compute host
resides locally on each server. An availability zone is commonly used to
identify a set of servers that have a common attribute. For instance, if
some of the racks in your data center are on a separate power source,
you can put servers in those racks in their own availability zone.
Availability zones can also help separate different classes of hardware.

When users provision resources, they can specify from which availability
zone they want their instance to be built. This allows cloud consumers
to ensure that their application resources are spread across disparate
machines to achieve high availability in the event of hardware failure.

Host aggregates
^^^^^^^^^^^^^^^

This enables you to partition OpenStack Compute deployments into logical
groups for load balancing and instance distribution. You can use host
aggregates to further partition an availability zone. For example, you
might use host aggregates to partition an availability zone into groups
of hosts that either share common resources, such as storage and
network, or have a special property, such as trusted computing
hardware.

A common use of host aggregates is to provide information for use with
the ``nova-scheduler``. For example, you might use a host aggregate to
group a set of hosts that share specific flavors or images.

The general case for this is setting key-value pairs in the aggregate
metadata and matching key-value pairs in a flavor's ``extra_specs``
metadata. The ``AggregateInstanceExtraSpecsFilter`` in the filter
scheduler will enforce that instances be scheduled only on hosts in
aggregates that define the same key with the same value.
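
The matching rule can be sketched as follows. This is a simplified
illustration of the concept only, not the filter's actual implementation:

.. code-block:: python

   # Simplified sketch: a host in an aggregate passes only if every
   # extra_specs key required by the flavor appears in the aggregate
   # metadata with the same value.
   def host_matches(aggregate_metadata, flavor_extra_specs):
       return all(aggregate_metadata.get(key) == value
                  for key, value in flavor_extra_specs.items())

   aggregate_metadata = {'ssd': 'true'}
   print(host_matches(aggregate_metadata, {'ssd': 'true'}))    # True
   print(host_matches(aggregate_metadata, {'ssd': 'false'}))   # False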

An advanced use of this general concept allows different flavor types to
run with different CPU and RAM allocation ratios so that high-intensity
computing loads and low-intensity development and testing systems can
share the same cloud without either starving the high-use systems or
wasting resources on low-utilization systems. This works by setting
``metadata`` in your host aggregates and matching ``extra_specs`` in
your flavor types.

The first step is setting the aggregate metadata keys
``cpu_allocation_ratio`` and ``ram_allocation_ratio`` to a
floating-point value. The ``AggregateCoreFilter`` and
``AggregateRamFilter`` scheduler filters will use those values rather
than the global defaults in ``nova.conf`` when scheduling to hosts in
the aggregate. Be cautious when using this feature, since each host can
be in multiple aggregates, but should have only one allocation ratio for
each resource. It is up to you to avoid putting a host in multiple
aggregates that define different values for the same resource.
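
A simplified sketch of how a per-aggregate CPU allocation ratio changes
the capacity calculation is shown below. This illustrates the idea only
and is not the filter's actual code:

.. code-block:: python

   # Simplified sketch: whether a request fits depends on which
   # cpu_allocation_ratio applies to the host.
   def fits(physical_cores, used_vcpus, requested_vcpus, cpu_allocation_ratio):
       limit = physical_cores * cpu_allocation_ratio
       return used_vcpus + requested_vcpus <= limit

   # The same 16-core host under two different aggregate ratios:
   print(fits(16, 200, 4, cpu_allocation_ratio=16.0))  # True, limit is 256
   print(fits(16, 200, 4, cpu_allocation_ratio=1.0))   # False, limit is 16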

This is the first half of the equation. To get flavor types that are
guaranteed a particular ratio, you must set the ``extra_specs`` in the
flavor type to the key-value pair you want to match in the aggregate.
For example, if you define ``extra_specs`` ``cpu_allocation_ratio`` to
"1.0", then instances of that type will run in aggregates only where the
metadata key ``cpu_allocation_ratio`` is also defined as "1.0". In
practice, it is better to define an additional key-value pair in the
aggregate metadata to match on rather than match directly on
``cpu_allocation_ratio`` or ``ram_allocation_ratio``. This allows
better abstraction. For example, by defining a key ``overcommit`` and
setting a value of "high," "medium," or "low," you could then tune the
numeric allocation ratios in the aggregates without also needing to
change all flavor types relating to them.

.. note::

   Previously, all services had an availability zone. Currently, only
   the ``nova-compute`` service has its own availability zone. Services
   such as ``nova-scheduler``, ``nova-network``, and ``nova-conductor``
   have always spanned all availability zones.

   When you run any of the following operations, the services appear in
   their own internal availability zone
   (CONF.internal_service_availability_zone):

   - :command:`nova host-list` (os-hosts)

   - :command:`euca-describe-availability-zones verbose`

   - :command:`nova service-list`

   The internal availability zone is hidden in
   euca-describe-availability_zones (nonverbose).

   CONF.node_availability_zone has been renamed to
   CONF.default_availability_zone and is used only by the
   ``nova-api`` and ``nova-scheduler`` services.

   CONF.node_availability_zone still works but is deprecated.

Scalable Hardware
~~~~~~~~~~~~~~~~~

While several resources already exist to help with deploying and
installing OpenStack, it's very important to make sure that you have
your deployment planned out ahead of time. This guide presumes that you
have set aside a rack for the OpenStack cloud but also offers
suggestions for when and what to scale.

Hardware Procurement
--------------------

“The Cloud” has been described as a volatile environment where servers
can be created and terminated at will. While this may be true, it does
not mean that your servers must be volatile. Ensuring that your cloud's
hardware is stable and configured correctly means that your cloud
environment remains up and running.

OpenStack can be deployed on any hardware supported by an
OpenStack-compatible Linux distribution.

Hardware does not have to be consistent, but it should at least have the
same type of CPU to support instance migration.

The typical hardware recommended for use with OpenStack is the standard
value-for-money offerings that most hardware vendors stock. It should be
straightforward to divide your procurement into building blocks such as
"compute," "object storage," and "cloud controller," and request as many
of these as you need. Alternatively, any existing servers that meet your
performance requirements and support virtualization technology are
likely to support OpenStack.

Capacity Planning
-----------------

OpenStack is designed to increase in size in a straightforward manner.
Taking into account the considerations previously mentioned, particularly
on the sizing of the cloud controller, it should be possible to procure
additional compute or object storage nodes as needed. New nodes do not
need to be the same specification or vendor as existing nodes.

For compute nodes, ``nova-scheduler`` will manage differences in
sizing with core count and RAM. However, you should consider that the user
experience changes with differing CPU speeds. When adding object storage
nodes, a :term:`weight` should be specified that reflects the
:term:`capability` of the node.
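
One common convention, sketched below, is to derive a device's weight
from its capacity, so that larger drives receive proportionally more
data. This is an illustration of the idea, not a required formula:

.. code-block:: python

   # Illustrative only: weight proportional to drive capacity.
   def ring_weight(drive_size_gb, gb_per_weight_unit=100.0):
       return drive_size_gb / gb_per_weight_unit

   print(ring_weight(4000))  # 40.0 for a 4 TB drive
   print(ring_weight(8000))  # 80.0 for an 8 TB drive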

Monitoring the resource usage and user growth will enable you to know
when to procure. The `Logging and Monitoring
<http://docs.openstack.org/ops-guide/ops_logging_monitoring.html>`_
chapter in the Operations Guide details some useful metrics.

Burn-in Testing
---------------

The chances of failure for the server's hardware are high at the start
and the end of its life. As a result, you can avoid dealing with hardware
failures in production by performing appropriate burn-in testing that
attempts to trigger the early-stage failures. The general principle is to
stress the hardware to its limits. Examples of burn-in tests include
running a CPU or disk benchmark for several days.
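
As a trivial illustration of the idea (production burn-in is usually done
with dedicated benchmarking tools), the following sketch keeps every CPU
core busy for a fixed period:

.. code-block:: python

   # Illustrative only: a minimal CPU burn loop across all cores.
   import multiprocessing
   import time

   def burn(seconds):
       end = time.time() + seconds
       while time.time() < end:
           pass  # busy loop

   if __name__ == '__main__':
       procs = [multiprocessing.Process(target=burn, args=(3600,))
                for _ in range(multiprocessing.cpu_count())]
       for p in procs:
           p.start()
       for p in procs:
           p.join()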
3
doc/arch-design-draft/source/design-compute.rst
Normal file
@ -0,0 +1,3 @@
=======
Compute
=======
3
doc/arch-design-draft/source/design-control-plane.rst
Normal file
@ -0,0 +1,3 @@
=============
Control Plane
=============
3
doc/arch-design-draft/source/design-dashboard-api.rst
Normal file
@ -0,0 +1,3 @@
==================
Dashboard and APIs
==================
3
doc/arch-design-draft/source/design-identity.rst
Normal file
@ -0,0 +1,3 @@
========
Identity
========
3
doc/arch-design-draft/source/design-images.rst
Normal file
@ -0,0 +1,3 @@
======
Images
======
3
doc/arch-design-draft/source/design-networking.rst
Normal file
@ -0,0 +1,3 @@
==========
Networking
==========
476
doc/arch-design-draft/source/design-storage.rst
Normal file
@ -0,0 +1,476 @@
==============
Storage design
==============

Storage is found in many parts of the OpenStack cloud environment. This
section describes persistent storage options you can configure with
your cloud. It is important to understand the distinction between
:term:`ephemeral <ephemeral volume>` storage and
:term:`persistent <persistent volume>` storage.

Ephemeral Storage
~~~~~~~~~~~~~~~~~

If you deploy only the OpenStack :term:`Compute service` (nova), by default
your users do not have access to any form of persistent storage. The disks
associated with VMs are "ephemeral," meaning that from the user's point
of view they disappear when a virtual machine is terminated.

Persistent Storage
~~~~~~~~~~~~~~~~~~

Persistent storage means that the storage resource outlives any other
resource and is always available, regardless of the state of a running
instance.

Today, OpenStack clouds explicitly support three types of persistent
storage: *object storage*, *block storage*, and *file system storage*.

Object Storage
--------------

Object storage is implemented in OpenStack by the
OpenStack Object Storage (swift) project. Users access binary objects
through a REST API. If your intended users need to
archive or manage large datasets, you want to provide them with Object
Storage. In addition, OpenStack can store your virtual machine (VM)
images inside of an object storage system, as an alternative to storing
the images on a file system.

OpenStack Object Storage provides a highly scalable, highly available
storage solution by relaxing some of the constraints of traditional file
systems. In designing and procuring for such a cluster, it is important
to understand some key concepts about its operation. Essentially, this
type of storage is built on the idea that all storage hardware fails, at
every level, at some point. Infrequently encountered failures that would
hamstring other storage systems, such as issues taking down RAID cards
or entire servers, are handled gracefully with OpenStack Object
Storage. For more information, see the `Swift developer documentation
<http://docs.openstack.org/developer/swift/overview_architecture.html>`_.

When designing your cluster, you must consider durability and
availability, which depend on the spread and placement of your data
rather than the reliability of the hardware.
Consider the default value of the number of replicas, which is
three. This means that before an object is marked as having been
written, at least two copies exist (in case a single server fails to
write, the third copy may or may not yet exist when the write operation
initially returns). Altering this number increases the robustness of your
data, but reduces the amount of storage you have available. Look
at the placement of your servers. Consider spreading them widely
throughout your data center's network and power-failure zones. Is a zone
a rack, a server, or a disk?
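
To make the replica trade-off concrete, the following sketch shows how
the replica count affects usable capacity for a given amount of raw
disk. The numbers are illustrative only:

.. code-block:: python

   # Illustrative only: more replicas give more durability but less
   # usable capacity for the same raw storage.
   raw_storage_tb = 600.0

   for replicas in (2, 3, 4):
       usable_tb = raw_storage_tb / replicas
       print("%d replicas: %.0f TB usable" % (replicas, usable_tb))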

Consider these main traffic flows for an Object Storage network:

* Among :term:`object`, :term:`container`, and
  :term:`account servers <account server>`
* Between servers and the proxies
* Between the proxies and your users

Object Storage frequently communicates among servers hosting data. Even a
small cluster generates megabytes/second of traffic, which is predominantly
“Do you have the object?” and “Yes I have the object!” If the answer
to the question is negative or the request times out,
replication of the object begins.

Consider the scenario where an entire server fails and 24 TB of data
needs to be transferred "immediately" to remain at three copies; this can
put significant load on the network.

Another consideration is that when a new file is being uploaded, the proxy
server must write out as many streams as there are replicas, multiplying
network traffic. For a three-replica cluster, 10 Gbps in means 30 Gbps out.
Combining this with the high bandwidth demands of replication is what
results in the recommendation that your private network have significantly
higher bandwidth than your public network requires. OpenStack Object
Storage communicates internally with unencrypted, unauthenticated rsync
for performance, so a private network is required.
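
The write amplification described above can be sketched as follows, using
the same figures as the example in the text:

.. code-block:: python

   # Illustrative only: a proxy writes one stream per replica, so inbound
   # client bandwidth is multiplied on the private replication network.
   replicas = 3
   inbound_gbps = 10.0

   internal_gbps = inbound_gbps * replicas
   print("outbound to storage nodes: %.0f Gbps" % internal_gbps)  # 30 Gbps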

The remaining point on bandwidth is the public-facing portion. The
``swift-proxy`` service is stateless, which means that you can easily
add more and use HTTP load-balancing methods to share bandwidth and
availability between them.

More proxies mean more bandwidth, if your storage can keep up.

Block Storage
-------------

Block storage (sometimes referred to as volume storage) provides users
with access to block-storage devices. Users interact with block storage
by attaching volumes to their running VM instances.

These volumes are persistent: they can be detached from one instance and
re-attached to another, and the data remains intact. Block storage is
implemented in OpenStack by the OpenStack Block Storage service (cinder),
which supports multiple back ends in the form of drivers. Your
choice of a storage back end must be supported by a Block Storage
driver.

Most block storage drivers allow the instance to have direct access to
the underlying storage hardware's block device. This helps increase the
overall read/write IO. However, support for utilizing files as volumes
is also well established, with full support for NFS, GlusterFS, and
others.

These drivers work a little differently than a traditional "block"
storage driver. On an NFS or GlusterFS file system, a single file is
created and then mapped as a "virtual" volume into the instance. This
mapping/translation is similar to how OpenStack utilizes QEMU's
file-based virtual machines stored in ``/var/lib/nova/instances``.

Shared File Systems Service
---------------------------

The Shared File Systems service (manila) provides a set of services for
managing shared file systems in a multi-tenant cloud environment.
Users interact with the Shared File Systems service by mounting remote
file systems on their instances and then using those file systems for
file storage and exchange. The Shared File Systems service provides
shares, which are remote, mountable file systems. You can mount a
share on, and access it from, several hosts by several users at a
time. With shares, users can also:

* Create a share, specifying its size, shared file system protocol, and
  visibility level.
* Create a share on either a share server or standalone, depending on
  the selected back-end mode, with or without using a share network.
* Specify access rules and security services for existing shares.
* Combine several shares in groups to keep data consistency inside the
  groups for safe group operations.
* Create a snapshot of a selected share or a share group for storing
  the existing shares consistently or creating new shares from that
  snapshot in a consistent way.
* Create a share from a snapshot.
* Set rate limits and quotas for specific shares and snapshots.
* View usage of share resources.
* Remove shares.

Like Block Storage, the Shared File Systems service is persistent. It
can be:

* Mounted to any number of client machines.
* Detached from one instance and attached to another without data loss.
  During this process the data are safe unless the Shared File Systems
  service itself is changed or removed.

Shares are provided by the Shared File Systems service. In OpenStack,
the Shared File Systems service is implemented by the manila project,
which supports multiple back ends in the form of drivers. The Shared
File Systems service can be configured to provision shares from one or
more back ends. Share servers are usually virtual machines that export
file shares using different protocols, such as NFS, CIFS, GlusterFS, or
HDFS.

OpenStack Storage Concepts
~~~~~~~~~~~~~~~~~~~~~~~~~~

:ref:`table_openstack_storage` explains the different storage concepts
provided by OpenStack.

.. _table_openstack_storage:

.. list-table:: Table. OpenStack storage
   :widths: 20 20 20 20 20
   :header-rows: 1

   * -
     - Ephemeral storage
     - Block storage
     - Object storage
     - Shared File System storage
   * - Used to…
     - Run operating system and scratch space
     - Add additional persistent storage to a virtual machine (VM)
     - Store data, including VM images
     - Add additional persistent storage to a virtual machine
   * - Accessed through…
     - A file system
     - A block device that can be partitioned, formatted, and mounted
       (such as /dev/vdc)
     - The REST API
     - A Shared File Systems service share (either manila managed or an
       external one registered in manila) that can be partitioned, formatted,
       and mounted (such as /dev/vdc)
   * - Accessible from…
     - Within a VM
     - Within a VM
     - Anywhere
     - Within a VM
   * - Managed by…
     - OpenStack Compute (nova)
     - OpenStack Block Storage (cinder)
     - OpenStack Object Storage (swift)
     - OpenStack Shared File System Storage (manila)
   * - Persists until…
     - VM is terminated
     - Deleted by user
     - Deleted by user
     - Deleted by user
   * - Sizing determined by…
     - Administrator configuration of size settings, known as *flavors*
     - User specification in initial request
     - Amount of available physical storage
     - * User specification in initial request
       * Requests for extension
       * Available user-level quotas
       * Limitations applied by Administrator
   * - Encryption set by…
     - Parameter in ``nova.conf``
     - Admin establishing `encrypted volume type
       <http://docs.openstack.org/admin-guide/dashboard_manage_volumes.html>`_,
       then user selecting encrypted volume
     - Not yet available
     - Shared File Systems service does not apply any additional encryption
       above what the share’s back-end storage provides
   * - Example of typical usage…
     - 10 GB first disk, 30 GB second disk
     - 1 TB disk
     - 10s of TBs of dataset storage
     - Depends completely on the size of back-end storage specified when
       a share was being created. In case of thin provisioning it can be
       partial space reservation (for more details see the
       `Capabilities and Extra-Specs
       <http://docs.openstack.org/developer/manila/devref/capabilities_and_extra_specs.html?highlight=extra%20specs#common-capabilities>`_
       specification)

.. note::

   **File-level Storage (for Live Migration)**

   With file-level storage, users access stored data using the operating
   system's file system interface. Most users, if they have used a network
   storage solution before, have encountered this form of networked
   storage. In the Unix world, the most common form of this is NFS. In the
   Windows world, the most common form is called CIFS (previously, SMB).

   OpenStack clouds do not present file-level storage to end users.
   However, it is important to consider file-level storage for storing
   instances under ``/var/lib/nova/instances`` when designing your cloud,
   since you must have a shared file system if you want to support live
   migration.

Choosing Storage Back Ends
~~~~~~~~~~~~~~~~~~~~~~~~~~

Users will indicate different needs for their cloud use cases. Some may
need fast access to many objects that do not change often, or want to
set a time-to-live (TTL) value on a file. Others may access only storage
that is mounted with the file system itself, but want it to be
replicated instantly when starting a new instance. For other systems,
ephemeral storage is the preferred choice. When you select
:term:`storage back ends <storage back end>`,
consider the following questions from the user's perspective:

* Do my users need block storage?
* Do my users need object storage?
* Do I need to support live migration?
* Should my persistent storage drives be contained in my compute nodes,
  or should I use external storage?
* What is the platter count I can achieve? Do more spindles result in
  better I/O despite network access?
* Which one results in the best cost-performance scenario I'm aiming for?
* How do I manage the storage operationally?
* How redundant and distributed is the storage? What happens if a
  storage node fails? To what extent can it mitigate my data-loss
  disaster scenarios?

To deploy your storage by using only commodity hardware, you can use a number
of open-source packages, as shown in :ref:`table_persistent_file_storage`.

.. _table_persistent_file_storage:

.. list-table:: Table. Persistent file-based storage support
   :widths: 25 25 25 25
   :header-rows: 1

   * -
     - Object
     - Block
     - File-level
   * - Swift
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
     -
   * - LVM
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
   * - Ceph
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - Experimental
   * - Gluster
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
   * - NFS
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
   * - ZFS
     -
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -
   * - Sheepdog
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     - .. image:: figures/Check_mark_23x20_02.png
          :width: 30%
     -

This list of open source file-level shared storage solutions is not
exhaustive; other open source solutions exist (for example, MooseFS). Your
organization may already have deployed a file-level shared storage
solution that you can use.

.. note::

   **Storage Driver Support**

   In addition to the open source technologies, there are a number of
   proprietary solutions that are officially supported by OpenStack Block
   Storage. You can find a matrix of the functionality provided by all of the
   supported Block Storage drivers on the `OpenStack
   wiki <https://wiki.openstack.org/wiki/CinderSupportMatrix>`_.

Also, you need to decide whether you want to support object storage in
your cloud. The two common use cases for providing object storage in a
compute cloud are:

* To provide users with a persistent storage mechanism
* As a scalable, reliable data store for virtual machine images

Commodity Storage Back-end Technologies
---------------------------------------

This section provides a high-level overview of the differences among the
different commodity storage back-end technologies. Depending on your
cloud user's needs, you can implement one or many of these technologies
in different combinations:

OpenStack Object Storage (swift)
    The official OpenStack Object Store implementation. It is a mature
    technology that has been used for several years in production by
    Rackspace as the technology behind Rackspace Cloud Files. As it is
    highly scalable, it is well-suited to managing petabytes of storage.
    OpenStack Object Storage's advantages are better integration with
    OpenStack (integrates with OpenStack Identity, works with the
    OpenStack dashboard interface) and better support for multiple data
    center deployment through support of asynchronous eventual
    consistency replication.

    Therefore, if you eventually plan on distributing your storage
    cluster across multiple data centers, if you need unified accounts
    for your users for both compute and object storage, or if you want
    to control your object storage with the OpenStack dashboard, you
    should consider OpenStack Object Storage. More detail about
    OpenStack Object Storage can be found in the Object Storage section
    above.

Ceph
    A scalable storage solution that replicates data across commodity
    storage nodes.

    Ceph was designed to expose different types of storage interfaces to
    the end user: it supports object storage, block storage, and
    file-system interfaces, although the file-system interface is not
    production-ready. Ceph supports the same API as swift
    for object storage and can be used as a back end for cinder block
    storage as well as back-end storage for glance images. Ceph supports
    "thin provisioning," implemented using copy-on-write.

    This can be useful when booting from volume because a new volume can
    be provisioned very quickly. Ceph also supports keystone-based
    authentication (as of version 0.56), so it can be a seamless swap in
    for the default OpenStack swift implementation.

    Ceph's advantages are that it gives the administrator more
    fine-grained control over data distribution and replication
    strategies, enables you to consolidate your object and block
    storage, enables very fast provisioning of boot-from-volume
    instances using thin provisioning, and supports a distributed
    file-system interface, though this interface is `not yet
    recommended <http://ceph.com/docs/master/cephfs/>`_ for use in
    production deployment by the Ceph project.

    If you want to manage your object and block storage within a single
    system, or if you want to support fast boot-from-volume, you should
    consider Ceph.

Gluster
    A distributed, shared file system. As of Gluster version 3.3, you
    can use Gluster to consolidate your object storage and file storage
    into one unified file and object storage solution, which is called
    Gluster For OpenStack (GFO). GFO uses a customized version of swift
    that enables Gluster to be used as the back-end storage.

    The main reason to use GFO rather than swift is if you also
    want to support a distributed file system, either to support shared
    storage live migration or to provide it as a separate service to
    your end users. If you want to manage your object and file storage
    within a single system, you should consider GFO.

LVM
    The Logical Volume Manager is a Linux-based system that provides an
    abstraction layer on top of physical disks to expose logical volumes
    to the operating system. The LVM back end implements block storage
    as LVM logical partitions.

    On each host that will house block storage, an administrator must
    initially create a volume group dedicated to Block Storage volumes.
    Blocks are created from LVM logical volumes.

    .. note::

        LVM does *not* provide any replication. Typically,
        administrators configure RAID on nodes that use LVM as block
        storage to protect against failures of individual hard drives.
        However, RAID does not protect against a failure of the entire
        host.

ZFS
    The Solaris iSCSI driver for OpenStack Block Storage implements
    blocks as ZFS entities. ZFS is a file system that also has the
    functionality of a volume manager. This is unlike on a Linux system,
    where there is a separation of volume manager (LVM) and file system
    (such as ext3, ext4, xfs, and btrfs). ZFS has a number of
    advantages over ext4, including improved data-integrity checking.

    The ZFS back end for OpenStack Block Storage supports only
    Solaris-based systems, such as Illumos. While there is a Linux port
    of ZFS, it is not included in any of the standard Linux
    distributions, and it has not been tested with OpenStack Block
    Storage. As with LVM, ZFS does not provide replication across hosts
    on its own; you need to add a replication solution on top of ZFS if
    your cloud needs to be able to handle storage-node failures.

    We do not recommend ZFS unless you have previous experience with
    deploying it, since the ZFS back end for Block Storage requires a
    Solaris-based operating system, and we assume that your experience
    is primarily with Linux-based systems.

Sheepdog
    Sheepdog is a userspace distributed storage system. Sheepdog scales
    to several hundred nodes, and has powerful virtual disk management
    features such as snapshot, cloning, rollback, and thin provisioning.

    It is essentially an object storage system that manages disks and
    aggregates the space and performance of disks linearly in hyper
    scale on commodity hardware in a smart way. On top of its object
    store, Sheepdog provides an elastic volume service and an HTTP
    service. Sheepdog does not assume anything about kernel version and
    can work nicely with xattr-supported file systems.
@ -2,23 +2,47 @@
 Design
 ======

-Compute service
-~~~~~~~~~~~~~~~
+Designing an OpenStack cloud requires an understanding of the cloud user's
+requirements and needs to determine the best possible configuration. This
+chapter provides guidance on the decisions you need to make during the
+design process.

-Storage
-~~~~~~~
+To design, deploy, and configure OpenStack, administrators must
+understand the logical architecture. OpenStack modules are one of the
+following types:

-Networking service
-~~~~~~~~~~~~~~~~~~
+Daemon
+  Runs as a background process. On Linux platforms, a daemon is usually
+  installed as a service.

-Identity service
-~~~~~~~~~~~~~~~~
+Script
+  Installs a virtual environment and runs tests.

-Image service
-~~~~~~~~~~~~~
+Command-line interface (CLI)
+  Enables users to submit API calls to OpenStack services through commands.

-Control Plane
-~~~~~~~~~~~~~
+:ref:`logical_architecture` shows one example of the most common
+integrated services within OpenStack and how they interact with each
+other. End users can interact through the dashboard, CLIs, and APIs.
+All services authenticate through a common Identity service, and
+individual services interact with each other through public APIs, except
+where privileged administrator commands are necessary.

-Dashboard and APIs
-~~~~~~~~~~~~~~~~~~
+.. _logical_architecture:
+
+.. figure:: common/figures/osog_0001.png
+   :width: 100%
+   :alt: OpenStack Logical Architecture
+
+   OpenStack Logical Architecture
+
+.. toctree::
+   :maxdepth: 2
+
+   design-compute.rst
+   design-storage.rst
+   design-networking.rst
+   design-identity.rst
+   design-images.rst
+   design-control-plane.rst
+   design-dashboard-api.rst
BIN
doc/arch-design-draft/source/figures/Check_mark_23x20_02.png
Normal file
After Width: | Height: | Size: 3.0 KiB |
BIN
doc/common/figures/osog_0001.png
Normal file
After Width: | Height: | Size: 765 KiB |