deb-sahara/doc/source/userdoc/overview.rst
Michael McCune c2ea4fa572 Updating userdoc overview
* formatted text for 79 chars
* minor grammar fixes

Change-Id: Ib68ebfc0cfd2d9cdb987b60f39347a96c5873741
Partial-Bug: 1490687
2015-09-01 10:34:29 -04:00


Getting Started
================

Clusters
--------

A cluster deployed by sahara consists of node groups. Node groups vary by
their role, parameters and number of machines. The picture below
illustrates an example of a Hadoop cluster consisting of three node groups,
each having a different role (set of processes).

.. image:: ../images/hadoop-cluster-example.jpg

Node group parameters include Hadoop parameters like ``io.sort.mb`` or
``mapred.child.java.opts``, and several infrastructure parameters like the
flavor for VMs or the storage location (ephemeral drive or cinder volume).

A cluster is characterized by its node groups and its parameters. Like a node
group, a cluster has data processing framework and infrastructure parameters.
An example of a cluster-wide Hadoop parameter is ``dfs.replication``. For
infrastructure, an example is the image that will be used to launch the
cluster VMs.

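
To make the relationship between a cluster, its node groups and their
parameters concrete, the following Python sketch models them as plain data
classes. It is an illustration only, not sahara's data model or API: the class
names, fields, process names, flavors, image name and config values are all
hypothetical.

.. code-block:: python

    from dataclasses import dataclass, field


    @dataclass
    class NodeGroup:
        """Hypothetical node group: a role, parameters and a machine count."""
        name: str
        processes: list[str]        # the role: processes each VM runs
        count: int                  # number of machines in the group
        flavor: str                 # infrastructure parameter: VM flavor
        storage: str = "ephemeral"  # infrastructure: "ephemeral" or "cinder"
        hadoop_configs: dict[str, str] = field(default_factory=dict)


    @dataclass
    class Cluster:
        """Hypothetical cluster: node groups plus cluster-wide parameters."""
        name: str
        image: str                  # infrastructure: image used to launch VMs
        node_groups: list[NodeGroup]
        hadoop_configs: dict[str, str] = field(default_factory=dict)

        def total_nodes(self) -> int:
            # The cluster size is the sum of all node group sizes.
            return sum(ng.count for ng in self.node_groups)


    # Three node groups, each with a different role (illustrative values).
    cluster = Cluster(
        name="example",
        image="hadoop-image",
        node_groups=[
            NodeGroup("master", ["namenode", "resourcemanager"], 1, "m1.large"),
            NodeGroup("workers", ["datanode", "nodemanager"], 3, "m1.medium",
                      storage="cinder",
                      hadoop_configs={"io.sort.mb": "200",
                                      "mapred.child.java.opts": "-Xmx512m"}),
            NodeGroup("client", ["oozie"], 1, "m1.small"),
        ],
        hadoop_configs={"dfs.replication": "3"},  # cluster-wide parameter
    )
    print(cluster.total_nodes())  # 5

Node-group-level parameters live on each ``NodeGroup``, while a parameter such
as ``dfs.replication`` applies to the whole ``Cluster``.
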
Templates
---------

In order to simplify cluster provisioning, sahara employs the concept of
templates. There are two kinds of templates: node group templates and
cluster templates. The former are used to create node groups, the latter to
create clusters. Essentially, templates have the same parameters as the
corresponding entities. Their aim is to remove the burden of specifying all
of the required parameters each time a user wants to launch a cluster.

In the REST interface, templates have extended functionality. First, you can
specify node-scoped parameters here; they will work as defaults for node
groups. Also, with the REST interface, a user can override template parameters
for both the cluster and its node groups during cluster creation.

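
The defaults-plus-overrides behavior described above can be sketched as a
recursive dictionary merge. This is a hypothetical illustration, not sahara's
implementation: the ``apply_overrides`` helper and the template field names
are assumptions chosen for the example.

.. code-block:: python

    import copy


    def apply_overrides(template: dict, overrides: dict) -> dict:
        """Return template values replaced by any user overrides.

        Hypothetical helper. Nested dicts (e.g. config sections) are merged
        key by key; the template itself is never modified.
        """
        result = copy.deepcopy(template)
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = apply_overrides(result[key], value)
            else:
                result[key] = copy.deepcopy(value)
        return result


    # Hypothetical node group template; field names are illustrative only.
    worker_template = {
        "flavor": "m1.medium",
        "count": 3,
        "configs": {"io.sort.mb": "100", "mapred.child.java.opts": "-Xmx512m"},
    }

    # At cluster creation the user supplies only what differs.
    worker_group = apply_overrides(
        worker_template, {"count": 5, "configs": {"io.sort.mb": "200"}})
    # count is now 5 and io.sort.mb is "200"; flavor and
    # mapred.child.java.opts still come from the template.
    print(worker_group)

Merging nested sections key by key, rather than replacing them wholesale, is
one possible design choice: it lets a user change a single config value
without restating the rest of the section.
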
Provisioning Plugins
--------------------

A provisioning plugin is a component responsible for provisioning a data
processing cluster. Generally, each plugin is capable of provisioning a
specific data processing framework or Hadoop distribution. The plugin can
also install management and/or monitoring tools for a cluster.

Since framework configuration parameters vary depending on the distribution
and the version, templates are always plugin and version specific. A template
cannot be used with a plugin or framework version other than the one it was
created for.

The list of available plugins can be found on the :doc:`plugins` page.

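
The plugin and version restriction on templates amounts to a simple
compatibility check. The sketch below assumes that a template records the
plugin name and version it was created for; the ``Template`` class, its field
names, the helper functions and the version strings are hypothetical and do
not reflect sahara's actual code.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Template:
        """Hypothetical template record: only the fields the check needs."""
        name: str
        plugin_name: str     # plugin the template was created for
        plugin_version: str  # framework/distribution version it targets


    class IncompatibleTemplateError(ValueError):
        """Raised when a template does not match the plugin/version."""


    def is_compatible(template: Template, plugin_name: str,
                      plugin_version: str) -> bool:
        # Both the plugin and the version must match exactly.
        return (template.plugin_name == plugin_name
                and template.plugin_version == plugin_version)


    def check_template(template: Template, plugin_name: str,
                       plugin_version: str) -> None:
        if not is_compatible(template, plugin_name, plugin_version):
            raise IncompatibleTemplateError(
                f"template {template.name!r} was created for "
                f"{template.plugin_name} {template.plugin_version}, "
                f"not {plugin_name} {plugin_version}"
            )


    workers = Template("workers", "vanilla", "2.7.1")
    check_template(workers, "vanilla", "2.7.1")  # same plugin and version: OK
    try:
        check_template(workers, "vanilla", "2.6.0")
    except IncompatibleTemplateError as exc:
        print(exc)
    # template 'workers' was created for vanilla 2.7.1, not vanilla 2.6.0
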
Image Registry
--------------

OpenStack starts VMs based on a pre-built image with an installed OS. The
image requirements for sahara depend on the plugin and data processing
framework version. Some plugins require just a basic cloud image and will
install the framework on the VMs from scratch. Other plugins might require
images with pre-installed frameworks or Hadoop distributions.

The Sahara Image Registry is a feature which helps filter out images during
cluster creation. See :doc:`registering_image` for details on how to work
with the Image Registry.

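
One way to picture the filtering described above is shown below. It assumes,
for illustration only, that each registered image carries a set of tags naming
the plugin and the version it supports; the ``Image`` class, the
``images_for`` helper and the image names are hypothetical.

.. code-block:: python

    from dataclasses import dataclass, field


    @dataclass
    class Image:
        """Hypothetical registry entry: an image name plus its tags."""
        name: str
        tags: set[str] = field(default_factory=set)


    def images_for(images: list[Image], plugin_name: str,
                   plugin_version: str) -> list[Image]:
        # Keep only images tagged with BOTH the plugin name and its version.
        required = {plugin_name, plugin_version}
        return [img for img in images if required <= img.tags]


    registry = [
        Image("ubuntu-vanilla-2.7.1", {"vanilla", "2.7.1"}),
        Image("centos-vanilla-2.6.0", {"vanilla", "2.6.0"}),
        Image("plain-ubuntu"),  # untagged: never offered for any plugin
    ]
    print([img.name for img in images_for(registry, "vanilla", "2.7.1")])
    # ['ubuntu-vanilla-2.7.1']
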
Features
--------

Sahara has several interesting features. The full list can be found on the
:doc:`features` page.