deb-sahara/doc/source/userdoc/overview.rst
Shilla Saebi 499ba6ef93 corrected error in overview.rst
creationm

Change-Id: If3366c70f92dc42a21490e2936d6311e5c194f36
Closes-Bug: #1391994
2014-11-12 13:39:04 -05:00

2.8 KiB

Getting Started

Clusters

A cluster deployed by Sahara consists of node groups. Node groups vary by their role, parameters and number of machines. The picture below illustrates an example of a Hadoop cluster consisting of 3 node groups each having a different role (set of processes).

image

Node group parameters include Hadoop parameters like io.sort.mb or mapred.child.java.opts, and several infrastructure parameters like the flavor for VMs or storage location (ephemeral drive or Cinder volume).

A cluster is characterized by its node groups and its parameters. Like a node group, a cluster has Hadoop and infrastructure parameters. An example of a cluster-wide Hadoop parameter is dfs.replication. For infrastructure, an example could be image which will be used to launch cluster VMs.

Templates

In order to simplify cluster provisioning Sahara employs the concept of templates. There are two kinds of templates: node group templates and cluster templates. The former is used to create node groups, the latter - clusters. Essentially templates have the very same parameters as corresponding entities. Their aim is to remove the burden of specifying all of the required parameters each time a user wants to launch a cluster.

In the REST interface, templates have extended functionality. First you can specify node-scoped parameters here, they will work as a defaults for node groups. Also with the REST interface, during cluster creation a user can override template parameters for both cluster and node groups.

Provisioning Plugins

A provisioning plugin is a component responsible for provisioning a Hadoop cluster. Generally each plugin is capable of provisioning a specific Hadoop distribution. Also the plugin can install management and/or monitoring tools for a cluster.

Since Hadoop parameters vary depending on distribution and the Hadoop version, templates are always plugin and Hadoop version specific. A template cannot be used if the plugin/Hadoop versions are different than the ones they were created for.

You may find the list of available plugins on that page: plugins

Image Registry

OpenStack starts VMs based on a pre-built image with an installed OS. The image requirements for Sahara depend on the plugin and Hadoop version. Some plugins require just a basic cloud image and will install Hadoop on the VMs from scratch. Some plugins might require images with pre-installed Hadoop.

The Sahara Image Registry is a feature which helps filter out images during cluster creation. See registering_image for details on how to work with Image Registry.

Features

Sahara has several interesting features. The full list could be found there: features