Getting Started
===============

Clusters
--------

A cluster deployed by sahara consists of node groups. Node groups vary by
their role, parameters, and number of machines. The picture below
illustrates an example of a Hadoop cluster consisting of three node groups,
each having a different role (set of processes).

.. image:: ../images/hadoop-cluster-example.jpg

Node group parameters include Hadoop parameters such as ``io.sort.mb`` or
``mapred.child.java.opts``, and several infrastructure parameters such as the
flavor for VMs or the storage location (ephemeral drive or cinder volume).

A cluster is characterized by its node groups and its parameters. Like a node
group, a cluster has data processing framework and infrastructure parameters.
An example of a cluster-wide Hadoop parameter is ``dfs.replication``. For
infrastructure, an example is the image used to launch the cluster VMs.
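The relationship between a cluster and its node groups can be sketched in
Python. This is only an illustration, not sahara's real data model: the
``NodeGroup`` and ``Cluster`` classes, their field names, and the flavor,
image, and process names are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class NodeGroup:
    """Hypothetical stand-in for a node group (not sahara's data model)."""
    name: str
    processes: list   # the role of the group, as a set of processes
    count: int        # number of machines in the group
    flavor: str       # infrastructure parameter: flavor for the VMs
    storage: str      # infrastructure parameter: "ephemeral" or "cinder"
    framework_params: dict = field(default_factory=dict)


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster made of node groups."""
    name: str
    image: str        # infrastructure parameter: image for the cluster VMs
    node_groups: list
    framework_params: dict = field(default_factory=dict)

    def total_instances(self):
        # A cluster's size is the sum of its node group sizes.
        return sum(group.count for group in self.node_groups)


# Three node groups, each with a different role (all values illustrative).
cluster = Cluster(
    name="example-cluster",
    image="example-image",
    framework_params={"dfs.replication": 3},  # cluster-wide parameter
    node_groups=[
        NodeGroup("master", ["namenode", "jobtracker"], 1,
                  "m1.large", "ephemeral"),
        NodeGroup("secondary", ["secondarynamenode"], 1,
                  "m1.medium", "ephemeral"),
        NodeGroup("worker", ["datanode", "tasktracker"], 3,
                  "m1.medium", "cinder",
                  framework_params={"io.sort.mb": 100}),
    ],
)
```

Here ``dfs.replication`` is set once for the whole cluster, while
``io.sort.mb`` is attached to a single node group.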

Templates
---------

In order to simplify cluster provisioning, sahara employs the concept of
templates. There are two kinds of templates: node group templates and
cluster templates. The former are used to create node groups, the latter
to create clusters. Essentially, templates have the same parameters as the
corresponding entities. Their aim is to remove the burden of specifying all
of the required parameters each time a user wants to launch a cluster.

In the REST interface, templates have extended functionality. First, you can
specify node-scoped parameters in a template; they work as defaults for node
groups. Second, during cluster creation a user can override template
parameters for both the cluster and its node groups.
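One way to picture defaults and overrides is as layered dictionaries, where
later layers win. The sketch below assumes a precedence order of template
defaults, then node group template values, then user overrides. The
``merge_params`` helper and the sample values are hypothetical and are not
part of sahara's API.

```python
def merge_params(*layers):
    """Merge parameter dicts; later layers take precedence over earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Node-scoped defaults given in a template (illustrative values).
node_defaults = {"io.sort.mb": 100}
# Parameters from a node group template.
node_group_template = {"mapred.child.java.opts": "-Xmx512m"}
# Overrides supplied by the user at cluster creation time.
user_overrides = {"io.sort.mb": 200}

effective = merge_params(node_defaults, node_group_template, user_overrides)
```

With these inputs the user's ``io.sort.mb`` value replaces the default, and
``mapred.child.java.opts`` is taken unchanged from the node group template.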

Provisioning Plugins
--------------------

A provisioning plugin is a component responsible for provisioning a data
processing cluster. Generally, each plugin is capable of provisioning a
specific data processing framework or Hadoop distribution. A plugin can also
install management and/or monitoring tools for a cluster.

Since framework configuration parameters vary depending on the distribution
and the version, templates are always plugin and version specific. A template
cannot be used if the plugin or framework version differs from the one it was
created for.
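The compatibility rule amounts to an exact match on both plugin and version.
A minimal sketch, assuming a template is a dictionary with hypothetical
``plugin`` and ``version`` keys (the plugin names and version strings are
placeholders, not a list of real plugins):

```python
def template_is_usable(template, plugin, version):
    """Return True only if the template targets this plugin and version."""
    return template["plugin"] == plugin and template["version"] == version


# Placeholder plugin name and version, for illustration only.
template = {"name": "small-cluster", "plugin": "plugin-a", "version": "1.0"}

same_target = template_is_usable(template, "plugin-a", "1.0")
other_version = template_is_usable(template, "plugin-a", "2.0")
other_plugin = template_is_usable(template, "plugin-b", "1.0")
```

Only ``same_target`` is true; a mismatch in either the plugin or the version
makes the template unusable.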

See :doc:`plugins` for the list of available plugins.

Image Registry
--------------

OpenStack starts VMs based on a pre-built image with an installed OS. The
image requirements for sahara depend on the plugin and data processing
framework version. Some plugins require just a basic cloud image and will
install the framework on the VMs from scratch. Others might require images
with pre-installed frameworks or Hadoop distributions.

The Sahara Image Registry is a feature that helps filter images during
cluster creation. See :doc:`registering_image` for details on how to work
with the Image Registry.
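The filtering idea can be illustrated as follows. The sketch assumes, purely
for illustration, that each registered image carries a set of tags naming the
plugin and version it supports. The ``suitable_images`` helper, the image
records, and the tag values are hypothetical; see :doc:`registering_image`
for how the registry actually works.

```python
def suitable_images(images, plugin, version):
    """Return names of images whose tags include both plugin and version."""
    required = {plugin, version}
    return [image["name"] for image in images if required <= image["tags"]]


# Hypothetical registry contents.
images = [
    {"name": "image-a", "tags": {"plugin-a", "1.0"}},
    {"name": "image-b", "tags": {"plugin-a", "2.0"}},
    {"name": "image-c", "tags": {"plugin-b", "1.0"}},
    {"name": "plain-cloud-image", "tags": set()},
]

matches = suitable_images(images, "plugin-a", "1.0")
```

Only ``image-a`` carries both required tags, so it is the only image offered
for that plugin and version.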

Features
--------

Sahara has several interesting features. The full list can be found in
:doc:`features`.