.. _diskimage-builder-label:

Building Images for Vanilla Plugin
==================================

In this document you will find instructions on how to build Ubuntu, Fedora,
and CentOS images with Apache Hadoop version 2.x.x.

As of now the vanilla plugin works with images with pre-installed versions of
Apache Hadoop. To simplify the task of building such images we use
`Disk Image Builder <>`_.

`Disk Image Builder` builds disk images using elements. An element is a
particular set of code that alters how the image is built, or runs within the
chroot to prepare the image.

Elements for building vanilla images are stored in the
`Sahara image elements repository <>`_.
.. note::

    Sahara requires images with the cloud-init package installed:

    * `For Fedora <>`_
    * `For Ubuntu 14 <>`_
    * `For CentOS 6 <>`_
    * `For CentOS 7 <>`_
To create vanilla images follow these steps:

1. Clone the repository ""

2. Use tox to build images.

   You can run the command below in the sahara-image-elements
   directory to build images. By default this script will attempt to create
   cloud images for all versions of supported plugins and all operating
   systems (a subset of Ubuntu, Fedora, and CentOS depending on the plugin).

   .. sourcecode:: console

       tox -e venv -- sahara-image-create -u
If you want to build a Vanilla 2.7.1 image with CentOS 7, just execute:

.. sourcecode:: console

    tox -e venv -- sahara-image-create -p vanilla -v 2.7.1 -i centos7
Tox will create a virtualenv and install the required python packages in it,
clone the repositories "" and
"", and export the necessary parameters:
* ``DIB_HADOOP_VERSION`` - version of Hadoop to install
* ``JAVA_DOWNLOAD_URL`` - download link for JDK (tarball or bin)
* ``OOZIE_DOWNLOAD_URL`` - download link for Oozie (we have built
  Oozie libs here: ````)
* ``SPARK_DOWNLOAD_URL`` - download link for Spark
* ``HIVE_VERSION`` - version of Hive to install
  (currently only 0.11.0 is supported)
* ``ubuntu_image_name``
* ``fedora_image_name``
* ``DIB_IMAGE_SIZE`` - parameter that specifies the size of the
  instance's hard disk. You need to specify it only for Fedora because
  Fedora does not use all of the available disk space by default
* ``DIB_COMMIT_ID`` - latest commit id of the diskimage-builder project
* ``SAHARA_ELEMENTS_COMMIT_ID`` - latest commit id of the
  sahara-image-elements project
.. note::

    If you do not want to use the default values, set these parameters
    explicitly.
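For example, a build can be customized by exporting some of these parameters
before invoking tox. This is only an illustrative sketch; the values below
are examples, not defaults:

.. sourcecode:: console

    $ export DIB_HADOOP_VERSION=2.7.1
    $ export DIB_IMAGE_SIZE=10
    $ tox -e venv -- sahara-image-create -p vanilla -i fedora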
Then it will create the required cloud images using image elements that
install all the necessary packages and configure them. You will find the
created images in the parent directory.
.. note::

    Disk Image Builder will generate QCOW2 images, used with the default
    OpenStack Qemu/KVM hypervisors. If your OpenStack uses a different
    hypervisor, the generated image should be converted to an appropriate
    format. For example, the VMware Nova backend requires the VMDK image
    format. You may use the qemu-img utility to convert a QCOW2 image to
    VMDK:

    .. sourcecode:: console

        qemu-img convert -O vmdk <original_image>.qcow2 <converted_image>.vmdk
For finer control of the image building process, see the official
documentation.


Vanilla Plugin
==============

The vanilla plugin is a reference implementation which allows users to operate
a cluster with Apache Hadoop.
Since the Newton release, Spark is integrated into the Vanilla plugin, so you
can launch Spark jobs on a Vanilla cluster.
For cluster provisioning, prepared images should be used. They already have
Apache Hadoop 2.7.1 installed.
You may build images by yourself using :doc:`vanilla-imagebuilder` or you could
download prepared images from
Vanilla plugin requires an image to be tagged in Sahara Image Registry with
two tags: 'vanilla' and '<hadoop version>' (e.g. '2.7.1').
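As a sketch, registering and tagging an image might look like the following,
assuming the python-saharaclient OpenStack CLI plugin is installed; the image
name ``ubuntu-vanilla-2.7.1`` is hypothetical:

.. sourcecode:: console

    $ openstack dataprocessing image register ubuntu-vanilla-2.7.1 \
        --username ubuntu
    $ openstack dataprocessing image tags add ubuntu-vanilla-2.7.1 \
        --tags vanilla 2.7.1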
The default username specified for these images is different
for each distribution:
+------------+------------+
| OS         | username   |
+============+============+
| Ubuntu 14  | ubuntu     |
+------------+------------+
| Fedora 20  | fedora     |
+------------+------------+
| CentOS 6   | cloud-user |
+------------+------------+
| CentOS 7   | centos     |
+------------+------------+
Cluster Validation
------------------

When a user creates or scales a Hadoop cluster using the Vanilla plugin,
the cluster topology requested by the user is verified for consistency.

Currently there are the following limitations in cluster topology for the
Vanilla plugin.

For Vanilla Hadoop version 2.x.x:
+ Cluster must contain exactly one namenode
+ Cluster can contain at most one resourcemanager
+ Cluster can contain at most one secondary namenode
+ Cluster can contain at most one historyserver
+ Cluster can contain at most one oozie and this process is also required
  for EDP
+ Cluster can't contain oozie without resourcemanager and without
  nodemanager
+ Cluster can't have nodemanager nodes if it doesn't have resourcemanager
+ Cluster can have at most one hiveserver node
+ Cluster can have at most one spark history server and this process is also
  required for Spark EDP (Spark is available since the Newton release)
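As an illustrative sketch of a topology that satisfies these constraints, node
group templates for a single master and a set of workers might be created as
follows. The template names and flavor are hypothetical, and the exact CLI
flags may vary between python-saharaclient releases:

.. sourcecode:: console

    $ openstack dataprocessing node group template create \
        --name vanilla-master --plugin vanilla --plugin-version 2.7.1 \
        --processes namenode resourcemanager historyserver oozie \
        --flavor m1.medium
    $ openstack dataprocessing node group template create \
        --name vanilla-worker --plugin vanilla --plugin-version 2.7.1 \
        --processes datanode nodemanager --flavor m1.medium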