Updated project documentation.

- Added developer guide
- Added installation guide
- Removed screenshots of Horizon pages

Change-Id: I827e9dad490099b7825ed812452d89fc7d4d0e83
Ruslan Kamaldinov 2013-05-20 00:54:23 +04:00
parent 9b9b693098
commit 1badfe1938
25 changed files with 611 additions and 396 deletions


@ -0,0 +1,153 @@
Setting Up a Development Environment
====================================
This page describes how to point a locally running Savanna instance at a DevStack deployment in a VM.
You should be able to debug and test your changes without having to deploy Savanna.
Setup VM for DevStack
---------------------
In order to run DevStack in a local VM, start by installing a guest with Ubuntu 12.04 Server.
Download an image file from `Ubuntu's web site <http://www.ubuntu.com/download/server>`_ and create a new guest from it.
Your virtualization solution should support nested virtualization.
**On Mac OS X Systems**
Install VMware Fusion and create a new VM with Ubuntu Server 12.04.
Recommended settings:
- Processor - at least 2 cores
- Enable hypervisor applications in this virtual machine
- Memory - at least 4GB
- Hard Drive - at least 60GB
**On Linux Systems**
Use KVM
TBD: add more details.
Install DevStack on VM
----------------------
Now we are going to install DevStack in the VM we just created. Connect to the VM over SSH and follow the instructions below.
1. Clone DevStack:
.. sourcecode:: bash
sudo apt-get install git-core
git clone https://github.com/openstack-dev/devstack.git
2. Create a file named ``localrc`` in the devstack directory with the following content:
.. sourcecode:: bash
ADMIN_PASSWORD=nova
MYSQL_PASSWORD=nova
RABBIT_PASSWORD=nova
SERVICE_PASSWORD=$ADMIN_PASSWORD
SERVICE_TOKEN=nova
# Keystone is now configured by default to use PKI as the token format, which produces huge tokens.
# Set UUID as the Keystone token format, which is much shorter and easier to work with.
KEYSTONE_TOKEN_FORMAT=UUID
# Change FLOATING_RANGE to whatever IP range your VM is working in.
# In NAT mode it is the subnet VMware Fusion provides; in bridged mode it is your local network.
# Use only the top end of the network by using a /27 and starting at the 224 octet.
FLOATING_RANGE=172.16.94.224/27
# Enable auto assignment of floating IPs. By default Savanna expects this setting to be enabled.
EXTRA_OPTS=(auto_assign_floating_ip=True)
3. Start DevStack:
.. sourcecode:: bash
./stack.sh
4. Once the previous step is finished, DevStack will print the Horizon URL.
Navigate to this URL and log in with the username "admin" and the password from localrc.
5. Now we need to modify the security rules to allow connections to VMs directly from your host.
Navigate to the project's "Admin" security tab and edit the default Security Group rules, or add them from the command line as sketched below:
TCP, Port range 1-65535, CIDR, 0.0.0.0/0
ICMP, -1, -1, CIDR, 0.0.0.0/0
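The same rules can also be added from the command line with the nova client. A minimal sketch, assuming your OpenStack credentials are loaded into the shell (for example via DevStack's ``openrc``):

.. sourcecode:: bash

# open all TCP ports and ICMP in the default security group
nova secgroup-add-rule default tcp 1 65535 0.0.0.0/0
nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0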
6. Congratulations! You have OpenStack running in your VM and ready to launch VMs inside that VM :)
Setup Local Environment
-----------------------
Now we are going to set up a development environment for Savanna on your OS.
1. Install prerequisites
On OS X Systems:
.. sourcecode:: bash
# We actually need pip, which is part of the python package
brew install python
pip install virtualenv tox
On Ubuntu:
.. sourcecode:: bash
sudo apt-get install python-dev python-virtualenv
sudo pip install tox
On Fedora-based distributions (e.g., Fedora/RHEL/CentOS/Scientific Linux):
.. sourcecode:: bash
sudo yum install python-devel python-virtualenv
sudo pip install tox
2. Grab the code from GitHub:
.. sourcecode:: bash
git clone git://github.com/stackforge/savanna.git
cd savanna
3. Prepare virtual environment:
.. sourcecode:: bash
tools/install_venv
4. Create config file from default template:
.. sourcecode:: bash
cp ./etc/savanna/savanna.conf.sample ./etc/savanna/savanna.conf
5. Look through savanna.conf and change any parameters whose default values do not suit you.
.. note::
A config file can be specified for the ``savanna-api`` and ``savanna-manage`` commands using the ``--config-file`` flag.
6. To initialize the Savanna database with predefined configs and templates, run:
.. sourcecode:: bash
tox -evenv -- savanna-manage --config-file etc/savanna/savanna.conf reset-db --with-gen-templates
Virtualenv with all requirements is now installed into ``.tox/venv``.
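If you prefer to run Savanna commands without wrapping them in tox, you can activate that virtualenv directly. A sketch, assuming the default layout created by the previous step:

.. sourcecode:: bash

# activate the virtualenv created by tox, then call savanna-manage directly
source .tox/venv/bin/activate
savanna-manage --config-file etc/savanna/savanna.conf reset-db --with-gen-templates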
7. To start Savanna, run:
.. sourcecode:: bash
tox -evenv -- savanna-api --config-file etc/savanna/savanna.conf --allow-cluster-ops


@ -0,0 +1,68 @@
Development Guidelines
======================
Coding Guidelines
-----------------
For all the code in Savanna we have one rule: it must pass `PEP 8`_.
To check your code against PEP 8, run:
.. sourcecode:: bash
tox -e pep8
.. note::
For more details on coding guidelines see the file ``HACKING.rst`` in the root of the Savanna repo.
Testing Guidelines
------------------
Savanna has a suite of tests that are run on all submitted code,
and it is recommended that developers execute the tests themselves to
catch regressions early. Developers are also expected to keep the
test suite up-to-date with any submitted code changes.
Savanna's suite of unit tests can be executed in an isolated environment
with `Tox`_. To execute the unit tests, run the following from the root of the Savanna repo:
.. sourcecode:: bash
tox -e py27
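To run only part of the suite, you can pass a test module to tox. A sketch, assuming the py27 environment forwards positional arguments to the test runner, as is common in OpenStack projects; the module path is illustrative:

.. sourcecode:: bash

# run a single test module instead of the whole suite
tox -e py27 -- savanna.tests.unit.test_api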
Documentation Guidelines
------------------------
The documentation in docstrings should follow the `PEP 257`_ conventions
(as mentioned in the `PEP 8`_ guidelines).
More specifically:
1. Triple quotes should be used for all docstrings.
2. If the docstring is simple and fits on one line, then just use
one line.
3. For docstrings that take multiple lines, there should be a newline
after the opening quotes, and before the closing quotes.
4. `Sphinx`_ is used to build the documentation, so use reStructuredText
markup to designate parameters, return values, etc. Documentation on
the Sphinx-specific markup can be found at the `Sphinx`_ site.
To build the documentation, execute the command below. You will find the HTML pages at ``doc/build/html``:
.. sourcecode:: bash
tox -e docs
.. note::
For more details on documentation guidelines see the file ``HACKING.rst`` in the root of the Savanna repo.
.. _PEP 8: http://www.python.org/dev/peps/pep-0008/
.. _PEP 257: http://www.python.org/dev/peps/pep-0257/
.. _Tox: http://tox.testrun.org/
.. _Sphinx: http://sphinx.pocoo.org/markup/index.html


@ -0,0 +1,16 @@
Code Reviews with Gerrit
========================
Savanna uses the `Gerrit`_ tool to review proposed code changes. The review site
is http://review.openstack.org.
Gerrit is a complete replacement for Github pull requests. `All Github pull
requests to the Savanna repository will be ignored`.
See `Gerrit Workflow Quick Reference`_ for information about how to get
started using Gerrit. See `Gerrit, Jenkins and Github`_ for more detailed
documentation on how to work with Gerrit.
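A typical submission flow uses the ``git-review`` tool. A sketch, with an illustrative branch name:

.. sourcecode:: bash

sudo pip install git-review
# create a topic branch, commit your work, then push it to Gerrit
git checkout -b my-change
git commit -a
git review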
.. _Gerrit: http://code.google.com/p/gerrit
.. _Gerrit, Jenkins and Github: http://wiki.openstack.org/GerritJenkinsGithub
.. _Gerrit Workflow Quick Reference: http://wiki.openstack.org/GerritWorkflow


@ -1,7 +1,7 @@
How to Participate
==================
You can browse the code at `our github repo <https://github.com/stackforge/savanna>`_ and :doc:`here <\quickstart>` are the instructions on how to launch Savanna.
You can browse the code at `our github repo <https://github.com/stackforge/savanna>`_.
If you would like to ask some questions or make proposals,
feel free to reach us on #savanna irc channel at `freenode <http://freenode.net/>`_.
@ -14,5 +14,4 @@ We're going to hold public weekly meetings on Thursday at 18:00 UTC on #openstac
If you want to contribute either to docs or to code, simply send us a change request via review.openstack.org (Gerrit).
You can file `bugs <https://bugs.launchpad.net/savanna>`_ and register `blueprints <https://blueprints.launchpad.net/savanna>`_ at
`Savanna launchpad page <https://launchpad.net/savanna>`_.
`Savanna launchpad page <https://launchpad.net/savanna>`_.


@ -0,0 +1,31 @@
Developer Guide
===============
Programming HowTos and Tutorials
--------------------------------
.. toctree::
:maxdepth: 3
development.guidelines
development.environment
unit_tests
how_to_participate
Background Concepts for Savanna
-------------------------------
.. toctree::
:maxdepth: 3
plugins
templates
Other Resources
---------------
.. toctree::
:maxdepth: 3
launchpad
gerrit
jenkins


@ -0,0 +1,2 @@
Continuous Integration with Jenkins
===================================


@ -0,0 +1,48 @@
Project hosting with Launchpad
==============================
`Launchpad`_ hosts the Savanna project. The Savanna project homepage on Launchpad is
http://launchpad.net/savanna.
Launchpad credentials
---------------------
Creating a login on Launchpad is important even if you don't use the Launchpad
site itself, since Launchpad credentials are used for logging in on several
OpenStack-related sites. These sites include:
* `Wiki`_
* Gerrit (see :doc:`gerrit`)
* Jenkins (see :doc:`jenkins`)
Mailing list
------------
The mailing list email is ``savanna-all@lists.launchpad.net``. To participate in the mailing list:
#. Join the `Savanna Team`_ on Launchpad.
#. Subscribe to the list on the `Savanna Team`_ page on Launchpad.
The mailing list archives are at https://lists.launchpad.net/savanna-all
Bug tracking
------------
Report Savanna bugs at https://bugs.launchpad.net/savanna
Feature requests (Blueprints)
-----------------------------
Savanna uses Launchpad Blueprints to track feature requests. Blueprints are at
https://blueprints.launchpad.net/savanna.
Technical support (Answers)
---------------------------
Savanna uses Launchpad Answers to track Savanna technical support questions. The Savanna
Answers page is at https://answers.launchpad.net/savanna
.. _Launchpad: http://launchpad.net
.. _Wiki: http://wiki.openstack.org/savanna
.. _Savanna Team: https://launchpad.net/~savanna-all


@ -0,0 +1,13 @@
Pluggable mechanism
===================
The Savanna Pluggable Provisioning Mechanism aims to deploy Hadoop clusters and integrate them with third-party vendor
management tools like Cloudera Management Console, Hortonworks Ambari and Intel Hadoop Distribution, and with monitoring tools
like Nagios, Zabbix and Ganglia.
`Read full specification here <https://wiki.openstack.org/wiki/Savanna/PluggableProvisioning>`_.
.. note::
Object model, flow and detailed description will be moved here once this functionality is checked into the repository.


@ -0,0 +1,12 @@
Cluster templates
=================
Templates are an attempt to provide a simple, unified means of Hadoop cluster configuration.
See the wiki link below for more details.
`Read full specification here <https://wiki.openstack.org/wiki/Savanna/Templates>`_.
.. note::
Details of the templates mechanism will be moved here once this functionality is checked into the repository.


@ -0,0 +1,22 @@
Unit Tests
====================================
Savanna contains a suite of unit tests, in the savanna/tests directory.
Any proposed code change will be automatically rejected by the OpenStack
Jenkins server [#f1]_ if the change causes unit test failures.
Running the tests
-----------------
Run the unit tests with:
.. sourcecode:: bash
./tools/run_tests.sh
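To run a single test module, you can pass its name to the script. A sketch, assuming ``run_tests.sh`` forwards its arguments to the underlying test runner, as most OpenStack run_tests.sh scripts do; the module path is illustrative:

.. sourcecode:: bash

# run one test module instead of the whole suite
./tools/run_tests.sh savanna.tests.unit.test_api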
.. rubric:: Footnotes
.. [#f1] See :doc:`jenkins`.


@ -1,49 +0,0 @@
********************************
Custom Horizon pages for Savanna
********************************
Some pages has been implemented and there are some screenshots of it.
Screencast and sources of customized Horizon will be published at an early date.
1. Base page with Hadoop clusters list (empty cluster list).
.. image:: ../images/horizon/01-savanna-empty.png
:width: 800 px
:scale: 99 %
:align: left
2. Node template details page
.. image:: ../images/horizon/02-savanna-tmpl.png
:width: 800 px
:scale: 99 %
:align: left
3. Hadoop cluster creation wizard
.. image:: ../images/horizon/03-savanna-create.png
:width: 800 px
:scale: 99 %
:align: left
4. Hadoop clusters page with new cluster in 'Starting' state
.. image:: ../images/horizon/04-savanna-starting.png
:width: 800 px
:scale: 99 %
:align: left
5. After some time, cluster is active
.. image:: ../images/horizon/05-savanna-active.png
:width: 800 px
:scale: 99 %
:align: left
6. Hadoop cluster details page
.. image:: ../images/horizon/06-savanna-details.png
:width: 800 px
:scale: 99 %
:align: left


@ -1,18 +1,17 @@
*********************
Savanna Horizon Setup
*********************
=====================
1 Setup prerequisites
=====================
---------------------
1.1 OpenStack environment (Folsom+ version) installed.
1.2 Savanna REST API service installed and configured. You can find :doc:`quickstart guide here <..\quickstart>`.
1.2 Savanna REST API service installed and configured.
1.3 The operating system where the Savanna Horizon service is installed has to be connected to the internal OpenStack network.
2 Savanna-Horizon Installation
==============================
------------------------------
2.1 Go to your Horizon machine and install the following packages:
@ -99,7 +98,7 @@ Here are the required modifications
3 Configure apache2 server
==========================
--------------------------
3.1 Install apache and mod_wsgi
@ -164,4 +163,4 @@ Now all installations are done and Horizon can be started:
sudo service apache2 restart
You can check that service has been started successfully. Go to Horizon URL and you'll be able to see :doc:`Savanna pages <\index>` in the Project tab.
You can check that the service has started successfully: go to the Horizon URL and you'll be able to see the Savanna pages in the Project tab.

Six binary files (the removed Horizon screenshots, 29 KiB to 78 KiB each) are not shown.


@ -1,171 +1,50 @@
Welcome to Savanna documentation!
=================================
Welcome to Savanna!
===================
Useful links
------------
.. toctree::
:maxdepth: 1
overview
architecture
devref/index
installation.guide
quickstart
* `Savanna wiki <https://wiki.openstack.org/wiki/Savanna>`_
* `Savanna roadmap <https://wiki.openstack.org/wiki/Savanna/Roadmap>`_
.. include:: introduction.rst.inc
Project overview
----------------
.. toctree::
:maxdepth: 1
overview
architecture
roadmap
restapi/v02
quickstart
horizon/index
horizon/howto
how-to-participate
* `Sources repo <https://github.com/stackforge/savanna>`_
* `Launchpad project (bugs, blueprints, etc.) <https://launchpad.net/savanna>`_
* `Savanna REST API and custom Horizon screencast <http://www.youtube.com/watch?v=UUt2gqGHcPg>`_
* `Savanna talk slides from OpenStack Summit'13 <http://www.slideshare.net/mirantis/savanna-hadoop-on-openstack>`_
Introduction
------------
Apache Hadoop is an industry standard and widely adopted MapReduce implementation.
The aim of this project is to enable users to easily provision and manage Hadoop clusters on OpenStack.
It is worth mentioning that Amazon provides Hadoop for several years as Amazon Elastic MapReduce (EMR) service.
Savanna aims to provide users with simple means to provision a Hadoop cluster
by specifying several parameters like Hadoop version, cluster topology, nodes hardware details
and a few more. After user fills in all the parameters, Savanna deploys the cluster in a few minutes.
Also Savanna provides means to scale already provisioned cluster by adding/removing worker nodes on demand.
The solution will address following use cases:
* fast provisioning of Hadoop clusters on OpenStack for Dev and QA;
* utilization of unused compute power from general purpose OpenStack IaaS cloud;
* "Analytics as a Service" for ad-hoc or bursty analytic workloads (similar to AWS EMR).
Key features are:
* designed as an OpenStack component;
* managed through REST API with UI available as part of OpenStack Dashboard;
* support for different Hadoop distributions:
* pluggable system of Hadoop installation engines;
* integration with vendor specific management tools, such as Apache Ambari or Cloudera Management Console;
* predefined templates of Hadoop configurations with ability to modify parameters.
Details
-------
The Savanna product communicates with the following OpenStack components:
* Horizon - provides GUI with ability to use all of Savannas features;
* Keystone - authenticates users and provides security token that is used to work with the OpenStack,
hence limiting user abilities in Savanna to his OpenStack privileges;
* Nova - is used to provision VMs for Hadoop Cluster;
* Glance - Hadoop VM images are stored there, each image containing an installed OS and Hadoop;
the pre-installed Hadoop should give us good handicap on node start-up;
* Swift - can be used as a storage for data that will be processed by Hadoop jobs.
.. image:: images/openstack-interop.png
:width: 800 px
:scale: 99 %
:align: left
General Workflow
----------------
Savanna will provide two level of abstraction for API and UI based on the addressed use cases:
cluster provisioning and analytics as a service.
For the fast cluster provisioning generic workflow will be as following:
* select Hadoop version;
* select base image with or without pre-installed Hadoop:
* for base images without Hadoop pre-installed Savanna will support pluggable deployment engines integrated with vendor tooling;
* define cluster configuration, including size and topology of the cluster and setting the different type of Hadoop parameters (e.g. heap size):
* to ease the configuration of such parameters mechanism of configurable templates will be provided;
* provision the cluster: Savanna will provision VMs, install and configure Hadoop;
* operation on the cluster: add/remove nodes;
* terminate the cluster when its not needed anymore.
For analytic as a service generic workflow will be as following:
* select one of predefined Hadoop versions;
* configure the job:
* choose type of the job: pig, hive, jar-file, etc.;
* provide the job script source or jar location;
* select input and output data location (initially only Swift will be supported);
* select location for logs;
* set limit for the cluster size;
* execute the job:
* all cluster provisioning and job execution will happen transparently to the user;
* cluster will be removed automatically after job completion;
* get the results of computations (for example, from Swift).
Users Perspective
------------------
While provisioning cluster through Savanna, user operates on two types of entities: Node Templates and Clusters.
Node Template describes a node within cluster and it has several parameters. Node Type is one of the Node Templates
properties that determines what Hadoop processes will be running on the node and thereby its role in the cluster.
It could be either of JobTracker, NameNode, TaskTracker or DataNode, or any logical combination of these.
Also template encapsulates hardware parameters (flavor) for the node VM and configuration for Hadoop processes running on the node.
Cluster entity simply represents Hadoop Cluster. It is mainly characterized by VM image with pre-installed Hadoop which
will be used for cluster deployment and cluster topology. The topology is a list of node templates and respectively
amount of nodes being deployed for each template. With respect to topology, Savanna checks only that cluster has one JobTracker and one NameNode.
Each node template and cluster belongs to some tenant determined by user. Users have access only to objects located in
tenants they have access to. Users could edit/delete only objects they created. Naturally admin users have full access to every object.
That way Savanna complies with general OpenStack access policy.
Savanna provides several kinds of Hadoop cluster topology. JobTracker and NameNode processes could be run either on a single
VM or two separate ones. Also cluster could contain worker nodes of different types. Worker nodes could run both TaskTracker and DataNode,
or either of these processes alone. Savanna allows user to create cluster with any combination of these options.
Integration with Swift
----------------------
The Swift service is a standard object storage in OpenStack environment, analog of Amazon S3. As a rule it is deployed
on bare metal machines. It is natural to expect Hadoop on OpenStack to process data stored there. There are a couple
of enhancements on the way which can help there.
First, a FileSystem implementation for Swift: `HADOOP-8545 <https://issues.apache.org/jira/browse/HADOOP-8545>`_.
With that thing in place, Hadoop jobs can work with Swift
as naturally as with HDFS.
On the Swift side, we have the change request: `Change I6b1ba25b <https://review.openstack.org/#/c/21015/>`_ (merged).
It implements the ability to list endpoints for an object, account or container, to make it possible to integrate swift
with software that relies on data locality information to avoid network overhead.
Pluggable Deployment and Monitoring
-----------------------------------
In addition to the monitoring capabilities provided by vendor-specific Hadoop management tooling, Savanna will provide pluggable integration with external monitoring systems such as Nagios or Zabbix.
Both deployment and monitoring tools will be installed on stand-alone VMs, thus allowing a single instance to manage/monitor several clusters at once.
Useful links
------------
Developer Docs
--------------
.. toctree::
:maxdepth: 2
:maxdepth: 3
devref/index
.. toctree::
:maxdepth: 1
architecture
roadmap
restapi/v02
quickstart
horizon/index
horizon/howto
how-to-participate
* `Sources repo <https://github.com/stackforge/savanna>`_
* `Launchpad project (bugs, blueprints, etc.) <https://launchpad.net/savanna>`_
* `Savanna REST API and custom Horizon screencast <http://www.youtube.com/watch?v=UUt2gqGHcPg>`_
* `Savanna talk slides from OpenStack Summit'13 <http://www.slideshare.net/mirantis/savanna-hadoop-on-openstack>`_
Installation guides
-------------------
.. toctree::
:maxdepth: 1
installation.guide
horizon/installation.guide


@ -0,0 +1,39 @@
Savanna Installation Guide
==========================
1. You can install the latest Savanna release from PyPI:
.. sourcecode:: bash
sudo pip install savanna
Or you can get a Savanna archive from http://tarballs.openstack.org/savanna/ and install it using pip:
.. sourcecode:: bash
sudo pip install http://tarballs.openstack.org/savanna/savanna-master.tar.gz#egg=savanna
.. note::
savanna-master.tar.gz contains the latest changes in the source code.
savanna-some_version.tar.gz contains the features of the specified Savanna release.
2. After installation you should create a configuration file (or adjust the default config) to run Savanna properly. The sample config is installed at ``/usr/local/share/savanna/savanna.conf.sample``; copy it into place:
.. sourcecode:: bash
sudo mkdir /etc/savanna
sudo cp /usr/local/share/savanna/savanna.conf.sample /etc/savanna/savanna.conf
3. To initialize the Savanna database with the created configuration, run:
.. sourcecode:: bash
savanna-manage --config-file /etc/savanna/savanna.conf reset-db --with-gen-templates
4. To start Savanna, run:
.. sourcecode:: bash
savanna-api --config-file /etc/savanna/savanna.conf
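Once the service is running you can do a quick smoke test from the same machine. A sketch, assuming Savanna listens on its default port 8386 and, like other OpenStack services, reports its API versions at the root URL:

.. sourcecode:: bash

# should return a small JSON document listing the available API versions
curl http://localhost:8386/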


@ -0,0 +1,26 @@
Introduction
------------
Apache Hadoop is an industry standard and widely adopted MapReduce implementation.
The aim of this project is to enable users to easily provision and manage Hadoop clusters on OpenStack.
It is worth mentioning that Amazon has provided Hadoop for several years as the Amazon Elastic MapReduce (EMR) service.
Savanna aims to provide users with simple means to provision Hadoop clusters
by specifying several parameters like the Hadoop version, cluster topology, node hardware details
and a few more. After the user fills in all the parameters, Savanna deploys the cluster in a few minutes.
Savanna also provides means to scale an already-provisioned cluster by adding or removing worker nodes on demand.
The solution will address the following use cases:
* fast provisioning of Hadoop clusters on OpenStack for Dev and QA;
* utilization of unused compute power from general purpose OpenStack IaaS cloud;
* "Analytics as a Service" for ad-hoc or bursty analytic workloads (similar to AWS EMR).
Key features are:
* designed as an OpenStack component;
* managed through REST API with UI available as part of OpenStack Dashboard;
* support for different Hadoop distributions:
* pluggable system of Hadoop installation engines;
* integration with vendor specific management tools, such as Apache Ambari or Cloudera Management Console;
* predefined templates of Hadoop configurations with ability to modify parameters.


@ -0,0 +1,105 @@
Savanna Overview
================
.. include:: introduction.rst.inc
Details
-------
The Savanna product communicates with the following OpenStack components:
* Horizon - provides a GUI with the ability to use all of Savanna's features;
* Keystone - authenticates users and provides a security token that is used to work with OpenStack,
hence limiting a user's abilities in Savanna to their OpenStack privileges;
* Nova - is used to provision VMs for the Hadoop cluster;
* Glance - Hadoop VM images are stored there, each image containing an installed OS and Hadoop;
the pre-installed Hadoop should give us a good head start on node start-up;
* Swift - can be used as storage for data that will be processed by Hadoop jobs.
.. image:: images/openstack-interop.png
:width: 800 px
:scale: 99 %
:align: left
General Workflow
----------------
Savanna will provide two levels of abstraction for the API and UI based on the addressed use cases:
cluster provisioning and analytics as a service.
For fast cluster provisioning the generic workflow will be as follows:
* select Hadoop version;
* select base image with or without pre-installed Hadoop:
* for base images without Hadoop pre-installed Savanna will support pluggable deployment engines integrated with vendor tooling;
* define cluster configuration, including the size and topology of the cluster and setting different types of Hadoop parameters (e.g. heap size):
* to ease the configuration of such parameters, a mechanism of configurable templates will be provided;
* provision the cluster: Savanna will provision VMs, install and configure Hadoop;
* operation on the cluster: add/remove nodes;
* terminate the cluster when it's not needed anymore.
For analytics as a service the generic workflow will be as follows:
* select one of predefined Hadoop versions;
* configure the job:
* choose type of the job: pig, hive, jar-file, etc.;
* provide the job script source or jar location;
* select input and output data location (initially only Swift will be supported);
* select location for logs;
* set limit for the cluster size;
* execute the job:
* all cluster provisioning and job execution will happen transparently to the user;
* cluster will be removed automatically after job completion;
* get the results of computations (for example, from Swift).
User's Perspective
------------------
While provisioning clusters through Savanna, the user operates on two types of entities: Node Templates and Clusters.
A Node Template describes a node within a cluster and has several parameters. Node Type is one of the Node Template's
properties; it determines which Hadoop processes will run on the node and thereby its role in the cluster.
It could be any of JobTracker, NameNode, TaskTracker or DataNode, or any logical combination of these.
The template also encapsulates hardware parameters (flavor) for the node VM and configuration for the Hadoop processes running on the node.
The Cluster entity simply represents a Hadoop cluster. It is mainly characterized by the VM image with pre-installed Hadoop which
will be used for cluster deployment, and by the cluster topology. The topology is a list of node templates and the
number of nodes being deployed for each template. With respect to topology, Savanna checks only that the cluster has one JobTracker and one NameNode.
Each node template and cluster belongs to a tenant determined by the user. Users have access only to objects located in
tenants they have access to, and can edit or delete only objects they created. Naturally, admin users have full access to every object.
In this way Savanna complies with the general OpenStack access policy.
Savanna provides several kinds of Hadoop cluster topology. The JobTracker and NameNode processes can run either on a single
VM or on two separate ones. A cluster can also contain worker nodes of different types: worker nodes can run both TaskTracker and DataNode,
or either of these processes alone. Savanna allows the user to create a cluster with any combination of these options.
Integration with Swift
----------------------
The Swift service is the standard object storage in an OpenStack environment, analogous to Amazon S3. As a rule it is deployed
on bare-metal machines. It is natural to expect Hadoop on OpenStack to process data stored there, and there are a couple
of enhancements on the way which can help with that.
First, a FileSystem implementation for Swift: `HADOOP-8545 <https://issues.apache.org/jira/browse/HADOOP-8545>`_.
With that in place, Hadoop jobs can work with Swift
as naturally as with HDFS.
On the Swift side, we have the change request: `Change I6b1ba25b <https://review.openstack.org/#/c/21015/>`_ (merged).
It implements the ability to list endpoints for an object, account or container, making it possible to integrate Swift
with software that relies on data-locality information to avoid network overhead.
Pluggable Deployment and Monitoring
-----------------------------------
In addition to the monitoring capabilities provided by vendor-specific Hadoop management tooling, Savanna will provide pluggable integration with external monitoring systems such as Nagios or Zabbix.
Both deployment and monitoring tools will be installed on stand-alone VMs, thus allowing a single instance to manage/monitor several clusters at once.


@ -1,82 +1,33 @@
************************
Savanna quickstart guide
************************
Savanna v0.1.1 quickstart guide
===============================
1 You can install the latest Savanna release version (0.1, 0.1.1 and etc.) from pypi:
1. Install Savanna
------------------
.. sourcecode:: bash
* If you want to hack on Savanna, follow :doc:`devref/development.environment`.
* If you just want to install and use Savanna, follow :doc:`installation.guide`.
sudo pip install savanna
2. Image setup
--------------
Or you can get Savanna archive from http://tarballs.openstack.org/savanna/ and install it using pip:
.. sourcecode:: bash
sudo pip install http://tarballs.openstack.org/savanna/savanna-master.tar.gz#egg=savanna
**Note:**
savanna-master.tar.gz contains the latest changes in the source code.
savanna-some_version.tar.gz contains features related to specified Savanna release.
2 After installation you should create configuration file or change default config to run Savanna properly. Default config file is located in:
.. sourcecode:: bash
/usr/local/share/savanna/savanna.conf.sample
3 To initialize Savanna database with created configuration just call:
.. sourcecode:: bash
savanna-manage --config-file /pathToConfig reset-db --with-gen-templates
4 To start Savanna call:
.. sourcecode:: bash
savanna-api --config-file /pathToConfig
***********************
Full installation guide
***********************
1 Setup prerequisites
=====================
1.1 OpenStack environment (Folsom+ version) installed.
1.2 Git should be installed on the machine where Savanna_API will be deployed.
1.3 Your OpenStack should have flavors with 'm1.small' and 'm1.medium' names defined because these flavors are referenced by Savanna's default Node Templates.
You can check which flavors you have by running
.. sourcecode:: bash
nova flavor-list
2 Image setup
=============
2.1 Go to OpenStack management node or you can configure ENV at another machine:
2.1. Go to the OpenStack management node, or you can configure the ENV on another machine:
.. sourcecode:: bash
ssh user@hostname
2.2 Download tarball from the following URL:
2.2. Download tarball from the following URL:
.. sourcecode:: bash
wget http://savanna-files.mirantis.com/savanna-0.1-hdp-img.tar.gz
2.3 Unpack image and import it into Glance:
2.3. Unpack image and import it into Glance:
.. sourcecode:: bash
tar -xzf savanna-0.1-hdp-img.tar.gz
glance image-create --name=hdp.image --disk-format=qcow2 --container-format=bare < ./savanna-0.1-hdp-img.img
glance image-create --name=vanilla-hadoop.image --disk-format=qcow2 --container-format=bare < ./savanna-0.1-hdp-img.img
You should see the output similar to the following:
@ -95,7 +46,7 @@ You should see the output similar to the following:
| is_public | False |
| min_disk | 0 |
| min_ram | 0 |
| name | hdp.image |
| name | vanilla-hadoop.image |
| owner | 6b26f08455ec449ea7a2d3da75339255 |
| protected | False |
| size | 1675296768 |
@ -104,82 +55,26 @@ You should see the output similar to the following:
+------------------+--------------------------------------+
3 Savanna API SETUP
===================
3.1 Git clone repo from the https://github.com/stackforge/savanna
.. sourcecode:: bash
git clone git://github.com/stackforge/savanna.git
3.2 Go to the cloned repo directory
.. sourcecode:: bash
cd savanna
3.3 Install python headers, virtualenv and tox:
.. sourcecode:: bash
sudo apt-get update
sudo apt-get install python-dev python-virtualenv
sudo pip install tox
3.4 Prepare virtual environment:
.. sourcecode:: bash
tools/install_venv
3.5 Create config file from default template local.cfg-sample:
.. sourcecode:: bash
cp ./etc/savanna/savanna.conf.sample ./etc/savanna/savanna.conf
3.6 Look through the savanna.conf and change parameters which default values do not suite you.
**Note:** Config file could be specified for ``savanna-api`` and ``savanna-manage`` commands using ``--config-file`` flag.
**Note:** If your OpenStack cluster doesn't automatically assign floating ips then you should set ``use_floating_ips`` configuration option to ``False``.
3.7 To initialize Savanna database with created configuration just call:
.. sourcecode:: bash
tox -evenv -- savanna-manage --config-file etc/savanna/savanna.conf reset-db --with-gen-templates
Virtualenv with all requirements installed into it is now available in ``.tox/venv``. You can create it by executing ``tools/install_venv``.
3.8 To start Savanna call:
.. sourcecode:: bash
tox -evenv -- savanna-api --config-file etc/savanna/savanna.conf --allow-cluster-ops
3. Savanna API SETUP
--------------------
Now the Savanna service is running. Further steps show how you can verify from the console that the Savanna API works properly.
3.9 First install httpie program. It allows you to send http requests to Savanna API service.
3.1. First, install the httpie program. It allows you to send HTTP requests to the Savanna API service.
.. sourcecode:: bash
sudo easy_install httpie
sudo pip install httpie
**Note:** of course, you can use another HTTP client like curl to send requests to the Savanna service
3.10 Then you need to get authentification token from OpenStack Keystone service:
3.2. Then you need to get an authentication token from the OpenStack Keystone service.
This step assumes you have the keystone client configured:
.. sourcecode:: bash
tools/get_auth_token --config-file <path to config file>
keystone token-get
E.g.:
.. sourcecode:: bash
tools/get_auth_token --config-file etc/savanna/savanna.conf
If authentication succeeds, the output will be as follows:
@ -197,7 +92,7 @@ If authentication succeed, output will be as follows:
**Note:** Save the token because you have to supply it with every request to Savanna in the X-Auth-Token header.
You will also use the tenant id in the request URL.
3.11 Send http request to the Savanna service:
3.3. Send an HTTP request to the Savanna service:
.. sourcecode:: bash
@ -206,9 +101,7 @@ You will also use tenant id in request URL
Where:
* savanna_api_ip - hostname where Savanna API service is running
* tenant_id - id of the tenant for which you got token in previous item
* auth_token - token obtained in previous item
For example:
@ -235,10 +128,10 @@ Output of this command will look as follows:
}
}
4 Hadoop Cluster startup
========================
4. Hadoop Cluster startup
-------------------------
4.1 Send the POST request to Savanna API to create Hadoop Cluster.
4.1. Send the POST request to Savanna API to create Hadoop Cluster.
Create a file named ``cluster_create.json`` and fill it with the following content:
@ -300,7 +193,7 @@ Response for this request will look like:
}
4.2 If the response in the 3.1. was ``202 ACCEPTED`` then you can check status of new cluster:
4.2. If the response in 4.1 was ``202 ACCEPTED``, then you can check the status of the new cluster:
.. sourcecode:: bash
@ -359,7 +252,7 @@ Initially the cluster will be in "Starting" state, but eventually (in several mi
}
}
4.3 So you recieved NameNode's and JobTracker's URLs like this:
4.3. So you received the NameNode's and JobTracker's URLs, like this:
.. sourcecode:: json
@ -370,7 +263,7 @@ Initially the cluster will be in "Starting" state, but eventually (in several mi
and you can actually access them via a browser
4.4 To check that your Hadoop installation works correctly:
4.4. To check that your Hadoop installation works correctly:
* Go to NameNode via ssh:
@ -398,4 +291,4 @@ and you actually could access them via browser
"jobtracker": "http://JobTracker_IP:50030"
Congratulations! Now you have Hadoop cluster ready on the OpenStack cloud!
Congratulations! Now you have Hadoop cluster ready on the OpenStack cloud!


@ -2,6 +2,10 @@
Savanna REST API v0.2
*********************
.. note::
REST API v0.2 corresponds to Savanna v0.1.1
1 General API information
=========================


@ -1,45 +0,0 @@
Savanna Roadmap
===============
Phase 1 - Basic Cluster Provisioning
------------------------------------
completion - early April
* Cluster provisioning
* Deployment Engine implementation for pre-installed images
* Templates for Hadoop cluster configuration
* REST API for cluster startup and operations
* UI integrated into Horizon
Phase 2 - Cluster Operations
----------------------------
completion - end of June
* Manual cluster scaling (add/remove nodes)
* Hadoop cluster topology configuration parameters
* Data node placement control
* HDFS location
* Swift integration
* Integration with vendor specific deployment/management tooling
* Monitoring support - integration with 3rd-party monitoring tools (Zabbix, Nagios)
Phase 3 - Analytics as a Service
--------------------------------
completion - end of September
* API to execute Map/Reduce jobs without exposing details of underlying infrastructure (similar to AWS EMR)
* User-friendly UI for ad-hoc analytics queries based on Hive or Pig
Further Roadmap
---------------
completion - TBD
* HDFS and Swift integration
* Caching of Swift data on HDFS
* Avoid issues with Swift eventual consistency while running job
* HBase support