Merge "Documentation for the Spark plugin"

2014-06-07 19:02:49 +00:00 · 4170900df0
commit 4170900df0
parent 4e3cb32475 2d4379bdb4
1 changed files with 63 additions and 0 deletions
--- a/doc/source/userdoc/spark_plugin.rst
+++ b/doc/source/userdoc/spark_plugin.rst
@@ -0,0 +1,63 @@
+Spark Plugin
+============
+
+The Spark Sahara plugin provides a way to provision Apache Spark clusters on
+OpenStack in a single click and in an easily repeatable fashion.
+
+Currently Spark is installed in standalone mode, with no YARN or Mesos support.
+
+Images
+------
+
+For cluster provisioning, prepared images should be used. The Spark plugin
+has been developed and tested with the images generated by the :doc:`diskimagebuilder`.
+These Ubuntu images already have Cloudera CDH4 HDFS and Apache Spark 0.9.1 installed.
+
+The Spark plugin requires an image to be tagged in the Sahara Image Registry
+with two tags: 'spark' and '<Spark version>' (e.g. '0.9.1').
+
+You should also specify the username of the default cloud user in the image. For
+images generated with the DIB, it is 'ubuntu'.
+
+Note that the Spark cluster is deployed using the scripts available in the
+Spark distribution, which can start all services (master and slaves), stop
+all services, and so on. As such (and unlike the CDH HDFS daemons), Spark is
+not deployed as a standard Ubuntu service, and if the virtual machines are
+rebooted, Spark will not be restarted.
+
+Spark configuration
+-------------------
+
+Spark needs only a few parameters to work and has sensible defaults. If needed,
+they can be changed when creating the Sahara cluster template. No node group
+options are available.
+
+Once the cluster is ready, connect to the master over SSH using the 'ubuntu'
+user and the appropriate SSH key. Spark is installed in /opt/spark and should
+be completely configured and ready to start executing jobs. At the bottom of
+the cluster information page in the OpenStack dashboard, a link to the Spark
+web interface is provided.
+
+Cluster Validation
+------------------
+
+When a user creates a cluster using the Spark plugin,
+the cluster topology requested by the user is verified for consistency.
+
+Currently, the Spark plugin imposes the following constraints on cluster topology:
+
+  + Cluster must contain exactly one HDFS namenode
+  + Cluster must contain exactly one Spark master
+  + Cluster must contain at least one Spark slave
+  + Cluster must contain at least one HDFS datanode
+
+The tested configuration puts the NameNode co-located with the master and a DataNode
+with each slave to maximize data locality.
+
+Limitations
+-----------
+
+For now, scaling and EDP (Elastic Data Processing) are not supported.
+
+Swift support is not available in Spark. Once it is developed there, it will be
+possible to add it to this plugin.
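
The topology rules listed under "Cluster Validation" in the new document can be illustrated with a minimal Python sketch. This is not the plugin's actual validation code: the process names (`namenode`, `master`, `slave`, `datanode`), the node group dictionary layout, and the function names are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical process names mapped to (minimum, maximum) instance counts.
# None means "no upper bound". The plugin's real identifiers may differ.
RULES = {
    "namenode": (1, 1),     # exactly one HDFS namenode
    "master": (1, 1),       # exactly one Spark master
    "slave": (1, None),     # at least one Spark slave
    "datanode": (1, None),  # at least one HDFS datanode
}


def count_processes(node_groups):
    """Sum how many instances of each process the cluster would run."""
    counts = Counter()
    for group in node_groups:
        for process in group["processes"]:
            counts[process] += group["count"]
    return counts


def validate_topology(node_groups):
    """Return a list of rule violations; an empty list means the topology is valid."""
    counts = count_processes(node_groups)
    errors = []
    for process, (minimum, maximum) in RULES.items():
        found = counts[process]
        too_few = found < minimum
        too_many = maximum is not None and found > maximum
        if too_few or too_many:
            if minimum == maximum:
                expected = f"exactly {minimum}"
            else:
                expected = f"at least {minimum}"
            errors.append(f"expected {expected} {process}, found {found}")
    return errors


# The tested layout: namenode with the master, a datanode with each slave.
tested_layout = [
    {"processes": ["namenode", "master"], "count": 1},
    {"processes": ["datanode", "slave"], "count": 3},
]
print(validate_topology(tested_layout))
```

Collecting all violations in a list, rather than raising on the first one, lets a caller report every problem with a requested topology at once; a real plugin might raise an exception instead.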