Merge "Documentation for the Spark plugin"
commit 4170900df0
doc/source/userdoc/spark_plugin.rst (new file, 63 lines)
@@ -0,0 +1,63 @@
Spark Plugin
============

The Spark Sahara plugin provides a way to provision Apache Spark clusters on
OpenStack in a single click and in an easily repeatable fashion.

Currently Spark is installed in standalone mode, with no YARN or Mesos support.

Images
------

For cluster provisioning, prepared images should be used. The Spark plugin
has been developed and tested with the images generated by the
:doc:`diskimagebuilder`. Those Ubuntu images already have Cloudera CDH4 HDFS
and Apache Spark 0.9.1 installed.

The Spark plugin requires an image to be tagged in the Sahara Image Registry
with two tags: 'spark' and '<Spark version>' (e.g. '0.9.1').

You should also specify the username of the default cloud user used in the
image. For images generated with the DIB it is 'ubuntu'.

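For illustration, the registration and tagging can also be done from Python
with python-saharaclient. This is a minimal sketch, not an authoritative
recipe: the endpoint, credentials and image UUID below are placeholders, and
method names may differ between client versions.

.. code-block:: python

    from keystoneauth1 import loading, session
    from saharaclient import client as sahara_client

    # Authenticate against Keystone (placeholder credentials).
    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller:5000/v3',
        username='demo', password='secret', project_name='demo',
        user_domain_name='Default', project_domain_name='Default')
    sahara = sahara_client.Client('1.1', session=session.Session(auth=auth))

    image_id = '11111111-2222-3333-4444-555555555555'  # hypothetical UUID

    # Set the default cloud user for the image ('ubuntu' for DIB images).
    sahara.images.update_image(image_id, user_name='ubuntu',
                               desc='Ubuntu with CDH4 HDFS and Spark 0.9.1')
    # Tag the image so the Spark plugin can find it.
    sahara.images.update_tags(image_id, new_tags=['spark', '0.9.1'])
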
Note that the Spark cluster is deployed using the scripts available in the
Spark distribution, which allow starting all services (master and slaves),
stopping all services and so on. As such (and as opposed to the CDH HDFS
daemons), Spark is not deployed as a standard Ubuntu service and, if the
virtual machines are rebooted, Spark will not be restarted.

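If that happens, the services can be brought back by invoking those
distribution scripts on the master node. A minimal sketch, assuming the
standard layout under /opt/spark (run as the 'ubuntu' user; the sbin paths
are those shipped with the Spark standalone distribution):

.. code-block:: python

    import subprocess

    SPARK_HOME = '/opt/spark'
    # Stop any half-started daemons, then start the master and all slaves.
    subprocess.check_call([SPARK_HOME + '/sbin/stop-all.sh'])
    subprocess.check_call([SPARK_HOME + '/sbin/start-all.sh'])
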
Spark configuration
-------------------

Spark needs a few parameters to work and has sensible defaults. If needed,
they can be changed when creating the Sahara cluster template. No node group
options are available.

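As an illustrative sketch, a cluster template with an overridden
cluster-level parameter could be created through python-saharaclient along
these lines (the configuration section and key shown are placeholders, as are
the node group template UUIDs; 'sahara' is an authenticated client as in the
example above):

.. code-block:: python

    template = sahara.cluster_templates.create(
        name='spark-cluster-template',
        plugin_name='spark',
        hadoop_version='0.9.1',
        description='Spark 0.9.1 standalone on CDH4 HDFS',
        # Placeholder override; query the plugin for the real config keys.
        cluster_configs={'Spark': {'Master port': 7077}},
        node_groups=[
            {'name': 'master', 'count': 1,
             'node_group_template_id': '<master-ngt-uuid>'},
            {'name': 'worker', 'count': 3,
             'node_group_template_id': '<worker-ngt-uuid>'},
        ])
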
Once the cluster is ready, connect with ssh to the master using the 'ubuntu'
user and the appropriate ssh key. Spark is installed in /opt/spark and should
be completely configured and ready to start executing jobs. At the bottom of
the cluster information page from the OpenStack dashboard, a link to the Spark
web interface is provided.

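To verify that the cluster works end to end, a trivial PySpark job can be run
from the master. A sketch, assuming the Spark 0.9 Python API; the master and
namenode addresses and the HDFS input path are placeholders:

.. code-block:: python

    from pyspark import SparkContext

    # Connect to the standalone master and count words from a file in HDFS.
    sc = SparkContext('spark://<master-ip>:7077', 'WordCount')
    lines = sc.textFile('hdfs://<namenode-ip>:8020/user/ubuntu/input.txt')
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    print(counts.take(10))
    sc.stop()
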
Cluster Validation
------------------

When a user creates a Hadoop cluster using the Spark plugin, the cluster
topology requested by the user is verified for consistency.

Currently there are the following limitations in cluster topology for the Spark plugin:

+ Cluster must contain exactly one HDFS namenode
+ Cluster must contain exactly one Spark master
+ Cluster must contain at least one Spark slave
+ Cluster must contain at least one HDFS datanode

The tested configuration co-locates the NameNode with the Spark master and a
DataNode with each Spark slave, to maximize data locality.

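That layout corresponds to two node group templates, sketched below with
python-saharaclient (the flavor ID is a placeholder and the create()
signature may vary between client versions; 'sahara' is an authenticated
client as in the examples above):

.. code-block:: python

    # Master node group: HDFS NameNode co-located with the Spark master.
    master = sahara.node_group_templates.create(
        name='spark-master', plugin_name='spark', hadoop_version='0.9.1',
        flavor_id='2',  # placeholder Nova flavor
        node_processes=['namenode', 'master'])
    # Worker node group: one HDFS DataNode per Spark slave.
    worker = sahara.node_group_templates.create(
        name='spark-worker', plugin_name='spark', hadoop_version='0.9.1',
        flavor_id='2',
        node_processes=['datanode', 'slave'])
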
Limitations
-----------

For now, scaling and EDP (Elastic Data Processing) are not supported.

Swift support is not available in Spark. Once it is developed there, it will
be possible to add it to this plugin.