Merge "cleanup spark plugin documentation"

Jenkins 2015-09-21 08:52:55 +00:00 committed by Gerrit Code Review
commit 57523a684f


Spark Plugin
============

The Spark plugin for sahara provides a way to provision Apache Spark clusters
on OpenStack in a single click and in an easily repeatable fashion.

Currently Spark is installed in standalone mode, with no YARN or Mesos
support.

Images
------

For cluster provisioning, prepared images should be used. The Spark plugin
has been developed and tested with the images generated by
sahara-image-elements:

* https://github.com/openstack/sahara-image-elements

The Ubuntu images generated by sahara-image-elements have Cloudera CDH 5.4.0
HDFS and Apache Spark installed. A prepared image for Spark 1.3.1 and CDH
5.4.0 HDFS can be found at the following location:

* http://sahara-files.mirantis.com/images/upstream/liberty/

The Spark plugin requires an image to be tagged in the sahara image registry
with two tags: 'spark' and '<Spark version>' (e.g. '1.3.1').

You should also specify the username of the default cloud-user used in the
image. For the images available at the URLs listed above and for all the ones
generated with the DIB it is `ubuntu`.

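As a sketch, assuming the `sahara` CLI from python-saharaclient and an
already-uploaded image whose ID is stored in `$IMAGE_ID` (command and flag
names may differ between client versions), registering and tagging could look
like::

    $ sahara image-register --id $IMAGE_ID --username ubuntu
    $ sahara image-add-tag --id $IMAGE_ID --tag spark
    $ sahara image-add-tag --id $IMAGE_ID --tag 1.3.1
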
Note that the Spark cluster is deployed using the scripts available in the
Spark distribution, which allow the user to start all services (master and
slaves), stop all services and so on. As such (and as opposed to CDH HDFS
daemons), Spark is not deployed as a standard Ubuntu service and if the
virtual machines are rebooted, Spark will not be restarted.

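For reference, these are the standalone-mode scripts shipped with the Spark
distribution (paths assume the `/opt/spark` install location used by these
images)::

    # start the master and every slave listed in conf/slaves
    $ /opt/spark/sbin/start-all.sh

    # stop all standalone daemons
    $ /opt/spark/sbin/stop-all.sh
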
Spark configuration
-------------------

Spark needs only a few parameters to work and has sensible defaults. If
needed, they can be changed when creating the sahara cluster template. No
node group options are available.

Once the cluster is ready, connect with ssh to the master using the `ubuntu`
user and the appropriate ssh key. Spark is installed in `/opt/spark` and
should be completely configured and ready to start executing jobs. At the
bottom of the cluster information page from the OpenStack dashboard, a link to
the Spark web interface is provided.

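As an example, a minimal session could look like the following (the key path,
master address and exact name of the bundled examples jar are assumptions
that depend on your deployment)::

    $ ssh -i my_key.pem ubuntu@<master-ip>

    # on the master: submit the bundled SparkPi example to the standalone master
    $ cd /opt/spark
    $ ./bin/spark-submit --master spark://<master-hostname>:7077 \
          --class org.apache.spark.examples.SparkPi \
          lib/spark-examples-*.jar 10
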
Cluster Validation
------------------

When a user creates a Hadoop cluster using the Spark plugin, the cluster
topology requested by the user is verified for consistency.

Currently there are the following limitations in cluster topology for the
Spark plugin:

+ Cluster must contain exactly one HDFS namenode
+ Cluster must contain exactly one Spark master
+ Cluster must contain at least one Spark slave
+ Cluster must contain at least one HDFS datanode

The tested configuration co-locates the NameNode with the master and a
DataNode with each slave to maximize data locality.

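A sketch of a node group template matching this layout, assuming the
python-saharaclient CLI and its `--json` option (the flavor ID and other
field values are placeholders), is shown below; a matching `spark-slave`
template would list the `datanode` and `slave` processes instead::

    $ cat > ng_master.json <<EOF
    {
        "name": "spark-master",
        "plugin_name": "spark",
        "hadoop_version": "1.3.1",
        "flavor_id": "2",
        "node_processes": ["namenode", "master"]
    }
    EOF
    $ sahara node-group-template-create --json ng_master.json
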