Add sample spark wordcount job
Added a new Spark job that can read data from Swift. Also added a job to the Sahara CI to test it.

Implements blueprint: edp-spark-example-with-swift

Change-Id: I3484a8ba0bddebea34b46ab33af9e6ed06bf4f44
parent 32d8be795f
commit 82942e5125
@@ -2,7 +2,33 @@ Example Spark Job
 =================
 
 This example contains the compiled classes for SparkPi extracted from
-the example jar distributed with Apache Spark version 1.0.0.
+the example jar distributed with Apache Spark version 1.3.1.
 
 SparkPi example estimates Pi. It can take a single optional integer
 argument specifying the number of slices (tasks) to use.
+
+Example spark-wordcount Job
+===========================
+
+spark-wordcount is a modified version of the WordCount example from Apache Spark.
+It can read input data from hdfs or a swift container, then output the number of occurrences
+of each word to standard output or hdfs.
+
+Launching wordcount job from Sahara UI
+--------------------------------------
+
+1. Create a job binary that points to ``spark-wordcount.jar``.
+2. Create a job template and set ``spark-wordcount.jar`` as the main binary
+   of the job template.
+3. Create a Swift container with your input file. As an example, you can upload
+   ``sample_input.txt``.
+4. Launch the job:
+
+   1. Put the path to the input file in ``args``.
+   2. Put the path to the output file in ``args``.
+   3. Fill the ``Main class`` input with the following class: ``sahara.edp.spark.SparkWordCount``.
+   4. Put the following values in the job's configs: ``edp.spark.adapt_for_swift`` with value ``True``,
+      ``fs.swift.service.sahara.password`` with the password for your username, and
+      ``fs.swift.service.sahara.username`` with your username. These values are required for
+      correct access to your input file, located in Swift.
+   5. Execute the job. You will be able to view your output in hdfs.
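Note: the README above names ``sahara.edp.spark.SparkWordCount`` as the main class and says the job takes an input path and an optional output path in ``args``, but the class itself ships only as a pre-built jar in this change. As a rough orientation, here is a minimal Scala sketch of what such a word-count job typically looks like; it is an illustrative assumption, not the actual contents of ``spark-wordcount.jar``.

    // Illustrative sketch only, not the code compiled into spark-wordcount.jar.
    // Input and output paths may be hdfs:// or swift:// URLs.
    package sahara.edp.spark

    import org.apache.spark.{SparkConf, SparkContext}

    object SparkWordCount {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SparkWordCount"))

        // args(0) is the input path; args(1), if present, is the output path.
        val counts = sc.textFile(args(0))
          .flatMap(_.split("\\s+"))
          .filter(_.nonEmpty)
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        if (args.length > 1) counts.saveAsTextFile(args(1)) // write to hdfs
        else counts.collect().foreach(println)              // or print to stdout

        sc.stop()
      }
    }

Under this reading, the two ``args`` entries from the UI steps map to ``args(0)`` and ``args(1)``, and the ``Main class`` field corresponds to the fully qualified object name.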
etc/edp-examples/edp-spark/sample_input.txt (new file, 10 lines)
@@ -0,0 +1,10 @@
+one
+one
+one
+one
+two
+two
+two
+three
+three
+four
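For reference, the word counts in ``sample_input.txt`` are 4 for ``one``, 3 for ``two``, 2 for ``three`` and 1 for ``four``; that is the result the wordcount job should produce from this input, whatever output format the compiled class uses.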
etc/edp-examples/edp-spark/spark-wordcount.jar (new binary file)
Binary file not shown.
@@ -95,6 +95,20 @@ edp_jobs_flow:
         edp.java.main_class: org.apache.spark.examples.SparkPi
       args:
         - 4
+    - type: Spark
+      input_datasource:
+        type: swift
+        source: etc/edp-examples/edp-spark/sample_input.txt
+      main_lib:
+        type: database
+        source: etc/edp-examples/edp-spark/spark-wordcount.jar
+      configs:
+        edp.java.main_class: sahara.edp.spark.SparkWordCount
+        edp.spark.adapt_for_swift: true
+        fs.swift.service.sahara.username: ${OS_USERNAME}
+        fs.swift.service.sahara.password: ${OS_PASSWORD}
+      args:
+        - '{input_datasource}'
   transient:
     - type: Pig
       input_datasource:
@@ -155,5 +169,3 @@ edp_jobs_flow:
       args:
         - 10
         - 10
-
-
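The CI entry above supplies the same ``fs.swift.service.sahara.*`` credentials that the README asks the user to enter, with ``${OS_USERNAME}`` and ``${OS_PASSWORD}`` presumably filled in from the CI environment and ``'{input_datasource}'`` resolved to the uploaded Swift input. Purely to illustrate what those keys configure, here is a hypothetical Scala helper (not part of this change) that sets them on a Spark context's Hadoop configuration:

    // Hypothetical helper, for illustration only. When a job is launched
    // through Sahara with edp.spark.adapt_for_swift set, Sahara is expected
    // to wire these credentials in for you; this sketch only shows which
    // Hadoop properties the job configs correspond to.
    import org.apache.spark.SparkContext

    object SwiftAuth {
      def configure(sc: SparkContext, username: String, password: String): Unit = {
        sc.hadoopConfiguration.set("fs.swift.service.sahara.username", username)
        sc.hadoopConfiguration.set("fs.swift.service.sahara.password", password)
      }
    }

A caller could invoke it as ``SwiftAuth.configure(sc, sys.env("OS_USERNAME"), sys.env("OS_PASSWORD"))``, mirroring the placeholders used by the CI job.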