Add sample spark wordcount job
Added new spark job that can read data from Swift.
Also added job to Sahara CI to test that.

Implements blueprint: edp-spark-example-with-swift
Change-Id: I3484a8ba0bddebea34b46ab33af9e6ed06bf4f44
parent 32d8be795f
commit 82942e5125
@@ -2,7 +2,33 @@ Example Spark Job
 =================
 
 This example contains the compiled classes for SparkPi extracted from
-the example jar distributed with Apache Spark version 1.0.0.
+the example jar distributed with Apache Spark version 1.3.1.
 
 SparkPi example estimates Pi. It can take a single optional integer
 argument specifying the number of slices (tasks) to use.
+
+Example spark-wordcount Job
+===========================
+
+spark-wordcount is a modified version of the WordCount example from Apache Spark.
+It can read input data from HDFS or a Swift container, then output the number of occurrences
+of each word to standard output or HDFS.
+
+Launching wordcount job from Sahara UI
+--------------------------------------
+
+1. Create a job binary that points to ``spark-wordcount.jar``.
+2. Create a job template and set ``spark-wordcount.jar`` as the main binary
+   of the job template.
+3. Create a Swift container with your input file. As an example, you can upload
+   ``sample_input.txt``.
+4. Launch job:
+
+   1. Put the path to the input file in ``args``
+   2. Put the path to the output file in ``args``
+   3. Fill the ``Main class`` input with the following class: ``sahara.edp.spark.SparkWordCount``
+   4. Put the following values in the job's configs: ``edp.spark.adapt_for_swift`` with value ``True``,
+      ``fs.swift.service.sahara.password`` with the password for your username, and
+      ``fs.swift.service.sahara.username`` with your username. These values are required for
+      correct access to your input file, which is located in Swift.
+   5. Execute the job. You will be able to view your output in HDFS.
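The launch steps above can be summarized as one set of job configs and args. The following is a minimal Python sketch, not Sahara code: the helper name `build_wordcount_job_configs`, the dictionary layout, and the example paths and credentials are assumptions for illustration. Only the config keys and the main class name come from the README text and the scenario file in this commit.

```python
import json


def build_wordcount_job_configs(input_path, output_path, username, password):
    """Collect the values from the launch steps into one dictionary.

    The "configs"/"args" layout is an assumption made for this sketch.
    """
    return {
        "configs": {
            # Main class named in the README launch steps
            "edp.java.main_class": "sahara.edp.spark.SparkWordCount",
            # Required so the job can read its input from Swift
            "edp.spark.adapt_for_swift": True,
            "fs.swift.service.sahara.username": username,
            "fs.swift.service.sahara.password": password,
        },
        # Input path first, then output path, as in the launch steps
        "args": [input_path, output_path],
    }


if __name__ == "__main__":
    # Hypothetical paths and credentials, for illustration only
    job_configs = build_wordcount_job_configs(
        "swift://demo.sahara/sample_input.txt",
        "hdfs:///user/hadoop/wordcount-output",
        "demo-user",
        "demo-password",
    )
    print(json.dumps(job_configs, indent=2))
```

Keeping the credentials as function parameters avoids hard-coding them next to the fixed keys.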
10
etc/edp-examples/edp-spark/sample_input.txt
Normal file
@@ -0,0 +1,10 @@
+one
+one
+one
+one
+two
+two
+two
+three
+three
+four
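As a sanity check on the sample file above, the per-word counts can be reproduced in plain Python. This is only a sketch of the counting logic, not the Spark job itself: `count_words` is a hypothetical helper, and the exact output format of `SparkWordCount` is not shown in this commit.

```python
from collections import Counter

# Contents of sample_input.txt as added by this commit
SAMPLE_INPUT = """one
one
one
one
two
two
two
three
three
four
"""


def count_words(text):
    """Count whitespace-separated words, as a word count job would."""
    return Counter(text.split())


if __name__ == "__main__":
    for word, occurrences in count_words(SAMPLE_INPUT).most_common():
        print(word, occurrences)
    # prints:
    # one 4
    # two 3
    # three 2
    # four 1
```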
BIN
etc/edp-examples/edp-spark/spark-wordcount.jar
Normal file
Binary file not shown.
@@ -95,6 +95,20 @@ edp_jobs_flow:
         edp.java.main_class: org.apache.spark.examples.SparkPi
       args:
         - 4
+    - type: Spark
+      input_datasource:
+        type: swift
+        source: etc/edp-examples/edp-spark/sample_input.txt
+      main_lib:
+        type: database
+        source: etc/edp-examples/edp-spark/spark-wordcount.jar
+      configs:
+        edp.java.main_class: sahara.edp.spark.SparkWordCount
+        edp.spark.adapt_for_swift: true
+        fs.swift.service.sahara.username: ${OS_USERNAME}
+        fs.swift.service.sahara.password: ${OS_PASSWORD}
+      args:
+        - '{input_datasource}'
   transient:
     - type: Pig
       input_datasource:
@@ -155,5 +169,3 @@ edp_jobs_flow:
       args:
         - 10
         - 10
-
-
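The new Spark entry in the scenario file uses two kinds of placeholders: `${OS_USERNAME}` / `${OS_PASSWORD}` in the configs and `'{input_datasource}'` in the args. How the test runner fills them in is not shown in this commit. The sketch below only illustrates one plausible substitution scheme, assuming `${NAME}` values come from a variable mapping and `{name}` values from the created data source URLs. `resolve_config`, `resolve_args`, and all sample values are hypothetical.

```python
from string import Template


def resolve_config(value, variables):
    """Fill ${NAME} placeholders; unknown names are left untouched."""
    return Template(value).safe_substitute(variables)


def resolve_args(args, datasources):
    """Fill {name} placeholders in job args with data source URLs."""
    return [arg.format(**datasources) for arg in args]


if __name__ == "__main__":
    # Hypothetical credentials and data source URL, for illustration only
    variables = {"OS_USERNAME": "demo-user", "OS_PASSWORD": "demo-password"}
    datasources = {"input_datasource": "swift://demo.sahara/sample_input.txt"}

    configs = {
        "fs.swift.service.sahara.username": "${OS_USERNAME}",
        "fs.swift.service.sahara.password": "${OS_PASSWORD}",
    }
    resolved = {key: resolve_config(val, variables) for key, val in configs.items()}
    print(resolved)
    print(resolve_args(["{input_datasource}"], datasources))
```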