Example Spark Job
This example contains the compiled classes for SparkPi extracted from the example jar distributed with Apache Spark version 1.3.1.
The SparkPi example estimates Pi. It can take a single optional integer argument specifying the number of slices (tasks) to use.
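To make the role of the slices argument concrete, here is a minimal plain-Python sketch of a Monte Carlo Pi estimate, the technique the Spark SparkPi example is generally understood to use. It does not use Spark and is not the code packaged in the jar; the names `count_hits`, `estimate_pi`, `samples_per_slice` and `seed` are illustrative assumptions.

```python
import random


def count_hits(samples, rng):
    """Count random points in the square [-1, 1] x [-1, 1] that land inside the unit circle."""
    hits = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(slices=2, samples_per_slice=100000, seed=0):
    """Estimate Pi from `slices` independent batches of random points.

    In a Spark job each slice would run as a separate task; here the
    batches simply run one after another (an assumption of this sketch).
    """
    rng = random.Random(seed)
    total = slices * samples_per_slice
    hits = sum(count_hits(samples_per_slice, rng) for _ in range(slices))
    # The circle covers pi/4 of the square, so scale the hit ratio by 4.
    return 4.0 * hits / total


if __name__ == "__main__":
    print("Pi is roughly", estimate_pi(slices=4))
```

More slices means more sampled points in total, so the estimate tends to get closer to Pi while the work can be spread over more tasks.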
Example spark-wordcount Job
spark-wordcount is a modified version of the WordCount example from Apache Spark. It can read input data from HDFS or a Swift container, then output the number of occurrences of each word to standard output or HDFS.
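As a rough illustration of what a word count produces, the following plain-Python sketch counts word occurrences in a list of lines. It is not the code inside spark-wordcount.jar; splitting on whitespace and the function name `count_words` are assumptions made for this example.

```python
from collections import Counter


def count_words(lines):
    """Return a Counter mapping each whitespace-separated word to its number of occurrences."""
    counts = Counter()
    for line in lines:
        # str.split() with no argument splits on any run of whitespace.
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    sample = ["to be or not to be", "that is the question"]
    for word, count in sorted(count_words(sample).items()):
        print(word, count)
```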
Launching wordcount job from Sahara UI
- Create a job binary that points to spark-wordcount.jar.
- Create a job template and set spark-wordcount.jar as the main binary of the job template.
- Create a Swift container with your input file. As an example, you can upload sample_input.txt.
- Launch the job:
  - Put the path to the input file in args.
  - Put the path to the output file in args.
  - Fill the Main class input with the following class: sahara.edp.spark.SparkWordCount
  - Put the following values in the job's configs: edp.spark.adapt_for_swift with value True, fs.swift.service.sahara.password with the password for your username, and fs.swift.service.sahara.username with your username. These values are required for correct access to your input file, located in Swift.
  - Execute the job. You will be able to view your output in HDFS.
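The settings entered in the launch form can be summarized as a small data structure. The sketch below only collects the values named above in one place; the function name `wordcount_job_settings` and the keys `main_class`, `args` and `configs` are hypothetical labels that mirror the UI fields, not a documented Sahara API payload.

```python
def wordcount_job_settings(username, password, input_path, output_path):
    """Collect the wordcount launch settings described above.

    The outer keys are hypothetical names for the UI fields; only the
    config names and the main class come from the instructions.
    """
    return {
        "main_class": "sahara.edp.spark.SparkWordCount",
        # Input path first, output path second, as in the steps above.
        "args": [input_path, output_path],
        "configs": {
            "edp.spark.adapt_for_swift": True,
            "fs.swift.service.sahara.username": username,
            "fs.swift.service.sahara.password": password,
        },
    }


if __name__ == "__main__":
    # All four values below are placeholders; substitute your own.
    settings = wordcount_job_settings(
        "my-user", "my-password", "<input path>", "<output path>"
    )
    for key, value in settings["configs"].items():
        print(key, "=", value)
```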
Launching spark-kafka-example
- Create a cluster with Kafka Broker, ZooKeeper and Spark History Server. The Ambari plugin can be used for that purpose. Specify your keypair during cluster creation so that you can ssh into the instances running those processes. For simplicity, these services should be located on the same node.
- Ssh to the node with the Kafka Broker service. Create a sample topic using the following command:

    path/kafka-topics.sh --create --zookeeper localhost:2181 \
      --replication-factor 1 --partitions 1 --topic test-topic

  Then execute

    path/kafka-console-producer.sh --broker-list \
      localhost:6667 --topic test-topic

  and put several messages in the topic. Note that you need to replace the values localhost and path with your own values.
- Download the Spark Streaming utils to the node with your Spark History Server from this URL: http://central.maven.org/maven2/org/apache/spark/spark-streaming-kafka-assembly_2.10/1.4.1/spark-streaming-kafka-assembly_2.10-1.4.1.jar. Now you are ready to launch your job from the Sahara UI.
- Create a job binary that points to spark-kafka-example.py. You also need to create a job that uses this job binary as its main binary.
- Execute the job with the following job config: edp.spark.driver.classpath with a value that points to the utils jar downloaded in the earlier step. The job should also be run with the following arguments: localhost:2181 as the first argument, test-topic as the second, and 30 as the third.
- Congratulations, your job was successfully launched!
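Because the commands and job arguments above require replacing localhost and path with your own values, a small helper can make the substitution less error-prone. The following Python sketch only assembles the strings from the steps above; the function names and the example values `/opt/kafka/bin` and `broker-host` are hypothetical, and the third job argument (30) is passed through unchanged since its meaning is not stated here.

```python
import shlex


def kafka_setup_commands(kafka_bin, host, topic="test-topic"):
    """Build the topic-creation and console-producer command lines for a given path and host."""
    create_topic = [
        f"{kafka_bin}/kafka-topics.sh", "--create",
        "--zookeeper", f"{host}:2181",
        "--replication-factor", "1",
        "--partitions", "1",
        "--topic", topic,
    ]
    producer = [
        f"{kafka_bin}/kafka-console-producer.sh",
        "--broker-list", f"{host}:6667",
        "--topic", topic,
    ]
    # shlex.quote keeps each part safe to paste into a shell.
    return [" ".join(shlex.quote(part) for part in cmd)
            for cmd in (create_topic, producer)]


def kafka_job_arguments(host, topic="test-topic", third_arg="30"):
    """Return the three job arguments in the order given in the steps above."""
    return [f"{host}:2181", topic, third_arg]


if __name__ == "__main__":
    # Hypothetical path and host; replace with your own values.
    for command in kafka_setup_commands("/opt/kafka/bin", "broker-host"):
        print(command)
    print(kafka_job_arguments("broker-host"))
```

Printing the commands first and running them by hand on the Kafka Broker node keeps this sketch free of side effects.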