This example contains the compiled classes for SparkPi extracted from the example jar distributed with Apache Spark version 1.3.1.
The SparkPi example estimates Pi. It takes a single optional integer argument specifying the number of slices (tasks) to use.
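To illustrate what the slices argument controls, here is a minimal plain-Python sketch of a Monte Carlo estimate of Pi. It is not the Spark code shipped with this example. It assumes the usual approach of sampling random points in a square and counting those that fall inside the unit circle; the function names, the per-slice sample count, and the seeding are illustrative assumptions.

```python
import random


def count_hits(samples, rng):
    """Count random points in [-1, 1] x [-1, 1] that land inside the unit circle."""
    hits = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:
            hits += 1
    return hits


def estimate_pi(slices=2, samples_per_slice=100000, seed=0):
    """Estimate Pi by splitting the sampling work into `slices` chunks.

    In Spark each slice would run as a separate task; here the chunks run one
    after another. samples_per_slice and seed are assumptions of this sketch.
    """
    total_hits = 0
    for slice_index in range(slices):
        rng = random.Random(seed + slice_index)  # one generator per slice
        total_hits += count_hits(samples_per_slice, rng)
    total_samples = slices * samples_per_slice
    # The circle covers pi/4 of the square, so scale the hit ratio by 4.
    return 4.0 * total_hits / total_samples


if __name__ == "__main__":
    print(estimate_pi(slices=2))
```

More slices mean more samples in total, so the estimate generally gets closer to Pi at the cost of more work.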
spark-wordcount is a modified version of the WordCount example from Apache Spark. It can read input data from HDFS or a Swift container, then write the number of occurrences of each word to standard output or HDFS.
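The computation itself can be sketched in plain Python. This is only an illustration of counting word occurrences, not the code inside spark-wordcount.jar; splitting on whitespace and the function name are assumptions made for the sketch.

```python
from collections import Counter


def count_words(lines):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


if __name__ == "__main__":
    sample = ["to be or not to be", "that is the question"]
    for word, count in sorted(count_words(sample).items()):
        print(word, count)
```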
spark-wordcount.jar as the main binary of the job template.
- Put path to input file in
- Put path to output file in
- Fill the Main class input with the following class:
- Put the following values in the job's configs:
fs.swift.service.sahara.password with the password for your username, and
fs.swift.service.sahara.username with your username. These values are required to access your input file when it is located in Swift.
- Execute the job. You will be able to view your output in HDFS.
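The two Swift settings from the steps above could be collected as follows before being entered in the job's configs. This is a sketch only: the helper name is hypothetical and the credentials shown are placeholders.

```python
def build_swift_configs(username, password):
    """Collect the Swift credentials under the config keys named in the steps.

    Hypothetical helper for illustration; it only builds a dictionary.
    """
    if not username or not password:
        raise ValueError("Swift username and password are both required")
    return {
        "fs.swift.service.sahara.username": username,
        "fs.swift.service.sahara.password": password,
    }


if __name__ == "__main__":
    # Placeholder credentials, replace with your own.
    print(build_swift_configs("demo", "secret"))
```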
Spark History Server. The Ambari plugin can be used for that purpose. Use your keypair during cluster creation so that you can SSH into the instances running these processes. For simplicity, these services should be located on the same node.
Kafka Broker service. Create a sample topic using the following command:
path/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test-topic. Also execute
path/kafka-console-producer.sh --broker-list localhost:6667 --topic test-topic and then put several messages in the topic. Note that you need to replace the values
path with your own values.
Spark History Server from this URL:
http://central.maven.org/maven2/org/apache/spark/spark-streaming-kafka-assembly_2.10/1.4.1/spark-streaming-kafka-assembly_2.10-1.4.1.jar. Now you are ready to launch your job from the Sahara UI.
spark-kafka-example.py. You also need to create a job that uses this job binary as its main binary.
edp.spark.driver.classpath with a value that points to the utils downloaded during step 2. The job should also be run with the following arguments:
localhost:2181 as the first argument,
test-topic as the second, and
30 as the third.
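A script receiving these three arguments might unpack them as in the sketch below. This is not taken from spark-kafka-example.py: the function name is hypothetical, and treating the third argument as an integer is an assumption, since the text above gives only the value 30 without naming what it represents.

```python
def parse_job_args(argv):
    """Split the three job arguments into (zookeeper, topic, number).

    Converting the third argument to int is an assumption of this sketch.
    """
    if len(argv) != 3:
        raise ValueError("expected 3 arguments: <zookeeper> <topic> <number>")
    zookeeper, topic, number = argv
    return zookeeper, topic, int(number)


if __name__ == "__main__":
    # The argument values listed above, in the same order.
    print(parse_job_args(["localhost:2181", "test-topic", "30"]))
```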