Create etc/edp-examples directory

Merge content from sahara-extra/edp-examples and integration/tests/resources
into one directory under etc/edp-examples. This is part of the effort to
ultimately move edp-examples out of the sahara-extra repo and eliminate duplication.

The integration tests have been changed to reference the new
etc/edp-examples directory, and an EDPJobInfo class has been added to
eliminate path and config value duplication between the tests.

Partial-implements: blueprint edp-move-examples
Change-Id: I71b3cd21dcb9983fd6284a90316b12368481c700
Trevor McKay 2014-08-07 11:31:28 -04:00
parent 13647cd70a
commit 90187b0322
23 changed files with 269 additions and 162 deletions


@@ -0,0 +1,54 @@
=====================
EDP WordCount Example
=====================

Overview
========

``WordCount.java`` is a modified version of the WordCount example bundled with
version 1.2.1 of Apache Hadoop. It has been extended for use from a java action
in an Oozie workflow. The modification below allows any configuration values
from the ``<configuration>`` tag in an Oozie workflow to be set in the Configuration
object::

    // This will add properties from the <configuration> tag specified
    // in the Oozie workflow. For java actions, Oozie writes the
    // configuration values to a file pointed to by oozie.action.conf.xml
    conf.addResource(new Path("file:///",
                     System.getProperty("oozie.action.conf.xml")));

In the example workflow, we use the ``<configuration>`` tag to specify user and
password configuration values for accessing swift objects.

Compiling
=========
To build the jar, add ``hadoop-core`` and ``commons-cli`` to the classpath.
On a node running Ubuntu 13.04 with Hadoop 1.2.1, the following commands
will compile ``WordCount.java`` from within the ``src`` directory::

    $ mkdir wordcount_classes
    $ javac -classpath /usr/share/hadoop/hadoop-core-1.2.1.jar:/usr/share/hadoop/lib/commons-cli-1.2.jar -d wordcount_classes WordCount.java
    $ jar -cvf edp-java.jar -C wordcount_classes/ .

Note, on a node with Hadoop 2.3.0 the ``javac`` command above can be replaced with::

    $ javac -classpath /opt/hadoop-2.3.0/share/hadoop/common/hadoop-common-2.3.0.jar:/opt/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.3.0.jar:/opt/hadoop-2.3.0/share/hadoop/common/lib/commons-cli-1.2.jar:/opt/hadoop-2.3.0/share/hadoop/mapreduce/lib/hadoop-annotations-2.3.0.jar -d wordcount_classes WordCount.java
Running from the Sahara UI
===========================

Running the WordCount example from the Sahara UI is very similar to running a Pig,
Hive, or MapReduce job:

1) Create a job binary that points to the ``edp-java.jar`` file
2) Create a ``Java`` job type and add the job binary to the ``libs`` value
3) Launch the job (a scripted equivalent is sketched below):

   a) Add the input and output paths to ``args``
   b) If swift input or output paths are used, set the ``fs.swift.service.sahara.username``
      and ``fs.swift.service.sahara.password`` configuration values
   c) The Sahara UI will prompt for the required ``main_class`` value and the optional
      ``java_opts`` value
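
The steps above can also be scripted. What follows is a rough sketch only: it
assumes an authenticated python-saharaclient ``Client`` (the constructor
arguments shown are illustrative and vary by client version), and the cluster
id, credentials, and swift paths are all placeholders for your deployment::

    from saharaclient.api import client as sahara_client

    # Illustrative constructor arguments -- adjust for your deployment
    sahara = sahara_client.Client(username='demo', api_key='password',
                                  project_name='demo',
                                  auth_url='http://127.0.0.1:5000/v2.0/')

    # 1) Store the jar in the sahara internal db and create a job binary
    #    that points at it
    jar_data = open('edp-java.jar').read()
    internal = sahara.job_binary_internals.create('edp-java.jar', jar_data)
    binary = sahara.job_binaries.create(name='edp-java.jar',
                                        url='internal-db://%s' % internal.id,
                                        description='', extra={})

    # 2) Create a Java job type with the job binary in libs
    job = sahara.jobs.create(name='edp-java-example', type='Java',
                             mains=[], libs=[binary.id], description='')

    # 3) Launch the job: paths go in args, swift credentials and the
    #    required main_class go in configs
    sahara.job_executions.create(
        job_id=job.id, cluster_id='<cluster-id>',
        input_id=None, output_id=None,
        configs={'configs': {
                     'edp.java.main_class':
                         'org.openstack.sahara.examples.WordCount',
                     'fs.swift.service.sahara.username': 'swiftuser',
                     'fs.swift.service.sahara.password': 'swiftpassword'},
                 'args': ['swift://container.sahara/input',
                          'swift://container.sahara/output']})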


@@ -0,0 +1,22 @@
=====================================================
Running WordCount example from the Oozie command line
=====================================================
1) Copy the *edp-java.jar* file from *sahara/edp-examples/edp-java* to *./wordcount/lib/edp-java.jar*
2) Modify the *job.properties* file to specify the correct **jobTracker** and **nameNode** addresses for your cluster.
3) Modify the *workflow.xml* file to contain the correct input and output paths. These paths may be Sahara swift urls or hdfs paths.
   * If swift urls are used, set the **fs.swift.service.sahara.username** and
     **fs.swift.service.sahara.password** properties in the **<configuration>** section.

4) Upload the *wordcount* directory to hdfs::

       $ hadoop fs -put wordcount wordcount

5) Launch the job, specifying the correct oozie server and port::

       $ oozie job -oozie http://oozie_server:port/oozie -config wordcount/job.properties -run

6) Don't forget to create your swift input path! A Sahara swift url looks like
   *swift://container.sahara/object* (a scripted way to create the path is sketched below)
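
For step 6, a minimal sketch using python-swiftclient follows. The auth values
are placeholders; note that the ``.sahara`` suffix in the url names the
filesystem provider, so the swift container behind
*swift://container.sahara/object* is simply ``container``::

    import swiftclient

    # Placeholder keystone v2 credentials -- substitute your own
    conn = swiftclient.client.Connection(
        authurl='http://127.0.0.1:5000/v2.0/',
        user='demo', key='password',
        tenant_name='demo', auth_version='2.0')

    # Create the container and upload a small input object for the job
    conn.put_container('container')
    conn.put_object('container', 'object',
                    'pomegranate\nbanana\napple\nlychee\n')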


@@ -0,0 +1,23 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
nameNode=hdfs://1.2.3.4:8020
jobTracker=1.2.3.4:8021
queueName=default
oozie.wf.application.path=${nameNode}/user/${user.name}/wordcount
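
As a side note, the same properties can be submitted to Oozie over its REST API
rather than the CLI. Here is a minimal sketch with the ``requests`` library,
expanding the placeholder addresses from the file above (``user.name`` must be a
user permitted to run jobs):

    import requests

    OOZIE = 'http://1.2.3.4:11000/oozie'  # placeholder oozie server:port

    conf = """<configuration>
      <property><name>user.name</name><value>hadoop</value></property>
      <property><name>nameNode</name><value>hdfs://1.2.3.4:8020</value></property>
      <property><name>jobTracker</name><value>1.2.3.4:8021</value></property>
      <property><name>queueName</name><value>default</value></property>
      <property><name>oozie.wf.application.path</name>
                <value>hdfs://1.2.3.4:8020/user/hadoop/wordcount</value></property>
    </configuration>"""

    # Submit and start the workflow, then poll its status
    resp = requests.post(OOZIE + '/v1/jobs?action=start', data=conf,
                         headers={'Content-Type': 'application/xml;charset=UTF-8'})
    job_id = resp.json()['id']
    print(requests.get(OOZIE + '/v1/job/%s?show=info' % job_id).json()['status'])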


@@ -0,0 +1,49 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<workflow-app xmlns="uri:oozie:workflow:0.2" name="java-main-wf">
    <start to="java-node"/>
    <action name="java-node">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>fs.swift.service.sahara.username</name>
                    <value>swiftuser</value>
                </property>
                <property>
                    <name>fs.swift.service.sahara.password</name>
                    <value>swiftpassword</value>
                </property>
            </configuration>
            <main-class>org.openstack.sahara.examples.WordCount</main-class>
            <arg>swift://user.sahara/input</arg>
            <arg>swift://user.sahara/output</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>


@@ -0,0 +1,2 @@
This product includes software developed by The Apache Software
Foundation (http://www.apache.org/).


@@ -0,0 +1,11 @@
Example Pig job
===============
This script trims spaces in input text.

This sample Pig job is based on examples in Chapter 11 of
"Hadoop: The Definitive Guide" by Tom White, published by O'Reilly Media.
The original code can be found in a Maven project at
https://github.com/tomwhite/hadoop-book


@@ -0,0 +1,4 @@
pomegranate
banana
apple
lychee


@@ -0,0 +1,4 @@
pomegranate
banana
apple
lychee

Binary file not shown.


@@ -26,7 +26,66 @@ from sahara.tests.integration.tests import base
 from sahara.utils import edp


+class EDPJobInfo(object):
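+    """Paths and config values for the EDP example jobs used by the tests.
+
+    Collecting them in one class eliminates the path and config value
+    duplication that previously existed between the integration tests.
+    """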
+    PIG_PATH = 'etc/edp-examples/pig-job/'
+    JAVA_PATH = 'etc/edp-examples/edp-java/'
+    MAPREDUCE_PATH = 'etc/edp-examples/edp-mapreduce/'
+    HADOOP2_JAVA_PATH = 'etc/edp-examples/hadoop2/edp-java/'
+
+    def read_pig_example_script(self):
+        return open(self.PIG_PATH + 'example.pig').read()
+
+    def read_pig_example_jar(self):
+        return open(self.PIG_PATH + 'udf.jar').read()
+
+    def read_java_example_lib(self, hadoop_vers=1):
+        if hadoop_vers == 1:
+            return open(self.JAVA_PATH + 'edp-java.jar').read()
+        return open(self.HADOOP2_JAVA_PATH + (
+            'hadoop-mapreduce-examples-2.3.0.jar')).read()
+
+    def java_example_configs(self, hadoop_vers=1):
+        if hadoop_vers == 1:
+            return {
+                'configs': {
+                    'edp.java.main_class':
+                        'org.openstack.sahara.examples.WordCount'
+                }
+            }
+        return {
+            'configs': {
+                'edp.java.main_class':
+                    'org.apache.hadoop.examples.QuasiMonteCarlo'
+            },
+            'args': ['10', '10']
+        }
+
+    def read_mapreduce_example_jar(self):
+        return open(self.MAPREDUCE_PATH + 'edp-mapreduce.jar').read()
+
+    def mapreduce_example_configs(self):
+        return {
+            'configs': {
+                'mapred.mapper.class': 'org.apache.oozie.example.SampleMapper',
+                'mapred.reducer.class':
+                    'org.apache.oozie.example.SampleReducer'
+            }
+        }
+
+    def mapreduce_streaming_configs(self):
+        return {
+            "configs": {
+                "edp.streaming.mapper": "/bin/cat",
+                "edp.streaming.reducer": "/usr/bin/wc"
+            }
+        }
+
+
 class EDPTest(base.ITestCase):
+    def setUp(self):
+        super(EDPTest, self).setUp()
+        self.edp_info = EDPJobInfo()
+
     def _create_data_source(self, name, data_type, url, description=''):
         return self.sahara.data_sources.create(


@@ -24,7 +24,6 @@ from sahara.tests.integration.tests import map_reduce
 from sahara.tests.integration.tests import scaling
 from sahara.tests.integration.tests import swift
 from sahara.utils import edp as utils_edp
-from sahara.utils import files as f


 class CDHGatingTest(cluster_configs.ClusterConfigTest,
@@ -203,11 +202,9 @@ class CDHGatingTest(cluster_configs.ClusterConfigTest,
         self._edp_test()

     def _edp_test(self):
-        path = 'tests/integration/tests/resources/'
-
         # check pig
-        pig_job = f.get_file_text(path + 'edp-job.pig')
-        pig_lib = f.get_file_text(path + 'edp-lib.jar')
+        pig_job = self.edp_info.read_pig_example_script()
+        pig_lib = self.edp_info.read_pig_example_jar()
         self.edp_testing(job_type=utils_edp.JOB_TYPE_PIG,
                          job_data_list=[{'pig': pig_job}],
                          lib_data_list=[{'jar': pig_lib}],
@@ -215,14 +212,8 @@ class CDHGatingTest(cluster_configs.ClusterConfigTest,
                          hdfs_local_output=True)

         # check mapreduce
-        mapreduce_jar = f.get_file_text(path + 'edp-mapreduce.jar')
-        mapreduce_configs = {
-            'configs': {
-                'mapred.mapper.class': 'org.apache.oozie.example.SampleMapper',
-                'mapred.reducer.class':
-                    'org.apache.oozie.example.SampleReducer'
-            }
-        }
+        mapreduce_jar = self.edp_info.read_mapreduce_example_jar()
+        mapreduce_configs = self.edp_info.mapreduce_example_configs()
         self.edp_testing(job_type=utils_edp.JOB_TYPE_MAPREDUCE,
                          job_data_list=[],
                          lib_data_list=[{'jar': mapreduce_jar}],
@@ -231,37 +222,13 @@ class CDHGatingTest(cluster_configs.ClusterConfigTest,
                          hdfs_local_output=True)

         # check mapreduce streaming
-        mapreduce_streaming_configs = {
-            'configs': {
-                'edp.streaming.mapper': '/bin/cat',
-                'edp.streaming.reducer': '/usr/bin/wc'
-            }
-        }
         self.edp_testing(job_type=utils_edp.JOB_TYPE_MAPREDUCE_STREAMING,
                          job_data_list=[],
                          lib_data_list=[],
-                         configs=mapreduce_streaming_configs,
+                         configs=self.edp_info.mapreduce_streaming_configs(),
                          swift_binaries=False,
                          hdfs_local_output=True)

-        # check java
-        """
-        java_jar = f.get_file_text(
-            path + 'hadoop-mapreduce-examples-2.3.0.jar')
-        java_configs = {
-            'configs': {
-                'edp.java.main_class':
-                    'org.apache.hadoop.examples.QuasiMonteCarlo'
-            },
-            'args': ['10', '10']
-        }
-        self.edp_testing(utils_edp.JOB_TYPE_JAVA,
-                         job_data_list=[],
-                         lib_data_list=[{'jar': java_jar}],
-                         configs=java_configs,
-                         swift_binaries=False,
-                         hdfs_local_output=True)"""

     @b.errormsg("Failure while cluster scaling: ")
     def _check_scaling(self):
         change_list = [


@@ -21,7 +21,6 @@ from sahara.tests.integration.tests import edp
 from sahara.tests.integration.tests import scaling
 from sahara.tests.integration.tests import swift
 from sahara.utils import edp as utils_edp
-from sahara.utils import files as f


 class HDP2GatingTest(swift.SwiftTest, scaling.ScalingTest,
@@ -128,11 +127,10 @@ class HDP2GatingTest(swift.SwiftTest, scaling.ScalingTest,
         self._edp_test()

     def _edp_test(self):
-        path = 'tests/integration/tests/resources/'

         # check pig
-        pig_job = f.get_file_text(path + 'edp-job.pig')
-        pig_lib = f.get_file_text(path + 'edp-lib.jar')
+        pig_job = self.edp_info.read_pig_example_script()
+        pig_lib = self.edp_info.read_pig_example_jar()
         self.edp_testing(job_type=utils_edp.JOB_TYPE_PIG,
                          job_data_list=[{'pig': pig_job}],
                          lib_data_list=[{'jar': pig_lib}],
@@ -140,14 +138,8 @@ class HDP2GatingTest(swift.SwiftTest, scaling.ScalingTest,
                          hdfs_local_output=True)

         # check mapreduce
-        mapreduce_jar = f.get_file_text(path + 'edp-mapreduce.jar')
-        mapreduce_configs = {
-            'configs': {
-                'mapred.mapper.class': 'org.apache.oozie.example.SampleMapper',
-                'mapred.reducer.class':
-                    'org.apache.oozie.example.SampleReducer'
-            }
-        }
+        mapreduce_jar = self.edp_info.read_mapreduce_example_jar()
+        mapreduce_configs = self.edp_info.mapreduce_example_configs()
         self.edp_testing(job_type=utils_edp.JOB_TYPE_MAPREDUCE,
                          job_data_list=[],
                          lib_data_list=[{'jar': mapreduce_jar}],
@@ -156,27 +148,14 @@ class HDP2GatingTest(swift.SwiftTest, scaling.ScalingTest,
                          hdfs_local_output=True)

         # check mapreduce streaming
-        mapreduce_streaming_configs = {
-            'configs': {
-                'edp.streaming.mapper': '/bin/cat',
-                'edp.streaming.reducer': '/usr/bin/wc'
-            }
-        }
         self.edp_testing(job_type=utils_edp.JOB_TYPE_MAPREDUCE_STREAMING,
                          job_data_list=[],
                          lib_data_list=[],
-                         configs=mapreduce_streaming_configs)
+                         configs=self.edp_info.mapreduce_streaming_configs())

         # check java
-        java_jar = f.get_file_text(
-            path + 'hadoop-mapreduce-examples-2.3.0.jar')
-        java_configs = {
-            'configs': {
-                'edp.java.main_class':
-                    'org.apache.hadoop.examples.QuasiMonteCarlo'
-            },
-            'args': ['10', '10']
-        }
+        java_jar = self.edp_info.read_java_example_lib(2)
+        java_configs = self.edp_info.java_example_configs(2)
         self.edp_testing(utils_edp.JOB_TYPE_JAVA,
                          job_data_list=[],
                          lib_data_list=[{'jar': java_jar}],


@@ -157,35 +157,14 @@ class HDPGatingTest(cinder.CinderVolumeTest, edp.EDPTest,
 # ---------------------------------EDP TESTING---------------------------------
-        path = 'sahara/tests/integration/tests/resources/'
-        pig_job_data = open(path + 'edp-job.pig').read()
-        pig_lib_data = open(path + 'edp-lib.jar').read()
-        mapreduce_jar_data = open(path + 'edp-mapreduce.jar').read()
+        pig_job_data = self.edp_info.read_pig_example_script()
+        pig_lib_data = self.edp_info.read_pig_example_jar()
+        mapreduce_jar_data = self.edp_info.read_mapreduce_example_jar()

         # This is a modified version of WordCount that takes swift configs
-        java_lib_data = open(path + 'edp-java/edp-java.jar').read()
-        java_configs = {
-            "configs": {
-                "edp.java.main_class":
-                    "org.openstack.sahara.examples.WordCount"
-            }
-        }
+        java_lib_data = self.edp_info.read_java_example_lib()

-        mapreduce_configs = {
-            "configs": {
-                "mapred.mapper.class":
-                    "org.apache.oozie.example.SampleMapper",
-                "mapred.reducer.class":
-                    "org.apache.oozie.example.SampleReducer"
-            }
-        }
-        mapreduce_streaming_configs = {
-            "configs": {
-                "edp.streaming.mapper":
-                    "/bin/cat",
-                "edp.streaming.reducer": "/usr/bin/wc"
-            }
-        }
         try:
             self.edp_testing(job_type=utils_edp.JOB_TYPE_PIG,
                              job_data_list=[{'pig': pig_job_data}],
@@ -195,17 +174,18 @@ class HDPGatingTest(cinder.CinderVolumeTest, edp.EDPTest,
             self.edp_testing(job_type=utils_edp.JOB_TYPE_MAPREDUCE,
                              job_data_list=[],
                              lib_data_list=[{'jar': mapreduce_jar_data}],
-                             configs=mapreduce_configs,
+                             configs=self.edp_info.mapreduce_example_configs(),
                              swift_binaries=True,
                              hdfs_local_output=True)
             self.edp_testing(job_type=utils_edp.JOB_TYPE_MAPREDUCE_STREAMING,
                              job_data_list=[],
                              lib_data_list=[],
-                             configs=mapreduce_streaming_configs)
+                             configs=(
+                                 self.edp_info.mapreduce_streaming_configs()))
             self.edp_testing(job_type=utils_edp.JOB_TYPE_JAVA,
                              job_data_list=[],
                              lib_data_list=[{'jar': java_lib_data}],
-                             configs=java_configs,
+                             configs=self.edp_info.java_example_configs(),
                              pass_input_output_args=True)

         except Exception as e:


@@ -265,34 +265,12 @@ class VanillaGatingTest(cinder.CinderVolumeTest,
 # ---------------------------------EDP TESTING---------------------------------
-        path = 'sahara/tests/integration/tests/resources/'
-        pig_job_data = open(path + 'edp-job.pig').read()
-        pig_lib_data = open(path + 'edp-lib.jar').read()
-        mapreduce_jar_data = open(path + 'edp-mapreduce.jar').read()
+        pig_job_data = self.edp_info.read_pig_example_script()
+        pig_lib_data = self.edp_info.read_pig_example_jar()
+        mapreduce_jar_data = self.edp_info.read_mapreduce_example_jar()

         # This is a modified version of WordCount that takes swift configs
-        java_lib_data = open(path + 'edp-java/edp-java.jar').read()
-        java_configs = {
-            "configs": {
-                "edp.java.main_class":
-                    "org.openstack.sahara.examples.WordCount"
-            }
-        }
-        mapreduce_configs = {
-            "configs": {
-                "mapred.mapper.class":
-                    "org.apache.oozie.example.SampleMapper",
-                "mapred.reducer.class":
-                    "org.apache.oozie.example.SampleReducer"
-            }
-        }
-        mapreduce_streaming_configs = {
-            "configs": {
-                "edp.streaming.mapper": "/bin/cat",
-                "edp.streaming.reducer": "/usr/bin/wc"
-            }
-        }
+        java_lib_data = self.edp_info.read_java_example_lib()

         try:
             self.edp_testing(job_type=utils_edp.JOB_TYPE_PIG,
                              job_data_list=[{'pig': pig_job_data}],
@@ -302,17 +280,18 @@ class VanillaGatingTest(cinder.CinderVolumeTest,
             self.edp_testing(job_type=utils_edp.JOB_TYPE_MAPREDUCE,
                              job_data_list=[],
                              lib_data_list=[{'jar': mapreduce_jar_data}],
-                             configs=mapreduce_configs,
+                             configs=self.edp_info.mapreduce_example_configs(),
                              swift_binaries=True,
                              hdfs_local_output=True)
             self.edp_testing(job_type=utils_edp.JOB_TYPE_MAPREDUCE_STREAMING,
                              job_data_list=[],
                              lib_data_list=[],
-                             configs=mapreduce_streaming_configs)
+                             configs=(
+                                 self.edp_info.mapreduce_streaming_configs()))
             self.edp_testing(job_type=utils_edp.JOB_TYPE_JAVA,
                              job_data_list=[],
                              lib_data_list=[{'jar': java_lib_data}],
-                             configs=java_configs,
+                             configs=self.edp_info.java_example_configs(),
                              pass_input_output_args=True)

         except Exception as e:


@@ -24,10 +24,6 @@ from sahara.tests.integration.tests import map_reduce
 from sahara.tests.integration.tests import scaling
 from sahara.tests.integration.tests import swift
 from sahara.utils import edp as utils_edp
-from sahara.utils import files as f
-
-RESOURCES_PATH = 'tests/integration/tests/resources/'
-


 class VanillaTwoGatingTest(cluster_configs.ClusterConfigTest,
@@ -199,8 +195,10 @@ class VanillaTwoGatingTest(cluster_configs.ClusterConfigTest,
         self._edp_java_test()

     def _edp_pig_test(self):
-        pig_job = f.get_file_text(RESOURCES_PATH + 'edp-job.pig')
-        pig_lib = f.get_file_text(RESOURCES_PATH + 'edp-lib.jar')
+        pig_job = self.edp_info.read_pig_example_script()
+        pig_lib = self.edp_info.read_pig_example_jar()
         self.edp_testing(job_type=utils_edp.JOB_TYPE_PIG,
                          job_data_list=[{'pig': pig_job}],
                          lib_data_list=[{'jar': pig_lib}],
@@ -208,14 +206,8 @@ class VanillaTwoGatingTest(cluster_configs.ClusterConfigTest,
                          hdfs_local_output=True)

     def _edp_mapreduce_test(self):
-        mapreduce_jar = f.get_file_text(RESOURCES_PATH + 'edp-mapreduce.jar')
-        mapreduce_configs = {
-            'configs': {
-                'mapred.mapper.class': 'org.apache.oozie.example.SampleMapper',
-                'mapred.reducer.class':
-                    'org.apache.oozie.example.SampleReducer'
-            }
-        }
+        mapreduce_jar = self.edp_info.read_mapreduce_example_jar()
+        mapreduce_configs = self.edp_info.mapreduce_example_configs()
         self.edp_testing(job_type=utils_edp.JOB_TYPE_MAPREDUCE,
                          job_data_list=[],
                          lib_data_list=[{'jar': mapreduce_jar}],
@@ -224,27 +216,14 @@ class VanillaTwoGatingTest(cluster_configs.ClusterConfigTest,
                          hdfs_local_output=True)

     def _edp_mapreduce_streaming_test(self):
-        mapreduce_streaming_configs = {
-            'configs': {
-                'edp.streaming.mapper': '/bin/cat',
-                'edp.streaming.reducer': '/usr/bin/wc'
-            }
-        }
         self.edp_testing(job_type=utils_edp.JOB_TYPE_MAPREDUCE_STREAMING,
                          job_data_list=[],
                          lib_data_list=[],
-                         configs=mapreduce_streaming_configs)
+                         configs=self.edp_info.mapreduce_streaming_configs())

     def _edp_java_test(self):
-        java_jar = f.get_file_text(
-            RESOURCES_PATH + 'hadoop-mapreduce-examples-2.3.0.jar')
-        java_configs = {
-            'configs': {
-                'edp.java.main_class':
-                    'org.apache.hadoop.examples.QuasiMonteCarlo'
-            },
-            'args': ['10', '10']
-        }
+        java_jar = self.edp_info.read_java_example_lib(2)
+        java_configs = self.edp_info.java_example_configs(2)
         self.edp_testing(utils_edp.JOB_TYPE_JAVA,
                          job_data_list=[],
                          lib_data_list=[{'jar': java_jar}],


@@ -1,4 +0,0 @@
-Compiled against Hadoop 1.2.1
-$ mkdir wordcount_classes
-$ javac -classpath /usr/share/hadoop/hadoop-core-1.2.1.jar:/usr/share/hadoop/lib/commons-cli-1.2.jar -d wordcount_classes WordCount.java
-$ jar -cvf edp-java.jar -C wordcount_classes/ .


@@ -70,9 +70,8 @@ class TransientClusterTest(edp.EDPTest):
             raise

         # check EDP
-        path = 'sahara/tests/integration/tests/resources/'
-        pig_job_data = open(path + 'edp-job.pig').read()
-        pig_lib_data = open(path + 'edp-lib.jar').read()
+        pig_job_data = self.edp_info.read_pig_example_script()
+        pig_lib_data = self.edp_info.read_pig_example_jar()
         self.edp_testing(job_type=utils_edp.JOB_TYPE_PIG,
                          job_data_list=[{'pig': pig_job_data}],
                          lib_data_list=[{'jar': pig_lib_data}])