sahara-tests/sahara_tests/scenario/defaults/edp-examples/edp-pig/cleanup-string
Luigi Toscano 77fa63e19e New Pig example with a User Defined Function
Replace the existing Pig/UDF example with an original one.
The license of the replaced example was not totally clear.

The input and expected output of the old example have been kept, as they
are also used by other tests.

Even though we are working towards the removal of jar files from this repository,
rebuilding Hadoop-based jars is not trivial, so the jar file (compiled with
target JVM 1.6) is part of the patch.

Change-Id: Ib86f63458797dc10b19334177dab01e16894ca57
2016-08-23 01:22:41 +02:00
data
src
README.rst
edp-pig-udf-stringcleaner.jar
example.pig

README.rst

Pig StringCleaner Example

Overview

This is an (almost useless) example of a Pig job that uses a custom UDF (User Defined Function).

  • StringCleaner.java is a Pig UDF which strips some characters from the input;
  • example.pig is the main Pig code which uses the UDF.
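
The source of StringCleaner.java is not reproduced here, so the following is only a minimal sketch of the kind of character stripping such a UDF might perform. The class name StringCleanerSketch, the clean method and the set of stripped characters are all assumptions made for illustration. The logic is written as plain Java so that it runs without Pig on the classpath; in a real UDF it would typically sit inside the exec(Tuple) method of a class extending org.apache.pig.EvalFunc<String>.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in for the cleaning logic of a Pig UDF.
 * The stripped character set is an assumption, not taken from StringCleaner.java.
 */
final class StringCleanerSketch {

    // Assumption: everything except ASCII letters, digits and whitespace is stripped.
    private static final Pattern UNWANTED = Pattern.compile("[^A-Za-z0-9\\s]");

    private StringCleanerSketch() {
        // Utility class, not meant to be instantiated.
    }

    /** Returns the input without the unwanted characters, or null for null input. */
    static String clean(String input) {
        if (input == null) {
            // Returning null for missing input is a common convention in Pig UDFs.
            return null;
        }
        return UNWANTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Prints: Hello World v2
        System.out.println(clean("Hello, World! (v2)"));
    }
}
```

Keeping the string logic in a static helper like this makes it easy to check outside a cluster; the Pig-facing wrapper would then only need to extract the field from the tuple and delegate to it.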

Compiling the UDF

To build the jar, put the Pig jar on the compiler classpath.

    $ cd src
    $ mkdir build
    $ javac -source 1.6 -target 1.6 -cp /path/to/pig.jar -d build StringCleaner.java
    $ jar -cvf edp-pig-udf-stringcleaner.jar -C build/ .

Running from the Sahara UI

The procedure does not differ from the usual steps for other Pig jobs.

Create a job template where:

  • the main library points to the job binary for example.pig;
  • the additional library contains the job binary for edp-pig-udf-stringcleaner.jar.

Create a job from that job template and attach the input and output data sources.