deb-sahara/etc/edp-examples/edp-pig/top-todoers
Alexander Ignatov 4e86bda8eb Add one more sample for pig job examples
This patch adds new pig example "Top TODOers" shown in previous
ATL OpenStack Summit:

* Updated corresponding documentation and paths in integration tests

Implements blueprint: edp-examples

Change-Id: I4285b9c3a334cda7387776bc06147ef53c0a57e0
2014-12-17 16:12:37 +03:00
..
data Add one more sample for pig job examples 2014-12-17 16:12:37 +03:00
example.pig Add one more sample for pig job examples 2014-12-17 16:12:37 +03:00
README.rst Add one more sample for pig job examples 2014-12-17 16:12:37 +03:00

Top TODOers Pig job

This script calculates top TODOers in input sources.

Example of usage

This pig script can process as many input files (sources) as you want. Just put all input files in a directory in HDFS or container in Swift and give the path of the HDFS directory (Swift object) as input DataSource for EDP.

Here are steps how to prepare input data:

  1. Create dir 'input'
$ mkdir input
  1. Get some sources from GitHub and put it to 'input' directory:
$ cd input
$ git clone "https://github.com/openstack/swift.git"
$ git clone "https://github.com/openstack/nova.git"
$ git clone "https://github.com/openstack/glance.git"
$ git clone "https://github.com/openstack/image-api.git"
$ git clone "https://github.com/openstack/neutron.git"
$ git clone "https://github.com/openstack/horizon.git"
$ git clone "https://github.com/openstack/python-novaclient.git"
$ git clone "https://github.com/openstack/python-keystoneclient.git"
$ git clone "https://github.com/openstack/oslo-incubator.git"
$ git clone "https://github.com/openstack/python-neutronclient.git"
$ git clone "https://github.com/openstack/python-glanceclient.git"
$ git clone "https://github.com/openstack/python-swiftclient.git"
$ git clone "https://github.com/openstack/python-cinderclient.git"
$ git clone "https://github.com/openstack/ceilometer.git"
$ git clone "https://github.com/openstack/cinder.git"
$ git clone "https://github.com/openstack/heat.git"
$ git clone "https://github.com/openstack/python-heatclient.git"
$ git clone "https://github.com/openstack/python-ceilometerclient.git"
$ git clone "https://github.com/openstack/oslo.config.git"
$ git clone "https://github.com/openstack/ironic.git"
$ git clone "https://github.com/openstack/python-ironicclient.git"
$ git clone "https://github.com/openstack/operations-guide.git"
$ git clone "https://github.com/openstack/keystone.git"
$ git clone "https://github.com/openstack/oslo.messaging.git"
$ git clone "https://github.com/openstack/oslo.sphinx.git"
$ git clone "https://github.com/openstack/oslo.version.git"
$ git clone "https://github.com/openstack/sahara.git"
$ git clone "https://github.com/openstack/python-saharaclient.git"
$ git clone "https://github.com/openstack/openstack.git"
$ cd ..
  1. Create single file containing all sources:
tar -cf input.tar input/*

Note

Pig can operate with raw files as well as with compressed data, so in this step you might want to create *.gz file with sources and it should work.

  1. Upload input.tar to Swift or HDFS as input data source for EDP processing