poppy/hadoop
Sriram Madapusi Vasudevan 92577d2272 feat: add log delivery pig script
- The hadoop script splits up the provider's logs that are piped
  into it, based on which domains have log delivery enabled.

- README.rst contains instructions on how the script is meant to be
  used.

Implements: blueprint log-delivery

Change-Id: I4434175bead26e9b78a3115038af55b25a62163c
2015-06-12 11:39:45 -04:00
README.rst
log_delivery.pig

README.rst

Log Delivery

The pig script needs to be run on a Hadoop cluster, after piping in all the required logs from the provider with whom services are set up.
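
Getting the logs onto the cluster is left to the operator; a minimal sketch using the standard HDFS shell, with hypothetical local paths:

$ hdfs dfs -mkdir -p log_source
$ hdfs dfs -put /var/log/provider/access.log log_source/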

NOTE:
  • All the domains that need to have logs delivered must be copied into the Hadoop cluster, in a file named domains_log.tsv (an example follows this note)
  • The corresponding provider URL extension must also be set, passed in as PROVIDER_URL_EXT when the script is invoked
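
The script itself, not this README, is the authority on the exact layout of domains_log.tsv; a minimal assumed example, with one enabled domain per line, staged into the cluster next to the logs:

mydomain.com
yourdomain.com

$ hdfs dfs -put domains_log.tsv domains_log.tsv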

How to run a Pig Script

$ pig -p INPUT=~/log_source -p OUTPUT=~/logs_output -p PROVIDER_URL_EXT=mycdn log_delivery.pig
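
The shipped log_delivery.pig defines the real field layout; the sketch below only illustrates the split-by-domain approach, so the schemas, the piggybank jar path, and the handling of PROVIDER_URL_EXT are assumptions:

REGISTER /usr/lib/pig/piggybank.jar;  -- jar path is an assumption

-- assumed raw log schema: the customer domain, then the rest of the line
raw_logs = LOAD '$INPUT' USING PigStorage('\t')
           AS (domain:chararray, log_line:chararray);

-- domains with log delivery enabled, staged as domains_log.tsv
enabled = LOAD 'domains_log.tsv' USING PigStorage('\t')
          AS (domain:chararray);

-- keep only the log lines belonging to enabled domains
joined = JOIN raw_logs BY domain, enabled BY domain;
by_domain = FOREACH joined GENERATE raw_logs::domain AS domain,
                                    raw_logs::log_line AS log_line;

-- how $PROVIDER_URL_EXT is applied (e.g. mapping provider hostnames
-- back to customer domains) is omitted in this sketch

-- MultiStorage splits on field 0 (the domain), writing one gzipped
-- directory per distinct domain under $OUTPUT
STORE by_domain INTO '$OUTPUT'
      USING org.apache.pig.piggybank.storage.MultiStorage('$OUTPUT', '0', 'gz', '\t');

MultiStorage's second argument is the index of the field to split on, which is what would yield the one-directory-per-domain layout described below.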

Output

There should be directories created under OUTPUT, one per domain that had log delivery enabled, with the log files for that domain underneath each directory.

$ logs_output/mydomain/mydomain-0000.gz
$ logs_output/yourdomain/yourdomain-0000.gz