Adding job execution examples to UI user guide

Adding examples for a Pig job and a Spark job to the Sahara UI user guide.
The examples referenced are the ones from the Sahara repository
/etc/edp-examples directory. Also, removed language that referred to the
"Sahara" dashboard that formerly existed as a separate entity. The updated
version references the "Data Processing" panel where it now lives
(under Project).

Change-Id: If83cdef109a4356fdb2db4487cde4e76d7d280a8
Closes-Bug: #1288914

parent 38a92ed5a7
commit b3ef8c3dfb
@@ -1,7 +1,7 @@
-Sahara UI User Guide
-====================
+Sahara (Data Processing) UI User Guide
+======================================
 
-This guide assumes that you already have Sahara and the Sahara Dashboard configured and running.
+This guide assumes that you already have the Sahara service and the Horizon dashboard up and running.
 If you require assistance with that, please see the installation guides.
 
 Launching a cluster via the Sahara Dashboard
@@ -9,7 +9,7 @@ Launching a cluster via the Sahara Dashboard
 Registering an Image
 --------------------
 
-1) Navigate to the "Sahara" tab in the dashboard, then click on the "Image Registry" panel.
+1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then click on the "Image Registry" panel.
 
 2) From that page, click on the "Register Image" button at the top right.
 
@@ -24,7 +24,7 @@ Registering an Image
 Create Node Group Templates
 ---------------------------
 
-1) Navigate to the "Sahara" tab in the dashboard, then click on the "Node Group Templates" panel.
+1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then click on the "Node Group Templates" panel.
 
 2) From that page, click on the "Create Template" button at the top right.
 
@@ -43,7 +43,7 @@ Create Node Group Templates
 Create a Cluster Template
 -------------------------
 
-1) Navigate to the "Sahara" tab in the dashboard, then click on the "Cluster Templates" panel.
+1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then click on the "Cluster Templates" panel.
 
 2) From that page, click on the "Create Template" button at the top right.
 
@@ -64,7 +64,7 @@ Create a Cluster Template
 Launching a Cluster
 -------------------
 
-1) Navigate to the "Sahara" tab in the dashboard, then click on the "Clusters" panel.
+1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then click on the "Clusters" panel.
 
 2) Click on the "Launch Cluster" button at the top right.
 
@@ -85,7 +85,7 @@ Launching a Cluster
 
 Scaling a Cluster
 -----------------
-1) From the Sahara/Clusters page, click on the "Scale Cluster" button of the row that contains the cluster that you want to scale.
+1) From the Data Processing/Clusters page, click on the "Scale Cluster" button of the row that contains the cluster that you want to scale.
 
 2) You can adjust the numbers of instances for existing Node Group Templates.
 
@@ -102,7 +102,7 @@ Data Sources
 ------------
 Data Sources are where the input and output from your jobs are housed.
 
-1) From the Sahara/Data Sources page, click on the "Create Data Source" button at the top right.
+1) From the Data Processing/Data Sources page, click on the "Create Data Source" button at the top right.
 
 2) Give your Data Source a name.
 
@@ -123,7 +123,7 @@ Job Binaries
 ------------
 Job Binaries are where you define/upload the source code (mains and libraries) for your job.
 
-1) From the Sahara/Job Binaries page, click on the "Create Job Binary" button at the top right.
+1) From the Data Processing/Job Binaries page, click on the "Create Job Binary" button at the top right.
 
 2) Give your Job Binary a name (this can be different than the actual filename).
 
@@ -142,7 +142,7 @@ Jobs
 ----
 Jobs are where you define the type of job you'd like to run as well as which "Job Binaries" are required.
 
-1) From the Sahara/Jobs page, click on the "Create Job" button at the top right.
+1) From the Data Processing/Jobs page, click on the "Create Job" button at the top right.
 
 2) Give your Job a name.
 
@@ -160,7 +160,7 @@ Job Executions
 --------------
 Job Executions are what you get by "Launching" a job. You can monitor the status of your job to see when it has completed its run.
 
-1) From the Sahara/Jobs page, find the row that contains the job you want to launch and click on the "Launch Job" button at the right side of that row.
+1) From the Data Processing/Jobs page, find the row that contains the job you want to launch and click on the "Launch Job" button at the right side of that row.
 
 2) Choose the cluster (already running--see `Launching a Cluster`_ above) on which you would like the job to run.
 
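Under the hood, the "Launch Job" form submits a job-execution request to the Sahara EDP API. The sketch below builds such a request body in Python; the field names (`cluster_id`, `input_id`, `output_id`, `job_configs`) are assumptions based on the EDP job-execution shape, not something this guide specifies.

```python
import json

# Hypothetical sketch of the JSON body behind the "Launch Job" form.
# Field names are assumptions, not taken from this guide.
def make_job_execution(cluster_id, input_id, output_id, configs=None):
    return {
        "cluster_id": cluster_id,   # the running cluster chosen in step 2
        "input_id": input_id,       # id of the input Data Source
        "output_id": output_id,     # id of the output Data Source
        "job_configs": configs or {"configs": {}, "params": {}, "args": []},
    }

body = make_job_execution("cluster-uuid", "input-ds-uuid", "output-ds-uuid")
print(json.dumps(body, indent=2))
```

The same body shape would apply whether the job is launched fresh or relaunched on an existing cluster; only the Data Source ids and configuration change between runs.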
@@ -178,7 +178,97 @@ Job Executions are what you get by "Launching" a job. You can monitor the statu
 - Relaunch on New Cluster will take you through the forms to start a new cluster before letting you specify input/output Data Sources and job configuration.
 - Relaunch on Existing Cluster will prompt you for input/output Data Sources as well as allow you to change job configuration before launching the job.
 
+Example Jobs
+------------
+There are sample jobs located in the sahara repository. The instructions there guide you through running the jobs via the command line.
+In this section, we will give a walkthrough on how to run those jobs via the Horizon UI.
+These steps assume that you already have a cluster up and running (in the "Active" state).
+
+1) Sample Pig job - https://github.com/openstack/sahara/tree/master/etc/edp-examples/pig-job
+
+- Load the input data file from https://github.com/openstack/sahara/tree/master/etc/edp-examples/pig-job/data/input into swift
+
+  - Click on Project/Object Store/Containers and create a container with any name ("samplecontainer" for our purposes here).
+  - Click on Upload Object and give the object a name ("piginput" in this case)
+
+- Navigate to Data Processing/Data Sources, Click on Create Data Source.
+
+  - Name your Data Source ("pig-input-ds" in this sample)
+  - Type = Swift, URL = samplecontainer.sahara/piginput, fill in the Source username/password fields with your username/password and click "Create"
+
+- Create another Data Source to use as output for the job
+
+  - Name = pig-output-ds, Type = Swift, URL = samplecontainer.sahara/pigoutput, Source username/password, "Create"
+
+- Store your Job Binaries in the Sahara database
+
+  - Navigate to Data Processing/Job Binaries, Click on Create Job Binary
+  - Name = example.pig, Storage type = Internal database, click Browse and find example.pig wherever you checked out the sahara project <sahara root>/etc/edp-examples/pig-job
+  - Create another Job Binary: Name = udf.jar, Storage type = Internal database, click Browse and find udf.jar wherever you checked out the sahara project <sahara root>/etc/edp-examples/pig-job
+
+- Create a Job
+
+  - Navigate to Data Processing/Jobs, Click on Create Job
+  - Name = pigsample, Job Type = Pig, Choose "example.pig" as the main binary
+  - Click on the "Libs" tab and choose "udf.jar", then hit the "Choose" button beneath the dropdown, then click on "Create"
+
+- Launch your job
+
+  - To launch your job from the Jobs page, click on the down arrow at the far right of the screen and choose "Launch on Existing Cluster"
+  - For the input, choose "pig-input-ds", for output choose "pig-output-ds". Also choose whichever cluster you'd like to run the job on.
+  - For this job, no additional configuration is necessary, so you can just click on "Launch"
+  - You will be taken to the "Job Executions" page where you can see your job progress through "PENDING, RUNNING, SUCCEEDED" phases
+  - When your job finishes with "SUCCEEDED", you can navigate back to Object Store/Containers and browse to the samplecontainer to see your output. It should be in the "pigoutput" folder.
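The two Swift Data Sources created above correspond to small JSON bodies on the Sahara EDP API. A minimal sketch, assuming the EDP data-source shape (the `type`, `url`, and `credentials` field names are assumptions; the `swift://<container>.sahara/<object>` URL form matches the values entered in the form above):

```python
import json

# Hypothetical sketch of the JSON body behind the "Create Data Source" form.
# Field names are assumptions, not taken from this guide.
def make_swift_data_source(name, container, obj, user, password):
    return {
        "name": name,
        "type": "swift",
        # The UI's URL field maps to swift://<container>.sahara/<object>
        "url": "swift://%s.sahara/%s" % (container, obj),
        "credentials": {"user": user, "password": password},
    }

payload = make_swift_data_source("pig-input-ds", "samplecontainer",
                                 "piginput", "myuser", "mypassword")
print(json.dumps(payload, indent=2))
```

The output Data Source ("pig-output-ds") would use the same shape with `pigoutput` as the object name.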
+
+2) Sample Spark job - https://github.com/openstack/sahara/tree/master/etc/edp-examples/edp-spark
+
+- Store the Job Binary in the Sahara database
+
+  - Navigate to Data Processing/Job Binaries, Click on Create Job Binary
+  - Name = sparkexample.jar, Storage type = Internal database, Browse to the location <sahara root>/etc/edp-examples/edp-spark and choose spark-example.jar, Click "Create"
+
+- Create a Job
+
+  - Name = sparkexamplejob, Job Type = Spark, Main binary = Choose sparkexample.jar, Click "Create"
+
+- Launch your job
+
+  - To launch your job from the Jobs page, click on the down arrow at the far right of the screen and choose "Launch on Existing Cluster"
+  - Choose whichever cluster you'd like to run the job on.
+  - Click on the "Configure" tab
+  - Set the main class to be: org.apache.spark.examples.SparkPi
+  - Under Arguments, click Add and fill in the number of "Slices" you want to use for the job. For this example, let's use 100 as the value
+  - Click on Launch
+  - You will be taken to the "Job Executions" page where you can see your job progress through "PENDING, RUNNING, SUCCEEDED" phases
+  - When your job finishes with "SUCCEEDED", you can see your results by sshing to the Spark "master" node.
+  - The output is located at /tmp/spark-edp/<name of job>/<job execution id>. You can do ``cat stdout`` which should display something like "Pi is roughly 3.14156132"
+  - It should be noted that for more complex jobs, the input/output may be elsewhere. This particular job just writes to stdout, which is logged in the folder under /tmp.
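SparkPi estimates pi by Monte Carlo sampling: each "slice" is a partition that samples random points in the unit square and counts how many land inside the unit circle, so pi is roughly 4 times the fraction inside. A pure-Python sketch of the same calculation (a simplified stand-in, not the actual Spark example code; the sample count per slice is arbitrary):

```python
import random

def estimate_pi(slices, samples_per_slice=20000, seed=42):
    """Monte Carlo pi estimate mirroring what SparkPi computes:
    4 * (points inside the unit circle) / (total points sampled)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    total = slices * samples_per_slice
    inside = 0
    for _ in range(total):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / total

print("Pi is roughly %f" % estimate_pi(10))
```

More slices (like the 100 used in the walkthrough) mean more samples and a tighter estimate, at the cost of more work spread across the cluster.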
+
 Additional Notes
 ----------------
 1) Throughout the Sahara UI, you will find that if you try to delete an object that you will not be able to delete it if another object depends on it.
 An example of this would be trying to delete a Job that has an existing Job Execution. In order to be able to delete that job, you would first need to delete any Job Executions that relate to that job.
+
+2) In the examples above, we mention adding your username/password for the Swift Data Sources.
+It should be noted that it is possible to configure Sahara such that the username/password credentials are *not* required.
+For more information on that, please refer to: :doc:`Sahara Advanced Configuration Guide <../userdoc/advanced.configuration.guide>`