From a718ec7ddf85ef2e1e17868f6e2cd05b1c2762cd Mon Sep 17 00:00:00 2001
From: Trevor McKay
Date: Wed, 1 Oct 2014 17:23:29 -0400
Subject: [PATCH] Fix HDFS url description, and other various edits

HDFS url description is wrong as a result of code changes. This was the
major motivation for this CR.

Additional changes

* formatted for 80 characters
* consistent use of '.' at the end of bullets
* added mention of Spark
* adding '.sahara' suffix is no longer necessary
* some other minor changes

Closes-Bug: 1376457
Change-Id: I72134bcdf6c42911d07e65952a9a56331d896699
---
 doc/source/horizon/dashboard.user.guide.rst | 309 +++++++++++++-------
 1 file changed, 199 insertions(+), 110 deletions(-)

diff --git a/doc/source/horizon/dashboard.user.guide.rst b/doc/source/horizon/dashboard.user.guide.rst
index 3ca0f94a..f6b50609 100644
--- a/doc/source/horizon/dashboard.user.guide.rst
+++ b/doc/source/horizon/dashboard.user.guide.rst
@@ -1,101 +1,124 @@
 Sahara (Data Processing) UI User Guide
 ======================================

-This guide assumes that you already have Sahara service and the Horizon dashboard up and running.
-Don't forget to make sure that Sahara is registered in Keystone.
-If you require assistance with that, please see the `installation guide <../installation.guide.html>`_.
+This guide assumes that you already have the Sahara service and Horizon
+dashboard up and running. Don't forget to make sure that Sahara is registered in
+Keystone. If you require assistance with that, please see the
+`installation guide <../installation.guide.html>`_.

 Launching a cluster via the Sahara UI
 -------------------------------------

 Registering an Image
 --------------------

-1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then click on the "Image Registry" panel.
+1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then
+   click on the "Image Registry" panel

-2) From that page, click on the "Register Image" button at the top right.
+2) From that page, click on the "Register Image" button at the top right

-3) Choose the image that you'd like to register as a Hadoop Image
+3) Choose the image that you'd like to register with Sahara

-4) Enter the username of the cloud-init user on the image.
+4) Enter the username of the cloud-init user on the image

-5) Click on the tags that you want to add to the image. (A version ie: 1.2.1 and a type ie: vanilla are required for cluster functionality)
+5) Click on the tags that you want to add to the image. (A version ie: 1.2.1 and
+   a type ie: vanilla are required for cluster functionality)

-6) Click the "Done" button to finish the registration.
+6) Click the "Done" button to finish the registration

 Create Node Group Templates
 ---------------------------

-1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then click on the "Node Group Templates" panel.
+1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then
+   click on the "Node Group Templates" panel

-2) From that page, click on the "Create Template" button at the top right.
+2) From that page, click on the "Create Template" button at the top right

-3) Choose your desired Plugin name and Version from the dropdowns and click "Create".
+3) Choose your desired Plugin name and Version from the dropdowns and click
+   "Create"

 4) Give your Node Group Template a name (description is optional)

 5) Choose a flavor for this template (based on your CPU/memory/disk needs)

-6) Choose the storage location for your instance, this can be either "Ephemeral Drive" or "Cinder Volume". If you choose "Cinder Volume", you will need to add additional configuration.
+6) Choose the storage location for your instance, this can be either "Ephemeral
+   Drive" or "Cinder Volume". If you choose "Cinder Volume", you will need to add
+   additional configuration

-7) Choose which processes should be run for any instances that are spawned from this Node Group Template.
+7) Choose which processes should be run for any instances that are spawned from
+   this Node Group Template

-8) Click on the "Create" button to finish creating your Node Group Template.
+8) Click on the "Create" button to finish creating your Node Group Template
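For illustration only, the choices collected in steps 4-7 can be thought of as
one record per node group. The sketch below is a minimal, hypothetical helper;
the field names and the example process names (typical of the vanilla plugin)
are assumptions for illustration and are not the Sahara API schema.

.. code-block:: python

    # Illustrative only: hypothetical field names mirroring the form in steps 4-7;
    # this is not the Sahara REST API schema.
    def node_group_template(name, flavor, storage, node_processes, description=""):
        """Collect the choices made in the Node Group Template form."""
        assert storage in ("Ephemeral Drive", "Cinder Volume")
        return {
            "name": name,
            "description": description,              # optional (step 4)
            "flavor": flavor,                         # CPU/memory/disk sizing (step 5)
            "storage": storage,                       # step 6
            "node_processes": list(node_processes),   # step 7
        }

    # Example: a worker node group for the vanilla plugin (process names assumed)
    workers = node_group_template(
        name="vanilla-workers",
        flavor="m1.medium",
        storage="Ephemeral Drive",
        node_processes=["datanode", "tasktracker"],
    )
    print(workers["node_processes"])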
 Create a Cluster Template
 -------------------------

-1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then click on the "Cluster Templates" panel.
+1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then
+   click on the "Cluster Templates" panel

-2) From that page, click on the "Create Template" button at the top right.
+2) From that page, click on the "Create Template" button at the top right

-3) Choose your desired Plugin name and Version from the dropdowns and click "Create".
+3) Choose your desired Plugin name and Version from the dropdowns and click
+   "Create"

-4) Under the "Details" tab, you must give your template a name.
+4) Under the "Details" tab, you must give your template a name

-5) Under the "Node Groups" tab, you should add one or more nodes that can be based on one or more templates.
+5) Under the "Node Groups" tab, you should add one or more nodes that can be
+   based on one or more templates

- - To do this, start by choosing a Node Group Template from the dropdown and click the "+" button.
- - You can adjust the number of nodes to be spawned for this node group via the text box or the "-" and "+" buttons.
- - Repeat these steps if you need nodes from additional node group templates.
+ - To do this, start by choosing a Node Group Template from the dropdown and
+   click the "+" button
+ - You can adjust the number of nodes to be spawned for this node group via
+   the text box or the "-" and "+" buttons
+ - Repeat these steps if you need nodes from additional node group templates

-6) Optionally, you can adjust your configuration further by using the "General Parameters", "HDFS Parameters" and "MapReduce Parameters" tabs.
+6) Optionally, you can adjust your configuration further by using the "General
+   Parameters", "HDFS Parameters" and "MapReduce Parameters" tabs

-7) Click on the "Create" button to finish creating your Cluster Template.
+7) Click on the "Create" button to finish creating your Cluster Template

 Launching a Cluster
 -------------------

-1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then click on the "Clusters" panel.
+1) Navigate to the "Project" dashboard, then the "Data Processing" tab, then
+   click on the "Clusters" panel

-2) Click on the "Launch Cluster" button at the top right.
+2) Click on the "Launch Cluster" button at the top right

-3) Choose your desired Plugin name and Version from the dropdowns and click "Create".
+3) Choose your desired Plugin name and Version from the dropdowns and click
+   "Create"

-4) Give your cluster a name. (required)
+4) Give your cluster a name (required)

-5) Choose which cluster template should be used for your cluster.
+5) Choose which cluster template should be used for your cluster

-6) Choose the image that should be used for your cluster (if you do not see any options here, see `Registering an Image`_ above).
+6) Choose the image that should be used for your cluster (if you do not see any
+   options here, see `Registering an Image`_ above)

-7) Optionally choose a keypair that can be used to authenticate to your cluster instances.
+7) Optionally choose a keypair that can be used to authenticate to your cluster
+   instances

-8) Click on the "Create" button to start your cluster.
+8) Click on the "Create" button to start your cluster

- - Your cluster's status will display on the Clusters table.
- - It will likely take several minutes to reach the "Active" state.
+ - Your cluster's status will display on the Clusters table
+ - It will likely take several minutes to reach the "Active" state
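Because a new cluster can take several minutes to reach "Active", it can be
handy to poll its status instead of refreshing the page. The sketch below is a
minimal example using ``requests``; the ``SAHARA_URL`` endpoint (Sahara's
default port is 8386), the ``/v1.1/<project id>/clusters/<cluster id>`` path,
and the assumption that the response wraps the record in a ``cluster`` key
follow the v1.1 REST API as we understand it, so verify them against your
deployment.

.. code-block:: python

    # Minimal sketch: poll a cluster until it leaves the provisioning states.
    # SAHARA_URL, PROJECT_ID, CLUSTER_ID and TOKEN are placeholders; the URL path
    # and the "cluster" wrapper in the response are assumptions about the v1.1 API.
    import time
    import requests

    SAHARA_URL = "http://controller:8386/v1.1"
    PROJECT_ID = "<project id>"
    CLUSTER_ID = "<cluster id>"
    TOKEN = "<keystone token>"

    def cluster_status():
        resp = requests.get(
            "%s/%s/clusters/%s" % (SAHARA_URL, PROJECT_ID, CLUSTER_ID),
            headers={"X-Auth-Token": TOKEN})
        resp.raise_for_status()
        return resp.json()["cluster"]["status"]

    status = cluster_status()
    while status not in ("Active", "Error"):
        time.sleep(30)      # clusters usually take several minutes
        status = cluster_status()
    print("Cluster finished provisioning with status: %s" % status)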
 Scaling a Cluster
 -----------------

-1) From the Data Processing/Clusters page, click on the "Scale Cluster" button of the row that contains the cluster that you want to scale.
+1) From the Data Processing/Clusters page, click on the "Scale Cluster" button
+   of the row that contains the cluster that you want to scale

-2) You can adjust the numbers of instances for existing Node Group Templates.
+2) You can adjust the numbers of instances for existing Node Group Templates

-3) You can also add a new Node Group Template and choose a number of instances to launch.
+3) You can also add a new Node Group Template and choose a number of instances
+   to launch

- - This can be done by selecting your desired Node Group Template from the dropdown and clicking the "+" button.
- - Your new Node Group will appear below and you can adjust the number of instances via the text box or the +/- buttons.
+ - This can be done by selecting your desired Node Group Template from the
+   dropdown and clicking the "+" button
+ - Your new Node Group will appear below and you can adjust the number of
+   instances via the text box or the "+" and "-" buttons

-4) To confirm the scaling settings and trigger the spawning/deletion of instances, click on "Scale".
+4) To confirm the scaling settings and trigger the spawning/deletion of
+   instances, click on "Scale"

 Elastic Data Processing (EDP)
 -----------------------------

@@ -103,113 +126,155 @@
 Data Sources
 ------------

 Data Sources are where the input and output from your jobs are housed.

-1) From the Data Processing/Data Sources page, click on the "Create Data Source" button at the top right.
+1) From the Data Processing/Data Sources page, click on the "Create Data Source"
+   button at the top right

-2) Give your Data Source a name.
+2) Give your Data Source a name

-3) Enter the URL to the Data Source.
+3) Enter the URL of the Data Source

- - For a Swift object, the url will look like <container>.sahara/<path> (ie: mycontainer.sahara/inputfile). The "swift://" is automatically added for you.
- - For an HDFS object, the url will look like <host>/<path> (ie: myhost/user/hadoop/inputfile). The "hdfs://" is automatically added for you.
+ - For a Swift object, enter <container>/<path> (ie: *mycontainer/inputfile*).
+   Sahara will prepend *swift://* for you
+ - For an HDFS object, enter an absolute path, a relative path or a full URL:

-4) Enter the username and password for the Data Source.
+   + */my/absolute/path* indicates an absolute path in the cluster HDFS
+   + *my/path* indicates the path */user/hadoop/my/path* in the cluster HDFS
+     assuming the defined HDFS user is *hadoop*
+   + *hdfs://host:port/path* can be used to indicate any HDFS location

-5) Enter an optional description.
+4) Enter the username and password for the Data Source (also see
+   `Additional Notes`_)

-6) Click on "Create".
+5) Enter an optional description

-7) Repeat for additional Data Sources.
+6) Click on "Create"
+
+7) Repeat for additional Data Sources
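The URL forms above are the main point of this change, so here is a
plain-Python self-check of the rules as described: Swift paths get *swift://*
prepended, absolute HDFS paths and full *hdfs://* URLs pass through, and
relative HDFS paths resolve under the HDFS user's home directory. This sketch
only restates the rules from the text; the helper name and the *hadoop* default
user are illustrative, and it is not Sahara's actual implementation.

.. code-block:: python

    # Illustration of the data source URL rules described above; not Sahara code.
    def normalize_data_source_url(url, kind, hdfs_user="hadoop"):
        """Return the URL a data source effectively points at."""
        if kind == "swift":
            # "<container>/<path>" becomes "swift://<container>/<path>"
            return url if url.startswith("swift://") else "swift://" + url
        if kind == "hdfs":
            if url.startswith("hdfs://"):   # full URL: any HDFS location
                return url
            if url.startswith("/"):         # absolute path in the cluster HDFS
                return url
            # relative path: resolved under the HDFS user's home directory
            return "/user/%s/%s" % (hdfs_user, url)
        raise ValueError("unknown data source type: %s" % kind)

    print(normalize_data_source_url("mycontainer/inputfile", "swift"))
    # swift://mycontainer/inputfile
    print(normalize_data_source_url("my/path", "hdfs"))
    # /user/hadoop/my/path
    print(normalize_data_source_url("hdfs://namenode:8020/out", "hdfs"))
    # hdfs://namenode:8020/out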
 Job Binaries
 ------------

-Job Binaries are where you define/upload the source code (mains and libraries) for your job.
+Job Binaries are where you define/upload the source code (mains and libraries)
+for your job.

-1) From the Data Processing/Job Binaries page, click on the "Create Job Binary" button at the top right.
+1) From the Data Processing/Job Binaries page, click on the "Create Job Binary"
+   button at the top right

-2) Give your Job Binary a name (this can be different than the actual filename).
+2) Give your Job Binary a name (this can be different than the actual filename)

-3) Choose the type of storage for your Job Binary.
+3) Choose the type of storage for your Job Binary

- - For "Swift", you will need to enter the URL of your binary (<container>.sahara/<path>) as well as the username and password.
- - For "Internal database", you can choose from "Create a script" or "Upload a new file".
+ - For "Swift", enter the URL of your binary (<container>/<path>) as well as
+   the username and password (also see `Additional Notes`_)
+ - For "Internal database", you can choose from "Create a script" or "Upload
+   a new file"

-4) Enter an optional description.
+4) Enter an optional description

-5) Click on "Create".
+5) Click on "Create"

 6) Repeat for additional Job Binaries

 Jobs
 ----

-Jobs are where you define the type of job you'd like to run as well as which "Job Binaries" are required.
+Jobs are where you define the type of job you'd like to run as well as which
+"Job Binaries" are required.

-1) From the Data Processing/Jobs page, click on the "Create Job" button at the top right.
+1) From the Data Processing/Jobs page, click on the "Create Job" button at the
+   top right

-2) Give your Job a name.
+2) Give your Job a name

-3) Choose the type of job you'd like to run (Pig, Hive, MapReduce, Streaming MapReduce, Java Action)
+3) Choose the type of job you'd like to run

-4) Choose the main binary from the dropdown (not applicable for MapReduce or Java Action).
+4) Choose the main binary from the dropdown

-5) Enter an optional description for your Job.
+ - This is required for Hive, Pig, and Spark jobs
+ - Other job types do not use a main binary

-6) Optionally, click on the "Libs" tab and add one or more libraries that are required for your job. Each library must be defined as a Job Binary.
+5) Enter an optional description for your Job

-7) Click on "Create".
+6) Click on the "Libs" tab and choose any libraries needed by your job
+
+ - MapReduce and Java jobs require at least one library
+ - Other job types may optionally use libraries
+7) Click on "Create"
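To make the main-binary and library rules above concrete, here is a small
sanity-check sketch. It encodes only what the two bullet lists say (Hive, Pig
and Spark need a main binary; MapReduce and Java need at least one library);
the function and the example binary names are illustrative and not part of
Sahara.

.. code-block:: python

    # Encodes the rules from steps 4 and 6 above; illustrative only.
    NEEDS_MAIN = {"Hive", "Pig", "Spark"}
    NEEDS_LIBS = {"MapReduce", "Java"}

    def check_job_binaries(job_type, main=None, libs=()):
        """Raise if the chosen binaries do not satisfy the rules above."""
        if job_type in NEEDS_MAIN and main is None:
            raise ValueError("%s jobs require a main binary" % job_type)
        if job_type in NEEDS_LIBS and not libs:
            raise ValueError("%s jobs require at least one library" % job_type)
        return True

    check_job_binaries("Pig", main="example.pig", libs=["udf.jar"])  # ok
    check_job_binaries("Java", libs=["my-job.jar"])                  # ok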
 Job Executions
 --------------

-Job Executions are what you get by "Launching" a job. You can monitor the status of your job to see when it has completed its run.
+Job Executions are what you get by "Launching" a job. You can monitor the
+status of your job to see when it has completed its run.

-1) From the Data Processing/Jobs page, find the row that contains the job you want to launch and click on the "Launch Job" button at the right side of that row.
+1) From the Data Processing/Jobs page, find the row that contains the job you
+   want to launch and click on the "Launch Job" button at the right side of that
+   row

-2) Choose the cluster (already running--see `Launching a Cluster`_ above) on which you would like the job to run.
+2) Choose the cluster (already running--see `Launching a Cluster`_ above) on
+   which you would like the job to run

-3) Choose the Input and Output Data Sources (Data Sources defined above).
+3) Choose the Input and Output Data Sources (Data Sources defined above)

-4) If additional configuration is required, click on the "Configure" tab.
+4) If additional configuration is required, click on the "Configure" tab

- - Additional configuration properties can be defined by clicking on the "Add" button.
- - An example configuration entry might be mapred.mapper.class for the Name and org.apache.oozie.example.SampleMapper for the Value.
+ - Additional configuration properties can be defined by clicking on the "Add"
+   button
+ - An example configuration entry might be mapred.mapper.class for the Name and
+   org.apache.oozie.example.SampleMapper for the Value

-5) Click on "Launch". To monitor the status of your job, you can navigate to the Sahara/Job Executions panel.
+5) Click on "Launch". To monitor the status of your job, you can navigate to
+   the Sahara/Job Executions panel

-6) You can relaunch a Job Execution from the Job Executions page by using the "Relaunch on New Cluster" or "Relaunch on Existing Cluster" links.
+6) You can relaunch a Job Execution from the Job Executions page by using the
+   "Relaunch on New Cluster" or "Relaunch on Existing Cluster" links

- - Relaunch on New Cluster will take you through the forms to start a new cluster before letting you specify input/output Data Sources and job configuration.
- - Relaunch on Existing Cluster will prompt you for input/output Data Sources as well as allow you to change job configuration before launching the job.
+ - Relaunch on New Cluster will take you through the forms to start a new
+   cluster before letting you specify input/output Data Sources and job
+   configuration
+ - Relaunch on Existing Cluster will prompt you for input/output Data Sources
+   as well as allow you to change job configuration before launching the job
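The "Configure" tab is simply collecting name/value pairs like the
mapred.mapper.class example above. For reference, the sketch below shows how
such settings are commonly grouped when EDP jobs are launched programmatically;
the ``job_configs`` layout with ``configs``, ``params`` and ``args`` reflects
the Sahara EDP convention as we understand it, so treat the exact keys as an
assumption.

.. code-block:: python

    # Name/value pairs as entered on the "Configure" tab, expressed as a dict.
    # The configs/params/args grouping is our understanding of the Sahara EDP
    # convention and may differ between releases.
    job_configs = {
        "configs": {
            # example entry from step 4 above
            "mapred.mapper.class": "org.apache.oozie.example.SampleMapper",
        },
        "params": {},   # e.g. Pig/Hive parameter substitutions
        "args": [],     # positional arguments, used by Java/Spark jobs
    }

    for name, value in sorted(job_configs["configs"].items()):
        print("%s = %s" % (name, value))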
 Example Jobs
 ------------

-There are sample jobs located in the sahara repository. The instructions there guide you through running the jobs via the command line.
-In this section, we will give a walkthrough on how to run those jobs via the Horizon UI.
-These steps assume that you already have a cluster up and running (in the "Active" state).
+There are sample jobs located in the sahara repository. In this section, we
+will give a walkthrough on how to run those jobs via the Horizon UI. These steps
+assume that you already have a cluster up and running (in the "Active" state).

-1) Sample Pig job - https://github.com/openstack/sahara/tree/master/etc/edp-examples/pig-job
+1) Sample Pig job -
+   https://github.com/openstack/sahara/tree/master/etc/edp-examples/pig-job

- - Load the input data file from https://github.com/openstack/sahara/tree/master/etc/edp-examples/pig-job/data/input into swift
+ - Load the input data file from
+   https://github.com/openstack/sahara/tree/master/etc/edp-examples/pig-job/data/input
+   into swift

- - Click on Projet/Object Store/Containers and create a container with any name ("samplecontainer" for our purposes here).
+ - Click on Project/Object Store/Containers and create a container with any
+   name ("samplecontainer" for our purposes here)

- - Click on Upload Object and give the object a name ("piginput" in this case)
+ - Click on Upload Object and give the object a name
+   ("piginput" in this case)

- - Navigate to Data Processing/Data Sources, Click on Create Data Source.
+ - Navigate to Data Processing/Data Sources, Click on Create Data Source

 - Name your Data Source ("pig-input-ds" in this sample)

- - Type = Swift, URL samplecontainer.sahara/piginput, fill-in the Source username/password fields with your username/password and click "Create"
+ - Type = Swift, URL samplecontainer/piginput, fill in the Source
+   username/password fields with your username/password and click "Create"

 - Create another Data Source to use as output for the job

- - Create another Data Source to use as output for our job. Name = pig-output-ds, Type = Swift, URL = samplecontainer.sahara/pigoutput, Source username/password, "Create"
+ - Name = pig-output-ds, Type = Swift, URL = samplecontainer/pigoutput,
+   Source username/password, "Create"

 - Store your Job Binaries in the Sahara database

 - Navigate to Data Processing/Job Binaries, Click on Create Job Binary

- - Name = example.pig, Storage type = Internal database, click Browse and find example.pig wherever you checked out the sahara project /etc/edp-examples/pig-job
+ - Name = example.pig, Storage type = Internal database, click Browse and
+   find example.pig wherever you checked out the sahara project
+   /etc/edp-examples/pig-job

- - Create another Job Binary: Name = udf.jar, Storage type = Internal database, click Browse and find udf.jar wherever you checked out the sahara project /etc/edp-examples/pig-job
+ - Create another Job Binary: Name = udf.jar, Storage type = Internal
+   database, click Browse and find udf.jar wherever you checked out the
+   sahara project /etc/edp-examples/pig-job

 - Create a Job

@@ -217,59 +282,83 @@
 - Name = pigsample, Job Type = Pig, Choose "example.pig" as the main binary

- - Click on the "Libs" tab and choose "udf.jar", then hit the "Choose" button beneath the dropdown, then click on "Create"
+ - Click on the "Libs" tab and choose "udf.jar", then hit the "Choose" button
+   beneath the dropdown, then click on "Create"

 - Launch your job

- - To launch your job from the Jobs page, click on the down arrow at the far right of the screen and choose "Launch on Existing Cluster"
+ - To launch your job from the Jobs page, click on the down arrow at the far
+   right of the screen and choose "Launch on Existing Cluster"

- - For the input, choose "pig-input-ds", for output choose "pig-output-ds". Also choose whichever cluster you'd like to run the job on.
+ - For the input, choose "pig-input-ds", for output choose "pig-output-ds".
+   Also choose whichever cluster you'd like to run the job on

- - For this job, no additional configuration is necessary, so you can just click on "Launch"
+ - For this job, no additional configuration is necessary, so you can just
+   click on "Launch"

- - You will be taken to the "Job Executions" page where you can see your job progress through "PENDING, RUNNING, SUCCEEDED" phases
+ - You will be taken to the "Job Executions" page where you can see your job
+   progress through "PENDING, RUNNING, SUCCEEDED" phases

- - When your job finishes with "SUCCEEDED", you can navigate back to Object Store/Containers and browse to the samplecontainer to see your output. It should be in the "pigoutput" folder.
+ - When your job finishes with "SUCCEEDED", you can navigate back to Object
+   Store/Containers and browse to the samplecontainer to see your output.
+   It should be in the "pigoutput" folder
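For a quick recap of the objects created in this walkthrough, the two Swift
Data Sources reduce to the values below; once saved, Sahara stores them with
the *swift://* scheme prepended. This is a plain illustration, not an API call.

.. code-block:: python

    # The two Swift data sources from the Pig walkthrough, as plain data.
    data_sources = {
        "pig-input-ds": "samplecontainer/piginput",
        "pig-output-ds": "samplecontainer/pigoutput",
    }

    for name, url in sorted(data_sources.items()):
        print("%s -> swift://%s" % (name, url))   # URL as stored by Sahara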
-2) Sample Spark job - https://github.com/openstack/sahara/tree/master/etc/edp-examples/edp-spark
+2) Sample Spark job -
+   https://github.com/openstack/sahara/tree/master/etc/edp-examples/edp-spark

 - Store the Job Binary in the Sahara database

 - Navigate to Data Processing/Job Binaries, Click on Create Job Binary

- - Name = sparkexample.jar, Storage type = Internal database, Browse to the location /etc/edp-examples/edp-spark and choose spark-example.jar, Click "Create"
+ - Name = sparkexample.jar, Storage type = Internal database, Browse to the
+   location /etc/edp-examples/edp-spark and choose
+   spark-example.jar, Click "Create"

 - Create a Job

- - Name = sparkexamplejob, Job Type = Spark, Main binary = Choose sparkexample.jar, Click "Create"
+ - Name = sparkexamplejob, Job Type = Spark,
+   Main binary = Choose sparkexample.jar, Click "Create"

 - Launch your job

- - To launch your job from the Jobs page, click on the down arrow at the far right of the screen and choose "Launch on Existing Cluster"
+ - To launch your job from the Jobs page, click on the down arrow at the far
+   right of the screen and choose "Launch on Existing Cluster"

- - Choose whichever cluster you'd like to run the job on.
+ - Choose whichever cluster you'd like to run the job on

 - Click on the "Configure" tab

 - Set the main class to be: org.apache.spark.examples.SparkPi

- - Under Arguments, click Add and fill in the number of "Slices" you want to use for the job. For this example, let's use 100 as the value
+ - Under Arguments, click Add and fill in the number of "Slices" you want to
+   use for the job. For this example, let's use 100 as the value

 - Click on Launch

- - You will be taken to the "Job Executions" page where you can see your job progress through "PENDING, RUNNING, SUCCEEDED" phases
+ - You will be taken to the "Job Executions" page where you can see your job
+   progress through "PENDING, RUNNING, SUCCEEDED" phases

- - When your job finishes with "SUCCEEDED", you can see your results by sshing to the Spark "master" node.
+ - When your job finishes with "SUCCEEDED", you can see your results by
+   sshing to the Spark "master" node

- - The output is located at /tmp/spark-edp/<job name>/<job execution id>. You can do ``cat stdout`` which should display something like "Pi is roughly 3.14156132"
+ - The output is located at /tmp/spark-edp/<job name>/<job execution id>.
+   You can do ``cat stdout`` which should display something like
+   "Pi is roughly 3.14156132"

- - It should be noted that for more complex jobs, the input/output may be elsewhere. This particular job just writes to stdout, which is logged in the folder under /tmp.
+ - It should be noted that for more complex jobs, the input/output may be
+   elsewhere. This particular job just writes to stdout, which is logged in
+   the folder under /tmp
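The "Slices" argument controls how many batches of random samples SparkPi
spreads its estimate over, which is why 100 slices gives an answer close to
3.14. The sketch below is a plain-Python imitation of that idea, not the actual
Spark example source; the samples-per-slice count is an arbitrary illustrative
value.

.. code-block:: python

    # Plain-Python imitation of what SparkPi estimates; not the Spark example code.
    import random

    def estimate_pi(slices, samples_per_slice=100000):
        """Monte Carlo estimate of Pi, split into `slices` batches like SparkPi."""
        inside = 0
        total = slices * samples_per_slice
        for _ in range(slices):
            for _ in range(samples_per_slice):
                x, y = random.random(), random.random()
                if x * x + y * y <= 1.0:
                    inside += 1
        return 4.0 * inside / total

    print("Pi is roughly %f" % estimate_pi(100))   # e.g. "Pi is roughly 3.141561"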
 Additional Notes
 ----------------

-1) Throughout the Sahara UI, you will find that if you try to delete an object that you will not be able to delete it if another object depends on it.
-An example of this would be trying to delete a Job that has an existing Job Execution. In order to be able to delete that job, you would first need to delete any Job Executions that relate to that job.
+1) Throughout the Sahara UI, you will find that you cannot delete an object
+   if another object depends on it.
+   An example of this would be trying to delete a Job that has an existing Job
+   Execution. In order to be able to delete that job, you would first need to
+   delete any Job Executions that relate to that job.

-2) In the examples above, we mention adding your username/password for the Swift Data Sources.
-It should be noted that it is possible to configure Sahara such that the username/password credentials are *not* required.
-For more information on that, please refer to: :doc:`Sahara Advanced Configuration Guide <../userdoc/advanced.configuration.guide>`
+2) In the examples above, we mention adding your username/password for the Swift
+   Data Sources. It should be noted that it is possible to configure Sahara such
+   that the username/password credentials are *not* required. For more
+   information on that, please refer to:
+   :doc:`Sahara Advanced Configuration Guide <../userdoc/advanced.configuration.guide>`