doc: refer to the split plugin documentation

Remove the detailed information about the plugins from the
core documentation and redirect to the plugins documentation instead.

Redirects have been set up so that the existing links do not break.

Change-Id: Ief8593b02242748a5ffd4f55515975faebd19523
Luigi Toscano 2019-03-20 12:36:25 +01:00
parent dc17f1903f
commit bb5f75db7f
11 changed files with 23 additions and 775 deletions

View File

@@ -4,3 +4,4 @@ redirectmatch 301 ^/sahara/([^/]+)/contributor/launchpad.html$ /sahara/$1/contri
redirectmatch 301 ^/sahara/(?!ocata|pike|queens)([^/]+)/user/vanilla-imagebuilder.html$ /sahara/$1/user/vanilla-plugin.html
redirectmatch 301 ^/sahara/(?!ocata|pike|queens)([^/]+)/user/cdh-imagebuilder.html$ /sahara/$1/user/cdh-plugin.html
redirectmatch 301 ^/sahara/(?!ocata|pike|queens)([^/]+)/user/guest-requirements.html$ /sahara/$1/user/building-guest-images.html
redirectmatch 301 ^/sahara/([^/]+)/user/([^-]+)-plugin.html$ /sahara-plugin-$2/$1/
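For example, the new catch-all rule maps ``/sahara/stein/user/spark-plugin.html`` to ``/sahara-plugin-spark/stein/``: ``$1`` captures the release and ``$2`` the plugin name. Once the rules are deployed on docs.openstack.org, a redirect can be spot-checked with any HTTP client; a minimal sketch, assuming curl is available:
.. sourcecode:: console
# the response should be a 301 with Location: .../sahara-plugin-spark/stein/
curl -sI https://docs.openstack.org/sahara/stein/user/spark-plugin.html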

View File

@@ -60,6 +60,12 @@ openstack_projects = [
'neutron',
'nova',
'oslo.middleware',
'sahara-plugin-ambari',
'sahara-plugin-cdh',
'sahara-plugin-mapr',
'sahara-plugin-spark',
'sahara-plugin-storm',
'sahara-plugin-vanilla',
'tooz'
]

View File

@@ -1,161 +0,0 @@
Ambari Plugin
=============
The Ambari sahara plugin provides a way to provision Hortonworks Data
Platform (HDP) clusters on OpenStack using templates in a single click and in
an easily repeatable fashion. The sahara controller serves as the glue between
Hadoop and OpenStack. The Ambari plugin mediates between the sahara controller
and Apache Ambari, which acts as the orchestrator for deploying and
configuring HDP on OpenStack. The Ambari plugin uses Ambari Blueprints for
cluster provisioning.
Apache Ambari Blueprints
------------------------
Apache Ambari Blueprints is a portable document definition, which provides a
complete definition for an Apache Hadoop cluster, including cluster topology,
components, services and their configurations. Ambari Blueprints can be
consumed by the Ambari plugin to instantiate a Hadoop cluster on OpenStack.
The benefit of this approach is that it allows Hadoop clusters to be
configured and deployed using an Ambari-native format that can be used both
with and outside of OpenStack, allowing clusters to be re-instantiated in a
variety of environments.
Images
------
For cluster provisioning, prepared images should be used.
.. list-table:: Support matrix for the `ambari` plugin
:widths: 15 15 20 15 35
:header-rows: 1
* - Version
(image tag)
- Distribution
- Build method
- Version
(build parameter)
- Notes
* - 2.6
- Ubuntu 16.04, CentOS 7
- sahara-image-pack
- 2.6
- uses Ambari 2.6
* - 2.5
- Ubuntu 16.04, CentOS 7
- sahara-image-pack
- 2.5
- uses Ambari 2.6
* - 2.4
- Ubuntu 14.04, CentOS 7
- sahara-image-pack
- 2.4
- uses Ambari 2.6
* - 2.4
- Ubuntu 14.04, CentOS 7
- sahara-image-create
- 2.4
- uses Ambari 2.2.1.0
* - 2.3
- Ubuntu 14.04, CentOS 7
- sahara-image-pack
- 2.3
- uses Ambari 2.4
* - 2.3
- Ubuntu 14.04, CentOS 7
- sahara-image-create
- 2.3
- uses Ambari 2.2.0.0
For more information about building images, refer to
:doc:`building-guest-images`.
The HDP plugin requires an image to be tagged in the sahara Image Registry
with two tags: 'ambari' and '<plugin version>' (e.g. '2.5').
The image requires a username. For more information, refer to the
:doc:`registering-image` section.
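For example, registering and tagging an image could look like this with the OpenStack CLI (a sketch: the image name ``ambari-2.5-img`` and the ``ubuntu`` username are placeholders for your environment):
.. sourcecode:: console
openstack dataprocessing image register ambari-2.5-img --username ubuntu
openstack dataprocessing image tags add ambari-2.5-img --tags ambari 2.5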
To speed up provisioning, the HDP packages can be pre-installed on the image
used. The packages' versions depend on the HDP version required.
High Availability for HDFS and YARN
-----------------------------------
High Availability (using the Quorum Journal Manager) can be deployed
automatically with the Ambari plugin. You can deploy a highly available
cluster through the UI by selecting the ``NameNode HA`` and/or
``ResourceManager HA`` options in the general configs of the cluster template.
NameNode High Availability is deployed using 2 NameNodes, one active and
one standby. The NameNodes use a set of JournalNodes and ZooKeeper Servers to
ensure the necessary synchronization. In the case of ResourceManager HA, 2
ResourceManagers are enabled in addition.
A typical highly available Ambari cluster uses 2 separate NameNodes, 2
separate ResourceManagers, at least 3 JournalNodes and at least 3 ZooKeeper
Servers.
HDP Version Support
-------------------
The HDP plugin currently supports deployment of HDP 2.3, 2.4, 2.5 and 2.6
(see the support matrix above).
Cluster Validation
------------------
Prior to Hadoop cluster creation, the HDP plugin will perform the following
validation checks to ensure a successful Hadoop deployment:
* Ensure the existence of an Ambari Server process in the cluster;
* Ensure the existence of NameNode, ZooKeeper, ResourceManager, HistoryServer
and App Timeline Server processes in the cluster.
Enabling Kerberos security for cluster
--------------------------------------
If you want to protect your clusters using MIT Kerberos security, you have to
complete the few steps below.
* If you would like to create a cluster protected by Kerberos security, you
just need to enable the Kerberos checkbox in the ``General Parameters``
section of the cluster configuration. If you prefer to use the OpenStack CLI
for cluster creation, you have to put the data below in the
``cluster_configs`` section (see the sketch after this list):
.. sourcecode:: console
"cluster_configs": {
"Enable Kerberos Security": true,
}
In this case sahara will correctly prepare the KDC server and will create
principals along with keytabs to enable authentication for the Hadoop services.
* Ensure that you have the latest hadoop-openstack jar file distributed
on your cluster nodes. You can download one at
``https://tarballs.openstack.org/sahara-extra/dist/``
* Sahara will create principals along with keytabs for system users
like ``oozie``, ``hdfs`` and ``spark`` so that you will not have to
perform additional auth operations to execute your jobs on top of the
cluster.
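As referenced in the list above, here is a minimal sketch of enabling Kerberos from the CLI when creating a cluster template from a JSON definition (the file name and the surrounding template fields are illustrative; only the ``Enable Kerberos Security`` key comes from this section):
.. sourcecode:: console
# template.json contains, among the other template fields:
#   "cluster_configs": {"Enable Kerberos Security": true}
openstack dataprocessing cluster template create --json template.json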
Adjusting Ambari Agent Package Installation timeout Parameter
-------------------------------------------------------------
For a cluster with a large number of nodes or slow connectivity to the HDP
repository server, creation of a sahara HDP cluster may fail when the Ambari
agent reaches the timeout threshold while installing the packages on the
nodes.
Such failures occur during the "cluster start" stage, which can be
monitored from the Cluster Events tab of the Sahara Dashboard. The timeout
error is visible from the Ambari Dashboard as well.
* To avoid the package installation timeout of the Ambari agent, change the
default value of the ``Ambari Agent Package Install timeout`` parameter,
which can be found in the ``General Parameters`` section of the cluster
template configuration.
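Following the same ``cluster_configs`` pattern shown in the Kerberos section, the override could be expressed as below (a sketch: the value of 1800 seconds is purely illustrative):
.. sourcecode:: console
"cluster_configs": {
"Ambari Agent Package Install timeout": "1800"
}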

View File

@@ -1,190 +0,0 @@
Cloudera Plugin
===============
The Cloudera plugin is a Sahara plugin which allows the user to
deploy and operate a cluster with Cloudera Manager.
The Cloudera plugin is enabled in Sahara by default. You can manually
modify the Sahara configuration file (default: /etc/sahara/sahara.conf) to
explicitly enable or disable it in the "plugins" line.
Images
------
For cluster provisioning, prepared images should be used.
.. list-table:: Support matrix for the `cdh` plugin
:widths: 15 15 20 15 35
:header-rows: 1
* - Version
(image tag)
- Distribution
- Build method
- Version
(build parameter)
- Notes
* - 5.13.0
- Ubuntu 16.04, CentOS 7
- sahara-image-pack
- 5.13.0
-
* - 5.11.0
- Ubuntu 16.04, CentOS 7
- sahara-image-pack, sahara-image-create
- 5.11.0
-
* - 5.9.0
- Ubuntu 14.04, CentOS 7
- sahara-image-pack, sahara-image-create
- 5.9.0
-
* - 5.7.0
- Ubuntu 14.04, CentOS 7
- sahara-image-pack, sahara-image-create
- 5.7.0
-
For more information about building images, refer to
:doc:`building-guest-images`.
The Cloudera plugin requires an image to be tagged in the Sahara Image
Registry with two tags: 'cdh' and '<cloudera version>' (e.g. '5.13.0',
'5.11.0', '5.9.0', etc.).
The default username specified for these images is different for each
distribution. For more information, refer to the
:doc:`registering-image` section.
Build settings
~~~~~~~~~~~~~~
It is possible to specify minor versions of CDH when ``sahara-image-create``
is used.
If you want to use a minor version, export ``DIB_CDH_MINOR_VERSION``
before starting the build command, e.g.:
.. sourcecode:: console
export DIB_CDH_MINOR_VERSION=5.7.1
Services Supported
------------------
Currently the following services are supported in all versions of the
Cloudera plugin: HDFS, Oozie, YARN, Spark, ZooKeeper, Hive, Hue and HBase.
Version 5.3.0 of the Cloudera plugin added support for the following services:
Impala, Flume, Solr, Sqoop, and Key-value Store Indexer. Version 5.4.0 added
KMS service support on top of version 5.3.0. Kafka 2.0.2 support was added
for CDH 5.5 and higher.
.. note::
The Sentry service is enabled in the Cloudera plugin. However, Kerberos
authentication, which Sentry requires, is not enabled in the cluster for
CDH versions < 5.5; in that case the Sentry service does not really take
effect, and other services depending on Sentry will not perform any
authentication either.
High Availability Support
-------------------------
HDFS NameNode High Availability is currently supported beginning with
Cloudera version 5.4.0. You can refer to :doc:`features` for detailed
information.
YARN ResourceManager High Availability is supported beginning with Cloudera
version 5.4.0. This feature adds redundancy in the form of an active/standby
ResourceManager pair to avoid the failure of a single RM. Upon failover, the
standby RM becomes active so that the applications can resume from their last
check-pointed state.
Cluster Validation
------------------
When the user performs an operation on the cluster using a Cloudera plugin, the
cluster topology requested by the user is verified for consistency.
The following limitations are required in the cluster topology for all
cloudera plugin versions:
+ Cluster must contain exactly one manager.
+ Cluster must contain exactly one namenode.
+ Cluster must contain exactly one secondarynamenode.
+ Cluster must contain at least ``dfs_replication`` datanodes.
+ Cluster can contain at most one resourcemanager and this process is also
required by nodemanager.
+ Cluster can contain at most one jobhistory and this process is also
required for resourcemanager.
+ Cluster can contain at most one oozie and this process is also required
for EDP.
+ Cluster can't contain oozie without datanode.
+ Cluster can't contain oozie without nodemanager.
+ Cluster can't contain oozie without jobhistory.
+ Cluster can't contain hive without the following services:
metastore, hive server, webhcat and resourcemanager.
+ Cluster can contain at most one hue server.
+ Cluster can't contain hue server without hive service and oozie.
+ Cluster can contain at most one spark history server.
+ Cluster can't contain spark history server without resourcemanager.
+ Cluster can't contain hbase master service without at least one zookeeper
and at least one hbase regionserver.
+ Cluster can't contain hbase regionserver without at least one hbase master.
For versions 5.3.0, 5.4.0, 5.5.0, 5.7.x and 5.9.x of the Cloudera plugin,
there are a few extra limitations in the cluster topology:
+ Cluster can't contain flume without at least one datanode.
+ Cluster can contain at most one sentry server service.
+ Cluster can't contain sentry server service without at least one zookeeper
and at least one datanode.
+ Cluster can't contain solr server without at least one zookeeper and at
least one datanode.
+ Cluster can contain at most one sqoop server.
+ Cluster can't contain sqoop server without at least one datanode,
nodemanager and jobhistory.
+ Cluster can't contain hbase indexer without at least one datanode,
zookeeper, solr server and hbase master.
+ Cluster can contain at most one impala catalog server.
+ Cluster can contain at most one impala statestore.
+ Cluster can't contain impala catalogserver without impala statestore,
at least one impalad service, at least one datanode, and metastore.
+ If using Impala, the daemons must be installed on every datanode.
For versions 5.5.0, 5.7.x and 5.9.x of the Cloudera plugin, additional
services are available in the cluster topology:
+ Cluster can have the kafka service and several kafka brokers.
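To make the base rules concrete, here is a hedged sketch of a minimal node group layout that satisfies them, assuming the default ``dfs_replication`` of 3 (the group names are illustrative, and the process names follow the CDH plugin's usual conventions; verify them against your deployment):
.. sourcecode:: console
"node_groups": [
{"name": "manager", "count": 1,
"node_processes": ["CLOUDERA_MANAGER"]},
{"name": "master", "count": 1,
"node_processes": ["HDFS_NAMENODE", "HDFS_SECONDARYNAMENODE",
"YARN_RESOURCEMANAGER", "YARN_JOBHISTORY"]},
{"name": "worker", "count": 3,
"node_processes": ["HDFS_DATANODE", "YARN_NODEMANAGER"]}
]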
Enabling Kerberos security for cluster
--------------------------------------
If you want to protect your clusters using MIT Kerberos security, you have to
complete the few steps below.
* If you would like to create a cluster protected by Kerberos security, you
just need to enable the Kerberos checkbox in the ``General Parameters``
section of the cluster configuration. If you prefer to use the OpenStack CLI
for cluster creation, you have to put the data below in the
``cluster_configs`` section:
.. sourcecode:: console
"cluster_configs": {
"Enable Kerberos Security": true,
}
In this case sahara will correctly prepare the KDC server and will create
principals along with keytabs to enable authentication for the Hadoop services.
* Ensure that you have the latest hadoop-openstack jar file distributed
on your cluster nodes. You can download one at
``https://tarballs.openstack.org/sahara-extra/dist/``
* Sahara will create principals along with keytabs for system users
like ``hdfs`` and ``spark`` so that you will not have to
perform additional auth operations to execute your jobs on top of the
cluster.

View File

@@ -24,12 +24,6 @@ Plugins
:maxdepth: 2
plugins
vanilla-plugin
ambari-plugin
spark-plugin
storm-plugin
cdh-plugin
mapr-plugin
Elastic Data Processing

View File

@@ -1,128 +0,0 @@
MapR Distribution Plugin
========================
The MapR sahara plugin provides a quick, convenient and simple way to
provision MapR clusters on OpenStack.
Operation
---------
The MapR Plugin performs the following four primary functions during cluster
creation:
1. MapR components deployment - the plugin manages the deployment of the
required software to the target VMs
2. Services Installation - MapR services are installed according to provided
roles list
3. Services Configuration - the plugin combines default settings with user
provided settings
4. Services Start - the plugin starts appropriate services according to
specified roles
Images
------
The Sahara MapR plugin can make use of either minimal (operating system only)
images or pre-populated MapR images. The base requirement for both is that the
image is cloud-init enabled and contains a supported operating system (see
http://maprdocs.mapr.com/home/InteropMatrix/r_os_matrix.html).
The advantage of a pre-populated image is reduced provisioning time, as
downloading packages makes up the majority of the time spent in the
provisioning cycle. In addition, provisioning large clusters otherwise puts a
burden on the network, as packages for all nodes need to be downloaded from
the package repository.
.. list-table:: Support matrix for the `mapr` plugin
:widths: 15 15 20 15 35
:header-rows: 1
* - Version
(image tag)
- Distribution
- Build method
- Version
(build parameter)
- Notes
* - 5.2.0.mrv2
- Ubuntu 14.04, CentOS 7
- sahara-image-pack
- 5.2.0.mrv2
-
* - 5.2.0.mrv2
- Ubuntu 14.04, CentOS 7
- sahara-image-create
- 5.2.0
-
For more information about building images, refer to
:doc:`building-guest-images`.
The MapR plugin needs an image to be tagged in the Sahara Image Registry with
two tags: 'mapr' and '<MapR version>' (e.g. '5.2.0.mrv2').
The default username specified for these images is different for each
distribution. For more information, refer to the
:doc:`registering-image` section.
Hadoop Version Support
----------------------
The MapR plugin currently supports Hadoop 2.7.0 (5.2.0.mrv2).
Cluster Validation
------------------
When the user creates or scales a Hadoop cluster using the MapR plugin, the
cluster topology requested by the user is verified for consistency.
Every MapR cluster must contain:
* at least 1 *CLDB* process
* exactly 1 *Webserver* process
* an odd number of *ZooKeeper* processes, but not less than 1
* a *FileServer* process on every node
* at least 1 ephemeral drive (in this case you need to specify the ephemeral
drive in the flavor, not during node group template creation) or 1 Cinder
volume per instance
Every Hadoop cluster must contain exactly 1 *Oozie* process
Every MapReduce v1 cluster must contain:
* at least 1 *JobTracker* process
* at least 1 *TaskTracker* process
Every MapReduce v2 cluster must contain:
* exactly 1 *ResourceManager* process
* exactly 1 *HistoryServer* process
* at least 1 *NodeManager* process
Every Spark cluster must contain:
* exactly 1 *Spark Master* process
* exactly 1 *Spark HistoryServer* process
* at least 1 *Spark Slave* (worker) process
HBase service is considered valid if:
* cluster has at least 1 *HBase-Master* process
* cluster has at least 1 *HBase-RegionServer* process
Hive service is considered valid if:
* cluster has exactly 1 *HiveMetastore* process
* cluster has exactly 1 *HiveServer2* process
Hue service is considered valid if:
* cluster has exactly 1 *Hue* process
* *Hue* process resides on the same node as *HttpFS* process
HttpFS service is considered valid if cluster has exactly 1 *HttpFS* process
Sqoop service is considered valid if cluster has exactly 1 *Sqoop2-Server*
process
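A hedged sketch of a minimal MRv2 layout satisfying the rules above (the group names are illustrative and the process names may differ slightly across MapR plugin versions; each instance also needs an ephemeral drive or a Cinder volume, as noted earlier):
.. sourcecode:: console
"node_groups": [
{"name": "control", "count": 1,
"node_processes": ["CLDB", "ZooKeeper", "Webserver", "FileServer",
"ResourceManager", "HistoryServer", "Oozie"]},
{"name": "worker", "count": 3,
"node_processes": ["FileServer", "NodeManager"]}
]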
The MapR Plugin
---------------
For more information, please contact MapR.

View File

@@ -6,12 +6,20 @@ enables sahara to deploy a specific data processing framework (for example,
Hadoop) or distribution, and allows configuration of topology and
management/monitoring tools.
* :doc:`vanilla-plugin` - deploys Vanilla Apache Hadoop
* :doc:`ambari-plugin` - deploys Hortonworks Data Platform
* :doc:`spark-plugin` - deploys Apache Spark with Cloudera HDFS
* :doc:`storm-plugin` - deploys Apache Storm
* :doc:`mapr-plugin` - deploys MapR plugin with MapR File System
* :doc:`cdh-plugin` - deploys Cloudera Hadoop
The plugins currently developed as part of the official Sahara project are:
* :sahara-plugin-ambari-doc:`Ambari Plugin <>` -
deploys Hortonworks Data Platform
* :sahara-plugin-cdh-doc:`CDH Plugin <>` -
deploys Cloudera Hadoop
* :sahara-plugin-mapr-doc:`MapR Plugin <>` -
deploys MapR plugin with MapR File System
* :sahara-plugin-spark-doc:`Spark Plugin <>` -
deploys Apache Spark with Cloudera HDFS
* :sahara-plugin-storm-doc:`Storm Plugin <>` -
deploys Apache Storm
* :sahara-plugin-vanilla-doc:`Vanilla Plugin <>` -
deploys Vanilla Apache Hadoop
Managing plugins
----------------

View File

@@ -1,91 +0,0 @@
Spark Plugin
============
The Spark plugin for sahara provides a way to provision Apache Spark clusters
on OpenStack in a single click and in an easily repeatable fashion.
Currently Spark is installed in standalone mode, with no YARN or Mesos
support.
Images
------
For cluster provisioning, prepared images should be used.
.. list-table:: Support matrix for the `spark` plugin
:widths: 15 15 20 15 35
:header-rows: 1
* - Version
(image tag)
- Distribution
- Build method
- Version
(build parameter)
- Notes
* - 2.3
- Ubuntu 16.04
- sahara-image-create
- 2.3.0
- based on CDH 5.11
* - 2.2
- Ubuntu 16.04
- sahara-image-create
- 2.2.0
- based on CDH 5.11
For more information about building images, refer to
:doc:`building-guest-images`.
The Spark plugin requires an image to be tagged in the sahara image registry
with two tags: 'spark' and '<Spark version>' (e.g. '2.3').
The image requires a username. For more information, refer to the
:doc:`registering-image` section.
Note that the Spark cluster is deployed using the scripts available in the
Spark distribution, which allow the user to start all services (master and
slaves), stop all services and so on. As such (and as opposed to the CDH HDFS
daemons), Spark is not deployed as a standard Ubuntu service, and if the
virtual machines are rebooted, Spark will not be restarted.
Build settings
~~~~~~~~~~~~~~
When ``sahara-image-create`` is used, you can override a few settings
by exporting the corresponding environment variables
before starting the build command:
* ``SPARK_DOWNLOAD_URL`` - download link for Spark
Spark configuration
-------------------
Spark needs a few parameters to work and has sensible defaults. If needed they
can be changed when creating the sahara cluster template. No node group
options are available.
Once the cluster is ready, connect with ssh to the master using the `ubuntu`
user and the appropriate ssh key. Spark is installed in `/opt/spark` and
should be completely configured and ready to start executing jobs. At the
bottom of the cluster information page from the OpenStack dashboard, a link to
the Spark web interface is provided.
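As an illustrative first job (a sketch: the key file name, master address and the examples jar path are assumptions that may differ in your image):
.. sourcecode:: console
ssh -i cluster_key ubuntu@<master-ip>
# on the master node, run the SparkPi example against the standalone master
/opt/spark/bin/spark-submit --master spark://$(hostname):7077 \
--class org.apache.spark.examples.SparkPi \
/opt/spark/examples/jars/spark-examples_*.jar 10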
Cluster Validation
------------------
When a user creates a Hadoop cluster using the Spark plugin, the cluster
topology requested by the user is verified for consistency.
Currently there are the following limitations in cluster topology for the
Spark plugin:
+ Cluster must contain exactly one HDFS namenode
+ Cluster must contain exactly one Spark master
+ Cluster must contain at least one Spark slave
+ Cluster must contain at least one HDFS datanode
The tested configuration co-locates the NameNode with the master and a
DataNode with each slave to maximize data locality.

View File

@@ -1,82 +0,0 @@
Storm Plugin
============
The Storm plugin for sahara provides a way to provision Apache Storm clusters
on OpenStack in a single click and in an easily repeatable fashion.
Currently Storm is installed in standalone mode, with no YARN support.
Images
------
For cluster provisioning, prepared images should be used.
.. list-table:: Support matrix for the `storm` plugin
:widths: 15 15 20 15 35
:header-rows: 1
* - Version
(image tag)
- Distribution
- Build method
- Version
(build parameter)
- Notes
* - 1.2
- Ubuntu 16.04
- sahara-image-create
- 1.2.1, 1.2.0
- both versions are supported by the same image tag
* - 1.1.0
- Ubuntu 16.04
- sahara-image-create
- 1.1.1, 1.1.0
- both versions are supported by the same image tag
For more information about building images, refer to
:doc:`building-guest-images`.
The Storm plugin requires an image to be tagged in the sahara image registry
with two tags: 'storm' and '<Storm version>' (e.g. '1.1.0').
The image requires a username. For more information, refer to the
:doc:`registering-image` section.
Note that the Storm cluster is deployed using the scripts available in the
Storm distribution, which allow the user to start all services (nimbus,
supervisors and zookeepers), stop all services and so on. As such, Storm is
not deployed as a standard Ubuntu service, and if the virtual machines are
rebooted, Storm will not be restarted.
Storm configuration
-------------------
Storm needs a few parameters to work and has sensible defaults. If needed they
can be changed when creating the sahara cluster template. No node group
options are available.
Once the cluster is ready, connect with ssh to the master using the `ubuntu`
user and the appropriate ssh key. Storm is installed in `/usr/local/storm` and
should be completely configured and ready to start executing jobs. At the
bottom of the cluster information page from the OpenStack dashboard, a link to
the Storm web interface is provided.
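A quick smoke test after logging in could look like this (a sketch: the key file name and master address are assumptions; ``storm list`` is part of the standard Storm CLI):
.. sourcecode:: console
ssh -i cluster_key ubuntu@<master-ip>
# list the topologies known to the cluster (empty right after provisioning)
/usr/local/storm/bin/storm list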
Cluster Validation
------------------
When a user creates a Storm cluster using the Storm plugin, the cluster
topology requested by the user is verified for consistency.
Currently there are the following limitations in cluster topology for the
Storm plugin:
+ Cluster must contain exactly one Storm nimbus
+ Cluster must contain at least one Storm supervisor
+ Cluster must contain at least one Zookeeper node
The tested configuration has nimbus, supervisor, and Zookeeper processes each
running on their own nodes.
Another possible configuration is one node with nimbus alone, and additional
nodes each with supervisor and Zookeeper processes together.

View File

@@ -1,111 +0,0 @@
Vanilla Plugin
==============
The vanilla plugin is a reference implementation which allows users to operate
a cluster with Apache Hadoop.
Since the Newton release Spark is integrated into the Vanilla plugin so you
can launch Spark jobs on a Vanilla cluster.
Images
------
For cluster provisioning, prepared images should be used.
.. list-table:: Support matrix for the `vanilla` plugin
:widths: 15 15 20 15 35
:header-rows: 1
* - Version
(image tag)
- Distribution
- Build method
- Version
(build parameter)
- Notes
* - 2.8.2
- Ubuntu 16.04, CentOS 7
- sahara-image-create
- 2.8.2
- Hive 2.3.2, Oozie 4.3.0
* - 2.7.5
- Ubuntu 16.04, CentOS 7
- sahara-image-create
- 2.7.5
- Hive 2.3.2, Oozie 4.3.0
* - 2.7.1
- Ubuntu 16.04, CentOS 7
- sahara-image-create
- 2.7.1
- Hive 0.11.0, Oozie 4.2.0
For more information about building images, refer to
:doc:`building-guest-images`.
The Vanilla plugin requires an image to be tagged in the Sahara Image Registry
with two tags: 'vanilla' and '<hadoop version>' (e.g. '2.7.1').
The image requires a username. For more information, refer to the
:doc:`registering-image` section.
Build settings
~~~~~~~~~~~~~~
When ``sahara-image-create`` is used, you can override a few settings
by exporting the corresponding environment variables
before starting the build command:
* ``DIB_HADOOP_VERSION`` - version of Hadoop to install
* ``HIVE_VERSION`` - version of Hive to install
* ``OOZIE_DOWNLOAD_URL`` - download link for Oozie (we have built
Oozie libs here: https://tarballs.openstack.org/sahara-extra/dist/oozie/)
* ``SPARK_DOWNLOAD_URL`` - download link for Spark
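Putting it together, a build environment could be prepared as below before starting the build command (a sketch: the version values come from the support matrix above, while the Oozie tarball name is a placeholder to fill in from the listing at tarballs.openstack.org):
.. sourcecode:: console
export DIB_HADOOP_VERSION=2.8.2
export HIVE_VERSION=2.3.2
export OOZIE_DOWNLOAD_URL=https://tarballs.openstack.org/sahara-extra/dist/oozie/<oozie-tarball>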
Vanilla Plugin Requirements
---------------------------
The image building tools described in :ref:`building-guest-images-label`
add the required software to the image and their usage is strongly suggested.
Nevertheless, the following software should be pre-loaded
on the guest image so that it can be used to create Vanilla clusters:
* ssh-client installed
* Java (version >= 7)
* Apache Hadoop installed
* 'hadoop' user created
See :doc:`hadoop-swift` for information on using Swift with your sahara cluster
(for EDP support Swift integration is currently required).
To support EDP, the following components must also be installed on the guest:
* Oozie version 4 or higher
* mysql/mariadb
* hive
Cluster Validation
------------------
When a user creates or scales a Hadoop cluster using the Vanilla plugin,
the cluster topology requested by the user is verified for consistency.
Currently there are the following limitations in the cluster topology for the
Vanilla plugin:
For Vanilla Hadoop version 2.x.x:
+ Cluster must contain exactly one namenode
+ Cluster can contain at most one resourcemanager
+ Cluster can contain at most one secondary namenode
+ Cluster can contain at most one historyserver
+ Cluster can contain at most one oozie and this process is also required
for EDP
+ Cluster can't contain oozie without resourcemanager and without
historyserver
+ Cluster can't have nodemanager nodes if it doesn't have resourcemanager
+ Cluster can have at most one hiveserver node.
+ Cluster can have at most one spark history server and this process is also
required for Spark EDP (Spark is available since the Newton release).

View File

@@ -5,3 +5,5 @@
/sahara/latest/user/cdh-imagebuilder.html 301 /sahara/latest/user/cdh-plugin.html
/sahara/latest/user/guest-requirements.html 301 /sahara/latest/user/building-guest-images.html
/sahara/rocky/user/guest-requirements.html 301 /sahara/rocky/user/building-guest-images.html
/sahara/latest/user/vanilla-plugin.html 301 /sahara-plugin-vanilla/latest/
/sahara/stein/user/storm-plugin.html 301 /sahara-plugin-storm/stein/