Merge "Removing extraneous Swift information from Features"

This commit is contained in:
Jenkins 2014-10-01 20:39:41 +00:00, committed by Gerrit Code Review
commit 36ccea816b

Features Overview
=================

Cluster Scaling
---------------
The cluster scaling mechanism is designed to enable a user to change the
number of running instances without creating a new cluster. A user may
change the number of instances in existing Node Groups or add new Node
Groups.

If a cluster fails to scale properly, all changes will be rolled back.

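As an illustrative sketch (the node group name, template id placeholder,
and counts below are hypothetical), a scaling request to the Sahara REST
API may both resize existing Node Groups and add new ones:

.. sourcecode:: json

    {
        "resize_node_groups": [
            {
                "name": "worker",
                "count": 4
            }
        ],
        "add_node_groups": [
            {
                "name": "additional-workers",
                "node_group_template_id": "<node-group-template-id>",
                "count": 2
            }
        ]
    }
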
Swift Integration
-----------------
In order to leverage Swift within Hadoop, including using Swift data sources
from within EDP, Hadoop requires the application of a patch.
For additional information about using Swift with Sahara, including patching
Hadoop and configuring Sahara, please refer to the :doc:`hadoop-swift`
documentation.

Cinder support
--------------
Cinder is a block storage service that can be used as an alternative to an
ephemeral drive. Using Cinder volumes increases the reliability of data,
which is important for the HDFS service.

A user can set how many volumes will be attached to each node in a Node
Group and the size of each volume.

All volumes are attached during Cluster creation and scaling operations.

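The volume count and size are properties of a Node Group. As a hedged
sketch (field names as exposed by the Sahara REST API; the flavor and
process names are examples only), a Node Group template requesting two
10 GB Cinder volumes per node might look like:

.. sourcecode:: json

    {
        "name": "worker",
        "flavor_id": "2",
        "node_processes": ["datanode", "tasktracker"],
        "volumes_per_node": 2,
        "volumes_size": 10
    }
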
Neutron and Nova Network support
--------------------------------
An OpenStack Cluster may use Nova Network or Neutron as a networking
service. Sahara supports both, but when deployed, a special configuration
for networking should be set explicitly. By default Sahara will behave as
if Nova Network is used.

If the OpenStack Cluster uses Neutron, then the ``use_neutron`` option
should be set to ``True`` in the Sahara configuration file. In addition,
if the OpenStack Cluster supports network namespaces, set the
``use_namespaces`` option to ``True``:

.. sourcecode:: cfg

    use_neutron=True
    use_namespaces=True

Floating IP Management
----------------------
Sahara needs to access instances through ssh during Cluster setup. To
establish a connection, Sahara may use both the fixed and the floating IP
of an Instance. By default the ``use_floating_ips`` parameter is set to
``True``, so Sahara will use the floating IP of an Instance to connect.
In this case, the user has two options for how to make all instances get
a floating IP:

* Nova Network may be configured to assign floating IPs automatically by
  setting ``auto_assign_floating_ip`` to ``True`` in ``nova.conf``
* User may specify a floating IP pool for each Node Group directly.
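For the first option, assuming a Nova Network deployment, the relevant
``nova.conf`` setting would be:

.. sourcecode:: cfg

    auto_assign_floating_ip=True
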
Note: When using floating IPs for management (``use_floating_ips=True``),
**every** instance in the Cluster should have a floating IP, otherwise
Sahara will not be able to work with it.

If the ``use_floating_ips`` parameter is set to ``False``, Sahara will use
the Instances' fixed IPs for management. In this case the node where Sahara
is running should have access to the Instances' fixed IP network. When
OpenStack uses Neutron for networking, the user will be able to choose the
fixed IP network for all instances in a Cluster.

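For example, a deployment that manages instances over their fixed IPs
would set the following in the Sahara configuration file:

.. sourcecode:: cfg

    use_floating_ips=False
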
Anti-affinity
-------------
One of the problems with Hadoop running on OpenStack is that there is no
ability to control where a machine actually runs. We cannot be sure that
two new virtual machines are started on different physical machines. As a
result, any replication within the cluster is not reliable, because all
replicas may end up on one physical machine.

The anti-affinity feature provides the ability to explicitly tell Sahara
to run specified processes on different compute nodes. This is especially
useful for the Hadoop datanode process, to make HDFS replicas reliable.
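As a sketch, anti-affinity is requested per process when a cluster is
created; for example, a cluster template fragment asking Sahara to keep
datanode processes on separate compute nodes might contain (field name as
used by the Sahara REST API; shown here only as an illustration):

.. sourcecode:: json

    {
        "anti_affinity": ["datanode"]
    }
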
Starting with the Juno release, Sahara creates server groups with the
``anti-affinity`` policy to enable the anti-affinity feature.

Heat Integration
----------------
Sahara may use the
`OpenStack Orchestration engine <https://wiki.openstack.org/wiki/Heat>`_
(aka Heat) to provision nodes for a Hadoop cluster.
To make Sahara work with Heat the following steps are required:
* Your OpenStack installation must have the 'orchestration' service up and
  running