Features Overview
=================

Cluster Scaling
---------------

The cluster scaling mechanism is designed to let users change the number of
running instances without creating a new cluster. Users may change the number
of instances in existing Node Groups or add new Node Groups.

If a cluster fails to scale properly, all changes will be rolled back.

Swift Integration
-----------------

In order to leverage Swift within Hadoop, including using Swift data sources
from within EDP, Hadoop requires the application of a patch. For additional
information about using Swift with Sahara, including patching Hadoop and
configuring Sahara, please refer to the :doc:`hadoop-swift` documentation.

Cinder support
--------------

Cinder is a block storage service that can be used as an alternative to an
ephemeral drive. Using Cinder volumes increases the reliability of data, which
is important for the HDFS service.

Users can set how many volumes will be attached to each node in a Node Group
and the size of each volume.

All volumes are attached during Cluster creation/scaling operations.

Neutron and Nova Network support
--------------------------------

OpenStack clusters may use Nova or Neutron as a networking service. Sahara
supports both, but the networking configuration must be set explicitly when
Sahara is deployed. By default Sahara will behave as if Nova is used. If an
OpenStack cluster uses Neutron, then the ``use_neutron`` property should be
set to ``True`` in the Sahara configuration file. Additionally, if the cluster
supports network namespaces, the ``use_namespaces`` property can be used to
enable their usage.

.. sourcecode:: cfg

    [DEFAULT]
    use_neutron=True
    use_namespaces=True

.. note::
    If a user other than ``root`` will be running the Sahara server
    instance and namespaces are used, some additional configuration is
    required, please see the :doc:`advanced.configuration.guide` for more
    information.

Floating IP Management
----------------------

Sahara needs to access instances through ssh during cluster setup. To
establish this connection Sahara may use either the fixed or the floating IP
of an instance. By default the ``use_floating_ips`` parameter is set to
``True``, so Sahara will use the floating IP of an instance to connect. In
this case, the user has two options for how to make all instances get a
floating IP:

* Nova Network may be configured to assign floating IPs automatically by
  setting ``auto_assign_floating_ip`` to ``True`` in ``nova.conf``
* The user may specify a floating IP pool for each Node Group directly.

Note: When using floating IPs for management (``use_floating_ips=True``)
**every** instance in the Cluster must have a floating IP, otherwise Sahara
will not be able to work with it.

If the ``use_floating_ips`` parameter is set to ``False``, Sahara will use the
instances' fixed IPs for management. In this case the node where Sahara is
running must have access to the instances' fixed IP network. When OpenStack
uses Neutron for networking, the user will be able to choose the fixed IP
network for all instances in a Cluster.

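The corresponding setting lives in the ``[DEFAULT]`` section of the Sahara
configuration file; a minimal sketch (the value shown is simply the default):

.. sourcecode:: cfg

    [DEFAULT]
    # Use instances' floating IPs for management connections.
    # Set to False to connect over the fixed IP network instead.
    use_floating_ips=True
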
Anti-affinity
-------------

One of the problems with running Hadoop on OpenStack is that there is no
ability to control where a virtual machine actually runs. We cannot be sure
that two new virtual machines are started on different physical machines. As a
result, any replication within the cluster is not reliable, because all
replicas may end up on one physical machine. The anti-affinity feature
provides the ability to explicitly tell Sahara to run specified processes on
different compute nodes. This is especially useful for the Hadoop datanode
process, to make HDFS replicas reliable.

Starting with the Juno release, Sahara creates server groups with the
``anti-affinity`` policy to enable the anti-affinity feature. Sahara creates
one server group per cluster and assigns all instances with affected processes
to this server group. Refer to the Nova documentation on how server groups
work.

This feature is supported by all plugins out of the box.

Data-locality
-------------

It is extremely important for data processing to be done locally (on the same
rack, OpenStack compute node, or even VM) as much as possible. Hadoop supports
a data-locality feature and can schedule jobs to tasktracker nodes that are
local for the input stream. In this case the tasktracker can communicate
directly with the local data node.

Sahara supports topology configuration for HDFS and Swift data sources.

To enable data-locality, set the ``enable_data_locality`` parameter to
``True`` in the Sahara configuration file:

.. sourcecode:: cfg

    enable_data_locality=True

In this case two files with topology information must be provided to Sahara.
The ``compute_topology_file`` and ``swift_topology_file`` parameters control
the location of the files with compute and swift node topology descriptions,
respectively.

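For example, assuming the topology files are stored under ``/etc/sahara`` (the
paths below are only illustrative), the configuration could look like:

.. sourcecode:: cfg

    enable_data_locality=True
    # Illustrative locations of the topology description files
    compute_topology_file=/etc/sahara/compute.topology
    swift_topology_file=/etc/sahara/swift.topology
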
``compute_topology_file`` should contain a mapping between compute nodes and
racks in the following format:

.. sourcecode:: cfg

    compute1 /rack1
    compute2 /rack2
    compute3 /rack2

Note that the compute node name must be exactly the same as configured in
OpenStack (the ``host`` column in the admin list for instances).

``swift_topology_file`` should contain a mapping between swift nodes and
racks in the following format:

.. sourcecode:: cfg

    node1 /rack1
    node2 /rack2
    node3 /rack2

Note that the swift node name must be exactly the same as configured in the
object.builder swift ring. Also make sure that the VMs with the tasktracker
service have direct access to the swift nodes.

Hadoop versions after 1.2.0 support a four-layer topology
(https://issues.apache.org/jira/browse/HADOOP-8468). To enable this feature,
set the ``enable_hypervisor_awareness`` option to ``True`` in the Sahara
configuration file. In this case Sahara will add the compute node ID as a
second level of topology for Virtual Machines.

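As with the other topology options, this is a single flag in the Sahara
configuration file:

.. sourcecode:: cfg

    enable_hypervisor_awareness=True
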
Security group management
-------------------------

Sahara allows you to control which security groups will be used for created
instances. This can be done by providing the ``security_groups`` parameter for
the Node Group or Node Group Template. By default an empty list is used, which
results in the default security group being used.

Sahara may also create a security group for instances in a node group
automatically. This security group will only have open ports which are
required by instance processes or the Sahara engine. This option is useful for
development and for environments that are secured from outside access, but for
production environments it is recommended to control the security group policy
manually.

Heat Integration
----------------

Sahara may use the
`OpenStack Orchestration engine <https://wiki.openstack.org/wiki/Heat>`_
(aka Heat) to provision nodes for a Hadoop cluster. To make Sahara work with
Heat the following steps are required:

* Your OpenStack installation must have the 'orchestration' service up and
  running
* Sahara must contain the following configuration parameter in *sahara.conf*:

.. sourcecode:: cfg

    # An engine which will be used to provision infrastructure for Hadoop cluster. (string value)
    infrastructure_engine=heat

There is feature parity between the direct and heat infrastructure engines. It
is recommended to use the heat engine, since the direct engine will be
deprecated at some point.

Plugin Capabilities
-------------------

The table below provides a plugin capability matrix:

+--------------------------+---------+----------+----------+-------+
|                          | Plugin                                |
|                          +---------+----------+----------+-------+
| Feature                  | Vanilla | HDP      | Cloudera | Spark |
+==========================+=========+==========+==========+=======+
| Nova and Neutron network | x       | x        | x        | x     |
+--------------------------+---------+----------+----------+-------+
| Cluster Scaling          | x       | Scale Up | x        | x     |
+--------------------------+---------+----------+----------+-------+
| Swift Integration        | x       | x        | x        | N/A   |
+--------------------------+---------+----------+----------+-------+
| Cinder Support           | x       | x        | x        | x     |
+--------------------------+---------+----------+----------+-------+
| Data Locality            | x       | x        | N/A      | x     |
+--------------------------+---------+----------+----------+-------+
| EDP                      | x       | x        | x        | x     |
+--------------------------+---------+----------+----------+-------+

Running Sahara in Distributed Mode
----------------------------------

.. warning::
    Currently distributed mode for Sahara is in alpha state. We do not
    recommend using it in a production environment.

The `installation guide <installation.guide.html>`_ suggests launching Sahara
as a single 'sahara-all' process. It is also possible to run Sahara in
distributed mode, with 'sahara-api' and 'sahara-engine' processes running on
several machines simultaneously.

sahara-api works as a frontend and serves users' requests. It offloads 'heavy'
tasks to sahara-engine via an RPC mechanism. While sahara-engine may be under
load, sahara-api stays free by design and hence can respond quickly to user
queries.

If Sahara runs on several machines, the API requests can be balanced between
several sahara-api instances using a load balancer. It is not required to
balance load between different sahara-engine instances, as that will be done
automatically via a message queue.

If a single machine goes down, the others will continue serving users'
requests. Hence better scalability is achieved, and some fault tolerance as
well. Note that the proposed solution is not true High Availability. While the
failure of a single machine does not affect the work of the other machines,
all of the operations running on the failed machine will stop. For example, if
a cluster scaling operation is interrupted, the cluster will be stuck in a
half-scaled state. The cluster will probably continue working, but it will be
impossible to scale it further or run jobs on it via EDP.

To run Sahara in distributed mode, pick several machines on which you want to
run Sahara services and follow these steps:

* On each machine install and configure Sahara using the
  `installation guide <../installation.guide.html>`_
  except:

  * Do not run 'sahara-db-manage' or launch Sahara with 'sahara-all'
  * Make sure sahara.conf provides a database connection string pointing to a
    single database shared by all machines (see the example after this list).

* Run 'sahara-db-manage' as described in the installation guide,
  but only on a single (arbitrarily picked) machine.

* sahara-api and sahara-engine processes use oslo.messaging to
  communicate with each other. You need to configure it properly on
  each node (see below).

* Run sahara-api and sahara-engine on the desired nodes. On a node
  you can run both sahara-api and sahara-engine, or you can run them on
  separate nodes. It does not matter as long as they are configured
  to use the same message broker and database.

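A minimal sketch of the shared database setting mentioned above; the
connection string is purely illustrative and assumes a MySQL database named
``sahara`` reachable by every machine:

.. sourcecode:: cfg

    [database]
    # The same connection string must be used on every machine
    connection=mysql://sahara:password@db-host/sahara
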
To configure oslo.messaging, first you need to pick the driver you are going
to use. Right now three drivers are provided: Rabbit MQ, Qpid or Zmq. To use
the Rabbit MQ or Qpid driver, you will have to set up a messaging broker. The
picked driver must be supplied in ``sahara.conf`` in the
``[DEFAULT]/rpc_backend`` parameter. Use one of the following values:
``rabbit``, ``qpid`` or ``zmq``. Next you have to supply driver-specific
options.

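For example, a minimal sketch for the Rabbit MQ driver; the broker location
and credentials are illustrative, and the option names come from
oslo.messaging's rabbit driver:

.. sourcecode:: cfg

    [DEFAULT]
    rpc_backend=rabbit
    # Illustrative broker host and credentials
    rabbit_host=192.168.0.10
    rabbit_userid=sahara
    rabbit_password=secret
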
Unfortunately, right now there is no documentation describing the drivers'
configuration. The options are available only in the source code.

* For Rabbit MQ see

  * rabbit_opts variable in `impl_rabbit.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/impl_rabbit.py?id=1.4.0#n38>`_
  * amqp_opts variable in `amqp.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/amqp.py?id=1.4.0#n37>`_

* For Qpid see

  * qpid_opts variable in `impl_qpid.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/impl_qpid.py?id=1.4.0#n40>`_
  * amqp_opts variable in `amqp.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/amqp.py?id=1.4.0#n37>`_

* For Zmq see

  * zmq_opts variable in `impl_zmq.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/impl_zmq.py?id=1.4.0#n49>`_
  * matchmaker_opts variable in `matchmaker.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/matchmaker.py?id=1.4.0#n27>`_
  * matchmaker_redis_opts variable in `matchmaker_redis.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/matchmaker_redis.py?id=1.4.0#n26>`_
  * matchmaker_opts variable in `matchmaker_ring.py <https://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo/messaging/_drivers/matchmaker_ring.py?id=1.4.0#n27>`_

You can find the same options defined in ``sahara.conf.sample``. You can use
it to find the section name for each option (the matchmaker options are not
defined in ``[DEFAULT]``).