.. _swift-integration-label:

Swift Integration
=================

Hadoop and Swift integration is the essential continuation of the
Hadoop/OpenStack marriage. The key component that makes this marriage work is
the Hadoop Swift filesystem implementation. Although this implementation has
been merged into the upstream Hadoop project, Sahara maintains a version with
the most current features enabled.

* The original Hadoop patch can be found at
  https://issues.apache.org/jira/browse/HADOOP-8545

* The most current Sahara-maintained version of this patch can be found in
  the Sahara Extra repository: https://github.com/openstack/sahara-extra

* The latest compiled version of the jar for this component can be downloaded
  from http://sahara-files.mirantis.com/hadoop-swift/hadoop-swift-latest.jar

Hadoop patching
---------------

You may build the jar file yourself by choosing the latest patch from the
Sahara Extra repository and using Maven to build with the ``pom.xml`` file
provided, or you may get the latest pre-built jar from the CDN at
http://sahara-files.mirantis.com/hadoop-swift/hadoop-swift-latest.jar

You will need to put this file into the Hadoop libraries directory
(e.g. ``/usr/lib/share/hadoop/lib``) on each JobTracker and TaskTracker node
for Hadoop 1.x, or on each ResourceManager and NodeManager node for Hadoop
2.x, in the cluster.
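
Assuming a checkout of the Sahara Extra repository, the build-and-install
steps above might look like the following sketch. The module directory and
the jar name are illustrative and may differ between releases; check the
repository layout before running:

.. sourcecode:: console

    $ git clone https://github.com/openstack/sahara-extra
    $ cd sahara-extra/hadoop-swiftfs
    $ mvn package
    $ cp target/hadoop-swift-*.jar /usr/lib/share/hadoop/lib/

Remember to repeat the copy step on every node that runs Hadoop daemons, as
described above.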

Hadoop configurations
---------------------

In general, when Sahara runs a job on a cluster it will handle configuring
the Hadoop installation. In cases where a user might require more in-depth
configuration, all the data is set in the ``core-site.xml`` file on the
cluster instances using this template:

.. sourcecode:: xml

    <property>
        <name>${name} + ${config}</name>
        <value>${value}</value>
        <description>${not mandatory description}</description>
    </property>

There are two types of configs here:

1. General. The ``${name}`` in this case is ``fs.swift``. Here is the list
   of ``${config}`` values:

   * ``.impl`` - Swift FileSystem implementation. The ``${value}`` is
     ``org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem``
   * ``.connect.timeout`` - timeout for all connections, in milliseconds.
     By default: 15000
   * ``.socket.timeout`` - how long the connection waits for responses from
     servers, in milliseconds. By default: 60000
   * ``.connect.retry.count`` - connection retry count for all connections.
     By default: 3
   * ``.connect.throttle.delay`` - delay in milliseconds between bulk
     operations (delete, rename, copy). By default: 0
   * ``.blocksize`` - blocksize for the filesystem. By default: 32Mb
   * ``.partsize`` - the partition size for uploads. By default: 4608*1024Kb
   * ``.requestsize`` - request size for reads in KB. By default: 64Kb

2. Provider-specific. The patch for Hadoop supports different cloud
   providers. The ``${name}`` in this case is
   ``fs.swift.service.${provider}``.

   Here is the list of ``${config}`` values:

   * ``.auth.url`` - authorization URL
   * ``.tenant``
   * ``.username``
   * ``.password``
   * ``.domain.name`` - domains can be used to specify users who are not in
     the specified tenant
   * ``.trust.id`` - trusts are optionally used to scope the authentication
     tokens of the supplied user
   * ``.http.port``
   * ``.https.port``
   * ``.region`` - the Swift region is used when a cloud has more than one
     Swift installation. If the region parameter is not set, the first region
     from the Keystone endpoint list will be chosen. If the region parameter
     is set but not found, an exception will be thrown.
   * ``.location-aware`` - turns on location awareness. By default: false
   * ``.apikey``
   * ``.public``
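
As an illustration of how the template expands, a ``core-site.xml`` fragment
for a provider named ``sahara`` might look like the following. The auth URL,
tenant, and provider name here are placeholder values, not defaults:

.. sourcecode:: xml

    <property>
        <name>fs.swift.impl</name>
        <value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
    </property>
    <property>
        <name>fs.swift.service.sahara.auth.url</name>
        <value>http://127.0.0.1:5000/v2.0/tokens</value>
    </property>
    <property>
        <name>fs.swift.service.sahara.tenant</name>
        <value>admin</value>
    </property>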

Example
-------

For this example it is assumed that you have set up a Hadoop instance with
a valid configuration and the Swift filesystem component. Furthermore there
is assumed to be a Swift container named ``integration`` holding an object
named ``temp``, as well as a Keystone user named ``admin`` with a password
of ``swordfish``.

The following example illustrates how to copy an object to a new location in
the same container. We will use Hadoop's ``distcp`` command
(http://hadoop.apache.org/docs/r0.19.0/distcp.html) to accomplish the copy.
Note that the service provider for our Swift access is ``sahara``, and that
we will not need to specify the project of our Swift container as it will
be provided in the Hadoop configuration.

Swift paths are expressed in Hadoop according to the following template:
``swift://${container}.${provider}/${object}``. For our example source this
will appear as ``swift://integration.sahara/temp``.

Let's run the job:

.. sourcecode:: console

    $ hadoop distcp -D fs.swift.service.sahara.username=admin \
     -D fs.swift.service.sahara.password=swordfish \
     swift://integration.sahara/temp swift://integration.sahara/temp1

After that, just confirm that ``temp1`` has been created in our
``integration`` container.
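
One way to confirm the copy is to list the container with Hadoop's
filesystem shell, supplying the same credentials as before; the exact output
format depends on your Hadoop version:

.. sourcecode:: console

    $ hadoop fs -D fs.swift.service.sahara.username=admin \
     -D fs.swift.service.sahara.password=swordfish \
     -ls swift://integration.sahara/

The listing should now include both ``temp`` and ``temp1``.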

Limitations
-----------

**Note:** Please note that container names must be valid URIs.