Add spec for supporting S3-compatible object stores
bp sahara-support-s3 Change-Id: Ib9a64e5bd98803a994cfd58250b9c930b141c9ed
This commit is contained in:
parent
d5ea402718
commit
5e2c4a86b8
161
specs/pike/support-for-s3-compatible-object-stores.rst
Normal file
161
specs/pike/support-for-s3-compatible-object-stores.rst
Normal file
@ -0,0 +1,161 @@
|
||||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
==========================================
|
||||
Support for S3-compatible object stores
|
||||
==========================================
|
||||
|
||||
https://blueprints.launchpad.net/sahara/+spec/sahara-support-s3
|
||||
|
||||
Following the efforts done to make data sources and job binaries "pluggable",
|
||||
it should be feasible to introduce support for S3-compatible object stores.
|
||||
This will be an additional alternative to the existing HDFS, Swift, MapR-FS,
|
||||
and Manila storage options.
|
||||
|
||||
A previous spec regarding this topic existed around the time of Icehouse
|
||||
release, but the work has been stagnant since then:
|
||||
https://blueprints.launchpad.net/sahara/+spec/edp-data-source-from-s3
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
Hadoop already offers filesystem libraries with support for s3a:///path URIs,
|
||||
so supporting S3-compatible object stores on Sahara is a reasonable feature
|
||||
to add.
|
||||
|
||||
Within the world of OpenStack, many cloud operators choose Ceph RadosGW
|
||||
instead of Swift. RadosGW object store supports access through either
|
||||
the Swift or S3 APIs. Also, with some extra configuration, a "native"
|
||||
install of Swift can also support the S3 API. For some users we may expect
|
||||
the Hadoop S3 library to be preferable over the Hadoop Swift library as it
|
||||
has recently received several enhancements including support for larger
|
||||
objects and other performance improvements.
|
||||
|
||||
Additionally, some cloud users may wish to use other S3-compatible object
|
||||
stores, including:
|
||||
|
||||
* Amazon S3 (including AWS Public Datasets)
|
||||
* LeoFS
|
||||
* Riak Cloud Storage
|
||||
* Cloudian HyperStore
|
||||
* Minio
|
||||
* SwiftStack
|
||||
* Eucalyptus
|
||||
|
||||
It is clear that adding support for S3 datasources will open up a new world of
|
||||
Sahara use cases.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
An "s3" data source type will be added, via new code in
|
||||
*sahara.service.edp.data_sources*. We will need utilities to validate S3
|
||||
URIs, as well as to handle job configs (access key, secret key, endpoint,
|
||||
bucket URI).
|
||||
|
||||
Regarding EDP, there should not be much work to do outside of defining the new
|
||||
data source type, since the Hadoop S3 library allows jobs to be run against S3
|
||||
seamlessly.
|
||||
|
||||
Similar work will be done to enable an "s3" job binary type, including the
|
||||
writing of "job binary retriever" code.
|
||||
|
||||
While the implementation of the abstraction is simple, a lot of work comes
|
||||
from dashboard, saharaclient, documentation, and testing.
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
Do not add support for S3 as a data source for EDP. Since the Hadoop S3
|
||||
libraries are already included on the image regardless of this change,
|
||||
users can run data processing jobs against S3 manually. We still may wish
|
||||
to add the relevant JARs to the classpath as a courtesy to users.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
None
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
None (only "s3" as a valid data source type and job binary type in the schema)
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
None
|
||||
|
||||
Deployer impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
None
|
||||
|
||||
Sahara-image-elements impact
|
||||
----------------------------
|
||||
|
||||
On most images, hadoop-aws.jar needs to be added to the classpath. Generally
|
||||
images with Hadoop (or related component) installed already have the JAR. The
|
||||
work will probably take place during the transition from SIE to image packing,
|
||||
so the work will probably need to be done in both places.
|
||||
|
||||
Sahara-dashboard / Horizon impact
|
||||
---------------------------------
|
||||
|
||||
Data Source and Job Binary forms should support s3 type, with fields for
|
||||
access key, secret key, S3 URI, and S3 endpoint. Note that this is a lot
|
||||
of fields, in fact more than we have for Swift. There will probably be some
|
||||
saharaclient impact too, because of this.
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
Jeremy Freudberg
|
||||
|
||||
Other contributors:
|
||||
None
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* S3 as a data source
|
||||
* S3 as a job binary
|
||||
* Ensure presence of AWS JAR on images
|
||||
* Dashboard and saharaclient work
|
||||
* Scenario tests
|
||||
* Documentation
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
None
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
We will probably want scenario tests (although we don't have them for Manila).
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
Nothing out of the ordinary, but important to keep in mind both user and
|
||||
developer perspective.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
None
|
Loading…
Reference in New Issue
Block a user