Initial push of docs
@ -1,2 +1,96 @@
# swift-storlets
Middleware and Compute Engine for an OpenStack Swift compute framework that runs compute within a Swift cluster
Swift Storlets
Swift Storlets extend Swift with the capability to run computation near the data in a secure and isolated manner. With Swift Storlets a user can write code,
package and deploy it as a Swift object, and then explicitly invoke it on data objects as if the code were part of the Swift pipeline.
We use the term Storlet to refer to the binary code deployed as a Swift object.
Invoking a Storlet on a data object is done in an isolated manner so that the data accessible by the computation is only the object's data and its user metadata.
Moreover, the computation has no access to disks, network or to the Swift request environment.
The Swift Storlets repo provides:
* A Swift storlet middleware that can intercept a request for running a storlet over some data,
forward the data to the compute engine and stream back the compute engine results.
* A Storlet gateway API, defining the compute engine API used by the Swift storlet middleware
to invoke computations.
* A StorletGateway implementation class that implements the StorletGateway API.
This class runs in the context of the Swift storlet middleware inside the proxy and
object service pipelines, and is responsible for communicating with the compute engine, passing
it the data to be processed and getting back the result.
* The Docker based compute engine which is responsible for sandboxing the execution of the Storlet.
The Docker based compute engine is described in doc/source/docker_compute_engine.rst.
* Initial tools for managing and deploying Docker images within the Swift cluster.
The documentation in this repo is organized according to the various roles involved with Swift Storlets:
#. Storlet developer. The Storlet developer develops, packages and deploys Storlets to Swift. This is described in: doc/source/writing_and_deploying_storlets.rst
#. Storlet user. A Swift user that wishes to invoke a deployed Storlet on some data object in Swift. doc/source/invoking_storlets.rst describes how to invoke a Storlet.
#. Storlets account manager (or account manager in short). The account manager is an admin user on the customer side who is typically the one responsible for paying the
bill (and perhaps setting ACLs). From the Storlets perspective, the account manager is responsible for managing the Docker image as well as the Storlets that can be executed
on data in the account. Part of the ecosystem is giving the account manager a way to deploy a Docker image to be used for Storlets execution within that account.
doc/source/building_and_deploying_docker_images.rst has the details.
#. Swift Storlet manager. Typically, this is the Swift admin on the provider side that deals with rings and broken disks.
From the Storlets perspective (s)he is the one responsible for the below. doc/source/storlets_management.rst has the details of the provided tools to do that.
Those tools are installed on a designated node having a 'Storlet management' role (See installation section below)
* Enabling a Swift account for Storlets. Since we wanted to give a self-contained implementation, we actually provide a tool for
creating a Storlet enabled account. That is, we first create a tenant and account in Keystone, and then do the Swift related
operations for enabling the account for Storlets.
* Deploy an account's Docker image across the cluster. This allows the account admin to upload a self tailored Docker image that the Swift admin can
then deploy across the cluster. Requests for running Storlets in that account would be served by Storlets running over this account's self tailored image.
#. Swift storlet developer. Someone looking at playing with the code of the storlet middleware and the storlet gateway. If you are one of those you will be interested in:
* doc/source/dev_and_test_guide.rst
* doc/source/storlets_docker_gateway.rst
Finally, these are a MUST:
#. doc/source/storlets_installation_guide.rst
#. doc/source/storlet_all_in_one.rst
Storlets Invocation Types
Storlets can be invoked in 2 ways:
#. Invocation upon GET. In this case the user gets a transformation of the object residing in the store (as opposed to the actual object). A typical use case for GET is anonymization, where the user might not have access to certain data unless it is anonymized by some storlet.
#. Invocation upon PUT. In this case the data kept in the store is a transformation of the object uploaded by the user (as opposed to the actual uploaded data or metadata). A typical use case is metadata enrichment, where a Storlet extracts format specific metadata from the uploaded data and adds it as Swift metadata.
doc/source/storlets_installation_guide.rst describes how to install Storlets in an existing Swift cluster that uses Keystone.
For convenience we also provide a storlet all-in-one installation script that installs Swift with Keystone and Storlets in a single virtual machine.
See doc/source/storlets_all_in_one.rst
The installation is based on Ansible and was tested on Ubuntu 14.10, and with Swift 1.13 and Swift 2.2.
Once installation is completed, you can try running the system tests as described in doc/source/storlets_installation_guide.rst
The system tests are a good reference for writing and deploying a Storlet.
The purpose of this repository is to serve as a mostly read-only reference for (1) the Swift storlets middleware, and (2) a storlets gateway.
Having said that, we will fix major bugs, and may add improvements and adaptations required to stay in tune with
the Swift Storlets middleware as it evolves on its way upstream.
Given enough interest from the community this status may change to be a more active project.
Development and Testing
doc/source/dev_and_test_guide.rst describes how to develop within this repo environment. At this point the development guide is provided to allow playing with the code and running some experiments.
#. The research leading to the development of this code received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under the grant agreements for the ENSURE and VISION Cloud projects.
#. The development of this code received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under the grant agreements for the:
* ForgetIT, where the code is used for pushing down analytics jobs to the object storage
* COSMOS projects, where the code is used for TODO
#. Future development of this code would receive funding from:
* The European Community's Seventh Framework Programme (FP7/2007-2013) under the grant agreement for the FI-CORE project where the code is integrated with a holistic cloud deployment solution, and from
* the European Community's Horizon 2020 (H2020/2014-2020) under the grant agreement for the IOStack project where the code is used as a backend implementing Storage policies
@ -0,0 +1,215 @@
The Swift account manager can supply a Docker image in which the account's storlets
are to be executed. When planning the Docker image, the account manager needs to consider the following:
#. The image is built over an existing image that contains the storlets run time.
#. The image would be executed with no network devices, with limited memory and
#. A Docker image is not brought up as a general purpose Linux machine in terms
of the init process. Specifically, do not install daemons that require special
initializations on 'OS bring up'.
The idea is that a user supplied docker image would contain dependencies
required by storlets in the form of libraries.
Logical Flow
The flow of deploying an account manager tailored Docker image to Swift involves
both the account manager (customer side) and the Swift Storlet manager (provider side)
Below are the steps of this flow:
#. A prerequisite for the account manager to deploy a Docker image to Swift is having an
account that is enabled for Storlets. This is an operation that is done by the Swift Storlet
manager and is explained in storlets_management.rst.
#. Once the account is enabled for Storlets, a container named docker_images is
created, with access to both the account manager as well as the storlet manager.
That container will include a basic Docker image consisting of some Storlet
engine code that allows the Swift Storlet engine to work with that image.
#. The account manager can download this image, adjust it to his needs in terms of
the installed software stack, and upload it back to the docker_images container.
#. Once uploaded, the account manager must notify the Swift Storlet engine manager
of the update. The storlets manager would take care of testing and deploying
it to all Swift nodes. storlets_management.rst describes the provided tool
the Storlet manager can use for the actual deployment.
The sections below describe in detail the steps taken by the account manager.
Downloading the Docker Image
Downloading the Docker image involves a simple retrieval of a Swift object. To
get the exact name of the object just list the docker_images container in the
account. The name will carry the base OS system and engine language binding run
time. An example might be: ubuntu_14.04_jre7_storlets reflecting the following
#. The base OS is Ubuntu 14.04. Currently this is the only base OS we support.
#. The Storlets run time is jre7. Currently storlets can be written only in Java.
#. The storlet engine code is installed.
The image will come in a .tar format.
Below is an example of downloading the image from the tenant's docker_images
container using the swift CLI. As with all examples using the Swift CLI, we are
using environment variables defining the tenant, user credentials and auth URI.
All these are required for the operation of any Swift CLI. Please change them
export OS_USERNAME=swift
export OS_PASSWORD=passw0rd
export OS_TENANT_NAME=service
export OS_AUTH_URL=
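The same settings can also be picked up from a Python script. The following is only a minimal sketch, not code from this repo: the helper name `auth_settings_from_env` is hypothetical, and it does nothing more than collect the four variables listed above and fail early when one of them is missing or empty.

```python
import os

# The four variables the Swift CLI examples in this guide rely on.
REQUIRED_VARS = ("OS_USERNAME", "OS_PASSWORD", "OS_TENANT_NAME", "OS_AUTH_URL")


def auth_settings_from_env(environ=None):
    """Collect the Swift CLI auth variables, failing on missing or empty ones."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing a plain dictionary instead of `os.environ` makes it easy to try the helper without touching the shell environment.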
Below we show:
#. Listing the docker_images container.
#. Downloading the image object
#. Getting the image object's metadata. Pay attention to the image_name metadata
field of the object. It is required for the next steps.
eranr@lnx-ccs8:~$ swift list docker_images
eranr@lnx-ccs8:~$ swift download docker_images ubuntu_14.04_jre7_storlets.tar
ubuntu_14.04_jre7_storlets.tar [headers 0.311s, total 8.550s, 68.008 MB/s]
eranr@lnx-ccs8:~$ swift stat docker_images ubuntu_14.04_jre7_storlets.tar
Account: AUTH_305f5f3d12834be187238e080b8643e4
Container: docker_images
Object: ubuntu_14.04_jre7_storlets.tar
Content Type: application/x-tar
Content Length: 581439488
Last Modified: Sat, 25 Oct 2014 19:47:13 GMT
ETag: ac014db984be37faf7307801baa11ab0
Meta Image-Name: ubuntu_14.04_jre7_storlets
Meta Mtime: 1414266426.880534
Accept-Ranges: bytes
X-Timestamp: 1414266432.09929
X-Trans-Id: tx794e21cd40b544e6a377b-00544bfed3
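When the object is inspected programmatically rather than with `swift stat`, the "Meta Image-Name" line shown above corresponds to the `X-Object-Meta-Image-Name` response header. The sketch below shows one way to pull that value out of a headers dictionary; the helper name `image_name_from_headers` is hypothetical and the code assumes only that the headers arrive as a plain dict.

```python
def image_name_from_headers(headers):
    """Return the image_name user metadata from a Swift object's headers."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    return normalized.get("x-object-meta-image-name")
```

The function returns None when the metadata field is absent, so a caller can decide whether that is an error.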
Tuning the Docker Image
To tune the Docker image, Docker must be used. To install please refer to
The below steps illustrate the tuning process:
1. Use docker load to load the .tar image. Each Docker client maintains a local
repository of the images from which containers can be executed. The load
operation simply loads the .tar file to that local repository. Note that once
the .tar is loaded, the docker images command shows the image, whose name has
a suffix identical to the image object Swift metadata.
root@lnx-ccs8:/home/eranr# docker load -i ubuntu_14.04_jre7_storlets.tar
root@lnx-ccs8:/home/eranr# docker images
localhost:5001/ubuntu_14.04_jre7_storlets latest f6929e6abc60 3 days ago 563.6 MB
2. Use a Docker file that is based on the loaded image to make the necessary
changes to the image. Below is a Dockerfile for installing 'ffmpeg'. A few
notes are in order:
#. The first line "FROM" must carry the image name we have downloaded.
#. The maintainer needs to be a user that is allowed to do the actual actions
within the container. Please leave it as is.
#. The below example shows ffmpeg installation. For more options and
information on Dockerfiles, please refer to:
#. One MUST refrain from using the Dockerfile ENTRYPOINT and CMD. Using those
will make the image unusable by the Storlet engine.
root@lnx-ccs8:/home/eranr/dockerfile_example# cat Dockerfile
RUN ["apt-get", "update"]
RUN ["apt-get", "install","-y", "software-properties-common"]
RUN ["add-apt-repository","deb trusty main"]
RUN ["apt-key", "adv", "--recv-keys", "--keyserver", "", "1DB8ADC1CFCA9579"]
RUN ["apt-key", "update"]
RUN ["apt-get", "update"]
RUN ["apt-get", "install", "-y", "ffmpeg"]
3. We now use the Dockerfile to create a new image. Note the -t option, which
sets the name of the new image. The name of the image will be required
for the Storlet manager to deploy the Storlet. Also, note that the command
ends with a dot "." specifying the directory in which the build takes place.
When building an image that copies files into the image, all those files must
reside in that build directory.
root@lnx-ccs8:/home/eranr/dockerfile_example# docker build -t service_tenant_image .
Sending build context to Docker daemon 2.56 kB
Sending build context to Docker daemon
Step 0 : FROM
---> f6929e6abc60
Processing triggers for libc-bin (2.19-0ubuntu6.3) ...
---> 11975468ecf8
Removing intermediate container 226d2510b925
Successfully built 11975468ecf8
4. At this point listing the images, shows the newly created image.
root@lnx-ccs8:/home/eranr/dockerfile_example# docker images
service_tenant_image latest 11975468ecf8 7 minutes ago 660.1 MB
localhost:5001/ubuntu_14.04_jre7_storlets latest f6929e6abc60 4 days ago 563.6 MB
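Two of the Dockerfile rules from step 2 (FROM must name the downloaded image, and ENTRYPOINT and CMD must not be used) can be checked mechanically before building. The following is a rough sketch under simplifying assumptions: `check_dockerfile` is a hypothetical helper, it looks only at the first word of each line, and it does not understand backslash line continuations.

```python
def check_dockerfile(text, base_image):
    """Return a list of problems that would break the rules from step 2."""
    problems = []
    instructions = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        instructions.append(line.split(None, 1))
    if not instructions or instructions[0][0].upper() != "FROM":
        problems.append("first instruction must be FROM")
    elif len(instructions[0]) < 2 or instructions[0][1].strip() != base_image:
        problems.append("FROM must name the downloaded image: " + base_image)
    for parts in instructions:
        keyword = parts[0].upper()
        if keyword in ("ENTRYPOINT", "CMD"):
            problems.append(keyword + " must not be used")
    return problems
```

An empty list means neither rule was violated; it says nothing about whether the image will actually build.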
Currently, we have no testing tool that can actually test a storlet inside the
created image. The best one can do is run a Docker container based on the
image, and run within it code that simulates how the Storlet would use the image.
Below we run /bin/bash inside a container based on the newly created image.
We then invoke ffmpeg showing that the installation was indeed successful.
Note that the 'debug' parameter tells our entry point not to execute the storlet
engine but rather the /bin/bash from which we can run ffmpeg
root@lnx-ccs8:/home/eranr/dockerfile_example# docker run -i -t service_tenant_image debug /bin/bash
root@b129c3e6e76b:/# ffmpeg
ffmpeg version 1.2.6-7:1.2.6-1~trusty1 Copyright (c) 2000-2014 the FFmpeg developers
built on Apr 26 2014 18:52:58 with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
configuration: --arch=amd64 --disable-stripping --enable-avresample --enable-pthreads --enable-runtime-cpudetect --extra-version='7:1.2.6-1~trusty1' --libdir=/usr/lib/x86_64-linux-gnu --prefix=/usr --enable-bzlib --enable-libdc1394 --enable-libfreetype --enable-frei0r --enable-gnutls --enable-libgsm --enable-libmp3lame --enable-librtmp --enable-libopencv --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libspeex --enable-libtheora --enable-vaapi --enable-vdpau --enable-libvorbis --enable-libvpx --enable-zlib --enable-gpl --enable-postproc --enable-libcdio --enable-x11grab --enable-libx264 --shlibdir=/usr/lib/x86_64-linux-gnu --enable-shared --disable-static
libavutil 52. 18.100 / 52. 18.100
libavcodec 54. 92.100 / 54. 92.100
libavformat 54. 63.104 / 54. 63.104
libavdevice 53. 5.103 / 53. 5.103
libavfilter 3. 42.103 / 3. 42.103
libswscale 2. 2.100 / 2. 2.100
libswresample 0. 17.102 / 0. 17.102
libpostproc 52. 2.100 / 52. 2.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...
Use -h to get full help or, even better, run 'man ffmpeg'
Uploading the Docker Image
1. Use docker save to save the image as a tar file:
root@lnx-ccs8:/home/eranr/dockerfile_example# docker save -o service_tenant_image.tar service_tenant_image
2. Again, we use the Swift CLI to upload the image. We assume the appropriate
environment variables are in place.
root@lnx-ccs8:/home/eranr/dockerfile_example# swift upload docker_images service_tenant_image.tar
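The two upload steps can be scripted. The sketch below only builds the argument lists for the same `docker save` and `swift upload` commands shown above; `image_upload_commands` is a hypothetical helper name, and the tar file name is assumed to be the image name with a `.tar` suffix, as in the example.

```python
def image_upload_commands(image_name, container="docker_images"):
    """Return the argv lists for saving a Docker image and uploading it to Swift."""
    tar_name = image_name + ".tar"
    return [
        # Step 1: save the image from the local Docker repository to a tar file.
        ["docker", "save", "-o", tar_name, image_name],
        # Step 2: upload the tar file to the account's image container.
        ["swift", "upload", container, tar_name],
    ]
```

Each returned list can be handed to `subprocess.check_call`, provided the Swift environment variables are in place.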
@ -0,0 +1,58 @@
Development and Testing
This guide explains how to build the various components, and how to deploy them once built.
Note that some of the components being built need to be inserted to a docker image before
they can be tested. Thus, one should have an installed environment (see storlet_installation_guide.rst)
before proceeding with the test and deploy steps (which are in fact a subset of the installation steps).
The repo consists of code written in Python, Java and C. We have chosen ant to serve as a 'make' tool for all code.
The main build task in build.xml is dependent on two other build tasks:
#. build_storlets task, used above. This task builds all the sample storlets used in the system tests.
#. build engine task, used for building/packaging the following components:
#. The storlet middleware and the storlet docker gateway python code. These are built as two packages in a single 'storlets' egg:
* storlet_middleware
* storlet_gateway
#. The SBus code. This is the communication module between the gateway and the Docker container. It has a transport layer written in "C" and
'bindings' to both Java and Python.
#. The Python written storlet_factory_daemon, which is packaged for installation in a Docker image
#. The Java SDaemon code, which is the daemon code that loads the Storlets in run time. This code is compiled to a .jar that is later installed
in the Docker image.
#. The Java SCommon code, which has the Storlet interface declaration, as well as the accompanying classes appearing in the interface. This code
is compiled to a .jar that is required both in the Docker image and for building Storlets.
Two additional tasks of interest in our build.xml are the deploy_host_engine and deploy_container_engine. These tasks are based on the Ansible installation scripts and do the following:
#. deploy_host_engine would get all the code that is relevant to the host side (python middleware and SBus) and deploy it on the hosts, as described in Deploy/playbook/hosts file
#. deploy_container_engine, would create an updated image of the tenant defined in Deploy/playbook/common.yml and distribute it to all nodes as defined in Deploy/playbook/hosts. Typically, the hosts file will describe an all-in-one type of installation.
Running the Tests
Other than testing, those tests are a good reference for writing and deploying storlets.
To run the system tests follow the next steps:
#. cd to the repo root
#. run 'ant build_storlets'
#. cd to SystemTests
#. Edit the file and make sure that the following variables match the installation.
If you have used the storlets all-in-one installation, this is already taken care of.
- DEV_AUTH_IP - The IP of the Keystone authentication endpoint
- ACCOUNT - The name of the account created for Storlets
- USER_NAME - The user name created for the account
- PASSWORD - The above user's password
#. run 'python'
@ -0,0 +1,21 @@
Docker Compute Engine
The Docker compute engine makes use of Docker containers to sandbox execution of Storlets. The engine is designed to reuse Docker containers and processes within them to minimize the latency of Storlets invocations.
To facilitate multi-tenancy the engine holds a Docker image per Swift account. The engine is made of the following components as depicted in the figure below:
#. The Storlet middleware running the StorletsDockerGateway class implementing the StorletsGateway interface.
#. The Storlet daemon. A Java based generic daemon that when spawned, loads dynamically the Storlet code, initiates a thread pool to process requests and listens for invoke commands coming from the StorletDockerGateway class.
#. The daemon factory. This is the Docker container main process that manages the Storlet daemons lifetime. The factory listens for commands coming from the StorletDockerGateway class.
#. SBus. The Storlets bus. A communication mechanism based on unix domain sockets used to pass commands and file descriptors from the StorletDockerGateway to the factory and Storlet daemons. There are two types of SBus instances:
#. SBus connecting the StorletsDockerGateway to the daemon factory. There is an instance of this type of SBus per account's Docker container.
#. SBus connecting the StorletsDockerGateway to each Storlet daemon.
.. image:: engine.png
:height: 800
:width: 1476
:scale: 50
:alt: alternate text
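To make the SBus idea more concrete, the sketch below shows the general mechanism it builds on: sending a short command together with an open file descriptor over a Unix domain socket. This is not the SBus code and does not follow its wire format; the function names are hypothetical, and the example assumes Python 3 on a Unix-like system.

```python
import array
import os
import socket


def send_command_with_fd(sock, command, fd):
    """Send a text command plus one file descriptor as ancillary data."""
    fds = array.array("i", [fd])
    sock.sendmsg([command.encode("utf-8")],
                 [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def receive_command_with_fd(sock, max_len=1024):
    """Receive a text command and the file descriptor that came with it."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(max_len, socket.CMSG_LEN(fds.itemsize))
    for level, kind, payload in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before decoding the descriptors.
            fds.frombytes(payload[:len(payload) - (len(payload) % fds.itemsize)])
    return data.decode("utf-8"), fds[0]


def demo():
    """Pass the read end of a pipe across a socket pair and read through it."""
    gateway_end, daemon_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()
    try:
        send_command_with_fd(gateway_end, "invoke", read_fd)
        command, received_fd = receive_command_with_fd(daemon_end)
        os.write(write_fd, b"Some content to copy")
        os.close(write_fd)
        payload = os.read(received_fd, 1024)
        os.close(received_fd)
        return command, payload
    finally:
        os.close(read_fd)
        gateway_end.close()
        daemon_end.close()
```

In `demo`, one end of the socket pair stands in for the gateway and the other for a daemon; the receiver reads the data through a descriptor it never opened itself, which is the property that lets a sandboxed process work on an object without any access to disks or network.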
@ -0,0 +1,208 @@
Storlets Invocation
Once the storlet and its dependencies are deployed the storlet is ready for execution,
and can be invoked. Storlets can be invoked as part of a GET or a PUT request.
Invocation via PUT and GET involves adding an extra header to the Swift original
PUT/GET requests. Following our Identity Storlet example, here are invocation
examples. This time the examples make use of the python swift client.
Invocation on GET
The code below shows the invocation. Some notes:
#. There are invocations with and without a parameter controlling whether the
get42 binary dependency is to be called. Note the difference in the response
headers where one shows the execution result and the other does not.
#. Note the X-Run-Storlet header being added to the call.
#. Note the X-Storlet-Generate-Log header that causes a log file to be created.
The execution results below show the log retrieval.
from swiftclient import client as c

def get_processed_object(url, token, storlet_name, container_name, object_name, invoke_get42 = False):
    # X-Run-Storlet names the storlet to run; X-Storlet-Generate-Log asks for a log object.
    headers = {'X-Run-Storlet': storlet_name,
               'X-Storlet-Generate-Log' : 'True'}
    # The 'execute=true' query parameter tells the storlet to call the get42 dependency.
    if invoke_get42:
        querystring = 'execute=true'
    else:
        querystring = None
    response_headers, object_content = c.get_object(url, token,
                                                    container_name,
                                                    object_name,
                                                    query_string = querystring,
                                                    headers = headers)
    print response_headers
    print object_content

AUTH_IP = ''
AUTH_PORT = '5000'
ACCOUNT = 'service'
USER_NAME = 'swift'
PASSWORD = 'passw0rd'
os_options = {'tenant_name': ACCOUNT}
url, token = c.get_auth("http://" + AUTH_IP + ":" + AUTH_PORT + "/v2.0", ACCOUNT +":"+USER_NAME, PASSWORD, os_options = os_options, auth_version="2.0")
print 'Identity Storlet invocation without calling get42'
get_processed_object(url, token, 'identitystorlet-1.0.jar', 'myobjects', 'source.txt')
print 'Identity Storlet invocation instructing to call get42'
get_processed_object(url, token, 'identitystorlet-1.0.jar', 'myobjects', 'source.txt', True)
Here is the result of running the above Python script:
eranr@lnx-ccs8:/tmp$ python
Identity Storlet invocation without calling get42
'x-object-meta-x-object-meta-testkey': 'tester',
'transfer-encoding': 'chunked',
'accept-ranges': 'bytes',
'x-object-meta-testkey': 'tester',
'last-modified': 'Tue, 30 Sep 2014 22:07:42 GMT',
'etag': '8ca2a24dbd9779d462c66866c0fb90c3',
'x-timestamp': '1412114861.90504',
'x-trans-id': 'tx464a488a618e44b5b763d-00542baa25',
'date': 'Wed, 01 Oct 2014 07:15:50 GMT',
'x-object-meta-type': 'SBUS_FD_INPUT_OBJECT',
'content-type': 'application/octet-stream'
Some content to copy
Identity Storlet invocation instructing to call get42
'x-object-meta-execution result': '42',
'x-object-meta-x-object-meta-testkey': 'tester',
'transfer-encoding': 'chunked',
'accept-ranges': 'bytes',
'x-object-meta-testkey': 'tester',
'last-modified': 'Tue, 30 Sep 2014 22:07:42 GMT',
'etag': '8ca2a24dbd9779d462c66866c0fb90c3',
'x-timestamp': '1412114861.90504',
'x-trans-id': 'tx12a4f2a168804dcabf8fc-00542baa26',
'date': 'Wed, 01 Oct 2014 07:15:50 GMT',
'x-object-meta-type': 'SBUS_FD_INPUT_OBJECT',
'content-type': 'application/octet-stream'
Some content to copy
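The difference between the two responses is easier to see when the header dictionaries are compared directly. The following sketch is only an illustration with a hypothetical helper name, `header_difference`; it lists the headers that appear in one response but not in the other and ignores headers whose values merely differ, such as the transaction id.

```python
def header_difference(first, second):
    """Return the headers that are present in only one of two responses."""
    only_first = {key: first[key] for key in first if key not in second}
    only_second = {key: second[key] for key in second if key not in first}
    return only_first, only_second
```

Applied to the two outputs above, the only header unique to the second response is 'x-object-meta-execution result', which carries the exit code of get42.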
We now show a download of the log file generated per the X-Storlet-Generate-Log header.
Again, we use the swift client assuming we have the appropriate environment variables in place.
Note that the log reflects the two invocations done above.
eranr@lnx-ccs8:/tmp$ swift download storletlog identitystorlet.log
identitystorlet.log [headers 0.243s, total 0.243s, 0.001 MB/s]
eranr@lnx-ccs8:/tmp$ cat identitystorlet.log
About to invoke storlet
IdentityStorlet Invoked
Storlet invocation done
About to invoke storlet
IdentityStorlet Invoked
Exec = /home/swift/identitystorlet/get42
Exit code = 42
Storlet invocation done
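A log like the one above can be split into one group of lines per invocation. The sketch below assumes the log format is exactly what this single example shows, with "About to invoke storlet" and "Storlet invocation done" as delimiters; that format is inferred from the output here and is not a documented contract, and `split_invocations` is a hypothetical helper.

```python
def split_invocations(log_text):
    """Group log lines into one list per storlet invocation."""
    invocations = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if line == "About to invoke storlet":
            current = []  # a new invocation starts here
        elif line == "Storlet invocation done":
            if current is not None:
                invocations.append(current)
            current = None
        elif current is not None and line:
            current.append(line)
    return invocations
```

For the log shown above this yields two groups, and only the second contains the "Exec" and "Exit code" lines produced by the get42 call.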
Invocation on PUT
The code below shows the invocation. Some notes:
#. As with the GET example, there are invocations with and without a parameter controlling whether the get42 binary dependency is to be called. After each PUT we do a GET and print the response headers to show the difference between the invocations. See below.
#. As with the GET example we add the X-Run-Storlet header.
#. This time we do not add the X-Storlet-Generate-Log header, which is the recommended way, as it saves the creation of an object.
import random
import string
from swiftclient import client as c

def put_processed_object(url, token, storlet_name, container_name, object_name, file_name_to_upload, invoke_get42 = False):
    if invoke_get42:
        querystring = 'execute=true'
    else:
        querystring = None
    # Random user metadata lets us tell the two uploads apart in the GET response.
    random_md = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(32))
    headers = {'X-Run-Storlet': storlet_name, 'X-Object-Meta-Testkey' : random_md}
    with open(file_name_to_upload, 'r') as fileobj:
        c.put_object(url, token, container_name, object_name, fileobj,
                     headers = headers,
                     query_string = querystring)
    # Read the object back to show the headers stored by the PUT.
    resp_headers, saved_content = c.get_object(url, token, container_name, object_name)
    print resp_headers

AUTH_IP = ''
AUTH_PORT = '5000'
ACCOUNT = 'service'
USER_NAME = 'swift'
PASSWORD = 'passw0rd'
os_options = {'tenant_name': ACCOUNT}
url, token = c.get_auth("http://" + AUTH_IP + ":" + AUTH_PORT + "/v2.0", ACCOUNT +":"+USER_NAME, PASSWORD, os_options = os_options, auth_version="2.0")
print 'Identity Storlet invocation without calling get42'
put_processed_object(url, token, 'identitystorlet-1.0.jar', 'myobjects', 'source.txt', '/tmp/source.txt')
print 'Identity Storlet invocation instructing to call get42'
put_processed_object(url, token, 'identitystorlet-1.0.jar', 'myobjects', 'source.txt', '/tmp/source.txt' , True)
Here is the result of running the above Python script:
eranr@lnx-ccs8:/tmp$ python
Identity Storlet invocation without calling get42
'content-length': '1024',
'x-object-meta-x-object-meta-testkey': '1185FZ5FPQ1WXS9IDT4TZZB6GYAQQ0WL',
'accept-ranges': 'bytes',
'x-object-meta-testkey': '1185FZ5FPQ1WXS9IDT4TZZB6GYAQQ0WL',
'last-modified': 'Wed, 01 Oct 2014 07:48:56 GMT',
'etag': '7575c5b098f45ccabce1c3f7fc906eb9',
'x-timestamp': '1412149735.87168',
'x-trans-id': 'tx9a27ba91bee34a8ca9f0c-00542bb1e7',
'date': 'Wed, 01 Oct 2014 07:48:55 GMT',
'x-object-meta-type': 'SBUS_FD_INPUT_OBJECT',
'content-type': 'text/plain'
Identity Storlet invocation instructing to call get42
'x-object-meta-execution result': '42',
'content-length': '1024',
'x-object-meta-x-object-meta-testkey': '54YA1EDTTODMBUJOYCHEGSOQQPV0180L', // This looks like a bug
'accept-ranges': 'bytes',
'x-object-meta-testkey': '54YA1EDTTODMBUJOYCHEGSOQQPV0180L',
'last-modified': 'Wed, 01 Oct 2014 07:48:56 GMT',
'etag': '7575c5b098f45ccabce1c3f7fc906eb9',
'x-timestamp': '1412149735.97100',
'x-trans-id': 'txde8619a966c14b0c99d97-00542bb1e8',
'date': 'Wed, 01 Oct 2014 07:48:56 GMT',
'x-object-meta-type': 'SBUS_FD_INPUT_OBJECT',
'content-type': 'text/plain'
@ -0,0 +1,45 @@
The StorletDockerGateway implements the StorletsGatewayBase API, which is called by the storlet middleware. The API is defined as follows:
validates correctness of the storlet name as well as mandatory headers
req the Swift request
def validateStorletUpload(req)
Checks that access to the container / object is authorized
req the Swift request
def authorizeStorletExecution(req)
Checks that access to the container / object is authorized
req the Swift request
def augmentStorletRequest(req)
Invoke the PUT proxy implementation of the gateway
req the Swift request as received from client
container the targeted container
obj the targeted object
def gatewayProxyPutFlow(req, container,obj)
Checks that access to the container / object is authorized
req the Swift request
container the targeted container
obj the targeted object
orig_resp this is the Swift response of the plain GET request applied to the targeted object (that is, without Storlet invocation)
def gatewayObjectGetFlow(req, container, obj, orig_resp)
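As a reading aid, the API above can be pictured as a Python base class with a trivial subclass. This is only a sketch under assumptions: the method names and parameters follow the listing above, but the `self` parameter, the return values, and the `RecordingGateway` class are hypothetical and are not taken from the repo.

```python
class StorletsGatewayBase(object):
    """Interface sketch; method names follow the API listed above."""

    def validateStorletUpload(self, req):
        raise NotImplementedError

    def authorizeStorletExecution(self, req):
        raise NotImplementedError

    def augmentStorletRequest(self, req):
        raise NotImplementedError

    def gatewayProxyPutFlow(self, req, container, obj):
        raise NotImplementedError

    def gatewayObjectGetFlow(self, req, container, obj, orig_resp):
        raise NotImplementedError


class RecordingGateway(StorletsGatewayBase):
    """Hypothetical test double that records calls and passes data through."""

    def __init__(self):
        self.calls = []

    def validateStorletUpload(self, req):
        self.calls.append("validateStorletUpload")

    def authorizeStorletExecution(self, req):
        self.calls.append("authorizeStorletExecution")

    def augmentStorletRequest(self, req):
        self.calls.append("augmentStorletRequest")

    def gatewayProxyPutFlow(self, req, container, obj):
        self.calls.append("gatewayProxyPutFlow")
        return req  # assumed: hand the request on unchanged

    def gatewayObjectGetFlow(self, req, container, obj, orig_resp):
        self.calls.append("gatewayObjectGetFlow")
        return orig_resp  # assumed: return the plain GET response untouched
```

A double like this can help exercise middleware logic without a Docker container, since every gateway call is recorded and the data passes through unmodified.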
@ -0,0 +1,110 @@
Storlets Installation
Storlets installation (and build) assume an existing Swift cluster that works with Keystone.
The installation consists of the following components:
#. A set of scripts to assist the Storlets admin. This admin represents the provider, and is responsible for the Storlet enabled accounts and their image management.
Those scripts make use of a private Docker registry as well as a designated account and containers used to keep Storlet management related state.
#. A Docker engine that is installed on all proxy and object nodes.
#. The Swift storlet middleware and gateway that are installed on all proxy and object nodes.
#. A basic storlets enabled Docker image that is added to the private Docker registry.
#. A default account, enabled for Storlets with a default image deployed on all proxy and object nodes. This image is based on the basic storlets enabled image.
The installation scripts take two input files:
#. A 'hosts' file describing the nodes on which the installation takes place. This is a standard Ansible hosts file that needs to have the following sections (an example is given below).
#. docker. The node to be installed with a private Docker registry
#. storlet-mgmt. The node to be installed with the Storlet management scripts
#. storlet-proxy. The list of the Swift cluster proxies
#. storlet-storage. The list of the Swift cluster object servers
#. Root or sudoer credentials that Ansible can use to ssh to the machines. In the example below we assume all nodes have the same credentials.
#. An Ansible var file with various inputs, such as the Keystone IP and credentials, the Storlet management account information, etc. The file is Deploy/playbook/common.yml, and we give below the entries of interest that may need editing.
At a high level the installation consists of the following steps:
#. Install Docker on all nodes.
#. Install the storlets middleware on each of the Swift nodes (proxies and storage nodes).
#. Create a tenant enabled for storlets (assumes Keystone).
#. Deploy a default Docker image for the above tenant.
#. Install a set of storlets management scripts. Done on a designated node having a storlet management role.
hosts file example
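[docker]
<docker registry host> ansible_ssh_user=root ansible_ssh_pass=passw0rd
[storlet-mgmt]
<storlet management host> ansible_ssh_user=root ansible_ssh_pass=passw0rd
[storlet-proxy]
<proxy host> ansible_ssh_user=root ansible_ssh_pass=passw0rd
[storlet-storage]
<storage host> ansible_ssh_user=root ansible_ssh_pass=passw0rd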
A few notes:
#. For an 'all-in-one' type of installation, one can specify in all sections.
#. If all hosts have the same ssh user and password one can use ansible's group_vars/all
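For larger clusters it can be convenient to generate the hosts file instead of writing it by hand. The sketch below renders the four sections described above in the standard Ansible INI layout; `render_hosts_file` is a hypothetical helper, the host addresses are whatever the caller supplies, and putting the credentials on every line mirrors the example rather than a recommended practice.

```python
# Section names as described in the installation guide above.
SECTIONS = ("docker", "storlet-mgmt", "storlet-proxy", "storlet-storage")


def render_hosts_file(hosts_by_section, ssh_user, ssh_password):
    """Render an Ansible INI inventory with the four required sections."""
    lines = []
    for section in SECTIONS:
        hosts = hosts_by_section.get(section)
        if not hosts:
            raise ValueError("no hosts given for section: " + section)
        lines.append("[" + section + "]")
        for host in hosts:
            lines.append("%s ansible_ssh_user=%s ansible_ssh_pass=%s"
                         % (host, ssh_user, ssh_password))
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```

For an all-in-one installation, the same single host is passed for every section.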
Below are the entries of interest of common.yml
# A cross nodes directory for Storlets internal usage. Must exist with the same name in all proxy and storage nodes.
lxc_device: /home/lxc_device
# A pointer to this repo
storlet_source_dir: <need to point to the repo root>
# Swift Access information. The below IP should be an IP of one of the proxies.
swift_public_url: http://{{ swift_endpoint_host }}:80/v1
# Keystone access information
keystone_admin_url: http://{{ keystone_endpoint_host }}:35357/v2.0
keystone_public_url: http://{{ keystone_endpoint_host }}:5000/v2.0
keystone_admin_token: ADMIN
keystone_admin_password: passw0rd
# Information for creating an account for the Storlet manager
storlet_management_account: storlet_management
storlet_management_admin_username: storlet_manager
storlet_manager_admin_password: storlet_manager
# Information for creating a Storlet enabled account
storlets_default_tenant_name: service
storlets_default_tenant_user_name: swift
storlets_default_tenant_user_password: passw0rd
To perform the installation follow these 3 steps:
#. Create a hosts file as described above.
#. Edit the file Deploy/playbook/common.yml according to the above.
#. Under Deploy/playbook/ run 'ansible-playbook -i <hosts file> storlet.yml'.
If the hosts file has the credentials of a sudoer user, you will need to run: 'ansible-playbook -s -i <hosts file> storlet.yml'
Tip: you might want to "export ANSIBLE_HOST_KEY_CHECKING=False" before running the playbook in case the hosts are not in known_hosts.
Note: The hosts file used for running the playbook is also used by the admin tool to deploy future images. Thus, the ssh information kept in
this file must also apply when used from the storlet-mgmt node.
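The two playbook invocations above can be wrapped in a small helper. This is only a sketch: the function names `build_playbook_command` and `run_playbook` are hypothetical, and the flags are the ones quoted in the steps above.

```python
import os
import subprocess


def build_playbook_command(hosts_file, playbook="storlet.yml", use_sudo=False):
    """Return the ansible-playbook argument list described in the steps above."""
    cmd = ["ansible-playbook"]
    if use_sudo:
        # -s is used when the hosts file holds sudoer (non-root) credentials
        cmd.append("-s")
    cmd += ["-i", hosts_file, playbook]
    return cmd


def run_playbook(hosts_file, playbook_dir="Deploy/playbook", use_sudo=False,
                 skip_host_key_checking=False):
    """Run the playbook from playbook_dir and return its exit code."""
    env = dict(os.environ)
    if skip_host_key_checking:
        # Same effect as: export ANSIBLE_HOST_KEY_CHECKING=False
        env["ANSIBLE_HOST_KEY_CHECKING"] = "False"
    cmd = build_playbook_command(hosts_file, use_sudo=use_sudo)
    return subprocess.call(cmd, cwd=playbook_dir, env=env)
```

Because the command runs with `playbook_dir` as its working directory, `hosts_file` should be an absolute path or a path relative to that directory.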
@ -0,0 +1,179 @@
The Storlet manager operations currently include:
#. Creating a Storlet enabled tenant.
#. Deploying an image that was created by the tenant admin as described in building_and_deploying_docker_images.rst
The scripts providing these operations are located under /opt/ibm on the storlet management machine.
Creating a Storlet enabled Tenant
The operation of creating a Storlet enabled tenant is made of the following steps:
#. Create a new tenant in Keystone, together with a tenant admin user.
#. Enable the corresponding Swift account for storlets, including the creation of the Storlet specific containers
whose default names are: storlet, dependency, storletlog and docker_images
#. Upload the default Storlets image to the account's docker_images container.
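The container creation step can be illustrated in a few lines of Python. This is a sketch, not the actual script (which uses Ansible): `create_storlet_containers` is a hypothetical helper, and `put_container` is assumed to be a callable with the shape of `swiftclient.client.put_container(url, token, container)`. Tenant creation in Keystone and enabling the account for storlets are not covered.

```python
# Default container names listed in the steps above
STORLET_CONTAINERS = ("storlet", "dependency", "storletlog", "docker_images")


def create_storlet_containers(put_container, url, token,
                              names=STORLET_CONTAINERS):
    """Create each storlet container by calling put_container(url, token, name)."""
    created = []
    for name in names:
        put_container(url, token, name)
        created.append(name)
    return created
```

Passing the client call in as a parameter keeps the sketch independent of any particular client library; with python-swiftclient installed one could pass `swiftclient.client.put_container`.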
Running the creation task
The script performs all of the above operations.
Underneath, the script uses Ansible.
The script takes 3 parameters:
#. The tenant name to create
#. The user name for the account manager
#. The password for the account manager
Note that the script is aware of the Keystone admin credentials as they
were provided to the initial installation script as described in storlets_installation_guide.rst.
Below is a sample invocation:
root@lnx-ccs8:/opt/ibm# ./
./ <tenant_name> <user_name> <user_password>
root@lnx-ccs8:/opt/ibm# ./ new_tenant new_tenant_admin passw0rd
PLAY [localhost] **************************************************************
GATHERING FACTS ***************************************************************
ok: [localhost]
TASK: [get_hosts_object | get hosts object] ***********************************
changed: [localhost]
PLAY RECAP ********************************************************************
localhost : ok=2 changed=1 unreachable=0 failed=0
PLAY [storlet-mgmt] ***********************************************************
GATHERING FACTS ***************************************************************
ok: [localhost]
TASK: [add_new_tenant | create new tenant new_tenant] *************************
changed: [localhost]
TASK: [add_new_tenant | create new user new_tenant_admin for tenant new_tenant] ***
changed: [localhost]
TASK: [add_new_tenant | apply role admin to user new_tenant_admin] ************
changed: [localhost]
TASK: [add_new_tenant | Set account metadata in swift -- enable storlets] *****
changed: [localhost]
TASK: [add_new_tenant | put account container log] ****************************
changed: [localhost]
TASK: [add_new_tenant | put account container storlet] ************************
changed: [localhost]
TASK: [add_new_tenant | put account container dependency] *********************
changed: [localhost]
TASK: [add_new_tenant | put account container docker_images] ******************
changed: [localhost]
TASK: [add_new_tenant | save default storlet docker image as tar file] ********
changed: [localhost]
TASK: [add_new_tenant | upload docker image to docker_images container] *******
changed: [localhost]
TASK: [add_new_tenant | remove storlet docker image tar file] *****************
changed: [localhost]
PLAY RECAP ********************************************************************
localhost : ok=12 changed=11 unreachable=0 failed=0
Deploying a Tenant Image
Recall that in the Docker image build (described in building_and_deploying_docker_images.rst) the image was given a name
(specified after -t in the docker build command) and was uploaded as a .tar file to the tenant's docker_images Swift container.
When deploying an image, the Storlet's admin needs to provide the tenant name, the .tar object name and the image name.
Running the deployment task
Following the example from the build image instructions, the image is named service_tenant_image
and the uploaded object is named service_tenant_image.tar, and so we execute:
root@lnx-ccs8:/opt/ibm# ./
./ <tenant_name> <tar_object_name> <tenant_image_name>
root@lnx-ccs8:/opt/ibm# ./ new_tenant service_tenant_image.tar service_tenant_image
PLAY [localhost] **************************************************************
GATHERING FACTS ***************************************************************
ok: [localhost]
TASK: [get_hosts_object | get hosts object] ***********************************
changed: [localhost]
PLAY RECAP ********************************************************************
localhost : ok=2 changed=1 unreachable=0 failed=0
PLAY [storlet-mgmt] ***********************************************************
GATHERING FACTS ***************************************************************
ok: [localhost]
TASK: [push_tenant_image | Get the tenant id from Keystone] *******************
changed: [localhost]
TASK: [push_tenant_image | get image tar file] ********************************
changed: [localhost]
TASK: [push_tenant_image | load image to local docker registry] ***************
changed: [localhost]
TASK: [push_tenant_image | create the tenant specific docker image step 1 - create repo dir] ***
changed: [localhost]
TASK: [push_tenant_image | create the tenant specific docker image step 2 - create Docker file] ***
changed: [localhost]
TASK: [push_tenant_image | create the tenant specific docker image step 3 - copy tenant_id file to build dir] ***
changed: [localhost]
TASK: [push_tenant_image | Build the image {{tenant_id.stdout_lines[0]}}] *****
changed: [localhost]
TASK: [push_tenant_image | Push the image to the global registry] *************
changed: [localhost]
TASK: [push_tenant_image | remove storlet docker image tar file] **************
changed: [localhost]
PLAY RECAP ********************************************************************
localhost : ok=10 changed=9 unreachable=0 failed=0
PLAY [storlet] ****************************************************************
GATHERING FACTS ***************************************************************
ok: [localhost]
TASK: [pull_tenant_image | Get the tenant id from Keystone] *******************
changed: [localhost]
TASK: [pull_tenant_image | docker pull] ***************************************
changed: [localhost]
PLAY RECAP ********************************************************************
localhost : ok=3 changed=2 unreachable=0 failed=0
Testing the deployment
Once deployed, all swift nodes should have the image. A docker images command should show a newly created image having a name of the form <repository>:<port>/<tenant keystone id> as shown below.
root@lnx-ccs8:/opt/ibm# docker images
localhost:5001/e0d4204e4e7c4c079a58f0b8156a921b latest 138e3c6a0b07 3 minutes ago 596.8 MB
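The check above can also be scripted by scanning the output of `docker images` for names of the expected form. The sketch below is an illustration only: `find_tenant_images` is a hypothetical helper, and it assumes the tenant Keystone id is a 32-character lowercase hex string, as in the sample line.

```python
import re

# <repository>:<port>/<tenant keystone id>; the id is assumed to be 32 hex chars
IMAGE_NAME_RE = re.compile(
    r"^(?P<repository>[^:/\s]+):(?P<port>\d+)/(?P<tenant_id>[0-9a-f]{32})$")


def find_tenant_images(docker_images_output):
    """Return (repository, port, tenant_id) for each matching image line."""
    found = []
    for line in docker_images_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The image name is the first whitespace-separated column
        match = IMAGE_NAME_RE.match(fields[0])
        if match:
            found.append((match.group("repository"),
                          int(match.group("port")),
                          match.group("tenant_id")))
    return found
```

The text passed in would typically be captured from running `docker images` on each Swift node.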
@ -0,0 +1,350 @@
Currently, Storlets must be written in Java. Writing a Storlet involves
implementing a single method interface and following some simple rules and best
practices described below.
Once the Storlet is written and tested it can be uploaded as an object to a
designated container (called 'storlet' by default). In addition, if the
Storlet depends on a Java library, that library can be uploaded as a
dependency of the Storlet. It is assumed that Storlet dependencies are small
(on the order of a few MBs); heavier dependencies should be part of the Docker
image. We describe below how to deploy a storlet and its dependencies.
Once uploaded, the Storlet can be invoked in several ways (depending on
how the Storlet is written):
#. As part of a GET, where the object appearing in the GET request is the
storlet's input and the response to the GET request is the storlet's output.
The Storlet is also provided with the object's user metadata, and can output
user metadata.
#. As part of a PUT, where the request body is the storlet's input, and the
actual object saved in Swift is the storlet's output. The object being written
to the store is taken from the URI of the user's PUT request. As with the GET
case, the Storlet is provided with the user metadata (extracted from the request headers)
and can output user metadata.
To write a storlet you will need the SCommon.jar, which is built as part of
the Storlets build process (see dev_and_test_guide.rst). Import the .jar into a Java
project in Eclipse and implement the interface.
The interface has a single method that looks like this:
public void invoke(ArrayList<StorletInputStream> inStreams,
                   ArrayList<StorletOutputStream> outStreams,
                   Map<String,String> parameters, StorletLogger logger) throws StorletException;
Here is a class diagram illustrating the classes involved in the above API.
.. image:: SCommonClassDiagram.png
   :height: 1500px
   :width: 1900px
   :scale: 50 %
   :alt: Programming Model Class Diagram
   :align: center
#. The StorletInputStream is used to stream an object's data into the storlet.
An instance of the class is provided whenever the Storlet gets an object as
an input. In practice, it is used in the GET and PUT scenarios to
stream in the object's data and metadata. To consume the data, call getStream()
to get a stream on which you can just read(). To consume the
metadata, call the getMetadata() method.
#. The StorletOutputStream is a base class for the StorletObjectOutputStream.
The actual instance received by the storlet will always be a StorletObjectOutputStream.
The base class served a type of invocation that was removed in the interest
of code simplicity.
#. StorletObjectOutputStream. In the PUT and GET scenarios the storlet is
called with an instance of this class.
- Use the setMetadata method to set the Object's metadata.
- Use getStream to get a stream on which you can just write()
the content of the object.
- Notice that setMetadata must be called. Also it must be called before
writing the data. Additional guidelines on using StorletObjectOutputStream
are given below.
#. StorletLogger. The StorletLogger class supports a single method called emitLog,
which accepts only a String. Each invocation of the storlet would result in
a newly created object that contains the emitted logs. This object is located
in a designated container called storletlog by default, and will carry the name
<storlet_name>.log. Creating an object containing the logs per request has its
overhead. Thus, the actual creation of the logs object is controlled by a header
supplied during storlet invocation. More information is given in invoking_storlets.rst
When invoked via the Swift GET REST API the invoke method
will be called as follows:
#. The inStreams array would include a single element of type StorletInputStream
representing the object appearing in the request's URI.
#. The outStreams would include a single element of type StorletObjectOutputStream
representing the response returned to the user. Anything written to the output
stream is effectively written to the response body returned to the user's GET request.
#. The parameters map includes execution parameters sent. These parameters can be
specified in the storlet execution request. See invoking_storlets.rst
**IMPORTANT: Do not use parameters that start with 'storlet_'; these are
reserved for system parameters that the storlet can use.**
#. A StorletLogger instance.
When invoked via the Swift PUT REST API, the invoke method will be called as follows:
#. The inStreams array would include a single element of type StorletInputStream
representing the object to read.
#. The outStreams would include a single element which is an instance of
StorletObjectOutputStream. Metadata and data written using this object will
make it to the store, under the name provided in the original request URI.
#. The parameters, and StorletLogger as in the GET call.
Storlet Writing Guidelines
Below are some guidelines to writing a Storlet. Some of them are musts, some are
recommendations, and some are tips.
#. The Storlet code must be thread safe and reentrant. The invoke method will
be called many times and potentially in parallel.
#. Once the storlet has finished writing the response, it is important to close
the output stream. Failing to do so will result in a timeout. Specifically,
close the stream obtained from the call to getStream().
#. With the current implementation, a storlet must start to respond within 40
seconds of invocation. Otherwise, Swift would time out. Moreover, the Storlet
must output something every 40 seconds so as not to time out. This is a
mechanism to ensure that the Storlet code does not get stuck. Note that
outputting an empty string does not reset the 40
seconds timeout.
#. For StorletObjectOutputStream, the call to setMetadata must happen before the
storlet starts streaming out the output data. Note the applicability of the 40
seconds timeout here as well.
#. The total size of metadata given to setMetadata (when serialized as a string)
should not exceed 4096 bytes.
#. While Swift uses the prefix X-Object-Meta to specify that a certain header
reflects a metadata key, the key itself should not begin with that prefix.
More specifically, metadata keys passed to setMetadata should not have that
prefix (unless it is really part of the key).
#. Storlets are tailored for stream processing, that is, processing the input as it
is read and producing output while still reading. In other words, a 'merge sort'
of the content of an object is not a good example for a storlet, as it requires
reading all the content into memory (random reads are not an option as the
input is provided as a stream). While we currently do not impose any restrictions
on the CPU usage or memory consumption of the storlet, reading a large object
into memory or doing very intensive computations would impact the overall
system performance.
#. While this might be obvious, it is advisable to test the storlet prior to its deployment.
#. The storlets are executed in an open-jdk 7 environment. Thus, any dependencies
that the storlet code requires which are outside of open-jdk 7 should be
stated as storlet dependencies and uploaded with the storlet. Exact details
are found in the deployment section below.
#. In some cases the Storlet may need to know the path where the storlet .jar
as well as the dependencies are kept inside the Linux container. One reason
may be the need to invoke a binary dependency. To get that path use the
following code:
// Get the path of this class image
String strJarPath = StorletUtils.getClassFolder(this.getClass());
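The two metadata guidelines above (the 4096-byte limit and the X-Object-Meta prefix) can be checked before a storlet is written. The following Python sketch only illustrates the rules; it is not part of the storlet API. `prepare_metadata` is a hypothetical helper, JSON is assumed as a stand-in for the engine's actual serialization, and the prefix is assumed never to be a genuine part of a key.

```python
import json

SWIFT_META_PREFIX = "x-object-meta-"
MAX_METADATA_BYTES = 4096  # limit stated in the guidelines above


def prepare_metadata(metadata):
    """Strip the Swift header prefix from keys and enforce the size limit."""
    cleaned = {}
    for key, value in metadata.items():
        # Assumption: the prefix is never a genuine part of the key
        if key.lower().startswith(SWIFT_META_PREFIX):
            key = key[len(SWIFT_META_PREFIX):]
        cleaned[key] = value
    # Assumption: JSON stands in for the engine's real serialization
    size = len(json.dumps(cleaned).encode("utf-8"))
    if size > MAX_METADATA_BYTES:
        raise ValueError("metadata is %d bytes, limit is %d"
                         % (size, MAX_METADATA_BYTES))
    return cleaned
```

Because the real serialized size may differ from the JSON size, a check like this is best treated as a rough guard rather than an exact limit test.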
The Identity Storlet Example
The Identity storlet is written to work with both PUT and GET invocations:
#. During PUT it will place the data and metadata as uploaded by the user.
#. During GET it will return the data and metadata as stored in the system.
The Storlet has two optional inputs:
#. An integer controlling the chunk size at which data is copied from source to destination.
#. A Boolean controlling whether the Storlet would invoke an executable.
This demonstrates an executable dependency.
The identity storlet code can be found under StorletSamples.
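The chunked copy at the heart of an identity storlet can be pictured with a short Python sketch. It is a conceptual illustration only, not the Java implementation found under StorletSamples: `identity_copy` is a hypothetical function, the streams are ordinary Python file-like objects, and the default chunk size is an arbitrary assumption.

```python
import io


def identity_copy(in_stream, out_stream, chunk_size=65536):
    """Copy in_stream to out_stream chunk by chunk; return the bytes copied."""
    total = 0
    try:
        while True:
            chunk = in_stream.read(chunk_size)
            if not chunk:
                break  # end of input
            out_stream.write(chunk)
            total += len(chunk)
    finally:
        # Closing the output stream signals that the response is complete
        out_stream.close()
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"some object data")
    sink = io.BytesIO()
    print(identity_copy(source, sink, chunk_size=4))
```

Only one chunk is held in memory at a time, which matches the stream processing guideline given earlier.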
How to Deploy a Storlet
In this section we cover:
#. The principles behind storlet deployment, plus examples.
#. A Swift client example for uploading a storlet.
#. A Python example for uploading a storlet.
Storlet Deployment Principles
The compiled class that implements the storlet needs to be wrapped in a .jar.
This jar must not include the SCommon.jar. Any jars that the class implementation
is dependent on should be uploaded as separate jars as shown in the deployment
section below.
Storlet deployment is essentially uploading the storlet and its dependencies to
designated containers in the account we are working with. While a storlet and a
dependency are regular Swift objects, they must carry some metadata used by the
storlet engine. When a storlet is first executed, the engine fetches the necessary
objects from Swift and puts them in a directory accessible by the Docker container.
Note that the dependencies are meant to be small. Having a large list of dependencies
or a very large dependency may result in a timeout on the first attempt to execute a
storlet. If this happens, just re-send the request.
We consider two types of dependencies: libraries and executables. Libraries would
typically be .jar files the storlet code depends on. Alternatively, one can
have a binary dependency that the storlet code can execute.
Following the Identity storlet example, we have 2 objects to upload:
#. The storlet packaged in a .jar. In our case the jar was named
identitystorlet-1.0.jar. The jar needs to be uploaded to a container named
storlet. The name of the uploaded storlet must be of the form <name>-<version>.
The metadata that must accompany a storlet is as follows:
X-Object-Meta-Storlet-Language - currently must be 'java'
X-Object-Meta-Storlet-Interface-Version - currently we have a single version '1.0'
X-Object-Meta-Storlet-Dependency - A comma separated list of dependencies. In our case: 'get42'
X-Object-Meta-Storlet-Object-Metadata - Currently, not in use, but must appear. Use the value 'no'
X-Object-Meta-Storlet-Main - The name of the class that implements the IStorlet API. In our case: ''
#. The binary file that the storlet code is dependent on. In our case it is a
binary called get42. The binary should be uploaded to a container named
dependency. The dependency metadata fields appear below. Note the permissions
header. This header is required so that the engine will chmod it accordingly
when placed in the container so that the storlet would be able to execute it.
X-Object-Meta-Storlet-Dependency-Version - While the engine currently does not parse this header, it must appear.
X-Object-Meta-Storlet-Dependency-Permissions - An optional metadata field, where the user can state the permissions
given to the dependency when it is copied to the Linux container. This is helpful for binary dependencies invoked by the
storlet. For a binary dependency one can specify: '0755'
To update the storlet, just upload it again; the engine will recognize
the update and fetch the updated code.
Important: Currently, dependency updates are not recognized, only the Storlet
code itself can be updated.
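The naming rule and the storlet metadata fields listed above can be expressed as two small helpers. This is a sketch under stated assumptions: `storlet_object_name` and `storlet_headers` are hypothetical names, the version is assumed to be dot-separated digits, and the language value follows the header list above.

```python
import os
import re

# <name>-<version>.jar, e.g. identitystorlet-1.0.jar
# (assumption: the version consists of dot-separated digits)
STORLET_NAME_RE = re.compile(r"^[^/]+-\d+(\.\d+)*\.jar$")


def storlet_object_name(local_path):
    """Derive the object name from a local path and check its form."""
    # Drop any directory part so the object name carries no path prefix
    name = os.path.basename(local_path)
    if not STORLET_NAME_RE.match(name):
        raise ValueError("expected <name>-<version>.jar, got %r" % name)
    return name


def storlet_headers(main_class, dependencies=()):
    """Build the metadata headers listed above for a storlet object."""
    return {
        "X-Object-Meta-Storlet-Language": "java",
        "X-Object-Meta-Storlet-Interface-Version": "1.0",
        "X-Object-Meta-Storlet-Dependency": ",".join(dependencies),
        "X-Object-Meta-Storlet-Object-Metadata": "no",
        "X-Object-Meta-Storlet-Main": main_class,
    }
```

The resulting dictionary can be passed as the headers of whichever upload call is used.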
Deploying a Storlet using Swift Client
When using the Swift client one needs to provide the credentials, as well as the
authentication URI. The credentials can be supplied either via environment
variables or via command line parameters. To make the commands more readable we
use environment variables:
export OS_USERNAME=swift
export OS_PASSWORD=passw0rd
export OS_TENANT_NAME=service
export OS_AUTH_URL=
Here is the Swift client command for uploading the storlet. Some notes:
#. We use the upload option of the swift cli.
#. The container name is the first parameter for the upload command and is 'storlet'.
#. The name of the object and the local file to upload is 'identitystorlet-1.0.jar'.
IMPORTANT: when uploading the file from another directory, that parameter would
be something of the form 'bin/identitystorlet-1.0.jar'. In this case the name
of the object appearing in the storlet container would be 'bin/identitystorlet-1.0.jar',
which will not work for the engine.
#. The metadata that needs to accompany the storlet object is provided as headers.
eranr@lnx-ccs8:~/workspace/Storlets/StorletSamples/IdentityStorlet/bin$ swift upload storlet identitystorlet-1.0.jar \
-H "X-Object-Meta-Storlet-Language:Java" \
-H "X-Object-Meta-Storlet-Interface-Version:1.0" \
-H "X-Object-Meta-Storlet-Object-Metadata:no" \
-H "" \
-H "X-Object-Meta-Storlet-Dependency:get42"
Here is the Swift client command for uploading the get42 dependency. Again,
some notes:
#. The container name used here is the first parameter for the upload command and is 'dependency'.
#. We use the optional permissions header as this is a binary.
eranr@lnx-ccs8:~/workspace/Storlets/StorletSamples/IdentityStorlet/bin$ swift upload dependency get42 \
-H "X-Object-Meta-Storlet-Dependency-Version:1.0" \
-H "X-Object-Meta-Storlet-Dependency-Permissions:0755"
Deploying a Storlet with Python
Here is a code snippet that uploads both the storlet and its dependency.
The code assumes v2 authentication, and was tested against a Swift cluster with:
#. Keystone configured with a 'service' account, having a user 'swift' whose
password is 'passw0rd'
#. Under the service account there are already 'storlet', 'dependency', and
'storletlog' containers.
from swiftclient import client as c


def put_storlet_object(url, token, storlet_name, local_path_to_storlet,
                       main_class_name, dependencies):
    # Metadata the storlet engine expects on a storlet object
    metadata = {'X-Object-Meta-Storlet-Language': 'Java',
                'X-Object-Meta-Storlet-Interface-Version': '1.0',
                'X-Object-Meta-Storlet-Dependency': dependencies,
                'X-Object-Meta-Storlet-Object-Metadata': 'no',
                'X-Object-Meta-Storlet-Main': main_class_name}
    response = dict()
    # Open in binary mode: a .jar is not a text file
    with open('%s/%s' % (local_path_to_storlet, storlet_name), 'rb') as f:
        c.put_object(url, token, 'storlet', storlet_name, f,
                     content_type='application/octet-stream',
                     headers=metadata, response_dict=response)
    print(response)
    status = response.get('status')
    assert (status == 200 or status == 201)


def put_storlet_dependency(url, token, dependency_name,
                           local_path_to_dependency):
    metadata = {'X-Object-Meta-Storlet-Dependency-Version': '1'}
    # for an executable dependency
    # metadata['X-Object-Meta-Storlet-Dependency-Permissions'] = '0755'
    response = dict()
    with open('%s/%s' % (local_path_to_dependency, dependency_name), 'rb') as f:
        c.put_object(url, token, 'dependency', dependency_name, f,
                     content_type='application/octet-stream',
                     headers=metadata, response_dict=response)
    print(response)
    status = response.get('status')
    assert (status == 200 or status == 201)


AUTH_IP = ''
AUTH_PORT = '5000'
ACCOUNT = 'service'
USER_NAME = 'swift'
PASSWORD = 'passw0rd'

os_options = {'tenant_name': ACCOUNT}
# Authenticate against Keystone v2 to obtain the storage URL and a token
url, token = c.get_auth("http://" + AUTH_IP + ":" + AUTH_PORT + "/v2.0",
                        ACCOUNT + ":" + USER_NAME, PASSWORD,
                        os_options=os_options, auth_version="2.0")
put_storlet_object(url, token, 'identitystorlet-1.0.jar', '/tmp', '', 'get42')
put_storlet_dependency(url, token, 'get42', '/tmp')