Object metadata is stored as a pickled hash: first the data is pickled, then split into strings of length <= 254, then stored in a series of extended attributes named "user.swift.metadata", "user.swift.metadata1", "user.swift.metadata2", and so forth.

The choice of length 254 is odd, undocumented, and dates back to the initial commit of Swift. From talking to people, I believe this was an attempt to fit the first xattr in the inode, thus avoiding a seek. However, it doesn't work. XFS _either_ stores all the xattrs together in the inode (local), _or_ it spills them all to blocks located outside the inode (extents or btree). Using short xattrs actually hurts us here; by splitting into more pieces, we end up with more names to store, thus reducing the metadata size that'll fit in the inode.

[Source: http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure//tmp/en-US/html/Extended_Attributes.html]

I did some benchmarking of read_metadata with various xattr sizes against an XFS filesystem on a spinning disk, no VMs involved.

Summary:

    name  | rank | runs | mean      | sd        | timesBaseline
    ------|------|------|-----------|-----------|--------------
    32768 |    1 | 2500 | 0.0001195 | 3.75e-05  | 1.0
    16384 |    2 | 2500 | 0.0001348 | 1.869e-05 | 1.12809122912
    8192  |    3 | 2500 | 0.0001604 | 2.708e-05 | 1.34210998858
    4096  |    4 | 2500 | 0.0002326 | 0.0004816 | 1.94623473988
    2048  |    5 | 2500 | 0.0003414 | 0.0001409 | 2.85674781189
    1024  |    6 | 2500 | 0.0005457 | 0.0001741 | 4.56648611635
    254   |    7 | 2500 | 0.001848  | 0.001663  | 15.4616067887

Here, "name" is the chunk size (in bytes) for the pickled metadata. A total metadata size of around 31.5 KiB was used, so the "32768" runs represent storing everything in one single xattr, while the "254" runs represent things as they are without this change.

Since bigger xattr chunks make things go faster, the new chunk size is 64 KiB. That's the biggest xattr that XFS allows. Reading of metadata from existing files is unaffected; the read_metadata() function already handles xattrs of any size.

On non-XFS filesystems, this is no worse than what came before: ext4 has a limit of one block (typically 4 KiB) for all xattrs (names and values) taken together [1], so this change slightly increases the amount of Swift metadata that can be stored on ext4. ZFS let me store an xattr with an 8 MiB value, so that's plenty. It'll probably go further, but I stopped there.

[1] https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Extended_Attributes

Change-Id: Ie22db08ac0050eda693de4c30d4bc0d620e7f7d4
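To make the scheme concrete, here is a minimal sketch of pickling a metadata dict and splitting it across numbered xattrs. It is not Swift's implementation: Swift's own read_metadata()/write_metadata() helpers work on open file descriptors via the xattr library, while this sketch uses the stdlib os.getxattr()/os.setxattr() (Linux only, on a filesystem with user xattrs enabled). The names XATTR_KEY_PREFIX, XATTR_CHUNK_SIZE, and _key are illustrative, not Swift's:

    # A minimal sketch of the chunked-xattr scheme described above, not
    # Swift's actual code. Assumes Linux and stdlib os.setxattr/os.getxattr.
    import os
    import pickle

    XATTR_KEY_PREFIX = 'user.swift.metadata'
    XATTR_CHUNK_SIZE = 65536  # 64 KiB, the largest xattr value XFS allows


    def _key(index):
        # First chunk is "user.swift.metadata", then "user.swift.metadata1", ...
        return XATTR_KEY_PREFIX + (str(index) if index else '')


    def write_metadata(path, metadata):
        """Pickle the metadata dict and spread it over numbered xattrs."""
        serialized = pickle.dumps(metadata)
        index = 0
        while serialized:
            os.setxattr(path, _key(index), serialized[:XATTR_CHUNK_SIZE])
            serialized = serialized[XATTR_CHUNK_SIZE:]
            index += 1


    def read_metadata(path):
        """Reassemble and unpickle the metadata, whatever chunk size wrote it."""
        serialized = b''
        index = 0
        while True:
            try:
                serialized += os.getxattr(path, _key(index))
            except OSError:
                break  # no such xattr: every chunk has been read
            index += 1
        return pickle.loads(serialized)

With a 64 KiB chunk, metadata of the size benchmarked above lands in a single xattr, which is the case the fastest row in the table measures; the reader loops over numbered keys either way, so it is indifferent to the chunk size that wrote the data.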
Swift
A distributed object storage system designed to scale from a single machine to thousands of servers. Swift is optimized for multi-tenancy and high concurrency. Swift is ideal for backups, web and mobile content, and any other unstructured data that can grow without bound.
Swift provides a simple, REST-based API fully documented at http://docs.openstack.org/.
Swift was originally developed as the basis for Rackspace's Cloud Files and was open-sourced in 2010 as part of the OpenStack project. It has since grown to include contributions from many companies and has spawned a thriving ecosystem of 3rd party tools. Swift's contributors are listed in the AUTHORS file.
Docs
To build the documentation, install sphinx (pip install sphinx), run
python setup.py build_sphinx, and then browse to doc/build/html/index.html.
These docs are auto-generated after every commit and available online at
http://docs.openstack.org/developer/swift/.
For Developers
The best place to get started is the "SAIO - Swift All In One". This document will walk you through setting up a development cluster of Swift in a VM. The SAIO environment is ideal for running small-scale tests against Swift and trying out new features and bug fixes.
You can run unit tests with .unittests and functional tests with
.functests.
If you would like to start contributing, check out these notes to help you get started.
Code Organization
- bin/: Executable scripts that are the processes run by the deployer
- doc/: Documentation
- etc/: Sample config files
- swift/: Core code
  - account/: account server
  - common/: code shared by different modules
    - middleware/: "standard", officially-supported middleware
    - ring/: code implementing Swift's ring
  - container/: container server
  - obj/: object server
  - proxy/: proxy server
- test/: Unit and functional tests
Data Flow
Swift is a WSGI application and uses eventlet's WSGI server. After the
processes are running, the entry point for new requests is the Application
class in swift/proxy/server.py. From there, a controller is chosen, and the
request is processed. The proxy may choose to forward the request to a
back-end server. For example, the entry point for requests to the object
server is the ObjectController class in swift/obj/server.py.
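As a rough illustration of that flow, the sketch below shows the eventlet WSGI pattern: a listening socket handed to eventlet.wsgi.server, an Application callable as the entry point, and a controller chosen per request. It is not Swift's actual code; the real Application and ObjectController parse the request path, consult the ring, and forward to back-end servers, all of which is elided here:

    # A minimal sketch of the request flow described above, assuming
    # eventlet is installed. The controller answers directly so that only
    # the dispatch pattern is visible.
    import eventlet
    from eventlet import wsgi


    class ObjectController(object):
        """Stand-in for a per-request controller."""

        def handle_request(self, env, start_response):
            start_response('200 OK', [('Content-Type', 'text/plain')])
            return [b'handled by ObjectController\n']


    class Application(object):
        """Entry point for new requests: choose a controller, then process."""

        def __call__(self, env, start_response):
            # A real proxy inspects the request path to pick an account,
            # container, or object controller; this sketch always picks one.
            controller = ObjectController()
            return controller.handle_request(env, start_response)


    if __name__ == '__main__':
        # eventlet's WSGI server drives the Application, handling each
        # connection in its own greenthread.
        wsgi.server(eventlet.listen(('127.0.0.1', 8080)), Application())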
For Deployers
Deployer docs are also available at http://docs.openstack.org/developer/swift/. A good starting point is the deployment guide at http://docs.openstack.org/developer/swift/deployment_guide.html.
You can run functional tests against a swift cluster with .functests. These
functional tests require /etc/swift/test.conf to run. A sample config file
can be found in this source tree in test/sample.conf.
For Client Apps
For client applications, official Python language bindings are provided at http://github.com/openstack/python-swiftclient.
Complete API documentation is available at http://docs.openstack.org/api/openstack-object-storage/1.0/content/.
For more information, come hang out in #openstack-swift on freenode.
Thanks,
The Swift Development Team