swift/test/unit/common/middleware/test_bulk.py
Samuel Merritt 215cd551df Bulk upload: treat user xattrs as object metadata
Currently, when you PUT a single object, you can attach metadata to
it with request headers prefixed with "X-Object-Meta-". When
bulk-uploading objects, however, there is no way to assign any
metadata.

The tar file format* allows for arbitrary UTF-8 key/value pairs to be
associated with each file in an archive (as well as with the archive
itself, but we don't care about that here). If a file has extended
attributes, then tar will store those as key/value pairs.
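
As a rough illustration of those per-file key/value pairs, here is a
minimal sketch using Python's standard tarfile module. The helper
names build_pax_tarball and read_pax_headers are hypothetical and not
part of this commit; the sketch writes the pax headers directly
rather than relying on real filesystem xattrs.

```python
import io
import tarfile


def build_pax_tarball(name, data, pax_headers):
    """Return the bytes of a pax-format tarball holding one file.

    Hypothetical helper: pax_headers is a dict of str keys and values
    written into the member's pax extended header.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        info.pax_headers = dict(pax_headers)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_pax_headers(tarball_bytes):
    """Map each member name to the pax key/value pairs stored for it."""
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r") as tar:
        return {member.name: dict(member.pax_headers) for member in tar}


tarball = build_pax_tarball(
    "setup.py",
    b"print('hello')\n",
    {"SCHILY.xattr.user.meta.lunch": "burger and fries"},
)
headers_by_name = read_pax_headers(tarball)
```

Here headers_by_name["setup.py"] carries the key/value pair that was
written, alongside any other pax records tarfile chose to emit.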

This commit makes bulk upload read those extended attributes, if
present, and convert them to Swift object metadata. Attributes
whose names start with "user.meta." are converted to object metadata,
and "user.mime_type"** is converted to Content-Type.

For example, if you have a file "setup.py":

    $ setfattr -n user.mime_type -v "application/python-setup" setup.py
    $ setfattr -n user.meta.lunch -v "burger and fries" setup.py
    $ setfattr -n user.meta.dinner -v "baked ziti" setup.py
    $ setfattr -n user.stuff -v "whee" setup.py

This will get translated to headers:

    Content-Type: application/python-setup
    X-Object-Meta-Lunch: burger and fries
    X-Object-Meta-Dinner: baked ziti

Swift will handle xattrs stored by both GNU and BSD tar***. Only
xattrs user.mime_type and user.meta.* are processed; others are
ignored.
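
The translation described above can be sketched roughly as follows.
This is an illustration, not the code in this commit: the names
XATTR_PREFIXES, xattr_name and pax_headers_to_swift_headers are
hypothetical, title-casing the header suffix is an assumption made to
match the example headers, and values are passed through unchanged.

```python
# Pax key prefixes used by GNU tar and BSD tar (libarchive) for xattrs.
XATTR_PREFIXES = ("SCHILY.xattr.", "LIBARCHIVE.xattr.")
META_PREFIX = "user.meta."


def xattr_name(pax_key):
    """Return the xattr name behind a pax key, or None if it is not one."""
    for prefix in XATTR_PREFIXES:
        if pax_key.startswith(prefix):
            return pax_key[len(prefix):]
    return None


def pax_headers_to_swift_headers(pax_headers):
    """Translate pax key/value pairs to Swift request headers.

    Only user.mime_type and user.meta.* are translated; every other
    key is ignored.
    """
    headers = {}
    for key, value in pax_headers.items():
        name = xattr_name(key)
        if name is None:
            continue
        if name == "user.mime_type":
            headers["Content-Type"] = value
        elif name.startswith(META_PREFIX) and len(name) > len(META_PREFIX):
            suffix = name[len(META_PREFIX):]
            # Assumption: title-case the suffix, e.g. lunch -> Lunch.
            headers["X-Object-Meta-" + suffix.title()] = value
    return headers


example = pax_headers_to_swift_headers({
    "SCHILY.xattr.user.mime_type": "application/python-setup",
    "SCHILY.xattr.user.meta.lunch": "burger and fries",
    "LIBARCHIVE.xattr.user.meta.dinner": "baked ziti",
    "SCHILY.xattr.user.stuff": "whee",
    "path": "setup.py",
})
```

Keying on the xattr name after stripping the encoder-specific prefix
keeps the GNU and BSD cases on one code path.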

This brings bulk upload much closer to feature parity with non-bulk
upload.

* The POSIX 1003.1-2001 (pax) format, at least. There are a few
  different, mutually-incompatible tar formats out there, because of
  course there are. This is the default format on GNU tar 1.27.1 or
  later.

** http://standards.freedesktop.org/shared-mime-info-spec/latest/ar01s02.html#idm140622087713936

*** Even with pax-format tarballs, different encoders store xattrs
    slightly differently; for example, GNU tar stores the xattr
    "user.rubberducky" as pax header "SCHILY.xattr.user.rubberducky",
    while BSD tar (which uses libarchive) stores it as
    "LIBARCHIVE.xattr.user.rubberducky". One might wonder if this is
    some programmer's attempt at job security.

Change-Id: I5e3ce87d31054f5239e86d47c45adbde2bb93640
2015-04-21 18:42:32 -07:00
