From c10740c8c5a80c40ab24f9432a4f056733862ee1 Mon Sep 17 00:00:00 2001
From: Samuel Merritt
Date: Tue, 14 Apr 2015 16:45:09 -0700
Subject: [PATCH] Symlinks in Swift

Change-Id: I97eeaf3bb83c2ed46c7203b4c4d97fa716f99a00
---
 specs/in_progress/symlinks.rst | 241 +++++++++++++++++++++++++++++++++
 1 file changed, 241 insertions(+)
 create mode 100644 specs/in_progress/symlinks.rst

diff --git a/specs/in_progress/symlinks.rst b/specs/in_progress/symlinks.rst
new file mode 100644
index 0000000..6796674
--- /dev/null
+++ b/specs/in_progress/symlinks.rst
@@ -0,0 +1,241 @@
+
+::
+
+  This work is licensed under a Creative Commons Attribution 3.0
+  Unported License.
+  http://creativecommons.org/licenses/by/3.0/legalcode
+
+====================
+Swift Symbolic Links
+====================
+
+1. Problem description
+======================
+
+With the advent of storage policies and erasure codes, moving an
+object between containers is becoming increasingly useful. However, we
+don't want to break existing references to the object when we do so.
+
+For example, a common object lifecycle has the object starting life
+"hot" (i.e. frequently requested) and gradually "cooling" over time
+(becoming less frequently requested). The user will want an object to
+start out replicated for high requests-per-second while hot, but
+eventually transition to EC for lower storage cost once cold.
+
+A completely different use case is when an application is sharding
+objects across multiple containers, but finds that it needs to use
+even more containers; for example, going from 256 containers up to
+4096 as write rate goes up. The application could migrate to the new
+schema by creating 4096-sharded references for all 256-sharded
+objects, thus avoiding a lot of data movement.
+
+Yet a third use case is a user who has large amounts of
+infrequently-accessed data that is stored replicated (because it was
+uploaded prior to Swift's erasure-code support) and would like to
+store it erasure-coded instead.
+The user will probably ask for Swift
+to allow storage-policy changes at the container level, but as that is
+fraught with peril, we can offer them this instead.
+
+2. Proposed change
+==================
+
+Swift will gain the notion of a symbolic link ("symlink") object. This
+object will reference another object. GET, HEAD, POST, and OPTIONS
+requests for a symlink object will operate on the referenced object.
+DELETE and PUT requests for a symlink object will operate on the
+symlink object, not the referenced object, and will delete or
+overwrite it, respectively.
+
+GET, HEAD, POST, and OPTIONS requests can operate on a symlink object
+instead of the referenced object by adding a query parameter
+``?symlink=true`` to the request.
+
+The aim is for Swift symlinks to operate analogously to Unix symbolic
+links (except where it does not make sense to do so).
+
+
+2.1 Alternatives
+----------------
+
+One could use a single-segment SLO manifest to achieve a similar
+effect. However, the ETag of an SLO manifest is the MD5 of the ETags of
+its segments, so using a single-segment SLO manifest changes the ETag
+of the object. Also, object metadata (X-Object-Meta-\*) would have to
+be copied to the SLO manifest since metadata from SLO segments does
+not appear in the response. Further, SLO manifests contain the ETag of
+the referenced segments, and if a segment changes, the manifest
+becomes invalid. This is not a desirable property for symlinks.
+
+A DLO manifest does not validate ETags, but it still fails to preserve
+the referenced object's ETag and metadata, so it is also unsuitable.
+Further, since DLOs are based on object name prefixes, the upload of a
+new object (e.g. ``thesis.doc``, then later ``thesis.doc.old``) could
+cause corrupted downloads.
+
+Also, DLOs and SLOs cannot use each other as segments, while Swift
+symlinks can reference DLOs and SLOs *and* act as segments in DLOs and
+SLOs.
+
+3. Client-facing API
+====================
+
+Clients create a Swift symlink by performing a zero-length PUT request
+with the query parameter ``?symlink=true`` and the header
+``X-Object-Symlink-Target-Object`` naming the target object.
+
+For a cross-container symlink, also include the header
+``X-Object-Symlink-Target-Container``. If omitted, it defaults to
+the container of the symlink object.
+
+For a cross-account symlink, also include the header
+``X-Object-Symlink-Target-Account``. If omitted, it defaults to
+the account of the symlink object.
+
+Symlinks must be zero-byte objects. Attempting to PUT a symlink
+with a nonempty request body will result in a 400-series error.
+
+The referenced object need not exist at symlink-creation time. This
+mimics the behavior of Unix symbolic links. Also, if we ever make bulk
+uploads work with symbolic links in the tarballs, then we'll have to
+avoid validation. ``tar`` just appends files to the archive as it
+finds them; it does not push symbolic links to the back of the
+archive. Thus, there's a 50% chance that any given symlink in a
+tarball will precede its referent.
+
+
+3.1 Example: Move an object to EC storage
+-----------------------------------------
+
+Assume the object is ``/v1/MY_acct/con/obj``.
+
+1. Obtain an EC-storage-policy container either by finding a
+   pre-existing one or by making a container PUT request with the
+   right X-Storage-Policy header.
+
+2. Make a COPY request to copy the object into the EC-policy
+   container, e.g.::
+
+     COPY /v1/MY_acct/con/obj
+     Destination: ec-con/obj
+
+3. Overwrite the replicated object with a symlink object::
+
+     PUT /v1/MY_acct/con/obj?symlink=true
+     X-Object-Symlink-Target-Container: ec-con
+     X-Object-Symlink-Target-Object: obj
+
+4. Interactions With Existing Features
+======================================
+
+4.1 COPY requests
+-----------------
+
+If you copy a symlink without ``?symlink=true``, you get a copy of the
+referenced object.
+If you copy a symlink with ``?symlink=true``, you get a copy of the
+symlink; it will refer to the same object, container, and account.
+
+However, if you copy a symlink without
+``X-Object-Symlink-Target-Container`` between containers, or a symlink
+without ``X-Object-Symlink-Target-Account`` between accounts, the new
+symlink will refer to a different object, because the omitted header
+defaults to the new symlink's own container or account.
+
+4.2 Versioned Containers
+------------------------
+
+These will definitely interact. We should probably figure out how.
+
+4.3 Object Expiration
+---------------------
+
+There's nothing special here. If you create the symlink with
+``X-Delete-At``, the symlink will get deleted at the appropriate time.
+
+If you use a plain POST to set ``X-Delete-At`` on a symlink, it gets
+set on the referenced object just like other object metadata. If you
+use POST with ``?symlink=true`` to set ``X-Delete-At`` on a symlink,
+it will be set on the symlink itself.
+
+
+4.4 Large Objects
+-----------------
+
+Since we'll almost certainly end up implementing symlinks as
+middleware, we'll order the pipeline like this::
+
+    [pipeline:main]
+    pipeline = catch_errors ... slo dlo symlink ... proxy-server
+
+This way, you can create a symlink whose target is a large object
+*and* a large object can reference symlinks as segments.
+
+This also works if we decide to implement symlinks in the proxy
+server, though that would only happen if a compelling reason were
+found.
+
+
+4.5 User Authorization
+----------------------
+
+Authorization will be checked for both the symlink and the referenced
+object. If the user is authorized to see the symlink but not the
+referenced object, they'll get a 403, same as if they'd tried to
+access the referenced object directly.
+
+
+4.6 Quotas
+----------
+
+Nothing special needed here. A symlink counts as 1 object toward an
+object-count quota. Since symlinks are zero bytes, they do not count
+toward a storage quota, and we do not need to write any code to make
+that happen.
+
+
+4.7 list_endpoints / Hadoop / ZeroVM
+------------------------------------
+
+If the application talks directly to the object server and fetches a
+symlink, it's up to the application to deal with it. Applications that
+bypass the proxy should either avoid use of symlinks or should know
+how to handle them.
+
+The same is true for SLO, DLO, versioning, erasure codes, and other
+services that the Swift proxy server provides, so we are not without
+precedent here.
+
+
+4.8 Container Sync
+------------------
+
+Symlinks are synced like every other object. If the referenced object
+in cluster A has a different container name than in cluster B, then
+the symlink will point to the wrong place in one of the clusters.
+
+Intra-container symlinks (those with only
+``X-Object-Symlink-Target-Object``) will work correctly on both
+clusters. Also, if containers are named identically on both clusters,
+inter-container symlinks (those with
+``X-Object-Symlink-Target-Object`` and
+``X-Object-Symlink-Target-Container``) will work correctly too.
+
+
+4.9 Bulk Uploads
+----------------
+
+Currently, bulk uploads ignore all non-file members in the uploaded
+tarball. This could be expanded to also process symbolic-link members
+(i.e. those for which ``tarinfo.issym() == True``) and create symlink
+objects from them. This is not necessary for the initial
+implementation of Swift symlinks, but it would be nice to have.
+
+4.10 Swiftclient
+----------------
+
+python-swiftclient could download Swift symlinks as Unix symlinks if a
+flag is given, or it could upload Unix symlinks as Swift symlinks in
+some cases. This is not necessary for the initial implementation of
+Swift symlinks, and is mainly mentioned here to show that
+python-swiftclient was not forgotten.
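
The target-resolution rules in sections 2, 3, and 4.1 of the spec can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the proposed middleware: the three header names come from the spec, while ``resolve_target``, ``effective_path``, and ``FOLLOWING_METHODS`` are hypothetical names, and headers are modelled as a plain case-sensitive mapping for simplicity.

```python
from typing import Mapping

# Header names as given in the spec.
TARGET_OBJECT = "X-Object-Symlink-Target-Object"
TARGET_CONTAINER = "X-Object-Symlink-Target-Container"
TARGET_ACCOUNT = "X-Object-Symlink-Target-Account"

# Hypothetical constant: methods that follow the link unless ?symlink=true
# is given (section 2), plus COPY, which behaves the same way (section 4.1).
FOLLOWING_METHODS = frozenset({"GET", "HEAD", "POST", "OPTIONS", "COPY"})


def resolve_target(symlink_path: str, headers: Mapping[str, str]) -> str:
    """Return the /version/account/container/object path a symlink refers to.

    An omitted target container or account defaults to the container or
    account of the symlink object itself (section 3).
    """
    # "/v1/acct/con/obj" -> ["", "v1", "acct", "con", "obj"]; maxsplit keeps
    # any slashes inside the symlink's own object name in the last element.
    _, version, account, container, _own_name = symlink_path.split("/", 4)
    target_account = headers.get(TARGET_ACCOUNT, account)
    target_container = headers.get(TARGET_CONTAINER, container)
    target_object = headers[TARGET_OBJECT]
    return "/".join(("", version, target_account, target_container, target_object))


def effective_path(method: str, symlink_path: str, headers: Mapping[str, str],
                   symlink_param: bool = False) -> str:
    """Return the path a request operates on: the target or the symlink."""
    if method in FOLLOWING_METHODS and not symlink_param:
        return resolve_target(symlink_path, headers)
    # DELETE and PUT, and anything sent with ?symlink=true, act on the link.
    return symlink_path


if __name__ == "__main__":
    # The symlink created in the example of section 3.1.
    link_headers = {TARGET_CONTAINER: "ec-con", TARGET_OBJECT: "obj"}
    print(effective_path("GET", "/v1/MY_acct/con/obj", link_headers))
    print(effective_path("DELETE", "/v1/MY_acct/con/obj", link_headers))
```

Under these assumptions, a symlink that carries only ``X-Object-Symlink-Target-Object`` resolves relative to wherever it currently lives, which is why copying it to another container changes its referent (section 4.1) and why intra-container symlinks survive container sync (section 4.8).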