<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE chapter [
<!ENTITY % openstack SYSTEM "../common/entities/openstack.ent">
%openstack;
]>
<chapter xmlns="http://docbook.org/ns/docbook"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="5.0"
    xml:id="module003-ch006-more-concepts">
    <title>A Bit More On Swift</title>
        <para><guilabel>Containers and Objects</guilabel></para>
        <para>A container is a storage compartment for your data and
            provides a way for you to organize your data. You can
            think of a container as a folder in Windows or a
            directory in UNIX. The primary difference between a
            container and these other file system concepts is that
            containers cannot be nested. You can, however, create an
            unlimited number of containers within your account. Data
            must be stored in a container so you must have at least
            one container defined in your account prior to uploading
            data.</para>
        <para>The only restrictions on container names are that they
            cannot contain a forward slash (/) or an ASCII null (%00)
            and must be less than 257 bytes in length. Note
            that the length restriction applies to the name after it
            has been URL encoded. For example, a container name of
            Course Docs would be URL encoded as Course%20Docs and
            therefore be 13 bytes in length rather than the expected
            11.</para>
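        <para>The naming rules above can be checked before a request
            is sent. The following is a minimal sketch, not part of
            any OpenStack client library: the helper names
            <literal>encoded_length</literal> and
            <literal>is_valid_container_name</literal> are
            hypothetical, and the sketch assumes that every reserved
            character is percent-encoded when the length is
            measured.</para>
        <programlisting language="python">from urllib.parse import quote

# "Less than 257 bytes" after URL encoding means at most 256 bytes.
MAX_CONTAINER_NAME_BYTES = 256


def encoded_length(name):
    """Return the byte length of a name after URL encoding."""
    return len(quote(name, safe="").encode("utf-8"))


def is_valid_container_name(name):
    """Apply the three rules stated above to a candidate name."""
    if "/" in name or "\x00" in name:
        return False
    return MAX_CONTAINER_NAME_BYTES >= encoded_length(name)


print(encoded_length("Course Docs"))           # 13
print(is_valid_container_name("Course Docs"))  # True
print(is_valid_container_name("photos/2014"))  # False</programlisting>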
        <para>An object is the basic storage entity and any optional
            metadata that represents the files you store in the
            OpenStack Object Storage system. When you upload data to
            OpenStack Object Storage, the data is stored as-is (no
            compression or encryption) and consists of a location
            (container), the object's name, and any metadata
            consisting of key/value pairs. For instance, you may choose
            to store a backup of your digital photos and organize them
            into albums. In this case, each object could be tagged
            with metadata such as Album : Caribbean Cruise or Album :
            Aspen Ski Trip.</para>
        <para>The only restriction on object names is that they must
            be less than 1024 bytes in length after URL encoding. For
            example, an object name of C++final(v2).txt should be URL
            encoded as C%2B%2Bfinal%28v2%29.txt and therefore be 24
            bytes in length rather than the expected 16.</para>
        <para>The maximum allowable size for a storage object upon
            upload is 5&nbsp;GB and the minimum is zero bytes.
            You can use the built-in large object support and the
            swift utility to retrieve objects larger than 5&nbsp;GB.</para>
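        <para>Large object support works by storing data as several
            smaller segments, each within the single-object limit.
            The sketch below only illustrates the arithmetic of
            dividing a file into segments. The function name
            <literal>plan_segments</literal> is hypothetical, and
            treating 5&nbsp;GB as 5 × 1024<superscript>3</superscript>
            bytes is an assumption, so check the limit configured on
            your cluster.</para>
        <programlisting language="python"># Assumption: 5 GB is taken to mean 5 * 1024**3 bytes.
MAX_OBJECT_BYTES = 5 * 1024 ** 3


def plan_segments(total_bytes, segment_bytes=MAX_OBJECT_BYTES):
    """Return (offset, length) pairs that cover total_bytes in order."""
    if 0 >= segment_bytes or segment_bytes > MAX_OBJECT_BYTES:
        raise ValueError("segment size must be between 1 byte and 5 GB")
    segments = []
    offset = 0
    while total_bytes > offset:
        length = min(segment_bytes, total_bytes - offset)
        segments.append((offset, length))
        offset += length
    return segments


# A 12 GB file needs three segments: 5 GB, 5 GB, and 2 GB.
for offset, length in plan_segments(12 * 1024 ** 3):
    print(offset, length)</programlisting>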
        <para>For metadata, you should not exceed 90 individual
            key/value pairs for any one object and the total byte
            length of all key/value pairs should not exceed 4&nbsp;KB
            (4096&nbsp;bytes).</para>
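        <para>A client can check these metadata limits locally before
            uploading. This is a rough sketch with hypothetical helper
            names. It assumes the byte length is the sum of the UTF-8
            lengths of every key and value, which may differ from how
            a particular deployment counts header bytes.</para>
        <programlisting language="python">MAX_METADATA_PAIRS = 90
MAX_METADATA_BYTES = 4096


def metadata_size(metadata):
    """Total UTF-8 byte length of all keys and values (an assumption)."""
    return sum(len(key.encode("utf-8")) + len(value.encode("utf-8"))
               for key, value in metadata.items())


def within_metadata_limits(metadata):
    """True when both the pair count and the total size are in range."""
    return (MAX_METADATA_PAIRS >= len(metadata)
            and MAX_METADATA_BYTES >= metadata_size(metadata))


album = {"Album": "Caribbean Cruise"}
print(metadata_size(album))           # 21
print(within_metadata_limits(album))  # True</programlisting>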
        <para><guilabel>Language-Specific API
        Bindings</guilabel></para>
        <para>A set of supported API bindings in several popular
            languages is available from the Rackspace Cloud Files
            product, which uses OpenStack Object Storage code for its
            implementation. These bindings provide a layer of
            abstraction on top of the base REST API, allowing
            programmers to work with a container and object model
            instead of working directly with HTTP requests and
            responses. These bindings are free (as in beer and as in
            speech) to download, use, and modify. They are all
            licensed under the MIT License as described in the COPYING
            file packaged with each binding. If you do make any
            improvements to an API, you are encouraged (but not
            required) to submit those changes back to us.</para>
        <para>The API bindings for Rackspace Cloud Files are hosted
            at <link xlink:href="http://github.com/rackspace"
                >http://github.com/rackspace</link>. Feel free to
            coordinate your changes through GitHub or, if you prefer,
            send your changes to cloudfiles@rackspacecloud.com. Just
            make sure to indicate which language and version you
            modified and send a unified diff.</para>
        <para>Each binding includes its own documentation (either
            HTML, PDF, or CHM). They also include code snippets and
            examples to help you get started. The currently supported
            API bindings for OpenStack Object Storage are:</para>
        <itemizedlist>
            <listitem>
                <para>PHP (requires 5.x and the modules: cURL,
                    FileInfo, mbstring)</para>
            </listitem>
            <listitem>
                <para>Python (requires 2.4 or newer)</para>
            </listitem>
            <listitem>
                <para>Java (requires JRE v1.5 or newer)</para>
            </listitem>
            <listitem>
                <para>C#/.NET (requires .NET Framework v3.5)</para>
            </listitem>
            <listitem>
                <para>Ruby (requires 1.8 or newer and mime-tools
                    module)</para>
            </listitem>
        </itemizedlist>
        <para>There are no other supported language-specific bindings
            at this time. You are welcome to create your own language
            API bindings and we can help answer any questions during
            development, host your code if you like, and give you full
            credit for your work.</para>
            <para><guilabel>Proxy Server</guilabel></para>
            <para>The Proxy Server is responsible for tying together
                the rest of the OpenStack Object Storage architecture.
                For each request, it will look up the location of the
                account, container, or object in the ring (see below)
                and route the request accordingly. The public API is
                also exposed through the Proxy Server.</para>
            <para>A large number of failures are also handled in the
                Proxy Server. For example, if a server is unavailable
                for an object PUT, it will ask the ring for a hand-off
                server and route there instead.</para>
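            <para>The hand-off behavior can be pictured with a toy
                model. This sketch is not the proxy server's actual
                code: the function <literal>choose_nodes</literal>,
                the node names, and the availability callback are all
                hypothetical, and the real ring lookup is replaced by
                two plain lists.</para>
            <programlisting language="python">def choose_nodes(primaries, handoffs, is_available):
    """Pick one reachable node per primary slot, using hand-offs as needed."""
    spare = iter(handoffs)
    chosen = []
    for node in primaries:
        if is_available(node):
            chosen.append(node)
            continue
        # Primary is down: take the next reachable hand-off node, if any.
        for candidate in spare:
            if is_available(candidate):
                chosen.append(candidate)
                break
    return chosen


down = {"node-b"}
targets = choose_nodes(
    primaries=["node-a", "node-b", "node-c"],
    handoffs=["node-x", "node-y"],
    is_available=lambda node: node not in down,
)
print(targets)  # ['node-a', 'node-x', 'node-c']</programlisting>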
            <para>When objects are streamed to or from an object
                server, they are streamed directly through the proxy
                server to or from the user – the proxy server does not
                spool them.</para>
            <para>You can use a proxy server with account management
                enabled by configuring it in the proxy server
                configuration file.</para>
            <para><guilabel>Object Server</guilabel></para>
            <para>The Object Server is a very simple blob storage
                server that can store, retrieve and delete objects
                stored on local devices. Objects are stored as binary
                files on the filesystem with metadata stored in the
                file’s extended attributes (xattrs). This requires
                that the underlying filesystem choice for object
                servers support xattrs on files. Some filesystems,
                like ext3, have xattrs turned off by default.</para>
            <para>Each object is stored using a path derived from the
                object name’s hash and the operation’s timestamp. The
                last write always wins, which ensures that the latest
                object version is served. A deletion is also treated as
                a version of the file (a 0 byte file ending with
                “.ts”, which stands for tombstone). This ensures that
                deleted files are replicated correctly and older
                versions don’t magically reappear due to failure
                scenarios.</para>
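            <para>The last-write-wins rule and the role of tombstones
                can be sketched as follows. This is a simplified
                illustration, not the Object Server's implementation:
                the plain MD5 of the object path, the
                <literal>.data</literal> suffix for ordinary writes,
                and both function names are assumptions made for the
                example.</para>
            <programlisting language="python">import hashlib


def object_dir(account, container, obj):
    """Directory name derived from a hash of the object's full name."""
    path = "/%s/%s/%s" % (account, container, obj)
    return hashlib.md5(path.encode("utf-8")).hexdigest()


def current_state(filenames):
    """Return (state, newest_filename) for one object's directory."""
    if not filenames:
        return ("missing", None)
    # File names start with the operation timestamp, so the largest
    # timestamp identifies the last write.
    newest = max(filenames, key=lambda name: float(name.rsplit(".", 1)[0]))
    state = "deleted" if newest.endswith(".ts") else "present"
    return (state, newest)


files = ["1400000000.00000.data", "1400000050.00000.ts"]
print(current_state(files))  # ('deleted', '1400000050.00000.ts')</programlisting>
            <para>Because the tombstone carries the newest timestamp,
                an older <literal>.data</literal> file that turns up
                later on another node still loses the
                comparison.</para>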
            <para><guilabel>Container Server</guilabel></para>
            <para>The Container Server’s primary job is to handle
                listings of objects. It does not know where those
                objects are, just what objects are in a specific
                container. The listings are stored as SQLite database
                files and replicated across the cluster in much the
                same way as objects. The server also tracks
                statistics, including the total number of objects and
                the total storage used for that container.</para>
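            <para>To make the idea of a listing database concrete,
                here is a small sketch using Python's built-in
                <literal>sqlite3</literal> module. The table layout
                and function names are hypothetical and far simpler
                than the schema a real Container Server
                maintains.</para>
            <programlisting language="python">import sqlite3


def open_listing_db():
    """Create an in-memory listing database with a hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE object (name TEXT PRIMARY KEY, size INTEGER)")
    return db


def put_object(db, name, size):
    """Record an object in the listing, replacing any earlier row."""
    db.execute("INSERT OR REPLACE INTO object VALUES (?, ?)", (name, size))


def listing(db):
    """Return object names in sorted order."""
    rows = db.execute("SELECT name FROM object ORDER BY name")
    return [row[0] for row in rows]


def stats(db):
    """Return the object count and total bytes for the container."""
    count, total = db.execute(
        "SELECT COUNT(*), COALESCE(SUM(size), 0) FROM object").fetchone()
    return {"object_count": count, "bytes_used": total}


db = open_listing_db()
put_object(db, "cruise/beach.jpg", 2048)
put_object(db, "aspen/slope.jpg", 4096)
print(listing(db))  # ['aspen/slope.jpg', 'cruise/beach.jpg']
print(stats(db))    # {'object_count': 2, 'bytes_used': 6144}</programlisting>
            <para>Note that the database holds only names and sizes.
                Nothing in it says where an object's data
                lives.</para>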
            <para><guilabel>Account Server</guilabel></para>
            <para>The Account Server is very similar to the Container
                Server, except that it is responsible for listings
                of containers rather than objects.</para>
            <para><guilabel>Replication</guilabel></para>
            <para>Replication is designed to keep the system in a
                consistent state in the face of temporary error
                conditions like network outages or drive
                failures.</para>
            <para>The replication processes compare local data with
                each remote copy to ensure they all contain the latest
                version. Object replication uses a hash list to
                quickly compare subsections of each partition, and
                container and account replication use a combination of
                hashes and shared high water marks.</para>
            <para>Replication updates are push based. For object
                replication, updating is just a matter of rsyncing
                files to the peer. Account and container replication
                push missing records over HTTP or rsync whole database
                files.</para>
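            <para>The hash-list comparison used for object replication
                can be approximated as below. This is an illustrative
                sketch only: the subsection keys, the choice of MD5
                over sorted file names, and the function names are
                assumptions, and the actual transfer step (rsync) is
                left out.</para>
            <programlisting language="python">import hashlib


def hash_subsection(file_names):
    """Hash the sorted file names found in one subsection of a partition."""
    digest = hashlib.md5()
    for name in sorted(file_names):
        digest.update(name.encode("utf-8"))
    return digest.hexdigest()


def subsections_to_push(local, remote_hashes):
    """Return the subsections whose local hash differs from the peer's."""
    return sorted(
        key for key, names in local.items()
        if remote_hashes.get(key) != hash_subsection(names)
    )


local = {
    "a83": ["1400000000.00000.data"],
    "f02": ["1400000100.00000.data"],
}
remote = {"a83": hash_subsection(["1400000000.00000.data"])}
print(subsections_to_push(local, remote))  # ['f02']</programlisting>
            <para>Only the subsections that differ need to be pushed,
                which keeps the comparison cheap when most of a
                partition is already in sync.</para>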
            <para>The replicator also ensures that data is removed
                from the system. When an item (object, container, or
                account) is deleted, a tombstone is set as the latest
                version of the item. The replicator will see the
                tombstone and ensure that the item is removed from the
                entire system.</para>
            <para>To separate the cluster-internal replication traffic
                from client traffic, separate replication servers can
                be used. These replication servers are based on the
                standard storage servers, but they listen on the
                replication IP and only respond to REPLICATE requests.
                Storage servers can serve REPLICATE requests, so an
                operator can transition to using a separate
                replication network with no cluster downtime.</para>
            <para>Replication IP and port information is stored in the
                ring on a per-node basis. These parameters will be
                used if they are present, but they are not required.
                If this information does not exist or is empty for a
                particular node, the node's standard IP and port will
                be used for replication.</para>
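            <para>The fallback rule amounts to a simple lookup. In
                this sketch, each node is represented as a dictionary.
                The key names and addresses are illustrative
                assumptions, as is the function name
                <literal>replication_endpoint</literal>.</para>
            <programlisting language="python">def replication_endpoint(device):
    """Return the (ip, port) pair to use for replication traffic."""
    # Missing or empty replication settings fall back to the standard ones.
    ip = device.get("replication_ip") or device["ip"]
    port = device.get("replication_port") or device["port"]
    return (ip, port)


dedicated = {"ip": "10.0.0.5", "port": 6000,
             "replication_ip": "10.1.0.5", "replication_port": 6003}
plain = {"ip": "10.0.0.6", "port": 6000,
         "replication_ip": "", "replication_port": None}

print(replication_endpoint(dedicated))  # ('10.1.0.5', 6003)
print(replication_endpoint(plain))      # ('10.0.0.6', 6000)</programlisting>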
            <para><guilabel>Updaters</guilabel></para>
            <para>There are times when container or account data
                cannot be immediately updated. This usually occurs
                during failure scenarios or periods of high load. If
                an update fails, the update is queued locally on the
                file system, and the updater processes the failed
                updates. This is where an eventual consistency window
                will most likely come into play. For example, suppose
                a container server is under load and a new object is
                put into the system. The object is immediately
                available for reads as soon as the proxy server
                responds to the client with success. However, the
                container server has not yet updated the object
                listing, so the update is queued for later processing.
                Container listings, therefore, may not immediately
                contain the object.</para>
            <para>In practice, the consistency window is only as large
                as the interval between updater runs and may not even
                be noticed, because the proxy server routes listing
                requests to the first container server that responds.
                The server under load may not be the one that serves
                subsequent listing requests – one of the other two
                replicas may handle the listing.</para>
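            <para>The queue-and-retry pattern behind the updater can
                be modeled in a few lines. This toy sketch keeps the
                queue in memory rather than on the file system, and
                the class and function names are hypothetical. It is
                meant only to show why a listing can briefly lag
                behind a successful write.</para>
            <programlisting language="python">class ContainerListing:
    """Toy container server that can be marked as overloaded."""

    def __init__(self):
        self.objects = set()
        self.overloaded = False

    def add(self, name):
        if self.overloaded:
            raise RuntimeError("container server busy")
        self.objects.add(name)


def record_object(listing, pending, name):
    """Try to update the listing; queue the update locally on failure."""
    try:
        listing.add(name)
    except RuntimeError:
        pending.append(name)


def run_updater(listing, pending):
    """Retry queued updates, keeping any that still fail."""
    still_pending = []
    for name in pending:
        try:
            listing.add(name)
        except RuntimeError:
            still_pending.append(name)
    pending[:] = still_pending


listing = ContainerListing()
pending = []
listing.overloaded = True
record_object(listing, pending, "beach.jpg")
print(sorted(listing.objects), pending)  # [] ['beach.jpg']
listing.overloaded = False
run_updater(listing, pending)
print(sorted(listing.objects), pending)  # ['beach.jpg'] []</programlisting>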
            <para><guilabel>Auditors</guilabel></para>
            <para>Auditors crawl the local server checking the
                integrity of the objects, containers, and accounts. If
                corruption is found (in the case of bit rot, for
                example), the file is quarantined, and replication
                will replace the bad file from another replica. If
                other errors are found, they are logged; for example,
                when an object’s listing cannot be found on any
                container server where it should be.</para>
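            <para>An integrity check of this kind can be sketched by
                comparing each object's content against a checksum
                recorded when it was written. The in-memory store, the
                use of MD5, and the function name
                <literal>audit</literal> are assumptions for
                illustration. A real auditor reads files from disk and
                moves bad ones to a quarantine area.</para>
            <programlisting language="python">import hashlib


def audit(objects):
    """Split object names into (healthy, quarantined) by checksum."""
    healthy, quarantined = [], []
    for name, (data, recorded_md5) in sorted(objects.items()):
        actual = hashlib.md5(data).hexdigest()
        if actual == recorded_md5:
            healthy.append(name)
        else:
            # Content no longer matches what was written (bit rot, say).
            quarantined.append(name)
    return healthy, quarantined


good = b"holiday photo"
store = {
    "beach.jpg": (good, hashlib.md5(good).hexdigest()),
    "slope.jpg": (b"corrupted bytes",
                  hashlib.md5(b"original bytes").hexdigest()),
}
print(audit(store))  # (['beach.jpg'], ['slope.jpg'])</programlisting>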
</chapter>