Configure Object Storage features
Object Storage zones

In OpenStack Object Storage, data is placed across different tiers of failure domains. First, data is spread across regions, then zones, then servers, and finally across drives. Data is placed to get the highest failure domain isolation. If you deploy multiple regions, the Object Storage service places the data across the regions. Within a region, each replica of the data should be stored in unique zones, if possible. If there is only one zone, data should be placed on different servers. And if there is only one server, data should be placed on different drives.

Regions are widely separated installations with a high-latency or otherwise constrained network link between them. Zones are arbitrarily assigned, and it is up to the administrator of the Object Storage cluster to choose an isolation level and attempt to maintain the isolation level through appropriate zone assignment. For example, a zone may be defined as a rack with a single power source, or a zone may be a data center room with a common utility provider. Servers are identified by a unique IP/port. Drives are locally attached storage volumes identified by mount point.

In small clusters (five nodes or fewer), everything is normally in a single zone. Larger Object Storage deployments may assign zone designations differently; for example, an entire cabinet or rack of servers may be designated as a single zone to maintain replica availability if the cabinet becomes unavailable (for example, due to failure of the top-of-rack switches or a dedicated circuit). In very large deployments, such as service-provider-level deployments, each zone might have an entirely autonomous switching and power infrastructure, so that even the loss of an electrical circuit or switching aggregator results in the loss of a single replica at most.
Rackspace zone recommendations

For ease of maintenance on OpenStack Object Storage, Rackspace recommends that you set up at least five nodes. Each node is assigned its own zone (for a total of five zones), which gives you host-level redundancy. This enables you to take down a single zone for maintenance and still guarantee object availability in the event that another zone fails during your maintenance. You could keep each server in its own cabinet to achieve cabinet-level isolation, but you may wish to wait until your Object Storage service is better established before developing cabinet-level isolation. OpenStack Object Storage is flexible; if you later decide to change the isolation level, you can take down one zone at a time and move it to an appropriate new home.
RAID controller configuration

OpenStack Object Storage does not require RAID. In fact, most RAID configurations cause significant performance degradation. The main reason for using a RAID controller is the battery-backed cache. For data integrity, it is very important that when the operating system confirms that a write has been committed, the write has actually been committed to a persistent location. Most disks lie about hardware commits by default, instead writing to a faster write cache for performance reasons. In most cases, that write cache exists only in non-persistent memory. In the case of a loss of power, this data may never actually get committed to disk, resulting in discrepancies that the underlying file system must handle.

OpenStack Object Storage works best on the XFS file system, and this document assumes that the hardware being used is configured appropriately to be mounted with the nobarrier option. For more information, refer to the XFS FAQ: http://xfs.org/index.php/XFS_FAQ

To get the most out of your hardware, it is essential that every disk used in OpenStack Object Storage is configured as a standalone, individual RAID 0 disk; in the case of 6 disks, you would have six RAID 0s or one JBOD. Some RAID controllers do not support JBOD or do not support battery-backed cache with JBOD. To ensure the integrity of your data, you must ensure that the individual drive caches are disabled and the battery-backed cache in your RAID card is configured and used. Failure to configure the controller properly in this case puts data at risk in the case of sudden loss of power. You can also use hybrid drives or similar options for battery-backed cache configurations without a RAID controller.
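For illustration, Object Storage data disks are commonly mounted with XFS options similar to the following /etc/fstab entry. This is a sketch only; the device name, mount point, and exact option list are assumptions for this example and should be adapted to your hardware:

/dev/sdb1  /srv/node/sdb1  xfs  noatime,nodiratime,nobarrier,logbufs=8  0 0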
Throttle resources through rate limits

Rate limiting in OpenStack Object Storage is implemented as a pluggable middleware that you configure on the proxy server. Rate limiting is performed on requests that result in database writes to the account and container SQLite databases. It uses memcached and is dependent on the proxy servers having highly synchronized time. The rate limits are limited by the accuracy of the proxy server clocks.
Configure rate limiting

All configuration is optional. If no account or container limits are provided, no rate limiting occurs. The available configuration options are set in the ratelimit filter section of the proxy-server configuration file. The container rate limits are linearly interpolated from the values given. A sample container rate limiting configuration could be:

container_ratelimit_100 = 100
container_ratelimit_200 = 50
container_ratelimit_500 = 20

This would result in the limits shown in the following table; a sample filter configuration sketch appears after the table.
Values for rate limiting with sample configuration settings

Container size    Rate limit
0-99              No limiting
100               100
150               75
500               20
1000              20
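For reference, the rate limiting middleware is enabled as a filter in /etc/swift/proxy-server.conf. The following is a minimal sketch only; the pipeline ordering and option values shown are illustrative assumptions and should be tuned for your cluster:

[pipeline:main]
pipeline = healthcheck cache ratelimit authtoken keystoneauth proxy-server

[filter:ratelimit]
use = egg:swift#ratelimit
# account-level write rate limit; 0 means no limit (assumed example value)
account_ratelimit = 0
# container write limits, linearly interpolated between the listed sizes
container_ratelimit_100 = 100
container_ratelimit_200 = 50
container_ratelimit_500 = 20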
Health check

Provides an easy way to monitor whether the Object Storage proxy server is alive. If you access the proxy with the path /healthcheck, it responds with OK in the response body, which monitoring tools can use.
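For example, a monitoring check can be as simple as the following request, which returns OK in the response body; the proxy host name and port are assumptions for this example:

$ curl http://proxy.example.com:8080/healthcheck
OK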
Domain remap

Middleware that translates container and account parts of a domain to path parameters that the proxy server understands.
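For illustration, assuming the middleware is configured with a storage_domain of example.com, a request of the following form is remapped before it reaches the proxy server; the account and container names are assumptions for this example:

Original request:  http://container.AUTH_account.example.com/object
Remapped path:     /v1/AUTH_account/container/object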
CNAME lookup

Middleware that translates an unknown domain in the Host header to something that ends with the configured storage_domain by looking up the given domain's CNAME record in DNS.
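As a sketch, the filter is configured in /etc/swift/proxy-server.conf with the storage domain it should resolve to; the values shown here are illustrative assumptions:

[filter:cname_lookup]
use = egg:swift#cname_lookup
storage_domain = example.com
# maximum number of CNAME records to follow (assumed example value)
lookup_depth = 1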
Temporary URL

Allows the creation of URLs to provide temporary access to objects. For example, a website may wish to provide a link to download a large object in OpenStack Object Storage, but the Object Storage account has no public access. The website can generate a URL that provides GET access for a limited time to the resource. When the web browser user clicks on the link, the browser downloads the object directly from Object Storage, eliminating the need for the website to act as a proxy for the request. If the user shares the link with others, or accidentally posts it on a forum, the direct access is limited to the expiration time set when the website created the link.

A temporary URL is the typical URL associated with an object, with two additional query parameters:

temp_url_sig: A cryptographic signature.
temp_url_expires: An expiration date, as a Unix timestamp.

An example of a temporary URL:

https://swift-cluster.example.com/v1/AUTH_a422b2-91f3-2f46-74b7-d7c9e8958f5d30/container/object?
temp_url_sig=da39a3ee5e6b4b0d3255bfef95601890afd80709&
temp_url_expires=1323479485

To create temporary URLs, first set the X-Account-Meta-Temp-URL-Key header on your Object Storage account to an arbitrary string. This string serves as a secret key. For example, to set a key of b3968d0207b54ece87cccc06515a89d4 using the swift command-line tool:

$ swift post -m "Temp-URL-Key:b3968d0207b54ece87cccc06515a89d4"

Next, generate an HMAC-SHA1 (RFC 2104) signature over the following values:

The HTTP method to allow (typically GET or PUT)
The expiry date as a Unix timestamp
The full path to the object
The secret key set as the X-Account-Meta-Temp-URL-Key

Here is code that generates the signature for a GET valid for 24 hours on /v1/AUTH_account/container/object:

import hmac
from hashlib import sha1
from time import time

method = 'GET'
duration_in_seconds = 60*60*24
expires = int(time() + duration_in_seconds)
path = '/v1/AUTH_a422b2-91f3-2f46-74b7-d7c9e8958f5d30/container/object'
key = 'mykey'
hmac_body = '%s\n%s\n%s' % (method, expires, path)
sig = hmac.new(key, hmac_body, sha1).hexdigest()
s = 'https://{host}/{path}?temp_url_sig={sig}&temp_url_expires={expires}'
url = s.format(host='swift-cluster.example.com', path=path, sig=sig, expires=expires)

Any alteration of the resource path or query arguments results in a 401 Unauthorized error. Similarly, a PUT where GET was the allowed method returns a 401. HEAD is allowed if GET or PUT is allowed. Using this in combination with browser form POST translation middleware could also allow direct-from-browser uploads to specific locations in Object Storage.

Changing the X-Account-Meta-Temp-URL-Key invalidates any previously generated temporary URLs within 60 seconds (the memcache time for the key). Object Storage supports up to two keys, specified by X-Account-Meta-Temp-URL-Key and X-Account-Meta-Temp-URL-Key-2. Signatures are checked against both keys, if present, to allow for key rotation without invalidating all existing temporary URLs.

Object Storage includes a script called swift-temp-url that generates the query parameters automatically:

$ bin/swift-temp-url GET 3600 /v1/AUTH_account/container/object mykey
/v1/AUTH_account/container/object?
temp_url_sig=5c4cc8886f36a9d0919d708ade98bf0cc71c9e91&
temp_url_expires=1374497657

Because this command returns only the path, you must prefix it with the Object Storage host name (for example, https://swift-cluster.example.com).
With GET Temporary URLs, a Content-Disposition header is set on the response so that browsers interpret this as a file attachment to be saved. The file name chosen is based on the object name, but you can override it with a filename query parameter. The following example specifies a filename of My Test File.pdf:

https://swift-cluster.example.com/v1/AUTH_a422b2-91f3-2f46-74b7-d7c9e8958f5d30/container/object?
temp_url_sig=da39a3ee5e6b4b0d3255bfef95601890afd80709&
temp_url_expires=1323479485&
filename=My+Test+File.pdf

If you do not want the object to be downloaded, you can cause Content-Disposition: inline to be set on the response by adding the inline parameter to the query string, as follows:

https://swift-cluster.example.com/v1/AUTH_account/container/object?
temp_url_sig=da39a3ee5e6b4b0d3255bfef95601890afd80709&
temp_url_expires=1323479485&inline

To enable Temporary URL functionality, edit /etc/swift/proxy-server.conf to add tempurl to the pipeline variable defined in the [pipeline:main] section. The tempurl entry should appear immediately before the authentication filters in the pipeline, such as authtoken, tempauth, or keystoneauth. For example:

[pipeline:main]
pipeline = healthcheck cache tempurl authtoken keystoneauth proxy-server
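The tempurl filter itself must also be defined in the same file. A minimal sketch, assuming the standard entry point shipped with Object Storage:

[filter:tempurl]
use = egg:swift#tempurl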
Name check filter

Name Check is a filter that disallows any paths that contain defined forbidden characters or that exceed a defined length.
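As a sketch, the filter is enabled in /etc/swift/proxy-server.conf; the values below are illustrative assumptions rather than recommended settings:

[filter:name_check]
use = egg:swift#name_check
# characters rejected anywhere in the path (example values)
forbidden_chars = '"`<>
# maximum allowed length of the full path (example value)
maximum_length = 255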
Constraints

To change the OpenStack Object Storage internal limits, update the values in the swift-constraints section of the swift.conf file. Use caution when you update these values because they affect the performance of the entire cluster.
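For illustration, a constraints section in swift.conf looks like the following sketch; the values shown are assumptions based on common defaults and are not recommendations:

[swift-constraints]
# maximum size of a single uploaded object, in bytes (example value, about 5 GB)
max_file_size = 5368709122
# maximum length of an object name (example value)
max_object_name_length = 1024
# maximum number of metadata items per account, container, or object (example value)
max_meta_count = 90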
Cluster health

Use the swift-dispersion-report tool to measure overall cluster health. This tool checks if a set of deliberately distributed containers and objects are currently in their proper places within the cluster. For instance, a common deployment has three replicas of each object. The health of that object can be measured by checking if each replica is in its proper place. If only two of the three replicas are in place, the object's health can be said to be at 66.66%, where 100% would be perfect. A single object's health, especially an older object, usually reflects the health of the entire partition the object is in. If you create enough objects on a distinct percentage of the partitions in the cluster, you get a good estimate of the overall cluster health. In practice, about 1% partition coverage seems to balance well between accuracy and the amount of time it takes to gather results.

The first step in providing this health value is to create a new account solely for this usage. Next, you need to place the containers and objects throughout the system so that they are on distinct partitions. The swift-dispersion-populate tool does this by making up random container and object names until they fall on distinct partitions. Last, and repeatedly for the life of the cluster, you must run the swift-dispersion-report tool to check the health of each of these containers and objects.

These tools need direct access to the entire cluster and to the ring files (installing them on a proxy server suffices). The swift-dispersion-populate and swift-dispersion-report commands both use the same configuration file, /etc/swift/dispersion.conf. Example dispersion.conf file:

[dispersion]
auth_url = http://localhost:8080/auth/v1.0
auth_user = test:tester
auth_key = testing

There are also configuration options for specifying the dispersion coverage (which defaults to 1%), retries, concurrency, and so on. However, the defaults are usually fine.

Once the configuration is in place, run swift-dispersion-populate to populate the containers and objects throughout the cluster. Now that those containers and objects are in place, you can run swift-dispersion-report to get a dispersion report, or the overall health of the cluster. Here is an example of a cluster in perfect health:

$ swift-dispersion-report
Queried 2621 containers for dispersion reporting, 19s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space
Queried 2619 objects for dispersion reporting, 7s, 0 retries
100.00% of object copies found (7857 of 7857)
Sample represents 1.00% of the object partition space

Now, deliberately double the weight of a device in the object ring (with replication turned off) and re-run the dispersion report to show what impact that has:

$ swift-ring-builder object.builder set_weight d0 200
$ swift-ring-builder object.builder rebalance
...
$ swift-dispersion-report
Queried 2621 containers for dispersion reporting, 8s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space
Queried 2619 objects for dispersion reporting, 7s, 0 retries
There were 1763 partitions missing one copy.
77.56% of object copies found (6094 of 7857)
Sample represents 1.00% of the object partition space

You can see that the health of the objects in the cluster has gone down significantly. Of course, this test environment has just four devices; in a production environment with many devices, the impact of one device change is much less.
Next, run the replicators to get everything put back into place and then rerun the dispersion report:

... start object replicators and monitor logs until they're caught up ...

$ swift-dispersion-report
Queried 2621 containers for dispersion reporting, 17s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space
Queried 2619 objects for dispersion reporting, 7s, 0 retries
100.00% of object copies found (7857 of 7857)
Sample represents 1.00% of the object partition space

Alternatively, the dispersion report can also be output in JSON format. This allows it to be more easily consumed by third-party utilities:

$ swift-dispersion-report -j
{"object": {"retries:": 0, "missing_two": 0, "copies_found": 7863, "missing_one": 0,
"copies_expected": 7863, "pct_found": 100.0, "overlapping": 0, "missing_all": 0},
"container": {"retries:": 0, "missing_two": 0, "copies_found": 12534, "missing_one": 0,
"copies_expected": 12534, "pct_found": 100.0, "overlapping": 15, "missing_all": 0}}
Static Large Object (SLO) support

This feature is very similar to Dynamic Large Object (DLO) support in that it enables the user to upload many objects concurrently and afterwards download them as a single object. It is different in that it does not rely on eventually consistent container listings to do so. Instead, a user-defined manifest of the object segments is used.
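As an illustrative sketch, the segments are first uploaded as ordinary objects, and then a JSON manifest describing them is uploaded to the desired object name with the multipart-manifest=put query parameter. The container name, segment names, and sizes below are assumptions for this example, and each etag must match the MD5 checksum of the corresponding segment:

[{"path": "/segments_container/seg_01",
  "etag": "<md5-of-seg_01>",
  "size_bytes": 1048576},
 {"path": "/segments_container/seg_02",
  "etag": "<md5-of-seg_02>",
  "size_bytes": 1048576}]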
Container quotas

The container_quotas middleware implements simple quotas that can be imposed on Object Storage containers by a user with the ability to set container metadata, most likely the account administrator. This can be useful for limiting the scope of containers that are delegated to non-admin users, exposed to formpost uploads, or just as a self-imposed sanity check. Any object PUT operations that exceed these quotas return a 403 (Forbidden) response.

Quotas are subject to several limitations: eventual consistency, the timeliness of the cached container_info (60 second TTL by default), and the inability to reject chunked transfer uploads that exceed the quota (though once the quota is exceeded, new chunked transfers are refused).

Set quotas by adding meta values to the container. These values are validated when you set them:

X-Container-Meta-Quota-Bytes: Maximum size of the container, in bytes.
X-Container-Meta-Quota-Count: Maximum object count of the container.
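For example, the following swift command sets a 10000-byte quota on a container; the container name mycontainer is an assumption for this example:

$ swift post -m quota-bytes:10000 mycontainer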
Account quotas

Account quotas block write requests (PUT, POST) if a given account quota (in bytes) is exceeded, while DELETE requests are still allowed. The x-account-meta-quota-bytes metadata entry must be set to store and enable the quota. Write requests to this metadata entry are only permitted for resellers. There is no account quota limitation on a reseller account even if x-account-meta-quota-bytes is set. Any object PUT operations that exceed the quota return a 413 response (Request Entity Too Large) with a descriptive body.

The following command uses an admin account that owns the Reseller role to set a quota on the test account:

$ swift -A http://127.0.0.1:8080/auth/v1.0 -U admin:admin -K admin \
  --os-storage-url http://127.0.0.1:8080/v1/AUTH_test post -m quota-bytes:10000

Here is the stat listing of an account where a quota has been set:

$ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing stat
Account: AUTH_test
Containers: 0
Objects: 0
Bytes: 0
Meta Quota-Bytes: 10000
X-Timestamp: 1374075958.37454
X-Trans-Id: tx602634cf478546a39b1be-0051e6bc7a

This command removes the account quota:

$ swift -A http://127.0.0.1:8080/auth/v1.0 -U admin:admin -K admin \
  --os-storage-url http://127.0.0.1:8080/v1/AUTH_test post -m quota-bytes:
Bulk delete

Use bulk-delete to delete multiple files from an account with a single request. The middleware responds to DELETE requests that carry the header X-Bulk-Delete: true_value. The body of the DELETE request is a newline-separated list of files to delete. The files listed must be URL-encoded and in the form:

/container_name/obj_name

If all files are successfully deleted (or did not exist), the operation returns HTTPOk. If any files failed to delete, the operation returns HTTPBadGateway. In both cases, the response body is a JSON dictionary that shows the number of files that were successfully deleted or not found. The files that failed are listed.
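The following curl sketch follows the interface described above; the host, token, and object names are assumptions for this example, and, depending on the deployed Object Storage version, the bulk middleware may expect a bulk-delete query parameter instead of the header shown here:

$ curl -X DELETE -H "X-Auth-Token: $token" -H "X-Bulk-Delete: true_value" \
  https://swift-cluster.example.com/v1/AUTH_account \
  --data-binary $'/container_name/obj_one\n/container_name/obj_two'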
Drive audit

The configuration items reference a script that can be run by using cron to watch for bad drives. If errors are detected, it unmounts the bad drive so that OpenStack Object Storage can work around it. Its options are set in the drive-audit configuration file.
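A minimal sketch of that file follows; the values shown are illustrative assumptions, not recommended settings:

[drive-audit]
# directory containing the mounted Object Storage drives (example value)
device_dir = /srv/node
# how far back in the log files to look for errors, in minutes (example value)
minutes = 60
# number of errors that triggers unmounting a drive (example value)
error_limit = 1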
Form post

Middleware that provides the ability to upload objects to a cluster by using an HTML form POST. The format of the form is:

<form action="<swift-url>" method="POST" enctype="multipart/form-data">
  <input type="hidden" name="redirect" value="<redirect-url>" />
  <input type="hidden" name="max_file_size" value="<bytes>" />
  <input type="hidden" name="max_file_count" value="<count>" />
  <input type="hidden" name="expires" value="<unix-timestamp>" />
  <input type="hidden" name="signature" value="<hmac>" />
  <input type="file" name="file1" /><br />
  <input type="submit" />
</form>

The swift-url is the URL to the Object Storage destination, such as: https://swift-cluster.example.com/v1/AUTH_account/container/object_prefix

The name of each file uploaded is appended to the specified swift-url. So, you can upload directly to the root of a container with a URL like: https://swift-cluster.example.com/v1/AUTH_account/container/

Optionally, you can include an object prefix to better separate different users' uploads, such as: https://swift-cluster.example.com/v1/AUTH_account/container/object_prefix

The form method must be POST and the enctype must be set as multipart/form-data.

The redirect attribute is the URL to which the browser is redirected after the upload completes. The URL has status and message query parameters added to it, indicating the HTTP status code for the upload (2xx is success) and a possible message for further information if there was an error (such as "max_file_size exceeded").

The max_file_size attribute must be included and indicates the largest single file upload that can be done, in bytes.

The max_file_count attribute must be included and indicates the maximum number of files that can be uploaded with the form. Include additional <input type="file" name="filexx"/> attributes if desired.

The expires attribute is the Unix timestamp before which the form must be submitted; after that time the form is invalidated.

The signature attribute is the HMAC-SHA1 signature of the form. This sample Python code shows how to compute the signature:

import hmac
from hashlib import sha1
from time import time

path = '/v1/account/container/object_prefix'
redirect = 'https://myserver.com/some-page'
max_file_size = 104857600
max_file_count = 10
expires = int(time() + 600)
key = 'mykey'
hmac_body = '%s\n%s\n%s\n%s\n%s' % (path, redirect, max_file_size, max_file_count, expires)
signature = hmac.new(key, hmac_body, sha1).hexdigest()

The key is the value of the X-Account-Meta-Temp-URL-Key header on the account. Be certain to use the full path, from the /v1/ onward. The command-line tool swift-form-signature may be used (mostly just when testing) to compute expires and signature.

The file attributes must appear after the other attributes to be processed correctly. If attributes come after the file, they are not sent with the sub-request, because there is no way to parse all of the attributes on the server side without reading the whole file into memory, and the server does not have enough memory to service such requests. Attributes that follow the file are therefore ignored.
Static web sites

When configured, this middleware serves container data as a static web site with index file and error file resolution and optional file listings. This mode is normally only active for anonymous requests.
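The static web middleware relies on container metadata. As a sketch, the following swift client commands set an index file, an error file suffix, and enable listings on a container; the container and file names are assumptions for this example, and the container typically also needs a read ACL that permits anonymous access:

$ swift post -m 'web-index:index.html' mycontainer
$ swift post -m 'web-error:error.html' mycontainer
$ swift post -m 'web-listings: true' mycontainer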