When the proxy is putting X-Container headers into object PUT
requests, it should put out just enough to make the container update
durable in the worst case. It shouldn't do more, since that results in
extra work for the container servers; and it shouldn't do less, since
that results in objects not showing up in listings.
The current code gets the number right as long as you have 3 container
replicas and an odd number of object replicas, but it comes up with
some bogus numbers in other cases. The number it computes is
(object-quorum + 1).
This patch changes the number to (container-quorum +
max_put_failures).
Example: given an EC 12+5 policy and 3 container replicas, you can
lose up to 4 connections and still succeed. Since you need to have 2
container updates happen for durability, you need 6 connections to
have X-Container headers. That way, you can lose 4 and still have 2
left. The current code would put X-Container headers on 14 of the
connections, resulting in more than double the workload on the
container servers; this patch changes the number to 6.
Example 2: given a (crazy) EC 3+6 policy and 3 container replicas, you
can lose up to 5 connections, so you need X-Container headers on
7. The current code only sends 5, giving a worst-case result of a PUT
succeeds but never reaches the containers. This patch changes the
number to 7.
Other examples:
                          | current | this change |
--------------------------+---------+-------------+
 EC 10+4, 3x container    |      12 |           5 |
 EC 10+4, 5x container    |      12 |           6 |
 EC 15+4, 3x container    |      17 |           5 |
 EC 15+4, 5x container    |      17 |           6 |
 EC 4+8, 3x container     |       6 |           9 |
 7x object, 3x container  |       5 |           5 |
 6x object, 3x container  |       4 |           5 |
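A sketch of the arithmetic (the function name is hypothetical; the
patch computes this inline in the proxy):

    def container_update_count(object_replicas, object_quorum,
                               container_quorum):
        # Worst case, we can lose this many object-server connections
        # and the PUT still succeeds.
        max_put_failures = object_replicas - object_quorum
        # Send just enough X-Container headers that container_quorum
        # updates survive even after max_put_failures losses.
        return container_quorum + max_put_failures

    # EC 12+5: 17 connections, object quorum 13; 3x container: quorum 2
    container_update_count(17, 13, 2)  # -> 6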
Change-Id: I34efd48655b890340912810ab111bb63445e5c8b
Currently, our integrity checking for objects is pretty weak when it
comes to object metadata. If the extended attributes on a .data or
.meta file get corrupted in such a way that we can still unpickle it,
we don't have anything that detects that.
This could be especially bad with encrypted etags; if the encrypted
etag (X-Object-Sysmeta-Crypto-Etag or whatever it is) gets some bits
flipped, then we'll cheerfully decrypt the cipherjunk into plainjunk,
then send it to the client. Net effect is that the client sees a GET
response with an ETag that doesn't match the MD5 of the object *and*
Swift has no way of detecting and quarantining this object.
Note that, with an unencrypted object, if the ETag metadatum gets
mangled, then the object will be quarantined by the object server or
auditor, whichever notices first.
As part of this commit, I also ripped out some mocking of
getxattr/setxattr in tests. It appears to be there to allow unit tests
to run on systems where /tmp doesn't support xattrs. However, since
the mock is keyed off of inode number and inode numbers get re-used,
there's lots of leakage between different test runs. On a real FS,
unlinking a file and then creating a new one of the same name will
also reset the xattrs; this isn't the case with the mock.
The mock was pretty old; Ubuntu 12.04 and up all support xattrs in
/tmp, and recent Red Hat / CentOS releases do too. The xattr mock was
added in 2011; maybe it was to support Ubuntu Lucid Lynx?
Bonus: now you can pause a test with the debugger, inspect its files
in /tmp, and actually see the xattrs along with the data.
Since this patch now uses a real filesystem for testing filesystem
operations, tests are skipped if the underlying filesystem does not
support setting xattrs (e.g. tmpfs, or ext4 when more than 4k of
xattrs is needed).
References to "/tmp" have been replaced with calls to
tempfile.gettempdir(). This will allow setting the TMPDIR envvar in
test setup and getting an XFS filesystem instead of ext4 or tmpfs.
THIS PATCH SIGNIFICANTLY CHANGES TESTING ENVIRONMENTS
With this patch, every test environment will require TMPDIR to be
using a filesystem that supports at least 4k of extended attributes.
Neither ext4 nor tmpfs supports this. XFS is recommended.
So why all the SkipTests? Why not simply raise an error? We still need
the tests to run on the base image for OpenStack's CI system. Since
we were previously mocking out xattr, there wasn't a problem, but we
also weren't actually testing anything. This patch adds functionality
to validate xattr data, so we need to drop the mock.
`test.unit.skip_if_no_xattrs()` is also imported into `test.functional`
so that functional tests can import it from the functional test
namespace.
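For reference, a minimal sketch of how such a probe can work (the real
helper lives in test/unit; the exact errno values are an assumption):

    import errno
    import tempfile
    import unittest

    import xattr  # the same xattr package swift already uses

    def skip_if_no_xattrs():
        # Probe TMPDIR: try to set a small xattr on a temp file and
        # skip the test if the filesystem refuses.
        with tempfile.NamedTemporaryFile() as f:
            try:
                xattr.setxattr(f.fileno(), 'user.swift.test', b'x')
            except OSError as e:
                if e.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
                    raise unittest.SkipTest(
                        'xattrs not supported in %s'
                        % tempfile.gettempdir())
                raise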
The related OpenStack CI infrastructure changes are made in
https://review.openstack.org/#/c/394600/.
Co-Authored-By: John Dickinson <me@not.mn>
Change-Id: I98a37c0d451f4960b7a12f648e4405c6c6716808
Otherwise, we send back a 204 where middlewares should be expecting
a 200 and an empty JSON array.
Change-Id: I05549342327108f71b60a316f734c55bc9589915
Related-Change: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d
It was deprecated, and we discussed this topic at the Denver PTG for
the Queens cycle. The main motivation for this work is that the
deprecated post_as_copy option and its gate block future symlink work.
Change-Id: I411893db1565864ed5beb6ae75c38b982a574476
Make some json -> (text, xml) stuff in a common module, reference that in
account/container servers so we don't break existing clients (including
out-of-date proxies), but have the proxy controllers always force a json
listing.
This simplifies operations on listings (such as the ones already happening in
decrypter, or the ones planned for symlink and sharding) by only needing to
consider a single response type.
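As a sketch of the kind of translation being consolidated (the helper
name is hypothetical, and subdir entries are ignored for brevity):

    import json

    def listing_to_text(json_body):
        # a text/plain listing is just one name per line
        return ''.join(
            '%s\n' % rec['name'] for rec in json.loads(json_body))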
There is a downside of larger backend requests for text/plain listings, but
it seems like a net win?
Change-Id: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d
When deleting objects in a multi-region swift deployment with write
affinity configured, users always get a 404 when deleting an object
before it has been replicated to the appropriate nodes.
This patch adds a config item 'write_affinity_handoff_delete_count' so
that operators can define how many local handoff nodes swift should
send requests to in order to get more candidates for the final
response, or by default just leave it to swift to calculate the
appropriate number.
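An illustrative proxy-server.conf snippet (values are examples only):

    [app:proxy-server]
    write_affinity = r1
    write_affinity_handoff_delete_count = 2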
Change-Id: Ic4ef82e4fc1a91c85bdbc6bf41705a76f16d1341
Closes-Bug: #1503161
This change:
- Add assertions for write_affinity values in load_app tests
- Add a test case in which a per-policy read_affinity overrides the
  default timing sorting strategy
- Avoid the term 'scope' where 'label' is meant
Related-Change: I3f718f425f525baa80045ba067950c752bcaaefc
Change-Id: Ia8262490895d60da345f3679fc53653b2c2a2b3e
* Only use one StringIO in ConfigString
* Rename the write_affinity_node_count function to be
write_affinity_node_count_fn
* Use comprehensions instead of six.moves.filter
* Rename OverrideConf to ProxyOverrideOptions
* Make ProxyOverrideOptions's __repr__ eval()able
* Various conf -> options renames
* Stop trying to handle a KeyError that should never come up
* Be explicit about how deep we need to copy in proxy/test_server.py
* Drop an unused return value
* Add a test for a non-"proxy-server" app name
* Combine bad-section-name tests
* Try to clean up (at least a little) a self-described "hokey test"
Related-Change: I3f718f425f525baa80045ba067950c752bcaaefc
Change-Id: I4e81175d5445049bc1f48b3ac02c5bc0f77e6f59
- add proxy server per policy config as an optional
step in the configuration of a policy, with link to
the deployment guide
- add reverse link from deployment guide per-policy
config doc section to storage policies docs
Drive-by fix an incorrect test comment
Change-Id: Ib95310193270a63c9d1e321c6e7de240e00b387f
Related-Change: I3f718f425f525baa80045ba067950c752bcaaefc
This is an alternative approach to that proposed in [1]
Adds support for optional per-policy config sections
to be added in proxy-server.conf. This is highly desirable
to allow per-policy affinity options to be set for use with
duplicated EC policies [2] and composite rings [3].
Certain options found in per-policy conf sections will
override their equivalents that may be set in the
[app:proxy-server] section. Currently the options
handled that way are:
sorting_method
read_affinity
write_affinity
write_affinity_node_count
For example:
[proxy-server:policy:0]
sorting_method = affinity
read_affinity = r1=100
write_affinity = r1
write_affinity_node_count = 1 * replicas
The corresponding attributes of the proxy-server Application
are now available from instances of an OverrideConf object
that is obtained from Application.get_policy_options(policy).
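A usage sketch (assuming `app` is the proxy Application instance and
POLICIES comes from swift.common.storage_policy):

    options = app.get_policy_options(POLICIES[0])
    options.sorting_method   # -> 'affinity'
    options.read_affinity    # -> 'r1=100'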
[1] Related-Change: I9104fc789ba85ab3ab5ccd34096125b482821389
[2] Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
[3] Related-Change: I0d8928b55020592f8e75321d1f7678688301d797
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: I3f718f425f525baa80045ba067950c752bcaaefc
Recently our gate started blowing up intermittently with a strange
case of mixed-up ports. Sometimes a functional test tries to
authorize on a port that's clearly an object server port, and
the like. As it turns out, eventlet developers added an unavoidable
SO_REUSEPORT into listen(), which makes listen(("localhost", 0))
reuse ports.
There's an issue about it:
https://github.com/eventlet/eventlet/issues/411
This patch is working around the problem while eventlet people
consider the issue.
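One way to sidestep it in tests is to bind with a plain socket, which
does not set SO_REUSEPORT (a sketch; the helper name is illustrative):

    import socket

    def listen_zero():
        # The kernel hands out a genuinely unused ephemeral port.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(('127.0.0.1', 0))
        sock.listen(50)
        return sock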
Change-Id: I67522909f96495a6a30e1acdb79835dce2189549
Module setup() and teardown() functions are found by nosetests [1] but
unittest expects setUpModule() and tearDownModule() [2]. The latter
function names are also found by nosetests, so using those function
names enables the test module to be run with either nosetests or
unittest.
Although the tox test envs and .unittests script use nosetests, this
change allows the convenience of using unittest, for example when it
is the default test runner in a development environment such as
PyCharm.
This change also makes it unnecessary to explicitly call the setup()
and teardown() functions when executing the module directly.
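A minimal sketch of the rename (do_setup/do_teardown stand in for a
module's existing helpers):

    def setUpModule():
        do_setup()      # hypothetical module-level setup helper

    def tearDownModule():
        do_teardown()   # hypothetical module-level teardown helper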
[1] http://nose.readthedocs.io/en/latest/writing_tests.html#test-modules
[2] https://docs.python.org/2/library/unittest.html#setupmodule-and-teardownmodule
Change-Id: Ib2e5470a339af1f937b25d643b64356e8848ed36
The proxy server has on occasion error-limited a node by the time the
test runs, causing the proxy's node_iter to fail to iterate out this
error-limited node. As the test uses a default FakeRing with no
extra handoffs, on this occasion we only get 2 requests, which is not
enough for quorum, causing it to return a 503.
This patch sets the error_suppression_interval to 0 when creating
the proxy server, meaning a node effectively isn't error-limited.
Change-Id: I96cf4c4d63594f803cc1cd57e874d1624db8e249
Closes-Bug: #1682026
The refactoring in the Related-Change separated EC
specific object controller tests into EC specific TestCase
classes, but left two EC specific tests in the Replication
object controller test class. This patch moves them to the
appropriate test class.
Previously the tests were only executed once; now they are
executed in each of two subclasses using different EC
policies. As a result it was necessary to make the test
container name unique to the policy under test.
Related-Change: Ifd3d0fa66773e640bb61cc528f7a1b2358e97d91
Change-Id: Ie712ea91b5dd74c504a0dd6aa40c3d657277108c
Due to the refactoring of TestObjectController (related-change),
BaseTestECObjectController's test methods no longer need to be
unpatched, because they are expected to run against the policies set
up for the tests.
This patch makes the following changes:
- Move the parts of the setUp/tearDown routines in
  BaseTestObjectController that are needed only by
  TestReplicatedObjectController (the parts affecting patch_policies)
- Remove all unpatch_policies from BaseTestECObjectController
- Set up self.ec_policy to avoid setting the policy index and
  retrieving the policy in each test method.
The reason I didn't squash this into the related parent patch is to
clarify what was changed in each patch. The parent just groups the
tests into classes; this one attempts to improve them.
Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
Change-Id: I25a3f8fc837706d78dca226fe282d9e5ead65a0d
Previously, requests involving DLOs would bypass versioned_writes:
* Any existing DLOs wouldn't get copied to the archive container during
overwrites (or deletes, with history-mode), so there would be no
evidence they had ever existed.
* Any new DLOs wouldn't copy overwritten objects to the archive
container, potentially leading to data loss.
Now, DLOs will behave like every other type of object under
versioned_writes.
Change-Id: I488e13eead2f33dd272d03f6f898adc52fc7fdad
Related-Change: Ie899290b3312e201979eafefb253d1a60b65b837
Related-Change: Ib5b29a19e1d577026deb50fc9d26064a8da81cd7
Closes-Bug: #1626989
Since the related change for EC Duplication, Swift has had a couple of
test classes for EC policies, normal EC and EC Duplication, in
test/unit/proxy/test_server.py. To enable those classes, the related
change abstracted the EC test cases into the ECTestMixin class to
gather test methods into one place, but this made things worse because
TestObjectController still had test cases for both replication and EC,
which made the test class structure hard to understand.
Hence, this patch attempts to refactor the structure as
From:

              ECTestMixin
                   |
     -------------------------------------------
     |                                         |
 TestObjectController      TestObjectControllerECDuplication
 (for replication and EC)  (for EC Duplication Policy)

To:

          BaseTestObjectController
                     |
        ------------------------------------------
        |                                        |
  TestReplicatedObjectController      BaseTestECObjectController
  (for replication)                              |
                            ---------------------------------------
                            |                                     |
               TestECObjectController   TestECDuplicationObjectController
               (for EC policy)          (for EC Duplication Policy)
Some more cleanups are left to follow-up patches, because this patch
already moves a lot of code chunks, which makes the diff hard to
compare. To keep the review easy, this patch focuses ONLY on the
structure changes as much as possible.
Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
Related-Change: I25a3f8fc837706d78dca226fe282d9e5ead65a0d
Change-Id: Ifd3d0fa66773e640bb61cc528f7a1b2358e97d91
Follow up for related change:
- fix typos
- use common helper methods
- refactor some tests to reduce duplicate code
Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
Change-Id: I2f91a2f31e4c1b11f3d685fa8166c1a25eb87429
An operator providing a web UX to its customers might want to allow
web browsers to access some headers by default (e.g. X-Storage-Policy,
X-Container-Read, ...). This commit adds a new setting to the
proxy-server to allow some headers to be added cluster-wide to the CORS
header Access-Control-Expose-Headers.
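A hypothetical proxy-server.conf snippet (the option name and values
here are illustrative, not quoted from the patch):

    [app:proxy-server]
    cors_expose_headers = X-Storage-Policy, X-Container-Read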
Change-Id: I5ca90a052f27c98a514a96ee2299bfa1b6d46334
This patch enables efficient PUT/GET for global distributed clusters[1].
Problem:
Erasure coding has the capability to decrease the amount of actually
stored data compared to the replicated model. For example, the ec_k=6,
ec_m=3 parameters can store 1.5x the original data, which is smaller
than 3x replicated. However, unlike replication, erasure coding
requires the availability of at least ec_k fragments of the total
ec_k + ec_m fragments to service a read (e.g. 6 of 9 in the case
above). As such, if we stored an EC object in a swift cluster on 2
geographically distributed data centers which have the same volume of
disks, it is likely the fragments will be stored evenly (about 4 and
5), so we would still need to access a faraway data center to decode
the original object. In addition, if one of the data centers were lost
in a disaster, the stored objects would be lost forever, and we would
have to cry a lot. To ensure highly durable storage, you might think
of making *more* parity fragments (e.g. ec_k=6, ec_m=10);
unfortunately this causes *significant* performance degradation due to
the cost of the mathematical calculation for erasure coding
encode/decode.
How this resolves the problem:
EC Fragment Duplication extends on the initial solution to add *more*
fragments from which to rebuild an object similar to the solution
described above. The difference is making *copies* of encoded fragments.
With experimental results[1][2], employing small ec_k and ec_m shows
enough performance to store/retrieve objects.
On PUT:
- Encode the incoming object with small ec_k and ec_m <- faster!
- Make duplicated copies of the encoded fragments. The # of copies
  is determined by 'ec_duplication_factor' in swift.conf
- Store all fragments in Swift Global EC Cluster
The duplicated fragments increase pressure on existing requirements
when decoding objects to service a read request. All fragments are
stored with their X-Object-Sysmeta-Ec-Frag-Index. In this change, the
X-Object-Sysmeta-Ec-Frag-Index represents the actual fragment index
encoded by PyECLib, so there *will* be duplicates. Anytime we must
decode the original object data, we must only consider ec_k fragments
that are unique according to their X-Object-Sysmeta-Ec-Frag-Index: no
duplicate X-Object-Sysmeta-Ec-Frag-Index may be used when decoding an
object, and duplicates should be expected and avoided where possible.
On GET:
This patch includes the following changes:
- Change the GET path to sort primary nodes into grouped subsets, so
  that each subset includes unique fragments
- Change the Reconstructor to be more aware of possibly duplicated
  fragments
For example, with this change, a policy could be configured such that
swift.conf:
    ec_num_data_fragments = 2
    ec_num_parity_fragments = 1
    ec_duplication_factor = 2
    (object ring must have 6 replicas)
At Object-Server:
  node index (from object ring):  0 1 2 3 4 5 <- keep node index for
                                                 reconstruct decision
  X-Object-Sysmeta-Ec-Frag-Index: 0 1 2 0 1 2 <- each object keeps
                                                 actual fragment index
                                                 for backend (PyECLib)
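A sketch of the node-index -> fragment-index mapping implied by the
diagram (the variable names are illustrative):

    ec_num_unique_fragments = 2 + 1   # ec_k + ec_m
    ec_duplication_factor = 2
    ring_replicas = ec_num_unique_fragments * ec_duplication_factor

    def fragment_index(node_index):
        # duplicated copies wrap around the unique fragment indexes
        return node_index % ec_num_unique_fragments

    [fragment_index(i) for i in range(ring_replicas)]
    # -> [0, 1, 2, 0, 1, 2]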
Additional improvements to Global EC Cluster Support will require
features such as Composite Rings, and more efficient fragment
rebalance/reconstruction.
1: http://goo.gl/IYiNPk (Swift Design Spec Repository)
2: http://goo.gl/frgj6w (Slide Share for OpenStack Summit Tokyo)
DocImpact
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Idd155401982a2c48110c30b480966a863f6bd305
If a user sends a Range header with no satisfiable ranges, we send back
a 416 Requested Range Not Satisfiable response. Previously however,
there would be no indication of the size of the object they were
requesting, so they wouldn't know how to craft a satisfiable range. We
*do* send a Content-Length, but it is (correctly) the length of the
error message.
The RFC [1] has an answer for this:
> A server generating a 416 (Range Not Satisfiable) response to a
> byte-range request SHOULD send a Content-Range header field with an
> unsatisfied-range value, as in the following example:
>
> Content-Range: bytes */1234
>
> The complete-length in a 416 response indicates the current length of
> the selected representation.
Now, we'll send a Content-Range header for all 416 responses, including
those coming from the object server as well as those generated on a
proxy because of the Range mangling required to support EC policies.
[1] RFC 7233, section 4.2, although similar language was used in RFC
2616, sections 10.4.17 and 14.16
Change-Id: I80c7390fc6f84a10a212b0641bb07a64dfccbd45
Instead of using a separate .durable file to indicate
the durable status of a .data file, rename the .data
to include a durable marker in the filename. This saves
one inode for every EC fragment archive.
An EC policy PUT will, as before, first rename a temp
file to:
<timestamp>#<frag_index>.data
but now, when the object is committed, that file will be
renamed:
<timestamp>#<frag_index>#d.data
with the '#d' suffix marking the data file as durable.
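A sketch of the commit-time rename (the helper name is hypothetical;
the real change lives in the diskfile commit logic):

    import os

    def make_durable(datadir, timestamp, frag_index):
        old_name = '%s#%s.data' % (timestamp, frag_index)
        new_name = '%s#%s#d.data' % (timestamp, frag_index)
        os.rename(os.path.join(datadir, old_name),
                  os.path.join(datadir, new_name))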
Diskfile suffix hashing returns the same result when the
new durable-data filename or the legacy durable file is
found in an object directory. A fragment archive that has
been created on an upgraded object server will therefore
appear to be in the same state, as far as the consistency
engine is concerned, as the same fragment archive created
on an older object server.
Since legacy .durable files will still exist in deployed
clusters, many of the unit tests scenarios have been
duplicated for both new durable-data filenames and legacy
durable files.
Change-Id: I6f1f62d47be0b0ac7919888c77480a636f11f607
I don't think this is a real bug - just that the mocked iter wasn't
closing its subiters like the real iter does.
Change-Id: I44c8159f9eea8737bc86b6c7eb59a512e57e86c1
As mentioned in link [1], if we need filter on Python 3, replace
filter(lambda obj: test(obj), data) with:
[obj for obj in data if test(obj)].
[1] https://wiki.openstack.org/wiki/Python3
Change-Id: Ia1ea2ec89e4beb957a4cb358b0d0cef970f23e0a
This patch improves EC GET response handling:
- The proxy no longer requires all object servers to have a
durable file for the fragment archive that they return in
response to a GET. The proxy will now be satisfied if just
one object server has a durable file at the same timestamp
as fragments from other object servers.
This means that the proxy can now successfully GET an
object that had missing durable files when it was PUT.
- The proxy will now ensure that it has a quorum of *unique*
fragment indexes from object servers before considering a
GET to be successful.
- The proxy is now able to fetch multiple fragment archives
having different indexes from the same node. This enables
the proxy to successfully GET an object that has some
fragments that have landed on the same node, for example
after a rebalance.
This new behavior is facilitated by an exchange of new
headers on a GET request and response between the proxy and
object servers.
An object server now includes with a GET (or HEAD) response:
- X-Backend-Fragments: the value of this describes all
fragment archive indexes that the server has for the
object by encoding a map of the form: timestamp -> <list
of fragment indexes>
- X-Backend-Durable-Timestamp: the value of this is the
internal form of the timestamp of the newest durable file
that was found, if any.
- X-Backend-Data-Timestamp: the value of this is the
internal form of the timestamp of the data file that was
used to construct the diskfile.
A proxy server now includes with a GET request:
- X-Backend-Fragment-Preferences: the value of this
describes the proxy's current preference with respect to
those fragments that it would have object servers
return. It encodes a list of timestamp, and for each
timestamp a list of fragment indexes that the proxy does
NOT require (because it already has them).
The presence of a X-Backend-Fragment-Preferences header
(even one with an empty list as its value) will cause the
object server to search for the most appropriate fragment
to return, disregarding the existence or not of any
durable file. The object server assumes that the proxy
knows best.
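An illustrative exchange (header values here only sketch the
descriptions above; the exact encodings are defined by the patch):

    object server GET/HEAD response:
        X-Backend-Fragments: {"1493649278.25992": [0, 2]}
        X-Backend-Durable-Timestamp: 1493649278.25992
        X-Backend-Data-Timestamp: 1493649278.25992

    proxy GET request:
        X-Backend-Fragment-Preferences:
            [{"timestamp": "1493649278.25992", "exclude": [1]}]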
Closes-Bug: 1469094
Closes-Bug: 1484598
Change-Id: I2310981fd1c4622ff5d1a739cbcc59637ffe3fc3
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Linux 3.11 introduced O_TMPFILE as a flag to open() sys call. This would
enable users to get a fd to an unnamed temporary file. As it's unnamed,
it does not require the caller to devise unique names. It is also not
accessible through any path. Hence, file creation is race-free.
This file is initially unreachable. It is then populated with data
(write), metadata (fsetxattr) and fsync'd before being atomically
linked into the filesystem in a fully formed state using the linkat()
sys call. Only after a successful linkat() will the object file be
available for reference.
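A minimal sketch of this flow (Linux-only, Python 3.4+; error handling
and the xattr writes are omitted, and the paths are illustrative):

    import os

    datadir = '/srv/node/d1/objects/1/abc/hash'
    fd = os.open(datadir, os.O_TMPFILE | os.O_WRONLY, 0o600)
    try:
        os.write(fd, b'object data')
        # fsetxattr() metadata writes would happen here
        os.fsync(fd)
        # linkat() via /proc gives the fully formed file a name
        os.link('/proc/self/fd/%d' % fd,
                os.path.join(datadir, '1493649278.25992.data'))
    finally:
        os.close(fd)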
Caveats
* Unlike os.rename(), linkat() cannot overwrite destination path if it
already exists. If path exists, we unlink and try again.
* XFS support for O_TMPFILE was only added in Linux 3.15.
* If client disconnects during object upload, although there is no
incomplete/stale file on disk, the object directory would persist
and is not cleaned up immediately.
Change-Id: I8402439fab3aba5d7af449b5e465f89332f606ec
Signed-off-by: Prashanth Pai <ppai@redhat.com>
From the (non-normative) Implementation Considerations section of
https://www.w3.org/TR/cors/#resource-implementation :
> Resources that wish to enable themselves to be shared with multiple
> Origins but do not respond uniformly with "*" must in practice
> generate the Access-Control-Allow-Origin header dynamically in
> response to every request they wish to allow. As a consequence,
> authors of such resources should send a Vary: Origin HTTP header or
> provide other appropriate control directives to prevent caching of
> such responses, which may be inaccurate if re-used across-origins.
We do the first part (dynamic Access-Control-Allow-Origin: generation
based on the incoming Origin: header), but not the second (send a
Vary: Origin header). Consider this scenario:
1. Swift user Alice has some static content that should be available
from some (but not all) other domains. She creates a new container
with an appropriate X-Container-Meta-Access-Control-Allow-Origin
like "http://foo.example.comhttp://bar.example.com".
2. End user Bob pulls up a browser and visits http://foo.example.com,
which references a cross-origin resource. Seeing this, the browser
issues a preflight request and gets back a response that includes
headers like:
Access-Control-Allow-Origin: http://foo.example.com
Access-Control-Allow-Methods: HEAD, GET, PUT, POST, COPY,
OPTIONS, DELETE
Since the preflight succeeded, the browser follows through on the
cross-origin request and everything loads properly.
3. Now Bob visits http://bar.example.com, which references the same
resource. Ordinarily, the exact same thing would happen, but with
http://bar.example.com in the headers. However, if the browser
cached the preflight response (because it didn't want to make two
requests every time it needed a resource), it would assume the server
would only allow resource-sharing with http://foo.example.com and
not load the resource.
Similar issues arise from the dynamically-generated
Access-Control-Allow-Headers header.
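Illustrative preflight response headers after this change (the
dynamically-generated Access-Control-Allow-Headers case gets the same
treatment):

    Access-Control-Allow-Origin: http://bar.example.com
    Vary: Origin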
For more information on the Vary: header, see
http://tools.ietf.org/html/rfc7231#section-7.1.4
Change-Id: I9950e593312f654ee596b7f43f7ab9e5b684d8e5
In the object proxy controller, the POST method checked the metadata of an
object before calling swift.authorize. This could allow an auth middleware to
set metadata that violates constraints. Instead, checking the metadata should
take place after authorization.
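A simplified sketch of the reordering (not the controller's full
method; check_metadata comes from swift.common.constraints):

    from swift.common.constraints import check_metadata

    def POST(self, req):
        # authorize first ...
        if 'swift.authorize' in req.environ:
            aresp = req.environ['swift.authorize'](req)
            if aresp:
                return aresp
        # ... and only then validate object metadata constraints
        error_response = check_metadata(req, 'object')
        if error_response:
            return error_response
        # ... proceed with handling the POST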
Change-Id: I5f05039498c406473952e78c6a40ec11e8b53f8e
Closes-Bug: #1596944