swift

Author	SHA1	Message	Date
Romain LE DISEZ	e199192cae	Replace replication_one_per_device by custom count This commit replaces boolean replication_one_per_device by an integer replication_concurrency_per_device. The new configuration parameter is passed to utils.lock_path(), which now accepts as an argument a limit for the number of locks that can be acquired for a specific path. Instead of trying to lock path/.lock, utils.lock_path() now tries to lock files path/.lock-X, where X is in range(0, N), N being the limit for the number of locks allowed for the path. The default value of limit is set to 1. Change-Id: I3c3193344c7a57a8a4fc7932d1b10e702efd3572	2017-10-24 16:17:41 +01:00
Samuel Merritt	9c97c80b26	Clean up a couple hand-rolled mocks. Change-Id: I6582985990e8b5e3a0c65bce5d3bb1e39d58dfb9	2017-10-20 15:31:07 -07:00
Zuul	6b8716a34e	Merge "tighten up drop_privileges unit tests"	2017-10-19 03:36:58 +00:00
Zuul	3facd2a13b	Merge "Drop group comparison from drop_privileges test"	2017-10-18 10:55:25 +00:00
Alistair Coles	1aecd1dfc6	tighten up drop_privileges unit tests add more assertions about args that are passed to os module functions Related-Change: Ida15e72ae4ecdc2d6ce0d37bd99c2d86bd4e5ddc Change-Id: Iee483274aff37fc9930cd54008533de2917157f4	2017-10-18 11:00:59 +01:00
Tim Burke	646f7507a1	Quiet test output when running test_utils.py in isolation Change-Id: I4cf85d3cd5a20424e9bbbdf0213120b3c3d4b837	2017-10-17 21:17:57 +00:00
Corey Bryant	0f91c862e1	Drop group comparison from drop_privileges test Drop the group comparison from drop_privileges test as it isn't valid since os.setgroups() is mocked. Change-Id: Ida15e72ae4ecdc2d6ce0d37bd99c2d86bd4e5ddc Closes-Bug: #1724342	2017-10-17 20:06:27 +00:00
Jenkins	21a0df4ae0	Merge "More assertion cleanup"	2017-09-13 06:44:02 +00:00
Samuel Merritt	6d160797fc	Fix deadlock when logging from a tpool thread. The object server runs certain IO-intensive methods outside the main pthread for performance. If one of those methods tries to log, this can cause a crash that eventually leads to an object server with hundreds or thousands of greenthreads, all deadlocked. The short version of the story is that logging.SysLogHandler has a mutex which Eventlet monkey-patches. However, the monkey-patched mutex sometimes breaks if used across different pthreads, and it breaks in such a way that it is still considered held. After that happens, any attempt to emit a log message blocks the calling greenthread forever. The fix is to use a mutex that works across different greenlets and across different pthreads. This patch introduces such a lock based on an anonymous pipe. Change-Id: I57decefaf5bbed57b97a62d0df8518b112917480 Closes-Bug: 1710328	2017-08-16 14:10:25 -07:00
Jenkins	e71064b38c	Merge "Fix swiftdir option and usage of storage policy aliases"	2017-07-25 23:15:19 +00:00
Jenkins	3c11f6b8a8	Merge "Make dict deletion idempotent in dump_recon_cache"	2017-07-18 03:28:16 +00:00
Jenkins	83b62b4f39	Merge "Add Timestamp.now() helper"	2017-07-18 03:27:50 +00:00
Alistair Coles	c2e59b9b8b	Make dict deletion idempotent in dump_recon_cache Calling dump_recon_cache with a key mapped to an empty dict value causes the key to be removed from the cache entry. Doing the same again causes the key to be added back and mapped to an empty dict, and the key continues to toggle as calls are repeated. This behavior is seen on the Related-Bug report. This patch fixes dump_recon_cache to make deletion of a key idempotent. This fix is needed for the Related-Change, which makes use of empty dicts with dump_recon_cache to clear unwanted keys from the cache. The only caller that currently sets empty dict values is obj/auditor.py, where the current intended behavior would appear to be as per this patch. Related-Change: I28925a37f3985c9082b5a06e76af4dc3ec813abe Related-Bug: #1704858 Change-Id: If9638b4e7dba0ec2c7bd95809cec6c5e18e9301e	2017-07-17 17:14:50 -07:00
Christian Schwede	2410b616bb	Fix swiftdir option and usage of storage policy aliases If swift-recon, swift-get-nodes or swift-object-info is used with the swiftdir option, they will read rings from the given directory; however, they still use /etc/swift/swift.conf to find the policies on the current node. This makes it impossible to maintain a local swift.conf copy (if you don't have write access to /etc/swift) or check multiple clusters from the same node. Until now swift-recon was also not usable with storage policy aliases; this patch fixes this as well. Closes-Bug: 1577582 Closes-Bug: 1604707 Closes-Bug: 1617951 Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Co-Authored-By: Thiago da Silva <thiago@redhat.com> Change-Id: I13188d42ec19e32e4420739eacd1e5b454af2ae3	2017-07-12 16:23:04 -04:00
Jenkins	e94b383655	Merge "Add support to increase object ring partition power"	2017-07-05 14:40:42 +00:00
Christian Schwede	e1140666d6	Add support to increase object ring partition power This patch adds methods to increase the partition power of an existing object ring without downtime for the users, using a 3-step process. Data won't be moved to other nodes; objects using the new increased partition power will be located on the same device and are hardlinked to avoid data movement. 1. A new setting "next_part_power" will be added to the rings, and once the proxy server has reloaded the rings it will send this value to the object servers on any write operation. Object servers will now create a hard link in the new location to the original DiskFile object. Already existing data will be relinked into the new locations with hard links, using a new tool. 2. The actual partition power itself will be increased. Servers will now use the new partition power to read from and write to. Hard links in the old object locations are no longer required and have to be removed by the relinker tool; the relinker tool reads the next_part_power setting to find object locations that need to be cleaned up. 3. The "next_part_power" flag will be removed. This mostly implements the spec in [1]; however, it's not using an "epoch" as described there. The idea of the epoch was to store data using different partition powers in their own namespace to avoid conflicts with auditors and replicators, as well as being able to abort such an operation and just remove the new tree. This would require some heavy changes to the on-disk data layout, and other object-server implementations would be required to adopt this scheme too. Instead, the object-replicator is now aware that there is a partition power increase in progress and will skip replication of data in that storage policy; the relinker tool should simply be run, and afterwards the partition power will be increased. This shouldn't take that much time (it's only walking the filesystem and hardlinking); impact should therefore be low. The relinker should be run on all storage nodes at the same time in parallel to decrease the required time (though this is not mandatory). Failures during relinking should not affect cluster operations; relinking can even be aborted manually and restarted later. Auditors do not quarantine objects written to a path with a different partition power and therefore work as before (though they read each object twice in the worst case before the no longer needed hard links are removed). Co-Authored-By: Alistair Coles <alistair.coles@hpe.com> Co-Authored-By: Matthew Oliver <matt@oliver.net.au> Co-Authored-By: Tim Burke <tim.burke@gmail.com> [1] https://specs.openstack.org/openstack/swift-specs/specs/in_progress/increasing_partition_power.html Change-Id: I7d6371a04f5c1c4adbb8733a71f3c177ee5448bb	2017-06-15 15:08:48 -07:00
Jenkins	6181351a65	Merge "Make mount_check option usable in containerized environments"	2017-06-14 19:49:23 +00:00
lingyongxu	ee9458a250	Using assertIsNone() instead of assertEqual(None) Following OpenStack Style Guidelines: [1] http://docs.openstack.org/developer/hacking/#unit-tests-and-assertraises [H203] Unit test assertions tend to give better messages for more specific assertions. As a result, assertIsNone(...) is preferred over assertEqual(None, ...) and assertIs(..., None) Change-Id: If4db8872c4f5705c1fff017c4891626e9ce4d1e4	2017-06-07 14:05:53 +08:00
Christian Schwede	5eeaa95440	Make mount_check option usable in containerized environments The ismount_raw method does not work inside containers if disks are mounted on the host system and only mountpoints are exposed inside the containers. In this case the inode and device checks fail, making this option unusable. Mounting devices into the containers would solve this. However, this would require that all processes that require access to a device are running inside the same container, which counteracts the container concept. This patch adds the possibility to place stub files named ".ismount" into the root directory of any device, and Swift assumes a given device to be mounted if that file exists. This should be transparent to existing clusters. Change-Id: I9d9fc0a4447a8c5dd39ca60b274c119af6b4c28f	2017-05-19 12:16:53 +02:00
Jenkins	1f36582efb	Merge "Fix unit tests on i386 and other archs"	2017-05-10 19:58:45 +00:00
Tim Burke	85d6cd30be	Add Timestamp.now() helper Often, we want the current timestamp. May as well improve the ergonomics a bit and provide a class method for it. Change-Id: I3581c635c094a8c4339e9b770331a03eab704074	2017-04-27 14:19:00 -07:00
Ondřej Nový	9e15effb3b	Fix unit tests on i386 and other archs Change-Id: I4f84b725e220e28919570fd7f296b63b34d0375d	2017-04-24 21:40:31 +00:00
Tim Burke	1776e0fd20	Improve test_get_valid_utf8_str coverage Include a couple trivial cases, and verify that surrogate pairs get collapsed. Also, move it to a more-appropriate class. Related-Change: I4c570c08c770636d57b1157e19d5b7034fd9ed4e (patchset 3) Change-Id: Iab0fdafe08d06a9d677dc421e60779e94d27ba9b	2017-04-20 15:54:54 -07:00
Clay Gerrard	88ebcafbb9	Fix intermittent test_unlink_* failures Change-Id: Iab403724a418e5d8a44e56e58da782bc66eab6e4 Closes-Bug: #1579578	2017-03-29 22:30:54 +00:00
Alistair Coles	e4972f5ac7	Fixups for EC frag duplication tests Follow up for related change: - fix typos - use common helper methods - refactor some tests to reduce duplicate code Related-Change: Idd155401982a2c48110c30b480966a863f6bd305 Change-Id: I2f91a2f31e4c1b11f3d685fa8166c1a25eb87429	2017-02-25 20:40:04 -08:00
Kota Tsuyuzaki	40ba7f6172	EC Fragment Duplication - Foundational Global EC Cluster Support This patch enables efficient PUT/GET for globally distributed clusters[1]. Problem: Erasure coding can decrease the amount of actual stored data compared to the replicated model. For example, ec_k=6, ec_m=3 parameters can be 1.5x of the original data, which is smaller than 3x replication. However, unlike replication, erasure coding requires availability of at least ec_k fragments of the total ec_k + ec_m fragments to service a read (e.g. 6 of 9 in the case above). As such, if we stored the EC object into a swift cluster on 2 geographically distributed data centers which have the same volume of disks, it is likely the fragments will be stored evenly (about 4 and 5), so we still need to access a faraway data center to decode the original object. In addition, if one of the data centers was lost in a disaster, the stored objects will be lost forever, and we have to cry a lot. To ensure highly durable storage, you would think of making more parity fragments (e.g. ec_k=6, ec_m=10); unfortunately, this causes significant performance degradation due to the cost of mathematical calculation for erasure coding encode/decode. How this resolves the problem: EC Fragment Duplication extends the initial solution to add more fragments from which to rebuild an object, similar to the solution described above. The difference is making copies of encoded fragments. Experimental results[1][2] show that small ec_k and ec_m give enough performance to store/retrieve objects. On PUT: - Encode the incoming object with small ec_k and ec_m <- faster! - Make duplicated copies of the encoded fragments. The # of copies is determined by 'ec_duplication_factor' in swift.conf - Store all fragments in the Swift Global EC Cluster The duplicated fragments increase pressure on existing requirements when decoding objects in service to a read request. All fragments are stored with their X-Object-Sysmeta-Ec-Frag-Index. In this change, the X-Object-Sysmeta-Ec-Frag-Index represents the actual fragment index encoded by PyECLib, so there will be duplicates. Anytime we must decode the original object data, we must only consider the ec_k fragments as unique according to their X-Object-Sysmeta-Ec-Frag-Index. On decode, no duplicate X-Object-Sysmeta-Ec-Frag-Index may be used when decoding an object; duplicate X-Object-Sysmeta-Ec-Frag-Index values should be expected and avoided if possible. On GET: This patch includes the following changes: - Change GET Path to sort primary nodes grouping as subsets, so that each subset includes unique fragments - Change Reconstructor to be more aware of possibly duplicate fragments For example, with this change, a policy could be configured such that swift.conf: ec_num_data_fragments = 2 ec_num_parity_fragments = 1 ec_duplication_factor = 2 (object ring must have 6 replicas) At Object-Server: node index (from object ring): 0 1 2 3 4 5 <- keep node index for reconstruct decision X-Object-Sysmeta-Ec-Frag-Index: 0 1 2 0 1 2 <- each object keeps actual fragment index for backend (PyECLib) Additional improvements to Global EC Cluster Support will require features such as Composite Rings, and more efficient fragment rebalance/reconstruction. 1: http://goo.gl/IYiNPk (Swift Design Spec Repository) 2: http://goo.gl/frgj6w (Slide Share for OpenStack Summit Tokyo) Doc-Impact Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: Idd155401982a2c48110c30b480966a863f6bd305	2017-02-22 10:56:13 -08:00
Jenkins	fa456a590c	Merge "IO Priority support on the AArch64 architecture"	2017-02-09 00:16:30 +00:00
tone-zhang	031ba11357	IO Priority support on the AArch64 architecture This patch fixes Swift's IO priority control on the AArch64 architecture by getting the correct __NR_ioprio_set value. Change-Id: Ic93ce80fde223074e7d1a5338c8cf88863c6ddeb Closes-Bug: #1658405	2017-01-24 05:40:10 +00:00
Jenkins	06136112f7	Merge "Refactor recon to use single md5_hash_for_file function"	2017-01-24 00:38:24 +00:00
Jenkins	94c6b9e46b	Merge "Raise ValueError if a config section does not exist"	2016-12-19 07:30:59 +00:00
Christian Hugo	ffd5194a3b	Raise ValueError if a config section does not exist Instead of printing the error message and calling sys.exit() when a section does not exist or reading the file fails, raise an exception from readconfig. Depending on whether it is a ValueError or an IOError, the caller can decide whether to exit or continue. If an exception reaches the wsgi utilities, it bubbles all the way up. Change-Id: Ieb444f8c34e37f49bea21c3caf1c6c2d7bee5fb4 Closes-Bug: 1578321	2016-12-15 19:49:57 +00:00
Tim Burke	c6b9195db8	More assertion cleanup Change-Id: Id88af19c5bfd0bcbbeabcf4eeb23beef4c50b1cb Related-Change: I416831c8ad92f8445bc8d9560040a5ebf5c90702	2016-12-12 14:08:07 -08:00
Cao Xuan Hoang	3da144a3af	Replace 'assertTrue(a not in b)' with 'assertNotIn(a, b)' trivialfix Change-Id: I416831c8ad92f8445bc8d9560040a5ebf5c90702	2016-12-12 16:23:09 +07:00
Alistair Coles	609b5182c4	Refactor recon to use single md5_hash_for_file function There were several implementations of hashing the content of a file in cli/recon.py and common/middleware/recon.py. This patch relocates one implementation (_hash_for_ringfile, introduced in the Related Change) to common/utils.py and refactors recon cli and middleware to use that function. Also improves use of mocking in the unit tests to eliminate passing custom file opener functions to the ReconMiddleware get_ring_md5 and get_swift_conf_md5 methods. Related-Change: I9623752c3cd2361f57864f3e938e1baf5e9292d7 Change-Id: Iaad88e49aadeb28f614aafa1e9596fe07ce9793a	2016-12-02 18:22:59 +00:00
Ondřej Nový	9847796f01	Set owner of drive-audit recon cache to swift user Fixes this problem: * swift-drive-audit needs to be run by root, because only root has "umount" permission * swift-object servers typically run as user swift * if swift-drive-audit is run by root, /var/cache/swift/drive.recon is owned by root, with 0o600 * recon middleware (inside swift-object-server) can't read this cache file: swift-object: Error reading recon cache file This patch adds a "user" option to the drive-audit config file. The recon cache is chowned to this user. Change-Id: Ibf20543ee690b7c5a37fabd1540fd5c0c7b638c9	2016-10-19 17:16:42 +00:00
Jenkins	32bc272634	Merge "Fix when we set state in Spliterator"	2016-10-03 23:46:22 +00:00
Tim Burke	2ec4189e37	Fix when we set state in Spliterator Also clean up a comment and some exception text Change-Id: I1e7755cc0468f9a3ba96a0dd24868f09a10c3df0 Related-Change: I24716e3271cf3370642e3755447e717fd7d9957c	2016-10-03 14:27:47 -07:00
Jenkins	1e5c5c35bd	Merge "Support multi-range GETs for static large objects."	2016-09-28 04:48:34 +00:00
Alistair Coles	44a861787a	Enable object server to return non-durable data This patch improves EC GET response handling: - The proxy no longer requires all object servers to have a durable file for the fragment archive that they return in response to a GET. The proxy will now be satisfied if just one object server has a durable file at the same timestamp as fragments from other object servers. This means that the proxy can now successfully GET an object that had missing durable files when it was PUT. - The proxy will now ensure that it has a quorum of unique fragment indexes from object servers before considering a GET to be successful. - The proxy is now able to fetch multiple fragment archives having different indexes from the same node. This enables the proxy to successfully GET an object that has some fragments that have landed on the same node, for example after a rebalance. This new behavior is facilitated by an exchange of new headers on a GET request and response between the proxy and object servers. An object server now includes with a GET (or HEAD) response: - X-Backend-Fragments: the value of this describes all fragment archive indexes that the server has for the object by encoding a map of the form: timestamp -> <list of fragment indexes> - X-Backend-Durable-Timestamp: the value of this is the internal form of the timestamp of the newest durable file that was found, if any. - X-Backend-Data-Timestamp: the value of this is the internal form of the timestamp of the data file that was used to construct the diskfile. A proxy server now includes with a GET request: - X-Backend-Fragment-Preferences: the value of this describes the proxy's current preference with respect to those fragments that it would have object servers return. It encodes a list of timestamps, and for each timestamp a list of fragment indexes that the proxy does NOT require (because it already has them). The presence of an X-Backend-Fragment-Preferences header (even one with an empty list as its value) will cause the object server to search for the most appropriate fragment to return, disregarding the existence or not of any durable file. The object server assumes that the proxy knows best. Closes-Bug: 1469094 Closes-Bug: 1484598 Change-Id: I2310981fd1c4622ff5d1a739cbcc59637ffe3fc3 Co-Authored-By: Paul Luse <paul.e.luse@intel.com> Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>	2016-09-16 11:40:14 +01:00
Jenkins	8608bd96dd	Merge "Make object creation more atomic in Linux"	2016-09-13 04:02:47 +00:00
Jenkins	dd30b9ef98	Merge "Close the iterators in string_along."	2016-08-31 00:15:33 +00:00
Timur Alperovich	66c905e294	Close the iterators in string_along. Make sure to close the underlying iterator in string_along. What is currently happening when using the InternalClient is that "Client disconnected" warnings are generated and resources are tied up until GC runs. Change-Id: If1f6c0c756aee95f53f99371439533a97d347eab	2016-08-30 14:28:08 -07:00
Prashanth Pai	773edb4a5d	Make object creation more atomic in Linux Linux 3.11 introduced O_TMPFILE as a flag to the open() syscall. This enables users to get an fd to an unnamed temporary file. As it's unnamed, it does not require the caller to devise unique names. It is also not accessible through any path. Hence, file creation is race-free. This file is initially unreachable. It is then populated with data (write), metadata (fsetxattr) and fsync'd before being atomically linked into the filesystem in a fully formed state using the linkat() syscall. Only after a successful linkat() will the object file be available for reference. Caveats * Unlike os.rename(), linkat() cannot overwrite the destination path if it already exists. If the path exists, we unlink and try again. * XFS support for O_TMPFILE was only added in Linux 3.15. * If the client disconnects during object upload, although there is no incomplete/stale file on disk, the object directory would persist and is not cleaned up immediately. Change-Id: I8402439fab3aba5d7af449b5e465f89332f606ec Signed-off-by: Prashanth Pai <ppai@redhat.com>	2016-08-24 14:56:00 +05:30
Samuel Merritt	4bcd3d7f6d	Support multi-range GETs for static large objects. Bonus consistency: 416 responses now always have a body. Before, if you had "swob.HTTPRequestedRangeNotSatisfiable()", you'd get a body, but if you had "swob.Response(..., conditional_response=True)", then you'd get a length-0 response body. Now you always get a response body. It's just the default <html><h1>..., but it's always there. Bonus efficiency: do a little caching of sub-SLO manifests to avoid needless re-fetches. This kicks in when there are multiple references to the same sub-SLO in a given manifest. The caching only holds 20 sub-SLOs so that a malicious user can't build a giant SLO tree and use it to run the proxy out of memory (we're already holding up to 10 manifests in memory at a time since an SLO can include another SLO to a depth of 10; this doesn't make the situation too much worse). Change-Id: I24716e3271cf3370642e3755447e717fd7d9957c	2016-08-18 15:56:06 -07:00
Peter Lisák	ed772236c7	Change schedule priority of daemon/server in config The goal is to modify the scheduling priority and the I/O scheduling class and priority of a daemon/server via configuration. The setting is optional; the default keeps the current behaviour. Use case: Prioritize object-server over object-auditor, because all users' requests need to be served in peak hours and the audit can wait. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> DocImpact Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396	2016-08-10 23:56:15 +02:00
Jenkins	e9f5e7966a	Merge "Moved ipv4 & ipv6 validations to the common utils"	2016-07-29 21:25:16 +00:00
Nandini Tata	5c9732ac8e	Moved ipv4 & ipv6 validations to the common utils Validating IP addresses for ipv4 and ipv6 formats has more generic use cases outside of rings. swift-get-nodes and other utilities that need to handle ipv6 addresses often require importing ip validation methods from swift/common/ring/utils (see Related-Change). Also, the expand_ipv6 method already exists in swift/common/utils. Hence, validation of IPs is also moved into swift/common/utils from swift/common/ring/utils. Related-Change: I6551d65241950c65e7160587cc414deb4a2122f5 Change-Id: I720a9586469cf55acab74b4b005907ce106b3da4	2016-07-28 12:08:06 -07:00
Alistair Coles	928c4790eb	Refactor tests and add tests Relocates some test infrastructure in preparation for use with encryption tests, in particular moves the test server setup code from test/unit/proxy/test_server.py to a new helpers.py so that it can be re-used, and adds ability to specify additional config options for the test servers (used in encryption tests). Adds unit test coverage for extract_swift_bytes and functional test coverage for container listings. Adds a check on the content and metadata of reconciled objects in probe tests. Change-Id: I9bfbf4e47cb0eb370e7a74d18c78d67b6b9d6645	2016-06-15 16:36:25 +01:00
Matthew Oliver	876df35f84	disable_fallocate also disables fallocate_reserve Currently when disable_fallocate is true it disables calling the fallocate syscall, but it doesn't disable fallocate_reserve. This patch fixes this. This problem has caused functional tests to fail in our SAIOs, since SAIOs have disable_fallocate set but the fallocate_reserve free space check was still being run, creating 507 responses. This is due to the fallocate_reserve default changing from 0 to 1%. Because fallocate_reserve and disable_fallocate together cause SAIO functional tests to fail, a section called 'Known Issues' has been added to the SAIO developer documentation, which includes a warning about using fallocate_reserve on SAIOs. Change-Id: I727bfb0861ea26fe2f16ad55f4d36ae088864d8f	2016-05-19 11:56:29 +10:00
Samuel Merritt	6834547f66	Clean up fallocate tests a little Change-Id: I01f1ad8ef0f8910718fd2fb30c9e8285358baf84	2016-05-12 08:46:48 -07:00
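Commit e199192cae (Replace replication_one_per_device by custom count) describes utils.lock_path() trying lock files path/.lock-X for X in range(0, N). The sketch below illustrates that idea with non-blocking fcntl.flock() calls; the names lock_path_sketch and LockTimeout, and the 10 ms polling interval, are hypothetical choices for illustration, not Swift's implementation.

```python
import contextlib
import errno
import fcntl
import os
import time


class LockTimeout(Exception):
    """Raised when no lock slot frees up before the timeout (hypothetical)."""


def _try_lock(fd):
    """Try a non-blocking exclusive flock(); return True on success."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError as err:
        if err.errno in (errno.EAGAIN, errno.EACCES):
            return False
        raise


@contextlib.contextmanager
def lock_path_sketch(directory, timeout=10, limit=1):
    """Hold one of `limit` lock files named .lock-0 .. .lock-<limit-1>."""
    if limit < 1:
        raise ValueError('limit must be >= 1')
    os.makedirs(directory, exist_ok=True)
    fds = []
    try:
        for i in range(limit):
            path = os.path.join(directory, '.lock-%d' % i)
            fds.append(os.open(path, os.O_WRONLY | os.O_CREAT, 0o600))
        deadline = time.monotonic() + timeout
        # any() stops at the first slot that locks successfully.
        while not any(_try_lock(fd) for fd in fds):
            if time.monotonic() >= deadline:
                raise LockTimeout(directory)
            time.sleep(0.01)
        yield
    finally:
        # Closing the descriptors releases whichever slot we held.
        for fd in fds:
            os.close(fd)
```

With limit=1 this reduces to a single lock file, matching the commit's default; a larger limit lets up to N holders work on the same path at once.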
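Commit 6d160797fc (Fix deadlock when logging from a tpool thread) introduces a lock based on an anonymous pipe. The stdlib-only sketch below shows the core idea: one byte sitting in the pipe means "unlocked", acquire() reads it and release() writes it back, so the lock state lives in the kernel and behaves the same whichever pthread calls it. The class name is hypothetical, and unlike the lock in the commit this sketch does not cooperate with eventlet's hub, so a greenthread calling acquire() here would block its whole pthread.

```python
import os


class PipeMutexSketch(object):
    """Mutex whose state is a single byte in an anonymous pipe."""

    def __init__(self):
        self.rfd, self.wfd = os.pipe()
        # One byte in the pipe == "lock is free".
        os.write(self.wfd, b'-')

    def acquire(self):
        # Blocks in the kernel until the byte is available, so it works
        # regardless of which pthread (or greenlet) is calling.
        os.read(self.rfd, 1)

    def release(self):
        os.write(self.wfd, b'-')

    def close(self):
        os.close(self.rfd)
        os.close(self.wfd)

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()
```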
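Commit c2e59b9b8b (Make dict deletion idempotent in dump_recon_cache) specifies that an empty dict value removes a key, and that repeating the call must leave the key removed instead of toggling it back. A minimal sketch of that merge rule, using a hypothetical helper merge_recon_entry() on an in-memory dict (dump_recon_cache itself also persists the result to the recon cache file):

```python
def merge_recon_entry(cache_entry, key, item):
    """Merge `item` into cache_entry[key]; an empty dict means "delete".

    Hypothetical helper illustrating idempotent deletion.
    """
    if isinstance(item, dict):
        if not item:
            # Popping an absent key is a no-op, so repeated calls are safe.
            cache_entry.pop(key, None)
            return
        sub = cache_entry.get(key)
        if not isinstance(sub, dict):
            sub = cache_entry[key] = {}
        for sub_key, value in item.items():
            merge_recon_entry(sub, sub_key, value)
    else:
        cache_entry[key] = item
```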
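Commit e1140666d6 (Add support to increase object ring partition power) keeps data on the same device and only hard-links it. Assuming the ring takes an object's partition from the top part_power bits of the first four bytes of its hash, and assuming an on-disk layout of objects/<part>/<suffix>/<hash>/<file> with the suffix being the hash's last three hex digits, the sketch below shows why raising the power by one turns partition p into 2p or 2p + 1, and how a relinker-style step can hard-link a file into its future location without copying data. All function names are hypothetical.

```python
import os
import struct


def get_partition(object_hash_hex, part_power):
    """Partition = top `part_power` bits of the hash's first 4 bytes."""
    top32 = struct.unpack_from('>I', bytes.fromhex(object_hash_hex))[0]
    return top32 >> (32 - part_power)


def object_path(device_dir, object_hash_hex, filename, part_power):
    """Assumed layout: objects/<part>/<suffix>/<hash>/<file>."""
    part = get_partition(object_hash_hex, part_power)
    suffix = object_hash_hex[-3:]
    return os.path.join(device_dir, 'objects', str(part), suffix,
                        object_hash_hex, filename)


def relink(device_dir, object_hash_hex, filename, part_power):
    """Hard-link a file into its location under part_power + 1."""
    old = object_path(device_dir, object_hash_hex, filename, part_power)
    new = object_path(device_dir, object_hash_hex, filename, part_power + 1)
    os.makedirs(os.path.dirname(new), exist_ok=True)
    if not os.path.exists(new):
        os.link(old, new)  # no data copied: both names share one inode
    return old, new
```

Dropping the lowest bit of the new partition always yields the old one, which is why the two locations can live side by side until the old links are cleaned up.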
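Commit 5eeaa95440 (Make mount_check option usable in containerized environments) lets a ".ismount" stub file in a device's root directory stand in for a real mount point check. A sketch of that rule, with a hypothetical function name and os.path.ismount() as the assumed fallback:

```python
import os


def ismount_with_stub(path):
    """Report `path` as mounted if it contains a '.ismount' stub file.

    Otherwise fall back to a regular mount point check.
    """
    if os.path.isfile(os.path.join(path, '.ismount')):
        return True
    return os.path.ismount(path)
```

Inside a container, an operator would create the stub in each exposed device directory (for example an assumed /srv/node/sdb1/.ismount) so the check passes even though the container cannot see the host's mount table.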
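Commit 85d6cd30be (Add Timestamp.now() helper) adds a class method that builds a timestamp from the current time. The cut-down class below is a hypothetical stand-in, not swift.common.utils.Timestamp; the 16-character, 5-decimal format of its normal property is an assumption made for illustration.

```python
import time


class TimestampSketch(object):
    """Hypothetical, much-reduced stand-in for a Timestamp class."""

    def __init__(self, timestamp):
        self.timestamp = round(float(timestamp), 5)

    @classmethod
    def now(cls):
        """Convenience constructor for the current time."""
        return cls(time.time())

    @property
    def normal(self):
        # Zero-padded to 16 characters with 5 decimal places.
        return '%016.05f' % self.timestamp
```

Callers can then write TimestampSketch.now() instead of TimestampSketch(time.time()), which is the ergonomic gain the commit describes.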
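Commit 40ba7f6172 (EC Fragment Duplication) gives an example with ec_num_data_fragments = 2, ec_num_parity_fragments = 1 and ec_duplication_factor = 2 on a 6-replica object ring, where node indexes 0 to 5 hold fragment indexes 0 1 2 0 1 2. Assuming the fragment index is the node index modulo ec_k + ec_m (which reproduces that table), the sketch below checks the mapping, groups primary nodes into subsets of unique fragments, and shows why decoding must count unique fragment indexes. The helper names are hypothetical.

```python
def frag_index_for_node(node_index, ec_k, ec_m):
    """Assumed mapping: ring node i stores fragment i mod (ec_k + ec_m)."""
    return node_index % (ec_k + ec_m)


def primary_subsets(num_nodes, ec_k, ec_m):
    """Split primary node indexes into groups holding each fragment once."""
    width = ec_k + ec_m
    return [list(range(start, min(start + width, num_nodes)))
            for start in range(0, num_nodes, width)]


def can_decode(frag_indexes, ec_k):
    """Decoding needs ec_k *unique* fragment indexes; duplicates add nothing."""
    return len(set(frag_indexes)) >= ec_k
```

Because every subset contains each fragment index once, any single subset with ec_k responsive nodes is enough to decode, while two copies of the same fragment never are.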
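Commit 031ba11357 (IO Priority support on the AArch64 architecture) is about choosing the right __NR_ioprio_set value per architecture. The lookup below uses the Linux syscall numbers 251 on x86_64 and 30 on aarch64 (the asm-generic table), and packs the class and level the way the kernel's IOPRIO_PRIO_VALUE() macro does (class shifted left by 13). Function names are hypothetical, and actually issuing the syscall (for example through ctypes) is left out.

```python
import platform

# Linux syscall numbers for ioprio_set(2); they differ per architecture.
IOPRIO_SET_NR = {
    'x86_64': 251,
    'aarch64': 30,
}

IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASSES = {
    'IOPRIO_CLASS_RT': 1,
    'IOPRIO_CLASS_BE': 2,
    'IOPRIO_CLASS_IDLE': 3,
}


def ioprio_set_nr(machine=None):
    """Return the ioprio_set syscall number for `machine` (default: host)."""
    machine = machine or platform.machine()
    try:
        return IOPRIO_SET_NR[machine]
    except KeyError:
        raise OSError('ioprio_set not supported on %s' % machine)


def ioprio_value(io_class, level):
    """Pack class and level like the kernel's IOPRIO_PRIO_VALUE() macro."""
    if not 0 <= level <= 7:
        raise ValueError('level must be between 0 and 7')
    return (IOPRIO_CLASSES[io_class] << IOPRIO_CLASS_SHIFT) | level
```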
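Commit ffd5194a3b (Raise ValueError if a config section does not exist) moves the exit decision to the caller. A sketch of that contract with the standard configparser module; read_section() is a hypothetical, much simpler helper than Swift's own config reader:

```python
import configparser


def read_section(conf_path, section_name):
    """Return one config section as a dict, raising instead of exiting.

    IOError (an alias of OSError) means the file could not be read;
    ValueError means the section is missing.
    """
    parser = configparser.ConfigParser()
    if not parser.read(conf_path):
        raise IOError('Unable to read config from %s' % conf_path)
    if not parser.has_section(section_name):
        raise ValueError('Unable to find %s config section in %s'
                         % (section_name, conf_path))
    return dict(parser.items(section_name))
```

A command-line tool can catch these exceptions and call sys.exit(), while a long-running service can log them and continue.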
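Commit 609b5182c4 (Refactor recon to use single md5_hash_for_file function) consolidates file hashing into one helper in common/utils.py. A chunked-read version of such a helper might look like the following; the body and the chunk_size parameter are assumptions, not the code moved by the commit.

```python
import hashlib


def md5_hash_for_file(fname, chunk_size=4096):
    """Hex MD5 of a file, read in fixed-size chunks to bound memory use."""
    md5sum = hashlib.md5()
    with open(fname, 'rb') as f:
        for block in iter(lambda: f.read(chunk_size), b''):
            md5sum.update(block)
    return md5sum.hexdigest()
```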
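Commit 44a861787a (Enable object server to return non-durable data) states two proxy-side rules: a GET needs a quorum of unique fragment indexes, and a durable file on just one object server at a given timestamp is enough. A deliberately simplified sketch of that decision, assuming ec_k unique fragments as the quorum and ignoring the X-Backend-* header exchange; the function name and the tuple input format are hypothetical:

```python
from collections import defaultdict


def best_decodable_timestamp(responses, ec_k):
    """Return the newest timestamp the proxy could serve, or None.

    `responses` holds (timestamp, frag_index, is_durable) tuples, a
    simplified view of what object servers report.
    """
    frags_by_ts = defaultdict(set)
    durable_ts = set()
    for timestamp, frag_index, is_durable in responses:
        # Duplicate fragment indexes collapse in the set: only unique
        # indexes count towards the ec_k needed to decode.
        frags_by_ts[timestamp].add(frag_index)
        if is_durable:
            durable_ts.add(timestamp)
    for timestamp in sorted(frags_by_ts, reverse=True):
        # One durable response at this timestamp is enough.
        if timestamp in durable_ts and len(frags_by_ts[timestamp]) >= ec_k:
            return timestamp
    return None
```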
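Commit 66c905e294 (Close the iterators in string_along.) makes string_along close its underlying iterators so that resources are released promptly rather than when garbage collection runs. A sketch of that pattern, with the hypothetical names string_along_sketch and closing_if_possible (the latter closes an object only if it has a close() method):

```python
import contextlib


@contextlib.contextmanager
def closing_if_possible(maybe_closable):
    """Like contextlib.closing(), but tolerate objects without close()."""
    try:
        yield maybe_closable
    finally:
        close = getattr(maybe_closable, 'close', None)
        if callable(close):
            close()


def string_along_sketch(useful_iter, useless_iters):
    """Yield everything from useful_iter, then drain the other iterators.

    useful_iter is closed even if the consumer stops early and closes
    this generator; each remaining iterator is drained and then closed.
    """
    with closing_if_possible(useful_iter):
        for item in useful_iter:
            yield item
    for useless_iter in useless_iters:
        with closing_if_possible(useless_iter):
            for _ in useless_iter:
                pass
```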
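Commit 4bcd3d7f6d (Support multi-range GETs for static large objects) caches up to 20 sub-SLO manifests so that repeated references are not re-fetched while memory stays bounded. Swapping in functools.lru_cache as the bounded cache (the commit message does not say which eviction policy is used), a sketch looks like this; make_cached_fetcher and the fetch callable are hypothetical:

```python
import functools


def make_cached_fetcher(fetch, max_entries=20):
    """Wrap fetch(path) in an LRU cache holding at most max_entries results.

    Repeated references to the same sub-manifest hit the cache, while the
    bound keeps a huge or adversarial manifest tree from growing memory.
    """
    return functools.lru_cache(maxsize=max_entries)(fetch)
```

Cached results are shared between callers, so they should be treated as read-only.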
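Commit ed772236c7 (Change schedule priority of daemon/server in config) makes niceness and the I/O scheduling class and priority optional per-daemon settings, with unset values keeping the current behaviour. The option names below (nice_priority, ionice_class, ionice_priority) are assumed to match Swift's sample configuration files, and the ranges come from nice(2) (-20 to 19) and ioprio_set(2) (levels 0 to 7); both functions are hypothetical.

```python
import os

IONICE_CLASSES = ('IOPRIO_CLASS_RT', 'IOPRIO_CLASS_BE', 'IOPRIO_CLASS_IDLE')


def parse_priority_options(conf):
    """Validate optional scheduling settings from a config section dict.

    Returns only the settings that are present; an empty result means
    "leave the process priority alone" (the default behaviour).
    """
    result = {}
    nice = conf.get('nice_priority', '')
    if nice != '':
        nice = int(nice)
        if not -20 <= nice <= 19:
            raise ValueError('nice_priority must be between -20 and 19')
        result['nice_priority'] = nice
    io_class = conf.get('ionice_class', '')
    if io_class:
        if io_class not in IONICE_CLASSES:
            raise ValueError('unknown ionice_class %r' % io_class)
        result['ionice_class'] = io_class
        # The I/O priority level is only honoured together with a class.
        level = conf.get('ionice_priority', '')
        if level != '':
            level = int(level)
            if not 0 <= level <= 7:
                raise ValueError('ionice_priority must be between 0 and 7')
            result['ionice_priority'] = level
    return result


def apply_nice(settings):
    """Set this process's absolute niceness if one was configured."""
    if 'nice_priority' in settings:
        os.setpriority(os.PRIO_PROCESS, 0, settings['nice_priority'])
```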
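Commit 5c9732ac8e (Moved ipv4 & ipv6 validations to the common utils) moves IP validation next to expand_ipv6. The commit message does not list the moved functions, so the names below are illustrative; the sketch relies on socket.inet_pton(), which rejects anything that is not a well-formed address of the requested family.

```python
import socket


def is_valid_ipv4(ip):
    """True if `ip` is a dotted-quad IPv4 address."""
    try:
        socket.inet_pton(socket.AF_INET, ip)
    except (OSError, ValueError):
        return False
    return True


def is_valid_ipv6(ip):
    """True if `ip` is a textual IPv6 address."""
    try:
        socket.inet_pton(socket.AF_INET6, ip)
    except (OSError, ValueError):
        return False
    return True
```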
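Commit 876df35f84 (disable_fallocate also disables fallocate_reserve) makes the free-space reserve check depend on fallocate being enabled. The helper below is a hypothetical simplification that expresses the reserve as a percentage (1%, the new default mentioned in the commit) and treats "free space after the write at or below the reserve" as the failure condition, which is an assumption about the exact comparison.

```python
def write_blocked_by_reserve(size, free_bytes, total_bytes,
                             disable_fallocate, reserve_percent=1.0):
    """Return True if a write of `size` bytes should fail (HTTP 507).

    With disable_fallocate set, the reserve check is skipped entirely.
    """
    if disable_fallocate:
        return False
    free_after = free_bytes - size
    return 100.0 * free_after / total_bytes <= reserve_percent
```

On an SAIO with disable_fallocate set, the first branch is what prevents the spurious 507 responses the commit describes.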

304 Commits