Otherwise, swift-in-the-small can fill up logs with
object-replicator: Error syncing partition:
Traceback (most recent call last):
File ".../swift/obj/replicator.py", line 419, in update
node = next(nodes)
StopIteration
...which simultaneously sounds worse than it is and isn't helpful in
diagnosing/debugging the issue.
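For illustration, a guard along these lines (a sketch only - names such as
job and self.logger are assumed, and this is not necessarily the actual
fix) turns the bare traceback into an actionable log line:
    try:
        node = next(nodes)
    except StopIteration:
        # all candidate nodes have already been tried; say so instead of
        # letting a bare StopIteration traceback land in the error log
        self.logger.warning('No more nodes to try for partition %s',
                            job['partition'])
        return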
Change-Id: I2f5bb12f3704880df1750229425f64f419ff9aef
Currently, our integrity checking for objects is pretty weak when it
comes to object metadata. If the extended attributes on a .data or
.meta file get corrupted in such a way that we can still unpickle it,
we don't have anything that detects that.
This could be especially bad with encrypted etags; if the encrypted
etag (X-Object-Sysmeta-Crypto-Etag or whatever it is) gets some bits
flipped, then we'll cheerfully decrypt the cipherjunk into plainjunk,
then send it to the client. Net effect is that the client sees a GET
response with an ETag that doesn't match the MD5 of the object *and*
Swift has no way of detecting and quarantining this object.
Note that, with an unencrypted object, if the ETag metadatum gets
mangled, then the object will be quarantined by the object server or
auditor, whichever notices first.
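The shape of the check being added can be sketched like this (illustrative
only - the xattr key names and layout here are assumptions, not
necessarily what the patch actually writes):
    import hashlib
    import pickle
    import xattr

    META_KEY = 'user.swift.metadata'               # assumed key names
    CHECKSUM_KEY = 'user.swift.metadata_checksum'

    def write_metadata(path, metadata):
        blob = pickle.dumps(metadata, protocol=2)
        xattr.setxattr(path, META_KEY, blob)
        xattr.setxattr(path, CHECKSUM_KEY,
                       hashlib.md5(blob).hexdigest().encode('ascii'))

    def read_metadata(path):
        blob = xattr.getxattr(path, META_KEY)
        expected = xattr.getxattr(path, CHECKSUM_KEY).decode('ascii')
        if hashlib.md5(blob).hexdigest() != expected:
            # caller can quarantine the file instead of serving junk
            raise ValueError('corrupt object metadata')
        return pickle.loads(blob)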
As part of this commit, I also ripped out some mocking of
getxattr/setxattr in tests. It appears to be there to allow unit tests
to run on systems where /tmp doesn't support xattrs. However, since
the mock is keyed off of inode number and inode numbers get re-used,
there's lots of leakage between different test runs. On a real FS,
unlinking a file and then creating a new one of the same name will
also reset the xattrs; this isn't the case with the mock.
The mock was pretty old; Ubuntu 12.04 and up all support xattrs in
/tmp, and recent Red Hat / CentOS releases do too. The xattr mock was
added in 2011; maybe it was to support Ubuntu Lucid Lynx?
Bonus: now you can pause a test with the debugger, inspect its files
in /tmp, and actually see the xattrs along with the data.
Since this patch now uses a real filesystem for testing filesystem
operations, tests are skipped if the underlying filesystem does not
support setting xattrs (e.g. tmpfs, or more than 4k of xattrs on ext4).
References to "/tmp" have been replaced with calls to
tempfile.gettempdir(). This will allow setting the TMPDIR envvar in
test setup and getting an XFS filesystem instead of ext4 or tmpfs.
THIS PATCH SIGNIFICANTLY CHANGES TESTING ENVIRONMENTS
With this patch, every test environment will require TMPDIR to be
using a filesystem that supports at least 4k of extended attributes.
Neither ext4 nor tmpfs supports this. XFS is recommended.
So why all the SkipTests? Why not simply raise an error? We still need
the tests to run on the base image for OpenStack's CI system. Since
we were previously mocking out xattr, there wasn't a problem, but we
also weren't actually testing anything. This patch adds functionality
to validate xattr data, so we need to drop the mock.
`test.unit.skip_if_no_xattrs()` is also imported into `test.functional`
so that functional tests can import it from the functional test
namespace.
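A sketch of what such a skip helper can look like (illustrative; the real
helper may differ): probe whether the test tempdir's filesystem accepts an
xattr and skip the test if it doesn't.
    import tempfile
    import unittest
    import xattr

    def skip_if_no_xattrs():
        with tempfile.NamedTemporaryFile() as tmp:
            try:
                xattr.setxattr(tmp.name, 'user.test.xattr-check', b'1')
            except (IOError, OSError):
                raise unittest.SkipTest(
                    'xattrs not supported on %s' % tempfile.gettempdir())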
The related OpenStack CI infrastructure changes are made in
https://review.openstack.org/#/c/394600/.
Co-Authored-By: John Dickinson <me@not.mn>
Change-Id: I98a37c0d451f4960b7a12f648e4405c6c6716808
We added check_drive to the account/container servers to unify how all
the storage WSGI servers treat device dirs/mounts. This pushes that
unification down into the consistency engine.
Drive-by:
* use FakeLogger less
* clean up some repetition in the probe utility for device re-"mounting"
Related-Change-Id: I3362a6ebff423016bb367b4b6b322bb41ae08764
Change-Id: I941ffbc568ebfa5964d49964dc20c382a5e2ec2a
Insufficient arguments are passed when creating MockProcess instances,
resulting in StopIteration errors being raised during the repeated
replicator run_once cycles added in [1]. The test passes because
the replicator just logs these exceptions, but the logger noise is
distracting when running the test [2].
[1] Related-Change: Ib5c9dd17e40150450ec57a728ae8652fbc730af6
[2] nosetests ./test/unit/obj/test_replicator.py:\
TestObjectReplicator.test_run_once -s
Change-Id: I36208e93c81744068a3454577a30d0c5a8d9cb9b
This patch adds methods to increase the partition power of an existing
object ring without downtime for users, using a 3-step process. Data
won't be moved to other nodes; objects using the new increased partition
power will be located on the same device and are hardlinked to avoid
data movement.
1. A new setting "next_part_power" will be added to the rings, and once
the proxy server has reloaded the rings it will send this value to the
object servers on any write operation. Object servers will then create a
hard link in the new location to the original DiskFile object. Already
existing data will be relinked into the new locations, using hard links,
by a new relinker tool (a short sketch of the partition math follows this
list).
2. The actual partition power itself will be increased. Servers will now
read from and write to locations based on the new partition power. Hard
links in the old object locations that are no longer required are then
removed by the relinker tool; the relinker tool reads the next_part_power
setting to find object locations that need to be cleaned up.
3. The "next_part_power" flag will be removed.
This mostly implements the spec in [1]; however it's not using an
"epoch" as described there. The idea of the epoch was to store data
using different partition powers in their own namespace to avoid
conflicts with auditors and replicators as well as being able to abort
such an operation and just remove the new tree. This would require a
heavy change to the on-disk data layout, and other object-server
implementations would be required to adopt this scheme too.
Instead the object-replicator is now aware that there is a partition
power increase in progress and will skip replication of data in that
storage policy; the relinker tool should simply be run, and afterwards
the partition power will be increased. This shouldn't take that much
time (it's only walking the filesystem and hardlinking); the impact
should therefore be low. The relinker should be run on all storage nodes
at the same time in parallel to decrease the required time (though this
is not mandatory). Failures during relinking should not affect cluster
operations - relinking can even be aborted manually and restarted later.
Auditors do not quarantine objects written to a path with a different
partition power and therefore work as before (though in the worst case
they read each object twice before the no-longer-needed hard links are
removed).
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
[1] https://specs.openstack.org/openstack/swift-specs/specs/in_progress/
increasing_partition_power.html
Change-Id: I7d6371a04f5c1c4adbb8733a71f3c177ee5448bb
Some public functions in the diskfile manager expect or return full
file paths, which implies a filesystem diskfile implementation.
To make it easier to plug in alternate diskfile implementations, this
patch changes those functions to take more generic arguments.
This commit changes DiskFileManager _get_hashes() arguments from:
- partition_path, recalculate=None, do_listdir=False
to:
- device, partition, policy, recalculate=None, do_listdir=False
Callers are modified accordingly, in diskfile.py, reconstructor.py,
and replicator.py
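For clarity, the before/after signatures look roughly like this (a
sketch; bodies elided):
    # before: the caller had to know the on-disk layout
    def _get_hashes(self, partition_path, recalculate=None, do_listdir=False):
        ...

    # after: callers pass device/partition/policy and the manager resolves
    # any filesystem path internally
    def _get_hashes(self, device, partition, policy,
                    recalculate=None, do_listdir=False):
        ...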
Change-Id: I8e2d7075572e466ae2fa5ebef5e31d87eed90fec
Because random.randint includes both endpoints, the random.randint(0, 9)
that is assigned in the replicator yields values in [0, 9]. Hence the
assertion on replication_cycle should be *less than or equal to* 9, and
the replication_cycle should be taken mod 10.
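That is (standard library behaviour, shown for clarity):
    import random

    # randint includes both endpoints, so the cycle counter can reach 9
    assert all(0 <= random.randint(0, 9) <= 9 for _ in range(1000))

    replication_cycle = random.randint(0, 9)
    # per run:
    replication_cycle = (replication_cycle + 1) % 10   # stays within [0, 9]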
Change-Id: I81da375a4864256e8f3b473d4399402f83fc6aeb
The reclaim_age is a DiskFile option; it doesn't make sense for two
different object services or nodes to use different values.
As a drive-by, I also cleaned up the reclaim_age plumbing from get_hashes
to cleanup_ondisk_files, since it's a method on the Manager and has
access to the configured reclaim_age. This fixes a bug where finalize_put
wouldn't use the [DEFAULT]/object-server configured reclaim_age - which
is normally benign but leads to weird behavior on DELETE requests with a
really small reclaim_age.
There are a couple of places in the replicator and reconstructor that
reach into their manager to borrow the reclaim_age when emptying out
the aborted PUTs that failed to clean up their files in tmp - but that
timeout doesn't really need to be coupled with reclaim_age, and that
method could just as reasonably have been implemented on the Manager.
UpgradeImpact: Previously the reclaim_age was documented to be
configurable in various object-* services config sections, but that did
not work correctly unless you also configured the option for the
object-server because of REPLICATE request rehash cleanup. All object
services must use the same reclaim_age. If you require a non-default
reclaim age it should be set in the [DEFAULT] section. If there are
different non-default values, the greater should be used for all object
services and configured only in the [DEFAULT] section.
If you specify a reclaim_age value in any object-related config you
should move it to *only* the [DEFAULT] section before you upgrade. If
you configure a reclaim_age less than your consistency window you are
likely to be eaten by a Grue.
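For example, a single cluster-wide value would look roughly like this in
each object server's config (a sketch; 604800 seconds is the one-week
default):
    [DEFAULT]
    # one reclaim_age for all object services on this node
    reclaim_age = 604800

    [object-server]
    # do not set reclaim_age here (nor in object-replicator,
    # object-auditor, or object-reconstructor sections)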
Closes-Bug: #1626296
Change-Id: I2b9189941ac29f6e3be69f76ff1c416315270916
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
This patch fixes the object-reconstructor to calculate device_count
as the total number of local devices across all policies. Previously
Swift counted it per policy, and reconstruction_device_count - which
means the number of devices Swift actually needs to reconstruct - was
counted as the sum of the per-policy counts.
With this patch, Swift will gather all local devices for all policies
first, and then collect parts for each device as it does currently.
This way we can see the status of the remaining jobs/disks percentage
via the stats_line output.
To enable this change, this patch also touches the object replicator
to get a DiskFileManager via the DiskFileRouter class so that
DiskFileManager instances are policy-specific. Currently the same
replication-policy DiskFileManager class is always used, but this
change future-proofs the replicator for possible other DiskFileManager
implementations.
The change also gives the ObjectReplicator a _df_router variable,
making it consistent with the ObjectReconstructor, and allowing a
common way for ssync.Sender to access DiskFileManager instances via
its daemon's _df_router instance.
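Schematically (a sketch; conf, logger and job are assumed to be in
scope):
    from swift.obj.diskfile import DiskFileRouter

    # both daemons hold a router that hands out policy-specific managers
    df_router = DiskFileRouter(conf, logger)
    df_mgr = df_router[job['policy']]

    # ssync.Sender reaches the same manager through its daemon:
    # df_mgr = self.daemon._df_router[job['policy']]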
Also, remove the use of FakeReplicator from the ssync test suite. It
was not necessary and risked masking divergence between ssync and the
replicator and reconstructor daemon implementations.
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Closes-Bug: #1488608
Change-Id: Ic7a4c932b59158d21a5fb4de9ed3ed57f249d068
Previously, the do_listdir option was set on every 10th replication run.
Due to the randomness of the job listing, this might update a given
partition much less often than expected - for example, with 1000
partitions per replicator, only every ~70th run.
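One way to make this deterministic (a sketch of the idea, not necessarily
the exact implementation) is to tie the listdir decision to the partition
number and the cycle counter, so every partition gets exactly one listdir
per ten cycles regardless of job ordering:
    def _do_listdir(partition, replication_cycle):
        # each partition hits a full listdir exactly once every 10 cycles
        return (int(partition) + replication_cycle) % 10 == 0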
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Christian Schwede <cschwede@redhat.com>
Related-Bug: #1634967
Closes-Bug: 1644807
Change-Id: Ib5c9dd17e40150450ec57a728ae8652fbc730af6
Ignore `auditor_status_*.json` files while collecting jobs so that the
replicator won't use these bogus paths to find objects, which caused an
exception that increased the failure count in the replicator report.
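Schematically, the job collection only needs to consider real partition
directories (a sketch; the actual filtering may differ):
    import os

    def partition_dirs(objects_dir):
        for entry in os.listdir(objects_dir):
            # partitions are numeric directories; skip files such as
            # auditor_status_ALL.json / auditor_status_ZBF.json
            if entry.startswith('auditor_status_') or not entry.isdigit():
                continue
            yield entry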
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
Change-Id: Ib15a0987288d9ee32432c1998aefe638ca3b223b
Closes-Bug: #1583305
The object replicator can log some junk about the cluster ip instead
of the replication ip in some specific error log lines that can make
you think either you're crazy or your rings are crazy.
... in this case it was just the logging that was crazy - so fix that.
Change-Id: Ie5cbb2d1b30feb2529c17fc3d72af7df1aa3ffdd
Before this commit, when a local device had not been found in an
object-replication run, the policy was not mentioned in the error log.
But it is of interest to know the policy, for example when searching for
errors, when no local device has been found.
Change-Id: Icb9f9f1d4aec5c4a70dd8abdf5483d4816720418
Changing the recommended ports for Swift services
from ports 6000-6002 to unused ports 6200-6202,
so they do not conflict with X-Windows or other services.
Updated SAIO docs.
DocImpact
Closes-Bug: #1521339
Change-Id: Ie1c778b159792c8e259e2a54cb86051686ac9d18
In situations where rsync may inadvertently be unable to clean up its
temporary files, we shouldn't spread them around the cluster.
By asking our rsync subexec to --exclude patterns that match its own
convention for temporary naming, we'll only ever transfer real replicated
artifacts and never temporary artifacts, which should always be ignored
until they are fully transferred.
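For illustration, rsync names its in-flight temporary files with a
leading dot and a random six-character suffix, so an exclude along these
lines (a sketch, not necessarily the exact pattern used) keeps them out
of further replication:
    # sketch: when building the rsync argument list
    args = ['rsync', '--recursive', '--whole-file', '--ignore-existing']
    # never pick up another rsync's ".<name>.<6 random chars>" temp files
    args.append('--exclude=.*.??????')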
Cleanup of stale rsync droppings should be performed by the auditor and
will be addressed in a separate change related to lp bug #1554005.
Closes-Bug: #1553995
Change-Id: Ibe598b339af024d05e4d89c34d696e972d8189ff
Based on experience using handoffs_first and feedback from other
operators it has become clear that handoffs_first is only used during
periods of problematic cluster behavior (e.g. full disks) when
replication attempts are failing to quickly drain off the partitions
from the nodes which they have been rebalanced from.
In order to focus on the most important work (getting handoff partitions
off the node) handoffs_first mode will abort the current replication
sweep before attempting any primary suffix syncing if any of the handoff
partitions were not removed for any reason - and start over with
replication of handoffs jobs as the highest priority.
Note that handoffs_first being enabled will emit a warning on startup,
even if no handoff jobs fail, because of the negative impact it can have
during normal operations by dog-piling on a node that was temporarily
unavailable.
Change-Id: Ia324728d42c606e2f9e7d29b4ab5fcbff6e47aea
Example:
* Different port in config and in ring file.
* Running daemon on server not in ring file.
In both cases the replication daemon is running but nothing is
replicated. The error log helps to show that a local device can't be
identified.
Closes-Bug: 1508228
Change-Id: I99351b7d9946f250b7750df91c13d09352a145ce
assertEquals is deprecated in py3, so replace it with assertEqual.
Change-Id: Ida206abbb13c320095bb9e3b25a2b66cc31bfba8
Co-Authored-By: Ondřej Nový <ondrej.novy@firma.seznam.cz>
This patch fixes the exception (AttributeError: 'list' object has no
attribute 'intersection') raised when the replicator tries to sync data
from a handoff to primary partitions in more than one remote region.
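Sketched generically (not the patch's actual code), the delete-candidate
collection has to stay a set so it can be intersected with the result
from every additional remote region:
    objs_from_region_1 = ['a/hash1', 'a/hash2', 'a/hash3']
    objs_from_region_2 = ['a/hash2', 'a/hash3']

    # a plain list here would raise
    #   AttributeError: 'list' object has no attribute 'intersection'
    # as soon as a second remote region reports back
    candidates = set(objs_from_region_1)
    candidates = candidates.intersection(objs_from_region_2)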
Change-Id: I565c45dda8c99d36e24dbf1145f2d2527d593ac0
Closes-Bug: 1503152
This change cleans up test/unit/obj/test_replicator.py's imports
to use only one style of multiline import syntax (' \' vs '()').
I don't really mind which, but we should be consistent, at least
within the same file.
This is a follow up for patch 215857.
Change-Id: Ie2d328c25865b19092c493981a803ee246a9d7a5
Under some concurrency the object-replicator could potentially send the
wrong X-Backend-Storage-Policy-Index header to its partner nodes during
replication if there were multiple storage policies on the same node,
because of a race where multiple jobs being processed concurrently would
mutate some shared state on the ObjectReplicator instance.
Instead of mutating shared state on the ObjectReplicator instance when
setting the default headers sent with REPLICATION requests, each job
will copy them into a local dict where they can safely be updated.
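Schematically (a sketch; the attribute and job key names are
illustrative):
    # before: every job mutated an instance-wide headers dict - racy when
    # jobs for different policies run concurrently
    # self.headers['X-Backend-Storage-Policy-Index'] = int(job['policy'])

    # after: each job works on its own copy
    headers = dict(self.default_headers)
    headers['X-Backend-Storage-Policy-Index'] = int(job['policy'])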
Change-Id: I5522db57af7e308b1f9d4181f14ea14e386a71fd
When the object-replicator encounters the handoffs_first and
handoff_delete options enabled, it should emit a log warning
indicating that they should be changed back to the defaults
before the next "normal" rebalance.
Closes-Bug: #1457262
Change-Id: If9dc2796c18ed3cf13da920831e2d5c2ae9f12a0
This patch adds the count of object replication failures to recon.
And "failure_nodes" is added to the Account Replicator and
Container Replicator.
Recon shows the count of object replication failures as follows:
$ curl http://<ip>:<port>/recon/replication/object
{
    "replication_last": 1416334368.60865,
    "replication_stats": {
        "attempted": 13346,
        "failure": 870,
        "failure_nodes": {
            "192.168.0.1": {"sdb1": 3},
            "192.168.0.2": {"sdb1": 851,
                            "sdc1": 1,
                            "sdd1": 8},
            "192.168.0.3": {"sdb1": 3,
                            "sdc1": 4}
        },
        "hashmatch": 0,
        "remove": 0,
        "rsync": 0,
        "start": 1416354240.9761429,
        "success": 1908
    },
    "replication_time": 2316.5563162644703,
    "object_replication_last": 1416334368.60865,
    "object_replication_time": 2316.5563162644703
}
Note that 'object_replication_last' and 'object_replication_time' are
considered to be transitional and will be removed in the subsequent
releases. Use 'replication_last' and 'replication_time' instead.
Additionally, this patch adds the count to swift-recon; it will be
shown as follows:
$ swift-recon object -r
===============================================================================
--> Starting reconnaissance on 4 hosts
===============================================================================
[2014-11-27 16:14:09] Checking on replication
[replication_failure] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%,
no_result: 0, reported: 4
[replication_success] low: 3, high: 3, avg: 3.0, total: 12,
Failed: 0.0%, no_result: 0, reported: 4
[replication_time] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%,
no_result: 0, reported: 4
[replication_attempted] low: 1, high: 1, avg: 1.0, total: 4,
Failed: 0.0%, no_result: 0, reported: 4
Oldest completion was 2014-11-27 16:09:45 (4 minutes ago) by
192.168.0.4:6002.
Most recent completion was 2014-11-27 16:14:19 (-10 seconds ago) by
192.168.0.1:6002.
===============================================================================
In the case of a cluster in which one server runs with this patch and
the other servers run without it, if swift-recon is executed on the
patched server, the [failure], [success] and [attempted] entries in the
output will not be meaningful, because the servers running without this
patch are not able to send a response with the information that this
patch needs. Therefore, once you apply this patch, you should also apply
it to the other servers before you execute swift-recon.
DocImpact
Change-Id: Iecd33655ae2568482833131f422679996c374d78
Co-Authored-By: Kenichiro Matsuda <matsuda_kenichi@jp.fujitsu.com>
Co-Authored-By: Brian Cline <bcline@softlayer.com>
Implements: blueprint enable-object-replication-failure-in-recon
This patch mostly eliminates the duplicate code that was
deliberately left in place during EC review to avoid major
churn of the diskfile module prior to the kilo release.
This focuses on obvious de-duplication and shuffling code
between classes. It deliberately does not attempt to
hammer out every last piece of de-duplication where that
would introduce more complex changes - that can come later.
Code is moved from the module level and from ECDiskFile*
classes into new BaseDiskFile* classes.
Concrete classes for replication and EC policy retain their
existing names i.e. DiskFile[Manager|Writer|Reader|] and
ECDiskFile[Manager|Writer|Reader|] respectively.
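The resulting layout, roughly (a sketch of the class relationships only):
    class BaseDiskFileManager(object):
        ...   # shared behaviour moved here from module level / ECDiskFile*

    class DiskFileManager(BaseDiskFileManager):
        ...   # replication policies

    class ECDiskFileManager(BaseDiskFileManager):
        ...   # EC policies

    # and likewise BaseDiskFile / BaseDiskFileWriter / BaseDiskFileReader
    # with DiskFile* and ECDiskFile* concrete subclasses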
Knock-on changes:
- fix bug whereby get_hashes was ignoring self.reclaim_age
and always using the default arg value.
- replication diskfile manager now deletes a tombstone that is older
than reclaim_age even when there is a newer .meta file.
- replication diskfile manager will no longer raise an
AssertionError if only a .meta file is found during
hash_cleanup_listdir.
- fix stale test in test_auditor.py: test_with_tombstone test
setup was convoluted (probably dates back to when object puts
did not clean up the object dir). Now that they do you have to
try harder to create a dir with a tombstone and a data file.
Change-Id: I963e0d0ae0d6569ad1de605034c529529cbb4f9a
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs. The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring. In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket. The child, per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk. The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring. The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).
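For example (a sketch of the relevant object-server config; the ring must
already define one unique port per disk):
    [DEFAULT]
    # fork 3 object-server workers per unique local port found in the ring
    servers_per_port = 3
    # how often (seconds) the parent re-stats the ring files
    ring_check_interval = 15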
Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.
The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True. This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.
A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses). The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.
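A sketch of how the parent process can use these helpers (the
BindPortsCache constructor arguments and method name here are
assumptions beyond what is described above):
    from swift.common.ring import RingData
    from swift.common.storage_policy import BindPortsCache

    bind_ip = '203.0.113.10'   # this server's IP

    # cheap: skips replica2part2dev_id, so the parent and its forked
    # children don't hold full copies of every storage policy ring
    ring_meta = RingData.load('/etc/swift/object.ring.gz', header_only=True)

    # all ports this node must listen on, across every ring; ring file
    # mtimes are tracked so they aren't re-read more often than necessary
    cache = BindPortsCache('/etc/swift', bind_ip)
    ports = cache.all_bind_ports_for_node()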
This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server". The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks. Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None. Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled. Bonus improvement for IPv6
addresses in is_local_device().
This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/
Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").
However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.
This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.
In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explicit "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.
Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.
For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md
Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md
If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md
DocImpact
Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.
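That is:
    it = iter([1, 2, 3])
    # Python 2 only:
    #   it.next()
    # works on both Python 2 and Python 3:
    next(it)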
Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d
This patch adds the erasure code reconstructor. It follows the
design of the replicator but:
- There is no notion of update() or update_deleted().
- There is a single job processor
- Jobs are processed partition by partition.
- At the end of processing a rebalanced or handoff partition, the
reconstructor will remove successfully reverted objects if any.
It also includes various ssync changes, such as the addition of the
reconstruct_fa() function, called from ssync_sender, which performs the
actual reconstruction while sending the object to the receiver.
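To illustrate the kind of fragment rebuild reconstruct_fa() performs,
here is a standalone PyECLib sketch (the scheme parameters are arbitrary;
this is not Swift's actual wiring):
    from pyeclib.ec_iface import ECDriver

    driver = ECDriver(k=4, m=2, ec_type='liberasurecode_rs_vand')
    frags = driver.encode(b'some object body ' * 1000)

    missing = 3                                   # fragment index to rebuild
    available = [f for i, f in enumerate(frags) if i != missing]
    rebuilt = driver.reconstruct(available, [missing])
    assert rebuilt[0] == frags[missing]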
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
blueprint ec-reconstructor
Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51
Adds specific disk file classes for EC policy types.
The new ECDiskFile and ECDiskFileWriter classes are used by the
ECDiskFileManager.
ECDiskFileManager is registered with the DiskFileRouter for use with
EC_POLICY type policies.
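Schematically, registration and per-policy lookup look like this (a
sketch built from the names above, not a verbatim excerpt):
    @DiskFileRouter.register(EC_POLICY)
    class ECDiskFileManager(DiskFileManager):
        diskfile_cls = ECDiskFile

    # a router handed an EC policy then returns an ECDiskFileManager
    df_mgr = DiskFileRouter(conf, logger)[policy]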
Refactors diskfile tests into BaseDiskFileMixin and BaseDiskFileManagerMixin
classes which are then extended in subclasses for the legacy
replication-type DiskFile* and ECDiskFile* classes.
Refactor to prefer use of a policy instance reference over a policy_index
int to refer to a policy.
Add additional verification to DiskFileManager.get_dev_path to validate the
device root with common.constraints.check_dir, even when mount_check is
disabled, for use on a virtual swift-all-in-one.
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I22f915160dc67a9e18f4738c1ddf068344e8ad5d
* Get FakeConn ready for expect 100 continue
* Use debug_logger more and with better interfaces
* Fix patch_policies to be less annoying
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I28c0a3539d994cbb8e6b94d63a23ed4ea6cb956d
From rsync's man page:
-z, --compress
With this option, rsync compresses the file data as it is sent to the
destination machine, which reduces the amount of data being transmitted --
something that is useful over a slow connection.
A configurable option has been added to allow rsync to compress, but only
if the remote node is in a different region than the local one.
NOTE: Objects that are already compressed (for example: .tar.gz, .mp3)
might slow down the syncing process.
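For example (a sketch of the object-replicator config; compression is
only applied when the destination is in a different region):
    [object-replicator]
    # pass rsync's -z/--compress for cross-region syncs only
    rsync_compress = true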
On-wire compression can also be extended to ssync later in a different
change if required. In the case of ssync, we could explore faster
compression libraries like lz4; rsync uses zlib, which is slow but offers
a higher compression ratio.
Change-Id: Ic9b9cbff9b5e68bef8257b522cc352fc3544db3c
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Probetests discovered two issues with the current state of the
object-replicator as a result of the attempts to clean up changes
related to efficient cross-region replication.
Known failures are:
* rsync replication, when configured with no sync_method in the config,
fails to clean up a handoff partition
* ssync replication, when there is only one region, fails to clean up a
handoff partition
In both cases the path resulting in the failure moved through the
implicit else clause (dangling elif) of the partition cleanup code path.
In the ssync case the failure came from a miss on the first if branch,
where delete_objs would be None if there are no remote regions. In the
rsync case the failure came from a miss on the second elif condition
when looking for an entry in the conf dict without setting a default.
This change adds unittests for both failures that should fail in a
reasonable way against master without requiring a probetest run against
other configs, as well as rephrasing the logic in the partition cleanup
handling to try and make the logic flow more explicit.
Change-Id: Ic59d998a3e36a3eb3e509d9fdf7096e812281357
The current code might incorrectly delete local handoff objects when the
remote node requires all of the objects at poke time, because an empty
cand_objs won't be applied to the delete-candidate objects list.
This patch ensures the delete-candidate objects list is always updated
(i.e. it will be an empty list when the poke job finds that all local
objects are required by the remote), and then handles deleting objects
correctly according to the delete candidates.
This patch includes a test written by Clay Gerrard at [1].
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
1: https://review.openstack.org/#/c/155542/
Change-Id: Ie8f75ed65c7bfefbb18ddccd9fe0e41b72dca0a4
One log line had a typo, and I refactored the per-object cleanup code out of
update_deleted into the per-object hashdir cleanup method.
Change-Id: I19d03d0706a75bd8ec2fe327a1eb1b5ec36de6d2
This change provides an efficient way of replication
between regions of a globally distributed cluster.
This approach makes the object-replicator push replicas
to a primary node in a remote region, and then skip
pushing them to the next primary node in that region,
expecting asynchronous replication within the region.
This implementation includes a couple of changes to
ssync_sender to allow the object-replicator to delete local
handoff objects correctly. One is to return a list of existing
objects in the remote region; the list includes the local paths of the
objects which exist both on the local device and the remote device.
The other is support for an existence check of specified objects.
It requires the object list built by the first change: when
the object list is given, ssync_sender only does a missing_check
based on the list. These changes are needed because current
Swift cannot handle the existence check at the object level.
Note that this feature will work partially (i.e. only when
primary-to-primary) with rsync.
Implements: blueprint efficient-replication
Change-Id: I5d990444d7977f4127bb37f9256212c893438df1