Currently, the rsync module to which the replicators send data is static. This
prevents administrators from tuning the rsync configuration to their current
deployment or needs.
As an example, the sample rsyncd configuration encourages setting a connection
limit for the account, container and object modules. This protects devices
from excessive parallel connections, which would otherwise hurt performance.
On a server with many devices, it is tempting to increase this limit
proportionally, but nothing guarantees that the connections will be evenly
distributed. In the worst case, a single device can receive all the
connections, which severely impacts performance.
This commit adds a new option named 'rsync_module' to the *-replicator sections
of the *-server configuration file. This configuration variable can be
interpolated with device attributes like ip, port, device, zone, ... by using
the {NAME} format. e.g.:
rsync_module = {replication_ip}::object_{device}
With this configuration, an administrator can solve the connection distribution
problem by creating one module per device in the rsyncd configuration.
The default values are backward compatible:
{replication_ip}::account
{replication_ip}::container
{replication_ip}::object
Option vm_test_mode is deprecated by this commit, but backward compatibility is
maintained. The option is only effective when rsync_module is not set. In that
case, {replication_port} is appended to the default value of rsync_module.
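A minimal sketch of how such a template could be expanded from a device dict
(illustrative only; the function and variable names below are assumptions, not
the actual replicator code):

def expand_rsync_module(template, device):
    # {NAME} placeholders are filled from the device's attributes
    return template.format(**device)

device = {'replication_ip': '10.0.0.12', 'replication_port': 6200,
          'device': 'sdb1', 'zone': 1}
print(expand_rsync_module('{replication_ip}::object_{device}', device))
# -> 10.0.0.12::object_sdb1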
Change-Id: Iad91df50dadbe96c921181797799b4444323ce2e
The 'print' function is compatible with both Python 2.x and 3.x.
Link: https://www.python.org/dev/peps/pep-3105/
Python 2.6 has a __future__ import that removes print as language syntax,
letting you use the functional form instead
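For example, the following works the same under Python 2.6+ and Python 3:

from __future__ import print_function
import sys

# 'print' is now a function, so keyword arguments like 'file' work on Python 2 too
print('replication pass complete', file=sys.stderr)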
Change-Id: I94e1bc6bd83ad6b05695c7ebdf7cbfd8f6d9f9af
The assert_() method is deprecated and can be safely replaced by assertTrue().
This patch makes sure that running the tests does not create undesired
warnings.
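For example, a test written with the non-deprecated form looks like this
(illustrative test only, not one from the patch):

from unittest import TestCase

class TestExample(TestCase):
    def test_something(self):
        # assertTrue() replaces the deprecated assert_() alias and does not
        # emit a DeprecationWarning
        self.assertTrue(1 + 1 == 2)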
Change-Id: I0602ba39ef93263386644ee68088d5f65fcb4a71
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs. The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring. In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket. The child, per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk. The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring. The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).
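A rough sketch of the per-port pre-fork pattern described above (this is only
the general shape, not the swift-object-server implementation):

import os
import socket

def serve(sock):
    # a real worker would drop privileges here, then loop serving requests
    while True:
        conn, _ = sock.accept()
        conn.close()

def fork_per_port(ports, servers_per_port):
    children = []
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(('0.0.0.0', port))
        sock.listen(128)
        for _ in range(servers_per_port):
            pid = os.fork()
            if pid == 0:
                serve(sock)       # child: handles only this one port
                os._exit(0)
            children.append(pid)  # parent: track children so dead ones can be restarted
    return children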
Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.
The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True. This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.
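Usage would look roughly like this (the import path is my assumption of where
RingData lives; the header_only kwarg is as described above):

from swift.common.ring import RingData

# Loads the serialized ring metadata but skips the large replica2part2dev_id table
ring_data = RingData.load('/etc/swift/object.ring.gz', header_only=True)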
A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses). The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.
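Assumed usage (the constructor arguments and method name here are from memory
and may not match the merged code exactly):

from swift.common.storage_policy import BindPortsCache

cache = BindPortsCache('/etc/swift', '1.2.3.4')   # swift_dir, this node's bind_ip
ports = cache.all_bind_ports_for_node()           # set of device ports across all rings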
This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server". The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks. Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None. Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled. Bonus improvement for IPv6
addresses in is_local_device().
This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/
Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").
However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.
This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.
In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explicit "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.
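Roughly, the behavior described is (the argument handling and exact return
values are assumptions for illustration):

from swift.common.utils import whataremyips

whataremyips()             # unspecified bind_ip: all of the server's addresses
whataremyips('0.0.0.0')    # explicit bind-to-everything: same as above
whataremyips('127.0.0.2')  # a specific bind_ip: just ['127.0.0.2']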
Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.
For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md
Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md
If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md
DocImpact
Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.
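For example:

it = iter([1, 2, 3])
value = next(it)   # works on Python 2 and 3, unlike it.next()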
Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d
The renamer() method now does an fsync on the containing directory of the
target path, and also on the parent dirs of newly created directories, by
default. This can be explicitly turned off in cases where it is not
necessary (for example, quarantines).
The following article explains why this is necessary:
http://lwn.net/Articles/457667/
Although it may seem like the right thing to do, this change does come
with a performance penalty. However, no configurable option is provided to
turn it off.
Also, lock_path() inside invalidate_hash() was always creating part of the
object path in the filesystem, and those directories were never fsync'd. This
has been fixed.
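A minimal sketch of the durable-rename pattern described above (not Swift's
renamer() itself):

import os

def durable_rename(old, new):
    os.rename(old, new)
    # fsync the directory containing 'new' so the rename itself survives a crash
    dirfd = os.open(os.path.dirname(new) or '.', os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)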
Change-Id: Id8e02f84f48370edda7fb0c46e030db3b53a71e3
Signed-off-by: Prashanth Pai <ppai@redhat.com>
From rsync's man page:
-z, --compress
With this option, rsync compresses the file data as it is sent to the
destination machine, which reduces the amount of data being transmitted --
something that is useful over a slow connection.
A configurable option has been added to allow rsync to compress, but only
if the remote node is in a different region than the local one.
NOTE: Objects that are already compressed (for example: .tar.gz, .mp3)
might slow down the syncing process.
On wire compression can also be extended to ssync later in a different
change if required. In case of ssync, we could explore faster
compression libraries like lz4. rsync uses zlib which is slow but offers
higher compression ratio.
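Schematically, the decision described above looks like this (the option and
argument names are illustrative, not the actual replicator code):

def build_rsync_args(local_region, remote_region, rsync_compress):
    args = ['rsync', '--recursive', '--whole-file']
    if rsync_compress and remote_region != local_region:
        # only pay the CPU cost of -z/--compress on inter-region (WAN) links
        args.append('--compress')
    return args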
Change-Id: Ic9b9cbff9b5e68bef8257b522cc352fc3544db3c
Signed-off-by: Prashanth Pai <ppai@redhat.com>
This removes the test_dispatch test from test_db_replicator,
which had been commented out for a while.
Change-Id: Ia28fa923a65ad7d85804cbf6f7acef244741bab1
Closes-Bug: #1408502
As I understand it, db replication starts with a preflight sync request
to the remote container server, whose response will include the last
synced row_id that it has on file for the sending node's database id.
If the difference in the last sync point returned is more than 50% of
the local sending db's rows, it'll fall back to sending the whole db
over rsync and let the remote end merge items locally - but generally
there's just a few rows missing and they're shipped over the wire as
json and stuffed into some rather normal looking merge_items calls.
The one thing that's a bit different with these remote merge_items calls
(compared to your average run of the mill eat a bunch of entries out of
a .pending file) is the source kwarg. When this optional kwarg comes
into merge_items it's the remote sending db's uuid, and after we eat all
the rows it sent us we update our local incoming_sync table for that
uuid so that next time when it makes its pre-flight sync request we can
tell it where it left off.
Now normally the sending db is going to push out its rows up from the
returned sync_point in 1000 item diffs, up to 10 batches total (per_diff
and max_diffs options) - 10K rows. If that goes well then everything is
in sync up to at least the point it started, and the sending db will
*also* ship over *its* incoming_sync rows to merge_syncs on the remote
end. Since the sending db is in sync with these other dbs up to those
points, so is the remote db now, by way of the transitive property. Also
note, through some weird artifact that I'm not entirely convinced isn't
an unrelated and possibly benign bug, the incoming_sync table on the
sending db will often also happen to include its own uuid - maybe it
got pushed back to it from another node?
Anyway, that seemed to work well enough until a sending db got diff
capped (i.e. sent its 10K rows and wasn't finished). When this happened,
the final merge_syncs call never got sent because the remote end is
definitely *not* up to date with the other databases that the sending db
is - it's not even up-to-date with the sending db yet! But the hope is
certainly that on the next pass it'll be able to finish sending the
remaining items. But since the remote end is who decides what the last
successfully synced row with this local sending db was - it's super
important that the incoming_sync table is getting updated in merge_items
when that source kwarg is there.
I observed this simple and straightforward process wasn't working well
in one case - which is weird considering it didn't have much in the way
of tests. After I had the test and started looking into it, it seemed
maybe the source kwarg handling got over-indented a bit in the bulk
insert merge_items refactor. I think this is correct - maybe we could send
someone up to the mountain temple to seek out gholt?
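A schematic illustration of the kind of indentation bug described above (this
is not Swift's merge_items(); the bookkeeping is simplified to a dict):

incoming_sync = {}   # remote db uuid -> last row id merged from it

def merge_items(rows, source=None):
    merged = []
    for row_id, data in rows:
        merged.append((row_id, data))        # stand-in for the real row merge
        # the bug would be doing the incoming_sync update here, nested inside
        # the loop, instead of once after all rows are merged
    if source:
        incoming_sync[source] = rows[-1][0]  # correct: record once, at the end
    return merged

merge_items([(1, 'a'), (2, 'b')], source='remote-uuid')
print(incoming_sync)   # {'remote-uuid': 2}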
Change-Id: I4137388a97925814748ecc36b3ab5f1ac3309659
The common db replicator's code path for reclaiming deleted dbs beyond the
reclaim age was not covered by unittests, and an AttributeError snuck in. In
writing the test that would cover the common code both for accounts and
containers, I discovered another KeyError with the container conditional for
validating the container's fully reported status.
This fixes both those issues and adds additional tests for the cleanup of
empty account/container partition and suffix directories.
Change-Id: I2a1bfaefebd05b01231bf71dd908fcc49adb4c36
Because we iterate over these directories on a replication run,
and they are not (previously) cleaned up, the time to start the
replication increases incrementally for each stale directory
lying around. Thousands of directories across dozens of disks
on a single machine can make for non-trivial startup times.
Plus it just seems like good housekeeping.
Closes-Bug: #1396152
Change-Id: Iab607b03b7f011e87b799d1f9af7ab3b4ff30019
The account/container replicator checks connection generation and timeout
for the HTTP REPLICATE request in _repl_to_node, but it doesn't really check
the connection, only the construction of the ReplConnection class.
This patch removes that invalid check.
Change-Id: Ie6b4062123d998e69c15638b741e7d1ba8a08b62
Closes-Bug: #1359018
After a container database is replicated, a _post_replicate_hook will enqueue
misplaced objects for the container-reconciler into the .misplaced_objects
containers. Items to be reconciled are "batch loaded" into the reconciler
queue at the end of a container replication cycle by leveraging container
replication itself.
DocImpact
Implements: blueprint storage-policies
Change-Id: I3627efcdea75403586dffee46537a60add08bfda
Keep status_changed_at in container databases current with status changes that
occur as a result of container creation, deletion, or re-creation.
Merge container put/delete/created timestamps when handling replicate
responses from remote servers in addition to during the handling of the
REPLICATE request.
When storage policies are configured on a cluster send status_changed_at,
object_count and storage_policy_index as part of container replication sync
args.
Use status_changed_at during replication to determine the oldest active
container and merge storage_policy_index.
DocImpact
Implements: blueprint storage-policies
Change-Id: Ib9a0dd42c271145e641437dc04d0ebea1e11fc47
FakeLogger gets better log level handling
Parameterize the logger on some daemons which were previously
unparameterized, and try to use the interface in tests.
FakeRing uses more real code
The existing FakeRing mock's implementation bit me with a pretty subtle
character encoding issue by bypassing the hash_path code that is normally
part of get_part_nodes. This change tries to exercise more of the real
ring code paths when it makes sense and provide a better Fake for use in
testing.
Add write_fake_ring helper to test.unit for when you need a real ring.
DocImpact
Implements: blueprint storage-policies
Change-Id: Id2e3740b1dd569050f4e083617e7dd6a4249027e
It simply makes sense that the definition of DATADIR belongs to
backends. After all, some of them may not even have any.
Coincidentally, a few unnecessary imports are dropped.
By the way, on the object server side, diskfile.py provides DATADIR
in the same way already.
Change-Id: I60bfd522c77c4a0ee13697a2e31141777c7e2398
Adds 20 unit tests to increase the coverage of db_replicator.py
from 71% to 90%
Change-Id: Ia63cb8f2049fb3182bbf7af695087bfe15cede54
Closes-Bug: #948179
This patch adds a test for ReplicatorRpc.complete_rsync()
and completes extract_device() coverage.
test_extract_device:
    tests the case where the parameter is invalid
test_complete_rsync_with_bad_input:
    ensures that invalid parameters return a 404 error
test_complete_rsync:
    validates the returned code in case of success
Change-Id: I59e0d26a1efe59d8beff1e81c2a7edc6de0872e9
This reverts commit 7760f41c3ce436cb23b4b8425db3749a3da33d32
Change-Id: I95e57a2563784a8cd5e995cc826afeac0eadbe62
Signed-off-by: Peter Portante <peter.portante@redhat.com>
Place all the methods related to on-disk layout and / or configuration
into a new common module that can be shared by the various modules
using the same on-disk layout.
Change-Id: I27ffd4665d5115ffdde649c48a4d18e12017e6a9
Signed-off-by: Peter Portante <peter.portante@redhat.com>
* Create a class for testing the _repl_to_node and replicate_object functions
  to prevent code duplication, by adding all preparation into the setUp function.
* Move the existing test functions which test _repl_to_node and
  replicate_object into the created classes.
* Add tests for replicate_object and _repl_to_node functions.
Change-Id: I75ac7c6f0230e71bfb24328e44c33734b520b4cd
See Bug 1187200 for a full description of the problem.
Part 1:
X-Delete-At-Container added to X-Delete-At-* info
This fixes the bug by passing the expiring-objects-account's
container name onward to the backend object servers. This is in case
the object servers' expiring_objects_container_divisor happens to be
different than the proxy server's; we want to make sure the host,
partition, and device match up with the container name. Different
container names would be fine, but not with mismatched host,
partition, and device info.
Part 2:
The db_replicator now double checks the disk path's partition against
the partition the ring gives back. If they don't match, it logs the
problem but continues to replicate the database to where it should be
and, on success to all proper nodes, removes the local out of place
database.
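Schematically, the double check looks something like this (not the actual
db_replicator code; the helper names are illustrative):

import os

def partition_from_path(db_path):
    # .../containers/<partition>/<suffix>/<hash>/<hash>.db
    return db_path.split(os.sep)[-4]

def check_placement(ring, account, container, db_path):
    ring_part = ring.get_part(account, container)
    disk_part = partition_from_path(db_path)
    if str(ring_part) != disk_part:
        # log the mismatch but keep replicating to the proper nodes; the out of
        # place copy is removed only after all proper nodes succeed
        print('db %s is in partition %s but belongs in %s'
              % (db_path, disk_part, ring_part))
    return ring_part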
Bug 1187200
Change-Id: Id0873a3f2198ce285fe0b0c777738eff38bc2438
The get_repl_missing_table attribute in the FakeBroker class was changed in
the test_replicate_object_quarantine function and not set back. That's
why subsequent test cases took unexpected values from FakeBroker.
fixes bug 1180354
Change-Id: Iba55255771e6483832c7782fcbe331e20e818f4e
Support separate replication ip address:
- Added a new function in utils. This function provides the ability
to select a separate IP address for the replication service.
- The db replicator and object replicators were changed;
the replication process now uses the new function.
Replication network parameters:
- Support for the replication network fields (replication_ip, replication_port)
was added to the device dictionary in the swift-ring-builder script
(an example device entry is sketched after this list).
- Changes were made to support the new fields in the search, show and set_info
functions.
Implementation of replication servers:
- Separate replication servers use the same code as normal replication
servers, but with replication_server parameter = True. When using a
separate replication network, the non-replication servers set
replication_server = False. When there is no separate replication
network (the default case), replication_server is not included in the config.
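For illustration, a ring device entry carrying the new fields might look
roughly like this (the values are made up):

device = {
    'id': 0, 'region': 1, 'zone': 1, 'weight': 100.0,
    'ip': '10.0.0.10', 'port': 6001, 'device': 'sdb1',
    'replication_ip': '10.1.0.10',   # replicators talk over this network
    'replication_port': 6011,
}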
DocImpact
Change-Id: Ie9af5bdcdf9241c355e36053ca4adfe49dc35bd0
Implements: blueprint dedicated-replication-network
roundrobin_datadirs was returning any .db file at any depth in the
accounts/containers structure. Since xfs corruption can cause such
files to appear in odd places at times (only happened on one drive of
ours so far, but still...), I've refactored this function to only
return .db files at the proper depth.
Change-Id: Id06ef6584941f8a572e286f69dfa3d96fe451355
When a db is reclaimed it removes the hash dir the db files are in,
but it does not try to remove the parent suffix dir though it might
be empty now. This eventually leads to a bunch of empty suffix dirs
lying around. This patch fixes that by attempting to remove the
parent suffix dir after a hash dir reclamation.
Here's a quick script to see how bad a given drive might be:
import os, os.path, sys

if len(sys.argv) != 2:
    sys.exit('%s <mount-point>' % sys.argv[0])

in_use = 0
empty = 0
containers = os.path.join(sys.argv[1], 'containers')
for p in os.listdir(containers):
    partition = os.path.join(containers, p)
    for s in os.listdir(partition):
        suffix = os.path.join(partition, s)
        if os.listdir(suffix):
            in_use += 1
        else:
            empty += 1
print in_use, 'in use,', empty, 'empty,', '%.02f%%' % (
    100.0 * empty / (in_use + empty)), 'empty'
And here's a quick script to clean up a drive:
NOTE THAT I HAVEN'T ACTUALLY RUN THIS ON A LIVE NODE YET!
import errno, os, os.path, sys

if len(sys.argv) != 2:
    sys.exit('%s <mount-point>' % sys.argv[0])

containers = os.path.join(sys.argv[1], 'containers')
for p in os.listdir(containers):
    partition = os.path.join(containers, p)
    for s in os.listdir(partition):
        suffix = os.path.join(partition, s)
        try:
            os.rmdir(suffix)
        except OSError, err:
            if err.errno not in (errno.ENOENT, errno.ENOTEMPTY):
                print err
Change-Id: I2e6463a4cd40597fc236ebe3e73b4b31347f2309
To tell when replication for a device has finished, it's important to
know when the replicator is removing objects. This was previously
handled for the object-replicator
(object-replicator.partition.delete.count.<device> and
object-replicator.partition.update.count.<device> metrics) but not the
account and container replicators.
This patch extends the existing DB removal count metrics to make them
per-device. The new metrics are:
account-replicator.removes.<device>
container-replicator.removes.<device>
There's also a bonus refactoring and increased test coverage of the DB
replicator code.
Change-Id: I2067317d4a5f8ad2a496834147954bdcdfc541c1