Commit e199192caefef068b5bf57da8b878e0bc82e3453 introduced the ability
to run multiple SSYNC requests on a single device, but it lacks a
safeguard to ensure that only one SSYNC request can be running on a
partition. This commit updates replication_lock to allow up to N
concurrent locks on the device, then take a single lock on the
partition related to the SSYNC request.
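A minimal sketch of the two-level scheme, using flock on lock files
(the layout and names here are illustrative, not the actual
replication_lock implementation):

import errno
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def replication_lock(device_dir, partition_dir, slots):
    # Take one of <slots> non-blocking locks on the device...
    dev_fd = None
    for i in range(slots):
        fd = os.open(os.path.join(device_dir, '.replication_lock-%d' % i),
                     os.O_CREAT | os.O_WRONLY)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            dev_fd = fd
            break
        except IOError:
            os.close(fd)
    if dev_fd is None:
        raise IOError(errno.EAGAIN, 'all device slots are busy')
    try:
        # ...then exactly one lock on the partition.
        part_fd = os.open(os.path.join(partition_dir, '.partition_lock'),
                          os.O_CREAT | os.O_WRONLY)
        try:
            fcntl.flock(part_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            yield
        finally:
            os.close(part_fd)
    finally:
        os.close(dev_fd)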
Change-Id: Id053ed7dd355d414d7920dda79a968a1c6677c14
The get_hub function was added in commit b155da42 to bypass eventlet's
automatic hub selection, which prefers epoll when it is available.
Since version 0.20.0, eventlet has removed the select.poll() function
from its patched select module (eventlet.green.select); see:
- https://github.com/eventlet/eventlet/commit/614a20462
So if eventlet monkey patching is done before a get_hub() call (as now
happens in wsgi.py since commit c9410c7d), 'import select' yields the
eventlet version, which has no poll attribute.
To prevent that, we use the eventlet.patcher.original function to get
the unpatched Python select module and test whether poll() is available
on the current platform.
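The check looks roughly like this (a sketch of the approach, not
necessarily the exact code):

import eventlet.patcher

# Get the unpatched stdlib select module, even after monkey patching;
# eventlet.green.select may lack poll().
select = eventlet.patcher.original('select')
hub = 'poll' if hasattr(select, 'poll') else 'selects'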
Change-Id: I69b3db3951b3d3b6583845978deb2883492e7f0f
Closes-Bug: 1804627
Currently, the reconstructor does not remove empty object and suffix
directories after processing a revert job; they are only removed during
its next run.
This patch will attempt to remove these empty directories immediately,
while we have the inodes cached.
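A sketch of the immediate cleanup, assuming the usual "rmdir and
ignore non-empty" pattern (this is not the actual reconstructor code):

import errno
import os

def remove_if_empty(path):
    # rmdir only succeeds on empty directories, so it is safe to
    # attempt opportunistically while the inodes are still cached.
    try:
        os.rmdir(path)
        return True
    except OSError as err:
        if err.errno not in (errno.ENOTEMPTY, errno.ENOENT):
            raise
        return False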
Change-Id: I5dfc145b919b70ab7dae34fb124c8a25ba77222f
We kept hitting a flaky failure in the test, where the first ismount
call in the test succeeds while it should fail. As it turned out,
the return of gettempdir() was the plain /tmp. So, a previous test
created /tmp/.ismount and the subsequent runs failed on it.
Re-generating the root filesystem (e.g. by a container) fixes
the problem, but still, there's no need to do this. This change
tightens the test up by placing the .ismount into a subdirectory
of the test directory instead of the global /tmp.
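A sketch of the tightened setup (names are illustrative):

import os
import tempfile

# Use a per-test directory instead of the shared system /tmp, so a
# stray .ismount marker can't leak between test runs.
testdir = tempfile.mkdtemp()
fake_mount = os.path.join(testdir, 'mnt')
os.mkdir(fake_mount)
open(os.path.join(fake_mount, '.ismount'), 'w').close()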
Change-Id: I006ba1f69982ef7513db3508d691723656f576c9
This commit adds utils.punch_hole(), which is useful for deallocating
disk blocks as part of an alternate disk file implementation.
Additionally, add an offset argument to the existing fallocate utility
function; this allows you to grow an existing file.
Sam always had the best descriptions:
utils.fallocate(fd, size) allocates <size> bytes for the file referred
to by <fd>. It allows for keeping a reserve of an additional N bytes
or X% of the filesystem free. If neither the fallocate() nor
posix_fallocate() C function is available, utils.fallocate() will
log a warning (once only) and not actually allocate space.
utils.punch_hole(fd, offset, length) deallocates <length> bytes
starting at <offset> from the file referred to by <fd>. It uses the C
function fallocate(). If fallocate() is not available, calls to
utils.punch_hole() will raise an exception.
Since these both use the fallocate syscall, refactor that a bit and get
rid of FallocateWrapper. We add a new _LibcWrapper that lazy-loads a C
function and exposes whether that function is actually available in
Python. This keeps the fancy logic in utils.fallocate and
utils.punch_hole well-contained.
Modernized the tests for utils.fallocate() and utils.punch_hole().
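For reference, hole punching boils down to a C call like this (a
hedged sketch of the underlying syscall usage, not Swift's actual
wrapper; it assumes a 64-bit off_t):

import ctypes
import ctypes.util
import os

# Constants from linux/falloc.h
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                           ctypes.c_int64, ctypes.c_int64]

def punch_hole(fd, offset, length):
    # Deallocate <length> bytes at <offset> without changing file size.
    ret = libc.fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         offset, length)
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))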
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Change-Id: Ieac30a477d784905c94742ee3d0898d7e0194b39
With this commit, each storage policy can define the diskfile to use to
access objects. Selection of the diskfile is done in swift.conf.
Example:
[storage-policy:0]
name = gold
policy_type = replication
default = yes
diskfile = egg:swift#replication.fs
The diskfile configuration option accepts the same format as middleware
declarations: [[scheme:]egg_name#]entry_point
The egg_name is optional and defaults to "swift". The scheme is optional
and defaults to "egg", the only valid value. The upstream entry points
are "replication.fs" and "erasure_coding.fs".
Co-Authored-By: Alexandre Lécuyer <alexandre.lecuyer@corp.ovh.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I070c21bc1eaf1c71ac0652cec9e813cadcc14851
The object server can be configured to leave a certain amount of disk
space free; default is 1%. This is useful in avoiding 100%-full
filesystems, as those can get Swift in a state where the filesystem is
too full to write tombstones, so you can't delete objects to free up
space.
When a cluster has accounts/containers and objects on the same disks,
then you can wind up with a 100%-full disk since account and container
servers don't respect fallocate_reserve. This commit makes account and
container servers respect fallocate_reserve so that disks shared
between account/container and object rings won't get 100% full.
When a disk's free space falls below the configured reserve, account
and container PUT, POST, and REPLICATE requests will fail with a 507
status code. These are the operations that can significantly increase
the disk space used by a given database.
I called the parameter "fallocate_reserve" for consistency with the
object server. No actual fallocate() call happens under Swift's
control in the account or container servers (sqlite3 might make such a
call, but it's out of our hands).
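The free-space check behind a 507 amounts to something like this (a
sketch, not the servers' actual helper; it assumes the reserve is given
either in bytes or as a percentage, as fallocate_reserve allows):

import os

def has_enough_space(path, reserve, is_percent=False):
    st = os.statvfs(path)
    free = st.f_bavail * st.f_frsize
    if is_percent:
        total = st.f_blocks * st.f_frsize
        return free >= total * reserve / 100.0
    return free >= reserve

# If this returns False, respond 507 Insufficient Storage.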
Change-Id: I083442eef14bf83c0ea717b1decb3e6b56dbf1d0
This patch adds an optional parameter to tempurl that restricts
the IPs from which a temp URL can be used.
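Signing such a URL might look roughly like this; the exact
signed-message format (here, an 'ip=<range>' line prepended to the
usual method/expires/path) and the query parameter name are
assumptions, so check the middleware documentation:

import hmac
from hashlib import sha1
from time import time

key = b'mysecretkey'
method, path = 'GET', '/v1/AUTH_test/c/o'
expires = int(time() + 3600)
message = 'ip=203.0.113.10\n%s\n%d\n%s' % (method, expires, path)
sig = hmac.new(key, message.encode('utf8'), sha1).hexdigest()
url = '%s?temp_url_sig=%s&temp_url_expires=%d' \
      '&temp_url_ip_range=203.0.113.10' % (path, sig, expires)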
Change-Id: I23fe998a980960d4a32df042b3f6a21f096c36af
In CPython commit e59af55c2, instantiating a logging.SysLogHandler
stopped raising an exception if the syslog server was
unavailable. This commit first appears in CPython
3.5.4. utils.get_logger() catches that error and retries the
instantiation, and there is a test asserting that. The test fails on
Python 3.5.4 or greater, so it has now been corrected to only assert
things about the first instantiation of logging.SysLogHandler and
passes on Python 3.5.4 and 3.5.5.
This was noticed by running "tox -e py35" on an Ubuntu 18.04 system,
which ships with Python 3.5.5.
Change-Id: I43f231bd7d3566b9849a48f46ec9e2af4cd23be4
The sharder daemon visits container dbs and, when necessary, executes
the sharding workflow on the db.
The workflow is, in overview:
- perform an audit of the container for sharding purposes.
- move any misplaced objects that do not belong in the container
to their correct shard.
- move shard ranges from FOUND state to CREATED state by creating
shard containers.
- move shard ranges from CREATED to CLEAVED state by cleaving objects
to shard dbs and replicating those dbs. By default this is done in
batches of 2 shard ranges per visit.
Additionally, when the auto_shard option is True (NOT yet recommended
in production), the sharder will identify shard ranges for containers
that have exceeded the threshold for sharding, and will also manage
the sharding and shrinking of shard containers.
The manage_shard_ranges tool provides a means to manually identify
shard ranges and merge them to a container in order to trigger
sharding. This is currently the recommended way to shard a container.
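For example (db path and row count are illustrative):

swift-manage-shard-ranges <container_db_path> find_and_replace 500000 --enable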
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I7f192209d4d5580f5a0aa6838f9f04e436cf6b1f
Enable the proxy to fetch a shard container location from the
container server in order to redirect an object update to the shard.
Enable the container server to redirect object updates to shard
containers.
Enable object updater to accept redirection of an object update.
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I6ff85827eecdea746b3626c0d401f68139cce19d
With this patch the ContainerBroker gains several new features:
1. A shard_ranges table to persist ShardRange data, along with
methods to merge and access ShardRange instances to that table,
and to remove expired shard ranges.
2. The ability to create a fresh db file to replace the existing db
file. Fresh db files are named using the hash of the container path
plus an epoch which is a serialized Timestamp value, in the form:
<hash>_<epoch>.db
During sharding both the fresh and retiring db files co-exist on
disk. The ContainerBroker is now able to choose the newest on-disk db
file when instantiated. It also provides a method (get_brokers()) to
gain access to a broker instance for each on-disk file. (A sketch of
the fresh db naming follows this list.)
3. Methods to access the current state of the on-disk db files, i.e.
UNSHARDED (old file only), SHARDING (fresh and retiring files), or
SHARDED (fresh file only with shard ranges).
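The naming scheme from item 2, roughly (the helper name is
illustrative, not the broker's real API):

def fresh_db_name(container_hash, epoch_timestamp):
    # epoch_timestamp is a serialized Timestamp, e.g. '1525354800.00000'
    return '%s_%s.db' % (container_hash, epoch_timestamp)

# fresh_db_name('a83de...', '1525354800.00000')
# -> 'a83de..._1525354800.00000.db'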
Container replication is also modified:
1. shard ranges are replicated between container db peers. Unlike
objects, shard ranges are both pushed and pulled during a REPLICATE
event.
2. If a container db is capable of being sharded (i.e. it has a set of
shard ranges) then it will no longer attempt to replicate objects to
its peers. Object record durability is achieved by sharding rather than
peer-to-peer replication.
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Ie4d2816259e6c25c346976e181fb9d350f947190
A ShardRange represents the part of the object namespace that
is managed by a container. It encapsulates:
- the namespace range, from an excluded lower bound to an included upper bound
- the object count and bytes used in the range
- the current state of the range, including whether it is deleted or not
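The bounds semantics in a nutshell (a toy sketch, not the real
ShardRange class):

class ShardRangeSketch(object):
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper

    def __contains__(self, name):
        # lower bound excluded, upper bound included
        return self.lower < name <= self.upper

# 'm' in ShardRangeSketch('a', 'm')  -> True
# 'a' in ShardRangeSketch('a', 'm')  -> False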
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Kazuhiro MIYAHARA <miyahara.kazuhiro@lab.ntt.co.jp>
Change-Id: Iae090dc170843f15fd2a3ea8f167bec2848e928d
...in preparation for the container sharding feature.
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I4455677abb114a645cff93cd41b394d227e805de
The object reconstructor will now fork all available worker processes
when operating on a subset of local devices.
Example:
A system has 24 disks, named "d1" through "d24"
reconstructor_workers = 8
invoked with --override-devices=d1,d2,d3,d4,d5,d6
In this case, the reconstructor will now use 6 worker processes, one
per disk. The old behavior was to use 2 worker processes, one for d1,
d3, and d5 and the other for d2, d4, and d6 (because 24 / 8 = 3, so we
assigned 3 disks per worker before creating another).
I think the new behavior better matches operators' expectations. If I
give a concurrent program six tasks to do and tell it to operate on up
to eight at a time, I'd expect it to do all six tasks at once, not run
two concurrent batches of three tasks apiece.
This has no effect when --override-devices is not specified. When
operating on all local devices instead of a subset, the new and old
code produce the same result.
The reconstructor's behavior now matches the object replicator's
behavior.
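The new worker-count logic is essentially (a sketch, not the actual
reconstructor code):

def num_workers(reconstructor_workers, devices):
    # One worker per device, capped by the configured maximum.
    return min(reconstructor_workers, len(devices))

# num_workers(8, ['d1', 'd2', 'd3', 'd4', 'd5', 'd6']) -> 6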
Change-Id: Ib308c156c77b9b92541a12dd7e9b1a8ea8307a30
Add a multiprocess mode to the object replicator. Setting the
"replicator_workers" setting to a positive value N will result in the
replicator using up to N worker processes to perform replication
tasks.
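Example (assuming the standard object server config layout):

[object-replicator]
replicator_workers = 4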
At most one worker per disk will be spawned, so one can set
replicator_workers=99999999 to always get one worker per disk
regardless of the number of disks in each node. This is the same
behavior that the object reconstructor has.
Worker process logs will have a bit of information prepended so
operators can tell which messages came from which worker. It looks
like this:
[worker 1/2 pid=16529] 154/154 (100.00%) partitions replicated in 1.02s (150.87/sec, 0s remaining)
The prefix is "[worker M/N pid=P] ", where M is the worker's index, N
is the total number of workers, and P is the process ID. Every message
from the replicator's logger will have the prefix; this includes
messages from down in diskfile, but does not include things printed to
stdout or stderr.
Drive-by fix: don't dump recon stats when replicating only certain
policies. When running the object replicator with replicator_workers >
0 and "--policies=X,Y,Z", the replicator would update recon stats
after running. Since it only ran on a subset of objects, it should not
update recon, much like it doesn't update recon when run with
--devices or --partitions.
Change-Id: I6802a9ad9f1f9b9dafb99d8b095af0fdbf174dc5
Seen during a restart-storm:
Traceback (most recent call last):
File ".../swift/common/db_replicator.py", line 134, in replicate
{'Content-Type': 'application/json'})
File ".../httplib.py", line 1057, in request
self._send_request(method, url, body, headers)
File ".../httplib.py", line 1097, in _send_request
self.endheaders(body)
File ".../httplib.py", line 1053, in endheaders
self._send_output(message_body)
File ".../httplib.py", line 897, in _send_output
self.send(msg)
File ".../httplib.py", line 859, in send
self.connect()
File ".../swift/common/bufferedhttp.py", line 108, in connect
return HTTPConnection.connect(self)
File ".../httplib.py", line 836, in connect
self.timeout, self.source_address)
File ".../eventlet/green/socket.py", line 72, in create_connection
raise err
error: [Errno 104] ECONNRESET
Traceback (most recent call last):
File ".../swift/obj/replicator.py", line 282, in update
'', headers=self.headers).getresponse()
File ".../swift/common/bufferedhttp.py", line 157, in http_connect
ipaddr, port, method, path, headers, query_string, ssl)
File ".../swift/common/bufferedhttp.py", line 189, in http_connect_raw
conn.endheaders()
File ".../httplib.py", line 1053, in endheaders
self._send_output(message_body)
File ".../httplib.py", line 897, in _send_output
self.send(msg)
File ".../httplib.py", line 859, in send
self.connect()
File ".../swift/common/bufferedhttp.py", line 108, in connect
return HTTPConnection.connect(self)
File ".../httplib.py", line 836, in connect
self.timeout, self.source_address)
File ".../eventlet/green/socket.py", line 72, in create_connection
raise err
error: [Errno 101] ENETUNREACH
Traceback (most recent call last):
File ".../swift/obj/replicator.py", line 282, in update
'', headers=self.headers).getresponse()
File ".../swift/common/bufferedhttp.py", line 123, in getresponse
response = HTTPConnection.getresponse(self)
File ".../httplib.py", line 1136, in getresponse
response.begin()
File ".../httplib.py", line 453, in begin
version, status, reason = self._read_status()
File ".../httplib.py", line 417, in _read_status
raise BadStatusLine(line)
BadStatusLine: ''
(Different transactions, of course.)
Change-Id: I07192b8d2ece2d2ee04fe0d877ead6fbfc321d86
Unit tests using O_TMPFILE rely only on the kernel version to check
for the feature. This is wrong, as some filesystems, like tmpfs, don't
support O_TMPFILE.
So, instead of checking the kernel version, this patch actually attempts
to open a file using O_TMPFILE to see whether that's supported. If not,
the test is skipped.
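The probe looks roughly like this, assuming Python 3's os.O_TMPFILE
(on Python 2 the flag value would need defining by hand):

import errno
import os

def supports_o_tmpfile(dirpath):
    try:
        fd = os.open(dirpath, os.O_TMPFILE | os.O_WRONLY)
    except OSError as e:
        # EOPNOTSUPP: filesystem lacks support; EINVAL/EISDIR: old kernel
        if e.errno in (errno.EOPNOTSUPP, errno.EINVAL, errno.EISDIR):
            return False
        raise
    os.close(fd)
    return True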
Change-Id: I5d652f1634b1ef940838573cfdd799ea17b8b572
Reviewer, beware: we determined that the test was using the
facilities improperly. This patch adjusts the test but does
not fix the code under test.
The time.time() output looks like this:
[zaitcev@lembas swift-tsrep]$ python2
Python 2.7.14 (default, Dec 11 2017, 14:52:53)
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux2
>>> import time
>>> time.time()
1519861559.96239
>>> time.time()
1519861561.046204
>>> time.time()
1519861561.732341
>>>
(it's never beyond 6 digits on py2)
[zaitcev@lembas swift-tsrep]$ python3
Python 3.6.3 (default, Oct 9 2017, 12:07:10)
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
>>> import time
>>> time.time()
1519861541.7662468
>>> time.time()
1519861542.893482
>>> time.time()
1519861546.56222
>>> time.time()
1519861547.3297756
>>>
(can go beyond 6 digits on py3)
When fraction is too long on py3, you get:
>>> now = 1519830570.6949349
>>> now
1519830570.6949348
>>> timestamp = Timestamp(now, offset=1)
>>> timestamp
1519830570.69493_0000000000000001
>>> value = '%f' % now
>>> value
'1519830570.694935'
>>> timestamp > value
False
>>>
Note that the test fails in exactly the same way on py2, if time.time()
returns enough digits. Therefore, rounding changes are not the culprit.
The real problem is the assumption that you can take a float T, print
it with '%f' into S, then do arithmetic on T to get O, convert S, T,
and O into Timestamp, then make comparisons. This does not work,
because rounding happens twice: once when you interpolate %f, and
then when you construct a Timestamp. The only valid operation is
to accept a timestamp (e.g. from X-Delete-At) as a floating point
number or a decimal string, and convert it once. Only then can you
do arithmetic to find the expiration.
Change-Id: Ie3b002abbd4734c675ee48a7535b8b846032f9d1
I can't imagine us *not* having a py3 proxy server at some point, and
that proxy server is going to need a ring.
While we're at it (and since they were so close anyway), port:
* cli/ringbuilder.py
* common/linkat.py
* common/daemon.py
Change-Id: Iec8d97e0ce925614a86b516c4c6ed82809d0ba9b
The goal is to make the successful statsd buckets
(e.g. "object-server.GET.timing") have timing information for all the
requests that the server handled correctly, while the error buckets
(e.g. "object-server.GET.errors.timing") have the rest.
Currently, we don't do that great a job of it. We special-case a few
4xx status codes (404, 412, 416) to not count as errors, but we leave
some pretty large holes. If you're graphing errors, you'll see spikes
when clients send bogus requests (400) or fail to
re-authenticate (403). You'll also see spikes when your drives are
unmounted (507) and when there's bugs that need fixing (500).
This commit makes .errors.timing be just 5xx in the hope that its
graph will be more useful.
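In other words, bucket selection becomes roughly (a sketch, not the
actual code):

def timing_bucket(server, method, status_int):
    # Only 5xx responses land in the errors bucket now.
    if status_int // 100 == 5:
        return '%s.%s.errors.timing' % (server, method)
    return '%s.%s.timing' % (server, method)

# timing_bucket('object-server', 'GET', 404)
# -> 'object-server.GET.timing'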
Change-Id: I92b41bcbb880c0688c37ab231c19ebe984b18215