223 Commits

Author SHA1 Message Date
Jenkins
222649de45 Merge "Allow smaller segments in static large objects" 2016-01-23 06:13:26 +00:00
Samuel Merritt
5d449471b1 Remove some Python 2.6 leftovers
Change-Id: I798d08722c90327c66759aa0bb4526851ba38d41
2016-01-14 17:26:01 -08:00
Jenkins
c38ca6329f Merge "Port swift.common.utils.StatsdClient to Python 3" 2016-01-05 09:14:51 +00:00
janonymous
684c4c0459 Python 3 deprecated the logger.warn method in favor of warning
DeprecationWarning: The 'warn' method is deprecated, use 'warning'
instead

Change-Id: I35df44374c4521b1f06be7a96c0b873e8c3674d8
2015-12-22 22:11:29 +05:30
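The rename in miniature, using the stdlib logging module (logger name is illustrative):

```python
import logging

logging.basicConfig()
logger = logging.getLogger("swift")

# Deprecated spelling (emits DeprecationWarning on Python 3):
#     logger.warn("disk unmounted")
# Preferred spelling, available on both Python 2 and Python 3:
logger.warning("disk %s unmounted", "sdb1")
```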
Jenkins
cb8962fbdd Merge "Make ECDiskFile report all fragments found on disk" 2015-12-19 17:15:47 +00:00
Alistair Coles
60b2e02905 Make ECDiskFile report all fragments found on disk
Refactor the disk file get_ondisk_files logic to enable
ECDiskfile to gather *all* fragments found on disk (not just those
with a matching .durable file) and make the fragments available
via the DiskFile interface as a dict mapping:

    Timestamp --> list of fragment indexes

Also, if a durable fragment has been found then the timestamp
of the durable file is exposed via the diskfile interface.

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I55e20a999685b94023d47b231d51007045ac920e
2015-12-15 15:25:11 +00:00
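The mapping described above can be illustrated with a plain dict (the timestamps and fragment indexes below are made up):

```python
# Hypothetical example of the DiskFile interface's fragment mapping:
# Timestamp -> list of fragment indexes found on disk.
fragments = {
    "1450192511.00000": [0, 2],  # two fragments at the newer timestamp
    "1450192501.00000": [1],     # one stale fragment at an older timestamp
}

newest = max(fragments)
assert fragments[newest] == [0, 2]
```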
Victor Stinner
88c9aed7c8 Port swift.common.utils.StatsdClient to Python 3
* StatsdClient._send(): on Python 3, encode parts to UTF-8 and
  replace '|' with b'|' to join parts.
* timing_stats(): replace func.func_name with func.__name__. The
  func_name attribute of functions was removed on Python 3, whereas
  the __name__ attribute is available on Python 2 and Python 3.
* Fix unit tests to use bytes

Change-Id: Ic279c9b54e91aabcc52587eed7758e268ffb155e
2015-12-14 08:09:57 +00:00
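A minimal sketch of the two py3 fixes (the statsd part strings are illustrative, not Swift's exact wire format):

```python
# Encode text parts to UTF-8 and join with a byte delimiter, as
# _send() now does on Python 3.
parts = ["object-server.GET.timing", "12.5", "ms"]
payload = b"|".join(part.encode("utf-8") for part in parts)
assert payload == b"object-server.GET.timing|12.5|ms"

# func.__name__ works on both Python 2 and 3; func.func_name is py2-only.
def sample():
    pass

assert sample.__name__ == "sample"
```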
Samuel Merritt
7f636a5572 Allow smaller segments in static large objects
The addition of range support for SLO segments (commit 25d5e68)
required the range size to be at least the SLO minimum segment size
(default 1 MiB). However, if you're doing something like assembling a
video of short clips out of a larger one, then you might not need a
full 1 MiB.

The reason for the 1 MiB restriction was to protect Swift from
resource overconsumption. It takes CPU, RAM, and internal bandwidth to
connect to an object server, so it's much cheaper to serve a 10 GiB
SLO if it has 10 MiB segments than if it has 10 B segments.

Instead of a strict limit, now we apply ratelimiting to small
segments. The threshold for "small" is configurable and defaults to 1
MiB. SLO segments may now be as small as 1 byte.

If a client makes SLOs as before, it'll still be able to download the
objects as fast as Swift can serve them. However, a SLO with a lot of
small ranges or segments will be slowed down to avoid resource
overconsumption. This is similar to how DLOs work, except that DLOs
ratelimit *every* segment, not just small ones.

UpgradeImpact

For operators: if your cluster has enabled ratelimiting for SLO, you
will want to set rate_limit_under_size to a large number prior to
upgrade. This will preserve your existing behavior of ratelimiting all
SLO segments. 5368709123 is a good value, as that's 1 greater than the
default max object size. Alternately, hold down the 9 key until you
get bored.

If your cluster has not enabled ratelimiting for SLO (the default), no
action is needed.

Change-Id: Id1ff7742308ed816038a5c44ec548afa26612b95
2015-12-09 10:09:13 -08:00
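A hedged sketch of the threshold logic described above (the helper name is hypothetical; only rate_limit_under_size and its 1 MiB default come from the commit):

```python
ONE_MIB = 1048576
RATE_LIMIT_UNDER_SIZE = ONE_MIB  # configurable; defaults to 1 MiB

def needs_ratelimit(segment_length, under_size=RATE_LIMIT_UNDER_SIZE):
    """Only "small" segments get ratelimited; large ones stream at full speed."""
    return segment_length < under_size

assert needs_ratelimit(1)                 # a 1-byte segment is now legal, but slowed
assert not needs_ratelimit(10 * ONE_MIB)  # big segments are unaffected
```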
Jenkins
08f8e235b9 Merge "Fix Python 3 issues in utils" 2015-12-01 06:01:45 +00:00
Jenkins
d9e44bda05 Merge "monkeypatch thread for keystoneclient" 2015-11-11 21:08:53 +00:00
Jenkins
e66b9284bc Merge "Add unit tests for utils.validate_hash_path" 2015-11-06 04:00:21 +00:00
Mehdi Abaakouk
bf8689474a monkeypatch thread for keystoneclient
keystoneclient uses threading.Lock(), but swift doesn't
monkeypatch threading; this results in a lockup when two
greenthreads try to acquire a non-green lock.

This change fixes that.

Change-Id: I9b44284a5eb598a6978364819f253e031f4eaeef
Closes-bug: #1508424
2015-11-03 16:36:19 +01:00
Kota Tsuyuzaki
a4a5fcad8f Add unit tests for utils.validate_hash_path
This patch adds some unit tests to prevent regressions and to describe
validate_hash_path behavior, which I noticed in the review of

https://review.openstack.org/#/c/231864/

*bonus*
- Fix test_hash_path to use "with" syntax instead of "try/finally"
  for assigning a testing value into a global variable.

Change-Id: I948999a8fb8addb9a378dbf8bee853b205aeafad
2015-10-23 23:00:54 -07:00
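The try/finally-to-with cleanup in the bonus fix can be sketched like this (object and attribute names are illustrative):

```python
from unittest import mock
import types

settings = types.SimpleNamespace(hash_path_suffix="real-suffix")

# Before: save the global, assign a testing value, restore in a finally block.
# After: let a context manager do the save/restore.
with mock.patch.object(settings, "hash_path_suffix", "test-suffix"):
    assert settings.hash_path_suffix == "test-suffix"
assert settings.hash_path_suffix == "real-suffix"  # restored automatically
```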
Samuel Merritt
e31ecb24b6 Get rid of contextlib.nested() for py3
contextlib.nested() is missing completely in Python 3.

Since 2.7, we can use multiple context managers in a 'with' statement,
like so:

    with thing1() as t1, thing2() as t2:
        do_stuff()

Now, if we had some code that needed to nest an arbitrary number of
context managers, there's stuff we could do with contextlib.ExitStack
and such... but we don't. We only use contextlib.nested() in tests to
set up bunches of mocks without crazy-deep indentation, and all that
stuff fits perfectly into multiple-context-manager 'with' statements.

Change-Id: Id472958b007948f05dbd4c7fb8cf3ffab58e2681
2015-10-23 11:44:54 -07:00
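For example, a contextlib.nested() pair of mocks collapses into a single 'with' statement:

```python
from unittest import mock

# contextlib.nested(mock.patch(...), mock.patch(...)) becomes:
with mock.patch("os.path.exists", return_value=True) as fake_exists, \
        mock.patch("os.listdir", return_value=["a", "b"]) as fake_listdir:
    import os
    assert os.path.exists("/no/such/path")
    assert os.listdir("/no/such/path") == ["a", "b"]
```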
Victor Stinner
d877a42237 Fix Python 3 issues in utils
* LogAdapter.exception(): on Python 3, socket.error is an alias to
  OSError, so group both exceptions to support Python 3.
* ThreadPool: call GreenPipe() with indexed parameter, don't pass the
  third parameter by its name, since the parameter name changed
  between Python 2 (bufsize) and Python 3 (buffering).
* strip_value() in test_utils: StringIO.getvalue() now requires
  seek(0), otherwise the buffer is filled with null characters.
* test_lock_file(): on Python 3, seek(0) is now required to go to the
  beginning of a file opened in append mode. In append mode, Python
  goes to the end of the file.

Change-Id: I4e56a51690f016a0a2e1354380ce11cff1891f64
2015-10-22 15:02:52 +00:00
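The first fix relies on a fact worth verifying: on Python 3, socket.error is literally OSError, so grouping the two catches both on either version:

```python
import socket

# On Python 3 this is an identity, not merely a subclass relationship.
assert socket.error is OSError

try:
    raise socket.error("connection reset")
except (socket.error, OSError) as exc:  # harmless double-listing on py3
    caught = exc
assert isinstance(caught, OSError)
```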
Victor Stinner
71e573f4ba Port swift.common.utils.backward() to Python 3
backward() is written to handle binary files:

* Replace literal native strings to literal byte strings: add b'...'
  prefix
* Update unit tests: use byte strings
* TemporaryFile(): use the default mode 'w+b', instead of using 'r+w'
  ('r+w' mode creates a Unicode file on Python 3).

Change-Id: Ic91f7e6c605db0b888763080d49f0f501029837f
2015-10-19 17:57:01 +02:00
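A quick check of the TemporaryFile point: the default mode is 'w+b', so reads and writes use bytes on both Python versions:

```python
from tempfile import TemporaryFile

with TemporaryFile() as f:       # default mode 'w+b': a binary file
    f.write(b"last line\n")      # byte strings, matching backward()'s b'...'
    f.seek(0)
    data = f.read()

assert isinstance(data, bytes)
assert data == b"last line\n"
```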
Jenkins
cc46ab0b8f Merge "py3: Replace gen.next() with next(gen)" 2015-10-12 19:45:29 +00:00
janonymous
1882801be1 pep8 fix: assertNotEquals -> assertNotEqual
assertNotEquals is deprecated in py3

Change-Id: Ib611351987bed1199fb8f73a750955a61d022d0a
2015-10-12 07:40:07 +00:00
janonymous
f5f9d791b0 pep8 fix: assertEquals -> assertEqual
assertEquals is deprecated in py3, replacing it.

Change-Id: Ida206abbb13c320095bb9e3b25a2b66cc31bfba8
Co-Authored-By: Ondřej Nový <ondrej.novy@firma.seznam.cz>
2015-10-11 12:57:25 +02:00
Victor Stinner
8f85427939 py3: Replace gen.next() with next(gen)
The next() method of Python 2 generators was renamed to __next__().
Call the builtin next() function instead which works on Python 2 and
Python 3.

The patch was generated by the next operation of the sixer tool.

Change-Id: Id12bc16cba7d9b8a283af0d392188a185abe439d
2015-10-08 15:40:06 +02:00
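The rename in one small example:

```python
def countdown():
    yield 3
    yield 2

gen = countdown()
# gen.next() exists only on Python 2; the builtin works everywhere.
assert next(gen) == 3
assert next(gen) == 2
```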
Zack M. Davis
06bede8942 replace use of deprecated rfc822.Message with a helper utility
The rfc822 module has been deprecated since Python 2.3, and in
particular is absent from the Python 3 standard library. However, Swift
uses instances of rfc822.Message in a number of places, relying on its
behavior of immediately parsing the headers of a file-like object
without consuming the body, leaving the position of the file at the
start of the body. Python 3's http.client has an undocumented
parse_headers function with the same behavior, which inspired the new
parse_mime_headers utility introduced here. (The HeaderKeyDict returned
by parse_mime_headers doesn't have a `.getheader(key)` method like
rfc822.Message did; the dictionary-like `[key]` or `.get(key)` interface
should be used exclusively.)

The implementation in this commit won't actually work with Python 3, the
email.parser.Parser().parsestr of which expects a Unicode string, but it
is believed that this can be addressed in followup work.

Change-Id: Ia5ee2ead67e36e8c6416183667f64ae255887736
2015-10-05 12:22:24 -07:00
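The py3 behavior being imitated can be seen directly (http.client.parse_headers is undocumented, as the message notes):

```python
import io
from http.client import parse_headers

raw = io.BytesIO(b"Content-Length: 4\r\nX-Object-Meta-Color: red\r\n\r\nBODY")
headers = parse_headers(raw)

# Headers are parsed; the file position is left at the start of the body,
# which is exactly the rfc822.Message behavior Swift relied on.
assert headers["X-Object-Meta-Color"] == "red"
assert raw.read() == b"BODY"
```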
Alistair Coles
29c10db0cb Add POST capability to ssync for .meta files
ssync currently does the wrong thing when replicating object dirs
containing both a .data and a .meta file. The ssync sender uses a
single PUT to send both object content and metadata to the receiver,
using the metadata (.meta file) timestamp. This results in the object
content timestamp being advanced to the metadata timestamp,
potentially overwriting newer object data on the receiver and causing
an inconsistency with the container server record for the object.

For example, replicating an object dir with {t0.data(etag=x), t2.meta}
to a receiver with t1.data(etag=y) will result in the creation of
t2.data(etag=x) on the receiver. However, the container server will
continue to list the object as t1(etag=y).

This patch modifies ssync to replicate the content of .data and .meta
separately using a PUT request for the data (no change) and a POST
request for the metadata. In effect, ssync replication replicates the
client operations that generated the .data and .meta files so that
the result of replication is the same as if the original client requests
had persisted on all object servers.

Apart from maintaining correct timestamps across sync'd nodes, this has
the added benefit of not needing to PUT objects when only the metadata
has changed and a POST will suffice.

Taking the same example, ssync sender will no longer PUT t0.data but will
POST t2.meta resulting in the receiver having t1.data and t2.meta.

The changes are backwards compatible: an upgraded sender will only sync
data files to a legacy receiver and will not sync meta files (fixing the
erroneous behavior described above); a legacy sender will operate as
before when sync'ing to an upgraded receiver.

Changes:
- diskfile API provides methods to get the data file timestamp
  as distinct from the diskfile timestamp.

- diskfile yield_hashes return tuple now passes a dict mapping data and
  meta (if any) timestamps to their respective values in the timestamp
  field.

- ssync_sender will encode data and meta timestamps in the
  (hash_path, timestamp) tuple sent to the receiver during
  missing_checks.

- ssync_receiver compares sender's data and meta timestamps to any
  local diskfile and may specify that only data or meta parts are sent
  during updates phase by appending a qualifier to the hash returned
  in its 'wanted' list.

- ssync_sender now sends POST subrequests when a meta file
  exists and its content needs to be replicated.

- ssync_sender may send *only* a POST if the receiver indicates that
  is the only part required to be sync'd.

- object server will allow PUT and DELETE with earlier timestamp than
  a POST

- Fixed TODO related to replicated objects with fast-POST and ssync

Related spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Closes-Bug: 1501528
Change-Id: I97552d194e5cc342b0a3f4b9800de8aa6b9cb85b
2015-10-02 11:24:19 +00:00
Romain LE DISEZ
71f6fd025e Allow configuring the rsync modules where the replicators send data
Currently, the rsync module where the replicators send data is static. This
prevents administrators from adapting the rsync configuration to their
deployment or needs.

As an example, the rsyncd configuration example encourages setting a
connection limit for the account, container, and object modules. This
protects devices from excessive parallel connections, which would impact
performance.

On a server with many devices, it is tempting to increase this number
proportionally, but nothing guarantees that the distribution of the
connections will be balanced. In the worst case, a single device can
receive all the connections, severely impacting performance.

This commit adds a new option named 'rsync_module' to the *-replicator sections
of the *-server configuration file. This configuration variable can be
extrapolated with device attributes like ip, port, device, zone, ... by using
the format {NAME}. eg:
    rsync_module = {replication_ip}::object_{device}

With this configuration, an administrator can solve the problem of
connection distribution by creating one module per device in the rsyncd
configuration.

The default values are backward compatible:
    {replication_ip}::account
    {replication_ip}::container
    {replication_ip}::object

Option vm_test_mode is deprecated by this commit, but backward compatibility is
maintained. The option is only effective when rsync_module is not set. In that
case, {replication_port} is appended to the default value of rsync_module.

Change-Id: Iad91df50dadbe96c921181797799b4444323ce2e
2015-09-07 08:00:18 +02:00
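The {NAME} extrapolation amounts to str.format() over the device's attributes; a minimal sketch with made-up values:

```python
rsync_module = "{replication_ip}::object_{device}"  # from the example above

device = {"replication_ip": "192.168.1.10", "device": "sdb1",
          "zone": 2, "port": 6000}
assert rsync_module.format(**device) == "192.168.1.10::object_sdb1"

# One of the backward-compatible defaults:
assert "{replication_ip}::object".format(**device) == "192.168.1.10::object"
```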
paul luse
893f30c61d EC GET path: require fragments to be of same set
And if they are not, exhaust the node iter to go get more.  The
problem without this implementation is a simple overwrite where
a GET follows before the handoff has put the newer obj back on
the 'alive again' node such that the proxy gets n-1 fragments
of the newest set and 1 of the older.

This patch bucketizes the fragments by etag and if it doesn't
have enough continues to exhaust the node iterator until it
has a large enough matching set.

Change-Id: Ib710a133ce1be278365067fd0d6610d80f1f7372
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Closes-Bug: 1457691
2015-08-27 21:09:41 -07:00
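A simplified sketch of the bucketing idea (the function and data shapes are hypothetical; the real logic lives in the proxy's EC GET path):

```python
from collections import defaultdict

def first_complete_set(responses, ndata):
    """Group (etag, fragment_index) responses by etag; return the first
    etag that accumulates ndata matching fragments, else None."""
    buckets = defaultdict(list)
    for etag, frag_index in responses:
        buckets[etag].append(frag_index)
        if len(buckets[etag]) >= ndata:
            return etag, sorted(buckets[etag])
    return None

# n-1 fragments of the newest set plus 1 of the older: keep exhausting
# the node iter until a full matching set appears.
responses = [("old-etag", 0)] + [("new-etag", i) for i in range(4)]
assert first_complete_set(responses, ndata=4) == ("new-etag", [0, 1, 2, 3])
```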
Jenkins
f644eaef7d Merge "test/unit: Replace python print operator with print function (pep H233, py33)" 2015-08-11 09:49:50 +00:00
Jenkins
5a04412ad2 Merge "sys.exc_type/exc_value/exc_traceback are Deprecated" 2015-08-03 14:54:52 +00:00
Victor Stinner
a0db56dcde Fix pep8 E265 warning of hacking 0.10
Fix the warning E265 "block comment should start with '# '" added in
pep8 1.5.

Change-Id: Ib57282e958be9c7cddffc7bca34fbbf1d4c460fd
2015-07-30 09:33:18 +02:00
janonymous
c5b5cf91a9 test/unit: Replace python print operator with print function (pep H233, py33)
'print' function is compatible with 2.x and 3.x python versions
Link : https://www.python.org/dev/peps/pep-3105/

Python 2.6 has a __future__ import that removes print as language syntax,
letting you use the functional form instead

Change-Id: I94e1bc6bd83ad6b05695c7ebdf7cbfd8f6d9f9af
2015-07-28 21:03:05 +05:30
Jenkins
b87e32591d Merge "Use six to fix imports on Python 3" 2015-07-28 10:25:25 +00:00
Jenkins
7eb0306d0e Merge "Deprecated xreadlines() usage fixed" 2015-07-28 00:18:03 +00:00
Victor Stinner
e24d7c36fa Use six to fix imports on Python 3
Get configparser, queue, http_client modules from six.moves.

Patch generated by the six_moves operation of the sixer tool:
https://pypi.python.org/pypi/sixer

Change-Id: I666241ab50101b8cc6f992dd80134ce27327bd7d
2015-07-24 11:48:28 +02:00
Jenkins
260e976e50 Merge "Get StringIO and cStringIO from six.moves" 2015-07-24 06:52:36 +00:00
janonymous
cd7b2db550 unit tests: Replace "self.assert_" by "self.assertTrue"
The assert_() method is deprecated and can be safely replaced by assertTrue().
This patch makes sure that running the tests does not create undesired
warnings.

Change-Id: I0602ba39ef93263386644ee68088d5f65fcb4a71
2015-07-21 19:23:00 +05:30
janonymous
39621b8629 Deprecated xreadlines() usage fixed
Change-Id: Iadf42c2f86f78c11259e21c88b4aef51db3441ad
2015-07-18 14:12:58 +05:30
Victor Stinner
6e70f3fa32 Get StringIO and cStringIO from six.moves
* replace "from cStringIO import StringIO"
  with "from six.moves import cStringIO as StringIO"
* replace "from StringIO import StringIO"
  with "from six import StringIO"
* replace "import cStringIO" and "cStringIO.StringIO()"
  with "from six import moves" and "moves.cStringIO()"
* replace "import StringIO" and "StringIO.StringIO()"
  with "import six" and "six.StringIO()"

This patch was generated by the stringio operation of the sixer tool:
https://pypi.python.org/pypi/sixer

Change-Id: Iacba77fec3045f96773d1090c0bd48613729a561
2015-07-15 16:56:33 +02:00
janonymous
c8131c6abb sys.exc_type/exc_value/exc_traceback are Deprecated
sys.exc_info() contains a tuple of these three.

Change-Id: I530cbeb37c43da98b4924db41f6604871077bd47
2015-07-12 14:48:35 +05:30
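The replacement in miniature:

```python
import sys

try:
    1 / 0
except ZeroDivisionError:
    # sys.exc_type / sys.exc_value / sys.exc_traceback are gone; the same
    # three values come back as a tuple from sys.exc_info().
    exc_type, exc_value, exc_tb = sys.exc_info()

assert exc_type is ZeroDivisionError
assert isinstance(exc_value, ZeroDivisionError)
```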
Victor Stinner
e5c962a28c Replace xrange() with six.moves.range()
Patch generated by the xrange operation of the sixer tool:
https://pypi.python.org/pypi/sixer

Manual changes:

* Fix indentation for pep8 checks
* Fix TestGreenthreadSafeIterator.test_access_is_serialized of
  test.unit.common.test_utils:
  replace range(1, 11) with list(range(1, 11))
* Fix UnsafeXrange docstring, revert change

Change-Id: Icb7e26135c5e57b5302b8bfe066b33cafe69fe4d
2015-06-23 07:29:15 +00:00
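The manual list(range(...)) fix matters because six.moves.range (py3 range) is lazy:

```python
r = range(1, 11)          # lazy on Python 3, like py2's xrange()
assert r != [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # a range is not a list
assert list(r) == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```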
Darrell Bishop
df134df901 Allow 1+ object-servers-per-disk deployment
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs.  The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring.  In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket.  The child, per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk.  The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring.  The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).

Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM.  Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.

The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True.  This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.

A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses).  The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.

This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server".  The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks.  Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None.  Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled.  Bonus improvement for IPv6
addresses in is_local_device().

This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/

Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").

However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.

This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.

In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explict "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.

Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.

For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md

Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md

If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md

DocImpact

Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
2015-06-18 12:43:50 -07:00
janonymous
09e7477a39 Replace it.next() with next(it) for py3 compat
The Python 2 next() method of iterators was renamed to __next__() on
Python 3. Use the builtin next() function instead which works on Python
2 and Python 3.

Change-Id: Ic948bc574b58f1d28c5c58e3985906dee17fa51d
2015-06-15 22:10:45 +05:30
Samuel Merritt
4f2ed8bcd0 EC: support multiple ranges for GET requests
This commit lets clients receive multipart/byteranges responses (see
RFC 7233, Appendix A) for erasure-coded objects. Clients can already
do this for replicated objects, so this brings EC closer to feature
parity (ha!).

GetOrHeadHandler got a base class extracted from it that treats an
HTTP response as a sequence of byte-range responses. This way, it can
continue to yield whole fragments, not just N-byte pieces of the raw
HTTP response, since an N-byte piece of a multipart/byteranges
response is pretty much useless.

There are a couple of bonus fixes in here, too. For starters, download
resuming now works on multipart/byteranges responses. Before, it only
worked on 200 responses or 206 responses for a single byte
range. Also, BufferedHTTPResponse grew a readline() method.

Also, the MIME response for replicated objects got tightened up a
little. Before, it had some leading and trailing CRLFs which, while
allowed by RFC 7233, provide no benefit. Now, both replicated and EC
multipart/byteranges avoid extraneous bytes. This let me re-use the
Content-Length calculation in swob instead of having to either hack
around it or add extraneous whitespace to match.

Change-Id: I16fc65e0ec4e356706d327bdb02a3741e36330a0
2015-06-03 11:42:00 -07:00
Michael MATUR
e7c8c578d9 fixup! Patch of "parse_content_disposition" method to meet RFC2183
The spec of Content-Disposition does not require a space character after
comma: http://www.ietf.org/rfc/rfc2183.txt

Change-Id: Iff438dc36ce78c6a79bb66ab3d889a8dae7c0e1f
Closes-Bug: #1458497
2015-05-26 11:23:58 +02:00
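A hedged sketch of a parser that tolerates a missing space after the separator (this is an illustration, not Swift's actual implementation):

```python
def parse_content_disposition(header):
    """Split 'disposition;k1=v1;k2="v2"' without requiring a space
    after each separator."""
    disposition, _, rest = header.partition(";")
    attrs = {}
    for item in rest.split(";"):
        name, _, value = item.strip().partition("=")
        if name:
            attrs[name] = value.strip('"')
    return disposition.strip(), attrs

# No space after the separator, as the RFC permits:
assert parse_content_disposition('attachment;filename="obj1"') == (
    "attachment", {"filename": "obj1"})
```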
Samuel Merritt
94215049fd Bump up a timeout in a test
Got a slow crappy VM like I do? You might see this fail
occasionally. Bump up the timeout a little to help it out.

Change-Id: I8c0e5b99012830ea3525fa55b0811268db3da2a2
2015-05-04 15:08:51 -07:00
Samuel Merritt
decbcd24d4 Foundational support for PUT and GET of erasure-coded objects
This commit makes it possible to PUT an object into Swift and have it
stored using erasure coding instead of replication, and also to GET
the object back from Swift at a later time.

This works by splitting the incoming object into a number of segments,
erasure-coding each segment in turn to get fragments, then
concatenating the fragments into fragment archives. Segments are 1 MiB
in size, except the last, which is between 1 B and 1 MiB.

+====================================================================+
|                             object data                            |
+====================================================================+

                                   |
          +------------------------+----------------------+
          |                        |                      |
          v                        v                      v

+===================+    +===================+         +==============+
|     segment 1     |    |     segment 2     |   ...   |   segment N  |
+===================+    +===================+         +==============+

          |                       |
          |                       |
          v                       v

     /=========\             /=========\
     | pyeclib |             | pyeclib |         ...
     \=========/             \=========/

          |                       |
          |                       |
          +--> fragment A-1       +--> fragment A-2
          |                       |
          |                       |
          |                       |
          |                       |
          |                       |
          +--> fragment B-1       +--> fragment B-2
          |                       |
          |                       |
         ...                     ...

Then, object server A gets the concatenation of fragment A-1, A-2,
..., A-N, so its .data file looks like this (called a "fragment archive"):

+=====================================================================+
|     fragment A-1     |     fragment A-2     |  ...  |  fragment A-N |
+=====================================================================+

Since this means that the object server never sees the object data as
the client sent it, we have to do a few things to ensure data
integrity.

First, the proxy has to check the Etag if the client provided it; the
object server can't do it since the object server doesn't see the raw
data.

Second, if the client does not provide an Etag, the proxy computes it
and uses the MIME-PUT mechanism to provide it to the object servers
after the object body. Otherwise, the object would not have an Etag at
all.

Third, the proxy computes the MD5 of each fragment archive and sends
it to the object server using the MIME-PUT mechanism. With replicated
objects, the proxy checks that the Etags from all the object servers
match, and if they don't, returns a 500 to the client. This mitigates
the risk of data corruption in one of the proxy --> object connections,
and signals to the client when it happens. With EC objects, we can't
use that same mechanism, so we must send the checksum with each
fragment archive to get comparable protection.

On the GET path, the inverse happens: the proxy connects to a bunch of
object servers (M of them, for an M+K scheme), reads one fragment at a
time from each fragment archive, decodes those fragments into a
segment, and serves the segment to the client.

When an object server dies partway through a GET response, any
partially-fetched fragment is discarded, the resumption point is wound
back to the nearest fragment boundary, and the GET is retried with the
next object server.

GET requests for a single byterange work; GET requests for multiple
byteranges do not.

There are a number of things _not_ included in this commit. Some of
them are listed here:

 * multi-range GET

 * deferred cleanup of old .data files

 * durability (daemon to reconstruct missing archives)

Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I9c13c03616489f8eab7dcd7c5f21237ed4cb6fd2
2015-04-14 00:52:17 -07:00
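The segmentation rule from the first paragraph (1 MiB segments, with the last between 1 B and 1 MiB) is easy to sketch:

```python
ONE_MIB = 1048576

def split_into_segments(body, segment_size=ONE_MIB):
    """Slice an object body into fixed-size segments; only the final
    segment may be shorter."""
    return [body[i:i + segment_size]
            for i in range(0, len(body), segment_size)]

segments = split_into_segments(b"x" * (2 * ONE_MIB + 5))
assert [len(s) for s in segments] == [ONE_MIB, ONE_MIB, 5]
```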
Kota Tsuyuzaki
99a3d1a1c5 Fix the prefix of messages captured from stderr
Swift captures both sys.stdout and sys.stderr messages and
transfers them to syslog. However, all of the messages
currently use the "STDOUT:" prefix.

This is quite confusing when debugging, because we cannot
tell whether "Swift didn't capture stderr" or "Swift captured
stderr but didn't show it" when we want to log some stderr
messages.

This patch adds a new prefix, "STDERR:", and uses it for
sys.stderr.

Change-Id: Idf4649e9860b77c3600a5cd0a3e0bd674b31fb4f
2015-03-13 04:39:16 -07:00
Prashanth Pai
a6f630f27c fsync() on directories
renamer() method now does a fsync on containing directory of target path
and also on parent dirs of newly created directories, by default.
This can be explicitly turned off in cases where it is not
necessary (for example, quarantines).

The following article explains why this is necessary:
http://lwn.net/Articles/457667/

Although it may seem like the right thing to do, this change does come
at a performance penalty. However, no configurable option is provided to
turn it off.

Also, lock_path() inside invalidate_hash() was always creating part of
the object path in the filesystem. Those directories were never fsync'd.
This has been fixed.

Change-Id: Id8e02f84f48370edda7fb0c46e030db3b53a71e3
Signed-off-by: Prashanth Pai <ppai@redhat.com>
2015-03-04 12:33:56 +05:30
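The directory-fsync technique from the LWN article can be sketched as follows (the helper is illustrative, not Swift's renamer()):

```python
import os

def fsync_dir(dirpath):
    """fsync a directory so that a rename/create inside it is durable.
    Fsyncing the containing directory persists the directory entry
    itself, not just the file's data."""
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
```

After an os.rename(tmp_path, target_path), one would call fsync_dir(os.path.dirname(target_path)) so the new name survives a crash.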
Jenkins
d6467d3385 Merge "Add multiple reseller prefixes and composite tokens" 2015-02-24 16:12:01 +00:00
Donagh McCabe
89397c5b67 Add multiple reseller prefixes and composite tokens
This change is in support of Composite Tokens and Service Accounts
(see http://specs.openstack.org/openstack/swift-specs/specs/in_progress/
service_token.html)

During coding, minor changes were made compared to the original
specification. See https://review.openstack.org/138771 for these changes.

DocImpact

Change-Id: I6072b4efb3a479a8e0cc2d9c11ffda5764b55e30
2015-02-23 15:57:20 +00:00
Jenkins
a10e42d910 Merge "Fix logger test coupling" 2015-02-13 18:53:35 +00:00
Alistair Coles
f5ab30574e Fix logger test coupling
Two tests fail if test/unit/common/test_daemon.py is excluded
from the test set:

test.unit.common.test_utils.TestUtils.test_swift_log_formatter
test.unit.common.test_utils.TestStatsdLoggingDelegation.test_thread_locals

They fail because they assert that logger.txn_id is None,
when previous tests cause txn_id to be set. This coupling is masked
when test_daemon.py is included because it reloads the common/utils
module, and is executed just before test.unit.common.test_utils.

The coupling can be observed by running, for example:

nosetests ./test/unit/common/middleware/test_except.py ./test/unit/common/test_utils.py
or
nosetests ./test/unit/proxy ./test/unit/common/test_utils.py

The failing tests should reset logger thread_locals before making
assertions. test_utils.reset_loggers() attempts this but is broken
because it sets thread_local on the wrong object.

Changes in this patch:
 - fix reset_loggers() to reset the LogAdapter thread local state
 - add a reset_logger_state decorator to call reset_loggers
   before and after a decorated method
 - use the decorator for increased isolation of tests that previously
   called reset_loggers only on exit

Change-Id: If9aa781a2dd1929a47ef69322ec8c53263d47660
2015-02-12 15:32:25 +00:00
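The decorator described in the last bullet can be sketched generically (names are illustrative; Swift's version resets LogAdapter thread-local state):

```python
import functools

def reset_state_around(reset):
    """Run reset() before and after the decorated function, so a test
    neither inherits nor leaks thread-local logger state."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            reset()
            try:
                return func(*args, **kwargs)
            finally:
                reset()
        return wrapper
    return decorator

events = []

@reset_state_around(lambda: events.append("reset"))
def fake_test():
    events.append("body")

fake_test()
assert events == ["reset", "body", "reset"]
```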
Jenkins
0186194e96 Merge "Output logs of policy index" 2015-02-10 22:21:13 +00:00