470 Commits

Author SHA1 Message Date
Zuul
8359f43079 Merge "Increase name-length limits for internal accounts" 2018-01-31 00:29:29 +00:00
Zuul
0cf032a75f Merge "Return HTTPServerError instead of HTTPNotFound" 2018-01-19 03:26:14 +00:00
Christopher Bartz
d8f9045518 Send correct number of X-Delete-At-* headers
Send just as many requests with X-Delete-At-* as we do X-Container-* to
the object server.  Furthermore, stop the object server on making an
update to the expirer queue when it wasn't told to do so and remove the
log warning which would have been produced.

Reason:

It can be the case that the number of object replicas (OR) is larger
than the number of container replicas (CR) for a given storage policy
(most likely in case of EC).  Before this commit, only CR object servers
received the x-delete-at-* headers, which means that OR - CR object
servers did not receive the headers.  The servers missing the header
would produce a log warning and create the x-delete-at-container header
and async update on their own, which could lead to a bug, if the
expiring_objects_container_divisor option was misconfigured.

Change-Id: I20fc2f42f590fda995814a2fa7ba86019f9fddc1
Closes-Bug: #1733588
2018-01-17 01:36:28 +00:00
cheng
5cbd5cf303 Return HTTPServerError instead of HTTPNotFound
Swift allows autocreate account. It should be treat as server error
instead of 404 when it fails to create account

Change-Id: I726271bc06e3c1b07a4af504c3fd7ddb789bd512
Closes-bug: 1718810
2018-01-15 14:22:26 +08:00
Zuul
9a323e1989 Merge "Fix socket leak on 416 EC GET responses." 2018-01-12 12:48:43 +00:00
Samuel Merritt
a41c458c90 proxy: make the right number of container updates
When the proxy is putting X-Container headers into object PUT
requests, it should put out just enough to make the container update
durable in the worst case. It shouldn't do more, since that results in
extra work for the container servers; and it shouldn't do less, since
that results in objects not showing up in listings.

The current code gets the number right as long as you have 3 container
replicas and an odd number of object replicas, but it comes up with
some bogus numbers in other cases. The number it computes is
(object-quorum + 1).

This patch changes the number to (container-quorum +
max_put_failures).

Example: given an EC 12+5 policy and 3 container replicas, you can
lose up to 4 connections and still succeed. Since you need to have 2
container updates happen for durability, you need 6 connections to
have X-Container headers. That way, you can lose 4 and still have 2
left. The current code would put X-Container headers on 14 of the
connections, resulting in more than double the workload on the
container servers; this patch changes the number to 6.

Example 2: given a (crazy) EC 3+6 policy and 3 container replicas, you
can lose up to 5 connections, so you need X-Container headers on
7. The current code only sends 5, giving a worst-case result of a PUT
succeeds but never reaches the containers. This patch changes the
number to 7.

Other examples:
                          |  current  |  this change  |
                        --+-----------+---------------+
EC 10+4, 3x container     |    12     |      5        |
EC 10+4, 5x container     |    12     |      6        |
EC 15+4, 3x container     |    17     |      5        |
EC 15+4, 5x container     |    17     |      6        |
EC 4+8, 3x container      |    6      |      9        |
7x object, 3x container   |    5      |      5        |
6x object, 3x container   |    4      |      5        |

Change-Id: I34efd48655b890340912810ab111bb63445e5c8b
2018-01-09 16:34:24 -08:00
Samuel Merritt
f709eed41b Fix socket leak on 416 EC GET responses.
Sometimes, when handling an EC GET request with a Range header, the
object servers reply 206 to the proxy, but the proxy (correctly)
replies 416 to the client[1]. In that case, the connections to the object
servers were not being closed. This was due to improper error handling
in ECAppIter.

Since ECAppIter is intended to be a WSGI iterable, it expects to have
its close() method called when the caller is done with it. In this
particular case, the caller (ECAppIter.kickoff()) was not calling
close() when an exception was raised. Now it is.

[1] consider a 4+2 EC policy with segment size 1024, an 20 byte
object, and a request with "Range: bytes=21-50". The proxy needs whole
fragments to decode, so it asks the object server for "Range:
bytes=0-255" [2], the object server says 206, and then the proxy
realizes that the client's request is unsatisfiable and tells the
client 416.

[2] segment size 1024 and 4 data fragments means the fragments have
size 1024 / 4 = 256, hence "bytes=0-255" asks for the first whole
fragment

Change-Id: Ide2edf8c449c97d45f48c2dbbbff7aebefa4b158
Closes-Bug: 1738804
2017-12-29 10:26:06 -08:00
Samuel Merritt
728b4ba140 Add checksum to object extended attributes
Currently, our integrity checking for objects is pretty weak when it
comes to object metadata. If the extended attributes on a .data or
.meta file get corrupted in such a way that we can still unpickle it,
we don't have anything that detects that.

This could be especially bad with encrypted etags; if the encrypted
etag (X-Object-Sysmeta-Crypto-Etag or whatever it is) gets some bits
flipped, then we'll cheerfully decrypt the cipherjunk into plainjunk,
then send it to the client. Net effect is that the client sees a GET
response with an ETag that doesn't match the MD5 of the object *and*
Swift has no way of detecting and quarantining this object.

Note that, with an unencrypted object, if the ETag metadatum gets
mangled, then the object will be quarantined by the object server or
auditor, whichever notices first.

As part of this commit, I also ripped out some mocking of
getxattr/setxattr in tests. It appears to be there to allow unit tests
to run on systems where /tmp doesn't support xattrs. However, since
the mock is keyed off of inode number and inode numbers get re-used,
there's lots of leakage between different test runs. On a real FS,
unlinking a file and then creating a new one of the same name will
also reset the xattrs; this isn't the case with the mock.

The mock was pretty old; Ubuntu 12.04 and up all support xattrs in
/tmp, and recent Red Hat / CentOS releases do too. The xattr mock was
added in 2011; maybe it was to support Ubuntu Lucid Lynx?

Bonus: now you can pause a test with the debugger, inspect its files
in /tmp, and actually see the xattrs along with the data.

Since this patch now uses a real filesystem for testing filesystem
operations, tests are skipped if the underlying filesystem does not
support setting xattrs (eg tmpfs or more than 4k of xattrs on ext4).

References to "/tmp" have been replaced with calls to
tempfile.gettempdir(). This will allow setting the TMPDIR envvar in
test setup and getting an XFS filesystem instead of ext4 or tmpfs.

THIS PATCH SIGNIFICANTLY CHANGES TESTING ENVIRONMENTS

With this patch, every test environment will require TMPDIR to be
using a filesystem that supports at least 4k of extended attributes.
Neither ext4 nor tempfs support this. XFS is recommended.

So why all the SkipTests? Why not simply raise an error? We still need
the tests to run on the base image for OpenStack's CI system. Since
we were previously mocking out xattr, there wasn't a problem, but we
also weren't actually testing anything. This patch adds functionality
to validate xattr data, so we need to drop the mock.

`test.unit.skip_if_no_xattrs()` is also imported into `test.functional`
so that functional tests can import it from the functional test
namespace.

The related OpenStack CI infrastructure changes are made in
https://review.openstack.org/#/c/394600/.

Co-Authored-By: John Dickinson <me@not.mn>

Change-Id: I98a37c0d451f4960b7a12f648e4405c6c6716808
2017-11-03 13:30:05 -04:00
Tim Burke
5b26516ab2 Increase name-length limits for internal accounts
Double account and container name-length limits for accounts starting with
auto_create_account_prefix (default: '.') -- these are used internally by
Swift, and may need to have some prefix followed by a user-settable value.

Related-Change: Ice703dc6d98108ad251c43f824426d026e1f1d97
Change-Id: Ie1ce5ea49b06ab3002c0bd0fad7cea16cea2598e
2017-10-13 12:46:37 -07:00
Tim Burke
c118059719 Respond 400 Bad Request when Accept headers fail to parse
Change-Id: I6eb4e4bca95e2ee4fecdb703394cb2419737922d
Closes-Bug: 1716509
2017-10-13 12:35:21 -07:00
Alistair Coles
a4a5494fd2 test account autocreate listing format
Related-Change: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d
Change-Id: I50c22225bbebff71600bea9158bda1edd18b48b0
2017-10-09 13:56:26 -07:00
Tim Burke
839c13003a Stop clearing params for account_autocreate responses
Otherwise, we send back a 204 where middlewares should be expecting
a 200 and an empty JSON array.

Change-Id: I05549342327108f71b60a316f734c55bc9589915
Related-Change: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d
2017-10-04 19:01:29 +00:00
Jenkins
23de16b0bf Merge "Move listing formatting out to proxy middleware" 2017-09-20 01:15:28 +00:00
Kota Tsuyuzaki
1e79f828ad Remove all post_as_copy related code and configes
It was deprecated and we discussed on this topic in Denver PTG
for Queen cycle. Main motivation for this work is that deprecated
post_as_copy option and its gate blocks future symlink work.

Change-Id: I411893db1565864ed5beb6ae75c38b982a574476
2017-09-16 05:50:41 +00:00
Tim Burke
4806434cb0 Move listing formatting out to proxy middleware
Make some json -> (text, xml) stuff in a common module, reference that in
account/container servers so we don't break existing clients (including
out-of-date proxies), but have the proxy controllers always force a json
listing.

This simplifies operations on listings (such as the ones already happening in
decrypter, or the ones planned for symlink and sharding) by only needing to
consider a single response type.

There is a downside of larger backend requests for text/plain listings, but
it seems like a net win?

Change-Id: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d
2017-09-15 06:38:26 +00:00
Alistair Coles
0e51ac09ad Cleanup test tempdirs in tearDown
Stop leaking tmp dirs in unit tests

Change-Id: I606e9deeedc7c52a85d270b3cef7dfba13b4f0d3
2017-09-01 15:34:30 +01:00
Christopher Bartz
c653566f4a headers_to_account_info include per policy stats
This commit adds per-policy stats to
proxy.controllers.base.headers_to_account_info

Change-Id: I800266d15aabcc7b6e0234de3c9b965b5c15a623
Closes-Bug: #1675776
2017-08-14 11:41:04 +02:00
Jenkins
c3f6e82ae1 Merge "Write-affinity aware object deletion" 2017-07-06 14:00:05 +00:00
Jenkins
c22bab4b34 Merge "Version DLOs, just like every other type of object" 2017-07-03 14:06:07 +00:00
Lingxian Kong
831eb6e3ce Write-affinity aware object deletion
When deleting objects in multi-region swift delpoyment with write
affinity configured, users always get 404 when deleting object before
it's replcated to approriate nodes.

This patch adds a config item 'write_affinity_handoff_delete_count' so
that operator could define how many local handoff nodes should swift
send request to get more candidates for the final response, or by
default just leave it to swift to calculate the appropriate number.

Change-Id: Ic4ef82e4fc1a91c85bdbc6bf41705a76f16d1341
Closes-Bug: #1503161
2017-06-27 22:42:02 +12:00
Kota Tsuyuzaki
066f44323d Follow up for affinity config per policy
This changes:
- Add assertions for write_affinity values in load_app tests
- Add a test case that policy override read_affinity default by timing
  strategy
- Avoid 'scope' but it's 'label'

Related-Change: I3f718f425f525baa80045ba067950c752bcaaefc
Change-Id: Ia8262490895d60da345f3679fc53653b2c2a2b3e
2017-06-13 11:53:11 +01:00
Jenkins
0d5b2a867d Merge "Follow-up for per-policy proxy configs" 2017-06-05 08:50:45 +00:00
Tim Burke
5ecf828b17 Follow-up for per-policy proxy configs
* Only use one StringIO in ConfigString
* Rename the write_affinity_node_count function to be
  write_affinity_node_count_fn
* Use comprehensions instead of six.moves.filter
* Rename OverrideConf to ProxyOverrideOptions
* Make ProxyOverrideOptions's __repr__ eval()able
* Various conf -> options renames
* Stop trying to handle a KeyError that should never come up
* Be explicit about how deep we need to copy in proxy/test_server.py
* Drop an unused return value
* Add a test for a non-"proxy-server" app name
* Combine bad-section-name tests
* Try to clean up (at least a little) a self-described "hokey test"

Related-Change: I3f718f425f525baa80045ba067950c752bcaaefc
Change-Id: I4e81175d5445049bc1f48b3ac02c5bc0f77e6f59
2017-06-01 20:36:19 +00:00
Alistair Coles
227cef9933 Add link from policies overview to per-policy proxy-server conf
- add proxy server per policy config as an optional
  step in the configuration of a policy, with link to
  the deployment guide

- add reverse link from deployment guide per-policy
  config doc section to storage policies docs

Drive-by fix an incorrect test comment

Change-Id: Ib95310193270a63c9d1e321c6e7de240e00b387f
Related-Change: I3f718f425f525baa80045ba067950c752bcaaefc
2017-05-26 10:41:35 +01:00
Jenkins
263dc8a3f3 Merge "Enable per policy proxy config options" 2017-05-25 06:34:48 +00:00
Jenkins
6f7b1f9ee2 Merge "Use setUpModule instead of setup for module level unit test setup" 2017-05-24 21:29:09 +00:00
Alistair Coles
45884c1102 Enable per policy proxy config options
This is an alternative approach to that proposed in [1]

Adds support for optional per-policy config sections
to be added in proxy-server.conf. This is highly desirable
to allow per-policy affinity options to be set for use with
duplicated EC policies [2] and composite rings [3].

Certain options found in per-policy conf sections will
override their equivalents that may be set in the
[app:proxy-server] section. Currently the options
handled that way are:

  sorting_method
  read_affinity
  write_affinity
  write_affinity_node_count

For example:

  [proxy-server:policy:0]
  sorting_method = affinity
  read_affinity = r1=100
  write_affinity = r1
  write_affinity_node_count = 1 * replicas

The corresponding attributes of the proxy-server Application
are now available from instances of an OverrideConf object
that is obtained from Application.get_policy_options(policy).

[1] Related-Change: I9104fc789ba85ab3ab5ccd34096125b482821389
[2] Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
[3] Related-Change: I0d8928b55020592f8e75321d1f7678688301d797

Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: I3f718f425f525baa80045ba067950c752bcaaefc
2017-05-23 20:22:30 +01:00
Pete Zaitcev
5dfc3a75fb Open-code eventlet.listen()
Recently out gate started blowing up intermittently with a strange
case of ports mixed up. Sometimes a functional tests tries to
authorize on a port that's clearly an object server port, and
the like. As it turns out, eventlet developers added an unavoidable
SO_REUSEPORT into listen(), which makes listen(("localhost",0)
to reuse ports.

There's an issue about it:
 https://github.com/eventlet/eventlet/issues/411

This patch is working around the problem while eventlet people
consider the issue.

Change-Id: I67522909f96495a6a30e1acdb79835dce2189549
2017-05-11 01:39:14 -06:00
Alistair Coles
511ac2ee60 Use setUpModule instead of setup for module level unit test setup
Module setup() and teardown() functions are found by nosetests [1] but
unittests expects setUpModule() and tearDownModule() [2]. The latter
function names are also found by nosetests, so using those function
names enables the test module to be run with either nosetests or
unittest.

Although the tox test envs and .unittests script use nosetests, this
change allows the convenience of using unittest, for example when it
is the default test runner in a development environment such as
PyCharm.

This change also makes it unnecessary to explicitly call the setup()
and teardown() functions when executing the module directly.

[1] http://nose.readthedocs.io/en/latest/writing_tests.html#test-modules
[2] https://docs.python.org/2/library/unittest.html#setupmodule-and-teardownmodule

Change-Id: Ib2e5470a339af1f937b25d643b64356e8848ed36
2017-05-04 12:47:17 +01:00
Jenkins
dd3bc8fe61 Merge "Move EC-specific unit test to EC Test class" 2017-05-03 23:20:36 +00:00
Jenkins
33a422af9f Merge "Fix (un)patch_policies" 2017-05-03 21:11:39 +00:00
Matthew Oliver
a07f7dc8c0 Fix sporadic failure in TestAccountController unit test
The proxy server on occasion has error limited a node by the time the
test runs, causing the proxie's node_iter failing to iter out this
error limited  node. As the test uses a default FakeRing with no
extra handoffs, on this occasion we only get 2 requests which is not
enough for quorum, causing it to return a 503.

This patch sets the error_suppression_interval to 0 when creating
the proxy server. Meaning a node effectively isn't error_limited.

Change-Id: I96cf4c4d63594f803cc1cd57e874d1624db8e249
Closes-Bug: #1682026
2017-04-27 01:03:29 +00:00
Alistair Coles
c740447de5 Move EC-specific unit test to EC Test class
The refactoring in the Related-Change separated EC
specific object controller tests into EC specific TestCase
classes, but left two EC specific tests in the Replication
object controller test class. This patch moves them to the
appropriate test class.

Previously the tests were only executed once, now they are
executed in each of two subclasses using different EC
policies. As a result it was necessary to make the test
container name unique to the policy under test.

Related-Change: Ifd3d0fa66773e640bb61cc528f7a1b2358e97d91

Change-Id: Ie712ea91b5dd74c504a0dd6aa40c3d657277108c
2017-04-19 10:39:43 +01:00
Kota Tsuyuzaki
381640cf90 Fix (un)patch_policies
Due to the refactoring of TestObjectController (related-change),
all of BaseTestECObjectController test methods are not being
needed to be unpatched because they are expected to run for test
setup-ed policies.

This patch works for items as follows:

- Move part of setUp/tearDown routines at BaseTestObjectController
  needed by only TestReplicatedObjectController which affects
  patch_policies

- Remove all unpatch_policies from BaseTestECObjectController

- Set up self.ec_policy to avoid to set policy index and
  retrieve the policy for each test method.

The reason why I didn't squash this up to the related parent patch is
to clarify what was changed at those patches. The parent is for just
clustering the tests for each test class and this one attempts to
improve.

Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
Change-Id: I25a3f8fc837706d78dca226fe282d9e5ead65a0d
2017-04-18 23:30:39 -07:00
Tim Burke
3ad8773239 Version DLOs, just like every other type of object
Previously, requests involving DLOs would bypass versioned_writes:

 * Any existing DLOs wouldn't get copied to the archive container during
   overwrites (or deletes, with history-mode), so there would be no
   evidence they had ever existed.

 * Any new DLOs wouldn't copy overwritten objects to the archive
   container, potentially leading to data loss.

Now, DLOs will behave like every other type of object under
versioned_writes.

Change-Id: I488e13eead2f33dd272d03f6f898adc52fc7fdad
Related-Change: Ie899290b3312e201979eafefb253d1a60b65b837
Related-Change: Ib5b29a19e1d577026deb50fc9d26064a8da81cd7
Closes-Bug: #1626989
2017-03-27 17:15:13 +00:00
Kota Tsuyuzaki
8fe4bfefaa TestObjectController refactoring
From the related change of ECDuplication, Swift have a couple of Test
classes for EC policy, normal EC and EC Duplication, in the
test/unit/proxy/test_server.py. To enable the classes, the related change
abstracts the EC test cases as the ECTestMixin class to gather test
methods into one place but it was worse because TestObjectController did
still have both test cases for replication and for ec that may be hard
to understand the test class structure.

Hence, this patch attempts to refactor the structure as

From:

     ECTestMixin
            |
    -------------------------------------
    |                                   |
TestObjectController           TestObjectControllerECDuplication
(for replication and EC)       (for EC Duplication Policy)

To:

    BaseTestObjectController
            |
    --------------------------------------
    |                                    |
TestReplicatedObjectController  BaseTestECObjectController
(for replication)                        |
                          ---------------------------------
                          |                               |
                TestECObjectController    TestECDuplicationObjectController
                (for EC policy)           (for EC Duplication Policy)

Some more cleanups are in follow up patches because this patch shows a lot
of moving code chunks which could be hard to compare the diff. To make
the review easy, this patch forcus on ONLY the structure changes as
possible.

Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
Related-Change: I25a3f8fc837706d78dca226fe282d9e5ead65a0d
Change-Id: Ifd3d0fa66773e640bb61cc528f7a1b2358e97d91
2017-03-22 19:54:50 +00:00
Jenkins
1e9b8888bf Merge "Enable cluster-wide CORS Expose-Headers setting" 2017-03-13 19:24:20 +00:00
Jenkins
cf1c44dff0 Merge "Fixups for EC frag duplication tests" 2017-03-03 23:08:34 +00:00
Jenkins
1f36b5dd16 Merge "EC Fragment Duplication - Foundational Global EC Cluster Support" 2017-02-26 06:26:08 +00:00
Alistair Coles
e4972f5ac7 Fixups for EC frag duplication tests
Follow up for related change:
- fix typos
- use common helper methods
- refactor some tests to reduce duplicate code

Related-Change: Idd155401982a2c48110c30b480966a863f6bd305

Change-Id: I2f91a2f31e4c1b11f3d685fa8166c1a25eb87429
2017-02-25 20:40:04 -08:00
Romain LE DISEZ
9b47de3095 Enable cluster-wide CORS Expose-Headers setting
An operator proposing a web UX to its customers might want to allow web
browser to access some headers by default (eg: X-Storage-Policy,
 X-Container-Read, ...). This commit adds a new setting to the
proxy-server to allow some headers to be added cluster-wide to the CORS
header Access-Control-Expose-Headers.

Change-Id: I5ca90a052f27c98a514a96ee2299bfa1b6d46334
2017-02-25 19:00:28 +01:00
Jenkins
075c21a944 Merge "Add Vary: headers for CORS responses" 2017-02-23 01:45:29 +00:00
Kota Tsuyuzaki
40ba7f6172 EC Fragment Duplication - Foundational Global EC Cluster Support
This patch enables efficent PUT/GET for global distributed cluster[1].

Problem:
Erasure coding has the capability to decrease the amout of actual stored
data less then replicated model. For example, ec_k=6, ec_m=3 parameter
can be 1.5x of the original data which is smaller than 3x replicated.
However, unlike replication, erasure coding requires availability of at
least some ec_k fragments of the total ec_k + ec_m fragments to service
read (e.g. 6 of 9 in the case above). As such, if we stored the
EC object into a swift cluster on 2 geographically distributed data
centers which have the same volume of disks, it is likely the fragments
will be stored evenly (about 4 and 5) so we still need to access a
faraway data center to decode the original object. In addition, if one
of the data centers was lost in a disaster, the stored objects will be
lost forever, and we have to cry a lot. To ensure highly durable
storage, you would think of making *more* parity fragments (e.g.
ec_k=6, ec_m=10), unfortunately this causes *significant* performance
degradation due to the cost of mathmetical caluculation for erasure
coding encode/decode.

How this resolves the problem:
EC Fragment Duplication extends on the initial solution to add *more*
fragments from which to rebuild an object similar to the solution
described above. The difference is making *copies* of encoded fragments.
With experimental results[1][2], employing small ec_k and ec_m shows
enough performance to store/retrieve objects.

On PUT:

- Encode incomming object with small ec_k and ec_m  <- faster!
- Make duplicated copies of the encoded fragments. The # of copies
  are determined by 'ec_duplication_factor' in swift.conf
- Store all fragments in Swift Global EC Cluster

The duplicated fragments increase pressure on existing requirements
when decoding objects in service to a read request.  All fragments are
stored with their X-Object-Sysmeta-Ec-Frag-Index.  In this change, the
X-Object-Sysmeta-Ec-Frag-Index represents the actual fragment index
encoded by PyECLib, there *will* be duplicates.  Anytime we must decode
the original object data, we must only consider the ec_k fragments as
unique according to their X-Object-Sysmeta-Ec-Frag-Index.  On decode no
duplicate X-Object-Sysmeta-Ec-Frag-Index may be used when decoding an
object, duplicate X-Object-Sysmeta-Ec-Frag-Index should be expected and
avoided if possible.

On GET:

This patch inclues following changes:
- Change GET Path to sort primary nodes grouping as subsets, so that
  each subset will includes unique fragments
- Change Reconstructor to be more aware of possibly duplicate fragments

For example, with this change, a policy could be configured such that

swift.conf:
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_duplication_factor = 2
(object ring must have 6 replicas)

At Object-Server:
node index (from object ring):  0 1 2 3 4 5 <- keep node index for
                                               reconstruct decision
X-Object-Sysmeta-Ec-Frag-Index: 0 1 2 0 1 2 <- each object keeps actual
                                               fragment index for
                                               backend (PyEClib)

Additional improvements to Global EC Cluster Support will require
features such as Composite Rings, and more efficient fragment
rebalance/reconstruction.

1: http://goo.gl/IYiNPk (Swift Design Spec Repository)
2: http://goo.gl/frgj6w (Slide Share for OpenStack Summit Tokyo)

Doc-Impact

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Idd155401982a2c48110c30b480966a863f6bd305
2017-02-22 10:56:13 -08:00
Tim Burke
e8a80e874a Let users know entity size in 416 responses
If a user sends a Range header with no satisfiable ranges, we send back
a 416 Requested Range Not Satisfiable response. Previously however,
there would be no indication of the size of the object they were
requesting, so they wouldn't know how to craft a satisfiable range. We
*do* send a Content-Length, but it is (correctly) the length of the
error message.

The RFC [1] has an answer for this:

>  A server generating a 416 (Range Not Satisfiable) response to a
>  byte-range request SHOULD send a Content-Range header field with an
>  unsatisfied-range value, as in the following example:
>
>    Content-Range: bytes */1234
>
>  The complete-length in a 416 response indicates the current length of
>  the selected representation.

Now, we'll send a Content-Range header for all 416 responses, including
those coming from the object server as well as those generated on a
proxy because of the Range mangling required to support EC policies.

[1] RFC 7233, section 4.2, although similar language was used in RFC
2616, sections 10.4.17 and 14.16

Change-Id: I80c7390fc6f84a10a212b0641bb07a64dfccbd45
2016-11-30 10:52:08 -08:00
Tim Burke
e8a5448b07 Add X-Openstack-Request-Id to Access-Control-Expose-Headers
Change-Id: Ib95a693042f0b3cf204033eb5957660cb3573dcf
Related-Change: I56cd4738808b99c0a08463f83c100be51a62db05
2016-11-16 12:39:12 -08:00
Alistair Coles
b13b49a27c EC - eliminate .durable files
Instead of using a separate .durable file to indicate
the durable status of a .data file, rename the .data
to include a durable marker in the filename. This saves
one inode for every EC fragment archive.

An EC policy PUT will, as before, first rename a temp
file to:

   <timestamp>#<frag_index>.data

but now, when the object is committed, that file will be
renamed:

   <timestamp>#<frag_index>#d.data

with the '#d' suffix marking the data file as durable.

Diskfile suffix hashing returns the same result when the
new durable-data filename or the legacy durable file is
found in an object directory. A fragment archive that has
been created on an upgraded object server will therefore
appear to be in the same state, as far as the consistency
engine is concerned, as the same fragment archive created
on an older object server.

Since legacy .durable files will still exist in deployed
clusters, many of the unit tests scenarios have been
duplicated for both new durable-data filenames and legacy
durable files.

Change-Id: I6f1f62d47be0b0ac7919888c77480a636f11f607
2016-10-10 18:11:02 +01:00
Jenkins
8526d4c5d2 Merge "Fix using filter() to meet python2,3" 2016-09-28 22:25:57 +00:00
Clay Gerrard
bfaa8e0583 Fix ChunkWriteError when running unittests
I don't think this is a real bug - just that the mocked iter wasn't
closing it subiters like the real iter does.

Change-Id: I44c8159f9eea8737bc86b6c7eb59a512e57e86c1
2016-09-21 17:33:30 -07:00
Luong Anh Tuan
19a684dded Fix using filter() to meet python2,3
As mentioned in link[1], if we need filter on python3,
Raplace filter(lambda obj: test(obj), data) with:
[obj for obj in data if test(obj)].

[1] https://wiki.openstack.org/wiki/Python3

Change-Id: Ia1ea2ec89e4beb957a4cb358b0d0cef970f23e0a
2016-09-22 07:32:38 +07:00
Alistair Coles
44a861787a Enable object server to return non-durable data
This patch improves EC GET response handling:

- The proxy no longer requires all object servers to have a
  durable file for the fragment archive that they return in
  response to a GET. The proxy will now be satisfied if just
  one object server has a durable file at the same timestamp
  as fragments from other object servers.

  This means that the proxy can now successfully GET an
  object that had missing durable files when it was PUT.

- The proxy will now ensure that it has a quorum of *unique*
  fragment indexes from object servers before considering a
  GET to be successful.

- The proxy is now able to fetch multiple fragment archives
  having different indexes from the same node. This enables
  the proxy to successfully GET an object that has some
  fragments that have landed on the same node, for example
  after a rebalance.

This new behavior is facilitated by an exchange of new
headers on a GET request and response between the proxy and
object servers.

An object server now includes with a GET (or HEAD) response:

- X-Backend-Fragments: the value of this describes all
  fragment archive indexes that the server has for the
  object by encoding a map of the form: timestamp -> <list
  of fragment indexes>

- X-Backend-Durable-Timestamp: the value of this is the
  internal form of the timestamp of the newest durable file
  that was found, if any.

- X-Backend-Data-Timestamp: the value of this is the
  internal form of the timestamp of the data file that was
  used to construct the diskfile.

A proxy server now includes with a GET request:

- X-Backend-Fragment-Preferences: the value of this
  describes the proxy's current preference with respect to
  those fragments that it would have object servers
  return. It encodes a list of timestamp, and for each
  timestamp a list of fragment indexes that the proxy does
  NOT require (because it already has them).

  The presence of a X-Backend-Fragment-Preferences header
  (even one with an empty list as its value) will cause the
  object server to search for the most appropriate fragment
  to return, disregarding the existence or not of any
  durable file. The object server assumes that the proxy
  knows best.

Closes-Bug: 1469094
Closes-Bug: 1484598

Change-Id: I2310981fd1c4622ff5d1a739cbcc59637ffe3fc3
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
2016-09-16 11:40:14 +01:00