Unlike DNSException, NXDOMAIN and NoAnswer are not errors. Thus, they
should be cached so that a misconfiguration from a cname_lookup user does
not generate too many requests on the nameservers.
This patch caches NXDOMAIN and NoAnswer responses for 60 seconds.
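A minimal sketch of the idea (simplified; this is not the middleware's
actual code, and the cache structure is hypothetical):
    import time
    import dns.resolver

    NEGATIVE_CACHE_TTL = 60
    _negative_cache = {}  # hostname -> expiry time of the cached negative result

    def lookup_cname(host):
        if _negative_cache.get(host, 0) > time.time():
            # a recent NXDOMAIN/NoAnswer; don't hit the nameservers again
            return None
        try:
            answer = dns.resolver.query(host, 'CNAME')
            return str(answer[0].target).rstrip('.')
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            # not an error: remember the negative result for 60 seconds
            _negative_cache[host] = time.time() + NEGATIVE_CACHE_TTL
            return None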
Change-Id: I1d3002bceaf5f5bee364fea6afe52cbf2aeb5fd2
Lay out the foundation for documenting the features that will enable
Global EC.
The formatting of the sections in our existing EC docs didn't follow best
practices [1], and it caused some Sphinx build warnings.
1. http://www.sphinx-doc.org/en/stable/rest.html#sections
Change-Id: I2d164dafeb84629c75c3c2ff774329ee84270b7f
Since version 0.20, eventlet bundles a dnspython version (commit:
52b09becacd23f384cf69ae37d70c893c43e3b13). Since then, catching its
exceptions requires the full module path.
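Illustrative of the general idea only (not the actual cname_lookup code):
the exception class has to be referenced through the module so that it
matches the dnspython copy actually in use, e.g.
    import dns.resolver
    try:
        answer = dns.resolver.query(host, 'CNAME')
    except dns.resolver.NXDOMAIN:
        # catch via the full module path, rather than a class imported
        # from a different copy of dnspython, so it matches what is raised
        answer = None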
Closes-Bug: #1656891
Change-Id: Iac6bb974c1a5d084e450057cf5de1eec80ae21a1
This is a follow-up for https://review.openstack.org/#/c/436522/
I'd like to use the same assertion if it goes through the same code path.
Both the Exception and Timeout cases produce an exception log that starts
with "Trying to GET"; in the Timeout case, "Timeout" appears as an extra
word in the log. In addition, this adds assertions that the return value
from get_response is None for the error cases.
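Roughly the shape of the added assertions (illustrative only; the helper
and logger names here are placeholders, not the actual test code):
    # error case: get_response should return None and log "Trying to GET ..."
    resp = getter.get_response(node)
    self.assertIsNone(resp)
    error_lines = self.logger.get_lines_for_level('error')
    self.assertTrue(error_lines[0].startswith('Trying to GET'))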
Change-Id: Iba86b495a14c15fc6eca8bf8a7df7d110256b0af
Not so long ago, we changed our default port ranges from 60xx to 62xx
but we left the install guide using the old ranges.
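The per-service defaults in question moved roughly as follows (see the
related change for the authoritative values):
    account-server:   6002 -> 6202
    container-server: 6001 -> 6201
    object-server:    6000 -> 6200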
Closes-Bug: #1669389
Related-Change: Ie1c778b159792c8e259e2a54cb86051686ac9d18
Change-Id: Ie4fee05b5f7e0c0879a7b42973bca459f7c85408
There's no python3-devel rpm package, so jobs fail when trying to install
the packages in bindep.txt. Updated with the correct name:
python34-devel
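With that, the bindep.txt entry looks something like the following (the
platform selector shown is indicative; the exact profile flags may differ):
    python34-devel [platform:rpm]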
Change-Id: I14f07f92d2a622f89062fc09969ca7087920b6cc
Signed-off-by: Thiago da Silva <thiago@redhat.com>
Some of the test coverage was omitted in the related change and some was
missing. This change fixes that.
Change-Id: I403b493bd8e59f6bcb586b4263a8e8c267728505
Related-Change-Id: I69e4c4baee64fd2192cbf5836b0803db1cc71705
Previously, Swift3 used client-facing HTTP headers to pass the S3 access
key, signature, and normalized request through the WSGI pipeline.
However, tempauth did not validate that Swift3 actually set the headers;
as a result, an attacker who has captured either a single valid S3-style
temporary URL or a single valid request through the S3 API may impersonate
the user that signed the URL or issued the request indefinitely through
the Swift API.
Now, the S3 authentication information will be taken from a separate
namespace in the WSGI environment, completely inaccessible to the
client. Specifically,
    environ['swift3.auth_details'] = {
        'access_key': <access key>,
        'signature': <signature>,
        'string_to_sign': <normalized request>,
    }
Note that tempauth is not expected to be in production use, but may have
been used as a template by other authentication middlewares to add their
own Swift3 support.
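A sketch of how such a middleware might consume the new namespace
(illustrative only; the helper names and the AWS-v2-style signature check
below are assumptions, not tempauth's actual code):
    import base64
    import hmac
    from hashlib import sha1

    def s3_user_from_env(env, secret_for_access_key):
        auth_details = env.get('swift3.auth_details')
        if not auth_details:
            return None  # the request did not pass through swift3
        secret = secret_for_access_key(auth_details['access_key'])
        # byte/str handling elided for brevity
        expected = base64.b64encode(
            hmac.new(secret, auth_details['string_to_sign'], sha1).digest())
        # validate server-side; never trust values a client could have
        # placed in its own HTTP headers
        if not hmac.compare_digest(expected, auth_details['signature']):
            return None
        return auth_details['access_key']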
Change-Id: Ib90adcc2f059adaf203fba1c95b2154561ea7487
Related-Change: Ia3fbb4938f0daa8845cba4137a01cc43bc1a713c
Follow-up for the related change:
- fix typos
- use common helper methods
- refactor some tests to reduce duplicate code
Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
Change-Id: I2f91a2f31e4c1b11f3d685fa8166c1a25eb87429
The requests library checks that header values are either strings or
bytes. Currently, the two test_object_expirer tests fail with the message:
    InvalidHeader: Header value 1487879553 must be of type str or bytes,
    not <type 'int'>
The header in question is "x-delete-at". The patch converts it to a string
before making a Swift client request.
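The fix amounts to something like the following (simplified; the actual
call site in the test differs):
    import time
    from swiftclient import client

    headers = {'x-delete-at': str(int(time.time() + 10))}  # str, not int
    client.put_object(self.url, self.token, container, obj,
                      contents='expiring body', headers=headers)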
Change-Id: I738697cb6b696f0e346345f75e0069048961f2ff
The patch adds documentation for the SLO raw format, specifically how its
fields relate to the documented manifest format (hash vs etag, bytes vs
size_bytes, and name vs path).
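As a quick illustration of that mapping (the segment values here are made
up), the same segment entry in the two formats:
    raw format (?multipart-manifest=get&format=raw):
        {"name": "/segs/seg01", "hash": "d41d8cd9...", "bytes": 1048576}
    manifest format used on PUT:
        {"path": "/segs/seg01", "etag": "d41d8cd9...", "size_bytes": 1048576}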
Change-Id: I44c74ad406a6e55e4228f52fac623eeabbd7564f
Commit bfbf0d1e78ff47937adb06bce648a0b915c838d1 removed a check that was
meant to avoid resolving a storage domain. It breaks the behavior of the
middleware, as the resolution of a storage domain will return nothing, so
the global resolution will fail.
Example:
    Host header:    storage.example.com
    storage_domain: [.storage.example.com]
The Host does not end with one of the storage_domains (because of the
leading dot), so the middleware loops trying to resolve the CNAME of
storage.example.com, but it never succeeds because that host is itself a
storage_domain.
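A minimal sketch of the restored guard (simplified names; not the actual
middleware code):
    storage_domains = ['.storage.example.com']

    def is_storage_domain(host):
        for domain in storage_domains:
            # a leading dot means "any subdomain of ..."; the bare domain
            # itself must also be treated as a storage domain so we never
            # try to CNAME-resolve it
            if host.endswith(domain) or host == domain.lstrip('.'):
                return True
        return False

    # is_storage_domain('storage.example.com') -> True: skip CNAME resolution
    # is_storage_domain('cdn.example.org')     -> False: resolve the CNAME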
Closes-Bug: #1311435
Change-Id: If594b816ff2f7025521de716b32c42bf3137f5dd
This patch enables efficient PUT/GET for global distributed clusters [1].
Problem:
Erasure coding has the capability to decrease the amount of actually
stored data compared to the replicated model. For example, with ec_k=6,
ec_m=3, the stored data can be 1.5x the original size, which is smaller
than the 3x of replication.
However, unlike replication, erasure coding requires the availability of
at least ec_k fragments of the total ec_k + ec_m fragments to service a
read (e.g. 6 of 9 in the case above). As such, if we store an EC object in
a Swift cluster spanning 2 geographically distributed data centers which
have the same volume of disks, the fragments will likely be spread evenly
(about 4 and 5), so we still need to access a faraway data center to
decode the original object. In addition, if one of the data centers were
lost in a disaster, the stored objects would be lost forever, and we would
have to cry a lot. To ensure highly durable storage, you might think of
making *more* parity fragments (e.g. ec_k=6, ec_m=10); unfortunately this
causes *significant* performance degradation due to the cost of the
mathematical calculations for erasure coding encode/decode.
How this resolves the problem:
EC Fragment Duplication extends the initial solution to add *more*
fragments from which to rebuild an object, similar to the solution
described above. The difference is that it makes *copies* of the encoded
fragments. Experimental results [1][2] show that employing small ec_k and
ec_m provides enough performance to store/retrieve objects.
On PUT:
- Encode the incoming object with small ec_k and ec_m  <- faster!
- Make duplicated copies of the encoded fragments. The # of copies is
  determined by 'ec_duplication_factor' in swift.conf.
- Store all fragments in the Swift Global EC Cluster
The duplicated fragments increase pressure on existing requirements when
decoding objects in service of a read request. All fragments are stored
with their X-Object-Sysmeta-Ec-Frag-Index. In this change, the
X-Object-Sysmeta-Ec-Frag-Index represents the actual fragment index
encoded by PyECLib, so there *will* be duplicates. Any time we must decode
the original object data, we must consider only ec_k fragments that are
unique according to their X-Object-Sysmeta-Ec-Frag-Index. On decode, no
duplicate X-Object-Sysmeta-Ec-Frag-Index may be used; duplicates should be
expected and avoided if possible.
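A sketch of the uniqueness constraint on decode (illustrative only, not
the proxy's actual code):
    def pick_fragments_for_decode(responses, ec_k):
        # keep at most one response per X-Object-Sysmeta-Ec-Frag-Index;
        # duplicates add durability but cannot help PyECLib decode
        seen, chosen = set(), []
        for resp in responses:
            frag_index = resp.headers['X-Object-Sysmeta-Ec-Frag-Index']
            if frag_index in seen:
                continue
            seen.add(frag_index)
            chosen.append(resp)
            if len(chosen) == ec_k:
                break
        return chosen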
On GET:
This patch includes the following changes:
- Change the GET path to sort primary nodes into grouped subsets, so that
  each subset includes unique fragments
- Change the reconstructor to be more aware of possibly duplicate
  fragments
For example, with this change, a policy could be configured in swift.conf
as:
    ec_num_data_fragments = 2
    ec_num_parity_fragments = 1
    ec_duplication_factor = 2
(the object ring must have 6 replicas)
At the object-server:
    node index (from object ring):  0 1 2 3 4 5  <- keep node index for
                                                    reconstruct decisions
    X-Object-Sysmeta-Ec-Frag-Index: 0 1 2 0 1 2  <- each object keeps the
                                                    actual fragment index
                                                    for the backend (PyECLib)
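Equivalently, as a minimal sketch (the helper below is hypothetical, not
the actual ring/reconstructor code), the fragment index stored on a
primary node is its node index modulo the number of unique fragments,
which is also what lets the GET path group primaries into subsets of
unique fragments:
    def frag_index_for_node(node_index, ec_k, ec_m):
        # ec_k + ec_m unique fragments; the ring holds
        # (ec_k + ec_m) * ec_duplication_factor replicas in total
        return node_index % (ec_k + ec_m)
    # matches the table above:
    # [frag_index_for_node(i, ec_k=2, ec_m=1) for i in range(6)] == [0, 1, 2, 0, 1, 2]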
Additional improvements to Global EC Cluster Support will require
features such as Composite Rings, and more efficient fragment
rebalance/reconstruction.
1: http://goo.gl/IYiNPk (Swift Design Spec Repository)
2: http://goo.gl/frgj6w (Slide Share for OpenStack Summit Tokyo)
DocImpact
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Idd155401982a2c48110c30b480966a863f6bd305
Now that we're shuffling parts before going through them, the per-device
progress stats no longer make sense -- device completion would always be
100%.
Also, always use delete_partition for cleanup, so we have one place to
make improvements. This means we'll properly clean up non-numeric
directories.
Also also, put more I/O in the tpool in delete_partition.
Change-Id: Ie06bb16c130d46ccf887c8fcb252b8d018072d68
Related-Change: I69e4c4baee64fd2192cbf5836b0803db1cc71705
This is a follow-up for https://review.openstack.org/#/c/425493
This patch includes:
- Add more tests on the configuration with handoffs_first and
  handoffs_only
- Remove an unnecessary space in a warning log line (2 places)
- Change the test conf from True/False to "True"/"False" (strings),
  because in the conf dict those values should be strings
Co-Authored-By: Janie Richling <jrichli@us.ibm.com>
Change-Id: Ida90c32d16481a15fa68c9fdb380932526c366f6
The handoffs_first mode in the replicator has the useful behavior of
processing all handoff parts across all disks until there aren't any
handoffs anymore on the node [1], and then it seemingly tries to drop
back into normal operation. In practice I've only ever heard of
handoffs_first being used while rebalancing and turned off as soon as the
rebalance finishes - it's not recommended to run with handoffs_first mode
turned on, and it emits a warning on startup if the option is enabled.
The handoffs_first mode on the reconstructor doesn't work - it was
prioritizing handoffs *per-part* [2] - which is really unfortunate because
in the reconstructor during a rebalance it's often *much* more attractive
from a disk/network efficiency perspective to revert a partition from a
handoff than it is to rebuild an entire partition from another primary
using the other EC fragments in the cluster.
This change deprecates handoffs_first in favor of handoffs_only in the
reconstructor, which is far more useful - and, just like handoffs_first
mode in the replicator, it gives the operator the option of forcing the
consistency engine to focus on rebalance. The handoffs_only behavior is
somewhat consistent with the replicator's handoffs_first option (any
error on any handoff in the replicator will make it essentially
handoffs-only forever), but the option does what you want and is named
correctly in the reconstructor.
For consistency with the replicator, the reconstructor will mostly honor
the handoffs_first option, but if you set handoffs_only in the config it
always takes precedence. Having handoffs_first in your config always
results in a warning, but if handoffs_only is not set and handoffs_first
is true, the reconstructor will assume you want handoffs_only and behave
as such.
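For example, during a rebalance an operator would set something like the
following in the [object-reconstructor] section (a sketch of the intended
usage; see the example config for the authoritative documentation):
    [object-reconstructor]
    # temporary! revert handoffs only; disable again as soon as the
    # rebalance finishes and handoffs are drained
    handoffs_only = True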
When running in handoffs_only mode, the reconstructor will start to log a
warning every cycle if you leave it running in handoffs_only after it
finishes reverting handoffs. However, you should be monitoring on-disk
partitions and disable the option as soon as the cluster finishes the
full rebalance cycle.
1. Ia324728d42c606e2f9e7d29b4ab5fcbff6e47aea fixed the replicator's
handoffs_first "mode"
2. Unlike replication, each partition in an EC policy can have a different
kind of job per frag_index, but the cardinality of jobs is typically only
one (either sync or revert) unless there's been a bunch of errors during
writes, in which case handoff partitions may hold a number of different
fragments.
Known-Issues:
handoffs_only is not documented outside of the example config, see lp
bug #1626290
Closes-Bug: #1653018
Change-Id: Idde4b6cf92fab6c45f2c0c2733277701eb436898