271 Commits

Author SHA1 Message Date
Cao Xuan Hoang
3da144a3af Replace 'assertTrue(a not in b)' with 'assertNotIn(a, b)'
trivialfix

Change-Id: I416831c8ad92f8445bc8d9560040a5ebf5c90702
2016-12-12 16:23:09 +07:00
Ondřej Nový
9847796f01 Set owner of drive-audit recon cache to swift user
Fixes this problem:
* swift-drive-audit needs to be run by root, because only root has
  "umount" permission
* swift-object servers typically run as user swift
* if swift-drive-audit is run by root, /var/cache/swift/drive.recon is
  owned by root, with 0o600
* recon middleware (inside swift-object-server) can't read this cache
  file: swift-object: Error reading recon cache file

This patch adds a "user" option to the drive-audit config file. The recon
cache is chowned to this user.

Change-Id: Ibf20543ee690b7c5a37fabd1540fd5c0c7b638c9
2016-10-19 17:16:42 +00:00
Jenkins
32bc272634 Merge "Fix when we set state in Spliterator" 2016-10-03 23:46:22 +00:00
Tim Burke
2ec4189e37 Fix when we set state in Spliterator
Also clean up a comment and some exception text

Change-Id: I1e7755cc0468f9a3ba96a0dd24868f09a10c3df0
Related-Change: I24716e3271cf3370642e3755447e717fd7d9957c
2016-10-03 14:27:47 -07:00
Jenkins
1e5c5c35bd Merge "Support multi-range GETs for static large objects." 2016-09-28 04:48:34 +00:00
Alistair Coles
44a861787a Enable object server to return non-durable data
This patch improves EC GET response handling:

- The proxy no longer requires all object servers to have a
  durable file for the fragment archive that they return in
  response to a GET. The proxy will now be satisfied if just
  one object server has a durable file at the same timestamp
  as fragments from other object servers.

  This means that the proxy can now successfully GET an
  object that had missing durable files when it was PUT.

- The proxy will now ensure that it has a quorum of *unique*
  fragment indexes from object servers before considering a
  GET to be successful.

- The proxy is now able to fetch multiple fragment archives
  having different indexes from the same node. This enables
  the proxy to successfully GET an object that has some
  fragments that have landed on the same node, for example
  after a rebalance.

This new behavior is facilitated by an exchange of new
headers on a GET request and response between the proxy and
object servers.

An object server now includes with a GET (or HEAD) response:

- X-Backend-Fragments: the value of this describes all
  fragment archive indexes that the server has for the
  object by encoding a map of the form: timestamp -> <list
  of fragment indexes>

- X-Backend-Durable-Timestamp: the value of this is the
  internal form of the timestamp of the newest durable file
  that was found, if any.

- X-Backend-Data-Timestamp: the value of this is the
  internal form of the timestamp of the data file that was
  used to construct the diskfile.

A proxy server now includes with a GET request:

- X-Backend-Fragment-Preferences: the value of this
  describes the proxy's current preference with respect to
  those fragments that it would have object servers
  return. It encodes a list of timestamps, and for each
  timestamp a list of fragment indexes that the proxy does
  NOT require (because it already has them).

  The presence of a X-Backend-Fragment-Preferences header
  (even one with an empty list as its value) will cause the
  object server to search for the most appropriate fragment
  to return, disregarding the existence or not of any
  durable file. The object server assumes that the proxy
  knows best.

Closes-Bug: 1469094
Closes-Bug: 1484598

Change-Id: I2310981fd1c4622ff5d1a739cbcc59637ffe3fc3
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
2016-09-16 11:40:14 +01:00
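The "quorum of *unique* fragment indexes" rule from commit 44a861787a above can be illustrated with a minimal sketch. The function name, the `(timestamp, fragment_index)` tuple shape and the `ndata` parameter are assumptions made for this example; this is not the proxy's actual code.

```python
from collections import defaultdict


def has_unique_fragment_quorum(responses, ndata):
    """Return True if some timestamp has at least ndata distinct indexes.

    responses: iterable of (timestamp, fragment_index) pairs, one per
    fragment archive returned by an object server (hypothetical shape).
    ndata: number of distinct fragments needed to decode the object.
    """
    indexes_by_timestamp = defaultdict(set)
    for timestamp, frag_index in responses:
        # A set discards duplicate indexes returned by different nodes.
        indexes_by_timestamp[timestamp].add(frag_index)
    return any(len(found) >= ndata
               for found in indexes_by_timestamp.values())
```

Counting distinct indexes per timestamp, rather than counting responses, is what keeps two copies of the same fragment from being mistaken for a quorum.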
Jenkins
8608bd96dd Merge "Make object creation more atomic in Linux" 2016-09-13 04:02:47 +00:00
Jenkins
dd30b9ef98 Merge "Close the iterators in string_along." 2016-08-31 00:15:33 +00:00
Timur Alperovich
66c905e294 Close the iterators in string_along.
Make sure to close the underlying iterator in string_along. What is
currently happening when using the InternalClient is that "Client
disconnected" warnings are generated and resources are tied up until
GC runs.

Change-Id: If1f6c0c756aee95f53f99371439533a97d347eab
2016-08-30 14:28:08 -07:00
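The idea behind commit 66c905e294 above (close the wrapped iterator explicitly instead of waiting for GC) can be sketched as follows. `chain_and_close` is a hypothetical helper written for this illustration; it is not the real `string_along` and does not share its signature.

```python
def chain_and_close(first, rest):
    """Yield `first`, then every item of `rest`, then close `rest`."""
    try:
        yield first
        for item in rest:
            yield item
    finally:
        # Runs on normal exhaustion and also when the consumer calls
        # .close() on this generator, so cleanup is not left to GC.
        close = getattr(rest, 'close', None)
        if close is not None:
            close()
```

The `try`/`finally` inside the generator is the relevant design choice: a consumer that stops early only has to close the outer generator for the inner one to be released.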
Prashanth Pai
773edb4a5d Make object creation more atomic in Linux
Linux 3.11 introduced O_TMPFILE as a flag to open() sys call. This would
enable users to get a fd to an unnamed temporary file. As it's unnamed,
it does not require the caller to devise unique names. It is also not
accessible through any path. Hence, file creation is race-free.

This file is initially unreachable. It is then populated with data(write),
metadata(fsetxattr) and fsync'd before being atomically linked into the
filesystem in a fully formed state using linkat() sys call. Only after a
successful linkat() will the object file be available for reference.

Caveats
* Unlike os.rename(), linkat() cannot overwrite destination path if it
  already exists. If path exists, we unlink and try again.
* XFS support for O_TMPFILE was only added in Linux 3.15.
* If client disconnects during object upload, although there is no
  incomplete/stale file on disk, the object directory would persist
  and is not cleaned up immediately.

Change-Id: I8402439fab3aba5d7af449b5e465f89332f606ec
Signed-off-by: Prashanth Pai <ppai@redhat.com>
2016-08-24 14:56:00 +05:30
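The first caveat of commit 773edb4a5d above (a link cannot overwrite an existing destination, so unlink and try again) can be illustrated with a small sketch. It deliberately uses an ordinary named source file and `os.link()` rather than O_TMPFILE and `linkat()`, and `link_replacing` is a hypothetical name, so it shows only the retry step and not the patch itself.

```python
import os


def link_replacing(src, dst):
    """Hard-link src to dst, unlinking an existing dst and retrying once."""
    try:
        os.link(src, dst)
    except FileExistsError:
        # Unlike os.rename(), creating a link refuses to overwrite.
        os.unlink(dst)
        os.link(src, dst)
```

Note that the unlink and the second link are two separate steps, so another writer could recreate `dst` in between; a production version would need to decide how to handle that.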
Samuel Merritt
4bcd3d7f6d Support multi-range GETs for static large objects.
Bonus consistency: 416 responses now always have a body. Before, if
you had "swob.HTTPRequestedRangeNotSatisfiable()", you'd get a body,
but if you had "swob.Response(..., conditional_response=True)", then
you'd get a length-0 response body. Now you always get a response
body. It's just the default <html><h1>..., but at least it's always there.

Bonus efficiency: do a little caching of sub-SLO manifests to avoid
needless re-fetches. This kicks in when there are multiple references
to the same sub-SLO in a given manifest. The caching only holds 20
sub-SLOs so that a malicious user can't build a giant SLO tree and use
it to run the proxy out of memory (we're already holding up to 10
manifests in memory at a time since a SLO can include another SLO to a
depth of 10; this doesn't make the situation too much worse).

Change-Id: I24716e3271cf3370642e3755447e717fd7d9957c
2016-08-18 15:56:06 -07:00
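The bounded sub-SLO manifest cache described in commit 4bcd3d7f6d above might look roughly like this. The commit only states that the cache holds 20 entries; the least-recently-used eviction policy and the `BoundedCache` class are assumptions made for this sketch.

```python
from collections import OrderedDict


class BoundedCache(object):
    """Keep at most max_entries items, evicting the least recently used."""

    def __init__(self, max_entries=20):
        self.max_entries = max_entries
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # drop the oldest entry
```

The fixed upper bound is the point: memory use stays constant no matter how many distinct sub-manifests a manifest references.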
Peter Lisák
ed772236c7 Change schedule priority of daemon/server in config
The goal is to modify schedule priority and I/O scheduling class and
priority of daemon/server via configuration.
Setting is optional, default keeps current behaviour.

Use case:
Prioritize object-server over object-auditor, because all user requests
need to be served in peak hours and the audit can wait.

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
DocImpact
Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396
2016-08-10 23:56:15 +02:00
Jenkins
e9f5e7966a Merge "Moved ipv4 & ipv6 validations to the common utils" 2016-07-29 21:25:16 +00:00
Nandini Tata
5c9732ac8e Moved ipv4 & ipv6 validations to the common utils
Validating IP addresses in IPv4 and IPv6 formats has more generic
use cases outside of rings. swift-get-nodes and other utilities that
need to handle IPv6 addresses often require importing IP validation
methods from swift/common/ring/utils (see Related-Change). Also, the
expand_ipv6 method already exists in swift/common/utils. Hence, move
validation of IPs into swift/common/utils as well, out of
swift/common/ring/utils.

Related-Change: I6551d65241950c65e7160587cc414deb4a2122f5
Change-Id: I720a9586469cf55acab74b4b005907ce106b3da4
2016-07-28 12:08:06 -07:00
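For commit 5c9732ac8e above, this is a minimal sketch of what generic IPv4/IPv6 validation helpers can look like, using only the standard library. The function names and the use of `socket.inet_pton` are choices made for this example and are not taken from the patch.

```python
import socket


def _parses_as(family, ip):
    try:
        socket.inet_pton(family, ip)
    except (OSError, TypeError, ValueError):
        # OSError: not a valid literal; TypeError/ValueError: bad input type
        # or an embedded null character.
        return False
    return True


def is_valid_ipv4(ip):
    return _parses_as(socket.AF_INET, ip)


def is_valid_ipv6(ip):
    return _parses_as(socket.AF_INET6, ip)


def is_valid_ip(ip):
    return is_valid_ipv4(ip) or is_valid_ipv6(ip)
```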
Alistair Coles
928c4790eb Refactor tests and add tests
Relocates some test infrastructure in preparation for
use with encryption tests, in particular moves the test
server setup code from test/unit/proxy/test_server.py
to a new helpers.py so that it can be re-used, and adds
ability to specify additional config options for the
test servers (used in encryption tests).

Adds unit test coverage for extract_swift_bytes and functional
test coverage for container listings. Adds a check on the content
and metadata of reconciled objects in probe tests.

Change-Id: I9bfbf4e47cb0eb370e7a74d18c78d67b6b9d6645
2016-06-15 16:36:25 +01:00
Matthew Oliver
876df35f84 disable_fallocate also disables fallocate_reserve
Currently when disable_fallocate is true it disables calling the
fallocate syscall, but it doesn't disable fallocate_reserve. This
patch fixes this.

This problem has caused functional tests to fail in our SAIOs, since
SAIOs have disable_fallocate set but the fallocate_reserve free-space
check was still being run, creating 507 responses. This is due to
the fallocate_reserve default changing from 0 to 1%.

Because fallocate_reserve and disable_fallocate cause SAIO functional
tests to fail a section called 'Known Issues' has been added to the
SAIO developer documentation which includes a warning about
using fallocate_reserve on SAIOs.

Change-Id: I727bfb0861ea26fe2f16ad55f4d36ae088864d8f
2016-05-19 11:56:29 +10:00
Samuel Merritt
6834547f66 Clean up fallocate tests a little
Change-Id: I01f1ad8ef0f8910718fd2fb30c9e8285358baf84
2016-05-12 08:46:48 -07:00
Jenkins
a403faadd4 Merge "Allow fallocate_reserve to be a percentage" 2016-05-12 08:18:39 +00:00
Jenkins
f66898ae00 Merge "Remove ThreadPool class" 2016-05-11 01:37:40 +00:00
Jenkins
7429da0ddb Merge "SwiftLogFormatter will log transaction IDs on INFO level" 2016-05-10 01:53:46 +00:00
Christian Schwede
4c11833a9c Remove ThreadPool class
With the removal of threads_per_disk there is no longer a need to use
run_in_thread() at all; it was just calling the function itself when
running with 0 threads.
The same applies to force_run_in_thread(): with 0 threads it was
basically doing the same as tpool_reraise(), so the call is replaced
and the complete ThreadPool class is finally removed.

Note that this might break external consumers that are inheriting
BaseDiskFileManager; in this case you need to adapt your codebase to
this change.

Change-Id: I39489dd660935bdbfbc26b92af86814369369fb5
2016-04-29 13:27:56 -05:00
Bryan Keller
33fdd0a356 SwiftLogFormatter will log transaction IDs on INFO level
Previously SwiftLogFormatter would make two checks. One to see if the
transaction id was already in the message field and another check to
make sure the log level wasn't set to info. If either of these was
true, then it would not log the transaction ID in the transaction ID
field.

This commit removes the check for the info log. Now transaction IDs
will be recorded in all cases that have them.

Change-Id: Ic06538ab55a75d298169ae1745671573ee9c09e8
Closes-Bug: #1504344
2016-04-29 10:33:16 -05:00
Samuel Merritt
29544a9e17 Use smaller quorum size in proxy for even numbers of replicas
Requiring 2/2 backends for PUT requests means that the cluster can't
tolerate a single failure. Likewise, if you have 4 replicas in 2
regions, requiring 3/4 on a POST request means you cannot POST with
your inter-region link down or congested.

This changes the (replication) quorum size in the proxy to be at least
half the nodes instead of a majority of the nodes.

Daemons that were looking for a majority remain unchanged. The
container reconciler, replicator, and updater still require majorities
so their functioning is unchanged.

Odd numbers of replicas are unaffected by this commit.

Change-Id: I3b07ff0222aba6293ad7d60afe1747acafbe6ce4
2016-04-27 16:59:00 -05:00
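The arithmetic behind commit 29544a9e17 above can be written out as a short sketch. The two function names are hypothetical; the formulas follow directly from "a majority of the nodes" and "at least half the nodes".

```python
def majority_size(n):
    """Smallest count that is strictly more than half of n."""
    return n // 2 + 1


def at_least_half_size(n):
    """Smallest count that is at least half of n (n / 2 rounded up)."""
    return (n + 1) // 2
```

For 2 replicas the requirement drops from 2 to 1, and for 4 replicas from 3 to 2; for any odd `n` both functions return the same value, which matches the commit's note that odd replica counts are unaffected.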
Andy McCrae
0da9da5131 Allow fallocate_reserve to be a percentage
Add the ability to set the fallocate_reserve value as a percentage.
This happens automatically when adding the '%' at the end of the value.
Having the ability to set a % of free space rather than a byte value is
useful especially when drive sizes are heterogenous.

The default for fallocate_reserve has been adjusted to 1%, having the
fallocate_reserve set seems sensible for all deploys and percentages are
far safer to default than byte values (across drives of any size).

Tests added for using fallocate_reserve as a percentage.

Duplicate tests for fallocate_reserve have been removed.

Docs updated to reflect the fallocate_reserve change.

Change-Id: I4aea613a708205c917e81d6b2861396655e73238
2016-04-23 08:02:00 -05:00
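For commit 0da9da5131 above, this is a hedged sketch of how a reserve value with an optional trailing '%' might be parsed and checked. The helper names, the returned tuple shape and the strict "less than" comparison are assumptions for illustration, not Swift's implementation.

```python
def parse_reserve(value):
    """Return (amount, is_percent) for strings such as '1%' or '10000000'."""
    text = str(value).strip()
    if text.endswith('%'):
        percent = float(text[:-1])
        if not 0 <= percent <= 100:
            raise ValueError('percentage out of range: %r' % (value,))
        return percent, True
    return int(text), False


def reserve_violated(free_bytes, total_bytes, amount, is_percent):
    """True if free space is below the configured reserve (assumed rule)."""
    if is_percent:
        return free_bytes * 100.0 / total_bytes < amount
    return free_bytes < amount
```

With a percentage, the same setting scales with the drive: 1% of a 1 TB drive and 1% of an 8 TB drive reserve different byte counts without any per-drive configuration.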
John Dickinson
91f980314f fix fallocate_reserve traceback
Previously, fallocate_reserve could result in a traceback. The
OSError being raised didn't have the proper errno set. This patch
sets the errno to ENOSPC.

Change-Id: I017b0584972ca8832f3b160bbcdff335ae9a1aa6
2016-04-13 11:57:45 -05:00
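The fix in commit 91f980314f above hinges on how `OSError` is constructed. As a small illustration (the helper name is hypothetical), passing the error number as the first argument is what populates `errno` on the exception:

```python
import errno
import os


def raise_no_space(path):
    """Raise an OSError whose errno attribute is ENOSPC."""
    # OSError(errno, strerror, filename) fills in all three attributes;
    # OSError('some message') would leave errno set to None.
    raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC), path)
```

Callers that branch on `err.errno` can then recognise the out-of-space case instead of failing on an unexpected `None`.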
Samuel Merritt
95efd3f903 Fix infinite recursion during logging when syslog is down
Change-Id: Ia9ecffc88ce43616977e141498e5ee404f2c29c4
2016-04-06 15:45:20 -07:00
Clay Gerrard
1d03803a85 Auditor will clean up stale rsync tempfiles
DiskFile already fills in the _ondisk_info attribute when it tries to open
a diskfile - even if the DiskFile's fileset is not valid or deleted.
During this process the rsync tempfiles would be discovered and logged,
but no-one would attempt to clean them up - even if they were really old.

Instead of logging and ignoring unexpected files when validating a DiskFile
fileset we'll add unexpected files to the unexpected key in the
_ondisk_info attribute.

With a little bit of re-organization in the auditor's object_audit method
to get things into a single return path we can add an unconditional check
for unexpected files and remove those that are "old enough".

Since the replicator will kill any rsync processes that are running longer
than the configured rsync_timeout we know that any rsync tempfiles older
than this can be deleted.

Split unlink_older_than in common.utils into two functions to allow an
explicit list of previously discovered paths to be passed in to avoid an
extra listdir.  Since the getmtime handling already ignores OSError
there's less concern of race condition where a previous discovered
unexpected file is reaped by rsync while we're attempting to clean it up.

Update some doc on the new config option.

Closes-Bug: #1554005

Change-Id: Id67681cb77f605e3491b8afcb9c69d769e154283
2016-03-23 19:34:34 +00:00
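The split of unlink_older_than described in commit 1d03803a85 above can be sketched as two small functions: one that works on an explicit list of paths and one that lists a directory first. The names and signatures here are assumptions for this example rather than a copy of the code in common.utils.

```python
import os


def unlink_paths_older_than(paths, cutoff):
    """Remove each path whose mtime is older than cutoff (epoch seconds)."""
    for path in paths:
        try:
            if os.path.getmtime(path) < cutoff:
                os.unlink(path)
        except OSError:
            # The file may already be gone, for example because rsync
            # finished with it; that is fine for a cleanup pass.
            pass


def unlink_older_than(directory, cutoff):
    """Apply the same rule to every entry found in directory."""
    paths = [os.path.join(directory, name)
             for name in os.listdir(directory)]
    unlink_paths_older_than(paths, cutoff)
```

A caller that has already discovered the unexpected files can call the first function directly and skip the extra `listdir`.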
Matthew Oliver
f595a7e704 Add concurrent reads option to proxy
This change adds 2 new parameters to enable and control concurrent GETs
in swift, these are 'concurrent_gets' and 'concurrency_timeout'.

'concurrent_gets' allows you to turn concurrent GETs on or off. When
on, it will set the GET/HEAD concurrency to the replica count, and in
the case of EC HEADs it will set it to ndata.
The proxy will then serve only the first valid source to respond.
This applies to all account, container and object GETs except
for EC. For EC only HEAD requests are affected.

It achieves this by changing the request sending mechanism to using
GreenAsyncPile and green threads with a time out between each
request.

'concurrency_timeout' is related to concurrent_gets and is the
amount of time to wait before firing the next thread. A value of 0
will fire all threads at the same time (fully concurrent); any other
value will stagger the firing, giving a node a brief chance to
respond before the next thread is fired. This value is a float
and should be somewhere between 0 and node_timeout. The default is
conn_timeout, meaning that by default the firing is staggered.

DocImpact
Implements: blueprint concurrent-reads
Change-Id: I789d39472ec48b22415ff9d9821b1eefab7da867
2016-03-16 06:00:34 +00:00
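The staggered firing described in commit f595a7e704 above can be approximated with the standard library. The patch itself uses GreenAsyncPile and green threads; the sketch below swaps in `concurrent.futures` with OS threads, and `first_valid_response`, `fetch` and `is_valid` are hypothetical names introduced only for this example.

```python
import concurrent.futures as cf


def first_valid_response(sources, fetch, concurrency_timeout,
                         is_valid=lambda resp: resp is not None):
    """Return the first valid result of fetch(source), or None."""
    remaining = list(sources)
    pending = set()
    # Leaving the "with" block waits for straggler threads to finish.
    with cf.ThreadPoolExecutor(max_workers=max(1, len(remaining))) as pool:
        while remaining or pending:
            if remaining:
                pending.add(pool.submit(fetch, remaining.pop(0)))
            # Wait up to concurrency_timeout before firing the next request;
            # once every source has been tried, wait without a limit.
            timeout = concurrency_timeout if remaining else None
            done, pending = cf.wait(pending, timeout=timeout,
                                    return_when=cf.FIRST_COMPLETED)
            for future in done:
                if future.exception() is None and is_valid(future.result()):
                    return future.result()
    return None
```

With `concurrency_timeout=0` every request is submitted almost at once; with a larger value, a slow first source only delays the second request by that amount.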
Samuel Merritt
9430f4c9f5 Move HeaderKeyDict to avoid an inline import
There was a function in swift.common.utils that was importing
swob.HeaderKeyDict at call time. It couldn't import it at compilation
time since utils can't import from swob or else it blows up with a
circular import error.

This commit just moves HeaderKeyDict into swift.common.header_key_dict
so that we can remove the inline import.

Change-Id: I656fde8cc2e125327c26c589cf1045cb81ffc7e5
2016-03-07 12:26:48 -08:00
Alistair Coles
e91de49d68 Update container on fast-POST
This patch makes a number of changes to enable content-type
metadata to be updated when using the fast-POST mode of
operation, as proposed in the associated spec [1].

* the object server and diskfile are modified to allow
  content-type to be updated by a POST and the updated value
  to be stored in .meta files.

* the object server accepts PUTs and DELETEs with older
  timestamps than existing .meta files. This is to be
  consistent with replication that will leave a later .meta
  file in place when replicating a .data file.

* the diskfile interface is modified to provide accessor
  methods for the content-type and its timestamp.

* the naming of .meta files is modified to encode two
  timestamps when the .meta file contains a content-type value
  that was set prior to the latest metadata update; this
  enables consistency to be achieved when rsync is used for
  replication.

* ssync is modified to sync meta files when content-type
  differs between local and remote copies of objects.

* the object server issues container updates when handling
  POST requests, notifying the container server of the current
  immutable metadata (etag, size, hash, swift_bytes),
  content-type with their respective timestamps, and the
  mutable metadata timestamp.

* the container server maintains the most recently reported
  values for immutable metadata, content-type and mutable
  metadata, each with their respective timestamps, in a single
  db row.

* new probe tests verify that replication achieves eventual
  consistency of containers and objects after discrete updates
  to content-type and mutable metadata, and that container-sync
  syncs objects after fast-post updates.

[1] spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e

Change-Id: Ia597cd460bb5fd40aa92e886e3e18a7542603d01
2016-03-03 14:25:10 +00:00
Takashi Kajinami
8e4347afd5 Fix proxy-server's support for chunked transferring in GET object
Proxy-server currently requires Content-Length in the response header
when getting an object and does not support chunked transferring with
"Transfer-Encoding: chunked".

This doesn't matter in normal swift, but prohibits us from adding
any middleware that does something like streaming processing of
objects and therefore can't calculate the length of its response body
before it starts to send the response.

Change-Id: I60fc6c86338d734e39b7e5f1e48a2647995045ef
2016-03-02 22:56:13 +09:00
Jenkins
d8a5bf880f Merge "Fix StatsD tests to not use real DNS" 2016-02-26 21:39:40 +00:00
Jenkins
8f0ba56d0d Merge "Replace assertEqual(None, *) with assertIsNone in tests" 2016-02-24 03:29:59 +00:00
Samuel Merritt
eb7ca115e6 Fix StatsD tests to not use real DNS
In common/test_utils.py, TestStatsdLogging had the majority of its
test cases calling the real socket.getaddrinfo(), which uses real
DNS. This is very slightly slower than using a mock getaddrinfo() when
the machine running the tests has functioning DNS, but on a machine
with no network connection at all, the tests are excruciatingly slow
due to timeouts.

This commit mocks things out as appropriate. There's still one user of
the real getaddrinfo(), but it's for ::1, so that's just local
resolution based on /etc/hosts.

Timing numbers for "./.unittests test.unit.common.test_utils:TestStatsdLogging":

 * network, without this patch: 1.8s
 * no network, without this patch: 221.2s (ouch)
 * network, with this patch: 1.1s
 * no network, with this patch: 1.1s

Change-Id: I1a2d6f24fc9bb928894fb1fd8383516250e29e0c
2016-02-23 14:00:34 -08:00
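The technique used in commit eb7ca115e6 above, replacing `socket.getaddrinfo()` with a mock so tests never touch real DNS, can be sketched with `unittest.mock`. The `resolve` helper, the host name and the canned address are placeholders invented for this example.

```python
import socket
from unittest import mock

# One canned getaddrinfo() result: (family, type, proto, canonname, sockaddr).
FAKE_ADDRINFO = [
    (socket.AF_INET, socket.SOCK_DGRAM, 17, '', ('192.0.2.10', 8125)),
]


def resolve(host, port):
    """Return the first socket address getaddrinfo() reports."""
    return socket.getaddrinfo(host, port)[0][4]


with mock.patch('socket.getaddrinfo', return_value=FAKE_ADDRINFO) as fake:
    # No DNS query is made here; the mock answers immediately.
    address = resolve('statsd.example.com', 8125)
    fake.assert_called_once_with('statsd.example.com', 8125)
```

Because the mock answers instantly, the test takes the same time with or without a network connection.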
Jenkins
d9f500a128 Merge "Make _get_addr() method a function in utils." 2016-02-22 20:30:22 +00:00
Chaozhe.Chen
4a44e27e00 Replace assertEqual(None, *) with assertIsNone in tests
As swift no longer supports Python 2.6, replace assertEqual(None, *)
with assertIsNone in tests to get clearer messages in case of
failure.

Change-Id: I94af3e8156ef40465d4f7a2cb79fb99fc7bbda56
Closes-Bug: #1280522
2016-02-16 23:49:06 +08:00
Jenkins
e2c570c0ab Merge "Port parse_mime_headers() to Python 3" 2016-02-12 15:11:15 +00:00
Jenkins
4e370e5116 Merge "Port FileLikeIter to Python 3" 2016-02-10 07:36:51 +00:00
Jenkins
eaf6af3179 Merge "Allow IPv6 addresses/hostnames in StatsD target" 2016-02-04 03:23:01 +00:00
Darrell Bishop
26327e1e8b Allow IPv6 addresses/hostnames in StatsD target
The log_statsd_host value can now be an IPv6 address or a hostname
which only resolves to an IPv6 address.  In both cases, the new
behavior is to use an AF_INET6 socket on which .sendto() is called
with the originally-configured hostname (or IP).  This means the
Swift process is not caching a DNS resolution for the lifetime of
the process (a good thing).

If a hostname resolves to both an IPv6 and an IPv4 address, an AF_INET
socket is used (i.e. only the IPv4 address will receive the UDP
packet).

The old behavior is preserved: any invalid IP address literals and
failures in DNS resolution or actual StatsD packet sending do not
halt the process or bubble up; they are caught, logged, and
otherwise ignored.

Change-Id: Ibddddcf140e2e69b08edf3feed3e9a5fa17307cf
2016-02-03 00:26:31 -08:00
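The address-family choice described in commit 26327e1e8b above (IPv4 when available, IPv6 otherwise) could be sketched like this. `pick_statsd_family` is a hypothetical helper, and the final fallback to AF_INET when nothing resolves is an assumption of this example, not a statement about the patch.

```python
import socket


def pick_statsd_family(host, port):
    """Return AF_INET if host has an IPv4 address, else AF_INET6 if it
    has an IPv6 address, else AF_INET as a last resort (assumed)."""
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            socket.getaddrinfo(host, port, family)
        except socket.gaierror:
            continue  # no address of this family; try the next one
        return family
    return socket.AF_INET
```

A sender would then create `socket.socket(family, socket.SOCK_DGRAM)` and call `sendto()` with the originally configured host, so resolution happens again on each send rather than being cached for the life of the process.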
Alistair Coles
a1776b9c1f Let equal Timestamps not be unequal
Make the result of Timestamp(x) != Timestamp(x) be False.

In python 2.7 this requires the __ne__ method to be defined [1].
"The truth of x==y does not imply that x!=y is false." The
functools.total_ordering decorator does not autocreate a __ne__
method.

In python 3 the __ne__ method is not required [2]. "By default,
__ne__() delegates to __eq__() and inverts the result".

This patch puts back the __ne__ method removed in [3]. Whilst no tests
fail on master with python2.7, they do on this patch [4] and it seems
dangerous to have this absurd behaviour lurking.

[1] https://docs.python.org/2/reference/datamodel.html#object.__ne__
[2] https://docs.python.org/3.4/reference/datamodel.html#object.__ne__
[3] Change-Id: Id26777ac2c780316ff10ef7d954c48cc1fd480b5
[4] Change-Id: Ia597cd460bb5fd40aa92e886e3e18a7542603d01

Change-Id: I01fbfa310df3c74390f8e8c2e9ffff81bbf05e47
2016-01-28 10:08:02 -08:00
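The behaviour fixed by commit a1776b9c1f above can be shown with a toy class. `Stamp` is a stand-in invented for this sketch, not Swift's Timestamp; the explicit `__ne__` is what Python 2.7 needs, and it is harmless on Python 3.

```python
import functools


@functools.total_ordering
class Stamp(object):
    def __init__(self, value):
        self.value = float(value)

    def __eq__(self, other):
        if not isinstance(other, Stamp):
            return NotImplemented
        return self.value == other.value

    def __ne__(self, other):
        # total_ordering does not generate this method, and Python 2
        # does not derive it from __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __lt__(self, other):
        if not isinstance(other, Stamp):
            return NotImplemented
        return self.value < other.value

    def __hash__(self):
        return hash(self.value)
```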
Jenkins
d7f8c2297c Merge "Add __next__() methods to utils iterators for py3" 2016-01-27 01:19:38 +00:00
Jenkins
f1989c5e8b Merge "Fix handling of "Permission Denied" error from NamedTemporaryFile function" 2016-01-26 14:17:34 +00:00
Kazuhiro MIYAHARA
9ef15453fa Fix handling of "Permission Denied" error from NamedTemporaryFile function
If "Permission Denied" is raised by the NamedTemporaryFile function in
the dump_recon_cache method, swift logs a message about a variable
referenced before assignment instead of logging "Permission Denied".
This patch fixes the handling and adds a unit test.

Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: Iafdd94905e9e9c81f5966a923324b50c18fcf592
2016-01-26 11:27:34 +00:00
Victor Stinner
d47155af26 Add __next__() methods to utils iterators for py3
On Python 3, next(obj) calls obj.__next__(), not obj.next(). Add an
alias from __next__() to next() to be compatible with Python 2 and
Python 3.

Change-Id: Ida104d3bd7cdba557e523f18df43d56847060054
2016-01-25 14:05:59 +01:00
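The aliasing trick from commit d47155af26 above is small enough to show in full. `CountDown` is a made-up iterator used only to demonstrate the pattern.

```python
class CountDown(object):
    """Iterate from start down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def next(self):  # name used by Python 2's iterator protocol
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    # Python 3 looks up __next__(); point it at the same function.
    __next__ = next
```

Inside the class body, `next` refers to the method just defined rather than the builtin, so the alias binds both names to one implementation.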
Jenkins
222649de45 Merge "Allow smaller segments in static large objects" 2016-01-23 06:13:26 +00:00
Victor Stinner
d9b22ac51c Port parse_mime_headers() to Python 3
Port swift.common.utils.parse_mime_headers() to Python 3:

* On Python 3, try to decode headers from UTF-8. If a header was
  not encoded to UTF-8, decode the header from Latin1.
* Update the parse_mime_headers() tests: on Python 3, HTTP header
  values are Unicode strings.

This change is a follow-up of the change
Ia5ee2ead67e36e8c6416183667f64ae255887736.

Change-Id: I042dd13e9eb0e9844ccd832d538cdac84359ed42
2016-01-20 15:53:07 +01:00
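The decoding rule from commit d9b22ac51c above (UTF-8 first, Latin1 as the fallback) can be sketched in a few lines. `decode_header_value` is a hypothetical helper, not part of parse_mime_headers().

```python
def decode_header_value(raw):
    """Decode header bytes as UTF-8, falling back to Latin-1."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode('latin-1')
```

Trying UTF-8 first matters because any byte string decodes under Latin-1, so the reverse order would never reach the UTF-8 branch.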
Victor Stinner
6c32da14f4 Port FileLikeIter to Python 3
Port FileLikeIter and _MultipartMimeFileLikeObject of
swift.common.utils to Python 3:

* Add a __next__() alias to the next() method. On Python 3, the
  next() method is no longer used, __next__() is required.
* Use literal byte strings: FileLikeIter and _MultipartMimeFileLikeObject
  are written to handle binary files.
* test_close(): replace .FileLikeIter('abcdef') with
  FileLikeIter([b'a', b'b', b'c']). On Python 3, list(b'abc') returns
  [97, 98, 99], whereas ['a', 'b', 'c'] is returned on Python 2.
* Update unit FileLikeIter tests to use byte strings.

Change-Id: Ibacddb70b22f624ecd83e374749578feddf8bca8
2016-01-19 15:15:18 -08:00
Samuel Merritt
5d449471b1 Remove some Python 2.6 leftovers
Change-Id: I798d08722c90327c66759aa0bb4526851ba38d41
2016-01-14 17:26:01 -08:00
Timur Alperovich
725a166ebd Make _get_addr() method a function in utils.
The patch moves the MemcacheConnPool._get_addr() method to a function in
swift.common.utils. The function is renamed to parse_socket_string()
and the documentation is updated accordingly. The test for it has also
been moved.

Change-Id: Ida65b2fded28d0a059e668646f5b89714298f348
2016-01-12 21:09:48 -08:00
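Commit 725a166ebd above does not spell out the accepted formats, so the following is only a guess at what parsing a socket string involves: splitting "host:port" while allowing bracketed IPv6 literals. `split_host_port` and its rules are assumptions for illustration and do not describe parse_socket_string().

```python
def split_host_port(value, default_port):
    """Split 'host', 'host:port', '[v6]' or '[v6]:port' into (host, port)."""
    if value.startswith('['):
        host, bracket, rest = value[1:].partition(']')
        if not bracket:
            raise ValueError('missing closing bracket in %r' % (value,))
        if not rest:
            return host, default_port
        if not rest.startswith(':'):
            raise ValueError('unexpected text after bracket in %r' % (value,))
        return host, int(rest[1:])
    if value.count(':') > 1:
        # Bare IPv6 literal without brackets: no port can be given.
        return value, default_port
    host, colon, port = value.partition(':')
    return host, int(port) if colon else default_port
```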