383 Commits

Author SHA1 Message Date
Zuul
c2b03f6779 Merge "Latch shard-stat reporting" 2020-06-06 04:01:26 +00:00
Zuul
56278100a1 Merge "relinker: Improve performance by limiting I/O" 2020-06-05 06:09:24 +00:00
Tim Burke
cedec8c5ef Latch shard-stat reporting
The idea is, if none of

  - timestamp,
  - object_count,
  - bytes_used,
  - state, or
  - epoch

has changed, we shouldn't need to send an update back to the root
container.

This is more-or-less comparable to what the container-updater does to
avoid unnecessary writes to the account.
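
As a rough illustration of the latching idea above, the sketch below
remembers the last reported values and only signals a report when one of
the listed fields differs. The class and method names are hypothetical
and are not taken from Swift's sharder code.

```python
class ShardStatLatch(object):
    """Hypothetical helper: report only when a tracked field changes."""

    FIELDS = ('timestamp', 'object_count', 'bytes_used', 'state', 'epoch')

    def __init__(self):
        self._last_reported = None

    def should_report(self, shard_stats):
        # Snapshot only the fields named in the commit message.
        current = tuple(shard_stats[name] for name in self.FIELDS)
        if current == self._last_reported:
            return False  # nothing changed, skip the update to the root
        self._last_reported = current
        return True


latch = ShardStatLatch()
stats = {'timestamp': '1590000000.00000', 'object_count': 10,
         'bytes_used': 4096, 'state': 'active', 'epoch': None}
first = latch.should_report(stats)
second = latch.should_report(dict(stats))
third = latch.should_report(dict(stats, object_count=11))
```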

Closes-Bug: #1834097
Change-Id: I1ee7ba5eae3c508064714c4deb4f7c6bbbfa32af
2020-05-29 22:33:10 -07:00
Tim Burke
fe74ec0489 py27: Suppress UnicodeWarnings in ShardRange setters
Previously, we'd see warnings like

   UnicodeWarning: Unicode equal comparison failed to convert both
   arguments to Unicode - interpreting them as being unequal

when setting lower/upper bounds with non-ascii byte strings.

Change-Id: I328f297a5403d7e59db95bc726428a3f92df88e1
2020-05-04 21:39:28 -07:00
Zuul
3cceec2ee5 Merge "Update hacking for Python3" 2020-04-09 15:05:28 +00:00
Andreas Jaeger
96b56519bf Update hacking for Python3
The repo now uses both Python 2 and 3, so update hacking to
version 2.0, which supports both. Note that the latest hacking
release, 3.0, only supports Python 3.

Fix problems found.

Remove hacking and friends from lower-constraints, they are not needed
for installation.

Change-Id: I9bd913ee1b32ba1566c420973723296766d1812f
2020-04-03 21:21:07 +02:00
Romain LE DISEZ
8378a11d11 Replace all "with Chunk*Timeout" by a watchdog
The eventlet.timeout.Timeout context manager schedules a call to
throw an exception every time it is entered. The swift-proxy uses
Chunk(Read|Write)Timeout for every chunk read/written from the client or
object-server. For a single upload/download of a big object, that means
tens of thousands of scheduling operations in eventlet, which is very
costly.

This patch replaces the usage of these context managers with a watchdog
greenthread that schedules itself by sleeping until the next timeout
expiration. Then, only if a timeout has expired, it schedules a call to
throw the appropriate exception.
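
The watchdog idea can be modelled without eventlet. The sketch below is a
simplified, single-threaded model (all names are hypothetical, and it is
not Swift's implementation): timeouts are registered in a heap, the
watchdog only needs to wake up at the earliest deadline, and work is done
only for timeouts that actually expired.

```python
import heapq
import itertools


class WatchdogModel(object):
    """Hypothetical model of a timeout watchdog (no greenthreads here)."""

    def __init__(self):
        self._heap = []          # entries are (deadline, key)
        self._active = set()     # keys that have not been stopped
        self._keys = itertools.count()

    def start(self, now, timeout):
        # Registering a timeout is a cheap heap push, not a scheduled call.
        key = next(self._keys)
        heapq.heappush(self._heap, (now + timeout, key))
        self._active.add(key)
        return key

    def stop(self, key):
        # Cancelling just forgets the key; the heap entry is dropped lazily.
        self._active.discard(key)

    def next_deadline(self):
        # The watchdog would sleep until this time (None means nothing to do).
        while self._heap and self._heap[0][1] not in self._active:
            heapq.heappop(self._heap)
        return self._heap[0][0] if self._heap else None

    def expired(self, now):
        # Only timeouts that really expired produce any work.
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _deadline, key = heapq.heappop(self._heap)
            if key in self._active:
                self._active.discard(key)
                fired.append(key)
        return fired


watchdog = WatchdogModel()
slow = watchdog.start(now=0, timeout=10)
fast = watchdog.start(now=0, timeout=5)
watchdog.stop(fast)  # the chunk arrived in time
```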

The gain on bandwidth and CPU usage is significant. On a benchmark
environment, it gave this result for an upload of 6 Gbps on a replica
policy (average of 3 runs):
    master: 5.66 Gbps / 849 jiffies consumed by the proxy-server
    this patch: 7.56 Gbps / 618 jiffies consumed by the proxy-server

Change-Id: I19fd42908be5a6ac5905ba193967cd860cb27a0b
2020-04-02 07:38:47 -04:00
Romain LE DISEZ
3061ec803f relinker: Improve performance by limiting I/O
This commit reduces the number of I/O operations done by the
swift-object-relinker.

First, it saves the progress state of relinking and cleanup in case the
process is interrupted during the operation. This allows the operation
to resume without rescanning all partitions.

Secondly, it prevents relink and cleanup from scanning any partition
that is bigger than 2^part_power (or (2^next_part_power)/2).
These partitions did not exist before the part_power increase began,
so there is nothing to relink or clean up.

Thirdly, it reverse-orders the partitions to scan so that some useless
work is avoided. If a device contains partitions 1 and 3, relinking
partition 1 will create "new" objects in partition 3, which would then
need to be scanned when the relinker works on partition 3. That is
wasted work. If partition 3 is done first, it will only contain the
objects that need to
be relinked.

Fourthly, it allows a single device to be specified to work on.

To do that, some hooks were added in audit_location_generator to allow
custom code to be executed before/after iterating over a
device/partition/suffix/hash.
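
The second and third points above can be sketched as a filter followed by
a reverse sort. The function name is hypothetical, and treating every
partition number at or above 2^part_power as "new" is an assumption made
for this illustration.

```python
def partitions_to_scan(partitions, part_power):
    """Hypothetical helper: choose and order partitions for the relinker."""
    # Partitions at or above this limit did not exist before the increase.
    limit = 2 ** part_power
    candidates = (p for p in partitions if p < limit)
    # Highest partition first, so partitions are scanned before relinking
    # lower partitions creates "new" objects in them.
    return sorted(candidates, reverse=True)


order = partitions_to_scan([1, 3, 9, 0], part_power=3)
```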

Change-Id: If1bf8ed9036fb0ec619b0d4f16061a81a1af2082
2020-03-31 17:33:06 -04:00
Romain LE DISEZ
d361e5febf Make wsgi server uses systemd's NOTIFY_SOCKET
Change-Id: Ice224fc2a6ba0150be180955037c13fc90365479
2020-03-31 15:22:48 -04:00
Clay Gerrard
2759d5d51c New Object Versioning mode
This patch adds a new object versioning mode. This new mode provides
a new set of APIs for users to interact with older versions of an
object. It also changes the naming scheme of older versions and adds
a version-id to each object.

This new mode is not backwards compatible or interchangeable with the
other two modes (i.e., stack and history), especially due to the changes
in the naming scheme of older versions. This new mode will also serve
as a foundation for adding S3 versioning compatibility in the s3api
middleware.

Note that this does not (yet) support using a versioned container as
a source in container-sync. Container sync should be enhanced to sync
previous versions of objects.

Change-Id: Ic7d39ba425ca324eeb4543a2ce8d03428e2225a1
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
2020-01-24 17:39:56 -08:00
Tim Burke
57ca3570e9 Allow Timestamp comparisons against out-of-range values
Prior to the related change, clients may have written down X-Delete-At headers
that are outside of the Timestamp range, for example.

Change-Id: Ib8ae7ebcbdb32e0aa58446bd1ef949e5e2f63e74
Related-Change: I23666ec8a067d829eaf9bfe54bd086c320b3429e
Related-Bug: 1821204
Partial-Bug: 1860149
2020-01-17 08:11:34 -08:00
Zuul
9fa0b211a9 Merge "Seamlessly reload servers with SIGUSR1" 2019-11-14 20:34:48 +00:00
Zuul
15ccb776d7 Merge "Follow up punch_hole patch" 2019-11-14 01:17:19 +00:00
Darrell Bishop
1107f24179 Seamlessly reload servers with SIGUSR1
Swift servers can now be seamlessly reloaded by sending them a SIGUSR1
(instead of a SIGHUP).  The server forks off a synchronized child to
wait to close the old listen socket(s) until the new server has started
up and bound its listen socket(s).  The new server is exec'ed from the
old one so its PID doesn't change.  This makes Systemd happier, so a
ReloadExec= stanza can now be used.

The seamless part means that incoming connections will always get
accepted either by the old server or the new one.  This eliminates
client-perceived "downtime" during server reloads, while allowing the
server to fully reload, re-reading configuration, becoming a fresh
Python interpreter instance, etc.  The SO_REUSEPORT socket option has
already been getting used, so nothing had to change there.

This patch also includes a non-invasive fix for a current eventlet bug;
see https://github.com/eventlet/eventlet/pull/590
That bug prevents a SIGHUP "reload" from properly servicing existing
requests before old worker processes close sockets and exit.  The
existing probe tests missed this, but the new ones, in this patch, caught
it.

New probe tests cover both old SIGHUP "reload" behavior as well as the
new SIGUSR1 seamless reload behavior.

Change-Id: I3e5229d2fb04be67e53533ff65b0870038accbb7
2019-11-07 10:15:26 -08:00
Zuul
4768f22507 Merge "Consistently use io.BytesIO" 2019-10-15 16:39:20 +00:00
Tim Burke
d270596b67 Consistently use io.BytesIO
Change-Id: Ic41b37ac75b5596a8307c4962be86f2a4b0d9731
2019-10-15 15:09:46 +02:00
Christian Schwede
78a4070c90 Fix misleading error msg if swift.conf unreadable
Several tools are returning a misleading error message if swift.conf is
missing or not readable by the user, stating that the hash pre-/suffixes
are missing. Let's fix this by catching the real issue down below.

Change-Id: I7a47e6260ed51a3b7d9665b3a4510520429ae158
2019-10-10 17:55:28 -07:00
Clay Gerrard
25aeb0ca49 Make GreenAsyncPile not hang
It's probably weird that StreamingPile has this interface that swallows
exceptions, but this seems better than hanging.

Change-Id: I8fe45c0f0d291efc84f3edf5d6b7cd116b5c7835
2019-08-02 08:19:41 -07:00
zhufl
c46b88ab74 Fix invalid assert states
"self.assertTrue(policies[1].is_deprecated, True)" and
"self.assertTrue(crashy_calls[0], 1)" are not correct, this is
to fix them.
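
To see why such assertions are invalid: the second positional argument of
assertTrue is the failure message, not an expected value, so the call
passes for any truthy first argument. The small test case class below is
only a stand-in used to call the assertion methods directly.

```python
import unittest


class Demo(unittest.TestCase):
    def runTest(self):
        pass


case = Demo()

# Passes even though 2 != 1: the 1 is only used as the failure message.
case.assertTrue(2, 1)

# assertEqual actually compares the two values and fails here.
try:
    case.assertEqual(2, 1)
    equal_check_failed = False
except AssertionError:
    equal_check_failed = True
```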

Change-Id: I7b07f0833d675d2939c910f679b54da2b8cda482
2019-07-01 09:20:02 +08:00
Clay Gerrard
dca658103a Fix swift with python <2.7.9
Closes-Bug: #1831932

Change-Id: I0d33864f4bffa401082548ee9a52f6eb50cb1f39
2019-06-07 10:39:00 -05:00
Gilles Biannic
a4cc353375 Make log format for requests configurable
Add the log_msg_template option in proxy-server.conf and log_format in
a/c/o-server.conf. It is a string parsable by Python's format()
function. Some fields containing user data might be anonymized by using
log_anonymization_method and log_anonymization_salt.
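
A minimal sketch of how a format()-style template and anonymization could
fit together. The field names, the salt-then-hash scheme and the output
prefix are assumptions made for this example, not the documented
behaviour of the option.

```python
import hashlib


def anonymize(value, method='sha256', salt=''):
    """Hypothetical anonymizer: salted hash, tagged with the method name."""
    digest = hashlib.new(method, (salt + value).encode('utf-8')).hexdigest()
    return '{%s}%s' % (method.upper(), digest)


# Hypothetical template; a real deployment would use the fields that the
# server documents for its log format option.
template = '{client_ip} {method} {path} {status_int}'

line = template.format(
    client_ip=anonymize('203.0.113.7', salt='s3cr3t'),
    method='GET',
    path='/v1/AUTH_test/c/o',
    status_int=200,
)
```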

Change-Id: I29e30ef45fe3f8a026e7897127ffae08a6a80cd9
2019-05-02 17:43:25 -06:00
Tim Burke
049e56a5d0 Remove our urlparse wrapper
It has not been necessary since we dropped support for Python 2.6.
See https://github.com/python/cpython/commit/8c6d9d7 and
https://bugs.python.org/issue2987.

Be sure to keep a `urlparse` name in utils, though; swauth (at least)
still expects there to be a swift.common.utils.urlparse.

Change-Id: If2502868f251b8a83aa929ee22b10046e708d111
2019-04-10 12:39:09 -07:00
Pete Zaitcev
575538b55b py3: port the container
This started with ShardRanges and its CLI. The sharder is at the
bottom of the dependency chain. Even container backend needs it.
Once we started tinkering with the sharder, it all snowballed to
include the rest of the container services.

Beware, this does affect some of Python 2 code. Mostly it's trivial
and obviously correct, but needs checking by reviewers.

About killing the stray "from __future__ import unicode_literals":
we do not do it in general. The specific problem it caused was
a failure of functional tests because unicode leaked into a field
that was supposed to be encoded. It is just too hard to track the
types when rules change from file to file, so off with its head.

Change-Id: Iba4e65d0e46d8c1f5a91feb96c2c07f99ca7c666
2019-02-20 21:30:46 -06:00
Thiago da Silva
0668731839 Change how O_TMPFILE support is detected
Previously o_tmpfile support was detected by checking the
kernel version as it was officially introduced in XFS in 3.15.
The problem is that RHEL has backported the support to at least
RHEL 7.6 but the kernel version is not updated.

This patch changes how o_tmpfile support is detected: it actually
attempts to open a file with the O_TMPFILE flag and keeps the result
cached in DiskFileManager so that the check only happens once while the
process is running.

Change-Id: I3599e2ab257bcd99467aee83b747939afac639d8
2019-01-31 18:35:39 +00:00
Thiago da Silva
700fcc7353 Remove duplicate statement
Change-Id: I249a1d5c0c025d571587e225e833a865ed6409e0
2019-01-22 10:24:49 -05:00
Zuul
b9d2c08e8d Merge "Fix SSYNC concurrency on partition" 2018-12-11 23:34:54 +00:00
Romain LE DISEZ
014d46f9a7 Fix SSYNC concurrency on partition
Commit e199192caefef068b5bf57da8b878e0bc82e3453 introduced the ability
to have multiple SSYNC running on a single device. It lacks a safeguard
to ensure that only one SSYNC request can be running on a partition.

This commit updates replication_lock to lock the device N times, then
lock the partition related to a SSYNC request once.

Change-Id: Id053ed7dd355d414d7920dda79a968a1c6677c14
2018-12-04 14:47:26 +01:00
Romain de Joux
4809884d9f Use eventlet.patcher.original to get Python select module in get_hub
get_hub function was added in commit b155da42 with the idea to bypass
eventlet automatic hub selection that prefers epoll if available by default.

Since version 0.20.0, eventlet has removed the select.poll() function from
its patched select module (eventlet.green.select), see:
   - https://github.com/eventlet/eventlet/commit/614a20462

So if eventlet monkey patching is done before a get_hub() call (as is now the
case in wsgi.py since commit c9410c7d), 'import select' gives us the eventlet
version, which doesn't have a poll attribute.

To prevent that, we use the eventlet.patcher.original function to get Python's
select module and test whether poll() is available on the current platform.

Change-Id: I69b3db3951b3d3b6583845978deb2883492e7f0f
Closes-Bug: 1804627
2018-11-26 23:04:30 +01:00
Tim Burke
582f0585e8 py3: encryption follow-up
Change-Id: Ic680a11fa3133b3d6f3fa6fa007ccfbeb540899a
2018-11-20 14:27:19 -08:00
Zuul
614e85d479 Merge "Remove empty directories after a revert job" 2018-11-01 04:34:04 +00:00
Alexandre Lécuyer
d306345ddd Remove empty directories after a revert job
Currently, the reconstructor will not remove empty object and suffix
directories after processing a revert job. This will only happen during
its next run.

This patch will attempt to remove these empty directories immediately,
while we have the inodes cached.

Change-Id: I5dfc145b919b70ab7dae34fb124c8a25ba77222f
2018-10-26 09:29:14 +02:00
Pete Zaitcev
1663782459 Fix up the test for .ismount
We kept hitting a floating error in the test, where first ismount
in the test succeeds, while it should fail. As it turned out,
the return of gettempdir was the plain /tmp. So, a previous test
created /tmp/.ismount and the subsequent runs failed on it.
Re-generating the root filesystem (e.g. by a container) fixes
the problem, but still, there's no need to do this. This change
tightens the test up by placing the .ismount into a subdirectory
of the test directory instead of the global /tmp.

Change-Id: I006ba1f69982ef7513db3508d691723656f576c9
2018-10-26 05:30:55 +00:00
Tim Burke
0a564d885e Check for .ismount stubs with symlinks, too
Related-Change: I9d9fc0a4447a8c5dd39ca60b274c119af6b4c28f
Change-Id: Ib6a2edf648397d1d1c875461698f63afcde5b3ed
2018-10-19 22:59:34 +00:00
zhulingjie
83a7ce8ce0 Python 3 compatibility: fix xrange/range issues
xrange is not defined in python3.
Rename xrange() to range().

Change-Id: Ifb1c9cfd863ce6dfe3cced3eca7ea8e539d8a5e9
2018-10-14 14:08:19 +00:00
Zuul
5cc4a72c76 Merge "Configure diskfile per storage policy" 2018-09-27 00:19:32 +00:00
Zuul
fc9ab28927 Merge "py3: port request_helpers" 2018-09-25 20:01:10 +00:00
Tim Burke
2ef21ac05d py3: port request_helpers
Change-Id: I6be1a1c618e4b4fa03b34dad96f378aca01e8e08
2018-09-15 01:33:34 -06:00
Kota Tsuyuzaki
814a76689f Follow up punch_hole patch
Add missing tests to get more coverage and reduce a line.

Change-Id: I34d8063ee82323c9751b4c965bee01ab584c5eb5
2018-09-15 06:49:18 +09:00
Alexandre Lécuyer
dbacdcf01c Add punch_hole utility function
This is useful for deallocating disk blocks as part of an alternate disk
file implementation.

Additionally, add an offset argument to the existing fallocate utility
function; this allows you to grow an existing file.

Sam always had the best descriptions:

  utils.fallocate(fd, size) allocates <size> bytes for the file referred
  to by <fd>. It allows for keeping a reserve of an additional N bytes
  or X% of the filesystem free. If neither the fallocate() nor the
  posix_fallocate() C function is available, utils.fallocate() will
  log a warning (once only) and not actually allocate space.

  utils.punch_hole(fd, offset, length) deallocates <length> bytes
  starting at <offset> from the file referred to by <fd>. It uses the C
  function fallocate(). If fallocate() is not available, calls to
  utils.punch_hole() will raise an exception.

Since these both use the fallocate syscall, refactor that a bit and get
rid of FallocateWrapper. We add a new _LibcWrapper to do some
lazy-loading of a C function and expose whether the function is actually
available in Python, though. This allows utils.fallocate and
utils.punch_hole to keep their fancy logic pretty well-contained.
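
The lazy-loading idea behind such a wrapper can be sketched with ctypes.
This is an illustrative stand-in with a hypothetical class name, not the
_LibcWrapper from the patch: the C function is looked up on first use, and
an "available" property reports whether the lookup succeeded.

```python
import ctypes
import ctypes.util


class LazyLibcFunction(object):
    """Hypothetical wrapper: look up a libc function only when needed."""

    def __init__(self, func_name):
        self._func_name = func_name
        self._func = None
        self._loaded = False

    def _load(self):
        if self._loaded:
            return
        self._loaded = True
        try:
            libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
            self._func = getattr(libc, self._func_name)
        except (OSError, AttributeError, TypeError):
            # No usable libc, or the function does not exist in it.
            self._func = None

    @property
    def available(self):
        self._load()
        return self._func is not None

    def __call__(self, *args):
        if not self.available:
            raise NotImplementedError(
                'No function %r found in libc' % self._func_name)
        return self._func(*args)


missing = LazyLibcFunction('no_such_function_for_this_example')
```

Callers can check `available` up front and fall back (or log a warning)
instead of handling the exception at every call site.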

Modernized the tests for utils.fallocate() and utils.punch_hole().

Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Change-Id: Ieac30a477d784905c94742ee3d0898d7e0194b39
2018-09-14 13:55:42 -06:00
Romain LE DISEZ
673fda7620 Configure diskfile per storage policy
With this commit, each storage policy can define the diskfile to use to
access objects. Selection of the diskfile is done in swift.conf.

Example:
    [storage-policy:0]
    name = gold
    policy_type = replication
    default = yes
    diskfile = egg:swift#replication.fs

The diskfile configuration item accepts the same format as middleware
declarations: [[scheme:]egg_name#]entry_point
The egg_name is optional and defaults to "swift". The scheme is optional
and defaults to the only valid value, "egg". The upstream entry points are
"replication.fs" and "erasure_coding.fs".

Co-Authored-By: Alexandre Lécuyer <alexandre.lecuyer@corp.ovh.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I070c21bc1eaf1c71ac0652cec9e813cadcc14851
2018-08-24 02:29:13 +00:00
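
For the diskfile option described in the "Configure diskfile per storage
policy" entry above, the [[scheme:]egg_name#]entry_point format and its
defaults can be sketched as a small parser. The function name is
hypothetical and this is not the loader Swift itself uses.

```python
def parse_diskfile_spec(spec):
    """Hypothetical parser for [[scheme:]egg_name#]entry_point."""
    scheme, egg_name, entry_point = 'egg', 'swift', spec
    if '#' in spec:
        prefix, entry_point = spec.split('#', 1)
        if ':' in prefix:
            scheme, egg_name = prefix.split(':', 1)
        else:
            egg_name = prefix
    if scheme != 'egg':
        # "egg" is described as the only valid scheme.
        raise ValueError('unsupported scheme: %r' % scheme)
    return scheme, egg_name, entry_point


full = parse_diskfile_spec('egg:swift#replication.fs')
short = parse_diskfile_spec('erasure_coding.fs')
```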
Zuul
89854250c3 Merge "Add fallocate_reserve to account and container servers." 2018-07-20 08:42:51 +00:00
Samuel Merritt
8e651a2d3d Add fallocate_reserve to account and container servers.
The object server can be configured to leave a certain amount of disk
space free; default is 1%. This is useful in avoiding 100%-full
filesystems, as those can get Swift in a state where the filesystem is
too full to write tombstones, so you can't delete objects to free up
space.

When a cluster has accounts/containers and objects on the same disks,
then you can wind up with a 100%-full disk since account and container
servers don't respect fallocate_reserve. This commit makes account and
container servers respect fallocate_reserve so that disks shared
between account/container and object rings won't get 100% full.

When a disk's free space falls below the configured reserve, account
and container PUT, POST, and REPLICATE requests will fail with a 507
status code. These are the operations that can significantly increase
the disk space used by a given database.
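
The reserve check can be sketched as a pure function. The names, the
accepted value formats (a byte count or a percentage string) and the
success status used here are assumptions for illustration; only the 507
response for a disk below its reserve comes from the description above.

```python
def parse_reserve(value):
    """Hypothetical parser: '1%' -> (1.0, True), '1024' -> (1024, False)."""
    value = value.strip()
    if value.endswith('%'):
        return float(value[:-1]), True
    return int(value), False


def status_for_write(free_bytes, total_bytes, reserve='1%'):
    """Return 507 when free space is below the reserve, else 201."""
    amount, is_percent = parse_reserve(reserve)
    if is_percent:
        free_percent = free_bytes * 100.0 / total_bytes if total_bytes else 0.0
        below_reserve = free_percent < amount
    else:
        below_reserve = free_bytes < amount
    return 507 if below_reserve else 201


nearly_full = status_for_write(free_bytes=5, total_bytes=1000)
plenty_free = status_for_write(free_bytes=50, total_bytes=1000)
```

In a real server the free and total byte counts would come from the
filesystem holding the database, for example via os.statvfs.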

I called the parameter "fallocate_reserve" for consistency with the
object server. No actual fallocate() call happens under Swift's
control in the account or container servers (sqlite3 might make such a
call, but it's out of our hands).

Change-Id: I083442eef14bf83c0ea717b1decb3e6b56dbf1d0
2018-07-18 17:27:11 +10:00
mmcardle
26b20ee729 IP Range restrictions in temp urls
This patch adds an additional optional parameter to tempurl
which restricts the IPs from which a temp url can be used.
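
The membership test behind such a restriction can be sketched with the
standard ipaddress module. The function name is hypothetical, and how the
range is passed in and signed as part of the temp url is not shown.

```python
import ipaddress


def ip_allowed(client_ip, ip_range):
    """Hypothetical check: is client_ip inside ip_range (address or CIDR)?"""
    network = ipaddress.ip_network(ip_range, strict=False)
    return ipaddress.ip_address(client_ip) in network


inside = ip_allowed('192.0.2.10', '192.0.2.0/24')
outside = ip_allowed('198.51.100.1', '192.0.2.0/24')
```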

Change-Id: I23fe998a980960d4a32df042b3f6a21f096c36af
2018-07-03 12:25:28 +01:00
Tim Burke
1318bacc17 py36: Fix test_get_logger_sysloghandler_plumbing
Change-Id: Ibdb9e2bbec1c962d930a3f69fc95a8c562ac13b7
2018-06-21 15:43:26 -07:00
Zuul
1cd6416471 Merge "Fix common/test_utils.py on Python 3.5.4+" 2018-06-20 03:15:49 +00:00
Tim Burke
dc8d1c964a Get rid of tpool_reraise
As best I can tell, eventlet already does (and always has done) the
right thing, and we were just bad at catching Timeouts.

For some history:

    https://github.com/openstack/swift/commit/5db3cb3
    https://github.com/openstack/swift/commit/2b3aab8
    https://github.com/openstack/swift/commit/da0e013

Change-Id: Iad8109c4a03f006a89e55373cf3ca867d724b3e1
Related-Bug: 1647804
2018-06-12 15:23:17 -07:00
Samuel Merritt
854db51845 Fix common/test_utils.py on Python 3.5.4+
In CPython commit e59af55c2, instantiating a logging.SysLogHandler
stopped raising an exception if the syslog server was
unavailable. This commit first appears in CPython
3.5.4. utils.get_logger() catches that error and retries the
instantiation, and there is a test asserting that. The test fails on
Python 3.5.4 or greater, so now it has been corrected to only assert
things about the first instantiation of logging.SysLogHandler and
passes on Python 3.5.4 and 3.5.5.

This was noticed by running "tox -e py35" on an Ubuntu 18.04 system,
which ships with Python 3.5.5.

Change-Id: I43f231bd7d3566b9849a48f46ec9e2af4cd23be4
2018-05-24 14:14:34 -07:00
Tim Burke
4af57dbc65 Let make_db_file_path accept epoch=None
...in which case it should strip the epoch if the original path had one.
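
A sketch of the described behaviour, assuming db file names of the form
<hash>.db or <hash>_<epoch>.db. Both the naming scheme and the function
below are assumptions for illustration, not the code from
swift.common.utils.

```python
import posixpath


def db_path_with_epoch(db_path, epoch):
    """Hypothetical helper: set, replace or strip the epoch in a db path."""
    dirname, filename = posixpath.split(db_path)
    root, ext = posixpath.splitext(filename)
    hash_part = root.split('_', 1)[0]  # drop any existing epoch
    if epoch is None:
        new_name = hash_part + ext
    else:
        new_name = '%s_%s%s' % (hash_part, epoch, ext)
    return posixpath.join(dirname, new_name)


stripped = db_path_with_epoch('/srv/node/d1/abc_123.db', None)
added = db_path_with_epoch('/srv/node/d1/abc.db', '456')
```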

Change-Id: I8739a474c56c0f2376a276d2691c84448cb9c647
2018-05-22 13:49:17 -07:00
Matthew Oliver
2641814010 Add sharder daemon, manage_shard_ranges tool and probe tests
The sharder daemon visits container dbs and when necessary executes
the sharding workflow on the db.

The workflow is, in overview:

- perform an audit of the container for sharding purposes.

- move any misplaced objects that do not belong in the container
  to their correct shard.

- move shard ranges from FOUND state to CREATED state by creating
  shard containers.

- move shard ranges from CREATED to CLEAVED state by cleaving objects
  to shard dbs and replicating those dbs. By default this is done in
  batches of 2 shard ranges per visit.
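
The state transitions in the workflow above can be illustrated with a toy
model. The dictionaries and the function are hypothetical; the audit,
misplaced-object handling, container creation and replication steps are
left out, and only the FOUND -> CREATED -> CLEAVED progression with a
per-visit cleave batch is modelled.

```python
def sharder_visit(shard_ranges, cleave_batch_size=2):
    """Toy model of one sharder visit; returns how many ranges it cleaved."""
    # Every FOUND range gets its shard container "created".
    for shard_range in shard_ranges:
        if shard_range['state'] == 'FOUND':
            shard_range['state'] = 'CREATED'

    # Cleave at most cleave_batch_size CREATED ranges during this visit.
    cleaved = 0
    for shard_range in shard_ranges:
        if cleaved >= cleave_batch_size:
            break
        if shard_range['state'] == 'CREATED':
            shard_range['state'] = 'CLEAVED'
            cleaved += 1
    return cleaved


ranges = [{'name': 'a', 'state': 'FOUND'},
          {'name': 'b', 'state': 'FOUND'},
          {'name': 'c', 'state': 'FOUND'}]
first_visit = sharder_visit(ranges)
states_after_first = [r['state'] for r in ranges]
second_visit = sharder_visit(ranges)
states_after_second = [r['state'] for r in ranges]
```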

Additionally, when the auto_shard option is True (NOT yet recommended
in production), the sharder will identify shard ranges for containers
that have exceeded the threshold for sharding, and will also manage
the sharding and shrinking of shard containers.

The manage_shard_ranges tool provides a means to manually identify
shard ranges and merge them to a container in order to trigger
sharding. This is currently the recommended way to shard a container.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

Change-Id: I7f192209d4d5580f5a0aa6838f9f04e436cf6b1f
2018-05-18 18:48:13 +01:00
Alistair Coles
4a3efe61a9 Redirect object updates to shard containers
Enable the proxy to fetch a shard container location from the
container server in order to redirect an object update to the shard.

Enable the container server to redirect object updates to shard
containers.

Enable object updater to accept redirection of an object update.

Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

Change-Id: I6ff85827eecdea746b3626c0d401f68139cce19d
2018-05-18 18:48:13 +01:00