4176 Commits

Author SHA1 Message Date
Zuul
119fd25ca5 Merge "s3api: Check whether versioning is enabled more" 2020-06-06 04:18:26 +00:00
Zuul
c2b03f6779 Merge "Latch shard-stat reporting" 2020-06-06 04:01:26 +00:00
Zuul
713f056493 Merge "Quiet eventlet exceptions in test" 2020-06-06 03:31:35 +00:00
Zuul
00b9ee2538 Merge "Don't auto-create shard containers" 2020-06-06 02:21:10 +00:00
Clay Gerrard
fc731198ac Quiet eventlet exceptions in test
This traceback would probably show up in the wild, but it's not relevant
to this test.

Change-Id: I9e6e7679f674bddddcc4440b38d5741aaece3393
2020-06-05 09:49:01 -05:00
Zuul
56278100a1 Merge "relinker: Improve performance by limiting I/O" 2020-06-05 06:09:24 +00:00
Tim Burke
ce4c0fb14b Don't auto-create shard containers
...unless the client requests it specifically using a new flag:

   X-Backend-Auto-Create: true

Previously, you could get real jittery listings during a rebalance:

 * Partition with a shard DB gets reassigned, so one primary has no DB.
 * Proxy makes a listing, gets a 404, tries another node. Likely, one of
   the other shard replicas responds. Things are fine.
 * Update comes in. Since we use the auto_create_account_prefix
   namespace for shards, container DB gets created and we write the row.
 * Proxy makes another listing. There's a one-in-three chance that we
   claim there's only one object in that whole range.

Note that unsharded databases would respond to the update with a 404 and
wait for one of the other primaries (or the old primary that's now a
hand-off) to rsync a whole DB over, keeping us in the happy state.

Now, if the account is in the shards namespace, 404 the object update if
we have no DB. Wait for replication like in the unsharded case.

Continue to be willing to create the DB when the sharder is seeding all
the CREATED databases before it starts cleaving, though.

Change-Id: I15052f3f17999e6f432951ba7c0731dcdc9475bb
Closes-Bug: #1881210
2020-06-03 13:26:31 -07:00
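
A sketch of the new opt-in from a backend client's perspective (the port, device, partition, and shard-account naming below are illustrative; the header name is from the commit above):

    import http.client

    # The sharder sets X-Backend-Auto-Create when seeding CREATED
    # databases; ordinary object updates omit it, so a primary whose
    # shard DB hasn't been replicated yet answers 404 instead of
    # quietly creating a fresh, nearly-empty DB.
    conn = http.client.HTTPConnection('127.0.0.1', 6201)
    conn.request('PUT', '/sdb1/123/.shards_AUTH_test/shard-range-db/obj',
                 headers={'X-Timestamp': '1591234567.12345',
                          'X-Backend-Auto-Create': 'true'})
    print(conn.getresponse().status)
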
Zuul
b26d208b61 Merge "Simplify wsgify()" 2020-06-02 19:53:58 +00:00
Zuul
6c1bc3949d Merge "ratelimit: Allow multiple placements" 2020-06-01 21:25:41 +00:00
Clay Gerrard
ede9dad9f6 Better functest quarantine cleanup
Change-Id: I9218aaeb5fcd21f1bc2a5d655e3216059a209aeb
2020-06-01 14:43:05 +00:00
Tim Burke
cedec8c5ef Latch shard-stat reporting
The idea is, if none of

  - timestamp,
  - object_count,
  - bytes_used,
  - state, or
  - epoch

has changed, we shouldn't need to send an update back to the root
container.

This is more-or-less comparable to what the container-updater does to
avoid unnecessary writes to the account.

Closes-Bug: #1834097
Change-Id: I1ee7ba5eae3c508064714c4deb4f7c6bbbfa32af
2020-05-29 22:33:10 -07:00
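
In effect, the sharder latches on a tuple of those five fields; a minimal sketch (attribute names as listed above, helper name illustrative):

    def should_report(shard, last_reported):
        # Send an update to the root container only when a tracked
        # field actually changed, much like the container-updater
        # avoids needless account writes.
        current = (shard.timestamp, shard.object_count, shard.bytes_used,
                   shard.state, shard.epoch)
        return current != last_reported
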
Tim Burke
fa768b4342 Simplify wsgify()
Change-Id: Iec399aa8b58e72152a17265f2af1131f02667131
2020-05-27 03:19:13 +00:00
Zuul
5d9c373618 Merge "versioning: Have versioning symlinks make pre-auth requests to reserved container" 2020-05-27 00:15:58 +00:00
Tim Burke
a8e03f42e0 versioning: Have versioning symlinks make pre-auth requests to reserved container
Previously, the lack of container ACLs on the reserved container would
mean that attempting to grant access to the user-visible container would
not work; the user could not access the backing object.

Now, have symlinks with the allow-reserved-names sysmeta set be
pre-authed. Note that the user still has to be authorized to read the
symlink, and if the backing object was *itself* a symlink, that will be
authed separately.

Change-Id: Ifd744044421ef2ca917ce9502b155a6514ce8ecf
Closes-Bug: #1880013
2020-05-26 10:09:56 -05:00
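
A sketch of the mechanism with swift.common.wsgi.make_pre_authed_request (a real helper; the path and wiring here are illustrative):

    from swift.common.wsgi import make_pre_authed_request

    def fetch_backing_object(app, env, reserved_path):
        # The subrequest bypasses authorization, so missing ACLs on the
        # reserved container no longer block a user who was authorized
        # to read the symlink itself.
        subreq = make_pre_authed_request(env, method='GET',
                                         path=reserved_path,
                                         swift_source='VW')
        return subreq.get_response(app)
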
Clay Gerrard
aab45880f8 Break up reclaim into batches
We want to do the table scan without locking and group the locking
deletes into small indexed operations to minimize the impact of
background processes calling reclaim each cycle.

Change-Id: I3ccd145c14a9b68ff8a9da61f79034549c9bc127
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Closes-Bug: #1877651
2020-05-22 21:15:57 +00:00
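
The pattern looks roughly like this (a sketch against the container DB's object table; the batch size and exact SQL are illustrative):

    RECLAIM_PAGE_SIZE = 10000  # illustrative batch size

    def reclaim(conn, age_timestamp):
        # Page through the table without holding a write lock, then
        # delete each page in a short, indexed transaction so reclaim
        # doesn't starve concurrent clients.
        marker = ''
        while True:
            rows = conn.execute('''
                SELECT name FROM object
                WHERE deleted = 1 AND name > ? AND created_at < ?
                ORDER BY name LIMIT ?''',
                (marker, age_timestamp, RECLAIM_PAGE_SIZE)).fetchall()
            if not rows:
                break
            marker = rows[-1][0]
            with conn:  # the only locking part: one small batched delete
                conn.executemany('DELETE FROM object WHERE name = ?',
                                 [(row[0],) for row in rows])
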
Tim Burke
1db11df4f2 ratelimit: Allow multiple placements
We usually want to have ratelimit fairly far left in the pipeline -- the
assumption is that something like an auth check will be fairly expensive
and we should try to shield the auth system so it doesn't melt under the
load of a misbehaved swift client.

But with S3 requests, we can't know the account/container that a request
is destined for until *after* auth. Fortunately, we've already got some
code to make s3api play well with ratelimit.

So, let's have our cake and eat it, too: allow operators to place
ratelimit once, before auth, for swift requests and again, after auth,
for s3api. They'll both use the same memcached keys (so users can't
switch APIs to effectively double their limit), but still only have each
S3 request counted against the limit once.

Change-Id: If003bb43f39427fe47a0f5a01dbcc19e1b3b67ef
2020-05-19 11:10:22 -07:00
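
A sketch of the double placement in a paste-deploy pipeline (abbreviated; the neighboring filters are illustrative, and both placements share one [filter:ratelimit] section):

    [pipeline:main]
    # The first placement shields auth from plain Swift clients; the
    # second, after auth, catches s3api requests whose account and
    # container aren't known until auth has run. Both use the same
    # memcached keys, so an S3 request is still only counted once.
    pipeline = catch_errors cache ratelimit s3api s3token authtoken
        keystoneauth ratelimit copy slo proxy-server

    [filter:ratelimit]
    use = egg:swift#ratelimit
    account_ratelimit = 20
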
Zuul
08db36a295 Merge "py27: Suppress UnicodeWarnings in ShardRange setters" 2020-05-13 17:44:56 +00:00
Tim Burke
0fd23ee546 Fix pep8 job
New flake8 came out with new & improved rules. Ignore E741; it would be
too much churn. Fix the rest.

Change-Id: I9125c8c53423232309a75cbcc5b695b378864c1b
2020-05-13 00:24:13 -07:00
Tim Burke
fe74ec0489 py27: Suppress UnicodeWarnings in ShardRange setters
Previously, we'd see warnings like

   UnicodeWarning: Unicode equal comparison failed to convert both
   arguments to Unicode - interpreting them as being unequal

when setting lower/upper bounds with non-ascii byte strings.

Change-Id: I328f297a5403d7e59db95bc726428a3f92df88e1
2020-05-04 21:39:28 -07:00
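
Under py2, the warning comes from comparing a non-ascii bytestring with text; the suppression amounts to this minimal standalone sketch (not the ShardRange code itself):

    import warnings

    def bounds_equal(old, new):
        # py2 emits UnicodeWarning ("interpreting them as being
        # unequal") when a non-ascii bytestring is compared to unicode;
        # silence it around just the comparison.
        with warnings.catch_warnings():
            warnings.simplefilter('ignore', UnicodeWarning)
            return old == new
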
Tim Burke
4c8512afb1 Use separate name for HeaderKeyDict var vs list of response headers
Closes-Bug: #1875538
Change-Id: I1bcef61157594329f6978f7380bf7293aa1ca65e
Related-Change: Ia832e9bab13167948f01bc50aa8a61974ce189fb
2020-04-28 07:48:27 -07:00
Tim Burke
1af995f0e8 s3api: Check whether versioning is enabled more
Previously, attempting to GET, HEAD, or DELETE an object with a non-null
version-id would cause 500s, with logs complaining about how

    version-aware operations require that the container is versioned

Now, we'll early-return with a 404 (on GET or HEAD) or 204 (on DELETE).

Change-Id: I46bfd4ae7d49657a94734962c087f350e758fead
Closes-Bug: 1874295
2020-04-27 21:19:17 -07:00
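
The early return boils down to the following self-contained sketch (names are illustrative; status codes as above):

    def early_version_response(method, version_id, versioning_enabled):
        # A non-null version-id against a never-versioned container has
        # nothing to find: 404 the reads, 204 the delete.
        if version_id in (None, 'null') or versioning_enabled:
            return None  # fall through to the normal path
        if method in ('GET', 'HEAD'):
            return 404
        if method == 'DELETE':
            return 204
        return None
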
Tim Burke
69b8165cd8 obj: _finalize_durable may succeed even when data file is missing
Ordinarily, an ENOENT in _finalize_durable should mean something's gone
off the rails -- we expected to be able to mark data durable, but
couldn't!

If there are concurrent writers, though, it might actually be OK:

   Client A writes .data
   Client B writes .data
   Client B finalizes .data *and cleans up on-disk files*
   Client A tries to finalize but .data is gone

Previously, the above would cause the object server to 500, and if
enough of them did this, the client may see a 503. Now, call it good so
clients get 201s.

Change-Id: I4e322a7be23870a62aaa6acee8435598a056c544
Closes-Bug: #1719860
2020-04-24 15:39:37 -07:00
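
A condensed sketch of the tolerance check (simplified; helper names and the directory scan are illustrative, not the diskfile code verbatim):

    import errno
    import os

    def finalize_durable(data_file_path, durable_data_file_path):
        try:
            os.rename(data_file_path, durable_data_file_path)
        except OSError as err:
            if err.errno != errno.ENOENT:
                raise
            # Our .data vanished. That's OK only if a concurrent writer
            # already finalized an equivalent durable file (Swift's
            # on-disk names sort by timestamp, so >= means "as new").
            target = os.path.basename(durable_data_file_path)
            listing = os.listdir(os.path.dirname(durable_data_file_path))
            if not any(f.endswith('.data') and f >= target
                       for f in listing):
                raise
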
Tim Burke
eae27412d2 sharding: Don't inject shard ranges when user says quit
When an operator does a `find_and_replace` on a DB that already has
shard ranges, they get a prompt like:

   This will delete existing 58 shard ranges.
   Do you want to show the existing ranges [s], delete the existing
   ranges [yes] or quit without deleting [q]?

Previously, if they selected `q`, we would skip the delete but still do
the merge (!) and immediately warn about how there are now invalid shard
ranges. Now, quit without merging.

Change-Id: I7d869b137a6fbade59bb8ba16e4f3e9663e18822
2020-04-18 00:46:00 +00:00
Zuul
a495f1e327 Merge "pep8: Turn on E305" 2020-04-10 11:55:07 +00:00
Zuul
2b7e80217d Merge "Allow clients to send quoted ETags for static links" 2020-04-10 00:18:59 +00:00
Zuul
3cceec2ee5 Merge "Update hacking for Python3" 2020-04-09 15:05:28 +00:00
Zuul
7a6357fdbb Merge "s3api: Propagate backend PATH_INFO in environ for other middleware" 2020-04-09 02:16:33 +00:00
Romain de Joux
415011e162 s3api: Propagate backend PATH_INFO in environ for other middleware
Use the swift.backend_path entry in the WSGI environment to propagate
the backend PATH_INFO.

This is needed by ceilometermiddleware to extract account/container
info from PATH_INFO; patch: https://review.opendev.org/#/c/718085/

Change-Id: Ifb3c6c30835d912c5ba4b2e03f2e0b5cb392671a
2020-04-08 11:55:54 +02:00
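
Downstream middleware can then recover the backend account/container along these lines (an illustrative consumer, not ceilometermiddleware itself):

    from swift.common.utils import split_path

    class MeteringMiddleware(object):
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            # s3api rewrites the user-facing path, so read the backend
            # path it recorded in the environ instead of PATH_INFO.
            path = environ.get('swift.backend_path', environ['PATH_INFO'])
            version, account, container, obj = split_path(path, 1, 4, True)
            # ... meter by account/container here ...
            return self.app(environ, start_response)
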
Zuul
62fc62bb12 Merge "py3: stop barfing on message/rfc822 Content-Types" 2020-04-07 08:59:05 +00:00
Zuul
7158adfb22 Merge "Replace all "with Chunk*Timeout" by a watchdog" 2020-04-04 07:53:26 +00:00
Zuul
f3c04afed3 Merge "Extend MemcacheRing.delete() API to manage server_key" 2020-04-03 20:52:55 +00:00
Zuul
ede407d05f Merge "Optimize obj replicator/reconstructor healthchecks" 2020-04-03 20:12:27 +00:00
Tim Burke
668242c422 pep8: Turn on E305
Change-Id: Ia968ec7375ab346a2155769a46e74ce694a57fc2
2020-04-03 21:22:38 +02:00
Andreas Jaeger
96b56519bf Update hacking for Python3
The repo now runs under both Python 2 and 3, so update hacking to
version 2.0, which supports both. Note that the latest hacking release,
3.0, only supports Python 3.

Fix problems found.

Remove hacking and friends from lower-constraints; they are not needed
for installation.

Change-Id: I9bd913ee1b32ba1566c420973723296766d1812f
2020-04-03 21:21:07 +02:00
Romain de Joux
bbea88cb1a Extend MemcacheRing.delete() API to manage server_key
MemcacheRing provides set_multi/get_multi to set/get a list of keys on
a single memcached server, selected in the ring by the server_key value.
But the current API doesn't allow deleting values saved with set_multi.

This change adds an optional server_key keyword to the delete() method,
allowing an entry to be deleted from the server selected by server_key
instead of by key.

Change-Id: I24c29540ee4b91adeb7b9f44fe84bc4d46f89218
2020-04-03 13:49:27 +02:00
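
Hypothetical usage of the extended API (server address and keys are made up):

    from swift.common.memcached import MemcacheRing

    ring = MemcacheRing(['127.0.0.1:11211'])
    # values stored via set_multi land on the server chosen by
    # server_key, so the delete must target that same server
    ring.set_multi({'stat-a': 1, 'stat-b': 2},
                   server_key='container/AUTH_test/c')
    ring.delete('stat-a', server_key='container/AUTH_test/c')
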
Romain LE DISEZ
8378a11d11 Replace all "with Chunk*Timeout" by a watchdog
The context manager eventlet.timeout.Timeout schedules a call to throw
an exception every time it is entered. The swift-proxy uses
Chunk(Read|Write)Timeout for every chunk read from the client or
object-server. For a single upload/download of a big object, that means
tens of thousands of schedulings in eventlet, which is very costly.

This patch replaces these context managers with a watchdog greenthread
that schedules itself by sleeping until the next timeout expiration.
Only if a timeout actually expired does it schedule a call to throw the
appropriate exception.

The gain on bandwidth and CPU usage is significant. In a benchmark
environment, it gave this result for an upload at 6 Gbps on a replica
policy (average of 3 runs):
    master: 5.66 Gbps / 849 jiffies consumed by the proxy-server
    this patch: 7.56 Gbps / 618 jiffies consumed by the proxy-server

Change-Id: I19fd42908be5a6ac5905ba193967cd860cb27a0b
2020-04-02 07:38:47 -04:00
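
The shape of the watchdog, per the description above (a condensed sketch of the idea, not Swift's exact code):

    import time

    import eventlet
    from eventlet.hubs import get_hub

    class Watchdog(object):
        # One greenthread tracks every pending timeout, instead of
        # eventlet scheduling a throw for every chunk.
        def __init__(self):
            self.timeouts = {}  # key -> (deadline, greenthread, exc class)
            eventlet.spawn(self._run)

        def start(self, timeout, exc):
            key = object()
            self.timeouts[key] = (time.time() + timeout,
                                  eventlet.greenthread.getcurrent(), exc)
            return key  # caller passes this to stop() once the chunk is done

        def stop(self, key):
            self.timeouts.pop(key, None)

        def _run(self):
            while True:
                now = time.time()
                for key, (deadline, gth, exc) in list(self.timeouts.items()):
                    if deadline <= now:
                        del self.timeouts[key]
                        # only an expired timeout costs a scheduled throw
                        get_hub().schedule_call_global(0, gth.throw, exc())
                pending = [d for d, _, _ in self.timeouts.values()]
                # sleep until the next expiration (or idle-poll)
                eventlet.sleep(max(0.0, min(pending) - now) if pending else 0.5)
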
Romain LE DISEZ
804776b379 Optimize obj replicator/reconstructor healthchecks
The DaemonStrategy class calls the Daemon.is_healthy() method every 0.1
seconds to ensure that all workers are running as wanted.

On the object replicator/reconstructor daemons, is_healthy() checks
whether the rings changed to decide if workers must be created/killed.
With large rings, this operation can be CPU intensive, especially on
low-end CPUs.

This patch:
- increases the check interval to 5 seconds by default, because none of
  these daemons is critical for performance (they are not in the
  datapath), but allows each daemon to change this value if necessary
- ensures that, before recomputing all devices in the ring, the object
  replicator/reconstructor checks that the ring really changed (by
  checking the mtime of the ring.gz files)

On an Atom N2800 processor, this patch reduced the CPU usage of the main
object replicator/reconstructor from 70% of a core to 0%.

Change-Id: I2867e2be539f325778e2f044a151fd0773a7c390
2020-04-01 08:03:32 -04:00
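
The cheap ring-change test reduces to an mtime comparison, roughly (an illustrative helper, not the daemon's actual method):

    import os

    def rings_changed(ring_paths, last_mtimes):
        # Compare ring.gz mtimes before paying for a full recomputation
        # of all devices in the ring.
        changed = False
        for path in ring_paths:
            mtime = os.path.getmtime(path)
            if last_mtimes.get(path) != mtime:
                last_mtimes[path] = mtime
                changed = True
        return changed
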
Romain LE DISEZ
3061ec803f relinker: Improve performance by limiting I/O
This commit reduces the number of I/Os done by the swift-object-relinker.

First, it saves a progress state of relinking and cleanup in case the
process is interrupted during the operation. This allows resuming the
operation without rescanning all partitions.

Secondly, it prevents relink and cleanup from scanning partitions bigger
than 2^part_power (or (2^next_part_power)/2). These partitions did not
exist before the part_power increase began, so there is nothing in them
to relink or clean up.

Thirdly, it reverse-orders the partitions to scan so that some useless
work is avoided. If a device contains partitions 1 and 3, relinking
partition 1 will create "new" objects in partition 3 that would need to
be scanned again when the relinker works on partition 3. If partition 3
is done first, it will only contain the objects that need to be
relinked.

Fourthly, it allows specifying a single device to work on.

To do that, hooks were added in audit_location_generator to allow custom
code to run before/after iterating a device/partition/suffix/hash.

Change-Id: If1bf8ed9036fb0ec619b0d4f16061a81a1af2082
2020-03-31 17:33:06 -04:00
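
The second and third points reduce to a filter plus a reverse sort, roughly (a sketch with illustrative names):

    def partitions_to_process(found_parts, next_part_power):
        # Partitions at or above 2**part_power (== 2**next_part_power / 2)
        # can only have been created by the part-power increase itself,
        # so skip them; process the rest in reverse order so freshly
        # relinked objects aren't rescanned.
        cutoff = 2 ** next_part_power // 2
        return sorted((p for p in found_parts if p < cutoff), reverse=True)
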
Romain LE DISEZ
d361e5febf Make the wsgi server use systemd's NOTIFY_SOCKET
Change-Id: Ice224fc2a6ba0150be180955037c13fc90365479
2020-03-31 15:22:48 -04:00
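
The systemd notification protocol is small enough to sketch directly (this shows the protocol itself, not necessarily Swift's helper):

    import os
    import socket

    def sd_notify(message=b'READY=1'):
        # Send a datagram to the unix socket named by NOTIFY_SOCKET;
        # a leading '@' denotes an abstract socket.
        path = os.environ.get('NOTIFY_SOCKET')
        if not path:
            return
        if path.startswith('@'):
            path = '\0' + path[1:]
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            sock.connect(path)
            sock.sendall(message)
        finally:
            sock.close()
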
Tim Burke
04cc11b938 py3: stop barfing on message/rfc822 Content-Types
Closes-Bug: #1863053
Change-Id: I7493d3e201e26df9f200e16bc081d8a0f30308b9
2020-03-26 12:55:54 -07:00
Zuul
712bf3c9fb Merge "ring: Flag region, zone, and device as required in add_dev" 2020-03-26 00:51:19 +00:00
Tim Burke
dc424f593d Allow clients to send quoted ETags for static links
Change-Id: I29c62d28311fd0c2bc6394e03153689523a5959d
2020-03-20 20:16:12 -05:00
Tim Burke
821b964166 ring: Flag region, zone, and device as required in add_dev
They effectively already *were*, but if you used the RingBuilder API
directly (rather than the CLI) you could previously write down builders
that would hit KeyErrors on load.

Change-Id: I1de895d4571f7464be920345881789d47659729f
2020-03-06 22:25:21 -06:00
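
For reference, a direct-API call that now fails fast when region/zone/device are missing (device values are illustrative):

    from swift.common.ring import RingBuilder

    builder = RingBuilder(part_power=10, replicas=3, min_part_hours=1)
    builder.add_dev({'region': 1, 'zone': 1, 'ip': '127.0.0.1',
                     'port': 6200, 'device': 'sdb1', 'weight': 100.0})
    # omitting 'region', 'zone', or 'device' now raises up front instead
    # of producing a builder that hits KeyErrors on load
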
Zuul
9d4dc29fb3 Merge "Apply limit to list versioned containers" 2020-03-07 01:17:03 +00:00
Zuul
71cc368179 Merge "sharding: filter shards based on prefix param when listing" 2020-03-05 06:53:05 +00:00
Zuul
99e8feb48d Merge "Fix up some Content-Type handling in account/container listings" 2020-03-04 07:45:00 +00:00
Zuul
efe084adab Merge "Use float consistently for proxy timeout settings" 2020-03-04 07:29:42 +00:00
Zuul
fc46a763d2 Merge "middlewares: Clean up app iters better" 2020-03-03 21:53:51 +00:00
Clay Gerrard
f2ffd90059 Apply limit to list versioned containers
Change-Id: I28e062273d673c4f07cd3c5da088aa790b77a599
Closes-Bug: #1863841
2020-03-03 11:27:21 -08:00
Clay Gerrard
dc40779307 Use float consistently for proxy timeout settings
Change-Id: I433c97df99193ec31c863038b9b6fd20bb3705b8
2020-03-02 10:44:48 -06:00