528 Commits

Author SHA1 Message Date
Zuul
cf18e1f47b Merge "sharding: Cache shard ranges for object writes" 2019-07-13 00:34:06 +00:00
Tim Burke
a1af3811a7 sharding: Cache shard ranges for object writes
Previously, we issued a GET to the root container for every object PUT,
POST, and DELETE. This puts load on the container server, potentially
leading to timeouts, error limiting, and erroneous 404s (!).

Now, cache the complete set of 'updating' shards, and find the shard for
this particular update in the proxy. Add a new config option,
recheck_updating_shard_ranges, to control the cache time; it defaults to
one hour. Set to 0 to fall back to previous behavior.
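
For example, a proxy-server.conf sketch (the section layout and the
one-hour default expressed in seconds are assumptions here):

    [app:proxy-server]
    use = egg:swift#proxy
    # cache the root container's 'updating' shard ranges for object
    # updates; 0 falls back to a GET per PUT/POST/DELETE
    recheck_updating_shard_ranges = 3600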

Note that we should be able to tolerate stale shard data just fine; we
already have to worry about async pendings that got written down with
one shard but may not get processed until that shard has itself sharded
or shrunk into another shard.

Also note that memcache has a default value limit of 1MiB, which may be
exceeded if a container has thousands of shards. In that case, set()
will act like a delete(), causing increased memcache churn but otherwise
preserving existing behavior. In the future, we may want to add support
for gzipping the cached shard ranges as they should compress well.

Change-Id: Ic7a732146ea19a47669114ad5dbee0bacbe66919
Closes-Bug: 1781291
2019-07-11 10:40:38 -07:00
zengjia
0ae1ad63c1 Update auth_url in install docs
Beginning with the Queens release, the keystone install guide
recommends running all interfaces on the same port. This patch
updates the swift install guide to reflect that change.

Change-Id: Id00cfd2c921da352abdbbbb6668b921f3cb31a1a
Closes-bug: #1754104
2019-07-11 15:03:16 +08:00
Zuul
e62f07d988 Merge "py3: port staticweb and domain_remap func tests" 2019-07-11 05:40:19 +00:00
Zuul
9367bff8fc Merge "py3: add swift-dsvm-functional-py3 job" 2019-07-11 05:01:59 +00:00
Tim Burke
9d1b749740 py3: port staticweb and domain_remap func tests
Drive-by: Tighten domain_remap assertions on listings, which required
that we fix proxy pipeline placement. Add a note about it to the sample
config.

Change-Id: I41835148051294088a2c0fb4ed4e7a7b61273e5f
2019-07-10 09:51:38 -07:00
Tim Burke
345f577ff1 s3token: fix conf option name
Related-Change: Ica740c28b47aa3f3b38dbfed4a7f5662ec46c2c4
Change-Id: I71f411a2e99fa8259b86f11ed29d1b816ff469cb
2019-07-03 07:28:36 -07:00
Tim Burke
4f7c44a9d7 Add information about secret_cache_duration to sample config
Related-Change-Id: Id0c01da6aa6ca804c8f49a307b5171b87ec92228
Change-Id: Ica740c28b47aa3f3b38dbfed4a7f5662ec46c2c4
2019-07-02 18:43:59 +00:00
Tim Burke
39a54fecdc py3: add swift-dsvm-functional-py3 job
Note that keystone wants to stick some UTF-8 encoded bytes into
memcached, but we want to store it as JSON... or something?

Also, make sure we can hit memcache for containers with invalid UTF-8.
Although maybe it'd be better to catch that before we ever try memcache?

Change-Id: I1fbe133c8ec73ef6644ecfcbb1931ddef94e0400
2019-06-21 22:31:18 -07:00
Clay Gerrard
34bd4f7fa3 Clarify usage of dequeue_from_legacy option
Change-Id: Iae9aa7a91b9afc19cb8613b5bc31de463b853dde
2019-05-05 03:20:34 +00:00
Kazuhiro MIYAHARA
443f029a58 Enable to configure object-expirer in object-server.conf
To prepare for object-expirer's general task queue feature [1],
this patch makes it possible to configure the object-expirer in
object-server.conf. Object-expirer.conf can still be used in the same
manner as before, but is deprecated.

If a node has both an object-server.conf with an "object-expirer"
section and an object-expirer.conf, only object-server.conf is used.
Object-expirer.conf is used only when no object-server.conf has an
"object-expirer" section.

There are two differences between "object-expirer.conf" style and
"object-server.conf" style.

The first difference is the `dequeue_from_legacy` default value.
`dequeue_from_legacy` defines the task queue mode. In "object-expirer.conf"
style, the default mode is the legacy queue. In "object-server.conf" style,
the default mode is the general queue. For now, general mode is effectively
a no-op, because the general task queue is not implemented yet.

The second difference is the internal client config. In
"object-expirer.conf" style, the internal client's config file is the
object-expirer.conf itself. In "object-server.conf" style, the internal
client uses a separate config file.
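
As a sketch, the newer style looks something like this (values are
illustrative):

    # object-server.conf
    [object-expirer]
    # keep draining the legacy queue during the transition; in this style
    # the default is the (currently no-op) general queue
    dequeue_from_legacy = true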

[1]: https://review.openstack.org/#/c/517389/

Co-Authored-By: Matthew Oliver <matt@oliver.net.au>

Change-Id: Ib21568f9b9d8547da87a99d65ae73a550e9c3230
2019-05-04 15:45:02 +00:00
Gilles Biannic
a4cc353375 Make log format for requests configurable
Add the log_msg_template option in proxy-server.conf and log_format in
a/c/o-server.conf. It is a string parsable by Python's format()
function. Some fields containing user data might be anonymized by using
log_anonymization_method and log_anonymization_salt.
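
A hedged proxy-server.conf sketch (the template fields and salt shown here
are illustrative):

    [app:proxy-server]
    use = egg:swift#proxy
    # parsed with Python's str.format(); pick only the fields you need
    log_msg_template = {client_ip} {remote_addr} {method} {path} {status_int}
    # anonymize fields containing user data
    log_anonymization_method = MD5
    log_anonymization_salt = please-change-me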

Change-Id: I29e30ef45fe3f8a026e7897127ffae08a6a80cd9
2019-05-02 17:43:25 -06:00
Tim Burke
d748851766 s3token: Add note about config change when upgrading from swift3
Change-Id: I2610cbdc9b7bc2b4d614eaedb4f3369d7a424ab3
2019-03-05 14:50:22 -08:00
Clay Gerrard
ea8e545a27 Rebuild frags for unmounted disks
Change the behavior of the EC reconstructor to perform a fragment
rebuild to a handoff node when a primary peer responds with 507 to the
REPLICATE request.

Each primary node in an EC ring will sync with exactly three primary
peers; in addition to the left & right nodes we now select a third node
from the far side of the ring.  If any of these partners respond
unmounted, the reconstructor will rebuild its fragments to a handoff
node with the appropriate index.

To prevent ssync (which is uninterruptible) from receiving a 409
(Conflict) we must give the remote handoff node the correct backend_index
for the fragments it will receive.  In the common case we will use
deterministically different handoffs for each fragment index to prevent
multiple unmounted primary disks from forcing a single handoff node to
hold more than one rebuilt fragment.

Handoff nodes will continue to attempt to revert rebuilt handoff
fragments to the appropriate primary until it is remounted or
rebalanced.  After a rebalance of EC rings (potentially removing
unmounted/failed devices), it's most IO efficient to run in
handoffs_only mode to avoid unnecessary rebuilds.
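
For example, after such a rebalance one might temporarily run the
reconstructor like this (object-server.conf sketch; remember to disable
it again once handoffs drain):

    [object-reconstructor]
    # revert handoff fragments only; skip rebuilds of primary partitions
    handoffs_only = true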

Closes-Bug: #1510342

Change-Id: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec
2019-02-08 18:04:55 +00:00
Zuul
3043c54f28 Merge "s3api: Allow concurrent multi-deletes" 2018-12-08 10:05:39 +00:00
Tim Burke
00be3f595e s3api: Allow concurrent multi-deletes
Previously, a thousand-item multi-delete request would consider each
object to delete serially, and not start trying to delete one until the
previous was deleted (or hit an error).

Now, allow operators to configure a concurrency factor to allow multiple
deletes at the same time.

Default the concurrency to 2, like we did for slo and bulk.
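
A proxy-server.conf sketch (the option name multi_delete_concurrency is
an assumption here):

    [filter:s3api]
    use = egg:swift#s3api
    # number of objects deleted concurrently per multi-delete request
    multi_delete_concurrency = 2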

See also: http://lists.openstack.org/pipermail/openstack-dev/2016-May/095737.html

Change-Id: If235931635094b7251e147d79c8b7daa10cdcb3d
Related-Change: I128374d74a4cef7a479b221fd15eec785cc4694a
2018-12-06 23:20:52 +00:00
Tim Burke
692a03473f s3api: Change default location to us-east-1
This is more likely to be the default region that a client would try for
v4 signatures.

UpgradeImpact:
==============

Deployers with clusters that relied on the old implicit default
location of US should explicitly set

    location = US

in the [filter:s3api] section of proxy-server.conf before upgrading.

Change-Id: Ib6659a7ad2bd58d711002125e7820f6e86383be8
2018-11-12 11:04:20 -08:00
Clay Gerrard
06cf5d298f Add databases_per_second to db daemons
Most daemons have a "go as fast as you can then sleep for 30 seconds"
strategy towards resource utilization; the object-updater and
object-auditor however have some "X_per_second" options that allow
operators much better control over how they spend their I/O budget.

This change extends that pattern into the account-replicator,
container-replicator, and container-sharder which have been known to peg
CPUs when they're not IO limited.
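
For instance (the value shown is illustrative):

    [container-sharder]
    # throttle DB visits so the daemon doesn't peg a CPU
    databases_per_second = 50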

Partial-Bug: #1784753
Change-Id: Ib7f2497794fa2f384a1a6ab500b657c624426384
2018-10-30 22:28:05 +00:00
Zuul
5cc4a72c76 Merge "Configure diskfile per storage policy" 2018-09-27 00:19:32 +00:00
Alistair Coles
904e7c97f1 Add more doc and test for cors_expose_headers option
In follow-up to the related change, mention the new
cors_expose_headers option (and other proxy-server.conf
options) in the CORS doc.
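
A proxy-server.conf sketch (the header names are illustrative):

    [app:proxy-server]
    use = egg:swift#proxy
    # extra headers exposed to CORS clients via Access-Control-Expose-Headers
    cors_expose_headers = x-my-custom-header, x-another-header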

Add a test for the cors options being loaded into the
proxy server.

Improve CORS comments in docs.

Change-Id: I647d8f9e9cbd98de05443638628414b1e87d1a76
Related-Change: I5ca90a052f27c98a514a96ee2299bfa1b6d46334
2018-09-17 12:35:25 -07:00
Zuul
5d46c0d8b3 Merge "Adding keep_idle config value to socket" 2018-09-15 00:43:52 +00:00
FatemaKhalid
cfeb32c66b Adding keep_idle config value to socket
Users can configure the KEEPIDLE time for sockets in a TCP connection.
The default value is the previously hard-coded value, 600.
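
For example (placing the option in the server's [DEFAULT] section is an
assumption here):

    [DEFAULT]
    # seconds a TCP connection may sit idle before keepalive probes start
    keep_idle = 600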

Change-Id: Ib7fb166deb8a87ae4e97ba0671048b1ec079a2ef
Closes-Bug:1759606
2018-09-15 01:30:53 +02:00
Tim Burke
5a8cfd6e06 Add another user for s3api func tests
Previously we'd use two users, one admin and one unprivileged.

Ceph's s3-tests, however, assume that both users should have access to
create buckets. Further, there are different errors that may be returned
depending on whether you are the *bucket* owner or not when using
s3_acl. So now we've got:

  test:tester1  (admin)
  test:tester2  (also admin)
  test:tester3  (unprivileged)

Change-Id: I0b67c53de3bcadc2c656d86131fca5f2c3114f14
2018-09-14 13:33:51 +00:00
Romain LE DISEZ
673fda7620 Configure diskfile per storage policy
With this commit, each storage policy can define the diskfile to use to
access objects. Selection of the diskfile is done in swift.conf.

Example:
    [storage-policy:0]
    name = gold
    policy_type = replication
    default = yes
    diskfile = egg:swift#replication.fs

The diskfile configuration item accepts the same format as middleware
declarations: [[scheme:]egg_name#]entry_point
The egg_name is optional and defaults to "swift". The scheme is optional
and defaults to the only valid value "egg". The upstream entry points are
"replication.fs" and "erasure_coding.fs".

Co-Authored-By: Alexandre Lécuyer <alexandre.lecuyer@corp.ovh.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I070c21bc1eaf1c71ac0652cec9e813cadcc14851
2018-08-24 02:29:13 +00:00
Alistair Coles
2722e49a8c Add support for multiple root encryption secrets
For some use cases operators would like to periodically introduce a
new encryption root secret that would be used when new object data is
written. However, existing encrypted data does not need to be
re-encrypted with keys derived from the new root secret. Older root
secret(s) would still be used as necessary to decrypt older object
data.

This patch modifies the KeyMaster class to support multiple root
secrets indexed via unique secret_id's, and to store the id of the
root secret used for an encryption operation in the crypto meta. The
decrypter is modified to fetch appropriate keys based on the secret id
in retrieved crypto meta.
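
A hedged proxy-server.conf sketch (the suffix-based naming and the
active_root_secret_id option are assumptions here; secrets are placeholders):

    [filter:keymaster]
    use = egg:swift#keymaster
    # older secret kept so existing data can still be decrypted
    encryption_root_secret = <old 44-char base64 secret>
    # new secret, selected for all new writes by its secret_id
    encryption_root_secret_2018 = <new 44-char base64 secret>
    active_root_secret_id = 2018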

The changes are backwards compatible with previous crypto middleware
configurations and existing encrypted object data.

Change-Id: I40307acf39b6c1cc9921f711a8da55d03924d232
2018-08-17 17:54:30 +00:00
Zuul
00373dad61 Merge "Add keymaster to fetch root secret from KMIP service" 2018-07-25 03:49:50 +00:00
Samuel Merritt
8e651a2d3d Add fallocate_reserve to account and container servers.
The object server can be configured to leave a certain amount of disk
space free; default is 1%. This is useful in avoiding 100%-full
filesystems, as those can get Swift in a state where the filesystem is
too full to write tombstones, so you can't delete objects to free up
space.

When a cluster has accounts/containers and objects on the same disks,
then you can wind up with a 100%-full disk since account and container
servers don't respect fallocate_reserve. This commit makes account and
container servers respect fallocate_reserve so that disks shared
between account/container and object rings won't get 100% full.

When a disk's free space falls below the configured reserve, account
and container PUT, POST, and REPLICATE requests will fail with a 507
status code. These are the operations that can significantly increase
the disk space used by a given database.
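
For example, in account-server.conf or container-server.conf (the 1%
value mirrors the object server's default and is illustrative):

    [app:container-server]
    # return 507 for PUT, POST and REPLICATE once free space drops below this
    fallocate_reserve = 1%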

I called the parameter "fallocate_reserve" for consistency with the
object server. No actual fallocate() call happens under Swift's
control in the account or container servers (sqlite3 might make such a
call, but it's out of our hands).

Change-Id: I083442eef14bf83c0ea717b1decb3e6b56dbf1d0
2018-07-18 17:27:11 +10:00
Alistair Coles
1951dc7e9a Add keymaster to fetch root secret from KMIP service
Add a new middleware that can be used to fetch an encryption root
secret from a KMIP service. The middleware uses a PyKMIP client
to interact with a KMIP endpoint. The middleware is configured with
a unique identifier for the key to be fetched and options required
for the PyKMIP client.
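
A hedged sketch of such a filter section (the option names and values are
illustrative assumptions; see the middleware docs for the real set):

    [filter:kmip_keymaster]
    use = egg:swift#kmip_keymaster
    # unique identifier of the root-secret key stored on the KMIP service
    key_id = 1234567890
    host = kmip.example.com
    port = 5696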

Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: Ib0943fb934b347060fc66c091673a33bcfac0a6d
2018-07-03 09:00:21 +01:00
Zuul
ea33638d0c Merge "object-updater: add concurrent updates" 2018-06-14 20:37:06 +00:00
Samuel Merritt
d5c532a94e object-updater: add concurrent updates
The object updater now supports two configuration settings:
"concurrency" and "updater_workers". The latter controls how many
worker processes are spawned, while the former controls how many
concurrent container updates are performed by each worker
process. This should speed the processing of async_pendings.

There is a change to the semantics of the configuration
options. Previously, "concurrency" controlled the number of worker
processes spawned, and "updater_workers" did not exist. I switched the
meanings for consistency with other configuration options. In the
object reconstructor, object replicator, object server, object
expirer, container replicator, container server, account replicator,
account server, and account reaper, "concurrency" refers to the number
of concurrent tasks performed within one process (for reference, the
container updater and object auditor use "concurrency" to mean number
of processes).

On upgrade, a node configured with concurrency=N will still handle
async updates N-at-a-time, but will do so using only one process
instead of N.

UpgradeImpact:

If you have a config file like this:

    [object-updater]
    concurrency = <N>

and you want to take advantage of faster updates, then do this:

    [object-updater]
    concurrency = 8  # the default; you can omit this line
    updater_workers = <N>

If you want updates to be processed exactly as before, do this:

    [object-updater]
    concurrency = 1
    updater_workers = <N>

Change-Id: I17e18088e61f664e1b9942d66423666d0cae1689
2018-06-13 17:39:34 -07:00
Zuul
c01c43d982 Merge "Adds read_only middleware" 2018-06-07 06:49:26 +00:00
Greg Lange
5d601b78f3 Adds read_only middleware
This patch adds a read_only middleware to swift. It gives the ability
to make an entire cluster or individual accounts read only.
When a cluster or an account is in read-only mode, requests that would
result in writes to the cluster are not allowed.
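
A proxy-server.conf sketch (pipeline placement not shown):

    [filter:read_only]
    use = egg:swift#read_only
    # reject write requests cluster-wide; individual accounts can be
    # handled separately
    read_only = true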

DocImpact

Change-Id: I7e0743aecd60b171bbcefcc8b6e1f3fd4cef2478
2018-05-30 03:26:36 +00:00
Thiago da Silva
36dbd38e48 Add s3api headers to allowed_headers by default
Previously, these headers had to be added by operators to their
object-server.conf when enabling swift3 middleware. Since s3api
is now imported into swift we should go ahead and add these headers
by default too.

Change-Id: Ib82e175096716e42aecdab48f01f079e09da6a1d
Signed-off-by: Thiago da Silva <thiago@redhat.com>
2018-05-29 16:02:50 -04:00
Darrell Bishop
661838d968 Add support for PROXY protocol v1 (only)
...to the proxy-server.

The point is to allow the Swift proxy server to log accurate
client IP addresses when there is a proxy or SSL-terminator between the
client and the Swift proxy server.  Example servers supporting this
PROXY protocol:
  stud (v1 only)
  stunnel
  haproxy
  hitch (v2 only)
  varnish

See http://www.haproxy.org/download/1.7/doc/proxy-protocol.txt

The feature is enabled by adding this to your proxy config file:

  [app:proxy-server]
  use = egg:swift#proxy
  ...
  require_proxy_protocol = true

The protocol specification states:

  The receiver MUST be configured to only receive the protocol
  described in this specification and MUST not try to guess
  whether the protocol header is present or not.

so valid deployments are:

  1) require_proxy_protocol = false  (or missing; default is false)
     and NOT behind a proxy that adds or proxies existing PROXY lines.
  2) require_proxy_protocol = true
     and IS behind a proxy that adds or proxies existing PROXY lines.

Specifically, in the default configuration, one cannot send the swift
proxy PROXY lines (no change from before this patch).  When this
feature is enabled, one _must_ send PROXY lines.

Change-Id: Icb88902f0a89b8d980c860be032d5e822845d03a
2018-05-23 18:10:40 -07:00
Matthew Oliver
2641814010 Add sharder daemon, manage_shard_ranges tool and probe tests
The sharder daemon visits container dbs and when necessary executes
the sharding workflow on the db.

The workflow is, in overview:

- perform an audit of the container for sharding purposes.

- move any misplaced objects that do not belong in the container
  to their correct shard.

- move shard ranges from FOUND state to CREATED state by creating
  shard containers.

- move shard ranges from CREATED to CLEAVED state by cleaving objects
  to shard dbs and replicating those dbs. By default this is done in
  batches of 2 shard ranges per visit.

Additionally, when the auto_shard option is True (NOT yet recommended
in production), the sharder will identify shard ranges for containers
that have exceeded the threshold for sharding, and will also manage
the sharding and shrinking of shard containers.
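
A container-sharder sketch reflecting those defaults (section placement in
container-server.conf and the cleave_batch_size name are assumptions here):

    [container-sharder]
    # automatic identification and management of shard ranges; leave off
    # in production for now
    auto_shard = false
    # shard ranges cleaved per visit to a sharding container
    cleave_batch_size = 2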

The manage_shard_ranges tool provides a means to manually identify
shard ranges and merge them to a container in order to trigger
sharding. This is currently the recommended way to shard a container.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

Change-Id: I7f192209d4d5580f5a0aa6838f9f04e436cf6b1f
2018-05-18 18:48:13 +01:00
Zuul
3313392462 Merge "Import swift3 into swift repo as s3api middleware" 2018-04-30 16:00:56 +00:00
Kota Tsuyuzaki
636b922f3b Import swift3 into swift repo as s3api middleware
This attempts to import the openstack/swift3 package into the swift
upstream repository and namespace. This is mostly a straightforward port,
except for the following items.

1. Rename swift3 namespace to swift.common.middleware.s3api
1.1 Rename also some conflicted class names (e.g. Request/Response)

2. Port unittests to test/unit/s3api dir to be able to run on the gate.

3. Port functests to test/functional/s3api and setup in-process testing

4. Port docs to doc dir, then address the namespace change.

5. Use get_logger() instead of global logger instance

6. Avoid global conf instance

Plus: fix various minor issues in those steps (e.g. packages, dependencies,
  deprecated things).

The details and patch references in the work on feature/s3api are listed
at https://trello.com/b/ZloaZ23t/s3api (completed board)

Note that, because this is just a port, no new features have been developed
since the last swift3 release. In future work, Swift upstream may continue to
work on the remaining items for further improvements and the best possible
compatibility with Amazon S3. Please read the new docs for your deployment
and keep track of what may change in future releases.
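
For deployments switching from the standalone package, the filter section
now points at swift's own entry point, roughly as follows (old line shown
for contrast; exact pipeline ordering omitted):

    [filter:s3api]
    # was: use = egg:swift3#swift3
    use = egg:swift#s3api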

Change-Id: Ib803ea89cfee9a53c429606149159dd136c036fd
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
2018-04-27 15:53:57 +09:00
Zuul
47efb5b969 Merge "Multiprocess object replicator" 2018-04-25 00:41:21 +00:00
Samuel Merritt
c28004deb0 Multiprocess object replicator
Add a multiprocess mode to the object replicator. Setting the
"replicator_workers" setting to a positive value N will result in the
replicator using up to N worker processes to perform replication
tasks.

At most one worker per disk will be spawned, so one can set
replicator_workers=99999999 to always get one worker per disk
regardless of the number of disks in each node. This is the same
behavior that the object reconstructor has.
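
An object-server.conf sketch:

    [object-replicator]
    # spawn up to this many worker processes, capped at one per disk
    replicator_workers = 4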

Worker process logs will have a bit of information prepended so
operators can tell which messages came from which worker. It looks
like this:

  [worker 1/2 pid=16529] 154/154 (100.00%) partitions replicated in 1.02s (150.87/sec, 0s remaining)

The prefix is "[worker M/N pid=P] ", where M is the worker's index, N
is the total number of workers, and P is the process ID. Every message
from the replicator's logger will have the prefix; this includes
messages from down in diskfile, but does not include things printed to
stdout or stderr.

Drive-by fix: don't dump recon stats when replicating only certain
policies. When running the object replicator with replicator_workers >
0 and "--policies=X,Y,Z", the replicator would update recon stats
after running. Since it only ran on a subset of objects, it should not
update recon, much like it doesn't update recon when run with
--devices or --partitions.

Change-Id: I6802a9ad9f1f9b9dafb99d8b095af0fdbf174dc5
2018-04-24 04:05:08 +00:00
wangqi
708b24aef1 Deprecate auth_uri option
Option auth_uri from group keystone_authtoken is deprecated[1].
Use option www_authenticate_uri from group keystone_authtoken.
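
A proxy-server.conf sketch (the URL is a placeholder):

    [filter:authtoken]
    # deprecated:
    #auth_uri = http://keystone.example.com:5000
    www_authenticate_uri = http://keystone.example.com:5000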

[1]https://review.openstack.org/#/c/508522/

Change-Id: I43bbc8b8c986e54a9a0829a0631d78d4077306f8
2018-04-18 02:07:11 +00:00
melissaml
3bc267d10c fix a typo in documentation
Change-Id: I0492ae1d50493585ead919904d6d9502b7738266
2018-03-23 07:29:02 +08:00
Samuel Merritt
47fed6f2f9 Add handoffs-only mode to DB replicators.
The object reconstructor has a handoffs-only mode that is very useful
when a cluster requires rapid rebalancing, like when disks are nearing
fullness. This mode's goal is to remove handoff partitions from disks
without spending effort on primary partitions. The object replicator
has a similar mode, though it varies in some details.

This commit adds a handoffs-only mode to the account and container
replicators.
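
A container-server.conf sketch (the option name is assumed to mirror the
reconstructor's; re-enable normal replication once handoffs drain):

    [container-replicator]
    handoffs_only = true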

Change-Id: I588b151ee65ae49d204bd6bf58555504c15edf9f
Closes-Bug: 1668399
2018-02-16 16:56:13 -08:00
Zuul
82844a3211 Merge "Add support for data segments to SLO and SegmentedIterable" 2018-02-01 12:52:55 +00:00
Zuul
bf172e2936 Merge "tempurl: Make the digest algorithm configurable" 2018-02-01 03:51:06 +00:00
Tim Burke
5a4d3bdfc4 tempurl: Make the digest algorithm configurable
... and add support for SHA-256 and SHA-512 by default. This allows us
to start moving toward replacing SHA-1-based signatures. We've known
this would eventually be necessary for a while [1], and earlier this
year we've seen SHA-1 collisions [2].

Additionally, allow signatures to be base64-encoded, provided they start
with a digest name followed by a colon. Trailing padding is optional for
base64-encoded signatures, and both normal and "url-safe" modes are
supported. For example, all of the following SHA-1 signatures are
equivalent:

   da39a3ee5e6b4b0d3255bfef95601890afd80709
   sha1:2jmj7l5rSw0yVb/vlWAYkK/YBwk=
   sha1:2jmj7l5rSw0yVb/vlWAYkK/YBwk
   sha1:2jmj7l5rSw0yVb_vlWAYkK_YBwk=
   sha1:2jmj7l5rSw0yVb_vlWAYkK_YBwk

(Note that "normal" base64 encodings will require that you url encode
all "+" characters as "%2B" so they aren't misinterpretted as spaces.)

This was done for two reasons:

   1. A hex-encoded SHA-512 is rather lengthy at 128 characters -- 88
      isn't *that* much better, but it's something.
   2. This will allow us to more easily add support for different
      digests with the same bit length in the future.

Base64-encoding is required for SHA-512 signatures; hex-encoding is
supported for SHA-256 signatures so we aren't needlessly breaking from
what Rackspace is doing.
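
A proxy-server.conf sketch (the allowed_digests option name is an
assumption here):

    [filter:tempurl]
    use = egg:swift#tempurl
    # digests accepted in tempurl signatures; drop sha1 to retire it
    allowed_digests = sha256 sha512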

[1] https://www.schneier.com/blog/archives/2012/10/when_will_we_se.html
[2] https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html

Change-Id: Ia9dd1a91cc3c9c946f5f029cdefc9e66bcf01046
Related-Bug: #1733634
2018-01-31 02:19:18 +00:00
Joel Wright
11bf9e4588 Add support for data segments to SLO and SegmentedIterable
This patch updates the SLO middleware and SegmentedIterable to add
support for user-specified inlined-data segments. Such segments will
contain base64-encoded data to be added before/after an object-backed
segment within an SLO. To accommodate the potential extra data we
increase the default SLO maximum manifest size from 2MiB to 8MiB.
The default maximum number of segments remains 1000, but this will
only be enforced for object-backed segments.
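
A proxy-server.conf sketch of the new defaults (option names assumed):

    [filter:slo]
    use = egg:swift#slo
    # raised from 2 MiB so manifests have room for inlined data segments
    max_manifest_size = 8388608
    # still enforced, but only against object-backed segments
    max_manifest_segments = 1000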

This patch is a prerequisite for a future patch enabling the
download of large objects as tarballs. The TLO patch will be added
as a dependent patch later.

UpgradeImpact
=============
During a rolling upgrade, an updated proxy may write a manifest that
out-of-date proxies will not be able to read. This will resolve itself
once the upgrade completes on all nodes.

Change-Id: Ib8dc216a84d370e6da7d6b819af79582b671d699
2018-01-31 02:13:22 +00:00
Zuul
9ae5de09e8 Merge "fix barbican integration" 2018-01-19 11:40:54 +00:00
Zuul
17eb570a6c Merge "Improve object-updater's stats logging" 2018-01-18 19:21:20 +00:00
Samuel Merritt
f64c00b00a Improve object-updater's stats logging
The object updater has five different stats, but its logging only told
you two of them (successes and failures), and it only told you after
finishing all the async_pendings for a device. If you have a cluster
that's been sick and has millions upon millions of async_pendings
laying around, then your object-updaters are frustratingly
silent. I've seen one cluster with around 8 million async_pendings per
disk where the object-updaters only emitted stats every 12 hours.

Yes, if you have StatsD logging set up properly, you can go look at
your graphs and get real-time feedback on what it's doing. If you
don't have that, all you get is a frustrating silence.

Now, the object updater tells you all of its stats (successes,
failures, quarantines due to bad pickles, unlinks, and errors), and it
tells you incremental progress every five minutes. The logging at the
end of a pass remains and has been expanded to also include all stats.

Also included is a small change to what counts as an error: unmounted
drives no longer do. The goal is that only abnormal things count as
errors, like permission problems, malformed filenames, and so
on. These are things that should never happen, but if they do, may
require operator intervention. Drives fail, so logging an error upon
encountering an unmounted drive is not useful.

Change-Id: Idbddd507f0b633d14dffb7a9834fce93a10359ab
2018-01-17 13:59:23 -08:00
Alistair Coles
6e394bba0a Add request_tries option to object-expirer.conf-sample
...and update the object-expirer man page.
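
For example (the value of 3 is an assumption):

    [object-expirer]
    # attempts the expirer's internal client makes per request
    request_tries = 3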

Change-Id: Idca1b8e3b7d5b40481af0d60477510e2557b88c0
2018-01-15 15:29:11 +00:00