Fixes bug 1104708
There could be severe performance drop for swift is one disk of one
storage node is problematic due to the tragic state of async disk I/O.
This patch provided PUT timing per kB transfered (ms/kB) monitoring
support for each non-zero-byte request of each disk and report to
statsD for alert.
-adding "object-server.PUT.<device>.timing" metrics for object-server.
DocImpact.
Change-Id: Ie94bddad28e8be52e71683bf6c9db988664abe47
This was an oustanding TODO for StatsD Swift metrics. A new timing
metric is tracked for (only) GET requests for accounts, containers,
and objects:
proxy-server.<req_type>.GET.<status_int>.first-byte.timing
Also updated StatsD documentation in the Admin Guide to clarify that
timing metrics are sent in units of milliseconds.
Change-Id: I5bb781c06cefcb5280f4fb1112a526c029fe0c20
If invoked as 'swift-ring-builder-safe' the directory containing the builder
file provided will be locked (via lock_parent_directory()). This provides a
small safe guard against multiple instances of the swift-ring-builder (or
other utilities that observe this lock) from attempting to write to or read
the builder/ring files while operations are in progress.
This is particularly useful in environments where ring management has been
automated (via Chef or custom solutions) but the operator still occasionally
needs to manually interact with the ring.
DocImpact
Change-Id: Ia362744a8151a91bfb586d01da582906726852e6
As Dieter pointed out in bug 1090495
(https://bugs.launchpad.net/swift/+bug/1090495), the volume of metrics
can vary wildly between StatsD metrics.
This patch implements a partial solution by reducing the sample_rate
used for known high-volume metrics (operational experience will need to
inform this over time) and introducing a new tunable,
log_statsd_sample_rate_factor which is multiplied by the sample_rate for
every statsd stat. This tunable can be used to reduce StatsD traffic
proportionally for all metrics and is intended to replace
log_statsd_default_sample_rate, which is left alone for
backward-compatibility, should anyone be using it.
This patch also includes a drive-by fix for log_udp_port which wasn't
being converted to an int (I didn't verify that actually causes trouble
in SysLogHandler(), but it's definitely an improvement regardles).
Change-Id: Id404636e3629f6431cf1c4e64a143959750a3c23
- Add two optional flags that let you limit swift-dispersion-report to only
reporting on containers OR objects.
- Also make dispersion.conf and swift-dispersion-report manpages
current.
DocImpact
Change-Id: Iad56133cad261241db27d0e2103098e3c2f3c245
It's not sufficient to just look at swift.object-updater.successes to
see the async_pending unlink rate. There are two different spots where
unlinks happen: one when an async_pending has been successfully
processed, and another when the updater notices multiple
async_pendings for the same object. Both events are now tracked under
the same name: swift.object-updater.unlinks.
FakeLogger has now sprouted a couple of convenience methods for
testing logged metrics.
Fixed pep8 1.3.3's complaints in the files this diff touches.
Also: bonus speling and, grammar fixes in the admin guide.
Change-Id: I8c1493784adbe24ba2b5512615e87669b3d94505
Removed many StatsD logging calls in proxy-server and added
swift-informant-style catch-all logging in the proxy-logger middleware.
Many errors previously rolled into the "proxy-server.<type>.errors"
counter will now appear broken down by response code and with timing
data at: "proxy-server.<type>.<verb>.<status>.timing". Also, bytes
transferred (sum of in + out) will be at:
"proxy-server.<type>.<verb>.<status>.xfer". The proxy-logging
middleware can get its StatsD config from standard vars in [DEFAULT] or
from access_log_statsd_* config vars in its config section.
Similarly to Swift Informant, request methods ("verbs") are filtered
using the new proxy-logging config var, "log_statsd_valid_http_methods"
which defaults to GET, HEAD, POST, PUT, DELETE, and COPY. Requests with
methods not in this list use "BAD_METHOD" for <verb> in the metric name.
To avoid user error, access_log_statsd_valid_http_methods is also
accepted.
Previously, proxy-server metrics used "Account", "Container", and
"Object" for the <type>, but these are now all lowercase.
Updated the admin guide's StatsD docs to reflect the above changes and
also include the "proxy-server.<type>.handoff_count" and
"proxy-server.<type>.handoff_all_count" metrics.
The proxy server now saves off the original req.method and proxy_logging
will use this if it can (both for request logging and as the "<verb>" in
the statsd timing metric). This fixes bug 1025433.
Removed some stale access_log_* related code in proxy/server.py. Also
removed the BaseApplication/Application distinction as it's no longer
necessary.
Fixed up the sample config files a bit (logging lines, mostly).
Fixed typo in SAIO development guide.
Got proxy_logging.py test coverage to 100%.
Fixed proxy_logging.py for PEP8 v1.3.2.
Enhanced test.unit.FakeLogger to track more calls to enable testing
StatsD metric calls.
Change-Id: I45d94cb76450be96d66fcfab56359bdfdc3a2576
To tell when replication for a device has finished, it's important to
know when the replicator is removing objects. This was previously
handled for the object-replicator
(object-replicator.partition.delete.count.<device> and
object-replicator.partition.update.count.<device> metrics) but not the
account and container replicators.
This patch extends the existing DB removal count metrics to make them
per-device. The new metrics are:
account-replicator.removes.<device>
container-replicator.removes.<device>
There's also a bonus refactoring and increased test coverage of the DB
replicator code.
Change-Id: I2067317d4a5f8ad2a496834147954bdcdfc541c1
The Admin Guide now contains information about the ring serialization
change (and importantly, how to downgrade, if necessary).
Also added container-server conf var, "allow_versions" to the Deployment
Guide.
Also changed description of proxy-server conf var,
"max_containers_whitelist" to say it contains "account names" not
"account hashes".
Change-Id: Ib23c6118cc5195cc04765afd28e442e4c735f0d4
We're still using saio:11000 in a few spots so a few things
don't work out of the box on the saio. Fixes bug #1024561
Change-Id: I226de54c2785b0d0b681c8d0cc24260adbd3d663
Expand recon middleware to include support for account and container
servers in addition to the existing object servers. Also add support
for retrieving recent information from auditors, replicators, and
updaters. In the case of certain checks (such as container auditors)
the stats returned are only for the most recent path processed.
The middleware has also been refactored and should now also handle
errors better in cases where stats are unavailable.
While new check's have been added the output from pre-existing
check's has not changed. This should allow existing 3rd party
utilities such as the Swift ZenPack to continue to function.
Change-Id: Ib9893a77b9b8a2f03179f2a73639bc4a6e264df7
Documentation, including a list of metrics reported and their semantics,
is in the Admin Guide in a new section, "Reporting Metrics to StatsD".
An optional "metric prefix" may be configured which will be prepended to
every metric name sent to StatsD.
Here is the rationale for doing a deep integration like this versus only
sending metrics to StatsD in middleware. It's the only way to report
some internal activities of Swift in a real-time manner. So to have one
way of reporting to StatsD and one place/style of configuration, even
some things (like, say, timing of PUT requests into the proxy-server)
which could be logged via middleware are consistently logged the same
way (deep integration via the logger delegate methods).
When log_statsd_host is configured, get_logger() injects a
swift.common.utils.StatsdClient object into the logger as
logger.statsd_client. Then a set of delegate methods on LogAdapter
either pass through to the StatsdClient object or become no-ops. This
allows StatsD logging to look like:
self.logger.increment('some.metric.here')
and do the right thing in all cases and with no messy conditional logic.
I wanted to use the pystatsd module for the StatsD client, but the
version on PyPi is lagging the git repo (and is missing both the prefix
functionality and timing_since() method). So I wrote my
swift.common.utils.StatsdClient. The interface is the same as
pystatsd.Client, but the code was written from scratch. It's pretty
simple, and the tests I added cover it. This also frees Swift from an
optional dependency on the pystatsd module, making this feature easier
to enable.
There's test coverage for the new code and all existing tests continue
to pass.
Refactored out _one_audit_pass() method in swift/account/auditor.py and
swift/container/auditor.py.
Fixed some misc. PEP8 violations.
Misc test cleanups and refactorings (particularly the way "fake logging"
is handled).
Change-Id: Ie968a9ae8771f59ee7591e2ae11999c44bfe33b2
Corrected its/it's mistakes, harmonized line wrapping within some docs
and clarified doc wording in several places.
Change-Id: Ib9ac6d5e859f770a702e1fad6de8d4abe0390b47
Add's the configuration file option "dump_json" or command line
options [-j|--dump-json] to have swift-dispersion-report output
the report in json format. This allows the dispersion report to
be more easily consumed elsewhere.
There's also a few pep8 fixes and removal of unused imports.
Change-Id: I2374311ccbef43e6bbae24665c9584e60f3da173