docs: Move metric name/description tables out to separate page(s)

Offer it both by service and as a single, more easily searchable, page.

That admin guide is *still* too long, but this should help a bit.

Change-Id: I946c72f40dce2f33ef845a0ca816038727848b3a
Tim Burke 2023-05-25 13:40:33 -07:00
parent 149b617c28
commit 307315bde2
18 changed files with 499 additions and 442 deletions


@ -883,450 +883,28 @@ of async_pendings in real-time, but will not tell you the current number of
async_pending container updates on disk at any point in time.
Note also that the set of metrics collected, their names, and their semantics
are not locked down and will change over time.
are not locked down and will change over time. For more details, see the
service-specific tables listed below:
Metrics for `account-auditor`:
========================== =========================================================
Metric Name Description
-------------------------- ---------------------------------------------------------
`account-auditor.errors` Count of audit runs (across all account databases) which
caught an Exception.
`account-auditor.passes` Count of individual account databases which passed audit.
`account-auditor.failures` Count of individual account databases which failed audit.
`account-auditor.timing` Timing data for individual account database audits.
========================== =========================================================
Metrics for `account-reaper`:
============================================== ====================================================
Metric Name Description
---------------------------------------------- ----------------------------------------------------
`account-reaper.errors` Count of devices failing the mount check.
`account-reaper.timing` Timing data for each reap_account() call.
`account-reaper.return_codes.X` Count of HTTP return codes from various operations
(e.g. object listing, container deletion, etc.). The
value for X is the first digit of the return code
(2 for 201, 4 for 404, etc.).
`account-reaper.containers_failures` Count of failures to delete a container.
`account-reaper.containers_deleted` Count of containers successfully deleted.
`account-reaper.containers_remaining` Count of containers which failed to delete with
zero successes.
`account-reaper.containers_possibly_remaining` Count of containers which failed to delete with
at least one success.
`account-reaper.objects_failures` Count of failures to delete an object.
`account-reaper.objects_deleted` Count of objects successfully deleted.
`account-reaper.objects_remaining` Count of objects which failed to delete with zero
successes.
`account-reaper.objects_possibly_remaining` Count of objects which failed to delete with at
least one success.
============================================== ====================================================
Metrics for `account-server` ("Not Found" is not considered an error and requests
which increment `errors` are not included in the timing data):
======================================== =======================================================
Metric Name Description
---------------------------------------- -------------------------------------------------------
`account-server.DELETE.errors.timing` Timing data for each DELETE request resulting in an
error: bad request, not mounted, missing timestamp.
`account-server.DELETE.timing` Timing data for each DELETE request not resulting in
an error.
`account-server.PUT.errors.timing` Timing data for each PUT request resulting in an error:
bad request, not mounted, conflict, recently-deleted.
`account-server.PUT.timing` Timing data for each PUT request not resulting in an
error.
`account-server.HEAD.errors.timing` Timing data for each HEAD request resulting in an
error: bad request, not mounted.
`account-server.HEAD.timing` Timing data for each HEAD request not resulting in
an error.
`account-server.GET.errors.timing` Timing data for each GET request resulting in an
error: bad request, not mounted, bad delimiter,
account listing limit too high, bad accept header.
`account-server.GET.timing` Timing data for each GET request not resulting in
an error.
`account-server.REPLICATE.errors.timing` Timing data for each REPLICATE request resulting in an
error: bad request, not mounted.
`account-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
in an error.
`account-server.POST.errors.timing` Timing data for each POST request resulting in an
error: bad request, bad or missing timestamp, not
mounted.
`account-server.POST.timing` Timing data for each POST request not resulting in
an error.
======================================== =======================================================
Metrics for `account-replicator`:
===================================== ====================================================
Metric Name Description
------------------------------------- ----------------------------------------------------
`account-replicator.diffs` Count of syncs handled by sending differing rows.
`account-replicator.diff_caps` Count of "diffs" operations which failed because
"max_diffs" was hit.
`account-replicator.no_changes` Count of accounts found to be in sync.
`account-replicator.hashmatches` Count of accounts found to be in sync via hash
comparison (`broker.merge_syncs` was called).
`account-replicator.rsyncs` Count of completely missing accounts which were sent
via rsync.
`account-replicator.remote_merges` Count of syncs handled by sending entire database
via rsync.
`account-replicator.attempts` Count of database replication attempts.
`account-replicator.failures` Count of database replication attempts which failed
due to corruption (quarantined) or inability to read
as well as attempts to individual nodes which
failed.
`account-replicator.removes.<device>` Count of databases on <device> deleted because the
delete_timestamp was greater than the put_timestamp
and the database had no rows or because it was
successfully sync'ed to other locations and doesn't
belong here anymore.
`account-replicator.successes` Count of replication attempts to an individual node
which were successful.
`account-replicator.timing` Timing data for each database replication attempt
not resulting in a failure.
===================================== ====================================================
Metrics for `container-auditor`:
============================ ====================================================
Metric Name Description
---------------------------- ----------------------------------------------------
`container-auditor.errors` Incremented when an Exception is caught in an audit
pass (only once per pass, max).
`container-auditor.passes` Count of individual containers passing an audit.
`container-auditor.failures` Count of individual containers failing an audit.
`container-auditor.timing` Timing data for each container audit.
============================ ====================================================
Metrics for `container-replicator`:
======================================= ====================================================
Metric Name Description
--------------------------------------- ----------------------------------------------------
`container-replicator.diffs` Count of syncs handled by sending differing rows.
`container-replicator.diff_caps` Count of "diffs" operations which failed because
"max_diffs" was hit.
`container-replicator.no_changes` Count of containers found to be in sync.
`container-replicator.hashmatches` Count of containers found to be in sync via hash
comparison (`broker.merge_syncs` was called).
`container-replicator.rsyncs` Count of completely missing containers which were sent
via rsync.
`container-replicator.remote_merges` Count of syncs handled by sending entire database
via rsync.
`container-replicator.attempts` Count of database replication attempts.
`container-replicator.failures` Count of database replication attempts which failed
due to corruption (quarantined) or inability to read
as well as attempts to individual nodes which
failed.
`container-replicator.removes.<device>` Count of databases deleted on <device> because the
delete_timestamp was greater than the put_timestamp
and the database had no rows or because it was
successfully sync'ed to other locations and doesn't
belong here anymore.
`container-replicator.successes` Count of replication attempts to an individual node
which were successful.
`container-replicator.timing` Timing data for each database replication attempt
not resulting in a failure.
======================================= ====================================================
Metrics for `container-server` ("Not Found" is not considered an error and requests
which increment `errors` are not included in the timing data):
========================================== ====================================================
Metric Name Description
------------------------------------------ ----------------------------------------------------
`container-server.DELETE.errors.timing` Timing data for DELETE request errors: bad request,
not mounted, missing timestamp, conflict.
`container-server.DELETE.timing` Timing data for each DELETE request not resulting in
an error.
`container-server.PUT.errors.timing` Timing data for PUT request errors: bad request,
missing timestamp, not mounted, conflict.
`container-server.PUT.timing` Timing data for each PUT request not resulting in an
error.
`container-server.HEAD.errors.timing` Timing data for HEAD request errors: bad request,
not mounted.
`container-server.HEAD.timing` Timing data for each HEAD request not resulting in
an error.
`container-server.GET.errors.timing` Timing data for GET request errors: bad request,
not mounted, parameters not utf8, bad accept header.
`container-server.GET.timing` Timing data for each GET request not resulting in
an error.
`container-server.REPLICATE.errors.timing` Timing data for REPLICATE request errors: bad
request, not mounted.
`container-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
in an error.
`container-server.POST.errors.timing` Timing data for POST request errors: bad request,
bad x-container-sync-to, not mounted.
`container-server.POST.timing` Timing data for each POST request not resulting in
an error.
========================================== ====================================================
Metrics for `container-sync`:
=============================== ====================================================
Metric Name Description
------------------------------- ----------------------------------------------------
`container-sync.skips` Count of containers skipped because they don't have
sync'ing enabled.
`container-sync.failures` Count of failures while sync'ing individual containers.
`container-sync.syncs` Count of individual containers sync'ed successfully.
`container-sync.deletes` Count of container database rows sync'ed by
deletion.
`container-sync.deletes.timing` Timing data for each container database row
synchronization via deletion.
`container-sync.puts` Count of container database rows sync'ed by Putting.
`container-sync.puts.timing` Timing data for each container database row
synchronization via Putting.
=============================== ====================================================
Metrics for `container-updater`:
============================== ====================================================
Metric Name Description
------------------------------ ----------------------------------------------------
`container-updater.successes` Count of containers which successfully updated their
account.
`container-updater.failures` Count of containers which failed to update their
account.
`container-updater.no_changes` Count of containers which didn't need to update
their account.
`container-updater.timing` Timing data for processing a container; only
includes timing for containers which needed to
update their accounts (i.e. "successes" and
"failures" but not "no_changes").
============================== ====================================================
Metrics for `object-auditor`:
============================ ====================================================
Metric Name Description
---------------------------- ----------------------------------------------------
`object-auditor.quarantines` Count of objects failing audit and quarantined.
`object-auditor.errors` Count of errors encountered while auditing objects.
`object-auditor.timing` Timing data for each object audit (does not include
any rate-limiting sleep time for
max_files_per_second, but does include rate-limiting
sleep time for max_bytes_per_second).
============================ ====================================================
Metrics for `object-expirer`:
======================== ====================================================
Metric Name Description
------------------------ ----------------------------------------------------
`object-expirer.objects` Count of objects expired.
`object-expirer.errors` Count of errors encountered while attempting to
expire an object.
`object-expirer.timing` Timing data for each object expiration attempt,
including ones resulting in an error.
======================== ====================================================
Metrics for `object-reconstructor`:
====================================================== ======================================================
Metric Name Description
------------------------------------------------------ ------------------------------------------------------
`object-reconstructor.partition.delete.count.<device>` A count of partitions on <device> which were
reconstructed and synced to another node because they
didn't belong on this node. This metric is tracked
per-device to allow for "quiescence detection" for
object reconstruction activity on each device.
`object-reconstructor.partition.delete.timing` Timing data for partitions reconstructed and synced to
another node because they didn't belong on this node.
This metric is not tracked per device.
`object-reconstructor.partition.update.count.<device>` A count of partitions on <device> which were
reconstructed and synced to another node, but also
belong on this node. As with delete.count, this metric
is tracked per-device.
`object-reconstructor.partition.update.timing` Timing data for partitions reconstructed which also
belong on this node. This metric is not tracked
per-device.
`object-reconstructor.suffix.hashes` Count of suffix directories whose hash (of filenames)
was recalculated.
`object-reconstructor.suffix.syncs` Count of suffix directories reconstructed with ssync.
====================================================== ======================================================
Metrics for `object-replicator`:
=================================================== ====================================================
Metric Name Description
--------------------------------------------------- ----------------------------------------------------
`object-replicator.partition.delete.count.<device>` A count of partitions on <device> which were
replicated to another node because they didn't
belong on this node. This metric is tracked
per-device to allow for "quiescence detection" for
object replication activity on each device.
`object-replicator.partition.delete.timing` Timing data for partitions replicated to another
node because they didn't belong on this node. This
metric is not tracked per device.
`object-replicator.partition.update.count.<device>` A count of partitions on <device> which were
replicated to another node, but also belong on this
node. As with delete.count, this metric is tracked
per-device.
`object-replicator.partition.update.timing` Timing data for partitions replicated which also
belong on this node. This metric is not tracked
per-device.
`object-replicator.suffix.hashes` Count of suffix directories whose hash (of filenames)
was recalculated.
`object-replicator.suffix.syncs` Count of suffix directories replicated with rsync.
=================================================== ====================================================
Metrics for `object-server`:
======================================= ====================================================
Metric Name Description
--------------------------------------- ----------------------------------------------------
`object-server.quarantines` Count of objects (files) found bad and moved to
quarantine.
`object-server.async_pendings` Count of container updates saved as async_pendings
(may result from PUT or DELETE requests).
`object-server.POST.errors.timing` Timing data for POST request errors: bad request,
missing timestamp, delete-at in past, not mounted.
`object-server.POST.timing` Timing data for each POST request not resulting in
an error.
`object-server.PUT.errors.timing` Timing data for PUT request errors: bad request,
not mounted, missing timestamp, object creation
constraint violation, delete-at in past.
`object-server.PUT.timeouts` Count of object PUTs which exceeded max_upload_time.
`object-server.PUT.timing` Timing data for each PUT request not resulting in an
error.
`object-server.PUT.<device>.timing` Timing data per kB transferred (ms/kB) for each
non-zero-byte PUT request on each device.
Useful for spotting problematic devices; higher is worse.
`object-server.GET.errors.timing` Timing data for GET request errors: bad request,
not mounted, header timestamps before the epoch,
precondition failed.
File errors resulting in a quarantine are not
counted here.
`object-server.GET.timing` Timing data for each GET request not resulting in an
error. Includes requests which couldn't find the
object (including disk errors resulting in file
quarantine).
`object-server.HEAD.errors.timing` Timing data for HEAD request errors: bad request,
not mounted.
`object-server.HEAD.timing` Timing data for each HEAD request not resulting in
an error. Includes requests which couldn't find the
object (including disk errors resulting in file
quarantine).
`object-server.DELETE.errors.timing` Timing data for DELETE request errors: bad request,
missing timestamp, not mounted, precondition
failed. Includes requests which couldn't find or
match the object.
`object-server.DELETE.timing` Timing data for each DELETE request not resulting
in an error.
`object-server.REPLICATE.errors.timing` Timing data for REPLICATE request errors: bad
request, not mounted.
`object-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
in an error.
======================================= ====================================================
Metrics for `object-updater`:
============================ ====================================================
Metric Name Description
---------------------------- ----------------------------------------------------
`object-updater.errors` Count of drives not mounted or async_pending files
with an unexpected name.
`object-updater.timing` Timing data for object sweeps to flush async_pending
container updates. Does not include object sweeps
which did not find an existing async_pending storage
directory.
`object-updater.quarantines` Count of async_pending container updates which were
corrupted and moved to quarantine.
`object-updater.successes` Count of successful container updates.
`object-updater.failures` Count of failed container updates.
`object-updater.unlinks` Count of async_pending files unlinked. An
async_pending file is unlinked either when it is
successfully processed or when the replicator sees
that there is a newer async_pending file for the
same object.
============================ ====================================================
Metrics for `proxy-server` (in the table, `<type>` is the proxy-server
controller responsible for the request and will be one of "account",
"container", or "object"):
======================================== ====================================================
Metric Name Description
---------------------------------------- ----------------------------------------------------
`proxy-server.errors` Count of errors encountered while serving requests
before the controller type is determined. Includes
invalid Content-Length, errors finding the internal
controller to handle the request, invalid utf8, and
bad URLs.
`proxy-server.<type>.handoff_count` Count of node hand-offs; only tracked if log_handoffs
is set in the proxy-server config.
`proxy-server.<type>.handoff_all_count` Count of times *only* hand-off locations were
utilized; only tracked if log_handoffs is set in the
proxy-server config.
`proxy-server.<type>.client_timeouts` Count of client timeouts (client did not read within
`client_timeout` seconds during a GET or did not
supply data within `client_timeout` seconds during
a PUT).
`proxy-server.<type>.client_disconnects` Count of detected client disconnects during PUT
operations (does NOT include caught Exceptions in
the proxy-server which caused a client disconnect).
======================================== ====================================================
Metrics for `proxy-logging` middleware (in the table, `<type>` is either the
proxy-server controller responsible for the request: "account", "container",
"object", or the string "SOS" if the request came from the `Swift Origin Server`_
middleware. The `<verb>` portion will be one of "GET", "HEAD", "POST", "PUT",
"DELETE", "COPY", "OPTIONS", or "BAD_METHOD". The list of valid HTTP methods
is configurable via the `log_statsd_valid_http_methods` config variable and
the default setting yields the above behavior):
.. _Swift Origin Server: https://github.com/dpgoetz/sos
==================================================== ============================================
Metric Name Description
---------------------------------------------------- --------------------------------------------
`proxy-server.<type>.<verb>.<status>.timing` Timing data for requests, start to finish.
The <status> portion is the numeric HTTP
status code for the request (e.g. "200" or
"404").
`proxy-server.<type>.GET.<status>.first-byte.timing` Timing data up to completion of sending the
response headers (only for GET requests).
<status> and <type> are as for the main
timing metric.
`proxy-server.<type>.<verb>.<status>.xfer` This counter metric is the sum of bytes
transferred in (from clients) and out (to
clients) for requests. The <type>, <verb>,
and <status> portions of the metric are just
like the main timing metric.
==================================================== ============================================
The `proxy-logging` middleware also groups these metrics by policy (the
`<policy-index>` portion represents a policy index):
========================================================================== =====================================
Metric Name Description
-------------------------------------------------------------------------- -------------------------------------
`proxy-server.object.policy.<policy-index>.<verb>.<status>.timing` Timing data for requests, aggregated
by policy index.
`proxy-server.object.policy.<policy-index>.GET.<status>.first-byte.timing` Timing data up to completion of
sending the response headers,
aggregated by policy index.
`proxy-server.object.policy.<policy-index>.<verb>.<status>.xfer` Sum of bytes transferred in and out,
aggregated by policy index.
========================================================================== =====================================
Metrics for `tempauth` middleware (in the table, `<reseller_prefix>` represents
the actual configured reseller_prefix or "`NONE`" if the reseller_prefix is the
empty string):
========================================= ====================================================
Metric Name Description
----------------------------------------- ----------------------------------------------------
`tempauth.<reseller_prefix>.unauthorized` Count of regular requests which were denied with
HTTPUnauthorized.
`tempauth.<reseller_prefix>.forbidden` Count of regular requests which were denied with
HTTPForbidden.
`tempauth.<reseller_prefix>.token_denied` Count of token requests which were denied.
`tempauth.<reseller_prefix>.errors` Count of errors.
========================================= ====================================================
.. toctree::
metrics/account_auditor
metrics/account_reaper
metrics/account_server
metrics/account_replicator
metrics/container_auditor
metrics/container_replicator
metrics/container_server
metrics/container_sync
metrics/container_updater
metrics/object_auditor
metrics/object_expirer
metrics/object_reconstructor
metrics/object_replicator
metrics/object_server
metrics/object_updater
metrics/proxy_server
Or, view :doc:`metrics/all` as one page.
------------------------
Debugging Tips and Tools


@ -0,0 +1,12 @@
``account-auditor`` Metrics
===========================
==========================  =========================================================
Metric Name                 Description
--------------------------  ---------------------------------------------------------
`account-auditor.errors`    Count of audit runs (across all account databases) which
                            caught an Exception.
`account-auditor.passes`    Count of individual account databases which passed audit.
`account-auditor.failures`  Count of individual account databases which failed audit.
`account-auditor.timing`    Timing data for individual account database audits.
==========================  =========================================================


@ -0,0 +1,25 @@
``account-reaper`` Metrics
==========================
============================================== ====================================================
Metric Name Description
---------------------------------------------- ----------------------------------------------------
`account-reaper.errors` Count of devices failing the mount check.
`account-reaper.timing` Timing data for each reap_account() call.
`account-reaper.return_codes.X` Count of HTTP return codes from various operations
(e.g. object listing, container deletion, etc.). The
value for X is the first digit of the return code
(2 for 201, 4 for 404, etc.).
`account-reaper.containers_failures` Count of failures to delete a container.
`account-reaper.containers_deleted` Count of containers successfully deleted.
`account-reaper.containers_remaining` Count of containers which failed to delete with
zero successes.
`account-reaper.containers_possibly_remaining` Count of containers which failed to delete with
at least one success.
`account-reaper.objects_failures` Count of failures to delete an object.
`account-reaper.objects_deleted` Count of objects successfully deleted.
`account-reaper.objects_remaining` Count of objects which failed to delete with zero
successes.
`account-reaper.objects_possibly_remaining` Count of objects which failed to delete with at
least one success.
============================================== ====================================================
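
As a rough illustration of the `account-reaper.return_codes.X` naming described in the table above, the sketch below derives X from an HTTP status code. The helper name `return_code_metric` is hypothetical and is not taken from Swift's source; it assumes only that X is the hundreds digit of the status code.

```python
def return_code_metric(status: int) -> str:
    """Return the bucketed metric name for an HTTP status code.

    Hypothetical helper for illustration only: X is taken to be the
    first digit of the status code (2 for 201, 4 for 404, etc.).
    """
    if not 100 <= status <= 599:
        raise ValueError(f"not a valid HTTP status code: {status}")
    return f"account-reaper.return_codes.{status // 100}"


if __name__ == "__main__":
    for code in (201, 404, 503):
        print(code, "->", return_code_metric(code))
```

Bucketing by first digit keeps the number of distinct metric names small, at the cost of not distinguishing, say, 404 from 409.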


@ -0,0 +1,31 @@
``account-replicator`` Metrics
==============================
===================================== ====================================================
Metric Name Description
------------------------------------- ----------------------------------------------------
`account-replicator.diffs` Count of syncs handled by sending differing rows.
`account-replicator.diff_caps` Count of "diffs" operations which failed because
"max_diffs" was hit.
`account-replicator.no_changes` Count of accounts found to be in sync.
`account-replicator.hashmatches` Count of accounts found to be in sync via hash
comparison (`broker.merge_syncs` was called).
`account-replicator.rsyncs` Count of completely missing accounts which were sent
via rsync.
`account-replicator.remote_merges` Count of syncs handled by sending entire database
via rsync.
`account-replicator.attempts` Count of database replication attempts.
`account-replicator.failures` Count of database replication attempts which failed
due to corruption (quarantined) or inability to read
as well as attempts to individual nodes which
failed.
`account-replicator.removes.<device>` Count of databases on <device> deleted because the
delete_timestamp was greater than the put_timestamp
and the database had no rows or because it was
successfully sync'ed to other locations and doesn't
belong here anymore.
`account-replicator.successes` Count of replication attempts to an individual node
which were successful.
`account-replicator.timing` Timing data for each database replication attempt
not resulting in a failure.
===================================== ====================================================


@ -0,0 +1,37 @@
``account-server`` Metrics
==========================
.. note::
"Not Found" is not considered an error and requests
which increment `errors` are not included in the timing data.
======================================== =======================================================
Metric Name Description
---------------------------------------- -------------------------------------------------------
`account-server.DELETE.errors.timing` Timing data for each DELETE request resulting in an
error: bad request, not mounted, missing timestamp.
`account-server.DELETE.timing` Timing data for each DELETE request not resulting in
an error.
`account-server.PUT.errors.timing` Timing data for each PUT request resulting in an error:
bad request, not mounted, conflict, recently-deleted.
`account-server.PUT.timing` Timing data for each PUT request not resulting in an
error.
`account-server.HEAD.errors.timing` Timing data for each HEAD request resulting in an
error: bad request, not mounted.
`account-server.HEAD.timing` Timing data for each HEAD request not resulting in
an error.
`account-server.GET.errors.timing` Timing data for each GET request resulting in an
error: bad request, not mounted, bad delimiter,
account listing limit too high, bad accept header.
`account-server.GET.timing` Timing data for each GET request not resulting in
an error.
`account-server.REPLICATE.errors.timing` Timing data for each REPLICATE request resulting in an
error: bad request, not mounted.
`account-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
in an error.
`account-server.POST.errors.timing` Timing data for each POST request resulting in an
error: bad request, bad or missing timestamp, not
mounted.
`account-server.POST.timing` Timing data for each POST request not resulting in
an error.
======================================== =======================================================
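
To make the naming pattern in the table above concrete, here is a minimal sketch of how a per-request timing metric name could be chosen and rendered as a generic StatsD timer line (`<name>:<value>|ms`). Both function names are hypothetical, and the `is_error` flag stands in for whatever logic the server uses to classify a request; any configured metric prefix is ignored here.

```python
def account_server_timing_metric(method: str, is_error: bool) -> str:
    """Pick the timing metric name for one account-server request.

    Hypothetical helper: requests classified as errors go to
    `<METHOD>.errors.timing`, all others to `<METHOD>.timing`.
    """
    suffix = "errors.timing" if is_error else "timing"
    return f"account-server.{method.upper()}.{suffix}"


def statsd_timing_line(metric: str, elapsed_ms: float) -> str:
    """Render a timer sample in the generic StatsD wire format."""
    return f"{metric}:{elapsed_ms:g}|ms"


if __name__ == "__main__":
    name = account_server_timing_metric("get", is_error=False)
    print(statsd_timing_line(name, 12.5))
```

Because a "Not Found" response is not considered an error, a caller of this sketch would pass `is_error=False` for it, so its sample lands in the plain `timing` metric.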


@ -0,0 +1,24 @@
:orphan:
All Statsd Metrics
==================
.. include:: account_auditor.rst
.. include:: account_reaper.rst
.. include:: account_server.rst
.. include:: account_replicator.rst
.. include:: container_auditor.rst
.. include:: container_replicator.rst
.. include:: container_server.rst
.. include:: container_sync.rst
.. include:: container_updater.rst
.. include:: object_auditor.rst
.. include:: object_expirer.rst
.. include:: object_reconstructor.rst
.. include:: object_replicator.rst
.. include:: object_server.rst
.. include:: object_updater.rst
.. include:: proxy_server.rst


@ -0,0 +1,12 @@
``container-auditor`` Metrics
=============================
============================  ====================================================
Metric Name                   Description
----------------------------  ----------------------------------------------------
`container-auditor.errors`    Incremented when an Exception is caught in an audit
                              pass (only once per pass, max).
`container-auditor.passes`    Count of individual containers passing an audit.
`container-auditor.failures`  Count of individual containers failing an audit.
`container-auditor.timing`    Timing data for each container audit.
============================  ====================================================


@ -0,0 +1,31 @@
``container-replicator`` Metrics
================================
======================================= ====================================================
Metric Name Description
--------------------------------------- ----------------------------------------------------
`container-replicator.diffs` Count of syncs handled by sending differing rows.
`container-replicator.diff_caps` Count of "diffs" operations which failed because
"max_diffs" was hit.
`container-replicator.no_changes` Count of containers found to be in sync.
`container-replicator.hashmatches` Count of containers found to be in sync via hash
comparison (`broker.merge_syncs` was called).
`container-replicator.rsyncs` Count of completely missing containers which were sent
via rsync.
`container-replicator.remote_merges` Count of syncs handled by sending entire database
via rsync.
`container-replicator.attempts` Count of database replication attempts.
`container-replicator.failures` Count of database replication attempts which failed
due to corruption (quarantined) or inability to read,
as well as attempts to individual nodes which
failed.
`container-replicator.removes.<device>` Count of databases deleted on <device> because the
delete_timestamp was greater than the put_timestamp
and the database had no rows or because it was
successfully sync'ed to other locations and doesn't
belong here anymore.
`container-replicator.successes` Count of replication attempts to an individual node
which were successful.
`container-replicator.timing` Timing data for each database replication attempt
not resulting in a failure.
======================================= ====================================================

@ -0,0 +1,35 @@
``container-server`` Metrics
============================
.. note::
"Not Found" is not considered an error and requests
which increment `errors` are not included in the timing data.
========================================== ====================================================
Metric Name Description
------------------------------------------ ----------------------------------------------------
`container-server.DELETE.errors.timing` Timing data for DELETE request errors: bad request,
not mounted, missing timestamp, conflict.
`container-server.DELETE.timing` Timing data for each DELETE request not resulting in
an error.
`container-server.PUT.errors.timing` Timing data for PUT request errors: bad request,
missing timestamp, not mounted, conflict.
`container-server.PUT.timing` Timing data for each PUT request not resulting in an
error.
`container-server.HEAD.errors.timing` Timing data for HEAD request errors: bad request,
not mounted.
`container-server.HEAD.timing` Timing data for each HEAD request not resulting in
an error.
`container-server.GET.errors.timing` Timing data for GET request errors: bad request,
not mounted, parameters not utf8, bad accept header.
`container-server.GET.timing` Timing data for each GET request not resulting in
an error.
`container-server.REPLICATE.errors.timing` Timing data for REPLICATE request errors: bad
request, not mounted.
`container-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
in an error.
`container-server.POST.errors.timing` Timing data for POST request errors: bad request,
bad x-container-sync-to, not mounted.
`container-server.POST.timing` Timing data for each POST request not resulting in
an error.
========================================== ====================================================

@ -0,0 +1,18 @@
``container-sync`` Metrics
==========================
=============================== ====================================================
Metric Name Description
------------------------------- ----------------------------------------------------
`container-sync.skips` Count of containers skipped because they don't have
sync'ing enabled.
`container-sync.failures` Count of failures while sync'ing individual containers.
`container-sync.syncs` Count of individual containers sync'ed successfully.
`container-sync.deletes` Count of container database rows sync'ed by
deletion.
`container-sync.deletes.timing` Timing data for each container database row
synchronization via deletion.
`container-sync.puts` Count of container database rows sync'ed by Putting.
`container-sync.puts.timing` Timing data for each container database row
synchronization via Putting.
=============================== ====================================================

@ -0,0 +1,17 @@
``container-updater`` Metrics
=============================
============================== ====================================================
Metric Name Description
------------------------------ ----------------------------------------------------
`container-updater.successes` Count of containers which successfully updated their
account.
`container-updater.failures` Count of containers which failed to update their
account.
`container-updater.no_changes` Count of containers which didn't need to update
their account.
`container-updater.timing` Timing data for processing a container; only
includes timing for containers which needed to
update their accounts (i.e. "successes" and
"failures" but not "no_changes").
============================== ====================================================

@ -0,0 +1,13 @@
``object-auditor`` Metrics
==========================
============================ ====================================================
Metric Name Description
---------------------------- ----------------------------------------------------
`object-auditor.quarantines` Count of objects failing audit and quarantined.
`object-auditor.errors` Count of errors encountered while auditing objects.
`object-auditor.timing` Timing data for each object audit (does not include
any rate-limiting sleep time for
max_files_per_second, but does include rate-limiting
sleep time for max_bytes_per_second).
============================ ====================================================

@ -0,0 +1,12 @@
``object-expirer`` Metrics
==========================
======================== ====================================================
Metric Name Description
------------------------ ----------------------------------------------------
`object-expirer.objects` Count of objects expired.
`object-expirer.errors` Count of errors encountered while attempting to
expire an object.
`object-expirer.timing` Timing data for each object expiration attempt,
including ones resulting in an error.
======================== ====================================================

@ -0,0 +1,25 @@
``object-reconstructor`` Metrics
================================
====================================================== ======================================================
Metric Name Description
------------------------------------------------------ ------------------------------------------------------
`object-reconstructor.partition.delete.count.<device>` A count of partitions on <device> which were
reconstructed and synced to another node because they
didn't belong on this node. This metric is tracked
per-device to allow for "quiescence detection" for
object reconstruction activity on each device.
`object-reconstructor.partition.delete.timing` Timing data for partitions reconstructed and synced to
another node because they didn't belong on this node.
This metric is not tracked per-device.
`object-reconstructor.partition.update.count.<device>` A count of partitions on <device> which were
reconstructed and synced to another node, but also
belong on this node. As with delete.count, this metric
is tracked per-device.
`object-reconstructor.partition.update.timing` Timing data for partitions reconstructed which also
belong on this node. This metric is not tracked
per-device.
`object-reconstructor.suffix.hashes` Count of suffix directories whose hash (of filenames)
was recalculated.
`object-reconstructor.suffix.syncs` Count of suffix directories reconstructed with ssync.
====================================================== ======================================================

@ -0,0 +1,25 @@
``object-replicator`` Metrics
=============================
=================================================== ====================================================
Metric Name Description
--------------------------------------------------- ----------------------------------------------------
`object-replicator.partition.delete.count.<device>` A count of partitions on <device> which were
replicated to another node because they didn't
belong on this node. This metric is tracked
per-device to allow for "quiescence detection" for
object replication activity on each device.
`object-replicator.partition.delete.timing` Timing data for partitions replicated to another
node because they didn't belong on this node. This
metric is not tracked per-device.
`object-replicator.partition.update.count.<device>` A count of partitions on <device> which were
replicated to another node, but also belong on this
node. As with delete.count, this metric is tracked
per-device.
`object-replicator.partition.update.timing` Timing data for partitions replicated which also
belong on this node. This metric is not tracked
per-device.
`object-replicator.suffix.hashes` Count of suffix directories whose hash (of filenames)
was recalculated.
`object-replicator.suffix.syncs` Count of suffix directories replicated with rsync.
=================================================== ====================================================
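The per-device ``count`` metrics above exist to allow "quiescence detection". As a rough illustration only, the following Python sketch flags devices whose ``partition.update.count`` and ``partition.delete.count`` counters saw no increments over a recent window. The function name, the input shape (a mapping from metric name to a list of per-interval increments), and the window length are assumptions made for this example; none of them are part of Swift.

```python
from collections import defaultdict

PREFIX = "object-replicator.partition."


def quiescent_devices(samples, window=3):
    """Return the sorted devices with no recent replication activity.

    `samples` maps a metric name such as
    "object-replicator.partition.update.count.sda" to a list of
    per-interval counter increments, oldest first (hypothetical shape).
    """
    activity = defaultdict(int)
    for name, counts in samples.items():
        if not name.startswith(PREFIX):
            continue
        # What remains looks like: <update|delete>.count.<device>
        parts = name[len(PREFIX):].split(".", 2)
        if len(parts) != 3 or parts[1] != "count":
            continue  # skips the *.timing metrics, which are not per-device
        device = parts[2]
        # Sum only the most recent `window` intervals for this device.
        activity[device] += sum(counts[-window:])
    return sorted(dev for dev, total in activity.items() if total == 0)
```

A device only counts as quiescent here when both its update and delete counters are idle over the whole window, so a longer window gives a more conservative answer.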

@ -0,0 +1,49 @@
``object-server`` Metrics
=========================
======================================= ====================================================
Metric Name Description
--------------------------------------- ----------------------------------------------------
`object-server.quarantines` Count of objects (files) found bad and moved to
quarantine.
`object-server.async_pendings` Count of container updates saved as async_pendings
(may result from PUT or DELETE requests).
`object-server.POST.errors.timing` Timing data for POST request errors: bad request,
missing timestamp, delete-at in past, not mounted.
`object-server.POST.timing` Timing data for each POST request not resulting in
an error.
`object-server.PUT.errors.timing` Timing data for PUT request errors: bad request,
not mounted, missing timestamp, object creation
constraint violation, delete-at in past.
`object-server.PUT.timeouts` Count of object PUTs which exceeded max_upload_time.
`object-server.PUT.timing` Timing data for each PUT request not resulting in an
error.
`object-server.PUT.<device>.timing` Timing data per kB transferred (ms/kB) for each
non-zero-byte PUT request on each device.
Useful for monitoring problematic devices; higher is bad.
`object-server.GET.errors.timing` Timing data for GET request errors: bad request,
not mounted, header timestamps before the epoch,
precondition failed.
File errors resulting in a quarantine are not
counted here.
`object-server.GET.timing` Timing data for each GET request not resulting in an
error. Includes requests which couldn't find the
object (including disk errors resulting in file
quarantine).
`object-server.HEAD.errors.timing` Timing data for HEAD request errors: bad request,
not mounted.
`object-server.HEAD.timing` Timing data for each HEAD request not resulting in
an error. Includes requests which couldn't find the
object (including disk errors resulting in file
quarantine).
`object-server.DELETE.errors.timing` Timing data for DELETE request errors: bad request,
missing timestamp, not mounted, precondition
failed. Includes requests which couldn't find or
match the object.
`object-server.DELETE.timing` Timing data for each DELETE request not resulting
in an error.
`object-server.REPLICATE.errors.timing` Timing data for REPLICATE request errors: bad
request, not mounted.
`object-server.REPLICATE.timing` Timing data for each REPLICATE request not resulting
in an error.
======================================= ====================================================
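The ``object-server.PUT.<device>.timing`` metric above is a rate (ms/kB) rather than a plain duration. A minimal sketch of that arithmetic follows. The helper name is hypothetical, and treating 1 kB as 1000 bytes is an assumption, since the table does not say whether kB means 1000 or 1024 bytes.

```python
def ms_per_kb(elapsed_seconds, bytes_transferred, bytes_per_kb=1000):
    """Return milliseconds spent per kB, or None for zero-byte transfers."""
    if bytes_transferred <= 0:
        # The metric only covers non-zero-byte PUT requests.
        return None
    elapsed_ms = elapsed_seconds * 1000.0
    return elapsed_ms / (bytes_transferred / bytes_per_kb)
```

Because the value is normalized by transfer size, large and small uploads to the same device are comparable, which is what makes a rising value a useful signal of a slow device.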

@ -0,0 +1,22 @@
``object-updater`` Metrics
==========================
============================ ====================================================
Metric Name Description
---------------------------- ----------------------------------------------------
`object-updater.errors` Count of drives not mounted or async_pending files
with an unexpected name.
`object-updater.timing` Timing data for object sweeps to flush async_pending
container updates. Does not include object sweeps
which did not find an existing async_pending storage
directory.
`object-updater.quarantines` Count of async_pending container updates which were
corrupted and moved to quarantine.
`object-updater.successes` Count of successful container updates.
`object-updater.failures` Count of failed container updates.
`object-updater.unlinks` Count of async_pending files unlinked. An
async_pending file is unlinked either when it is
successfully processed or when the replicator sees
that there is a newer async_pending file for the
same object.
============================ ====================================================

@ -0,0 +1,91 @@
``proxy-server`` Metrics
========================
In the table, ``<type>`` is the proxy-server controller responsible for the
request and will be one of ``account``, ``container``, or ``object``.
======================================== ====================================================
Metric Name Description
---------------------------------------- ----------------------------------------------------
`proxy-server.errors` Count of errors encountered while serving requests
before the controller type is determined. Includes
invalid Content-Length, errors finding the internal
controller to handle the request, invalid utf8, and
bad URLs.
`proxy-server.<type>.handoff_count` Count of node hand-offs; only tracked if log_handoffs
is set in the proxy-server config.
`proxy-server.<type>.handoff_all_count` Count of times *only* hand-off locations were
utilized; only tracked if log_handoffs is set in the
proxy-server config.
`proxy-server.<type>.client_timeouts` Count of client timeouts (client did not read within
`client_timeout` seconds during a GET or did not
supply data within `client_timeout` seconds during
a PUT).
`proxy-server.<type>.client_disconnects` Count of detected client disconnects during PUT
operations (does NOT include caught Exceptions in
the proxy-server which caused a client disconnect).
======================================== ====================================================
Additionally, middleware often emit their own metrics.
``proxy-logging`` Middleware
----------------------------
In the table, ``<type>`` is either the proxy-server controller responsible
for the request (``account``, ``container``, or ``object``) or the string
``SOS`` if the request came from the `Swift Origin Server`_ middleware.
The ``<verb>`` portion will be one of ``GET``, ``HEAD``, ``POST``, ``PUT``,
``DELETE``, ``COPY``, ``OPTIONS``, or ``BAD_METHOD``. The list of valid
HTTP methods is configurable via the ``log_statsd_valid_http_methods``
config variable and the default setting yields the above behavior.
.. _Swift Origin Server: https://github.com/dpgoetz/sos
==================================================== ============================================
Metric Name Description
---------------------------------------------------- --------------------------------------------
`proxy-server.<type>.<verb>.<status>.timing` Timing data for requests, start to finish.
The <status> portion is the numeric HTTP
status code for the request (e.g. "200" or
"404").
`proxy-server.<type>.GET.<status>.first-byte.timing` Timing data up to completion of sending the
response headers (only for GET requests).
<status> and <type> are as for the main
timing metric.
`proxy-server.<type>.<verb>.<status>.xfer` This counter metric is the sum of bytes
transferred in (from clients) and out (to
clients) for requests. The <type>, <verb>,
and <status> portions of the metric are just
like the main timing metric.
==================================================== ============================================
The ``proxy-logging`` middleware also groups these metrics by policy. The
``<policy-index>`` portion represents a policy index:
========================================================================== =====================================
Metric Name Description
-------------------------------------------------------------------------- -------------------------------------
`proxy-server.object.policy.<policy-index>.<verb>.<status>.timing` Timing data for requests, aggregated
by policy index.
`proxy-server.object.policy.<policy-index>.GET.<status>.first-byte.timing` Timing data up to completion of
sending the response headers,
aggregated by policy index.
`proxy-server.object.policy.<policy-index>.<verb>.<status>.xfer` Sum of bytes transferred in and out,
aggregated by policy index.
========================================================================== =====================================
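To make the naming scheme in the two ``proxy-logging`` tables concrete, here is a small Python sketch that assembles the metric names for one request. It is illustrative only: the function and constant names are hypothetical, the method list simply mirrors the default verbs listed above, and the real middleware may structure this differently.

```python
# Default verbs as listed above; anything else is reported as BAD_METHOD.
VALID_METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE", "COPY", "OPTIONS")


def proxy_metric_names(req_type, method, status, policy_index=None,
                       valid_methods=VALID_METHODS):
    """Return the timing and xfer metric names for a single request."""
    verb = method if method in valid_methods else "BAD_METHOD"
    prefixes = ["proxy-server.%s" % req_type]
    # Per-policy metrics are only listed for object requests.
    if req_type == "object" and policy_index is not None:
        prefixes.append("proxy-server.object.policy.%d" % policy_index)
    names = []
    for prefix in prefixes:
        base = "%s.%s.%d" % (prefix, verb, status)
        names.append(base + ".timing")
        names.append(base + ".xfer")
        if verb == "GET":
            # The first-byte timing is only listed for GET requests.
            names.append(base + ".first-byte.timing")
    return names
```

For example, ``proxy_metric_names("object", "PUT", 201, policy_index=1)`` yields both the plain ``proxy-server.object.PUT.201.*`` names and their ``proxy-server.object.policy.1.PUT.201.*`` counterparts.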
``tempauth`` Middleware
-----------------------
In the table, ``<reseller_prefix>`` represents the actual configured
reseller_prefix or ``NONE`` if the reseller_prefix is the empty string:
========================================= ====================================================
Metric Name Description
----------------------------------------- ----------------------------------------------------
`tempauth.<reseller_prefix>.unauthorized` Count of regular requests which were denied with
HTTPUnauthorized.
`tempauth.<reseller_prefix>.forbidden` Count of regular requests which were denied with
HTTPForbidden.
`tempauth.<reseller_prefix>.token_denied` Count of token requests which were denied.
`tempauth.<reseller_prefix>.errors` Count of errors.
========================================= ====================================================
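The ``NONE`` substitution described above can be sketched in a few lines of Python. The helper name is hypothetical and ``AUTH_`` is only an example value for the configured reseller_prefix; this is not the middleware's actual code.

```python
def tempauth_metric_name(reseller_prefix, event):
    """Build a tempauth metric name, using NONE for an empty prefix."""
    prefix = reseller_prefix if reseller_prefix else "NONE"
    return "tempauth.%s.%s" % (prefix, event)
```

With a prefix of ``AUTH_`` and the event ``unauthorized`` this gives ``tempauth.AUTH_.unauthorized``; with an empty prefix and the event ``errors`` it gives ``tempauth.NONE.errors``.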