gerrit/Documentation/metrics.txt
Edwin Kempin 755504f33f RetryHelper: Add metric to count number of failures on auto retry
Since we retry automatically only on non-recoverable failures it's
expected that the number of failures on auto retry is the same as the
number of auto retries. The number of auto retries is captured by the
action/auto_retry_count metric. If the value of this metric is compared
with the value of the new metric we can verify if our assumption is
correct. If there is a mismatch between the values we have exceptions
that are considered as non-recoverable, but which actually are
recoverable. In this case we should change the code to treat them as
recoverable.

Signed-off-by: Edwin Kempin <ekempin@google.com>
Change-Id: Id52026bf2d1a27e7c0668bcdd63ca1effdf8db09
2019-08-21 10:46:15 +02:00


= Gerrit Code Review - Metrics
Metrics about Gerrit's internal state can be sent to external monitoring systems
via plugins. See the link:dev-plugins.html#metrics[plugin documentation] for
details of plugin implementations.
== Metrics
The following metrics are reported.
=== General
* `build/label`: Version of Gerrit server software.
* `events`: Triggered events.
=== Actions
* `action/retry_attempt_count`: Number of retry attempts made
by RetryHelper to execute an action (0 means a single attempt with no retry).
* `action/retry_timeout_count`: Number of action executions of RetryHelper
that ultimately timed out.
* `action/auto_retry_count`: Number of automatic retries with tracing.
* `action/failures_on_auto_retry_count`: Number of failures on auto retry.
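
The two auto-retry counters are meant to be read together: if automatic retries happen only on non-recoverable failures, every auto retry is expected to fail again, so the two counts should match. The following is a minimal Python sketch of that comparison, assuming the counter values have already been exported to a monitoring system and read back as a plain dictionary. The `snapshot` dictionary and the `recovered_on_auto_retry` helper are hypothetical names used for illustration and are not part of Gerrit.

```python
def recovered_on_auto_retry(snapshot):
    """Return how many auto retries did not fail again.

    `snapshot` is a hypothetical mapping from metric name to counter value.
    A result above zero suggests that some failures treated as
    non-recoverable were in fact recoverable.
    """
    retries = snapshot["action/auto_retry_count"]
    failures = snapshot["action/failures_on_auto_retry_count"]
    return retries - failures


# Illustrative values only, not real measurements.
snapshot = {
    "action/auto_retry_count": 12,
    "action/failures_on_auto_retry_count": 9,
}
print(recovered_on_auto_retry(snapshot))  # prints 3
```

A result of 0 is consistent with the assumption that only non-recoverable failures are retried automatically.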
=== Pushes
* `receivecommits/changes`: Histogram of the number of changes processed
in a single upload, split up by update type (change created/updated,
change autoclosed).
* `receivecommits/latency`: Latency per change for processing a push,
split up by update type (create+replace, and autoclose).
* `receivecommits/push_latency`: Total latency for processing a push,
split up by update type (create+replace, autoclose, normal).
* `receivecommits/timeout`: Number of timeouts during push processing.
=== Process
* `proc/birth_timestamp`: Time at which the Gerrit process started.
* `proc/uptime`: Uptime of the Gerrit process.
* `proc/cpu/usage`: CPU time used by the Gerrit process.
* `proc/num_open_fds`: Number of open file descriptors.
* `proc/jvm/memory/heap_committed`: Amount of memory guaranteed for user objects.
* `proc/jvm/memory/heap_used`: Amount of memory holding user objects.
* `proc/jvm/memory/non_heap_committed`: Amount of memory guaranteed for classes,
etc.
* `proc/jvm/memory/non_heap_used`: Amount of memory holding classes, etc.
* `proc/jvm/memory/object_pending_finalization_count`: Approximate number of
objects needing finalization.
* `proc/jvm/gc/count`: Number of GCs.
* `proc/jvm/gc/time`: Approximate accumulated GC elapsed time.
* `proc/jvm/thread/num_live`: Current live thread count.
=== Caches
* `caches/memory_cached`: Memory entries.
* `caches/memory_hit_ratio`: Memory hit ratio.
* `caches/memory_eviction_count`: Memory eviction count.
* `caches/disk_cached`: Disk entries used by persistent cache.
* `caches/disk_hit_ratio`: Disk hit ratio for persistent cache.
=== Change
* `change/submit_rule_evaluation`: Latency for evaluating submit rules on a change.
* `change/submit_type_evaluation`: Latency for evaluating the submit type on a change.
=== HTTP
* `http/server/error_count`: Rate of REST API error responses.
* `http/server/success_count`: Rate of REST API success responses.
* `http/server/rest_api/count`: Rate of REST API calls by view.
* `http/server/rest_api/change_id_type`: Rate of REST API calls by change ID type.
* `http/server/rest_api/error_count`: Rate of REST API error responses by view.
* `http/server/rest_api/server_latency`: REST API call latency by view.
* `http/server/rest_api/response_bytes`: Size of REST API response on network
(may be gzip compressed) by view.
* `http/server/rest_api/change_json/to_change_info_latency`: Latency for
toChangeInfo invocations in ChangeJson.
* `http/server/rest_api/change_json/to_change_infos_latency`: Latency for
toChangeInfos invocations in ChangeJson.
* `http/server/rest_api/change_json/format_query_results_latency`: Latency for
formatQueryResults invocations in ChangeJson.
* `http/server/rest_api/ui_actions/latency`: Latency for RestView#getDescription calls.
=== Query
* `query/query_latency`: Successful query latency, accumulated over the life
of the process.
=== Core Queues
The following queues support metrics:
* default `WorkQueue`
* index batch
* index interactive
* receive commits
* send email
* ssh batch worker
* ssh command start
* ssh interactive worker
* ssh stream worker
Each queue provides the following metrics:
* `queue/<queue_name>/pool_size`: Current number of threads in the pool.
* `queue/<queue_name>/max_pool_size`: Maximum allowed number of threads in the pool.
* `queue/<queue_name>/active_threads`: Number of threads that are actively executing tasks.
* `queue/<queue_name>/scheduled_tasks`: Number of scheduled tasks in the queue.
* `queue/<queue_name>/total_scheduled_tasks_count`: Total number of tasks that have been scheduled.
* `queue/<queue_name>/total_completed_tasks_count`: Total number of tasks that have completed execution.
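
The per-queue metric names follow the `queue/<queue_name>/<metric>` pattern listed above. As a rough Python sketch, the snippet below builds those names for one queue and derives what fraction of the pool is busy. The queue name `my_queue`, the `snapshot` values, and both helper functions are hypothetical. How Gerrit spells each queue name inside the metric path is not specified here, so check the exported names before relying on this.

```python
# Metric suffixes taken from the list of per-queue metrics.
QUEUE_METRICS = (
    "pool_size",
    "max_pool_size",
    "active_threads",
    "scheduled_tasks",
    "total_scheduled_tasks_count",
    "total_completed_tasks_count",
)


def queue_metric_names(queue_name):
    """Expand the queue/<queue_name>/<metric> pattern for one queue."""
    return [f"queue/{queue_name}/{metric}" for metric in QUEUE_METRICS]


def busy_fraction(snapshot, queue_name):
    """Return active_threads / max_pool_size, or 0.0 for an empty pool."""
    active = snapshot[f"queue/{queue_name}/active_threads"]
    max_size = snapshot[f"queue/{queue_name}/max_pool_size"]
    return active / max_size if max_size else 0.0


# Illustrative values only; "my_queue" is a placeholder name.
snapshot = {
    "queue/my_queue/active_threads": 3,
    "queue/my_queue/max_pool_size": 4,
}
print(queue_metric_names("my_queue")[0])   # prints queue/my_queue/pool_size
print(busy_fraction(snapshot, "my_queue"))  # prints 0.75
```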
=== SSH sessions
* `sshd/sessions/connected`: Number of currently connected SSH sessions.
* `sshd/sessions/created`: Rate of new SSH sessions.
* `sshd/sessions/authentication_failures`: Rate of SSH authentication failures.
=== Topics
* `topic/cross_project_submit`: Number of cross-project topic submissions.
* `topic/cross_project_submit_completed`: Number of cross-project
topic submissions that concluded successfully.
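
Taken together, the two topic counters indicate how many cross-project submissions went through. The Python sketch below computes that share, assuming both values were read at the same time into a plain dictionary. The `snapshot` dictionary and the `cross_project_success_fraction` helper are hypothetical and not part of Gerrit. Submissions still in progress at the time of the reading would lower the result, so treat it as an approximation.

```python
def cross_project_success_fraction(snapshot):
    """Return completed / started cross-project topic submissions.

    Returns None when no submission has been counted yet, because the
    fraction is undefined in that case.
    """
    started = snapshot["topic/cross_project_submit"]
    completed = snapshot["topic/cross_project_submit_completed"]
    if started == 0:
        return None
    return completed / started


# Illustrative values only, not real measurements.
snapshot = {
    "topic/cross_project_submit": 20,
    "topic/cross_project_submit_completed": 18,
}
print(cross_project_success_fraction(snapshot))  # prints 0.9
```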
=== JGit
* `jgit/block_cache/cache_used`: Bytes of memory retained in JGit block cache.
* `jgit/block_cache/open_files`: File handles held open by JGit block cache.
=== Git
* `git/upload-pack/request_count`: Total number of git-upload-pack requests.
* `git/upload-pack/phase_counting`: Time spent in the 'Counting...' phase.
* `git/upload-pack/phase_compressing`: Time spent in the 'Compressing...' phase.
* `git/upload-pack/phase_writing`: Time spent transferring bytes to client.
* `git/upload-pack/pack_bytes`: Distribution of sizes of packs sent to clients.
=== BatchUpdate
* `batch_update/execute_change_ops`: BatchUpdate change update latency,
excluding reindexing.
=== NoteDb
* `notedb/update_latency`: NoteDb update latency by table.
* `notedb/stage_update_latency`: Latency for staging updates to NoteDb by table.
* `notedb/read_latency`: NoteDb read latency by table.
* `notedb/parse_latency`: NoteDb parse latency by table.
* `notedb/external_id_cache_load_count`: Total number of times the external ID
cache loader was called.
* `notedb/external_id_partial_read_latency`: Latency for generating a new external ID
cache state from a prior state.
* `notedb/external_id_update_count`: Total number of external ID updates.
* `notedb/read_all_external_ids_latency`: Latency for reading all
external IDs from NoteDb.
=== Permissions
* `permissions/project_state/computation_latency`: Latency to compute current access
sections on a project by traversing its parents.
* `permissions/permission_collection/filter_latency`: Latency to filter access sections
by user and ref.
* `permissions/ref_filter/full_filter_count`: Rate of full ref filter operations.
* `permissions/ref_filter/skip_filter_count`: Rate of ref filter operations where
full evaluation is skipped because the user can read all refs.
=== Reviewer Suggestion
* `reviewer_suggestion/query_accounts`: Latency for querying accounts for
reviewer suggestion.
* `reviewer_suggestion/recommend_accounts`: Latency for recommending accounts
for reviewer suggestion.
* `reviewer_suggestion/load_accounts`: Latency for loading accounts for
reviewer suggestion.
* `reviewer_suggestion/query_groups`: Latency for querying groups for reviewer
suggestion.
=== Repo Sequences
* `sequence/next_id_latency`: Latency of requesting IDs from repo sequences.
=== Plugin
* `plugin/latency`: Latency for plugin invocation.
* `plugin/error_count`: Number of plugin errors.
=== Group
* `group/guess_relevant_groups_latency`: Latency for guessing relevant groups.
=== Replication Plugin
* `plugins/replication/replication_latency`: Time spent pushing to remote
destination.
* `plugins/replication/replication_delay`: Time spent waiting before pushing to
remote destination.
* `plugins/replication/replication_retries`: Number of retries when pushing to
remote destination.
=== License
* `license/cla_check_count`: Total number of CLA check requests.
GERRIT
------
Part of link:index.html[Gerrit Code Review]
SEARCHBOX
---------