:title: Monitoring

Monitoring
==========

.. _statsd:

Statsd reporting
----------------

Zuul supports the statsd protocol. When enabled and configured (see
below), the Zuul scheduler emits raw metrics to a statsd receiver,
which you can in turn use to generate graphs.

Configuration
~~~~~~~~~~~~~

Statsd support uses the ``statsd`` python module. Note that support
is optional and Zuul will start without the statsd python module
present.

Configuration is in the :attr:`statsd` section of ``zuul.conf``.

Metrics
~~~~~~~

These metrics are emitted by the Zuul :ref:`scheduler`:

.. stat:: zuul.event.<driver>.<type>
   :type: counter

   Zuul will report counters for each type of event it receives from
   each of its configured drivers.

.. stat:: zuul.tenant.<tenant>.pipeline

   Holds metrics specific to jobs. This hierarchy includes:

   .. stat:: <pipeline>

      A set of metrics for each pipeline named as defined in the Zuul
      config.

      .. stat:: all_jobs
         :type: counter

         Number of jobs triggered by the pipeline.

      .. stat:: current_changes
         :type: gauge

         The number of items currently being processed by this
         pipeline.

      .. stat:: project

         This hierarchy holds more specific metrics for each project
         participating in the pipeline.

         .. stat:: <canonical_hostname>

            The canonical hostname for the triggering project.
            Embedded ``.`` characters will be translated to ``_``.

            .. stat:: <project>

               The name of the triggering project. Embedded ``/`` or
               ``.`` characters will be translated to ``_``.

               .. stat:: <branch>

                  The name of the triggering branch. Embedded ``/`` or
                  ``.`` characters will be translated to ``_``.

                  .. stat:: job

                     Subtree detailing per-project job statistics:

                     .. stat:: <jobname>

                        The triggered job name.

                        .. stat:: <result>
                           :type: counter, timer

                           A counter for each type of result (e.g.,
                           ``SUCCESS``, ``FAILURE``, ``ERROR``) for
                           the job. If the result is ``SUCCESS`` or
                           ``FAILURE``, Zuul will additionally report
                           the duration of the build as a timer.

                  .. stat:: current_changes
                     :type: gauge

                     The number of items of this project currently
                     being processed by this pipeline.

                  .. stat:: resident_time
                     :type: timer

                     A timer metric reporting how long each item for
                     this project has been in the pipeline.

                  .. stat:: total_changes
                     :type: counter

                     The number of changes for this project processed
                     by the pipeline since Zuul started.

      .. stat:: resident_time
         :type: timer

         A timer metric reporting how long each item has been in the
         pipeline.

      .. stat:: total_changes
         :type: counter

         The number of changes processed by the pipeline since Zuul
         started.

      .. stat:: wait_time
         :type: timer

         How long each item spent in the pipeline before its first job
         started.

.. stat:: zuul.executor.<executor>

   Holds metrics emitted by individual executors. The ``<executor>``
   component of the key will be replaced with the hostname of the
   executor.

   .. stat:: merger.<result>
      :type: counter

      Incremented to represent the status of a Zuul executor's merger
      operations. ``<result>`` can be either ``SUCCESS`` or
      ``FAILURE``. A failed merge operation, which is counted as a
      ``FAILURE``, is what Zuul ends up returning as a
      ``MERGER_FAILURE``.

   .. stat:: builds
      :type: counter

      Incremented each time the executor starts a build.

   .. stat:: starting_builds
      :type: gauge

      The number of builds starting on this executor. These are
      builds which have not yet begun their first pre-playbook.

   .. stat:: running_builds
      :type: gauge

      The number of builds currently running on this executor. This
      includes starting builds.

   .. stat:: phase

      Subtree detailing per-phase execution statistics:

      .. stat:: <phase>

         ``<phase>`` represents a phase in the execution of a job.
         This can be an *internal* phase (such as ``setup`` or
         ``cleanup``) or a *job* phase such as ``pre``, ``run`` or
         ``post``.

         .. stat:: <result>
            :type: counter

            A counter for each type of result. These results do not,
            by themselves, determine the status of a build; they
            indicate the exit status provided by Ansible for the
            execution of a particular phase.

            Examples of possible counters for each phase are:
            ``RESULT_NORMAL``, ``RESULT_TIMED_OUT``,
            ``RESULT_UNREACHABLE``, ``RESULT_ABORTED``.

   .. stat:: load_average
      :type: gauge

      The one-minute load average of this executor, multiplied by 100.

   .. stat:: pct_used_ram
      :type: gauge

      The used RAM (excluding buffers and cache) on this executor, as
      a percentage multiplied by 100.

   .. stat:: pct_used_ram_cgroup
      :type: gauge

      The used RAM (excluding buffers and cache) on this executor
      allowed by the cgroup, as a percentage multiplied by 100.

.. stat:: zuul.nodepool.requests

   Holds metrics related to Zuul requests and responses from Nodepool.

   States are one of:

      *requested*
        Node request submitted by Zuul to Nodepool
      *canceled*
        Node request was canceled by Zuul
      *failed*
        Nodepool failed to fulfill a node request
      *fulfilled*
        Nodes were assigned by Nodepool

   .. stat:: <state>
      :type: timer

      Records the elapsed time from request to completion for states
      ``failed`` and ``fulfilled``. For example,
      ``zuul.nodepool.requests.fulfilled.mean`` will give the average
      time for all fulfilled requests within each ``statsd`` flush
      interval.

      A lower value for ``fulfilled`` requests is better. Ideally,
      there will be no ``failed`` requests.

   .. stat:: <state>.total
      :type: counter

      Incremented when nodes are assigned or removed as described in
      the states above.

   .. stat:: <state>.size.<size>
      :type: counter, timer

      Increments for the node count of each request. For example, a
      request for 3 nodes would use the key
      ``zuul.nodepool.requests.requested.size.3``; fulfillment of 3
      node requests can be tracked with
      ``zuul.nodepool.requests.fulfilled.size.3``.

      The timer is implemented for ``fulfilled`` and ``failed``
      requests. For example, the timer
      ``zuul.nodepool.requests.failed.size.3.mean`` gives the average
      time of 3-node failed requests within the ``statsd`` flush
      interval. A lower value for ``fulfilled`` requests is better.
      Ideally, there will be no ``failed`` requests.

   .. stat:: <state>.label.<label>