From c6fe6459f266b22b6e1eab64b38b2b35cf619176 Mon Sep 17 00:00:00 2001
From: Ian Wienand
Date: Tue, 27 Nov 2018 20:03:46 +1100
Subject: [PATCH] Rework zuul nodepool stats reporting

The current stats code sets a counter zuul.nodepool.<state> but then
tries to set more counters like zuul.nodepool.<state>.label.  This
doesn't work because zuul.nodepool.<state> is already a counter value;
it can't also be an intermediate key.

Note this *does* work with the timer values, but that's because statsd
turns the timer into individual values (e.g.
zuul.nodepool.<state>.mean) as it flushes each interval.

Thus we need to rethink these stats.  This puts them under a new
intermediate key "requests" and adds a "total" count; thus

  zuul.nodepool.<state> == zuul.nodepool.requests.<state>.total

The other stats, showing requests by-label and by-size, will now live
under the zuul.nodepool.requests parent.

While we're here, use a statsd pipeline to send the status update, as
it works better when sending lots of stats quickly over UDP.  This
isn't handled by the current debug log below; move this into the
test-case framework.

The documentation has been clarified to match the code.

Change-Id: I127e8b6d08ab86e0f24018fd4b33c626682c76c7
---
 doc/source/admin/monitoring.rst               | 111 ++++++++----------
 .../nodepool-statsd-3eb500893833cdc4.yaml     |  10 ++
 tests/base.py                                 |   3 +
 tests/unit/test_scheduler.py                  |  40 +++----
 zuul/nodepool.py                              |  37 +++---
 5 files changed, 106 insertions(+), 95 deletions(-)
 create mode 100644 releasenotes/notes/nodepool-statsd-3eb500893833cdc4.yaml

diff --git a/doc/source/admin/monitoring.rst b/doc/source/admin/monitoring.rst
index 4487d187a6..a75c2ddd8e 100644
--- a/doc/source/admin/monitoring.rst
+++ b/doc/source/admin/monitoring.rst
@@ -188,78 +188,69 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
       The used RAM (excluding buffers and cache) on this executor, as
       a percentage multiplied by 100.
 
-.. stat:: zuul.nodepool
+.. stat:: zuul.nodepool.requests
 
-   Holds metrics related to Zuul requests from Nodepool.
+   Holds metrics related to Zuul requests and responses from Nodepool.
 
-   .. stat:: requested
+   States are one of:
+
+      *requested*
+        Node request submitted by Zuul to Nodepool
+      *canceled*
+        Node request was canceled by Zuul
+      *failed*
+        Nodepool failed to fulfill a node request
+      *fulfilled*
+        Nodes were assigned by Nodepool
+
+   .. stat:: <state>
+      :type: timer
+
+      Records the elapsed time from request to completion for states
+      `failed` and `fulfilled`.  For example,
+      ``zuul.nodepool.requests.fulfilled.mean`` will give the average
+      time for all fulfilled requests within each ``statsd`` flush
+      interval.
+
+      A lower value for `fulfilled` requests is better.  Ideally,
+      there will be no `failed` requests.
+
+   .. stat:: <state>.total
       :type: counter
 
-      Incremented each time a node request is submitted to Nodepool.
+      Incremented when nodes are assigned or removed as described in
+      the states above.
 
-   .. stat:: label.