
Add gearman stats reference

The stats emitted under zuul.geard are currently undocumented.  Add
them to the monitoring guide and add some more details to the geard
troubleshooting guide for what to do if the stats look wrong.

Change-Id: I831def2f7c22d8ffff62569cc7d657033a85ed19
tags/3.4.0
Ian Wienand 5 months ago
commit 18fb9ec37e
2 changed files with 75 additions and 7 deletions
  1. doc/source/admin/monitoring.rst (+40, -3)
  2. doc/source/admin/troubleshooting.rst (+35, -4)

doc/source/admin/monitoring.rst (+40, -3)

@@ -264,7 +264,10 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
264 264
    .. stat:: current_requests
265 265
       :type: gauge
266 266
 
267
-      The number of outstanding nodepool requests from Zuul.
267
+      The number of outstanding nodepool requests from Zuul.  Ideally
268
+      this will be at zero, meaning all requests are fulfilled.
269
+      Persistently high values indicate more testing node resources
270
+      would be helpful.
268 271
 
269 272
 .. stat:: zuul.mergers
270 273
 
@@ -283,7 +286,9 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
283 286
    .. stat:: jobs_queued
284 287
       :type: gauge
285 288
 
286
-      The number of merge jobs queued.
289
+      The number of merge jobs waiting for a merger.  This should
290
+      ideally be zero; persistent higher values indicate more merger
291
+      resources would be useful.
287 292
 
288 293
 .. stat:: zuul.executors
289 294
 
@@ -307,8 +312,40 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
307 312
    .. stat:: jobs_queued
308 313
       :type: gauge
309 314
 
310
-      The number of executor jobs queued.
315
+      The number of jobs allocated nodes, but queued waiting for an
316
+      executor to run on.  This should ideally be at zero; persistent
317
+      higher values indicate more executor resources would be useful.
311 318
 
319
+.. stat:: zuul.geard
320
+
321
+   Gearman job distribution statistics.  Gearman jobs encompass the
322
+   wide variety of distributed jobs running within the scheduler and
323
+   across mergers and executors.  These stats are emitted by the `gear
324
+   <https://pypi.org/project/gear/>`__ library.
325
+
326
+   .. stat:: running
327
+      :type: gauge
328
+
329
+      Jobs that Gearman has actively running.  The longest-running
330
+      Gearman jobs are usually job executions, so expect this value
331
+      to be at least the number of jobs being executed.  Note this may
332
+      be lower than active nodes, as a multiple-node job will only
333
+      have one active Gearman job.
334
+
335
+   .. stat:: waiting
336
+      :type: gauge
337
+
338
+      Jobs waiting in the gearman queue.  This would be expected to be
339
+      around zero; note that this is *not* related to the backlogged
340
+      queue of jobs waiting for a node allocation (node allocations
341
+      are via ZooKeeper).  If this is unexpectedly high, see
342
+      :ref:`debug_gearman` for queue debugging tips to find out which
343
+      particular function calls are waiting.
344
+
345
+   .. stat:: total
346
+      :type: gauge
347
+
348
+      The sum of the `running` and `waiting` jobs.
312 349
 
313 350
 As an example, given a job named `myjob` in `mytenant` triggered by a
314 351
 change to `myproject` on the `master` branch in the `gate` pipeline

doc/source/admin/troubleshooting.rst (+35, -4)

@@ -1,10 +1,41 @@
1 1
 Troubleshooting
2 2
 ---------------
3 3
 
4
-You can use telnet to connect to gearman to check which Zuul
5
-components are online::
4
+Some advanced troubleshooting options are provided below.  These are
5
+generally very low-level and are not normally required.
6
+
7
+.. _debug_gearman:
8
+
9
+Gearman Jobs
10
+============
11
+
12
+Connecting to Gearman can allow you to see if any Zuul components appear
13
+to not be accepting requests correctly.
14
+
15
+For unencrypted Gearman connections, you can use telnet to connect
16
+and check which Zuul components are online::
6 17
 
7 18
     telnet <gearman_ip> 4730
8 19
 
9
-Useful commands are ``workers`` and ``status`` which you can run by just
10
-typing those commands once connected to gearman.
20
+For encrypted connections, you will need to provide suitable keys,
21
+e.g.::
22
+
23
+    openssl s_client -connect localhost:4730 -cert /etc/zuul/ssl/client.pem -key /etc/zuul/ssl/client.key
24
+
25
+Commands available are discussed in the Gearman `administrative
26
+protocol <http://gearman.org/protocol>`__.  Useful commands are
27
+``workers`` and ``status`` which you can run by just typing those
28
+commands once connected to gearman.
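
The same ``status`` or ``workers`` query can also be scripted instead of typed into telnet. The following is a minimal sketch, assuming an unencrypted Gearman server; the helper name ``gearman_admin`` is hypothetical and is not part of Zuul or the gear library. It relies on the administrative protocol ending these responses with a line containing a single period.

```python
import socket


def gearman_admin(host, port, command, timeout=5.0):
    """Send one admin command (e.g. "status") and return the response lines.

    Hypothetical helper for illustration; it handles plain TCP only, so
    encrypted servers would need the socket wrapped with suitable keys.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\n")
        buf = b""
        # "status" and "workers" responses end with a line holding only ".".
        while not (buf == b".\n" or buf.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            buf += chunk
    lines = buf.decode("utf-8", "replace").splitlines()
    return [line for line in lines if line != "."]
```

For example, ``gearman_admin("<gearman_ip>", 4730, "status")`` would return one string per registered function.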
29
+
30
+For ``status`` you will see output for internal Zuul functions in the
31
+form ``FUNCTION\tTOTAL\tRUNNING\tAVAILABLE_WORKERS``::
32
+
33
+  ...
34
+  executor:resume:ze06.openstack.org	0	0	1
35
+  zuul:config_errors_list	0	0	1
36
+  zuul:status_get	0	0	1
37
+  executor:stop:ze11.openstack.org	0	0	1
38
+  zuul:job_list	0	0	1
39
+  zuul:tenant_sql_connection	0	0	1
40
+  executor:resume:ze09.openstack.org	0	0	1
41
+  ...
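
To find which function calls are waiting, the tab-separated ``status`` output above can be parsed and filtered. This is a rough sketch under stated assumptions: ``FunctionStatus``, ``parse_status``, ``backlogged`` and ``totals`` are hypothetical names, and summing the per-function columns is only an approximation of the ``zuul.geard`` gauges, not a description of how the gear library computes them.

```python
from typing import NamedTuple


class FunctionStatus(NamedTuple):
    name: str
    total: int      # TOTAL column: queued plus running jobs
    running: int    # RUNNING column
    workers: int    # AVAILABLE_WORKERS column

    @property
    def waiting(self):
        # Jobs queued for this function but not yet picked up by a worker.
        return self.total - self.running


def parse_status(text):
    """Parse ``status`` output into FunctionStatus rows."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().rsplit("\t", 3)
        if len(parts) != 4:
            continue  # skips blank lines, the "." terminator and "..." elisions
        name, total, running, workers = parts
        rows.append(FunctionStatus(name, int(total), int(running), int(workers)))
    return rows


def backlogged(rows):
    """Return only the functions that have jobs waiting."""
    return [row for row in rows if row.waiting > 0]


def totals(rows):
    """Sum the columns across all functions (an assumed approximation)."""
    running = sum(row.running for row in rows)
    total = sum(row.total for row in rows)
    return {"running": running, "waiting": total - running, "total": total}
```

A function that shows waiting jobs alongside zero available workers is a likely sign that the component registering it is not accepting requests.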
