Merge "Add gearman stats reference"

tags/3.4.0
Zuul 5 months ago
parent
commit
51ede31e84
2 changed files with 75 additions and 7 deletions
  1. doc/source/admin/monitoring.rst (+40, -3)
  2. doc/source/admin/troubleshooting.rst (+35, -4)

+ 40
- 3
doc/source/admin/monitoring.rst

@@ -264,7 +264,10 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
264 264
    .. stat:: current_requests
265 265
       :type: gauge
266 266
 
267
-      The number of outstanding nodepool requests from Zuul.
267
+      The number of outstanding nodepool requests from Zuul.  Ideally
268
+      this will be at zero, meaning all requests are fulfilled.
269
+      Persistently high values indicate more testing node resources
270
+      would be helpful.
268 271
 
269 272
 .. stat:: zuul.mergers
270 273
 
@@ -283,7 +286,9 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
283 286
    .. stat:: jobs_queued
284 287
       :type: gauge
285 288
 
286
-      The number of merge jobs queued.
289
+      The number of merge jobs waiting for a merger.  This should
290
+      ideally be zero; persistent higher values indicate more merger
291
+      resources would be useful.
287 292
 
288 293
 .. stat:: zuul.executors
289 294
 
@@ -307,8 +312,40 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
307 312
    .. stat:: jobs_queued
308 313
       :type: gauge
309 314
 
310
-      The number of executor jobs queued.
315
+      The number of jobs allocated nodes, but queued waiting for an
316
+      executor to run on.  This should ideally be at zero; persistent
317
+      higher values indicate more executor resources would be useful.
311 318
 
319
+.. stat:: zuul.geard
320
+
321
+   Gearman job distribution statistics.  Gearman jobs encompass the
322
+   wide variety of distributed jobs running within the scheduler and
323
+   across mergers and executors.  These stats are emitted by the `gear
324
+   <https://pypi.org/project/gear/>`__ library.
325
+
326
+   .. stat:: running
327
+      :type: gauge
328
+
329
+      Jobs that Gearman has actively running.  The longest running
330
+      jobs will usually relate to active job execution so you would
331
+      expect this to have a lower bound around there.  Note this may
332
+      be lower than active nodes, as a multiple-node job will only
333
+      have one active Gearman job.
334
+
335
+   .. stat:: waiting
336
+      :type: gauge
337
+
338
+      Jobs waiting in the gearman queue.  This would be expected to be
339
+      around zero; note that this is *not* related to the backlogged
340
+      queue of jobs waiting for a node allocation (node allocations
341
+      are via Zookeeper).  If this is unexpectedly high, see
342
+      :ref:`debug_gearman` for queue debugging tips to find out which
343
+      particular function calls are waiting.
344
+
345
+   .. stat:: total
346
+      :type: gauge
347
+
348
+      The sum of the `running` and `waiting` jobs.
312 349
 
313 350
 As an example, given a job named `myjob` in `mytenant` triggered by a
314 351
 change to `myproject` on the `master` branch in the `gate` pipeline

+ 35
- 4
doc/source/admin/troubleshooting.rst

@@ -1,10 +1,41 @@
1 1
 Troubleshooting
2 2
 ---------------
3 3
 
4
-You can use telnet to connect to gearman to check which Zuul
5
-components are online::
4
+Some advanced troubleshooting options are provided below.  These are
5
+generally very low-level and are not normally required.
6
+
7
+.. _debug_gearman:
8
+
9
+Gearman Jobs
10
+============
11
+
12
+Connecting to Gearman can allow you to see if any Zuul components appear
13
+to not be accepting requests correctly.
14
+
15
+For unencrypted Gearman connections, you can use telnet to connect to
16
+the server and check which Zuul components are online::
6 17
 
7 18
     telnet <gearman_ip> 4730
8 19
 
9
-Useful commands are ``workers`` and ``status`` which you can run by just
10
-typing those commands once connected to gearman.
20
+For encrypted connections, you will need to provide suitable keys,
21
+e.g.::
22
+
23
+    openssl s_client -connect localhost:4730 -cert /etc/zuul/ssl/client.pem -key /etc/zuul/ssl/client.key
24
+
25
+Commands available are discussed in the Gearman `administrative
26
+protocol <http://gearman.org/protocol>`__.  Useful commands are
27
+``workers`` and ``status`` which you can run by just typing those
28
+commands once connected to gearman.
29
+
30
+For ``status`` you will see output for internal Zuul functions in the
31
+form ``FUNCTION\tTOTAL\tRUNNING\tAVAILABLE_WORKERS``::
32
+
33
+  ...
34
+  executor:resume:ze06.openstack.org	0	0	1
35
+  zuul:config_errors_list	0	0	1
36
+  zuul:status_get	0	0	1
37
+  executor:stop:ze11.openstack.org	0	0	1
38
+  zuul:job_list	0	0	1
39
+  zuul:tenant_sql_connection	0	0	1
40
+  executor:resume:ze09.openstack.org	0	0	1
41
+  ...
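
The ``status`` output above is tab-separated, so it is easy to post-process. The following is a minimal Python sketch, not part of this commit, that parses such output and lists functions with jobs that are queued but not yet running. The names ``FunctionStatus``, ``parse_gearman_status``, ``backlogged`` and ``fetch_status`` are hypothetical helpers introduced for illustration. It assumes the column order ``FUNCTION``, ``TOTAL``, ``RUNNING``, ``AVAILABLE_WORKERS`` described above, that waiting jobs can be estimated as ``TOTAL - RUNNING``, and that the server ends its reply with a line containing a single ``.`` as the Gearman administrative protocol describes. ``fetch_status`` only covers the unencrypted case.

```python
import socket
from typing import Dict, NamedTuple


class FunctionStatus(NamedTuple):
    """One row of Gearman ``status`` output (hypothetical helper type)."""
    total: int
    running: int
    available_workers: int


def parse_gearman_status(text: str) -> Dict[str, FunctionStatus]:
    """Parse FUNCTION\\tTOTAL\\tRUNNING\\tAVAILABLE_WORKERS lines."""
    statuses: Dict[str, FunctionStatus] = {}
    for line in text.splitlines():
        line = line.strip()
        parts = line.split("\t")
        # Skip blank lines, the terminating "." and anything malformed.
        if len(parts) != 4:
            continue
        name, total, running, workers = parts
        try:
            statuses[name] = FunctionStatus(int(total), int(running), int(workers))
        except ValueError:
            continue
    return statuses


def backlogged(statuses: Dict[str, FunctionStatus]) -> Dict[str, int]:
    """Return functions with queued-but-not-running jobs (TOTAL - RUNNING)."""
    return {
        name: s.total - s.running
        for name, s in statuses.items()
        if s.total > s.running
    }


def fetch_status(host: str, port: int = 4730, timeout: float = 5.0) -> str:
    """Send ``status`` over an unencrypted connection and return the reply.

    Assumes the reply is terminated by a line holding a single ".".
    """
    data = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        while not (data == b".\n" or data.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("utf-8", errors="replace")
```

For example, ``backlogged(parse_gearman_status(fetch_status("<gearman_ip>")))`` would return a mapping from function name to the number of waiting jobs, which is empty when nothing is queued.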
