grafyaml already flags this as deprecated, but the problem goes further
than that: it no longer refreshes the variable at all. "1" means
"on load", which is what we want.
Change-Id: I34ecdd30c2188cb7e6ec32e33c6a6e99b6240934
The templating we end up with in the running grafana for the OVH regions
on the OVH dashboard is null. We set our OpenStack datasource to be our
default datasource but maybe we need to set it explicitly. Do this to
see if it changes the behavior.
Change-Id: Ie95dd980a5c117e1849b08a3611330ff06987c34
The minor updates are apparently due to us not having run the script
the last time it was updated with new URLs.
Change-Id: I255d1e47b5cff29a3ed377b65ceab677ab1c272e
All of these dashboards are essentially the same, and each copy has
mostly inherited the same issues. This makes updating anything a
massive pain.
This implements a single dashboard template with a small script to
create individual dashboards for each provider and its regions.
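As a rough illustration of the template-plus-script idea (not the
actual script in this change), a minimal sketch could look like the
following. The template text, the provider and region names, and the
output filenames are all hypothetical placeholders, and the YAML keys
are only meant to suggest a dashboard definition, not to reproduce the
real one.

```python
from string import Template

# Hypothetical, heavily trimmed template. The real template would be a
# complete dashboard definition; these keys are placeholders.
DASHBOARD_TEMPLATE = Template(
    "dashboard:\n"
    "  title: 'Nodepool: $provider'\n"
    "  templating:\n"
    "    - name: region\n"
    "      type: custom\n"
    "      options: [$regions]\n"
)

# Hypothetical provider -> regions data, for illustration only.
PROVIDERS = {
    "example-cloud": ["region-a", "region-b"],
    "other-cloud": ["region-x"],
}


def render_dashboards(providers):
    """Return a mapping of output filename to rendered dashboard text."""
    rendered = {}
    for provider, regions in sorted(providers.items()):
        rendered[f"nodepool-{provider}.yaml"] = DASHBOARD_TEMPLATE.substitute(
            provider=provider,
            regions=", ".join(regions),
        )
    return rendered


if __name__ == "__main__":
    # Print each generated dashboard; a real script would write files.
    for filename, text in render_dashboards(PROVIDERS).items():
        print(f"# {filename}")
        print(text)
```

With this shape, a fix made once in the template reaches every
provider dashboard the next time the script is run.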
I have included a range of fixes. The y-axis format has changed in
later versions of grafana. The API time tracking is no longer scaled;
instead we just tell grafana it is in ms and it displays it correctly.
The test nodes history graph is moved to the top, as it is probably
the most interesting graph (note this splits itself out per region if
multiple regions are selected). Values for "null as zero" are
consistently set. Various formatting fixes for the labels are
included.
Change-Id: I5fbffaec3c82aa1fce0947f771de67edd15f7dfc
These stats aren't updating any more. Unfortunately, I don't think
there's any current replacement as nodepool doesn't have any insight
into the job it is satisfying a request for.
Change-Id: Ib69fbda5ee019180cd8761d0ead474b426bce379
Since we now query a cloud for its quota information, let's track the
response rate in grafana.
Change-Id: Ie9e2727b5dc3d18f5e5fc37be89a9a5f9492eb47
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Following the update to Zuul v3, some things changed:
- nodes.delete became nodes.deleting
- nodes.used became nodes.in-use, but nodes.used is still relevant
  as it's the status between 'in-use' and 'deleting'
Also add a panel for displaying failed nodes.
Change-Id: I240d082115bd9078e45984d8fcff212a4e40e842
Depends-On: I6a89752d74ed7424267c3af3937ad01fb4bb8f86
Now that nodepool has been switched to use shade, we need to update
grafana to use the new shade syntax for Server-related tasks.
Change-Id: I7698d54d89bda5327ac434fd8e662f0fe58d7f5e
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
I missed some tweaks on the previous Test Nodes graph change.
Also make the job runtimes wider like Paul suggested.
Change-Id: I5ac43909a679d273a557112ad8526a68de15f4f1
Add axis labels and units where appropriate.
Change the launch attempts graphs to summarize to 1m rather than
1h since grafana lets us zoom in. 1m is the lowest native unit
of time that will always show whole numbers for this metric (whose
lowest non-zero value is 1 event / 10 seconds).
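One way to read the 1m reasoning, sketched as arithmetic. This assumes
the metric is reported once per 10-second interval (the interval
constant and function name below are illustrative, not taken from the
dashboards): a single event in one interval scales to a fraction per
second, but to a whole number per minute.

```python
from fractions import Fraction

# Assumption: one report per 10-second interval, so the smallest
# non-zero value is 1 event / 10 seconds.
REPORT_INTERVAL_S = 10


def events_per_unit(events_in_interval, unit_seconds):
    """Scale a count seen in one reporting interval to a per-unit rate."""
    return Fraction(events_in_interval * unit_seconds, REPORT_INTERVAL_S)


for name, seconds in (("1s", 1), ("1m", 60), ("1h", 3600)):
    rate = events_per_unit(1, seconds)
    kind = "whole" if rate.denominator == 1 else "fractional"
    print(name, rate, kind)
```

Under that assumption, 1s yields 1/10 while 1m yields 6, so 1m is the
smallest of these units that always gives whole numbers.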
Change the test nodes graph to stacked to match the way we normally
draw this graph, but change the tooltip to 'individual' so that
when hovering, individual values for the different states are
displayed, rather than cumulative (which does not make sense for
this application).
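To illustrate why cumulative tooltip values are unhelpful on a stacked
graph, here is a small sketch with made-up node counts (the state
names and numbers are hypothetical). A cumulative tooltip reports the
running total up to each layer of the stack, while an individual
tooltip reports each state's own value.

```python
from itertools import accumulate

# Hypothetical node counts per state at a single point in time,
# listed in stacking order.
states = {"building": 3, "ready": 5, "in-use": 40, "deleting": 2}

# What an 'individual' tooltip would show: each state's own count.
individual = dict(states)

# What a 'cumulative' tooltip would show: the running total of the stack.
cumulative = dict(zip(states, accumulate(states.values())))

for state in states:
    print(f"{state}: individual={individual[state]} cumulative={cumulative[state]}")
```

Here the cumulative value for 'deleting' would be 50 even though only
2 nodes are in that state.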
Also change the tooltip for the node graphs on the zuul dashboard
in the same manner.
Change-Id: I500aa486362476cff76a3d254093723f27021bed
Depends-On: Ie542dc4d0e151a00e84cc970c2cfa8c02377d7bf
These are per-region versions of the nodepool node state graph,
except that the values are not stacked, which makes the individual
values more accessible.
Change-Id: I8ec90758828484a9ffb7a90d2eacbcccc8b78bb4
There is no .error metric; rather, errors are broken out by
cause. For this graph, simply display their sum.
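The effect of summing the per-cause series can be sketched as below.
The cause names and values are hypothetical, and treating missing
points as zero is an assumption of this sketch rather than a statement
about how the graph backend handles nulls.

```python
def sum_series(series_by_cause):
    """Sum per-cause series point by point, counting missing points as 0."""
    return [
        sum(value or 0 for value in points)
        for points in zip(*series_by_cause.values())
    ]


# Hypothetical per-cause error counts; None marks a missing datapoint.
errors = {
    "timeout": [0, 1, None, 2],
    "quota": [1, 0, 0, None],
    "unknown": [None, None, 3, 0],
}

total = sum_series(errors)
print(total)
```

The graph then plots the single combined series in place of one
series per cause.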
Change-Id: Iae19e4e78098f3373c3195ff3ec52a11c5e92a3b