Don't expect a static job name

A job name passed via the prometheus_scrape library doesn't end up as a
static job name in the Prometheus configuration file in the COS world,
even though COS expects a fixed string. In practice, none of the alert
rules in COS can match on a static job name like job=ceph, because the
charms convert the string "ceph" into:

> juju_MODELNAME_ID_APPNAME_prometheus_scrape_JOBNAME(ceph)-N

Let's give up on a static job name and use "up{}" instead, so the
expression is annotated with the model name, model UUID, etc. without
any job-specific condition. This will break the alert rules when a unit
has more than one scraping endpoint, because there will be no way to
distinguish between multiple scraping jobs. Ceph MON only has one
Prometheus endpoint for the time being, so this change shouldn't cause
an immediate issue. Overall, it's not ideal, but it is at least better
than the current state, which is an alert firing in error out of the box.

The following alert rule:
> up{} == 0
will be converted and annotated as:
> up{juju_application="ceph-mon",juju_model="ceph",juju_model_uuid="UUID"} == 0
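
The conversion above can be illustrated with a minimal sketch. It is not
the COS implementation: the function name inject_topology is hypothetical,
and plain string substitution stands in for whatever expression handling
the charm libraries actually perform. It only shows why an empty "up{}"
selector ends up scoped to a single application and model.

```python
def inject_topology(expr: str, topology: dict) -> str:
    """Naively fill every empty `{}` selector with label matchers.

    Hypothetical helper for illustration only; it does not parse PromQL.
    """
    matchers = ",".join(
        f'{label}="{value}"' for label, value in sorted(topology.items())
    )
    return expr.replace("{}", "{" + matchers + "}")


# Label values taken from the example in this commit message.
topology = {
    "juju_application": "ceph-mon",
    "juju_model": "ceph",
    "juju_model_uuid": "UUID",
}

print(inject_topology("up{} == 0", topology))
print(inject_topology("absent(up{})", topology))
```

Because no job label is involved, two scrape jobs on the same application
would be indistinguishable in the resulting expression, which is the
limitation described above.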

Closes-Bug: #2044062

Change-Id: I0df8bc0238349b5f03179dfb8f4da95da48140c7
(cherry picked from commit fb32621831)
Nobuto Murata 2024-03-17 22:55:39 +09:00
parent f3d290b55d
commit 61defed938
1 changed files with 2 additions and 2 deletions

@@ -343,7 +343,7 @@ groups:
annotations:
description: "The mgr/prometheus module at {{ $labels.instance }} is unreachable. This could mean that the module has been disabled or the mgr daemon itself is down. Without the mgr/prometheus module metrics and alerts will no longer function. Open a shell to an admin node or toolbox pod and use 'ceph -s' to to determine whether the mgr is active. If the mgr is not active, restart it, otherwise you can determine module status with 'ceph mgr module ls'. If it is not listed as enabled, enable it with 'ceph mgr module enable prometheus'."
summary: "The mgr/prometheus module is not available"
-expr: "up{job=\"ceph\"} == 0"
+expr: "up{} == 0"
for: "1m"
labels:
oid: "1.3.6.1.4.1.50495.1.2.1.6.2"
@@ -601,7 +601,7 @@ groups:
annotations:
description: "The prometheus job that scrapes from Ceph is no longer defined, this will effectively mean you'll have no metrics or alerts for the cluster. Please review the job definitions in the prometheus.yml file of the prometheus instance."
summary: "The scrape job for Ceph is missing from Prometheus"
-expr: "absent(up{job=\"ceph\"})"
+expr: "absent(up{})"
for: "30s"
labels:
oid: "1.3.6.1.4.1.50495.1.2.1.12.1"