From 61defed938f49f338bc5df9a9a8794dfae2021e6 Mon Sep 17 00:00:00 2001
From: Nobuto Murata <nobuto.murata@canonical.com>
Date: Sun, 17 Mar 2024 22:55:39 +0900
Subject: [PATCH] Don't expect a static job name

A job name passed via the prometheus_scrape library doesn't end up as a
static job name in the Prometheus configuration file in the COS world,
even though the alert rules expect a fixed string. In practice, we
cannot use a static job name like job=ceph in any of the alert rules in
COS, since the charms will convert the string "ceph" into:

> juju_MODELNAME_ID_APPNAME_prometheus_scrape_JOBNAME(ceph)-N
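
To illustrate why job="ceph" can never match, here is a minimal Python
sketch that builds a job name following the pattern above and compares
it using the exact-match semantics of PromQL's `=` label matcher. The
model, ID, application and index values, as well as the
`equality_matches` helper, are hypothetical and chosen only for
illustration.

```python
# Hypothetical values for illustration; the real generated name depends
# on the model, the application and the charm library version.
model, model_id, app, job, n = "ceph", "0123abcd", "ceph-mon", "ceph", 0

# Build a job name following the pattern quoted above.
generated_job = f"juju_{model}_{model_id}_{app}_prometheus_scrape_{job}-{n}"


def equality_matches(matcher_value: str, label_value: str) -> bool:
    """Model PromQL's `=` label matcher, which requires an exact match."""
    return matcher_value == label_value


print(generated_job)
print(equality_matches("ceph", generated_job))  # job="ceph" does not match
```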

Let's give up on a static job name and use "up{}" instead, so that the
expression will be annotated with the model name/ID, etc. without any
job-specific condition. This will break the alert rules when one unit
has more than one scrape endpoint, because there will be no way to
distinguish multiple scrape jobs. Ceph MON only has one Prometheus
endpoint for the time being, so this change shouldn't cause an
immediate issue. Overall, it's not ideal, but it is at least better
than the current status, which is an alert firing out of the box.

The following alert rule:
> up{} == 0
will be converted and annotated as:
> up{juju_application="ceph-mon",juju_model="ceph",juju_model_uuid="UUID"} == 0
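
The annotation step above can be sketched in Python as follows. This is
a minimal illustration that only handles the literal "up{}" selector
used in this change; the `inject_labels` helper is hypothetical and is
not how the charm library actually rewrites expressions. The topology
values are the same placeholders as in the example above.

```python
def inject_labels(expr: str, labels: dict) -> str:
    """Fill an empty `up{}` selector with the given label matchers.

    Hypothetical helper: it only rewrites the literal `up{}` and is not
    the charm library's actual implementation.
    """
    # Sort keys so the output order is deterministic.
    matchers = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return expr.replace("up{}", "up{" + matchers + "}")


topology = {
    "juju_application": "ceph-mon",
    "juju_model": "ceph",
    "juju_model_uuid": "UUID",
}

# Both expressions touched by this patch.
for rule in ("up{} == 0", "absent(up{})"):
    print(inject_labels(rule, topology))
```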

Closes-Bug: #2044062

Change-Id: I0df8bc0238349b5f03179dfb8f4da95da48140c7
(cherry picked from commit fb3262183102171da5704868d7522290b3a9ede4)
---
 files/prometheus_alert_rules/prometheus_alerts.yml.default | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/files/prometheus_alert_rules/prometheus_alerts.yml.default b/files/prometheus_alert_rules/prometheus_alerts.yml.default
index a544d41e..b292a3c4 100644
--- a/files/prometheus_alert_rules/prometheus_alerts.yml.default
+++ b/files/prometheus_alert_rules/prometheus_alerts.yml.default
@@ -343,7 +343,7 @@ groups:
         annotations:
           description: "The mgr/prometheus module at {{ $labels.instance }} is unreachable. This could mean that the module has been disabled or the mgr daemon itself is down. Without the mgr/prometheus module metrics and alerts will no longer function. Open a shell to an admin node or toolbox pod and use 'ceph -s' to to determine whether the mgr is active. If the mgr is not active, restart it, otherwise you can determine module status with 'ceph mgr module ls'. If it is not listed as enabled, enable it with 'ceph mgr module enable prometheus'."
           summary: "The mgr/prometheus module is not available"
-        expr: "up{job=\"ceph\"} == 0"
+        expr: "up{} == 0"
         for: "1m"
         labels:
           oid: "1.3.6.1.4.1.50495.1.2.1.6.2"
@@ -601,7 +601,7 @@ groups:
         annotations:
           description: "The prometheus job that scrapes from Ceph is no longer defined, this will effectively mean you'll have no metrics or alerts for the cluster.  Please review the job definitions in the prometheus.yml file of the prometheus instance."
           summary: "The scrape job for Ceph is missing from Prometheus"
-        expr: "absent(up{job=\"ceph\"})"
+        expr: "absent(up{})"
         for: "30s"
         labels:
           oid: "1.3.6.1.4.1.50495.1.2.1.12.1"