Add alert rules when OpenStack services are down

The openstack-exporter generates metrics that indicate whether services
are up, e.g.:
- openstack_loadbalancer_up
- openstack_designate_up
- openstack_identity_up

This new rule uses a regex to select every metric whose name
starts with 'openstack_' and ends with '_up'. If the result is 0,
meaning the service is down, it generates an individual alert for
each affected service.

Change-Id: Ia3f6aced5dcbfa124b4340ad054a43a460284019
Gabriel Cocenza
2024-07-09 11:37:31 -03:00
parent f3ab145e58
commit e3daf1ad07
2 changed files with 25 additions and 1 deletions


@@ -42,6 +42,15 @@ The charm by default uses following images:
`ghcr.io/canonical/openstack-exporter:1.6.0-7533071`
## Alerting Rules
This charm automatically adds Prometheus alert rules using the files at
`src/prometheus_alert_rules` when related with `grafana-agent`.
The following alerts are configured by default:
- `OpenStackServicesDown`: This alert rule triggers when an OpenStack service is down. The
exporter generates metrics that indicate whether services are up, e.g. `openstack_loadbalancer_up` and
`openstack_designate_up`. An individual alert is raised for each service that is down.
## Contributing
Please see the [Juju SDK docs](https://juju.is/docs/sdk) for guidelines
@@ -58,4 +67,3 @@ Please report bugs on [Launchpad][lp-bugs-charm-openstack-exporter-k8s].
[juju-docs-actions]: https://jaas.ai/docs/actions
[juju-docs-config-apps]: https://juju.is/docs/configuring-applications
[lp-bugs-charm-openstack-exporter-k8s]: https://bugs.launchpad.net/charm-openstack-exporter-k8s/+filebug


@@ -0,0 +1,16 @@
groups:
  - name: OpenStackServices
    rules:
      - alert: OpenStackServicesDown
        expr: |
          sum by(service) (
            label_replace({__name__=~"openstack_(.+)_up"}, "service", "$1", "__name__", "openstack_(.+)_up")
          ) == 0
        for: 5m
        labels:
          severity: critical
          service: "{{ $labels.service }}"
        annotations:
          summary: OpenStack Services Down
          description: |
            The OpenStack service {{ $labels.service }} is down
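
To make the selection logic of the `OpenStackServicesDown` expression easier to follow, here is a minimal Python sketch of what the expression appears to compute: extract the service name from each `openstack_<service>_up` metric, sum the values per service, and report the services whose sum is 0. This is only an illustration and not part of the charm. The function name `services_down` and the `(metric_name, value)` input shape are hypothetical, and the `for: 5m` hold period is not modelled.

```python
import re
from collections import defaultdict

# Prometheus regex matchers are fully anchored, so fullmatch() mirrors `=~`.
UP_METRIC = re.compile(r"openstack_(.+)_up")


def services_down(samples):
    """Return the sorted service names whose summed `up` value is 0.

    `samples` is a hypothetical list of (metric_name, value) pairs standing in
    for the instant vector that Prometheus would evaluate.
    """
    totals = defaultdict(float)
    for name, value in samples:
        match = UP_METRIC.fullmatch(name)
        if match is None:
            continue  # not an openstack_*_up metric, ignored like in the rule
        # Equivalent of label_replace(..., "service", "$1", ...) + sum by(service)
        totals[match.group(1)] += value
    return sorted(service for service, total in totals.items() if total == 0)


if __name__ == "__main__":
    example = [
        ("openstack_identity_up", 1),
        ("openstack_loadbalancer_up", 0),
        ("openstack_designate_up", 0),
        ("openstack_designate_up", 1),
        ("openstack_nova_total_vms", 0),
    ]
    print(services_down(example))
```

Because the values are summed per service, a service that has several series is reported only when all of them are 0, which is why `designate` is not listed in the example above.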