Prometheus

The Prometheus chart in openstack-helm-infra provides a time series database and a powerful query language for monitoring the components of OpenStack-Helm. Prometheus gathers metrics by scraping defined service endpoints or pods at specified intervals and indexing the results in the underlying time series database.

Authentication

The Prometheus deployment includes a sidecar container that runs an Apache reverse proxy to add authentication capabilities for Prometheus. The username and password are configured under the monitoring entry in the endpoints section of the chart's values.yaml.
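As a minimal sketch, the credentials might be overridden as shown here. The auth.admin nesting and the values themselves are assumptions for illustration; check the chart's values.yaml for the exact key layout before relying on it:

```yaml
# Hypothetical override file, e.g. prometheus-auth.yaml
endpoints:
  monitoring:
    auth:
      admin:  # assumed user key; verify against the chart's values.yaml
        username: prometheus-admin
        password: replace-with-a-strong-password
```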

The configuration for Apache can be found under the conf.httpd key, and uses a helm-toolkit function that allows for including gotpl entries in the template directly. This allows the use of other templates, like the endpoint lookup function templates, directly in the configuration for Apache.

Prometheus Service configuration

The Prometheus service is configured via command line flags passed when the service starts. These flags set the configuration file, the log level, and characteristics of the time series database, and enable the web admin API for snapshot support. They can be configured via the values tree at:

conf:
  prometheus:
    command_line_flags:
      log.level: info
      query.max_concurrency: 20
      query.timeout: 2m
      storage.tsdb.path: /var/lib/prometheus/data
      storage.tsdb.retention: 7d
      web.enable_admin_api: false
      web.enable_lifecycle: false
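
For instance, an override file along these lines would extend retention and enable the admin API for snapshots. Only keys from the tree above are used; the file name and the chosen values are illustrative assumptions:

```yaml
# overrides.yaml (hypothetical file name)
conf:
  prometheus:
    command_line_flags:
      storage.tsdb.retention: 14d   # keep two weeks of data instead of 7d
      web.enable_admin_api: true    # needed for TSDB snapshot support
```

Such a file would typically be passed to helm with the --values flag at install or upgrade time.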

The Prometheus configuration file contains the definitions for scrape targets and the location of the rules files for triggering alerts on scraped metrics. The configuration file is defined in the values file, and can be found at:

conf:
  prometheus:
    scrape_configs: |

By defining the configuration via the values file, an operator can override all configuration components of the Prometheus deployment at runtime.

Kubernetes Endpoint Configuration

The Prometheus chart in openstack-helm-infra uses the built-in service discovery mechanisms for Kubernetes endpoints and pods to automatically configure scrape targets. Functions added to helm-toolkit allow these targets to be configured via annotations that can be applied to any service or pod that exposes metrics for Prometheus, whether a service for an application-specific exporter or an application that provides a metrics endpoint via its service. The values in these functions correspond to entries in the monitoring tree under the prometheus key in a chart's values.yaml file.

The function definitions are below:

{{- define "helm-toolkit.snippets.prometheus_service_annotations" -}}
{{- $config := index . 0 -}}
{{- if $config.scrape }}
prometheus.io/scrape: {{ $config.scrape | quote }}
{{- end }}
{{- if $config.scheme }}
prometheus.io/scheme: {{ $config.scheme | quote }}
{{- end }}
{{- if $config.path }}
prometheus.io/path: {{ $config.path | quote }}
{{- end }}
{{- if $config.port }}
prometheus.io/port: {{ $config.port | quote }}
{{- end }}
{{- end -}}

{{- define "helm-toolkit.snippets.prometheus_pod_annotations" -}}
{{- $config := index . 0 -}}
{{- if $config.scrape }}
prometheus.io/scrape: {{ $config.scrape | quote }}
{{- end }}
{{- if $config.path }}
prometheus.io/path: {{ $config.path | quote }}
{{- end }}
{{- if $config.port }}
prometheus.io/port: {{ $config.port | quote }}
{{- end }}
{{- end -}}

These functions render the following annotations:

  • prometheus.io/scrape: Must be set to true for Prometheus to scrape the target
  • prometheus.io/scheme: Overrides the scheme used to scrape the target if it is not http
  • prometheus.io/path: Overrides the path used to scrape the target's metrics if it is not /metrics
  • prometheus.io/port: Overrides the port to scrape metrics on if it is not the service's default port
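
On the Prometheus side, annotations like these are typically consumed through relabeling rules in a Kubernetes service discovery scrape job. The fragment below is a sketch of that common pattern, not a copy of the chart's actual scrape configuration; the job name is an assumption:

```yaml
scrape_configs:
  - job_name: kubernetes-service-endpoints   # hypothetical job name
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only targets whose service carries prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use prometheus.io/scheme when it is set to http or https
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      # Use prometheus.io/path in place of the default /metrics
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the target address to use prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
```

A pod-based job would follow the same shape with role: pod and the __meta_kubernetes_pod_annotation_ prefix.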

Each chart that can be targeted for monitoring by Prometheus has a prometheus section under a monitoring tree in the chart's values.yaml, and Prometheus monitoring is disabled by default for those services. Example values for the required entries can be found in the following monitoring configuration for the prometheus-node-exporter chart:

monitoring:
  prometheus:
    enabled: false
    node_exporter:
      scrape: true

When the monitoring.prometheus.enabled key is set to true, the condition guarding the annotations evaluates to true and the annotations are applied to the targeted service or pod. For example:

{{- $prometheus_annotations := $envAll.Values.monitoring.prometheus.node_exporter }}
---
apiVersion: v1
kind: Service
metadata:
  name: {{ tuple "node_metrics" "internal" . | include "helm-toolkit.endpoints.hostname_short_endpoint_lookup" }}
  labels:
{{ tuple $envAll "node_exporter" "metrics" | include "helm-toolkit.snippets.kubernetes_metadata_labels" | indent 4 }}
  annotations:
{{- if .Values.monitoring.prometheus.enabled }}
{{ tuple $prometheus_annotations | include "helm-toolkit.snippets.prometheus_service_annotations" | indent 4 }}
{{- end }}
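
With enabled set to true and only scrape defined under node_exporter, the template would render roughly as follows. The service name is a placeholder, since the real value comes from the endpoint lookup, and the labels are omitted:

```yaml
# Approximate rendered output (name is hypothetical, labels omitted)
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  annotations:
    prometheus.io/scrape: "true"
```

The scheme, path, and port annotations are absent because those keys are unset in the values, so their if blocks render nothing.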

Kubelet, API Server, and cAdvisor

The Prometheus chart includes scrape target configurations for the kubelet, the Kubernetes API servers, and cAdvisor. These targets are configured for a kubeadm-deployed Kubernetes cluster, as OpenStack-Helm uses kubeadm to deploy Kubernetes in the gates. These configurations may need to change depending on your chosen method of deployment. Please note that cAdvisor metrics will not be captured if the kubelet was started with the following flag:

--cadvisor-port=0

To enable the gathering of the kubelet's custom metrics, the following flag must be set:

--enable-custom-metrics

Installation

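The Prometheus chart can be installed with the following command (Helm 2 syntax; Helm 3 removed the --name flag and takes the release name as the first positional argument):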

helm install --namespace=openstack local/prometheus --name=prometheus

The above command results in a Prometheus deployment configured to automatically discover services with the necessary annotations for scraping and to gather metrics on the kubelet, the Kubernetes API servers, and cAdvisor.

Extending Prometheus

Prometheus can target various exporters to gather metrics related to specific applications to extend visibility into an OpenStack-Helm deployment. Currently, openstack-helm-infra contains charts for:

  • prometheus-kube-state-metrics: Provides additional Kubernetes metrics
  • prometheus-node-exporter: Provides metrics for nodes and Linux kernels
  • prometheus-openstack-exporter: Provides metrics for OpenStack services

Kube-State-Metrics

The prometheus-kube-state-metrics chart provides metrics for Kubernetes objects as well as metrics for kube-scheduler and kube-controller-manager. Information on the specific metrics available via the kube-state-metrics service can be found in the kube-state-metrics documentation.

The prometheus-kube-state-metrics chart can be installed with the following:

helm install --namespace=kube-system local/prometheus-kube-state-metrics --name=prometheus-kube-state-metrics

Node Exporter

The prometheus-node-exporter chart provides hardware and operating system metrics exposed by the Linux kernel. Information on the specific metrics available via the Node exporter can be found on the node_exporter GitHub page.

The prometheus-node-exporter chart can be installed with the following:

helm install --namespace=kube-system local/prometheus-node-exporter --name=prometheus-node-exporter

OpenStack Exporter

The prometheus-openstack-exporter chart provides metrics specific to the OpenStack services. The exporter's source code can be found here. While the metrics provided are by no means comprehensive, they will be expanded upon.

Please note that the OpenStack exporter requires a Keystone user in order to gather metrics. To create the required user, the chart uses the same Keystone user management job that the OpenStack service charts use.

The prometheus-openstack-exporter chart can be installed with the following:

helm install --namespace=openstack local/prometheus-openstack-exporter --name=prometheus-openstack-exporter

Other exporters

Certain charts in OpenStack-Helm include templates for application-specific Prometheus exporters, which keeps the monitoring of those services tightly coupled to the chart. The templates for these exporters can be found in the monitoring subdirectory in the chart. These exporters are disabled by default, and can be enabled by setting the appropriate flag in the monitoring.prometheus key of the chart's values.yaml file. The charts containing exporters include:

Ceph

Starting with Luminous, Ceph can export metrics with the ceph-mgr prometheus module. This module can be enabled in Ceph's values.yaml under the ceph_mgr_enabled_plugins key by appending prometheus to the list of enabled modules. After enabling the prometheus module, metrics can be scraped on the ceph-mgr service endpoint. This relies on the Prometheus annotations attached to the ceph-mgr service template, and these annotations can be modified in the endpoints section of Ceph's values.yaml file. Information on the specific metrics available via the prometheus module can be found in the Ceph prometheus module documentation.
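
A sketch of that change in an override file might look like the following. The entries other than prometheus are placeholders standing in for whatever modules the chart already enables, and the position of the key within the values tree is not shown here:

```yaml
# Fragment of a Ceph chart override; pre-existing entries are placeholders
ceph_mgr_enabled_plugins:
  - restful      # assumed existing entry
  - status       # assumed existing entry
  - prometheus   # appended to enable the metrics module
```

Because Helm replaces lists wholesale rather than merging them, an override of this key would need to repeat every module that should stay enabled.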

Prometheus Dashboard

Prometheus includes a dashboard that can be reached through the exposed Prometheus endpoint (NodePort or otherwise). The dashboard shows the state of your scrape targets, the configuration values for Prometheus's scrape jobs and command line flags, and any alerts triggered by the defined rules, and it provides a means of querying scraped metrics with PromQL. It is a useful tool for verifying that Prometheus is configured appropriately and for checking the status of any services targeted for scraping via the Prometheus service discovery annotations.

Rules Configuration

Prometheus evaluates rules written in its query language to record derived series and to generate alerts on specific metrics. The Prometheus chart in openstack-helm-infra defines these rules in the values.yaml file, which gives operators the flexibility to supply their own rules via overrides at installation. The following rules keys are provided:

values:
  conf:
    rules:
      alertmanager:
      etcd3:
      kube_apiserver:
      kube_controller_manager:
      kubelet:
      kubernetes:
      rabbitmq:
      mysql:
      ceph:
      openstack:
      custom:

These keys provide recording and alert rules for all infrastructure components of an OpenStack-Helm deployment. To exclude the rules for a component, leave its tree empty in an overrides file. To read more about Prometheus recording and alert rule definitions, please see the official Prometheus recording and alert rules documentation.

Note: Prometheus releases prior to 2.0 used gotpl to define rules. Prometheus 2.0 changed the rules format to YAML, making them much easier to read. The Prometheus chart in openstack-helm-infra uses Prometheus 2.0 by default to take advantage of changes to the underlying storage layer and the handling of stale data. The chart will not support overrides for Prometheus versions below 2.0, as the command line flags for the service changed between versions.

The wide range of exporters included in OpenStack-Helm, coupled with the ability to define rules with configuration overrides, allows for the addition of custom alerting and recording rules to fit an operator's monitoring needs. Adding new rules or modifying existing rules requires overrides for either an existing key under conf.rules or a new key added under conf.rules. Custom rules can be used to define complex checks for determining the liveness or health of infrastructure components.
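
As an illustration, a custom alerting rule might be supplied through the custom key like this. The nesting follows the conf.rules keys described above, and the groups and rules layout is the standard Prometheus 2.x rule file format; the group name, alert name, job label, and duration are hypothetical:

```yaml
conf:
  rules:
    custom:
      groups:
        - name: custom.rules               # hypothetical group name
          rules:
            - alert: NodeExporterTargetDown
              # Assumes the scrape job is labelled "node-exporter"
              expr: up{job="node-exporter"} == 0
              for: 5m
              labels:
                severity: critical
              annotations:
                summary: A node-exporter target has been unreachable for 5 minutes
```

Once deployed, the rule should appear on the dashboard's rules and alerts views, which is a quick way to confirm the override was picked up.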