.. _monitoring:

Container Monitoring in Kubernetes
----------------------------------

The current monitoring capabilities that can be deployed with magnum span
several components. These are:

* **metrics-server:** serves requests to the metrics.k8s.io API. This covers
  the most basic functionality, such as simple HPA metrics or the
  *kubectl top* command.

* **prometheus:** is a full-fledged service that gives the user access to
  advanced metrics capabilities. These metrics are collected at a resolution
  of 30 seconds and include resources such as CPU, Memory, Disk and Network IO,
  as well as R/W rates. These fine-grained metrics are available on your
  cluster for up to 14 days (default).

* **prometheus-adapter:** is an extra component that integrates with the
  prometheus service and allows a user to create more sophisticated `HPA
  <https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/>`_
  rules. The service integrates fully with the metrics.k8s.io API, but at this
  time only custom.metrics.k8s.io is actively used.

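The metrics served by these components are what an HPA acts on. As a rough illustration only, the scaling rule described in the upstream Kubernetes HPA documentation can be sketched in Python. The 0.1 tolerance mirrors the upstream default. The function and parameter names are illustrative and are not part of magnum. Replica bounds, pod readiness and stabilization windows are ignored.

.. code-block:: python

   import math


   def desired_replicas(current_replicas: int, current_value: float,
                        target_value: float, tolerance: float = 0.1) -> int:
       """Simplified HPA rule: ceil(current_replicas * current / target).

       Ignores min/max replica bounds, pod readiness and stabilization.
       """
       if target_value <= 0:
           raise ValueError("target_value must be positive")
       ratio = current_value / target_value
       # Skip scaling when the ratio is within the tolerance band around 1.0.
       if abs(ratio - 1.0) <= tolerance:
           return current_replicas
       return math.ceil(current_replicas * ratio)


   # 4 pods averaging 200m CPU against a 100m target -> scale to 8.
   print(desired_replicas(4, 200, 100))

With metrics-server, the per-pod values come from metrics.k8s.io. With prometheus-adapter, they can come from custom rules served via custom.metrics.k8s.io.
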
The installation of these services is controlled with the following labels:

_`metrics_server_enabled`
  Enables or disables the installation of the metrics server.

  Train default: true

  Stein default: true

_`monitoring_enabled`
  Enables installation of the cluster monitoring solution provided by the
  stable/prometheus-operator helm chart.

  Default: false

_`prometheus_adapter_enabled`
  Enables installation of the cluster custom metrics provided by the
  stable/prometheus-adapter helm chart. This service depends on
  `monitoring_enabled`_.

  Default: true

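The following sketch shows how these three labels combine. It is an illustration, not magnum code, and makes two assumptions:

* "depends on monitoring_enabled" means the adapter is only installed when monitoring is enabled.
* labels are rendered in the comma-separated ``key=value`` form accepted by the ``--labels`` option of ``openstack coe cluster template create``.

.. code-block:: python

   def as_bool(value: object) -> bool:
       """Interpret a label value such as "true", "False" or True."""
       return str(value).strip().lower() == "true"


   # Defaults taken from the label descriptions on this page.
   DEFAULTS = {
       "metrics_server_enabled": "true",
       "monitoring_enabled": "false",
       "prometheus_adapter_enabled": "true",
   }


   def installed_components(labels: dict) -> set:
       """Return the components the given labels would install."""
       merged = {**DEFAULTS, **labels}
       components = set()
       if as_bool(merged["metrics_server_enabled"]):
           components.add("metrics-server")
       if as_bool(merged["monitoring_enabled"]):
           components.add("prometheus-operator")
           # Assumption: the adapter is only deployed on top of monitoring.
           if as_bool(merged["prometheus_adapter_enabled"]):
               components.add("prometheus-adapter")
       return components


   def labels_arg(labels: dict) -> str:
       """Render labels as key=value pairs joined by commas."""
       return ",".join(
           f"{key}={str(value).lower() if isinstance(value, bool) else value}"
           for key, value in labels.items()
       )


   print(sorted(installed_components({})))
   print(labels_arg({"monitoring_enabled": True,
                     "prometheus_adapter_enabled": False}))

With the defaults alone, only the metrics server is installed. Setting ``monitoring_enabled=true`` also brings in the adapter unless it is explicitly disabled.
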
To control deployed versions, extra labels are available:

_`metrics_server_chart_tag`
  Selects the version of the stable/metrics-server chart to install.

  Ussuri default: v2.8.8

  Yoga default: v3.7.0

_`prometheus_operator_chart_tag`
  Selects the version of the stable/prometheus-operator chart to install.
  When installing the chart, helm uses the default values of the selected
  chart version and overrides them with the values in the currently defined
  prometheus-operator-config ConfigMap. You must make sure that the chart
  version and the ConfigMap values are compatible.

_`prometheus_adapter_chart_tag`
  The stable/prometheus-adapter helm chart version to use.

  Train default: 1.4.0

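The override behaviour described for `prometheus_operator_chart_tag`_ follows Helm's usual rule for user-supplied values:

* nested maps are merged key by key;
* scalars and lists replace the chart defaults outright.

The simplified sketch below ignores Helm details such as ``null`` deleting a key. It illustrates why a ConfigMap written for one chart version may not fit another. The values keys used here are made up for illustration.

.. code-block:: python

   from copy import deepcopy
   from typing import Any


   def merge_values(defaults: dict, overrides: dict) -> dict:
       """Overlay overrides on chart defaults (simplified Helm semantics)."""
       merged: dict[str, Any] = deepcopy(defaults)
       for key, value in overrides.items():
           if isinstance(value, dict) and isinstance(merged.get(key), dict):
               # Both sides are maps: merge recursively, keeping unset defaults.
               merged[key] = merge_values(merged[key], value)
           else:
               # Scalars and lists replace the default outright.
               merged[key] = deepcopy(value)
       return merged


   # Hypothetical chart defaults and ConfigMap overrides.
   chart_defaults = {"prometheus": {"retention": "10d", "replicas": 1,
                                    "args": ["--a"]}}
   configmap_values = {"prometheus": {"retention": "14d", "args": ["--b"]}}
   print(merge_values(chart_defaults, configmap_values))

If a newer chart version renames or restructures a key, the old override may be silently ignored or land in the wrong place. This is why the versions must be checked for compatibility.
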
Full-fledged cluster monitoring
+++++++++++++++++++++++++++++++

The prometheus installation provided with the `monitoring_enabled`_ label is in
fact a multi-component service. This installation is managed with the
prometheus-operator helm chart, and its constituent components are:

* **prometheus** (data collection, storage and search)

  * **node-exporter** (data source for the kubelet/node)
  * **kube-state-metrics** (data source for the running kubernetes objects
    {deployments, pods, nodes, etc.})

* **alertmanager** (alarm aggregation, processing and dispatch)
* **grafana** (metrics visualization)

These components are installed in a generic way that makes it easy to have a
cluster-wide monitoring infrastructure running with little effort.

.. warning::

   The existing monitoring infrastructure does not take nodegroups into
   account. If you plan to use nodegroups in your cluster, account for the
   maximum total number of nodes and use *max_node_count* to correctly set
   up the prometheus server.

.. note::

   Before creating your cluster, take its scale into account, as the
   Prometheus server pod might not fit on your nodes. This is particularly
   important if you are using *Cluster Autoscaling*, since the Prometheus
   server requests the resources needed for the maximum number of nodes your
   cluster can scale up to, as defined by the *max_node_count* label (if set).

   The Prometheus server will consume the following resources::

      RAM:  256 (base) + Nodes * 40 [MB]
      CPU:  128 (base) + Nodes * 7 [mCPU]
      Disk: 15 GB for 2 weeks (depends on usage)

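The RAM and CPU figures in the note above can be evaluated for a given node count before picking node flavors. The minimal sketch below simply applies those two formulas. Disk is left out because it depends on usage.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class PrometheusEstimate:
       ram_mb: int
       cpu_millicores: int


   def estimate_prometheus_resources(nodes: int) -> PrometheusEstimate:
       """RAM = 256 + 40 * nodes [MB], CPU = 128 + 7 * nodes [mCPU]."""
       if nodes < 0:
           raise ValueError("nodes must be non-negative")
       return PrometheusEstimate(ram_mb=256 + 40 * nodes,
                                 cpu_millicores=128 + 7 * nodes)


   # With Cluster Autoscaling, pass max_node_count, not the current count.
   print(estimate_prometheus_resources(50))
   # PrometheusEstimate(ram_mb=2256, cpu_millicores=478)
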
Tuning parameters
+++++++++++++++++

The existing setup lets you tune the metrics infrastructure to your
requirements. Below is a list of labels that can be used for specific cases:

_`grafana_admin_passwd`
  This label lets users set their own password for the Grafana *admin* user.
  It expects a string value.

  Default: admin

_`monitoring_retention_days`
  This label lets users specify the maximum retention time, in days, for data
  collected by the prometheus server.

  Default: 14

_`monitoring_interval_seconds`
  This label lets users specify the time between metric samples, in seconds.

  Default: 30

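`monitoring_retention_days`_ and `monitoring_interval_seconds`_ together determine how many samples each time series keeps. This gives a first approximation of how disk usage scales when either label changes. The minimal sketch below covers only that arithmetic and ignores compression and the number of series.

.. code-block:: python

   SECONDS_PER_DAY = 86_400


   def samples_per_series(retention_days: int = 14,
                          interval_seconds: int = 30) -> int:
       """Samples retained per time series for a retention and interval."""
       if retention_days <= 0 or interval_seconds <= 0:
           raise ValueError("retention_days and interval_seconds must be positive")
       return retention_days * SECONDS_PER_DAY // interval_seconds


   print(samples_per_series())        # defaults: 14 days at 30 s -> 40320
   print(samples_per_series(14, 15))  # halving the interval doubles it -> 80640

Halving the interval roughly doubles the stored data, much like doubling the retention period does.
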
_`monitoring_retention_size`
  This label lets users specify the maximum size (in gibibytes) of the data
  stored by the prometheus server. This label must be used together with
  `monitoring_storage_class_name`_.

  Default: 14

_`monitoring_storage_class_name`
  The kubernetes storage class name to use for the prometheus PVC. Setting
  this label makes prometheus use a PVC instead of local disk space.

  When this label is set, two PVCs are created: one for the prometheus server,
  whose size is set by `monitoring_retention_size`_, and one for grafana,
  whose size is fixed at 1Gi.

  Default: ""

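The interaction between `monitoring_retention_size`_ and `monitoring_storage_class_name`_ can be summarised in a short sketch. It encodes only what is stated above:

* the retention size requires a storage class;
* the prometheus PVC follows the retention size;
* the grafana PVC is fixed at 1Gi.

The storage class name ``standard`` is a placeholder. Treating the label value as a whole number of GiB is an assumption.

.. code-block:: python

   def plan_pvcs(labels: dict) -> dict:
       """Return the PVC sizes implied by the monitoring labels (empty if none)."""
       storage_class = labels.get("monitoring_storage_class_name", "")
       if not storage_class:
           if "monitoring_retention_size" in labels:
               raise ValueError(
                   "monitoring_retention_size requires monitoring_storage_class_name")
           return {}  # No storage class: prometheus uses local disk space.
       # Assumption: the label holds a whole number of GiB (default 14).
       retention_gib = int(labels.get("monitoring_retention_size", "14"))
       return {"prometheus": f"{retention_gib}Gi", "grafana": "1Gi"}


   print(plan_pvcs({"monitoring_storage_class_name": "standard",
                    "monitoring_retention_size": "50"}))
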
_`monitoring_ingress_enabled`
  This label sets up all the underlying services to be accessible in a
  'route by path' way. This means that the services will be exposed as::

     my.domain.com/alertmanager
     my.domain.com/prometheus
     my.domain.com/grafana

  This label must be used together with `cluster_root_domain_name`_.

  Default: false

_`cluster_root_domain_name`
  The root domain name to use for applications that are set up automatically
  in the cluster.

  Default: "localhost"

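With `monitoring_ingress_enabled`_ set, the exposed paths follow from `cluster_root_domain_name`_. The sketch below derives them. It makes two deliberate choices:

* it returns ``host/path`` strings only, since the URL scheme is not specified on this page;
* it refuses to fall back to a default domain, because the two labels must be used together.

.. code-block:: python

   MONITORING_SERVICES = ("alertmanager", "prometheus", "grafana")


   def monitoring_paths(labels: dict) -> list:
       """List the route-by-path locations of the monitoring services."""
       if str(labels.get("monitoring_ingress_enabled", "false")).lower() != "true":
           return []
       domain = labels.get("cluster_root_domain_name")
       if not domain:
           raise ValueError(
               "monitoring_ingress_enabled requires cluster_root_domain_name")
       return [f"{domain}/{service}" for service in MONITORING_SERVICES]


   print(monitoring_paths({"monitoring_ingress_enabled": "true",
                           "cluster_root_domain_name": "my.domain.com"}))
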
_`cluster_basic_auth_secret`
  The kubernetes secret to use for the proxy basic auth username and password
  for the unprotected services {alertmanager,prometheus}. Basic auth is only
  set up if this secret is specified.

  The secret must be in the same namespace as the proxy (kube-system).

  Default: ""

  To create this secret, you can run::

     $ htpasswd -c auth foo
     $ kubectl create secret generic basic-auth --from-file=auth --namespace kube-system

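You may prefer to apply a manifest instead of running ``kubectl create secret``. The equivalent Secret can be sketched in Python, assuming the ``auth`` file already exists (for example, created with ``htpasswd`` as shown above). As with ``kubectl create secret generic --from-file=auth``, the file name becomes the data key and its content is base64-encoded. The placeholder content below is not a real password hash.

.. code-block:: python

   import base64
   import json


   def basic_auth_secret(htpasswd_content: str, name: str = "basic-auth") -> dict:
       """Build an Opaque Secret whose "auth" key holds the htpasswd content."""
       encoded = base64.b64encode(htpasswd_content.encode("utf-8")).decode("ascii")
       return {
           "apiVersion": "v1",
           "kind": "Secret",
           # The secret must live in the proxy's namespace (kube-system).
           "metadata": {"name": name, "namespace": "kube-system"},
           "type": "Opaque",
           "data": {"auth": encoded},
       }


   # Placeholder content: use the real output of htpasswd instead.
   print(json.dumps(basic_auth_secret("foo:<hash>\n"), indent=2))

The printed JSON can be saved to a file and applied with ``kubectl apply -f``.
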
_`prometheus_adapter_configmap`
  The name of the prometheus-adapter rules ConfigMap to use. Using this label
  replaces the default rules.

  Default: ""

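A custom rules ConfigMap for `prometheus_adapter_configmap`_ could be generated as in the sketch below. The rule follows the format documented by the prometheus-adapter project (``seriesQuery``, ``resources``, ``name``, ``metricsQuery``). The payload is written as JSON, which is valid YAML.

The following are illustrative assumptions, so check what the deployed chart version expects before relying on them:

* the ConfigMap name ``custom-adapter-rules``;
* the data key ``config.yaml``;
* the ``http_requests_total`` series.

The namespace is omitted because this page does not state where the adapter runs.

.. code-block:: python

   import json


   def adapter_rule(counter: str, window: str = "2m") -> dict:
       """A rule exposing a *_total counter as a per-pod, per-second rate."""
       return {
           "seriesQuery": f'{counter}{{namespace!="",pod!=""}}',
           "resources": {
               "overrides": {
                   "namespace": {"resource": "namespace"},
                   "pod": {"resource": "pod"},
               }
           },
           # Rename e.g. http_requests_total to http_requests_per_second.
           "name": {"matches": "^(.*)_total$", "as": "${1}_per_second"},
           "metricsQuery": (
               f"sum(rate(<<.Series>>{{<<.LabelMatchers>>}}[{window}])) "
               "by (<<.GroupBy>>)"
           ),
       }


   def rules_configmap(name: str, rules: list) -> dict:
       """Wrap adapter rules in a ConfigMap; JSON is valid YAML."""
       return {
           "apiVersion": "v1",
           "kind": "ConfigMap",
           "metadata": {"name": name},
           # Assumption: the adapter reads its rules from the config.yaml key.
           "data": {"config.yaml": json.dumps({"rules": rules}, indent=2)},
       }


   configmap = rules_configmap("custom-adapter-rules",
                               [adapter_rule("http_requests_total")])
   print(json.dumps(configmap, indent=2))

The cluster would then reference it with ``prometheus_adapter_configmap=custom-adapter-rules``. This replaces the default rules entirely, so any default rule you still need must be copied in.
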