4. Update cluster monitoring documentation
Change the User Documentation to introduce the new way of installing the
prometheus monitoring suite by using the monitoring_enabled label.
Give a broad overview of the existing monitoring features available
out-of-the-box, which components exist and what they do.
Explain which frequently asked questions can be solved with the existing
integrations by manipulating monitoring-specific labels.

task: 39627
story: 2006765

Depends-On: Ie0e7000e0d94b2037f2c398fa67a2a2b7e256bc3
Change-Id: I5581650b15ce94e31a44de09f82aef1790013b54
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@gmail.com>
parent ea64468ab3
commit a3d8b4fe8d
@@ -352,7 +352,7 @@ the table are linked to more details elsewhere in the user guide.
 +---------------------------------------+--------------------+---------------+
 | `admission_control_list`_             | see below          | see below     |
 +---------------------------------------+--------------------+---------------+
-| `prometheus_monitoring`_              | - true             | false         |
+| `prometheus_monitoring` (deprecated)  | - true             | false         |
 |                                       | - false            |               |
 +---------------------------------------+--------------------+---------------+
 | `grafana_admin_passwd`_               | (any string)       | "admin"       |
@@ -1301,19 +1301,6 @@ _`heapster_enabled`
   Ussuri default: false
   Train default: true
 
-_`metrics_server_chart_tag`
-  Add metrics_server_chart_tag to select the version of the
-  stable/metrics-server chart to install.
-  Ussuri default: v2.8.8
-
-_`metrics_server_enabled`
-  metrics_server_enabled is used to enable disable the installation of
-  the metrics server.
-  To use this service tiller_enabled must be true when using
-  helm_client_tag<v3.0.0.
-  Train default: true
-  Stein default: true
-
 _`cloud_provider_tag`
   This label allows users to override the default
   openstack-cloud-controller-manager container image tag. Refer to
@@ -1483,74 +1470,6 @@ _`k8s_keystone_auth_tag`
   Train default: v1.14.0
   Ussuri default: v1.18.0
 
-_`monitoring_enabled`
-  Enable installation of cluster monitoring solution provided by the
-  stable/prometheus-operator helm chart.
-  To use this service tiller_enabled must be true when using
-  helm_client_tag<v3.0.0.
-  Default: false
-
-_`monitoring_retention_days`
-  The number of time (in days) that prometheus metrics should be kept.
-  Default: 14
-
-_`monitoring_retention_size`
-  The maximum memory (in GiB) allowed to be used by prometheus server to
-  store metrics.
-  Default: 14
-
-_`monitoring_interval_seconds`
-  The time interval (in seconds) between consecutive metric scrapings.
-  Default: 30
-
-_`monitoring_storage_class_name`
-  The kubernetes storage class name to use for the prometheus pvc.
-  Using this label will activate the usage of a pvc instead of local
-  disk space.
-  When using monitoring_storage_class_name 2 pvcs will be created.
-  One for the prometheus server which size is set by
-  monitoring_retention_size and one for grafana which is fixed at 1Gi.
-  Default: ""
-
-_`monitoring_ingress_enabled`
-  Enable configuration of ingresses for the enabled monitoring services
-  {alertmanager,grafana,prometheus}.
-  Default: false
-
-_`cluster_basic_auth_secret`
-  The kubernetes secret to use for the proxy basic auth username and password
-  for the unprotected services {alertmanager,prometheus}. Basic auth is only
-  set up if this file is specified.
-  The secret must be in the same namespace as the used proxy (kube-system).
-  Default: ""
-
-_`cluster_root_domain_name`
-  The root domain name to use for the cluster automatically set up
-  applications.
-  Default: "localhost"
-
-_`prometheus_adapter_enabled`
-  Enable installation of cluster custom metrics provided by the
-  stable/prometheus-adapter helm chart. This service depends on
-  monitoring_enabled.
-  Default: true
-
-_`prometheus_adapter_chart_tag`
-  The stable/prometheus-adapter helm chart version to use.
-  Train-default: 1.4.0
-
-_`prometheus_adapter_configmap`
-  The name of the prometheus-adapter rules ConfigMap to use. Using this label
-  will overwrite the default rules.
-  Default: ""
-
-_`prometheus_operator_chart_tag`
-  Add prometheus_operator_chart_tag to select version of the
-  stable/prometheus-operator chart to install. When installing the chart,
-  helm will use the default values of the tag defined and overwrite them based
-  on the prometheus-operator-config ConfigMap currently defined. You must
-  certify that the versions are compatible.
-
 _`tiller_enabled`
   If set to true, tiller will be deployed in the kube-system namespace.
   Ussuri default: false
@@ -3582,66 +3501,7 @@ created. This example can be applied for any ``create``, ``update`` or
 Container Monitoring
 ====================
 
-The offered monitoring stack relies on the following set of containers and
-services:
-
-- cAdvisor
-- Node Exporter
-- Prometheus
-- Grafana
-
-To setup this monitoring stack, users are given two configurable labels in
-the Magnum cluster template's definition:
-
-_`prometheus_monitoring`
-  This label accepts a boolean value. If *True*, the monitoring stack will be
-  setup. By default *prometheus_monitoring = False*.
-
-_`grafana_admin_passwd`
-  This label lets users create their own *admin* user password for the Grafana
-  interface. It expects a string value. By default it is set to *admin*.
-
-
-Container Monitoring in Kubernetes
-----------------------------------
-
-By default, all Kubernetes clusters already contain *cAdvisor* integrated
-with the *Kubelet* binary. Its container monitoring data can be accessed on
-a node level basis through *http://NODE_IP:4194*.
-
-Node Exporter is part of the above mentioned monitoring stack as it can be
-used to export machine metrics. Such functionality also work on a node level
-which means that when `prometheus_monitoring`_ is *True*, the Kubernetes nodes
-will be populated with an additional manifest under
-*/etc/kubernetes/manifests*. Node Exporter is then automatically picked up
-and launched as a regular Kubernetes POD.
-
-To aggregate and complement all the existing monitoring metrics and add a
-built-in visualization layer, Prometheus is used. It is launched by the
-Kubernetes master node(s) as a *Service* within a *Deployment* with one
-replica and it relies on a *ConfigMap* where the Prometheus configuration
-(prometheus.yml) is defined. This configuration uses Prometheus native
-support for service discovery in Kubernetes clusters,
-*kubernetes_sd_configs*. The respective manifests can be found in
-*/srv/kubernetes/monitoring/* on the master nodes and once the service is
-up and running, Prometheus UI can be accessed through port 9090.
-
-Finally, for custom plotting and enhanced metric aggregation and
-visualization, Prometheus can be integrated with Grafana as it provides
-native compliance for Prometheus data sources. Also Grafana is deployed as
-a *Service* within a *Deployment* with one replica. The default user is
-*admin* and the password is setup according to `grafana_admin_passwd`_.
-There is also a default Grafana dashboard provided with this installation,
-from the official `Grafana dashboards' repository
-<https://grafana.net/dashboards>`_. The Prometheus data
-source is automatically added to Grafana once it is up and running, pointing
-to *http://prometheus:9090* through *Proxy*. The respective manifests can
-also be found in */srv/kubernetes/monitoring/* on the master nodes and once
-the service is running, the Grafana dashboards can be accessed through port
-3000.
-
-For both Prometheus and Grafana, there is an assigned *systemd* service
-called *kube-enable-monitoring*.
+.. include:: monitoring.rst
 
 Kubernetes Post Install Manifest
 ================================
@@ -0,0 +1,183 @@
As of this moment, monitoring is only supported for kubernetes clusters.

Container Monitoring in Kubernetes
----------------------------------

The monitoring capabilities that can currently be deployed with magnum are
spread across different components. These are:

* **metrics-server:** serves requests to the metrics.k8s.io API. This
  covers the most basic functionality, such as simple HPA metrics or the
  *kubectl top* command.

* **prometheus:** is a full-fledged service that gives the user access to
  advanced metrics capabilities. These metrics are collected with a resolution
  of 30 seconds and include resources such as CPU, Memory, Disk and Network IO
  as well as R/W rates. These fine-grained metrics are kept on your
  cluster for a period of up to 14 days (default).

* **prometheus-adapter:** is an extra component that integrates with the
  prometheus service and allows a user to create more sophisticated `HPA
  <https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/>`_
  rules. The service integrates fully with the metrics.k8s.io API but at this
  time only custom.metrics.k8s.io is being actively used.


The installation of these services is controlled with the following labels:

_`metrics_server_enabled`
  metrics_server_enabled is used to enable or disable the installation of
  the metrics server.
  To use this service tiller_enabled must be true when using
  helm_client_tag<v3.0.0.
  Train default: true
  Stein default: true

_`monitoring_enabled`
  Enable installation of the cluster monitoring solution provided by the
  stable/prometheus-operator helm chart.
  To use this service tiller_enabled must be true when using
  helm_client_tag<v3.0.0.
  Default: false

_`prometheus_adapter_enabled`
  Enable installation of cluster custom metrics provided by the
  stable/prometheus-adapter helm chart. This service depends on
  monitoring_enabled.
  Default: true
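
Magnum labels are passed as a comma-separated list of ``key=value`` pairs, for example through the ``--labels`` option of ``openstack coe cluster template create``. The sketch below shows one way to assemble such a string for the three labels above. The helper name ``build_labels`` and the chosen values are hypothetical and only serve as an illustration.

```python
def build_labels(labels):
    """Render a dict of Magnum labels as 'key=value,key=value'.

    Hypothetical helper, not part of Magnum or its client.
    """
    parts = []
    for key, value in sorted(labels.items()):
        # Booleans are written in lowercase, matching the documented defaults.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ",".join(parts)


# Example values only: enable all three monitoring components.
monitoring_labels = {
    "metrics_server_enabled": True,
    "monitoring_enabled": True,
    "prometheus_adapter_enabled": True,
}
print(build_labels(monitoring_labels))
```

The printed string can then be used as the value of ``--labels`` when creating the cluster template.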

To control deployed versions, extra labels are available:

_`metrics_server_chart_tag`
  Add metrics_server_chart_tag to select the version of the
  stable/metrics-server chart to install.
  Ussuri default: v2.8.8

_`prometheus_operator_chart_tag`
  Add prometheus_operator_chart_tag to select the version of the
  stable/prometheus-operator chart to install. When installing the chart,
  helm will use the default values of the tag defined and overwrite them based
  on the prometheus-operator-config ConfigMap currently defined. You must
  make sure that the versions are compatible.

_`prometheus_adapter_chart_tag`
  The stable/prometheus-adapter helm chart version to use.
  Train-default: 1.4.0

Full-fledged cluster monitoring
+++++++++++++++++++++++++++++++

The prometheus installation provided with the `monitoring_enabled`_ label is in
fact a multi-component service. This installation is managed with the
prometheus-operator helm chart and the constituent components are:

* **prometheus** (data collection, storage and search)

  * **node-exporter** (data source for the kubelet/node)
  * **kube-state-metrics** (data source for the running kubernetes objects
    {deployments, pods, nodes, etc})

* **alertmanager** (alarm aggregation, processing and dispatch)
* **grafana** (metrics visualization)


These components are installed in a generic way that makes it easy to have a
cluster-wide monitoring infrastructure running with no effort.

.. warning::

    The existing monitoring infrastructure does not take into account the
    existence of nodegroups. If you plan to use nodegroups in your cluster,
    take into account the maximum total number of nodes and use
    *max_node_count* to correctly set up the prometheus server.

.. note::

    Before creating your cluster take into account the scale of the cluster.
    This is important as the Prometheus server pod might not fit your nodes.
    This is particularly important if you are using *Cluster Autoscaling* as
    the Prometheus server will request the resources needed for the maximum
    number of nodes that your cluster can scale up to, as defined by the
    *max_node_count* label (if set).

The Prometheus server will consume the following resources:

::

    RAM: 256 (base) + Nodes * 40 [MB]
    CPU: 128 (base) + Nodes * 7 [mCPU]
    Disk: 15 GB for 2 weeks (depends on usage)
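
To get a feel for these numbers, the RAM and CPU formulas above can be evaluated for a few cluster sizes. This is a minimal sketch that only restates the two formulas. The function name ``prometheus_resources`` is hypothetical, and the node count should be the maximum the cluster can reach (for example the value of *max_node_count*).

```python
def prometheus_resources(nodes):
    """Return (ram_mb, cpu_millicores) for a cluster of `nodes` nodes.

    Hypothetical helper that restates the documented formulas:
    RAM = 256 + nodes * 40 [MB], CPU = 128 + nodes * 7 [mCPU].
    """
    ram_mb = 256 + nodes * 40
    cpu_millicores = 128 + nodes * 7
    return ram_mb, cpu_millicores


# Example cluster sizes, chosen only for illustration.
for node_count in (10, 50, 100):
    ram, cpu = prometheus_resources(node_count)
    print(f"{node_count} nodes: {ram} MB RAM, {cpu} mCPU")
```

For 100 nodes this gives 4256 MB of RAM and 828 mCPU, which helps when checking whether the Prometheus server pod fits on a single node of the chosen flavor.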


Tuning parameters
+++++++++++++++++

The existing setup configuration allows you to tune the metrics infrastructure
to your requirements. Below is a list of labels that can be used for specific
cases:

_`grafana_admin_passwd`
  This label lets users create their own *admin* user password for the Grafana
  interface. It expects a string value.
  Default: admin

_`monitoring_retention_days`
  This label lets users specify the maximum retention time for data collected
  in the prometheus server in days.
  Default: 14

_`monitoring_interval_seconds`
  This label lets users specify the time between metric samples in seconds.
  Default: 30

_`monitoring_retention_size`
  This label lets users specify the maximum size (in gibibytes) for data
  stored by the prometheus server. This label must be used together with
  `monitoring_storage_class_name`_.
  Default: 14

_`monitoring_storage_class_name`
  The kubernetes storage class name to use for the prometheus pvc.
  Using this label will activate the usage of a pvc instead of local
  disk space.
  When using monitoring_storage_class_name 2 pvcs will be created:
  one for the prometheus server, whose size is set by
  `monitoring_retention_size`_, and one for grafana, which is fixed at 1Gi.
  Default: ""

_`monitoring_ingress_enabled`
  This label sets up all the underlying services to be accessible in a
  'route by path' way. This means that the services will be exposed as:

  ::

    my.domain.com/alertmanager
    my.domain.com/prometheus
    my.domain.com/grafana


  This label must be used together with `cluster_root_domain_name`_.
  Default: false
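
The 'route by path' layout can be sketched as follows. This assumes that the host part of each route is the value given to `cluster_root_domain_name`. The function name ``monitoring_routes`` and the domain used in the example are hypothetical.

```python
# Services exposed when monitoring_ingress_enabled is set.
MONITORING_SERVICES = ("alertmanager", "prometheus", "grafana")


def monitoring_routes(root_domain):
    """List the per-service paths under the given root domain.

    Hypothetical helper: assumes the host part is the value of
    the cluster_root_domain_name label.
    """
    return [f"{root_domain}/{service}" for service in MONITORING_SERVICES]


# "my.domain.com" is an example value only.
for route in monitoring_routes("my.domain.com"):
    print(route)
```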

_`cluster_root_domain_name`
  The root domain name to use for the applications that are automatically
  set up in the cluster.
  Default: "localhost"

_`cluster_basic_auth_secret`
  The kubernetes secret to use for the proxy basic auth username and password
  for the unprotected services {alertmanager,prometheus}. Basic auth is only
  set up if this secret is specified.
  The secret must be in the same namespace as the used proxy (kube-system).
  Default: ""

  ::

    To create this secret you can do:
    $ htpasswd -c auth foo
    $ kubectl create secret generic basic-auth --from-file=auth

_`prometheus_adapter_configmap`
  The name of the prometheus-adapter rules ConfigMap to use. Using this label
  will overwrite the default rules.
  Default: ""