Merge "4. Update cluster monitoring documentation"

This commit is contained in:
Zuul 2021-02-09 15:55:09 +00:00 committed by Gerrit Code Review
commit 252d722c06
2 changed files with 185 additions and 142 deletions


@ -352,7 +352,7 @@ the table are linked to more details elsewhere in the user guide.
+---------------------------------------+--------------------+---------------+
| `admission_control_list`_ | see below | see below |
+---------------------------------------+--------------------+---------------+
| `prometheus_monitoring`_ | - true | false |
| `prometheus_monitoring` (deprecated) | - true | false |
| | - false | |
+---------------------------------------+--------------------+---------------+
| `grafana_admin_passwd`_ | (any string) | "admin" |
@ -1301,19 +1301,6 @@ _`heapster_enabled`
Ussuri default: false
Train default: true
_`metrics_server_chart_tag`
Add metrics_server_chart_tag to select the version of the
stable/metrics-server chart to install.
Ussuri default: v2.8.8
_`metrics_server_enabled`
metrics_server_enabled is used to enable or disable the installation of
the metrics server.
To use this service, tiller_enabled must be true when using
helm_client_tag<v3.0.0.
Train default: true
Stein default: true
_`cloud_provider_tag`
This label allows users to override the default
openstack-cloud-controller-manager container image tag. Refer to
@ -1483,74 +1470,6 @@ _`k8s_keystone_auth_tag`
Train default: v1.14.0
Ussuri default: v1.18.0
_`monitoring_enabled`
Enable installation of cluster monitoring solution provided by the
stable/prometheus-operator helm chart.
To use this service, tiller_enabled must be true when using
helm_client_tag<v3.0.0.
Default: false
_`monitoring_retention_days`
The length of time (in days) that prometheus metrics should be kept.
Default: 14
_`monitoring_retention_size`
The maximum amount of storage (in GiB) the prometheus server is allowed
to use for storing metrics.
Default: 14
_`monitoring_interval_seconds`
The time interval (in seconds) between consecutive metric scrapings.
Default: 30
_`monitoring_storage_class_name`
The kubernetes storage class name to use for the prometheus pvc.
Using this label will activate the usage of a pvc instead of local
disk space.
When using monitoring_storage_class_name, two PVCs will be created: one
for the prometheus server, whose size is set by monitoring_retention_size,
and one for grafana, which is fixed at 1Gi.
Default: ""
_`monitoring_ingress_enabled`
Enable configuration of ingresses for the enabled monitoring services
{alertmanager,grafana,prometheus}.
Default: false
_`cluster_basic_auth_secret`
The kubernetes secret to use for the proxy basic auth username and password
for the unprotected services {alertmanager,prometheus}. Basic auth is only
set up if this secret is specified.
The secret must be in the same namespace as the used proxy (kube-system).
Default: ""
_`cluster_root_domain_name`
The root domain name to use for the applications automatically set up
in the cluster.
Default: "localhost"
_`prometheus_adapter_enabled`
Enable installation of cluster custom metrics provided by the
stable/prometheus-adapter helm chart. This service depends on
monitoring_enabled.
Default: true
_`prometheus_adapter_chart_tag`
The stable/prometheus-adapter helm chart version to use.
Train-default: 1.4.0
_`prometheus_adapter_configmap`
The name of the prometheus-adapter rules ConfigMap to use. Using this label
will overwrite the default rules.
Default: ""
_`prometheus_operator_chart_tag`
Add prometheus_operator_chart_tag to select the version of the
stable/prometheus-operator chart to install. When installing the chart,
helm will use the default values of the tag defined and overwrite them based
on the prometheus-operator-config ConfigMap currently defined. You must
verify that the versions are compatible.
_`tiller_enabled`
If set to true, tiller will be deployed in the kube-system namespace.
Ussuri default: false
@ -3582,66 +3501,7 @@ created. This example can be applied for any ``create``, ``update`` or
Container Monitoring
====================
The offered monitoring stack relies on the following set of containers and
services:
- cAdvisor
- Node Exporter
- Prometheus
- Grafana
To setup this monitoring stack, users are given two configurable labels in
the Magnum cluster template's definition:
_`prometheus_monitoring`
This label accepts a boolean value. If *True*, the monitoring stack will be
set up. By default *prometheus_monitoring = False*.
_`grafana_admin_passwd`
This label lets users create their own *admin* user password for the Grafana
interface. It expects a string value. By default it is set to *admin*.
Container Monitoring in Kubernetes
----------------------------------
By default, all Kubernetes clusters already contain *cAdvisor* integrated
with the *Kubelet* binary. Its container monitoring data can be accessed on
a node level basis through *http://NODE_IP:4194*.
Node Exporter is part of the above mentioned monitoring stack as it can be
used to export machine metrics. This functionality also works at the node
level, which means that when `prometheus_monitoring`_ is *True*, the
Kubernetes nodes
will be populated with an additional manifest under
*/etc/kubernetes/manifests*. Node Exporter is then automatically picked up
and launched as a regular Kubernetes POD.
To aggregate and complement all the existing monitoring metrics and add a
built-in visualization layer, Prometheus is used. It is launched by the
Kubernetes master node(s) as a *Service* within a *Deployment* with one
replica and it relies on a *ConfigMap* where the Prometheus configuration
(prometheus.yml) is defined. This configuration uses Prometheus native
support for service discovery in Kubernetes clusters,
*kubernetes_sd_configs*. The respective manifests can be found in
*/srv/kubernetes/monitoring/* on the master nodes and once the service is
up and running, Prometheus UI can be accessed through port 9090.
Finally, for custom plotting and enhanced metric aggregation and
visualization, Prometheus can be integrated with Grafana as it provides
native compliance for Prometheus data sources. Also Grafana is deployed as
a *Service* within a *Deployment* with one replica. The default user is
*admin* and the password is set according to `grafana_admin_passwd`_.
There is also a default Grafana dashboard provided with this installation,
from the official `Grafana dashboards' repository
<https://grafana.net/dashboards>`_. The Prometheus data
source is automatically added to Grafana once it is up and running, pointing
to *http://prometheus:9090* through *Proxy*. The respective manifests can
also be found in */srv/kubernetes/monitoring/* on the master nodes and once
the service is running, the Grafana dashboards can be accessed through port
3000.
For both Prometheus and Grafana, there is an assigned *systemd* service
called *kube-enable-monitoring*.
.. include:: monitoring.rst
Kubernetes Post Install Manifest
================================


@ -0,0 +1,183 @@
At the moment, monitoring is only supported for Kubernetes clusters.
Container Monitoring in Kubernetes
----------------------------------
The monitoring capabilities that can currently be deployed with magnum span
several components. These are:
* **metrics-server:** is responsible for serving metrics.k8s.io API requests. This
includes the most basic functionality when using simple HPA metrics or when
using the *kubectl top* command.
* **prometheus:** is a full-fledged service that gives the user access to
advanced metrics capabilities. These metrics are collected with a resolution
of 30 seconds and include resources such as CPU, memory, disk and network IO
as well as R/W rates. These fine-grained metrics are kept in the cluster
for up to 14 days (by default).
* **prometheus-adapter:** is an extra component that integrates with the
prometheus service and allows a user to create more sophisticated `HPA
<https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/>`_
rules. The service integrates fully with the metrics.k8s.io API, but at this
time only custom.metrics.k8s.io is actively used.
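For instance, once metrics-server is running, the basic metrics.k8s.io
functionality can be exercised directly from the command line (a sketch; the
output depends on your cluster):

```shell
# With metrics_server_enabled=true, these commands are answered by the
# metrics.k8s.io API that metrics-server backs.
kubectl top nodes        # per-node CPU and memory usage
kubectl top pods -A      # per-pod usage across all namespaces
```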
The installation of these services is controlled with the following labels:
_`metrics_server_enabled`
metrics_server_enabled is used to enable or disable the installation of
the metrics server.
To use this service, tiller_enabled must be true when using
helm_client_tag<v3.0.0.
Train default: true
Stein default: true
_`monitoring_enabled`
Enable installation of cluster monitoring solution provided by the
stable/prometheus-operator helm chart.
To use this service, tiller_enabled must be true when using
helm_client_tag<v3.0.0.
Default: false
_`prometheus_adapter_enabled`
Enable installation of cluster custom metrics provided by the
stable/prometheus-adapter helm chart. This service depends on
monitoring_enabled.
Default: true
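As an illustrative sketch of how these labels are applied (the template,
image, and flavor names below are placeholders, not from the document):

```shell
# Illustrative only: a cluster template enabling the monitoring stack.
# Adjust image/flavor/network names to your cloud.
openstack coe cluster template create k8s-monitored \
  --coe kubernetes \
  --image fedora-coreos-latest \
  --external-network public \
  --master-flavor m1.medium --flavor m1.medium \
  --labels monitoring_enabled=true,prometheus_adapter_enabled=true,metrics_server_enabled=true
```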
To control deployed versions, extra labels are available:
_`metrics_server_chart_tag`
Add metrics_server_chart_tag to select the version of the
stable/metrics-server chart to install.
Ussuri default: v2.8.8
_`prometheus_operator_chart_tag`
Add prometheus_operator_chart_tag to select the version of the
stable/prometheus-operator chart to install. When installing the chart,
helm will use the default values of the tag defined and overwrite them based
on the prometheus-operator-config ConfigMap currently defined. You must
verify that the versions are compatible.
_`prometheus_adapter_chart_tag`
The stable/prometheus-adapter helm chart version to use.
Train-default: 1.4.0
Full-fledged cluster monitoring
+++++++++++++++++++++++++++++++
The prometheus installation provided with the `monitoring_enabled`_ label is
in fact a multi-component service. This installation is managed with the
prometheus-operator helm chart, and the constituent components are:
* **prometheus** (data collection, storage and search)
* **node-exporter** (data source for the kubelet/node)
* **kube-state-metrics** (data source for the running kubernetes objects
{deployments, pods, nodes, etc})
* **alertmanager** (alarm aggregation, processing and dispatch)
* **grafana** (metrics visualization)
These components are installed in a generic way that makes it easy to have a
cluster-wide monitoring infrastructure running with no effort.
.. warning::
The existing monitoring infrastructure does not take nodegroups into
account. If you plan to use nodegroups in your cluster, take the maximum
total number of nodes into account and use *max_node_count* to correctly
size the prometheus server.
.. note::
Before creating your cluster, take the scale of the cluster into account.
This is important, as the Prometheus server pod might not fit your nodes.
This is particularly important if you are using *Cluster Autoscaling*, as
the Prometheus server will request the resources needed to meet the maximum
number of nodes that your cluster can scale up to, as defined by the
*max_node_count* label (if set).
The Prometheus server will consume the following resources:
::
RAM:: 256 (base) + Nodes * 40 [MB]
CPU:: 128 (base) + Nodes * 7 [mCPU]
Disk:: 15 GB for 2 weeks (depends on usage)
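As a quick sanity check, the sizing formulas above can be evaluated for a
planned node count (a sketch; the 50-node figure is only an example):

```shell
# Estimate Prometheus server resource needs from the formulas above.
# NODES is a placeholder for your planned (or max_node_count) node total.
NODES=50
RAM_MB=$((256 + NODES * 40))     # 256 MB base + 40 MB per node
CPU_MCPU=$((128 + NODES * 7))    # 128 mCPU base + 7 mCPU per node
echo "RAM: ${RAM_MB} MB, CPU: ${CPU_MCPU} mCPU"
# prints: RAM: 2256 MB, CPU: 478 mCPU
```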
Tuning parameters
+++++++++++++++++
The available configuration labels allow you to tune the monitoring
infrastructure to your requirements. Below is a list of labels for specific
cases:
_`grafana_admin_passwd`
This label lets users create their own *admin* user password for the Grafana
interface. It expects a string value.
Default: admin
_`monitoring_retention_days`
This label lets users specify the maximum retention time for data collected
in the prometheus server in days.
Default: 14
_`monitoring_interval_seconds`
This label lets users specify the time between metric samples in seconds.
Default: 30
_`monitoring_retention_size`
This label lets users specify the maximum size (in gibibytes) for data
stored by the prometheus server. This label must be used together with
`monitoring_storage_class_name`_.
Default: 14
_`monitoring_storage_class_name`
The kubernetes storage class name to use for the prometheus pvc.
Using this label will activate the usage of a pvc instead of local
disk space.
When using monitoring_storage_class_name, two PVCs will be created: one
for the prometheus server, whose size is set by
`monitoring_retention_size`_, and one for grafana, which is fixed at 1Gi.
Default: ""
_`monitoring_ingress_enabled`
This label sets up all the underlying services to be accessible in a
'route by path' way. This means that the services will be exposed as:
::
my.domain.com/alertmanager
my.domain.com/prometheus
my.domain.com/grafana
This label must be used together with `cluster_root_domain_name`_.
Default: false
_`cluster_root_domain_name`
The root domain name to use for the applications automatically set up
in the cluster.
Default: "localhost"
_`cluster_basic_auth_secret`
The kubernetes secret to use for the proxy basic auth username and password
for the unprotected services {alertmanager,prometheus}. Basic auth is only
set up if this secret is specified.
The secret must be in the same namespace as the used proxy (kube-system).
Default: ""
To create this secret you can do::

    $ htpasswd -c auth foo
    $ kubectl create secret generic basic-auth --from-file=auth
_`prometheus_adapter_configmap`
The name of the prometheus-adapter rules ConfigMap to use. Using this label
will overwrite the default rules.
Default: ""
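Tying several of the tuning labels together, a hedged end-to-end sketch
(the cluster and template names are placeholders, and the `basic-auth`
secret is the one created with the commands shown under
`cluster_basic_auth_secret`_):

```shell
# Illustrative only: expose monitoring via ingress, protected by basic auth,
# reusing the pre-created 'basic-auth' secret in kube-system.
openstack coe cluster create my-cluster \
  --cluster-template k8s-monitored \
  --labels monitoring_enabled=true,monitoring_ingress_enabled=true,cluster_root_domain_name=my.domain.com,cluster_basic_auth_secret=basic-auth
```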