Update cluster monitoring documentation

Change the user documentation to introduce the new way of installing the
prometheus monitoring suite using the label monitoring_enabled. Give a broad
overview of the monitoring features available out-of-the-box, which components
exist and what they do. Explain which FAQs can be addressed with the existing
integrations by manipulating monitoring-specific labels.

task: 39627
story: 2006765

Depends-On: Ie0e7000e0d94b2037f2c398fa67a2a2b7e256bc3
Change-Id: I5581650b15ce94e31a44de09f82aef1790013b54
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@gmail.com>
This commit is contained in:
parent ea64468ab3
commit a3d8b4fe8d
@@ -352,7 +352,7 @@ the table are linked to more details elsewhere in the user guide.
 +---------------------------------------+--------------------+---------------+
 | `admission_control_list`_             | see below          | see below     |
 +---------------------------------------+--------------------+---------------+
-| `prometheus_monitoring`_              | - true             | false         |
+| `prometheus_monitoring` (deprecated)  | - true             | false         |
 |                                       | - false            |               |
 +---------------------------------------+--------------------+---------------+
 | `grafana_admin_passwd`_               | (any string)       | "admin"       |
@@ -1301,19 +1301,6 @@ _`heapster_enabled`
   Ussuri default: false
   Train default: true
 
-_`metrics_server_chart_tag`
-  Add metrics_server_chart_tag to select the version of the
-  stable/metrics-server chart to install.
-  Ussuri default: v2.8.8
-
-_`metrics_server_enabled`
-  metrics_server_enabled is used to enable disable the installation of
-  the metrics server.
-  To use this service tiller_enabled must be true when using
-  helm_client_tag<v3.0.0.
-  Train default: true
-  Stein default: true
-
 _`cloud_provider_tag`
   This label allows users to override the default
   openstack-cloud-controller-manager container image tag. Refer to
@@ -1483,74 +1470,6 @@ _`k8s_keystone_auth_tag`
   Train default: v1.14.0
   Ussuri default: v1.18.0
 
-_`monitoring_enabled`
-  Enable installation of cluster monitoring solution provided by the
-  stable/prometheus-operator helm chart.
-  To use this service tiller_enabled must be true when using
-  helm_client_tag<v3.0.0.
-  Default: false
-
-_`monitoring_retention_days`
-  The number of time (in days) that prometheus metrics should be kept.
-  Default: 14
-
-_`monitoring_retention_size`
-  The maximum memory (in GiB) allowed to be used by prometheus server to
-  store metrics.
-  Default: 14
-
-_`monitoring_interval_seconds`
-  The time interval (in seconds) between consecutive metric scrapings.
-  Default: 30
-
-_`monitoring_storage_class_name`
-  The kubernetes storage class name to use for the prometheus pvc.
-  Using this label will activate the usage of a pvc instead of local
-  disk space.
-  When using monitoring_storage_class_name 2 pvcs will be created.
-  One for the prometheus server which size is set by
-  monitoring_retention_size and one for grafana which is fixed at 1Gi.
-  Default: ""
-
-_`monitoring_ingress_enabled`
-  Enable configuration of ingresses for the enabled monitoring services
-  {alertmanager,grafana,prometheus}.
-  Default: false
-
-_`cluster_basic_auth_secret`
-  The kubernetes secret to use for the proxy basic auth username and password
-  for the unprotected services {alertmanager,prometheus}. Basic auth is only
-  set up if this file is specified.
-  The secret must be in the same namespace as the used proxy (kube-system).
-  Default: ""
-
-_`cluster_root_domain_name`
-  The root domain name to use for the cluster automatically set up
-  applications.
-  Default: "localhost"
-
-_`prometheus_adapter_enabled`
-  Enable installation of cluster custom metrics provided by the
-  stable/prometheus-adapter helm chart. This service depends on
-  monitoring_enabled.
-  Default: true
-
-_`prometheus_adapter_chart_tag`
-  The stable/prometheus-adapter helm chart version to use.
-  Train-default: 1.4.0
-
-_`prometheus_adapter_configmap`
-  The name of the prometheus-adapter rules ConfigMap to use. Using this label
-  will overwrite the default rules.
-  Default: ""
-
-_`prometheus_operator_chart_tag`
-  Add prometheus_operator_chart_tag to select version of the
-  stable/prometheus-operator chart to install. When installing the chart,
-  helm will use the default values of the tag defined and overwrite them based
-  on the prometheus-operator-config ConfigMap currently defined. You must
-  certify that the versions are compatible.
-
 _`tiller_enabled`
   If set to true, tiller will be deployed in the kube-system namespace.
   Ussuri default: false
@@ -3582,66 +3501,7 @@ created. This example can be applied for any ``create``, ``update`` or
 Container Monitoring
 ====================
 
-The offered monitoring stack relies on the following set of containers and
-services:
-
-- cAdvisor
-- Node Exporter
-- Prometheus
-- Grafana
-
-To setup this monitoring stack, users are given two configurable labels in
-the Magnum cluster template's definition:
-
-_`prometheus_monitoring`
-  This label accepts a boolean value. If *True*, the monitoring stack will be
-  setup. By default *prometheus_monitoring = False*.
-
-_`grafana_admin_passwd`
-  This label lets users create their own *admin* user password for the Grafana
-  interface. It expects a string value. By default it is set to *admin*.
-
-
-Container Monitoring in Kubernetes
-----------------------------------
-
-By default, all Kubernetes clusters already contain *cAdvisor* integrated
-with the *Kubelet* binary. Its container monitoring data can be accessed on
-a node level basis through *http://NODE_IP:4194*.
-
-Node Exporter is part of the above mentioned monitoring stack as it can be
-used to export machine metrics. Such functionality also work on a node level
-which means that when `prometheus_monitoring`_ is *True*, the Kubernetes nodes
-will be populated with an additional manifest under
-*/etc/kubernetes/manifests*. Node Exporter is then automatically picked up
-and launched as a regular Kubernetes POD.
-
-To aggregate and complement all the existing monitoring metrics and add a
-built-in visualization layer, Prometheus is used. It is launched by the
-Kubernetes master node(s) as a *Service* within a *Deployment* with one
-replica and it relies on a *ConfigMap* where the Prometheus configuration
-(prometheus.yml) is defined. This configuration uses Prometheus native
-support for service discovery in Kubernetes clusters,
-*kubernetes_sd_configs*. The respective manifests can be found in
-*/srv/kubernetes/monitoring/* on the master nodes and once the service is
-up and running, Prometheus UI can be accessed through port 9090.
-
-Finally, for custom plotting and enhanced metric aggregation and
-visualization, Prometheus can be integrated with Grafana as it provides
-native compliance for Prometheus data sources. Also Grafana is deployed as
-a *Service* within a *Deployment* with one replica. The default user is
-*admin* and the password is setup according to `grafana_admin_passwd`_.
-There is also a default Grafana dashboard provided with this installation,
-from the official `Grafana dashboards' repository
-<https://grafana.net/dashboards>`_. The Prometheus data
-source is automatically added to Grafana once it is up and running, pointing
-to *http://prometheus:9090* through *Proxy*. The respective manifests can
-also be found in */srv/kubernetes/monitoring/* on the master nodes and once
-the service is running, the Grafana dashboards can be accessed through port
-3000.
-
-For both Prometheus and Grafana, there is an assigned *systemd* service
-called *kube-enable-monitoring*.
+.. include:: monitoring.rst
 
 Kubernetes Post Install Manifest
 ================================
@@ -0,0 +1,183 @@ (new file: monitoring.rst)
As of this moment, monitoring is only supported for Kubernetes clusters.

Container Monitoring in Kubernetes
----------------------------------

The current monitoring capabilities that can be deployed with magnum span
different components. These are:

* **metrics-server:** responsible for serving the metrics.k8s.io API
  requests. This includes the most basic functionality when using simple HPA
  metrics or the *kubectl top* command.

* **prometheus:** a full fledged service that allows the user to access
  advanced metrics capabilities. These metrics are collected with a resolution
  of 30 seconds and include resources such as CPU, memory, disk and network IO
  as well as R/W rates. These fine grained metrics are available on your
  cluster for up to 14 days (default).

* **prometheus-adapter:** an extra component that integrates with the
  prometheus service and allows a user to create more sophisticated `HPA
  <https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/>`_
  rules. The service integrates fully with the metrics.k8s.io API, but at this
  time only custom.metrics.k8s.io is being actively used.

The installation of these services is controlled with the following labels:

_`metrics_server_enabled`
  metrics_server_enabled is used to enable or disable the installation of
  the metrics server.
  To use this service tiller_enabled must be true when using
  helm_client_tag<v3.0.0.
  Train default: true
  Stein default: true

_`monitoring_enabled`
  Enable installation of the cluster monitoring solution provided by the
  stable/prometheus-operator helm chart.
  To use this service tiller_enabled must be true when using
  helm_client_tag<v3.0.0.
  Default: false

_`prometheus_adapter_enabled`
  Enable installation of cluster custom metrics provided by the
  stable/prometheus-adapter helm chart. This service depends on
  monitoring_enabled.
  Default: true

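These labels are supplied to the cluster template as a comma-separated string
of key=value pairs. A minimal sketch of assembling such a string follows; the
``openstack coe`` invocation in the trailing comment is illustrative only, and
the template name used there is a hypothetical placeholder.

```shell
# Sketch: assemble the monitoring-related labels as the comma-separated
# key=value string expected by the --labels option.
LABELS="monitoring_enabled=true,metrics_server_enabled=true,prometheus_adapter_enabled=true,tiller_enabled=true"
echo "$LABELS"
# This string would then be passed along the lines of (name is hypothetical):
#   openstack coe cluster template create k8s-monitored --labels "$LABELS" ...
```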
To control the deployed versions, extra labels are available:

_`metrics_server_chart_tag`
  Add metrics_server_chart_tag to select the version of the
  stable/metrics-server chart to install.
  Ussuri default: v2.8.8

_`prometheus_operator_chart_tag`
  Add prometheus_operator_chart_tag to select the version of the
  stable/prometheus-operator chart to install. When installing the chart,
  helm will use the default values of the defined tag and overwrite them based
  on the currently defined prometheus-operator-config ConfigMap. You must
  verify that the versions are compatible.

_`prometheus_adapter_chart_tag`
  The stable/prometheus-adapter helm chart version to use.
  Train default: 1.4.0

Full fledged cluster monitoring
+++++++++++++++++++++++++++++++

The prometheus installation provided with the `monitoring_enabled`_ label is
in fact a multi component service. This installation is managed with the
prometheus-operator helm chart and the constituent components are:

* **prometheus** (data collection, storage and search)
* **node-exporter** (data source for the kubelet/node)
* **kube-state-metrics** (data source for the running kubernetes objects
  {deployments, pods, nodes, etc})
* **alertmanager** (alarm aggregation, processing and dispatch)
* **grafana** (metrics visualization)

These components are installed in a generic way that makes it easy to have
cluster wide monitoring infrastructure running with no effort.

.. warning::

    The existing monitoring infrastructure does not take nodegroups into
    account. If you plan to use nodegroups in your cluster, take the maximum
    total number of nodes into account and use *max_node_count* to correctly
    set up the prometheus server.

.. note::

    Before creating your cluster, take its scale into account. This is
    important as the Prometheus server pod might not fit your nodes. It is
    particularly important if you are using *Cluster Autoscaling*, as the
    Prometheus server will request the resources needed to handle the maximum
    number of nodes that your cluster can scale up to, as defined by the
    *max_node_count* label (if set).

The Prometheus server will consume the following resources:

::

    RAM:  256 (base) + Nodes * 40 [MB]
    CPU:  128 (base) + Nodes * 7 [mCPU]
    Disk: 15 GB for 2 weeks (depends on usage)

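The formulas above can be applied directly when sizing master nodes. A minimal
arithmetic sketch for a hypothetical 50-node cluster (when autoscaling, use
*max_node_count* as the node count):

```shell
# Estimate Prometheus server needs from the formulas above for a
# hypothetical 50-node cluster.
NODES=50
RAM_MB=$((256 + NODES * 40))   # 256 MB base + 40 MB per node
CPU_MCPU=$((128 + NODES * 7))  # 128 mCPU base + 7 mCPU per node
echo "RAM: ${RAM_MB} MB, CPU: ${CPU_MCPU} mCPU"
# prints: RAM: 2256 MB, CPU: 478 mCPU
```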
Tuning parameters
+++++++++++++++++

The available configuration options allow you to tune the monitoring
infrastructure to your requirements. Below is a list of labels that can be
used for specific cases:

_`grafana_admin_passwd`
  This label lets users set their own *admin* user password for the Grafana
  interface. It expects a string value.
  Default: admin

_`monitoring_retention_days`
  This label lets users specify the maximum retention time, in days, for data
  collected in the prometheus server.
  Default: 14

_`monitoring_interval_seconds`
  This label lets users specify the time between metric samples in seconds.
  Default: 30

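The scrape interval directly drives sample volume: each time series gains one
sample per interval. A quick back-of-the-envelope sketch at the default value:

```shell
# Samples per day for a single time series at a given scrape interval.
# 30 s is the default monitoring_interval_seconds value.
INTERVAL_SECONDS=30
SAMPLES_PER_DAY=$((24 * 3600 / INTERVAL_SECONDS))
echo "${SAMPLES_PER_DAY} samples/day per series"
# prints: 2880 samples/day per series
```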
_`monitoring_retention_size`
  This label lets users specify the maximum size (in gibibytes) for data
  stored by the prometheus server. This label must be used together with
  `monitoring_storage_class_name`_.
  Default: 14

_`monitoring_storage_class_name`
  The kubernetes storage class name to use for the prometheus pvc.
  Using this label will activate the usage of a pvc instead of local
  disk space.
  When using monitoring_storage_class_name, 2 pvcs will be created: one for
  the prometheus server, whose size is set by `monitoring_retention_size`_,
  and one for grafana, which is fixed at 1Gi.
  Default: ""

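When choosing `monitoring_retention_size`_, the earlier rule of thumb (15 GB
for 2 weeks) can be scaled by the configured retention. This sketch assumes
roughly linear growth, which is an approximation; real usage varies with
cluster size and scrape interval.

```shell
# Rough pvc sizing sketch: scale the "15 GB for 2 weeks" rule of thumb
# by monitoring_retention_days (linear growth is an assumption here).
RETENTION_DAYS=28
DISK_GB=$((15 * RETENTION_DAYS / 14))
echo "~${DISK_GB} GB for ${RETENTION_DAYS} days"
# prints: ~30 GB for 28 days
```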
_`monitoring_ingress_enabled`
  This label sets up all the underlying services to be accessible in a
  'route by path' way. This means that the services will be exposed as:

  ::

      my.domain.com/alertmanager
      my.domain.com/prometheus
      my.domain.com/grafana

  This label must be used together with `cluster_root_domain_name`_.
  Default: false

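The 'route by path' layout can be sketched by listing the resulting endpoints
for a given `cluster_root_domain_name`_ (the domain below is a placeholder):

```shell
# List the endpoints exposed when monitoring_ingress_enabled=true,
# given a cluster_root_domain_name ("my.domain.com" is a placeholder).
ROOT_DOMAIN="my.domain.com"
for svc in alertmanager prometheus grafana; do
  echo "${ROOT_DOMAIN}/${svc}"
done
```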
_`cluster_root_domain_name`
  The root domain name to use for the cluster's automatically set up
  applications.
  Default: "localhost"

_`cluster_basic_auth_secret`
  The kubernetes secret to use for the proxy basic auth username and password
  for the unprotected services {alertmanager,prometheus}. Basic auth is only
  set up if this secret is specified.
  The secret must be in the same namespace as the used proxy (kube-system).
  Default: ""

  To create this secret you can do::

      $ htpasswd -c auth foo
      $ kubectl create secret generic basic-auth --from-file=auth

_`prometheus_adapter_configmap`
  The name of the prometheus-adapter rules ConfigMap to use. Using this label
  will overwrite the default rules.
  Default: ""