4. Update cluster monitoring documentation

Change the user documentation to introduce the new way of installing the Prometheus monitoring suite by using the monitoring_enabled label. Give a broad overview of the existing monitoring features available out of the box, which components exist and what they do. Explain which frequently asked questions can be solved with the existing integrations by manipulating monitoring-specific labels.

task: 39627
story: 2006765
Depends-On: Ie0e7000e0d94b2037f2c398fa67a2a2b7e256bc3
Change-Id: I5581650b15ce94e31a44de09f82aef1790013b54
Signed-off-by: Diogo Guerra <diogo.filipe.tomas.guerra@gmail.com>
2020-10-05 14:56:24 +02:00 · 2020-10-05 14:56:24 +02:00 · a3d8b4fe8d
parent ea64468ab3
commit a3d8b4fe8d
2 changed files with 185 additions and 142 deletions
--- a/doc/source/user/index.rst
+++ b/doc/source/user/index.rst
@@ -352,7 +352,7 @@ the table are linked to more details elsewhere in the user guide.
 +---------------------------------------+--------------------+---------------+
 | `admission_control_list`_             | see below          | see below     |
 +---------------------------------------+--------------------+---------------+
-| `prometheus_monitoring`_              | - true             | false         |
+| `prometheus_monitoring` (deprecated)  | - true             | false         |
 |                                       | - false            |               |
 +---------------------------------------+--------------------+---------------+
 | `grafana_admin_passwd`_               | (any string)       | "admin"       |
@@ -1301,19 +1301,6 @@ _`heapster_enabled`
  Ussuri default: false
  Train default: true
 _`metrics_server_chart_tag`
  Add metrics_server_chart_tag to select the version of the
  stable/metrics-server chart to install.
  Ussuri default: v2.8.8
 _`metrics_server_enabled`
  metrics_server_enabled is used to enable or disable the installation of
  the metrics server.
  To use this service tiller_enabled must be true when using
  helm_client_tag<v3.0.0.
  Train default: true
  Stein default: true
 _`cloud_provider_tag`
  This label allows users to override the default
  openstack-cloud-controller-manager container image tag. Refer to
@@ -1483,74 +1470,6 @@ _`k8s_keystone_auth_tag`
  Train default: v1.14.0
  Ussuri default: v1.18.0
 _`monitoring_enabled`
  Enable installation of cluster monitoring solution provided by the
  stable/prometheus-operator helm chart.
  To use this service tiller_enabled must be true when using
  helm_client_tag<v3.0.0.
  Default: false
 _`monitoring_retention_days`
  The length of time (in days) that prometheus metrics should be kept.
  Default: 14
 _`monitoring_retention_size`
  The maximum memory (in GiB) allowed to be used by prometheus server to
  store metrics.
  Default: 14
 _`monitoring_interval_seconds`
  The time interval (in seconds) between consecutive metric scrapings.
  Default: 30
 _`monitoring_storage_class_name`
  The kubernetes storage class name to use for the prometheus pvc.
  Using this label will activate the usage of a pvc instead of local
  disk space.
  When using monitoring_storage_class_name 2 pvcs will be created.
  One for the prometheus server which size is set by
  monitoring_retention_size and one for grafana which is fixed at 1Gi.
  Default: ""
 _`monitoring_ingress_enabled`
  Enable configuration of ingresses for the enabled monitoring services
  {alertmanager,grafana,prometheus}.
  Default: false
 _`cluster_basic_auth_secret`
  The kubernetes secret to use for the proxy basic auth username and password
  for the unprotected services {alertmanager,prometheus}. Basic auth is only
  set up if this file is specified.
  The secret must be in the same namespace as the used proxy (kube-system).
  Default: ""
 _`cluster_root_domain_name`
  The root domain name to use for the cluster automatically set up
  applications.
  Default: "localhost"
 _`prometheus_adapter_enabled`
  Enable installation of cluster custom metrics provided by the
  stable/prometheus-adapter helm chart. This service depends on
  monitoring_enabled.
  Default: true
 _`prometheus_adapter_chart_tag`
  The stable/prometheus-adapter helm chart version to use.
  Train-default: 1.4.0
 _`prometheus_adapter_configmap`
  The name of the prometheus-adapter rules ConfigMap to use. Using this label
  will overwrite the default rules.
  Default: ""
 _`prometheus_operator_chart_tag`
  Add prometheus_operator_chart_tag to select version of the
  stable/prometheus-operator chart to install. When installing the chart,
  helm will use the default values of the tag defined and overwrite them based
  on the prometheus-operator-config ConfigMap currently defined. You must
  certify that the versions are compatible.
 _`tiller_enabled`
  If set to true, tiller will be deployed in the kube-system namespace.
  Ussuri default: false
@@ -3582,66 +3501,7 @@ created. This example can be applied for any ``create``, ``update`` or
 Container Monitoring
 ====================
-The offered monitoring stack relies on the following set of containers and
+.. include:: monitoring.rst
 services:
 - cAdvisor
 - Node Exporter
 - Prometheus
 - Grafana
 To set up this monitoring stack, users are given two configurable labels in
 the Magnum cluster template's definition:
 _`prometheus_monitoring`
  This label accepts a boolean value. If *True*, the monitoring stack will be
  set up. By default *prometheus_monitoring = False*.
 _`grafana_admin_passwd`
  This label lets users create their own *admin* user password for the Grafana
  interface. It expects a string value. By default it is set to *admin*.
 Container Monitoring in Kubernetes
 ----------------------------------
 By default, all Kubernetes clusters already contain *cAdvisor* integrated
 with the *Kubelet* binary. Its container monitoring data can be accessed on
 a node level basis through *http://NODE_IP:4194*.
 Node Exporter is part of the above mentioned monitoring stack as it can be
 used to export machine metrics. Such functionality also works on a node level
 which means that when `prometheus_monitoring`_ is *True*, the Kubernetes nodes
 will be populated with an additional manifest under
 */etc/kubernetes/manifests*. Node Exporter is then automatically picked up
 and launched as a regular Kubernetes POD.
 To aggregate and complement all the existing monitoring metrics and add a
 built-in visualization layer, Prometheus is used. It is launched by the
 Kubernetes master node(s) as a *Service* within a *Deployment* with one
 replica and it relies on a *ConfigMap* where the Prometheus configuration
 (prometheus.yml) is defined. This configuration uses Prometheus native
 support for service discovery in Kubernetes clusters,
 *kubernetes_sd_configs*. The respective manifests can be found in
 */srv/kubernetes/monitoring/* on the master nodes and once the service is
 up and running, Prometheus UI can be accessed through port 9090.
 Finally, for custom plotting and enhanced metric aggregation and
 visualization, Prometheus can be integrated with Grafana as it provides
 native compliance for Prometheus data sources. Also Grafana is deployed as
 a *Service* within a *Deployment* with one replica. The default user is
 *admin* and the password is set up according to `grafana_admin_passwd`_.
 There is also a default Grafana dashboard provided with this installation,
 from the official `Grafana dashboards' repository
 <https://grafana.net/dashboards>`_. The Prometheus data
 source is automatically added to Grafana once it is up and running, pointing
 to *http://prometheus:9090* through *Proxy*. The respective manifests can
 also be found in */srv/kubernetes/monitoring/* on the master nodes and once
 the service is running, the Grafana dashboards can be accessed through port
 3000.
 For both Prometheus and Grafana, there is an assigned *systemd* service
 called *kube-enable-monitoring*.
 Kubernetes Post Install Manifest
 ================================
--- a/doc/source/user/monitoring.rst
+++ b/doc/source/user/monitoring.rst
@@ -0,0 +1,183 @@
 Currently, monitoring is only supported for Kubernetes clusters.
 Container Monitoring in Kubernetes
 ----------------------------------
 The monitoring capabilities that can be deployed with Magnum are spread
 across several components:
 * **metrics-server:** serves requests to the metrics.k8s.io API. This
  covers the most basic functionality, such as simple HPA metrics and the
  *kubectl top* command.
 * **prometheus:** is a full-fledged service that gives the user access to
  advanced metrics capabilities. These metrics are collected with a resolution
  of 30 seconds and include resources such as CPU, Memory, Disk and Network IO
  as well as R/W rates. These fine-grained metrics are kept on your cluster
  for up to 14 days (default).
 * **prometheus-adapter:** is an extra component that integrates with the
  prometheus service and allows a user to create more sophisticated `HPA
  <https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/>`_
  rules. The service integrates fully with the metrics.k8s.io API but at this
  time only custom.metrics.k8s.io is being actively used.
 The installation of these services is controlled with the following labels:
 _`metrics_server_enabled`
  metrics_server_enabled is used to enable or disable the installation of
  the metrics server.
  To use this service tiller_enabled must be true when using
  helm_client_tag<v3.0.0.
  Train default: true
  Stein default: true
 _`monitoring_enabled`
  Enable installation of cluster monitoring solution provided by the
  stable/prometheus-operator helm chart.
  To use this service tiller_enabled must be true when using
  helm_client_tag<v3.0.0.
  Default: false
 _`prometheus_adapter_enabled`
  Enable installation of cluster custom metrics provided by the
  stable/prometheus-adapter helm chart. This service depends on
  monitoring_enabled.
  Default: true
 To control deployed versions, extra labels are available:
 _`metrics_server_chart_tag`
  Add metrics_server_chart_tag to select the version of the
  stable/metrics-server chart to install.
  Ussuri default: v2.8.8
 _`prometheus_operator_chart_tag`
  Add prometheus_operator_chart_tag to select the version of the
  stable/prometheus-operator chart to install. When installing the chart,
  helm will use the default values of the tag defined and overwrite them based
  on the prometheus-operator-config ConfigMap currently defined. You must
  ensure that the versions are compatible.
 _`prometheus_adapter_chart_tag`
  The stable/prometheus-adapter helm chart version to use.
  Train default: 1.4.0
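
The labels above are passed to Magnum as comma-separated ``key=value`` pairs, for example through the ``--labels`` option of ``openstack coe cluster template create``. The following is a minimal sketch of how such a string could be assembled. The helper name ``format_labels`` is hypothetical and not part of Magnum, and the chosen label values are only an illustration.

```python
def format_labels(labels):
    """Render a dict of Magnum labels as a comma-separated key=value string."""
    parts = []
    for key, value in labels.items():
        # Booleans are written in lowercase, matching the defaults shown above.
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{key}={value}")
    return ",".join(parts)


# Illustrative selection: enable the three monitoring components.
labels = {
    "monitoring_enabled": True,
    "metrics_server_enabled": True,
    "prometheus_adapter_enabled": True,
}
print(format_labels(labels))
```

The resulting string can then be supplied as the value of ``--labels`` when creating the cluster template or the cluster.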
 Full-fledged cluster monitoring
 +++++++++++++++++++++++++++++++
 The prometheus installation provided with the `monitoring_enabled`_ label is in
 fact a multi-component service. This installation is managed with the
 prometheus-operator helm chart and the constituent components are:
 * **prometheus** (data collection, storage and search)
  * **node-exporter** (data source for the kubelet/node)
  * **kube-state-metrics** (data source for the running kubernetes objects
    {deployments, pods, nodes, etc})
 * **alertmanager** (alarm aggregation, processing and dispatch)
 * **grafana** (metrics visualization)
 These components are installed in a generic way that makes it easy to have a
 cluster-wide monitoring infrastructure running with minimal effort.
 .. warning::
    The existing monitoring infrastructure does not take nodegroups into
    account. If you plan to use nodegroups in your cluster, work out the
    maximum total number of nodes and set *max_node_count* accordingly so that
    the Prometheus server is sized correctly.
 .. note::
    Before creating your cluster, take into account the scale of the cluster.
    This is important as the Prometheus server pod might not fit on your nodes.
    It is particularly important if you are using *Cluster Autoscaling*, as
    the Prometheus server will request the resources needed for the maximum
    number of nodes that your cluster can scale up to, as defined by the
    *max_node_count* label (if set).
    The Prometheus server will consume the following resources:
    ::
        RAM: 256 (base) + Nodes * 40 [MB]
        CPU: 128 (base) + Nodes * 7 [mCPU]
        Disk: 15 GB for 2 weeks (depends on usage)
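
As a worked example of the sizing formulas above, the sketch below evaluates the RAM and CPU figures for a given node count. The function name ``prometheus_resources`` is hypothetical; only the constants come from this document. Disk usage is left out because it depends on usage rather than on a fixed formula.

```python
def prometheus_resources(nodes):
    """Estimate Prometheus server needs from the formulas above.

    Returns a tuple (ram_mb, cpu_millicores).
    """
    ram_mb = 256 + nodes * 40          # 256 MB base plus 40 MB per node
    cpu_millicores = 128 + nodes * 7   # 128 mCPU base plus 7 mCPU per node
    return ram_mb, cpu_millicores


for nodes in (10, 100):
    ram, cpu = prometheus_resources(nodes)
    print(f"{nodes} nodes: {ram} MB RAM, {cpu} mCPU")
# 10 nodes: 656 MB RAM, 198 mCPU
# 100 nodes: 4256 MB RAM, 828 mCPU
```

When cluster autoscaling is in use, the node count to plug in is the value of *max_node_count* rather than the initial cluster size.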
 Tuning parameters
 +++++++++++++++++
 The existing setup allows you to tune the metrics infrastructure to your
 requirements. Below is a list of labels that can be used for specific
 cases:
 _`grafana_admin_passwd`
  This label lets users create their own *admin* user password for the Grafana
  interface. It expects a string value.
  Default: admin
 _`monitoring_retention_days`
  This label lets users specify the maximum retention time for data collected
  in the prometheus server in days.
  Default: 14
 _`monitoring_interval_seconds`
  This label lets users specify the time between metric samples in seconds.
  Default: 30
 _`monitoring_retention_size`
  This label lets users specify the maximum size (in gibibytes) for data
  stored by the prometheus server. This label must be used together with
  `monitoring_storage_class_name`_.
  Default: 14
 _`monitoring_storage_class_name`
  The kubernetes storage class name to use for the prometheus pvc.
  Using this label will activate the usage of a pvc instead of local
  disk space.
  When using monitoring_storage_class_name, two PVCs will be created:
  one for the prometheus server, whose size is set by
  `monitoring_retention_size`_, and one for grafana, which is fixed at 1Gi.
  Default: ""
 _`monitoring_ingress_enabled`
  This label sets up all the underlying services to be accessible in a
  'route by path' way. This means that the services will be exposed as:
  ::
      my.domain.com/alertmanager
      my.domain.com/prometheus
      my.domain.com/grafana
  This label must be used together with `cluster_root_domain_name`_.
  Default: false
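
To illustrate the 'route by path' layout, the sketch below derives the three service addresses from a root domain. It assumes that the host part corresponds to the value of `cluster_root_domain_name`; the helper name ``monitoring_urls`` is hypothetical.

```python
def monitoring_urls(root_domain):
    """List the route-by-path addresses of the monitoring services."""
    # Service names as listed in the description of this label.
    services = ("alertmanager", "prometheus", "grafana")
    return [f"{root_domain}/{service}" for service in services]


for url in monitoring_urls("my.domain.com"):
    print(url)
# my.domain.com/alertmanager
# my.domain.com/prometheus
# my.domain.com/grafana
```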
 _`cluster_root_domain_name`
  The root domain name to use for the cluster automatically set up
  applications.
  Default: "localhost"
 _`cluster_basic_auth_secret`
  The kubernetes secret to use for the proxy basic auth username and password
  for the unprotected services {alertmanager,prometheus}. Basic auth is only
  set up if this secret is specified.
  The secret must be in the same namespace as the used proxy (kube-system).
  Default: ""
  To create this secret in the kube-system namespace you can do:
  ::
    $ htpasswd -c auth foo
    $ kubectl create secret generic basic-auth --from-file=auth --namespace kube-system
 _`prometheus_adapter_configmap`
  The name of the prometheus-adapter rules ConfigMap to use. Using this label
  will overwrite the default rules.
  Default: ""
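
Several of the labels above are only meaningful in combination with others. The sketch below collects the dependencies stated in this document into a small pre-flight check that can be run on a set of labels before creating a cluster. It is not part of Magnum; the function names are hypothetical, and only labels that are set explicitly are examined.

```python
def is_true(value):
    """Interpret a label value such as True, "true" or "True" as a boolean."""
    return str(value).strip().lower() == "true"


def check_monitoring_labels(labels):
    """Return a list of problems found in a dict of monitoring labels."""
    problems = []
    # monitoring_retention_size must be used together with a storage class.
    if "monitoring_retention_size" in labels and not labels.get(
        "monitoring_storage_class_name"
    ):
        problems.append(
            "monitoring_retention_size requires monitoring_storage_class_name"
        )
    # Ingresses must be used together with an explicit root domain name.
    if is_true(labels.get("monitoring_ingress_enabled", "false")) and not labels.get(
        "cluster_root_domain_name"
    ):
        problems.append(
            "monitoring_ingress_enabled requires cluster_root_domain_name"
        )
    # prometheus-adapter depends on the monitoring stack being installed.
    if is_true(labels.get("prometheus_adapter_enabled", "false")) and not is_true(
        labels.get("monitoring_enabled", "false")
    ):
        problems.append("prometheus_adapter_enabled requires monitoring_enabled")
    return problems


example = {
    "monitoring_enabled": "true",
    "monitoring_retention_size": "20",
    "monitoring_ingress_enabled": "true",
}
for problem in check_monitoring_labels(example):
    print(problem)
```

With the example labels, the check reports the missing storage class and the missing root domain name. An empty list means none of the stated dependencies are violated.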