
[k8s] Monitoring with Prometheus and Grafana

Take advantage of the default cAdvisor deployed by k8s to deploy the
remaining monitoring stack on top, made up of node-exporter,
Prometheus and Grafana.

Node-exporter is run as a normal pod through a manifest, while
Prometheus and Grafana are deployments with 1 replica.

Prometheus has native support for Kubernetes, so the discovery of
the nodes and other k8s components is configured directly in the
Prometheus configuration.

Change-Id: If2cab996b9458580a55b5212ab298c909622e7f3
Partially-Implements: blueprint container-monitoring
changes/91/426291/14
Cristovao Cordeiro 5 years ago
parent commit 248e45f75c
  1. 73
      doc/source/userguide.rst
  2. 139
      magnum/drivers/common/templates/kubernetes/fragments/enable-monitoring.sh
  3. 27
      magnum/drivers/common/templates/kubernetes/fragments/enable-node-exporter.sh
  4. 67
      magnum/drivers/common/templates/kubernetes/fragments/write-grafana-service.yaml
  5. 1
      magnum/drivers/common/templates/kubernetes/fragments/write-heat-params-master.yaml
  6. 1
      magnum/drivers/common/templates/kubernetes/fragments/write-heat-params.yaml
  7. 82
      magnum/drivers/common/templates/kubernetes/fragments/write-prometheus-configmap.yaml
  8. 60
      magnum/drivers/common/templates/kubernetes/fragments/write-prometheus-service.yaml
  9. 4
      magnum/drivers/heat/k8s_template_def.py
  10. 16
      magnum/drivers/k8s_fedora_atomic_v1/templates/kubecluster.yaml
  11. 49
      magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml
  12. 13
      magnum/drivers/k8s_fedora_atomic_v1/templates/kubeminion.yaml
  13. 2079
      magnum/drivers/k8s_fedora_atomic_v1/tools/grafana-prometheus-dashboard.json
  14. 16
      magnum/drivers/k8s_fedora_ironic_v1/templates/kubecluster.yaml
  15. 49
      magnum/drivers/k8s_fedora_ironic_v1/templates/kubemaster.yaml
  16. 13
      magnum/drivers/k8s_fedora_ironic_v1/templates/kubeminion_software_configs.yaml
  17. 20
      magnum/tests/unit/conductor/handlers/test_k8s_cluster_conductor.py
  18. 12
      magnum/tests/unit/drivers/test_template_definition.py
  19. 8
      releasenotes/notes/bp-container-monitoring-d4bb1cbd0a4e44cc.yaml

73
doc/source/userguide.rst

@ -33,6 +33,7 @@ Contents
#. `Storage`_
#. `Image Management`_
#. `Notification`_
#. `Container Monitoring`_
===========
Terminology
@ -304,7 +305,11 @@ the table are linked to more details elsewhere in the user guide.
+---------------------------------------+--------------------+---------------+
| `admission_control_list`_ | see below | see below |
+---------------------------------------+--------------------+---------------+
| `prometheus_monitoring`_ | - true | false |
| | - false | |
+---------------------------------------+--------------------+---------------+
| `grafana_admin_passwd`_ | (any string) | "admin" |
+---------------------------------------+--------------------+---------------+
=======
Cluster
@ -2719,3 +2724,69 @@ created. This example can be applied for any ``create``, ``update`` or
"publisher_id": "magnum.host1234",
"timestamp": "2016-05-20 15:03:45.960280"
}
====================
Container Monitoring
====================
The offered monitoring stack relies on the following set of containers and
services:
- cAdvisor
- Node Exporter
- Prometheus
- Grafana
To set up this monitoring stack, users are given two configurable labels in
the Magnum cluster template's definition:
_`prometheus_monitoring`
This label accepts a boolean value. If *True*, the monitoring stack will be
set up. By default *prometheus_monitoring = False*.
_`grafana_admin_passwd`
This label lets users create their own *admin* user password for the Grafana
interface. It expects a string value. By default it is set to *admin*.
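For example, both labels can be set when creating a cluster template with the
Magnum CLI (a sketch; the image, keypair and network names below are
placeholders, and exact flag names may vary between python-magnumclient
versions):

```shell
# Hypothetical resource names; only the --labels argument matters here.
magnum cluster-template-create --name k8s-monitored \
    --image-id fedora-atomic-latest \
    --keypair-id testkey \
    --external-network-id public \
    --coe kubernetes \
    --labels prometheus_monitoring=True,grafana_admin_passwd=mysecret
```

Clusters built from this template will then run the full monitoring stack.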
Container Monitoring in Kubernetes
----------------------------------
By default, all Kubernetes clusters already contain *cAdvisor* integrated
with the *Kubelet* binary. Its container monitoring data can be accessed on
a per-node basis through *http://NODE_IP:4194*.
Node Exporter is part of the above-mentioned monitoring stack, as it can be
used to export machine metrics. This functionality also works at the node
level, which means that when `prometheus_monitoring`_ is *True*, the
Kubernetes nodes will be populated with an additional manifest under
*/etc/kubernetes/manifests*. Node Exporter is then automatically picked up
and launched as a regular Kubernetes pod.
To aggregate and complement all the existing monitoring metrics and add a
built-in visualization layer, Prometheus is used. It is launched by the
Kubernetes master node(s) as a *Service* within a *Deployment* with one
replica, and it relies on a *ConfigMap* where the Prometheus configuration
(prometheus.yml) is defined. This configuration uses Prometheus' native
support for service discovery in Kubernetes clusters,
*kubernetes_sd_configs*. The respective manifests can be found in
*/srv/kubernetes/monitoring/* on the master nodes, and once the service is
up and running, the Prometheus UI can be accessed on port 9090.
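Once the service reports running, a quick sanity check can be made against
Prometheus' standard HTTP API from any machine that can reach a node (the
node address below is a placeholder):

```shell
# Placeholder address; substitute a real Kubernetes node IP.
NODE_IP=10.0.0.5
# Ask Prometheus which scrape targets are currently up (1) or down (0).
curl -s "http://${NODE_IP}:9090/api/v1/query?query=up"
```

A healthy stack should list the cAdvisor, apiserver and node-exporter jobs
configured in the ConfigMap shown further below.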
Finally, for custom plotting and enhanced metric aggregation and
visualization, Prometheus can be integrated with Grafana, since Grafana
natively supports Prometheus data sources. Grafana is likewise deployed as
a *Service* within a *Deployment* with one replica. The default user is
*admin* and the password is set according to `grafana_admin_passwd`_.
A default Grafana dashboard is also provided with this installation,
taken from the official `Grafana dashboards' repository
<https://grafana.net/dashboards>`_. The Prometheus data
source is automatically added to Grafana once it is up and running, pointing
to *http://prometheus:9090* through *Proxy*. The respective manifests can
also be found in */srv/kubernetes/monitoring/* on the master nodes, and once
the service is running, the Grafana dashboards can be accessed on port
3000.
For both Prometheus and Grafana, there is a dedicated *systemd* service
called *kube-enable-monitoring*.
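Since *kube-enable-monitoring* is a oneshot unit, its progress can be
inspected on a master node with the usual systemd tooling (a sketch; run as
root on a cluster master):

```shell
systemctl status kube-enable-monitoring
journalctl -u kube-enable-monitoring --no-pager | tail -n 20
```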

139
magnum/drivers/common/templates/kubernetes/fragments/enable-monitoring.sh

@ -0,0 +1,139 @@
#!/bin/bash
. /etc/sysconfig/heat-params
if [ "$(echo $PROMETHEUS_MONITORING | tr '[:upper:]' '[:lower:]')" = "false" ]; then
exit 0
fi
function writeFile {
# $1 is filename
# $2 is file content
[ -f ${1} ] || {
echo "Writing File: $1"
mkdir -p $(dirname ${1})
cat << EOF > ${1}
$2
EOF
}
}
KUBE_MON_BIN=/usr/local/bin/kube-enable-monitoring
KUBE_MON_SERVICE=/etc/systemd/system/kube-enable-monitoring.service
GRAFANA_DEF_DASHBOARDS="/var/lib/grafana/dashboards"
GRAFANA_DEF_DASHBOARD_FILE=$GRAFANA_DEF_DASHBOARDS"/default.json"
# Write the binary for enable-monitoring
KUBE_MON_BIN_CONTENT='''#!/bin/sh
until curl -sf "http://127.0.0.1:8080/healthz"
do
echo "Waiting for Kubernetes API..."
sleep 5
done
# Check if all resources exist already before creating them
# Check if configmap Prometheus exists
kubectl get configmap prometheus -n kube-system
if [ "$?" != "0" ] && \
[ -f "/srv/kubernetes/monitoring/prometheusConfigMap.yaml" ]; then
kubectl create -f /srv/kubernetes/monitoring/prometheusConfigMap.yaml
fi
# Check if deployment and service Prometheus exist
kubectl get service prometheus -n kube-system | kubectl get deployment prometheus -n kube-system
if [ "${PIPESTATUS[0]}" != "0" ] && [ "${PIPESTATUS[1]}" != "0" ] && \
[ -f "/srv/kubernetes/monitoring/prometheusService.yaml" ]; then
kubectl create -f /srv/kubernetes/monitoring/prometheusService.yaml
fi
# Check if configmap graf-dash exists
kubectl get configmap graf-dash -n kube-system
if [ "$?" != "0" ] && \
[ -f '''$GRAFANA_DEF_DASHBOARD_FILE''' ]; then
kubectl create configmap graf-dash --from-file='''$GRAFANA_DEF_DASHBOARD_FILE''' -n kube-system
fi
# Check if deployment and service Grafana exist
kubectl get service grafana -n kube-system | kubectl get deployment grafana -n kube-system
if [ "${PIPESTATUS[0]}" != "0" ] && [ "${PIPESTATUS[1]}" != "0" ] && \
[ -f "/srv/kubernetes/monitoring/grafanaService.yaml" ]; then
kubectl create -f /srv/kubernetes/monitoring/grafanaService.yaml
fi
# Wait for Grafana pod and then inject data source
while true
do
echo "Waiting for Grafana pod to be up and Running"
if [ "$(kubectl get po -n kube-system -l name=grafana -o jsonpath={..phase})" = "Running" ]; then
break
fi
sleep 2
done
# Which node is running Grafana
NODE_IP=`kubectl get po -n kube-system -o jsonpath={.items[0].status.hostIP} -l name=grafana`
PROM_SERVICE_IP=`kubectl get svc prometheus --namespace kube-system -o jsonpath={..clusterIP}`
# The Grafana pod might be running but the app might still be initializing
echo "Check if Grafana is ready..."
curl --user admin:$ADMIN_PASSWD -X GET http://$NODE_IP:3000/api/datasources/1
until [ $? -eq 0 ]
do
sleep 2
curl --user admin:$ADMIN_PASSWD -X GET http://$NODE_IP:3000/api/datasources/1
done
# Inject Prometheus datasource into Grafana
while true
do
INJECT=`curl --user admin:$ADMIN_PASSWD -X POST \
-H "Content-Type: application/json;charset=UTF-8" \
--data-binary '''"'"'''{"name":"k8sPrometheus","isDefault":true,
"type":"prometheus","url":"http://'''"'"'''$PROM_SERVICE_IP'''"'"''':9090","access":"proxy"}'''"'"'''\
"http://$NODE_IP:3000/api/datasources/"`
if [[ "$INJECT" = *"Datasource added"* ]]; then
echo "Prometheus datasource injected into Grafana"
break
fi
echo "Trying to inject Prometheus datasource into Grafana - "$INJECT
done
'''
writeFile $KUBE_MON_BIN "$KUBE_MON_BIN_CONTENT"
# Write the monitoring service
KUBE_MON_SERVICE_CONTENT='''[Unit]
Requires=kubelet.service
[Service]
Type=oneshot
Environment=HOME=/root
EnvironmentFile=-/etc/kubernetes/config
ExecStart='''${KUBE_MON_BIN}'''
[Install]
WantedBy=multi-user.target
'''
writeFile $KUBE_MON_SERVICE "$KUBE_MON_SERVICE_CONTENT"
chown root:root ${KUBE_MON_BIN}
chmod 0755 ${KUBE_MON_BIN}
chown root:root ${KUBE_MON_SERVICE}
chmod 0644 ${KUBE_MON_SERVICE}
# Download the default JSON Grafana dashboard
# Not a crucial step, so allow it to fail
# TODO: this JSON should be passed into the minions as gzip in cloud-init
GRAFANA_DASHB_URL="https://grafana.net/api/dashboards/1621/revisions/1/download"
mkdir -p $GRAFANA_DEF_DASHBOARDS
curl $GRAFANA_DASHB_URL -o $GRAFANA_DEF_DASHBOARD_FILE || echo "Failed to fetch default Grafana dashboard"
if [ -f $GRAFANA_DEF_DASHBOARD_FILE ]; then
sed -i -- 's|${DS_PROMETHEUS}|k8sPrometheus|g' $GRAFANA_DEF_DASHBOARD_FILE
fi
# Launch the monitoring service
systemctl enable kube-enable-monitoring
systemctl start --no-block kube-enable-monitoring
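The existence checks in this script pipe one kubectl call into another purely
so that bash's PIPESTATUS array captures both exit codes from a single
pipeline. A minimal sketch of that pattern, with hypothetical stand-in
functions in place of the real kubectl calls:

```shell
#!/bin/bash
# Stand-ins for the two resource checks; the real script runs
# `kubectl get service ...` and `kubectl get deployment ...` here.
check_service() { false; }     # simulate: service does not exist
check_deployment() { false; }  # simulate: deployment does not exist

# Run both checks in one pipeline; PIPESTATUS then holds one exit
# code per pipeline stage (a bash-only feature).
check_service | check_deployment
if [ "${PIPESTATUS[0]}" != "0" ] && [ "${PIPESTATUS[1]}" != "0" ]; then
    echo "neither resource exists - would create both"
fi
```

Note that PIPESTATUS must be read immediately after the pipeline, since any
subsequent command overwrites it.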

27
magnum/drivers/common/templates/kubernetes/fragments/enable-node-exporter.sh

@ -0,0 +1,27 @@
#!/bin/sh
. /etc/sysconfig/heat-params
if [ "$(echo $PROMETHEUS_MONITORING | tr '[:upper:]' '[:lower:]')" = "false" ]; then
exit 0
fi
# Write node-exporter manifest as a regular pod
cat > /etc/kubernetes/manifests/node-exporter.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: node-exporter
namespace: kube-system
annotations:
prometheus.io/scrape: "true"
labels:
app: node-exporter
spec:
containers:
- name: node-exporter
image: prom/node-exporter
ports:
- containerPort: 9100
hostPort: 9100
EOF

67
magnum/drivers/common/templates/kubernetes/fragments/write-grafana-service.yaml

@ -0,0 +1,67 @@
#cloud-config
merge_how: dict(recurse_array)+list(append)
write_files:
- path: /srv/kubernetes/monitoring/grafanaService.yaml
owner: "root:root"
permissions: "0644"
content: |
apiVersion: v1
kind: Service
metadata:
labels:
name: node
role: service
name: grafana
namespace: kube-system
spec:
type: "NodePort"
ports:
- port: 3000
targetPort: 3000
nodePort: 30603
selector:
grafana: "true"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: grafana
namespace: kube-system
spec:
replicas: 1
template:
metadata:
labels:
name: grafana
grafana: "true"
role: db
spec:
containers:
- image: grafana/grafana
imagePullPolicy: Always
name: grafana
env:
- name: GF_SECURITY_ADMIN_PASSWORD
value: $ADMIN_PASSWD
- name: GF_DASHBOARDS_JSON_ENABLED
value: "true"
- name: GF_DASHBOARDS_JSON_PATH
value: /var/lib/grafana/dashboards
resources:
# keep request = limit to keep this container in guaranteed class
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: default-dashboard
mountPath: /var/lib/grafana/dashboards
ports:
- containerPort: 3000
hostPort: 3000
volumes:
- name: default-dashboard
configMap:
name: graf-dash

1
magnum/drivers/common/templates/kubernetes/fragments/write-heat-params-master.yaml

@ -5,6 +5,7 @@ write_files:
owner: "root:root"
permissions: "0600"
content: |
PROMETHEUS_MONITORING="$PROMETHEUS_MONITORING"
KUBE_API_PUBLIC_ADDRESS="$KUBE_API_PUBLIC_ADDRESS"
KUBE_API_PRIVATE_ADDRESS="$KUBE_API_PRIVATE_ADDRESS"
KUBE_API_PORT="$KUBE_API_PORT"

1
magnum/drivers/common/templates/kubernetes/fragments/write-heat-params.yaml

@ -5,6 +5,7 @@ write_files:
owner: "root:root"
permissions: "0600"
content: |
PROMETHEUS_MONITORING="$PROMETHEUS_MONITORING"
KUBE_ALLOW_PRIV="$KUBE_ALLOW_PRIV"
KUBE_MASTER_IP="$KUBE_MASTER_IP"
KUBE_API_PORT="$KUBE_API_PORT"

82
magnum/drivers/common/templates/kubernetes/fragments/write-prometheus-configmap.yaml

@ -0,0 +1,82 @@
#cloud-config
merge_how: dict(recurse_array)+list(append)
write_files:
- path: /srv/kubernetes/monitoring/prometheusConfigMap.yaml
owner: "root:root"
permissions: "0644"
content: |
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus
namespace: kube-system
data:
prometheus.yml: |
global:
scrape_interval: 10s
scrape_timeout: 10s
evaluation_interval: 10s
scrape_configs:
- job_name: 'kubernetes-nodes-cadvisor'
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- source_labels: [__meta_kubernetes_role]
action: replace
target_label: kubernetes_role
- source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:10255'
target_label: __address__
metric_relabel_configs:
- action: replace
source_labels: [id]
regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
target_label: rkt_container_name
replacement: '${2}-${1}'
- action: replace
source_labels: [id]
regex: '^/system\.slice/(.+)\.service$'
target_label: systemd_service_name
replacement: '${1}'
- job_name: 'kubernetes-apiserver-cadvisor'
tls_config:
insecure_skip_verify: true
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- source_labels: [__meta_kubernetes_role]
action: replace
target_label: kubernetes_role
- source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:10255'
target_label: __address__
- job_name: 'kubernetes-node-exporter'
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- source_labels: [__meta_kubernetes_role]
action: replace
target_label: kubernetes_role
- source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:9100'
target_label: __address__

60
magnum/drivers/common/templates/kubernetes/fragments/write-prometheus-service.yaml

@ -0,0 +1,60 @@
#cloud-config
merge_how: dict(recurse_array)+list(append)
write_files:
- path: /srv/kubernetes/monitoring/prometheusService.yaml
owner: "root:root"
permissions: "0644"
content: |
apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: 'true'
labels:
name: prometheus
name: prometheus
namespace: kube-system
spec:
selector:
app: prometheus
type: NodePort
ports:
- name: prometheus
protocol: TCP
port: 9090
nodePort: 30900
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: prometheus
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
name: prometheus
labels:
app: prometheus
spec:
containers:
- name: prometheus
image: prom/prometheus
args:
- '-storage.local.retention=6h'
- '-storage.local.memory-chunks=500000'
- '-config.file=/etc/prometheus/prometheus.yml'
ports:
- name: web
containerPort: 9090
hostPort: 9090
volumeMounts:
- name: config-volume
mountPath: /etc/prometheus
volumes:
- name: config-volume
configMap:
name: prometheus

4
magnum/drivers/heat/k8s_template_def.py

@ -109,7 +109,9 @@ class K8sTemplateDefinition(template_def.BaseTemplateDefinition):
'flannel_network_subnetlen',
'system_pods_initial_delay',
'system_pods_timeout',
'admission_control_list']
'admission_control_list',
'prometheus_monitoring',
'grafana_admin_passwd']
for label in label_list:
extra_params[label] = cluster_template.labels.get(label)

16
magnum/drivers/k8s_fedora_atomic_v1/templates/kubecluster.yaml

@ -40,6 +40,19 @@ parameters:
default: m1.small
description: flavor to use when booting the server for minions
prometheus_monitoring:
type: boolean
default: false
description: >
whether or not to have the grafana-prometheus-cadvisor monitoring setup
grafana_admin_passwd:
type: string
default: admin
hidden: true
description: >
admin user password for the Grafana monitoring interface
dns_nameserver:
type: string
description: address of a DNS nameserver reachable in your environment
@ -417,6 +430,8 @@ resources:
resource_def:
type: kubemaster.yaml
properties:
prometheus_monitoring: {get_param: prometheus_monitoring}
grafana_admin_passwd: {get_param: grafana_admin_passwd}
api_public_address: {get_attr: [api_lb, floating_address]}
api_private_address: {get_attr: [api_lb, address]}
ssh_key_name: {get_param: ssh_key_name}
@ -474,6 +489,7 @@ resources:
resource_def:
type: kubeminion.yaml
properties:
prometheus_monitoring: {get_param: prometheus_monitoring}
ssh_key_name: {get_param: ssh_key_name}
server_image: {get_param: server_image}
minion_flavor: {get_param: minion_flavor}

49
magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml

@ -105,6 +105,17 @@ parameters:
type: string
description: endpoint to retrieve TLS certs from
prometheus_monitoring:
type: boolean
description: >
whether or not to have prometheus and grafana deployed
grafana_admin_passwd:
type: string
hidden: true
description: >
admin user password for the Grafana monitoring interface
api_public_address:
type: string
description: Public IP address of the Kubernetes master server.
@ -238,6 +249,7 @@ resources:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-heat-params-master.yaml}
params:
"$PROMETHEUS_MONITORING": {get_param: prometheus_monitoring}
"$KUBE_API_PUBLIC_ADDRESS": {get_attr: [api_address_switch, public_ip]}
"$KUBE_API_PRIVATE_ADDRESS": {get_attr: [api_address_switch, private_ip]}
"$KUBE_API_PORT": {get_param: kubernetes_port}
@ -314,6 +326,39 @@ resources:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-network-config.sh}
write_prometheus_configmap:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-prometheus-configmap.yaml}
write_prometheus_service:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-prometheus-service.yaml}
write_grafana_service:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-grafana-service.yaml}
params:
"$ADMIN_PASSWD": {get_param: grafana_admin_passwd}
enable_monitoring:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/enable-monitoring.sh}
params:
"$ADMIN_PASSWD": {get_param: grafana_admin_passwd}
network_config_service:
type: OS::Heat::SoftwareConfig
properties:
@ -394,6 +439,9 @@ resources:
- config: {get_resource: add_proxy}
- config: {get_resource: enable_services}
- config: {get_resource: write_network_config}
- config: {get_resource: write_prometheus_configmap}
- config: {get_resource: write_prometheus_service}
- config: {get_resource: write_grafana_service}
- config: {get_resource: network_config_service}
- config: {get_resource: network_service}
- config: {get_resource: kube_system_namespace_service}
@ -401,6 +449,7 @@ resources:
- config: {get_resource: enable_kube_proxy}
- config: {get_resource: kube_ui_service}
- config: {get_resource: kube_examples}
- config: {get_resource: enable_monitoring}
- config: {get_resource: master_wc_notify}
######################################################################

13
magnum/drivers/k8s_fedora_atomic_v1/templates/kubeminion.yaml

@ -61,6 +61,11 @@ parameters:
type: string
description: endpoint to retrieve TLS certs from
prometheus_monitoring:
type: boolean
description: >
whether or not to have the node-exporter running on the node
kube_master_ip:
type: string
description: IP address of the Kubernetes master server.
@ -220,6 +225,7 @@ resources:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-heat-params.yaml}
params:
$PROMETHEUS_MONITORING: {get_param: prometheus_monitoring}
$KUBE_ALLOW_PRIV: {get_param: kube_allow_priv}
$KUBE_MASTER_IP: {get_param: kube_master_ip}
$KUBE_API_PORT: {get_param: kubernetes_port}
@ -321,6 +327,12 @@ resources:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/enable-kube-proxy-minion.sh}
enable_node_exporter:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/enable-node-exporter.sh}
minion_wc_notify:
type: OS::Heat::SoftwareConfig
properties:
@ -361,6 +373,7 @@ resources:
- config: {get_resource: add_proxy}
- config: {get_resource: enable_services}
- config: {get_resource: enable_kube_proxy}
- config: {get_resource: enable_node_exporter}
- config: {get_resource: enable_docker_registry}
- config: {get_resource: minion_wc_notify}

2079
magnum/drivers/k8s_fedora_atomic_v1/tools/grafana-prometheus-dashboard.json

File diff suppressed because it is too large

16
magnum/drivers/k8s_fedora_ironic_v1/templates/kubecluster.yaml

@ -43,6 +43,19 @@ parameters:
default: baremetal
description: flavor to use when booting the server
prometheus_monitoring:
type: boolean
default: false
description: >
whether or not to have the grafana-prometheus-cadvisor monitoring setup
grafana_admin_passwd:
type: string
default: admin
hidden: true
description: >
admin user password for the Grafana monitoring interface
dns_nameserver:
type: string
description: address of a dns nameserver reachable in your environment
@ -405,6 +418,8 @@ resources:
resource_def:
type: kubemaster.yaml
properties:
prometheus_monitoring: {get_param: prometheus_monitoring}
grafana_admin_passwd: {get_param: grafana_admin_passwd}
api_public_address: {get_attr: [api_lb, floating_address]}
api_private_address: {get_attr: [api_lb, address]}
ssh_key_name: {get_param: ssh_key_name}
@ -491,6 +506,7 @@ resources:
kubeminion_software_configs:
type: kubeminion_software_configs.yaml
properties:
prometheus_monitoring: {get_param: prometheus_monitoring}
network_driver: {get_param: network_driver}
kube_master_ip: {get_attr: [api_address_lb_switch, private_ip]}
etcd_server_ip: {get_attr: [etcd_address_lb_switch, private_ip]}

49
magnum/drivers/k8s_fedora_ironic_v1/templates/kubemaster.yaml

@ -105,6 +105,17 @@ parameters:
type: string
description: endpoint to retrieve TLS certs from
prometheus_monitoring:
type: boolean
description: >
whether or not to have prometheus and grafana deployed
grafana_admin_passwd:
type: string
hidden: true
description: >
admin user password for the Grafana monitoring interface
api_public_address:
type: string
description: Public IP address of the Kubernetes master server.
@ -232,6 +243,7 @@ resources:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-heat-params-master.yaml}
params:
"$PROMETHEUS_MONITORING": {get_param: prometheus_monitoring}
"$KUBE_API_PUBLIC_ADDRESS": {get_attr: [api_address_switch, public_ip]}
"$KUBE_API_PRIVATE_ADDRESS": {get_attr: [api_address_switch, private_ip]}
"$KUBE_API_PORT": {get_param: kubernetes_port}
@ -307,6 +319,39 @@ resources:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-network-config.sh}
write_prometheus_configmap:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-prometheus-configmap.yaml}
write_prometheus_service:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-prometheus-service.yaml}
write_grafana_service:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-grafana-service.yaml}
params:
"$ADMIN_PASSWD": {get_param: grafana_admin_passwd}
enable_monitoring:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/enable-monitoring.sh}
params:
"$ADMIN_PASSWD": {get_param: grafana_admin_passwd}
network_config_service:
type: OS::Heat::SoftwareConfig
properties:
@ -387,6 +432,9 @@ resources:
- config: {get_resource: add_proxy}
- config: {get_resource: enable_services}
- config: {get_resource: write_network_config}
- config: {get_resource: write_prometheus_configmap}
- config: {get_resource: write_prometheus_service}
- config: {get_resource: write_grafana_service}
- config: {get_resource: network_config_service}
- config: {get_resource: network_service}
- config: {get_resource: kube_system_namespace_service}
@ -394,6 +442,7 @@ resources:
- config: {get_resource: enable_kube_proxy}
- config: {get_resource: kube_ui_service}
- config: {get_resource: kube_examples}
- config: {get_resource: enable_monitoring}
- config: {get_resource: master_wc_notify}
######################################################################

13
magnum/drivers/k8s_fedora_ironic_v1/templates/kubeminion_software_configs.yaml

@ -43,6 +43,11 @@ parameters:
type: string
description: endpoint to retrieve TLS certs from
prometheus_monitoring:
type: boolean
description: >
whether or not to have the node-exporter running on the node
kube_master_ip:
type: string
description: IP address of the Kubernetes master server.
@ -176,6 +181,7 @@ resources:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-heat-params.yaml}
params:
$PROMETHEUS_MONITORING: {get_param: prometheus_monitoring}
$KUBE_ALLOW_PRIV: {get_param: kube_allow_priv}
$KUBE_MASTER_IP: {get_param: kube_master_ip}
$KUBE_API_PORT: {get_param: kubernetes_port}
@ -276,6 +282,12 @@ resources:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/enable-kube-proxy-minion.sh}
enable_node_exporter:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/enable-node-exporter.sh}
minion_wc_notify:
type: OS::Heat::SoftwareConfig
properties:
@ -316,6 +328,7 @@ resources:
- config: {get_resource: add_proxy}
- config: {get_resource: enable_services}
- config: {get_resource: enable_kube_proxy}
- config: {get_resource: enable_node_exporter}
- config: {get_resource: enable_docker_registry}
- config: {get_resource: minion_wc_notify}

20
magnum/tests/unit/conductor/handlers/test_k8s_cluster_conductor.py

@ -51,7 +51,9 @@ class TestClusterConductorWithK8s(base.TestCase):
'flannel_backend': 'vxlan',
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list'},
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd'},
'tls_disabled': False,
'server_type': 'vm',
'registry_enabled': False,
@ -149,7 +151,9 @@ class TestClusterConductorWithK8s(base.TestCase):
'flannel_backend': 'vxlan',
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list'},
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd'},
'http_proxy': 'http_proxy',
'https_proxy': 'https_proxy',
'no_proxy': 'no_proxy',
@ -180,6 +184,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'http_proxy': 'http_proxy',
'https_proxy': 'https_proxy',
'no_proxy': 'no_proxy',
@ -261,6 +267,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'http_proxy': 'http_proxy',
'https_proxy': 'https_proxy',
'magnum_url': 'http://127.0.0.1:9511/v1',
@ -344,6 +352,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'insecure_registry_url': '10.0.0.1:5000',
'kube_version': 'fake-version',
'magnum_url': 'http://127.0.0.1:9511/v1',
@ -419,6 +429,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'tls_disabled': False,
'registry_enabled': False,
'trustee_domain_id': self.mock_keystone.trustee_domain_id,
@ -486,6 +498,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'tls_disabled': False,
'registry_enabled': False,
'trustee_domain_id': self.mock_keystone.trustee_domain_id,
@ -679,6 +693,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'tenant_name': 'fake_tenant',
'username': 'fake_user',
'cluster_uuid': self.cluster_dict['uuid'],

12
magnum/tests/unit/drivers/test_template_definition.py

@ -260,6 +260,10 @@ class AtomicK8sTemplateDefinitionTestCase(BaseTemplateDefinitionTestCase):
'system_pods_timeout')
admission_control_list = mock_cluster_template.labels.get(
'admission_control_list')
prometheus_monitoring = mock_cluster_template.labels.get(
'prometheus_monitoring')
grafana_admin_passwd = mock_cluster_template.labels.get(
'grafana_admin_passwd')
k8s_def = k8sa_tdef.AtomicK8sTemplateDefinition()
@ -275,6 +279,8 @@ class AtomicK8sTemplateDefinitionTestCase(BaseTemplateDefinitionTestCase):
'system_pods_initial_delay': system_pods_initial_delay,
'system_pods_timeout': system_pods_timeout,
'admission_control_list': admission_control_list,
'prometheus_monitoring': prometheus_monitoring,
'grafana_admin_passwd': grafana_admin_passwd,
'username': 'fake_user',
'tenant_name': 'fake_tenant',
'magnum_url': mock_osc.magnum_url.return_value,
@ -325,6 +331,10 @@ class AtomicK8sTemplateDefinitionTestCase(BaseTemplateDefinitionTestCase):
'system_pods_timeout')
admission_control_list = mock_cluster_template.labels.get(
'admission_control_list')
prometheus_monitoring = mock_cluster_template.labels.get(
'prometheus_monitoring')
grafana_admin_passwd = mock_cluster_template.labels.get(
'grafana_admin_passwd')
k8s_def = k8sa_tdef.AtomicK8sTemplateDefinition()
@ -340,6 +350,8 @@ class AtomicK8sTemplateDefinitionTestCase(BaseTemplateDefinitionTestCase):
'system_pods_initial_delay': system_pods_initial_delay,
'system_pods_timeout': system_pods_timeout,
'admission_control_list': admission_control_list,
'prometheus_monitoring': prometheus_monitoring,
'grafana_admin_passwd': grafana_admin_passwd,
'username': 'fake_user',
'tenant_name': 'fake_tenant',
'magnum_url': mock_osc.magnum_url.return_value,

8
releasenotes/notes/bp-container-monitoring-d4bb1cbd0a4e44cc.yaml

@ -0,0 +1,8 @@
---
features:
- |
Includes a monitoring stack based on cAdvisor, node-exporter, Prometheus
and Grafana. Users can enable this stack through the label
prometheus_monitoring. Prometheus scrapes metrics from the Kubernetes
cluster and then serves them to Grafana through Grafana's Prometheus
data source. Upon completion, a default Grafana dashboard is provided.