
[k8s] Monitoring with Prometheus and Grafana

Take advantage of the default cAdvisor deployed by k8s to deploy the
remaining monitoring stack on top, made up of node-exporter,
Prometheus and Grafana.

Node-exporter is run as a normal pod through a manifest, while
Prometheus and Grafana are deployments with 1 replica.

Prometheus has native support for Kubernetes, so the discovery of
the nodes and other k8s components is configured directly in the
Prometheus configuration.

Change-Id: If2cab996b9458580a55b5212ab298c909622e7f3
Partially-Implements: blueprint container-monitoring
changes/91/426291/14
Cristovao Cordeiro 5 years ago
parent commit 248e45f75c
  1. 73
      doc/source/userguide.rst
  2. 139
      magnum/drivers/common/templates/kubernetes/fragments/enable-monitoring.sh
  3. 27
      magnum/drivers/common/templates/kubernetes/fragments/enable-node-exporter.sh
  4. 67
      magnum/drivers/common/templates/kubernetes/fragments/write-grafana-service.yaml
  5. 1
      magnum/drivers/common/templates/kubernetes/fragments/write-heat-params-master.yaml
  6. 1
      magnum/drivers/common/templates/kubernetes/fragments/write-heat-params.yaml
  7. 82
      magnum/drivers/common/templates/kubernetes/fragments/write-prometheus-configmap.yaml
  8. 60
      magnum/drivers/common/templates/kubernetes/fragments/write-prometheus-service.yaml
  9. 4
      magnum/drivers/heat/k8s_template_def.py
  10. 16
      magnum/drivers/k8s_fedora_atomic_v1/templates/kubecluster.yaml
  11. 49
      magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml
  12. 13
      magnum/drivers/k8s_fedora_atomic_v1/templates/kubeminion.yaml
  13. 2079
      magnum/drivers/k8s_fedora_atomic_v1/tools/grafana-prometheus-dashboard.json
  14. 16
      magnum/drivers/k8s_fedora_ironic_v1/templates/kubecluster.yaml
  15. 49
      magnum/drivers/k8s_fedora_ironic_v1/templates/kubemaster.yaml
  16. 13
      magnum/drivers/k8s_fedora_ironic_v1/templates/kubeminion_software_configs.yaml
  17. 20
      magnum/tests/unit/conductor/handlers/test_k8s_cluster_conductor.py
  18. 12
      magnum/tests/unit/drivers/test_template_definition.py
  19. 8
      releasenotes/notes/bp-container-monitoring-d4bb1cbd0a4e44cc.yaml

73
doc/source/userguide.rst

@ -33,6 +33,7 @@ Contents
#. `Storage`_
#. `Image Management`_
#. `Notification`_
#. `Container Monitoring`_
===========
Terminology
@ -304,7 +305,11 @@ the table are linked to more details elsewhere in the user guide.
+---------------------------------------+--------------------+---------------+
| `admission_control_list`_ | see below | see below |
+---------------------------------------+--------------------+---------------+
| `prometheus_monitoring`_ | - true | false |
| | - false | |
+---------------------------------------+--------------------+---------------+
| `grafana_admin_passwd`_ | (any string) | "admin" |
+---------------------------------------+--------------------+---------------+
=======
Cluster
@ -2719,3 +2724,69 @@ created. This example can be applied for any ``create``, ``update`` or
"publisher_id": "magnum.host1234",
"timestamp": "2016-05-20 15:03:45.960280"
}
====================
Container Monitoring
====================
The offered monitoring stack relies on the following set of containers and
services:
- cAdvisor
- Node Exporter
- Prometheus
- Grafana
To set up this monitoring stack, users are given two configurable labels in
the Magnum cluster template's definition:
_`prometheus_monitoring`
This label accepts a boolean value. If *True*, the monitoring stack will be
set up. By default *prometheus_monitoring = False*.
_`grafana_admin_passwd`
This label lets users create their own *admin* user password for the Grafana
interface. It expects a string value. By default it is set to *admin*.
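For example, both labels can be set when creating a cluster template with the
Magnum CLI (a sketch; the image, keypair and network names below are
placeholders, and exact flag names may vary between python-magnumclient
versions):

```shell
# Hypothetical resource names; only the --labels argument matters here.
magnum cluster-template-create --name k8s-monitored \
    --image-id fedora-atomic-latest \
    --keypair-id testkey \
    --external-network-id public \
    --coe kubernetes \
    --labels prometheus_monitoring=True,grafana_admin_passwd=mysecret
```

Clusters built from this template will then run the full monitoring stack.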
Container Monitoring in Kubernetes
----------------------------------
By default, all Kubernetes clusters already contain *cAdvisor* integrated
with the *Kubelet* binary. Its container monitoring data can be accessed on
a per-node basis through *http://NODE_IP:4194*.
Node Exporter is part of the above-mentioned monitoring stack, as it can be
used to export machine metrics. This functionality also works at the node
level, which means that when `prometheus_monitoring`_ is *True*, the
Kubernetes nodes will be populated with an additional manifest under
*/etc/kubernetes/manifests*. Node Exporter is then automatically picked up
and launched as a regular Kubernetes pod.
To aggregate and complement all the existing monitoring metrics and add a
built-in visualization layer, Prometheus is used. It is launched by the
Kubernetes master node(s) as a *Service* within a *Deployment* with one
replica, and it relies on a *ConfigMap* where the Prometheus configuration
(prometheus.yml) is defined. This configuration uses Prometheus' native
support for service discovery in Kubernetes clusters,
*kubernetes_sd_configs*. The respective manifests can be found in
*/srv/kubernetes/monitoring/* on the master nodes, and once the service is
up and running, the Prometheus UI can be accessed on port 9090.
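Once the service reports running, a quick sanity check can be made against
Prometheus' standard HTTP API from any machine that can reach a node (the
node address below is a placeholder):

```shell
# Placeholder address; substitute a real Kubernetes node IP.
NODE_IP=10.0.0.5
# Ask Prometheus which scrape targets are currently up (1) or down (0).
curl -s "http://${NODE_IP}:9090/api/v1/query?query=up"
```

A healthy stack should list the cAdvisor, apiserver and node-exporter jobs
configured in the ConfigMap shown further below.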
Finally, for custom plotting and enhanced metric aggregation and
visualization, Prometheus can be integrated with Grafana, since Grafana
natively supports Prometheus data sources. Grafana is likewise deployed as
a *Service* within a *Deployment* with one replica. The default user is
*admin* and the password is set according to `grafana_admin_passwd`_.
A default Grafana dashboard is also provided with this installation,
taken from the official `Grafana dashboards' repository
<https://grafana.net/dashboards>`_. The Prometheus data
source is automatically added to Grafana once it is up and running, pointing
to *http://prometheus:9090* through *Proxy*. The respective manifests can
also be found in */srv/kubernetes/monitoring/* on the master nodes, and once
the service is running, the Grafana dashboards can be accessed on port
3000.
For both Prometheus and Grafana, there is a dedicated *systemd* service
called *kube-enable-monitoring*.
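Since *kube-enable-monitoring* is a oneshot unit, its progress can be
inspected on a master node with the usual systemd tooling (a sketch; run as
root on a cluster master):

```shell
systemctl status kube-enable-monitoring
journalctl -u kube-enable-monitoring --no-pager | tail -n 20
```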

139
magnum/drivers/common/templates/kubernetes/fragments/enable-monitoring.sh

@ -0,0 +1,139 @@
#!/bin/bash
. /etc/sysconfig/heat-params
if [ "$(echo $PROMETHEUS_MONITORING | tr '[:upper:]' '[:lower:]')" = "false" ]; then
exit 0
fi
function writeFile {
# $1 is filename
# $2 is file content
[ -f ${1} ] || {
echo "Writing File: $1"
mkdir -p $(dirname ${1})
cat << EOF > ${1}
$2
EOF
}
}
KUBE_MON_BIN=/usr/local/bin/kube-enable-monitoring
KUBE_MON_SERVICE=/etc/systemd/system/kube-enable-monitoring.service
GRAFANA_DEF_DASHBOARDS="/var/lib/grafana/dashboards"
GRAFANA_DEF_DASHBOARD_FILE=$GRAFANA_DEF_DASHBOARDS"/default.json"
# Write the binary for enable-monitoring
KUBE_MON_BIN_CONTENT='''#!/bin/sh
until curl -sf "http://127.0.0.1:8080/healthz"
do
echo "Waiting for Kubernetes API..."
sleep 5
done
# Check if all resources exist already before creating them
# Check if configmap Prometheus exists
kubectl get configmap prometheus -n kube-system
if [ "$?" != "0" ] && \
[ -f "/srv/kubernetes/monitoring/prometheusConfigMap.yaml" ]; then
kubectl create -f /srv/kubernetes/monitoring/prometheusConfigMap.yaml
fi
# Check if deployment and service Prometheus exist
kubectl get service prometheus -n kube-system | kubectl get deployment prometheus -n kube-system
if [ "${PIPESTATUS[0]}" != "0" ] && [ "${PIPESTATUS[1]}" != "0" ] && \
[ -f "/srv/kubernetes/monitoring/prometheusService.yaml" ]; then
kubectl create -f /srv/kubernetes/monitoring/prometheusService.yaml
fi
# Check if configmap graf-dash exists
kubectl get configmap graf-dash -n kube-system
if [ "$?" != "0" ] && \
[ -f '''$GRAFANA_DEF_DASHBOARD_FILE''' ]; then
kubectl create configmap graf-dash --from-file='''$GRAFANA_DEF_DASHBOARD_FILE''' -n kube-system
fi
# Check if deployment and service Grafana exist
kubectl get service grafana -n kube-system | kubectl get deployment grafana -n kube-system
if [ "${PIPESTATUS[0]}" != "0" ] && [ "${PIPESTATUS[1]}" != "0" ] && \
[ -f "/srv/kubernetes/monitoring/grafanaService.yaml" ]; then
kubectl create -f /srv/kubernetes/monitoring/grafanaService.yaml
fi
# Wait for Grafana pod and then inject data source
while true
do
echo "Waiting for Grafana pod to be up and Running"
if [ "$(kubectl get po -n kube-system -l name=grafana -o jsonpath={..phase})" = "Running" ]; then
break
fi
sleep 2
done
# Which node is running Grafana
NODE_IP=`kubectl get po -n kube-system -o jsonpath={.items[0].status.hostIP} -l name=grafana`
PROM_SERVICE_IP=`kubectl get svc prometheus --namespace kube-system -o jsonpath={..clusterIP}`
# The Grafana pod might be running but the app might still be initializing
echo "Check if Grafana is ready..."
curl --user admin:$ADMIN_PASSWD -X GET http://$NODE_IP:3000/api/datasources/1
until [ $? -eq 0 ]
do
sleep 2
curl --user admin:$ADMIN_PASSWD -X GET http://$NODE_IP:3000/api/datasources/1
done
# Inject Prometheus datasource into Grafana
while true
do
INJECT=`curl --user admin:$ADMIN_PASSWD -X POST \
-H "Content-Type: application/json;charset=UTF-8" \
--data-binary '''"'"'''{"name":"k8sPrometheus","isDefault":true,
"type":"prometheus","url":"http://'''"'"'''$PROM_SERVICE_IP'''"'"''':9090","access":"proxy"}'''"'"'''\
"http://$NODE_IP:3000/api/datasources/"`
if [[ "$INJECT" = *"Datasource added"* ]]; then
echo "Prometheus datasource injected into Grafana"
break
fi
echo "Trying to inject Prometheus datasource into Grafana - "$INJECT
done
'''
writeFile $KUBE_MON_BIN "$KUBE_MON_BIN_CONTENT"
# Write the monitoring service
KUBE_MON_SERVICE_CONTENT='''[Unit]
Requires=kubelet.service
[Service]
Type=oneshot
Environment=HOME=/root
EnvironmentFile=-/etc/kubernetes/config
ExecStart='''${KUBE_MON_BIN}'''
[Install]
WantedBy=multi-user.target
'''
writeFile $KUBE_MON_SERVICE "$KUBE_MON_SERVICE_CONTENT"
chown root:root ${KUBE_MON_BIN}
chmod 0755 ${KUBE_MON_BIN}
chown root:root ${KUBE_MON_SERVICE}
chmod 0644 ${KUBE_MON_SERVICE}
# Download the default JSON Grafana dashboard
# Not a crucial step, so allow it to fail
# TODO: this JSON should be passed into the minions as gzip in cloud-init
GRAFANA_DASHB_URL="https://grafana.net/api/dashboards/1621/revisions/1/download"
mkdir -p $GRAFANA_DEF_DASHBOARDS
curl $GRAFANA_DASHB_URL -o $GRAFANA_DEF_DASHBOARD_FILE || echo "Failed to fetch default Grafana dashboard"
if [ -f $GRAFANA_DEF_DASHBOARD_FILE ]; then
sed -i -- 's|${DS_PROMETHEUS}|k8sPrometheus|g' $GRAFANA_DEF_DASHBOARD_FILE
fi
# Launch the monitoring service
systemctl enable kube-enable-monitoring
systemctl start --no-block kube-enable-monitoring
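The existence checks in this script pipe one kubectl call into another purely
so that bash's PIPESTATUS array captures both exit codes from a single
pipeline. A minimal sketch of that pattern, with hypothetical stand-in
functions in place of the real kubectl calls:

```shell
#!/bin/bash
# Stand-ins for the two resource checks; the real script runs
# `kubectl get service ...` and `kubectl get deployment ...` here.
check_service() { false; }     # simulate: service does not exist
check_deployment() { false; }  # simulate: deployment does not exist

# Run both checks in one pipeline; PIPESTATUS then holds one exit
# code per pipeline stage (a bash-only feature).
check_service | check_deployment
if [ "${PIPESTATUS[0]}" != "0" ] && [ "${PIPESTATUS[1]}" != "0" ]; then
    echo "neither resource exists - would create both"
fi
```

Note that PIPESTATUS must be read immediately after the pipeline, since any
subsequent command overwrites it.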

27
magnum/drivers/common/templates/kubernetes/fragments/enable-node-exporter.sh

@ -0,0 +1,27 @@
#!/bin/sh
. /etc/sysconfig/heat-params
if [ "$(echo $PROMETHEUS_MONITORING | tr '[:upper:]' '[:lower:]')" = "false" ]; then
exit 0
fi
# Write node-exporter manifest as a regular pod
cat > /etc/kubernetes/manifests/node-exporter.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: node-exporter
namespace: kube-system
annotations:
prometheus.io/scrape: "true"
labels:
app: node-exporter
spec:
containers:
- name: node-exporter
image: prom/node-exporter
ports:
- containerPort: 9100
hostPort: 9100
EOF

67
magnum/drivers/common/templates/kubernetes/fragments/write-grafana-service.yaml

@ -0,0 +1,67 @@
#cloud-config
merge_how: dict(recurse_array)+list(append)
write_files:
- path: /srv/kubernetes/monitoring/grafanaService.yaml
owner: "root:root"
permissions: "0644"
content: |
apiVersion: v1
kind: Service
metadata:
labels:
name: node
role: service
name: grafana
namespace: kube-system
spec:
type: "NodePort"
ports:
- port: 3000
targetPort: 3000
nodePort: 30603
selector:
grafana: "true"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: grafana
namespace: kube-system
spec:
replicas: 1
template:
metadata:
labels:
name: grafana
grafana: "true"
role: db
spec:
containers:
- image: grafana/grafana
imagePullPolicy: Always
name: grafana
env:
- name: GF_SECURITY_ADMIN_PASSWORD
value: $ADMIN_PASSWD
- name: GF_DASHBOARDS_JSON_ENABLED
value: "true"
- name: GF_DASHBOARDS_JSON_PATH
value: /var/lib/grafana/dashboards
resources:
# keep request = limit to keep this container in guaranteed class
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: default-dashboard
mountPath: /var/lib/grafana/dashboards
ports:
- containerPort: 3000
hostPort: 3000
volumes:
- name: default-dashboard
configMap:
name: graf-dash

1
magnum/drivers/common/templates/kubernetes/fragments/write-heat-params-master.yaml

@ -5,6 +5,7 @@ write_files:
owner: "root:root"
permissions: "0600"
content: |
PROMETHEUS_MONITORING="$PROMETHEUS_MONITORING"
KUBE_API_PUBLIC_ADDRESS="$KUBE_API_PUBLIC_ADDRESS"
KUBE_API_PRIVATE_ADDRESS="$KUBE_API_PRIVATE_ADDRESS"
KUBE_API_PORT="$KUBE_API_PORT"

1
magnum/drivers/common/templates/kubernetes/fragments/write-heat-params.yaml

@ -5,6 +5,7 @@ write_files:
owner: "root:root"
permissions: "0600"
content: |
PROMETHEUS_MONITORING="$PROMETHEUS_MONITORING"
KUBE_ALLOW_PRIV="$KUBE_ALLOW_PRIV"
KUBE_MASTER_IP="$KUBE_MASTER_IP"
KUBE_API_PORT="$KUBE_API_PORT"

82
magnum/drivers/common/templates/kubernetes/fragments/write-prometheus-configmap.yaml

@ -0,0 +1,82 @@
#cloud-config
merge_how: dict(recurse_array)+list(append)
write_files:
- path: /srv/kubernetes/monitoring/prometheusConfigMap.yaml
owner: "root:root"
permissions: "0644"
content: |
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus
namespace: kube-system
data:
prometheus.yml: |
global:
scrape_interval: 10s
scrape_timeout: 10s
evaluation_interval: 10s
scrape_configs:
- job_name: 'kubernetes-nodes-cadvisor'
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- source_labels: [__meta_kubernetes_role]
action: replace
target_label: kubernetes_role
- source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:10255'
target_label: __address__
metric_relabel_configs:
- action: replace
source_labels: [id]
regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
target_label: rkt_container_name
replacement: '${2}-${1}'
- action: replace
source_labels: [id]
regex: '^/system\.slice/(.+)\.service$'
target_label: systemd_service_name
replacement: '${1}'
- job_name: 'kubernetes-apiserver-cadvisor'
tls_config:
insecure_skip_verify: true
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- source_labels: [__meta_kubernetes_role]
action: replace
target_label: kubernetes_role
- source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:10255'
target_label: __address__
- job_name: 'kubernetes-node-exporter'
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- source_labels: [__meta_kubernetes_role]
action: replace
target_label: kubernetes_role
- source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:9100'
target_label: __address__

60
magnum/drivers/common/templates/kubernetes/fragments/write-prometheus-service.yaml

@ -0,0 +1,60 @@
#cloud-config
merge_how: dict(recurse_array)+list(append)
write_files:
- path: /srv/kubernetes/monitoring/prometheusService.yaml
owner: "root:root"
permissions: "0644"
content: |
apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: 'true'
labels:
name: prometheus
name: prometheus
namespace: kube-system
spec:
selector:
app: prometheus
type: NodePort
ports:
- name: prometheus
protocol: TCP
port: 9090
nodePort: 30900
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: prometheus
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
name: prometheus
labels:
app: prometheus
spec:
containers:
- name: prometheus
image: prom/prometheus
args:
- '-storage.local.retention=6h'
- '-storage.local.memory-chunks=500000'
- '-config.file=/etc/prometheus/prometheus.yml'
ports:
- name: web
containerPort: 9090
hostPort: 9090
volumeMounts:
- name: config-volume
mountPath: /etc/prometheus
volumes:
- name: config-volume
configMap:
name: prometheus

4
magnum/drivers/heat/k8s_template_def.py

@ -109,7 +109,9 @@ class K8sTemplateDefinition(template_def.BaseTemplateDefinition):
'flannel_network_subnetlen',
'system_pods_initial_delay',
'system_pods_timeout',
'admission_control_list']
'admission_control_list',
'prometheus_monitoring',
'grafana_admin_passwd']
for label in label_list:
extra_params[label] = cluster_template.labels.get(label)

16
magnum/drivers/k8s_fedora_atomic_v1/templates/kubecluster.yaml

@ -40,6 +40,19 @@ parameters:
default: m1.small
description: flavor to use when booting the server for minions
prometheus_monitoring:
type: boolean
default: false
description: >
whether or not to have the grafana-prometheus-cadvisor monitoring setup
grafana_admin_passwd:
type: string
default: admin
hidden: true
description: >
admin user password for the Grafana monitoring interface
dns_nameserver:
type: string
description: address of a DNS nameserver reachable in your environment
@ -417,6 +430,8 @@ resources:
resource_def:
type: kubemaster.yaml
properties:
prometheus_monitoring: {get_param: prometheus_monitoring}
grafana_admin_passwd: {get_param: grafana_admin_passwd}
api_public_address: {get_attr: [api_lb, floating_address]}
api_private_address: {get_attr: [api_lb, address]}
ssh_key_name: {get_param: ssh_key_name}
@ -474,6 +489,7 @@ resources:
resource_def:
type: kubeminion.yaml
properties:
prometheus_monitoring: {get_param: prometheus_monitoring}
ssh_key_name: {get_param: ssh_key_name}
server_image: {get_param: server_image}
minion_flavor: {get_param: minion_flavor}

49
magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml

@ -105,6 +105,17 @@ parameters:
type: string
description: endpoint to retrieve TLS certs from
prometheus_monitoring:
type: boolean
description: >
whether or not to have prometheus and grafana deployed
grafana_admin_passwd:
type: string
hidden: true
description: >
admin user password for the Grafana monitoring interface
api_public_address:
type: string
description: Public IP address of the Kubernetes master server.
@ -238,6 +249,7 @@ resources:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-heat-params-master.yaml}
params:
"$PROMETHEUS_MONITORING": {get_param: prometheus_monitoring}
"$KUBE_API_PUBLIC_ADDRESS": {get_attr: [api_address_switch, public_ip]}
"$KUBE_API_PRIVATE_ADDRESS": {get_attr: [api_address_switch, private_ip]}
"$KUBE_API_PORT": {get_param: kubernetes_port}
@ -314,6 +326,39 @@ resources:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-network-config.sh}
write_prometheus_configmap:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-prometheus-configmap.yaml}
write_prometheus_service:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-prometheus-service.yaml}
write_grafana_service:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-grafana-service.yaml}
params:
"$ADMIN_PASSWD": {get_param: grafana_admin_passwd}
enable_monitoring:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/enable-monitoring.sh}
params:
"$ADMIN_PASSWD": {get_param: grafana_admin_passwd}
network_config_service:
type: OS::Heat::SoftwareConfig
properties:
@ -394,6 +439,9 @@ resources:
- config: {get_resource: add_proxy}
- config: {get_resource: enable_services}
- config: {get_resource: write_network_config}
- config: {get_resource: write_prometheus_configmap}
- config: {get_resource: write_prometheus_service}
- config: {get_resource: write_grafana_service}
- config: {get_resource: network_config_service}
- config: {get_resource: network_service}
- config: {get_resource: kube_system_namespace_service}
@ -401,6 +449,7 @@ resources:
- config: {get_resource: enable_kube_proxy}
- config: {get_resource: kube_ui_service}
- config: {get_resource: kube_examples}
- config: {get_resource: enable_monitoring}
- config: {get_resource: master_wc_notify}
######################################################################

13
magnum/drivers/k8s_fedora_atomic_v1/templates/kubeminion.yaml

@ -61,6 +61,11 @@ parameters:
type: string
description: endpoint to retrieve TLS certs from
prometheus_monitoring:
type: boolean
description: >
whether or not to have the node-exporter running on the node
kube_master_ip:
type: string
description: IP address of the Kubernetes master server.
@ -220,6 +225,7 @@ resources:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-heat-params.yaml}
params:
$PROMETHEUS_MONITORING: {get_param: prometheus_monitoring}
$KUBE_ALLOW_PRIV: {get_param: kube_allow_priv}
$KUBE_MASTER_IP: {get_param: kube_master_ip}
$KUBE_API_PORT: {get_param: kubernetes_port}
@ -321,6 +327,12 @@ resources:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/enable-kube-proxy-minion.sh}
enable_node_exporter:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/enable-node-exporter.sh}
minion_wc_notify:
type: OS::Heat::SoftwareConfig
properties:
@ -361,6 +373,7 @@ resources:
- config: {get_resource: add_proxy}
- config: {get_resource: enable_services}
- config: {get_resource: enable_kube_proxy}
- config: {get_resource: enable_node_exporter}
- config: {get_resource: enable_docker_registry}
- config: {get_resource: minion_wc_notify}

2079
magnum/drivers/k8s_fedora_atomic_v1/tools/grafana-prometheus-dashboard.json

File diff suppressed because it is too large

16
magnum/drivers/k8s_fedora_ironic_v1/templates/kubecluster.yaml

@ -43,6 +43,19 @@ parameters:
default: baremetal
description: flavor to use when booting the server
prometheus_monitoring:
type: boolean
default: false
description: >
whether or not to have the grafana-prometheus-cadvisor monitoring setup
grafana_admin_passwd:
type: string
default: admin
hidden: true
description: >
admin user password for the Grafana monitoring interface
dns_nameserver:
type: string
description: address of a dns nameserver reachable in your environment
@ -405,6 +418,8 @@ resources:
resource_def:
type: kubemaster.yaml
properties:
prometheus_monitoring: {get_param: prometheus_monitoring}
grafana_admin_passwd: {get_param: grafana_admin_passwd}
api_public_address: {get_attr: [api_lb, floating_address]}
api_private_address: {get_attr: [api_lb, address]}
ssh_key_name: {get_param: ssh_key_name}
@ -491,6 +506,7 @@ resources:
kubeminion_software_configs:
type: kubeminion_software_configs.yaml
properties:
prometheus_monitoring: {get_param: prometheus_monitoring}
network_driver: {get_param: network_driver}
kube_master_ip: {get_attr: [api_address_lb_switch, private_ip]}
etcd_server_ip: {get_attr: [etcd_address_lb_switch, private_ip]}

49
magnum/drivers/k8s_fedora_ironic_v1/templates/kubemaster.yaml

@ -105,6 +105,17 @@ parameters:
type: string
description: endpoint to retrieve TLS certs from
prometheus_monitoring:
type: boolean
description: >
whether or not to have prometheus and grafana deployed
grafana_admin_passwd:
type: string
hidden: true
description: >
admin user password for the Grafana monitoring interface
api_public_address:
type: string
description: Public IP address of the Kubernetes master server.
@ -232,6 +243,7 @@ resources:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-heat-params-master.yaml}
params:
"$PROMETHEUS_MONITORING": {get_param: prometheus_monitoring}
"$KUBE_API_PUBLIC_ADDRESS": {get_attr: [api_address_switch, public_ip]}
"$KUBE_API_PRIVATE_ADDRESS": {get_attr: [api_address_switch, private_ip]}
"$KUBE_API_PORT": {get_param: kubernetes_port}
@ -307,6 +319,39 @@ resources:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-network-config.sh}
write_prometheus_configmap:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-prometheus-configmap.yaml}
write_prometheus_service:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/write-prometheus-service.yaml}
write_grafana_service:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-grafana-service.yaml}
params:
"$ADMIN_PASSWD": {get_param: grafana_admin_passwd}
enable_monitoring:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/enable-monitoring.sh}
params:
"$ADMIN_PASSWD": {get_param: grafana_admin_passwd}
network_config_service:
type: OS::Heat::SoftwareConfig
properties:
@ -387,6 +432,9 @@ resources:
- config: {get_resource: add_proxy}
- config: {get_resource: enable_services}
- config: {get_resource: write_network_config}
- config: {get_resource: write_prometheus_configmap}
- config: {get_resource: write_prometheus_service}
- config: {get_resource: write_grafana_service}
- config: {get_resource: network_config_service}
- config: {get_resource: network_service}
- config: {get_resource: kube_system_namespace_service}
@ -394,6 +442,7 @@ resources:
- config: {get_resource: enable_kube_proxy}
- config: {get_resource: kube_ui_service}
- config: {get_resource: kube_examples}
- config: {get_resource: enable_monitoring}
- config: {get_resource: master_wc_notify}
######################################################################

13
magnum/drivers/k8s_fedora_ironic_v1/templates/kubeminion_software_configs.yaml

@ -43,6 +43,11 @@ parameters:
type: string
description: endpoint to retrieve TLS certs from
prometheus_monitoring:
type: boolean
description: >
whether or not to have the node-exporter running on the node
kube_master_ip:
type: string
description: IP address of the Kubernetes master server.
@ -176,6 +181,7 @@ resources:
str_replace:
template: {get_file: ../../common/templates/kubernetes/fragments/write-heat-params.yaml}
params:
$PROMETHEUS_MONITORING: {get_param: prometheus_monitoring}
$KUBE_ALLOW_PRIV: {get_param: kube_allow_priv}
$KUBE_MASTER_IP: {get_param: kube_master_ip}
$KUBE_API_PORT: {get_param: kubernetes_port}
@ -276,6 +282,12 @@ resources:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/enable-kube-proxy-minion.sh}
enable_node_exporter:
type: OS::Heat::SoftwareConfig
properties:
group: ungrouped
config: {get_file: ../../common/templates/kubernetes/fragments/enable-node-exporter.sh}
minion_wc_notify:
type: OS::Heat::SoftwareConfig
properties:
@ -316,6 +328,7 @@ resources:
- config: {get_resource: add_proxy}
- config: {get_resource: enable_services}
- config: {get_resource: enable_kube_proxy}
- config: {get_resource: enable_node_exporter}
- config: {get_resource: enable_docker_registry}
- config: {get_resource: minion_wc_notify}

20
magnum/tests/unit/conductor/handlers/test_k8s_cluster_conductor.py

@ -51,7 +51,9 @@ class TestClusterConductorWithK8s(base.TestCase):
'flannel_backend': 'vxlan',
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list'},
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd'},
'tls_disabled': False,
'server_type': 'vm',
'registry_enabled': False,
@ -149,7 +151,9 @@ class TestClusterConductorWithK8s(base.TestCase):
'flannel_backend': 'vxlan',
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list'},
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd'},
'http_proxy': 'http_proxy',
'https_proxy': 'https_proxy',
'no_proxy': 'no_proxy',
@ -180,6 +184,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'http_proxy': 'http_proxy',
'https_proxy': 'https_proxy',
'no_proxy': 'no_proxy',
@ -261,6 +267,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'http_proxy': 'http_proxy',
'https_proxy': 'https_proxy',
'magnum_url': 'http://127.0.0.1:9511/v1',
@ -344,6 +352,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'insecure_registry_url': '10.0.0.1:5000',
'kube_version': 'fake-version',
'magnum_url': 'http://127.0.0.1:9511/v1',
@ -419,6 +429,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'tls_disabled': False,
'registry_enabled': False,
'trustee_domain_id': self.mock_keystone.trustee_domain_id,
@ -486,6 +498,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'tls_disabled': False,
'registry_enabled': False,
'trustee_domain_id': self.mock_keystone.trustee_domain_id,
@ -679,6 +693,8 @@ class TestClusterConductorWithK8s(base.TestCase):
'system_pods_initial_delay': '15',
'system_pods_timeout': '1',
'admission_control_list': 'fake_list',
'prometheus_monitoring': 'False',
'grafana_admin_passwd': 'fake_pwd',
'tenant_name': 'fake_tenant',
'username': 'fake_user',
'cluster_uuid': self.cluster_dict['uuid'],

12
magnum/tests/unit/drivers/test_template_definition.py

@ -260,6 +260,10 @@ class AtomicK8sTemplateDefinitionTestCase(BaseTemplateDefinitionTestCase):
'system_pods_timeout')
admission_control_list = mock_cluster_template.labels.get(
'admission_control_list')
prometheus_monitoring = mock_cluster_template.labels.get(
'prometheus_monitoring')
grafana_admin_passwd = mock_cluster_template.labels.get(
'grafana_admin_passwd')
k8s_def = k8sa_tdef.AtomicK8sTemplateDefinition()
@ -275,6 +279,8 @@ class AtomicK8sTemplateDefinitionTestCase(BaseTemplateDefinitionTestCase):
'system_pods_initial_delay': system_pods_initial_delay,
'system_pods_timeout': system_pods_timeout,
'admission_control_list': admission_control_list,
'prometheus_monitoring': prometheus_monitoring,
'grafana_admin_passwd': grafana_admin_passwd,
'username': 'fake_user',
'tenant_name': 'fake_tenant',
'magnum_url': mock_osc.magnum_url.return_value,
@ -325,6 +331,10 @@ class AtomicK8sTemplateDefinitionTestCase(BaseTemplateDefinitionTestCase):
'system_pods_timeout')
admission_control_list = mock_cluster_template.labels.get(
'admission_control_list')
prometheus_monitoring = mock_cluster_template.labels.get(
'prometheus_monitoring')
grafana_admin_passwd = mock_cluster_template.labels.get(
'grafana_admin_passwd')
k8s_def = k8sa_tdef.AtomicK8sTemplateDefinition()
@ -340,6 +350,8 @@ class AtomicK8sTemplateDefinitionTestCase(BaseTemplateDefinitionTestCase):
'system_pods_initial_delay': system_pods_initial_delay,
'system_pods_timeout': system_pods_timeout,
'admission_control_list': admission_control_list,
'prometheus_monitoring': prometheus_monitoring,
'grafana_admin_passwd': grafana_admin_passwd,
'username': 'fake_user',
'tenant_name': 'fake_tenant',
'magnum_url': mock_osc.magnum_url.return_value,

8
releasenotes/notes/bp-container-monitoring-d4bb1cbd0a4e44cc.yaml

@ -0,0 +1,8 @@
---
features:
- |
Includes a monitoring stack based on cAdvisor, node-exporter, Prometheus
and Grafana. Users can enable this stack through the label
prometheus_monitoring. Prometheus scrapes metrics from the Kubernetes
cluster and then serves them to Grafana through Grafana's Prometheus
data source. Upon completion, a default Grafana dashboard is provided.