[k8s] Update prometheus monitoring helm based configuration

* prometheus-operator chart version upgraded from 0.1.31 to 5.12.3
* Fix an issue where, with the Priority feature gate enabled, the
scheduler would evict the prometheus monitoring node-exporter pods
* Fix an issue where intensive CPU utilization would make the
metrics fail intermittently or completely
* Prometheus resources are now calculated based on the MAX_NODE_COUNT
requested
* Change the sampling rate from the standard 30s to 1 minute
* Add the missing tiller CONTAINER_INFRA_PREFIX variable to the ConfigMap
* Add label prometheus_operator_chart_tag to enable the user to
specify the stable/prometheus-operator chart to use
* Fix breaking changes on CoreDNS metrics introduced by
8fb27da2fc
* Fix Grafana dashboard not showing data
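
The sizing rule behind the MAX_NODE_COUNT item above can be sketched as follows. This is a minimal illustration that assumes the constants used in the script changed by this commit (128m CPU plus 7m per node, 256M RAM plus 40M per node). The helper name prometheus_requests is hypothetical and does not exist in the change.

```shell
# Sketch only: prometheus_requests is a hypothetical helper, not part of the commit.
# It mirrors the arithmetic of PROMETHEUS_SERVER_CPU and PROMETHEUS_SERVER_RAM.
prometheus_requests() {
    max_node_count="$1"
    # Base request plus a fixed increment per node (millicores and megabytes).
    cpu=$((128 + 7 * max_node_count))
    ram=$((256 + 40 * max_node_count))
    printf 'cpu: %sm\nmemory: %sM\n' "$cpu" "$ram"
}

# Example: a cluster allowed to grow to 10 nodes.
prometheus_requests 10
```

For a maximum of 10 nodes this prints a request of 198m CPU and 656M memory.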


Change-Id: If42873cd6668c07e4e911e4eef5e4ae2232be66f
Task: 30777
Task: 30779
Story: 2005588
Signed-off-by: Diogo Guerra <dy090.guerra@gmail.com>
Diogo Guerra 2019-04-30 15:17:51 +02:00
parent 3217e75b63
commit 87b1b703ea
8 changed files with 93 additions and 24 deletions


@@ -309,6 +309,8 @@ the table are linked to more details elsewhere in the user guide.
| `monitoring_enabled`_ | - true | false | | `monitoring_enabled`_ | - true | false |
| | - false | | | | - false | |
+---------------------------------------+--------------------+---------------+ +---------------------------------------+--------------------+---------------+
| `prometheus_operator_chart_tag`_ | see below | see below |
+---------------------------------------+--------------------+---------------+
| `swarm_strategy`_ | - spread | spread | | `swarm_strategy`_ | - spread | spread |
| | - binpack | | | | - binpack | |
| | - random | | | | - random | |
@@ -1142,10 +1144,10 @@ _`container_infra_prefix`
* gcr.io/google_containers/kubernetes-dashboard-amd64:v1.5.1 * gcr.io/google_containers/kubernetes-dashboard-amd64:v1.5.1
* gcr.io/google-containers/hyperkube:v1.12.1 * gcr.io/google-containers/hyperkube:v1.12.1
* quay.io/coreos/configmap-reload:v0.0.1 * quay.io/coreos/configmap-reload:v0.0.1
* quay.io/coreos/prometheus-config-reloader:v0.26.0 * quay.io/coreos/prometheus-config-reloader:v0.30.1
* quay.io/coreos/prometheus-operator:v0.15.3 * quay.io/coreos/prometheus-operator:v0.30.1
* quay.io/prometheus/alertmanager:v0.15.3 * quay.io/prometheus/alertmanager:v0.17.0
* quay.io/prometheus/prometheus:v2.5.0 * quay.io/prometheus/prometheus:v2.9.1
* k8s.gcr.io/node-problem-detector:v0.6.2 * k8s.gcr.io/node-problem-detector:v0.6.2
* docker.io/planetlabs/draino:abf028a * docker.io/planetlabs/draino:abf028a
* docker.io/openstackmagnum/cluster-autoscaler:v1.0 * docker.io/openstackmagnum/cluster-autoscaler:v1.0
@@ -1274,6 +1276,13 @@ _`monitoring_enabled`
stable/prometheus-operator helm chart. stable/prometheus-operator helm chart.
Default: false Default: false
_`prometheus_operator_chart_tag`
Set prometheus_operator_chart_tag to select the version of the
stable/prometheus-operator chart to install. When installing the chart,
helm uses the default values of the selected chart version and overrides
them with the values in the prometheus-operator-config ConfigMap. You must
ensure that the chart version and those values are compatible.
_`tiller_enabled` _`tiller_enabled`
If set to true, tiller will be deployed in the kube-system namespace. If set to true, tiller will be deployed in the kube-system namespace.
Defaults to false. Defaults to false.
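
The fallback behind prometheus_operator_chart_tag can be sketched with plain shell parameter expansion. This is a minimal illustration, assuming the label reaches the script as the PROMETHEUS_OPERATOR_CHART_TAG environment variable and that 5.12.3 is the default, as in this change. The function name chart_version and the override value 1.2.3 are placeholders, not recommendations.

```shell
# Sketch only: chart_version is a hypothetical helper, not part of the commit.
chart_version() {
    # ${VAR:-default} expands to the default when VAR is unset or empty.
    printf '%s\n' "${PROMETHEUS_OPERATOR_CHART_TAG:-5.12.3}"
}

# Without the label, the default chart version is used.
unset PROMETHEUS_OPERATOR_CHART_TAG
chart_version

# With the label set, its value wins (1.2.3 is a placeholder value).
PROMETHEUS_OPERATOR_CHART_TAG="1.2.3"
chart_version
```

Because the values file is written for the default chart version, an overridden tag may need matching values. The sketch does not check that.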


@@ -40,6 +40,7 @@ HEAT_PARAMS=/etc/sysconfig/heat-params
CLUSTER_UUID="$CLUSTER_UUID" CLUSTER_UUID="$CLUSTER_UUID"
MAGNUM_URL="$MAGNUM_URL" MAGNUM_URL="$MAGNUM_URL"
MONITORING_ENABLED="$MONITORING_ENABLED" MONITORING_ENABLED="$MONITORING_ENABLED"
PROMETHEUS_OPERATOR_CHART_TAG="$PROMETHEUS_OPERATOR_CHART_TAG"
VOLUME_DRIVER="$VOLUME_DRIVER" VOLUME_DRIVER="$VOLUME_DRIVER"
REGION_NAME="$REGION_NAME" REGION_NAME="$REGION_NAME"
HTTP_PROXY="$HTTP_PROXY" HTTP_PROXY="$HTTP_PROXY"


@@ -10,10 +10,16 @@ printf "Starting to run ${step}\n"
### Configuration ### Configuration
############################################################################### ###############################################################################
CHART_NAME="prometheus-operator" CHART_NAME="prometheus-operator"
CHART_VERSION="0.1.31" CHART_VERSION=${PROMETHEUS_OPERATOR_CHART_TAG:-5.12.3}
if [ "$(echo ${MONITORING_ENABLED} | tr '[:upper:]' '[:lower:]')" = "true" ]; then if [ "$(echo ${MONITORING_ENABLED} | tr '[:upper:]' '[:lower:]')" = "true" ]; then
# Calculate the resources needed to run the Prometheus monitoring solution.
# Size by MAX_NODE_COUNT so metrics remain available when the cluster scales.
PROMETHEUS_SERVER_CPU=$(expr 128 + 7 \* ${MAX_NODE_COUNT} )
PROMETHEUS_SERVER_RAM=$(expr 256 + 40 \* ${MAX_NODE_COUNT})
# Validate if communication node <-> master is secure or insecure # Validate if communication node <-> master is secure or insecure
PROTOCOL="https" PROTOCOL="https"
INSECURE_SKIP_VERIFY="False" INSECURE_SKIP_VERIFY="False"
@@ -53,11 +59,12 @@ data:
done done
helm repo update helm repo update
if [[ \$(helm history prometheus-operator | grep prometheus-operator) ]]; then if [[ \$(helm history ${CHART_NAME} | grep ${CHART_NAME}) ]]; then
echo "${CHART_NAME} already installed on server. Continue..." echo "${CHART_NAME} already installed on server. Continue..."
exit 0 exit 0
else else
helm install stable/${CHART_NAME} --namespace monitoring --name ${CHART_NAME} --version v${CHART_VERSION} --values /opt/magnum/install-${CHART_NAME}-values.yaml # TODO: Set namespace to monitoring. This is needed as the Kubernetes default priorityClass can only be used in NS kube-system
helm install stable/${CHART_NAME} --namespace kube-system --name ${CHART_NAME} --version v${CHART_VERSION} --values /opt/magnum/install-${CHART_NAME}-values.yaml
fi fi
install-${CHART_NAME}-values.yaml: | install-${CHART_NAME}-values.yaml: |
@@ -67,11 +74,22 @@ data:
alertmanager: alertmanager:
alertmanagerSpec: alertmanagerSpec:
image: image:
repository: ${CONTAINER_INFRA_PREFIX:-quay.io/}prometheus/alertmanager repository: ${CONTAINER_INFRA_PREFIX:-quay.io/prometheus/}alertmanager
# # Needs testing
# resources:
# requests:
# cpu: 100m
# memory: 256Mi
priorityClassName: "system-cluster-critical"
# Dashboard # Dashboard
grafana: grafana:
#enabled: ${ENABLE_GRAFANA} #enabled: ${ENABLE_GRAFANA}
resources:
requests:
cpu: 100m
memory: 128Mi
adminPassword: ${ADMIN_PASSWD} adminPassword: ${ADMIN_PASSWD}
kubeApiServer: kubeApiServer:
@@ -88,37 +106,59 @@ data:
port: 9153 port: 9153
targetPort: 9153 targetPort: 9153
selector: selector:
k8s-app: coredns k8s-app: kube-dns
kubeEtcd: kubeEtcd:
service:
port: 4001
targetPort: 4001
selector:
k8s-app: etcd-server
serviceMonitor: serviceMonitor:
scheme: ${PROTOCOL} scheme: ${PROTOCOL}
insecureSkipVerify: ${INSECURE_SKIP_VERIFY} insecureSkipVerify: true
## If Protocol is http this files should be neglected ## If Protocol is http this files should be neglected
caFile: ${CERT_DIR}/ca.crt caFile: /etc/prometheus/secrets/etcd-certificates/ca.crt
certFile: ${CERT_DIR}/kubelet.crt certFile: /etc/prometheus/secrets/etcd-certificates/kubelet.crt
keyFile: ${CERT_DIR}/kubelet.key keyFile: /etc/prometheus/secrets/etcd-certificates/kubelet.key
kube-state-metrics:
priorityClassName: "system-cluster-critical"
resources:
#Guaranteed
limits:
cpu: 50m
memory: 64M
prometheus-node-exporter:
priorityClassName: "system-node-critical"
resources:
#Guaranteed
limits:
cpu: 20m
memory: 20M
prometheusOperator: prometheusOperator:
priorityClassName: "system-cluster-critical"
image: image:
repository: ${CONTAINER_INFRA_PREFIX:-quay.io/}coreos/prometheus-operator repository: ${CONTAINER_INFRA_PREFIX:-quay.io/coreos/}prometheus-operator
configmapReloadImage: configmapReloadImage:
repository: ${CONTAINER_INFRA_PREFIX:-quay.io/}coreos/configmap-reload repository: ${CONTAINER_INFRA_PREFIX:-quay.io/coreos/}configmap-reload
prometheusConfigReloaderImage: prometheusConfigReloaderImage:
repository: ${CONTAINER_INFRA_PREFIX:-quay.io/}coreos/prometheus-config-reloader repository: ${CONTAINER_INFRA_PREFIX:-quay.io/coreos/}prometheus-config-reloader
hyperkubeImage: hyperkubeImage:
repository: ${CONTAINER_INFRA_PREFIX:-gcr.io/google-containers/}hyperkube repository: ${CONTAINER_INFRA_PREFIX:-gcr.io/google-containers/}hyperkube
prometheus: prometheus:
prometheusSpec: prometheusSpec:
scrapeInterval: 30s
evaluationInterval: 30s
image: image:
repository: ${CONTAINER_INFRA_PREFIX:-quay.io/}prometheus/prometheus repository: ${CONTAINER_INFRA_PREFIX:-quay.io/prometheus/}prometheus
retention: 14d retention: 14d
resources:
requests:
cpu: ${PROMETHEUS_SERVER_CPU}m
memory: ${PROMETHEUS_SERVER_RAM}M
# secrets:
# - etcd-certificates
priorityClassName: "system-cluster-critical"
--- ---
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
@@ -132,7 +172,7 @@ spec:
serviceAccountName: tiller serviceAccountName: tiller
containers: containers:
- name: config-helm - name: config-helm
image: docker.io/openstackmagnum/helm-client:dev image: ${CONTAINER_INFRA_PREFIX:-docker.io/openstackmagnum/}helm-client:dev
command: command:
- bash - bash
args: args:
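
The effect of moving the organisation path into the default of each image repository can be sketched as below. This is a minimal illustration, assuming CONTAINER_INFRA_PREFIX replaces the whole default prefix, as the expansions in this diff suggest. The helper image_repository and the registry registry.example.com/magnum/ are hypothetical.

```shell
# Sketch only: image_repository is a hypothetical helper, not part of the commit.
image_repository() {
    default_prefix="$1"
    image="$2"
    # The prefix, when set, replaces the registry AND the organisation path.
    printf '%s%s\n' "${CONTAINER_INFRA_PREFIX:-$default_prefix}" "$image"
}

# No prefix: the public default is used.
unset CONTAINER_INFRA_PREFIX
image_repository "quay.io/prometheus/" "alertmanager"

# Private registry: images are expected directly under the prefix.
CONTAINER_INFRA_PREFIX="registry.example.com/magnum/"
image_repository "quay.io/prometheus/" "alertmanager"
```

The first call prints quay.io/prometheus/alertmanager and the second prints registry.example.com/magnum/alertmanager, so a mirror only needs a flat layout of image names.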


@@ -132,6 +132,7 @@ class K8sFedoraTemplateDefinition(k8s_template_def.K8sTemplateDefinition):
'heat_container_agent_tag', 'heat_container_agent_tag',
'keystone_auth_enabled', 'k8s_keystone_auth_tag', 'keystone_auth_enabled', 'k8s_keystone_auth_tag',
'monitoring_enabled', 'monitoring_enabled',
'prometheus_operator_chart_tag',
'tiller_enabled', 'tiller_enabled',
'tiller_tag', 'tiller_tag',
'tiller_namespace', 'tiller_namespace',


@@ -577,6 +577,11 @@ parameters:
description: Enable or disable prometheus-operator monitoring solution. description: Enable or disable prometheus-operator monitoring solution.
default: false default: false
prometheus_operator_chart_tag:
type: string
description: The stable/prometheus-operator chart version to use.
default: 5.12.3
project_id: project_id:
type: string type: string
description: > description: >
@@ -929,6 +934,7 @@ resources:
keystone_auth_enabled: {get_param: keystone_auth_enabled} keystone_auth_enabled: {get_param: keystone_auth_enabled}
k8s_keystone_auth_tag: {get_param: k8s_keystone_auth_tag} k8s_keystone_auth_tag: {get_param: k8s_keystone_auth_tag}
monitoring_enabled: {get_param: monitoring_enabled} monitoring_enabled: {get_param: monitoring_enabled}
prometheus_operator_chart_tag: {get_param: prometheus_operator_chart_tag}
project_id: {get_param: project_id} project_id: {get_param: project_id}
tiller_enabled: {get_param: tiller_enabled} tiller_enabled: {get_param: tiller_enabled}
tiller_tag: {get_param: tiller_tag} tiller_tag: {get_param: tiller_tag}


@@ -430,6 +430,11 @@ parameters:
description: Enable or disable prometheus-operator monitoring solution. description: Enable or disable prometheus-operator monitoring solution.
default: false default: false
prometheus_operator_chart_tag:
type: string
description: The stable/prometheus-operator chart version to use.
default: 5.12.3
project_id: project_id:
type: string type: string
description: > description: >
@@ -613,6 +618,7 @@ resources:
"$KEYSTONE_AUTH_ENABLED": {get_param: keystone_auth_enabled} "$KEYSTONE_AUTH_ENABLED": {get_param: keystone_auth_enabled}
"$K8S_KEYSTONE_AUTH_TAG": {get_param: k8s_keystone_auth_tag} "$K8S_KEYSTONE_AUTH_TAG": {get_param: k8s_keystone_auth_tag}
"$MONITORING_ENABLED": {get_param: monitoring_enabled} "$MONITORING_ENABLED": {get_param: monitoring_enabled}
"$PROMETHEUS_OPERATOR_CHART_TAG": {get_param: prometheus_operator_chart_tag}
"$PROJECT_ID": {get_param: project_id} "$PROJECT_ID": {get_param: project_id}
"$EXTERNAL_NETWORK_ID": {get_param: external_network} "$EXTERNAL_NETWORK_ID": {get_param: external_network}
"$TILLER_ENABLED": {get_param: tiller_enabled} "$TILLER_ENABLED": {get_param: tiller_enabled}


@@ -510,6 +510,8 @@ class AtomicK8sTemplateDefinitionTestCase(BaseK8sTemplateDefinitionTestCase):
'k8s_keystone_auth_tag') 'k8s_keystone_auth_tag')
monitoring_enabled = mock_cluster.labels.get( monitoring_enabled = mock_cluster.labels.get(
'monitoring_enabled') 'monitoring_enabled')
prometheus_operator_chart_tag = mock_cluster.labels.get(
'prometheus_operator_chart_tag')
project_id = mock_cluster.project_id project_id = mock_cluster.project_id
tiller_enabled = mock_cluster.labels.get( tiller_enabled = mock_cluster.labels.get(
'tiller_enabled') 'tiller_enabled')
@@ -589,6 +591,7 @@ class AtomicK8sTemplateDefinitionTestCase(BaseK8sTemplateDefinitionTestCase):
'keystone_auth_enabled': keystone_auth_enabled, 'keystone_auth_enabled': keystone_auth_enabled,
'k8s_keystone_auth_tag': k8s_keystone_auth_tag, 'k8s_keystone_auth_tag': k8s_keystone_auth_tag,
'monitoring_enabled': monitoring_enabled, 'monitoring_enabled': monitoring_enabled,
'prometheus_operator_chart_tag': prometheus_operator_chart_tag,
'project_id': project_id, 'project_id': project_id,
'external_network': external_network_id, 'external_network': external_network_id,
'tiller_enabled': tiller_enabled, 'tiller_enabled': tiller_enabled,
@@ -912,6 +915,8 @@ class AtomicK8sTemplateDefinitionTestCase(BaseK8sTemplateDefinitionTestCase):
'k8s_keystone_auth_tag') 'k8s_keystone_auth_tag')
monitoring_enabled = mock_cluster.labels.get( monitoring_enabled = mock_cluster.labels.get(
'monitoring_enabled') 'monitoring_enabled')
prometheus_operator_chart_tag = mock_cluster.labels.get(
'prometheus_operator_chart_tag')
project_id = mock_cluster.project_id project_id = mock_cluster.project_id
tiller_enabled = mock_cluster.labels.get( tiller_enabled = mock_cluster.labels.get(
'tiller_enabled') 'tiller_enabled')
@@ -993,6 +998,7 @@ class AtomicK8sTemplateDefinitionTestCase(BaseK8sTemplateDefinitionTestCase):
'keystone_auth_enabled': keystone_auth_enabled, 'keystone_auth_enabled': keystone_auth_enabled,
'k8s_keystone_auth_tag': k8s_keystone_auth_tag, 'k8s_keystone_auth_tag': k8s_keystone_auth_tag,
'monitoring_enabled': monitoring_enabled, 'monitoring_enabled': monitoring_enabled,
'prometheus_operator_chart_tag': prometheus_operator_chart_tag,
'project_id': project_id, 'project_id': project_id,
'external_network': external_network_id, 'external_network': external_network_id,
'tiller_enabled': tiller_enabled, 'tiller_enabled': tiller_enabled,


@@ -5,4 +5,4 @@ features:
solution by means of helm stable/prometheus-operator public chart. solution by means of helm stable/prometheus-operator public chart.
Defaults to false. grafana_admin_passwd label can be used to set Defaults to false. grafana_admin_passwd label can be used to set
grafana dashboard admin access password. If grafana_admin_passwd grafana dashboard admin access password. If grafana_admin_passwd
is not set the password defaults to prom_operator. is not set the password defaults to admin.