Support PromQL config file for PrometheusPlugin
The PromQL statements used by PrometheusPlugin can be customized with an
external data file, so operators can use their own PromQL statements.

Implements: blueprint support-auto-lcm
Change-Id: Ie84eef8098feabaf4a82a33610248dcae5e205c0
parent aac03ceffc
commit 5ab59f7edb
@@ -238,6 +238,7 @@ function configure_tacker {
    cd -

    cp $TACKER_DIR/etc/tacker/tacker.conf.sample $TACKER_CONF
    cp $TACKER_DIR/etc/tacker/prometheus-plugin.yaml $TACKER_CONF_DIR/prometheus-plugin.yaml

    iniset_rpc_backend tacker $TACKER_CONF

@@ -321,6 +321,11 @@ Tacker Zed release
- Prometheus: 2.37
- Alertmanager: 0.24

Tacker Antelope release

- Prometheus: 2.37
- Alertmanager: 0.25

Alert rule registration
~~~~~~~~~~~~~~~~~~~~~~~

@@ -373,7 +378,7 @@ at "metadata" field.
With the parameter, pod name can be specified but container name can not.
And some prometheus metrics need container name. Therefore, ``max``
statement of PromQL is alternatively used in some measurements to
measure without container name. That means it provids only most
measure without container name. That means it provides only most
impacted value among the containers. For example:

``avg(max(container_fs_usage_bytes{pod=~"pod name"} /
@@ -448,6 +453,107 @@ rule file directly. Below is example of alert rule.
vnfc_info_id: VDU1-85adebfa-d71c-49ab-9d39-d8dd7e393541
annotations:

External data file
~~~~~~~~~~~~~~~~~~

The PromQL statements used for Performance Management can be customized
with an external data file, which lets operators use their own PromQL
statements.

The external data file contains the PromQL statement configuration for
Performance Management. A template of the file is located at
``etc/tacker/prometheus-plugin.yaml`` in the tacker project source
directory. Edit the file as needed and put it in the configuration
directory (e.g. ``/etc/tacker``).

Default configuration file
--------------------------

Normally, the default external data file is deployed automatically during
installation. If you need to deploy the file manually, run the following
command in the top directory of the tacker project.

.. code-block:: console

  sudo python3 ./setup.py install

Data format
-----------

The file is written in YAML format [#yaml]_.

Root configuration
------------------

The configuration consists of a PromQL config for the PM Job API and a
PromQL config for the Threshold API. PM jobs and thresholds are defined in
ETSI GS NFV-SOL 003 [#etsi_sol_003]_.

.. code-block:: yaml

  # PromQL config for PM Job API
  PMJob:
    PromQL: <PromQLConfig>
  # PromQL config for Threshold API
  Threshold:
    PromQL: <PromQLConfig>
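
As a rough illustration of how this root structure is consumed, the sketch below looks up a statement by API type and metric name. It works on the kind of dictionary ``yaml.safe_load()`` returns for the file (shortened here to one placeholder statement per section), and the helper name ``lookup_promql`` is hypothetical; in this change the plugin performs the equivalent lookup inline as ``prom_config[pm_type]['PromQL'][target]``. Stripping everything after the first dot mirrors how the PM job code derives the key from a metric name with ``re.sub(r'\..+$', '', metric)``.

```python
import re

# Parsed form of the root structure above, as yaml.safe_load() would return
# it (abbreviated: one metric per section, statements replaced by
# placeholder strings).
prom_config = {
    'PMJob': {'PromQL': {'VCpuUsageMeanVnf': '<PMJob statement>'}},
    'Threshold': {'PromQL': {'VCpuUsageMeanVnf': '<Threshold statement>'}},
}


def lookup_promql(config, pm_type, metric):
    """Return the PromQL template for a PM job or threshold metric.

    Hypothetical helper; the plugin does the equivalent lookup inline.
    """
    if pm_type not in ('PMJob', 'Threshold'):
        raise ValueError(f'unknown type: {pm_type}')
    # "VCpuUsageMeanVnf.<id>" -> "VCpuUsageMeanVnf"
    target = re.sub(r'\..+$', '', metric)
    try:
        return config[pm_type]['PromQL'][target]
    except KeyError:
        raise KeyError(
            f'no PromQL statement for {target} under {pm_type}') from None


print(lookup_promql(prom_config, 'PMJob', 'VCpuUsageMeanVnf.vnf-1'))
```

A metric that is missing from the file fails at lookup time with a ``KeyError``, which is why every performanceMetric used by a PM job or threshold needs an entry in the matching section.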

<PromQLConfig>
--------------

Each element of PromQLConfig is a key-value pair of a performanceMetric and
a PromQL statement. The performanceMetric values are defined in
ETSI GS NFV-SOL 003 [#etsi_sol_003]_.

.. code-block:: yaml

  <PromQLConfig>
  VCpuUsageMeanVnf: <F-string of PromQL statement>
  VCpuUsagePeakVnf: <F-string of PromQL statement>
  VMemoryUsageMeanVnf: <F-string of PromQL statement>
  VMemoryUsagePeakVnf: <F-string of PromQL statement>
  VDiskUsageMeanVnf: <F-string of PromQL statement>
  VDiskUsagePeakVnf: <F-string of PromQL statement>
  ByteIncomingVnfIntCp: <F-string of PromQL statement>
  PacketIncomingVnfIntCp: <F-string of PromQL statement>
  ByteOutgoingVnfIntCp: <F-string of PromQL statement>
  PacketOutgoingVnfIntCp: <F-string of PromQL statement>
  ByteIncomingVnfExtCp: <F-string of PromQL statement>
  PacketIncomingVnfExtCp: <F-string of PromQL statement>
  ByteOutgoingVnfExtCp: <F-string of PromQL statement>
  PacketOutgoingVnfExtCp: <F-string of PromQL statement>
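
Since ``make_prom_ql()`` in this change calls ``load_prom_config()`` each time it renders a statement, a typo in a key or an unescaped brace appears to surface only when a PM job or threshold is processed. The following is a minimal, hypothetical pre-deployment check (``check_promql_config`` is not part of Tacker) for one ``<PromQLConfig>`` mapping: it reports missing performanceMetric keys and any replacement field outside the set documented under "F-string of PromQL statement". It relies only on ``string.Formatter().parse()`` from the Python standard library, which returns escaped ``{{`` and ``}}`` as literal text rather than as fields.

```python
import string

# performanceMetric keys listed in <PromQLConfig> above (14 in total).
EXPECTED_METRICS = (
    {f'V{res}Usage{stat}Vnf'
     for res in ('Cpu', 'Memory', 'Disk')
     for stat in ('Mean', 'Peak')}
    | {f'{unit}{direction}Vnf{cp}'
       for unit in ('Byte', 'Packet')
       for direction in ('Incoming', 'Outgoing')
       for cp in ('IntCp', 'ExtCp')}
)

# Replacement fields documented under "F-string of PromQL statement".
SUPPORTED_FIELDS = {'collection_period', 'pod', 'reporting_period',
                    'sub_object_instance_id', 'namespace'}


def check_promql_config(promql_config):
    """Return a list of problems found in one <PromQLConfig> mapping.

    Hypothetical pre-deployment check; it is not part of Tacker.
    """
    problems = [f'missing metric: {m}'
                for m in sorted(EXPECTED_METRICS - set(promql_config))]
    for metric, template in promql_config.items():
        try:
            # parse() yields (literal, field_name, spec, conversion) tuples;
            # escaped "{{" and "}}" come back as literal text, not fields.
            fields = {name for _, name, _, _ in
                      string.Formatter().parse(template)
                      if name is not None}
        except ValueError as exc:  # e.g. a single "}" or an unclosed "{"
            problems.append(f'{metric}: {exc}')
            continue
        for name in sorted(fields - SUPPORTED_FIELDS):
            problems.append(f'{metric}: unsupported field {{{name}}}')
    return problems
```

For example, ``check_promql_config(prom_config['PMJob']['PromQL'])`` on a parsed copy of the file returns an empty list when every metric is present and every template only uses the documented fields.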

For example, VCpuUsageMeanVnf can be described as follows:

.. code-block:: yaml

  VCpuUsageMeanVnf: >-
    avg(sum(rate(pod_cpu_usage_seconds_total
    {{namespace="{namespace}",pod=~"{pod}"}}[{reporting_period}s])))
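
To see what the plugin would send to Prometheus, the example above can be rendered by hand. The sketch below mirrors the ``str.format()`` call and the 30-second lower bound on the reporting period found in ``make_prom_ql()`` in this change; ``render_promql`` itself is a hypothetical name, and ``TEMPLATE`` is the example as it reads after YAML folds the ``>-`` block into a single line.

```python
# Template after YAML folding: the two continuation lines are joined
# with a single space.
TEMPLATE = (
    'avg(sum(rate(pod_cpu_usage_seconds_total '
    '{{namespace="{namespace}",pod=~"{pod}"}}[{reporting_period}s])))'
)

REPORTING_PERIOD_MIN = 30  # same lower bound as make_prom_ql()


def render_promql(template, pod, collection_period=30, reporting_period=90,
                  sub_object_instance_id='*', namespace='default'):
    """Fill the replacement fields of a PromQL template (hypothetical)."""
    reporting_period = max(reporting_period, REPORTING_PERIOD_MIN)
    # str.format() ignores keyword arguments the template does not use,
    # so every field can be passed for every metric.
    return template.format(
        pod=pod,
        collection_period=collection_period,
        reporting_period=reporting_period,
        sub_object_instance_id=sub_object_instance_id,
        namespace=namespace,
    )


print(render_promql(TEMPLATE, 'test-test1-8d6db447f-stzhb',
                    reporting_period=10, namespace='vnf-ns'))
```

This prints ``avg(sum(rate(pod_cpu_usage_seconds_total {namespace="vnf-ns",pod=~"test-test1-8d6db447f-stzhb"}[30s])))``: the doubled braces became single braces and the 10-second reporting period was raised to 30 seconds. Only the template's own braces need doubling; a ``{pod}`` value that contains ``{1,10}`` is inserted verbatim.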

F-string of PromQL statement
----------------------------

The PromQL statement above uses Python's replacement-field syntax, the same
syntax that f-strings use [#f_string]_, and the plugin fills it in with
``str.format()``. Braces that belong to PromQL itself, such as those around
label matchers, must therefore be doubled (``{{`` and ``}}``). The following
replacement fields can be used; each is replaced with an attribute of the
SOL API [#etsi_sol_003]_ or with a Tacker internal value.

``{collection_period}``
  Replaced with the collectionPeriod attribute of the SOL API.
``{pod}``
  Replaced with a resourceId when subObjectInstanceIds are specified
  (e.g. ``test-test1-8d6db447f-stzhb``).
  Otherwise, replaced with a regular expression that matches each
  resourceId in the vnfInstance
  (e.g. ``(test-test1-[0-9a-f]{1,10}-[0-9a-z]{5}$|test-test2-[0-9a-f]{1,10}-[0-9a-z]{5}$)``).
``{reporting_period}``
  Replaced with the reportingPeriod attribute of the SOL API.
``{sub_object_instance_id}``
  Replaced with an element of subObjectInstanceIds of the SOL API.
``{namespace}``
  Replaced with the Kubernetes namespace that the vnfInstance belongs to.
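
When subObjectInstanceIds are not specified, the ``{pod}`` value is built from the resourceIds in the vnfInstance. The sketch below is a simplified, hypothetical version of ``get_pod_regexp()`` from this change: it takes plain resourceId strings instead of ``vnfcResourceInfo`` objects, skips the check that the resource is a Deployment, and sorts the alternatives (the plugin collects them in a set, so its order is not fixed). Prometheus fully anchors ``=~`` regular expressions, so ``re.fullmatch`` is the closest Python analogue for trying the result.

```python
import re

# Pattern for the per-pod suffix of a Deployment pod name, as used by
# get_pod_regexp(): "-<hash>-<5 characters>".
POD_SUFFIX = r'-[0-9a-f]{1,10}-[0-9a-z]{5}$'


def pod_regexp(resource_ids):
    """Build the {pod} value for a list of pod resourceIds.

    Hypothetical, simplified stand-in for get_pod_regexp().
    Returns None when there are no resourceIds.
    """
    # Drop the per-pod suffix to recover the Deployment name, then
    # re-append a pattern that matches any pod of that Deployment.
    prefixes = sorted({re.sub(r'\-[0-9a-f]{1,10}\-[0-9a-z]{5}$', '', rid)
                       for rid in resource_ids})
    if not prefixes:
        return None
    return '(' + '|'.join(p + POD_SUFFIX for p in prefixes) + ')'


regexp = pod_regexp(['test-test1-8d6db447f-stzhb',
                     'test-test1-8d6db447f-kmghr',
                     'test-test2-756757f8f-xcwmt'])
print(regexp)
```

Two pods of the same Deployment collapse into one alternative, so the printed value has the same shape as the ``{pod}`` example above.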

Using Vendor Specific Plugin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -487,3 +593,8 @@ tacker.sol_refactored.common.monitoring_plugin_base.MonitoringPlugin.
   * - ``CONF.prometheus_plugin.auto_healing_class``
     - PrometheusPluginAutoHealing
     - Class name for auto healing.

.. rubric:: Footnotes
.. [#yaml] https://yaml.org/spec/1.2-old/spec.html
.. [#etsi_sol_003] https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/03.03.01_60/gs_nfv-sol003v030301p.pdf
.. [#f_string] https://docs.python.org/3.9/tutorial/inputoutput.html#fancier-output-formatting
121
etc/tacker/prometheus-plugin.yaml
Normal file
@@ -0,0 +1,121 @@
# Prometheus plugin configuration file
#
# This describes the Prometheus plugin configuration. This is used when
# Prometheus Plugin creates an alert rule. You can use your own promQL
# statements. Put this file in the configuration directory (e.g:/etc/tacker).
#
# The settings are key-value pairs of a performanceMetric and a PromQL
# statement. The performanceMetric is defined in ETSI GS NFV-SOL 003. For
# PromQL statement, f-string of python is used. In the f-string, below
# replacement field can be used. They are replaced with a SOL-API's attribute
# or Tacker internal value.
#
# {collection_period}:
# Replaced with collectionPeriod attribute of SOL-API.
# {pod}:
# Replaced with a resourceId when subObjectInstanceIds are specified.
# e.g: test-test1-8d6db447f-stzhb
# Replaced with regexp that matches each resourceId in vnfInstance when
# subObjectInstanceIds are not specified.
# e.g: (test-test1-[0-9a-f]{1,10}-[0-9a-z]{5}$|
# test-test2-[0-9a-f]{1,10}-[0-9a-z]{5}$)
# {reporting_period}:
# Replaced with reportingPeriod attribute of SOL-API.
# {sub_object_instance_id}:
# Replaced with an element of subObjectInstanceIds of SOL-API.
# {namespace}:
# Replaced with the kubernetes namespace that the vnfInstance belongs to.
#

PMJob:
  PromQL:
    VCpuUsageMeanVnf: >-
      avg(sum(rate(pod_cpu_usage_seconds_total
      {{namespace="{namespace}",pod=~"{pod}"}}[{reporting_period}s])))
    VCpuUsagePeakVnf: >-
      max(sum(rate(pod_cpu_usage_seconds_total
      {{namespace="{namespace}",pod=~"{pod}"}}[{reporting_period}s])))
    VMemoryUsageMeanVnf: >-
      avg(pod_memory_working_set_bytes{{namespace="{namespace}",pod=~"{pod}"}} /
      on(pod) (kube_node_status_capacity{{resource="memory"}} *
      on(node) group_right kube_pod_info{{pod=~"{pod}"}}))
    VMemoryUsagePeakVnf: >-
      max(pod_memory_working_set_bytes{{namespace="{namespace}",pod=~"{pod}"}} /
      on(pod) (kube_node_status_capacity{{resource="memory"}} *
      on(node) group_right kube_pod_info{{pod=~"{pod}"}}))
    VDiskUsageMeanVnf: >-
      avg(max(container_fs_usage_bytes{{namespace="{namespace}",pod=~"{pod}"}}/
      container_fs_limit_bytes{{namespace="{namespace}",pod=~"{pod}"}}))
    VDiskUsagePeakVnf: >-
      max(max(container_fs_usage_bytes{{namespace="{namespace}",pod=~"{pod}"}}/
      container_fs_limit_bytes{{namespace="{namespace}",pod=~"{pod}"}}))
    ByteIncomingVnfIntCp: >-
      sum(container_network_receive_bytes_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    PacketIncomingVnfIntCp: >-
      sum(container_network_receive_packets_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    ByteOutgoingVnfIntCp: >-
      sum(container_network_transmit_bytes_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    PacketOutgoingVnfIntCp: >-
      sum(container_network_transmit_packets_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    ByteIncomingVnfExtCp: >-
      sum(container_network_receive_bytes_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    PacketIncomingVnfExtCp: >-
      sum(container_network_receive_packets_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    ByteOutgoingVnfExtCp: >-
      sum(container_network_transmit_bytes_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    PacketOutgoingVnfExtCp: >-
      sum(container_network_transmit_packets_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
Threshold:
  PromQL:
    VCpuUsageMeanVnf: >-
      avg(sum(rate(pod_cpu_usage_seconds_total
      {{namespace="{namespace}",pod=~"{pod}"}}[{reporting_period}s])))
    VCpuUsagePeakVnf: >-
      max(sum(rate(pod_cpu_usage_seconds_total
      {{namespace="{namespace}",pod=~"{pod}"}}[{reporting_period}s])))
    VMemoryUsageMeanVnf: >-
      avg(pod_memory_working_set_bytes{{namespace="{namespace}",pod=~"{pod}"}} /
      on(pod) (kube_node_status_capacity{{resource="memory"}} *
      on(node) group_right kube_pod_info{{pod=~"{pod}"}}))
    VMemoryUsagePeakVnf: >-
      max(pod_memory_working_set_bytes{{namespace="{namespace}",pod=~"{pod}"}} /
      on(pod) (kube_node_status_capacity{{resource="memory"}} *
      on(node) group_right kube_pod_info{{pod=~"{pod}"}}))
    VDiskUsageMeanVnf: >-
      avg(max(container_fs_usage_bytes{{namespace="{namespace}",pod=~"{pod}"}}/
      container_fs_limit_bytes{{namespace="{namespace}",pod=~"{pod}"}}))
    VDiskUsagePeakVnf: >-
      max(max(container_fs_usage_bytes{{namespace="{namespace}",pod=~"{pod}"}}/
      container_fs_limit_bytes{{namespace="{namespace}",pod=~"{pod}"}}))
    ByteIncomingVnfIntCp: >-
      sum(container_network_receive_bytes_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    PacketIncomingVnfIntCp: >-
      sum(container_network_receive_packets_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    ByteOutgoingVnfIntCp: >-
      sum(container_network_transmit_bytes_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    PacketOutgoingVnfIntCp: >-
      sum(container_network_transmit_packets_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    ByteIncomingVnfExtCp: >-
      sum(container_network_receive_bytes_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    PacketIncomingVnfExtCp: >-
      sum(container_network_receive_packets_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    ByteOutgoingVnfExtCp: >-
      sum(container_network_transmit_bytes_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
    PacketOutgoingVnfExtCp: >-
      sum(container_network_transmit_packets_total
      {{namespace="{namespace}",interface="{sub_object_instance_id}",pod=~"{pod}"}})
@@ -28,6 +28,7 @@ data_files =
    etc/tacker =
        etc/tacker/api-paste.ini
        etc/tacker/rootwrap.conf
        etc/tacker/prometheus-plugin.yaml
    etc/rootwrap.d =
        etc/tacker/rootwrap.d/tacker.filters
    etc/init.d = etc/init.d/tacker-server
@@ -20,10 +20,12 @@ import os
import paramiko
import re
import tempfile
import yaml

from keystoneauth1 import exceptions as ks_exc
from oslo_log import log as logging
from oslo_utils import uuidutils
from tacker.common import utils
from tacker.sol_refactored.api import prometheus_plugin_validator as validator
from tacker.sol_refactored.api.schemas import prometheus_plugin_schemas
from tacker.sol_refactored.common import config as cfg
@@ -54,7 +56,283 @@ class PrometheusPlugin():
        return t if t.tzinfo else t.astimezone()


class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
class PrometheusPluginPmBase(PrometheusPlugin):
    def __init__(self):
        super(PrometheusPluginPmBase, self).__init__()
        auth_handle = http_client.NoAuthHandle()
        self.client = http_client.HttpClient(auth_handle)

    def convert_measurement_unit(self, metric, value):
        if re.match(r'^V(Cpu|Memory|Disk)Usage(Mean|Peak)Vnf\..+', metric):
            value = float(value)
        elif re.match(r'^(Byte|Packet)(Incoming|Outgoing)Vnf(IntCp|ExtCp)',
                      metric):
            value = int(value)
        else:
            raise sol_ex.PrometheusPluginError(
                "Failed to convert annotations.value to measurement unit.")
        return value

    def load_prom_config(self):
        config_file = utils.find_config_file({}, 'prometheus-plugin.yaml')
        if not config_file:
            raise sol_ex.PrometheusPluginError(
                "prometheus-plugin.yaml not found."
            )
        LOG.info(f"prom_config file: {config_file}")
        with open(config_file) as file:
            prom_config = yaml.safe_load(file.read())
        return prom_config

    def make_prom_ql(self, target, pod, collection_period=30,
                     reporting_period=90, sub_object_instance_id='*',
                     pm_type='PMJob', namespace='default'):
        REPORTING_PERIOD_MIN = 30
        reporting_period = max(reporting_period, REPORTING_PERIOD_MIN)
        prom_config = self.load_prom_config()
        expr = prom_config[pm_type]['PromQL'][target].format(
            pod=pod,
            collection_period=collection_period,
            reporting_period=reporting_period,
            sub_object_instance_id=sub_object_instance_id,
            namespace=namespace
        )
        LOG.info(f"promQL expr: {expr}")
        return expr

    def make_rule(self, type, id, object_instance_id, sub_object_instance_id,
                  metric, expression, collection_period=30):
        if type == 'PMJob':
            labels = {
                'alertname': '',
                'receiver_type': 'tacker',
                'function_type': 'vnfpm',
                'job_id': id,
                'object_instance_id': object_instance_id,
                'sub_object_instance_id': sub_object_instance_id,
                'metric': metric
            }
        elif type == 'Threshold':
            labels = {
                'alertname': '',
                'receiver_type': 'tacker',
                'function_type': 'vnfpm-threshold',
                'threshold_id': id,
                'object_instance_id': object_instance_id,
                'sub_object_instance_id': sub_object_instance_id,
                'metric': metric
            }
        else:
            raise sol_ex.PrometheusPluginError(
                "Invalid type in make_rule()."
            )

        labels = {k: v for k, v in labels.items() if v is not None}
        annotations = {
            'value': r'{{$value}}'
        }
        rule = {
            'alert': uuidutils.generate_uuid(),
            'expr': expression,
            'for': f'{collection_period}s',
            'labels': labels,
            'annotations': annotations
        }
        return rule

    def get_namespace(self, inst):
        return inst.instantiatedVnfInfo.metadata.get(
            'namespace', 'default') if (
                inst.obj_attr_is_set('instantiatedVnfInfo') and
                inst.instantiatedVnfInfo.obj_attr_is_set(
                    'metadata')) else 'default'

    def get_vnfc_resource_info(self, inst):
        return inst.instantiatedVnfInfo.vnfcResourceInfo if (
            inst.obj_attr_is_set('instantiatedVnfInfo') and
            inst.instantiatedVnfInfo.obj_attr_is_set(
                'vnfcResourceInfo')) else None

    def get_pod_regexp(self, inst):
        # resource ids are like:
        # ['test-test1-756757f8f-xcwmt',
        # 'test-test2-756757f8f-kmghr', ...]
        # convert them to a regex string such as:
        # '(test-test1-[0-9a-f]{1,10}-[0-9a-z]{5}$|
        # test-test2-[0-9a-f]{1,10}-[0-9a-z]{5}$|...)'
        resource_info = self.get_vnfc_resource_info(inst)
        if not resource_info:
            return None
        deployments = list(filter(
            lambda r:
                r.computeResource.obj_attr_is_set(
                    'vimLevelResourceType')
                and r.computeResource.obj_attr_is_set(
                    'resourceId'
                )
                and r.computeResource.vimLevelResourceType ==
                'Deployment', resource_info
        ))
        deployments = list(set(list(map(
            lambda d: re.sub(
                r'\-[0-9a-f]{1,10}\-[0-9a-z]{5}$', '',
                d.computeResource.resourceId) +
            r'-[0-9a-f]{1,10}-[0-9a-z]{5}$',
            deployments
        ))))
        return ('(' + '|'.join(deployments) + ')'
                if len(deployments) else None)

    def get_compute_resource_by_sub_obj(self, inst, sub_obj):
        if (not inst.obj_attr_is_set('instantiatedVnfInfo') or
                not inst.instantiatedVnfInfo.obj_attr_is_set(
                    'vnfcResourceInfo') or
                not inst.instantiatedVnfInfo.obj_attr_is_set('vnfcInfo')):
            return None
        vnfc_info = list(filter(
            lambda x: (x.obj_attr_is_set('vnfcResourceInfoId') and
                       x.id == sub_obj),
            inst.instantiatedVnfInfo.vnfcInfo))
        if len(vnfc_info) == 0:
            return None
        resources = list(filter(
            lambda x: (vnfc_info[0].obj_attr_is_set('vnfcResourceInfoId') and
                x.id == vnfc_info[0].vnfcResourceInfoId and
                x.computeResource.obj_attr_is_set('vimLevelResourceType') and
                x.computeResource.vimLevelResourceType == 'Deployment' and
                x.computeResource.obj_attr_is_set('resourceId')),
            inst.instantiatedVnfInfo.vnfcResourceInfo))
        if len(resources) == 0:
            return None
        return resources[0].computeResource

    def _delete_rule(self, host, port, user, password, path, id):
        with paramiko.Transport(sock=(host, port)) as client:
            client.connect(username=user, password=password)
            sftp = paramiko.SFTPClient.from_transport(client)
            sftp.remove(f'{path}/{id}.json')

    def reload_prom_server(self, context, reload_uri):
        resp, _ = self.client.do_request(
            reload_uri, "PUT", context=context)
        if resp.status_code >= 400 and resp.status_code < 600:
            raise sol_ex.PrometheusPluginError(
                f"Reloading request to prometheus is failed: "
                f"{resp.status_code}.")

    def _upload_rule(self, rule_group, host, port, user, password, path, id):
        with tempfile.TemporaryDirectory() as tmpdir:
            with open(os.path.join(tmpdir, 'rule.json'),
                      'w+', encoding="utf-8") as fp:
                json.dump(rule_group, fp, indent=4, ensure_ascii=False)
                filename = fp.name
            with paramiko.Transport(sock=(host, port)) as client:
                LOG.info("Upload rule files to prometheus server: %s.", host)
                client.connect(username=user, password=password)
                sftp = paramiko.SFTPClient.from_transport(client)
                sftp.put(filename, f'{path}/{id}.json')
        self.verify_rule(host, port, user, password, path, id)

    def verify_rule(self, host, port, user, password, path, id):
        if not CONF.prometheus_plugin.test_rule_with_promtool:
            return
        with paramiko.SSHClient() as client:
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            client.connect(host, port=port, username=user, password=password)
            command = f"promtool check rules {path}/{id}.json"
            LOG.info("Rule file validation command: %s", command)
            _, stdout, stderr = client.exec_command(command)
            if stdout.channel.recv_exit_status() != 0:
                error_byte = stderr.read()
                error_str = error_byte.decode('utf-8')
                LOG.error(
                    "Rule file validation with promtool failed: %s",
                    error_str)
                raise sol_ex.PrometheusPluginError(
                    "Rule file validation with promtool failed.")

    def delete_rules(self, context, pm_job_or_threshold):
        target_list, reload_list = self.get_access_info(pm_job_or_threshold)
        for target in target_list:
            try:
                self._delete_rule(
                    target['host'], target['port'], target['user'],
                    target['password'], target['path'], pm_job_or_threshold.id)
            except (sol_ex.PrometheusPluginError, ks_exc.ClientException,
                    paramiko.SSHException):
                # NOTE(shimizu-koji): This exception is ignored.
                # DELETE /pm_jobs/{id} will be success even if _delete_rule()
                # is failed. Because the rule file was already deleted.
                pass
        for uri in reload_list:
            try:
                self.reload_prom_server(context, uri)
            except (sol_ex.PrometheusPluginError, ks_exc.ClientException,
                    paramiko.SSHException):
                pass

    def get_access_info(self, pm_job_or_threshold):
        target_list = []
        reload_list = []
        if (not pm_job_or_threshold.obj_attr_is_set('metadata')
                or 'monitoring' not in pm_job_or_threshold.metadata):
            raise sol_ex.PrometheusPluginError(
                "monitoring info is missing at metadata field.")
        access_info = pm_job_or_threshold.metadata['monitoring']
        if (access_info.get('monitorName') != 'prometheus' or
                access_info.get('driverType') != 'external'):
            raise sol_ex.PrometheusPluginError(
                "prometheus info is missing at metadata field.")
        for info in access_info.get('targetsInfo', []):
            host = info.get('prometheusHost', '')
            port = info.get('prometheusHostPort', 22)
            auth = info.get('authInfo', {})
            user = auth.get('ssh_username', '')
            password = auth.get('ssh_password', '')
            path = info.get('alertRuleConfigPath', '')
            uri = info.get('prometheusReloadApiEndpoint', '')
            if not (host and user and path and uri):
                continue
            target_list.append({
                'host': host,
                'port': port,
                'user': user,
                'password': password,
                'path': path
            })
            reload_list.append(uri)
        return target_list, list(set(reload_list))

    def upload_rules(self, context, target_list, reload_list, rule_group, id):
        def _cleanup_error(target_list):
            for target in target_list:
                try:
                    self._delete_rule(target['host'], target['port'],
                        target['user'], target['password'], target['path'],
                        id)
                except (sol_ex.PrometheusPluginError, ks_exc.ClientException,
                        paramiko.SSHException):
                    pass

        try:
            for target in target_list:
                self._upload_rule(
                    rule_group, target['host'], target['port'],
                    target['user'], target['password'], target['path'],
                    id)
            for uri in reload_list:
                self.reload_prom_server(context, uri)
        except (sol_ex.PrometheusPluginError, ks_exc.ClientException,
                paramiko.SSHException) as e:
            LOG.error("failed to upload rule files: %s", e.args[0])
            _cleanup_error(target_list)
            raise e
        except Exception as e:
            _cleanup_error(target_list)
            raise e


class PrometheusPluginPm(PrometheusPluginPmBase, mon_base.MonitoringPlugin):
    _instance = None

    @staticmethod
@@ -73,62 +351,9 @@ class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
                "Not constructor but instance() should be used.")
        super(PrometheusPluginPm, self).__init__()
        self.notification_callback = None
        auth_handle = http_client.NoAuthHandle()
        self.client = http_client.HttpClient(auth_handle)
        self.reporting_period_margin = (
            CONF.prometheus_plugin.reporting_period_margin)
        self.notification_callback = self.default_callback
        # Pod name can be specified but container name can not.
        # And some prometheus metrics need container name. Therefore, max
        # statement of PromQL is alternatively used in some measurements to
        # measure without container name. That means it provids only most
        # impacted value among the containers.
        self.sol_exprs = {
            'VCpuUsageMeanVnf':
                'avg(sum(rate(pod_cpu_usage_seconds_total'
                '{{pod=~"{pod}"}}[{reporting_period}s])))',
            'VCpuUsagePeakVnf':
                'max(sum(rate(pod_cpu_usage_seconds_total'
                '{{pod=~"{pod}"}}[{reporting_period}s])))',
            'VMemoryUsageMeanVnf':
                'avg(pod_memory_working_set_bytes{{pod=~"{pod}"}} / '
                'on(pod) (kube_node_status_capacity{{resource="memory"}} * '
                'on(node) group_right kube_pod_info))',
            'VMemoryUsagePeakVnf':
                'max(pod_memory_working_set_bytes{{pod=~"{pod}"}} / '
                'on(pod) (kube_node_status_capacity{{resource="memory"}} * '
                'on(node) group_right kube_pod_info))',
            'VDiskUsageMeanVnf':
                'avg(max(container_fs_usage_bytes{{pod=~"{pod}"}}/'
                'container_fs_limit_bytes{{pod=~"{pod}"}}))',
            'VDiskUsagePeakVnf':
                'max(max(container_fs_usage_bytes{{pod=~"{pod}"}}/'
                'container_fs_limit_bytes{{pod=~"{pod}"}}))',
            'ByteIncomingVnfIntCp':
                'sum(container_network_receive_bytes_total'
                '{{interface="{sub_object_instance_id}",pod=~"{pod}"}})',
            'PacketIncomingVnfIntCp':
                'sum(container_network_receive_packets_total'
                '{{interface="{sub_object_instance_id}",pod=~"{pod}"}})',
            'ByteOutgoingVnfIntCp':
                'sum(container_network_transmit_bytes_total'
                '{{interface="{sub_object_instance_id}",pod=~"{pod}"}})',
            'PacketOutgoingVnfIntCp':
                'sum(container_network_transmit_packets_total'
                '{{interface="{sub_object_instance_id}",pod=~"{pod}"}})',
            'ByteIncomingVnfExtCp':
                'sum(container_network_receive_bytes_total'
                '{{interface="{sub_object_instance_id}",pod=~"{pod}"}})',
            'PacketIncomingVnfExtCp':
                'sum(container_network_receive_packets_total'
                '{{interface="{sub_object_instance_id}",pod=~"{pod}"}})',
            'ByteOutgoingVnfExtCp':
                'sum(container_network_transmit_bytes_total'
                '{{interface="{sub_object_instance_id}",pod=~"{pod}"}})',
            'PacketOutgoingVnfExtCp':
                'sum(container_network_transmit_packets_total'
                '{{interface="{sub_object_instance_id}",pod=~"{pod}"}})',
        }
        PrometheusPluginPm._instance = self

    def set_callback(self, notification_callback):
@@ -152,17 +377,6 @@ class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
    def default_callback(self, context, entries):
        self.rpc.store_job_info(context, entries)

    def convert_measurement_unit(self, metric, value):
        if re.match(r'^V(Cpu|Memory|Disk)Usage(Mean|Peak)Vnf\..+', metric):
            value = float(value)
        elif re.match(r'^(Byte|Packet)(Incoming|Outgoing)Vnf(IntCp|ExtCp)',
                      metric):
            value = int(value)
        else:
            raise sol_ex.PrometheusPluginError(
                "Failed to convert annotations.value to measurement unit.")
        return value

    def get_datetime_of_latest_report(
            self, context, pm_job, object_instance_id,
            sub_object_instance_id, metric):
@@ -305,77 +519,7 @@ class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
        )
        return metrics

    def make_prom_ql(self, target, pod, collection_period=30,
                     reporting_period=90, sub_object_instance_id='*'):
        reporting_period = max(reporting_period, 30)
        expr = self.sol_exprs[target].format(
            pod=pod,
            collection_period=collection_period,
            reporting_period=reporting_period,
            sub_object_instance_id=sub_object_instance_id
        )
        return expr

    def make_rule(self, pm_job, object_instance_id, sub_object_instance_id,
                  metric, expression, collection_period):
        labels = {
            'alertname': '',
            'receiver_type': 'tacker',
            'function_type': 'vnfpm',
            'job_id': pm_job.id,
            'object_instance_id': object_instance_id,
            'sub_object_instance_id': sub_object_instance_id,
            'metric': metric
        }
        labels = {k: v for k, v in labels.items() if v is not None}
        annotations = {
            'value': r'{{$value}}'
        }
        rule = {
            'alert': uuidutils.generate_uuid(),
            'expr': expression,
            'for': f'{collection_period}s',
            'labels': labels,
            'annotations': annotations
        }
        return rule

    def get_vnfc_resource_info(self, _, vnf_instance_id, inst_map):
        inst = inst_map[vnf_instance_id]
        if not inst.obj_attr_is_set('instantiatedVnfInfo') or\
                not inst.instantiatedVnfInfo.obj_attr_is_set(
                    'vnfcResourceInfo'):
            return None
        return inst.instantiatedVnfInfo.vnfcResourceInfo

    def get_pod_regexp(self, resource_info):
        # resource ids are like:
        # ['test-test1-756757f8f-xcwmt',
        # 'test-test2-756757f8f-kmghr', ...]
        # convert them to a regex string such as:
        # '(test-test1-[0-9a-f]{1,10}-[0-9a-z]{5}$|
        # test-test2-[0-9a-f]{1,10}-[0-9a-z]{5}$|...)'
        deployments = list(filter(
            lambda r:
                r.computeResource.obj_attr_is_set(
                    'vimLevelResourceType')
                and r.computeResource.obj_attr_is_set(
                    'resourceId'
                )
                and r.computeResource.vimLevelResourceType ==
                'Deployment', resource_info
        ))
        deployments = list(set(list(map(
            lambda d: re.sub(
                r'\-[0-9a-f]{1,10}\-[0-9a-z]{5}$', '',
                d.computeResource.resourceId) +
            r'-[0-9a-f]{1,10}-[0-9a-z]{5}$',
            deployments
        ))))
        pods_regexp = '(' + '|'.join(deployments) + ')'
        return deployments, pods_regexp

    def _make_rules_for_each_obj(self, context, pm_job, inst_map, metric):
    def _make_rules_for_each_obj(self, pm_job, inst_map, metric):
        target = re.sub(r'\..+$', '', metric)
        objs = pm_job.objectInstanceIds
        collection_period = pm_job.criteria.collectionPeriod
@ -388,45 +532,19 @@ class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
|
||||
# convert them to a regex string such as:
|
||||
# '(test-test1-[0-9a-f]{1,10}-[0-9a-z]{5}$|
|
||||
# test-test2-[0-9a-f]{1,10}-[0-9a-z]{5}$|...)'
|
||||
resource_info = self.get_vnfc_resource_info(context, obj, inst_map)
|
||||
if not resource_info:
|
||||
continue
|
||||
deployments, pods_regexp = self.get_pod_regexp(resource_info)
|
||||
if len(deployments) == 0:
|
||||
pods_regexp = self.get_pod_regexp(inst_map[obj])
|
||||
if pods_regexp is None:
|
||||
continue
|
||||
namespace = self.get_namespace(inst_map[obj])
|
||||
expr = self.make_prom_ql(
|
||||
target, pods_regexp, collection_period=collection_period,
|
||||
reporting_period=reporting_period)
|
||||
reporting_period=reporting_period, namespace=namespace)
|
||||
rules.append(self.make_rule(
|
||||
pm_job, obj, None, metric, expr,
|
||||
collection_period))
|
||||
'PMJob', pm_job.id, obj, None, metric, expr,
|
||||
collection_period=collection_period))
|
||||
return rules
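
Both rule builders derive the PromQL target from the performance metric with `re.sub(r'\..+$', '', metric)`, which drops the first dot and everything after it. A short sketch of that step, using metric names in the `<measurement>.<objectInstanceId>` form seen in the functional-test PM jobs; the helper name `metric_target` and the sample instance ID are hypothetical.

```python
import re


def metric_target(metric):
    """Hypothetical helper: return the measurement part of a PM metric name.

    Applies the same substitution as the plugin: remove the first '.' and
    everything after it. Names without a dot are returned unchanged.
    """
    return re.sub(r'\..+$', '', metric)


print(metric_target('VCpuUsageMeanVnf.example-inst-id'))  # VCpuUsageMeanVnf
print(metric_target('ByteIncomingVnfIntCp'))              # ByteIncomingVnfIntCp
```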

def get_compute_resource_by_sub_obj(self, vnf_instance, sub_obj):
inst = vnf_instance
if (not inst.obj_attr_is_set('instantiatedVnfInfo') or
not inst.instantiatedVnfInfo.obj_attr_is_set(
'vnfcResourceInfo') or
not inst.instantiatedVnfInfo.obj_attr_is_set('vnfcInfo')):
return None
vnfc_info = list(filter(
lambda x: (x.obj_attr_is_set('vnfcResourceInfoId') and
x.id == sub_obj),
inst.instantiatedVnfInfo.vnfcInfo))
if len(vnfc_info) == 0:
return None
resources = list(filter(
lambda x: (vnfc_info[0].obj_attr_is_set('vnfcResourceInfoId') and
x.id == vnfc_info[0].vnfcResourceInfoId and
x.computeResource.obj_attr_is_set('vimLevelResourceType') and
x.computeResource.vimLevelResourceType == 'Deployment' and
x.computeResource.obj_attr_is_set('resourceId')),
inst.instantiatedVnfInfo.vnfcResourceInfo))
if len(resources) == 0:
return None
return resources[0].computeResource
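
`get_compute_resource_by_sub_obj` resolves a sub-object ID in two hops: first the `vnfcInfo` entry whose `id` equals the sub-object ID, then the `vnfcResourceInfo` entry it references through `vnfcResourceInfoId`, accepted only if it is a `Deployment` with a `resourceId`. A minimal sketch of the same two-hop lookup over plain dictionaries (so `obj_attr_is_set` becomes a key check); `find_compute_resource` and the sample data are hypothetical.

```python
def find_compute_resource(instantiated_vnf_info, sub_obj):
    """Hypothetical dict-based version of get_compute_resource_by_sub_obj().

    Returns the computeResource dict for the VNFC identified by sub_obj, or
    None when it cannot be resolved to a Deployment with a resourceId.
    """
    vnfc_infos = instantiated_vnf_info.get('vnfcInfo') or []
    resource_infos = instantiated_vnf_info.get('vnfcResourceInfo') or []

    # Hop 1: the vnfcInfo entry for this sub-object that references a
    # vnfcResourceInfo entry.
    vnfc = next((v for v in vnfc_infos
                 if 'vnfcResourceInfoId' in v and v['id'] == sub_obj), None)
    if vnfc is None:
        return None

    # Hop 2: the referenced vnfcResourceInfo, accepted only if it describes
    # a Kubernetes Deployment that carries a resourceId.
    for res in resource_infos:
        compute = res.get('computeResource', {})
        if (res['id'] == vnfc['vnfcResourceInfoId']
                and compute.get('vimLevelResourceType') == 'Deployment'
                and 'resourceId' in compute):
            return compute
    return None


info = {
    'vnfcResourceInfo': [
        {'id': 'res-1',
         'computeResource': {'vimLevelResourceType': 'Deployment',
                             'resourceId': 'vdu1-5588797866-fs6vb'}},
    ],
    'vnfcInfo': [{'id': 'VDU1-res-1', 'vnfcResourceInfoId': 'res-1'}],
}
print(find_compute_resource(info, 'VDU1-res-1')['resourceId'])
```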

def _make_rules_for_each_sub_obj(self, context, pm_job, inst_map, metric):
def _make_rules_for_each_sub_obj(self, pm_job, inst_map, metric):
target = re.sub(r'\..+$', '', metric)
objs = pm_job.objectInstanceIds
sub_objs = pm_job.subObjectInstanceIds\
@ -435,7 +553,7 @@ class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
collection_period = pm_job.criteria.collectionPeriod
reporting_period = pm_job.criteria.reportingPeriod
rules = []
resource_info = self.get_vnfc_resource_info(context, objs[0], inst_map)
resource_info = self.get_vnfc_resource_info(inst_map[objs[0]])
if not resource_info:
return []
if pm_job.objectType in {'Vnf', 'Vnfc'}:
@ -446,38 +564,39 @@ class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
if not compute_resource:
continue
resource_id = compute_resource.resourceId
namespace = self.get_namespace(inst)
expr = self.make_prom_ql(
target, resource_id,
collection_period=collection_period,
reporting_period=reporting_period)
reporting_period=reporting_period,
namespace=namespace)
rules.append(self.make_rule(
pm_job, objs[0], sub_obj, metric, expr,
collection_period))
'PMJob', pm_job.id, objs[0], sub_obj, metric, expr,
collection_period=collection_period))
else:
deployments, pods_regexp = self.get_pod_regexp(resource_info)
if len(deployments) == 0:
pods_regexp = self.get_pod_regexp(inst_map[objs[0]])
if pods_regexp is None:
return []
for sub_obj in sub_objs:
namespace = self.get_namespace(inst_map[objs[0]])
expr = self.make_prom_ql(
target, pods_regexp, collection_period=collection_period,
reporting_period=reporting_period,
sub_object_instance_id=sub_obj)
sub_object_instance_id=sub_obj, namespace=namespace)
rules.append(self.make_rule(
pm_job, objs[0], sub_obj, metric, expr,
collection_period))
'PMJob', pm_job.id, objs[0], sub_obj, metric, expr,
collection_period=collection_period))
return rules

def _make_rules(self, context, pm_job, metric, inst_map):
def _make_rules(self, pm_job, metric, inst_map):
sub_objs = pm_job.subObjectInstanceIds\
if (pm_job.obj_attr_is_set('subObjectInstanceIds') and
pm_job.subObjectInstanceIds) else []
# Cardinality of objectInstanceIds and subObjectInstanceIds
# is N:0 or 1:N.
if len(sub_objs) > 0:
return self._make_rules_for_each_sub_obj(
context, pm_job, inst_map, metric)
return self._make_rules_for_each_obj(
context, pm_job, inst_map, metric)
return self._make_rules_for_each_sub_obj(pm_job, inst_map, metric)
return self._make_rules_for_each_obj(pm_job, inst_map, metric)
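
`_make_rules` picks a builder based on whether `subObjectInstanceIds` is non-empty, relying on the documented cardinality of N:0 or 1:N; the sub-object branch then only uses `objs[0]`. The sketch below states that precondition explicitly. It is hypothetical: `select_rule_builder`, its return labels, and the `ValueError` for an N:M job are illustrations, and nothing in this diff shows the plugin performing such a check.

```python
def select_rule_builder(object_instance_ids, sub_object_instance_ids=None):
    """Hypothetical check of the N:0 / 1:N cardinality rule.

    Returns which builder _make_rules() would call: 'per_sub_obj' when
    sub-object IDs are present, otherwise 'per_obj'.
    """
    sub_objs = sub_object_instance_ids or []
    if not object_instance_ids:
        raise ValueError('objectInstanceIds must not be empty')
    if not sub_objs:
        return 'per_obj'      # N:0
    if len(object_instance_ids) != 1:
        raise ValueError('subObjectInstanceIds require exactly one '
                         'objectInstanceId (1:N)')
    return 'per_sub_obj'      # 1:N
```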

def decompose_metrics_vnfintextcp(self, pm_job):
group_name = 'VnfInternalCp'\
@ -504,32 +623,6 @@ class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
)
return metrics

def _delete_rule(self, host, port, user, password, path, pm_job_id):
with paramiko.Transport(sock=(host, port)) as client:
client.connect(username=user, password=password)
sftp = paramiko.SFTPClient.from_transport(client)
sftp.remove(f'{path}/{pm_job_id}.json')

def delete_rules(self, context, pm_job):
target_list, reload_list = self.get_access_info(pm_job)
for target in target_list:
try:
self._delete_rule(
target['host'], target['port'], target['user'],
target['password'], target['path'], pm_job.id)
except (sol_ex.PrometheusPluginError, ks_exc.ClientException,
paramiko.SSHException):
# This exception is ignored. DELETE /pm_jobs/{id}
# will be success even if _delete_rule() is failed.
# Because the rule file was already deleted.
pass
for uri in reload_list:
try:
self.reload_prom_server(context, uri)
except (sol_ex.PrometheusPluginError, ks_exc.ClientException,
paramiko.SSHException):
pass

def decompose_metrics(self, pm_job):
if pm_job.objectType in {'Vnf', 'Vnfc'}:
return self.decompose_metrics_vnfc(pm_job)
@ -538,107 +631,6 @@ class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
raise sol_ex.PrometheusPluginError(
f"Invalid objectType: {pm_job.objectType}.")

def reload_prom_server(self, context, reload_uri):
resp, _ = self.client.do_request(
reload_uri, "PUT", context=context)
if resp.status_code >= 400 and resp.status_code < 600:
raise sol_ex.PrometheusPluginError(
f"Reloading request to prometheus is failed: "
f"{resp.status_code}.")
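
`reload_prom_server` asks Prometheus to re-read its rule files by sending `PUT` to the configured `prometheusReloadApiEndpoint` (Prometheus only serves `/-/reload` when started with `--web.enable-lifecycle`). The plugin goes through Tacker's own HTTP client with a request context; the sketch below is a hypothetical standard-library equivalent without authentication, and the names `build_reload_request` and `reload_prometheus` are invented for illustration.

```python
import urllib.error
import urllib.request


def build_reload_request(reload_uri):
    """Hypothetical helper: an HTTP PUT request for the reload endpoint."""
    return urllib.request.Request(reload_uri, method='PUT')


def reload_prometheus(reload_uri, timeout=10):
    """Send the reload request; raise RuntimeError on 4xx/5xx responses.

    urlopen() raises HTTPError for error status codes, which takes the place
    of the plugin's explicit status_code range check.
    """
    try:
        with urllib.request.urlopen(build_reload_request(reload_uri),
                                    timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        raise RuntimeError(
            f"Reloading request to prometheus failed: {e.code}.") from e
```

Calling `reload_prometheus('http://localhost:9990/-/reload')`, the endpoint used in the functional-test PM jobs, needs a reachable Prometheus with the lifecycle API enabled.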

def _upload_rule(self, rule_group, host, port, user, password, path,
pm_job_id):
with tempfile.TemporaryDirectory() as tmpdir:
with open(os.path.join(tmpdir, 'rule.json'),
'w+', encoding="utf-8") as fp:
json.dump(rule_group, fp, indent=4, ensure_ascii=False)
filename = fp.name
with paramiko.Transport(sock=(host, port)) as client:
LOG.info("Upload rule files to prometheus server: %s.", host)
client.connect(username=user, password=password)
sftp = paramiko.SFTPClient.from_transport(client)
sftp.put(filename, f'{path}/{pm_job_id}.json')
self.verify_rule(host, port, user, password, path, pm_job_id)

def verify_rule(self, host, port, user, password, path, pm_job_id):
if not CONF.prometheus_plugin.test_rule_with_promtool:
return
with paramiko.SSHClient() as client:
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(host, port=port, username=user, password=password)
command = f"promtool check rules {path}/{pm_job_id}.json"
LOG.info("Rule file validation command: %s", command)
_, stdout, stderr = client.exec_command(command)
if stdout.channel.recv_exit_status() != 0:
error_byte = stderr.read()
error_str = error_byte.decode('utf-8')
LOG.error(
"Rule file validation with promtool failed: %s",
error_str)
raise sol_ex.PrometheusPluginError(
"Rule file validation with promtool failed.")

def get_access_info(self, pm_job):
target_list = []
reload_list = []
if (not pm_job.obj_attr_is_set('metadata')
or 'monitoring' not in pm_job.metadata):
raise sol_ex.PrometheusPluginError(
"monitoring info is missing at metadata field.")
access_info = pm_job.metadata['monitoring']
if (access_info.get('monitorName') != 'prometheus' or
access_info.get('driverType') != 'external'):
raise sol_ex.PrometheusPluginError(
"prometheus info is missing at metadata field.")
for info in access_info.get('targetsInfo', []):
host = info.get('prometheusHost', '')
port = info.get('prometheusHostPort', 22)
auth = info.get('authInfo', {})
user = auth.get('ssh_username', '')
password = auth.get('ssh_password', '')
path = info.get('alertRuleConfigPath', '')
uri = info.get('prometheusReloadApiEndpoint', '')
if not (host and user and path and uri):
continue
target_list.append({
'host': host,
'port': port,
'user': user,
'password': password,
'path': path
})
reload_list.append(uri)
return target_list, list(set(reload_list))
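
`get_access_info` reads `metadata.monitoring` from the PM job, rejects anything that is not `monitorName: prometheus` with `driverType: external`, and silently skips `targetsInfo` entries that lack a host, SSH user, rule path, or reload URI. A minimal sketch of that parsing over a plain dictionary, assuming the field names shown above; `parse_access_info` is hypothetical, and it raises `ValueError` where the plugin raises `sol_ex.PrometheusPluginError`.

```python
def parse_access_info(metadata):
    """Hypothetical dict-based version of get_access_info()."""
    monitoring = (metadata or {}).get('monitoring')
    if monitoring is None:
        raise ValueError('monitoring info is missing at metadata field.')
    if (monitoring.get('monitorName') != 'prometheus'
            or monitoring.get('driverType') != 'external'):
        raise ValueError('prometheus info is missing at metadata field.')

    targets, reload_uris = [], []
    for info in monitoring.get('targetsInfo', []):
        auth = info.get('authInfo', {})
        target = {
            'host': info.get('prometheusHost', ''),
            'port': info.get('prometheusHostPort', 22),  # SSH port default
            'user': auth.get('ssh_username', ''),
            'password': auth.get('ssh_password', ''),
            'path': info.get('alertRuleConfigPath', ''),
        }
        uri = info.get('prometheusReloadApiEndpoint', '')
        # Incomplete entries are skipped, as in the plugin.
        if not (target['host'] and target['user'] and target['path'] and uri):
            continue
        targets.append(target)
        reload_uris.append(uri)
    # De-duplicate reload endpoints; sorted() makes the order deterministic.
    return targets, sorted(set(reload_uris))


metadata = {
    'monitoring': {
        'monitorName': 'prometheus',
        'driverType': 'external',
        'targetsInfo': [
            {'prometheusHost': '192.0.2.10',
             'prometheusHostPort': 50022,
             'authInfo': {'ssh_username': 'root', 'ssh_password': 'root'},
             'alertRuleConfigPath': '/tmp',
             'prometheusReloadApiEndpoint': 'http://localhost:9990/-/reload'},
            # No host, user, or path: skipped.
            {'prometheusReloadApiEndpoint': 'http://localhost:9990/-/reload'},
        ],
    }
}
targets, reload_uris = parse_access_info(metadata)
```

The sample values follow the `pm_job_external` fixture in this change, except the host address, which is a documentation placeholder.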

def upload_rules(
self, context, target_list, reload_list, rule_group, pm_job):
def _cleanup_error(target_list):
for target in target_list:
try:
self._delete_rule(target['host'], target['port'],
target['user'], target['password'], target['path'],
pm_job.id)
except (sol_ex.PrometheusPluginError, ks_exc.ClientException,
paramiko.SSHException):
pass

try:
for target in target_list:
self._upload_rule(
rule_group, target['host'], target['port'],
target['user'], target['password'], target['path'],
pm_job.id)
for uri in reload_list:
self.reload_prom_server(context, uri)
except (sol_ex.PrometheusPluginError, ks_exc.ClientException,
paramiko.SSHException) as e:
LOG.error("failed to upload rule files: %s", e.args[0])
_cleanup_error(target_list)
raise e
except Exception as e:
_cleanup_error(target_list)
raise e

def get_vnf_instances(self, context, pm_job):
object_instance_ids = list(set(pm_job.objectInstanceIds))
return dict(zip(
@ -651,7 +643,7 @@ class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
target_list, reload_list = self.get_access_info(pm_job)
metrics = self.decompose_metrics(pm_job)
inst_map = self.get_vnf_instances(context, pm_job)
rules = sum([self._make_rules(context, pm_job, metric, inst_map)
rules = sum([self._make_rules(pm_job, metric, inst_map)
for metric in metrics], [])
if len(rules) == 0:
raise sol_ex.PrometheusPluginError(
@ -666,7 +658,7 @@ class PrometheusPluginPm(PrometheusPlugin, mon_base.MonitoringPlugin):
]
}
self.upload_rules(
context, target_list, reload_list, rule_group, pm_job)
context, target_list, reload_list, rule_group, pm_job.id)
return rule_group

@ -13,6 +13,7 @@
# License for the specific language governing permissions and limitations
# under the License.

import copy
from oslo_utils import uuidutils


@ -588,6 +589,95 @@ def terminate_vnf_min():
}


def pm_job_external(callback_uri, inst_id, host_ip, rsc_id):
def pm_job(
callback_uri, inst_id, host_ip,
object_type, performance_metric,
sub_object_instance_id=None):
job = {
"objectType": object_type,
"objectInstanceIds": [inst_id],
"subObjectInstanceIds": ([sub_object_instance_id]
if sub_object_instance_id else []),
"criteria": {
"performanceMetric": [performance_metric],
"performanceMetricGroup": [],
"collectionPeriod": 30,
"reportingPeriod": 90
},
"callbackUri": callback_uri,
"metadata": {
"monitoring": {
"monitorName": "prometheus",
"driverType": "external",
"targetsInfo": [
{
"prometheusHost": host_ip,
"prometheusHostPort": 50022,
"authInfo": {
"ssh_username": "root",
"ssh_password": "root"
},
"alertRuleConfigPath":
"/tmp",
"prometheusReloadApiEndpoint":
"http://localhost:9990/-/reload",
}
]
}
}
}
return copy.deepcopy(job)
return [
pm_job(callback_uri, inst_id, host_ip, "Vnf",
f"VCpuUsageMeanVnf.{inst_id}"),
pm_job(callback_uri, inst_id, host_ip, "Vnf",
f"VCpuUsagePeakVnf.{inst_id}"),
pm_job(callback_uri, inst_id, host_ip, "Vnf",
f"VMemoryUsageMeanVnf.{inst_id}"),
pm_job(callback_uri, inst_id, host_ip, "Vnf",
f"VMemoryUsagePeakVnf.{inst_id}"),
pm_job(callback_uri, inst_id, host_ip, "Vnf",
f"VDiskUsageMeanVnf.{inst_id}"),
pm_job(callback_uri, inst_id, host_ip, "Vnf",
f"VDiskUsagePeakVnf.{inst_id}"),
pm_job(callback_uri, inst_id, host_ip, "Vnfc",
f"VCpuUsageMeanVnf.{inst_id}",
sub_object_instance_id=rsc_id),
pm_job(callback_uri, inst_id, host_ip, "Vnfc",
f"VCpuUsagePeakVnf.{inst_id}",
sub_object_instance_id=rsc_id),
pm_job(callback_uri, inst_id, host_ip, "Vnfc",
f"VMemoryUsageMeanVnf.{inst_id}",
sub_object_instance_id=rsc_id),
pm_job(callback_uri, inst_id, host_ip, "Vnfc",
f"VMemoryUsagePeakVnf.{inst_id}",
sub_object_instance_id=rsc_id),
pm_job(callback_uri, inst_id, host_ip, "Vnfc",
f"VDiskUsageMeanVnf.{inst_id}",
sub_object_instance_id=rsc_id),
pm_job(callback_uri, inst_id, host_ip, "Vnfc",
f"VDiskUsagePeakVnf.{inst_id}",
sub_object_instance_id=rsc_id),
pm_job(callback_uri, inst_id, host_ip, "VnfIntCp",
"ByteIncomingVnfIntCp", sub_object_instance_id="eth0"),
pm_job(callback_uri, inst_id, host_ip, "VnfIntCp",
"PacketIncomingVnfIntCp", sub_object_instance_id="eth0"),
pm_job(callback_uri, inst_id, host_ip, "VnfIntCp",
"ByteOutgoingVnfIntCp", sub_object_instance_id="eth0"),
pm_job(callback_uri, inst_id, host_ip, "VnfIntCp",
"PacketOutgoingVnfIntCp", sub_object_instance_id="eth0"),
pm_job(callback_uri, inst_id, host_ip, "VnfExtCp",
"ByteIncomingVnfExtCp", sub_object_instance_id="eth1"),
pm_job(callback_uri, inst_id, host_ip, "VnfExtCp",
"PacketIncomingVnfExtCp", sub_object_instance_id="eth1"),
pm_job(callback_uri, inst_id, host_ip, "VnfExtCp",
"ByteOutgoingVnfExtCp", sub_object_instance_id="eth1"),
pm_job(callback_uri, inst_id, host_ip, "VnfExtCp",
"PacketOutgoingVnfExtCp", sub_object_instance_id="eth1")
]


def pm_job_min(callback_uri, inst_id, host_ip):
return {
"objectType": "Vnf",
Binary file not shown.
@ -0,0 +1,63 @@
FROM python:3.8.13

## setup and run
# [usage:]
# docker build -t tacker-monitoring-test .
# docker run -v ${PWD}/src:/work/src -v ${PWD}/rules:/etc/prometheus/rules -p 55555:55555 -p 50022:22 -e TEST_REMOTE_URI="http://<nfvo_addr>:<port>" -it tacker-monitoring-test
#
# (under proxy environment)
# sudo docker build --build-arg PROXY=$http_proxy -t tacker-monitoring-test .
# docker run -v ${PWD}/src:/work/src -v ${PWD}/rules:/etc/prometheus/rules -p 55555:55555 -p 50022:22 -e TEST_REMOTE_URI="http://<nfvo_addr>:<port>" -it tacker-monitoring-test
#
# [api:]
# curl -X POST http://<<this_tool's_url>>:55555/v2/tenant_id/servers/server_id/alarms -d '{"fault_action": "http://<<tacker_uri>>", "fault_id": "2222"}' -i
# curl -X DELETE http://<<this_tool's_url>>:55555/v2/tenant_id/servers/server_id/alarms/<<alarm_id>> -i

ARG PROXY
ENV http_proxy ${PROXY}
ENV https_proxy ${PROXY}
ENV HTTP_PROXY ${PROXY}
ENV HTTPS_PROXY ${PROXY}

USER root
RUN useradd -m user
RUN if [ ! -z "${MS_UID}" -a "${MS_UID}" -ge 1000 ] ;\
then usermod -u ${MS_UID} user ;\
else usermod -u 1000 user ; \
fi

# SSH server
RUN apt-get update && \
apt-get install -y --no-install-recommends openssh-server && \
rm -rf /var/lib/apt/lists/* && \
echo "root:root" | chpasswd && \
sed -i "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/" /etc/ssh/sshd_config

RUN pip install --upgrade pip
COPY requirements.txt /tmp/requirements.txt
RUN pip install --default-timeout=1000 --no-cache-dir -r /tmp/requirements.txt
COPY entrypoint.sh /tmp/entrypoint.sh
RUN mkdir -p /work/src && chmod 777 /work/src
RUN mkdir -p /etc/prometheus/rules && chmod 777 /etc/prometheus/rules

# prometheus & promtool
ARG PROM_VERSION="2.37.5"
RUN cd /tmp && \
wget -q https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz && \
tar zxf /tmp/prometheus-${PROM_VERSION}.linux-amd64.tar.gz -C /usr/local/src/&& \
ln -s /usr/local/src/prometheus-${PROM_VERSION}.linux-amd64/prometheus /usr/bin/prometheus && \
ln -s /usr/local/src/prometheus-${PROM_VERSION}.linux-amd64/promtool /usr/bin/promtool

ENV http_proxy ''
ENV https_proxy ''
ENV HTTP_PROXY ''
ENV HTTPS_PROXY ''

EXPOSE 55555
EXPOSE 22

#USER user
WORKDIR /work
RUN chown "user:user" /tmp/entrypoint.sh
RUN chmod +x /tmp/entrypoint.sh
CMD [ "/tmp/entrypoint.sh" ]
@ -0,0 +1,5 @@
#!/bin/bash

service ssh start

python3 /work/src/testserver.py
@ -0,0 +1,256 @@
# Copyright (C) 2022 Fujitsu
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

import copy
import json
import os
import threading
import urllib
import urllib.request
import uuid

from datetime import datetime
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler
from http.server import HTTPServer

PORT = 55555
PROM_RULE_DIR = '/etc/prometheus/rules'

server_notification_alarm_map = {}


_body_base = {
'receiver': 'receiver',
'status': 'firing',
'alerts': [
],
'groupLabels': {},
'commonLabels': {
'alertname': 'NodeInstanceDown',
'job': 'node'
},
'commonAnnotations': {
'description': 'sample'
},
'externalURL': 'http://controller147:9093',
'version': '4',
'groupKey': '{}:{}',
'truncatedAlerts': 0
}


class PeriodicTask():
def __init__(self):
self.remote_url = os.getenv('TEST_REMOTE_URI')
print(f"url: {str(self.remote_url)}")
self.schedule_next()
self.stored_alerts_fm = {}

def schedule_next(self):
self.timer = threading.Timer(10, self.run)
self.timer.start()

def server_notification_task(self):
print("server_notification_task: num of items: %s" %
str(len(server_notification_alarm_map.keys())))
for v in server_notification_alarm_map.values():
try:
if ('fault_action' in v and 'fault_id' in v):
url = v['fault_action']
body = {
'notification': {
'alarm_id': v['alarm_id'],
'fault_id': v['fault_id'],
'fault_type': '10'
}
}
headers = {
'Content-Type': 'application/json',
}

req = urllib.request.Request(
url, json.dumps(body).encode('utf-8'), headers)
with urllib.request.urlopen(req) as res:
print(f"res status: {str(res.status)}")
except Exception as ex:
print(str(ex))

def _prometheus_plugin_task(self, grp, filename):
alerts_pm = []
alerts_fm = []
alerts_auto_scale = []
use_stored_alerts = False

if filename in self.stored_alerts_fm:
print("use_stored_alerts")
stored_alerts = self.stored_alerts_fm[filename]
for a in stored_alerts:
a['status'] = 'resolved'
del self.stored_alerts_fm[filename]
alerts_fm = stored_alerts
use_stored_alerts = True

for rule in grp['rules']:
if 'labels' not in rule or 'function_type' not in rule['labels']:
continue
alt = {
'status': 'firing',
'labels': rule['labels'],
'annotations': {'value': 99},
'startsAt': datetime.now().isoformat(),
'fingerprint': str(uuid.uuid4())
}
if rule['labels']['function_type'] == 'vnfpm':
alt['annotations'] = {'value': 99}
alerts_pm.append(alt)
if (not use_stored_alerts and
rule['labels']['function_type'] == 'vnffm'):
alt['annotations'] = {
'fault_type': 'fault_type',
'probable_cause': 'probable_cause'}
alerts_fm.append(alt)
if rule['labels']['function_type'] == 'auto_scale':
alt['annotations'] = {}
alerts_auto_scale.append(alt)

if not use_stored_alerts and len(alerts_fm) > 0:
self.stored_alerts_fm[filename] = alerts_fm

return (alerts_pm, alerts_fm, alerts_auto_scale)
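
The test server turns each rule it finds under `PROM_RULE_DIR` into a fake alert and posts it to a Tacker endpoint chosen by the rule's `function_type` label: `vnfpm` goes to `/pm_event`, `vnffm` to `/alert`, and `auto_scale` to `/alert/vnf_instances`. A condensed sketch of just that routing, leaving out the resolved-alert replay for `vnffm` and the HTTP posting; `route_rules`, `ENDPOINTS`, and the sample labels are hypothetical.

```python
# function_type label -> Tacker endpoint path, as used by the test server.
ENDPOINTS = {
    'vnfpm': '/pm_event',
    'vnffm': '/alert',
    'auto_scale': '/alert/vnf_instances',
}


def route_rules(rules):
    """Hypothetical helper: group rule labels by the endpoint they go to.

    Rules without labels, without a function_type label, or with an unknown
    function_type are ignored, as in the test server.
    """
    routed = {}
    for rule in rules:
        function_type = rule.get('labels', {}).get('function_type')
        path = ENDPOINTS.get(function_type)
        if path is None:
            continue
        routed.setdefault(path, []).append(rule['labels'])
    return routed


rules = [
    {'labels': {'function_type': 'vnfpm', 'job_id': 'job-1'}},
    {'labels': {'function_type': 'auto_scale'}},
    {'expr': 'up == 0'},  # no labels: skipped
]
print(route_rules(rules))
```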

def prometheus_plugin_task(self):
print(f"prometheus_plugin_task: {PROM_RULE_DIR}")
for entry in os.scandir(path=PROM_RULE_DIR):
if not entry.is_file():
continue
print(f"file: {entry.name}")
try:
with open(PROM_RULE_DIR + '/' + entry.name) as f:
rules = json.load(f)
if 'groups' not in rules:
continue
for grp in rules['groups']:
if 'rules' not in grp:
continue
pm, fm, scale = self._prometheus_plugin_task(
grp, entry.name)
for x in [(pm, '/pm_event'), (fm, '/alert'),
(scale, '/alert/vnf_instances')]:
if len(x[0]) == 0:
continue
body = copy.deepcopy(_body_base)
body['alerts'] = x[0]
headers = {'Content-Type': 'application/json'}
url = self.remote_url + x[1]
req = urllib.request.Request(
url, json.dumps(body).encode('utf-8'),
headers, method='POST')
print(f"uri: {str(url)}")
print(f"body: {str(body)}")
with urllib.request.urlopen(req) as res:
print(f"res status: {str(res.status)}")

except Exception as ex:
print(str(ex))

def run(self):
print("PeriodicTask run()")
self.server_notification_task()
self.prometheus_plugin_task()
self.schedule_next()


class TestHttpServer(BaseHTTPRequestHandler):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)

def send_response(self, response_code, response_body):
super().send_response(response_code)
self.send_header("Content-type", "application/json")
self.send_header("Content-Length", str(len(response_body)))
self.end_headers()
self.wfile.write(response_body)

def do_GET(self):
print(f"GET {self.path}")
response_body = '{"result": |