Removing subprocess call from system metrics classes
Changed the IO, Disk and Network classes to use psutil instead of a subprocess to collect metrics. Also changed the Linux system metrics classes to subclass AgentCheck rather than the old-style Check class, added additional configuration, and changed monasca-setup to support it. Fixed some Python 2.6-incompatible string formatting issues.

Change-Id: I1f8b65bf48e48e2c598aa4950c194fbae2f9e337
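The core of the change is replacing shelled-out tools like `df` with direct library calls (psutil in the commit itself). As a minimal sketch of the same idea using only the stdlib `os.statvfs` (psutil is not assumed to be installed here), the disk space and inode percentages can be computed without spawning a subprocess:

```python
import os

def disk_usage_perc(mount_point):
    """Return (space_used_perc, inode_used_perc) for a mount point
    without spawning `df` -- the same idea as the psutil-based checks."""
    st = os.statvfs(mount_point)
    total_blocks = st.f_blocks
    used_blocks = st.f_blocks - st.f_bfree
    space_used_perc = float(used_blocks) / total_blocks * 100 if total_blocks else 0.0
    total_inodes = st.f_files
    used_inodes = st.f_files - st.f_ffree
    inode_used_perc = float(used_inodes) / total_inodes * 100 if total_inodes else 0.0
    return space_used_perc, inode_used_perc

space, inodes = disk_usage_perc('/')
```

Calling into the kernel directly avoids the process-spawn overhead and the platform-specific output parsing that the deleted `df`/`iostat` code below needed.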
This commit is contained in:
parent c0225d49ad
commit b1de7db1f5
README.md (25 lines changed)
@@ -74,7 +74,6 @@
- [Dimensions](#dimensions)
- [Cross-Tenant Metric Submission](#cross-tenant-metric-submission)
- [Statsd](#statsd)
- [Log Parsing](#log-parsing)
- [License](#license)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->
@@ -86,7 +85,6 @@ The Monasca Agent is a modern Python monitoring agent for gathering metrics and
* System metrics such as cpu and memory utilization.
* Nagios plugins. The Monasca Agent can run Nagios plugins and send the status code returned by the plugin as a metric to the Monasca API.
* Statsd. The Monasca Agent supports an integrated Statsd daemon which can be used by applications via a statsd client library.
* Retrieving metrics from log files written in a specific format.
* Host alive. The Monasca Agent can perform active checks on a host to determine if it is alive using ping (ICMP) or SSH.
* Process checks. The Monasca Agent can check a process and return several metrics on the process such as number of instances, memory, io and threads.
* Http Endpoint checks. The Monasca Agent can perform active checks on http endpoints by sending an HTTP request to an API.
@@ -370,12 +368,14 @@ This section documents the system metrics that are sent by the Agent. This sect
| cpu.stolen_perc | | Percentage of stolen CPU time, i.e. the time spent in other OS contexts when running in a virtualized environment |
| cpu.system_perc | | Percentage of time the CPU is used at the system level |
| cpu.user_perc | | Percentage of time the CPU is used at the user level |
| disk.inode_used_perc | device | The percentage of inodes that are used on a device |
| disk.space_used_perc | device | The percentage of disk space that is being used on a device |
| disk.inode_used_perc | device, mount_point | The percentage of inodes that are used on a device |
| disk.space_used_perc | device, mount_point | The percentage of disk space that is being used on a device |
| io.read_kbytes_sec | device | Kbytes/sec read by an io device |
| io.read_req_sec | device | Number of read requests/sec to an io device |
| io.read_time_sec | device | Amount of read time/sec to an io device |
| io.write_kbytes_sec | device | Kbytes/sec written by an io device |
| io.write_req_sec | device | Number of write requests/sec to an io device |
| io.write_time_sec | device | Amount of write time/sec to an io device |
| load.avg_1_min | | The average system load over a 1 minute period |
| load.avg_5_min | | The average system load over a 5 minute period |
| load.avg_15_min | | The average system load over a 15 minute period |
@@ -390,12 +390,14 @@ This section documents the system metrics that are sent by the Agent. This sect
| mem.used_buffers | | Number of buffers being used by the kernel for block io |
| mem.used_cached | | Memory used for the page cache |
| mem.used_shared | | Memory shared between separate processes and typically used for inter-process communication |
| net.in_bytes | device | Number of network bytes received |
| net.out_bytes | device | Number of network bytes sent |
| net.in_packets | device | Number of network packets received |
| net.out_packets | device | Number of network packets sent |
| net.in_errors | device | Number of network errors on incoming network traffic |
| net.out_errors | device | Number of network errors on outgoing network traffic |
| net.in_bytes_sec | device | Number of network bytes received per second |
| net.out_bytes_sec | device | Number of network bytes sent per second |
| net.in_packets_sec | device | Number of network packets received per second |
| net.out_packets_sec | device | Number of network packets sent per second |
| net.in_errors_sec | device | Number of network errors on incoming network traffic per second |
| net.out_errors_sec | device | Number of network errors on outgoing network traffic per second |
| net.in_packets_dropped_sec | device | Number of inbound network packets dropped per second |
| net.out_packets_dropped_sec | device | Number of outbound network packets dropped per second |
| monasca.thread_count | service=monitoring component=monasca-agent | Number of threads that the collector is consuming for this collection run |
| monasca.emit_time_sec | service=monitoring component=monasca-agent | Amount of time that the collector took for sending the collected metrics to the Forwarder for this collection run |
| monasca.collection_time_sec | service=monitoring component=monasca-agent | Amount of time that the collector took for this collection run |
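The `*_sec` network metrics in the table above are rates derived from cumulative interface counters. A sketch of how two counter samples become a per-second value (the function name is hypothetical, not from the agent):

```python
def rate_per_sec(prev_value, prev_ts, cur_value, cur_ts):
    """Convert two samples of a monotonically increasing counter
    (e.g. bytes received on an interface) into a per-second rate."""
    delta_t = cur_ts - prev_ts
    if delta_t <= 0:
        return 0.0
    delta_v = cur_value - prev_value
    if delta_v < 0:
        # The counter wrapped or the interface was reset; skip this sample.
        return 0.0
    return delta_v / float(delta_t)
```

For example, 1500 bytes received over a 2 second interval yields a `net.in_bytes_sec` value of 750.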
@@ -1181,8 +1183,5 @@ Here are some examples of how code can be instrumented using calls to monasca-st
```

# Log Parsing
TBD

# License
Copyright (c) 2014 Hewlett-Packard Development Company, L.P.
@@ -65,9 +65,6 @@ dimensions: {args.dimensions}
# time to wait between collection runs
check_freq: {args.check_frequency}

# Use mount points instead of volumes to track disk and fs metrics
use_mount: no

# Change port the Agent is listening to
# listen_port: 17123
@@ -81,24 +78,6 @@ use_mount: no
# https://github.com/DataDog/dd-agent/wiki/Network-Traffic-and-Proxy-Configuration
# non_local_traffic: no

# ========================================================================== #
# System Metrics configuration                                               #
# ========================================================================== #

# Enabled categories of system metrics to collect
# Current system metrics available: cpu,disk,io,load,memory
system_metrics: cpu,disk,io,load,memory

# Disk and IO #
# Some infrastructures have many constantly changing virtual devices (e.g. folks
# running constantly churning linux containers) whose metrics aren't
# interesting. To filter out a particular pattern of devices
# from disk and IO collection, configure a regex here:
# device_blacklist_re: .*\/dev\/mapper\/lxc-box.*

# For disk metrics it is also possible to ignore entire filesystem types
ignore_filesystem_types: tmpfs,devtmpfs

[Statsd]
# ========================================================================== #
# Monasca Statsd configuration                                               #
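The `device_blacklist_re` option above is compiled into a regular expression and matched against device names during collection. A sketch of that filtering step (the helper name is illustrative, not the agent's API):

```python
import re

def filter_devices(devices, blacklist_pattern=None):
    """Drop device names matching the configured device_blacklist_re."""
    if not blacklist_pattern:
        return list(devices)
    blacklist_re = re.compile(blacklist_pattern)
    return [d for d in devices if not blacklist_re.match(d)]

devs = ['/dev/sda1', '/dev/mapper/lxc-box1', '/dev/mapper/lxc-box2']
kept = filter_devices(devs, r'.*/dev/mapper/lxc-box.*')
# kept == ['/dev/sda1']
```

Note that `re.match` anchors at the start of the string, which is why the configured pattern begins with `.*`.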
@@ -0,0 +1,5 @@
init_config:

instances:
    # Cpu check only supports one configured instance
    - name: cpu_stats
@@ -0,0 +1,19 @@
init_config:

instances:
    # Disk check only supports one configured instance
    - name: disk_stats
      # Add a mount_point dimension to the disk system metrics (Default = True)
      #use_mount: False

      # Send additional disk i/o system metrics (Default = True)
      #send_io_stats: False

      # Some infrastructures have many constantly changing virtual devices (e.g. folks
      # running constantly churning linux containers) whose metrics aren't
      # interesting. To filter out a particular pattern of devices
      # from the disk collection, configure a regex here:
      # device_blacklist_re: .*\/dev\/mapper\/lxc-box.*

      # For disk metrics it is also possible to ignore entire filesystem types
      ignore_filesystem_types: tmpfs,devtmpfs
@@ -0,0 +1,5 @@
init_config:

instances:
    # Load check only supports one configured instance
    - name: load_stats
@@ -0,0 +1,5 @@
init_config:

instances:
    # Memory check only supports one configured instance
    - name: memory_stats
@@ -2,10 +2,9 @@ init_config:

instances:
    # Network check only supports one configured instance
    - collect_connection_state: false
      excluded_interfaces:
    - excluded_interfaces:
        - lo
        - lo0
      # Optionally completely ignore any network interface
      # matching the given regex:
      #excluded_interface_re: my-network-interface.*
      #excluded_interface_re: my-network-interface
@@ -1,2 +0,0 @@
site_name: monasca-agent
repo_url: https://github.com/stackforge/monasca-agent
@@ -260,7 +260,7 @@ class Check(util.Dimensions):
        except Exception:
            pass
        if prettyprint:
            print("Metrics: {}".format(metrics))
            print("Metrics: {0}".format(metrics))
        return metrics
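The change from `"{}"` to `"{0}"` here (and in the hunks below) exists because `str.format` on Python 2.6 requires explicit positional field indices; automatic field numbering was only added in Python 2.7. Both spellings behave identically on 2.7 and later:

```python
# Explicit field indices work on Python 2.6 and later;
# the bare "{}" auto-numbered form raises ValueError on 2.6.
msg = "Metrics: {0}".format([1, 2, 3])
pair = "{0}={1}".format("cpu.user_perc", 12.5)
```

Using `{0}`-style indices everywhere keeps the agent runnable on 2.6 without changing output on newer interpreters.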
@@ -442,11 +442,11 @@ class AgentCheck(util.Dimensions):
        metrics = self.aggregator.flush()
        if prettyprint:
            for metric in metrics:
                print(" Timestamp: {}".format(metric.timestamp))
                print(" Name: {}".format(metric.name))
                print(" Value: {}".format(metric.value))
                print(" Timestamp: {0}".format(metric.timestamp))
                print(" Name: {0}".format(metric.name))
                print(" Value: {0}".format(metric.value))
                if (metric.delegated_tenant):
                    print(" Delegtd ID: {}".format(metric.delegated_tenant))
                    print(" Delegtd ID: {0}".format(metric.delegated_tenant))

                print(" Dimensions: ", end='')
                line = 0
@@ -606,7 +606,7 @@ def run_check(name, path=None):
    # Read the config file
    config = Config()
    confd_path = path or os.path.join(config.get_confd_path(),
                                      '{}.yaml'.format(name))
                                      '{0}.yaml'.format(name))

    try:
        f = open(confd_path)
@@ -1,7 +1,6 @@
# Core modules
import logging
import socket
import system.unix as u
import system.win32 as w32
import threading
import time
@@ -40,24 +39,16 @@ class Collector(util.Dimensions):
        self.initialized_checks_d = []
        self.init_failed_checks_d = []

        # add system checks
        # add windows system checks
        if self.os == 'windows':
            self._checks = [w32.Disk(log),
                            w32.IO(log),
                            w32.Processes(log),
                            w32.Memory(log),
                            w32.Network(log),
                            w32.Cpu(log)]
                            w32.IO(log),
                            w32.Processes(log),
                            w32.Memory(log),
                            w32.Network(log),
                            w32.Cpu(log)]
        else:
            possible_checks = {'cpu': u.Cpu,
                               'disk': u.Disk,
                               'io': u.IO,
                               'load': u.Load,
                               'memory': u.Memory}
            self._checks = []
            # Only setup the configured system checks
            for check in self.agent_config.get('system_metrics', []):
                self._checks.append(possible_checks[check](log, self.agent_config))

        if checksd:
            # is of type {check_name: check}
@@ -110,7 +110,7 @@ class ServicesCheck(monasca_agent.collector.checks.AgentCheck):
                self.resultsq.put(result)

            except Exception:
                self.log.exception('Failure in ServiceCheck {}'.format(name))
                self.log.exception('Failure in ServiceCheck {0}'.format(name))
                result = (FAILURE, FAILURE, FAILURE, FAILURE)
                self.resultsq.put(result)
@ -1,558 +0,0 @@
|
|||
"""Unix system checks.
|
||||
"""
|
||||
|
||||
# stdlib
|
||||
import logging
|
||||
import psutil
|
||||
import re
|
||||
import subprocess as sp
|
||||
import sys
|
||||
import time
|
||||
|
||||
# project
|
||||
|
||||
import monasca_agent.collector.checks.check as check
|
||||
import monasca_agent.common.metrics as metrics
|
||||
import monasca_agent.common.util as util
|
||||
|
||||
|
||||
# locale-resilient float converter
|
||||
to_float = lambda s: float(s.replace(",", "."))
|
||||
|
||||
|
||||
class Disk(check.Check):
    """Collects metrics about the machine's disks.
    """

    def check(self):
        """Get disk space/inode stats.
        """
        # psutil can be used for disk usage stats but not for inode information, so df is used.

        fs_types_to_ignore = []
        # First get the configuration.
        if self.agent_config is not None:
            use_mount = self.agent_config.get("use_mount", False)
            blacklist_re = self.agent_config.get('device_blacklist_re', None)
            for fs_type in self.agent_config.get('ignore_filesystem_types', []):
                fs_types_to_ignore.extend(['-x', fs_type])
        else:
            use_mount = False
            blacklist_re = None
        platform_name = sys.platform

        try:
            dfk_out = _get_subprocess_output(['df', '-k'] + fs_types_to_ignore)
            stats = self.parse_df_output(
                dfk_out,
                platform_name,
                use_mount=use_mount,
                blacklist_re=blacklist_re
            )

            # Collect inode metrics.
            dfi_out = _get_subprocess_output(['df', '-i'] + fs_types_to_ignore)
            inodes = self.parse_df_output(
                dfi_out,
                platform_name,
                inodes=True,
                use_mount=use_mount,
                blacklist_re=blacklist_re
            )
            # parse into a list of Measurements
            stats.update(inodes)
            timestamp = time.time()
            measurements = [metrics.Measurement(key.split('.', 1)[1],
                                                timestamp,
                                                value,
                                                self._set_dimensions({'device': key.split('.', 1)[0]}),
                                                None)
                            for key, value in stats.iteritems()]

            return measurements

        except Exception:
            self.logger.exception('Error collecting disk stats')
            return []

    def parse_df_output(
            self, df_output, platform_name, inodes=False, use_mount=False, blacklist_re=None):
        """Parse the output of the df command.

        If use_mount is false the volume is used to anchor the metric, otherwise the mount
        point is used. Returns a tuple of (disk, inode).
        """
        usage_data = {}

        # Transform the raw output into tuples of the df data.
        devices = self._transform_df_output(df_output, blacklist_re)

        # If we want to use the mount point, replace the volume name on each line.
        for parts in devices:
            try:
                if use_mount:
                    parts[0] = parts[-1]
                if inodes:
                    if util.Platform.is_darwin(platform_name):
                        # Filesystem 512-blocks Used Available Capacity iused ifree %iused Mounted
                        # Inodes are in position 5, 6 and we need to compute the total
                        parts[1] = int(parts[5]) + int(parts[6])  # Total
                        parts[2] = int(parts[5])  # Used
                        parts[3] = int(parts[6])  # Available
                    elif util.Platform.is_freebsd(platform_name):
                        # Filesystem 1K-blocks Used Avail Capacity iused ifree %iused Mounted
                        # Inodes are in position 5, 6 and we need to compute the total
                        parts[1] = int(parts[5]) + int(parts[6])  # Total
                        parts[2] = int(parts[5])  # Used
                        parts[3] = int(parts[6])  # Available
                    else:
                        parts[1] = int(parts[1])  # Total
                        parts[2] = int(parts[2])  # Used
                        parts[3] = int(parts[3])  # Available
                else:
                    parts[1] = int(parts[1])  # Total
                    parts[2] = int(parts[2])  # Used
                    parts[3] = int(parts[3])  # Available
            except IndexError:
                self.logger.exception("Cannot parse %s" % (parts,))

            # Some partitions (EFI boot) may appear to have 0 available inodes
            if parts[1] == 0:
                continue

            # Remote shared storage device names like '10.103.0.220:/instances'
            # cause invalid metrics on the api server side, so if we encounter
            # a colon, remove everything to the left of it (including the
            # offending colon).
            device_name = parts[0]
            idx = device_name.find(":")
            if idx > 0:
                device_name = device_name[(idx + 1):]
            if inodes:
                usage_data['%s.disk.inode_used_perc' % device_name] = float(parts[2]) / parts[1] * 100
            else:
                usage_data['%s.disk.space_used_perc' % device_name] = float(parts[2]) / parts[1] * 100

        return usage_data

    @staticmethod
    def _is_number(a_string):
        try:
            float(a_string)
        except ValueError:
            return False
        return True

    def _is_real_device(self, device):
        """Return true if we should track the given device name and false otherwise.
        """
        # First, skip empty lines.
        if not device or len(device) <= 1:
            return False

        # Filter out fake devices.
        device_name = device[0]
        if device_name == 'none':
            return False

        # Now filter out fake hosts like 'map -hosts'. For example:
        #     Filesystem 1024-blocks Used Available Capacity Mounted on
        #     /dev/disk0s2 244277768 88767396 155254372 37% /
        #     map -hosts 0 0 0 100% /net
        blocks = device[1]
        if not self._is_number(blocks):
            return False
        return True

    def _flatten_devices(self, devices):
        # Some volumes are stored on their own line. Rejoin them here.
        previous = None
        for parts in devices:
            if len(parts) == 1:
                previous = parts[0]
            elif previous and self._is_number(parts[0]):
                # collate with previous line
                parts.insert(0, previous)
                previous = None
            else:
                previous = None
        return devices

    def _transform_df_output(self, df_output, blacklist_re):
        """Given raw output for the df command, transform it into a normalized list of devices.

        A 'device' is a list with fields corresponding to the output of df output on each platform.
        """
        all_devices = [l.strip().split() for l in df_output.split("\n")]

        # Skip the header row and empty lines.
        raw_devices = [l for l in all_devices[1:] if l]

        # Flatten the disks that appear in multiple lines.
        flattened_devices = self._flatten_devices(raw_devices)

        # Filter fake disks.
        def keep_device(device):
            if not self._is_real_device(device):
                return False
            if blacklist_re and blacklist_re.match(device[0]):
                return False
            return True

        devices = filter(keep_device, flattened_devices)

        return devices
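The deleted `parse_df_output`/`_transform_df_output` path above turns raw `df -k` text into per-device usage percentages, skipping the header and pseudo-device rows. A self-contained sketch of that transformation (the sample output is illustrative, not from a real run):

```python
def parse_df(df_output):
    """Split `df -k`-style output into a {device: space_used_perc} dict,
    skipping the header row and fake devices like 'map -hosts'."""
    results = {}
    for line in df_output.splitlines()[1:]:
        parts = line.split()
        # A real device row has at least device/total/used/available columns
        # and a numeric block count in position 1.
        if len(parts) < 4 or not parts[1].isdigit():
            continue
        total, used = int(parts[1]), int(parts[2])
        if total == 0:
            continue  # e.g. pseudo filesystems with no capacity
        results[parts[0]] = float(used) / total * 100
    return results

sample = """Filesystem 1K-blocks    Used Available Use% Mounted on
/dev/sda1    10240000 5120000   5120000  50% /
none                0       0         0    - /dev"""
usage = parse_df(sample)
```

Here `usage` contains only `/dev/sda1` at 50.0 percent; the zero-capacity `none` row is dropped, mirroring the checks in the deleted code.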
class IO(check.Check):

    def __init__(self, logger, agent_config=None):
        super(IO, self).__init__(logger, agent_config)
        self.header_re = re.compile(r'([%\\/\-_a-zA-Z0-9]+)[\s+]?')
        self.item_re = re.compile(r'^([a-zA-Z0-9\/]+)')
        self.value_re = re.compile(r'\d+\.\d+')
        self.stat_blacklist = ["await", "wrqm/s", "avgqu-sz", "r_await", "w_await", "rrqm/s",
                               "avgrq-sz", "%util", "svctm"]

    def _parse_linux2(self, output):
        recent_stats = output.split('Device:')[2].split('\n')
        header = recent_stats[0]
        header_names = re.findall(self.header_re, header)

        io_stats = {}

        for statsIndex in range(1, len(recent_stats)):
            row = recent_stats[statsIndex]

            if not row:
                # Ignore blank lines.
                continue

            device_match = self.item_re.match(row)

            if device_match is not None:
                # Sometimes device names span two lines.
                device = device_match.groups()[0]
            else:
                continue

            values = re.findall(self.value_re, row)

            if not values:
                # Sometimes values are on the next line so we encounter
                # instances of [].
                continue

            io_stats[device] = {}

            for header_index in range(len(header_names)):
                header_name = header_names[header_index]
                io_stats[device][self.xlate(header_name, "linux")] = values[header_index]

        return io_stats

    @staticmethod
    def _parse_darwin(output):
        lines = [l.split() for l in output.split("\n") if len(l) > 0]
        disks = lines[0]
        lastline = lines[-1]
        io = {}
        for idx, disk in enumerate(disks):
            kb_t, tps, mb_s = map(float, lastline[(3 * idx):(3 * idx) + 3])  # 3 cols at a time
            io[disk] = {
                'system.io.bytes_per_s': mb_s * 10 ** 6,
            }
        return io

    @staticmethod
    def xlate(metric_name, os_name):
        """Standardize on linux metric names.
        """
        if os_name == "sunos":
            names = {"wait": "await",
                     "svc_t": "svctm",
                     "%b": "%util",
                     "kr/s": "io.read_kbytes_sec",
                     "kw/s": "io.write_kbytes_sec",
                     "actv": "avgqu-sz"}
        elif os_name == "freebsd":
            names = {"svc_t": "await",
                     "%b": "%util",
                     "kr/s": "io.read_kbytes_sec",
                     "kw/s": "io.write_kbytes_sec",
                     "wait": "avgqu-sz"}
        elif os_name == "linux":
            names = {"rkB/s": "io.read_kbytes_sec",
                     "r/s": "io.read_req_sec",
                     "wkB/s": "io.write_kbytes_sec",
                     "w/s": "io.write_req_sec"}
        # translate if possible
        return names.get(metric_name, metric_name)

    def check(self):
        """Capture io stats.

        @rtype dict
        @return [metrics.Measurement, ]
        """
        # TODO psutil.disk_io_counters() will also return this information but it isn't per second and so must be
        # converted, possibly by doing two runs a second apart or storing timestamp+data from the previous collection
        io = {}
        try:
            if util.Platform.is_linux():
                stdout = sp.Popen(['iostat', '-d', '1', '2', '-x', '-k'],
                                  stdout=sp.PIPE,
                                  close_fds=True).communicate()[0]

                # Linux 2.6.32-343-ec2 (ip-10-35-95-10) 12/11/2012 _x86_64_ (2 CPU)
                #
                # Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
                # sda1 0.00 17.61 0.26 32.63 4.23 201.04 12.48 0.16 4.81 0.53 1.73
                # sdb 0.00 2.68 0.19 3.84 5.79 26.07 15.82 0.02 4.93 0.22 0.09
                # sdg 0.00 0.13 2.29 3.84 100.53 30.61 42.78 0.05 8.41 0.88 0.54
                # sdf 0.00 0.13 2.30 3.84 100.54 30.61 42.78 0.06 9.12 0.90 0.55
                # md0 0.00 0.00 0.05 3.37 1.41 30.01 18.35 0.00 0.00 0.00 0.00
                #
                # Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
                # sda1 0.00 0.00 0.00 10.89 0.00 43.56 8.00 0.03 2.73 2.73 2.97
                # sdb 0.00 0.00 0.00 2.97 0.00 11.88 8.00 0.00 0.00 0.00 0.00
                # sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
                # sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
                # md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
                io.update(self._parse_linux2(stdout))

            elif sys.platform == "sunos5":
                iostat = sp.Popen(["iostat", "-x", "-d", "1", "2"],
                                  stdout=sp.PIPE,
                                  close_fds=True).communicate()[0]

                # extended device statistics <-- since boot
                # device r/s w/s kr/s kw/s wait actv svc_t %w %b
                # ramdisk1 0.0 0.0 0.1 0.1 0.0 0.0 0.0 0 0
                # sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
                # sd1 79.9 149.9 1237.6 6737.9 0.0 0.5 2.3 0 11
                # extended device statistics <-- past second
                # device r/s w/s kr/s kw/s wait actv svc_t %w %b
                # ramdisk1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
                # sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
                # sd1 0.0 139.0 0.0 1850.6 0.0 0.0 0.1 0 1

                # discard the first half of the display (stats since boot)
                lines = [l for l in iostat.split("\n") if len(l) > 0]
                lines = lines[len(lines) / 2:]

                assert "extended device statistics" in lines[0]
                headers = lines[1].split()
                assert "device" in headers
                for l in lines[2:]:
                    cols = l.split()
                    # cols[0] is the device
                    # cols[1:] are the values
                    io[cols[0]] = {}
                    for i in range(1, len(cols)):
                        io[cols[0]][self.xlate(headers[i], "sunos")] = cols[i]

            elif sys.platform.startswith("freebsd"):
                iostat = sp.Popen(["iostat", "-x", "-d", "1", "2"],
                                  stdout=sp.PIPE,
                                  close_fds=True).communicate()[0]

                # Be careful!
                # It looks like SunOS, but some columns (wait, svc_t) have a different meaning
                # extended device statistics
                # device r/s w/s kr/s kw/s wait svc_t %b
                # ad0 3.1 1.3 49.9 18.8 0 0.7 0
                # extended device statistics
                # device r/s w/s kr/s kw/s wait svc_t %b
                # ad0 0.0 2.0 0.0 31.8 0 0.2 0

                # discard the first half of the display (stats since boot)
                lines = [l for l in iostat.split("\n") if len(l) > 0]
                lines = lines[len(lines) / 2:]

                assert "extended device statistics" in lines[0]
                headers = lines[1].split()
                assert "device" in headers
                for l in lines[2:]:
                    cols = l.split()
                    # cols[0] is the device
                    # cols[1:] are the values
                    io[cols[0]] = {}
                    for i in range(1, len(cols)):
                        io[cols[0]][self.xlate(headers[i], "freebsd")] = cols[i]
            elif sys.platform == 'darwin':
                iostat = sp.Popen(['iostat', '-d', '-c', '2', '-w', '1'],
                                  stdout=sp.PIPE,
                                  close_fds=True).communicate()[0]
                # disk0 disk1 <-- number of disks
                # KB/t tps MB/s KB/t tps MB/s
                # 21.11 23 0.47 20.01 0 0.00
                # 6.67 3 0.02 0.00 0 0.00 <-- line of interest
                io = self._parse_darwin(iostat)
            else:
                return []

            # If we filter devices, do it now.
            if self.agent_config is not None:
                device_blacklist_re = self.agent_config.get('device_blacklist_re', None)
            else:
                device_blacklist_re = None
            if device_blacklist_re:
                filtered_io = {}
                for device, stats in io.iteritems():
                    if not device_blacklist_re.match(device):
                        filtered_io[device] = stats
            else:
                filtered_io = io

            measurements = []
            timestamp = time.time()
            for dev_name, stats in filtered_io.iteritems():
                filtered_stats = dict((stat, stats[stat])
                                      for stat in stats.iterkeys() if stat not in self.stat_blacklist)
                m_list = [metrics.Measurement(key,
                                              timestamp,
                                              value,
                                              self._set_dimensions({'device': dev_name}),
                                              None)
                          for key, value in filtered_stats.iteritems()]
                measurements.extend(m_list)

            return measurements

        except Exception:
            self.logger.exception("Cannot extract IO statistics")
            return []
class Load(check.Check):

    def check(self):
        if util.Platform.is_linux():
            try:
                loadAvrgProc = open('/proc/loadavg', 'r')
                uptime = loadAvrgProc.readlines()
                loadAvrgProc.close()
            except Exception:
                self.logger.exception('Cannot extract load')
                return []

            uptime = uptime[0]  # readlines() provides a list but we want a string

        elif sys.platform in ('darwin', 'sunos5') or sys.platform.startswith("freebsd"):
            # Get output from uptime
            try:
                uptime = sp.Popen(['uptime'],
                                  stdout=sp.PIPE,
                                  close_fds=True).communicate()[0]
            except Exception:
                self.logger.exception('Cannot extract load')
                return {}

        # Split out the 3 load average values
        load = [res.replace(',', '.') for res in re.findall(r'([0-9]+[\.,]\d+)', uptime)]
        timestamp = time.time()
        dimensions = self._set_dimensions(None)

        return [metrics.Measurement('load.avg_1_min', timestamp, float(load[0]), dimensions),
                metrics.Measurement('load.avg_5_min', timestamp, float(load[1]), dimensions),
                metrics.Measurement('load.avg_15_min', timestamp, float(load[2]), dimensions)]
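The Load check above parses `/proc/loadavg` (or `uptime` output) by hand. On Unix systems the Python stdlib exposes the same three values directly, which sidesteps both the file parsing and the locale-dependent comma handling (this is an alternative, not what the agent uses):

```python
import os

# os.getloadavg() returns the 1, 5 and 15 minute load averages,
# matching load.avg_1_min / load.avg_5_min / load.avg_15_min.
avg_1, avg_5, avg_15 = os.getloadavg()
load_metrics = {'load.avg_1_min': avg_1,
                'load.avg_5_min': avg_5,
                'load.avg_15_min': avg_15}
```

Note `os.getloadavg()` is only available on Unix platforms; on Windows a different source is needed.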
class Memory(check.Check):

    def check(self):
        mem_info = psutil.virtual_memory()
        swap_info = psutil.swap_memory()
        mem_data = {
            'mem.total_mb': int(mem_info.total / 1048576),
            'mem.free_mb': int(mem_info.free / 1048576),
            'mem.usable_mb': int(mem_info.available / 1048576),
            'mem.usable_perc': float(100 - mem_info.percent),
            'mem.swap_total_mb': int(swap_info.total / 1048576),
            'mem.swap_used_mb': int(swap_info.used / 1048576),
            'mem.swap_free_mb': int(swap_info.free / 1048576),
            'mem.swap_free_perc': float(100 - swap_info.percent)
        }

        if 'buffers' in mem_info:
            mem_data['mem.used_buffers'] = int(mem_info.buffers / 1048576)

        if 'cached' in mem_info:
            mem_data['mem.used_cache'] = int(mem_info.cached / 1048576)

        if 'shared' in mem_info:
            mem_data['mem.used_shared'] = int(mem_info.shared / 1048576)

        timestamp = time.time()
        dimensions = self._set_dimensions(None)
        return [metrics.Measurement(key, timestamp, value, dimensions) for key, value in mem_data.iteritems()]
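A caveat on the Memory check above: `'buffers' in mem_info` on a psutil named tuple tests membership against the *values*, not the field names; field presence is normally checked with `hasattr`. A stdlib sketch of the MB conversion using a stand-in named tuple (the field set is an assumption modeled on psutil's `virtual_memory()`):

```python
from collections import namedtuple

# Stand-in for psutil.virtual_memory(); 'buffers' is Linux-only in psutil.
MemInfo = namedtuple('MemInfo', ['total', 'free', 'available', 'percent', 'buffers'])

def memory_metrics(mem_info):
    """Convert byte counts to the mem.* MB metrics, adding optional
    fields only when the platform provides them (checked via hasattr)."""
    data = {'mem.total_mb': int(mem_info.total / 1048576),
            'mem.free_mb': int(mem_info.free / 1048576),
            'mem.usable_mb': int(mem_info.available / 1048576),
            'mem.usable_perc': float(100 - mem_info.percent)}
    if hasattr(mem_info, 'buffers'):
        data['mem.used_buffers'] = int(mem_info.buffers / 1048576)
    return data

mem = MemInfo(total=8 * 1048576, free=2 * 1048576,
              available=4 * 1048576, percent=50.0, buffers=1048576)
result = memory_metrics(mem)
```

With this sample input, `result['mem.total_mb']` is 8 and `result['mem.usable_perc']` is 50.0.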
class Cpu(check.Check):

    def check(self):
        """Return an aggregate of CPU stats across all CPUs.
        """
        cpu_stats = psutil.cpu_times_percent(percpu=False)
        return self._format_results(cpu_stats.user + cpu_stats.nice,
                                    cpu_stats.system + cpu_stats.irq + cpu_stats.softirq,
                                    cpu_stats.iowait,
                                    cpu_stats.idle,
                                    cpu_stats.steal)

    def _format_results(self, us, sy, wa, idle, st):
        data = {'cpu.user_perc': us,
                'cpu.system_perc': sy,
                'cpu.wait_perc': wa,
                'cpu.idle_perc': idle,
                'cpu.stolen_perc': st}
        for key in data.keys():
            if data[key] is None:
                del data[key]

        timestamp = time.time()
        dimensions = self._set_dimensions(None)
        return [metrics.Measurement(key, timestamp, value, dimensions) for key, value in data.iteritems()]
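`cpu.stolen_perc` is only meaningful in virtualized environments, so `_format_results` drops None values. Deleting from a dict while iterating `data.keys()` works on Python 2 (where `keys()` returns a copied list) but raises on Python 3; a dict comprehension is the version-safe equivalent (the function name here is illustrative):

```python
def format_cpu_results(us, sy, wa, idle, st):
    """Build the cpu.* metric dict, dropping fields that are None
    (e.g. stolen time when not running in a virtualized environment)."""
    data = {'cpu.user_perc': us,
            'cpu.system_perc': sy,
            'cpu.wait_perc': wa,
            'cpu.idle_perc': idle,
            'cpu.stolen_perc': st}
    # Filter without mutating the dict during iteration.
    return {k: v for k, v in data.items() if v is not None}

cpu_metrics = format_cpu_results(10.0, 5.0, 1.0, 84.0, None)
```

With `st=None`, the result contains the four remaining metrics and no `cpu.stolen_perc` key.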
def _get_subprocess_output(command):
    """Run the given subprocess command and return its output.

    Raise an Exception if an error occurs.
    """
    proc = sp.Popen(command, stdout=sp.PIPE, close_fds=True)
    return proc.stdout.read()
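Note that `_get_subprocess_output` above reads the child's stdout but never waits on the process or checks its exit status. The stdlib's `subprocess.check_output` does both, raising `CalledProcessError` on a non-zero exit (a sketch of the alternative, not the agent's code):

```python
import subprocess

def get_command_output(command):
    """Run a command (argument list) and return its stdout as text,
    raising CalledProcessError on a non-zero exit status."""
    return subprocess.check_output(command).decode('utf-8')

out = get_command_output(['echo', 'hello'])
```

Here `out` is the string `'hello\n'`, and a failing command raises instead of silently returning partial output.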
if __name__ == '__main__':
    # 1s loop with results

    logging.basicConfig(level=logging.DEBUG, format='%(asctime)-15s %(message)s')
    log = logging.getLogger()
    config = {"device_blacklist_re": re.compile('.*disk0.*')}
    cpu = Cpu(log, config)
    disk = Disk(log, config)
    io = IO(log, config)
    load = Load(log, config)
    mem = Memory(log, config)

    while True:
        print("=" * 10)
        print("--- IO ---")
        print(io.check())
        print("--- Disk ---")
        print(disk.check())
        print("--- CPU ---")
        print(cpu.check())
        print("--- Load ---")
        print(load.check())
        print("--- Memory ---")
        print(mem.check())
        print("\n\n\n")
        time.sleep(1)
@ -0,0 +1,41 @@
import logging

import psutil

import monasca_agent.collector.checks as checks

log = logging.getLogger(__name__)


class Cpu(checks.AgentCheck):

    def __init__(self, name, init_config, agent_config):
        super(Cpu, self).__init__(name, init_config, agent_config)

    def check(self, instance):
        """Capture cpu stats"""
        dimensions = self._set_dimensions(None, instance)

        cpu_stats = psutil.cpu_times_percent(percpu=False)
        self._format_results(cpu_stats.user + cpu_stats.nice,
                             cpu_stats.system + cpu_stats.irq + cpu_stats.softirq,
                             cpu_stats.iowait,
                             cpu_stats.idle,
                             cpu_stats.steal,
                             dimensions)

    def _format_results(self, us, sy, wa, idle, st, dimensions):
        data = {'cpu.user_perc': us,
                'cpu.system_perc': sy,
                'cpu.wait_perc': wa,
                'cpu.idle_perc': idle,
                'cpu.stolen_perc': st}

        # Remove metrics that are unavailable on this platform
        for key in data.keys():
            if data[key] is None:
                del data[key]

        for key, value in data.iteritems():
            self.gauge(key, value, dimensions)

        log.debug('Collected {0} cpu metrics'.format(len(data)))
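The `_format_results` pattern above (prune metrics whose value is `None`, then submit what remains as gauges) can be sketched in isolation. `format_results` and `submit` below are hypothetical stand-ins for the check method and `self.gauge`, not part of the agent's API:

```python
def format_results(us, sy, wa, idle, st, submit):
    """Filter out unavailable values (None), then submit the rest as gauges."""
    data = {'cpu.user_perc': us,
            'cpu.system_perc': sy,
            'cpu.wait_perc': wa,
            'cpu.idle_perc': idle,
            'cpu.stolen_perc': st}
    # psutil omits fields such as steal on some platforms, so prune None first.
    data = {key: value for key, value in data.items() if value is not None}
    for key, value in data.items():
        submit(key, value)
    return data
```

With `st=None` (no steal time reported), only the four available percentages are submitted.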
@ -0,0 +1,102 @@
import logging
import os
import re

import psutil

import monasca_agent.collector.checks as checks

log = logging.getLogger(__name__)


class Disk(checks.AgentCheck):

    def __init__(self, name, init_config, agent_config):
        super(Disk, self).__init__(name, init_config, agent_config)

    def check(self, instance):
        """Capture disk stats"""
        dimensions = self._set_dimensions(None, instance)

        if instance is not None:
            use_mount = instance.get("use_mount", True)
            send_io_stats = instance.get("send_io_stats", True)
            # If we filter devices, get the list.
            device_blacklist_re = self._get_re_exclusions(instance)
            fs_types_to_ignore = self._get_fs_exclusions(instance)
        else:
            use_mount = True
            send_io_stats = True
            device_blacklist_re = None
            fs_types_to_ignore = []

        partitions = psutil.disk_partitions()
        disk_count = 0
        for partition in partitions:
            if partition.fstype not in fs_types_to_ignore \
                    or (device_blacklist_re
                        and not device_blacklist_re.match(partition.device)):
                device_name = self._get_device_name(partition.device)
                disk_usage = psutil.disk_usage(partition.mountpoint)
                st = os.statvfs(partition.mountpoint)
                if use_mount:
                    dimensions.update({'mount_point': partition.mountpoint})
                self.gauge("disk.space_used_perc",
                           disk_usage.percent,
                           device_name=device_name,
                           dimensions=dimensions)
                disk_count += 1
                if st.f_files > 0:
                    self.gauge("disk.inode_used_perc",
                               round((float(st.f_files - st.f_ffree) / st.f_files) * 100, 2),
                               device_name=device_name,
                               dimensions=dimensions)
                    disk_count += 1

                log.debug('Collected {0} disk usage metrics for partition {1}'.format(
                    disk_count, partition.mountpoint))
                disk_count = 0

                if send_io_stats:
                    disk_stats = psutil.disk_io_counters(perdisk=True)
                    stats = disk_stats[device_name]
                    self.rate("io.read_req_sec", float(stats.read_count),
                              device_name=device_name, dimensions=dimensions)
                    self.rate("io.write_req_sec", float(stats.write_count),
                              device_name=device_name, dimensions=dimensions)
                    self.rate("io.read_kbytes_sec", float(stats.read_bytes / 1024),
                              device_name=device_name, dimensions=dimensions)
                    self.rate("io.write_kbytes_sec", float(stats.write_bytes / 1024),
                              device_name=device_name, dimensions=dimensions)
                    self.rate("io.read_time_sec", float(stats.read_time / 1000),
                              device_name=device_name, dimensions=dimensions)
                    self.rate("io.write_time_sec", float(stats.write_time / 1000),
                              device_name=device_name, dimensions=dimensions)

                    log.debug('Collected 6 disk I/O metrics for partition {0}'.format(
                        partition.mountpoint))

    def _get_re_exclusions(self, instance):
        """Parse the device blacklist regular expression."""
        filter_re = None
        try:
            filter_device_re = instance.get('device_blacklist_re', None)
            if filter_device_re:
                filter_re = re.compile(filter_device_re)
        except re.error:
            log.error('Error processing regular expression {0}'.format(filter_device_re))

        return filter_re

    def _get_fs_exclusions(self, instance):
        """Parse the comma-separated list of file system types to ignore."""
        file_system_list = []
        try:
            file_systems = instance.get('ignore_filesystem_types', None)
            if file_systems:
                file_system_list.extend([x.strip() for x in file_systems.split(',')])
        except ValueError:
            log.info("Unable to process ignore_filesystem_types.")

        return file_system_list

    def _get_device_name(self, device):
        """Return the trailing path component, e.g. "/dev/sda1" -> "sda1"."""
        start = device.rfind("/")
        if start > -1:
            return device[start + 1:]
        return device
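The `disk.inode_used_perc` computation above can be checked in isolation; `inode_used_perc` is a hypothetical helper name, and the zero guard mirrors the `st.f_files > 0` check around pseudo-filesystems:

```python
def inode_used_perc(f_files, f_ffree):
    """Percentage of used inodes, rounded to 2 decimals.

    Pseudo-filesystems report zero total inodes; return None so the
    caller can skip the metric entirely.
    """
    if f_files <= 0:
        return None
    return round((float(f_files) - f_ffree) / f_files * 100, 2)
```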
@ -64,7 +64,7 @@ class HTTPCheck(services_checks.ServicesCheck):
             headers["Content-type"] = "application/json"
         else:
             self.log.warning("""Unable to get token. Keystone API server may be down.
-            Skipping check for {}""".format(addr))
+            Skipping check for {0}""".format(addr))
             return
         try:
             self.log.debug("Connecting to %s" % addr)

@ -69,7 +69,7 @@ consumer_groups:
                     consumer_offsets[(consumer_group, topic, partition)] = kafka_consumer.offsets[partition]
                 except KeyError:
                     kafka_consumer.stop()
-                    self.log.error('Error fetching consumer offset for {} partition {}'.format(topic, partition))
+                    self.log.error('Error fetching consumer offset for {0} partition {1}'.format(topic, partition))
                     kafka_consumer.stop()

         # Query Kafka for the broker offsets, done in a separate loop so only one query is done

@ -82,7 +82,7 @@ consumer_groups:
                 response = kafka_conn.send_offset_request([common.OffsetRequest(topic, p, -1, 1)])
                 offset_responses.append(response[0])
             except common.KafkaError as e:
-                self.log.error("Error fetching broker offset: {}".format(e))
+                self.log.error("Error fetching broker offset: {0}".format(e))

         for resp in offset_responses:
             broker_offsets[(resp.topic, resp.partition)] = resp.offsets[0]
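The string-formatting hunks throughout this commit all make the same change, for the Python 2.6 compatibility the commit message mentions: 2.6's `str.format` predates auto-numbered fields, so every replacement field needs an explicit index. A minimal illustration (the topic name and partition number are made up):

```python
# Explicit field indices work on Python 2.6, 2.7 and 3.x alike:
msg = 'Error fetching consumer offset for {0} partition {1}'.format('alarms', 3)

# The auto-numbered form '... {} partition {}' raises
# "ValueError: zero length field name in format" on Python 2.6.
```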
@ -32,10 +32,10 @@ class LibvirtCheck(AgentCheck):

     def __init__(self, name, init_config, agent_config):
         AgentCheck.__init__(self, name, init_config, agent_config)
-        self.instance_cache_file = "{}/{}".format(self.init_config.get('cache_dir'),
-                                                  'libvirt_instances.yaml')
-        self.metric_cache_file = "{}/{}".format(self.init_config.get('cache_dir'),
-                                                'libvirt_metrics.yaml')
+        self.instance_cache_file = "{0}/{1}".format(self.init_config.get('cache_dir'),
+                                                    'libvirt_instances.yaml')
+        self.metric_cache_file = "{0}/{1}".format(self.init_config.get('cache_dir'),
+                                                  'libvirt_metrics.yaml')

     def _test_vm_probation(self, created):
         """Test to see if a VM was created within the probation period.

@ -79,7 +79,7 @@ class LibvirtCheck(AgentCheck):
             if stat.S_IMODE(os.stat(self.instance_cache_file).st_mode) != 0600:
                 os.chmod(self.instance_cache_file, 0600)
         except IOError as e:
-            self.log.error("Cannot write to {}: {}".format(self.instance_cache_file, e))
+            self.log.error("Cannot write to {0}: {1}".format(self.instance_cache_file, e))

         return id_cache

@ -125,7 +125,7 @@ class LibvirtCheck(AgentCheck):
             if stat.S_IMODE(os.stat(self.metric_cache_file).st_mode) != 0600:
                 os.chmod(self.metric_cache_file, 0600)
         except IOError as e:
-            self.log.error("Cannot write to {}: {}".format(self.metric_cache_file, e))
+            self.log.error("Cannot write to {0}: {1}".format(self.metric_cache_file, e))

     def check(self, instance):
         """Gather VM metrics for each instance"""

@ -150,8 +150,8 @@ class LibvirtCheck(AgentCheck):
             # Skip instances created within the probation period
             vm_probation_remaining = self._test_vm_probation(instance_cache.get(inst.name)['created'])
             if (vm_probation_remaining >= 0):
-                self.log.info("Libvirt: {} in probation for another {} seconds".format(instance_cache.get(inst.name)['hostname'],
-                                                                                       vm_probation_remaining))
+                self.log.info("Libvirt: {0} in probation for another {1} seconds".format(instance_cache.get(inst.name)['hostname'],
+                                                                                         vm_probation_remaining))
                 continue

             # Build customer dimensions

@ -187,7 +187,7 @@ class LibvirtCheck(AgentCheck):
                 sample_time = int(time.time())
                 disk_dimensions = {'device': disk[0].device}
                 for metric in disk[1]._fields:
-                    metric_name = "io.{}".format(metric)
+                    metric_name = "io.{0}".format(metric)
                     if metric_name not in metric_cache[inst.name]:
                         metric_cache[inst.name][metric_name] = {}

@ -197,7 +197,7 @@ class LibvirtCheck(AgentCheck):
                         val_diff = value - metric_cache[inst.name][metric_name][disk[0].device]['value']
                         # Change the metric name to a rate, ie. "io.read_requests"
                         # gets converted to "io.read_ops_sec"
-                        rate_name = "{}_sec".format(metric_name.replace('requests', 'ops'))
+                        rate_name = "{0}_sec".format(metric_name.replace('requests', 'ops'))
                         # Customer
                         this_dimensions = disk_dimensions.copy()
                         this_dimensions.update(dims_customer)

@ -207,7 +207,7 @@ class LibvirtCheck(AgentCheck):
                         # Operations (metric name prefixed with "vm.")
                         this_dimensions = disk_dimensions.copy()
                         this_dimensions.update(dims_operations)
-                        self.gauge("vm.{}".format(rate_name), val_diff,
+                        self.gauge("vm.{0}".format(rate_name), val_diff,
                                    dimensions=this_dimensions)
                         # Save this metric to the cache
                         metric_cache[inst.name][metric_name][disk[0].device] = {

@ -219,7 +219,7 @@ class LibvirtCheck(AgentCheck):
                 sample_time = int(time.time())
                 vnic_dimensions = {'device': vnic[0].name}
                 for metric in vnic[1]._fields:
-                    metric_name = "net.{}".format(metric)
+                    metric_name = "net.{0}".format(metric)
                     if metric_name not in metric_cache[inst.name]:
                         metric_cache[inst.name][metric_name] = {}

@ -229,7 +229,7 @@ class LibvirtCheck(AgentCheck):
                         val_diff = value - metric_cache[inst.name][metric_name][vnic[0].name]['value']
                         # Change the metric name to a rate, ie. "net.rx_bytes"
                         # gets converted to "net.rx_bytes_sec"
-                        rate_name = "{}_sec".format(metric_name)
+                        rate_name = "{0}_sec".format(metric_name)
                         # Rename "tx" to "out" and "rx" to "in"
                         rate_name = rate_name.replace("tx", "out")
                         rate_name = rate_name.replace("rx", "in")

@ -243,7 +243,7 @@ class LibvirtCheck(AgentCheck):
                         # Operations (metric name prefixed with "vm.")
                         this_dimensions = vnic_dimensions.copy()
                         this_dimensions.update(dims_operations)
-                        self.gauge("vm.{}".format(rate_name), val_diff,
+                        self.gauge("vm.{0}".format(rate_name), val_diff,
                                    dimensions=this_dimensions)
                         # Save this metric to the cache
                         metric_cache[inst.name][metric_name][vnic[0].name] = {
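The libvirt check's counter-to-rate renaming (`requests` to `ops`, `rx`/`tx` to `in`/`out`, plus a `_sec` suffix) appears in two separate code paths above; combining them into one helper is a sketch of my own (`to_rate_name` is not a function in the check):

```python
def to_rate_name(metric_name):
    """Rewrite a cumulative counter name into the per-second rate name,
    e.g. "io.read_requests" -> "io.read_ops_sec",
         "net.rx_bytes"     -> "net.in_bytes_sec".
    """
    rate_name = "{0}_sec".format(metric_name.replace('requests', 'ops'))
    # Rename "tx" to "out" and "rx" to "in"
    rate_name = rate_name.replace("tx", "out")
    return rate_name.replace("rx", "in")
```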
@ -0,0 +1,60 @@
import logging
import re
import subprocess
import sys

import monasca_agent.collector.checks as checks
import monasca_agent.common.util as util

log = logging.getLogger(__name__)


class Load(checks.AgentCheck):

    def __init__(self, name, init_config, agent_config):
        super(Load, self).__init__(name, init_config, agent_config)

    def check(self, instance):
        """Capture load stats"""
        dimensions = self._set_dimensions(None, instance)

        if util.Platform.is_linux():
            try:
                load_avg_proc = open('/proc/loadavg', 'r')
                uptime = load_avg_proc.readlines()
                load_avg_proc.close()
            except Exception:
                log.exception('Cannot extract load using /proc/loadavg')
                return

            uptime = uptime[0]  # readlines() provides a list but we want a string

        elif sys.platform in ('darwin', 'sunos5') or sys.platform.startswith("freebsd"):
            # Get output from uptime
            try:
                uptime = subprocess.Popen(['uptime'],
                                          stdout=subprocess.PIPE,
                                          close_fds=True).communicate()[0]
            except Exception:
                log.exception('Cannot extract load using uptime')
                return

        # Split out the 3 load average values
        load = [res.replace(',', '.') for res in re.findall(r'([0-9]+[\.,]\d+)', uptime)]

        self.gauge('load.avg_1_min',
                   float(load[0]),
                   dimensions=dimensions)
        self.gauge('load.avg_5_min',
                   float(load[1]),
                   dimensions=dimensions)
        self.gauge('load.avg_15_min',
                   float(load[2]),
                   dimensions=dimensions)

        log.debug('Collected 3 load metrics')
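The load-average extraction above (one regex over either `/proc/loadavg` or `uptime` output, with comma-decimal normalization) can be exercised on its own; `parse_load_averages` is an illustrative name, not part of the check:

```python
import re

def parse_load_averages(uptime_output):
    """Pull the three load averages out of /proc/loadavg or `uptime` output.

    Some locales print a comma as the decimal separator, hence the
    normalization before float().
    """
    values = re.findall(r'([0-9]+[\.,]\d+)', uptime_output)
    return [float(v.replace(',', '.')) for v in values[:3]]
```

The `1/234` running/total process field in `/proc/loadavg` contains no decimal point, so the regex skips it.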
@ -0,0 +1,67 @@
import logging

import psutil

import monasca_agent.collector.checks as checks

log = logging.getLogger(__name__)


class Memory(checks.AgentCheck):

    def __init__(self, name, init_config, agent_config):
        super(Memory, self).__init__(name, init_config, agent_config)

    def check(self, instance):
        """Capture memory stats"""
        dimensions = self._set_dimensions(None, instance)

        mem_info = psutil.virtual_memory()
        swap_info = psutil.swap_memory()

        self.gauge('mem.total_mb',
                   int(mem_info.total / 1048576),
                   dimensions=dimensions)
        self.gauge('mem.free_mb',
                   int(mem_info.free / 1048576),
                   dimensions=dimensions)
        self.gauge('mem.usable_mb',
                   int(mem_info.available / 1048576),
                   dimensions=dimensions)
        self.gauge('mem.usable_perc',
                   float(100 - mem_info.percent),
                   dimensions=dimensions)
        self.gauge('mem.swap_total_mb',
                   int(swap_info.total / 1048576),
                   dimensions=dimensions)
        self.gauge('mem.swap_used_mb',
                   int(swap_info.used / 1048576),
                   dimensions=dimensions)
        self.gauge('mem.swap_free_mb',
                   int(swap_info.free / 1048576),
                   dimensions=dimensions)
        self.gauge('mem.swap_free_perc',
                   float(100 - swap_info.percent),
                   dimensions=dimensions)

        count = 8
        # These fields only exist on some platforms (e.g. Linux)
        if hasattr(mem_info, 'buffers'):
            self.gauge('mem.used_buffers',
                       int(mem_info.buffers / 1048576),
                       dimensions=dimensions)
            count += 1

        if hasattr(mem_info, 'cached'):
            self.gauge('mem.used_cache',
                       int(mem_info.cached / 1048576),
                       dimensions=dimensions)
            count += 1

        if hasattr(mem_info, 'shared'):
            self.gauge('mem.used_shared',
                       int(mem_info.shared / 1048576),
                       dimensions=dimensions)
            count += 1

        log.debug('Collected {0} memory metrics'.format(count))
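The two conversions the memory check applies to every psutil value — bytes to whole megabytes, and "percent used" to "percent usable" — are simple enough to verify standalone. Both helper names below are illustrative, not part of the agent:

```python
MB = 1048576  # 1024 * 1024 bytes

def bytes_to_mb(num_bytes):
    """Convert a byte counter to whole megabytes, as the mem.*_mb metrics do."""
    return int(num_bytes / MB)

def usable_perc(percent_used):
    """psutil reports percent *used*; the agent publishes percent *usable*."""
    return float(100 - percent_used)
```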
@ -100,8 +100,8 @@ class WrapNagios(ServicesCheck):
         self.gauge(instance['name'], status_code, dimensions=dimensions)
         # Return DOWN on critical, UP otherwise
         if status_code == "2":
-            return Status.DOWN, "DOWN: {}".format(detail)
-        return Status.UP, "UP: {}".format(detail)
+            return Status.DOWN, "DOWN: {0}".format(detail)
+        return Status.UP, "UP: {0}".format(detail)

         # Save last-run data
         file_w = open(last_run_file, "w")
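The Nagios wrapper's status mapping can be sketched on its own. `to_status` is a hypothetical helper, and plain strings stand in for the `Status.DOWN`/`Status.UP` values used by the real check:

```python
# Nagios plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
def to_status(status_code, detail):
    """Map a Nagios exit code onto the (status, message) pair the check returns.

    Only CRITICAL (2) maps to DOWN; WARNING and UNKNOWN still count as UP.
    """
    if str(status_code) == "2":
        return "DOWN", "DOWN: {0}".format(detail)
    return "UP", "UP: {0}".format(detail)
```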
@ -1,352 +1,48 @@
 """Collects network metrics.

 """

 # stdlib
 import logging
 import re
-import subprocess
+import psutil

 # project
-from monasca_agent.collector.checks import AgentCheck
-from monasca_agent.common.util import Platform
+import monasca_agent.collector.checks as checks

 log = logging.getLogger(__name__)


-class Network(AgentCheck):
+class Network(checks.AgentCheck):

-    TCP_STATES = {
-        "ESTABLISHED": "established",
-        "SYN_SENT": "opening",
-        "SYN_RECV": "opening",
-        "FIN_WAIT1": "closing",
-        "FIN_WAIT2": "closing",
-        "TIME_WAIT": "time_wait",
-        "CLOSE": "closing",
-        "CLOSE_WAIT": "closing",
-        "LAST_ACK": "closing",
-        "LISTEN": "listening",
-        "CLOSING": "closing",
-    }
-
-    NETSTAT_GAUGE = {
-        ('udp4', 'connections'): 'net.udp4_connections',
-        ('udp6', 'connections'): 'net.udp6_connections',
-        ('tcp4', 'established'): 'net.tcp4_established',
-        ('tcp4', 'opening'): 'net.tcp4_opening',
-        ('tcp4', 'closing'): 'net.tcp4_closing',
-        ('tcp4', 'listening'): 'net.tcp4_listening',
-        ('tcp4', 'time_wait'): 'net.tcp4_time_wait',
-        ('tcp6', 'established'): 'net.tcp6_established',
-        ('tcp6', 'opening'): 'net.tcp6_opening',
-        ('tcp6', 'closing'): 'net.tcp6_closing',
-        ('tcp6', 'listening'): 'net.tcp6_listening',
-        ('tcp6', 'time_wait'): 'net.tcp6_time_wait',
-    }
-
-    def __init__(self, name, init_config, agent_config, instances=None):
-        AgentCheck.__init__(self, name, init_config, agent_config, instances=instances)
-        if instances is not None and len(instances) > 1:
-            raise Exception("Network check only supports one configured instance.")
+    def __init__(self, name, init_config, agent_config):
+        super(Network, self).__init__(name, init_config, agent_config)

     def check(self, instance):
-        if instance is None:
-            instance = {}
+        """Capture network metrics"""
+        dimensions = self._set_dimensions(None, instance)

-        self._excluded_ifaces = instance.get('excluded_interfaces', [])
-        self._collect_cx_state = instance.get('collect_connection_state', False)
+        excluded_ifaces = instance.get('excluded_interfaces', [])
+        if excluded_ifaces:
+            log.debug('Excluding network devices: {0}'.format(excluded_ifaces))

-        self._exclude_iface_re = None
         exclude_re = instance.get('excluded_interface_re', None)
         if exclude_re:
-            self.log.debug("Excluding network devices matching: %s" % exclude_re)
-            self._exclude_iface_re = re.compile(exclude_re)
-
-        if Platform.is_linux():
-            self._check_linux(instance, dimensions)
-        elif Platform.is_bsd():
-            self._check_bsd(instance, dimensions)
-        elif Platform.is_solaris():
-            self._check_solaris(instance, dimensions)
-
-    def _submit_devicemetrics(self, iface, vals_by_metric, dimensions):
-        if self._exclude_iface_re and self._exclude_iface_re.match(iface):
-            # Skip this network interface.
-            return False
-
-        expected_metrics = [
-            'in_bytes',
-            'out_bytes',
-            'in_packets',
-            'in_errors',
-            'out_packets',
-            'out_errors',
-        ]
-        for m in expected_metrics:
-            assert m in vals_by_metric
-        assert len(vals_by_metric) == len(expected_metrics)
-
-        # For reasons I don't understand only these metrics are skipped if a
-        # particular interface is in the `excluded_interfaces` config list.
-        # Not sure why the others aren't included. Until I understand why, I'm
-        # going to keep the same behaviour.
-        exclude_iface_metrics = [
-            'in_packets',
-            'in_errors',
-            'out_packets',
-            'out_errors',
-        ]
-
-        count = 0
-        for metric, val in vals_by_metric.iteritems():
-            if iface in self._excluded_ifaces and metric in exclude_iface_metrics:
-                # skip it!
-                continue
-            self.rate('net.%s' % metric, val, device_name=iface, dimensions=dimensions)
-            count += 1
-        self.log.debug("tracked %s network metrics for interface %s" % (count, iface))
-
-    @staticmethod
-    def _parse_value(v):
-        if v == "-":
-            return 0
+            exclude_iface_re = re.compile(exclude_re)
+            log.debug('Excluding network devices matching: {0}'.format(exclude_re))
         else:
-            try:
-                return long(v)
-            except ValueError:
-                return 0
+            exclude_iface_re = None

-    def _check_linux(self, instance, dimensions):
-        if self._collect_cx_state:
-            netstat = subprocess.Popen(["netstat", "-n", "-u", "-t", "-a"],
-                                       stdout=subprocess.PIPE,
-                                       close_fds=True).communicate()[0]
-            # Active Internet connections (w/o servers)
-            # Proto Recv-Q Send-Q Local Address       Foreign Address      State
-            # tcp        0      0 46.105.75.4:80      79.220.227.193:2032  SYN_RECV
-            # tcp        0      0 46.105.75.4:143     90.56.111.177:56867  ESTABLISHED
-            # tcp        0      0 46.105.75.4:50468   107.20.207.175:443   TIME_WAIT
-            # tcp6       0      0 46.105.75.4:80      93.15.237.188:58038  FIN_WAIT2
-            # tcp6       0      0 46.105.75.4:80      79.220.227.193:2029  ESTABLISHED
-            # udp        0      0 0.0.0.0:123         0.0.0.0:*
-            # udp6       0      0 :::41458            :::*
+        nics = psutil.net_io_counters(pernic=True)
+        count = 0
+        for nic_name in nics.keys():
+            if nic_name not in excluded_ifaces or exclude_iface_re and not exclude_iface_re.match(nic_name):
+                nic = nics[nic_name]
+                self.rate('net.in_bytes_sec', nic.bytes_recv, device_name=nic_name, dimensions=dimensions)
+                self.rate('net.out_bytes_sec', nic.bytes_sent, device_name=nic_name, dimensions=dimensions)
+                self.rate('net.in_packets_sec', nic.packets_recv, device_name=nic_name, dimensions=dimensions)
+                self.rate('net.out_packets_sec', nic.packets_sent, device_name=nic_name, dimensions=dimensions)
+                self.rate('net.in_errors_sec', nic.errin, device_name=nic_name, dimensions=dimensions)
+                self.rate('net.out_errors_sec', nic.errout, device_name=nic_name, dimensions=dimensions)
+                self.rate('net.in_packets_dropped_sec', nic.dropin, device_name=nic_name, dimensions=dimensions)
+                self.rate('net.out_packets_dropped_sec', nic.dropout, device_name=nic_name, dimensions=dimensions)

-            lines = netstat.split("\n")
-
-            metrics = dict.fromkeys(self.NETSTAT_GAUGE.values(), 0)
-            for l in lines[2:-1]:
-                cols = l.split()
-                # 0          1      2 3                   4                    5
-                # tcp        0      0 46.105.75.4:143     90.56.111.177:56867  ESTABLISHED
-                if cols[0].startswith("tcp"):
-                    protocol = ("tcp4", "tcp6")[cols[0] == "tcp6"]
-                    if cols[5] in self.TCP_STATES:
-                        metric = self.NETSTAT_GAUGE[protocol, self.TCP_STATES[cols[5]]]
-                        metrics[metric] += 1
-                elif cols[0].startswith("udp"):
-                    protocol = ("udp4", "udp6")[cols[0] == "udp6"]
-                    metric = self.NETSTAT_GAUGE[protocol, 'connections']
-                    metrics[metric] += 1
-
-            for metric, value in metrics.iteritems():
-                self.gauge(metric, value, dimensions=dimensions)
-
-        proc = open('/proc/net/dev', 'r')
-        try:
-            lines = proc.readlines()
-        finally:
-            proc.close()
-        # Inter-|   Receive                                                 |  Transmit
-        #  face |bytes     packets errs drop fifo frame compressed multicast|bytes       packets errs drop fifo colls carrier compressed
-        #     lo:45890956   112797   0    0    0     0          0         0    45890956   112797    0    0    0     0       0          0
-        #   eth0:631947052 1042233   0   19    0   184          0      1206  1208625538  1320529    0    0    0     0       0          0
-        #   eth1:       0        0   0    0    0     0          0         0           0         0   0    0    0     0       0          0
-        for l in lines[2:]:
-            cols = l.split(':', 1)
-            x = cols[1].split()
-            # Filter inactive interfaces
-            if self._parse_value(x[0]) or self._parse_value(x[8]):
-                iface = cols[0].strip()
-                metrics = {
-                    'in_bytes': self._parse_value(x[0]),
-                    'out_bytes': self._parse_value(x[8]),
-                    'in_packets': self._parse_value(x[1]),
-                    'in_errors': self._parse_value(x[2]) + self._parse_value(x[3]),
-                    'out_packets': self._parse_value(x[9]),
-                    'out_errors': self._parse_value(x[10]) + self._parse_value(x[11]),
-                }
-                self._submit_devicemetrics(iface, metrics, dimensions)
-
-    def _check_bsd(self, instance, dimensions):
-        netstat = subprocess.Popen(["netstat", "-i", "-b"],
-                                   stdout=subprocess.PIPE,
-                                   close_fds=True).communicate()[0]
-        # Name  Mtu   Network       Address            Ipkts Ierrs     Ibytes    Opkts Oerrs     Obytes  Coll
-        # lo0   16384 <Link#1>                        318258     0  428252203   318258     0  428252203     0
-        # lo0   16384 localhost   fe80:1::1           318258     -  428252203   318258     -  428252203     -
-        # lo0   16384 127         localhost           318258     -  428252203   318258     -  428252203     -
-        # lo0   16384 localhost   ::1                 318258     -  428252203   318258     -  428252203     -
-        # gif0* 1280  <Link#2>                             0     0          0        0     0          0     0
-        # stf0* 1280  <Link#3>                             0     0          0        0     0          0     0
-        # en0   1500  <Link#4>    04:0c:ce:db:4e:fa 20801309     0 13835457425 15149389     0 11508790198     0
-        # en0   1500  seneca.loca fe80:4::60c:ceff: 20801309     - 13835457425 15149389     - 11508790198     -
-        # en0   1500  2001:470:1f 2001:470:1f07:11d 20801309     - 13835457425 15149389     - 11508790198     -
-        # en0   1500  2001:470:1f 2001:470:1f07:11d 20801309     - 13835457425 15149389     - 11508790198     -
-        # en0   1500  192.168.1   192.168.1.63      20801309     - 13835457425 15149389     - 11508790198     -
-        # en0   1500  2001:470:1f 2001:470:1f07:11d 20801309     - 13835457425 15149389     - 11508790198     -
-        # p2p0  2304  <Link#5>    06:0c:ce:db:4e:fa        0     0          0        0     0          0     0
-        # ham0  1404  <Link#6>    7a:79:05:4d:bf:f5    30100     0    6815204    18742     0    8494811     0
-        # ham0  1404  5           5.77.191.245         30100     -    6815204    18742     -    8494811     -
-        # ham0  1404  seneca.loca fe80:6::7879:5ff:    30100     -    6815204    18742     -    8494811     -
-        # ham0  1404  2620:9b::54 2620:9b::54d:bff5    30100     -    6815204    18742     -    8494811     -
-
-        lines = netstat.split("\n")
-        headers = lines[0].split()
-
-        # Given the irregular structure of the table above, better to parse from the end of each line
-        # Verify headers first
-        #          -7       -6       -5        -4       -3       -2        -1
-        for h in ("Ipkts", "Ierrs", "Ibytes", "Opkts", "Oerrs", "Obytes", "Coll"):
-            if h not in headers:
-                self.logger.error("%s not found in %s; cannot parse" % (h, headers))
-                return False
-
-        current = None
-        for l in lines[1:]:
-            # Another header row, abort now, this is IPv6 land
-            if "Name" in l:
-                break
-
-            x = l.split()
-            if len(x) == 0:
-                break
-
-            iface = x[0]
-            if iface.endswith("*"):
-                iface = iface[:-1]
-            if iface == current:
-                # skip multiple lines of same interface
-                continue
-            else:
-                current = iface
-
-            # Filter inactive interfaces
-            if self._parse_value(x[-5]) or self._parse_value(x[-2]):
-                iface = current
-                metrics = {
-                    'in_bytes': self._parse_value(x[-5]),
-                    'out_bytes': self._parse_value(x[-2]),
-                    'in_packets': self._parse_value(x[-7]),
-                    'in_errors': self._parse_value(x[-6]),
-                    'out_packets': self._parse_value(x[-4]),
-                    'out_errors': self._parse_value(x[-3]),
-                }
-                self._submit_devicemetrics(iface, metrics, dimensions)
-
-    def _check_solaris(self, instance, dimensions):
-        # Can't get bytes sent and received via netstat
-        # Default to kstat -p link:0:
-        netstat = subprocess.Popen(["kstat", "-p", "link:0:"],
-                                   stdout=subprocess.PIPE,
-                                   close_fds=True).communicate()[0]
-        metrics_by_interface = self._parse_solaris_netstat(netstat)
-        for interface, metrics in metrics_by_interface.iteritems():
-            self._submit_devicemetrics(interface, metrics, dimensions)
-
-    def _parse_solaris_netstat(self, netstat_output):
-        """Return a mapping of network metrics by interface. For example:
-
-        { interface:
-            {'out_bytes': 0,
-             'in_bytes': 0,
-             'in_packets': 0,
-             ...
-            }
-        }
-        """
-        # Here's an example of the netstat output:
-        #
-        # link:0:net0:brdcstrcv   527336
-        # link:0:net0:brdcstxmt   1595
-        # link:0:net0:class       net
-        # link:0:net0:collisions  0
-        # link:0:net0:crtime      16359935.2637943
-        # link:0:net0:ierrors     0
-        # link:0:net0:ifspeed     10000000000
-        # link:0:net0:ipackets    682834
-        # link:0:net0:ipackets64  682834
-        # link:0:net0:link_duplex 0
-        # link:0:net0:link_state  1
-        # link:0:net0:multircv    0
-        # link:0:net0:multixmt    1595
-        # link:0:net0:norcvbuf    0
-        # link:0:net0:noxmtbuf    0
-        # link:0:net0:obytes      12820668
-        # link:0:net0:obytes64    12820668
-        # link:0:net0:oerrors     0
-        # link:0:net0:opackets    105445
-        # link:0:net0:opackets64  105445
-        # link:0:net0:rbytes      113983614
-        # link:0:net0:rbytes64    113983614
-        # link:0:net0:snaptime    16834735.1607669
-        # link:0:net0:unknowns    0
-        # link:0:net0:zonename    53aa9b7e-48ba-4152-a52b-a6368c3d9e7c
-        # link:0:net1:brdcstrcv   4947620
-        # link:0:net1:brdcstxmt   1594
-        # link:0:net1:class       net
-        # link:0:net1:collisions  0
-        # link:0:net1:crtime      16359935.2839167
-        # link:0:net1:ierrors     0
-        # link:0:net1:ifspeed     10000000000
-        # link:0:net1:ipackets    4947620
-        # link:0:net1:ipackets64  4947620
-        # link:0:net1:link_duplex 0
-        # link:0:net1:link_state  1
-        # link:0:net1:multircv    0
-        # link:0:net1:multixmt    1594
-        # link:0:net1:norcvbuf    0
-        # link:0:net1:noxmtbuf    0
-        # link:0:net1:obytes      73324
-        # link:0:net1:obytes64    73324
-        # link:0:net1:oerrors     0
-        # link:0:net1:opackets    1594
-        # link:0:net1:opackets64  1594
-        # link:0:net1:rbytes      304384894
-        # link:0:net1:rbytes64    304384894
-        # link:0:net1:snaptime    16834735.1613302
-        # link:0:net1:unknowns    0
-        # link:0:net1:zonename    53aa9b7e-48ba-4152-a52b-a6368c3d9e7c
-
-        # A mapping of solaris names -> datadog names
-        metric_by_solaris_name = {
-            'rbytes64': 'in_bytes',
-            'obytes64': 'out_bytes',
-            'ipackets64': 'in_packets',
-            'ierrors': 'in_errors',
-            'opackets64': 'out_packets',
-            'oerrors': 'out_errors',
-        }
-
-        lines = [l for l in netstat_output.split("\n") if len(l) > 0]
-
-        metrics_by_interface = {}
-
-        for l in lines:
-            # Parse the metric & interface.
-            cols = l.split()
-            link, n, iface, name = cols[0].split(":")
-            assert link == "link"
-
-            # Get the datadog metric name.
-            ddname = metric_by_solaris_name.get(name, None)
-            if ddname is None:
-                continue
-
-            # Add it to this interface's list of metrics.
-            metrics = metrics_by_interface.get(iface, {})
-            metrics[ddname] = self._parse_value(cols[1])
-            metrics_by_interface[iface] = metrics
-
-        return metrics_by_interface
+
+        log.debug('Collected 8 network metrics for device {0}'.format(nic_name))
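The new network check hands psutil's raw cumulative counters to `self.rate`, which turns successive samples into per-second values. A minimal sketch of that conversion (`counter_rate` is illustrative, not the agent's implementation):

```python
def counter_rate(prev_value, prev_time, cur_value, cur_time):
    """Per-second rate from two samples of a cumulative counter.

    Returns None for the first sample, a non-advancing clock, or a
    counter that went backwards (e.g. wrapped or reset after an
    interface bounce), since no meaningful rate can be derived then.
    """
    if prev_value is None or cur_time <= prev_time or cur_value < prev_value:
        return None
    return (cur_value - prev_value) / float(cur_time - prev_time)
```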
@ -323,7 +323,7 @@ def run_check(check):
     if isinstance(check, status_checks.ServicesCheck):
         is_multi_threaded = True
     print("#" * 80)
-    print("Check name: '{}'\n".format(check.name))
+    print("Check name: '{0}'\n".format(check.name))
     check.run()
     # Sleep for a second and then run a second check to capture rate metrics
     time.sleep(1)
@ -36,7 +36,7 @@ class Config(object):
         elif os.path.exists(os.getcwd() + '/agent.conf'):
             self._configFile = os.getcwd() + '/agent.conf'
         else:
-            log.error('No config file found at {} nor in the working directory.'.format(DEFAULT_CONFIG_FILE))
+            log.error('No config file found at {0} nor in the working directory.'.format(DEFAULT_CONFIG_FILE))

         self._read_config()

@ -48,7 +48,7 @@ class Config(object):
         elif isinstance(sections, list):
             section_list.extend(sections)
         else:
-            log.error('Unknown section: {}'.format(str(sections)))
+            log.error('Unknown section: {0}'.format(str(sections)))
             return {}

         new_config = {}
@ -83,12 +83,9 @@ class Config(object):
                          'version': self.get_version(),
                          'additional_checksd': os.path.join(os.path.dirname(self._configFile), '/checks_d/'),
                          'system_metrics': None,
-                         'ignore_filesystem_types': None,
-                         'device_blacklist_re': None,
                          'limit_memory_consumption': None,
                          'skip_ssl_validation': False,
                          'watchdog': True,
-                         'use_mount': False,
                          'autorestart': False,
                          'non_local_traffic': False},
                 'Api': {'is_enabled': False,
```diff
@@ -165,24 +162,6 @@ class Config(object):
                 metrics_list = []
             self._config['Main']['system_metrics'] = metrics_list

-        # Parse device blacklist regular expression
-        try:
-            filter_device_re = self._config['Main']['device_blacklist_re']
-            if filter_device_re:
-                self._config['Main']['device_blacklist_re'] = re.compile(filter_device_re)
-        except re.error as err:
-            log.error('Error processing regular expression {0}'.format(filter_device_re))
-
-        # Parse file system types
-        if self._config['Main']['ignore_filesystem_types']:
-            # parse comma separated file system types to ignore list
-            try:
-                file_system_list = [x.strip() for x in self._config['Main']['ignore_filesystem_types'].split(',')]
-            except ValueError:
-                log.info("Unable to process ignore_filesystem_types.")
-                file_system_list = []
-            self._config['Main']['ignore_filesystem_types'] = file_system_list
-
     def get_confd_path(self):
         path = os.path.join(os.path.dirname(self._configFile), 'conf.d')
         if os.path.exists(path):
```
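The device-blacklist and filesystem-type parsing removed in this hunk no longer lives in the global Config; with the checks rewritten as AgentCheck subclasses, each check carries its own configuration. The same parsing logic in isolation, as standalone helpers (the function names and sample values are illustrative):

```python
import re


def parse_device_blacklist(pattern):
    """Compile a device blacklist regex; return None if empty or invalid."""
    if not pattern:
        return None
    try:
        return re.compile(pattern)
    except re.error:
        return None


def parse_ignore_filesystem_types(raw):
    """Split a comma-separated filesystem-type list into a stripped list."""
    if not raw:
        return []
    return [x.strip() for x in raw.split(',')]


# Example inputs of the kind the old Main config section carried.
blacklist = parse_device_blacklist('^(ram|loop)')
ignored = parse_ignore_filesystem_types('tmpfs, devtmpfs, iso9660')
```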
```diff
@@ -217,8 +196,8 @@ def main():
     configuration = Config()
     config = configuration.get_config()
     api_config = configuration.get_config('Api')
-    print "Main Configuration: \n {}".format(config)
-    print "\nApi Configuration: \n {}".format(api_config)
+    print "Main Configuration: \n {0}".format(config)
+    print "\nApi Configuration: \n {0}".format(api_config)


 if __name__ == "__main__":
```
```diff
@@ -44,8 +44,8 @@ def http_emitter(message, log, url):
         response = opener.open(request)
         log.debug('http_emitter: postback response: ' + str(response.read()))
     except Exception as exc:
-        log.error("""Forwarder at {} is down or not responding...
-Error is {}
+        log.error("""Forwarder at {0} is down or not responding...
+Error is {1}
 Please restart the monasca-agent.""".format(url, repr(exc)))
     finally:
         if response:
```
```diff
@@ -72,7 +72,7 @@ class Keystone(object):
             self._keystone_client = self._get_ksclient()
         except Exception as exc:
             log.error("Unable to create the Keystone Client. " +
-                      "Error was {}".format(repr(exc)))
+                      "Error was {0}".format(repr(exc)))
             return None

         self._token = self._keystone_client.token
```
```diff
@@ -20,6 +20,7 @@ import uuid
 import logging
+log = logging.getLogger(__name__)


 # Tornado
 try:
     from tornado import ioloop, version_info as tornado_version
```
```diff
@@ -384,7 +385,6 @@ class Paths(object):
         log.info("Windows certificate path: %s" % crt_path)
         tornado.simple_httpclient._DEFAULT_CA_CERTS = crt_path
-

 def plural(count):
     if count == 1:
         return ""
```
```diff
@@ -9,13 +9,13 @@ from keystone import Keystone
 from libvirt import Libvirt
 from mon import MonAPI, MonPersister, MonThresh
 from mysql import MySQL
-from network import Network
 from neutron import Neutron
 from nova import Nova
 from ntp import Ntp
 from postfix import Postfix
 from rabbitmq import RabbitMQ
 from swift import Swift
+from system import System
 from zookeeper import Zookeeper

 DETECTION_PLUGINS = [Apache,
```
```diff
@@ -29,11 +29,11 @@ DETECTION_PLUGINS = [Apache,
                      MonPersister,
                      MonThresh,
                      MySQL,
-                     Network,
                      Neutron,
                      Nova,
                      Ntp,
                      Postfix,
                      RabbitMQ,
                      Swift,
+                     System,
                      Zookeeper]
```
```diff
@@ -48,9 +48,9 @@ class Libvirt(monasca_setup.detection.Plugin):
             init_config[option] = nova_cfg.get(cfg_section, option)

         # Create an identity server URI
-        init_config['identity_uri'] = "{}://{}:{}/v2.0".format(nova_cfg.get(cfg_section, 'auth_protocol'),
-                                                               nova_cfg.get(cfg_section, 'auth_host'),
-                                                               nova_cfg.get(cfg_section, 'auth_port'))
+        init_config['identity_uri'] = "{0}://{1}:{2}/v2.0".format(nova_cfg.get(cfg_section, 'auth_protocol'),
+                                                                  nova_cfg.get(cfg_section, 'auth_host'),
+                                                                  nova_cfg.get(cfg_section, 'auth_port'))

         config['libvirt'] = {'init_config': init_config,
                              'instances': [{}]}
```
```diff
@@ -1,34 +0,0 @@
-import os
-
-import yaml
-
-from monasca_setup.detection import Plugin
-from monasca_setup import agent_config
-
-
-class Network(Plugin):
-
-    """No configuration here, working networking is assumed so this is either on or off.
-
-    """
-
-    def _detect(self):
-        """Run detection, set self.available True if the service is detected.
-
-        """
-        self.available = True
-
-    def build_config(self):
-        """Build the config as a Plugins object and return.
-
-        """
-        # A bit silly to parse the yaml only for it to be converted back but this
-        # plugin is the exception not the rule
-        with open(os.path.join(self.template_dir, 'conf.d/network.yaml'), 'r') as network_template:
-            default_net_config = yaml.load(network_template.read())
-        config = agent_config.Plugins()
-        config['network'] = default_net_config
-        return config
-
-    def dependencies_installed(self):
-        return True
```
```diff
@@ -0,0 +1,42 @@
+import logging
+import os
+import yaml
+
+from monasca_setup.detection import Plugin
+from monasca_setup import agent_config
+
+log = logging.getLogger(__name__)
+
+
+class System(Plugin):
+
+    """No configuration here, the system metrics are assumed so this is either on or off.
+
+    """
+    system_metrics = ['network', 'disk', 'load', 'memory', 'cpu']
+
+    def _detect(self):
+        """Run detection, set self.available True if the service is detected.
+
+        """
+        self.available = True
+
+    def build_config(self):
+        """Build the configs for the system metrics as Plugin objects and return.
+
+        """
+        config = agent_config.Plugins()
+        for metric in System.system_metrics:
+            try:
+                with open(os.path.join(self.template_dir, 'conf.d/' + metric + '.yaml'), 'r') as metric_template:
+                    default_config = yaml.load(metric_template.read())
+                config[metric] = default_config
+                log.info('\tConfigured {0}'.format(metric))
+            except (OSError, IOError):
+                log.info('\tUnable to configure {0}'.format(metric))
+                continue
+
+        return config
+
+    def dependencies_installed(self):
+        return True
```
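Each name in `system_metrics` corresponds to a `conf.d/<metric>.yaml` template that monasca-setup copies into the agent's config directory. A hypothetical minimal `network.yaml` in the agent's standard `init_config`/`instances` layout (the keys under `instances` are illustrative, not the template's exact contents):

```yaml
init_config: null

instances:
    - excluded_interfaces:
        - lo
        - lo0
```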
```diff
@@ -96,7 +96,7 @@ def main(argv=None):
     if detected_os == 'Linux':
         linux_flavor = platform.linux_distribution()[0]
         if 'Ubuntu' or 'debian' in linux_flavor:
-            for package in ['coreutils', 'sysstat']:
+            for package in ['coreutils']:
                 # Check for required dependencies for system checks
                 try:
                     output = check_output('dpkg -s {0}'.format(package),
```
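Dropping `sysstat` from the dependency list follows from the psutil rewrite: the checks no longer shell out to sysstat tools. Note also that the unchanged condition `if 'Ubuntu' or 'debian' in linux_flavor:` is always true, because `or` binds looser than `in` and the condition parses as `('Ubuntu') or ('debian' in linux_flavor)`, where the non-empty string `'Ubuntu'` is truthy. A sketch of the pitfall and an explicit membership test:

```python
linux_flavor = 'CentOS'

# Parses as ('Ubuntu') or ('debian' in linux_flavor) -- the non-empty
# string 'Ubuntu' makes this truthy regardless of linux_flavor.
broken = bool('Ubuntu' or 'debian' in linux_flavor)

# Explicit check over both candidate substrings.
correct = any(flavor in linux_flavor for flavor in ('Ubuntu', 'debian'))
```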