Removing subprocess calls from system metrics classes

Changed the IO, Disk and Network classes to not use a subprocess;
they now use psutil to get the metrics. Also changed the Linux
system metrics classes to subclass the AgentCheck class instead
of the old-style Check class. Added additional configuration
and changed monasca-setup to support that. Fixed some Python
2.6-incompatible string formatting issues.

Change-Id: I1f8b65bf48e48e2c598aa4950c194fbae2f9e337
Gary Hessler 2015-02-19 16:51:58 -07:00
parent c0225d49ad
commit b1de7db1f5
31 changed files with 437 additions and 1042 deletions
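To make the shape of the change concrete, here is a minimal, illustrative sketch; the function names are hypothetical and not part of the agent. The old checks spawned tools such as `df` and `iostat` and scraped their text output, while the new checks read equivalent counters straight from psutil:

```python
import subprocess

import psutil


def disk_used_percent_old(mount_point):
    # Old approach: fork df and parse its text output.
    output = subprocess.Popen(['df', '-k', mount_point],
                              stdout=subprocess.PIPE,
                              close_fds=True).communicate()[0].decode()
    fields = output.splitlines()[1].split()
    return float(fields[4].rstrip('%'))  # the "Use%" column, e.g. "42%"


def disk_used_percent_new(mount_point):
    # New approach: a single library call, nothing to fork or parse.
    return psutil.disk_usage(mount_point).percent
```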


@ -74,7 +74,6 @@
- [Dimensions](#dimensions)
- [Cross-Tenant Metric Submission](#cross-tenant-metric-submission)
- [Statsd](#statsd)
- [Log Parsing](#log-parsing)
- [License](#license)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
@ -86,7 +85,6 @@ The Monasca Agent is a modern Python monitoring agent for gathering metrics and
* System metrics such as cpu and memory utilization.
* Nagios plugins. The Monasca Agent can run Nagios plugins and send the status code returned by the plugin as a metric to the Monasca API.
* Statsd. The Monasca Agent supports an integrated Statsd daemon which can be used by applications via a statsd client library.
* Retrieving metrics from log files written in a specific format.
* Host alive. The Monasca Agent can perform active checks on a host to determine if it is alive using ping(ICMP) or SSH.
* Process checks. The Monasca Agent can check a process and return several metrics on the process such as number of instances, memory, io and threads.
* Http Endpoint checks. The Monasca Agent can perform active checks on http endpoints by sending an HTTP request to an API.
@ -370,12 +368,14 @@ This section documents the system metrics that are sent by the Agent. This sect
| cpu.stolen_perc | | Percentage of stolen CPU time, i.e. the time spent in other OS contexts when running in a virtualized environment |
| cpu.system_perc | | Percentage of time the CPU is used at the system level |
| cpu.user_perc | | Percentage of time the CPU is used at the user level |
| disk.inode_used_perc | device | The percentage of inodes that are used on a device |
| disk.space_used_perc | device | The percentage of disk space that is being used on a device |
| disk.inode_used_perc | device, mount_point | The percentage of inodes that are used on a device |
| disk.space_used_perc | device, mount_point | The percentage of disk space that is being used on a device |
| io.read_kbytes_sec | device | Kbytes/sec read by an io device
| io.read_req_sec | device | Number of read requests/sec to an io device
| io.read_time_sec | device | Amount of read time/sec to an io device
| io.write_kbytes_sec |device | Kbytes/sec written by an io device
| io.write_req_sec | device | Number of write requests/sec to an io device
| io.write_time_sec | device | Amount of write time/sec to an io device
| load.avg_1_min | | The average system load over a 1 minute period
| load.avg_5_min | | The average system load over a 5 minute period
| load.avg_15_min | | The average system load over a 15 minute period
@ -390,12 +390,14 @@ This section documents the system metrics that are sent by the Agent. This sect
| mem.used_buffers | | Number of buffers being used by the kernel for block io
| mem.used_cached | | Memory used for the page cache
| mem.used_shared | | Memory shared between separate processes and typically used for inter-process communication
| net.in_bytes | device | Number of network bytes received
| net.out_bytes | device | Number of network bytes sent
| net.in_packets | device | Number of network packets received
| net.out_packets | device | Number of network packets sent
| net.in_errors | device | Number of network errors on incoming network traffic
| net.out_errors | device | Number of network errors on outgoing network traffic
| net.in_bytes_sec | device | Number of network bytes received per second
| net.out_bytes_sec | device | Number of network bytes sent per second
| net.in_packets_sec | device | Number of network packets received per second
| net.out_packets_sec | device | Number of network packets sent per second
| net.in_errors_sec | device | Number of network errors on incoming network traffic per second
| net.out_errors_sec | device | Number of network errors on outgoing network traffic per second
| net.in_packets_dropped_sec | device | Number of inbound network packets dropped per second
| net.out_packets_dropped_sec | device | Number of outbound network packets dropped per second
| monasca.thread_count | service=monitoring component=monasca-agent | Number of threads that the collector is consuming for this collection run
| monasca.emit_time_sec | service=monitoring component=monasca-agent | Amount of time that the collector took for sending the collected metrics to the Forwarder for this collection run
| monasca.collection_time_sec | service=monitoring component=monasca-agent | Amount of time that the collector took for this collection run
@ -1181,8 +1183,5 @@ Here are some examples of how code can be instrumented using calls to monasca-st
```
# Log Parsing
TBD
# License
Copyright (c) 2014 Hewlett-Packard Development Company, L.P.


@ -65,9 +65,6 @@ dimensions: {args.dimensions}
# time to wait between collection runs
check_freq: {args.check_frequency}
# Use mount points instead of volumes to track disk and fs metrics
use_mount: no
# Change port the Agent is listening to
# listen_port: 17123
@ -81,24 +78,6 @@ use_mount: no
# https://github.com/DataDog/dd-agent/wiki/Network-Traffic-and-Proxy-Configuration
# non_local_traffic: no
# ========================================================================== #
# System Metrics configuration #
# ========================================================================== #
# Enabled categories of system metrics to collect
# Current system metrics available: cpu,disk,io,load,memory
system_metrics: cpu,disk,io,load,memory
# Disk and IO #
# Some infrastructures have many constantly changing virtual devices (e.g. folks
# running constantly churning linux containers) whose metrics aren't
# interesting. To filter out a particular pattern of devices
# from disk and IO collection, configure a regex here:
# device_blacklist_re: .*\/dev\/mapper\/lxc-box.*
# For disk metrics it is also possible to ignore entire filesystem types
ignore_filesystem_types: tmpfs,devtmpfs
[Statsd]
# ========================================================================== #
# Monasca Statsd configuration #

conf.d/cpu.yaml (new file, +5)

@ -0,0 +1,5 @@
init_config:
instances:
# Cpu check only supports one configured instance
- name: cpu_stats

conf.d/disk.yaml (new file, +19)

@ -0,0 +1,19 @@
init_config:
instances:
# Disk check only supports one configured instance
- name: disk_stats
# Add a mount_point dimension to the disk system metrics (Default = True)
#use_mount: False
# Send additional disk i/o system metrics (Default = True)
#send_io_stats: False
# Some infrastructures have many constantly changing virtual devices (e.g. folks
# running constantly churning linux containers) whose metrics aren't
# interesting. To filter out a particular pattern of devices
# from the disk collection, configure a regex here:
# device_blacklist_re: .*\/dev\/mapper\/lxc-box.*
# For disk metrics it is also possible to ignore entire filesystem types
ignore_filesystem_types: tmpfs,devtmpfs
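Each of these new conf.d templates follows the agent's plugin layout: an init_config block plus an instances list whose keys the check reads back with instance.get(). A hedged sketch of how the disk check sees this file once loaded; the loading code here is illustrative, not the agent's actual loader:

```python
import yaml

with open('conf.d/disk.yaml') as f:
    cfg = yaml.safe_load(f)

instance = cfg['instances'][0]  # the single disk_stats instance
use_mount = instance.get('use_mount', True)              # default True
send_io_stats = instance.get('send_io_stats', True)      # default True
ignored = instance.get('ignore_filesystem_types', None)  # "tmpfs,devtmpfs"
```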

conf.d/load.yaml (new file, +5)

@ -0,0 +1,5 @@
init_config:
instances:
# Load check only supports one configured instance
- name: load_stats

conf.d/memory.yaml (new file, +5)

@ -0,0 +1,5 @@
init_config:
instances:
# Memory check only supports one configured instance
- name: memory_stats


@ -2,10 +2,9 @@ init_config:
instances:
# Network check only supports one configured instance
- collect_connection_state: false
excluded_interfaces:
- excluded_interfaces:
- lo
- lo0
# Optionally completely ignore any network interface
# matching the given regex:
#excluded_interface_re: my-network-interface.*
#excluded_interface_re: my-network-interface


@ -1,2 +0,0 @@
site_name: monasca-agent
repo_url: https://github.com/stackforge/monasca-agent


@ -260,7 +260,7 @@ class Check(util.Dimensions):
except Exception:
pass
if prettyprint:
print("Metrics: {}".format(metrics))
print("Metrics: {0}".format(metrics))
return metrics
@ -442,11 +442,11 @@ class AgentCheck(util.Dimensions):
metrics = self.aggregator.flush()
if prettyprint:
for metric in metrics:
print(" Timestamp: {}".format(metric.timestamp))
print(" Name: {}".format(metric.name))
print(" Value: {}".format(metric.value))
print(" Timestamp: {0}".format(metric.timestamp))
print(" Name: {0}".format(metric.name))
print(" Value: {0}".format(metric.value))
if (metric.delegated_tenant):
print(" Delegtd ID: {}".format(metric.delegated_tenant))
print(" Delegtd ID: {0}".format(metric.delegated_tenant))
print(" Dimensions: ", end='')
line = 0
@ -606,7 +606,7 @@ def run_check(name, path=None):
# Read the config file
config = Config()
confd_path = path or os.path.join(config.get_confd_path(),
'{}.yaml'.format(name))
'{0}.yaml'.format(name))
try:
f = open(confd_path)


@ -1,7 +1,6 @@
# Core modules
import logging
import socket
import system.unix as u
import system.win32 as w32
import threading
import time
@ -40,24 +39,16 @@ class Collector(util.Dimensions):
self.initialized_checks_d = []
self.init_failed_checks_d = []
# add system checks
# add windows system checks
if self.os == 'windows':
self._checks = [w32.Disk(log),
w32.IO(log),
w32.Processes(log),
w32.Memory(log),
w32.Network(log),
w32.Cpu(log)]
w32.IO(log),
w32.Processes(log),
w32.Memory(log),
w32.Network(log),
w32.Cpu(log)]
else:
possible_checks = {'cpu': u.Cpu,
'disk': u.Disk,
'io': u.IO,
'load': u.Load,
'memory': u.Memory}
self._checks = []
# Only setup the configured system checks
for check in self.agent_config.get('system_metrics', []):
self._checks.append(possible_checks[check](log, self.agent_config))
if checksd:
# is of type {check_name: check}
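The switch from the old-style Check to AgentCheck also changes how measurements leave a check: the deleted system/unix.py classes below build and return metrics.Measurement lists from check(), whereas the new classes push values through self.gauge()/self.rate() and let the aggregator flush them. A schematic contrast with a made-up value:

```python
import time

import monasca_agent.collector.checks as checks
import monasca_agent.collector.checks.check as check
import monasca_agent.common.metrics as metrics


class OldStyleCpu(check.Check):
    def check(self):
        # Old style: the check constructs Measurement objects itself.
        dimensions = self._set_dimensions(None)
        return [metrics.Measurement('cpu.idle_perc', time.time(),
                                    95.0, dimensions)]


class NewStyleCpu(checks.AgentCheck):
    def check(self, instance):
        # New style: values go into the aggregator and are flushed
        # after the collection run.
        self.gauge('cpu.idle_perc', 95.0,
                   dimensions=self._set_dimensions(None, instance))
```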


@ -110,7 +110,7 @@ class ServicesCheck(monasca_agent.collector.checks.AgentCheck):
self.resultsq.put(result)
except Exception:
self.log.exception('Failure in ServiceCheck {}'.format(name))
self.log.exception('Failure in ServiceCheck {0}'.format(name))
result = (FAILURE, FAILURE, FAILURE, FAILURE)
self.resultsq.put(result)


@ -1,558 +0,0 @@
"""Unix system checks.
"""
# stdlib
import logging
import psutil
import re
import subprocess as sp
import sys
import time
# project
import monasca_agent.collector.checks.check as check
import monasca_agent.common.metrics as metrics
import monasca_agent.common.util as util
# locale-resilient float converter
to_float = lambda s: float(s.replace(",", "."))
class Disk(check.Check):
"""Collects metrics about the machine's disks.
"""
def check(self):
"""Get disk space/inode stats.
"""
# psutil can be used for disk usage stats but not for inode information, so df is used.
fs_types_to_ignore = []
# First get the configuration.
if self.agent_config is not None:
use_mount = self.agent_config.get("use_mount", False)
blacklist_re = self.agent_config.get('device_blacklist_re', None)
for fs_type in self.agent_config.get('ignore_filesystem_types', []):
fs_types_to_ignore.extend(['-x', fs_type])
else:
use_mount = False
blacklist_re = None
platform_name = sys.platform
try:
dfk_out = _get_subprocess_output(['df', '-k'] + fs_types_to_ignore)
stats = self.parse_df_output(
dfk_out,
platform_name,
use_mount=use_mount,
blacklist_re=blacklist_re
)
# Collect inode metrics.
dfi_out = _get_subprocess_output(['df', '-i'] + fs_types_to_ignore)
inodes = self.parse_df_output(
dfi_out,
platform_name,
inodes=True,
use_mount=use_mount,
blacklist_re=blacklist_re
)
# parse into a list of Measurements
stats.update(inodes)
timestamp = time.time()
measurements = [metrics.Measurement(key.split('.', 1)[1],
timestamp,
value,
self._set_dimensions({'device': key.split('.', 1)[0]}),
None)
for key, value in stats.iteritems()]
return measurements
except Exception:
self.logger.exception('Error collecting disk stats')
return []
def parse_df_output(
self, df_output, platform_name, inodes=False, use_mount=False, blacklist_re=None):
"""Parse the output of the df command.
If use_volume is true the volume is used to anchor the metric, otherwise false the mount
point is used. Returns a tuple of (disk, inode).
"""
usage_data = {}
# Transform the raw output into tuples of the df data.
devices = self._transform_df_output(df_output, blacklist_re)
# If we want to use the mount point, replace the volume name on each line.
for parts in devices:
try:
if use_mount:
parts[0] = parts[-1]
if inodes:
if util.Platform.is_darwin(platform_name):
# Filesystem 512-blocks Used Available Capacity iused ifree %iused Mounted
# Inodes are in position 5, 6 and we need to compute the total
# Total
parts[1] = int(parts[5]) + int(parts[6]) # Total
parts[2] = int(parts[5]) # Used
parts[3] = int(parts[6]) # Available
elif util.Platform.is_freebsd(platform_name):
# Filesystem 1K-blocks Used Avail Capacity iused ifree %iused Mounted
# Inodes are in position 5, 6 and we need to compute the total
parts[1] = int(parts[5]) + int(parts[6]) # Total
parts[2] = int(parts[5]) # Used
parts[3] = int(parts[6]) # Available
else:
parts[1] = int(parts[1]) # Total
parts[2] = int(parts[2]) # Used
parts[3] = int(parts[3]) # Available
else:
parts[1] = int(parts[1]) # Total
parts[2] = int(parts[2]) # Used
parts[3] = int(parts[3]) # Available
except IndexError:
self.logger.exception("Cannot parse %s" % (parts,))
# Some partitions (EFI boot) may appear to have 0 available inodes
if parts[1] == 0:
continue
#
# Remote shared storage device names like '10.103.0.220:/instances'
# cause invalid metrics on the api server side, so if we encounter
# a colon, remove everything to the left of it (including the
# offending colon).
#
device_name = parts[0]
idx = device_name.find(":")
if idx > 0:
device_name = device_name[(idx+1):]
if inodes:
usage_data['%s.disk.inode_used_perc' % device_name] = float(parts[2]) / parts[1] * 100
else:
usage_data['%s.disk.space_used_perc' % device_name] = float(parts[2]) / parts[1] * 100
return usage_data
@staticmethod
def _is_number(a_string):
try:
float(a_string)
except ValueError:
return False
return True
def _is_real_device(self, device):
"""Return true if we should track the given device name and false otherwise.
"""
# First, skip empty lines.
if not device or len(device) <= 1:
return False
# Filter out fake devices.
device_name = device[0]
if device_name == 'none':
return False
# Now filter our fake hosts like 'map -hosts'. For example:
# Filesystem 1024-blocks Used Available Capacity Mounted on
# /dev/disk0s2 244277768 88767396 155254372 37% /
# map -hosts 0 0 0 100% /net
blocks = device[1]
if not self._is_number(blocks):
return False
return True
def _flatten_devices(self, devices):
# Some volumes are stored on their own line. Rejoin them here.
previous = None
for parts in devices:
if len(parts) == 1:
previous = parts[0]
elif previous and self._is_number(parts[0]):
# collate with previous line
parts.insert(0, previous)
previous = None
else:
previous = None
return devices
def _transform_df_output(self, df_output, blacklist_re):
"""Given raw output for the df command, transform it into a normalized list devices.
A 'device' is a list with fields corresponding to the output of df output on each platform.
"""
all_devices = [l.strip().split() for l in df_output.split("\n")]
# Skip the header row and empty lines.
raw_devices = [l for l in all_devices[1:] if l]
# Flatten the disks that appear in multiple lines.
flattened_devices = self._flatten_devices(raw_devices)
# Filter fake disks.
def keep_device(device):
if not self._is_real_device(device):
return False
if blacklist_re and blacklist_re.match(device[0]):
return False
return True
devices = filter(keep_device, flattened_devices)
return devices
class IO(check.Check):
def __init__(self, logger, agent_config=None):
super(IO, self).__init__(logger, agent_config)
self.header_re = re.compile(r'([%\\/\-_a-zA-Z0-9]+)[\s+]?')
self.item_re = re.compile(r'^([a-zA-Z0-9\/]+)')
self.value_re = re.compile(r'\d+\.\d+')
self.stat_blacklist = ["await", "wrqm/s", "avgqu-sz", "r_await", "w_await", "rrqm/s",
"avgrq-sz", "%util", "svctm"]
def _parse_linux2(self, output):
recent_stats = output.split('Device:')[2].split('\n')
header = recent_stats[0]
header_names = re.findall(self.header_re, header)
io_stats = {}
for statsIndex in range(1, len(recent_stats)):
row = recent_stats[statsIndex]
if not row:
# Ignore blank lines.
continue
device_match = self.item_re.match(row)
if device_match is not None:
# Sometimes device names span two lines.
device = device_match.groups()[0]
else:
continue
values = re.findall(self.value_re, row)
if not values:
# Sometimes values are on the next line so we encounter
# instances of [].
continue
io_stats[device] = {}
for header_index in range(len(header_names)):
header_name = header_names[header_index]
io_stats[device][self.xlate(header_name, "linux")] = values[header_index]
return io_stats
@staticmethod
def _parse_darwin(output):
lines = [l.split() for l in output.split("\n") if len(l) > 0]
disks = lines[0]
lastline = lines[-1]
io = {}
for idx, disk in enumerate(disks):
kb_t, tps, mb_s = map(float, lastline[(3 * idx):(3 * idx) + 3]) # 3 cols at a time
io[disk] = {
'system.io.bytes_per_s': mb_s * 10 ** 6,
}
return io
@staticmethod
def xlate(metric_name, os_name):
"""Standardize on linux metric names.
"""
if os_name == "sunos":
names = {"wait": "await",
"svc_t": "svctm",
"%b": "%util",
"kr/s": "io.read_kbytes_sec",
"kw/s": "io.write_kbytes_sec",
"actv": "avgqu-sz"}
elif os_name == "freebsd":
names = {"svc_t": "await",
"%b": "%util",
"kr/s": "io.read_kbytes_sec",
"kw/s": "io.write_kbytes_sec",
"wait": "avgqu-sz"}
elif os_name == "linux":
names = {"rkB/s": "io.read_kbytes_sec",
"r/s": "io.read_req_sec",
"wkB/s": "io.write_kbytes_sec",
"w/s": "io.write_req_sec"}
# translate if possible
return names.get(metric_name, metric_name)
def check(self):
"""Capture io stats.
@rtype dict
@return [metrics.Measurement, ]
"""
# TODO psutil.disk_io_counters() will also return this information but it isn't per second and so must be
# converted possibly by doing two runs a second apart or storing timestamp+data from the previous collection
io = {}
try:
if util.Platform.is_linux():
stdout = sp.Popen(['iostat', '-d', '1', '2', '-x', '-k'],
stdout=sp.PIPE,
close_fds=True).communicate()[0]
# Linux 2.6.32-343-ec2 (ip-10-35-95-10) 12/11/2012 _x86_64_ (2 CPU)
#
# Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
# sda1 0.00 17.61 0.26 32.63 4.23 201.04 12.48 0.16 4.81 0.53 1.73
# sdb 0.00 2.68 0.19 3.84 5.79 26.07 15.82 0.02 4.93 0.22 0.09
# sdg 0.00 0.13 2.29 3.84 100.53 30.61 42.78 0.05 8.41 0.88 0.54
# sdf 0.00 0.13 2.30 3.84 100.54 30.61 42.78 0.06 9.12 0.90 0.55
# md0 0.00 0.00 0.05 3.37 1.41 30.01 18.35 0.00 0.00 0.00 0.00
#
# Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
# sda1 0.00 0.00 0.00 10.89 0.00 43.56 8.00 0.03 2.73 2.73 2.97
# sdb 0.00 0.00 0.00 2.97 0.00 11.88 8.00 0.00 0.00 0.00 0.00
# sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# 0.00 0.00 0.00 0.00
io.update(self._parse_linux2(stdout))
elif sys.platform == "sunos5":
iostat = sp.Popen(["iostat", "-x", "-d", "1", "2"],
stdout=sp.PIPE,
close_fds=True).communicate()[0]
# extended device statistics <-- since boot
# device r/s w/s kr/s kw/s wait actv svc_t %w %b
# ramdisk1 0.0 0.0 0.1 0.1 0.0 0.0 0.0 0 0
# sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
# sd1 79.9 149.9 1237.6 6737.9 0.0 0.5 2.3 0 11
# extended device statistics <-- past second
# device r/s w/s kr/s kw/s wait actv svc_t %w %b
# ramdisk1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
# sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
# sd1 0.0 139.0 0.0 1850.6 0.0 0.0 0.1 0 1
# discard the first half of the display (stats since boot)
lines = [l for l in iostat.split("\n") if len(l) > 0]
lines = lines[len(lines) / 2:]
assert "extended device statistics" in lines[0]
headers = lines[1].split()
assert "device" in headers
for l in lines[2:]:
cols = l.split()
# cols[0] is the device
# cols[1:] are the values
io[cols[0]] = {}
for i in range(1, len(cols)):
io[cols[0]][self.xlate(headers[i], "sunos")] = cols[i]
elif sys.platform.startswith("freebsd"):
iostat = sp.Popen(["iostat", "-x", "-d", "1", "2"],
stdout=sp.PIPE,
close_fds=True).communicate()[0]
# Be careful!
# It looks like SunOS, but some columns (wait, svc_t) have different meaning
# extended device statistics
# device r/s w/s kr/s kw/s wait svc_t %b
# ad0 3.1 1.3 49.9 18.8 0 0.7 0
# extended device statistics
# device r/s w/s kr/s kw/s wait svc_t %b
# ad0 0.0 2.0 0.0 31.8 0 0.2 0
# discard the first half of the display (stats since boot)
lines = [l for l in iostat.split("\n") if len(l) > 0]
lines = lines[len(lines) / 2:]
assert "extended device statistics" in lines[0]
headers = lines[1].split()
assert "device" in headers
for l in lines[2:]:
cols = l.split()
# cols[0] is the device
# cols[1:] are the values
io[cols[0]] = {}
for i in range(1, len(cols)):
io[cols[0]][self.xlate(headers[i], "freebsd")] = cols[i]
elif sys.platform == 'darwin':
iostat = sp.Popen(['iostat', '-d', '-c', '2', '-w', '1'],
stdout=sp.PIPE,
close_fds=True).communicate()[0]
# disk0 disk1 <-- number of disks
# KB/t tps MB/s KB/t tps MB/s
# 21.11 23 0.47 20.01 0 0.00
# 6.67 3 0.02 0.00 0 0.00 <-- line of interest
io = self._parse_darwin(iostat)
else:
return []
# If we filter devices, do it now.
if self.agent_config is not None:
device_blacklist_re = self.agent_config.get('device_blacklist_re', None)
else:
device_blacklist_re = None
if device_blacklist_re:
filtered_io = {}
for device, stats in io.iteritems():
if not device_blacklist_re.match(device):
filtered_io[device] = stats
else:
filtered_io = io
measurements = []
timestamp = time.time()
for dev_name, stats in filtered_io.iteritems():
filtered_stats = dict((stat, stats[stat])
for stat in stats.iterkeys() if stat not in self.stat_blacklist)
m_list = [metrics.Measurement(key,
timestamp,
value,
self._set_dimensions({'device': dev_name}),
None)
for key, value in filtered_stats.iteritems()]
measurements.extend(m_list)
return measurements
except Exception:
self.logger.exception("Cannot extract IO statistics")
return []
class Load(check.Check):
def check(self):
if util.Platform.is_linux():
try:
loadAvrgProc = open('/proc/loadavg', 'r')
uptime = loadAvrgProc.readlines()
loadAvrgProc.close()
except Exception:
self.logger.exception('Cannot extract load')
return []
uptime = uptime[0] # readlines() provides a list but we want a string
elif sys.platform in ('darwin', 'sunos5') or sys.platform.startswith("freebsd"):
# Get output from uptime
try:
uptime = sp.Popen(['uptime'],
stdout=sp.PIPE,
close_fds=True).communicate()[0]
except Exception:
self.logger.exception('Cannot extract load')
return {}
# Split out the 3 load average values
load = [res.replace(',', '.') for res in re.findall(r'([0-9]+[\.,]\d+)', uptime)]
timestamp = time.time()
dimensions = self._set_dimensions(None)
return [metrics.Measurement('load.avg_1_min', timestamp, float(load[0]), dimensions),
metrics.Measurement('load.avg_5_min', timestamp, float(load[1]), dimensions),
metrics.Measurement('load.avg_15_min', timestamp, float(load[2]), dimensions)]
class Memory(check.Check):
def check(self):
mem_info = psutil.virtual_memory()
swap_info = psutil.swap_memory()
mem_data = {
'mem.total_mb': int(mem_info.total/1048576),
'mem.free_mb': int(mem_info.free/1048576),
'mem.usable_mb': int(mem_info.available/1048576),
'mem.usable_perc': float(100 - mem_info.percent),
'mem.swap_total_mb': int(swap_info.total/1048576),
'mem.swap_used_mb': int(swap_info.used/1048576),
'mem.swap_free_mb': int(swap_info.free/1048576),
'mem.swap_free_perc': float(100 - swap_info.percent)
}
if 'buffers' in mem_info:
mem_data['mem.used_buffers'] = int(mem_info.buffers/1048576)
if 'cached' in mem_info:
mem_data['mem.used_cache'] = int(mem_info.cached/1048576)
if 'shared' in mem_info:
mem_data['mem.used_shared'] = int(mem_info.shared/1048576)
timestamp = time.time()
dimensions = self._set_dimensions(None)
return [metrics.Measurement(key, timestamp, value, dimensions) for key, value in mem_data.iteritems()]
class Cpu(check.Check):
def check(self):
"""Return an aggregate of CPU stats across all CPUs.
"""
cpu_stats = psutil.cpu_times_percent(percpu=False)
return self._format_results(cpu_stats.user + cpu_stats.nice,
cpu_stats.system + cpu_stats.irq + cpu_stats.softirq,
cpu_stats.iowait,
cpu_stats.idle,
cpu_stats.steal)
def _format_results(self, us, sy, wa, idle, st):
data = {'cpu.user_perc': us,
'cpu.system_perc': sy,
'cpu.wait_perc': wa,
'cpu.idle_perc': idle,
'cpu.stolen_perc': st}
for key in data.keys():
if data[key] is None:
del data[key]
timestamp = time.time()
dimensions = self._set_dimensions(None)
return [metrics.Measurement(key, timestamp, value, dimensions) for key, value in data.iteritems()]
def _get_subprocess_output(command):
"""Run the given subprocess command and return it's output.
Raise an Exception if an error occurs.
"""
proc = sp.Popen(command, stdout=sp.PIPE, close_fds=True)
return proc.stdout.read()
if __name__ == '__main__':
# 1s loop with results
logging.basicConfig(level=logging.DEBUG, format='%(asctime)-15s %(message)s')
log = logging.getLogger()
config = {"device_blacklist_re": re.compile('.*disk0.*')}
cpu = Cpu(log, config)
disk = Disk(log, config)
io = IO(log, config)
load = Load(log, config)
mem = Memory(log, config)
while True:
print("=" * 10)
print("--- IO ---")
print(io.check())
print("--- Disk ---")
print(disk.check())
print("--- CPU ---")
print(cpu.check())
print("--- Load ---")
print(load.check())
print("--- Memory ---")
print(mem.check())
print("\n\n\n")
time.sleep(1)


@ -0,0 +1,41 @@
import psutil
import logging
import monasca_agent.collector.checks as checks
log = logging.getLogger(__name__)
class Cpu(checks.AgentCheck):
def __init__(self, name, init_config, agent_config):
super(Cpu, self).__init__(name, init_config, agent_config)
def check(self, instance):
"""Capture cpu stats
"""
dimensions = self._set_dimensions(None, instance)
cpu_stats = psutil.cpu_times_percent(percpu=False)
self._format_results(cpu_stats.user + cpu_stats.nice,
cpu_stats.system + cpu_stats.irq + cpu_stats.softirq,
cpu_stats.iowait,
cpu_stats.idle,
cpu_stats.steal,
dimensions)
def _format_results(self, us, sy, wa, idle, st, dimensions):
data = {'cpu.user_perc': us,
'cpu.system_perc': sy,
'cpu.wait_perc': wa,
'cpu.idle_perc': idle,
'cpu.stolen_perc': st}
for key in data.keys():
if data[key] is None:
del data[key]
[self.gauge(key, value, dimensions) for key, value in data.iteritems()]
log.debug('Collected {0} cpu metrics'.format(len(data)))
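For reference, psutil.cpu_times_percent(percpu=False) returns a namedtuple of CPU-time percentages since the previous call (the very first call measures from interpreter start, so its value is usually discarded); on Linux the fields combined above look like this:

```python
import psutil

cpu = psutil.cpu_times_percent(percpu=False)
# Linux fields include user, nice, system, irq, softirq, iowait,
# idle and steal; together they sum to roughly 100.
print('cpu.user_perc   = {0}'.format(cpu.user + cpu.nice))
print('cpu.system_perc = {0}'.format(cpu.system + cpu.irq + cpu.softirq))
print('cpu.wait_perc   = {0}'.format(cpu.iowait))
print('cpu.idle_perc   = {0}'.format(cpu.idle))
print('cpu.stolen_perc = {0}'.format(cpu.steal))
```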


@ -0,0 +1,102 @@
import psutil
import logging
import os
import re
log = logging.getLogger(__name__)
import monasca_agent.collector.checks as checks
class Disk(checks.AgentCheck):
def __init__(self, name, init_config, agent_config):
super(Disk, self).__init__(name, init_config, agent_config)
def check(self, instance):
"""Capture disk stats
"""
dimensions = self._set_dimensions(None, instance)
if instance is not None:
use_mount = instance.get("use_mount", True)
send_io_stats = instance.get("send_io_stats", True)
# If we filter devices, get the list.
device_blacklist_re = self._get_re_exclusions(instance)
fs_types_to_ignore = self._get_fs_exclusions(instance)
else:
use_mount = True
fs_types_to_ignore = []
device_blacklist_re = None
send_io_stats = True
partitions = psutil.disk_partitions()
disk_count = 0
for partition in partitions:
if (partition.fstype not in fs_types_to_ignore
and (not device_blacklist_re
or not device_blacklist_re.match(partition.device))):
device_name = self._get_device_name(partition.device)
disk_usage = psutil.disk_usage(partition.mountpoint)
st = os.statvfs(partition.mountpoint)
if use_mount:
dimensions.update({'mount_point': partition.mountpoint})
self.gauge("disk.space_used_perc",
disk_usage.percent,
device_name=device_name,
dimensions=dimensions)
disk_count += 1
if st.f_files > 0:
self.gauge("disk.inode_used_perc",
round((float(st.f_files - st.f_ffree) / st.f_files) * 100, 2),
device_name=device_name,
dimensions=dimensions)
disk_count += 1
log.debug('Collected {0} disk usage metrics for partition {1}'.format(disk_count, partition.mountpoint))
disk_count = 0
if send_io_stats:
disk_stats = psutil.disk_io_counters(perdisk=True)
stats = disk_stats[device_name]
self.rate("io.read_req_sec", float(stats.read_count), device_name=device_name, dimensions=dimensions)
self.rate("io.write_req_sec", float(stats.write_count), device_name=device_name, dimensions=dimensions)
self.rate("io.read_kbytes_sec", float(stats.read_bytes / 1024), device_name=device_name, dimensions=dimensions)
self.rate("io.write_kbytes_sec", float(stats.write_bytes / 1024), device_name=device_name, dimensions=dimensions)
self.rate("io.read_time_sec", float(stats.read_time / 1000), device_name=device_name, dimensions=dimensions)
self.rate("io.write_time_sec", float(stats.write_time / 1000), device_name=device_name, dimensions=dimensions)
log.debug('Collected 6 disk I/O metrics for partition {0}'.format(partition.mountpoint))
def _get_re_exclusions(self, instance):
"""Parse device blacklist regular expression"""
filter = None
try:
filter_device_re = instance.get('device_blacklist_re', None)
if filter_device_re:
filter = re.compile(filter_device_re)
except re.error as err:
log.error('Error processing regular expression {0}'.format(filter_device_re))
return filter
def _get_fs_exclusions(self, instance):
"""parse comma separated file system types to ignore list"""
file_system_list = []
try:
file_systems = instance.get('ignore_filesystem_types', None)
if file_systems:
# Parse file system types
file_system_list.extend([x.strip() for x in file_systems.split(',')])
except ValueError:
log.info("Unable to process ignore_filesystem_types.")
return file_system_list
def _get_device_name(self, device):
start = device.rfind("/")
if start > -1:
device_name = device[start + 1:]
else:
device_name = device
return device_name
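psutil.disk_io_counters() returns cumulative counters, so the self.rate() calls above are what turn them into per-second values; this is exactly the conversion the TODO in the deleted unix.py mentions (keep the previous sample and divide the delta by the elapsed time). A standalone sketch of that conversion, independent of the agent's aggregator:

```python
import time

import psutil


def sample():
    return time.time(), psutil.disk_io_counters(perdisk=True)

t0, prev = sample()
time.sleep(1)
t1, curr = sample()

for dev, stats in curr.items():
    if dev not in prev:
        continue  # device appeared between the two samples
    elapsed = t1 - t0
    read_kbytes_sec = (stats.read_bytes - prev[dev].read_bytes) / 1024.0 / elapsed
    write_req_sec = (stats.write_count - prev[dev].write_count) / float(elapsed)
    print('{0}: {1:.1f} kB/s read, {2:.1f} write req/s'.format(
        dev, read_kbytes_sec, write_req_sec))
```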


@ -64,7 +64,7 @@ class HTTPCheck(services_checks.ServicesCheck):
headers["Content-type"] = "application/json"
else:
self.log.warning("""Unable to get token. Keystone API server may be down.
Skipping check for {}""".format(addr))
Skipping check for {0}""".format(addr))
return
try:
self.log.debug("Connecting to %s" % addr)


@ -69,7 +69,7 @@ consumer_groups:
consumer_offsets[(consumer_group, topic, partition)] = kafka_consumer.offsets[partition]
except KeyError:
kafka_consumer.stop()
self.log.error('Error fetching consumer offset for {} partition {}'.format(topic, partition))
self.log.error('Error fetching consumer offset for {0} partition {1}'.format(topic, partition))
kafka_consumer.stop()
# Query Kafka for the broker offsets, done in a separate loop so only one query is done
@ -82,7 +82,7 @@ consumer_groups:
response = kafka_conn.send_offset_request([common.OffsetRequest(topic, p, -1, 1)])
offset_responses.append(response[0])
except common.KafkaError as e:
self.log.error("Error fetching broker offset: {}".format(e))
self.log.error("Error fetching broker offset: {0}".format(e))
for resp in offset_responses:
broker_offsets[(resp.topic, resp.partition)] = resp.offsets[0]


@ -32,10 +32,10 @@ class LibvirtCheck(AgentCheck):
def __init__(self, name, init_config, agent_config):
AgentCheck.__init__(self, name, init_config, agent_config)
self.instance_cache_file = "{}/{}".format(self.init_config.get('cache_dir'),
'libvirt_instances.yaml')
self.metric_cache_file = "{}/{}".format(self.init_config.get('cache_dir'),
'libvirt_metrics.yaml')
self.instance_cache_file = "{0}/{1}".format(self.init_config.get('cache_dir'),
'libvirt_instances.yaml')
self.metric_cache_file = "{0}/{1}".format(self.init_config.get('cache_dir'),
'libvirt_metrics.yaml')
def _test_vm_probation(self, created):
"""Test to see if a VM was created within the probation period.
@ -79,7 +79,7 @@ class LibvirtCheck(AgentCheck):
if stat.S_IMODE(os.stat(self.instance_cache_file).st_mode) != 0600:
os.chmod(self.instance_cache_file, 0600)
except IOError as e:
self.log.error("Cannot write to {}: {}".format(self.instance_cache_file, e))
self.log.error("Cannot write to {0}: {1}".format(self.instance_cache_file, e))
return id_cache
@ -125,7 +125,7 @@ class LibvirtCheck(AgentCheck):
if stat.S_IMODE(os.stat(self.metric_cache_file).st_mode) != 0600:
os.chmod(self.metric_cache_file, 0600)
except IOError as e:
self.log.error("Cannot write to {}: {}".format(self.metric_cache_file, e))
self.log.error("Cannot write to {0}: {1}".format(self.metric_cache_file, e))
def check(self, instance):
"""Gather VM metrics for each instance"""
@ -150,8 +150,8 @@ class LibvirtCheck(AgentCheck):
# Skip instances created within the probation period
vm_probation_remaining = self._test_vm_probation(instance_cache.get(inst.name)['created'])
if (vm_probation_remaining >= 0):
self.log.info("Libvirt: {} in probation for another {} seconds".format(instance_cache.get(inst.name)['hostname'],
vm_probation_remaining))
self.log.info("Libvirt: {0} in probation for another {1} seconds".format(instance_cache.get(inst.name)['hostname'],
vm_probation_remaining))
continue
# Build customer dimensions
@ -187,7 +187,7 @@ class LibvirtCheck(AgentCheck):
sample_time = int(time.time())
disk_dimensions = {'device': disk[0].device}
for metric in disk[1]._fields:
metric_name = "io.{}".format(metric)
metric_name = "io.{0}".format(metric)
if metric_name not in metric_cache[inst.name]:
metric_cache[inst.name][metric_name] = {}
@ -197,7 +197,7 @@ class LibvirtCheck(AgentCheck):
val_diff = value - metric_cache[inst.name][metric_name][disk[0].device]['value']
# Change the metric name to a rate, ie. "io.read_requests"
# gets converted to "io.read_ops_sec"
rate_name = "{}_sec".format(metric_name.replace('requests', 'ops'))
rate_name = "{0}_sec".format(metric_name.replace('requests', 'ops'))
# Customer
this_dimensions = disk_dimensions.copy()
this_dimensions.update(dims_customer)
@ -207,7 +207,7 @@ class LibvirtCheck(AgentCheck):
# Operations (metric name prefixed with "vm.")
this_dimensions = disk_dimensions.copy()
this_dimensions.update(dims_operations)
self.gauge("vm.{}".format(rate_name), val_diff,
self.gauge("vm.{0}".format(rate_name), val_diff,
dimensions=this_dimensions)
# Save this metric to the cache
metric_cache[inst.name][metric_name][disk[0].device] = {
@ -219,7 +219,7 @@ class LibvirtCheck(AgentCheck):
sample_time = int(time.time())
vnic_dimensions = {'device': vnic[0].name}
for metric in vnic[1]._fields:
metric_name = "net.{}".format(metric)
metric_name = "net.{0}".format(metric)
if metric_name not in metric_cache[inst.name]:
metric_cache[inst.name][metric_name] = {}
@ -229,7 +229,7 @@ class LibvirtCheck(AgentCheck):
val_diff = value - metric_cache[inst.name][metric_name][vnic[0].name]['value']
# Change the metric name to a rate, ie. "net.rx_bytes"
# gets converted to "net.rx_bytes_sec"
rate_name = "{}_sec".format(metric_name)
rate_name = "{0}_sec".format(metric_name)
# Rename "tx" to "out" and "rx" to "in"
rate_name = rate_name.replace("tx", "out")
rate_name = rate_name.replace("rx", "in")
@ -243,7 +243,7 @@ class LibvirtCheck(AgentCheck):
# Operations (metric name prefixed with "vm.")
this_dimensions = vnic_dimensions.copy()
this_dimensions.update(dims_operations)
self.gauge("vm.{}".format(rate_name), val_diff,
self.gauge("vm.{0}".format(rate_name), val_diff,
dimensions=this_dimensions)
# Save this metric to the cache
metric_cache[inst.name][metric_name][vnic[0].name] = {


@ -0,0 +1,60 @@
import psutil
import logging
import re
import subprocess
import sys
import monasca_agent.collector.checks as checks
import monasca_agent.common.util as util
log = logging.getLogger(__name__)
class Load(checks.AgentCheck):
def __init__(self, name, init_config, agent_config):
super(Load, self).__init__(name, init_config, agent_config)
def check(self, instance):
"""Capture load stats
"""
dimensions = self._set_dimensions(None, instance)
if util.Platform.is_linux():
try:
loadAvrgProc = open('/proc/loadavg', 'r')
uptime = loadAvrgProc.readlines()
loadAvrgProc.close()
except Exception:
log.exception('Cannot extract load using /proc/loadavg')
return
uptime = uptime[0] # readlines() provides a list but we want a string
elif sys.platform in ('darwin', 'sunos5') or sys.platform.startswith("freebsd"):
# Get output from uptime
try:
uptime = subprocess.Popen(['uptime'],
stdout=subprocess.PIPE,
close_fds=True).communicate()[0]
except Exception:
log.exception('Cannot extract load using uptime')
return
# Split out the 3 load average values
load = [res.replace(',', '.') for res in re.findall(r'([0-9]+[\.,]\d+)', uptime)]
self.gauge('load.avg_1_min',
float(load[0]),
dimensions=dimensions)
self.gauge('load.avg_5_min',
float(load[1]),
dimensions=dimensions)
self.gauge('load.avg_15_min',
float(load[2]),
dimensions=dimensions)
log.debug('Collected 3 load metrics')
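As an aside, on platforms where it exists the standard library exposes the same three numbers directly; a simpler sketch (os.getloadavg() is Unix-only and raises OSError where unsupported):

```python
import os

try:
    avg_1, avg_5, avg_15 = os.getloadavg()
    print('load.avg_1_min  = {0}'.format(avg_1))
    print('load.avg_5_min  = {0}'.format(avg_5))
    print('load.avg_15_min = {0}'.format(avg_15))
except OSError:
    print('load averages not available on this platform')
```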


@ -0,0 +1,67 @@
import psutil
import logging
log = logging.getLogger(__name__)
import monasca_agent.collector.checks as checks
class Memory(checks.AgentCheck):
def __init__(self, name, init_config, agent_config):
super(Memory, self).__init__(name, init_config, agent_config)
def check(self, instance):
"""Capture memory stats
"""
dimensions = self._set_dimensions(None, instance)
mem_info = psutil.virtual_memory()
swap_info = psutil.swap_memory()
self.gauge('mem.total_mb',
int(mem_info.total/1048576),
dimensions=dimensions)
self.gauge('mem.free_mb',
int(mem_info.free/1048576),
dimensions=dimensions)
self.gauge('mem.usable_mb',
int(mem_info.available/1048576),
dimensions=dimensions)
self.gauge('mem.usable_perc',
float(100 - mem_info.percent),
dimensions=dimensions)
self.gauge('mem.swap_total_mb',
int(swap_info.total/1048576),
dimensions=dimensions)
self.gauge('mem.swap_used_mb',
int(swap_info.used/1048576),
dimensions=dimensions)
self.gauge('mem.swap_free_mb',
int(swap_info.free/1048576),
dimensions=dimensions)
self.gauge('mem.swap_free_perc',
float(100 - swap_info.percent),
dimensions=dimensions)
count = 8
if hasattr(mem_info, 'buffers'):
self.gauge('mem.used_buffers',
int(mem_info.buffers/1048576),
dimensions=dimensions)
count += 1
if hasattr(mem_info, 'cached'):
self.gauge('mem.used_cache',
int(mem_info.cached/1048576),
dimensions=dimensions)
count += 1
if hasattr(mem_info, 'shared'):
self.gauge('mem.used_shared',
int(mem_info.shared/1048576),
dimensions=dimensions)
count += 1
log.debug('Collected {0} memory metrics'.format(count))
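Note that buffers, cached and shared are Linux-only fields of psutil.virtual_memory(), which is why they are tested before use; since the result is a namedtuple (where `in` tests values rather than field names), an attribute check is the reliable form:

```python
import psutil

mem = psutil.virtual_memory()
for field in ('buffers', 'cached', 'shared'):
    # These fields exist on Linux but not on, e.g., OS X or Windows.
    if hasattr(mem, field):
        print('{0}: {1} MB'.format(field, getattr(mem, field) / 1048576))
```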


@ -100,8 +100,8 @@ class WrapNagios(ServicesCheck):
self.gauge(instance['name'], status_code, dimensions=dimensions)
# Return DOWN on critical, UP otherwise
if status_code == "2":
return Status.DOWN, "DOWN: {}".format(detail)
return Status.UP, "UP: {}".format(detail)
return Status.DOWN, "DOWN: {0}".format(detail)
return Status.UP, "UP: {0}".format(detail)
# Save last-run data
file_w = open(last_run_file, "w")


@ -1,352 +1,48 @@
"""Collects network metrics.
"""
# stdlib
import logging
import re
import subprocess
import psutil
# project
from monasca_agent.collector.checks import AgentCheck
from monasca_agent.common.util import Platform
import monasca_agent.collector.checks as checks
log = logging.getLogger(__name__)
class Network(AgentCheck):
class Network(checks.AgentCheck):
TCP_STATES = {
"ESTABLISHED": "established",
"SYN_SENT": "opening",
"SYN_RECV": "opening",
"FIN_WAIT1": "closing",
"FIN_WAIT2": "closing",
"TIME_WAIT": "time_wait",
"CLOSE": "closing",
"CLOSE_WAIT": "closing",
"LAST_ACK": "closing",
"LISTEN": "listening",
"CLOSING": "closing",
}
NETSTAT_GAUGE = {
('udp4', 'connections'): 'net.udp4_connections',
('udp6', 'connections'): 'net.udp6_connections',
('tcp4', 'established'): 'net.tcp4_established',
('tcp4', 'opening'): 'net.tcp4_opening',
('tcp4', 'closing'): 'net.tcp4_closing',
('tcp4', 'listening'): 'net.tcp4_listening',
('tcp4', 'time_wait'): 'net.tcp4_time_wait',
('tcp6', 'established'): 'net.tcp6_established',
('tcp6', 'opening'): 'net.tcp6_opening',
('tcp6', 'closing'): 'net.tcp6_closing',
('tcp6', 'listening'): 'net.tcp6_listening',
('tcp6', 'time_wait'): 'net.tcp6_time_wait',
}
def __init__(self, name, init_config, agent_config, instances=None):
AgentCheck.__init__(self, name, init_config, agent_config, instances=instances)
if instances is not None and len(instances) > 1:
raise Exception("Network check only supports one configured instance.")
def __init__(self, name, init_config, agent_config):
super(Network, self).__init__(name, init_config, agent_config)
def check(self, instance):
if instance is None:
instance = {}
"""Capture network metrics
"""
dimensions = self._set_dimensions(None, instance)
self._excluded_ifaces = instance.get('excluded_interfaces', [])
self._collect_cx_state = instance.get('collect_connection_state', False)
excluded_ifaces = instance.get('excluded_interfaces', [])
if excluded_ifaces:
log.debug('Excluding network devices: {0}'.format(excluded_ifaces))
self._exclude_iface_re = None
exclude_re = instance.get('excluded_interface_re', None)
if exclude_re:
self.log.debug("Excluding network devices matching: %s" % exclude_re)
self._exclude_iface_re = re.compile(exclude_re)
if Platform.is_linux():
self._check_linux(instance, dimensions)
elif Platform.is_bsd():
self._check_bsd(instance, dimensions)
elif Platform.is_solaris():
self._check_solaris(instance, dimensions)
def _submit_devicemetrics(self, iface, vals_by_metric, dimensions):
if self._exclude_iface_re and self._exclude_iface_re.match(iface):
# Skip this network interface.
return False
expected_metrics = [
'in_bytes',
'out_bytes',
'in_packets',
'in_errors',
'out_packets',
'out_errors',
]
for m in expected_metrics:
assert m in vals_by_metric
assert len(vals_by_metric) == len(expected_metrics)
# For reasons I don't understand only these metrics are skipped if a
# particular interface is in the `excluded_interfaces` config list.
# Not sure why the others aren't included. Until I understand why, I'm
# going to keep the same behaviour.
exclude_iface_metrics = [
'in_packets',
'in_errors',
'out_packets',
'out_errors',
]
count = 0
for metric, val in vals_by_metric.iteritems():
if iface in self._excluded_ifaces and metric in exclude_iface_metrics:
# skip it!
continue
self.rate('net.%s' % metric, val, device_name=iface, dimensions=dimensions)
count += 1
self.log.debug("tracked %s network metrics for interface %s" % (count, iface))
@staticmethod
def _parse_value(v):
if v == "-":
return 0
exclude_iface_re = re.compile(exclude_re)
log.debug('Excluding network devices matching: {0}'.format(exclude_re))
else:
try:
return long(v)
except ValueError:
return 0
exclude_iface_re = None
def _check_linux(self, instance, dimensions):
if self._collect_cx_state:
netstat = subprocess.Popen(["netstat", "-n", "-u", "-t", "-a"],
stdout=subprocess.PIPE,
close_fds=True).communicate()[0]
# Active Internet connections (w/o servers)
# Proto Recv-Q Send-Q Local Address Foreign Address State
# tcp 0 0 46.105.75.4:80 79.220.227.193:2032 SYN_RECV
# tcp 0 0 46.105.75.4:143 90.56.111.177:56867 ESTABLISHED
# tcp 0 0 46.105.75.4:50468 107.20.207.175:443 TIME_WAIT
# tcp6 0 0 46.105.75.4:80 93.15.237.188:58038 FIN_WAIT2
# tcp6 0 0 46.105.75.4:80 79.220.227.193:2029 ESTABLISHED
# udp 0 0 0.0.0.0:123 0.0.0.0:*
# udp6 0 0 :::41458 :::*
nics = psutil.net_io_counters(pernic=True)
count = 0
for nic_name in nics.keys():
if nic_name not in excluded_ifaces and (not exclude_iface_re or not exclude_iface_re.match(nic_name)):
nic = nics[nic_name]
self.rate('net.in_bytes_sec', nic.bytes_recv, device_name=nic_name, dimensions=dimensions)
self.rate('net.out_bytes_sec', nic.bytes_sent, device_name=nic_name, dimensions=dimensions)
self.rate('net.in_packets_sec', nic.packets_recv, device_name=nic_name, dimensions=dimensions)
self.rate('net.out_packets_sec', nic.packets_sent, device_name=nic_name, dimensions=dimensions)
self.rate('net.in_errors_sec', nic.errin, device_name=nic_name, dimensions=dimensions)
self.rate('net.out_errors_sec', nic.errout, device_name=nic_name, dimensions=dimensions)
self.rate('net.in_packets_dropped_sec', nic.dropin, device_name=nic_name, dimensions=dimensions)
self.rate('net.out_packets_dropped_sec', nic.dropout, device_name=nic_name, dimensions=dimensions)
lines = netstat.split("\n")
metrics = dict.fromkeys(self.NETSTAT_GAUGE.values(), 0)
for l in lines[2:-1]:
cols = l.split()
# 0 1 2 3 4 5
# tcp 0 0 46.105.75.4:143 90.56.111.177:56867 ESTABLISHED
if cols[0].startswith("tcp"):
protocol = ("tcp4", "tcp6")[cols[0] == "tcp6"]
if cols[5] in self.TCP_STATES:
metric = self.NETSTAT_GAUGE[protocol, self.TCP_STATES[cols[5]]]
metrics[metric] += 1
elif cols[0].startswith("udp"):
protocol = ("udp4", "udp6")[cols[0] == "udp6"]
metric = self.NETSTAT_GAUGE[protocol, 'connections']
metrics[metric] += 1
for metric, value in metrics.iteritems():
self.gauge(metric, value, dimensions=dimensions)
proc = open('/proc/net/dev', 'r')
try:
lines = proc.readlines()
finally:
proc.close()
# Inter-| Receive | Transmit
# face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
# lo:45890956 112797 0 0 0 0 0 0 45890956 112797 0 0 0 0 0 0
# eth0:631947052 1042233 0 19 0 184 0 1206 1208625538 1320529 0 0 0 0 0 0
# eth1: 0 0 0 0 0 0 0 0
# 0 0 0 0 0 0 0 0
for l in lines[2:]:
cols = l.split(':', 1)
x = cols[1].split()
# Filter inactive interfaces
if self._parse_value(x[0]) or self._parse_value(x[8]):
iface = cols[0].strip()
metrics = {
'in_bytes': self._parse_value(x[0]),
'out_bytes': self._parse_value(x[8]),
'in_packets': self._parse_value(x[1]),
'in_errors': self._parse_value(x[2]) + self._parse_value(x[3]),
'out_packets': self._parse_value(x[9]),
'out_errors': self._parse_value(x[10]) + self._parse_value(x[11]),
}
self._submit_devicemetrics(iface, metrics, dimensions)
def _check_bsd(self, instance, dimensions):
netstat = subprocess.Popen(["netstat", "-i", "-b"],
stdout=subprocess.PIPE,
close_fds=True).communicate()[0]
# Name Mtu Network Address Ipkts Ierrs Ibytes Opkts Oerrs Obytes Coll
# lo0 16384 <Link#1> 318258 0 428252203 318258 0 428252203 0
# lo0 16384 localhost fe80:1::1 318258 - 428252203 318258 - 428252203 -
# lo0 16384 127 localhost 318258 - 428252203 318258 - 428252203 -
# lo0 16384 localhost ::1 318258 - 428252203 318258 - 428252203 -
# gif0* 1280 <Link#2> 0 0 0 0 0 0 0
# stf0* 1280 <Link#3> 0 0 0 0 0 0 0
# en0 1500 <Link#4> 04:0c:ce:db:4e:fa 20801309 0 13835457425 15149389 0 11508790198 0
# en0 1500 seneca.loca fe80:4::60c:ceff: 20801309 - 13835457425 15149389 - 11508790198 -
# en0 1500 2001:470:1f 2001:470:1f07:11d 20801309 - 13835457425 15149389 - 11508790198 -
# en0 1500 2001:470:1f 2001:470:1f07:11d 20801309 - 13835457425 15149389 - 11508790198 -
# en0 1500 192.168.1 192.168.1.63 20801309 - 13835457425 15149389 - 11508790198 -
# en0 1500 2001:470:1f 2001:470:1f07:11d 20801309 - 13835457425 15149389 - 11508790198 -
# p2p0 2304 <Link#5> 06:0c:ce:db:4e:fa 0 0 0 0 0 0 0
# ham0 1404 <Link#6> 7a:79:05:4d:bf:f5 30100 0 6815204 18742 0 8494811 0
# ham0 1404 5 5.77.191.245 30100 - 6815204 18742 - 8494811 -
# ham0 1404 seneca.loca fe80:6::7879:5ff: 30100 - 6815204 18742 - 8494811 -
# ham0 1404 2620:9b::54 2620:9b::54d:bff5 30100 - 6815204
# 18742 - 8494811 -
lines = netstat.split("\n")
headers = lines[0].split()
# Given the irregular structure of the table above, better to parse from the end of each line
# Verify headers first
# -7 -6 -5 -4 -3 -2 -1
for h in ("Ipkts", "Ierrs", "Ibytes", "Opkts", "Oerrs", "Obytes", "Coll"):
if h not in headers:
self.logger.error("%s not found in %s; cannot parse" % (h, headers))
return False
current = None
for l in lines[1:]:
# Another header row, abort now, this is IPv6 land
if "Name" in l:
break
x = l.split()
if len(x) == 0:
break
iface = x[0]
if iface.endswith("*"):
iface = iface[:-1]
if iface == current:
# skip multiple lines of same interface
continue
else:
current = iface
# Filter inactive interfaces
if self._parse_value(x[-5]) or self._parse_value(x[-2]):
iface = current
metrics = {
'in_bytes': self._parse_value(x[-5]),
'out_bytes': self._parse_value(x[-2]),
'in_packets': self._parse_value(x[-7]),
'in_errors': self._parse_value(x[-6]),
'out_packets': self._parse_value(x[-4]),
'out_errors': self._parse_value(x[-3]),
}
self._submit_devicemetrics(iface, metrics, dimensions)
def _check_solaris(self, instance, dimensions):
# Can't get bytes sent and received via netstat
# Default to kstat -p link:0:
netstat = subprocess.Popen(["kstat", "-p", "link:0:"],
stdout=subprocess.PIPE,
close_fds=True).communicate()[0]
metrics_by_interface = self._parse_solaris_netstat(netstat)
for interface, metrics in metrics_by_interface.iteritems():
self._submit_devicemetrics(interface, metrics, dimensions)
def _parse_solaris_netstat(self, netstat_output):
"""Return a mapping of network metrics by interface. For example:
{ interface:
{'out_bytes': 0,
'in_bytes': 0,
'in_packets': 0,
...
}
}
"""
# Here's an example of the netstat output:
#
# link:0:net0:brdcstrcv 527336
# link:0:net0:brdcstxmt 1595
# link:0:net0:class net
# link:0:net0:collisions 0
# link:0:net0:crtime 16359935.2637943
# link:0:net0:ierrors 0
# link:0:net0:ifspeed 10000000000
# link:0:net0:ipackets 682834
# link:0:net0:ipackets64 682834
# link:0:net0:link_duplex 0
# link:0:net0:link_state 1
# link:0:net0:multircv 0
# link:0:net0:multixmt 1595
# link:0:net0:norcvbuf 0
# link:0:net0:noxmtbuf 0
# link:0:net0:obytes 12820668
# link:0:net0:obytes64 12820668
# link:0:net0:oerrors 0
# link:0:net0:opackets 105445
# link:0:net0:opackets64 105445
# link:0:net0:rbytes 113983614
# link:0:net0:rbytes64 113983614
# link:0:net0:snaptime 16834735.1607669
# link:0:net0:unknowns 0
# link:0:net0:zonename 53aa9b7e-48ba-4152-a52b-a6368c3d9e7c
# link:0:net1:brdcstrcv 4947620
# link:0:net1:brdcstxmt 1594
# link:0:net1:class net
# link:0:net1:collisions 0
# link:0:net1:crtime 16359935.2839167
# link:0:net1:ierrors 0
# link:0:net1:ifspeed 10000000000
# link:0:net1:ipackets 4947620
# link:0:net1:ipackets64 4947620
# link:0:net1:link_duplex 0
# link:0:net1:link_state 1
# link:0:net1:multircv 0
# link:0:net1:multixmt 1594
# link:0:net1:norcvbuf 0
# link:0:net1:noxmtbuf 0
# link:0:net1:obytes 73324
# link:0:net1:obytes64 73324
# link:0:net1:oerrors 0
# link:0:net1:opackets 1594
# link:0:net1:opackets64 1594
# link:0:net1:rbytes 304384894
# link:0:net1:rbytes64 304384894
# link:0:net1:snaptime 16834735.1613302
# link:0:net1:unknowns 0
# link:0:net1:zonename 53aa9b7e-48ba-4152-a52b-a6368c3d9e7c
# A mapping of solaris names -> datadog names
metric_by_solaris_name = {
'rbytes64': 'in_bytes',
'obytes64': 'out_bytes',
'ipackets64': 'in_packets',
'ierrors': 'in_errors',
'opackets64': 'out_packets',
'oerrors': 'out_errors',
}
lines = [l for l in netstat_output.split("\n") if len(l) > 0]
metrics_by_interface = {}
for l in lines:
# Parse the metric & interface.
cols = l.split()
link, n, iface, name = cols[0].split(":")
assert link == "link"
# Get the datadog metric name.
ddname = metric_by_solaris_name.get(name, None)
if ddname is None:
continue
# Add it to this interface's list of metrics.
metrics = metrics_by_interface.get(iface, {})
metrics[ddname] = self._parse_value(cols[1])
metrics_by_interface[iface] = metrics
return metrics_by_interface
log.debug('Collected 8 network metrics for device {0}'.format(nic_name))
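The eight rate calls above map one-to-one onto the per-NIC counters psutil exposes; a quick look at the raw, cumulative values the check starts from:

```python
import psutil

for nic_name, nic in psutil.net_io_counters(pernic=True).items():
    # Cumulative since boot; self.rate() turns each counter into a
    # per-second value between collection runs.
    print('{0}: rx {1} B / tx {2} B, {3}/{4} pkts, '
          '{5}/{6} errs, {7}/{8} drops'.format(
              nic_name, nic.bytes_recv, nic.bytes_sent,
              nic.packets_recv, nic.packets_sent,
              nic.errin, nic.errout, nic.dropin, nic.dropout))
```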


@ -323,7 +323,7 @@ def run_check(check):
if isinstance(check, status_checks.ServicesCheck):
is_multi_threaded = True
print("#" * 80)
print("Check name: '{}'\n".format(check.name))
print("Check name: '{0}'\n".format(check.name))
check.run()
# Sleep for a second and then run a second check to capture rate metrics
time.sleep(1)


@ -36,7 +36,7 @@ class Config(object):
elif os.path.exists(os.getcwd() + '/agent.conf'):
self._configFile = os.getcwd() + '/agent.conf'
else:
log.error('No config file found at {} nor in the working directory.'.format(DEFAULT_CONFIG_FILE))
log.error('No config file found at {0} nor in the working directory.'.format(DEFAULT_CONFIG_FILE))
self._read_config()
@ -48,7 +48,7 @@ class Config(object):
elif isinstance(sections, list):
section_list.extend(sections)
else:
log.error('Unknown section: {}'.format(str(sections)))
log.error('Unknown section: {0}'.format(str(sections)))
return {}
new_config = {}
@ -83,12 +83,9 @@ class Config(object):
'version': self.get_version(),
'additional_checksd': os.path.join(os.path.dirname(self._configFile), '/checks_d/'),
'system_metrics': None,
'ignore_filesystem_types': None,
'device_blacklist_re': None,
'limit_memory_consumption': None,
'skip_ssl_validation': False,
'watchdog': True,
'use_mount': False,
'autorestart': False,
'non_local_traffic': False},
'Api': {'is_enabled': False,
@ -165,24 +162,6 @@ class Config(object):
metrics_list = []
self._config['Main']['system_metrics'] = metrics_list
# Parse device blacklist regular expression
try:
filter_device_re = self._config['Main']['device_blacklist_re']
if filter_device_re:
self._config['Main']['device_blacklist_re'] = re.compile(filter_device_re)
except re.error as err:
log.error('Error processing regular expression {0}'.format(filter_device_re))
# Parse file system types
if self._config['Main']['ignore_filesystem_types']:
# parse comma separated file system types to ignore list
try:
file_system_list = [x.strip() for x in self._config['Main']['ignore_filesystem_types'].split(',')]
except ValueError:
log.info("Unable to process ignore_filesystem_types.")
file_system_list = []
self._config['Main']['ignore_filesystem_types'] = file_system_list
def get_confd_path(self):
path = os.path.join(os.path.dirname(self._configFile), 'conf.d')
if os.path.exists(path):
@ -217,8 +196,8 @@ def main():
configuration = Config()
config = configuration.get_config()
api_config = configuration.get_config('Api')
print "Main Configuration: \n {}".format(config)
print "\nApi Configuration: \n {}".format(api_config)
print "Main Configuration: \n {0}".format(config)
print "\nApi Configuration: \n {0}".format(api_config)
if __name__ == "__main__":


@ -44,8 +44,8 @@ def http_emitter(message, log, url):
response = opener.open(request)
log.debug('http_emitter: postback response: ' + str(response.read()))
except Exception as exc:
log.error("""Forwarder at {} is down or not responding...
Error is {}
log.error("""Forwarder at {0} is down or not responding...
Error is {1}
Please restart the monasca-agent.""".format(url, repr(exc)))
finally:
if response:


@ -72,7 +72,7 @@ class Keystone(object):
self._keystone_client = self._get_ksclient()
except Exception as exc:
log.error("Unable to create the Keystone Client. " +
"Error was {}".format(repr(exc)))
"Error was {0}".format(repr(exc)))
return None
self._token = self._keystone_client.token


@ -20,6 +20,7 @@ import uuid
import logging
log = logging.getLogger(__name__)
# Tornado
try:
from tornado import ioloop, version_info as tornado_version
@ -384,7 +385,6 @@ class Paths(object):
log.info("Windows certificate path: %s" % crt_path)
tornado.simple_httpclient._DEFAULT_CA_CERTS = crt_path
def plural(count):
if count == 1:
return ""


@ -9,13 +9,13 @@ from keystone import Keystone
from libvirt import Libvirt
from mon import MonAPI, MonPersister, MonThresh
from mysql import MySQL
from network import Network
from neutron import Neutron
from nova import Nova
from ntp import Ntp
from postfix import Postfix
from rabbitmq import RabbitMQ
from swift import Swift
from system import System
from zookeeper import Zookeeper
DETECTION_PLUGINS = [Apache,
@ -29,11 +29,11 @@ DETECTION_PLUGINS = [Apache,
MonPersister,
MonThresh,
MySQL,
Network,
Neutron,
Nova,
Ntp,
Postfix,
RabbitMQ,
Swift,
System,
Zookeeper]


@ -48,9 +48,9 @@ class Libvirt(monasca_setup.detection.Plugin):
init_config[option] = nova_cfg.get(cfg_section, option)
# Create an identity server URI
init_config['identity_uri'] = "{}://{}:{}/v2.0".format(nova_cfg.get(cfg_section, 'auth_protocol'),
nova_cfg.get(cfg_section, 'auth_host'),
nova_cfg.get(cfg_section, 'auth_port'))
init_config['identity_uri'] = "{0}://{1}:{2}/v2.0".format(nova_cfg.get(cfg_section, 'auth_protocol'),
nova_cfg.get(cfg_section, 'auth_host'),
nova_cfg.get(cfg_section, 'auth_port'))
config['libvirt'] = {'init_config': init_config,
'instances': [{}]}


@ -1,34 +0,0 @@
import os
import yaml
from monasca_setup.detection import Plugin
from monasca_setup import agent_config
class Network(Plugin):
"""No configuration here, working networking is assumed so this is either on or off.
"""
def _detect(self):
"""Run detection, set self.available True if the service is detected.
"""
self.available = True
def build_config(self):
"""Build the config as a Plugins object and return.
"""
# A bit silly to parse the yaml only for it to be converted back but this
# plugin is the exception not the rule
with open(os.path.join(self.template_dir, 'conf.d/network.yaml'), 'r') as network_template:
default_net_config = yaml.load(network_template.read())
config = agent_config.Plugins()
config['network'] = default_net_config
return config
def dependencies_installed(self):
return True


@ -0,0 +1,42 @@
import logging
import os
import yaml
from monasca_setup.detection import Plugin
from monasca_setup import agent_config
log = logging.getLogger(__name__)
class System(Plugin):
"""No configuration here, the system metrics are assumed so this is either on or off.
"""
system_metrics = ['network', 'disk', 'load', 'memory', 'cpu']
def _detect(self):
"""Run detection, set self.available True if the service is detected.
"""
self.available = True
def build_config(self):
"""Build the configs for the system metrics as Plugin objects and return.
"""
config = agent_config.Plugins()
for metric in System.system_metrics:
try:
with open(os.path.join(self.template_dir, 'conf.d/' + metric + '.yaml'), 'r') as metric_template:
default_config = yaml.load(metric_template.read())
config[metric] = default_config
log.info('\tConfigured {0}'.format(metric))
except (OSError, IOError):
log.info('\tUnable to configure {0}'.format(metric))
continue
return config
def dependencies_installed(self):
return True


@ -96,7 +96,7 @@ def main(argv=None):
if detected_os == 'Linux':
linux_flavor = platform.linux_distribution()[0]
if 'Ubuntu' in linux_flavor or 'debian' in linux_flavor:
for package in ['coreutils', 'sysstat']:
for package in ['coreutils']:
# Check for required dependencies for system checks
try:
output = check_output('dpkg -s {0}'.format(package),