Update collectd cpu plugin and monitor-tools to diagnose cpu spikes
The collectd cpu plugin and monitor-tools are updated to support diagnosing high CPU usage on shorter time scales. This includes tools that assist System Engineering in determining where CPU time is coming from.

The collectd cpu plugin is updated to support Kubernetes services under system.slice or k8splatform.slice. The read function sampling frequency is changed to 1 second, so we now see logs with instantaneous cpu spikes at the cgroup level. Dispatch of results still occurs at the original plugin interval of 30 seconds. Logging of the 1 second samples is configurable via the /etc/collectd.d/starlingx/python_plugins.conf field 'hires = <true|false>'. The hi-resolution samples are always collected and used for the histograms, but logging them is not always desired due to the volume of output.

This adds new logs for occupancy wait. This is similar to cpu occupancy, but instead of realtime used, it measures the aggregate percent of time a given cgroup is waiting to schedule. This is a measure of CPU contention.

This adds new logs for occupancy histograms for all cgroups and aggregated groupings, based on the 1 second occupancy samples. The histograms are displayed in hirunner order, showing the histogram bins, the mean, the 95th percentile, and the max value. The histograms are logged at 5 minute intervals.

This reduces the collectd cgroup CPUShares from 1024 to 256. This smoothes out the behaviour of poorly behaved audits.

The 'schedtop' tool is updated to display a 'cgroup' field. This is the systemd cgroup name, or abbreviated pod name. This also handles kernel sched output format changes for 6.6.

New tool 'portscanner' is added to monitor-tools to diagnose local host processes that are using specific ports. This has been instrumental in discovering gunicorn/keystone API users.

New tool 'k8smetrics' is added to monitor-tools to display the delay histogram and percentiles for kube-apiserver and etcd server.
This gives a way to quantify performance as a result of system load.

Partial-Bug: 2084714

TEST PLAN: AIO-SX, AIO-DX, Standard, Storage, DC:
PASS: Fresh install ISO
PASS: Verify /var/log/collectd.logs contains 1 second cpu/wait logs, including the etcd, kubelet, and containerd services.
PASS: Verify we are dispatching at 30 second granularity.
PASS: Verify we are displaying histograms every 5 minutes.
PASS: Verify we can enable/disable the display of hi-resolution logs with the /etc/collectd.d/starlingx/python_plugins.conf field 'hires = <true|false>'.
PASS: Verify schedtop contains 'cgroup' output.
PASS: Verify output from 'k8smetrics'. Cross-check against the Prometheus GUI for the apiserver percentile.
PASS: Verify output from portscanner with port 5000. Verify 1-to-1 mapping against /var/log/keystone/keystone-all.log.

Change-Id: I82d4f414afdf1cecbcc99680b360cbad702ba140
Signed-off-by: Jim Gauld <James.Gauld@windriver.com>
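As a rough illustration of the histogram statistics described above (mean, 95th percentile, and max over 1-second occupancy samples, binned into fixed 0-100% bins), here is a minimal sketch using numpy with made-up sample values; the bin construction mirrors the plugin's shared_bins, but the samples are purely hypothetical:

```python
import numpy as np

# Hypothetical 1-second occupancy samples (percent) accumulated for one
# cgroup over a histogram interval; one outlier spike at 90%.
samples = np.array([3.0, 5.0, 4.0, 90.0, 6.0, 5.5], dtype=np.float64)

# Ten fixed bins spanning 0..100 percent, shared across all cgroups.
bins = np.histogram_bin_edges(np.array([0, 100], dtype=np.float64),
                              bins=10, range=(0, 100))
counts, _ = np.histogram(samples, bins=bins)

# Summary statistics logged alongside the bin counts.
mean = np.mean(samples)
p95 = np.percentile(samples, 95)
pmax = np.max(samples)
```

The fixed shared bins make histograms comparable across cgroups, while the p95/max pair distinguishes a steady load from a short spike that a 30-second average would hide.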
parent 8f404ea66c
commit 0232b8b9dc
@@ -12,5 +12,9 @@ ExecStart=/usr/sbin/collectd
ExecStartPost=/bin/bash -c 'echo $MAINPID > /var/run/collectd.pid'
ExecStopPost=/bin/rm -f /var/run/collectd.pid

# cgroup performance engineering
# - smooth out CPU impulse from poorly behaved plugin
CPUShares=256

[Install]
WantedBy=multi-user.target
@@ -1,5 +1,5 @@
#
# Copyright (c) 2018-2021 Wind River Systems, Inc.
# Copyright (c) 2018-2024 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
@@ -17,6 +17,7 @@
############################################################################
import collectd
import copy
import numpy as np
import os
import plugin_common as pc
import re
@@ -26,8 +27,13 @@ import tsconfig.tsconfig as tsc

from kubernetes.client.rest import ApiException

PLUGIN = 'platform cpu usage plugin'
#PLUGIN = 'platform cpu usage plugin'
PLUGIN = 'platform cpu'
PLUGIN_HISTOGRAM = 'histogram'
PLUGIN_DEBUG = 'DEBUG platform cpu'
PLUGIN_HIRES_INTERVAL = 1  # hi-resolution sample interval in secs
PLUGIN_DISPATCH_INTERVAL = 30  # dispatch interval in secs
PLUGIN_HISTOGRAM_INTERVAL = 300  # histogram interval in secs

TIMESTAMP = 'timestamp'
PLATFORM_CPU_PERCENT = 'platform-occupancy'
@@ -42,25 +48,38 @@ SCHEDSTAT = '/proc/schedstat'
CPUACCT = pc.CGROUP_ROOT + '/cpuacct'
CPUACCT_USAGE = 'cpuacct.usage'
CPUACCT_USAGE_PERCPU = 'cpuacct.usage_percpu'
CPU_STAT = 'cpu.stat'

# Common regex pattern match groups
re_uid = re.compile(r'^pod(\S+)')
re_processor = re.compile(r'^[Pp]rocessor\s+:\s+(\d+)')
re_schedstat = re.compile(r'^cpu(\d+)\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\d+)\s+')
re_schedstat = re.compile(r'^cpu(\d+)\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\d+)\s+(\d+)\s+')
re_schedstat_version = re.compile(r'^version\s+(\d+)')
re_keyquoteval = re.compile(r'^\s*(\S+)\s*[=:]\s*\"(\S+)\"\s*')
re_cpu_wait_sum = re.compile(r'^wait_sum\s+(\d+)')

# hirunner minimum cpu occupancy threshold
HIRUNNER_MINIMUM_CPU_PERCENT = 0.1

# Set numpy format for printing bins
np.set_printoptions(formatter={'int': '{: 4d}'.format})


# Plugin specific control class and object.
class CPU_object(pc.PluginObject):

    def __init__(self):
        super(CPU_object, self).__init__(PLUGIN, '')
        # CPU Plugin flags
        self.dispatch = False  # print occupancy and dispatch this sample
        self.histogram = False  # print occupancy histogram this sample

        # CPU plugin configurable settings
        self.debug = True
        self.verbose = True
        self.hires = False

        # Cache Kubernetes pods data
        self._cache = {}
        self._k8s_client = pc.K8sClient()
        self.k8s_pods = set()
@@ -69,15 +88,50 @@ class CPU_object(pc.PluginObject):
        self.schedstat_supported = True
        self.number_platform_cpus = 0

        # Platform CPU monitor
        now = time.time()  # epoch time in floating seconds
        self._t0 = {}  # cputime state info at start of sample interval
        self._t0[TIMESTAMP] = now
        self._t0_cpuacct = {}

        self._data = {}  # derived measurements at end of sample interval
        self._data[PLATFORM_CPU_PERCENT] = 0.0
        self.elapsed_ms = 0.0
        # CPU State information at start of dispatch interval
        self.d_t0 = {}  # per-cpu cputime at dispatch time 0
        self.d_w0 = {}  # per-cpu cpuwait at dispatch time 0
        self.d_t0[TIMESTAMP] = now  # timestamp dispatch time 0
        self.d_w0[TIMESTAMP] = now  # timestamp dispatch time 0
        self.d_t0_cpuacct = {}  # per-cgroup cpuacct at dispatch time 0
        self.d_t0_cpuwait = {}  # per-cgroup cpuwait at dispatch time 0

        # Derived measurements over dispatch interval
        self.d_occ = {}  # dispatch occupancy per cgroup or derived aggregate
        self.d_occw = {}  # dispatch occupancy wait per cgroup or derived aggregate
        self.d_occ[PLATFORM_CPU_PERCENT] = 0.0  # dispatch platform occupancy
        self.d_occw[PLATFORM_CPU_PERCENT] = 0.0  # dispatch platform occupancy wait
        for g in pc.OVERALL_GROUPS:
            self.d_occ[g] = 0.0
            self.d_occw[g] = 0.0
        self.d_elapsed_ms = 0.0  # dispatch elapsed time

        # CPU State information at start of read sample interval
        self._t0 = {}  # per-cpu cputime at time 0
        self._w0 = {}  # per-cpu cpuwait at time 0
        self._t0[TIMESTAMP] = now  # timestamp time 0
        self._w0[TIMESTAMP] = now  # timestamp time 0
        self._t0_cpuacct = {}  # per-cgroup cpuacct at time 0
        self._t0_cpuwait = {}  # per-cgroup cpuwait at time 0

        # Derived measurements over read sample interval
        self._occ = {}  # occupancy per cgroup or derived aggregate
        self._occw = {}  # occupancy wait per cgroup or derived aggregate
        self._occ[PLATFORM_CPU_PERCENT] = 0.0  # platform occupancy
        self._occw[PLATFORM_CPU_PERCENT] = 0.0  # platform occupancy wait
        for g in pc.OVERALL_GROUPS:
            self._occ[g] = 0.0
            self._occw[g] = 0.0
        self.elapsed_ms = 0.0  # elapsed time

        # Derived measurements over histogram interval
        self.hist_t0 = now  # histogram timestamp time 0
        self.hist_elapsed_ms = 0.0  # histogram elapsed time
        self.hist_occ = {}  # histogram bin counts per cgroup or derived aggregate
        self.shared_bins = np.histogram_bin_edges(
            np.array([0, 100], dtype=np.float64), bins=10, range=(0, 100))


# Instantiate the class
@@ -87,13 +141,17 @@ obj = CPU_object()
def read_schedstat():
    """Read current hiresolution times per cpu from /proc/schedstats.

    Return dictionary of cputimes in nanoseconds per cpu.
    Return dictionary of cputimes in nanoseconds per cpu,
    dictionary of cpuwaits in nanoseconds per cpu.
    """

    cputime = {}
    cpuwait = {}

    # Obtain cumulative cputime (nanoseconds) from 7th field of
    # /proc/schedstat. This is the time running tasks on this cpu.
    # Obtain cumulative cputime (nanoseconds) from 7th field,
    # and cumulative cpuwait (nanoseconds) from 8th field,
    # from /proc/schedstat. This is the time running and waiting
    # for tasks on this cpu.
    try:
        with open(SCHEDSTAT, 'r') as f:
            for line in f:
@@ -101,11 +159,13 @@ def read_schedstat():
                if match:
                    k = int(match.group(1))
                    v = int(match.group(2))
                    w = int(match.group(3))
                    cputime[k] = v
                    cpuwait[k] = w
    except Exception as err:
        collectd.error('%s Cannot read schedstat, error=%s' % (PLUGIN, err))

    return cputime
    return cputime, cpuwait
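A self-contained sketch of how the extended pattern above picks up both fields: group 2 is field 7 (cumulative time running tasks, ns) and group 3 is field 8 (cumulative time waiting to run, ns). The sample line is made up; a real /proc/schedstat line carries live counters:

```python
import re

# Same pattern as the updated re_schedstat in the plugin: six uncaptured
# fields after cpuN, then field 7 (cputime) and field 8 (cpuwait).
re_schedstat = re.compile(
    r'^cpu(\d+)\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\d+)\s+(\d+)\s+')

# Hypothetical schedstat line for cpu0.
sample = 'cpu0 0 0 0 0 0 0 123456789 55555 4242 \n'

cputime = {}
cpuwait = {}
match = re_schedstat.search(sample)
if match:
    k = int(match.group(1))
    cputime[k] = int(match.group(2))  # ns running tasks
    cpuwait[k] = int(match.group(3))  # ns waiting to run
```

Deltas of these counters between two reads, divided by elapsed time and cpu count, yield the occupancy and wait percentages the plugin logs.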


def get_logical_cpus():
@@ -202,8 +262,36 @@ def get_cgroup_cpuacct(path, cpulist=None):
    return acct


def get_cgroup_cpu_wait_sum(path):
    """Get cgroup cpu.stat wait_sum usage for a specific cgroup path.

    This represents the aggregate cfs_rq wait time of all tasks in the
    cgroup. It indicates how much the task group is suffering in the
    contention for cpu resources.

    Returns cumulative wait_sum in nanoseconds.
    """

    wait_sum = 0

    # Get the aggregate wait_sum for all cpus
    fstat = '/'.join([path, CPU_STAT])
    try:
        with open(fstat, 'r') as f:
            for line in f:
                match = re_cpu_wait_sum.search(line)
                if match:
                    v = int(match.group(1))
                    wait_sum = int(v)
    except IOError:
        # Silently ignore IO errors. It is likely the cgroup disappeared.
        pass

    return wait_sum
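A minimal sketch of the wait_sum parse above, reading from an in-memory cpu.stat snippet instead of a real cgroup file; the counter values are hypothetical:

```python
import io
import re

re_cpu_wait_sum = re.compile(r'^wait_sum\s+(\d+)')

# Hypothetical cpu.stat content for one cgroup (wait_sum in nanoseconds).
cpu_stat = io.StringIO('nr_periods 12\nnr_throttled 0\nwait_sum 987654321\n')

wait_sum = 0
for line in cpu_stat:
    match = re_cpu_wait_sum.search(line)
    if match:
        wait_sum = int(match.group(1))
```

As with cputime, the plugin takes the delta of this cumulative counter between two samples to derive the per-cgroup wait percentage.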


def get_cpuacct():
    """Get cpuacct usage based on cgroup hierarchy."""
    """Get cpuacct usage and wait_sum based on cgroup hierarchy."""

    cpuacct = {}
    cpuacct[pc.GROUP_OVERALL] = {}
@@ -211,48 +299,86 @@ def get_cpuacct():
    cpuacct[pc.GROUP_PODS] = {}
    cpuacct[pc.CGROUP_SYSTEM] = {}
    cpuacct[pc.CGROUP_USER] = {}
    cpuacct[pc.CGROUP_INIT] = {}
    cpuacct[pc.CGROUP_K8SPLATFORM] = {}

    cpuwait = {}
    cpuwait[pc.GROUP_OVERALL] = {}
    cpuwait[pc.GROUP_FIRST] = {}
    cpuwait[pc.GROUP_PODS] = {}
    cpuwait[pc.CGROUP_SYSTEM] = {}
    cpuwait[pc.CGROUP_USER] = {}
    cpuwait[pc.CGROUP_INIT] = {}
    cpuwait[pc.CGROUP_K8SPLATFORM] = {}

    exclude_types = ['.mount']

    # Overall cpuacct usage
    acct = get_cgroup_cpuacct(CPUACCT, cpulist=obj.cpu_list)
    wait = get_cgroup_cpu_wait_sum(CPUACCT)
    cpuacct[pc.GROUP_OVERALL][pc.GROUP_TOTAL] = acct
    cpuwait[pc.GROUP_OVERALL][pc.GROUP_TOTAL] = wait

    # Initialize 'overhead' time (derived measurement). This will contain
    # the remaining cputime not specifically tracked by first-level cgroups.
    cpuacct[pc.GROUP_OVERALL][pc.GROUP_OVERHEAD] = acct
    cpuwait[pc.GROUP_OVERALL][pc.GROUP_OVERHEAD] = wait

    # Walk the first level cgroups and get cpuacct usage
    # (e.g., docker, k8s-infra, user.slice, system.slice, machine.slice)
    dir_list = next(os.walk(CPUACCT))[1]
    for name in dir_list:
        if any(name.endswith(x) for x in ['.mount', '.scope']):
        if any(name.endswith(x) for x in exclude_types):
            continue
        cg_path = '/'.join([CPUACCT, name])
        acct = get_cgroup_cpuacct(cg_path, cpulist=obj.cpu_list)
        wait = get_cgroup_cpu_wait_sum(cg_path)
        cpuacct[pc.GROUP_FIRST][name] = acct
        cpuwait[pc.GROUP_FIRST][name] = wait

        # Subtract out first-level cgroups. The remaining cputime represents
        # systemd 'init' pid and kthreads on Platform cpus.
        cpuacct[pc.GROUP_OVERALL][pc.GROUP_OVERHEAD] -= acct
        cpuwait[pc.GROUP_OVERALL][pc.GROUP_OVERHEAD] -= wait

    # Walk the system.slice cgroups and get cpuacct usage
    path = '/'.join([CPUACCT, pc.CGROUP_SYSTEM])
    dir_list = next(os.walk(path))[1]
    for name in dir_list:
        if any(name.endswith(x) for x in ['.mount', '.scope']):
        if any(name.endswith(x) for x in exclude_types):
            continue
        cg_path = '/'.join([path, name])
        acct = get_cgroup_cpuacct(cg_path, cpulist=obj.cpu_list)
        wait = get_cgroup_cpu_wait_sum(cg_path)
        cpuacct[pc.CGROUP_SYSTEM][name] = acct
        cpuwait[pc.CGROUP_SYSTEM][name] = wait

    # Walk the k8splatform.slice cgroups and get cpuacct usage
    path = '/'.join([CPUACCT, pc.CGROUP_K8SPLATFORM])
    if os.path.isdir(path):
        dir_list = next(os.walk(path))[1]
    else:
        dir_list = []
    for name in dir_list:
        if any(name.endswith(x) for x in exclude_types):
            continue
        cg_path = '/'.join([path, name])
        acct = get_cgroup_cpuacct(cg_path, cpulist=obj.cpu_list)
        wait = get_cgroup_cpu_wait_sum(cg_path)
        cpuacct[pc.CGROUP_K8SPLATFORM][name] = acct
        cpuwait[pc.CGROUP_K8SPLATFORM][name] = wait

    # Walk the user.slice cgroups and get cpuacct usage
    path = '/'.join([CPUACCT, pc.CGROUP_USER])
    dir_list = next(os.walk(path))[1]
    for name in dir_list:
        if any(name.endswith(x) for x in ['.mount', '.scope']):
        if any(name.endswith(x) for x in exclude_types):
            continue
        cg_path = '/'.join([path, name])
        acct = get_cgroup_cpuacct(cg_path, cpulist=obj.cpu_list)
        wait = get_cgroup_cpu_wait_sum(cg_path)
        cpuacct[pc.CGROUP_USER][name] = acct
        cpuwait[pc.CGROUP_USER][name] = wait

    # Walk the kubepods hierarchy to the pod level and get cpuacct usage.
    # We can safely ignore reading this if the path does not exist.
@@ -268,8 +394,357 @@ def get_cpuacct():
            uid = match.group(1)
            cg_path = os.path.join(root, name)
            acct = get_cgroup_cpuacct(cg_path)
            wait = get_cgroup_cpu_wait_sum(cg_path)
            cpuacct[pc.GROUP_PODS][uid] = acct
    return cpuacct
            cpuwait[pc.GROUP_PODS][uid] = wait
    return cpuacct, cpuwait


def calculate_occupancy(
        prefix, hires, dispatch,
        cache,
        t0, t1,
        w0, w1,
        t0_cpuacct, t1_cpuacct,
        t0_cpuwait, t1_cpuwait,
        occ, occw,
        elapsed_ms,
        number_platform_cpus,
        cpu_list, debug):
    """Calculate average occupancy and wait for platform cpus and cgroups.

    This calculates:
    - per-cpu cputime delta between time 0 and time 1 (ms)
    - per-cpu cpuwait delta between time 0 and time 1 (ms)
    - average platform occupancy based on cputime (%)
    - average platform occupancy wait based on cpuwait (%)
    - per-cgroup cpuacct delta between time 0 and time 1
    - per-cgroup cpuwait delta between time 0 and time 1
    - average per-cgroup occupancy based on cpuacct (%)
    - average per-cgroup occupancy wait based on cpuwait (%)
    - aggregate occupancy of specific cgroup groupings (%)
    - aggregate occupancy wait of specific cgroup groupings (%)

    This logs platform occupancy and aggregate cgroup groupings.
    This logs hirunner occupancy for base cgroups.
    """

    # Aggregate cputime and cpuwait delta for platform logical cpus
    cputime_ms = 0.0
    cpuwait_ms = 0.0
    for cpu in cpu_list:
        # Paranoia check, we should never hit this.
        if cpu not in t0 or cpu not in w0:
            collectd.error('%s cputime initialization error' % (PLUGIN))
            break
        cputime_ms += float(t1[cpu] - t0[cpu])
        cpuwait_ms += float(w1[cpu] - w0[cpu])
    cputime_ms /= float(pc.ONE_MILLION)
    cpuwait_ms /= float(pc.ONE_MILLION)

    # Calculate average occupancy and wait of platform logical cpus
    p_occ = 0.0
    p_occw = 0.0
    if number_platform_cpus > 0 and elapsed_ms > 0:
        p_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
            / float(elapsed_ms) / number_platform_cpus
        p_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
            / float(elapsed_ms) / number_platform_cpus
    else:
        p_occ = 0.0
        p_occw = 0.0

    if debug:
        collectd.info('%s %s %s elapsed = %.1f ms, '
                      'cputime = %.1f ms, cpuwait = %.1f ms, '
                      'n_cpus = %d, '
                      'occupancy = %.2f %%, wait = %.2f %%'
                      % (PLUGIN_DEBUG,
                         prefix,
                         PLATFORM_CPU_PERCENT,
                         elapsed_ms,
                         cputime_ms, cpuwait_ms,
                         number_platform_cpus,
                         p_occ, p_occw))

    occ[PLATFORM_CPU_PERCENT] = p_occ
    occw[PLATFORM_CPU_PERCENT] = p_occw

    # Calculate cpuacct and cpuwait delta for cgroup hierarchy, dropping transient cgroups
    cpuacct = {}
    for i in t1_cpuacct.keys():
        cpuacct[i] = {}
        for k, v in t1_cpuacct[i].items():
            if i in t0_cpuacct.keys() and k in t0_cpuacct[i].keys():
                cpuacct[i][k] = v - t0_cpuacct[i][k]
            else:
                cpuacct[i][k] = v
    cpuwait = {}
    for i in t1_cpuwait.keys():
        cpuwait[i] = {}
        for k, v in t1_cpuwait[i].items():
            if i in t0_cpuwait.keys() and k in t0_cpuwait[i].keys():
                cpuwait[i][k] = v - t0_cpuwait[i][k]
            else:
                cpuwait[i][k] = v

    # Summarize cpuacct usage for various groupings we aggregate
    for g in pc.GROUPS_AGGREGATED:
        cpuacct[pc.GROUP_OVERALL][g] = 0.0
        cpuwait[pc.GROUP_OVERALL][g] = 0.0

    # Aggregate cpuacct usage by K8S pod
    for uid in cpuacct[pc.GROUP_PODS]:
        acct = cpuacct[pc.GROUP_PODS][uid]
        wait = cpuwait[pc.GROUP_PODS][uid]
        if uid in cache:
            pod = cache[uid]
        else:
            collectd.warning('%s uid %s not found' % (PLUGIN, uid))
            continue

        # K8S platform system usage, i.e., essential: kube-system
        # check for component label app.starlingx.io/component=platform
        if pod.is_platform_resource():
            cpuacct[pc.GROUP_OVERALL][pc.GROUP_K8S_SYSTEM] += acct
            cpuwait[pc.GROUP_OVERALL][pc.GROUP_K8S_SYSTEM] += wait

        # K8S platform addons usage, i.e., non-essential: monitor, openstack
        if pod.namespace in pc.K8S_NAMESPACE_ADDON:
            cpuacct[pc.GROUP_OVERALL][pc.GROUP_K8S_ADDON] += acct
            cpuwait[pc.GROUP_OVERALL][pc.GROUP_K8S_ADDON] += wait

    # Calculate base cpuacct usage (i.e., base tasks, exclude K8S and VMs)
    # e.g., docker, system.slice, user.slice, init.scope
    for name in cpuacct[pc.GROUP_FIRST].keys():
        if name in pc.BASE_GROUPS:
            cpuacct[pc.GROUP_OVERALL][pc.GROUP_BASE] += \
                cpuacct[pc.GROUP_FIRST][name]
            cpuwait[pc.GROUP_OVERALL][pc.GROUP_BASE] += \
                cpuwait[pc.GROUP_FIRST][name]
        elif name not in pc.BASE_GROUPS_EXCLUDE:
            collectd.warning('%s could not find cgroup: %s' % (PLUGIN, name))

    # Calculate system.slice container cpuacct usage
    for g in pc.CONTAINERS_CGROUPS:
        if g in cpuacct[pc.CGROUP_SYSTEM].keys():
            cpuacct[pc.GROUP_OVERALL][pc.GROUP_CONTAINERS] += \
                cpuacct[pc.CGROUP_SYSTEM][g]
            cpuwait[pc.GROUP_OVERALL][pc.GROUP_CONTAINERS] += \
                cpuwait[pc.CGROUP_SYSTEM][g]
        if g in cpuacct[pc.CGROUP_K8SPLATFORM].keys():
            cpuacct[pc.GROUP_OVERALL][pc.GROUP_CONTAINERS] += \
                cpuacct[pc.CGROUP_K8SPLATFORM][g]
            cpuwait[pc.GROUP_OVERALL][pc.GROUP_CONTAINERS] += \
                cpuwait[pc.CGROUP_K8SPLATFORM][g]

    # Calculate platform cpuacct usage (this excludes apps)
    for g in pc.PLATFORM_GROUPS:
        cpuacct[pc.GROUP_OVERALL][pc.GROUP_PLATFORM] += \
            cpuacct[pc.GROUP_OVERALL][g]
        cpuwait[pc.GROUP_OVERALL][pc.GROUP_PLATFORM] += \
            cpuwait[pc.GROUP_OVERALL][g]

    # Calculate cgroup based occupancy and wait for overall groupings
    for g in pc.OVERALL_GROUPS:
        cputime_ms = \
            float(cpuacct[pc.GROUP_OVERALL][g]) / float(pc.ONE_MILLION)
        g_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
            / float(elapsed_ms) / number_platform_cpus
        occ[g] = g_occ
        cpuwait_ms = \
            float(cpuwait[pc.GROUP_OVERALL][g]) / float(pc.ONE_MILLION)
        g_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
            / float(elapsed_ms) / number_platform_cpus
        occw[g] = g_occw
        if obj.debug:
            collectd.info('%s %s %s elapsed = %.1f ms, '
                          'cputime = %.1f ms, cpuwait = %.1f ms, '
                          'n_cpus = %d, '
                          'occupancy = %.2f %%, wait = %.2f %%'
                          % (PLUGIN_DEBUG,
                             prefix,
                             g,
                             elapsed_ms,
                             cputime_ms, cpuwait_ms,
                             number_platform_cpus,
                             g_occ, g_occw))

    # Store occupancy hirunners
    h_occ = {}
    h_occw = {}

    # Calculate cgroup based occupancy for first-level groupings
    for g in cpuacct[pc.GROUP_FIRST]:
        cputime_ms = \
            float(cpuacct[pc.GROUP_FIRST][g]) / float(pc.ONE_MILLION)
        g_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
            / float(elapsed_ms) / number_platform_cpus
        occ[g] = g_occ
        cpuwait_ms = \
            float(cpuwait[pc.GROUP_FIRST][g]) / float(pc.ONE_MILLION)
        g_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
            / float(elapsed_ms) / number_platform_cpus
        occw[g] = g_occw

        if g != pc.CGROUP_INIT:
            continue

        # Keep hirunners exceeding minimum threshold.
        if g_occ >= HIRUNNER_MINIMUM_CPU_PERCENT:
            h_occ[g] = g_occ
        if g_occw >= HIRUNNER_MINIMUM_CPU_PERCENT:
            h_occw[g] = g_occw

    # Calculate cgroup based occupancy for cgroups within system.slice.
    for g in cpuacct[pc.CGROUP_SYSTEM]:
        cputime_ms = \
            float(cpuacct[pc.CGROUP_SYSTEM][g]) / float(pc.ONE_MILLION)
        g_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
            / float(elapsed_ms) / number_platform_cpus
        occ[g] = g_occ
        cpuwait_ms = \
            float(cpuwait[pc.CGROUP_SYSTEM][g]) / float(pc.ONE_MILLION)
        g_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
            / float(elapsed_ms) / number_platform_cpus
        occw[g] = g_occw

        # Keep hirunners exceeding minimum threshold.
        if g_occ >= HIRUNNER_MINIMUM_CPU_PERCENT:
            h_occ[g] = g_occ
        if g_occw >= HIRUNNER_MINIMUM_CPU_PERCENT:
            h_occw[g] = g_occw

    # Calculate cgroup based occupancy for cgroups within k8splatform.slice.
    if pc.CGROUP_K8SPLATFORM in cpuacct.keys():
        for g in cpuacct[pc.CGROUP_K8SPLATFORM]:
            cputime_ms = \
                float(cpuacct[pc.CGROUP_K8SPLATFORM][g]) / float(pc.ONE_MILLION)
            g_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
                / float(elapsed_ms) / number_platform_cpus
            occ[g] = g_occ
            cpuwait_ms = \
                float(cpuwait[pc.CGROUP_K8SPLATFORM][g]) / float(pc.ONE_MILLION)
            g_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
                / float(elapsed_ms) / number_platform_cpus
            occw[g] = g_occw

            # Keep hirunners exceeding minimum threshold.
            if g_occ >= HIRUNNER_MINIMUM_CPU_PERCENT:
                h_occ[g] = g_occ
            if g_occw >= HIRUNNER_MINIMUM_CPU_PERCENT:
                h_occw[g] = g_occw

    # Calculate cgroup based occupancy for cgroups within user.slice.
    for g in cpuacct[pc.CGROUP_USER]:
        cputime_ms = \
            float(cpuacct[pc.CGROUP_USER][g]) / float(pc.ONE_MILLION)
        g_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
            / float(elapsed_ms) / number_platform_cpus
        occ[g] = g_occ
        cpuwait_ms = \
            float(cpuwait[pc.CGROUP_USER][g]) / float(pc.ONE_MILLION)
        g_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
            / float(elapsed_ms) / number_platform_cpus
        occw[g] = g_occw

        # Keep hirunners exceeding minimum threshold.
        if g_occ >= HIRUNNER_MINIMUM_CPU_PERCENT:
            h_occ[g] = g_occ
        if g_occw >= HIRUNNER_MINIMUM_CPU_PERCENT:
            h_occw[g] = g_occw

    if (hires and prefix == 'hires') or (dispatch and prefix == 'dispatch'):
        # Print cpu occupancy usage for high-level groupings
        collectd.info('%s %s Usage: %.1f%% (avg per cpu); '
                      'cpus: %d, Platform: %.1f%% '
                      '(Base: %.1f, k8s-system: %.1f), k8s-addon: %.1f, '
                      '%s: %.1f, %s: %.1f'
                      % (PLUGIN, prefix,
                         occ[PLATFORM_CPU_PERCENT],
                         number_platform_cpus,
                         occ[pc.GROUP_PLATFORM],
                         occ[pc.GROUP_BASE],
                         occ[pc.GROUP_K8S_SYSTEM],
                         occ[pc.GROUP_K8S_ADDON],
                         pc.GROUP_CONTAINERS,
                         occ[pc.GROUP_CONTAINERS],
                         pc.GROUP_OVERHEAD,
                         occ[pc.GROUP_OVERHEAD]))

        # Print hirunner cpu occupancy usage for base cgroups
        occs = ', '.join(
            '{}: {:.1f}'.format(k.split('.', 1)[0], v) for k, v in sorted(
                h_occ.items(), key=lambda t: -float(t[1]))
        )
        collectd.info('%s %s %s: %.1f%%; cpus: %d, (%s)'
                      % (PLUGIN,
                         prefix, 'Base usage',
                         occ[pc.GROUP_BASE],
                         number_platform_cpus,
                         occs))

        # Print hirunner cpu wait for base cgroups
        occws = ', '.join(
            '{}: {:.1f}'.format(k.split('.', 1)[0], v) for k, v in sorted(
                h_occw.items(), key=lambda t: -float(t[1]))
        )
        collectd.info('%s %s %s: %.1f%%; cpus: %d, (%s)'
                      % (PLUGIN,
                         prefix, 'Base wait',
                         occw[pc.GROUP_BASE],
                         number_platform_cpus,
                         occws))
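The occupancy math used throughout calculate_occupancy reduces to percent = 100 * delta_ms / elapsed_ms / n_cpus, applied identically to cputime and cpuwait. A tiny worked example with made-up deltas (constants mirror pc.ONE_MILLION and pc.ONE_HUNDRED):

```python
ONE_MILLION = 1000000.0  # ns per ms
ONE_HUNDRED = 100.0

# Hypothetical deltas over one 1000 ms sample on a 2-cpu platform:
# 500 ms of cputime and 100 ms of cpuwait accumulated across both cpus.
cputime_ms = 500000000 / ONE_MILLION  # 500.0 ms running
cpuwait_ms = 100000000 / ONE_MILLION  # 100.0 ms waiting to run
elapsed_ms = 1000.0
number_platform_cpus = 2

# Average per-cpu occupancy and wait over the interval.
occ = ONE_HUNDRED * cputime_ms / elapsed_ms / number_platform_cpus
occw = ONE_HUNDRED * cpuwait_ms / elapsed_ms / number_platform_cpus
```

Here occ works out to 25.0% and occw to 5.0%: a nontrivial wait percentage alongside moderate occupancy is the contention signal the new wait logs are meant to expose.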


def aggregate_histogram(histogram, occ, shared_bins, hist_occ, debug):
    """Aggregate occupancy histogram bins for platform cpus and cgroups.

    This aggregates occupancy histogram bins for each key measurement.

    When 'histogram' flag is True, this will:
    - calculate mean, 95th-percentile, and max statistics, and bin
      the measurements
    - log histograms and statistics per measurement in hirunner order
    """

    # Aggregate each key, value into histogram bins
    for k, v in occ.items():
        # Get abbreviated name (excludes: .service, .scope, .socket, .mount)
        # eg, 'k8splatform.slice' will shorten to 'k8splatform'
        key = k.split('.', 1)[0]
        if key not in hist_occ:
            hist_occ[key] = np.array([], dtype=np.float64)
        if v is not None:
            hist_occ[key] = np.append(hist_occ[key], v)

    if histogram:
        # Calculate histograms and statistics for each key measurement
        H = {}
        for k, v in hist_occ.items():
            H[k] = {}
            H[k]['count'] = hist_occ[k].size
            if H[k]['count'] > 0:
                H[k]['mean'] = np.mean(hist_occ[k])
                H[k]['p95'] = np.percentile(hist_occ[k], 95)
                H[k]['pmax'] = np.max(hist_occ[k])
                H[k]['hist'], _ = np.histogram(hist_occ[k], bins=shared_bins)
            else:
                H[k]['mean'] = 0
                H[k]['p95'] = 0.0
                H[k]['pmax'] = 0.0
                H[k]['hist'] = []

        # Print out each histogram, sort by cpu occupancy hirunners
        bins = ' '.join('{:4d}'.format(int(x)) for x in shared_bins[1:])
        collectd.info('%s: %26.26s : bins=[%s]'
                      % (PLUGIN_HISTOGRAM, 'component', bins))
        for k, v in sorted(H.items(), key=lambda t: -float(t[1]['mean'])):
            if v['mean'] > HIRUNNER_MINIMUM_CPU_PERCENT:
                collectd.info('%s: %26.26s : hist=%s : cnt: %3d, '
                              'mean: %5.1f %%, p95: %5.1f %%, max: %5.1f %%'
                              % (PLUGIN_HISTOGRAM, k, v['hist'], v['count'],
                                 v['mean'], v['p95'], v['pmax']))


def update_cpu_data(init=False):
|
||||
@ -287,23 +762,36 @@ def update_cpu_data(init=False):
|
||||
|
||||
# Calculate elapsed time delta since last run
|
||||
obj.elapsed_ms = float(pc.ONE_THOUSAND) * (now - obj._t0[TIMESTAMP])
|
||||
obj.d_elapsed_ms = float(pc.ONE_THOUSAND) * (now - obj.d_t0[TIMESTAMP])
|
||||
obj.hist_elapsed_ms = float(pc.ONE_THOUSAND) * (now - obj.hist_t0)
|
||||
|
||||
# Prevent calling this routine too frequently (<= 1 sec)
|
||||
if not init and obj.elapsed_ms <= 1000.0:
|
||||
return
|
||||
|
||||
# Check whether this is a dispatch interval
|
||||
if obj.d_elapsed_ms >= 1000.0 * PLUGIN_DISPATCH_INTERVAL:
|
||||
obj.dispatch = True
|
||||
|
||||
# Check whether this is a histogram interval
|
||||
if obj.hist_elapsed_ms >= 1000.0 * PLUGIN_HISTOGRAM_INTERVAL:
|
||||
obj.histogram = True
|
||||
|
||||
t1 = {}
|
||||
w1 = {}
|
||||
t1[TIMESTAMP] = now
|
||||
w1[TIMESTAMP] = now
|
||||
if obj.schedstat_supported:
|
||||
# Get current per-cpu cumulative cputime usage from /proc/schedstat.
|
||||
cputimes = read_schedstat()
|
||||
cputime, cpuwait = read_schedstat()
|
||||
for cpu in obj.cpu_list:
|
||||
t1[cpu] = cputimes[cpu]
|
||||
t1[cpu] = cputime[cpu]
|
||||
w1[cpu] = cpuwait[cpu]
|
||||
else:
|
||||
return
|
||||
|
||||
# Get current cpuacct usages based on cgroup hierarchy
|
||||
t1_cpuacct = get_cpuacct()
|
||||
# Get current cpuacct usages and wait_sum based on cgroup hierarchy
|
||||
t1_cpuacct, t1_cpuwait = get_cpuacct()
|
||||
|
||||
# Refresh the k8s pod information if we have discovered new cgroups
|
||||
cg_pods = set(t1_cpuacct[pc.GROUP_PODS].keys())
|
||||
@ -350,154 +838,73 @@ def update_cpu_data(init=False):
|
||||
del obj._cache[uid]
|
||||
except ApiException:
|
||||
# continue with remainder of calculations, keeping cache
|
||||
collectd.warning("cpu plugin encountered kube ApiException")
|
||||
collectd.warning('%s encountered kube ApiException' % (PLUGIN))
|
||||
pass
|
||||
|
||||
# Save initial state information
|
||||
if init:
|
||||
obj.d_t0 = copy.deepcopy(t1)
|
||||
obj.d_w0 = copy.deepcopy(w1)
|
||||
obj.d_t0_cpuacct = copy.deepcopy(t1_cpuacct)
|
||||
obj.d_t0_cpuwait = copy.deepcopy(t1_cpuwait)
|
||||
|
||||
obj._t0 = copy.deepcopy(t1)
|
||||
obj._w0 = copy.deepcopy(w1)
|
||||
obj._t0_cpuacct = copy.deepcopy(t1_cpuacct)
|
||||
obj._t0_cpuwait = copy.deepcopy(t1_cpuwait)
|
||||
return
|
||||
|
||||
# Aggregate cputime delta for platform logical cpus using integer math
|
||||
cputime_ms = 0.0
|
||||
for cpu in obj.cpu_list:
|
||||
# Paranoia check, we should never hit this.
|
||||
if cpu not in obj._t0:
|
||||
collectd.error('%s cputime initialization error' % (PLUGIN))
|
||||
break
|
||||
cputime_ms += float(t1[cpu] - obj._t0[cpu])
|
||||
cputime_ms /= float(pc.ONE_MILLION)
|
||||
# Calculate average cpu occupancy for hi-resolution read sample
|
||||
prefix = 'hires'
|
||||
calculate_occupancy(
|
||||
prefix, obj.hires, obj.dispatch,
|
||||
obj._cache,
|
||||
obj._t0, t1,
|
||||
obj._w0, w1,
|
||||
obj._t0_cpuacct, t1_cpuacct,
|
||||
obj._t0_cpuwait, t1_cpuwait,
|
||||
obj._occ, obj._occw,
|
||||
obj.elapsed_ms,
|
||||
obj.number_platform_cpus,
|
||||
obj.cpu_list,
|
||||
obj.debug)

# Calculate average occupancy of platform logical cpus
occupancy = 0.0
if obj.number_platform_cpus > 0 and obj.elapsed_ms > 0:
occupancy = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(obj.elapsed_ms) / obj.number_platform_cpus
else:
occupancy = 0.0
obj._data[PLATFORM_CPU_PERCENT] = occupancy
if obj.debug:
collectd.info('%s %s elapsed = %.1f ms, cputime = %.1f ms, '
'n_cpus = %d, occupancy = %.2f %%'
% (PLUGIN_DEBUG,
PLATFORM_CPU_PERCENT,
obj.elapsed_ms,
cputime_ms,
obj.number_platform_cpus,
occupancy))
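The occupancy above is cputime expressed as a percent of the wall-clock capacity across the platform cores. A minimal standalone sketch of the same arithmetic (the function name is illustrative, not from the plugin):

```python
def occupancy_percent(cputime_ms, elapsed_ms, n_cpus):
    """Average per-cpu occupancy: cputime as a percent of the elapsed
    wall-clock window times the number of available cpus."""
    if n_cpus <= 0 or elapsed_ms <= 0:
        return 0.0
    return 100.0 * cputime_ms / elapsed_ms / n_cpus

# e.g., 1500 ms of cputime over a 1000 ms window on 6 cpus = 25.0% average
print(occupancy_percent(1500.0, 1000.0, 6))  # 25.0
```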
# Aggregate occupancy histogram bins
aggregate_histogram(
obj.histogram, obj._occ, obj.shared_bins, obj.hist_occ, obj.debug)

# Calculate cpuacct delta for cgroup hierarchy, dropping transient cgroups
cpuacct = {}
for i in t1_cpuacct.keys():
cpuacct[i] = {}
for k, v in t1_cpuacct[i].items():
if i in obj._t0_cpuacct and k in obj._t0_cpuacct[i]:
cpuacct[i][k] = v - obj._t0_cpuacct[i][k]
else:
cpuacct[i][k] = v
# Clear histogram data for next interval
if obj.histogram:
obj.histogram = False
obj.hist_occ = {}
obj.hist_t0 = now

# Summarize cpuacct usage for various groupings we aggregate
for g in pc.GROUPS_AGGREGATED:
cpuacct[pc.GROUP_OVERALL][g] = 0.0

# Aggregate cpuacct usage by K8S pod
for uid in cpuacct[pc.GROUP_PODS]:
acct = cpuacct[pc.GROUP_PODS][uid]
if uid in obj._cache:
pod = obj._cache[uid]
else:
collectd.warning('%s uid %s not found' % (PLUGIN, uid))
continue

# K8S platform system usage, i.e., essential: kube-system
# check for component label app.starlingx.io/component=platform
if pod.is_platform_resource():
cpuacct[pc.GROUP_OVERALL][pc.GROUP_K8S_SYSTEM] += acct

# K8S platform addons usage, i.e., non-essential: monitor, openstack
if pod.namespace in pc.K8S_NAMESPACE_ADDON:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_K8S_ADDON] += acct

# Calculate base cpuacct usage (i.e., base tasks, exclude K8S and VMs)
# e.g., docker, system.slice, user.slice
for name in cpuacct[pc.GROUP_FIRST]:
if name in pc.BASE_GROUPS:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_BASE] += \
cpuacct[pc.GROUP_FIRST][name]
elif name not in pc.BASE_GROUPS_EXCLUDE:
collectd.warning('%s could not find cgroup: %s' % (PLUGIN, name))

# Calculate system.slice container cpuacct usage
for g in pc.CONTAINERS_CGROUPS:
if g in cpuacct[pc.CGROUP_SYSTEM]:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_CONTAINERS] += \
cpuacct[pc.CGROUP_SYSTEM][g]

# Calculate platform cpuacct usage (this excludes apps)
for g in pc.PLATFORM_GROUPS:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_PLATFORM] += \
cpuacct[pc.GROUP_OVERALL][g]

# Calculate cgroup based occupancy for overall groupings
for g in pc.OVERALL_GROUPS:
cputime_ms = \
float(cpuacct[pc.GROUP_OVERALL][g]) / float(pc.ONE_MILLION)
occupancy = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(obj.elapsed_ms) / obj.number_platform_cpus
obj._data[g] = occupancy
if obj.debug:
collectd.info('%s %s elapsed = %.1f ms, cputime = %.1f ms, '
'n_cpus = %d, occupancy = %.2f %%'
% (PLUGIN_DEBUG,
g,
obj.elapsed_ms,
cputime_ms,
obj.number_platform_cpus,
occupancy))

# Calculate cgroup based occupancy for first-level groupings
for g in cpuacct[pc.GROUP_FIRST]:
cputime_ms = \
float(cpuacct[pc.GROUP_FIRST][g]) / float(pc.ONE_MILLION)
occupancy = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(obj.elapsed_ms) / obj.number_platform_cpus
obj._data[g] = occupancy

# Calculate cgroup based occupancy for cgroups within
# system.slice and user.slice, keeping the hirunners
# exceeding minimum threshold.
occ = {}
for g in cpuacct[pc.CGROUP_SYSTEM]:
cputime_ms = \
float(cpuacct[pc.CGROUP_SYSTEM][g]) / float(pc.ONE_MILLION)
occupancy = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(obj.elapsed_ms) / obj.number_platform_cpus
obj._data[g] = occupancy
if occupancy >= HIRUNNER_MINIMUM_CPU_PERCENT:
occ[g] = occupancy
for g in cpuacct[pc.CGROUP_USER]:
cputime_ms = \
float(cpuacct[pc.CGROUP_USER][g]) / float(pc.ONE_MILLION)
occupancy = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(obj.elapsed_ms) / obj.number_platform_cpus
obj._data[g] = occupancy
if occupancy >= HIRUNNER_MINIMUM_CPU_PERCENT:
occ[g] = occupancy
occs = ', '.join(
'{}: {:.1f}'.format(k.split('.', 1)[0], v) for k, v in sorted(
occ.items(), key=lambda t: -float(t[1]))
)
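The hirunner summary string sorts cgroups by descending occupancy and trims the unit suffix from each name. A self-contained sketch of that formatting (the sample occupancy values are invented):

```python
# Hypothetical per-cgroup occupancy samples, in percent
occ = {'sm-api.service': 1.2, 'sshd.service': 0.7, 'kubelet.service': 5.4}

# Sort hirunners by descending occupancy; shorten 'name.service' to 'name'
occs = ', '.join(
    '{}: {:.1f}'.format(k.split('.', 1)[0], v)
    for k, v in sorted(occ.items(), key=lambda t: -float(t[1])))
print(occs)  # kubelet: 5.4, sm-api: 1.2, sshd: 0.7
```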
collectd.info('%s %s: %.1f%%; cpus: %d, (%s)'
% (PLUGIN,
'Base usage',
obj._data[pc.GROUP_BASE],
obj.number_platform_cpus,
occs))
# Calculate average cpu occupancy for dispatch interval
if obj.dispatch:
prefix = 'dispatch'
calculate_occupancy(
prefix, obj.hires, obj.dispatch,
obj._cache,
obj.d_t0, t1,
obj.d_w0, w1,
obj.d_t0_cpuacct, t1_cpuacct,
obj.d_t0_cpuwait, t1_cpuwait,
obj.d_occ, obj.d_occw,
obj.d_elapsed_ms,
obj.number_platform_cpus,
obj.cpu_list,
obj.debug)

# Update t0 state for the next sample collection
obj._t0 = copy.deepcopy(t1)
obj._w0 = copy.deepcopy(w1)
obj._t0_cpuacct = copy.deepcopy(t1_cpuacct)
obj._t0_cpuwait = copy.deepcopy(t1_cpuwait)
if obj.dispatch:
obj.d_t0 = copy.deepcopy(t1)
obj.d_w0 = copy.deepcopy(w1)
obj.d_t0_cpuacct = copy.deepcopy(t1_cpuacct)
obj.d_t0_cpuwait = copy.deepcopy(t1_cpuwait)


def config_func(config):
@@ -510,9 +917,11 @@ def config_func(config):
obj.debug = pc.convert2boolean(val)
elif key == 'verbose':
obj.verbose = pc.convert2boolean(val)
elif key == 'hires':
obj.hires = pc.convert2boolean(val)

collectd.info('%s debug=%s, verbose=%s'
% (PLUGIN, obj.debug, obj.verbose))
collectd.info('%s debug=%s, verbose=%s, hires=%s'
% (PLUGIN, obj.debug, obj.verbose, obj.hires))

return pc.PLUGIN_PASS

@@ -598,55 +1007,41 @@ def read_func():
collectd.info('%s no cpus to monitor' % PLUGIN)
return pc.PLUGIN_PASS

# Gather current cputime state information, and calculate occupancy since
# this routine was last run.
# Gather current cputime state information, and calculate occupancy
# since this routine was last run.
update_cpu_data()

# Prevent dispatching measurements at plugin startup
if obj.elapsed_ms <= 1000.0:
if obj.elapsed_ms <= 500.0:
return pc.PLUGIN_PASS

if obj.verbose:
collectd.info('%s Usage: %.1f%% (avg per cpu); '
'cpus: %d, Platform: %.1f%% '
'(Base: %.1f, k8s-system: %.1f), k8s-addon: %.1f, '
'%s: %.1f, %s: %.1f'
% (PLUGIN, obj._data[PLATFORM_CPU_PERCENT],
obj.number_platform_cpus,
obj._data[pc.GROUP_PLATFORM],
obj._data[pc.GROUP_BASE],
obj._data[pc.GROUP_K8S_SYSTEM],
obj._data[pc.GROUP_K8S_ADDON],
pc.GROUP_CONTAINERS,
obj._data[pc.GROUP_CONTAINERS],
pc.GROUP_OVERHEAD,
obj._data[pc.GROUP_OVERHEAD]))

# Fault insertion code to assist in regression UT
#
# if os.path.exists('/var/run/fit/cpu_data'):
#     with open('/var/run/fit/cpu_data', 'r') as infile:
#         for line in infile:
#             obj._data[PLATFORM_CPU_PERCENT] = float(line)
#             obj._occ[PLATFORM_CPU_PERCENT] = float(line)
#             collectd.info("%s using FIT data:%.2f" %
#                 (PLUGIN, obj._data[PLATFORM_CPU_PERCENT] ))
#                 (PLUGIN, obj._occ[PLATFORM_CPU_PERCENT] ))
#             break

# Dispatch overall platform cpu usage percent value
val = collectd.Values(host=obj.hostname)
val.plugin = 'cpu'
val.type = 'percent'
val.type_instance = 'used'
val.dispatch(values=[obj._data[PLATFORM_CPU_PERCENT]])
if obj.dispatch:
# Dispatch overall platform cpu usage percent value
val = collectd.Values(host=obj.hostname)
val.plugin = 'cpu'
val.type = 'percent'
val.type_instance = 'used'
val.dispatch(values=[obj.d_occ[PLATFORM_CPU_PERCENT]])

# Dispatch grouped platform cpu usage values
val = collectd.Values(host=obj.hostname)
val.plugin = 'cpu'
val.type = 'percent'
val.type_instance = 'occupancy'
for g in pc.OVERALL_GROUPS:
val.plugin_instance = g
val.dispatch(values=[obj._data[g]])
# Dispatch grouped platform cpu usage values
val = collectd.Values(host=obj.hostname)
val.plugin = 'cpu'
val.type = 'percent'
val.type_instance = 'occupancy'
for g in pc.OVERALL_GROUPS:
val.plugin_instance = g
val.dispatch(values=[obj.d_occ[g]])
obj.dispatch = False

# Calculate overhead cost of gathering metrics
if obj.debug:
@@ -661,4 +1056,4 @@ def read_func():
# Register the config, init and read functions
collectd.register_config(config_func)
collectd.register_init(init_func)
collectd.register_read(read_func)
collectd.register_read(read_func, interval=PLUGIN_HIRES_INTERVAL)

@@ -1,5 +1,5 @@
#
# Copyright (c) 2018-2022 Wind River Systems, Inc.
# Copyright (c) 2018-2024 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
@@ -618,22 +618,23 @@ def output_top_10_pids(pid_dict, message):
"""Outputs the top 10 pids with the formatted message.

Args:
pid_dict: Dict The Dictionary of PIDs with Name and RSS
message: Formatted String, the template message to be output.
pid_dict: dictionary {pid: {'name': name, 'rss': value}}
message: Formatted String, template output message
"""

# Check that pid_dict has values
if not pid_dict:
return
proc = []
# Sort the dict based on Rss value from highest to lowest.
sorted_pid_dict = sorted(pid_dict.items(), key=lambda x: x[1]['rss'],
reverse=True)
# Convert sorted_pid_dict into a list
[proc.append((i[1].get('name'), format_iec(i[1].get('rss')))) for i in
sorted_pid_dict]
# Output top 10 entries of the list
collectd.info(message % (str(proc[:10])))

# Output top 10 RSS usage entries
mems = ', '.join(
'{}: {}'.format(
v.get('name', '-'),
format_iec(v.get('rss', 0.0))) for k, v in sorted(
pid_dict.items(),
key=lambda t: -float(t[1]['rss']))[:10]
)
collectd.info(message % (mems))
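`format_iec` renders raw byte counts with binary-prefix units before logging. A minimal stand-in (not the plugin's exact implementation; the sample pid data is invented) shows the intended top-10 log output:

```python
def format_iec(value):
    # Minimal stand-in: scale a byte count into binary-prefix units
    for unit in ('B', 'KiB', 'MiB', 'GiB', 'TiB'):
        if abs(value) < 1024.0:
            return '%.1f%s' % (value, unit)
        value /= 1024.0
    return '%.1f%s' % (value, 'PiB')

# Hypothetical {pid: {'name': ..., 'rss': bytes}} input
pid_dict = {101: {'name': 'sm', 'rss': 524288000},
            202: {'name': 'sshd', 'rss': 10485760}}

# Sort by descending RSS, keep the top 10, format each entry
mems = ', '.join(
    '{}: {}'.format(v['name'], format_iec(v['rss']))
    for k, v in sorted(pid_dict.items(), key=lambda t: -float(t[1]['rss']))[:10])
print(mems)  # sm: 500.0MiB, sshd: 10.0MiB
```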


def config_func(config):
@@ -777,10 +778,10 @@ def read_func():
# K8S platform addons usage, i.e., non-essential: monitor, openstack
if pod.namespace in pc.K8S_NAMESPACE_ADDON:
memory[pc.GROUP_OVERALL][pc.GROUP_K8S_ADDON] += MiB
# Limit output to every 5 minutes and after 29 seconds to avoid duplication
if datetime.datetime.now().minute % 5 == 0 and datetime.datetime.now(
).second > 29:
# Get per-process and per-pod RSS memory every 5 minutes
now = datetime.datetime.now()
if now.minute % 5 == 0 and now.second > 29:
# Populate the memory per process dictionary to output results
pids = get_platform_memory_per_process()

@@ -795,13 +796,21 @@ def read_func():
for uid in group_pods:
if uid in obj._cache:
pod = obj._cache[uid]
# Ensure pods outside of Kube-System and Kube-Addon are only logged every 30 min
if datetime.datetime.now().minute % 30 == 0 and datetime.datetime.now().second > 29:
collectd.info(f'The pod:{pod.name} running in namespace:{pod.namespace} '
f'has the following processes{group_pods[uid]}')
# Log detailed memory usage of all pods every 30 minutes
if now.minute % 30 == 0 and now.second > 29:
mems = ', '.join(
'{}({}): {}'.format(
v.get('name', '-'),
k,
format_iec(v.get('rss', 0.0))) for k, v in sorted(
group_pods[uid].items(),
key=lambda t: -float(t[1]['rss']))
)
collectd.info(f'memory usage: Pod: {pod.name}, '
f'Namespace: {pod.namespace}, '
f'pids: {mems}')
else:
collectd.warning('%s: uid %s for pod %s not found in namespace %s' % (
PLUGIN, uid, pod.name, pod.namespace))
collectd.warning('%s: uid %s for pod not found' % (PLUGIN, uid))
continue

# K8S platform system usage, i.e., essential: kube-system
@@ -815,16 +824,16 @@ def read_func():
for key in group_pods[uid]:
k8s_addon[key] = group_pods[uid][key]

message = 'The top 10 memory rss processes for the platform are : %s'
message = 'Top 10 memory usage pids: platform: %s'
output_top_10_pids(platform, message)

message = 'The top 10 memory rss processes for the Kubernetes System are :%s'
message = 'Top 10 memory usage pids: Kubernetes System: %s'
output_top_10_pids(k8s_system, message)

message = 'The top 10 memory rss processes Kubernetes Addon are :%s'
message = 'Top 10 memory usage pids: Kubernetes Addon: %s'
output_top_10_pids(k8s_addon, message)

message = 'The top 10 memory rss processes overall are :%s'
message = 'Top 10 memory usage pids: overall: %s'
output_top_10_pids(overall, message)

# Calculate base memory usage (i.e., normal memory, exclude K8S and VMs)

@@ -1,7 +1,7 @@
#
# SPDX-License-Identifier: Apache-2.0
#
# Copyright (C) 2019 Intel Corporation
# Copyright (C) 2019-2024 Intel Corporation
#
############################################################################
#
@@ -741,7 +741,7 @@ def parse_ovs_appctl_bond_list(buf):
buf = buf.strip().split("\n")
result = {}
for idx, line in enumerate(buf):
if idx is 0:
if idx == 0:
continue

line = line.strip()
@@ -837,7 +837,7 @@ def compare_interfaces(interfaces1, interfaces2):
len1 = len(set1 - set2)
len2 = len(set2 - set1)

if len1 is 0 and len2 is 0:
if len1 == 0 and len2 == 0:
return True
else:
return False

@@ -1,5 +1,5 @@
#
# Copyright (c) 2019-2022 Wind River Systems, Inc.
# Copyright (c) 2019-2024 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
@@ -40,6 +40,7 @@ MIN_AUDITS_B4_FIRST_QUERY = 2
K8S_MODULE_MAJOR_VERSION = int(K8S_MODULE_VERSION.split('.')[0])
KUBELET_CONF = '/etc/kubernetes/kubelet.conf'
SSL_TLS_SUPPRESS = True
K8S_TIMEOUT = 2

# Standard units' conversion parameters (mebi, kibi)
# Reference: https://en.wikipedia.org/wiki/Binary_prefix
@@ -83,9 +84,11 @@ GROUPS_AGGREGATED = [GROUP_PLATFORM, GROUP_BASE, GROUP_K8S_SYSTEM,
GROUP_K8S_ADDON, GROUP_CONTAINERS]

# First level cgroups -- these are the groups we know about
CGROUP_INIT = 'init.scope'
CGROUP_SYSTEM = 'system.slice'
CGROUP_USER = 'user.slice'
CGROUP_MACHINE = 'machine.slice'
CGROUP_K8SPLATFORM = 'k8splatform.slice'
CGROUP_DOCKER = 'docker'
CGROUP_K8S = K8S_ROOT

@@ -98,7 +101,8 @@ CONTAINERS_CGROUPS = [CGROUP_SYSTEM_CONTAINERD, CGROUP_SYSTEM_DOCKER,
CGROUP_SYSTEM_KUBELET, CGROUP_SYSTEM_ETCD]

# Groupings by first level cgroup
BASE_GROUPS = [CGROUP_DOCKER, CGROUP_SYSTEM, CGROUP_USER]
BASE_GROUPS = [CGROUP_INIT, CGROUP_DOCKER, CGROUP_SYSTEM, CGROUP_USER,
CGROUP_K8SPLATFORM]
BASE_GROUPS_EXCLUDE = [CGROUP_K8S, CGROUP_MACHINE]

# Groupings of pods by kubernetes namespace
@@ -750,18 +754,28 @@ class K8sClient(object):
# Debian
# kubectl --kubeconfig KUBELET_CONF get pods --all-namespaces \
#     --selector spec.nodeName=the_host -o json
kube_results = subprocess.check_output(
['kubectl', '--kubeconfig', KUBELET_CONF,
'--field-selector', field_selector,
'get', 'pods', '--all-namespaces',
'-o', 'json'
]).decode()
json_results = json.loads(kube_results)
try:
kube_results = subprocess.check_output(
['kubectl', '--kubeconfig', KUBELET_CONF,
'--field-selector', field_selector,
'get', 'pods', '--all-namespaces',
'-o', 'json',
], timeout=K8S_TIMEOUT).decode()
json_results = json.loads(kube_results)
except subprocess.TimeoutExpired:
collectd.error('kube_get_local_pods: Timeout')
return []
except json.JSONDecodeError as e:
collectd.error('kube_get_local_pods: Could not parse json output, error=%s' % (str(e)))
return []
except subprocess.CalledProcessError as e:
collectd.error('kube_get_local_pods: Could not get pods, error=%s' % (str(e)))
return []
# convert the items to: kubernetes.client.V1Pod
api_items = [self._as_kube_pod(x) for x in json_results['items']]
return api_items
except Exception as err:
collectd.error("kube_get_local_pods: %s" % (err))
collectd.error("kube_get_local_pods: error=%s" % (str(err)))
raise


@@ -783,7 +797,8 @@ class POD_object:
"""Check whether pod contains platform namespace or platform label"""

if (self.namespace in K8S_NAMESPACE_SYSTEM
or self.labels.get(PLATFORM_LABEL_KEY) == GROUP_PLATFORM):
or (self.labels is not None and
self.labels.get(PLATFORM_LABEL_KEY) == GROUP_PLATFORM)):
return True
return False

@@ -5,6 +5,7 @@ LoadPlugin python
<Module "cpu">
debug = false
verbose = true
hires = false
</Module>
Import "memory"
<Module "memory">
@@ -21,5 +22,4 @@ LoadPlugin python
Import "remotels"
Import "service_res"
LogTraces = true
Encoding "utf-8"
</Plugin>

@@ -1,3 +1,10 @@
monitor-tools (1.0-2) unstable; urgency=medium

* Update schedtop to display cgroups from systemd services and Kubernetes pods
* Add watchpids to find created processes, typically short-lived

-- Jim Gauld <James.Gauld@windriver.com>  Thu, 12 Sep 2024 09:54:55 -0400

monitor-tools (1.0-1) unstable; urgency=medium

* Initial release.

@@ -13,4 +13,5 @@ Description: Monitor tools package
This package contains data collection tools to monitor host performance.
Tools are general purpose engineering and debugging related.
Includes overall memory, cpu occupancy, per-task cpu,
per-task scheduling, per-task io.
per-task scheduling, per-task io, newly created short-lived-processes,
local port scanning.

@@ -5,7 +5,7 @@ Source: https://opendev.org/starlingx/utilities

Files: *
Copyright:
(c) 2013-2021 Wind River Systems, Inc
(c) 2013-2024 Wind River Systems, Inc
(c) Others (See individual files for more details)
License: Apache-2
Licensed under the Apache License, Version 2.0 (the "License");
@@ -26,7 +26,7 @@ License: Apache-2
# If you want to use GPL v2 or later for the /debian/* files use
# the following clauses, or change it to suit. Delete these two lines
Files: debian/*
Copyright: 2021 Wind River Systems, Inc
Copyright: 2024 Wind River Systems, Inc
License: Apache-2
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

@@ -10,5 +10,8 @@ override_dh_install:
install -p memtop $(ROOT)/usr/bin
install -p schedtop $(ROOT)/usr/bin
install -p occtop $(ROOT)/usr/bin
install -p k8smetrics $(ROOT)/usr/bin
install -p portscanner $(ROOT)/usr/bin
install -p watchpids $(ROOT)/usr/bin

dh_install

@@ -1,6 +1,6 @@
---
debname: monitor-tools
debver: 1.0-1
debver: 1.0-2
src_path: scripts
revision:
dist: $STX_DIST

monitor-tools/scripts/k8smetrics (new executable file, 292 lines)
@@ -0,0 +1,292 @@
#!/usr/bin/env python

########################################################################
#
# Copyright (c) 2024 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
########################################################################
#
# Calculate Kubernetes latency percentile metrics (50%, 95%, and 99%) for
# etcdserver and kube-apiserver. This is based on Prometheus format raw
# metrics histograms within kube-apiserver.
#
# This obtains current Kubernetes raw metrics cumulative counters,
# (e.g., kubectl get --raw /metrics). The counters represent cumulative
# frequency of delays <= value. This calculates the delta from previous,
# and does percentile calculation.
#
# Example:
# kubectl get --raw /metrics
#
# To see API calls:
# kubectl get --raw /metrics -v 6
#
# This does minimal parsing and aggregation to yield equivalent of the
# following Prometheus PromQL queries using data over a time-window:
# histogram_quantile(0.95, sum(rate(etcd_request_duration_seconds_bucket[5m])) by (le))
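The percentile math mirrors Prometheus `histogram_quantile`: given delta counts per cumulative `le` bucket over the time window, find the bucket containing the target rank and interpolate linearly within it. A hedged sketch of that calculation (the bucket values below are invented, and this is not the k8smetrics script itself):

```python
def histogram_quantile(q, buckets):
    """buckets: list of (le_upper_bound, cumulative_count) deltas sorted by
    le, mirroring Prometheus histogram_quantile with linear interpolation."""
    total = buckets[-1][1]
    if total == 0:
        return float('nan')
    rank = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            # Linear interpolation within the bucket containing the rank
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return buckets[-1][0]

# Cumulative counts: 90 requests took <= 0.1s, 99 <= 0.25s, 100 <= 0.5s
buckets = [(0.1, 90), (0.25, 99), (0.5, 100)]
print(histogram_quantile(0.95, buckets))  # ~0.183 s
```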
||||