Update collectd cpu plugin and monitor-tools to diagnose cpu spikes

The collectd cpu plugin and monitor-tools are updated to
support diagnosing high cpu usage on shorter time scales.
This includes tools that assist System Engineering in
determining where CPU time is being spent.

The collectd cpu plugin is updated to support Kubernetes services
under system.slice or k8splatform.slice.

This changes the frequency of read function sampling to 1 second,
so the logs now capture instantaneous cpu spikes at the cgroup level.
The dispatch of results still occurs at the original plugin
interval of 30 seconds. The logging of the 1 second samples is
configurable via the /etc/collectd.d/starlingx/python_plugins.conf
field 'hires = <true|false>'. The hi-resolution samples are always
collected and used for the histograms, but it is not always desirable
to log them due to the volume of output.
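
For reference, the corresponding module settings in
/etc/collectd.d/starlingx/python_plugins.conf (defaults as shipped
in this change):

    <Module "cpu">
      debug = false
      verbose = true
      hires = false
    </Module>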

This adds new logs for occupancy wait. This is similar to cpu
occupancy, but instead of the real time used, it measures the
aggregate percent of time a given cgroup waits to be scheduled.
This is a measure of CPU contention.
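
A minimal sketch of the underlying arithmetic (mirroring the plugin's
calculation; the function name here is illustrative, not part of the
plugin):

    def occupancy_wait_percent(wait_ns_t0, wait_ns_t1,
                               elapsed_ms, n_platform_cpus):
        # wait_ns_* are cumulative cpu.stat 'wait_sum' values (nanoseconds)
        # read at the start and end of the sample interval.
        if elapsed_ms <= 0 or n_platform_cpus <= 0:
            return 0.0
        wait_ms = float(wait_ns_t1 - wait_ns_t0) / 1000000.0
        return 100.0 * wait_ms / elapsed_ms / n_platform_cpus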

This adds new logs for occupancy histograms for all cgroups and
aggregated groupings, based on the 1 second occupancy samples.
The histograms are displayed in hirunner order. Each entry shows
the histogram bins, the mean, the 95th percentile, and the max value.
The histograms are logged at 5 minute intervals.
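
A minimal sketch of the per-measurement statistics, assuming the
shared 0-100% bins the plugin uses (the helper name is illustrative):

    import numpy as np

    # 10 shared bins spanning 0..100 percent
    shared_bins = np.histogram_bin_edges(
        np.array([0, 100], dtype=np.float64), bins=10, range=(0, 100))

    def summarize(samples):
        """Return (mean, p95, max, bin counts) for one measurement."""
        arr = np.array(samples, dtype=np.float64)
        hist, _ = np.histogram(arr, bins=shared_bins)
        return arr.mean(), np.percentile(arr, 95), arr.max(), hist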

This reduces the collectd cgroup CPUShares from 1024 to 256.
This smooths out the behaviour of poorly behaved audits.

The 'schedtop' tool is updated to display a 'cgroup' field. This
is the systemd cgroup name, or an abbreviated pod name. This also
handles kernel sched output format changes in kernel 6.6.
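
One way to derive a task's cgroup name from /proc (shown as a sketch;
schedtop's actual parsing and pod-name abbreviation may differ):

    def cgroup_name(pid):
        # /proc/<pid>/cgroup lines: hierarchy-ID:controller-list:cgroup-path
        try:
            with open('/proc/%d/cgroup' % pid) as f:
                for line in f:
                    path = line.rstrip('\n').split(':', 2)[2]
                    # report the last path component, e.g. 'kubelet.service'
                    return path.rsplit('/', 1)[-1] or '/'
        except (IOError, OSError):
            pass
        return '-'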

A new tool 'portscanner' is added to monitor-tools to identify
local host processes that are using specific ports. This has been
instrumental in discovering gunicorn/keystone API users.
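
An illustrative sketch of how local port users can be found by matching
socket inodes from /proc (IPv4 only; the shipped portscanner may be
implemented differently):

    import glob
    import os

    def pids_using_port(port):
        # Collect socket inodes bound to the port from /proc/net/tcp.
        inodes = set()
        with open('/proc/net/tcp') as f:
            next(f)  # skip header
            for line in f:
                fields = line.split()
                if int(fields[1].split(':')[1], 16) == port:
                    inodes.add(fields[9])
        # Map inodes back to pids via /proc/<pid>/fd symlinks.
        pids = set()
        for fd in glob.glob('/proc/[0-9]*/fd/*'):
            try:
                link = os.readlink(fd)
            except OSError:
                continue
            if link.startswith('socket:[') and link[8:-1] in inodes:
                pids.add(int(fd.split('/')[2]))
        return pids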

A new tool 'k8smetrics' is added to monitor-tools to display
the delay histogram and percentiles for kube-apiserver and
etcdserver. This gives a way to quantify performance under
system load.
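
A minimal sketch of the percentile calculation over Prometheus-style
cumulative histogram buckets, equivalent in spirit to PromQL
histogram_quantile (the k8smetrics implementation may differ):

    def percentile_from_buckets(buckets, q):
        # buckets: sorted (upper_bound_le, cumulative_count) deltas over
        # the time window; q: quantile in [0, 1].
        total = buckets[-1][1]
        if total <= 0:
            return 0.0
        target = q * total
        prev_le, prev_count = 0.0, 0
        for le, count in buckets:
            if count >= target:
                span = count - prev_count
                frac = (target - prev_count) / span if span else 0.0
                # linear interpolation within the bucket
                return prev_le + (le - prev_le) * frac
            prev_le, prev_count = le, count
        return buckets[-1][0]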

Partial-Bug: 2084714

TEST PLAN:
AIO-SX, AIO-DX, Standard, Storage, DC:
PASS: Fresh install ISO
PASS: Verify /var/log/collectd.log for 1 second cpu/wait logs,
      and that it contains the etcd, kubelet, and containerd services.
PASS: Verify we are dispatching at 30 second granularity.
PASS: Verify we are displaying histograms every 5 minutes.
PASS: Verify we can enable/disable the display of hi-resolution
      logs with /etc/collectd.d/starlingx/python_plugins.conf
      field 'hires = <true|false>'.
PASS: Verify schedtop contains 'cgroup' output.
PASS: Verify output from 'k8smetrics'.
      Cross check against Prometheus GUI for apiserver percentile.
PASS: Verify output from portscanner with port 5000.
      Verify 1-to-1 mapping against /var/log/keystone/keystone-all.log.

Change-Id: I82d4f414afdf1cecbcc99680b360cbad702ba140
Signed-off-by: Jim Gauld <James.Gauld@windriver.com>
Jim Gauld 2024-10-16 18:14:13 -04:00
parent 8f404ea66c
commit 0232b8b9dc
16 changed files with 2270 additions and 270 deletions

View File

@ -12,5 +12,9 @@ ExecStart=/usr/sbin/collectd
ExecStartPost=/bin/bash -c 'echo $MAINPID > /var/run/collectd.pid'
ExecStopPost=/bin/rm -f /var/run/collectd.pid
# cgroup performance engineering
# - smooth out CPU impulse from poorly behaved plugin
CPUShares=256
[Install]
WantedBy=multi-user.target

View File

@ -1,5 +1,5 @@
#
# Copyright (c) 2018-2021 Wind River Systems, Inc.
# Copyright (c) 2018-2024 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
@ -17,6 +17,7 @@
############################################################################
import collectd
import copy
import numpy as np
import os
import plugin_common as pc
import re
@ -26,8 +27,13 @@ import tsconfig.tsconfig as tsc
from kubernetes.client.rest import ApiException
PLUGIN = 'platform cpu usage plugin'
#PLUGIN = 'platform cpu usage plugin'
PLUGIN = 'platform cpu'
PLUGIN_HISTOGRAM = 'histogram'
PLUGIN_DEBUG = 'DEBUG platform cpu'
PLUGIN_HIRES_INTERVAL = 1 # hi-resolution sample interval in secs
PLUGIN_DISPATCH_INTERVAL = 30 # dispatch interval in secs
PLUGIN_HISTOGRAM_INTERVAL = 300 # histogram interval in secs
TIMESTAMP = 'timestamp'
PLATFORM_CPU_PERCENT = 'platform-occupancy'
@ -42,25 +48,38 @@ SCHEDSTAT = '/proc/schedstat'
CPUACCT = pc.CGROUP_ROOT + '/cpuacct'
CPUACCT_USAGE = 'cpuacct.usage'
CPUACCT_USAGE_PERCPU = 'cpuacct.usage_percpu'
CPU_STAT = 'cpu.stat'
# Common regex pattern match groups
re_uid = re.compile(r'^pod(\S+)')
re_processor = re.compile(r'^[Pp]rocessor\s+:\s+(\d+)')
re_schedstat = re.compile(r'^cpu(\d+)\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\d+)\s+')
re_schedstat = re.compile(r'^cpu(\d+)\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\d+)\s+(\d+)\s+')
re_schedstat_version = re.compile(r'^version\s+(\d+)')
re_keyquoteval = re.compile(r'^\s*(\S+)\s*[=:]\s*\"(\S+)\"\s*')
re_cpu_wait_sum = re.compile(r'^wait_sum\s+(\d+)')
# hirunner minimum cpu occupancy threshold
HIRUNNER_MINIMUM_CPU_PERCENT = 0.1
# Set numpy format for printing bins
np.set_printoptions(formatter={'int': '{: 4d}'.format})
# Plugin specific control class and object.
class CPU_object(pc.PluginObject):
def __init__(self):
super(CPU_object, self).__init__(PLUGIN, '')
# CPU Plugin flags
self.dispatch = False # print occupancy and dispatch this sample
self.histogram = False # print occupancy histogram this sample
# CPU plugin configurable settings
self.debug = True
self.verbose = True
self.hires = False
# Cache Kubernetes pods data
self._cache = {}
self._k8s_client = pc.K8sClient()
self.k8s_pods = set()
@ -69,15 +88,50 @@ class CPU_object(pc.PluginObject):
self.schedstat_supported = True
self.number_platform_cpus = 0
# Platform CPU monitor
now = time.time() # epoch time in floating seconds
self._t0 = {} # cputime state info at start of sample interval
self._t0[TIMESTAMP] = now
self._t0_cpuacct = {}
self._data = {} # derived measurements at end of sample interval
self._data[PLATFORM_CPU_PERCENT] = 0.0
self.elapsed_ms = 0.0
# CPU State information at start of dispatch interval
self.d_t0 = {} # per-cpu cputime at dispatch time 0
self.d_w0 = {} # per-cpu cpuwait at dispatch time 0
self.d_t0[TIMESTAMP] = now # timestamp dispatch time 0
self.d_w0[TIMESTAMP] = now # timestamp dispatch time 0
self.d_t0_cpuacct = {} # per-cgroup cpuacct at dispatch time 0
self.d_t0_cpuwait = {} # per-cgroup cpuwait at dispatch time 0
# Derived measurements over dispatch interval
self.d_occ = {} # dispatch occupancy per cgroup or derived aggregate
self.d_occw = {} # dispatch occupancy wait per cgroup or derived aggregate
self.d_occ[PLATFORM_CPU_PERCENT] = 0.0 # dispatch platform occupancy
self.d_occw[PLATFORM_CPU_PERCENT] = 0.0 # dispatch platform occupancy wait
for g in pc.OVERALL_GROUPS:
self.d_occ[g] = 0.0
self.d_occw[g] = 0.0
self.d_elapsed_ms = 0.0 # dispatch elapsed time
# CPU State information at start of read sample interval
self._t0 = {} # per-cpu cputime at time 0
self._w0 = {} # per-cpu cpuwait at time 0
self._t0[TIMESTAMP] = now # timestamp time 0
self._w0[TIMESTAMP] = now # timestamp time 0
self._t0_cpuacct = {} # per-cgroup cpuacct at time 0
self._t0_cpuwait = {} # per-cgroup cpuwait at time 0
# Derived measurements over read sample interval
self._occ = {} # occupancy per cgroup or derived aggregate
self._occw = {} # occupancy wait per cgroup or derived aggregate
self._occ[PLATFORM_CPU_PERCENT] = 0.0 # platform occupancy
self._occw[PLATFORM_CPU_PERCENT] = 0.0 # platform occupancy wait
for g in pc.OVERALL_GROUPS:
self._occ[g] = 0.0
self._occw[g] = 0.0
self.elapsed_ms = 0.0 # elapsed time
# Derived measurements over histogram interval
self.hist_t0 = now # histogram timestamp time 0
self.hist_elapsed_ms = 0.0 # histogram elapsed time
self.hist_occ = {} # histogram bin counts per cgroup or derived aggregate
self.shared_bins = np.histogram_bin_edges(
np.array([0, 100], dtype=np.float64), bins=10, range=(0, 100))
# Instantiate the class
@ -87,13 +141,17 @@ obj = CPU_object()
def read_schedstat():
"""Read current hiresolution times per cpu from /proc/schedstats.
Return dictionary of cputimes in nanoseconds per cpu.
Return dictionary of cputimes in nanoseconds per cpu,
dictionary of cpuwaits in nanoseconds per cpu.
"""
cputime = {}
cpuwait = {}
# Obtain cumulative cputime (nanoseconds) from 7th field of
# /proc/schedstat. This is the time running tasks on this cpu.
# Obtain cumulative cputime (nanoseconds) from 7th field,
# and cumulative cpuwait (nanoseconds) from 8th field,
# from /proc/schedstat. This is the time running and waiting
# for tasks on this cpu.
try:
with open(SCHEDSTAT, 'r') as f:
for line in f:
@ -101,11 +159,13 @@ def read_schedstat():
if match:
k = int(match.group(1))
v = int(match.group(2))
w = int(match.group(3))
cputime[k] = v
cpuwait[k] = w
except Exception as err:
collectd.error('%s Cannot read schedstat, error=%s' % (PLUGIN, err))
return cputime
return cputime, cpuwait
def get_logical_cpus():
@ -202,8 +262,36 @@ def get_cgroup_cpuacct(path, cpulist=None):
return acct
def get_cgroup_cpu_wait_sum(path):
"""Get cgroup cpu.stat wait_sum usage for a specific cgroup path.
This represents the aggregate wait time of all tasks on the cfs_rq.
This indicates how much a task group is starved while competing
for cpu resources.
Returns cumulative wait_sum in nanoseconds.
"""
wait_sum = 0
# Get the aggregate wait_sum for all cpus
fstat = '/'.join([path, CPU_STAT])
try:
with open(fstat, 'r') as f:
for line in f:
match = re_cpu_wait_sum.search(line)
if match:
v = int(match.group(1))
wait_sum = int(v)
except IOError:
# Silently ignore IO errors. It is likely the cgroup disappeared.
pass
return wait_sum
def get_cpuacct():
"""Get cpuacct usage based on cgroup hierarchy."""
"""Get cpuacct usage and wait_sum based on cgroup hierarchy."""
cpuacct = {}
cpuacct[pc.GROUP_OVERALL] = {}
@ -211,48 +299,86 @@ def get_cpuacct():
cpuacct[pc.GROUP_PODS] = {}
cpuacct[pc.CGROUP_SYSTEM] = {}
cpuacct[pc.CGROUP_USER] = {}
cpuacct[pc.CGROUP_INIT] = {}
cpuacct[pc.CGROUP_K8SPLATFORM] = {}
cpuwait = {}
cpuwait[pc.GROUP_OVERALL] = {}
cpuwait[pc.GROUP_FIRST] = {}
cpuwait[pc.GROUP_PODS] = {}
cpuwait[pc.CGROUP_SYSTEM] = {}
cpuwait[pc.CGROUP_USER] = {}
cpuwait[pc.CGROUP_INIT] = {}
cpuwait[pc.CGROUP_K8SPLATFORM] = {}
exclude_types = ['.mount']
# Overall cpuacct usage
acct = get_cgroup_cpuacct(CPUACCT, cpulist=obj.cpu_list)
wait = get_cgroup_cpu_wait_sum(CPUACCT)
cpuacct[pc.GROUP_OVERALL][pc.GROUP_TOTAL] = acct
cpuwait[pc.GROUP_OVERALL][pc.GROUP_TOTAL] = wait
# Initialize 'overhead' time (derived measurement). This will contain
# the remaining cputime not specifically tracked by first-level cgroups.
cpuacct[pc.GROUP_OVERALL][pc.GROUP_OVERHEAD] = acct
cpuwait[pc.GROUP_OVERALL][pc.GROUP_OVERHEAD] = wait
# Walk the first level cgroups and get cpuacct usage
# (e.g., docker, k8s-infra, user.slice, system.slice, machine.slice)
dir_list = next(os.walk(CPUACCT))[1]
for name in dir_list:
if any(name.endswith(x) for x in ['.mount', '.scope']):
if any(name.endswith(x) for x in exclude_types):
continue
cg_path = '/'.join([CPUACCT, name])
acct = get_cgroup_cpuacct(cg_path, cpulist=obj.cpu_list)
wait = get_cgroup_cpu_wait_sum(cg_path)
cpuacct[pc.GROUP_FIRST][name] = acct
cpuwait[pc.GROUP_FIRST][name] = wait
# Subtract out first-level cgroups. The remaining cputime represents
# systemd 'init' pid and kthreads on Platform cpus.
cpuacct[pc.GROUP_OVERALL][pc.GROUP_OVERHEAD] -= acct
cpuwait[pc.GROUP_OVERALL][pc.GROUP_OVERHEAD] -= wait
# Walk the system.slice cgroups and get cpuacct usage
path = '/'.join([CPUACCT, pc.CGROUP_SYSTEM])
dir_list = next(os.walk(path))[1]
for name in dir_list:
if any(name.endswith(x) for x in ['.mount', '.scope']):
if any(name.endswith(x) for x in exclude_types):
continue
cg_path = '/'.join([path, name])
acct = get_cgroup_cpuacct(cg_path, cpulist=obj.cpu_list)
wait = get_cgroup_cpu_wait_sum(cg_path)
cpuacct[pc.CGROUP_SYSTEM][name] = acct
cpuwait[pc.CGROUP_SYSTEM][name] = wait
# Walk the k8splatform.slice cgroups and get cpuacct usage
path = '/'.join([CPUACCT, pc.CGROUP_K8SPLATFORM])
if os.path.isdir(path):
dir_list = next(os.walk(path))[1]
else:
dir_list = []
for name in dir_list:
if any(name.endswith(x) for x in exclude_types):
continue
cg_path = '/'.join([path, name])
acct = get_cgroup_cpuacct(cg_path, cpulist=obj.cpu_list)
wait = get_cgroup_cpu_wait_sum(cg_path)
cpuacct[pc.CGROUP_K8SPLATFORM][name] = acct
cpuwait[pc.CGROUP_K8SPLATFORM][name] = wait
# Walk the user.slice cgroups and get cpuacct usage
path = '/'.join([CPUACCT, pc.CGROUP_USER])
dir_list = next(os.walk(path))[1]
for name in dir_list:
if any(name.endswith(x) for x in ['.mount', '.scope']):
if any(name.endswith(x) for x in exclude_types):
continue
cg_path = '/'.join([path, name])
acct = get_cgroup_cpuacct(cg_path, cpulist=obj.cpu_list)
wait = get_cgroup_cpu_wait_sum(cg_path)
cpuacct[pc.CGROUP_USER][name] = acct
cpuwait[pc.CGROUP_USER][name] = wait
# Walk the kubepods hierarchy to the pod level and get cpuacct usage.
# We can safely ignore reading this if the path does not exist.
@ -268,8 +394,357 @@ def get_cpuacct():
uid = match.group(1)
cg_path = os.path.join(root, name)
acct = get_cgroup_cpuacct(cg_path)
wait = get_cgroup_cpu_wait_sum(cg_path)
cpuacct[pc.GROUP_PODS][uid] = acct
return cpuacct
cpuwait[pc.GROUP_PODS][uid] = wait
return cpuacct, cpuwait
def calculate_occupancy(
prefix, hires, dispatch,
cache,
t0, t1,
w0, w1,
t0_cpuacct, t1_cpuacct,
t0_cpuwait, t1_cpuwait,
occ, occw,
elapsed_ms,
number_platform_cpus,
cpu_list, debug):
"""Calculate average occupancy and wait for platform cpus and cgroups.
This calculates:
- per-cpu cputime delta between time 0 and time 1 (ms)
- per-cpu cpuwait delta between time 0 and time 1 (ms)
- average platform occupancy based on cputime (%)
- average platform occupancy wait based on cpuwait (%)
- per-cgroup cpuacct delta between time 0 and time 1
- per-cgroup cpuwait delta between time 0 and time 1
- average per-cgroup occupancy based on cpuacct (%)
- average per-cgroup occupancy wait based on cpuwait (%)
- aggregate occupancy of specific cgroup groupings (%)
- aggregate occupancy wait of specific cgroup groupings (%)
This logs platform occupancy and aggregate cgroup groupings.
This logs hirunner occupancy for base cgroups.
"""
# Aggregate cputime and cpuwait delta for platform logical cpus
cputime_ms = 0.0
cpuwait_ms = 0.0
for cpu in cpu_list:
# Paranoia check, we should never hit this.
if cpu not in t0 or cpu not in w0:
collectd.error('%s cputime initialization error' % (PLUGIN))
break
cputime_ms += float(t1[cpu] - t0[cpu])
cpuwait_ms += float(w1[cpu] - w0[cpu])
cputime_ms /= float(pc.ONE_MILLION)
cpuwait_ms /= float(pc.ONE_MILLION)
# Calculate average occupancy and wait of platform logical cpus
p_occ = 0.0
p_occw = 0.0
if number_platform_cpus > 0 and elapsed_ms > 0:
p_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(elapsed_ms) / number_platform_cpus
p_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
/ float(elapsed_ms) / number_platform_cpus
else:
p_occ = 0.0
p_occw = 0.0
if debug:
collectd.info('%s %s %s elapsed = %.1f ms, '
'cputime = %.1f ms, cpuwait = %.1f ms, '
'n_cpus = %d, '
'occupancy = %.2f %%, wait = %.2f %%'
% (PLUGIN_DEBUG,
prefix,
PLATFORM_CPU_PERCENT,
elapsed_ms,
cputime_ms, cpuwait_ms,
number_platform_cpus,
p_occ, p_occw))
occ[PLATFORM_CPU_PERCENT] = p_occ
occw[PLATFORM_CPU_PERCENT] = p_occw
# Calculate cpuacct and cpuwait delta for cgroup hierarchy, dropping transient cgroups
cpuacct = {}
for i in t1_cpuacct.keys():
cpuacct[i] = {}
for k, v in t1_cpuacct[i].items():
if i in t0_cpuacct.keys() and k in t0_cpuacct[i].keys():
cpuacct[i][k] = v - t0_cpuacct[i][k]
else:
cpuacct[i][k] = v
cpuwait = {}
for i in t1_cpuwait.keys():
cpuwait[i] = {}
for k, v in t1_cpuwait[i].items():
if i in t0_cpuwait.keys() and k in t0_cpuwait[i].keys():
cpuwait[i][k] = v - t0_cpuwait[i][k]
else:
cpuwait[i][k] = v
# Summarize cpuacct usage for various groupings we aggregate
for g in pc.GROUPS_AGGREGATED:
cpuacct[pc.GROUP_OVERALL][g] = 0.0
cpuwait[pc.GROUP_OVERALL][g] = 0.0
# Aggregate cpuacct usage by K8S pod
for uid in cpuacct[pc.GROUP_PODS]:
acct = cpuacct[pc.GROUP_PODS][uid]
wait = cpuwait[pc.GROUP_PODS][uid]
if uid in cache:
pod = cache[uid]
else:
collectd.warning('%s uid %s not found' % (PLUGIN, uid))
continue
# K8S platform system usage, i.e., essential: kube-system
# check for component label app.starlingx.io/component=platform
if pod.is_platform_resource():
cpuacct[pc.GROUP_OVERALL][pc.GROUP_K8S_SYSTEM] += acct
cpuwait[pc.GROUP_OVERALL][pc.GROUP_K8S_SYSTEM] += wait
# K8S platform addons usage, i.e., non-essential: monitor, openstack
if pod.namespace in pc.K8S_NAMESPACE_ADDON:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_K8S_ADDON] += acct
cpuwait[pc.GROUP_OVERALL][pc.GROUP_K8S_ADDON] += wait
# Calculate base cpuacct usage (i.e., base tasks, exclude K8S and VMs)
# e.g., docker, system.slice, user.slice, init.scope
for name in cpuacct[pc.GROUP_FIRST].keys():
if name in pc.BASE_GROUPS:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_BASE] += \
cpuacct[pc.GROUP_FIRST][name]
cpuwait[pc.GROUP_OVERALL][pc.GROUP_BASE] += \
cpuwait[pc.GROUP_FIRST][name]
elif name not in pc.BASE_GROUPS_EXCLUDE:
collectd.warning('%s could not find cgroup: %s' % (PLUGIN, name))
# Calculate system.slice container cpuacct usage
for g in pc.CONTAINERS_CGROUPS:
if g in cpuacct[pc.CGROUP_SYSTEM].keys():
cpuacct[pc.GROUP_OVERALL][pc.GROUP_CONTAINERS] += \
cpuacct[pc.CGROUP_SYSTEM][g]
cpuwait[pc.GROUP_OVERALL][pc.GROUP_CONTAINERS] += \
cpuwait[pc.CGROUP_SYSTEM][g]
if g in cpuacct[pc.CGROUP_K8SPLATFORM].keys():
cpuacct[pc.GROUP_OVERALL][pc.GROUP_CONTAINERS] += \
cpuacct[pc.CGROUP_K8SPLATFORM][g]
cpuwait[pc.GROUP_OVERALL][pc.GROUP_CONTAINERS] += \
cpuwait[pc.CGROUP_K8SPLATFORM][g]
# Calculate platform cpuacct usage (this excludes apps)
for g in pc.PLATFORM_GROUPS:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_PLATFORM] += \
cpuacct[pc.GROUP_OVERALL][g]
cpuwait[pc.GROUP_OVERALL][pc.GROUP_PLATFORM] += \
cpuwait[pc.GROUP_OVERALL][g]
# Calculate cgroup based occupancy and wait for overall groupings
for g in pc.OVERALL_GROUPS:
cputime_ms = \
float(cpuacct[pc.GROUP_OVERALL][g]) / float(pc.ONE_MILLION)
g_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(elapsed_ms) / number_platform_cpus
occ[g] = g_occ
cpuwait_ms = \
float(cpuwait[pc.GROUP_OVERALL][g]) / float(pc.ONE_MILLION)
g_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
/ float(elapsed_ms) / number_platform_cpus
occw[g] = g_occw
if obj.debug:
collectd.info('%s %s %s elapsed = %.1f ms, '
'cputime = %.1f ms, cpuwait = %.1f ms, '
'n_cpus = %d, '
'occupancy = %.2f %%, wait = %.2f %%'
% (PLUGIN_DEBUG,
prefix,
g,
elapsed_ms,
cputime_ms, cpuwait_ms,
number_platform_cpus,
g_occ, g_occw))
# Store occupancy hirunners
h_occ = {}
h_occw = {}
# Calculate cgroup based occupancy for first-level groupings
for g in cpuacct[pc.GROUP_FIRST]:
cputime_ms = \
float(cpuacct[pc.GROUP_FIRST][g]) / float(pc.ONE_MILLION)
g_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(elapsed_ms) / number_platform_cpus
occ[g] = g_occ
cpuwait_ms = \
float(cpuwait[pc.GROUP_FIRST][g]) / float(pc.ONE_MILLION)
g_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
/ float(elapsed_ms) / number_platform_cpus
occw[g] = g_occw
if g != pc.CGROUP_INIT:
continue
# Keep hirunners exceeding minimum threshold.
if g_occ >= HIRUNNER_MINIMUM_CPU_PERCENT:
h_occ[g] = g_occ
if g_occw >= HIRUNNER_MINIMUM_CPU_PERCENT:
h_occw[g] = g_occw
# Calculate cgroup based occupancy for cgroups within system.slice.
for g in cpuacct[pc.CGROUP_SYSTEM]:
cputime_ms = \
float(cpuacct[pc.CGROUP_SYSTEM][g]) / float(pc.ONE_MILLION)
g_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(elapsed_ms) / number_platform_cpus
occ[g] = g_occ
cpuwait_ms = \
float(cpuwait[pc.CGROUP_SYSTEM][g]) / float(pc.ONE_MILLION)
g_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
/ float(elapsed_ms) / number_platform_cpus
occw[g] = g_occw
# Keep hirunners exceeding minimum threshold.
if g_occ >= HIRUNNER_MINIMUM_CPU_PERCENT:
h_occ[g] = g_occ
if g_occw >= HIRUNNER_MINIMUM_CPU_PERCENT:
h_occw[g] = g_occw
# Calculate cgroup based occupancy for cgroups within k8splatform.slice.
if pc.CGROUP_K8SPLATFORM in cpuacct.keys():
for g in cpuacct[pc.CGROUP_K8SPLATFORM]:
cputime_ms = \
float(cpuacct[pc.CGROUP_K8SPLATFORM][g]) / float(pc.ONE_MILLION)
g_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(elapsed_ms) / number_platform_cpus
occ[g] = g_occ
cpuwait_ms = \
float(cpuwait[pc.CGROUP_K8SPLATFORM][g]) / float(pc.ONE_MILLION)
g_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
/ float(elapsed_ms) / number_platform_cpus
occw[g] = g_occw
# Keep hirunners exceeding minimum threshold.
if g_occ >= HIRUNNER_MINIMUM_CPU_PERCENT:
h_occ[g] = g_occ
if g_occw >= HIRUNNER_MINIMUM_CPU_PERCENT:
h_occw[g] = g_occw
# Calculate cgroup based occupancy for cgroups within user.slice.
for g in cpuacct[pc.CGROUP_USER]:
cputime_ms = \
float(cpuacct[pc.CGROUP_USER][g]) / float(pc.ONE_MILLION)
g_occ = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(elapsed_ms) / number_platform_cpus
occ[g] = g_occ
cpuwait_ms = \
float(cpuwait[pc.CGROUP_USER][g]) / float(pc.ONE_MILLION)
g_occw = float(pc.ONE_HUNDRED) * float(cpuwait_ms) \
/ float(elapsed_ms) / number_platform_cpus
occw[g] = g_occw
# Keep hirunners exceeding minimum threshold.
if g_occ >= HIRUNNER_MINIMUM_CPU_PERCENT:
h_occ[g] = g_occ
if g_occw >= HIRUNNER_MINIMUM_CPU_PERCENT:
h_occw[g] = g_occw
if (hires and prefix == 'hires') or (dispatch and prefix == 'dispatch'):
# Print cpu occupancy usage for high-level groupings
collectd.info('%s %s Usage: %.1f%% (avg per cpu); '
'cpus: %d, Platform: %.1f%% '
'(Base: %.1f, k8s-system: %.1f), k8s-addon: %.1f, '
'%s: %.1f, %s: %.1f'
% (PLUGIN, prefix,
occ[PLATFORM_CPU_PERCENT],
number_platform_cpus,
occ[pc.GROUP_PLATFORM],
occ[pc.GROUP_BASE],
occ[pc.GROUP_K8S_SYSTEM],
occ[pc.GROUP_K8S_ADDON],
pc.GROUP_CONTAINERS,
occ[pc.GROUP_CONTAINERS],
pc.GROUP_OVERHEAD,
occ[pc.GROUP_OVERHEAD]))
# Print hirunner cpu occupancy usage for base cgroups
occs = ', '.join(
'{}: {:.1f}'.format(k.split('.', 1)[0], v) for k, v in sorted(
h_occ.items(), key=lambda t: -float(t[1]))
)
collectd.info('%s %s %s: %.1f%%; cpus: %d, (%s)'
% (PLUGIN,
prefix, 'Base usage',
occ[pc.GROUP_BASE],
number_platform_cpus,
occs))
# Print hirunner cpu wait for base cgroups
occws = ', '.join(
'{}: {:.1f}'.format(k.split('.', 1)[0], v) for k, v in sorted(
h_occw.items(), key=lambda t: -float(t[1]))
)
collectd.info('%s %s %s: %.1f%%; cpus: %d, (%s)'
% (PLUGIN,
prefix, 'Base wait',
occw[pc.GROUP_BASE],
number_platform_cpus,
occws))
def aggregate_histogram(histogram, occ, shared_bins, hist_occ, debug):
"""Aggregate occupancy histogram bins for platform cpus and cgroups.
This aggregates occupancy histogram bins for each key measurement.
When 'histogram' flag is True, this will:
- calculate mean, 95th-percentile, and max statistics, and bins
the measurements
- log histograms and statistics per measurement in hirunner order
"""
# Aggregate each key, value into histogram bins
for k, v in occ.items():
# Get abbreviated name (excludes: .service, .scope, .socket, .mount)
# eg, 'k8splatform.slice' will shorten to 'k8splatform'
key = k.split('.', 1)[0]
if key not in hist_occ:
hist_occ[key] = np.array([], dtype=np.float64)
if v is not None:
hist_occ[key] = np.append(hist_occ[key], v)
if histogram:
# Calculate histograms and statistics for each key measurement
H = {}
for k, v in hist_occ.items():
H[k] = {}
H[k]['count'] = hist_occ[k].size
if H[k]['count'] > 0:
H[k]['mean'] = np.mean(hist_occ[k])
H[k]['p95'] = np.percentile(hist_occ[k], 95)
H[k]['pmax'] = np.max(hist_occ[k])
H[k]['hist'], _ = np.histogram(hist_occ[k], bins=shared_bins)
else:
H[k]['mean'] = 0
H[k]['p95'] = 0.0
H[k]['pmax'] = 0.0
H[k]['hist'] = []
# Print out each histogram, sort by cpu occupancy hirunners
bins = ' '.join('{:4d}'.format(int(x)) for x in shared_bins[1:])
collectd.info('%s: %26.26s : bins=[%s]'
% (PLUGIN_HISTOGRAM, 'component', bins))
for k, v in sorted(H.items(), key=lambda t: -float(t[1]['mean'])):
if v['mean'] > HIRUNNER_MINIMUM_CPU_PERCENT:
collectd.info('%s: %26.26s : hist=%s : cnt: %3d, '
'mean: %5.1f %%, p95: %5.1f %%, max: %5.1f %%'
% (PLUGIN_HISTOGRAM, k, v['hist'], v['count'],
v['mean'], v['p95'], v['pmax']))
def update_cpu_data(init=False):
@ -287,23 +762,36 @@ def update_cpu_data(init=False):
# Calculate elapsed time delta since last run
obj.elapsed_ms = float(pc.ONE_THOUSAND) * (now - obj._t0[TIMESTAMP])
obj.d_elapsed_ms = float(pc.ONE_THOUSAND) * (now - obj.d_t0[TIMESTAMP])
obj.hist_elapsed_ms = float(pc.ONE_THOUSAND) * (now - obj.hist_t0)
# Prevent calling this routine too frequently (<= 1 sec)
if not init and obj.elapsed_ms <= 1000.0:
return
# Check whether this is a dispatch interval
if obj.d_elapsed_ms >= 1000.0 * PLUGIN_DISPATCH_INTERVAL:
obj.dispatch = True
# Check whether this is a histogram interval
if obj.hist_elapsed_ms >= 1000.0 * PLUGIN_HISTOGRAM_INTERVAL:
obj.histogram = True
t1 = {}
w1 = {}
t1[TIMESTAMP] = now
w1[TIMESTAMP] = now
if obj.schedstat_supported:
# Get current per-cpu cumulative cputime usage from /proc/schedstat.
cputimes = read_schedstat()
cputime, cpuwait = read_schedstat()
for cpu in obj.cpu_list:
t1[cpu] = cputimes[cpu]
t1[cpu] = cputime[cpu]
w1[cpu] = cpuwait[cpu]
else:
return
# Get current cpuacct usages based on cgroup hierarchy
t1_cpuacct = get_cpuacct()
# Get current cpuacct usages and wait_sum based on cgroup hierarchy
t1_cpuacct, t1_cpuwait = get_cpuacct()
# Refresh the k8s pod information if we have discovered new cgroups
cg_pods = set(t1_cpuacct[pc.GROUP_PODS].keys())
@ -350,154 +838,73 @@ def update_cpu_data(init=False):
del obj._cache[uid]
except ApiException:
# continue with remainder of calculations, keeping cache
collectd.warning("cpu plugin encountered kube ApiException")
collectd.warning('%s encountered kube ApiException' % (PLUGIN))
pass
# Save initial state information
if init:
obj.d_t0 = copy.deepcopy(t1)
obj.d_w0 = copy.deepcopy(w1)
obj.d_t0_cpuacct = copy.deepcopy(t1_cpuacct)
obj.d_t0_cpuwait = copy.deepcopy(t1_cpuwait)
obj._t0 = copy.deepcopy(t1)
obj._w0 = copy.deepcopy(w1)
obj._t0_cpuacct = copy.deepcopy(t1_cpuacct)
obj._t0_cpuwait = copy.deepcopy(t1_cpuwait)
return
# Aggregate cputime delta for platform logical cpus using integer math
cputime_ms = 0.0
for cpu in obj.cpu_list:
# Paranoia check, we should never hit this.
if cpu not in obj._t0:
collectd.error('%s cputime initialization error' % (PLUGIN))
break
cputime_ms += float(t1[cpu] - obj._t0[cpu])
cputime_ms /= float(pc.ONE_MILLION)
# Calculate average cpu occupancy for hi-resolution read sample
prefix = 'hires'
calculate_occupancy(
prefix, obj.hires, obj.dispatch,
obj._cache,
obj._t0, t1,
obj._w0, w1,
obj._t0_cpuacct, t1_cpuacct,
obj._t0_cpuwait, t1_cpuwait,
obj._occ, obj._occw,
obj.elapsed_ms,
obj.number_platform_cpus,
obj.cpu_list,
obj.debug)
# Calculate average occupancy of platform logical cpus
occupancy = 0.0
if obj.number_platform_cpus > 0 and obj.elapsed_ms > 0:
occupancy = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(obj.elapsed_ms) / obj.number_platform_cpus
else:
occupancy = 0.0
obj._data[PLATFORM_CPU_PERCENT] = occupancy
if obj.debug:
collectd.info('%s %s elapsed = %.1f ms, cputime = %.1f ms, '
'n_cpus = %d, occupancy = %.2f %%'
% (PLUGIN_DEBUG,
PLATFORM_CPU_PERCENT,
obj.elapsed_ms,
cputime_ms,
obj.number_platform_cpus,
occupancy))
# Aggregate occupancy histogram bins
aggregate_histogram(
obj.histogram, obj._occ, obj.shared_bins, obj.hist_occ, obj.debug)
# Calculate cpuacct delta for cgroup hierarchy, dropping transient cgroups
cpuacct = {}
for i in t1_cpuacct.keys():
cpuacct[i] = {}
for k, v in t1_cpuacct[i].items():
if i in obj._t0_cpuacct and k in obj._t0_cpuacct[i]:
cpuacct[i][k] = v - obj._t0_cpuacct[i][k]
else:
cpuacct[i][k] = v
# Clear histogram data for next interval
if obj.histogram:
obj.histogram = False
obj.hist_occ = {}
obj.hist_t0 = now
# Summarize cpuacct usage for various groupings we aggregate
for g in pc.GROUPS_AGGREGATED:
cpuacct[pc.GROUP_OVERALL][g] = 0.0
# Aggregate cpuacct usage by K8S pod
for uid in cpuacct[pc.GROUP_PODS]:
acct = cpuacct[pc.GROUP_PODS][uid]
if uid in obj._cache:
pod = obj._cache[uid]
else:
collectd.warning('%s uid %s not found' % (PLUGIN, uid))
continue
# K8S platform system usage, i.e., essential: kube-system
# check for component label app.starlingx.io/component=platform
if pod.is_platform_resource():
cpuacct[pc.GROUP_OVERALL][pc.GROUP_K8S_SYSTEM] += acct
# K8S platform addons usage, i.e., non-essential: monitor, openstack
if pod.namespace in pc.K8S_NAMESPACE_ADDON:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_K8S_ADDON] += acct
# Calculate base cpuacct usage (i.e., base tasks, exclude K8S and VMs)
# e.g., docker, system.slice, user.slice
for name in cpuacct[pc.GROUP_FIRST]:
if name in pc.BASE_GROUPS:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_BASE] += \
cpuacct[pc.GROUP_FIRST][name]
elif name not in pc.BASE_GROUPS_EXCLUDE:
collectd.warning('%s could not find cgroup: %s' % (PLUGIN, name))
# Calculate system.slice container cpuacct usage
for g in pc.CONTAINERS_CGROUPS:
if g in cpuacct[pc.CGROUP_SYSTEM]:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_CONTAINERS] += \
cpuacct[pc.CGROUP_SYSTEM][g]
# Calculate platform cpuacct usage (this excludes apps)
for g in pc.PLATFORM_GROUPS:
cpuacct[pc.GROUP_OVERALL][pc.GROUP_PLATFORM] += \
cpuacct[pc.GROUP_OVERALL][g]
# Calculate cgroup based occupancy for overall groupings
for g in pc.OVERALL_GROUPS:
cputime_ms = \
float(cpuacct[pc.GROUP_OVERALL][g]) / float(pc.ONE_MILLION)
occupancy = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(obj.elapsed_ms) / obj.number_platform_cpus
obj._data[g] = occupancy
if obj.debug:
collectd.info('%s %s elapsed = %.1f ms, cputime = %.1f ms, '
'n_cpus = %d, occupancy = %.2f %%'
% (PLUGIN_DEBUG,
g,
obj.elapsed_ms,
cputime_ms,
obj.number_platform_cpus,
occupancy))
# Calculate cgroup based occupancy for first-level groupings
for g in cpuacct[pc.GROUP_FIRST]:
cputime_ms = \
float(cpuacct[pc.GROUP_FIRST][g]) / float(pc.ONE_MILLION)
occupancy = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(obj.elapsed_ms) / obj.number_platform_cpus
obj._data[g] = occupancy
# Calculate cgroup based occupancy for cgroups within
# system.slice and user.slice, keeping the hirunners
# exceeding minimum threshold.
occ = {}
for g in cpuacct[pc.CGROUP_SYSTEM]:
cputime_ms = \
float(cpuacct[pc.CGROUP_SYSTEM][g]) / float(pc.ONE_MILLION)
occupancy = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(obj.elapsed_ms) / obj.number_platform_cpus
obj._data[g] = occupancy
if occupancy >= HIRUNNER_MINIMUM_CPU_PERCENT:
occ[g] = occupancy
for g in cpuacct[pc.CGROUP_USER]:
cputime_ms = \
float(cpuacct[pc.CGROUP_USER][g]) / float(pc.ONE_MILLION)
occupancy = float(pc.ONE_HUNDRED) * float(cputime_ms) \
/ float(obj.elapsed_ms) / obj.number_platform_cpus
obj._data[g] = occupancy
if occupancy >= HIRUNNER_MINIMUM_CPU_PERCENT:
occ[g] = occupancy
occs = ', '.join(
'{}: {:.1f}'.format(k.split('.', 1)[0], v) for k, v in sorted(
occ.items(), key=lambda t: -float(t[1]))
)
collectd.info('%s %s: %.1f%%; cpus: %d, (%s)'
% (PLUGIN,
'Base usage',
obj._data[pc.GROUP_BASE],
obj.number_platform_cpus,
occs))
# Calculate average cpu occupancy for dispatch interval
if obj.dispatch:
prefix = 'dispatch'
calculate_occupancy(
prefix, obj.hires, obj.dispatch,
obj._cache,
obj.d_t0, t1,
obj.d_w0, w1,
obj.d_t0_cpuacct, t1_cpuacct,
obj.d_t0_cpuwait, t1_cpuwait,
obj.d_occ, obj.d_occw,
obj.d_elapsed_ms,
obj.number_platform_cpus,
obj.cpu_list,
obj.debug)
# Update t0 state for the next sample collection
obj._t0 = copy.deepcopy(t1)
obj._w0 = copy.deepcopy(w1)
obj._t0_cpuacct = copy.deepcopy(t1_cpuacct)
obj._t0_cpuwait = copy.deepcopy(t1_cpuwait)
if obj.dispatch:
obj.d_t0 = copy.deepcopy(t1)
obj.d_w0 = copy.deepcopy(w1)
obj.d_t0_cpuacct = copy.deepcopy(t1_cpuacct)
obj.d_t0_cpuwait = copy.deepcopy(t1_cpuwait)
def config_func(config):
@ -510,9 +917,11 @@ def config_func(config):
obj.debug = pc.convert2boolean(val)
elif key == 'verbose':
obj.verbose = pc.convert2boolean(val)
elif key == 'hires':
obj.hires = pc.convert2boolean(val)
collectd.info('%s debug=%s, verbose=%s'
% (PLUGIN, obj.debug, obj.verbose))
collectd.info('%s debug=%s, verbose=%s, hires=%s'
% (PLUGIN, obj.debug, obj.verbose, obj.hires))
return pc.PLUGIN_PASS
@ -598,55 +1007,41 @@ def read_func():
collectd.info('%s no cpus to monitor' % PLUGIN)
return pc.PLUGIN_PASS
# Gather current cputime state information, and calculate occupancy since
# this routine was last run.
# Gather current cputime state information, and calculate occupancy
# since this routine was last run.
update_cpu_data()
# Prevent dispatching measurements at plugin startup
if obj.elapsed_ms <= 1000.0:
if obj.elapsed_ms <= 500.0:
return pc.PLUGIN_PASS
if obj.verbose:
collectd.info('%s Usage: %.1f%% (avg per cpu); '
'cpus: %d, Platform: %.1f%% '
'(Base: %.1f, k8s-system: %.1f), k8s-addon: %.1f, '
'%s: %.1f, %s: %.1f'
% (PLUGIN, obj._data[PLATFORM_CPU_PERCENT],
obj.number_platform_cpus,
obj._data[pc.GROUP_PLATFORM],
obj._data[pc.GROUP_BASE],
obj._data[pc.GROUP_K8S_SYSTEM],
obj._data[pc.GROUP_K8S_ADDON],
pc.GROUP_CONTAINERS,
obj._data[pc.GROUP_CONTAINERS],
pc.GROUP_OVERHEAD,
obj._data[pc.GROUP_OVERHEAD]))
# Fault insertion code to assist in regression UT
#
# if os.path.exists('/var/run/fit/cpu_data'):
# with open('/var/run/fit/cpu_data', 'r') as infile:
# for line in infile:
# obj._data[PLATFORM_CPU_PERCENT] = float(line)
# obj._occ[PLATFORM_CPU_PERCENT] = float(line)
# collectd.info("%s using FIT data:%.2f" %
# (PLUGIN, obj._data[PLATFORM_CPU_PERCENT] ))
# (PLUGIN, obj._occ[PLATFORM_CPU_PERCENT] ))
# break
# Dispatch overall platform cpu usage percent value
val = collectd.Values(host=obj.hostname)
val.plugin = 'cpu'
val.type = 'percent'
val.type_instance = 'used'
val.dispatch(values=[obj._data[PLATFORM_CPU_PERCENT]])
if obj.dispatch:
# Dispatch overall platform cpu usage percent value
val = collectd.Values(host=obj.hostname)
val.plugin = 'cpu'
val.type = 'percent'
val.type_instance = 'used'
val.dispatch(values=[obj.d_occ[PLATFORM_CPU_PERCENT]])
# Dispatch grouped platform cpu usage values
val = collectd.Values(host=obj.hostname)
val.plugin = 'cpu'
val.type = 'percent'
val.type_instance = 'occupancy'
for g in pc.OVERALL_GROUPS:
val.plugin_instance = g
val.dispatch(values=[obj._data[g]])
# Dispatch grouped platform cpu usage values
val = collectd.Values(host=obj.hostname)
val.plugin = 'cpu'
val.type = 'percent'
val.type_instance = 'occupancy'
for g in pc.OVERALL_GROUPS:
val.plugin_instance = g
val.dispatch(values=[obj.d_occ[g]])
obj.dispatch = False
# Calculate overhead cost of gathering metrics
if obj.debug:
@ -661,4 +1056,4 @@ def read_func():
# Register the config, init and read functions
collectd.register_config(config_func)
collectd.register_init(init_func)
collectd.register_read(read_func)
collectd.register_read(read_func, interval=PLUGIN_HIRES_INTERVAL)

View File

@ -1,5 +1,5 @@
#
# Copyright (c) 2018-2022 Wind River Systems, Inc.
# Copyright (c) 2018-2024 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
@ -618,22 +618,23 @@ def output_top_10_pids(pid_dict, message):
"""Outputs the top 10 pids with the formatted message.
Args:
pid_dict: Dict The Dictionary of PIDs with Name and RSS
message: Formatted String, the template message to be output.
pid_dict: dictionary {pid: {'name': name, 'rss': value}}
message: Formatted String, template output message
"""
# Check that pid_dict has values
if not pid_dict:
return
proc = []
# Sort the dict based on Rss value from highest to lowest.
sorted_pid_dict = sorted(pid_dict.items(), key=lambda x: x[1]['rss'],
reverse=True)
# Convert sorted_pid_dict into a list
[proc.append((i[1].get('name'), format_iec(i[1].get('rss')))) for i in
sorted_pid_dict]
# Output top 10 entries of the list
collectd.info(message % (str(proc[:10])))
# Output top 10 RSS usage entries
mems = ', '.join(
'{}: {}'.format(
v.get('name', '-'),
format_iec(v.get('rss', 0.0))) for k, v in sorted(
pid_dict.items(),
key=lambda t: -float(t[1]['rss']))[:10]
)
collectd.info(message % (mems))
def config_func(config):
@ -777,10 +778,10 @@ def read_func():
# K8S platform addons usage, i.e., non-essential: monitor, openstack
if pod.namespace in pc.K8S_NAMESPACE_ADDON:
memory[pc.GROUP_OVERALL][pc.GROUP_K8S_ADDON] += MiB
# Limit output to every 5 minutes and after 29 seconds to avoid duplication
if datetime.datetime.now().minute % 5 == 0 and datetime.datetime.now(
).second > 29:
# Get per-process and per-pod RSS memory every 5 minutes
now = datetime.datetime.now()
if now.minute % 5 == 0 and now.second > 29:
# Populate the memory per process dictionary to output results
pids = get_platform_memory_per_process()
@ -795,13 +796,21 @@ def read_func():
for uid in group_pods:
if uid in obj._cache:
pod = obj._cache[uid]
# Ensure pods outside of Kube-System and Kube-Addon are only logged every 30 min
if datetime.datetime.now().minute % 30 == 0 and datetime.datetime.now().second > 29:
collectd.info(f'The pod:{pod.name} running in namespace:{pod.namespace} '
f'has the following processes{group_pods[uid]}')
# Log detailed memory usage of all pods every 30 minutes
if now.minute % 30 == 0 and now.second > 29:
mems = ', '.join(
'{}({}): {}'.format(
v.get('name', '-'),
k,
format_iec(v.get('rss', 0.0))) for k, v in sorted(
group_pods[uid].items(),
key=lambda t: -float(t[1]['rss']))
)
collectd.info(f'memory usage: Pod: {pod.name}, '
f'Namespace: {pod.namespace}, '
f'pids: {mems}')
else:
collectd.warning('%s: uid %s for pod %s not found in namespace %s' % (
PLUGIN, uid, pod.name, pod.namespace))
collectd.warning('%s: uid %s for pod not found' % (PLUGIN, uid))
continue
# K8S platform system usage, i.e., essential: kube-system
@ -815,16 +824,16 @@ def read_func():
for key in group_pods[uid]:
k8s_addon[key] = group_pods[uid][key]
message = 'The top 10 memory rss processes for the platform are : %s'
message = 'Top 10 memory usage pids: platform: %s'
output_top_10_pids(platform, message)
message = 'The top 10 memory rss processes for the Kubernetes System are :%s'
message = 'Top 10 memory usage pids: Kubernetes System: %s'
output_top_10_pids(k8s_system, message)
message = 'The top 10 memory rss processes Kubernetes Addon are :%s'
message = 'Top 10 memory usage pids: Kubernetes Addon: %s'
output_top_10_pids(k8s_addon, message)
message = 'The top 10 memory rss processes overall are :%s'
message = 'Top 10 memory usage pids: overall: %s'
output_top_10_pids(overall, message)
# Calculate base memory usage (i.e., normal memory, exclude K8S and VMs)

View File

@ -1,7 +1,7 @@
#
# SPDX-License-Identifier: Apache-2.0
#
# Copyright (C) 2019 Intel Corporation
# Copyright (C) 2019-2024 Intel Corporation
#
############################################################################
#
@ -741,7 +741,7 @@ def parse_ovs_appctl_bond_list(buf):
buf = buf.strip().split("\n")
result = {}
for idx, line in enumerate(buf):
if idx is 0:
if idx == 0:
continue
line = line.strip()
@ -837,7 +837,7 @@ def compare_interfaces(interfaces1, interfaces2):
len1 = len(set1 - set2)
len2 = len(set2 - set1)
if len1 is 0 and len2 is 0:
if len1 == 0 and len2 == 0:
return True
else:
return False

View File

@ -1,5 +1,5 @@
#
# Copyright (c) 2019-2022 Wind River Systems, Inc.
# Copyright (c) 2019-2024 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
@ -40,6 +40,7 @@ MIN_AUDITS_B4_FIRST_QUERY = 2
K8S_MODULE_MAJOR_VERSION = int(K8S_MODULE_VERSION.split('.')[0])
KUBELET_CONF = '/etc/kubernetes/kubelet.conf'
SSL_TLS_SUPPRESS = True
K8S_TIMEOUT = 2
# Standard units' conversion parameters (mebi, kibi)
# Reference: https://en.wikipedia.org/wiki/Binary_prefix
@ -83,9 +84,11 @@ GROUPS_AGGREGATED = [GROUP_PLATFORM, GROUP_BASE, GROUP_K8S_SYSTEM,
GROUP_K8S_ADDON, GROUP_CONTAINERS]
# First level cgroups -- these are the groups we know about
CGROUP_INIT = 'init.scope'
CGROUP_SYSTEM = 'system.slice'
CGROUP_USER = 'user.slice'
CGROUP_MACHINE = 'machine.slice'
CGROUP_K8SPLATFORM = 'k8splatform.slice'
CGROUP_DOCKER = 'docker'
CGROUP_K8S = K8S_ROOT
@ -98,7 +101,8 @@ CONTAINERS_CGROUPS = [CGROUP_SYSTEM_CONTAINERD, CGROUP_SYSTEM_DOCKER,
CGROUP_SYSTEM_KUBELET, CGROUP_SYSTEM_ETCD]
# Groupings by first level cgroup
BASE_GROUPS = [CGROUP_DOCKER, CGROUP_SYSTEM, CGROUP_USER]
BASE_GROUPS = [CGROUP_INIT, CGROUP_DOCKER, CGROUP_SYSTEM, CGROUP_USER,
CGROUP_K8SPLATFORM]
BASE_GROUPS_EXCLUDE = [CGROUP_K8S, CGROUP_MACHINE]
# Groupings of pods by kubernetes namespace
@ -750,18 +754,28 @@ class K8sClient(object):
# Debian
# kubectl --kubeconfig KUBELET_CONF get pods --all-namespaces \
# --selector spec.nodeName=the_host -o json
kube_results = subprocess.check_output(
['kubectl', '--kubeconfig', KUBELET_CONF,
'--field-selector', field_selector,
'get', 'pods', '--all-namespaces',
'-o', 'json'
]).decode()
json_results = json.loads(kube_results)
try:
kube_results = subprocess.check_output(
['kubectl', '--kubeconfig', KUBELET_CONF,
'--field-selector', field_selector,
'get', 'pods', '--all-namespaces',
'-o', 'json',
], timeout=K8S_TIMEOUT).decode()
json_results = json.loads(kube_results)
except subprocess.TimeoutExpired:
collectd.error('kube_get_local_pods: Timeout')
return []
except json.JSONDecodeError as e:
collectd.error('kube_get_local_pods: Could not parse json output, error=%s' % (str(e)))
return []
except subprocess.CalledProcessError as e:
collectd.error('kube_get_local_pods: Could not get pods, error=%s' % (str(e)))
return []
# convert the items to: kubernetes.client.V1Pod
api_items = [self._as_kube_pod(x) for x in json_results['items']]
return api_items
except Exception as err:
collectd.error("kube_get_local_pods: %s" % (err))
collectd.error("kube_get_local_pods: error=%s" % (str(err)))
raise
@ -783,7 +797,8 @@ class POD_object:
"""Check whether pod contains platform namespace or platform label"""
if (self.namespace in K8S_NAMESPACE_SYSTEM
or self.labels.get(PLATFORM_LABEL_KEY) == GROUP_PLATFORM):
or (self.labels is not None and
self.labels.get(PLATFORM_LABEL_KEY) == GROUP_PLATFORM)):
return True
return False

View File

@ -5,6 +5,7 @@ LoadPlugin python
<Module "cpu">
debug = false
verbose = true
hires = false
</Module>
Import "memory"
<Module "memory">
@ -21,5 +22,4 @@ LoadPlugin python
Import "remotels"
Import "service_res"
LogTraces = true
Encoding "utf-8"
</Plugin>

View File

@ -1,3 +1,10 @@
monitor-tools (1.0-2) unstable; urgency=medium
* Update schedtop to display cgroups from systemd services and Kubernetes pods
* Add watchpids to find created processes, typically short-lived
-- Jim Gauld <James.Gauld@windriver.com> Thu, 12 Sep 2024 09:54:55 -0400
monitor-tools (1.0-1) unstable; urgency=medium
* Initial release.

View File

@ -13,4 +13,5 @@ Description: Monitor tools package
This package contains data collection tools to monitor host performance.
Tools are general purpose engineering and debugging related.
Includes overall memory, cpu occupancy, per-task cpu,
per-task scheduling, per-task io.
per-task scheduling, per-task io, newly created short-lived-processes,
local port scanning.

View File

@ -5,7 +5,7 @@ Source: https://opendev.org/starlingx/utilities
Files: *
Copyright:
(c) 2013-2021 Wind River Systems, Inc
(c) 2013-2024 Wind River Systems, Inc
(c) Others (See individual files for more details)
License: Apache-2
Licensed under the Apache License, Version 2.0 (the "License");
@ -26,7 +26,7 @@ License: Apache-2
# If you want to use GPL v2 or later for the /debian/* files use
# the following clauses, or change it to suit. Delete these two lines
Files: debian/*
Copyright: 2021 Wind River Systems, Inc
Copyright: 2024 Wind River Systems, Inc
License: Apache-2
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

View File

@ -10,5 +10,8 @@ override_dh_install:
install -p memtop $(ROOT)/usr/bin
install -p schedtop $(ROOT)/usr/bin
install -p occtop $(ROOT)/usr/bin
install -p k8smetrics $(ROOT)/usr/bin
install -p portscanner $(ROOT)/usr/bin
install -p watchpids $(ROOT)/usr/bin
dh_install

View File

@ -1,6 +1,6 @@
---
debname: monitor-tools
debver: 1.0-1
debver: 1.0-2
src_path: scripts
revision:
dist: $STX_DIST

monitor-tools/scripts/k8smetrics (new executable file, 292 lines)
View File

@ -0,0 +1,292 @@
#!/usr/bin/env python
########################################################################
#
# Copyright (c) 2024 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
########################################################################
#
# Calculate Kubernetes latency percentile metrics (50%, 95, and 99%) for
# etcdserver and kube-apiserver. This is based on Prometheus format raw
# metrics histograms within kube-apiserver.
#
# This obtains current Kubernetes raw metrics cumulative counters,
# (e.g., kubectl get --raw /metrics). The counters represent cumulative
# frequency of delays <= value. This calculates the delta from previous,
# and does percentile calculation.
#
# Example:
# kubectl get --raw /metrics
#
# To see API calls:
# kubectl get --raw /metrics -v 6
#
# This does minimal parsing and aggregation to yield equivalent of the
# following Prometheus PromQL queries using data over a time-window:
# histogram_quantile(0.95, sum(rate(etcd_request_duration_seconds_bucket[5m])) by (le))