monitoring

Author	SHA1	Message	Date
Leonardo Fagundes Luz Serrano	462750e14b	Add debian package for monitor-tools Add debian packaging infrastructure for monitor-tools to build a debian package. Test Plan: build pkg; build image; compare with RPM PASS pkg builds PASS image builds PASS same contents and permissions as RPM Story: 2009101 Task: 43960 Signed-off-by: Leonardo Fagundes Luz Serrano <Leonardo.FagundesLuzSerrano@windriver.com> Change-Id: I2ba30a627cf2c64c88a3d0586d97fcafe117e669	2022-01-24 18:55:28 +00:00
Zuul	821aff9947	Merge "Re-enable important py3k checks for monitoring"	2021-11-03 14:02:30 +00:00
Zuul	3ff8c48cc3	Merge "Re-enable important py3k checks for monitoring kube-memory"	2021-10-28 14:29:49 +00:00
Bernardo Decco	7a028a29c0	Re-enable important py3k checks for monitoring Re-enabling some of the disabled tox warnings present on the pylint.rc file Re-enabling: W1638: range-builtin-not-iterating W1636: map-builtin-not-iterating Test Plan: Sanity test run on AIO-SX: PASS: test_system_health_pre_session[pods] PASS: test_system_health_pre_session[alarms] PASS: test_system_health_pre_session[system_apps] PASS: test_wr_analytics[deploy_and_remove] PASS: test_horizon_host_inventory_display PASS: test_lock_unlock_host[controller] PASS: test_pod_to_pod_connection PASS: test_pod_to_service_connection PASS: test_host_to_service_connection Story: 2006796 Task: 43443 Signed-off-by: Bernardo Decco <bernardo.deccodesiqueira@windriver.com> Change-Id: I6c13ae171ee4a41377dad55ed3c519ee710b4d88	2021-10-21 12:34:55 +00:00
Bernardo Decco	a469d8ad9b	Re-enable important py3k checks for monitoring kube-memory Re-enabling some of the disabled tox warnings present on the pylint.rc file Re-enabling: W1619: old-division W1633: round-builtin Test Plan: Sanity test run on AIO-SX: PASS: test_system_health_pre_session[pods] PASS: test_system_health_pre_session[alarms] PASS: test_system_health_pre_session[system_apps] PASS: test_wr_analytics[deploy_and_remove] PASS: test_horizon_host_inventory_display PASS: test_lock_unlock_host[controller] PASS: test_pod_to_pod_connection PASS: test_pod_to_service_connection PASS: test_host_to_service_connection Story: 2006796 Task: 43444 Signed-off-by: Bernardo Decco <bernardo.deccodesiqueira@windriver.com> Change-Id: I00dc37bbd8f60f475f85e4f0463b7c066a719f1f	2021-10-21 12:34:43 +00:00
Zuul	ea8ab4adf6	Merge "Add flux-helm to list of platform namespaces."	2021-10-08 20:16:07 +00:00
Tracey Bogue	b6096a6f98	Add flux-helm to list of platform namespaces. Story: 2009138 Task: 43078 Signed-off-by: Tracey Bogue <tracey.bogue@windriver.com> Change-Id: I24071faa51d90276b5b5787310ea7132e18cdb05	2021-10-04 08:29:30 -05:00
Bernardo Decco	e325886708	Removing py36 gates from zuul for monitoring Removing redundant py36 Zuul jobs since we now have py39 Zuul jobs in place with the debian nodeset Story: 2006796 Task: 43489 Signed-off-by: Bernardo Decco <bernardo.deccodesiqueira@windriver.com> Change-Id: I3e6fe3a146b3ac01218eb1428c3bad35b87c5c9c	2021-09-30 10:17:52 -03:00
Zuul	e5bdd73802	Merge "Add pylint py3 portability checks for the monitoring/kube-memory repo"	2021-09-15 21:10:17 +00:00
Fabricio Henrique Ramos	95715f15a4	Add pylint py3 portability checks for the monitoring/kube-memory repo A lot of work has gone into making sure that StarlingX is python3 compatible. To ensure future compatibility, enable the python3 portability checks. Disable the checks that are raising errors. Another set of commits will address the offending code. Add the following warning suppressions to pylint.rc: - W1618: no-absolute-import - W1619: old-division - W1633: round-builtin Story: 2006796 Task: 43134 Signed-off-by: Fabricio Henrique Ramos <fabriciohenrique.ramos@windriver.com> Change-Id: Ib3c97263d34328f6ffc27ef08690d23325654b42	2021-09-13 09:55:08 +00:00
Fabricio Henrique Ramos	95f00ae668	Add pylint py3 portability checks for the monitoring/kube-cpusets repo A lot of work has gone into making sure that StarlingX is python3 compatible. To ensure future compatibility, enable the python3 portability checks. Disable the checks that are raising errors. Another set of commits will address the offending code. Add the following warning suppressions to pylint.rc: - W1618: no-absolute-import - W1636: map-builtin-not-iterating - W1638: range-builtin-not-iterating Story: 2006796 Task: 43135 Signed-off-by: Fabricio Henrique Ramos <fabriciohenrique.ramos@windriver.com> Change-Id: I6ebe9c7215f1e4622a81b0dd79b36cfcc6a7d86f	2021-09-13 09:54:54 +00:00
Zuul	f8648f0b41	Merge "py3: Add support for python 3.9"	2021-08-31 13:45:14 +00:00
Charles Short	5b62a25ca4	py3: Add support for python 3.9 Enable python 3.9 in tox and zuul gate. Story: 2009101 Task: 43104 Signed-off-by: Charles Short <charles.short@windriver.com> Change-Id: I3ebb23574ad34a1078fae3fbbc65ce5457d46c69	2021-08-27 11:42:44 -04:00
Eric MacDonald	fcc8ddda66	Change platform memory usage instance type to 'memory' The platform memory data-set type is currently set to 'percent'. It is possible to oversubscribe platform memory usage to more than 100%. Collectd drops sample values that are greater than 100 when its data-set type is 'percent'. Collectd considers a percent value greater than 100 to be an invalid value. This update changes the data-set type for platform memory usage from 'percent' to 'memory' to allow memory usage values greater than 100 to be handled. Test Plan: PASS: Verify that platform memory overage alarm value is reported as the 'actual' value in the alarm Reason Text. PASS: Verify platform memory usage values that exceed the major threshold are alarmed 'major'. PASS: Verify platform memory usage values that exceed the critical threshold are alarmed 'critical', even if the debounced value exceeds 100. PASS: Verify ridiculously large values are still alarmed and that value is still included in the alarm Reason Text. Change-Id: I7189671e20c92656f820fda74c4871504d89e73a Closes-Bug: 1940875 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2021-08-23 20:05:20 -04:00
John Kung	ecd744ba0a	Handle kube ApiException during collectd platform monitoring During stress test/high platform load it is possible that the kube-apiserver responds with a kube ApiException. As platform monitoring of cpu and memory should not be affected by an unresponsive kube-apiserver, allow the kube ApiException to be handled and the remaining platform resource utilization monitoring to proceed. This could help identify the issue by allowing the raise of the platform alarm (e.g. 100.101 Platform CPU threshold exceeded, 100.103 Memory threshold exceeded). Verified: o Platform CPU Alarm is raised with stress test o Platform CPU Alarm is raised with stress test and intermittent ApiException o Memory Alarm is raised with stress test o Memory Alarm is raised with stress test and intermittent ApiException o the above alarm conditions are cleared after debounce when stress condition is removed Closes-Bug: 1939172 Signed-off-by: John Kung <john.kung@windriver.com> Change-Id: I2c9c39a390af1d7ae752ad00db18384479cf6e99	2021-08-11 08:00:41 -05:00
Andrei Grosu	aa8665cebf	Fix startup issues for collectd. - Use encodeutils from the oslo library to handle string encodings. - Expand the generator into a list. - Use python3 iterator __next__(). Note: there needs to be a separate task to remove the Encoding parameter from python_plugins.conf which can be cherry-picked only for python3 deployments. Story: 2008454 Task: 42647 Depends-On: Iaa7bd0cadd3b1d097b276dcc37ebceaeb208a6a5 Signed-off-by: Andrei Grosu <andrei.grosu@windriver.com> Change-Id: I58cd4829806e98b1e15471ce97a7c7ba6a2fe135 (cherry picked from commit f2e5263206c87dde9b602c3660b8191398d9c555)	2021-07-27 08:46:39 -04:00
Fernando Theirs	27db764e67	Remove InfluxDB InfluxDB was not fully productized, nor is it used by other end-users. It should therefore be removed from all deployments to avoid it consuming unnecessary resources (cpu, memory and storage). Parts of the system's dependencies on InfluxDB were removed here. Story: 2009018 Task: 42761 Depends-On: https://review.opendev.org/799502 Signed-off-by: Fernando Theirs <Fernando.Theirs@windriver.com> Change-Id: I85acf8a94e54171162b9be6fbf816532cf602831	2021-07-15 12:02:38 -03:00
Zuul	80585f539d	Merge "Fix zuul errors due to changes in dependencies"	2021-06-21 15:20:33 +00:00
Zuul	1900c2fdfa	Merge "Better repair action for alarm 100.104"	2021-05-31 12:12:54 +00:00
Jerry Sun	b425fe849a	Better repair action for alarm 100.104 This commit adds a better proposed repair action for filesystem threshold alarm 100.104. Closes-Bug: 1927155 Signed-off-by: Jerry Sun <jerry.sun@windriver.com> Change-Id: I1a27d4bc438b98c00d0fe4eb3b30e4672552f90a	2021-05-28 12:51:07 -04:00
Zuul	ffafbeae6a	Merge "Add kube-memory tool to summarize memory usage"	2021-05-25 17:32:57 +00:00
Enzo Candotti	36c8ae8395	Add kube-memory tool to summarize memory usage This tool gathers memory usage information for all kubernetes containers and system services displayed in cgroup memory that are running on the current host. This displays the total resident set size per namespace and container, the aggregate memory usage per system service, and the platform memory usage. Closes-Bug: 1886868 Signed-off-by: Enzo Candotti <enzo.candotti@windriver.com> Change-Id: Id130ed0d2794cdd555bdb068e8453cb8e9bd29d2	2021-05-22 18:51:26 -03:00
Takamasa Takenaka	2ef5451f44	Format 2-line ntpq data into 1 line The problem was that the logic expected one line of data per ntpq result, but each ntp server entry spanned 2 lines. When a peer server is selected, the script checked whether the refid is reliable, but it could not find the refid because it is on the following line. This fix formats the 2 lines of data into 1 line. The minor alarm "NTP cannot reach external time source; syncing with peer controller only" is removed because NTP does not prioritize an external time source over the peer. Closes-Bug: 1889101 Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com> Change-Id: Icc8316bb1a7041bf0351165c671ebf35b97fa3bc	2021-04-29 10:38:05 -03:00
Charles Short	6a9358c261	Fix zuul errors due to changes in dependencies Pin hacking to < 4.0.1 to fix zuul gate issues. Test: Ran tox -e flake8 command to validate the flake8 job and result. Related-Bug: 1926172 Signed-off-by: Charles Short <charles.short@windriver.com> Change-Id: Ia2e746ba513c0d073b60e76b2d2afdfe8b6c9745	2021-04-26 11:45:02 -04:00
Eric MacDonald	d37490b814	Add alarm audit to starlingx collectd fm notifier plugin This update adds common plugin support for alarm state auditing. The audit is able to detect and correct the following alarm state errors (Error Case ; Correction Action): - stale alarm ; delete alarm - missing alarm ; assert alarm - alarm severity mismatch ; refresh alarm The common audit is enabled for the fm_notifier plugin that supports alarm management for the following resources. - CPU with alarm id 100.101 - Memory with alarm id 100.103 - Filesystem with alarm id 100.104 Other plugins may use this common audit in the future but only the above resources have the audit enabled for them by this update. Test Plan: PASS: Verify stale alarm detection/correction handling PASS: Verify missing alarm detection/correction handling PASS: Verify alarm severity mismatch detection/correction handling PASS: Verify host only audits its own specified alarms PASS: Verify success path of monitoring a single and mix of base and instance alarms of varying severity while such alarm conditions come and go PASS: Verify alarm audit of mix of base and instance alarms over a collectd process restart PASS: Verify audit handling of alarm that migrates from major to critical to major to clear PASS: Verify audit handling transition between alarm and no alarm conditions PASS: Verify soak of random cpu, memory and filesystem overage alarm assertions and clears that also involve manual alarm deletions, assertions and severity changes that exercise new audit features Regression: PASS: Verify alarm and audit handling over Swact with mounted filesystem that has active alarm PASS: Verify collectd logs following a system install and while alarms are managed during above soak PASS: Verify behavior while FM is killed or stopped/started PASS: Verify Standard system install with Sanity and Regression PASS: Verify AIO DX/DC systems install with Sanity and Regression Closes-Bug: 1925210 Change-Id: I1cafd17ad07ec769240de92ae4e67cb1357f0992 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2021-04-20 11:48:51 -04:00
Zuul	3628db6e77	Merge "Bandit should only be installed in py3 env" vr/stx.5.0	2021-04-12 18:04:21 +00:00
albailey	14e1a9a82b	Bandit should only be installed in py3 env Running tox for linters fails since the bandit being pulled in is python3 only. This is similar to other bugs where a new version is released which drops py2 support. In this env, we only include bandit if we are testing and running in py3. Partial-Bug: 1922590 Change-Id: I11b7d974ae3b64e7846e1420521dee0d48128fc5 Signed-off-by: albailey <Al.Bailey@windriver.com>	2021-04-07 17:53:55 -04:00
Gerry Kopec	19460ecbd2	Add platform namespaces to collectd Add missing platform namespaces (armada, cert-manager, portieris, vault and notification) to collectd kubernetes system list. Change-Id: I341d802210388e5e1f3fd2d7a11fa0593c44fa68 Closes-Bug: 1922629 Signed-off-by: Gerry Kopec <gerry.kopec@windriver.com>	2021-04-06 22:23:39 -04:00
Eric MacDonald	a2a2a88887	Avoid loading collectd's default plugins The current opensource collectd rpm installs several default plugins, some that overlap starlingx developed plugins and others that simply collect way too much data. The plugins in question are: /etc/collectd.d/90-default-plugins-syslog.conf /etc/collectd.d/90-default-plugins-memory.conf /etc/collectd.d/90-default-plugins-load.conf /etc/collectd.d/90-default-plugins-interface.conf /etc/collectd.d/90-default-plugins-cpu.conf This update moves the value added starlingx plugins to /etc/collectd.d/starlingx and relies on another puppet update to change collectd's plugin search path accordingly. Test Plan: PASS: Verify default plugins are not loaded and their samples are not collected. PASS: Verify patch apply and remove. Note: this is a reboot-required patch PASS: Verify the daily influxdb usage drops by 80-85%. Regression: PASS: Verify collectd alarm/degrade regression soak Change-Id: Ic7884ae69014fa274f0bd0515adec90b08747c67 Closes-Bug: 1905581 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2021-02-03 12:03:06 -05:00
Eric MacDonald	ea4b515f91	Add node ready check to collectd plugins This update adds a second collectd plugin initialization enhancement. First update added a config complete gate: https://review.opendev.org/c/starlingx/monitoring/+/736817 Turns out that not all plugins are ready to sample immediately following the config complete state. One example is FM on the active controller needs time to get going before plugins can query their alarms on startup. Also, some plugins need more time than others. To account for both cases this update adds a thresholded node ready gate that can be tailored to a plugin to hold off fm access and sampling until its ready threshold is reached. Test Plan: PASS: Verify AIO SX and DX system install PROG: Verify Storage system install PASS: Verify AIO SX node lock and unlock PASS: Verify AIO Standby controller lock and unlock PASS: Verify Standard controller lock and unlock PASS: Verify Compute and Storage node lock and unlock PASS: Verify Dead-Office-Recovery (AIO DX) PASS: Verify collectd sampling and logs Partial-Bug: 1872979 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com> Change-Id: I044d812542a4222214c7d13e231ac4024cca9800	2021-01-26 12:00:16 -05:00
Zuul	b66c85287d	Merge "Increase field widths of PID for schedtop"	2021-01-04 19:57:14 +00:00
Don Penney	3809c69d81	Add auto-version for remaining stx/monitoring packages Update remaining StarlingX packages with hardcoded TIS_PATCH_VER to use PKG_GITREVCOUNT where possible, with offsets as needed to ensure the version is incremented above the hardcoded version. Story: 2008455 Task: 41463 Signed-off-by: Don Penney <don.penney@windriver.com> Change-Id: If41d630c97354014b12424ed305d6c5cbb022a5a	2020-12-17 13:25:29 -05:00
Zuul	1e951176df	Merge "Fix memory instance handling over collectd process restart"	2020-12-10 14:23:24 +00:00
Carmen Rata	81b7727a2e	Fix influxdb log file permissions Update /var/log/influxdb/influxd.log permissions to 640 from 644 to disallow world readable but at the same time to allow group read access. The changes are made to comply as much as possible with openscap rules security requirements. Verified that installation is successful for AIO-SX and Standard 2+2 system configurations. Story: 2008037 Task: 40694 Signed-off-by: Carmen Rata <carmen.rata@windriver.com> Change-Id: I284fc6882043b4a4d271bd5963fca94bc7a1e390	2020-12-02 13:08:03 -05:00
Eric MacDonald	c6cab97ee0	Fix memory instance handling over collectd process restart With a critical memory alarm raised, the collectd plugin fault notifier's degrade list is injected with the reporting plugin's name over a collectd process restart. The recent introduction of multiple instance-based memory alarms has exposed a limitation in the management and content of the degrade list that can lead to both stuck degrade (this case) as well as missing degrade due to the lack of uniqueness of the content injected into the degrade list based on degradable events. This update modifies the content of the degrade list to ensure all entries are unique by using an alarm's entity id rather than the more generic plugin name. An additional issue was identified with respect to how filesystem usage overage alarms are managed, due to recent additions to the list of monitored filesystems. Filesystem overage alarms are also degrade list candidates so the aforementioned degrade list change needed to account for filesystems as well. One recently added monitored filesystem name conflicted with how filesystem instances were tracked, which led to a bouncing alarm if that filesystem experienced overage. Given that there was already a special case handling for the root fs, rather than add an additional special case to remedy this issue, the method of mapping filesystem-instance to mountpoint was changed from a list to a dictionary. With that change there is no longer a limitation or special case handling required for filesystem mountpoints that conflicted with how the stock collectd plugin reports filesystem instances. Test Plan: PASS: Verify existing alarm and degrade management of instance and non-instance based alarms during both normal runtime as well as over a collectd process restart. PASS: Verify handling of non-instance based alarm(s) over process restart when the alarm condition no longer exists following the process restart. PASS: Verify degrade list management and content. PASS: Verify filesystem instance to mountpoint mapping. PASS: Verify data model content using state audit and list management with debug options turned on. PASS: Verify alarm and degrade handling of a filesystem and overage that follows the active controller. PASS: Verify update as patch Regression: PASS: Verify alarm and degrade handling of 'all' collectd plugins including over collectd process restarts. PASS: Verify alarm and degrade management stress soak that involved multiple plugins asserting/clearing multiple alarm and degradable conditions over a 24 hour period. Change-Id: I5ea389fb092a6404616d7ea0e8d54daa64ad7ea2 Closes-Bug: 1903731 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2020-11-30 11:30:15 -05:00
albailey	ee7ae99d41	Use newer flake8 on python3.8 zuul systems flake8 2.5.5 fails on ubuntu-focal zuul machines running python3.8 with the following error: AttributeError: 'FlakesChecker' object has no attribute 'CONSTANT' Suppresses the following: W503 line break before binary operator W504 line break after binary operator W605 invalid escape sequence '\d' E117 over-indented E266 too many leading '#' for block comment E305 expected 2 blank lines after class or function definition, found 1 E402 module level import not at top of file E722 do not use bare 'except' E741 ambiguous variable name 'I' F632 use ==/!= to compare constant literals F821 undefined name 'dpdk' (this is a flake8 bug) Change-Id: I6c2ef05d765b57b7be0b038d6e384cb2af589054 Partial-Bug: 1895054 Signed-off-by: albailey <Al.Bailey@windriver.com>	2020-11-05 15:33:28 -06:00
Jim Gauld	23489af038	Increase field widths of PID for schedtop This increases field width of TID, PID, and PPID to 7 wide for schedtop engineering tool. Newer systems support larger PIDs. Change-Id: I706b60d83e8ce341a7d07c4c067a74e7049acdad Closes-Bug: 1902954 Signed-off-by: Jim Gauld <james.gauld@windriver.com>	2020-11-04 17:22:34 -05:00
Sharath Kumar K	8ef034919c	Tox and Zuul job for the bandit code scan in stx/monitoring Setting up the bandit tool for the scanning of HIGH severity issues in the python code under the Starlingx/monitoring folder. This merge is expected to enable the zuul job for CI/CD of the bandit scan. Configuration files: 1. tox.ini for adding bandit environment and command. 2. test-requirements.txt for adding bandit version. 3. .zuul.yaml file for adding bandit job and configuring under check job to run code scan every time before code commit. Test: Run tox -e bandit command inside the fault folder to validate the bandit scan and result. Story: 2007541 Task: 39684 Depends-On: https://review.opendev.org/#/c/721294/ Change-Id: Ibcbe1dd2e380f80c4cbf6f2a7cf49065dc890803 Signed-off-by: Sharath Kumar K <sharath.kumar@intel.com>	2020-07-14 15:48:17 +00:00
Zuul	4d9f256bb5	Merge "Add consistent init and config complete checks to collectd plugins" v4.0.0.rc0	2020-06-30 11:26:21 +00:00
Jim Gauld	1bdd9200bb	collectd cpu plugin does not always initialize This changes the initialization of per cgroup cpuacct timings to account for cgroup directories that may not be present at the time the plugin starts. As an example, the docker cgroup is often created much later or not at all. Change-Id: Iaf279e650cc16966b40c24a9f55f53fa4696a92b Closes-Bug: 1855733 Signed-off-by: Jim Gauld <james.gauld@windriver.com>	2020-06-26 17:01:30 -04:00
Eric MacDonald	63c8d1e55a	Add consistent init and config complete checks to collectd plugins Some of the collectd plugins are not waiting for configuration complete before starting to monitor or communicate with external services such as fm. This leads to the collectd networking plugin being triggered to run before or while the host is being configured which has been seen to lead to collectd segfaults/coredumps within the collectd's internal networking plugin. To solve this issue, reduce startup thrash and a slew of plugin startup error logs, this update adds consistent initialization and configuration complete checks to all of the starlingX plugins so monitoring and external service access is not performed until the host configuration is complete. Test Plan: PASS: Verify no plugin sampling till after config is complete PASS: Verify alarm assert and clear cycle for all plugins PASS: Install AIO SX system install PASS: Install AIO DX system install PEND: Verify Standard system install PASS: Verify logging Change-Id: I90a5d1c8c3be77269a571738c9499b2e908e1fc5 Closes-Bug: 1872979 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2020-06-24 14:59:45 -04:00
Jim Gauld	1a5e6c4c3d	Add kube-cpusets tool to summarize kubernetes container cpusets This tool gathers cpuset usage information for all kubernetes containers that are running on the current host. With kubernetes CPUManager policy: - 'none' -- the k8s-infra cpuset is used for all pods - 'static' -- pods get exclusive cpuset for QoS Guaranteed or using isolcpus, otherwise pods inherit DefaultCPUSet. This displays the cpusets per container and the mapping to numa nodes. This displays the aggregate cpuset usage per system-level groupings (i.e., platform, isolated, guaranteed, default), per numa-node. Story: 2006999 Task: 39579 Change-Id: I7f1b12e2bbcf7d0b1606c1c948c545216ec454c5 Signed-off-by: Jim Gauld <james.gauld@windriver.com>	2020-06-17 13:14:50 -04:00
Eric MacDonald	f7437000c7	Platform Memory usage alarm calculation incorrect This update removes hugepage memory monitoring, sampling and alarming for over usage. Hugepage memory is only used by k8s pods or OpenStack VMs. Therefore its usage and alarming should not be tied to the platform. Change-Id: Iab8104ff56fdd641c058a4fdc587313cbeec9faf Closes-Bug: 1880605 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2020-05-25 16:51:53 -04:00
Thomas Gao	f9688c62f4	Added retry mechanism to clear port alarm When the link state transitions from DOWN to UP, the current collectd process attempts to clear the alarm once and once only. If such attempt failed, no further attempts will be made, and the alarm will persist in fm alarm-list. This fix added an additional check to ensure that as long as port alarm is not cleared and the link state is UP, it will attempt to clear the port alarm. Closes-Bug: 1871453 Change-Id: Iaa65f64808272a5760e655a33c14810df51e28b1 Signed-off-by: Thomas Gao <Thomas.Gao@windriver.com>	2020-05-05 14:52:05 -04:00
Kristine Bujold	58845f67b2	Increase the polling frequency for the ptp audit Increase the polling frequency for the ptp audit from 300 secs to 30 secs. Story: 2006759 Task: 39412 Change-Id: Ib40c02dfdcf19b2d2c66de33da1f04f77be515f0 Signed-off-by: Kristine Bujold <kristine.bujold@windriver.com>	2020-04-15 09:16:29 -04:00
Sharath Kumar K	0b8b39cb4e	De-branding in starlingx/monitoring: Titanium Cloud -> StarlingX 1. Rename Titanium Cloud to StarlingX for .spec files Test: After the de-brand change, bootimage.iso was built in the flock layer and installed on the dev machine to validate the changes. Please note, de-brand changes are done in batches; this is the batch6 change. Story: 2006387 Task: 39276 Change-Id: I0a0c0619530746f7fe2da4d8fc704f9b97a20241 Signed-off-by: Sharath Kumar K <sharath.kumar@intel.com>	2020-04-06 10:33:18 +02:00
Zuul	e09eafffdf	Merge "Throttle collectd OVS interface plugin startup wait log"	2020-03-12 15:52:22 +00:00
Eric MacDonald	4ab570a850	Throttle collectd OVS interface plugin startup wait log This update turns a flooding failure log into a single waiting log while the collectd OVS interface plugin initialization sequence waits for a running ovs daemon. A few pep8 long line warnings are fixed. Test Plan: PASS: Verify plugin behavior of compute system install without Openstack PASS: Verify plugin behavior in AIO-SX with Openstack PASS: Verify plugin failure handling and recovery with Openstack Note: the ovs-vswitch pmon conf file was changed to allow process failure recovery to verify the plugin was able to handle transition from not running/waiting to running once process started. PASS: Verify plugin failure handling when ovs-vswitchd process fails. Note: pmond does not try to recover and the following collectd log is produced every 10 seconds. 'err ovs interface plugin failed to dump ports br-phy1 desc' Change-Id: I95d308f771ebabc77dbeb5113feae283538d37d3 Closes-Bug: 1855597 Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>	2020-03-02 15:00:14 +00:00
Zuul	a9f46a032c	Merge "Support non-zero domains"	2020-02-18 16:36:24 +00:00
David Sullivan	e74efea7fe	Support non-zero domains Update the ptp extension to use the ptp4l conf during pmc commands. This will allow the collectd extension to work when a non-zero domain is specified. Change-Id: Ied0fad0e1ef2998d791619df4e9a548d3d9a3f18 Story: 2006759 Task: 38772 Signed-off-by: David Sullivan <david.sullivan@windriver.com>	2020-02-15 11:55:33 -05:00
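
The three corrections listed in commit d37490b814 (a stale alarm is deleted, a missing alarm is asserted, a severity mismatch is refreshed) amount to comparing two views of the same alarm set. The sketch below illustrates that comparison only; it is not the fm_notifier plugin's code. The `Alarm` type, the `(alarm_id, entity_id)` key and `audit_alarms` are hypothetical names, and the sketch assumes the caller has already filtered both views down to the alarms this host owns.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical key: (alarm_id, entity_id). The real fm_notifier data model
# is not described in the log; this only illustrates the three corrections.
AlarmKey = Tuple[str, str]


@dataclass(frozen=True)
class Alarm:
    alarm_id: str   # e.g. "100.101" (CPU), "100.103" (memory), "100.104" (filesystem)
    entity_id: str  # hypothetical entity identifier string
    severity: str   # e.g. "major" or "critical"


def audit_alarms(expected: Dict[AlarmKey, Alarm],
                 asserted: Dict[AlarmKey, Alarm]) -> List[Tuple[str, Alarm]]:
    """Compare the plugin's view (expected) with what FM holds (asserted).

    Returns (action, alarm) pairs:
      "delete"  - stale alarm: present in FM but no longer expected
      "assert"  - missing alarm: expected but absent from FM
      "refresh" - severity mismatch: present in both, severities differ
    """
    actions: List[Tuple[str, Alarm]] = []
    # Stale: FM still holds an alarm the plugin no longer expects.
    for key, alarm in asserted.items():
        if key not in expected:
            actions.append(("delete", alarm))
    # Missing or mismatched: walk the plugin's expected state.
    for key, alarm in expected.items():
        current = asserted.get(key)
        if current is None:
            actions.append(("assert", alarm))
        elif current.severity != alarm.severity:
            actions.append(("refresh", alarm))
    return actions
```

Returning a list of actions rather than calling FM directly keeps the comparison testable on its own; a caller would then issue the corresponding delete, set or update requests.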

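Commit 2ef5451f44 joins ntpq peer entries that span two lines so the refid can be found on the same line as the remote. A minimal sketch of that joining step follows, assuming `ntpq -p` style output in which a wrapped entry has only the remote name on its first line and the remaining columns on the next; `peer_lines` and `merge_wrapped_entries` are hypothetical names, not the script's actual functions.

```python
from typing import List, Optional


def peer_lines(output: str) -> List[str]:
    """Return the lines after the '====' separator of `ntpq -p` style output."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("="):
            return lines[i + 1:]
    return lines


def merge_wrapped_entries(lines: List[str]) -> List[str]:
    """Join peer entries that ntpq printed over two lines.

    Assumption: a wrapped entry has only the remote name on its first
    line; refid, st, t, when, poll, reach, delay, offset and jitter
    follow on the next line.
    """
    merged: List[str] = []
    pending: Optional[str] = None  # remote name waiting for its data line
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if pending is not None:
            merged.append(pending + " " + line.strip())
            pending = None
        elif len(fields) == 1:
            pending = fields[0]
        else:
            merged.append(line.strip())
    if pending is not None:
        merged.append(pending)  # remote with no data line; keep it visible
    return merged
```

After merging, every entry has the same shape, so the refid is simply `entry.split()[1]` whether or not the original entry was wrapped.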

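Commit c6cab97ee0 makes two data-structure changes: degrade list entries are keyed by an alarm's entity id instead of the plugin name, and the filesystem-instance-to-mountpoint mapping becomes a dictionary. The sketch below illustrates both ideas under stated assumptions. `DegradeList`, `df_instance_name` and `build_instance_map` are hypothetical names; the instance naming rule ('/' becomes 'root', other slashes become '-') is an assumption about the stock collectd df plugin; and the log does not say which filesystem name conflicted, so a mountpoint containing '-' is used purely as an example.

```python
from typing import Dict, Iterable, Set


class DegradeList:
    """Hypothetical degrade tracker keyed by alarm entity id.

    Keying by a generic plugin name (e.g. "memory") lets two instances of
    the same plugin share one entry, so clearing one can hide the other.
    Unique entity ids keep each degradable condition separate.
    """

    def __init__(self) -> None:
        self._entries: Set[str] = set()

    def add(self, entity_id: str) -> None:
        self._entries.add(entity_id)

    def remove(self, entity_id: str) -> None:
        self._entries.discard(entity_id)

    def degraded(self) -> bool:
        return bool(self._entries)


def df_instance_name(mountpoint: str) -> str:
    """Assumed naming rule of the stock collectd df plugin:
    '/' becomes 'root'; otherwise the leading '/' is dropped and the
    remaining '/' characters become '-'."""
    if mountpoint == "/":
        return "root"
    return mountpoint.lstrip("/").replace("/", "-")


def build_instance_map(mountpoints: Iterable[str]) -> Dict[str, str]:
    """Map df instance name -> mountpoint from the known mountpoints, so
    nothing has to be reconstructed by reversing the naming rule."""
    return {df_instance_name(mp): mp for mp in mountpoints}
```

Looking the mountpoint up in the dictionary handles root and any mountpoint containing '-' the same way, whereas turning '-' back into '/' would map `var-lib-docker-distribution` to `/var/lib/docker/distribution`.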
160 Commits