StarlingX Fault Management
Eric MacDonald 54f9fed7c3 Set 5 second socket read timeout
FM messaging socket reads triggered by FM API calls from client
services have occasionally been seen to block (stall) the fmManager
process. An fmManager stall can in turn stall other client service
processes, which in the case of mtcAgent has been seen to lead to an
uncontrolled switch of activity (Swact).

This update adds a 5 second socket read timeout to FM's client services
socket setup to avoid the prolonged blocking cases that lead to Swact
or adversely affect (block) other client service process execution.

Setting a read timeout on Linux sockets is good programming practice.
It helps ensure that applications, here FM and its client services,
do not hang indefinitely if a network operation such as a socket read
becomes unresponsive.

Configuring a timeout helps manage network communication reliability
and efficiency, especially in server-client applications such as FM
where responsiveness is critical.
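
As an illustration of the general technique only, the following is a minimal Python sketch of a socket read timeout. It is not the FM code: the commit does not show how the timeout is applied, so the use of SO_RCVTIMEO, the helper names set_read_timeout and read_or_timeout, and the socketpair standing in for the FM messaging socket are all assumptions made for the example. It also assumes a typical 64-bit Linux build where struct timeval is two C longs.

```python
import socket
import struct


def set_read_timeout(sock, seconds):
    """Apply a kernel-level receive timeout (SO_RCVTIMEO) to a blocking socket.

    Hypothetical helper for illustration; not part of FM.
    """
    sec = int(seconds)
    usec = int((seconds - sec) * 1_000_000)
    # Assumption: struct timeval is two C longs (typical 64-bit Linux).
    timeval = struct.pack("ll", sec, usec)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeval)


def read_or_timeout(sock, nbytes):
    """Return the received bytes, or None if the read timed out."""
    try:
        return sock.recv(nbytes)
    except (BlockingIOError, socket.timeout):
        # With SO_RCVTIMEO set, a stalled read fails with EAGAIN/EWOULDBLOCK
        # instead of blocking forever; the caller can log it and carry on.
        return None


if __name__ == "__main__":
    client, server = socket.socketpair()
    set_read_timeout(client, 5)  # 5 seconds, matching the value in this commit

    server.sendall(b"alarm")
    print(read_or_timeout(client, 64))  # data is already queued, returns at once

    print(read_or_timeout(client, 64))  # nothing queued, gives up after ~5 seconds

    client.close()
    server.close()
```

The caller has to treat a timed-out read as an error path of its own (log and return) rather than retrying in a tight loop, otherwise the stall is only moved elsewhere.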

Test Plan:

PASS: Verify AIO DX system install.
PASS: Verify blocked socket timeout and error log after 5 seconds.
PASS: Verify unblocked socket reads complete successfully.
PASS: Verify alarm assert/clear functions operate normally.
PASS: Verify set socket timeout failure handling.
PASS: Verify fmManager is not leaking files or memory.
PASS: Verify 100-loop rook-ceph apply/remove soak
      - no stall or swact
      - AIO DX
      - with 2 OSDs on each controller

Closes-Bug: 2088025
Change-Id: I1d947bccf9faeedcc2b96c7bc398fbab77b7ae09
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2024-11-14 11:20:17 +00:00
api-ref/source Removing system uuid from alarms summary 2021-08-12 19:55:39 -04:00
devstack Implement access control for FM API 2022-08-26 10:54:39 -03:00
doc Fix sphinx configuration for tox docs 2023-08-29 17:33:47 -03:00
fm-api Create 900.023 and 900.024 USM alarms 2024-10-15 00:21:38 -03:00
fm-common Set 5 second socket read timeout 2024-11-14 11:20:17 +00:00
fm-doc Change patch alarm Management Affecting_Severity 2024-11-01 18:36:40 +00:00
fm-mgr Remove CentOS/OpenSUSE build support 2024-04-26 14:09:32 -04:00
fm-rest-api Fix fm-rest-api returning incorrect error codes 2024-09-16 13:54:57 -03:00
python-fmclient Remove CentOS/OpenSUSE build support 2024-04-26 14:09:32 -04:00
releasenotes Switch to newer openstackdocstheme and reno versions 2020-06-04 14:20:25 +02:00
.gitignore Create test framework for python with stestr. 2020-01-16 16:45:45 +08:00
.gitreview OpenDev Migration Patch 2019-04-19 19:52:34 +00:00
.zuul.yaml Fix github mirroring for this repo 2023-04-28 12:38:51 -04:00
bindep.txt py3: Add support for python 3.9 2021-09-01 08:58:34 -04:00
CONTRIBUTORS.wrs StarlingX open source release updates 2018-05-31 07:36:00 -07:00
debian_build_layer.cfg Add debian_build_layer.cfg file 2021-10-28 15:26:08 -04:00
debian_iso_image.inc Debian: fault: update debian_iso_image.inc 2022-11-18 08:15:57 +08:00
debian_pkg_dirs Update debian-pkg-dirs with fm-doc library 2022-02-04 02:02:56 +00:00
debian_stable_docker_images.inc Add stx-fm-rest-api loci image 2022-10-27 15:29:06 +00:00
debian_stable_wheels.inc Add stx-fm-rest-api loci image 2022-10-27 15:29:06 +00:00
LICENSE StarlingX open source release updates 2018-05-31 07:36:00 -07:00
pylint.rc Support newer version of yaml 2023-05-31 16:36:27 +00:00
README.rst starlingx/fault README improvement 2023-07-19 10:48:29 -03:00
requirements.txt Adding pylint zuul and tox target 2020-03-11 09:05:18 -05:00
test-requirements.txt Tox and Zuul cleanup for python3.9 2023-03-02 19:32:25 +00:00
tox.ini Tox and Zuul cleanup for python3.9 2023-03-02 19:32:25 +00:00

Fault Management (FM)

The starlingx/fault repository handles Fault Management (FM) services [1], and provides the fm command-line interface (CLI) [2].

This repository is not intended to be developed standalone, but rather as part of the StarlingX Source System, which is defined by the StarlingX manifest [3].

References


  1. https://docs.starlingx.io/api-ref/fault

  2. https://docs.starlingx.io/cli_ref/fm.html

  3. https://opendev.org/starlingx/manifest.git