StarlingX Fault Management
54f9fed7c3
FM messaging socket reads that are triggered by FM API calls from client services have, on rare occasions, been seen to block (stall) the fmManager process. An fmManager stall can in turn stall other client service processes; in the case of mtcAgent, this has been seen to lead to an uncontrolled switch of activity (Swact).

This update adds a 5 second socket read timeout to FM's client services socket setup to avoid the prolonged blocking cases that lead to a Swact or otherwise block other client service process execution.

Setting a read timeout on Linux sockets is good programming practice. It helps ensure that an application, here FM and its client services, does not hang indefinitely if a network operation such as a socket read becomes unresponsive. Configuring a timeout helps manage network communication reliability and efficiency, especially in client-server applications such as FM, where responsiveness is critical.

Test Plan:

PASS: Verify AIO DX system install.
PASS: Verify blocked socket timeout and error log after 5 seconds.
PASS: Verify unblocked socket reads complete successfully.
PASS: Verify alarm assert/clear functions operate normally.
PASS: Verify set socket timeout failure handling.
PASS: Verify fmManager is not leaking files or memory.
PASS: Verify rook-ceph apply remove 100 loop soak - no stall or swact - AIO DX - with 2 OSDs on each controller

Closes-Bug: 2088025
Change-Id: I1d947bccf9faeedcc2b96c7bc398fbab77b7ae09
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>

||
---|---|---|
api-ref/source | ||
devstack | ||
doc | ||
fm-api | ||
fm-common | ||
fm-doc | ||
fm-mgr | ||
fm-rest-api | ||
python-fmclient | ||
releasenotes | ||
.gitignore | ||
.gitreview | ||
.zuul.yaml | ||
bindep.txt | ||
CONTRIBUTORS.wrs | ||
debian_build_layer.cfg | ||
debian_iso_image.inc | ||
debian_pkg_dirs | ||
debian_stable_docker_images.inc | ||
debian_stable_wheels.inc | ||
LICENSE | ||
pylint.rc | ||
README.rst | ||
requirements.txt | ||
test-requirements.txt | ||
tox.ini |
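The socket read timeout described in the commit message above can be illustrated with a minimal sketch. This is not the FM source: the helper names `set_read_timeout` and `read_with_timeout` and the return-code convention are hypothetical, and the sketch only assumes the standard Linux `SO_RCVTIMEO` socket option.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not taken from the FM source): apply a receive
// timeout, in seconds, to an already-created socket.
// Returns true on success, false if setsockopt fails.
bool set_read_timeout(int fd, int seconds)
{
    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0) {
        std::fprintf(stderr, "failed to set socket read timeout: %s\n",
                     std::strerror(errno));
        return false;
    }
    return true;
}

// Hypothetical helper: perform one read on the socket.
// Returns the number of bytes read, 0 if the peer closed the connection,
// -1 on error, or -2 if the read timed out.
ssize_t read_with_timeout(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);  // retry if interrupted by a signal

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        // With SO_RCVTIMEO set, an expired timeout is reported this way.
        std::fprintf(stderr, "socket read timed out\n");
        return -2;
    }
    return n;
}
```

A caller would invoke `set_read_timeout(fd, 5)` once during socket setup and treat a `-2` result from `read_with_timeout` as a recoverable condition to log, rather than blocking indefinitely.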
Fault Management (FM)
The starlingx/fault repository handles Fault Management (FM) services, and provides the fm command-line interface (CLI).
This repository is not intended to be developed standalone, but rather as part of the StarlingX Source System, which is defined by the StarlingX manifest.