metal/mtce-common/src/daemon
Eric MacDonald aaf9d08028 Mtce: Fix bmc password fetch error handling
The mtcAgent process sometimes segfaults while trying to fetch
the bmc password from a failing barbican process.

With that issue fixed the mtcAgent sends the bmc access
credentials to the hardware monitor (hwmond) process which
then segfaults for a similar reason.

In cases where the process does not segfault but also does not
get a bmc password, the mtcAgent will flood its log file.

This update:

 1. Prevents the segfault case by properly managing acquired
    json-c object releases. There was one in the mtcAgent and
    another in the hardware monitor (hwmond).

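    The json_object_put object release api should only be called
    against objects whose reference the caller owns, for example
    those returned by json_tokener_parse or the json_object_new_*
    apis, and not against borrowed objects such as those returned
    by json_object_object_get_ex. See new comments in the code.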

 2. Avoids the log flooding error case by performing a password
    size check rather than assuming the password is valid
    following the secret payload receive stage.

 3. Simplifies the secret fsm and its error and retry handling.

 4. Deletes useless creation and release of a few unused json
    objects in the common jsonUtil and hwmonJson modules.
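
The password size check in item 2 can be illustrated with a minimal
sketch. This is not the mtce code: the SecretStage values, the
SecretFetch struct and on_payload_received() are hypothetical names
chosen for illustration, and the real secret fsm uses its own stages
and types. The sketch only shows the idea of validating the payload
size before treating the fetch as complete.

```cpp
#include <string>

// Hypothetical stages; the real secret fsm defines its own.
enum class SecretStage { GetPassword, Retry, Done };

// Hypothetical holder for the state of one secret fetch.
struct SecretFetch {
    SecretStage stage = SecretStage::GetPassword;
    std::string payload;    // password text as received
    int         failures = 0;
};

// Called after the secret payload receive stage.
// An empty payload is treated as a failed fetch and routed to the
// retry stage instead of being accepted as a valid password.
SecretStage on_payload_received(SecretFetch& fetch) {
    if (fetch.payload.empty()) {
        ++fetch.failures;
        fetch.stage = SecretStage::Retry;
    } else {
        fetch.failures = 0;
        fetch.stage = SecretStage::Done;
    }
    return fetch.stage;
}
```

With this shape the caller only proceeds once the stage is Done, so a
missing password leads to a retry rather than repeated use of an
invalid credential.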

Note: This update temporarily disables sensor and sensorgroup
      suppression support for the debian hardware monitor while
      a suppression type fix in sysinv is being investigated.

Test Plan:

PASS: Verify success path bmc password secret fetch
PASS: Verify secret reference get error handling
PASS: Verify secret password read error handling
PASS: Verify 24 hr provision/deprov success path soak
PASS: Verify 24 hr provision/deprov error path soak
PASS: Verify no memory leak over success and failure path soaking
PASS: Verify failure handling stress soak with reduced retry delay
PASS: Verify blocking secret fetch success and error handling
PASS: Verify non-blocking secret fetch success and error handling
PASS: Verify secret fetch is set non-blocking
PASS: Verify success and failure path logging
PASS: Verify all of jsonUtil module manages object release properly
PASS: Verify hardware monitor sensor model creation, monitoring,
             alarming and relearning. This test requires suppress
             disable in order to create sensor groups in debian.
PASS: Verify both ipmi and redfish and switch between them with
             just bm_type change.
PASS: Verify all above tests in CentOS
PASS: Verify over 4000 provision/deprovision cycles across both
             failure and success path handling with no process
             failures

Closes-Bug: 1975520
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: Ibbfdaa1de662290f641d845d3261457904b218ff
2022-06-01 15:21:05 +00:00
Makefile Set SHELL in Makefiles that use bash constructs 2018-12-07 14:09:48 -06:00
daemon_common.h Mtce: Fix bmc password fetch error handling 2022-06-01 15:21:05 +00:00
daemon_config.cpp Disable Redfish BMC audit and improve reinstall failure handling 2020-11-16 15:15:22 +00:00
daemon_debug.cpp Add SM process heartbeat and status to the hbs cluster 2020-12-10 11:13:13 -05:00
daemon_files.cpp Mtce: Fix bmc password fetch error handling 2022-06-01 15:21:05 +00:00
daemon_ini.cpp fix compilation warnings in c/cpp files 2018-10-23 07:38:33 +00:00
daemon_ini.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
daemon_main.cpp Update the init parameters for opts 2019-05-30 11:00:41 +08:00
daemon_option.h Implement Active-Active Heartbeat as HA Improvement 2018-11-20 19:57:18 +00:00
daemon_signal.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00