metal/mtce
Eric MacDonald 2cb728678f Fix mtcAgent AIO simplex subfunction failure handling over unlock
The mtcAgent is seen to get stuck handling a subfunction failure
detected over self (controller-0) unlock of an AIO simplex controller.

It gets stuck reporting that it is already handling the failure,
but it isn't.

log flooding: 'controller-0 already handling force full enable'

This issue only exists in AIO simplex when the subfunction enable
handler detects the failure. This issue was introduced by the
following update:

Remove Start Host Service Launch in mtcAgent & enhance fault detection
https://opendev.org/starlingx/metal/commit/6106051f1c

Test Plan:

PASS: Verify an AIO simplex self unlock subfunction failure leads to
      'degrade' state with 'enable failure' alarm.
PASS: Verify same issue for the standby controller leads to
      'failure' state with 'enable failure' alarm.

Regression:

PASS: Verify spontaneous unhealthy active controller is degraded.
PASS: Verify spontaneous unhealthy standby controller is failed.
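
The outcomes expected by the test plan and regression cases above can be
summarized as a small decision function. This is only an illustrative
sketch: the names `Role` and `expected_state` and the string values are
hypothetical and are not taken from the mtcAgent source.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical roles, named after the cases listed in the test plan.
    ACTIVE = "active"    # includes the self-unlocked AIO simplex controller
    STANDBY = "standby"


def expected_state(role: Role) -> str:
    """Return the host state the test plan expects after an enable failure."""
    # Per the test plan: active (or simplex) controller -> 'degrade',
    # standby controller -> 'failure'.
    return "degrade" if role is Role.ACTIVE else "failure"
```

In both cases the test plan also expects an 'enable failure' alarm, which
this sketch does not model.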

Closes-Bug: 2119449
Change-Id: I5ab5e6d85906f1923a0828211dbf94d2f82e73f8
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2025-08-05 10:14:06 -04:00