Erickson Silva de Oliveira ee48f23033 Fix CEPH_DOWN alarm clearing
After the changes in [1], it was possible to create an alarm
without the "cluster" information in the 'entity_instance_id'.

When the fsid cannot be obtained, a CEPH_DOWN alarm is created
without setting the 'entity_instance_id'. When ceph becomes
responsive and the fsid is obtained, this initially created
alarm is cleared.

However, if the FM service is unavailable at that moment, the
original alarm is not cleared and remains raised, while
ceph-manager still sets the 'entity_instance_id' it uses to the
cluster's fsid. From then on, ceph-manager only looks for alarms
with 'entity_instance_id: cluster=<fsid>' and ignores the alarm
that was created without an 'entity_instance_id'.
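The failure sequence above can be modeled with a small in-memory
sketch. FakeFM, its methods and the CEPH_DOWN string are
hypothetical stand-ins chosen for illustration; they are not the
StarlingX fm_api or the ceph-manager code.

```python
"""Hypothetical model of how an orphaned CEPH_DOWN alarm escapes lookup."""
from dataclasses import dataclass, field

CEPH_DOWN = "CEPH_DOWN"  # placeholder alarm ID, not the real one


@dataclass
class FakeFM:
    """Illustrative in-memory stand-in for the FM service (not fm_api)."""
    # Active alarms keyed by (alarm_id, entity_instance_id).
    alarms: dict = field(default_factory=dict)
    available: bool = True

    def set_fault(self, alarm_id: str, entity_instance_id: str) -> None:
        self.alarms[(alarm_id, entity_instance_id)] = "raised"

    def clear_fault(self, alarm_id: str, entity_instance_id: str) -> bool:
        if not self.available:
            return False  # the clear request is lost while FM is down
        self.alarms.pop((alarm_id, entity_instance_id), None)
        return True

    def get_fault(self, alarm_id: str, entity_instance_id: str):
        return self.alarms.get((alarm_id, entity_instance_id))


fm = FakeFM()

# 1. fsid unknown: the alarm is raised without an entity_instance_id.
fm.set_fault(CEPH_DOWN, "")

# 2. ceph responds and the fsid is known, but FM is unavailable,
#    so clearing the original alarm fails.
fsid = "example-fsid"  # hypothetical fsid
fm.available = False
cleared = fm.clear_fault(CEPH_DOWN, "")
fm.available = True

# 3. From now on only "cluster=<fsid>" is looked up, so the
#    orphaned alarm is never found.
found = fm.get_fault(CEPH_DOWN, f"cluster={fsid}")
print(cleared, found, fm.alarms)
# prints: False None {('CEPH_DOWN', ''): 'raised'}
```

Keying alarms on (alarm_id, entity_instance_id) is what makes the
orphan invisible: a lookup for one key never matches the other.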

Therefore, to fix the issue, when ceph reports HEALTH_OK,
ceph-manager checks for a CEPH_DOWN alarm without an
'entity_instance_id' and clears it if one exists.
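A minimal sketch of that clearing step, assuming a hypothetical
in-memory FakeFM in place of the real FM service. The names
FakeFM, get_faults, on_health_ok and the CEPH_DOWN string are
illustrative only and do not reproduce the ceph-manager change.

```python
from dataclasses import dataclass, field

CEPH_DOWN = "CEPH_DOWN"  # placeholder alarm ID, not the real one


@dataclass
class FakeFM:
    """Hypothetical in-memory stand-in for the FM service."""
    # (alarm_id, entity_instance_id) -> state
    alarms: dict = field(default_factory=dict)

    def set_fault(self, alarm_id, entity_instance_id):
        self.alarms[(alarm_id, entity_instance_id)] = "raised"

    def clear_fault(self, alarm_id, entity_instance_id):
        self.alarms.pop((alarm_id, entity_instance_id), None)

    def get_faults(self, alarm_id):
        """Return the entity_instance_id of every active alarm with this ID."""
        return [eid for (aid, eid) in self.alarms if aid == alarm_id]


def on_health_ok(fm, fsid):
    """Hypothetical HEALTH_OK handler illustrating the fix."""
    # Usual path: clear the alarm keyed on the cluster fsid.
    fm.clear_fault(CEPH_DOWN, f"cluster={fsid}")
    # Extra step: clear any CEPH_DOWN alarm raised before the fsid
    # was known (empty or missing entity_instance_id).
    for entity_instance_id in fm.get_faults(CEPH_DOWN):
        if not entity_instance_id:
            fm.clear_fault(CEPH_DOWN, entity_instance_id)


fm = FakeFM()
fm.set_fault(CEPH_DOWN, "")                       # orphaned alarm
fm.set_fault(CEPH_DOWN, "cluster=example-fsid")   # normal alarm
on_health_ok(fm, "example-fsid")
print(fm.alarms)  # prints: {}
```

get_faults returns a list built before any clearing happens, so
removing entries inside the loop does not mutate the dict being
iterated.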

[1]: https://review.opendev.org/c/starlingx/utilities/+/953994

Test Plan:
 - PASS: Fresh install on STD.
 - PASS: Reboot the active controller many times.
 - PASS: Force CEPH_DOWN alarms.
 - PASS: Verify that no CEPH_DOWN alarm remains when
   ceph is HEALTH_OK.

Closes-Bug: 2129927

Change-Id: Ib4a1765caa7f38a7eb72b8d0b366048e98d82f1f
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
2025-10-27 16:09:46 -03:00