This commit includes the following changes:
- Implement the new fm-api methods for raising/clearing alarms in
batches. The new keep_existing_alarms option was also implemented
to make sure we don't update existing alarms, as we're not checking
whether they exist before trying to raise them again.
- Move from FaultAPIs to FaultAPIsV2, which raises exceptions if
there's an error in FM. This prevents state from continuing without
clearing/raising alarms when FM is offline, which can happen during
a swact where the FM process is stopped before state.
- Introduce a db call to get the subcloud object and the current
status of the endpoint instead of receiving a simplified subcloud
through RPC. The reason is that dcmanager-audit processes faster
than state, so until state updates, audit keeps sending the same
information, causing duplicated updates and increasing the time it
takes to update every subcloud.
- Convert all logs to a standard format with the subcloud name at
the start, for better traceability. E.g.: "Subcloud: subcloud1. <msg>".
- Remove the unused function update_subcloud_sync_endpoint_type.
Test plan:
- PASS: Deploy a subcloud and verify state communicates to cert-mon
that it became online and then updates the dc_cert endpoint
after receiving the response.
- PASS: Manage the subcloud and verify all endpoints are updated and
the final sync status is in-sync.
- PASS: Force a subcloud to have an out-of-sync kube root-ca and
kubernetes and verify state correctly updates the db and
raises the alarms.
- PASS: Turn off the subcloud and verify:
- Subcloud availability was updated in db
- All endpoints were updated in db
- Dcorch was notified
- All endpoint alarms were cleared
- The offline alarm was raised
- PASS: Unmanage the subcloud and verify all endpoints, with the
exception of dc_cert, were updated to unknown.
- PASS: Unmanage and stop the fm-mgs service and turn off the
subcloud. Verify the subcloud is not updated to offline
until FM comes back online.
- PASS: Perform scale tests and verify that updating availability
and endpoints is faster.
Story: 2011311
Task: 52283
Depends-on: https://review.opendev.org/c/starlingx/fault/+/952671
Change-Id: I8792e1cbf8eb0af0cc9dd1be25987fac2503ecee
Signed-off-by: Victor Romano <victor.gluzromano@windriver.com>