This patch implements a new service monitor for the Applier service to
handle Applier service failures. The monitor is built on top of the
ServiceMonitoringBase base class. When an existing Applier is found to be
Failed, the expected behavior is:

- ActionPlans in ONGOING state running in the failed service will be
  CANCELLED.
- ActionPlans in PENDING state assigned to the failed service will be
  unassigned (hostname will be emptied) and execution will be re-triggered
  via RPC message.

The new monitor has been added to the existing Applier service and runs in
a separate execution thread. Failure detection, leader election and the
execution interval use the same pattern and configuration parameters
(periodic_interval and service_down_time) as the existing monitor for the
decision-engine.

Implements: blueprint-monitor-failed-appliers
Change-Id: I6c3bb699ee5db75b7d4528f40c2d47264858a447
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
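The failover behavior described in the commit message can be sketched roughly as follows. This is only an illustration, not the code from the patch: `ActionPlan`, `is_failed`, `handle_failed_applier` and the `retrigger` callback are hypothetical names, the heartbeat comparison is an assumed reading of how `service_down_time` is applied, and "emptied" hostname is modelled here as `None`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class ActionPlan:
    # Hypothetical stand-in for an action plan record, not Watcher's object model.
    uuid: str
    state: str                # e.g. "PENDING", "ONGOING", "CANCELLED"
    hostname: Optional[str]   # applier service the plan is assigned to


def is_failed(last_seen: datetime, now: datetime, service_down_time: int) -> bool:
    """Assumption: an applier counts as failed once its last heartbeat
    is older than service_down_time seconds."""
    return now - last_seen > timedelta(seconds=service_down_time)


def handle_failed_applier(
    plans: List[ActionPlan],
    failed_host: str,
    retrigger: Callable[[ActionPlan], None],
) -> List[str]:
    """Apply the two failover rules to the plans owned by failed_host.

    Returns the UUIDs of the plans that were re-triggered.
    """
    retriggered = []
    for plan in plans:
        if plan.hostname != failed_host:
            continue  # plans on healthy appliers are left untouched
        if plan.state == "ONGOING":
            # Rule 1: work in progress on the failed applier is cancelled.
            plan.state = "CANCELLED"
        elif plan.state == "PENDING":
            # Rule 2: unassign the plan and hand it back for execution.
            plan.hostname = None
            retrigger(plan)  # stands in for the RPC message
            retriggered.append(plan.uuid)
    return retriggered


# Usage: one failed applier ("applier-1") and one healthy one ("applier-2").
plans = [
    ActionPlan("ap-1", "ONGOING", "applier-1"),
    ActionPlan("ap-2", "PENDING", "applier-1"),
    ActionPlan("ap-3", "PENDING", "applier-2"),
]
sent: List[str] = []
handle_failed_applier(plans, "applier-1", lambda p: sent.append(p.uuid))
```

Passing the re-trigger step in as a callback keeps the sketch independent of any RPC layer; in the actual service that step is an RPC message that any alive applier can pick up.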
13 lines
535 B
YAML
---
features:
  - |
    A new service monitor has been added to the applier service to monitor
    the status of the applier services and handle failover for action plans
    when applier services fail. This monitor is executed as part of the applier
    services and performs the following actions when an applier is detected
    as failed:

    - Cancels ONGOING action plans on the failed service.
    - Retriggers PENDING action plans on the failed applier service so that they
      can be picked up by any alive applier service.