Hiroaki Kobayashi 6d2950f3b0 Add periodic healing

This patch adds a periodic healing mechanism to the monitor module and
monitoring plugins. With this change, the heal_reservations() method of
resource plugins receives a period (start_date and end_date arguments)
to heal.

The purpose of this change is to avoid healing (reallocating) all
reservations of failed resources immediately, because failed resources
are expected to recover at some point. The monitor heals only the
reservations that are active or will start soon. The remaining
reservations are expected to be healed by the periodic healing.

Implements: blueprint healing-time
Change-Id: I6971c952fcde101ff2408f567fee9a7dab97b140

2018-04-24 13:18:16 +09:00

Compute Host Monitor

The compute host monitor detects failures and recoveries of compute hosts. When it detects a failure, it triggers healing of host reservations and instance reservations. This document describes the compute host monitor plugin in detail.

Monitoring Type

The compute host monitor supports both push-based and polling-based monitoring. Each monitor can be enabled or disabled with the following configuration options:

enable_notification_monitor: Set to True to enable the push-based monitor, which subscribes to Nova notifications.
enable_polling_monitor: Set to True to enable the polling-based monitor, which polls the Nova API.

Failure Detection

The compute host monitor detects failed and recovered hosts either by subscribing to Nova notifications or by polling the List Hypervisors API of Nova. If a failure is detected, Blazar sets the reservable field of the failed host to False and heals the affected reservations as described below.
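Both detection paths can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than Blazar's plugin code: the Host class and the function names are invented for the example. The polling path assumes each List Hypervisors entry carries hypervisor_hostname, state ("up"/"down") and status ("enabled"/"disabled"), and treats either "down" or "disabled" as a failure. The push path assumes the Nova service.update notification data exposes binary, disabled and forced_down keys. Treating a non-reservable host that is up again as recovered is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical stand-in for a compute host record managed by Blazar."""
    hostname: str
    reservable: bool = True


def poll_hypervisors(hypervisors, hosts):
    """Polling-based detection over a List Hypervisors response (sketch).

    Returns (failed, recovered) lists of Host objects.
    """
    by_name = {h.hostname: h for h in hosts}
    failed, recovered = [], []
    for hv in hypervisors:
        host = by_name.get(hv["hypervisor_hostname"])
        if host is None:
            continue  # hypervisor not registered as a Blazar host
        # Assumption: "down" or "disabled" both count as a failure.
        down = hv["state"] == "down" or hv["status"] == "disabled"
        if down and host.reservable:
            failed.append(host)
        elif not down and not host.reservable:
            # Assumption: a non-reservable host that is up again has recovered.
            recovered.append(host)
    return failed, recovered


def classify_service_update(payload):
    """Push-based detection from a service.update notification (sketch).

    Returns "failed", "recovered", or None for non-compute services.
    The caller still has to check whether a "recovered" host was
    actually marked as failed before.
    """
    if payload.get("binary") != "nova-compute":
        return None
    if payload["disabled"] or payload["forced_down"]:
        return "failed"
    return "recovered"


def mark_failed(hosts):
    """Set the reservable field of every failed host to False."""
    for host in hosts:
        host.reservable = False
```

After marking, the failed hosts would be handed to reservation healing; this sketch stops at updating the reservable field.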

Reservation Healing

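If a host failure is detected, Blazar tries to heal the host and instance reservations that use the failed host by reserving an alternative host. Because a failed host is expected to recover at some point, Blazar does not reallocate every affected reservation at once: it heals only the reservations that are active or will start within the healing interval, and the remaining reservations are healed later by periodic healing. The length of the healing interval can be configured with the healing_interval option.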
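The choice of which reservations to heal right away can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Blazar's implementation: the Reservation class and both helper names are hypothetical, and healing_interval is assumed to be in minutes. A reservation is selected when it overlaps the window from the current time to the end of the healing interval, which covers both active reservations and those starting soon; a later periodic run with a shifted window picks up the rest.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reservation:
    """Hypothetical stand-in for a host or instance reservation."""
    id: str
    start_date: datetime
    end_date: datetime


def healing_window(now, healing_interval_minutes):
    """Return (start_date, end_date) of the period to heal now.

    Assumption: healing_interval is expressed in minutes.
    """
    return now, now + timedelta(minutes=healing_interval_minutes)


def reservations_to_heal(reservations, start_date, end_date):
    """Select reservations overlapping [start_date, end_date].

    Reservations that already ended are skipped; those starting after
    end_date are left for a later periodic healing run.
    """
    return [
        r for r in reservations
        if r.end_date > start_date and r.start_date <= end_date
    ]
```

For example, with a 60-minute interval at 12:00, a reservation running from 11:00 to 13:00 and one starting at 12:30 are healed immediately, while one starting at 15:00 waits until a periodic run whose window reaches 15:00.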

Configurations

To enable the compute host monitor, set enable_notification_monitor or enable_polling_monitor to True, and set healing_interval to a value appropriate for your cloud. See ../configuration/blazar-conf for details on these options.
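A quick way to check which monitors a configuration turns on is sketched below with Python's configparser. The option names come from this document; the [physical:host] section name, the sample values, the False defaults for the enable flags and the 60-minute fallback for healing_interval are assumptions made for the example, so confirm them against the blazar.conf reference before relying on them. The function name is hypothetical.

```python
import configparser

# Hypothetical blazar.conf excerpt. Section name and values are assumptions.
SAMPLE_CONF = """
[physical:host]
enable_notification_monitor = True
enable_polling_monitor = False
healing_interval = 60
"""


def load_monitor_settings(conf_text, section="physical:host"):
    """Read the compute host monitor options from an INI-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    opts = parser[section]
    notification = opts.getboolean("enable_notification_monitor", fallback=False)
    polling = opts.getboolean("enable_polling_monitor", fallback=False)
    return {
        "notification": notification,
        "polling": polling,
        # The monitor runs if at least one of the two types is enabled.
        "enabled": notification or polling,
        # Assumed fallback; check the real default in blazar.conf docs.
        "healing_interval": opts.getint("healing_interval", fallback=60),
    }
```

Calling load_monitor_settings(SAMPLE_CONF) reports the notification monitor as on, the polling monitor as off, and a 60-minute healing interval.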
