309fefa68c
The current implementation of the health policy can check node health either by using lifecycle events sent by Nova or by actively polling the node status through the Nova driver. However, neither of those detection types will detect that a node is unhealthy because the compute node it runs on went down. In that case Nova will still report the node status as active even though the node is no longer running.

This change adds a new detection type that actively polls the node health using a URL specified in the health policy. That way the user can integrate the Senlin health policy with a custom or third-party health check service. Using this external health check service, the health policy can detect the failure scenario in which the compute node hosting the node goes down.

When a health policy with NODE_STATUS_POLL_URL is attached to a cluster, the health manager performs a GET operation on the URL. The URL specified in NODE_STATUS_POLL_URL can contain expansion parameters that are resolved by the health manager before the GET operation is executed. The only valid expansion parameter at this time is {nodename}.

If the response body of the GET operation contains the string specified by poll_url_healthy_response in the health policy, the node is considered to be healthy. If it does not contain that string, the health manager will retry the GET operation the number of times specified in poll_url_retry_limit, sleeping for the interval specified in poll_url_retry_interval between retries. If none of those responses contains the poll_url_healthy_response string, the node is considered to be unhealthy.

Once a node is determined to be unhealthy, the health manager will attempt to recover the node using the recovery type specified in the health policy. When a compute node has been shut down, the deletion of an existing node will never finish. If node_force_recreate is set to True in the health policy's recovery options, the health manager will wait for the interval given in node_delete_timeout and then continue to create the node even if the node deletion failed due to a timeout.

The detailed changes include:

* Add new options to health policy related to node poll url detection type
* Update health manager to support node status polling by URL
* Update server profile to handle force_recreate option
* Add supported status to health policy
* Add new configuration value health_check_interval_min

The senlin documentation and senlin-tempest-plugin will be updated to include the health policy detection type in a separate patch set.

Change-Id: Iaf9174b4a372df6fc38935a67648229bfb1ebc16
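
The polling behaviour described in this commit message can be illustrated with a minimal Python sketch. This is not the Senlin health manager code: the function names `expand_poll_url` and `node_is_healthy`, the injected `fetch` and `sleep` callables, and the reading of poll_url_retry_limit as "one initial GET plus that many retries" are all assumptions made for illustration.

```python
import time
from typing import Callable


def expand_poll_url(url_template: str, node_name: str) -> str:
    """Resolve the {nodename} expansion parameter (hypothetical helper)."""
    return url_template.replace("{nodename}", node_name)


def node_is_healthy(
    url_template: str,
    node_name: str,
    healthy_response: str,
    retry_limit: int,
    retry_interval: float,
    fetch: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if any response body contains healthy_response.

    `fetch` performs the GET and returns the response body as text. It is
    injected so the sketch can run without a network connection.
    """
    url = expand_poll_url(url_template, node_name)
    # Assumption: one initial GET, then up to retry_limit retries.
    for attempt in range(retry_limit + 1):
        if healthy_response in fetch(url):
            return True
        # Sleep only between attempts, not after the last one.
        if attempt < retry_limit:
            sleep(retry_interval)
    return False
```

In a real deployment `fetch` would wrap an HTTP client call; keeping it as a parameter makes the retry logic easy to exercise with a stub that returns canned response bodies.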
9 lines
352 B
YAML
---
features:
  - Health policy now contains NODE_STATUS_POLL_URL detection type. This
    detection type queries the URL specified in the health policy for node
    health status. This allows the user to integrate Senlin health checks
    with an external health service.
other:
  - Health policy v1.0 was moved from EXPERIMENTAL to SUPPORTED status.