senlin/examples
Duc Truong 309fefa68c Add node poll url detection type to health policy
The current implementation of health policy is able to check node health
either using lifecycle events sent by Nova or by actively polling the
node status using the Nova driver.  However, neither of those detection
types will detect that a node is unhealthy because the compute node on
which it is running went down.  In that case Nova will still
report the node status as active even though the node is no longer
running.

This change adds a new detection type that actively polls the node
health using a URL specified in the health policy.  That way the user
can integrate Senlin health policy with a custom or third-party
health check service.  Using this external health check service, the
health policy will be able to detect the failure scenario in which the
compute node hosting the node goes down.

When a health policy with NODE_STATUS_POLL_URL is attached to a cluster,
the health manager performs a GET operation on the URL. The URL
specified in NODE_STATUS_POLL_URL can contain expansion parameters that
are resolved by the health manager before the GET operation is executed.
The only valid expansion parameter at this time is {nodename}.
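
The expansion step described above can be sketched as follows.  This is
a minimal illustration, not the health manager's actual code: the
function name expand_poll_url and the example URL are assumptions, and
only the {nodename} parameter comes from the description.

```python
def expand_poll_url(url_template, node_name):
    """Resolve the {nodename} expansion parameter in a poll URL.

    Hypothetical helper: plain string replacement is used so that any
    other braces in the URL are left untouched.
    """
    return url_template.replace("{nodename}", node_name)


# Example with a made-up health check endpoint.
url = expand_poll_url("http://healthcheck.example.com/nodes/{nodename}",
                      "node-1")
```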

If the response body of the GET operation contains the string specified
by poll_url_healthy_response in the health policy, the node is
considered to be healthy.  If it does not contain that string, the
health manager will retry the GET operation the number of times
specified in poll_url_retry_limit while sleeping the time interval
specified in poll_url_retry_interval between each retry.  If the
response body of each of those GET operations still does not contain the
poll_url_healthy_response string, the node is considered to be
unhealthy.
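
The check-and-retry behaviour described above could look roughly like
the sketch below.  It assumes one initial GET followed by up to
poll_url_retry_limit retries; the function names, the use of urllib and
the injectable fetch/sleep parameters are illustrative assumptions, not
the health manager's real interface.

```python
import time
import urllib.request


def http_get(url, timeout=10):
    """Return the response body of a GET request as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def poll_node_health(url, healthy_response, retry_limit, retry_interval,
                     fetch=http_get, sleep=time.sleep):
    """Return True if any response body contains healthy_response.

    Assumption: one initial attempt, then up to retry_limit retries,
    sleeping retry_interval seconds before each retry.
    """
    if healthy_response in fetch(url):
        return True
    for _ in range(retry_limit):
        sleep(retry_interval)
        if healthy_response in fetch(url):
            return True
    # Every attempt lacked the expected string: treat node as unhealthy.
    return False
```

Passing fetch and sleep as parameters keeps the sketch testable without
a network connection or real delays.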

Once a node is determined to be unhealthy, the health manager will
attempt to recover the node using the recovery type specified in the
health policy.  When a compute node has been shut down, the
deletion of an existing node will never finish.  If node_force_recreate
is set to True in the health policy's recovery options, the health
manager will wait for the time interval in node_delete_timeout before
continuing to create the node even if the node deletion failed due to
timeout.
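
The forced recreate behaviour can be pictured with the sketch below.
The delete_node and create_node callables and the use of TimeoutError
are hypothetical stand-ins chosen for illustration; only the
node_force_recreate and node_delete_timeout option names come from the
description above.

```python
def recreate_node(delete_node, create_node, node_delete_timeout,
                  node_force_recreate):
    """Delete a node, then create its replacement.

    delete_node(timeout=...) is assumed to raise TimeoutError when the
    deletion does not finish within node_delete_timeout seconds.
    """
    try:
        delete_node(timeout=node_delete_timeout)
    except TimeoutError:
        # Without force recreate, a deletion timeout aborts recovery.
        if not node_force_recreate:
            raise
    # Reached after a successful delete, or after a timed-out delete
    # when node_force_recreate is set.
    return create_node()
```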

The detailed changes include:
* Add new options to health policy related to node poll url detection
type
* Update health manager to support node status polling by URL
* Update server profile to handle force_recreate option
* Add supported status to health policy
* Add new configuration value health_check_interval_min

The senlin documentation and senlin-tempest-plugin will be updated to
include the health policy detection type in a separate patch set.

Change-Id: Iaf9174b4a372df6fc38935a67648229bfb1ebc16
2018-06-26 00:56:19 +00:00
..
policies Add node poll url detection type to health policy 2018-06-26 00:56:19 +00:00
profiles Fix grammar error 2018-02-11 13:57:56 +08:00