Deletion Policy
The deletion policy helps users control the election of victim nodes when a cluster is about to be shrunk. In other words, it determines which node(s) should be removed first when the size of a cluster is to be decreased.
Currently, this policy is applicable to clusters of all profile types and it is enforced when the cluster's size is about to be reduced.
Properties
Sample
Below is a typical spec for a deletion policy:
The valid values for the "criteria" property include:
- OLDEST_FIRST: always select node(s) which were created earlier than other nodes.
- YOUNGEST_FIRST: always select node(s) which were created recently instead of those created earlier.
- OLDEST_PROFILE_FIRST: compare the profile used by each individual node and select the node(s) whose profile(s) were created earlier than others.
- RANDOM: randomly select node(s) from the cluster for deletion. This is the default criterion if the property is omitted.
Note
There is an implicit rule (criterion) when electing victim nodes. The Senlin engine always ranks nodes which are not in ACTIVE state or which are marked as tainted ahead of the others.
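The criteria and the implicit rule above can be sketched in a few lines of Python. This is a minimal illustration, not Senlin's implementation: the `Node` class, its field names, and the `elect_victims` function are hypothetical, and timestamps are plain integers for simplicity.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a cluster node."""
    name: str
    created_at: int          # creation time, e.g. a Unix timestamp
    profile_created_at: int  # creation time of the node's profile
    status: str = "ACTIVE"
    tainted: bool = False


def elect_victims(nodes, count, criteria="RANDOM", rng=None):
    """Return up to `count` nodes to delete, unhealthy or tainted ones first."""
    rng = rng or random.Random()
    pool = list(nodes)
    if criteria == "OLDEST_FIRST":
        pool.sort(key=lambda n: n.created_at)
    elif criteria == "YOUNGEST_FIRST":
        pool.sort(key=lambda n: n.created_at, reverse=True)
    elif criteria == "OLDEST_PROFILE_FIRST":
        pool.sort(key=lambda n: n.profile_created_at)
    elif criteria == "RANDOM":
        rng.shuffle(pool)
    else:
        raise ValueError(f"unknown criteria: {criteria}")
    # Implicit rule: a stable sort moves non-ACTIVE or tainted nodes to the
    # front while keeping the criteria ordering within each group.
    pool.sort(key=lambda n: n.status == "ACTIVE" and not n.tainted)
    return pool[:count]


nodes = [
    Node("n1", created_at=100, profile_created_at=10),
    Node("n2", created_at=200, profile_created_at=10, status="ERROR"),
    Node("n3", created_at=300, profile_created_at=20),
]
victims = elect_victims(nodes, 2, criteria="OLDEST_FIRST")
print([n.name for n in victims])  # ['n2', 'n1']
```

Here `n2` is chosen first even though it is not the oldest node, because it is not in ACTIVE state; `n1` follows as the oldest of the healthy nodes.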
Several actions can trigger a deletion policy. Some of them may already carry a list of candidates to remove, e.g. CLUSTER_DEL_NODES or NODE_DELETE; others may only carry a number of nodes to remove, e.g. CLUSTER_SCALE_IN or CLUSTER_RESIZE. For actions that already have a list of candidates, the deletion policy will respect the action inputs. The election of victims only happens when no such candidates have been identified.
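The distinction between the two kinds of actions can be sketched as follows. The shape of `action_inputs` (a dict with optional "candidates" and "count" keys) and the `resolve_victims` helper are assumptions made for illustration; they do not mirror Senlin's internal data structures.

```python
def resolve_victims(action_inputs, elect):
    """Pick the nodes to remove for an action.

    `action_inputs` is a hypothetical dict: it may carry "candidates"
    (explicit node names) or only "count" (how many nodes to remove).
    `elect` is a callable that chooses `count` victims by the policy criteria.
    """
    candidates = action_inputs.get("candidates")
    if candidates:
        # The action already names its victims; no election takes place.
        return list(candidates)
    return elect(action_inputs.get("count", 1))


# Stand-in election that always picks from a fixed, pre-ranked list.
ranked = ["node-a", "node-b", "node-c"]
pick = lambda count: ranked[:count]

print(resolve_victims({"candidates": ["node-c"]}, pick))  # ['node-c']
print(resolve_victims({"count": 2}, pick))                # ['node-a', 'node-b']
```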
Deletion vs Destroy
There are cases where you don't want the node(s) removed from a cluster to be destroyed. Instead, you prefer them to become "orphan" nodes so that in future you can quickly add them back to the cluster without having to create new nodes.
If this is your situation, you may want to set destroy_after_deletion to false. The Senlin engine won't delete the node(s) after removing them from the cluster.
The default behavior is to delete (destroy) the node(s) after they are deprived of their cluster membership.
Grace Period
Another common scenario is to grant a node a period of time to shut down gracefully. Even if a node doesn't have built-in logic to perform a graceful shutdown, granting it some extra time may still help ensure that the resources it was using have been properly released.
The default value for the grace_period property is 0, which means the node deletion happens as soon as the node is removed from the cluster. You can customize this value according to your needs. Note that the grace period is granted to every node deleted. When setting this value to a large number, make sure it does not exceed the typical timeout value for action execution; otherwise the node deletion will fail.
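A simple pre-flight check can catch a grace period that is too long before the policy is created. The sketch below assumes both values are expressed in seconds; `check_grace_period` is a hypothetical helper, and the action timeout is passed in by the caller rather than read from Senlin.

```python
def check_grace_period(grace_period, action_timeout):
    """Raise ValueError if the grace period cannot fit in the action timeout.

    Both values are in seconds. `action_timeout` is whatever timeout applies
    to action execution in your deployment; it is an input here, not a value
    read from Senlin.
    """
    if grace_period < 0:
        raise ValueError("grace_period must not be negative")
    if grace_period >= action_timeout:
        raise ValueError(
            f"grace_period ({grace_period}s) leaves no time within the "
            f"action timeout ({action_timeout}s)"
        )
    return grace_period


check_grace_period(60, action_timeout=600)      # fine
# check_grace_period(900, action_timeout=600)   # would raise ValueError
```

The check rejects a grace period equal to the timeout as well, a conservative choice that leaves some time for the deletion itself.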
Reduce Desired Capacity or Not
In most cases, users would expect the "desired_capacity" of a cluster to be reduced when nodes are removed from it. Since the victim selection algorithm always picks nodes in non-ACTIVE status over ACTIVE ones, you can actually remove erroneous nodes by taking advantage of this rule.
For example, suppose there are 4 nodes in a cluster and 2 of them are known to be in inactive status. You can use the command openstack cluster members del to remove the bad nodes. If you have a deletion policy attached to the cluster, you get a chance to tell the Senlin engine that you don't want to change the capacity of the cluster. Instead, you only want the bad nodes removed. With the help of other cluster health related commands, you can quickly recover the cluster to a healthy status. You don't have to change the desired capacity of the cluster to a smaller value and then change it back.
If this is your use case, you can set reduce_desired_capacity to false in the policy spec. The cluster's desired capacity won't be changed after cluster membership is modified.
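The effect of this property on the 4-node example can be worked through with a small sketch. `capacity_after_removal` is a hypothetical helper; the default of `True` and the clamp at zero are assumptions made for the illustration.

```python
def capacity_after_removal(desired_capacity, removed, reduce_desired_capacity=True):
    """Return the cluster's desired capacity after `removed` nodes leave it."""
    if reduce_desired_capacity:
        # Assumption: the capacity never drops below zero.
        return max(desired_capacity - removed, 0)
    return desired_capacity


# 4 nodes, 2 of them bad and removed with "openstack cluster members del".
print(capacity_after_removal(4, 2, reduce_desired_capacity=True))   # 2
print(capacity_after_removal(4, 2, reduce_desired_capacity=False))  # 4
```

In the second case the cluster holds 2 nodes but still wants 4, so a later recovery step can bring it back to full size without resizing it twice.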
Lifecycle Hook
If there is a need to receive notification of a node deletion, you can specify a lifecycle hook in the deletion policy:
type: senlin.policy.deletion
version: 1.1
properties:
  hooks:
    type: 'zaqar'
    timeout: 120
    params:
      queue: 'my_queue'
The valid values for the type property are:
- zaqar: send a message to a zaqar queue. The name of the zaqar queue must be specified in the queue property.
- webhook: send a message to a webhook URL. The URL of the webhook must be specified in the url property.
The timeout property specifies the number of seconds to wait before the actual node deletion happens. This timeout can be preempted by calling the complete lifecycle hook API.
Note
Hooks of type webhook will be supported in a future version. Currently only hooks of type zaqar are supported.
Deleting Nodes Across Regions
With the help of the region placement policy, you can distribute a cluster's nodes into different regions as instructed. However, when you are removing nodes from more than one region, the same distribution rule has to be respected as well.
When there is a region placement policy in effect, the deletion policy will first determine the number of nodes to be removed from each region. Then in each region, the policy performs a victim election based on the criteria you specified in the policy spec.
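The two-step process can be sketched as follows. The distribution rule used here ("keep regions as even as possible") is an assumption chosen for illustration; the real rule comes from the region placement policy attached to the cluster and is not described in this document. Both function names are hypothetical.

```python
def plan_removals(nodes_by_region, count):
    """Decide how many nodes to remove from each region.

    Assumption for illustration: each removal is charged to the region that
    currently has the most nodes remaining. Ties go to the region whose name
    sorts first.
    """
    remaining = {region: len(nodes) for region, nodes in nodes_by_region.items()}
    plan = {region: 0 for region in nodes_by_region}
    for _ in range(min(count, sum(remaining.values()))):
        region = max(sorted(remaining), key=lambda r: remaining[r])
        remaining[region] -= 1
        plan[region] += 1
    return plan


def elect_across_regions(nodes_by_region, count, elect):
    """Run `elect(nodes, n)` inside each region according to the plan."""
    victims = []
    for region, n in plan_removals(nodes_by_region, count).items():
        victims.extend(elect(nodes_by_region[region], n))
    return victims


cluster = {"RegionOne": ["a", "b", "c"], "RegionTwo": ["d", "e"]}
first_n = lambda nodes, n: nodes[:n]  # stand-in for the criteria-based election

print(plan_removals(cluster, 3))                  # {'RegionOne': 2, 'RegionTwo': 1}
print(elect_across_regions(cluster, 3, first_n))  # ['a', 'b', 'd']
```

The same structure would apply to availability zones by grouping nodes per zone instead of per region.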
Deleting Nodes Across Availability Zones
Similarly, when there is a zone placement policy attached to the cluster in question, nodes in the cluster may get distributed across a few availability zones based on a preset algorithm.
The deletion policy, when triggered, will first determine the number of nodes to be removed from each availability zone. Then, within each availability zone, it proceeds to elect victim nodes based on the criteria specified in the policy spec.