senlin/doc/specs/cluster-fast-scaling.rst
chohoor 6496b80cbb Add fast scaling spec
This patch adds a spec about senlin support for fast scaling.

Change-Id: I1e7ce32b74c7e87559ef23c245e17928b778171b
2017-12-28 12:02:24 +08:00


Cluster Fast Scaling

The URL of launchpad blueprint:

https://blueprints.launchpad.net/senlin/+spec/add-attribute-fast-scaling-to-cluster

The major function of senlin is managing clusters and changing the capacity of a cluster through scale out and scale in operations. A single scaling operation generally takes tens of seconds, and even a few minutes in extreme cases. That is too long for a production environment, so senlin needs to be improved to support fast scaling.

Rather than improving hardware performance or optimizing code, a better way is to create some standby nodes when a new cluster is created. When the cluster needs to change its capacity immediately, or to replace nodes in 'error' state with nodes in 'active' state, senlin adds nodes from the standby nodes to the cluster, or removes the error nodes from the cluster and adds active nodes from the standby nodes in their place.

To make cluster scaling fast, this spec proposes extending senlin to create standby nodes and to improve the scaling operations.

Problem description

Before actually scaling a cluster, senlin needs to do many things; the slowest step is creating or deleting a node.

Use Cases

If senlin supports fast scaling, the following cases become possible:

- Change the capacity of a cluster immediately, without waiting for nodes to be created or deleted.

- Replace the error nodes in a cluster immediately, improving the high availability of the cluster.

- Improve the handling of clusters that are scaled many times within a short period.

Proposed change

1. Add a new attribute 'fast_scaling' to the cluster metadata. With the attribute set, senlin will create standby nodes when creating a new cluster. The number of standby nodes can be specified, but the sum of standby nodes and nodes in the cluster should be less than the max size of the cluster.

2. Revise the cluster create and cluster delete operations to support the new attribute; delete the standby nodes when a cluster is deleted.

3. Revise the scale out and scale in operations. With the new attribute set, first add nodes from the standby nodes to the cluster, or move nodes removed from the cluster to the standby nodes.

4. Revise the health policy to check the state of standby nodes and to support replacing error nodes with active nodes from the standby nodes.

5. Revise the deletion policy to delete nodes, or move them to the standby nodes, when performing a deletion operation.
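
The items above can be illustrated with a small in-memory sketch. It is not senlin code: the class `FastScalingCluster`, its fields, and its method names are hypothetical, and node creation is reduced to a counter so the slow path is easy to see. It only shows the intended ordering, that is, use standby nodes first and create new nodes only when the standby nodes run out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    status: str = "ACTIVE"  # "ACTIVE" or "ERROR"


@dataclass
class FastScalingCluster:
    """Toy model for illustration only; not senlin's real cluster object."""
    max_size: int
    members: List[Node] = field(default_factory=list)
    standby: List[Node] = field(default_factory=list)
    created: int = 0  # number of slow "create node" calls so far

    def _create_node(self) -> Node:
        # Stands in for the slow path (for example, booting a server).
        self.created += 1
        return Node(name=f"node-{self.created}")

    def provision(self, desired: int, standby_count: int) -> None:
        # Item 1: members + standby must be less than max_size.
        if desired + standby_count >= self.max_size:
            raise ValueError("members + standby must be less than max_size")
        self.members = [self._create_node() for _ in range(desired)]
        self.standby = [self._create_node() for _ in range(standby_count)]

    def scale_out(self, count: int) -> None:
        # Item 3: take ACTIVE standby nodes first, create only the remainder.
        if len(self.members) + count > self.max_size:
            raise ValueError("scale out would exceed max_size")
        for _ in range(count):
            ready = [n for n in self.standby if n.status == "ACTIVE"]
            if ready:
                node = ready[0]
                self.standby.remove(node)
            else:
                node = self._create_node()
            self.members.append(node)

    def scale_in(self, count: int) -> None:
        # Items 3 and 5: move removed nodes to standby instead of deleting.
        for _ in range(min(count, len(self.members))):
            self.standby.append(self.members.pop())

    def replace_error_nodes(self) -> int:
        # Item 4: swap ERROR members for ACTIVE standby nodes.
        replaced = 0
        for i, node in enumerate(self.members):
            if node.status != "ERROR":
                continue
            ready = [n for n in self.standby if n.status == "ACTIVE"]
            if not ready:
                break
            self.standby.remove(ready[0])
            self.members[i] = ready[0]
            replaced += 1
        return replaced
```

In this sketch a scale out that is fully covered by standby nodes leaves `created` unchanged, which is the effect the spec is after. How removed error nodes are cleaned up, and when the standby nodes are refilled, is left open here because the spec does not define it.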

Alternatives

Any other ideas for fast scaling of a cluster.

Data model impact

None

REST API impact

None

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

The standby nodes will claim some resources. We should keep the number of standby nodes within a reasonable range.
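
One possible way to keep the number within a reasonable range is to clamp the requested standby count. The sketch below is only an assumption for discussion: the function name `clamp_standby` and the `ratio` parameter (standby nodes as a fraction of the desired capacity) are hypothetical and are not part of this spec. The only rule taken from the spec is that standby nodes plus cluster nodes must stay below the max size.

```python
def clamp_standby(requested: int, desired: int, max_size: int,
                  ratio: float = 0.5) -> int:
    """Return a standby count bounded by cluster headroom and a ratio cap.

    `ratio` is a hypothetical tuning knob, not something defined by senlin.
    """
    # Spec rule: desired + standby must be strictly less than max_size.
    headroom = max_size - desired - 1
    # Assumed policy: standby nodes never exceed ratio * desired capacity.
    cap = int(desired * ratio)
    return max(0, min(requested, headroom, cap))
```

With a cluster of 8 nodes and a max size of 10, for example, at most one standby node fits under the spec's rule regardless of how many are requested.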

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

chohoor(Hongbin Li) <chohoor@gmail.com>

Work Items

Depends on the design plan.

Dependencies

None

Testing

Unit tests are needed.

Documentation Impact

Documentation about the API and operations should be updated.

References

None

History

None