diff --git a/doc/source/userguide.rst b/doc/source/userguide.rst
index b770900a27..d894b2e343 100644
--- a/doc/source/userguide.rst
+++ b/doc/source/userguide.rst
@@ -1850,8 +1850,122 @@ proceed as follows:
 
 Now restart heat.
 
-*To be filled in*
-Include auto scaling
+Containers and nodes
+--------------------
+
+Scaling containers and nodes refers to increasing or decreasing
+allocated system resources. Scaling is a broad topic and involves
+many dimensions. In the context of Magnum in this guide, we consider
+the following issues:
+
+- Scaling containers and scaling cluster nodes (infrastructure)
+- Manual and automatic scaling
+
+Since this is an active area of development, a complete solution
+covering all issues does not exist yet, but partial solutions are
+emerging.
+
+Scaling containers involves managing the number of instances of a
+container by replicating or deleting instances. This can be used to
+respond to changes in the workload supported by the application; in
+this case, it is typically driven by metrics relevant to the
+application, such as response time. Other use cases include rolling
+upgrades, where a new version of a service can gradually be scaled up
+while the older version is gradually scaled down. Scaling containers
+is supported at the COE level and is specific to each COE as well as
+the version of the COE. You will need to refer to the documentation
+for the proper COE version for full details, but the following are
+some pointers for reference.
+
+For Kubernetes, pods are scaled manually by setting the count in the
+replication controller. Kubernetes version 1.3 and later also
+supports `autoscaling
+`_.
+For Docker, the tool 'Docker Compose' provides the command
+`docker-compose scale
+`_, which lets you
+manually set the number of instances of a container. For Swarm
+version 1.12 and later, services can also be scaled manually through
+the command `docker service scale
+`_.
+Automatic scaling for Swarm is not yet available. Mesos manages the
+resources and does not support scaling directly; instead, this is
+provided by frameworks running within Mesos. With the Marathon
+framework currently supported in the Mesos cluster, you can use the
+`scale operation
+`_
+on the Marathon UI or through a REST API call to manually set the
+attribute 'instances' for a container.
+
+Scaling the cluster nodes involves managing the number of nodes in
+the cluster by adding or removing nodes. There is no direct
+correlation between the number of nodes and the number of containers
+that can be hosted since the resources consumed (memory, CPU, etc.)
+depend on the containers. However, if a certain resource is exhausted
+in the cluster, adding more nodes would add more resources for
+hosting more containers. As part of the infrastructure management,
+Magnum supports manual scaling through the attribute 'node_count' in
+the cluster, so you can scale the cluster simply by changing this
+attribute::
+
+    magnum cluster-update mycluster replace node_count=2
+
+Refer to the `Scale`_ lifecycle operation section for more details.
+
+Adding nodes to a cluster is straightforward: Magnum deploys
+additional VMs or baremetal servers through the heat templates and
+invokes the COE-specific mechanism for registering the new nodes to
+update the available resources in the cluster. Afterward, it is up to
+the COE or the user to re-balance the workload by launching new
+container instances or re-launching dead instances on the new nodes.
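+
+As a minimal illustration, assuming a Kubernetes cluster named
+'mycluster' (as in the example above) and a replication controller
+named 'my-rc' (a hypothetical name), you could add a node, confirm
+that it has registered with the Kubernetes master, and then
+re-balance the workload by scaling up the replication controller::
+
+    # Add a node to the cluster through Magnum
+    magnum cluster-update mycluster replace node_count=3
+
+    # Verify that the new node has registered with Kubernetes
+    kubectl get nodes
+
+    # Launch additional pod instances so the new node picks up work
+    kubectl scale rc my-rc --replicas=5
+
+The exact 'kubectl' syntax depends on the Kubernetes version deployed
+in the cluster, so treat this as a sketch rather than an exact recipe.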
+
+Removing nodes from a cluster requires more care to ensure continuous
+operation of the containers, since the nodes being removed may be
+actively hosting containers. Magnum applies a simple heuristic that
+is specific to the COE to find the best node candidates for removal,
+as follows:
+
+Kubernetes
+  Magnum scans the pods in the namespace 'Default' to determine the
+  nodes that are *not* hosting any pods (empty nodes). If the number
+  of nodes to be removed is equal to or less than the number of these
+  empty nodes, these nodes will be removed from the cluster. If the
+  number of nodes to be removed is larger than the number of empty
+  nodes, a warning message will be sent to the Magnum log, and the
+  empty nodes along with additional nodes will be removed from the
+  cluster. The additional nodes are selected randomly and the pods
+  running on them will be deleted without warning. For this reason, a
+  good practice is to manage the pods through the replication
+  controller so that the deleted pods will be relaunched elsewhere in
+  the cluster. Note also that even when only the empty nodes are
+  removed, there is no guarantee that no pod will be deleted, because
+  there is no locking to ensure that Kubernetes will not launch new
+  pods on these nodes after Magnum has scanned the pods.
+
+Swarm
+  No node selection heuristic is currently supported. If you decrease
+  the node_count, a node will be chosen by Magnum without
+  consideration of what containers are running on the selected node.
+
+Mesos
+  No node selection heuristic is currently supported. If you decrease
+  the node_count, a node will be chosen by Magnum without
+  consideration of what containers are running on the selected node.
+
+Currently, scaling containers and scaling cluster nodes are handled
+separately, but in many use cases there are interactions between the
+two operations. For instance, scaling up the containers may exhaust
+the available resources in the cluster, thereby requiring scaling up
+the cluster nodes as well. Many complex issues are involved in
+managing this interaction. A presentation at the OpenStack Tokyo
+Summit 2015 covered some of these issues along with some early
+proposals, `Exploring Magnum and Senlin integration for autoscaling
+containers
+`_.
+This remains an active area of discussion and research.
+
 
 =======
 Storage