Technical considerations

Converting an existing OpenStack environment that was designed for a different purpose to be massively scalable is a formidable task. When building a massively scalable environment from the ground up, make sure the initial deployment is built with the same principles and choices that apply as the environment grows. For example, a good approach is to deploy the first site as a multi-site environment. This allows the same deployment and segregation methods to be used as the environment grows to separate locations across dedicated links or wide area networks.

In a hyperscale cloud, scale trumps redundancy. Applications must be modified with this in mind, relying on the scale and homogeneity of the environment to provide reliability rather than on redundant infrastructure provided by non-commodity hardware solutions.
Infrastructure segregation

Fortunately, OpenStack services are designed to support massive horizontal scale. The same is not true of the entire supporting infrastructure, however. This is particularly a problem for the database management systems and message queues that the various OpenStack services use for data storage and remote procedure call communications. Traditional clustering techniques typically provide high availability and some additional scale for these components. In the quest for massive scale, however, additional steps need to be taken to relieve the performance pressure on these components and prevent them from negatively impacting the overall performance of the environment. It is important to keep all the components in balance so that, if the environment ever does reach its limits, every component fails at, or close to, maximum capacity rather than one subsystem becoming a premature bottleneck.

Regions segregate completely independent installations linked only by a shared Identity service and, optionally, a shared Dashboard. Each region has its own API endpoints for every service, complete with separate database and message queue installations. This exposes some awareness of the environment's fault domains to users and lets them build in a degree of application resiliency, at the cost of requiring them to specify which region each action applies to.

Environments operating at massive scale typically need their regions or sites subdivided further, without requiring users to specify the failure domain. This divides the installation into smaller failure domains and also provides a logical unit for maintenance and for the addition of new hardware. At hyperscale, instead of adding single compute nodes, administrators may add entire racks, or even groups of racks, at a time, with each new addition of nodes exposed through one of the segregation concepts described here.

Cells subdivide the compute portion of an OpenStack installation, including regions, while still exposing a single API endpoint. In each region an API cell is created along with a number of compute cells where the workloads actually run. Each cell gets its own database and message queue setup (ideally clustered), which subdivides the load on these subsystems and improves overall performance. Each compute cell contains a complete compute installation, with its own database, message queue, scheduler, conductor, and multiple compute hosts. The cells scheduler places user requests arriving at the single API endpoint onto a specific cell, and the normal filter scheduler then handles placement within that cell.

The downside of cells is that they are not well supported by OpenStack services other than Compute, and they do not adequately support some relatively standard OpenStack functionality such as security groups and host aggregates. Due to their relative newness and specialized use, they receive relatively little testing in the OpenStack gate. Despite these issues, cells are used in some very well known OpenStack installations operating at massive scale, including those at CERN and Rackspace.
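As a brief illustration of how region segregation surfaces to users, the following sketch uses the openstacksdk to connect to one named region and boot an instance there. The cloud name, region name, and the image, flavor, and network identifiers are placeholders rather than values from this guide.

```python
# Minimal sketch: targeting a specific region with openstacksdk.
# The cloud entry "hyperscale" is assumed to exist in clouds.yaml;
# the region, image, flavor, and network names are placeholders.
import openstack

# Each region publishes its own service endpoints in the Identity
# catalog, so the client must state which region a request applies to.
conn = openstack.connect(cloud='hyperscale', region_name='RegionTwo')

image = conn.image.find_image('ubuntu-22.04')
flavor = conn.compute.find_flavor('m1.medium')
network = conn.network.find_network('app-net')

# The server is created in RegionTwo only; other regions are
# completely independent installations.
server = conn.compute.create_server(
    name='web-01',
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{'uuid': network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.status)
```

The need to pass a region name on every connection is exactly the trade-off described above: users gain visibility into fault domains, but must decide where each workload runs.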
Host aggregates

Host aggregates enable partitioning of OpenStack Compute deployments into logical groups for load balancing and instance distribution. Host aggregates may also be used to further partition an availability zone. For example, a cloud might use host aggregates to partition an availability zone into groups of hosts that either share common resources, such as storage and network, or have a special property, such as trusted computing hardware. Host aggregates are not explicitly user-targetable; instead they are implicitly targeted via the selection of instance flavors with extra specifications that map to host aggregate metadata.
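The sketch below shows one way an operator might wire up that implicit targeting with the openstacksdk: group hosts into an aggregate, tag it with metadata, and create a flavor whose extra specs match that metadata. Host names, the aggregate name, and the flavor values are placeholders, and the method names follow the openstacksdk compute proxy of recent releases, so they may differ in older versions.

```python
# Minimal sketch: mapping a flavor to a host aggregate via metadata
# and extra specs. Host, aggregate, and flavor names are placeholders;
# the scheduler must have the AggregateInstanceExtraSpecsFilter
# enabled for the mapping to take effect.
import openstack

conn = openstack.connect(cloud='hyperscale')

# Group the SSD-backed hosts into an aggregate and tag it.
aggregate = conn.compute.create_aggregate(name='ssd-hosts')
for host in ('compute-ssd-01', 'compute-ssd-02'):
    conn.compute.add_host_to_aggregate(aggregate, host)
conn.compute.set_aggregate_metadata(aggregate, {'ssd': 'true'})

# A flavor whose extra specs match the aggregate metadata; instances
# booted with this flavor land only on hosts in the aggregate, without
# the user ever naming the aggregate itself.
flavor = conn.compute.create_flavor(
    name='m1.ssd', ram=4096, vcpus=2, disk=40)
conn.compute.create_flavor_extra_specs(
    flavor, {'aggregate_instance_extra_specs:ssd': 'true'})
```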
Availability zones

Availability zones provide another mechanism for subdividing an installation or region. They are, in effect, host aggregates exposed for optional, explicit targeting by users. Unlike cells, they do not have their own database server or queue broker; they simply represent an arbitrary grouping of compute nodes. Nodes are typically grouped into availability zones based on a shared failure domain defined by a physical characteristic, such as a common power source or physical network connection. Availability zones are exposed to users so that they can be targeted, but users are not required to target them. Alternatively, the operator can configure a default availability zone so that instances are scheduled to a zone other than nova, the built-in default.
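The short sketch below illustrates the optional, explicit side of this: a user pinning an instance to one availability zone at boot time with the openstacksdk. The zone name and the image, flavor, and network identifiers are placeholders.

```python
# Minimal sketch: explicitly targeting an availability zone.
# Zone, image, flavor, and network names are placeholders; omitting
# availability_zone leaves placement entirely to the scheduler.
import openstack

conn = openstack.connect(cloud='hyperscale')

server = conn.compute.create_server(
    name='db-01',
    image_id=conn.image.find_image('ubuntu-22.04').id,
    flavor_id=conn.compute.find_flavor('m1.medium').id,
    networks=[{'uuid': conn.network.find_network('app-net').id}],
    # Pin the instance to the failure domain behind one power feed.
    availability_zone='power-a',
)
```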
Segregation example

In this example the cloud is divided into two regions, one for each site, with two availability zones in each based on the power layout of the data centers. A number of host aggregates have also been defined so that flavors can target virtual machine instances at hosts sharing special capabilities, such as SSDs, 10 GbE networks, or GPU cards.
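Putting the three mechanisms together, a single user request in this example layout might look like the sketch below: a connection to one region, an explicit availability zone matching one power feed, and a flavor whose extra specs (set by the operator) steer the instance onto the GPU aggregate. All region, zone, flavor, image, and network names are illustrative.

```python
# Minimal sketch of a request in the example layout: region "site-2",
# availability zone "power-b", and a GPU flavor assumed to carry extra
# specs that map to a GPU host aggregate. All names are illustrative.
import openstack

# The region is chosen by the client; the availability zone is chosen
# explicitly per request; the host aggregate is chosen implicitly via
# the flavor's extra specs.
conn = openstack.connect(cloud='hyperscale', region_name='site-2')

server = conn.compute.create_server(
    name='training-01',
    image_id=conn.image.find_image('ubuntu-22.04').id,
    flavor_id=conn.compute.find_flavor('g1.large').id,
    networks=[{'uuid': conn.network.find_network('app-net').id}],
    availability_zone='power-b',
)
```

Note how only the region and availability zone appear in the request: the GPU aggregate stays an operator-side concept, which keeps the failure-domain detail hidden from users exactly as described above.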