Architecture

Architecture Hardware selection involves three key areas: Compute Network Storage Selecting hardware for a general purpose OpenStack cloud should reflect a cloud with no pre-defined usage model. General purpose clouds are designed to run a wide variety of applications with varying resource usage requirements. These applications include any of the following: RAM-intensive CPU-intensive Storage-intensive Choosing hardware for a general purpose OpenStack cloud must provide balanced access to all major resources. Certain hardware form factors may better suit a general purpose OpenStack cloud due to the requirement for equal (or nearly equal) balance of resources. Server hardware must provide the following: Equal (or nearly equal) balance of compute capacity (RAM and CPU) Network capacity (number and speed of links) Storage capacity (gigabytes or terabytes as well as Input/Output Operations Per Second (IOPS) Server hardware is evaluated around four conflicting dimensions. Server density A measure of how many servers can fit into a given measure of physical space, such as a rack unit [U]. Resource capacity The number of CPU cores, how much RAM, or how much storage a given server will deliver. Expandability The number of additional resources that can be added to a server before it has reached its limit. Cost The relative purchase price of the hardware weighted against the level of design effort needed to build the system. Increasing server density means sacrificing resource capacity or expandability, however, increasing resource capacity and expandability increases cost and decreases server density. As a result, determining the best server hardware for a general purpose OpenStack architecture means understanding how choice of form factor will impact the rest of the design. The following list outlines the form factors to choose from: Blade servers typically support dual-socket multi-core CPUs, which is the configuration generally considered to be the "sweet spot" for a general purpose cloud deployment. Blades also offer outstanding density. As an example, both HP BladeSystem and Dell PowerEdge M1000e support up to 16 servers in only 10 rack units. However, the blade servers themselves often have limited storage and networking capacity. Additionally, the expandability of many blade servers can be limited. 1U rack-mounted servers occupy only a single rack unit. Their benefits include high density, support for dual-socket multi-core CPUs, and support for reasonable RAM amounts. This form factor offers limited storage capacity, limited network capacity, and limited expandability. 2U rack-mounted servers offer the expanded storage and networking capacity that 1U servers tend to lack, but with a corresponding decrease in server density (half the density offered by 1U rack-mounted servers). Larger rack-mounted servers, such as 4U servers, will tend to offer even greater CPU capacity, often supporting four or even eight CPU sockets. These servers often have much greater expandability so will provide the best option for upgradability. This means, however, that the servers have a much lower server density and a much greater hardware cost. "Sled servers" are rack-mounted servers that support multiple independent servers in a single 2U or 3U enclosure. This form factor offers increased density over typical 1U-2U rack-mounted servers but tends to suffer from limitations in the amount of storage or network capacity each individual server supports. The best form factor for server hardware supporting a general purpose OpenStack cloud is driven by outside business and cost factors. No single reference architecture will apply to all implementations; the decision must flow from user requirements, technical considerations, and operational considerations. Here are some of the key factors that influence the selection of server hardware: Instance density Sizing is an important consideration for a general purpose OpenStack cloud. The expected or anticipated number of instances that each hypervisor can host is a common meter used in sizing the deployment. The selected server hardware needs to support the expected or anticipated instance density. Host density Physical data centers have limited physical space, power, and cooling. The number of hosts (or hypervisors) that can be fitted into a given metric (rack, rack unit, or floor tile) is another important method of sizing. Floor weight is an often overlooked consideration. The data center floor must be able to support the weight of the proposed number of hosts within a rack or set of racks. These factors need to be applied as part of the host density calculation and server hardware selection. Power density Data centers have a specified amount of power fed to a given rack or set of racks. Older data centers may have a power density as power as low as 20 AMPs per rack, while more recent data centers can be architected to support power densities as high as 120 AMP per rack. The selected server hardware must take power density into account. Network connectivity The selected server hardware must have the appropriate number of network connections, as well as the right type of network connections, in order to support the proposed architecture. Ensure that, at a minimum, there are at least two diverse network connections coming into each rack. For architectures requiring even more redundancy, it might be necessary to confirm that the network connections are from diverse telecom providers. Many data centers have that capacity available. The selection of form factors or architectures affects the selection of server hardware. For example, if the design is a scale-out storage architecture, then the server hardware selection will require careful consideration when matching the requirements set to the commercial solution. Ensure that the selected server hardware is configured to support enough storage capacity (or storage expandability) to match the requirements of selected scale-out storage solution. For example, if a centralized storage solution is required, such as a centralized storage array from a storage vendor that has InfiniBand or FDDI connections, the server hardware will need to have appropriate network adapters installed to be compatible with the storage array vendor's specifications. Similarly, the network architecture will have an impact on the server hardware selection and vice versa. For example, make sure that the server is configured with enough additional network ports and expansion cards to support all of the networks required. There is variability in network expansion cards, so it is important to be aware of potential impacts or interoperability issues with other components in the architecture.

Selecting storage hardware Storage hardware architecture is largely determined by the selected storage architecture. The selection of storage architecture, as well as the corresponding storage hardware, is determined by evaluating possible solutions against the critical factors, the user requirements, technical considerations, and operational considerations. Factors that need to be incorporated into the storage architecture include: Cost Storage can be a significant portion of the overall system cost. For an organization that is concerned with vendor support, a commercial storage solution is advisable, although it comes with a higher price tag. If initial capital expenditure requires minimization, designing a system based on commodity hardware would apply. The trade-off is potentially higher support costs and a greater risk of incompatibility and interoperability issues. Scalability Scalability, along with expandability, is a major consideration in a general purpose OpenStack cloud. It might be difficult to predict the final intended size of the implementation as there are no established usage patterns for a general purpose cloud. It might become necessary to expand the initial deployment in order to accommodate growth and user demand. Expandability Expandability is a major architecture factor for storage solutions with general purpose OpenStack cloud. A storage solution that expands to 50 PB is considered more expandable than a solution that only scales to 10 PB. This meter is related to, but different, from scalability, which is a measure of the solution's performance as it expands. For example, the storage architecture for a cloud that is intended for a development platform may not have the same expandability and scalability requirements as a cloud that is intended for a commercial product. Using a scale-out storage solution with direct-attached storage (DAS) in the servers is well suited for a general purpose OpenStack cloud. For example, it is possible to populate storage in either the compute hosts similar to a grid computing solution, or into hosts dedicated to providing block storage exclusively. When deploying storage in the compute hosts appropriate hardware, that can support both the storage and compute services on the same hardware, will be required. Understanding the requirements of cloud services will help determine what scale-out solution should be used. Determining if a single, highly expandable and highly vertical, scalable, centralized storage array should be included in the design. Once an approach has been determined, the storage hardware needs to be selected based on this criteria. This list expands upon the potential impacts for including a particular storage architecture (and corresponding storage hardware) into the design for a general purpose OpenStack cloud: Connectivity Ensure that, if storage protocols other than Ethernet are part of the storage solution, the appropriate hardware has been selected. If a centralized storage array is selected, ensure that the hypervisor will be able to connect to that storage array for image storage. Usage How the particular storage architecture will be used is critical for determining the architecture. Some of the configurations that will influence the architecture include whether it will be used by the hypervisors for ephemeral instance storage or if OpenStack Object Storage will use it for object storage. Instance and image locations Where instances and images will be stored will influence the architecture. Server hardware If the solution is a scale-out storage architecture that includes DAS, it will affect the server hardware selection. This could ripple into the decisions that affect host density, instance density, power density, OS-hypervisor, management tools and others. General purpose OpenStack cloud has multiple options. The key factors that will have an influence on selection of storage hardware for a general purpose OpenStack cloud are as follows: Capacity Hardware resources selected for the resource nodes should be capable of supporting enough storage for the cloud services. Defining the initial requirements and ensuring the design can support adding capacity is important. Hardware nodes selected for object storage should be capable of support a large number of inexpensive disks with no reliance on RAID controller cards. Hardware nodes selected for block storage should be capable of supporting high speed storage solutions and RAID controller cards to provide performance and redundancy to storage at a hardware level. Selecting hardware RAID controllers that automatically repair damaged arrays will assist with the replacement and repair of degraded or destroyed storage devices. Performance Disks selected for object storage services do not need to be fast performing disks. We recommend that object storage nodes take advantage of the best cost per terabyte available for storage. Contrastingly, disks chosen for block storage services should take advantage of performance boosting features that may entail the use of SSDs or flash storage to provide high performance block storage pools. Storage performance of ephemeral disks used for instances should also be taken into consideration. If compute pools are expected to have a high utilization of ephemeral storage, or requires very high performance, it would be advantageous to deploy similar hardware solutions to block storage. Fault tolerance Object storage resource nodes have no requirements for hardware fault tolerance or RAID controllers. It is not necessary to plan for fault tolerance within the object storage hardware because the object storage service provides replication between zones as a feature of the service. Block storage nodes, compute nodes and cloud controllers should all have fault tolerance built in at the hardware level by making use of hardware RAID controllers and varying levels of RAID configuration. The level of RAID chosen should be consistent with the performance and availability requirements of the cloud.

Selecting networking hardware Selecting network architecture determines which network hardware will be used. Networking software is determined by the selected networking hardware. For example, selecting networking hardware that only supports Gigabit Ethernet (GbE) will impact the overall design. Similarly, deciding to use 10 Gigabit Ethernet (10 GbE) will have a number of impacts on various areas of the overall design. There are more subtle design impacts that need to be considered. The selection of certain networking hardware (and the networking software) affects the management tools that can be used. There are exceptions to this; the rise of "open" networking software that supports a range of networking hardware means that there are instances where the relationship between networking hardware and networking software are not as tightly defined. An example of this type of software is Cumulus Linux, which is capable of running on a number of switch vendor's hardware solutions. Some of the key considerations that should be included in the selection of networking hardware include: Port count The design will require networking hardware that has the requisite port count. Port density The network design will be affected by the physical space that is required to provide the requisite port count. A higher port density is preferred, as it leaves more rack space for compute or storage components that may be required by the design. This can also lead into concerns about fault domains and power density that should be considered. Higher density switches are more expensive and should also be considered, as it is important not to over design the network if it is not required. Port speed The networking hardware must support the proposed network speed, for example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE). Redundancy The level of network hardware redundancy required is influenced by the user requirements for high availability and cost considerations. Network redundancy can be achieved by adding redundant power supplies or paired switches. If this is a requirement, the hardware will need to support this configuration. Power requirements Ensure that the physical data center provides the necessary power for the selected network hardware. This may be an issue for spine switches in a leaf and spine fabric, or end of row (EoR) switches. There is no single best practice architecture for the networking hardware supporting a general purpose OpenStack cloud that will apply to all implementations. Some of the key factors that will have a strong influence on selection of networking hardware include: Connectivity All nodes within an OpenStack cloud require network connectivity. In some cases, nodes require access to more than one network segment. The design must encompass sufficient network capacity and bandwidth to ensure that all communications within the cloud, both north-south and east-west traffic have sufficient resources available. Scalability The network design should encompass a physical and logical network design that can be easily expanded upon. Network hardware should offer the appropriate types of interfaces and speeds that are required by the hardware nodes. Availability To ensure that access to nodes within the cloud is not interrupted, we recommend that the network architecture identify any single points of failure and provide some level of redundancy or fault tolerance. With regard to the network infrastructure itself, this often involves use of networking protocols such as LACP, VRRP or others to achieve a highly available network connection. In addition, it is important to consider the networking implications on API availability. In order to ensure that the APIs, and potentially other services in the cloud are highly available, we recommend you design a load balancing solution within the network architecture to accommodate for these requirements.

Software selection Software selection for a general purpose OpenStack architecture design needs to include these three areas: Operating system (OS) and hypervisor OpenStack components Supplemental software

Operating system and hypervisor The operating system (OS) and hypervisor have a significant impact on the overall design. Selecting a particular operating system and hypervisor can directly affect server hardware selection. Make sure the storage hardware and topology support the selected operating system and hypervisor combination. Also ensure the networking hardware selection and topology will work with the chosen operating system and hypervisor combination. For example, if the design uses Link Aggregation Control Protocol (LACP), the OS and hypervisor both need to support it. Some areas that could be impacted by the selection of OS and hypervisor include: Cost Selecting a commercially supported hypervisor, such as Microsoft Hyper-V, will result in a different cost model rather than community-supported open source hypervisors including KVM, Kinstance or Xen. When comparing open source OS solutions, choosing Ubuntu over Red Hat (or vice versa) will have an impact on cost due to support contracts. Supportability Depending on the selected hypervisor, staff should have the appropriate training and knowledge to support the selected OS and hypervisor combination. If they do not, training will need to be provided which could have a cost impact on the design. Management tools The management tools used for Ubuntu and Kinstance differ from the management tools for VMware vSphere. Although both OS and hypervisor combinations are supported by OpenStack, there will be very different impacts to the rest of the design as a result of the selection of one combination versus the other. Scale and performance Ensure that selected OS and hypervisor combinations meet the appropriate scale and performance requirements. The chosen architecture will need to meet the targeted instance-host ratios with the selected OS-hypervisor combinations. Security Ensure that the design can accommodate regular periodic installations of application security patches while maintaining required workloads. The frequency of security patches for the proposed OS-hypervisor combination will have an impact on performance and the patch installation process could affect maintenance windows. Supported features Determine which features of OpenStack are required. This will often determine the selection of the OS-hypervisor combination. Some features are only available with specific OSs or hypervisors. For example, if certain features are not available, the design might need to be modified to meet the user requirements. Interoperability You will need to consider how the OS and hypervisor combination interactions with other operating systems and hypervisors, including other software solutions. Operational troubleshooting tools for one OS-hypervisor combination may differ from the tools used for another OS-hypervisor combination and, as a result, the design will need to address if the two sets of tools need to interoperate.

OpenStack components Selecting which OpenStack components are included in the overall design can have a significant impact. Some OpenStack components, like compute and Image service, are required in every architecture. Other components, like Orchestration, are not always required. Excluding certain OpenStack components can limit or constrain the functionality of other components. For example, if the architecture includes Orchestration but excludes Telemetry, then the design will not be able to take advantage of Orchestrations' auto scaling functionality. It is important to research the component interdependencies in conjunction with the technical requirements before deciding on the final architecture.

Supplemental components While OpenStack is a fairly complete collection of software projects for building a platform for cloud services, there are invariably additional pieces of software that need to be considered in any given OpenStack design.

Networking software OpenStack Networking provides a wide variety of networking services for instances. There are many additional networking software packages that can be useful when managing OpenStack components. Some examples include: Software to provide load balancing Network redundancy protocols Routing daemons Some of these software packages are described in more detail in the OpenStack High Availability Guide (refer to the Network controller cluster stack chapter of the OpenStack High Availability Guide). For a general purpose OpenStack cloud, the OpenStack infrastructure components need to be highly available. If the design does not include hardware load balancing, networking software packages like HAProxy will need to be included.

Management software Selected supplemental software solution impacts and affects the overall OpenStack cloud design. This includes software for providing clustering, logging, monitoring and alerting. Inclusion of clustering software, such as Corosync or Pacemaker, is determined primarily by the availability requirements. The impact of including (or not including) these software packages is primarily determined by the availability of the cloud infrastructure and the complexity of supporting the configuration after it is deployed. The OpenStack High Availability Guide provides more details on the installation and configuration of Corosync and Pacemaker, should these packages need to be included in the design. Requirements for logging, monitoring, and alerting are determined by operational considerations. Each of these sub-categories includes a number of various options. For example, in the logging sub-category one might consider Logstash, Splunk, instanceware Log Insight, or some other log aggregation-consolidation tool. Logs should be stored in a centralized location to make it easier to perform analytics against the data. Log data analytics engines can also provide automation and issue notification by providing a mechanism to both alert and automatically attempt to remediate some of the more commonly known issues. If these software packages are required, the design must account for the additional resource consumption (CPU, RAM, storage, and network bandwidth). Some other potential design impacts include: OS-hypervisor combination: Ensure that the selected logging, monitoring, or alerting tools support the proposed OS-hypervisor combination. Network hardware: The network hardware selection needs to be supported by the logging, monitoring, and alerting software.

Database software OpenStack components often require access to back-end database services to store state and configuration information. Selecting an appropriate back-end database that satisfies the availability and fault tolerance requirements of the OpenStack services is required. OpenStack services supports connecting to a database that is supported by the SQLAlchemy python drivers, however, most common database deployments make use of MySQL or variations of it. We recommend that the database, which provides back-end service within a general purpose cloud, be made highly available when using an available technology which can accomplish that goal.

Addressing performance-sensitive workloads Although one of the key defining factors for a general purpose OpenStack cloud is that performance is not a determining factor, there may still be some performance-sensitive workloads deployed on the general purpose OpenStack cloud. For design guidance on performance-sensitive workloads, we recommend that you refer to the focused scenarios later in this guide. The resource-focused guides can be used as a supplement to this guide to help with decisions regarding performance-sensitive workloads.

Compute-focused workloads In an OpenStack cloud that is compute-focused, there are some design choices that can help accommodate those workloads. Compute-focused workloads demand more CPU and memory resources with lower priority given to storage and network performance. For guidance on designing for this type of cloud, please refer to .

Network-focused workloads In a network-focused OpenStack cloud, some design choices can improve the performance of these types of workloads. Network-focused workloads have extreme demands on network bandwidth and services that require specialized consideration and planning. For guidance on designing for this type of cloud, please refer to .

Storage-focused workloads Storage focused OpenStack clouds need to be designed to accommodate workloads that have extreme demands on either object or block storage services. For guidance on designing for this type of cloud, please refer to .