Architecture ~~~~~~~~~~~~ Consider the following factors when selecting storage hardware: * Cost * Performance * Reliability Storage-focused OpenStack clouds must address I/O intensive workloads. These workloads are not CPU intensive, nor are they consistently network intensive. The network may be heavily utilized to transfer storage, but they are not otherwise network intensive. The selection of storage hardware determines the overall performance and scalability of a storage-focused OpenStack design architecture. Several factors impact the design process, including: Cost The cost of components affects which storage architecture and hardware you choose. Performance The latency of storage I/O requests indicates performance. Performance requirements affect which solution you choose. Scalability Scalability refers to how the storage solution performs as it expands to its maximum size. Storage solutions that perform well in small configurations but have degraded performance in large configurations are not scalable. A solution that performs well at maximum expansion is scalable. Large deployments require a storage solution that performs well as it expands. Latency is a key consideration in a storage-focused OpenStack cloud. Using solid-state disks (SSDs) to minimize latency and, to reduce CPU delays caused by waiting for the storage, increases performance. Use RAID controller cards in compute hosts to improve the performance of the underlying disk subsystem. Depending on the storage architecture, you can adopt a scale-out solution, or use a highly expandable and scalable centralized storage array. If a centralized storage array is the right fit for your requirements, then the array vendor determines the hardware selection. It is possible to build a storage array using commodity hardware with Open Source software, but requires people with expertise to build such a system. On the other hand, a scale-out storage solution that uses direct-attached storage (DAS) in the servers may be an appropriate choice. This requires configuration of the server hardware to support the storage solution. Considerations affecting storage architecture (and corresponding storage hardware) of a Storage-focused OpenStack cloud include: Connectivity Based on the selected storage solution, ensure the connectivity matches the storage solution requirements. We recommended confirming that the network characteristics minimize latency to boost the overall performance of the design. Latency Determine if the use case has consistent or highly variable latency. Throughput Ensure that the storage solution throughput is optimized for your application requirements. Server hardware Use of DAS impacts the server hardware choice and affects host density, instance density, power density, OS-hypervisor, and management tools. Compute (server) hardware selection ----------------------------------- Four opposing factors determine the compute (server) hardware selection: Server density A measure of how many servers can fit into a given measure of physical space, such as a rack unit [U]. Resource capacity The number of CPU cores, how much RAM, or how much storage a given server delivers. Expandability The number of additional resources you can add to a server before it reaches capacity. Cost The relative cost of the hardware weighed against the level of design effort needed to build the system. You must weigh the dimensions against each other to determine the best design for the desired purpose. For example, increasing server density can mean sacrificing resource capacity or expandability. Increasing resource capacity and expandability can increase cost but decrease server density. Decreasing cost often means decreasing supportability, server density, resource capacity, and expandability. Compute capacity (CPU cores and RAM capacity) is a secondary consideration for selecting server hardware. As a result, the required server hardware must supply adequate CPU sockets, additional CPU cores, and more RAM; network connectivity and storage capacity are not as critical. The hardware needs to provide enough network connectivity and storage capacity to meet the user requirements, however they are not the primary consideration. Some server hardware form factors are better suited to storage-focused designs than others. The following is a list of these form factors: * Most blade servers support dual-socket multi-core CPUs. Choose either full width or full height blades to avoid the limit. High density blade servers support up to 16 servers in only 10 rack units using half height or half width blades. .. warning:: This decreases density by 50% (only 8 servers in 10 U) if a full width or full height option is used. * 1U rack-mounted servers have the ability to offer greater server density than a blade server solution, but are often limited to dual-socket, multi-core CPU configurations. .. note:: Due to cooling requirements, it is rare to see 1U rack-mounted servers with more than 2 CPU sockets. To obtain greater than dual-socket support in a 1U rack-mount form factor, customers need to buy their systems from Original Design Manufacturers (ODMs) or second-tier manufacturers. .. warning:: This may cause issues for organizations that have preferred vendor policies or concerns with support and hardware warranties of non-tier 1 vendors. * 2U rack-mounted servers provide quad-socket, multi-core CPU support but with a corresponding decrease in server density (half the density offered by 1U rack-mounted servers). * Larger rack-mounted servers, such as 4U servers, often provide even greater CPU capacity. Commonly supporting four or even eight CPU sockets. These servers have greater expandability but such servers have much lower server density and usually greater hardware cost. * Rack-mounted servers that support multiple independent servers in a single 2U or 3U enclosure, "sled servers", deliver increased density as compared to a typical 1U-2U rack-mounted servers. Other factors that influence server hardware selection for a storage-focused OpenStack design architecture include: Instance density In this architecture, instance density and CPU-RAM oversubscription are lower. You require more hosts to support the anticipated scale, especially if the design uses dual-socket hardware designs. Host density Another option to address the higher host count is to use a quad-socket platform. Taking this approach decreases host density which also increases rack count. This configuration affects the number of power connections and also impacts network and cooling requirements. Power and cooling density The power and cooling density requirements might be lower than with blade, sled, or 1U server designs due to lower host density (by using 2U, 3U or even 4U server designs). For data centers with older infrastructure, this might be a desirable feature. Storage-focused OpenStack design architecture server hardware selection should focus on a "scale-up" versus "scale-out" solution. The determination of which is the best solution (a smaller number of larger hosts or a larger number of smaller hosts), depends on a combination of factors including cost, power, cooling, physical rack and floor space, support-warranty, and manageability. Networking hardware selection ----------------------------- Key considerations for the selection of networking hardware include: Port count The user requires networking hardware that has the requisite port count. Port density The physical space required to provide the requisite port count affects the network design. A switch that provides 48 10 GbE ports in 1U has a much higher port density than a switch that provides 24 10 GbE ports in 2U. On a general scale, a higher port density leaves more rack space for compute or storage components which is preferred. It is also important to consider fault domains and power density. Finally, higher density switches are more expensive, therefore it is important not to over design the network. Port speed The networking hardware must support the proposed network speed, for example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE). Redundancy User requirements for high availability and cost considerations influence the required level of network hardware redundancy. Achieve network redundancy by adding redundant power supplies or paired switches. .. note:: If this is a requirement, the hardware must support this configuration. User requirements determine if a completely redundant network infrastructure is required. Power requirements Ensure that the physical data center provides the necessary power for the selected network hardware. This is not an issue for top of rack (ToR) switches, but may be an issue for spine switches in a leaf and spine fabric, or end of row (EoR) switches. Protocol support It is possible to gain more performance out of a single storage system by using specialized network technologies such as RDMA, SRP, iSER and SCST. The specifics for using these technologies is beyond the scope of this book. Software selection ------------------ Factors that influence the software selection for a storage-focused OpenStack architecture design include: * Operating system (OS) and hypervisor * OpenStack components * Supplemental software Design decisions made in each of these areas impacts the rest of the OpenStack architecture design. Operating system and hypervisor ------------------------------- Operating system (OS) and hypervisor have a significant impact on the overall design and also affect server hardware selection. Ensure the selected operating system and hypervisor combination support the storage hardware and work with the networking hardware selection and topology. Operating system and hypervisor selection affect the following areas: Cost Selecting a commercially supported hypervisor, such as Microsoft Hyper-V, results in a different cost model than a community-supported open source hypervisor like Kinstance or Xen. Similarly, choosing Ubuntu over Red Hat (or vice versa) impacts cost due to support contracts. However, business or application requirements might dictate a specific or commercially supported hypervisor. Supportability Staff must have training with the chosen hypervisor. Consider the cost of training when choosing a solution. The support of a commercial product such as Red Hat, SUSE, or Windows, is the responsibility of the OS vendor. If an open source platform is chosen, the support comes from in-house resources. Management tools Ubuntu and Kinstance use different management tools than VMware vSphere. Although both OS and hypervisor combinations are supported by OpenStack, there are varying impacts to the rest of the design as a result of the selection of one combination versus the other. Scale and performance Ensure the selected OS and hypervisor combination meet the appropriate scale and performance requirements needed for this storage focused OpenStack cloud. The chosen architecture must meet the targeted instance-host ratios with the selected OS-hypervisor combination. Security Ensure the design can accommodate the regular periodic installation of application security patches while maintaining the required workloads. The frequency of security patches for the proposed OS-hypervisor combination impacts performance and the patch installation process could affect maintenance windows. Supported features Selecting the OS-hypervisor combination often determines the required features of OpenStack. Certain features are only available with specific OSes or hypervisors. For example, if certain features are not available, you might need to modify the design to meet user requirements. Interoperability The OS-hypervisor combination should be chosen based on the interoperability with one another, and other OS-hyervisor combinations. Operational and troubleshooting tools for one OS-hypervisor combination may differ from the tools used for another OS-hypervisor combination. As a result, the design must address if the two sets of tools need to interoperate. OpenStack components -------------------- The OpenStack components you choose can have a significant impact on the overall design. While there are certain components that are always present (Compute and Image service, for example), there are other services that may not be required. As an example, a certain design may not require the Orchestration service. Omitting Orchestration would not typically have a significant impact on the overall design, however, if the architecture uses a replacement for OpenStack Object Storage for its storage component, this could potentially have significant impacts on the rest of the design. A storage-focused design might require the ability to use Orchestration to launch instances with Block Storage volumes to perform storage-intensive processing. A storage-focused OpenStack design architecture uses the following components: * OpenStack Identity (keystone) * OpenStack dashboard (horizon) * OpenStack Compute (nova) (including the use of multiple hypervisor drivers) * OpenStack Object Storage (swift) (or another object storage solution) * OpenStack Block Storage (cinder) * OpenStack Image service (glance) * OpenStack Networking (neutron) or legacy networking (nova-network) Excluding certain OpenStack components may limit or constrain the functionality of other components. If a design opts to include Orchestration but exclude Telemetry, then the design cannot take advantage of Orchestration's auto scaling functionality (which relies on information from Telemetry). Due to the fact that you can use Orchestration to spin up a large number of instances to perform the compute-intensive processing, we strongly recommend including Orchestration in a compute-focused architecture design. Networking software ------------------- OpenStack Networking (neutron) provides a wide variety of networking services for instances. There are many additional networking software packages that may be useful to manage the OpenStack components themselves. Some examples include HAProxy, Keepalived, and various routing daemons (like Quagga). The OpenStack High Availability Guide describes some of these software packages, HAProxy in particular. See the `Network controller cluster stack chapter `_ of the OpenStack High Availability Guide. Management software ------------------- Management software includes software for providing: * Clustering * Logging * Monitoring * Alerting .. important:: The factors for determining which software packages in this category to select is outside the scope of this design guide. The availability design requirements determine the selection of Clustering Software, such as Corosync or Pacemaker. The availability of the cloud infrastructure and the complexity of supporting the configuration after deployment determines the impact of including these software packages. The OpenStack High Availability Guide provides more details on the installation and configuration of Corosync and Pacemaker. Operational considerations determine the requirements for logging, monitoring, and alerting. Each of these sub-categories includes options. For example, in the logging sub-category you could select Logstash, Splunk, Log Insight, or another log aggregation-consolidation tool. Store logs in a centralized location to facilitate performing analytics against the data. Log data analytics engines can also provide automation and issue notification, by providing a mechanism to both alert and automatically attempt to remediate some of the more commonly known issues. If you require any of these software packages, the design must account for the additional resource consumption. Some other potential design impacts include: * OS-Hypervisor combination: Ensure that the selected logging, monitoring, or alerting tools support the proposed OS-hypervisor combination. * Network hardware: The network hardware selection needs to be supported by the logging, monitoring, and alerting software. Database software ----------------- Most OpenStack components require access to back-end database services to store state and configuration information. Choose an appropriate back-end database which satisfies the availability and fault tolerance requirements of the OpenStack services. MySQL is the default database for OpenStack, but other compatible databases are available. .. note:: Telemetry uses MongoDB. The chosen high availability database solution changes according to the selected database. MySQL, for example, provides several options. Use a replication technology such as Galera for active-active clustering. For active-passive use some form of shared storage. Each of these potential solutions has an impact on the design: * Solutions that employ Galera/MariaDB require at least three MySQL nodes. * MongoDB has its own design considerations for high availability. * OpenStack design, generally, does not include shared storage. However, for some high availability designs, certain components might require it depending on the specific implementation.