Merge "[arch-design-draft] Migrate technical requirements content"

==================
Hardware selection
==================

Hardware selection involves three key areas:

* Network

* Compute

* Storage

Network hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

The network architecture determines which network hardware will be
used. Networking software is determined by the selected networking
hardware.

There are more subtle design impacts that need to be considered. The
selection of certain networking hardware (and the networking software)
affects the management tools that can be used. There are exceptions to
this; the rise of *open* networking software that supports a range of
networking hardware means there are instances where the relationship
between networking hardware and networking software is not as tightly
defined.

For a compute-focused architecture, we recommend designing the network
architecture using a scalable network model that makes it easy to add
capacity and bandwidth. A good example of such a model is the
leaf-spine model. In this type of network design, it is possible to
easily add additional bandwidth as well as scale out to additional
racks of gear. It is important to select network hardware that supports
the required port count, port speed, and port density while also
allowing for future growth as workload demands increase. It is also
important to evaluate where in the network architecture it is valuable
to provide redundancy.

Some of the key considerations in the selection of networking hardware
include:

Port count
   The design requires networking hardware that has the requisite
   port count.

Port density
   The network design is affected by the physical space required to
   provide the requisite port count. A higher port density is
   preferred, as it leaves more rack space for compute or storage
   components that may be required by the design. This can also lead
   into considerations about fault domains and power density. Higher
   density switches are more expensive, therefore it is important not
   to over design the network.

Port speed
   The networking hardware must support the proposed network speed,
   for example: 1 GbE, 10 GbE, 40 GbE, or even 100 GbE.

Redundancy
   User requirements for high availability and cost considerations
   influence the required level of network hardware redundancy.
   Network redundancy can be achieved by adding redundant power
   supplies or paired switches.

   .. note::

      If redundancy is a requirement, the hardware must support this
      configuration. User requirements determine if a completely
      redundant network infrastructure is required.

Power requirements
   Ensure that the physical data center provides the necessary power
   for the selected network hardware.

   .. note::

      This is not an issue for top of rack (ToR) switches. It may be
      an issue for spine switches in a leaf-spine fabric, or end of
      row (EoR) switches.

Protocol support
   It is possible to gain more performance out of a single storage
   system by using specialized network technologies such as RDMA, SRP,
   iSER, and SCST. The specifics of using these technologies are
   beyond the scope of this book.
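
To make the port count, port speed, and redundancy trade-offs concrete,
the following minimal sketch computes the oversubscription ratio of a
hypothetical leaf switch. Every figure in it is an illustrative
assumption, not a recommendation:

.. code-block:: python

   # Hypothetical leaf switch; all figures are assumptions.
   leaf_downlinks = 48     # assumed 10 GbE server-facing ports
   leaf_uplinks = 6        # assumed 40 GbE spine-facing ports
   downlink_gbps = 10
   uplink_gbps = 40

   downlink_capacity = leaf_downlinks * downlink_gbps   # 480 Gbps
   uplink_capacity = leaf_uplinks * uplink_gbps         # 240 Gbps

   # Oversubscription is the worst-case contention for the
   # spine-facing links; 1:1 is non-blocking, and higher ratios
   # trade bandwidth for cost.
   ratio = downlink_capacity / uplink_capacity
   print(f"{ratio:.1f}:1 oversubscription")             # 2.0:1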

There is no single best practice architecture for the networking
hardware supporting an OpenStack cloud that will apply to all
implementations. Some of the key factors that will have a major
influence on selection of networking hardware include:

Connectivity
   All nodes within an OpenStack cloud require network connectivity.
   In some cases, nodes require access to more than one network
   segment. The design must encompass sufficient network capacity and
   bandwidth to ensure that all communications within the cloud, both
   north-south and east-west traffic, have sufficient resources
   available.

Scalability
   The network design should encompass a physical and logical network
   design that can be easily expanded upon. Network hardware should
   offer the appropriate types of interfaces and speeds that are
   required by the hardware nodes.

Availability
   To ensure access to nodes within the cloud is not interrupted, we
   recommend that the network architecture identify any single points
   of failure and provide some level of redundancy or fault tolerance.
   The network infrastructure often involves the use of networking
   protocols such as LACP, VRRP, or others to achieve a highly
   available network connection. It is also important to consider the
   networking implications for API availability. We recommend
   designing a load balancing solution within the network architecture
   to ensure that the APIs, and potentially other services in the
   cloud, are highly available.

Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following factors when selecting compute (server)
hardware:

* Server density
  A measure of how many servers can fit into a given measure of
  physical space, such as a rack unit [U].

* Resource capacity
  The number of CPU cores, how much RAM, or how much storage a given
  server delivers.

* Expandability
  The number of additional resources you can add to a server before it
  reaches capacity.

* Cost
  The relative cost of the hardware weighed against the level of
  design effort needed to build the system.

Weigh these considerations against each other to determine the best
design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing
resource capacity and expandability can increase cost but decrease
server density. Decreasing cost often means decreasing supportability,
server density, resource capacity, and expandability.

For a compute-focused cloud, compute capacity (CPU cores and RAM) is
the primary consideration when selecting server hardware. The selected
hardware must supply adequate CPU sockets, CPU cores, and RAM; network
connectivity and storage capacity are less critical, and need only be
sufficient to meet the user requirements.

When designing an OpenStack cloud architecture, you must consider
whether you intend to scale up or scale out. Selecting a smaller number
of larger hosts, or a larger number of smaller hosts, depends on a
combination of factors: cost, power, cooling, physical rack and floor
space, support-warranty, and manageability.

Consider the following when selecting a server hardware form factor
suited for your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For
  example, high density blade servers such as HP BladeSystem or Dell
  PowerEdge M1000e support up to 16 servers in only ten rack units.
  Using half-height blades is twice as dense as using full-height
  blades, which results in only eight servers per ten rack units.

* 1U rack-mounted servers can offer greater server density than a
  blade server solution, but are often limited to dual-socket,
  multi-core CPU configurations. It is possible to place forty 1U
  servers in a rack, providing space for the top of rack (ToR)
  switches, compared to 32 full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the
  density that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other factors that influence server hardware selection for an
OpenStack design architecture include:

Instance density
   More hosts are required to support the anticipated scale if the
   design architecture uses dual-socket hardware designs.

   For a general purpose OpenStack cloud, sizing is an important
   consideration. The expected or anticipated number of instances that
   each hypervisor can host is a common meter used in sizing the
   deployment. The selected server hardware needs to support the
   expected or anticipated instance density.

Host density
   Another option to address the higher host count is to use a
   quad-socket platform. Taking this approach decreases host density
   which also increases rack count. This configuration affects the
   number of power connections and also impacts network and cooling
   requirements.

   Physical data centers have limited physical space, power, and
   cooling. The number of hosts (or hypervisors) that can be fitted
   into a given metric (rack, rack unit, or floor tile) is another
   important method of sizing. Floor weight is an often overlooked
   consideration. The data center floor must be able to support the
   weight of the proposed number of hosts within a rack or set of
   racks. These factors need to be applied as part of the host density
   calculation and server hardware selection.

Power and cooling density
   The power and cooling density requirements might be lower than with
   blade, sled, or 1U server designs due to lower host density (by
   using 2U, 3U, or even 4U server designs). For data centers with
   older infrastructure, this might be a desirable feature.

   Data centers have a specified amount of power fed to a given rack
   or set of racks. Older data centers may have a power density as low
   as 20 amps per rack, while more recent data centers can be
   architected to support power densities as high as 120 amps per
   rack. The selected server hardware must take power density into
   account.

Network connectivity
   The selected server hardware must have the appropriate number of
   network connections, as well as the right type of network
   connections, in order to support the proposed architecture. Ensure
   that, at a minimum, there are at least two diverse network
   connections coming into each rack.
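
As a back-of-the-envelope illustration of how instance density, host
density, and power density interact, the following sketch estimates
host and rack counts from an instance target. Every input is an
illustrative assumption:

.. code-block:: python

   import math

   # All inputs are illustrative assumptions, not recommendations.
   target_instances = 2000
   instances_per_host = 40         # assumed instance density
   hosts_per_rack = 20             # assumed 2U servers per 42U rack
   watts_per_host = 650            # assumed draw under load
   rack_power_budget_w = 20 * 208  # assumed 20 amp feed at 208 V

   hosts = math.ceil(target_instances / instances_per_host)  # 50
   racks = math.ceil(hosts / hosts_per_rack)                 # 3

   # Check the power density constraint before fixing the layout.
   rack_draw_w = hosts_per_rack * watts_per_host             # 13000
   if rack_draw_w > rack_power_budget_w:
       print(f"Rack draw {rack_draw_w} W exceeds the "
             f"{rack_power_budget_w} W budget; spread hosts across "
             "more racks or provision denser power.")
   print(f"{hosts} hosts across {racks} racks")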

The selection of form factors or architectures affects the selection
of server hardware. Ensure that the selected server hardware is
configured to support enough storage capacity (or storage
expandability) to match the requirements of the selected scale-out
storage solution. Similarly, the network architecture impacts the
server hardware selection and vice versa.

Hardware for general purpose OpenStack cloud
--------------------------------------------

Hardware for a general purpose OpenStack cloud should reflect a cloud
with no pre-defined usage model, designed to run a wide variety of
applications with varying resource usage requirements. These
applications include any of the following:

* RAM-intensive

* CPU-intensive

* Storage-intensive

Certain hardware form factors may better suit a general purpose
OpenStack cloud due to the requirement for an equal (or nearly equal)
balance of resources. Server hardware must provide the following:

* Equal (or nearly equal) balance of compute capacity (RAM and CPU)

* Network capacity (number and speed of links)

* Storage capacity (gigabytes or terabytes as well as
  :term:`Input/Output Operations Per Second (IOPS)`)

The best form factor for server hardware supporting a general purpose
OpenStack cloud is driven by outside business and cost factors. No
single reference architecture applies to all implementations; the
decision must flow from user requirements, technical considerations,
and operational considerations.

Selecting storage hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~

The storage hardware architecture is determined by selecting a
specific storage architecture. Determine the selection of storage
architecture by evaluating possible solutions against the critical
factors: the user requirements, technical considerations, and
operational considerations. Consider the following factors when
selecting storage hardware:

Cost
   Storage can be a significant portion of the overall system cost.
   For an organization that is concerned with vendor support, a
   commercial storage solution is advisable, although it comes with a
   higher price tag. If initial capital expenditure requires
   minimization, designing a system based on commodity hardware would
   apply. The trade-off is potentially higher support costs and a
   greater risk of incompatibility and interoperability issues.

Performance
   The latency of storage I/O requests indicates performance.
   Performance requirements affect which solution you choose.

Scalability
   Scalability, along with expandability, is a major consideration in
   a general purpose OpenStack cloud. It might be difficult to predict
   the final intended size of the implementation as there are no
   established usage patterns for a general purpose cloud. It might
   become necessary to expand the initial deployment in order to
   accommodate growth and user demand.

Expandability
   Expandability is a major architecture factor for storage solutions
   with a general purpose OpenStack cloud. A storage solution that
   expands to 50 PB is considered more expandable than a solution that
   only scales to 10 PB. This meter is related to scalability, which
   is the measure of a solution's performance as it expands.

General purpose cloud storage requirements
------------------------------------------

Using a scale-out storage solution with direct-attached storage (DAS)
in the servers is well suited for a general purpose OpenStack cloud.
Cloud services requirements determine your choice of scale-out
solution. You need to determine whether a single, highly expandable
and highly vertically scalable, centralized storage array is suitable
for your design. After determining an approach, select the storage
hardware based on these criteria.

This list expands upon the potential impacts of including a particular
storage architecture (and corresponding storage hardware) in the
design for a general purpose OpenStack cloud:

Connectivity
   If storage protocols other than Ethernet are part of the storage
   solution, ensure the appropriate hardware has been selected. If a
   centralized storage array is selected, ensure that the hypervisor
   will be able to connect to that storage array for image storage.

Usage
   How the particular storage architecture will be used is critical
   for determining the architecture. Some of the configurations that
   will influence the architecture include whether it will be used by
   the hypervisors for ephemeral instance storage, or if OpenStack
   Object Storage will use it for object storage.

Instance and image locations
   Where instances and images will be stored will influence the
   architecture.

Server hardware
   If the solution is a scale-out storage architecture that includes
   DAS, it will affect the server hardware selection. This could
   ripple into the decisions that affect host density, instance
   density, power density, OS-hypervisor, management tools, and
   others.

A general purpose OpenStack cloud has multiple options. The key
factors that will have an influence on the selection of storage
hardware for a general purpose OpenStack cloud are as follows:

Capacity
   Hardware resources selected for the resource nodes should be
   capable of supporting enough storage for the cloud services.
   Defining the initial requirements and ensuring the design can
   support adding capacity is important. Hardware nodes selected for
   object storage should be capable of supporting a large number of
   inexpensive disks with no reliance on RAID controller cards.
   Hardware nodes selected for block storage should be capable of
   supporting high speed storage solutions and RAID controller cards
   to provide performance and redundancy to storage at a hardware
   level. Selecting hardware RAID controllers that automatically
   repair damaged arrays will assist with the replacement and repair
   of degraded or deleted storage devices.

Performance
   Disks selected for object storage services do not need to be fast
   performing disks. We recommend that object storage nodes take
   advantage of the best cost per terabyte available for storage.
   Contrastingly, disks chosen for block storage services should take
   advantage of performance boosting features that may entail the use
   of SSDs or flash storage to provide high performance block storage
   pools. Storage performance of ephemeral disks used for instances
   should also be taken into consideration.

Fault tolerance
   Object storage resource nodes have no requirements for hardware
   fault tolerance or RAID controllers. It is not necessary to plan
   for fault tolerance within the object storage hardware because the
   object storage service provides replication between zones as a
   feature of the service. Block storage nodes, compute nodes, and
   cloud controllers should all have fault tolerance built in at the
   hardware level by making use of hardware RAID controllers and
   varying levels of RAID configuration. The level of RAID chosen
   should be consistent with the performance and availability
   requirements of the cloud.
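
Because object storage relies on replication rather than RAID, raw and
usable capacity diverge quickly. A minimal sketch of that arithmetic,
using assumed node and disk counts and the Object Storage default of
three replicas:

.. code-block:: python

   # Node, disk, and headroom figures are illustrative assumptions;
   # the replica count of three is the Object Storage default.
   nodes = 10
   disks_per_node = 12
   disk_tb = 8
   replicas = 3

   raw_tb = nodes * disks_per_node * disk_tb   # 960 TB raw
   usable_tb = raw_tb / replicas               # 320 TB usable
   # Keep headroom so rebalancing after a disk or node failure has
   # somewhere to place new replicas; 80% fill is a conservative cap.
   planned_tb = usable_tb * 0.8                # 256 TB plannable
   print(f"{raw_tb} TB raw -> ~{planned_tb:.0f} TB plannable")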

Storage-focused cloud storage requirements
-------------------------------------------

Storage-focused OpenStack clouds must address I/O intensive workloads.
These workloads are not CPU intensive, nor are they consistently
network intensive. The network may be heavily utilized to transfer
storage, but the workloads are not otherwise network intensive.

The selection of storage hardware determines the overall performance
and scalability of a storage-focused OpenStack design architecture.
Several factors impact the design process, including:

Latency is a key consideration in a storage-focused OpenStack cloud.
Using solid-state disks (SSDs) minimizes latency, reduces CPU delays
caused by waiting for the storage, and increases performance. Use RAID
controller cards in compute hosts to improve the performance of the
underlying disk subsystem.

Depending on the storage architecture, you can adopt a scale-out
solution, or use a highly expandable and scalable centralized storage
array. If a centralized storage array meets your requirements, then
the array vendor determines the hardware selection. It is possible to
build a storage array using commodity hardware with Open Source
software, but this requires people with expertise to build such a
system.

On the other hand, a scale-out storage solution that uses
direct-attached storage (DAS) in the servers may be an appropriate
choice. This requires configuration of the server hardware to support
the storage solution.

Considerations affecting the storage architecture (and corresponding
storage hardware) of a storage-focused OpenStack cloud include:

Connectivity
   Ensure the connectivity matches the storage solution requirements.
   We recommend confirming that the network characteristics minimize
   latency to boost the overall performance of the design.

Latency
   Determine if the use case has consistent or highly variable
   latency.

Throughput
   Ensure that the storage solution throughput is optimized for your
   application requirements.

Server hardware
   Use of DAS impacts the server hardware choice and affects host
   density, instance density, power density, OS-hypervisor, and
   management tools.

======================
Logging and monitoring
======================

OpenStack clouds require appropriate monitoring platforms to catch and
manage errors.

.. note::

   We recommend leveraging existing monitoring systems to see if they
   are able to effectively monitor an OpenStack environment.

Specific meters that are critically important to capture include:

* Image disk utilization

* Response time to the Compute API

Logging and monitoring does not significantly differ for a multi-site
OpenStack cloud. The tools described in the `Logging and monitoring
chapter <http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`__
of the Operations Guide remain applicable. Logging and monitoring can
be provided on a per-site basis, and in a common centralized location.

When attempting to deploy logging and monitoring facilities to a
centralized location, care must be taken with the load placed on the
inter-site networking links.

==========
Networking
==========

OpenStack clouds generally have multiple network segments, with each
segment providing access to particular resources. The network segments
themselves also require network communication paths that should be
separated from the other networks. When designing network services for
a general purpose cloud, plan for either a physical or logical
separation of network segments used by operators and tenants.
Additional network segments can also be created for access to internal
services such as the message bus and database used by various systems.
Segregating these services onto separate networks helps protect
sensitive data and prevent unauthorized access.

Choose a networking service based on the requirements of your
instances. The architecture and design of your cloud will impact
whether you choose OpenStack Networking (neutron) or legacy networking
(nova-network).

Networking (neutron)
~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) is a first class networking service
that gives tenants full control over the creation of virtual network
resources. This is often accomplished in the form of tunneling
protocols that establish encapsulated communication paths over
existing network infrastructure in order to segment tenant traffic.
These methods vary depending on the specific implementation, but some
of the more common methods include tunneling over GRE, encapsulating
with VXLAN, and VLAN tags.

We recommend designing at least three network segments. The first
segment should be a public network, used by tenants and operators to
access the REST APIs. The controller nodes and swift proxies are the
only devices connecting to this network segment. In some cases, this
public network might also be serviced by hardware load balancers and
other network devices.

The second segment is used by administrators to manage hardware
resources. Configuration management tools also utilize this segment
for deploying software and services onto new hardware. In some cases,
this network segment is also used for internal services, including the
message bus and database services. This segment needs to communicate
with every hardware node. Due to the highly sensitive nature of this
network segment, it needs to be secured from unauthorized access.

The third network segment is used by applications and consumers to
access the physical network, and for users to access applications.
This network is segregated from the one used to access the cloud APIs
and is not capable of communicating directly with the hardware
resources in the cloud. Communication on this network segment is
required by compute resource nodes and network gateway services that
allow application data to access the physical network from outside the
cloud.

Legacy networking (nova-network)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The legacy networking (nova-network) service is primarily a layer-2
networking service. It functions in two modes: flat networking mode
and VLAN mode. In flat network mode, all network hardware nodes and
devices throughout the cloud are connected to a single layer-2 network
segment that provides access to application data.

However, when the network devices in the cloud support segmentation
using VLANs, legacy networking can operate in the second mode. In this
design model, each tenant within the cloud is assigned a network
subnet which is mapped to a VLAN on the physical network. It is
especially important to remember that the maximum number of VLANs that
can be used within a spanning tree domain is 4096. This places a hard
limit on the amount of growth possible within the data center.
Consequently, when designing a general purpose cloud intended to
support multiple tenants, we recommend using legacy networking with
VLANs, and not flat network mode.
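
The 4096 figure comes from the 12-bit VLAN ID field in the 802.1Q
header; after reserving IDs 0 and 4095, slightly fewer segments are
actually usable. A quick sketch of the ceiling this places on tenant
growth, assuming one VLAN per tenant:

.. code-block:: python

   # The 802.1Q VLAN ID is a 12-bit field; IDs 0 and 4095 are reserved.
   vlan_id_bits = 12
   usable_vlans = 2 ** vlan_id_bits - 2   # 4094 usable IDs

   # Assuming each tenant consumes one VLAN, the spanning tree domain
   # caps out at one tenant network per usable VLAN ID.
   print(f"Hard ceiling: {usable_vlans} tenant networks per domain")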

Layer-2 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A network designed on layer-2 protocols has advantages over a network
designed on layer-3 protocols. In spite of the difficulties of using a
bridge to perform the network role of a router, many vendors,
customers, and service providers choose to use Ethernet in as many
parts of their networks as possible. The benefits of selecting a
layer-2 design are:

* Ethernet frames contain all the essentials for networking. These
  include, but are not limited to, globally unique source addresses,
  globally unique destination addresses, and error control.

* Ethernet frames can carry any kind of packet. Networking at layer-2
  is independent of the layer-3 protocol.

* Adding more layers to the Ethernet frame only slows the networking
  process down. This is known as nodal processing delay.

* You can add adjunct networking features, for example class of
  service (CoS) or multicasting, to Ethernet as readily as to IP
  networks.

* VLANs are an easy mechanism for isolating networks.

Most information starts and ends inside Ethernet frames. Today this
applies to data, voice, and video. The concept is that the network
will benefit more from the advantages of Ethernet if the transfer of
information from a source to a destination is in the form of Ethernet
frames.

Although it is not a substitute for IP networking, networking at
layer-2 can be a powerful adjunct to IP networking.

Layer-2 Ethernet usage has these additional advantages over layer-3 IP
network usage:

* Speed.
* Reduced overhead of the IP hierarchy.
* No need to keep track of address configuration as systems move
  around.

Whereas the simplicity of layer-2 protocols might work well in a data
center with hundreds of physical machines, cloud data centers have the
additional burden of needing to keep track of all virtual machine
addresses and networks. In these data centers, it is not uncommon for
one physical node to support 30-40 instances.

.. important::

   Networking at the frame level says nothing about the presence or
   absence of IP addresses at the packet level. Almost all ports,
   links, and devices on a network of LAN switches still have IP
   addresses, as do all the source and destination hosts. There are
   many reasons for the continued need for IP addressing. The largest
   one is the need to manage the network. A device or link without an
   IP address is usually invisible to most management applications.
   Utilities including remote access for diagnostics, file transfer of
   configurations and software, and similar applications cannot run
   without IP addresses as well as MAC addresses.

Layer-2 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Layer-2 network architectures have some limitations that become
noticeable when used outside of traditional data centers:

* The number of VLANs is limited to 4096.
* The number of MACs stored in switch tables is limited.
* You must accommodate the need to maintain a set of layer-4 devices
  to handle traffic control.
* MLAG, often used for switch redundancy, is a proprietary solution
  that does not scale beyond two devices and forces vendor lock-in.
* It can be difficult to troubleshoot a network without IP addresses
  and ICMP.
* Configuring ARP can be complicated on large layer-2 networks.
* All network devices need to be aware of all MACs, even instance
  MACs, so there is constant churn in MAC tables and network state
  changes as instances start and stop.
* Migrating MACs (instance migration) to different physical locations
  is a potential problem if you do not set ARP table timeouts
  properly.

It is important to know that layer-2 has a very limited set of network
management tools. It is difficult to control traffic, as layer-2 does
not have mechanisms to manage the network or shape the traffic.
Network troubleshooting is also troublesome, in part because network
devices have no IP addresses. As a result, there is no reasonable way
to check network delay.

In a layer-2 network all devices are aware of all MACs, even those
that belong to instances. The network state information in the
backbone changes whenever an instance starts or stops. Because of
this, there is far too much churn in the MAC tables on the backbone
switches.

Furthermore, on large layer-2 networks, configuring ARP learning can
be complicated. The setting for the MAC address timer on switches is
critical and, if set incorrectly, can cause significant performance
problems. So when migrating MACs to different physical locations to
support instance migration, problems may arise. As an example, the
Cisco default MAC address timer is extremely long. As such, the
network information maintained in the switches could be out of sync
with the new location of the instance.

Layer-3 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In layer-3 networking, routing takes instance MAC and IP addresses out
of the network core, reducing state churn. The only time there would
be a routing state change is in the case of a Top of Rack (ToR) switch
failure or a link failure in the backbone itself. Other advantages of
using a layer-3 architecture include:

* Layer-3 networks provide the same level of resiliency and
  scalability as the Internet.

* Controlling traffic with routing metrics is straightforward.

* You can configure layer-3 to use BGP confederation for scalability.
  This way core routers have state proportional to the number of
  racks, not to the number of servers or instances.

* There are a variety of well tested tools, such as ICMP, to monitor
  and manage traffic.

* Layer-3 architectures enable the use of :term:`quality of service
  (QoS)` to manage network performance.

Layer-3 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main limitation of layer 3 is that there is no built-in isolation
mechanism comparable to the VLANs in layer-2 networks. Furthermore,
the hierarchical nature of IP addresses means that an instance is on
the same subnet as its physical host, making migration out of the
subnet difficult. For these reasons, network virtualization needs to
use IP encapsulation and software at the end hosts. This provides
isolation and separates the addressing in the virtual layer from the
addressing in the physical layer. Other potential disadvantages of
layer 3 include the need to design an IP addressing scheme rather than
relying on the switches to keep track of the MAC addresses
automatically, and the need to configure the interior gateway routing
protocol in the switches.

Network design
~~~~~~~~~~~~~~

There are many reasons an OpenStack network has complex requirements.
One main factor is the many components that interact at different
levels of the system stack, adding complexity. Data flows are also
complex. Data in an OpenStack cloud moves both between instances
across the network (known as east-west traffic), as well as in and out
of the system (known as north-south traffic). Physical server nodes
have network requirements that are independent of instance network
requirements; you must isolate these from the core network to account
for scalability. We recommend functionally separating the networks for
security purposes and tuning performance through traffic shaping.

You must consider a number of important general technical and business
factors when planning and designing an OpenStack network. These
include:

* A requirement for vendor independence. To avoid hardware or software
  vendor lock-in, the design should not rely on specific features of a
  vendor's router or switch.
* A requirement to massively scale the ecosystem to support millions
  of end users.
* A requirement to support indeterminate platforms and applications.
* A requirement to design for cost efficient operations to take
  advantage of massive scale.
* A requirement to ensure that there is no single point of failure in
  the cloud ecosystem.
* A requirement for high availability architecture to meet customer
  SLA requirements.
* A requirement to be tolerant of rack level failure.
* A requirement to maximize flexibility to architect future production
  environments.

Bearing in mind these considerations, we recommend the following:

* Layer-3 designs are preferable to layer-2 architectures.
* Design a dense multi-path network core to support multi-directional
  scaling and flexibility.
* Use hierarchical addressing because it is the only viable option to
  scale the network ecosystem (see the sketch below).
* Use virtual networking to isolate instance service network traffic
  from the management and internal network traffic.
* Isolate virtual networks using encapsulation technologies.
* Use traffic shaping for performance tuning.
* Use eBGP to connect to the Internet up-link.
* Use iBGP to flatten the internal traffic on the layer-3 mesh.
* Determine the most effective configuration for the block storage
  network.
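
To illustrate why hierarchical addressing keeps core routing state
proportional to racks rather than hosts, here is a minimal sketch
using Python's standard ``ipaddress`` module; the supernet and prefix
sizes are illustrative assumptions:

.. code-block:: python

   import ipaddress

   # Carve an assumed 10.32.0.0/16 supernet into per-rack /24 subnets,
   # so the core advertises one route per rack instead of per host.
   supernet = ipaddress.ip_network("10.32.0.0/16")
   rack_subnets = list(supernet.subnets(new_prefix=24))   # 256 racks

   for rack_id, subnet in enumerate(rack_subnets[:3]):
       print(f"rack-{rack_id:03d}: {subnet}")
   # rack-000: 10.32.0.0/24
   # rack-001: 10.32.1.0/24
   # rack-002: 10.32.2.0/24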

Operator considerations
-----------------------

The network design for an OpenStack cluster includes decisions
regarding the interconnect needs within the cluster, the need to allow
clients to access their resources, and the access requirements for
operators to administer the cluster. You should consider the
bandwidth, latency, and reliability of these networks.

Whether you are using an external provider or an internal team, you
need to consider additional design decisions about monitoring and
alarming. If you are using an external provider, service level
agreements (SLAs) are typically defined in your contract. Operational
considerations such as bandwidth, latency, and jitter can be part of
the SLA.

As demand for network resources increases, make sure your network
design accommodates expansion and upgrades. Operators add additional
IP address blocks and add additional bandwidth capacity. In addition,
consider managing hardware and software lifecycle events, for example
upgrades, decommissioning, and outages, while avoiding service
interruptions for tenants.

Factor maintainability into the overall network design. This includes
the ability to manage and maintain IP addresses as well as the use of
overlay identifiers including VLAN tag IDs, GRE tunnel IDs, and MPLS
tags. As an example, if you need to change all of the IP addresses on
a network, a process known as renumbering, then the design must
support this function.

Address network-focused applications when considering certain
operational realities. For example, consider the impending exhaustion
of IPv4 addresses, the migration to IPv6, and the use of private
networks to segregate different types of traffic that an application
receives or generates. In the case of IPv4 to IPv6 migrations,
applications should follow best practices for storing IP addresses. We
recommend you avoid relying on IPv4 features that did not carry over
to the IPv6 protocol or have differences in implementation.

To segregate traffic, allow applications to create a private tenant
network for database and storage network traffic. Use a public network
for services that require direct client access from the Internet. Upon
segregating the traffic, consider :term:`quality of service (QoS)` and
security to ensure each network has the required level of service.

Finally, consider the routing of network traffic. For some
applications, develop a complex policy framework for routing. To
create a routing policy that satisfies business requirements, consider
the economic cost of transmitting traffic over expensive links versus
cheaper links, in addition to bandwidth, latency, and jitter
requirements.

Additionally, consider how to respond to network events. How load
transfers from one link to another during a failure scenario could be
a factor in the design. If you do not plan network capacity correctly,
failover traffic could overwhelm other ports or network links and
create a cascading failure scenario. In this case, traffic that fails
over to one link overwhelms that link and then moves to the subsequent
links until all network traffic stops.

Additional considerations
-------------------------

There are several further considerations when designing a
network-focused OpenStack cloud. One is redundant networking and the
high availability risk analysis for ToR switches. In most cases, it is
much more economical to use a single switch with a small pool of spare
switches to replace failed units than it is to outfit an entire data
center with redundant switches. Applications should tolerate rack
level outages without affecting normal operations, since network and
compute resources are easily provisioned and plentiful.

Research indicates the mean time between failures (MTBF) on switches
is between 100,000 and 200,000 hours. This number is dependent on the
ambient temperature of the switch in the data center. When properly
cooled and maintained, this translates to between 11 and 22 years
before failure. Even in the worst case of poor ventilation and high
ambient temperatures in the data center, the MTBF is still 2-3 years.
See
https://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf
for further information.

The following table compares the two networking services:

.. list-table:: Networking service comparison
   :header-rows: 1

   * - Legacy networking (nova-network)
     - OpenStack Networking (neutron)
   * - Simple, single agent
     - Complex, multiple agents
   * - Flat or VLAN
     - Flat, VLAN, Overlays, L2-L3, SDN
   * - No plug-in support
     - Plug-in support for 3rd parties
   * - No multi-tier topologies
     - Multi-tier topologies

Preparing for the future: IPv6 support
--------------------------------------

One of the most important networking topics today is the exhaustion of
IPv4 addresses. As of late 2015, ICANN announced that the final IPv4
address blocks have been fully assigned. Because of this, the IPv6
protocol has become the future of network focused applications. IPv6
increases the address space significantly, fixes long standing issues
in the IPv4 protocol, and will become essential for network focused
applications in the future.

OpenStack Networking, when configured for it, supports IPv6. To enable
IPv6, create an IPv6 subnet in Networking and use IPv6 prefixes when
creating security groups.
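
As a sketch of what enabling IPv6 can look like in practice, the
following uses the openstacksdk Python library to create a SLAAC IPv6
subnet. The cloud name, network name, and the documentation prefix
2001:db8:1::/64 are illustrative assumptions:

.. code-block:: python

   import openstack

   # Assumes a clouds.yaml entry named "mycloud" and an existing
   # tenant network named "private".
   conn = openstack.connect(cloud="mycloud")
   network = conn.network.find_network("private")

   subnet = conn.network.create_subnet(
       network_id=network.id,
       name="private-v6",
       ip_version=6,
       cidr="2001:db8:1::/64",   # placeholder documentation prefix
       ipv6_ra_mode="slaac",     # router advertisements assign addresses
       ipv6_address_mode="slaac",
   )
   print(subnet.id)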

Asymmetric links
----------------

When designing a network architecture, the traffic patterns of an
application heavily influence the allocation of total bandwidth and
the number of links that you use to send and receive traffic.
Applications that provide file storage for customers allocate
bandwidth and links to favor incoming traffic, whereas video streaming
applications allocate bandwidth and links to favor outgoing traffic.

Performance
-----------

It is important to analyze an application's tolerance for latency and
jitter when designing an environment to support network focused
applications. Certain applications, for example VoIP, are less
tolerant of latency and jitter. Where latency and jitter are issues,
certain applications may require tuning of QoS parameters and network
device queues to ensure that they queue for transmit immediately or
are guaranteed minimum bandwidth. Since OpenStack currently does not
support these functions, consider your selected network plug-in
carefully.

The location of a service may also impact the application or consumer
experience. If an application serves differing content to different
users, it must properly direct connections to those specific
locations. Where appropriate, use a multi-site installation for these
situations.

You can implement networking in two separate ways. Legacy networking
(nova-network) provides a flat DHCP network with a single broadcast
domain. This implementation does not support tenant isolation networks
or advanced plug-ins, but it is currently the only way to implement a
distributed layer-3 (L3) agent using the multi-host configuration.
OpenStack Networking (neutron) is the official networking
implementation and provides a pluggable architecture that supports a
large variety of network methods. Some of these include a layer-2 only
provider network model, external device plug-ins, or even OpenFlow
controllers.

Networking at large scales becomes a set of boundary questions. How
large a layer-2 domain must be is determined by the number of nodes
within the domain and the amount of broadcast traffic that passes
between instances. Breaking layer-2 boundaries may require the
implementation of overlay networks and tunnels. This decision is a
balancing act between the need for smaller overhead and the need for a
smaller domain.

When selecting network devices, be aware that making a decision based
on the greatest port density often comes with a drawback. Aggregation
switches and routers have not all kept pace with Top of Rack switches
and may induce bottlenecks on north-south traffic. As a result, it may
be possible for massive amounts of downstream network utilization to
impact upstream network devices, impacting service to the cloud. Since
OpenStack does not currently provide a mechanism for traffic shaping
or rate limiting, it is necessary to implement these features at the
network hardware level.

Tunable networking components
-----------------------------

When designing for network intensive workloads, consider configurable
networking components related to an OpenStack architecture design,
such as MTU and QoS. Some workloads require a larger MTU than normal
due to the transfer of large blocks of data. When providing network
service for applications such as video streaming or storage
replication, we recommend that you configure both OpenStack hardware
nodes and the supporting network equipment for jumbo frames where
possible. This allows for better use of available bandwidth. Configure
jumbo frames across the complete path the packets traverse. If one
network component is not capable of handling jumbo frames, then the
entire path reverts to the default MTU.

:term:`Quality of Service (QoS)` also has a great impact on network
intensive workloads, as it provides instant service to packets which
have a higher priority, mitigating the impact of poor network
performance. In applications such as Voice over IP (VoIP),
differentiated services code points are a near requirement for proper
operation. You can also use QoS in the opposite direction for mixed
workloads to prevent low priority but high bandwidth applications, for
example backup services, video conferencing, or file sharing, from
blocking bandwidth that is needed for the proper operation of other
workloads. It is possible to tag file storage traffic as a lower
class, such as best effort or scavenger, to allow the higher priority
traffic through. In cases where regions within a cloud might be
geographically distributed, it may also be necessary to implement WAN
optimization to combat latency or packet loss.
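
The "weakest link" behavior of jumbo frames is worth making explicit:
the usable MTU of a path is the minimum MTU of every hop. A minimal
sketch, with hop names and values as illustrative assumptions:

.. code-block:: python

   # A single default-MTU (1500) device silently defeats jumbo frames
   # end to end. Hop names and MTU values are assumptions.
   path_mtus = {
       "compute-node-nic": 9000,
       "tor-switch": 9000,
       "aggregation-switch": 1500,   # one component left at default
       "storage-node-nic": 9000,
   }

   effective_mtu = min(path_mtus.values())
   print(f"Effective path MTU: {effective_mtu}")   # 1500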

==================
Software selection
==================

Software selection, particularly for a general purpose OpenStack
architecture design, involves three areas:

* Operating system (OS) and hypervisor

* OpenStack components

* Supplemental software

Operating system and hypervisor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The operating system (OS) and hypervisor have a significant impact on
the overall design. Selecting a particular operating system and
hypervisor can directly affect server hardware selection. Make sure
the storage hardware and topology support the selected operating
system and hypervisor combination. Also ensure the networking hardware
selection and topology will work with the chosen operating system and
hypervisor combination.

Some areas that could be impacted by the selection of OS and
hypervisor include:

Cost
   Selecting a commercially supported hypervisor, such as Microsoft
   Hyper-V, will result in a different cost model than a
   community-supported open source hypervisor such as
   :term:`KVM<kernel-based VM (KVM)>` or :term:`Xen`. When comparing
   open source OS solutions, choosing Ubuntu over Red Hat (or vice
   versa) will have an impact on cost due to support contracts.

Support
   Depending on the selected hypervisor, staff should have the
   appropriate training and knowledge to support the selected OS and
   hypervisor combination. If they do not, training will need to be
   provided, which could have a cost impact on the design.

Management tools
   The management tools used for Ubuntu and KVM differ from the
   management tools for VMware vSphere. Although both OS and
   hypervisor combinations are supported by OpenStack, there will be a
   different impact on the rest of the design as a result of selecting
   one combination versus the other.

Scale and performance
   Ensure that the selected OS and hypervisor combinations meet the
   appropriate scale and performance requirements. The chosen
   architecture will need to meet the targeted instance-host ratios
   with the selected OS-hypervisor combinations.

Security
   Ensure that the design can accommodate regular periodic
   installations of application security patches while maintaining
   required workloads. The frequency of security patches for the
   proposed OS-hypervisor combination will have an impact on
   performance, and the patch installation process could affect
   maintenance windows.

Supported features
   Determine which OpenStack features are required. This will often
   determine the selection of the OS-hypervisor combination. Some
   features are only available with specific operating systems or
   hypervisors.

Interoperability
   You will need to consider how the OS and hypervisor combination
   interacts with other operating systems and hypervisors, including
   other software solutions. Operational troubleshooting tools for one
   OS-hypervisor combination may differ from the tools used for
   another OS-hypervisor combination and, as a result, the design will
   need to address whether the two sets of tools need to interoperate.

OpenStack components
~~~~~~~~~~~~~~~~~~~~

Selecting which OpenStack components are included in the overall
design is important. Some OpenStack components, like compute and Image
service, are required in every architecture. Other components, like
Orchestration, are not always required.

A compute-focused OpenStack design architecture may contain the
following components:

* Identity (keystone)

* Dashboard (horizon)

* Compute (nova)

* Object Storage (swift)

* Image (glance)

* Networking (neutron)

* Orchestration (heat)

.. note::

   A compute-focused design is less likely to include OpenStack Block
   Storage. However, there may be some situations where the need for
   performance requires a block storage component to improve data I/O.

Excluding certain OpenStack components can limit or constrain the
functionality of other components. For example, if the architecture
includes Orchestration but excludes Telemetry, then the design will
not be able to take advantage of Orchestration's auto-scaling
functionality. It is important to research the component
interdependencies in conjunction with the technical requirements
before deciding on the final architecture.

Networking software
~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) provides a wide variety of networking
services for instances. There are many additional networking software
packages that can be useful when managing OpenStack components. Some
examples include:

* Software to provide load balancing

* Network redundancy protocols

* Routing daemons

Some of these software packages are described in more detail in the
`OpenStack network nodes chapter
<http://docs.openstack.org/ha-guide/networking-ha.html>`_ in the
OpenStack High Availability Guide.

For both general purpose and compute-focused OpenStack clouds, the
OpenStack infrastructure components must be highly available. If the
design does not include hardware load balancing, you must add
networking software packages, for example, HAProxy.
Management software
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Management software includes software for providing:
|
||||
|
||||
* Clustering
|
||||
|
||||
* Logging
|
||||
|
||||
* Monitoring
|
||||
|
||||
* Alerting
|
||||
|
||||
.. important::
|
||||
|
||||
The factors for determining which software packages in this category
|
||||
to select is outside the scope of this design guide.
|
||||
|
||||
The selected supplemental software solution impacts and affects the overall
|
||||
OpenStack cloud design. This includes software for providing clustering,
|
||||
logging, monitoring and alerting.
|
||||
|
||||
The inclusion of clustering software, such as Corosync or Pacemaker, is
primarily determined by the availability requirements of the cloud
infrastructure and the complexity of supporting the configuration after
it is deployed. The `OpenStack High Availability Guide
<http://docs.openstack.org/ha-guide/>`_ provides more details on the
installation and configuration of Corosync and Pacemaker, should these
packages need to be included in the design.

Operational considerations determine the requirements for logging,
monitoring, and alerting. Each of these sub-categories includes various
options.

For example, in the logging sub-category you could select Logstash,
Splunk, Log Insight, or another log aggregation and consolidation tool.
Store logs in a centralized location to facilitate performing analytics
against the data. Log data analytics engines can also provide automation
and issue notification by providing a mechanism to both alert and
automatically attempt to remediate some of the more commonly known
issues.
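
As a small illustration of centralized logging, the standard library can
forward logs to a syslog-speaking aggregator. This is a minimal sketch;
the collector address ``log.example.com`` and the logger name are
assumptions, not part of any OpenStack default:

.. code-block:: python

   import logging
   import logging.handlers

   logger = logging.getLogger('cloud-ops')
   logger.setLevel(logging.INFO)

   # Ship records to the central collector over UDP syslog (port 514).
   handler = logging.handlers.SysLogHandler(
       address=('log.example.com', 514))
   logger.addHandler(handler)

   logger.info('compute01: instance spawn completed in 42s')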

If these software packages are required, the design must account for the
additional resource consumption (CPU, RAM, storage, and network
bandwidth). Some other potential design impacts include:

* OS-hypervisor combination
  Ensure that the selected logging, monitoring, or alerting tools support
  the proposed OS-hypervisor combination.

* Network hardware
  The network hardware selection needs to be supported by the logging,
  monitoring, and alerting software.

Database software
~~~~~~~~~~~~~~~~~

Most OpenStack components require access to back-end database services
to store state and configuration information. Choose an appropriate
back-end database that satisfies the availability and fault tolerance
requirements of the OpenStack services.

MySQL is the default database for OpenStack, but other compatible
databases are available.

.. note::

   Telemetry uses MongoDB.

The chosen high availability database solution changes according to the
selected database. MySQL, for example, provides several options. Use a
replication technology such as Galera for active-active clustering. For
active-passive clustering, use some form of shared storage. Each of
these potential solutions has an impact on the design:

* Solutions that employ Galera/MariaDB require at least three MySQL
  nodes.

* MongoDB has its own design considerations for high availability.

* OpenStack design, generally, does not include shared storage.
  However, for some high availability designs, certain components might
  require it depending on the specific implementation.
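
For a Galera-based deployment, a quick way to confirm that the cluster
actually has its minimum of three nodes is to read the
``wsrep_cluster_size`` status variable. The following is a minimal
sketch, assuming the ``PyMySQL`` driver and a hypothetical monitoring
account:

.. code-block:: python

   import pymysql

   conn = pymysql.connect(host='db.example.com', user='monitor',
                          password='secret', database='mysql')
   with conn.cursor() as cursor:
       # Galera exposes the number of joined nodes as a status variable.
       cursor.execute("SHOW STATUS LIKE 'wsrep_cluster_size'")
       _, size = cursor.fetchone()
       if int(size) < 3:
           print('Galera cluster degraded: only %s node(s) joined' % size)
   conn.close()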

Licensing
~~~~~~~~~

The many different forms of license agreements for software are often written
with the use of dedicated hardware in mind. This model is relevant for the
cloud platform itself, including the hypervisor operating system, and
supporting software for items such as database, RPC, backup, and so on.
Give careful consideration when offering Compute service instances and
applications to end users of the cloud, since the license terms for that
software may need some adjustment to be able to operate economically in
the cloud.

Multi-site OpenStack deployments present additional licensing
considerations over and above regular OpenStack clouds, particularly
where site licenses are in use to provide cost efficient access to
software licenses. The licensing for host operating systems, guest
operating systems, OpenStack distributions (if applicable),
software-defined infrastructure including network controllers and
storage systems, and even individual applications needs to be evaluated.

Topics to consider include:

* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.

* Differentiations between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made in situations where one site is
  a cold standby for disaster recovery purposes only.

* Certain locations might require local vendors to provide support and
  services for each site, which may vary with the licensing agreement in
  place.

======================
Technical requirements
======================

.. toctree::
   :maxdepth: 2

   technical-requirements-software-selection.rst
   technical-requirements-hardware-selection.rst
   technical-requirements-network-design.rst
   technical-requirements-logging-monitoring.rst

Any given cloud deployment is expected to include these base services:

* Compute

* Networking

* Storage

Each of these services has different software and hardware resource
requirements. As a result, you must make design decisions relating
directly to each service, as well as provide a balanced infrastructure
for all services.

There are many ways to split out an OpenStack deployment, but a two-box
deployment typically consists of:

* A controller node
* A compute node

The controller node will typically host:

* Identity service (for authentication)
* Image service (for image storage)
* Block Storage
* Networking service (the ``nova-network`` service may be used instead)
* Compute service API, conductor, and scheduling services
* Supporting services like the message broker (RabbitMQ)
  and database (MySQL or PostgreSQL)

The compute node will typically host:

* Nova compute
* A networking agent, if using OpenStack Networking

To provide additional block storage in a small environment, you may also
choose to deploy ``cinder-volume`` on the compute node. You may also
choose to run ``nova-compute`` on the controller itself to allow you to
run virtual machines on both hosts in a small environment.
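
Before adding the compute node, it is worth confirming that it can reach
the controller's supporting services. The following is a minimal sketch,
assuming the ``kombu`` AMQP client (the library OpenStack services
themselves use), a host named ``controller``, and the default RabbitMQ
guest credentials (all assumptions for this example):

.. code-block:: python

   import kombu

   # Establish and tear down an AMQP connection to the message broker;
   # an exception here means the node cannot reach RabbitMQ.
   with kombu.Connection('amqp://guest:guest@controller:5672//') as conn:
       conn.connect()
       print('Message broker reachable:', conn.as_uri())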

To expand such an environment, you would add additional compute nodes, a
separate networking node, and eventually a second controller for high
availability. You might also split out storage to dedicated nodes.

The OpenStack Installation Guides provide some guidance on getting a basic
2-3 node deployment installed and running:

* `OpenStack Installation Guide for Ubuntu <http://docs.openstack.org/mitaka/install-guide-ubuntu/>`_
* `OpenStack Installation Guide for Red Hat Enterprise Linux and CentOS <http://docs.openstack.org/mitaka/install-guide-rdo/>`_
* `OpenStack Installation Guide for openSUSE and SUSE Linux Enterprise <http://docs.openstack.org/mitaka/install-guide-obs/>`_

testing with your local workload with both Hyper-Threading on and off to
determine what is more appropriate in your case.

Choosing a hypervisor
~~~~~~~~~~~~~~~~~~~~~

A hypervisor provides software to manage virtual machine access to the
underlying hardware.

It is possible to run multiple hypervisors in a single
deployment using host aggregates or cells. However, an individual
compute node can run only a single hypervisor at a time.

Choosing server hardware
~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following in selecting a server hardware form factor suited
for your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers can offer greater server density than a blade
  server solution, but are often limited to dual-socket, multi-core CPU
  configurations. It is possible to place forty 1U servers in a rack,
  providing space for the top of rack (ToR) switches, compared to 32
  full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other hardware considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
  More hosts are required to support the anticipated scale
  if the design architecture uses dual-socket hardware designs.

  For a general purpose OpenStack cloud, sizing is an important
  consideration. The expected or anticipated number of instances that
  each hypervisor can host is a common meter used in sizing the
  deployment. The selected server hardware needs to support the
  expected or anticipated instance density; see the sizing sketch
  after this list.

Host density
  Another option to address the higher host count is to use a
  quad-socket platform. Taking this approach decreases host density
  which also increases rack count. This configuration affects the
  number of power connections and also impacts network and cooling
  requirements.

  Physical data centers have limited physical space, power, and
  cooling. The number of hosts (or hypervisors) that can be fitted
  into a given metric (rack, rack unit, or floor tile) is another
  important method of sizing. Floor weight is an often overlooked
  consideration. The data center floor must be able to support the
  weight of the proposed number of hosts within a rack or set of
  racks. These factors need to be applied as part of the host density
  calculation and server hardware selection.

Power and cooling density
  The power and cooling density requirements might be lower than with
  blade, sled, or 1U server designs due to lower host density (by
  using 2U, 3U or even 4U server designs). For data centers with older
  infrastructure, this might be a desirable feature.

  Data centers have a specified amount of power fed to a given rack or
  set of racks. Older data centers may have a power density as low as
  20 amps per rack, while more recent data centers can be architected
  to support power densities as high as 120 amps per rack. The selected
  server hardware must take power density into account.

Network connectivity
  The selected server hardware must have the appropriate number of
  network connections, as well as the right type of network
  connections, in order to support the proposed architecture. Ensure
  that, at a minimum, there are at least two diverse network
  connections coming into each rack.
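
The sizing arithmetic referred to above can be made concrete with a few
lines of Python. All of the numbers here are assumptions for
illustration, not recommendations:

.. code-block:: python

   import math

   anticipated_instances = 1200
   instances_per_host = 30    # expected instance density per hypervisor
   hosts_per_rack = 20        # limited by space, power, and cooling

   hosts = math.ceil(anticipated_instances / instances_per_host)
   racks = math.ceil(hosts / hosts_per_rack)
   print('%d hosts in %d racks' % (hosts, racks))   # 40 hosts in 2 racks

   # Doubling per-host capacity (for example, a quad-socket platform)
   # halves the host count but concentrates power, cooling, and failure
   # domains in fewer boxes.
   dense_hosts = math.ceil(anticipated_instances / (instances_per_host * 2))
   print('%d quad-socket hosts' % dense_hosts)      # 20 quad-socket hosts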

The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly,
the network architecture impacts the server hardware selection and vice
versa.

Instance storage solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Networking in OpenStack is a complex, multifaceted challenge. See
:doc:`design-networking`.

Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following factors when selecting compute (server) hardware:

* Server density
  A measure of how many servers can fit into a given measure of
  physical space, such as a rack unit [U].

* Resource capacity
  The number of CPU cores, how much RAM, or how much storage a given
  server delivers.

* Expandability
  The number of additional resources you can add to a server before it
  reaches capacity.

* Cost
  The relative cost of the hardware weighed against the level of
  design effort needed to build the system.

Weigh these considerations against each other to determine the best
design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing resource
capacity and expandability can increase cost but decrease server density.
Decreasing cost often means decreasing supportability, server density,
resource capacity, and expandability.

Compute capacity (CPU cores and RAM) is a secondary consideration for
selecting server hardware. The required server hardware must supply
adequate CPU sockets, CPU cores, and RAM; network connectivity and
storage capacity are less critical, but the hardware still needs to
provide enough of both to meet the user requirements.

For a compute-focused cloud, emphasis should be on server
hardware that can offer more CPU sockets, more CPU cores, and more RAM.
Network connectivity and storage capacity are less critical.

When designing an OpenStack cloud architecture, you must
consider whether you intend to scale up or scale out. Selecting a
smaller number of larger hosts, or a larger number of smaller hosts,
depends on a combination of factors: cost, power, cooling, physical rack
and floor space, support-warranty, and manageability.

Consider the following in selecting a server hardware form factor suited
for your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers can offer greater server density than a blade
  server solution, but are often limited to dual-socket, multi-core CPU
  configurations. It is possible to place forty 1U servers in a rack,
  providing space for the top of rack (ToR) switches, compared to 32
  full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
  More hosts are required to support the anticipated scale
  if the design architecture uses dual-socket hardware designs.

  For a general purpose OpenStack cloud, sizing is an important
  consideration. The expected or anticipated number of instances that
  each hypervisor can host is a common meter used in sizing the
  deployment. The selected server hardware needs to support the
  expected or anticipated instance density.

Host density
  Another option to address the higher host count is to use a
  quad-socket platform. Taking this approach decreases host density
  which also increases rack count. This configuration affects the
  number of power connections and also impacts network and cooling
  requirements.

  Physical data centers have limited physical space, power, and
  cooling. The number of hosts (or hypervisors) that can be fitted
  into a given metric (rack, rack unit, or floor tile) is another
  important method of sizing. Floor weight is an often overlooked
  consideration. The data center floor must be able to support the
  weight of the proposed number of hosts within a rack or set of
  racks. These factors need to be applied as part of the host density
  calculation and server hardware selection.

Power and cooling density
  The power and cooling density requirements might be lower than with
  blade, sled, or 1U server designs due to lower host density (by
  using 2U, 3U or even 4U server designs). For data centers with older
  infrastructure, this might be a desirable feature.

  Data centers have a specified amount of power fed to a given rack or
  set of racks. Older data centers may have a power density as low as
  20 amps per rack, while more recent data centers can be architected
  to support power densities as high as 120 amps per rack. The selected
  server hardware must take power density into account.

Network connectivity
  The selected server hardware must have the appropriate number of
  network connections, as well as the right type of network
  connections, in order to support the proposed architecture. Ensure
  that, at a minimum, there are at least two diverse network
  connections coming into each rack.

The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly,
the network architecture impacts the server hardware selection and vice
versa.

Cloud fundamentally changes the ways that networking is provided and
consumed. Understanding the following concepts and decisions is
imperative when making the right architectural decisions.

OpenStack clouds generally have multiple network segments, with each
segment providing access to particular resources. The network segments
themselves also require network communication paths that should be
separated from the other networks. When designing network services for a
general purpose cloud, plan for either a physical or logical separation
of network segments used by operators and tenants. Additional network
segments can also be created for access to internal services such as the
message bus and database used by various systems. Segregating these
services onto separate networks helps to protect sensitive data and
prevent unauthorized access.

Choose a networking service based on the requirements of your instances.
The architecture and design of your cloud will impact whether you choose
OpenStack Networking (neutron) or legacy networking (nova-network).

Networking (neutron)
~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) is a first-class networking service that
gives tenants full control over the creation of virtual network
resources. This is often accomplished in the form of tunneling protocols
that establish encapsulated communication paths over existing network
infrastructure in order to segment tenant traffic. This method varies
depending on the specific implementation, but some of the more common
methods include tunneling over GRE, encapsulating with VXLAN, and VLAN
tags.

We recommend you design at least three network segments. The first segment
should be a public network, used by tenants and operators to access REST
APIs. The controller nodes and swift proxies are the only devices
connecting to this network segment. In some cases, this public network
might also be serviced by hardware load balancers and other network
devices.

The second segment is used by administrators to manage hardware resources.
Configuration management tools also utilize this segment for deploying
software and services onto new hardware. In some cases, this network
segment is also used for internal services, including the message bus
and database services. The second segment needs to communicate with every
hardware node. Due to the highly sensitive nature of this network segment,
it needs to be secured from unauthorized access.

The third network segment is used by applications and consumers to access the
physical network, and for users to access applications. This network is
segregated from the one used to access the cloud APIs and is not capable
of communicating directly with the hardware resources in the cloud.
Communication on this network segment is required by compute resource
nodes and network gateway services that allow application data to access the
physical network from outside the cloud.

Legacy networking (nova-network)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The legacy networking (nova-network) service is primarily a layer-2
networking service. It functions in two modes: flat networking mode and
VLAN mode. In flat network mode, all network hardware nodes and devices
throughout the cloud are connected to a single layer-2 network segment
that provides access to application data.

However, when the network devices in the cloud support segmentation using
VLANs, legacy networking can operate in the second mode. In this design
model, each tenant within the cloud is assigned a network subnet which is
mapped to a VLAN on the physical network. It is especially important to
remember that the maximum number of VLANs that can be used within a
spanning tree domain is 4096. This places a hard limit on the amount of
growth possible within the data center. Consequently, when designing a
general purpose cloud intended to support multiple tenants, we recommend
using legacy networking in VLAN mode rather than flat network mode.

Layer-2 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A network designed on layer-2 protocols has advantages over a network
designed on layer-3 protocols. In spite of the difficulties of using a
bridge to perform the network role of a router, many vendors, customers,
and service providers choose to use Ethernet in as many parts of their
networks as possible. The benefits of selecting a layer-2 design are:

* Ethernet frames contain all the essentials for networking. These
  include, but are not limited to, globally unique source addresses,
  globally unique destination addresses, and error control.

* Ethernet frames can carry any kind of packet. Networking at layer-2 is
  independent of the layer-3 protocol.

* Adding more layers to the Ethernet frame only slows the networking
  process down. This is known as nodal processing delay.

* You can add adjunct networking features, for example class of service
  (CoS) or multicasting, to Ethernet as readily as to IP networks.

* VLANs are an easy mechanism for isolating networks.

Most information starts and ends inside Ethernet frames. Today this applies
to data, voice, and video. The concept is that the network will benefit more
from the advantages of Ethernet if the transfer of information from a source
to a destination is in the form of Ethernet frames.

Although it is not a substitute for IP networking, networking at layer-2 can
be a powerful adjunct to IP networking.

Layer-2 Ethernet usage has these additional advantages over layer-3 IP
network usage:

* Speed
* Reduced overhead of the IP hierarchy.
* No need to keep track of address configuration as systems move around.

Whereas the simplicity of layer-2 protocols might work well in a data center
with hundreds of physical machines, cloud data centers have the additional
burden of needing to keep track of all virtual machine addresses and
networks. In these data centers, it is not uncommon for one physical node
to support 30-40 instances.

.. important::

   Networking at the frame level says nothing about the presence or
   absence of IP addresses at the packet level. Almost all ports, links, and
   devices on a network of LAN switches still have IP addresses, as do all the
   source and destination hosts. There are many reasons for the continued need
   for IP addressing. The largest one is the need to manage the network. A
   device or link without an IP address is usually invisible to most
   management applications. Utilities including remote access for diagnostics,
   file transfer of configurations and software, and similar applications
   cannot run without IP addresses as well as MAC addresses.

Layer-2 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Layer-2 network architectures have some limitations that become noticeable
when used outside of traditional data centers.

* The number of VLANs is limited to 4096.
* The number of MACs stored in switch tables is limited.
* You must accommodate the need to maintain a set of layer-4 devices to
  handle traffic control.
* MLAG, often used for switch redundancy, is a proprietary solution that
  does not scale beyond two devices and forces vendor lock-in.
* It can be difficult to troubleshoot a network without IP addresses and
  ICMP.
* Configuring ARP can be complicated on large layer-2 networks.
* All network devices need to be aware of all MACs, even instance MACs, so
  there is constant churn in MAC tables and network state changes as
  instances start and stop.
* Migrating MACs (instance migration) to different physical locations is a
  potential problem if you do not set ARP table timeouts properly.

It is important to know that layer-2 has a very limited set of network
management tools. It is difficult to control traffic as it does not have
mechanisms to manage the network or shape the traffic. Network
troubleshooting is also troublesome, in part because network devices have
no IP addresses. As a result, there is no reasonable way to check network
delay.

In a layer-2 network all devices are aware of all MACs, even those that
belong to instances. The network state information in the backbone changes
whenever an instance starts or stops. Because of this, there is far too
much churn in the MAC tables on the backbone switches.

Furthermore, on large layer-2 networks, configuring ARP learning can be
complicated. The setting for the MAC address timer on switches is critical
and, if set incorrectly, can cause significant performance problems. So when
migrating MACs to different physical locations to support instance migration,
problems may arise. As an example, the Cisco default MAC address timer is
extremely long. As such, the network information maintained in the switches
could be out of sync with the new location of the instance.

Layer-3 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In layer-3 networking, routing takes instance MAC and IP addresses out of the
network core, reducing state churn. The only time there would be a routing
state change is in the case of a Top of Rack (ToR) switch failure or a link
failure in the backbone itself. Other advantages of using a layer-3
architecture include:

* Layer-3 networks provide the same level of resiliency and scalability
  as the Internet.

* Controlling traffic with routing metrics is straightforward.

* You can configure layer-3 to use BGP confederation for scalability. This
  way core routers have state proportional to the number of racks, not to
  the number of servers or instances.

* There are a variety of well tested tools, such as ICMP, to monitor and
  manage traffic.

* Layer-3 architectures enable the use of :term:`quality of service (QoS)`
  to manage network performance.

Layer-3 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main limitation of layer-3 networking is that there is no built-in
isolation mechanism comparable to the VLANs in layer-2 networks.
Furthermore, the hierarchical nature of IP addresses means that an
instance is on the same subnet as its physical host, making migration out
of the subnet difficult. For these reasons, network virtualization needs
to use IP encapsulation and software at the end hosts. This provides
isolation and separates the addressing in the virtual layer from the
addressing in the physical layer. Other potential disadvantages of
layer-3 include the need to design an IP addressing scheme rather than
relying on the switches to keep track of the MAC addresses automatically,
and the need to configure the interior gateway routing protocol in the
switches.

Network design
~~~~~~~~~~~~~~

There are many reasons an OpenStack network has complex requirements. One
main factor is that many components interact at different levels of the
system stack, adding complexity. Data flows are also complex. Data in an
OpenStack cloud moves both between instances across the network (also
known as east-west traffic), as well as in and out of the system (also
known as north-south traffic). Physical server nodes have network
requirements that are independent of instance network requirements, and
must be isolated to account for scalability. We recommend separating the
networks for security purposes and tuning performance through traffic
shaping.

You must consider a number of important general technical and business
factors when planning and designing an OpenStack network. These include:

* A requirement for vendor independence. To avoid hardware or software
  vendor lock-in, the design should not rely on specific features of a
  vendor's router or switch.
* A requirement to massively scale the ecosystem to support millions of
  end users.
* A requirement to support indeterminate platforms and applications.
* A requirement to design for cost efficient operations to take advantage
  of massive scale.
* A requirement to ensure that there is no single point of failure in the
  cloud ecosystem.
* A requirement for high availability architecture to meet customer SLA
  requirements.
* A requirement to be tolerant of rack level failure.
* A requirement to maximize flexibility to architect future production
  environments.

Bearing in mind these considerations, we recommend the following:

* Layer-3 designs are preferable to layer-2 architectures.
* Design a dense multi-path network core to support multi-directional
  scaling and flexibility.
* Use hierarchical addressing because it is the only viable option to
  scale the network ecosystem.
* Use virtual networking to isolate instance service network traffic from
  the management and internal network traffic.
* Isolate virtual networks using encapsulation technologies.
* Use traffic shaping for performance tuning.
* Use eBGP to connect to the Internet up-link.
* Use iBGP to flatten the internal traffic on the layer-3 mesh.
* Determine the most effective configuration for the block storage
  network.

Additional considerations
-------------------------

There are several further considerations when designing a network-focused
OpenStack cloud. One is redundant networking, and in particular a ToR
switch high availability risk analysis. In most cases, it is much more
economical to use a single switch with a small pool of spare switches to
replace failed units than it is to outfit an entire data center with
redundant switches. Applications should tolerate rack level outages
without affecting normal operations since network and compute resources
are easily provisioned and plentiful.

Research indicates the mean time between failures (MTBF) on switches is
between 100,000 and 200,000 hours. This number is dependent on the ambient
temperature of the switch in the data center. When properly cooled and
maintained, this translates to between 11 and 22 years before failure. Even
in the worst case of poor ventilation and high ambient temperatures in the
data center, the MTBF is still 2-3 years.
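
The conversion from MTBF hours to years is simple arithmetic, shown here
as a quick sanity check of the figures above:

.. code-block:: python

   HOURS_PER_YEAR = 24 * 365   # 8760

   for mtbf_hours in (100000, 200000):
       years = mtbf_hours / HOURS_PER_YEAR
       print('%d hours is about %.1f years' % (mtbf_hours, years))
   # 100000 hours is about 11.4 years
   # 200000 hours is about 22.8 years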

See
https://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf
for further information.

.. list-table:: Networking service comparison
   :header-rows: 1

   * - Legacy networking (nova-network)
     - OpenStack Networking
   * - Simple, single agent
     - Complex, multiple agents
   * - Flat or VLAN
     - Flat, VLAN, Overlays, L2-L3, SDN
   * - No plug-in support
     - Plug-in support for 3rd parties
   * - No multi-tier topologies
     - Multi-tier topologies

Preparing for the future: IPv6 support
--------------------------------------

One of the most important networking topics today is the exhaustion of
IPv4 addresses. As of late 2015, ICANN announced that the final
IPv4 address blocks have been fully assigned. Because of this, the IPv6
protocol has become the future of network focused applications. IPv6
increases the address space significantly, fixes long standing issues
in the IPv4 protocol, and will become essential for network focused
applications in the future.

OpenStack Networking, when configured for it, supports IPv6. To enable
IPv6, create an IPv6 subnet in Networking and use IPv6 prefixes when
creating security groups.
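
As an illustration, an IPv6 subnet can be created through the Networking
API. This is a minimal sketch, assuming the ``openstacksdk`` library, a
``clouds.yaml`` entry named ``mycloud``, and an existing network named
``private`` (all assumptions for this example):

.. code-block:: python

   import openstack

   conn = openstack.connect(cloud='mycloud')
   network = conn.network.find_network('private')

   subnet = conn.network.create_subnet(
       network_id=network.id,
       ip_version=6,
       cidr='2001:db8::/64',        # documentation prefix; use your own
       ipv6_address_mode='slaac',   # stateless address autoconfiguration
       ipv6_ra_mode='slaac',
   )
   print('Created IPv6 subnet %s' % subnet.id)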

Asymmetric links
----------------

When designing a network architecture, the traffic patterns of an
application heavily influence the allocation of total bandwidth and
the number of links that you use to send and receive traffic. Applications
that provide file storage for customers allocate bandwidth and links to
favor incoming traffic; whereas video streaming applications allocate
bandwidth and links to favor outgoing traffic.

Performance
-----------

It is important to analyze the application's tolerance for latency and
jitter when designing an environment to support network focused
applications. Certain applications, for example VoIP, are less tolerant
of latency and jitter. When latency and jitter are issues, certain
applications may require tuning of QoS parameters and network device
queues to ensure that they queue for transmit immediately or are
guaranteed minimum bandwidth. Since OpenStack currently does not support
these functions, consider carefully your selected network plug-in.

The location of a service may also impact the application or consumer
experience. If an application serves differing content to different users,
it must properly direct connections to those specific locations. Where
appropriate, use a multi-site installation for these situations.

You can implement networking in two separate ways. Legacy networking
(nova-network) provides a flat DHCP network with a single broadcast domain.
This implementation does not support tenant isolation networks or advanced
plug-ins, but it is currently the only way to implement a distributed
layer-3 (L3) agent using the multi-host configuration. OpenStack Networking
(neutron) is the official networking implementation and provides a pluggable
architecture that supports a large variety of network methods. Some of these
include a layer-2 only provider network model, external device plug-ins, or
even OpenFlow controllers.

Networking at large scales becomes a set of boundary questions. The
determination of how large a layer-2 domain must be is based on the
number of nodes within the domain and the amount of broadcast traffic
that passes between instances. Breaking layer-2 boundaries may require
the implementation of overlay networks and tunnels. This decision is a
balancing act between the need for a smaller overhead and the need for a
smaller domain.

When selecting network devices, be aware that making a decision based on the
greatest port density often comes with a drawback. Aggregation switches and
routers have not all kept pace with Top of Rack switches and may induce
bottlenecks on north-south traffic. As a result, it may be possible for
massive amounts of downstream network utilization to impact upstream network
devices, impacting service to the cloud. Since OpenStack does not currently
provide a mechanism for traffic shaping or rate limiting, it is necessary to
implement these features at the network hardware level.

Tunable networking components
-----------------------------

When designing for network intensive workloads, consider configurable
networking components related to an OpenStack architecture design, such
as MTU and QoS. Some workloads require a larger MTU than normal due to
the transfer of large blocks of data. When providing network service for
applications such as video streaming or storage replication, we recommend
that you configure both OpenStack hardware nodes and the supporting
network equipment for jumbo frames where possible. This allows for better
use of available bandwidth. Configure jumbo frames across the complete
path the packets traverse. If one network component is not capable of
handling jumbo frames, then the entire path reverts to the default MTU.
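
Where the deployment's Networking service allows the MTU to be set at
network creation time (the ``net-mtu-writable`` API extension), a
jumbo-frame tenant network can be requested explicitly. The following is
a minimal sketch, assuming the ``openstacksdk`` library and a
``clouds.yaml`` entry named ``mycloud``; the MTU value is an assumption
that leaves room for VXLAN overhead on a 9000-byte physical path:

.. code-block:: python

   import openstack

   conn = openstack.connect(cloud='mycloud')

   # Every switch, router, and host NIC on the path must support
   # this MTU, or the path falls back to the default.
   network = conn.network.create_network(
       name='storage-replication', mtu=8950)
   print('Network %s created with MTU %s' % (network.name, network.mtu))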

:term:`Quality of Service (QoS)` also has a great impact on network
intensive workloads as it provides instant service to packets which have
a higher priority, mitigating the impact of poor network performance. In
applications such as Voice over IP (VoIP), differentiated services code
points are a near requirement for proper operation. You can also use QoS
in the opposite direction for mixed workloads to prevent low priority but
high bandwidth applications, for example backup services, video
conferencing, or file sharing, from blocking bandwidth that is needed for
the proper operation of other workloads. It is possible to tag file
storage traffic as a lower class, such as best effort or scavenger, to
allow the higher priority traffic through. In cases where regions within
a cloud might be geographically distributed, it may also be necessary to
plan accordingly to implement WAN optimization to combat latency or
packet loss.

Network hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

The network architecture determines which network hardware will be
used. Networking software is determined by the selected networking
hardware.

There are more subtle design impacts that need to be considered. The
selection of certain networking hardware (and the networking software)
affects the management tools that can be used. There are exceptions to
this; the rise of *open* networking software that supports a range of
networking hardware means there are instances where the relationship
between networking hardware and networking software is not as tightly
defined.

For a compute-focused architecture, we recommend designing the network
architecture using a scalable network model that makes it easy to add
capacity and bandwidth. A good example of such a model is the leaf-spine
model. In this type of network design, you can add additional
bandwidth as well as scale out to additional racks of gear. It is
important to select network hardware that supports the required port
count, port speed, and port density while allowing for future growth as
workload demands increase. In the network architecture, it is also
important to evaluate where to provide redundancy.

Some of the key considerations in the selection of networking hardware
include:

Port count
  The design will require networking hardware that has the requisite
  port count.

Port density
  The network design will be affected by the physical space that is
  required to provide the requisite port count. A higher port density
  is preferred, as it leaves more rack space for compute or storage
  components. This can also lead into considerations about fault domains
  and power density. Higher density switches are more expensive, therefore
  it is important not to over design the network.

Port speed
  The networking hardware must support the proposed network speed, for
  example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE).

Redundancy
  User requirements for high availability and cost considerations
  influence the level of network hardware redundancy.
  Network redundancy can be achieved by adding redundant power
  supplies or paired switches.

  .. note::

     The hardware must support network redundancy.

Power requirements
  Ensure that the physical data center provides the necessary power
  for the selected network hardware.

  .. note::

     This is not an issue for top of rack (ToR) switches. This may be an
     issue for spine switches in a leaf and spine fabric, or end of row
     (EoR) switches.

Protocol support
  It is possible to gain more performance out of a single storage
  system by using specialized network technologies such as RDMA, SRP,
  iSER, and SCST. The specifics of using these technologies are beyond
  the scope of this book.

There is no single best practice architecture for the networking
hardware supporting an OpenStack cloud. Some of the key factors that will
have a major influence on selection of networking hardware include:

Connectivity
  All nodes within an OpenStack cloud require network connectivity. In
  some cases, nodes require access to more than one network segment.
  The design must encompass sufficient network capacity and bandwidth
  to ensure that all communications within the cloud, both north-south
  and east-west traffic, have sufficient resources available.

Scalability
  The network design should encompass a physical and logical network
  design that can be easily expanded upon. Network hardware should
  offer the appropriate types of interfaces and speeds that are
  required by the hardware nodes.

Availability
  To ensure access to nodes within the cloud is not interrupted,
  we recommend that the network architecture identify any single
  points of failure and provide some level of redundancy or fault
  tolerance. The network infrastructure often involves use of
  networking protocols such as LACP, VRRP, or others to achieve a highly
  available network connection. It is also important to consider the
  networking implications on API availability. We recommend designing a
  load balancing solution within the network architecture to ensure that
  the APIs, and potentially other services in the cloud, are highly
  available.

Networking software selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) provides a wide variety of networking
services for instances. There are many additional networking software
packages that can be useful when managing OpenStack components. Some
examples include:

* Software to provide load balancing

* Network redundancy protocols

* Routing daemons

Some of these software packages are described in more detail in the
`OpenStack network nodes chapter
<http://docs.openstack.org/ha-guide/networking-ha.html>`_ in the
OpenStack High Availability Guide.

For a general purpose OpenStack cloud, the OpenStack infrastructure
components need to be highly available. If the design does not include
hardware load balancing, networking software packages like HAProxy will
need to be included.

For a compute-focused OpenStack cloud, the OpenStack infrastructure
components must be highly available. If the design does not include
hardware load balancing, you must add networking software packages, for
example, HAProxy.

* To provide users with a persistent storage mechanism
* As a scalable, reliable data store for virtual machine images

Selecting storage hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~

Storage hardware architecture is determined by selecting a specific
storage architecture. Determine the selection of storage architecture by
evaluating possible solutions against critical factors: the user
requirements, technical considerations, and operational considerations.
Consider the following factors when selecting storage hardware:

Cost
  Storage can be a significant portion of the overall system cost. For
  an organization that is concerned with vendor support, a commercial
  storage solution is advisable, although it comes with a higher price
  tag. If minimizing initial capital expenditure is a priority, a design
  based on commodity hardware would apply. The trade-off is potentially
  higher support costs and a greater risk of incompatibility and
  interoperability issues.

Performance
  The latency of storage I/O requests indicates performance. Performance
  requirements affect which solution you choose.

Scalability
  Scalability, along with expandability, is a major consideration in a
  general purpose OpenStack cloud. It might be difficult to predict
  the final intended size of the implementation as there are no
  established usage patterns for a general purpose cloud. It might
  become necessary to expand the initial deployment in order to
  accommodate growth and user demand.

Expandability
  Expandability is a major architecture factor for storage solutions
  with a general purpose OpenStack cloud. A storage solution that
  expands to 50 PB is considered more expandable than a solution that
  only scales to 10 PB. This meter is related to scalability, which is
  the measure of a solution's performance as it expands.

General purpose cloud storage requirements
------------------------------------------

Using a scale-out storage solution with direct-attached storage (DAS) in
the servers is well suited for a general purpose OpenStack cloud. Cloud
services requirements determine your choice of scale-out solution. You
need to determine if a single, highly expandable and highly vertically
scalable, centralized storage array is suitable for your design. After
determining an approach, select the storage hardware based on these
criteria.

This list expands upon the potential impacts of including a particular
storage architecture (and corresponding storage hardware) in the
design for a general purpose OpenStack cloud:

Connectivity
  If storage protocols other than Ethernet are part of the storage
  solution, ensure the appropriate hardware has been selected. If a
  centralized storage array is selected, ensure that the hypervisor will
  be able to connect to that storage array for image storage.

Usage
  How the particular storage architecture will be used is critical for
  determining the architecture. Some of the configurations that will
  influence the architecture include whether it will be used by the
  hypervisors for ephemeral instance storage, or if OpenStack Object
  Storage will use it for object storage.

Instance and image locations
  Where instances and images will be stored will influence the
  architecture.

Server hardware
  If the solution is a scale-out storage architecture that includes
  DAS, it will affect the server hardware selection. This could ripple
  into the decisions that affect host density, instance density, power
  density, OS-hypervisor, management tools, and others.

A general purpose OpenStack cloud has multiple options. The key factors
that will have an influence on selection of storage hardware for a
general purpose OpenStack cloud are as follows:

Capacity
  Hardware resources selected for the resource nodes should be capable
  of supporting enough storage for the cloud services. Defining the
  initial requirements and ensuring the design can support adding
  capacity is important; a replication sizing sketch follows this list.
  Hardware nodes selected for object storage should be capable of
  supporting a large number of inexpensive disks with no reliance on
  RAID controller cards. Hardware nodes selected for block storage
  should be capable of supporting high speed storage solutions and RAID
  controller cards to provide performance and redundancy to storage at
  a hardware level. Selecting hardware RAID controllers that
  automatically repair damaged arrays will assist with the replacement
  and repair of degraded or failed storage devices.

Performance
  Disks selected for object storage services do not need to be fast
  performing disks. We recommend that object storage nodes take
  advantage of the best cost per terabyte available for storage.
  Contrastingly, disks chosen for block storage services should take
  advantage of performance boosting features that may entail the use
  of SSDs or flash storage to provide high performance block storage
  pools. Storage performance of ephemeral disks used for instances
  should also be taken into consideration.

Fault tolerance
  Object storage resource nodes have no requirements for hardware
  fault tolerance or RAID controllers. It is not necessary to plan for
  fault tolerance within the object storage hardware because the
  object storage service provides replication between zones as a
  feature of the service. Block storage nodes, compute nodes, and
  cloud controllers should all have fault tolerance built in at the
  hardware level by making use of hardware RAID controllers and
  varying levels of RAID configuration. The level of RAID chosen
  should be consistent with the performance and availability
  requirements of the cloud.
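
The replication sizing sketch promised above: with the Object Storage
service's default of three replicas, raw disk capacity translates to
usable capacity roughly as follows (all figures are assumptions for
illustration):

.. code-block:: python

   raw_tb = 4 * 60 * 6      # 4 TB disks, 60 disks per node, 6 nodes
   replicas = 3             # swift's default replica count
   headroom = 0.85          # keep ~15% free so rebalancing has room

   usable_tb = raw_tb / replicas * headroom
   print('%d TB raw provides about %.0f TB usable' % (raw_tb, usable_tb))
   # 1440 TB raw provides about 408 TB usable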

Storage-focused cloud storage requirements
------------------------------------------

Storage-focused OpenStack clouds must address I/O intensive workloads.
These workloads are not CPU intensive, nor are they consistently network
intensive. The network may be heavily utilized to transfer storage, but
they are not otherwise network intensive.

The selection of storage hardware determines the overall performance and
scalability of a storage-focused OpenStack design architecture. Several
factors impact the design process, including the following.

Latency is a key consideration in a storage-focused OpenStack cloud.
|
||||
Using solid-state disks (SSDs) to minimize latency and, to reduce CPU
|
||||
delays caused by waiting for the storage, increases performance. Use
|
||||
RAID controller cards in compute hosts to improve the performance of the
|
||||
underlying disk subsystem.
|
||||
|
||||
Depending on the storage architecture, you can adopt a scale-out
|
||||
solution, or use a highly expandable and scalable centralized storage
|
||||
array. If a centralized storage array meets your requirements, then the
|
||||
array vendor determines the hardware selection. It is possible to build
|
||||
a storage array using commodity hardware with Open Source software, but
|
||||
requires people with expertise to build such a system.

On the other hand, a scale-out storage solution that uses
direct-attached storage (DAS) in the servers may be an appropriate
choice. This requires configuration of the server hardware to support
the storage solution.

Considerations affecting the storage architecture (and corresponding
storage hardware) of a storage-focused OpenStack cloud include:

Connectivity
    Ensure the connectivity matches the storage solution requirements.
    We recommend confirming that the network characteristics minimize
    latency to boost the overall performance of the design.

Latency
    Determine if the use case has consistent or highly variable latency.

Throughput
    Ensure that the storage solution throughput is optimized for your
    application requirements.

Server hardware
    Use of DAS impacts the server hardware choice and affects host
    density, instance density, power density, OS-hypervisor, and
    management tools.
@ -5,6 +5,63 @@ Operator requirements
This section describes operational factors affecting the design of an
OpenStack cloud.

Network design
~~~~~~~~~~~~~~

The network design for an OpenStack cluster includes decisions
regarding the interconnect needs within the cluster, the need to allow
clients to access their resources, and the access requirements for
operators to administer the cluster. Consider the bandwidth, latency,
and reliability of these networks.

Consider additional design decisions about monitoring and alarming.
If you are using an external provider, service level agreements (SLAs)
are typically defined in your contract. Operational considerations such
as bandwidth, latency, and jitter can be part of the SLA.

As demand for network resources increases, make sure your network
design accommodates expansion and upgrades. Operators add additional IP
address blocks and additional bandwidth capacity as needed. In
addition, consider managing hardware and software lifecycle events, for
example upgrades, decommissioning, and outages, while avoiding service
interruptions for tenants.

Factor maintainability into the overall network design. This includes
the ability to manage and maintain IP addresses as well as the use of
overlay identifiers such as VLAN tag IDs, GRE tunnel IDs, and MPLS
labels. For example, if you need to change all of the IP addresses on
a network, a process known as renumbering, the design must support
this function.
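
As a small illustration of designing for renumbering, the Python
standard library's ipaddress module can map each host deterministically
from an old prefix to a new one; the prefixes below are assumed
RFC 1918 examples.

.. code-block:: python

   # renumber.py -- sketch of deterministic renumbering: preserve each
   # host's offset within the subnet while moving to a new prefix.
   # The old/new prefixes are illustrative RFC 1918 assumptions.
   import ipaddress

   OLD_NET = ipaddress.ip_network("10.10.0.0/24")
   NEW_NET = ipaddress.ip_network("10.20.0.0/24")

   def renumber(addr: str) -> ipaddress.IPv4Address:
       """Map an address in OLD_NET to the same host offset in NEW_NET."""
       ip = ipaddress.ip_address(addr)
       if ip not in OLD_NET:
           raise ValueError(f"{addr} is not in {OLD_NET}")
       offset = int(ip) - int(OLD_NET.network_address)
       return NEW_NET.network_address + offset

   if __name__ == "__main__":
       for host in ("10.10.0.5", "10.10.0.42"):
           print(f"{host} -> {renumber(host)}")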

Address network-focused applications when considering certain
operational realities. For example, consider the impending exhaustion
of IPv4 addresses, the migration to IPv6, and the use of private
networks to segregate different types of traffic that an application
receives or generates. In the case of IPv4 to IPv6 migrations,
applications should follow best practices for storing IP addresses. We
recommend you avoid relying on IPv4 features that did not carry over to
the IPv6 protocol or that differ in implementation.
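
One such best practice is keeping application code address-family
agnostic. The following sketch, with a placeholder endpoint, uses
socket.getaddrinfo so that the same code connects over IPv6 or IPv4,
whichever the resolver returns:

.. code-block:: python

   # dual_stack_connect.py -- connect to whichever address family the
   # name resolves to, in the order returned by the resolver.
   # The host and port are placeholder assumptions.
   import socket

   def connect(host: str, port: int) -> socket.socket:
       """Try each resolved address (IPv6 or IPv4) until one connects."""
       last_err = None
       for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
               host, port, type=socket.SOCK_STREAM):
           try:
               sock = socket.socket(family, socktype, proto)
               sock.connect(sockaddr)
               return sock
           except OSError as err:
               last_err = err
       raise last_err or OSError(f"could not resolve {host}")

   if __name__ == "__main__":
       s = connect("example.com", 80)     # placeholder endpoint
       print("connected via", s.family.name)
       s.close()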

To segregate traffic, allow applications to create a private tenant
network for database and storage network traffic. Use a public network
for services that require direct client access from the Internet. Upon
segregating the traffic, consider :term:`quality of service (QoS)` and
security to ensure each network has the required level of service.
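
As a tenant-side sketch of that workflow, creating such a private
network with the openstacksdk client library might look like the
following; the cloud name, resource names, and CIDR are assumptions to
adapt to your environment.

.. code-block:: python

   # private_net.py -- create a private tenant network for database and
   # storage traffic using openstacksdk. The cloud name, labels, and
   # CIDR are illustrative assumptions; adapt to your clouds.yaml.
   import openstack

   conn = openstack.connect(cloud="mycloud")   # assumed clouds.yaml entry

   net = conn.network.create_network(name="db-storage-net")
   subnet = conn.network.create_subnet(
       name="db-storage-subnet",
       network_id=net.id,
       ip_version=4,
       cidr="192.0.2.0/24",                    # documentation range
   )
   print(f"created {net.name} with subnet {subnet.cidr}")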

Also consider the routing of network traffic. For some applications,
you may need to develop a complex policy framework for routing. To
create a routing policy that satisfies business requirements, consider
the economic cost of transmitting traffic over expensive links versus
cheaper links, in addition to bandwidth, latency, and jitter
requirements.

Finally, consider how to respond to network events. How load transfers
from one link to another during a failure scenario can be a factor in
the design. If you do not plan network capacity correctly, failover
traffic can overwhelm other ports or network links and create a
cascading failure scenario, in which traffic that fails over to one
link overwhelms that link and then moves to the subsequent links until
all network traffic stops.
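
The arithmetic behind such a cascade is simple enough to check at
design time. The following sketch, in which link loads and capacities
are assumed values in Gbps, redistributes a failed link's load and
reports how many links survive:

.. code-block:: python

   # cascade_check.py -- toy model of cascading failover: when a link
   # fails, its load spreads evenly over the survivors; any survivor
   # pushed past capacity fails in turn. Values are assumptions in Gbps.

   def cascade(loads: list[float], capacity: float) -> int:
       """Fail the busiest link, then count how many links survive."""
       links = sorted(loads, reverse=True)
       spill = links.pop(0)                  # initial failure
       while links:
           share = spill / len(links)
           links = [load + share for load in links]
           overloaded = [l for l in links if l > capacity]
           if not overloaded:
               return len(links)             # cascade contained
           spill = sum(overloaded)           # overloaded links fail too
           links = [l for l in links if l <= capacity]
       return 0                              # total outage

   if __name__ == "__main__":
       print(cascade(loads=[6.0, 5.0, 4.0, 3.0], capacity=10.0))  # -> 3
       print(cascade(loads=[9.0, 9.0, 9.0], capacity=10.0))       # -> 0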

SLA considerations
~~~~~~~~~~~~~~~~~~

@ -102,6 +159,89 @@ managing and maintaining your OpenStack environment, see the
`Operations chapter <http://docs.openstack.org/ops-guide/operations.html>`_
in the OpenStack Operations Guide.

Logging and monitoring
----------------------

OpenStack clouds require appropriate monitoring platforms to identify
and manage errors.

.. note::

   We recommend leveraging existing monitoring systems to see if they
   are able to effectively monitor an OpenStack environment.

Specific meters that are critically important to capture include:

* Image disk utilization

* Response time to the Compute API
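
As a sketch of how the second meter might be sampled outside Telemetry,
the following times a simple Compute API list call; the endpoint and
token are placeholder assumptions, and a production probe would obtain
credentials from Keystone:

.. code-block:: python

   # api_probe.py -- sample Compute API response time by timing a
   # simple GET /servers call. NOVA_ENDPOINT and TOKEN are placeholder
   # assumptions; a real probe would authenticate with Keystone first.
   import time

   import requests

   NOVA_ENDPOINT = "http://controller:8774/v2.1"   # assumed endpoint
   TOKEN = "REPLACE_WITH_KEYSTONE_TOKEN"           # assumed credential

   def compute_api_latency_ms() -> float:
       start = time.perf_counter()
       resp = requests.get(
           f"{NOVA_ENDPOINT}/servers",
           headers={"X-Auth-Token": TOKEN},
           timeout=10,
       )
       resp.raise_for_status()
       return (time.perf_counter() - start) * 1000.0

   if __name__ == "__main__":
       print(f"compute API responded in {compute_api_latency_ms():.1f} ms")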

Logging and monitoring do not differ significantly for a multi-site
OpenStack cloud. The tools described in the `Logging and monitoring
chapter <http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`__
of the Operations Guide remain applicable. Logging and monitoring can
be provided on a per-site basis, in a common centralized location, or
both.

When deploying logging and monitoring facilities to a centralized
location, take care with the load placed on the inter-site networking
links.
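
A back-of-the-envelope check of that load is straightforward. The
per-node log volume and node count in the following sketch are
assumptions to adapt:

.. code-block:: python

   # log_bandwidth.py -- estimate steady-state inter-site bandwidth
   # needed to centralize logs. All inputs are illustrative assumptions.

   GB = 8 * 1024 ** 3                 # bits per gigabyte (binary)

   def required_mbps(nodes: int, gb_per_node_per_day: float) -> float:
       """Average bandwidth in Mbit/s to ship one day of logs in a day."""
       bits_per_day = nodes * gb_per_node_per_day * GB
       return bits_per_day / 86_400 / 1e6

   if __name__ == "__main__":
       # assumed: 200 nodes at a remote site, each emitting 2 GB/day
       avg = required_mbps(nodes=200, gb_per_node_per_day=2.0)
       print(f"average: {avg:.1f} Mbit/s; size links for peaks above this")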

Management software
-------------------

Cloud environments often use management software that provides
clustering, logging, monitoring, and alerting. This affects the overall
OpenStack cloud design, which must account for the additional resource
consumption of that software, such as CPU, RAM, storage, and network
bandwidth.

The inclusion of clustering software, such as Corosync or Pacemaker, is
determined primarily by the availability requirements of the cloud
infrastructure and the complexity of supporting the configuration after
it is deployed. The `OpenStack High Availability Guide
<http://docs.openstack.org/ha-guide/>`_ provides more details on the
installation and configuration of Corosync and Pacemaker, should these
packages need to be included in the design.

Some other potential design impacts include:

* OS-hypervisor combination: ensure that the selected logging,
  monitoring, or alerting tools support the proposed OS-hypervisor
  combination.

* Network hardware: the network hardware selection needs to be
  supported by the logging, monitoring, and alerting software.

Database software
-----------------

Most OpenStack components require access to back-end database services
to store state and configuration information. Choose an appropriate
back-end database that satisfies the availability and fault tolerance
requirements of the OpenStack services.

MySQL is the default database for OpenStack, but other compatible
databases are available.

.. note::

   Telemetry uses MongoDB.

The chosen high availability database solution changes according to the
selected database. MySQL, for example, provides several options: use a
replication technology such as Galera for active-active clustering, or
use some form of shared storage for active-passive configurations. Each
of these potential solutions has an impact on the design:

* Solutions that employ Galera/MariaDB require at least three MySQL
  nodes (see the cluster-size check sketch after this list).

* MongoDB has its own design considerations for high availability.

* OpenStack design, generally, does not include shared storage.
  However, for some high availability designs, certain components might
  require it depending on the specific implementation.
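
As a sketch of how the first point might be verified on a running
cluster, Galera exposes its membership count through the
wsrep_cluster_size status variable, which any MySQL client library can
poll; the connection parameters here are assumptions:

.. code-block:: python

   # galera_check.py -- verify that a Galera cluster still has quorum
   # by reading wsrep_cluster_size. Connection details are assumptions.
   import pymysql

   EXPECTED_NODES = 3   # Galera active-active needs at least three nodes

   def cluster_size(host: str) -> int:
       conn = pymysql.connect(host=host, user="monitor",
                              password="REPLACE_ME", database="mysql")
       try:
           with conn.cursor() as cur:
               cur.execute("SHOW STATUS LIKE 'wsrep_cluster_size'")
               _, value = cur.fetchone()
               return int(value)
       finally:
           conn.close()

   if __name__ == "__main__":
       size = cluster_size("db1.example.com")   # assumed node address
       status = "OK" if size >= EXPECTED_NODES else "DEGRADED"
       print(f"cluster size {size}/{EXPECTED_NODES}: {status}")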

Operator access to systems
~~~~~~~~~~~~~~~~~~~~~~~~~~

33 doc/arch-design-draft/source/overview-software-licensing.rst Normal file
@ -0,0 +1,33 @@
==================
Software licensing
==================

The many different forms of license agreements for software are often
written with the use of dedicated hardware in mind. This model is
relevant for the cloud platform itself, including the hypervisor
operating system and supporting software for items such as database,
RPC, backup, and so on. Take care when offering Compute service
instances and applications to end users of the cloud, because the
license terms for that software may need some adjustment to operate
economically in the cloud.

Multi-site OpenStack deployments present additional licensing
considerations over and above regular OpenStack clouds, particularly
where site licenses are in use to provide cost-efficient access to
software licenses. The licensing for host operating systems, guest
operating systems, OpenStack distributions (if applicable),
software-defined infrastructure (including network controllers and
storage systems), and even individual applications needs to be
evaluated.

Topics to consider include:

* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.

* Differentiations between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made in situations where one site is
  a cold standby for disaster recovery purposes only.

* Certain locations might require local vendors to provide support and
  services for each site, which may vary with the licensing agreement
  in place.
@ -55,5 +55,6 @@ covered include:
   overview-planning
   overview-customer-requirements
   overview-legal-requirements
   overview-software-licensing
   overview-security-requirements
   overview-operator-requirements