[arch-design-draft] Migrate technical requirements content
Move and edit technical requirements content to the new arch guide structure

Change-Id: Ic6ca927fbb68e451fe25639a8848360e53284b6b
Implements: blueprint arch-guide-restructure
parent 5a45c9ce62, commit 3bc0e94b9d
@ -1,449 +0,0 @@
==================
Hardware selection
==================

Hardware selection involves three key areas:

* Network

* Compute

* Storage

Network hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

The network architecture determines which network hardware will be
used. Networking software is determined by the selected networking
hardware.

There are more subtle design impacts that need to be considered. The
selection of certain networking hardware (and the networking software)
affects the management tools that can be used. There are exceptions to
this; the rise of *open* networking software that supports a range of
networking hardware means there are instances where the relationship
between networking hardware and networking software is not as tightly
defined.

For a compute-focused architecture, we recommend designing the network
architecture using a scalable network model that makes it easy to add
capacity and bandwidth. A good example of such a model is the leaf-spine
model. In this type of network design, it is possible to easily add
additional bandwidth as well as scale out to additional racks of gear.
It is important to select network hardware that supports the required
port count, port speed, and port density while also allowing for future
growth as workload demands increase. It is also important to evaluate
where in the network architecture it is valuable to provide redundancy.

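As a rough illustration of the port-count and bandwidth arithmetic involved in a leaf-spine design, the sketch below computes the oversubscription ratio of a single leaf switch. The port counts and link speeds are hypothetical assumptions, not recommendations:

```python
# Hypothetical leaf-spine sizing sketch: the port counts and speeds below
# are illustrative assumptions, not vendor recommendations.
LEAF_PORTS = 48          # server-facing (downlink) ports per leaf switch
LEAF_UPLINKS = 6         # uplink ports per leaf toward the spine
DOWNLINK_GBPS = 10       # 10 GbE to servers
UPLINK_GBPS = 40         # 40 GbE to spines

def oversubscription(leaf_ports=LEAF_PORTS, leaf_uplinks=LEAF_UPLINKS,
                     down_gbps=DOWNLINK_GBPS, up_gbps=UPLINK_GBPS):
    """Ratio of server-facing bandwidth to spine-facing bandwidth per leaf."""
    return (leaf_ports * down_gbps) / (leaf_uplinks * up_gbps)

print(oversubscription())  # 480 Gb/s down vs 240 Gb/s up -> 2.0
```

A ratio near 1.0 means the fabric is non-blocking; higher ratios trade east-west bandwidth for cost, which is exactly the kind of trade-off the considerations below weigh.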
Some of the key considerations that should be included in the selection
of networking hardware include:

Port count
   The design will require networking hardware that has the requisite
   port count.

Port density
   The network design will be affected by the physical space that is
   required to provide the requisite port count. A higher port density
   is preferred, as it leaves more rack space for compute or storage
   components that may be required by the design. This can also lead
   into considerations about fault domains and power density. Higher
   density switches are more expensive, therefore it is important not
   to over-design the network.

Port speed
   The networking hardware must support the proposed network speed, for
   example: 1 GbE, 10 GbE, 40 GbE, or even 100 GbE.

Redundancy
   User requirements for high availability and cost considerations
   influence the required level of network hardware redundancy.
   Network redundancy can be achieved by adding redundant power
   supplies or paired switches.

   .. note::

      If this is a requirement, the hardware must support this
      configuration. User requirements determine if a completely
      redundant network infrastructure is required.

Power requirements
   Ensure that the physical data center provides the necessary power
   for the selected network hardware.

   .. note::

      This is not an issue for top of rack (ToR) switches. This may be
      an issue for spine switches in a leaf-spine fabric, or end of row
      (EoR) switches.

Protocol support
   It is possible to gain more performance out of a single storage
   system by using specialized network technologies such as RDMA, SRP,
   iSER, and SCST. The specifics of using these technologies are beyond
   the scope of this book.

There is no single best practice architecture for the networking
hardware supporting an OpenStack cloud that will apply to all
implementations. Some of the key factors that will have a major
influence on the selection of networking hardware include:

Connectivity
   All nodes within an OpenStack cloud require network connectivity. In
   some cases, nodes require access to more than one network segment.
   The design must encompass sufficient network capacity and bandwidth
   to ensure that all communications within the cloud, both north-south
   and east-west traffic, have sufficient resources available.

Scalability
   The network design should encompass a physical and logical network
   design that can be easily expanded upon. Network hardware should
   offer the appropriate types of interfaces and speeds that are
   required by the hardware nodes.

Availability
   To ensure that access to nodes within the cloud is not interrupted,
   we recommend that the network architecture identify any single
   points of failure and provide some level of redundancy or fault
   tolerance. The network infrastructure often involves the use of
   networking protocols such as LACP, VRRP, or others to achieve a
   highly available network connection. It is also important to
   consider the networking implications on API availability. We
   recommend designing a load balancing solution within the network
   architecture to ensure that the APIs, and potentially other services
   in the cloud, are highly available.

Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following factors when selecting compute (server) hardware:

* Server density

  A measure of how many servers can fit into a given measure of
  physical space, such as a rack unit [U].

* Resource capacity

  The number of CPU cores, how much RAM, or how much storage a given
  server delivers.

* Expandability

  The number of additional resources you can add to a server before it
  reaches capacity.

* Cost

  The relative cost of the hardware weighed against the level of
  design effort needed to build the system.

Weigh these considerations against each other to determine the best
design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing
resource capacity and expandability can increase cost but decrease
server density. Decreasing cost often means decreasing supportability,
server density, resource capacity, and expandability.

When selecting server hardware, consider the compute capacity (CPU
cores and RAM) that it provides. The hardware must supply adequate CPU
sockets, CPU cores, and RAM; network connectivity and storage capacity
are less critical, but the hardware still needs to provide enough of
both to meet the user requirements.

For a compute-focused cloud, emphasis should be on server hardware that
can offer more CPU sockets, more CPU cores, and more RAM. Network
connectivity and storage capacity are less critical.

When designing an OpenStack cloud architecture, you must consider
whether you intend to scale up or scale out. Selecting a smaller number
of larger hosts, or a larger number of smaller hosts, depends on a
combination of factors: cost, power, cooling, physical rack and floor
space, support-warranty, and manageability.

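The scale-up versus scale-out trade-off can be made concrete with a small sketch: for a fixed core budget, larger hosts mean fewer machines to manage but a larger failure domain. All counts below are illustrative assumptions:

```python
# Illustrative scale-up vs. scale-out comparison; the core counts are
# hypothetical assumptions, not vendor figures.
def fleet(total_cores, cores_per_host):
    """Hosts needed, and the share of capacity lost when one host fails."""
    hosts = -(-total_cores // cores_per_host)    # ceiling division
    blast_radius = cores_per_host / total_cores  # failure-domain size
    return hosts, blast_radius

print(fleet(1024, 128))  # scale up: fewer hosts, larger failure domain
print(fleet(1024, 32))   # scale out: more hosts, smaller failure domain
```

For 1024 cores, 128-core hosts give 8 machines each carrying 12.5% of capacity, while 32-core hosts give 32 machines each carrying about 3%; cost, power, and cooling then decide which side of that trade-off fits the deployment.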
Consider the following in selecting the server hardware form factor
suited for your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which results in only eight servers per ten rack units.

* 1U rack-mounted servers can offer greater server density than a blade
  server solution, but are often limited to dual-socket, multi-core CPU
  configurations. It is possible to place forty 1U servers in a rack,
  providing space for the top of rack (ToR) switches, compared to 32
  full width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred vendor
     policies or concerns with support and hardware warranties of
     non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. These deliver
  higher density as compared to typical 1U or 2U rack-mounted servers.
  For example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
   More hosts are required to support the anticipated scale if the
   design architecture uses dual-socket hardware designs.

   For a general purpose OpenStack cloud, sizing is an important
   consideration. The expected or anticipated number of instances that
   each hypervisor can host is a common meter used in sizing the
   deployment. The selected server hardware needs to support the
   expected or anticipated instance density.

Host density
   Another option to address the higher host count is to use a
   quad-socket platform. Taking this approach decreases host density
   and increases rack count. This configuration affects the number of
   power connections and also impacts network and cooling requirements.

   Physical data centers have limited physical space, power, and
   cooling. The number of hosts (or hypervisors) that can be fitted
   into a given metric (rack, rack unit, or floor tile) is another
   important method of sizing. Floor weight is an often overlooked
   consideration. The data center floor must be able to support the
   weight of the proposed number of hosts within a rack or set of
   racks. These factors need to be applied as part of the host density
   calculation and server hardware selection.

Power and cooling density
   The power and cooling density requirements might be lower than with
   blade, sled, or 1U server designs due to lower host density (by
   using 2U, 3U, or even 4U server designs). For data centers with
   older infrastructure, this might be a desirable feature.

   Data centers have a specified amount of power fed to a given rack or
   set of racks. Older data centers may have a power density as low as
   20 amps per rack, while more recent data centers can be architected
   to support power densities as high as 120 amps per rack. The
   selected server hardware must take power density into account.

Network connectivity
   The selected server hardware must have the appropriate number of
   network connections, as well as the right type of network
   connections, in order to support the proposed architecture. Ensure
   that there are at least two diverse network connections coming into
   each rack.

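The sizing meters above (instance density, host density, power density) combine into a back-of-the-envelope calculation: host count follows from instance density, and rack count is set by whichever of space or power is the tighter constraint. All figures below are illustrative assumptions:

```python
# Hypothetical host-count and rack-count estimate; instance counts,
# densities, and amp ratings are illustrative assumptions.
import math

def hosts_needed(total_instances, instances_per_host):
    """Hypervisors required for the anticipated instance count."""
    return math.ceil(total_instances / instances_per_host)

def racks_needed(hosts, hosts_per_rack, amps_per_host, rack_amp_budget):
    """Racks required; the tighter of the space and power limits wins."""
    by_space = math.ceil(hosts / hosts_per_rack)
    by_power = math.ceil(hosts * amps_per_host / rack_amp_budget)
    return max(by_space, by_power)

hosts = hosts_needed(total_instances=1200, instances_per_host=40)
print(hosts)  # 30
print(racks_needed(hosts, hosts_per_rack=20, amps_per_host=4,
                   rack_amp_budget=30))  # power-limited: 4 racks, not 2
```

Here a 30 amp-per-rack budget, not rack units, caps the design at fewer hosts per rack, doubling the rack count; this is exactly why older, lower-amperage data centers push designs toward lower host density.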
The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly,
the network architecture impacts the server hardware selection and vice
versa.

Hardware for general purpose OpenStack cloud
--------------------------------------------

Hardware for a general purpose OpenStack cloud should reflect a cloud
with no pre-defined usage model, designed to run a wide variety of
applications with varying resource usage requirements. These
applications include any of the following:

* RAM-intensive

* CPU-intensive

* Storage-intensive

Certain hardware form factors may better suit a general purpose
OpenStack cloud due to the requirement for an equal (or nearly equal)
balance of resources. Server hardware must provide the following:

* Equal (or nearly equal) balance of compute capacity (RAM and CPU)

* Network capacity (number and speed of links)

* Storage capacity (gigabytes or terabytes as well as
  :term:`Input/Output Operations Per Second (IOPS)`)

The best form factor for server hardware supporting a general purpose
OpenStack cloud is driven by outside business and cost factors. No
single reference architecture applies to all implementations; the
decision must flow from user requirements, technical considerations,
and operational considerations.

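One way to reason about the "equal balance" requirement is to compare a host's RAM-per-core ratio against the instance flavors it is expected to run. The host and flavor shapes below are hypothetical, loosely modeled on common small/medium/large sizes:

```python
# Hypothetical balance check: flavors whose GB-per-vCPU ratio exceeds the
# host's ratio will exhaust RAM before CPU. All shapes are assumptions.
HOST = {"cores": 48, "ram_gb": 192}  # 4 GB of RAM per core
FLAVORS = {                          # name: (vcpus, ram_gb)
    "small": (1, 2),
    "medium": (2, 8),
    "large": (4, 32),
}

host_ratio = HOST["ram_gb"] / HOST["cores"]
for name, (vcpus, ram_gb) in FLAVORS.items():
    flavor_ratio = ram_gb / vcpus
    verdict = "balanced" if flavor_ratio <= host_ratio else "RAM-bound"
    print(f"{name}: {flavor_ratio:.0f} GB/vCPU -> {verdict}")
# -> the "large" flavor is RAM-bound on this host
```

A fleet dominated by RAM-bound flavors would strand CPU capacity, which is the imbalance the bullet list above warns against.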
Selecting storage hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~

The storage hardware architecture is determined by the selected storage
architecture. Determine the selection of storage architecture by
evaluating possible solutions against the critical factors: the user
requirements, technical considerations, and operational considerations.
Consider the following factors when selecting storage hardware:

Cost
   Storage can be a significant portion of the overall system cost. For
   an organization that is concerned with vendor support, a commercial
   storage solution is advisable, although it comes with a higher price
   tag. If initial capital expenditure requires minimization, designing
   a system based on commodity hardware would apply. The trade-off is
   potentially higher support costs and a greater risk of
   incompatibility and interoperability issues.

Performance
   The latency of storage I/O requests indicates performance.
   Performance requirements affect which solution you choose.

Scalability
   Scalability, along with expandability, is a major consideration in a
   general purpose OpenStack cloud. It might be difficult to predict
   the final intended size of the implementation as there are no
   established usage patterns for a general purpose cloud. It might
   become necessary to expand the initial deployment in order to
   accommodate growth and user demand.

Expandability
   Expandability is a major architecture factor for storage solutions
   with a general purpose OpenStack cloud. A storage solution that
   expands to 50 PB is considered more expandable than a solution that
   only scales to 10 PB. This meter is related to scalability, which is
   the measure of a solution's performance as it expands.

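As a rough illustration of how capacity and performance interact, the sketch below estimates aggregate IOPS for a disk pool. The per-disk figures and the RAID write penalty are common rules of thumb, assumed here purely for illustration:

```python
# Rough aggregate-IOPS estimate for a storage pool. Per-disk figures are
# illustrative assumptions (e.g. ~150 IOPS for a 7.2k HDD, ~10k for a
# SATA SSD), as is the RAID write penalty.
def pool_iops(disks, iops_per_disk, raid_write_penalty=1):
    """Usable write IOPS for a pool, discounted by the RAID write penalty."""
    raw = disks * iops_per_disk
    return raw / raid_write_penalty

print(pool_iops(disks=24, iops_per_disk=150))  # HDD object-storage pool
print(pool_iops(disks=8, iops_per_disk=10_000,
                raid_write_penalty=2))         # SSD RAID 10 block-storage pool
```

Even a small SSD pool behind a RAID controller can outperform a much larger HDD pool on IOPS, which is why the sections below steer object storage toward cheap capacity disks and block storage toward SSDs.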
General purpose cloud storage requirements
------------------------------------------

Using a scale-out storage solution with direct-attached storage (DAS)
in the servers is well suited for a general purpose OpenStack cloud.
Cloud services requirements determine your choice of scale-out
solution. You need to determine if a single, highly expandable and
highly vertically scalable, centralized storage array is suitable for
your design. After determining an approach, select the storage hardware
based on this criteria.

This list expands upon the potential impacts of including a particular
storage architecture (and corresponding storage hardware) in the design
for a general purpose OpenStack cloud:

Connectivity
   If storage protocols other than Ethernet are part of the storage
   solution, ensure the appropriate hardware has been selected. If a
   centralized storage array is selected, ensure that the hypervisor
   will be able to connect to that storage array for image storage.

Usage
   How the particular storage architecture will be used is critical for
   determining the architecture. Some of the configurations that will
   influence the architecture include whether it will be used by the
   hypervisors for ephemeral instance storage, or if OpenStack Object
   Storage will use it for object storage.

Instance and image locations
   Where instances and images will be stored will influence the
   architecture.

Server hardware
   If the solution is a scale-out storage architecture that includes
   DAS, it will affect the server hardware selection. This could ripple
   into the decisions that affect host density, instance density, power
   density, OS-hypervisor, management tools, and others.

A general purpose OpenStack cloud has multiple options. The key factors
that will influence the selection of storage hardware for a general
purpose OpenStack cloud are as follows:

Capacity
   Hardware resources selected for the resource nodes should be capable
   of supporting enough storage for the cloud services. Defining the
   initial requirements and ensuring the design can support adding
   capacity is important. Hardware nodes selected for object storage
   should be capable of supporting a large number of inexpensive disks
   with no reliance on RAID controller cards. Hardware nodes selected
   for block storage should be capable of supporting high speed storage
   solutions and RAID controller cards to provide performance and
   redundancy to storage at a hardware level. Selecting hardware RAID
   controllers that automatically repair damaged arrays will assist
   with the replacement and repair of degraded or failed storage
   devices.

Performance
   Disks selected for object storage services do not need to be fast
   performing disks. We recommend that object storage nodes take
   advantage of the best cost per terabyte available for storage.
   In contrast, disks chosen for block storage services should take
   advantage of performance boosting features that may entail the use
   of SSDs or flash storage to provide high performance block storage
   pools. Storage performance of ephemeral disks used for instances
   should also be taken into consideration.

Fault tolerance
   Object storage resource nodes have no requirements for hardware
   fault tolerance or RAID controllers. It is not necessary to plan for
   fault tolerance within the object storage hardware because the
   object storage service provides replication between zones as a
   feature of the service. Block storage nodes, compute nodes, and
   cloud controllers should all have fault tolerance built in at the
   hardware level by making use of hardware RAID controllers and
   varying levels of RAID configuration. The level of RAID chosen
   should be consistent with the performance and availability
   requirements of the cloud.

Storage-focused cloud storage requirements
------------------------------------------

Storage-focused OpenStack clouds must address I/O intensive workloads.
These workloads are not CPU intensive, nor are they consistently
network intensive. The network may be heavily utilized to transfer
storage, but they are not otherwise network intensive.

The selection of storage hardware determines the overall performance
and scalability of a storage-focused OpenStack design architecture.
Several factors impact the design process, including:

Latency is a key consideration in a storage-focused OpenStack cloud.
Using solid-state disks (SSDs) minimizes latency and reduces CPU delays
caused by waiting for the storage, which increases performance. Use
RAID controller cards in compute hosts to improve the performance of
the underlying disk subsystem.

Depending on the storage architecture, you can adopt a scale-out
solution, or use a highly expandable and scalable centralized storage
array. If a centralized storage array meets your requirements, then the
array vendor determines the hardware selection. It is possible to build
a storage array using commodity hardware with Open Source software, but
doing so requires people with expertise to build such a system.

On the other hand, a scale-out storage solution that uses
direct-attached storage (DAS) in the servers may be an appropriate
choice. This requires configuration of the server hardware to support
the storage solution.

Considerations affecting the storage architecture (and corresponding
storage hardware) of a storage-focused OpenStack cloud include:

Connectivity
   Ensure the connectivity matches the storage solution requirements.
   We recommend confirming that the network characteristics minimize
   latency to boost the overall performance of the design.

Latency
   Determine if the use case has consistent or highly variable latency.

Throughput
   Ensure that the storage solution throughput is optimized for your
   application requirements.

Server hardware
   Use of DAS impacts the server hardware choice and affects host
   density, instance density, power density, OS-hypervisor, and
   management tools.
@ -1,27 +0,0 @@
======================
Logging and monitoring
======================

OpenStack clouds require appropriate monitoring platforms to catch and
manage errors.

.. note::

   We recommend leveraging existing monitoring systems to see if they
   are able to effectively monitor an OpenStack environment.

Specific meters that are critically important to capture include:

* Image disk utilization

* Response time to the Compute API

Logging and monitoring does not significantly differ for a multi-site
OpenStack cloud. The tools described in the `Logging and monitoring
chapter <http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`__
of the Operations Guide remain applicable. Logging and monitoring can
be provided on a per-site basis, and in a common centralized location.

When attempting to deploy logging and monitoring facilities to a
centralized location, care must be taken with the load placed on the
inter-site networking links.
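As a minimal sketch of the "response time to the Compute API" meter, the snippet below reduces raw timing samples to a nearest-rank percentile that a monitoring platform might alert on. The sample data and the 500 ms threshold are illustrative assumptions, not OpenStack defaults:

```python
# Minimal sketch: turn raw API response-time samples (seconds) into a
# percentile meter. Sample values and the alert threshold are assumptions.
def percentile(samples, pct):
    """Nearest-rank percentile of a list of response times."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

response_times = [0.08, 0.11, 0.09, 0.42, 0.10, 0.12, 0.95, 0.10, 0.13, 0.11]
p95 = percentile(response_times, 95)
print(p95)        # 0.95
print(p95 > 0.5)  # True -> would trigger an alert at a 500 ms threshold
```

Percentiles rather than averages catch the tail latencies that a few slow API calls produce, which a mean over the same samples would hide.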
@ -1,445 +0,0 @@
==========
Networking
==========

OpenStack clouds generally have multiple network segments, with each
segment providing access to particular resources. The network segments
themselves also require network communication paths that should be
separated from the other networks. When designing network services for
a general purpose cloud, plan for either a physical or logical
separation of network segments used by operators and tenants.
Additional network segments can also be created for access to internal
services such as the message bus and database used by various systems.
Segregating these services onto separate networks helps to protect
sensitive data and guard against unauthorized access.

Choose a networking service based on the requirements of your
instances. The architecture and design of your cloud will impact
whether you choose OpenStack Networking (neutron) or legacy networking
(nova-network).

Networking (neutron)
~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) is a first-class networking service that
gives tenants full control over the creation of virtual network
resources. This is often accomplished in the form of tunneling
protocols that establish encapsulated communication paths over existing
network infrastructure in order to segment tenant traffic. This method
varies depending on the specific implementation, but some of the more
common methods include tunneling over GRE, encapsulating with VXLAN,
and VLAN tags.

We recommend you design at least three network segments. The first
segment should be a public network, used by tenants and operators to
access the REST APIs. The controller nodes and swift proxies are the
only devices connecting to this network segment. In some cases, this
public network might also be serviced by hardware load balancers and
other network devices.

The second segment is used by administrators to manage hardware
resources. Configuration management tools also utilize this segment for
deploying software and services onto new hardware. In some cases, this
network segment is also used for internal services, including the
message bus and database services. The second segment needs to
communicate with every hardware node. Due to the highly sensitive
nature of this network segment, it needs to be secured from
unauthorized access.

The third network segment is used by applications and consumers to
access the physical network, and for users to access applications. This
network is segregated from the one used to access the cloud APIs and is
not capable of communicating directly with the hardware resources in
the cloud. Communication on this network segment is required by compute
resource nodes and network gateway services that allow application data
to access the physical network from outside the cloud.

Legacy networking (nova-network)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The legacy networking (nova-network) service is primarily a layer-2
networking service. It functions in two modes: flat networking mode and
VLAN mode. In flat network mode, all network hardware nodes and devices
throughout the cloud are connected to a single layer-2 network segment
that provides access to application data.

However, when the network devices in the cloud support segmentation
using VLANs, legacy networking can operate in the second mode. In this
design model, each tenant within the cloud is assigned a network subnet
which is mapped to a VLAN on the physical network. It is especially
important to remember that the maximum number of VLANs that can be used
within a spanning tree domain is 4096. This places a hard limit on the
amount of growth possible within the data center. Consequently, when
designing a general purpose cloud intended to support multiple tenants,
we recommend using legacy networking in VLAN mode rather than flat
network mode.

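The 4096-VLAN ceiling translates directly into a tenant-capacity estimate. The number of IDs reserved for infrastructure and the VLANs consumed per tenant below are illustrative assumptions:

```python
# Sketch of the VLAN growth ceiling. The 4096-ID space comes from the
# 12-bit VLAN ID field; the reserved counts are illustrative assumptions.
VLAN_ID_SPACE = 4096  # 12-bit VLAN ID field
RESERVED = 2          # VLAN IDs 0 and 4095 are reserved by 802.1Q

def max_tenants(vlans_per_tenant=1, reserved_for_infrastructure=10):
    """Tenants that fit before the VLAN ID space is exhausted."""
    usable = VLAN_ID_SPACE - RESERVED - reserved_for_infrastructure
    return usable // vlans_per_tenant

print(max_tenants())                    # 4084 tenants at one VLAN each
print(max_tenants(vlans_per_tenant=4))  # 1021 if each tenant needs four
```

The hard limit arrives quickly if tenants consume multiple VLANs, which is one reason overlay encapsulations such as VXLAN (used by neutron) are preferred at larger scale.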
Layer-2 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A network designed on layer-2 protocols has advantages over a network
designed on layer-3 protocols. In spite of the difficulties of using a
bridge to perform the network role of a router, many vendors,
customers, and service providers choose to use Ethernet in as many
parts of their networks as possible. The benefits of selecting a
layer-2 design are:

* Ethernet frames contain all the essentials for networking. These
  include, but are not limited to, globally unique source addresses,
  globally unique destination addresses, and error control.

* Ethernet frames can carry any kind of packet. Networking at layer-2
  is independent of the layer-3 protocol.

* Adding more layers to the Ethernet frame only slows the networking
  process down. This is known as nodal processing delay.

* You can add adjunct networking features, for example class of service
  (CoS) or multicasting, to Ethernet as readily as to IP networks.

* VLANs are an easy mechanism for isolating networks.

Most information starts and ends inside Ethernet frames. Today this
applies to data, voice, and video. The concept is that the network will
benefit more from the advantages of Ethernet if the transfer of
information from a source to a destination is in the form of Ethernet
frames.

Although it is not a substitute for IP networking, networking at
layer-2 can be a powerful adjunct to IP networking.

Layer-2 Ethernet usage has these additional advantages over layer-3 IP network
|
|
||||||
usage:
|
|
||||||
|
|
||||||
* Speed
|
|
||||||
* Reduced overhead of the IP hierarchy.
|
|
||||||
* No need to keep track of address configuration as systems move around.
|
|
||||||
|
|
||||||
Whereas the simplicity of layer-2 protocols might work well in a data center
|
|
||||||
with hundreds of physical machines, cloud data centers have the additional
|
|
||||||
burden of needing to keep track of all virtual machine addresses and
|
|
||||||
networks. In these data centers, it is not uncommon for one physical node
|
|
||||||
to support 30-40 instances.
|
|
||||||
|
|
||||||
.. important::

   Networking at the frame level says nothing about the presence or
   absence of IP addresses at the packet level. Almost all ports, links,
   and devices on a network of LAN switches still have IP addresses, as do
   all the source and destination hosts. There are many reasons for the
   continued need for IP addressing. The largest one is the need to manage
   the network. A device or link without an IP address is usually invisible
   to most management applications. Utilities including remote access for
   diagnostics, file transfer of configurations and software, and similar
   applications cannot run without IP addresses as well as MAC addresses.

Layer-2 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Layer-2 network architectures have some limitations that become noticeable
when used outside of traditional data centers:

* The number of VLANs is limited to 4096.
* The number of MACs stored in switch tables is limited.
* You must accommodate the need to maintain a set of layer-4 devices to
  handle traffic control.
* MLAG, often used for switch redundancy, is a proprietary solution that
  does not scale beyond two devices and forces vendor lock-in.
* It can be difficult to troubleshoot a network without IP addresses and
  ICMP.
* Configuring ARP can be complicated on large layer-2 networks.
* All network devices need to be aware of all MACs, even instance MACs, so
  there is constant churn in MAC tables and network state changes as
  instances start and stop.
* Migrating MACs (instance migration) to different physical locations is a
  potential problem if you do not set ARP table timeouts properly.

It is important to know that layer-2 has a very limited set of network
management tools. It is difficult to control traffic, as layer-2 does not
have mechanisms to manage the network or shape the traffic. Network
troubleshooting is also troublesome, in part because network devices have
no IP addresses. As a result, there is no reasonable way to check network
delay.

In a layer-2 network all devices are aware of all MACs, even those that
belong to instances. The network state information in the backbone changes
whenever an instance starts or stops. Because of this, there is far too
much churn in the MAC tables on the backbone switches.

Furthermore, on large layer-2 networks, configuring ARP learning can be
complicated. The setting for the MAC address timer on switches is critical
and, if set incorrectly, can cause significant performance problems. So
when migrating MACs to different physical locations to support instance
migration, problems may arise. As an example, the Cisco default MAC address
timer is extremely long. As such, the network information maintained in the
switches could be out of sync with the new location of the instance.

Layer-3 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In layer-3 networking, routing takes instance MAC and IP addresses out of
the network core, reducing state churn. The only time there would be a
routing state change is in the case of a Top of Rack (ToR) switch failure
or a link failure in the backbone itself. Other advantages of using a
layer-3 architecture include:

* Layer-3 networks provide the same level of resiliency and scalability
  as the Internet.

* Controlling traffic with routing metrics is straightforward.

* You can configure layer-3 to use BGP confederation for scalability. This
  way core routers have state proportional to the number of racks, not to
  the number of servers or instances.

* There are a variety of well tested tools, such as ICMP, to monitor and
  manage traffic.

* Layer-3 architectures enable the use of :term:`quality of service (QoS)`
  to manage network performance.

Layer-3 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main limitation of layer-3 is that there is no built-in isolation
mechanism comparable to the VLANs in layer-2 networks. Furthermore, the
hierarchical nature of IP addresses means that an instance is on the same
subnet as its physical host, making migration out of the subnet difficult.
For these reasons, network virtualization needs to use IP encapsulation and
software at the end hosts to provide isolation and to separate the
addressing in the virtual layer from the addressing in the physical layer.
Other potential disadvantages of layer-3 include the need to design an IP
addressing scheme rather than relying on the switches to keep track of the
MAC addresses automatically, and the need to configure the interior gateway
routing protocol in the switches.

Network design
~~~~~~~~~~~~~~

There are many reasons an OpenStack network has complex requirements.
However, one main factor is the many components that interact at different
levels of the system stack, adding complexity. Data flows are also complex.
Data in an OpenStack cloud moves both between instances across the network
(also known as East-West), and in and out of the system (also known as
North-South). Physical server nodes have network requirements that are
independent of instance network requirements; you must isolate these from
the core network to account for scalability. We recommend functionally
separating the networks for security purposes, and tuning performance
through traffic shaping.

You must consider a number of important general technical and business
factors when planning and designing an OpenStack network. These include:

* A requirement for vendor independence. To avoid hardware or software
  vendor lock-in, the design should not rely on specific features of a
  vendor's router or switch.
* A requirement to massively scale the ecosystem to support millions of
  end users.
* A requirement to support indeterminate platforms and applications.
* A requirement to design for cost efficient operations to take advantage
  of massive scale.
* A requirement to ensure that there is no single point of failure in the
  cloud ecosystem.
* A requirement for high availability architecture to meet customer SLA
  requirements.
* A requirement to be tolerant of rack level failure.
* A requirement to maximize flexibility to architect future production
  environments.

Bearing in mind these considerations, we recommend the following:

* Layer-3 designs are preferable to layer-2 architectures.
* Design a dense multi-path network core to support multi-directional
  scaling and flexibility.
* Use hierarchical addressing because it is the only viable option to
  scale the network ecosystem.
* Use virtual networking to isolate instance service network traffic from
  the management and internal network traffic.
* Isolate virtual networks using encapsulation technologies.
* Use traffic shaping for performance tuning.
* Use eBGP to connect to the Internet up-link.
* Use iBGP to flatten the internal traffic on the layer-3 mesh.
* Determine the most effective configuration for the block storage
  network.

Operator considerations
-----------------------

The network design for an OpenStack cluster includes decisions regarding
the interconnect needs within the cluster, the need to allow clients to
access their resources, and the access requirements for operators to
administer the cluster. You should consider the bandwidth, latency, and
reliability of these networks.

Whether you are using an external provider or an internal team, you need
to consider additional design decisions about monitoring and alarming. If
you are using an external provider, service level agreements (SLAs) are
typically defined in your contract. Operational considerations such as
bandwidth, latency, and jitter can be part of the SLA.

As demand for network resources increases, make sure your network design
accommodates expansion and upgrades. Operators add additional IP address
blocks and additional bandwidth capacity. In addition, consider managing
hardware and software lifecycle events, for example upgrades,
decommissioning, and outages, while avoiding service interruptions for
tenants.

Factor maintainability into the overall network design. This includes the
ability to manage and maintain IP addresses as well as the use of overlay
identifiers including VLAN tag IDs, GRE tunnel IDs, and MPLS tags. As an
example, if you need to change all of the IP addresses on a network, a
process known as renumbering, the design must support this function.

Address network-focused applications when considering certain operational
realities. For example, consider the impending exhaustion of IPv4
addresses, the migration to IPv6, and the use of private networks to
segregate different types of traffic that an application receives or
generates. In the case of IPv4 to IPv6 migrations, applications should
follow best practices for storing IP addresses. We recommend you avoid
relying on IPv4 features that did not carry over to the IPv6 protocol or
have differences in implementation.

To segregate traffic, allow applications to create a private tenant
network for database and storage network traffic. Use a public network
for services that require direct client access from the Internet. Upon
segregating the traffic, consider :term:`quality of service (QoS)` and
security to ensure each network has the required level of service.

Finally, consider the routing of network traffic. For some applications,
develop a complex policy framework for routing. To create a routing
policy that satisfies business requirements, consider the economic cost
of transmitting traffic over expensive links versus cheaper links, in
addition to bandwidth, latency, and jitter requirements.

Additionally, consider how to respond to network events. How load
transfers from one link to another during a failure scenario could be a
factor in the design. If you do not plan network capacity correctly,
failover traffic could overwhelm other ports or network links and create
a cascading failure scenario. In this case, traffic that fails over to
one link overwhelms that link and then moves to the subsequent links
until all network traffic stops.

Additional considerations
-------------------------

There are several further considerations when designing a network-focused
OpenStack cloud. One is redundant networking, which calls for a high
availability risk analysis of the ToR switches. In most cases, it is much
more economical to use a single switch with a small pool of spare switches
to replace failed units than it is to outfit an entire data center with
redundant switches. Applications should tolerate rack level outages
without affecting normal operations, since network and compute resources
are easily provisioned and plentiful.

Research indicates the mean time between failures (MTBF) on switches is
between 100,000 and 200,000 hours. This number is dependent on the ambient
temperature of the switch in the data center. When properly cooled and
maintained, this translates to between 11 and 22 years before failure.
Even in the worst case of poor ventilation and high ambient temperatures
in the data center, the MTBF is still 2-3 years.

See
https://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf
for further information.

.. list-table:: Networking deployment options
   :header-rows: 1

   * - Legacy networking (nova-network)
     - OpenStack Networking
   * - Simple, single agent
     - Complex, multiple agents
   * - Flat or VLAN
     - Flat, VLAN, Overlays, L2-L3, SDN
   * - No plug-in support
     - Plug-in support for 3rd parties
   * - No multi-tier topologies
     - Multi-tier topologies

Preparing for the future: IPv6 support
--------------------------------------

One of the most important networking topics today is the exhaustion of
IPv4 addresses. As of late 2015, ICANN announced that the final IPv4
address blocks have been fully assigned. Because of this, the IPv6
protocol has become the future of network focused applications. IPv6
increases the address space significantly, fixes long standing issues in
the IPv4 protocol, and will become essential for network focused
applications in the future.

OpenStack Networking, when configured for it, supports IPv6. To enable
IPv6, create an IPv6 subnet in Networking and use IPv6 prefixes when
creating security groups.

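As a sketch of that workflow (the network name and prefix below are
illustrative, and the exact client syntax varies by release), an IPv6
subnet can be created with the neutron command-line client:

```shell
# Create an IPv6 subnet on an existing tenant network.
# SLAAC is one of several supported IPv6 address assignment modes.
neutron subnet-create --name ipv6-subnet --ip-version 6 \
    --ipv6-ra-mode slaac --ipv6-address-mode slaac \
    example-net 2001:db8:1::/64
```

Security group rules are then written with ``--ethertype IPv6`` and IPv6
prefixes so that the new subnet's traffic is actually permitted.
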
Asymmetric links
----------------

When designing a network architecture, the traffic patterns of an
application heavily influence the allocation of total bandwidth and the
number of links that you use to send and receive traffic. Applications
that provide file storage for customers allocate bandwidth and links to
favor incoming traffic, whereas video streaming applications allocate
bandwidth and links to favor outgoing traffic.

Performance
-----------

It is important to analyze the application's tolerance for latency and
jitter when designing an environment to support network focused
applications. Certain applications, for example VoIP, are less tolerant
of latency and jitter. Where latency and jitter are concerns, certain
applications may require tuning of QoS parameters and network device
queues to ensure that they queue for transmit immediately or are
guaranteed minimum bandwidth. Since OpenStack currently does not support
these functions, consider carefully your selected network plug-in.

The location of a service may also impact the application or consumer
experience. If an application serves differing content to different
users, it must properly direct connections to those specific locations.
Where appropriate, use a multi-site installation for these situations.

You can implement networking in two separate ways. Legacy networking
(nova-network) provides a flat DHCP network with a single broadcast
domain. This implementation does not support tenant isolation networks or
advanced plug-ins, but it is currently the only way to implement a
distributed layer-3 (L3) agent using the multi-host configuration.
OpenStack Networking (neutron) is the official networking implementation
and provides a pluggable architecture that supports a large variety of
network methods. Some of these include a layer-2 only provider network
model, external device plug-ins, or even OpenFlow controllers.

Networking at large scales becomes a set of boundary questions. The
determination of how large a layer-2 domain must be is based on the
number of nodes within the domain and the amount of broadcast traffic
that passes between instances. Breaking layer-2 boundaries may require
the implementation of overlay networks and tunnels. This decision is a
balancing act between the need for smaller overhead and the need for a
smaller domain.

When selecting network devices, be aware that making a decision based on
the greatest port density often comes with a drawback. Aggregation
switches and routers have not all kept pace with Top of Rack switches and
may induce bottlenecks on north-south traffic. As a result, it may be
possible for massive amounts of downstream network utilization to impact
upstream network devices, impacting service to the cloud. Since OpenStack
does not currently provide a mechanism for traffic shaping or rate
limiting, it is necessary to implement these features at the network
hardware level.

Tunable networking components
-----------------------------

When designing for network intensive workloads, consider the configurable
networking components of an OpenStack architecture design, which include
MTU and QoS. Some workloads require a larger MTU than normal due to the
transfer of large blocks of data. When providing network service for
applications such as video streaming or storage replication, we recommend
that you configure both OpenStack hardware nodes and the supporting
network equipment for jumbo frames where possible. This allows for better
use of available bandwidth. Configure jumbo frames across the complete
path the packets traverse. If one network component is not capable of
handling jumbo frames then the entire path reverts to the default MTU.

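The last point can be stated simply: the effective MTU of a path is the
minimum of the per-hop MTUs, so a single non-jumbo-capable device negates
the benefit end to end. A minimal illustration with hypothetical per-hop
values:

```shell
# Effective path MTU is the smallest MTU of any hop on the path.
# Two jumbo-capable switches (9000) and one legacy device (1500):
printf '%s\n' 9000 9000 1500 | sort -n | head -n 1
```

On a Linux host, the interface MTU itself is raised with a command such as
``ip link set dev eth0 mtu 9000`` (interface name illustrative).
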
:term:`Quality of Service (QoS)` also has a great impact on network
intensive workloads because it provides expedited service to higher
priority packets that are sensitive to poor network performance. In
applications such as Voice over IP (VoIP), differentiated services code
points are a near requirement for proper operation. You can also use QoS
in the opposite direction for mixed workloads, to prevent low priority
but high bandwidth applications, for example backup services, video
conferencing, or file sharing, from blocking bandwidth that is needed for
the proper operation of other workloads. It is possible to tag file
storage traffic as a lower class, such as best effort or scavenger, to
allow the higher priority traffic through. In cases where regions within
a cloud might be geographically distributed, it may also be necessary to
plan accordingly to implement WAN optimization to combat latency or
packet loss.

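As a sketch of what differentiated services marking can look like at the
host level (the port, chain, and traffic class are illustrative; in
practice this is usually done on the network hardware rather than per
host):

```shell
# Mark outbound SIP signaling (UDP 5060) with the Expedited Forwarding
# DSCP class so upstream devices can prioritize it. Requires root.
iptables -t mangle -A POSTROUTING -p udp --dport 5060 \
    -j DSCP --set-dscp-class EF
```

Upstream switches and routers must then be configured to honor the DSCP
markings, or the tagging has no effect.
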
==================
Software selection
==================

Software selection, particularly for a general purpose OpenStack
architecture design, involves three areas:

* Operating system (OS) and hypervisor

* OpenStack components

* Supplemental software

Operating system and hypervisor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The operating system (OS) and hypervisor have a significant impact on the
overall design. Selecting a particular operating system and hypervisor
can directly affect server hardware selection. Make sure the storage
hardware and topology support the selected operating system and
hypervisor combination. Also ensure the networking hardware selection and
topology will work with the chosen operating system and hypervisor
combination.

Some areas that could be impacted by the selection of OS and hypervisor
include:

Cost
    Selecting a commercially supported hypervisor, such as Microsoft
    Hyper-V, will result in a different cost model than community-supported
    open source hypervisors such as :term:`KVM<kernel-based VM (KVM)>` or
    :term:`Xen`. When comparing open source OS solutions, choosing Ubuntu
    over Red Hat (or vice versa) will have an impact on cost due to support
    contracts.

Support
    Depending on the selected hypervisor, staff should have the
    appropriate training and knowledge to support the selected OS and
    hypervisor combination. If they do not, training will need to be
    provided, which could have a cost impact on the design.

Management tools
    The management tools used for Ubuntu and KVM differ from the
    management tools for VMware vSphere. Although both OS and hypervisor
    combinations are supported by OpenStack, there will be a different
    impact on the rest of the design as a result of selecting one
    combination versus the other.

Scale and performance
    Ensure that selected OS and hypervisor combinations meet the
    appropriate scale and performance requirements. The chosen
    architecture will need to meet the targeted instance-host ratios
    with the selected OS-hypervisor combinations.

Security
    Ensure that the design can accommodate regular periodic
    installations of application security patches while maintaining
    required workloads. The frequency of security patches for the
    proposed OS-hypervisor combination will have an impact on
    performance, and the patch installation process could affect
    maintenance windows.

Supported features
    Determine which OpenStack features are required. This will often
    determine the selection of the OS-hypervisor combination. Some
    features are only available with specific operating systems or
    hypervisors.

Interoperability
    You will need to consider how the OS and hypervisor combination
    interacts with other operating systems and hypervisors, including
    other software solutions. Operational troubleshooting tools for one
    OS-hypervisor combination may differ from the tools used for another
    OS-hypervisor combination and, as a result, the design will need to
    address whether the two sets of tools need to interoperate.

OpenStack components
~~~~~~~~~~~~~~~~~~~~

Selecting which OpenStack components are included in the overall design
is important. Some OpenStack components, such as Compute and the Image
service, are required in every architecture. Other components, such as
Orchestration, are not always required.

A compute-focused OpenStack design architecture may contain the following
components:

* Identity (keystone)

* Dashboard (horizon)

* Compute (nova)

* Object Storage (swift)

* Image (glance)

* Networking (neutron)

* Orchestration (heat)

.. note::

   A compute-focused design is less likely to include OpenStack Block
   Storage. However, there may be some situations where the need for
   performance requires a block storage component to improve data I/O.

Excluding certain OpenStack components can limit or constrain the
functionality of other components. For example, if the architecture
includes Orchestration but excludes Telemetry, then the design will not
be able to take advantage of Orchestration's auto-scaling functionality.
It is important to research the component interdependencies in
conjunction with the technical requirements before deciding on the final
architecture.

Networking software
~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) provides a wide variety of networking
services for instances. There are many additional networking software
packages that can be useful when managing OpenStack components. Some
examples include:

* Software to provide load balancing

* Network redundancy protocols

* Routing daemons

Some of these software packages are described in more detail in the
`OpenStack network nodes chapter <http://docs.openstack.org/ha-guide
/networking-ha.html>`_ in the OpenStack High Availability Guide.

For both general purpose and compute-focused OpenStack clouds, the
OpenStack infrastructure components must be highly available. If the
design does not include hardware load balancing, you must add networking
software packages, for example, HAProxy.

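A minimal sketch of what an HAProxy front end for a replicated OpenStack
API service might look like; the addresses, ports, and backend names are
illustrative only:

```
# haproxy.cfg fragment: balance the Identity API across two controllers.
frontend keystone-api
    bind 192.0.2.10:5000
    default_backend keystone-nodes

backend keystone-nodes
    balance roundrobin
    option httpchk
    server controller1 192.0.2.11:5000 check
    server controller2 192.0.2.12:5000 check
```

In practice, each OpenStack API endpoint gets its own frontend/backend
pair, and the bind address is a virtual IP managed by the cluster.
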
Management software
~~~~~~~~~~~~~~~~~~~

Management software includes software for providing:

* Clustering

* Logging

* Monitoring

* Alerting

.. important::

   The factors for determining which software packages in this category
   to select are outside the scope of this design guide.

The selected supplemental software solution impacts and affects the
overall OpenStack cloud design. This includes software for providing
clustering, logging, monitoring, and alerting.

The inclusion of clustering software, such as Corosync or Pacemaker, is
primarily determined by the availability of the cloud infrastructure and
the complexity of supporting the configuration after it is deployed. The
`OpenStack High Availability Guide <http://docs.openstack.org/ha-guide/>`_
provides more details on the installation and configuration of Corosync
and Pacemaker, should these packages need to be included in the design.

Operational considerations determine the requirements for logging,
monitoring, and alerting. Each of these sub-categories includes various
options.

For example, in the logging sub-category you could select Logstash,
Splunk, Log Insight, or another log aggregation-consolidation tool. Store
logs in a centralized location to facilitate performing analytics against
the data. Log data analytics engines can also provide automation and
issue notification, by providing a mechanism to both alert and
automatically attempt to remediate some of the more commonly known
issues.

If these software packages are required, the design must account for the
additional resource consumption (CPU, RAM, storage, and network
bandwidth). Some other potential design impacts include:

* OS-hypervisor combination:
  Ensure that the selected logging, monitoring, or alerting tools support
  the proposed OS-hypervisor combination.

* Network hardware:
  The network hardware selection needs to be supported by the logging,
  monitoring, and alerting software.

Database software
~~~~~~~~~~~~~~~~~

Most OpenStack components require access to back-end database services to
store state and configuration information. Choose an appropriate back-end
database which satisfies the availability and fault tolerance
requirements of the OpenStack services.

MySQL is the default database for OpenStack, but other compatible
databases are available.

.. note::

   Telemetry uses MongoDB.

The chosen high availability database solution changes according to the
|
|
||||||
selected database. MySQL, for example, provides several options. Use a
|
|
||||||
replication technology such as Galera for active-active clustering. For
|
|
||||||
active-passive use some form of shared storage. Each of these potential
|
|
||||||
solutions has an impact on the design:
|
|
||||||
|
|
||||||
* Solutions that employ Galera/MariaDB require at least three MySQL
|
|
||||||
nodes.
|
|
||||||
|
|
||||||
* MongoDB has its own design considerations for high availability.
|
|
||||||
|
|
||||||
* OpenStack design, generally, does not include shared storage.
|
|
||||||
However, for some high availability designs, certain components might
|
|
||||||
require it depending on the specific implementation.
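The three-node minimum for Galera follows from quorum arithmetic: a cluster stays writable only while a strict majority of its nodes remain reachable. The helper below is an illustrative sketch of that rule, not part of Galera or any OpenStack tooling:

```python
# A Galera cluster keeps quorum only while a strict majority of nodes
# survives. With 3 nodes, one failure leaves 2 of 3 (majority); with
# only 2 nodes, a single failure leaves 1 of 2 (no majority).
def has_quorum(cluster_size, failed_nodes):
    surviving = cluster_size - failed_nodes
    return surviving * 2 > cluster_size

print(has_quorum(3, 1))  # True: three nodes tolerate one failure
print(has_quorum(2, 1))  # False: two nodes cannot
```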
Licensing
~~~~~~~~~

The many different forms of license agreements for software are often written
with the use of dedicated hardware in mind. This model is relevant for the
cloud platform itself, including the hypervisor operating system and
supporting software for items such as database, RPC, backup, and so on.
Consider licensing when offering Compute service instances and applications
to end users of the cloud, since the license terms for that software may need
adjustment to operate economically in the cloud.

Multi-site OpenStack deployments present additional licensing
considerations over and above regular OpenStack clouds, particularly
where site licenses are in use to provide cost-efficient access to
software licenses. The licensing for host operating systems, guest
operating systems, OpenStack distributions (if applicable),
software-defined infrastructure including network controllers and
storage systems, and even individual applications needs to be evaluated.

Topics to consider include:

* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.

* Differentiation between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made in situations where one site is
  a cold standby for disaster recovery purposes only.

* Certain locations might require local vendors to provide support and
  services for each site, which may vary with the licensing agreement in
  place.
@ -1,61 +0,0 @@
======================
Technical requirements
======================

.. toctree::
   :maxdepth: 2

   technical-requirements-software-selection.rst
   technical-requirements-hardware-selection.rst
   technical-requirements-network-design.rst
   technical-requirements-logging-monitoring.rst

Any given cloud deployment is expected to include these base services:

* Compute

* Networking

* Storage

Each of these services has different software and hardware resource
requirements. As a result, you must make design decisions relating directly
to each service, as well as provide a balanced infrastructure for all
services.

There are many ways to split out an OpenStack deployment, but a two-box
deployment typically consists of:

* A controller node
* A compute node

The controller node typically hosts:

* Identity service (for authentication)
* Image service (for image storage)
* Block Storage
* Networking service (the ``nova-network`` service may be used instead)
* Compute service API, conductor, and scheduling services
* Supporting services such as the message broker (RabbitMQ)
  and database (MySQL or PostgreSQL)

The compute node typically hosts:

* Nova compute
* A networking agent, if using OpenStack Networking

To provide additional block storage in a small environment, you may also
choose to deploy ``cinder-volume`` on the compute node.
You may also choose to run ``nova-compute`` on the controller itself to
allow you to run virtual machines on both hosts in a small environment.

To expand such an environment, you would add compute nodes,
a separate networking node, and eventually a second controller for high
availability. You might also split out storage to dedicated nodes.

The OpenStack Installation Guides provide guidance on getting a basic
2-3 node deployment installed and running:

* `OpenStack Installation Guide for Ubuntu <http://docs.openstack.org/mitaka/install-guide-ubuntu/>`_
* `OpenStack Installation Guide for Red Hat Enterprise Linux and CentOS <http://docs.openstack.org/mitaka/install-guide-rdo/>`_
* `OpenStack Installation Guide for openSUSE and SUSE Linux Enterprise <http://docs.openstack.org/mitaka/install-guide-obs/>`_
@ -134,7 +134,7 @@ of cores is further multiplied.
testing with your local workload with both Hyper-Threading on and off to
determine what is more appropriate in your case.

Choosing a hypervisor
~~~~~~~~~~~~~~~~~~~~~

A hypervisor provides software to manage virtual machine access to the
@ -173,6 +173,110 @@ and in the `configuration reference
deployment using host aggregates or cells. However, an individual
compute node can run only a single hypervisor at a time.

Choosing server hardware
~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following when selecting a server hardware form factor suited
to your OpenStack design architecture:

* Most blade servers can support dual-socket multi-core CPUs. To avoid
  this CPU limit, select ``full width`` or ``full height`` blades. Be
  aware, however, that this also decreases server density. For example,
  high density blade servers such as HP BladeSystem or Dell PowerEdge
  M1000e support up to 16 servers in only ten rack units. Using
  half-height blades is twice as dense as using full-height blades,
  which yield only eight servers per ten rack units.

* 1U rack-mounted servers can offer greater server density
  than a blade server solution, but are often limited to dual-socket,
  multi-core CPU configurations. It is possible to place forty 1U servers
  in a rack, leaving space for the top of rack (ToR) switches, compared
  to 32 full-width blade servers.

  To obtain greater than dual-socket support in a 1U rack-mount form
  factor, customers need to buy their systems from Original Design
  Manufacturers (ODMs) or second-tier manufacturers.

  .. warning::

     This may cause issues for organizations that have preferred
     vendor policies or concerns with support and hardware warranties
     of non-tier 1 vendors.

* 2U rack-mounted servers provide quad-socket, multi-core CPU support,
  but with a corresponding decrease in server density (half the density
  that 1U rack-mounted servers offer).

* Larger rack-mounted servers, such as 4U servers, often provide even
  greater CPU capacity, commonly supporting four or even eight CPU
  sockets. These servers have greater expandability, but such servers
  have much lower server density and are often more expensive.

* ``Sled servers`` are rack-mounted servers that support multiple
  independent servers in a single 2U or 3U enclosure. They deliver
  higher density than typical 1U or 2U rack-mounted servers. For
  example, many sled servers offer four independent dual-socket
  nodes in 2U for a total of eight CPU sockets in 2U.
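The density trade-offs in the list above can be compared numerically. The per-unit figures restate the examples given (16 half-height blades per 10U, one server per 1U, four sled nodes per 2U); the 42U rack size and the 2U reserved for ToR switches are assumptions for illustration.

```python
# Compare servers per rack for the form factors discussed above,
# assuming a 42U rack with 2U reserved for ToR switches.
RACK_UNITS = 42
RESERVED_FOR_TOR = 2

form_factors = {
    "half-height blade": 16 / 10,   # 16 servers per 10U enclosure
    "1U rack-mount": 1 / 1,
    "2U rack-mount": 1 / 2,
    "sled (4 nodes in 2U)": 4 / 2,
}

usable = RACK_UNITS - RESERVED_FOR_TOR   # 40U left for servers
for name, per_unit in form_factors.items():
    print(f"{name}: {int(per_unit * usable)} servers per rack")
```

Under these assumptions the 1U case yields the forty servers per rack quoted above, while sleds roughly double that.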
Other hardware considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Other factors that influence server hardware selection for an OpenStack
design architecture include:

Instance density
  More hosts are required to support the anticipated scale
  if the design architecture uses dual-socket hardware designs.

  For a general purpose OpenStack cloud, sizing is an important
  consideration. The expected or anticipated number of instances that
  each hypervisor can host is a common meter used in sizing the
  deployment. The selected server hardware needs to support the expected
  or anticipated instance density.

Host density
  Another option to address the higher host count is to use a
  quad-socket platform. Taking this approach decreases host density,
  which also increases rack count. This configuration affects the
  number of power connections and also impacts network and cooling
  requirements.

  Physical data centers have limited physical space, power, and
  cooling. The number of hosts (or hypervisors) that can be fitted
  into a given metric (rack, rack unit, or floor tile) is another
  important method of sizing. Floor weight is an often overlooked
  consideration. The data center floor must be able to support the
  weight of the proposed number of hosts within a rack or set of
  racks. These factors need to be applied as part of the host density
  calculation and server hardware selection.
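The sizing meter described above reduces to simple arithmetic. In the sketch below, the instance count, per-hypervisor density, and hosts-per-rack limit are hypothetical inputs chosen for illustration only.

```python
import math

# Hypothetical sizing inputs; replace with your own estimates.
expected_instances = 1000
instances_per_hypervisor = 40   # anticipated instance density
hosts_per_rack = 20             # bounded by space, power, cooling, floor weight

hosts = math.ceil(expected_instances / instances_per_hypervisor)
racks = math.ceil(hosts / hosts_per_rack)
print(hosts, racks)  # 25 hosts across 2 racks
```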
Power and cooling density
  The power and cooling density requirements might be lower than with
  blade, sled, or 1U server designs due to lower host density (by
  using 2U, 3U, or even 4U server designs). For data centers with older
  infrastructure, this might be a desirable feature.

  Data centers have a specified amount of power fed to a given rack or
  set of racks. Older data centers may have power densities as low
  as 20 amps per rack, while more recent data centers can be
  architected to support power densities as high as 120 amps per rack.
  The selected server hardware must take power density into account.
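Power density can cap a rack before physical space does. The sketch below uses the 20 A and 120 A rack feeds quoted above; the 3 A per-server draw is an assumed placeholder, since real draw depends on hardware and load.

```python
import math

def servers_per_rack(rack_amps, amps_per_server=3.0):
    """Maximum whole servers a rack's power feed can support.

    amps_per_server is an assumed illustrative figure.
    """
    return math.floor(rack_amps / amps_per_server)

print(servers_per_rack(20), servers_per_rack(120))  # 6 40
```

With these assumptions, an older 20 A rack powers only six such servers even though it has space for far more.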
Network connectivity
  The selected server hardware must have the appropriate number and
  type of network connections to support the proposed architecture.
  Ensure that, at a minimum, there are at least two diverse network
  connections coming into each rack.

The selection of form factors or architectures affects the selection of
server hardware. Ensure that the selected server hardware is configured
to support enough storage capacity (or storage expandability) to match
the requirements of the selected scale-out storage solution. Similarly,
the network architecture impacts the server hardware selection and vice
versa.

Instance Storage Solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -381,3 +485,144 @@ Networking
Networking in OpenStack is a complex, multifaceted challenge. See
:doc:`design-networking`.

Compute (server) hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following factors when selecting compute (server) hardware:

* Server density

  A measure of how many servers can fit into a given measure of
  physical space, such as a rack unit [U].

* Resource capacity

  The number of CPU cores, how much RAM, or how much storage a given
  server delivers.

* Expandability

  The number of additional resources you can add to a server before it
  reaches capacity.

* Cost

  The relative cost of the hardware weighed against the level of
  design effort needed to build the system.

Weigh these considerations against each other to determine the best
design for the desired purpose. For example, increasing server density
means sacrificing resource capacity or expandability. Increasing resource
capacity and expandability can increase cost but decrease server density.
Decreasing cost often means decreasing supportability, server density,
resource capacity, and expandability.

Compute capacity (CPU cores and RAM capacity) is a secondary
consideration for selecting server hardware. The required
server hardware must supply adequate CPU sockets, additional CPU cores,
and more RAM; network connectivity and storage capacity are not as
critical. The hardware needs to provide enough network connectivity and
storage capacity to meet the user requirements.

For a compute-focused cloud, emphasis should be on server
hardware that can offer more CPU sockets, more CPU cores, and more RAM.
Network connectivity and storage capacity are less critical.

When designing an OpenStack cloud architecture, you must
consider whether you intend to scale up or scale out. Selecting a
smaller number of larger hosts, or a larger number of smaller hosts,
depends on a combination of factors: cost, power, cooling, physical rack
and floor space, support-warranty, and manageability.
@ -4,6 +4,516 @@ Networking concepts
Cloud fundamentally changes the ways that networking is provided and
consumed. Understanding the following concepts and decisions is imperative
when making the right architectural decisions.

OpenStack clouds generally have multiple network segments, with each
segment providing access to particular resources. The network segments
themselves also require network communication paths that should be
separated from the other networks. When designing network services for a
general purpose cloud, plan for either a physical or logical separation
of network segments used by operators and tenants. Additional network
segments can also be created for access to internal services such as the
message bus and database used by various systems. Segregating these
services onto separate networks helps to protect sensitive data and
prevent unauthorized access.

Choose a networking service based on the requirements of your instances.
The architecture and design of your cloud will impact whether you choose
OpenStack Networking (neutron) or legacy networking (nova-network).

Networking (neutron)
~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) is a first-class networking service that
gives tenants full control over the creation of virtual network resources.
This is often accomplished in the form of tunneling protocols that establish
encapsulated communication paths over existing network infrastructure in
order to segment tenant traffic. The method varies depending on the specific
implementation, but some of the more common methods include tunneling over
GRE, encapsulating with VXLAN, and VLAN tags.

We recommend you design at least three network segments. The first segment
should be a public network, used by tenants and operators to access REST
APIs. The controller nodes and swift proxies are the only devices connecting
to this network segment. In some cases, this public network might also be
serviced by hardware load balancers and other network devices.

The second segment is used by administrators to manage hardware resources.
Configuration management tools also utilize this segment for deploying
software and services onto new hardware. In some cases, this network
segment is also used for internal services, including the message bus
and database services. This segment needs to communicate with every
hardware node. Due to the highly sensitive nature of this network segment,
it needs to be secured from unauthorized access.

The third network segment is used by applications and consumers to access
the physical network, and for users to access applications. This network is
segregated from the one used to access the cloud APIs and is not capable
of communicating directly with the hardware resources in the cloud.
Communication on this network segment is required by compute resource
nodes and network gateway services that allow application data to access
the physical network from outside the cloud.

Legacy networking (nova-network)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The legacy networking (nova-network) service is primarily a layer-2
networking service. It functions in two modes: flat networking mode and
VLAN mode. In flat network mode, all network hardware nodes and devices
throughout the cloud are connected to a single layer-2 network segment
that provides access to application data.

However, when the network devices in the cloud support segmentation using
VLANs, legacy networking can operate in the second mode. In this design
model, each tenant within the cloud is assigned a network subnet that is
mapped to a VLAN on the physical network. It is especially important to
remember that the maximum number of VLANs that can be used within a
spanning tree domain is 4096. This places a hard limit on the amount of
growth possible within the data center. Consequently, when designing a
general purpose cloud intended to support multiple tenants, we recommend
using legacy networking with VLANs, not flat network mode.
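The 4096 figure is simply the size of the 12-bit VLAN ID field defined by 802.1Q; IDs 0 and 4095 are reserved, so the assignable pool is slightly smaller. A quick illustration of the resulting hard limit on one-VLAN-per-tenant designs:

```python
# The 802.1Q VLAN ID field is 12 bits wide; IDs 0 and 4095 are reserved.
vlan_id_bits = 12
total_ids = 2 ** vlan_id_bits      # 4096 possible IDs
usable_vlans = total_ids - 2       # 4094 assignable VLAN IDs

# With one VLAN per tenant, tenant count is capped at usable_vlans.
print(total_ids, usable_vlans)  # 4096 4094
```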
Layer-2 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A network designed on layer-2 protocols has advantages over one designed
on layer-3 protocols. In spite of the difficulties of using a bridge to
perform the network role of a router, many vendors, customers, and service
providers choose to use Ethernet in as many parts of their networks as
possible. The benefits of selecting a layer-2 design are:

* Ethernet frames contain all the essentials for networking. These include,
  but are not limited to, globally unique source addresses, globally unique
  destination addresses, and error control.

* Ethernet frames can carry any kind of packet. Networking at layer-2 is
  independent of the layer-3 protocol.

* Adding more layers to the Ethernet frame only slows the networking process
  down. This is known as nodal processing delay.

* You can add adjunct networking features, for example class of service
  (CoS) or multicasting, to Ethernet as readily as to IP networks.

* VLANs are an easy mechanism for isolating networks.

Most information starts and ends inside Ethernet frames. Today this applies
to data, voice, and video. The concept is that the network will benefit more
from the advantages of Ethernet if the transfer of information from a source
to a destination is in the form of Ethernet frames.

Although it is not a substitute for IP networking, networking at layer-2 can
be a powerful adjunct to IP networking.

Layer-2 Ethernet usage has these additional advantages over layer-3 IP
network usage:

* Speed.
* Reduced overhead of the IP hierarchy.
* No need to keep track of address configuration as systems move around.

Whereas the simplicity of layer-2 protocols might work well in a data center
with hundreds of physical machines, cloud data centers have the additional
burden of needing to keep track of all virtual machine addresses and
networks. In these data centers, it is not uncommon for one physical node
to support 30-40 instances.

.. important::

   Networking at the frame level says nothing about the presence or
   absence of IP addresses at the packet level. Almost all ports, links,
   and devices on a network of LAN switches still have IP addresses, as do
   all the source and destination hosts. There are many reasons for the
   continued need for IP addressing. The largest one is the need to manage
   the network. A device or link without an IP address is usually invisible
   to most management applications. Utilities including remote access for
   diagnostics, file transfer of configurations and software, and similar
   applications cannot run without IP addresses as well as MAC addresses.

Layer-2 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Layer-2 network architectures have some limitations that become noticeable
when used outside of traditional data centers:

* The number of VLANs is limited to 4096.
* The number of MACs stored in switch tables is limited.
* You must accommodate the need to maintain a set of layer-4 devices to
  handle traffic control.
* MLAG, often used for switch redundancy, is a proprietary solution that
  does not scale beyond two devices and forces vendor lock-in.
* It can be difficult to troubleshoot a network without IP addresses and
  ICMP.
* Configuring ARP can be complicated on large layer-2 networks.
* All network devices need to be aware of all MACs, even instance MACs, so
  there is constant churn in MAC tables and network state changes as
  instances start and stop.
* Migrating MACs (instance migration) to different physical locations is a
  potential problem if you do not set ARP table timeouts properly.

It is important to know that layer-2 has a very limited set of network
management tools. It is difficult to control traffic, as layer-2 does not
have mechanisms to manage the network or shape the traffic. Network
troubleshooting is also troublesome, in part because network devices have
no IP addresses. As a result, there is no reasonable way to check network
delay.

In a layer-2 network all devices are aware of all MACs, even those that
belong to instances. The network state information in the backbone changes
whenever an instance starts or stops. Because of this, there is far too
much churn in the MAC tables on the backbone switches.

Furthermore, on large layer-2 networks, configuring ARP learning can be
complicated. The setting for the MAC address timer on switches is critical
and, if set incorrectly, can cause significant performance problems. So
when migrating MACs to different physical locations to support instance
migration, problems may arise. As an example, the Cisco default MAC address
timer is extremely long. As such, the network information maintained in the
switches could be out of sync with the new location of the instance.

Layer-3 architecture advantages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In layer-3 networking, routing takes instance MAC and IP addresses out of
the network core, reducing state churn. The only time there would be a
routing state change is in the case of a Top of Rack (ToR) switch failure
or a link failure in the backbone itself. Other advantages of using a
layer-3 architecture include:

* Layer-3 networks provide the same level of resiliency and scalability
  as the Internet.

* Controlling traffic with routing metrics is straightforward.

* You can configure layer-3 to use BGP confederation for scalability. This
  way core routers have state proportional to the number of racks, not to
  the number of servers or instances.

* There is a variety of well-tested tools, such as ICMP, to monitor and
  manage traffic.

* Layer-3 architectures enable the use of :term:`quality of service (QoS)`
  to manage network performance.
|
||||||
|
|
||||||
|
Layer-3 architecture limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main limitation of layer-3 networking is that there is no built-in
isolation mechanism comparable to the VLANs in layer-2 networks.
Furthermore, the hierarchical nature of IP addresses means that an
instance is on the same subnet as its physical host, making migration out
of the subnet difficult. For these reasons, network virtualization needs
to use IP encapsulation and software at the end hosts to provide isolation
and to separate the addressing in the virtual layer from the addressing in
the physical layer. Other potential disadvantages of layer-3 include the
need to design an IP addressing scheme rather than relying on the switches
to keep track of the MAC addresses automatically, and the need to
configure the interior gateway routing protocol in the switches.

Network design
~~~~~~~~~~~~~~

There are many reasons an OpenStack network has complex requirements.
However, one main factor is the many components that interact at
different levels of the system stack, adding complexity. Data flows are
also complex. Data in an OpenStack cloud moves both between instances
across the network (also known as east-west traffic) and in and out of
the system (also known as north-south traffic). Physical server nodes
have network requirements that are independent of instance network
requirements, and must be isolated to account for scalability. We
recommend separating the networks for security purposes, and tuning
performance through traffic shaping.

You must consider a number of important general technical and business
factors when planning and designing an OpenStack network. These include:

* A requirement for vendor independence. To avoid hardware or software
  vendor lock-in, the design should not rely on specific features of a
  vendor's router or switch.

* A requirement to massively scale the ecosystem to support millions of
  end users.

* A requirement to support indeterminate platforms and applications.

* A requirement to design for cost efficient operations to take advantage
  of massive scale.

* A requirement to ensure that there is no single point of failure in the
  cloud ecosystem.

* A requirement for high availability architecture to meet customer SLA
  requirements.

* A requirement to be tolerant of rack level failure.

* A requirement to maximize flexibility to architect future production
  environments.

Bearing in mind these considerations, we recommend the following:

* Layer-3 designs are preferable to layer-2 architectures.

* Design a dense multi-path network core to support multi-directional
  scaling and flexibility.

* Use hierarchical addressing because it is the only viable option to
  scale the network ecosystem.

* Use virtual networking to isolate instance service network traffic from
  the management and internal network traffic.

* Isolate virtual networks using encapsulation technologies.

* Use traffic shaping for performance tuning.

* Use eBGP to connect to the Internet up-link.

* Use iBGP to flatten the internal traffic on the layer-3 mesh.

* Determine the most effective configuration for the block storage
  network.

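
A hierarchical addressing plan can be sketched with Python's
``ipaddress`` module; the region and per-rack prefix sizes below are
hypothetical, chosen only to illustrate why core routers end up with
state proportional to the number of racks:

```python
import ipaddress

# Hypothetical plan: one /16 per region, one /24 per rack, so core
# routers carry one route per rack rather than per server or instance.
region = ipaddress.ip_network("10.32.0.0/16")
rack_subnets = list(region.subnets(new_prefix=24))

print(len(rack_subnets))                   # 256 rack subnets
print(rack_subnets[0])                     # 10.32.0.0/24
print(rack_subnets[0].num_addresses - 2)   # 254 usable host addresses
```

With this scheme, adding a rack consumes one more /24 without touching
any existing route advertisements.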
Additional considerations
-------------------------

There are several further considerations when designing a network-focused
OpenStack cloud. One is redundant networking, and in particular the high
availability risk analysis for ToR switches. In most cases, it is much
more economical to use a single switch with a small pool of spare switches
to replace failed units than it is to outfit an entire data center with
redundant switches. Applications should tolerate rack level outages
without affecting normal operations, since network and compute resources
are easily provisioned and plentiful.

Research indicates the mean time between failures (MTBF) on switches is
between 100,000 and 200,000 hours. This number is dependent on the ambient
temperature of the switch in the data center. When properly cooled and
maintained, this translates to between 11 and 22 years before failure.
Even in the worst case of poor ventilation and high ambient temperatures
in the data center, the MTBF is still 2-3 years.

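
The hours-to-years conversion above is simple arithmetic:

```python
# Converting switch MTBF from hours to years (8,760 hours per year).
HOURS_PER_YEAR = 24 * 365

for mtbf_hours in (100_000, 200_000):
    print(f"{mtbf_hours} h -> {mtbf_hours / HOURS_PER_YEAR:.1f} years")
# 100000 h -> 11.4 years
# 200000 h -> 22.8 years
```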
See
https://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf
for further information.

.. list-table::
   :header-rows: 1

   * - Legacy networking (nova-network)
     - OpenStack Networking
   * - Simple, single agent
     - Complex, multiple agents
   * - Flat or VLAN
     - Flat, VLAN, Overlays, L2-L3, SDN
   * - No plug-in support
     - Plug-in support for 3rd parties
   * - No multi-tier topologies
     - Multi-tier topologies

Preparing for the future: IPv6 support
--------------------------------------

One of the most important networking topics today is the exhaustion of
IPv4 addresses. In late 2015, ICANN announced that the final IPv4 address
blocks had been fully assigned. Because of this, the IPv6 protocol has
become the future of network focused applications. IPv6 increases the
address space significantly, fixes long standing issues in the IPv4
protocol, and will become essential for network focused applications in
the future.

OpenStack Networking, when configured for it, supports IPv6. To enable
IPv6, create an IPv6 subnet in Networking and use IPv6 prefixes when
creating security groups.

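
The scale of the IPv6 address space is easy to demonstrate with Python's
``ipaddress`` module, here using the reserved documentation prefix
``2001:db8::/32``:

```python
import ipaddress

# A single IPv6 /64 -- the conventional size for one tenant subnet --
# contains 2**32 times as many addresses as the entire IPv4 Internet.
tenant_v6 = ipaddress.ip_network("2001:db8:a::/64")
all_v4 = ipaddress.ip_network("0.0.0.0/0")

print(tenant_v6.num_addresses)                          # 18446744073709551616
print(tenant_v6.num_addresses // all_v4.num_addresses)  # 4294967296
```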
Asymmetric links
----------------

When designing a network architecture, the traffic patterns of an
application heavily influence the allocation of total bandwidth and the
number of links that you use to send and receive traffic. Applications
that provide file storage for customers allocate bandwidth and links to
favor incoming traffic, whereas video streaming applications allocate
bandwidth and links to favor outgoing traffic.

Performance
-----------

It is important to analyze the application's tolerance for latency and
jitter when designing an environment to support network focused
applications. Certain applications, for example VoIP, are less tolerant
of latency and jitter. When latency and jitter are issues, certain
applications may require tuning of QoS parameters and network device
queues to ensure that they queue for transmit immediately or are
guaranteed minimum bandwidth. Since OpenStack currently does not support
these functions, consider carefully your selected network plug-in.

The location of a service may also impact the application or consumer
experience. If an application serves differing content to different
users, it must properly direct connections to those specific locations.
Where appropriate, use a multi-site installation for these situations.

You can implement networking in two separate ways. Legacy networking
(nova-network) provides a flat DHCP network with a single broadcast
domain. This implementation does not support tenant isolation networks or
advanced plug-ins, but it is currently the only way to implement a
distributed layer-3 (L3) agent using the multi-host configuration.
OpenStack Networking (neutron) is the official networking implementation
and provides a pluggable architecture that supports a large variety of
network methods. Some of these include a layer-2 only provider network
model, external device plug-ins, or even OpenFlow controllers.

Networking at large scales becomes a set of boundary questions. The
determination of how large a layer-2 domain must be is based on the
number of nodes within the domain and the amount of broadcast traffic
that passes between instances. Breaking layer-2 boundaries may require
the implementation of overlay networks and tunnels. This decision is a
balancing act between the need for lower overhead and the need for a
smaller domain.

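
A rough way to reason about layer-2 domain size is to estimate the
broadcast load that every NIC in the domain must absorb; the host count
and per-host ARP rate below are assumptions for illustration only:

```python
# Rough broadcast-load estimate for sizing a layer-2 domain.
# Every broadcast frame is delivered to every host in the domain.
hosts = 2000                 # instances plus physical nodes (assumed)
arp_per_host_per_sec = 0.5   # assumed steady-state ARP rate per host

broadcasts_per_sec = hosts * arp_per_host_per_sec
print(broadcasts_per_sec)    # 1000.0 frames/s seen by every NIC
```

If the resulting rate is unacceptable, that is a signal to split the
domain or introduce an overlay.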
When selecting network devices, be aware that making a decision based on
the greatest port density often comes with a drawback. Aggregation
switches and routers have not all kept pace with Top of Rack switches and
may induce bottlenecks on north-south traffic. As a result, it may be
possible for massive amounts of downstream network utilization to impact
upstream network devices, impacting service to the cloud. Since OpenStack
does not currently provide a mechanism for traffic shaping or rate
limiting, it is necessary to implement these features at the network
hardware level.

Tunable networking components
-----------------------------

When designing for network intensive workloads, consider the tunable
networking components of an OpenStack architecture, which include MTU and
QoS. Some workloads require a larger MTU than normal due to the transfer
of large blocks of data. When providing network service for applications
such as video streaming or storage replication, we recommend that you
configure both OpenStack hardware nodes and the supporting network
equipment for jumbo frames where possible. This allows for better use of
available bandwidth. Configure jumbo frames across the complete path the
packets traverse. If one network component is not capable of handling
jumbo frames, then the entire path reverts to the default MTU.

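
The bandwidth benefit of jumbo frames comes from amortizing per-packet
header overhead. A quick sketch, assuming plain TCP over IPv4 with
18 bytes of untagged Ethernet framing and no TCP options:

```python
# Payload efficiency of standard vs. jumbo frames (TCP over IPv4).
ETH_OVERHEAD = 18     # Ethernet header + FCS, untagged
IP_TCP_HEADERS = 40   # 20-byte IPv4 header + 20-byte TCP header

def efficiency(mtu):
    payload = mtu - IP_TCP_HEADERS
    return payload / (mtu + ETH_OVERHEAD)

print(f"MTU 1500: {efficiency(1500):.1%}")  # 96.2%
print(f"MTU 9000: {efficiency(9000):.1%}")  # 99.4%
```

The gain looks small in percentage terms, but jumbo frames also cut the
per-packet processing load on hosts and switches by roughly a factor of
six for bulk transfers.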
:term:`Quality of Service (QoS)` also has a great impact on network
intensive workloads, as it gives priority service to the packets that are
most affected by poor network performance. In applications such as Voice
over IP (VoIP), differentiated services code points are a near requirement
for proper operation. You can also use QoS in the opposite direction for
mixed workloads, to prevent low priority but high bandwidth applications,
for example backup services, video conferencing, or file sharing, from
blocking bandwidth that is needed for the proper operation of other
workloads. It is possible to tag file storage traffic as a lower class,
such as best effort or scavenger, to allow the higher priority traffic
through. In cases where regions within a cloud might be geographically
distributed, it may also be necessary to implement WAN optimization to
combat latency or packet loss.

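
As an illustration of differentiated services code points, an application
can mark its own traffic by setting the DSCP bits of the IP TOS byte;
Expedited Forwarding (EF, code point 46) is the class conventionally used
for VoIP. Whether the marking is honored depends entirely on how the
switches and routers along the path are configured:

```python
import socket

# DSCP occupies the upper six bits of the TOS byte,
# so EF (46) becomes 46 << 2 == 184 (0xB8).
DSCP_EF = 46 << 2

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF)

# Outgoing datagrams on this socket now carry the EF marking.
print(sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))  # 184
sock.close()
```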
Network hardware selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

The network architecture determines which network hardware will be used.
Networking software is determined by the selected networking hardware.

There are more subtle design impacts that need to be considered. The
selection of certain networking hardware (and the networking software)
affects the management tools that can be used. There are exceptions to
this; the rise of *open* networking software that supports a range of
networking hardware means there are instances where the relationship
between networking hardware and networking software is not as tightly
defined.

For a compute-focused architecture, we recommend designing the network
architecture using a scalable network model that makes it easy to add
capacity and bandwidth. A good example of such a model is the leaf-spine
model. In this type of network design, you can add additional bandwidth
as well as scale out to additional racks of gear. It is important to
select network hardware that supports the required port count, port
speed, and port density, while allowing for future growth as workload
demands increase. It is also important to evaluate where in the network
architecture to provide redundancy.

Some of the key considerations in the selection of networking hardware
include:

Port count
   The design will require networking hardware that has the requisite
   port count.

Port density
   The network design will be affected by the physical space that is
   required to provide the requisite port count. A higher port density
   is preferred, as it leaves more rack space for compute or storage
   components. This can also lead into considerations about fault domains
   and power density. Higher density switches are more expensive,
   therefore it is important not to over-design the network.

Port speed
   The networking hardware must support the proposed network speed, for
   example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE).

Redundancy
   User requirements for high availability and cost considerations
   influence the level of network hardware redundancy. Network redundancy
   can be achieved by adding redundant power supplies or paired switches.

   .. note::

      Hardware must support network redundancy.

Power requirements
   Ensure that the physical data center provides the necessary power
   for the selected network hardware.

   .. note::

      This is not an issue for top of rack (ToR) switches. This may be an
      issue for spine switches in a leaf and spine fabric, or end of row
      (EoR) switches.

Protocol support
   It is possible to gain more performance out of a single storage system
   by using specialized network technologies such as RDMA, SRP, iSER, and
   SCST. The specifics of using these technologies are beyond the scope
   of this book.

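
Port count and port density interact; a back-of-the-envelope sizing
sketch for a small leaf-spine fabric (all numbers hypothetical):

```python
# Rough port-count planning for a leaf-spine fabric: one ToR (leaf)
# switch per rack, with every leaf uplinked to the spine layer.
servers_per_rack = 32
racks = 8
uplinks_per_leaf = 4

leaf_ports_needed = servers_per_rack + uplinks_per_leaf  # per leaf switch
spine_ports_needed = racks * uplinks_per_leaf            # whole spine layer

print(leaf_ports_needed)   # 36, so a 48-port leaf leaves room for growth
print(spine_ports_needed)  # 32
```

Choosing the next standard switch size above the computed need is what
leaves headroom for future racks without redesigning the fabric.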
There is no single best practice architecture for the networking hardware
supporting an OpenStack cloud. Some of the key factors that will have a
major influence on selection of networking hardware include:

Connectivity
   All nodes within an OpenStack cloud require network connectivity. In
   some cases, nodes require access to more than one network segment.
   The design must encompass sufficient network capacity and bandwidth
   to ensure that all communications within the cloud, both north-south
   and east-west traffic, have sufficient resources available.

Scalability
   The network design should encompass a physical and logical network
   design that can be easily expanded upon. Network hardware should
   offer the appropriate types of interfaces and speeds that are
   required by the hardware nodes.

Availability
   To ensure that access to nodes within the cloud is not interrupted,
   we recommend that the network architecture identify any single points
   of failure and provide some level of redundancy or fault tolerance.
   The network infrastructure often involves use of networking protocols
   such as LACP, VRRP, or others to achieve a highly available network
   connection. It is also important to consider the networking
   implications on API availability. We recommend designing a load
   balancing solution within the network architecture to ensure that the
   APIs, and potentially other services in the cloud, are highly
   available.

Networking software selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OpenStack Networking (neutron) provides a wide variety of networking
services for instances. There are many additional networking software
packages that can be useful when managing OpenStack components. Some
examples include:

* Software to provide load balancing

* Network redundancy protocols

* Routing daemons

Some of these software packages are described in more detail in the
`OpenStack network nodes chapter
<http://docs.openstack.org/ha-guide/networking-ha.html>`_ in the
OpenStack High Availability Guide.

For a general purpose OpenStack cloud, the OpenStack infrastructure
components need to be highly available. If the design does not include
hardware load balancing, networking software packages like HAProxy will
need to be included.

For a compute-focused OpenStack cloud, the OpenStack infrastructure
components must be highly available. If the design does not include
hardware load balancing, you must add networking software packages, for
example, HAProxy.

* To provide users with a persistent storage mechanism

* As a scalable, reliable data store for virtual machine images

Selecting storage hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~

The storage hardware architecture is determined by the selected storage
architecture. Determine the storage architecture by evaluating possible
solutions against the critical factors: user requirements, technical
considerations, and operational considerations. Consider the following
factors when selecting storage hardware:

Cost
   Storage can be a significant portion of the overall system cost. For
   an organization that is concerned with vendor support, a commercial
   storage solution is advisable, although it comes with a higher price
   tag. If minimizing initial capital expenditure is a priority, a design
   based on commodity hardware is appropriate. The trade-off is
   potentially higher support costs and a greater risk of incompatibility
   and interoperability issues.

Performance
   The latency of storage I/O requests indicates performance.
   Performance requirements affect which solution you choose.

Scalability
   Scalability, along with expandability, is a major consideration in a
   general purpose OpenStack cloud. It might be difficult to predict
   the final intended size of the implementation, as there are no
   established usage patterns for a general purpose cloud. It might
   become necessary to expand the initial deployment in order to
   accommodate growth and user demand.

Expandability
   Expandability is a major architecture factor for storage solutions in
   a general purpose OpenStack cloud. A storage solution that expands to
   50 PB is considered more expandable than a solution that only scales
   to 10 PB. This meter is related to scalability, which is the measure
   of a solution's performance as it expands.

General purpose cloud storage requirements
------------------------------------------

Using a scale-out storage solution with direct-attached storage (DAS) in
the servers is well suited for a general purpose OpenStack cloud. Cloud
services requirements determine your choice of scale-out solution. You
need to determine if a single, highly expandable and highly vertically
scalable, centralized storage array is suitable for your design. After
determining an approach, select the storage hardware based on these
criteria.

This list expands upon the potential impacts of including a particular
storage architecture (and corresponding storage hardware) in the design
for a general purpose OpenStack cloud:

Connectivity
   If storage protocols other than Ethernet are part of the storage
   solution, ensure the appropriate hardware has been selected. If a
   centralized storage array is selected, ensure that the hypervisor will
   be able to connect to that storage array for image storage.

Usage
   How the particular storage architecture will be used is critical for
   determining the architecture. Some of the configurations that will
   influence the architecture include whether it will be used by the
   hypervisors for ephemeral instance storage, or if OpenStack Object
   Storage will use it for object storage.

Instance and image locations
   Where instances and images will be stored will influence the
   architecture.

Server hardware
   If the solution is a scale-out storage architecture that includes
   DAS, it will affect the server hardware selection. This could ripple
   into the decisions that affect host density, instance density, power
   density, OS-hypervisor, management tools, and others.

A general purpose OpenStack cloud has multiple options. The key factors
that will have an influence on selection of storage hardware for a
general purpose OpenStack cloud are as follows:

Capacity
   Hardware resources selected for the resource nodes should be capable
   of supporting enough storage for the cloud services. Defining the
   initial requirements and ensuring the design can support adding
   capacity is important. Hardware nodes selected for object storage
   should be capable of supporting a large number of inexpensive disks
   with no reliance on RAID controller cards. Hardware nodes selected
   for block storage should be capable of supporting high speed storage
   solutions and RAID controller cards to provide performance and
   redundancy to storage at a hardware level. Selecting hardware RAID
   controllers that automatically repair damaged arrays will assist
   with the replacement and repair of degraded or failed storage
   devices.

Performance
   Disks selected for object storage services do not need to be fast
   performing disks. We recommend that object storage nodes take
   advantage of the best cost per terabyte available for storage. In
   contrast, disks chosen for block storage services should take
   advantage of performance boosting features that may entail the use
   of SSDs or flash storage to provide high performance block storage
   pools. Storage performance of ephemeral disks used for instances
   should also be taken into consideration.

Fault tolerance
   Object storage resource nodes have no requirements for hardware
   fault tolerance or RAID controllers. It is not necessary to plan for
   fault tolerance within the object storage hardware, because the
   object storage service provides replication between zones as a
   feature of the service. Block storage nodes, compute nodes, and
   cloud controllers should all have fault tolerance built in at the
   hardware level by making use of hardware RAID controllers and
   varying levels of RAID configuration. The level of RAID chosen
   should be consistent with the performance and availability
   requirements of the cloud.

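
Because the object storage service replicates data itself, raw disk
capacity must be divided by the replica count when planning usable
capacity. A quick sketch with illustrative numbers (three replicas is a
common Object Storage configuration):

```python
# Usable object storage capacity under N-way replication.
nodes = 40            # storage nodes (assumed)
disks_per_node = 8    # assumed
disk_tb = 6           # terabytes per disk (assumed)
replicas = 3

raw_tb = nodes * disks_per_node * disk_tb
usable_tb = raw_tb / replicas
print(raw_tb, usable_tb)   # 1920 640.0
```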
Storage-focused cloud storage requirements
------------------------------------------

Storage-focused OpenStack clouds must address I/O intensive workloads.
These workloads are not CPU intensive, nor are they consistently network
intensive. The network may be heavily utilized to transfer storage data,
but these workloads are not otherwise network intensive.

The selection of storage hardware determines the overall performance and
scalability of a storage-focused OpenStack design architecture. Several
factors impact the design process, including:

Latency is a key consideration in a storage-focused OpenStack cloud.
Using solid-state disks (SSDs) to minimize latency, and to reduce the
CPU delays caused by waiting for the storage, increases performance. Use
RAID controller cards in compute hosts to improve the performance of the
underlying disk subsystem.

Depending on the storage architecture, you can adopt a scale-out
solution, or use a highly expandable and scalable centralized storage
array. If a centralized storage array meets your requirements, then the
array vendor determines the hardware selection. It is possible to build
a storage array using commodity hardware with Open Source software, but
doing so requires people with expertise to build such a system.

On the other hand, a scale-out storage solution that uses
direct-attached storage (DAS) in the servers may be an appropriate
choice. This requires configuration of the server hardware to support
the storage solution.

Considerations affecting storage architecture (and corresponding storage
hardware) of a storage-focused OpenStack cloud include:

Connectivity
   Ensure the connectivity matches the storage solution requirements. We
   recommend confirming that the network characteristics minimize latency
   to boost the overall performance of the design.

Latency
   Determine if the use case has consistent or highly variable latency.

Throughput
   Ensure that the storage solution throughput is optimized for your
   application requirements.

Server hardware
   Use of DAS impacts the server hardware choice and affects host
   density, instance density, power density, OS-hypervisor, and
   management tools.

Operator requirements
=====================

This section describes operational factors affecting the design of an
OpenStack cloud.

Network design
~~~~~~~~~~~~~~

The network design for an OpenStack cluster includes decisions regarding
the interconnect needs within the cluster, the need to allow clients to
access their resources, and the access requirements for operators to
administrate the cluster. You should consider the bandwidth, latency,
and reliability of these networks.

Consider additional design decisions about monitoring and alarming. If
you are using an external provider, service level agreements (SLAs) are
typically defined in your contract. Operational considerations such as
bandwidth, latency, and jitter can be part of the SLA.

As demand for network resources increases, make sure your network design
accommodates expansion and upgrades. Operators will need to add
additional IP address blocks and additional bandwidth capacity. In
addition, consider managing hardware and software lifecycle events, for
example upgrades, decommissioning, and outages, while avoiding service
interruptions for tenants.

Factor maintainability into the overall network design. This includes
the ability to manage and maintain IP addresses as well as the use of
overlay identifiers including VLAN tag IDs, GRE tunnel IDs, and MPLS
labels. As an example, if you need to change all of the IP addresses on
a network, a process known as renumbering, the design must support this
function.

Address network-focused applications when considering certain
operational realities. For example, consider the impending exhaustion of
IPv4 addresses, the migration to IPv6, and the use of private networks
to segregate different types of traffic that an application receives or
generates. In the case of IPv4 to IPv6 migrations, applications should
follow best practices for storing IP addresses. We recommend you avoid
relying on IPv4 features that did not carry over to the IPv6 protocol or
have differences in implementation.

To segregate traffic, allow applications to create a private tenant
network for database and storage network traffic. Use a public network
for services that require direct client access from the Internet. Upon
segregating the traffic, consider :term:`quality of service (QoS)` and
security to ensure each network has the required level of service.

Also consider the routing of network traffic. For some applications,
|
||||||
|
develop a complex policy framework for routing. To create a routing
|
||||||
|
policy that satisfies business requirements, consider the economic cost
|
||||||
|
of transmitting traffic over expensive links versus cheaper links, in
|
||||||
|
addition to bandwidth, latency, and jitter requirements.
|
||||||
|
|
||||||
|
Finally, consider how to respond to network events. How load
transfers from one link to another during a failure scenario could be
a factor in the design. If you do not plan network capacity
correctly, failover traffic could overwhelm other ports or network
links and create a cascading failure scenario. In this case,
traffic that fails over to one link overwhelms that link and then
moves to the subsequent links until all network traffic stops.

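The capacity-planning rule behind this is simple to check: after a link
fails, every surviving link must still be within its capacity once it absorbs
its share of the failed traffic. A deliberately simple sketch with invented
numbers (real designs would use measured peak loads):

```python
def survives_failure(links, failed):
    """Check whether traffic from a failed link can be absorbed by the
    survivors without pushing any of them past capacity.

    links: {name: (current_load_gbps, capacity_gbps)}
    Assumes the failed link's load spreads evenly across survivors,
    a simplification that is adequate for first-pass capacity planning.
    """
    failed_load, _ = links[failed]
    survivors = {n: lc for n, lc in links.items() if n != failed}
    share = failed_load / len(survivors)
    return all(load + share <= cap for load, cap in survivors.values())

# Two 10 Gbps links each running at 6 Gbps cannot absorb each other:
print(survives_failure({"link-a": (6, 10), "link-b": (6, 10)}, "link-a"))  # False

# Keeping utilization at or below 50% leaves room for failover:
print(survives_failure({"link-a": (5, 10), "link-b": (5, 10)}, "link-a"))  # True
```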
SLA considerations
~~~~~~~~~~~~~~~~~~

@ -102,6 +159,89 @@ managing and maintaining your OpenStack environment, see the
`Operations chapter <http://docs.openstack.org/ops-guide/operations.html>`_
in the OpenStack Operations Guide.

Logging and monitoring
----------------------

OpenStack clouds require appropriate monitoring platforms to identify and
manage errors.

.. note::

   We recommend leveraging existing monitoring systems to see if they
   are able to effectively monitor an OpenStack environment.

Specific meters that are critically important to capture include:

* Image disk utilization

* Response time to the Compute API

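A response-time meter can be captured by wrapping the API call and recording
the elapsed wall-clock time. The sketch below uses a stand-in function in
place of a real Compute API request, since endpoint URLs and credentials are
deployment-specific:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Invoke fn and return (result, elapsed_seconds) for metering."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in for, e.g., a GET against the Compute API "list servers" endpoint.
def fake_compute_list_servers():
    time.sleep(0.01)       # simulate network plus service latency
    return ["server-1"]

servers, seconds = timed_call(fake_compute_list_servers)
print(f"compute.api.response_time={seconds:.3f}s servers={len(servers)}")
```

In practice the measured value would be shipped to whatever monitoring
platform the deployment already uses.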
Logging and monitoring does not significantly differ for a multi-site OpenStack
cloud. The tools described in the `Logging and monitoring chapter
<http://docs.openstack.org/ops-guide/ops-logging-monitoring.html>`__ of
the Operations Guide remain applicable. Logging and monitoring can be provided
on a per-site basis, and in a common centralized location.

When deploying logging and monitoring facilities to a centralized location,
care must be taken with the load placed on the inter-site networking links.

Management software
-------------------

Management software providing clustering, logging, monitoring, and alerting
details for a cloud environment is often used. This affects the overall
OpenStack cloud design, and must account for additional resource consumption
such as CPU, RAM, storage, and network bandwidth.

The inclusion of clustering software, such as Corosync or Pacemaker, is
primarily determined by the availability of the cloud infrastructure and
the complexity of supporting the configuration after it is deployed. The
`OpenStack High Availability Guide <http://docs.openstack.org/ha-guide/>`_
provides more details on the installation and configuration of Corosync
and Pacemaker, should these packages need to be included in the design.

Some other potential design impacts include:

* OS-hypervisor combination
  Ensure that the selected logging, monitoring, or alerting tools support
  the proposed OS-hypervisor combination.

* Network hardware
  The network hardware selection needs to be supported by the logging,
  monitoring, and alerting software.

Database software
-----------------

Most OpenStack components require access to back-end database services
to store state and configuration information. Choose an appropriate
back-end database which satisfies the availability and fault tolerance
requirements of the OpenStack services.

MySQL is the default database for OpenStack, but other compatible
databases are available.

.. note::

   Telemetry uses MongoDB.

The chosen high availability database solution changes according to the
selected database. MySQL, for example, provides several options. Use a
replication technology such as Galera for active-active clustering. For
active-passive use some form of shared storage. Each of these potential
solutions has an impact on the design:

* Solutions that employ Galera/MariaDB require at least three MySQL
  nodes.

* MongoDB has its own design considerations for high availability.

* OpenStack design, generally, does not include shared storage.
  However, for some high availability designs, certain components might
  require it depending on the specific implementation.

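The three-node minimum exists because Galera relies on quorum to avoid
split-brain when a member fails. A minimal, illustrative ``wsrep``
configuration for one node of such a cluster might look like the following;
the hostnames, cluster name, and file paths are placeholders, not defaults:

```ini
# /etc/mysql/conf.d/galera.cnf (illustrative; paths vary by distribution)
[mysqld]
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2

wsrep_on                 = ON
wsrep_provider           = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name       = openstack-db
# All three members are listed so any node can rejoin after a failure.
wsrep_cluster_address    = gcomm://db1.example.com,db2.example.com,db3.example.com
wsrep_node_name          = db1
```

Each node carries the same member list; losing one of three nodes still
leaves a two-node majority, which is why two-node clusters are not
recommended.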
Operator access to systems
~~~~~~~~~~~~~~~~~~~~~~~~~~

33
doc/arch-design-draft/source/overview-software-licensing.rst
Normal file
@ -0,0 +1,33 @@

==================
Software licensing
==================

The many different forms of license agreements for software are often written
with the use of dedicated hardware in mind. This model is relevant for the
cloud platform itself, including the hypervisor operating system, supporting
software for items such as database, RPC, backup, and so on. Consideration
must be made when offering Compute service instances and applications to end
users of the cloud, since the license terms for that software may need some
adjustment to be able to operate economically in the cloud.

|
Multi-site OpenStack deployments present additional licensing
|
||||||
|
considerations over and above regular OpenStack clouds, particularly
|
||||||
|
where site licenses are in use to provide cost efficient access to
|
||||||
|
software licenses. The licensing for host operating systems, guest
|
||||||
|
operating systems, OpenStack distributions (if applicable),
|
||||||
|
software-defined infrastructure including network controllers and
|
||||||
|
storage systems, and even individual applications need to be evaluated.
|
||||||
|
|
||||||
|
Topics to consider include:

* The definition of what constitutes a site in the relevant licenses,
  as the term does not necessarily denote a geographic or otherwise
  physically isolated location.

* Differentiations between "hot" (active) and "cold" (inactive) sites,
  where significant savings may be made in situations where one site is
  a cold standby for disaster recovery purposes only.

* Certain locations might require local vendors to provide support and
  services for each site, which may vary with the licensing agreement in
  place.

@ -55,5 +55,6 @@ covered include:
   overview-planning
   overview-customer-requirements
   overview-legal-requirements
   overview-software-licensing
   overview-security-requirements
   overview-operator-requirements