Merge "Session Border Controller"

2015-11-25 13:47:08 +00:00 · 2015-11-25 13:47:08 +00:00 · e53bce1cb3
commit e53bce1cb3
parent c115ec7147 f4f0bf3f5b
1 changed files with 311 additions and 0 deletions
--- a/usecases/sbc.rst
+++ b/usecases/sbc.rst
@ -0,0 +1,311 @@
+..
+
+This work is licensed under a Creative Commons Attribution 3.0 Unported License.
+http://creativecommons.org/licenses/by/3.0/legalcode
+
+=================================================
+ Perimeta Session Border Controller by Metaswitch
+=================================================
+
+This use case is specifically about deploying the Perimeta Session Border
+Controller (SBC) Virtual Network Function (VNF) from Metaswitch Networks in
+OpenStack, but is more generally representative of the issues involved in
+deploying in OpenStack any VNF utilising a fast data plane or high scale SIP.
+The use case focuses on those elements rather than more generic issues like
+orchestration and high availability (HA).
+
+Perimeta, like other SBCs, sits on the edge of a service provider's network and
+polices SIP and RTP (i.e. VoIP) control and media traffic passing over both
+* the access network between end-users and the core network
+* the trunk network between the core and another service provider.
+
+::
+
+    Access     +      SP A core      +    Trunk        +      SP B core
+    network    |      network        |    network      |      network
+               |                     |                 |
+               |                     |                 |
+  +-------+  +-+--+   +---------+  +-+--+            +-+--+   +---------+
+  |User   |  |SBC |   |Network  |  |SBC |            |SBC |   |Network  |
+  |device |--|    |---|function |--|    |------------|    |---|function |
+  +-------+  +-+--+   +---------+  +-+--+            +-+--+   +---------+
+               |                     |                 |
+               +                     +                 +
+
+Glossary
+========
+
+**NFV**
+  Network Functions Virtualization, the vision of deploying telecoms functions
+  as virtual applications running on commercial off the shelf hardware.
+
+**VNF**
+  Virtual Network Function - a telecoms or other network function running as
+  a virtual application.
+
+**SIP**
+  Session Initiation Protocol (RFC 3261) - a common application-layer control
+  protocol for creating, modifying and destroying sessions between two or more
+  participants.
+
+**RTP**
+  Real-time Transport Protocol (RFC 3550) - an end-to-end network transport
+  protocol for transmitting real-time data like audio and video.
+
+**VoIP**
+  Voice over Internet Protocol - delivering voice and multimedia sessions over
+  IP networks, commonly through the use of SIP + RTP.
+
+**SBC**
+  Session Border Controller, a telecoms function which polices SIP and RTP
+  flows, providing security, quality of service, admission control and interop
+  services.
+
+**DDoS**
+  Distributed Denial of Service - a form of packet flood attack.
+
+**SLA**
+  Service Level Agreement - contractual commitment to reach certain performance
+  and availability targets.
+
+**SR-IOV**
+  Single Root I/O Virtualisation - a technique for presenting a single physical
+  PCIe device (such as a NIC) as multiple virtual devices, directly presented
+  to VMs.
+
+**DPDK**
+  Data Plane Development Kit - a set of libraries and drivers for fast packet
+  processing.
+
+Problem description
+===================
+
+In order to implement its security and admission control functions (e.g. DDoS
+protection), Perimeta must perform line-rate processing of received packets.
+For RTP streams, this equates to several million VoIP packets (each ~64-220
+bytes depending on codec) per second per core.  Perimeta must be able to
+guarantee this performance and offer SLAs.
+
+Perimeta must be fully HA, with no single points of failure, and service
+continuity over both software and hardware failures (i.e. all SIP sessions and
+RTP sessions must continue with minimal interruption over software or hardware
+failures).
+
+Perimeta must be elastically scalable, enabling an NFV orchestrator to add and
+remove instances in response to applied load.
+
+To apply different policies to traffic from different customers, Perimeta must
+be able to distinguish and separate traffic from different customers via VLANs
+or similar mechanism.
+
+Perimeta must separate networks carrying live customer traffic from networks
+carrying management or other internal data.
+
+Perimeta signaling instances must be able to support large numbers of
+concurrent TCP connections (hundreds of thousands) to cater for large numbers
+of clients using TCP.
+
+Perimeta must be able to coexist with VMs which do not have these requirements
+on the same host, so long as it can provide sufficient dedicated resources.
+For example, just because Perimeta may not require security group function it
+does not mean this can be disabled at a host scope, or just because Perimeta
+uses SR-IOV or DPDK it does not mean that all VMs on that host must do so.
+
+Affected By
+-----------
+
+The following bps in progress will affect this use case (see Gaps):
+* https://blueprints.launchpad.net/nova/+spec/virt-driver-cpu-pinning
+* https://blueprints.launchpad.net/nova/+spec/input-output-based-numa-scheduling
+* https://blueprints.launchpad.net/nova/+spec/virt-driver-large-pages
+* https://blueprints.launchpad.net/neutron/+spec/ml2-ovs-portsecurity
+* https://blueprints.launchpad.net/neutron/+spec/nfv-vlan-trunks
+* https://blueprints.launchpad.net/neutron/+spec/ovs-optimization-redundant-bridge
+
+The following etherpads are also relevant:
+* https://etherpad.openstack.org/p/kilo_sriov_pci_passthrough
+* https://etherpad.openstack.org/p/telcowg-usecase-openstack-ha
+
+Requirements
+============
+
+The problem statement above leads to the following requirements.
+
+* Achieving packets per second target - networking implications
+
+  A standard OpenStack/OpenvSwitch platform allows VMs to drive NICs to full
+  bandwidth if using large packet sizes typical for Web applications. What
+  makes VoIP different is the small packet size, which means order of magnitude
+  more packets and permits only a few hundred CPU instructions per packet -
+  nowhere near enough to drive a packet through the standard OpenStack
+  networking stack from VM to NIC.  Instead, this requires technologies such
+  as SR-IOV (https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
+  - completed in 2014.2, though with some technical debt remaining) or a DPDK
+  or similar poll mode based vSwitch in the host. Note that SR-IOV in
+  particular imposes some limitations (e.g. prevents live migration) so may not
+  be a desirable option for some SPs.
+
+  Ideally the network would support and respect QoS rules on traffic priority
+  and bandwidth limits.
+
+* Security - networking implications
+
+  Security groups must be disabled for network technologies where they are
+  not bypassed completely.
+
+  The network should protect against ARP poisoning attacks from other VMs.
+
+* High scale TCP - networking implications
+
+  For ports with security group function disabled, it is desirable that host
+  connection tracking function is disabled to avoid performance and occupancy
+  hits for large numbers of TCP connections and the need to tune host
+  parameters unnecessarily.
+
+* Achieving packets per second target - compute implications
+
+  * To achieve line rate all the working data for processing RTP streams
+    (active flows etc.) must be kept in L3 cache - main memory look-ups are too
+    slow. That requires pinning guest vCPUs to host pCPUs.
+
+  * To optimise the data flow rate it is desirable to bind to a NIC on the host
+    CPU's bus.
+
+  * To offer performance SLAs rather than simply "best efforts" we need to
+    control the placement of cores, minimise transaction lookaside buffer (TLB)
+    misses and get accurate info about core topology (threads vs. hyperthreads
+    etc).
+
+* HA
+
+  Perimeta must be deployable to provide a 5 9's level of availability.  If
+  deployed in a single cloud instance, that instance must therefore itself be
+  more than 5 9's available.  As that is hard to achieve with today's state of
+  the art, Perimeta is designed to be able to span multiple independent cloud
+  instances, so that the failure of any one cloud has a minor impact.  The
+  requirements that creates are still being discussed and will be addressed in
+  a future use case.
+
+  When deploying Perimeta within a single cloud instance, Perimeta uses an
+  active/standby architecture with an internal heartbeat mechanism allowing the
+  standby to take over within seconds of failure of the active, including
+  taking over its IP address.  To support these application level HA mechanisms
+  requires:
+
+  * support for anti-affinity rules to permit the active and standby being
+    instantiated on the same host
+
+  * support for application-controlled virtual IPs via gratuitous ARP based
+    scheme (for IPv4) and NDP Neighbour Advertisements (for IPv6); in both
+    cases the standby sends messages saying it now owns the virtual IP address.
+
+  The former is supported through standard anti-affinity nova scheduler rules,
+  and the latter through the neutron allowed-address-pairs extension.
+
+  If using SR-IOV, Perimeta does not need multiple SR-IOV ports, as
+  application level redundancy copes with the failure of a single NIC. However,
+  it can take advantage of local link redundancy using multiple SR-IOV vNICs.
+  For this to be of any benefit requires the SR-IOV VFs forming a redundant
+  pair to be allocated on separate PFs.
+
+  Additionally, it is clearly desirable that the underlying cloud instance is
+  as available as possible e.g. no single points of failure (SPOFs) in the
+  underlying network or storage infrastructure.
+
+* Elastic scaling
+
+  An NFV orchestrator must be able to rapidly launch or terminate new Perimeta
+  instances in response to applied load and service responsiveness.  This is
+  basic OpenStack nova function.
+
+* Support for a scalable mechanism to support multiple networks in a VM
+
+  There must be a scalable mechanism to present multiple networks to Perimeta,
+  of order hundreds or thousands, so far exceeding the number of vNICs that can
+  be attached.  Various mechanisms are possible; a common one, and the one
+  that Perimeta supports, is for different customer networks to be presented
+  over VLANs.  This creates a guest requirement for VLAN trunking support.
+
+  There are multiple possible ways of mapping networks to these VLANs within
+  OpenStack, for example, trunking external VLAN networks directly to the VMs
+  with minimal OpenStack knowledge or configuration (already supported in Kilo)
+  or configuring the mapping between OpenStack networks and VLANs as covered in
+  VLAN aware VMs: https://blueprints.launchpad.net/neutron/+spec/vlan-aware-vms
+
+Related Use Cases
+=================
+
+None as yet, but any VNF that needs to do line rate processing of VoIP packets
+will likely have similar requirements and gaps.
+
+Gaps
+====
+
+The above requirements currently suffer from these gaps.
+
+* SR-IOV
+
+  Initial support for this has been released, but there remains technical debt
+  being tracked at https://etherpad.openstack.org/p/kilo_sriov_pci_passthrough
+  which would improve the usability and robustness of an SR-IOV-based solution.
+
+  Making use of multiple bonded SR-IOV VFs requires the following two bps:
+
+  * SR-IOV VFs distributed across multiple pNICs
+
+    * https://blueprints.launchpad.net/nova/+spec/distribute-pci-allocation
+
+  * Allow user to control setting of spoof checking for the SR-IOV ports
+
+    * http://specs.openstack.org/openstack/neutron-specs/specs/liberty/sriov-spoofchk.html
+
+  Allowing link redundancy with SR-IOV requires the following bp:
+
+  * https://blueprints.launchpad.net/nova/+spec/distribute-pci-allocation
+
+  Additional minor enhancements that would be useful:
+
+  * http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/pci-passthrough-whitelist-regex.html
+
+* QoS
+
+  * http://specs.openstack.org/openstack/neutron-specs/specs/liberty/qos-api-extension.html
+
+* Security protection
+
+  * http://specs.openstack.org/openstack/neutron-specs/specs/liberty/arp-spoof-filtering-ebtables.html
+
+* CPU pinning
+
+  https://blueprints.launchpad.net/nova/+spec/virt-driver-cpu-pinning
+  http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/virt-driver-cpu-pinning.html
+
+* VLAN aware VMs
+
+  http://specs.openstack.org/openstack/neutron-specs/specs/liberty/vlan-aware-vms.html
+
+The following are not yet addressed.
+
+* Removing redundant bridge
+
+  https://blueprints.launchpad.net/neutron/+spec/ovs-optimization-redundant-bridge
+
+* Disabling connection tracking in host
+
+  [No current blueprint]
+
+* HA
+
+  As above, to deliver 5 9's service Perimeta expects to be deployed spanning
+  multiple cloud instances, but if deployed in a single instance it is
+  desirable for that cloud to be available as possible.
+
+This use case also implicitly places requirement on elements outside core
+OpenStack, such as the DPDK OVS mechanism driver
+(https://github.com/stackforge/networking-ovs-dpdk).
+
+References
+==========
+
+N/a.
+