diff --git a/doc/images/ovn-cluster-overview.png b/doc/images/ovn-cluster-overview.png new file mode 100644 index 00000000..b64f4aa2 Binary files /dev/null and b/doc/images/ovn-cluster-overview.png differ diff --git a/doc/source/contributor/bgp_supportability_matrix.rst b/doc/source/bgp_supportability_matrix.rst similarity index 53% rename from doc/source/contributor/bgp_supportability_matrix.rst rename to doc/source/bgp_supportability_matrix.rst index ed5cc952..cbe1ae12 100644 --- a/doc/source/contributor/bgp_supportability_matrix.rst +++ b/doc/source/bgp_supportability_matrix.rst @@ -22,62 +22,67 @@ The next sections highlight the options and features supported by each driver BGP Driver (SB) --------------- -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+--------------------+-----------------------+-------------+ -| Exposing Method | Description | Expose with | Wired with | Expose Tenants | Expose only GUA | OVS-DPDK/HWOL Support | Implemented | -+=================+=====================================================+==========================================+==========================================+==========================+====================+=======================+=============+ -| Underlay | Expose IPs on the default underlay network | Adding IP to dummy nic isolated in a VRF | Ingress: ip rules, and ip routes on the | Yes | Yes | No | Yes | -| | | | routing table associated with OVS | | (expose_ipv6_gua | | | -| | | | Egress: OVS flow to change MAC | (expose_tenant_networks) | _tenant_networks) | | | -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+--------------------+-----------------------+-------------+ ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+--------------------+-----------------------+-----------+ +| Exposing Method | Description | Expose with | Wired with | Expose Tenants | Expose only GUA | OVS-DPDK/HWOL Support | Supported | ++=================+=====================================================+==========================================+==========================================+==========================+====================+=======================+===========+ +| Underlay | Expose IPs on the default underlay network. | Adding IP to dummy NIC isolated in a VRF | Ingress: ip rules, and ip routes on the | Yes | Yes | No | Yes | +| | | | routing table associated with OVS | | (expose_ipv6_gua | | | +| | | | Egress: OVS flow to change MAC | (expose_tenant_networks) | _tenant_networks) | | | ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+--------------------+-----------------------+-----------+ BGP Driver (NB) --------------- -Note until RFE on OVN (https://bugzilla.redhat.com/show_bug.cgi?id=2107515) -is implemented there is no option to expose tenant networks as we do not know -where the CR-LRP port is associated to. 
+OVN version 23.09 is required to expose tenant networks and ovn-lb, because +CR-LRP port chassis information in the NB DB is only available in that +version (https://bugzilla.redhat.com/show_bug.cgi?id=2107515). -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+-----------------------+-------------+ -| Exposing Method | Description | Expose with | Wired with | Expose Tenants or GUA | OVS-DPDK/HWOL Support | Implemented | -+=================+=====================================================+==========================================+==========================================+==========================+=======================+=============+ -| Underlay | Expose IPs on the default underlay network | Adding IP to dummy nic isolated in a VRF | Ingress: ip rules, and ip routes on the | No support until OVN | No | Yes | -| | | | routing table associated to ovs | has information about | | | -| | | | Egress: ovs-flow to change mac | the CR-LRP chassis on | | | -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+ the SB DB +-----------------------+-------------+ -| L2VNI | Extends the L2 segment on a given VNI | No need to expose it, automatic with the | Ingress: vxlan + bridge device | | No | No | -| | | FRR configuration and the wiring | Egress: nothing | | | | -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+ +-----------------------+-------------+ -| VRF | Expose IPs on a given VRF (vni id) | Add IPs to dummy nic associated to the | Ingress: vxlan + bridge device | | No | No | -| | | VRF device (lo_VNI_ID) | Egress: flow to redirect to VRF device | | | | -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+ +-----------------------+-------------+ -| Dynamic | Mix of the previous, depending on annotations it | Mix of the previous three | Ingress: mix of all the above | | No | No | -| | exposes it differently and on different VNIs | | Egress: mix of all the above | | | | -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+ +-----------------------+-------------+ -| OVN-Cluster | Make use of an extra OVN cluster (per node) instead | Adding IP to dummy nic isolated in a VRF | Ingress: ovn routes, ovs flow (mac tweak)| | Yes | No | -| | of kernel routing -- exposing the IPs with BGP is | (as it only supports the underlay option)| Egress: ovn routes and policies, | | | | -| | the same as before | | and ovs flow (mac tweak) | | | | -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+-----------------------+-------------+ +The following table lists the various methods you can use to expose the +networks/IPS, how they expose the IPs and the tenant networks, and whether +OVS-DPDK and hardware offload (HWOL) is supported. 
+ ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+-----------------------+---------------+ +| Exposing Method | Description | Expose with | Wired with | Expose Tenants or GUA | OVS-DPDK/HWOL Support | Supported | ++=================+=====================================================+==========================================+==========================================+==========================+=======================+===============+ +| Underlay | Expose IPs on the default underlay network. | Adding IP to dummy NIC isolated in a VRF.| Ingress: ip rules, and ip routes on the | Yes | No | Yes | +| | | | routing table associated to OVS | | | | +| | | | Egress: OVS-flow to change MAC | (expose_tenant_networks) | | | ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+-----------------------+---------------+ +| L2VNI | Extends the L2 segment on a given VNI. | No need to expose it, automatic with the | Ingress: vxlan + bridge device | N/A | No | No | +| | | FRR configuration and the wiring. | Egress: nothing | | | | ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+-----------------------+---------------+ +| VRF | Expose IPs on a given VRF (vni id). | Add IPs to dummy NIC associated to the | Ingress: vxlan + bridge device | Yes | No | No | +| | | VRF device (lo_VNI_ID). | Egress: flow to redirect to VRF device | (Not implemented) | | | ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+-----------------------+---------------+ +| Dynamic | Mix of the previous. Depending on annotations it | Mix of the previous three. | Ingress: mix of all the above | Depends on the method | No | No | +| | exposes IPs differently and on different VNIs. | | Egress: mix of all the above | used | | | ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+-----------------------+---------------+ +| OVN | Make use of an extra OVN cluster (per node) instead | Adding IP to dummy NIC isolated in a VRF | Ingress: OVN routes, OVS flow (MAC tweak)| Yes | Yes | Yes. Only for | +| | of kernel routing -- exposing the IPs with BGP is | (as it only supports the underlay | Egress: OVN routes and policies, | (Not implemented) | | ipv4 and flat | +| | the same as before. | option). 
| and OVS flow (MAC tweak) | | | provider | +| | | | | | | networks | ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+--------------------------+-----------------------+---------------+ BGP Stretched Driver (SB) ------------------------- -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+----------------+--------------------+-----------------------+-------------+ -| Exposing Method | Description | Expose with | Wired with | Expose Tenants | Expose only GUA | OVS-DPDK/HWOL Support | Implemented | -+=================+=====================================================+==========================================+==========================================+================+====================+=======================+=============+ -| Underlay | Expose IPs on the default underlay network | Adding IP routes to default VRF table | Ingress: ip rules, and ip routes on the | Yes | No | No | Yes | -| | | | routing table associated to ovs | | | | | -| | | | Egress: ovs-flow to change mac | | | | | -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+----------------+--------------------+-----------------------+-------------+ ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+----------------+--------------------+-----------------------+-----------+ +| Exposing Method | Description | Expose with | Wired with | Expose Tenants | Expose only GUA | OVS-DPDK/HWOL Support | Supported | ++=================+=====================================================+==========================================+==========================================+================+====================+=======================+===========+ +| Underlay | Expose IPs on the default underlay network. | Adding IP routes to default VRF table. 
| Ingress: ip rules, and ip routes on the | Yes | No | No | Yes | +| | | | routing table associated to OVS | | | | | +| | | | Egress: OVS-flow to change MAC | | | | | ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+----------------+--------------------+-----------------------+-----------+ EVPN Driver (SB) ---------------- -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+----------------+--------------------+-----------------------+-------------+ -| Exposing Method | Description | Expose with | Wired with | Expose Tenants | Expose only GUA | OVS-DPDK/HWOL Support | Implemented | -+=================+=====================================================+==========================================+==========================================+================+====================+=======================+=============+ -| VRF | Expose IPs on a given VRF (vni id) -- requires | Add IPs to dummy nic associated to the | Ingress: vxlan + bridge device | Yes | No | No | No | -| | newtorking-bgpvpn or manual NB DB inputs | VRF device (lo_VNI_ID) | Egress: flow to redirect to VRF device | | | | | -+-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+----------------+--------------------+-----------------------+-------------+ ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+----------------+--------------------+-----------------------+-----------+ +| Exposing Method | Description | Expose with | Wired with | Expose Tenants | Expose only GUA | OVS-DPDK/HWOL Support | Supported | ++=================+=====================================================+==========================================+==========================================+================+====================+=======================+===========+ +| VRF | Expose IPs on a given VRF (vni id) -- requires | Add IPs to dummy NIC associated to the | Ingress: vxlan + bridge device | Yes | No | No | No | +| | newtorking-bgpvpn or manual NB DB inputs. | VRF device (lo_VNI_ID). | Egress: flow to redirect to VRF device | | | | | ++-----------------+-----------------------------------------------------+------------------------------------------+------------------------------------------+----------------+--------------------+-----------------------+-----------+ diff --git a/doc/source/contributor/agent_deployment.rst b/doc/source/contributor/agent_deployment.rst new file mode 100644 index 00000000..001c357c --- /dev/null +++ b/doc/source/contributor/agent_deployment.rst @@ -0,0 +1,163 @@ +Agent deployment +~~~~~~~~~~~~~~~~ + +The BGP mode (for both NB and SB drivers) exposes the VMs and LBs in provider +networks or with FIPs, as well as VMs on tenant networks if +``expose_tenant_networks`` or ``expose_ipv6_gua_tenant_networks`` configuration +options are enabled. + +There is a need to deploy the agent in all the nodes where VMs can be created +as well as in the networker nodes (i.e., where OVN router gateway ports can be +allocated): + +- For VMs and Amphora load balancers on provider networks or with FIPs, + the IP is exposed on the node where the VM (or amphora) is deployed. 
+ Therefore the agent needs to be running on the compute nodes. + +- For VMs on tenant networks (with ``expose_tenant_networks`` or + ``expose_ipv6_gua_tenant_networks`` configuration options enabled), the agent + needs to be running on the networker nodes. In OpenStack, with OVN + networking, the N/S traffic to the tenant VMs (without FIPs) needs to go + through the networking nodes, more specifically the one hosting the + chassisredirect OVN port (cr-lrp), connecting the provider network to the + OVN virtual router. Hence, the VM IPs are advertised through BGP in that + node, and from there it follows the normal path to the OpenStack compute + node where the VM is located — through the tunnel. + +- Similarly, for OVN load balancer the IPs are exposed on the networker node. + In this case the ARP request for the VIP is replied by the OVN router + gateway port, therefore the traffic needs to be injected into OVN overlay + at that point too. + Therefore the agent needs to be running on the networker nodes for OVN + load balancers. + +As an example of how to start the OVN BGP Agent on the nodes, see the commands +below: + + .. code-block:: ini + + $ python setup.py install + $ cat bgp-agent.conf + # sample configuration that can be adapted based on needs + [DEFAULT] + debug=True + reconcile_interval=120 + expose_tenant_networks=True + # expose_ipv6_gua_tenant_networks=True + # for SB DB driver + driver=ovn_bgp_driver + # for NB DB driver + #driver=nb_ovn_bgp_driver + bgp_AS=64999 + bgp_nic=bgp-nic + bgp_vrf=bgp-vrf + bgp_vrf_table_id=10 + ovsdb_connection=tcp:127.0.0.1:6640 + address_scopes=2237917c7b12489a84de4ef384a2bcae + + [ovn] + ovn_nb_connection = tcp:172.17.0.30:6641 + ovn_sb_connection = tcp:172.17.0.30:6642 + + [agent] + root_helper=sudo ovn-bgp-agent-rootwrap /etc/ovn-bgp-agent/rootwrap.conf + root_helper_daemon=sudo ovn-bgp-agent-rootwrap-daemon /etc/ovn-bgp-agent/rootwrap.conf + + $ sudo bgp-agent --config-dir bgp-agent.conf + Starting BGP Agent... + Loaded chassis 51c8480f-c573-4c1c-b96e-582f9ca21e70. + BGP Agent Started... + Ensuring VRF configuration for advertising routes + Configuring br-ex default rule and routing tables for each provider network + Found routing table for br-ex with: ['201', 'br-ex'] + Sync current routes. + Add BGP route for logical port with ip 172.24.4.226 + Add BGP route for FIP with ip 172.24.4.199 + Add BGP route for CR-LRP Port 172.24.4.221 + .... + + + .. note:: + + If you only want to expose the IPv6 GUA tenant IPs, then remove the option + ``expose_tenant_networks`` and add ``expose_ipv6_gua_tenant_networks=True`` + instead. + + + .. note:: + + If you want to filter the tenant networks to be exposed by some specific + address scopes, add the list of address scopes to ``address_scope=XXX`` + section. If no filtering should be applied, just remove the line. + + +Note that the OVN BGP Agent operates under the next assumptions: + +- A dynamic routing solution, in this case FRR, is deployed and + advertises/withdraws routes added/deleted to/from certain local interface, + in this case the ones associated to the VRF created to that end. As only VM + and load balancer IPs need to be advertised, FRR needs to be configure with + the proper filtering so that only /32 (or /128 for IPv6) IPs are advertised. + A sample config for FRR is: + + .. 
code-block:: ini + + frr version 7.5 + frr defaults traditional + hostname cmp-1-0 + log file /var/log/frr/frr.log debugging + log timestamp precision 3 + service integrated-vtysh-config + line vty + + router bgp 64999 + bgp router-id 172.30.1.1 + bgp log-neighbor-changes + bgp graceful-shutdown + no bgp default ipv4-unicast + no bgp ebgp-requires-policy + + neighbor uplink peer-group + neighbor uplink remote-as internal + neighbor uplink password foobar + neighbor enp2s0 interface peer-group uplink + neighbor enp3s0 interface peer-group uplink + + address-family ipv4 unicast + redistribute connected + neighbor uplink activate + neighbor uplink allowas-in origin + neighbor uplink prefix-list only-host-prefixes out + exit-address-family + + address-family ipv6 unicast + redistribute connected + neighbor uplink activate + neighbor uplink allowas-in origin + neighbor uplink prefix-list only-host-prefixes out + exit-address-family + + ip prefix-list only-default permit 0.0.0.0/0 + ip prefix-list only-host-prefixes permit 0.0.0.0/0 ge 32 + + route-map rm-only-default permit 10 + match ip address prefix-list only-default + set src 172.30.1.1 + + ip protocol bgp route-map rm-only-default + + ipv6 prefix-list only-default permit ::/0 + ipv6 prefix-list only-host-prefixes permit ::/0 ge 128 + + route-map rm-only-default permit 11 + match ipv6 address prefix-list only-default + set src f00d:f00d:f00d:f00d:f00d:f00d:f00d:0004 + + ipv6 protocol bgp route-map rm-only-default + + ip nht resolve-via-default + + +- The relevant provider OVS bridges are created and configured with a loopback + IP address (eg. 1.1.1.1/32 for IPv4), and proxy ARP/NDP is enabled on their + kernel interface. diff --git a/doc/source/contributor/bgp_advertising.rst b/doc/source/contributor/bgp_advertising.rst new file mode 100644 index 00000000..686bdadc --- /dev/null +++ b/doc/source/contributor/bgp_advertising.rst @@ -0,0 +1,66 @@ +BGP Advertisement ++++++++++++++++++ + +The OVN BGP Agent (both SB and NB drivers) is in charge of triggering FRR +(IP routing protocol suite for Linux which includes protocol daemons for BGP, +OSPF, RIP, among others) to advertise/withdraw directly connected routes via +BGP. To do that, when the agent starts, it ensures that: + +- FRR local instance is reconfigured to leak routes for a new VRF. To do that + it uses ``vtysh shell``. It connects to the existsing FRR socket ( + ``--vty_socket`` option) and executes the next commands, passing them through + a file (``-c FILE_NAME`` option): + + .. code-block:: ini + + router bgp {{ bgp_as }} + address-family ipv4 unicast + import vrf {{ vrf_name }} + exit-address-family + + address-family ipv6 unicast + import vrf {{ vrf_name }} + exit-address-family + + router bgp {{ bgp_as }} vrf {{ vrf_name }} + bgp router-id {{ bgp_router_id }} + address-family ipv4 unicast + redistribute connected + exit-address-family + + address-family ipv6 unicast + redistribute connected + exit-address-family + + +- There is a VRF created (the one leaked in the previous step), by default + with name ``bgp-vrf``. + +- There is a dummy interface type (by default named ``bgp-nic``), associated to + the previously created VRF device. + +- Ensure ARP/NDP is enabled at OVS provider bridges by adding an IP to it. 
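+
+For illustration, the wiring described in the list above can be reproduced
+manually with standard ``iproute2`` commands. This is only a sketch assuming
+the default ``bgp-vrf``/``bgp-nic`` names, VRF routing table 10 and a provider
+bridge named ``br-ex``; the agent takes care of the equivalent configuration
+by itself:
+
+  .. code-block:: ini
+
+     # VRF used to isolate the routes leaked into the BGP instance
+     $ sudo ip link add bgp-vrf type vrf table 10
+     $ sudo ip link set bgp-vrf up
+     # dummy device where the IPs to advertise are added
+     $ sudo ip link add bgp-nic type dummy
+     $ sudo ip link set bgp-nic master bgp-vrf
+     $ sudo ip link set bgp-nic up
+     # adding an IP to the provider bridge allows it to answer ARP/NDP
+     $ sudo ip addr add 1.1.1.1/32 dev br-ex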
+ + +Then, to expose the VMs/LB IPs as they are created (or upon +initialization or re-sync), since the FRR configuration has the +``redistribute connected`` option enabled, the only action needed to expose it +(or withdraw it) is to add it (or remove it) from the ``bgp-nic`` dummy interface. +Then it relies on Zebra to do the BGP advertisement, as Zebra detects the +addition/deletion of the IP on the local interface and advertises/withdraws +the route: + + .. code-block:: ini + + $ ip addr add IPv4/32 dev bgp-nic + $ ip addr add IPv6/128 dev bgp-nic + + + .. note:: + + As we also want to be able to expose VM connected to tenant networks + (when ``expose_tenant_networks`` or ``expose_ipv6_gua_tenant_networks`` + configuration options are enabled), there is a need to expose the Neutron + router gateway port (CR-LRP on OVN) so that the traffic to VMs in tenant + networks is injected into OVN overlay through the node that is hosting + that port. \ No newline at end of file diff --git a/doc/source/contributor/bgp_mode_design.rst b/doc/source/contributor/bgp_mode_design.rst deleted file mode 100644 index f9411aa2..00000000 --- a/doc/source/contributor/bgp_mode_design.rst +++ /dev/null @@ -1,603 +0,0 @@ -.. - This work is licensed under a Creative Commons Attribution 3.0 Unported - License. - - http://creativecommons.org/licenses/by/3.0/legalcode - - Convention for heading levels in Neutron devref: - ======= Heading 0 (reserved for the title in a document) - ------- Heading 1 - ~~~~~~~ Heading 2 - +++++++ Heading 3 - ''''''' Heading 4 - (Avoid deeper levels because they do not render well.) - -======================================= -OVN BGP Agent: Design of the BGP Driver -======================================= - -Purpose -------- - -The purpose of this document is to present the design decision behind -the BGP Driver for the Networking OVN BGP agent. - -The main purpose of adding support for BGP is to be able to expose Virtual -Machines (VMs) and Load Balancers (LBs) IPs through BGP dynamic protocol -when they either have a Floating IP (FIP) associated or are booted/created -on a provider network -- also in tenant networks if a flag is enabled. - - -Overview --------- - -With the increment of virtualized/containerized workloads it is becoming more -and more common to use pure layer-3 Spine and Leaf network deployments at -datacenters. There are several benefits of this, such as reduced complexity at -scale, reduced failures domains, limiting broadcast traffic, among others. - -The OVN BGP Agent is a Python based daemon that runs on each node -(e.g., OpenStack controllers and/or compute nodes). It connects to the OVN -SouthBound DataBase (OVN SB DB) to detect the specific events it needs to -react to, and then leverages FRR to expose the routes towards the VMs, and -kernel networking capabilities to redirect the traffic arriving on the nodes -to the OVN overlay. - - .. note:: - - Note it is only intended for the N/S traffic, the E/W traffic will work - exactly the same as before, i.e., VMs are connected through geneve - tunnels. - - -The agent provides a multi-driver implementation that allows you to configure -it for specific infrastructure running on top of OVN, for instance OpenStack -or Kubernetes/OpenShift. 
-This simple design allows the agent to implement different drivers, depending -on what OVN SB DB events are being watched (watchers examples at -``ovn_bgp_agent/drivers/openstack/watchers/``), and what actions are -triggered in reaction to them (drivers examples at -``ovn_bgp_agent/drivers/openstack/XXXX_driver.py``, implementing the -``ovn_bgp_agent/drivers/driver_api.py``). - -A driver implements the support for BGP capabilities. It ensures both VMs and -LBs on providers networks or with Floating IPs associated can be -exposed throug BGP. In addition, VMs on tenant networks can be also exposed -if the ``expose_tenant_network`` configuration option is enabled. -To control what tenant networks are exposed another flag can be used: -``address_scopes``. If not set, all the tenant networks will be exposed, while -if it is configured with a (set of) address_scopes, only the tenant networks -whose address_scope matches will be exposed. - -A common driver API is defined exposing the next methods: - -- ``expose_ip`` and ``withdraw_ip``: used to expose/withdraw IPs for local - OVN ports. - -- ``expose_remote_ip`` and ``withdraw_remote_ip``: use to expose/withdraw IPs - through another node when the VM/Pod are running on a different node. - For example for VMs on tenant networks where the traffic needs to be - injected through the OVN router gateway port. - -- ``expose_subnet`` and ``withdraw_subnet``: used to expose/withdraw subnets through - the local node. - - -Proposed Solution ------------------ - -To support BGP functionality the OVN BGP Agent includes a driver -that performs the extra steps required for exposing the IPs through BGP on -the right nodes and steering the traffic to/from the node from/to the OVN -overlay. In order to configure which driver to use, one should set the -``driver`` configuration option in the ``bgp-agent.conf`` file. - -This driver requires a watcher to react to the BGP-related events. -In this case, the BGP actions will be trigger by events related to -``Port_Binding`` and ``Load_Balancer`` OVN SB DB tables. -The information in those tables gets modified by actions related to VMs or LBs -creation/deletion, as well as FIPs association/disassociation to/from them. - -Then, the agent performs some actions in order to ensure those VMs are -reachable through BGP: - -- Traffic between nodes or BGP Advertisement: These are the actions needed to - expose the BGP routes and make sure all the nodes know how to reach the - VM/LB IP on the nodes. - -- Traffic within a node or redirecting traffic to/from OVN overlay: These are - the actions needed to redirect the traffic to/from a VM to the OVN neutron - networks, when traffic reaches the node where the VM is or in their way - out of the node. - -The code for the BGP driver is located at -``drivers/openstack/ovn_bgp_driver.py``, and its associated watcher can be -found at ``drivers/openstack/watchers/bgp_watcher.py``. - - -OVN SB DB Events -~~~~~~~~~~~~~~~~ - -The watcher associated to the BGP driver detect the relevant events on the -OVN SB DB to call the driver functions to configure BGP and linux kernel -networking accordingly. -The folloging events are watched and handled by the BGP watcher: - -- VMs or LBs created/deleted on provider networks - -- FIPs association/disassociation to VMs or LBs - -- VMs or LBs created/deleted on tenant networks (if the - ``expose_tenant_networks`` configuration option is enabled, or if the - ``expose_ipv6_gua_tenant_networks`` for only exposing IPv6 GUA ranges) - - .. 
note:: - - If ``expose_tenant_networks`` flag is enabled, it does not matter the - status of ``expose_ipv6_gua_tenant_networks``, as all the tenant IPs - will be advertized. - - -The BGP watcher detects OVN Southbound Database events at the ``Port_Binding`` -and ``Load_Balancer`` tables. It creates new event classes named -``PortBindingChassisEvent`` and ``OVNLBEvent``, that all the events -watched for BGP use as the base (inherit from). - -The specific defined events to react to are: - -- ``PortBindingChassisCreatedEvent``: Detects when a port of type - ``""`` (empty double-qoutes), ``virtual``, or ``chassisredirect`` gets - attached to the OVN chassis where the agent is running. This is the case for - VM or amphora LB ports on the provider networks, VM or amphora LB ports on - tenant networks with a FIP associated, and neutron gateway router ports - (CR-LRPs). It calls ``expose_ip`` driver method to perform the needed - actions to expose it. - -- ``PortBindingChassisDeletedEvent``: Detects when a port of type - ``""`` (empty double-quotes), ``virtual``, or ``chassisredirect`` gets - detached from the OVN chassis where the agent is running. This is the case - for VM or amphora LB ports on the provider networks, VM or amphora LB ports - on tenant networks with a FIP associated, and neutron gateway router ports - (CR-LRPs). It calls ``withdraw_ip`` driver method to perform the needed - actions to withdraw the exposed BGP route. - -- ``FIPSetEvent``: Detects when a patch port gets its nat_addresses field - updated (e.g., action related to FIPs NATing). If that so, and the associated - VM port is on the local chassis the event is processed by the agent and the - required ip rule gets created and also the IP is (BGP) exposed. It calls - ``expose_ip`` driver method, including the associated_port information, to - perform the required actions. - -- ``FIPUnsetEvent``: Same as previous, but when the nat_address field get an - IP deleted. It calls ``withdraw_ip`` driver method to perform the required - actions. - -- ``SubnetRouterAttachedEvent``: Detects when a patch port gets created. - This means a subnet is attached to a router. In the ``expose_tenant_network`` - case, if the chassis is the one having the cr-lrp port for that router where - the port is getting created, then the event is processed by the agent and the - needed actions (ip rules and routes, and ovs rules) for exposing the IPs on - that network are performed. This event calls the driver_api - ``expose_subnet``. The same happens if ``expose_ipv6_gua_tenant_networks`` - is used, but then, the IPs are only exposed if they are IPv6 global. - -- ``SubnetRouterDetachedEvent``: Same as previous one, but for the deletion - of the port. It calls ``withdraw_subnet``. - -- ``TenantPortCreateEvent``: Detects when a port of type ``""`` (empty - double-quotes) or ``virtual`` gets updated. If that port is not on a - provider network, and the chasis where the event is processed has the - LogicalRouterPort for the network and the OVN router gateway port where the - network is connected to, then the event is processed and the actions to - expose it through BGP are triggered. It calls the ``expose_remote_ip`` as in - this case the IPs are exposed through the node with the OVN router gateway - port, instead of where the VM is. - -- ``TenantPortDeleteEvent``: Same as previous one, but for the deletion of the - port. It calls ``withdraw_remote_ip``. 
- -- ``OVNLBMemberUpdateEvent``: This event is required to handle the OVN load - balancers created on the provider networks. It detects when new datapaths - are added/removed to/from the ``Load_Balancer`` entries. This happens when - members are added/removed -- their respective datapaths are added into the - ``Load_Balancer`` table entry. The event is only processed in the nodes with the - relevant OVN router gateway ports, as it is where it needs to get exposed to - be injected into OVN overlay. It calls ``expose_ovn_lb_on_provider`` when the - second datapath is added (first one is the one belonging to the VIP (i.e., - the provider network), while the second one belongs to the load balancer - member -- note all the load balancer members are expected to be connected - through the same router to the provider network). And it calls - ``withdraw_ovn_lb_on_provider`` when that member gets deleted (only one - datapath left) or the event type is ROW_DELETE, meaning the whole - load balancer is deleted. - - -Driver Logic -~~~~~~~~~~~~ - -The BGP driver is in charge of the networking configuration ensuring that -VMs and LBs on provider networks or with FIPs can be reached through BGP -(N/S traffic). In addition, if ``expose_tenant_networks`` flag is enabled, -VMs in tenant networks should be reachable too -- although instead of directly -in the node they are created, through one of the network gateway chassis nodes. -The same happens with ``expose_ipv6_gua_tenant_networks`` but only for IPv6 -GUA ranges. In addition, if the config option ``address_scopes`` is set only -the tenant networks with matching corresponding address_scope will be exposed. - -To accomplish this, it needs to ensure that: - -- VM and LBs IPs can be advertized in a node where the traffic could be - injected into the OVN overlay, in this case either the node hosting the VM - or the node where the router gateway port is scheduled (see limitations - subsection). - -- Once the traffic reaches the specific node, the traffic is redirected to the - OVN overlay by leveraging kernel networking. - - -BGP Advertisement -+++++++++++++++++ - -The OVN BGP Agent is in charge of triggering FRR (ip routing protocol -suite for Linux which includes protocol daemons for BGP, OSPF, RIP, -among others) to advertise/withdraw directly connected routes via BGP. -To do that, when the agent starts, it ensures that: - -- FRR local instance is reconfigured to leak routes for a new VRF. To do that - it uses ``vtysh shell``. It connects to the existsing FRR socket ( - ``--vty_socket`` option) and executes the next commands, passing them through - a file (``-c FILE_NAME`` option): - - .. code-block:: ini - - LEAK_VRF_TEMPLATE = ''' - router bgp {{ bgp_as }} - address-family ipv4 unicast - import vrf {{ vrf_name }} - exit-address-family - - address-family ipv6 unicast - import vrf {{ vrf_name }} - exit-address-family - - router bgp {{ bgp_as }} vrf {{ vrf_name }} - bgp router-id {{ bgp_router_id }} - address-family ipv4 unicast - redistribute connected - exit-address-family - - address-family ipv6 unicast - redistribute connected - exit-address-family - - ''' - - -- There is a VRF created (the one leaked in the previous step), by default - with name ``bgp_vrf``. - -- There is a dummy interface type (by default named ``bgp-nic``), associated to - the previously created VRF device. 
- -- Ensure ARP/NDP is enabled at OVS provider bridges by adding an IP to it - - -Then, to expose the VMs/LB IPs as they are created (or upon -initialization or re-sync), since the FRR configuration has the -``redistribute connected`` option enabled, the only action needed to expose it -(or withdraw it) is to add it (or remove it) from the ``bgp-nic`` dummy interface. -Then it relies on Zebra to do the BGP advertisemant, as Zebra detects the -addition/deletion of the IP on the local interface and advertises/withdraw -the route: - - .. code-block:: ini - - $ ip addr add IPv4/32 dev bgp-nic - $ ip addr add IPv6/128 dev bgp-nic - - - .. note:: - - As we also want to be able to expose VM connected to tenant networks - (when ``expose_tenant_networks`` or ``expose_ipv6_gua_tenant_networks`` - configuration options are enabled), there is a need to expose the Neutron - router gateway port (CR-LRP on OVN) so that the traffic to VMs on tenant - networks is injected into OVN overlay through the node that is hosting - that port. - - -Traffic Redirection to/from OVN -+++++++++++++++++++++++++++++++ - -Once the VM/LB IP is exposed in an specific node (either the one hosting the -VM/LB or the one with the OVN router gateway port), the OVN BGP Agent is in -charge of configuring the linux kernel networking and OVS so that the traffic -can be injected into the OVN overlay, and vice versa. To do that, when the -agent starts, it ensures that: - -- ARP/NDP is enabled at OVS provider bridges by adding an IP to it - -- There is a routing table associated to each OVS provider bridge - (adds entry at /etc/iproute2/rt_tables) - -- If provider network is a VLAN network, a VLAN device connected - to the bridge is created, and it has ARP and NDP enabed. - -- Cleans up extra OVS flows at the OVS provider bridges - -Then, either upon events or due to (re)sync (regularly or during start up), it: - -- Adds an IP rule to apply specific routing table routes, - in this case the one associated to the OVS provider bridge: - - .. code-block:: ini - - $ ip rule - 0: from all lookup local - 1000: from all lookup [l3mdev-table] - *32000: from all to IP lookup br-ex* # br-ex is the OVS provider bridge - *32000: from all to CIDR lookup br-ex* # for VMs in tenant networks - 32766: from all lookup main - 32767: from all lookup default - - -- Adds an IP route at the OVS provider bridge routing table so that the traffic is - routed to the OVS provider bridge device: - - .. code-block:: ini - - $ ip route show table br-ex - default dev br-ex scope link - *CIDR via CR-LRP_IP dev br-ex* # for VMs in tenant networks - *CR-LRP_IP dev br-ex scope link* # for the VM in tenant network redirection - *IP dev br-ex scope link* # IPs on provider or FIPs - - -- Adds a static ARP entry for the OVN router gateway ports (CR-LRP) so that the - traffic is steered to OVN via br-int -- this is because OVN does not reply - to ARP requests outside its L2 network: - - .. code-block:: ini - - $ ip nei - ... - CR-LRP_IP dev br-ex lladdr CR-LRP_MAC PERMANENT - ... - - -- For IPv6, instead of the static ARP entry, and NDP proxy is added, same - reasoning: - - .. 
code-block:: ini - - $ ip -6 nei add proxy CR-LRP_IP dev br-ex - - -- Finally, in order for properly send the traffic out from the OVN overlay - to kernel networking to be sent out of the node, the OVN BGP Agent needs - to add a new flow at the OVS provider bridges so that the destination MAC - address is changed to the MAC address of the OVS provider bridge - (``actions=mod_dl_dst:OVN_PROVIDER_BRIDGE_MAC,NORMAL``): - - .. code-block:: ini - - $ sudo ovs-ofctl dump-flows br-ex - cookie=0x3e7, duration=77.949s, table=0, n_packets=0, n_bytes=0, priority=900,ip,in_port="patch-provnet-1" actions=mod_dl_dst:3a:f7:e9:54:e8:4d,NORMAL - cookie=0x3e7, duration=77.937s, table=0, n_packets=0, n_bytes=0, priority=900,ipv6,in_port="patch-provnet-1" actions=mod_dl_dst:3a:f7:e9:54:e8:4d,NORMAL - - - -Driver API -++++++++++ - -The BGP driver needs to implement the ``driver_api.py`` interface with the -following functions: - -- ``expose_ip``: creates all the ip rules and routes, and ovs flows needed - to redirect the traffic to OVN overlay. It also ensure FRR exposes through - BGP the required IP. - -- ``withdraw_ip``: removes the above configuration to withdraw the exposed IP. - -- ``expose_subnet``: add kernel networking configuration (ip rules and route) - to ensure traffic can go from the node to the OVN overlay, and viceversa, - for IPs within the tenant subnet CIDR. - -- ``withdraw_subnet``: removes the above kernel networking configuration. - -- ``expose_remote_ip``: BGP expose VM tenant network IPs through the chassis - hosting the OVN gateway port for the router where the VM is connected. - It ensures traffic destinated to the VM IP arrives to this node by exposing - the IP through BGP locally. The previous steps in ``expose_subnet`` ensure - the traffic is redirected to the OVN overlay once on the node. - -- ``withdraw_remote_ip``: removes the above steps to stop advertizing the IP - through BGP from the node. - -And in addition, it also implements these 2 extra ones for the OVN load -balancers on the provider networks - -- ``expose_ovn_lb_on_provider``: adds kernel networking configuration to ensure - traffic is forwarded from the node to the OVN overlay as well as to expose - the VIP through BGP. - -- ``withdraw_ovn_lb_on_provider``: removes the above steps to stop advertising - the load balancer VIP. - - -Agent deployment -~~~~~~~~~~~~~~~~ - -The BGP mode exposes the VMs and LBs in provider networks or with -FIPs, as well as VMs on tenant networks if ``expose_tenant_networks`` or -``expose_ipv6_gua_tenant_networks`` configuration options are enabled. - -There is a need to deploy the agent in all the nodes where VMs can be created -as well as in the networker nodes (i.e., where OVN router gateway ports can be -allocated): - -- For VMs and Amphora load balancers on provider networks or with FIPs, - the IP is exposed on the node where the VM (or amphora) is deployed. - Therefore the agent needs to be running on the compute nodes. - -- For VMs on tenant networks (with ``expose_tenant_networks`` or - ``expose_ipv6_gua_tenant_networks`` configuration options enabled), the agent - needs to be running on the networker nodes. In OpenStack, with OVN - networking, the N/S traffic to the tenant VMs (without FIPs) needs to go - through the networking nodes, more specifically the one hosting the - chassisredirect ovn port (cr-lrp), connecting the provider network to the - OVN virtual router. 
Hence, the VM IPs is advertised through BGP in that - node, and from there it follows the normal path to the OpenStack compute - node where the VM is located — the Geneve tunnel. - -- Similarly, for OVN load balancer the IPs are exposed on the networker node. - In this case the ARP request for the VIP is replied by the OVN router - gateway port, therefore the traffic needs to be injected into OVN overlay - at that point too. - Therefore the agent needs to be running on the networker nodes for OVN - load balancers. - -As an example of how to start the OVN BGP Agent on the nodes, see the commands -below: - - .. code-block:: ini - - $ python setup.py install - $ cat bgp-agent.conf - # sample configuration that can be adapted based on needs - [DEFAULT] - debug=True - reconcile_interval=120 - expose_tenant_networks=True - # expose_ipv6_gua_tenant_networks=True - driver=osp_bgp_driver - address_scopes=2237917c7b12489a84de4ef384a2bcae - - $ sudo bgp-agent --config-dir bgp-agent.conf - Starting BGP Agent... - Loaded chassis 51c8480f-c573-4c1c-b96e-582f9ca21e70. - BGP Agent Started... - Ensuring VRF configuration for advertising routes - Configuring br-ex default rule and routing tables for each provider network - Found routing table for br-ex with: ['201', 'br-ex'] - Sync current routes. - Add BGP route for logical port with ip 172.24.4.226 - Add BGP route for FIP with ip 172.24.4.199 - Add BGP route for CR-LRP Port 172.24.4.221 - .... - - - .. note:: - - If you only want to expose the IPv6 GUA tenant IPs, then remove the option - ``expose_tenant_networks`` and add ``expose_ipv6_gua_tenant_networks=True`` - instead. - - - .. note:: - - If you want to filter the tenant networks to be exposed by some specific - address scopes, add the list of address scopes to ``addresss_scope=XXX`` - section. If no filtering should be applied, just remove the line. - - -Note that the OVN BGP Agent operates under the next assumptions: - -- A dynamic routing solution, in this case FRR, is deployed and - advertises/withdraws routes added/deleted to/from certain local interface, - in this case the ones associated to the VRF created to that end. As only VM - and load balancer IPs needs to be advertised, FRR needs to be configure with - the proper filtering so that only /32 (or /128 for IPv6) IPs are advertised. - A sample config for FRR is: - - .. 
code-block:: ini - - frr version 7.0 - frr defaults traditional - hostname cmp-1-0 - log file /var/log/frr/frr.log debugging - log timestamp precision 3 - service integrated-vtysh-config - line vty - - router bgp 64999 - bgp router-id 172.30.1.1 - bgp log-neighbor-changes - bgp graceful-shutdown - no bgp default ipv4-unicast - no bgp ebgp-requires-policy - - neighbor uplink peer-group - neighbor uplink remote-as internal - neighbor uplink password foobar - neighbor enp2s0 interface peer-group uplink - neighbor enp3s0 interface peer-group uplink - - address-family ipv4 unicast - redistribute connected - neighbor uplink activate - neighbor uplink allowas-in origin - neighbor uplink prefix-list only-host-prefixes out - exit-address-family - - address-family ipv6 unicast - redistribute connected - neighbor uplink activate - neighbor uplink allowas-in origin - neighbor uplink prefix-list only-host-prefixes out - exit-address-family - - ip prefix-list only-default permit 0.0.0.0/0 - ip prefix-list only-host-prefixes permit 0.0.0.0/0 ge 32 - - route-map rm-only-default permit 10 - match ip address prefix-list only-default - set src 172.30.1.1 - - ip protocol bgp route-map rm-only-default - - ipv6 prefix-list only-default permit ::/0 - ipv6 prefix-list only-host-prefixes permit ::/0 ge 128 - - route-map rm-only-default permit 11 - match ipv6 address prefix-list only-default - set src f00d:f00d:f00d:f00d:f00d:f00d:f00d:0004 - - ipv6 protocol bgp route-map rm-only-default - - ip nht resolve-via-default - - -- The relevant provider OVS bridges are created and configured with a loopback - IP address (eg. 1.1.1.1/32 for IPv4), and proxy ARP/NDP is enabled on their - kernel interface. In the case of OpenStack this is done by TripleO directly. - - -Limitations ------------ - -The following limitations apply: - -- There is no API to decide what to expose, all VMs/LBs on providers or with - Floating IPs associated to them will get exposed. For the VMs in the tenant - networks, the flag ``address_scopes`` should be used for filtering what - subnets to expose -- which should be also used to ensure no overlapping IPs. - -- There is no support for overlapping CIDRs, so this must be avoided, e.g., by - using address scopes and subnet pools. - -- Network traffic is steered by kernel routing (ip routes and rules), therefore - OVS-DPDK, where the kernel space is skipped, is not supported. - -- Network traffic is steered by kernel routing (ip routes and rules), therefore - SRIOV, where the hypervisor is skipped, is not supported. - -- In OpenStack with OVN networking the N/S traffic to the ovn-octavia VIPs on - the provider or the FIPs associated to the VIPs on tenant networks needs to - go through the networking nodes (the ones hosting the Neutron Router Gateway - Ports, i.e., the chassisredirect cr-lrp ports, for the router connecting the - load balancer members to the provider network). Therefore, the entry point - into the OVN overlay needs to be one of those networking nodes, and - consequently the VIPs (or FIPs to VIPs) are exposed through them. From those - nodes the traffic will follow the normal tunneled path (Geneve tunnel) to - the OpenStack compute node where the selected member is located. 
diff --git a/doc/source/contributor/bgp_traffic_redirection.rst b/doc/source/contributor/bgp_traffic_redirection.rst
new file mode 100644
index 00000000..9ee7ddc1
--- /dev/null
+++ b/doc/source/contributor/bgp_traffic_redirection.rst
@@ -0,0 +1,78 @@
+Traffic Redirection to/from OVN
++++++++++++++++++++++++++++++++
+
+Besides exposing the VM/LB IP on a specific node (either the one hosting the
+VM/LB or the one with the OVN router gateway port), the OVN BGP Agent is in
+charge of configuring the Linux kernel networking and OVS so that the traffic
+can be injected into the OVN overlay, and vice versa. To do that, when the
+agent starts, it ensures that:
+
+- ARP/NDP is enabled on the OVS provider bridges by adding an IP to them
+
+- There is a routing table associated with each OVS provider bridge
+  (an entry is added to /etc/iproute2/rt_tables)
+
+- If the provider network is a VLAN network, a VLAN device connected
+  to the bridge is created, and it has ARP and NDP enabled.
+
+- Extra OVS flows at the OVS provider bridges are cleaned up
+
+Then, either upon events or due to (re)sync (regularly or during start up), it:
+
+- Adds an IP rule to apply specific routing table routes,
+  in this case the one associated with the OVS provider bridge:
+
+  .. code-block:: ini
+
+     $ ip rule
+     0:      from all lookup local
+     1000:   from all lookup [l3mdev-table]
+     *32000: from all to IP lookup br-ex*   # br-ex is the OVS provider bridge
+     *32000: from all to CIDR lookup br-ex* # for VMs in tenant networks
+     32766:  from all lookup main
+     32767:  from all lookup default
+
+
+- Adds an IP route to the OVS provider bridge routing table so that the
+  traffic is routed to the OVS provider bridge device:
+
+  .. code-block:: ini
+
+     $ ip route show table br-ex
+     default dev br-ex scope link
+     *CIDR via CR-LRP_IP dev br-ex*   # for VMs in tenant networks
+     *CR-LRP_IP dev br-ex scope link* # for the VM in tenant network redirection
+     *IP dev br-ex scope link*        # IPs on provider or FIPs
+
+
+- Adds a static ARP entry for the OVN router gateway ports (CR-LRP) so that the
+  traffic is steered to OVN via br-int -- this is because OVN does not reply
+  to ARP requests outside its L2 network:
+
+  .. code-block:: ini
+
+     $ ip neigh
+     ...
+     CR-LRP_IP dev br-ex lladdr CR-LRP_MAC PERMANENT
+     ...
+
+
+- For IPv6, instead of the static ARP entry, an NDP proxy is added, for the
+  same reason:
+
+  .. code-block:: ini
+
+     $ ip -6 neigh add proxy CR-LRP_IP dev br-ex
+
+
+- Finally, so that the traffic leaving the OVN overlay is properly handed over
+  to the kernel networking and sent out of the node, the OVN BGP Agent needs
+  to add a new flow at the OVS provider bridges so that the destination MAC
+  address is changed to the MAC address of the OVS provider bridge
+  (``actions=mod_dl_dst:OVN_PROVIDER_BRIDGE_MAC,NORMAL``):
+
+  .. code-block:: ini
+
+     $ sudo ovs-ofctl dump-flows br-ex
+     cookie=0x3e7, duration=77.949s, table=0, n_packets=0, n_bytes=0, priority=900,ip,in_port="patch-provnet-1" actions=mod_dl_dst:3a:f7:e9:54:e8:4d,NORMAL
+     cookie=0x3e7, duration=77.937s, table=0, n_packets=0, n_bytes=0, priority=900,ipv6,in_port="patch-provnet-1" actions=mod_dl_dst:3a:f7:e9:54:e8:4d,NORMAL
diff --git a/doc/source/contributor/drivers/bgp_mode_design.rst b/doc/source/contributor/drivers/bgp_mode_design.rst
new file mode 100644
index 00000000..45224d4e
--- /dev/null
+++ b/doc/source/contributor/drivers/bgp_mode_design.rst
@@ -0,0 +1,310 @@
+.. _bgp_driver:
+
+===================================================================
+[SB DB] OVN BGP Agent: Design of the BGP Driver with kernel routing
+===================================================================
+
+Purpose
+-------
+
+The addition of a BGP driver enables the OVN BGP agent to expose virtual
+machine (VM) and load balancer (LB) IP addresses through the BGP dynamic
+protocol when these VMs and LBs either have a floating IP (FIP) associated
+with them or are created on a provider network. The same functionality is
+available on project networks, when a special flag is set.
+
+This document presents the design decisions behind the BGP Driver for the
+Networking OVN BGP agent.
+
+Overview
+--------
+
+With the growing popularity of virtualized and containerized workloads,
+it is common to use pure Layer 3 spine and leaf network deployments in data
+centers. The benefits of this practice include reduced complexity at scale,
+smaller failure domains, and less broadcast traffic.
+
+The southbound OVN BGP agent is a Python-based daemon that runs on each
+OpenStack Controller and Compute node.
+The agent monitors the Open Virtual Network (OVN) southbound database
+for certain VM and floating IP (FIP) events.
+When these events occur, the agent notifies the FRR BGP daemon (bgpd)
+to advertise the IP address or FIP associated with the VM.
+The agent also triggers actions that route the external traffic to the OVN
+overlay.
+Because the agent uses a multi-driver implementation, you can configure the
+agent for the specific infrastructure that runs on top of OVN, such as OSP or
+Kubernetes and OpenShift.
+
+  .. note::
+
+     Note this is only intended for the N/S traffic; the E/W traffic will work
+     exactly the same as before, i.e., VMs are connected through Geneve
+     tunnels.
+
+
+This simple design enables the agent to implement different drivers,
+depending on what OVN SB DB events are being watched (watcher examples at
+``ovn_bgp_agent/drivers/openstack/watchers/``), and what actions are
+triggered in reaction to them (driver examples at
+``ovn_bgp_agent/drivers/openstack/XXXX_driver.py``, implementing the
+``ovn_bgp_agent/drivers/driver_api.py``).
+
+A driver implements the support for BGP capabilities. It ensures that both VMs
+and LBs on provider networks, or with associated floating IPs, are exposed
+through BGP. In addition, VMs on tenant networks can also be exposed
+if the ``expose_tenant_networks`` configuration option is enabled.
+To control which tenant networks are exposed another flag can be used:
+``address_scopes``. If not set, all the tenant networks will be exposed, while
+if it is configured with a (set of) address scopes, only the tenant networks
+whose address scope matches will be exposed.
+
+A common driver API is defined, exposing the following methods:
+
+- ``expose_ip`` and ``withdraw_ip``: exposes or withdraws IPs for local
+  OVN ports.
+
+- ``expose_remote_ip`` and ``withdraw_remote_ip``: exposes or withdraws IPs
+  through another node when the VM or pods are running on a different node.
+  This is used, for example, for VMs on tenant networks where the traffic needs
+  to be injected through the OVN router gateway port.
+
+- ``expose_subnet`` and ``withdraw_subnet``: exposes or withdraws subnets
+  through the local node.
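+
+As an illustration of the configuration options mentioned above, a minimal
+``bgp-agent.conf`` excerpt selecting this driver and limiting the exposed
+tenant networks to a single address scope could look as follows (the address
+scope UUID is only a placeholder; a complete sample configuration is shown in
+the deployment section below):
+
+  .. code-block:: ini
+
+     [DEFAULT]
+     # SB DB driver described in this document
+     driver=ovn_bgp_driver
+     # expose VMs on tenant networks, filtered by address scope
+     expose_tenant_networks=True
+     address_scopes=2237917c7b12489a84de4ef384a2bcae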
+ + +Proposed Solution +----------------- + +To support BGP functionality the OVN BGP Agent includes a driver +that performs the extra steps required for exposing the IPs through BGP on +the correct nodes and steering the traffic to/from the node from/to the OVN +overlay. To configure the OVN BGP agent to use the BGP driver set the +``driver`` configuration option in the ``bgp-agent.conf`` file to +``ovn_bgp_driver``. + +The BGP driver requires a watcher to react to the BGP-related events. +In this case, BGP actions are triggered by events related to +``Port_Binding`` and ``Load_Balancer`` OVN SB DB tables. +The information in these tables is modified when VMs and LBs are created and +deleted, and when FIPs for them are associated and disassociated. + +Then, the agent performs some actions in order to ensure those VMs are +reachable through BGP: + +- Traffic between nodes or BGP Advertisement: These are the actions needed to + expose the BGP routes and make sure all the nodes know how to reach the + VM/LB IP on the nodes. + +- Traffic within a node or redirecting traffic to/from OVN overlay: These are + the actions needed to redirect the traffic to/from a VM to the OVN Neutron + networks, when traffic reaches the node where the VM is or in their way + out of the node. + +The code for the BGP driver is located at +``ovn_bgp_agent/drivers/openstack/ovn_bgp_driver.py``, and its associated +watcher can be found at +``ovn_bgp_agent/drivers/openstack/watchers/bgp_watcher.py``. + + +OVN SB DB Events +~~~~~~~~~~~~~~~~ + +The watcher associated with the BGP driver detects the relevant events on the +OVN SB DB to call the driver functions to configure BGP and linux kernel +networking accordingly. +The following events are watched and handled by the BGP watcher: + +- VMs or LBs created/deleted on provider networks + +- FIPs association/disassociation to VMs or LBs + +- VMs or LBs created/deleted on tenant networks (if the + ``expose_tenant_networks`` configuration option is enabled, or if the + ``expose_ipv6_gua_tenant_networks`` for only exposing IPv6 GUA ranges) + + .. note:: + + If ``expose_tenant_networks`` flag is enabled, it does not matter the + status of ``expose_ipv6_gua_tenant_networks``, as all the tenant IPs + are advertised. + + +It creates new event classes named +``PortBindingChassisEvent`` and ``OVNLBEvent``, that all the events +watched for BGP use as the base (inherit from). + +The BGP watcher reacts to the following events: + +- ``PortBindingChassisCreatedEvent``: Detects when a port of type + ``""`` (empty double-qoutes), ``virtual``, or ``chassisredirect`` gets + attached to the OVN chassis where the agent is running. This is the case for + VM or amphora LB ports on the provider networks, VM or amphora LB ports on + tenant networks with a FIP associated, and neutron gateway router ports + (CR-LRPs). It calls ``expose_ip`` driver method to perform the needed + actions to expose it. + +- ``PortBindingChassisDeletedEvent``: Detects when a port of type + ``""`` (empty double-quotes), ``virtual``, or ``chassisredirect`` gets + detached from the OVN chassis where the agent is running. This is the case + for VM or amphora LB ports on the provider networks, VM or amphora LB ports + on tenant networks with a FIP associated, and neutron gateway router ports + (CR-LRPs). It calls ``withdraw_ip`` driver method to perform the needed + actions to withdraw the exposed BGP route. 
+
+- ``FIPSetEvent``: Detects when a Port_Binding entry of type ``patch`` gets
+  its ``nat_addresses`` field updated (e.g., an action related to FIP NATing).
+  If the associated VM port is on the local chassis, the event is processed
+  by the agent: the required IP rule gets created and the IP is exposed
+  through BGP. It calls the ``expose_ip`` driver method, including the
+  associated_port information, to perform the required actions.
+
+- ``FIPUnsetEvent``: Same as the previous one, but when an IP is deleted from
+  the ``nat_addresses`` field. It calls the ``withdraw_ip`` driver method to
+  perform the required actions.
+
+- ``SubnetRouterAttachedEvent``: Detects when a Port_Binding entry of type
+  ``patch`` gets created. This means a subnet is attached to a router.
+  In the ``expose_tenant_networks`` case, if the chassis is the one hosting
+  the cr-lrp port for the router where the port is being created, then the
+  event is processed by the agent and the needed actions (ip rules and routes,
+  and ovs rules) for exposing the IPs on that network are performed. This
+  event calls the driver API ``expose_subnet``. The same happens if
+  ``expose_ipv6_gua_tenant_networks`` is used, but then the IPs are only
+  exposed if they are IPv6 global.
+
+- ``SubnetRouterDetachedEvent``: Same as ``SubnetRouterAttachedEvent``,
+  but for the deletion of the port. It calls ``withdraw_subnet``.
+
+- ``TenantPortCreateEvent``: Detects when a port of type ``""`` (empty
+  double-quotes) or ``virtual`` gets updated. If that port is not on a
+  provider network, and the chassis where the event is processed has the
+  ``LogicalRouterPort`` for the network and the OVN router gateway port where
+  the network is connected to, then the event is processed and the actions to
+  expose it through BGP are triggered. It calls ``expose_remote_ip``
+  because in this case the IPs are exposed through the node with the OVN
+  router gateway port, instead of the node where the VM is located.
+
+- ``TenantPortDeleteEvent``: Same as ``TenantPortCreateEvent``, but for
+  the deletion of the port. It calls ``withdraw_remote_ip``.
+
+- ``OVNLBMemberUpdateEvent``: This event is required to handle the OVN load
+  balancers created on the provider networks. It detects when new datapaths
+  are added to or removed from the ``Load_Balancer`` entries. This happens
+  when members are added or removed, which triggers the addition or deletion
+  of their datapaths in the ``Load_Balancer`` table entry.
+  The event is only processed on the nodes with the relevant OVN router
+  gateway ports, because that is where the traffic needs to be exposed to be
+  injected into the OVN overlay.
+  ``OVNLBMemberUpdateEvent`` calls ``expose_ovn_lb_on_provider`` only when the
+  second datapath is added. The first datapath belongs to the VIP on the
+  provider network, while the second one belongs to the load balancer member.
+  ``OVNLBMemberUpdateEvent`` calls ``withdraw_ovn_lb_on_provider`` when the
+  second datapath is deleted, or the entire load balancer is deleted (event
+  type is ``ROW_DELETE``).
+
+  .. note::
+
+     All the load balancer members are expected to be connected through the
+     same router to the provider network.
+
+
+Driver Logic
+~~~~~~~~~~~~
+
+The BGP driver is in charge of the networking configuration ensuring that
+VMs and LBs on provider networks or with FIPs can be reached through BGP
+(N/S traffic).
+In addition, if the ``expose_tenant_networks`` flag is enabled, VMs in tenant
+networks should be reachable too -- although not directly from the node where
+they are created, but through one of the network gateway chassis nodes.
+The same happens with ``expose_ipv6_gua_tenant_networks`` but only for IPv6
+GUA ranges. In addition, if the config option ``address_scopes`` is set, only
+the tenant networks with a matching ``address_scope`` will be exposed.
+
+To accomplish the network configuration and advertisement, the driver ensures:
+
+- VM and LB IPs can be advertised in a node where the traffic could be
+  injected into the OVN overlay, in this case either the node hosting the VM
+  or the node where the router gateway port is scheduled (see the limitations
+  subsection).
+
+- Once the traffic reaches the specific node, the traffic is redirected to the
+  OVN overlay by leveraging kernel networking.
+
+
+.. include:: ../bgp_advertising.rst
+
+
+.. include:: ../bgp_traffic_redirection.rst
+
+
+Driver API
+++++++++++
+
+The BGP driver needs to implement the ``driver_api.py`` interface with the
+following functions:
+
+- ``expose_ip``: creates all the IP rules and routes, and OVS flows needed
+  to redirect the traffic to the OVN overlay. It also ensures FRR exposes
+  the required IP through BGP.
+
+- ``withdraw_ip``: removes the above configuration to withdraw the exposed IP.
+
+- ``expose_subnet``: adds the kernel networking configuration (IP rules and
+  route) to ensure traffic can go from the node to the OVN overlay, and vice
+  versa, for IPs within the tenant subnet CIDR.
+
+- ``withdraw_subnet``: removes the above kernel networking configuration.
+
+- ``expose_remote_ip``: BGP exposes VM tenant network IPs through the chassis
+  hosting the OVN gateway port for the router where the VM is connected.
+  It ensures traffic destined to the VM IP arrives at this node by exposing
+  the IP through BGP locally. The previous steps in ``expose_subnet`` ensure
+  the traffic is redirected to the OVN overlay once on the node.
+
+- ``withdraw_remote_ip``: removes the above steps to stop advertising the IP
+  through BGP from the node.
+
+The driver API implements these additional methods for OVN load balancers on
+provider networks:
+
+- ``expose_ovn_lb_on_provider``: adds the kernel networking configuration
+  needed to ensure traffic is forwarded from the node to the OVN overlay and
+  to expose the VIP through BGP.
+
+- ``withdraw_ovn_lb_on_provider``: removes the above steps to stop advertising
+  the load balancer VIP.
+
+
+.. include:: ../agent_deployment.rst
+
+
+Limitations
+-----------
+
+The following limitations apply:
+
+- There is no API to decide what to expose; all VMs/LBs on provider networks
+  or with floating IPs associated with them will get exposed. For the VMs in
+  the tenant networks, the flag ``address_scopes`` should be used to filter
+  which subnets to expose, and also to ensure there are no overlapping IPs.
+
+- There is no support for overlapping CIDRs, so this must be avoided, e.g., by
+  using address scopes and subnet pools.
+
+- Network traffic is steered by kernel routing (IP routes and rules), therefore
+  OVS-DPDK, where the kernel space is skipped, is not supported.
+
+- Network traffic is steered by kernel routing (IP routes and rules), therefore
+  SR-IOV, where the hypervisor is skipped, is not supported.
+ +- In OpenStack with OVN networking the N/S traffic to the ovn-octavia VIPs on + the provider or the FIPs associated to the VIPs on tenant networks needs to + go through the networking nodes (the ones hosting the Neutron Router Gateway + Ports, i.e., the chassisredirect cr-lrp ports, for the router connecting the + load balancer members to the provider network). Therefore, the entry point + into the OVN overlay needs to be one of those networking nodes, and + consequently the VIPs (or FIPs to VIPs) are exposed through them. From those + nodes the traffic follows the normal tunneled path (Geneve tunnel) to + the OpenStack compute node where the selected member is located. diff --git a/doc/source/contributor/bgp_mode_stretched_l2_design.rst b/doc/source/contributor/drivers/bgp_mode_stretched_l2_design.rst similarity index 100% rename from doc/source/contributor/bgp_mode_stretched_l2_design.rst rename to doc/source/contributor/drivers/bgp_mode_stretched_l2_design.rst diff --git a/doc/source/contributor/evpn_mode_design.rst b/doc/source/contributor/drivers/evpn_mode_design.rst similarity index 98% rename from doc/source/contributor/evpn_mode_design.rst rename to doc/source/contributor/drivers/evpn_mode_design.rst index b7c91cc1..63af3867 100644 --- a/doc/source/contributor/evpn_mode_design.rst +++ b/doc/source/contributor/drivers/evpn_mode_design.rst @@ -12,9 +12,9 @@ ''''''' Heading 4 (Avoid deeper levels because they do not render well.) -======================================== -Design of OVN BGP Agent with EVPN Driver -======================================== +========================================================= +Design of OVN BGP Agent with EVPN Driver (kernel routing) +========================================================= Purpose ------- @@ -96,7 +96,7 @@ watcher detects it). The overall arquitecture and integration between the ``networking-bgpvpn`` and the ``networking-bgp-ovn`` agent are shown in the next figure: -.. image:: ../../images/networking-bgpvpn_integration.png +.. image:: ../../../images/networking-bgpvpn_integration.png :alt: integration components :align: center :width: 100% @@ -409,7 +409,7 @@ The next figure shows the N/S traffic flow through the VRF to the VM, including information regarding the OVS flows on the provider bridge (br-ex), and the routes on the VRF routing table. -.. image:: ../../images/evpn_traffic_flow.png +.. image:: ../../../images/evpn_traffic_flow.png :alt: integration components :align: center :width: 100% diff --git a/doc/source/contributor/drivers/index.rst b/doc/source/contributor/drivers/index.rst new file mode 100644 index 00000000..5b221980 --- /dev/null +++ b/doc/source/contributor/drivers/index.rst @@ -0,0 +1,12 @@ +========================== + BGP Drivers Documentation +========================== + + .. toctree:: + :maxdepth: 1 + + bgp_mode_design + nb_bgp_mode_design + ovn_bgp_mode_design + evpn_mode_design + bgp_mode_stretched_l2_design diff --git a/doc/source/contributor/drivers/nb_bgp_mode_design.rst b/doc/source/contributor/drivers/nb_bgp_mode_design.rst new file mode 100644 index 00000000..d958ce3f --- /dev/null +++ b/doc/source/contributor/drivers/nb_bgp_mode_design.rst @@ -0,0 +1,386 @@ +.. 
_nb_bgp_driver:
+
+======================================================================
+[NB DB] NB OVN BGP Agent: Design of the BGP Driver with kernel routing
+======================================================================
+
+Purpose
+-------
+
+The addition of a BGP driver enables the OVN BGP agent to expose virtual
+machine (VM) and load balancer (LB) IP addresses through the BGP dynamic
+protocol when these IP addresses are either associated with a floating IP
+(FIP) or are booted or created on a provider network.
+The same functionality is available on project networks, when a special
+flag is set.
+
+This document presents the design decisions behind the NB BGP Driver for
+the Networking OVN BGP agent.
+
+Overview
+--------
+
+With the growing popularity of virtualized and containerized workloads,
+it is common to use pure Layer 3 spine and leaf network deployments in
+data centers. The benefits of this practice include reduced scaling
+complexity, smaller failure domains, and less broadcast traffic.
+
+The northbound OVN BGP agent is a Python-based daemon that runs on each
+OpenStack Controller and Compute node.
+The agent monitors the Open Virtual Network (OVN) northbound database
+for certain VM and floating IP (FIP) events.
+When these events occur, the agent notifies the FRR BGP daemon (bgpd)
+to advertise the IP address or FIP associated with the VM.
+The agent also triggers actions that route the external traffic to the OVN
+overlay.
+Unlike its predecessor, the (southbound) OVN BGP agent, the northbound OVN
+BGP agent uses the northbound database API, which is more stable than the
+southbound database API because the former is isolated from internal changes
+to core OVN.
+
+ .. note::
+
+    The northbound OVN BGP agent driver is only intended for the N/S traffic.
+    The E/W traffic works exactly the same as before, i.e., VMs are connected
+    through Geneve tunnels.
+
+
+The agent provides a multi-driver implementation that allows you to configure
+it for specific infrastructure running on top of OVN, for instance OpenStack
+or Kubernetes/OpenShift.
+This design simplicity enables the agent to implement different drivers,
+depending on what OVN NB DB events are being watched (watcher examples at
+``ovn_bgp_agent/drivers/openstack/watchers/``), and what actions are
+triggered in reaction to them (driver examples at
+``ovn_bgp_agent/drivers/openstack/XXXX_driver.py``, implementing the
+``ovn_bgp_agent/drivers/driver_api.py``).
+
+A driver implements the support for BGP capabilities. It ensures that both VMs
+and LBs on provider networks or with associated Floating IPs are exposed
+through BGP. In addition, VMs on tenant networks can also be exposed
+if the ``expose_tenant_networks`` configuration option is enabled.
+To control which tenant networks are exposed, another flag can be used:
+``address_scopes``. If not set, all the tenant networks will be exposed, while
+if it is configured with a set of address scopes, only the tenant networks
+whose address scope matches will be exposed.
+
+A common driver API is defined, exposing the following methods:
+
+- ``expose_ip`` and ``withdraw_ip``: exposes or withdraws IPs for local
+  OVN ports.
+
+- ``expose_remote_ip`` and ``withdraw_remote_ip``: exposes or withdraws IPs
+  through another node when the VM or pods are running on a different node.
+  For example, this is used for VMs on tenant networks where the traffic needs
+  to be injected through the OVN router gateway port.
+
+- ``expose_subnet`` and ``withdraw_subnet``: exposes or withdraws subnets
+  through the local node.
+
+
+Proposed Solution
+-----------------
+
+To support BGP functionality the NB OVN BGP Agent includes a new driver
+that performs the steps required for exposing the IPs through BGP on
+the correct nodes and steering the traffic to/from the node from/to the OVN
+overlay.
+To configure the OVN BGP agent to use the northbound OVN BGP driver, in the
+``bgp-agent.conf`` file, set the value of ``driver`` to ``nb_ovn_bgp_driver``.
+
+This driver requires a watcher to react to the BGP-related events.
+In this case, BGP actions are triggered by events related to the
+``Logical_Switch_Port``, ``Logical_Router_Port``, and ``Load_Balancer``
+OVN NB DB tables.
+The information in these tables is modified when VMs and LBs are created and
+deleted, and when FIPs are associated to or disassociated from them.
+
+Then, the agent performs these actions to ensure the VMs are reachable through
+BGP:
+
+- Traffic between nodes or BGP Advertisement: These are the actions needed to
+  expose the BGP routes and make sure all the nodes know how to reach the
+  VM/LB IP on the nodes. This is exactly the same as in the initial OVN BGP
+  Driver (see :ref:`bgp_driver`).
+
+- Traffic within a node or redirecting traffic to/from OVN overlay (wiring):
+  These are the actions needed to redirect the traffic to/from a VM to the OVN
+  Neutron networks, when traffic reaches the node where the VM is or on its
+  way out of the node.
+
+The code for the NB BGP driver is located at
+``ovn_bgp_agent/drivers/openstack/nb_ovn_bgp_driver.py``, and its associated
+watcher can be found at
+``ovn_bgp_agent/drivers/openstack/watchers/nb_bgp_watcher.py``.
+
+Note this new driver also allows different ways of wiring the node to the OVN
+overlay. These are configurable through the option ``exposing_method``, where
+for now you can select:
+
+- ``underlay``: using kernel routing (what we describe in this document), the
+  same as supported by the driver at :ref:`bgp_driver`.
+
+- ``ovn``: using an extra OVN cluster per node to perform the routing at
+  OVN/OVS level instead of kernel level, therefore enabling datapath
+  acceleration (Hardware Offloading and OVS-DPDK). More information about
+  this mechanism at :ref:`ovn_routing`.
+
+
+OVN NB DB Events
+~~~~~~~~~~~~~~~~
+
+The watcher associated with the NB BGP driver detects the relevant events on
+the OVN NB DB and calls the driver functions to configure BGP and Linux
+kernel networking accordingly.
+
+ .. note::
+
+    Linux kernel networking is used when the default ``exposing_method``
+    (``underlay``) is selected. If ``ovn`` is selected instead, OVN routing is
+    used instead of kernel routing. For more details, see :ref:`ovn_routing`.
+
+The following events are watched and handled by the BGP watcher:
+
+- VMs or LBs created/deleted on provider networks
+
+- FIPs association/disassociation to VMs or LBs
+
+- VMs or LBs created/deleted on tenant networks (if the
+  ``expose_tenant_networks`` configuration option is enabled, or
+  ``expose_ipv6_gua_tenant_networks`` to only expose IPv6 GUA ranges)
+
+  .. note::
+
+     If the ``expose_tenant_networks`` flag is enabled, the status of
+     ``expose_ipv6_gua_tenant_networks`` does not matter, as all the tenant
+     IPs are advertised. A configuration sketch combining these options
+     follows below.
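+
+For illustration only, the driver selection and the tenant-exposure options
+discussed above might be combined in ``bgp-agent.conf`` as follows (the
+values are placeholders, not recommendations):
+
+ .. code-block:: ini
+
+    [DEFAULT]
+    driver = nb_ovn_bgp_driver
+    exposing_method = underlay
+    expose_tenant_networks = True
+    # Alternatively, expose only IPv6 GUA tenant ranges:
+    # expose_ipv6_gua_tenant_networks = True
+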
+
+
+The NB BGP watcher reacts to events on the following OVN NB DB tables:
+
+- ``Logical_Switch_Port``
+
+- ``Logical_Router_Port``
+
+- ``Load_Balancer``
+
+Besides the previously existing ``OVNLBEvent`` class, the NB BGP watcher has
+new event classes named ``LSPChassisEvent`` and ``LRPChassisEvent`` that
+all the events watched by the NB BGP driver use as their base (inherit from).
+
+The specific events it reacts to are:
+
+- ``LogicalSwitchPortProviderCreateEvent``: Detects when a VM or an amphora LB
+  port, i.e., a logical switch port of type ``""`` (empty double-quotes) or
+  ``virtual``, comes up or gets attached to the OVN chassis where the agent is
+  running. If the port is on a provider network, the driver calls the
+  ``expose_ip`` driver method to perform the needed actions to expose the port
+  (wire and advertise). If the port is on a tenant network, the driver
+  dismisses the event.
+
+- ``LogicalSwitchPortProviderDeleteEvent``: Detects when a VM or an amphora LB
+  port, i.e., a logical switch port of type ``""`` (empty double-quotes) or
+  ``virtual``, goes down or gets detached from the OVN chassis where the agent
+  is running. If the port is on a provider network, the driver calls the
+  ``withdraw_ip`` driver method to perform the needed actions to withdraw the
+  port (withdraw and unwire). If the port is on a tenant network, the driver
+  dismisses the event.
+
+- ``LogicalSwitchPortFIPCreateEvent``: Similar to
+  ``LogicalSwitchPortProviderCreateEvent`` but focusing on the changes to the
+  FIP information in the Logical Switch Port external_ids.
+  It calls the ``expose_fip`` driver method to perform the needed actions to
+  expose the floating IP (wire and advertise).
+
+- ``LogicalSwitchPortFIPDeleteEvent``: Same as the previous one but for
+  withdrawing FIPs. In this case it is similar to
+  ``LogicalSwitchPortProviderDeleteEvent`` but instead calls the
+  ``withdraw_fip`` driver method to perform the needed actions to withdraw
+  the floating IP (withdraw and unwire).
+
+- ``LocalnetCreateDeleteEvent``: Detects creation/deletion of OVN localnet
+  ports, which indicates the creation/deletion of provider networks. This
+  triggers a resync (``sync`` method) action to perform the base configuration
+  needed for the provider networks, such as OVS flows or ARP/NDP
+  configurations.
+
+- ``ChassisRedirectCreateEvent``: Similar to
+  ``LogicalSwitchPortProviderCreateEvent`` but with the focus on logical router
+  ports, such as the OVN gateway ports (cr-lrps), instead of logical switch
+  ports. The driver calls ``expose_ip``, which performs additional steps to
+  also expose IPs related to the cr-lrps, such as the ovn-lb or IPs in tenant
+  networks. The watcher ``match`` checks the chassis information in the
+  ``status`` field, which requires OVN 23.09 or later.
+
+- ``ChassisRedirectDeleteEvent``: Similar to
+  ``LogicalSwitchPortProviderDeleteEvent`` but with the focus on logical router
+  ports, such as the OVN gateway ports (cr-lrps), instead of logical switch
+  ports. The driver calls ``withdraw_ip``, which performs additional steps to
+  also withdraw IPs related to the cr-lrps, such as the ovn-lb or IPs in tenant
+  networks. The watcher ``match`` checks the chassis information in the
+  ``status`` field, which requires OVN 23.09 or later.
+
+- ``LogicalSwitchPortSubnetAttachEvent``: Detects Logical Switch Ports of type
+  ``router`` (connecting a Logical Switch to a Logical Router) and checks if
+  the associated router is associated with the local chassis, i.e., if the
+  CR-LRP of the router is located in the local chassis.
+  If that is the case, the ``expose_subnet`` driver method is called, which
+  is in charge of the wiring needed for the IPs on that subnet (set of IP
+  routes and rules).
+
+- ``LogicalSwitchPortSubnetDetachEvent``: Similar to
+  ``LogicalSwitchPortSubnetAttachEvent`` but for unwiring the subnet, so it
+  calls the ``withdraw_subnet`` driver method.
+
+- ``LogicalSwitchPortTenantCreateEvent``: Detects when a logical switch port
+  of type ``""`` (empty double-quotes) or ``virtual`` gets updated, similar to
+  ``LogicalSwitchPortProviderCreateEvent``. It checks if the network associated
+  with the VM is exposed in the local chassis (meaning its cr-lrp is also
+  local). If that is the case, it calls ``expose_remote_ip``, which manages
+  the advertising of the IP -- there is no need for wiring, as that is done
+  when the subnet is exposed by the ``LogicalSwitchPortSubnetAttachEvent``
+  event.
+
+- ``LogicalSwitchPortTenantDeleteEvent``: Similar to
+  ``LogicalSwitchPortTenantCreateEvent`` but for withdrawing IPs, calling
+  ``withdraw_remote_ips``.
+
+- ``OVNLBCreateEvent``: Detects Load_Balancer events and processes them only
+  if the Load_Balancer entry has associated VIPs and the router is local to
+  the chassis.
+  If the VIP or router is added to a provider network, the driver calls
+  ``expose_ovn_lb_vip`` to expose and wire the VIP or router.
+  If the VIP or router is added to a tenant network, the driver calls
+  ``expose_ovn_lb_vip`` to only expose the VIP or router.
+  If a floating IP is added, then the driver calls ``expose_ovn_lb_fip`` to
+  expose and wire the FIP.
+
+- ``OVNLBDeleteEvent``: If the VIP or router is removed from a provider
+  network, the driver calls ``withdraw_ovn_lb_vip`` to withdraw and unwire
+  the VIP or router. If the VIP or router is removed from a tenant network,
+  the driver calls ``withdraw_ovn_lb_vip`` to only withdraw the VIP or router.
+  If a floating IP is removed, then the driver calls ``withdraw_ovn_lb_fip``
+  to withdraw and unwire the FIP.
+
+
+Driver Logic
+~~~~~~~~~~~~
+
+The NB BGP driver is in charge of the networking configuration ensuring that
+VMs and LBs on provider networks or with FIPs can be reached through BGP
+(N/S traffic). In addition, if the ``expose_tenant_networks`` flag is enabled,
+VMs in tenant networks should be reachable too -- although not directly from
+the node where they are created, but through one of the network gateway
+chassis nodes. The same happens with ``expose_ipv6_gua_tenant_networks`` but
+only for IPv6 GUA ranges. In addition, if the config option
+``address_scopes`` is set, only the tenant networks with a matching
+``address_scope`` will be exposed.
+
+ .. note::
+
+    To be able to expose tenant networks, OVN version 23.09 or newer is
+    needed.
+
+To accomplish the network configuration and advertisement, the driver ensures:
+
+- VM and LB IPs can be advertised in a node where the traffic can be injected
+  into the OVN overlay: either in the node that hosts the VM or in the node
+  where the router gateway port is scheduled (see the limitations subsection).
+
+- After the traffic reaches the specific node, kernel networking redirects the
+  traffic to the OVN overlay, if the default ``underlay`` exposing method is
+  used.
+
+
+.. include:: ../bgp_advertising.rst
+
+
+.. include:: ../bgp_traffic_redirection.rst
+
+
+Driver API
+++++++++++
+
+The NB BGP driver implements the ``driver_api.py`` interface with the
+following functions:
+
+- ``expose_ip``: creates all the IP rules and routes, and OVS flows needed
+  to redirect the traffic to the OVN overlay. It also ensures that FRR
+  exposes the required IP through BGP.
+
+- ``withdraw_ip``: removes the configuration (IP rules/routes, OVS flows)
+  added by the ``expose_ip`` method to withdraw the exposed IP.
+
+- ``expose_subnet``: adds kernel networking configuration (IP rules and route)
+  to ensure traffic can go from the node to the OVN overlay (and back)
+  for IPs within the tenant subnet CIDR.
+
+- ``withdraw_subnet``: removes the kernel networking configuration added by
+  ``expose_subnet``.
+
+- ``expose_remote_ip``: BGP exposes VM tenant network IPs through the chassis
+  hosting the OVN gateway port for the router where the VM is connected.
+  It ensures traffic directed to the VM IP arrives at this node by exposing
+  the IP through BGP locally. The previous steps in ``expose_subnet`` ensure
+  the traffic is redirected to the OVN overlay after it arrives on the node.
+
+- ``withdraw_remote_ip``: removes the configuration added by
+  ``expose_remote_ip``.
+
+In addition, the driver implements extra methods for the FIPs and the
+OVN load balancers:
+
+- ``expose_fip`` and ``withdraw_fip``: equivalent to ``expose_ip`` and
+  ``withdraw_ip`` but for FIPs.
+
+- ``expose_ovn_lb_vip``: adds kernel networking configuration to ensure
+  traffic is forwarded from the node with the associated cr-lrp to the OVN
+  overlay, as well as to expose the VIP through BGP on that node.
+
+- ``withdraw_ovn_lb_vip``: removes the above steps to stop advertising
+  the load balancer VIP.
+
+- ``expose_ovn_lb_fip`` and ``withdraw_ovn_lb_fip``: expose the FIPs
+  associated with OVN load balancers. This is similar to
+  ``expose_fip``/``withdraw_fip``, but taking into account that the FIP must
+  be exposed on the node with the cr-lrp for the router associated with the
+  load balancer.
+
+
+.. include:: ../agent_deployment.rst
+
+
+Limitations
+-----------
+
+The following limitations apply:
+
+- OVN 23.09 or later is needed to support exposing tenant network IPs and
+  OVN load balancers.
+
+- There is no API to decide what to expose; all VMs/LBs on provider networks
+  or with floating IPs associated with them are exposed. For the VMs in the
+  tenant networks, use the flag ``address_scopes`` to filter which subnets to
+  expose, which also prevents having overlapping IPs.
+
+- In the currently implemented exposing methods (``underlay`` and
+  ``ovn``) there is no support for overlapping CIDRs, so this must be
+  avoided, e.g., by using address scopes and subnet pools.
+
+- For the default exposing method (``underlay``) the network traffic is
+  steered by kernel routing (ip routes and rules), therefore OVS-DPDK, where
+  the kernel space is skipped, is not supported. With the ``ovn`` exposing
+  method the routing is done at OVN level, so this limitation does not exist.
+  More details in :ref:`ovn_routing`.
+
+- For the default exposing method (``underlay``) the network traffic is
+  steered by kernel routing (ip routes and rules), therefore SR-IOV, where
+  the hypervisor is skipped, is not supported. With the ``ovn`` exposing
+  method the routing is done at OVN level, so this limitation does not exist.
+  More details in :ref:`ovn_routing`.
+
+- In OpenStack with OVN networking the N/S traffic to the ovn-octavia VIPs on
+  the provider network or the FIPs associated with the VIPs on tenant networks
+  needs to go through the networking nodes (the ones hosting the Neutron
+  Router Gateway Ports, i.e., the chassisredirect cr-lrp ports, for the router
+  connecting the load balancer members to the provider network). Therefore,
+  the entry point into the OVN overlay needs to be one of those networking
+  nodes, and consequently the VIPs (or FIPs to VIPs) are exposed through them.
+  From those nodes the traffic follows the normal tunneled path (Geneve
+  tunnel) to the OpenStack compute node where the selected member is located.
diff --git a/doc/source/contributor/drivers/ovn_bgp_mode_design.rst b/doc/source/contributor/drivers/ovn_bgp_mode_design.rst
new file mode 100644
index 00000000..b5b28693
--- /dev/null
+++ b/doc/source/contributor/drivers/ovn_bgp_mode_design.rst
@@ -0,0 +1,265 @@
+.. _ovn_routing:
+
+===================================================================
+[NB DB] NB OVN BGP Agent: Design of the BGP Driver with OVN routing
+===================================================================
+
+This is an extension of the NB OVN BGP Driver that adds a new
+``exposing_method``, named ``ovn``, to make use of OVN routing instead of
+relying on kernel routing.
+
+Purpose
+-------
+
+The addition of a BGP driver enables the OVN BGP agent to expose virtual
+machine (VM) and load balancer (LB) IP addresses through the BGP dynamic
+protocol when these IP addresses are either associated with a floating IP
+(FIP) or are booted or created on a provider network.
+The same functionality is available on project networks, when a special
+flag is set.
+
+This document presents the design decisions behind the extensions to the
+NB OVN BGP Driver to support OVN routing instead of kernel routing,
+therefore enabling datapath acceleration.
+
+
+Overview
+--------
+
+The main goal is to make the BGP capabilities of the OVN BGP Agent compliant
+with OVS-DPDK and HWOL. To do that, we need to move to OVN/OVS what the OVN
+BGP Agent currently does with kernel networking -- redirecting traffic
+to/from the OpenStack OVN overlay.
+
+To accomplish this goal, the following is required:
+
+- Ensure that incoming traffic gets redirected from the physical NICs to the
+  OVS integration bridge (br-int) through one or more OVS provider bridges
+  (br-ex) without using kernel routes and rules.
+
+- Ensure the outgoing traffic gets redirected to the physical NICs without
+  using the default kernel routes.
+
+- Expose the IPs in the same way as before.
+
+The third point is simple, as it is already being done, but for the first two
+points OVN virtual routing capabilities are needed, ensuring the traffic gets
+routed from the NICs to the OpenStack overlay and vice versa.
+
+
+Proposed Solution
+-----------------
+
+To avoid placing kernel networking in the middle of the datapath and blocking
+acceleration, the proposed solution mandates locating a separate OVN cluster
+on each node that manages the needed virtual infrastructure between the
+OpenStack networking overlay and the physical network.
+Because routing occurs at OVN/OVS level, this proposal makes it possible
+to support hardware offloading (HWOL) and OVS-DPDK.
+
+The next figure shows the proposed cluster required to manage the OVN virtual
+networking infrastructure on each node.
+
+.. image:: ../../../images/ovn-cluster-overview.png
+   :alt: OVN Routing integration
+   :align: center
+   :width: 100%
+
+In a standard deployment ``br-int`` is directly connected to the OVS external
+bridge (``br-ex``) where the physical NICs are attached.
+By contrast, in the default BGP driver solution (see :ref:`nb_bgp_driver`),
+the physical NICs are not directly attached to ``br-ex``; instead, kernel
+networking (ip routes and ip rules) is used to redirect the traffic to
+``br-ex``.
+The OVN routing architecture proposes the following mapping:
+
+- ``br-int`` connects to an external (from the OpenStack perspective) OVS
+  bridge (``br-osp``).
+
+- ``br-osp`` does not have any physical resources attached, just patch
+  ports connecting it to ``br-int`` and ``br-bgp``.
+
+- ``br-bgp`` is the integration bridge managed by the extra OVN cluster
+  deployed per node. This is where the virtual OVN resources are created
+  (routers and switches). It has mappings (patch ports) to ``br-osp`` and
+  ``br-ex``.
+
+- ``br-ex`` remains the external bridge, where the physical NICs are
+  attached (as in default environments without BGP), but instead of being
+  directly connected to ``br-int``, it is connected to ``br-bgp``. Note that
+  for ECMP purposes, each NIC is attached to a different ``br-ex`` device
+  (``br-ex`` and ``br-ex-2``).
+
+The extra OVN cluster requires the following virtual resources (a hand-run
+``ovn-nbctl`` sketch of this topology is shown right before the Driver Logic
+subsection below):
+
+- Logical Router (``bgp-router``): manages the routing that was
+  previously done in the kernel networking layer between both networks
+  (physical and OpenStack OVN overlay). It has two connections (i.e., Logical
+  Router Ports) towards the ``bgp-ex-X`` Logical Switches to add support for
+  ECMP (only one switch is required, but several are needed in case of ECMP),
+  and one connection to the ``bgp-osp`` Logical Switch to ensure traffic
+  to/from the OpenStack networking overlay.
+
+- Logical Switch (``bgp-ex``): is connected to the ``bgp-router``, and has
+  a localnet port to connect it to ``br-ex`` and therefore the physical NICs.
+  There is one Logical Switch per NIC (``bgp-ex`` and ``bgp-ex-2``).
+
+- Logical Switch (``bgp-osp``): is connected to the ``bgp-router``, and has
+  a localnet port to connect it to ``br-osp``, enabling it to send traffic to
+  and from the OpenStack OVN overlay.
+
+The following OVS flows are required on both OVS bridges:
+
+- ``br-ex-X`` bridges: require a flow to ensure only the traffic
+  targeted at OpenStack provider networks is redirected to the OVN cluster.
+
+  .. code-block:: ini
+
+     cookie=0x3e7, duration=942003.114s, table=0, n_packets=1825, n_bytes=178850, priority=1000,ip,in_port=eth1,nw_dst=172.16.0.0/16 actions=mod_dl_dst:52:54:00:30:93:ea,output:"patch-bgp-ex-lo"
+
+
+- ``br-osp`` bridge: requires a flow for each OpenStack provider network to
+  change the destination MAC to the one of the router port in the OVN cluster
+  and to properly steer traffic that is routed to the OVN cluster.
+
+  .. code-block:: ini
+
+     cookie=0x3e7, duration=942011.971s, table=0, n_packets=8644, n_bytes=767152, priority=1000,ip,in_port="patch-provnet-0" actions=mod_dl_dst:40:44:00:00:00:06,NORMAL
+
+
+OVN NB DB Events
+~~~~~~~~~~~~~~~~
+
+The OVN northbound database events that the driver monitors are the same as
+the ones for the NB DB driver with the ``underlay`` exposing mode.
+See :ref:`nb_bgp_driver`. The main difference between the two drivers is
+that the wiring actions are simplified for the OVN routing driver.
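+
+The agent creates and maintains the per-node OVN cluster resources described
+above by itself; the following hand-run ``ovn-nbctl`` sketch is only meant to
+make that topology more concrete. The resource names match the figure, while
+the MAC address, subnet, and ``network_name`` mapping are hypothetical
+examples:
+
+ .. code-block:: console
+
+    # Per-node OVN cluster: router plus one external switch (bgp-ex);
+    # bgp-ex-2 and bgp-osp would be created in an analogous way.
+    $ ovn-nbctl lr-add bgp-router
+    $ ovn-nbctl lrp-add bgp-router lrp-bgp-ex 40:44:00:00:00:05 172.16.200.2/24
+    $ ovn-nbctl ls-add bgp-ex
+    # Router-type port connecting the switch to the router.
+    $ ovn-nbctl lsp-add bgp-ex bgp-ex-to-router
+    $ ovn-nbctl lsp-set-type bgp-ex-to-router router
+    $ ovn-nbctl lsp-set-addresses bgp-ex-to-router router
+    $ ovn-nbctl lsp-set-options bgp-ex-to-router router-port=lrp-bgp-ex
+    # Localnet port mapping the switch to the br-ex bridge.
+    $ ovn-nbctl lsp-add bgp-ex ln-bgp-ex
+    $ ovn-nbctl lsp-set-type ln-bgp-ex localnet
+    $ ovn-nbctl lsp-set-addresses ln-bgp-ex unknown
+    $ ovn-nbctl lsp-set-options ln-bgp-ex network_name=bgp-ex-physnet
+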
+
+
+Driver Logic
+~~~~~~~~~~~~
+
+As with the other BGP drivers or exposing methods (:ref:`bgp_driver`,
+:ref:`nb_bgp_driver`), the NB DB driver with the ``ovn`` exposing method
+enabled (i.e., using OVN routing instead of relying on kernel networking)
+is in charge of exposing the IPs with BGP and of the networking configuration
+to ensure that VMs and LBs on provider networks or with FIPs can be reached
+through BGP (N/S traffic). Similarly, if the ``expose_tenant_networks`` flag
+is enabled, VMs in tenant networks should be reachable too -- although not
+directly from the node where they are created, but through one of the network
+gateway chassis nodes. The same happens with
+``expose_ipv6_gua_tenant_networks`` but only for IPv6 GUA ranges.
+In addition, if the config option ``address_scopes`` is set, only the tenant
+networks with a matching ``address_scope`` will be exposed.
+
+To accomplish this, the driver needs to configure the extra per-node OVN
+cluster to ensure that:
+
+- VM and LB IPs can be advertised in a node where the traffic could be
+  injected into the OVN overlay through the extra OVN cluster (instead of
+  through kernel routing) -- either in the node hosting the VM or the node
+  where the router gateway port is scheduled.
+
+- Once the traffic reaches the specific node, the traffic is redirected to the
+  OVN overlay by using the extra per-node OVN cluster with the proper OVN
+  configuration. To do this, the driver needs to create Logical Switches,
+  Logical Routers, and the routing configuration between them (routes and
+  policies).
+
+.. include:: ../bgp_advertising.rst
+
+
+Traffic Redirection to/from OVN
++++++++++++++++++++++++++++++++
+
+As explained before, the main idea of this exposing method is to leverage OVN
+routing instead of kernel routing. For the traffic going out the steps are
+the following:
+
+- If the (OpenStack) OVN cluster knows the destination MAC, then this works
+  as in deployments without BGP or OVN cluster support (no ARP needed, the
+  MAC is used directly). If the MAC is unknown but within the provider
+  network range(s), the ARP request gets replied to by the Logical Switch
+  Port on the ``bgp-osp`` LS thanks to arp_proxy being enabled on it. And if
+  it is a different range, the reply comes from the router, which has default
+  routes to the outside.
+  The flow at ``br-osp`` is in charge of changing the destination MAC to the
+  one of the Logical Router Port on the ``bgp-router`` LR.
+
+- The previous step takes the traffic to the extra per-node OVN cluster, where
+  the default (ECMP) routes are used to send the traffic to the external
+  Logical Switch and from there to the physical NICs attached to the external
+  OVS bridge(s) (``br-ex``, ``br-ex-2``). In case of a MAC known by OpenStack,
+  instead of the default routes, a Logical Router Policy gets applied so that
+  traffic is forced to be redirected out (through the LRPs connected to the
+  external LS) when coming through the internal LRP (the one connected to
+  OpenStack).
+
+And for the traffic coming in:
+
+- The traffic hits the OVS flow added at the ``br-ex-X`` bridge(s) to redirect
+  it to the per-node OVN cluster, changing the destination MAC to the one of
+  the related ``br-ex`` device, which is the same MAC used for the OVN cluster
+  Logical Router Ports. This takes the traffic to the OVN router.
+
+- After that, thanks to arp_proxy being enabled on the LSP on ``bgp-osp``,
+  the traffic is redirected there.
+  Due to a limitation in the functionality of arp_proxy, an extra static MAC
+  binding entry needs to be added in the cluster so that the VM MAC is used
+  as the destination instead of the LSP's own MAC, which would otherwise lead
+  to dropping the traffic in the LS pipeline.
+
+  .. code-block:: ini
+
+     _uuid               : 6e1626b3-832c-4ee6-9311-69ebc15cb14d
+     ip                  : "172.16.201.219"
+     logical_port        : bgp-router-openstack
+     mac                 : "fa:16:3e:82:ee:19"
+     override_dynamic_mac: true
+
+
+Driver API
+++++++++++
+
+This is the very same as in the NB DB driver with the ``underlay`` exposing
+mode. See :ref:`nb_bgp_driver`.
+
+
+Agent deployment
+~~~~~~~~~~~~~~~~
+
+The deployment is similar to the NB DB driver with the ``underlay`` exposing
+method, but with some extra configuration. See :ref:`nb_bgp_driver` for the
+base.
+
+The exposing method needs to be stated in the DEFAULT section, along with the
+extra configuration for the local OVN cluster that performs the routing,
+including the range of the provider networks to expose/handle:
+
+ .. code-block:: ini
+
+    [DEFAULT]
+    exposing_method=ovn
+
+    [local_ovn_cluster]
+    ovn_nb_connection=unix:/run/ovn/ovnnb_db.sock
+    ovn_sb_connection=unix:/run/ovn/ovnsb_db.sock
+    external_nics=eth1,eth2
+    peer_ips=100.64.1.5,100.65.1.5
+    provider_networks_pool_prefixes=172.16.0.0/16
+
+
+Limitations
+-----------
+
+The following limitations apply:
+
+- OVN 23.06 or later is needed.
+
+- Tenant networks, subnets and OVN load balancers are not yet supported, and
+  will require OVN 23.09 or newer.
+
+- IPv6 is not yet supported.
+
+- ECMP is not working properly, as there is no support for BFD at the
+  ovn-cluster, which means that if one of the routes goes away the OVN
+  cluster won't react to it and there will be traffic disruption.
+
+- There is no support for overlapping CIDRs, so this must be avoided, e.g., by
+  using address scopes and subnet pools.
diff --git a/doc/source/contributor/index.rst b/doc/source/contributor/index.rst
index 4a78000d..a9a8535a 100644
--- a/doc/source/contributor/index.rst
+++ b/doc/source/contributor/index.rst
@@ -5,7 +5,7 @@
 .. toctree::
    :maxdepth: 2
 
-   bgp_mode_design
-   evpn_mode_design
-   bgp_mode_stretched_l2_design
-   bgp_supportability_matrix
+   drivers/index
+   agent_deployment
+   bgp_advertising
+   bgp_traffic_redirection
diff --git a/doc/source/index.rst b/doc/source/index.rst
index c03ce5aa..fdcbd13d 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -10,10 +10,11 @@ Welcome to the documentation of OVN BGP Agent
 Contents:
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 3
 
    readme
    contributor/index
+   bgp_supportability_matrix
 
 Indices and tables
 ==================