..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================================
Active-active L3 Gateway with Multihoming
=========================================

https://bugs.launchpad.net/neutron/+bug/2002687

Currently Neutron routers only support one external gateway port and
ECMP default routes can only be added manually as extra static routes.
Likewise, BFD is not configurable for ECMP routes.

This specification provides an extension to the existing Neutron API for
configuring multiple external gateways with automatic addition of ECMP default
routes and BFD for those routes. It also discusses the problem of scheduling
multiple gateway ports per router on different chassis.

OVN is chosen as a primary backend.

Problem Description
===================

Some network designs include multiple L3 gateways to:

* Share the load across different gateways: both in terms of different OVN
  chassis hosting different gateway ports and sharing the processing load and
  upstream gateways handling parts of the north-south flows;
* Provide independent network paths for the north-south direction (e.g. via
  different ISPs) for resiliency without relying on the same L2.

Having multi-homing implemented at the instance level imposes additional burden
on the end user of a cloud and support requirements for the guest OS, whereas
utilizing ECMP and BFD at the router side alleviates the need for instance-side
awareness of a more complex routing setup.

Adding more than one gateway port implies extending the existing data model
which was described in the `multiple external gateways spec`_. However, it left
adding additional gateway routes out of scope leaving this to future
improvements around dynamic routing.
Also the focus of neutron-dynamic-routing has so far been around advertising
routes, not accepting new ones from the external peers - so dynamic routing
support like this is a very different subject. However, manual addition of
extra routes does not utilize the default gateway IP information available from
subnets in Neutron while this could be addressed by implementing an extra
conditional behavior when adding more than one gateway port to a router.

ECMP routes can result in black-holing of traffic should the next-hop of a
route become unreachable. `BFD`_ is a standard protocol adopted by IETF for
next-hop failure detection which can be used for route eviction. OVN supports
BFD `as of v21.03.0`_ with a data model that allows enabling BFD on a per
next-hop basis by associating BFD session information with routes, however, it
is not modeled at the Neutron level even if a backend supports it.

From the Neutron data model perspective, ECMP for routes is already a supported
concept since `ECMP support spec`_ got implemented in Wallaby (albeit the spec
focused on the L3-agent based implementation).

As for OVN and BFD, the OVN database state needs to be populated by Neutron
based on the data from the Neutron database, therefore, data model changes to
the Neutron DB are needed to represent the BFD session parameters.

Proposed Change
===============

DB Impact
---------

Core Model
^^^^^^^^^^

Currently there are two ways in which router to gateway port relationship is
expressed in Neutron:

* The ``gw_port_id`` foreign key in the ``routers`` table which is set to a
  UUID of the router port that has a type of ``network:router_gateway``;
* The ``routerports`` table which was `added for referential integrity`_
  purposes and stores ``router_id`` to ``port_id`` mappings along with a
  redundant ``port_type``.

In terms of the representation of multiple gateway ports in the Neutron DB the
proposal follows the `multiple external gateways spec`_:

* Keep the ``routers``, ``routerports`` and ``ports`` tables as they are now
  but start storing multiple ``network:router_gateway`` ports per router in the
  ``routerports`` table;
* For backwards-compatibility store a single gateway port id in
  ``routers.gw_port_id`` (the same port is then present both in the
  ``routerports`` table and in ``routers.gw_port_id``);
* Extend the ``neutron.db.models.l3.Router`` class with a new attribute
  ``gw_ports`` that will map to all relevant ``network:router_gateway`` ports
  stored in the ``routerports`` table.

BFD and ECMP Route Behavior Modeling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each external gateway info dictionary of a router should contain additional
information about the policy around handling of default routes derived from the
subnets associated with the gateway ports of a router. This is needed so that
Neutron as a CMS has the required state to specify to OVN whether ECMP routes
are wanted and whether BFD needs to be enabled for them instead of just passing
those options through at the time of addition of an external gateway to a
router. Therefore, this information needs to be stored alongside the router.

The following additional columns are proposed for the ``routers`` table:

``enable_default_route_ecmp`` is a router-level policy on whether ECMP default
routes should be used or not (if the L3 service plugin supports them). In the
OVN case implying `ECMP symmetric reply`_ to be enabled when adding ECMP routes
can be a reasonable default.

``enable_default_route_bfd`` is a router-level policy on whether BFD should be
used for checking whether the next-hop of a default route is reachable (if the
L3 service plugin supports BFD).

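
As a rough illustration of how the two router-level flags could drive the
default routes derived from gateway subnets, the following sketch uses
hypothetical ``GatewayPort`` and ``DefaultRoute`` helper types and a
hypothetical ``derive_default_routes`` function, none of which exist in
Neutron. It assumes that with ECMP disabled only the first gateway port
contributes a default route, and that the BFD flag is simply copied to every
derived route.

.. code-block:: python

    from dataclasses import dataclass
    from typing import List, Optional


    @dataclass(frozen=True)
    class GatewayPort:
        """Hypothetical view of a gateway port (illustration only)."""
        port_id: str
        subnet_gateway_ip: Optional[str]  # gateway_ip of the port's subnet
        ip_version: int = 4


    @dataclass(frozen=True)
    class DefaultRoute:
        """Hypothetical default route derived from a gateway port."""
        destination: str
        nexthop: str
        bfd: bool


    def derive_default_routes(gw_ports: List[GatewayPort],
                              enable_ecmp: bool,
                              enable_bfd: bool) -> List[DefaultRoute]:
        """Return default routes inferred from gateway port subnets."""
        # Without ECMP only the first gateway port defines the default gateway.
        candidates = gw_ports if enable_ecmp else gw_ports[:1]
        routes = []
        for port in candidates:
            if port.subnet_gateway_ip is None:
                continue  # a subnet without a gateway IP yields no route
            destination = "0.0.0.0/0" if port.ip_version == 4 else "::/0"
            routes.append(
                DefaultRoute(destination, port.subnet_gateway_ip, enable_bfd))
        return routes

With two gateway ports whose subnets have gateway IPs, enabling ECMP in this
sketch yields two default routes to the same destination with different next
hops, while disabling it yields a single route via the first gateway port.
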

Besides the handling of default routes derived from subnets, the BFD
information needs to be associated with every route object similar to how it is
done in the OVN data model itself. The reason behind this is that BFD can be
enabled or disabled on a per next-hop basis even if the destination of a route
is the same.

The `OVN modeling of BFD`_ serves as a basis for the data model changes for the
Neutron DB. An additional ``BFD`` table will be associated with the
``routerroutes`` table storing static routes much like the association of
`logical router routes with BFD records`_ is done.

OVN allows making a BFD session for a particular static route use a different
destination IP for checking reachability rather than the next hop of the static
route itself (by storing the BFD peer IP address in the ``dst_ip`` field in the
``BFD`` table). This is a useful semantic for separating the control plane and
data plane and can be used for static routes of Neutron routers too. However,
when it comes to default routes inferred from gateway port to subnet
associations, Neutron's behavior should be to use the next hop of the default
route as a ``dst_ip``.

OVN models the BFD table as follows (see `ovn-nb docs`_ for more information):

* ``dst_ip`` - a BFD peer IP address;
* ``min_tx`` - an integer specifying the minimum interval (milliseconds, >=1)
  that OVN would use when transmitting the BFD control packet (minus jitter);
* ``min_rx`` - an integer specifying the minimum interval (milliseconds)
  between the received BFD control packets that OVN is capable of supporting
  (minus the jitter applied by the sender). Can be set to 0 to state that BFD
  control packet transmission from the BFD peer is not desired;
* ``detect_mult`` - an integer (>= 1) specifying the detection time multiplier.
  The negotiated transmit interval, multiplied by this value, provides the
  Detection Time for the receiving system in Asynchronous mode.
* OVN also has an ``options`` field holding a map of string-string pairs which
  is reserved for future use. A similar field or extra columns can be added
  later to the Neutron model to account for additional extensions.

In Neutron each static route could optionally have a BFD record associated with
it with a BFD record ID acting as a foreign key in the ``routerroutes`` table.

The proposed approach for this spec is limited to only adding a small portion
of the columns in the database and without exposing a full API to BFD (a future
revision might implement the `BFD support spec`_ to provide a full API) with
the aim of allowing operators to specify the need to use BFD with default
settings of a backend.

The proposed columns for the ``bfd_monitor`` table in this spec are:

+-------------------+---------+-------+------+---------------------------------------+
| Attribute         | Type    | Req   | CRUD | Description                           |
+===================+=========+=======+======+=======================================+
| id                | uuid-str| No    | R    | An id of a bfd_monitor                |
+-------------------+---------+-------+------+---------------------------------------+
| name              | String  | No    | CRU  | Human readable name for the           |
|                   |         |       |      | bfd_monitor (255 characters limit).   |
|                   |         |       |      | Does not have to be unique.           |
+-------------------+---------+-------+------+---------------------------------------+
| project_id        | String  | No    | R    | The owner of the bfd_monitor.         |
+-------------------+---------+-------+------+---------------------------------------+
| dst_ip            | String  | Yes   | CR   | The destination IP address to be      |
|                   |         |       |      | monitored. In case of a singlehop bfd |
|                   |         |       |      | this is the nexthop ip of the route,  |
|                   |         |       |      | for the general case (like multihop   |
|                   |         |       |      | bfd) this is an arbitrary IP (IPv4 or |
|                   |         |       |      | IPv6) that can serve as BFD neighbor. |
+-------------------+---------+-------+------+---------------------------------------+
| status            | String  | N/A   | R    | Shows if the BFD monitor was          |
|                   |         |       |      | successfully created in the backend,  |
|                   |         |       |      | but nothing about the session status, |
|                   |         |       |      | for that the session_status API       |
|                   |         |       |      | endpoint can be used.                 |
+-------------------+---------+-------+------+---------------------------------------+

Rest API Changes
----------------

Router API
^^^^^^^^^^

The API changes augment the changes from the
`multiple external gateways spec`_ but also include additional changes when it
comes to default route ECMP and BFD handling. Thus, a different name for an API
extension is used in this specification: ``external-gateway-multihoming`` and
API changes are listed here in full.

This extension adds new parameters to the ``external_gateway_info``
dictionary:

::

    {
        'network_id': {
            'type:uuid': None, 'required': True
        },
        'external_fixed_ips': {
            'type:fixed_ips': None, 'required': False
        },
        'enable_snat': {
            'type:boolean': None, 'required': False,
            'convert_to': converters.convert_to_boolean
        },
        'enable_default_route_ecmp': {
            'type:boolean': None, 'required': False,
            'convert_to': converters.convert_to_boolean
        },
        'enable_default_route_bfd': {
            'type:boolean': None, 'required': False,
            'convert_to': converters.convert_to_boolean
        },
    }

``enable_default_route_ecmp`` is a router-level policy on whether ECMP default
routes should be used or not (if the L3 service plugin supports them).

``enable_default_route_bfd`` is a router-level policy on whether BFD should be
used for checking whether the next-hop of a default route is reachable (if the
L3 service plugin supports BFD).

A new router attribute is added as well called ``external_gateways`` which is a
list of ``external_gateway_info`` structures. The first element of the
``external_gateways`` list is special for compatibility purposes as it contains
the same information as the ``external_gateway_info`` does.
When ``enable_default_route_ecmp`` is set to ``false`` it also defines the
default gateway placed into the router's routing table (in the multi-segment
network case, the placement of a gateway port matters for inferring the default
gateway based on the subnet used on a network segment). The order of the rest
of the list is ignored. Duplicates in the list (that is multiple external
gateways with the same ``network_id``) are allowed: in that case multiple
gateway ports will be attached to the same network (this can be used to have
the active-active setup when external gateways are available on the same
network).

Updating ``external_gateway_info`` also updates the first element of
``external_gateways`` and it leaves the rest of ``external_gateways``
unchanged. Setting ``external_gateway_info`` to an empty value also resets
``external_gateways`` to the empty list.

The ``external_gateways`` attribute cannot be set in ``POST /v2.0/routers`` or
``PUT /v2.0/routers/{router_id}`` requests, instead it can be managed via
sub-methods:

* ``PUT /v2.0/routers/{router_id}/add_external_gateways``

  Accepts a list of ``external_gateway_info`` structures. Adding an external
  gateway to a network that already has one raises an error.

* ``PUT /v2.0/routers/{router_id}/update_external_gateways``

  Accepts a list of ``external_gateway_info`` structures. The external gateways
  to be updated are identified by the ``network_id`` values found in the PUT
  request. The ``external_fixed_ips`` and ``enable_snat`` fields can be
  updated. The ``network_id`` field cannot be updated.

* ``PUT /v2.0/routers/{router_id}/remove_external_gateways``

  Accepts a list of potentially partial ``external_gateway_info`` structures.
  Only the ``network_id`` field from the ``external_gateway_info`` structure is
  used. The ``external_fixed_ips`` and ``enable_snat`` keys can be present but
  their values are ignored.

The add/update/remove PUT sub-methods respond with the whole router object just
as ``POST/PUT/GET /v2.0/routers``.

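
The compatibility rules between ``external_gateway_info`` and
``external_gateways`` described in this section can be sketched as a pure
function. The name ``apply_external_gateway_info`` is hypothetical and plain
dictionaries stand in for the API structures; this only illustrates the
intended semantics and is not the plugin implementation.

.. code-block:: python

    import copy


    def apply_external_gateway_info(external_gateways, new_info):
        """Return the new ``external_gateways`` list after a legacy update.

        ``new_info`` is the value written to ``external_gateway_info``.
        The input list is never modified in place.
        """
        if not new_info:
            # An empty value clears every gateway, not only the first one.
            return []
        # The first element mirrors external_gateway_info; the remaining
        # elements are carried over unchanged.
        return [copy.deepcopy(new_info)] + copy.deepcopy(external_gateways[1:])

For a router with gateways on two networks, writing a new
``external_gateway_info`` in this sketch replaces only the first list element,
while writing an empty value returns an empty list.
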

Extra routes API: ECMP
^^^^^^^^^^^^^^^^^^^^^^

As the `ECMP support spec`_ notes, there are no API changes to make to support
ECMP routes per se: multiple routes to the same destination and different
next-hops can already be specified when adding extra routes. However, that spec
focused on the agent-based implementation - part of the work to implement this
spec is to check whether the same is true for the OVN-based L3 implementation.

One parameter that is proposed to be added here is a boolean
``ecmp_symmetric_reply`` which will act as a way to populate the relevant new
column for each static route (it is modeled at the route level in OVN as well).

Extra routes API: BFD
^^^^^^^^^^^^^^^^^^^^^

In the absence of a full BFD API, users will have an option to specify a policy
on the routers (``enable_default_route_bfd``) or routes (by supplying an
additional option to the ``add_extraroutes`` API invocation called
``enable_bfd``). Neutron will manage ``bfd_monitor`` records related to those
options internally for now using route next hops as ``dst_ip`` fields.

Out of Scope
============

* BFD authentication, as it is not implemented in the OVN BFD implementation
  even though it is present in the protocol RFC itself. The data model should
  nevertheless be extensible to support this in the future;
* Solving the distributed SNAT problem. One direction is to use conntrack state
  synchronization between the gateway ports. Other ideas involve making smarter
  control plane choices about where this conntrack state should exist instead
  of distributing it everywhere - this can be done by ensuring that processing
  of flows is done locally to the instance but there are downsides to that as
  well which need to be considered more carefully;
* Accepting ECMP routes via dynamic routing protocols. The current aim is to
  utilize the default gateway information available in Neutron subnets to
  configure default gateway ECMP routes or to use the extra routes extension.
  This specification is a building block for the future support of dynamic
  routing;
* Modeling of route metrics. While there are cases where one default route
  could be preferred over the other for the same destination, neither Neutron
  nor OVN model this concept today;
* Implementation of BFD for the non-OVN L3 implementation based on Linux
  namespaces.

Asymmetric Routing and Distributed Routers
------------------------------------------

Conntrack can be used to prevent responses generated by instances from taking a
route different from the one the request came in on in the presence of ECMP
routes. OVN has support for making the reply traffic take the symmetric path.
This can be configured by utilizing the `options column`_ in the logical router
static routes table in OVN which allows configuring `ECMP symmetric reply`_ by
setting the ``ecmp_symmetric_reply`` option to ``true``.

Routes in Neutron could have an ``ecmp_symmetric_reply`` option to specify a
policy on whether to enable `ECMP symmetric reply`_ depending on whether the L3
service plugin supports it or not.

However, the `commit introducing the feature`_ in OVN notes a limitation on its
use: it can only be used on gateway routers, not distributed routers that have
a gateway port due to the dependency of the ingress pipeline logic of the
logical router on the hypervisor-local CT state.

.. _multiple external gateways spec: https://specs.openstack.org/openstack/neutron-specs/specs/xena/multiple-external-gateways.html
.. _BFD: https://www.rfc-editor.org/rfc/rfc5880
.. _as of v21.03.0: https://github.com/ovn-org/ovn/commit/6e0a69ad4bcdf9e4cace5c73ef48ab06065e8519
.. _ECMP support spec: https://specs.openstack.org/openstack/neutron-specs/specs/wallaby/l3-router-support-ecmp.html
.. _added for referential integrity: https://opendev.org/openstack/neutron/commit/93012915a3445a8ac8a0b30b702df30febbbb728
.. _OVN modeling of BFD: https://github.com/ovn-org/ovn/blob/v22.12.0/ovn-nb.ovsschema#L612-L636
.. _logical router routes with BFD records: https://github.com/ovn-org/ovn/blob/v22.12.0/ovn-nb.ovsschema#L449-L452
.. _ovn-nb docs: https://www.ovn.org/support/dist-docs/ovn-nb.5.txt
.. _options column: https://github.com/ovn-org/ovn/blob/v22.12.0/ovn-nb.ovsschema#L453-L455
.. _ECMP symmetric reply: https://github.com/ovn-org/ovn/blob/v22.12.0/ovn-nb.xml#L3312-L3319
.. _commit introducing the feature: https://github.com/ovn-org/ovn/commit/4fdca656857d4a5caeec35ae813888cb9e403e5e
.. _BFD support spec: https://specs.openstack.org/openstack/neutron-specs/specs/xena/bfd_support.html