L3 router support ECMP
This spec outlines the implementation plan of ECMP in neutron. Patch for this spec: https://review.opendev.org/#/c/743661 Related-Bug: #1880532 Change-Id: I67ebf642fbb130a7701792d66629dbab2d76181b
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

======================
L3 router support ECMP
======================

Blueprint:
https://blueprints.launchpad.net/neutron/+spec/support-for-ecmp

Launchpad Bug:
https://bugs.launchpad.net/neutron/+bug/1880532

ECMP (Equal-Cost Multi-Path) is a routing technique that allows traffic to
reach the same destination via multiple different links. Neutron does not
need to calculate the equal-cost route paths itself, but leaves that work to
the applications using the ECMP API. Neutron just receives the route
parameters and configures routers accordingly. Since Linux provides the
"ip route" command through the iproute2 utility, Neutron can simply
implement ECMP by using pyroute2 to add route entries into the Neutron
router namespace.

This feature is currently designed to support Octavia's multi-active scheme,
allowing a LoadBalancer in Octavia to have multiple amphorae at the same
time. By configuring an ECMP route in the router, multiple amphorae can
share a virtual IP at the same time to serve a set of functions that require
high concurrency support.

.. _P2:

.. note::

   Items marked with [`P2`_] refer to lower priority features
   to be designed / implemented only after the initial release.

[`P2`_] Currently an equal-cost route is a simple 5-tuple, which means that
if one <nexthop> becomes unreachable and is removed from the ECMP routes,
all connections get redistributed. To avoid this, we intend to use
consistent hashing instead of the original scheme. The consistent-hashing
scheme is based on HMARK, which was added in iptables 1.4.15 or later. See
the iptables changelog at [1]_.

This spec describes how to implement ECMP in Neutron.

Problem Description
===================

Octavia has proposed an active-active load balancing design in [2]_.

Topology Description
--------------------

::

                                          Tenant Backend
                        +----------------+    Network
                        |                |       +
    Internet+---------->+   router/gw   +-------------->
                        |                | ECMP |
                        +----------------+      |
                                                |
    Management                                  |
    Network                                     |
       +                                        |
       |                                        |      +----------+
       |    +----+----------------+---------+   |      |  Tenant  |
       |    |MGMT|                | VIP|Back|   |<-----+Service(1)|
    <-------+ IP | loadbalancer(1)| IP      +--->      |          |
       |    +----+----------------+---------+   |      +----------+
       |                 |                      |
       |                 | ICMP                 |      +----------+
       |                 | DETECT               |      |  Tenant  |
       |                 |                      |<-----+Service(2)|
       |                 v                      |      |          |
       |    +----+----------------+---------+   |      +----------+
       |    |MGMT|                | VIP|Back|   |
    <-------+ IP | loadbalancer(2)| IP      +--->      +----------+
       |    +----+----------------+---------+   |      |  Tenant  |
       |                 |                      |<-----+service(3)|
       |    +-------------+                     |      |          |
       |    |Octavia Lbaas|      ● ● ●          |      +----------+
    <-------+ Controller  |                     |
       |    +-------------+      | ICMP         |        ● ● ●
       |                         | DETECT       |
       |                         |              |      +----------+
       |                         |              |      |  Tenant  |
       |                         v              |<-----+Service(M)|
       |    +----+----------------+---------+   |      |          |
       |    |MGMT|                | VIP|Back|   |      +----------+
    <-------+ IP | loadbalancer(n)| IP      +--->
       |    +----+----------------+---------+   |
       +                                        +

This proposal describes the following scheme:

* Multiple load balancing servers in a vip-subnet share one virtual IP
  and one or more back-end pools to respond to clients' requests, and each
  loadbalancer has its own IP address.

* Clients send requests to the VIP, then the router distributes each
  request to a load balancing server which has the VIP configured
  on it.

* Finally, the load balancing server distributes the request to a back end.
  The loadbalancers and tenant service VMs can be in the same subnet or in
  different networks.

In such a situation, Octavia needs the router to support ECMP for
distributing requests. Octavia can send a request to Neutron to create an
ECMP route, then the Neutron L3 agent executes commands in the Neutron
router's namespace to create an ECMP entry in it, using the VIP as the
destination IP of the route entry and the IP addresses of several load
balancers as nexthop IPs. Requests that have the VIP as their destination
can thus be distributed to each loadbalancer.

The whole process implements two levels of load balancing, i.e. load
balancing between multiple loadbalancers and load balancing between the
back-end real servers.

[`P2`_] Based on current public cloud operator implementations in production
environments, tenants usually only see IPs in the same network, so
considering the same broadcast domain, the router needs to enable proxy
ARP on the corresponding interface. (Users need to disable the proxy ARP
capability of the VMs in nexthops by themselves.)

User Workflow
-------------

Generally, users can use the ECMP function for their own purposes.
To put an ECMP entry into the router namespace,
users can set routes with the same destination by using the command::

    openstack router add route \
      --route destination=20.0.20.0/24,gateway=12.0.0.11 \
      --route destination=20.0.20.0/24,gateway=12.0.0.12 router-ecmp

And withdraw the ECMP entry with::

    openstack router remove route \
      --route destination=20.0.20.0/24,gateway=12.0.0.11 \
      --route destination=20.0.20.0/24,gateway=12.0.0.12 router-ecmp

For more information about router-related OSC commands, please read [3]_.

An integrated sequence diagram of the Octavia use case is here:

::

  +------+      +--------+     +-------+      +--------+     +-------+    +------------+
  |client|      |Octavia |     |Neutron|      |LB Node |     |qrouter|    |service pool|
  +--+---+      +---+----+     +---+---+      +---+----+     +---+---+    +-----+------+
     |create LB     |              |              |              |              |
     +------------->| create ecmp  |              |              |              |
     |service       +------------->|              |              |              |
     |              |LB server boot|              |              |              |
     |              +--------------+------------->+              |              |
     |              |              |set ecmp route|              |              |
     |              |  ecmp done   +--------------+------------->+              |
     |              +<-------------+              |              |              |
     |              |     LB server boot done     |              |              |
     |              +<-------------+--------------+              |              |
     |LB service done              |              |              |              |
     +<-------------+              |              |              |              |
     |              |              |              |              |              |
     |sending request              |              |              |              |
     +--------------+--------------+--------------+------------->+              |
     |              |              |              |pick a LB node|              |
     |              |              |              +<-------------+              |
     |              |              |              |     pick a service node     |
     |              |              |              +--------------+------------->+
     |              |              |              |          response           |
     |              |              |              +<-------------+--------------+
     |              |   response   |              |              |              |
     +<-------------+--------------+--------------+              |              |
     |              |              |              |              |              |
     v              v              v              v              v              v

Suppose a user has a set of services that require a multi-active
load-balancing scheme. The user sends a request to Octavia to create a
loadbalancer, specifying the topology as multi-active, and posts a
vip-subnet to Octavia to assign an IP, or directly posts a virtual port
defined by Octavia. Users then need to submit parameters such as pool,
member, listener, etc., but the latter are irrelevant to Neutron; you can
find them in the Octavia documentation.

While Octavia is creating a loadbalancer, it will also send an
`update_router` request or an `add_extraroutes` request to Neutron, posting
several `routes` entries with the same `destination` param and the load
balancers' IPs as the `nexthop` param.

Neutron receives the request from Octavia and determines whether to add an
ECMP route by checking whether there are multiple routes with the same
destination address, making sure the router will distribute those packets
that have the VIP as their destination.

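The duplicate-destination check described above can be sketched in Python. This is a minimal illustration only: the `ecmp_route_groups` helper and the route-dict shape are ours, not actual Neutron code.

```python
from collections import defaultdict


def ecmp_route_groups(routes):
    """Group extraroute entries by destination.

    Any destination with more than one nexthop should become a single
    ECMP (multipath) route. `routes` is a list of dicts in the API's
    extraroute shape: {"destination": <cidr>, "nexthop": <ip>}.
    """
    by_dest = defaultdict(list)
    for route in routes:
        nexthop = route["nexthop"]
        # Ignore duplicate (destination, nexthop) pairs.
        if nexthop not in by_dest[route["destination"]]:
            by_dest[route["destination"]].append(nexthop)
    return dict(by_dest)


routes = [
    {"destination": "192.168.1.6/32", "nexthop": "192.168.1.88"},
    {"destination": "192.168.1.6/32", "nexthop": "192.168.1.99"},
    {"destination": "10.0.0.0/24", "nexthop": "192.168.1.1"},
]
groups = ecmp_route_groups(routes)
# "192.168.1.6/32" has two nexthops, so it is an ECMP candidate.
ecmp_destinations = [d for d, hops in groups.items() if len(hops) > 1]
```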
Those ECMP routes will be removed when the user drops the multi-active
loadbalancer, and they can be modified when adding or removing a load
balancing node.


Data flow
---------

* [`P2`_] (If on the same network, use ARP proxy.) A client requests the MAC
  address of the VIP and accesses the service based on this MAC address.
  The router will respond with the gateway MAC address.

* The client's datagram will be transmitted to the router first.

* The router gateway checks the ECMP routing entries, then forwards the
  client's packets to the load balancers.

* The load balancer accepts connections from clients, receives traffic, then
  distributes it to the back-end server pool.

* The reply traffic from the back-end server pool goes through the load
  balancers and then comes to the router (or directly comes back to intranet
  clients if on the same network); these packets are eventually forwarded
  back by the router.

Proposed Change
===============

Overview
--------

In Server Side
~~~~~~~~~~~~~~

* There are no changes that have to be made on the server side.

In Agent Side
~~~~~~~~~~~~~

Modify the logic of processing the router_update event in the L3 agent to
support adding ECMP routes in routers.
The `routes_updated` function in RouterInfo will behave as below:

* When more than one route is found to have the same destination, the L3
  agent should execute pyroute2 code, which looks like

  ::

      ip.route('replace', dst='<destination_ip>',
               multipath=[{"gateway": "<nexthop1>"},
                          {"gateway": "<nexthop2>"}])

* Then there will be an ip route entry in the namespace, which looks like

  ::

      <vip> proto static
          nexthop via <nexthop_ip1> dev qr-xxxxxxxx-nn weight 1
          nexthop via <nexthop_ip2> dev qr-xxxxxxxx-nn weight 1

The router will then randomly pick a <nexthop_ip> and fill its MAC address
into the packet's dst_mac field when the packet needs to reach the
<destination_ip>.

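The agent-side call above can be sketched as follows. This is illustrative only: `build_multipath` and `replace_ecmp_route` are hypothetical helper names, and a real agent would pass a pyroute2 `IPRoute`/`NetNS` handle targeting the router's namespace, which requires root privileges.

```python
def build_multipath(nexthops):
    # pyroute2 expects the multipath nexthops as a list of dicts.
    return [{"gateway": hop} for hop in nexthops]


def replace_ecmp_route(ipr, destination, nexthops):
    """Install or update an ECMP route: one destination, many nexthops.

    `ipr` stands in for a pyroute2 IPRoute-like object, e.g.
    pyroute2.NetNS('qrouter-<uuid>') in a real L3 agent.
    """
    ipr.route("replace", dst=destination,
              multipath=build_multipath(nexthops))


# The multipath argument for the example entry shown above:
spec = build_multipath(["192.168.1.88", "192.168.1.99"])
```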
[`P2`_] For keeping connections alive while removing a load balancing node,
use iptables instead of simply an ip route entry:

- Use `HMARK` to mark flows in the mangle table, with the `fwmark` values
  determined by the source address.
- Distribute flows to different routing tables by `fwmark` values.
- There is a mapping between the `fwmark` values and the table values.
- For each table, give it a default nexthop IP.
- Modify the mapping between `fwmark` values and table values
  when a `nexthop` is unreachable.

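As an illustration only, the steps above could look roughly like the following (the addresses, mark values, and table numbers are made up for the sketch; the option names are those of the iptables HMARK target):

```shell
# Hash each flow by source address into 2 buckets, writing the result
# into the packet mark (offset 100 -> marks 100 and 101).
iptables -t mangle -A PREROUTING -d 192.168.1.6/32 \
    -j HMARK --hmark-tuple src --hmark-mod 2 \
    --hmark-offset 100 --hmark-rnd 0xdeadbeef

# Route each mark through its own table with a fixed nexthop.
ip rule add fwmark 100 table 100
ip rule add fwmark 101 table 101
ip route add 192.168.1.6/32 via 192.168.1.88 table 100
ip route add 192.168.1.6/32 via 192.168.1.99 table 101

# If 192.168.1.99 becomes unreachable, only table 101 is repointed, so
# flows hashed to mark 100 keep their existing nexthop (and connection).
ip route replace 192.168.1.6/32 via 192.168.1.88 table 101
```

Because the hash of a given source address is stable, an existing flow keeps landing in the same table, which is what makes the removal of one nexthop non-disruptive for the other flows.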
[`P2`_] In order to let traffic from the same network pass through the
router, the L3 agent will also let the router use proxy ARP by setting::

    sysctl -w net.ipv4.conf.<NIC_1>.proxy_arp_pvlan=1

* <NIC_1> is the name of the router interface to which the destination
  subnet is connected. For example, router `R1` is connected to a
  subnet `sub-1` whose CIDR is `10.10.10.0/24`, so there will be a
  virtual network interface device `qr-abcdefgh` in the router's
  namespace acting as the gateway for the subnet `sub-1`. If we then add
  an ECMP route with a destination like `10.10.10.5/32`, which is in the
  scope of the subnet `sub-1`, the above command will be executed and
  <NIC_1> will be `qr-abcdefgh`.

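The interface selection described in that example can be sketched with the stdlib `ipaddress` module. The `pick_proxy_arp_interface` helper and its interface-to-CIDR mapping are illustrative, not actual L3 agent code.

```python
import ipaddress


def pick_proxy_arp_interface(destination, interfaces):
    """Return the qr- device whose subnet contains the ECMP destination.

    `interfaces` maps router interface names to the CIDR of the subnet
    they act as gateway for, e.g. {"qr-abcdefgh": "10.10.10.0/24"}.
    """
    dest = ipaddress.ip_network(destination)
    for device, cidr in interfaces.items():
        if dest.subnet_of(ipaddress.ip_network(cidr)):
            return device
    return None


nic = pick_proxy_arp_interface("10.10.10.5/32",
                               {"qr-abcdefgh": "10.10.10.0/24"})
# `nic` would then be substituted into
# net.ipv4.conf.<NIC_1>.proxy_arp_pvlan
```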
* To make the ARP proxy optional, add a config option in l3_agent.ini::

    [ECMP]

    router_interface_arp_proxy = True


Data Model Impact
-----------------

None

REST API Impact
---------------

The following REST APIs will be affected::

    PUT /v2.0/routers/<router_id>/add_extraroutes

    PUT /v2.0/routers/<router_id>/remove_extraroutes

    PUT /v2.0/routers/<router_id>

The above three APIs are the current methods used to add/remove custom
routes. See the usage of `extraroutes` in [4]_. (The third API,
`PUT /v2.0/routers/<router_id>`, is not recommended for adding routes.)

Before the ECMP routing implementation, when the L3 agent received several
route entries with the same destination and different nexthops, it would
only keep one of them, or replace the existing route with a new one. After
these changes, there will be an ECMP route in the router. So you can add an
ECMP route entry like this:

::

    PUT /v2.0/routers/{router_id}/add_extraroutes

    { "router":
        { "routes":
            [ { "destination": "192.168.1.6/32",
                "nexthop": "192.168.1.88" },
              { "destination": "192.168.1.6/32",
                "nexthop": "192.168.1.99" }
              ...
            ]
        }
    }

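A client could assemble that request body programmatically; here is a small sketch (the `ecmp_extraroutes_body` helper name is ours, and the actual PUT, with its endpoint and auth token, is left out):

```python
import json


def ecmp_extraroutes_body(destination, nexthops):
    # One `routes` entry per nexthop, all sharing the same destination.
    return {"router": {"routes": [
        {"destination": destination, "nexthop": hop} for hop in nexthops
    ]}}


body = ecmp_extraroutes_body("192.168.1.6/32",
                             ["192.168.1.88", "192.168.1.99"])
payload = json.dumps(body)
# PUT this payload to /v2.0/routers/<router_id>/add_extraroutes
```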
Then you can find the ECMP route in the router's namespace:

::

    # ip route

    192.168.1.6/32 proto static
        nexthop via 192.168.1.88 dev qr-9adb238b-c2 weight 1
        nexthop via 192.168.1.99 dev qr-9adb238b-c2 weight 1

To make this behavior change discoverable, a shim extension called
'ecmp_routes' will be added.
[`P2`_] To make the ARP proxy behavior discoverable, a shim extension called
'ecmp_arp' will be added; it will be removed dynamically when the related
option `router_interface_arp_proxy` in the config file is `False`.


Implementation
==============

Assignee(s)
-----------

* XiaoYu Zhu

Work Items
----------

* L3 agent update
* Tests
* Documentation


Testing
=======

Tempest Tests
-------------

* Tempest tests

Functional Tests
----------------

* New tests need to be written


Documentation Impact
====================

User Documentation
------------------

* User documentation
* API reference

Developer Documentation
-----------------------

* Needs devref documentation


References
==========

.. [1] http://netfilter.org/projects/iptables/files/changes-iptables-1.4.15.txt

.. [2] https://review.opendev.org/723864

.. [3] https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/router.html

.. [4] https://specs.openstack.org/openstack/neutron-specs/specs/train/improve-extraroute-api.html