Add Node-Local Virtual IP Spec

Adding the spec to support Node-Local Virtual IP for the RFE
https://bugs.launchpad.net/neutron/+bug/1930200

Partial-Bug: #1930200
Co-Authored-By: obondarev <oleg.bondarev@huawei.com>
Change-Id: If9f137a839b37f8262f4842b34401a13967ed43e
This commit is contained in:
Ilya Chukhnakov 2021-06-24 06:36:48 +03:00 committed by Oleg Bondarev
parent e3c7a586d4
commit 31e452c6ea
3 changed files with 390 additions and 0 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 46 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 46 KiB

View File

@ -0,0 +1,390 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
========================================
Node-Local virtual IP address (Local IP)
========================================
RFE: https://bugs.launchpad.net/neutron/+bug/1930200
This spec proposes a new type of shared virtual IP address that can be used to
access distributed and/or multi-node services and applications running in the
cloud. The feature is primarily focused on high efficiency and performance of
the networking data plane for very large scale clouds and/or clouds with high
network throughput demands.
Problem Description
===================
In a topology where several workloads access a shared resource over network
obvious bottlenecks are the network and the shared resource itself
(or a loadbalancer):
.. image:: ../../images/node-local-ip-problem.png
Proposed feature suggests to solve this bottleneck issue by deploying local
replicas/caches of the shared service and re-translating local workloads`
requests to these replicas transparently. Transparency is achieved by
assigning same IP address to local replicas as on main shared resource.
.. image:: ../../images/node-local-ip-solution.png
Primary use-cases for this feature are listed in the RFE description. While
the proposed virtual IP may be used on its own for simple use cases, it is
mainly intended to serve as a basic building block and should be used in
conjunction with other services to provide full-fledged service discovery,
load balancing, proxying and other advanced features.
Proposed Change
===============
Add a new Node Local IP (or just Local IP) resource - a virtual IP that can be
shared across multiple ports/VMs (similar to anycast IP) and is guaranteed
to only be reachable within the same physical server/node boundaries.
The mechanism is similar to Floating IP, with the difference that Local IP
could be assigned to several fixed IPs (on different nodes) simultaneously and
that all address translation happens early/locally on the node's integration
bridge.
The scope of this feature is limited to:
* API and CLI to provide fine grained control over allocation and assignment
of the Local IP addresses
* Networking data plane mechanism to efficiently redirect traffic to a port
within the same physical node (if present) or to optionally forward it to a
fallback destination
* Local IP will only be accessible within the same L2 segment
Local IP object will be composed of the following objects & attributes:
* Underlying Neutron port representing Local IP
* Collection of associated Neutron ports that will serve as packets'
destination when the packet source port is located on the same physical node
* IP mode for local ports:
* 'translate' (default) - Local IP will behave as DNAT for redirected packets
* 'passthrough' - packets will be redirected without modification to IP
headers; destination VM's guest OS will be responsible for configuring
Local IP's IP address on a corresponding port interface. This is similar
to current allowed address pair mechanism with the benefit of fine grained
control over IP allocation and assignment
* Fallback policy:
* 'forward' (default) - to redirect packets to an underlying Neutron port IP
address if no Local IP assigned port is available on the physical node
* 'drop' - packets will be dropped if no Local IP assigned port is available
on the physical node - out of scope for now
* 'reject' - similar to iptables' 'reject' policy to help clients fail fast
if no Local IP assigned port is available on the physical node -
out of scope for now
REST API Impact
---------------
New Local IP API resource
~~~~~~~~~~~~~~~~~~~~~~~~~
+-------------------+---------+-------+------+---------------------------------------+
| Attribute | Type | Req | CRUD | Description |
+===================+=========+=======+======+=======================================+
| id | uuid-str| Yes | R | Unique identifier for the |
| | | | | Local IP object. |
+-------------------+---------+-------+------+---------------------------------------+
| name | String | No | CRU | Human readable name for the Local IP |
| | | | | (255 characters limit). Does not have |
| | | | | to be unique. |
+-------------------+---------+-------+------+---------------------------------------+
| description | String | No | CRU | Human readable description for the |
| | | | | Local IP (255 characters limit). |
+-------------------+---------+-------+------+---------------------------------------+
| project_id | uuid-str| No | CR | Owner of the Local IP. |
+-------------------+---------+-------+------+---------------------------------------+
| local_ip | String | No | CR | Local IP CIDR (virtual) that will be |
| | | | | reachable within the same physical |
| | | | | server/node. Required when provided |
| | | | | local_port_id's Port has several fixed|
| | | | | IPs |
+-------------------+---------+-------+------+---------------------------------------+
| local_port_id | uuid-str| Yes | CR | Underlying (backup) Neutron port ID |
| | | | | used by Local IP object to get actual |
| | | | | IP address for local translation. |
| | | | | If port has several fixed IPs - |
| | | | | local_ip should be provided as well. |
+-------------------+---------+-------+------+---------------------------------------+
| network_id | uuid-str| Yes | CR | ID of network from which to allocate |
| | | | | Local IPs underlying Neutron port. |
| | | | | Required when no local_port_id |
| | | | | was provided upon creation. |
+-------------------+---------+-------+------+---------------------------------------+
| ip_mode | String | No | CRU | One of 'translate' (for DNAT) or |
| | | | | 'passthrough' (no NAT) modes described|
| | | | | above. Default: 'translate' |
+-------------------+---------+-------+------+---------------------------------------+
|
New Local IP Association API resource
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-------------------+---------+-------+------+---------------------------------------+
| Attribute | Type | Req | CRUD | Description |
+===================+=========+=======+======+=======================================+
| local_ip_id | uuid-str| Yes | CR | Local IP id |
+-------------------+---------+-------+------+---------------------------------------+
| fixed_port_id | uuid-str| Yes | CR | ID of associated Neutron fixed port |
+-------------------+---------+-------+------+---------------------------------------+
| fixed_ip | String | No | CR | Exact port's fixed IP address that is |
| | | | | associated with Local IP |
+-------------------+---------+-------+------+---------------------------------------+
| host | List | No | R | Host where associated port is bound |
+-------------------+---------+-------+------+---------------------------------------+
|
Basically new API will be similar to existing Floating IP API with some
differences:
* Local IP could be associated with multiple fixed ports at the same time.
* Local IP may be created by providing an existing Neutron port which will
be used to obtain the IP address and may serve as a fallback destination
* Creating Local IP without explicitly specified port will trigger creation
of an underlying Neutron port to handle IPAM, network boundaries and
permission checks. In this case network ID should be specified upon
creation
* Deletion of the Local IP underlying fallback port will be prohibited by
the Neutron server as long as a Local IP associated with such port exists
* Manual underlying Neutron port updates will be ignored by parent Local IP
(similar to current Floating IP behavior)
* Deleting Local IP should be prohibited if Local IP has associated local
ports; 'force' option may be provided to trigger local port disassociation
on deletion; deleting Local IP should also delete the underlying Neutron
port representing Local IP if this port was created specifically for
Local IP and did not exist before
* IPAM and IP validation will be handled automatically by underlying Port
* Local IP will behave similar to Port and Floating IP API in terms of
user/admin access rights
* Local IP may have quotas similar to Floating IP API (out of scope for now)
Data Model Impact
-----------------
The following are the backend database tables for the REST API proposed above.
|
| **Local IPs**
+-------------------+---------+-------+---------------------------------------+
| Attribute | Type | Req | Description |
+===================+=========+=======+=======================================+
| id (PK) | uuid-str| Yes | Unique identifier for the |
| | | | Local IP object. |
+-------------------+---------+-------+---------------------------------------+
| name | string | No | Human readable name for the Local IP |
| | | | (255 characters limit). Does not have |
| | | | to be unique. |
+-------------------+---------+-------+---------------------------------------+
| project_id | uuid-str| No | Owner of the Local IP. Only admin |
| | | | users can specify a project identifier|
| | | | other than their own. |
+-------------------+---------+-------+---------------------------------------+
| local_ip | string | Yes | Local IP CIDR (virtual) that will be |
| | | | reachable within the same physical |
| | | | server/node. |
+-------------------+---------+-------+---------------------------------------+
| local_port_id (FK)| uuid-str| Yes | Underlying (backup) Neutron port ID |
| | | | used by Local IP object to get actual |
| | | | IP address for local NATting |
+-------------------+---------+-------+---------------------------------------+
| ip_mode | string | No | One of 'translate' (for DNAT) or |
| | | | 'passthrough' (no NAT) modes described|
| | | | above. Default: 'translate' |
+-------------------+---------+-------+---------------------------------------+
|
| **Local IP associations**
+-------------------+---------+-------+----------------------------------------+
| Attribute | Type | Req | Description |
+===================+=========+=======+========================================+
| local_ip_id (PK) | uuid-str| Yes | UUID of a Local IP |
+-------------------+---------+-------+----------------------------------------+
| fixed_port_id (PK)| uuid-str| Yes | UUID of a port which will serve Local |
| | | | IP requests on this port's host |
+-------------------+---------+-------+----------------------------------------+
| fixed_ip | String | Yes | Exact port's fixed IP address that is |
| | | | associated with Local IP |
+-------------------+---------+-------+----------------------------------------+
Client impact
-------------
* New CLI and OpenStack client commands to create, update, delete, list and
show Local IPs and to associate/disassociate fixed IP ports
OVS, ML2 Drivers impact
-----------------------
RPC interface between server and OVS agent will be updated to include info
about Local IPs associated with agent's ports. Server will update port_details
with this info.
OVS agent will perform following flow updates based on Local IPs info:
* update ARP spoofing flow for associated port
(like done for allowed address pairs)
* identify network/local vlan of associated port
* for each port from this network - redirect packets from table 0 or
24(ARP_SPOOF)/25(MAC_SPOOF) to a new table - 50 (LOCAL_EGRESS)
Below flow examples are for the case when fixed port (id=56fbc1e7-2c..,
MAC=fa:16:3e:9a:1a:de, IP=10.0.0.51) is associated with Local IP 10.0.0.10
::
table=25, n_packets=0, n_bytes=0, priority=2,in_port="tap56fbc1e7-2c",
dl_src=fa:16:3e:9a:1a:de actions=resubmit(,50)
* table 50(LOCAL_EGRESS) serves for memorizing local vlans to later
distinguish Local IP traffic from different nets and redirect packets further
to table 51(LOCAL_IP_HANDLE)
::
table=50, n_packets=0, n_bytes=0, priority=9,in_port="tap56fbc1e7-2c"
actions=load:0x4->NXM_NX_REG6[],resubmit(,51)
* table 51(LOCAL_IP_HANDLE) has actual flows for Local IP handling
* ARP responder flow to handle Local IP ARP requests from same subnet ports
(fixed port and local IP associated with it are from the same IP subnet)
::
table=51, n_packets=0, n_bytes=0, priority=1,arp,reg6=0x4,
arp_tpa=10.0.0.10
actions=load:0x2->NXM_OF_ARP_OP[],
move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],
move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],
load:0xfa163e9a1ade->NXM_NX_ARP_SHA[],
load:0x1010102->NXM_OF_ARP_SPA[],
move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],
mod_dl_src:fa:16:3e:9a:1a:de,IN_PORT
* NAT flows to do actual Local IP address translation
::
table=51, n_packets=0, n_bytes=0, priority=10,ip,reg6=0x4,
nw_dst=10.0.0.10
actions=ct(commit,table=60,zone=4,nat(dst=10.0.0.51))
table=51, n_packets=0, n_bytes=0, priority=10,ct_state=-trk,ip,reg6=0x4,
dl_src=fa:16:3e:9a:1a:de,nw_src=10.0.0.51 actions=ct(table=60,zone=4,nat)
After translation packets are resubmitted further to TRANSIENT_TABLE (60)
Alternatively static NAT could be used to avoid kernel conntrack:
learn back flows with 7 tuple (src_mac, dest_mac, src_ip, dest_ip,
protocol, src_protocol_port, dest_protocol_port)
::
table=51, priority=30,tcp,dl_vlan=1,dl_dst=fa:16:3e:9a:1a:de,
dl_src=fa:16:3e:4f:26:fd,nw_src=10.0.0.100,nw_dst=10.0.0.10,
tp_dst=0x8000/0x8000
actions=learn(table=62,idle_timeout=30,hard_timeout=1800,priority=90,
eth_type=0x800,nw_proto=6,NXM_OF_ETH_SRC[]=NXM_OF_ETH_DST[],
NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],NXM_OF_IP_SRC[]=NXM_OF_IP_DST[],
NXM_OF_IP_DST[]=NXM_OF_IP_SRC[],NXM_OF_TCP_DST[]=NXM_OF_TCP_SRC[],
NXM_OF_TCP_SRC[]=NXM_OF_TCP_DST[],NXM_OF_VLAN_TCI[0..11],
load:NXM_NX_REG0[0..11]->NXM_OF_VLAN_TCI[0..11],
output:NXM_OF_IN_PORT[]),set_field:10.0.0.10->ip_dst,goto_table:60
This is usable in case of OVS offloading (e.g. SmartNICs).
This will be made configurable on the ovs agent side.
* Not matched (not related to Local IP) packets are transmitted further
::
table=51, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,60)
Packet processing example:
::
VM1 <10.0.0.100> tries to access Local IP 10.0.0.10
Local IP is assigned to port with MAC_2/10.0.0.51:
ARP:
VM1 -> ARP -> 10.0.0.10 and response <10.0.0.10 has MAC_2>
Egress:
<VM1_MAC_1 + 10.0.0.100> to <MAC_2 + 10.0.0.10>
<<ct NAT>>
<VM1_MAC_1 + 10.0.0.100> to <MAC_2 + 10.0.0.51>
Ingress (back):
<MAC_2 + 10.0.0.51> to <VM1_MAC_1 + 10.0.0.100>
<<ct NAT>>
<MAC_2 + 10.0.0.10> to <VM1_MAC_1 + 10.0.0.100>
Scheduling
----------
Local IP will have no effect on VM scheduling/placement. Local IP operates
under assumption that at any given time any physical node in the cloud should
have no more than one active port/VM associated with a given Local IP.
Users are responsible to utilize OpenStack placement/scheduling features (e.g.
anti-affinity rules) to avoid assigning multiple ports from the same physical
node the the same Local IP.
However there might be valid cases when multiple Local IP's local ports may be
colocated on the same physycal node (e.g. as a result of temporary live
migration during other node's maintenance). Local IP should provide a
deterministic way of handling such situations (e.g. in case of multiple local
ports, only the oldest port shall be used).
Initial release limitations
---------------------------
* Only IPv4 will be supported. IPv6 support will be considered in future
releases
* Only 'openvswitch' ML2 mechanism driver will support the feature
* Only tunnel (VxLAN, GRE) networks will be supported. 'vlan' will be
considered if require minimum overhead
* No deterministic handling of packets if a node contains multiple local ports
from same L2 segment associated with the same Local IP
* no 'reject' and 'drop' fallback policies; only 'forward' will be supported
References
==========