===========================================
Cross Neutron VxLAN Networking in Tricircle
===========================================

Background
==========

Currently we only support VLAN as the cross-Neutron network type. For the VLAN
network type, the central plugin in Tricircle picks a physical network and
allocates a VLAN tag (or uses the one users specify), and before the creation
of the local network, the local plugin queries this provider network
information and creates the network based on it. Tricircle only guarantees
that instance packets sent out of hosts in different pods that belong to the
same VLAN network are tagged with the same VLAN ID. Deployers need to
carefully configure physical networks and switch ports to make sure that
packets can be transported correctly between physical devices.

For more flexible deployment, the VxLAN network type is a better choice.
Compared to the 12-bit VLAN ID, the 24-bit VxLAN ID can support a much larger
number of bridge networks and cross-Neutron L2 networks. With the MAC-in-UDP
encapsulation of VxLAN, hosts in different pods only need to be IP routable to
transport instance packets.

Proposal
========

There are several challenges in supporting cross-Neutron VxLAN networks:

1. How to keep the VxLAN ID identical for the same VxLAN network across
   Neutron servers
2. How to synchronize tunnel endpoint information between pods
3. How to trigger L2 agents to build tunnels based on this information
4. How to support different back-ends, like ODL or L2 gateway

The first challenge can be solved in the same way as for VLAN networks: we
allocate the VxLAN ID in the central plugin and the local plugin uses the same
VxLAN ID to create the local network. For the second challenge, we introduce a
new table called "shadow_agents" in the Tricircle database, so the central
plugin can save the tunnel endpoint information collected from one local
Neutron server in this table and use it to populate the information to other
local Neutron servers when needed. Here is the schema of the table:

.. csv-table:: Shadow Agent Table
  :header: Field, Type, Nullable, Key, Default

  id, string, no, primary, null
  pod_id, string, no, , null
  host, string, no, unique, null
  type, string, no, unique, null
  tunnel_ip, string, no, , null

**How to collect tunnel endpoint information**

When the host where a port will be located is determined, the local Neutron
server receives a port-update request containing the host ID in the body.
While processing this request, the local plugin can query the agent
information that contains the tunnel endpoint information from the local
Neutron database, using the host ID and the port VIF type, and then send the
tunnel endpoint information to the central Neutron server by issuing a
port-update request with this information in the binding profile.

**How to populate tunnel endpoint information**

When the tunnel endpoint information in one pod needs to be populated to other
pods, XJob issues port-create requests to the corresponding local Neutron
servers with the tunnel endpoint information, queried from the Tricircle
database, in the request bodies. After receiving such a request, the local
Neutron server saves the tunnel endpoint information by calling the real core
plugin's "create_or_update_agent" method. This method comes from the
neutron.db.agent_db.AgentDbMixin class, so plugins supporting the "agent"
extension have it. There is actually no such agent daemon running in the
target local Neutron server, but we insert a record for it in the database so
the local Neutron server will assume such an agent exists. That is why we call
it a shadow agent.
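For illustration, here is a minimal sketch of the collection step written
against python-neutronclient. In Tricircle this logic would live inside the
local plugin's port-update handling; the standalone function, the binding
profile keys and the assumption that the binding host runs an Open vSwitch
agent are simplifications for the sketch, not the exact implementation::

  def report_tunnel_endpoint(local_neutron, central_neutron, port_id, host):
      """Send the binding host's tunnel endpoint to central Neutron.

      local_neutron and central_neutron are neutronclient Client instances
      for the local and the central Neutron server respectively.
      """
      # The OVS agent on the binding host reports its VxLAN endpoint in its
      # configurations dict as "tunneling_ip".
      agents = local_neutron.list_agents(
          host=host, agent_type='Open vSwitch agent')['agents']
      if not agents:
          return
      agent = agents[0]
      # Hand the endpoint to central Neutron in the binding profile of the
      # same port, so the central plugin can fill the shadow_agents table.
      central_neutron.update_port(port_id, {'port': {'binding:profile': {
          'host': host,
          'type': agent['agent_type'],
          'tunnel_ip': agent['configurations'].get('tunneling_ip')}}})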
The proposed solution for the third challenge is based on the shadow agent and
the L2 population mechanism. In the original Neutron process, when the port
status is updated to active, the L2 population mechanism driver does two
things. First, the driver checks whether the updated port is the first port on
the target agent; if so, the driver collects the tunnel endpoint information
of the other ports in the same network and sends it to the target agent via
RPC. Second, the driver sends the tunnel endpoint information of the updated
port to the other agents where ports in the same network are located, also via
RPC. L2 agents then build the tunnels based on the information they receive.

To trigger the above processes and build tunnels across Neutron servers, we
further introduce the shadow port. Let's say we have two instance ports: port1
is located on host1 in pod1 and port2 is located on host2 in pod2. To make the
L2 agent running on host1 build a tunnel to host2, we create a port in pod1
with the same properties as port2. As discussed above, the local Neutron
server creates the shadow agent while processing the port-create request, so
the local Neutron server in pod1 won't complain that host2 doesn't exist. To
trigger the L2 population process, we then update the port status to active,
so the L2 agent on host1 receives the tunnel endpoint information of port2 and
builds the tunnel. Port status is a read-only property, so we can't update it
directly via the RESTful API. Instead, we issue a port-update request with a
special key in the binding profile; after the local Neutron server receives
such a request, it pops the special key from the binding profile and updates
the port status to active. The XJob daemon takes the job of creating and
updating shadow ports.

Here is the flow of the shadow agent and shadow port process::

  +-------+          +---------+         +---------+       +----------+      +------+          +---------+
  | Local |          |  Local  |         |         |       |          |      |      |          |  Local  |
  | Nova  |          | Neutron |         | Central |       | Database |      | XJob |          | Neutron |
  | Pod1  |          |  Pod1   |         | Neutron |       |          |      |      |          |  Pod2   |
  +---+---+          +----+----+         +----+----+       +----+-----+      +--+---+          +----+----+
      |                   |                   |                 |               |                   |
      | update port1      |                   |                 |               |                   |
      | [host id]         |                   |                 |               |                   |
      +------------------>|                   |                 |               |                   |
      |                   | update port1      |                 |               |                   |
      |                   | [agent info]      |                 |               |                   |
      |                   +------------------>|                 |               |                   |
      |                   |                   | save shadow     |               |                   |
      |                   |                   | agent info      |               |                   |
      |                   |                   +---------------->|               |                   |
      |                   |                   | trigger shadow  |               |                   |
      |                   |                   | port setup job  |               |                   |
      |                   |                   | for pod1        |               |                   |
      |                   |                   +-------------------------------->|                   |
      |                   |                   |                 |               |                   |
      |                   |                   |                 |               | query ports in    |
      |                   |                   |                 |               | the same network  |
      |                   |                   |                 |               +------------------>|
      |                   |                   |                 |               | return port2      |
      |                   |                   |                 |               |<------------------+
      |                   |                   |                 | query shadow  |                   |
      |                   |                   |                 | agent info    |                   |
      |                   |                   |                 | for port2     |                   |
      |                   |                   |                 |<--------------+                   |
      |                   | create shadow     |                 |               |                   |
      |                   | port for port2    |                 |               |                   |
      |                   |<----------------------------------------------------+                   |
      |                   | create shadow     |                 |               |                   |
      |                   | agent and port    |                 |               |                   |
      |                   +-----+             |                 |               |                   |
      |                   |     |             |                 |               |                   |
      |                   <-----+             |                 |               |                   |
      |                   | update shadow     |                 |               |                   |
      |                   | port to active    |                 |               |                   |
      |                   |<----------------------------------------------------+                   |
      |                   | L2 population     |                 |               | trigger shadow    |
      |                   +-----+             |                 |               | port setup job    |
      |                   |     |             |                 |               | for pod2          |
      |                   <-----+             |                 |               +-----+             |
      |                   |                   |                 |               |     |             |
      |                   |                   |                 |               <-----+             |
      +                   +                   +                 +               +                   +

The bridge network can support the VxLAN network type in the same way: we just
create shadow ports for the router interface and the router gateway. In the
above graph, the local Nova server updates the port with the host ID to
trigger the whole process; the L3 agent updates the interface port and the
gateway port with the host ID, so a similar process is triggered to create
shadow ports for the router interface and the router gateway.
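To make the shadow port mechanism above more concrete, below is a minimal
sketch of how an XJob worker could create a shadow port for port2 in pod1 and
then flip it to active, again using python-neutronclient. The binding profile
keys and the "force_up" flag are illustrative placeholders rather than the
exact names used by Tricircle::

  def setup_shadow_port(pod1_neutron, port2, shadow_agent):
      """Create a shadow port for port2 in pod1 and activate it.

      pod1_neutron is a neutronclient Client for the local Neutron server in
      pod1; port2 is the port dict from pod2; shadow_agent is the matching
      row from the shadow_agents table.
      """
      # Copy the identifying properties of port2; the agent information in
      # the binding profile lets the local plugin create the shadow agent.
      body = {'port': {
          'network_id': port2['network_id'],
          'name': 'shadow_port_' + port2['id'],
          'mac_address': port2['mac_address'],
          'fixed_ips': [{'ip_address': port2['fixed_ips'][0]['ip_address']}],
          'binding:host_id': shadow_agent['host'],
          'binding:profile': {'agent_type': shadow_agent['type'],
                              'tunnel_ip': shadow_agent['tunnel_ip']}}}
      shadow_port = pod1_neutron.create_port(body)['port']
      # Port status cannot be set directly through the API, so a special key
      # in the binding profile asks the local plugin to mark the port active,
      # which in turn fires the L2 population driver.
      pod1_neutron.update_port(
          shadow_port['id'],
          {'port': {'binding:profile': {'force_up': True}}})
      return shadow_port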
Currently the Neutron team is working on push notification [1]_: the Neutron
server will push resource data to agents, and agents cache this data and use
it to do the real work, like configuring Open vSwitch, updating iptables or
configuring dnsmasq, so agents no longer need to retrieve resource data from
the Neutron server via RPC. Based on push notification, if tunnel endpoint
information is later stored in the port object and can be updated via the
RESTful API, we can simplify the solutions for challenges 3 and 4: we only
need to create shadow ports containing tunnel endpoint information, which is
then pushed to agents, and agents use it to create the necessary tunnels and
flows.

**How to support different back-ends besides the ML2+OVS implementation**

We consider two typical back-ends that can support cross-Neutron VxLAN
networking: L2 gateway and an SDN controller like ODL. For L2 gateway, we
consider supporting only static tunnel endpoint information as the first step.
The shadow agent and shadow port process is almost the same as in the ML2+OVS
implementation; the difference is that, for L2 gateway, the tunnel IP of the
shadow agent is set to the tunnel endpoint of the L2 gateway, so after L2
population, L2 agents will create tunnels to the tunnel endpoint of the L2
gateway. For an SDN controller, we assume that the controller is able to
manage tunnel endpoint information across Neutron servers, so Tricircle only
helps to allocate the VxLAN ID and keep it identical across Neutron servers
for one network; the shadow agent and shadow port process is not used in this
case. However, if different SDN controllers are used in different pods, it
will be hard for each SDN controller to connect hosts managed by other SDN
controllers, since each controller has its own mechanism; this problem is
discussed in [2]_. One possible solution under Tricircle is the same as what
L2 gateway does: we create shadow ports that contain the L2 gateway tunnel
endpoint information so the SDN controller can build tunnels in its own way,
and we then configure the L2 gateway in each pod to forward the packets
between L2 gateways. The L2 gateways discussed here are mostly hardware based
and can be controlled by the SDN controller, which uses an ML2 mechanism
driver to receive the L2 network context and further control the L2 gateways
for the network.

To distinguish the different back-ends, we will add a new configuration option
cross_pod_vxlan_mode whose valid values are "p2p", "l2gw" and "noop". Mode
"p2p" works for the ML2+OVS scenario: shadow ports and shadow agents
containing host tunnel endpoint information are created. Mode "l2gw" works for
the L2 gateway scenario: shadow ports and shadow agents containing L2 gateway
tunnel endpoint information are created. For the SDN controller scenario, as
discussed above, if the SDN controller can manage tunnel endpoint information
by itself, we only need the "noop" mode, meaning that neither shadow ports nor
shadow agents will be created; if the SDN controller can manage a hardware L2
gateway, we can use the "l2gw" mode.
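As a rough illustration, the option could be defined and consumed with
oslo.config as below; the option group name, the default value and the helper
function are assumptions for the sketch, not part of this design::

  from oslo_config import cfg

  vxlan_opts = [
      cfg.StrOpt('cross_pod_vxlan_mode',
                 default='p2p',  # default value is an assumption
                 choices=['p2p', 'l2gw', 'noop'],
                 help='How VxLAN tunnels are set up across Neutron servers')]
  cfg.CONF.register_opts(vxlan_opts, group='tricircle')

  def need_shadow_resources(conf=cfg.CONF):
      # Both "p2p" and "l2gw" rely on shadow agents and shadow ports; only
      # the tunnel endpoint written into the shadow agent differs (host IP
      # vs. L2 gateway IP). "noop" leaves tunnel management entirely to the
      # SDN controller.
      return conf.tricircle.cross_pod_vxlan_mode in ('p2p', 'l2gw')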
Data Model Impact
=================

A new table "shadow_agents" is added.

Dependencies
============

None

Documentation Impact
====================

- Update the configuration guide to introduce the options for VxLAN networks
- Update the networking guide to discuss the new scenarios with VxLAN networks
- Add a release note about cross-Neutron VxLAN networking support

References
==========

.. [1] https://blueprints.launchpad.net/neutron/+spec/push-notifications
.. [2] https://etherealmind.com/help-wanted-stitching-a-federated-sdn-on-openstack-with-evpn/