Spec for Neutron DVR support in Fuel
Implements blueprint neutron-dvr-deployment

Change-Id: I666c1d1de9e128147174abc0a178a41828ae5d93

Commit af52304690 (parent b4b03b8730)

specs/7.0/neutron-dvr-deployment.rst (new file, 372 lines)

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===================
Neutron DVR support
===================

https://blueprints.launchpad.net/fuel/+spec/neutron-dvr-deployment

Neutron Distributed Virtual Router (DVR) implements L3 routers across the
compute nodes, so that traffic between VMs of the same tenant is routed
without hitting the controller node (East-West routing).

DVR also implements the Floating IP namespace on every compute node where
the VMs are located. VMs with Floating IPs can then forward traffic to the
external network without reaching the controller node (North-South routing).

DVR keeps the legacy behavior for the default SNAT of all private VMs: the
SNAT service is not distributed. It is centralized and hosted on the service
node.


Problem description
===================

Currently Neutron L3 routers are deployed on specific nodes (controller
nodes), and all compute traffic flows through them.

* Problem 1: Inter-VM traffic flows through the controller node

  Even traffic between VMs that belong to the same tenant but sit on
  different subnets has to hit the controller node to be routed between the
  subnets. This affects performance and scalability.

* Problem 2: VMs with Floating IPs also receive and send packets through
  the controller node routers

  Today Floating IP (DNAT) translation is done on the controller node, and
  the external network gateway port is available only on the controller.
  So any traffic from a VM to the external network has to go through the
  controller node. The controller node becomes a single point of failure,
  and this traffic loads it heavily. This affects performance and
  scalability.


Proposed change
===============

The proposal is to distribute L3 routers across compute nodes when required
by VMs. This implies having external network access on each compute node.

Enhanced L3 agents will run on every compute node (this is not a new agent
but an updated version of the existing L3 agent). Based on the configuration
in the L3 agent configuration file (``l3_agent.ini``), the enhanced L3 agent
behaves either in legacy (centralized router) mode or in distributed router
mode.
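
The mode selection described above can be illustrated with a minimal sketch.
It assumes the upstream Neutron ``agent_mode`` option of ``l3_agent.ini``
(values ``legacy``, ``dvr`` and ``dvr_snat``). The role-to-mode mapping and
the helper name ``render_l3_agent_ini`` are hypothetical and are not taken
from fuel-library.

```python
import configparser
import io

# Assumed mapping from node role to the upstream ``agent_mode`` value.
AGENT_MODE_BY_ROLE = {
    "compute": "dvr",          # distributed routing and floating IPs only
    "controller": "dvr_snat",  # additionally hosts the centralized SNAT
}


def render_l3_agent_ini(role, dvr_enabled):
    """Return l3_agent.ini text for a node role (hypothetical helper)."""
    # Without DVR the agent stays in the centralized (legacy) mode.
    mode = AGENT_MODE_BY_ROLE[role] if dvr_enabled else "legacy"
    parser = configparser.ConfigParser()
    parser["DEFAULT"] = {"agent_mode": mode}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

Calling ``render_l3_agent_ini("compute", dvr_enabled=True)`` returns a
``[DEFAULT]`` section that contains ``agent_mode = dvr``. A real deployment
would set this value through the Puppet manifests rather than through Python.
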

A new Floating IP namespace will also be created on the specific compute
node where the VM is located (this is done by the L3 agent itself). Each
compute node will have one new Floating IP namespace per external network,
shared among the tenants. An additional namespace and an external gateway
port will also be created on each compute node that hosts VMs with floating
IPs, for the external traffic to flow through. This port will consume an
additional IP address from the external network.
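
The extra address consumption can be estimated with simple arithmetic. The
sketch below assumes one gateway port, and therefore one address, per
compute node that hosts VMs with floating IPs, per external network. The
function names and the example CIDR are illustrative only.

```python
import ipaddress


def extra_external_ips(compute_nodes_with_fip_vms, external_networks=1):
    """Addresses consumed by the per-node external gateway ports."""
    return compute_nodes_with_fip_vms * external_networks


def external_network_fits(cidr, already_used, compute_nodes_with_fip_vms):
    """Check that the gateway ports still fit into one external subnet."""
    network = ipaddress.ip_network(cidr)
    # Exclude the network and broadcast addresses of an IPv4 subnet.
    usable = network.num_addresses - 2
    needed = already_used + extra_external_ips(compute_nodes_with_fip_vms)
    return needed <= usable
```

For example, 25 compute nodes with floating-IP VMs need 25 extra addresses.
A /24 external network with 200 addresses already in use can absorb them,
while one with 240 in use cannot.
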

Default SNAT functionality will still be centralized and will run on
controller nodes.

The Metadata agent will be distributed as well and hosted on all compute
nodes, and the Metadata Proxy will be hosted on all the distributed routers.

This implementation is specific to ML2 with the OVS driver.
All three types of segmentation are supported: GRE, VXLAN, VLAN.

Constraints and Limitations
---------------------------

* No Distributed SNAT

  Neutron Distributed Virtual Router provides the legacy SNAT behavior for
  the default SNAT for all private VMs. The SNAT service is not distributed;
  it is centralized and hosted on the service node. Thus the current DVR
  architecture is not fully fault tolerant: outbound traffic for VMs without
  floating IPs still goes through a single L3 agent node and is still prone
  to failures of that node.

* Only with ML2-OVS/L2-pop

  The DVR feature is supported only by the ML2 plugin with the OVS mechanism
  driver. If tunnel segmentation (VXLAN, GRE) is used, the L2 population
  mechanism should be enabled as well.

* OVS and Kernel versions

  Proper operation of DVR requires Open vSwitch 2.1 or newer, and VXLAN
  requires kernel 3.13 or newer.

* No bare metal support

  Distributed routers rely on the local L3 agent (residing on the compute
  node) for address translation, so for bare metal instances only legacy
  routers should be used.
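
The constraints listed above could be checked before deployment with a small
validation helper. This is only a sketch: the function and parameter names
are hypothetical, and version strings are assumed to start with
``major.minor``.

```python
def parse_version(text):
    """Return (major, minor) from a string such as '3.13.0'."""
    return tuple(int(part) for part in text.split(".")[:2])


def dvr_constraint_errors(mechanism_driver, segmentation, l2_population,
                          ovs_version, kernel_version):
    """Return a list of violated DVR constraints (empty if none)."""
    errors = []
    if mechanism_driver != "openvswitch":
        errors.append("DVR requires ML2 with the OVS mechanism driver")
    if segmentation in ("vxlan", "gre") and not l2_population:
        errors.append("tunnel segmentation requires L2 population")
    if parse_version(ovs_version) < (2, 1):
        errors.append("Open vSwitch 2.1 or newer is required")
    if segmentation == "vxlan" and parse_version(kernel_version) < (3, 13):
        errors.append("VXLAN requires kernel 3.13 or newer")
    return errors
```

An empty result means that none of the listed constraints is violated. The
bare metal limitation is not covered, because it applies to individual
routers rather than to the environment.
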

Deployment impact
-----------------

* Architecture changes

  * Neutron L3 and metadata agents will be deployed on all compute nodes and
    managed by Upstart. The agent deployment scheme on controller nodes is
    not changed.

  * All compute nodes require a bridge to the external network

* Fuel Library related changes

  * Update the Neutron Puppet module to support DVR-related options (L3
    agent mode, L2 population, distributed router option). This step will be
    done as a part of blueprint upgrade-openstack-puppet-modules, when all
    necessary changes are synced from the puppet-neutron project

  * Update Cloud Networking related Puppet modules to deploy Neutron L3 and
    metadata agents on compute nodes with appropriate configuration. This
    step will likely require changes in Granular deployment to execute
    Neutron agents related granulars on compute nodes

  * Update Horizon related Puppet modules to add the ability to use Neutron
    DVR options (create either centralized or distributed routers)

* Fuel Web related changes

  * When Neutron DVR is enabled, a network scheme with external bridges on
    all compute nodes should be generated. Possible astute.yaml examples:

    .. code-block:: yaml

      # Compute nodes (other keys omitted)
      network_scheme:
        endpoints:
          br-ex:
            IP: none
      quantum_settings:
        DVR: true
      ---
      # Controller nodes (other keys omitted)
      network_scheme:
        endpoints:
          br-ex:
            IP:
              - 172.16.0.3/24
      quantum_settings:
        DVR: true
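
The per-role astute.yaml fragments above could be produced by logic along
these lines. This is a sketch under assumptions: the helper name is
hypothetical, the key names are copied from the examples above, and it is
assumed that compute nodes get a ``br-ex`` endpoint only when DVR is enabled.
It is not the actual fuel-web serializer.

```python
def astute_network_fragment(role, dvr_enabled, public_cidr=None):
    """Build the DVR-related part of astute.yaml as a plain dict."""
    fragment = {"quantum_settings": {"DVR": dvr_enabled}}
    if role == "controller":
        if public_cidr is None:
            raise ValueError("controllers need an address on br-ex")
        # Controllers keep an addressed external bridge.
        endpoint = {"IP": [public_cidr]}
    elif dvr_enabled:
        # Compute nodes get the bridge without an address of their own.
        endpoint = {"IP": "none"}
    else:
        # Assumption: no external bridge on computes without DVR.
        return fragment
    fragment["network_scheme"] = {"endpoints": {"br-ex": endpoint}}
    return fragment
```

``astute_network_fragment("compute", True)`` yields a ``br-ex`` endpoint with
``IP: none``, matching the compute node example. Serializing the dict to YAML
would need a YAML library, which is left out to keep the sketch
self-contained.
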

Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

No Fuel REST API changes.

Upgrade impact
--------------

The upgrade path from a legacy to a distributed router is supported. It is a
three-step process:

* ``neutron router-update router1 --admin_state_up=False``

* ``neutron router-update router1 --distributed=True``

* ``neutron router-update router1 --admin_state_up=True``

The reverse migration (distributed to legacy) is not officially supported in
Kilo. It may work, but this needs to be tested.
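
The three steps above have to run in order for every router that is
migrated. A minimal sketch that only builds the command lines, using the
commands exactly as listed above, might look like this. The helper name is
hypothetical, and nothing is executed.

```python
def router_migration_commands(router):
    """Return the legacy-to-distributed migration commands, in order."""
    base = ["neutron", "router-update", router]
    return [
        base + ["--admin_state_up=False"],  # 1. take the router down
        base + ["--distributed=True"],      # 2. switch it to distributed
        base + ["--admin_state_up=True"],   # 3. bring it back up
    ]


def as_shell_lines(commands):
    """Join each argument list into a printable command line."""
    return [" ".join(command) for command in commands]
```

``as_shell_lines(router_migration_commands("router1"))`` reproduces the three
commands from the list. An operator could pass each argument list to
``subprocess.run`` with admin credentials loaded, keeping in mind that the
router is unavailable between the first and the last step.
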

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

Inter-VM traffic between tenant subnets no longer needs to reach the router
on the controller node; it is routed locally on the compute node. This
would increase performance substantially.

Floating IP traffic for a VM also goes directly from its compute node to the
external network, instead of going through the router on the controller node.

Dataplane testing results from an environment of 25 bare metal nodes show
significant performance improvement for both East-West and North-South (with
floating IPs) scenarios.

Plugin impact
-------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Infrastructure impact
---------------------

None


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  obondarev

Other contributors:
  skolekonov (DE)
  kkuznetsova (QA)
  tnurlygayanov (QA)

Mandatory design reviewers:
  svasilenko
  vkuklin
  sgolovatiuk

Work Items
----------

* Patch fuel-lib to enable DVR by default

  * this will enable DVR testing at an early stage

* Scale testing

  * Rally scenarios

  * Shaker scenarios

  * debug

  * bug fixing/backport from upstream

* Patch fuel-web to add the ability to enable/disable DVR

  * disable DVR by default

Dependencies
============

This will likely depend on enabling L2 population for tunneling, which is a
separate effort. However, we will not wait for it and will enable L2
population as part of the DVR effort if needed.

It also correlates with blueprint upgrade-openstack-puppet-modules, as all
required changes might already be in master in the upstream manifests.


Testing
=======

Manual Acceptance Tests
-----------------------

* On an environment with DVR enabled, check via Horizon or CLI that a
  created router has the "distributed" attribute set to True

* Boot a VM on a subnet connected to a DVR router. Check external
  connectivity.

* Assign a Floating IP to the VM. Check external connectivity. Ensure the VM
  is reachable from the external network.

* Boot a second VM on a different subnet connected to the same router. Ensure
  inter-subnet connectivity (both VMs can reach each other)

Scale
-----

* An environment with DVR enabled should pass all tests currently run on the
  Scale Lab with no significant performance degradation

* No additional Rally scenarios are needed to test specifics of DVR.

HA/Destructive Tests
--------------------

All existing HA/destructive tests should pass on an environment with DVR
enabled. Additional scenarios should include:

* East-West HA Test

  * Have several VMs from different subnets running on different compute
    nodes. The subnets should be connected to each other and to an external
    network by a DVR router

  * Shut down all controllers of the environment

  * Inter-subnet connectivity should be preserved: VMs from different
    subnets/compute nodes should still be able to reach each other

  * No dataplane downtime is expected

* North-South HA Test

  * Have a VM with a Floating IP running on a subnet connected to an
    external network by a DVR router

  * Shut down all controllers of the environment

  * External connectivity should be preserved: VMs should still be able to
    reach the external network

  * No dataplane downtime is expected

Data Plane Tests with Shaker
----------------------------

Shaker scenarios should be run on a bare-metal environment with DVR enabled.
A significant increase in performance is expected for east-west and
north-south (with Floating IPs) topologies. Some of the results were already
obtained (see the "Performance Impact" section of this document).


Documentation Impact
====================

The ability to enable DVR support in Neutron should be documented in the
Fuel Deployment Guide.

References
==========

https://blueprints.launchpad.net/fuel/+spec/neutron-dvr-deployment

https://blueprints.launchpad.net/fuel/+spec/upgrade-openstack-puppet-modules