Spec for Neutron DVR support in Fuel

Implements blueprint neutron-dvr-deployment

Change-Id: I666c1d1de9e128147174abc0a178a41828ae5d93
specs/7.0/neutron-dvr-deployment.rst (new file, 372 lines)
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===================
Neutron DVR support
===================

https://blueprints.launchpad.net/fuel/+spec/neutron-dvr-deployment

Neutron Distributed Virtual Router (DVR) distributes L3 routers across
compute nodes, so that traffic between VMs of the same tenant is routed
without passing through a controller node (East-West routing).

DVR also creates a Floating IP namespace on every compute node where VMs
with Floating IPs are located. Such VMs can forward traffic to the
external network directly from the compute node, without reaching a
controller node (North-South routing).

For the default SNAT of all private VMs, DVR keeps the legacy SNAT
behavior: the SNAT service is not distributed, it stays centralized and is
hosted on a service node.

Problem description
===================

Currently Neutron L3 routers are deployed only on specific nodes
(controller nodes), and all routed compute traffic flows through them.

* Problem 1: Inter-subnet VM traffic flows through a controller node.

  Even traffic between VMs that belong to the same tenant but sit on
  different subnets has to hit a controller node to get routed between the
  subnets. This hurts performance and scalability.

* Problem 2: VMs with Floating IPs also send and receive packets through
  routers on a controller node.

  Today Floating IP (DNAT) translation is done on a controller node, and
  the external network gateway port is available only there, so any
  traffic going from a VM to the external network has to pass through the
  controller node. The controller node thus becomes a single point of
  failure and is heavily loaded by this traffic, which again hurts
  performance and scalability.

Proposed change
===============

The proposal is to distribute L3 routers across compute nodes when
required by VMs. This implies having external network access on each
compute node.

An enhanced L3 agent will run on each and every compute node. This is not
a new agent but an updated version of the existing L3 agent: based on the
configuration in its l3_agent.ini file, it behaves either in legacy
(centralized router) mode or in distributed router mode.

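For illustration, the mode is selected with the ``agent_mode`` option of
l3_agent.ini. The snippet below is a sketch of the configuration Fuel is
expected to generate; the exact values may differ:

.. code-block:: ini

    # l3_agent.ini on compute nodes: serve East-West routing and
    # Floating IP (DNAT) translation locally
    [DEFAULT]
    agent_mode = dvr

    # l3_agent.ini on controller nodes: additionally host the
    # centralized default SNAT service
    [DEFAULT]
    agent_mode = dvr_snat
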
The L3 agent itself will also create a new Floating IP namespace on the
specific compute node where a VM with a Floating IP is located. Each
compute node will have one such namespace per external network, shared
among the tenants. An additional namespace and external gateway port will
be created on each compute node that hosts VMs with Floating IPs, for the
external traffic to flow through. This port consumes an additional IP
address from the external network.

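As a hypothetical example (the UUIDs below are made up), a compute node
hosting a VM with a Floating IP would carry both a router namespace and a
Floating IP namespace:

.. code-block:: console

    compute-1$ ip netns
    fip-e3f1cf59-0a94-4b32-bdab-67a02c27ae45      # one per external network
    qrouter-4d41a19c-9582-4b0c-8c7f-47e47c4f84a5  # one per distributed router
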
Default SNAT functionality will still be centralized and will run on
controller nodes.

The metadata agent will be distributed as well: it will be hosted on all
compute nodes, and the metadata proxy will be hosted on all the
distributed routers.

This implementation is specific to ML2 with the OVS mechanism driver.
All three segmentation types are supported: GRE, VXLAN, VLAN.

Constraints and Limitations
---------------------------

* No distributed SNAT

  DVR provides the legacy SNAT behavior for the default SNAT of all
  private VMs: the SNAT service is not distributed, it is centralized and
  hosted on a service node. The current DVR architecture is therefore not
  fully fault tolerant: outbound traffic of VMs without Floating IPs still
  goes through a single L3 agent node and remains vulnerable to the
  failure of that node.

* Only ML2-OVS with L2 population

  DVR is supported only by the ML2 plugin with the OVS mechanism driver.
  When tunnel segmentation (VXLAN, GRE) is used, the L2 population
  mechanism must be enabled as well, as sketched below.

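For illustration, the relevant options are shown below; the values are an
assumption of what Fuel would generate, not the final manifests:

.. code-block:: ini

    # ml2_conf.ini: add l2population to the mechanism drivers
    [ml2]
    mechanism_drivers = openvswitch,l2population

    # OVS agent configuration: enable L2 population and DVR support
    [agent]
    l2_population = True
    enable_distributed_routing = True
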
* OVS and kernel versions

  Proper operation of DVR requires Open vSwitch 2.1 or newer, and VXLAN
  segmentation requires kernel 3.13 or newer (a quick check is sketched
  below).

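A minimal way to verify these prerequisites on a node; the version strings
in the output are illustrative, not requirements beyond those stated:

.. code-block:: console

    $ ovs-vsctl --version
    ovs-vsctl (Open vSwitch) 2.3.1
    $ uname -r
    3.13.0-48-generic
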
* No bare metal support

  Distributed routers rely on a local L3 agent (residing on the compute
  node) for address translation, so only legacy routers can serve bare
  metal instances.

Deployment impact
-----------------

* Architecture changes

  * Neutron L3 and metadata agents will be deployed on all compute nodes
    and managed by Upstart. The agent deployment scheme on controller
    nodes is not changed.

  * All compute nodes require a bridge to the external network.

* Fuel Library related changes

  * Update the Neutron Puppet module to support DVR-related options (L3
    agent mode, L2 population, distributed router option). This step will
    be done as part of blueprint upgrade-openstack-puppet-modules, when
    all necessary changes are synced from the puppet-neutron project.

  * Update the cloud networking related Puppet modules to deploy Neutron
    L3 and metadata agents on compute nodes with the appropriate
    configuration. This step will likely require changes in Granular
    deployment to execute the Neutron agent related granulars on compute
    nodes.

  * Update the Horizon related Puppet modules to add the ability to use
    Neutron DVR options (create either centralized or distributed
    routers).

* Fuel Web related changes

  * When Neutron DVR is enabled, a network scheme with external bridges
    on all compute nodes should be generated. Possible astute.yaml
    examples:

    .. code-block:: yaml

       # Compute nodes
       network_scheme:
         endpoints:
           br-ex:
             IP: none
       quantum_settings:
         DVR: true

       # Controller nodes
       network_scheme:
         endpoints:
           br-ex:
             IP:
               - 172.16.0.3/24
       quantum_settings:
         DVR: true

Alternatives
------------

None

Data model impact
-----------------

None

REST API impact
---------------

No Fuel REST API changes.

Upgrade impact
--------------

The upgrade path from a legacy to a distributed router is supported. It is
a three-step process:

* neutron router-update router1 --admin_state_up=False

* neutron router-update router1 --distributed=True

* neutron router-update router1 --admin_state_up=True

The distributed-to-legacy migration is not officially supported in Kilo;
it may work, but needs to be tested.

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

Traffic between VMs on different tenant subnets no longer needs to reach a
router on a controller node: it is routed locally on the compute node.
This should increase performance substantially.

Likewise, the Floating IP traffic of a VM reaches the external network
directly from its compute node, instead of going through a router on a
controller node.

Dataplane testing results from a 25-node bare metal environment show
significant performance improvement for both East-West and North-South
(with Floating IPs) scenarios.

Plugin impact
-------------

None

Other deployer impact
---------------------

None

Developer impact
----------------

None

Infrastructure impact
---------------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  obondarev

Other contributors:
  skolekonov (DE)
  kkuznetsova (QA)
  tnurlygayanov (QA)

Mandatory design reviewers:
  svasilenko
  vkuklin
  sgolovatiuk

Work Items
----------

* Patch fuel-library to enable DVR by default

  * This will enable DVR testing at an early stage.

* Scale testing

  * Rally scenarios

  * Shaker scenarios

  * Debugging

  * Bug fixing/backporting from upstream

* Patch fuel-web to add the ability to enable/disable DVR

  * Disable DVR by default.

Dependencies
============

This will likely depend on enabling L2 population for tunneling, which is
a separate effort. However, we will not wait for it; we will enable L2
population as part of the DVR effort if needed.

It also correlates with blueprint upgrade-openstack-puppet-modules, as all
required changes might already be in master in the upstream manifests.

Testing
=======

Manual Acceptance Tests
-----------------------

* On an environment with DVR enabled, check via Horizon or the CLI that a
  created router has the "distributed" attribute set to True (see the
  sketch after this list).

* Boot a VM on a subnet connected to a DVR router. Check external
  connectivity.

* Assign a Floating IP to the VM. Check external connectivity. Ensure the
  VM is reachable from the external network.

* Boot a second VM on a different subnet connected to the same router.
  Ensure inter-subnet connectivity (both VMs can reach each other).

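A minimal CLI check for the first item (``router1`` is a placeholder
router name; the output is abridged):

.. code-block:: console

    $ neutron router-show router1 | grep distributed
    | distributed           | True                                 |
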
Scale
-----

* An environment with DVR enabled should pass all tests currently run on
  the Scale Lab with no significant performance degradation.

* No additional Rally scenarios are needed to test the specifics of DVR.

HA/Destructive Tests
--------------------

All existing HA/destructive tests should pass on an environment with DVR
enabled. Additional scenarios should include:

* East-West HA test

  * Have several VMs from different subnets running on different compute
    nodes. The subnets should be connected to each other and to an
    external network by a DVR router.

  * Shut down all controllers of the environment.

  * Inter-subnet connectivity should be preserved: VMs from different
    subnets/compute nodes should still be able to reach each other.

  * No dataplane downtime is expected.

* North-South HA test

  * Have a VM with a Floating IP running on a subnet connected to an
    external network by a DVR router.

  * Shut down all controllers of the environment.

  * External connectivity should be preserved: the VM should still be able
    to reach the external network.

  * No dataplane downtime is expected.

Data Plane Tests with Shaker
----------------------------

Shaker scenarios should be run on a bare metal environment with DVR
enabled. A significant increase in performance is expected for East-West
and North-South (with Floating IPs) topologies. Some of the results have
already been obtained (see the "Performance Impact" section of this doc).

Documentation Impact
====================

The ability to enable DVR support in Neutron should be documented in the
Fuel Deployment Guide.

References
==========

https://blueprints.launchpad.net/fuel/+spec/neutron-dvr-deployment

https://blueprints.launchpad.net/fuel/+spec/upgrade-openstack-puppet-modules