17 changed files with 3887 additions and 1 deletions
@@ -0,0 +1,276 @@
|
||||
========================================= |
||||
Tricircle Asynchronous Job Management API |
||||
========================================= |
||||
|
||||
Background |
||||
========== |
||||
In the Tricircle, XJob provides OpenStack multi-region functionality. It
receives and processes jobs from the Admin API or the Tricircle Central
Neutron Plugin and handles them asynchronously. For example, when an instance
is booted for the first time for a project, the router, security group rules,
FIP and other resources may not have been created yet in the local Neutron(s).
Unlike the network, subnet and security group resources that must exist before
an instance boots, these resources can be created asynchronously to accelerate
the response to the initial instance booting request. Central Neutron sends
such creation jobs to local Neutron(s) through XJob, and the local Neutron(s)
then handle them at their own pace.
||||
|
||||
Implementation |
||||
============== |
||||
The XJob server may fail occasionally, so tenants and cloud administrators
need to know the job status and delete or redo a failed job if necessary.
The asynchronous job management APIs provide such functionality and are
listed as follows:
||||
|
||||
* Create a job |
||||
|
||||
Create a job to synchronize resource if necessary. |
||||
|
||||
Create Job Request:: |
||||
|
||||
POST /v1.0/jobs |
||||
{ |
||||
"job": { |
||||
"type": "port_delete", |
||||
"project_id": "d01246bc5792477d9062a76332b7514a", |
||||
"resource": { |
||||
"pod_id": "0eb59465-5132-4f57-af01-a9e306158b86", |
||||
"port_id": "8498b903-9e18-4265-8d62-3c12e0ce4314" |
||||
} |
||||
} |
||||
} |
||||
|
||||
Response: |
||||
{ |
||||
"job": { |
||||
"id": "3f4ecf30-0213-4f1f-9cb0-0233bcedb767", |
||||
"project_id": "d01246bc5792477d9062a76332b7514a", |
||||
"type": "port_delete", |
||||
"timestamp": "2017-03-03 11:05:36", |
||||
"status": "NEW", |
||||
"resource": { |
||||
"pod_id": "0eb59465-5132-4f57-af01-a9e306158b86", |
||||
"port_id": "8498b903-9e18-4265-8d62-3c12e0ce4314" |
||||
} |
||||
} |
||||
} |
||||
|
||||
Normal Response Code: 202 |
||||
|
||||
|
||||
* Get a job |
||||
|
||||
Retrieve a job from the Tricircle database. |
||||
|
||||
The detailed information of the job will be shown. Otherwise
a "Resource not found" exception will be returned.
||||
|
||||
List Request:: |
||||
|
||||
GET /v1.0/jobs/3f4ecf30-0213-4f1f-9cb0-0233bcedb767 |
||||
|
||||
Response: |
||||
{ |
||||
"job": { |
||||
"id": "3f4ecf30-0213-4f1f-9cb0-0233bcedb767", |
||||
"project_id": "d01246bc5792477d9062a76332b7514a", |
||||
"type": "port_delete", |
||||
"timestamp": "2017-03-03 11:05:36", |
||||
"status": "NEW", |
||||
"resource": { |
||||
"pod_id": "0eb59465-5132-4f57-af01-a9e306158b86", |
||||
"port_id": "8498b903-9e18-4265-8d62-3c12e0ce4314" |
||||
} |
||||
} |
||||
} |
||||
|
||||
Normal Response Code: 200 |
||||
|
||||
* Get all jobs |
||||
|
||||
Retrieve all of the jobs from the Tricircle database. |
||||
|
||||
List Request:: |
||||
|
||||
GET /v1.0/jobs/detail |
||||
|
||||
Response: |
||||
{ |
||||
"jobs": |
||||
[ |
||||
{ |
||||
"id": "3f4ecf30-0213-4f1f-9cb0-0233bcedb767", |
||||
"project_id": "d01246bc5792477d9062a76332b7514a", |
||||
"type": "port_delete", |
||||
"timestamp": "2017-03-03 11:05:36", |
||||
"status": "NEW", |
||||
"resource": { |
||||
"pod_id": "0eb59465-5132-4f57-af01-a9e306158b86", |
||||
"port_id": "8498b903-9e18-4265-8d62-3c12e0ce4314" |
||||
} |
||||
}, |
||||
{ |
||||
"id": "b01fe514-5211-4758-bbd1-9f32141a7ac2", |
||||
"project_id": "d01246bc5792477d9062a76332b7514a", |
||||
"type": "seg_rule_setup", |
||||
"timestamp": "2017-03-01 17:14:44", |
||||
"status": "FAIL", |
||||
"resource": { |
||||
"project_id": "d01246bc5792477d9062a76332b7514a" |
||||
} |
||||
} |
||||
] |
||||
} |
||||
|
||||
Normal Response Code: 200 |
||||
|
||||
* Get all jobs with filter(s) |
||||
|
||||
Retrieve job(s) from the Tricircle database. We can filter them by |
||||
project ID, job type and job status. If no filter is provided, |
||||
GET /v1.0/jobs will return all jobs. |
||||
|
||||
The response contains a list of jobs. Using filters, a subset of jobs |
||||
will be returned. |
||||
|
||||
List Request:: |
||||
|
||||
GET /v1.0/jobs?project_id=d01246bc5792477d9062a76332b7514a |
||||
|
||||
Response: |
||||
{ |
||||
"jobs": |
||||
[ |
||||
{ |
||||
"id": "3f4ecf30-0213-4f1f-9cb0-0233bcedb767", |
||||
"project_id": "d01246bc5792477d9062a76332b7514a", |
||||
"type": "port_delete", |
||||
"timestamp": "2017-03-03 11:05:36", |
||||
"status": "NEW", |
||||
"resource": { |
||||
"pod_id": "0eb59465-5132-4f57-af01-a9e306158b86", |
||||
"port_id": "8498b903-9e18-4265-8d62-3c12e0ce4314" |
||||
} |
||||
}, |
||||
{ |
||||
"id": "b01fe514-5211-4758-bbd1-9f32141a7ac2", |
||||
"project_id": "d01246bc5792477d9062a76332b7514a", |
||||
"type": "seg_rule_setup", |
||||
"timestamp": "2017-03-01 17:14:44", |
||||
"status": "FAIL", |
||||
"resource": { |
||||
"project_id": "d01246bc5792477d9062a76332b7514a" |
||||
} |
||||
} |
||||
] |
||||
} |
||||
|
||||
Normal Response Code: 200 |
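
Filters can also be combined in one request. A minimal example, assuming the
job type and status filter parameter names mirror the job fields shown
above::

    GET /v1.0/jobs?project_id=d01246bc5792477d9062a76332b7514a&type=seg_rule_setup&status=FAIL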
||||
|
||||
|
||||
* Get all jobs' schemas |
||||
|
||||
Retrieve all jobs' schemas. A user may want to know which resources
are needed for a specific job.
||||
|
||||
List Request:: |
||||
|
||||
GET /v1.0/jobs/schemas |
||||
|
||||
Response:
||||
{ |
||||
"schemas": |
||||
[ |
||||
{ |
||||
"type": "configure_route", |
||||
"resource": ["router_id"] |
||||
}, |
||||
{ |
||||
"type": "router_setup", |
||||
"resource": ["pod_id", "router_id", "network_id"] |
||||
}, |
||||
{ |
||||
"type": "port_delete", |
||||
"resource": ["pod_id", "port_id"] |
||||
}, |
||||
{ |
||||
"type": "seg_rule_setup", |
||||
"resource": ["project_id"] |
||||
}, |
||||
{ |
||||
"type": "update_network", |
||||
"resource": ["pod_id", "network_id"] |
||||
}, |
||||
{ |
||||
"type": "subnet_update", |
||||
"resource": ["pod_id", "subnet_id"] |
||||
}, |
||||
{ |
||||
"type": "shadow_port_setup", |
||||
"resource": [pod_id", "network_id"] |
||||
} |
||||
] |
||||
} |
||||
|
||||
Normal Response Code: 200 |
||||
|
||||
|
||||
* Delete a job |
||||
|
||||
Delete a failed or duplicated job from the Tricircle database.
A pair of curly braces will be returned if the deletion succeeds;
otherwise an exception will be thrown. Moreover, we can list all jobs
to verify whether the job has been deleted successfully.
||||
|
||||
Delete Job Request:: |
||||
|
||||
DELETE /v1.0/jobs/{id} |
||||
|
||||
Response: |
||||
This operation does not return a response body. |
||||
|
||||
Normal Response Code: 200 |
||||
|
||||
|
||||
* Redo a job |
||||
|
||||
Redo a job halted by XJob server corruption or network failures.
The job handler redoes failed jobs at a fixed time interval, but this Admin
API redoes a job immediately. Nothing will be returned for this request,
but we can monitor the job's status through its execution state.
||||
|
||||
Redo Job Request:: |
||||
|
||||
PUT /v1.0/jobs/{id} |
||||
|
||||
Response: |
||||
This operation does not return a response body. |
||||
|
||||
Normal Response Code: 200 |
||||
|
||||
|
||||
Data Model Impact |
||||
================= |
||||
|
||||
In order to manage the jobs for each tenant, we need to filter them by
project ID, so project ID is going to be added to both the AsyncJob model and
the AsyncJobLog model.
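
A minimal sketch of the model change, assuming SQLAlchemy models roughly like
the existing ones in the Tricircle DB layer (the columns shown besides
project_id are illustrative, not the full model)::

    import sqlalchemy as sql
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()


    class AsyncJob(Base):
        __tablename__ = 'async_jobs'

        id = sql.Column('id', sql.String(36), primary_key=True)
        type = sql.Column('type', sql.String(36))
        status = sql.Column('status', sql.String(36))
        timestamp = sql.Column('timestamp', sql.TIMESTAMP)
        # New column so jobs can be filtered per tenant; the same column
        # is added to the AsyncJobLog model.
        project_id = sql.Column('project_id', sql.String(36))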
||||
|
||||
Dependencies |
||||
============ |
||||
|
||||
None |
||||
|
||||
Documentation Impact |
||||
==================== |
||||
|
||||
- Add documentation for asynchronous job management API |
||||
- Add release note for asynchronous job management API |
||||
|
||||
References |
||||
========== |
||||
|
||||
None |
||||
|
@@ -0,0 +1,558 @@
|
||||
======================================== |
||||
Cross Neutron L2 networking in Tricircle |
||||
======================================== |
||||
|
||||
Background |
||||
========== |
||||
The Tricircle provides unified OpenStack API gateway and networking automation |
||||
functionality. Those main functionalities allow cloud operators to manage |
||||
multiple OpenStack instances which are running in one site or multiple sites |
||||
as a single OpenStack cloud. |
||||
|
||||
Each bottom OpenStack instance which is managed by the Tricircle is also called |
||||
a pod. |
||||
|
||||
The Tricircle has the following components: |
||||
|
||||
* Nova API-GW |
||||
* Cinder API-GW |
||||
* Neutron API Server with Neutron Tricircle plugin |
||||
* Admin API |
||||
* XJob |
||||
* DB |
||||
|
||||
Nova API-GW provides the functionality to trigger automatic networking creation
when new VMs are being provisioned. The Neutron Tricircle plug-in provides the
functionality to create cross Neutron L2/L3 networking for new VMs. After the
binding of tenant-id and pod is finished in the Tricircle, Cinder API-GW and
Nova API-GW will pass the Cinder API or Nova API request to the appropriate
bottom OpenStack instance.
||||
|
||||
Please refer to the Tricircle design blueprint[1], especially
'7. Stateless Architecture Proposal', for a detailed description of each
component.
||||
|
||||
|
||||
Problem Description |
||||
=================== |
||||
When a user creates a network through the Neutron API Server, the user can
specify 'availability_zone_hints' (AZ or az is used as shorthand for
availability zone) during network creation[5]. In the Tricircle, 'az_hints'
indicates which AZs the network should be spread into, which is slightly
different from the meaning of 'az_hints' in Neutron[5]. If no 'az_hints' is
specified during network creation, the created network can be spread into any
AZ. If a list of 'az_hints' is given during network creation, the network
should be able to be spread into the AZs suggested by that list.
||||
|
||||
When a user creates VM or Volume, there is also one parameter called |
||||
availability zone. The AZ parameter is used for Volume and VM co-location, so |
||||
that the Volume and VM will be created into same bottom OpenStack instance. |
||||
|
||||
When a VM is being attached to a network, the Tricircle will check whether the
VM's AZ is inside the network's AZ scope. If not, the VM creation will be
rejected.
||||
|
||||
Currently, the Tricircle only supports one pod in one AZ and a network
associated with only one AZ. That means a tenant's network will currently be
presented in only one bottom OpenStack instance, and therefore all VMs
connected to the network will be located in that one bottom OpenStack instance.
If there is more than one pod in one AZ, refer to the dynamic pod binding[6].
||||
|
||||
There are lots of use cases where a tenant needs a network being able to be |
||||
spread out into multiple bottom OpenStack instances in one AZ or multiple AZs. |
||||
|
||||
* Capacity expansion: as tenants add more and more VMs, the capacity of one
  OpenStack instance may not be enough, so a new OpenStack instance has to be
  added to the cloud. But the tenant still wants to add new VMs into the same
  network.

* Cross Neutron network service chaining. Service chaining is based on
  port-pairs. Leveraging the cross Neutron L2 networking capability provided
  by the Tricircle, the chaining could also be done across sites. For example,
  with vRouter1 in pod1 and vRouter2 in pod2, these two VMs could be chained.
||||
|
||||
* Applications are often required to run in different availability zones to
  achieve high availability. An application needs to be designed as
  Active-Standby/Active-Active/N-Way to achieve high availability, and some
  components inside the application are designed to work as a distributed
  cluster. This design typically leads to state replication or heartbeat among
  application components (directly, via replicated database services, or via
  a privately designed message format). When this kind of application is
  deployed across multiple OpenStack instances, cross Neutron L2 networking is
  needed to support the heartbeat or state replication.
||||
|
||||
* When a tenant's VMs are provisioned in different OpenStack instances, there
  is E-W (East-West) traffic between these VMs. The E-W traffic should be
  visible only to the tenant, so isolation is needed. If the traffic goes
  through N-S (North-South) via a tenant-level VPN, the overhead is too high,
  and the orchestration of multiple site-to-site VPN connections is also
  complicated. Therefore cross Neutron L2 networking that bridges the tenant's
  routers in different Neutron servers can provide more lightweight isolation.
||||
|
||||
* In a hybrid cloud, there is a cross Neutron L2 networking requirement
  between the private OpenStack and the public OpenStack. Cross Neutron L2
  networking will help VM migration in this case, as it's not necessary to
  change the IP/MAC/Security Group configuration during migration.
||||
|
||||
The spec[5] is to explain how one AZ can support more than one pod, and how |
||||
to schedule a proper pod during VM or Volume creation. |
||||
|
||||
And this spec is to deal with the cross Neutron L2 networking automation in |
||||
the Tricircle. |
||||
|
||||
The simplest way to spread L2 networking out to multiple OpenStack instances
is to use the same VLAN. But there are a lot of limitations: (1) the number of
VLAN segments is limited, (2) a VLAN network itself is not well suited to
being spread across multiple sites, although you can use some gateways to do
so.
||||
|
||||
So flexible tenant level L2 networking across multiple Neutron servers in |
||||
one site or in multiple sites is needed. |
||||
|
||||
Proposed Change |
||||
=============== |
||||
|
||||
Cross Neutron L2 networking can be divided into three categories, |
||||
``VLAN``, ``Shared VxLAN`` and ``Mixed VLAN/VxLAN``. |
||||
|
||||
* VLAN |
||||
|
||||
Network in each bottom OpenStack is VLAN type and has the same VLAN ID. |
||||
If we want VLAN L2 networking to work in multi-site scenario, i.e., |
||||
Multiple OpenStack instances in multiple sites, physical gateway needs to |
||||
be manually configured to make one VLAN networking be extended to other |
||||
sites. |
||||
|
||||
*Manual setup physical gateway is out of the scope of this spec* |
||||
|
||||
* Shared VxLAN |
||||
|
||||
Network in each bottom OpenStack instance is VxLAN type and has the same |
||||
VxLAN ID. |
||||
|
||||
Leverage L2GW[2][3] to implement this type of L2 networking. |
||||
|
||||
* Mixed VLAN/VxLAN |
||||
|
||||
Network in each bottom OpenStack instance may have different types and/or |
||||
have different segment IDs. |
||||
|
||||
Leverage L2GW[2][3] to implement this type of L2 networking. |
||||
|
||||
There is another network type called "Local Network". A "Local Network" will
be presented in only one bottom OpenStack instance and won't be presented in
other bottom OpenStack instances. If a VM in another pod tries to attach to a
"Local Network", the attachment should fail. This use case is quite useful for
the scenario in which cross Neutron L2 networking is not required, and one AZ
will not include more than one bottom OpenStack instance.
||||
|
||||
Cross Neutron L2 networking will be able to be established dynamically while
a tenant's VM is being provisioned.
||||
|
||||
There is an assumption here that only one type of L2 networking will work in
one cloud deployment.
||||
|
||||
|
||||
A Cross Neutron L2 Networking Creation |
||||
-------------------------------------- |
||||
|
||||
A cross Neutron L2 networking creation will be able to be done with the az_hint
attribute of the network. If az_hint includes one or more AZs, the network
will be presented only in this AZ or these AZs. If there is no AZ in az_hint,
the network can be extended to any bottom OpenStack.
||||
|
||||
There is a special use case for external network creation. For external |
||||
network creation, you need to specify the pod_id but not AZ in the az_hint |
||||
so that the external network will be only created in one specified pod per AZ. |
||||
|
||||
*Support of External network in multiple OpenStack instances in one AZ |
||||
is out of scope of this spec.* |
||||
|
||||
Pluggable L2 networking framework is proposed to deal with three types of |
||||
L2 cross Neutron networking, and it should be compatible with the |
||||
``Local Network``. |
||||
|
||||
1. Type Driver under Tricircle Plugin in Neutron API server |
||||
|
||||
* Type driver to distinguish the different types of cross Neutron L2
  networking, so the Tricircle plugin needs to load the type driver according
  to the configuration. The Tricircle can reuse the type drivers of ML2 with
  updates.
||||
|
||||
* Type driver to allocate VLAN segment id for VLAN L2 networking. |
||||
|
||||
* Type driver to allocate VxLAN segment id for shared VxLAN L2 networking. |
||||
|
||||
* Type driver for mixed VLAN/VxLAN to allocate VxLAN segment id for the |
||||
network connecting L2GWs[2][3]. |
||||
|
||||
* Type driver for Local Network, which only updates ``network_type`` for the
  network in the Tricircle Neutron DB.
||||
|
||||
When a network creation request is received in Neutron API Server in the |
||||
Tricircle, the type driver will be called based on the configured network |
||||
type. |
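
The sketch below illustrates how such a pluggable dispatch could look; the
class and function names are illustrative only and do not describe the final
driver interface::

    class L2TypeDriver(object):
        """Base type driver: allocate the segment for a top network."""

        network_type = None

        def allocate_segment(self, context, network):
            raise NotImplementedError


    class VLANTypeDriver(L2TypeDriver):
        network_type = 'vlan'

        def allocate_segment(self, context, network):
            # A real driver would allocate an unused VLAN ID from the
            # configured ranges; the values here are placeholders.
            return {'network_type': 'vlan',
                    'physical_network': 'physnet1',
                    'segmentation_id': 2016}


    class LocalNetworkTypeDriver(L2TypeDriver):
        network_type = 'local_network'

        def allocate_segment(self, context, network):
            # Local Network only records the network type, no segment ID.
            return {'network_type': 'local_network'}


    # The Tricircle plugin loads the driver named by the configured
    # tenant_network_type and calls it on each network creation request.
    DRIVERS = {cls.network_type: cls() for cls in
               (VLANTypeDriver, LocalNetworkTypeDriver)}


    def allocate_segment(context, network, tenant_network_type):
        return DRIVERS[tenant_network_type].allocate_segment(context, network)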
||||
|
||||
2. Nova API-GW to trigger the bottom networking automation |
||||
|
||||
Nova API-GW is aware of when a new VM is provisioned because it receives the
boot VM API request, therefore Nova API-GW is responsible for the network
creation in the bottom OpenStack instances.
||||
|
||||
Nova API-GW needs to get the network type from Neutron API server in the |
||||
Tricircle, and deal with the networking automation based on the network type: |
||||
|
||||
* VLAN
  Nova API-GW creates the network in the bottom OpenStack instance in which
  the VM will run, with the VLAN segment id, network name and type that are
  retrieved from the Neutron API server in the Tricircle.
||||
|
||||
* Shared VxLAN
  Nova API-GW creates the network in the bottom OpenStack instance in which
  the VM will run, with the VxLAN segment id, network name and type which are
  retrieved from the Tricircle Neutron API server. After the network in the
  bottom OpenStack instance is created successfully, Nova API-GW needs to
  register this network in the bottom OpenStack instance as one of the
  segments of the network in the Tricircle.
||||
|
||||
* Mixed VLAN/VxLAN
  Nova API-GW creates the network in each bottom OpenStack instance in which a
  VM will run, with the VLAN or VxLAN segment id respectively, and the network
  name and type which are retrieved from the Tricircle Neutron API server.
  After the networks in the bottom OpenStack instances are created
  successfully, Nova API-GW needs to update the network in the Tricircle with
  the segmentation information of the bottom networks.
||||
|
||||
3. L2GW driver under Tricircle Plugin in Neutron API server |
||||
|
||||
Tricircle plugin needs to support multi-segment network extension[4]. |
||||
|
||||
For Shared VxLAN or Mixed VLAN/VxLAN L2 network type, L2GW driver will utilize the |
||||
multi-segment network extension in Neutron API server to build the L2 network in the |
||||
Tricircle. Each network in the bottom OpenStack instance will be a segment for the |
||||
whole cross Neutron L2 networking in the Tricircle. |
||||
|
||||
After the network in the bottom OpenStack instance is created successfully,
Nova API-GW will call the Neutron server API to update the network in the
Tricircle with a new segment from the network in the bottom OpenStack
instance.

If the network in the bottom OpenStack instance is removed successfully, Nova
API-GW will call the Neutron server API to remove the corresponding segment
from the network in the Tricircle.
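
For illustration, the segment update could look like the following request
towards the Tricircle Neutron API server, reusing the multi-provider network
format from [4] (the values are examples; whether such an update is accepted
is up to the central plugin implementation)::

    PUT /v2.0/networks/{network_id}
    {
        "network": {
            "segments": [
                {
                    "provider:network_type": "vxlan",
                    "provider:physical_network": null,
                    "provider:segmentation_id": 1001
                },
                {
                    "provider:network_type": "vxlan",
                    "provider:physical_network": "physnet1#pod_id_1",
                    "provider:segmentation_id": 1001
                }
            ]
        }
    }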
||||
|
||||
When the L2GW driver under the Tricircle plugin in the Neutron API server
receives the segment update request, the L2GW driver will start an async job
to orchestrate the L2GW API for L2 networking automation[2][3].
||||
|
||||
|
||||
Data model impact |
||||
----------------- |
||||
|
||||
In the database, we are considering setting physical_network in the top
OpenStack instance to ``bottom_physical_network#bottom_pod_id`` to distinguish
the segmentation information of different bottom OpenStack instances.
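
A small helper sketch for composing and parsing this value (names are
illustrative)::

    SEPARATOR = '#'


    def build_top_physical_network(bottom_physical_network, bottom_pod_id):
        # e.g. 'physnet1' + '#' + '0eb59465-...' -> 'physnet1#0eb59465-...'
        return bottom_physical_network + SEPARATOR + bottom_pod_id


    def parse_top_physical_network(top_physical_network):
        bottom_physical_network, bottom_pod_id = top_physical_network.split(
            SEPARATOR, 1)
        return bottom_physical_network, bottom_pod_id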
||||
|
||||
REST API impact |
||||
--------------- |
||||
|
||||
None |
||||
|
||||
Security impact |
||||
--------------- |
||||
|
||||
None |
||||
|
||||
Notifications impact |
||||
-------------------- |
||||
|
||||
None |
||||
|
||||
Other end user impact |
||||
--------------------- |
||||
|
||||
None |
||||
|
||||
Performance Impact |
||||
------------------ |
||||
|
||||
None |
||||
|
||||
Other deployer impact |
||||
--------------------- |
||||
|
||||
None |
||||
|
||||
Developer impact |
||||
---------------- |
||||
|
||||
None |
||||
|
||||
|
||||
Implementation |
||||
---------------- |
||||
|
||||
**Local Network Implementation** |
||||
|
||||
For Local Network, L2GW is not required. In this scenario, no cross Neutron L2/L3 |
||||
networking is required. |
||||
|
||||
A user creates network ``Net1`` with single AZ1 in az_hint, the Tricircle plugin |
||||
checks the configuration, if ``tenant_network_type`` equals ``local_network``, |
||||
it will invoke Local Network type driver. Local Network driver under the |
||||
Tricircle plugin will update ``network_type`` in database. |
||||
|
||||
For example, a user creates VM1 in AZ1 which has only one pod ``POD1``, and |
||||
connects it to network ``Net1``. ``Nova API-GW`` will send network creation |
||||
request to ``POD1`` and the VM will be booted in AZ1 (There should be only one |
||||
pod in AZ1). |
||||
|
||||
If a user wants to create VM2 in AZ2 or ``POD2`` in AZ1, and connect it to |
||||
network ``Net1`` in the Tricircle, it would be failed. Because the ``Net1`` is |
||||
local_network type network and it is limited to present in ``POD1`` in AZ1 only. |
||||
|
||||
**VLAN Implementation** |
||||
|
||||
For VLAN, L2GW is not required. This is the simplest cross Neutron
L2 networking, for limited scenarios. For example, with a small number of
networks, all VLANs are extended through a physical gateway to support cross
Neutron VLAN networking, or all Neutron servers under the same core switch,
with the same visible VLAN ranges supported by that core switch, are connected
by the core switch.
||||
|
||||
When a user creates a network called ``Net1``, the Tricircle plugin checks the
configuration. If ``tenant_network_type`` equals ``vlan``, the
Tricircle will invoke the VLAN type driver. The VLAN driver will
create the ``segment``, assign ``network_type`` with VLAN, and update
``segment``, ``network_type`` and ``physical_network`` in the DB.
||||
|
||||
A user creates VM1 in AZ1, and connects it to network Net1. If VM1 will be |
||||
booted in ``POD1``, ``Nova API-GW`` needs to get the network information and |
||||
send network creation message to ``POD1``. Network creation message includes |
||||
``network_type`` and ``segment`` and ``physical_network``. |
||||
|
||||
Then the user creates VM2 in AZ2, and connects it to network Net1. If VM will |
||||
be booted in ``POD2``, ``Nova API-GW`` needs to get the network information and |
||||
send create network message to ``POD2``. Create network message includes |
||||
``network_type`` and ``segment`` and ``physical_network``. |
||||
|
||||
**Shared VxLAN Implementation** |
||||
|
||||
A user creates network ``Net1``, and the Tricircle plugin checks the
configuration. If ``tenant_network_type`` equals ``shared_vxlan``, it will
invoke the shared VxLAN driver. The shared VxLAN driver will allocate the
``segment``, assign ``network_type`` with VxLAN, and update the network with
``segment`` and ``network_type`` in the DB.
||||
|
||||
A user creates VM1 in AZ1, and connects it to network ``Net1``. If VM1 will be
booted in ``POD1``, ``Nova API-GW`` needs to get the network information and
send a network creation message to ``POD1``; the network creation message
includes ``network_type`` and ``segment``.

``Nova API-GW`` should update ``Net1`` in the Tricircle with the segment
information got from ``POD1``.
||||
|
||||
Then the user creates VM2 in AZ2, and connects it to network ``Net1``. If VM2
will be booted in ``POD2``, ``Nova API-GW`` needs to get the network
information and send a network creation message to ``POD2``; the network
creation message includes ``network_type`` and ``segment``.

``Nova API-GW`` should update ``Net1`` in the Tricircle with the segment
information got from ``POD2``.
||||
|
||||
The Tricircle plugin detects that the network includes more than one segment |
||||
network, calls L2GW driver to start async job for cross Neutron networking for |
||||
``Net1``. The L2GW driver will create L2GW1 in ``POD1`` and L2GW2 in ``POD2``. In |
||||
``POD1``, L2GW1 will connect the local ``Net1`` and create L2GW remote connection |
||||
to L2GW2, then populate the information of MAC/IP which resides in L2GW1. In |
||||
``POD2``, L2GW2 will connect the local ``Net1`` and create L2GW remote connection |
||||
to L2GW1, then populate remote MAC/IP information which resides in ``POD1`` in L2GW2. |
||||
|
||||
The L2GW driver in the Tricircle will also detect new port creation/deletion
API requests. If a port (MAC/IP) is created or deleted in ``POD1`` or
``POD2``, it needs to refresh the MAC/IP information in the peer L2GW.

Whether to populate port (MAC/IP) information should be configurable according
to the L2GW capability, and MAC/IP information should only be populated for
the ports that do not reside in the same pod.
||||
|
||||
**Mixed VLAN/VxLAN** |
||||
|
||||
To achieve cross Neutron L2 networking, L2GW will be used to connect the L2
networks in different Neutron servers; using L2GW should work for both the
Shared VxLAN and the Mixed VLAN/VxLAN scenarios.

When L2GW is connected with a local network in the same OpenStack instance,
no matter whether it's VLAN, VxLAN or GRE, the L2GW should be able to connect
the local network, and because L2GW is an extension of Neutron, only the
network UUID should be enough for L2GW to connect the local network.
||||
|
||||
When an admin user creates a network in the Tricircle, he/she specifies the
network type as one of the network types discussed above. In the phase of
creating the network in the Tricircle, only one record is saved in the
database; no network will be created in the bottom OpenStack.

After the network in the bottom is created successfully, ``Nova API-GW`` needs
to retrieve the network information like segment id, network name and network
type, and register this network in the bottom pod as one of the segments of
the network in the Tricircle.
||||
|
||||
In the Tricircle, a network could be created by a tenant or an admin. A tenant
has no way to specify the network type and segment id, so the default network
type will be used instead. When the user uses the network to boot a VM,
``Nova API-GW`` checks the network type. For a Mixed VLAN/VxLAN network,
``Nova API-GW`` first creates the network in the bottom OpenStack without
specifying the network type and segment ID, then updates the top network with
the bottom network segmentation information returned by the bottom OpenStack.
||||
|
||||
A user creates network ``Net1``, and the plugin checks the configuration. If
``tenant_network_type`` equals ``mixed_vlan_vxlan``, it will invoke the mixed
VLAN and VxLAN driver. The driver needs to do nothing since the segment is
allocated in the bottom.
||||
|
||||
A user creates VM1 in AZ1, and connects it to the network ``Net1``, the VM is |
||||
booted in bottom ``POD1``, and ``Nova API-GW`` creates network in ``POD1`` and |
||||
queries the network detail segmentation information (using admin role), and |
||||
gets network type, segment id, then updates this new segment to the ``Net1`` |
||||
in Tricircle ``Neutron API Server``. |
||||
|
||||
Then the user creates another VM2 with AZ info AZ2, so the VM should be booted
in bottom ``POD2``, which is located in AZ2. When VM2 is booted in AZ2,
``Nova API-GW`` also creates a network in ``POD2``, queries the network
information including segment and network type, and updates this new segment
to ``Net1`` in the Tricircle ``Neutron API Server``.
||||
|
||||
The Tricircle plugin detects that the ``Net1`` includes more than one network |
||||
segments, calls L2GW driver to start async job for cross Neutron networking for |
||||
``Net1``. The L2GW driver will create L2GW1 in ``POD1`` and L2GW2 in ``POD2``. In |
||||
``POD1``, L2GW1 will connect the local ``Net1`` and create L2GW remote connection |
||||
to L2GW2, then populate information of MAC/IP which resides in ``POD2`` in L2GW1. |
||||
In ``POD2``, L2GW2 will connect the local ``Net1`` and create L2GW remote connection |
||||
to L2GW1, then populate remote MAC/IP information which resides in ``POD1`` in L2GW2. |
||||
|
||||
The L2GW driver in the Tricircle will also detect new port creation/deletion
API calls. If a port (MAC/IP) is created or deleted in ``POD1``, it needs to
refresh the L2GW2 MAC/IP information. If a port (MAC/IP) is created or deleted
in ``POD2``, it needs to refresh the L2GW1 MAC/IP information.

Whether to populate MAC/IP information should be configurable according to
the L2GW capability, and MAC/IP information should only be populated for the
ports that do not reside in the same pod.
||||
|
||||
**L3 bridge network** |
||||
|
||||
Current implementation without cross Neutron L2 networking. |
||||
|
||||
* A special bridge network is created and connected to the routers in |
||||
different bottom OpenStack instances. We configure the extra routes of the routers |
||||
to route the packets from one OpenStack to another. In current |
||||
implementation, we create this special bridge network in each bottom |
||||
OpenStack with the same ``VLAN ID``, so we have an L2 network to connect |
||||
the routers. |
||||
|
||||
Difference between L2 networking for tenant's VM and for L3 bridging network. |
||||
|
||||
* The creation of bridge network is triggered during attaching router |
||||
interface and adding router external gateway. |
||||
|
||||
* The L2 network for a VM is triggered by ``Nova API-GW`` when a VM is to be
  created in one pod and it finds that there is no network there; the network
  will then be created before the VM is booted, since a network or port
  parameter is required to boot a VM. The IP/MAC for the VM is allocated in
  the ``Tricircle`` top layer to avoid IP/MAC collisions, which could happen
  if they were allocated separately in the bottom pods.
||||
|
||||
After cross Neutron L2 networking is introduced, the L3 bridge network should |
||||
be updated too. |
||||
|
||||
L3 bridge network N-S (North-South): |
||||
|
||||
* For each tenant, one cross Neutron N-S bridge network should be created for |
||||
router N-S inter-connection. Just replace the current VLAN N-S bridge network |
||||
to corresponding Shared VxLAN or Mixed VLAN/VxLAN. |
||||
|
||||
L3 bridge network E-W (East-West): |
||||
|
||||
* When a router interface is attached, for VLAN the current process of
  establishing the E-W bridge network is kept. For Shared VxLAN and Mixed
  VLAN/VxLAN, if an L2 network is able to expand to the current pod, then just
  expand the L2 network to the pod; all E-W traffic will go out from the local
  L2 network, so no bridge network is needed.
||||
|
||||
* For example, with (Net1, Router1) in ``Pod1`` and (Net2, Router1) in
  ``Pod2``, if ``Net1`` is a cross Neutron L2 network and can be expanded to
  Pod2, then we just expand ``Net1`` to Pod2. After the ``Net1`` expansion
  (just like cross Neutron L2 networking spreading one network over multiple
  Neutron servers), it'll look like (Net1, Router1) in ``Pod1`` and
  (Net1, Net2, Router1) in ``Pod2``. In ``Pod2`` there is no VM in ``Net1``;
  it is only for E-W traffic. Now the E-W traffic will look like this:
||||
|
||||
from Net2 to Net1: |
||||
|
||||
Net2 in Pod2 -> Router1 in Pod2 -> Net1 in Pod2 -> L2GW in Pod2 ---> L2GW in |
||||
Pod1 -> Net1 in Pod1. |
||||
|
||||
Note: The traffic from ``Net1`` in ``Pod2`` to ``Net1`` in ``Pod1`` can bypass
the L2GW in ``Pod2``. That means outbound traffic can bypass the local L2GW if
the remote VTEP of the L2GW is known to the local compute node and the packet
from the local compute node with VxLAN encapsulation could be routed to the
remote L2GW directly; it's up to the L2GW implementation. With the inbound
traffic going through the L2GW, the inbound traffic to the VM will not be
impacted by the VM migration from one host to another.
||||
|
||||
If ``Net2`` is a cross Neutron L2 network and can be expanded to ``Pod1`` too,
then we just expand ``Net2`` to ``Pod1``. After the ``Net2`` expansion (just
like cross Neutron L2 networking spreading one network over multiple Neutron
servers), it'll look like (Net2, Net1, Router1) in ``Pod1`` and
(Net1, Net2, Router1) in ``Pod2``. In ``Pod1`` there is no VM in ``Net2``; it
is only for E-W traffic. Now the E-W traffic from ``Net1`` to ``Net2`` will
look like this:
||||
|
||||
Net1 in Pod1 -> Router1 in Pod1 -> Net2 in Pod1 -> L2GW in Pod1 ---> L2GW in |
||||
Pod2 -> Net2 in Pod2. |
||||
|
||||
To limit the complexity, a network's az_hint can only be specified at creation
time and no update is allowed; if the az_hint needs to be updated, you have to
delete the network and create it again.
||||
|
||||
If the network can't be expanded, then E-W bridge network is needed. For |
||||
example, Net1(AZ1, AZ2,AZ3), Router1; Net2(AZ4, AZ5, AZ6), Router1. |
||||
Then a cross Neutron L2 bridge network has to be established: |
||||
|
||||
Net1(AZ1, AZ2, AZ3), Router1 --> E-W bridge network ---> Router1, |
||||
Net2(AZ4, AZ5, AZ6). |
||||
|
||||
Assignee(s) |
||||
------------ |
||||
|
||||
Primary assignee: |
||||
|
||||
|
||||
Other contributors: |
||||
|
||||
|
||||
Work Items |
||||
------------ |
||||
|
||||
Dependencies |
||||
---------------- |
||||
|
||||
None |
||||
|
||||
|
||||
Testing |
||||
---------------- |
||||
|
||||
None |
||||
|
||||
|
||||
References |
||||
---------------- |
||||
[1] https://docs.google.com/document/d/18kZZ1snMOCD9IQvUKI5NVDzSASpw-QKj7l2zNqMEd3g/ |
||||
|
||||
[2] https://review.openstack.org/#/c/270786/ |
||||
|
||||
[3] https://github.com/openstack/networking-l2gw/blob/master/specs/kilo/l2-gateway-api.rst |
||||
|
||||
[4] http://developer.openstack.org/api-ref-networking-v2-ext.html#networks-multi-provider-ext |
||||
|
||||
[5] http://docs.openstack.org/mitaka/networking-guide/adv-config-availability-zone.html |
||||
|
||||
[6] https://review.openstack.org/#/c/306224/ |
@@ -0,0 +1,233 @@
|
||||
=========================================== |
||||
Cross Neutron VxLAN Networking in Tricircle |
||||
=========================================== |
||||
|
||||
Background |
||||
========== |
||||
|
||||
Currently we only support VLAN as the cross-Neutron network type. For the VLAN
network type, the central plugin in the Tricircle picks a physical network and
allocates a VLAN tag (or uses what users specify); then, before the creation
of the local network, the local plugin queries this provider network
information and creates the network based on it. The Tricircle only
guarantees that instance packets sent
||||
out of hosts in different pods belonging to the same VLAN network will be tagged |
||||
with the same VLAN ID. Deployers need to carefully configure physical networks |
||||
and switch ports to make sure that packets can be transported correctly between |
||||
physical devices. |
||||
|
||||
For more flexible deployment, the VxLAN network type is a better choice.
Compared to the 12-bit VLAN ID, the 24-bit VxLAN ID can support a much larger
number of bridge networks and cross-Neutron L2 networks. With the MAC-in-UDP
encapsulation of VxLAN networks, hosts in different pods only need to be IP
routable to transport instance packets.
||||
|
||||
Proposal |
||||
======== |
||||
|
||||
There are some challenges to support cross-Neutron VxLAN network. |
||||
|
||||
1. How to keep VxLAN ID identical for the same VxLAN network across Neutron servers |
||||
|
||||
2. How to synchronize tunnel endpoint information between pods |
||||
|
||||
3. How to trigger L2 agents to build tunnels based on this information |
||||
|
||||
4. How to support different back-ends, like ODL, L2 gateway |
||||
|
||||
The first challenge can be solved in the same way as for the VLAN network: we
allocate the VxLAN ID in the central plugin and the local plugin will use the
same VxLAN ID to create the local network. For the second challenge, we
introduce a new table called "shadow_agents" in the Tricircle database, so the
central plugin can save the tunnel endpoint information collected from one
local Neutron server in this table and use it to populate the information to
other local Neutron servers when needed. Here is the schema of the table:
||||
|
||||
.. csv-table:: Shadow Agent Table |
||||
:header: Field, Type, Nullable, Key, Default |
||||
|
||||
id, string, no, primary, null |
||||
pod_id, string, no, , null |
||||
host, string, no, unique, null |
||||
type, string, no, unique, null |
||||
tunnel_ip, string, no, , null |
||||
|
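
A minimal SQLAlchemy sketch of this table derived from the schema above; the
"unique" marks on host and type are modeled here as one composite unique
constraint, and the column lengths are illustrative::

    import sqlalchemy as sql
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()


    class ShadowAgent(Base):
        __tablename__ = 'shadow_agents'
        __table_args__ = (sql.UniqueConstraint('host', 'type'),)

        id = sql.Column('id', sql.String(36), primary_key=True)
        pod_id = sql.Column('pod_id', sql.String(36), nullable=False)
        host = sql.Column('host', sql.String(255), nullable=False)
        type = sql.Column('type', sql.String(36), nullable=False)
        # VTEP address used by peers to build tunnels to this host.
        tunnel_ip = sql.Column('tunnel_ip', sql.String(48), nullable=False)
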
||||
**How to collect tunnel endpoint information** |
||||
|
||||
When the host where a port will be located is determined, local Neutron server |
||||
will receive a port-update request containing host ID in the body. During the |
||||
process of this request, local plugin can query agent information that contains |
||||
tunnel endpoint information from local Neutron database with host ID and port |
||||
VIF type; then send tunnel endpoint information to central Neutron server by |
||||
issuing a port-update request with this information in the binding profile. |
||||
|
||||
**How to populate tunnel endpoint information** |
||||
|
||||
When the tunnel endpoint information in one pod is needed to be populated to |
||||
other pods, XJob will issue port-create requests to corresponding local Neutron |
||||
servers with tunnel endpoint information queried from Tricircle database in the |
||||
bodies. After receiving such request, local Neutron server will save tunnel |
||||
endpoint information by calling real core plugin's "create_or_update_agent" |
||||
method. This method comes from neutron.db.agent_db.AgentDbMixin class. Plugins |
||||
that support "agent" extension will have this method. Actually there's no such |
||||
agent daemon running in the target local Neutron server, but we insert a record |
||||
for it in the database so the local Neutron server will assume there exists an |
||||
agent. That's why we call it shadow agent. |
||||
|
||||
The proposed solution for the third challenge is based on the shadow agent and |
||||
L2 population mechanism. In the original Neutron process, if the port status |
||||
is updated to active, L2 population mechanism driver does two things. First, |
||||
driver checks if the updated port is the first port in the target agent. If so, |
||||
driver collects tunnel endpoint information of other ports in the same network, |
||||
then sends the information to the target agent via RPC. Second, driver sends |
||||
the tunnel endpoint information of the updated port to other agents where ports |
||||
in the same network are located, also via RPC. L2 agents will build the tunnels |
||||
based on the information they received. To trigger the above processes to build |
||||
tunnels across Neutron servers, we further introduce shadow port. |
||||
|
||||
Let's say we have two instance ports, port1 is located in host1 in pod1 and |
||||
port2 is located in host2 in pod2. To make L2 agent running in host1 build a |
||||
tunnel to host2, we create a port with the same properties of port2 in pod1. |
||||
As discussed above, local Neutron server will create shadow agent during the |
||||
process of port-create request, so local Neutron server in pod1 won't complain |
||||
that host2 doesn't exist. To trigger L2 population process, we then update the |
||||
port status to active, so L2 agent in host1 will receive tunnel endpoint |
||||
information of port2 and build the tunnel. Port status is a read-only property |
||||
so we can't directly update it via ReSTful API. Instead, we issue a port-update |
||||
request with a special key in the binding profile. After local Neutron server |
||||
receives such request, it pops the special key from the binding profile and |
||||
updates the port status to active. XJob daemon will take the job to create and |
||||
update shadow ports. |
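
A rough sketch of how the local plugin could treat such an update; the profile
key name used here is hypothetical, only for illustration::

    FORCE_UP_KEY = 'force_up'  # assumed special key, not the real name


    def process_shadow_port_update(port_body):
        # Pop the special key from the binding profile and mark the port
        # active so that the L2 population driver is triggered.
        profile = port_body.get('binding:profile', {})
        if profile.pop(FORCE_UP_KEY, None) is not None:
            port_body['status'] = 'ACTIVE'
        return port_body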
||||
|
||||
Here is the flow of shadow agent and shadow port process:: |
||||
|
||||
+-------+ +---------+ +---------+ |
||||
| | | | +---------+ | | |
||||
| Local | | Local | | | +----------+ +------+ | Local | |
||||
| Nova | | Neutron | | Central | | | | | | Neutron | |
||||
| Pod1 | | Pod1 | | Neutron | | Database | | XJob | | Pod2 | |
||||
| | | | | | | | | | | | |
||||
+---+---+ +---- ----+ +----+----+ +----+-----+ +--+---+ +----+----+ |
||||
| | | | | | |
||||
| update port1 | | | | | |
||||
| [host id] | | | | | |
||||
+---------------> | | | | |
||||
| | update port1 | | | | |
||||
| | [agent info] | | | | |
||||
| +----------------> | | | |
||||
| | | save shadow | | | |
||||
| | | agent info | | | |
||||
| | +----------------> | | |
||||
| | | | | | |
||||
| | | trigger shadow | | | |
||||
| | | port setup job | | | |
||||
| | | for pod1 | | | |
||||
| | +---------------------------------> | |
||||
| | | | | query ports in | |
||||
| | | | | the same network | |
||||
| | | | +------------------> |
||||
| | | | | | |
||||
| | | | | return port2 | |
||||
| | | | <------------------+ |
||||
| | | | query shadow | | |
||||
| | | | agent info | | |
||||
| | | | for port2 | | |
||||
| | | <----------------+ | |
||||
| | | | | | |
||||
| | | | create shadow | | |
||||
| | | | port for port2 | | |
||||
| <--------------------------------------------------+ | |
||||
| | | | | | |
||||
| | create shadow | | | | |
||||
| | agent and port | | | | |
||||
| +-----+ | | | | |
||||
| | | | | | | |
||||
| | | | | | | |
||||
| <-----+ | | | | |
||||
| | | | update shadow | | |
||||
| | | | port to active | | |
||||
| <--------------------------------------------------+ | |
||||
| | | | | | |
||||
| | L2 population | | | trigger shadow | |
||||
| +-----+ | | | port setup job | |
||||
| | | | | | for pod2 | |
||||
| | | | | +-----+ | |
||||
| <-----+ | | | | | |
||||
| | | | | | | |
||||
| | | | <-----+ | |
||||
| | | | | | |
||||
| | | | | | |
||||
+ + + + + + |
||||
|
||||
Bridge network can support VxLAN network in the same way, we just create shadow |
||||
ports for router interface and router gateway. In the above graph, local Nova |
||||
server updates port with host ID to trigger the whole process. L3 agent will |
||||
update interface port and gateway port with host ID, so similar process will |
||||
be triggered to create shadow ports for router interface and router gateway. |
||||
|
||||
Currently Neutron team is working on push notification [1]_, Neutron server |
||||
will send resource data to agents; agents cache this data and use it to do the |
||||
real job like configuring openvswitch, updating iptables, configuring dnsmasq, |
||||
etc. Agents don't need to retrieve resource data from Neutron server via RPC |
||||
any more. Based on push notification, if tunnel endpoint information is stored |
||||
in port object later, and this information supports updating via ReSTful API, |
||||
we can simplify the solution for challenge 3 and 4. We just need to create |
||||
shadow port containing tunnel endpoint information. This information will be |
||||
pushed to agents and agents use it to create necessary tunnels and flows. |
||||
|
||||
**How to support different back-ends besides ML2+OVS implementation** |
||||
|
||||
We consider two typical back-ends that can support cross-Neutron VxLAN networking, |
||||
L2 gateway and SDN controller like ODL. For L2 gateway, we consider only |
||||
supporting static tunnel endpoint information for L2 gateway at the first step. |
||||
Shadow agent and shadow port process is almost the same with the ML2+OVS |
||||
implementation. The difference is that, for L2 gateway, the tunnel IP of the |
||||
shadow agent is set to the tunnel endpoint of the L2 gateway. So after L2 |
||||
population, L2 agents will create tunnels to the tunnel endpoint of the L2 |
||||
gateway. For SDN controller, we assume that SDN controller has the ability to |
||||
manage tunnel endpoint information across Neutron servers, so Tricircle only helps to |
||||
allocate VxLAN ID and keep the VxLAN ID identical across Neutron servers for one network. |
||||
Shadow agent and shadow port process will not be used in this case. However, if |
||||
different SDN controllers are used in different pods, it will be hard for each |
||||
SDN controller to connect hosts managed by other SDN controllers since each SDN |
||||
controller has its own mechanism. This problem is discussed in this page [2]_. |
||||
One possible solution under Tricircle is as what L2 gateway does. We create |
||||
shadow ports that contain L2 gateway tunnel endpoint information so SDN |
||||
controller can build tunnels in its own way. We then configure L2 gateway in |
||||
each pod to forward the packets between L2 gateways. L2 gateways discussed here |
||||
are mostly hardware based, and can be controlled by SDN controller. SDN |
||||
controller will use ML2 mechanism driver to receive the L2 network context and |
||||
further control L2 gateways for the network. |
||||
|
||||
To distinguish different back-ends, we will add a new configuration option |
||||
cross_pod_vxlan_mode whose valid values are "p2p", "l2gw" and "noop". Mode |
||||
"p2p" works for the ML2+OVS scenario, in this mode, shadow ports and shadow |
||||
agents containing host tunnel endpoint information are created; mode "l2gw" |
||||
works for the L2 gateway scenario, in this mode, shadow ports and shadow agents |
||||
containing L2 gateway tunnel endpoint information are created. For the SDN |
||||
controller scenario, as discussed above, if SDN controller can manage tunnel |
||||
endpoint information by itself, we only need to use "noop" mode, meaning that |
||||
neither shadow ports nor shadow agents will be created; or if SDN controller |
||||
can manage hardware L2 gateway, we can use "l2gw" mode. |
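
For example, a deployment based on ML2+OVS would configure (the section name
shown here is an assumption for illustration)::

    [tricircle]
    # p2p: ML2+OVS, l2gw: L2 gateway, noop: SDN controller manages
    # tunnel endpoint information by itself
    cross_pod_vxlan_mode = p2p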
||||
|
||||
Data Model Impact |
||||
================= |
||||
|
||||
New table "shadow_agents" is added. |
||||
|
||||
Dependencies |
||||
============ |
||||
|
||||
None |
||||
|
||||
Documentation Impact |
||||
==================== |
||||
|
||||
- Update configuration guide to introduce options for VxLAN network |
||||
- Update networking guide to discuss new scenarios with VxLAN network |
||||
- Add release note about cross-Neutron VxLAN networking support |
||||
|
||||
References |
||||
========== |
||||
|
||||
.. [1] https://blueprints.launchpad.net/neutron/+spec/push-notifications |
||||
.. [2] http://etherealmind.com/help-wanted-stitching-a-federated-sdn-on-openstack-with-evpn/ |
@@ -0,0 +1,18 @@
|
||||
Devspecs Guide |
||||
------------------ |
||||
Some specs for developers who are interested in the Tricircle.
||||
|
||||
.. include:: ./async_job_management.rst |
||||
.. include:: ./cross-neutron-l2-networking.rst |
||||
.. include:: ./cross-neutron-vxlan-networking.rst |
||||
.. include:: ./dynamic-pod-binding.rst |
||||
.. include:: ./enhance-xjob-reliability.rst |
||||
.. include:: ./l3-networking-combined-bridge-net.rst |
||||
.. include:: ./l3-networking-multi-NS-with-EW-enabled.rst |
||||
.. include:: ./lbaas.rst |
||||
.. include:: ./legacy_tables_clean.rst |
||||
.. include:: ./local-neutron-plugin.rst |
||||
.. include:: ./new-l3-networking-mulit-NS-with-EW.rst |
||||
.. include:: ./quality-of-service.rst |
||||
.. include:: ./resource_deleting.rst |
||||
.. include:: ./smoke-test-engine.rst |
@@ -0,0 +1,236 @@
|
||||
================================= |
||||
Dynamic Pod Binding in Tricircle |
||||
================================= |
||||
|
||||
Background |
||||
=========== |
||||
|
||||
Most public cloud infrastructure is built with Availability Zones (AZs).
Each AZ consists of one or more discrete data centers, each with high
bandwidth and low latency network connections, separate power and facilities.
These AZs offer cloud tenants the ability to operate production applications
and databases; deployed into multiple AZs, they are more highly available,
fault tolerant and scalable than in a single data center.
||||
|
||||
In production clouds, each AZ is built from modularized OpenStack instances,
and each OpenStack instance is one pod. Moreover, one AZ can include multiple
pods. The pods are classified into different categories. For example, servers
in one pod may be only for general purposes, while other pods may be built for
heavy-load CAD modeling with GPUs. So the pods in one AZ could be divided into
different groups: different pod groups serve different purposes, and the cost
and performance of their VMs also differ.
||||
|
||||
The concept "pod" is created for the Tricircle to facilitate managing |
||||
OpenStack instances among AZs, which therefore is transparent to cloud |
||||
tenants. The Tricircle maintains and manages a pod binding table which |
||||
records the mapping relationship between a cloud tenant and pods. When the |
||||
cloud tenant creates a VM or a volume, the Tricircle tries to assign a pod |
||||
based on the pod binding table. |
||||
|
||||
Motivation |
||||
=========== |
||||
|
||||
In the resource allocation scenario, suppose a tenant creates a VM in one pod
and a new volume in another pod. If the tenant attempts to attach the volume
to the VM, the operation will fail. In other words, the volume should be in
the same pod as the VM, otherwise the volume and VM would not be able to
finish the attachment. Hence, the Tricircle needs to ensure the pod binding so
as to guarantee that the VM and volume are created in one pod.
||||
|
||||
In the capacity expansion scenario, when the resources in one pod are
exhausted, a new pod of the same type should be added into the AZ. Therefore,
new resources of this type should be provisioned in the newly added pod, which
requires a dynamic change of pod binding. The pod binding could be done
dynamically by the Tricircle, or by an admin through the Admin API for
maintenance purposes. For example, during a maintenance (upgrade, repair)
window, all new provisioning requests should be forwarded to the running pod,
not the one under maintenance.
||||
|
||||
Solution: dynamic pod binding |
||||
============================== |
||||
|
||||
Capacity expansion inside one pod is quite a headache: you have to estimate,
calculate, monitor, simulate, test, and do online grey expansion for
controller nodes and network nodes whenever you add new machines to the pod.
It's quite a big challenge as more and more resources are added to one pod,
and at last you will reach the limit of one OpenStack instance. If this pod's
resources are exhausted or reach the limit for new resource provisioning, the
Tricircle needs to bind the tenant to a new pod instead of expanding the
current pod unlimitedly. The Tricircle needs to select a proper pod and keep
the binding for a duration; in this duration, VMs and volumes will be created
for one tenant in the same pod.
||||
|
||||
For example, suppose we have two groups of pods, and each group has 3 pods, |
||||
i.e., |
||||
|
||||
GroupA(Pod1, Pod2, Pod3) for general purpose VM, |
||||
|
||||
GroupB(Pod4, Pod5, Pod6) for CAD modeling. |
||||
|
||||
Tenant1 is bound to Pod1 and Pod4 during the first phase for several months.
In the first phase, we can just add a weight to each pod, for example Pod1
with weight 1 and Pod2 with weight 2. This could be done by adding one new
field in the pod table, or no field at all, just ordering them by the time
they were created in the Tricircle. In this case, we use the pod creation time
as the weight.
||||
|
||||
If the tenant wants to allocate VM/volume for general VM, Pod1 should be |
||||
selected. It can be implemented with flavor or volume type metadata. For |
||||
general VM/Volume, there is no special tag in flavor or volume type metadata. |
||||
|
||||
If the tenant wants to allocate VM/volume for CAD modeling VM, Pod4 should be |
||||
selected. For CAD modeling VM/Volume, a special tag "resource: CAD Modeling" |
||||
in flavor or volume type metadata determines the binding. |
||||
|
||||
When it is detected that there are no more resources in Pod1 and Pod4, the
Tricircle, based on the resource_affinity_tag, queries the pod table for
available pods which provision a specific type of resource. The field
resource_affinity is a key-value pair. A pod will be selected when there is a
matching key-value in the flavor extra-spec or volume extra-spec. A tenant
will be bound to one pod in one group of pods with the same
resource_affinity_tag. In this case, the Tricircle obtains Pod2 and Pod3 for
general purpose, as well as Pod5 and Pod6 for CAD purpose. The Tricircle then
needs to change the binding; for example, tenant1 needs to be bound to Pod2
and Pod5.
||||
|
||||
Implementation |
||||
============== |
||||
|
||||
Measurement |
||||
---------------- |
||||
|
||||
To get the information of resource utilization of pods, the Tricircle needs to |
||||
conduct some measurements on pods. The statistic task should be done in |
||||
bottom pod. |
||||
|
||||
For resource usage, cells currently provide an interface to retrieve the usage
of cells [1]. OpenStack provides details of the capacity of a cell, including
disk and RAM, via the API for showing cell capacities [1].

If OpenStack is not running in cells mode, we can ask Nova to provide
an interface to show the usage details per AZ. Moreover, an API for usage
query at the host level is provided for admins [3], through which we can
obtain details of a host, including CPU, memory, disk, and so on.
||||
|
||||
Cinder also provides an interface to retrieve the backend pool usage,
including updated time, total capacity, free capacity and so on [2].
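
For illustration, the kind of admin APIs the collection task could rely on
(the exact endpoints depend on the references [1][2][3] above)::

    # Nova: aggregated hypervisor statistics
    GET /v2.1/os-hypervisors/statistics

    # Cinder: back-end pool capacities
    GET /v2/{project_id}/scheduler-stats/get_pools?detail=True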
||||
|
||||
The Tricircle needs to have one task to collect the usage from the bottom on
a daily basis, to evaluate whether the threshold is reached or not. A
threshold or headroom could be configured for each pod so as not to reach
100% exhaustion of resources.
||||
|
||||
On the top there should be no heavy processing, so getting the summarized info
from the bottom can be done in the Tricircle. After collecting the details,
the Tricircle can judge whether a pod has reached its limit.
||||
|
||||
Tricircle |
||||
---------- |
||||
|
||||
The Tricircle needs a framework to support different binding policy (filter). |
||||
|
||||
Each pod is one OpenStack instance, including controller nodes and compute |
||||
nodes. E.g., |
||||
|
||||
:: |
||||
|
||||
+-> controller(s) - pod1 <--> compute nodes <---+ |
||||
| |
||||
The tricircle +-> controller(s) - pod2 <--> compute nodes <---+ resource migration, if necessary |
||||
(resource controller) .... | |
||||
+-> controller(s) - pod{N} <--> compute nodes <-+ |
||||
|
||||
|
||||
The Tricircle selects a pod to decide to which controller the requests should
be forwarded. Then the controllers in the selected pod will do their own
scheduling.
||||
|
||||
The simplest binding filter is as follows: line up all available pods in a
list and always select the first one. When all the resources in the first pod
have been allocated, remove it from the list. This is quite like how a
production cloud is built: at first, only a few pods are in the list, and
more pods are added when there are not enough resources in the current cloud.
For example,
||||
|
||||
List1 for general pool: Pod1 <- Pod2 <- Pod3 |
||||
List2 for CAD modeling pool: Pod4 <- Pod5 <- Pod6 |
||||
|
||||
If Pod1's resources are exhausted, Pod1 is removed from List1, which becomes:
Pod2 <- Pod3.
If Pod4's resources are exhausted, Pod4 is removed from List2, which becomes:
Pod5 <- Pod6.
||||
|
||||
If the tenant wants to allocate resources for general VM, the Tricircle |
||||
selects Pod2. If the tenant wants to allocate resources for CAD modeling VM, |
||||
the Tricircle selects Pod5. |
||||
|
||||
Filtering |
||||
------------- |
||||
|
||||
For the strategy of selecting pods, we need a series of filters. Before
dynamic pod binding is implemented, the binding criteria are hard coded to
select the first pod in the AZ. Hence, we need to design a series of filter
algorithms. Firstly, we plan to design an ALLPodsFilter which does no
filtering and passes all the available pods. Secondly, we plan to design an
AvailabilityZoneFilter which passes the pods matching the specified
availability zone. Thirdly, we plan to design a ResourceAffinityFilter which
passes the pods matching the specified resource type. Based on the
resource_affinity_tag, the Tricircle is aware of which type of resource the
tenant wants to provision. In the future, we can add more filters, which
requires adding more information to the pod table. A sketch of the filter
framework is given below.
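
The sketch below illustrates the intended filter framework; the base class,
method signature and pod/request field names are assumptions rather than the
final implementation::

    class BaseFilter(object):
        """Base class of the proposed pod filters."""
        def filter_pods(self, context, pods, request_spec):
            raise NotImplementedError()


    class ALLPodsFilter(BaseFilter):
        """Pass all available pods without any filtering."""
        def filter_pods(self, context, pods, request_spec):
            return pods


    class AvailabilityZoneFilter(BaseFilter):
        """Pass the pods matching the requested availability zone."""
        def filter_pods(self, context, pods, request_spec):
            az = request_spec.get('availability_zone')
            return [pod for pod in pods if not az or pod['az_name'] == az]


    class ResourceAffinityFilter(BaseFilter):
        """Pass the pods whose resource_affinity_tag matches the key-value
        pair found in the flavor or volume type extra specs."""
        def filter_pods(self, context, pods, request_spec):
            wanted = request_spec.get('resource_affinity')
            if not wanted:
                return pods
            return [pod for pod in pods
                    if all((pod.get('resource_affinity_tag') or {}).get(k) == v
                           for k, v in wanted.items())]
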
||||
|
||||
Weighting |
||||
------------- |
||||
|
||||
After filtering, the Tricircle obtains the pods available to a tenant and
needs to select the most suitable one. Hence, we define a weight function to
calculate the weight of each pod, and the Tricircle selects the pod with the
maximum weight value. When calculating the weight of a pod, we need a series
of weighers. The first weigher takes the pod creation time into
consideration. The second one is the idle capacity, so that the pod with the
most idle capacity is preferred. Other metrics, e.g. cost, will be added in
the future. A sketch of the weighers is given below.
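
The sketch below shows the intended weigher interface; the field names and
the lack of weight normalization are simplifications, not the final design::

    class BaseWeigher(object):
        """Base class of the proposed pod weighers."""
        def weigh(self, pod):
            raise NotImplementedError()


    class CreationTimeWeigher(BaseWeigher):
        """Prefer the pod created earliest (smaller timestamp, larger
        weight)."""
        def weigh(self, pod):
            return -pod['created_at']      # e.g. a Unix timestamp


    class IdleCapacityWeigher(BaseWeigher):
        """Prefer the pod with the most idle capacity."""
        def weigh(self, pod):
            usage = pod.get('usage') or {}
            total = usage.get('total_capacity')
            free = usage.get('free_capacity')
            if not total:
                return 1.0                 # usage not initialized yet
            return float(free) / total


    def select_pod(pods, weighers):
        # A real implementation would normalize and combine the weights;
        # here we simply sum them and pick the maximum.
        return max(pods, key=lambda pod: sum(w.weigh(pod) for w in weighers))
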
||||
|
||||
Data Model Impact |
||||
----------------- |
||||
|
||||
Firstly, we need to add a column "resource_affinity_tag" to the pod table,
which stores the key-value pair used to match the flavor extra-spec and
volume extra-spec.
||||
|
||||
Secondly, in the pod binding table, we need to add fields for the binding
start time and binding end time, so that the history of the binding
relationship can be stored.
||||
|
||||
Thirdly, we need a table to store the usage of each pod for Cinder/Nova. We
plan to use a JSON object to store the usage information, so even if the
usage structure changes, we don't need to update the table. A null usage
value means the usage has not been initialized yet. As mentioned above, the
usage could be refreshed on a daily basis. If it is not initialized yet,
there are still plenty of resources available, and the pod can be scheduled
as if it had not reached the usage threshold. A sketch of the proposed usage
table is given below.
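
The following SQLAlchemy sketch illustrates the proposed usage table; the
table name, column names and sizes are assumptions to be settled during
implementation::

    import sqlalchemy as sa
    from sqlalchemy.ext import declarative

    Base = declarative.declarative_base()


    class PodUsage(Base):
        # One row per pod per service ('nova' or 'cinder'); the usage
        # column holds a JSON blob, NULL meaning "not initialized yet".
        __tablename__ = 'pod_usage'

        pod_id = sa.Column(sa.String(36), primary_key=True)
        service = sa.Column(sa.String(36), primary_key=True)
        usage = sa.Column(sa.Text, nullable=True)
        updated_at = sa.Column(sa.DateTime)
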
||||
|
||||
Dependencies |
||||
------------ |
||||
|
||||
None |
||||
|
||||
|
||||
Testing |
||||
------- |
||||
|
||||
None |
||||
|
||||
|
||||
Documentation Impact |
||||
-------------------- |
||||
|
||||
None |
||||
|
||||
|
||||
Reference |
||||
--------- |
||||
|
||||
[1] http://developer.openstack.org/api-ref-compute-v2.1.html#showCellCapacities |
||||
|
||||
[2] http://developer.openstack.org/api-ref-blockstorage-v2.html#os-vol-pool-v2 |
||||
|
||||
[3] http://developer.openstack.org/api-ref-compute-v2.1.html#showinfo |
@ -0,0 +1,234 @@
|
||||
======================================= |
||||
Enhance Reliability of Asynchronous Job |
||||
======================================= |
||||
|
||||
Background |
||||
========== |
||||
|
||||
Currently we are using the cast method in our RPC client to trigger
asynchronous jobs in the XJob daemon. After one of the worker threads
receives the RPC message from the message broker, it registers the job in the
database and starts to run the handle function. The registration guarantees
that an asynchronous job is not lost after it fails, and that the failed job
can be redone. The detailed discussion of the asynchronous job process in the
XJob daemon is covered in our design document [1].
||||
|
||||
Though asynchronous jobs are correctly saved after worker threads get the
RPC message, we still risk losing jobs. With the cast method, it is only
guaranteed that the message is received by the message broker; there is no
guarantee that the message is received by the message consumer, i.e., the RPC
server thread running in the XJob daemon. According to the RabbitMQ
documentation, undelivered messages are lost if the RabbitMQ server stops
[2]. Message persistence or publisher confirms [3] can be used to increase
reliability, but they sacrifice performance. On the other hand, we cannot
assume that message brokers other than RabbitMQ provide similar persistence
or confirmation functionality. Therefore, the Tricircle itself should handle
the asynchronous job reliability problem as far as possible. Since we already
have a framework to register, run and redo asynchronous jobs in the XJob
daemon, we propose a cheaper way to improve reliability.
||||
|
||||
Proposal |
||||
======== |
||||
|
||||
One straightforward way to make sure that the RPC server has received the
RPC message is to use the call method. If the RPC client uses the call method
to send the RPC request, it is blocked until the RPC server replies to the
message, so if something goes wrong before the reply, the RPC client is aware
of it. Of course we cannot make the RPC client wait too long, thus the RPC
handlers on the RPC server side need to be simple and quick to run. Thanks to
the asynchronous job framework we already have, migrating from the cast
method to the call method is easy. A sketch of the two invocation styles is
given below.
||||
|
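The sketch below contrasts the two invocation styles with oslo.messaging;
the topic, version and method name are placeholders rather than the real
Tricircle ones::

    from oslo_config import cfg
    import oslo_messaging as messaging

    transport = messaging.get_transport(cfg.CONF)
    target = messaging.Target(topic='xjob', version='1.0')
    client = messaging.RPCClient(transport, target)


    def trigger_job_cast(ctxt, payload):
        # Fire-and-forget: returns once the message reaches the broker.
        client.prepare().cast(ctxt, 'run_job', payload=payload)


    def trigger_job_call(ctxt, payload):
        # Blocks until the RPC server replies, so a lost message surfaces
        # on the client side (e.g. as a MessagingTimeout).
        return client.prepare().call(ctxt, 'run_job', payload=payload)
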
||||
Here is the flow of the current process:: |
||||
|
||||
+--------+ +--------+ +---------+ +---------------+ +----------+ |
||||
| | | | | | | | | | |
||||
| API | | RPC | | Message | | RPC Server | | Database | |
||||
| Server | | client | | Broker | | Handle Worker | | | |
||||
| | | | | | | | | | |
||||
+---+----+ +---+----+ +----+----+ +-------+-------+ +----+-----+ |
||||
| | | | | |
||||
| call RPC API | | | | |
||||
+--------------> | | | |
||||
| | send cast message | | | |
||||
| +-------------------> | | |
||||
| call return | | dispatch message | | |
||||
<--------------+ +------------------> | |
||||
| | | | register job | |
||||
| | | +----------------> |
||||
| | | | | |
||||
| | | | obtain lock | |
||||
| | | +----------------> |
||||
| | | | | |
||||
| | | | run job | |
||||
| | | +----+ | |
||||
| | | | | | |
||||
| | | | | | |
||||
| | | <----+ | |
||||
| | | | | |
||||
| | | | | |
||||
+ + + + + |
||||
|
||||
We can leave only the **register job** phase in the RPC handler and put the
**obtain lock** and **run job** phases in a separate thread, so the RPC
handler is simple enough to be invoked with the call method. Here is the
proposed flow::
||||
|
||||
+--------+ +--------+ +---------+ +---------------+ +----------+ +-------------+ +-------+ |
||||
| | | | | | | | | | | | | | |
||||
| API | | RPC | | Message | | RPC Server | | Database | | RPC Server | | Job | |
||||
| Server | | client | | Broker | | Handle Worker | | | | Loop Worker | | Queue | |
||||
| | | | | | | | | | | | | | |
||||
+---+----+ +---+----+ +----+----+ +-------+-------+ +----+-----+ +------+------+ +---+---+ |
||||
| | | | | | | |
||||
| call RPC API | | | | | | |
||||
+--------------> | | | | | |
||||
| | send call message | | | | | |
||||
| +--------------------> | | | | |
||||
| | | dispatch message | | | | |
||||
| | +------------------> | | | |
||||
| | | | register job | | | |
||||
| | | +----------------> | | |
||||
| | | | | | | |
||||
| | | | job enqueue | | | |
||||
| | | +------------------------------------------------> |
||||
| | | | | | | |
||||
| | | reply message | | | job dequeue | |
||||
| | <------------------+ | |--------------> |
||||
| | send reply message | | | obtain lock | | |
||||
| <--------------------+ | <----------------+ | |
||||
| call return | | | | | | |
||||
<--------------+ | | | run job | | |
||||
| | | | | +----+ | |
||||
| | | | | | | | |
||||
| | | | | | | | |
||||
| | | | | +----> | |
||||
| | | | | | | |
||||
| | | | | | | |
||||
+ + + + + + + |
||||
|
||||
In the above graph, **Loop Worker** is a newly introduced thread that does
the actual work. **Job Queue** is an eventlet queue [4] used to coordinate
**Handle Worker**, which produces job entries, and **Loop Worker**, which
consumes them. When accessing an empty queue, **Loop Worker** is blocked
until some job entries are put into the queue. **Loop Worker** retrieves job
entries from the job queue and then starts to run them. Similar to the
original flow, since multiple workers may get the same type of job for the
same resource at the same time, a worker needs to obtain the lock before it
can run the job. One problem occurs whenever the XJob daemon stops before it
finishes all the jobs in the job queue: all unfinished jobs are lost. To
solve this, we change the original periodic task that redoes failed jobs so
that it also handles jobs which have been registered for a certain time but
haven't been started. So both failed jobs and "orphan" new jobs can be picked
up and redone. A sketch of the two workers is given below.
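
The following is a minimal sketch of the producer/consumer split; the job
registration, locking and job handling helpers are assumed to exist elsewhere
in XJob and are only referenced here::

    import eventlet
    eventlet.monkey_patch()

    from eventlet import queue

    job_queue = queue.Queue()


    def handle_worker(ctxt, job):
        # Runs inside the RPC call handler, so it must stay short:
        # register the job, enqueue it, then reply immediately.
        register_job(ctxt, job)           # assumed DB helper
        job_queue.put((ctxt, job))


    def loop_worker():
        while True:
            ctxt, job = job_queue.get()   # blocks while the queue is empty
            if obtain_lock(ctxt, job):    # assumed DB lock helper
                run_job(ctxt, job)        # assumed job handler


    eventlet.spawn_n(loop_worker)
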
||||
|
||||
You can see that **Handle Worker** doesn't do much work: it just consumes
RPC messages, registers jobs and then puts job items into the job queue. So
one extreme solution here would be to register new jobs on the API server
side and start worker threads to retrieve jobs from the database and run
them. In this way, we can remove all the RPC processes and use the database
to coordinate. The drawback of this solution is that we don't dispatch jobs:
all the workers query jobs from the database, so there is a high probability
that some of the workers obtain the same job and a race occurs. In the first
solution, the message broker helps us to dispatch messages, and thus to
dispatch jobs.
||||
|
||||
Considering that job dispatch is important, we can make some changes to the
second solution and move to a third one: new jobs are also registered on the
API server side, but we still use the cast method to trigger asynchronous
jobs in the XJob daemon. Since job registration is done on the API server
side, jobs are not lost even if cast messages are lost. If the API server
side fails to register the job, it returns a failure response; if job
registration succeeds, the job will eventually be done by the XJob daemon. By
using RPC, we dispatch jobs with the help of message brokers. One thing that
makes the cast method better than the call method is that retrieving RPC
messages and running job handles are done in the same thread, so if one XJob
daemon is busy handling jobs, RPC messages are not dispatched to it. With the
call method, however, RPC messages are retrieved by one thread (the **Handle
Worker**) and job handles are run by another thread (the **Loop Worker**), so
an XJob daemon may accumulate many jobs in its queue while it is already busy
handling jobs. This solution shares one problem with the call method
solution: if cast messages are lost, the new jobs are registered in the
database but no XJob daemon is aware of them. It is solved in the same way,
by letting the periodic task pick up these "orphan" jobs. Here is the flow::
||||
|
||||
+--------+ +--------+ +---------+ +---------------+ +----------+ |
||||
| | | | | | | | | | |
||||
| API | | RPC | | Message | | RPC Server | | Database | |
||||
| Server | | client | | Broker | | Handle Worker | | | |
||||
| | | | | | | | | | |
||||
+---+----+ +---+----+ +----+----+ +-------+-------+ +----+-----+ |
||||
| | | | | |
||||
| call RPC API | | | | |
||||
+--------------> | | | |
||||
| | register job | | | |
||||
| +-------------------------------------------------------> |
||||
| | | | | |
||||
| | [if succeed to | | | |
||||
| | register job] | | | |
||||
| | send cast message | | | |
||||
| +-------------------> | | |
||||
| call return | | dispatch message | | |
||||
<--------------+ +------------------> | |
||||
| | | | obtain lock | |
||||
| | | +----------------> |
||||
| | | | | |
||||
| | | | run job | |
||||
| | | +----+ | |
||||
| | | | | | |
||||
| | | | | | |
||||
| | | <----+ | |
||||
| | | | | |
||||
| | | | | |
||||
+ + + + + |
||||
|
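A minimal sketch of the API-server-side part of this flow is shown below;
the database API object, the XJob RPC client and the method name are
placeholders for the corresponding Tricircle components::

    def create_async_job(ctxt, db_api, xjob_client, job_type, resource):
        # Register the job first; if this fails, the API simply returns a
        # failure response and nothing needs to be redone.
        job = db_api.register_job(ctxt, job_type, resource)
        # Even if this cast message is lost, the periodic task in XJob
        # picks up the registered-but-not-started "orphan" job later.
        xjob_client.prepare().cast(ctxt, 'run_job', job_id=job['id'])
        return job
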
||||
Discussion |
||||
========== |
||||
|
||||
In this section we discuss the pros and cons of the above three solutions. |
||||
|
||||
.. list-table:: **Solution Comparison** |
||||
:header-rows: 1 |
||||
|
||||
* - Solution |
||||
- Pros |
||||
- Cons |
||||
* - API server uses call |
||||
- no RPC message lost |
||||
- downtime of unfinished jobs in the job queue when XJob daemon stops, |
||||
job dispatch not based on XJob daemon workload |
||||
* - API server register jobs + no RPC |
||||
- no requirement on RPC(message broker), no downtime |
||||
- no job dispatch, conflict costs time |
||||
* - API server register jobs + uses cast |
||||
- job dispatch based on XJob daemon workload |
||||
- downtime of lost jobs due to cast messages lost |
||||
|
||||
Downtime means that after a job is dispatched to a worker, other workers
need to wait for a certain time to determine that the job has expired before
taking it over.
||||
|
||||
Conclusion |
||||
========== |
||||
|
||||
We decide to implement the third solution (API server registers jobs + uses
cast) since it improves asynchronous job reliability and at the same time has
better workload dispatch.
||||
|
||||
Data Model Impact |
||||
================= |
||||
|
||||
None |
||||
|
||||
Dependencies |
||||
============ |
||||
|
||||
None |
||||
|
||||
Documentation Impact |
||||
==================== |
||||
|
||||
None |
||||
|
||||
References |
||||
========== |
||||
|
||||
.. [1] https://docs.google.com/document/d/1zcxwl8xMEpxVCqLTce2-dUOtB-ObmzJTbV1uSQ6qTsY
.. [2] https://www.rabbitmq.com/tutorials/tutorial-two-python.html
.. [3] https://www.rabbitmq.com/confirms.html
.. [4] http://eventlet.net/doc/modules/queue.html
@ -0,0 +1,8 @@
|
||||
========================== |
||||
Tricircle Devspecs Guide |
||||
========================== |
||||
|
||||
.. toctree:: |
||||
:maxdepth: 4 |
||||
|
||||
devspecs-guide |
@ -0,0 +1,554 @@
|
||||
============================================== |
||||
Layer-3 Networking and Combined Bridge Network |
||||
============================================== |
||||
|
||||
Background |
||||
========== |
||||
|
||||
To achieve cross-Neutron layer-3 networking, we utilize a bridge network to |
||||
connect networks in each Neutron server, as shown below: |
||||
|
||||
East-West networking:: |
||||
|
||||
+-----------------------+ +-----------------------+ |
||||
| OpenStack1 | | OpenStack2 | |
||||
| | | | |
||||
| +------+ +---------+ | +------------+ | +---------+ +------+ | |
||||
| | net1 | | ip1| | | bridge net | | |ip2 | | net2 | | |
||||
| | +--+ R +---+ +---+ R +--+ | | |
||||
| | | | | | | | | | | | | | |
||||
| +------+ +---------+ | +------------+ | +---------+ +------+ | |
||||
+-----------------------+ +-----------------------+ |
||||
|
||||
Fig 1 |
||||
|
||||
North-South networking:: |
||||
|
||||
+---------------------+ +-------------------------------+ |
||||
| OpenStack1 | | OpenStack2 | |
||||
| | | | |
||||
| +------+ +-------+ | +--------------+ | +-------+ +----------------+ | |
||||
| | net1 | | ip1| | | bridge net | | | ip2| | external net | | |
||||
| | +--+ R1 +---+ +---+ R2 +--+ | | |
||||
| | | | | | | 100.0.1.0/24 | | | | | 163.3.124.0/24 | | |
||||
| +------+ +-------+ | +--------------+ | +-------+ +----------------+ | |
||||
+---------------------+ +-------------------------------+ |
||||
|
||||
Fig 2 |
||||
|
||||
To support east-west networking, we configure extra routes in routers in each |
||||
OpenStack cloud:: |
||||
|
||||
In OpenStack1, destination: net2, nexthop: ip2 |
||||
In OpenStack2, destination: net1, nexthop: ip1 |
||||
|
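The sketch below shows how these extra routes could be configured through
python-neutronclient; the client objects, router IDs, CIDRs and next-hop IPs
are parameters supplied by the caller rather than values defined in this
spec::

    def add_east_west_routes(neutron1, neutron2, r1_id, r2_id,
                             net1_cidr, net2_cidr, ip1, ip2):
        # neutron1/neutron2 are neutronclient.v2_0.client.Client instances
        # pointing at OpenStack1 and OpenStack2 respectively.
        # In OpenStack1: destination net2, nexthop ip2.
        neutron1.update_router(r1_id, {'router': {'routes': [
            {'destination': net2_cidr, 'nexthop': ip2}]}})
        # In OpenStack2: destination net1, nexthop ip1.
        neutron2.update_router(r2_id, {'router': {'routes': [
            {'destination': net1_cidr, 'nexthop': ip1}]}})
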
||||
To support north-south networking, we set the bridge network as the external
network in OpenStack1 and as an internal network in OpenStack2. For an
instance in net1 to access the external network, the packets are SNATed
twice: first to ip1, then to ip2. For floating IP binding, an IP in net1 is
first bound to an IP (like 100.0.1.5) in the bridge network (which is
attached to R1 as an external network), then that bridge network IP
(100.0.1.5) is bound to an IP (like 163.3.124.8) in the real external network
(the bridge network is attached to R2 as an internal network).
||||
|
||||
Problems |
||||
======== |
||||
|
||||
The idea of introducing a bridge network is good, but there are some problems |
||||
in the current usage of the bridge network. |
||||
|
||||
Redundant Bridge Network |
||||
------------------------ |
||||
|
||||
We use two bridge networks per tenant to achieve layer-3 networking. If VLAN
is used as the bridge network type, the limited VLAN tag range means that
only 2048 pairs of bridge networks can be created, so the number of tenants
supported is far from enough.
||||
|
||||
Redundant SNAT |
||||
-------------- |
||||
|
||||
In the current implementation, packets are SNATed twice for outbound traffic
and DNATed twice for inbound traffic. The drawback is that outbound packets
go through extra NAT operations, and we need to maintain an extra floating IP
pool for inbound traffic.
||||
|
||||
DVR support |
||||
----------- |
||||
|
||||
The bridge network is attached to the router as an internal network for
east-west networking, and for north-south networking when the real external
network and the router are not located in the same OpenStack cloud. This is
fine when the bridge network is of VLAN type, since packets go directly out
of the host and are exchanged by switches. But if we would like to support
VxLAN as the bridge network type later, attaching the bridge network as an
internal network in the DVR scenario will cause some trouble.
||||
DVR scenario will cause some troubles. How DVR connects the internal networks |
||||
is that pack |