Currently, DHCP provisioning of ports is the crucial bottleneck when
booting multiple VMs concurrently.
The root cause is that ports belonging to the same network are processed
one by one by the DHCP agent, and a port that is already 'Provisioning
complete' still blocks the processing of other ports in other DHCP agents.
This patch aims to optimize the strategy for dispatching port messages to
agents in order to improve the provisioning process.
On the server side, I classify messages into multiple levels. In
particular, I split the port_update_end and port_create_end messages into
two levels: the high-level message is cast to only one agent, while the
low-level message is cast to all agents. On the agent side, I put these
messages into `resource_processing_queue`. With the queue, we can delete
`_net_lock` and process these messages in order of priority.
Additionally, I modified `resource_processing_queue` to fit this need: I
changed `_queue` in `ExclusiveResourceProcessor` from a list to a
PriorityQueue, so that all messages cached in
`ExclusiveResourceProcessor` can be sorted by priority.
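The priority ordering described above can be illustrated with a minimal
sketch. It is not the code from this patch: the class name, the priority
constants and the message strings are hypothetical, and only the standard
library `queue.PriorityQueue` is used.

```python
import itertools
import queue

# Hypothetical priority levels; lower numbers are served first.
PRIORITY_HIGH = 0
PRIORITY_LOW = 1


class PrioritizedMessages:
    """Illustrative stand-in for a queue of cached agent messages."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # A running counter keeps FIFO order among equal priorities and
        # prevents the queue from ever comparing the messages themselves.
        self._counter = itertools.count()

    def put(self, priority, message):
        self._queue.put((priority, next(self._counter), message))

    def drain(self):
        """Yield all queued messages, highest priority first."""
        while not self._queue.empty():
            _priority, _seq, message = self._queue.get()
            yield message


pending = PrioritizedMessages()
pending.put(PRIORITY_LOW, "port_update_end port-a")
pending.put(PRIORITY_HIGH, "port_create_end port-b")
pending.put(PRIORITY_LOW, "port_update_end port-c")
ordered = list(pending.drain())
```

In this sketch the high-priority message is handled first, and the two
low-priority messages keep their arrival order.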
Related-Bug: #1760047
Change-Id: I255caa0571c42fb012fe882259ef181070beccef
The 'integrated-gate-py35' template is going to be
renamed to 'integrated-gate-py3' in https://review.openstack.org/#/c/626078/
Integrated jobs now run on Bionic, where Python 3.6 is available.
This means the gate jobs in the 'integrated-gate-py35' template are
running on Python 3.6, not 3.5, which makes the template name confusing.
The change this commit depends on renames 'integrated-gate-py35' to
'integrated-gate-py3' to convey that the template uses whichever Python 3
version is available in the distro in use, for example 3.5 on Xenial and
3.6 on Bionic.
This commit starts using the new template name so that the old
template name can be removed.
Depends-On: https://review.openstack.org/#/c/626078/
Change-Id: Id14b1f46a6d98e284d97149287245dd59ca198c4
The IPv6 address format in the dnsmasq leases file is incorrect (the
correct format is described in the bug description). This bad formatting
generates the following error when initializing dnsmasq:
dnsmasq[20603]: failed to parse lease database, invalid line: \
1547121093 fa:16:3e:a0:3a:9a [fd5b:1fd5:8295:5339::43] * ...
This patch removes the IPv6 addresses from the leases file, as proposed
in the bug, because the DHCP agent does not have the IAID (identity
association identifier) of each IPv6 address assigned.
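A minimal sketch of the idea, not the patch itself: when building the
initial leases file, entries with an IPv6 address are skipped. The function
name, the `(mac, ip)` input shape and the exact lease line layout
(`<expiry> <mac> <ip> * *`) are assumptions made for illustration.

```python
import ipaddress


def build_lease_lines(ports, expiry):
    """Return lease file lines for IPv4 addresses only.

    `ports` is a hypothetical list of (mac, ip) tuples; IPv6 entries are
    left out because the IAID needed for a valid line is not known.
    """
    lines = []
    for mac, ip in ports:
        if ipaddress.ip_address(ip).version == 6:
            continue
        # Assumed line layout: expiry, MAC, IP, hostname, client id.
        lines.append("%d %s %s * *" % (expiry, mac, ip))
    return lines


leases = build_lease_lines(
    [
        ("fa:16:3e:a0:3a:9a", "fd5b:1fd5:8295:5339::43"),
        ("fa:16:3e:54:c6:8e", "10.0.0.5"),
    ],
    1547121093,
)
```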
In case of an agent restart, dnsmasq won't have any IPv6 address in the
leases file, but the hosts file and the additional hosts file will
contain all previous MAC/IPv6 assignments. When the IPv6 client sends
a DHCPDISCOVER, dnsmasq will offer the same IPv6 address to this client.
At the same time, the client will request the same address from the server:
DHCPDISCOVER(tap2c14823a-e6) fa:16:3e:54:c6:8e
DHCPOFFER(tap2c14823a-e6) fd5b:1fd5:8295:5339::43 fa:16:3e:54:c6:8e
DHCPREQUEST(tap2c14823a-e6) fd5b:1fd5:8295:5339::43 fa:16:3e:54:c6:8e
DHCPACK(tap2c14823a-e6) fd5b:1fd5:8295:5339::43 fa:16:3e:54:c6:8e \
host-fd5b-1fd5-8295-5339--43
Once dnsmasq updates the leases database, it rewrites the leases file with
the new IPv6 address (including the IAID) and the server DUID (if not
present).
Change-Id: Ib1b2f284ab81f1c4af7b08b5257b45a3f6e79c3e
Closes-Bug: #1722126
The native OVS/ofctl controllers talk to the bridges using a
datapath-id, instead of the bridge name. The datapath ID is
auto-generated based on the MAC address of the bridge's NIC.
In the case where bridges are on VLAN interfaces, they would
have the same MACs and therefore the same datapath-id, causing
flows for one physical bridge to be programmed on another.
The datapath-id is a 64-bit field, with the lower 48 bits being
the MAC. We set the upper 12 unused bits to identify each
unique physical bridge.
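The bit layout above can be sketched as follows. This is an illustration
only: the function name and the use of a per-bridge index are hypothetical,
and placing the 12 identifying bits directly above the 48 MAC bits is an
assumption, not something stated in this commit.

```python
MAC_BITS = 48
ID_BITS = 12  # number of upper bits used to tell bridges apart


def unique_datapath_id(base_dpid_hex, bridge_index):
    """Combine the MAC part of a datapath-id with a per-bridge index.

    Assumption: the index occupies the bits directly above the MAC.
    """
    if not 0 <= bridge_index < (1 << ID_BITS):
        raise ValueError("bridge_index does not fit in %d bits" % ID_BITS)
    mac = int(base_dpid_hex, 16) & ((1 << MAC_BITS) - 1)
    # Format as the 16 hex digit string that datapath-id uses.
    return format((bridge_index << MAC_BITS) | mac, "016x")


dpid_first = unique_datapath_id("00006ea5a4b38a4a", 0)
dpid_second = unique_datapath_id("00006ea5a4b38a4a", 1)
```

Two bridges that share a MAC then end up with different datapath-ids while
the lower 48 bits stay unchanged.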
This could also be fixed manually using ovs-vsctl set, but
it might be beneficial to automate this in the code.
To set the datapath-id manually:
ovs-vsctl set bridge <mybr> other-config:datapath-id=<datapathid>
To view or verify the current datapath-id:
ovs-vsctl get Bridge br-vlan datapath-id
"00006ea5a4b38a4a"
(Please note that other-config is needed in the set command, but not in get.)
Closes-Bug: #1697243
Co-Authored-By: Rodolfo Alonso Hernandez <ralonsoh@redhat.com>
Change-Id: I575ddf0a66e2cfe745af3874728809cf54e37745
The DVR Edge router code creates an IPDevice() object just
to make a single call to add an IP address. Change it to
call ip_lib.add_ip_address() directly instead since that's
what's being done in the IpAddrCommand.add() code anyway.
Trivialfix
Change-Id: Ie7640fc54494de89e85b2f528bddc79875a16046
Port creation can assign any IP address in the subnet's
CIDR, which matters especially for a small subnet, so the
next N IP operations can fall outside the subnet's IP range.
This patch gives the original test port a specific IP
address to prevent this issue.
Closes-Bug: #1812404
Change-Id: I34cb99a518d4469c7d1ca9e2897671608b2b81ad
While working on this module, I noticed a couple of inconsistencies
in how we were calling nfct. Specifically, the NFNL_SUBSYS_CTNETLINK
value is supposed to be 1[1], and the order of arguments to nfct_open
is subsys_id then subscriptions[2]. We were passing them in the
opposite order, which didn't particularly matter because both were
defined to be 0. Now that the subsystem identifier is correctly
defined, it does matter.
Change-Id: I9fb74a9ef7a83cd630afa1e1ea0e2fc0c6df3943
1: https://git.netfilter.org/libnfnetlink/tree/include/libnfnetlink/linux_nfnetlink.h#n45
2: https://git.netfilter.org/libnetfilter_conntrack/tree/src/main.c#n68
Sometime between Liberty and Pike, adding rules to security groups got
slow, and slower with every rule. Streamline the rule create path
and get close to the old performance back.
Two performance fixes:
1. Get rid of an O(n^2) duplicate check on bulk creates, using a hash
table instead. This is more memory intensive than the previous loop,
but usable far past the point where the loop becomes too slow to be useful.
2. Use an object existence check in a few places where we do not
want to load all of the child rules.
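The first fix can be sketched as below. This is a generic illustration of
replacing a pairwise comparison with a hash-based lookup, not the code from
this patch; the function name and the rule dictionaries are hypothetical.

```python
def find_duplicate_rules(rules):
    """Return the rules that repeat an earlier rule in the same batch.

    Each rule is a flat dict with hashable values. Using a set of keys
    makes the check roughly linear in the number of rules, at the cost
    of keeping every key in memory.
    """
    seen = set()
    duplicates = []
    for rule in rules:
        # Sort the items so that key order in the dict does not matter.
        key = tuple(sorted(rule.items()))
        if key in seen:
            duplicates.append(rule)
        else:
            seen.add(key)
    return duplicates


batch = [
    {"protocol": "tcp", "port": 22},
    {"protocol": "tcp", "port": 80},
    {"port": 22, "protocol": "tcp"},
]
repeated = find_duplicate_rules(batch)
```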
Co-Authored-By: William Hager <whager@salesforce.com>
Change-Id: I34e41a128f28211f2e7ab814a2611ce22620fcf3
Closes-bug: 1810563
oslo.concurrency needs the lock_path option; make it consistent in the
documentation for the SUSE, Red Hat and Ubuntu installation guides.
Change-Id: Ib675d7bf399f2aa7eba9d343fa0f06281d33089a
Related-Bug: #1796976
Closes-Bug: #1812497
Version 4.15 of iproute2 added support for a chain
index in tc_filter [1].
This version is available in, e.g., Ubuntu 18.04, so it
has to be supported in the l3_tc_lib regex in order to
properly match the output of the "tc filter" command.
[1] https://lwn.net/Articles/745643/
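As a rough sketch of the kind of regex change involved, the pattern below
accepts filter lines both with and without the chain index. The pattern,
the helper name and the sample lines are simplified assumptions for
illustration and are not copied from l3_tc_lib or from real command output.

```python
import re

# The "chain <N>" part is optional so that older iproute2 output,
# which has no chain index, still matches.
FILTER_ID_REGEX = re.compile(
    r"filter protocol ip u32 (?:chain \d+ )?fh (\w+::\w+)")


def parse_filter_id(line):
    """Return the filter handle found in `line`, or None."""
    match = FILTER_ID_REGEX.search(line)
    return match.group(1) if match else None


old_style = "filter protocol ip u32 fh 800::800 order 2048 key ht 800"
new_style = "filter protocol ip u32 chain 0 fh 800::800 order 2048 key ht 800"
old_id = parse_filter_id(old_style)
new_id = parse_filter_id(new_style)
```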
Closes-bug: #1809497
Change-Id: Id4066b5cff933ccd0dd3c751bf67b5d58af662d1
The reverted patch caused a race condition in neutron-ovs-agent,
and the tempest-slow job was failing quite often.
Please see the related bug report for details.
Closes-Bug: #1812552
This reverts commit f8e0a497ad.
Change-Id: Id51f2abaf3c8d57abdd06f024120da526ed40185
A new parameter, rpc_response_max_timeout, is added and registered in
neutron.conf.
rpc_response_max_timeout acts as the ceiling, in seconds, on the
timeout used when waiting for a response from a remote RPC server.
During an RPC call, the waiting time starts from the existing parameter
rpc_response_timeout (default 60s) and is doubled each time until it
reaches the ceiling, which is currently set to 10 times
rpc_response_timeout.
This is inflexible, since users cannot change the ceiling value
directly without changing rpc_response_timeout.
With rpc_response_max_timeout, users can now modify the ceiling without
changing any other parameters.
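The doubling-with-ceiling behaviour can be sketched as below. The generator
name and its arguments are hypothetical and only illustrate the arithmetic;
this is not the RPC client code.

```python
def backoff_timeouts(initial, ceiling, attempts):
    """Yield the timeout for each attempt, doubling up to `ceiling`."""
    timeout = min(initial, ceiling)
    for _ in range(attempts):
        yield timeout
        timeout = min(timeout * 2, ceiling)


# Previous behaviour: the ceiling is tied to 10 * rpc_response_timeout.
tied = list(backoff_timeouts(60, 10 * 60, 6))
# With a separate ceiling, it can be lowered without touching the start.
separate = list(backoff_timeouts(60, 200, 4))
```

With a 60 second start, the tied ceiling gives 60, 120, 240, 480, 600, 600,
while a separately configured ceiling of 200 gives 60, 120, 200, 200.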
Co-Authored-By: Allain Legacy <Allain.legacy@windriver.com>
Change-Id: I170113c2946cc95308edcb1a703a99c71e50b6f9
Related-Bug: #1805769
Story: 2004456
Task: 28171
Fix raising an exception for a response with an unrecognized
status code.
Co-Authored-By: Brian Haley <haleyb.dev@gmail.com>
Change-Id: I174ff62cb6599e4c7bdc86cb2d0786f9f2499b00
Related-Bug: 1790598
If the policy rule is cleared while the l2-agent is stopped, the QoS
rule that had been applied should be cleared after the l2-agent starts
again.
Change-Id: Iaaff10dfa8ac6ab8c9dead3124e2bb3caa03a665
Closes-Bug: #1810025
There is no need to trigger neutron-openvswitch-agent to
start processing ports when an invalid port (with
ofport=-1) is added to the bridge.
Such a port will be processed later, when its ofport
changes.
Change-Id: I3465daf4809d5d56565f59b177b5f8870352cc9d
Related-Bug: #1808171
Related-Bug: #1811405
If the DHCP agent port cache is out of sync with the neutron server,
dnsmasq entries are wrong and VMs may not acquire an IP because of
duplicate entries.
When the DHCP agent executes the port_create_end method, the port's
IP should be checked before being used; if there are duplicate IP
addresses in the same network in the cache, we should resync.
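A minimal sketch of such a check, assuming cached ports are plain dicts
with hypothetical "id" and "ips" keys; the real cache objects and the
resync call are not shown here.

```python
def has_duplicate_ip(cached_ports, new_port):
    """Return True if another cached port already uses one of the IPs."""
    new_ips = set(new_port["ips"])
    for port in cached_ports:
        if port["id"] == new_port["id"]:
            # The same port being updated is not a conflict.
            continue
        if new_ips & set(port["ips"]):
            return True
    return False


cache = [
    {"id": "port-1", "ips": ["10.0.0.5"]},
    {"id": "port-2", "ips": ["10.0.0.6"]},
]
conflict = has_duplicate_ip(cache, {"id": "port-3", "ips": ["10.0.0.5"]})
no_conflict = has_duplicate_ip(cache, {"id": "port-3", "ips": ["10.0.0.7"]})
```

A caller would schedule a resync of the network when the check returns
True instead of writing the conflicting entry.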
Co-Authored-By: doreilly@suse.com
Closes-Bug: #1645835
Change-Id: Icc555050283420fddfb90bb67e02bc303e989e27
The L3AgentExtension class delete_router() method expects a
dict as its 'data' argument, but the l3-agent code that
deletes a router was passing just the router ID. Change it to
correctly pass a router dictionary if one exists.
Change-Id: I112d1f8dce9defddfbd8fbfa75bf538e308e1561
Closes-bug: #1809134
When the QoS plugin is handling a port resource request through its
port resource request extension, the lookup of the network a port is
attached to sometimes returns None. This can happen if the network
is deleted by a concurrent API request.
Change-Id: Ide4acdf4c373713968f9d43274fb0c7550283c11
Closes-Bug: #1810504
This job is used only in the stable/ocata branch, so there is no need to
keep those hooks in the master branch anymore.
Change-Id: I65f712934314122da0bb2f14d1e3fe9cbd5dd759
This is a workaround for a bug in the pyroute2 library which, when
running in a multithreaded environment, sometimes has issues
with the NetNS class.
When NetNS.__init__() is called, it uses the os.pipe() function to
create 2 file descriptors which are used to communicate between
2 processes.
In some cases, when multiple threads are running, two instances of
the NetNS() class may end up using the same file descriptors, which
leads to problems when one thread closes a file descriptor that the
other still wants to use.
With this patch, functions which use an instance of the pyroute2.NetNS
class are locked, so there should be no risk of using the same file
descriptors in 2 separate threads.
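The locking approach can be sketched with a simple decorator. The decorator
name, the module-level lock and the `use_namespace` function are
hypothetical; the sketch uses only `threading` and does not call pyroute2.

```python
import functools
import threading

# One shared lock for every function that would create a NetNS object.
_netns_lock = threading.Lock()


def serialized(func):
    """Run `func` while holding the shared lock."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _netns_lock:
            return func(*args, **kwargs)
    return wrapper


active = 0
max_active = 0


@serialized
def use_namespace(name):
    """Stand-in for a function that would open and close a NetNS."""
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    active -= 1
    return name


threads = [
    threading.Thread(target=use_namespace, args=("ns-%d" % i,))
    for i in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because the lock in this sketch is not reentrant, a decorated function
must not call another decorated function.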
Co-Authored-By: Rodolfo Alonso Hernandez <ralonsoh@redhat.com>
Change-Id: Id5e6f2f8e9c31a7138da9cd6792e9d75845b81c5
Closes-Bug: #1811515