Bug #1244589 re-appeared for IPv6.
This change adds an ip6tables rule to fix the checksum of DHCPv6
response packets. Those checksums were left unfilled by virtio (as a
hypervisor-internal optimization), but some picky DHCP clients (AFAIU,
particularly ISC dhclient) verify the checksums, so they fail to
acquire an address when the checksums are left incorrect.
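For reference, the rule described above can be expressed as an ip6tables invocation. The sketch below only builds the argument list. The helper name is hypothetical. It assumes the rule goes into the mangle table's POSTROUTING chain, which is where the CHECKSUM target is valid, and that it matches UDP traffic sent to the DHCPv6 client port, 546.

```python
from typing import List

# UDP port on which DHCPv6 clients listen (servers and relays use 547).
DHCPV6_CLIENT_PORT = 546


def dhcpv6_checksum_rule_args(chain: str = "POSTROUTING") -> List[str]:
    """Hypothetical helper: ip6tables arguments that make the kernel fill
    in the UDP checksum of DHCPv6 replies headed to clients."""
    return [
        "-t", "mangle",  # the CHECKSUM target is only valid in mangle
        "-A", chain,
        "-p", "udp", "-m", "udp",
        "--dport", str(DHCPV6_CLIENT_PORT),
        "-j", "CHECKSUM", "--checksum-fill",
    ]
```

Applying it requires root privileges, e.g. `subprocess.run(["ip6tables", *dhcpv6_checksum_rule_args()], check=True)`.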
Change-Id: I4a045e0dcfcbd3c7959a78f1460d5bf7da0252ff
Closes-Bug: #1811639
Related-Bug: #1244589
(cherry picked from commit 26eb2509fe)
If the DHCP agent's port cache is out of sync with the neutron server,
the dnsmasq entries are wrong and VMs may fail to acquire an IP because
of duplicate entries.
When the DHCP agent executes the port_create_end method, the port's
IP addresses should be checked before being used; if the cache holds
duplicate IP addresses in the same network, the agent should resync.
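The following is a minimal sketch of the duplicate-IP check. It assumes ports are plain dicts shaped like the Neutron API representation (`id`, `network_id`, `fixed_ips[].ip_address`). The function name is hypothetical, and the agent's real cache uses its own model objects.

```python
from typing import Iterable


def has_duplicate_ip(new_port: dict, cached_ports: Iterable[dict]) -> bool:
    """Hypothetical check: True if another cached port on the same network
    already uses one of new_port's fixed IP addresses (i.e. resync)."""
    new_ips = {ip["ip_address"] for ip in new_port["fixed_ips"]}
    for port in cached_ports:
        if port["id"] == new_port["id"]:
            continue  # the same port being re-sent is not a conflict
        if port["network_id"] != new_port["network_id"]:
            continue  # the same IP on another network is allowed
        if new_ips & {ip["ip_address"] for ip in port["fixed_ips"]}:
            return True
    return False
```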
Co-Authored-By: doreilly@suse.com
Closes-Bug: #1645835
Change-Id: Icc555050283420fddfb90bb67e02bc303e989e27
The unit test
test_enable_dhcp_helper_enable_metadata_nonisolated_dist_network
modifies the global variables fake_port1 and fake_port2, causing
flakiness in unit tests that use those variables when executed in
environments with high concurrency.
Creating a deep copy of the variables prevents those changes from
propagating to other unit tests.
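The pattern behind the fix is shown in the self-contained sketch below. A test takes a `copy.deepcopy()` of the shared module-level fixture before mutating it, so nested lists and dicts are not shared either. `FAKE_PORT` is an illustrative stand-in for fake_port1/fake_port2, not the actual fixture.

```python
import copy
import unittest

# Module-level fixture shared by many tests (stand-in for fake_port1).
FAKE_PORT = {"id": "port-1", "device_owner": "compute:nova", "fixed_ips": []}


class TestUsesFakePort(unittest.TestCase):
    def test_mutates_a_private_copy(self):
        port = copy.deepcopy(FAKE_PORT)  # nested list is copied as well
        port["device_owner"] = "network:router_interface"
        port["fixed_ips"].append({"ip_address": "10.0.0.1"})
        # The shared fixture is untouched, so other tests are unaffected.
        self.assertEqual("compute:nova", FAKE_PORT["device_owner"])
        self.assertEqual([], FAKE_PORT["fixed_ips"])
```

A shallow `dict(FAKE_PORT)` would not be enough here, because the `fixed_ips` list would still be shared.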
Closes-Bug: #1809643
Change-Id: Idfd0e99739952baf4d7b545b406cd1b251deb5f8
Signed-off-by: aojeagarcia <aojeagarcia@suse.com>
(cherry picked from commit e83e5618b7)
When a subnet's enable_dhcp attribute is updated, we must restart the
DHCP device. So, when deciding between 'restart' and
'reload_allocations' in the refresh_dhcp_helper function, we only
compare the CIDRs of subnets that have DHCP enabled.
The previous logic only called 'restart' when a subnet was deleted or
added. This could leave the DHCP port not updated when a subnet's
enable_dhcp was updated to True.
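The decision rule can be sketched as follows. Subnets are assumed to be dicts with `cidr` and `enable_dhcp` keys, and both function names are hypothetical. Toggling enable_dhcp changes the set of DHCP-enabled CIDRs just as adding or deleting a subnet does, so it now triggers a restart.

```python
from typing import Iterable, Set


def dhcp_enabled_cidrs(subnets: Iterable[dict]) -> Set[str]:
    """CIDRs of the subnets that currently have DHCP enabled."""
    return {s["cidr"] for s in subnets if s["enable_dhcp"]}


def refresh_action(old_subnets: Iterable[dict],
                   new_subnets: Iterable[dict]) -> str:
    """'restart' if the set of DHCP-enabled CIDRs changed (subnet added,
    deleted, or enable_dhcp toggled), else 'reload_allocations'."""
    if dhcp_enabled_cidrs(old_subnets) != dhcp_enabled_cidrs(new_subnets):
        return "restart"
    return "reload_allocations"
```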
Change-Id: Ic547946ac786c5fab82b4ee7078bf86483f51eb5
Closes-Bug: #1805824
(cherry picked from commit 9aa7af8221)
The port delete events are not synchronized with network rpc events. This
creates a condition which makes it possible for a port delete event to be
processed just before a previously started network query completes.
The problematic order of operations is as follows:
1) a network is scheduled to an agent; a network rpc is sent to the
agent
2) the agent queries the network data from the server
3) while that query is in progress a port on that network is deleted; a
port rpc is sent to the agent
4) that port delete rpc is received before the network query rpc
completes
5) the port delete results in no action because the port was not present
on the agent
6) the network query finishes and adds the port to the cache (even
though the port has already been deleted)
7) some time passes and a new port is configured with the same IP
address as the port that was deleted in (3)
8) the dhcp host file is corrupted with 2 entries for the same IP
address.
9) DHCP queries for the newest port are rejected because of the
duplicate entry in the dhcp host file.
The solution is to add the network_id to the port_delete_end rpc event
so that the _net_lock(network_id) synchronization point can be
acquired, ensuring the event is processed serially with other network
related events.
To ensure backwards compatibility with newer agents running against
older servers, the network_id value to use in the lock is determined
by a utility that falls back to the previous mode of operation
whenever the network_id attribute is not present in the *_delete_end
RPC events. That utility can be removed in the future when it is
guaranteed that the network_id attribute will be present in RPC
messages from the server.
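Here is a minimal sketch of the per-network locking and the backward-compatible lock-ID selection. All names are hypothetical. The fallback order (`network_id`, then the event's own `subnet_id` or `port_id`) is an assumption. It is meant to reproduce the old, unserialized behaviour when `network_id` is missing.

```python
import threading
from collections import defaultdict
from typing import Dict, Optional

# One lock per lock ID; a stand-in for the agent's _net_lock(network_id).
_locks: Dict[Optional[str], threading.Lock] = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def net_lock(lock_id: Optional[str]) -> threading.Lock:
    # Guard the lookup so two threads never create two locks for one ID.
    with _locks_guard:
        return _locks[lock_id]


def lock_id_for_event(payload: dict) -> Optional[str]:
    """Prefer network_id. Older servers omit it, so fall back to the
    resource's own ID, which is never shared with network events and so
    keeps the previous, unserialized behaviour."""
    for key in ("network_id", "subnet_id", "port_id"):
        if key in payload:
            return payload[key]
    return None


def port_delete_end(payload: dict, port_cache: Dict[str, dict]) -> None:
    # Runs serially with any other event holding the same network lock,
    # e.g. a network query that would otherwise re-add the deleted port.
    with net_lock(lock_id_for_event(payload)):
        port_cache.pop(payload["port_id"], None)
```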
Closes-Bug: #1732456
Change-Id: I735f8b1c9248b12e5feb6cbe970cf67f321e6ebc
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
(cherry picked from commit fa78b58010)
When a network becomes isolated and isolated_metadata_enabled=True, the DHCP
agent won't spawn the required metadata proxy instance unless the agent gets
restarted. Similarly, it won't stop it when the network is no longer
isolated.
This patch fixes it by updating the isolated metadata proxy on port_update_end
and port_delete_end methods which are invoked every time a router interface
port is added, updated or deleted.
Change-Id: I5c197a5755135357c6465dfe4803019a2ad52c14
Closes-Bug: #1753540
Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
(cherry picked from 9362d4f1f2)
(cherry picked from commit b07aa19deb)
As reported in the bug, there may be a case where an empty
namespace file exists in /run/netns, but the namespace does not
actually exist. In such a case the DHCP agent throws an error
when plugging the interface into the DHCP namespace.
This may also result in many tap interfaces
being created in the OVS bridge or Linux bridge.
This patch fixes the above bug by unplugging the tap device
from the bridge if an exception occurs, which prevents the tap
interfaces from accumulating.
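The cleanup-on-failure pattern can be sketched as follows. `RecordingDriver` and its `plug`/`set_namespace`/`unplug` methods are hypothetical stand-ins, not Neutron's interface driver API, whose real methods take more arguments.

```python
from typing import Set


class RecordingDriver:
    """Hypothetical interface driver that only tracks plugged devices."""

    def __init__(self, namespace_exists: bool = True) -> None:
        self.namespace_exists = namespace_exists
        self.plugged: Set[str] = set()

    def plug(self, device: str) -> None:
        self.plugged.add(device)

    def set_namespace(self, device: str, namespace: str) -> None:
        if not self.namespace_exists:
            raise RuntimeError("namespace %s does not exist" % namespace)

    def unplug(self, device: str) -> None:
        self.plugged.discard(device)


def plug_into_namespace(driver, device: str, namespace: str) -> None:
    """Plug a tap device, then move it into the DHCP namespace. If the
    move fails (e.g. a stale /run/netns entry), unplug the device before
    re-raising so no orphaned tap interface is left on the bridge."""
    driver.plug(device)
    try:
        driver.set_namespace(device, namespace)
    except Exception:
        driver.unplug(device)
        raise
```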
Co-Authored-By: Brian Haley <bhaley@redhat.com>
Change-Id: I4a197edd180887ad36317ddb2f0c0e7bd2e34e30
Closes-Bug: #1561695
(cherry picked from commit 38d058c2cf)
Provisioning blocks merged in Newton, so for Pike we can
safely assume we are not running with Liberty agents that
don't notify the server when the port is ready.
This also drops a block of logic in the agent that was providing
forward compatibility with servers that didn't support the
'dhcp_ready_on_ports' endpoint since servers have been supporting
it for so long and we don't normally allow agents to be upgraded
first anyway.
Related-Bug: #1453350
Change-Id: Ia86547fb4601915d7dd852b6f7a11c120089d6f6
The DHCP namespace used to always have its IPv6 default
route configured from a received Router Advertisement (RA).
A recent change [1] disabled receipt of RAs, instead
relying on the network topology to configure the namespace.
Unfortunately the code only added an IPv4 default route,
which caused a regression with DNS resolution in some
circumstances where IPv6 was being used.
A default route is now added for both IP versions.
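Selecting one gateway per IP version can be sketched as follows. Subnets are assumed to be dicts with a `gateway_ip` key, and the helper name is hypothetical. The agent would then install a default route for each entry, for example with `ip -6 route replace default via <gateway> dev <device>` for version 6.

```python
import ipaddress
from typing import Dict, Iterable


def default_gateways(subnets: Iterable[dict]) -> Dict[int, str]:
    """Hypothetical helper: one default gateway per IP version, taken
    from the first subnet of that version that has a gateway."""
    gateways: Dict[int, str] = {}
    for subnet in subnets:
        gw = subnet.get("gateway_ip")
        if not gw:
            continue  # subnet without a gateway contributes no route
        gateways.setdefault(ipaddress.ip_address(gw).version, gw)
    return gateways
```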
[1] https://review.openstack.org/#/c/386687/
Change-Id: I7c388f64c0aa9feb002f7a2faf76e7ccca30a3e7
Closes-bug: 1684682
Without this commit, the run_as_root parameter is always True when
stopping a process, which leads to unnecessary use of sudo, for
example in some functional tests such as the keepalived ones.
This commit fixes the aforementioned problem by taking run_as_root into
account when stopping a process. However, run_as_root will still always
be True if the process is spawned in a netns.
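The resulting rule can be sketched as a command builder. The function name, the `sudo` default and the choice of SIGTERM are illustrative assumptions, not Neutron's process manager API. A process spawned in a netns was started via `ip netns exec`, which needs root, so stopping it needs root too.

```python
from typing import List, Optional


def stop_command(pid: int, run_as_root: bool,
                 namespace: Optional[str] = None,
                 root_helper: str = "sudo") -> List[str]:
    """Hypothetical helper: build the command that stops a process."""
    if namespace:
        # Spawned via `ip netns exec` (root only), so it runs as root and
        # only root can signal it; run_as_root is forced to True.
        run_as_root = True
    cmd = ["kill", "-15", str(pid)]
    return [root_helper] + cmd if run_as_root else cmd
```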
Closes-Bug: #1491581
Change-Id: Ib40e1e3357b9a38e760f4e552bf615cdfd54ee5a
Signed-off-by: Hunt Xu <mhuntxu@gmail.com>
Refactoring Neutron configuration options for agent common config to be
in neutron/conf/agent/common. This will allow centralization of all
configuration options and provide an easy way to import.
Partial-Bug: #1563069
Change-Id: Iebac0cdd3bcfd0135349128921b7ad7a1a939ab8
Needed-By: Ib676003bbe909b5a9013a3178b12dbe291d936af
Due to the high memory footprint of current Python ns-metadata-proxy,
it has to be replaced with a lighter process to avoid OOM conditions in
large environments.
This patch spawns haproxy through a process monitor using a pidfile.
This allows tracking the process and respawning it if necessary, as was
done before. Also, it implements an upgrade path which consists of
detecting any running Python instance of ns-metadata-proxy and
replacing it with haproxy. Therefore, upgrades will take place by
simply restarting neutron-l3-agent and neutron-dhcp-agent.
According to /proc/<pid>/smaps, memory footprint goes down from ~50MB
to ~1.5MB.
Also, haproxy is added to bindep in order to ensure that it's installed.
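Tracking a process through its pidfile can be sketched as follows. The helper names are hypothetical, and this is not Neutron's process monitor. The liveness check sends signal 0 with `os.kill()`, which only performs error checking on POSIX systems and delivers no signal.

```python
import os
from typing import Optional


def read_pid(pidfile: str) -> Optional[int]:
    """Return the PID stored in pidfile, or None if missing or garbled."""
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_alive(pid: int) -> bool:
    """POSIX only: signal 0 checks for existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def needs_respawn(pidfile: str) -> bool:
    """True if the monitored process (e.g. haproxy) should be respawned."""
    pid = read_pid(pidfile)
    return pid is None or not is_alive(pid)
```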
UpgradeImpact
Depends-On: I36a5531cacc21c0d4bb7f20d4bec6da65d04c262
Depends-On: Ia37368a7ff38ea48c683a7bad76f87697e194b04
Closes-Bug: #1524916
Change-Id: I5a75cc582dca48defafb440207d10e2f7b4f218b
When force_metadata=True and enable_isolated_metadata=False,
the namespace metadata proxy process might not be terminated
when the network is deleted because the subnets and ports
will have already been deleted, so we could incorrectly
determine it was started. Calling destroy_monitored_metadata_proxy() is
a noop when there is no process running.
Change-Id: I77ff545ce02f2dca4c38e587b37ea809ad6f072c
Closes-Bug: #1648095
Looking at the cache before acquiring a lock may cause the
agent to mistakenly think the network doesn't exist when it
is actually being wired in parallel.
Always acquiring the network-based semaphore will ensure that
the network isn't currently being setup in another coroutine.
Closes-Bug: #1659919
Change-Id: I99ae71e3c5b1cd91dca3f6c80b04d2ecb79de64f
There were several repeated instances of the same strings being used;
this patch puts them into reusable variables.
Change-Id: Ib8f621fc89306b10dc41d95416f5b39d81f98de4
During the DhcpAgent startup procedure, all of the following network
initialization is actually performed twice:
* killing old dnsmasq processes
* setting up and configuring all TAP interfaces
* building all dnsmasq config files (lease and host files)
* launching dnsmasq processes
The second iteration simply cleans up and redoes exactly the same work
again. This is very inefficient and dramatically increases DHCP
startup time (nearly double what is needed).
The initialization method 'sync_state' is called twice:
* once during init_host()
* once more during _report_state()
sync_state() call must stay in init_host() due to bug #1420042.
sync_state() is always called during startup in init_host()
and will be periodically called by periodic_resync()
to do reconciliation.
Hence it can safely be removed from the run() method.
Change-Id: Id6433598d5c833d2e86be605089d42feee57c257
Closes-bug: #1651368
Closes-Bug: #1650611
When starting the dhcp-agent after an upgrade, there could
be stale IPv6 addresses in the namespace that had been
configured via SLAAC. These need to be removed, and the
same address added back statically, in order for the
agent to start up correctly.
To avoid the race condition where an IPv6 RA could arrive
while we are making this change, we must move the call
to disable RAs in the namespace from plug(), since devices
may already exist that are receiving packets.
Uncovered by the grenade tests.
Change-Id: I7e1e5d6c1fa938918aac3fb63888d20ff4088ba7
Closes-bug: #1627902
When enabling metadata, we iterate through the subnets
on a network multiple times. Do it only once at the
beginning and return early if there are no candidates.
Follow-on to comments in an earlier review,
https://review.openstack.org/#/c/293237
Had to fix a few tests that were creating "fake" subnets
without an ip_version attribute or passing a network
mock instead of a fake one.
Change-Id: I57dfeec339a072e78242373bf793dbbf04e8e4c3
All cache operations and dnsmasq process operations
are scoped to a network ID so we can always safely
perform concurrent actions on different network IDs.
This patch adjusts the DHCP agent to lock based on
network ID rather than having a global lock for every
operation.
sync_state calls are still protected with a reader/writer
lock to ensure that when sync_state needs to run, all
other operations are blocked.
Related-Bug: #1548190
Change-Id: I56010dc801d82be56f12e834c5164316872c2f8b
Currently the DHCP agent relies on the acceptance of an
RA to configure its IPv6 address with SLAAC or DHCPv6-stateless
network modes. It should explicitly assign addresses to the
agent based on the data model instead.
In order to do this we must disable RAs in the namespace so
that a static assignment doesn't conflict with a previously
created dynamically-generated address.
Change-Id: I1b38d131249d59fa486a07024d4b1ec61e693d59
Related-bug: #1627902
'refresh_dhcp_helper', which is called after subnet update/create
notifications in the DHCP agent, can end up retrieving ports that
the agent hadn't yet seen. It will then configure those ports but
not notify the server that they are ready.
Unless the port is subsequently updated on the server to
generate a new port update notification, the DHCP agent won't ever tell
the server that the port has had DHCP provisioned. This led to the
bug this closes. Another patch[1] that removed excessive DHCP ready
notifications uncovered this bug.
This patch just adjusts refresh_dhcp_helper to ensure that all ports
are marked as ready after configuring them all.
1. Ie7686837b18ff251baa315ef95dc511cda475672
Change-Id: I1fed60c1835c2ebed7c050c6fa114f89beec3190
Closes-Bug: #1639806
The DHCP agent was previously resending every single port to
the server whenever sync_state was called, even if it was just
for one network.
This led to sending far too much unnecessary data to the server
and also potentially resulted in sending a port to the server
that wasn't actually provisioned yet.
This patch corrects the behavior by only sending ports for networks
that are being synced if it's a conditional sync.
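The following sketch shows the filtering. Cached ports are assumed to be dicts with `id` and `network_id`, and the function name is hypothetical. A full sync is modelled as `synced_networks=None`.

```python
from typing import Iterable, List, Optional, Set


def ports_to_report(cached_ports: Iterable[dict],
                    synced_networks: Optional[Set[str]] = None) -> List[str]:
    """Hypothetical helper: IDs of ports to report as ready after
    sync_state. None means a full sync; otherwise only ports on the
    networks that were actually (re)synced are reported."""
    return sorted(
        p["id"] for p in cached_ports
        if synced_networks is None or p["network_id"] in synced_networks
    )
```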
Closes-Bug: #1639086
Change-Id: Ie7686837b18ff251baa315ef95dc511cda475672
With the current code, if the first subnet of the network is an IPv6
subnet, the metadata proxy will not be spawned. If the user then adds an
IPv4 subnet with DHCP enabled, the metadata proxy will still not be
spawned. As a result, the metadata service will not be available for
the network.
This patch kills or spawns the metadata proxy whenever a subnet is
added or deleted. So, even if the first subnet of the network is not an
IPv4 subnet with DHCP enabled, the metadata proxy can still be spawned
if the network has subnets that need it.
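The re-evaluation on subnet add/delete can be sketched as follows. Subnets are assumed to be dicts with `ip_version` and `enable_dhcp`, and the function names are hypothetical. The sketch also assumes, as the text implies, that metadata is served only over IPv4.

```python
from typing import Iterable, Optional


def needs_metadata_proxy(subnets: Iterable[dict]) -> bool:
    """Assumes metadata is IPv4-only: the proxy is needed as soon as any
    IPv4 subnet has DHCP enabled, whichever subnet was created first."""
    return any(s["ip_version"] == 4 and s["enable_dhcp"] for s in subnets)


def metadata_proxy_action(subnets: Iterable[dict],
                          proxy_running: bool) -> Optional[str]:
    """Re-evaluated on every subnet add/delete: 'spawn', 'kill' or None."""
    wanted = needs_metadata_proxy(subnets)
    if wanted and not proxy_running:
        return "spawn"
    if proxy_running and not wanted:
        return "kill"
    return None
```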
Closes-bug: #1556991
Change-Id: I0b45af8f2b756732f45c13d7e2dbcd30653cc026
There is a server-side race condition where, if a port request
containing a subnet_id is processed at the same time the subnet is
being deleted, the port operation may succeed without having a fixed
IP on the requested subnet. This patch makes the DHCP agent resilient
to this bug by checking the port response and raising a
SubnetMismatchForPort to trigger a resync if it doesn't have all of
the requested subnet IDs.
Additionally, it avoids skipping assignment of IPv6 addresses to the
interface if they are stateless. The original logic to skip assignment
was only meant to be for SLAAC addresses.
Both of these issues were resulting in the KeyError observed in the
bug report.
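The port-response check can be sketched as follows. The exception class is defined locally as a stand-in for the agent's SubnetMismatchForPort. Ports are assumed to be dicts with `fixed_ips[].subnet_id`, and the function name is hypothetical.

```python
from typing import Iterable


class SubnetMismatchForPort(Exception):
    """Local stand-in for the agent's exception of the same name."""


def check_port_subnets(port: dict, requested_subnet_ids: Iterable[str]) -> None:
    """Raise if the server returned a port lacking a fixed IP on any
    requested subnet (e.g. one deleted concurrently), so the caller can
    trigger a resync instead of hitting a KeyError later."""
    got = {ip["subnet_id"] for ip in port.get("fixed_ips", [])}
    missing = set(requested_subnet_ids) - got
    if missing:
        raise SubnetMismatchForPort(
            "port %s has no IP on subnet(s) %s" % (port["id"], sorted(missing)))
```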
Related-Bug: #1627480
Closes-Bug: #1624079
Change-Id: I85ef1f4d60efd0309d6a0706e29fdbcc16f0b59d
Change I445974b0e0dabb762807c6f318b1b44f51b3fe15 updated the
'revision' field to 'revision_number' but it missed the DHCP
agent and subsequently broke its ability to detect stale updates.
This fixes the name in the agent.
This is marked as a partial for 1622616 because one of the reasons
the agent was frequently updating the DHCP port was in reaction
to stale port update messages for its own port.
Partial-Bug: #1622616
Closes-Bug: #1625867
Change-Id: Id41000127e1084f7ff243f8dc9c399999fbdaab4
Now that the agent will receive port update events for
all port changes[1], we need to avoid immediately restarting
when the subnets on the agent's port changes. Otherwise
the restart may request ports on a subnet which is in the
process of being deleted. While the server is equipped to
handle this, it makes subnet deletion much more contentious
than it needs to be.
This alters the logic to schedule a resync for later if the
agent's port has had its subnets changed rather than restarting
right away. Then, by the time the agent eventually syncs, the
server should have finished deleting the subnet. Even if it hasn't,
it spaces out the request from the agent for the network far enough
that the operation will be much less frequent to avoid racing
with the server.
1. I607635601caff0322fd0c80c9023f5c4f663ca25
Partial-Bug: #1622616
Change-Id: I98761a7e3f4bce8d5485c885f03c6bfdde246802
If the DHCP port setup process fails in the DHCP agent device
manager, it will throw a conflict exception, which will bubble
all of the way up to the main DHCP agent. The issue is that, during
a 'restart' call, the config files are wiped out while maintaining
the VIF before calling setup. This means that, if setup fails, there
is no reference to the interface name anymore so a subsequent destroy
will not first unplug the VIF before destroying the namespace.
This leaves a bunch of orphaned tap ports behind in the OVS case
that don't have an accessible namespace.
This patch addresses the issue by cleaning up all ports inside of
a namespace on a 'setup' failure before reraising the exception.
This ensures that the namespace is clear if destroy is called in the
future without another successful setup.
Closes-Bug: #1625325
Change-Id: I0211422de51ce6acc6eb593eb890b606101cb9f0
The previous logic was just ripping the interface out without
stopping dnsmasq. This would leave a file handle open on the
interface, which would cause OVS to completely freak out and assign
the same ofport to multiple ports.
This preserves the behavior introduced in
I40b85033d075562c43ce4d0e68296211b3241197 but just fully disables
DHCP rather than relying on an exception generation to cause the
resync.
Closes-bug: #1624701
Change-Id: Icdd9ac136eeb3707c912853b134dbb58109e6940
Capture port not found exceptions from port updates of DHCP ports
that no longer exist. The DHCP agent already checks the return
value for None in case any of the other things went missing
(e.g. Subnet, Network), so checking for ports disappearing makes
sense. The corresponding agent-side log message for this has also
been downgraded to debug since this is a normal occurrence.
This also cleans up log noise from calling reload_allocations on
networks that have already been torn down due to all of the subnets
being removed.
Closes-Bug: #1621650
Change-Id: I495401d225c664b8f1cf7b3d51747f3b47c24fc0
The DHCP agent was using the same context for every RPC
request so it made it difficult to tell server side where one
RPC request began and where another one ended.
This patch has it generate a new context for each RPC request
so they can be tracked independently. In the long term it would
be better if the agent kept the context for server-initiated events
so actions could be tracked end-to-end under the same request-ID.
Change-Id: I1d6dc28ba4752d3f9f1020851af2960859aae520
Closes-Bug: #1618231
This utilizes the new revision number to discard stale
port updates on the DHCP agent.
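Discarding stale updates by revision can be sketched as follows. Port updates are assumed to be dicts carrying `id` and `revision_number`, and the tracker class is hypothetical. The sketch drops only updates strictly older than the newest revision seen for that port.

```python
from typing import Dict


class RevisionTracker:
    """Hypothetical helper: drop port updates older than what was seen."""

    def __init__(self) -> None:
        self._seen: Dict[str, int] = {}

    def accept(self, port: dict) -> bool:
        """Return True (and record the revision) unless the update is stale."""
        port_id, rev = port["id"], port["revision_number"]
        cached = self._seen.get(port_id)
        if cached is not None and rev < cached:
            return False  # stale update arrived late; ignore it
        self._seen[port_id] = rev
        return True
```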
Change-Id: I8c904b63515692039e7e2bf6cf8c48f3575c5bc1
Partially-Implements: bp/push-notifications
neutron/tests/unit/agent/dhcp/test_agent.py
was given execute permission by mistake
in Change-Id: I57d7c242b2f2b63d71f7830fe355dbf857ffad58.
This change removes the erroneous permission.
Change-Id: I44c783f5eae66b587f82cea08ea5fdd6d42234b4
Refactoring neutron configuration options for dhcp agent to be in
neutron/conf/agent. This would allow centralization of all configuration
options and provide an easy way to import.
Change-Id: Ia17d2d7223dd598e2d36a8320942fb03b61dffaf
Partial-Bug: #1563069
Some tests used incorrect order assertEqual(observed, expected).
The correct order expected by testtools is
assertEqual(expected, observed).
Change-Id: I57d7c242b2f2b63d71f7830fe355dbf857ffad58
When a subnet is created and the network is scheduled to a DHCP agent,
the DHCP agent requests the neutron server to create a DHCP port.
The neutron server creates the port, marks it as BUILD and waits for
the ready signal from the DHCP agent.
The DHCP agent creates the 'real' DHCP port after getting the response
from the neutron server, but afterwards it does not tell the neutron
server that the DHCP port is ready, so the reported bug can be observed.
If ports are created before DHCP is enabled for the network, the DHCP
agent will not mark those ports as 'ready' because there is no network
cache. This patch also marks all ports in the network as ready, in case
that happens.
Change-Id: I363d8727f7ef6e6e08be4b0022c6464d51692b85
Closes-bug: #1588906
The option was deprecated a long time ago, and will be removed in one of
the next library releases, which will render neutron broken if we keep
using the option.
More details:
http://lists.openstack.org/pipermail/openstack-dev/2016-May/095166.html
Closes-Bug: #1586066
Change-Id: I884b4cc3ed04e4b5489e265c146666e04eb1bc27