We want one additional thread in Neutron for the return worker,
and we really only need one additional thread per forward worker.
When we have multiple threads (with AFTER_INIT) this seems to be
upsetting the process of port binding, particularly on router and
DHCP ports, because of the multiple almost simultaneous binding
messages that trigger in the herd of threads.
Change-Id: I68dbd02b8a235b128779357d4bf3df26b5bf604a
Compat is for compatibility; there are certain things that no longer
need help because they're consistent across all OpenStack versions we
support (really Queens+ at this point). Remove the compat.py code
that supports this and stop using compat in import lines.
Also removed some conditional importing that was happening outside of
the compatibility layer - which shouldn't happen, but did in odd
places.
Change-Id: I913b184713a85b05ff97dcc14c9459025cb0c0e5
Remove all the conditionals to create the right imports in compat.py
where they relate to unsupported versions.
This is, in fact, most of them; the actual import lines will be
fixed in a subsequent patch.
Changed use of n_const and plugin_constants in the main body of the
code because they were being mixed up, making for harder work. The
reason was that at some point the plugin types were moved to a
different file, confusing matters; L3 is a plugin constant, Neutron-
inbuilt network types are not.
We should also look for use of TYPE_VXLAN as it's quite likely that
we shouldn't be using it to refer to GPE networks at all.
Change-Id: I96356a3500b3bbc90f6acb2673fe75df5d027468
Option functions are now in a predictable place; use their imports
and lose the code that dealt with older locations in older Neutron
versions.
Change-Id: I6031e71c68679c2ac2a6efb4b696b834ad448523
Our version of hacking was ancient, and it loads a pep8 check that
is both very inconsistent in reporting errors and has weaker
constraints. Fix the pep8 errors revealed and disable some of the
irrelevant tests in tox.ini.
Change-Id: I4ca71aa88456e4ac2baf0cfb3ac058b0a19e3b6b
This will, on an all-in-one node, be whatever the agent is loading;
in multi-node configurations, network_drivers should be consistent
between control and compute nodes.
Since the valid network types that a mechanism driver accepts are
precisely dependent on the ones the agent supports, this allows
them to be configured to work in tandem.
Change-Id: I0f698b64fbb4a799dcf134d2b5bf3bf516d0dade
Adding a rule via the API adds the rule on the next SG update,
due to the weird way in which Neutron updates its database. This
results in intermittent cases where a rule is added but not pushed out
to VPP until the next rule is added or the SG is changed (e.g. a name
change). It's not true consistently, but it has been seen in test.
Use the opportunity to tidy the code, which is overengineered to
assume multiple-object updates when that never in fact happens.
Change-Id: I89b7ece88edf4613ccdbb5c14cae4b81608b483c
Liberty, Mitaka and Newton specific code was incorporated into the
mechanism driver. We can discard distinctions for pre-Newton
versions quite safely, removing all compatibility if/else code
and simplifying the driver.
Change-Id: I8ea70dbf990ea4b994619e8a3ee493e7a9676f9e
When the trunk plugin loads the etcdAgentCommunicator,
it sets the bind notifier to an empty lambda function. This
causes a race condition in the etcdAgentCommunicator object
used by mech_vpp to send port bind complete requests.
This patch resolves the issue by using a separate
JournalManager class that does not start any watcher
threads.
Closes-Bug: #1845537
Change-Id: I209b31b3e24dbb9bd6afeb327f15fe9883a57b71
After a successful live migration, the etcd entry for the source compute
node's port binding is not removed. This also causes the BD and vhostuser
corresponding to the port to linger in the source compute node's VPP.
The fix ensures that proper cleanup happens.
Change-Id: Ibb8ef06f217ae322e2257fae4d20de33c354d7ea
Closes-Bug: #1835555
Some minor updates to certain functions to allow mypy to work, but
largely a matter of adding formatted typing comments that mypy can do
static typing against.
Change-Id: I65e88ba493599091fa657181f3f3ad5595dacbe1
A type driver for use by the GPE network type. This type
driver will replace the vxlan network type with the gpe
type. The vxlan network type will no longer be supported by
this mechanism driver. To use this driver, set the options below
in the ml2_conf.ini file.
[ml2]
tenant_network_types = gpe,vlan
type_drivers = gpe,vlan,flat
[ml2_vpp]
gpe_vni_ranges = <vni_min>:<vni_max>, <vni_min>:<vni_max>, ..
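For illustration, the range syntax above could be parsed along these
lines. This is a minimal sketch, not the type driver's own code;
`parse_vni_ranges` is a hypothetical name and the validation rule is an
assumption.

```python
def parse_vni_ranges(value):
    """Parse "<min>:<max>, <min>:<max>" into a list of (min, max) tuples.

    Hypothetical helper for illustration only.
    """
    ranges = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            # Tolerate a trailing comma or empty entry.
            continue
        vni_min, vni_max = (int(part) for part in item.split(":"))
        if vni_min > vni_max:
            # Assumption: a reversed range is treated as a config error.
            raise ValueError("invalid VNI range: %s" % item)
        ranges.append((vni_min, vni_max))
    return ranges
```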
Closes-Bug: #1808887
Change-Id: I41572afabb9945c2e9c944c2486439d8ab26930b
The 'unbind_port_from_remote_groups' function has nothing to do
with port unbinding, which has a very specific meaning. Renamed.
Change-Id: I46b7ba4e0c38759c7e0af282e8cf144b588e8715
In ICMP rules, port-min denotes ICMP type and port-max
denotes ICMP code. In some special cases, port-max can
be set to a null or zero value to indicate all codes for
a specific type. This patch resolves an issue where this
rule was not processed correctly.
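The mapping described above can be sketched as follows. This is an
illustrative sketch only: `icmp_type_code_range` is a hypothetical
helper, not the driver's function, and the handling of a missing
port-min is an assumption.

```python
def icmp_type_code_range(port_min, port_max):
    """Map a rule's port range onto ICMP type and code ranges.

    Hypothetical helper; returns
    ((type_first, type_last), (code_first, code_last)).
    """
    if port_min is None:
        # Assumption: no type given means every type and code matches.
        return (0, 255), (0, 255)
    if not port_max:
        # port-max is None or 0: all codes for this specific type.
        return (port_min, port_min), (0, 255)
    # Both given: exactly one type and one code.
    return (port_min, port_min), (port_max, port_max)
```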
Closes-Bug: #1775703
Change-Id: Id6c38b8d465d908c52f132b28b7d608009cde4bd
The 'SignedEtcdJSONWriter' is, depending on config, a 'signed etcd json
writer' or 'a proxy to the unsigned etcd json writer'. The two classes
should be instances of the same base type and the correct one
instantiated based on circumstances.
This change converts users of the JSON writers to use a function that
instantiates the right JSON writer based on circumstances.
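The shape of that change might look like the sketch below. The class
and function names are hypothetical stand-ins, and the "signing" here
is a placeholder prefix rather than real JWT signing.

```python
import json


class JSONWriter:
    """Base type: serialises a value before it is written to etcd."""

    def encode(self, value):
        return json.dumps(value, sort_keys=True)


class SignedJSONWriter(JSONWriter):
    """Placeholder for a signing writer (a real one would produce a JWT)."""

    def encode(self, value):
        return "signed:" + super().encode(value)


def make_json_writer(signing_enabled):
    """Return the writer that matches the configuration."""
    return SignedJSONWriter() if signing_enabled else JSONWriter()
```

Callers then depend only on the base type and never need to know
whether signing is configured.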
Change-Id: If01c68d93d06c62bea8e55568d5049a3aa8c2c97
When a security group is deleted, remove the key from remote-groups
to avoid any stale etcd keys.
Closes-Bug: #1719625
Change-Id: I62f288768c3780db094a772d33f275418c5f959b
From the Queens release onwards, the Neutron DB API no longer has
get_session(). We need to use the session from the context.
Change-Id: Ib072ffa538a4123444beb5b632171f679b6d72cf
Closes-Bug: #1750684
Move all the shared constants between the server, agent and the
plugins to a common constants file to avoid unnecessary dependencies
between modules.
Change-Id: I89573b22e0ff246ff4fe607af207a9226045e7bb
Closes-Bug: #1750255
The latest Neutron has moved the 'callbacks' module to neutron_lib;
update the imports in mech_vpp.py to reflect this.
Change-Id: I6e02fa2d995a7ee48f1497ad422424624083ba7b
Adds support for all the string values of IP protocols supported
by Neutron, such as rsvp, ah, esp, ospf and so on. The complete
list of protocol names can be found in the bug report.
Change-Id: Ida2523a401aef60def7749af9033b4467a8cf001
Closes-Bug: #1737785
When an SG rule is created in Neutron with an integer IP protocol value,
the rule does not get configured and the vpp-agent throws an error.
Neutron sets the protocol value to a string, which was not being correctly
converted to an integer. Resolved the issue by updating the mech-vpp
driver code to convert protocol values from string to int.
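A conversion of this kind could be sketched as below. This is not the
driver's code: `protocol_to_int` and `PROTOCOL_NUMBERS` are hypothetical
names, and the table is deliberately partial.

```python
# Partial name table for illustration; values are the IANA protocol numbers.
PROTOCOL_NUMBERS = {"icmp": 1, "tcp": 6, "udp": 17}


def protocol_to_int(protocol):
    """Return the IP protocol number for an int, numeric string or name."""
    if isinstance(protocol, int):
        return protocol
    if protocol.isdigit():
        # Neutron may hand over the number as a string, e.g. "6".
        return int(protocol)
    return PROTOCOL_NUMBERS[protocol.lower()]
```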
Change-Id: I6ad1278a7364df9e30b7351adad9579e2369878e
Closes-Bug: #1737578
This recently (Pike-Queens) moved from neutron.plugins.ml2.config to
neutron.conf.plugins.ml2.config and this patch adds a layer in compat.py
to insulate us from that, and also rejigs ml2_vpp config so that the
options for etcd live nearer the code that uses them and in general all
option registering works in a similar fashion.
Addresses the config problems that were band-aided over in change
Idbf622591e8d92d8e29e3a92799d0e7ff460354a.
Change-Id: I18767336f05a0fdcebfe16ddd46c2a5325d4e661
Only 'plugtap' is used at this point (i.e. we add an external interface
to a bridge along with a VPP-created tap interface). Simplify the
code.
(Also update the test_mech.py file to be more idiomatic, which helped
when I had a failure to fix in that test.)
Change-Id: Ib9e31c4f46ca62b4cb764646af3ba9e50de451b1
The etcd messages are (optionally) stored as JWTs (JSON Web Tokens) with
an RSA signature and X.509 certificate. This guarantees the authenticity
of the etcd messages.
Change-Id: Iaa1a70a7e87b935b6ff48e2b1f27784ed27ecc97
When an external subnet is deleted, if there are multiple DHCP agents,
neutron places the DHCP server's port in a binding-failed state on
compute hosts. During network deletion, the port_delete_precommit call
fails in networking-vpp as it assumes that the port has a valid binding
level. This patch resolves this issue.
Change-Id: I2f5e37da922c47c280828b20f39c127038b5e68d
If the vpp-agent is not active at the time GPE ports are unbound,
the remote GPE keys in the gpe key space are not removed. If the
vpp-agent is started at a later time, it tries to create stale
remote mappings for these already unbound ports. This patch resolves
this issue.
Closes-Bug: #1710990
Change-Id: I741015622c8b97d19f0e4532e799b09ed352aab7
Prior to this patch, the remote-group-id attribute in security-group
was ignored by networking-vpp. This patch adds support for the remote-group-id
attribute by using an agent thread to watch for changes in security-group
to port associations. The agent then computes VPP ACLs by taking a product
of the port IP addresses in the remote-group and other rule attributes
such as protocol and TCP/UDP port.
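The product described above can be illustrated with a small sketch. The
function name and the dictionary keys are hypothetical and do not
reflect the agent's actual ACL format.

```python
from itertools import product


def expand_remote_group_rule(remote_ips, protocols, port_ranges):
    """Build one ACL entry per (remote IP, protocol, port range) combination.

    Hypothetical helper for illustration only.
    """
    return [
        {"remote_ip": ip, "protocol": proto,
         "port_min": port_min, "port_max": port_max}
        for ip, proto, (port_min, port_max)
        in product(remote_ips, protocols, port_ranges)
    ]
```

Because the entries depend on the current membership of the remote
group, they have to be recomputed whenever a port joins or leaves it,
which is why the agent watches the security-group to port associations.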
Closes-Bug: #1656471
Change-Id: If68ab2c4cba179ec181d10a9d80ca40d78c76a0c
Ensure that we grab the election for a certain time before proceeding
with jobs that can take extended periods. This ensures that the election
does not expire (which would let another thread become master).
Also add configurable values for various timeouts and do some general
refactoring.
Change-Id: I480a4ec44b571c24fcf42520220de0439743ef81
Closes-Bug: #1694723
Previously, identifying the earliest entry in the vpp_etcd_journal table
and locking it for update, so that we can process and delete it, was done
in a single query. We suspect this might accidentally lock all the
entries in the table (as part of the read), causing side effects when
other threads wait for the table to be unlocked.
This patch splits the operation into two parts:
(a) a normal select query to identify the oldest row, and
(b) a 'select for update' query on that row alone.
When multiple threads enter this block, all the others that got the same
row id wait until the one holding the lock releases it. Each will then
query, find that the row is gone, and start another read.
Change-Id: I39bced823d7e1a7c6b38ab5490117ffced398741
Closes-Bug: #1691813
The utils module contained a number of originally general purpose
functions, but they ultimately ended up being all about etcd. The
exceptions were also exclusively about etcd and its config. The
two modules have been combined with etcdhelper and networking_vpp's
own exceptions module.
Change-Id: Ieaa574cf7b335c43062419845e9ebd96a38b4bdb
We have a bad habit of dumping entire data structures of all
security groups in the system on many operations. This patch
removes a lot of log lines (and rationalises some others) to
get the chatter down. If logs need adding back, they can be
re-added later.
Change-Id: I05a025207dccea4bbace6c58f8393a159a319287
This is imperfect, because our compatibility layer that allows
us to work with older versions doesn't cover all the necessary
Neutron inclusions, but excluding N530 we have a pass.
Changes fit into categories:
- LOG.warn -> LOG.warning
- log messages now have kwargs if more than one argument
(not essential, but used to be for translation and seems
to be a good practice anyway)
- exception messages must be translated
- translated messages must have named positional arguments
- oslo_serialization.jsonutils must be used, not json
- [] can't be a default argument
- compat is marked with N530 as, there, neutron imports are
expected. In other places the mark has been left off, and
the tox check has been disabled (for later enablement via an
RFE)
This should not include changes outside of that set.
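As a reminder of why the "[] can't be a default argument" check exists,
here is a generic Python sketch. The function names are hypothetical
and not taken from this codebase.

```python
def add_tag_shared(tag, tags=[]):
    # Anti-pattern: the default list is created once, at definition time,
    # so every call that omits `tags` mutates the same object.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # Preferred form: build a fresh list whenever none is supplied.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

With the first form, state leaks between unrelated calls; the second
form returns an independent list each time.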
Change-Id: I1dc050f83a5199bf40117eac6c9adae221ae6857
For modern versions of OpenStack that support it, rather than putting
the port to ACTIVE in the mechanism driver, it is necessary to place
a provisioning block there and release it when provisioning finishes,
allowing ML2 itself to determine that all jobs are complete and that
the port should be made active. Add support for those versions.
Removed some other hacks related to this that weren't working as they
should have (e.g. checking memory_regions doesn't work as a notification
source, and sock_errno wasn't the source of the problem).
Change-Id: I248fbca1d00f313075ad34f7cd1baceb4742786f
We pushed all secgroups before we bound a port, for two reasons:
1. there was a bug in writing security groups such that they weren't
always output on creation and so were not present when the port was
bound. This has long been addressed.
2. there was no link between the secgroup thread, and, while no
guarantee, this made it considerably more likely secgroups were
present before a port was bound. Async secgroup waiting fixes
this.
With the above addressed, the workaround can be removed.
Change-Id: Ic5c8e8c46774564d4b28aa844df41b187144d0dd
It seems that we beat Nova to the punch, sometimes noticing that a
port is bound before Nova thinks it's possible. In this case,
we send an event to Nova to signal bind completion, but it isn't
yet waiting for it. Notify a second time, 4s later, as a temporary
fix until we can find a better solution (in Neutron or Nova as the
case may be).
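The repeat notification can be pictured with the sketch below. It is
an illustration only: `notify_twice` is a hypothetical name, and
`threading.Timer` stands in for whatever scheduling mechanism the
driver actually uses.

```python
import threading


def notify_twice(notify, port_id, delay=4.0):
    """Call notify(port_id) now and once more after `delay` seconds."""
    notify(port_id)
    # Second attempt, in case the receiver was not yet waiting for the event.
    timer = threading.Timer(delay, notify, args=(port_id,))
    timer.daemon = True
    timer.start()
    return timer
```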
Additionally, use the memory regions as the indicator of binding.
We aren't certain of what sock_error would be on an unbound port,
but we know the port is definitely connected if memory regions
are present.
Change-Id: I512ca70789ed5628e10a819a39af5793feb6a505
If an agent is down, or a physnet not *currently* present, we can't
bind. However, the previous method of determining this didn't
check etcd's current state, only its initial state, which led to
startup races and no response to changing conditions.
Change-Id: Ia5738619d3f20ab30b16e372a6d4890fc9684274
The notify_bound call used to happen in the main Neutron thread, but
now it happens in response to a background call and as such shouldn't
ever re-enter. The check is no longer required.
Change-Id: I491e1f0a5c4e89da1d8b0e50559511420bec47f5
We use multiple eventlets in the system, but they share clients.
We've seen this leading to one call getting the answer for another,
and it's possibly also the cause of Neutron hanging (theoretically,
because it's in a long etcd watch when it sends another etcd request).
Change the code to ensure that background threads use their own etcd.
As a consequence, VPPForwarder is now (and should remain) clear of any
use of etcd, and similarly no Neutron thread should use etcd in the
mechanism driver - only the forwarder thread and the return thread.
This may provoke bind-before-secgroup races, which need fixing
independently.
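One way to give each thread its own client is sketched below. This is
an assumption-laden illustration, not the patch itself:
`PerThreadClient` is a hypothetical name, and `factory` stands for
whatever callable constructs an etcd client.

```python
import threading


class PerThreadClient:
    """Hand each thread its own lazily created client instance."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        # The attribute lives in thread-local storage, so a client created
        # in one thread is never visible to another.
        if not hasattr(self._local, "client"):
            self._local.client = self._factory()
        return self._local.client
```

A thread blocked in a long watch then cannot receive, or delay, the
response to a request issued by a different thread.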
Change-Id: I0487912e9d5115a70a5a4277abf6c54105755a15