100 Commits

Author SHA1 Message Date
Ian Wells
8027ba0ad7 Use AFTER_SPAWN not AFTER_INIT in Neutron to start watcher
We want one additional thread in Neutron for the return worker,
and we really only need one additional thread per forward worker.

When we have multiple threads (with AFTER_INIT) this seems to be
upsetting the process of port binding, particularly on router and
DHCP ports, because of the multiple almost simultaneous binding
messages that trigger in the herd of threads.

Change-Id: I68dbd02b8a235b128779357d4bf3df26b5bf604a
2020-08-03 20:48:16 +00:00
Ian Wells
ba0ce71d1c COMPAT CLEANUP: stop importing things from compat
Compat is for compatibility; there are certain things that no longer
need help because they're consistent across all OpenStack versions we
support (really Queens+ at this point).  Remove the compat.py code
that supports this and stop using compat in import lines.

Also removed some conditional importing that was happening outside of
the compatibility layer - which shouldn't happen, but did in odd
places.

Change-Id: I913b184713a85b05ff97dcc14c9459025cb0c0e5
2020-07-28 17:26:12 -07:00
Ian Wells
68d33b43fa COMPAT CLEANUP: remove conditional imports
Remove all the conditionals to create the right imports in compat.py
where they relate to unsupported versions.

This is, in fact, most of them; the actual import lines will be
fixed in a subsequent patch.

Changed use of n_const and plugin_constants in the main body of the
code because they were being mixed up, making for harder work.  The
reason was that at some point the plugin types were moved to a
different file, confusing matters; L3 is a plugin constant, Neutron-
inbuilt network types are not.

We should also look for use of TYPE_VXLAN as it's quite likely that
we shouldn't be using it to refer to GPE networks at all.

Change-Id: I96356a3500b3bbc90f6acb2673fe75df5d027468
2020-07-28 17:19:40 -07:00
Ian Wells
854a02bb18 COMPAT CLEANUP: remove code for option functions
Option functions are now in a predictable place; use their imports
and lose the code that dealt with older locations in older Neutron
versions.

Change-Id: I6031e71c68679c2ac2a6efb4b696b834ad448523
2020-07-28 02:12:24 +00:00
Ian Wells
c8d15c9bcc Remove six
Six is Python 2 and 3 compatibility.  It is no longer needed now that
we're purely py3.

Change-Id: I28a6f8539b3b3ce35221728ec631380a18c921b3
2020-07-27 23:15:31 +00:00
Ian Wells
ef2b77f99b Bring hacking up to spec
Our version of hacking was ancient, and it loads a pep8 check that
is both very inconsistent in reporting errors and has weaker
constraints.  Fix the pep8 errors revealed and disable some of the
irrelevant tests in tox.ini.

Change-Id: I4ca71aa88456e4ac2baf0cfb3ac058b0a19e3b6b
2020-07-24 10:44:11 -07:00
Ian Wells
c5521a28e2 mech_vpp validates by the list of configured network type drivers
This will, on an all-in-one node, be whatever the agent is loading;
in multi-node configurations, network_drivers should be consistent
between control and compute nodes.

Since the valid network types that a mechanism driver accepts are
precisely dependent on the ones the agent supports, this allows
them to be configured to work in tandem.

Change-Id: I0f698b64fbb4a799dcf134d2b5bf3bf516d0dade
2020-07-22 16:50:37 +00:00
Ian Wells
fdf8775c97 Update SG mechdriver code to add rules properly
Adding a rule via the API adds the rule on the next SG update,
due to the weird way in which Neutron updates its database.  This
results in intermittent cases where a rule is added but not pushed out
to VPP until the next rule is added or the SG is changed (e.g. a name
change).  It's not true consistently, but it has been seen in test.

Use the opportunity to tidy the code, which is overengineered to
assume multiple-object updates when that never in fact happens.

Change-Id: I89b7ece88edf4613ccdbb5c14cae4b81608b483c
2020-06-03 10:03:32 -07:00
Ian Wells
53f957bbc1 Remove old compat from mech drivers
Liberty, Mitaka and Newton specific code was incorporated into the
mechanism driver.  We can discard distinctions for pre-Newton
versions quite safely, removing all compatibility if/else code
and simplifying the driver.

Change-Id: I8ea70dbf990ea4b994619e8a3ee493e7a9676f9e
2020-06-03 10:03:32 -07:00
Naveen Joy
7feb2fb892 Prevent a race condition when loading the trunk plugin
When the trunk plugin loads the etcdAgentCommunicator,
it sets the bind notifier to an empty lambda function. This
causes a race condition in the etcdAgentCommunicator object
used by mech_vpp to send port bind complete requests.
This patch resolves the issue by using a separate
JournalManager class that does not start any watcher
threads.

Closes-Bug: #1845537
Change-Id: I209b31b3e24dbb9bd6afeb327f15fe9883a57b71
2019-10-01 14:05:56 -07:00
Onong Tayeng
be2afb60ce remove etcd stale entries
After a successful live migration, the etcd entry for the source compute node's
port binding is not removed. This also causes the BD and vhostuser
corresponding to the port to linger in the source compute node's VPP.

The fix ensures that proper cleanup happens.

Change-Id: Ibb8ef06f217ae322e2257fae4d20de33c354d7ea
Closes-Bug: #1835555
2019-08-09 17:43:46 +05:30
Ian Wells
d1c078ca2d Add mypy typing information
Some minor updates to certain functions to allow mypy to work, but
largely a matter of adding formatted typing comments that mypy can do
static typing against.

Change-Id: I65e88ba493599091fa657181f3f3ad5595dacbe1
2019-03-28 17:54:24 -07:00
Naveen Joy
1eefea9570 A GPE network type driver
A type driver for use by the GPE network type. This type
driver will replace the vxlan network type with the gpe
type. The vxlan network type will no longer be supported by
this mechanism driver. To use this driver, set the options below
in the ml2_conf.ini file.

[ml2]
tenant_network_types = gpe,vlan
type_drivers = gpe,vlan,flat
[ml2_vpp]
gpe_vni_ranges = <vni_min>:<vni_max>, <vni_min>:<vni_max>, ..

Closes-bug: #1808887
Change-Id: I41572afabb9945c2e9c944c2486439d8ab26930b
2019-02-06 11:50:47 -08:00
jb
c6b8fa644b Implementation of Tap as a Service
(contains the vpp driver for openstack/tap-as-a-service)

Change-Id: Ibd28c936ba4f13e3822b495e43ad80781df81c1f
2018-08-17 13:21:53 -07:00
Ian Wells
89fa1d565a Add device_id to etcd port record to simplify debugging
Change-Id: I71a512c7be4b7bd2c69b1c0d81d6723c929001c0
2018-07-20 20:00:03 +00:00
Zuul
438fc62faa Merge "Rename remote group management function"
2018-06-22 16:08:42 +00:00
Zuul
8fe6ced579 Merge "Fix ICMP code handling issue in sec-group rules"
2018-06-18 16:59:18 +00:00
Ian Wells
de9592411a Rename remote group management function
The 'unbind_port_from_remote_groups' function has nothing to do
with port unbinding, which has a very specific meaning.  Renamed.

Change-Id: I46b7ba4e0c38759c7e0af282e8cf144b588e8715
2018-06-15 16:51:18 -07:00
Naveen Joy
b8e86370f9 Fix ICMP code handling issue in sec-group rules
In ICMP rules, port-min denotes ICMP type and port-max
denotes ICMP code. In some special cases, port-max can
be set to a null or zero value to indicate all codes for
a specific type. This patch resolves an issue where such a
rule was not processed correctly.
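
The mapping described above can be sketched as follows. This is a minimal illustration only, not the driver's code: the function name, the constant and the returned tuple layout are hypothetical, and the treatment of a zero port-max as "all codes" simply follows the wording of this commit message.

```python
from typing import Optional, Tuple

# Hypothetical constant: the full range of ICMP codes.
ALL_ICMP_CODES = (0, 255)


def icmp_type_and_codes(port_min: Optional[int],
                        port_max: Optional[int]) -> Tuple[int, Tuple[int, int]]:
    """Return (icmp_type, (first_code, last_code)) for an ICMP rule."""
    if port_min is None:
        raise ValueError("an ICMP type is required for this sketch")
    # port-min carries the ICMP type.
    icmp_type = int(port_min)
    # Per the commit message, a null or zero port-max means "all codes".
    if not port_max:
        return icmp_type, ALL_ICMP_CODES
    # Otherwise port-max carries one specific ICMP code.
    code = int(port_max)
    return icmp_type, (code, code)
```

For example, `icmp_type_and_codes(8, None)` covers every code of type 8, while `icmp_type_and_codes(3, 4)` matches only type 3, code 4.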

Closes-Bug: #1775703

Change-Id: Id6c38b8d465d908c52f132b28b7d608009cde4bd
2018-06-08 15:03:33 -07:00
Ian Wells
f66542c203 Remove anti-pattern from etcd signing code
The 'SignedEtcdJSONWriter' is, depending on config, a 'signed etcd json
writer' or 'a proxy to the unsigned etcd json writer'.  The two classes
should be instances of the same base type and the correct one
instantiated based on circumstances.

This change converts users of the JSON writers to use a function that
instantiates the right JSON writer based on circumstances.
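
A minimal sketch of the pattern this change describes, under stated assumptions: the class and function names below are hypothetical, and an HMAC stands in for the real signature scheme purely to keep the example self-contained.

```python
import hashlib
import hmac
import json


class JSONWriter:
    """Common base type: turns a Python value into the stored string."""

    def dumps(self, value):
        return json.dumps(value, sort_keys=True)


class SignedJSONWriter(JSONWriter):
    """Wraps the payload with a signature (HMAC here, as a stand-in)."""

    def __init__(self, key):
        self._key = key

    def dumps(self, value):
        payload = super().dumps(value)
        sig = hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()
        return json.dumps({"payload": payload, "sig": sig}, sort_keys=True)


def make_json_writer(signing_key=None):
    """Hypothetical factory: pick the writer class from configuration."""
    if signing_key:
        return SignedJSONWriter(signing_key)
    return JSONWriter()
```

Callers ask the factory for a writer and never need to know which class they received; neither class has to act as a proxy for the other.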

Change-Id: If01c68d93d06c62bea8e55568d5049a3aa8c2c97
2018-06-07 10:31:28 +00:00
Naveen Joy
8552ab0636 Remove stale remote-groups from etcd.
When a security group is deleted, remove the key from remote-groups
to avoid any stale etcd keys.

Closes-Bug: #1719625
Change-Id: I62f288768c3780db094a772d33f275418c5f959b
2018-03-23 02:39:56 +00:00
Naveen Joy
8dd085db87 Get session from context instead of neutron_db_api
From the Queens release onwards, the Neutron DB API no longer has
get_session(). We need to use the session from the context.

Change-Id: Ib072ffa538a4123444beb5b632171f679b6d72cf
Closes-Bug: #1750684
2018-03-01 14:42:23 -08:00
Naveen Joy
d8c2f8d862 Create a common constants module to avoid unwanted dependencies
Move all the shared constants between the server, agent and the
plugins to a common constants file to avoid unnecessary dependencies
between modules.

Change-Id: I89573b22e0ff246ff4fe607af207a9226045e7bb
Closes-Bug: #1750255
2018-02-19 13:15:28 -08:00
Shriram Chander
6252412cfd Updating callback imports from neutron to neutron_lib
The latest Neutron has moved the 'callbacks' module to neutron_lib;
update the imports in mech_vpp.py to reflect this.

Change-Id: I6e02fa2d995a7ee48f1497ad422424624083ba7b
2018-02-06 11:53:47 -08:00
Zuul
ac83f291a3 Support all Neutron protocol names in a security group rule
Adds support for all the string values of IP protocols supported
by Neutron such as rsvp, ah, esp, ospf and so on. The complete
list of all the protocol names can be found in the bug report.

Change-Id: Ida2523a401aef60def7749af9033b4467a8cf001
Closes-Bug: #1737785
2017-12-21 01:13:03 +00:00
Naveen Joy
368e208c92 Resolve a bug in the security group rule with integer IP Protocol values
When a SG rule is created in Neutron with an integer IP protocol value,
this rule does not get configured and the vpp-agent throws an error.
Neutron sets the protocol value to a string which was not being correctly
converted to an integer. Resolved the issue by updating the mech-vpp
driver code to convert protocol values from string to int.
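
A minimal sketch of the conversion, assuming the protocol may arrive as an integer, a numeric string or a name. The function name and the deliberately tiny name table are hypothetical and are not taken from the driver.

```python
# Small, deliberately incomplete name table for illustration only.
PROTOCOL_NUMBERS = {"icmp": 1, "tcp": 6, "udp": 17}


def protocol_to_int(protocol):
    """Convert a protocol given as int, numeric string or name to an int."""
    if isinstance(protocol, int):
        return protocol
    text = str(protocol).strip().lower()
    # Neutron hands back numeric protocols as strings such as "6".
    if text.isdigit():
        return int(text)
    try:
        return PROTOCOL_NUMBERS[text]
    except KeyError:
        raise ValueError("unknown IP protocol: %r" % (protocol,))
```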

Change-Id: I6ad1278a7364df9e30b7351adad9579e2369878e
Closes-Bug: #1737578
2017-12-11 09:35:48 -08:00
Ian Wells
a5836fd1ce Add compat layer for driver_api file
Moved to neutron-lib in Pike/Queens.

Change-Id: Icbfcc60b9722e74f76bc8e9e4c6bfc373c538f35
2017-11-01 09:57:29 +11:00
Ian Wells
426ee0540f Add compatibility for ml2 config module
This recently (Pike-Queens) moved from neutron.plugins.ml2.config to
neutron.conf.plugins.ml2.config and this patch adds a layer in compat.py
to insulate us from that, and also rejigs ml2_vpp config so that the
options for etcd live nearer the code that uses them and in general all
option registering works in a similar fashion.

Addresses the config problems that were band-aided over in change
Idbf622591e8d92d8e29e3a92799d0e7ff460354a.

Change-Id: I18767336f05a0fdcebfe16ddd46c2a5325d4e661
2017-10-11 08:26:40 +11:00
Ian Wells
69eeaf5ddc Remove obsolete 'maketap' plugging type
Only 'plugtap' is used at this point (i.e. we add an external interface
to a bridge along with a VPP-created tap interface).  Simplify the
code.

(Also update the test_mech.py file to be more idiomatic, which helped
when I had a failure to fix in that test.)

Change-Id: Ib9e31c4f46ca62b4cb764646af3ba9e50de451b1
2017-10-20 18:56:53 +00:00
jb
01d09e80d7 ETCD message signatures
The etcd messages are (optionally) stored as JWTs (JSON Web Tokens) with
an RSA signature and X509 certificate.  This guarantees the authenticity
of the etcd messages.

Change-Id: Iaa1a70a7e87b935b6ff48e2b1f27784ed27ecc97
2017-10-11 07:35:33 +11:00
Ian Wells
d3b0616b5b Update location of TYPE_XXX constants for compatibility
neutron_lib has taken these from neutron.plugin.common - added
compatibility.

Change-Id: I1168eb791c747806d1ce0fef7258418a0e749052
2017-10-11 07:32:38 +11:00
Naveen Joy
964582b652 Fix unbind error when deleting an external network.
When an external subnet is deleted, if there are multiple DHCP agents,
neutron places the DHCP server's port in a binding-failed state on
compute hosts. During network deletion, the port_delete_precommit call
fails in networking-vpp as it assumes that the port has a valid binding
level. This patch resolves this issue.

Change-Id: I2f5e37da922c47c280828b20f39c127038b5e68d
2017-10-09 04:31:10 +00:00
Naveen Joy
2eccc9b656 Delete GPE remote keys from etcd when ports are unbound
If the vpp-agent is not active at the time GPE ports are unbound,
the remote GPE keys in the gpe key space are not removed. If the
vpp-agent is started at a later time, it tries to create stale
remote mappings for these already unbound ports. This patch resolves
this issue.

Closes-Bug: #1710990
Change-Id: I741015622c8b97d19f0e4532e799b09ed352aab7
2017-08-15 15:31:21 -07:00
Naveen Joy
0c5fad4c7c Support remote-group-id in security-group rules
Prior to this patch, the remote-group-id attribute in security-group
was ignored by networking-vpp. This patch adds support for the remote-group-id
attribute by using an agent thread to watch for changes in security-group
to port associations. The agent then computes VPP ACLs by taking a product
of the port IP addresses in the remote-group and other rule attributes
such as protocol and TCP/UDP port.
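
The product described above can be sketched as follows. This is an illustration only: the function name and the dictionary layout of each entry are hypothetical, and real VPP ACL rules carry more fields than shown here.

```python
from itertools import product


def expand_remote_group_rule(remote_ips, protocols, ports):
    """Return one ACL entry per (ip, protocol, port) combination."""
    return [
        {"remote_ip": ip, "protocol": proto, "port": port}
        for ip, proto, port in product(remote_ips, protocols, ports)
    ]
```

Two member addresses with one protocol and two ports expand to four entries, so the ACL has to be recomputed whenever the remote group gains or loses a port.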

Closes-Bug: #1656471
Change-Id: If68ab2c4cba179ec181d10a9d80ca40d78c76a0c
2017-07-04 09:00:48 +10:00
Hareesh Puthalath
43174e8005 Forward worker election refactoring
Ensure that we grab the election for a certain time before proceeding
with jobs that can take extended periods.  This ensures that the election
does not expire (which would let another thread become master).

Also adds configurable values for various timeouts, plus general refactoring.

Change-Id: I480a4ec44b571c24fcf42520220de0439743ef81
Closes-Bug: #1694723
2017-07-21 12:34:15 -07:00
Hareesh Puthalath
5097441e23 Isolate vpp_etcd_journal table for update lock to a single row
Previously, a single query both identified the earliest entry in the
vpp_etcd_journal table and locked it for update so that it could be
processed and deleted. We suspect that this accidentally locks every
entry in the table (as part of the read), causing side effects while
other threads wait for the table to be unlocked.

This patch splits the operation into two parts:
(a) a normal select query to identify the oldest row, and
(b) a 'select for update' query on that row alone.
When multiple threads enter this block, every thread that got the same
row id waits until the one holding the lock releases it. Each waiting
thread then queries again, finds that the row is gone, and starts
another read.
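
The two-step scheme can be sketched without a database. In this illustration an in-memory dictionary stands in for the table and one `threading.Lock` per row stands in for the 'select for update' row lock; the class and method names are hypothetical and this is not the patch's SQL.

```python
import threading

_MISSING = object()


class Journal:
    """In-memory stand-in for the vpp_etcd_journal table."""

    def __init__(self, rows):
        self._rows = dict(rows)          # row id -> payload
        # One lock per row, created up front and never modified.
        self._row_locks = {rid: threading.Lock() for rid in self._rows}
        self._table = threading.Lock()   # protects self._rows itself

    def oldest_id(self):
        # Step (a): a plain read; no row lock is held afterwards.
        with self._table:
            return min(self._rows) if self._rows else None

    def process_oldest(self, handler):
        """Process and delete the oldest row; return False when empty."""
        while True:
            row_id = self.oldest_id()
            if row_id is None:
                return False
            # Step (b): lock that single row only.
            with self._row_locks[row_id]:
                with self._table:
                    payload = self._rows.get(row_id, _MISSING)
                if payload is _MISSING:
                    # Another worker processed and deleted the row while
                    # we waited for its lock; go back and read again.
                    continue
                handler(row_id, payload)
                with self._table:
                    del self._rows[row_id]
                return True
```

Workers that picked the same row id queue on that row's lock alone, so a slow handler delays only them rather than every reader of the table.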

Change-Id: I39bced823d7e1a7c6b38ab5490117ffced398741
Closes-Bug: #1691813
2017-05-19 01:10:43 +00:00
Ian Wells
3d690a55de Remove the agent's 'utils' and 'exceptions' modules
The utils module contained a number of originally general purpose
functions, but they ultimately ended up being all about etcd.  The
exceptions were also exclusively about etcd and its config.  The
two modules have been combined with etcdhelper and networking_vpp's
own exceptions module.

Change-Id: Ieaa574cf7b335c43062419845e9ebd96a38b4bdb
2017-04-26 04:07:22 +10:00
Ian Wells
38ad71677f Log rationalisation
We have a bad habit of dumping entire datastructures of all
security groups in the system on many operations.  This patch
removes a lot of log lines (and rationalises some others) to
get the chatter down.  If logs need adding back, they can be
re-added later.

Change-Id: I05a025207dccea4bbace6c58f8393a159a319287
2017-04-26 01:38:40 +10:00
Yichen Wang
f2fb105193 Bugfix to set default is_vxlan during port unbind
1. Bugfix to set default is_vxlan during port unbind;
2. Remove debug logs;

Change-Id: Ie6dfc29a1cbf9f1c7a72819a849f15f680161a14
2017-05-03 14:55:17 -07:00
Ian Wells
1ff3d9e490 Remove ML2_VPP prefixes from log messages
Change-Id: I41e97c060c9944ff035972df68b85377653d9939
2017-04-25 19:39:25 +10:00
Ian Wells
29c817eb70 Align with neutron_lib standards
This is imperfect, because our compatibility layer that allows
us to work with older versions doesn't cover all the necessary
Neutron inclusions, but excluding N530 we have a pass.

Changes fit into categories:
- LOG.warn -> LOG.warning
- log messages now have kwargs if more than one argument
  (not essential, but used to be for translation and seems
  to be a good practice anyway)
- exception messages must be translated
- translated messages must have named positional arguments
- oslo_serialization.jsonutils must be used, not json
- [] can't be a default argument
- compat is marked with N530 as, there, neutron imports are
  expected.  In other places the mark has been left off, and
  the tox check has been disabled (for later enablement via an
  RFE)

This should not include changes outside of that set.
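
Two of the categories above, sketched in isolation. The function is a hypothetical example rather than code from this change, and it uses the standard `logging` module in place of oslo.log to stay self-contained.

```python
import logging

LOG = logging.getLogger(__name__)


def collect_ports(port, ports=None):
    """Append port to ports, creating a fresh list per call when omitted."""
    # A literal [] default would be shared between calls; use None instead.
    if ports is None:
        ports = []
    ports.append(port)
    if len(ports) > 1:
        # LOG.warning (not the deprecated LOG.warn), with named arguments
        # because the message has more than one.
        LOG.warning("Port %(port)s added; %(count)d ports now tracked",
                    {"port": port, "count": len(ports)})
    return ports
```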

Change-Id: I1dc050f83a5199bf40117eac6c9adae221ae6857
2017-04-25 18:37:09 +10:00
Ian Wells
ada0826bba Tidyups in loop wait
Change-Id: Icf1382b390c24b4aaf4db4fdbde8094ea34d173e
2017-04-25 17:46:01 +10:00
Ian Wells
a5a43ff255 Fix lazy import line
Import line did not contain full module path

Change-Id: I0b5a95969e1616345caa9277b3682fc36ad1b870
2017-04-12 06:44:13 +10:00
Ian Wells
ed85ddb18a Change EtcdChangeWatcher to have a nicer API
Change-Id: Ifc1c6dfa6732a92ded838d5fd48f2fafce64d4d1
2017-03-29 21:02:47 +11:00
Ian Wells
da42b3a639 Add provisioning blocks support
For modern versions of OpenStack that support it, rather than putting
the port to ACTIVE in the mechanism driver, it is necessary to place
a provisioning block there and release it when provisioning finishes,
allowing ML2 itself to determine that all jobs are complete and that
the port should be made active.  Add support for those versions.

Removed some other hacks related to this that weren't working as they
should have (e.g. checking memory_regions doesn't work as a notification
source, and sock_errno wasn't the source of the problem).

Change-Id: I248fbca1d00f313075ad34f7cd1baceb4742786f
2017-03-31 19:46:44 +00:00
Ian Wells
4dbf8e8adc Remove secgroup push workaround
We pushed all secgroups before we bound a port, for two reasons:

1. there was a bug in writing security groups such that they weren't
always output on creation and not present when the port was bound.
This has long been addressed.
2. there was no link between the secgroup thread, and, while no
guarantee, this made it considerably more likely secgroups were
present before a port was bound.  Async secgroup waiting fixes
this.

With the above addressed, the workaround can be removed.

Change-Id: Ic5c8e8c46774564d4b28aa844df41b187144d0dd
2017-03-28 11:48:49 +11:00
Ian Wells
a8c50f6815 Update binding notification code
It seems that we beat Nova to the punch, sometimes noticing that a
port is bound before Nova thinks it's possible.  In this case,
we send an event to Nova to signal bind completion, but it isn't
yet waiting for it.  Notify a second time, 4s later, as a temporary
fix until we can find a better solution (in Neutron or Nova as the
case may be).
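
The repeat notification can be sketched as below. This assumes a plain `threading.Timer` and a hypothetical `send` callable; how the driver actually schedules the second event is not shown in this message.

```python
import threading


def notify_twice(send, delay=4.0):
    """Call send() now and once more after delay seconds."""
    send()
    # The second call covers the case where the receiver was not yet
    # waiting for the event when the first one arrived.
    timer = threading.Timer(delay, send)
    timer.daemon = True
    timer.start()
    return timer
```

Returning the timer lets a caller cancel the second notification, for instance if the port is unbound again within the delay.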

Additionally, use the memory regions as the indicator of binding.
We aren't certain of what sock_error would be on an unbound port,
but we know the port is definitely connected if memory regions
are present.

Change-Id: I512ca70789ed5628e10a819a39af5793feb6a505
2017-03-28 08:26:07 +11:00
Ian Wells
00869438fe Fix up physnet and agent liveness detection
If an agent is down, or a physnet not *currently* present, we can't
bind.  However, the previous method of determining this didn't
check etcd's current state, only its initial state, which led to
startup races and no response to changing conditions.

Change-Id: Ia5738619d3f20ab30b16e372a6d4890fc9684274
2017-03-28 08:15:40 +11:00
Ian Wells
26eea1e2a9 Remove old recursion check on notify
The notify_bound call used to happen in the main Neutron thread, but
now it happens in response to a background call and as such shouldn't
ever re-enter.  The check is no longer required.

Change-Id: I491e1f0a5c4e89da1d8b0e50559511420bec47f5
2017-03-19 12:34:33 +11:00
Ian Wells
5dc3c31488 Use separate clients for each of the eventlets/threads
We use multiple eventlets in the system, but they share clients.
We've seen this leading to one call getting the answer for another,
and it's possibly also the cause of Neutron hanging (theoretically,
because it's in a long etcd watch when it sends another etcd request).

Change the code to ensure that background threads use their own etcd.
As a consequence, VPPForwarder is now (and should remain) clear of any
use of etcd, and similarly no Neutron thread should use etcd in the
mechanism driver - only the forwarder thread and the return thread.
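
One way to give each thread its own client is thread-local storage. The sketch below assumes plain Python threads; `FakeEtcdClient` and `PerThreadClients` are hypothetical names, and the change itself may wire its clients up differently.

```python
import threading


class FakeEtcdClient:
    """Placeholder for a real etcd client; holds no shared state."""


class PerThreadClients:
    """Hands each thread its own client, created on first use."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        # Attributes on a threading.local object are private to the
        # calling thread, so no client is ever shared between threads.
        client = getattr(self._local, "client", None)
        if client is None:
            client = self._factory()
            self._local.client = client
        return client
```

Because a long-running watch and a short request then travel over different clients, a response cannot be delivered to the wrong caller.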

This may provoke bind-before-secgroup races, which need fixing
independently.

Change-Id: I0487912e9d5115a70a5a4277abf6c54105755a15
2017-03-17 03:45:12 +00:00