Do not bring up udev assigned interfaces

The extant code is designed to loop over every device in
/sys/class/net and bringing the interface "up" to see if it valid and
something that should be configured.

As described inline, if we bring "up" an interface it can accept an RA
and get an ipv6 address assigned by the kernel.  NetworkManager will
then refuse to further configure the interface, leaving the host
generally without ipv4 networking.

The per-interface loop only actually happens on older platforms that
don't use systemd.  On systemd, glean is called for each interface
individually by udev rules.  However, we fall into the old code path,
just with one interface to work with, rather than all of them.

Thus this initial hack detects that case (by noting we were passed the
interface explicitly) and short-circuits activity check; it just
assumes that if udev asked (and it's not of a device type we don't
support), then the interface should be configured.  The interface will
*not* be put into the "up" state.

We should follow-on this change with a removal of this loop and
cleaning up the non udev/systemd activation paths.  However, this
depends on a few longer term things:

- removing Trusty support (which still hangs on by the skin of its
  teeth in OpenStack Infra, so we need to be not building nodes there)
- evaluating what Gentoo is doing in the non-systemd case.
- making sure bifrost doesnt' depend on this (likely only other user?)

In the mean time, this should fix the race conditions we've been
seeing on system+network-manager platforms.

Change-Id: I6ce51a8755e1892d3010eefd365fbad6bcec137b
This commit is contained in:
Ian Wienand 2019-10-11 14:49:25 +11:00 committed by Jeremy Stanley
parent cf715b6590
commit 82e111f769
1 changed files with 52 additions and 2 deletions

View File

@ -1124,9 +1124,12 @@ def is_interface_live(interface, sys_root):
def interface_live(iface, sys_root, args):
log.debug("Checking if interface %s has an active link carrier." % iface)
log.debug("Checking status of interface %s" % iface)
if is_interface_live(iface, sys_root):
log.debug("%s has active carrier, including", iface)
return True
else:
log.debug("%s does not have active carrier", iface)
if args.noop:
return False
@ -1185,9 +1188,13 @@ def get_sys_interfaces(interface, args):
ignored_interfaces = ('sit', 'tunl', 'bonding_master', 'teql', 'wg',
'ip6gre', 'ip6_vti', 'ip6tnl', 'bond', 'lo')
sys_interfaces = {}
called_from_udev = False
if interface is not None:
log.debug("Only considering interface %s from arguments" % interface)
interfaces = [interface]
# see notes below...
called_from_udev = True
else:
interfaces = [f for f in os.listdir(sys_root)
if not f.startswith(ignored_interfaces)]
@ -1212,9 +1219,48 @@ def get_sys_interfaces(interface, args):
# glean.
if mac_addr_type != PERMANENT_ADDR_TYPE:
continue
mac = open('%s/%s/address' % (sys_root, iface), 'r').read().strip()
# Hack alert! If we have been given a single interface
# argument (hence called_from_udev is true), that means we
# have been called for just one nic by udev in response to the
# "net" "add" action matching. We are going to assume that if
# we made it this far (i.e. past the filters above) this
# interface should be configured.
#
# It is unclear, as at 2019-10, if there are active jobs
# relying on the "probe" path below. The only way to get into
# this path is being called from init scripts on a pre-systemd
# platform that does not use udev activiation; this would mean
# (as at this writing) Trusty (CentOS 6 being long gone).
#
# In short, it tries to bring up *all* the interfaces, and if
# they don't come up, it figures they're not valid and
# excludes them. This introduces a very tricky race -- by
# bringing the interface up it can start accepting RA
# broadcasts and possibly have the kernel configure it with an
# ipv6 addresses. Then, network-manager will see the
# interface is already configured, and out of an abdundance of
# caution, refuse to re-configure it. You end up with broken
# networking.
#
# This is racy; you might get lucky and the RA timeout is long
# enough that network-manager starts before this happens. So
# it is not exactly correct to say that the probe path is
# completely broken; it is possible users have just not
# noticed or are tacitly relying on it.
#
# While we consider this, assuming that if we are called from
# udev that the interface is to be configured here, and not
# bringing it "up", avoids this issue.
if called_from_udev:
log.debug("Interface matched: %s (%s)", iface, mac)
sys_interfaces[mac] = iface
return sys_interfaces
# check if interface is up if not try and bring it up
if interface_live(iface, sys_root, args):
mac = open('%s/%s/address' % (sys_root, iface), 'r').read().strip()
if_dict[iface] = mac
# wait up to 9 seconds all interfaces to reach up
@ -1225,6 +1271,7 @@ def get_sys_interfaces(interface, args):
mac = if_dict[iface]
if iface in if_up_list:
continue
log.debug("Checking liveness of %s", mac)
if is_interface_live(iface, sys_root):
# Add system interface
sys_interfaces[mac] = iface
@ -1245,6 +1292,9 @@ def get_sys_interfaces(interface, args):
if_dict[iface])
log.warn(msg)
log.debug("WARNING: interfaces have been brought 'up' during the probing"
"process. This may cause problems if IPv6 RA have"
"been accepted")
return sys_interfaces