Subcloud enrollment failure with OAM IP address change

When subcloud enrollment includes an OAM reconfiguration, both the old
and the new IP addresses are present on the OAM interface: the old one
from the original OAM IP configuration, the new one from cloud-init's
network-config. When oam-modify is triggered:
1) ifdown: the old OAM IP and the default OAM route get deleted
2) the ifcfg file is reconfigured
3) ifup: fails (silently), as the new IP setting is already present
As a result, the default OAM route is still missing, which leads to
other issues later: "kubeadm init phase upload-certs" fails, and an
"install cert failure" occurs.

Concrete example:
Initially the OAM interface (vlan112:3-7) had 2620:10a:a001:aa0c::128.
Enrollment with OAM reconfiguration was requested with
2620:10a:a001:aa0c::171. Cloud-init applies
"etc/network/interfaces.d/50-cloud-init" (derived from network-config)
and sets the new IP 2620:10a:a001:aa0c::171 on "vlan112". Now both IPs
are present, the new one on vlan112 and the old one on vlan112:3-7.
oam-modify is triggered and apply_network_config.sh is called.
"ifdown vlan112:3-7" removes the old 128 IP (and leaves
var/run/network/ifstate.vlan112:3-7 down/empty), but does not remove
the new 171 IP from vlan112. The address is then changed (128 -> 171)
in the ifcfg file (etc/network/interfaces.d/ifcfg-vlan112:3-7), and
"ifup vlan112:3-7" fails because 171 is already present on vlan112.
Thus the interface state for vlan112:3-7
(etc/network/interfaces.d/ifcfg-vlan112:3-7) is still down/empty, and
the deleted default route does not get reinstalled.
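
The missing-route symptom described above can be checked with a small
helper. This is a minimal sketch, not part of the commit: the function
name has_default_route is hypothetical, and it only inspects text in the
format printed by "ip route show" / "ip -6 route show".

```shell
#!/usr/bin/env bash
# Sketch only: has_default_route is a hypothetical helper, not part of
# the enrollment script. It reads route output on stdin and succeeds
# when at least one line describes a default route.
has_default_route() {
    # "ip route show" prints default routes as lines starting with "default".
    grep -q '^default '
}

# Possible usage on a live system (assumes the iproute2 "ip" tool):
#   ip -6 route show | has_default_route || echo "default OAM route missing"
```

After a failed oam-modify as in the example, a check of this kind on the
IPv6 routes would be expected to report the route as missing.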

This commit fixes the issue by cleaning up the IP and route that the
old OAM configuration set up on Linux through puppet, and by bringing
the OAM label/alias interface down before running oam-modify. This is
done when the OAM reconfiguration does not change the interface/vlan
with respect to the factory-installed OAM interface/vlan. When the
interface/vlan is not modified, the OAM reconfiguration is only an
address change, which oam-modify itself supports. oam-modify needs the
OAM connection intact, so it relies completely on cloud-init's OAM IP
and route. When oam-modify triggers the puppet runtime, step 1) above
(ifdown) does nothing, as the interface is already down, and thus the
default OAM route does not get deleted.
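
The order of the cleanup can be sketched as a dry run that prints the
commands instead of executing them. This is an illustration under
assumptions: plan_oam_cleanup is a hypothetical name, its arguments are
examples, and the real work is done by do_network_cleanup in the diff.

```shell
#!/usr/bin/env bash
# Sketch only: print, in order, the commands the cleanup would run.
# plan_oam_cleanup and its arguments are hypothetical; nothing is executed.
plan_oam_cleanup() {
    local if_name=$1        # interface holding the new cloud-init IP
    local oam_if_label=$2   # old OAM label/alias interface
    local new_oam_ip=$3     # new OAM IP, used only to pick the address family
    local gateway=$4        # OAM gateway for the default route
    local ip_command='ip'
    # Same simple ':' heuristic as is_ipv6 in the diff.
    case "$new_oam_ip" in
        *:*) ip_command='ip -6' ;;
    esac
    # 1) force the old label interface down (drops old IP and default route)
    echo "ifdown ${oam_if_label} --force"
    # 2) put the default route back on the interface that keeps the new IP
    echo "${ip_command} route add default via ${gateway} dev ${if_name}"
}
```

Printing the plan first makes it easy to confirm that the default route
is restored on the cloud-init interface rather than on the label that
was just brought down.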

TEST PLAN:
  PASS: subcloud enrollment with oam-reconfig w/o interface/vlan change
        - check /var/log/cloud-init-output.log, for ip/route deletion
        - check /var/log/user.log, there could be still "Failed bringing"
        - check OAM single IP and default route presence
        - OAM connection based on cloud-init's new IP/route.
  PASS: subcloud enrollment without oam-reconfig
  PASS: subcloud enrollment with oam-reconfig with interface change
  PASS: subcloud enrollment with oam-reconfig with vlan change
  PASS: test above subcloud enrollment with both IPv4 and IPv6 on OAM

Closes-bug: 2089689
Change-Id: If3b36dc8722263b9b66b7f51f62452f1056be124
Signed-off-by: Tara Nath Subedi <tara.subedi@windriver.com>
Tara Subedi 2024-11-13 16:51:56 -05:00
parent 3c6162c35e
commit 5a79edbc59


@ -127,6 +127,7 @@ function load_credentials {
# in later enrollment steps. For example, a timing issue has been observed
# because the OAM IP is already available, service endpoint IPs are configured,
# but rerunning the Puppet manifest interferes with enrollment.
CURRENT_OAM_IP=""
function check_reconfigure_OAM {
system_mode=$(awk -F= '/system_mode/ {print $2}' /etc/platform/platform.conf)
@ -145,6 +146,7 @@ function check_reconfigure_OAM {
if [ "$system_mode" = "duplex" ]; then
# DX: Current system oam values
oam_c0_ip=$(echo "$oam_show_output" | awk '/oam_c0_ip/ {print $4}')
CURRENT_OAM_IP=$oam_c0_ip
oam_c1_ip=$(echo "$oam_show_output" | awk '/oam_c1_ip/ {print $4}')
oam_floating_ip=$(echo "$oam_show_output" | awk '/oam_floating_ip/ {print $4}')
oam_gateway_ip=$(echo "$oam_show_output" | awk '/oam_gateway_ip/ {print $4}')
@ -165,6 +167,7 @@ function check_reconfigure_OAM {
else
# SX: Current system oam values
oam_ip=$(echo "$oam_show_output" | awk '/oam_ip/ {print $4}')
CURRENT_OAM_IP=$oam_ip
oam_gateway_ip=$(echo "$oam_show_output" | awk '/oam_gateway_ip/ {print $4}')
oam_subnet=$(echo "$oam_show_output" | awk '/oam_subnet/ {print $4}')
@ -188,6 +191,144 @@ function check_reconfigure_OAM {
fi
}
function is_ipv6 {
local addr=$1
# simple check for ':'
if [ "${addr/:/}" != "${addr}" ]; then
# addr is ipv6
return 0
fi
return 1
}
#
# display network info
#
function display_network_info {
local contents
contents=$(
{
echo
echo "************ Links/addresses ************"
/usr/sbin/ip addr show
echo "************ IPv4 routes ****************"
/usr/sbin/ip route show
echo "************ IPv6 routes ****************"
/usr/sbin/ip -6 route show
echo "*****************************************"
}
)
log_info "Network info:${contents}"
}
function do_network_cleanup {
local if_name=$1
local oam_if_label=$2
local ip_command='ip'
if is_ipv6 "${OAM_IP}"; then
ip_command='ip -6'
fi
display_network_info
# We need the new OAM connection to complete oam-modify, so we must not lose the OAM connection
# established by cloud-init. The "if_name" interface has the new OAM IP configured by cloud-init.
#
# oam-modify triggers the puppet runtime, which 1) runs ifdown on the OAM label, deleting the old
# OAM IP and the default OAM route, 2) changes the ifcfg file and 3) runs ifup on the OAM label,
# which fails because it conflicts with the cloud-init provisioned OAM IP, so the default OAM
# route does not get reinstalled.
#
# To preserve the OAM route:
# Here we force the OAM label interface down, so that the old OAM IP and the default OAM route
# get cleaned up, and then add the default OAM route back again.
# With this, the oam-modify puppet runtime does nothing in step 1) ifdown above, as the interface
# is already down, and it won't delete the existing OAM default route.
# As a result, after oam-modify, we still have the new IP and the default OAM route.
#
#
log_info "Forcing current OAM label interface:$oam_if_label down"
ifdown_results=$(ifdown ${oam_if_label} --force 2>&1)
log_info "ifdown errors: ${ifdown_results}"
# Add the default route back
ip_route_results=$(${ip_command} route add default via ${OAM_GATEWAY_IP} dev ${if_name} 2>&1)
log_info "ip route add errors: ${ip_route_results}"
display_network_info
return 0
}
# Determine whether the OAM reconfiguration interface/vlan (cloud-init network-config's
# interface/vlan) is the same as the factory-installed OAM interface/vlan. Returns 0 if it is,
# with cloud-init's interface name stored in the CLOUD_INIT_OAM_IF_NAME variable.
CLOUD_INIT_OAM_IF_NAME=""
function check_oam_reconfiguration_on_same_interface {
local cfg=/etc/network/interfaces.d/50-cloud-init
local iface_line=''
local vlan_raw_device_line=''
local if_name=''
local vlan_raw_device=''
local vlan_id=''
if [ -f ${cfg} ]; then
iface_line=$( cat ${cfg} |grep ^iface | grep -v 'iface lo' )
if_name=$( echo "${iface_line}" | awk '{print $2}' )
regex="(vlan[0-9]+)|(.*\..*)"
if [[ ${if_name} =~ ${regex} ]]; then
vlan_raw_device_line=$( grep vlan-raw-device ${cfg} )
vlan_raw_device=$( echo "${vlan_raw_device_line}" | awk '{print $2}' )
vlan_id=$( echo "${if_name}" | grep -o '[0-9]*')
fi
fi
log_info "${cfg} parameters: if_name:${if_name} vlan_raw_device:${vlan_raw_device} vlan_id:${vlan_id}"
if [[ ${if_name} == "" ]]; then
log_info "No cloud-init interface found, nothing to do."
return 1
fi
command="system interface-network-list controller-0 --nowrap"
if ! execute_with_retries "$command"; then
log_fatal "$command failed after multiple attempts."
fi
oam_if=$($command | awk '$8 == "oam" { print $6 }')
check_rc_die $? "system interface-network-list failed"
#type, vlan id, ports, uses i/f
command="system host-if-list controller-0 --nowrap"
if ! execute_with_retries "$command"; then
log_fatal "$command failed after multiple attempts."
fi
host_if_list_output=$($command)
oam_if_details=$(echo "$host_if_list_output" | awk -v oam_if="$oam_if" '$4 == oam_if { print $8 " " $10 " " $12 " " $14 }')
check_rc_die $? "OAM interface details parsing failed"
log_info "OAM type, vlan id, ports, uses i/f: ${oam_if_details}"
oam_if_type=$( echo "${oam_if_details}" | awk '{print $1}' )
# In case of existing OAM interface of ethernet type, check if OAM reconfiguration is on same physical interface without vlan-id
if [[ ${oam_if_type} == "ethernet" ]]; then
oam_if_port=$( echo "${oam_if_details}" | awk '{print $3}' | sed -E "s/^\['([^']+)'.*$/\1/" )
log_info "OAM is of ethernet type, port:${oam_if_port}"
if [[ ${oam_if_port} == ${if_name} ]] && [[ ${vlan_raw_device} == '' ]] && [[ ${vlan_id} == '' ]]; then
CLOUD_INIT_OAM_IF_NAME=${if_name}
return 0
fi
# In case of existing OAM interface of vlan type, check if OAM reconfiguration is on same physical interface and vlan-id
elif [[ ${oam_if_type} == "vlan" ]]; then
oam_vlan_id=$( echo "${oam_if_details}" | awk '{print $2}' )
oam_vlan_uses_if=$( echo "${oam_if_details}" | awk '{print $4}' | sed -E "s/^\['([^']+)'.*$/\1/" )
oam_vlan_raw_device=$(echo "$host_if_list_output" | awk -v uses_if="$oam_vlan_uses_if" '$4 == uses_if { print $12 }' | sed -E "s/^\['([^']+)'.*$/\1/" )
check_rc_die $? "OAM vlan raw device parsing failed"
log_info "OAM is of VLAN type, vlan_raw_device:${oam_vlan_raw_device} vlan_id:${oam_vlan_id}"
if [[ ${oam_vlan_raw_device} == ${vlan_raw_device} ]] && [[ ${oam_vlan_id} == ${vlan_id} ]]; then
CLOUD_INIT_OAM_IF_NAME=${if_name}
return 0
fi
fi
return 1
}
function reconfigure_OAM {
system_mode=$(awk -F= '/system_mode/ {print $2}' /etc/platform/platform.conf)
@ -312,6 +453,23 @@ load_credentials
check_services_status
if check_reconfigure_OAM; then
if check_oam_reconfiguration_on_same_interface; then
# OAM reconfiguration requested on same interface/vlan as factory-installed OAM interface
#
# ip addr show command doesn't display IPv6 addresses with alias label, so this would work only on IPv4 address:
# current_oam_if_name_with_label=$(ip addr show $CLOUD_INIT_OAM_IF_NAME|grep $CURRENT_OAM_IP |grep -oP '\b'$CLOUD_INIT_OAM_IF_NAME'[^\s]*')
# Check directly on ifcfg file, to figure out the label/alias, which works for both IPv4 and IPv6 addresses:
current_oam_if_name_with_label=$(grep $CURRENT_OAM_IP /etc/network/interfaces.d/ifcfg-* | grep -oP '\b'$CLOUD_INIT_OAM_IF_NAME':[^:]*')
log_info "Current OAM IF label (alias):$current_oam_if_name_with_label."
# Here, reconfiguration is only for an address change, which is supported by oam-modify itself.
# We still need the new OAM connection to complete oam-modify, so we must not lose the OAM
# connection established by cloud-init.
# The new IP provisioned by cloud-init collides with the puppet OAM network reconfiguration
# triggered by oam-modify, leaving the default OAM route missing. To avoid this, we do some cleanup.
do_network_cleanup ${CLOUD_INIT_OAM_IF_NAME} ${current_oam_if_name_with_label}
fi
reconfigure_OAM
fi
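
The label lookup used above (grep the current OAM IP in the ifcfg files, then pull the label out of the matching file name) can be tried in isolation. The sketch below is a hedged variant, not the commit's code: find_oam_label is a hypothetical helper, the directory is passed as an argument so it can run against a scratch directory, and it uses grep -H, -F and -E as assumptions of this sketch so that the file name is always printed and the IP is matched literally.

```shell
#!/usr/bin/env bash
# Sketch only: find_oam_label is a hypothetical helper, not part of the commit.
# It prints the label/alias interface (e.g. "vlan112:3-7") whose ifcfg file
# contains the given IP address.
find_oam_label() {
    local current_ip=$1   # current (old) OAM IP to look for
    local if_name=$2      # base interface name, e.g. vlan112
    local cfg_dir=$3      # directory holding ifcfg-* files
    # -H always prefixes the matching file name, even with a single file;
    # -F treats the IP as a fixed string rather than a regular expression.
    grep -HF -- "$current_ip" "$cfg_dir"/ifcfg-* 2>/dev/null \
        | grep -oE "${if_name}:[^:]*" \
        | head -n 1
}

# Example against a scratch directory (file contents are illustrative):
#   dir=$(mktemp -d)
#   printf 'iface vlan112:3-7 inet6 static\n    address 2620:10a:a001:aa0c::128\n' \
#       > "$dir/ifcfg-vlan112:3-7"
#   find_oam_label 2620:10a:a001:aa0c::128 vlan112 "$dir"
```

Working from the ifcfg file name instead of "ip addr show" output follows the reasoning in the comments above, since the alias label is not displayed for IPv6 addresses.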