Subcloud Enrolment failure with OAM IP Address change
With OAM reconfiguration on subcloud enrollment, both the old and the new
IP addresses are observed on the OAM interface: the old one from the
original OAM IP configuration, the new one from cloud-init's
network-config.

When oam-modify is triggered:
1) ifdown: the old OAM IP and the default OAM route get deleted
2) the ifcfg file is reconfigured
3) ifup: fails (silently), as the new IP setting is already present

As the end result, the default OAM route is still missing, which leads to
other issues later: "kubeadm init phase upload-certs" fails, and
"install cert failure".

Concrete example:
Initially the OAM interface (vlan112:3-7) had 2620:10a:a001:aa0c::128.
Enrollment with OAM reconfiguration was requested with
2620:10a:a001:aa0c::171.
cloud-init applies "/etc/network/interfaces.d/50-cloud-init" (derived
from network-config) and sets the new IP 2620:10a:a001:aa0c::171 on
"vlan112". Now we have both IPs, the new one on vlan112 and the old one
on vlan112:3-7.
oam-modify is triggered and apply_network_config.sh is called.
"ifdown vlan112:3-7" removes the old 128 IP (and leaves
/var/run/network/ifstate.vlan112:3-7 down/empty), but does not remove
the new 171 IP on vlan112. Then the address is changed (128 -> 171) in
the ifcfg file (/etc/network/interfaces.d/ifcfg-vlan112:3-7), and
"ifup vlan112:3-7" fails as 171 is already present on vlan112. Thus the
ifstate for ifcfg-vlan112:3-7 is still down/empty, and the deleted
default route does not get reinstalled.

This commit fixes the problem by cleaning up the IP and route that the
old OAM configuration set up on Linux through puppet, and by bringing
the OAM label/alias interface down before doing oam-modify. This is done
only when the OAM reconfiguration does not change the interface/vlan
with respect to the factory-installed OAM interface/vlan. When the
interface/vlan is not modified, the OAM reconfiguration is only an
address change, which is supported by oam-modify itself. oam-modify
needs the OAM connection intact, so it relies completely on cloud-init's
OAM IP and route.
When oam-modify triggers the puppet runtime, step 1) ifdown above does
nothing, as the interface is already in the down state, and thus the
default OAM route does not get deleted.

TEST PLAN:
PASS: subcloud enrollment with oam-reconfig w/o interface/vlan change
      - check /var/log/cloud-init-output.log for ip/route deletion
      - check /var/log/user.log; there could still be "Failed bringing"
      - check OAM single IP and default route presence
      - OAM connection based on cloud-init's new IP/route.
PASS: subcloud enrollment without oam-reconfig
PASS: subcloud enrollment with oam-reconfig with interface change
PASS: subcloud enrollment with oam-reconfig with vlan change
PASS: test above subcloud enrollment with both IPv4 and IPv6 on OAM

Closes-bug: 2089689

Change-Id: If3b36dc8722263b9b66b7f51f62452f1056be124
Signed-off-by: Tara Nath Subedi <tara.subedi@windriver.com>
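The test-plan step "check OAM single IP and default route presence" could be scripted roughly as below. This is a minimal sketch, not part of the commit: the helper names (count_global_addrs, has_default_route, check_oam_state) are hypothetical, and it assumes a simplex system where exactly one global-scope address is expected on the OAM interface.

```shell
#!/bin/bash
# Hypothetical helpers for the manual test-plan check; not part of the commit.

# Count global-scope addresses in `ip -o addr show dev <if>` output read from stdin.
count_global_addrs() {
    # grep -c prints 0 and returns non-zero when nothing matches, hence "|| true".
    grep -c 'scope global' || true
}

# Succeed if the `ip route show` / `ip -6 route show` output on stdin has a default route.
has_default_route() {
    grep -q '^default '
}

# Combine both checks for one interface and address family (-4 or -6).
# Assumption: one global address is expected (no extra floating or SLAAC addresses).
check_oam_state() {
    local if_name=$1
    local family=$2
    local n
    n=$(ip "$family" -o addr show dev "$if_name" | count_global_addrs)
    if [ "$n" -eq 1 ] && ip "$family" route show | has_default_route; then
        echo "OK: single OAM IP on ${if_name} and default route present"
    else
        echo "FAIL: ${n} global address(es) on ${if_name} or default route missing"
        return 1
    fi
}

# Example invocation on the subcloud (commented out so the sketch has no side effects):
# check_oam_state vlan112 -6
```

Keeping the parsing in small stdin filters makes it possible to try them against captured "ip" output without touching a live system.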
parent 3c6162c35e
commit 5a79edbc59
@@ -127,6 +127,7 @@ function load_credentials {
# in later enrollment steps. For example, a timing issue has been observed
# because the OAM IP is already available, service endpoint IPs are configured,
# but rerunning the Puppet manifest interferes with enrollment.
CURRENT_OAM_IP=""
function check_reconfigure_OAM {
    system_mode=$(awk -F= '/system_mode/ {print $2}' /etc/platform/platform.conf)

@@ -145,6 +146,7 @@ function check_reconfigure_OAM {
    if [ "$system_mode" = "duplex" ]; then
        # DX: Current system oam values
        oam_c0_ip=$(echo "$oam_show_output" | awk '/oam_c0_ip/ {print $4}')
        CURRENT_OAM_IP=$oam_c0_ip
        oam_c1_ip=$(echo "$oam_show_output" | awk '/oam_c1_ip/ {print $4}')
        oam_floating_ip=$(echo "$oam_show_output" | awk '/oam_floating_ip/ {print $4}')
        oam_gateway_ip=$(echo "$oam_show_output" | awk '/oam_gateway_ip/ {print $4}')
@@ -165,6 +167,7 @@ function check_reconfigure_OAM {
    else
        # SX: Current system oam values
        oam_ip=$(echo "$oam_show_output" | awk '/oam_ip/ {print $4}')
        CURRENT_OAM_IP=$oam_ip
        oam_gateway_ip=$(echo "$oam_show_output" | awk '/oam_gateway_ip/ {print $4}')
        oam_subnet=$(echo "$oam_show_output" | awk '/oam_subnet/ {print $4}')

@@ -188,6 +191,144 @@ function check_reconfigure_OAM {
    fi
}

function is_ipv6 {
    local addr=$1
    # simple check for ':'
    if [ "${addr/:/}" != "${addr}" ]; then
        # addr is ipv6
        return 0
    fi
    return 1
}
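
The colon test in is_ipv6 relies on bash pattern substitution: "${addr/:/}" deletes the first ':' from the value, so the result differs from the original only when a colon is present. The sketch below repeats the function so it runs on its own and adds a hypothetical helper, ip_command_for, that mirrors how do_network_cleanup picks between 'ip' and 'ip -6'. Note that this is a family hint, not address validation: any string containing ':' is treated as IPv6.

```shell
#!/bin/bash
# Copy of is_ipv6 from the diff, repeated so the sketch is self-contained.
is_ipv6() {
    local addr=$1
    # "${addr/:/}" removes the first ':'; a difference means a colon was present.
    if [ "${addr/:/}" != "${addr}" ]; then
        return 0
    fi
    return 1
}

# Hypothetical helper (not in the commit): choose the ip(8) invocation per family.
ip_command_for() {
    if is_ipv6 "$1"; then
        echo 'ip -6'
    else
        echo 'ip'
    fi
}

ip_command_for 2620:10a:a001:aa0c::171   # prints: ip -6
ip_command_for 192.0.2.10                # prints: ip
```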

#
# display network info
#
function display_network_info {
    local contents
    contents=$(
        {
            echo
            echo "************ Links/addresses ************"
            /usr/sbin/ip addr show
            echo "************ IPv4 routes ****************"
            /usr/sbin/ip route show
            echo "************ IPv6 routes ****************"
            /usr/sbin/ip -6 route show
            echo "*****************************************"
        }
    )
    log_info "Network info:${contents}"
}


function do_network_cleanup {
    local if_name=$1
    local oam_if_label=$2

    local ip_command='ip'
    if is_ipv6 "${OAM_IP}"; then
        ip_command='ip -6'
    fi

    display_network_info

    # We need the new OAM connection to complete oam-modify, so we should not lose the OAM connection
    # established by cloud-init. The "if_name" interface has the new OAM IP configured by cloud-init.
    #
    # oam-modify triggers the puppet runtime: 1) ifdown OAM-label, which deletes the old OAM IP and
    # the default OAM route, 2) changes the ifcfg file, and 3) ifup OAM-label, which fails as it
    # conflicts with the cloud-init provisioned OAM IP, so the default OAM route
    # doesn't get reinstalled.
    #
    # To preserve the OAM route:
    # Here we force the OAM label interface down, so that the old OAM IP and default OAM route
    # get cleaned up, and then add the default OAM route back again.
    # With this, the oam-modify puppet runtime does nothing on step 1) ifdown above, as the interface is
    # already down, and won't delete the existing OAM default route.
    # As the end result, after oam-modify, we will still have the new IP and default OAM route.
    #
    log_info "Forcing current OAM label interface:$oam_if_label down"
    ifdown_results=$(ifdown ${oam_if_label} --force 2>&1)
    log_info "ifdown errors: ${ifdown_results}"

    # Add the default route back
    ip_route_results=$(${ip_command} route add default via ${OAM_GATEWAY_IP} dev ${if_name} 2>&1)
    log_info "ip route add errors: ${ip_route_results}"

    display_network_info
    return 0
}
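
To make the order of operations in do_network_cleanup easier to follow, here is a dry-run sketch that only prints the two commands instead of executing them. The function name print_cleanup_plan and the gateway address in the example call are made up for illustration; the interface names and OAM IP come from the commit message's example.

```shell
#!/bin/bash
# Hypothetical dry-run helper (not part of the commit): prints the commands
# do_network_cleanup would run, in order, without touching the network.
print_cleanup_plan() {
    local oam_ip=$1        # new OAM IP (OAM_IP in the script)
    local gateway=$2       # OAM gateway (OAM_GATEWAY_IP in the script)
    local if_name=$3       # interface configured by cloud-init, e.g. vlan112
    local oam_if_label=$4  # old OAM label/alias interface, e.g. vlan112:3-7

    local ip_command='ip'
    # Same colon test as is_ipv6 in the diff.
    if [ "${oam_ip/:/}" != "${oam_ip}" ]; then
        ip_command='ip -6'
    fi

    # Step 1: force the old label interface down (removes old IP and default route).
    echo "ifdown ${oam_if_label} --force"
    # Step 2: put the default route back on the cloud-init interface.
    echo "${ip_command} route add default via ${gateway} dev ${if_name}"
}

# The gateway address below is an assumption for illustration only.
print_cleanup_plan 2620:10a:a001:aa0c::171 2620:10a:a001:aa0c::1 vlan112 vlan112:3-7
```

Because the label interface is already down afterwards, the later "ifdown" issued by the puppet runtime has nothing left to remove, which is what keeps the re-added default route in place.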

# Figure out whether the OAM reconfiguration interface/vlan (cloud-init network-config's interface/vlan)
# is the same as the factory-installed OAM's interface/vlan. Returns 0 if it is the same, with
# cloud-init's if-name value in the CLOUD_INIT_OAM_IF_NAME variable.
CLOUD_INIT_OAM_IF_NAME=""
function check_oam_reconfiguration_on_same_interface {
    local cfg=/etc/network/interfaces.d/50-cloud-init
    local iface_line=''
    local vlan_raw_device_line=''
    local if_name=''
    local vlan_raw_device=''
    local vlan_id=''
    if [ -f ${cfg} ]; then
        iface_line=$( cat ${cfg} |grep ^iface | grep -v 'iface lo' )
        if_name=$( echo "${iface_line}" | awk '{print $2}' )
        regex="(vlan[0-9]+)|(.*\..*)"
        if [[ ${if_name} =~ ${regex} ]]; then
            vlan_raw_device_line=$( grep vlan-raw-device ${cfg} )
            vlan_raw_device=$( echo "${vlan_raw_device_line}" | awk '{print $2}' )
            vlan_id=$( echo "${if_name}" | grep -o '[0-9]*')
        fi
    fi

    log_info "${cfg} parameters: if_name:${if_name} vlan_raw_device:${vlan_raw_device} vlan_id:${vlan_id}"
    if [[ ${if_name} == "" ]]; then
        log_info "No cloud-init interface found, nothing to do."
        return 1
    fi

    command="system interface-network-list controller-0 --nowrap"
    if ! execute_with_retries "$command"; then
        log_fatal "$command failed after multiple attempts."
    fi
    oam_if=$($command | awk '$8 == "oam" { print $6 }')
    check_rc_die $? "system interface-network-list failed"
    # type, vlan id, ports, uses i/f

    command="system host-if-list controller-0 --nowrap"
    if ! execute_with_retries "$command"; then
        log_fatal "$command failed after multiple attempts."
    fi
    host_if_list_output=$($command)
    oam_if_details=$(echo "$host_if_list_output" | awk -v oam_if="$oam_if" '$4 == oam_if { print $8 " " $10 " " $12 " " $14 }')
    check_rc_die $? "OAM interface details parsing failed"
    log_info "OAM type, vlan id, ports, uses i/f: ${oam_if_details}"
    oam_if_type=$( echo "${oam_if_details}" | awk '{print $1}' )
    # If the existing OAM interface is of ethernet type, check whether the OAM reconfiguration is on the same physical interface without a vlan-id
    if [[ ${oam_if_type} == "ethernet" ]]; then
        oam_if_port=$( echo "${oam_if_details}" | awk '{print $3}' | sed -E "s/^\['([^']+)'.*$/\1/" )
        log_info "OAM is of ethernet type, port:${oam_if_port}"
        if [[ ${oam_if_port} == ${if_name} ]] && [[ ${vlan_raw_device} == '' ]] && [[ ${vlan_id} == '' ]]; then
            CLOUD_INIT_OAM_IF_NAME=${if_name}
            return 0
        fi
    # If the existing OAM interface is of vlan type, check whether the OAM reconfiguration is on the same physical interface and vlan-id
    elif [[ ${oam_if_type} == "vlan" ]]; then
        oam_vlan_id=$( echo "${oam_if_details}" | awk '{print $2}' )
        oam_vlan_uses_if=$( echo "${oam_if_details}" | awk '{print $4}' | sed -E "s/^\['([^']+)'.*$/\1/" )

        oam_vlan_raw_device=$(echo "$host_if_list_output" | awk -v uses_if="$oam_vlan_uses_if" '$4 == uses_if { print $12 }' | sed -E "s/^\['([^']+)'.*$/\1/" )
        check_rc_die $? "OAM vlan raw device parsing failed"
        log_info "OAM is of VLAN type, vlan_raw_device:${oam_vlan_raw_device} vlan_id:${oam_vlan_id}"
        if [[ ${oam_vlan_raw_device} == ${vlan_raw_device} ]] && [[ ${oam_vlan_id} == ${vlan_id} ]]; then
            CLOUD_INIT_OAM_IF_NAME=${if_name}
            return 0
        fi
    fi
    return 1
}
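
Two of the parsing steps in check_oam_reconfiguration_on_same_interface can be tried in isolation. The sketch below is illustrative only: the helper names first_list_item and vlan_id_of are hypothetical, and the sample values (a ports column shaped like ['enp0s3'], and interface names vlan112 and enp0s3.112) are assumptions inferred from the sed expression and the regex in the diff, not captured "system host-if-list" output.

```shell
#!/bin/bash
# Same sed expression as in the diff: pull the first quoted item out of a
# Python-style list string such as ['enp0s3'] or ['enp0s3', 'enp0s8'].
first_list_item() {
    sed -E "s/^\['([^']+)'.*$/\1/"
}

# Hypothetical alternative to `grep -o '[0-9]*'` for the vlan id: keep only the
# trailing digits of the interface name using parameter expansion.
# With GNU grep, `echo enp0s3.112 | grep -o '[0-9]*'` prints 0, 3 and 112 on
# separate lines, so a dotted name would not compare equal to a single vlan id.
vlan_id_of() {
    local if_name=$1
    # Strip the longest prefix that ends in a non-digit character.
    echo "${if_name##*[!0-9]}"
}

echo "['enp0s3']" | first_list_item             # prints: enp0s3
echo "['enp0s3', 'enp0s8']" | first_list_item   # prints: enp0s3
vlan_id_of vlan112                              # prints: 112
vlan_id_of enp0s3.112                           # prints: 112
```

For names of the form vlan112, which is the case exercised in the commit message, both approaches yield the same id; the difference only shows up for the dotted form that the regex also accepts.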

function reconfigure_OAM {
    system_mode=$(awk -F= '/system_mode/ {print $2}' /etc/platform/platform.conf)

@@ -312,6 +453,23 @@ load_credentials
check_services_status

if check_reconfigure_OAM; then
    if check_oam_reconfiguration_on_same_interface; then
        # OAM reconfiguration requested on the same interface/vlan as the factory-installed OAM interface
        #
        # The "ip addr show" command doesn't display IPv6 addresses with an alias label, so this would work only for IPv4 addresses:
        # current_oam_if_name_with_label=$(ip addr show $CLOUD_INIT_OAM_IF_NAME|grep $CURRENT_OAM_IP |grep -oP '\b'$CLOUD_INIT_OAM_IF_NAME'[^\s]*')
        # Check the ifcfg file directly to figure out the label/alias, which works for both IPv4 and IPv6 addresses:
        current_oam_if_name_with_label=$(grep $CURRENT_OAM_IP /etc/network/interfaces.d/ifcfg-* | grep -oP '\b'$CLOUD_INIT_OAM_IF_NAME':[^:]*')
        log_info "Current OAM IF label (alias):$current_oam_if_name_with_label."

        # Here, the reconfiguration is only an address change, which is supported by oam-modify itself.
        # We still need the new OAM connection to complete oam-modify, so we should not lose the OAM connection
        # established by cloud-init.
        # The new IP provisioned by cloud-init collides with the puppet OAM network reconfiguration
        # triggered by oam-modify, leaving the default OAM route missing. To avoid this, we do some cleanup.
        do_network_cleanup ${CLOUD_INIT_OAM_IF_NAME} ${current_oam_if_name_with_label}
    fi

    reconfigure_OAM
fi
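
The label lookup above works on grep's "filename:matching line" output: since the ifcfg file is named after the label interface (ifcfg-vlan112:3-7 in the commit message's example), the label can be cut out of the file name. The sketch below shows that extraction on a sample line. It is a simplification under stated assumptions: the helper name extract_oam_label is hypothetical, the "address ..." file content is assumed rather than taken from a real ifcfg file, and it uses grep -E without the \b anchor instead of the diff's grep -P.

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): read `grep <ip> ifcfg-*` style output
# on stdin and print the first "<if_name>:<label>" token found in it.
extract_oam_label() {
    local if_name=$1
    # [^:]* stops at the ':' that grep puts between file name and matched line.
    grep -oE "${if_name}:[^:]*" | head -n 1
}

# Sample lines shaped like multi-file grep output; the file contents are assumed.
printf '%s\n' '/etc/network/interfaces.d/ifcfg-vlan112:3-7:    address 2620:10a:a001:aa0c::128' \
    | extract_oam_label vlan112     # prints: vlan112:3-7
printf '%s\n' '/etc/network/interfaces.d/ifcfg-enp0s3:1:    address 192.0.2.10' \
    | extract_oam_label enp0s3      # prints: enp0s3:1
```

One point worth keeping in mind when reading the original one-liner: grep only prefixes matches with the file name when it is given more than one file (or -H), so the extraction depends on the ifcfg-* glob expanding to several files.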