Fix issues with controller node Anaconda hang

On some deployments the install fails because we keep one FD open
during install. This leads to hangs when the Anaconda
'post' stage returns.

On other deployments the install fails because udev sometimes creates
multiple links to the same device in /dev/disk/by-path.
We iterate through this list and, because the entries are not unique,
we try to run flock multiple times for the same device.
Locking a device multiple times doesn't work: the second
flock waits for the first lock to be released.
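
The blocking behaviour described above can be reproduced with a minimal sketch. It assumes the util-linux `flock` utility is available, uses a temporary file in place of a real block device, and picks FDs 8 and 9 arbitrarily; none of these are taken from the kickstart scripts. It passes `-n` so the second attempt fails immediately instead of waiting forever.

```shell
# Stand-in for a storage device; the real scripts lock block devices.
lockfile=$(mktemp)

# Open the same file twice, as happens when a device is listed twice.
exec 8>"$lockfile"
exec 9>"$lockfile"

# -n makes flock fail instead of blocking when the lock is already held.
if flock -n 8; then first="acquired"; else first="busy"; fi
if flock -n 9; then second="acquired"; else second="busy"; fi

echo "first=$first second=$second"

# Close both descriptors (which releases the lock) and clean up.
exec 8>&- 9>&-
rm -f "$lockfile"
```

Without `-n`, the second call would wait for the first lock to be released, which never happens while the same script still holds it.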

This commit:
 o removes 'exec {stdout}>&1' from ks-functions.sh so it no
   longer opens FDs in the 'post' stage. For the 'pre' stage we open
   it only when needed;
 o makes sure that the list of storage devices is unique;
 o increases the timeout of udevadm settle from its default of 180s
   to 300s, the value used throughout Anaconda. This helps
   with slower hardware.

Closes-Bug: 1889427
Change-Id: I348f10d96a78ea2c1c25fe6cf48462b0bc31fb84
Signed-off-by: Ovidiu Poncea <ovidiu.poncea@windriver.com>
Ovidiu Poncea 2020-07-30 13:25:41 +03:00 committed by Stefan Dinescu
parent bb739f8311
commit 7a0a2dac1a
3 changed files with 11 additions and 6 deletions

@@ -9,13 +9,9 @@ cat <<END_FUNCTIONS >/tmp/ks-functions.sh
 # SPDX-License-Identifier: Apache-2.0
 #
-# Get the FD used by subshells to log output
-if [ -z "\$stdout" ]; then
-exec {stdout}>&1
-fi
 function wlog()
 {
+[ -z "\$stdout" ] && stdout=1
 local dt="\$(date "+%Y-%m-%d %H:%M:%S.%3N")"
 echo "\$dt - \$1" >&\${stdout}
 }

@@ -1,5 +1,8 @@
 %pre --erroronfail
+# Get the FD used by subshells to log output
+exec {stdout}>&1
 # Source common functions
 . /tmp/ks-functions.sh
@@ -56,6 +59,12 @@ for f in /dev/disk/by-path/*; do
 fi
 done
+# Filter STOR_DEVS variable for any duplicates as on some systems udev
+# creates multiple links to the same device. This causes issues due to
+# attempting to acquire a flock on the same device multiple times.
+STOR_DEVS=$(echo "$STOR_DEVS" | xargs -n 1 | sort -u | xargs)
+wlog "Unique storage devices: $STOR_DEVS."
 if [ -z "$STOR_DEVS" ]
 then
 report_pre_failure_with_msg "ERROR: No storage devices available."
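
The de-duplication pipeline from this hunk can be tried in isolation. The device names below are hypothetical sample input, not values from a real system.

```shell
# Hypothetical list in which /dev/sda appears twice, as if two by-path
# links resolved to the same device.
STOR_DEVS="/dev/sda /dev/sdb /dev/sda"

# Split into one entry per line, sort and drop duplicates, then join the
# result back into a single space-separated line.
STOR_DEVS=$(echo "$STOR_DEVS" | xargs -n 1 | sort -u | xargs)

echo "Unique storage devices: $STOR_DEVS."
```

One side effect of `sort -u` is that the list ends up in sorted order rather than in discovery order.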

@@ -13,7 +13,7 @@ do
 exec {fd}>&-
 done
 sleep 2
-udevadm settle || report_pre_failure_with_msg "ERROR: udevadm settle failed!"
+udevadm settle --timeout=300 || report_pre_failure_with_msg "ERROR: udevadm settle failed!"
 # Rescan LVM cache to avoid warnings for VGs that were recreated.
 pvscan --cache