tools/centos-mirror-tools/make_stx_mirror_yum_conf.sh
Davlet Panech ac49ff342c use curl + avoid partial downloads
Mirror scripts sometimes leave corrupted/partial files behind.

Problems
========

1) wget is called with the -O flag, and the server returns an HTTP
error for the requested URL (404 etc). Wget leaves a zero-length file
behind. This doesn't seem to happen without the -O flag.

2) wget starts the download which stalls & times out half-way; wget
gives up and requests the same file with a byte offset of the form
"Range: bytes=1234-", and the web server doesn't support open-ended
ranges. In this case wget prints out a warning, leaves a partial file
behind and returns success.

3) Sites like GitHub generate repo tarballs on the fly, e.g.:
https://github.com/kubernetes/kubernetes/archive/refs/tags/v1.19.3.tar.gz
Since tags can move, downloading such a file twice may result in a
different file. Therefore HTTP "resume download" may corrupt files in
this case.

4) Git "keyword expansion" feature may result in differences in source
files being downloaded. For example, this file:

  https://github.com/kubernetes/kubernetes/blob/v1.19.3/staging/src/k8s.io/component-base/version/base.go

contains lines similar to:

  gitVersion  = "v0.0.0-master+$Format:%h$"

where %h is replaced with a short SHA when the tar file is
exported/downloaded.  How short the SHA is depends on git history and
sometimes results in shortened SHAs of different lengths. So
downloading that file may result in different files.

Therefore HTTP "Range" header may corrupt files in this case as
well.

5) Curl is invoked with the "--retry" option and starts the download;
connection stalls; curl gives up, connects again, skips the 1st N
bytes and appends to the partial file. If the file changes while we
are doing this, it will end up corrupting the file. This is very
unlikely to happen and I haven't been able to reproduce this case.

Problems with HTTP Range header
===============================
Curl/wget "resume/continue download" feature has no way of verifying
whether the partial file on disk, and the one being re-requested, are in
fact the same file.  If the file changes on the server between
downloads, "resume download" will corrupt it.

Some web servers don't support this at all, which triggers case (2)
with wget.

Some web servers support the Range header, but require that the end
byte position be present. This is not compatible with wget & curl,
which send open-ended headers such as "Range: bytes=1234-", meaning
"give me the file from offset 1234 to EOF". This also triggers
case (2).

This patch
==========

* Always download the file to a temporary name, then rename into place

* Use curl instead of wget (better error handling). The only exception is
"recursive downloads", which curl doesn't support.

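The first bullet could look roughly like the sketch below. It is an
illustration only, not the code from this patch: the function name
download_via_tmp and the ".part.$$" suffix are made up, and the curl
options shown are just one plausible choice.

```shell
# Hypothetical helper (not part of this patch): download URL to DEST
# through a temporary file in the same directory, so that DEST only
# ever appears once the transfer has completed successfully.
download_via_tmp () {
    local url="$1"
    local dest="$2"
    local tmp="${dest}.part.$$"    # assumed naming scheme for the temp file

    # --fail makes curl exit non-zero on HTTP errors (404 etc.).
    # No resume option is used, so each call starts from byte 0 and
    # never sends a Range header.
    if curl --fail --silent --show-error --location --output "$tmp" "$url"; then
        # A rename within one file system is atomic.
        mv -f "$tmp" "$dest"
    else
        local rc=$?
        # Never leave a partial or zero-length file behind.
        rm -f "$tmp"
        return "$rc"
    fi
}
```

With this shape, a failed or interrupted transfer leaves neither DEST
nor the temporary file on disk, so a later run starts from scratch
instead of resuming.
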
Bug: https://bugs.launchpad.net/starlingx/+bug/1950017
Change-Id: Iaa89009ce23efe5b73ecb8163556ce6db932028b
Signed-off-by: Davlet Panech <davlet.panech@windriver.com>
2021-11-10 14:25:47 -05:00


#!/bin/bash
#
# SPDX-License-Identifier: Apache-2.0
#
#
# Replicate a yum.conf and yum.repos.d under a temporary directory and
# then modify the files to point to equivalent repos in the StarlingX mirror.
# This script was originated by Scott Little
#
MAKE_STX_MIRROR_YUM_CONF_DIR="$(dirname "$(readlink -f "${BASH_SOURCE[0]}" )" )"
source "$MAKE_STX_MIRROR_YUM_CONF_DIR/utils.sh" || exit 1
DISTRO="centos"
SUDO=sudo
TEMP_DIR=""
SRC_REPO_DIR="$MAKE_STX_MIRROR_YUM_CONF_DIR/yum.repos.d"
SRC_YUM_CONF="$MAKE_STX_MIRROR_YUM_CONF_DIR/yum.conf.sample"
RETAIN_REPODIR=0
usage () {
    echo ""
    echo "$0 -d <dest_dir> [-D <distro>] [-y <src_yum_conf>] [-r <src_repos_dir>] [-R] [-n] [-l <layer>] [-u <lower-layer>,<build-type>,<repo_url>]"
    echo ""
    echo "Replicate a yum.conf and yum.repos.d under a new directory and"
    echo "then modify the files to point to equivalent repos in the StarlingX"
    echo "mirror."
    echo ""
    echo "-d <dest_dir>  = Place modified yum.conf and yum.repos.d into this directory"
    echo "-D <distro>    = Target distro on StarlingX mirror. Default is 'centos'"
    echo "-y <yum_conf>  = Path to yum.conf file that we will modify. Default is"
    echo "                 'yum.conf.sample' in same directory as this script"
    echo "-r <repos_dir> = Path to yum.repos.d that we will modify. Default is"
    echo "                 'yum.repos.d' in same directory as this script"
    echo "-R             = Keep the existing reposdir= value and append the new yum.repos.d to it"
    echo "-l <layer>     = Download only packages required to build a given layer"
    echo "-u <lower-layer>,<build-type>,<repo_url> = Add/change the repo baseurl for a lower layer"
    echo "-n             = Don't use sudo"
}
declare -A layer_urls
set_layer_urls () {
    local option="${1}"
    local layer_and_build_type="${option%,*}"
    local layer="${layer_and_build_type%,*}"
    local build_type="${layer_and_build_type#*,}"
    local layer_url="${option##*,}"

    # Enforce trailing '/'
    if [ "${layer_url:${#layer_url}-1:1}" != "/" ]; then
        layer_url+="/"
    fi

    layer_urls["${layer_and_build_type}"]="${layer_url}"
}
#
# option processing
#
while getopts "D:d:l:nRr:u:y:" o; do
    case "${o}" in
        D)
            DISTRO="${OPTARG}"
            ;;
        d)
            TEMP_DIR="${OPTARG}"
            ;;
        l)
            LAYER="${OPTARG}"
            ;;
        n)
            SUDO=""
            ;;
        r)
            SRC_REPO_DIR="${OPTARG}"
            ;;
        R)
            RETAIN_REPODIR=1
            ;;
        u)
            set_layer_urls "${OPTARG}"
            ;;
        y)
            SRC_YUM_CONF="${OPTARG}"
            ;;
        *)
            usage
            exit 1
            ;;
    esac
done
#
# option validation
#
# Quote the paths so that values containing spaces are tested correctly.
if [ ! -f "$SRC_YUM_CONF" ]; then
    echo "Error: yum.conf not found at '$SRC_YUM_CONF'"
    exit 1
fi
if [ ! -d "$SRC_REPO_DIR" ]; then
    echo "Error: repo dir not found at '$SRC_REPO_DIR'"
    exit 1
fi
if [ "$TEMP_DIR" == "" ]; then
    echo "Error: working dir not provided"
    usage
    exit 1
fi
if [ ! -d "$TEMP_DIR" ]; then
    echo "Error: working dir not found at '$TEMP_DIR'"
    exit 1
fi
#
# Get the value of the $releasever variable.
#
# If the source yum.conf has a releasever= setting, we will honor
# that, even though yum will not.
#
# Otherwise use yum to query the host environment (Docker).
# This assumes the host environment has the same releasever
# as that which will be used inside StarlingX.
#
# NOTE: In other scripts we will read releasever= out of yum.conf
# and push it back into yum via --releasever=<#>.
#
get_releasever () {
    if [ -f "$SRC_YUM_CONF" ] && grep -q '^releasever=' "$SRC_YUM_CONF"; then
        grep '^releasever=' "$SRC_YUM_CONF" | cut -d '=' -f 2
    else
        ${SUDO} yum version nogroups | grep Installed | cut -d ' ' -f 2 | cut -d '/' -f 1
    fi
}
#
# Get the value of the $basearch variable.
#
# Just use yum to query the host environment (Docker) as we don't support
# cross compiling.
#
get_arch () {
    ${SUDO} yum version nogroups | grep Installed | cut -d ' ' -f 2 | cut -d '/' -f 2
}
#
# Global variables we will use later.
#
CENGN_REPOS_DIR="$TEMP_DIR/yum.repos.d"
CENGN_YUM_CONF="$TEMP_DIR/yum.conf"
CENGN_YUM_LOG="$TEMP_DIR/yum.log"
CENGN_YUM_CACHDIR="$TEMP_DIR/cache/yum/\$basearch/\$releasever"
RELEASEVER=$(get_releasever)
ARCH=$(get_arch)
#
# Copy as yet unmodified yum.conf and yum.repos.d from source to dest.
#
mkdir -p "$CENGN_REPOS_DIR"
echo "\cp -r '$SRC_REPO_DIR/*' '$CENGN_REPOS_DIR/'"
\cp -r "$SRC_REPO_DIR"/* "$CENGN_REPOS_DIR/"
echo "\cp '$SRC_YUM_CONF' '$CENGN_YUM_CONF'"
\cp "$SRC_YUM_CONF" "$CENGN_YUM_CONF"
if [ "$LAYER" != "all" ]; then
    if [ -d "${MAKE_STX_MIRROR_YUM_CONF_DIR}/config/${DISTRO}/${LAYER}/yum.repos.d" ]; then
        \cp -f "${MAKE_STX_MIRROR_YUM_CONF_DIR}/config/${DISTRO}/${LAYER}/yum.repos.d"/*.repo "$CENGN_REPOS_DIR"
    fi
fi
#
# Add or modify reposdir= value in our new yum.conf
#
if grep -q '^reposdir=' "$CENGN_YUM_CONF"; then
    # reposdir= already exists, modify it
    if [ $RETAIN_REPODIR -eq 1 ]; then
        # Append CENGN_REPOS_DIR
        sed "s#^reposdir=\(.*\)\$#reposdir=\1 $CENGN_REPOS_DIR#" -i "$CENGN_YUM_CONF"
    else
        # replace with CENGN_REPOS_DIR
        sed "s#^reposdir=.*\$#reposdir=$CENGN_REPOS_DIR#" -i "$CENGN_YUM_CONF"
    fi
else
    # reposdir= does not yet exist, add it
    if [ $RETAIN_REPODIR -eq 1 ]; then
        # Add both SRC_REPO_DIR and CENGN_REPOS_DIR
        echo "reposdir=$SRC_REPO_DIR $CENGN_REPOS_DIR" >> "$CENGN_YUM_CONF"
    else
        # Add CENGN_REPOS_DIR only
        echo "reposdir=$CENGN_REPOS_DIR" >> "$CENGN_YUM_CONF"
    fi
fi
#
# modify or add logfile= value in our new yum.conf
#
if grep -q '^logfile=' "$CENGN_YUM_CONF"; then
    sed "s#^logfile=.*\$#logfile=$CENGN_YUM_LOG#" -i "$CENGN_YUM_CONF"
else
    echo "logfile=$CENGN_YUM_LOG" >> "$CENGN_YUM_CONF"
fi
#
# modify or add cachedir= value in our new yum.conf
#
if grep -q '^cachedir=' "$CENGN_YUM_CONF"; then
    sed "s#^cachedir=.*\$#cachedir=$CENGN_YUM_CACHDIR#" -i "$CENGN_YUM_CONF"
else
    echo "cachedir=$CENGN_YUM_CACHDIR" >> "$CENGN_YUM_CONF"
fi
#
# Modify all the repo files in our new yum.repos.d
#
for REPO in $(find "$CENGN_REPOS_DIR" -type f -name '*repo'); do
    #
    # Replace mirrorlist with baseurl if required
    #
    if grep -q '^mirrorlist=' "$REPO" ; then
        sed '/^mirrorlist=/d' -i "$REPO"
        sed 's%^#baseurl%baseurl%' -i "$REPO"
    fi
    #
    # Substitute any $releasever or $basearch variables
    #
    sed "s#/[$]releasever/#/$RELEASEVER/#g" -i "$REPO"
    sed "s#/[$]basearch/#/$ARCH/#g" -i "$REPO"
    #
    # Turn off gpgcheck for now.
    # Must revisit this at a later date!
    #
    sed 's#^gpgcheck=1#gpgcheck=0#' -i "$REPO"
    sed '/^gpgkey=/d' -i "$REPO"
    #
    # Convert baseurl(s) to cengn equivalent
    #
    for URL in $(grep '^baseurl=' "$REPO" | sed 's#^baseurl=##'); do
        CENGN_URL="$(url_to_stx_mirror_url "$URL" "$DISTRO")"
        # Test CENGN url
        url_exists --quiet "$CENGN_URL"
        if [ $? -eq 0 ]; then
            # OK, make substitution
            sed "s#^baseurl=$URL\$#baseurl=$CENGN_URL#" -i "$REPO"
        fi
    done
    #
    # Prefix repoid and name with CENGN
    #
    sed "s#^name=\(.*\)#name=CENGN_\1#" -i "$REPO"
    sed "s#^\[\([^]]*\)\]#[CENGN_\1]#" -i "$REPO"
done
for key in "${!layer_urls[@]}"; do
    lower_layer="${key%,*}"
    build_type="${key#*,}"
    REPO="$CENGN_REPOS_DIR/StarlingX_cengn_${lower_layer}_layer.repo"
    if [ -f "$REPO" ]; then
        sed "s#^baseurl=.*/${lower_layer}/.*/${build_type}/\$#baseurl=${layer_urls[${key}]}#" -i "$REPO"
    else
        REPO="$CENGN_REPOS_DIR/StarlingX_local_${lower_layer}_${build_type}_layer.repo"
        (
            echo "[Starlingx-local_${lower_layer}_${build_type}_layer]"
            echo "name=Starlingx-cengn_${lower_layer}_${build_type}_layer"
            echo "baseurl=${layer_urls[${key}]}"
            echo "enabled=1"
        ) > "$REPO"
    fi
done
echo "$TEMP_DIR"
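
The way set_layer_urls splits a "-u <lower-layer>,<build-type>,<repo_url>"
argument can be checked in isolation. The sketch below repeats the same
parameter expansions in a standalone function; the name parse_layer_option
and the sample value are made up for illustration and are not part of the
script.

```shell
# Standalone sketch of the "-u" argument parsing (hypothetical helper).
# Prints "<layer> <build_type> <url>" for one option value.
parse_layer_option () {
    local option="$1"
    # Drop the last ",<url>" field, leaving "<layer>,<build_type>"
    local layer_and_build_type="${option%,*}"
    local layer="${layer_and_build_type%,*}"
    local build_type="${layer_and_build_type#*,}"
    # Keep only the text after the last comma
    local layer_url="${option##*,}"

    # Enforce trailing '/'
    if [ "${layer_url:${#layer_url}-1:1}" != "/" ]; then
        layer_url+="/"
    fi

    printf '%s %s %s\n' "$layer" "$build_type" "$layer_url"
}

# Sample value is an assumption, not a real mirror URL.
parse_layer_option "compiler,std,http://mirror.example.invalid/compiler/std"
```

Because the URL is taken as everything after the last comma, a repo URL
that itself contains a comma would be split incorrectly.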