Add embedded iPXE script to retry

Currently, when the iPXE boot fails, the machine halts and requires
manual intervention to retry. In some environments the machine boots
before the network is ready to handle the DHCP request in time.
Sometimes DHCP responds, but the filename option is not present
because the back-end service is not yet ready to provide it. At other
times the option is present, but the boot still fails.

This patch embeds a script which makes up to ten attempts to get a
DHCP response that includes the filename option. If every attempt
fails, it reboots the machine. The purpose is to maximise the chances
of the iPXE boot succeeding.
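
As a rough illustration of that control flow, here is a minimal shell
sketch of the same bounded-retry pattern. It is not the embedded iPXE
script itself: the function name retry_then_fallback and its arguments
are hypothetical, and a command passed by the caller stands in for the
DHCP and filename checks.

```shell
#!/bin/sh
# Hypothetical helper: run a check up to $1 times, pausing $2 seconds after
# each failure. Returns 0 on the first success, 1 once every attempt has failed.
retry_then_fallback() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Check failed, retrying (attempt ${attempt}/${max_attempts})."
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "Failed after ${max_attempts} attempts."
    return 1
}

# Example: a check that always fails, standing in for "no DHCP response".
# The iPXE script reboots on this path; the sketch only reports it.
retry_then_fallback 3 0 false || echo "Would reboot here."
```

Counting from 1 and comparing against the limit before each attempt
keeps the number of tries equal to the configured limit.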

A small change to the element deps is also made to ensure that a boot
partition is created. This is necessary due to changes in
diskimage-builder.

Finally, the git SHA in the Makefile is updated to the current HEAD
to ensure that the image built is current.
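
For reference, one possible way to look up a short SHA for a remote
HEAD is sketched below. The helper name short_head_ref is hypothetical
and the SHA in the example is made up so that the snippet runs without
network access; this is not necessarily how the value in this patch
was obtained.

```shell
#!/bin/sh
# Hypothetical helper: read `git ls-remote <url> HEAD` output on stdin
# ("<40-character sha><TAB>HEAD") and print the first 8 characters of the SHA.
short_head_ref() {
    cut -c1-8
}

# Offline example with a made-up SHA in place of real ls-remote output.
printf '0123456789abcdef0123456789abcdef01234567\tHEAD\n' | short_head_ref
```

Against a real repository the input would come from something like
`git ls-remote <repository-url> HEAD | short_head_ref`, where the
repository URL is whichever iPXE mirror the build uses.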

Change-Id: Iccef8ff21b64f20f4a185c654c0308478eecfb4a
Jesse Pretorius (odyssey4me) 2019-08-09 12:55:37 +01:00
parent e1cf5baa56
commit b5b7791180
4 changed files with 55 additions and 2 deletions

@@ -1,5 +1,7 @@
# git ref to checkout from the iPXE git repository
IPXE_GIT_REF=6366fa7a
# This is the current SHA for HEAD on 9 Aug 2019
IPXE_GIT_REF=c63ef427
ipxe-boot.qcow2:
ELEMENTS_PATH=./elements IPXE_GIT_REF=$(IPXE_GIT_REF) disk-image-create -x -o ipxe-boot ipxe-boot-image

@@ -1,2 +1,3 @@
block-device-mbr
centos7
vm

@@ -29,7 +29,57 @@ if [ -n "${IPXE_GIT_REF:-}" ]; then
fi
IPXE_GIT_REF_CURRENT=$(git rev-parse --short HEAD)
make bin/ipxe.lkrn
cat << EOF >>script.ipxe
#!ipxe
#
# This is the iPXE boot script that we embed into the iPXE binary.
#
# The default behaviour is to get DHCP and assume that DHCP includes
# the filename option. If it fails, then it just stops.
#
# This script implements more retries until both DHCP and the filename
# option are present. This makes sure that the underlying network has
# plenty of time to be ready. If this still fails, then rather than
# halt, the host will reboot to try again.
#
# Based on:
# https://github.com/danderson/netboot/blob/master/pixiecore/boot.ipxe
#
set attempts:int32 10
# Count attempts from 1 so that the loop stops after exactly ten tries.
set x:int32 1
:loop
dhcp || goto nodhcp
isset \${filename} || goto nobootconfig
goto boot
:nodhcp
echo No DHCP response, retrying (attempt \${x}/\${attempts}).
sleep 1
goto retry
:nobootconfig
echo No filename option present, retrying (attempt \${x}/\${attempts}).
sleep 1
goto retry
:retry
iseq \${x} \${attempts} && goto fail ||
inc x
goto loop
:boot
chain \${filename}
:fail
echo Failed to get a correct DHCP response after \${attempts} attempts.
echo
echo Rebooting in 5 seconds...
sleep 5
reboot
EOF
make bin/ipxe.lkrn EMBED=script.ipxe
cp bin/ipxe.lkrn /boot/
cat << EOF >>/etc/grub.d/40_custom

Binary file not shown.
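
The backslashes before the ${...} references in the embedded script
matter because the script is written through an unquoted heredoc,
where the shell would otherwise expand them at build time. A minimal
sketch of the difference follows; the variable value is made up for
the demonstration.

```shell
#!/bin/sh
# In an unquoted heredoc the shell expands ${var}, while \${var} is written
# out literally. The value assigned here is purely illustrative.
filename="expanded-by-shell"
script=$(cat << EOF
unescaped: ${filename}
escaped: \${filename}
EOF
)
printf '%s\n' "$script"
```

The escaped form is what leaves a literal ${filename} in script.ipxe
for iPXE to resolve at boot time from the DHCP response.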