Handle errors on system upgrade

A bare `wait` does not return an error when any of the
child background processes fails. This leads to the
unexpected behavior of triggering the upgrade run
even though the system upgrade failed. This issue was
observed during the integration of NFV's NIC partitioning
feature, in which the system upgrade hangs on the SR-IOV
cleanup step.
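
The behavior described above can be reproduced in a few lines of
shell. This is a minimal sketch, not the upgrade script itself: the
two subshells stand in for the per-host upgrade jobs, and the name
`bare_status` is purely illustrative.

```shell
#!/usr/bin/env bash
# Two stand-in background jobs: one succeeds, one fails with status 3.
# (Assumption: these replace the real per-host upgrade commands.)
( exit 0 ) &
( exit 3 ) &

# `wait` with no arguments waits for every background child and
# returns 0, regardless of how those children exited.
wait
bare_status=$?
echo "bare wait returned ${bare_status}"   # prints: bare wait returned 0
```

The failure of the second job is lost, which is why the script went on
as if the step had succeeded.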

Use `wait $PID` to collect the exit status of each child
process and wait for all of them to complete. Exit with an
error if any of the children exited with an error.
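
The approach above can be tried in isolation with the sketch below. It
assumes stand-in subshells instead of the real upgrade commands, and
the names `pids` and `failed` are hypothetical additions used only for
illustration.

```shell
#!/usr/bin/env bash
# Start stand-in background jobs and record each PID via $!.
# (Assumption: in the real script each job is a per-host upgrade run.)
pids=""
( sleep 1; exit 0 ) &
pids="$pids $!"
( exit 2 ) &
pids="$pids $!"
( exit 0 ) &
pids="$pids $!"

# `wait PID` returns the exit status of that specific child, so every
# child is waited for and a single failure marks the whole step failed.
status=0
failed=0
for p in $pids; do
    if ! wait "$p"; then
        status=1
        failed=$((failed + 1))
    fi
done
echo "status=${status} failed=${failed}"   # prints: status=1 failed=1
```

A calling script would typically end with `exit "$status"` (or `exit 1`
on failure) so that the caller sees the error.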

Change-Id: I084f3b974e37c9f5a4d720747e408645deb50995
Saravanan KR 2021-01-11 11:00:22 +05:30
parent 87be79e1c8
commit 9ab549df7d
1 changed files with 14 additions and 2 deletions

@@ -34,6 +34,7 @@ echo "[$(date)] Started system upgrade step for {{ hosts }}"
 tripleo-ansible-inventory --stack {{ overcloud_stack_name }} --static-yaml-inventory upgrade_inventory.yaml --undercloud-connection ssh --undercloud-key-file /var/lib/mistral/.ssh/tripleo-admin-rsa --ansible_ssh_user tripleo-admin
+bkg_pids=""
 for host in $(echo "{{ hosts }}" | sed "s/,/ /g")
 do
     openstack overcloud upgrade run ${RUN_ANSWER} \
@@ -41,8 +42,19 @@ do
         --static-inventory upgrade_inventory.yaml \
         --tags system_upgrade \
         --limit "${host}" 2>&1 | tee -a "RHEL_upgrade_${host}" &
+    bkg_pids+=" $! "
 done
-wait
+status=0
+for p in $bkg_pids; do
+    if ! wait $p; then
+        status=1
+    fi
+done
-echo "[$(date)] Finished system upgrade step for {{ hosts }}"
+if [[ $status == 0 ]]; then
+    echo "[$(date)] Finished system upgrade step for {{ hosts }}"
+else
+    echo "[$(date)] Failed in system upgrade step for {{ hosts }}"
+    exit 1
+fi