This commit upgrades iavf from version 4.5.3.2 to 4.5.3.4 to fix the
issue "iavf 0000:17:01.6: Never saw reset".
The following root cause analysis comes from Intel.
"""
The iavf_adminq_task() function processes the device Admin queue,
which is used to handle receiving messages from the PF driver.
It calls iavf_clean_arq_element() to extract the message at the head
of the queue, and processes it by calling iavf_virtchnl_completion().
There is a subtle race between iavf_adminq_task() and
iavf_watchdog_task() involving the processing of
VIRTCHNL_EVENT_RESET_IMPENDING. The race results in the iavf driver
getting stuck waiting for a reset that has already completed, printing
"Never saw reset" once every 5 seconds, and locking the driver in the
__IAVF_RESET state, preventing normal operations from proceeding.
The entire race can be avoided if the iavf_adminq_task() stops holding
onto potentially stale data. To do this, acquire the
__IAVF_IN_CRITICAL_TASK bit lock at the start of the function. With
this, it is
no longer possible for the function to be blocked holding the data in
its event buffer while the iavf_watchdog_task() function processes the
entire hardware reset.
Instead of sleeping with a while loop, just re-queue the
iavf_adminq_task() when we are unable to acquire the bit lock.
Additionally, align with upstream and check the removal status to
avoid re-queuing in the event that the driver has already started
remove.
This new flow also aligns with the way the upstream driver handles
locking and completely avoids the race. If the iavf_adminq_task()
happens to be delayed until the hardware reset completes, it will no
longer see the VIRTCHNL_EVENT_RESET_IMPENDING data, as this will have
been cleared by the hardware reset.
"""
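The locking flow described above can be illustrated with a minimal
user-space sketch. This is not the driver code: struct sketch_adapter,
sketch_adminq_task() and the two counters are hypothetical names made up
for illustration, a C11 atomic_flag stands in for the
__IAVF_IN_CRITICAL_TASK bit, and incrementing requeue_count stands in
for re-queuing the work item.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the adapter state (not struct iavf_adapter). */
struct sketch_adapter {
    atomic_flag in_critical_task; /* models the __IAVF_IN_CRITICAL_TASK bit */
    bool removing;                /* models "driver has already started remove" */
    int requeue_count;            /* times the task asked to be re-queued */
    int processed_count;          /* times the admin queue was processed */
};

/* Initializer: lock free, not removing, counters at zero. */
#define SKETCH_ADAPTER_INIT { ATOMIC_FLAG_INIT, false, 0, 0 }

/*
 * Sketch of the admin queue task. The lock is taken before any message
 * would be fetched, so no event data is held while another task (for
 * example the watchdog) owns the critical section.
 * Returns true if the queue was processed, false if the task backed off.
 */
static bool sketch_adminq_task(struct sketch_adapter *a)
{
    assert(a != NULL);

    /* Try to take the bit lock once instead of sleeping in a loop. */
    if (atomic_flag_test_and_set(&a->in_critical_task)) {
        /* Lock is busy: re-queue, unless removal has already started. */
        if (!a->removing)
            a->requeue_count++;
        return false;
    }

    /* Fetching and handling the message would happen here, under the lock. */
    a->processed_count++;

    atomic_flag_clear(&a->in_critical_task);
    return true;
}
```

In this sketch, a call made while another task holds the flag returns
immediately and only bumps requeue_count, and it does nothing at all
once removing is set. A call made after the flag is released processes
the queue with fresh data.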
Verification:
- With this commit, the following command successfully builds the iavf
  kernel module for both standard and PREEMPT_RT kernels:
    build-pkgs -c -p iavf
- A StarlingX ISO image was installed onto an All-in-One Dell XR11 lab
  server with one Intel E810 NIC, in low-latency mode.
- The user who reported this issue was provided with a StarlingX
  designer patch that incorporates this change and did not encounter
  any issues during testing with it.
Closes-Bug: 2058858
Change-Id: I448ee1e302bdc7277a6c5db990d4d5cfc485a0f4
Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>