[PATCH] powerpc/eeh: Delay slot presence check once driver is notified about the pci error.

Oliver O'Halloran oohall at gmail.com
Wed Nov 24 23:01:45 AEDT 2021


On Wed, Nov 24, 2021 at 12:05 AM Mahesh Salgaonkar <mahesh at linux.ibm.com> wrote:
>
> *snip*
>
> This causes the EEH handler to get stuck for ~6
> seconds before it can notify the driver that the PCI error has been detected
> and stop any active operations. Hence, with I/O traffic running, during these 6
> seconds the network driver continues its operation and hits a timeout
> (netdev watchdog). On timeout, the network driver goes into FFDC capture mode
> and the reset path, assuming the PCI device is in a fatal condition. This causes
> EEH recovery to fail, and sometimes it leads to a system hang or crash.

Whatever is causing that crash is the real issue IMO. PCI error
reporting is fundamentally asynchronous and the driver always has to
tolerate some amount of latency between the error occurring and being
reported. Six seconds is admittedly an eternity, but it should not
cause a system crash under any circumstances. Printing a warning due
to a timeout is annoying, but it's not the end of the world.
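
To illustrate what tolerating that latency could look like, here is a
minimal userspace sketch in C. It is not taken from any real driver:
struct sketch_dev, sketch_read_status() and sketch_tx_timeout() are
hypothetical names invented for this example. It relies only on the fact
that MMIO reads from a failed PCI device come back as all ones; in-kernel
code would more likely use something like pci_channel_offline() for the
"error already reported" check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reads from a failed or isolated PCI device return all ones. */
#define REG_ALL_ONES 0xFFFFFFFFu

/* Hypothetical device state for this sketch; not a kernel structure. */
struct sketch_dev {
    bool     error_reported; /* platform has already told us about the error */
    bool     hw_failed;      /* error has occurred but may not be reported yet */
    uint32_t status_reg;     /* value a healthy device would return */
    int      resets;         /* number of driver-initiated resets */
    int      deferrals;      /* number of timeouts left to error recovery */
};

/* Simulated MMIO read: a failed device answers with all ones. */
static uint32_t sketch_read_status(const struct sketch_dev *dev)
{
    return dev->hw_failed ? REG_ALL_ONES : dev->status_reg;
}

/*
 * Timeout handler. Returns true if the driver reset the device itself,
 * false if it backed off because a PCI error is reported or suspected.
 */
static bool sketch_tx_timeout(struct sketch_dev *dev)
{
    if (dev->error_reported || sketch_read_status(dev) == REG_ALL_ONES) {
        /* Leave the hardware alone and wait for error recovery. */
        dev->deferrals++;
        return false;
    }

    /* Device still responds, so an ordinary reset is reasonable. */
    dev->resets++;
    return true;
}
```

The point of the sketch is that the timeout path checks whether the device
is still responding before it touches the hardware, so a late error report
produces a warning at worst instead of a competing reset.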

