Questions: Should kernel panic when PCIe fatal error occurs?

David Laight David.Laight at ACULAB.COM
Fri Sep 22 08:22:11 AEST 2023

> It would be nice if they worked the same, but I suspect that vendors
> may rely on the fact that CPER_SEV_FATAL forces a restart/panic as
> part of their system integrity story.

The file system errors created by a panic (especially an NMI panic)
could easily be more problematic than a failed PCIe data transfer.
Even a read that returned ~0u - which can be checked for.
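
As a rough illustration of "can be checked for", here is a minimal userspace sketch in C. Everything in it is hypothetical (the struct tdm_board layout, the helper names, the idea of a single status register); it is not taken from any real driver. It only shows the shape of the check: treat an all-ones read as a possibly failed transfer and take that one board out of service rather than the whole system. An all-ones value can also be legitimate register content, so a real driver would need some further confirmation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a 32-bit read that comes back as all ones (~0u)
 * may mean the device did not respond to the access. */
static bool read_looks_failed(uint32_t val)
{
	return val == ~(uint32_t)0;
}

/* Hypothetical per-board state, assumed for this sketch only. */
struct tdm_board {
	volatile const uint32_t *status_reg; /* assumed mapped register */
	bool enabled;                        /* false once taken out of service */
};

/* Read the status register. On a suspicious all-ones value, disable
 * this board and report failure instead of stopping the system.
 * Returns true and stores the value in *status when the read looks valid. */
static bool tdm_board_poll(struct tdm_board *board, uint32_t *status)
{
	uint32_t val = *board->status_reg;

	if (read_looks_failed(val)) {
		board->enabled = false;
		return false;
	}

	*status = val;
	return true;
}
```

A caller would skip any board for which tdm_board_poll() returns false and carry on with the remaining ones.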

Panicking a system that is converting TDM telephony to RTP for the
911 emergency service because a PCIe cable/riser connecting one of the
TDM boards has become loose doesn't seem ideal.
(Or because the TDM board's FPGA has decided it isn't going to respond
to any accesses until the BARs are set up again...)

The system can carry on with some TDM connections disabled - but that
is ok because they are all duplicated in case a cable gets cut.

(Yes - that is a live system...)


