[PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
Dan Williams
dan.j.williams at intel.com
Wed Dec 11 12:51:15 AEDT 2024
Fabio M. De Francesco wrote:
> On Tuesday, August 6, 2024 9:56:24 PM GMT+2 Dan Williams wrote:
> > Fabio M. De Francesco wrote:
> > > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
> >
> > I think the critical detail is is that print_extlog_rcd() is only
> > triggered when ras_userspace_consumers() returns true. The observation
> > is that ras_userspace_consumers() hides information from the trace path
> > when the intended purpose of it was to hide duplicate emissions to the
> > kernel log when userspace is watching the tracepoints.
> >
> > Setting aside whether ras_userspace_consumers() is still a good idea or
> > not, it is obvious that this patch as is may surprise environments that
> > start seeing kernel error logs where the kernel was silent before.
> >
> > I think the path of least surprise would be to make sure that
> > pci_print_aer() optionally skips emitting to the kernel log when not
> > needed wanted.
>
> Sorry for replying so late...
>
> I'm not entirely sure that users would not prefer to be surprised by
> _finally_ seeing kernel error logs for failing PCIe components. I suspect
> that users might have been confused by not seeing any output.
2 notes:
* New KERN_ERR prints are often found to be unwelcome. When the kernel starts
printing new error messages it causes sysadmins to scramble.
* The future of RAS is trace-events. Any new RAS messages to the kernel
log need to ask the question, "is userspace better served by
registering for a RAS trace event, rather than parsing kernel log
messsages".
[..]
> I need to be sure that I understood...
>
> void pci_print_aer(char *level, struct pci_dev *dev, int aer_severity,
> struct aer_capability_regs *aer)
> {
> [...]
>
> if (printk_get_level(level) <= console_loglevel) {
> pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
> status, mask);
No, the code would be:
pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
...i.e. just pass @level rather than open code "if
(printk_get_level(level) <= console_loglevel)".
More information about the Linuxppc-dev
mailing list