[PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section

Dan Williams dan.j.williams at intel.com
Wed Dec 11 12:51:15 AEDT 2024


Fabio M. De Francesco wrote:
> On Tuesday, August 6, 2024 9:56:24 PM GMT+2 Dan Williams wrote:
> > Fabio M. De Francesco wrote:
> > > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
> > 
> > I think the critical detail is that print_extlog_rcd() is only
> > triggered when ras_userspace_consumers() returns true. The observation
> > is that ras_userspace_consumers() hides information from the trace path
> > when the intended purpose of it was to hide duplicate emissions to the
> > kernel log when userspace is watching the tracepoints.
> >
> > Setting aside whether ras_userspace_consumers() is still a good idea or
> > not, it is obvious that this patch as is may surprise environments that
> > start seeing kernel error logs where the kernel was silent before.
> >
> > I think the path of least surprise would be to make sure that
> > pci_print_aer() optionally skips emitting to the kernel log when not
> > wanted.
> 
> Sorry for replying so late...
> 
> I'm not entirely sure that users would not prefer to be surprised by 
> _finally_ seeing kernel error logs for failing PCIe components. I suspect 
> that users might have been confused by not seeing any output.

2 notes:

* New KERN_ERR prints are often found to be unwelcome. When the kernel starts
  printing new error messages it causes sysadmins to scramble.

* The future of RAS is trace-events. Any new RAS messages to the kernel
  log need to ask the question, "is userspace better served by
  registering for a RAS trace event, rather than parsing kernel log
  messages?".
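
To make the intent behind those two notes concrete, here is a rough
userspace sketch, not kernel code. Every demo_* name is a hypothetical
stand-in invented for illustration; demo_userspace_watching only plays
the role that ras_userspace_consumers() plays in the discussion above.
The point it tries to show is that the trace event is emitted
unconditionally, and only the log message is skipped when a consumer is
already watching the tracepoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counters standing in for the two reporting channels. */
static int demo_trace_count;
static int demo_log_count;

/* Stand-in for firing a RAS trace event (assumption: always cheap). */
static void demo_emit_trace_event(const char *msg)
{
	assert(msg != NULL);
	demo_trace_count++;
}

/* Stand-in for a message that would land in the kernel log. */
static void demo_emit_kernel_log(const char *msg)
{
	assert(msg != NULL);
	demo_log_count++;
	fprintf(stderr, "%s\n", msg);
}

/*
 * The trace path is never gated. Only the duplicate log message is
 * suppressed when userspace is already consuming the trace events.
 */
static void demo_report_error(const char *msg, bool demo_userspace_watching)
{
	demo_emit_trace_event(msg);
	if (!demo_userspace_watching)
		demo_emit_kernel_log(msg);
}
```

With a consumer attached, demo_report_error() bumps only the trace
counter; without one, both channels see the error.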

[..]
> I need to be sure that I understood...
> 
> void pci_print_aer(char *level, struct pci_dev *dev, int aer_severity,
>                    struct aer_capability_regs *aer)
> {
>         [...]
> 
>         if (printk_get_level(level) <= console_loglevel) {
>                 pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
>                         status, mask);

No, the code would be:

    pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);

...i.e. just pass @level rather than open code "if
(printk_get_level(level) <= console_loglevel)".
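
As a rough userspace illustration of why passing @level through is
enough (a mock, not the kernel implementation): the filtering lives in
one place inside the print helper, so callers never compare against the
console loglevel themselves. All demo_* names and DEMO_KERN_* macros
below are made up for this sketch, and the "lower number than the
console loglevel reaches the console" rule is an assumption of the mock.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical level strings, modelled loosely on printk levels. */
#define DEMO_KERN_ERR  "3"
#define DEMO_KERN_INFO "6"

static int demo_console_loglevel = 4;	/* assumed threshold for the mock */
static int demo_recorded;		/* messages stored in the mock log */
static int demo_on_console;		/* messages also shown on the console */

/* Central helper: the only place that looks at the console loglevel. */
static void demo_printk(const char *level, const char *fmt, ...)
{
	char buf[128];
	va_list ap;

	assert(level && level[0] >= '0' && level[0] <= '7');

	va_start(ap, fmt);
	vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);

	demo_recorded++;			/* always recorded */
	if (level[0] - '0' < demo_console_loglevel) {
		demo_on_console++;		/* shown only if severe enough */
		fputs(buf, stderr);
	}
}

/* Caller just forwards the level; no open-coded loglevel comparison. */
static void demo_print_aer(const char *level, unsigned int status,
			   unsigned int mask)
{
	demo_printk(level, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
		    status, mask);
}
```

Calling demo_print_aer(DEMO_KERN_INFO, ...) records the message without
putting it on the mock console, while DEMO_KERN_ERR does both, and the
caller looks identical in either case.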

