[PATCH 05/14] powerpc/eeh: Defer printing stack trace

Sam Bobroff sbobroff at linux.ibm.com
Tue Sep 17 13:35:14 AEST 2019


On Tue, Sep 17, 2019 at 11:45:14AM +1000, Oliver O'Halloran wrote:
> On Tue, Sep 17, 2019 at 11:04 AM Sam Bobroff <sbobroff at linux.ibm.com> wrote:
> >
> > On Tue, Sep 03, 2019 at 08:15:56PM +1000, Oliver O'Halloran wrote:
> > > Currently we print a stack trace in the event handler to help with
> > > debugging EEH issues. In the case of suprise hot-unplug this is unneeded,
> > > so we want to prevent printing the stack trace unless we know it's due to
> > > an actual device error. To accomplish this, we can save a stack trace at
> > > the point of detection and only print it once the EEH recovery handler has
> > > determined the freeze was due to an actual error.
> > >
> > > Since the whole point of this is to prevent spurious EEH output we also
> > > move a few prints out of the detection thread, or mark them as pr_debug
> > > so anyone interested can get output from the eeh_check_dev_failure()
> > > if they want.
> > >
> > > Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
> >
> > I think this is a good change, and even in the normal case it will place
> > the stacktrace closer to the rest of the recovery information.
> >
> > But, I think it would make more sense to put the stacktrace into the
> > struct eeh_event, rather than the struct eeh_pe. Is there some reason
> > we can't do that? (It would save a fair bit of memory!)
> 
> Two reasons:
> 
> 1) the eeh_event structures are allocated with GFP_ATOMIC since
> eeh_dev_check_failure() can be called from any context. Minimising the
> number of atomic allocations we do is a good idea as a matter of
> course.

Yes, but I meant directly inside eeh_event so there wouldn't be a second
allocation. It would just be a bit bigger.

> 2) We don't pass the eeh_event structure to the event handler
> function. I guess we could, but... eh
> 
> I don't see the memory saving as hugely significant either. There's
> always fewer eeh_pe structures than there are PCI devices since some
> will share PEs (e.g. switches, multifunction cards) so you'd be saving
> a dozen KB at most.
> 
> root at zaius1:~# lspci | wc -l
> 59
> root at zaius1:~# echo $(( $(lspci | wc -l) * 64 * 8))
> 30208
> 
> I think we'll live...

Sure, I don't have very strong feelings about it either way.

> >
> > Cheers,
> > Sam.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20190917/baa53d32/attachment.sig>


More information about the Linuxppc-dev mailing list