[PATCH 6/7] ppc64: EEH Avoid racing reports of errors

linas linas at austin.ibm.com
Sat Oct 8 01:23:05 EST 2005


On Wed, Oct 05, 2005 at 09:23:11PM +1000, Paul Mackerras was heard to remark:
> Linas writes:
> 
> > 06-eeh-report-race.patch
> 
> Shouldn't you pass in pe_dn->child here, or
> alternatively rearrange __eeh_mark_slot to do the node you give it
> plus its children (recursively)?

Yes; that's right; this gets fixed in a later patch in the series. 
I guess this one snuck by while I was trying to sync up all the
different patches I was carrying :-/

> Two other comments about __eeh_mark_slot: (1) despite the comment, the
> function doesn't do anything to any pci_dev or pci_driver 

The comment is also a "back port" of a function that shows up in a later
patch, and so indeed is inappropriate for this patch. Again, my excuse 
is that I got sloppy while juggling all of these patchlets. Sorry.

> (not that it
> should be touching any pci_driver), 

One problem I was seeing was that after getting an EEH error, 
some device drivers would start spinning in their interrupt handlers.
I tried to break out of this spin-loop by adding a call to a
function that asked "am I the victim of an EEH event?"
Unfortunately, the first implementation of this call was not 
interrupt safe (pci_device_to_OF_node calls traverse_pci_devices).
While scratching my head over how best to fix this, I decided that 
the best thing to do would be to mark up the pci driver with a flag;
that way, the driver can look up the EEH state without any further ado.
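
For illustration only, here is a stand-alone sketch of the flag idea in
plain user-space C, not kernel code. Every name in it (fake_driver,
fake_eeh_mark, fake_eeh_victim, fake_irq_handler) is made up for this
example and does not appear in the patch; the status field merely stands
in for a device register. The point is that the check is a single load,
so it needs no tree walk and can be called from an interrupt handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-driver state; NOT the real struct pci_driver. */
struct fake_driver {
	volatile bool eeh_isolated;  /* set once the slot has been isolated */
	volatile uint32_t status;    /* stand-in for a device status register */
};

/* Error-reporting path: flip the flag, nothing else. */
static void fake_eeh_mark(struct fake_driver *drv)
{
	assert(drv != NULL);
	drv->eeh_isolated = true;
}

/* "Am I the victim of an EEH event?"  A plain load: no locks, no lookups. */
static bool fake_eeh_victim(const struct fake_driver *drv)
{
	return drv->eeh_isolated;
}

/*
 * Handler that polls until the status register reads zero.
 * Returns 0 if the device went idle, -1 if it bailed out because of EEH.
 * A real handler would service the device inside the loop; this sketch
 * does not, so the flag check is the only way out of a stuck register.
 */
static int fake_irq_handler(struct fake_driver *drv)
{
	while (drv->status != 0) {
		if (fake_eeh_victim(drv))
			return -1;
	}
	return 0;
}
```

With the flag clear and status at zero the handler returns 0 at once;
with status stuck at all ones (which is what reads from a frozen slot
return) and the flag set, it returns -1 instead of spinning.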

One might be able to get rid of this state in pci_driver, 
although it seemed generically useful to have.  For example,
later on, I futzed with a version that disabled the irq line 
for that adapter "as soon as possible", and that seems to also 
work, at least on an SMP machine. On a non-SMP machine, there 
is still the danger that the device driver is spinning with 
interrupts disabled, waiting on a status register to change, 
that will never change. (And because of the deadlock, the 
code to disable a given irq line never runs).  It all
depends on how the device driver was written.
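
As a hedged illustration of the "how the driver got written" point, a
polling loop that cannot deadlock on a dead register might look roughly
like the following. This is a stand-alone sketch, not code from any real
driver: the names, the ready bit and the poll limit are all assumptions
chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_STATUS_READY 0x1u         /* hypothetical "ready" bit */
#define FAKE_ALL_ONES     0xffffffffu  /* what a frozen slot reads back as */
#define FAKE_MAX_POLLS    1000         /* arbitrary limit for this sketch */

/*
 * Wait for the ready bit, but never wait forever.
 * Returns 0 once the ready bit is seen, -1 if the register reads all
 * ones (device probably isolated) or the poll limit runs out.
 */
static int fake_wait_ready(const volatile uint32_t *status_reg)
{
	int i;

	assert(status_reg != NULL);
	for (i = 0; i < FAKE_MAX_POLLS; i++) {
		uint32_t v = *status_reg;

		/* Check all-ones first: it would also match the ready bit. */
		if (v == FAKE_ALL_ONES)
			return -1;
		if (v & FAKE_STATUS_READY)
			return 0;
	}
	return -1;
}
```

A driver written this way gives up on its own, so it does not matter
whether the code that disables the irq line ever gets a chance to run.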

> and (2) a recursive function can't
> really be inline 

Well, no, but at least the first level call can be inlined; I assumed 
that gcc would do at least that, but didn't check.
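
One way to make that split explicit, sketched here in stand-alone C
with made-up names (fake_node, fake_mark_list, fake_mark_slot, and
FAKE_EEH_ISOLATED are all assumptions, not the patch's code): a small
inline entry point marks the node it is given, and an ordinary
out-of-line recursive helper walks the children. The child/sibling
layout is only modelled loosely on a device-tree node.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_EEH_ISOLATED 0x1  /* hypothetical mode bit */

/* Hypothetical tree node with child and sibling links. */
struct fake_node {
	int eeh_mode;
	struct fake_node *child;
	struct fake_node *sibling;
};

/* Recursive part, left out of line: marks a sibling list and everything below it. */
static void fake_mark_list(struct fake_node *dn)
{
	for (; dn != NULL; dn = dn->sibling) {
		dn->eeh_mode |= FAKE_EEH_ISOLATED;
		fake_mark_list(dn->child);
	}
}

/*
 * First-level call: small enough that a compiler can inline it.
 * Marks the given node plus all of its descendants, but not the
 * siblings of the node itself.
 */
static inline void fake_mark_slot(struct fake_node *dn)
{
	assert(dn != NULL);
	dn->eeh_mode |= FAKE_EEH_ISOLATED;
	fake_mark_list(dn->child);
}
```

Calling fake_mark_slot() on a node marks that node and its whole
subtree, while a sibling of the starting node is left untouched.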

--linas



