[PATCH 6/7] ppc64: EEH Avoid racing reports of errors
linas
linas at austin.ibm.com
Sat Oct 8 01:23:05 EST 2005
On Wed, Oct 05, 2005 at 09:23:11PM +1000, Paul Mackerras was heard to remark:
> Linas writes:
>
> > 06-eeh-report-race.patch
>
> Shouldn't you pass in pe_dn->child here, or
> alternatively rearrange __eeh_mark_slot to do the node you give it
> plus its children (recursively)?
Yes; that's right; this gets fixed in a later patch in the series.
I guess this one snuck by while I was trying to sync up all the
different patches I was carrying :-/
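
For illustration, the second alternative suggested above (mark the node
you are given plus its children, recursively) might look roughly like the
sketch below. It is only a sketch: struct node, MODE_ISOLATED and
mark_subtree are hypothetical stand-ins, not the kernel's device_node
or the actual __eeh_mark_slot from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device-tree node (not struct device_node). */
struct node {
    struct node *child;     /* first child, or NULL */
    struct node *sibling;   /* next sibling, or NULL */
    unsigned int eeh_mode;  /* per-node EEH flag bits */
};

#define MODE_ISOLATED 0x1u  /* hypothetical flag value */

/*
 * Set mode_flag on dn and on every node beneath it.
 * Siblings of dn itself are deliberately left alone: only the
 * subtree rooted at dn is considered part of the slot.
 */
static void mark_subtree(struct node *dn, unsigned int mode_flag)
{
    struct node *c;

    if (dn == NULL)
        return;

    dn->eeh_mode |= mode_flag;
    for (c = dn->child; c != NULL; c = c->sibling)
        mark_subtree(c, mode_flag);
}
```

With this shape the caller can pass pe_dn itself and does not have to
remember to pass pe_dn->child.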
> Two other comments about __eeh_mark_slot: (1) despite the comment, the
> function doesn't do anything to any pci_dev or pci_driver
The comment is also a "back port" of a function that shows up in a later
patch, and so indeed is inappropriate for this patch. Again, my excuse
is that I got sloppy while juggling all of these patchlets. Sorry.
> (not that it
> should be touching any pci_driver),
One problem I was seeing was that after getting an EEH error,
some device drivers would start spinning in their interrupt handlers.
I tried to break out of this spin-loop by adding a call to a
function that asked "am I the victim of an EEH event"?
Unfortunately, the first implementation of this call was not
interrupt safe (pci_device_to_OF_node calls traverse_pci_devices).
While scratching my head as to how best to fix this, I decided that
the best thing to do would be to mark up the pci driver with a flag;
that way, the driver can look up the EEH state without any further ado.
One might be able to get rid of this state in pci_driver,
although it seemed generically useful to have. For example,
later on, I futzed with a version that disabled the irq line
for that adapter "as soon as possible", and that seems to also
work, at least on an SMP machine. On a non-SMP machine, there
is still the danger that the device driver is spinning with
interrupts disabled, waiting on a status register to change
that will never change. (And because of the deadlock, the
code to disable a given irq line never runs.) It all
depends on how the device driver got written.
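
To make the idea concrete, here is a rough sketch of a driver-side wait
loop that consults such a flag. Everything in it is assumed for the sake
of the example: struct fake_adapter, STATUS_DONE, wait_for_done and the
return codes are hypothetical, the flag is kept next to the simulated
status register purely for simplicity, and the spin bound is my addition
so the loop cannot hang. Real kernel code would also need to think about
memory ordering rather than rely on volatile.

```c
#include <stdbool.h>

/* Hypothetical adapter state; status stands in for a device register. */
struct fake_adapter {
    volatile unsigned int status;  /* simulated status register */
    volatile bool eeh_isolated;    /* set by the error-handling path */
};

#define STATUS_DONE   0x1u  /* hypothetical "operation complete" bit */

#define WAIT_OK        0    /* status bit showed up */
#define WAIT_ISOLATED (-1)  /* gave up: device was isolated */
#define WAIT_TIMEOUT  (-2)  /* gave up: spin bound exhausted */

/*
 * Spin until STATUS_DONE is set, but check the cached isolation flag
 * on every pass. Reading a flag is a plain memory load, so the check
 * is cheap and safe to do from interrupt context, unlike a lookup
 * that has to walk the device tree.
 */
static int wait_for_done(struct fake_adapter *dev, unsigned long max_spins)
{
    unsigned long i;

    for (i = 0; i < max_spins; i++) {
        if (dev->eeh_isolated)
            return WAIT_ISOLATED;
        if (dev->status & STATUS_DONE)
            return WAIT_OK;
    }
    return WAIT_TIMEOUT;
}
```

Note that on a uniprocessor with interrupts disabled nothing else can run
to set the flag while the loop spins, so the flag only helps if it was
set before the loop was entered; that is the remaining danger described
above.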
> and (2) a recursive function can't
> really be inline
Well, no, but at least the first level call can be inlined; I assumed
that gcc would do at least that, but didn't check.
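
One way to avoid depending on what the compiler decides is to split the
work by hand: a small non-recursive inline wrapper for the first level,
plus an ordinary out-of-line recursive helper. The following is only a
sketch with made-up names (struct tnode, mark_slot, mark_children); it is
not the code from the patch.

```c
#include <stddef.h>

/* Hypothetical tree node used only for this illustration. */
struct tnode {
    struct tnode *child;    /* first child, or NULL */
    struct tnode *sibling;  /* next sibling, or NULL */
    unsigned int mode;      /* flag bits */
};

/* Out-of-line recursive helper: marks dn, its siblings, and all descendants. */
static void mark_children(struct tnode *dn, unsigned int flag)
{
    for (; dn != NULL; dn = dn->sibling) {
        dn->mode |= flag;
        mark_children(dn->child, flag);
    }
}

/*
 * First-level entry point. It contains no call to itself, so the
 * compiler is free to inline it at each call site; the recursion
 * lives entirely in mark_children().
 */
static inline void mark_slot(struct tnode *dn, unsigned int flag)
{
    if (dn == NULL)
        return;
    dn->mode |= flag;
    mark_children(dn->child, flag);
}
```

Since inline is only a hint either way, this mainly documents the intent:
the wrapper is the cheap part, the helper is the part that recurses.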
--linas