Symbios PCI error recovery [Was: Re: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)]

Linas Vepstas linas at austin.ibm.com
Tue Mar 22 10:10:28 EST 2005


Hi,

A thread has been running for a while on several mailing lists
concerning PCI bus error recovery.  Very briefly, some architectures
have PCI error recovery mechanisms built into them (e.g. IBM PowerPC,
new PCI-Express chips from Intel and other vendors, and possibly
pa-risc and others).

I've been trying to prototype error recovery.  I currently have
ethernet and the IPR scsi driver working, but I am having trouble with
the symbios driver.  I need help/advice ...

On Fri, Feb 25, 2005 at 11:36:09PM -0700, Grant Grundler was heard to remark:
> On Wed, Feb 23, 2005 at 07:31:37PM -0600, Linas Vepstas wrote:
> > I also want to do the symbios driver...
> 
> FYI, Mathew Wilcox maintains the sym2 driver in cvs.parisc-linux.org.


My current hardware will halt all i/o to/from the symbios controller
upon detection of a PCI error.  The recovery procedure that I am
currently using is to call system firmware (aka 'bios') to raise the
#RST pci signal line for 1/4 second and then lower it, then wait 2
seconds for the PCI bus to settle, then restore the PCI config space
registers (BARs, interrupt line, etc) to what they used to be. Then,
I call sym_start_up() in an attempt to get the symbios card working
again.  And that's where I get stuck ...
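
To make the ordering of those steps concrete, here is a rough user-space
model of the sequence.  It is only a sketch: every name in it
(fake_pci_dev, fake_recover, and so on) is made up for illustration, the
delays are merely recorded rather than slept, and the "start" step is a
stand-in for sym_start_up(), not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_WORDS      16    /* first 64 bytes of PCI config space */
#define CFG_COMMAND     1    /* word holding the command register */
#define CFG_BAR0        4    /* BAR0 lives at offset 0x10 */
#define CFG_BAR5        9    /* BAR5 lives at offset 0x24 */
#define CFG_INTR       15    /* word holding the interrupt line */
#define RESET_HOLD_MS  250   /* "1/4 second" from the text */
#define BUS_SETTLE_MS 2000   /* "2 seconds" from the text */

/* Hypothetical device model; not a kernel structure. */
struct fake_pci_dev {
    uint32_t cfg[CFG_WORDS];    /* live config space */
    uint32_t saved[CFG_WORDS];  /* copy taken while the device was healthy */
    int rst_asserted;           /* 1 while #RST is held */
    unsigned waited_ms;         /* total delay requested so far */
    int started;                /* 1 once the chip has been started */
};

/* Record the delay instead of sleeping, so the model runs instantly. */
static void fake_delay_ms(struct fake_pci_dev *dev, unsigned ms)
{
    dev->waited_ms += ms;
}

/* Snapshot config space; must be called before any error occurs. */
void fake_save_config(struct fake_pci_dev *dev)
{
    memcpy(dev->saved, dev->cfg, sizeof dev->cfg);
}

/* Model of #RST: asserting it wipes the command word, BARs, interrupt line. */
static void fake_set_reset(struct fake_pci_dev *dev, int asserted)
{
    size_t i;

    dev->rst_asserted = asserted;
    if (asserted) {
        dev->cfg[CFG_COMMAND] = 0;
        for (i = CFG_BAR0; i <= CFG_BAR5; i++)
            dev->cfg[i] = 0;
        dev->cfg[CFG_INTR] = 0;
        dev->started = 0;
    }
}

static void fake_restore_config(struct fake_pci_dev *dev)
{
    memcpy(dev->cfg, dev->saved, sizeof dev->cfg);
}

/* Stand-in for the driver start-up: fails if held in reset or BAR0 is unset. */
static int fake_chip_start(struct fake_pci_dev *dev)
{
    if (dev->rst_asserted || dev->cfg[CFG_BAR0] == 0)
        return -1;
    dev->started = 1;
    return 0;
}

/* The sequence from the text: reset, settle, restore config, restart. */
int fake_recover(struct fake_pci_dev *dev)
{
    assert(dev != NULL);
    fake_set_reset(dev, 1);
    fake_delay_ms(dev, RESET_HOLD_MS);
    fake_set_reset(dev, 0);
    fake_delay_ms(dev, BUS_SETTLE_MS);
    fake_restore_config(dev);
    return fake_chip_start(dev);
}
```

The point of the model is only the ordering: in it, starting the chip
before the config space is restored fails, because BAR0 is still zero.
It says nothing about what the real card does between the reset and
the restart.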

My assumption is that after the #RST, the symbios card will sit
there, dumb and stupid, with no scripts running.  But sometimes I find
that the card has done something to make the PCI error hardware trip
again.  Typically, this means that the card attempted to DMA to some
address that it's not allowed to touch, or raised #SERR or possibly
#PERR (I can't tell which).

Sometimes, I get the PCI error while the card is sitting there idly
after the #RST, but more often, I get the error in sym_chip_reset(),
immediately after the OUTB (nc_istat, SRST);

Any clue what this is about? Am I missing something? I'm rather
perplexed at this point, any clues/hints/suggestions are welcome.

--linas



