Symbios PCI error recovery [Was: Re: [PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)]
brking at us.ibm.com
Wed Mar 23 04:38:36 EST 2005
Linas Vepstas wrote:
> There has been a running thread for a while on several mailing lists
> concerning PCI bus error recovery. Very briefly, some architectures
> have PCI error recovery mechanisms built into them (e.g. IBM PowerPC,
> also new PCI-Express chips from Intel (and other vendors) and possibly
> pa-risc and others).
> I've been trying to prototype error recovery. I currently have
> ethernet and the IPR scsi driver working, but I am having trouble with
> the symbios driver. I need help/advice ...
> On Fri, Feb 25, 2005 at 11:36:09PM -0700, Grant Grundler was heard to remark:
>>On Wed, Feb 23, 2005 at 07:31:37PM -0600, Linas Vepstas wrote:
>>>I also want to do the symbios driver...
>>FYI, Matthew Wilcox maintains the sym2 driver in cvs.parisc-linux.org.
> My current hardware will halt all i/o to/from the symbios controller
> upon detection of a PCI error. The recovery procedure that I am
> currently using is to call system firmware (aka 'bios') to raise
> and then lower the #RST PCI signal line for 1/4 second, then wait 2
> seconds for the PCI bus to settle, then restore the PCI config space
> registers (BARs, interrupt line, etc) to what they used to be. Then,
> I call sym_start_up() in an attempt to get the symbios card working
> again. And that's where I get stuck ...
> My assumption is that after the #RST, the symbios card will sit
> there, dumb and stupid, with no scripts running. But sometimes I find
> that the card has done something to make the PCI error hardware trip
> again. Typically, this means that the card attempted to DMA to some
> address that it is not allowed to touch, or raised #SERR or possibly
> #PERR (I can't tell which).
What config registers are you restoring? Is it possible symbios does not
like something in your config restore?
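
One way to narrow this down might be to snapshot the 64-byte config header before the reset and diff it against what reads back after the restore. Below is a minimal userspace sketch of that comparison, assuming the two snapshots have already been captured as 16 dwords each. The names cfg_diff and cfg_names are made up for illustration and are not part of sym2 or the EEH code. Note that the status half of the command/status dword can differ legitimately, since its error bits are sticky until cleared.

```c
#include <stdint.h>
#include <stdio.h>

#define CFG_DWORDS 16 /* standard 64-byte PCI configuration header */

/* Labels for the dwords of a type 0 (non-bridge) header. */
static const char *const cfg_names[CFG_DWORDS] = {
    "vendor/device id", "command/status", "revision/class",
    "cacheline/latency/hdr type/BIST",
    "BAR0", "BAR1", "BAR2", "BAR3", "BAR4", "BAR5",
    "cardbus CIS", "subsystem ids", "expansion ROM",
    "capabilities ptr", "reserved", "int line/pin/min_gnt/max_lat",
};

/*
 * Hypothetical helper: compare the header saved before the reset with
 * the one read back after the restore. Prints every dword that differs
 * and returns the number of differing dwords.
 */
static int cfg_diff(const uint32_t saved[CFG_DWORDS],
                    const uint32_t now[CFG_DWORDS])
{
    int i, ndiff = 0;

    for (i = 0; i < CFG_DWORDS; i++) {
        if (saved[i] != now[i]) {
            printf("0x%02x %-32s saved=%08x now=%08x\n",
                   (unsigned)(i * 4), cfg_names[i],
                   (unsigned)saved[i], (unsigned)now[i]);
            ndiff++;
        }
    }
    return ndiff;
}
```

If the diff comes back clean apart from the status bits, the restore itself is probably not what upsets the card.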
Another possibility is that asserting PCI reset is not cleanly resetting
the card. Does PCI reset force BIST to be run on these cards? You could
try to manually run BIST on the card after the PCI reset to see if that
helps, or you could try power cycling the slot instead of using PCI reset.
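
For reference, a manual BIST run could look roughly like the sketch below. It uses the standard BIST register layout (offset 0x0f: bit 7 capable, bit 6 start, bits 3:0 completion code). Everything else is an assumption made for illustration: struct cfg_ops, run_bist and the fake device are hypothetical and only stand in for real config space accessors so that the polling logic can be exercised in userspace. I have not checked whether the symbios chips report themselves as BIST capable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_BIST           0x0f /* BIST register offset in config space */
#define PCI_BIST_CODE_MASK 0x0f /* completion code, 0 means passed */
#define PCI_BIST_START     0x40 /* write 1 to start, device clears when done */
#define PCI_BIST_CAPABLE   0x80 /* set if the device implements BIST */

/* Hypothetical accessor table standing in for real config space I/O. */
struct cfg_ops {
    uint8_t (*read8)(void *ctx, int off);
    void (*write8)(void *ctx, int off, uint8_t val);
    void (*delay_ms)(void *ctx, unsigned ms);
    void *ctx;
};

/*
 * Start BIST and poll until the device clears the start bit.
 * Returns 0 on pass, the nonzero completion code on failure,
 * -1 if the device is not BIST capable, -2 on timeout.
 */
static int run_bist(const struct cfg_ops *ops, unsigned timeout_ms)
{
    unsigned waited = 0;
    uint8_t v;

    assert(ops != NULL);
    v = ops->read8(ops->ctx, PCI_BIST);
    if (!(v & PCI_BIST_CAPABLE))
        return -1;

    ops->write8(ops->ctx, PCI_BIST, PCI_BIST_START);
    for (;;) {
        v = ops->read8(ops->ctx, PCI_BIST);
        if (!(v & PCI_BIST_START))
            return v & PCI_BIST_CODE_MASK;
        if (waited >= timeout_ms)
            return -2;
        ops->delay_ms(ops->ctx, 10);
        waited += 10;
    }
}

/* Fake device: finishes BIST after polls_left reads, reporting result. */
struct fake_dev {
    uint8_t bist;
    int polls_left;
    uint8_t result;
};

static uint8_t fake_read8(void *ctx, int off)
{
    struct fake_dev *d = ctx;

    if (off != PCI_BIST)
        return 0;
    if ((d->bist & PCI_BIST_START) && d->polls_left-- <= 0)
        d->bist = (uint8_t)((d->bist & PCI_BIST_CAPABLE) |
                            (d->result & PCI_BIST_CODE_MASK));
    return d->bist;
}

static void fake_write8(void *ctx, int off, uint8_t val)
{
    struct fake_dev *d = ctx;

    if (off == PCI_BIST && (d->bist & PCI_BIST_CAPABLE) &&
        (val & PCI_BIST_START))
        d->bist |= PCI_BIST_START;
}

static void fake_delay_ms(void *ctx, unsigned ms)
{
    (void)ctx;
    (void)ms; /* no real waiting needed for the fake device */
}

static struct cfg_ops fake_ops(struct fake_dev *d)
{
    struct cfg_ops ops = { fake_read8, fake_write8, fake_delay_ms, d };
    return ops;
}
```

In a driver the accessors would presumably map onto pci_read_config_byte() and pci_write_config_byte(), with a timeout of about 2000 ms, since as far as I recall the PCI spec expects BIST to complete within 2 seconds.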
> Sometimes, I get the PCI error while the card is sitting there idly
> after the #RST, but more often, I get the error in sym_chip_reset(),
> immediately after the OUTB (nc_istat, SRST);
> Any clue what this is about? Am I missing something? I'm rather
> perplexed at this point, any clues/hints/suggestions are welcome.
eServer Storage I/O
IBM Linux Technology Center