[patch 8/8] PCI Error Recovery: PPC64 core recovery routines

Tue Aug 30 02:09:15 EST 2005

On Mon, Aug 29, 2005 at 04:40:20PM +1000, Paul Mackerras was heard to remark:
> Linas Vepstas writes:
> 
> > Actually, no.  There are three issues:
> > 1) hotplug routines are called from within kernel. GregKH has stated on
> >    multiple occasions that doing this is wrong/bad/evil. This includes
> >    calling hot-unplug.
> > 
> > 2) As a result, the code to call hot-unplug is a bit messy. In
> >    particular, there's a bit of hoop-jumping when hotplug is built as
> >    as a module (and said hoops were wrecked recently when I moved the
> >    code around, out of the rpaphp directory).
> 
> One way to clean this up would be to make rpaphp the driver for the
> EADS bridges (from the pci code's point of view).  

I guess I don't understand what that means. Are you suggesting moving 
pSeries_pci.c into the rpaphp code directory?

> Then it would
> automatically get included in the error recovery process and could do
> whatever it should.

John Rose, the current maintainer of the rpaphp code, is pretty militant 
about removing things from, not adding things to, the rpaphp code.
Which is a good idea, as chunks of that code are spaghetti, and do need
simplification and cleanup.

> > 3) Hot-unplug causes scripts to run in user-space. There is no way to 
> >    know when these scripts are done, so its not clear if we've waited
> >    long enough before calling hot-add (or if waiting is even necessary).
> 
> OK, so let's just add a new hotplug event called KOBJ_ERROR or
> something, which tells userspace that an error has occurred which has
> made the device inaccessible.  Greg, would that be OK?

Why do we need such an event?

I would prefer to deprecate the hot-plug based recovery scheme.  This
is for many reasons, including the fact that some devices that can get
pci errors are soldered onto the planar, and are not hot-pluggable.

--linas