eeh

linas at austin.ibm.com
Tue Mar 23 09:04:06 EST 2004


On Mon, Mar 22, 2004 at 01:40:40PM -0800, Greg KH wrote:
>
> On Mon, Mar 22, 2004 at 03:22:31PM -0600, Nathan Fontenot wrote:
> > Yes, the drivers/pci/hotplug changes are platform specific, but so are
> > all the changes for this patch.
>
> My point remains.  Try to do this in a non-platform specific way, as you
> will have to do it eventually, might as well be now.

Re: now vs. eventually: we are working hard to remove the kernel panic
before the SUSE SLES9 deadline (ditto for RH), i.e. today, so that there
can be some semblance of testing before it gets widespread use.

> > There is a call to /sbin/hotplug.  I had to trace it through the code,
> > but the path is ...
> >
> > rpaphp_disable_slot
> >   disable_slot
> >     rpaphp_unconfig_pci_adapter
> >       pci_remove_bus_device
> >         pci_destroy_dev
> >           device_unregister
> >             device_del
> >               kobject_del
> >                 kobject_hotplug
> >                   kset_hotplug
> >                     call_usermodehelper("/sbin/hotplug")
>
> Yeah, fun isn't it :)
>
> I mean a "fault" ACTION for the hotplug event before the "this device is
> now gone" event that your above call chain causes.

The EEH hardware makes the device 'gone' before any software anywhere
is able to do anything.  From the kernel's point of view, the 'gone-ness' of
the device is completely asynchronous; it's as if the sysadmin yanked out
the card without telling the kernel in advance.  The card is gone, and the
best we can do is to try to clean up after it.

Actually, I'm not clear on this and might be wrong, but if the sysadmin
really does unplug a live card, I think it will take this same EEH error
path.  I haven't tried this (yet).
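
To make the event ordering question concrete, here is a rough sketch of
what a user-space agent would have to tell apart.  /sbin/hotplug gets the
event type in the ACTION environment variable; "add" and "remove" are the
actions the kernel emits today.  The "fault" value is hypothetical: it is
only the action proposed in this thread, nothing emits it.  The names
hp_event, classify_action and device_unusable are made up for
illustration.

```c
#include <stddef.h>
#include <string.h>

/* Events a user-space hotplug agent might distinguish. */
enum hp_event { HP_ADD, HP_REMOVE, HP_FAULT, HP_UNKNOWN };

/*
 * Map the ACTION string to an event.
 * "add" and "remove" exist today; "fault" is an assumption,
 * standing in for the action proposed in this thread.
 */
enum hp_event classify_action(const char *action)
{
    if (action == NULL)
        return HP_UNKNOWN;
    if (strcmp(action, "add") == 0)
        return HP_ADD;
    if (strcmp(action, "remove") == 0)
        return HP_REMOVE;
    if (strcmp(action, "fault") == 0)   /* hypothetical action */
        return HP_FAULT;
    return HP_UNKNOWN;
}

/*
 * With EEH the slot is already cut off by the time any event can be
 * delivered, so a fault and a remove both mean "stop touching it".
 * Returns 1 if the device should be treated as unusable, else 0.
 */
int device_unusable(enum hp_event ev)
{
    return ev == HP_FAULT || ev == HP_REMOVE;
}
```

An agent would call classify_action(getenv("ACTION")) from its main().
The point of the sketch is that a "fault" event would not tell user space
anything it could act on sooner than "remove" does: in both cases the
device is already unreachable and only cleanup is left.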

--linas

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/