hotplug remove vs. device driver close

Greg KH greg at kroah.com
Thu Jun 3 09:28:51 EST 2004


On Wed, Jun 02, 2004 at 06:14:55PM -0500, linas at austin.ibm.com wrote:
>
> Hi,
>
> We are hitting a situation where we are hot-plug removing a pci card before
> closing the device driver.  This seems to lead to kernel memory leaks if not
> outright crashes. I'm trying to understand what the correct solution to this
> is supposed to be.

To paraphrase from the PCI Hotplug spec, "DO NOT DO THAT!"

> For example: 'ifup eth0' and 'ifdown eth0' are what usually cause an ethernet
> device driver to be opened/closed.  Separately, we have a userland tool that
> can be used to power off the pci slot, and thus perform a hotplug unconfigure
> in the kernel (i.e. calls pci_remove_bus_device()).   Thus, the sysadmin
> currently has the power to hot-remove a device without first closing the
> device driver.  Surely, this is bad. (Right?)  But how is this supposed to
> be handled?

Again, do not do that.

> Please don't tell me that a good sysadmin should never do that ... in the
> hothouse of the server room, crazy stuff happens and it should not result
> in a server crash so easily ...

Tough, do not do that.

That being said, a lot of PCI drivers can already recover from this, as
they also drive PCMCIA devices and need to be able to handle this case.
It is possible, and pretty simple, to fix within the driver itself.

> I'm hoping that the answer also isn't that 'the hotplug scripts should
> do that', since hotplug scripts can be buggy, or can crash for many reasons;
> such events shouldn't bring down the kernel.

No, it's not a hotplug script issue.

> So I conclude two possibilities:
>
> -- All device drivers should watch for hotplug remove, and close themselves
>    down in such an event

No, they should watch for errors when trying to read from and write to
their devices, and if that happens, handle it properly.  The kernel will
tell them at some point that the device is really gone by calling the
disconnect() callback (for PCI drivers, the remove() callback in struct
pci_driver).
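A rough userspace sketch of the pattern described above, for illustration only. It assumes, as is common for PCI, that reads from a device that is no longer present come back as all ones; a real driver would do the read with readl()/ioread32() and get its removal notification through the .remove member of struct pci_driver. Every name here (fake_card, drv_priv, drv_xmit, drv_remove, drv_close) is hypothetical, and a real driver would also have to tell an all-ones value apart from a legitimate register value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define REG_ALL_ONES 0xFFFFFFFFu  /* assumed value of a read from an absent device */

/* Hypothetical model of the card; not a kernel structure. */
struct fake_card {
    bool present;     /* false once the slot has been powered off */
    uint32_t status;  /* value the status register would hold */
};

/* Hypothetical per-device driver state. */
struct drv_priv {
    struct fake_card *card;
    bool gone;        /* latched once the hardware is known to be missing */
    int *rx_buf;      /* a resource that must be freed exactly once */
};

/* Simulated register read: an absent device reads back as all ones. */
static uint32_t read_status(const struct fake_card *c)
{
    return c->present ? c->status : REG_ALL_ONES;
}

/* Normal I/O path: check what the hardware returned instead of trusting it. */
static int drv_xmit(struct drv_priv *p)
{
    if (p->gone)
        return -1;                  /* fail fast, never touch the card again */
    if (read_status(p->card) == REG_ALL_ONES) {
        p->gone = true;             /* note it; cleanup happens in remove/close */
        return -1;
    }
    return 0;
}

/* Shared teardown: safe to call from both remove and close, in either order. */
static void drv_release(struct drv_priv *p)
{
    free(p->rx_buf);                /* free(NULL) is a no-op */
    p->rx_buf = NULL;
}

/* Analogue of the remove()/disconnect() callback. */
static void drv_remove(struct drv_priv *p)
{
    p->gone = true;
    drv_release(p);
}

/* Analogue of a close (e.g. "ifdown eth0") that may arrive after removal. */
static void drv_close(struct drv_priv *p)
{
    if (!p->gone) {
        /* only here would a real driver quiesce the hardware */
    }
    drv_release(p);
}
```

The key design choice is that neither path assumes the other already ran: the I/O path only records that the hardware vanished, and teardown is idempotent, so a late close after a surprise removal neither touches the missing card nor frees memory twice.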

> -- The syscall that allows the pci slot to be powered off should also
>    go through the steps of closing the device driver first.

No.  Read the PCI Hotplug spec.

> Is there another possibility?  What's the right way of handling this?

See above.

What driver is dying for you?  It should be quite easy to fix.

thanks,

greg k-h

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/