hotplug remove vs. device driver close

linas at austin.ibm.com
Thu Jun 3 09:14:55 EST 2004


Hi,

We are hitting a situation where we are hot-plug removing a PCI card before
closing the device driver.  This seems to lead to kernel memory leaks if not
outright crashes. I'm trying to understand what the correct solution to this
is supposed to be.

For example: 'ifup eth0' and 'ifdown eth0' are what usually cause an Ethernet
device driver to be opened/closed.  Separately, we have a userland tool that
can be used to power off the PCI slot, and thus perform a hotplug unconfigure
in the kernel (i.e. it calls pci_remove_bus_device()).  Thus, the sysadmin
currently has the power to hot-remove a device without first closing the
device driver.  Surely, this is bad. (Right?)  But how is this supposed to
be handled?

Please don't tell me that a good sysadmin should never do that ... in the
hothouse of the server room, crazy stuff happens and it should not result
in a server crash so easily ...

I'm hoping that the answer also isn't that 'the hotplug scripts should
do that', since hotplug scripts can be buggy, or can crash for many reasons;
such events shouldn't bring down the kernel.

So I conclude two possibilities:

-- All device drivers should watch for hotplug remove, and close themselves
   down in such an event

-- The syscall that allows the PCI slot to be powered off should also
   go through the steps of closing the device driver first.
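
To make the first option concrete, here is a rough userspace sketch in C.
It is not kernel code and does not use any real driver API: struct fake_dev,
fake_probe(), fake_open(), fake_close(), fake_remove() and the live_allocs
counter are all hypothetical names made up for illustration.  The only point
is the ordering: the remove path closes the device itself if it is still
open, so the resources grabbed at open time are released before the device
structure goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver's per-device state. */
struct fake_dev {
    bool is_open;   /* set by open (think 'ifup'), cleared by close */
    int *rx_ring;   /* resource that only exists while the device is open */
};

/* Count of outstanding allocations, so a leak is easy to spot. */
int live_allocs;

/* Device discovered: allocate the per-device state. */
struct fake_dev *fake_probe(void)
{
    struct fake_dev *dev = calloc(1, sizeof(*dev));

    if (dev)
        live_allocs++;
    return dev;
}

/* Open: allocate the resources needed while the device is in use. */
int fake_open(struct fake_dev *dev)
{
    assert(dev != NULL);
    if (dev->is_open)
        return 0;
    dev->rx_ring = calloc(64, sizeof(*dev->rx_ring));
    if (!dev->rx_ring)
        return -1;
    live_allocs++;
    dev->is_open = true;
    return 0;
}

/* Close: release what open allocated.  Safe to call when already closed. */
void fake_close(struct fake_dev *dev)
{
    assert(dev != NULL);
    if (!dev->is_open)
        return;
    free(dev->rx_ring);
    dev->rx_ring = NULL;
    live_allocs--;
    dev->is_open = false;
}

/* Hot-remove: close first if nobody did, then free the device state. */
void fake_remove(struct fake_dev *dev)
{
    assert(dev != NULL);
    fake_close(dev);    /* no-op if the device was already closed */
    free(dev);
    live_allocs--;
}
```

With this ordering, calling fake_remove() on a device that is still open
leaves live_allocs at zero.  If the fake_close() call were dropped from
fake_remove(), rx_ring would be leaked, which is the kind of failure
described above.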

Is there another possibility?  What's the right way of handling this?


--linas


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/




