[PATCH] uacce: fix concurrency of fops_open and uacce_remove
Zhangfei Gao
zhangfei.gao at linaro.org
Tue Jun 21 17:37:31 AEST 2022
On 2022/6/20 9:36 PM, Greg Kroah-Hartman wrote:
> On Mon, Jun 20, 2022 at 02:24:31PM +0100, Jean-Philippe Brucker wrote:
>> On Fri, Jun 17, 2022 at 02:05:21PM +0800, Zhangfei Gao wrote:
>>>> The refcount only ensures that the uacce_device object is not freed as
>>>> long as there are open fds. But uacce_remove() can run while there are
>>>> open fds, or fds in the process of being opened. And after uacce_remove()
>>>> runs, the uacce_device object still exists but is mostly unusable. For
>>>> example once the module is freed, uacce->ops is not valid anymore. But
>>>> currently uacce_fops_open() may dereference the ops in this case:
>>>>
>>>> uacce_fops_open()
>>>>  if (!uacce->parent->driver)
>>>>  /* Still valid, keep going */
>>>>  ...                                rmmod
>>>>                                      uacce_remove()
>>>>  ...                                 free_module()
>>>>  uacce->ops->get_queue() /* BUG */
>>> uacce_remove should wait for uacce->queues_lock until fops_open releases
>>> the lock.
>>> If open happens just after uacce_remove unlocks, uacce_bind_queue in open
>>> should fail.
>> Ah yes sorry, I lost sight of what this patch was adding. But we could
>> have the same issue with the patch, just in a different order, no?
>>
>> uacce_fops_open()
>>  uacce = xa_load()
>>  ...                                 rmmod
> Um, how is rmmod called if the file descriptor is open?
>
> That should not be possible if the owner of the file descriptor is
> properly set. Please fix that up.
Thanks Greg
Setting the cdev owner, or using module_get/put, can block rmmod once fops_open has run.
- uacce->cdev->owner = THIS_MODULE;
+ uacce->cdev->owner = uacce->parent->driver->owner;
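To illustrate why the owner matters, here is a minimal userspace sketch of the module refcount behaviour: opening the char device takes a reference on cdev->owner, and rmmod is refused while that reference is held. All model_* names are hypothetical stand-ins, not kernel API, and -EBUSY is only a placeholder error code for the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a module's reference count; not the kernel's struct module. */
struct model_module {
	int refcnt;	/* references held by open file descriptors */
	bool going;	/* set once unloading has started */
};

/* Models the reference taken on cdev->owner when the device node is opened. */
static bool model_try_get(struct model_module *m)
{
	if (m->going)
		return false;	/* module is being unloaded, open must fail */
	m->refcnt++;
	return true;
}

/* Models dropping the reference when the file descriptor is released. */
static void model_put(struct model_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Models rmmod: refused while any reference is held (placeholder errno). */
static int model_rmmod(struct model_module *m)
{
	if (m->refcnt > 0)
		return -EBUSY;
	m->going = true;
	return 0;
}
```

If the owner is the uacce module rather than the parent driver's module, the reference is taken on the wrong module and the parent driver can still be unloaded while an fd is open.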
However, I have still not found a good method to block removal of the parent PCI device.
$ echo 1 > /sys/bus/pci/devices/0000:00:02.0/remove &
[ 32.563350] uacce_remove+0x6c/0x148
[ 32.563353] hisi_qm_uninit+0x12c/0x178
[ 32.563356] hisi_zip_remove+0xa0/0xd0 [hisi_zip]
[ 32.563361] pci_device_remove+0x44/0xd8
[ 32.563364] device_remove+0x54/0x88
[ 32.563367] device_release_driver_internal+0xec/0x1a0
[ 32.563370] device_release_driver+0x20/0x30
[ 32.563372] pci_stop_bus_device+0x8c/0xe0
[ 32.563375] pci_stop_and_remove_bus_device_locked+0x28/0x60
[ 32.563378] remove_store+0x9c/0xb0
[ 32.563379] dev_attr_store+0x20/0x38
mutex_lock(&dev->device_lock) could be used, since it is also taken in
device_release_driver_internal.
Or we could use an internal mutex.
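A minimal userspace sketch of the internal mutex option, using pthreads in place of kernel mutexes. The model_* names and the idea of clearing ops on removal are assumptions for illustration, not the actual uacce code. The point is that open checks and uses ops under the same lock that remove holds while invalidating them.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver callbacks; not the uacce API. */
struct model_ops {
	int (*get_queue)(void);
};

struct model_uacce {
	pthread_mutex_t lock;		/* plays the role of the internal mutex */
	const struct model_ops *ops;	/* NULL once the parent driver is gone */
};

static int model_get_queue(void)
{
	return 0;
}

static const struct model_ops model_driver_ops = {
	.get_queue = model_get_queue,
};

static void model_init(struct model_uacce *u)
{
	assert(u != NULL);
	pthread_mutex_init(&u->lock, NULL);
	u->ops = &model_driver_ops;
}

/* Open path: the validity check and the ops call happen under one lock. */
static int model_open(struct model_uacce *u)
{
	int ret;

	pthread_mutex_lock(&u->lock);
	if (!u->ops)
		ret = -ENODEV;	/* device already removed */
	else
		ret = u->ops->get_queue();
	pthread_mutex_unlock(&u->lock);
	return ret;
}

/* Remove path: waits for any open in progress, then invalidates ops. */
static void model_remove(struct model_uacce *u)
{
	pthread_mutex_lock(&u->lock);
	u->ops = NULL;
	pthread_mutex_unlock(&u->lock);
}
```

With this ordering, an open that starts before remove finishes with valid ops, and an open that starts after remove fails with -ENODEV instead of dereferencing freed ops.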
Thanks