[PATCH] uacce: fix concurrency of fops_open and uacce_remove

Zhangfei Gao zhangfei.gao at linaro.org
Thu Jun 16 14:10:18 AEST 2022


Hi, Jean

On 2022/6/15 11:16 PM, Jean-Philippe Brucker wrote:
> Hi,
>
> On Fri, Jun 10, 2022 at 08:34:23PM +0800, Zhangfei Gao wrote:
>> The uacce parent's module can be removed when uacce is working,
>> which may cause troubles.
>>
>> If rmmod/uacce_remove happens just after fops_open: bind_queue,
>> the uacce_remove can not remove the bound queue since it is not
>> added to the queue list yet, which blocks the uacce_disable_sva.
>>
>> Change queues_lock area to make sure the bound queue is added to
>> the list thereby can be searched in uacce_remove.
>>
>> And uacce->parent->driver is checked immediately in case rmmod is
>> just happening.
>>
>> Also the parent driver must always stop DMA before calling
>> uacce_remove.
>>
>> Signed-off-by: Yang Shen <shenyang39 at huawei.com>
>> Signed-off-by: Zhangfei Gao <zhangfei.gao at linaro.org>
>> ---
>>   drivers/misc/uacce/uacce.c | 19 +++++++++++++------
>>   1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
>> index 281c54003edc..b6219c6bfb48 100644
>> --- a/drivers/misc/uacce/uacce.c
>> +++ b/drivers/misc/uacce/uacce.c
>> @@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode *inode, struct file *filep)
>>   	if (!q)
>>   		return -ENOMEM;
>>   
>> +	mutex_lock(&uacce->queues_lock);
>> +
>> +	if (!uacce->parent->driver) {
> I don't think this is useful, because the core clears parent->driver after
> having run uacce_remove():
>
>    rmmod hisi_zip		open()
>     ...				 uacce_fops_open()
>     __device_release_driver()	  ...
>      pci_device_remove()
>       hisi_zip_remove()
>        hisi_qm_uninit()
>         uacce_remove()
>          ...			  ...
>     				  mutex_lock(uacce->queues_lock)
>      ...				  if (!uacce->parent->driver)
>      device_unbind_cleanup()	  /* driver still valid, proceed */
>       dev->driver = NULL

The check if (!uacce->parent->driver) is required; otherwise a NULL
pointer dereference may happen in iommu_sva_bind_device():
const struct iommu_ops *ops = dev_iommu_ops(dev);  -> dev->iommu->iommu_dev->ops

rmmod does not trigger the issue, but removing the parent PCI device does.
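
To make the ordering concrete, here is a minimal user-space sketch of the guarded open path. It is a model, not the kernel code: the model_* names and structs are hypothetical stand-ins, the -ENODEV return value is an assumption, and it only covers the case where the device removal has already completed before open takes the lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* User-space stand-ins for illustration only; not the kernel structures. */
struct model_iommu {
	int ops;                    /* stands in for dev->iommu->iommu_dev->ops */
};

struct model_device {
	void *driver;               /* NULL once the driver is unbound */
	struct model_iommu *iommu;  /* NULL once the device left its IOMMU group */
};

struct model_uacce {
	pthread_mutex_t queues_lock;
	struct model_device *parent;
};

/* Models iommu_sva_bind_device(): dereferences dev->iommu unconditionally. */
static int model_bind(struct model_device *dev)
{
	return dev->iommu->ops ? 0 : -EINVAL;
}

/* Models the state after "echo 1 > .../remove" has completed. */
static void model_remove_parent(struct model_device *dev)
{
	dev->driver = NULL;
	dev->iommu = NULL;
}

/* Models the guarded open path; -ENODEV is an assumed return value. */
static int model_open(struct model_uacce *uacce)
{
	int ret;

	assert(uacce && uacce->parent);

	pthread_mutex_lock(&uacce->queues_lock);
	if (!uacce->parent->driver)
		ret = -ENODEV;          /* bail out before touching dev->iommu */
	else
		ret = model_bind(uacce->parent);
	pthread_mutex_unlock(&uacce->queues_lock);

	return ret;
}
```

Calling model_open() before model_remove_parent() reaches model_bind(); calling it afterwards returns the error without dereferencing the cleared iommu pointer.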

Test:
Add a sleep in fops_open before taking the mutex.

estuary:/mnt$ ./work/a.out &
//sleep in fops_open

echo 1 > /sys/bus/pci/devices/0000:00:02.0/remove &
estuary:/mnt$ [   22.594348] uacce_remove!
[   22.594663] pci 0000:00:02.0: Removing from iommu group 2
[   22.595073] iommu_release_device dev->iommu=0
[   22.595076] CPU: 2 PID: 229 Comm: ash Not tainted 5.19.0-rc1-15071-gcbcf098c5257-dirty #633
[   22.595079] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[   22.595080] Call trace:
[   22.595080]  dump_backtrace+0xe4/0xf0
[   22.595085]  show_stack+0x20/0x70
[   22.595086]  dump_stack_lvl+0x8c/0xb8
[   22.595089]  dump_stack+0x18/0x34
[   22.595091]  iommu_release_device+0x90/0x98
[   22.595095]  iommu_bus_notifier+0x58/0x68
[   22.595097]  blocking_notifier_call_chain+0x74/0xa8
[   22.595100]  device_del+0x268/0x3b0
[   22.595102]  pci_remove_bus_device+0x84/0x110
[   22.595106]  pci_stop_and_remove_bus_device_locked+0x30/0x60
...

estuary:/mnt$ [   31.466360] uacce: sleep end!
[   31.466362] uacce->parent->driver=0
[   31.466364] uacce->parent->iommu=0
[   31.466365] uacce_bind_queue!
[   31.466366] uacce_bind_queue call iommu_sva_bind_device!
[   31.466367] uacce->parent=d120d0
[   31.466371] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
[   31.472870] Mem abort info:
[   31.473450]   ESR = 0x0000000096000004
[   31.474223]   EC = 0x25: DABT (current EL), IL = 32 bits
[   31.475390]   SET = 0, FnV = 0
[   31.476031]   EA = 0, S1PTW = 0
[   31.476680]   FSC = 0x04: level 0 translation fault
[   31.477687] Data abort info:
[   31.478294]   ISV = 0, ISS = 0x00000004
[   31.479152]   CM = 0, WnR = 0
[   31.479785] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000714d8000
[   31.481144] [0000000000000038] pgd=0000000000000000, p4d=0000000000000000
[   31.482622] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   31.483784] Modules linked in: hisi_zip
[   31.484590] CPU: 2 PID: 228 Comm: a.out Not tainted 5.19.0-rc1-15071-gcbcf098c5257-dirty #633
[   31.486374] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[   31.487862] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   31.489390] pc : iommu_sva_bind_device+0x44/0xf4
[   31.490404] lr : uacce_fops_open+0x128/0x234

>
> Since uacce_remove() disabled SVA, the following uacce_bind_queue() will
> fail anyway. However, if uacce->flags does not have UACCE_DEV_SVA set,
> we'll proceed further and call uacce->ops->get_queue(), which does not
> exist anymore since the parent module is gone.
>
> I think we need the global uacce_mutex to serialize uacce_remove() and
> uacce_fops_open(). uacce_remove() would do everything, including
> xa_erase(), while holding that mutex. And uacce_fops_open() would try to
> obtain the uacce object from the xarray while holding the mutex, which
> fails if the uacce object is being removed.

Since fops_open holds a char device refcount, uacce_release will not
happen until open returns.
So either uacce = xa_load(&uacce_xa, iminor(inode)) succeeds, in which case
uacce_release frees uacce only after fops_release,
or the lookup fails and open returns -ENODEV.

open:
         uacce = xa_load(&uacce_xa, iminor(inode));
         if (!uacce)
                 return -ENODEV;

uacce->dev.release = uacce_release;
uacce_release: kfree(uacce);
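
The lifetime argument above can be sketched in user space as follows. This is a simplified, single-threaded model under stated assumptions: the model_* names are hypothetical, a plain array stands in for uacce_xa, a plain int stands in for the refcount, and the open helper takes the reference itself, whereas in the kernel the char device layer takes it before fops_open runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MINORS 4

/* Hypothetical stand-in for struct uacce_device. */
struct model_uacce {
	int refcount;   /* plain int: this model is single-threaded */
	int *released;  /* set to 1 when the object is freed */
};

/* Stands in for uacce_xa, indexed by minor number. */
static struct model_uacce *model_xa[MODEL_MINORS];

/* Drops one reference; the last put plays the role of uacce_release(). */
static void model_put(struct model_uacce *u)
{
	assert(u->refcount > 0);
	if (--u->refcount == 0) {
		*u->released = 1;
		free(u);
	}
}

/* Registers an object holding one initial reference. */
static struct model_uacce *model_register(unsigned int minor, int *released)
{
	struct model_uacce *u;

	if (minor >= MODEL_MINORS)
		return NULL;
	u = calloc(1, sizeof(*u));
	if (!u)
		return NULL;
	u->refcount = 1;
	u->released = released;
	*released = 0;
	model_xa[minor] = u;
	return u;
}

/* Models open(): lookup fails with -ENODEV once the entry is erased. */
static int model_open(unsigned int minor, struct model_uacce **out)
{
	struct model_uacce *u = minor < MODEL_MINORS ? model_xa[minor] : NULL;

	if (!u)
		return -ENODEV;
	u->refcount++;
	*out = u;
	return 0;
}

/* Models removal: erase the entry, then drop the initial reference. */
static void model_remove(unsigned int minor)
{
	struct model_uacce *u = minor < MODEL_MINORS ? model_xa[minor] : NULL;

	if (!u)
		return;
	model_xa[minor] = NULL;
	model_put(u);
}
```

If model_open() succeeds before model_remove(), the object stays allocated until the opener calls model_put(); a later model_open() on the same minor fails with -ENODEV.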

Thanks

