[PATCH] uacce: fix concurrency of fops_open and uacce_remove

Jean-Philippe Brucker jean-philippe at linaro.org
Thu Jun 16 18:14:44 AEST 2022


On Thu, Jun 16, 2022 at 12:10:18PM +0800, Zhangfei Gao wrote:
> > > diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> > > index 281c54003edc..b6219c6bfb48 100644
> > > --- a/drivers/misc/uacce/uacce.c
> > > +++ b/drivers/misc/uacce/uacce.c
> > > @@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode *inode, struct file *filep)
> > >   	if (!q)
> > >   		return -ENOMEM;
> > > +	mutex_lock(&uacce->queues_lock);
> > > +
> > > +	if (!uacce->parent->driver) {
> > I don't think this is useful, because the core clears parent->driver after
> > having run uacce_remove():
> > 
> >    rmmod hisi_zip		open()
> >     ...				 uacce_fops_open()
> >     __device_release_driver()	  ...
> >      pci_device_remove()
> >       hisi_zip_remove()
> >        hisi_qm_uninit()
> >         uacce_remove()
> >          ...			  ...
> >     				  mutex_lock(uacce->queues_lock)
> >      ...				  if (!uacce->parent->driver)
> >      device_unbind_cleanup()	  /* driver still valid, proceed */
> >       dev->driver = NULL
> 
> The check  if (!uacce->parent->driver) is required, otherwise NULL pointer
> may happen.

I agree we need something, what I mean is that this check is not
sufficient.

> iommu_sva_bind_device
> const struct iommu_ops *ops = dev_iommu_ops(dev);  ->
> dev->iommu->iommu_dev->ops
> 
> rmmod has no issue, but remove parent pci device has the issue.

Ah right, relying on the return value of bind() wouldn't be enough even if
we mandated SVA.

[...]
> > 
> > I think we need the global uacce_mutex to serialize uacce_remove() and
> > uacce_fops_open(). uacce_remove() would do everything, including
> > xa_erase(), while holding that mutex. And uacce_fops_open() would try to
> > obtain the uacce object from the xarray while holding the mutex, which
> > fails if the uacce object is being removed.
> 
> Since fops_open get char device refcount, uacce_release will not happen
> until open returns.

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And atfer uacce_remove()
runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

	uacce_fops_open()
	 if (!uacce->parent->driver)
	 /* Still valid, keep going */		
	 ...					rmmod
						 uacce_remove()
	 ...					 free_module()
	 uacce->ops->get_queue() /* BUG */

Accessing uacce->ops after free_module() is a use-after-free. We need all
the fops to synchronize with uacce_remove() to ensure they don't use any
resource of the parent after it's been freed. 

I see uacce_fops_poll() may have the same problem, and should be inside
uacce_mutex.

Thanks,
Jean


More information about the Linux-accelerators mailing list