[PATCH v6 2/3] uacce: add uacce driver

zhangfei.gao at foxmail.com zhangfei.gao at foxmail.com
Fri Oct 25 13:28:30 AEDT 2019



On 2019/10/23 3:42 PM, Jean-Philippe Brucker wrote:
> On Fri, Oct 18, 2019 at 08:01:44PM +0800, zhangfei.gao at foxmail.com wrote:
>>> More generally, it would be nice to use the DMA API when SVA isn't
>>> supported, instead of manually allocating and mapping memory with
>>> iommu_map(). Do we only handcraft these functions in order to have VA ==
>>> IOVA?  On its own it doesn't seem like a strong enough reason to avoid the
>>> DMA API.
>> Here we use an unmanaged domain to prevent VA conflicts with IOVA.
>> The target is still to build a shared virtual address space even though SVA
>> is not supported.
> If SVA isn't supported, having VA == IOVA looks nice but isn't
> particularly useful. We could instead require that, if SVA isn't
> supported, userspace handles VA and IOVA separately for any DMA region.
>
> Enforcing VA == IOVA adds some unnecessary complexity to this module. In
> addition to the special case for software MSIs that is already there
> (uacce_iommu_has_sw_msi), it's also not guaranteed that the whole VA space
> is representable with IOVAs, you might need to poke holes in the IOVA
> space for reserved regions (See iommu.*resv). For example VFIO checks that
> the IOVA requested by userspace doesn't fall into a reserved range (see
> iova_list in vfio_iommu_type1.c). It also exports to userspace a list of
> possible IOVAs through VFIO_IOMMU_GET_INFO.
>
> Letting the DMA API allocate addresses would be simpler, since it already
> deals with resv regions and software MSI.
>
>> The IOVA from the DMA API can be the same as a VA, and the device cannot
>> distinguish them.
>> So here we borrow the VA from user space and iommu_map it to the device, and
>> the VA becomes the IOVA.
>> Since this IOVA comes from user space, there is no conflict.
>> So the DMA API cannot be used in this case.
>>
>> drivers/vfio/vfio_iommu_type1.c also uses iommu_domain_alloc().
> VFIO needs to let userspace pick its IOVA, because the IOVA space is
> generally managed by a guest OS. In my opinion this is a baggage that
> uacce doesn't need.
>
> If we only supported the DMA API and not unmanaged IOMMU domains,
> userspace would need to do a little bit more work by differentiating
> between VA and DMA addresses, but that could be abstracted into the uacce
> library and it would make the kernel module a lot simpler.
Thanks, Jean.
We will only handle the SVA case in the first patch, then consider the two
directions further.
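
For reference, the unmanaged-domain direction discussed above essentially pins
the user buffer and maps it at VA == IOVA. A minimal sketch (the helper name
uacce_queue_map_va is made up, error handling is trimmed, and it ignores the
reserved-region holes Jean mentioned):

	/* Sketch: map a pinned user buffer at VA == IOVA in an unmanaged domain */
	static int uacce_queue_map_va(struct iommu_domain *domain,
				      unsigned long va, struct page **pages,
				      int nr_pages)
	{
		int i, ret;

		for (i = 0; i < nr_pages; i++) {
			/* the IOVA is simply the user VA of each page */
			ret = iommu_map(domain, va + i * PAGE_SIZE,
					page_to_phys(pages[i]), PAGE_SIZE,
					IOMMU_READ | IOMMU_WRITE);
			if (ret)
				goto err_unmap;
		}
		return 0;

	err_unmap:
		while (i--)
			iommu_unmap(domain, va + i * PAGE_SIZE, PAGE_SIZE);
		return ret;
	}

The DMA API direction would instead let dma_alloc/map pick the IOVA and return
it to user space (for example through an ioctl), so user space would handle VA
and DMA address separately.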
>
> [...]
>>> I wish the SVA and !SVA paths were less interleaved. Both models are
>>> fundamentally different:
>>>
>>> * Without SVA you cannot share the device between multiple processes. All
>>>     DMA mappings are in the "main", non-PASID address space of the device.
>>>
>>>     Note that process isolation without SVA could be achieved with the
>>>     auxiliary domains IOMMU API (introduced primarily for vfio-mdev) but
>>>     this is not the model chosen here.
>> Does PASID have to be supported for this case?
> Yes, you do need PASID support for auxiliary domains, but not PRI/Stall.
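For reference, the auxiliary-domain path would look roughly like the sketch
below (based on the mainline aux-domain API around v5.4; uacce_enable_aux_domain
is a made-up helper, and this patch does not use this path):

	/* Sketch: attach an auxiliary (PASID-tagged) domain instead of SVA */
	static int uacce_enable_aux_domain(struct device *dev)
	{
		struct iommu_domain *domain;
		int ret;

		/* enable the auxiliary-domain (PASID) capability */
		ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
		if (ret)
			return ret;

		domain = iommu_domain_alloc(dev->bus);
		if (!domain)
			return -ENOMEM;

		/* DMA in this domain is tagged with a per-domain PASID */
		ret = iommu_aux_attach_device(domain, dev);
		if (ret) {
			iommu_domain_free(domain);
			return ret;
		}

		return iommu_aux_get_pasid(domain, dev);
	}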
>
> [...]
>>>> +	/* allocate memory */
>>>> +	if (flags & UACCE_QFRF_DMA) {
>>> At the moment UACCE_QFRF_DMA is never set, so there is a lot of unused and
>>> possibly untested code in this file. I think it would be simpler to choose
>>> between either DMA API or unmanaged IOMMU domains and stick with it. As
>>> said before, I'd prefer DMA API.
>> UACCE_QFRF_DMA uses the DMA API; we used it as a quick method, though it
>> cannot prevent VA conflicts.
>> We use an ioctl to get the IOVA of the DMA buffer.
>> Since the interface is not standard, we kept the interface and verified it
>> internally.
> As above, it's probably worth exploring this method further for !SVA.
>
>>>> +		qfr->kaddr = dma_alloc_coherent(uacce->pdev,
>>>> +						qfr->nr_pages << PAGE_SHIFT,
>>>> +						&qfr->dma, GFP_KERNEL);
>>>> +		if (!qfr->kaddr) {
>>>> +			ret = -ENOMEM;
>>>> +			goto err_with_qfr;
>>>> +		}
>>>> +	} else {
>>>> +		ret = uacce_qfr_alloc_pages(qfr);
>>>> +		if (ret)
>>>> +			goto err_with_qfr;
>>>> +	}
>>>> +
>>>> +	/* map to device */
>>>> +	ret = uacce_queue_map_qfr(q, qfr);
>>> Worth moving into the else above.
>> The idea here is: (a) map to the device, (b) map to user space.
> Yes but dma_alloc_coherent() creates the IOMMU mapping, and
> uacce_queue_map_qfr()'s only task is to create the IOMMU mapping when the
> DMA API isn't in use, so you could move this call up, right after
> uacce_qfr_alloc_pages().
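If we follow that suggestion, the hunk would become roughly the sketch below
(illustration only; err_with_pages is a label assumed to exist in the error
path):

	/* allocate memory and map it to the device */
	if (flags & UACCE_QFRF_DMA) {
		/* dma_alloc_coherent() already creates the IOMMU mapping */
		qfr->kaddr = dma_alloc_coherent(uacce->pdev,
						qfr->nr_pages << PAGE_SHIFT,
						&qfr->dma, GFP_KERNEL);
		if (!qfr->kaddr) {
			ret = -ENOMEM;
			goto err_with_qfr;
		}
	} else {
		ret = uacce_qfr_alloc_pages(qfr);
		if (ret)
			goto err_with_qfr;

		/* only the non-DMA-API path needs an explicit device mapping */
		ret = uacce_queue_map_qfr(q, qfr);
		if (ret)
			goto err_with_pages;
	}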
>
> [...]
>>>> +	q->state = UACCE_Q_ZOMBIE;
>>> Since the PUT_Q ioctl makes the queue unrecoverable, why should userspace
>>> invoke it instead of immediately calling close()?
>> We found that close() does not release the resources immediately, which may
>> cause issues when re-opening while all queues are in use.
> I think the only way to fix that problem is to avoid reallocating the
> resources until they are released, because we can't count on userspace to
> always call the PUT_Q ioctl. Sometimes the program will crash before that.
Here the PUT_Q ioctl is an optimization rather than a required solution.
In internal tests, we found that closing the fd directly incurs a long delay,
up to 1s in the worst case.
We simulated many processes doing zip then close, and the test could not reach
its upper limit; the reason was the delay in closing the fd.
So we added the PUT_Q ioctl as a quick-release method to improve the
experience.
The PUT_Q ioctl is optional; users can still check available_instances and
wait for close().
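
For illustration, the intended usage from user space is simply the sketch
below (the command macro name follows this series' uapi header, and the
function name is made up):

	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include "uacce.h"	/* uapi header from this series, defines UACCE_CMD_PUT_Q */

	/* release the queue as soon as the job is done, then close the fd */
	static void uacce_put_queue_and_close(int fd)
	{
		if (ioctl(fd, UACCE_CMD_PUT_Q) < 0)
			perror("UACCE_CMD_PUT_Q");	/* fall back to plain close() */

		close(fd);
	}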
>
>>>> +static int uacce_fops_mmap(struct file *filep, struct vm_area_struct *vma)
>>>> +{
>>>> +	struct uacce_queue *q = filep->private_data;
>>>> +	struct uacce_device *uacce = q->uacce;
>>>> +	struct uacce_qfile_region *qfr;
>>>> +	enum uacce_qfrt type = 0;
>>>> +	unsigned int flags = 0;
>>>> +	int ret;
>>>> +
>>>> +	if (vma->vm_pgoff < UACCE_QFRT_MAX)
>>>> +		type = vma->vm_pgoff;
>>>> +
>>>> +	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
>>>> +
>>>> +	mutex_lock(&uacce_mutex);
> By the way, lockdep detects a possible unsafe locking scenario here,
> because we're taking the uacce_mutex even though mmap called us with the
> mmap_sem held for writing. Conversely uacce_fops_release() takes the
> mmap_sem for writing while holding the uacce_mutex. I think it can be
> fixed easily, if we simply remove the use of mmap_sem in
> uacce_fops_release(), since it's only taken to do some accounting which
> doesn't look right.
The mmap_sem usage in uacce_fops_release() will be removed.
>
> However, a similar but more complex locking issue comes from the current
> use of iommu_sva_bind/unbind_device():
>
> uacce_fops_open:
>   iommu_sva_bind_device()
>    iommu_sva_bind_group()	[iommu_group->mutex]
>      mmu_notifier_get()		[mmap_sem]
>
> uacce_fops_mmap:		[mmap_sem]
> 				[uacce_mutex]
>
> uacce_fops_release:
> 				[uacce_mutex]
>    iommu_sva_unbind_device()	[iommu_group->mutex]
>
> This circular dependency can be broken by calling iommu_sva_unbind_device()
> outside of uacce_mutex, but I think it's worth reworking the queue locking
> scheme a little and use fine-grained locking for the queue state.
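A rough sketch of moving the unbind out of uacce_mutex, to be refined together
with the finer-grained queue locking (helper names here are illustrative, not
the final rework):

	static int uacce_fops_release(struct inode *inode, struct file *filep)
	{
		struct uacce_queue *q = filep->private_data;

		mutex_lock(&uacce_mutex);
		uacce_put_queue(q);		/* stop the queue, free the qfrs */
		mutex_unlock(&uacce_mutex);

		/*
		 * Unbind outside uacce_mutex to break the
		 * uacce_mutex -> iommu_group->mutex edge in the cycle above.
		 */
		uacce_unbind_queue(q);		/* wraps iommu_sva_unbind_device() */
		kfree(q);

		return 0;
	}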
>
> Something else I noticed is uacce_idr isn't currently protected. The IDR
> API expects the caller to use its own locking scheme. You could replace
> it with an xarray, which I think is preferred to IDR now and provides an
> xa_lock.
Currently idr_alloc and idr_remove are simply protected by uacce_mutex.
We will look into xarray, though it looks more complicated than IDR.
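
If xarray turns out to be workable, the change would look roughly like this
(sketch only; uacce_xa and the helpers are made-up names, and dev_id is
assumed to be a u32):

	/* replace the IDR with an allocating xarray, which brings its own xa_lock */
	static DEFINE_XARRAY_ALLOC(uacce_xa);

	static int uacce_set_dev_id(struct uacce_device *uacce)
	{
		/* xa_alloc() takes xa_lock internally, no uacce_mutex needed */
		return xa_alloc(&uacce_xa, &uacce->dev_id, uacce,
				xa_limit_32b, GFP_KERNEL);
	}

	static void uacce_clear_dev_id(struct uacce_device *uacce)
	{
		xa_erase(&uacce_xa, uacce->dev_id);
	}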

Thanks

