[PATCH v6 2/3] uacce: add uacce driver

Kenneth Lee Kenneth-Lee-2012 at foxmail.com
Thu Oct 24 17:41:29 AEDT 2019


On Tue, Oct 22, 2019 at 02:49:29PM -0400, Jerome Glisse wrote:
> Date: Tue, 22 Oct 2019 14:49:29 -0400
> From: Jerome Glisse <jglisse at redhat.com>
> To: Zhangfei Gao <zhangfei.gao at linaro.org>
> Cc: Greg Kroah-Hartman <gregkh at linuxfoundation.org>, Arnd Bergmann
>  <arnd at arndb.de>, Herbert Xu <herbert at gondor.apana.org.au>,
>  jonathan.cameron at huawei.com, grant.likely at arm.com, jean-philippe
>  <jean-philippe at linaro.org>, ilias.apalodimas at linaro.org,
>  francois.ozog at linaro.org, kenneth-lee-2012 at foxmail.com, Wangzhou
>  <wangzhou1 at hisilicon.com>, "haojian . zhuang" <haojian.zhuang at linaro.org>,
>  Zaibo Xu <xuzaibo at huawei.com>, linux-kernel at vger.kernel.org,
>  linux-crypto at vger.kernel.org, Kenneth Lee <liguozhu at hisilicon.com>,
>  linux-accelerators at lists.ozlabs.org
> Subject: Re: [PATCH v6 2/3] uacce: add uacce driver
> Message-ID: <20191022184929.GC5169 at redhat.com>
> 
> On Wed, Oct 16, 2019 at 04:34:32PM +0800, Zhangfei Gao wrote:
> > From: Kenneth Lee <liguozhu at hisilicon.com>
> > 
> > Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
> > provide Shared Virtual Addressing (SVA) between accelerators and processes,
> > so an accelerator can access any data structure of the main CPU.
> > This differs from conventional data sharing between the CPU and an I/O
> > device, which shares data content rather than addresses.
> > With a unified address space, the hardware and the user space of a process
> > can share the same virtual addresses in their communication.
> > 
> > Uacce creates a chrdev for every registration; a queue is allocated to
> > the process when the chrdev is opened. The process can then access the
> > hardware resource by interacting with the queue file. By mmap'ing the
> > queue file into user space, the process can submit requests to the
> > hardware directly, without syscalls into the kernel.
> 
> You need to remove all API that is not used by your first driver, as
> it will most likely bit rot without users. It is way better to add
> things when a driver starts to make use of them.

Yes. Good point. Thank you:)

> 
> I am still not convinced of the value of adding a new framework here
> with only a single device as an example. It looks similar to some of
> the fpga devices. Sadly, because framework layering is not something
> that exists, I guess inventing a new framework is the only answer when
> you cannot quite fit into an existing one.
> 
> The more fundamental question is why you need to change the IOMMU
> domain of the device. I do not see any reason for that unless PASID
> has some restriction on ARM that I do not know of.

But I think this is the only way. As I understand it, the system by
default creates a DMA IOMMU domain for every device behind an IOMMU. If
we want to call the iommu interface directly, we have to rebind the
device to an unmanaged domain.
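
To make that concrete, the pattern looks roughly like this (a minimal
sketch of what uacce_set_iommu_domain() below does; the function name is
illustrative, error handling and the MSI cookie setup are dropped, and it
assumes a single-device IOMMU group):

static int example_attach_unmanaged_domain(struct device *dev)
{
	struct iommu_domain *domain;
	int ret;

	/* allocate an unmanaged domain on the device's bus */
	domain = iommu_domain_alloc(dev->bus);
	if (!domain)
		return -ENODEV;

	/*
	 * For a single-device group this replaces the default DMA domain,
	 * so the driver can manage the IOVA space itself.
	 */
	ret = iommu_attach_device(domain, dev);
	if (ret) {
		iommu_domain_free(domain);
		return ret;
	}

	/* from here on, iommu_map()/iommu_unmap() on 'domain' can be used */
	return 0;
}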

> 
> I do have multiple comments and point out various serious issues
> below.
> 
> As it is, from my POV it is a NAK. Note that I am not opposed to
> adding a new framework, just that you need to trim things down
> to what is used by your first driver, and you also need to address
> the various issues I point out below.

And you are always nice and helpful. Thanks:)

The other comments are left to Zhangfei, who will follow up soon.

> 
> Cheers,
> Jérôme
> 
> > 
> > Signed-off-by: Kenneth Lee <liguozhu at hisilicon.com>
> > Signed-off-by: Zaibo Xu <xuzaibo at huawei.com>
> > Signed-off-by: Zhou Wang <wangzhou1 at hisilicon.com>
> > Signed-off-by: Zhangfei Gao <zhangfei.gao at linaro.org>
> > ---
> >  Documentation/ABI/testing/sysfs-driver-uacce |  65 ++
> >  drivers/misc/Kconfig                         |   1 +
> >  drivers/misc/Makefile                        |   1 +
> >  drivers/misc/uacce/Kconfig                   |  13 +
> >  drivers/misc/uacce/Makefile                  |   2 +
> >  drivers/misc/uacce/uacce.c                   | 995 +++++++++++++++++++++++++++
> >  include/linux/uacce.h                        | 168 +++++
> >  include/uapi/misc/uacce/uacce.h              |  41 ++
> >  8 files changed, 1286 insertions(+)
> >  create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
> >  create mode 100644 drivers/misc/uacce/Kconfig
> >  create mode 100644 drivers/misc/uacce/Makefile
> >  create mode 100644 drivers/misc/uacce/uacce.c
> >  create mode 100644 include/linux/uacce.h
> >  create mode 100644 include/uapi/misc/uacce/uacce.h
> > 
> 
> [...]
> 
> > diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> > new file mode 100644
> > index 0000000..534ddc3
> > --- /dev/null
> > +++ b/drivers/misc/uacce/uacce.c
> > @@ -0,0 +1,995 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +#include <linux/compat.h>
> > +#include <linux/dma-iommu.h>
> > +#include <linux/file.h>
> > +#include <linux/irqdomain.h>
> > +#include <linux/module.h>
> > +#include <linux/poll.h>
> > +#include <linux/sched/signal.h>
> > +#include <linux/uacce.h>
> > +
> > +static struct class *uacce_class;
> > +static DEFINE_IDR(uacce_idr);
> > +static dev_t uacce_devt;
> > +static DEFINE_MUTEX(uacce_mutex);
> > +static const struct file_operations uacce_fops;
> > +
> > +static int uacce_queue_map_qfr(struct uacce_queue *q,
> > +			       struct uacce_qfile_region *qfr)
> > +{
> > +	struct device *dev = q->uacce->pdev;
> > +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> > +	int i, j, ret;
> > +
> > +	if (!(qfr->flags & UACCE_QFRF_MAP) || (qfr->flags & UACCE_QFRF_DMA))
> > +		return 0;
> > +
> > +	if (!domain)
> > +		return -ENODEV;
> > +
> > +	for (i = 0; i < qfr->nr_pages; i++) {
> > +		ret = iommu_map(domain, qfr->iova + i * PAGE_SIZE,
> > +				page_to_phys(qfr->pages[i]),
> > +				PAGE_SIZE, qfr->prot | q->uacce->prot);
> > +		if (ret)
> > +			goto err_with_map_pages;
> > +
> > +		get_page(qfr->pages[i]);
> > +	}
> > +
> > +	return 0;
> > +
> > +err_with_map_pages:
> > +	for (j = i - 1; j >= 0; j--) {
> > +		iommu_unmap(domain, qfr->iova + j * PAGE_SIZE, PAGE_SIZE);
> > +		put_page(qfr->pages[j]);
> > +	}
> > +	return ret;
> > +}
> > +
> > +static void uacce_queue_unmap_qfr(struct uacce_queue *q,
> > +				  struct uacce_qfile_region *qfr)
> > +{
> > +	struct device *dev = q->uacce->pdev;
> > +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> > +	int i;
> > +
> > +	if (!domain || !qfr)
> > +		return;
> > +
> > +	if (!(qfr->flags & UACCE_QFRF_MAP) || (qfr->flags & UACCE_QFRF_DMA))
> > +		return;
> > +
> > +	for (i = qfr->nr_pages - 1; i >= 0; i--) {
> > +		iommu_unmap(domain, qfr->iova + i * PAGE_SIZE, PAGE_SIZE);
> > +		put_page(qfr->pages[i]);
> > +	}
> > +}
> > +
> > +static int uacce_qfr_alloc_pages(struct uacce_qfile_region *qfr)
> > +{
> > +	int i, j;
> > +
> > +	qfr->pages = kcalloc(qfr->nr_pages, sizeof(*qfr->pages), GFP_ATOMIC);
> > +	if (!qfr->pages)
> > +		return -ENOMEM;
> > +
> > +	for (i = 0; i < qfr->nr_pages; i++) {
> > +		qfr->pages[i] = alloc_page(GFP_ATOMIC | __GFP_ZERO);
> > +		if (!qfr->pages[i])
> > +			goto err_with_pages;
> > +	}
> > +
> > +	return 0;
> > +
> > +err_with_pages:
> > +	for (j = i - 1; j >= 0; j--)
> > +		put_page(qfr->pages[j]);
> > +
> > +	kfree(qfr->pages);
> > +	return -ENOMEM;
> > +}
> > +
> > +static void uacce_qfr_free_pages(struct uacce_qfile_region *qfr)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < qfr->nr_pages; i++)
> > +		put_page(qfr->pages[i]);
> > +
> > +	kfree(qfr->pages);
> > +}
> > +
> > +static inline int uacce_queue_mmap_qfr(struct uacce_queue *q,
> > +				       struct uacce_qfile_region *qfr,
> > +				       struct vm_area_struct *vma)
> > +{
> > +	int i, ret;
> > +
> > +	for (i = 0; i < qfr->nr_pages; i++) {
> > +		ret = remap_pfn_range(vma, vma->vm_start + (i << PAGE_SHIFT),
> > +				      page_to_pfn(qfr->pages[i]), PAGE_SIZE,
> > +				      vma->vm_page_prot);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static struct uacce_qfile_region *
> > +uacce_create_region(struct uacce_queue *q, struct vm_area_struct *vma,
> > +		    enum uacce_qfrt type, unsigned int flags)
> > +{
> > +	struct uacce_qfile_region *qfr;
> > +	struct uacce_device *uacce = q->uacce;
> > +	unsigned long vm_pgoff;
> > +	int ret = -ENOMEM;
> > +
> > +	qfr = kzalloc(sizeof(*qfr), GFP_ATOMIC);
> > +	if (!qfr)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	qfr->type = type;
> > +	qfr->flags = flags;
> > +	qfr->iova = vma->vm_start;
> > +	qfr->nr_pages = vma_pages(vma);
> > +
> > +	if (vma->vm_flags & VM_READ)
> > +		qfr->prot |= IOMMU_READ;
> > +
> > +	if (vma->vm_flags & VM_WRITE)
> > +		qfr->prot |= IOMMU_WRITE;
> > +
> > +	if (flags & UACCE_QFRF_SELFMT) {
> > +		if (!uacce->ops->mmap) {
> > +			ret = -EINVAL;
> > +			goto err_with_qfr;
> > +		}
> > +
> > +		ret = uacce->ops->mmap(q, vma, qfr);
> > +		if (ret)
> > +			goto err_with_qfr;
> > +		return qfr;
> > +	}
> > +
> > +	/* allocate memory */
> > +	if (flags & UACCE_QFRF_DMA) {
> > +		qfr->kaddr = dma_alloc_coherent(uacce->pdev,
> > +						qfr->nr_pages << PAGE_SHIFT,
> > +						&qfr->dma, GFP_KERNEL);
> > +		if (!qfr->kaddr) {
> > +			ret = -ENOMEM;
> > +			goto err_with_qfr;
> > +		}
> > +	} else {
> > +		ret = uacce_qfr_alloc_pages(qfr);
> > +		if (ret)
> > +			goto err_with_qfr;
> > +	}
> > +
> > +	/* map to device */
> > +	ret = uacce_queue_map_qfr(q, qfr);
> > +	if (ret)
> > +		goto err_with_pages;
> > +
> > +	/* mmap to user space */
> > +	if (flags & UACCE_QFRF_MMAP) {
> > +		if (flags & UACCE_QFRF_DMA) {
> > +			/* dma_mmap_coherent() requires vm_pgoff as 0
> > +			 * restore vm_pfoff to initial value for mmap()
> > +			 */
> 
> I would argue that dma_mmap_coherent() is not the right function
> to use here; you might be better off doing remap_pfn_range() on your
> own.
> 
> Working around an existing API is not something you want to do; it can
> easily break, and it makes it harder for people who want to update that
> API without breaking anyone.

Yes, indeed.
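
For the regions uacce maps by itself, one way to follow that suggestion
is to stay on the page-array path and remap the pages directly, so
vm_pgoff is never touched. A minimal sketch (it basically mirrors the
uacce_qfr_alloc_pages()/uacce_queue_mmap_qfr() path in the patch; the
function name is illustrative):

static int sketch_mmap_region(struct uacce_qfile_region *qfr,
			      struct vm_area_struct *vma)
{
	unsigned long addr = vma->vm_start;
	int i, ret;

	/* remap each page of the region into the caller's vma */
	for (i = 0; i < qfr->nr_pages; i++) {
		ret = remap_pfn_range(vma, addr, page_to_pfn(qfr->pages[i]),
				      PAGE_SIZE, vma->vm_page_prot);
		if (ret)
			return ret;
		addr += PAGE_SIZE;
	}

	return 0;
}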

> 
> > +			vm_pgoff = vma->vm_pgoff;
> > +			vma->vm_pgoff = 0;
> > +			ret = dma_mmap_coherent(uacce->pdev, vma, qfr->kaddr,
> > +						qfr->dma,
> > +						qfr->nr_pages << PAGE_SHIFT);
> > +			vma->vm_pgoff = vm_pgoff;
> > +		} else {
> > +			ret = uacce_queue_mmap_qfr(q, qfr, vma);
> > +		}
> > +
> > +		if (ret)
> > +			goto err_with_mapped_qfr;
> > +	}
> > +
> > +	return qfr;
> > +
> > +err_with_mapped_qfr:
> > +	uacce_queue_unmap_qfr(q, qfr);
> > +err_with_pages:
> > +	if (flags & UACCE_QFRF_DMA)
> > +		dma_free_coherent(uacce->pdev, qfr->nr_pages << PAGE_SHIFT,
> > +				  qfr->kaddr, qfr->dma);
> > +	else
> > +		uacce_qfr_free_pages(qfr);
> > +err_with_qfr:
> > +	kfree(qfr);
> > +
> > +	return ERR_PTR(ret);
> > +}
> > +
> > +static void uacce_destroy_region(struct uacce_queue *q,
> > +				 struct uacce_qfile_region *qfr)
> > +{
> > +	struct uacce_device *uacce = q->uacce;
> > +
> > +	if (qfr->flags & UACCE_QFRF_DMA) {
> > +		dma_free_coherent(uacce->pdev, qfr->nr_pages << PAGE_SHIFT,
> > +				  qfr->kaddr, qfr->dma);
> > +	} else if (qfr->pages) {
> > +		if (qfr->flags & UACCE_QFRF_KMAP && qfr->kaddr) {
> > +			vunmap(qfr->kaddr);
> > +			qfr->kaddr = NULL;
> > +		}
> > +
> > +		uacce_qfr_free_pages(qfr);
> > +	}
> > +	kfree(qfr);
> > +}
> > +
> > +static long uacce_cmd_share_qfr(struct uacce_queue *tgt, int fd)
> 
> It would be nice to comment what this function does; AFAICT it tries
> to share a uacce_qfile_region. Anyway, this should be removed
> altogether as it is not used by your first driver.
> 
> > +{
> > +	struct file *filep;
> > +	struct uacce_queue *src;
> > +	int ret = -EINVAL;
> > +
> > +	mutex_lock(&uacce_mutex);
> > +
> > +	if (tgt->state != UACCE_Q_STARTED)
> > +		goto out_with_lock;
> > +
> > +	filep = fget(fd);
> > +	if (!filep)
> > +		goto out_with_lock;
> > +
> > +	if (filep->f_op != &uacce_fops)
> > +		goto out_with_fd;
> > +
> > +	src = filep->private_data;
> > +	if (!src)
> > +		goto out_with_fd;
> > +
> > +	if (tgt->uacce->flags & UACCE_DEV_SVA)
> > +		goto out_with_fd;
> > +
> > +	if (!src->qfrs[UACCE_QFRT_SS] || tgt->qfrs[UACCE_QFRT_SS])
> > +		goto out_with_fd;
> > +
> > +	ret = uacce_queue_map_qfr(tgt, src->qfrs[UACCE_QFRT_SS]);
> > +	if (ret)
> > +		goto out_with_fd;
> > +
> > +	tgt->qfrs[UACCE_QFRT_SS] = src->qfrs[UACCE_QFRT_SS];
> > +	list_add(&tgt->list, &src->qfrs[UACCE_QFRT_SS]->qs);
> 
> This list_add() seems bogus as the src->qfrs would already be
> on a list so you are corrupting the list it is on.
> 
> > +
> > +out_with_fd:
> > +	fput(filep);
> > +out_with_lock:
> > +	mutex_unlock(&uacce_mutex);
> > +	return ret;
> > +}
> 
> [...]
> 
> > +static long uacce_fops_unl_ioctl(struct file *filep,
> > +				 unsigned int cmd, unsigned long arg)
> 
> You need to properly document all ioctls, and you also need to
> remove those that are not used by your first driver. They will
> just bit rot as we do not know if they will ever be used.
> 
> > +{
> > +	struct uacce_queue *q = filep->private_data;
> > +	struct uacce_device *uacce = q->uacce;
> > +
> > +	switch (cmd) {
> > +	case UACCE_CMD_SHARE_SVAS:
> > +		return uacce_cmd_share_qfr(q, arg);
> > +
> > +	case UACCE_CMD_START:
> > +		return uacce_start_queue(q);
> > +
> > +	case UACCE_CMD_PUT_Q:
> > +		return uacce_put_queue(q);
> > +
> > +	default:
> > +		if (!uacce->ops->ioctl)
> > +			return -EINVAL;
> > +
> > +		return uacce->ops->ioctl(q, cmd, arg);
> > +	}
> > +}
> > +
> 
> [...]
> 
> > +
> > +static int uacce_dev_open_check(struct uacce_device *uacce)
> > +{
> > +	if (uacce->flags & UACCE_DEV_SVA)
> > +		return 0;
> > +
> > +	/*
> > +	 * The device can be opened once if it does not support pasid
> > +	 */
> > +	if (kref_read(&uacce->cdev->kobj.kref) > 2)
> > +		return -EBUSY;
> 
> You do not check if the device supports PASID, so the comment does not
> match the code. Right now the code says that you cannot open a device
> more than once. Also, this check is racy; there is no lock protecting
> the read.
> 
> > +
> > +	return 0;
> > +}
> > +
> > +static int uacce_fops_open(struct inode *inode, struct file *filep)
> > +{
> > +	struct uacce_queue *q;
> > +	struct iommu_sva *handle = NULL;
> > +	struct uacce_device *uacce;
> > +	int ret;
> > +	int pasid = 0;
> > +
> > +	uacce = idr_find(&uacce_idr, iminor(inode));
> > +	if (!uacce)
> > +		return -ENODEV;
> > +
> > +	if (!try_module_get(uacce->pdev->driver->owner))
> > +		return -ENODEV;
> > +
> > +	ret = uacce_dev_open_check(uacce);
> > +	if (ret)
> > +		goto out_with_module;
> > +
> > +	if (uacce->flags & UACCE_DEV_SVA) {
> > +		handle = iommu_sva_bind_device(uacce->pdev, current->mm, NULL);
> > +		if (IS_ERR(handle))
> > +			goto out_with_module;
> > +		pasid = iommu_sva_get_pasid(handle);
> > +	}
> 
> The file descriptor can outlive the mm (through fork); what happens
> when the mm dies? Where is the sva_unbind? Maybe in the iommu code.
> At the very least a comment should be added explaining what happens.
> 
> > +
> > +	q = kzalloc(sizeof(struct uacce_queue), GFP_KERNEL);
> > +	if (!q) {
> > +		ret = -ENOMEM;
> > +		goto out_with_module;
> > +	}
> > +
> > +	if (uacce->ops->get_queue) {
> > +		ret = uacce->ops->get_queue(uacce, pasid, q);
> > +		if (ret < 0)
> > +			goto out_with_mem;
> > +	}
> > +
> > +	q->pasid = pasid;
> > +	q->handle = handle;
> > +	q->uacce = uacce;
> > +	q->mm = current->mm;
> > +	memset(q->qfrs, 0, sizeof(q->qfrs));
> > +	INIT_LIST_HEAD(&q->list);
> > +	init_waitqueue_head(&q->wait);
> > +	filep->private_data = q;
> > +	q->state = UACCE_Q_INIT;
> > +
> > +	return 0;
> > +
> > +out_with_mem:
> > +	kfree(q);
> > +out_with_module:
> > +	module_put(uacce->pdev->driver->owner);
> > +	return ret;
> > +}
> > +
> > +static int uacce_fops_release(struct inode *inode, struct file *filep)
> > +{
> > +	struct uacce_queue *q = filep->private_data;
> > +	struct uacce_qfile_region *qfr;
> > +	struct uacce_device *uacce = q->uacce;
> > +	bool is_to_free_region;
> > +	int free_pages = 0;
> > +	int i;
> > +
> > +	mutex_lock(&uacce_mutex);
> > +
> > +	if ((q->state == UACCE_Q_STARTED) && uacce->ops->stop_queue)
> > +		uacce->ops->stop_queue(q);
> > +
> > +	for (i = 0; i < UACCE_QFRT_MAX; i++) {
> > +		qfr = q->qfrs[i];
> > +		if (!qfr)
> > +			continue;
> > +
> > +		is_to_free_region = false;
> > +		uacce_queue_unmap_qfr(q, qfr);
> > +		if (i == UACCE_QFRT_SS) {
> > +			list_del(&q->list);
> > +			if (list_empty(&qfr->qs))
> > +				is_to_free_region = true;
> > +		} else
> > +			is_to_free_region = true;
> > +
> > +		if (is_to_free_region) {
> > +			free_pages += qfr->nr_pages;
> > +			uacce_destroy_region(q, qfr);
> > +		}
> > +
> > +		qfr = NULL;
> > +	}
> > +
> > +	if (current->mm == q->mm) {
> > +		down_write(&q->mm->mmap_sem);
> > +		q->mm->data_vm -= free_pages;
> > +		up_write(&q->mm->mmap_sem);
> 
> This is bogus; you do not take any reference on the mm through
> mmgrab(), so there is nothing protecting q->mm from being
> released. Note that you do not want to do mmgrab() in open, as
> the file descriptor can outlive the mm.
> 
> > +	}
> > +
> > +	if (uacce->flags & UACCE_DEV_SVA)
> > +		iommu_sva_unbind_device(q->handle);
> > +
> > +	if ((q->state == UACCE_Q_INIT || q->state == UACCE_Q_STARTED) &&
> > +	     uacce->ops->put_queue)
> > +		uacce->ops->put_queue(q);
> > +
> > +	kfree(q);
> > +	mutex_unlock(&uacce_mutex);
> > +
> > +	module_put(uacce->pdev->driver->owner);
> 
> As the file can outlive the process, it might also outlive the module;
> maybe you want to keep a reference on the module as part of the region
> and release it in uacce_destroy_region().
> 
> > +
> > +	return 0;
> > +}
> > +
> > +static int uacce_fops_mmap(struct file *filep, struct vm_area_struct *vma)
> > +{
> > +	struct uacce_queue *q = filep->private_data;
> > +	struct uacce_device *uacce = q->uacce;
> > +	struct uacce_qfile_region *qfr;
> > +	enum uacce_qfrt type = 0;
> > +	unsigned int flags = 0;
> > +	int ret;
> > +
> > +	if (vma->vm_pgoff < UACCE_QFRT_MAX)
> > +		type = vma->vm_pgoff;
> > +
> > +	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
> 
> Don't you also want VM_WIPEONFORK ?
> 
> > +
> > +	mutex_lock(&uacce_mutex);
> > +
> > +	/* fixme: if the region need no pages, we don't need to check it */
> > +	if (q->mm->data_vm + vma_pages(vma) >
> > +	    rlimit(RLIMIT_DATA) >> PAGE_SHIFT) {
> > +		ret = -ENOMEM;
> > +		goto out_with_lock;
> > +	}
> > +
> > +	if (q->qfrs[type]) {
> > +		ret = -EBUSY;
> 
> What about -EEXIST? That test checks if a region of the given
> type already exists for the uacce_queue, which is private to
> that file descriptor. So it means that the userspace which opened
> the file is trying to create the same region type again, which
> already exists.
> 
> > +		goto out_with_lock;
> > +	}
> > +
> > +	switch (type) {
> > +	case UACCE_QFRT_MMIO:
> > +		flags = UACCE_QFRF_SELFMT;
> > +		break;
> > +
> > +	case UACCE_QFRT_SS:
> > +		if (q->state != UACCE_Q_STARTED) {
> > +			ret = -EINVAL;
> > +			goto out_with_lock;
> > +		}
> > +
> > +		if (uacce->flags & UACCE_DEV_SVA) {
> > +			ret = -EINVAL;
> > +			goto out_with_lock;
> > +		}
> > +
> > +		flags = UACCE_QFRF_MAP | UACCE_QFRF_MMAP;
> > +
> > +		break;
> > +
> > +	case UACCE_QFRT_DKO:
> > +		if (uacce->flags & UACCE_DEV_SVA) {
> > +			ret = -EINVAL;
> > +			goto out_with_lock;
> > +		}
> > +
> > +		flags = UACCE_QFRF_MAP | UACCE_QFRF_KMAP;
> > +
> > +		break;
> > +
> > +	case UACCE_QFRT_DUS:
> > +		if (uacce->flags & UACCE_DEV_SVA) {
> > +			flags = UACCE_QFRF_SELFMT;
> > +			break;
> > +		}
> > +
> > +		flags = UACCE_QFRF_MAP | UACCE_QFRF_MMAP;
> > +		break;
> > +
> > +	default:
> > +		WARN_ON(&uacce->dev);
> > +		break;
> > +	}
> > +
> > +	qfr = uacce_create_region(q, vma, type, flags);
> > +	if (IS_ERR(qfr)) {
> > +		ret = PTR_ERR(qfr);
> > +		goto out_with_lock;
> > +	}
> > +	q->qfrs[type] = qfr;
> > +
> > +	if (type == UACCE_QFRT_SS) {
> > +		INIT_LIST_HEAD(&qfr->qs);
> > +		list_add(&q->list, &q->qfrs[type]->qs);
> > +	}
> > +
> > +	mutex_unlock(&uacce_mutex);
> > +
> > +	if (qfr->pages)
> > +		q->mm->data_vm += qfr->nr_pages;
> 
> The mm->data_vm field is protected by mmap_sem taken in write
> mode AFAIR, so what you are doing here is unsafe.
> 
> > +
> > +	return 0;
> > +
> > +out_with_lock:
> > +	mutex_unlock(&uacce_mutex);
> > +	return ret;
> > +}
> > +
> 
> [...]
> 
> > +/* Borrowed from VFIO to fix msi translation */
> > +static bool uacce_iommu_has_sw_msi(struct iommu_group *group,
> > +				   phys_addr_t *base)
> 
> I fail to see why you need this in a common framework; this
> seems to be specific to a device.
> 
> > +{
> > +	struct list_head group_resv_regions;
> > +	struct iommu_resv_region *region, *next;
> > +	bool ret = false;
> > +
> > +	INIT_LIST_HEAD(&group_resv_regions);
> > +	iommu_get_group_resv_regions(group, &group_resv_regions);
> > +	list_for_each_entry(region, &group_resv_regions, list) {
> > +		/*
> > +		 * The presence of any 'real' MSI regions should take
> > +		 * precedence over the software-managed one if the
> > +		 * IOMMU driver happens to advertise both types.
> > +		 */
> > +		if (region->type == IOMMU_RESV_MSI) {
> > +			ret = false;
> > +			break;
> > +		}
> > +
> > +		if (region->type == IOMMU_RESV_SW_MSI) {
> > +			*base = region->start;
> > +			ret = true;
> > +		}
> > +	}
> > +
> > +	list_for_each_entry_safe(region, next, &group_resv_regions, list)
> > +		kfree(region);
> > +
> > +	return ret;
> > +}
> > +
> > +static int uacce_set_iommu_domain(struct uacce_device *uacce)
> > +{
> > +	struct iommu_domain *domain;
> > +	struct iommu_group *group;
> > +	struct device *dev = uacce->pdev;
> > +	bool resv_msi;
> > +	phys_addr_t resv_msi_base = 0;
> > +	int ret;
> > +
> > +	if (uacce->flags & UACCE_DEV_SVA)
> > +		return 0;
> > +
> > +	/* allocate and attach an unmanaged domain */
> > +	domain = iommu_domain_alloc(uacce->pdev->bus);
> > +	if (!domain) {
> > +		dev_err(&uacce->dev, "cannot get domain for iommu\n");
> > +		return -ENODEV;
> > +	}
> > +
> > +	ret = iommu_attach_device(domain, uacce->pdev);
> > +	if (ret)
> > +		goto err_with_domain;
> > +
> > +	if (iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY))
> > +		uacce->prot |= IOMMU_CACHE;
> > +
> > +	group = iommu_group_get(dev);
> > +	if (!group) {
> > +		ret = -EINVAL;
> > +		goto err_with_domain;
> > +	}
> > +
> > +	resv_msi = uacce_iommu_has_sw_msi(group, &resv_msi_base);
> > +	iommu_group_put(group);
> > +
> > +	if (resv_msi) {
> > +		if (!irq_domain_check_msi_remap() &&
> > +		    !iommu_capable(dev->bus, IOMMU_CAP_INTR_REMAP)) {
> > +			dev_warn(dev, "No interrupt remapping support!");
> > +			ret = -EPERM;
> > +			goto err_with_domain;
> > +		}
> > +
> > +		ret = iommu_get_msi_cookie(domain, resv_msi_base);
> > +		if (ret)
> > +			goto err_with_domain;
> > +	}
> > +
> > +	return 0;
> > +
> > +err_with_domain:
> > +	iommu_domain_free(domain);
> > +	return ret;
> > +}
> > +
> > +static void uacce_unset_iommu_domain(struct uacce_device *uacce)
> > +{
> > +	struct iommu_domain *domain;
> > +
> > +	if (uacce->flags & UACCE_DEV_SVA)
> > +		return;
> > +
> > +	domain = iommu_get_domain_for_dev(uacce->pdev);
> > +	if (!domain) {
> > +		dev_err(&uacce->dev, "bug: no domain attached to device\n");
> > +		return;
> > +	}
> > +
> > +	iommu_detach_device(domain, uacce->pdev);
> > +	iommu_domain_free(domain);
> > +}
> > +
> > +/**
> > + * uacce_register - register an accelerator
> > + * @parent: pointer of uacce parent device
> > + * @interface: pointer of uacce_interface for register
> > + */
> > +struct uacce_device *uacce_register(struct device *parent,
> > +				    struct uacce_interface *interface)
> > +{
> > +	int ret;
> > +	struct uacce_device *uacce;
> > +	unsigned int flags = interface->flags;
> > +
> > +	uacce = kzalloc(sizeof(struct uacce_device), GFP_KERNEL);
> > +	if (!uacce)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	if (flags & UACCE_DEV_SVA) {
> > +		ret = iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_SVA);
> > +		if (ret)
> > +			flags &= ~UACCE_DEV_SVA;
> > +	}
> > +
> > +	uacce->pdev = parent;
> > +	uacce->flags = flags;
> > +	uacce->ops = interface->ops;
> > +
> > +	ret = uacce_set_iommu_domain(uacce);
> > +	if (ret)
> > +		goto err_free;
> 
> Why do you need to change the IOMMU domain? This is orthogonal to
> what you are trying to achieve. The domain has nothing to do with SVA
> or a userspace queue (at least not on x86 AFAIK).
> 
> 
> > +
> > +	mutex_lock(&uacce_mutex);
> > +
> > +	ret = idr_alloc(&uacce_idr, uacce, 0, 0, GFP_KERNEL);
> > +	if (ret < 0)
> > +		goto err_with_lock;
> > +
> > +	uacce->cdev = cdev_alloc();
> > +	uacce->cdev->ops = &uacce_fops;
> > +	uacce->dev_id = ret;
> > +	uacce->cdev->owner = THIS_MODULE;
> > +	device_initialize(&uacce->dev);
> > +	uacce->dev.devt = MKDEV(MAJOR(uacce_devt), uacce->dev_id);
> > +	uacce->dev.class = uacce_class;
> > +	uacce->dev.groups = uacce_dev_groups;
> > +	uacce->dev.parent = uacce->pdev;
> > +	uacce->dev.release = uacce_release;
> > +	dev_set_name(&uacce->dev, "%s-%d", interface->name, uacce->dev_id);
> > +	ret = cdev_device_add(uacce->cdev, &uacce->dev);
> > +	if (ret)
> > +		goto err_with_idr;
> > +
> > +	mutex_unlock(&uacce_mutex);
> > +
> > +	return uacce;
> > +
> > +err_with_idr:
> > +	idr_remove(&uacce_idr, uacce->dev_id);
> > +err_with_lock:
> > +	mutex_unlock(&uacce_mutex);
> > +	uacce_unset_iommu_domain(uacce);
> > +err_free:
> > +	if (flags & UACCE_DEV_SVA)
> > +		iommu_dev_disable_feature(uacce->pdev, IOMMU_DEV_FEAT_SVA);
> > +	kfree(uacce);
> > +	return ERR_PTR(ret);
> > +}
> > +EXPORT_SYMBOL_GPL(uacce_register);
> 
> [...]
> 
> > diff --git a/include/linux/uacce.h b/include/linux/uacce.h
> > new file mode 100644
> > index 0000000..8ce0640
> > --- /dev/null
> > +++ b/include/linux/uacce.h
> > @@ -0,0 +1,168 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +#ifndef _LINUX_UACCE_H
> > +#define _LINUX_UACCE_H
> > +
> > +#include <linux/cdev.h>
> > +#include <uapi/misc/uacce/uacce.h>
> > +
> > +#define UACCE_NAME		"uacce"
> > +
> > +struct uacce_queue;
> > +struct uacce_device;
> > +
> > +/* uacce queue file flag, requires different operation */
> > +#define UACCE_QFRF_MAP		BIT(0)	/* map to current queue */
> > +#define UACCE_QFRF_MMAP		BIT(1)	/* map to user space */
> > +#define UACCE_QFRF_KMAP		BIT(2)	/* map to kernel space */
> > +#define UACCE_QFRF_DMA		BIT(3)	/* use dma api for the region */
> > +#define UACCE_QFRF_SELFMT	BIT(4)	/* self maintained qfr */
> > +
> > +/**
> > + * struct uacce_qfile_region - structure of queue file region
> > + * @type: type of the qfr
> > + * @iova: iova share between user and device space
> > + * @pages: pages pointer of the qfr memory
> > + * @nr_pages: page numbers of the qfr memory
> > + * @prot: qfr protection flag
> > + * @flags: flags of qfr
> > + * @qs: list sharing the same region, for ss region
> > + * @kaddr: kernel addr of the qfr
> > + * @dma: dma address, if created by dma api
> > + */
> > +struct uacce_qfile_region {
> > +	enum uacce_qfrt type;
> > +	unsigned long iova;
> > +	struct page **pages;
> > +	u32 nr_pages;
> > +	u32 prot;
> > +	u32 flags;
> > +	struct list_head qs;
> > +	void *kaddr;
> > +	dma_addr_t dma;
> > +};
> > +
> > +/**
> > + * struct uacce_ops - uacce device operations
> > + * @get_available_instances:  get available instances left of the device
> > + * @get_queue: get a queue from the device
> > + * @put_queue: free a queue to the device
> > + * @start_queue: make the queue start work after get_queue
> > + * @stop_queue: make the queue stop work before put_queue
> > + * @is_q_updated: check whether the task is finished
> > + * @mask_notify: mask the task irq of queue
> > + * @mmap: mmap addresses of queue to user space
> > + * @reset: reset the uacce device
> > + * @reset_queue: reset the queue
> > + * @ioctl: ioctl for user space users of the queue
> > + */
> > +struct uacce_ops {
> > +	int (*get_available_instances)(struct uacce_device *uacce);
> > +	int (*get_queue)(struct uacce_device *uacce, unsigned long arg,
> > +			 struct uacce_queue *q);
> > +	void (*put_queue)(struct uacce_queue *q);
> > +	int (*start_queue)(struct uacce_queue *q);
> > +	void (*stop_queue)(struct uacce_queue *q);
> > +	int (*is_q_updated)(struct uacce_queue *q);
> > +	void (*mask_notify)(struct uacce_queue *q, int event_mask);
> > +	int (*mmap)(struct uacce_queue *q, struct vm_area_struct *vma,
> > +		    struct uacce_qfile_region *qfr);
> > +	int (*reset)(struct uacce_device *uacce);
> > +	int (*reset_queue)(struct uacce_queue *q);
> > +	long (*ioctl)(struct uacce_queue *q, unsigned int cmd,
> > +		      unsigned long arg);
> > +};
> > +
> > +/**
> > + * struct uacce_interface
> > + * @name: the uacce device name.  Will show up in sysfs
> > + * @flags: uacce device attributes
> > + * @ops: pointer to the struct uacce_ops
> > + *
> > + * This structure is used for the uacce_register()
> > + */
> > +struct uacce_interface {
> > +	char name[32];
> 
> You should add a define for the maximum length of the name.
> 
> > +	unsigned int flags;
> 
> This should be enum uacce_dev_flag, not unsigned int, and that
> enum should be defined above and not in the uAPI; see the comments
> I made next to that enum.
> 
> > +	struct uacce_ops *ops;
> > +};
> > +
> > +enum uacce_q_state {
> > +	UACCE_Q_INIT,
> > +	UACCE_Q_STARTED,
> > +	UACCE_Q_ZOMBIE,
> > +};
> > +
> > +/**
> > + * struct uacce_queue
> > + * @uacce: pointer to uacce
> > + * @priv: private pointer
> > + * @wait: wait queue head
> > + * @pasid: pasid of the queue
> > + * @handle: iommu_sva handle return from iommu_sva_bind_device
> > + * @list: share list for qfr->qs
> > + * @mm: current->mm
> > + * @qfrs: pointer of qfr regions
> > + * @state: queue state machine
> > + */
> > +struct uacce_queue {
> > +	struct uacce_device *uacce;
> > +	void *priv;
> > +	wait_queue_head_t wait;
> > +	int pasid;
> > +	struct iommu_sva *handle;
> > +	struct list_head list;
> > +	struct mm_struct *mm;
> > +	struct uacce_qfile_region *qfrs[UACCE_QFRT_MAX];
> > +	enum uacce_q_state state;
> > +};
> > +
> > +/**
> > + * struct uacce_device
> > + * @algs: supported algorithms
> > + * @api_ver: api version
> > + * @qf_pg_size: page size of the queue file regions
> > + * @ops: pointer to the struct uacce_ops
> > + * @pdev: pointer to the parent device
> > + * @is_vf: whether virtual function
> > + * @flags: uacce attributes
> > + * @dev_id: id of the uacce device
> > + * @prot: uacce protection flag
> > + * @cdev: cdev of the uacce
> > + * @dev: dev of the uacce
> > + * @priv: private pointer of the uacce
> > + */
> > +struct uacce_device {
> > +	const char *algs;
> > +	const char *api_ver;
> > +	unsigned long qf_pg_size[UACCE_QFRT_MAX];
> > +	struct uacce_ops *ops;
> > +	struct device *pdev;
> > +	bool is_vf;
> > +	u32 flags;
> > +	u32 dev_id;
> > +	u32 prot;
> > +	struct cdev *cdev;
> > +	struct device dev;
> > +	void *priv;
> > +};
> > +
> > +#if IS_ENABLED(CONFIG_UACCE)
> > +
> > +struct uacce_device *uacce_register(struct device *parent,
> > +				    struct uacce_interface *interface);
> > +void uacce_unregister(struct uacce_device *uacce);
> > +
> > +#else /* CONFIG_UACCE */
> > +
> > +static inline
> > +struct uacce_device *uacce_register(struct device *parent,
> > +				    struct uacce_interface *interface)
> > +{
> > +	return ERR_PTR(-ENODEV);
> > +}
> > +
> > +static inline void uacce_unregister(struct uacce_device *uacce) {}
> > +
> > +#endif /* CONFIG_UACCE */
> > +
> > +#endif /* _LINUX_UACCE_H */
> > diff --git a/include/uapi/misc/uacce/uacce.h b/include/uapi/misc/uacce/uacce.h
> > new file mode 100644
> > index 0000000..c859668
> > --- /dev/null
> > +++ b/include/uapi/misc/uacce/uacce.h
> > @@ -0,0 +1,41 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +#ifndef _UAPIUUACCE_H
> > +#define _UAPIUUACCE_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/ioctl.h>
> > +
> > +#define UACCE_CMD_SHARE_SVAS	_IO('W', 0)
> > +#define UACCE_CMD_START		_IO('W', 1)
> > +#define UACCE_CMD_PUT_Q		_IO('W', 2)
> > +
> > +/**
> > + * enum uacce_dev_flag: Device flags:
> > + * @UACCE_DEV_SHARE_DOMAIN: no PASID, can share sva for one process
> > + * @UACCE_DEV_SVA: Shared Virtual Addresses
> > + *		   Support PASID
> > + *		   Support device page fault (pcie device) or
> > + *		   smmu stall (platform device)
> > + */
> > +enum uacce_dev_flag {
> > +	UACCE_DEV_SHARE_DOMAIN = 0x0,
> 
> UACCE_DEV_SHARE_DOMAIN is not used anywhere; better not to introduce
> something that is not used.
> 
> 
> > +	UACCE_DEV_SVA = 0x1,
> > +};
> 
> A more general question: why is this part of the UAPI header file?
> To me it seems that those flags are only used internally in the
> kernel and never need to be exposed to userspace.
> 
> > +
> > +/**
> > + * enum uacce_qfrt: qfrt type
> > + * @UACCE_QFRT_MMIO: device mmio region
> > + * @UACCE_QFRT_DKO: device kernel-only region
> > + * @UACCE_QFRT_DUS: device user share region
> > + * @UACCE_QFRT_SS: static shared memory region
> > + * @UACCE_QFRT_MAX: indicate the boundary
> > + */
> 
> Your first driver only uses DUS and MMIO; you should not define
> things that are not even used by the first driver, especially when
> it comes to the userspace API.
> 
> > +enum uacce_qfrt {
> > +	UACCE_QFRT_MMIO = 0,
> > +	UACCE_QFRT_DKO = 1,
> > +	UACCE_QFRT_DUS = 2,
> > +	UACCE_QFRT_SS = 3,
> > +	UACCE_QFRT_MAX = 16,
> 
> Isn't 16 a bit low? Do you really need a maximum? I would not
> expose or advertise a maximum in this userspace-facing header.
> 

-- 
			-Kenneth Lee (Hisilicon)



