To extend the feature of vfio-mdev

Kenneth Lee liguozhu at hisilicon.com
Mon Oct 23 16:18:44 AEDT 2017


On Fri, Oct 20, 2017 at 10:36:54AM -0600, Alex Williamson wrote:
> Date: Fri, 20 Oct 2017 10:36:54 -0600
> From: Alex Williamson <alex.williamson at redhat.com>
> To: Kenneth Lee <liguozhu at hisilicon.com>
> CC: Jon Masters <jcm at jonmasters.org>, Jon Masters <jcm at redhat.com>,
>  Jonathan Cameron <jonathan.cameron at huawei.com>, liubo95 at huawei.com,
>  xuzaibo at huawei.com, linux-accelerators at lists.ozlabs.org,
>  kenneth-lee-2012 at foxmail.com
> Subject: Re: To extend the feature of vfio-mdev
> Message-ID: <20171020103654.6217ab2c at t450s.home>
> 
> On Fri, 20 Oct 2017 13:04:43 +0800
> Kenneth Lee <liguozhu at hisilicon.com> wrote:
> 
> > On Thu, Oct 19, 2017 at 12:56:04PM -0600, Alex Williamson wrote:
> > > Date: Thu, 19 Oct 2017 12:56:04 -0600
> > > From: Alex Williamson <alex.williamson at redhat.com>
> > > To: Kenneth Lee <liguozhu at hisilicon.com>
> > > CC: Jon Masters <jcm at jonmasters.org>, Jon Masters <jcm at redhat.com>,
> > >  Jonathan Cameron <jonathan.cameron at huawei.com>, liubo95 at huawei.com,
> > >  xuzaibo at huawei.com
> > > Subject: Re: To extend the feature of vfio-mdev
> > > Message-ID: <20171019125604.26577eda at t450s.home>
> > > 
> > > 
> > > Hi Kenneth,
> > > 
> > > On Thu, 19 Oct 2017 12:13:46 +0800
> > > Kenneth Lee <liguozhu at hisilicon.com> wrote:
> > >   
> > > > Dear Alex,
> > > > 
> > > > I hope this mail finds you well. I would like to discuss the possibility of
> > > > extending the vfio-mdev feature to form a general accelerator framework for
> > > > Linux. I call the framework "WrapDrive".
> > > > 
> > > > I gave a presentation at Linaro Connect SFO17 (ref:
> > > > http://connect.linaro.org/resource/sfo17/sfo17-317/), and discussed it
> > > > with Jon Masters. He said he could connect us for further cooperation.
> > > > 
> > > > The idea of WrapDrive is to create an mdev for every user application so
> > > > that they can share the same PF or VF facility. This is important for
> > > > accelerators, because in most cases we cannot create a VF for every
> > > > process.
> > > > 
> > > > WrapDrive needs to add the following features on top of vfio and vfio-mdev:
> > > > 
> > > > 1. Define a unified ABI in sysfs so that the same type of
> > > >    accelerator/algorithm can be managed from user space
> > > 
> > > We already have a defined, standard mdev interface where vendor drivers
> > > can add additional attributes.  If warpdrive is a wrapper around
> > > vfio-mdev, can't it define standard attributes w/o vfio changes?  
> > 
> > Yes. We just define the necessary attributes so that applications with the
> > same requirements can treat it as a whole.
> > 
> > >   
> > > > 2. Let the mdev use the parent dev's iommu facility  
> > > 
> > > What prevents you from doing this now?  The mdev vendor driver is
> > > entirely responsible for managing the DMA of each mdev device.  Mdev
> > > vGPUs use the GTT of the parent device to do this today, vfio only
> > > tracks user mappings and provides pinned pages to the vendor driver on
> > > request.  IOW, this sounds like something within the scope of the
> > > vendor driver, not the vfio-mdev core.  
> > 
> > I'm sorry, I don't know much about how i915 works. But according to the
> > implementation of vfio_iommu_type1_attach_group(), the mdev's iommu_group is
> > added to the external_domain list, whereas vfio_iommu_map() calls iommu_map()
> > only for the domain list.
> > 
> > Therefore, ioctl(VFIO_IOMMU_MAP_DMA) on the mdev's iommu_group won't do
> > anything. What is the mdev vendor driver expected to do? Should it register on
> > the notification chain or adopt another interface to do so? Is this intended by
> > the mdev driver? I think it may be necessary to provide some standard way by
> > default.
> 
> This is the *mediation* of a mediated driver, it needs to be aware of
> any DMA that the device might perform within the user address space and
> request pinning of those pages through the mdev interface.
> Additionally, when an IOMMU is active on the host, it's the mdev vendor
> driver's responsibility to setup any necessary IOMMU mappings for the
> mdev.  The mdev device works within the IOMMU context of the parent
> device.  There is no magic "map everything" option with mdev as there is
> for IOMMU isolated devices.  Part of the idea of mdev is that isolation
> can be provided by device specific means, such as GTTs for vGPUs.  We
> currently have only an invalidation notifier such that vendor drivers
> can invalidate pinned mappings when unmapped by the user, the mapping
> path presumes device mediation to explicitly request page pinning based
> on device configuration.

So could we add a special case to mdev? A device with IOMMU support could
create several mdevs, each of which is used by one user application.
This would not change the original architecture of mdev, but it is quite
a general model for accelerators, which have only one IOMMU ID (RequestID)
but can serve more than one application.
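
For reference, the current behaviour described in the quoted exchange above
(mdev groups land on the external list, a DMA map only programs the domain
list, and the vendor driver must explicitly request pinning) can be modelled
as a small userspace sketch. This is only a toy model under stated
assumptions: every toy_* name, the struct layouts and MAX_GROUPS are
hypothetical and are not the kernel's vfio_iommu_type1 code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS 8  /* arbitrary limit for the toy model */

enum toy_group_kind { TOY_GROUP_IOMMU_BACKED, TOY_GROUP_MDEV };

struct toy_group {
	enum toy_group_kind kind;
	int hw_mappings;   /* mappings programmed into an IOMMU domain */
	int pinned_pages;  /* pages pinned on explicit vendor-driver request */
};

struct toy_container {
	struct toy_group *domain_list[MAX_GROUPS];  /* IOMMU-backed groups */
	size_t n_domain;
	struct toy_group *external[MAX_GROUPS];     /* mdev groups */
	size_t n_external;
	int user_mappings;  /* mappings the container merely tracks */
};

/* Attach: mdev groups go to the external list, others to the domain list. */
int toy_attach_group(struct toy_container *c, struct toy_group *g)
{
	assert(c != NULL && g != NULL);
	if (g->kind == TOY_GROUP_MDEV) {
		if (c->n_external == MAX_GROUPS)
			return -1;
		c->external[c->n_external++] = g;
	} else {
		if (c->n_domain == MAX_GROUPS)
			return -1;
		c->domain_list[c->n_domain++] = g;
	}
	return 0;
}

/* Map: record the user mapping, but program only the domain list. */
void toy_map_dma(struct toy_container *c)
{
	assert(c != NULL);
	c->user_mappings++;
	for (size_t i = 0; i < c->n_domain; i++)
		c->domain_list[i]->hw_mappings++;
	/* Groups on the external list are deliberately left untouched. */
}

/* Pin: the vendor driver asks for pages inside a tracked user mapping. */
int toy_pin_pages(const struct toy_container *c, struct toy_group *g,
		  int npages)
{
	assert(c != NULL && g != NULL);
	if (g->kind != TOY_GROUP_MDEV || c->user_mappings == 0 || npages <= 0)
		return -1;
	g->pinned_pages += npages;
	return npages;
}
```

In this model a map request changes nothing for an mdev group until the
vendor driver calls toy_pin_pages(), which is the gap the special case
proposed above is meant to close.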

>   
> > > > 3. Let the iommu driver accept more than one iommu_domain for the same
> > > >    device. The substream ID or PASID should be supported for that
> > > 
> > > You're really extending the definition of an iommu_domain to include
> > > PASID to do this, I don't think it makes sense in the general case.  So
> > > perhaps you're talking about a PASID management layer sitting on top of
> > > an iommu_domain.  AIUI for PCIe, a device has a requester ID which is
> > > used to find the context entry for that device.  The IOMMU may support
> > > PASID, which would cause a first level lookup via those set of page
> > > tables, or it might only support second level translation.  The
> > > iommu_domain is a reflection of that initial, single requester ID.  
> > 
> > Maybe I misunderstand this. But IOMMU hardware, such as the SMMU for ARM,
> > supports multiple page tables, each referred to by something like an ASID. If
> > we are to support this in Linux, iommu_domain should be the best choice (no
> > matter whether you call it a cookie, an id or something else). Otherwise,
> > where can you get an object referring to it?
> 
> For PASID, a PASID is unique only within the requester ID.  I don't
> know of anything equivalent to your ASID within PCIe.
> 

As I understand the ARM spec:

	PCIE[PASID]=ARM_SMMU[ASID]=ARM_SMMU[SubstreamID],
	PCIE[RequestID]=ARM_SMMU[StreamID]

For the ARM SMMU, the StreamID is used to index a Context Descriptor
table, while the SubstreamID is used to index a descriptor, which refers
to a general page table in the same format as the MMU's.

So RequestID and PASID together uniquely identify an address space, and
every IOMMU-enabled device can serve more than one user process at the
same time.

So the IOMMU should support more than one page table, and those page
tables need to be represented somewhere in Linux. If an iommu_domain
refers to one address space, the IOMMU driver should accept more than
one iommu_domain.
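
The two-level lookup described above can be sketched as a small userspace
model: the StreamID selects a per-device table of context descriptors, and
the SubstreamID selects the descriptor that refers to one page table. This
is only an illustration under stated assumptions: the toy_* names, the
struct layouts and the table sizes are hypothetical and do not reflect the
real SMMU data structures or any Linux API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STREAMS    4  /* hypothetical table sizes for the toy model */
#define MAX_SUBSTREAMS 4

/* Stand-in for a page table; a real one would be a physical address. */
struct toy_page_table {
	uint64_t root;
};

/* One context descriptor: refers to exactly one address space. */
struct toy_ctx_desc {
	int valid;
	struct toy_page_table *pgtable;
};

/* Per-StreamID entry: owns a table of context descriptors. */
struct toy_stream_entry {
	int valid;
	struct toy_ctx_desc cd[MAX_SUBSTREAMS];
};

struct toy_smmu {
	struct toy_stream_entry ste[MAX_STREAMS];
};

/* Bind one page table to a (StreamID, SubstreamID) pair. */
int toy_smmu_bind(struct toy_smmu *s, unsigned int sid, unsigned int ssid,
		  struct toy_page_table *pt)
{
	assert(s != NULL);
	if (sid >= MAX_STREAMS || ssid >= MAX_SUBSTREAMS || pt == NULL)
		return -1;
	s->ste[sid].valid = 1;
	s->ste[sid].cd[ssid].valid = 1;
	s->ste[sid].cd[ssid].pgtable = pt;
	return 0;
}

/* Resolve the page table for a transaction, or NULL if none is bound. */
struct toy_page_table *toy_smmu_lookup(const struct toy_smmu *s,
				       unsigned int sid, unsigned int ssid)
{
	assert(s != NULL);
	if (sid >= MAX_STREAMS || ssid >= MAX_SUBSTREAMS)
		return NULL;
	if (!s->ste[sid].valid || !s->ste[sid].cd[ssid].valid)
		return NULL;
	return s->ste[sid].cd[ssid].pgtable;
}
```

Binding two different page tables under the same StreamID with different
SubstreamIDs gives one device two address spaces, which is the case the
iommu driver would have to accept.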

> > > > 4. Support SVM in vfio and iommu  
> > > 
> > > There are numerous discussions about this ongoing.  
> > 
> > Yes. I just said we needed the support.
> 
> It seems like this is the crux of your design if you're looking for
> IOMMU based isolation based on PASID with dynamic mapping of the
> process address space.  There was quite a lot of discussion about this
> at the PCI/IOMMU/VFIO uconf at LPC this year and the details of the
> IOMMU API interfaces are currently being developed.  This has
> implications for both directly assigned vfio devices as well as the
> potential to further enhance vfio-mdev such that DMA isolation and
> mapping might be managed in a common way while the vendor driver
> manages only the partitioning of the device.
> 

Yes, this is important to WrapDrive, but it is optional: an explicit
wd_mem_share() is still quite valuable to many applications.

We will look into the details of the PCI/IOMMU/VFIO uconf development and
see how we can get involved.

> > > > We have some PoC code here:
> > > > https://github.com/Kenneth-Lee/linux-kernel-wrapdrive
> > > > with doc in Documentation/wrapdrive. We currently keep the code together
> > > > with our crypto driver.
> > > > 
> > > > But we hope it can be used broadly. Do you think we can add the module
> > > > to the vfio subsystem?
> > > 
> > > I think what you're describing is mostly a wrapper around the existing
> > > vfio-mdev model, I don't think it's necessarily part of the vfio
> > > subsystem.  As SVM support is added to vfio, I expect we'll have new
> > > ioctls for things such as binding the PASID table to a container and
> > > vfio-mdev would need to be extended to support that, allowing the
> > > vendor driver to apply that PASID table to the iommu_domain of the host
> > > device.  Is "warpdrive_k" effectively a shim layer for accelerator type
> > > devices to make use of vfio-mdev in a more common way and sharing more
> > > code than the existing vGPU related mdev drivers?  Thanks,
> > >   
> > 
> > Yes, we can also put it into drivers/misc. But we think it depends heavily
> > on mdev, so we want to know your view. Thanks.
> 
> I think it largely depends on where the SVM work leads, if we develop a
> PASID bind interface for the vfio API and introduce core mdev support
> for that as well, such that "warpdrive" becomes just some wrapper code
> with common accelerator attributes, then it might make sense to include
> it into vfio-mdev.  This has benefits for vfio as well since mdev
> isolation is a bit too dependent on the meticulousness of the vendor
> driver.  Thanks,

Thanks. We are trying to introduce new requirements into vfio and mdev so
that we can enlarge the vfio/mdev ecosystem.

So how about we start by placing our code in the vfio directory, and if you
think it is OK to take, we continue from there? If you think it is not
worth taking, we will then try drivers/misc instead.

> 
> Alex

-- 
			-Kenneth(Hisilicon)
