To extend the feature of vfio-mdev

Alex Williamson alex.williamson at
Sat Oct 21 03:36:54 AEDT 2017

On Fri, 20 Oct 2017 13:04:43 +0800
Kenneth Lee <liguozhu at> wrote:

> On Thu, Oct 19, 2017 at 12:56:04PM -0600, Alex Williamson wrote:
> > Date: Thu, 19 Oct 2017 12:56:04 -0600
> > From: Alex Williamson <alex.williamson at>
> > To: Kenneth Lee <liguozhu at>
> > CC: Jon Masters <jcm at>, Jon Masters <jcm at>,
> >  Jonathan Cameron <jonathan.cameron at>, liubo95 at,
> >  xuzaibo at
> > Subject: Re: To extend the feature of vfio-mdev
> > Message-ID: <20171019125604.26577eda at t450s.home>
> > 
> > 
> > Hi Kenneth,
> > 
> > On Thu, 19 Oct 2017 12:13:46 +0800
> > Kenneth Lee <liguozhu at> wrote:
> >   
> > > Dear Alex,
> > > 
> > > I hope this mail finds you well. This is to discuss the possibility of
> > > extending the vfio-mdev feature to form a general accelerator framework
> > > for Linux. I have named the framework "WrapDrive".
> > > 
> > > I made a presentation on Linaro Connect SFO17 (ref: 
> > >, and discussed it
> > > with Jon Masters. He said he could connect us for further cooperation.
> > > 
> > > The idea of WrapDrive is to create an mdev for every user application so
> > > that they can share the same PF or VF facility. This is important for
> > > accelerators, because in most cases we cannot create a VF for every
> > > process.
> > > 
> > > WrapDrive needs to add the following features on top of vfio and
> > > vfio-mdev:
> > > 
> > > 1. Define a unified ABI in sysfs so that accelerators/algorithms of the
> > >    same type can be managed from user space
> > 
> > We already have a defined, standard mdev interface where vendor drivers
> > can add additional attributes.  If warpdrive is a wrapper around
> > vfio-mdev, can't it define standard attributes w/o vfio changes?  
> Yes. We just define the necessary attributes so that applications with the
> same requirements can treat them as a whole.
> >   
> > > 2. Let the mdev use the parent dev's iommu facility  
> > 
> > What prevents you from doing this now?  The mdev vendor driver is
> > entirely responsible for managing the DMA of each mdev device.  Mdev
> > vGPUs use the GTT of the parent device to do this today, vfio only
> > tracks user mappings and provides pinned pages to the vendor driver on
> > request.  IOW, this sounds like something within the scope of the
> > vendor driver, not the vfio-mdev core.  
> I'm sorry, I don't know much about how i915 works. But according to the
> implementation of vfio_iommu_type1_attach_group(), the mdev's iommu_group is
> added to the external_domain list, while vfio_iommu_map() calls iommu_map()
> only for the domain list. Therefore, an ioctl(VFIO_IOMMU_MAP_DMA) on the
> mdev's iommu_group won't do anything. What is the mdev vendor driver expected
> to do? Should it register on the notification chain or adopt another
> interface to do so? Is this intended by the mdev driver? I think it may be
> necessary to provide some standard way by default.

This is the "mediation" of a mediated driver: it needs to be aware of
any DMA that the device might perform within the user address space and
request pinning of those pages through the mdev interface.
Additionally, when an IOMMU is active on the host, it's the mdev vendor
driver's responsibility to set up any necessary IOMMU mappings for the
mdev.  The mdev device works within the IOMMU context of the parent
device.  There is no magic "map everything" option with mdev as there is
for IOMMU isolated devices.  Part of the idea of mdev is that isolation
can be provided by device specific means, such as GTTs for vGPUs.  We
currently have only an invalidation notifier, so that vendor drivers
can invalidate pinned mappings when unmapped by the user; the mapping
path presumes device mediation to explicitly request page pinning based
on device configuration.
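
To make the division of labour concrete, here is a rough userspace toy
model of the flow described above.  It is not kernel code and does not
use the real vfio interfaces; every type and function name in it is
hypothetical.  It only shows the shape of the contract: the core tracks
user mappings, the vendor driver explicitly asks for the pages it needs
pinned, and an unmap by the user reaches the vendor driver as an
invalidation.

```c
/* Userspace toy model, NOT kernel code.  All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_MAX_PINNED 8

/* One user DMA mapping as tracked by the (modelled) vfio core. */
struct user_mapping {
    uint64_t iova;  /* start of the range in the user's I/O address space */
    uint64_t size;  /* length in bytes */
    bool     live;  /* false once the user has unmapped the range */
};

/* State kept by the (modelled) mdev vendor driver. */
struct vendor_state {
    uint64_t pinned_pfn[MODEL_MAX_PINNED]; /* pages the device may DMA to */
    size_t   npinned;
};

/*
 * The vendor driver requests pinning of the single page containing iova.
 * This succeeds only if the page lies inside a live user mapping; nothing
 * is pinned or mapped "automatically" on the user's behalf.
 */
static bool vendor_pin_page(struct vendor_state *vs,
                            const struct user_mapping *m, uint64_t iova)
{
    assert(vs && m);
    if (!m->live || iova < m->iova || iova - m->iova >= m->size)
        return false;
    if (vs->npinned == MODEL_MAX_PINNED)
        return false;
    vs->pinned_pfn[vs->npinned++] = iova >> MODEL_PAGE_SHIFT;
    return true;
}

/*
 * Invalidation notifier: the user unmapped [iova, iova + size), so the
 * vendor driver drops every pin whose page starts inside that range.
 * Returns the number of pins dropped.
 */
static size_t vendor_invalidate(struct vendor_state *vs,
                                uint64_t iova, uint64_t size)
{
    size_t kept = 0, dropped = 0;

    assert(vs);
    for (size_t i = 0; i < vs->npinned; i++) {
        uint64_t addr = vs->pinned_pfn[i] << MODEL_PAGE_SHIFT;

        if (addr >= iova && addr - iova < size)
            dropped++;
        else
            vs->pinned_pfn[kept++] = vs->pinned_pfn[i];
    }
    vs->npinned = kept;
    return dropped;
}

/* User unmap: the core marks the mapping dead and notifies the vendor driver. */
static size_t user_unmap(struct user_mapping *m, struct vendor_state *vs)
{
    assert(m && vs);
    m->live = false;
    return vendor_invalidate(vs, m->iova, m->size);
}
```

In this model a caller would create a user_mapping, have the vendor
side call vendor_pin_page() for each page the device is configured to
touch, and rely on user_unmap() to drop those pins again.  There is
deliberately no function that maps a whole range for the device.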

> > > 3. Let the iommu driver accept more than one iommu_domain for the same
> > >    device. The substream ID or PASID should be supported for that
> > 
> > You're really extending the definition of an iommu_domain to include
> > PASID to do this, I don't think it makes sense in the general case.  So
> > perhaps you're talking about a PASID management layer sitting on top of
> > an iommu_domain.  AIUI for PCIe, a device has a requester ID which is
> > used to find the context entry for that device.  The IOMMU may support
> > PASID, which would cause a first level lookup via that set of page
> > tables, or it might only support second level translation.  The
> > iommu_domain is a reflection of that initial, single requester ID.  
> Maybe I misunderstand this. But IOMMU hardware, such as the SMMU on ARM,
> supports multiple page tables, each referred to by something like an ASID. If
> we are to support that in Linux, iommu_domain should be the best choice (no
> matter whether you call it a cookie, an id or something else). Otherwise,
> where can you get an object referring to it?

For PASID, a PASID is unique only within the requester ID.  I don't
know of anything equivalent to your ASID within PCIe.
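
As a small illustration of that scoping (a userspace sketch only, with
hypothetical names and a flat table standing in for the real context
and PASID tables), an address space is selected by the pair
(requester ID, PASID), never by the PASID alone.  The only facts taken
from PCIe here are that a requester ID is the 16-bit
bus/device/function number and that a PASID is at most 20 bits wide.

```c
/* Userspace sketch, NOT kernel code.  All names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (requester ID, PASID) -> page table binding. */
struct pasid_entry {
    uint16_t rid;      /* PCIe requester ID: bus[15:8] dev[7:3] fn[2:0] */
    uint32_t pasid;    /* up to 20 bits, meaningful only within this rid */
    int      table_id; /* stand-in for a set of first level page tables */
};

/* Build a requester ID from bus, device and function numbers. */
static uint16_t make_rid(unsigned bus, unsigned dev, unsigned fn)
{
    return (uint16_t)(((bus & 0xffu) << 8) | ((dev & 0x1fu) << 3) | (fn & 0x7u));
}

/*
 * Look up the page table for a transaction.  Both the requester ID and
 * the PASID must match; returns -1 if there is no such binding.
 */
static int lookup_table(const struct pasid_entry *tbl, size_t n,
                        uint16_t rid, uint32_t pasid)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].rid == rid && tbl[i].pasid == pasid)
            return tbl[i].table_id;
    return -1;
}
```

With two entries that share a PASID value but differ in requester ID,
lookup_table() returns two different tables, which is why the PASID by
itself cannot serve as a system-wide address space identifier.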

> > > 4. Support SVM in vfio and iommu  
> > 
> > There are numerous discussions about this ongoing.  
> Yes. I just said we needed the support.

It seems like this is the crux of your design if you're looking for
IOMMU based isolation based on PASID with dynamic mapping of the
process address space.  There was quite a lot of discussion about this
at the PCI/IOMMU/VFIO uconf at LPC this year and the details of the
IOMMU API interfaces are currently being developed.  This has
implications for both directly assigned vfio devices as well as the
potential to further enhance vfio-mdev such that DMA isolation and
mapping might be managed in a common way while the vendor driver
manages only the partitioning of the device.

> > > We have some PoC code here:
> > >
> > > with documentation in Documentation/wrapdrive. We currently keep the
> > > code together with our crypto driver.
> > > 
> > > But we hope it can be used broadly. Do you think we can add the module
> > > to the vfio subsystem?
> > 
> > I think what you're describing is mostly a wrapper around the existing
> > vfio-mdev model, I don't think it's necessarily part of the vfio
> > subsystem.  As SVM support is added to vfio, I expect we'll have new
> > ioctls for things such as binding the PASID table to a container and
> > vfio-mdev would need to be extended to support that, allowing the
> > vendor driver to apply that PASID table to the iommu_domain of the host
> > device.  Is "warpdrive_k" effectively a shim layer for accelerator type
> > devices to make use of vfio-mdev in a more common way and sharing more
> > code than the existing vGPU related mdev drivers?  Thanks,
> >   
> Yes, we could also put it into drivers/misc. But we think it creates a heavy
> dependence on mdev, so we would like to know your opinion. Thanks.

I think it largely depends on where the SVM work leads.  If we develop a
PASID bind interface for the vfio API and introduce core mdev support
for that as well, such that "warpdrive" becomes just some wrapper code
with common accelerator attributes, then it might make sense to include
it in vfio-mdev.  This has benefits for vfio as well, since mdev
isolation is a bit too dependent on the meticulousness of the vendor
driver.  Thanks,

