[RFC PATCH 3/7] vfio: add spimdev support

Alex Williamson alex.williamson at redhat.com
Fri Aug 3 04:43:27 AEST 2018


On Thu, 2 Aug 2018 10:35:28 +0200
Cornelia Huck <cohuck at redhat.com> wrote:

> On Thu, 2 Aug 2018 15:34:40 +0800
> Kenneth Lee <liguozhu at hisilicon.com> wrote:
> 
> > On Thu, Aug 02, 2018 at 04:24:22AM +0000, Tian, Kevin wrote:  
> 
> > > > From: Kenneth Lee [mailto:liguozhu at hisilicon.com]
> > > > Sent: Thursday, August 2, 2018 11:47 AM
> > > >     
> > > > >    
> > > > > > From: Kenneth Lee
> > > > > > Sent: Wednesday, August 1, 2018 6:22 PM
> > > > > >
> > > > > > From: Kenneth Lee <liguozhu at hisilicon.com>
> > > > > >
> > > > > > SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev, but it
> > > > > > differs from the general vfio-mdev:
> > > > > >
> > > > > > 1. It shares its parent's IOMMU.
> > > > > > 2. No hardware resource is attached when the mdev is created. The
> > > > > > hardware resource (a `queue') is allocated only when the mdev is
> > > > > > opened.
> > > > >
> > > > > Alex has concern on doing so, as pointed out in:
> > > > >
> > > > > 	https://www.spinics.net/lists/kvm/msg172652.html
> > > > >
> > > > > resource allocation should be reserved at creation time.    
> > > > 
> > > > Yes. That is why I keep saying that SPIMDEV is not for a "VM" but for
> > > > "many processes"; it is just an access point for the process, not a
> > > > device assigned to a VM. I hope Alex can accept it:)
> > > >     
> > > 
> > > VFIO is just about assigning device resources to user space. It doesn't care
> > > whether it's native processes or a VM using the device so far. Along the
> > > direction you described, it looks like VFIO needs to support a configuration
> > > where some mdevs are used for native processes only, while others can be used
> > > for both native processes and VMs. I'm not sure whether there is a clean way
> > > to enforce it...
> > 
> > I had the same idea at the beginning. But finally I found that the life cycles
> > of the virtual device for a VM and for a process are different. Suppose you
> > create some mdevs for VM use: you give all those mdevs to libvirt, which
> > distributes them to VMs or containers. If the VM or container exits, the
> > mdev is returned to libvirt and used for the next allocation. It is the
> > administrator who controls every mdev's allocation.

Libvirt currently does no management of mdev devices, so I believe
this example is fictitious.  The extent of libvirt's interaction with
mdev is that XML may specify an mdev UUID as the source for a hostdev
and set the permissions on the device files appropriately.  Whether
mdevs are created in advance and re-used or created and destroyed
around a VM instance (for example via qemu hooks scripts) is not a
policy that libvirt imposes.
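For illustration, both lifecycles can be driven through the standard mdev
sysfs interface; the parent device address and mdev type below are only
examples, not something this patch set defines:

```
# Pre-create an mdev and re-use it across VM runs; libvirt's hostdev XML
# would then reference this UUID as the source device.
UUID=$(cat /proc/sys/kernel/random/uuid)
echo "$UUID" > /sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create

# Or create and destroy around a single VM instance, e.g. from a qemu
# hook script:
echo 1 > /sys/bus/mdev/devices/$UUID/remove
```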
 
> > But for a process, it is different. There is no libvirt in control. The
> > administrator's intention is to grant some type of application access to the
> > hardware. The application can get a handle to the hardware, send requests and
> > get the results. That's all. He/she does not care which mdev is allocated to
> > that application. If it crashes, it should be the kernel's responsibility to
> > withdraw the resource; the system administrator does not want to do it by hand.

Libvirt is also not a required component for VM lifecycles, it's an
optional management interface, but there are also VM lifecycles exactly
as you describe.  A VM may want a given type of vGPU, there might be
multiple sources of that type and any instance is fungible to any
other.  Such an mdev can be dynamically created, assigned to the VM,
and destroyed later.  Why do we need to support "empty" mdevs that do
not reserve resources until opened?  The concept of available
instances is entirely lost with that approach, and it creates an
environment that's difficult to support: resources may not be available
at the time the user attempts to access them.
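For reference, the existing mdev model commits the resource at creation
time, and that accounting is visible in sysfs (again, the parent device
and type names here are examples only):

```
cd /sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4
cat available_instances     # e.g. 4
echo $(cat /proc/sys/kernel/random/uuid) > create
cat available_instances     # now 3: the capacity was reserved immediately
```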
 
> I don't think that you should distinguish the cases by the presence of
> a management application. How can the mdev driver know what the
> intention behind using the device is?

Absolutely; vfio is a userspace driver interface, it's not tailored to
VM usage, and we cannot know the intentions of the user.
 
> Would it make more sense to use a different mechanism to enforce that
> applications only use those handles they are supposed to use? Maybe
> cgroups? I don't think it's a good idea to push usage policy into the
> kernel.

I agree, this sounds like a userspace problem.  mdev supports dynamic
creation and removal of mdev devices; if there's an issue with
maintaining a set of standby devices that a user has access to, that
sounds like a userspace broker problem.  It makes more sense to me to
have a model where a userspace application can make a request to a
broker and the broker can reply with "none available", rather than
having a set of devices on standby that may or may not work depending
on the system load and other users.  Thanks,

Alex


More information about the Linux-accelerators mailing list