kvm PCI assignment & VFIO ramblings

Alex Williamson alex.williamson at redhat.com
Wed Aug 3 11:02:31 EST 2011


On Tue, 2011-08-02 at 17:29 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Aug 02, 2011 at 09:34:58AM -0600, Alex Williamson wrote:
> > On Tue, 2011-08-02 at 22:58 +1000, Benjamin Herrenschmidt wrote:
> > > 
> > > Don't worry, it took me a while to get my head around the HW :-) SR-IOV
> > > VFs will generally not have limitations like that, no, but on the other
> > > hand, they -will- still require 1 VF = 1 group, i.e., you won't be able
> > > to take a bunch of VFs and put them in the same 'domain'.
> > > 
> > > I think the main deal is that VFIO/qemu sees "domains" as "guests" and
> > > tries to put all devices for a given guest into a "domain".
> > 
> > Actually, that's only a recent optimization; before that, each device
> > got its own iommu domain.  It's completely configurable on the qemu
> > command line which devices get their own iommu and which share one.
> > The default minimizes the number of domains (one) and thus the number
> > of mapping callbacks, since we pin the entire guest.
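For concreteness, here is a sketch of what "map the entire guest" looks
like through VFIO's eventual type1 interface (the group number and the
guest_ram/guest_ram_size parameters are placeholders, not from this
thread; error checking elided):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/types.h>
    #include <linux/vfio.h>

    /* All of guest RAM is pinned and mapped into one IOMMU context up
     * front; devices added to the same container share the mappings. */
    static int map_whole_guest(void *guest_ram, __u64 guest_ram_size)
    {
            int container = open("/dev/vfio/vfio", O_RDWR);
            int group = open("/dev/vfio/26", O_RDWR); /* placeholder group */
            struct vfio_iommu_type1_dma_map map = {
                    .argsz = sizeof(map),
                    .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                    .vaddr = (__u64)(unsigned long)guest_ram,
                    .iova  = 0,             /* guest-physical address 0 */
                    .size  = guest_ram_size,
            };

            ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
            ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
            return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
    }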
> > 
> > > On POWER, we have a different view of things where domains/groups are
> > > defined to be the smallest granularity we can (down to a single VF) and
> > > we give several groups to a guest (ie we avoid sharing the iommu in most
> > > cases)
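In those same terms, "several groups to a guest" means giving each group
its own container rather than sharing one.  A hedged sketch, using the
same headers as above (group paths are placeholders; on POWER the
backend would be the sPAPR TCE IOMMU type rather than type1):

    /* One container per group: each VF keeps its own IOMMU context, so
     * an error in one group's device doesn't take down the other's DMA. */
    int c0 = open("/dev/vfio/vfio", O_RDWR);
    int c1 = open("/dev/vfio/vfio", O_RDWR);
    int g0 = open("/dev/vfio/7", O_RDWR);   /* placeholder: VF 0's group */
    int g1 = open("/dev/vfio/8", O_RDWR);   /* placeholder: VF 1's group */

    ioctl(g0, VFIO_GROUP_SET_CONTAINER, &c0);
    ioctl(g1, VFIO_GROUP_SET_CONTAINER, &c1);
    /* each container then gets its own VFIO_SET_IOMMU and DMA mappings */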
> > > 
> > > This is driven by the HW design but that design is itself driven by the
> > > idea that the domains/groups are also error isolation groups and we don't
> > > want to take all of the IOs of a guest down if one adapter in that guest
> > > is having an error.
> > > 
> > > The x86 domains are conceptually different as they are about sharing the
> > > iommu page tables, with the clear long-term intent of eventually
> > > sharing those page tables with the guest CPU's own.  We aren't going
> > > in that direction (at this point, at least) on POWER.
> > 
> > Yes and no.  The x86 domains are pretty flexible and used a few
> > different ways.  On the host we do dynamic DMA with a domain per device,
> > mapping only the inflight DMA ranges.  In order to achieve the
> > transparent device assignment model, we have to flip that around and map
> > the entire guest.  As noted, we can continue to use separate domains for
> > this, but since each maps the entire guest, it doesn't add much value,
> > uses more resources, and requires more mapping callbacks (and x86
> > doesn't have the best error containment anyway).  If we had a
> > well-supported IOMMU model that we could adapt for pvDMA, it would
> > make sense to keep each device in its own domain again.  Thanks,
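The host-side "dynamic DMA" model above is just the ordinary streaming
DMA API; a minimal driver-side sketch (dev, buf, and len are assumed to
come from the surrounding driver):

    #include <linux/dma-mapping.h>

    /* Map only the in-flight buffer; with an IOMMU backend this creates
     * an IOVA translation for just this range and tears it down again
     * at unmap time. */
    dma_addr_t dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
    if (dma_mapping_error(dev, dma))
            return -EIO;
    /* ... hand 'dma' to the device and wait for the I/O to complete ... */
    dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);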
> 
> Could you have a PV IOMMU (in the guest) that would set up those
> maps?

Yep, definitely.  That's effectively what POWER wants to do.  We could
do it on x86, but as others have noted, the map/unmap interface isn't
tuned for that granularity, and our target guest OS audience would
effectively be reduced to Linux.  Thanks,

Alex
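To make the pvDMA idea concrete: the guest's dma_map_ops would trap to
the hypervisor for each translation, much as a pSeries guest already
updates its TCE table through the H_PUT_TCE hcall.  The interface below
is purely hypothetical; every name in it is invented for illustration:

    #include <linux/types.h>

    /* Hypothetical guest->hypervisor map request; no such upstream
     * interface existed at the time of this thread. */
    struct pv_iommu_map_req {
            __u64 iova;    /* device-visible address the guest wants */
            __u64 gpa;     /* guest-physical page backing it */
            __u64 size;
            __u32 prot;    /* read/write permissions */
    };

    static long pv_iommu_map(struct pv_iommu_map_req *req)
    {
            /* pv_hypercall() and PV_IOMMU_MAP are invented names: the
             * host would validate the gpa, pin it, and install the
             * IOMMU entry on the guest's behalf. */
            return pv_hypercall(PV_IOMMU_MAP, virt_to_phys(req));
    }

The cost is one such exit per map/unmap, which is why the granularity of
the map/unmap interface matters here.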


