kvm PCI assignment & VFIO ramblings

Konrad Rzeszutek Wilk konrad.wilk at oracle.com
Wed Aug 3 07:29:49 EST 2011


On Tue, Aug 02, 2011 at 09:34:58AM -0600, Alex Williamson wrote:
> On Tue, 2011-08-02 at 22:58 +1000, Benjamin Herrenschmidt wrote:
> > 
> > Don't worry, it took me a while to get my head around the HW :-) SR-IOV
> > VFs will generally not have limitations like that, no, but on the other
> > hand, they -will- still require 1 VF = 1 group, i.e., you won't be able
> > to take a bunch of VFs and put them in the same 'domain'.
> > 
> > I think the main deal is that VFIO/qemu sees "domains" as "guests" and
> > tries to put all devices for a given guest into a "domain".
> 
> Actually, that's only a recent optimization; before that, each device got
> its own iommu domain.  It's actually completely configurable on the
> qemu command line which devices get their own iommu and which share.
> The default optimizes the number of domains (one) and thus the number of
> mapping callbacks, since we pin the entire guest.
> 
> > On POWER, we have a different view of things, where domains/groups are
> > defined to be the smallest granularity we can (down to a single VF) and
> > we give several groups to a guest (i.e. we avoid sharing the iommu in
> > most cases).
> > 
> > This is driven by the HW design but that design is itself driven by the
> > idea that the domains/group are also error isolation groups and we don't
> > want to take all of the IOs of a guest down if one adapter in that guest
> > is having an error.
> > 
> > The x86 domains are conceptually different, as they are about sharing
> > the iommu page tables, with the clear long-term intent of then sharing
> > those page tables with the guest CPU's own. We aren't going in that
> > direction (at this point at least) on POWER.
> 
> Yes and no.  The x86 domains are pretty flexible and used a few
> different ways.  On the host we do dynamic DMA with a domain per device,
> mapping only the inflight DMA ranges.  In order to achieve the
> transparent device assignment model, we have to flip that around and map
> the entire guest.  As noted, we can continue to use separate domains for
> this, but since each maps the entire guest, it adds little value while
> using more resources and requiring more mapping callbacks (and x86
> doesn't have the best error containment anyway).  If we had a
> well-supported IOMMU model that we could adapt for pvDMA, then it would
> make sense to keep each device in its own domain again.  Thanks,
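
For concreteness, a minimal sketch of the shared-domain model against the
kernel IOMMU API of that era. Exact signatures vary by kernel version
(iommu_map() took a gfp_order around 3.0 and a byte size later), and
gpa_to_hpa() is a hypothetical stand-in for the guest-physical to
host-physical lookup:

#include <linux/iommu.h>
#include <linux/device.h>
#include <linux/mm.h>

/* Map every guest page into one shared domain; this is the "pin the
 * entire guest" model, done once no matter how many devices attach. */
static int map_whole_guest(struct iommu_domain *dom, unsigned long npages,
			   phys_addr_t (*gpa_to_hpa)(unsigned long gpa))
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		unsigned long gpa = i << PAGE_SHIFT;
		int ret = iommu_map(dom, gpa, gpa_to_hpa(gpa),
				    0 /* gfp_order: a single 4K page */,
				    IOMMU_READ | IOMMU_WRITE);
		if (ret)
			return ret;
	}
	return 0;
}

/* Shared model: every assigned device joins the same domain, so the
 * mapping above happens exactly once.  Per-device domains would repeat
 * it per device for no gain, since each map covers the whole guest. */
static int attach_all_shared(struct iommu_domain *dom,
			     struct device **devs, int ndevs)
{
	int i, ret;

	for (i = 0; i < ndevs; i++) {
		ret = iommu_attach_device(dom, devs[i]);
		if (ret)
			return ret;
	}
	return 0;
}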

Could you have a PV IOMMU (in the guest) that would set up those
maps?
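
Something like the following, perhaps. This is purely hypothetical:
pv_hypercall3() and the PV_IOMMU_* numbers are invented names, no such
interface exists. It just shows the shape a guest-side map/unmap path
could take, so the host could go back to per-device domains with only
the inflight DMA ranges mapped:

#include <linux/types.h>

/* Hypothetical hypercall numbers -- no such ABI exists. */
#define PV_IOMMU_HC_MAP		1
#define PV_IOMMU_HC_UNMAP	2

/* Invented stand-in for the platform's hypercall primitive. */
extern long pv_hypercall3(unsigned int nr, unsigned long a0,
			  unsigned long a1, unsigned long a2);

/* Guest driver maps one inflight DMA range; the host would translate
 * gpa to host-physical and install it in that device's iommu domain. */
static int pv_iommu_map(unsigned long iova, unsigned long gpa,
			unsigned long npages)
{
	return pv_hypercall3(PV_IOMMU_HC_MAP, iova, gpa, npages);
}

static int pv_iommu_unmap(unsigned long iova, unsigned long npages)
{
	return pv_hypercall3(PV_IOMMU_HC_UNMAP, iova, npages, 0);
}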

