[PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support

Paul Mackerras paulus at ozlabs.org
Tue Jan 29 13:45:03 AEDT 2019


On Mon, Jan 28, 2019 at 07:26:00PM +0100, Cédric Le Goater wrote:
> On 1/28/19 7:13 AM, Paul Mackerras wrote:
> > Would we end up with too many VMAs if we just used mmap() to
> > change the mappings from the software-generated pages to the
> > hardware-generated interrupt pages?  
> The sPAPR IRQ number space is 0x8000 wide now. The first 4K are 
> dedicated to CPU IPIs and the remaining 4K are for devices. We can 

Confused.  You say the number space has 32768 entries but then imply
there are only 8K entries.  Do you mean that the API allows for 15-bit
IRQ numbers but we are only making use of 8192 of them?

> extend the last range if needed as these are for MSIs. Dynamic 
> extensions under KVM should work also.
> 
> This is to say that we would have 8K x 2 (trigger+EOI) pages. That is
> a lot of mmap() calls, too many. Also the KVM model needs to be compatible

I wasn't suggesting an mmap per IRQ, I meant that the bulk of the
space would be covered by a single mmap, overlaid by subsequent mmaps
where we need to map real device interrupts.
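
Roughly what I have in mind, as an untested sketch only -- the fd names,
offsets and the 64K ESB page size below are assumptions for the sake of
illustration, not an existing interface:

	#include <sys/mman.h>
	#include <stdint.h>

	#define ESB_PAGE_SHIFT	16		/* assume 64K ESB pages */
	#define NR_IRQS		0x2000		/* 8K interrupt numbers */
	#define ESB_REGION_SIZE	((size_t)NR_IRQS * 2 << ESB_PAGE_SHIFT) /* trigger + EOI */

	void *map_esb_space(int kvm_xive_fd, int dev_esb_fd,
			    unsigned long dev_first_irq, unsigned long dev_nr_irqs)
	{
		/* One mapping covering the whole IRQ space (software-generated pages). */
		char *base = mmap(NULL, ESB_REGION_SIZE, PROT_READ | PROT_WRITE,
				  MAP_SHARED, kvm_xive_fd, 0);
		if (base == MAP_FAILED)
			return NULL;

		/* Overlay the passthrough device's hardware ESB pages in place. */
		size_t off = (size_t)dev_first_irq * 2 << ESB_PAGE_SHIFT;
		size_t len = (size_t)dev_nr_irqs * 2 << ESB_PAGE_SHIFT;
		if (mmap(base + off, len, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_FIXED, dev_esb_fd, 0) == MAP_FAILED) {
			munmap(base, ESB_REGION_SIZE);
			return NULL;
		}
		return base;
	}

That would be two mmap() calls per passthrough device, not two per IRQ.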

> with the QEMU emulated one and it was simpler to have one overall
> memory region for the IPI ESBs, one for the END ESBs (if we support
> that one day) and one for the TIMA.
> 
> > Are the necessary pages for a PCI
> > passthrough device contiguous in both host real space 
> 
> They should be, as they are the PHB4 ESBs.
> 
> > and guest real space ? 
> 
> also. That's how we organized the mapping. 

"How we organized the mapping" is a significant design decision that I
haven't seen documented anywhere, and is really needed for
understanding what's going on.

> 
> > If so we'd only need one mmap() for all the device's interrupt
> > pages.
> 
> Ah. So we would want to make a special case for the passthrough 
> device and have a mmap() and a memory region for its ESBs. Hmm.
> 
> Wouldn't that require hot plugging a memory region under the guest?

No; the way that a memory region works is that userspace can do
whatever disparate mappings it likes within the region on the user
process side, and the corresponding region of guest real address space
follows that automatically.
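
For illustration, with the generic memslot API the region is registered
once and then whatever userspace maps (or remaps) inside it afterwards
is what the guest sees at the corresponding guest real addresses.  The
slot number and addresses here are made up for the example:

	#include <linux/kvm.h>
	#include <sys/ioctl.h>
	#include <stdint.h>

	int register_esb_memslot(int vm_fd, void *uaddr, uint64_t gpa, uint64_t size)
	{
		struct kvm_userspace_memory_region region = {
			.slot		 = 8,		/* arbitrary free slot */
			.flags		 = 0,
			.guest_phys_addr = gpa,		/* where the guest sees the ESBs */
			.memory_size	 = size,
			.userspace_addr	 = (uint64_t)(uintptr_t)uaddr,
		};

		/* No new region needs to be hot plugged when the pages
		 * behind uaddr change; the slot just keeps following the
		 * userspace mapping. */
		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
	}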

> which means that we need to provision an address space/container
> region for these regions. What are the benefits?
> 
> Is clearing the PTEs and repopulating the VMA unsafe?

Explicitly unmapping parts of the VMA seems like the wrong way to do
it.  If you have a device mmap where the device wants to change the
physical page underlying parts of the mapping, there should be a way
for it to do that explicitly (but I don't know off the top of my head
what the interface to do that is).
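
Something along these lines on the kernel side is the kind of mechanism
I would expect, though -- purely a sketch, with esb_pfn_for() and
xive_mapping invented for the example: the fault handler picks the
backing physical page at fault time, and unmap_mapping_range() forces a
refault when the choice of page changes.

	#include <linux/mm.h>
	#include <linux/fs.h>

	extern struct address_space *xive_mapping;	 /* mapping backing the ESB fd */
	extern unsigned long esb_pfn_for(pgoff_t pgoff); /* sw or hw ESB pfn for this offset */

	/* Assumes the mmap handler set VM_IO | VM_PFNMAP on the vma. */
	static vm_fault_t esb_fault(struct vm_fault *vmf)
	{
		unsigned long pfn = esb_pfn_for(vmf->pgoff);

		/* Insert the currently selected physical page for this offset. */
		return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
	}

	static const struct vm_operations_struct esb_vm_ops = {
		.fault = esb_fault,
	};

	/* Called when an IRQ switches between software and hardware ESB pages. */
	static void esb_invalidate(pgoff_t pgoff, unsigned long nr_pages)
	{
		unmap_mapping_range(xive_mapping, (loff_t)pgoff << PAGE_SHIFT,
				    (loff_t)nr_pages << PAGE_SHIFT, 1);
	}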

However, I still haven't seen a clear and concise explanation of what
is being changed, when, and why we need to do that.

Paul.

