[PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

Wed Jul 10 04:46:45 EST 2013

On 07/09/2013 12:44:32 PM, Alexander Graf wrote:
> On 07/09/2013 07:13 PM, Scott Wood wrote:
>> On 07/08/2013 08:39:05 AM, Alexander Graf wrote:
>>> 
>>> On 28.06.2013, at 11:20, Mihai Caraman wrote:
>>> 
>>> > lwepx faults need to be handled by KVM, and this implies additional
>>> > code in the DO_KVM macro to identify exceptions that originated
>>> > from host context. This requires checking the Exception Syndrome
>>> > Register (ESR[EPID]) and External PID Load Context Register
>>> > (EPLC[EGS]) for DTB_MISS, DSI and LRAT exceptions, which is too
>>> > intrusive for the host.
>>> >
>>> > Get rid of lwepx and acquire the last instruction in
>>> > kvmppc_handle_exit() by searching for the physical address and
>>> > kmap it. This fixes an infinite loop
>>> 
>>> What's the difference in speed for this?
>>> 
>>> Also, could we call lwepx later in host code, when  
>>> kvmppc_get_last_inst() gets invoked?
>> 
>> Any use of lwepx is problematic unless we want to add overhead to  
>> the main Linux TLB miss handler.
> 
> What exactly would be missing?

If lwepx faults, it goes to the normal host TLB miss handler.  Without  
adding code to it to recognize that it's an external-PID fault, it will  
try to search the normal Linux page tables and insert a normal host  
entry.  If it thinks it has succeeded, it will retry the instruction  
rather than search for an exception handler.  The instruction will  
fault again, and you get a hang.
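
To make the failure mode concrete, here is a toy userspace model of that
retry loop. It is only a sketch: every name in it (toy_mmu,
toy_tlb_miss_handler, toy_run_lwepx) is hypothetical and none of it is
kernel code. It assumes the unmodified handler believes it succeeded
after inserting a host entry, and it bounds the retries so the "hang"
shows up as a return value instead of a real infinite loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible results of the modelled lwepx (all names are illustrative). */
enum outcome { LOADED, FIXUP_TAKEN, HUNG };

struct toy_mmu {
    bool host_entry_present;   /* translation in the host's own context */
    bool guest_entry_present;  /* translation in the external-PID context */
};

/*
 * Returns true if the handler believes it resolved the miss, which
 * means the faulting instruction is simply retried.
 */
static bool toy_tlb_miss_handler(struct toy_mmu *mmu, bool recognizes_ext_pid)
{
    if (recognizes_ext_pid)
        return false;               /* external-PID fault: go to the fixup path */

    /* Unaware handler: inserts a normal host entry and reports success. */
    mmu->host_entry_present = true;
    return true;
}

static enum outcome toy_run_lwepx(struct toy_mmu *mmu, bool recognizes_ext_pid,
                                  int max_retries)
{
    for (int i = 0; i < max_retries; i++) {
        /* lwepx only consults the external-PID (guest) translation. */
        if (mmu->guest_entry_present)
            return LOADED;
        if (!toy_tlb_miss_handler(mmu, recognizes_ext_pid))
            return FIXUP_TAKEN;
        /* Handler claimed success, so the same instruction runs again. */
    }
    return HUNG;                    /* retry budget exhausted: models the hang */
}
```

With both entries absent, toy_run_lwepx(&m, false, 1000) returns HUNG
even though the host entry got inserted, because the host entry never
satisfies the external-PID lookup. Passing true instead returns
FIXUP_TAKEN, which is the behavior that would cost extra checks in the
host handler.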

> I'd also still like to see some performance benchmarks on this to  
> make sure we're not walking into a bad direction.

I doubt it'll be significantly different.  There's overhead involved in  
setting up for lwepx as well.  It doesn't hurt to test, though this is  
a functional correctness issue, so I'm not sure what better  
alternatives we have.  I don't want to slow down non-KVM TLB misses for  
this.

>>> > +    addr = (mas7_mas3 & (~0ULL << psize_shift)) |
>>> > +           (geaddr & ((1ULL << psize_shift) - 1ULL));
>>> > +
>>> > +    /* Map a page and get guest's instruction */
>>> > +    page = pfn_to_page(addr >> PAGE_SHIFT);
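
For readers following the address math in the quoted hunk, here is a
standalone sketch of the same composition: the bits above the page size
come from the TLB entry (mas7_mas3), and the offset within the page
comes from the guest effective address (geaddr). The helper name
compose_real_addr is hypothetical, and the sketch assumes psize_shift is
less than 64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine the real page number held in a TLB entry with the page offset
 * taken from the guest effective address. Illustrative helper only; it
 * mirrors the expression in the quoted patch, where
 * (~0ULL << psize_shift) is the complement of the offset mask.
 */
static inline uint64_t compose_real_addr(uint64_t mas7_mas3, uint64_t geaddr,
                                         unsigned int psize_shift)
{
    uint64_t offset_mask = (1ULL << psize_shift) - 1ULL;

    /* Low attribute bits of mas7_mas3 are masked off along with the offset. */
    return (mas7_mas3 & ~offset_mask) | (geaddr & offset_mask);
}
```

For a 4K page (psize_shift = 12), a mas7_mas3 of 0x12345603F and a
geaddr of 0x10000abc give 0x123456abc: the low bits of the TLB word are
dropped and replaced by the offset 0xabc.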
>>> 
>>> So it seems to me like you're jumping through a lot of hoops to  
>>> make sure this works for LRAT and non-LRAT at the same time. Can't  
>>> we just treat them as the different things they are?
>>> 
>>> What if we have different MMU backends for LRAT and non-LRAT? The
>>> non-LRAT case could then try lwepx and, if that fails, fall back to
>>> reading the shadow TLB. For the LRAT case, we'd do lwepx and, if
>>> that fails, fall back to this logic.
>> 
>> This isn't about LRAT; it's about hardware threads.  It also fixes  
>> the handling of execute-only pages on current chips.
> 
> On non-LRAT systems we could always check our shadow copy of the  
> guest's TLB, no? I'd really like to know what the performance  
> difference would be for the 2 approaches.

I suspect that tlbsx is faster, or at worst similar.  And unlike  
comparing tlbsx to lwepx (not counting a fix for the threading  
problem), we don't already have code to search the guest TLB, so  
testing would be more work.

-Scott