[PATCH] KVM: PPC: Book3S HV nestedv2: Cancel pending HDEC exception

Michael Ellerman mpe at ellerman.id.au
Fri Apr 5 14:20:21 AEDT 2024


"Linux regression tracking (Thorsten Leemhuis)"
<regressions at leemhuis.info> writes:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> Was this regression ever resolved? Doesn't look like it, but maybe I
> just missed something.

I'm not sure how it ended up on the regression list. IMHO it's not
really a regression. It was an attempt at a performance optimisation,
which is no longer needed due to changes in (unreleased) firmware.

I haven't merged it because Nick's reply contained several questions for
Vaibhav, so I'm expecting either a reply to those or a new version of
the patch.

cheers

> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 20.03.24 14:43, Nicholas Piggin wrote:
>> On Wed Mar 13, 2024 at 5:26 PM AEST, Vaibhav Jain wrote:
>>> This reverts commit 180c6b072bf360b686e53d893d8dcf7dbbaec6bb ("KVM: PPC:
>>> Book3S HV nestedv2: Do not cancel pending decrementer exception") which
>>> prevented cancelling a pending HDEC exception for nestedv2 KVM guests. It
>>> was done to avoid overhead of a H_GUEST_GET_STATE hcall to read the 'HDEC
>>> expiry TB' register which was higher compared to handling extra decrementer
>>> exceptions.
>>>
>>> This overhead of reading 'HDEC expiry TB' register has been mitigated
>>> recently by the L0 hypervisor(PowerVM) by putting the value of this
>>> register in L2 guest-state output buffer on trap to L1. From there the
>>> value of this register is cached, made available in kvmhv_run_single_vcpu()
>>> to compare it against host(L1) timebase and cancel the pending hypervisor
>>> decrementer exception if needed.
>> 
>> Ah, I figured out the problem here. Guest entry never clears the
>> queued dec, because it's level triggered on the DEC MSB so it
>> doesn't go away when it's delivered. So upstream code is indeed
>> buggy and I think I take the blame for suggesting this nestedv2
>> workaround.
>> 
>> I actually don't think that is necessary though, we could treat it
>> like other interrupts.  I think that would solve the problem without
>> having to test dec here.
>> 
>> I am wondering though, what workload slows down that this patch
>> was needed in the first place. We'd only get here after a cede
>> returns, then we'd dequeue the dec and stop having to GET_STATE
>> it here.
>> 
>> Thanks,
>> Nick
>> 
>>>
>>> Fixes: 180c6b072bf3 ("KVM: PPC: Book3S HV nestedv2: Do not cancel pending decrementer exception")
>>> Signed-off-by: Vaibhav Jain <vaibhav at linux.ibm.com>
>>> ---
>>>  arch/powerpc/kvm/book3s_hv.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>> index 0b921704da45..e47b954ce266 100644
>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>> @@ -4856,7 +4856,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>>  	 * entering a nested guest in which case the decrementer is now owned
>>>  	 * by L2 and the L1 decrementer is provided in hdec_expires
>>>  	 */
>>> -	if (!kvmhv_is_nestedv2() && kvmppc_core_pending_dec(vcpu) &&
>>> +	if (kvmppc_core_pending_dec(vcpu) &&
>>>  			((tb < kvmppc_dec_expires_host_tb(vcpu)) ||
>>>  			 (trap == BOOK3S_INTERRUPT_SYSCALL &&
>>>  			  kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED)))
>> 


More information about the Linuxppc-dev mailing list