[PATCH v3] KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling

Nicholas Piggin npiggin at gmail.com
Fri Oct 29 11:35:47 AEDT 2021


Excerpts from Laurent Vivier's message of October 28, 2021 10:48 pm:
> On 27/10/2021 16:21, Nicholas Piggin wrote:
>> From: Laurent Vivier <lvivier at redhat.com>
>> 
>> Commit 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest
>> context before enabling irqs") moved guest_exit() into the interrupt
>> protected area to avoid wrong context warning (or worse). The problem is
>> that tick-based time accounting has not yet been updated at this point
>> (because it depends on the timer interrupt firing), so the guest time
>> gets incorrectly accounted to system time.
>> 
>> To fix the problem, follow the x86 fix in commit 160457140187 ("Defer
>> vtime accounting 'til after IRQ handling"), and allow host IRQs to run
>> before accounting the guest exit time.
>> 
>> In the case vtime accounting is enabled, this is not required because TB
>> is used directly for accounting.
>> 
>> Before this patch, with CONFIG_TICK_CPU_ACCOUNTING=y in the host and a
>> guest running a kernel compile, the 'guest' fields of /proc/stat are
>> stuck at zero. With the patch they can be observed increasing roughly as
>> expected.
>> 
>> Fixes: e233d54d4d97 ("KVM: booke: use __kvm_guest_exit")
>> Fixes: 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs")
>> Cc: <stable at vger.kernel.org> # 5.12
>> Signed-off-by: Laurent Vivier <lvivier at redhat.com>
>> [np: only required for tick accounting, add Book3E fix, tweak changelog]
>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>> ---
>> Since v2:
>> - I took over the patch with Laurent's blessing.
>> - Changed to avoid processing IRQs if we do have vtime accounting
>>    enabled.
>> - Changed so in either case the accounting is called with irqs disabled.
>> - Added similar Book3E fix.
>> - Rebased on upstream, tested, observed bug and confirmed fix.
>> 
>>   arch/powerpc/kvm/book3s_hv.c | 30 ++++++++++++++++++++++++++++--
>>   arch/powerpc/kvm/booke.c     | 16 +++++++++++++++-
>>   2 files changed, 43 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 2acb1c96cfaf..7b74fc0a986b 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -3726,7 +3726,20 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>>   
>>   	kvmppc_set_host_core(pcpu);
>>   
>> -	guest_exit_irqoff();
>> +	context_tracking_guest_exit();
>> +	if (!vtime_accounting_enabled_this_cpu()) {
>> +		local_irq_enable();
>> +		/*
>> +		 * Service IRQs here before vtime_account_guest_exit() so any
>> +		 * ticks that occurred while running the guest are accounted to
>> +		 * the guest. If vtime accounting is enabled, accounting uses
>> +		 * TB rather than ticks, so it can be done without enabling
>> +		 * interrupts here, which has the problem that it accounts
>> +		 * interrupt processing overhead to the host.
>> +		 */
>> +		local_irq_disable();
>> +	}
>> +	vtime_account_guest_exit();
>>   
>>   	local_irq_enable();
>>   
>> @@ -4510,7 +4523,20 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>   
>>   	kvmppc_set_host_core(pcpu);
>>   
>> -	guest_exit_irqoff();
>> +	context_tracking_guest_exit();
>> +	if (!vtime_accounting_enabled_this_cpu()) {
>> +		local_irq_enable();
>> +		/*
>> +		 * Service IRQs here before vtime_account_guest_exit() so any
>> +		 * ticks that occurred while running the guest are accounted to
>> +		 * the guest. If vtime accounting is enabled, accounting uses
>> +		 * TB rather than ticks, so it can be done without enabling
>> +		 * interrupts here, which has the problem that it accounts
>> +		 * interrupt processing overhead to the host.
>> +		 */
>> +		local_irq_disable();
>> +	}
>> +	vtime_account_guest_exit();
>>   
>>   	local_irq_enable();
>>   
>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index 977801c83aff..8c15c90dd3a9 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -1042,7 +1042,21 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr)
>>   	}
>>   
>>   	trace_kvm_exit(exit_nr, vcpu);
>> -	guest_exit_irqoff();
>> +
>> +	context_tracking_guest_exit();
>> +	if (!vtime_accounting_enabled_this_cpu()) {
>> +		local_irq_enable();
>> +		/*
>> +		 * Service IRQs here before vtime_account_guest_exit() so any
>> +		 * ticks that occurred while running the guest are accounted to
>> +		 * the guest. If vtime accounting is enabled, accounting uses
>> +		 * TB rather than ticks, so it can be done without enabling
>> +		 * interrupts here, which has the problem that it accounts
>> +		 * interrupt processing overhead to the host.
>> +		 */
>> +		local_irq_disable();
>> +	}
>> +	vtime_account_guest_exit();
>>   
>>   	local_irq_enable();
>>   
>> 
> 
> I'm wondering if we should put the context_tracking_guest_exit() just after the 
> "srcu_read_unlock(&vc->kvm->srcu, srcu_idx);" as it was before 61bd0f66ff92 ("KVM: PPC: 
> Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN")?

For the run_single_vcpu path, I _think_ it shouldn't matter. The code
between the two locations is mostly just fixing up low-level powerpc
details.

But actually I wonder whether we should move the guest context entirely 
inside the SRCU lock because it's a high level host locking primitive.

For the kvmppc_run_core path, we are actually taking the vc->lock spin
lock as well. Arguably that is just waiting for secondary threads in the
guest, but I think changing to host context as soon as possible could
make sense. If we don't have an actual bug identified, it could perhaps
wait for the next merge window.
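
To illustrate why the ordering in the patch matters with tick accounting,
here is a toy userspace model. It is not kernel code: every name in it
(cpu_model, service_pending_irqs, exit_guest_old, exit_guest_deferred,
run_guest) is hypothetical, and it assumes the tick handler charges a
tick to the guest only while a "guest" flag is still set on the current
context.

```c
#include <stdbool.h>

/* Toy model of tick-based CPU time accounting; NOT kernel code. */
struct cpu_model {
	bool in_guest;          /* assumed stand-in for the guest flag */
	unsigned pending_ticks; /* ticks that fired while IRQs were off */
	unsigned guest_ticks;   /* ticks charged to guest time */
	unsigned system_ticks;  /* ticks charged to system time */
};

/* Models IRQs being enabled: each pending tick is charged to the
 * context that is current when the handler finally runs. */
static void service_pending_irqs(struct cpu_model *c)
{
	while (c->pending_ticks) {
		c->pending_ticks--;
		if (c->in_guest)
			c->guest_ticks++;
		else
			c->system_ticks++;
	}
}

/* Old ordering: the guest flag is cleared before the ticks run. */
static void exit_guest_old(struct cpu_model *c)
{
	c->in_guest = false;     /* models guest_exit_irqoff() */
	service_pending_irqs(c); /* models local_irq_enable() */
}

/* Patched ordering: let the ticks run, then clear the guest flag. */
static void exit_guest_deferred(struct cpu_model *c)
{
	service_pending_irqs(c); /* models irq enable/disable window */
	c->in_guest = false;     /* models vtime_account_guest_exit() */
}

/* Run a guest for `ticks` pending ticks, then exit via exit_fn. */
static struct cpu_model run_guest(unsigned ticks,
				  void (*exit_fn)(struct cpu_model *))
{
	struct cpu_model c = { .in_guest = true, .pending_ticks = ticks };

	exit_fn(&c);
	return c;
}
```

In this model, run_guest(3, exit_guest_old) ends with all three ticks in
system_ticks and none in guest_ticks, which matches the 'guest' fields
of /proc/stat being stuck at zero. run_guest(3, exit_guest_deferred)
charges all three to guest_ticks.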

Thanks,
Nick