[PATCH v3] KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling
Nicholas Piggin
npiggin at gmail.com
Fri Oct 29 11:35:47 AEDT 2021
Excerpts from Laurent Vivier's message of October 28, 2021 10:48 pm:
> On 27/10/2021 16:21, Nicholas Piggin wrote:
>> From: Laurent Vivier <lvivier at redhat.com>
>>
>> Commit 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest
>> context before enabling irqs") moved guest_exit() into the interrupt
>> protected area to avoid wrong context warning (or worse). The problem is
>> that tick-based time accounting has not yet been updated at this point
>> (because it depends on the timer interrupt firing), so the guest time
>> gets incorrectly accounted to system time.
>>
>> To fix the problem, follow the x86 fix in commit 160457140187 ("Defer
>> vtime accounting 'til after IRQ handling"), and allow host IRQs to run
>> before accounting the guest exit time.
>>
>> In the case vtime accounting is enabled, this is not required because TB
>> is used directly for accounting.
>>
>> Before this patch, with CONFIG_TICK_CPU_ACCOUNTING=y in the host and a
>> guest running a kernel compile, the 'guest' fields of /proc/stat are
>> stuck at zero. With the patch they can be observed increasing roughly as
>> expected.
>>
>> Fixes: e233d54d4d97 ("KVM: booke: use __kvm_guest_exit")
>> Fixes: 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs")
>> Cc: <stable at vger.kernel.org> # 5.12
>> Signed-off-by: Laurent Vivier <lvivier at redhat.com>
>> [np: only required for tick accounting, add Book3E fix, tweak changelog]
>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>> ---
>> Since v2:
>> - I took over the patch with Laurent's blessing.
>> - Changed to avoid processing IRQs if we do have vtime accounting
>> enabled.
>> - Changed so in either case the accounting is called with irqs disabled.
>> - Added similar Book3E fix.
>> - Rebased on upstream, tested, observed bug and confirmed fix.
>>
>> arch/powerpc/kvm/book3s_hv.c | 30 ++++++++++++++++++++++++++++--
>> arch/powerpc/kvm/booke.c | 16 +++++++++++++++-
>> 2 files changed, 43 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 2acb1c96cfaf..7b74fc0a986b 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -3726,7 +3726,20 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>>
>> kvmppc_set_host_core(pcpu);
>>
>> - guest_exit_irqoff();
>> + context_tracking_guest_exit();
>> + if (!vtime_accounting_enabled_this_cpu()) {
>> + local_irq_enable();
>> + /*
>> + * Service IRQs here before vtime_account_guest_exit() so any
>> + * ticks that occurred while running the guest are accounted to
>> + * the guest. If vtime accounting is enabled, accounting uses
>> + * TB rather than ticks, so it can be done without enabling
>> + * interrupts here, which has the problem that it accounts
>> + * interrupt processing overhead to the host.
>> + */
>> + local_irq_disable();
>> + }
>> + vtime_account_guest_exit();
>>
>> local_irq_enable();
>>
>> @@ -4510,7 +4523,20 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>
>> kvmppc_set_host_core(pcpu);
>>
>> - guest_exit_irqoff();
>> + context_tracking_guest_exit();
>> + if (!vtime_accounting_enabled_this_cpu()) {
>> + local_irq_enable();
>> + /*
>> + * Service IRQs here before vtime_account_guest_exit() so any
>> + * ticks that occurred while running the guest are accounted to
>> + * the guest. If vtime accounting is enabled, accounting uses
>> + * TB rather than ticks, so it can be done without enabling
>> + * interrupts here, which has the problem that it accounts
>> + * interrupt processing overhead to the host.
>> + */
>> + local_irq_disable();
>> + }
>> + vtime_account_guest_exit();
>>
>> local_irq_enable();
>>
>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index 977801c83aff..8c15c90dd3a9 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -1042,7 +1042,21 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr)
>> }
>>
>> trace_kvm_exit(exit_nr, vcpu);
>> - guest_exit_irqoff();
>> +
>> + context_tracking_guest_exit();
>> + if (!vtime_accounting_enabled_this_cpu()) {
>> + local_irq_enable();
>> + /*
>> + * Service IRQs here before vtime_account_guest_exit() so any
>> + * ticks that occurred while running the guest are accounted to
>> + * the guest. If vtime accounting is enabled, accounting uses
>> + * TB rather than ticks, so it can be done without enabling
>> + * interrupts here, which has the problem that it accounts
>> + * interrupt processing overhead to the host.
>> + */
>> + local_irq_disable();
>> + }
>> + vtime_account_guest_exit();
>>
>> local_irq_enable();
>>
>>
>
> I'm wondering if we should put the context_tracking_guest_exit() just after the
> "srcu_read_unlock(&vc->kvm->srcu, srcu_idx);" as it was before 61bd0f66ff92 ("KVM: PPC:
> Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN")?
For the run_single_vcpu path, I _think_ it shouldn't matter. It's mostly
just fixing up low level powerpc details.
But actually I wonder whether we should move the guest context entirely
inside the SRCU lock because it's a high level host locking primitive.
For the kvmppc_run_core path, we are actually taking the vc->lock spin
lock as well. Arguably it's waiting for secondary threads in the guest
but I think changing to host context as soon as possible could make
sense. If we don't have an actual bug identified it could wait for next
merge perhaps.
Thanks,
Nick
>
More information about the Linuxppc-dev
mailing list