[PATCH v2 1/1] powerpc/kvm: Save Timebase Offset to fix sched_clock() while running guest code.

Nicholas Piggin npiggin at gmail.com
Sat Feb 6 14:03:09 AEDT 2021


Excerpts from Leonardo Bras's message of February 5, 2021 5:01 pm:
> Hey Nick, thanks for reviewing :)
> 
> On Fri, 2021-02-05 at 16:28 +1000, Nicholas Piggin wrote:
>> Excerpts from Leonardo Bras's message of February 5, 2021 4:06 pm:
>> > Before guest entry, TBU40 register is changed to reflect guest timebase.
>> > After exitting guest, the register is reverted to it's original value.
>> > 
>> > If one tries to get the timestamp from host between those changes, it
>> > will present an incorrect value.
>> > 
>> > An example would be trying to add a tracepoint in
>> > kvmppc_guest_entry_inject_int(), which depending on last tracepoint
>> > acquired could actually cause the host to crash.
>> > 
>> > Save the Timebase Offset to PACA and use it on sched_clock() to always
>> > get the correct timestamp.
>> 
>> Ouch. Not sure how reasonable it is to half switch into guest registers 
>> and expect to call into the wider kernel, fixing things up as we go. 
>> What if mftb is used in other places?
> 
> IIUC, the CPU is not supposed to call anything as host between guest
> entry and guest exit, except guest-related cases, like

When I say "call", I'm including tracing in that. If a function is not 
marked as no trace, then it will call into the tracing subsystem.

> kvmppc_guest_entry_inject_int(), but anyway, if something calls mftb it
> will still get the same value as before.

Right, so it'll be out of whack again.

> This is only supposed to change stuff that depends on sched_clock, like
> Tracepoints, that can happen in those exceptions.

If they depend on sched_clock that's one thing. Do they definitely have 
no dependencies on mftb from other calls?

>> Especially as it doesn't seem like there is a reason that function _has_
>> to be called after the timebase is switched to guest, that's just how 
>> the code is structured.
> 
> Correct, but if called, like in rb routines, used by tracepoints, the
> difference between last tb and current (lower) tb may cause the CPU to
> trap PROGRAM exception, crashing host. 

Yes, so I agree with Michael any function that is involved when we begin 
to switch into guest context (or have not completed switching back to 
host going the other way) should be marked as no trace (noinstr even, 
perhaps).

>> As a local hack to work out a bug okay. If you really need it upstream 
>> could you put it under a debug config option?
> 
> You mean something that is automatically selected whenever those
> configs are enabled? 
> 
> CONFIG_TRACEPOINT && CONFIG_KVM_BOOK3S_HANDLER && CONFIG_PPC_BOOK3S_64
> 
> Or something the user need to select himself in menuconfig?

Yeah I meant a default n thing under powerpc kernel debugging somewhere.

Thanks,
Nick


More information about the Linuxppc-dev mailing list