[PATCH 1/4] powerpc/64s: Clear on-stack exception marker upon exception return

Michael Ellerman mpe at ellerman.id.au
Wed Feb 6 15:44:48 AEDT 2019


Balbir Singh <bsingharora at gmail.com> writes:
> On Tue, Feb 5, 2019 at 10:24 PM Michael Ellerman <mpe at ellerman.id.au> wrote:
>> Balbir Singh <bsingharora at gmail.com> writes:
>> > On Sat, Feb 2, 2019 at 12:14 PM Balbir Singh <bsingharora at gmail.com> wrote:
>> >> On Tue, Jan 22, 2019 at 10:57:21AM -0500, Joe Lawrence wrote:
>> >> > From: Nicolai Stange <nstange at suse.de>
>> >> >
>> >> > The ppc64 specific implementation of the reliable stacktracer,
>> >> > save_stack_trace_tsk_reliable(), bails out and reports an "unreliable
>> >> > trace" whenever it finds an exception frame on the stack. Stack frames
>> >> > are classified as exception frames if the STACK_FRAME_REGS_MARKER magic,
>> >> > as written by exception prologues, is found at a particular location.
>> >> >
>> >> > However, as observed by Joe Lawrence, it is possible in practice that
>> >> > non-exception stack frames can alias with prior exception frames and thus,
>> >> > that the reliable stacktracer can find a stale STACK_FRAME_REGS_MARKER on
>> >> > the stack. It in turn falsely reports an unreliable stacktrace and blocks
>> >> > any live patching transition to finish. Said condition lasts until the
>> >> > stack frame is overwritten/initialized by function call or other means.
>> >> >
>> >> > In principle, we could mitigate this by making the exception frame
>> >> > classification condition in save_stack_trace_tsk_reliable() stronger:
>> >> > in addition to testing for STACK_FRAME_REGS_MARKER, we could also take into
>> >> > account that for all exceptions executing on the kernel stack
>> >> > - their stack frames's backlink pointers always match what is saved
>> >> >   in their pt_regs instance's ->gpr[1] slot and that
>> >> > - their exception frame size equals STACK_INT_FRAME_SIZE, a value
>> >> >   uncommonly large for non-exception frames.
>> >> >
>> >> > However, while these are currently true, relying on them would make the
>> >> > reliable stacktrace implementation more sensitive towards future changes in
>> >> > the exception entry code. Note that false negatives, i.e. not detecting
>> >> > exception frames, would silently break the live patching consistency model.
>> >> >
>> >> > Furthermore, certain other places (diagnostic stacktraces, perf, xmon)
>> >> > rely on STACK_FRAME_REGS_MARKER as well.
>> >> >
>> >> > Make the exception exit code clear the on-stack STACK_FRAME_REGS_MARKER
>> >> > for those exceptions running on the "normal" kernel stack and returning
>> >> > to kernelspace: because the topmost frame is ignored by the reliable stack
>> >> > tracer anyway, returns to userspace don't need to take care of clearing
>> >> > the marker.
>> >> >
>> >> > Furthermore, as I don't have the ability to test this on Book 3E or
>> >> > 32 bits, limit the change to Book 3S and 64 bits.
>> >> >
>> >> > Finally, make the HAVE_RELIABLE_STACKTRACE Kconfig option depend on
>> >> > PPC_BOOK3S_64 for documentation purposes. Before this patch, it depended
>> >> > on PPC64 && CPU_LITTLE_ENDIAN and because CPU_LITTLE_ENDIAN implies
>> >> > PPC_BOOK3S_64, there's no functional change here.
>> >> >
>> >> > Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
>> >> > Reported-by: Joe Lawrence <joe.lawrence at redhat.com>
>> >> > Signed-off-by: Nicolai Stange <nstange at suse.de>
>> >> > Signed-off-by: Joe Lawrence <joe.lawrence at redhat.com>
>> >> > ---
>> >> >  arch/powerpc/Kconfig           | 2 +-
>> >> >  arch/powerpc/kernel/entry_64.S | 7 +++++++
>> >> >  2 files changed, 8 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> >> > index 2890d36eb531..73bf87b1d274 100644
>> >> > --- a/arch/powerpc/Kconfig
>> >> > +++ b/arch/powerpc/Kconfig
>> >> > @@ -220,7 +220,7 @@ config PPC
>> >> >       select HAVE_PERF_USER_STACK_DUMP
>> >> >       select HAVE_RCU_TABLE_FREE              if SMP
>> >> >       select HAVE_REGS_AND_STACK_ACCESS_API
>> >> > -     select HAVE_RELIABLE_STACKTRACE         if PPC64 && CPU_LITTLE_ENDIAN
>> >> > +     select HAVE_RELIABLE_STACKTRACE         if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN
>> >> >       select HAVE_SYSCALL_TRACEPOINTS
>> >> >       select HAVE_VIRT_CPU_ACCOUNTING
>> >> >       select HAVE_IRQ_TIME_ACCOUNTING
>> >> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> >> > index 435927f549c4..a2c168b395d2 100644
>> >> > --- a/arch/powerpc/kernel/entry_64.S
>> >> > +++ b/arch/powerpc/kernel/entry_64.S
>> >> > @@ -1002,6 +1002,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>> >> >       ld      r2,_NIP(r1)
>> >> >       mtspr   SPRN_SRR0,r2
>> >> >
>> >> > +     /*
>> >> > +      * Leaving a stale exception_marker on the stack can confuse
>> >> > +      * the reliable stack unwinder later on. Clear it.
>> >> > +      */
>> >> > +     li      r2,0
>> >> > +     std     r2,STACK_FRAME_OVERHEAD-16(r1)
>> >> > +
>> >>
>> >> Could you please double check, r4 is already 0 at this point
>> >> IIUC. So the change might be a simple
>> >>
>> >> std r4,STACK_FRAME_OVERHEAD-16(r1)
>> >>
>> >
>> > r4 is not 0, sorry for the noise
>>
>> Isn't it?
>
> It is, I seem to be reading the wrong bits and confused myself, had to
> re-read mtmsrd to ensure it does not modify RS, just MSR. So I guess
> we could reuse r4.

Yeah it's a bit hard to follow now that we have the split exit paths for
user vs kernel. r4 does get used on the return to userspace case, by
ACCOUNT_CPU_USER_EXIT(), but for the return to kernel it's still got
zero in it.

> Should I send a patch on top of this? I have limited testing
> infrastructure at the moment, I could use qemu

I'm not sure. It's a bit fragile relying on the r4 value being zero, it
would be easy to accidentally reuse r4. Though it actually wouldn't
matter as long as r4 never has "regshere" in it.

In fact we could store any random value there, it just needs to not be
the exception marker. eg. we could just stick the SRR0 value in there,
that should never alias with "regshere".

But I think maybe we're over thinking it, the cost of the li is pretty
minimal compared to everything else going on here, and this is only on
the return to kernel case, which is arguably not a super hot path.

cheers


More information about the Linuxppc-dev mailing list