ppc64le reliable stack unwinder and scheduled tasks

Joe Lawrence joe.lawrence at redhat.com
Fri Jan 11 12:08:08 AEDT 2019


On Fri, Jan 11, 2019 at 01:00:38AM +0100, Nicolai Stange wrote:
> Hi Joe,
> 
> Joe Lawrence <joe.lawrence at redhat.com> writes:
> 
> > tl;dr: On ppc64le, what is top-most stack frame for scheduled tasks
> >        about?
> 
> If I'm reading the code in _switch() correctly, the first frame is
> completely uninitialized except for the pointer back to the caller's
> stack frame.
> 
> For completeness: _switch() saves the return address, i.e. the link
> register into its parent's stack frame, as is mandated by the ABI and
> consistent with your findings below: it's always the second stack frame
> where the return address into __switch_to() is kept.
>

Hi Nicolai,

Good, that makes a lot of sense.  I couldn't find any reference
explaining the contents of frame 0, only unwinding code here and there
(as in crash-utility) that stepped over it.
 
> <snip>
> 
> >
> >
> > Example 1 (RHEL-7)
> > ==================
> >
> > crash> struct task_struct.thread c00000022fd015c0 | grep ksp
> >     ksp = 0xc0000000288af9c0
> >
> > crash> rd 0xc0000000288af9c0 -e 0xc0000000288b0000
> >
> >  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >
> > sp[0]:
> >
> > c0000000288af9c0:  c0000000288afb90 0000000000dd0000   ...(............
> > c0000000288af9d0:  c000000000002a94 c000000001c60a00   .*..............
> >
> >         crash> sym c000000000002a94
> >         c000000000002a94 (T) hardware_interrupt_common+0x114
> 
> So that c000000000002a94 certainly wasn't stored by _switch(). I think
> what might have happened is that the switching frame aliased with some
> prior interrupt frame as setup by hardware_interrupt_common().
> 
> The interrupt and switching frames seem to share a common layout as far
> as the lower STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) bytes are
> concerned.
> 
> That address into hardware_interrupt_common() could have been written by
> the do_IRQ() called from there.
> 

That was my initial theory, but then when I saw an ordinary scheduled
task with a similarly strange frame 0, I thought that _switch() might
have been doing something clever (or not).  But according your earlier
explanation, it would line up that these values may be inherited from
do_IRQ() or the like.

> 
> > c0000000288af9e0:  c000000001c60a80 0000000000000000   ................
> > c0000000288af9f0:  c0000000288afbc0 0000000000dd0000   ...(............
> > c0000000288afa00:  c0000000014322e0 c000000001c60a00   ."C.............
> > c0000000288afa10:  c0000002303ae380 c0000002303ae380   ..:0......:0....
> > c0000000288afa20:  7265677368657265 0000000000002200   erehsger."......
> >
> >         Uh-oh...
> >
> >         /* Mark stacktraces with exception frames as unreliable. */
> >         stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
> 
> 
> Aliasing of the switching stack frame with some prior interrupt stack
> frame would explain why that STACK_FRAME_REGS_MARKER is still found on
> the stack, i.e. it's a leftover.
> 
> For testing, could you try whether clearing the word at STACK_FRAME_MARKER
> from _switch() helps?
> 
> I.e. something like (completely untested):

I'll kick off some builds tonight and try to get tests lined up
tomorrow.  Unfortunately these take a bit of time to run setup, schedule
and complete, so perhaps by next week.

> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 435927f549c4..b747d0647ec4 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -596,6 +596,10 @@ _GLOBAL(_switch)
>  	SAVE_8GPRS(14, r1)
>  	SAVE_10GPRS(22, r1)
>  	std	r0,_NIP(r1)	/* Return to switch caller */
> +
> +	li	r23,0
> +	std	r23,96(r1)	/* 96 == STACK_FRAME_MARKER * sizeof(long) */
> +
>  	mfcr	r23
>  	std	r23,_CCR(r1)
>  	std	r1,KSP(r3)	/* Set old stack pointer */
> 
> 

This may be sufficient to avoid the condition, but if the contents of
frame 0 are truely uninitialized (not to be trusted), should the
unwinder be even looking at that frame (for STACK_FRAMES_REGS_MARKER),
aside from the LR and other stack size geometry sanity checks?

> <snap>
> 
> >
> > save_stack_trace_tsk_reliable
> > =============================
> >
> > arch/powerpc/kernel/stacktrace.c :: save_stack_trace_tsk_reliable() does
> > take into account the first stackframe, but only to verify that the link
> > register is indeed pointing at kernel code address.
> 
> It's actually the other way around:
> 
> 	if (!firstframe && !__kernel_text_address(ip))
> 		return 1;
> 
> 
> So the address gets sanitized only if it's _not_ coming from the first
> frame.

Yup, that's right, I had it backwards.

Thanks!

-- Joe


More information about the Linuxppc-dev mailing list