ppc64le reliable stack unwinder and scheduled tasks

Nicolai Stange nstange at suse.de
Mon Jan 14 18:21:40 AEDT 2019


Joe Lawrence <joe.lawrence at redhat.com> writes:

> On Fri, Jan 11, 2019 at 08:51:54AM +0100, Nicolai Stange wrote:
>> Joe Lawrence <joe.lawrence at redhat.com> writes:
>> 
>> > On Fri, Jan 11, 2019 at 01:00:38AM +0100, Nicolai Stange wrote:
>
> [ ... snip ... ]
>
>> >> For testing, could you try whether clearing the word at STACK_FRAME_MARKER
>> >> from _switch() helps?
>> >> 
>> >> I.e. something like (completely untested):
>> >
>> > I'll kick off some builds tonight and try to get tests lined up
>> > tomorrow.  Unfortunately these take a bit of time to run setup, schedule
>> > and complete, so perhaps by next week.
>> 
>> Ok, that's probably still a good test for confirmation, but the solution
>> you proposed below is much better.
>
> Good news, 40 tests (RHEL-7 backport) with clearing out the
> STACK_FRAME_MARKER were all successful.  Previously I would see stalled
> livepatch transitions due to the unreliable report in ~10% of test
> cases.
>
>> >> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> >> index 435927f549c4..b747d0647ec4 100644
>> >> --- a/arch/powerpc/kernel/entry_64.S
>> >> +++ b/arch/powerpc/kernel/entry_64.S
>> >> @@ -596,6 +596,10 @@ _GLOBAL(_switch)
>> >>  	SAVE_8GPRS(14, r1)
>> >>  	SAVE_10GPRS(22, r1)
>> >>  	std	r0,_NIP(r1)	/* Return to switch caller */
>> >> +
>> >> +	li	r23,0
>> >> +	std	r23,96(r1)	/* 96 == STACK_FRAME_MARKER * sizeof(long) */
>> >> +
>> >>  	mfcr	r23
>> >>  	std	r23,_CCR(r1)
>> >>  	std	r1,KSP(r3)	/* Set old stack pointer */
>> >> 
>> >> 
>> >
>> > This may be sufficient to avoid the condition, but if the contents of
>> > frame 0 are truely uninitialized (not to be trusted), should the
>> > unwinder be even looking at that frame (for STACK_FRAMES_REGS_MARKER),
>> > aside from the LR and other stack size geometry sanity checks?
>> 
>> That's a very good point: we'll only ever be examining scheduled out tasks
>> and current (which at that time is running klp_try_complete_transition()).
>> 
>> current won't be in an interrupt/exception when it's walking its own
>> stack. And scheduled out tasks can't have an exception/interrupt frame
>> as their frame 0, correct?
>> 
>> Thus, AFAICS, whenever klp_try_complete_transition() finds a
>> STACK_FRAMES_REGS_MARKER in frame 0, it is known to be garbage, as you
>> said.
>
> I gave this about 20 tests and they were also all successful, what do
> you think?  (The log could probably use some trimming and cleanup.)  I
> think either solution would fix the issue.
>
> -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8--
>
> From edd3fb9e8a16d5a7ccc98759e9397f386f0e82ca Mon Sep 17 00:00:00 2001
> From: Joe Lawrence <joe.lawrence at redhat.com>
> Date: Fri, 11 Jan 2019 10:25:26 -0500
> Subject: [PATCH] powerpc/livepatch: relax reliable stack tracer exception
>  check
>
> The ppc64le reliable stack tracer iterates over a given task's stack,
> verifying a set of conditions to determine whether the trace is
> 'reliable'.  These checks include the presence of any exception frames.
>
> We should be careful when inspecting the bottom-most stack frame (the
> first to be unwound), particularly for scheduled-out tasks.  As Nicolai
> Stange explains, "If I'm reading the code in _switch() correctly, the
> first frame is completely uninitialized except for the pointer back to
> the caller's stack frame."  If a previous do_IRQ() invocation, for
> example, has left a residual exception-marker on the first frame, the
> stack tracer would incorrectly report this task's trace as unreliable.
>

FWIW, it's not been do_IRQ() who wrote the exception marker, but it's
caller hardware_interrupt_common(), more specifically the
EXCEPTION_PROLOG_COMMON_3 part therof.


> save_stack_trace_tsk_reliable() already skips the first frame when
> verifying the saved Link Register.  It should do the same when looking
> for the STACK_FRAME_REGS_MARKER.  The task it is unwinding will be
> either 'current', in which case the tracer will have never been called
> from an exception path, or the task will be inactive and its first
> frame's contents will be uninitialized (aside from backchain pointer).

I thought about this a little more and can't see anything that would
prevent higher, i.e. non-_switch() frames to also alias with prior
exception frames. That STACK_FRAME_REGS_MARKER is written to a stack
frame's "parameter area" and most functions probably don't initialize
this either. So, AFAICS, higher stack frames could potentially be
affected by the very same problem.

I think the best solution would be to clear the STACK_FRAME_REGS_MARKER
upon exception return. I have a patch ready for that and will post it
after it has passed some basic testing -- hopefully later this day.

That being said, I still think that your patch should also get applied
in some form -- looking at unitialized memory is just not a good thing
to do.


>
> Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
> Signed-off-by: Joe Lawrence <joe.lawrence at redhat.com>
> ---
>  arch/powerpc/kernel/stacktrace.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
> index e2c50b55138f..0793e75c45a6 100644
> --- a/arch/powerpc/kernel/stacktrace.c
> +++ b/arch/powerpc/kernel/stacktrace.c
> @@ -84,6 +84,12 @@ save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
>  EXPORT_SYMBOL_GPL(save_stack_trace_regs);
>  
>  #ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
> +/*
> + * This function returns an error if it detects any unreliable features of the
> + * stack.  Otherwise it guarantees that the stack trace is reliable.
> + *
> + * If the task is not 'current', the caller *must* ensure the task is inactive.
> + */
>  int
>  save_stack_trace_tsk_reliable(struct task_struct *tsk,
>  				struct stack_trace *trace)
> @@ -143,7 +149,7 @@ save_stack_trace_tsk_reliable(struct task_struct *tsk,
>  			return 1;
>  
>  		/* Mark stacktraces with exception frames as unreliable. */
> -		if (sp <= stack_end - STACK_INT_FRAME_SIZE &&
> +		if (!firstframe && sp <= stack_end - STACK_INT_FRAME_SIZE &&
>  		    stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
>  			return 1;
>  		}

I would perhaps not limit this to the STACK_FRAME_REGS_MARKER, but also
not emit the ip obtained from the first frame into the resulting trace.

I.e., how about moving all the sp/newsp handling to the beginning of the
loop and doing an 'if (firstframe) continue;' right after that?

Thanks,

Nicolai


-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)


More information about the Linuxppc-dev mailing list