[PATCH v3 4/6] powerpc/64s: Un-blacklist system_call() from kprobes

Fri Jun 23 01:43:36 AEST 2017

On 2017/06/22 11:14PM, Nicholas Piggin wrote:
> On Thu, 22 Jun 2017 21:14:21 +1000
> Michael Ellerman <mpe at ellerman.id.au> wrote:
> 
> > Nicholas Piggin <npiggin at gmail.com> writes:
> > 
> > > On Thu, 22 Jun 2017 00:08:40 +0530
> > > "Naveen N. Rao" <naveen.n.rao at linux.vnet.ibm.com> wrote:
> > >  
> > >> It is actually safe to probe system_call() in entry_64.S, but only till
> > >> we unset MSR_RI. To allow this, add a new symbol system_call_exit()
> > >> after the mtmsrd and blacklist that. Though the mtmsrd instruction
> > >> itself is now whitelisted, we won't be allowed to probe on it as we
> > >> don't allow probing on rfi and mtmsr instructions (checked for in
> > >> arch_prepare_kprobe()).  
> > >
> > > Can you add a little comment to say probes aren't allowed, and it's
> > > located after the mtmsr in order to avoid contaminating traces?

Hmm.. it's actually after the mtmsrd since we can probe until that 
point. In v2, I converted .Lsyscall_exit() into a different label and 
blacklisted all of it. But Michael prefers to only blacklist what we 
must. It is helpful if we ever want to probe after returning from a 
syscall. That said, please see further below (*)

> > >
> > > Also I wonder if a slightly different name would be more instructive?
> > > I don't normally care, but the system_call_common code isn't trivial
> > > to follow. system_call_exit might give the impression that it is the
> > > entire exit path (which would pair with system_call for entry).  
> > 
> > It is the entire path in the happy case isn't it? I'm not sure I know
> > what you mean.
> 
> Oh, yes you're right, I thought it was moved down further.
> 
> > If you're tracing etc. then you'll be in syscall_exit_work, isn't 
> > that sufficient to differentiate the two?

Not sure how much this matters, but the new symbol (system_call_exit) is 
actually a few instructions from after the syscall return 
(.Lsyscall_exit). And I have converted syscall_exit_work to a private 
symbol as well.

In general, from kprobes standpoint, there are places there where we can 
probe safely but doing so may require new symbols to be introduced, 
which could make traces look weird, as well as distribute samples among 
different symbols.

I'm not sure how best to proceed. In this current series, I pretty much 
blacklist all of system_call_common() through to syscall_exit() except 
for the small part in between in system_call(). All other symbols have 
been made private, so nothing else apart from system_call_common(), 
system_call() and system_call_exit() show up in traces.

We could probably retain syscall_dotrace(), syscall_enosys() and parts 
of syscall_exit_work() after RI is restored, which will allow those to 
be probed, but end up showing these symbols in traces.

What would be preferable?


-----
(*)
The other suggestion Michael had was to just put a nop after the bctrl 
and to again make .Lsyscall_exit public and blacklist that (though he 
wasn't all for it). I suppose it does solve this issue in a nice way - 
the call traces when in a system call show the proper symbol and we do 
have a 'nop' instruction on which we can probe to catch returns from 
system calls. Is that something we can re-consider? Something like this 
(along with converting other .Lsyscall_exit references):

--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -175,8 +175,9 @@ system_call:                        /* label this so stack traces look sane */
        ldx     r12,r11,r0      /* Fetch system call handler [ptr] */
        mtctr   r12
        bctrl                   /* Call handler */
+       nop

-.Lsyscall_exit:
+system_call_exit:
        std     r3,RESULT(r1)
        CURRENT_THREAD_INFO(r12, r1)



Thanks,
Naveen