[RFC PATCH] powerpc/64/ftrace: mprofile-kernel patch out mflr

Nicholas Piggin npiggin at gmail.com
Fri May 17 19:12:45 AEST 2019


Naveen N. Rao's on May 17, 2019 4:22 am:
> Nicholas Piggin wrote:
>> Naveen N. Rao's on May 14, 2019 6:32 pm:
>>> Michael Ellerman wrote:
>>>> "Naveen N. Rao" <naveen.n.rao at linux.ibm.com> writes:
>>>>> Michael Ellerman wrote:
>>>>>> Nicholas Piggin <npiggin at gmail.com> writes:
>>>>>>> The new mprofile-kernel mcount sequence is
>>>>>>>
>>>>>>>   mflr	r0
>>>>>>>   bl	_mcount
>>>>>>>
>>>>>>> Dynamic ftrace patches the branch instruction with a noop, but leaves
>>>>>>> the mflr. mflr is executed by the branch unit that can only execute one
>>>>>>> per cycle on POWER9 and shared with branches, so it would be nice to
>>>>>>> avoid it where possible.
>>>>>>>
>>>>>>> This patch is a hacky proof of concept to nop out the mflr. Can we do
>>>>>>> this or are there races or other issues with it?
>>>>>> 
>>>>>> There's a race, isn't there?
>>>>>> 
>>>>>> We have a function foo which currently has tracing disabled, so the mflr
>>>>>> and bl are nop'ed out.
>>>>>> 
>>>>>>   CPU 0			CPU 1
>>>>>>   ==================================
>>>>>>   bl foo
>>>>>>   nop (ie. not mflr)
>>>>>>   -> interrupt
>>>>>>   something else	enable tracing for foo
>>>>>>   ...			patch mflr and branch
>>>>>>   <- rfi
>>>>>>   bl _mcount
>>>>>> 
>>>>>> So we end up in _mcount() but with r0 not populated.
>>>>>
>>>>> Good catch! Looks like we need to patch the mflr with a "b +8" similar 
>>>>> to what we do in __ftrace_make_nop().
>>>> 
>>>> Would that actually make it any faster though? Nick?
>>> 
>>> Ok, how about doing this as a 2-step process?
>>> 1. patch 'mflr r0' with a 'b +8'
>>>    synchronize_rcu_tasks()
>>> 2. convert 'b +8' to a 'nop'
>> 
>> Good idea. Well the mflr r0 is harmless, so you can leave that in.
>> You just need to ensure it's not removed before the bl is. So nop
>> the bl _mcount, then synchronize_rcu_tasks(), then nop the mflr?
> 
> The problem actually seems to be when we try to patch in the branch to 
> _mcount(), rather than when we are patching in the nop instructions 
> (i.e., the race is when we try to enable the function tracer, rather 
> than while disabling it).
> 
> When we disable ftrace, we only need to ensure we patch out the branch 
> to _mcount() before patching out the preceding 'mflr r0'. I don't think 
> we need a synchronize_rcu_tasks() in that case.

That's probably right.

> While enabling ftrace, we will first need to patch the preceding 'mflr 
> r0' (which would now be a 'nop') with 'b +8', then use 
> synchronize_rcu_tasks() and finally patch in 'bl _mcount()' followed by 
> 'mflr r0'.
> 
> I think that's what you meant, just that my reference to patching 'mflr 
> r0' with a 'b +8' should have called out that the mflr would have been 
> nop'ed out.

I meant that we don't need the b +8 anywhere, because the mflr r0
is harmless. Enabling ftrace just needs to patch in 'mflr r0', and
then synchronize_rcu_tasks(), and then patch in 'bl _mcount'. I think?

Thanks,
Nick



More information about the Linuxppc-dev mailing list