PowerPC ftrace function trace optimisation
Benjamin Herrenschmidt
benh at kernel.crashing.org
Thu Apr 29 12:10:38 EST 2010
On Wed, 2010-04-28 at 21:55 -0400, Steven Rostedt wrote:
> On Thu, 2010-04-29 at 10:51 +1000, Anton Blanchard wrote:
> > Hi,
>
> > # gcc -pg -mprofile-kernel
> >
> > 0000000000000000 <.foo>:
> > 0: 7c 08 02 a6 mflr r0
> > 4: f8 01 00 10 std r0,16(r1)
> > 8: 48 00 00 01 bl 8 <.foo+0x8> <--- call to mcount
> >
> > c: 7c 08 02 a6 mflr r0
>
> Why the extra mflr? Can't we just make it a requirement that mcount
> returns with r0 back to what it was?

Well, we can't just change that now; it's been in for long enough.
We might be able to get a new option later on that makes it more
efficient though (for example by removing the std), but let's see what
we can do with what we have.

The extra mflr makes sense if you consider that the option just
pre-pends a pre-canned set of instructions and doesn't actually change
anything in the prolog generation. It might be possible to do a hack to
make the prolog aware that LR is already in r0, but let's look at that
after we've verified we can get the existing stuff working :-)
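
To make the "pre-canned set of instructions" concrete, here is a rough
sketch that checks whether a function starts with the three-instruction
sequence from Anton's disassembly. The instruction words are taken from
that listing; the helper names (is_bl, has_mcount_prolog) are made up
for illustration and are not kernel or gcc interfaces.

```python
# Instruction words copied from the quoted disassembly of .foo.
MFLR_R0 = 0x7C0802A6        # mflr r0
STD_R0_16_R1 = 0xF8010010   # std r0,16(r1)


def is_bl(insn):
    """True for a relative branch-and-link: primary opcode 18, AA=0, LK=1."""
    return (insn & 0xFC000003) == 0x48000001


def has_mcount_prolog(words):
    """Check for 'mflr r0; std r0,16(r1); bl <target>' at function entry.

    Hypothetical helper: it only pattern-matches the words, it does not
    verify that the bl really targets mcount.
    """
    return (len(words) >= 3
            and words[0] == MFLR_R0
            and words[1] == STD_R0_16_R1
            and is_bl(words[2]))


# First six instruction words of .foo from the listing.
foo = [0x7C0802A6, 0xF8010010, 0x48000001,
       0x7C0802A6, 0xF8010010, 0xF821FFD1]

print(has_mcount_prolog(foo))      # the pre-pended sequence
print(has_mcount_prolog(foo[3:]))  # the regular prolog that follows it
```

The second call looks at the normal prolog, which repeats the mflr and
std but continues with stdu rather than a bl, so the two sequences can
be told apart by the third word.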

Another idea Alan had is that if we had a list of call sites, then
instead of NOP'ing we could change the branches at all call sites to
skip the 3-instruction mcount prolog :-)

Now, we do store the relocs with the kernel image when using
CONFIG_RELOCATABLE. We might want to 'sort' them a bit to easily find
callers from call sites, but it's something to consider as well.
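
As a rough illustration of the call-site idea, the sketch below bumps
the displacement of a relative bl by 12 bytes so that it lands just
past the mflr/std/bl sequence. It assumes the call site is a plain
I-form bl (opcode 18, AA=0, LK=1) that branches directly to the function
entry; calls through stubs or function pointers are not covered. The
names PROLOG_BYTES, bl_offset and retarget_bl are hypothetical.

```python
PROLOG_BYTES = 12  # mflr r0; std r0,16(r1); bl mcount = 3 x 4 bytes


def bl_offset(insn):
    """Return the signed byte displacement held in the LI field of a bl."""
    li = insn & 0x03FFFFFC
    if li & 0x02000000:          # sign bit of the 26-bit displacement
        li -= 0x04000000
    return li


def retarget_bl(insn, delta=PROLOG_BYTES):
    """Return a bl word whose target is 'delta' bytes further on.

    Assumption: 'insn' is a relative bl aimed at the function entry.
    """
    if (insn & 0xFC000003) != 0x48000001:
        raise ValueError("not a relative bl")
    new = bl_offset(insn) + delta
    if not -0x02000000 <= new <= 0x01FFFFFC:
        raise ValueError("displacement out of range for bl")
    return 0x48000001 | (new & 0x03FFFFFC)


# A bl 0x100 bytes forward becomes a bl 0x10c bytes forward.
print(hex(retarget_bl(0x48000101)))
```

Since the regular prolog at offset 0xc does its own mflr r0, entering
there should not need anything from the skipped instructions. Undoing
the patch to enable tracing is the same operation with delta=-12.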

Cheers,
Ben.

> -- Steve
>
> > 10: f8 01 00 10 std r0,16(r1)
> > 14: f8 21 ff d1 stdu r1,-48(r1)
> > 18: e9 22 00 00 ld r9,0(r2)
> > 1c: e8 69 00 02 lwa r3,0(r9)
> > 20: 38 21 00 30 addi r1,r1,48
> > 24: e8 01 00 10 ld r0,16(r1)
> > 28: 7c 08 03 a6 mtlr r0
> > 2c: 4e 80 00 20 blr
> >
> >
> > This means we could support ftrace function trace with very little overhead.
> >
> > In fact if we are careful when switching to the new mcount ABI and don't
> > rely on the store of r0, we could probably optimise this even further in a
> > future gcc and remove the store completely. mcount would be 2 instructions:
> >
> > mflr r0
> > bl 8 <.foo+0x8>
> >
> > Anton
>