OProfile callgraph support not working correctly on PPC processors
Bob Nelson
rrnelson at linux.vnet.ibm.com
Sat Dec 22 04:15:32 EST 2007
I have been investigating why the callgraph code for OProfile on Cell does
not work correctly, and I am pretty sure I have run into a problem that is
common to all the Power platforms (at least the other ones I have looked
at). A simple test program is attached below. It has a main that calls
function1, which calls function2. Each function contains a loop so that
OProfile can catch it spending some CPU time. I have also attached the
objdump -d output for the program, cut down to the three pertinent
functions, which shows what is happening.

In a nutshell, when a terminal (leaf) function, one that calls no other
function, is called, the compiler makes an optimization that seems to break
the ABI convention as far as I can tell. It does not store the Link Register
on the stack as other functions do. It just leaves the return address in LR,
knowing that nothing should change it. (At the top of both main and
function1 the first instruction is "mflr r0", which copies the link register
to R0 so that it can be saved. function2 does not do that.) When OProfile
takes an interrupt and needs to gather the callgraph information, it grabs
the process's stack pointer (R1) and follows the chain back up the stack to
collect the callers' addresses. This works for most functions, but not for
terminal functions, for the reason noted above.
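
The walk described above can be sketched in user-space C. This is only an
illustration, not OProfile's code: the names frame_header and
walk_back_chain are made up, frames are linked with ordinary host pointers
instead of PPC stack addresses, and the slot layout (back chain at 0, flags
at 8, saved LR at 16) is the one implied by "std r0,16(r1)" in the listing
below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the header at the bottom of each 64-bit PPC frame. */
struct frame_header {
    const struct frame_header *back_chain; /* 0(R1): caller's frame, NULL ends */
    uint64_t flags;                        /* 8(R1): condition register save   */
    uint64_t lr_save;                      /* 16(R1): written by the callee    */
};

/*
 * Follow the back chain starting at the interrupted stack pointer.
 * The first frame is skipped because its LR slot is only filled in by
 * a function that the interrupted function calls. Returns the number
 * of return addresses written to trace[].
 */
size_t walk_back_chain(const struct frame_header *sp,
                       uint64_t *trace, size_t max_depth)
{
    size_t n = 0;

    assert(trace != NULL || max_depth == 0);
    if (sp == NULL)
        return 0;

    for (sp = sp->back_chain; sp != NULL && n < max_depth; sp = sp->back_chain)
        trace[n++] = sp->lr_save;

    return n;
}
```

The walker trusts every LR slot it visits. Nothing in the frame says whether
the slot was actually written, which is where terminal functions go wrong.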

Looking at the assembly listing, I drew myself a diagram of the stack while
function2 is active to convince myself of what was wrong; it is included
below. When the interrupt is handled, OProfile grabs a copy of R1. It
ignores the first frame on the stack because no return address should be
stored there yet. In the second frame it expects to find function2's caller,
but since function2 never stores it, OProfile picks up whatever data happens
to be in that slot and proceeds. The stack chain itself is intact, so the
walk doesn't go off into neverland following a bad chain, but the address it
records for the caller is invalid. And that is why OProfile thinks terminal
functions have no callers on PPC.
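
The same situation can be reproduced with a small simulation that uses the
frame sizes from the listing below (144, 144 and 80 bytes) and the return
addresses that follow the two "bl" instructions. This is a sketch under
assumptions: TOP, STALE and MAIN_CALLER are invented placeholder values, the
chain is cut off at main's caller, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_BYTES 1024u
#define TOP         512u          /* simulated R1 at entry to main          */
#define LR_OFFSET   16u           /* LR save slot, per "std r0,16(r1)"      */
#define STALE       0xdeadbeefu   /* placeholder for leftover stack data    */
#define MAIN_CALLER 0x10000400u   /* hypothetical: not shown in the listing */

static uint64_t stack_words[STACK_BYTES / 8];

static uint64_t load64(uint64_t addr)
{
    assert(addr % 8 == 0 && addr < STACK_BYTES);
    return stack_words[addr / 8];
}

static void store64(uint64_t addr, uint64_t value)
{
    assert(addr % 8 == 0 && addr < STACK_BYTES);
    stack_words[addr / 8] = value;
}

/* Build the four frames of the diagram; returns R1 while function2 runs. */
uint64_t build_stack(int function2_saves_lr)
{
    size_t i;

    for (i = 0; i < STACK_BYTES / 8; i++)
        stack_words[i] = STALE;

    store64(TOP, 0);                            /* end of chain (simplified)   */
    store64(TOP + LR_OFFSET, MAIN_CALLER);      /* stored by main              */
    store64(TOP - 144, TOP);                    /* main: stdu r1,-144(r1)      */
    store64(TOP - 144 + LR_OFFSET, 0x10000728); /* stored by function1         */
    store64(TOP - 288, TOP - 144);              /* function1: stdu r1,-144(r1) */
    if (function2_saves_lr)                     /* the real function2 does not */
        store64(TOP - 288 + LR_OFFSET, 0x10000658);
    store64(TOP - 368, TOP - 288);              /* function2: stdu r1,-80(r1)  */
    return TOP - 368;
}

/* Skip the first frame, then record the LR slot of every frame above it. */
size_t walk(uint64_t sp, uint64_t *trace, size_t max_depth)
{
    size_t n = 0;

    for (sp = load64(sp); sp != 0 && n < max_depth; sp = load64(sp))
        trace[n++] = load64(sp + LR_OFFSET);
    return n;
}
```

With build_stack(0) the first trace entry is the STALE placeholder rather
than 0x10000658, the address after "bl .function2", while the entries for
main and main's caller are still correct. With build_stack(1), which
pretends function2 saved LR like a non-terminal function, all three entries
are valid.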

Any suggestions on how this can be fixed? I am guessing that changing the
compiler and recompiling every program is probably not the answer. I assume
the interrupt routine has to save the link register when it runs, or else it
couldn't call anything without crashing the program that was interrupted. Is
there a safe place to find that saved value?
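
One possible direction, shown as a user-space sketch rather than kernel
code: if the profiler is handed the interrupted register state, the saved LR
can stand in for the missing first caller. On Linux/powerpc that state is
struct pt_regs, whose link field holds LR and gpr[1] the stack pointer. The
sample_regs struct and the function name below are hypothetical stand-ins,
and the code assumes the interrupted function is a terminal function that
has not modified LR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the header at the bottom of each 64-bit PPC frame. */
struct frame_header {
    const struct frame_header *back_chain; /* 0(R1): caller's frame, NULL ends */
    uint64_t flags;                        /* 8(R1)                            */
    uint64_t lr_save;                      /* 16(R1): written by the callee    */
};

/* Hypothetical stand-in for the saved state of the interrupted code. */
struct sample_regs {
    const struct frame_header *sp;         /* R1 at interrupt time */
    uint64_t lr;                           /* LR at interrupt time */
};

/*
 * Like a plain back-chain walk, but the caller of the interrupted function
 * is taken from the saved LR instead of the second frame's LR slot.
 * ASSUMPTION: the interrupted function is a leaf that has not touched LR.
 * In a non-leaf function LR may hold the return address of a call the
 * function made itself, so real code must tell the two cases apart.
 */
size_t walk_with_saved_lr(const struct sample_regs *regs,
                          uint64_t *trace, size_t max_depth)
{
    const struct frame_header *fp;
    size_t n = 0;

    assert(regs != NULL && trace != NULL);
    if (regs->sp == NULL || max_depth == 0)
        return 0;

    fp = regs->sp->back_chain;   /* frame of the interrupted function's caller */
    if (fp == NULL)
        return 0;                /* no caller frame on the chain at all */
    trace[n++] = regs->lr;       /* replaces the unreliable fp->lr_save */

    for (fp = fp->back_chain; fp != NULL && n < max_depth; fp = fp->back_chain)
        trace[n++] = fp->lr_save;

    return n;
}
```

The sketch does not decide whether LR is trustworthy at the moment of the
interrupt. That decision is the hard part, since a function that has already
saved LR and then made calls of its own leaves a different value in the
register.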
Thanks, Bob Nelson

top of stack    ------------------------------
                |             .              |
                |             .              | <------------------------------
                |             .              |                               |
                |----------------------------|                               |
                | R0 (link register)         | --> main's caller             |
                |----------------------------|                               |
                | flags (unused)             |                               |
                |----------------------------|                               |
                | R1 (previous frame)        |>-------------------------------
R1 main      -> |----------------------------| 0 (Offset from R1   <----------
  (entry)       | R31 save                   |    at entry to main)          |
                |----------------------------| -8                            |
                |             .              |                               |
                |             .              |                               |
                |             .              |                               |
                |----------------------------|                               |
                | R0 (link register)         | --> function1's caller (main) |
                |----------------------------|                               |
                | flags (not stored)         |                               |
                |----------------------------|                               |
                | R1 (previous frame)        |>-------------------------------
R1 function1 -> |----------------------------| -144 <-------------------------
  (entry)       | R31 save                   |                               |
                |----------------------------|                               |
                |             .              |                               |
                |             .              |                               |
                |             .              |                               |
                |----------------------------|                               |
                | nothing stored             | (should be function2's caller |
                |----------------------------|  function1)                   |
                | flags (not stored)         |                               |
                |----------------------------|                               |
                | R1 (previous frame)        |>-------------------------------
R1 function2 -> |----------------------------| -288 <-------------------------
  (entry)       | R31 save                   |                               |
                |----------------------------|                               |
                |             .              |                               |
                |             .              |                               |
                |             .              |                               |
                |----------------------------|                               |
                | nothing stored             | would be used if function2    |
                |----------------------------| called anything               |
                | flags (not stored)         |                               |
                |----------------------------|                               |
                | R1 (previous frame)        |>-------------------------------
R1 function2 -> |----------------------------| -368 (running)
                |             .              |
                |             .              |
                |                            |

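/* loop.c - nonsense code for testing OProfile */
#include <stdio.h>
#include <stdlib.h>     /* atoi() */

/* Terminal function: it calls nothing, so the compiler never saves LR. */
int function2( int count )
{
    int i, j, k;    /* j and k are never initialized; only the CPU time matters */

    for ( i=0; i<count; i++ )
    {
        k = k + j * i;
    }
    return k;
}

/* Calls function2, then burns a little time of its own. */
int function1( int count )
{
    int i, j;

    i = function2( count );
    for ( j=0; j<1000; j++ ) i++;
    return i;
}

int main( int argc, char *argv[] )
{
    int count, i, j, k;     /* k is never initialized either */

    /*
     * argc is always at least 1, so this branch is always taken and the
     * program must be run with an argument; "argc > 1" would be the safe
     * test. The condition is left as it was when the listing below was
     * generated.
     */
    if ( argc > 0 )
        count = atoi( argv[1] );
    else
        count = 10000;

    for ( i=0; i<count; i++ )
    {
        j = function1( 10000 );
        for( j=0; j<10000; j++ ) k = k + j;
    }
    return 0;
}
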
loop.64: file format elf64-powerpc
... deleted ...
00000000100005b0 <.function2>:
100005b0: fb e1 ff f8 std r31,-8(r1)
100005b4: f8 21 ff b1 stdu r1,-80(r1)
100005b8: 7c 3f 0b 78 mr r31,r1
100005bc: 7c 60 1b 78 mr r0,r3
100005c0: 90 1f 00 80 stw r0,128(r31)
100005c4: 38 00 00 00 li r0,0
100005c8: 90 1f 00 38 stw r0,56(r31)
100005cc: 48 00 00 2c b 100005f8 <.function2+0x48>
100005d0: 81 3f 00 34 lwz r9,52(r31)
100005d4: 80 1f 00 38 lwz r0,56(r31)
100005d8: 7c 09 01 d6 mullw r0,r9,r0
100005dc: 7c 09 07 b4 extsw r9,r0
100005e0: 80 1f 00 30 lwz r0,48(r31)
100005e4: 7c 00 4a 14 add r0,r0,r9
100005e8: 90 1f 00 30 stw r0,48(r31)
100005ec: 81 3f 00 38 lwz r9,56(r31)
100005f0: 38 09 00 01 addi r0,r9,1
100005f4: 90 1f 00 38 stw r0,56(r31)
100005f8: 80 1f 00 38 lwz r0,56(r31)
100005fc: 81 3f 00 80 lwz r9,128(r31)
10000600: 7f 80 48 00 cmpw cr7,r0,r9
10000604: 41 9c ff cc blt+ cr7,100005d0 <.function2+0x20>
10000608: 80 1f 00 30 lwz r0,48(r31)
1000060c: 7c 00 07 b4 extsw r0,r0
10000610: 7c 03 03 78 mr r3,r0
10000614: e8 21 00 00 ld r1,0(r1)
10000618: eb e1 ff f8 ld r31,-8(r1)
1000061c: 4e 80 00 20 blr
...
10000628: 80 01 00 01 lwz r0,1(r1)
000000001000062c <.function1>:
1000062c: 7c 08 02 a6 mflr r0
10000630: fb e1 ff f8 std r31,-8(r1)
10000634: f8 01 00 10 std r0,16(r1)
10000638: f8 21 ff 71 stdu r1,-144(r1)
1000063c: 7c 3f 0b 78 mr r31,r1
10000640: 7c 60 1b 78 mr r0,r3
10000644: 90 1f 00 c0 stw r0,192(r31)
10000648: 80 1f 00 c0 lwz r0,192(r31)
1000064c: 7c 00 07 b4 extsw r0,r0
10000650: 7c 03 03 78 mr r3,r0
10000654: 4b ff ff 5d bl 100005b0 <.function2>
10000658: 7c 60 1b 78 mr r0,r3
1000065c: 90 1f 00 74 stw r0,116(r31)
10000660: 38 00 00 00 li r0,0
10000664: 90 1f 00 70 stw r0,112(r31)
10000668: 48 00 00 1c b 10000684 <.function1+0x58>
1000066c: 81 3f 00 74 lwz r9,116(r31)
10000670: 38 09 00 01 addi r0,r9,1
10000674: 90 1f 00 74 stw r0,116(r31)
10000678: 81 3f 00 70 lwz r9,112(r31)
1000067c: 38 09 00 01 addi r0,r9,1
10000680: 90 1f 00 70 stw r0,112(r31)
10000684: 80 1f 00 70 lwz r0,112(r31)
10000688: 2f 80 03 e7 cmpwi cr7,r0,999
1000068c: 40 9d ff e0 ble+ cr7,1000066c <.function1+0x40>
10000690: 80 1f 00 74 lwz r0,116(r31)
10000694: 7c 00 07 b4 extsw r0,r0
10000698: 7c 03 03 78 mr r3,r0
1000069c: e8 21 00 00 ld r1,0(r1)
100006a0: e8 01 00 10 ld r0,16(r1)
100006a4: 7c 08 03 a6 mtlr r0
100006a8: eb e1 ff f8 ld r31,-8(r1)
100006ac: 4e 80 00 20 blr
100006b0: 00 00 00 00 .long 0x0
100006b4: 00 00 00 01 .long 0x1
100006b8: 80 01 00 01 lwz r0,1(r1)
00000000100006bc <.main>:
100006bc: 7c 08 02 a6 mflr r0
100006c0: fb e1 ff f8 std r31,-8(r1)
100006c4: f8 01 00 10 std r0,16(r1)
100006c8: f8 21 ff 71 stdu r1,-144(r1)
100006cc: 7c 3f 0b 78 mr r31,r1
100006d0: 7c 60 1b 78 mr r0,r3
100006d4: f8 9f 00 c8 std r4,200(r31)
100006d8: 90 1f 00 c0 stw r0,192(r31)
100006dc: 80 1f 00 c0 lwz r0,192(r31)
100006e0: 2f 80 00 00 cmpwi cr7,r0,0
100006e4: 40 9d 00 28 ble- cr7,1000070c <.main+0x50>
100006e8: e9 3f 00 c8 ld r9,200(r31)
100006ec: 39 29 00 08 addi r9,r9,8
100006f0: e8 09 00 00 ld r0,0(r9)
100006f4: 7c 03 03 78 mr r3,r0
100006f8: 4b ff fc f9 bl 100003f0 <._init+0x38>
100006fc: e8 41 00 28 ld r2,40(r1)
10000700: 7c 60 1b 78 mr r0,r3
10000704: 90 1f 00 7c stw r0,124(r31)
10000708: 48 00 00 0c b 10000714 <.main+0x58>
1000070c: 38 00 27 10 li r0,10000
10000710: 90 1f 00 7c stw r0,124(r31)
10000714: 38 00 00 00 li r0,0
10000718: 90 1f 00 78 stw r0,120(r31)
1000071c: 48 00 00 54 b 10000770 <.main+0xb4>
10000720: 38 60 27 10 li r3,10000
10000724: 4b ff ff 09 bl 1000062c <.function1>
10000728: 7c 60 1b 78 mr r0,r3
1000072c: 90 1f 00 74 stw r0,116(r31)
10000730: 38 00 00 00 li r0,0
10000734: 90 1f 00 74 stw r0,116(r31)
10000738: 48 00 00 20 b 10000758 <.main+0x9c>
1000073c: 81 3f 00 70 lwz r9,112(r31)
10000740: 80 1f 00 74 lwz r0,116(r31)
10000744: 7c 09 02 14 add r0,r9,r0
10000748: 90 1f 00 70 stw r0,112(r31)
1000074c: 81 3f 00 74 lwz r9,116(r31)
10000750: 38 09 00 01 addi r0,r9,1
10000754: 90 1f 00 74 stw r0,116(r31)
10000758: 80 1f 00 74 lwz r0,116(r31)
1000075c: 2f 80 27 0f cmpwi cr7,r0,9999
10000760: 40 9d ff dc ble+ cr7,1000073c <.main+0x80>
10000764: 81 3f 00 78 lwz r9,120(r31)
10000768: 38 09 00 01 addi r0,r9,1
1000076c: 90 1f 00 78 stw r0,120(r31)
10000770: 80 1f 00 78 lwz r0,120(r31)
10000774: 81 3f 00 7c lwz r9,124(r31)
10000778: 7f 80 48 00 cmpw cr7,r0,r9
1000077c: 41 9c ff a4 blt+ cr7,10000720 <.main+0x64>
10000780: 38 00 00 00 li r0,0
10000784: 7c 03 03 78 mr r3,r0
10000788: e8 21 00 00 ld r1,0(r1)
1000078c: e8 01 00 10 ld r0,16(r1)
10000790: 7c 08 03 a6 mtlr r0
10000794: eb e1 ff f8 ld r31,-8(r1)
10000798: 4e 80 00 20 blr
1000079c: 00 00 00 00 .long 0x0
100007a0: 00 00 00 01 .long 0x1
100007a4: 80 01 00 01 lwz r0,1(r1)
100007a8: 60 00 00 00 nop
100007ac: 60 00 00 00 nop
... deleted ...