[PATCH V2 0/6] perf: New conditional branch filter
Anshuman Khandual
khandual at linux.vnet.ibm.com
Fri Aug 30 14:24:44 EST 2013
This patchset is a re-spin of the original branch stack sampling
patchset, which introduced the new PERF_SAMPLE_BRANCH_COND filter. It also
enables SW-based branch filtering on PPC64 platforms that support branch
stack sampling. With this enablement, branch filter support on PPC64 has
been extended to cover all of the combinations shown below, each exercised
with a sample test application program.
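As a rough illustration of what the 'cond' filter means at the
perf_event_open() level, the sketch below fills a struct perf_event_attr
approximately the way "perf record -e branch-misses:u -j cond" would. The
helper name init_cond_branch_attr() and the sample period are made up for
this example, and it assumes kernel headers that already define
PERF_SAMPLE_BRANCH_COND (i.e. headers with this series applied).

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: set up sampling of
 * user-space branch misses with the branch stack restricted to
 * conditional branches.
 */
void init_cond_branch_attr(struct perf_event_attr *attr,
                           unsigned long long sample_period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);

    /* The "branch-misses" generic hardware event. */
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_BRANCH_MISSES;
    attr->sample_period = sample_period;

    /* Ask for the branch stack in every sample. */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;

    /* User-level branches only, filtered down to conditional ones. */
    attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
                               PERF_SAMPLE_BRANCH_COND;

    /* Roughly what the ":u" event modifier requests. */
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;

    /* Start disabled; the caller enables the event when ready. */
    attr->disabled = 1;
}
```

The filled attr would then be handed to the perf_event_open system call
(through syscall(), since glibc provides no wrapper for it).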
(1) perf record -e branch-misses:u -b ./cprog
# Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol
# ........ ....... .................... ..................... .................... .....................
#
4.42% cprog cprog [k] sw_4_2 cprog [k] lr_addr
4.41% cprog cprog [k] symbol2 cprog [k] hw_1_2
4.41% cprog cprog [k] ctr_addr cprog [k] sw_4_1
4.41% cprog cprog [k] lr_addr cprog [k] sw_4_2
4.41% cprog cprog [k] sw_4_2 cprog [k] callme
4.41% cprog cprog [k] symbol1 cprog [k] hw_1_1
4.41% cprog cprog [k] success_3_1_3 cprog [k] sw_3_1
2.43% cprog cprog [k] sw_4_1 cprog [k] ctr_addr
2.43% cprog cprog [k] hw_1_2 cprog [k] symbol2
2.43% cprog cprog [k] callme cprog [k] hw_1_2
2.43% cprog cprog [k] address1 cprog [k] back1
2.43% cprog cprog [k] back1 cprog [k] callme
2.43% cprog cprog [k] hw_2_1 cprog [k] address1
2.43% cprog cprog [k] sw_3_1_1 cprog [k] sw_3_1
2.43% cprog cprog [k] sw_3_1_2 cprog [k] sw_3_1
2.43% cprog cprog [k] sw_3_1_3 cprog [k] sw_3_1
2.43% cprog cprog [k] sw_3_1 cprog [k] sw_3_1_1
2.43% cprog cprog [k] sw_3_1 cprog [k] sw_3_1_2
2.43% cprog cprog [k] sw_3_1 cprog [k] sw_3_1_3
2.43% cprog cprog [k] callme cprog [k] sw_3_1
2.43% cprog cprog [k] callme cprog [k] sw_4_2
2.43% cprog cprog [k] hw_1_1 cprog [k] symbol1
2.43% cprog cprog [k] callme cprog [k] hw_1_1
2.42% cprog cprog [k] sw_3_1 cprog [k] callme
1.99% cprog cprog [k] success_3_1_1 cprog [k] sw_3_1
1.99% cprog cprog [k] sw_3_1 cprog [k] success_3_1_1
1.99% cprog cprog [k] address2 cprog [k] back2
1.99% cprog cprog [k] hw_2_2 cprog [k] address2
1.99% cprog cprog [k] back2 cprog [k] callme
1.99% cprog cprog [k] callme cprog [k] main
1.99% cprog cprog [k] sw_3_1 cprog [k] success_3_1_3
1.99% cprog cprog [k] hw_1_1 cprog [k] callme
1.99% cprog cprog [k] sw_3_2 cprog [k] callme
1.99% cprog cprog [k] callme cprog [k] sw_3_2
1.99% cprog cprog [k] success_3_1_2 cprog [k] sw_3_1
1.99% cprog cprog [k] sw_3_1 cprog [k] success_3_1_2
1.99% cprog cprog [k] hw_1_2 cprog [k] callme
1.99% cprog cprog [k] sw_4_1 cprog [k] callme
0.02% cprog [unknown] [k] 0xf7ba2328 [unknown] [k] 0xf7ba2320
0.00% cprog libc-2.11.2.so [k] _IO_file_overflow libc-2.11.2.so [k] _IO_file_overflow
0.00% cprog libc-2.11.2.so [k] _IO_file_xsputn libc-2.11.2.so [k] _IO_file_overflow
0.00% cprog cprog [k] callme cprog [k] hw_2_2
PMU filters
-----------
(2) perf record -e branch-misses:u -j any_call ./cprog
# Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol
# ........ ....... .................... ....................... .................... ......................
#
7.82% cprog cprog [k] sw_3_1 cprog [k] success_3_1_2
6.88% cprog cprog [k] sw_3_1 cprog [k] sw_3_1_2
6.88% cprog cprog [k] hw_1_1 cprog [k] symbol1
5.88% cprog cprog [k] sw_3_1 cprog [k] sw_3_1_1
5.88% cprog cprog [k] callme cprog [k] hw_1_1
5.88% cprog cprog [k] sw_3_1 cprog [k] success_3_1_1
5.88% cprog cprog [k] sw_3_1 cprog [k] sw_3_1_3
5.88% cprog cprog [k] callme cprog [k] hw_1_2
5.88% cprog cprog [k] hw_1_2 cprog [k] symbol2
5.88% cprog cprog [k] sw_4_2 cprog [k] lr_addr
5.88% cprog cprog [k] callme cprog [k] sw_4_2
4.88% cprog cprog [k] sw_3_1 cprog [k] success_3_1_3
4.88% cprog cprog [k] callme cprog [k] sw_3_2
4.88% cprog cprog [k] callme cprog [k] hw_2_2
3.94% cprog cprog [k] callme cprog [k] sw_3_1
3.94% cprog cprog [k] callme cprog [k] hw_2_1
2.94% cprog cprog [k] main cprog [k] callme
2.94% cprog cprog [k] sw_4_1 cprog [k] ctr_addr
2.94% cprog cprog [k] callme cprog [k] sw_4_1
0.01% cprog [unknown] [k] 0xf79076c4 [unknown] [k] 0xf78f22c0
0.00% cprog libc-2.11.2.so [k] _IO_file_doallocate libc-2.11.2.so [k] _IO_setb
0.00% cprog libc-2.11.2.so [k] _IO_file_doallocate libc-2.11.2.so [k] mmap
0.00% cprog libc-2.11.2.so [k] _IO_file_xsputn libc-2.11.2.so [k] _IO_default_xsputn
0.00% cprog libc-2.11.2.so [k] _IO_file_overflow libc-2.11.2.so [k] _IO_do_write
0.00% cprog ld-2.11.2.so [k] malloc [unknown] [k] 0xf790b380
(3) perf record -e branch-misses:u -j cond ./cprog
# Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol
# ........ ....... .................... .................. .................... .......................
#
24.85% cprog [unknown] [k] 00000000 cprog [k] callme
15.71% cprog cprog [k] sw_3_1 cprog [k] sw_3_1
7.14% cprog cprog [k] sw_4_2 cprog [k] lr_addr
6.57% cprog [unknown] [k] 00000000 cprog [k] sw_4_2
4.57% cprog cprog [k] hw_2_2 cprog [k] callme
4.57% cprog cprog [k] sw_3_1_1 cprog [k] sw_3_1
4.57% cprog cprog [k] sw_4_1 cprog [k] ctr_addr
4.57% cprog [unknown] [k] 00000000 cprog [k] sw_4_1
4.57% cprog cprog [k] main cprog [k] hw_1_1
4.57% cprog cprog [k] hw_1_2 cprog [k] hw_1_2
4.57% cprog [unknown] [k] 00000000 cprog [k] main
4.57% cprog cprog [k] hw_2_1 cprog [k] callme
4.57% cprog cprog [k] sw_3_1_3 cprog [k] sw_3_1
4.57% cprog cprog [k] sw_3_1_2 cprog [k] sw_3_1
0.01% cprog [unknown] [k] 0xf7aa25dc [unknown] [k] 0xf7aa27e4
0.00% cprog libc-2.11.2.so [k] _IO_doallocbuf libc-2.11.2.so [k] _IO_file_doallocate
0.00% cprog [unknown] [k] 00000000 libc-2.11.2.so [k] _IO_file_doallocate
0.00% cprog [unknown] [k] 00000000 libc-2.11.2.so [k] _IO_file_stat
SW filters
----------
(4) perf record -e branch-misses:u -j any_ret ./cprog
# Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol
# ........ ....... .................... ................. .................... ..............
#
7.91% cprog cprog [k] symbol1 cprog [k] hw_1_1
7.91% cprog cprog [k] success_3_1_3 cprog [k] sw_3_1
7.91% cprog cprog [k] ctr_addr cprog [k] sw_4_1
7.91% cprog cprog [k] lr_addr cprog [k] sw_4_2
7.91% cprog cprog [k] symbol2 cprog [k] hw_1_2
7.90% cprog cprog [k] sw_4_2 cprog [k] callme
4.34% cprog cprog [k] success_3_1_2 cprog [k] sw_3_1
4.33% cprog cprog [k] sw_4_1 cprog [k] callme
4.33% cprog cprog [k] hw_1_2 cprog [k] callme
4.33% cprog cprog [k] success_3_1_1 cprog [k] sw_3_1
4.33% cprog cprog [k] sw_3_2 cprog [k] callme
4.33% cprog cprog [k] back2 cprog [k] callme
4.33% cprog cprog [k] callme cprog [k] main
4.33% cprog cprog [k] hw_1_1 cprog [k] callme
3.58% cprog cprog [k] sw_3_1 cprog [k] callme
3.58% cprog cprog [k] sw_3_1_1 cprog [k] sw_3_1
3.58% cprog cprog [k] sw_3_1_2 cprog [k] sw_3_1
3.58% cprog cprog [k] back1 cprog [k] callme
3.57% cprog cprog [k] sw_3_1_3 cprog [k] sw_3_1
0.00% cprog [unknown] [k] 0xf7abacf4 [unknown] [k] 0xf7abae40
(5) perf record -e branch-misses:u -j ind_call ./cprog
# Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol
# ........ ....... .................... ............. .................... .............
#
63.56% cprog cprog [k] sw_4_2 cprog [k] lr_addr
36.44% cprog cprog [k] sw_4_1 cprog [k] ctr_addr
Mixed filters
-------------
(6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
Error:
The perf.data file has no samples!
NOTE: This is expected. The HW filter keeps only the branches which are calls, and the SW
filter then looks for return branches within that set. The two filters are mutually exclusive,
so the final profile contains no samples.
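The two-stage behaviour described in the note can be modelled with a
small sketch. This is a simplified model and not the kernel code: the
branch_class flags, struct branch_rec and apply_filters() are hypothetical
names, and it only covers the case of one HW filter combined with one SW
filter.

```c
#include <stddef.h>

/* Hypothetical branch classes, for illustration only. */
enum branch_class {
    BR_CALL     = 1u << 0,
    BR_RETURN   = 1u << 1,
    BR_IND_CALL = 1u << 2,
    BR_COND     = 1u << 3
};

/* One captured branch: source symbol, target symbol, class bits. */
struct branch_rec {
    const char *from;
    const char *to;
    unsigned int classes;
};

/*
 * Stage 1 models the HW (PMU) filter: only records matching hw_filter
 * are captured at all. Stage 2 models the SW filter: of those, only
 * records that also match sw_filter are kept. Survivors are compacted
 * to the front of recs[] and their count is returned.
 */
size_t apply_filters(struct branch_rec *recs, size_t n,
                     unsigned int hw_filter, unsigned int sw_filter)
{
    size_t kept = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!(recs[i].classes & hw_filter))
            continue;   /* never captured by the PMU */
        if (!(recs[i].classes & sw_filter))
            continue;   /* captured, then dropped in SW */
        recs[kept++] = recs[i];
    }
    return kept;
}
```

With HW = calls and SW = returns nothing can survive, because no record in
this model is both a call and a return. With HW = calls and SW = indirect
calls, only the indirect calls remain, which is the shape of the result
in case (7).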
(7) perf record -e branch-misses:u -j any_call,ind_call ./cprog
# Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol
# ........ ....... .................... .............. .................... ..............
#
66.69% cprog cprog [k] sw_4_2 cprog [k] lr_addr
33.31% cprog cprog [k] sw_4_1 cprog [k] ctr_addr
0.00% cprog [unknown] [k] 0x0fe7f264 [unknown] [k] 0x0ff926d0
(8) perf record -e branch-misses:u -j any_call,any_ret,ind_call ./cprog
Error:
The perf.data file has no samples!
(9) perf record -e branch-misses:u -j cond,any_ret ./cprog
# Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol
# ........ ....... .................... .............. .................... .......................
#
46.01% cprog [unknown] [k] 00000000 cprog [k] callme
13.54% cprog [unknown] [k] 00000000 cprog [k] sw_4_2
8.18% cprog cprog [k] sw_3_1_2 cprog [k] sw_3_1
8.07% cprog [unknown] [k] 00000000 cprog [k] main
8.07% cprog cprog [k] sw_3_1_1 cprog [k] sw_3_1
8.07% cprog cprog [k] sw_3_1_3 cprog [k] sw_3_1
8.07% cprog [unknown] [k] 00000000 cprog [k] sw_4_1
0.00% cprog [unknown] [k] 00000000 [unknown] [k] 0xf7c1480c
0.00% cprog libc-2.11.2.so [k] mmap libc-2.11.2.so [k] _IO_file_doallocate
(10) perf record -e branch-misses:u -j cond,ind_call ./cprog
# Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol
# ........ ....... .................... .............. .................... ..............
#
48.11% cprog [unknown] [k] 00000000 cprog [k] callme
13.52% cprog [unknown] [k] 00000000 cprog [k] sw_4_2
12.42% cprog cprog [k] sw_4_2 cprog [k] lr_addr
8.65% cprog [unknown] [k] 00000000 cprog [k] main
8.65% cprog cprog [k] sw_4_1 cprog [k] ctr_addr
8.65% cprog [unknown] [k] 00000000 cprog [k] sw_4_1
0.00% cprog [unknown] [k] 00000000 [unknown] [k] 0xf7a4581c
(11) perf record -e branch-misses:u -j cond,any_ret,ind_call ./cprog
# Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol
# ........ ....... .................... .............. .................... .................
#
45.91% cprog [unknown] [k] 00000000 cprog [k] callme
13.26% cprog [unknown] [k] 00000000 cprog [k] sw_4_2
8.17% cprog cprog [k] sw_3_1_3 cprog [k] sw_3_1
8.17% cprog [unknown] [k] 00000000 cprog [k] sw_4_1
8.17% cprog cprog [k] sw_3_1_2 cprog [k] sw_3_1
8.17% cprog [unknown] [k] 00000000 cprog [k] main
8.16% cprog cprog [k] sw_3_1_1 cprog [k] sw_3_1
0.00% cprog [unknown] [k] 00000000 [unknown] [k] 0xf7f87704
0.00% cprog [unknown] [k] 00000000 libc-2.11.2.so [k] _IO_file_sync
Test application program
========================
(1) Makefile:
--------------------------------------------
all: sample.o cprog of.cprog of.sample
sample.o: sample.s
	as -o sample.o sample.s
cprog: cprog.c sample.o
	gcc -o cprog cprog.c sample.o
of.sample: sample.o
	objdump -d sample.o > of.sample
of.cprog: cprog
	objdump -d cprog > of.cprog
clean:
	rm sample.o cprog of.sample of.cprog
---------------------------------------------
(2) cprog.c
---------------------------------------------
#include <stdio.h>
#define LOOP_COUNT 100000
extern void callme(void);
int main(int argc, char *argv[])
{
	int i;
	for (i = 0; i < LOOP_COUNT; i++)
		callme();
	printf("end");
	return 0;
}
---------------------------------------------
(3) sample.s
---------------------------------------------
# r25, r26, r27 will be used as first level, second level
# and third level stack for LR. Registers r20, r21, r22, r23
# and r24 will be used for general programming purposes.
.data
msg:
.string "BHRB filter tests\n"
len = . - msg
msg_1_1:
.string "Test: hw_1_1\n"
len_1_1 = 13
msg_1_2:
.string "Test: hw_1_2\n"
len_1_2 = 13
msg_2_1:
.string "Test: hw_2_1\n"
len_2_1 = 13
msg_2_2:
.string "Test: hw_2_2\n"
len_2_2 = 13
msg_3_1:
.string "Test: sw_3_1\n"
len_3_1 = 13
msg_3_1_1:
.string "Test: sw_3_1_1\n"
len_3_1_1 = 15
msg_3_1_2:
.string "Test: sw_3_1_2\n"
len_3_1_2 = 15
msg_3_1_3:
.string "Test: sw_3_1_3\n"
len_3_1_3 = 15
msg_3_2:
.string "Test: sw_3_2\n"
len_3_2 = 13
msg_4_1:
.string "Test: sw_4_1\n"
len_4_1 = 13
msg_4_2:
.string "Test: sw_4_2\n"
len_4_2 = 13
hw_3_1_1_passed:
.string "\thw_3_1_1_passed\n\n"
len_hw_3_1_1_passed = 18
hw_3_1_2_passed:
.string "\thw_3_1_2_passed\n\n"
len_hw_3_1_2_passed = 18
hw_3_1_3_passed:
.string "\thw_3_1_3_passed\n\n"
len_hw_3_1_3_passed = 18
hw_2_1_passed:
.string "\thw_2_1_passed\n\n"
len_hw_2_1_passed = 16
hw_2_2_passed:
.string "\thw_2_2_passed\n\n"
len_hw_2_2_passed = 16
hw_1_1_passed:
.string "\thw_1_1_passed\n\n"
len_hw_1_1_passed = 16
hw_1_2_passed:
.string "\thw_1_2_passed\n\n"
len_hw_1_2_passed = 16
hw_4_1_passed:
.string "\thw_4_1_passed\n\n"
len_hw_4_1_passed = 16
hw_4_2_passed:
.string "\thw_4_2_passed\n\n"
len_hw_4_2_passed = 16
msg_error:
.string "\tError\n"
len_error = 7
.text
.global callme
.global hw_1_1
.global hw_1_2
.global hw_2_1
.global hw_2_2
# HW filter test symbols
symbol1:
# Print "hw_1_1_passed"
li 0, 4
li 3, 1
lis 4, hw_1_1_passed@ha
addi 4, 4, hw_1_1_passed@l
li 5, len_hw_1_1_passed
sc
blr # PERF_SAMPLE_BRANCH_ANY_RET
hw_1_1:
# Save LR - second level
mflr 26
# Print "hw_1_1 called"
li 0, 4
li 3, 1
lis 4, msg_1_1@ha
addi 4, 4, msg_1_1@l
li 5, len_1_1
sc
bl symbol1 # PERF_SAMPLE_BRANCH_ANY_CALL
# Restore LR
mtlr 26
blr # PERF_SAMPLE_BRANCH_ANY_RET
symbol2:
# Print "Symbol2 taken"
li 0, 4
li 3, 1
lis 4, hw_1_2_passed@ha
addi 4, 4, hw_1_2_passed@l
li 5, len_hw_1_2_passed
sc
blr # PERF_SAMPLE_BRANCH_ANY_RET
hw_1_2:
# Save LR - second level
mflr 26
# Print "hw_1_2 called"
li 0, 4
li 3, 1
lis 4, msg_1_2@ha
addi 4, 4, msg_1_2@l
li 5, len_1_2
sc
li 4,20
cmpi 0,4,20
bcl 12, 4*cr0+2, symbol2 # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
mtlr 26
blr # PERF_SAMPLE_BRANCH_ANY_RET
# HW filter test
address1:
# Print "hw_2_1_passed"
li 0, 4
li 3, 1
lis 4, hw_2_1_passed@ha
addi 4, 4, hw_2_1_passed@l
li 5, len_hw_2_1_passed
sc
b back1 # PERF_SAMPLE_BRANCH_ANY
hw_2_1:
# Print "hw_2_1 called"
li 0, 4
li 3, 1
lis 4, msg_2_1@ha
addi 4, 4, msg_2_1@l
li 5, len_2_1
sc
# Simple conditional branch (equal)
li 20, 12
cmpi 3, 20, 12
bc 12, 4*cr3+2, address1 # PERF_SAMPLE_BRANCH_COND
back1:
blr # PERF_SAMPLE_BRANCH_ANY_RET
address2:
# Print "hw_2_2_passed"
li 0, 4
li 3, 1
lis 4, hw_2_2_passed@ha
addi 4, 4, hw_2_2_passed@l
li 5, len_hw_2_2_passed
sc
b back2 # PERF_SAMPLE_BRANCH_ANY
hw_2_2:
# Print "hw_2_2 called"
li 0, 4
li 3, 1
lis 4, msg_2_2@ha
addi 4, 4, msg_2_2@l
li 5, len_2_2
sc
# Simple conditional branch (less than)
li 20, 12
cmpi 4, 20, 20
bc 12, 4*cr4+0, address2 # PERF_SAMPLE_BRANCH_COND
back2:
blr # PERF_SAMPLE_BRANCH_ANY_RET
# SW filter test symbols
sw_3_1_1:
# Print "Test: sw_3_1_1"
li 0, 4
li 3, 1
lis 4, msg_3_1_1@ha
addi 4, 4, msg_3_1_1@l
li 5, len_3_1_1
sc
li 22,0
# Test the condition and return
li 21, 10
cmpi 0, 21, 10
bclr 12, 2 # PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
# Should not have come here
li 0, 4
li 3, 1
lis 4, msg_error@ha
addi 4, 4, msg_error@l
li 5, len_error
sc
# Mark the error
li 22, 1
# Safe fall back
blr # PERF_SAMPLE_BRANCH_ANY_RET
sw_3_1_2:
# Print "Test: sw_3_1_2"
li 0, 4
li 3, 1
lis 4, msg_3_1_2@ha
addi 4, 4, msg_3_1_2@l
li 5, len_3_1_2
sc
li 23, 0
# Test the condition and return
li 21, 10
cmpi 0, 21, 20
bclr 12, 0 # PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
# Should not have come here
li 0, 4
li 3, 1
lis 4, msg_error@ha
addi 4, 4, msg_error@l
li 5, len_error
sc
# Mark the error
li 23, 1
# Safe fall back
blr # PERF_SAMPLE_BRANCH_ANY_RET
sw_3_1_3:
# Print "Test: sw_3_1_3"
li 0, 4
li 3, 1
lis 4, msg_3_1_3@ha
addi 4, 4, msg_3_1_3@l
li 5, len_3_1_3
sc
li 24, 0
# Test the condition and return
li 21, 10
cmpi 0, 21, 5
bclr 12, 1 # PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
# Mark the error
li 24, 1
# Should not have come here
li 0, 4
li 3, 1
lis 4, msg_error@ha
addi 4, 4, msg_error@l
li 5, len_error
sc
# Safe fall back
blr # PERF_SAMPLE_BRANCH_ANY_RET
success_3_1_1:
li 0, 4
li 3, 1
lis 4, hw_3_1_1_passed@ha
addi 4, 4, hw_3_1_1_passed@l
li 5, len_hw_3_1_1_passed
sc
blr
success_3_1_2:
li 0, 4
li 3, 1
lis 4, hw_3_1_2_passed@ha
addi 4, 4, hw_3_1_2_passed@l
li 5, len_hw_3_1_2_passed
sc
blr
success_3_1_3:
li 0, 4
li 3, 1
lis 4, hw_3_1_3_passed@ha
addi 4, 4, hw_3_1_3_passed@l
li 5, len_hw_3_1_3_passed
sc
blr
sw_3_1:
# Save LR
mflr 26
# Print "Test: sw_3_1"
li 0, 4
li 3, 1
lis 4, msg_3_1@ha
addi 4, 4, msg_3_1@l
li 5, len_3_1
sc
# Equal comparison condition
bl sw_3_1_1 # PERF_SAMPLE_BRANCH_ANY_CALL
cmpi 0, 22, 0
bcl 12, 2, success_3_1_1 # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
# LT comparison condition
bl sw_3_1_2 # PERF_SAMPLE_BRANCH_ANY_CALL
cmpi 0, 23, 0
bcl 12, 2, success_3_1_2 # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
# GT comparison condition
bl sw_3_1_3 # PERF_SAMPLE_BRANCH_ANY_CALL
cmpi 0, 24, 0
bcl 12, 2, success_3_1_3 # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
mtlr 26
blr # PERF_SAMPLE_BRANCH_ANY_RET
sw_3_2:
# Print "Test: sw_3_2"
li 0, 4
li 3, 1
lis 4, msg_3_2@ha
addi 4, 4, msg_3_2@l
li 5, len_3_1
sc
# FIXME: Anything more here ?
blr # PERF_SAMPLE_BRANCH_ANY_RET
# Indirect call tests
# CTR
ctr_addr:
# Print "bcctr taken"
li 0, 4
li 3, 1
lis 4, hw_4_1_passed@ha
addi 4, 4, hw_4_1_passed@l
li 5, len_hw_4_1_passed
sc
blr # PERF_SAMPLE_BRANCH_ANY_RET
sw_4_1:
# Save LR
mflr 26
# Print "sw_4_1 called"
li 0, 4
li 3, 1
lis 4, msg_4_1@ha
addi 4, 4, msg_4_1@l
li 5, len_4_1
sc
# Save address in CTR
lis 20, ctr_addr@ha
addi 20, 20, ctr_addr@l
mtctr 20
# Compare and jump to CTR
li 21, 10
cmpi 0, 21, 10
bcctrl 12, 4*cr0+2 # PERF_SAMPLE_BRANCH_IND_CALL
mtlr 26
blr # PERF_SAMPLE_BRANCH_ANY_RET
# LR
lr_addr:
# Print "bclrl taken"
li 0, 4
li 3, 1
lis 4, hw_4_2_passed@ha
addi 4, 4, hw_4_2_passed@l
li 5, len_hw_4_2_passed
sc
blr # PERF_SAMPLE_BRANCH_ANY_RET
sw_4_2:
# Save LR
mflr 26
# Print "Test: sw_4_2"
li 0, 4
li 3, 1
lis 4, msg_4_2@ha
addi 4, 4, msg_4_2@l
li 5, len_4_2
sc
# Save address in LR
lis 20, lr_addr@ha
addi 20, 20, lr_addr@l
mtlr 20
# Compare and jump to LR
li 21, 10
cmpi 0, 21, 10
bclrl 12, 4*cr0+2 # PERF_SAMPLE_BRANCH_IND_CALL
# Restore LR
mtlr 26
blr # PERF_SAMPLE_BRANCH_ANY_RET
callme:
# Save LR
mflr 25
# Print "Branch filter Test"
li 0, 4
li 3, 1
lis 4, msg@ha
addi 4, 4, msg@l
li 5, len
sc
# PERF_SAMPLE_BRANCH_ANY_CALL
bl hw_1_1 # PERF_SAMPLE_BRANCH_ANY_CALL
bl hw_1_2 # PERF_SAMPLE_BRANCH_ANY_CALL
# PERF_SAMPLE_BRANCH_COND
bl hw_2_1 # PERF_SAMPLE_BRANCH_ANY_CALL
bl hw_2_2 # PERF_SAMPLE_BRANCH_ANY_CALL
# PERF_SAMPLE_BRANCH_ANY_RET
bl sw_3_1 # PERF_SAMPLE_BRANCH_ANY_CALL
bl sw_3_2 # PERF_SAMPLE_BRANCH_ANY_CALL
# PERF_SAMPLE_BRANCH_IND_CALL
bl sw_4_1 # PERF_SAMPLE_BRANCH_ANY_CALL
bl sw_4_2 # PERF_SAMPLE_BRANCH_ANY_CALL
# Restore LR
mtlr 25
blr # PERF_SAMPLE_BRANCH_ANY_RET
--------------------------------------------------------------------
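The per-instruction annotations in the listing above (call, return,
indirect call, conditional) can be derived from the instruction encoding.
The sketch below shows one way to do that for 32-bit PowerPC branch
instruction words. It is an illustration only and not the code from this
series: the PPC_BR_* flags and ppc_classify_branch() are hypothetical
names, and the rules (LK=1 means call, bclr without LK means return,
bclrl/bcctrl means indirect call, a BO field other than "branch always"
means conditional) are assumptions chosen to match the comments in
sample.s.

```c
#include <stdint.h>

/* Hypothetical classification flags, for illustration only. */
enum {
    PPC_BR_ANY      = 1u << 0,  /* any branch instruction */
    PPC_BR_CALL     = 1u << 1,  /* LK=1: return address saved in LR */
    PPC_BR_RETURN   = 1u << 2,  /* bclr without LK (e.g. blr) */
    PPC_BR_IND_CALL = 1u << 3,  /* bclrl or bcctrl */
    PPC_BR_COND     = 1u << 4   /* BO is not "branch always" */
};

/* Classify one instruction word; returns 0 if it is not a branch. */
unsigned int ppc_classify_branch(uint32_t insn)
{
    uint32_t opcode = insn >> 26;           /* primary opcode */
    uint32_t bo = (insn >> 21) & 0x1f;      /* BO field of bc forms */
    uint32_t xo = (insn >> 1) & 0x3ff;      /* extended opcode (XL-form) */
    uint32_t lk = insn & 1;                 /* link bit */
    unsigned int flags = PPC_BR_ANY;

    switch (opcode) {
    case 18:                                /* b, bl */
        break;
    case 16:                                /* bc, bcl */
        if ((bo & 0x14) != 0x14)
            flags |= PPC_BR_COND;
        break;
    case 19:                                /* bclr[l], bcctr[l] */
        if (xo != 16 && xo != 528)
            return 0;                       /* some other XL-form insn */
        if ((bo & 0x14) != 0x14)
            flags |= PPC_BR_COND;
        if (lk)
            flags |= PPC_BR_IND_CALL;
        else if (xo == 16)
            flags |= PPC_BR_RETURN;
        break;
    default:
        return 0;                           /* not a branch */
    }

    if (lk)
        flags |= PPC_BR_CALL;
    return flags;
}
```

For example, the word 0x4e800020 (blr) is classified as an unconditional
return, while 0x4d820020 (bclr 12, 2) is classified as a conditional
return, matching the annotations on those instructions in the listing.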
Changes in V2
--------------
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes addressing all previous review comments
Anshuman Khandual (6):
perf: New conditional branch filter criteria in branch stack sampling
powerpc, perf: Enable conditional branch filter for POWER8
perf, tool: Conditional branch filter 'cond' added to perf record
x86, perf: Add conditional branch filtering support
perf, documentation: Description for conditional branch filter
powerpc, perf: Enable SW filtering in branch stack sampling framework
arch/powerpc/include/asm/perf_event_server.h | 2 +-
arch/powerpc/perf/core-book3s.c | 200 +++++++++++++++++++++++++--
arch/powerpc/perf/power8-pmu.c | 25 ++--
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +
include/uapi/linux/perf_event.h | 3 +-
tools/perf/Documentation/perf-record.txt | 3 +-
tools/perf/builtin-record.c | 1 +
7 files changed, 216 insertions(+), 23 deletions(-)
--
1.7.11.7