[PATCH V2 0/6] perf: New conditional branch filter

Stephane Eranian eranian at google.com
Fri Aug 30 21:48:09 EST 2013


2013/8/30 Anshuman Khandual <khandual at linux.vnet.ibm.com>
>
>         This patchset is a re-spin of the original branch stack sampling
> patchset, which introduced the new PERF_SAMPLE_BRANCH_COND filter. This patchset
> also enables SW-based branch filtering support for PPC64 platforms which have
> branch stack sampling support. With this enablement, the branch filter support
> for PPC64 platforms has been extended to include all the combinations discussed
> below with a sample test application program.
>
>
I am trying to understand which HW has support for capturing the
branches: PPC7 or PPC8. Then it seems you're saying that only PPC8
has the filtering support. On PPC7 you use the SW filter.
Did I get this right?

I will look at the patch set.
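
For readers following along, here is a minimal sketch of how user space
would request the new filter for case (3) in the quoted cover letter
(branch-misses:u with -j cond). It assumes a <linux/perf_event.h> that
already defines PERF_SAMPLE_BRANCH_COND, i.e. headers from a kernel with
this series applied. The helper name init_cond_branch_attr is made up for
illustration and is not part of perf.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Illustrative helper (hypothetical name): fill a perf_event_attr so that
 * it samples user-space branch misses and records only conditional
 * branches in the branch stack.
 */
static void init_cond_branch_attr(struct perf_event_attr *attr)
{
        assert(attr != NULL);

        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);

        /* -e branch-misses:u */
        attr->type = PERF_TYPE_HARDWARE;
        attr->config = PERF_COUNT_HW_BRANCH_MISSES;
        attr->exclude_kernel = 1;
        attr->exclude_hv = 1;

        /* -j cond: ask for a branch stack, restricted to conditional branches */
        attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
        attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
                                   PERF_SAMPLE_BRANCH_COND;
}
```

The filled-in attr would then be handed to the perf_event_open syscall;
that step is left out here.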

>
> (1) perf record -e branch-misses:u -b ./cprog
> # Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
> # ........  .......  ....................  .....................  ....................  .....................
> #
>      4.42%    cprog  cprog                 [k] sw_4_2             cprog                 [k] lr_addr
>      4.41%    cprog  cprog                 [k] symbol2            cprog                 [k] hw_1_2
>      4.41%    cprog  cprog                 [k] ctr_addr           cprog                 [k] sw_4_1
>      4.41%    cprog  cprog                 [k] lr_addr            cprog                 [k] sw_4_2
>      4.41%    cprog  cprog                 [k] sw_4_2             cprog                 [k] callme
>      4.41%    cprog  cprog                 [k] symbol1            cprog                 [k] hw_1_1
>      4.41%    cprog  cprog                 [k] success_3_1_3      cprog                 [k] sw_3_1
>      2.43%    cprog  cprog                 [k] sw_4_1             cprog                 [k] ctr_addr
>      2.43%    cprog  cprog                 [k] hw_1_2             cprog                 [k] symbol2
>      2.43%    cprog  cprog                 [k] callme             cprog                 [k] hw_1_2
>      2.43%    cprog  cprog                 [k] address1           cprog                 [k] back1
>      2.43%    cprog  cprog                 [k] back1              cprog                 [k] callme
>      2.43%    cprog  cprog                 [k] hw_2_1             cprog                 [k] address1
>      2.43%    cprog  cprog                 [k] sw_3_1_1           cprog                 [k] sw_3_1
>      2.43%    cprog  cprog                 [k] sw_3_1_2           cprog                 [k] sw_3_1
>      2.43%    cprog  cprog                 [k] sw_3_1_3           cprog                 [k] sw_3_1
>      2.43%    cprog  cprog                 [k] sw_3_1             cprog                 [k] sw_3_1_1
>      2.43%    cprog  cprog                 [k] sw_3_1             cprog                 [k] sw_3_1_2
>      2.43%    cprog  cprog                 [k] sw_3_1             cprog                 [k] sw_3_1_3
>      2.43%    cprog  cprog                 [k] callme             cprog                 [k] sw_3_1
>      2.43%    cprog  cprog                 [k] callme             cprog                 [k] sw_4_2
>      2.43%    cprog  cprog                 [k] hw_1_1             cprog                 [k] symbol1
>      2.43%    cprog  cprog                 [k] callme             cprog                 [k] hw_1_1
>      2.42%    cprog  cprog                 [k] sw_3_1             cprog                 [k] callme
>      1.99%    cprog  cprog                 [k] success_3_1_1      cprog                 [k] sw_3_1
>      1.99%    cprog  cprog                 [k] sw_3_1             cprog                 [k] success_3_1_1
>      1.99%    cprog  cprog                 [k] address2           cprog                 [k] back2
>      1.99%    cprog  cprog                 [k] hw_2_2             cprog                 [k] address2
>      1.99%    cprog  cprog                 [k] back2              cprog                 [k] callme
>      1.99%    cprog  cprog                 [k] callme             cprog                 [k] main
>      1.99%    cprog  cprog                 [k] sw_3_1             cprog                 [k] success_3_1_3
>      1.99%    cprog  cprog                 [k] hw_1_1             cprog                 [k] callme
>      1.99%    cprog  cprog                 [k] sw_3_2             cprog                 [k] callme
>      1.99%    cprog  cprog                 [k] callme             cprog                 [k] sw_3_2
>      1.99%    cprog  cprog                 [k] success_3_1_2      cprog                 [k] sw_3_1
>      1.99%    cprog  cprog                 [k] sw_3_1             cprog                 [k] success_3_1_2
>      1.99%    cprog  cprog                 [k] hw_1_2             cprog                 [k] callme
>      1.99%    cprog  cprog                 [k] sw_4_1             cprog                 [k] callme
>      0.02%    cprog  [unknown]             [k] 0xf7ba2328         [unknown]             [k] 0xf7ba2320
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_overflow  libc-2.11.2.so        [k] _IO_file_overflow
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_xsputn    libc-2.11.2.so        [k] _IO_file_overflow
>      0.00%    cprog  cprog                 [k] callme             cprog                 [k] hw_2_2
>
> PMU filters
> -----------
> (2) perf record -e branch-misses:u -j any_call ./cprog
>
> # Overhead  Command  Source Shared Object            Source Symbol  Target Shared Object           Target Symbol
> # ........  .......  ....................  .......................  ....................  ......................
> #
>      7.82%    cprog  cprog                 [k] sw_3_1               cprog                 [k] success_3_1_2
>      6.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] sw_3_1_2
>      6.88%    cprog  cprog                 [k] hw_1_1               cprog                 [k] symbol1
>      5.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] sw_3_1_1
>      5.88%    cprog  cprog                 [k] callme               cprog                 [k] hw_1_1
>      5.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] success_3_1_1
>      5.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] sw_3_1_3
>      5.88%    cprog  cprog                 [k] callme               cprog                 [k] hw_1_2
>      5.88%    cprog  cprog                 [k] hw_1_2               cprog                 [k] symbol2
>      5.88%    cprog  cprog                 [k] sw_4_2               cprog                 [k] lr_addr
>      5.88%    cprog  cprog                 [k] callme               cprog                 [k] sw_4_2
>      4.88%    cprog  cprog                 [k] sw_3_1               cprog                 [k] success_3_1_3
>      4.88%    cprog  cprog                 [k] callme               cprog                 [k] sw_3_2
>      4.88%    cprog  cprog                 [k] callme               cprog                 [k] hw_2_2
>      3.94%    cprog  cprog                 [k] callme               cprog                 [k] sw_3_1
>      3.94%    cprog  cprog                 [k] callme               cprog                 [k] hw_2_1
>      2.94%    cprog  cprog                 [k] main                 cprog                 [k] callme
>      2.94%    cprog  cprog                 [k] sw_4_1               cprog                 [k] ctr_addr
>      2.94%    cprog  cprog                 [k] callme               cprog                 [k] sw_4_1
>      0.01%    cprog  [unknown]             [k] 0xf79076c4           [unknown]             [k] 0xf78f22c0
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_doallocate  libc-2.11.2.so        [k] _IO_setb
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_doallocate  libc-2.11.2.so        [k] mmap
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_xsputn      libc-2.11.2.so        [k] _IO_default_xsputn
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_file_overflow    libc-2.11.2.so        [k] _IO_do_write
>      0.00%    cprog  ld-2.11.2.so          [k] malloc               [unknown]             [k] 0xf790b380
>
>
> (3) perf record -e branch-misses:u -j cond ./cprog
> # Overhead  Command  Source Shared Object       Source Symbol  Target Shared Object            Target Symbol
> # ........  .......  ....................  ..................  ....................  .......................
> #
>     24.85%    cprog  [unknown]             [k] 00000000        cprog                 [k] callme
>     15.71%    cprog  cprog                 [k] sw_3_1          cprog                 [k] sw_3_1
>      7.14%    cprog  cprog                 [k] sw_4_2          cprog                 [k] lr_addr
>      6.57%    cprog  [unknown]             [k] 00000000        cprog                 [k] sw_4_2
>      4.57%    cprog  cprog                 [k] hw_2_2          cprog                 [k] callme
>      4.57%    cprog  cprog                 [k] sw_3_1_1        cprog                 [k] sw_3_1
>      4.57%    cprog  cprog                 [k] sw_4_1          cprog                 [k] ctr_addr
>      4.57%    cprog  [unknown]             [k] 00000000        cprog                 [k] sw_4_1
>      4.57%    cprog  cprog                 [k] main            cprog                 [k] hw_1_1
>      4.57%    cprog  cprog                 [k] hw_1_2          cprog                 [k] hw_1_2
>      4.57%    cprog  [unknown]             [k] 00000000        cprog                 [k] main
>      4.57%    cprog  cprog                 [k] hw_2_1          cprog                 [k] callme
>      4.57%    cprog  cprog                 [k] sw_3_1_3        cprog                 [k] sw_3_1
>      4.57%    cprog  cprog                 [k] sw_3_1_2        cprog                 [k] sw_3_1
>      0.01%    cprog  [unknown]             [k] 0xf7aa25dc      [unknown]             [k] 0xf7aa27e4
>      0.00%    cprog  libc-2.11.2.so        [k] _IO_doallocbuf  libc-2.11.2.so        [k] _IO_file_doallocate
>      0.00%    cprog  [unknown]             [k] 00000000        libc-2.11.2.so        [k] _IO_file_doallocate
>      0.00%    cprog  [unknown]             [k] 00000000        libc-2.11.2.so        [k] _IO_file_stat
>
> SW filters
> ----------
> (4) perf record -e branch-misses:u -j any_ret ./cprog
> # Overhead  Command  Source Shared Object      Source Symbol  Target Shared Object   Target Symbol
> # ........  .......  ....................  .................  ....................  ..............
> #
>      7.91%    cprog  cprog                 [k] symbol1        cprog                 [k] hw_1_1
>      7.91%    cprog  cprog                 [k] success_3_1_3  cprog                 [k] sw_3_1
>      7.91%    cprog  cprog                 [k] ctr_addr       cprog                 [k] sw_4_1
>      7.91%    cprog  cprog                 [k] lr_addr        cprog                 [k] sw_4_2
>      7.91%    cprog  cprog                 [k] symbol2        cprog                 [k] hw_1_2
>      7.90%    cprog  cprog                 [k] sw_4_2         cprog                 [k] callme
>      4.34%    cprog  cprog                 [k] success_3_1_2  cprog                 [k] sw_3_1
>      4.33%    cprog  cprog                 [k] sw_4_1         cprog                 [k] callme
>      4.33%    cprog  cprog                 [k] hw_1_2         cprog                 [k] callme
>      4.33%    cprog  cprog                 [k] success_3_1_1  cprog                 [k] sw_3_1
>      4.33%    cprog  cprog                 [k] sw_3_2         cprog                 [k] callme
>      4.33%    cprog  cprog                 [k] back2          cprog                 [k] callme
>      4.33%    cprog  cprog                 [k] callme         cprog                 [k] main
>      4.33%    cprog  cprog                 [k] hw_1_1         cprog                 [k] callme
>      3.58%    cprog  cprog                 [k] sw_3_1         cprog                 [k] callme
>      3.58%    cprog  cprog                 [k] sw_3_1_1       cprog                 [k] sw_3_1
>      3.58%    cprog  cprog                 [k] sw_3_1_2       cprog                 [k] sw_3_1
>      3.58%    cprog  cprog                 [k] back1          cprog                 [k] callme
>      3.57%    cprog  cprog                 [k] sw_3_1_3       cprog                 [k] sw_3_1
>      0.00%    cprog  [unknown]             [k] 0xf7abacf4     [unknown]             [k] 0xf7abae40
>
>
> (5) perf record -e branch-misses:u -j ind_call ./cprog
> # Overhead  Command  Source Shared Object  Source Symbol  Target Shared Object  Target Symbol
> # ........  .......  ....................  .............  ....................  .............
> #
>     63.56%    cprog  cprog                 [k] sw_4_2     cprog                 [k] lr_addr
>     36.44%    cprog  cprog                 [k] sw_4_1     cprog                 [k] ctr_addr
>
>
> Mixed filters
> -------------
> (6) perf record -e branch-misses:u -j any_call,any_ret ./cprog
> Error:
> The perf.data file has no samples!
>
> NOTE: As expected. The HW filter keeps only the branches which are calls, and the SW filter
> then looks for return branches within that set. The two filters are mutually exclusive, so
> no samples remain in the final profile.
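
The behaviour described in the NOTE can be modelled as two filter stages
applied one after the other. This is only a sketch of that reading of the
cover letter, not the code in core-book3s.c: the BR_* flags and both
function names are hypothetical, and the flag values are not the kernel's
PERF_SAMPLE_BRANCH_* values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-branch property flags, used only in this sketch. */
enum {
        BR_CALL     = 1 << 0,
        BR_RET      = 1 << 1,
        BR_COND     = 1 << 2,
        BR_IND_CALL = 1 << 3,
};

/*
 * Keep, in place, only the entries that match at least one bit of mask.
 * Returns the number of entries kept.
 */
static size_t filter_stage(unsigned *kinds, size_t n, unsigned mask)
{
        size_t kept = 0;

        assert(kinds != NULL || n == 0);
        for (size_t i = 0; i < n; i++)
                if (kinds[i] & mask)
                        kinds[kept++] = kinds[i];
        return kept;
}

/* The SW stage only ever sees what the HW stage let through. */
static size_t hw_then_sw(unsigned *kinds, size_t n,
                         unsigned hw_mask, unsigned sw_mask)
{
        n = filter_stage(kinds, n, hw_mask);
        return filter_stage(kinds, n, sw_mask);
}
```

With hw_mask = BR_CALL and sw_mask = BR_RET nothing survives unless some
branch is both a call and a return, which matches the empty profile in
case (6). A conditional return passes hw_mask = BR_COND followed by
sw_mask = BR_RET, which is consistent with case (9) producing samples.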
>
> (7) perf record -e branch-misses:u -j any_call,ind_call ./cprog
> # Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object   Target Symbol
> # ........  .......  ....................  ..............  ....................  ..............
> #
>     66.69%    cprog  cprog                 [k] sw_4_2      cprog                 [k] lr_addr
>     33.31%    cprog  cprog                 [k] sw_4_1      cprog                 [k] ctr_addr
>      0.00%    cprog  [unknown]             [k] 0x0fe7f264  [unknown]             [k] 0x0ff926d0
>
>
> (8) perf record -e branch-misses:u -j any_call,any_ret,ind_call ./cprog
> Error:
> The perf.data file has no samples!
>
> (9) perf record -e branch-misses:u -j cond,any_ret ./cprog
> # Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object            Target Symbol
> # ........  .......  ....................  ..............  ....................  .......................
> #
>     46.01%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme
>     13.54%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_2
>      8.18%    cprog  cprog                 [k] sw_3_1_2    cprog                 [k] sw_3_1
>      8.07%    cprog  [unknown]             [k] 00000000    cprog                 [k] main
>      8.07%    cprog  cprog                 [k] sw_3_1_1    cprog                 [k] sw_3_1
>      8.07%    cprog  cprog                 [k] sw_3_1_3    cprog                 [k] sw_3_1
>      8.07%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_1
>      0.00%    cprog  [unknown]             [k] 00000000    [unknown]             [k] 0xf7c1480c
>      0.00%    cprog  libc-2.11.2.so        [k] mmap        libc-2.11.2.so        [k] _IO_file_doallocate
>
> (10) perf record -e branch-misses:u -j cond,ind_call ./cprog
> # Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object   Target Symbol
> # ........  .......  ....................  ..............  ....................  ..............
> #
>     48.11%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme
>     13.52%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_2
>     12.42%    cprog  cprog                 [k] sw_4_2      cprog                 [k] lr_addr
>      8.65%    cprog  [unknown]             [k] 00000000    cprog                 [k] main
>      8.65%    cprog  cprog                 [k] sw_4_1      cprog                 [k] ctr_addr
>      8.65%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_1
>      0.00%    cprog  [unknown]             [k] 00000000    [unknown]             [k] 0xf7a4581c
>
>
> (11) perf record -e branch-misses:u -j cond,any_ret,ind_call ./cprog
> # Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object      Target Symbol
> # ........  .......  ....................  ..............  ....................  .................
> #
>     45.91%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme
>     13.26%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_2
>      8.17%    cprog  cprog                 [k] sw_3_1_3    cprog                 [k] sw_3_1
>      8.17%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_4_1
>      8.17%    cprog  cprog                 [k] sw_3_1_2    cprog                 [k] sw_3_1
>      8.17%    cprog  [unknown]             [k] 00000000    cprog                 [k] main
>      8.16%    cprog  cprog                 [k] sw_3_1_1    cprog                 [k] sw_3_1
>      0.00%    cprog  [unknown]             [k] 00000000    [unknown]             [k] 0xf7f87704
>      0.00%    cprog  [unknown]             [k] 00000000    libc-2.11.2.so        [k] _IO_file_sync
>
> Test application program
> ========================
> (1) Makefile:
> --------------------------------------------
> all: sample.o cprog of.cprog of.sample
>
> sample.o: sample.s
>         as -o sample.o sample.s
> cprog: cprog.c sample.o
>         gcc -o cprog cprog.c sample.o
> of.sample: sample.o
>         objdump -d sample.o > of.sample
> of.cprog: cprog
>         objdump -d cprog > of.cprog
> clean:
>         rm sample.o cprog of.sample of.cprog
> ---------------------------------------------
> (2) cprog.c
> ---------------------------------------------
> #include <stdio.h>
> #define LOOP_COUNT 100000
>
> extern void callme(void);
>
> int main(int argc, char *argv[])
> {
>         int i;
>         for(i = 0; i < LOOP_COUNT; i++)
>                 callme();
>
>         printf("end");
>         return 0;
> }
> ---------------------------------------------
> (3) sample.s
> ---------------------------------------------
> # r25, r26, r27 will be used as first level, second level
> # and third level stack for LR. Registers r20, r21, r22, r23,
> # r24 will be used for general programming purposes.
>
> .data
>
> msg:
>         .string "BHRB filter tests\n"
>         len = . - msg
> msg_1_1:
>         .string "Test: hw_1_1\n"
>         len_1_1 = 13
> msg_1_2:
>         .string "Test: hw_1_2\n"
>         len_1_2 = 13
> msg_2_1:
>         .string "Test: hw_2_1\n"
>         len_2_1 = 13
> msg_2_2:
>         .string "Test: hw_2_2\n"
>         len_2_2 = 13
> msg_3_1:
>         .string "Test: sw_3_1\n"
>         len_3_1 = 13
> msg_3_1_1:
>         .string "Test: sw_3_1_1\n"
>         len_3_1_1 = 15
> msg_3_1_2:
>         .string "Test: sw_3_1_2\n"
>         len_3_1_2 = 15
> msg_3_1_3:
>         .string "Test: sw_3_1_3\n"
>         len_3_1_3 = 15
> msg_3_2:
>         .string "Test: sw_3_2\n"
>         len_3_2 = 13
> msg_4_1:
>         .string "Test: sw_4_1\n"
>         len_4_1 = 13
> msg_4_2:
>         .string "Test: sw_4_2\n"
>         len_4_2 = 13
>
> hw_3_1_1_passed:
>         .string "\thw_3_1_1_passed\n\n"
>         len_hw_3_1_1_passed = 18
> hw_3_1_2_passed:
>         .string "\thw_3_1_2_passed\n\n"
>         len_hw_3_1_2_passed = 18
> hw_3_1_3_passed:
>         .string "\thw_3_1_3_passed\n\n"
>         len_hw_3_1_3_passed = 18
>
> hw_2_1_passed:
>         .string "\thw_2_1_passed\n\n"
>         len_hw_2_1_passed = 16
>
> hw_2_2_passed:
>         .string "\thw_2_2_passed\n\n"
>         len_hw_2_2_passed = 16
>
> hw_1_1_passed:
>         .string "\thw_1_1_passed\n\n"
>         len_hw_1_1_passed = 16
>
> hw_1_2_passed:
>         .string "\thw_1_2_passed\n\n"
>         len_hw_1_2_passed = 16
>
> hw_4_1_passed:
>         .string "\thw_4_1_passed\n\n"
>         len_hw_4_1_passed = 16
>
> hw_4_2_passed:
>         .string "\thw_4_2_passed\n\n"
>         len_hw_4_2_passed = 16
>
> msg_error:
>         .string "\tError\n"
>         len_error = 7
> .text
>         .global callme
>         .global hw_1_1
>         .global hw_1_2
>         .global hw_2_1
>         .global hw_2_2
>
> # HW filter test symbols
> symbol1:
>         # Print "hw_1_1_passed"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_1_1_passed@ha
>         addi    4, 4, hw_1_1_passed@l
>         li      5, len_hw_1_1_passed
>         sc
>
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> hw_1_1:
>         # Save LR - second level
>         mflr 26
>
>         # Print "hw_1_1 called"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_1_1@ha
>         addi    4, 4, msg_1_1@l
>         li      5, len_1_1
>         sc
>
>         bl symbol1                      # PERF_SAMPLE_BRANCH_ANY_CALL
>
>         # Restore LR
>         mtlr 26
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> symbol2:
>         # Print "hw_1_2_passed"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_1_2_passed@ha
>         addi    4, 4, hw_1_2_passed@l
>         li      5, len_hw_1_2_passed
>         sc
>
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
> hw_1_2:
>         # Save LR - second level
>         mflr 26
>
>         # Print "hw_1_2 called"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_1_2@ha
>         addi    4, 4, msg_1_2@l
>         li      5, len_1_2
>         sc
>
>         li 4,20
>         cmpi 0,4,20
>         bcl 12, 4*cr0+2, symbol2        # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
>
>         mtlr 26
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> # HW filter test
>
> address1:
>         # Print "hw_2_1_passed"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_2_1_passed@ha
>         addi    4, 4, hw_2_1_passed@l
>         li      5, len_hw_2_1_passed
>         sc
>         b  back1                        # PERF_SAMPLE_BRANCH_ANY
>
> hw_2_1:
>         # Print "hw_2_1 called"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_2_1@ha
>         addi    4, 4, msg_2_1@l
>         li      5, len_2_1
>         sc
>
>         # Simple conditional branch (equal)
>         li      20, 12
>         cmpi    3, 20, 12
>         bc      12, 4*cr3+2, address1   # PERF_SAMPLE_BRANCH_COND
>
> back1:
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> address2:
>         # Print "hw_2_2_passed"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_2_2_passed@ha
>         addi    4, 4, hw_2_2_passed@l
>         li      5, len_hw_2_2_passed
>         sc
>         b  back2                        # PERF_SAMPLE_BRANCH_ANY
>
> hw_2_2:
>         # Print "hw_2_2 called"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_2_2@ha
>         addi    4, 4, msg_2_2@l
>         li      5, len_2_2
>         sc
>
>         # Simple conditional branch (less than)
>         li      20, 12
>         cmpi    4, 20, 20
>         bc      12, 4*cr4+0, address2   # PERF_SAMPLE_BRANCH_COND
> back2:
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> # SW filter test symbols
> sw_3_1_1:
>         # Print "Test: sw_3_1_1"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_3_1_1@ha
>         addi    4, 4, msg_3_1_1@l
>         li      5, len_3_1_1
>         sc
>
>         li      22,0
>         # Test the condition and return
>         li      21, 10
>         cmpi    0, 21, 10
>         bclr    12, 2                   # PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
>
>         # Should not have come here
>         li      0, 4
>         li      3, 1
>         lis     4, msg_error@ha
>         addi    4, 4, msg_error@l
>         li      5, len_error
>         sc
>
>         # Mark the error
>         li      22, 1
>
>         # Safe fall back
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> sw_3_1_2:
>         # Print "Test: sw_3_1_2"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_3_1_2@ha
>         addi    4, 4, msg_3_1_2@l
>         li      5, len_3_1_2
>         sc
>
>         li      23, 0
>         # Test the condition and return
>         li      21, 10
>         cmpi    0, 21, 20
>         bclr    12, 0                   # PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
>
>         # Should not have come here
>         li      0, 4
>         li      3, 1
>         lis     4, msg_error@ha
>         addi    4, 4, msg_error@l
>         li      5, len_error
>         sc
>
>         # Mark the error
>         li      23, 1
>
>         # Safe fall back
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> sw_3_1_3:
>         # Print "Test: sw_3_1_3"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_3_1_3@ha
>         addi    4, 4, msg_3_1_3@l
>         li      5, len_3_1_3
>         sc
>
>         li      24, 0
>         # Test the condition and return
>         li      21, 10
>         cmpi    0, 21, 5
>         bclr    12, 1                   # PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
>
>         # Mark the error
>         li      24, 1
>
>         # Should not have come here
>         li      0, 4
>         li      3, 1
>         lis     4, msg_error@ha
>         addi    4, 4, msg_error@l
>         li      5, len_error
>         sc
>
>         # Safe fall back
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> success_3_1_1:
>         li      0, 4
>         li      3, 1
>         lis     4, hw_3_1_1_passed@ha
>         addi    4, 4, hw_3_1_1_passed@l
>         li      5, len_hw_3_1_1_passed
>         sc
>         blr
>
> success_3_1_2:
>         li      0, 4
>         li      3, 1
>         lis     4, hw_3_1_2_passed@ha
>         addi    4, 4, hw_3_1_2_passed@l
>         li      5, len_hw_3_1_2_passed
>         sc
>         blr
>
> success_3_1_3:
>         li      0, 4
>         li      3, 1
>         lis     4, hw_3_1_3_passed@ha
>         addi    4, 4, hw_3_1_3_passed@l
>         li      5, len_hw_3_1_3_passed
>         sc
>         blr
>
> sw_3_1:
>         # Save LR
>         mflr 26
>
>         # Print "Test: sw_3_1"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_3_1@ha
>         addi    4, 4, msg_3_1@l
>         li      5, len_3_1
>         sc
>
>         # Equal comparison condition
>         bl sw_3_1_1                     # PERF_SAMPLE_BRANCH_ANY_CALL
>         cmpi    0, 22, 0
>         bcl     12, 2, success_3_1_1    # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
>
>         # LT comparison condition
>         bl sw_3_1_2                     # PERF_SAMPLE_BRANCH_ANY_CALL
>         cmpi    0, 23, 0
>         bcl     12, 2, success_3_1_2    # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
>
>         # GT comparison condition
>         bl sw_3_1_3                     # PERF_SAMPLE_BRANCH_ANY_CALL
>         cmpi    0, 24, 0
>         bcl     12, 2, success_3_1_3    # PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND
>
>         mtlr 26
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
> sw_3_2:
>         # Print "Test: sw_3_2"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_3_2@ha
>         addi    4, 4, msg_3_2@l
>         li      5, len_3_1
>         sc
>
>         # FIXME: Anything more here ?
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> # Indirect call tests
>
> # CTR
> ctr_addr:
>         # Print "bcctr taken"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_4_1_passed@ha
>         addi    4, 4, hw_4_1_passed@l
>         li      5, len_hw_4_1_passed
>         sc
>
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
> sw_4_1:
>         # Save LR
>         mflr    26
>
>         # Print "sw_4_1 called"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_4_1@ha
>         addi    4, 4, msg_4_1@l
>         li      5, len_4_1
>         sc
>
>         # Save address in CTR
>         lis     20, ctr_addr@ha
>         addi    20, 20, ctr_addr@l
>         mtctr   20
>
>
>         # Compare and jump to CTR
>         li      21, 10
>         cmpi    0, 21, 10
>         bcctrl  12, 4*cr0+2             # PERF_SAMPLE_BRANCH_IND_CALL
>
>         mtlr    26
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
> # LR
> lr_addr:
>         # Print "bclrl taken"
>         li      0, 4
>         li      3, 1
>         lis     4, hw_4_2_passed@ha
>         addi    4, 4, hw_4_2_passed@l
>         li      5, len_hw_4_2_passed
>         sc
>
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> sw_4_2:
>         # Save LR
>         mflr    26
>
>         # Print "Test: sw_4_2"
>         li      0, 4
>         li      3, 1
>         lis     4, msg_4_2@ha
>         addi    4, 4, msg_4_2@l
>         li      5, len_4_2
>         sc
>
>         # Save address in LR
>         lis     20, lr_addr@ha
>         addi    20, 20, lr_addr@l
>         mtlr    20
>
>
>         # Compare and jump to LR
>         li      21, 10
>         cmpi    0, 21, 10
>         bclrl   12, 4*cr0+2             # PERF_SAMPLE_BRANCH_IND_CALL
>
>         # Restore LR
>         mtlr    26
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
>
> callme:
>         # Save LR
>         mflr    25
>
>         # Print "Branch filter Test"
>         li      0, 4
>         li      3, 1
>         lis     4, msg@ha
>         addi    4, 4, msg@l
>         li      5, len
>         sc
>
>         # PERF_SAMPLE_BRANCH_ANY_CALL
>         bl hw_1_1                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         bl hw_1_2                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         # PERF_SAMPLE_BRANCH_COND
>         bl hw_2_1                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         bl hw_2_2                       # PERF_SAMPLE_BRANCH_ANY_CALL
>
>         # PERF_SAMPLE_BRANCH_ANY_RET
>         bl sw_3_1                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         bl sw_3_2                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         # PERF_SAMPLE_BRANCH_IND_CALL
>         bl sw_4_1                       # PERF_SAMPLE_BRANCH_ANY_CALL
>         bl sw_4_2                       # PERF_SAMPLE_BRANCH_ANY_CALL
>
>         # Restore LR
>         mtlr 25
>         blr                             # PERF_SAMPLE_BRANCH_ANY_RET
> --------------------------------------------------------------------
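
The per-instruction annotations in the assembly above can be approximated
by decoding the branch instruction word, which is roughly the kind of work
a software filter has to do. What follows is a sketch under stated
assumptions, not the implementation from this series: classify_branch and
the BR_* flags are hypothetical names, and the decoding relies only on the
standard Power ISA encodings (primary opcodes 16, 18 and 19, extended
opcodes 16 for bclr and 528 for bcctr, the LK bit, and the BO field).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags for this sketch; not the PERF_SAMPLE_BRANCH_* values. */
enum {
        BR_CALL     = 1 << 0,
        BR_RET      = 1 << 1,
        BR_COND     = 1 << 2,
        BR_IND_CALL = 1 << 3,
};

/* Classify one 32-bit PowerPC instruction word; returns 0 for "none of the above". */
static unsigned classify_branch(uint32_t insn)
{
        uint32_t opcode = insn >> 26;           /* primary opcode */
        uint32_t bo     = (insn >> 21) & 0x1f;  /* branch options field */
        uint32_t xo     = (insn >> 1) & 0x3ff;  /* extended opcode (XL-form) */
        int lk          = insn & 1;             /* link bit: LR is updated */

        /* BO = 0b1z1zz means "branch always"; anything else is conditional. */
        unsigned cond = ((bo & 0x14) != 0x14) ? BR_COND : 0;

        switch (opcode) {
        case 18:                                /* b, bl */
                return lk ? BR_CALL : 0;
        case 16:                                /* bc, bcl */
                return cond | (lk ? BR_CALL : 0);
        case 19:
                if (xo == 16)                   /* bclr, bclrl */
                        return cond | (lk ? (BR_CALL | BR_IND_CALL) : BR_RET);
                if (xo == 528)                  /* bcctr, bcctrl */
                        return cond | (lk ? (BR_CALL | BR_IND_CALL) : 0);
                return 0;                       /* other opcode-19 instructions */
        default:
                return 0;                       /* not a branch */
        }
}
```

For example, blr (0x4e800020) comes out as BR_RET and "bclr 12, 2"
(0x4d820020) as BR_RET | BR_COND, in line with the comments in the test
program. For the conditional bcctrl and bclrl cases the sketch also sets
BR_COND and BR_CALL, whereas the test program annotates them as
PERF_SAMPLE_BRANCH_IND_CALL only; that difference is a choice made in this
sketch.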
>
> Changes in V2
> --------------
> (1) Enabled PPC64 SW branch filtering support
> (2) Incorporated changes required for all previous comments
>
> Anshuman Khandual (6):
>   perf: New conditional branch filter criteria in branch stack sampling
>   powerpc, perf: Enable conditional branch filter for POWER8
>   perf, tool: Conditional branch filter 'cond' added to perf record
>   x86, perf: Add conditional branch filtering support
>   perf, documentation: Description for conditional branch filter
>   powerpc, perf: Enable SW filtering in branch stack sampling framework
>
>  arch/powerpc/include/asm/perf_event_server.h |   2 +-
>  arch/powerpc/perf/core-book3s.c              | 200 +++++++++++++++++++++++++--
>  arch/powerpc/perf/power8-pmu.c               |  25 ++--
>  arch/x86/kernel/cpu/perf_event_intel_lbr.c   |   5 +
>  include/uapi/linux/perf_event.h              |   3 +-
>  tools/perf/Documentation/perf-record.txt     |   3 +-
>  tools/perf/builtin-record.c                  |   1 +
>  7 files changed, 216 insertions(+), 23 deletions(-)
>
> --
> 1.7.11.7
>

