[PATCH V7 00/18] Add data type profiling support for powerpc

Namhyung Kim namhyung at kernel.org
Thu Jul 18 16:43:01 AEST 2024


On Wed, Jul 17, 2024 at 11:12 PM Athira Rajeev
<atrajeev at linux.vnet.ibm.com> wrote:
>
>
>
> > On 18 Jul 2024, at 11:04 AM, Namhyung Kim <namhyung at kernel.org> wrote:
> >
> > Hello,
> >
> > On Sat, Jul 13, 2024 at 10:25:11PM +0530, Athira Rajeev wrote:
> >> The patchset from Namhyung added support for data type profiling
> >> in perf tool. This enabled support to associate PMU samples to data
> >> types they refer using DWARF debug information. With the upstream
> >> perf, currently it possible to run perf report or perf annotate to
> >> view the data type information on x86.
> >>
> >> Initial patchset posted here had changes need to enable data type
> >> profiling support for powerpc.
> >>
> >> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/
> >>
> >> Main change were:
> >> 1. powerpc instruction nmemonic table to associate load/store
> >> instructions with move_ops which is use to identify if instruction
> >> is a memory access one.
> >> 2. To get register number and access offset from the given
> >> instruction, code uses fields from "struct arch" -> objump.
> >> Added entry for powerpc here.
> >> 3. A get_arch_regnum to return register number from the
> >> register name string.
> >>
> >> But the apporach used in the initial patchset used parsing of
> >> disassembled code which the current perf tool implementation does.
> >>
> >> Example: lwz     r10,0(r9)
> >>
> >> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> >> registers names and offset. Also to find whether there is a memory
> >> reference in the operands, "memory_ref_char" field of objdump is used.
> >> For x86, "(" is used as memory_ref_char to tackle instructions of the
> >> form "mov  (%rax), %rcx".
> >>
> >> In case of powerpc, not all instructions using "(" are the only memory
> >> instructions. Example, above instruction can also be of extended form (X
> >> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
> >> and extract the source/target registers, second patchset added support to use
> >> raw instruction. With raw instruction, macros are added to extract opcode
> >> and register fields.
> >> Link to second patchset:
> >> https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@linux.vnet.ibm.com/
> >>
> >> Example representation using --show-raw-insn in objdump gives result:
> >>
> >> 38 01 81 e8     ld      r4,312(r1)
> >>
> >> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> >> this translates to instruction form: "ld RT,DS(RA)" and binary code
> >> as:
> >>  _____________________________________
> >>  | 58 |  RT  |  RA |      DS       | |
> >>  -------------------------------------
> >> 0    6     11    16              30 31
> >>
> >> Second patchset used "objdump" again to read the raw instruction.
> >> But since there is no need to disassemble and binary code can be read
> >> directly from the DSO, third patchset (ie this patchset) uses below
> >> apporach. The apporach preferred in powerpc to parse sample for data
> >> type profiling in V3 patchset is:
> >> - Read directly from DSO using dso__data_read_offset
> >> - If that fails for any case, fallback to using libcapstone
> >> - If libcapstone is not supported, approach will use objdump
> >>
> >> Patchset adds support to pick the opcode and reg fields from this
> >> raw/binary instruction code. This approach came in from review comment
> >> by Segher Boessenkool and Christophe for the initial patchset.
> >>
> >> Apart from that, instruction tracking is enabled for powerpc and
> >> support function is added to find variables defined as registers
> >> Example, in powerpc, below two registers are
> >> defined to represent variable:
> >> 1. r13: represents local_paca
> >> register struct paca_struct *local_paca asm("r13");
> >>
> >> 2. r1: represents stack_pointer
> >> register void *__stack_pointer asm("r1");
> >>
> >> These are handled in this patchset.
> >>
> >> - Patch 1 is to rearrange register state type structures to header file
> >> so that it can referred from other arch specific files
> >> - Patch 2 is to make instruction tracking as a callback to"struct arch"
> >> so that it can be implemented by other archs easily and defined in arch
> >> specific files
> >> - Patch 3 is to handle state type regs array size for x86 and powerpc
> >> - Patch 4 adds support to capture and parse raw instruction in powerpc
> >> using dso__data_read_offset utility
> >> - Patch 4 also adds logic to support using objdump when doing default "perf
> >> report" or "perf annotate" since it that needs disassembled instruction.
> >> - Patch 5 adds disasm_line__parse to parse raw instruction for powerpc
> >> - Patch 6 update parameters for reg extract functions to use raw
> >> instruction on powerpc
> >> - Patch 7 updates ins__find to carry raw_insn and also adds parse
> >> callback for memory instructions for powerpc
> >> - Patch 8 add support to identify memory instructions of opcode 31 in
> >> powerpc
> >> - Patch 9 adds more instructions to support instruction tracking in powerpc
> >> - Patch 10 and 11 handles instruction tracking for powerpc.
> >> - Patch 12, 13 and 14 add support to use libcapstone in powerpc
> >> - Patch 15 and patch 16 handles support to find global register variables
> >> - PAtch 17 updates data type compare functions data_type_cmp and
> >>  sort__typeoff_sort to include var_name along with type_name in
> >>  comparison.
> >> - Patch 18 handles insn-stat option for perf annotate
> >>
> >> Note:
> >> - There are remaining unknowns (25%) as seen in annotate Instruction stats
> >> below.
> >> - This patchset is not tested on powerpc32. In next step of enhancements
> >> along with handling remaining unknowns, plan to cover powerpc32 changes
> >> based on how testing goes.
> >>
> >> With the current patchset:
> >>
> >> ./perf record -a -e mem-loads sleep 1
> >> ./perf report -s type,typeoff --hierarchy --group --stdio
> >> ./perf annotate --data-type --insn-stat
> >>
> >> perf annotate logs:
> >> ==================
> >>
> >>
> >> Annotate Instruction stats
> >> total 609, ok 446 (73.2%), bad 163 (26.8%)
> >>
> >>  Name/opcode         :  Good   Bad
> >>  -----------------------------------------------------------
> >>  58                  :   323    80
> >>  32                  :    49    43
> >>  34                  :    33    11
> >>  OP_31_XOP_LDX       :     8    20
> >>  40                  :    23     0
> >>  OP_31_XOP_LWARX     :     5     1
> >>  OP_31_XOP_LWZX      :     2     3
> >>  OP_31_XOP_LDARX     :     3     0
> >>  33                  :     0     2
> >>  OP_31_XOP_LBZX      :     0     1
> >>  OP_31_XOP_LWAX      :     0     1
> >>  OP_31_XOP_LHZX      :     0     1
> >>
> >> perf report logs:
> >> =================
> >>
> >>  Total Lost Samples: 0
> >>
> >>  Samples: 1K of event 'mem-loads'
> >>  Event count (approx.): 937238
> >>
> >>  Overhead  Data Type  Data Type Offset
> >> ........  .........  ................
> >>    48.60%  (unknown)  (unknown) +0 (no field)
> >>    11.42%  long unsigned int  long unsigned int +0 (current_stack_pointer)
> >>     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
> >>     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
> >>     2.69%  struct paca_struct  struct paca_struct +2808 (canary)
> >>     2.68%  struct paca_struct  struct paca_struct +8 (paca_index)
> >>     2.24%  struct paca_struct  struct paca_struct +48 (data_offset)
> >>     1.43%  long unsigned int  long unsigned int +0 (no field)
> >>     1.41%  struct vm_fault  struct vm_fault +0 (vma)
> >>     1.29%  struct task_struct  struct task_struct +276 (flags)
> >>     1.03%  struct pt_regs  struct pt_regs +264 (user_regs.msr)
> >>     0.90%  struct security_hook_list  struct security_hook_list +0 (list.next)
> >>     0.76%  struct irq_desc  struct irq_desc +304 (irq_data.chip)
> >>     0.76%  struct rq  struct rq +2856 (cpu)
> >>     0.72%  long long unsigned int  long long unsigned int +0 (no field)
> >
> > Thanks for your work!  But I think you need to split the basic part and
> > global register support part which needs more review.
> >
> > For the patch 1 to 14:
> > Reviewed-by: Namhyung Kim <namhyung at kernel.org>
>
> Hi Namhyung
>
> Thanks for all suggestions and reviews. I will check latest comments for patches 15 and 16 (also patch 17 is dependent the global register support part). But patch 18 is not dependent on global register support patches. Along with patches 1 to 14, can you please add patch 18 also ?

Sure, feel free to add it to the patch 18.

Reviewed-by: Namhyung Kim <namhyung at kernel.org>

Thanks,
Namhyung


More information about the Linuxppc-dev mailing list