BUG: perf error on syscalls for powerpc64.

Michael Ellerman mpe at ellerman.id.au
Fri Jul 17 14:07:23 AEST 2015


On Fri, 2015-07-17 at 09:27 +0800, Zumeng Chen wrote:
> On 2015-07-16 17:04, Michael Ellerman wrote:
> > On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
> >> Hi All,
> >>
> >> 1028ccf5 changed sys_call_table from a pointer to an array of
> >> unsigned long. I don't think that is correct, and here is my reasoning:
> >>
> >> sys_call_table, defined as a label in assembler, should be a pointer array
> >> rather than an array as described in 1028ccf5. If we define it as an
> >> array, arch_syscall_addr will return the address of sys_call_table[],
> >> whereas what arch_syscall_addr needs is the content of sys_call_table[].
> >> As a result 'perf list' ignores all syscalls, since find_syscall_meta
> >> returns NULL in init_ftrace_syscalls because of the wrong
> >> arch_syscall_addr.
> >>
> >> Did I miss something, or has the GCC compiler started doing something new?
> > Hi Zumeng,
> >
> > It works for me with the code as it is in mainline.
> >
> > I don't quite follow your explanation, so if you're seeing a bug please send
> > some information about what you're actually seeing. And include the disassembly
> > of arch_syscall_addr() and your compiler version etc.
> 
> Hi Michael,

Hi Zumeng,

> Yeah, it seems it was not a good explanation, I'll explain more this time:
> 
> 1. However we declare sys_call_table at the C level, it is actually a pointer
>      to sys_call_table rather than sys_call_table itself at the assembly level.

No it's not a pointer.

A pointer is a location in memory that contains the address of another location
in memory.

>      arch/powerpc/kernel/systbl.S
>      47 .globl sys_call_table   <--- see here
>      48 sys_call_table:

Which gives us a .o that looks like:

  0000000000000000 <sys_call_table>:
                   0: R_PPC64_ADDR64       sys_restart_syscall
                   8: R_PPC64_ADDR64       sys_restart_syscall
                   10: R_PPC64_ADDR64      sys_exit
                   18: R_PPC64_ADDR64      sys_exit

ie. at the location in memory called sys_call_table we have *the contents of
the syscall table*.

We do not have *the address* of the syscall table.

You can also see in the System.map:

  c000000000bb0798 R sys_call_table
  c000000000bb1e58 r cache_type_info

ie. sys_call_table occupies 5824 bytes. If it was a pointer it would only
occupy 8 bytes.

Compare to SYS_CALL_TABLE, which *is* a pointer.

  c000000001172bf8 d SYS_CALL_TABLE
  c000000001172c00 d exception_marker

Note, 8 bytes.


Finally if you look at a running system using xmon:

  0:mon> d $sys_call_table
  c0000000008f0798 c0000000000a85a0 c0000000000a85a0  |................|
  c0000000008f07a8 c000000000099b40 c000000000099b40  |.......@.......@|

  0:mon> la c0000000000a85a0
  c0000000000a85a0: .sys_restart_syscall+0x0/0x40
  0:mon> la c000000000099b40
  c000000000099b40: .SyS_exit+0x0/0x20

  0:mon> d $SYS_CALL_TABLE
  c000000000ec68f8 c0000000008f0798 7265677368657265  |........regshere|
                   ^
  		 this is the address of sys_call_table


As another example, see hcall_real_table, which is basically identical, and is
also declared as an array in C.


> 3. What I have seen in the 3.14.x kernel
> ======================
> As far as I can tell, this part is no different in the 4.x kernel.
> 
> *) With 1028ccf5
> 
> perf list|grep -i syscall got me nothing.
> 
> 
> *) Without 1028ccf5
> root@localhost:~# perf list|grep -i syscall
>    syscalls:sys_enter_socket                          [Tracepoint event]
>    syscalls:sys_exit_socket                           [Tracepoint event]
>    syscalls:sys_enter_socketpair                      [Tracepoint event]
>    syscalls:sys_exit_socketpair                       [Tracepoint event]
>    syscalls:sys_enter_bind                            [Tracepoint event]
>    syscalls:sys_exit_bind                             [Tracepoint event]
>    syscalls:sys_enter_listen                          [Tracepoint event]
>    syscalls:sys_exit_listen                           [Tracepoint event]
>    ... ...

I don't know why that's happening.

Please just test 4.2-rc2 for now, so that there are not too many variables.

Assuming you have CONFIG_FTRACE_SYSCALLS=y, you can see the tracepoints in
debugfs with:

  $ ls -la /sys/kernel/debug/tracing/events/syscalls
  total 0
  drwxr-xr-x 596 root root 0 Jul 17 13:11 .
  drwxr-xr-x  45 root root 0 Jul 17 13:11 ..
  -rw-r--r--   1 root root 0 Jul 17 13:33 enable
  -rw-r--r--   1 root root 0 Jul 17 13:11 filter
  drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_accept
  drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_accept4
  drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_access
  drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_add_key
  ...


cheers




