[Cbe-oss-dev] CBE core files have load segments that are off by the size of the SPU notes

John DelSignore jdelsign at totalviewtech.com
Thu Jun 28 03:13:26 EST 2007


Hi,

An IBM guy pointed me to this list for reporting Linux-Cell problems... I hope this is the right forum.

I believe that I have found a bug in the core file dumper on Linux Cell when there are SPU threads present in the PPU process.

Summary:

CBE core files have load segments that are off by the size of the SPU notes.

Background:

A CBE program can dump a core file, and when it does, it creates notes for the SPU threads that contain all of the registers and memory for the SPU context. Here's an example:

alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/bld/ppu-gcc_4.1.1_32>readelf -n core.11685

Notes at offset 0x000003f4 with length 0x0004138c:
  Owner               Data size   Description
  CORE                0x0000010c  NT_PRSTATUS (prstatus structure)
  CORE                0x00000080  NT_PRPSINFO (prpsinfo structure)
  CORE                0x000000b0  NT_AUXV (auxiliary vector)
  CORE                0x00000108  NT_FPREGSET (floating point registers)
  SPU/3/regs          0x00000800  NT_PRSTATUS (prstatus structure)
  SPU/3/fpcr          0x00000010  NT_PRSTATUS (prstatus structure)
  SPU/3/lslr          0x0000000b  NT_PRSTATUS (prstatus structure)
  SPU/3/decr          0x0000000b  NT_PRSTATUS (prstatus structure)
  SPU/3/decr_status   0x0000000b  NT_PRSTATUS (prstatus structure)
  SPU/3/mem           0x00040000  NT_PRSTATUS (prstatus structure)
  SPU/3/signal1       0x00000004  NT_PRSTATUS (prstatus structure)
  SPU/3/signal1_type  0x00000002  NT_PRSTATUS (prstatus structure)
  SPU/3/signal2       0x00000004  NT_PRSTATUS (prstatus structure)
  SPU/3/signal2_type  0x00000002  NT_PRSTATUS (prstatus structure)
  SPU/3/event_mask    0x00000008  NT_PRSTATUS (prstatus structure)
  SPU/3/event_status  0x00000008  NT_PRSTATUS (prstatus structure)
  SPU/3/mbox_info     0x00000004  NT_PRSTATUS (prstatus structure)
  SPU/3/ibox_info     0x00000004  NT_PRSTATUS (prstatus structure)
  SPU/3/wbox_info     0x00000010  NT_PRSTATUS (prstatus structure)
  SPU/3/dma_info      0x00000228  NT_PRSTATUS (prstatus structure)
  SPU/3/proxydma_info 0x00000118  NT_PRSTATUS (prstatus structure)
  SPU/3/object-id     0x00000013  NT_PRSTATUS (prstatus structure)
  CORE                0x0000010c  NT_PRSTATUS (prstatus structure)
  CORE                0x00000108  NT_FPREGSET (floating point registers)
alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/bld/ppu-gcc_4.1.1_32>

As with normal Linux core files, the PPU memory is saved in the program header load segments:

alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/bld/ppu-gcc_4.1.1_32>readelf -l core.11685

Elf file type is CORE (Core file)
Entry point 0x0
There are 30 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  NOTE           0x0003f4 0x00000000 0x00000000 0x4138c 0x00000     0
  LOAD           0x050000 0x00100000 0x00000000 0x20000 0x20000 R E 0x10000
  LOAD           0x070000 0x002e0000 0x00000000 0x00000 0x110000 R E 0x10000
  LOAD           0x070000 0x003f0000 0x00000000 0x10000 0x10000 R   0x10000
  LOAD           0x080000 0x00400000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x090000 0x0e820000 0x00000000 0x00000 0x20000 R E 0x10000
  LOAD           0x090000 0x0e840000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x0a0000 0x0e8e0000 0x00000000 0x00000 0x20000 R E 0x10000
  LOAD           0x0a0000 0x0e900000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x0b0000 0x0f480000 0x00000000 0x00000 0xc0000 R E 0x10000
  LOAD           0x0b0000 0x0f540000 0x00000000 0x10000 0x10000 R   0x10000
  LOAD           0x0c0000 0x0f550000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x0d0000 0x0f560000 0x00000000 0x00000 0x160000 R E 0x10000
  LOAD           0x0d0000 0x0f6c0000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x0e0000 0x0f6d0000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x0f0000 0x0fb70000 0x00000000 0x00000 0x10000 R E 0x10000
  LOAD           0x0f0000 0x0fb80000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x100000 0x0ff90000 0x00000000 0x00000 0x20000 R E 0x10000
  LOAD           0x100000 0x0ffb0000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x110000 0x0ffc0000 0x00000000 0x00000 0x20000 R E 0x10000
  LOAD           0x110000 0x0ffe0000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x120000 0x10000000 0x00000000 0x00000 0x10000 R E 0x10000
  LOAD           0x120000 0x10010000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x130000 0x10020000 0x00000000 0x30000 0x30000 RWE 0x10000
  LOAD           0x160000 0xf7780000 0x00000000 0x10000 0x10000     0x10000
  LOAD           0x170000 0xf7790000 0x00000000 0x7f0000 0x7f0000 RW  0x10000
  LOAD           0x960000 0xf7f80000 0x00000000 0x00000 0x40000 RW  0x10000
  LOAD           0x960000 0xf7fc0000 0x00000000 0x10000 0x10000 RW  0x10000
  LOAD           0x970000 0xf7fe0000 0x00000000 0x00000 0x10000 RW  0x10000
  LOAD           0x970000 0xfce30000 0x00000000 0x150000 0x150000 RW  0x10000
alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/bld/ppu-gcc_4.1.1_32>

Description:

There seems to be a bug in the kernel (or whatever creates the core file) that double counts the total size of the SPU notes. That throws off both the file offsets of the LOAD segments and the size of the core file itself.

For example, for the core file above, the total size of the "SPU/*" named notes was 0x40DBC. So, for this segment:

  LOAD 0x120000 0x10010000 0x00000000 0x10000 0x10000 RW  0x10000

the program header says the contents start at offset 0x120000 in the file, but they actually start at 0x120000+0x40DBC == 0x160DBC.
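
For anyone who wants to check the 0x40DBC figure, here is a rough sketch that recomputes it from the "readelf -n" listing above. It assumes the usual ELF32 note layout (a 12-byte header, then the name with its terminating NUL and the descriptor, each padded to a multiple of 4 bytes). The helper names pad4 and note_size are made up for this example and do not come from the kernel.

```python
# Sketch: total on-disk size of the "SPU/*" notes in core.11685.
# Assumption: standard ELF32 note layout (12-byte header, name including
# its NUL and descriptor each padded to a 4-byte boundary).

NOTE_HEADER_SIZE = 12  # namesz, descsz, type: three 32-bit words


def pad4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def note_size(name, descsz):
    """On-disk size of one note, including header and padding."""
    return NOTE_HEADER_SIZE + pad4(len(name) + 1) + pad4(descsz)


# (name, descriptor size) pairs copied from the readelf -n output above
SPU_NOTES = [
    ("SPU/3/regs", 0x800), ("SPU/3/fpcr", 0x10), ("SPU/3/lslr", 0x0B),
    ("SPU/3/decr", 0x0B), ("SPU/3/decr_status", 0x0B),
    ("SPU/3/mem", 0x40000), ("SPU/3/signal1", 0x04),
    ("SPU/3/signal1_type", 0x02), ("SPU/3/signal2", 0x04),
    ("SPU/3/signal2_type", 0x02), ("SPU/3/event_mask", 0x08),
    ("SPU/3/event_status", 0x08), ("SPU/3/mbox_info", 0x04),
    ("SPU/3/ibox_info", 0x04), ("SPU/3/wbox_info", 0x10),
    ("SPU/3/dma_info", 0x228), ("SPU/3/proxydma_info", 0x118),
    ("SPU/3/object-id", 0x13),
]
CORE_NOTES = [("CORE", 0x10C), ("CORE", 0x80), ("CORE", 0xB0),
              ("CORE", 0x108), ("CORE", 0x10C), ("CORE", 0x108)]

spu_total = sum(note_size(name, size) for name, size in SPU_NOTES)
all_total = spu_total + sum(note_size(name, size) for name, size in CORE_NOTES)

print(hex(spu_total))  # 0x40dbc, the SPU note total quoted above
print(hex(all_total))  # 0x4138c, the NOTE segment length from readelf
```

Under that assumption the SPU notes add up to 0x40DBC, and together with the CORE notes they account for the full 0x4138c length of the NOTE segment.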

Analyzing the core file leads me to believe that there is a "hole" (a block of zeros) the size of the total SPU note size (0x40DBC) in the file where the first LOAD segment should begin. Here's what "od -A x -X core.11685" shows for that transitional area of the core file:

041770 00000000 00000000 fff80000 00000000
041780 00000000 00000000 00000000 00000000
*
090db0 00000000 00000000 00000000 7f454c46
090dc0 01020100 00000000 00000000 00030014

The ELF header for the vdso (it starts with the magic number 0x7f454c46), which is loaded at virtual address 0x00100000, appears at file position 0x090dbc in the core file, but according to its LOAD program header it should start at offset 0x050000. All of the LOAD segments were slid down the file by 0x40DBC, and the core file is longer by the same amount.
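
The numbers in the dump can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes the first LOAD segment belongs at the first 0x10000-aligned offset after the notes, which agrees with the 0x050000 in the program header table, and round_up is a helper invented for the example.

```python
# Sketch: cross-check the od dump against the readelf output.
# Assumption: the first LOAD segment belongs at the first 0x10000-aligned
# file offset after the end of the NOTE segment.

NOTES_OFFSET = 0x3F4    # NOTE segment offset from readelf -l
NOTES_LENGTH = 0x4138C  # NOTE segment length from readelf -l
LOAD_ALIGN = 0x10000    # Align column of the LOAD headers


def round_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


notes_end = NOTES_OFFSET + NOTES_LENGTH
expected_first_load = round_up(notes_end, LOAD_ALIGN)

# In the od output, 7f454c46 is the fourth 32-bit word on line 090db0.
observed_first_load = 0x090DB0 + 3 * 4

slide = observed_first_load - expected_first_load

print(hex(notes_end))            # 0x41780, where the run of zeros starts
print(hex(expected_first_load))  # 0x50000
print(hex(observed_first_load))  # 0x90dbc
print(hex(slide))                # 0x40dbc
```

The notes end at 0x41780, which is exactly where the zeros begin in the dump, and the distance between the expected and observed start of the first LOAD segment is the SPU note total.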

Because a chunk the size of the SPU notes seems to be "inserted" between the notes and the program load segments, the problem can be worked around by adding the total SPU note size to the file positions of the PPU LOAD segments.
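
As an illustration of that workaround (a minimal sketch, not the TotalView code), the lookup below maps a PPU virtual address to its real position in the core file by adding the SPU note total to the offset from the program header. The function name translate_vaddr and the choice of sample segments are assumptions made for this example.

```python
# Sketch of the workaround: shift LOAD file offsets by the SPU note total.
# Assumption: the shift equals the total size of the SPU notes, as observed
# for core.11685. translate_vaddr is a hypothetical helper.

SPU_NOTE_TOTAL = 0x40DBC

# (file offset, virtual address, file size) for a few LOAD headers above
LOAD_SEGMENTS = [
    (0x050000, 0x00100000, 0x20000),
    (0x070000, 0x002E0000, 0x00000),   # FileSiz 0: nothing saved in the file
    (0x120000, 0x10010000, 0x10000),
    (0x970000, 0xFCE30000, 0x150000),
]


def translate_vaddr(vaddr, segments, shift):
    """Return the real file position of vaddr, or None if it was not dumped."""
    for offset, seg_vaddr, filesz in segments:
        if seg_vaddr <= vaddr < seg_vaddr + filesz:
            return offset + shift + (vaddr - seg_vaddr)
    return None


print(hex(translate_vaddr(0x00100000, LOAD_SEGMENTS, SPU_NOTE_TOTAL)))  # 0x90dbc
print(hex(translate_vaddr(0x10010000, LOAD_SEGMENTS, SPU_NOTE_TOTAL)))  # 0x160dbc
```

Passing a shift of 0 gives the ordinary lookup, so the same code path would still work on a core file from a fixed kernel.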

If/when someone fixes this bug, we'll need to back out that workaround. Alternatively, it seems that at least for the core file above, the overall length of the file is equal to the highest file offset described by the program headers (the end of the last LOAD segment) plus the total size of the SPU notes, so we might be able to reliably detect the bug at runtime.
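
A runtime check along those lines might look like the sketch below. It is untested against real core files. The file size used in the example (0xB00DBC) is not a measured value but the one implied by the statement above, and looks_shifted and end_of_load_data are names invented for the example.

```python
# Sketch: detect the shifted layout by comparing the file length with the
# end of the last LOAD segment plus the SPU note total.
# Assumption: a correctly written core file ends where its last LOAD
# segment ends, and an affected one is longer by exactly the SPU note total.

SPU_NOTE_TOTAL = 0x40DBC

# (file offset, file size) of the last few LOAD headers in the table above
LOAD_SEGMENTS = [
    (0x960000, 0x10000),
    (0x970000, 0x00000),
    (0x970000, 0x150000),
]


def end_of_load_data(segments):
    """Highest file offset described by the program headers."""
    return max(offset + filesz for offset, filesz in segments)


def looks_shifted(file_size, segments, spu_note_total):
    """True if the file is longer than the headers say by the SPU note total."""
    if spu_note_total == 0:
        return False  # no SPU notes, nothing to double count
    return file_size == end_of_load_data(segments) + spu_note_total


print(hex(end_of_load_data(LOAD_SEGMENTS)))                      # 0xac0000
print(looks_shifted(0xB00DBC, LOAD_SEGMENTS, SPU_NOTE_TOTAL))    # True
print(looks_shifted(0xAC0000, LOAD_SEGMENTS, SPU_NOTE_TOTAL))    # False
```

When the check returns False, the offsets from the program headers would be used unchanged.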

GDB gets upset with the core file:

alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/bld/ppu-gcc_4.1.1_32>ppu-gdb tx_cell_dynamic_spe1 core.11685
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc64-linux"...Using host libthread_db library "/lib64/libthread_db.so.1".


warning: core file may not match specified executable file.
Failed to read a valid object file image from memory.
Core was generated by `tx_cell_dynamic_spe1 -1 ../spu-gcc_4.1.1_32/tx_loop'.
Program terminated with signal 3, Quit.

warning: you won't be able to access this core file until you terminate
your spu thread.; do ``info files''
(gdb) bt
#0  0x0e8e6e74 in ?? ()
#1  0x0e8e6e48 in ?? ()
(gdb) info th
  2 process 11686  0x0f647250 in ?? ()
* 1 process 11685  0x0e8e6e74 in ?? ()
(gdb) thr 2
[Switching to thread 2 (process 11686)]#0  0x0f647250 in ?? ()
(gdb) bt
#0  0x0f647250 in ?? ()
#1  0x0ff95c48 in ?? ()
(gdb) 

And with the workaround I mentioned above, our debugger TotalView can do substantially better than GDB:

alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/bld/ppu-gcc_4.1.1_32>cli tx_cell_dynamic_spe1 core.11685 -verbosity errors
Copyright 2007 by TotalView Technologies, LLC. ALL RIGHTS RESERVED.
Copyright 1999-2007 by Etnus, LLC.
Copyright 1999 by Etnus, Inc.
Copyright 1996-1998 by Dolphin Interconnect Solutions, Inc.
Copyright 1989-1996 by BBN Inc.
d1.<> f a w 5
Thread 1.1:
>  0 pthread_join     PC=0x0e8e6e74, FP=0xfcf7ec70 [/lib/libpthread.so.0]
   1 spe_wait         PC=0x0ff97d28, FP=0xfcf7ecb0 [/usr/lib/libspe.so.1]
   2 main             PC=0x10001ae0, FP=0xfcf7ed00 [/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/src/tx_cell_dynamic_spe1.c#77]
   3 generic_start_main PC=0x0f57d63c, FP=0xfcf7ed00 [/lib/libc.so.6]
   4 __libc_start_main PC=0x0f57d860, FP=0xfcf7ef30 [/lib/libc.so.6]
Thread 1.2:
>  0 syscall          PC=0x0f647250, FP=0xf7f7ed50 [/lib/libc.so.6]
   1 do_spe_run       PC=0x0ff95c44, FP=0xf7f7edc0 [/usr/lib/libspe.so.1]
   2 spe_thread       PC=0x0ff96034, FP=0xf7f7ee60 [/usr/lib/libspe.so.1]
   3 start_thread     PC=0x0e8e69a0, FP=0xf7f7ee60 [/lib/libpthread.so.0]
   4 __clone          PC=0x0f64b5d0, FP=0xf7f7ee80 [/lib/libc.so.6]
d1.<> 

This system is running SDK 2.0 with the SDK 2.1 kernel (2.6.20) slapped into place.

Cheers, John D.
-- 
John V. DelSignore, Jr.                 jdelsign at totalviewtech.com
Chief Architect
TotalView Technologies (formerly Etnus) WWW:   http://www.totalviewtech.com
24 Prime Parkway                        Phone: (508) 652-7730
Natick, MA 01760                        FAX:   (508) 652-7787


