[Cbe-oss-dev] CBE core files have load segments that are off by the size of the SPU notes
John DelSignore
jdelsign at totalviewtech.com
Thu Jun 28 03:13:26 EST 2007
Hi,
An IBM guy pointed me to this list for reporting Linux-Cell problems... I hope this is the right forum.
I believe that I have found a bug in the core file dumper on Linux Cell when there are SPU threads present in the PPU process.
Summary:
CBE core files have load segments that are off by the size of the SPU notes.
Background:
A CBE program can dump a core file, and when it does, it creates notes for the SPU threads that contain all of the registers and memory for the SPU context. Here's an example:
alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/bld/ppu-gcc_4.1.1_32>readelf -n core.11685
Notes at offset 0x000003f4 with length 0x0004138c:
  Owner                 Data size   Description
  CORE                  0x0000010c  NT_PRSTATUS (prstatus structure)
  CORE                  0x00000080  NT_PRPSINFO (prpsinfo structure)
  CORE                  0x000000b0  NT_AUXV (auxiliary vector)
  CORE                  0x00000108  NT_FPREGSET (floating point registers)
  SPU/3/regs            0x00000800  NT_PRSTATUS (prstatus structure)
  SPU/3/fpcr            0x00000010  NT_PRSTATUS (prstatus structure)
  SPU/3/lslr            0x0000000b  NT_PRSTATUS (prstatus structure)
  SPU/3/decr            0x0000000b  NT_PRSTATUS (prstatus structure)
  SPU/3/decr_status     0x0000000b  NT_PRSTATUS (prstatus structure)
  SPU/3/mem             0x00040000  NT_PRSTATUS (prstatus structure)
  SPU/3/signal1         0x00000004  NT_PRSTATUS (prstatus structure)
  SPU/3/signal1_type    0x00000002  NT_PRSTATUS (prstatus structure)
  SPU/3/signal2         0x00000004  NT_PRSTATUS (prstatus structure)
  SPU/3/signal2_type    0x00000002  NT_PRSTATUS (prstatus structure)
  SPU/3/event_mask      0x00000008  NT_PRSTATUS (prstatus structure)
  SPU/3/event_status    0x00000008  NT_PRSTATUS (prstatus structure)
  SPU/3/mbox_info       0x00000004  NT_PRSTATUS (prstatus structure)
  SPU/3/ibox_info       0x00000004  NT_PRSTATUS (prstatus structure)
  SPU/3/wbox_info       0x00000010  NT_PRSTATUS (prstatus structure)
  SPU/3/dma_info        0x00000228  NT_PRSTATUS (prstatus structure)
  SPU/3/proxydma_info   0x00000118  NT_PRSTATUS (prstatus structure)
  SPU/3/object-id       0x00000013  NT_PRSTATUS (prstatus structure)
  CORE                  0x0000010c  NT_PRSTATUS (prstatus structure)
  CORE                  0x00000108  NT_FPREGSET (floating point registers)
alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/bld/ppu-gcc_4.1.1_32>
As with normal Linux core files, the PPU memory is saved in the program header load segments:
alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/bld/ppu-gcc_4.1.1_32>readelf -l core.11685
Elf file type is CORE (Core file)
Entry point 0x0
There are 30 program headers, starting at offset 52
Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
  NOTE  0x0003f4 0x00000000 0x00000000 0x4138c  0x00000      0
  LOAD  0x050000 0x00100000 0x00000000 0x20000  0x20000  R E 0x10000
  LOAD  0x070000 0x002e0000 0x00000000 0x00000  0x110000 R E 0x10000
  LOAD  0x070000 0x003f0000 0x00000000 0x10000  0x10000  R   0x10000
  LOAD  0x080000 0x00400000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x090000 0x0e820000 0x00000000 0x00000  0x20000  R E 0x10000
  LOAD  0x090000 0x0e840000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x0a0000 0x0e8e0000 0x00000000 0x00000  0x20000  R E 0x10000
  LOAD  0x0a0000 0x0e900000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x0b0000 0x0f480000 0x00000000 0x00000  0xc0000  R E 0x10000
  LOAD  0x0b0000 0x0f540000 0x00000000 0x10000  0x10000  R   0x10000
  LOAD  0x0c0000 0x0f550000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x0d0000 0x0f560000 0x00000000 0x00000  0x160000 R E 0x10000
  LOAD  0x0d0000 0x0f6c0000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x0e0000 0x0f6d0000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x0f0000 0x0fb70000 0x00000000 0x00000  0x10000  R E 0x10000
  LOAD  0x0f0000 0x0fb80000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x100000 0x0ff90000 0x00000000 0x00000  0x20000  R E 0x10000
  LOAD  0x100000 0x0ffb0000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x110000 0x0ffc0000 0x00000000 0x00000  0x20000  R E 0x10000
  LOAD  0x110000 0x0ffe0000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x120000 0x10000000 0x00000000 0x00000  0x10000  R E 0x10000
  LOAD  0x120000 0x10010000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x130000 0x10020000 0x00000000 0x30000  0x30000  RWE 0x10000
  LOAD  0x160000 0xf7780000 0x00000000 0x10000  0x10000      0x10000
  LOAD  0x170000 0xf7790000 0x00000000 0x7f0000 0x7f0000 RW  0x10000
  LOAD  0x960000 0xf7f80000 0x00000000 0x00000  0x40000  RW  0x10000
  LOAD  0x960000 0xf7fc0000 0x00000000 0x10000  0x10000  RW  0x10000
  LOAD  0x970000 0xf7fe0000 0x00000000 0x00000  0x10000  RW  0x10000
  LOAD  0x970000 0xfce30000 0x00000000 0x150000 0x150000 RW  0x10000
alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/bld/ppu-gcc_4.1.1_32>
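For anyone who wants to inspect these headers without readelf, here is a minimal Python sketch that reads the program headers of a 32-bit big-endian ELF file, which is the layout this core file uses. The function name read_program_headers is made up for illustration, and the sketch assumes the standard ELF32 header layouts (a 52-byte ELF header and 32-byte program headers). Those sizes are consistent with the listing above: 30 headers starting at offset 52 end at 52 + 30*32 = 0x3f4, which is exactly where the NOTE segment begins.

```python
import struct

# Standard ELF32 big-endian layouts (assumption: a 32-bit PowerPC core file).
EHDR = struct.Struct(">16sHHIIIIIHHHHHH")  # Elf32_Ehdr, 52 bytes
PHDR = struct.Struct(">IIIIIIII")          # Elf32_Phdr, 32 bytes
PT_LOAD, PT_NOTE = 1, 4


def read_program_headers(data):
    """Return a list of (p_type, p_offset, p_vaddr, p_filesz, p_memsz) tuples.

    `data` holds the leading bytes of the file; it must cover the ELF header
    and the whole program header table.
    """
    fields = EHDR.unpack_from(data, 0)
    ident, e_phoff, e_phentsize, e_phnum = fields[0], fields[5], fields[9], fields[10]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 1 or ident[5] != 2:
        raise ValueError("this sketch only handles ELF32 big-endian files")

    headers = []
    for i in range(e_phnum):
        p_type, p_offset, p_vaddr, _p_paddr, p_filesz, p_memsz, _p_flags, _p_align = \
            PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        headers.append((p_type, p_offset, p_vaddr, p_filesz, p_memsz))
    return headers
```

Reading the first few kilobytes of core.11685 and passing them to read_program_headers should give the same offsets and sizes that readelf prints above.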
Description:
There seems to be a bug in the kernel (or whatever creates the core file) that double counts the total size of the SPU notes, which throws off both the file offsets of the LOAD segments and the size of the core file itself.
For example, for the core file above, the total size of the "SPU/*" named notes was 0x40DBC. So, for this segment:
LOAD 0x120000 0x10010000 0x00000000 0x10000 0x10000 RW 0x10000
which is supposed to start at offset 0x120000 in the file, the data actually starts at 0x120000+0x40DBC == 0x160DBC in the file.
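The 0x40DBC figure can be reproduced from the readelf -n listing. The sketch below assumes the usual ELF note layout: a 12-byte header (namesz, descsz, type), followed by the NUL-terminated name and then the descriptor, each padded to a 4-byte boundary. The helper names align4 and note_size are made up for illustration; the note names and data sizes are copied from the listing.

```python
def align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def note_size(name, descsz):
    """On-disk size of one ELF note, assuming the standard note layout."""
    return 12 + align4(len(name) + 1) + align4(descsz)


# (name, data size) pairs copied from the readelf -n output.
SPU_NOTES = [
    ("SPU/3/regs", 0x800), ("SPU/3/fpcr", 0x10), ("SPU/3/lslr", 0xb),
    ("SPU/3/decr", 0xb), ("SPU/3/decr_status", 0xb), ("SPU/3/mem", 0x40000),
    ("SPU/3/signal1", 0x4), ("SPU/3/signal1_type", 0x2),
    ("SPU/3/signal2", 0x4), ("SPU/3/signal2_type", 0x2),
    ("SPU/3/event_mask", 0x8), ("SPU/3/event_status", 0x8),
    ("SPU/3/mbox_info", 0x4), ("SPU/3/ibox_info", 0x4),
    ("SPU/3/wbox_info", 0x10), ("SPU/3/dma_info", 0x228),
    ("SPU/3/proxydma_info", 0x118), ("SPU/3/object-id", 0x13),
]
CORE_NOTES = [
    ("CORE", 0x10c), ("CORE", 0x80), ("CORE", 0xb0),
    ("CORE", 0x108), ("CORE", 0x10c), ("CORE", 0x108),
]

spu_total = sum(note_size(name, size) for name, size in SPU_NOTES)
all_total = spu_total + sum(note_size(name, size) for name, size in CORE_NOTES)

print(hex(spu_total))             # 0x40dbc, the total size of the SPU notes
print(hex(all_total))             # 0x4138c, the length of the NOTE segment
print(hex(0x120000 + spu_total))  # 0x160dbc, where the segment data really is
```

Under that layout assumption, the SPU notes add up to 0x40DBC, and all of the notes together add up to the 0x4138c length reported for the NOTE segment.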
Analyzing the core file leads me to believe that there is a "hole" (a block of zeros) the size of the total SPU note size (0x40DBC) in the file where the first LOAD segment should begin. Here's what "od -A x -X core.11685" shows for that transitional area of the core file:
041770 00000000 00000000 fff80000 00000000
041780 00000000 00000000 00000000 00000000
*
090db0 00000000 00000000 00000000 7f454c46
090dc0 01020100 00000000 00000000 00030014
The ELF header for the vdso (ELF header starting with 0x7f454c46) loaded at virtual address 0x00100000 appears at file position 0x090dbc in the core file, but should start at offset 0x050000 according to the program LOAD header. All of the LOAD segments were slid down the file by 0x40DBC, and the length of the core file was extended by 0x40DBC.
Given that a chunk the size of the total SPU notes seems to be "inserted" between the notes and the program load segments, it is possible to work around this problem by adding the total SPU note size to the file positions of the PPU LOAD segments.
If/when someone fixes this bug, we'll need to back out that workaround. Alternatively, it seems that at least for the core file above, the overall length of the file is equal to the end of the last LOAD segment (its file offset plus its file size) plus the total size of the SPU notes, so we might be able to reliably detect the bug at runtime.
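The workaround and the runtime check described above could be sketched roughly as follows. This is only an illustration, not what TotalView actually does: the function names detect_slide and actual_offset are hypothetical, and the check rests on the observation from this one core file that the file is longer than the end of the last LOAD segment by exactly the total SPU note size.

```python
def detect_slide(file_size, load_segments, spu_note_total):
    """Return the number of bytes the LOAD data appears to be shifted by.

    load_segments is an iterable of (p_offset, p_filesz) pairs taken from the
    program headers. Returns spu_note_total if the file is longer than the
    headers claim by exactly that amount, otherwise 0 (no workaround needed).
    """
    expected_end = max(offset + filesz for offset, filesz in load_segments)
    if spu_note_total and file_size - expected_end == spu_note_total:
        return spu_note_total
    return 0


def actual_offset(p_offset, slide):
    """File position at which a LOAD segment's data really starts."""
    return p_offset + slide


# Values from the core file above. The file size is not a measurement; it is
# inferred from the observation that the file grew by the SPU note total.
segments = [(0x050000, 0x20000), (0x120000, 0x10000), (0x970000, 0x150000)]
spu_total = 0x40DBC
inferred_file_size = 0x970000 + 0x150000 + spu_total

slide = detect_slide(inferred_file_size, segments, spu_total)
print(hex(actual_offset(0x050000, slide)))  # 0x90dbc, where the vdso ELF header was found
```

Because the check returns 0 when the file length matches the program headers, a core file written by a fixed kernel would be read with unmodified offsets and the workaround would not need to be backed out by hand.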
GDB gets upset with the core file:
alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/bld/ppu-gcc_4.1.1_32>ppu-gdb tx_cell_dynamic_spe1 core.11685
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc64-linux"...Using host libthread_db library "/lib64/libthread_db.so.1".
warning: core file may not match specified executable file.
Failed to read a valid object file image from memory.
Core was generated by `tx_cell_dynamic_spe1 -1 ../spu-gcc_4.1.1_32/tx_loop'.
Program terminated with signal 3, Quit.
warning: you won't be able to access this core file until you terminate
your spu thread.; do ``info files''
(gdb) bt
#0 0x0e8e6e74 in ?? ()
#1 0x0e8e6e48 in ?? ()
(gdb) info th
2 process 11686 0x0f647250 in ?? ()
* 1 process 11685 0x0e8e6e74 in ?? ()
(gdb) thr 2
[Switching to thread 2 (process 11686)]#0 0x0f647250 in ?? ()
(gdb) bt
#0 0x0f647250 in ?? ()
#1 0x0ff95c48 in ?? ()
(gdb)
And with the workaround I mentioned above, our debugger TotalView can do substantially better than GDB:
alg:/nfs/algae/u0/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/bld/ppu-gcc_4.1.1_32>cli tx_cell_dynamic_spe1 core.11685 -verbosity errors
Copyright 2007 by TotalView Technologies, LLC. ALL RIGHTS RESERVED.
Copyright 1999-2007 by Etnus, LLC.
Copyright 1999 by Etnus, Inc.
Copyright 1996-1998 by Dolphin Interconnect Solutions, Inc.
Copyright 1989-1996 by BBN Inc.
d1.<> f a w 5
Thread 1.1:
> 0 pthread_join PC=0x0e8e6e74, FP=0xfcf7ec70 [/lib/libpthread.so.0]
1 spe_wait PC=0x0ff97d28, FP=0xfcf7ecb0 [/usr/lib/libspe.so.1]
2 main PC=0x10001ae0, FP=0xfcf7ed00 [/home/jdelsign/tvbld/linux-power/algae/totalview/debugger/src/tests/src/tx_cell_dynamic_spe1.c#77]
3 generic_start_main PC=0x0f57d63c, FP=0xfcf7ed00 [/lib/libc.so.6]
4 __libc_start_main PC=0x0f57d860, FP=0xfcf7ef30 [/lib/libc.so.6]
Thread 1.2:
> 0 syscall PC=0x0f647250, FP=0xf7f7ed50 [/lib/libc.so.6]
1 do_spe_run PC=0x0ff95c44, FP=0xf7f7edc0 [/usr/lib/libspe.so.1]
2 spe_thread PC=0x0ff96034, FP=0xf7f7ee60 [/usr/lib/libspe.so.1]
3 start_thread PC=0x0e8e69a0, FP=0xf7f7ee60 [/lib/libpthread.so.0]
4 __clone PC=0x0f64b5d0, FP=0xf7f7ee80 [/lib/libc.so.6]
d1.<>
This system is running SDK 2.0 with the SDK 2.1 kernel (2.6.20) slapped into place.
Cheers, John D.
--
John V. DelSignore, Jr. jdelsign at totalviewtech.com
Chief Architect
TotalView Technologies (formerly Etnus) WWW: http://www.totalviewtech.com
24 Prime Parkway Phone: (508) 652-7730
Natick, MA 01760 FAX: (508) 652-7787