[Cbe-oss-dev] CBE core files have load segments that are off by the size of the SPU notes

Michael Ellerman michael at ellerman.id.au
Thu Jun 28 17:02:06 EST 2007


On Wed, 2007-06-27 at 13:13 -0400, John DelSignore wrote:
> Hi,
> 
> An IBM guy pointed me to this list for reporting Linux-Cell
> problems... I hope this is the right forum.

Hi John,

This is indeed the right place. Thanks for the excellent bug report.

> I believe that I have found a bug in the core file dumper on Linux
> Cell when there are SPU threads present in the PPU process.

Sure looks like it :)

> Notes at offset 0x000003f4 with length 0x0004138c:
>   Owner         Data size       Description
>   CORE          0x0000010c      NT_PRSTATUS (prstatus structure)
>   CORE          0x00000080      NT_PRPSINFO (prpsinfo structure)
>   CORE          0x000000b0      NT_AUXV (auxiliary vector)
>   CORE          0x00000108      NT_FPREGSET (floating point registers)
>   SPU/3/regs    0x00000800      NT_PRSTATUS (prstatus structure)
>   SPU/3/fpcr    0x00000010      NT_PRSTATUS (prstatus structure)
>   SPU/3/lslr    0x0000000b      NT_PRSTATUS (prstatus structure)
>   SPU/3/decr    0x0000000b      NT_PRSTATUS (prstatus structure)
>   SPU/3/decr_status 0x0000000b  NT_PRSTATUS (prstatus structure)
>   SPU/3/mem     0x00040000      NT_PRSTATUS (prstatus structure)
>   SPU/3/signal1 0x00000004      NT_PRSTATUS (prstatus structure)
>   SPU/3/signal1_type 0x00000002 NT_PRSTATUS (prstatus structure)
>   SPU/3/signal2 0x00000004      NT_PRSTATUS (prstatus structure)
>   SPU/3/signal2_type 0x00000002 NT_PRSTATUS (prstatus structure)
>   SPU/3/event_mask 0x00000008   NT_PRSTATUS (prstatus structure)
>   SPU/3/event_status 0x00000008 NT_PRSTATUS (prstatus structure)
>   SPU/3/mbox_info 0x00000004    NT_PRSTATUS (prstatus structure)
>   SPU/3/ibox_info 0x00000004    NT_PRSTATUS (prstatus structure)
>   SPU/3/wbox_info 0x00000010    NT_PRSTATUS (prstatus structure)
>   SPU/3/dma_info  0x00000228    NT_PRSTATUS (prstatus structure)
>   SPU/3/proxydma_info 0x00000118 NT_PRSTATUS (prstatus structure)
>   SPU/3/object-id 0x00000013     NT_PRSTATUS (prstatus structure)
>   CORE          0x0000010c      NT_PRSTATUS (prstatus structure)
>   CORE          0x00000108      NT_FPREGSET (floating point registers)


> For example, for the core file above, the total size of the "SPU/*"
> named notes was 0x40DBC. So, for this segment:
> 
>   LOAD 0x120000 0x10010000 0x00000000 0x10000 0x10000 RW  0x10000
> 
> that is supposed to start at offset 0x120000 in the file, it actually
> starts at 0x120000+0x40DBC == 0x160DBC in the file.

By my count the data sizes of the SPU notes total 0x40bb8. I take it
the other 0x204 is hardcoded somewhere?
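
For what it's worth, the 0x204 gap is consistent with ordinary ELF note
framing rather than a hardcoded constant, though nothing in this thread
confirms that is where it comes from. A rough sketch of the arithmetic,
assuming each note carries a 12-byte header and that the name (including
its NUL) and the data are each padded to 4 bytes; the helper names are
made up for illustration:

```python
# Names and data sizes copied from the SPU/3/* rows of the notes listing.
SPU_NOTES = [
    ("SPU/3/regs", 0x800), ("SPU/3/fpcr", 0x10), ("SPU/3/lslr", 0xb),
    ("SPU/3/decr", 0xb), ("SPU/3/decr_status", 0xb), ("SPU/3/mem", 0x40000),
    ("SPU/3/signal1", 0x4), ("SPU/3/signal1_type", 0x2),
    ("SPU/3/signal2", 0x4), ("SPU/3/signal2_type", 0x2),
    ("SPU/3/event_mask", 0x8), ("SPU/3/event_status", 0x8),
    ("SPU/3/mbox_info", 0x4), ("SPU/3/ibox_info", 0x4),
    ("SPU/3/wbox_info", 0x10), ("SPU/3/dma_info", 0x228),
    ("SPU/3/proxydma_info", 0x118), ("SPU/3/object-id", 0x13),
]

NHDR_SIZE = 12  # assumed note header: namesz, descsz, type (4 bytes each)


def pad4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def data_total(notes):
    """Sum of the raw data sizes only."""
    return sum(size for _, size in notes)


def framed_total(notes):
    """Sum including assumed header, padded name (with NUL) and padded data."""
    return sum(NHDR_SIZE + pad4(len(name) + 1) + pad4(size)
               for name, size in notes)


print(hex(data_total(SPU_NOTES)))    # 0x40bb8
print(hex(framed_total(SPU_NOTES)))  # 0x40dbc
```

Under those assumptions the framed total lands exactly on John's
0x40DBC, and the raw data sizes alone give 0x40bb8.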

> Analyzing the core file leads me to believe that there is a "hole" (a
> block of zeros) the size of the total SPU note size (0x40DBC) in the
> file where the first LOAD segment should begin. Here's what "od -A x
> -X core.11685" shows for that transitional area of the core file:
> 
> 041770 00000000 00000000 fff80000 00000000
> 041780 00000000 00000000 00000000 00000000
> *
> 090db0 00000000 00000000 00000000 7f454c46
> 090dc0 01020100 00000000 00000000 00030014

Yep, I see that here too.
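
The numbers in the dump can be cross-checked against the notes header
and the reported shift. This is only a sketch of the arithmetic using
values quoted in this thread; the variable and function names are
invented for the example:

```python
NOTES_OFFSET = 0x3f4      # from "Notes at offset 0x000003f4 ..."
NOTES_LENGTH = 0x4138c    # from "... with length 0x0004138c"
SPU_NOTE_TOTAL = 0x40dbc  # shift reported by John
LOAD_ALIGN = 0x10000      # alignment column of the quoted LOAD line


def od_word_offset(row_offset, word_index):
    """File offset of the Nth 32-bit word (0-based) on an 'od -X' row."""
    return row_offset + 4 * word_index


# The notes should end where the run of zeros begins.
notes_end = NOTES_OFFSET + NOTES_LENGTH

# 7f454c46 is the fourth word on the row labelled 090db0.
elf_magic_at = od_word_offset(0x090db0, 3)

# Undo the reported shift to see where that segment was meant to start.
unshifted = elf_magic_at - SPU_NOTE_TOTAL

print(hex(notes_end))       # 0x41780
print(hex(elf_magic_at))    # 0x90dbc
print(hex(unshifted))       # 0x50000
print(unshifted % LOAD_ALIGN == 0)  # True
```

So the zeros start right where the notes end, and removing the shift
puts the ELF magic back on a 0x10000-aligned offset.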


The bug is in the kernel code that does the coredump, specifically in
the interaction between the generic code and the code that dumps the
SPU notes (which is specific to Cell). It's just getting the amount of
data written wrong and erroneously incrementing the file position.

> The ELF header for the vdso (ELF header starting with 0x7f454c46)
> loaded at virtual address 0x00100000 appears at file position 0x090dbc
> in the core file, but should start at offset 0x050000 according to the
> program LOAD header; all of the LOAD segments were slid down the file
> by 0x40DBC, and the length of the core file extended by 0x40DBC.
> 
> Given that the total SPU note size chunk seems to be "inserted"
> between the notes and program load segments, to work around this
> problem, it is possible to add the total SPU note size to the file
> positions of the PPU LOAD segments.
> 
> If/when someone fixes this bug, we'll need to back out that
> workaround. Alternately, it seems that at least for the core file
> above, the overall length of the file is equal to the highest file
> offset plus the total size of the SPU notes, so we might be able to
> reliably detect the bug at runtime.

OK, I see that too. And with my fix it's back to length of file ==
highest file offset.
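
For a tool that has to cope with core files from unfixed kernels, the
check and workaround John describes might look roughly like the sketch
below. It handles 32-bit ELF images only, takes "highest file offset" to
mean the end of the last LOAD segment (my reading, not something spelled
out above), and expects the caller to supply the total SPU note size.
All function names are hypothetical.

```python
import struct

PT_LOAD = 1


def read_load_segments(data):
    """Return (p_offset, p_filesz) for each PT_LOAD in a 32-bit ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 1:
        raise ValueError("not a 32-bit ELF image")
    endian = ">" if data[5] == 2 else "<"
    # Elf32_Ehdr fields that follow the 16-byte e_ident.
    fields = struct.unpack_from(endian + "HHIIIIIHHHHHH", data, 16)
    e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]
    loads = []
    for i in range(e_phnum):
        # First five Elf32_Phdr words: type, offset, vaddr, paddr, filesz.
        p_type, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            endian + "5I", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            loads.append((p_offset, p_filesz))
    return loads


def looks_shifted(file_size, loads, spu_note_total):
    """Heuristic: the file is longer than expected by the SPU note total."""
    highest = max(off + size for off, size in loads)
    return spu_note_total > 0 and file_size == highest + spu_note_total


def corrected_offsets(loads, spu_note_total):
    """Workaround: slide every LOAD file offset by the SPU note total."""
    return [(off + spu_note_total, size) for off, size in loads]


# The LOAD line quoted earlier: offset 0x120000, file size 0x10000.
shifted = corrected_offsets([(0x120000, 0x10000)], 0x40dbc)
print([(hex(off), hex(size)) for off, size in shifted])
```

With the fix applied the file length equals the highest file offset
again, so looks_shifted() would return False and the offsets can be
used as-is.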

Patch coming soon.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person