[Cbe-oss-dev] CBE core files have load segments that are off by the size of the SPU notes

John DelSignore jdelsign at totalviewtech.com
Thu Jun 28 22:42:46 EST 2007


Hi Michael,

Thanks for the reply...

Michael Ellerman wrote:
> On Wed, 2007-06-27 at 13:13 -0400, John DelSignore wrote:
...
>> For example, for the core file above, the total size of the "SPU/*"
>> named notes was 0x40DBC. So, for this segment:
>>
>>   LOAD 0x120000 0x10010000 0x00000000 0x10000 0x10000 RW  0x10000
>>
>> that is supposed to start at offset 0x120000 in the file, it actually
>> starts at 0x120000+0x40DBC == 0x160DBC in the file.
> 
> By my counting the total of the SPU notes is 0x40bb8. I take it there's
> 0x204 hardcoded somewhere?

readelf shows only the size of each note's data, not the 12-byte note header or the name string (variable length, padded to a sizeof(long) boundary). So if you added up just the data sizes, you would come up short by 0x204 (0x40DBC - 0x40BB8), which seems a reasonable total for the note headers and names.
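To make the header-plus-name overhead concrete, here is a minimal sketch of how one note's on-disk size can be computed. It assumes the layout of a standard ELF note: three 4-byte words (n_namesz, n_descsz, n_type), then the NUL-terminated name, then the data, with each variable-length field padded. The padding width is a parameter because it is an assumption here. The generic ELF convention pads to 4 bytes, while the message above describes sizeof(long). The note name "SPU/3/mem" is hypothetical and is used only for illustration.

```python
HEADER_SIZE = 12  # n_namesz, n_descsz, n_type: three 4-byte words


def align_up(n, align):
    """Round n up to the next multiple of align (align must be a power of two)."""
    return (n + align - 1) & ~(align - 1)


def note_file_size(name, desc_size, align=4):
    """On-disk size of one ELF note: header + padded name (with NUL) + padded data.

    The padding width `align` is an assumption; adjust it to match the producer.
    """
    name_size = len(name.encode()) + 1  # n_namesz counts the terminating NUL
    return HEADER_SIZE + align_up(name_size, align) + align_up(desc_size, align)


def notes_overhead(notes, align=4):
    """Bytes missing from readelf's data-size column: headers, names and padding."""
    total = sum(note_file_size(name, size, align) for name, size in notes)
    data = sum(size for _, size in notes)
    return total - data


# Hypothetical note: 12-byte header + "SPU/3/mem\0" padded to 12 + 0x40000 of data.
print(hex(note_file_size("SPU/3/mem", 0x40000)))  # 0x40018
```

Summing notes_overhead over all of the "SPU/*" notes is what would account for the 0x204 gap between readelf's data-only total and the real size of the notes in the file.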

>> Analyzing the core file leads me to believe that there is a "hole" (a
>> block of zeros) the size of the total SPU note size (0x40DBC) in the
>> file where the first LOAD segment should begin. Here's what "od -A x
>> -X core.11685" shows for that transitional area of the core file:
>>
>> 041770 00000000 00000000 fff80000 00000000
>> 041780 00000000 00000000 00000000 00000000
>> *
>> 090db0 00000000 00000000 00000000 7f454c46
>> 090dc0 01020100 00000000 00000000 00030014
> 
> Yep, I see that here too.
> 
> 
> The bug is in the kernel code that does the coredump. In particular the
> interaction between the generic code, and the code that dumps the SPU
> notes (which is specific to cell). It's just screwing up the amount of
> data written and erroneously incrementing the file position.

Yes, that's what I figured... one "fseek()" too many.

>> The ELF header for the vdso (ELF header starting with 0x7f454c46)
>> loaded at virtual address 0x00100000 appears at file position 0x090dbc
>> in the core file, but should start at offset 0x050000 according to the
>> program LOAD header; all of the LOAD segments were slid down the file
>> by 0x40DBC, and the length of the core file extended by 0x40DBC.
>>
>> Given that the total SPU note size chunk seems to be "inserted"
>> between the notes and program load segments, to work around this
>> problem, it is possible to add the total SPU note size to the file
>> positions of the PPU LOAD segments.
>>
>> If/when someone fixes this bug, we'll need to back out that
>> workaround. Alternately, it seems that at least for the core file
>> above, the overall length of the file is equal to the highest file
>> offset plus the total size of the SPU notes, so we might be able to
>> reliably detect the bug at runtime.
> 
> OK, I see that too. And with my fix it's back to length of file ==
> highest file offset.

Excellent. I have the workaround in place and the code to recognize when the workaround is not needed, so I should be in good shape for now.
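The workaround and its runtime check, as described in the quoted text above, can be sketched roughly as follows. This is not the actual debugger code. It assumes "highest file offset" means the largest p_offset + p_filesz over the LOAD segments, and that the SPU note total includes headers and names (0x40DBC, not readelf's 0x40BB8). The names LoadSegment, spu_notes_shift and adjusted_offset are hypothetical. The example uses only the one LOAD segment quoted in the thread, so its file sizes are illustrative.

```python
from dataclasses import dataclass


@dataclass
class LoadSegment:
    offset: int  # p_offset from the program header
    filesz: int  # p_filesz from the program header


def spu_notes_shift(file_size, segments, spu_notes_size):
    """Return how far LOAD segment data was slid down the file (0 if not at all)."""
    highest = max(seg.offset + seg.filesz for seg in segments)
    if file_size == highest:
        return 0  # fixed kernel: data is where the program headers say
    if file_size == highest + spu_notes_size:
        return spu_notes_size  # buggy kernel: every segment shifted by the notes size
    raise ValueError("core file size matches neither the fixed nor the buggy layout")


def adjusted_offset(seg, shift):
    """File position where the segment's data actually starts."""
    return seg.offset + shift


segs = [LoadSegment(0x120000, 0x10000)]  # the LOAD segment quoted above
shift = spu_notes_shift(0x130000 + 0x40DBC, segs, 0x40DBC)
print(hex(adjusted_offset(segs[0], shift)))  # 0x160dbc
```

Making the shift depend on the file length means the same reader handles both buggy and fixed kernels, so nothing has to be backed out once the patch lands.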

> Patch coming soon.

Thanks much!

Cheers, John D.
