Help with crash on MPC855T with 2.2.14

Marcelo Tosatti marcelo.tosatti at cyclades.com
Thu May 27 08:20:28 EST 2004


Forgot to mention that the same processor (on similar, but not identical,
hardware) running v2.4 does not crash under the same test.

On Wed, May 26, 2004 at 07:09:54PM -0300, Marcelo Tosatti wrote:
>
> Hi PPC fellows,
>
> We are facing a crash under high load on our TS console servers (2.2.14 based).
>
> The test used to reproduce the crash runs SSH connection attempts in a loop
> from a fast host. After one or two hours of testing, the crash happens. It's still
> possible to ping the box, and it echoes typed keys, but that's all. The kernel is looping
> in the page fault handling code, as shown below (observed with a BDI2000 and gdb):
>
> (gdb) cont
> Continuing.
>
> (it locks up here, so I press Ctrl+C in the gdb session)
>
> Program received signal SIGSTOP, Stopped (signal).
> local_flush_tlb_page (vma=0xce678200, vmaddr=2147481140) at init.c:549
> 549             asm volatile ("tlbia" : : );
> (gdb) bt
> #0  local_flush_tlb_page (vma=0xce678200, vmaddr=2147481140) at init.c:549
> #1  0xc0019368 in handle_mm_fault (tsk=0xce95e000, vma=0xce678200,
>     address=2147481140, write_access=33554432) at memory.c:918
> Cannot access memory at address 0xce95fca0
> (gdb) cont
> Continuing.
>
> And it keeps taking faults at this address (7FFFF634 in this example,
> sometimes 7FFFF630), which lies in the process's last VMA. Forever.
>
> # cat /proc/1/maps
>
> 30023000-30026000 rwxp 00013000 01:00 249        /lib/ld-2.1.3.so
> 30026000-30027000 rwxp 00000000 00:00 0
> 7fffe000-80000000 rwxp fffff000 00:00 0
>
> The "error_code" passed to "do_page_fault" in this endless loop
> is either 0xE (14) or 0x82000000 (2181038080).
>
> handle_mm_fault trace for one such "unsuccessful pte bringup":
>
> #0  handle_mm_fault (tsk=0xce70c000, vma=0xce6188c0, address=2147481140,
>     write_access=33554432) at memory.c:901
>
> 903             if (!pte_present(entry)) {
> 909             entry = pte_mkyoung(entry);
> 910             set_pte(pte, entry);
> 911             flush_tlb_page(vma, address);
> 912             if (write_access) {
> 913                     if (!pte_write(entry))
> 303             pte_val(pte) |= _PAGE_DIRTY;
> 304             if (pte_val(pte) & _PAGE_RW)
> 305                     pte_val(pte) |= _PAGE_HWWRITE;
> 918                     flush_tlb_page(vma, address);
> 916                     entry = pte_mkdirty(entry);
> 917                     set_pte(pte, entry);
> 918                     flush_tlb_page(vma, address);
> 921             return 1;
>
> I should try to figure out why it keeps faulting. Maybe the pte
> is not being set up correctly.
>
> Any hints are welcome.
>
> /proc/cpuinfo
> processor       : 0
> cpu             : 8xx
> clock           : 48MHz
> clock           : 48MHz
> bus clock       : 48MHz
> revision        : 0.0
> bogomips        : 47.82
> zero pages      : total 0 (0Kb) current: 0 (0Kb) hits: 0/124087 (0%)

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/