Crash on MPC855T with 2.2.14

Marcelo Tosatti marcelo.tosatti at cyclades.com
Thu May 27 08:09:54 EST 2004


Hi PPC fellows,

We are seeing a crash under high load on our TS console servers (2.2.14 based).

The test used to reproduce the crash runs SSH connection
attempts in a loop from a fast host. After one or two hours of testing,
the crash happens. It's still possible to ping the box and it responds to
typed keys, but that's all. The kernel is looping in the page fault handling
code as follows, as observed with a BDI2000 and gdb:

(gdb) cont
Continuing.

(It locks up here, so I type "ctrl+c" in the gdb session.)

Program received signal SIGSTOP, Stopped (signal).
local_flush_tlb_page (vma=0xce678200, vmaddr=2147481140) at init.c:549
549             asm volatile ("tlbia" : : );
(gdb) bt
#0  local_flush_tlb_page (vma=0xce678200, vmaddr=2147481140) at init.c:549
#1  0xc0019368 in handle_mm_fault (tsk=0xce95e000, vma=0xce678200,
    address=2147481140, write_access=33554432) at memory.c:918
Cannot access memory at address 0xce95fca0
(gdb) cont
Continuing.

It keeps taking faults on this address (0x7FFFF634 in this example,
sometimes also 0x7FFFF630), which lies in the process's last VMA. Forever.

# cat /proc/1/maps

30023000-30026000 rwxp 00013000 01:00 249        /lib/ld-2.1.3.so
30026000-30027000 rwxp 00000000 00:00 0
7fffe000-80000000 rwxp fffff000 00:00 0
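
To cross-check the numbers: gdb prints the fault address in decimal
(2147481140) while the maps file uses hex. A minimal sketch of the check,
assuming the usual half-open [start, end) meaning of a maps line; the
function names are made up for illustration and are not kernel code:

```c
#include <stdio.h>

/* Hypothetical helper: return 1 if addr lies in [start, end),
 * the way a /proc/<pid>/maps line describes a VMA. */
int addr_in_vma(unsigned long addr, unsigned long start, unsigned long end)
{
    return addr >= start && addr < end;
}

/* Hypothetical helper: print a fault address in decimal and hex and say
 * whether it falls in the last VMA listed above (7fffe000-80000000). */
void report_fault_addr(unsigned long addr)
{
    printf("%lu = 0x%08lX, in last VMA: %s\n", addr, addr,
           addr_in_vma(addr, 0x7fffe000UL, 0x80000000UL) ? "yes" : "no");
}
```

Calling report_fault_addr(2147481140UL) shows the decimal value is
0x7FFFF634, which is inside that mapping.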

The "error_code" passed to "do_page_fault" during this endless loop
is either 0xE (14) or 0x82000000 (2181038080).
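
For what it's worth, the write_access value in the backtrace (33554432) is
0x02000000, which is one of the bits set in 0x82000000. The sketch below
only shows that arithmetic. It assumes the fault handler derives
write_access by masking error_code with that bit; the macro and function
names are invented for illustration:

```c
#include <stdint.h>

/* Assumption: this is the bit the fault handler tests to decide that the
 * access was a store. The name is hypothetical. */
#define SKETCH_STORE_BIT 0x02000000u

/* Hypothetical helper: the write_access value that such a mask would
 * produce for a given error_code. */
uint32_t sketch_write_access(uint32_t error_code)
{
    return error_code & SKETCH_STORE_BIT;
}
```

Under that assumption, error_code 0x82000000 gives write_access 33554432
as seen in the trace, while error_code 0xE gives 0.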

handle_mm_fault trace for such "unsuccessful pte bringup":

#0  handle_mm_fault (tsk=0xce70c000, vma=0xce6188c0, address=2147481140,
    write_access=33554432) at memory.c:901

903             if (!pte_present(entry)) {
909             entry = pte_mkyoung(entry);
910             set_pte(pte, entry);
911             flush_tlb_page(vma, address);
912             if (write_access) {
913                     if (!pte_write(entry))
303             pte_val(pte) |= _PAGE_DIRTY;
304             if (pte_val(pte) & _PAGE_RW)
305                     pte_val(pte) |= _PAGE_HWWRITE;
918                     flush_tlb_page(vma, address);
916                     entry = pte_mkdirty(entry);
917                     set_pte(pte, entry);
918                     flush_tlb_page(vma, address);
921             return 1;
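
The pte updates stepped through above can be modelled in user space as a
rough sketch. The bit values below are placeholders chosen for
illustration, NOT the real 8xx definitions, and the SK_/sk_ names are
made up; only the logic of lines 303-305 is taken from the trace:

```c
#include <stdint.h>

/* Placeholder bit values, for illustration only. */
#define SK_PAGE_PRESENT  0x001u
#define SK_PAGE_RW       0x002u
#define SK_PAGE_ACCESSED 0x004u
#define SK_PAGE_DIRTY    0x008u
#define SK_PAGE_HWWRITE  0x010u

/* Mark the pte as recently accessed. */
uint32_t sk_pte_mkyoung(uint32_t pte)
{
    return pte | SK_PAGE_ACCESSED;
}

/* Mirror of lines 303-305 in the trace: always set DIRTY, but set
 * HWWRITE only when the pte is already software-writable (RW). */
uint32_t sk_pte_mkdirty(uint32_t pte)
{
    pte |= SK_PAGE_DIRTY;
    if (pte & SK_PAGE_RW)
        pte |= SK_PAGE_HWWRITE;
    return pte;
}
```

In this model a pte without the RW bit never gets HWWRITE from
sk_pte_mkdirty(), so it is worth dumping the real pte value after
set_pte() to see which bits actually end up set.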

I need to figure out why it is faulting. Maybe the pte
is not being set up correctly.

Any hints are welcome.

/proc/cpuinfo
processor       : 0
cpu             : 8xx
clock           : 48MHz
clock           : 48MHz
bus clock       : 48MHz
revision        : 0.0
bogomips        : 47.82
zero pages      : total 0 (0Kb) current: 0 (0Kb) hits: 0/124087 (0%)

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/




