Crash on MPC855T with 2.2.14
Marcelo Tosatti
marcelo.tosatti at cyclades.com
Thu May 27 08:09:54 EST 2004
Hi PPC fellows,
We are seeing a crash under high load on our TS console servers (2.2.14 based).
The test used to reproduce the crash runs SSH connection attempts in a
loop from a fast host. After one or two hours of testing, the crash
happens. It's still possible to ping the box and it responds to typed
keys, but that's all. The kernel is looping in the page fault handling
code as follows, as observed with a BDI2000 and gdb:
(gdb) cont
Continuing.
(It locks up here, so I type Ctrl+C in the gdb session.)
Program received signal SIGSTOP, Stopped (signal).
local_flush_tlb_page (vma=0xce678200, vmaddr=2147481140) at init.c:549
549 asm volatile ("tlbia" : : );
(gdb) bt
#0 local_flush_tlb_page (vma=0xce678200, vmaddr=2147481140) at init.c:549
#1 0xc0019368 in handle_mm_fault (tsk=0xce95e000, vma=0xce678200,
address=2147481140, write_access=33554432) at memory.c:918
Cannot access memory at address 0xce95fca0
(gdb) cont
Continuing.
And it keeps taking faults on this address (7FFFF634 in this example,
sometimes also 7FFFF630), which is part of the process's last VMA. Forever.
# cat /proc/1/maps
30023000-30026000 rwxp 00013000 01:00 249 /lib/ld-2.1.3.so
30026000-30027000 rwxp 00000000 00:00 0
7fffe000-80000000 rwxp fffff000 00:00 0
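As a sanity check on the numbers above: the decimal address gdb prints (2147481140) is 0x7FFFF634, and it falls inside the last mapping listed in /proc/1/maps. A minimal C sketch of that check follows. The names vma_range, addr_in_range and report_fault_addr are made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one line of /proc/<pid>/maps: [start, end). */
struct vma_range {
    uint32_t start;
    uint32_t end;
};

/* Returns 1 if addr lies inside the half-open range [start, end). */
int addr_in_range(const struct vma_range *r, uint32_t addr)
{
    assert(r->start < r->end);
    return addr >= r->start && addr < r->end;
}

/* Prints the address in hex and whether the last mapping covers it. */
void report_fault_addr(uint32_t addr)
{
    /* Last mapping from the /proc/1/maps output: 7fffe000-80000000. */
    const struct vma_range last_vma = { 0x7fffe000u, 0x80000000u };

    printf("fault address 0x%08lX: %s\n", (unsigned long)addr,
           addr_in_range(&last_vma, addr) ? "inside 7fffe000-80000000"
                                          : "outside 7fffe000-80000000");
}
```

Calling report_fault_addr(2147481140u) reports the address as 0x7FFFF634 and inside the range; the other address seen, 0x7FFFF630, lands in the same mapping.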
The "error_code" passed to "do_page_fault" during this endless loop
is either 0xE (14) or 0x82000000 (2181038080).
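These values can be related to the write_access argument in the backtrace: 33554432 is 0x02000000, which is exactly what remains when 0x82000000 is masked with 0x02000000, while 0xE has that bit clear. A small C sketch of the arithmetic follows. The name FAULT_STORE_BIT is a label invented here, and reading this bit as "the access was a store" is an assumption about how the fault path derives write_access, not something verified against the 2.2.14 source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed mask: the value gdb shows for write_access (33554432). */
#define FAULT_STORE_BIT 0x02000000u

/* Returns the value that masking error_code with the assumed bit yields. */
uint32_t write_access_from(uint32_t error_code)
{
    return error_code & FAULT_STORE_BIT;
}

/* Prints an error code in hex and decimal with the derived write_access. */
void show_error_code(uint32_t error_code)
{
    printf("error_code 0x%08lX (%lu) -> write_access %lu\n",
           (unsigned long)error_code, (unsigned long)error_code,
           (unsigned long)write_access_from(error_code));
}

/* Checks the hex/decimal pairs quoted in the report. */
void self_check(void)
{
    assert(0x82000000u == 2181038080u);
    assert(FAULT_STORE_BIT == 33554432u);
}
```

Under this assumption only the 0x82000000 case would reach handle_mm_fault with a non-zero write_access; the 0xE case would be passed as a read.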
handle_mm_fault trace for such "unsuccessful pte bringup":
#0 handle_mm_fault (tsk=0xce70c000, vma=0xce6188c0, address=2147481140,
write_access=33554432) at memory.c:901
903 if (!pte_present(entry)) {
909 entry = pte_mkyoung(entry);
910 set_pte(pte, entry);
911 flush_tlb_page(vma, address);
912 if (write_access) {
913 if (!pte_write(entry))
303 pte_val(pte) |= _PAGE_DIRTY;
304 if (pte_val(pte) & _PAGE_RW)
305 pte_val(pte) |= _PAGE_HWWRITE;
918 flush_tlb_page(vma, address);
916 entry = pte_mkdirty(entry);
917 set_pte(pte, entry);
918 flush_tlb_page(vma, address);
921 return 1;
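Lines 303-305 in the trace are the inlined body of pte_mkdirty, and they show that _PAGE_HWWRITE is set only when _PAGE_RW is already present in the pte. A minimal user-space model of those three lines is sketched below. The bit values are hypothetical placeholders chosen for illustration; the real 8xx definitions live in the kernel's pgtable.h and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define PAGE_RW      0x0004u
#define PAGE_DIRTY   0x0100u
#define PAGE_HWWRITE 0x0200u

/* Mirrors the three inlined lines (303-305) shown in the trace. */
uint32_t model_pte_mkdirty(uint32_t pte)
{
    pte |= PAGE_DIRTY;
    if (pte & PAGE_RW)
        pte |= PAGE_HWWRITE;
    return pte;
}

/* Checks both branches of the model. */
void self_check_pte(void)
{
    /* Writable pte: both DIRTY and HWWRITE end up set. */
    assert(model_pte_mkdirty(PAGE_RW) ==
           (PAGE_RW | PAGE_DIRTY | PAGE_HWWRITE));
    /* Non-writable pte: only DIRTY is set, HWWRITE stays clear. */
    assert(model_pte_mkdirty(0u) == PAGE_DIRTY);
}
```

In the trace above the step from line 913 straight into 303-305 suggests pte_write(entry) was true, so this model would predict HWWRITE being set on the value handed to set_pte; whether the hardware then sees that value is the open question.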
I need to figure out why it keeps faulting. Maybe the pte
is not being set up correctly.
Any hints are welcome.
/proc/cpuinfo
processor : 0
cpu : 8xx
clock : 48MHz
bus clock : 48MHz
revision : 0.0
bogomips : 47.82
zero pages : total 0 (0Kb) current: 0 (0Kb) hits: 0/124087 (0%)
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/