KDB updates

Fri Mar 19 08:55:27 EST 2004

Hi,

I am cc'ing the public mailing list since I figure this might be of
general interest.

On Thu, Mar 18, 2004 at 07:16:47PM +0500, Ananth N Mavinakayanahalli wrote:
>
> We are working with your patch (now in ameslab) and found an issue with
> stack backtrace for the "current" process. Inlined is a patch that fixes
> it (uses a method similar to the i386 code) and it also contains a few
> other checks. Please comment....

> We have been working on a Power3 box all this while and KDB works fine.
> We are now trying it on a p630 but the machine locks up solid immediately
> upon entry; we have to reset it through the hmc console.
>
> Just curious if you had the patch working on SMP? Are there any other
> issues to take care of in this case? Any pointers would be helpful.

Works on Power3 SMP for me; I will try a Power4 LPAR when I get the chance.

------
I'm about to try your patch.  In the meantime, there are several other
bugs that should be fixed.

-- It takes 10-15 seconds between 'startKDB' and getting the prompt.
   I think there is something wrong with the way the IPI to stop the
   other cpus is handled.  I tried reading the code but couldn't find
   the problem.

-- When started with 'startKDB' it seems to work, but when started via
   the 'little yellow button', the 'go' command doesn't seem to be handled.
   It seems like the Linux kernel oops handler also wants to run, and
   that handler doesn't correctly handle the system reset interrupt,
   at which point the machine powers off.  (That's wrong: the system
   reset interrupt is fully recoverable, and the debugger should allow
   the system to resume where it was before the system reset.)

   I think the bug is related to arch/ppc64/kernel/traps.c:173
      if (!debugger(regs))
              die("System Reset", regs, 0);

   Either those lines are wrong, or kdb should be returning a non-zero
   return code, I'm not sure which.

   Note also that some garbled print output should be cleaned up, as
   the console log below shows.

panic:~ #
panic:~ #
panic:~ # O<op4s>:[ aStytsetnetmi oRne]s32et00,  KsDiBg :C a0l l[ # 1 ]
 S  M
NR_CPUS=32
NIP: C0000000000145B0 XER: 0000000020000000 LR: C0000000000145E0
REGS: c00000003ff93b60 TRAP: 0100   Not tainted  (2.6.5-rc1-ames)
MSR: a000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK: c00000003ff96e00[0] 'swapper' THREAD: c00000003ff90000 CPU: 1
GPR00: 0000000000000000 C00000003FF93DE0 C0000000006D5230 C00000003F5A27D8
GPR04: C00000003FF972B0 0000000000000001 0000000022014852 0000000000000000
GPR08: 0000000002B15480 C00000003FF90000 C0000000006D3008 C00000003FF90000
GPR12: 0000000024022822 C0000000004B2000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 FFFFFFFF8AC00300 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
GPR24: 0000000000000001 0000000000000010 0000000000000010 C00000003FF90010
GPR28: C00000003FF90000 0000000000000008 C00000003FF90000 C00000003FF90010
NIP [c0000000000145b0] .default_idle+0x78/0xc4
LR [c0000000000145e0] .default_idle+0xa8/0xc4
Call Trace:
[c000000000014860] .cpu_idle+0x2c/0x44
[c000000000043070] .start_secondary+0xf0/0x130
[c00000000000bb94] .enable_64b_mode+0x0/0x28
 2 cpus are not in kdb, their state is unknown

Entering kdb (current=0xc00000000053ef30, pid 0) on processor 0 due to KDB_ENTE)[0]kdb>
[0]kdb>
[0]kdb> go
[attention]3300 KDB Done
Oops: System Reset, sig: 0 [#2]
SMP NR_CPUS=32
NIP: C0000000000145BC XER: 0000000000000000 LR: C0000000000145E0
REGS: c0000000004a<fa0d>0K erTRnAelP :p a01n0ic0:   A Ntotetm pttaeindt etdo   s!                                                                              kMSRIn:  ai0d0l0e0 0t0a0s0k 0-0 0n9o0t3 2 sEyEn:c i1n gP
:  0 FP: 0 ME: 1 IR/DR: 11                             R
TASK: c00000000053ef30[0] 'swapper' THREAD: c0000000004ac000 CPU: 0
GPR00: 0000000000000000 C0000000004AFD50 C0000000006D5230 0000000000000000
GPR04: C00000000053F3E0 0000000000000001 0000000028000022 0000000000000000
GPR08: 0000000002B0D480 C0000000004AC000 C0000000006D3008 C0000000004AC000
GPR12: 0000000028004488 C0000000004B0000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000001410000
GPR24: C0000000004B0000 0000000000000010 0000000000000010 C0000000004AC010
GPR28: C0000000004AC000 0000000000000008 C0000000004AC000 C0000000004AC010
NIP [c0000000000145bc] .default_idle+0x84/0xc4
LR [c0000000000145e0] .default_idle+0xa8/0xc4
Call Trace:
[c000000000014860] .cpu_idle+0x2c/0x44
[c00000000000bf78] .rest_init+0x74/0x8c
[c00000000045fabc] .start_kernel+0x274/0x2ec
[c00000000000beec] .__setup_cpu_power3+0x0/0x4
Badness in do_unblank_screen at drivers/char/vt.c:2822
Call Trace:
[c0000000001e5610] .bust_spinlocks+0x58/0x84
[c000000000011f54] .die+0xf4/0x184
[c0000000000121f8] .SystemResetException+0x74/0xb4
[c00000000000a0e0] SystemReset_common+0xe0/0x0
[c0000000000145e0] .default_idle+0xa8/0xc4
[c000000000014860] .cpu_idle+0x2c/0x44
[c00000000000bf78] .rest_init+0x74/0x8c
[c00000000045fabc] .start_kernel+0x274/0x2ec
[c00000000000beec] .__setup_cpu_power3+0x0/0x4
 smp_call_function on cpu 1: other cpus not responding (0)
kdb: Debugger re-entered on cpu 1, new reason = 12
     Not executing a kdb command
     No longjmp available for recovery
     Cannot recover, allowing event to proceed
2 cpus are not in kdb, their state is unknown

Entering kdb (current=0xc00000003ff96e00, pid 0) on processor 1
[1]kdb>
[1]kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, type go a second time if you really want to contine[1]kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, attempting to continue
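
To make the return-code question about traps.c concrete, here is a minimal
user-space sketch of the contract I have in mind. It assumes the convention
that debugger() returns non-zero when the debugger has handled the event.
Everything here is a stand-in for illustration: the struct pt_regs layout,
the stubbed debugger() and die(), and the helper handle_reset_with() are
hypothetical and are not the kernel's or kdb's actual code.

```c
/* Stand-in for the kernel's register frame; the real layout differs. */
struct pt_regs {
	unsigned long nip;
};

static int debugger_rc;   /* value the stubbed debugger() will return */
static int die_calls;     /* how many times the stubbed die() has run */

/* Stub: assumed to return non-zero when the debugger handled the event. */
static int debugger(struct pt_regs *regs)
{
	(void)regs;
	return debugger_rc;
}

/* Stub: the real die() oopses; this one only records that it was called. */
static void die(const char *str, struct pt_regs *regs, long err)
{
	(void)str;
	(void)regs;
	(void)err;
	die_calls++;
}

/* Same shape as the two lines quoted from traps.c. */
static void system_reset_exception(struct pt_regs *regs)
{
	if (!debugger(regs))
		die("System Reset", regs, 0);
	/* Otherwise fall through, i.e. resume the interrupted context. */
}

/* Hypothetical helper: returns 1 if die() ran for the given return code. */
static int handle_reset_with(int rc)
{
	struct pt_regs regs = { 0 };

	debugger_rc = rc;
	die_calls = 0;
	system_reset_exception(&regs);
	return die_calls;
}
```

With these stubs, handle_reset_with(1) leaves die() uncalled, while
handle_reset_with(0) reaches it. If kdb returns zero after 'go', that second
path would match the "Oops: System Reset" in the log above, but I have not
confirmed that this is what actually happens.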

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/