Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP

Radu Rendec radu.rendec at gmail.com
Tue Jul 6 23:50:19 AEST 2021


On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote:
>Le 06/07/2021 à 13:56, Radu Rendec a écrit :
>> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote:
>>> Le 04/07/2021 à 23:38, Radu Rendec a écrit :
>>>> I'm trying to set up my (virtual) environment to test an old bug in the
>>>> PPC32 ptrace() code. I came across a completely different problem,
>>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if
>>>> this is a real kernel bug or maybe something wrong with my
>>>> configuration.
>>>>
>>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
>>>> native gdb (inside the VM) and setting a breakpoint in main() in a test
>>>> "hello world" program. Upon running the test program, I am hitting the
>>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
>>>> included below at the end of the email.
>>>>
>>>> FWIW, gdb says:
>>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
>>>> The program no longer exists.
>>>>
>>>> I also added a pr_info() to do_notify_resume() just to see how much
>>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are
>>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
>>>> be OK (pid and comm are consistent with the test program).
>>>
>>> The TRAP = 0x7d8 is obviously wrong.
>>>
>>> Need to know which 'TRAP' it is exactly.
>>> Could you try to dump what we have at the correct regs ?
>>> Something like 'show_regs(current->thread.regs)' should do it.
>> 
>> Sure, please see the output below. It looks to me like the "correct"
>> regs are just garbage. Either they are overwritten or current->thread.regs
>> is wrong. But in any case, r1 = 0 doesn't look good.
>
>Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct 
>exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see.
>
>Commit 
>https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26
>
>Need to find the best solution now to fix that.

Awesome, happy to see you figured it out so quickly.

I'm not sure if it makes any sense, but one thing that comes to mind is
to put struct exception_regs before struct pt_regs when the frame is
saved. Unless of course other parts of the code expect the opposite.

>> regs=c7a0ff10 cur=c7a0ff40 pid=139 comm=test
>> CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4
>> NIP:  10000338 LR: 00030000 CTR: 00000003
>> REGS: c7a0ff40 TRAP: 670000   Not tainted  (5.13.0-dirty)
>> MSR:  1002d202 <CE,EE,PR,ME,DE>  CR: 10000004  XER: 80670100
>> 
>> GPR00: b7fc36d8 00000000 00000000 b7fe17b4 00000000 b7ffd588 b7ffe8b8 b7ffee10
>> GPR08: b7fff214 b7ffdf40 b7fff208 bffff858 bffff970 b7fff130 00000001 bffff960
>> GPR16: b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 b7fc3714 1002d002 b7fff208 00000003
>> GPR24: b7fc3714 00000000 22000284 bffff960 000007d8 10000338 08000000 bffff850
>> NIP [10000338] 0x10000338
>> LR [00030000] 0x30000
>> Call Trace:
>> [c7a0fe40] [c0008eac] show_regs+0x4c/0x1b0 (unreliable)
>> [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320
>> [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0
>> [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c
>> --- interrupt: 7d8 at 0xb7fc3714
>> NIP:  b7fc3714 LR: b7fc3714 CTR: 00000003
>> REGS: c7a0ff10 TRAP: 07d8   Not tainted  (5.13.0-dirty)
>> MSR:  1002d002 <CE,EE,PR,ME>  CR: 22000284  XER: 00000000
>> 
>> GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552
>> GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4
>> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
>> GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850
>> NIP [b7fc3714] 0xb7fc3714
>> LR [b7fc3714] 0xb7fc3714
>> --- interrupt: 7d8
>> ------------[ cut here ]------------
>> kernel BUG at arch/powerpc/kernel/signal.c:298!
>> Oops: Exception in kernel mode, sig: 5 [#1]
>> BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
>> Modules linked in:
>> CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4
>> NIP:  c000969c LR: c000969c CTR: c065f540
>> REGS: c7a0fdc0 TRAP: 0700   Not tainted  (5.13.0-dirty)
>> MSR:  00028002 <CE,EE>  CR: 28000282  XER: 20000000
>> 
>> GPR00: c000969c c7a0fe80 c70ef500 c70efbd8 c70ef500 00000010 c7a0fc38 00000002
>> GPR08: 00000001 00000000 00000000 00000000 28000282 00000000 00000000 b7fe17b4
>> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
>> GPR24: bffff970 b7fff130 00000001 bffff960 c7a0ff10 00000800 c70ef500 00000102
>> NIP [c000969c] do_notify_resume+0x31c/0x320
>> LR [c000969c] do_notify_resume+0x31c/0x320
>> Call Trace:
>> [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320 (unreliable)
>> [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0
>> [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c
>> --- interrupt: 7d8 at 0xb7fc3714
>> NIP:  b7fc3714 LR: b7fc3714 CTR: 00000003
>> REGS: c7a0ff10 TRAP: 07d8   Not tainted  (5.13.0-dirty)
>> MSR:  1002d002 <CE,EE,PR,ME>  CR: 22000284  XER: 00000000
>> 
>> GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552
>> GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4
>> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858
>> GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850
>> NIP [b7fc3714] 0xb7fc3714
>> LR [b7fc3714] 0xb7fc3714
>> --- interrupt: 7d8
>> Instruction dump:
>> 93a10054 90010064 93c10058 48b95369 80c20398 3c60c0dc 7f84e378 38e204b0
>> 3863ce30 4809d819 80620704 4bfff7c9 <0fe00000> 3d20c0ff 8129c014 2c090000
>> ---[ end trace 1cbf27cd19ae39a0 ]---
>> 
>>>> If I'm not missing something, the 'regs' pointer that is eventually
>>>> passed to do_notify_resume() is calculated in interrupt_return() by
>>>> adding STACK_FRAME_OVERHEAD (defined to 112) to the value of r1. That
>>>> means all registers are saved on the stack before entering the
>>>> interrupt handler and, upon returning, a pointer to the register
>>>> structure is calculated from the stack pointer. Either the offset
>>>> itself is wrong, or the stack pointer is off by 0x30.
>>>>
>>>> This is as far as I have gone. Hopefully this rings a bell to someone
>>>> who is familiar with that part of the code and has a better
>>>> understanding of PPC32 interrupt handling in general.
>>>>
>>>> Last but not least, my configuration is fairly standard. I'm using the
>>>> powerpc-e500mc--glibc--bleeding-edge-2020.08-1 toolchain from Bootlin
>>>> to compile everything (kernel and user-space). The qemu version is
>>>> 5.2.0 (Fedora 34) and the user-space is a small Busybox based rootfs
>>>> that I built using Buildroot 2021.05. The gdb version is 9.2.




More information about the Linuxppc-dev mailing list