Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP

Christophe Leroy christophe.leroy at csgroup.eu
Wed Jul 7 00:11:08 AEST 2021



On 06/07/2021 at 16:05, Radu Rendec wrote:
> On Tue, 2021-07-06 at 15:53 +0200, Christophe Leroy wrote:
>> On 06/07/2021 at 15:50, Radu Rendec wrote:
>>> On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote:
>>>> On 06/07/2021 at 13:56, Radu Rendec wrote:
>>>>> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote:
>>>>>> On 04/07/2021 at 23:38, Radu Rendec wrote:
>>>>>>> I'm trying to set up my (virtual) environment to test an old bug in the
>>>>>>> PPC32 ptrace() code. I came across a completely different problem,
>>>>>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if
>>>>>>> this is a real kernel bug or maybe something wrong with my
>>>>>>> configuration.
>>>>>>>
>>>>>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
>>>>>>> native gdb (inside the VM) and setting a breakpoint in main() in a test
>>>>>>> "hello world" program. Upon running the test program, I am hitting the
>>>>>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
>>>>>>> included below at the end of the email.
>>>>>>>
>>>>>>> FWIW, gdb says:
>>>>>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
>>>>>>> The program no longer exists.
>>>>>>>
>>>>>>> I also added a pr_info() to do_notify_resume() just to see how much
>>>>>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are
>>>>>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
>>>>>>> be OK (pid and comm are consistent with the test program).
>>>>>>
>>>>>> The TRAP = 0x7d8 is obviously wrong.
>>>>>>
>>>>>> I need to know exactly which 'TRAP' it is.
>>>>>> Could you try to dump what we have at the correct regs?
>>>>>> Something like 'show_regs(current->thread.regs)' should do it.
>>>>>
>>>>> Sure, please see the output below. It looks to me like the "correct"
>>>>> regs are just garbage. Either they are overwritten or current->thread.regs
>>>>> is wrong. But in any case, r1 = 0 doesn't look good.
>>>>
>>>> Yes indeed. I think I have identified the problem. For critical interrupts like the DEBUG interrupt, a struct
>>>> exception_regs is added to the frame, so the frame is 12x4 (0x30) bytes larger. That's what you see.
>>>>
>>>> Commit
>>>> https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26
>>>>
>>>> Need to find the best solution now to fix that.
>>>
>>> Awesome, happy to see you figured it out so quickly.
>>>
>>> I'm not sure if it makes any sense, but one thing that comes to mind is
>>> to put struct exception_regs before struct pt_regs when the frame is
>>> saved. Unless of course other parts of the code expect the opposite.
>>
>> Yes, I think it is a good idea. I don't think I'll have time to look at it before the summer vacation, though.
> 
> I can take a stab at it. I'm not familiar with that part of the code,
> but the best way to learn is to get your hands dirty :) In the worst
> case, I won't fix it.
> 

It's not that easy, in fact.
After looking at it again, the best solution I see now would be to move the content of struct
exception_regs into the second part of struct pt_regs (the kernel definition in asm/ptrace.h).

Changes should be limited to head_booke.h and asm-offsets.c;
struct exception_regs and STACK_EXC_LVL_FRAME_SIZE would then go away.

Christophe

