Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP

Radu Rendec radu.rendec at gmail.com
Wed Jul 7 00:05:25 AEST 2021


On Tue, 2021-07-06 at 15:53 +0200, Christophe Leroy wrote:
>Le 06/07/2021 à 15:50, Radu Rendec a écrit :
>> On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote:
>>> Le 06/07/2021 à 13:56, Radu Rendec a écrit :
>>>> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote:
>>>>> Le 04/07/2021 à 23:38, Radu Rendec a écrit :
>>>>>> I'm trying to set up my (virtual) environment to test an old bug in the
>>>>>> PPC32 ptrace() code. I came across a completely different problem,
>>>>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if
>>>>>> this is a real kernel bug or maybe something wrong with my
>>>>>> configuration.
>>>>>>
>>>>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running
>>>>>> native gdb (inside the VM) and setting a breakpoint in main() in a test
>>>>>> "hello world" program. Upon running the test program, I am hitting the
>>>>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is
>>>>>> included below at the end of the email.
>>>>>>
>>>>>> FWIW, gdb says:
>>>>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
>>>>>> The program no longer exists.
>>>>>>
>>>>>> I also added a pr_info() to do_notify_resume() just to see how much
>>>>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are
>>>>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to
>>>>>> be OK (pid and comm are consistent with the test program).
>>>>>
>>>>> The TRAP = 0x7d8 is obviously wrong.
>>>>>
>>>>> Need to know which 'TRAP' it is exactly.
>>>>> Could you try to dump what we have at the correct regs ?
>>>>> Something like 'show_regs(current->thread.regs)' should do it.
>>>>
>>>> Sure, please see the output below. It looks to me like the "correct"
>>>> regs are just garbage. Either they are overwritten or current->thread.regs
>>>> is wrong. But in any case, r1 = 0 doesn't look good.
>>>
>>> Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct
>>> exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see.
>>>
>>> Commit
>>> https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26
>>>
>>> Need to find the best solution now to fix that.
>> 
>> Awesome, happy to see you figured it out so quickly.
>> 
>> I'm not sure if it makes any sense, but one thing that comes to mind is
>> to put struct exception_regs before struct pt_regs when the frame is
>> saved. Unless of course other parts of the code expect the opposite.
>
>Yes I think it is a good idea. I think I won't have time to look at that before summer vacation though.

I can take a stab at it. I'm not familiar with that part of the code,
but the best way to learn is to get your hands dirty :) In the worst
case, I won't fix it.




More information about the Linuxppc-dev mailing list