debug problems on ppc 83xx target due to changed struct task_struct

Holger Brunck holger.brunck at keymile.com
Fri Aug 19 21:14:07 AEST 2016


On 19/08/16 13:03, Christophe Leroy wrote:
> 
> 
> On 17/08/2016 17:27, Holger Brunck wrote:
>> On 16/08/16 19:27, christophe leroy wrote:
>>>
>>>
>>> On 15/08/2016 18:19, Dave Hansen wrote:
>>>> On 08/15/2016 07:35 AM, Holger Brunck wrote:
>>>>> I tried this but unfortunately the error only occurs while remote debugging.
>>>>> Locally with gdb everything works fine. BTW we double-checked with an 85xx ppc
>>>>> target, which is also 32-bit, and it shows the same behaviour.
>>>>>
>>>>> I also investigated where in struct task_struct the line has to be moved,
>>>>> and it turns out to be like this (diff against the 4.7 kernel):
>>>>>
>>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>>> index 253538f..4868874 100644
>>>>> --- a/include/linux/sched.h
>>>>> +++ b/include/linux/sched.h
>>>>> @@ -1655,7 +1655,9 @@ struct task_struct {
>>>>>         struct signal_struct *signal;
>>>>>         struct sighand_struct *sighand;
>>>>>
>>>>> +       // struct thread_struct thread;   // until here everything is fine
>>>>>         sigset_t blocked, real_blocked;
>>>>> +       struct thread_struct thread;      // from here it's broken
>>>>>         sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
>>>>>         struct sigpending pending;
>>>>
>>>> Wow, thanks for all the debugging here!
>>>>
>>>> So, we know it has to do with signals, thread_info, and probably only
>>>> affects 32-bit powerpc.  Seems awfully weird.  Have you checked with any
>>>> of the 64-bit powerpc guys to see if they have any ideas?
>>>>
>>>> I went grepping around for a bit.
>>>>
>>>> Where is the task_struct stored?  Is it on-stack on ppc32 or something?
>>>>  The thread_info is, I assume, but I see some THREAD_INFO vs. THREAD
>>>> (thread struct) math happening in here, which confuses me:
>>>>
>>>>         .globl  ret_from_debug_exc
>>>> ret_from_debug_exc:
>>>>         mfspr   r9,SPRN_SPRG_THREAD
>>>>         lwz     r10,SAVED_KSP_LIMIT(r1)
>>>>         stw     r10,KSP_LIMIT(r9)
>>>>         lwz     r9,THREAD_INFO-THREAD(r9)
>>>>         CURRENT_THREAD_INFO(r10, r1)
>>>>         lwz     r10,TI_PREEMPT(r10)
>>>>         stw     r10,TI_PREEMPT(r9)
>>>>         RESTORE_xSRR(SRR0,SRR1);
>>>>         RESTORE_xSRR(CSRR0,CSRR1);
>>>>         RESTORE_MMU_REGS;
>>>>         RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)
>>>>
>>>> But I'm really at a loss to explain this.  It still seems like a deeply
>>>> ppc-specific issue.  We can obviously work around it with an #ifdef for
>>>> your platform, but that's awfully hackish and hides the real bug,
>>>> whatever it is.
>>>>
>>>> My suspicion is that there's a bug in the 32-bit ppc assembly somewhere.
>>>>  I don't see any references to 'blocked' or 'real_blocked' in assembly
>>>> though.  You could add a bunch of padding instead of moving the
>>>> thread_struct and see if that does anything, but that's really a stab in
>>>> the dark.
>>>>
>>>
>>> Just to let you know, I'm not sure it is the same issue, but I also get
>>> my 8xx target stuck when I try to use gdbserver.
>>>
>>> If I debug a very small app, it gets stuck quickly after the app has
>>> stopped: indeed, the console seems ok but as soon as I try to execute
>>> something simple, like ps or top, it gets stuck. The target still
>>> responds to pings, but nothing else.
>>>
>>> If I debug a big app, it gets stuck soon after debugging starts: I set
>>> a breakpoint at main(), do a 'continue', stop at main(), do some
>>> steps with 'next' and then it gets stuck.
>>>
>>> I have tried moving the struct thread_struct thread but it has no impact.
>>>
>>
>> That sounds a bit different from what I see. Is your program also multi-threaded?
>>
>> Maybe you could try with the program I use to reproduce the error:
>>
>> --- snip -----
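>> #include <pthread.h>
>> #include <stdio.h>
>> #include <string.h>
>> #include <unistd.h>
>>
>> /* Each worker prints a message every 2 seconds, forever. */
>> void *th_1_func(void *arg)
>> {
>>    (void)arg;
>>    while (1) {
>>      sleep(2);
>>      printf("Hello from thread function 1\n");
>>    }
>> }
>>
>> int main(void) {
>>   int err;
>>   pthread_t th_1, th_2, th_3;
>>
>>   /* Start three threads running the same function. */
>>   err = pthread_create(&th_1, NULL, th_1_func, NULL);
>>   if (err != 0)
>>     fprintf(stderr, "pthread_create: %s\n", strerror(err));
>>   err = pthread_create(&th_2, NULL, th_1_func, NULL);
>>   if (err != 0)
>>     fprintf(stderr, "pthread_create: %s\n", strerror(err));
>>   err = pthread_create(&th_3, NULL, th_1_func, NULL);
>>   if (err != 0)
>>     fprintf(stderr, "pthread_create: %s\n", strerror(err));
>>   /* Busy-loop in the main thread so the process never exits. */
>>   while (1) {}
>>   return 0;
>> }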
>> --- snap ---
>>
>> Then copy it to your target and start it with gdbserver. If you let it run
>> from your host with gdb, stop it (e.g. in the sleep call) and then try to
>> single-step it, you might see the error. But as I said earlier in this thread, the
>> behaviour may differ depending on your kernel configuration, as I
>> encountered different behaviour when enabling FTRACE or SCHED_STAT.
>>
>> Best regards
>> Holger
>>
> 
> Hi
> 
> I just tried it on an 885 and on an 8323; it works properly on both targets.
> 
> You can see below the debug options that are active on my 8323 target.
> 


Thanks for trying it.

Could you disable FTRACE completely? With FTRACE enabled it also works on
my side.

Best regards
Holger


More information about the Linuxppc-dev mailing list