MCE handler gets NIP wrong on MPC8378

Christophe Leroy christophe.leroy at c-s.fr
Thu Feb 20 08:21:10 AEDT 2020


Christophe Leroy <christophe.leroy at c-s.fr> wrote:

> Radu Rendec <radu.rendec at gmail.com> wrote:
>
>> On 02/19/2020 at 10:11 AM Radu Rendec <radu.rendec at gmail.com> wrote:
>>> On 02/18/2020 at 1:08 PM Christophe Leroy <christophe.leroy at c-s.fr> wrote:
>>>> On 18/02/2020 at 18:07, Radu Rendec wrote:
>>>> > The saved NIP seems to be broken inside machine_check_exception() on
>>>> > MPC8378, running Linux 4.9.191. The value is 0x900 most of the time,
>>>> > but I have seen other weird values.
>>>> >
>>>> > I've been able to track down the entry code to head_32.S (vector 0x200),
>>>> > but I'm not sure where/how the NIP value (where the exception occurred)
>>>> > is captured.
>>>>
>>>> The NIP value is supposed to come from SRR0, which is loaded into r12
>>>> in PROLOG_2 and saved into _NIP(r11) in transfer_to_handler in
>>>> entry_32.S.
>>>>
>>>> Can something clobber r12 at some point?
>>>>
>>>
>>> I did something even simpler: I added the following
>>>
>>>      lis r12,0x1234
>>>
>>> ... right after
>>>
>>>      mfspr r12,SPRN_SRR0
>>>
>>> ... and now the NIP value I see in the crash dump is 0x12340000. This
>>> means r12 is not clobbered and most likely the NIP value I normally see
>>> is the actual SRR0 value.
>>
>> I apologize for the noise. I just found out accidentally that the saved
>> NIP value is correct if interrupts are disabled at the time when the
>> faulty access that triggers the MCE occurs. This seems to happen
>> consistently.
>>
>> By "interrupts are disabled" I mean local_irq_save/local_irq_restore, so
>> it's basically enough to wrap ioread32 to get the NIP value right.
>>
>> Does this make any sense? Maybe it's not a silicon bug after all, or
>> maybe it is and I just found a workaround. Could this happen on other
>> PowerPC CPUs as well?
>
> Interesting.
>
> 0x900 is the address of the timer interrupt.
>
> Would the MCE occur just after the timer interrupt?
>
> Can you tell me how your I/O buses are configured, etc.?

And what's the value of SERSR after the machine check?

Do you use the local bus monitoring driver?

Christophe
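
For reference, the observation that 0x900 is the timer (decrementer)
vector can be checked mechanically. The following is only a minimal
sketch, not code from the kernel: the table covers a subset of the
classic 32-bit PowerPC vector offsets, it assumes the vectors sit at
base address 0, and nip_vector_name() is a hypothetical helper name
made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of the classic 32-bit PowerPC exception vector offsets.
 * Assumption: vectors are located at base address 0. */
struct ppc_vector {
    uint32_t offset;
    const char *name;
};

static const struct ppc_vector ppc_vectors[] = {
    { 0x100, "system reset" },
    { 0x200, "machine check" },
    { 0x300, "data storage" },
    { 0x400, "instruction storage" },
    { 0x500, "external interrupt" },
    { 0x600, "alignment" },
    { 0x700, "program" },
    { 0x800, "floating-point unavailable" },
    { 0x900, "decrementer" },
    { 0xc00, "system call" },
};

/* Hypothetical helper: return the vector name if nip is exactly the
 * entry point of a known exception vector, otherwise NULL. */
const char *nip_vector_name(uint32_t nip)
{
    size_t i;

    for (i = 0; i < sizeof(ppc_vectors) / sizeof(ppc_vectors[0]); i++) {
        if (ppc_vectors[i].offset == nip)
            return ppc_vectors[i].name;
    }
    return NULL;
}
```

A saved NIP for which this lookup returns a name points at the first
instruction of an exception handler rather than at the faulting access,
which is what the 0x900 value reported in this thread looks like.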