[OpenPower-Firmware] poor correctable MC errors logging

Mahesh Jagannath Salgaonkar mahesh at linux.vnet.ibm.com
Mon Mar 19 15:56:05 AEDT 2018


On 03/16/2018 10:09 PM, Sergey Kachkin wrote:
> Hi Mahesh,
> 
> thanks for your reply.
> 
>> We can improve it to print CPU pir number. Do you also want location code
>> info there ?
> 
> Yes, I would prefer as much info as possible that may help to distinguish
> one MC problem from another and isolate the root cause. So adding more
> details would be beneficial.
> Am I correct that CPU numbers etc. will be printed for other similar
> recoverable errors also?

Yes. If we print the CPU PIR info, it will appear for all errors.

> I have never seen anything except ERAT but
> wondering if it could be also SLB / TLB multihit etc.
> 
> 
> regards,
> Sergey
> YADRO
> 
> 
> On Fri, Mar 16, 2018 at 6:52 PM, Mahesh Jagannath Salgaonkar <
> mahesh at linux.vnet.ibm.com> wrote:
> 
>> On 03/14/2018 07:05 PM, Sergey Kachkin wrote:
>>> Hi,
>>>
>>> recently there were a number of HMI logging improvements which may help to
>>> isolate the source of HMI errors, but troubleshooting MCs like below is
>>> also challenging.
>>> Can we have additional logging for MCs also?
>>
>> We can improve it to print CPU pir number. Do you also want location
>> code info there ?
>>
>>>
>>> Feb 15 02:56:33 host kernel: Severe Machine check interrupt [Recovered]
>>> Feb 15 02:56:33 host kernel:   Initiator: CPU
>>> Feb 15 02:56:33 host kernel:   Error type: ERAT [Multihit]
>>> Feb 15 02:56:33 host kernel:     Effective address: c00003eefc12f018
>>> Feb 15 03:04:19 host kernel: Severe Machine check interrupt [Recovered]
>>> Feb 15 03:04:19 host kernel:   Initiator: CPU
>>> Feb 15 03:04:19 host kernel:   Error type: ERAT [Multihit]
>>> Feb 15 03:04:19 host kernel:     Effective address: c00003eefc12f018
>>>
>>>
>>>
>>> * [282d5fee5c4f](https://github.com/open-power/skiboot/commit/282d5fee5c4f)
>>> core/hmi: Use pr_fmt macro for tagging log messages
>>> * [c531ff957669](https://github.com/open-power/skiboot/commit/c531ff957669)
>>> opal/hmi: HMI logging with location code info.
>>> * [b33ed1e6b6b0](https://github.com/open-power/skiboot/commit/b33ed1e6b6b0)
>>> core/hmi: Do not display FIR details if none of the bits are set.
>>> * [45a961515be6](https://github.com/open-power/skiboot/commit/45a961515be6)
>>> core/hmi: Display chip location code while displaying core FIR.
>>>
>>>
>>> thanks,
>>>
>>> regards,
>>> Sergey
>>> YADRO
>>>
>>>
>>>
>>> _______________________________________________
>>> OpenPower-Firmware mailing list
>>> OpenPower-Firmware at lists.ozlabs.org
>>> https://lists.ozlabs.org/listinfo/openpower-firmware
>>>
>>
>>
> 
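
Until more detail is logged, the machine check lines quoted above can at least be grouped by error type and effective address, to see whether repeated events point at the same location. The following is only a minimal sketch, assuming the log format is exactly as in the quoted sample (with wrapped lines rejoined). The names `parse_mc_events` and `summarize` are hypothetical helpers for illustration and are not part of the kernel or skiboot.

```python
import re
from collections import Counter

# Patterns are an assumption based only on the sample log lines in this thread.
HEADER = re.compile(r"kernel:\s+(\w+) Machine check interrupt \[(.+?)\]")
FIELD = re.compile(r"kernel:\s+(Initiator|Error type|Effective address):\s*(.+?)\s*$")


def parse_mc_events(lines):
    """Group machine check log lines into one dict per event."""
    events = []
    current = None
    for line in lines:
        m = HEADER.search(line)
        if m:
            # A header line starts a new event.
            current = {"severity": m.group(1), "disposition": m.group(2)}
            events.append(current)
            continue
        m = FIELD.search(line)
        if m and current is not None:
            # Detail lines are attached to the most recent header.
            current[m.group(1)] = m.group(2)
    return events


def summarize(events):
    """Count events per (error type, effective address) pair."""
    return Counter(
        (e.get("Error type"), e.get("Effective address")) for e in events
    )


SAMPLE = """\
Feb 15 02:56:33 host kernel: Severe Machine check interrupt [Recovered]
Feb 15 02:56:33 host kernel:   Initiator: CPU
Feb 15 02:56:33 host kernel:   Error type: ERAT [Multihit]
Feb 15 02:56:33 host kernel:     Effective address: c00003eefc12f018
Feb 15 03:04:19 host kernel: Severe Machine check interrupt [Recovered]
Feb 15 03:04:19 host kernel:   Initiator: CPU
Feb 15 03:04:19 host kernel:   Error type: ERAT [Multihit]
Feb 15 03:04:19 host kernel:     Effective address: c00003eefc12f018
"""

if __name__ == "__main__":
    for key, count in summarize(parse_mc_events(SAMPLE.splitlines())).items():
        print(count, key)
```

If a CPU PIR field is added to the log later, it could be handled by extending the `FIELD` pattern in the same way; its exact label is not known here, so it is left out.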


