How to deal with fatal errors caused by the host in OpenBMC

Bills, Jason M jason.m.bills at linux.intel.com
Fri Mar 13 02:32:22 AEDT 2020



On 3/11/2020 11:40 PM, zhang_cy1989 wrote:
> Dear All
>       There are some fatal errors on the host side.
>        For example:
>             Uncorrectable ECC / other uncorrectable memory errors
>             Unrecoverable hard-disk device failures
>             PCIe AER errors, and so on.
>        How does the BMC get the causes of those fatal errors?
>        Does the BIOS pass that information to the BMC over IPMI?
For Intel platforms, most of those errors (ECC, PCIe, etc.) are handled 
and reported by BIOS over IPMI.
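As an illustration of what such a BIOS report looks like on the BMC side, here is a minimal sketch that decodes a 16-byte IPMI System Event Log (SEL) record of type 0x02 (system event), following the record layout in the IPMI v2.0 specification. The `FATAL_OFFSETS` table, the helper names, and the sample record are hypothetical; which sensor types and offsets a given BIOS actually uses (for example, whether PCIe AER errors are logged under the Critical Interrupt sensor type) is platform-specific.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical subset of sensor-specific offsets (event type 0x6F) from the
# IPMI v2.0 sensor-type tables: (sensor type, offset) -> description.
FATAL_OFFSETS = {
    (0x0C, 0x01): "Memory: uncorrectable ECC",
    (0x0C, 0x02): "Memory: parity error",
    (0x0D, 0x01): "Drive slot: drive fault",
    (0x13, 0x08): "Critical interrupt: bus uncorrectable error",
    (0x13, 0x0A): "Critical interrupt: bus fatal error",
}


@dataclass
class SelEvent:
    record_id: int
    timestamp: int
    sensor_type: int
    sensor_number: int
    event_type: int
    offset: int
    deasserted: bool


def parse_sel_record(raw: bytes) -> SelEvent:
    """Parse a 16-byte SEL record of type 0x02 (system event record)."""
    if len(raw) != 16:
        raise ValueError("SEL record must be 16 bytes")
    if raw[2] != 0x02:
        raise ValueError(f"not a system event record (type 0x{raw[2]:02x})")
    return SelEvent(
        record_id=int.from_bytes(raw[0:2], "little"),  # bytes 1-2
        timestamp=int.from_bytes(raw[3:7], "little"),  # bytes 4-7
        sensor_type=raw[10],                           # byte 11
        sensor_number=raw[11],                         # byte 12
        event_type=raw[12] & 0x7F,                     # byte 13, bits [6:0]
        offset=raw[13] & 0x0F,                         # event data 1, bits [3:0]
        deasserted=bool(raw[12] & 0x80),               # byte 13, bit 7 = direction
    )


def describe_fatal(ev: SelEvent) -> Optional[str]:
    """Return a description if the event is one of the listed fatal assertions."""
    if ev.event_type != 0x6F or ev.deasserted:
        return None
    return FATAL_OFFSETS.get((ev.sensor_type, ev.offset))


# Synthetic record: memory sensor (0x0C), sensor-specific event, offset 0x01.
raw = bytes([0x01, 0x00, 0x02, 0x00, 0x4B, 0x6A, 0x5E, 0x01, 0x00,
             0x04, 0x0C, 0x08, 0x6F, 0x01, 0x00, 0x00])
print(describe_fatal(parse_sel_record(raw)))  # Memory: uncorrectable ECC
```

Filtering on event type 0x6F keeps only sensor-specific events, which is where the IPMI specification defines these error offsets; threshold and generic discrete events would need their own tables.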

>        Or is it done through PECI, as on Intel platforms?
For errors that hang the host (IERR, ERR[2] timeout, etc.), the BMC 
detects them through GPIOs and uses PECI to get additional information 
about the error.
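The GPIO side of this can be sketched as edge-driven detection logic. This is a minimal sketch, not the host-error-monitor implementation: reading the actual GPIO lines (for example with libgpiod) is omitted, the line names `IERR` and `ERR2` are illustrative, and `ERR2_TIMEOUT_S` is a hypothetical value, since the real ERR[2] timeout is platform-specific. Each detected error marks the point where a monitor would go on to collect PECI data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical threshold: how long ERR[2] may stay asserted before it is
# treated as a fatal timeout. The real value is platform-specific.
ERR2_TIMEOUT_S = 90.0


@dataclass
class GpioEdge:
    t: float        # seconds since monitoring started
    line: str       # "IERR" or "ERR2" (illustrative line names)
    asserted: bool  # True when the (active-low) signal becomes active


def detect_host_errors(edges: Iterable[GpioEdge],
                       now: float) -> List[Tuple[str, float]]:
    """Return (error, time) pairs for IERR assertions and ERR[2] timeouts."""
    errors: List[Tuple[str, float]] = []
    err2_since: Optional[float] = None
    for e in sorted(edges, key=lambda e: e.t):
        if e.line == "IERR" and e.asserted:
            # IERR is fatal immediately; a monitor would start PECI collection here.
            errors.append(("IERR", e.t))
        elif e.line == "ERR2":
            if e.asserted and err2_since is None:
                err2_since = e.t
            elif not e.asserted and err2_since is not None:
                # ERR[2] cleared; it only counts if it stayed active too long.
                if e.t - err2_since >= ERR2_TIMEOUT_S:
                    errors.append(("ERR2_TIMEOUT", err2_since + ERR2_TIMEOUT_S))
                err2_since = None
    # ERR[2] still active at "now": check whether the timeout has already expired.
    if err2_since is not None and now - err2_since >= ERR2_TIMEOUT_S:
        errors.append(("ERR2_TIMEOUT", err2_since + ERR2_TIMEOUT_S))
    return sorted(errors, key=lambda item: item[1])


edges = [GpioEdge(10.0, "ERR2", True), GpioEdge(30.0, "ERR2", False),
         GpioEdge(50.0, "ERR2", True), GpioEdge(200.0, "IERR", True)]
print(detect_host_errors(edges, now=200.0))
# [('ERR2_TIMEOUT', 140.0), ('IERR', 200.0)]
```

The short 20-second ERR[2] pulse is ignored, while the assertion that starts at t=50 and never clears is reported once its timeout expires. Treating the timeout as a duration check rather than reacting to every ERR[2] edge is what distinguishes a hung host from a transient error signal.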

> 
>        What recipes can I refer to in OpenBMC?
You can see the current Intel host-error-monitor application here: 
https://github.com/Intel-BMC/host-error-monitor.

>        Waiting for your help!
>        Thanks.
> Felix
> 
> 

