correctable errors logging in OpenBMC on P8

Sergey Kachkin s.kachkin at gmail.com
Thu Dec 14 05:11:35 AEDT 2017


Hi team,

I have a question about correctable error logging in OpenBMC on the P8
platform. When a machine crashes completely due to a checkstop, an eSEL is
generated and we have some data for post-mortem analysis. But do we have
any support for logging correctable errors, such as ECC errors or
correctable MCs?

Also, is there any way to inject a correctable error manually (like we can
inject a DIMM UE with putscom)? This would be really valuable for RAS
compliance verification.

I actually have a real use case right now. One of the SPEC2006 tests
(456.hmmer) causes the following errors on a P8 test server (OpenBMC), and
they do not generate any eSEL events:

[12323.883272] Harmless Hypervisor Maintenance interrupt [Recovered]
[12323.883323]  Error detail: Processor Recovery done
[12323.883361]     HMER: 2040400000000000
[12323.883392] Harmless Hypervisor Maintenance interrupt [Recovered]
[12323.883442]  Error detail: Processor Recovery done
[12323.883482]     HMER: 2040400000000000
[15281.455845] hmmer_base.Linu[78208]: unhandled signal 11 at
0000000000000004 nip 00000000100304a4 lr 000000001003d9ac code 30001

So I really do not have any data to root-cause this. Could it be a software
error?
In any case, this is detected by OPAL, so I am wondering why no eSEL is
generated.
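
As an aid for reading the HMER value in the log above, here is a minimal
sketch that lists which bits are set in it. It assumes the IBM MSB-0
convention (bit 0 is the most significant bit of the 64-bit register), and
the helper name set_bits_msb0 is hypothetical. It does not assign meanings
to the bits; those would have to be looked up in the processor
documentation or the firmware sources.

```python
def set_bits_msb0(value, width=64):
    """Return positions of set bits, counting from the most significant bit (bit 0)."""
    return [i for i in range(width) if (value >> (width - 1 - i)) & 1]


# HMER value copied from the kernel log above.
hmer = 0x2040400000000000
print(set_bits_msb0(hmer))  # prints [2, 9, 17]
```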

Thanks in advance,

Regards,
Sergey