correctable errors logging in OpenBMC on P8
Sergey Kachkin
s.kachkin at gmail.com
Thu Dec 14 05:11:35 AEDT 2017
Hi team,
I've got a question regarding correctable error logging in OpenBMC on the P8
platform. When a machine crashes completely due to a checkstop, an eSEL is
generated and we have some data for post-mortem analysis, but do we have
any support for correctable errors such as ECC errors or correctable machine
checks (MCs)?
Also, is there any way to inject a correctable error manually (the way we can
inject a DIMM UE with putscom)? This would be really valuable for RAS
compliance verification.
Actually, I have a real use case right now. One of the SPEC CPU2006 tests
(456.hmmer) causes the following errors on a P8 test server (OpenBMC), and
none of them produce an eSEL event:
[12323.883272] Harmless Hypervisor Maintenance interrupt [Recovered]
[12323.883323] Error detail: Processor Recovery done
[12323.883361] HMER: 2040400000000000
[12323.883392] Harmless Hypervisor Maintenance interrupt [Recovered]
[12323.883442] Error detail: Processor Recovery done
[12323.883482] HMER: 2040400000000000
[15281.455845] hmmer_base.Linu[78208]: unhandled signal 11 at 0000000000000004 nip 00000000100304a4 lr 000000001003d9ac code 30001
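For reference, the HMER value in the log above can be broken down into its set bits with a short script. This is a minimal sketch, assuming the IBM MSB-0 bit numbering used in POWER documentation (bit 0 is the most significant bit); the helper names are hypothetical, and what each bit actually means on P8 has to be looked up in the POWER8 processor documentation rather than inferred from this code.

```python
import re

# Hypothetical helper: matches the "HMER: <16 hex digits>" lines printed by the kernel.
HMER_RE = re.compile(r"HMER:\s*([0-9a-fA-F]{16})")


def hmer_bits(value: int) -> list[int]:
    """Return the set bit positions of a 64-bit register in MSB-0 numbering."""
    # i walks LSB-0 positions from 63 down to 0; 63 - i converts to MSB-0,
    # so the result comes out in ascending MSB-0 order.
    return [63 - i for i in range(63, -1, -1) if (value >> i) & 1]


def scan_log(text: str) -> list[tuple[str, list[int]]]:
    """Find every HMER value in a kernel log and decode its set bits."""
    results = []
    for match in HMER_RE.finditer(text):
        raw = match.group(1)
        results.append((raw, hmer_bits(int(raw, 16))))
    return results


log = """\
[12323.883272] Harmless Hypervisor Maintenance interrupt [Recovered]
[12323.883323] Error detail: Processor Recovery done
[12323.883361] HMER: 2040400000000000
"""

for raw, bits in scan_log(log):
    print(raw, bits)  # prints: 2040400000000000 [2, 9, 17]
```

Both recovered interrupts in the log report the same HMER value, so this only shows which bit positions to look up; it says nothing by itself about whether the trigger was hardware or software.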
So I really don't have any data to root-cause this. Could it be a software
error?
In any case, this was detected by OPAL, so I'm wondering why no eSEL was
generated.
Thanks in advance,
Regards,
Sergey
More information about the openbmc mailing list