[Skiboot] [PATCH 4/4] IPMI: Introduce attention call
patrick at stwcx.xyz
Thu Aug 20 22:27:46 AEST 2015
On Thu, Aug 20, 2015 at 12:33:34PM +0530, Vasant Hegde wrote:
> - This patch calls cec_reboot to reboot machine after logging eSEL event.
> It queues IPMI message and bt_poll() should be working until we pass
> reboot IPMI message to BMC. Hence we have while loop with time_wait_ms().
> Alternatively we can use xscom_trigger_xstop().. but it will stop
> immediately and eSEL logging fails.
I understand the desire for more data in many cases of abort but it
seems very dangerous to me to always do the IPMI path. There are some
abort calls that are obvious more critical than others and should be
interpreted as "shutdown as quickly as possible because I'm in a real bad
Two that come to mind to me are:
* core/mem_region.c : bad_header(...)
- If we trigger the abort here, it indicates that the heap is
corrupted. If you try to queue up an IPMI message, which
involves a malloc, you're going to end up right back in this
code. Thus, you'll keep overflowing the stack until maybe
sometime we end up with a CRESP checkstop?
* core/exceptions.c : exception_entry(...)
- The specific exception that worries me most is the
machine-check. This could be due to a memory or cache UE, so
we want to abort as quickly as possible. Hopefully, with all
your extra code we just end up with an MSR[ME]=0 checkstop
instead of a slightly more graceful xscom_trigger_xstop().
In Hostboot we have a 'assert' and 'crit_assert'. Assert is "I'm in a
bad state, but not so bad that we cannot collect data about it."
Crit-Assert is "I'm in a really bad state, so just shutdown."
Do you need something similar here in skiboot? I would suggest
reviewing all of the 'abort' calls and confirm that they do not lead
into bad situations if you try to do "extra" stuff like send an IPMI
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 819 bytes
Desc: Digital signature
More information about the Skiboot