[Skiboot] [PATCH v2] opal/xstop: Use nvram option to enable/disable sw checkstop.
stewart at linux.vnet.ibm.com
Mon Jan 15 17:42:22 AEDT 2018
Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> Add a mechanism to enable/disable sw checkstop by looking at an nvram option.
> For now this patch disables the sw checkstop trigger on p9 unless it is
> explicitly enabled through the nvram option 'opal-sw-xstop=enable'. This gives
> the host kernel an opportunity to enter its panic path or xmon for
> unrecoverable HMIs or MCEs, so the issue can be debugged effectively.
> To enable sw checkstop in OPAL, issue the following command:
> # nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
> NOTE: This is a workaround patch to disable sw checkstop by default to gain
> control in the host kernel for better checkstop debugging. Once we have most of
> the checkstop issues stabilized/resolved, revisit this patch to enable sw
> checkstop by default.
> On p8 platforms it remains enabled by default unless explicitly disabled.
> To disable sw checkstop on p8, issue the following command:
> # nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> Reviewed-by: Balbir Singh <bsingharora at gmail.com>
> Changes in v2:
> - Add pr_log to indicate that sw checkstop was disabled.
> hw/xscom.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
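The option handling described above boils down to a per-generation default plus an nvram override. Here's a minimal, hypothetical C sketch of that decision; the enum and function names are made up for illustration and are not the actual hw/xscom.c code (the diff body isn't in this archive copy). It assumes the nvram lookup hands back NULL when 'opal-sw-xstop' is unset, and it treats any unrecognised value as if the key were unset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical processor-generation tag for this sketch only; skiboot
 * has its own definitions, which are not reproduced here.
 */
enum sketch_proc_gen {
	SKETCH_PROC_P8,
	SKETCH_PROC_P9,
};

/* True if the nvram value is present and equals 'want'. */
static bool opt_is(const char *opt, const char *want)
{
	return opt != NULL && strcmp(opt, want) == 0;
}

/*
 * Decide whether the sw checkstop trigger should be armed.
 * 'opt' is the value of the 'opal-sw-xstop' nvram key, or NULL if the
 * key is unset (an assumption about the lookup helper, not skiboot's API).
 * Defaults follow the patch description:
 *   p9: disabled unless the key is "enable"
 *   p8: enabled unless the key is "disable"
 * Any other value falls back to the per-generation default.
 */
bool sketch_sw_xstop_enabled(enum sketch_proc_gen gen, const char *opt)
{
	if (gen == SKETCH_PROC_P9)
		return opt_is(opt, "enable");

	return !opt_is(opt, "disable");
}
```

So a p9 box without the key set lets the host kernel reach its panic path or xmon instead of checkstopping, while a p8 box keeps the old behaviour unless someone sets 'opal-sw-xstop=disable'.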
I'm all a bit umming-and-ahhing about the behaviour change... but this seems
to be the "easiest" option for now... and I reserve the right to change my
mind at any point :)
I think the correct solution here is to have the kernel make the
appropriate decision rather than having this workaround in OPAL.
But... well... reality, and today was checkstop heavy, so my mind kind of
Merged to master as of 3c38214ab4f097a307058361428f9be8a239f1db though.
I think having the option to *disable* it is always going to be good,
but... well... I don't like that we end up in a situation where the
kernel says "everything is terrible because you told me it was terrible,
please reboot now" and then we ignore it.
The real solution is a kernel one....
OPAL Architect, IBM.