[Skiboot] [PATCH v2] opal/xstop: Use nvram option to enable/disable sw checkstop.

Oliver oohall at gmail.com
Mon Jan 15 17:56:57 AEDT 2018


On Mon, Jan 15, 2018 at 5:42 PM, Stewart Smith
<stewart at linux.vnet.ibm.com> wrote:
> Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
>> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>>
>> Add a mechanism to enable/disable sw checkstop by looking at nvram option
>> opal-sw-xstop=<enable/disable>.
>>
>> For now this patch disables the sw checkstop trigger unless explicitly
>> enabled through nvram option 'opal-sw-xstop=enable' for p9. This will allow
>> an opportunity to get host kernel in panic path or xmon for unrecoverable
>> HMIs or MCE, to be able to debug the issue effectively.
>>
>> To enable sw checkstop in opal issue following command:
>>
>> # nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
>>
>> NOTE: This is a workaround patch to disable sw checkstop by default to gain
>> control in host kernel for better checkstop debugging. Once we have most of
>> the checkstop issues stabilized/resolved, revisit this patch to enable sw
>> checkstop by default.
>>
>> For p8 platform it will remain enabled by default unless explicitly disabled.
>>
>> To disable sw checkstop on p8 issue following command:
>>
>> # nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
>>
>> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>> Reviewed-by: Balbir Singh <bsingharora at gmail.com>
>> ---
>> Change in v2:
>>    - Add pr_log to indicate that sw checkstop was disabled.
>> ---
>>  hw/xscom.c |   32 ++++++++++++++++++++++++++++++++
>>  1 file changed, 32 insertions(+)
>
> All a bit umming-and-ahhing about the behaviour change... but this seems
> to be the "easiest" for now.... and I reserve the right to change my
> mind at any point :)
>
> I think the correct solution here is to have the kernel make the
> appropriate decision rather than having this workaround in OPAL.
>
> But.. well... reality and today was checkstop heavy, so my mind kind of
> changed :)
>
> Merged to master as of 3c38214ab4f097a307058361428f9be8a239f1db though.
>
> I think having the option to *disable* it is always going to be good,
> but... well... I don't like that we end up in a situation where the
> kernel says "everything is terrible because you told me it was terrible,
> please reboot now" and then we ignore it.
>
> The real solution is a kernel one....

It really isn't. If we are reporting unrecoverable HMIs to the kernel
then the kernel has every right to assume the world is on fire and
request a shutdown. If we want the kernel to do something else then we
need to change what OPAL reports back to the kernel. Just disabling
the software xstop is a gross hack at best. It's not even clear that
just disabling the xstop is sufficient to keep the host up and running
since the kernel thread that initiated the shutdown isn't expecting to
return...

That said, it's a stupid debug hack so who cares.
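
For reference, the policy described in the commit message (off by default on
p9 unless opal-sw-xstop=enable is set, on by default on p8 unless
opal-sw-xstop=disable is set) can be sketched roughly as below. This is a
standalone illustration and not the code merged into hw/xscom.c: the enum, the
function name, and passing the nvram value in as a string are all hypothetical,
and treating an unrecognised value as "use the platform default" is an
assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the processor generation, for illustration only. */
enum sketch_proc_gen { SKETCH_GEN_P8, SKETCH_GEN_P9 };

/*
 * Decide whether the software checkstop trigger should be active.
 * nvram_value is the value of the opal-sw-xstop option, or NULL if unset.
 */
static bool sketch_sw_xstop_enabled(enum sketch_proc_gen gen,
                                    const char *nvram_value)
{
    if (nvram_value != NULL) {
        /* An explicit setting overrides the platform default. */
        if (strcmp(nvram_value, "enable") == 0)
            return true;
        if (strcmp(nvram_value, "disable") == 0)
            return false;
        /* Assumption: any other value falls through to the default. */
    }

    /* Default: enabled on p8, disabled on p9. */
    return gen == SKETCH_GEN_P8;
}
```

Note that this only captures the enable/disable decision. It does nothing
about the problem above: the kernel path that requested the checkstop still
does not expect the call to return.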

> --
> Stewart Smith
> OPAL Architect, IBM.

