[Skiboot-stable] [Skiboot] [PATCH] hw/xscom: Enable sw xstop by default on p9
Stewart Smith
stewart at linux.ibm.com
Wed Apr 17 17:32:20 AEST 2019
"Oliver O'Halloran" <oohall at gmail.com> writes:
> This was disabled at some point during bringup to make life easier for
> the lab folks trying to debug NVLink issues. This hack really should
> have never made it out into the wild though, so we now have the
> following situation occurring in the field:
>
> 1) A bad happens
> 2) The host kernel receives an unrecoverable HMI and calls into OPAL to
> request a platform reboot.
> 3) OPAL rejects the reboot attempt and returns to the kernel with
> OPAL_PARAMETER.
> 4) Kernel panics and attempts to kexec into a kdump kernel.
>
> A side effect of the HMI seems to be CPUs becoming stuck which results
> in the initialisation of the kdump kernel taking an extremely long time
> (6+ hours). It's also been observed that after performing a dump the
> kdump kernel then crashes itself because OPAL has ended up in a bad
> state as a side effect of the HMI.
>
> All up, it's not very good, so re-enable the software checkstop by
> default. If people still want to turn it off they can using the nvram
> override.
>
> Cc: skiboot-stable at lists.ozlabs.org
> Cc: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
I'll be the one rocking in the corner weeping and screaming incoherently
about some time in P9 bringup. If you listen closely, some of the things
I may be incoherently yelling are the words 'merge' and the string
"af5a3ee925d11f4e4e5276ccd5c6ec20b2d2df9f".
--
Stewart Smith
OPAL Architect, IBM.