[Skiboot-stable] [PATCH] hw/xscom: Enable sw xstop by default on p9
Mahesh Jagannath Salgaonkar
mahesh at linux.vnet.ibm.com
Wed Apr 17 01:30:26 AEST 2019
On 4/16/19 7:27 AM, Oliver O'Halloran wrote:
> This was disabled at some point during bringup to make life easier for
> the lab folks trying to debug NVLink issues. This hack really should
> have never made it out into the wild though, so we now have the
> following situation occurring in the field:
>
> 1) Something bad happens
> 2) The host kernel receives an unrecoverable HMI and calls into OPAL to
> request a platform reboot.
> 3) OPAL rejects the reboot attempt and returns to the kernel with
> OPAL_PARAMETER.
> 4) Kernel panics and attempts to kexec into a kdump kernel.
>
> A side effect of the HMI seems to be CPUs becoming stuck which results
> in the initialisation of the kdump kernel taking an extremely long time
> (6+ hours). It's also been observed that after performing a dump the
> kdump kernel then crashes itself because OPAL has ended up in a bad
> state as a side effect of the HMI.
>
> All up, it's not very good, so re-enable the software checkstop by
> default. If people still want to turn it off, they can do so using the
> nvram override.
>
> Cc: skiboot-stable at lists.ozlabs.org
> Cc: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
Acked-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
Thanks,
-Mahesh.
> ---
> hw/xscom.c | 26 ++------------------------
> 1 file changed, 2 insertions(+), 24 deletions(-)
>
> diff --git a/hw/xscom.c b/hw/xscom.c
> index 37f0705d1c2a..bf634d91a960 100644
> --- a/hw/xscom.c
> +++ b/hw/xscom.c
> @@ -833,30 +833,8 @@ int64_t xscom_trigger_xstop(void)
>  	int rc = OPAL_UNSUPPORTED;
>  	bool xstop_disabled = false;
>
> -	/*
> -	 * Workaround until we iron out all checkstop issues at present.
> -	 *
> -	 * For p9:
> -	 * By default do not trigger sw checkstop unless explicitly enabled
> -	 * through nvram option 'opal-sw-xstop=enable'.
> -	 *
> -	 * For p8:
> -	 * Keep it enabled by default unless explicitly disabled.
> -	 *
> -	 * NOTE: Once all checkstop issues are resolved/stabilized reverse
> -	 * the logic to enable sw checkstop by default on p9.
> -	 */
> -	switch (proc_gen) {
> -	case proc_gen_p8:
> -		if (nvram_query_eq("opal-sw-xstop", "disable"))
> -			xstop_disabled = true;
> -		break;
> -	case proc_gen_p9:
> -	default:
> -		if (!nvram_query_eq("opal-sw-xstop", "enable"))
> -			xstop_disabled = true;
> -		break;
> -	}
> +	if (nvram_query_eq("opal-sw-xstop", "disable"))
> +		xstop_disabled = true;
>
>  	if (xstop_disabled) {
>  		prlog(PR_NOTICE, "Software initiated checkstop disabled.\n");
>
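
For anyone skimming the diff, the policy change can be sketched as a small
standalone C fragment. This is only an illustration, not skiboot code: the
enum, setting_eq() (a stand-in for nvram_query_eq() that takes the setting's
value directly, with NULL meaning "not set") and the two policy functions
are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the processor generations named in the patch. */
enum gen_sketch { GEN_P8, GEN_P9 };

/*
 * Hypothetical stand-in for nvram_query_eq(): true only when the setting
 * is present (non-NULL) and equal to the expected value.
 */
static bool setting_eq(const char *setting, const char *expected)
{
	return setting != NULL && strcmp(setting, expected) == 0;
}

/* Before the patch: p8 is opt-out, p9 and anything else is opt-in. */
static bool xstop_disabled_before(enum gen_sketch gen, const char *setting)
{
	if (gen == GEN_P8)
		return setting_eq(setting, "disable");
	return !setting_eq(setting, "enable");
}

/* After the patch: opt-out regardless of processor generation. */
static bool xstop_disabled_after(const char *setting)
{
	return setting_eq(setting, "disable");
}

/* Usage example: the case the patch changes is p9 with no override set. */
void xstop_policy_selfcheck(void)
{
	assert(xstop_disabled_before(GEN_P9, NULL));   /* old default on p9: off */
	assert(!xstop_disabled_after(NULL));           /* new default: on */
	assert(!xstop_disabled_before(GEN_P8, NULL));  /* p8 default was already on */
	assert(xstop_disabled_after("disable"));       /* override still honoured */
}
```

The only behavioural difference is the unset (or unrecognised) case on
non-p8 systems; an explicit "disable" gives the same result before and after.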