[Skiboot-stable] [PATCH] hw/xscom: Enable sw xstop by default on p9

Mahesh Jagannath Salgaonkar mahesh at linux.vnet.ibm.com
Wed Apr 17 01:30:26 AEST 2019


On 4/16/19 7:27 AM, Oliver O'Halloran wrote:
> This was disabled at some point during bringup to make life easier for
> the lab folks trying to debug NVLink issues. This hack really should
> have never made it out into the wild though, so we now have the
> following situation occurring in the field:
> 
>  1) A bad happens
>  2) The host kernel receives an unrecoverable HMI and calls into OPAL to
>     request a platform reboot.
>  3) OPAL rejects the reboot attempt and returns to the kernel with
>     OPAL_PARAMETER.
>  4) Kernel panics and attempts to kexec into a kdump kernel.
> 
> A side effect of the HMI seems to be CPUs becoming stuck which results
> in the initialisation of the kdump kernel taking an extremely long time
> (6+ hours). It's also been observed that after performing a dump the
> kdump kernel then crashes itself because OPAL has ended up in a bad
> state as a side effect of the HMI.
> 
> All up, it's not very good, so re-enable the software checkstop by
> default. Anyone who still wants to turn it off can do so with the
> nvram override (opal-sw-xstop=disable).
> 
> Cc: skiboot-stable at lists.ozlabs.org
> Cc: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>

Acked-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>

Thanks,
-Mahesh.

> ---
>  hw/xscom.c | 26 ++------------------------
>  1 file changed, 2 insertions(+), 24 deletions(-)
> 
> diff --git a/hw/xscom.c b/hw/xscom.c
> index 37f0705d1c2a..bf634d91a960 100644
> --- a/hw/xscom.c
> +++ b/hw/xscom.c
> @@ -833,30 +833,8 @@ int64_t xscom_trigger_xstop(void)
>  	int rc = OPAL_UNSUPPORTED;
>  	bool xstop_disabled = false;
> 
> -	/*
> -	 * Workaround until we iron out all checkstop issues at present.
> -	 *
> -	 * For p9:
> -	 * By default do not trigger sw checkstop unless explicitly enabled
> -	 * through nvram option 'opal-sw-xstop=enable'.
> -	 *
> -	 * For p8:
> -	 * Keep it enabled by default unless explicitly disabled.
> -	 *
> -	 * NOTE: Once all checkstop issues are resolved/stabilized reverse
> -	 * the logic to enable sw checkstop by default on p9.
> -	 */
> -	switch (proc_gen) {
> -	case proc_gen_p8:
> -		if (nvram_query_eq("opal-sw-xstop", "disable"))
> -			xstop_disabled = true;
> -		break;
> -	case proc_gen_p9:
> -	default:
> -		if (!nvram_query_eq("opal-sw-xstop", "enable"))
> -			xstop_disabled = true;
> -		break;
> -	}
> +	if (nvram_query_eq("opal-sw-xstop", "disable"))
> +		xstop_disabled = true;
> 
>  	if (xstop_disabled) {
>  		prlog(PR_NOTICE, "Software initiated checkstop disabled.\n");
> 
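
For reference, the behavioural change in the hunk above can be sketched
in isolation. This is a minimal standalone model and not skiboot code:
query_eq(), enum proc_generation and the two xstop_disabled_*() helpers
are hypothetical names made up for illustration, and the `setting`
argument stands in for whatever value nvram holds for opal-sw-xstop
(NULL meaning the key is unset).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the processor generations named in the patch. */
enum proc_generation { GEN_P8, GEN_P9 };

/*
 * Hypothetical stand-in for nvram_query_eq(): true only when the key is
 * set and its value matches exactly. NULL models an unset key.
 */
static bool query_eq(const char *setting, const char *value)
{
	return setting != NULL && strcmp(setting, value) == 0;
}

/* Logic removed by the patch: p8 is opt-out, p9 (and anything else) is opt-in. */
static bool xstop_disabled_old(enum proc_generation gen, const char *setting)
{
	if (gen == GEN_P8)
		return query_eq(setting, "disable");
	return !query_eq(setting, "enable");
}

/* Logic added by the patch: opt-out on every generation. */
static bool xstop_disabled_new(const char *setting)
{
	return query_eq(setting, "disable");
}
```

Under this model, an unset key on p9 yields "disabled" with the old
logic and "enabled" with the new one, while an explicit
opal-sw-xstop=disable keeps the checkstop off in both cases.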
