[PATCH] powerpc/powernv: Show checkstop reason for NPU2 HMIs

Michael Ellerman mpe at ellerman.id.au
Thu May 23 23:45:09 AEST 2019


Frederic Barrat <fbarrat at linux.ibm.com> writes:

> If the kernel is notified of an HMI caused by the NPU2, it's currently
> not being recognized and it logs the default message:
>
>     Unknown Malfunction Alert of type 3
>
> The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of
> possible causes, but we should at least log that it's an NPU problem
> and report which FIR and which bit were raised if opal gave us the
> information.
>
> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
> ---
>
> Could be merged independently from (the opal-api.h change is already
> in the skiboot tree), but works better with, the matching skiboot
> change:
> http://patchwork.ozlabs.org/patch/1104076/

Well it *must* work with or without the skiboot change, because old/new
kernels will run on old/new skiboots.

It looks like it will work fine, we just won't get any extra information
in xstop_reason, right?

cheers

> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index e1577cfa7186..2492fe248e1e 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -568,6 +568,7 @@ enum OpalHMI_XstopType {
>  	CHECKSTOP_TYPE_UNKNOWN	=	0,
>  	CHECKSTOP_TYPE_CORE	=	1,
>  	CHECKSTOP_TYPE_NX	=	2,
> +	CHECKSTOP_TYPE_NPU	=	3
>  };
>  
>  enum OpalHMI_CoreXstopReason {
> diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c b/arch/powerpc/platforms/powernv/opal-hmi.c
> index 586ec71a4e17..de12a240b477 100644
> --- a/arch/powerpc/platforms/powernv/opal-hmi.c
> +++ b/arch/powerpc/platforms/powernv/opal-hmi.c
> @@ -149,6 +149,43 @@ static void print_nx_checkstop_reason(const char *level,
>  					xstop_reason[i].description);
>  }
>  
> +static void print_npu_checkstop_reason(const char *level,
> +					struct OpalHMIEvent *hmi_evt)
> +{
> +	uint8_t reason, reason_count, i;
> +
> +	/*
> +	 * We may not have a checkstop reason on some combination of
> +	 * hardware and/or skiboot version
> +	 */
> +	if (!hmi_evt->u.xstop_error.xstop_reason) {
> +		printk("%s	NPU checkstop on chip %x\n", level,
> +			be32_to_cpu(hmi_evt->u.xstop_error.u.chip_id));
> +		return;
> +	}
> +
> +	/*
> +	 * NPU2 has 3 FIRs. Reason encoded on a byte as:
> +	 *   2 bits for the FIR number
> +	 *   6 bits for the bit number
> +	 * It may be possible to find several reasons.
> +	 *
> +	 * We don't display a specific message per FIR bit as there
> +	 * are too many and most are meaningless without the workbook
> +	 * and/or hw team help anyway.
> +	 */
> +	reason_count = sizeof(hmi_evt->u.xstop_error.xstop_reason) /
> +		sizeof(reason);
> +	for (i = 0; i < reason_count; i++) {
> +		reason = (hmi_evt->u.xstop_error.xstop_reason >> (8 * i)) & 0xFF;
> +		if (reason)
> +			printk("%s	NPU checkstop on chip %x: FIR%d bit %d is set\n",
> +				level,
> +				be32_to_cpu(hmi_evt->u.xstop_error.u.chip_id),
> +				reason >> 6, reason & 0x3F);
> +	}
> +}
> +
>  static void print_checkstop_reason(const char *level,
>  					struct OpalHMIEvent *hmi_evt)
>  {
> @@ -160,6 +197,9 @@ static void print_checkstop_reason(const char *level,
>  	case CHECKSTOP_TYPE_NX:
>  		print_nx_checkstop_reason(level, hmi_evt);
>  		break;
> +	case CHECKSTOP_TYPE_NPU:
> +		print_npu_checkstop_reason(level, hmi_evt);
> +		break;
>  	default:
>  		printk("%s	Unknown Malfunction Alert of type %d\n",
>  		       level, type);
> -- 
> 2.21.0


More information about the Linuxppc-dev mailing list