[PATCH] powerpc/powernv: Show checkstop reason for NPU2 HMIs
Andrew Donnellan
ajd at linux.ibm.com
Thu May 30 12:00:27 AEST 2019
On 23/5/19 10:28 pm, Frederic Barrat wrote:
> If the kernel is notified of an HMI caused by the NPU2, it's currently
> not being recognized and it logs the default message:
>
> Unknown Malfunction Alert of type 3
>
> The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of
> possible causes, but we should at least log that it's an NPU problem
> and report which FIR and which bit were raised if opal gave us the
> information.
>
> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
> ---
>
> Could be merged independently from (the opal-api.h change is already
> in the skiboot tree), but works better with, the matching skiboot
> change:
> http://patchwork.ozlabs.org/patch/1104076/
Still always safest to hold off until merged into skiboot :)
Reviewed-by: Andrew Donnellan <ajd at linux.ibm.com>
>
>
> arch/powerpc/include/asm/opal-api.h | 1 +
> arch/powerpc/platforms/powernv/opal-hmi.c | 40 +++++++++++++++++++++++
> 2 files changed, 41 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index e1577cfa7186..2492fe248e1e 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -568,6 +568,7 @@ enum OpalHMI_XstopType {
> CHECKSTOP_TYPE_UNKNOWN = 0,
> CHECKSTOP_TYPE_CORE = 1,
> CHECKSTOP_TYPE_NX = 2,
> + CHECKSTOP_TYPE_NPU = 3
> };
>
> enum OpalHMI_CoreXstopReason {
> diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c b/arch/powerpc/platforms/powernv/opal-hmi.c
> index 586ec71a4e17..de12a240b477 100644
> --- a/arch/powerpc/platforms/powernv/opal-hmi.c
> +++ b/arch/powerpc/platforms/powernv/opal-hmi.c
> @@ -149,6 +149,43 @@ static void print_nx_checkstop_reason(const char *level,
> xstop_reason[i].description);
> }
>
> +static void print_npu_checkstop_reason(const char *level,
> + struct OpalHMIEvent *hmi_evt)
> +{
> + uint8_t reason, reason_count, i;
> +
> + /*
> + * We may not have a checkstop reason on some combination of
> + * hardware and/or skiboot version
> + */
> + if (!hmi_evt->u.xstop_error.xstop_reason) {
> + printk("%s NPU checkstop on chip %x\n", level,
> + be32_to_cpu(hmi_evt->u.xstop_error.u.chip_id));
> + return;
> + }
> +
> + /*
> + * NPU2 has 3 FIRs. Reason encoded on a byte as:
> + * 2 bits for the FIR number
> + * 6 bits for the bit number
> + * It may be possible to find several reasons.
> + *
> + * We don't display a specific message per FIR bit as there
> + * are too many and most are meaningless without the workbook
> + * and/or hw team help anyway.
> + */
> + reason_count = sizeof(hmi_evt->u.xstop_error.xstop_reason) /
> + sizeof(reason);
> + for (i = 0; i < reason_count; i++) {
> + reason = (hmi_evt->u.xstop_error.xstop_reason >> (8 * i)) & 0xFF;
very nitpicky: should we call be32_to_cpu() so that the bits appear in
order?
> + if (reason)
> + printk("%s NPU checkstop on chip %x: FIR%d bit %d is set\n",
> + level,
> + be32_to_cpu(hmi_evt->u.xstop_error.u.chip_id),
> + reason >> 6, reason & 0x3F);
> + }
> +}
> +
> static void print_checkstop_reason(const char *level,
> struct OpalHMIEvent *hmi_evt)
> {
> @@ -160,6 +197,9 @@ static void print_checkstop_reason(const char *level,
> case CHECKSTOP_TYPE_NX:
> print_nx_checkstop_reason(level, hmi_evt);
> break;
> + case CHECKSTOP_TYPE_NPU:
> + print_npu_checkstop_reason(level, hmi_evt);
> + break;
> default:
> printk("%s Unknown Malfunction Alert of type %d\n",
> level, type);
>
--
Andrew Donnellan OzLabs, ADL Canberra
ajd at linux.ibm.com IBM Australia Limited
More information about the Linuxppc-dev
mailing list