[PATCH] powerpc/book3s: Fix MCE console messages for unrecoverable MCE.

Michael Ellerman mpe at ellerman.id.au
Thu Aug 4 19:57:22 AEST 2016


Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:

> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>
> When machine check occurs with MSR(RI=0), it means MC interrupt is
> unrecoverable and kernel goes down to panic path. But the console
> message still shows it as recovered. This patch fixes the MCE console
> messages.
>
> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/mce.c             |    3 ++-
>  arch/powerpc/platforms/powernv/opal.c |    2 ++
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index ef267fd..5e7ece0 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -92,7 +92,8 @@ void save_mce_event(struct pt_regs *regs, long handled,
>  	mce->in_use = 1;
>  
>  	mce->initiator = MCE_INITIATOR_CPU;
> -	if (handled)
> +	/* Mark it recovered if we have handled it and MSR(RI=1). */
> +	if (handled && (regs->msr & MSR_RI))
>  		mce->disposition = MCE_DISPOSITION_RECOVERED;

This seems like it has bigger implications than just changing the
printk output? We're now (correctly) marking any MC where RI=0 as
unrecoverable.

Or is the only place that uses this the code below which *also* checks
MSR_RI?

> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 5385434..8154171 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -401,6 +401,8 @@ static int opal_recover_mce(struct pt_regs *regs,
>  
>  	if (!(regs->msr & MSR_RI)) {
>  		/* If MSR_RI isn't set, we cannot recover */

Why do we check MSR_RI again here? Shouldn't we just be looking at the evt->disposition?

> +		printk(KERN_ERR "Machine check interrupt unrecoverable:"
> +				" MSR(RI=0)\n");

Are we sure it's safe to call printk() there?

Please don't split the message across lines, and use pr_err() like the
rest of the code in this file. So it would be:

		pr_err("Machine check interrupt unrecoverable: MSR(RI=0)\n");

>  		recovered = 0;
>  	} else if (evt->disposition == MCE_DISPOSITION_RECOVERED) {
>  		/* Platform corrected itself */

cheers


More information about the Linuxppc-dev mailing list