[PATCH] powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
Michael Ellerman
mpe at ellerman.id.au
Thu Aug 4 19:57:22 AEST 2016
Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>
> When machine check occurs with MSR(RI=0), it means MC interrupt is
> unrecoverable and kernel goes down to panic path. But the console
> message still shows it as recovered. This patch fixes the MCE console
> messages.
>
> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> ---
> arch/powerpc/kernel/mce.c | 3 ++-
> arch/powerpc/platforms/powernv/opal.c | 2 ++
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index ef267fd..5e7ece0 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -92,7 +92,8 @@ void save_mce_event(struct pt_regs *regs, long handled,
> mce->in_use = 1;
>
> mce->initiator = MCE_INITIATOR_CPU;
> - if (handled)
> + /* Mark it recovered if we have handled it and MSR(RI=1). */
> + if (handled && (regs->msr & MSR_RI))
> mce->disposition = MCE_DISPOSITION_RECOVERED;
This seems like it has bigger implications than just changing the
printk output? We're now (correctly) marking any MC where RI=0 as
unrecoverable.
Or is the only place that uses this the code below which *also* checks
MSR_RI?
> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 5385434..8154171 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -401,6 +401,8 @@ static int opal_recover_mce(struct pt_regs *regs,
>
> if (!(regs->msr & MSR_RI)) {
> /* If MSR_RI isn't set, we cannot recover */
Why do we check MSR_RI again here? Shouldn't we just be looking at the evt->disposition?
> + printk(KERN_ERR "Machine check interrupt unrecoverable:"
> + " MSR(RI=0)\n");
Are we sure it's safe to call printk() there?
Please don't split the message across lines, and use pr_err() like the
rest of the code in this file. So it would be:
pr_err("Machine check interrupt unrecoverable: MSR(RI=0)\n");
> recovered = 0;
> } else if (evt->disposition == MCE_DISPOSITION_RECOVERED) {
> /* Platform corrected itself */
cheers
More information about the Linuxppc-dev
mailing list