[RFC] powerpc/powernv/mce: Don't silently restart the machine

Nicholas Piggin npiggin at gmail.com
Wed Feb 21 13:15:54 AEDT 2018


On Wed, 21 Feb 2018 12:01:11 +1100
Balbir Singh <bsingharora at gmail.com> wrote:

> On MCE the current code restarts the machine with
> ppc_md.restart(). This case was extremely unlikely, since
> a skiboot call is made before that point and it resulted in
> a checkstop for analysis.
> 
> With newer skiboots on P9, we don't checkstop the box by
> default; instead we return to the kernel to extract
> useful information at the time of the MCE. While we still
> get this information, this patch converts the restart to
> a panic(), so that, if configured, a dump can be taken and
> we can track and probably debug the potential issue causing
> the MCE.
> 
> Signed-off-by: Balbir Singh <bsingharora at gmail.com>

Seems like something we should be doing.

Reviewed-by: Nicholas Piggin <npiggin at gmail.com>

> ---
>  arch/powerpc/platforms/powernv/opal.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 69b5263fc9e3..b510a6f41b00 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -500,9 +500,12 @@ void pnv_platform_error_reboot(struct pt_regs *regs, const char *msg)
>  	 *    opal to trigger checkstop explicitly for error analysis.
>  	 *    The FSP PRD component would have already got notified
>  	 *    about this error through other channels.
> +	 * 4. We are running on a newer skiboot that by default does
> +	 *    not cause a checkstop, drops us back to the kernel to
> +	 *    extract context and state at the time of the error.
>  	 */
>  
> -	ppc_md.restart(NULL);
> +	panic("PowerNV Unrecovered Machine Check");
>  }
>  
>  int opal_machine_check(struct pt_regs *regs)
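
A minimal user-space sketch of the behaviour change discussed above, not
kernel code. It assumes that panic() hands off to a crash kernel when one
has been loaded (the "if configured a dump can be taken" case in the commit
message). All names below (mce_config, mce_outcome, unrecovered_mce_outcome,
mce_outcome_name) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum mce_outcome {
	MCE_SILENT_RESTART,	/* old path: ppc_md.restart(NULL) */
	MCE_CRASH_DUMP,		/* new path: panic() with a crash kernel loaded */
	MCE_PANIC_NO_DUMP,	/* new path: panic() with no crash kernel loaded */
};

struct mce_config {
	bool use_panic;			/* true models the patched code */
	bool crash_kernel_loaded;	/* assumption: a dump kernel is configured */
};

/* Classify what happens after an unrecovered machine check in this model. */
enum mce_outcome unrecovered_mce_outcome(const struct mce_config *cfg)
{
	if (!cfg->use_panic)
		return MCE_SILENT_RESTART;	/* state is lost, nothing to debug */

	return cfg->crash_kernel_loaded ? MCE_CRASH_DUMP : MCE_PANIC_NO_DUMP;
}

/* Human-readable label for each outcome. */
const char *mce_outcome_name(enum mce_outcome outcome)
{
	switch (outcome) {
	case MCE_SILENT_RESTART:
		return "silent restart";
	case MCE_CRASH_DUMP:
		return "panic with crash dump";
	case MCE_PANIC_NO_DUMP:
		return "panic without crash dump";
	}
	return "unknown";
}
```

In this model the old path never produces a dump regardless of
configuration, which is the gap the patch is meant to close.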
