[PATCH 1/3] powerpc/64s: fix handling of non-synchronous machine checks

Mahesh Jagannath Salgaonkar mahesh at linux.vnet.ibm.com
Tue Feb 28 16:57:29 AEDT 2017


On 02/28/2017 07:30 AM, Nicholas Piggin wrote:
> A synchronous machine check is an exception raised by the attempt to
> execute the current instruction. If the error can't be corrected, it
> can make sense to SIGBUS the currently running process.
> 
> In other cases, the error condition is not related to the current
> instruction, so killing the current process is not the right thing to
> do.
> 
> Today, all machine checks are MCE_SEV_ERROR_SYNC, so this has no
> practical change. It will be used to handle POWER9 asynchronous
> machine checks.
> 
> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> ---
>  arch/powerpc/platforms/powernv/opal.c | 21 ++++++---------------
>  1 file changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 86d9fde93c17..e0f856bfbfe8 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -395,7 +395,6 @@ static int opal_recover_mce(struct pt_regs *regs,
>  					struct machine_check_event *evt)
>  {
>  	int recovered = 0;
> -	uint64_t ea = get_mce_fault_addr(evt);
> 
>  	if (!(regs->msr & MSR_RI)) {
>  		/* If MSR_RI isn't set, we cannot recover */
> @@ -404,26 +403,18 @@ static int opal_recover_mce(struct pt_regs *regs,
>  	} else if (evt->disposition == MCE_DISPOSITION_RECOVERED) {
>  		/* Platform corrected itself */
>  		recovered = 1;
> -	} else if (ea && !is_kernel_addr(ea)) {
> +	} else if (evt->severity == MCE_SEV_FATAL) {
> +		/* Fatal machine check */
> +		pr_err("Machine check interrupt is fatal\n");
> +		recovered = 0;

Setting recovered = 0 would trigger a kernel panic. Should we panic the
kernel for asynchronous errors?

> +	} else if ((evt->severity == MCE_SEV_ERROR_SYNC) &&
> +			(user_mode(regs) && !is_global_init(current))) {
>  		/*
> -		 * Faulting address is not in kernel text. We should be fine.
> -		 * We need to find which process uses this address.
>  		 * For now, kill the task if we have received exception when
>  		 * in userspace.
>  		 *
>  		 * TODO: Queue up this address for hwpoisioning later.
>  		 */
> -		if (user_mode(regs) && !is_global_init(current)) {
> -			_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> -			recovered = 1;
> -		} else
> -			recovered = 0;
> -	} else if (user_mode(regs) && !is_global_init(current) &&
> -		evt->severity == MCE_SEV_ERROR_SYNC) {
> -		/*
> -		 * If we have received a synchronous error when in userspace
> -		 * kill the task.
> -		 */
>  		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>  		recovered = 1;
>  	}
> 
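
For reference, the branch order in opal_recover_mce() after this patch can
be modelled in userspace roughly as below. This is only a sketch for
reasoning about the cases: the model_* types and names are hypothetical
stand-ins, not the kernel's definitions, and the SIGBUS delivery is reduced
to a flag.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's enums; not the real definitions. */
enum model_disposition { MODEL_DISP_NOT_RECOVERED, MODEL_DISP_RECOVERED };
enum model_severity { MODEL_SEV_OTHER, MODEL_SEV_ERROR_SYNC, MODEL_SEV_FATAL };

/* Inputs the patched function looks at, flattened into one struct. */
struct model_mce {
	bool msr_ri;			/* models regs->msr & MSR_RI */
	enum model_disposition disposition;
	enum model_severity severity;
	bool user_mode;			/* models user_mode(regs) */
	bool is_global_init;		/* models is_global_init(current) */
};

struct model_result {
	int recovered;			/* 0 is the value the caller treats as unrecovered */
	bool sigbus_sent;		/* models the _exception(SIGBUS, ...) call */
};

/* Mirrors the if/else-if chain of the patched function, in the same order. */
static struct model_result model_recover_mce(const struct model_mce *m)
{
	struct model_result r = { 0, false };

	if (!m->msr_ri) {
		/* If MSR_RI isn't set, we cannot recover */
		r.recovered = 0;
	} else if (m->disposition == MODEL_DISP_RECOVERED) {
		/* Platform corrected itself */
		r.recovered = 1;
	} else if (m->severity == MODEL_SEV_FATAL) {
		/* Fatal machine check */
		r.recovered = 0;
	} else if (m->severity == MODEL_SEV_ERROR_SYNC &&
		   m->user_mode && !m->is_global_init) {
		/* Synchronous error in userspace: kill the task */
		r.sigbus_sent = true;
		r.recovered = 1;
	}
	/* Anything else falls through with recovered still 0. */
	return r;
}
```

Note that, in this model, an event that is neither fatal, platform-recovered,
nor a synchronous userspace error also ends with recovered == 0.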